Optimize reverb buffer locality#90
Conversation
|
Whitespace / style changes are uncalled for. Apart from that, what do you think @AliceLR ? |
|
|
Thanks for the review.
Huh. It isn't defined in library, it is defined in player: mutilities.h |
|
We're going to separate the changes so we can measure the performance improvements they bring. We're not sure we understand-is it not compiling for you? |
You changed SSIZE_T usage with SINTPTR_T without explaining why |
|
Here is the cleaned-up patch with only intended changes: 90.patch May I force-push to your branch for reviewers' convenience? |
|
Thanks for the review and for cleaning up the patch. Yes, you are right, we worked from an older MikMod-based project and missed some unrelated changes when adapting it to the current tree. Sorry about that. Please feel free to force-push the cleaned-up patch to the branch. Let us know if there is anything else we should adjust. |
f62c017 to
23ee402
Compare
|
P.S.: Looks like this first appeared as JHGuitarFreak/UQM-MegaMod#301 but got rejected.
Done
@AliceLR requested benchmark numbers. Maybe a few things more.. |
I think once you peel away the unnecessary style modifications, the changes are probably not copyrightable (and are reviewable and ostensibly useful), so I didn't push the point further than that. Separate benchmarks of the two changes would still be nice, and would help us determine if both changes are useful (vs. if the improvement is just from the modulo removal). |
Summary
Hi MikMod maintainers,
My collaborator @thomasstaheli and I recently studied a MikMod-based audio pipeline as part of our High-Performance Computing course at HEIG-VD in Yverdon, Switzerland. During this work, we profiled the reverb mixing code and identified
MixReverb_Stereoas a noticeable hotspot in the audio path.This pull request proposes two small optimizations for the reverb mixer:
The goal is to improve performance in a frequently executed DSP path without changing the reverb algorithm or the intended audio output.
Motivation
The stereo reverb mixer repeatedly accesses 16 circular buffers:
RVbufL1toRVbufL8for the left channel;RVbufR1toRVbufR8for the right channel.In the original implementation, these buffers are allocated separately. This can scatter them across the heap and reduce spatial locality. Since the reverb loop reads and writes these buffers very frequently, a more cache-friendly layout can help reduce memory stalls.
The inner loop also performs repeated modulo operations for circular indexing. These modulo operations may compile to integer divisions, which are relatively expensive compared to simple increments and comparisons.
Changes
1. Contiguous reverb buffer allocation
The separate reverb buffer allocations are replaced with a single contiguous allocation. The existing
RVbufL*andRVbufR*pointers are then assigned linearly inside this block.This keeps the rest of the code structure mostly unchanged, while improving spatial locality and reducing heap fragmentation.
2. Strength reduction for circular indexing
The repeated modulo-based index updates in the inner reverb loop are replaced with conditional increments:
The modulo is only needed for initialization. During the loop, the index advances with cheaper operations: increment, comparison, and conditional reset.
Performance notes
This change was motivated by profiling in a downstream project using MikMod-based audio mixing. In that environment,
MixReverb_Stereowas one of the relevant audio hotspots.Using Linux
perf, we observed the following self CPU cost for the targeted function:Using the average of the two baseline runs, this corresponds to a relative reduction of approximately 21.8% for
MixReverb_Stereoin that workload.These numbers are workload- and platform-dependent, so they should be interpreted as an indication rather than a universal benchmark. The main goal of this PR is to make the reverb buffer access pattern more cache-friendly and reduce unnecessary arithmetic in a hot loop.
Compatibility
The reverb algorithm is not intentionally changed. The existing buffer pointer interface is preserved, and the optimization only changes how the buffers are allocated and how circular indices are advanced internally.
Please let us know if you would prefer the patch to be split differently, rebased, or adjusted to better match the coding style of the project.
Best regards,
@Ali-Z0 and @thomasstaheli