optimize imuMahonyAHRSupdate() hot path (5 micro-opts)#11358
sensei-hacker wants to merge 1 commit into iNavFlight:maintenance-9.x from
Five cycle-saving changes to imuMahonyAHRSupdate(), which accounts for
~205 µs (30%) of the PID loop on RP2350. Each change is safe on all
targets; the gains are proportionally smaller on F7/H7 where the function
already lives in ITCM RAM.
1. Replace quaternionRotateVector({0,0,1}) with rMat[2][*] reads.
imuComputeRotationMatrix() is called at the end of every invocation and
keeps rMat in sync with orientation. Rotating the constant gravity
vector {0,0,1} from the earth frame (EF) to the body frame (BF) is
mathematically identical to reading the third row of rMat, so three float
loads replace ~56 floating-point operations (~32 multiplies + 24 adds).
2. Eliminate sqrt() from the Taylor-series threshold.
Original: thetaMagnitudeSq < sqrt(24e-6).
Both sides are non-negative, so squaring them preserves the inequality and
gives the equivalent condition thetaMagnitudeSq² < 24e-6 with no sqrt call.
3. Replace quaternionNormalize() with first-order Newton renormalization.
quaternionNormalize() (sqrt + 4 divides, ~35 cycles) is replaced by scaling
with scale = (3 - normSq) * 0.5 (~14 cycles). Writing normSq = 1 + ε, at
1 kHz the per-step drift satisfies |ε| < 1e-6, so the O(ε²) error is below
1e-12. imuCheckAndResetOrientationQuaternion() remains as the
catastrophic-failure safety net.
4. Precompute the anti-windup i_limit in imuConfigure() (called only on
settings save). The hot path now reads a single float instead of
performing an add, a multiply, and a divide every PID cycle.
Adds a dcm_i_limit field to imuRuntimeConfig_t, populated in imuConfigure().
5. Reduce the prevOrientation snapshot frequency from every PID cycle to
every 100 cycles (~100 ms at 1 kHz). The snapshot is used only by the
fault-recovery path, which should never fire in normal flight, so writing
16 bytes to SRAM every millisecond is not justified.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
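A minimal sketch of why the first change is safe, assuming a Hamilton quaternion `q` that maps body-frame vectors to the earth frame (v_EF = q ⊗ v_BF ⊗ q*) and a row-major body-to-earth matrix built with the standard formula. The types and helpers (`quat_t`, `vec3_t`, `rotate_ef_to_bf()`, `rotation_matrix()`, `gravity_bf_from_rmat()`) are hypothetical stand-ins, not INAV's actual quaternion API. The generic EF→BF rotation needs two quaternion products of 16 multiplies and 12 adds each, which is consistent with the ~56-operation figure above, while the shortcut is three loads.

```c
/* Hypothetical types; INAV's own quaternion/vector types may differ. */
typedef struct { float w, x, y, z; } quat_t;
typedef struct { float x, y, z; } vec3_t;

/* Hamilton product a ⊗ b: 16 multiplies + 12 adds. */
static quat_t quat_mul(quat_t a, quat_t b)
{
    quat_t r = {
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
    };
    return r;
}

/* Generic EF->BF rotation: v_BF = q* ⊗ v_EF ⊗ q (two quaternion products). */
static vec3_t rotate_ef_to_bf(quat_t q, vec3_t v)
{
    quat_t conj = { q.w, -q.x, -q.y, -q.z };
    quat_t p = { 0.0f, v.x, v.y, v.z };
    quat_t r = quat_mul(quat_mul(conj, p), q);
    vec3_t out = { r.x, r.y, r.z };
    return out;
}

/* Body-to-earth rotation matrix (v_EF = m * v_BF) for a unit quaternion. */
static void rotation_matrix(quat_t q, float m[3][3])
{
    m[0][0] = 1.0f - 2.0f * (q.y * q.y + q.z * q.z);
    m[0][1] = 2.0f * (q.x * q.y - q.w * q.z);
    m[0][2] = 2.0f * (q.x * q.z + q.w * q.y);
    m[1][0] = 2.0f * (q.x * q.y + q.w * q.z);
    m[1][1] = 1.0f - 2.0f * (q.x * q.x + q.z * q.z);
    m[1][2] = 2.0f * (q.y * q.z - q.w * q.x);
    m[2][0] = 2.0f * (q.x * q.z - q.w * q.y);
    m[2][1] = 2.0f * (q.y * q.z + q.w * q.x);
    m[2][2] = 1.0f - 2.0f * (q.x * q.x + q.y * q.y);
}

/* Shortcut: {0,0,1} rotated EF->BF is m^T * {0,0,1}, i.e. row 2 of m. */
static vec3_t gravity_bf_from_rmat(float m[3][3])
{
    vec3_t g = { m[2][0], m[2][1], m[2][2] };
    return g;
}
```

For unit quaternions the two paths agree to within float rounding; they are not necessarily bit-identical because the arithmetic is evaluated in a different order.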
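A minimal sketch of the first-order renormalization in the third change, assuming a plain `{w, x, y, z}` float quaternion; the type and function names are hypothetical. With normSq = 1 + ε, the exact factor is 1/sqrt(1 + ε) = 1 - ε/2 + O(ε²), and (3 - normSq) * 0.5 = 1 - ε/2 matches it to first order; this is also what a single Newton-Raphson step for 1/sqrt(normSq) gives when started from the guess 1. After scaling, the squared norm is (1 - ε/2)²(1 + ε) = 1 - 3ε²/4 + ε³/4, so the remaining error is second order in ε.

```c
/* Hypothetical quaternion type. */
typedef struct { float w, x, y, z; } quat_t;

static float quat_norm_sq(quat_t q)
{
    return q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
}

/* First-order renormalization: scale = (3 - |q|^2) / 2 approximates
 * 1 / |q| when |q|^2 is close to 1. No sqrt and no divide. */
static quat_t quat_renormalize_first_order(quat_t q)
{
    const float normSq = quat_norm_sq(q);
    const float scale = (3.0f - normSq) * 0.5f;
    quat_t r = { q.w * scale, q.x * scale, q.y * scale, q.z * scale };
    return r;
}
```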
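A minimal sketch of the precompute-on-configure pattern from the fourth change. The struct, field, constant, and function names are hypothetical stand-ins for `imuRuntimeConfig_t`, `dcm_i_limit`, and `imuConfigure()`, and the limit formula (a base rate times the mean of two gains) is an assumption picked only because it costs one add, one multiply, and one divide; the real expression in imu.c may differ.

```c
/* Hypothetical runtime config, loosely mirroring imuRuntimeConfig_t. */
typedef struct {
    float kp_acc;
    float kp_mag;
    float i_limit;   /* cached anti-windup limit, analogous to dcm_i_limit */
} imu_runtime_config_sketch_t;

/* Hypothetical base rate in rad/s; not taken from INAV. */
#define I_LIMIT_BASE_RAD_S 0.0349f

static imu_runtime_config_sketch_t cfg;

static float constrain_f(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Runs only when settings change: the add, multiply and divide happen here. */
static void imu_configure_sketch(float kp_acc, float kp_mag)
{
    cfg.kp_acc = kp_acc;
    cfg.kp_mag = kp_mag;
    cfg.i_limit = I_LIMIT_BASE_RAD_S * (kp_acc + kp_mag) / 2.0f;
}

/* Hot path: a single load of the cached limit per PID cycle. */
static float clamp_integrator(float integrator)
{
    return constrain_f(integrator, -cfg.i_limit, cfg.i_limit);
}
```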
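A minimal sketch of throttling the snapshot in the fifth change with a cycle counter; the names are hypothetical and the PR's actual counter logic may differ. One consequence of this design is that, if the fault-recovery path ever fires, the orientation it restores can be up to 99 cycles (roughly 0.1 s at 1 kHz) old instead of one cycle old.

```c
#include <stdint.h>

/* Hypothetical quaternion type: four floats, 16 bytes. */
typedef struct { float w, x, y, z; } quat_t;

#define SNAPSHOT_INTERVAL_CYCLES 100u   /* ~100 ms at 1 kHz */

static quat_t orientation;        /* current estimate (hypothetical global) */
static quat_t prev_orientation;   /* last-known-good copy for fault recovery */
static uint32_t snapshot_counter;

/* Called once per PID cycle; copies the quaternion only every 100th call. */
static void orientation_snapshot_tick(void)
{
    if (snapshot_counter == 0u) {
        prev_orientation = orientation;
    }
    snapshot_counter++;
    if (snapshot_counter >= SNAPSHOT_INTERVAL_CYCLES) {
        snapshot_counter = 0u;
    }
}
```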
P.S. Note that thetaMagnitudeSq is mathematically guaranteed to be non-negative because it is the sum of three squares, so the comparison is arithmetically equivalent, just with fewer CPU cycles.
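A minimal sketch of the threshold rewrite, using hypothetical function names and the 24e-6 constant quoted in the description. Because `thetaMagnitudeSq` and `sqrt(24e-6)` are both non-negative, the two predicates select the same branch; in float arithmetic they could differ only for inputs within rounding distance of the boundary (about 4.9e-3).

```c
#include <math.h>
#include <stdbool.h>

/* Threshold constant quoted in the PR description. */
#define TAYLOR_LIMIT 24e-6f

/* Original form: compares against sqrt(24e-6). */
static bool use_taylor_sqrt(float thetaMagnitudeSq)
{
    return thetaMagnitudeSq < sqrtf(TAYLOR_LIMIT);
}

/* Rewritten form: squares both sides instead. Valid because
 * thetaMagnitudeSq is a sum of squares and sqrt(24e-6) > 0. */
static bool use_taylor_squared(float thetaMagnitudeSq)
{
    return thetaMagnitudeSq * thetaMagnitudeSq < TAYLOR_LIMIT;
}
```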
Summary
Five micro-optimizations to the 1 kHz Mahony AHRS update hot path:
- `rMat[2][0..2]` reads instead of `quaternionRotateVector()`: saves ~32 multiplies + 24 adds per PID cycle
- `thetaMagnitudeSq² < 24e-6` instead of `thetaMagnitudeSq < sqrt(24e-6)`: eliminates a `sqrt()` call
- First-order Newton renormalization instead of `quaternionNormalize()`: eliminates `sqrt()` + 4 divides per cycle
- Precomputed `dcm_i_limit` in `imuConfigure()` (called on settings save) instead of recalculating it every PID cycle
- `prevOrientation` snapshot every 100 cycles instead of every cycle (only used by the fault-recovery path)

Files Changed
- `src/main/flight/imu.c`
- `src/main/flight/imu.h`

Testing
- … `prevOrientation`) still functional

None of these are expected to make a huge difference in performance, but they are free.
The result should be essentially the same, just in fewer CPU cycles.