fix(opt): regalloc clobbers parameter registers — AAPCS violation in i64 ops#86
Open
avrabe wants to merge 1 commit intofix/synth-i64-locals-and-framefrom
Open
fix(opt): regalloc clobbers parameter registers — AAPCS violation in i64 ops#86avrabe wants to merge 1 commit intofix/synth-i64-locals-and-framefrom
avrabe wants to merge 1 commit intofix/synth-i64-locals-and-framefrom
Conversation
…i64 ops In the optimized lowering path (`optimizer_bridge::ir_to_arm`), every i64 opcode hardcoded its register pair to R0:R1 / R2:R3 and ignored the actual operand vregs from `vreg_to_arm`. For any function that emitted an i64 op before all its i32 params had been read (e.g. `i64.const 1` issued before the first `local.get 0`), the very first emitted instruction would clobber the AAPCS arg register holding the param. Concretely, `match_gale.wat` (3 i32 params + 1 i64 local + an `i64.const` prologue) compiled to: movs r0, #1 ; <-- destroys param 0 movs r1, #0 ; <-- destroys param 1 ... cmp r0, r1 ; reads garbage Same shape on the engine_control bench wasm: `gale_k_sem_give_decide` started with `movs r0, #1; movs r1, #0` before any `local.get 0`. Fix: - Compute `param_reserved_regs: Vec<Reg>` = R0..R(min(num_params, 4)) at function entry. Use Vec because `Reg` doesn't derive Hash; matches `instruction_selector::alloc_consecutive_pair` style. - Add an `alloc_i64_pair` helper that searches `[(R4,R5), (R6,R7), (R8,R9), (R10,R11)]` for a free callee-saved pair, skipping any reg that's already in `vreg_to_arm.values()`, `local_to_reg.values()`, or `param_reserved_regs`. - Rewrite every i64 handler — I64Const, I64Load (non-param case), I64Add/Sub/And/Or/Xor/Mul/Shl/ShrS/ShrU/Rotl/Rotr/DivS/DivU/RemS/RemU, the comparisons (Eq/Ne/Lt/Gt/Le/Ge {S,U}, Eqz), Clz/Ctz/Popcnt, and Extend{8,16,32}S — to (a) read source regs from `vreg_to_arm` instead of assuming R0:R1/R2:R3, and (b) place the destination on a fresh pair via `alloc_i64_pair`. - Update the function epilogue: previously `is_i64_result` short-circuited the move-to-R0 because the result was already pinned at R0:R1. With the fix, the result lives wherever regalloc placed it, so emit explicit moves into R0 and R1, ordered to handle the rare case where lo lives in R1 (route via R12). - For Clz/Ctz/Popcnt, whose ArmOp encoder zeros `rnhi` in-place to produce the i64 result hi, copy the source hi into a fresh callee-saved hi slot before the op so we don't smash the original input or any unrelated AAPCS reg. Track the chosen physical hi reg via `last_result_vreg_hi_reg` for the epilogue. Verification: /tmp/match_gale.wat post-fix: 0: movs r4, #1 ; was: movs r0, #1 2: movs r5, #0 ; was: movs r1, #0 ... c: cmp r0, r1 ; params intact gale_k_sem_give_decide post-fix: e8: movs r4, #1 ; was: movs r0, #1 ea: movs r5, #0 ; was: movs r1, #0 `cargo test --workspace` is green. The no-optimize path (`instruction_selector.rs`) is untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
optimizer_bridge::ir_to_arm) hardcoded every i64 opcode to use R0:R1 / R2:R3 — regardless of whether those AAPCS arg slots still held live params. Any function that emitted an i64 op (e.g.i64.const) before reading all its i32 params would clobber R0/R1 on the very first instruction.alloc_i64_pair, a helper that picks a free callee-saved pair from[(R4,R5), (R6,R7), (R8,R9), (R10,R11)]and skips anything invreg_to_arm.values(),local_to_reg.values(), or the reserved AAPCS slotsR0..R(min(num_params,4)).vreg_to_arminstead of assumed pinned. The function epilogue emits the explicit (R0, R1) move for i64 results since they no longer pre-land at R0:R1.instruction_selector.rs) is untouched (separate fix already on the parent branch).Repro / before-after
Minimal repro
/tmp/match_gale.wat— 3 i32 params, an i64 local, ani64.const 1prologue:Before:
After:
gale_k_sem_give_decidefrom the engine_control bench wasm:Before: started with
movs r0, #1; movs r1, #0(clobbering 2 params).After: starts with
movs r4, #1; movs r5, #0(params preserved).Test plan
cargo build --release --bin synth— cleancargo test --workspace— green (no failures, contracts:: tests pass on this rev)/tmp/match_gale.wat— first instr ismovs r4, #1, params intactgale_ffi.loom.wasm—gale_k_sem_give_decideno longer clobbers R0/R1Notes / scope
dest_lo, dest_hi, ..to listing all 6 source-vreg fields explicitly so we can resolve them throughvreg_to_arm. Real new logic (alloc helper, epilogue update, Clz/Ctz/Popcnt hi-reg dance) is ~150 LOC; the rest is field-name verbosity. Architectural rewrite of the optimized path would be a bigger project than what this PR is scoped for.🤖 Generated with Claude Code