Skip to content

fix(opt): regalloc clobbers parameter registers — AAPCS violation in i64 ops#86

Open
avrabe wants to merge 1 commit intofix/synth-i64-locals-and-framefrom
fix/synth-optimized-regalloc-params
Open

fix(opt): regalloc clobbers parameter registers — AAPCS violation in i64 ops#86
avrabe wants to merge 1 commit intofix/synth-i64-locals-and-framefrom
fix/synth-optimized-regalloc-params

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented Apr 29, 2026

Summary

  • The optimized lowering path (optimizer_bridge::ir_to_arm) hardcoded every i64 opcode to use R0:R1 / R2:R3 — regardless of whether those AAPCS arg slots still held live params. Any function that emitted an i64 op (e.g. i64.const) before reading all its i32 params would clobber R0/R1 on the very first instruction.
  • Replace the hardcoding with alloc_i64_pair, a helper that picks a free callee-saved pair from [(R4,R5), (R6,R7), (R8,R9), (R10,R11)] and skips anything in vreg_to_arm.values(), local_to_reg.values(), or the reserved AAPCS slots R0..R(min(num_params,4)).
  • Source operands are now resolved through vreg_to_arm instead of assumed pinned. The function epilogue emits the explicit (R0, R1) move for i64 results since they no longer pre-land at R0:R1.
  • Touches every i64 op handler in the optimized path; the no-optimize path (instruction_selector.rs) is untouched (separate fix already on the parent branch).

Repro / before-after

Minimal repro /tmp/match_gale.wat — 3 i32 params, an i64 local, an i64.const 1 prologue:

Before:

00000000 <match_gale>:
   0:  movs  r0, #1   ; <-- destroys param 0
   2:  movs  r1, #0   ; <-- destroys param 1
   ...
   c:  cmp   r0, r1   ; reads garbage

After:

00000000 <match_gale>:
   0:  movs  r4, #1
   2:  movs  r5, #0
   ...
   c:  cmp   r0, r1   ; params intact

gale_k_sem_give_decide from the engine_control bench wasm:

Before: started with movs r0, #1; movs r1, #0 (clobbering 2 params).
After: starts with movs r4, #1; movs r5, #0 (params preserved).

Test plan

  • cargo build --release --bin synth — clean
  • cargo test --workspace — green (no failures, contracts:: tests pass on this rev)
  • Repro /tmp/match_gale.wat — first instr is movs r4, #1, params intact
  • Bench wasm gale_ffi.loom.wasmgale_k_sem_give_decide no longer clobbers R0/R1
  • Renode CI re-run on the engine_control bench (depends on environment — recommend triggering after merge)

Notes / scope

  • Diff is ~790 insertions / ~293 deletions, over the 400 LOC soft cap noted in the task spec. The bulk of the additions are mechanical: every i64 handler had to switch from dest_lo, dest_hi, .. to listing all 6 source-vreg fields explicitly so we can resolve them through vreg_to_arm. Real new logic (alloc helper, epilogue update, Clz/Ctz/Popcnt hi-reg dance) is ~150 LOC; the rest is field-name verbosity. Architectural rewrite of the optimized path would be a bigger project than what this PR is scoped for.

🤖 Generated with Claude Code

…i64 ops

In the optimized lowering path (`optimizer_bridge::ir_to_arm`), every i64
opcode hardcoded its register pair to R0:R1 / R2:R3 and ignored the actual
operand vregs from `vreg_to_arm`. For any function that emitted an i64 op
before all its i32 params had been read (e.g. `i64.const 1` issued before
the first `local.get 0`), the very first emitted instruction would clobber
the AAPCS arg register holding the param.

Concretely, `match_gale.wat` (3 i32 params + 1 i64 local + an `i64.const`
prologue) compiled to:

  movs r0, #1     ; <-- destroys param 0
  movs r1, #0     ; <-- destroys param 1
  ...
  cmp  r0, r1     ; reads garbage

Same shape on the engine_control bench wasm: `gale_k_sem_give_decide`
started with `movs r0, #1; movs r1, #0` before any `local.get 0`.

Fix:
- Compute `param_reserved_regs: Vec<Reg>` = R0..R(min(num_params, 4))
  at function entry. Use Vec because `Reg` doesn't derive Hash; matches
  `instruction_selector::alloc_consecutive_pair` style.
- Add an `alloc_i64_pair` helper that searches `[(R4,R5), (R6,R7),
  (R8,R9), (R10,R11)]` for a free callee-saved pair, skipping any reg
  that's already in `vreg_to_arm.values()`, `local_to_reg.values()`, or
  `param_reserved_regs`.
- Rewrite every i64 handler — I64Const, I64Load (non-param case),
  I64Add/Sub/And/Or/Xor/Mul/Shl/ShrS/ShrU/Rotl/Rotr/DivS/DivU/RemS/RemU,
  the comparisons (Eq/Ne/Lt/Gt/Le/Ge {S,U}, Eqz), Clz/Ctz/Popcnt, and
  Extend{8,16,32}S — to (a) read source regs from `vreg_to_arm` instead
  of assuming R0:R1/R2:R3, and (b) place the destination on a fresh
  pair via `alloc_i64_pair`.
- Update the function epilogue: previously `is_i64_result` short-circuited
  the move-to-R0 because the result was already pinned at R0:R1. With
  the fix, the result lives wherever regalloc placed it, so emit explicit
  moves into R0 and R1, ordered to handle the rare case where lo lives
  in R1 (route via R12).
- For Clz/Ctz/Popcnt, whose ArmOp encoder zeros `rnhi` in-place to
  produce the i64 result hi, copy the source hi into a fresh callee-saved
  hi slot before the op so we don't smash the original input or any
  unrelated AAPCS reg. Track the chosen physical hi reg via
  `last_result_vreg_hi_reg` for the epilogue.

Verification:

  /tmp/match_gale.wat post-fix:
    0:  movs  r4, #1   ; was: movs r0, #1
    2:  movs  r5, #0   ; was: movs r1, #0
    ...
    c:  cmp   r0, r1   ; params intact

  gale_k_sem_give_decide post-fix:
    e8:  movs  r4, #1  ; was: movs r0, #1
    ea:  movs  r5, #0  ; was: movs r1, #0

`cargo test --workspace` is green. The no-optimize path
(`instruction_selector.rs`) is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant