feat: jointly solve coupled rank-deficible inline-linear SCC families (#98) #100
|
Ran the pending end-to-end confirmation on HalfCar (this branch + #99's

1. Self-check needed a fix (pushed, cec2073)

2. The check is valuable: it disproves the construction-bug theory

Every emitted block, including the big 396 → 295 reduction, passes at generic (random) symbol values:

The repair pass never fires on this model (0 re-expansions): there is no buried SCC variable in

3. The real #98 mechanism: cross-block indeterminacy at the degenerate parameter point

At the true parameter/state values (where the joint-axis components are 0 and rank drops), the per-corner block chains behave differently:

The blocks are individually exact as symbolic functions, but they are solved sequentially, and at the degenerate point the static-reaction indeterminacy spans blocks: the upstream rank-deficient block's solution choice (min-norm here; garbage LU in production) fixes a gauge that no solution of the downstream block is compatible with. The union of the corner's equations is satisfiable (the state satisfies the model to 4e-10; the full 396-block is consistent), so it is the split into sequentially solved sub-blocks that invalidates the system, not how each block is built.

Implications

I'll update #98 with the corrected mechanism.
…#98)

The diagnostics added earlier disproved the construction-bug theory: every emitted block is exact at generic parameter values and the repair pass never fires on HalfCar. The real #98 mechanism is cross-block indeterminacy at the degenerate parameter point: blocks are individually exact but solved sequentially, so an upstream rank-deficient block's gauge choice (min-norm, or garbage from a plain LU) is substituted downstream and makes a dependent block inconsistent, even though the union of the family's equations is satisfiable.

Fix: group coupled, runtime-rank-deficible inline-linear SCCs into families and solve each family as one joint linear system, so the gauge is resolved consistently across the whole family instead of being frozen between blocks.

A family is a maximal run of consecutive blocks that are each rank-deficible (their equations reference a `maybe_zeros` parameter, so a coefficient can vanish and drop the rank) and chained by structural coupling (each block's equations reference the previous block's variables). Non-deficible blocks (e.g. a large full-rank chassis block) are never pulled in. When `maybe_zeros` is empty the grouping is inert, so the default one-block-per-SCC behaviour, and all existing behaviour, is unchanged.

The reassembly loop is refactored to emit one block at a time via an `emit_block!` helper; merged families pass the union of their equations/variables to a single `get_linear_scc_linsol`, falling back to per-member emission if the joint inline solve does not apply.

Adds a unit test for the grouping decision.

Note: the joint family is still emitted as `INLINE_LINEAR_SCC_OP(A, b)`; a rank-tolerant runtime solve (the sparse direction of #95) is still required for the legitimately rank-deficient-but-consistent merged block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
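The adjacency-run rule in this commit message can be sketched as below. This is a hypothetical Python illustration, not the Julia code in the PR: the `Block` record and its `vars`, `refs`, and `deficible` fields are names invented here, and `deficible` stands in for "an equation references a `maybe_zeros` parameter".

```python
from dataclasses import dataclass

@dataclass
class Block:
    vars: set        # variables solved by this block (hypothetical field)
    refs: set        # variables referenced by this block's equations
    deficible: bool  # stand-in for "references a maybe_zeros parameter"

def group_adjacent_families(blocks):
    """Split a block list into maximal runs of consecutive rank-deficible
    blocks, each structurally coupled to the block right before it.
    Every other block stays a singleton group."""
    groups = []
    for i, blk in enumerate(blocks):
        prev = groups[-1] if groups else None
        extend = (
            prev is not None
            and blk.deficible
            and blocks[prev[-1]].deficible              # previous block is deficible too
            and bool(blk.refs & blocks[prev[-1]].vars)  # and we reference its variables
        )
        if extend:
            prev.append(i)
        else:
            groups.append([i])
    return groups

blocks = [
    Block(vars={"a"}, refs=set(), deficible=True),
    Block(vars={"b"}, refs={"a"}, deficible=True),
    Block(vars={"c"}, refs={"b"}, deficible=False),  # full-rank block, never pulled in
    Block(vars={"d"}, refs={"c"}, deficible=True),
]
groups = group_adjacent_families(blocks)
```

With no deficible blocks the function returns one singleton group per block, which mirrors the "inert when `maybe_zeros` is empty" property stated above.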
|
Pushed the joint-solve approach (2e16a5f), per your finding that #98 is cross-block indeterminacy rather than a construction bug.

What it does: in

A family is a maximal run of consecutive blocks that are:

This should merge each corner's

Mechanics: the reassembly loop now emits one block at a time via an

Still needed on your side: the joint family is emitted as

A couple of points I'd value your read on:
|
|
Re-ran HalfCar against 2e16a5f (+ #99, rank-tolerant runtime override,

Instrumented

So, answering your question 1 directly: adjacency-run grouping cannot work here. The prepared block list has 681 entries, almost all singletons, and the members of each corner family are separated by 20–45 interleaved singleton blocks in the topological order. No two droppable blocks are ever adjacent, hence

What the grouping needs instead

Follow the block dependency DAG:

One caveat to decide on: the 396 chassis block is itself droppable (it references the axis parameters too) and is downstream of the corner families, so a pure reachability closure will pull it in and produce one ~500-unknown joint block. That is correct, though not strictly necessary: the measured 295 reduction was consistent at the true point. If the size is a concern, the grouping could stop at blocks whose own rank was never observed to drop, but there is no static information to distinguish "deficible and actually drops" from "deficible but stays full rank", so I'd accept the big merge and lean on the #95 sparse/rank-tolerant runtime solve for cost.

On question 2: emission-time merging is the right place, IMO. Refusing to split during tearing would forfeit the block-triangular structure for all operating points to guard a degenerate one, and tearing doesn't know the runtime solve strategy;

Side note from this run: both the 5× and the warm-started 30× init LM solves converged this time (resid 1.6e-9 / 4.9e-9). The init story is in place; the joint-family emission is the last blocker for HalfCar integration.
…lies

Adjacency-run grouping is inert on HalfCar: the prepared block list has 681 entries and the members of each corner family are separated by 20-45 interleaved singleton blocks in the topological order, so no two rank-deficible blocks are ever adjacent.

Rework `_group_inline_linear_families` to operate on the block-level dependency DAG (edge j -> k iff block k's equations structurally reference block j's variables):

- a family is a connected component of rank-deficible blocks under reachability through the DAG (possibly via intermediate blocks);
- each family is closed over the blocks lying on dependency paths between its members, so the merged system is self-contained (every referenced variable is either solved upstream or part of the merged block);
- groups are emitted in a topological order of the DAG with each family contracted to one node (families are path-closed, so the contraction cannot create cycles), stable by smallest original block index.

Closures of distinct families cannot overlap, and a defensive check falls back to per-SCC blocks if the contracted order fails to cover every block. When `maybe_zeros` is empty the grouping (and the emission order) remains unchanged.

This will pull the downstream chassis block into the family when it is itself rank-deficible; that is correct, and cost is deferred to the rank-tolerant/sparse runtime solve direction of #95.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
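The reachability-plus-path-closure rule above can be sketched as follows. This is a hypothetical Python illustration rather than the PR's `_group_inline_linear_families`: the function names are invented, blocks are plain integers, and edges are assumed to run from a lower to a higher index (a topological numbering).

```python
def reachability(n, edges):
    """reach[j] = blocks reachable from j, assuming every edge (j, k) has j < k."""
    succ = [set() for _ in range(n)]
    for j, k in edges:
        succ[j].add(k)
    reach = [set() for _ in range(n)]
    for j in reversed(range(n)):  # successors are finished before j
        for k in succ[j]:
            reach[j] |= {k} | reach[k]
    return reach

def dag_families(n, edges, deficible):
    """Connected components of deficible blocks under reachability,
    each closed over the blocks lying on paths between its members."""
    reach = reachability(n, edges)
    parent = {b: b for b in deficible}

    def find(b):
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    for a in sorted(deficible):
        for b in sorted(deficible):
            if b in reach[a]:
                parent[find(a)] = find(b)

    comps = {}
    for b in deficible:
        comps.setdefault(find(b), set()).add(b)

    families = []
    for members in comps.values():
        if len(members) < 2:
            continue  # a lone deficible block stays a per-SCC block
        closed = set(members)
        for c in range(n):
            after_member = any(c in reach[a] for a in members)
            before_member = any(b in reach[c] for b in members)
            if after_member and before_member:
                closed.add(c)  # c lies on a path between two members
        families.append(sorted(closed))
    return sorted(families)

# Blocks 0 and 3 are deficible and connected through block 1;
# block 2 sits between them in the order but is unrelated.
families = dag_families(6, [(0, 1), (1, 3), (2, 4)], {0, 3})
```

The small example mirrors the two unit-test cases mentioned later in the thread: non-adjacent members merged through a path, and an unrelated in-between block left out.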
|
Pushed the DAG-based grouping (e77da0d), following your instrumented findings exactly:
On the chassis caveat: agreed. A pure reachability component will pull the droppable 396 block into the corner families' component (they all reach it), producing one big joint block. I left that as-is per your "accept the big merge" call; the cost story is #95's sparse/rank-tolerant solve.

One thing to watch in your instrumentation: with the corner families now merged into a single large block, the

Same

Tests: 6/6 on the grouping unit tests (incl. non-adjacent members merged through a path, and an unrelated in-between block left out), full
|
Tested e77da0d on HalfCar (same setup). The DAG grouping does form a family now, but it over-collects, the joint solve fails on a nonlinear member, and the fallback silently restores the old behavior (runtime blocks unchanged: 15×15 still

The trace you asked for:

Two compounding problems:

Suggested refinement: peel-on-failure instead of all-or-nothing fallback

The failure site already identifies the offending equation and variable. Rather than falling back to per-member emission for the whole family:

This converges (each iteration removes ≥1 block), needs no new analysis machinery, and automatically prunes the kinematic/gravity blocks while keeping the force-network core, which is the part that actually exchanges gauge. The per-attempt cost is the existing symbolic

An equivalent pre-filter (test each candidate block's equations for linearity in the union variables before forming the family) would avoid the retries, but it duplicates what
…probe

- `get_linear_scc_linsol` returns `NonlinearBlockEq(ieq)` instead of `nothing` when a specific equation is nonlinear in the block's variables; the family loop peels the member owning that equation and retries the joint solve, pruning e.g. kinematic blocks while keeping the linear reaction-force core that exchanges gauge (issue #98). Peeled members are emitted as their own blocks.
- self-check: bounded (0.15, 0.85) array-aware probe draws keep common expression domains valid (sqrt(1-x^2), indexing of array parameters); check call site catches and reports probe failures instead of aborting compilation (opt-in diagnostics must never break a build).
- family-formation summary logged under MTKTEARING_CHECK_REDUCTION.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
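The peel-and-retry loop described in the first bullet can be sketched like this. It is a hypothetical Python illustration: `try_joint_solve` stands in for the joint `get_linear_scc_linsol` call, and its `("ok", ...)` / `("nonlinear", member)` return shape is an assumption made here, not the PR's actual interface.

```python
def solve_family_with_peeling(members, try_joint_solve):
    """Retry the joint solve, peeling the member that owns the offending
    nonlinear equation each time. Terminates because every failed attempt
    removes one member. Returns (kept_members, peeled_members, solution);
    solution is None if fewer than two members survive."""
    members = list(members)
    peeled = []
    while len(members) > 1:
        status, payload = try_joint_solve(members)
        if status == "ok":
            return members, peeled, payload
        members.remove(payload)  # payload names the offending member
        peeled.append(payload)   # peeled members are emitted as their own blocks
    return members, peeled, None

# Fake solver for illustration: kinematic and gravity blocks are nonlinear.
NONLINEAR = {"kin", "grav"}

def fake_joint_solve(members):
    for m in members:
        if m in NONLINEAR:
            return "nonlinear", m
    return "ok", "joint(" + ",".join(members) + ")"

kept, peeled, solution = solve_family_with_peeling(
    ["f1", "kin", "f2", "grav"], fake_joint_solve
)
```

Note that a later commit in this thread removes emission-time peeling in favour of greedy convex growth, so this sketch only illustrates the intermediate design.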
…, orderable subset

Findings from HalfCar (issue #98) drove four changes:

1. Forward dependency edges only: the prepared blocks are in BLT order, which is a valid linearization of the matching-based dependencies; raw residual incidence also contains backward references (through torn variables of later blocks), which made the 'DAG' genuinely cyclic and silently collapsed every grouping to singletons via the Kahn coverage fallback.
2. Linearity is set-dependent, not global: a block may be nonlinear in some other block's variables (cos of another block's angle) while being exactly the linear rank-deficient block a family must absorb. Mergeability is now a memoized pairwise check (block b linear in block j's variables) validated over a candidate family's full closure.
3. Greedy convex growth: families grow member-by-member in BLT order while the full-DAG closure stays pairwise-linear and unclaimed, so every finalized family is convex (no emission cycles) and jointly linearly solvable by construction. Emission-time peeling is gone: it broke convexity and produced real evaluation cycles.
4. Mutually-unorderable families: even disjoint convex families can interleave such that contraction creates a cycle; Kahn now dissolves the latest stalled family and retries, keeping a maximal orderable subset.

On HalfCar this merges each corner's reaction chain into one joint block: runtime 131x131, rank 129 (the two corners' gauge dimensions) and CONSISTENT (relres <= 2e-10, vs 0.945 for the old per-block emission), with the remaining 12/12/301 blocks consistent as well.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
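Item 4 can be illustrated with a small sketch of Kahn's algorithm over the contracted graph. This is a hypothetical Python rendering, not the PR's code: the function names are invented, and "latest stalled family" is interpreted here as the stalled family with the largest smallest block index, which is an assumption.

```python
import heapq

def contract(n, edges, families):
    """Contract each family to one node; every other block is its own node."""
    groups = [sorted(f) for f in families]
    owner = {b: g for g, fam in enumerate(groups) for b in fam}
    for b in range(n):
        if b not in owner:
            owner[b] = len(groups)
            groups.append([b])
    succ = [set() for _ in groups]
    indeg = [0] * len(groups)
    for j, k in edges:
        gj, gk = owner[j], owner[k]
        if gj != gk and gk not in succ[gj]:
            succ[gj].add(gk)
            indeg[gk] += 1
    return groups, succ, indeg

def emission_order(n, edges, families):
    """Kahn's algorithm, stable by smallest block index. On a stall,
    dissolve one stalled family into singletons and retry."""
    families = [sorted(f) for f in families]
    while True:
        groups, succ, indeg = contract(n, edges, families)
        ready = [(grp[0], g) for g, grp in enumerate(groups) if indeg[g] == 0]
        heapq.heapify(ready)
        order = []
        while ready:
            _, g = heapq.heappop(ready)
            order.append(groups[g])
            for h in succ[g]:
                indeg[h] -= 1
                if indeg[h] == 0:
                    heapq.heappush(ready, (groups[h][0], h))
        if len(order) == len(groups):
            return order
        stalled = [f for f in families if f not in order]
        if not stalled:
            raise ValueError("block dependencies are cyclic even without families")
        families.remove(max(stalled, key=lambda f: f[0]))

# Two convex families that interleave: 0 -> 3 and 1 -> 2 cross between them,
# so contracting both creates a cycle even though the block graph is acyclic.
edges = [(0, 2), (1, 3), (0, 3), (1, 2)]
order = emission_order(4, edges, [[0, 2], [1, 3]])
```

In the example, family `[1, 3]` is dissolved and family `[0, 2]` survives, so the result keeps an orderable subset of the proposed families.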
|
Pushed the grouping rework in 590d510, after a long instrumented session on HalfCar. The #98 inconsistency is now fixed at runtime.

Headline measurement (rank-revealing runtime solve, true parameter point):

The corners' gauge-coupled reaction chains are merged into one joint block; min-norm resolves both corners' static indeterminacy with a single consistent gauge.

Getting there required four corrections to e77da0d, each driven by an instrumented failure:

Two caveats / follow-ups:

Suites:

Side fix landed in MultibodyComponents (04f7359):
- Cap candidate family closures at 512 equations: the symbolic cost of building and reducing the joint block grows superlinearly, and an unbounded greedy can chain corner families through a free chassis into one enormous family (compile-time OOM on FullCar). 512 comfortably covers the per-corner reaction families this pass exists for; the fallback is status-quo per-block emission.
- Family-grouping progress logs now also available under MTKT_FAMGRP_LOG without paying for the full MTKTEARING_CHECK_REDUCTION snapshots; flush after the formation summary so it survives an OOM kill.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
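A budgeted greedy growth of the kind the first bullet describes might look like the sketch below. It is a hypothetical Python illustration under simplifying assumptions: `closure` and `eq_count` are invented inputs, the pairwise-linearity check from the earlier commit is omitted, and only the 512-equation figure comes from the commit message.

```python
MAX_FAMILY_EQS = 512  # budget quoted in the commit message

def grow_families(candidates, closure, eq_count, budget=MAX_FAMILY_EQS):
    """Grow families member by member over candidates (given in BLT order).
    A candidate joins the current family only if the closure of the enlarged
    family stays within the equation budget; otherwise the current family is
    finalized and a new one starts. Families of one block are not reported,
    which corresponds to ordinary per-block emission."""
    families, current = [], []
    for b in candidates:
        trial = current + [b]
        size = sum(eq_count[c] for c in closure(trial))
        if size <= budget:
            current = trial
            continue
        if len(current) > 1:
            families.append(current)
        current = [b] if eq_count[b] <= budget else []
    if len(current) > 1:
        families.append(current)
    return families

# Illustration with a trivial closure (the members themselves).
eq_count = {0: 200, 1: 200, 2: 200, 3: 100}
families = grow_families([0, 1, 2, 3], lambda m: set(m), eq_count)
```

In the example the third block would push the first family to 600 equations, so the family is closed at 400 and a second one starts.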
|
End-to-end confirmation on HalfCar (the pending item in the description), run with this branch merged together with #99 into
|
|
I'm convinced that |
|
You're right on both counts, and yes: it's two separable things in two different functions, so I'll split them.

PR 1: the

PR 2: the family grouping. This is the actual #98 mechanism (sequential per-block solves freeze an upstream rank-deficient gauge that then makes a downstream block inconsistent, even though the family's union is satisfiable). It reworks the emission loop and is the larger, more speculative change, and it's incomplete: the merged block still needs a rank-tolerant runtime solve (the #95 direction) to actually be solved. Given that dependency, it probably belongs in the same conversation as #105 rather than here. I'll pull it into its own PR so we can evaluate it against that direction without blocking the correctness fix.
|
Split done:
So the correctness fix can land on its own, and the grouping stays open for evaluation against the rank-tolerant-runtime-solve direction (#95 / #105) without blocking it.
The family-grouping half of the original PR, now stacked on #106 (the `get_linear_scc_linsol` correctness fix). The base of this PR is `fbc/inline-linear-scc-buried-b-repair`, so the diff below is only the grouping work; it will auto-retarget to `main` once #106 merges.

Mechanism (the actual #98 cause)
The diagnostics added in #106 disproved the construction-bug theory: every emitted block is exact at generic parameter values and the repair pass never fires on HalfCar. The real #98 mechanism is cross-block indeterminacy at the degenerate parameter point — blocks are individually exact but solved sequentially, so an upstream rank-deficient block's gauge choice (min-norm, or garbage from a plain LU) is substituted downstream and makes a dependent block inconsistent, even though the union of the family's equations is satisfiable.
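A two-equation toy system makes the mechanism concrete. The sketch below is a hypothetical Python illustration, not taken from HalfCar: the equations `a*x = a` and `x + a*y = 1`, the parameter `a`, and the helper `rank_tolerant_solve` (Gauss-Jordan with full pivoting, free variables set to zero) are all invented here. For the 1×1 upstream block at `a = 0`, setting the free variable to zero coincides with the min-norm choice.

```python
def rank_tolerant_solve(A, b, tol=1e-12):
    """Gauss-Jordan with full pivoting. Free variables are set to 0.
    Returns (x, rank, consistent)."""
    m, n = len(A), len(A[0])
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    pivots, used, r = [], set(), 0
    while r < m:
        best, pr, pc = tol, None, None
        for i in range(r, m):
            for j in range(n):
                if j not in used and abs(M[i][j]) > best:
                    best, pr, pc = abs(M[i][j]), i, j
        if pr is None:
            break  # remaining rows are numerically zero: rank found
        M[r], M[pr] = M[pr], M[r]
        piv = M[r][pc]
        M[r] = [v / piv for v in M[r]]
        for i in range(m):
            if i != r:
                f = M[i][pc]
                M[i] = [vi - f * vr for vi, vr in zip(M[i], M[r])]
        pivots.append((r, pc))
        used.add(pc)
        r += 1
    x = [0.0] * n
    for row, col in pivots:
        x[col] = M[row][n]
    consistent = all(abs(M[i][n]) <= 1e-9 for i in range(r, m))
    return x, r, consistent

def sequential(a):
    """Block 1 solves a*x = a; its x is then frozen into block 2: a*y = 1 - x."""
    x, _, ok1 = rank_tolerant_solve([[a]], [a])
    y, _, ok2 = rank_tolerant_solve([[a]], [1.0 - x[0]])
    return x[0], y[0], ok1 and ok2

def joint(a):
    """Both equations solved as one system in (x, y)."""
    sol, _, ok = rank_tolerant_solve([[a, 0.0], [1.0, a]], [a, 1.0])
    return sol[0], sol[1], ok

generic = sequential(0.5)      # full rank: both strategies agree
degenerate_seq = sequential(0.0)
degenerate_joint = joint(0.0)
```

At `a = 0` the upstream block becomes `0*x = 0` and picks `x = 0`, which leaves the downstream block with `0*y = 1`, an inconsistent equation. The joint system instead reads `x = 1` from the second equation, so the union is consistent even though each block, taken alone, was built correctly.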
Fix
Group coupled, runtime-rank-deficible inline-linear SCCs into families along the block-level dependency DAG and solve each family as one joint linear system, so the gauge is resolved consistently across the whole family. When `maybe_zeros` is empty the grouping is inert and all existing behaviour is unchanged. Later commits refine this with peel-on-failure for nonlinear members (subsequently replaced by set-dependent linearity and greedy convex growth) and an equation-count budget on family closures.

Known dependency / why it's separated
The merged family is still emitted as `INLINE_LINEAR_SCC_OP(A, b)`; a legitimately rank-deficient-but-consistent merged block needs a rank-tolerant runtime solve (the sparse direction of #95) to actually be solved. That rank-tolerant solve is closer to the true root cause than the grouping is; see the analysis in #105. Keeping this separate from #106 lets the correctness fix land independently while the grouping is evaluated against that direction.