Pivot-solvability inline-linear row elimination with size-budgeted composition (#95) by baggepinnen · Pull Request #105 · JuliaComputing/StateSelection.jl

baggepinnen · 2026-06-13T06:21:41Z

Fixes the block-size half of #95: the inline-linear-SCC pass emits a dense runtime A \ b an order of magnitude larger than the irreducible core (HalfCar 297, FullCar 571 for ~12/50-variable cores).

Change

__reduce_linear_system! previously eliminated a matched row only when every coefficient in the row was constant. This keeps hundreds of sequentially-solvable rows in the numeric block (their non-pivot coefficients are state-dependent rotation-matrix entries). Relax the gate to pivot solvability: eliminate a matched row when its pivot coefficient is a known-nonzero constant, folding the symbolic off-pivot coefficients into the alias expression.

The naive form of this (noted in #95 as "shrinks but explodes symbolically") stalls codegen because composed alias expressions blow up. This PR bounds that with:

a single topologically-ordered greedy pass that composes references to already-eliminated variables inline, so committed aliases are always closed over retained variables;
a node budget (MTKTEARING_ELIM_SIZE_BUDGET, default 1000 unique nodes, counted with an early-abort memoized DAG walk): a variable whose composed expression exceeds the budget stays in the numeric core. This keeps elimination cheap and the residual core small without unbounded symbolic growth.

Semantic iszero(::Num) is avoided on the now-symbolic coefficients (cheap_iszero in get_new_mm duplicate-summing; syntactic filtering in the composition — cf. #99).

Results (MultibodyComponents suspension cars)

Emitted blocks: HalfCar [46,46,297] → [6,6,64]-class; FullCar [46×4,571] → small core.
Warm Rodas5P solve: HalfCar ~0.14 s (was ~90 s-class), FullCar ~0.42 s (was ~65 s).
The reduction is verified exact: reconstructing the full HalfCar acceleration block from either the old all-const reduction or this pivot-gate reduction reproduces the ground-truth solution (full-block residual ~1e-10).
MTT test suite: same pass set as the branch this stacks on. PR#100's MTKTEARING_CHECK_REDUCTION self-check: zero violations on quarter/half car.

Robustness note (separate concern, kept open)

While validating, I found the large block this PR shrinks is ill-conditioned / numerically singular at the cars' axis-aligned joint configurations, and the runtime solve mishandles it there: the DyadCompilerPasses LDIV rewrite (optimize=:basic, unpivoted) returns silently-wrong values (a mirrored suspension wheel's spin acceleration 0.117 instead of ~0), while plain pivoted \ (optimize=:none) throws Infs or NaNs on the exactly-singular step. Shrinking to the small core makes the block well-conditioned, so both failure modes vanish for these models — but the general fix is the rank-tolerant runtime solve (the secondary ask in #95), which this PR does not address. I'll file that separately with the repro.

Stacking

Branched off the integration of #99 + #100 + #103 (shares reassemble.jl/CLIL with #100, needs #99's cheap_iszero). Merge those first; the only reviewable commit here is the final one (pivot-solvability row elimination …).

🤖 Generated with Claude Code

… values) dropzeros!(::SparseMatrixCLIL) (added in #97) called Base.iszero on the stored values. For symbolic value types (Num) that is a semantic, expansion-based zero test that OOMs on large coefficient expressions. Explicit stored zeros are always structural, so a structural check suffices: add a cheap_iszero hook (default Base.iszero) and overload it for Num/SymbolicT in ModelingToolkitTearing via SU._iszero. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

`get_linear_scc_linsol` gates column extraction on `has_edge(graph, ...)`. That structural graph is mutated during reassembly and can desync from the `total_sub`-rewritten residual, so a false-negative edge leaves a live SCC variable buried inside `b[eqidx]`. `__reduce_linear_system!` then eliminates that row (it inspects only the coefficient row, never `b`), smuggling the variable onto the RHS of retained equations and producing a runtime `A \ b` that is rank-deficient with `b` outside range(A) at a fully consistent state -> `SingularException`/garbage and `Unstable` integration. Restore the invariant "`b` is free of all SCC variables" with a repair pass that re-expands any leftover term into `A` (backstop: `return nothing` to fall back to the safe non-inlined path if a term is non-linearizable). The reduction in `__reduce_linear_system!`/`get_new_mm` is itself exact and unchanged. Also add an opt-in self-check (env var `MTKTEARING_CHECK_REDUCTION`, default off, zero cost when off): a positional numeric reduction-identity check plus a rank-tolerant full-vs-reduced rank/consistency report, to confirm/localize the issue on large models. New synthetic tests cover transitive alias chains, multi-eliminated-variable rows, rank-deficient-but-consistent blocks, symbolic RHS, and that the identity check catches an injected error. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

substitute defaults to fold=Val(false), leaving fully-numeric expressions like -0.63*sin(-0.54) symbolic, so Float64() conversion in _evalnum threw on any block with trigonometric coefficients. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…#98) The diagnostics added earlier disproved the construction-bug theory: every emitted block is exact at generic parameter values and the repair pass never fires on HalfCar. The real #98 mechanism is cross-block indeterminacy at the degenerate parameter point — blocks are individually exact but solved sequentially, so an upstream rank-deficient block's gauge choice (min-norm, or garbage from a plain LU) is substituted downstream and makes a dependent block inconsistent, even though the union of the family's equations is satisfiable. Fix: group coupled, runtime-rank-deficible inline-linear SCCs into families and solve each family as one joint linear system, so the gauge is resolved consistently across the whole family instead of being frozen between blocks. A family is a maximal run of consecutive blocks that are each rank-deficible (their equations reference a `maybe_zeros` parameter, so a coefficient can vanish and drop the rank) and chained by structural coupling (each block's equations reference the previous block's variables). Non-deficible blocks (e.g. a large full-rank chassis block) are never pulled in. When `maybe_zeros` is empty the grouping is inert, so the default one-block-per-SCC behaviour — and all existing behaviour — is unchanged. The reassembly loop is refactored to emit one block at a time via an `emit_block!` helper; merged families pass the union of their equations/ variables to a single `get_linear_scc_linsol`, falling back to per-member emission if the joint inline solve does not apply. Adds a unit test for the grouping decision. Note: the joint family is still emitted as `INLINE_LINEAR_SCC_OP(A, b)`; a rank-tolerant runtime solve (the sparse direction of #95) is still required for the legitimately rank-deficient-but-consistent merged block. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…lies Adjacency-run grouping is inert on HalfCar: the prepared block list has 681 entries and the members of each corner family are separated by 20-45 interleaved singleton blocks in the topological order, so no two rank-deficible blocks are ever adjacent. Rework `_group_inline_linear_families` to operate on the block-level dependency DAG (edge j -> k iff block k's equations structurally reference block j's variables): - a family is a connected component of rank-deficible blocks under reachability through the DAG (possibly via intermediate blocks); - each family is closed over the blocks lying on dependency paths between its members, so the merged system is self-contained (every referenced variable is either solved upstream or part of the merged block); - groups are emitted in a topological order of the DAG with each family contracted to one node (families are path-closed, so the contraction cannot create cycles), stable by smallest original block index. Closures of distinct families cannot overlap, and a defensive check falls back to per-SCC blocks if the contracted order fails to cover every block. When `maybe_zeros` is empty the grouping (and the emission order) remains unchanged. This will pull the downstream chassis block into the family when it is itself rank-deficible; that is correct, and cost is deferred to the rank-tolerant/ sparse runtime solve direction of #95. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…probe - get_linear_scc_linsol returns NonlinearBlockEq(ieq) instead of nothing when a specific equation is nonlinear in the block's variables; the family loop peels the member owning that equation and retries the joint solve, pruning e.g. kinematic blocks while keeping the linear reaction-force core that exchanges gauge (issue #98). Peeled members are emitted as their own blocks. - self-check: bounded (0.15, 0.85) array-aware probe draws keep common expression domains valid (sqrt(1-x^2), indexing of array parameters); check call site catches and reports probe failures instead of aborting compilation (opt-in diagnostics must never break a build). - family-formation summary logged under MTKTEARING_CHECK_REDUCTION. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…, orderable subset Findings from HalfCar (issue #98) drove four changes: 1. Forward dependency edges only: the prepared blocks are in BLT order, which is a valid linearization of the matching-based dependencies; raw residual incidence also contains backward references (through torn variables of later blocks), which made the 'DAG' genuinely cyclic and silently collapsed every grouping to singletons via the Kahn coverage fallback. 2. Linearity is set-dependent, not global: a block may be nonlinear in some other block's variables (cos of another block's angle) while being exactly the linear rank-deficient block a family must absorb. Mergeability is now a memoized pairwise check (block b linear in block j's variables) validated over a candidate family's full closure. 3. Greedy convex growth: families grow member-by-member in BLT order while the full-DAG closure stays pairwise-linear and unclaimed, so every finalized family is convex (no emission cycles) and jointly linearly solvable by construction. Emission-time peeling is gone — it broke convexity and produced real evaluation cycles. 4. Mutually-unorderable families: even disjoint convex families can interleave such that contraction creates a cycle; Kahn now dissolves the latest stalled family and retries, keeping a maximal orderable subset. On HalfCar this merges each corner's reaction chain into one joint block: runtime 131x131, rank 129 (the two corners' gauge dimensions) and CONSISTENT (relres <= 2e-10, vs 0.945 for the old per-block emission), with the remaining 12/12/301 blocks consistent as well. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ching-SCCs The SCCs of the full Pantelides matching can be strictly finer than the blocks of the (differentiated equations) x (highest-derivative candidates) subproblem on which Mattsson-Soderlind selection is posed. A differentiated equation matched in one SCC but incident to candidate variables in another (e.g. a twice-differentiated connection alias) creates a singleton SCC whose candidate is demoted unconditionally, making state_priority silently ineffective and potentially forcing a state realization with singularities. Merge such SCCs with union-find before selection so the priority-sorted greedy demotion sees the full block. Also permute the integer Jacobian's columns when the candidates are priority-sorted, so bareiss col_order indexes the sorted variable list consistently. Fixes #101. Fixes #102. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- Cap candidate family closures at 512 equations: the symbolic cost of building and reducing the joint block grows superlinearly, and an unbounded greedy can chain corner families through a free chassis into one enormous family (compile-time OOM on FullCar). 512 comfortably covers the per-corner reaction families this pass exists for; the fallback is status-quo per-block emission. - Family-grouping progress logs now also available under MTKT_FAMGRP_LOG without paying for the full MTKTEARING_CHECK_REDUCTION snapshots; flush after the formation summary so it survives an OOM kill. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… fbc/halfcar-stack

…istent-b' into fbc/halfcar-stack

…omposition (#95) Eliminate matched inline-linear-SCC rows when the PIVOT coefficient is a known-nonzero constant (previous gate required the whole row constant), folding symbolic off-pivot coefficients into the alias expressions. A single topologically-ordered greedy pass composes references to already-eliminated variables inline (committed aliases stay closed over retained variables) and keeps a variable in the numeric core when its composed expression exceeds a node budget (MTKTEARING_ELIM_SIZE_BUDGET, default 1000 unique nodes, counted with an early-abort memoized DAG walk). The budget bounds the symbolic growth that made an unbudgeted relaxation stall codegen (#95). Semantic iszero(::Num) avoided on the now-symbolic coefficients: cheap_iszero in get_new_mm duplicate-summing and syntactic filtering in the composition (cf. #99). Effect: emitted HalfCar blocks [46,46,297] -> [6,6,64]-class, FullCar [46x4,571] -> small core; warm Rodas5P HalfCar solve ~0.14 s (was ~90 s-class), FullCar warm ~0.42 s (was ~65 s). The reduction is an exact algebraic substitution (verified: reconstructing the full HalfCar acceleration block from either the old all-const or this pivot-gate reduction reproduces the ground-truth solution; full-block residual ~1e-10). MTT test suite: same pass set as the unmodified branch. PR#100 reduction-identity self-check: zero violations on quarter/half car. Robustness note (separate from the speedup): the large emitted block this pass shrinks is ill-conditioned / numerically singular at the cars' axis-aligned configurations. The runtime inline-linear solve mis-handles it there -- the DyadCompilerPasses LDIV rewrite (optimize=:basic, unpivoted) returns silently wrong values (a mirrored suspension wheel's spin acceleration 0.117 instead of ~0), while plain pivoted `\` (optimize=:none) throws on the exactly-singular step. Shrinking to the small core makes the block well-conditioned, so both the correctness hazard and the throw disappear for these models. The general fix is a rank-tolerant runtime solve (the secondary ask in #95); that remains open. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

AayushSabharwal · 2026-06-14T07:20:43Z

I think this is a reasonable change. We can probably do it unconditionally if I improve how substitution is done in ODEProblem (translation: throw IRStructure at the problem)

baggepinnen · 2026-06-14T07:48:12Z

Not quite sure what

throw IRStructure at the problem

means. You want to take over?

AayushSabharwal · 2026-06-14T07:51:42Z

I'll take this PR, remove the check which avoids making the expression "too large," figure out what is making ODEProblem slow, and fix that. Then this PR doesn't need to jump through hoops.

KristofferC · 2026-06-14T13:22:41Z

I think many of the recent performance PRs I've made (with caching) etc should eliminate the need for the MTKTEARING_ELIM_SIZE_BUDGET. I have a similar version to this without it and have no problems with expression performance. SciML/ModelingToolkit.jl#4621 is important, though.

baggepinnen and others added 13 commits June 11, 2026 06:47

test: dummy-derivative block merging and Jacobian column permutation

f19b11a

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/fbc/cheap-iszero-dropzeros' into…

8b521ee

… fbc/halfcar-stack

Merge remote-tracking branch 'origin/fbc/fix-inline-linear-scc-incons…

21fa954

…istent-b' into fbc/halfcar-stack

baggepinnen mentioned this pull request Jun 13, 2026

Inline linear SCCs: 296×296 dense runtime solve for an ~18-unknown core; non-rank-tolerant; symbolic shrinkage intractable — solve sparse+rank-tolerant instead #95

Open

baggepinnen referenced this pull request in KristofferC/StateSelection.jl Jun 13, 2026

robot perf

29d5816

baggepinnen mentioned this pull request Jun 14, 2026

feat: jointly solve coupled rank-deficible inline-linear SCC families (#98) #100

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pivot-solvability inline-linear row elimination with size-budgeted composition (#95)#105

Pivot-solvability inline-linear row elimination with size-budgeted composition (#95)#105
baggepinnen wants to merge 13 commits into
mainfrom
fbc/pivot-criterion-elimination

baggepinnen commented Jun 13, 2026

Uh oh!

AayushSabharwal commented Jun 14, 2026

Uh oh!

baggepinnen commented Jun 14, 2026

Uh oh!

AayushSabharwal commented Jun 14, 2026

Uh oh!

KristofferC commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

baggepinnen commented Jun 13, 2026

Change

Results (MultibodyComponents suspension cars)

Robustness note (separate concern, kept open)

Stacking

Uh oh!

AayushSabharwal commented Jun 14, 2026

Uh oh!

baggepinnen commented Jun 14, 2026

Uh oh!

AayushSabharwal commented Jun 14, 2026

Uh oh!

KristofferC commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants