Skip to content

Pivot-solvability inline-linear row elimination with size-budgeted composition (#95)#105

Open
baggepinnen wants to merge 13 commits into
mainfrom
fbc/pivot-criterion-elimination
Open

Pivot-solvability inline-linear row elimination with size-budgeted composition (#95)#105
baggepinnen wants to merge 13 commits into
mainfrom
fbc/pivot-criterion-elimination

Conversation

@baggepinnen

Copy link
Copy Markdown
Contributor

Fixes the block-size half of #95: the inline-linear-SCC pass emits a dense runtime A \ b an order of magnitude larger than the irreducible core (HalfCar 297, FullCar 571 for ~12/50-variable cores).

Change

__reduce_linear_system! previously eliminated a matched row only when every coefficient in the row was constant. This keeps hundreds of sequentially-solvable rows in the numeric block (their non-pivot coefficients are state-dependent rotation-matrix entries). Relax the gate to pivot solvability: eliminate a matched row when its pivot coefficient is a known-nonzero constant, folding the symbolic off-pivot coefficients into the alias expression.

The naive form of this (noted in #95 as "shrinks but explodes symbolically") stalls codegen because composed alias expressions blow up. This PR bounds that with:

  • a single topologically-ordered greedy pass that composes references to already-eliminated variables inline, so committed aliases are always closed over retained variables;
  • a node budget (MTKTEARING_ELIM_SIZE_BUDGET, default 1000 unique nodes, counted with an early-abort memoized DAG walk): a variable whose composed expression exceeds the budget stays in the numeric core. This keeps elimination cheap and the residual core small without unbounded symbolic growth.

Semantic iszero(::Num) is avoided on the now-symbolic coefficients (cheap_iszero in get_new_mm duplicate-summing; syntactic filtering in the composition — cf. #99).

Results (MultibodyComponents suspension cars)

  • Emitted blocks: HalfCar [46,46,297][6,6,64]-class; FullCar [46×4,571] → small core.
  • Warm Rodas5P solve: HalfCar ~0.14 s (was ~90 s-class), FullCar ~0.42 s (was ~65 s).
  • The reduction is verified exact: reconstructing the full HalfCar acceleration block from either the old all-const reduction or this pivot-gate reduction reproduces the ground-truth solution (full-block residual ~1e-10).
  • MTT test suite: same pass set as the branch this stacks on. PR#100's MTKTEARING_CHECK_REDUCTION self-check: zero violations on quarter/half car.

Robustness note (separate concern, kept open)

While validating, I found the large block this PR shrinks is ill-conditioned / numerically singular at the cars' axis-aligned joint configurations, and the runtime solve mishandles it there: the DyadCompilerPasses LDIV rewrite (optimize=:basic, unpivoted) returns silently-wrong values (a mirrored suspension wheel's spin acceleration 0.117 instead of ~0), while plain pivoted \ (optimize=:none) throws Infs or NaNs on the exactly-singular step. Shrinking to the small core makes the block well-conditioned, so both failure modes vanish for these models — but the general fix is the rank-tolerant runtime solve (the secondary ask in #95), which this PR does not address. I'll file that separately with the repro.

Stacking

Branched off the integration of #99 + #100 + #103 (shares reassemble.jl/CLIL with #100, needs #99's cheap_iszero). Merge those first; the only reviewable commit here is the final one (pivot-solvability row elimination …).

🤖 Generated with Claude Code

baggepinnen and others added 13 commits June 11, 2026 06:47
… values)

dropzeros!(::SparseMatrixCLIL) (added in #97) called Base.iszero on the stored
values. For symbolic value types (Num) that is a semantic, expansion-based
zero test that OOMs on large coefficient expressions. Explicit stored zeros
are always structural, so a structural check suffices: add a cheap_iszero
hook (default Base.iszero) and overload it for Num/SymbolicT in
ModelingToolkitTearing via SU._iszero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
`get_linear_scc_linsol` gates column extraction on `has_edge(graph, ...)`.
That structural graph is mutated during reassembly and can desync from the
`total_sub`-rewritten residual, so a false-negative edge leaves a live SCC
variable buried inside `b[eqidx]`. `__reduce_linear_system!` then eliminates
that row (it inspects only the coefficient row, never `b`), smuggling the
variable onto the RHS of retained equations and producing a runtime `A \ b`
that is rank-deficient with `b` outside range(A) at a fully consistent state
-> `SingularException`/garbage and `Unstable` integration.

Restore the invariant "`b` is free of all SCC variables" with a repair pass
that re-expands any leftover term into `A` (backstop: `return nothing` to fall
back to the safe non-inlined path if a term is non-linearizable). The reduction
in `__reduce_linear_system!`/`get_new_mm` is itself exact and unchanged.

Also add an opt-in self-check (env var `MTKTEARING_CHECK_REDUCTION`, default
off, zero cost when off): a positional numeric reduction-identity check plus a
rank-tolerant full-vs-reduced rank/consistency report, to confirm/localize the
issue on large models. New synthetic tests cover transitive alias chains,
multi-eliminated-variable rows, rank-deficient-but-consistent blocks, symbolic
RHS, and that the identity check catches an injected error.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
substitute defaults to fold=Val(false), leaving fully-numeric expressions
like -0.63*sin(-0.54) symbolic, so Float64() conversion in _evalnum threw
on any block with trigonometric coefficients.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…#98)

The diagnostics added earlier disproved the construction-bug theory: every
emitted block is exact at generic parameter values and the repair pass never
fires on HalfCar. The real #98 mechanism is cross-block indeterminacy at the
degenerate parameter point — blocks are individually exact but solved
sequentially, so an upstream rank-deficient block's gauge choice (min-norm, or
garbage from a plain LU) is substituted downstream and makes a dependent block
inconsistent, even though the union of the family's equations is satisfiable.

Fix: group coupled, runtime-rank-deficible inline-linear SCCs into families and
solve each family as one joint linear system, so the gauge is resolved
consistently across the whole family instead of being frozen between blocks.

A family is a maximal run of consecutive blocks that are each rank-deficible
(their equations reference a `maybe_zeros` parameter, so a coefficient can
vanish and drop the rank) and chained by structural coupling (each block's
equations reference the previous block's variables). Non-deficible blocks (e.g.
a large full-rank chassis block) are never pulled in. When `maybe_zeros` is
empty the grouping is inert, so the default one-block-per-SCC behaviour — and
all existing behaviour — is unchanged.

The reassembly loop is refactored to emit one block at a time via an
`emit_block!` helper; merged families pass the union of their equations/
variables to a single `get_linear_scc_linsol`, falling back to per-member
emission if the joint inline solve does not apply. Adds a unit test for the
grouping decision.

Note: the joint family is still emitted as `INLINE_LINEAR_SCC_OP(A, b)`; a
rank-tolerant runtime solve (the sparse direction of #95) is still required for
the legitimately rank-deficient-but-consistent merged block.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lies

Adjacency-run grouping is inert on HalfCar: the prepared block list has 681
entries and the members of each corner family are separated by 20-45
interleaved singleton blocks in the topological order, so no two rank-deficible
blocks are ever adjacent.

Rework `_group_inline_linear_families` to operate on the block-level dependency
DAG (edge j -> k iff block k's equations structurally reference block j's
variables):

- a family is a connected component of rank-deficible blocks under reachability
  through the DAG (possibly via intermediate blocks);
- each family is closed over the blocks lying on dependency paths between its
  members, so the merged system is self-contained (every referenced variable is
  either solved upstream or part of the merged block);
- groups are emitted in a topological order of the DAG with each family
  contracted to one node (families are path-closed, so the contraction cannot
  create cycles), stable by smallest original block index.

Closures of distinct families cannot overlap, and a defensive check falls back
to per-SCC blocks if the contracted order fails to cover every block. When
`maybe_zeros` is empty the grouping (and the emission order) remains unchanged.

This will pull the downstream chassis block into the family when it is itself
rank-deficible; that is correct, and cost is deferred to the rank-tolerant/
sparse runtime solve direction of #95.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…probe

- get_linear_scc_linsol returns NonlinearBlockEq(ieq) instead of nothing when
  a specific equation is nonlinear in the block's variables; the family loop
  peels the member owning that equation and retries the joint solve, pruning
  e.g. kinematic blocks while keeping the linear reaction-force core that
  exchanges gauge (issue #98). Peeled members are emitted as their own blocks.
- self-check: bounded (0.15, 0.85) array-aware probe draws keep common
  expression domains valid (sqrt(1-x^2), indexing of array parameters);
  check call site catches and reports probe failures instead of aborting
  compilation (opt-in diagnostics must never break a build).
- family-formation summary logged under MTKTEARING_CHECK_REDUCTION.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…, orderable subset

Findings from HalfCar (issue #98) drove four changes:

1. Forward dependency edges only: the prepared blocks are in BLT order, which is
   a valid linearization of the matching-based dependencies; raw residual
   incidence also contains backward references (through torn variables of later
   blocks), which made the 'DAG' genuinely cyclic and silently collapsed every
   grouping to singletons via the Kahn coverage fallback.
2. Linearity is set-dependent, not global: a block may be nonlinear in some
   other block's variables (cos of another block's angle) while being exactly
   the linear rank-deficient block a family must absorb. Mergeability is now a
   memoized pairwise check (block b linear in block j's variables) validated
   over a candidate family's full closure.
3. Greedy convex growth: families grow member-by-member in BLT order while the
   full-DAG closure stays pairwise-linear and unclaimed, so every finalized
   family is convex (no emission cycles) and jointly linearly solvable by
   construction. Emission-time peeling is gone — it broke convexity and
   produced real evaluation cycles.
4. Mutually-unorderable families: even disjoint convex families can interleave
   such that contraction creates a cycle; Kahn now dissolves the latest stalled
   family and retries, keeping a maximal orderable subset.

On HalfCar this merges each corner's reaction chain into one joint block:
runtime 131x131, rank 129 (the two corners' gauge dimensions) and CONSISTENT
(relres <= 2e-10, vs 0.945 for the old per-block emission), with the remaining
12/12/301 blocks consistent as well.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ching-SCCs

The SCCs of the full Pantelides matching can be strictly finer than the
blocks of the (differentiated equations) x (highest-derivative
candidates) subproblem on which Mattsson-Soderlind selection is posed.
A differentiated equation matched in one SCC but incident to candidate
variables in another (e.g. a twice-differentiated connection alias)
creates a singleton SCC whose candidate is demoted unconditionally,
making state_priority silently ineffective and potentially forcing a
state realization with singularities. Merge such SCCs with union-find
before selection so the priority-sorted greedy demotion sees the full
block.

Also permute the integer Jacobian's columns when the candidates are
priority-sorted, so bareiss col_order indexes the sorted variable list
consistently.

Fixes #101. Fixes #102.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Cap candidate family closures at 512 equations: the symbolic cost of building
  and reducing the joint block grows superlinearly, and an unbounded greedy can
  chain corner families through a free chassis into one enormous family
  (compile-time OOM on FullCar). 512 comfortably covers the per-corner reaction
  families this pass exists for; the fallback is status-quo per-block emission.
- Family-grouping progress logs now also available under MTKT_FAMGRP_LOG
  without paying for the full MTKTEARING_CHECK_REDUCTION snapshots; flush after
  the formation summary so it survives an OOM kill.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…omposition (#95)

Eliminate matched inline-linear-SCC rows when the PIVOT coefficient is a
known-nonzero constant (previous gate required the whole row constant),
folding symbolic off-pivot coefficients into the alias expressions. A single
topologically-ordered greedy pass composes references to already-eliminated
variables inline (committed aliases stay closed over retained variables) and
keeps a variable in the numeric core when its composed expression exceeds a
node budget (MTKTEARING_ELIM_SIZE_BUDGET, default 1000 unique nodes, counted
with an early-abort memoized DAG walk). The budget bounds the symbolic growth
that made an unbudgeted relaxation stall codegen (#95).

Semantic iszero(::Num) avoided on the now-symbolic coefficients: cheap_iszero
in get_new_mm duplicate-summing and syntactic filtering in the composition
(cf. #99).

Effect: emitted HalfCar blocks [46,46,297] -> [6,6,64]-class, FullCar
[46x4,571] -> small core; warm Rodas5P HalfCar solve ~0.14 s (was ~90 s-class),
FullCar warm ~0.42 s (was ~65 s). The reduction is an exact algebraic
substitution (verified: reconstructing the full HalfCar acceleration block from
either the old all-const or this pivot-gate reduction reproduces the
ground-truth solution; full-block residual ~1e-10). MTT test suite: same pass
set as the unmodified branch. PR#100 reduction-identity self-check: zero
violations on quarter/half car.

Robustness note (separate from the speedup): the large emitted block this pass
shrinks is ill-conditioned / numerically singular at the cars' axis-aligned
configurations. The runtime inline-linear solve mis-handles it there -- the
DyadCompilerPasses LDIV rewrite (optimize=:basic, unpivoted) returns silently
wrong values (a mirrored suspension wheel's spin acceleration 0.117 instead of
~0), while plain pivoted `\` (optimize=:none) throws on the exactly-singular
step. Shrinking to the small core makes the block well-conditioned, so both the
correctness hazard and the throw disappear for these models. The general fix is
a rank-tolerant runtime solve (the secondary ask in #95); that remains open.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@AayushSabharwal

Copy link
Copy Markdown
Member

I think this is a reasonable change. We can probably do it unconditionally if I improve how substitution is done in ODEProblem (translation: throw IRStructure at the problem)

@baggepinnen

Copy link
Copy Markdown
Contributor Author

Not quite sure what

throw IRStructure at the problem

means. You want to take over?

@AayushSabharwal

Copy link
Copy Markdown
Member

I'll take this PR, remove the check which avoids making the expression "too large," figure out what is making ODEProblem slow, and fix that. Then this PR doesn't need to jump through hoops.

@KristofferC

Copy link
Copy Markdown
Contributor

I think many of the recent performance PRs I've made (with caching) etc should eliminate the need for the MTKTEARING_ELIM_SIZE_BUDGET. I have a similar version to this without it and have no problems with expression performance. SciML/ModelingToolkit.jl#4621 is important, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants