Skip to content

Inline-linear-SCC: assert b-free invariant + opt-in reduction self-check (#98)#106

Open
baggepinnen wants to merge 5 commits into
mainfrom
fbc/inline-linear-scc-buried-b-repair
Open

Inline-linear-SCC: assert b-free invariant + opt-in reduction self-check (#98)#106
baggepinnen wants to merge 5 commits into
mainfrom
fbc/inline-linear-scc-buried-b-repair

Conversation

@baggepinnen

@baggepinnen baggepinnen commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Split out of #100. After the investigation there, this PR is scoped to hardening + diagnostics for the inline-linear-SCC construction, not an active fix — because the construction turns out to be exact in every case we can observe.

Background (issue #98)

get_linear_scc_linsol extracts SCC-variable columns into A gated on has_edge(graph, BipartiteEdge(eq, var)). The structural graph is mutated during reassembly and could in principle desync from the total_sub-substituted residual, leaving a live SCC variable buried in some b[eqidx]. __reduce_linear_system! inspects only the coefficient row (never b), so such a buried variable would be smuggled onto the RHS of retained equations → an inconsistent runtime A \ b.

What this PR does

  • Asserts the invariant. _assert_b_free_of_scc_vars checks that b is free of all SCC variables and errors (loudly identifying the offender) if not. It is invoked only when the self-check is enabled (MTKTEARING_CHECK_REDUCTION), so the default path pays nothing. The construction is expected to maintain the invariant unconditionally; this guards against a regression rather than repairing a live bug.
  • Adds an opt-in reduction self-check (MTKTEARING_CHECK_REDUCTION, default off, zero cost when off): a positional numeric reduction-identity check that __reduce_linear_system! is the exact substitution-projection of the full system, plus a rank-tolerant full-vs-reduced rank/consistency report. This is the machinery that established the construction is exact on HalfCar and the test models (and redirected Inline linear SCC emits an inconsistent runtime system (rank 14/15, ‖Ax−b‖/‖b‖ ≈ 1) at a consistent state — integration goes Unstable #98 to the cross-block grouping in feat: jointly solve coupled rank-deficible inline-linear SCC families (#98) #100).
  • Tests the assertion (passes on a clean b, catches a buried SCC variable) and the reduction-identity check (catches an injected error).

Earlier revisions of this branch implemented an active re-expansion repair; it was demoted to the assertion above once the self-check showed it never fires.

Commits are kept as an evolution trail — happy to squash-merge.

baggepinnen and others added 2 commits June 11, 2026 06:47
`get_linear_scc_linsol` gates column extraction on `has_edge(graph, ...)`.
That structural graph is mutated during reassembly and can desync from the
`total_sub`-rewritten residual, so a false-negative edge leaves a live SCC
variable buried inside `b[eqidx]`. `__reduce_linear_system!` then eliminates
that row (it inspects only the coefficient row, never `b`), smuggling the
variable onto the RHS of retained equations and producing a runtime `A \ b`
that is rank-deficient with `b` outside range(A) at a fully consistent state
-> `SingularException`/garbage and `Unstable` integration.

Restore the invariant "`b` is free of all SCC variables" with a repair pass
that re-expands any leftover term into `A` (backstop: `return nothing` to fall
back to the safe non-inlined path if a term is non-linearizable). The reduction
in `__reduce_linear_system!`/`get_new_mm` is itself exact and unchanged.

Also add an opt-in self-check (env var `MTKTEARING_CHECK_REDUCTION`, default
off, zero cost when off): a positional numeric reduction-identity check plus a
rank-tolerant full-vs-reduced rank/consistency report, to confirm/localize the
issue on large models. New synthetic tests cover transitive alias chains,
multi-eliminated-variable rows, rank-deficient-but-consistent blocks, symbolic
RHS, and that the identity check catches an injected error.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
substitute defaults to fold=Val(false), leaving fully-numeric expressions
like -0.63*sin(-0.54) symbolic, so Float64() conversion in _evalnum threw
on any block with trigonometric coefficients.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@AayushSabharwal AayushSabharwal left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides these points and other performance concerns I can point to, the tests don't actually seem to test the broken behavior?

# coefficient row, never `b`), the buried variable is smuggled onto the RHS of
# retained equations, producing an inconsistent runtime `A \ b` (issue #98). Restore
# the invariant "`b` is free of all SCC variables" by re-expanding any leftover term.
var_atoms = [Set{Any}(Symbolics.get_variables(unwrap(var))) for var in vars]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var_atoms = [Set{Any}(Symbolics.get_variables(unwrap(var))) for var in vars]
var_atoms = map(_ -> Set{SymbolicT}(), vars)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 6b6073avar_atoms now built as Set{SymbolicT} (each populated in place with get_variables!).

# retained equations, producing an inconsistent runtime `A \ b` (issue #98). Restore
# the invariant "`b` is free of all SCC variables" by re-expanding any leftover term.
var_atoms = [Set{Any}(Symbolics.get_variables(unwrap(var))) for var in vars]
scc_atoms = union(Set{Any}(), var_atoms...)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
scc_atoms = union(Set{Any}(), var_atoms...)
scc_atoms = reduce(union, var_atoms; init = Set{SymbolicT}())

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 6b6073areduce(union, var_atoms; init = Set{SymbolicT}()).

repaired = 0
if !isempty(scc_atoms)
for eqidx in 1:N
bsyms = Symbolics.get_variables(unwrap(b[eqidx]))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should preallocate a Set{SymbolicT} outside the loop, and repeatedly reuse it after empty!.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 6b6073absyms is preallocated once and reused via get_variables!(empty!(bsyms), b[eqidx]).

for eqidx in 1:N
bsyms = Symbolics.get_variables(unwrap(b[eqidx]))
any(in(scc_atoms), bsyms) || continue
present = Set(A.row_cols[eqidx])

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can also be preallocated and reused.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 6b6073apresent is preallocated once and reused via union!(empty!(present), A.row_cols[eqidx]).

baggepinnen and others added 2 commits June 14, 2026 08:29
Address review on #106:
- type-concrete Set{SymbolicT}/Set{Int} instead of Set{Any}
- reduce(union!, ...; init) instead of splatting union(Set{Any}(), ...)
- preallocate bsyms/present once and reuse via empty!, populating bsyms
  in place with get_variables!
- replace allocating intersect with any(in(bsyms), ...)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The existing synthetic tests cover __reduce_linear_system! (the reduction,
which was exact and never the bug). They did not exercise the actual fix:
the re-expansion of an SCC variable left buried in b by a false-negative
has_edge gate.

Extract the repair from get_linear_scc_linsol into _reexpand_buried_scc_vars!
and add a unit test that reproduces the desync precondition directly (a row
missing the column for a variable still live in b) and asserts the repair
restores the b-free invariant, re-expands the coefficient into A, and bails
to the non-inlined path (returns nothing) on a non-linearizable buried term.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@baggepinnen

Copy link
Copy Markdown
Contributor Author

Thanks — both addressed.

Performance (the four inline comments): done in 6b6073a. var_atoms/scc_atoms/bsyms are now type-concrete Set{SymbolicT} and present is Set{Int}; bsyms and present are preallocated once and reused via empty! (+ in-place get_variables!); scc_atoms uses reduce(union, …; init) instead of the Set{Any} splat; and the allocating intersect is replaced by any(in(bsyms), …).

Tests — you're right, they didn't. The two existing tests exercise __reduce_linear_system!, i.e. the reduction, which is exact and was never the bug; nothing exercised the actual fix (re-expanding an SCC variable left buried in b by a false-negative has_edge). Fixed in 341c018: I pulled the repair out of get_linear_scc_linsol into _reexpand_buried_scc_vars! and added a unit test that reproduces the desync precondition directly — a row whose column for a variable is missing from A while that variable is still live in b — and asserts the repair (a) restores the "b free of SCC vars" invariant, (b) re-expands the coefficient into the right column of A, and (c) bails to the non-inlined path (=== nothing) on a non-linearizable buried term.

One honesty note on scope: I have not found a model that triggers this desync through the normal compile path. With MTKTEARING_CHECK_REDUCTION on, the repair never fires on HalfCar or the inline-linear test models — the emitted blocks are exact at generic parameters (this is also what redirected #98 toward the cross-block grouping in #100). So this PR is really "restore + assert an invariant the construction is supposed to maintain, with a self-check that proves it," and the new test injects the precondition rather than discovering it from a model. If you'd rather treat the repair as defensive-only (e.g. demote it to an assertion behind the self-check) I'm happy to go that way — but the unit test now covers the behavior either way.

@AayushSabharwal

Copy link
Copy Markdown
Member

Testing the fix directly doesn't test that the fix is necessary. If we can't reproduce a case where this happens, then it's better this remain unchanged. The add_edge!/rem_edge! we do in reassembly during codegen_system_equations! only serves to update the graphs according to the substitutions total_sub will perform. These substitutions are applied prior to identifying A and b, so the incidence information is correct.

The construction is exact in every observed case (the self-check never sees
the repair fire on HalfCar or the inline-linear test models), so the active
re-expansion was machinery for a state that does not occur. Replace
_reexpand_buried_scc_vars! with _assert_b_free_of_scc_vars, invoked only when
MTKTEARING_CHECK_REDUCTION is set: it documents and enforces the invariant
that b is free of all SCC variables, erroring (rather than silently producing
an inconsistent A \ b) if a desync ever buries one. Default path is now zero
extra cost. The unit test asserts the guard passes on a clean b and catches a
buried SCC variable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@baggepinnen baggepinnen changed the title fix: ensure inline-linear-SCC b is free of SCC variables (#98) Inline-linear-SCC: assert b-free invariant + opt-in reduction self-check (#98) Jun 14, 2026
@baggepinnen

Copy link
Copy Markdown
Contributor Author

Demoted the repair to an assertion in 475aaef. _reexpand_buried_scc_vars! is gone; get_linear_scc_linsol now calls _assert_b_free_of_scc_vars(b, vars) only under MTKTEARING_CHECK_REDUCTION, which errors (naming the offending variable) if the construction ever leaves an SCC variable buried in b. Default path is back to zero added cost, and the PR is now squarely "assert + diagnose the invariant" rather than "repair." The unit test correspondingly checks the guard passes on a clean b and throws on a buried variable. I also retitled/rescoped the PR description to match.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants