Inline-linear-SCC: assert b-free invariant + opt-in reduction self-check (#98)#106
Inline-linear-SCC: assert b-free invariant + opt-in reduction self-check (#98)#106baggepinnen wants to merge 5 commits into
b-free invariant + opt-in reduction self-check (#98)#106Conversation
`get_linear_scc_linsol` gates column extraction on `has_edge(graph, ...)`. That structural graph is mutated during reassembly and can desync from the `total_sub`-rewritten residual, so a false-negative edge leaves a live SCC variable buried inside `b[eqidx]`. `__reduce_linear_system!` then eliminates that row (it inspects only the coefficient row, never `b`), smuggling the variable onto the RHS of retained equations and producing a runtime `A \ b` that is rank-deficient with `b` outside range(A) at a fully consistent state -> `SingularException`/garbage and `Unstable` integration. Restore the invariant "`b` is free of all SCC variables" with a repair pass that re-expands any leftover term into `A` (backstop: `return nothing` to fall back to the safe non-inlined path if a term is non-linearizable). The reduction in `__reduce_linear_system!`/`get_new_mm` is itself exact and unchanged. Also add an opt-in self-check (env var `MTKTEARING_CHECK_REDUCTION`, default off, zero cost when off): a positional numeric reduction-identity check plus a rank-tolerant full-vs-reduced rank/consistency report, to confirm/localize the issue on large models. New synthetic tests cover transitive alias chains, multi-eliminated-variable rows, rank-deficient-but-consistent blocks, symbolic RHS, and that the identity check catches an injected error. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
substitute defaults to fold=Val(false), leaving fully-numeric expressions like -0.63*sin(-0.54) symbolic, so Float64() conversion in _evalnum threw on any block with trigonometric coefficients. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
AayushSabharwal
left a comment
There was a problem hiding this comment.
Besides these points and other performance concerns I can point to, the tests don't actually seem to test the broken behavior?
| # coefficient row, never `b`), the buried variable is smuggled onto the RHS of | ||
| # retained equations, producing an inconsistent runtime `A \ b` (issue #98). Restore | ||
| # the invariant "`b` is free of all SCC variables" by re-expanding any leftover term. | ||
| var_atoms = [Set{Any}(Symbolics.get_variables(unwrap(var))) for var in vars] |
There was a problem hiding this comment.
| var_atoms = [Set{Any}(Symbolics.get_variables(unwrap(var))) for var in vars] | |
| var_atoms = map(_ -> Set{SymbolicT}(), vars) |
There was a problem hiding this comment.
Done in 6b6073a — var_atoms now built as Set{SymbolicT} (each populated in place with get_variables!).
| # retained equations, producing an inconsistent runtime `A \ b` (issue #98). Restore | ||
| # the invariant "`b` is free of all SCC variables" by re-expanding any leftover term. | ||
| var_atoms = [Set{Any}(Symbolics.get_variables(unwrap(var))) for var in vars] | ||
| scc_atoms = union(Set{Any}(), var_atoms...) |
There was a problem hiding this comment.
| scc_atoms = union(Set{Any}(), var_atoms...) | |
| scc_atoms = reduce(union, var_atoms; init = Set{SymbolicT}()) |
There was a problem hiding this comment.
Done in 6b6073a — reduce(union, var_atoms; init = Set{SymbolicT}()).
| repaired = 0 | ||
| if !isempty(scc_atoms) | ||
| for eqidx in 1:N | ||
| bsyms = Symbolics.get_variables(unwrap(b[eqidx])) |
There was a problem hiding this comment.
This should preallocate a Set{SymbolicT} outside the loop, and repeatedly reuse it after empty!.
There was a problem hiding this comment.
Done in 6b6073a — bsyms is preallocated once and reused via get_variables!(empty!(bsyms), b[eqidx]).
| for eqidx in 1:N | ||
| bsyms = Symbolics.get_variables(unwrap(b[eqidx])) | ||
| any(in(scc_atoms), bsyms) || continue | ||
| present = Set(A.row_cols[eqidx]) |
There was a problem hiding this comment.
Can also be preallocated and reused.
There was a problem hiding this comment.
Done in 6b6073a — present is preallocated once and reused via union!(empty!(present), A.row_cols[eqidx]).
Address review on #106: - type-concrete Set{SymbolicT}/Set{Int} instead of Set{Any} - reduce(union!, ...; init) instead of splatting union(Set{Any}(), ...) - preallocate bsyms/present once and reuse via empty!, populating bsyms in place with get_variables! - replace allocating intersect with any(in(bsyms), ...) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The existing synthetic tests cover __reduce_linear_system! (the reduction, which was exact and never the bug). They did not exercise the actual fix: the re-expansion of an SCC variable left buried in b by a false-negative has_edge gate. Extract the repair from get_linear_scc_linsol into _reexpand_buried_scc_vars! and add a unit test that reproduces the desync precondition directly (a row missing the column for a variable still live in b) and asserts the repair restores the b-free invariant, re-expands the coefficient into A, and bails to the non-inlined path (returns nothing) on a non-linearizable buried term. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks — both addressed. Performance (the four inline comments): done in 6b6073a. Tests — you're right, they didn't. The two existing tests exercise One honesty note on scope: I have not found a model that triggers this desync through the normal compile path. With |
|
Testing the fix directly doesn't test that the fix is necessary. If we can't reproduce a case where this happens, then it's better this remain unchanged. The |
The construction is exact in every observed case (the self-check never sees the repair fire on HalfCar or the inline-linear test models), so the active re-expansion was machinery for a state that does not occur. Replace _reexpand_buried_scc_vars! with _assert_b_free_of_scc_vars, invoked only when MTKTEARING_CHECK_REDUCTION is set: it documents and enforces the invariant that b is free of all SCC variables, erroring (rather than silently producing an inconsistent A \ b) if a desync ever buries one. Default path is now zero extra cost. The unit test asserts the guard passes on a clean b and catches a buried SCC variable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
b is free of SCC variables (#98)b-free invariant + opt-in reduction self-check (#98)
|
Demoted the repair to an assertion in 475aaef. |
Split out of #100. After the investigation there, this PR is scoped to hardening + diagnostics for the inline-linear-SCC construction, not an active fix — because the construction turns out to be exact in every case we can observe.
Background (issue #98)
get_linear_scc_linsolextracts SCC-variable columns intoAgated onhas_edge(graph, BipartiteEdge(eq, var)). The structuralgraphis mutated during reassembly and could in principle desync from thetotal_sub-substituted residual, leaving a live SCC variable buried in someb[eqidx].__reduce_linear_system!inspects only the coefficient row (neverb), so such a buried variable would be smuggled onto the RHS of retained equations → an inconsistent runtimeA \ b.What this PR does
_assert_b_free_of_scc_varschecks thatbis free of all SCC variables and errors (loudly identifying the offender) if not. It is invoked only when the self-check is enabled (MTKTEARING_CHECK_REDUCTION), so the default path pays nothing. The construction is expected to maintain the invariant unconditionally; this guards against a regression rather than repairing a live bug.MTKTEARING_CHECK_REDUCTION, default off, zero cost when off): a positional numeric reduction-identity check that__reduce_linear_system!is the exact substitution-projection of the full system, plus a rank-tolerant full-vs-reduced rank/consistency report. This is the machinery that established the construction is exact on HalfCar and the test models (and redirected Inline linear SCC emits an inconsistent runtime system (rank 14/15, ‖Ax−b‖/‖b‖ ≈ 1) at a consistent state — integration goes Unstable #98 to the cross-block grouping in feat: jointly solve coupled rank-deficible inline-linear SCC families (#98) #100).b, catches a buried SCC variable) and the reduction-identity check (catches an injected error).Earlier revisions of this branch implemented an active re-expansion repair; it was demoted to the assertion above once the self-check showed it never fires.
Commits are kept as an evolution trail — happy to squash-merge.