perf: matrix accessor rewrite#630
Conversation
7259d42 to
19125ac
Compare
Independent benchmark results (using pytest-benchmark suite from #567)Matrix generation (the core win — direct solver path)
Matrix generation is 30–120x faster, with speedup increasing at larger problem sizes. Build phase (model construction)Roughly neutral — within noise for small models, ~7% slower for LP file writingLP writing regressed 15–70% because frozen constraints must call Additionally, Fix: #631 adds a direct CSR-to-polars path in CI failures
|
|
I implemented a first version of a new lp writer on #631. The result is already quite a bit faster than the current master. So no issues there. |
- Fix __repr__ passing CSR positions instead of variable labels - Fix set_blocks failing on frozen Constraint - Extract _active_to_dataarray helper to reduce DRY violations - Simplify reset_dual to direct mutation instead of reconstruction - Add tests for freeze/mutable roundtrip, VariableLabelIndex, to_matrix_with_rhs, from_mutable mixed signs, repr correctness
|
this is great, one big question that came up when looking at the code is: why renaming |
Well two reasons:
Currently they interact nicely enough that we might want to keep the MutableConstraint indefinitely and then I agree that it makes sense to keep the old representation under the old name and maybe name the new one |
|
@FabianHofmann I didnt realize that freezing was opt in. I just ran integration tests again with freezing on Model, and my testing fails due to linopy.testing.assert_con_equal raising. Bug report
from linopy import Model
from linopy.testing import assert_linequal
import pandas as pd
m = Model(freeze_constraints=True)
coords = pd.RangeIndex(3, name="time")
x = m.add_variables(coords=[coords], name="x")
y = m.add_variables(coords=[coords], name="y")
m.add_constraints(y - x == 0, name="con")
desired = y == x
assert_linequal(m.constraints["con"].lhs, desired.lhs) # failsSorting both sides by |
@FBumann (I am back from a small break I had to take to restore my computer) good catch. that is a good question. I would argue we want to test the semantic equality of the linear expressions so sorting before would be fair. could you spot where exactly the resorting takes place? |
Sort both sides by variable labels along _term before comparing, so expressions with different term orderings (e.g. from CSR round-trip with freeze_constraints=True) are correctly recognized as equal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
constraints.py:478 — csr.sum_duplicates() sorts CSR column indices by variable label within each row. When CSRConstraint._to_dataset() (line 730-731) reconstructs the constraint, terms come out in variable-label order instead of insertion order.
|
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
coroa
left a comment
There was a problem hiding this comment.
Ok, went over it. Quite some stuff still to clean up
| @property | ||
| def rhs(self) -> DataArray: | ||
| """Get RHS DataArray, shape (*coord_dims).""" | ||
| return self._active_to_dataarray(self._rhs, fill=np.nan) | ||
|
|
||
| @rhs.setter | ||
| def rhs(self, value: ExpressionLike | VariableLike | ConstantLike) -> None: | ||
| self._refreeze_after(lambda mc: setattr(mc, "rhs", value)) | ||
|
|
||
| @property | ||
| def lhs(self) -> expressions.LinearExpression: | ||
| """Get LHS as LinearExpression (triggers Dataset reconstruction).""" | ||
| return self.mutable().lhs | ||
|
|
||
| @lhs.setter | ||
| def lhs(self, value: ExpressionLike | VariableLike | ConstantLike) -> None: | ||
| self._refreeze_after(lambda mc: setattr(mc, "lhs", value)) | ||
|
|
||
| def _refreeze_after(self, mutate: Callable[[Constraint], None]) -> None: | ||
| mc = self.mutable() | ||
| mutate(mc) | ||
| refrozen = CSRConstraint.from_mutable(mc, self._cindex) | ||
| self._csr = refrozen._csr | ||
| self._con_labels = refrozen._con_labels | ||
| self._rhs = refrozen._rhs | ||
| self._sign = refrozen._sign | ||
| self._coords = refrozen._coords | ||
| self._dual = None |
There was a problem hiding this comment.
I really don't like this hack
| def lhs(self) -> expressions.LinearExpression: | ||
| """Get the left-hand-side linear expression of the constraint.""" | ||
| data = self.data[["coeffs", "vars"]].rename({self.term_dim: TERM_DIM}) | ||
| return expressions.LinearExpression(data, self.model) |
There was a problem hiding this comment.
I'd move this onto Constraint
| def to_matrix( | ||
| self, label_index: VariableLabelIndex | ||
| ) -> tuple[scipy.sparse.csr_array, np.ndarray]: | ||
| """ | ||
| Construct a dense CSR matrix for this constraint. | ||
|
|
||
| Only active (non-masked) rows are included. Column indices are dense | ||
| positions in the active variable array, as given by ``label_index``. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| label_index : VariableLabelIndex | ||
| Variable label index providing ``label_to_pos`` and ``n_active_vars``. | ||
|
|
||
| Returns | ||
| ------- | ||
| csr : scipy.sparse.csr_array | ||
| Shape (n_active_cons, n_active_vars). | ||
| con_labels : np.ndarray | ||
| Active constraint labels in row order. | ||
| """ | ||
| con_labels, _, cols, data, indptr = _matrix_export_data(self, label_index) | ||
| csr = scipy.sparse.csr_array( | ||
| (data, cols, indptr), shape=(len(con_labels), label_index.n_active_vars) | ||
| ) | ||
| csr.sum_duplicates() | ||
| return csr, con_labels |
There was a problem hiding this comment.
I'd make this abstract here and move the implementation onto Constraint
…helper Replace the convoluted cumsum/diff/range loop with a clean while-loop helper that uses searchsorted directly on indptr. Batch slices pass coords=[] since batches cover contiguous active rows, not a contiguous slice of the coordinate grid. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use to_polars() instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…straint _matrix_export_data becomes a method on Constraint instead of a module-level function. ncons, lhs, and to_matrix are now abstract in ConstraintBase, with xarray-based implementations on Constraint and CSR-based implementations on CSRConstraint. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…array round-trip - rhs setter writes _rhs directly, rejects expressions - lhs setter raises AttributeError (call .mutable() to modify terms) - lhs getter skips mutable() wrapper, builds LinearExpression from _to_dataset - to_polars uses pl.lit for scalar sign
|
@coroa thanks for the follow up. I just ran the tests in pypsa to confirm the implementation works, all lopf tests pass. I would say let's merge this do not longer create merge conflicts (I would like to get the ball rolling for solver refac). we can always follow up further |
|
Amazing that you are progressing 👍 |
Changes proposed in this Pull Request
Ok, i went ahead and tried to optimize the direct solver path which looked very promising from #596 by @CharlieFModo . Most solvers accept scipy sparse matrices in CSR format, those CSR matrices are easy to stack and correspond quite closely to how constraints are stored within xarray currently.
New constraint class hierarchy
A new abstract base class
ConstraintBaseprovides allshared read-only properties and methods. A new frozen
CSRConstraintclassstores constraint data directly as a scipy CSR sparse matrix plus flat numpy
arrays for RHS and duals — avoiding the xarray Dataset overhead
constraints.
One can losslessly convert between Constraint and CSRConstraint, by calling mutable() or freeze().
add_constraints(..., freeze=True)converts theConstraintto aCSRConstraintbefore assigning it to the model. I measured memory savings of up to 90% for constraints with many terms.EDIT:
The feature is made opt-in and the globally freezing constraint can be configured with
additional gains from not setting the name (even though I made it more performant)
.to_matrix()on both of these now always returns acsr matrixofshape(n_active_constraints, n_active_variables)andcon_labelsa numpy array of sizen_active_constraintswith the label of the constraint in it.So that the number of columns of these non-missing matrices can match with n_active_variables there is a new
VariableLabelIndexonm.varswhich holds two cached properties:vlabels: active variable labels in encounter order, shape (n_active_vars,)label_to_pos: derived from vlabels; size _xCounter, maps label -> position (-1 if masked)Chunking is not immediately useful anymore, because each constraint will be turned quite quickly into a numpy array before being reduced to the csr matrix. We probably could implement an optimized to_matrix path which uses the chunking.
Open ToDos
Think about ChunkingQuick benchmarking with pypsa.examples.carbon_management
Slightly cheated, because i commented out the names setting in
to_highspy, which is unnecessary for solving and retrieving the solution, because we can just work with the indices. This takes ~2s. But i removed it from both branches (because it affected them equally).On main branch
On
perf/matrix-acccesor-rewrite(this PR)EDIT (by @FabianHofmann): Changed back naming to keep
Constraintclass untouched and addCSRConstriaintChecklist
doc.doc/release_notes.rstof the upcoming release is included.