refactor: unify as_dataarray; split broadcasting from coords validation#726
Open
FBumann wants to merge 5 commits into
Open
refactor: unify as_dataarray; split broadcasting from coords validation#726FBumann wants to merge 5 commits into
FBumann wants to merge 5 commits into
Conversation
Closes #723. Folds the body of `as_dataarray_in_coords` into `as_dataarray` and extracts the contract checks into `assert_compatible_with_coords`, so linopy now has one broadcasting primitive and one validation companion. `as_dataarray(arr, coords)` aligns the result against `coords` for every input type: labels positional inputs (numpy / unnamed pandas / scalar) by position, reindexes same-values-different-order, expands missing dims, and transposes to coords order. Extra dims and disagreeing value sets on shared dims pass through unchanged, so xarray broadcasting in expression arithmetic keeps working. `assert_compatible_with_coords(arr, coords)` enforces the strict contract (`arr.dims ⊆ coords.dims`, plus exact coord-value equality on shared dims). `add_variables` and `add_constraints` now call it after `as_dataarray` for `lower` / `upper` / `mask`, replacing the deleted `as_dataarray_in_coords` helper. `_coords_to_dict` filters MultiIndex level coords out of `xarray.Coordinates` inputs so the new strict-by-default path treats `station` (and not its derived `letter` / `num` levels) as the dim. Test suite: 3698 passed (no regressions). Two existing tests were updated to reflect the new "coords is source of truth" semantics: `test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned` (extra coord entries now broadcast in) and `test_dataarray_extra_dims` (now triggers the subset check rather than the value-mismatch check). Microbenchmark in dev-scripts/benchmark_as_dataarray.py shows flat timings vs the base branch on both add_variables-heavy and arithmetic- heavy workloads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes a silent-failure gap in the strict coords-as-truth path: when the
caller passed ``coords=[[1, 2, 3]], dims=["x"]`` to ``add_variables``,
``_coords_to_dict`` returned an empty mapping (unnamed sequences carry
no dim name), so the strict checks short-circuited and bounds with
extra dims or mismatched values flowed through unchecked, producing
variables with frankenstein outer-joined coord values.
``_coords_to_dict`` now accepts an optional ``dims`` argument that
names unnamed sequence entries by position. ``as_dataarray`` and
``assert_compatible_with_coords`` plumb it through; ``add_variables``
forwards ``kwargs.get("dims")`` to the assertions for ``lower`` and
``upper``. ``coords=[[1, 2, 3]], dims=["x"]`` now enforces the same
contract as ``coords={"x": [1, 2, 3]}`` or
``coords=[pd.Index([1, 2, 3], name="x")]``.
Docstring of ``add_variables.coords`` documents the contract
(subset-of-dims, dim order, value match with auto-reindex, missing-dim
broadcast) and includes four doctests pinning it: the extra-dim raise,
the value-mismatch raise, the same-values-different-order auto-reindex,
and the unnamed-coords-plus-dims opt-in.
Test suite: 3698 passed (parity with the previous commit on this
branch). ``pytest --doctest-modules linopy/model.py -k add_variables``
also green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce align_to_coords to wrap as_dataarray and assert_compatible_with_coords with user-facing labels (lower bound, upper bound, mask). Errors now name the argument and distinguish extra dimensions, coordinate mismatches, and conversion failures. Extend mask validation to use coords+dims= when provided. Co-authored-by: Cursor <cursoragent@cursor.com>
for more information, see https://pre-commit.ci
…coords Three cleanups on top of align_to_coords: - Drop the trailing ``.broadcast_like(data.labels)`` in ``add_variables`` and ``add_constraints`` mask paths. ``as_dataarray`` already expands missing dims to ``coords`` shape, so the broadcast was a no-op. - Stop overriding the caller's ``dims=`` in the ``add_variables`` mask path when ``coords is None``. The previous code stripped ``dims`` and forced ``dims=data.dims``; with ``data.coords`` being an xarray ``Coordinates`` with already-named dims, the user's ``dims`` is harmless to forward and the override was just hiding intent. Mask now goes through one ``align_to_coords`` call regardless of whether ``coords`` is supplied. - Split the exception handler in ``align_to_coords``: ``TypeError`` from unsupported input types is re-raised as ``TypeError`` (still labeled), while ``ValueError`` / ``CoordinateValidationError`` stay ``ValueError``. Preserves the original type signature for callers that want to ``except TypeError``. New test ``test_align_to_coords_preserves_type_errors`` pins the TypeError pass-through. Suite: 3703 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #723. Stacked on #725 (which is stacked on #722).
What changes
as_dataarray_in_coordsis folded intoas_dataarrayand split along the seam between "broadcastarragainstcoords" and "enforce thecoordscontract":as_dataarray(arr, coords)— the broadcasting primitive. For every input type, the result is aligned withcoords: positional inputs (numpy / unnamed pandas / scalar) are labeled by position, shared-dim coords are reindexed when values are equal in a different order, dims present incoordsbut not inarrare expanded, and the result is transposed tocoordsorder. Extra dims and disagreeing value sets on shared dims pass through, so xarray broadcasting in expression arithmetic keeps working.assert_compatible_with_coords(arr, coords)— the validation companion. Raises ifarrintroduces dims not incoords(was:as_dataarray_in_coords's extra-dim raise) or if a shared dim has disagreeing coord values (was: its "do not match" raise).add_variablesandadd_constraintsnow callas_dataarrayfollowed byassert_compatible_with_coordsforlower/upper/mask. The previousas_dataarray_in_coordshelper is deleted._coords_to_dictnow filters MultiIndex level coords out ofxarray.Coordinatesinputs, so the new strict-by-default path treats e.g.station(and not its derivedletter/numlevels) as the dim. This was already a latent issue once strict semantics governedas_dataarray's coords arg.Audit summary (the 12 call sites listed in #723)
model.py:705/706lower/upperinadd_variablesas_dataarray+assert_compatible_with_coordsmodel.py:715maskinadd_variablesmodel.py:979maskinadd_constraintsexpressions.py:584/613/1105/1668/2154arithmeticvariables.py:330to_linexpr(coefficient)expressions.py:341/2002/2030/2289,model.py:912/919/972,variables.py:1369Breaking changes for end users
This PR introduces no new user-visible breaking changes. It's a structural refactor that keeps the strict contract at the
add_variables/add_constraintscall sites viaassert_compatible_with_coords. The breaking changes a user sees when this PR lands onmasterare the union of what #722 and #725 already deliver:From #725 (mask handling)
Falseand emitsFutureWarningValueErrormask.reindex({...}, fill_value=False)AssertionErrorValueErrorpd.Series/pd.DataFramemissing a dim was sometimes silently droppedFrom #722 (bounds handling)
Series/DataFramebound missing a dim is silently droppedDataArraybound dim order depends on bound type (#706)coordsorderDataArraybound with extra dim was silently acceptedValueErrorDataArraybound with mismatched coord values raised varied downstream errorsValueError: ...coordinates do not matchPiecewise._broadcast_pointshad hash-randomized dim order across processesRefactor-only changes (no user-visible effect on the strict sites)
as_dataarrayitself no longer raises on extra dims or shared-dim value-set mismatch — every public site that used to depend on those raises now gets them from the explicitassert_compatible_with_coordscall. Surface behavior atadd_variables/add_constraintsis preserved.as_dataarrayitself:test_as_dataarray_with_ndarray_coords_dict_set_dims_not_aligned: extra coord entries now expand into the result (the test was pinning an undocumented "dims wins, coord entry dropped" corner of the old lax path).test_dataarray_extra_dims: rewritten so the subset check fires (rather than the value-mismatch check) — the assertion still raises on extra dims, just via the newassert_compatible_with_coordshelper.Test plan
test/test_common.py— added five tests pinning the new split (extra-dim preservation, disjoint shared-dim values,assert_compatible_with_coordsextra-dim raise, value-mismatch raise, subset-dims allowed).pre-commit run --all-filesclean (ruff/format/blackdoc/codespell).dev-scripts/benchmark_as_dataarray.py, untracked): flat timings vs the base branch on both add_variables-heavy (≈22 ms / 50–57 ms mean) and arithmetic-heavy (≈80–82 ms) workloads.🤖 Generated with Claude Code