Vectorize disaggregate() zone-membership loop (#3408) by brendancol · Pull Request #3420 · xarray-contrib/xarray-spatial

brendancol · 2026-06-20T09:56:16Z

What changed

disaggregate() distributed zone values by looping over zone IDs and
rebuilding a full-shape boolean mask (zones_arr == zid) for each zone.
Three call sites did this: the numpy zone-sum computation, the numpy
distribute step, and the dask per-chunk reducer. Peak work scaled as
O(n_zones x n_pixels).

This replaces the per-zone loop with a single vectorized pass:

np.searchsorted maps each valid pixel to its zone slot once
(_pixel_zone_index).
np.add.at accumulates per-zone weight sums in one pass
(_zone_weight_sums).
distribution gathers each pixel's zone value and weight sum, then divides.

A shared _preprocess helper does the float64 coercion, negative-weight
clamp, binary binarization, and invalid-mask construction so the numpy core
and the dask chunks stay identical.

Performance

numpy weighted path on a 1500x1500 raster:

n_zones	before	after
50	0.35s	0.18s
500	1.9s	0.19s
2000	6.8s	0.19s

Runtime is now flat in zone count instead of linear.

Backends

numpy, cupy (CPU fallback), dask+numpy (per-chunk), dask+cupy. All four
validated. OOM verdict SAFE, bottleneck compute-bound.

Test plan

Existing 61 dasymetric tests pass (cross-backend parity + memory guards)
New test: many-zone (200) conservation, with missing-zone and
zero-weight-sum zones staying NaN
New test: many-zone (150) dask matches numpy per-chunk
cupy and dask+cupy paths run end-to-end on a CUDA host and match numpy

Found by /sweep-performance.

Replace the per-zone boolean-mask loop in the numpy core, the dask per-chunk zone-sum reducer, and the distribute step with a single searchsorted lookup plus np.add.at accumulation. Peak work drops from O(n_zones x n_pixels) to O(n_pixels + n_zones); runtime is now flat in zone count (was 0.35s/1.9s/6.8s for 50/500/2000 zones on a 1500x1500 raster, now ~0.18s across the board). Semantics preserved across numpy, cupy (CPU fallback), dask+numpy, and dask+cupy: NaN zones/weights, nodata_zone, negative-weight clamping, binary binarization, missing-zone -> NaN, and zero-weight-sum -> NaN.

brendancol

PR Review: Vectorize disaggregate() zone-membership loop (#3408)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

xrspatial/dasymetric.py: in the pure-numpy path _disaggregate_numpy builds zone_index once for the sums, then _disaggregate_numpy_with_sums recomputes the same zone_index from scratch. That is two searchsorted passes where one would do. The dask path genuinely needs two passes (eager sums, then lazy distribute), so the split signature is right there, but the numpy path could thread the index through. Minor; the win over the old per-zone loop is already large.

What looks good

The vectorization holds across the edge cases I checked: NaN zones/weights, nodata_zone, negative-weight clamp, binary binarization, missing-zone -> NaN, and zero-weight-sum -> NaN. I confirmed numpy/dask parity on a 150-zone raster and ran cupy + dask+cupy end-to-end on a CUDA host.
searchsorted needs sorted ids and the code passes np.array(sorted(...)), so the lookup is valid. Out-of-range positions get clipped and then rejected by the equality check, so a zone id absent from values cannot alias onto a real slot.
_preprocess pulls together the clamp/binarize/invalid-mask logic that used to be duplicated across the numpy core, _chunk_sums, and _distribute_chunk, so the four backends share one definition.
Empty values dict returns all-NaN; a single-pixel zone gets the full value. Both checked.
The existing 61 tests stay green, and the 2 new tests cover the many-zone case the fix targets.

Checklist

Algorithm matches the documented redistribution semantics
All implemented backends produce consistent results
NaN handling is correct
Edge cases are covered by tests
Dask chunk boundaries handled correctly (no neighborhood access; per-chunk distribute is shape-preserving)
No premature materialization or unnecessary copies introduced (the eager zone-sum compute is pre-existing and algorithmically required)
Benchmark exists (benchmarks/benchmarks/dasymetric.py)
README feature matrix unchanged (no new function, no backend change)
Docstrings present and accurate

…e-dasymetric-2026-06-20

brendancol added 2 commits June 20, 2026 02:55

Record dasymetric performance sweep state (#3408)

ecf411b

brendancol commented Jun 20, 2026

View reviewed changes

brendancol added 2 commits June 20, 2026 02:58

Address review nit: reuse zone_index in numpy distribute path (#3408)

288264c

Merge remote-tracking branch 'origin/main' into deep-sweep-performanc…

7133b7f

…e-dasymetric-2026-06-20

github-actions Bot added the performance PR touches performance-sensitive code label Jun 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vectorize disaggregate() zone-membership loop (#3408)#3420

Vectorize disaggregate() zone-membership loop (#3408)#3420
brendancol wants to merge 4 commits into
mainfrom
deep-sweep-performance-dasymetric-2026-06-20

brendancol commented Jun 20, 2026

Uh oh!

brendancol left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented Jun 20, 2026

What changed

Performance

Backends

Test plan

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: Vectorize disaggregate() zone-membership loop (#3408)

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant