Vectorize disaggregate() zone-membership loop (#3408)#3420
Open
brendancol wants to merge 4 commits into
Open
Conversation
Replace the per-zone boolean-mask loop in the numpy core, the dask per-chunk zone-sum reducer, and the distribute step with a single searchsorted lookup plus np.add.at accumulation. Peak work drops from O(n_zones x n_pixels) to O(n_pixels + n_zones); runtime is now flat in zone count (was 0.35s/1.9s/6.8s for 50/500/2000 zones on a 1500x1500 raster, now ~0.18s across the board). Semantics preserved across numpy, cupy (CPU fallback), dask+numpy, and dask+cupy: NaN zones/weights, nodata_zone, negative-weight clamping, binary binarization, missing-zone -> NaN, and zero-weight-sum -> NaN.
brendancol
commented
Jun 20, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: Vectorize disaggregate() zone-membership loop (#3408)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None.
Nits (optional improvements)
xrspatial/dasymetric.py: in the pure-numpy path_disaggregate_numpybuildszone_indexonce for the sums, then_disaggregate_numpy_with_sumsrecomputes the samezone_indexfrom scratch. That is two searchsorted passes where one would do. The dask path genuinely needs two passes (eager sums, then lazy distribute), so the split signature is right there, but the numpy path could thread the index through. Minor; the win over the old per-zone loop is already large.
What looks good
- The vectorization holds across the edge cases I checked: NaN zones/weights, nodata_zone, negative-weight clamp, binary binarization, missing-zone -> NaN, and zero-weight-sum -> NaN. I confirmed numpy/dask parity on a 150-zone raster and ran cupy + dask+cupy end-to-end on a CUDA host.
searchsortedneeds sorted ids and the code passesnp.array(sorted(...)), so the lookup is valid. Out-of-range positions get clipped and then rejected by the equality check, so a zone id absent from values cannot alias onto a real slot._preprocesspulls together the clamp/binarize/invalid-mask logic that used to be duplicated across the numpy core,_chunk_sums, and_distribute_chunk, so the four backends share one definition.- Empty values dict returns all-NaN; a single-pixel zone gets the full value. Both checked.
- The existing 61 tests stay green, and the 2 new tests cover the many-zone case the fix targets.
Checklist
- Algorithm matches the documented redistribution semantics
- All implemented backends produce consistent results
- NaN handling is correct
- Edge cases are covered by tests
- Dask chunk boundaries handled correctly (no neighborhood access; per-chunk distribute is shape-preserving)
- No premature materialization or unnecessary copies introduced (the eager zone-sum compute is pre-existing and algorithmically required)
- Benchmark exists (benchmarks/benchmarks/dasymetric.py)
- README feature matrix unchanged (no new function, no backend change)
- Docstrings present and accurate
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3408
What changed
disaggregate()distributed zone values by looping over zone IDs andrebuilding a full-shape boolean mask
(zones_arr == zid)for each zone.Three call sites did this: the numpy zone-sum computation, the numpy
distribute step, and the dask per-chunk reducer. Peak work scaled as
O(n_zones x n_pixels).
This replaces the per-zone loop with a single vectorized pass:
np.searchsortedmaps each valid pixel to its zone slot once(
_pixel_zone_index).np.add.ataccumulates per-zone weight sums in one pass(
_zone_weight_sums).A shared
_preprocesshelper does the float64 coercion, negative-weightclamp, binary binarization, and invalid-mask construction so the numpy core
and the dask chunks stay identical.
Performance
numpy weighted path on a 1500x1500 raster:
Runtime is now flat in zone count instead of linear.
Backends
numpy, cupy (CPU fallback), dask+numpy (per-chunk), dask+cupy. All four
validated. OOM verdict SAFE, bottleneck compute-bound.
Test plan
zero-weight-sum zones staying NaN
Found by /sweep-performance.