Skip to content

Vectorize disaggregate() zone-membership loop (#3408)#3420

Open
brendancol wants to merge 4 commits into
mainfrom
deep-sweep-performance-dasymetric-2026-06-20
Open

Vectorize disaggregate() zone-membership loop (#3408)#3420
brendancol wants to merge 4 commits into
mainfrom
deep-sweep-performance-dasymetric-2026-06-20

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3408

What changed

disaggregate() distributed zone values by looping over zone IDs and
rebuilding a full-shape boolean mask (zones_arr == zid) for each zone.
Three call sites did this: the numpy zone-sum computation, the numpy
distribute step, and the dask per-chunk reducer. Peak work scaled as
O(n_zones x n_pixels).

This replaces the per-zone loop with a single vectorized pass:

  • np.searchsorted maps each valid pixel to its zone slot once
    (_pixel_zone_index).
  • np.add.at accumulates per-zone weight sums in one pass
    (_zone_weight_sums).
  • distribution gathers each pixel's zone value and weight sum, then divides.

A shared _preprocess helper does the float64 coercion, negative-weight
clamp, binary binarization, and invalid-mask construction so the numpy core
and the dask chunks stay identical.

Performance

numpy weighted path on a 1500x1500 raster:

n_zones before after
50 0.35s 0.18s
500 1.9s 0.19s
2000 6.8s 0.19s

Runtime is now flat in zone count instead of linear.

Backends

numpy, cupy (CPU fallback), dask+numpy (per-chunk), dask+cupy. All four
validated. OOM verdict SAFE, bottleneck compute-bound.

Test plan

  • Existing 61 dasymetric tests pass (cross-backend parity + memory guards)
  • New test: many-zone (200) conservation, with missing-zone and
    zero-weight-sum zones staying NaN
  • New test: many-zone (150) dask matches numpy per-chunk
  • cupy and dask+cupy paths run end-to-end on a CUDA host and match numpy

Found by /sweep-performance.

Replace the per-zone boolean-mask loop in the numpy core, the dask
per-chunk zone-sum reducer, and the distribute step with a single
searchsorted lookup plus np.add.at accumulation. Peak work drops from
O(n_zones x n_pixels) to O(n_pixels + n_zones); runtime is now flat in
zone count (was 0.35s/1.9s/6.8s for 50/500/2000 zones on a 1500x1500
raster, now ~0.18s across the board).

Semantics preserved across numpy, cupy (CPU fallback), dask+numpy, and
dask+cupy: NaN zones/weights, nodata_zone, negative-weight clamping,
binary binarization, missing-zone -> NaN, and zero-weight-sum -> NaN.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Vectorize disaggregate() zone-membership loop (#3408)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • xrspatial/dasymetric.py: in the pure-numpy path _disaggregate_numpy builds zone_index once for the sums, then _disaggregate_numpy_with_sums recomputes the same zone_index from scratch. That is two searchsorted passes where one would do. The dask path genuinely needs two passes (eager sums, then lazy distribute), so the split signature is right there, but the numpy path could thread the index through. Minor; the win over the old per-zone loop is already large.

What looks good

  • The vectorization holds across the edge cases I checked: NaN zones/weights, nodata_zone, negative-weight clamp, binary binarization, missing-zone -> NaN, and zero-weight-sum -> NaN. I confirmed numpy/dask parity on a 150-zone raster and ran cupy + dask+cupy end-to-end on a CUDA host.
  • searchsorted needs sorted ids and the code passes np.array(sorted(...)), so the lookup is valid. Out-of-range positions get clipped and then rejected by the equality check, so a zone id absent from values cannot alias onto a real slot.
  • _preprocess pulls together the clamp/binarize/invalid-mask logic that used to be duplicated across the numpy core, _chunk_sums, and _distribute_chunk, so the four backends share one definition.
  • Empty values dict returns all-NaN; a single-pixel zone gets the full value. Both checked.
  • The existing 61 tests stay green, and the 2 new tests cover the many-zone case the fix targets.

Checklist

  • Algorithm matches the documented redistribution semantics
  • All implemented backends produce consistent results
  • NaN handling is correct
  • Edge cases are covered by tests
  • Dask chunk boundaries handled correctly (no neighborhood access; per-chunk distribute is shape-preserving)
  • No premature materialization or unnecessary copies introduced (the eager zone-sum compute is pre-existing and algorithmically required)
  • Benchmark exists (benchmarks/benchmarks/dasymetric.py)
  • README feature matrix unchanged (no new function, no backend change)
  • Docstrings present and accurate

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Vectorize disaggregate() zone-membership loop (O(n_zones x n_pixels) -> O(n_pixels + n_zones))

1 participant