
Fix CuPy Bellman-Ford iteration limit in cost_distance #1192

Merged
brendancol merged 4 commits into master from issue-1191 on Apr 14, 2026

Conversation

@brendancol
Contributor

Closes #1191.

Summary

  • The CuPy parallel Bellman-Ford loop in cost_distance used max_iterations = height + width. On maze-like friction surfaces where the only passable route zigzags across the grid, shortest paths can have up to height * width - 1 edges. The old limit caused early termination -- reachable pixels were incorrectly reported as NaN.
  • Changed to height * width, the standard Bellman-Ford V-1 bound. The early-exit changed flag still short-circuits on open grids, so this only costs extra iterations when they're actually needed.
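For illustration, here is a serial sketch of the relaxation loop with the corrected bound. This is an assumption-laden stand-in, not the actual xarray-spatial CuPy kernel (which relaxes all pixels in parallel per iteration); the neighbor cost model (average of the two pixels' friction) is also assumed for the sketch.

```python
import numpy as np

def grid_bellman_ford(friction, source):
    """Serial sketch of grid Bellman-Ford with the V-1 iteration bound.
    Impassable cells carry friction = inf."""
    h, w = friction.shape
    dist = np.full((h, w), np.inf)
    dist[source] = 0.0
    # V - 1 relaxations suffice for V = h * w vertices; the old
    # h + w cap could stop before long zigzag paths converged.
    max_iterations = h * w
    for _ in range(max_iterations):
        changed = False
        for i in range(h):
            for j in range(w):
                if not np.isfinite(friction[i, j]):
                    continue  # impassable pixel, never relaxed
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        step = 0.5 * (friction[i, j] + friction[ni, nj])
                        cand = dist[ni, nj] + step
                        if cand < dist[i, j]:
                            dist[i, j] = cand
                            changed = True
        if not changed:  # early exit: open grids converge well before V - 1
            break
    return dist
```

On an open unit-friction grid the `changed` flag trips after a couple of sweeps, so raising the cap costs nothing there; the extra iterations only run when a path genuinely needs them.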

Test plan

  • New test_snake_maze_long_path -- 5x5 snake maze with a 16-edge shortest path (exceeds old h+w=10 limit), verified across all four backends
  • Full test_cost_distance.py suite passes (48 tests)
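A hypothetical reconstruction of the maze shape the new test describes (wall placement is assumed; only the 5x5 size and 16-edge path length come from the PR). Two walls of impassable friction force a boustrophedon route whose hop count exceeds the old h + w = 10 cap:

```python
import numpy as np
from collections import deque

def snake_maze():
    """5x5 friction surface; inf marks impassable walls."""
    f = np.ones((5, 5))
    f[1, :4] = np.inf  # wall across row 1, opening at column 4
    f[3, 1:] = np.inf  # wall across row 3, opening at column 0
    return f

def hop_distance(friction, start, goal):
    """BFS edge count over passable (finite-friction) cells."""
    h, w = friction.shape
    seen = {start}
    q = deque([(start, 0)])
    while q:
        (i, j), d = q.popleft()
        if (i, j) == goal:
            return d
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if (0 <= ni < h and 0 <= nj < w
                    and np.isfinite(friction[ni, nj])
                    and (ni, nj) not in seen):
                seen.add((ni, nj))
                q.append(((ni, nj), d + 1))
    return -1  # unreachable
```

The shortest path from (0, 0) to (4, 4) here is 16 edges: right along row 0, down through the column-4 opening, left along row 2, down through the column-0 opening, right along row 4.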

Three performance fixes from the Phase 2 sweep targeting "WILL OOM"
verdicts under 30 TB workloads:

geotiff: read_geotiff_dask() was reading the entire file into RAM just
to extract metadata before building the lazy dask graph. Now uses
_read_geo_info() which parses only the IFD via mmap -- O(1) memory
regardless of file size. Peak memory during dask setup dropped from
4.41 MB to 0.21 MB at 512x512 (21x reduction).
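To make the O(1)-memory idea concrete, here is a toy parser for the classic (non-Big) TIFF layout that reads only the first IFD to recover image dimensions. This is a sketch, not xarray-spatial's `_read_geo_info` (which additionally extracts geo tags and works through mmap); it handles LONG-typed entries only and ignores BigTIFF.

```python
import struct

def read_tiff_size(buf):
    """Parse width/height from a classic TIFF's first IFD, touching
    only a handful of header bytes instead of decoding pixel data."""
    endian = '<' if buf[:2] == b'II' else '>'      # II = little-endian
    magic, ifd_offset = struct.unpack(endian + 'HI', buf[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF")
    n_entries, = struct.unpack(endian + 'H', buf[ifd_offset:ifd_offset + 2])
    width = height = None
    for k in range(n_entries):                     # each IFD entry is 12 bytes
        off = ifd_offset + 2 + 12 * k
        tag, typ, count, value = struct.unpack(endian + 'HHII',
                                               buf[off:off + 12])
        if tag == 256:       # ImageWidth
            width = value
        elif tag == 257:     # ImageLength
            height = value
    return width, height
```

Because the IFD sits near the start of the file, an mmap-backed version of this touches a few pages regardless of whether the raster is 1 MB or 30 TB.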

sieve: region_val_buf was allocated at rows*cols (16 GB for a 46K x 46K
raster) when the actual region count is typically orders of magnitude
smaller. Now counts regions first, allocates at actual size. Also reuses
the dead rank array as root_to_id, saving another 4 bytes/pixel. Memory
guard fixed from a misleading 5x multiplier to an accurate 28
bytes/pixel estimate.
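The count-then-allocate pattern can be sketched as follows; `region_val_buf` and `root_to_id` are the names from the fix above, but the implementation here is an illustrative assumption, not the actual sieve code:

```python
import numpy as np

def allocate_region_buffer(root, dtype=np.float64):
    """Size the per-region buffer by the number of distinct roots
    instead of rows * cols.  `root` maps each pixel to its
    union-find root label."""
    roots, root_to_id = np.unique(root, return_inverse=True)
    n_regions = roots.size               # typically << rows * cols
    region_val_buf = np.empty(n_regions, dtype=dtype)
    return region_val_buf, root_to_id.reshape(root.shape)
```

For a 46K x 46K raster with, say, a few thousand regions, this allocates kilobytes where the old code allocated 16 GB; `np.unique`'s inverse array doubles as the root-to-compact-id mapping.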

reproject: _reproject_dask_cupy pre-allocated the full output on GPU via
cp.full(out_shape), which OOMs for large outputs. Now checks available
GPU memory and falls back to the existing map_blocks path (with
is_cupy=True) when the output exceeds VRAM. Fast path preserved for
outputs that fit.
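The decision logic reduces to a size-versus-free-memory comparison. A minimal sketch (the path names and the safety factor are hypothetical; the real code would query free VRAM via CuPy's `cp.cuda.Device().mem_info`, which returns a `(free, total)` tuple):

```python
import math

def choose_reproject_path(out_shape, itemsize, free_vram_bytes, safety=0.8):
    """Return which code path to take: preallocate the full output on
    GPU only when it fits comfortably in free VRAM, else fall back to
    the chunked map_blocks path."""
    required = math.prod(out_shape) * itemsize
    return "preallocate" if required <= safety * free_vram_bytes else "map_blocks"
```

Keeping a safety margin below 100% of free VRAM leaves headroom for the kernel's own working buffers.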
Four more performance fixes from the Phase 2 sweep:

polygonize: _polygonize_dask called dask.compute(*delayed_results) which
held all chunk polygon data in memory at once. Now processes chunks
incrementally -- interior polygons go straight to the output list and
only boundary polygons accumulate for the merge step.
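The incremental pattern can be sketched independently of dask (in the real code, `chunk_results` would be delayed objects computed one at a time rather than all at once via `dask.compute(*delayed_results)`; the `merge` callable stands in for the boundary-merge step):

```python
def collect_polygons(chunk_results, merge):
    """Stream interior polygons straight to the output; hold only the
    cross-chunk boundary pieces for the final merge."""
    output, pending_boundary = [], []
    for interior, boundary in chunk_results:   # one chunk in memory at a time
        output.extend(interior)
        pending_boundary.extend(boundary)
    output.extend(merge(pending_boundary))
    return output
```

Peak memory is then bounded by one chunk's polygons plus the (usually small) set of boundary polygons, instead of every chunk's full output at once.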

polygon_clip: clip_polygon called mask.compute() to materialize the
entire rasterized mask before applying it. For a polygon covering most
of a 30TB raster, the uint8 mask alone would be multi-TB. Now keeps the
mask lazy for dask paths and applies it via xarray.where (dask+numpy)
or da.map_blocks (dask+cupy).
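A pure-NumPy stand-in for the lazy pattern: the mask is produced and consumed one chunk at a time (mirroring what `xarray.where` / `da.map_blocks` do over dask chunks), so the full-resolution mask never exists in memory at once. Chunking along a single axis is a simplifying assumption here:

```python
import numpy as np

def apply_mask_chunked(data_chunks, mask_chunks, fill=np.nan):
    """Apply a boolean mask chunk-by-chunk; masked-out cells get `fill`."""
    for data, mask in zip(data_chunks, mask_chunks):
        yield np.where(mask, data, fill)
```

For a polygon covering most of a 30 TB raster, this is the difference between a multi-TB materialized uint8 mask and a per-chunk mask that is discarded as soon as it is applied.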

kde: Both dask paths captured the full point arrays (xs, ys, ws) in every
tile task's closure, serializing O(n_tiles * n_points) data. Now
pre-filters points per tile using a bounding-box + cutoff-radius check,
so each task receives only nearby points.
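The pre-filter itself is a few vectorized comparisons; this sketch uses hypothetical argument names for the tile bounds, with `xs`, `ys`, `ws` as in the description above:

```python
import numpy as np

def points_for_tile(xs, ys, ws, tile_bounds, cutoff):
    """Keep only points within cutoff radius of the tile's bounding
    box; the task closure then captures a small slice instead of the
    full point arrays."""
    x0, x1, y0, y1 = tile_bounds
    keep = ((xs >= x0 - cutoff) & (xs <= x1 + cutoff) &
            (ys >= y0 - cutoff) & (ys <= y1 + cutoff))
    return xs[keep], ys[keep], ws[keep]
```

The box test is conservative (it admits some corner points slightly beyond the cutoff radius), but it is cheap and never drops a point that could contribute to the tile.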

pathfinding: When friction=None, the A* kernel allocated a dummy
np.ones((h, w)) array that was never read (use_friction=False skips all
friction lookups). For a 100K x 100K grid that's 80 GB of wasted
allocation. Now passes a 1x1 dummy instead.
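The shape of the fix, as a sketch (the helper name and return convention are assumptions; the key point is that the kernel guards every friction read behind `use_friction`, so the dummy's contents and size are irrelevant):

```python
import numpy as np

def make_friction_arg(friction):
    """When friction is None the kernel never reads the array, so pass
    a 1x1 placeholder instead of a full height x width array of ones."""
    if friction is None:
        return np.ones((1, 1)), False   # dummy array, use_friction=False
    return friction, True
```

At 100K x 100K float64, that one-line change avoids an 80 GB allocation whose values would never be read.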
The parallel Bellman-Ford loop used max_iterations = height + width,
which is too low for maze-like friction surfaces where shortest paths
can snake across the entire grid (up to height * width - 1 edges).
Changed to height * width, the standard Bellman-Ford V-1 bound.
Tests a maze where the shortest path has 16 edges on a 5x5 grid,
which requires more than height + width Bellman-Ford iterations.
Covers all four backends (numpy, cupy, dask+numpy, dask+cupy).
@github-actions github-actions bot added the performance PR touches performance-sensitive code label Apr 13, 2026
@brendancol brendancol merged commit f3e8603 into master Apr 14, 2026
11 checks passed

Development

Successfully merging this pull request may close these issues.

CuPy cost_distance Bellman-Ford terminates early on maze-like friction surfaces
