Fix CuPy Bellman-Ford iteration limit in cost_distance #1192
Merged
brendancol merged 4 commits into master on Apr 14, 2026
Three performance fixes from the Phase 2 sweep, targeting WILL OOM verdicts under 30TB workloads:
- geotiff: read_geotiff_dask() was reading the entire file into RAM just to extract metadata before building the lazy dask graph. It now uses _read_geo_info(), which parses only the IFD via mmap, so setup needs O(1) memory regardless of file size. Peak memory during dask setup dropped from 4.41 MB to 0.21 MB at 512x512 (21x reduction).
- sieve: region_val_buf was allocated at rows*cols (16 GB for a 46K x 46K raster), even though the actual region count is typically orders of magnitude smaller. It now counts regions first and allocates at the actual size. It also reuses the dead rank array as root_to_id, saving another 4 bytes/pixel. The memory guard now uses an accurate 28 bytes/pixel estimate instead of a misleading 5x multiplier.
- reproject: _reproject_dask_cupy pre-allocated the full output on the GPU via cp.full(out_shape), which OOMs for large outputs. It now checks available GPU memory and falls back to the existing map_blocks path (with is_cupy=True) when the output exceeds VRAM. The fast path is preserved for outputs that fit.
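The geotiff fix relies on the fact that a TIFF's metadata lives in a small header plus an IFD (image file directory), so it can be decoded without reading pixel data. Below is a minimal standard-library sketch of that idea: it maps the file with `mmap` and unpacks only the header and first IFD, so only those few bytes are paged in. It is not `_read_geo_info()`. The names `read_first_ifd` and `write_tiny_tiff` are hypothetical, and the reader only handles classic (non-BigTIFF) files with single inline SHORT/LONG values.

```python
import mmap
import os
import struct
import tempfile

IMAGE_WIDTH, IMAGE_LENGTH = 256, 257  # baseline TIFF tag IDs


def read_first_ifd(path):
    """Return {tag: value} for the first IFD of a classic TIFF.

    Only the 8-byte header and the 12-byte IFD entries are read from the
    mapping; pixel data is never touched. Values other than a single inline
    SHORT (type 3) or LONG (type 4) are returned as None in this sketch.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        order = {b"II": "<", b"MM": ">"}[mm[:2]]  # little- or big-endian file
        magic, ifd_offset = struct.unpack(order + "HI", mm[2:8])
        if magic != 42:
            raise ValueError("not a classic TIFF")
        (n_entries,) = struct.unpack(order + "H", mm[ifd_offset:ifd_offset + 2])
        tags = {}
        for i in range(n_entries):
            start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
            tag, typ, count = struct.unpack(order + "HHI", mm[start:start + 8])
            if typ == 3 and count == 1:
                (tags[tag],) = struct.unpack(order + "H", mm[start + 8:start + 10])
            elif typ == 4 and count == 1:
                (tags[tag],) = struct.unpack(order + "I", mm[start + 8:start + 12])
            else:
                tags[tag] = None
        return tags


def write_tiny_tiff(path, width, height):
    """Write a stripped-down little-endian TIFF (header, one IFD holding only
    ImageWidth/ImageLength, then zeroed stand-in pixel bytes). It is not a
    complete image; it only exercises the reader above."""
    entries = [(IMAGE_WIDTH, 4, 1, width), (IMAGE_LENGTH, 4, 1, height)]
    data = struct.pack("<2sHI", b"II", 42, 8)  # byte order, magic, IFD offset
    data += struct.pack("<H", len(entries))
    for tag, typ, count, value in entries:
        data += struct.pack("<HHII", tag, typ, count, value)
    data += struct.pack("<I", 0)  # no next IFD
    data += bytes(width * height)  # payload the reader never reads
    with open(path, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.tif")
    write_tiny_tiff(path, 512, 384)
    tags = read_first_ifd(path)
print(tags[IMAGE_WIDTH], tags[IMAGE_LENGTH])  # 512 384
```

The cost of the metadata step depends on the IFD size, not the raster size, which is what makes building the lazy dask graph cheap for very large files.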
Four more performance fixes from the Phase 2 sweep:
- polygonize: _polygonize_dask called dask.compute(*delayed_results), which held all chunk polygon data in memory at once. It now processes chunks incrementally: interior polygons go straight to the output list, and only boundary polygons accumulate for the merge step.
- polygon_clip: clip_polygon called mask.compute() to materialize the entire rasterized mask before applying it. For a polygon covering most of a 30TB raster, the uint8 mask alone would be multi-TB. It now keeps the mask lazy on dask paths and applies it via xarray.where (dask+numpy) or da.map_blocks (dask+cupy).
- kde: Both dask paths captured the full point arrays (xs, ys, ws) in every tile task's closure, serializing O(n_tiles * n_points) data. Points are now pre-filtered per tile with a bounding-box + cutoff-radius check, so each task receives only nearby points.
- pathfinding: When friction=None, the A* kernel allocated a dummy np.ones((h, w)) array that was never read (use_friction=False skips all friction lookups). For a 100K x 100K grid that's 80 GB of wasted allocation. It now passes a 1x1 dummy instead.
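The kde change amounts to a per-tile spatial filter. Here is a minimal NumPy sketch, assuming each tile is an axis-aligned rectangle `(xmin, xmax, ymin, ymax)` in the same coordinates as the points and that the kernel is zero beyond a cutoff radius. `points_for_tile` is a hypothetical name, not the xarray-spatial function. A cheap bounding-box test discards most points first, then an exact point-to-rectangle distance check drops the corner cases the box lets through.

```python
import numpy as np


def points_for_tile(xs, ys, ws, tile_bounds, cutoff):
    """Keep only the points whose kernel can reach the tile.

    tile_bounds = (xmin, xmax, ymin, ymax); cutoff is the kernel radius.
    Hypothetical helper for illustration.
    """
    xmin, xmax, ymin, ymax = tile_bounds
    # Cheap test: inside the tile's box grown by `cutoff` on every side.
    box = ((xs >= xmin - cutoff) & (xs <= xmax + cutoff)
           & (ys >= ymin - cutoff) & (ys <= ymax + cutoff))
    bx, by, bw = xs[box], ys[box], ws[box]
    # Exact test on the survivors: distance to the nearest point of the tile.
    dx = np.maximum(np.maximum(xmin - bx, 0.0), bx - xmax)
    dy = np.maximum(np.maximum(ymin - by, 0.0), by - ymax)
    near = dx * dx + dy * dy <= cutoff * cutoff
    return bx[near], by[near], bw[near]


xs = np.array([1.0, 5.5, 9.0, 5.8])
ys = np.array([1.0, 1.0, 9.0, 5.8])
ws = np.ones(4)
tx, ty, tw = points_for_tile(xs, ys, ws, (0.0, 5.0, 0.0, 5.0), cutoff=1.0)
# (9, 9) fails the box test; (5.8, 5.8) passes the box but is ~1.13 from
# the tile corner, so only (1, 1) and (5.5, 1) are kept.
```

Each tile task would then close over only its filtered arrays, so the serialized payload scales with the points near that tile rather than with `n_tiles * n_points`.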
The parallel Bellman-Ford loop used max_iterations = height + width, which is too low for maze-like friction surfaces where shortest paths can snake across the entire grid (up to height * width - 1 edges). Changed to height * width, which covers the standard Bellman-Ford bound of V - 1 relaxation rounds for a grid of V = height * width pixels.
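To make the iteration bound concrete, here is a minimal NumPy sketch of parallel (Jacobi-style) Bellman-Ford relaxation on a grid. It is not the xarray-spatial kernel. The function `grid_bellman_ford`, the helper `snake_maze`, 4-connectivity, the averaged-friction edge cost, and the maze layout are all assumptions for illustration. Each round extends shortest paths by at most one edge, so a 16-edge path needs 16 rounds, plus one more round in which the `changed` check sees no update.

```python
import numpy as np


def grid_bellman_ford(friction, source, max_iterations):
    """Parallel Bellman-Ford on a 4-connected grid (hypothetical sketch).

    Assumptions: NaN friction marks an impassable pixel, and a step between
    neighbours a and b costs (friction[a] + friction[b]) / 2. Every round
    relaxes each pixel from its neighbours' previous-round distances.
    Returns (distances with NaN for unreached pixels, rounds executed).
    """
    h, w = friction.shape
    dist = np.full((h, w), np.inf)
    dist[source] = 0.0
    padded_f = np.pad(friction, 1, constant_values=np.nan)
    rounds = 0
    for _ in range(max_iterations):
        rounds += 1
        padded_d = np.pad(dist, 1, constant_values=np.inf)
        new = dist.copy()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            # Neighbour distance/friction aligned with each pixel.
            nd = padded_d[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            nf = padded_f[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            # NaN friction on either end yields NaN; fmin ignores NaN.
            new = np.fmin(new, nd + (nf + friction) / 2.0)
        changed = bool(np.any(new < dist))
        dist = new
        if not changed:  # early exit once no distance improved
            break
    return np.where(np.isfinite(dist), dist, np.nan), rounds


def snake_maze(n):
    """Hypothetical n x n maze (n odd): open even rows, one gap per wall row,
    alternating between the right and left edge."""
    maze = np.full((n, n), np.nan)
    maze[0::2, :] = 1.0
    for k, r in enumerate(range(1, n, 2)):
        maze[r, n - 1 if k % 2 == 0 else 0] = 1.0
    return maze


maze = snake_maze(5)
h, w = maze.shape
old, _ = grid_bellman_ford(maze, (0, 0), max_iterations=h + w)
new, used = grid_bellman_ford(maze, (0, 0), max_iterations=h * w)
print(old[4, 4], new[4, 4], used)  # nan 16.0 17

open_grid = np.ones((50, 50))
_, used_open = grid_bellman_ford(open_grid, (0, 0), max_iterations=50 * 50)
print(used_open)  # 99
```

With the old `height + width` limit, the far corner of the 5x5 maze stays NaN even though it is reachable. With `height * width` it is reached after 17 rounds, and on an open 50x50 grid the early exit stops after 99 rounds instead of running all 2,500.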
Adds a test with a 5x5 maze whose shortest path has 16 edges, which needs more than the old height + width = 10 Bellman-Ford iterations. Covers all four backends (numpy, cupy, dask+numpy, dask+cupy).
Closes #1191.
Summary
- `cost_distance` used `max_iterations = height + width`. On maze-like friction surfaces where the only passable route zigzags across the grid, shortest paths can have up to `height * width - 1` edges. The old limit caused early termination: reachable pixels were incorrectly reported as NaN.
- Changed to `height * width`, which covers the standard Bellman-Ford bound of V - 1 relaxation rounds (V = `height * width` pixels). The early-exit `changed` flag still short-circuits on open grids, so the higher limit only costs extra iterations when they're actually needed.

Test plan
- `test_snake_maze_long_path`: 5x5 snake maze with a 16-edge shortest path (exceeds the old h + w = 10 limit), verified across all four backends
- `test_cost_distance.py` suite passes (48 tests)