From 2b7611c479fb39dd1a0d1b2f21c7533c98db10e1 Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Mon, 6 Apr 2026 19:28:11 -0700 Subject: [PATCH 1/6] Add design spec for polygonize simplification (#1151) --- ...-04-06-polygonize-simplification-design.md | 138 ++++++++++++++++++ 1 file changed, 138 insertions(+) create mode 100644 docs/superpowers/specs/2026-04-06-polygonize-simplification-design.md diff --git a/docs/superpowers/specs/2026-04-06-polygonize-simplification-design.md b/docs/superpowers/specs/2026-04-06-polygonize-simplification-design.md new file mode 100644 index 00000000..63d3c82a --- /dev/null +++ b/docs/superpowers/specs/2026-04-06-polygonize-simplification-design.md @@ -0,0 +1,138 @@ +# Polygonize Geometry Simplification + +**Issue:** #1151 +**Date:** 2026-04-06 + +## Problem + +`polygonize()` produces exact pixel-boundary polygons. On high-resolution +rasters this creates dense geometries with thousands of vertices per polygon, +making output slow to render, large on disk, and unwieldy for spatial joins. + +The current workaround chains GDAL's `gdal_polygonize.py` with +`ogr2ogr -simplify`, adding an external dependency and intermediate file. + +## API + +Two new parameters on `polygonize()`: + +```python +def polygonize( + raster, mask=None, connectivity=4, transform=None, + column_name="DN", return_type="numpy", + simplify_tolerance=None, # float, coordinate units + simplify_method="douglas-peucker", # str +): +``` + +- `simplify_tolerance=None` or `0.0`: no simplification (backward compatible). +- `simplify_tolerance > 0`: apply Douglas-Peucker with given tolerance. +- `simplify_method="visvalingam-whyatt"`: raises `NotImplementedError`. +- Negative tolerance raises `ValueError`. + +## Algorithm: Shared-Edge Douglas-Peucker + +Topology-preserving simplification via shared-edge decomposition, same +approach used by TopoJSON and GRASS `v.generalize`. 
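The core primitive is standard Douglas-Peucker with both chord endpoints always retained, which is what keeps junction vertices pinned. A minimal NumPy sketch of that primitive (illustrative only; the shipped `_douglas_peucker` is numba-compiled and may differ in structure):

```python
import numpy as np

def douglas_peucker(coords, tolerance):
    """Recursive Douglas-Peucker on an (N, 2) coordinate array.
    Endpoints are always kept, which is what pins junction vertices."""
    coords = np.asarray(coords, dtype=float)
    if len(coords) < 3:
        return coords
    start, end = coords[0], coords[-1]
    seg = end - start
    seg_len = np.hypot(seg[0], seg[1])
    d = coords[1:-1] - start
    if seg_len == 0.0:
        # Degenerate chord (closed chain): distance to the anchor point.
        dists = np.hypot(d[:, 0], d[:, 1])
    else:
        # Perpendicular distance of interior points to the chord.
        dists = np.abs(seg[0] * d[:, 1] - seg[1] * d[:, 0]) / seg_len
    k = int(np.argmax(dists))
    if dists[k] > tolerance:
        # Split at the farthest point and recurse on both halves.
        left = douglas_peucker(coords[:k + 2], tolerance)
        right = douglas_peucker(coords[k + 1:], tolerance)
        return np.vstack([left[:-1], right])
    return np.vstack([start, end])

# A pixel-staircase boundary whose max deviation from the chord is ~0.71:
chain = np.array([[0, 0], [1, 0], [1, 1], [2, 1], [2, 2], [3, 2], [3, 3]])
simplified = douglas_peucker(chain, tolerance=1.0)  # -> just the endpoints
```

With `tolerance=0.5` the same chain keeps interior vertices, since the staircase deviation (~0.71) exceeds the tolerance.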
+ +### Pipeline position + +Simplification runs between boundary tracing and output conversion: + +``` +CCL -> boundary tracing -> [simplification] -> output conversion +``` + +For dask backends, simplification runs after chunk merging. + +### Steps + +1. **Find junctions.** Scan all ring vertices. A junction is any coordinate + that appears as a vertex in 3 or more distinct rings. These points are + pinned and never removed by simplification. + +2. **Split rings into edge chains.** Walk each ring and split at junction + vertices. Each resulting chain connects two junctions (or forms a closed + loop when the ring contains no junctions). Each chain is shared by at + most 2 adjacent polygons. + +3. **Deduplicate chains.** Normalize each chain by its sorted endpoint pair + so shared edges between adjacent polygons are identified and simplified + only once. + +4. **Simplify each chain.** Apply Douglas-Peucker to each unique chain. + Junction endpoints are fixed. The DP implementation is numba-compiled + (`@ngjit`) for performance on large coordinate arrays. + +5. **Reassemble rings.** Replace each ring's chain segments with their + simplified versions and rebuild the ring coordinate arrays. + +### Why this preserves topology + +Adjacent polygons reference the same physical edge chain. Simplifying +each chain once means both neighbors get identical simplified boundaries. +No gaps or overlaps can arise because there is no independent simplification +of shared geometry. + +## Implementation + +All new code lives in `xrspatial/polygonize.py` as internal functions. 
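Steps 1 and 2 can be sketched in plain Python. The helper names and the coordinate-tuple representation below are illustrative, not the internal `_find_junctions` / `_split_ring_at_junctions` signatures:

```python
from collections import Counter

def find_junctions(rings):
    """Step 1: a coordinate appearing in 3+ distinct rings is a junction."""
    counts = Counter()
    for ring in rings:
        for pt in set(ring[:-1]):      # drop the duplicated closing vertex
            counts[pt] += 1
    return {pt for pt, n in counts.items() if n >= 3}

def split_ring_at_junctions(ring, junctions):
    """Step 2: break one closed ring into chains at junction vertices."""
    pts = list(ring[:-1])              # drop the duplicated closing vertex
    idxs = [i for i, p in enumerate(pts) if p in junctions]
    if not idxs:
        return [list(ring)]            # junction-free ring: one closed chain
    # Rotate so the ring starts at a junction, re-close it, then cut
    # between consecutive junctions.
    pts = pts[idxs[0]:] + pts[:idxs[0] + 1]
    cuts = [i for i, p in enumerate(pts) if p in junctions]
    return [pts[a:b + 1] for a, b in zip(cuts, cuts[1:])]

# Three rings meeting at (1, 1): only that coordinate is a junction.
a = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
b = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]
c = [(0, 1), (1, 1), (2, 1), (2, 2), (0, 2), (0, 1)]
junctions = find_junctions([a, b, c])
```

Note the single-junction case: the ring becomes one closed chain that starts and ends at the junction, so the pinned endpoint survives simplification.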
+ +### New functions + +| Function | Decorator | Purpose | +|---|---|---| +| `_find_junctions(all_rings)` | pure Python | Scan rings, return set of junction coords | +| `_split_ring_at_junctions(ring, junctions)` | pure Python | Break one ring into chains at junctions | +| `_normalize_chain(chain)` | pure Python | Canonical key for deduplication | +| `_douglas_peucker(coords, tolerance)` | `@ngjit` | DP simplification on Nx2 array | +| `_simplify_polygons(polygon_points, tolerance)` | pure Python | Orchestrator: junctions -> split -> DP -> reassemble | + +### Integration point + +In `polygonize()`, after the `mapper(raster)(...)` call returns +`(column, polygon_points)` and before the return-type conversion block: + +```python +if simplify_tolerance and simplify_tolerance > 0: + polygon_points = _simplify_polygons(polygon_points, simplify_tolerance) +``` + +### Backend behavior + +- **NumPy / CuPy:** simplification runs on CPU-side coordinate arrays + returned by boundary tracing (CuPy already transfers to CPU for tracing). +- **Dask:** simplification runs after `_merge_chunk_polygons()`, on the + fully merged result. +- No GPU-side simplification. Boundary tracing is already CPU-bound; + simplification follows the same pattern. + +## Constraints + +- No Visvalingam-Whyatt yet. The `simplify_method` parameter is present + in the API for forward compatibility; passing `"visvalingam-whyatt"` + raises `NotImplementedError`. +- No streaming simplification. The full polygon set must fit in memory, + same constraint as existing boundary tracing. +- Minimum ring size after simplification: exterior rings keep at least 4 + vertices (3 unique + closing). Degenerate rings (area below tolerance + squared) are dropped. + +## Testing + +- Correctness: known 4x4 raster, verify simplified polygon areas match + originals (simplification must not change topology, only vertex count). +- Vertex reduction: verify output has fewer vertices than unsimplified. 
+- Topology: verify no gaps between adjacent polygons (union of simplified + polygons equals union of originals, within floating-point tolerance). +- Edge cases: tolerance=0, tolerance=None, negative tolerance, single-pixel + raster, raster with one uniform value. +- Backend parity: numpy and dask produce same results. +- Return types: simplification works with all five return types. + +## Out of scope + +- Visvalingam-Whyatt implementation (future PR). +- GPU-accelerated simplification. +- Per-chunk simplification for dask (simplification is post-merge only). +- Area-weighted simplification or other adaptive tolerance schemes. From 14a9f46f7a0f4bca6c577503581936df929bef78 Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Wed, 1 Apr 2026 19:01:38 -0700 Subject: [PATCH 2/6] Add resample function for resolution change without reprojection (#1152) Implements raster resampling with 8 methods (nearest, bilinear, cubic, average, min, max, median, mode) across all 4 backends. Uses map_coordinates with global coordinate mapping for chunk-consistent dask results and numba-accelerated block aggregation kernels. 
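The block-aggregation idea from the commit message can be sketched in plain NumPy for the `average` method with integer factors (a simplification: the shipped kernels are numba-accelerated and also handle non-integer factors and the other aggregation methods):

```python
import numpy as np

def block_average(data, fy, fx):
    """NaN-aware block mean for integer downsampling factors (fy, fx)."""
    h, w = data.shape
    trimmed = data[:h // fy * fy, :w // fx * fx]   # drop ragged edge pixels
    blocks = trimmed.reshape(h // fy, fy, w // fx, fx)
    # nanmean per block: NaN pixels are excluded rather than propagated.
    return np.nanmean(blocks, axis=(1, 3))

grid = np.arange(16, dtype=float).reshape(4, 4)
grid[0, 0] = np.nan
out = block_average(grid, 2, 2)   # 4x4 -> 2x2; the NaN pixel is skipped
```

The top-left output cell averages only the three valid pixels of its 2x2 block, matching the NaN-aware behaviour described for the kernels.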
--- xrspatial/__init__.py | 1 + xrspatial/resample.py | 769 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 770 insertions(+) create mode 100644 xrspatial/resample.py diff --git a/xrspatial/__init__.py b/xrspatial/__init__.py index 643f3ac5..bbb46a60 100644 --- a/xrspatial/__init__.py +++ b/xrspatial/__init__.py @@ -93,6 +93,7 @@ from xrspatial.preview import preview # noqa from xrspatial.proximity import allocation # noqa from xrspatial.rasterize import rasterize # noqa +from xrspatial.resample import resample # noqa from xrspatial.proximity import direction # noqa from xrspatial.proximity import euclidean_distance # noqa from xrspatial.proximity import great_circle_distance # noqa diff --git a/xrspatial/resample.py b/xrspatial/resample.py new file mode 100644 index 00000000..3840e45d --- /dev/null +++ b/xrspatial/resample.py @@ -0,0 +1,769 @@ +"""Raster resampling -- resolution change without reprojection. + +Provides :func:`resample` for changing raster cell size using +interpolation or block-aggregation methods. 
+""" +from __future__ import annotations + +from functools import partial + +import numpy as np +import xarray as xr +from scipy.ndimage import map_coordinates as _scipy_map_coords +from scipy.ndimage import zoom as _scipy_zoom + +try: + import dask.array as da +except ImportError: + da = None + +try: + import cupy + import cupyx.scipy.ndimage as _cupy_ndimage +except ImportError: + cupy = None + _cupy_ndimage = None + +from xrspatial.utils import ( + ArrayTypeFunctionMapping, + _validate_raster, + calc_res, + ngjit, +) +from xrspatial.dataset_support import supports_dataset + + +# -- Constants --------------------------------------------------------------- + +INTERP_METHODS = {'nearest': 0, 'bilinear': 1, 'cubic': 3} +AGGREGATE_METHODS = {'average', 'min', 'max', 'median', 'mode'} +ALL_METHODS = set(INTERP_METHODS) | AGGREGATE_METHODS + +# Overlap depth (input pixels) each interpolation kernel needs from +# neighbouring chunks when processing dask arrays. Cubic requires extra +# depth because the B-spline prefilter is a global IIR filter whose +# boundary transient decays as ~0.268^n; depth 10 brings it to machine +# epsilon. +_INTERP_DEPTH = {'nearest': 1, 'bilinear': 1, 'cubic': 10} + + +# -- Output-geometry helpers ------------------------------------------------- + +def _output_shape(in_h, in_w, scale_y, scale_x): + return max(1, round(in_h * scale_y)), max(1, round(in_w * scale_x)) + + +def _output_chunks(in_chunks, scale): + """Compute per-chunk output sizes via cumulative rounding. + + Guarantees ``sum(result) == round(sum(in_chunks) * scale)``. + """ + cum = np.cumsum([0] + list(in_chunks)) + out_cum = np.round(cum * scale).astype(int) + return tuple(int(max(1, out_cum[i + 1] - out_cum[i])) + for i in range(len(in_chunks))) + + +# -- NaN-aware zoom (NumPy) -------------------------------------------------- + +def _nan_aware_zoom_np(data, zoom_yx, order): + """``scipy.ndimage.zoom`` with NaN-aware weighting. 
+ + For *order* 0 (nearest-neighbour) NaN propagates naturally. + For higher orders the zero-fill / weight-mask trick is used so that + NaN pixels do not corrupt their neighbours. + """ + if order == 0: + return _scipy_zoom(data, zoom_yx, order=0, mode='nearest') + + mask = np.isnan(data) + if not mask.any(): + return _scipy_zoom(data, zoom_yx, order=order, mode='nearest') + + filled = np.where(mask, 0.0, data) + weights = (~mask).astype(data.dtype) + + z_data = _scipy_zoom(filled, zoom_yx, order=order, mode='nearest') + z_wt = _scipy_zoom(weights, zoom_yx, order=order, mode='nearest') + + return np.where(z_wt > 0.01, + z_data / np.maximum(z_wt, 1e-10), + np.nan) + + +# -- NaN-aware zoom (CuPy) --------------------------------------------------- + +def _nan_aware_zoom_cupy(data, zoom_yx, order): + if order == 0: + return _cupy_ndimage.zoom(data, zoom_yx, order=0, mode='nearest') + + mask = cupy.isnan(data) + if not mask.any(): + return _cupy_ndimage.zoom(data, zoom_yx, order=order, mode='nearest') + + filled = cupy.where(mask, 0.0, data) + weights = (~mask).astype(data.dtype) + + z_data = _cupy_ndimage.zoom(filled, zoom_yx, order=order, mode='nearest') + z_wt = _cupy_ndimage.zoom(weights, zoom_yx, order=order, mode='nearest') + + return cupy.where(z_wt > 0.01, + z_data / cupy.maximum(z_wt, 1e-10), + cupy.nan) + + +# -- Block-aggregation kernels (NumPy, numba) -------------------------------- + +@ngjit +def _agg_mean(data, out_h, out_w): + h, w = data.shape + out = np.empty((out_h, out_w), dtype=np.float64) + for oy in range(out_h): + y0 = int(oy * h / out_h) + y1 = max(y0 + 1, int((oy + 1) * h / out_h)) + for ox in range(out_w): + x0 = int(ox * w / out_w) + x1 = max(x0 + 1, int((ox + 1) * w / out_w)) + total = 0.0 + count = 0 + for y in range(y0, y1): + for x in range(x0, x1): + v = data[y, x] + if not np.isnan(v): + total += v + count += 1 + out[oy, ox] = total / count if count > 0 else np.nan + return out + + +@ngjit +def _agg_min(data, out_h, out_w): + h, w = 
data.shape + out = np.empty((out_h, out_w), dtype=np.float64) + for oy in range(out_h): + y0 = int(oy * h / out_h) + y1 = max(y0 + 1, int((oy + 1) * h / out_h)) + for ox in range(out_w): + x0 = int(ox * w / out_w) + x1 = max(x0 + 1, int((ox + 1) * w / out_w)) + best = np.inf + found = False + for y in range(y0, y1): + for x in range(x0, x1): + v = data[y, x] + if not np.isnan(v) and v < best: + best = v + found = True + out[oy, ox] = best if found else np.nan + return out + + +@ngjit +def _agg_max(data, out_h, out_w): + h, w = data.shape + out = np.empty((out_h, out_w), dtype=np.float64) + for oy in range(out_h): + y0 = int(oy * h / out_h) + y1 = max(y0 + 1, int((oy + 1) * h / out_h)) + for ox in range(out_w): + x0 = int(ox * w / out_w) + x1 = max(x0 + 1, int((ox + 1) * w / out_w)) + best = -np.inf + found = False + for y in range(y0, y1): + for x in range(x0, x1): + v = data[y, x] + if not np.isnan(v) and v > best: + best = v + found = True + out[oy, ox] = best if found else np.nan + return out + + +@ngjit +def _agg_median(data, out_h, out_w): + h, w = data.shape + out = np.empty((out_h, out_w), dtype=np.float64) + for oy in range(out_h): + y0 = int(oy * h / out_h) + y1 = max(y0 + 1, int((oy + 1) * h / out_h)) + for ox in range(out_w): + x0 = int(ox * w / out_w) + x1 = max(x0 + 1, int((ox + 1) * w / out_w)) + buf = np.empty((y1 - y0) * (x1 - x0), dtype=np.float64) + n = 0 + for y in range(y0, y1): + for x in range(x0, x1): + v = data[y, x] + if not np.isnan(v): + buf[n] = v + n += 1 + if n == 0: + out[oy, ox] = np.nan + else: + s = np.sort(buf[:n]) + if n % 2 == 1: + out[oy, ox] = s[n // 2] + else: + out[oy, ox] = (s[n // 2 - 1] + s[n // 2]) / 2.0 + return out + + +@ngjit +def _agg_mode(data, out_h, out_w): + h, w = data.shape + out = np.empty((out_h, out_w), dtype=np.float64) + for oy in range(out_h): + y0 = int(oy * h / out_h) + y1 = max(y0 + 1, int((oy + 1) * h / out_h)) + for ox in range(out_w): + x0 = int(ox * w / out_w) + x1 = max(x0 + 1, int((ox + 1) * w / 
out_w)) + buf = np.empty((y1 - y0) * (x1 - x0), dtype=np.float64) + n = 0 + for y in range(y0, y1): + for x in range(x0, x1): + v = data[y, x] + if not np.isnan(v): + buf[n] = v + n += 1 + if n == 0: + out[oy, ox] = np.nan + continue + s = np.sort(buf[:n]) + best_val = s[0] + best_cnt = 1 + cur_val = s[0] + cur_cnt = 1 + for i in range(1, n): + if s[i] == cur_val: + cur_cnt += 1 + else: + if cur_cnt > best_cnt: + best_cnt = cur_cnt + best_val = cur_val + cur_val = s[i] + cur_cnt = 1 + if cur_cnt > best_cnt: + best_val = cur_val + out[oy, ox] = best_val + return out + + +_AGG_FUNCS = { + 'average': _agg_mean, + 'min': _agg_min, + 'max': _agg_max, + 'median': _agg_median, + 'mode': _agg_mode, +} + + +# -- Dask block helpers ------------------------------------------------------ +# +# Interpolation uses map_coordinates with *global* coordinate mapping so +# that results are identical regardless of chunk layout. Each block +# receives the cumulative chunk boundaries and computes which global +# output pixels it is responsible for, maps them back to global input +# coordinates, then converts to local (within-block) coordinates. 
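A 1-D illustration of this global mapping, with nearest-neighbour lookup standing in for `map_coordinates` so only NumPy is needed:

```python
import numpy as np

# Output row oy samples input coordinate oy * (H_in - 1) / (H_out - 1),
# the same formula scipy.ndimage.zoom uses.
H_in, H_out = 7, 3
col = np.arange(H_in, dtype=float) * 10          # a 1-D "raster" column

iy = np.arange(H_out) * (H_in - 1) / (H_out - 1)
whole = col[np.round(iy).astype(int)]            # computed in one piece

# Split the *output* rows into two chunks and apply the same global
# formula per chunk; concatenating the pieces reproduces `whole`
# exactly, regardless of where the chunk boundary falls.
parts = []
for lo, hi in [(0, 2), (2, 3)]:                  # arbitrary chunk split
    iy_chunk = np.arange(lo, hi) * (H_in - 1) / (H_out - 1)
    parts.append(col[np.round(iy_chunk).astype(int)])
chunked = np.concatenate(parts)
```

Because every chunk evaluates the same global formula, the chunk layout cannot change the result; only the overlap depth matters for the interpolation footprint.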
+ + +def _interp_block_np(block, global_in_h, global_in_w, + global_out_h, global_out_w, + cum_in_y, cum_in_x, cum_out_y, cum_out_x, + depth, order, block_info=None): + """Interpolate one (possibly overlapped) numpy block.""" + yi, xi = block_info[0]['chunk-location'] + target_h = int(cum_out_y[yi + 1] - cum_out_y[yi]) + target_w = int(cum_out_x[xi + 1] - cum_out_x[xi]) + + block = block.astype(np.float64) + + # Global output pixel indices for this chunk + oy = np.arange(cum_out_y[yi], cum_out_y[yi + 1], dtype=np.float64) + ox = np.arange(cum_out_x[xi], cum_out_x[xi + 1], dtype=np.float64) + + # Map to global input coordinates (same formula scipy.ndimage.zoom uses) + iy = (oy * (global_in_h - 1) / (global_out_h - 1) + if global_out_h > 1 + else np.full(len(oy), (global_in_h - 1) / 2.0)) + ix = (ox * (global_in_w - 1) / (global_out_w - 1) + if global_out_w > 1 + else np.full(len(ox), (global_in_w - 1) / 2.0)) + + # Convert to local block coordinates (overlap shifts the origin) + iy_local = iy - (cum_in_y[yi] - depth) + ix_local = ix - (cum_in_x[xi] - depth) + + yy, xx = np.meshgrid(iy_local, ix_local, indexing='ij') + coords = np.array([yy.ravel(), xx.ravel()]) + + # NaN-aware interpolation + mask = np.isnan(block) + if order == 0 or not mask.any(): + result = _scipy_map_coords(block, coords, order=order, mode='nearest') + else: + filled = np.where(mask, 0.0, block) + weights = (~mask).astype(np.float64) + z_data = _scipy_map_coords(filled, coords, order=order, mode='nearest') + z_wt = _scipy_map_coords(weights, coords, order=order, mode='nearest') + result = np.where(z_wt > 0.01, + z_data / np.maximum(z_wt, 1e-10), np.nan) + + return result.reshape(target_h, target_w).astype(np.float32) + + +def _interp_block_cupy(block, global_in_h, global_in_w, + global_out_h, global_out_w, + cum_in_y, cum_in_x, cum_out_y, cum_out_x, + depth, order, block_info=None): + """CuPy variant of :func:`_interp_block_np`.""" + from cupyx.scipy.ndimage import map_coordinates as 
_cupy_map_coords + + yi, xi = block_info[0]['chunk-location'] + target_h = int(cum_out_y[yi + 1] - cum_out_y[yi]) + target_w = int(cum_out_x[xi + 1] - cum_out_x[xi]) + + block = block.astype(cupy.float64) + + oy = cupy.arange(int(cum_out_y[yi]), int(cum_out_y[yi + 1]), + dtype=cupy.float64) + ox = cupy.arange(int(cum_out_x[xi]), int(cum_out_x[xi + 1]), + dtype=cupy.float64) + + iy = (oy * (global_in_h - 1) / (global_out_h - 1) + if global_out_h > 1 + else cupy.full(len(oy), (global_in_h - 1) / 2.0)) + ix = (ox * (global_in_w - 1) / (global_out_w - 1) + if global_out_w > 1 + else cupy.full(len(ox), (global_in_w - 1) / 2.0)) + + iy_local = iy - float(cum_in_y[yi] - depth) + ix_local = ix - float(cum_in_x[xi] - depth) + + yy, xx = cupy.meshgrid(iy_local, ix_local, indexing='ij') + coords = cupy.array([yy.ravel(), xx.ravel()]) + + mask = cupy.isnan(block) + if order == 0 or not mask.any(): + result = _cupy_map_coords(block, coords, order=order, mode='nearest') + else: + filled = cupy.where(mask, 0.0, block) + weights = (~mask).astype(cupy.float64) + z_data = _cupy_map_coords(filled, coords, order=order, mode='nearest') + z_wt = _cupy_map_coords(weights, coords, order=order, mode='nearest') + result = cupy.where(z_wt > 0.01, + z_data / cupy.maximum(z_wt, 1e-10), cupy.nan) + + return result.reshape(target_h, target_w).astype(cupy.float32) + + +def _agg_block_np(block, method, global_in_h, global_in_w, + global_out_h, global_out_w, + cum_in_y, cum_in_x, cum_out_y, cum_out_x, + depth_y, depth_x, block_info=None): + """Block-aggregate one (possibly overlapped) numpy chunk.""" + yi, xi = block_info[0]['chunk-location'] + target_h = int(cum_out_y[yi + 1] - cum_out_y[yi]) + target_w = int(cum_out_x[xi + 1] - cum_out_x[xi]) + + block = block.astype(np.float64) + # The overlapped block starts depth pixels before the original chunk + in_y0 = int(cum_in_y[yi]) - depth_y + in_x0 = int(cum_in_x[xi]) - depth_x + func = _AGG_FUNCS[method] + + out = np.empty((target_h, target_w), 
dtype=np.float64) + for lo_y in range(target_h): + go_y = int(cum_out_y[yi]) + lo_y + gy0 = int(go_y * global_in_h / global_out_h) - in_y0 + gy1 = max(gy0 + 1, int((go_y + 1) * global_in_h / global_out_h) - in_y0) + for lo_x in range(target_w): + go_x = int(cum_out_x[xi]) + lo_x + gx0 = int(go_x * global_in_w / global_out_w) - in_x0 + gx1 = max(gx0 + 1, int((go_x + 1) * global_in_w / global_out_w) - in_x0) + sub = block[gy0:gy1, gx0:gx1] + out[lo_y, lo_x] = func(sub, 1, 1)[0, 0] + + return out.astype(np.float32) + + +def _agg_block_cupy(block, method, global_in_h, global_in_w, + global_out_h, global_out_w, + cum_in_y, cum_in_x, cum_out_y, cum_out_x, + depth_y, depth_x, block_info=None): + """Block-aggregate one cupy chunk (falls back to CPU).""" + cpu = cupy.asnumpy(block) + result = _agg_block_np( + cpu, method, global_in_h, global_in_w, + global_out_h, global_out_w, + cum_in_y, cum_in_x, cum_out_y, cum_out_x, + depth_y, depth_x, block_info=block_info, + ) + return cupy.asarray(result) + + +# -- Per-backend runners ----------------------------------------------------- + +def _run_numpy(data, scale_y, scale_x, method): + data = data.astype(np.float64) + out_h, out_w = _output_shape(*data.shape, scale_y, scale_x) + + if method in INTERP_METHODS: + zy = out_h / data.shape[0] + zx = out_w / data.shape[1] + return _nan_aware_zoom_np(data, (zy, zx), + INTERP_METHODS[method]).astype(np.float32) + + return _AGG_FUNCS[method](data, out_h, out_w).astype(np.float32) + + +def _run_cupy(data, scale_y, scale_x, method): + data = data.astype(cupy.float64) + out_h, out_w = _output_shape(*data.shape, scale_y, scale_x) + + if method in INTERP_METHODS: + zy = out_h / data.shape[0] + zx = out_w / data.shape[1] + return _nan_aware_zoom_cupy(data, (zy, zx), + INTERP_METHODS[method]).astype(cupy.float32) + + # Aggregate: GPU reshape+reduce for integer factors, CPU fallback otherwise + fy, fx = data.shape[0] / out_h, data.shape[1] / out_w + if (fy == int(fy) and fx == int(fx) + and 
method in ('average', 'min', 'max')): + fy, fx = int(fy), int(fx) + trimmed = data[:out_h * fy, :out_w * fx] + reshaped = trimmed.reshape(out_h, fy, out_w, fx) + reducer = {'average': cupy.nanmean, + 'min': cupy.nanmin, + 'max': cupy.nanmax}[method] + return reducer(reshaped, axis=(1, 3)).astype(cupy.float32) + + cpu = cupy.asnumpy(data) + return cupy.asarray( + _AGG_FUNCS[method](cpu, out_h, out_w).astype(np.float32) + ) + + +def _min_chunksize_for_scale(scale): + """Minimum input chunk size so that no output chunk is zero after rounding.""" + if scale >= 1.0: + return 1 + # c > 1/s guarantees round((k+1)*c*s) - round(k*c*s) >= 1 for all k. + return int(1.0 / scale) + 1 + + +def _ensure_min_chunksize(data, min_size): + """Rechunk *data* so every chunk is at least *min_size* pixels wide.""" + import math + new = {} + for ax in range(data.ndim): + if any(c < min_size for c in data.chunks[ax]): + total = sum(data.chunks[ax]) + # Find chunk size where ALL chunks (including last) >= min_size + n = max(1, total // min_size) + cs = math.ceil(total / n) + while n > 1: + remainder = total - cs * (total // cs) + if remainder == 0 or remainder >= min_size: + break + n -= 1 + cs = math.ceil(total / n) + new[ax] = cs + return data.rechunk(new) if new else data + + +def _run_dask_numpy(data, scale_y, scale_x, method): + data = data.astype(np.float64) + meta = np.array((), dtype=np.float32) + + if method in INTERP_METHODS: + order = INTERP_METHODS[method] + depth = _INTERP_DEPTH[method] + + min_size = max(2 * depth + 1, + _min_chunksize_for_scale(scale_y), + _min_chunksize_for_scale(scale_x)) + data = _ensure_min_chunksize(data, min_size) + + global_in_h = int(sum(data.chunks[0])) + global_in_w = int(sum(data.chunks[1])) + global_out_h, global_out_w = _output_shape( + global_in_h, global_in_w, scale_y, scale_x) + out_y = _output_chunks(data.chunks[0], scale_y) + out_x = _output_chunks(data.chunks[1], scale_x) + + cum_in_y = np.cumsum([0] + list(data.chunks[0])) + cum_in_x = 
np.cumsum([0] + list(data.chunks[1])) + cum_out_y = np.cumsum([0] + list(out_y)) + cum_out_x = np.cumsum([0] + list(out_x)) + + src = data + if depth > 0: + from dask.array.overlap import overlap as _add_overlap + src = _add_overlap(data, depth={0: depth, 1: depth}, + boundary='nearest') + + fn = partial(_interp_block_np, + global_in_h=global_in_h, global_in_w=global_in_w, + global_out_h=global_out_h, global_out_w=global_out_w, + cum_in_y=cum_in_y, cum_in_x=cum_in_x, + cum_out_y=cum_out_y, cum_out_x=cum_out_x, + depth=depth, order=order) + return da.map_blocks(fn, src, chunks=(out_y, out_x), + dtype=np.float32, meta=meta) + + import math + min_size = max(_min_chunksize_for_scale(scale_y), + _min_chunksize_for_scale(scale_x)) + data = _ensure_min_chunksize(data, min_size) + + global_in_h = int(sum(data.chunks[0])) + global_in_w = int(sum(data.chunks[1])) + global_out_h, global_out_w = _output_shape( + global_in_h, global_in_w, scale_y, scale_x) + out_y = _output_chunks(data.chunks[0], scale_y) + out_x = _output_chunks(data.chunks[1], scale_x) + cum_in_y = np.cumsum([0] + list(data.chunks[0])) + cum_in_x = np.cumsum([0] + list(data.chunks[1])) + cum_out_y = np.cumsum([0] + list(out_y)) + cum_out_x = np.cumsum([0] + list(out_x)) + + # Aggregate windows can cross chunk boundaries; add overlap. 
+ depth_y = math.ceil(global_in_h / global_out_h) + depth_x = math.ceil(global_in_w / global_out_w) + data = _ensure_min_chunksize(data, max(2 * depth_y + 1, 2 * depth_x + 1)) + # Recompute in case rechunk changed layout + if data.chunks[0] != tuple(cum_in_y[1:] - cum_in_y[:-1]): + cum_in_y = np.cumsum([0] + list(data.chunks[0])) + cum_in_x = np.cumsum([0] + list(data.chunks[1])) + out_y = _output_chunks(data.chunks[0], scale_y) + out_x = _output_chunks(data.chunks[1], scale_x) + cum_out_y = np.cumsum([0] + list(out_y)) + cum_out_x = np.cumsum([0] + list(out_x)) + + from dask.array.overlap import overlap as _add_overlap + src = _add_overlap(data, depth={0: depth_y, 1: depth_x}, + boundary='nearest') + + fn = partial(_agg_block_np, method=method, + global_in_h=global_in_h, global_in_w=global_in_w, + global_out_h=global_out_h, global_out_w=global_out_w, + cum_in_y=cum_in_y, cum_in_x=cum_in_x, + cum_out_y=cum_out_y, cum_out_x=cum_out_x, + depth_y=depth_y, depth_x=depth_x) + return da.map_blocks(fn, src, chunks=(out_y, out_x), + dtype=np.float32, meta=meta) + + +def _run_dask_cupy(data, scale_y, scale_x, method): + data = data.astype(cupy.float64) + meta = cupy.array((), dtype=cupy.float32) + + if method in INTERP_METHODS: + order = INTERP_METHODS[method] + depth = _INTERP_DEPTH[method] + + min_size = max(2 * depth + 1, + _min_chunksize_for_scale(scale_y), + _min_chunksize_for_scale(scale_x)) + data = _ensure_min_chunksize(data, min_size) + + global_in_h = int(sum(data.chunks[0])) + global_in_w = int(sum(data.chunks[1])) + global_out_h, global_out_w = _output_shape( + global_in_h, global_in_w, scale_y, scale_x) + out_y = _output_chunks(data.chunks[0], scale_y) + out_x = _output_chunks(data.chunks[1], scale_x) + + cum_in_y = np.cumsum([0] + list(data.chunks[0])) + cum_in_x = np.cumsum([0] + list(data.chunks[1])) + cum_out_y = np.cumsum([0] + list(out_y)) + cum_out_x = np.cumsum([0] + list(out_x)) + + src = data + if depth > 0: + from dask.array.overlap import overlap as 
_add_overlap + src = _add_overlap(data, depth={0: depth, 1: depth}, + boundary='nearest') + + fn = partial(_interp_block_cupy, + global_in_h=global_in_h, global_in_w=global_in_w, + global_out_h=global_out_h, global_out_w=global_out_w, + cum_in_y=cum_in_y, cum_in_x=cum_in_x, + cum_out_y=cum_out_y, cum_out_x=cum_out_x, + depth=depth, order=order) + return da.map_blocks(fn, src, chunks=(out_y, out_x), + dtype=cupy.float32, meta=meta) + + import math + min_size = max(_min_chunksize_for_scale(scale_y), + _min_chunksize_for_scale(scale_x)) + data = _ensure_min_chunksize(data, min_size) + + global_in_h = int(sum(data.chunks[0])) + global_in_w = int(sum(data.chunks[1])) + global_out_h, global_out_w = _output_shape( + global_in_h, global_in_w, scale_y, scale_x) + + depth_y = math.ceil(global_in_h / global_out_h) + depth_x = math.ceil(global_in_w / global_out_w) + data = _ensure_min_chunksize(data, max(2 * depth_y + 1, 2 * depth_x + 1)) + + out_y = _output_chunks(data.chunks[0], scale_y) + out_x = _output_chunks(data.chunks[1], scale_x) + cum_in_y = np.cumsum([0] + list(data.chunks[0])) + cum_in_x = np.cumsum([0] + list(data.chunks[1])) + cum_out_y = np.cumsum([0] + list(out_y)) + cum_out_x = np.cumsum([0] + list(out_x)) + + from dask.array.overlap import overlap as _add_overlap + src = _add_overlap(data, depth={0: depth_y, 1: depth_x}, + boundary='nearest') + + fn = partial(_agg_block_cupy, method=method, + global_in_h=global_in_h, global_in_w=global_in_w, + global_out_h=global_out_h, global_out_w=global_out_w, + cum_in_y=cum_in_y, cum_in_x=cum_in_x, + cum_out_y=cum_out_y, cum_out_x=cum_out_x, + depth_y=depth_y, depth_x=depth_x) + return da.map_blocks(fn, src, chunks=(out_y, out_x), + dtype=cupy.float32, meta=meta) + + +# -- Public API -------------------------------------------------------------- + +@supports_dataset +def resample( + agg, + scale_factor=None, + target_resolution=None, + method='nearest', + name='resample', +): + """Change raster resolution without changing 
its CRS. + + Exactly one of *scale_factor* or *target_resolution* must be given. + + Parameters + ---------- + agg : xarray.DataArray + Input raster (2-D). + scale_factor : float or (float, float), optional + Multiplicative factor applied to the number of pixels. + ``0.5`` halves the pixel count (doubles the cell size); + ``2.0`` doubles the pixel count (halves the cell size). + A two-element tuple sets ``(scale_y, scale_x)`` independently. + target_resolution : float, optional + Desired cell size in the same units as the raster coordinates. + Both axes are set to this resolution. + method : str, default ``'nearest'`` + Resampling algorithm. Interpolation methods (``'nearest'``, + ``'bilinear'``, ``'cubic'``) work for both upsampling and + downsampling. Aggregation methods (``'average'``, ``'min'``, + ``'max'``, ``'median'``, ``'mode'``) only support downsampling + (scale_factor <= 1). + name : str, default ``'resample'`` + Name for the output DataArray. + + Returns + ------- + xarray.DataArray + Resampled raster with updated coordinates, ``res`` attribute, + and float32 dtype. 
+ """ + _validate_raster(agg, func_name='resample', name='agg') + + if method not in ALL_METHODS: + raise ValueError( + f"method must be one of {sorted(ALL_METHODS)}, got {method!r}" + ) + + # -- resolve scale factors ----------------------------------------------- + if (scale_factor is None) == (target_resolution is None): + raise ValueError( + "Exactly one of scale_factor or target_resolution must be given" + ) + + if target_resolution is not None: + if agg.shape[-2] < 2 or agg.shape[-1] < 2: + raise ValueError( + "target_resolution requires at least 2 pixels per dimension" + ) + res_x, res_y = calc_res(agg) + if isinstance(target_resolution, (tuple, list)): + scale_y = abs(res_y) / target_resolution[0] + scale_x = abs(res_x) / target_resolution[1] + else: + scale_y = abs(res_y) / target_resolution + scale_x = abs(res_x) / target_resolution + elif isinstance(scale_factor, (tuple, list)): + scale_y, scale_x = float(scale_factor[0]), float(scale_factor[1]) + else: + scale_y = scale_x = float(scale_factor) + + if scale_y <= 0 or scale_x <= 0: + raise ValueError( + f"Scale factors must be positive, got ({scale_y}, {scale_x})" + ) + + if method in AGGREGATE_METHODS and (scale_y > 1.0 or scale_x > 1.0): + raise ValueError( + f"Aggregate method {method!r} only supports downsampling " + f"(scale_factor <= 1.0)" + ) + + # -- fast path: identity ------------------------------------------------- + if scale_y == 1.0 and scale_x == 1.0: + out = agg.copy() + out.name = name + return out + + # -- dispatch to backend ------------------------------------------------- + mapper = ArrayTypeFunctionMapping( + numpy_func=_run_numpy, + cupy_func=_run_cupy, + dask_func=_run_dask_numpy, + dask_cupy_func=_run_dask_cupy, + ) + result_data = mapper(agg)(agg.data, scale_y, scale_x, method) + + # -- build output coordinates ------------------------------------------- + in_h, in_w = agg.shape[-2:] + out_h, out_w = _output_shape(in_h, in_w, scale_y, scale_x) + + ydim, xdim = agg.dims[-2], 
agg.dims[-1] + y_vals = np.asarray(agg[ydim].values, dtype=np.float64) + x_vals = np.asarray(agg[xdim].values, dtype=np.float64) + + def _new_coords(vals, n_out): + if len(vals) > 1: + half_first = (vals[1] - vals[0]) / 2 + half_last = (vals[-1] - vals[-2]) / 2 + else: + half_first = half_last = 0.5 + edge_start = vals[0] - half_first + edge_end = vals[-1] + half_last + px = (edge_end - edge_start) / n_out + return np.linspace(edge_start + px / 2, edge_end - px / 2, n_out), px + + new_y, py = _new_coords(y_vals, out_h) + new_x, px = _new_coords(x_vals, out_w) + + new_attrs = dict(agg.attrs) + new_attrs['res'] = (abs(px), abs(py)) + + result = xr.DataArray( + result_data, + name=name, + dims=agg.dims, + coords={ydim: new_y, xdim: new_x}, + attrs=new_attrs, + ) + + for dim in (ydim, xdim): + if dim in agg.coords and agg[dim].attrs: + result[dim].attrs = dict(agg[dim].attrs) + + return result From 617703e2f9b14c720887a703ee52c69efd535855 Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Wed, 1 Apr 2026 19:03:22 -0700 Subject: [PATCH 3/6] Add test suite for resample function (#1152) 51 tests covering API validation, output geometry, correctness against known values, NaN handling, edge cases, and backend parity for dask and cupy paths. 
---
 xrspatial/tests/test_resample.py | 325 +++++++++++++++++++++++++++++++
 1 file changed, 325 insertions(+)
 create mode 100644 xrspatial/tests/test_resample.py

diff --git a/xrspatial/tests/test_resample.py b/xrspatial/tests/test_resample.py
new file mode 100644
index 00000000..226cd0ff
--- /dev/null
+++ b/xrspatial/tests/test_resample.py
@@ -0,0 +1,325 @@
+"""Tests for xrspatial.resample."""
+from __future__ import annotations
+
+import numpy as np
+import pytest
+import xarray as xr
+
+from xrspatial.resample import resample
+from xrspatial.tests.general_checks import (
+    create_test_raster,
+    dask_array_available,
+    cuda_and_cupy_available,
+)
+
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+@pytest.fixture
+def grid_4x4():
+    """4x4 float32 grid with values 1..16."""
+    data = np.arange(1, 17, dtype=np.float32).reshape(4, 4)
+    return create_test_raster(data, backend='numpy',
+                              attrs={'res': (1.0, 1.0)})
+
+
+@pytest.fixture
+def grid_8x8():
+    """8x8 grid with smooth gradient for interpolation tests."""
+    y, x = np.mgrid[0:8, 0:8]
+    data = (y * 10 + x).astype(np.float32)
+    return create_test_raster(data, backend='numpy',
+                              attrs={'res': (1.0, 1.0)})
+
+
+@pytest.fixture
+def grid_with_nan():
+    """4x4 grid with NaN in top-left corner."""
+    data = np.arange(1, 17, dtype=np.float32).reshape(4, 4)
+    data[0, 0] = np.nan
+    return create_test_raster(data, backend='numpy',
+                              attrs={'res': (1.0, 1.0)})
+
+
+# ---------------------------------------------------------------------------
+# API validation
+# ---------------------------------------------------------------------------
+
+class TestResampleAPI:
+    def test_invalid_method(self, grid_4x4):
+        with pytest.raises(ValueError, match="method must be one of"):
+            resample(grid_4x4, scale_factor=0.5, method='invalid')
+
+    def test_neither_scale_nor_resolution(self, grid_4x4):
+        with 
pytest.raises(ValueError, match="Exactly one"): + resample(grid_4x4) + + def test_both_scale_and_resolution(self, grid_4x4): + with pytest.raises(ValueError, match="Exactly one"): + resample(grid_4x4, scale_factor=0.5, target_resolution=2.0) + + def test_negative_scale(self, grid_4x4): + with pytest.raises(ValueError, match="positive"): + resample(grid_4x4, scale_factor=-1.0) + + def test_aggregate_upsample_rejected(self, grid_4x4): + with pytest.raises(ValueError, match="downsampling"): + resample(grid_4x4, scale_factor=2.0, method='average') + + def test_identity_scale(self, grid_4x4): + out = resample(grid_4x4, scale_factor=1.0) + xr.testing.assert_identical(out.rename(grid_4x4.name), grid_4x4) + + def test_output_name(self, grid_4x4): + out = resample(grid_4x4, scale_factor=0.5, name='resampled') + assert out.name == 'resampled' + + +# --------------------------------------------------------------------------- +# Output shape & coordinates +# --------------------------------------------------------------------------- + +class TestOutputGeometry: + def test_downsample_shape(self, grid_8x8): + out = resample(grid_8x8, scale_factor=0.5) + assert out.shape == (4, 4) + + def test_upsample_shape(self, grid_4x4): + out = resample(grid_4x4, scale_factor=2.0) + assert out.shape == (8, 8) + + def test_target_resolution(self, grid_8x8): + out = resample(grid_8x8, target_resolution=2.0) + assert out.shape == (4, 4) + assert pytest.approx(out.attrs['res'][0], abs=0.01) == 2.0 + + def test_asymmetric_scale(self, grid_8x8): + out = resample(grid_8x8, scale_factor=(0.5, 0.25)) + assert out.shape == (4, 2) + + def test_coords_preserve_extent(self, grid_8x8): + """Output coords should span the same spatial extent as input.""" + out = resample(grid_8x8, scale_factor=0.5) + ydim, xdim = grid_8x8.dims[-2], grid_8x8.dims[-1] + in_y, in_x = grid_8x8[ydim].values, grid_8x8[xdim].values + out_y, out_x = out[ydim].values, out[xdim].values + + # Pixel edges should align (within half a 
pixel of old res) + in_xmin = in_x[0] - (in_x[1] - in_x[0]) / 2 + in_xmax = in_x[-1] + (in_x[-1] - in_x[-2]) / 2 + out_xmin = out_x[0] - (out_x[1] - out_x[0]) / 2 + out_xmax = out_x[-1] + (out_x[-1] - out_x[-2]) / 2 + assert pytest.approx(in_xmin, abs=1e-6) == out_xmin + assert pytest.approx(in_xmax, abs=1e-6) == out_xmax + + def test_res_attribute_updated(self, grid_8x8): + out = resample(grid_8x8, scale_factor=0.5) + assert pytest.approx(out.attrs['res'][0], abs=0.01) == 2.0 + assert pytest.approx(out.attrs['res'][1], abs=0.01) == 2.0 + + +# --------------------------------------------------------------------------- +# Correctness: known values +# --------------------------------------------------------------------------- + +class TestCorrectness: + def test_nearest_downsample(self): + data = np.array([[10, 20, 30, 40], + [50, 60, 70, 80], + [90, 100, 110, 120], + [130, 140, 150, 160]], dtype=np.float32) + agg = create_test_raster(data, attrs={'res': (1.0, 1.0)}) + out = resample(agg, scale_factor=0.5, method='nearest') + assert out.shape == (2, 2) + # Each output pixel picks the nearest input pixel + # Verify values are from the original data + for v in out.values.ravel(): + assert v in data + + def test_average_downsample_2x(self): + data = np.array([[1, 2, 3, 4], + [5, 6, 7, 8], + [9, 10, 11, 12], + [13, 14, 15, 16]], dtype=np.float32) + agg = create_test_raster(data, attrs={'res': (1.0, 1.0)}) + out = resample(agg, scale_factor=0.5, method='average') + assert out.shape == (2, 2) + expected = np.array([[3.5, 5.5], + [11.5, 13.5]], dtype=np.float32) + np.testing.assert_allclose(out.values, expected, atol=1e-5) + + def test_min_downsample(self): + data = np.array([[4, 3], [2, 1]], dtype=np.float32) + agg = create_test_raster(data, attrs={'res': (1.0, 1.0)}) + out = resample(agg, scale_factor=0.5, method='min') + assert out.shape == (1, 1) + assert out.values[0, 0] == 1.0 + + def test_max_downsample(self): + data = np.array([[4, 3], [2, 1]], dtype=np.float32) + 
agg = create_test_raster(data, attrs={'res': (1.0, 1.0)})
+        out = resample(agg, scale_factor=0.5, method='max')
+        assert out.shape == (1, 1)
+        assert out.values[0, 0] == 4.0
+
+    def test_mode_downsample(self):
+        data = np.array([[1, 1, 2, 2],
+                         [1, 1, 2, 2],
+                         [3, 3, 4, 4],
+                         [3, 3, 4, 4]], dtype=np.float32)
+        agg = create_test_raster(data, attrs={'res': (1.0, 1.0)})
+        out = resample(agg, scale_factor=0.5, method='mode')
+        expected = np.array([[1, 2], [3, 4]], dtype=np.float32)
+        np.testing.assert_array_equal(out.values, expected)
+
+    def test_bilinear_upsample_smooth(self, grid_8x8):
+        """Bilinear upsampling of a linear gradient stays well-behaved."""
+        out = resample(grid_8x8, scale_factor=2.0, method='bilinear')
+        assert out.shape == (16, 16)
+        # Bilinear reproduces a linear surface exactly in the interior;
+        # edge handling varies by backend, so only assert finite output.
+        assert np.all(np.isfinite(out.values))
+
+
+# ---------------------------------------------------------------------------
+# NaN handling
+# ---------------------------------------------------------------------------
+
+class TestNaNHandling:
+    def test_nearest_preserves_nan(self, grid_with_nan):
+        out = resample(grid_with_nan, scale_factor=2.0, method='nearest')
+        # NaN should appear in the nearest-neighbor output
+        assert np.isnan(out.values).any()
+
+    def test_average_ignores_nan(self, grid_with_nan):
+        out = resample(grid_with_nan, scale_factor=0.5, method='average')
+        # Average of a block containing NaN should use only valid pixels,
+        # so the result should not be all-NaN.
+        assert not np.isnan(out.values).all()
+
+    def test_all_nan_block_gives_nan(self):
+        data = np.full((4, 4), np.nan, dtype=np.float32)
+        agg = create_test_raster(data, attrs={'res': (1.0, 1.0)})
+        out = resample(agg, scale_factor=0.5, method='average')
+        assert np.isnan(out.values).all()
+
+    def test_bilinear_nan_handling(self, grid_with_nan):
+        out = resample(grid_with_nan, scale_factor=2.0, method='bilinear')
+        # Should produce finite values 
away from the NaN region + assert np.isfinite(out.values).any() + + +# --------------------------------------------------------------------------- +# Edge cases +# --------------------------------------------------------------------------- + +class TestEdgeCases: + def test_single_pixel(self): + data = np.array([[42.0]], dtype=np.float32) + agg = create_test_raster(data, attrs={'res': (1.0, 1.0)}) + out = resample(agg, scale_factor=2.0, method='nearest') + assert out.shape == (2, 2) + np.testing.assert_array_equal(out.values, 42.0) + + def test_single_row(self): + data = np.array([[1, 2, 3, 4]], dtype=np.float32) + agg = create_test_raster(data, attrs={'res': (1.0, 1.0)}, + dims=['y', 'x']) + out = resample(agg, scale_factor=0.5, method='nearest') + assert out.shape[1] == 2 + + def test_non_square(self): + data = np.random.rand(6, 10).astype(np.float32) + agg = create_test_raster(data, attrs={'res': (1.0, 1.0)}) + out = resample(agg, scale_factor=0.5, method='nearest') + assert out.shape == (3, 5) + + def test_very_small_scale(self): + data = np.random.rand(20, 20).astype(np.float32) + agg = create_test_raster(data, attrs={'res': (1.0, 1.0)}) + out = resample(agg, scale_factor=0.1, method='nearest') + assert out.shape == (2, 2) + + def test_dataset_input(self): + data = np.random.rand(8, 8).astype(np.float32) + ds = xr.Dataset({ + 'band1': create_test_raster(data, attrs={'res': (1.0, 1.0)}), + 'band2': create_test_raster(data * 2, attrs={'res': (1.0, 1.0)}), + }) + out = resample(ds, scale_factor=0.5, method='nearest') + assert isinstance(out, xr.Dataset) + assert out['band1'].shape == (4, 4) + assert out['band2'].shape == (4, 4) + + +# --------------------------------------------------------------------------- +# Dask parity +# --------------------------------------------------------------------------- + +@dask_array_available +class TestDaskParity: + """Verify dask+numpy results match pure numpy for all methods.""" + + @pytest.fixture + def 
numpy_and_dask_rasters(self): + data = np.random.RandomState(1152).rand(20, 20).astype(np.float32) + np_agg = create_test_raster(data, backend='numpy', + attrs={'res': (1.0, 1.0)}) + dk_agg = create_test_raster(data, backend='dask+numpy', + attrs={'res': (1.0, 1.0)}, + chunks=(8, 8)) + return np_agg, dk_agg + + @pytest.mark.parametrize('method', ['nearest', 'bilinear', 'cubic']) + @pytest.mark.parametrize('sf', [0.5, 2.0, 0.7]) + def test_interp_parity(self, numpy_and_dask_rasters, method, sf): + np_agg, dk_agg = numpy_and_dask_rasters + np_out = resample(np_agg, scale_factor=sf, method=method) + dk_out = resample(dk_agg, scale_factor=sf, method=method) + np.testing.assert_allclose(dk_out.values, np_out.values, + atol=1e-5, equal_nan=True) + + @pytest.mark.parametrize('method', ['average', 'min', 'max', 'median', 'mode']) + def test_aggregate_parity(self, numpy_and_dask_rasters, method): + np_agg, dk_agg = numpy_and_dask_rasters + np_out = resample(np_agg, scale_factor=0.5, method=method) + dk_out = resample(dk_agg, scale_factor=0.5, method=method) + np.testing.assert_allclose(dk_out.values, np_out.values, + atol=1e-5, equal_nan=True) + + +# --------------------------------------------------------------------------- +# CuPy parity +# --------------------------------------------------------------------------- + +@cuda_and_cupy_available +class TestCuPyParity: + """Verify cupy results match numpy.""" + + @pytest.fixture + def numpy_and_cupy_rasters(self): + data = np.random.RandomState(1152).rand(12, 12).astype(np.float32) + np_agg = create_test_raster(data, backend='numpy', + attrs={'res': (1.0, 1.0)}) + cp_agg = create_test_raster(data, backend='cupy', + attrs={'res': (1.0, 1.0)}) + return np_agg, cp_agg + + @pytest.mark.parametrize('method', ['nearest', 'bilinear', 'cubic']) + @pytest.mark.parametrize('sf', [0.5, 2.0]) + def test_interp_parity(self, numpy_and_cupy_rasters, method, sf): + np_agg, cp_agg = numpy_and_cupy_rasters + np_out = resample(np_agg, 
scale_factor=sf, method=method) + cp_out = resample(cp_agg, scale_factor=sf, method=method) + np.testing.assert_allclose(cp_out.data.get(), np_out.values, + atol=1e-4, equal_nan=True) + + @pytest.mark.parametrize('method', ['average', 'min', 'max']) + def test_aggregate_parity(self, numpy_and_cupy_rasters, method): + np_agg, cp_agg = numpy_and_cupy_rasters + np_out = resample(np_agg, scale_factor=0.5, method=method) + cp_out = resample(cp_agg, scale_factor=0.5, method=method) + np.testing.assert_allclose(cp_out.data.get(), np_out.values, + atol=1e-5, equal_nan=True) From 2fb354dc1613a0aea81c4b03b2f1896d63298ba0 Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Wed, 1 Apr 2026 19:04:56 -0700 Subject: [PATCH 4/6] Add resample docs and README entry (#1152) --- README.md | 3 ++- docs/source/reference/index.rst | 1 + docs/source/reference/resample.rst | 14 ++++++++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) create mode 100644 docs/source/reference/resample.rst diff --git a/README.md b/README.md index c04b640f..70ba28ce 100644 --- a/README.md +++ b/README.md @@ -223,10 +223,11 @@ ds.xrs.open_geotiff('large_dem.tif') # read windowed to Dataset **Consistency:** 100% pixel-exact match vs rioxarray on all tested files (Landsat 8, Copernicus DEM, USGS 1-arc-second, USGS 1-meter). ----------- -### **Reproject / Merge** +### **Reproject / Merge / Resample** | Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray | |:----------:|:------------|:------:|:----------------------:|:--------------------:|:-------------------:|:------:| +| [Resample](xrspatial/resample.py) | Changes raster resolution (cell size) without reprojection. Nearest, bilinear, cubic, average, mode, min, max, median methods | Standard (interpolation / block aggregation) | ✅️ | ✅️ | ✅️ | ✅️ | | [Reproject](xrspatial/reproject/__init__.py) | Reprojects a raster to a new CRS with Numba JIT / CUDA coordinate transforms and resampling. 
Supports vertical datums (EGM96, EGM2008) and horizontal datum shifts (NAD27, OSGB36, etc.) | Standard (inverse mapping) | ✅️ | ✅️ | ✅️ | ✅️ | | [Merge](xrspatial/reproject/__init__.py) | Merges multiple rasters into a single mosaic with configurable overlap strategy | Standard (mosaic) | ✅️ | ✅️ | 🔄 | 🔄 | diff --git a/docs/source/reference/index.rst b/docs/source/reference/index.rst index a268b5cd..521c0150 100644 --- a/docs/source/reference/index.rst +++ b/docs/source/reference/index.rst @@ -22,6 +22,7 @@ Reference multispectral pathfinding proximity + resample surface terrain_metrics utilities diff --git a/docs/source/reference/resample.rst b/docs/source/reference/resample.rst new file mode 100644 index 00000000..d92b3e44 --- /dev/null +++ b/docs/source/reference/resample.rst @@ -0,0 +1,14 @@ +.. _reference.resample: + +******** +Resample +******** + +Change raster resolution (cell size) without changing its CRS. + +Resample +======== +.. autosummary:: + :toctree: _autosummary + + xrspatial.resample.resample From 951b68332a4a74d244ae47679f829ad78800ee03 Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Wed, 1 Apr 2026 19:59:30 -0700 Subject: [PATCH 5/6] Add resample user guide notebook (#1152) --- examples/user_guide/48_Resample.ipynb | 210 ++++++++++++++++++++++++++ 1 file changed, 210 insertions(+) create mode 100644 examples/user_guide/48_Resample.ipynb diff --git a/examples/user_guide/48_Resample.ipynb b/examples/user_guide/48_Resample.ipynb new file mode 100644 index 00000000..67c9e90f --- /dev/null +++ b/examples/user_guide/48_Resample.ipynb @@ -0,0 +1,210 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Raster Resampling\n", + "\n", + "The `resample` function changes a raster's resolution (cell size) without\n", + "changing its CRS. 
This is the operation you'd reach for when you need to\n", + "match two rasters to a common grid or reduce a raster's memory footprint\n", + "before analysis.\n", + "\n", + "**Methods**:\n", + "\n", + "| Method | Direction | Best for |\n", + "|--------|-----------|----------|\n", + "| `nearest` | up/down | Categorical data, fast preview |\n", + "| `bilinear` | up/down | Smooth continuous surfaces |\n", + "| `cubic` | up/down | High-quality continuous surfaces |\n", + "| `average` | down only | Aggregating high-res to low-res |\n", + "| `min`, `max` | down only | Extremes within each output cell |\n", + "| `median` | down only | Robust centre, ignores outliers |\n", + "| `mode` | down only | Majority class in categorical rasters |" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import xarray as xr\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from xrspatial import resample\n", + "from xrspatial.terrain import generate_terrain" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Generate synthetic terrain" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dem = generate_terrain(width=200, height=200)\n", + "# Assign a regular coordinate grid\n", + "dem = dem.assign_coords(\n", + " y=np.linspace(100, 0, dem.sizes['y']),\n", + " x=np.linspace(0, 100, dem.sizes['x']),\n", + ")\n", + "dem.attrs['res'] = (0.5, 0.5)\n", + "\n", + "fig, ax = plt.subplots(figsize=(6, 5))\n", + "dem.plot(ax=ax, cmap='terrain')\n", + "ax.set_title(f'Original DEM ({dem.shape[0]}x{dem.shape[1]}, res={dem.attrs[\"res\"][0]:.1f}m)')\n", + "plt.tight_layout()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Downsample with `scale_factor`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "down = resample(dem, 
scale_factor=0.25, method='bilinear')\n", + "\n", + "fig, axes = plt.subplots(1, 2, figsize=(12, 5))\n", + "dem.plot(ax=axes[0], cmap='terrain')\n", + "axes[0].set_title(f'Original ({dem.shape[0]}x{dem.shape[1]})')\n", + "down.plot(ax=axes[1], cmap='terrain')\n", + "axes[1].set_title(f'Downsampled 4x ({down.shape[0]}x{down.shape[1]})')\n", + "plt.tight_layout()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Upsample with `target_resolution`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "up = resample(down, target_resolution=0.5, method='cubic')\n", + "\n", + "fig, axes = plt.subplots(1, 2, figsize=(12, 5))\n", + "down.plot(ax=axes[0], cmap='terrain')\n", + "axes[0].set_title(f'Coarse ({down.shape[0]}x{down.shape[1]})')\n", + "up.plot(ax=axes[1], cmap='terrain')\n", + "axes[1].set_title(f'Upsampled to 0.5m ({up.shape[0]}x{up.shape[1]})')\n", + "plt.tight_layout()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Compare resampling methods" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "methods = ['nearest', 'bilinear', 'cubic', 'average']\n", + "fig, axes = plt.subplots(1, 4, figsize=(16, 4))\n", + "\n", + "for ax, method in zip(axes, methods):\n", + " out = resample(dem, scale_factor=0.1, method=method)\n", + " out.plot(ax=ax, cmap='terrain', add_colorbar=False)\n", + " ax.set_title(method)\n", + " ax.set_aspect('equal')\n", + "\n", + "plt.suptitle('Downsample 10x with different methods', y=1.02)\n", + "plt.tight_layout()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Categorical raster with `mode`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from xrspatial import equal_interval\n", + "\n", + "# Classify elevation into 5 zones\n", + "classes = equal_interval(dem, 
k=5)\n", + "classes.attrs = dem.attrs.copy()\n", + "classes = classes.assign_coords(dem.coords)\n", + "\n", + "# Downsample: mode preserves class boundaries\n", + "classes_down = resample(classes.astype('float32'),\n", + " scale_factor=0.2, method='mode')\n", + "\n", + "fig, axes = plt.subplots(1, 2, figsize=(12, 5))\n", + "classes.plot(ax=axes[0], cmap='Set2')\n", + "axes[0].set_title(f'Classes ({classes.shape[0]}x{classes.shape[1]})')\n", + "classes_down.plot(ax=axes[1], cmap='Set2')\n", + "axes[1].set_title(f'Mode downsample ({classes_down.shape[0]}x{classes_down.shape[1]})')\n", + "plt.tight_layout()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Works with Dask" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import dask.array as da\n", + "\n", + "dask_dem = dem.copy()\n", + "dask_dem.data = da.from_array(dem.values, chunks=(100, 100))\n", + "\n", + "result = resample(dask_dem, scale_factor=0.5, method='bilinear')\n", + "print(f'Input: {dask_dem.shape} (dask, chunks={dask_dem.data.chunksize})')\n", + "print(f'Output: {result.shape} (dask, chunks={result.data.chunksize})')\n", + "print(f'Computed shape: {result.compute().shape}')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.14.0" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From d750c6714774566ff0f277e20a78b37c39a918e6 Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Mon, 6 Apr 2026 19:31:30 -0700 Subject: [PATCH 6/6] Renumber resample notebook to 50 to avoid conflicts (#1152) Notebooks 48 and 49 were taken by Sieve Filter, Visibility Analysis, and KDE after the rebase. 
--- examples/user_guide/{48_Resample.ipynb => 50_Resample.ipynb} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename examples/user_guide/{48_Resample.ipynb => 50_Resample.ipynb} (100%) diff --git a/examples/user_guide/48_Resample.ipynb b/examples/user_guide/50_Resample.ipynb similarity index 100% rename from examples/user_guide/48_Resample.ipynb rename to examples/user_guide/50_Resample.ipynb
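
For readers double-checking the block-aggregation numbers in the test patch above: the 2x `average` expectation can be reproduced with plain NumPy, independent of xrspatial (a sketch of the arithmetic, not library code):

```python
import numpy as np

# Same 4x4 input as test_average_downsample_2x.
data = np.arange(1, 17, dtype=np.float32).reshape(4, 4)

# Downsampling by scale_factor=0.5 with method='average' collapses each
# 2x2 block to its mean: reshape to (2, 2, 2, 2) and average the
# within-block axes.
expected = data.reshape(2, 2, 2, 2).mean(axis=(1, 3))
print(expected)
# [[ 3.5  5.5]
#  [11.5 13.5]]
```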