Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .claude/sweep-metadata-state.csv
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb w
contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)."
focal,2026-05-29,2733,MEDIUM,5,"Audited 2026-05-29 (agent-a3ec617d177775ea8 worktree, branch deep-sweep-metadata-focal-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. 4 public functions checked end-to-end: mean, apply, focal_stats, hotspots. attrs (res/crs/nodatavals), coords (x/y + stats), and dims preserved consistently across all 4 backends for every function; focal_stats correctly adds the documented stats dim; hotspots adds unit=% via deepcopy without clobbering input attrs. Cat 1-4 clean. NEW MEDIUM finding #2733 (Cat 5): focal_stats and hotspots returned a .name that differed across backends -- the dask paths built the output DataArray without an explicit name= so xarray adopted the dask array internal graph token (_trim-<hash>, non-deterministic per call) as the public .name. focal_stats: numpy/dask+numpy gave focal_apply, cupy gave None, dask+cupy gave _trim-<hash>. hotspots: numpy/cupy gave None, dask paths gave _trim-<hash>. Same class as zonal #2611. Fix: focal_stats sets result.name=focal_apply (matching the established numpy contract) after construction; hotspots passes name=hotspots. Setting name= at the dask DataArray constructor does not override the graph name, so focal_stats assigns result.name post-construction. 2 new parametrized tests (test_focal_stats_name_consistent_across_backends, test_hotspots_name_consistent_across_backends) cover all 4 backends each. Full focal suite 122 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings."
geotiff,2026-05-18,1909,HIGH,4;5,"Re-audit 2026-05-15 (agent-a55b69cec1ef2a092 worktree, branch deep-sweep-metadata-geotiff-2026-05-15). 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity reverified after the #1813 modular refactor: full reads, windowed reads, multi-band, band=N selection, no-georef integer pixel coords, crs/crs_wkt/transform/nodata/x_resolution/y_resolution/resolution_unit/image_description/gdal_metadata all agree across backends. DataArray .name and dims agree (y, x for 2D; y, x, band for 3D). NEW HIGH finding #1909: GDS chunked GPU path (_read_geotiff_gpu_chunked_gds) declared the dask graph dtype as float64 when source had an in-range integer nodata sentinel, matching the CPU dask path's #1597 contract, but the per-chunk _chunk_task did not cast its returned cupy array to declared_dtype -- chunks with no sentinel hit returned the raw uint16/int16 source dtype, producing a silent declared/actual dtype mismatch. Fix mirrors the #1597 + #1624 CPU dask pattern: compute declared_dtype before defining _chunk_task, cast inside the task only when arr.dtype != declared_dtype to skip the no-op astype(copy=True). 6 regression tests added in test_chunked_gpu_declared_dtype_1909.py covering declared vs computed parity, CPU/GPU dask declared-dtype agreement, eager paths preserve source dtype, no-nodata round-trip, explicit dtype= kwarg, and sentinel-hit float64 promotion. Pre-existing test failures in test_predictor2_big_endian_gpu_1517.py and test_size_param_validation_gpu_vrt_1776.py exist on main (read_to_array AttributeError after #1813 refactor, tile_size=4 rejected by stricter _validate_tile_size_arg) and are unrelated to this audit. | Re-audited 2026-05-18 (agent-a59a61958f181c31a worktree, branch deep-sweep-metadata-geotiff-2026-05-18). 4-backend (numpy / cupy / dask+numpy / dask+cupy) metadata parity reverified end-to-end: open_geotiff over a tiled uint16 fixture with crs + transform + GDAL_NODATA sentinel emits identical attrs across all 4 backends (crs=32633, crs_wkt, transform 6-tuple, nodata=5, masked_nodata=True, _xrspatial_geotiff_contract=2, extra_tags, image_description, resolution_unit, x_resolution, y_resolution). Multi-band 3D (y, x, band) with band coord, no-georef int64 pixel coords, windowed reads with transform origin shift, and mask_nodata=False keeping integer dtype all agree across the 4 backends. Write round-trip via to_geotiff (numpy, cupy, dask streaming) re-emits crs / transform / nodata / masked_nodata / contract version with byte-stable transform. Band-first (band, y, x) input correctly remaps to (y, x, band) on disk. _populate_attrs_from_geo_info, _set_nodata_attrs, and _extract_rich_tags centralise attrs emission across all read paths (_init_, _backends/dask, _backends/gpu, _backends/vrt) and write paths (_writers/eager, _writers/gpu, _writers/vrt). _ATTRS_CONTRACT_VERSION=2 is stamped on every path including the chunked GPU GDS and chunked VRT inline-attrs branches. No new CRITICAL/HIGH/MEDIUM/LOW findings."
mcda,2026-06-10,3147,HIGH,1,"constrain() dropped all attrs (res/crs/nodatavals) whenever exclude non-empty (xr.where takes attrs from scalar fill); fixed via attrs restore, tests for numpy/dask/dask+cupy. All other mcda funcs keep attrs/coords/dims on all 4 backends. Out-of-scope crashes noted for backend-parity: owa broken on cupy (numpy order-weights x cupy) and on dask (da.sort does not exist); sensitivity monte_carlo crashes on cupy/dask+cupy (.values on cupy); xr.where compute on cupy/dask+cupy hits known cupy13.6/xarray2025.12 incompat."
polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)."
proximity,2026-05-29,2723,MEDIUM,4;5,"Audited 2026-05-29 (agent-a61dbadc2452a2003 worktree, branch deep-sweep-metadata-proximity-2026-05-29). CUDA+cupy available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for proximity/allocation/direction, both bounded (finite max_distance) and unbounded. Cat 1 (attrs res/crs/transform/nodatavals/_FillValue), Cat 2 (coords + coord dtype), and Cat 3 (dims) all preserved and identical across the 4 backends -- public funcs wrap with xr.DataArray(coords=raster.coords, dims=raster.dims, attrs=raster.attrs). NEW MEDIUM finding #2723 (Cat 4 + Cat 5): (a) bounded dask+numpy path (_process_dask -> da.map_overlap with meta=np.array(())) declared output dtype float64 while the chunk fn returns float32 and numpy/cupy/dask+cupy + the unbounded KDTree path all declare float32; docstrings show dtype=float32. Fix: meta=np.array((), dtype=np.float32). (b) dask backends leaked an internal dask op name (_trim-<hash>, _kdtree_chunk_fn-<hash>, asarray-<hash>) into result.name while numpy/cupy return None. Fix: assign result.name=None after construction in all 3 public funcs (xarray ignores a name=None kwarg for named dask arrays, so the reset must happen post-construction). Same .name-leak class as zonal #2611. PR #2728 off child branch deep-sweep-metadata-proximity-2026-05-29-01. New parametrized regression test test_output_metadata_consistent_across_backends asserts declared dtype float32 + name None across all 4 backends x 3 funcs x bounded/unbounded; full test_proximity.py suite 93 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings."
rasterize,2026-06-09,3087,MEDIUM,1,GeoDataFrame .crs dropped on no-like path (Cat 1); fixed via #3087 emitting attrs crs/crs_wkt when output has no CRS. like-path attrs/coords/dims/nodata verified live on all 4 backends (CUDA available); Cats 2-5 clean.
Expand Down
8 changes: 7 additions & 1 deletion xrspatial/mcda/constrain.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,8 @@ def constrain(
Returns
-------
xr.DataArray
Constrained suitability surface.
Constrained suitability surface. Keeps the input's dims,
coords, and attrs (``res``, ``crs``, ``nodatavals``, ...).
"""
if name is None:
name = suitability.name
Expand All @@ -39,5 +40,10 @@ def constrain(
for mask in exclude:
result = xr.where(mask, fill, result)

# xr.where takes attrs from its first value argument (the scalar
# ``fill``), which strips res/crs/nodatavals from the output.
# Restore the input's attrs so the constrained surface stays
# georeferenced (#3147).
result.attrs = dict(suitability.attrs)
result.name = name
return result
84 changes: 84 additions & 0 deletions xrspatial/tests/test_mcda.py
Original file line number Diff line number Diff line change
Expand Up @@ -1203,6 +1203,90 @@ def test_all_zeros_raises(self):
wpm(ds, {"a": 0.5, "b": 0.5})


class TestConstrainAttrs:
"""constrain must keep the input's attrs (#3147).

xr.where takes attrs from its first value argument (the scalar
fill), which used to strip res/crs/nodatavals whenever exclude
was non-empty.
"""

ATTRS = {"res": (1.0, 1.0), "crs": "EPSG:32611", "nodatavals": (-9999.0,)}

def _suit_and_mask(self):
suit = xr.DataArray(
np.array([[0.8, 0.6, 0.9]], dtype=np.float64),
dims=["y", "x"],
coords={"y": [10.5], "x": [100.5, 101.5, 102.5]},
attrs=dict(self.ATTRS),
name="suit",
)
mask = xr.DataArray(
np.array([[True, False, False]]),
dims=["y", "x"],
coords={"y": [10.5], "x": [100.5, 101.5, 102.5]},
)
return suit, mask

def test_attrs_kept_with_mask(self):
suit, mask = self._suit_and_mask()
result = constrain(suit, exclude=[mask])
assert result.attrs == self.ATTRS
assert result.dims == suit.dims
np.testing.assert_array_equal(result["x"].values, suit["x"].values)

def test_attrs_kept_with_custom_fill(self):
suit, mask = self._suit_and_mask()
result = constrain(suit, exclude=[mask], fill=-1.0)
assert result.attrs == self.ATTRS

def test_attrs_kept_with_empty_exclude(self):
suit, _ = self._suit_and_mask()
result = constrain(suit, exclude=[])
assert result.attrs == self.ATTRS

def test_attrs_not_shared_with_input(self):
suit, mask = self._suit_and_mask()
result = constrain(suit, exclude=[mask])
result.attrs["crs"] = "EPSG:4326"
assert suit.attrs["crs"] == "EPSG:32611"

@pytest.mark.skipif(not HAS_DASK, reason="Requires dask")
def test_attrs_kept_dask(self):
suit, mask = self._suit_and_mask()
result = constrain(
suit.chunk({"x": 2}), exclude=[mask.chunk({"x": 2})],
)
assert result.attrs == self.ATTRS
assert hasattr(result.data, "compute")
np.testing.assert_allclose(
result.compute().values,
constrain(suit, exclude=[mask]).values,
equal_nan=True,
)

def test_attrs_kept_dask_cupy(self):
from xrspatial.utils import has_cuda_and_cupy
if not (HAS_DASK and has_cuda_and_cupy()):
pytest.skip("Requires dask, CUDA and CuPy")
import cupy

suit, mask = self._suit_and_mask()
suit_gpu = suit.copy()
suit_gpu.data = cupy.asarray(suit_gpu.data)
mask_gpu = mask.copy()
mask_gpu.data = cupy.asarray(mask_gpu.data)
result = constrain(
suit_gpu.chunk({"x": 2}), exclude=[mask_gpu.chunk({"x": 2})],
)
# Attrs must survive on the lazy result. Computing values here
# trips an unrelated cupy/xarray incompatibility inside
# xr.where (numpy scalar broadcast against cupy chunks), so the
# value-parity check stays on the numpy/dask+numpy tests above.
assert result.attrs == self.ATTRS
assert result.dims == suit.dims


class TestConstrainEdgeCases:
def test_empty_exclude_list(self):
"""Empty exclude list should return input unchanged."""
Expand Down
Loading