From 252146530bea143aaf2e2baa856005f2512d018d Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Wed, 10 Jun 2026 05:43:41 -0700 Subject: [PATCH 1/2] Preserve input attrs in mcda.constrain() (#3147) xr.where takes attrs from its first value argument (the scalar fill), so any non-empty exclude list stripped res/crs/nodatavals from the constrained suitability surface. Restore the input's attrs on the output and add regression tests for numpy, dask+numpy, and dask+cupy. Also record the mcda metadata sweep result in the sweep state CSV. --- .claude/sweep-metadata-state.csv | 1 + xrspatial/mcda/constrain.py | 5 ++ xrspatial/tests/test_mcda.py | 84 ++++++++++++++++++++++++++++++++ 3 files changed, 90 insertions(+) diff --git a/.claude/sweep-metadata-state.csv b/.claude/sweep-metadata-state.csv index 3b0954457..1e63982c0 100644 --- a/.claude/sweep-metadata-state.csv +++ b/.claude/sweep-metadata-state.csv @@ -3,6 +3,7 @@ aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb w contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)." focal,2026-05-29,2733,MEDIUM,5,"Audited 2026-05-29 (agent-a3ec617d177775ea8 worktree, branch deep-sweep-metadata-focal-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. 4 public functions checked end-to-end: mean, apply, focal_stats, hotspots. attrs (res/crs/nodatavals), coords (x/y + stats), and dims preserved consistently across all 4 backends for every function; focal_stats correctly adds the documented stats dim; hotspots adds unit=% via deepcopy without clobbering input attrs. Cat 1-4 clean. NEW MEDIUM finding #2733 (Cat 5): focal_stats and hotspots returned a .name that differed across backends -- the dask paths built the output DataArray without an explicit name= so xarray adopted the dask array internal graph token (_trim-, non-deterministic per call) as the public .name. focal_stats: numpy/dask+numpy gave focal_apply, cupy gave None, dask+cupy gave _trim-. hotspots: numpy/cupy gave None, dask paths gave _trim-. Same class as zonal #2611. Fix: focal_stats sets result.name=focal_apply (matching the established numpy contract) after construction; hotspots passes name=hotspots. Setting name= at the dask DataArray constructor does not override the graph name, so focal_stats assigns result.name post-construction. 2 new parametrized tests (test_focal_stats_name_consistent_across_backends, test_hotspots_name_consistent_across_backends) cover all 4 backends each. Full focal suite 122 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings." geotiff,2026-05-18,1909,HIGH,4;5,"Re-audit 2026-05-15 (agent-a55b69cec1ef2a092 worktree, branch deep-sweep-metadata-geotiff-2026-05-15). 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity reverified after the #1813 modular refactor: full reads, windowed reads, multi-band, band=N selection, no-georef integer pixel coords, crs/crs_wkt/transform/nodata/x_resolution/y_resolution/resolution_unit/image_description/gdal_metadata all agree across backends. DataArray .name and dims agree (y, x for 2D; y, x, band for 3D). NEW HIGH finding #1909: GDS chunked GPU path (_read_geotiff_gpu_chunked_gds) declared the dask graph dtype as float64 when source had an in-range integer nodata sentinel, matching the CPU dask path's #1597 contract, but the per-chunk _chunk_task did not cast its returned cupy array to declared_dtype -- chunks with no sentinel hit returned the raw uint16/int16 source dtype, producing a silent declared/actual dtype mismatch. Fix mirrors the #1597 + #1624 CPU dask pattern: compute declared_dtype before defining _chunk_task, cast inside the task only when arr.dtype != declared_dtype to skip the no-op astype(copy=True). 6 regression tests added in test_chunked_gpu_declared_dtype_1909.py covering declared vs computed parity, CPU/GPU dask declared-dtype agreement, eager paths preserve source dtype, no-nodata round-trip, explicit dtype= kwarg, and sentinel-hit float64 promotion. Pre-existing test failures in test_predictor2_big_endian_gpu_1517.py and test_size_param_validation_gpu_vrt_1776.py exist on main (read_to_array AttributeError after #1813 refactor, tile_size=4 rejected by stricter _validate_tile_size_arg) and are unrelated to this audit. | Re-audited 2026-05-18 (agent-a59a61958f181c31a worktree, branch deep-sweep-metadata-geotiff-2026-05-18). 4-backend (numpy / cupy / dask+numpy / dask+cupy) metadata parity reverified end-to-end: open_geotiff over a tiled uint16 fixture with crs + transform + GDAL_NODATA sentinel emits identical attrs across all 4 backends (crs=32633, crs_wkt, transform 6-tuple, nodata=5, masked_nodata=True, _xrspatial_geotiff_contract=2, extra_tags, image_description, resolution_unit, x_resolution, y_resolution). Multi-band 3D (y, x, band) with band coord, no-georef int64 pixel coords, windowed reads with transform origin shift, and mask_nodata=False keeping integer dtype all agree across the 4 backends. Write round-trip via to_geotiff (numpy, cupy, dask streaming) re-emits crs / transform / nodata / masked_nodata / contract version with byte-stable transform. Band-first (band, y, x) input correctly remaps to (y, x, band) on disk. _populate_attrs_from_geo_info, _set_nodata_attrs, and _extract_rich_tags centralise attrs emission across all read paths (_init_, _backends/dask, _backends/gpu, _backends/vrt) and write paths (_writers/eager, _writers/gpu, _writers/vrt). _ATTRS_CONTRACT_VERSION=2 is stamped on every path including the chunked GPU GDS and chunked VRT inline-attrs branches. No new CRITICAL/HIGH/MEDIUM/LOW findings." +mcda,2026-06-10,3147,HIGH,1,"constrain() dropped all attrs (res/crs/nodatavals) whenever exclude non-empty (xr.where takes attrs from scalar fill); fixed via attrs restore, tests for numpy/dask/dask+cupy. All other mcda funcs keep attrs/coords/dims on all 4 backends. Out-of-scope crashes noted for backend-parity: owa broken on cupy (numpy order-weights x cupy) and on dask (da.sort does not exist); sensitivity monte_carlo crashes on cupy/dask+cupy (.values on cupy); xr.where compute on cupy/dask+cupy hits known cupy13.6/xarray2025.12 incompat." polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)." proximity,2026-05-29,2723,MEDIUM,4;5,"Audited 2026-05-29 (agent-a61dbadc2452a2003 worktree, branch deep-sweep-metadata-proximity-2026-05-29). CUDA+cupy available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for proximity/allocation/direction, both bounded (finite max_distance) and unbounded. Cat 1 (attrs res/crs/transform/nodatavals/_FillValue), Cat 2 (coords + coord dtype), and Cat 3 (dims) all preserved and identical across the 4 backends -- public funcs wrap with xr.DataArray(coords=raster.coords, dims=raster.dims, attrs=raster.attrs). NEW MEDIUM finding #2723 (Cat 4 + Cat 5): (a) bounded dask+numpy path (_process_dask -> da.map_overlap with meta=np.array(())) declared output dtype float64 while the chunk fn returns float32 and numpy/cupy/dask+cupy + the unbounded KDTree path all declare float32; docstrings show dtype=float32. Fix: meta=np.array((), dtype=np.float32). (b) dask backends leaked an internal dask op name (_trim-, _kdtree_chunk_fn-, asarray-) into result.name while numpy/cupy return None. Fix: assign result.name=None after construction in all 3 public funcs (xarray ignores a name=None kwarg for named dask arrays, so the reset must happen post-construction). Same .name-leak class as zonal #2611. PR #2728 off child branch deep-sweep-metadata-proximity-2026-05-29-01. New parametrized regression test test_output_metadata_consistent_across_backends asserts declared dtype float32 + name None across all 4 backends x 3 funcs x bounded/unbounded; full test_proximity.py suite 93 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings." rasterize,2026-06-09,3087,MEDIUM,1,GeoDataFrame .crs dropped on no-like path (Cat 1); fixed via #3087 emitting attrs crs/crs_wkt when output has no CRS. like-path attrs/coords/dims/nodata verified live on all 4 backends (CUDA available); Cats 2-5 clean. diff --git a/xrspatial/mcda/constrain.py b/xrspatial/mcda/constrain.py index 59275ffe2..c8cff0db5 100644 --- a/xrspatial/mcda/constrain.py +++ b/xrspatial/mcda/constrain.py @@ -39,5 +39,10 @@ def constrain( for mask in exclude: result = xr.where(mask, fill, result) + # xr.where takes attrs from its first value argument (the scalar + # ``fill``), which strips res/crs/nodatavals from the output. + # Restore the input's attrs so the constrained surface stays + # georeferenced (#3147). + result.attrs = dict(suitability.attrs) result.name = name return result diff --git a/xrspatial/tests/test_mcda.py b/xrspatial/tests/test_mcda.py index af7f84198..ec828946b 100644 --- a/xrspatial/tests/test_mcda.py +++ b/xrspatial/tests/test_mcda.py @@ -1203,6 +1203,90 @@ def test_all_zeros_raises(self): wpm(ds, {"a": 0.5, "b": 0.5}) +class TestConstrainAttrs: + """constrain must keep the input's attrs (#3147). + + xr.where takes attrs from its first value argument (the scalar + fill), which used to strip res/crs/nodatavals whenever exclude + was non-empty. + """ + + ATTRS = {"res": (1.0, 1.0), "crs": "EPSG:32611", "nodatavals": (-9999.0,)} + + def _suit_and_mask(self): + suit = xr.DataArray( + np.array([[0.8, 0.6, 0.9]], dtype=np.float64), + dims=["y", "x"], + coords={"y": [10.5], "x": [100.5, 101.5, 102.5]}, + attrs=dict(self.ATTRS), + name="suit", + ) + mask = xr.DataArray( + np.array([[True, False, False]]), + dims=["y", "x"], + coords={"y": [10.5], "x": [100.5, 101.5, 102.5]}, + ) + return suit, mask + + def test_attrs_kept_with_mask(self): + suit, mask = self._suit_and_mask() + result = constrain(suit, exclude=[mask]) + assert result.attrs == self.ATTRS + assert result.dims == suit.dims + np.testing.assert_array_equal(result["x"].values, suit["x"].values) + + def test_attrs_kept_with_custom_fill(self): + suit, mask = self._suit_and_mask() + result = constrain(suit, exclude=[mask], fill=-1.0) + assert result.attrs == self.ATTRS + + def test_attrs_kept_with_empty_exclude(self): + suit, _ = self._suit_and_mask() + result = constrain(suit, exclude=[]) + assert result.attrs == self.ATTRS + + def test_attrs_not_shared_with_input(self): + suit, mask = self._suit_and_mask() + result = constrain(suit, exclude=[mask]) + result.attrs["crs"] = "EPSG:4326" + assert suit.attrs["crs"] == "EPSG:32611" + + @pytest.mark.skipif(not HAS_DASK, reason="Requires dask") + def test_attrs_kept_dask(self): + suit, mask = self._suit_and_mask() + result = constrain( + suit.chunk({"x": 2}), exclude=[mask.chunk({"x": 2})], + ) + assert result.attrs == self.ATTRS + assert hasattr(result.data, "compute") + np.testing.assert_allclose( + result.compute().values, + constrain(suit, exclude=[mask]).values, + equal_nan=True, + ) + + def test_attrs_kept_dask_cupy(self): + from xrspatial.utils import has_cuda_and_cupy + if not (HAS_DASK and has_cuda_and_cupy()): + pytest.skip("Requires dask, CUDA and CuPy") + import cupy + + suit, mask = self._suit_and_mask() + suit_gpu = suit.copy() + suit_gpu.data = cupy.asarray(suit_gpu.data) + mask_gpu = mask.copy() + mask_gpu.data = cupy.asarray(mask_gpu.data) + result = constrain( + suit_gpu.chunk({"x": 2}), exclude=[mask_gpu.chunk({"x": 2})], + ) + # Attrs must survive on the lazy result. Computing values here + # trips an unrelated cupy/xarray incompatibility inside + # xr.where (numpy scalar broadcast against cupy chunks), so the + # value-parity check stays on the numpy/dask+numpy tests above. + assert result.attrs == self.ATTRS + assert result.dims == suit.dims + + class TestConstrainEdgeCases: def test_empty_exclude_list(self): """Empty exclude list should return input unchanged.""" From 5cb58dc106e816e5e8d95b97bac30c04ae8d9bbf Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Wed, 10 Jun 2026 05:45:25 -0700 Subject: [PATCH 2/2] Document attrs preservation in constrain docstring (#3147) --- xrspatial/mcda/constrain.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/xrspatial/mcda/constrain.py b/xrspatial/mcda/constrain.py index c8cff0db5..b77768a1c 100644 --- a/xrspatial/mcda/constrain.py +++ b/xrspatial/mcda/constrain.py @@ -30,7 +30,8 @@ def constrain( Returns ------- xr.DataArray - Constrained suitability surface. + Constrained suitability surface. Keeps the input's dims, + coords, and attrs (``res``, ``crs``, ``nodatavals``, ...). """ if name is None: name = suitability.name