diff --git a/.claude/sweep-metadata-state.csv b/.claude/sweep-metadata-state.csv
index 8e41af0ad..d1f3f8538 100644
--- a/.claude/sweep-metadata-state.csv
+++ b/.claude/sweep-metadata-state.csv
@@ -1,4 +1,5 @@
 module,last_inspected,issue,severity_max,categories_found,notes
+aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings."
 geotiff,2026-05-18,1909,HIGH,4;5,"Re-audit 2026-05-15 (agent-a55b69cec1ef2a092 worktree, branch deep-sweep-metadata-geotiff-2026-05-15). 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity reverified after the #1813 modular refactor: full reads, windowed reads, multi-band, band=N selection, no-georef integer pixel coords, crs/crs_wkt/transform/nodata/x_resolution/y_resolution/resolution_unit/image_description/gdal_metadata all agree across backends. DataArray .name and dims agree (y, x for 2D; y, x, band for 3D). NEW HIGH finding #1909: GDS chunked GPU path (_read_geotiff_gpu_chunked_gds) declared the dask graph dtype as float64 when source had an in-range integer nodata sentinel, matching the CPU dask path's #1597 contract, but the per-chunk _chunk_task did not cast its returned cupy array to declared_dtype -- chunks with no sentinel hit returned the raw uint16/int16 source dtype, producing a silent declared/actual dtype mismatch. Fix mirrors the #1597 + #1624 CPU dask pattern: compute declared_dtype before defining _chunk_task, cast inside the task only when arr.dtype != declared_dtype to skip the no-op astype(copy=True). 6 regression tests added in test_chunked_gpu_declared_dtype_1909.py covering declared vs computed parity, CPU/GPU dask declared-dtype agreement, eager paths preserve source dtype, no-nodata round-trip, explicit dtype= kwarg, and sentinel-hit float64 promotion. Pre-existing test failures in test_predictor2_big_endian_gpu_1517.py and test_size_param_validation_gpu_vrt_1776.py exist on main (read_to_array AttributeError after #1813 refactor, tile_size=4 rejected by stricter _validate_tile_size_arg) and are unrelated to this audit. | Re-audited 2026-05-18 (agent-a59a61958f181c31a worktree, branch deep-sweep-metadata-geotiff-2026-05-18). 4-backend (numpy / cupy / dask+numpy / dask+cupy) metadata parity reverified end-to-end: open_geotiff over a tiled uint16 fixture with crs + transform + GDAL_NODATA sentinel emits identical attrs across all 4 backends (crs=32633, crs_wkt, transform 6-tuple, nodata=5, masked_nodata=True, _xrspatial_geotiff_contract=2, extra_tags, image_description, resolution_unit, x_resolution, y_resolution). Multi-band 3D (y, x, band) with band coord, no-georef int64 pixel coords, windowed reads with transform origin shift, and mask_nodata=False keeping integer dtype all agree across the 4 backends. Write round-trip via to_geotiff (numpy, cupy, dask streaming) re-emits crs / transform / nodata / masked_nodata / contract version with byte-stable transform. Band-first (band, y, x) input correctly remaps to (y, x, band) on disk. _populate_attrs_from_geo_info, _set_nodata_attrs, and _extract_rich_tags centralise attrs emission across all read paths (_init_, _backends/dask, _backends/gpu, _backends/vrt) and write paths (_writers/eager, _writers/gpu, _writers/vrt). _ATTRS_CONTRACT_VERSION=2 is stamped on every path including the chunked GPU GDS and chunked VRT inline-attrs branches. No new CRITICAL/HIGH/MEDIUM/LOW findings."
 polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)."
 rasterize,2026-05-27,2504,HIGH,4,"rasterize() drops like.attrs, rebuilds like.coords via linspace (not bit-identical), and never emits _FillValue/nodatavals even when fill is non-NaN. Cat 1 HIGH: chained pipelines like slope(rasterize(gdf, like=elevation)) silently lose crs/res/transform. Cat 2 MEDIUM: linspace round-trip from re-derived bounds breaks xr.align with like. Cat 4 MEDIUM: rasterize(..., fill=-9999, dtype=int32) emits no _FillValue. All 4 backends share the same final return so the fix is one place. Fixed in deep-sweep-metadata-rasterize-2026-05-17-01 (worktree agent-ab7a9aee97c1e4cdf): _extract_grid_from_like now returns coords/attrs; rasterize() reuses like.coords directly when grid matches, copies like.attrs, and emits _FillValue + nodatavals when fill is not NaN. 9 new tests in TestMetadataPropagation cover attrs propagation, bit-identical coord reuse, fill-value emission, isolation from template attrs, and parity across numpy/cupy/dask+numpy/dask+cupy backends. Full test suite (193 passing) clean. | Re-audited 2026-05-21 (agent-a645dc07f847ae8ae worktree, branch deep-sweep-metadata-rasterize-2026-05-21). 4-backend (numpy/cupy/dask+numpy/dask+cupy) metadata parity reverified: all 4 backends route through the same final xr.DataArray constructor in rasterize(); crs / spatial_ref non-dim coord / coords / dims agree across backends. NEW HIGH finding #2251 (Cat 1): when rasterize(geoms, like=template, bounds=..., width=..., height=..., resolution=...) overrides the grid relative to like, the inherited attrs['transform'] and attrs['res'] from like are propagated unchanged so they describe the template's grid, not the actual output. get_dataarray_resolution() prefers attrs['res'] over calc_res from coords, so downstream slope/aspect/proximity see the wrong cellsize. Same class as #1407 sky_view_factor bug. Fix in rasterize(): out_attrs.pop('res') / out_attrs.pop('transform') when like_attrs is present but reuse_like_coords is False (output grid != template grid). Preserves crs / nodata triplet / spatial_ref handling. 9 new tests in TestLikeStaleGridAttrs2251 cover bounds override, width/height override, resolution override, matching width/height preserves attrs, get_dataarray_resolution consistency, and parity across all 4 backends. Full rasterize test suite (224 passed, 2 skipped) clean. | Re-audited 2026-05-27 (agent-ae44e871ba3e6bc50 worktree, branch deep-sweep-metadata-rasterize-2026-05-27). 4-backend (numpy/cupy/dask+numpy/dask+cupy) metadata parity reverified end-to-end with explicit cupy and dask+cupy live runs on the CUDA host. attrs / coords / dims / non-dim coords (spatial_ref) all agree across backends; the existing TestMetadataPropagation and TestLikeStaleGridAttrs2251 suites still pass cleanly. NEW HIGH finding #2504 (Cat 4): rasterize(..., dtype=) with the default fill=np.nan silently coerced NaN to a platform-specific sentinel (INT_MIN on x86, 0 on Apple Silicon, 0 for unsigned dtypes) and emitted no _FillValue / nodata / nodatavals attr to mark unwritten pixels. Downstream consumers (geotiff writer, rioxarray masks) had no sentinel to key off and treated unwritten cells as legitimate burns -- a metadata propagation failure equivalent in shape to #1407. Fix in rasterize() before any host/device allocation: detect NaN fill against an integer final_dtype via np.issubdtype + float(fill) + np.isnan and raise ValueError with a pointer to fill=0/fill=-9999 or a floating dtype. Same guard fires on all 4 backends because it runs before backend dispatch. 18 new tests in test_rasterize_nan_int_fill_2504.py cover every signed/unsigned int width, the like= branch, all 4 backends, explicit-vs-default NaN, numpy-typed NaN, and the unaffected float-dtype path. The previous TestIntegerDtypeNanFill test (which had pinned the silent cast as observed behaviour on 2026-05-17) was rewritten to pin the raise. Full rasterize test suite (476 passed, 2 skipped) clean."
diff --git a/xrspatial/aspect.py b/xrspatial/aspect.py
index 55f0943dd..01f7e95b2 100644
--- a/xrspatial/aspect.py
+++ b/xrspatial/aspect.py
@@ -169,7 +169,7 @@ def _run_dask_numpy(data: da.Array, boundary: str = 'nan') -> da.Array:
     out = data.map_overlap(_func,
                            depth=(1, 1),
                            boundary=_boundary_to_dask(boundary),
-                           meta=np.array(()))
+                           meta=np.array((), dtype=np.float32))
     return out
 
 
@@ -179,7 +179,7 @@ def _run_dask_cupy(data: da.Array, boundary: str = 'nan') -> da.Array:
     out = data.map_overlap(_func,
                            depth=(1, 1),
                            boundary=_boundary_to_dask(boundary, is_cupy=True),
-                           meta=cupy.array(()))
+                           meta=cupy.array((), dtype=cupy.float32))
     return out
 
 
diff --git a/xrspatial/tests/test_aspect.py b/xrspatial/tests/test_aspect.py
index a7c86cb72..85065c0a0 100644
--- a/xrspatial/tests/test_aspect.py
+++ b/xrspatial/tests/test_aspect.py
@@ -143,3 +143,29 @@ def test_boundary_invalid():
     agg = create_test_raster(data)
     with pytest.raises(ValueError, match="boundary must be one of"):
         aspect(agg, boundary='invalid')
+
+
+@dask_array_available
+@pytest.mark.parametrize("boundary", ['nan', 'nearest', 'reflect', 'wrap'])
+def test_dask_numpy_advertised_dtype_matches_computed(boundary):
+    # planar dask map_overlap must advertise float32, matching the realized
+    # data and the numpy/cupy backends (issue #2682).
+    data = np.random.default_rng(0).random((8, 10)).astype(np.float64) * 100
+    numpy_agg = create_test_raster(data, backend='numpy')
+    dask_agg = create_test_raster(data, backend='dask+numpy')
+    np_result = aspect(numpy_agg, boundary=boundary)
+    da_result = aspect(dask_agg, boundary=boundary)
+    assert np_result.dtype == np.float32
+    assert da_result.dtype == np.float32
+    assert da_result.data.compute().dtype == np.float32
+
+
+@dask_array_available
+@cuda_and_cupy_available
+def test_dask_cupy_advertised_dtype_matches_computed():
+    import cupy
+    data = np.random.default_rng(0).random((8, 10)).astype(np.float64) * 100
+    dask_cupy_agg = create_test_raster(data, backend='dask+cupy')
+    da_result = aspect(dask_cupy_agg)
+    assert da_result.dtype == cupy.float32
+    assert da_result.data.compute().dtype == cupy.float32
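For reviewers, a minimal standalone reproduction of the #2682 pattern, assuming only numpy and dask are installed. `_chunk_to_float32` is a hypothetical stand-in for aspect's per-chunk `_cpu` kernel (it just casts to float32); it is not xrspatial code. When no `dtype=` is passed, `map_overlap` takes the advertised dtype from `meta`, so a bare `np.array(())` (float64) disagrees with the float32 chunks until the meta itself is typed.

```python
import numpy as np
import dask.array as da


def _chunk_to_float32(block):
    # Hypothetical stand-in for a per-chunk kernel such as aspect's _cpu:
    # whatever the input dtype, each chunk comes back as float32.
    return block.astype(np.float32)


data = da.from_array(np.arange(64, dtype=np.float64).reshape(8, 8), chunks=4)

# Default-dtype meta: np.array(()) is float64, so the lazy array
# advertises float64 even though every computed chunk is float32.
lazy_default = data.map_overlap(_chunk_to_float32, depth=(1, 1),
                                boundary='reflect', meta=np.array(()))

# Typed meta (the shape of the fix): advertised dtype matches compute().
lazy_typed = data.map_overlap(_chunk_to_float32, depth=(1, 1),
                              boundary='reflect',
                              meta=np.array((), dtype=np.float32))

print("default meta:", lazy_default.dtype, "->", lazy_default.compute().dtype)
print("typed meta:  ", lazy_typed.dtype, "->", lazy_typed.compute().dtype)
```

Passing `dtype=np.float32` to `map_overlap` would be another way to pin the advertised dtype; this PR instead types the `meta=` argument the module already passes.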
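Similarly, a sketch of the #2504 guard as the rasterize notes describe it (np.issubdtype + float(fill) + np.isnan, raising ValueError before any allocation). `check_fill_for_dtype` and its error wording are hypothetical; the real check lives inside rasterize() and its message is not shown in this diff.

```python
import numpy as np


def check_fill_for_dtype(fill, dtype):
    """Hypothetical sketch: reject a NaN fill for an integer output dtype
    instead of letting the cast silently pick a platform-specific sentinel."""
    final_dtype = np.dtype(dtype)
    if np.issubdtype(final_dtype, np.integer):
        try:
            is_nan = bool(np.isnan(float(fill)))
        except (TypeError, ValueError):
            # Non-numeric fill: not a NaN, leave it to the normal cast path.
            is_nan = False
        if is_nan:
            raise ValueError(
                f"fill=NaN cannot be represented in integer dtype {final_dtype}; "
                "pass an integer fill such as fill=0 or fill=-9999, "
                "or use a floating dtype."
            )
    return final_dtype


# Float output keeps NaN as the fill; integer output needs a real sentinel.
print(check_fill_for_dtype(np.nan, np.float32))
print(check_fill_for_dtype(-9999, np.int32))
```

Running the check before backend dispatch is what lets a single guard cover numpy, cupy, dask+numpy and dask+cupy alike.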