Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .claude/sweep-metadata-state.csv
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
module,last_inspected,issue,severity_max,categories_found,notes
aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings."
geotiff,2026-05-18,1909,HIGH,4;5,"Re-audit 2026-05-15 (agent-a55b69cec1ef2a092 worktree, branch deep-sweep-metadata-geotiff-2026-05-15). 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity reverified after the #1813 modular refactor: full reads, windowed reads, multi-band, band=N selection, no-georef integer pixel coords, crs/crs_wkt/transform/nodata/x_resolution/y_resolution/resolution_unit/image_description/gdal_metadata all agree across backends. DataArray .name and dims agree (y, x for 2D; y, x, band for 3D). NEW HIGH finding #1909: GDS chunked GPU path (_read_geotiff_gpu_chunked_gds) declared the dask graph dtype as float64 when source had an in-range integer nodata sentinel, matching the CPU dask path's #1597 contract, but the per-chunk _chunk_task did not cast its returned cupy array to declared_dtype -- chunks with no sentinel hit returned the raw uint16/int16 source dtype, producing a silent declared/actual dtype mismatch. Fix mirrors the #1597 + #1624 CPU dask pattern: compute declared_dtype before defining _chunk_task, cast inside the task only when arr.dtype != declared_dtype to skip the no-op astype(copy=True). 6 regression tests added in test_chunked_gpu_declared_dtype_1909.py covering declared vs computed parity, CPU/GPU dask declared-dtype agreement, eager paths preserve source dtype, no-nodata round-trip, explicit dtype= kwarg, and sentinel-hit float64 promotion. Pre-existing test failures in test_predictor2_big_endian_gpu_1517.py and test_size_param_validation_gpu_vrt_1776.py exist on main (read_to_array AttributeError after #1813 refactor, tile_size=4 rejected by stricter _validate_tile_size_arg) and are unrelated to this audit. | Re-audited 2026-05-18 (agent-a59a61958f181c31a worktree, branch deep-sweep-metadata-geotiff-2026-05-18). 4-backend (numpy / cupy / dask+numpy / dask+cupy) metadata parity reverified end-to-end: open_geotiff over a tiled uint16 fixture with crs + transform + GDAL_NODATA sentinel emits identical attrs across all 4 backends (crs=32633, crs_wkt, transform 6-tuple, nodata=5, masked_nodata=True, _xrspatial_geotiff_contract=2, extra_tags, image_description, resolution_unit, x_resolution, y_resolution). Multi-band 3D (y, x, band) with band coord, no-georef int64 pixel coords, windowed reads with transform origin shift, and mask_nodata=False keeping integer dtype all agree across the 4 backends. Write round-trip via to_geotiff (numpy, cupy, dask streaming) re-emits crs / transform / nodata / masked_nodata / contract version with byte-stable transform. Band-first (band, y, x) input correctly remaps to (y, x, band) on disk. _populate_attrs_from_geo_info, _set_nodata_attrs, and _extract_rich_tags centralise attrs emission across all read paths (_init_, _backends/dask, _backends/gpu, _backends/vrt) and write paths (_writers/eager, _writers/gpu, _writers/vrt). _ATTRS_CONTRACT_VERSION=2 is stamped on every path including the chunked GPU GDS and chunked VRT inline-attrs branches. No new CRITICAL/HIGH/MEDIUM/LOW findings."
polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)."
rasterize,2026-05-27,2504,HIGH,4,"rasterize() drops like.attrs, rebuilds like.coords via linspace (not bit-identical), and never emits _FillValue/nodatavals even when fill is non-NaN. Cat 1 HIGH: chained pipelines like slope(rasterize(gdf, like=elevation)) silently lose crs/res/transform. Cat 2 MEDIUM: linspace round-trip from re-derived bounds breaks xr.align with like. Cat 4 MEDIUM: rasterize(..., fill=-9999, dtype=int32) emits no _FillValue. All 4 backends share the same final return so the fix is one place. Fixed in deep-sweep-metadata-rasterize-2026-05-17-01 (worktree agent-ab7a9aee97c1e4cdf): _extract_grid_from_like now returns coords/attrs; rasterize() reuses like.coords directly when grid matches, copies like.attrs, and emits _FillValue + nodatavals when fill is not NaN. 9 new tests in TestMetadataPropagation cover attrs propagation, bit-identical coord reuse, fill-value emission, isolation from template attrs, and parity across numpy/cupy/dask+numpy/dask+cupy backends. Full test suite (193 passing) clean. | Re-audited 2026-05-21 (agent-a645dc07f847ae8ae worktree, branch deep-sweep-metadata-rasterize-2026-05-21). 4-backend (numpy/cupy/dask+numpy/dask+cupy) metadata parity reverified: all 4 backends route through the same final xr.DataArray constructor in rasterize(); crs / spatial_ref non-dim coord / coords / dims agree across backends. NEW HIGH finding #2251 (Cat 1): when rasterize(geoms, like=template, bounds=..., width=..., height=..., resolution=...) overrides the grid relative to like, the inherited attrs['transform'] and attrs['res'] from like are propagated unchanged so they describe the template's grid, not the actual output. get_dataarray_resolution() prefers attrs['res'] over calc_res from coords, so downstream slope/aspect/proximity see the wrong cellsize. Same class as #1407 sky_view_factor bug. Fix in rasterize(): out_attrs.pop('res') / out_attrs.pop('transform') when like_attrs is present but reuse_like_coords is False (output grid != template grid). Preserves crs / nodata triplet / spatial_ref handling. 9 new tests in TestLikeStaleGridAttrs2251 cover bounds override, width/height override, resolution override, matching width/height preserves attrs, get_dataarray_resolution consistency, and parity across all 4 backends. Full rasterize test suite (224 passed, 2 skipped) clean. | Re-audited 2026-05-27 (agent-ae44e871ba3e6bc50 worktree, branch deep-sweep-metadata-rasterize-2026-05-27). 4-backend (numpy/cupy/dask+numpy/dask+cupy) metadata parity reverified end-to-end with explicit cupy and dask+cupy live runs on the CUDA host. attrs / coords / dims / non-dim coords (spatial_ref) all agree across backends; the existing TestMetadataPropagation and TestLikeStaleGridAttrs2251 suites still pass cleanly. NEW HIGH finding #2504 (Cat 4): rasterize(..., dtype=<int>) with the default fill=np.nan silently coerced NaN to a platform-specific sentinel (INT_MIN on x86, 0 on Apple Silicon, 0 for unsigned dtypes) and emitted no _FillValue / nodata / nodatavals attr to mark unwritten pixels. Downstream consumers (geotiff writer, rioxarray masks) had no sentinel to key off and treated unwritten cells as legitimate burns -- a metadata propagation failure equivalent in shape to #1407. Fix in rasterize() before any host/device allocation: detect NaN fill against an integer final_dtype via np.issubdtype + float(fill) + np.isnan and raise ValueError with a pointer to fill=0/fill=-9999 or a floating dtype. Same guard fires on all 4 backends because it runs before backend dispatch. 18 new tests in test_rasterize_nan_int_fill_2504.py cover every signed/unsigned int width, the like=<int dtype> branch, all 4 backends, explicit-vs-default NaN, numpy-typed NaN, and the unaffected float-dtype path. The previous TestIntegerDtypeNanFill test (which had pinned the silent cast as observed behaviour on 2026-05-17) was rewritten to pin the raise. Full rasterize test suite (476 passed, 2 skipped) clean."
Expand Down
4 changes: 2 additions & 2 deletions xrspatial/aspect.py
Original file line number Diff line number Diff line change
Expand Up @@ -169,7 +169,7 @@ def _run_dask_numpy(data: da.Array, boundary: str = 'nan') -> da.Array:
out = data.map_overlap(_func,
depth=(1, 1),
boundary=_boundary_to_dask(boundary),
meta=np.array(()))
meta=np.array((), dtype=np.float32))
return out


Expand All @@ -179,7 +179,7 @@ def _run_dask_cupy(data: da.Array, boundary: str = 'nan') -> da.Array:
out = data.map_overlap(_func,
depth=(1, 1),
boundary=_boundary_to_dask(boundary, is_cupy=True),
meta=cupy.array(()))
meta=cupy.array((), dtype=cupy.float32))
return out


Expand Down
26 changes: 26 additions & 0 deletions xrspatial/tests/test_aspect.py
Original file line number Diff line number Diff line change
Expand Up @@ -143,3 +143,29 @@ def test_boundary_invalid():
agg = create_test_raster(data)
with pytest.raises(ValueError, match="boundary must be one of"):
aspect(agg, boundary='invalid')


@dask_array_available
@pytest.mark.parametrize("boundary", ['nan', 'nearest', 'reflect', 'wrap'])
def test_dask_numpy_advertised_dtype_matches_computed(boundary):
# planar dask map_overlap must advertise float32, matching the realized
# data and the numpy/cupy backends (issue #2682).
data = np.random.default_rng(0).random((8, 10)).astype(np.float64) * 100
numpy_agg = create_test_raster(data, backend='numpy')
dask_agg = create_test_raster(data, backend='dask+numpy')
np_result = aspect(numpy_agg, boundary=boundary)
da_result = aspect(dask_agg, boundary=boundary)
assert np_result.dtype == np.float32
assert da_result.dtype == np.float32
assert da_result.data.compute().dtype == np.float32


@dask_array_available
@cuda_and_cupy_available
def test_dask_cupy_advertised_dtype_matches_computed():
import cupy
data = np.random.default_rng(0).random((8, 10)).astype(np.float64) * 100
dask_cupy_agg = create_test_raster(data, backend='dask+cupy')
da_result = aspect(dask_cupy_agg)
assert da_result.dtype == cupy.float32
assert da_result.data.compute().dtype == cupy.float32
Loading