Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .claude/sweep-metadata-state.csv
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
module,last_inspected,issue,severity_max,categories_found,notes
aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings."
contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)."
focal,2026-05-29,2733,MEDIUM,5,"Audited 2026-05-29 (agent-a3ec617d177775ea8 worktree, branch deep-sweep-metadata-focal-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. 4 public functions checked end-to-end: mean, apply, focal_stats, hotspots. attrs (res/crs/nodatavals), coords (x/y + stats), and dims preserved consistently across all 4 backends for every function; focal_stats correctly adds the documented stats dim; hotspots adds unit=% via deepcopy without clobbering input attrs. Cat 1-4 clean. NEW MEDIUM finding #2733 (Cat 5): focal_stats and hotspots returned a .name that differed across backends -- the dask paths built the output DataArray without an explicit name= so xarray adopted the dask array internal graph token (_trim-<hash>, non-deterministic per call) as the public .name. focal_stats: numpy/dask+numpy gave focal_apply, cupy gave None, dask+cupy gave _trim-<hash>. hotspots: numpy/cupy gave None, dask paths gave _trim-<hash>. Same class as zonal #2611. Fix: focal_stats sets result.name=focal_apply (matching the established numpy contract) after construction; hotspots passes name=hotspots. Setting name= at the dask DataArray constructor does not override the graph name, so focal_stats assigns result.name post-construction. 2 new parametrized tests (test_focal_stats_name_consistent_across_backends, test_hotspots_name_consistent_across_backends) cover all 4 backends each. Full focal suite 122 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings."
focal,2026-06-10,3217,MEDIUM,4;5,"Re-audited 2026-06-10 (agent-ad0d55a894c6abc60 worktree, branch deep-sweep-metadata-focal-2026-06-10). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for mean, apply, focal_stats, hotspots. Cats 1-3 clean: attrs (res/crs/nodatavals/_FillValue/unit), coords (values, dtype, coord attrs), dims, .name, 3D per-band path, and hotspots unit=% all preserved and identical across the 4 backends. NEW MEDIUM finding #3217 (Cat 4 + Cat 5): (a) mean() hardcoded float32 on the GPU paths (_mean_cupy cupy.asarray(dtype=float32), _mean_dask_cupy astype(float32)) while numpy/dask+numpy returned float64 (mean() casts astype(float) before dispatch), so float64 input silently lost precision on cupy/dask+cupy; dask+cupy also advertised float64 (untyped meta) but computed float32. (b) apply()/focal_stats() dask paths passed untyped meta (np.array(()) / cupy.array(())) to map_overlap, so for float32/int input the lazy DataArray advertised float64 but computed the promoted float32 (#2805 typed the chunk fns but not the meta). Same class as aspect #2682 and proximity #2723. Fix: the mean() GPU dtype half landed on main first via duplicate issue #3214/PR #3221 (_promote_float contract: float dtypes preserved, ints->float32, GPU bit-exact vs CPU in float64); PR #3226 (branch deep-sweep-metadata-focal-2026-06-10-01) types every map_overlap meta with data.dtype and aligns tests to the _promote_float contract; 25 new parametrized regression tests (4 backends x 3 dtypes mean; dask backends x 3 dtypes apply/focal_stats; exact CPU/GPU parity). Full focal suite 258 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings."
geotiff,2026-06-09,3116,HIGH,2;3,"Re-audited 2026-06-09 (agent-ae89ff94a64e3ee8f worktree, branch deep-sweep-metadata-geotiff-2026-06-09). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. Focus: surfaces changed since the 2026-05-18 audit (unpack rename + GPU/dask+GPU support #3075, pack=True #3065/#3079, masked int->float promotion #2994, bbox= reads, rioxarray param alignment #2963, no-georef VRT coord synthesis #2824, GeoTransform omission #2971). Live probes: unpack attrs (scale_factor/add_offset/mask_and_scale_dtype/nodata/masked_nodata), masked=True promotion, default masked=False, bbox window+transform shift, multi-band band=N, dims/name/coords (incl. coord dtype) all identical across the 4 backends; nodata_pixels_present absent on dask paths is the documented lazy contract, not a bug. pack->unpack round trips verified on numpy/dask/gpu-write; pack of a cupy-backed read raises via the known cupy+xarray xp.astype incompat (see memory cupy_where_astype_incompat; dependency-pin fix, raises loudly, not a metadata bug). VRT reads (full/masked/window/bbox) and no-georef TIFF reads agree across the 4 backends. NEW HIGH finding #3116 (Cat 2+3): to_geotiff(non_georef_da, out.vrt, tile_size=N) wrote a corrupt index for arrays spanning >1 tile -- write_vrt derives placement from each source GeoTransform and non-georef tiles all carry the identity transform, so rasterX/YSize collapsed to one tile and every DstRect landed at the origin; reads silently returned a single tile (24x32 in -> 16x16 out). Gap left by #2966/#2971 (tests only covered one non-georef source). Fix: _write_vrt_tiled threads per-tile pixel offsets through _build_vrt -> write_vrt via internal dst_offsets kwarg; write_vrt refuses >1 all-non-georef sources without explicit placement and rejects dst_offsets alongside georeferenced sources. 18 new tests in tests/vrt/test_non_georef_placement_3116.py incl. 4-backend round trip, dask-backed and plain-ndarray writes, XML DstRect assertions, georef placement regression, and the write_vrt error contract. Full vrt suite 520 passed; write+round-trip suites 1292 passed."
mcda,2026-06-10,3147,HIGH,1,"constrain() dropped all attrs (res/crs/nodatavals) whenever exclude non-empty (xr.where takes attrs from scalar fill); fixed via attrs restore, tests for numpy/dask/dask+cupy. All other mcda funcs keep attrs/coords/dims on all 4 backends. Out-of-scope crashes noted for backend-parity: owa broken on cupy (numpy order-weights x cupy) and on dask (da.sort does not exist); sensitivity monte_carlo crashes on cupy/dask+cupy (.values on cupy); xr.where compute on cupy/dask+cupy hits known cupy13.6/xarray2025.12 incompat."
polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)."
Expand Down
22 changes: 11 additions & 11 deletions xrspatial/focal.py
Original file line number Diff line number Diff line change
Expand Up @@ -204,7 +204,7 @@ def _mean_dask_numpy(data, excludes, boundary='nan'):
out = data.map_overlap(_func,
depth=(1, 1),
boundary=_boundary_to_dask(boundary),
meta=np.array(()))
meta=np.array((), dtype=data.dtype))
return out


Expand All @@ -214,7 +214,7 @@ def _mean_dask_cupy(data, excludes, boundary='nan'):
out = data.map_overlap(_func,
depth=(1, 1),
boundary=_boundary_to_dask(boundary, is_cupy=True),
meta=cupy.array(()))
meta=cupy.array((), dtype=data.dtype))
return out


Expand Down Expand Up @@ -413,14 +413,14 @@ def mean(agg, passes=1, excludes=None, name='mean', boundary='nan'):
>>> raster_cupy = xr.DataArray(cupy.asarray(data), name='raster_cupy')
>>> mean_cupy = mean(raster_cupy, passes=25)
>>> print(type(mean_cupy.data))
<class 'cupy.core.core.ndarray'>
<class 'cupy.ndarray'>
>>> print(mean_cupy)
<xarray.DataArray 'mean' (dim_0: 5, dim_1: 5)>
array([[0.47928995, 0.47928995, 0.47928995, 0.47928995, 0.47928995],
[0.47928995, 0.47928995, 0.47928995, 0.47928995, 0.47928995],
[0.47928995, 0.47928995, 0.47928995, 0.47928995, 0.47928995],
[0.47928995, 0.47928995, 0.47928995, 0.47928995, 0.47928995],
[0.47928995, 0.47928995, 0.47928995, 0.47928995, 0.47928995]])
array([[0.47928994, 0.47928994, 0.47928994, 0.47928994, 0.47928994],
[0.47928994, 0.47928994, 0.47928994, 0.47928994, 0.47928994],
[0.47928994, 0.47928994, 0.47928994, 0.47928994, 0.47928994],
[0.47928994, 0.47928994, 0.47928994, 0.47928994, 0.47928994],
[0.47928994, 0.47928994, 0.47928994, 0.47928994, 0.47928994]])
Dimensions without coordinates: dim_0, dim_1
"""

Expand Down Expand Up @@ -559,7 +559,7 @@ def _apply_dask_numpy(data, kernel, func, boundary='nan'):
out = data.map_overlap(_func,
depth=(pad_h, pad_w),
boundary=_boundary_to_dask(boundary),
meta=np.array(()))
meta=np.array((), dtype=data.dtype))
return out


Expand Down Expand Up @@ -589,7 +589,7 @@ def _apply_dask_cupy(data, kernel, func, boundary='nan'):
out = data.map_overlap(_func,
depth=(pad_h, pad_w),
boundary=_boundary_to_dask(boundary, is_cupy=True),
meta=cupy.array(()))
meta=cupy.array((), dtype=data.dtype))
return out


Expand Down Expand Up @@ -1218,7 +1218,7 @@ def _focal_stats_dask_cupy(agg, kernel, stats_funcs, boundary='nan'):
data = agg.data.astype(_promote_float(agg.data.dtype))
stats_data = data.map_overlap(
_func, depth=(pad_h, pad_w),
boundary=dask_bnd, meta=cupy.array(()))
boundary=dask_bnd, meta=cupy.array((), dtype=data.dtype))
stats_agg = xr.DataArray(
stats_data, dims=agg.dims, coords=agg.coords, attrs=agg.attrs)
stats_aggs.append(stats_agg)
Expand Down
89 changes: 89 additions & 0 deletions xrspatial/tests/test_focal.py
Original file line number Diff line number Diff line change
Expand Up @@ -2040,6 +2040,95 @@ def test_hotspots_name_consistent_across_backends(backend):
assert result.name == 'hotspots'


# --- output dtype consistency across backends (issue #3217) -------------
#
# Regression: mean() hardcoded float32 on the cupy and dask+cupy paths
# while the numpy and dask+numpy paths returned float64, so a float64
# raster silently lost precision on the GPU (fixed on main via the
# duplicate issue #3214: mean() now follows the _promote_float contract,
# preserving float dtypes and promoting ints to float32). The dask paths
# of mean, apply, and focal_stats also passed an untyped meta to
# map_overlap, so the lazy DataArray advertised float64 while .compute()
# returned the promoted float32 for float32/integer input.


def _computed_dtype(result):
data = result.data
if da is not None and isinstance(data, da.Array):
data = data.compute()
return data.dtype


@pytest.mark.parametrize("backend", ['numpy', 'dask+numpy', 'cupy', 'dask+cupy'])
@pytest.mark.parametrize("in_dtype", [np.float64, np.float32, np.int32])
def test_mean_dtype_consistent_across_backends_3217(backend, in_dtype):
from xrspatial.tests.general_checks import has_cuda_and_cupy
if 'cupy' in backend and not has_cuda_and_cupy():
pytest.skip("Requires CUDA and CuPy")
if 'dask' in backend and da is None:
pytest.skip("Requires Dask")

data = (np.arange(16).reshape(4, 4) + 0.5).astype(in_dtype)
agg = create_test_raster(data, backend=backend, chunks=(2, 2))
result = mean(agg)
# mean() promotes via _promote_float before dispatch (#3214); every
# backend must keep that dtype, lazily and computed.
expected = np.float64 if in_dtype == np.float64 else np.float32
assert result.dtype == expected
assert _computed_dtype(result) == expected


@cuda_and_cupy_available
def test_mean_gpu_matches_cpu_float64_3217():
# The old float32 cast on the GPU paths produced ~1e-4 relative error
# against the CPU result. In float64 the two backends agree exactly.
import cupy
data = np.random.default_rng(7).random((16, 16))
cpu = mean(xr.DataArray(data))
gpu = mean(xr.DataArray(cupy.asarray(data)))
np.testing.assert_array_equal(cpu.data, gpu.data.get())


@pytest.mark.parametrize("backend", ['dask+numpy', 'dask+cupy'])
@pytest.mark.parametrize("in_dtype", [np.float64, np.float32, np.int32])
def test_apply_dask_advertised_dtype_matches_computed_3217(backend, in_dtype):
from xrspatial.tests.general_checks import has_cuda_and_cupy
if 'cupy' in backend and not has_cuda_and_cupy():
pytest.skip("Requires CUDA and CuPy")
if da is None:
pytest.skip("Requires Dask")

data = (np.arange(16).reshape(4, 4) + 0.5).astype(in_dtype)
kernel = custom_kernel(np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]]))
agg = create_test_raster(data, backend=backend, chunks=(2, 2))
if 'cupy' in backend:
from xrspatial.focal import _focal_mean_cuda
result = apply(agg, kernel, func=_focal_mean_cuda)
else:
result = apply(agg, kernel)
expected = np.float64 if in_dtype == np.float64 else np.float32
assert result.dtype == expected
assert _computed_dtype(result) == expected


@pytest.mark.parametrize("backend", ['dask+numpy', 'dask+cupy'])
@pytest.mark.parametrize("in_dtype", [np.float64, np.float32, np.int32])
def test_focal_stats_dask_advertised_dtype_matches_computed_3217(backend, in_dtype):
from xrspatial.tests.general_checks import has_cuda_and_cupy
if 'cupy' in backend and not has_cuda_and_cupy():
pytest.skip("Requires CUDA and CuPy")
if da is None:
pytest.skip("Requires Dask")

data = (np.arange(16).reshape(4, 4) + 0.5).astype(in_dtype)
kernel = custom_kernel(np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]]))
agg = create_test_raster(data, backend=backend, chunks=(2, 2))
result = focal_stats(agg, kernel, stats_funcs=['mean', 'std'])
expected = np.float64 if in_dtype == np.float64 else np.float32
assert result.dtype == expected
assert _computed_dtype(result) == expected


# ---------------------------------------------------------------------------
# API-consistency regressions (issue #2689)
# ---------------------------------------------------------------------------
Expand Down
Loading