diff --git a/.claude/sweep-metadata-state.csv b/.claude/sweep-metadata-state.csv index 0e055d74e..afc34d870 100644 --- a/.claude/sweep-metadata-state.csv +++ b/.claude/sweep-metadata-state.csv @@ -1,7 +1,7 @@ module,last_inspected,issue,severity_max,categories_found,notes aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings." contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)." -focal,2026-05-29,2733,MEDIUM,5,"Audited 2026-05-29 (agent-a3ec617d177775ea8 worktree, branch deep-sweep-metadata-focal-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. 4 public functions checked end-to-end: mean, apply, focal_stats, hotspots. attrs (res/crs/nodatavals), coords (x/y + stats), and dims preserved consistently across all 4 backends for every function; focal_stats correctly adds the documented stats dim; hotspots adds unit=% via deepcopy without clobbering input attrs. Cat 1-4 clean. NEW MEDIUM finding #2733 (Cat 5): focal_stats and hotspots returned a .name that differed across backends -- the dask paths built the output DataArray without an explicit name= so xarray adopted the dask array internal graph token (_trim-, non-deterministic per call) as the public .name. focal_stats: numpy/dask+numpy gave focal_apply, cupy gave None, dask+cupy gave _trim-. hotspots: numpy/cupy gave None, dask paths gave _trim-. Same class as zonal #2611. Fix: focal_stats sets result.name=focal_apply (matching the established numpy contract) after construction; hotspots passes name=hotspots. Setting name= at the dask DataArray constructor does not override the graph name, so focal_stats assigns result.name post-construction. 2 new parametrized tests (test_focal_stats_name_consistent_across_backends, test_hotspots_name_consistent_across_backends) cover all 4 backends each. Full focal suite 122 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings." +focal,2026-06-10,3217,MEDIUM,4;5,"Re-audited 2026-06-10 (agent-ad0d55a894c6abc60 worktree, branch deep-sweep-metadata-focal-2026-06-10). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for mean, apply, focal_stats, hotspots. Cats 1-3 clean: attrs (res/crs/nodatavals/_FillValue/unit), coords (values, dtype, coord attrs), dims, .name, 3D per-band path, and hotspots unit=% all preserved and identical across the 4 backends. NEW MEDIUM finding #3217 (Cat 4 + Cat 5): (a) mean() hardcoded float32 on the GPU paths (_mean_cupy cupy.asarray(dtype=float32), _mean_dask_cupy astype(float32)) while numpy/dask+numpy returned float64 (mean() casts astype(float) before dispatch), so float64 input silently lost precision on cupy/dask+cupy; dask+cupy also advertised float64 (untyped meta) but computed float32. (b) apply()/focal_stats() dask paths passed untyped meta (np.array(()) / cupy.array(())) to map_overlap, so for float32/int input the lazy DataArray advertised float64 but computed the promoted float32 (#2805 typed the chunk fns but not the meta). Same class as aspect #2682 and proximity #2723. Fix: the mean() GPU dtype half landed on main first via duplicate issue #3214/PR #3221 (_promote_float contract: float dtypes preserved, ints->float32, GPU bit-exact vs CPU in float64); PR #3226 (branch deep-sweep-metadata-focal-2026-06-10-01) types every map_overlap meta with data.dtype and aligns tests to the _promote_float contract; 25 new parametrized regression tests (4 backends x 3 dtypes mean; dask backends x 3 dtypes apply/focal_stats; exact CPU/GPU parity). Full focal suite 258 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings." geotiff,2026-06-09,3116,HIGH,2;3,"Re-audited 2026-06-09 (agent-ae89ff94a64e3ee8f worktree, branch deep-sweep-metadata-geotiff-2026-06-09). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. Focus: surfaces changed since the 2026-05-18 audit (unpack rename + GPU/dask+GPU support #3075, pack=True #3065/#3079, masked int->float promotion #2994, bbox= reads, rioxarray param alignment #2963, no-georef VRT coord synthesis #2824, GeoTransform omission #2971). Live probes: unpack attrs (scale_factor/add_offset/mask_and_scale_dtype/nodata/masked_nodata), masked=True promotion, default masked=False, bbox window+transform shift, multi-band band=N, dims/name/coords (incl. coord dtype) all identical across the 4 backends; nodata_pixels_present absent on dask paths is the documented lazy contract, not a bug. pack->unpack round trips verified on numpy/dask/gpu-write; pack of a cupy-backed read raises via the known cupy+xarray xp.astype incompat (see memory cupy_where_astype_incompat; dependency-pin fix, raises loudly, not a metadata bug). VRT reads (full/masked/window/bbox) and no-georef TIFF reads agree across the 4 backends. NEW HIGH finding #3116 (Cat 2+3): to_geotiff(non_georef_da, out.vrt, tile_size=N) wrote a corrupt index for arrays spanning >1 tile -- write_vrt derives placement from each source GeoTransform and non-georef tiles all carry the identity transform, so rasterX/YSize collapsed to one tile and every DstRect landed at the origin; reads silently returned a single tile (24x32 in -> 16x16 out). Gap left by #2966/#2971 (tests only covered one non-georef source). Fix: _write_vrt_tiled threads per-tile pixel offsets through _build_vrt -> write_vrt via internal dst_offsets kwarg; write_vrt refuses >1 all-non-georef sources without explicit placement and rejects dst_offsets alongside georeferenced sources. 18 new tests in tests/vrt/test_non_georef_placement_3116.py incl. 4-backend round trip, dask-backed and plain-ndarray writes, XML DstRect assertions, georef placement regression, and the write_vrt error contract. Full vrt suite 520 passed; write+round-trip suites 1292 passed." mcda,2026-06-10,3147,HIGH,1,"constrain() dropped all attrs (res/crs/nodatavals) whenever exclude non-empty (xr.where takes attrs from scalar fill); fixed via attrs restore, tests for numpy/dask/dask+cupy. All other mcda funcs keep attrs/coords/dims on all 4 backends. Out-of-scope crashes noted for backend-parity: owa broken on cupy (numpy order-weights x cupy) and on dask (da.sort does not exist); sensitivity monte_carlo crashes on cupy/dask+cupy (.values on cupy); xr.where compute on cupy/dask+cupy hits known cupy13.6/xarray2025.12 incompat." polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)." diff --git a/xrspatial/focal.py b/xrspatial/focal.py index 9d24ab00e..fa997c62a 100644 --- a/xrspatial/focal.py +++ b/xrspatial/focal.py @@ -204,7 +204,7 @@ def _mean_dask_numpy(data, excludes, boundary='nan'): out = data.map_overlap(_func, depth=(1, 1), boundary=_boundary_to_dask(boundary), - meta=np.array(())) + meta=np.array((), dtype=data.dtype)) return out @@ -214,7 +214,7 @@ def _mean_dask_cupy(data, excludes, boundary='nan'): out = data.map_overlap(_func, depth=(1, 1), boundary=_boundary_to_dask(boundary, is_cupy=True), - meta=cupy.array(())) + meta=cupy.array((), dtype=data.dtype)) return out @@ -413,14 +413,14 @@ def mean(agg, passes=1, excludes=None, name='mean', boundary='nan'): >>> raster_cupy = xr.DataArray(cupy.asarray(data), name='raster_cupy') >>> mean_cupy = mean(raster_cupy, passes=25) >>> print(type(mean_cupy.data)) - + >>> print(mean_cupy) - array([[0.47928995, 0.47928995, 0.47928995, 0.47928995, 0.47928995], - [0.47928995, 0.47928995, 0.47928995, 0.47928995, 0.47928995], - [0.47928995, 0.47928995, 0.47928995, 0.47928995, 0.47928995], - [0.47928995, 0.47928995, 0.47928995, 0.47928995, 0.47928995], - [0.47928995, 0.47928995, 0.47928995, 0.47928995, 0.47928995]]) + array([[0.47928994, 0.47928994, 0.47928994, 0.47928994, 0.47928994], + [0.47928994, 0.47928994, 0.47928994, 0.47928994, 0.47928994], + [0.47928994, 0.47928994, 0.47928994, 0.47928994, 0.47928994], + [0.47928994, 0.47928994, 0.47928994, 0.47928994, 0.47928994], + [0.47928994, 0.47928994, 0.47928994, 0.47928994, 0.47928994]]) Dimensions without coordinates: dim_0, dim_1 """ @@ -559,7 +559,7 @@ def _apply_dask_numpy(data, kernel, func, boundary='nan'): out = data.map_overlap(_func, depth=(pad_h, pad_w), boundary=_boundary_to_dask(boundary), - meta=np.array(())) + meta=np.array((), dtype=data.dtype)) return out @@ -589,7 +589,7 @@ def _apply_dask_cupy(data, kernel, func, boundary='nan'): out = data.map_overlap(_func, depth=(pad_h, pad_w), boundary=_boundary_to_dask(boundary, is_cupy=True), - meta=cupy.array(())) + meta=cupy.array((), dtype=data.dtype)) return out @@ -1218,7 +1218,7 @@ def _focal_stats_dask_cupy(agg, kernel, stats_funcs, boundary='nan'): data = agg.data.astype(_promote_float(agg.data.dtype)) stats_data = data.map_overlap( _func, depth=(pad_h, pad_w), - boundary=dask_bnd, meta=cupy.array(())) + boundary=dask_bnd, meta=cupy.array((), dtype=data.dtype)) stats_agg = xr.DataArray( stats_data, dims=agg.dims, coords=agg.coords, attrs=agg.attrs) stats_aggs.append(stats_agg) diff --git a/xrspatial/tests/test_focal.py b/xrspatial/tests/test_focal.py index 69e64d8e6..28556bc02 100644 --- a/xrspatial/tests/test_focal.py +++ b/xrspatial/tests/test_focal.py @@ -2040,6 +2040,95 @@ def test_hotspots_name_consistent_across_backends(backend): assert result.name == 'hotspots' +# --- output dtype consistency across backends (issue #3217) ------------- +# +# Regression: mean() hardcoded float32 on the cupy and dask+cupy paths +# while the numpy and dask+numpy paths returned float64, so a float64 +# raster silently lost precision on the GPU (fixed on main via the +# duplicate issue #3214: mean() now follows the _promote_float contract, +# preserving float dtypes and promoting ints to float32). The dask paths +# of mean, apply, and focal_stats also passed an untyped meta to +# map_overlap, so the lazy DataArray advertised float64 while .compute() +# returned the promoted float32 for float32/integer input. + + +def _computed_dtype(result): + data = result.data + if da is not None and isinstance(data, da.Array): + data = data.compute() + return data.dtype + + +@pytest.mark.parametrize("backend", ['numpy', 'dask+numpy', 'cupy', 'dask+cupy']) +@pytest.mark.parametrize("in_dtype", [np.float64, np.float32, np.int32]) +def test_mean_dtype_consistent_across_backends_3217(backend, in_dtype): + from xrspatial.tests.general_checks import has_cuda_and_cupy + if 'cupy' in backend and not has_cuda_and_cupy(): + pytest.skip("Requires CUDA and CuPy") + if 'dask' in backend and da is None: + pytest.skip("Requires Dask") + + data = (np.arange(16).reshape(4, 4) + 0.5).astype(in_dtype) + agg = create_test_raster(data, backend=backend, chunks=(2, 2)) + result = mean(agg) + # mean() promotes via _promote_float before dispatch (#3214); every + # backend must keep that dtype, lazily and computed. + expected = np.float64 if in_dtype == np.float64 else np.float32 + assert result.dtype == expected + assert _computed_dtype(result) == expected + + +@cuda_and_cupy_available +def test_mean_gpu_matches_cpu_float64_3217(): + # The old float32 cast on the GPU paths produced ~1e-4 relative error + # against the CPU result. In float64 the two backends agree exactly. + import cupy + data = np.random.default_rng(7).random((16, 16)) + cpu = mean(xr.DataArray(data)) + gpu = mean(xr.DataArray(cupy.asarray(data))) + np.testing.assert_array_equal(cpu.data, gpu.data.get()) + + +@pytest.mark.parametrize("backend", ['dask+numpy', 'dask+cupy']) +@pytest.mark.parametrize("in_dtype", [np.float64, np.float32, np.int32]) +def test_apply_dask_advertised_dtype_matches_computed_3217(backend, in_dtype): + from xrspatial.tests.general_checks import has_cuda_and_cupy + if 'cupy' in backend and not has_cuda_and_cupy(): + pytest.skip("Requires CUDA and CuPy") + if da is None: + pytest.skip("Requires Dask") + + data = (np.arange(16).reshape(4, 4) + 0.5).astype(in_dtype) + kernel = custom_kernel(np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])) + agg = create_test_raster(data, backend=backend, chunks=(2, 2)) + if 'cupy' in backend: + from xrspatial.focal import _focal_mean_cuda + result = apply(agg, kernel, func=_focal_mean_cuda) + else: + result = apply(agg, kernel) + expected = np.float64 if in_dtype == np.float64 else np.float32 + assert result.dtype == expected + assert _computed_dtype(result) == expected + + +@pytest.mark.parametrize("backend", ['dask+numpy', 'dask+cupy']) +@pytest.mark.parametrize("in_dtype", [np.float64, np.float32, np.int32]) +def test_focal_stats_dask_advertised_dtype_matches_computed_3217(backend, in_dtype): + from xrspatial.tests.general_checks import has_cuda_and_cupy + if 'cupy' in backend and not has_cuda_and_cupy(): + pytest.skip("Requires CUDA and CuPy") + if da is None: + pytest.skip("Requires Dask") + + data = (np.arange(16).reshape(4, 4) + 0.5).astype(in_dtype) + kernel = custom_kernel(np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])) + agg = create_test_raster(data, backend=backend, chunks=(2, 2)) + result = focal_stats(agg, kernel, stats_funcs=['mean', 'std']) + expected = np.float64 if in_dtype == np.float64 else np.float32 + assert result.dtype == expected + assert _computed_dtype(result) == expected + + # --------------------------------------------------------------------------- # API-consistency regressions (issue #2689) # ---------------------------------------------------------------------------