diff --git a/.claude/sweep-metadata-state.csv b/.claude/sweep-metadata-state.csv index 20a8deba3..0e055d74e 100644 --- a/.claude/sweep-metadata-state.csv +++ b/.claude/sweep-metadata-state.csv @@ -1,13 +1,13 @@ -module,last_inspected,issue,severity_max,categories_found,notes -aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings." -contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)." -focal,2026-05-29,2733,MEDIUM,5,"Audited 2026-05-29 (agent-a3ec617d177775ea8 worktree, branch deep-sweep-metadata-focal-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. 4 public functions checked end-to-end: mean, apply, focal_stats, hotspots. attrs (res/crs/nodatavals), coords (x/y + stats), and dims preserved consistently across all 4 backends for every function; focal_stats correctly adds the documented stats dim; hotspots adds unit=% via deepcopy without clobbering input attrs. Cat 1-4 clean. NEW MEDIUM finding #2733 (Cat 5): focal_stats and hotspots returned a .name that differed across backends -- the dask paths built the output DataArray without an explicit name= so xarray adopted the dask array internal graph token (_trim-, non-deterministic per call) as the public .name. focal_stats: numpy/dask+numpy gave focal_apply, cupy gave None, dask+cupy gave _trim-. hotspots: numpy/cupy gave None, dask paths gave _trim-. Same class as zonal #2611. Fix: focal_stats sets result.name=focal_apply (matching the established numpy contract) after construction; hotspots passes name=hotspots. Setting name= at the dask DataArray constructor does not override the graph name, so focal_stats assigns result.name post-construction. 2 new parametrized tests (test_focal_stats_name_consistent_across_backends, test_hotspots_name_consistent_across_backends) cover all 4 backends each. Full focal suite 122 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings." -geotiff,2026-05-18,1909,HIGH,4;5,"Re-audit 2026-05-15 (agent-a55b69cec1ef2a092 worktree, branch deep-sweep-metadata-geotiff-2026-05-15). 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity reverified after the #1813 modular refactor: full reads, windowed reads, multi-band, band=N selection, no-georef integer pixel coords, crs/crs_wkt/transform/nodata/x_resolution/y_resolution/resolution_unit/image_description/gdal_metadata all agree across backends. DataArray .name and dims agree (y, x for 2D; y, x, band for 3D). NEW HIGH finding #1909: GDS chunked GPU path (_read_geotiff_gpu_chunked_gds) declared the dask graph dtype as float64 when source had an in-range integer nodata sentinel, matching the CPU dask path's #1597 contract, but the per-chunk _chunk_task did not cast its returned cupy array to declared_dtype -- chunks with no sentinel hit returned the raw uint16/int16 source dtype, producing a silent declared/actual dtype mismatch. Fix mirrors the #1597 + #1624 CPU dask pattern: compute declared_dtype before defining _chunk_task, cast inside the task only when arr.dtype != declared_dtype to skip the no-op astype(copy=True). 6 regression tests added in test_chunked_gpu_declared_dtype_1909.py covering declared vs computed parity, CPU/GPU dask declared-dtype agreement, eager paths preserve source dtype, no-nodata round-trip, explicit dtype= kwarg, and sentinel-hit float64 promotion. Pre-existing test failures in test_predictor2_big_endian_gpu_1517.py and test_size_param_validation_gpu_vrt_1776.py exist on main (read_to_array AttributeError after #1813 refactor, tile_size=4 rejected by stricter _validate_tile_size_arg) and are unrelated to this audit. | Re-audited 2026-05-18 (agent-a59a61958f181c31a worktree, branch deep-sweep-metadata-geotiff-2026-05-18). 4-backend (numpy / cupy / dask+numpy / dask+cupy) metadata parity reverified end-to-end: open_geotiff over a tiled uint16 fixture with crs + transform + GDAL_NODATA sentinel emits identical attrs across all 4 backends (crs=32633, crs_wkt, transform 6-tuple, nodata=5, masked_nodata=True, _xrspatial_geotiff_contract=2, extra_tags, image_description, resolution_unit, x_resolution, y_resolution). Multi-band 3D (y, x, band) with band coord, no-georef int64 pixel coords, windowed reads with transform origin shift, and mask_nodata=False keeping integer dtype all agree across the 4 backends. Write round-trip via to_geotiff (numpy, cupy, dask streaming) re-emits crs / transform / nodata / masked_nodata / contract version with byte-stable transform. Band-first (band, y, x) input correctly remaps to (y, x, band) on disk. _populate_attrs_from_geo_info, _set_nodata_attrs, and _extract_rich_tags centralise attrs emission across all read paths (_init_, _backends/dask, _backends/gpu, _backends/vrt) and write paths (_writers/eager, _writers/gpu, _writers/vrt). _ATTRS_CONTRACT_VERSION=2 is stamped on every path including the chunked GPU GDS and chunked VRT inline-attrs branches. No new CRITICAL/HIGH/MEDIUM/LOW findings." -mcda,2026-06-10,3147,HIGH,1,"constrain() dropped all attrs (res/crs/nodatavals) whenever exclude non-empty (xr.where takes attrs from scalar fill); fixed via attrs restore, tests for numpy/dask/dask+cupy. All other mcda funcs keep attrs/coords/dims on all 4 backends. Out-of-scope crashes noted for backend-parity: owa broken on cupy (numpy order-weights x cupy) and on dask (da.sort does not exist); sensitivity monte_carlo crashes on cupy/dask+cupy (.values on cupy); xr.where compute on cupy/dask+cupy hits known cupy13.6/xarray2025.12 incompat." -polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)." -proximity,2026-05-29,2723,MEDIUM,4;5,"Audited 2026-05-29 (agent-a61dbadc2452a2003 worktree, branch deep-sweep-metadata-proximity-2026-05-29). CUDA+cupy available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for proximity/allocation/direction, both bounded (finite max_distance) and unbounded. Cat 1 (attrs res/crs/transform/nodatavals/_FillValue), Cat 2 (coords + coord dtype), and Cat 3 (dims) all preserved and identical across the 4 backends -- public funcs wrap with xr.DataArray(coords=raster.coords, dims=raster.dims, attrs=raster.attrs). NEW MEDIUM finding #2723 (Cat 4 + Cat 5): (a) bounded dask+numpy path (_process_dask -> da.map_overlap with meta=np.array(())) declared output dtype float64 while the chunk fn returns float32 and numpy/cupy/dask+cupy + the unbounded KDTree path all declare float32; docstrings show dtype=float32. Fix: meta=np.array((), dtype=np.float32). (b) dask backends leaked an internal dask op name (_trim-, _kdtree_chunk_fn-, asarray-) into result.name while numpy/cupy return None. Fix: assign result.name=None after construction in all 3 public funcs (xarray ignores a name=None kwarg for named dask arrays, so the reset must happen post-construction). Same .name-leak class as zonal #2611. PR #2728 off child branch deep-sweep-metadata-proximity-2026-05-29-01. New parametrized regression test test_output_metadata_consistent_across_backends asserts declared dtype float32 + name None across all 4 backends x 3 funcs x bounded/unbounded; full test_proximity.py suite 93 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings." -rasterize,2026-06-09,3087,MEDIUM,1,GeoDataFrame .crs dropped on no-like path (Cat 1); fixed via #3087 emitting attrs crs/crs_wkt when output has no CRS. like-path attrs/coords/dims/nodata verified live on all 4 backends (CUDA available); Cats 2-5 clean. -reproject,2026-06-09,3093,MEDIUM,4;5,"Audited 2026-06-09 (agent-a2f2f5befa9759e9e worktree, branch deep-sweep-metadata-reproject-2026-06-09). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for reproject() and merge(). Cat 1 attrs (crs/nodata/res/transform/_FillValue/nodatavals/units refreshed or carried), Cat 2 coords (pixel-center, non-spatial + band coord carry, float64), Cat 3 dims (lat/lon names, band-first round-trip via #2182), and dtype round-trip identical across the 4 backends; vertical_crs/vertical_datum EPSG convention verified; merge attrs from first raster with crs/nodata/res/transform re-emit verified on numpy + dask. NEW MEDIUM finding #3093 (Cat 4 + Cat 5): _reproject_streaming (the reproject() fallback when dask is absent and the in-memory source exceeds 512 MB; dask is an optional dep so reachable on plain pip installs) allocated its assembled output as np.full(out_shape, nodata, dtype=np.float64) in both the local ThreadPoolExecutor and dask.bag distributed branches, so integer sources returned float64 while numpy/cupy/dask+numpy/dask+cupy all round-trip integer dtypes (#2505); it also allocated 2-D so 3-D (y,x,band) sources crashed with a broadcast ValueError. The helper had zero test coverage. Fix in PR #3111 (branch deep-sweep-metadata-reproject-2026-06-09-01): same integer-round-trip dtype rule as _reproject_dask and a (*out_shape, n_bands) allocation for 3-D, in both branches; new TestStreamingDtypeParity (6 tests incl. a LocalCluster run of the distributed branch and value parity vs the in-memory path); full reproject suite 450 passed. LOW (documented, not fixed): geoid_height() docstring says 'Returns N : same type as input' with xr.DataArray listed as accepted, but DataArray inputs return a plain ndarray (coords/dims/attrs dropped) via out.reshape(np.shape(lon)) in _vertical.py." -resample,2026-05-27,2542,MEDIUM,2;4;5,"Audited 2026-05-27 (agent-a8135a6a246ecb93c worktree, branch deep-sweep-metadata-resample-2026-05-27). Cat 2 MEDIUM + Cat 4 MEDIUM + Cat 5 MEDIUM all rolled into issue #2542. (a) 2D non-identity path dropped scalar non-dim coords like rioxarrays spatial_ref and squeezed time/band selectors; identity path (scale==1.0, agg.copy()) and 3D path (per-band xr.concat) preserved them, so the bug was path-inconsistent (Cat 5). (b) _resolve_nodata reads attrs[nodata] as a fallback sentinel but the output post-processing only refreshed _FillValue and nodatavals, leaving attrs[nodata]=-9999 alongside data that was now NaN. Fix in resample(): refresh attrs[nodata] to NaN whenever the input had it, and carry across zero-dim non-dim coords on the 2D non-identity path. 7 new tests in TestMetadataPropagation cover nodata-attr refresh, spatial_ref/scalar coord carry, identity-vs-downsample coord parity, and the explicit choice to drop spatially-shaped extra coords. 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity verified for spatial_ref carry; nodata-attr refresh verified on numpy/cupy/dask+numpy (dask+cupy non-NaN nodata masking hits a pre-existing xarray xr.where + cupy.astype quirk unrelated to this audit). Full resample test suite (175 passed) clean." -viewshed,2026-05-29,2743,MEDIUM,4;5,output .name differed across backends (None/viewshed/dask-token) and dtype float32 on GPU vs float64 on CPU; added name= param and forced float64 on all backends; attrs/coords/dims already preserved -zonal,2026-05-29,2611,MEDIUM,5,"Audited 2026-05-29 (agent-ae8d8b65cc3a5c40a worktree, branch deep-sweep-metadata-zonal-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. 5 DataArray-returning functions checked end-to-end: apply, regions, hypsometric_integral, trim, crop. attrs (res/crs/transform/nodatavals), dims, and coords preserved correctly on all 4 backends for every function; trim/crop slice coords with no half-pixel drift. stats() and crosstab() return DataFrames by design so Cat 1-3 DataArray checks N/A. NEW MEDIUM finding #2611 (Cat 5): apply() never set output .name, so numpy/cupy returned None while dask+numpy/dask+cupy inherited a non-deterministic internal dask task name (e.g. _chunk_fn-). regions/hypsometric_integral/trim/crop all set deterministic names; apply was the outlier. Fix in PR #2611/#2622: add name param (default None) and assign result.name after DataArray construction (setting name= at construction does not override the dask graph name). New parametrized test test_apply_name_consistent_across_backends covers default-None and explicit-name on all 4 backends. Full zonal suite 213 passed. No other CRITICAL/HIGH/MEDIUM findings; no LOW findings to document." +module,last_inspected,issue,severity_max,categories_found,notes +aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings." +contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)." +focal,2026-05-29,2733,MEDIUM,5,"Audited 2026-05-29 (agent-a3ec617d177775ea8 worktree, branch deep-sweep-metadata-focal-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. 4 public functions checked end-to-end: mean, apply, focal_stats, hotspots. attrs (res/crs/nodatavals), coords (x/y + stats), and dims preserved consistently across all 4 backends for every function; focal_stats correctly adds the documented stats dim; hotspots adds unit=% via deepcopy without clobbering input attrs. Cat 1-4 clean. NEW MEDIUM finding #2733 (Cat 5): focal_stats and hotspots returned a .name that differed across backends -- the dask paths built the output DataArray without an explicit name= so xarray adopted the dask array internal graph token (_trim-, non-deterministic per call) as the public .name. focal_stats: numpy/dask+numpy gave focal_apply, cupy gave None, dask+cupy gave _trim-. hotspots: numpy/cupy gave None, dask paths gave _trim-. Same class as zonal #2611. Fix: focal_stats sets result.name=focal_apply (matching the established numpy contract) after construction; hotspots passes name=hotspots. Setting name= at the dask DataArray constructor does not override the graph name, so focal_stats assigns result.name post-construction. 2 new parametrized tests (test_focal_stats_name_consistent_across_backends, test_hotspots_name_consistent_across_backends) cover all 4 backends each. Full focal suite 122 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings." +geotiff,2026-06-09,3116,HIGH,2;3,"Re-audited 2026-06-09 (agent-ae89ff94a64e3ee8f worktree, branch deep-sweep-metadata-geotiff-2026-06-09). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. Focus: surfaces changed since the 2026-05-18 audit (unpack rename + GPU/dask+GPU support #3075, pack=True #3065/#3079, masked int->float promotion #2994, bbox= reads, rioxarray param alignment #2963, no-georef VRT coord synthesis #2824, GeoTransform omission #2971). Live probes: unpack attrs (scale_factor/add_offset/mask_and_scale_dtype/nodata/masked_nodata), masked=True promotion, default masked=False, bbox window+transform shift, multi-band band=N, dims/name/coords (incl. coord dtype) all identical across the 4 backends; nodata_pixels_present absent on dask paths is the documented lazy contract, not a bug. pack->unpack round trips verified on numpy/dask/gpu-write; pack of a cupy-backed read raises via the known cupy+xarray xp.astype incompat (see memory cupy_where_astype_incompat; dependency-pin fix, raises loudly, not a metadata bug). VRT reads (full/masked/window/bbox) and no-georef TIFF reads agree across the 4 backends. NEW HIGH finding #3116 (Cat 2+3): to_geotiff(non_georef_da, out.vrt, tile_size=N) wrote a corrupt index for arrays spanning >1 tile -- write_vrt derives placement from each source GeoTransform and non-georef tiles all carry the identity transform, so rasterX/YSize collapsed to one tile and every DstRect landed at the origin; reads silently returned a single tile (24x32 in -> 16x16 out). Gap left by #2966/#2971 (tests only covered one non-georef source). Fix: _write_vrt_tiled threads per-tile pixel offsets through _build_vrt -> write_vrt via internal dst_offsets kwarg; write_vrt refuses >1 all-non-georef sources without explicit placement and rejects dst_offsets alongside georeferenced sources. 18 new tests in tests/vrt/test_non_georef_placement_3116.py incl. 4-backend round trip, dask-backed and plain-ndarray writes, XML DstRect assertions, georef placement regression, and the write_vrt error contract. Full vrt suite 520 passed; write+round-trip suites 1292 passed." +mcda,2026-06-10,3147,HIGH,1,"constrain() dropped all attrs (res/crs/nodatavals) whenever exclude non-empty (xr.where takes attrs from scalar fill); fixed via attrs restore, tests for numpy/dask/dask+cupy. All other mcda funcs keep attrs/coords/dims on all 4 backends. Out-of-scope crashes noted for backend-parity: owa broken on cupy (numpy order-weights x cupy) and on dask (da.sort does not exist); sensitivity monte_carlo crashes on cupy/dask+cupy (.values on cupy); xr.where compute on cupy/dask+cupy hits known cupy13.6/xarray2025.12 incompat." +polygonize,2026-05-19,2149,MEDIUM,1,"Audited 2026-05-19 (agent-ad1070530d37a4fdf worktree, branch deep-sweep-metadata-polygonize-2026-05-19). Output is vector (column, polygon_points / GeoDataFrame / GeoJSON dict / awkward) so Cat 2/3 do not apply in the DataArray sense. Cat 1 MEDIUM finding #2149: GeoDataFrame output drops raster.attrs['crs'] (and crs_wkt and rioxarray rio.crs); GeoDataFrame.crs is always None even when input is georeferenced. Fix: new _detect_raster_crs helper + crs= kwarg threaded into _to_geopandas; df.set_crs is called when a CRS is detected. spatialpandas has no CRS slot and GeoJSON RFC 7946 is WGS84-only, so propagation lives only on the geopandas path. CRS propagation runs at the public API level so all 4 backends (numpy / cupy / dask+numpy / dask+cupy) propagate consistently -- verified end-to-end with EPSG:4326 attrs across all 4 backends. 8 new tests in TestPolygonizeCRSPropagation cover EPSG string/int, crs_wkt, no CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction. Cat 2 LOW (not fixed): output coords are pixel-space when input has georeferenced x/y or attrs['transform']; user must pass transform= explicitly. Documented behavior, leave as-is. Cat 4 LOW (not fixed): nodatavals from input attrs is not auto-applied as a mask; documented behavior (explicit mask= kwarg)." +proximity,2026-05-29,2723,MEDIUM,4;5,"Audited 2026-05-29 (agent-a61dbadc2452a2003 worktree, branch deep-sweep-metadata-proximity-2026-05-29). CUDA+cupy available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for proximity/allocation/direction, both bounded (finite max_distance) and unbounded. Cat 1 (attrs res/crs/transform/nodatavals/_FillValue), Cat 2 (coords + coord dtype), and Cat 3 (dims) all preserved and identical across the 4 backends -- public funcs wrap with xr.DataArray(coords=raster.coords, dims=raster.dims, attrs=raster.attrs). NEW MEDIUM finding #2723 (Cat 4 + Cat 5): (a) bounded dask+numpy path (_process_dask -> da.map_overlap with meta=np.array(())) declared output dtype float64 while the chunk fn returns float32 and numpy/cupy/dask+cupy + the unbounded KDTree path all declare float32; docstrings show dtype=float32. Fix: meta=np.array((), dtype=np.float32). (b) dask backends leaked an internal dask op name (_trim-, _kdtree_chunk_fn-, asarray-) into result.name while numpy/cupy return None. Fix: assign result.name=None after construction in all 3 public funcs (xarray ignores a name=None kwarg for named dask arrays, so the reset must happen post-construction). Same .name-leak class as zonal #2611. PR #2728 off child branch deep-sweep-metadata-proximity-2026-05-29-01. New parametrized regression test test_output_metadata_consistent_across_backends asserts declared dtype float32 + name None across all 4 backends x 3 funcs x bounded/unbounded; full test_proximity.py suite 93 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings." +rasterize,2026-06-09,3087,MEDIUM,1,GeoDataFrame .crs dropped on no-like path (Cat 1); fixed via #3087 emitting attrs crs/crs_wkt when output has no CRS. like-path attrs/coords/dims/nodata verified live on all 4 backends (CUDA available); Cats 2-5 clean. +reproject,2026-06-09,3093,MEDIUM,4;5,"Audited 2026-06-09 (agent-a2f2f5befa9759e9e worktree, branch deep-sweep-metadata-reproject-2026-06-09). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for reproject() and merge(). Cat 1 attrs (crs/nodata/res/transform/_FillValue/nodatavals/units refreshed or carried), Cat 2 coords (pixel-center, non-spatial + band coord carry, float64), Cat 3 dims (lat/lon names, band-first round-trip via #2182), and dtype round-trip identical across the 4 backends; vertical_crs/vertical_datum EPSG convention verified; merge attrs from first raster with crs/nodata/res/transform re-emit verified on numpy + dask. NEW MEDIUM finding #3093 (Cat 4 + Cat 5): _reproject_streaming (the reproject() fallback when dask is absent and the in-memory source exceeds 512 MB; dask is an optional dep so reachable on plain pip installs) allocated its assembled output as np.full(out_shape, nodata, dtype=np.float64) in both the local ThreadPoolExecutor and dask.bag distributed branches, so integer sources returned float64 while numpy/cupy/dask+numpy/dask+cupy all round-trip integer dtypes (#2505); it also allocated 2-D so 3-D (y,x,band) sources crashed with a broadcast ValueError. The helper had zero test coverage. Fix in PR #3111 (branch deep-sweep-metadata-reproject-2026-06-09-01): same integer-round-trip dtype rule as _reproject_dask and a (*out_shape, n_bands) allocation for 3-D, in both branches; new TestStreamingDtypeParity (6 tests incl. a LocalCluster run of the distributed branch and value parity vs the in-memory path); full reproject suite 450 passed. LOW (documented, not fixed): geoid_height() docstring says 'Returns N : same type as input' with xr.DataArray listed as accepted, but DataArray inputs return a plain ndarray (coords/dims/attrs dropped) via out.reshape(np.shape(lon)) in _vertical.py." +resample,2026-05-27,2542,MEDIUM,2;4;5,"Audited 2026-05-27 (agent-a8135a6a246ecb93c worktree, branch deep-sweep-metadata-resample-2026-05-27). Cat 2 MEDIUM + Cat 4 MEDIUM + Cat 5 MEDIUM all rolled into issue #2542. (a) 2D non-identity path dropped scalar non-dim coords like rioxarrays spatial_ref and squeezed time/band selectors; identity path (scale==1.0, agg.copy()) and 3D path (per-band xr.concat) preserved them, so the bug was path-inconsistent (Cat 5). (b) _resolve_nodata reads attrs[nodata] as a fallback sentinel but the output post-processing only refreshed _FillValue and nodatavals, leaving attrs[nodata]=-9999 alongside data that was now NaN. Fix in resample(): refresh attrs[nodata] to NaN whenever the input had it, and carry across zero-dim non-dim coords on the 2D non-identity path. 7 new tests in TestMetadataPropagation cover nodata-attr refresh, spatial_ref/scalar coord carry, identity-vs-downsample coord parity, and the explicit choice to drop spatially-shaped extra coords. 4-backend (numpy/cupy/dask+numpy/dask+cupy) parity verified for spatial_ref carry; nodata-attr refresh verified on numpy/cupy/dask+numpy (dask+cupy non-NaN nodata masking hits a pre-existing xarray xr.where + cupy.astype quirk unrelated to this audit). Full resample test suite (175 passed) clean." +viewshed,2026-05-29,2743,MEDIUM,4;5,output .name differed across backends (None/viewshed/dask-token) and dtype float32 on GPU vs float64 on CPU; added name= param and forced float64 on all backends; attrs/coords/dims already preserved +zonal,2026-05-29,2611,MEDIUM,5,"Audited 2026-05-29 (agent-ae8d8b65cc3a5c40a worktree, branch deep-sweep-metadata-zonal-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. 5 DataArray-returning functions checked end-to-end: apply, regions, hypsometric_integral, trim, crop. attrs (res/crs/transform/nodatavals), dims, and coords preserved correctly on all 4 backends for every function; trim/crop slice coords with no half-pixel drift. stats() and crosstab() return DataFrames by design so Cat 1-3 DataArray checks N/A. NEW MEDIUM finding #2611 (Cat 5): apply() never set output .name, so numpy/cupy returned None while dask+numpy/dask+cupy inherited a non-deterministic internal dask task name (e.g. _chunk_fn-). regions/hypsometric_integral/trim/crop all set deterministic names; apply was the outlier. Fix in PR #2611/#2622: add name param (default None) and assign result.name after DataArray construction (setting name= at construction does not override the dask graph name). New parametrized test test_apply_name_consistent_across_backends covers default-None and explicit-name on all 4 backends. Full zonal suite 213 passed. No other CRITICAL/HIGH/MEDIUM findings; no LOW findings to document." diff --git a/xrspatial/geotiff/_vrt.py b/xrspatial/geotiff/_vrt.py index c3c9ee16a..b0900abd3 100644 --- a/xrspatial/geotiff/_vrt.py +++ b/xrspatial/geotiff/_vrt.py @@ -1892,10 +1892,13 @@ def _check_no_mixed_georef(sources_meta: list[dict]) -> None: ``(0, 0)`` as if those identity values were real CRS coordinates. Refuse rather than emit a mosaic that mislocates the tile. - The all-georeferenced and all-non-georeferenced cases both pass: - write_vrt emits a ```` only when every source is - georeferenced, so the all-non-georeferenced VRT preserves - ``georef_status='none'`` on read. + The all-georeferenced and all-non-georeferenced cases both pass + this gate: write_vrt emits a ```` only when every + source is georeferenced, so the all-non-georeferenced VRT preserves + ``georef_status='none'`` on read. Placement for multiple + non-georeferenced sources is handled separately in + :func:`write_vrt`: it requires explicit ``dst_offsets`` because the + identity transforms cannot place the tiles (issue #3116). """ if not sources_meta: return @@ -1978,7 +1981,8 @@ def _check_no_mixed_nodata(sources_meta: list[dict], *, def write_vrt(vrt_path: str, source_files: list[str], *, relative: bool = True, crs_wkt: str | None = None, - nodata: float | int | None = None) -> str: + nodata: float | int | None = None, + dst_offsets: list[tuple[int, int]] | None = None) -> str: """Generate a VRT file that mosaics multiple GeoTIFF tiles. Do not call this symbol directly from external code. This is the @@ -2014,6 +2018,20 @@ def write_vrt(vrt_path: str, source_files: list[str], *, (e.g. ``65535`` for uint16, ``-9999`` for int32) are accepted so the surface lines up with the ``nodata`` kwarg on ``to_geotiff`` and ``_write_geotiff_gpu``. + dst_offsets : list of (int, int) or None + [internal-only] Explicit pixel-space placement for + non-georeferenced sources: one ``(x_off, y_off)`` pair per + entry of ``source_files``, giving the source's top-left corner + in the mosaic. Required when more than one non-georeferenced + source is supplied -- such sources carry only the default + identity transform, so their placement cannot be derived from + georeferencing (issue #3116). Rejected when any source is + georeferenced: georeferenced sources place via their + GeoTransform and an explicit override would let the two + disagree silently. Offsets are not checked for overlap or + full coverage; the tiled write path always supplies a + non-overlapping cover, and a direct caller is responsible + for its own layout. Returns ------- @@ -2131,6 +2149,52 @@ def _pixel_size_mismatch(a: float, b: float) -> bool: _check_no_mixed_georef(sources_meta) _check_no_mixed_nodata(sources_meta, caller_nodata=nodata) + # Placement policy for non-georeferenced sources (issue #3116). + # Mosaic placement is normally derived from each source's + # GeoTransform, but a non-georeferenced source carries only the + # default identity transform, so every such source maps to the same + # extent. Deriving placement from that stacks all tiles at DstRect + # (0, 0) and shrinks the mosaic to one tile -- a silently corrupt + # index. Callers that know the layout (the tiled ``.vrt`` write + # path in ``_writers/eager.py``) pass explicit pixel offsets via + # ``dst_offsets``; anything else with more than one + # non-georeferenced source is refused, mirroring the + # ``_check_no_mixed_georef`` fail-closed precedent. + all_non_georef = not any( + bool(m.get('has_georef')) for m in sources_meta) + if dst_offsets is not None: + if not all_non_georef: + georeffed = [m['path'] for m in sources_meta + if m.get('has_georef')] + raise ValueError( + f"write_vrt: dst_offsets is only valid when every source " + f"is non-georeferenced; georeferenced sources " + f"{georeffed!r} place via their GeoTransform, and an " + f"explicit pixel offset could silently disagree with it. " + f"Drop dst_offsets or strip the georeferencing.") + if len(dst_offsets) != len(sources_meta): + raise ValueError( + f"write_vrt: dst_offsets has {len(dst_offsets)} entries " + f"for {len(sources_meta)} source file(s); pass one " + f"(x_off, y_off) pair per source.") + for i, pair in enumerate(dst_offsets): + if (not isinstance(pair, (tuple, list)) or len(pair) != 2 + or any(isinstance(v, bool) + or not isinstance(v, (int, np.integer)) + or v < 0 for v in pair)): + raise ValueError( + f"write_vrt: dst_offsets[{i}] must be a pair of " + f"non-negative ints (x_off, y_off), got {pair!r}.") + elif all_non_georef and len(sources_meta) > 1: + raise ValueError( + f"write_vrt: {len(sources_meta)} non-georeferenced sources " + f"with no dst_offsets. Non-georeferenced TIFFs carry only a " + f"default identity transform, so their mosaic placement " + f"cannot be derived from georeferencing; writing them " + f"without explicit placement would stack every source at " + f"the origin. Pass dst_offsets with one (x_off, y_off) pair " + f"per source, or georeference the sources.") + # Pixel size, sample format, band count, and CRS share the # documented "all sources must agree with first" contract. These # remain inline here because they predate the grouped gates above @@ -2175,26 +2239,59 @@ def _pixel_size_mismatch(a: float, b: float) -> bool: f"share the same CRS." ) - # Compute the bounding box of all sources - all_x0, all_y0, all_x1, all_y1 = [], [], [], [] - for m in sources_meta: - t = m['transform'] - x0 = t.origin_x - y0 = t.origin_y - x1 = x0 + m['width'] * t.pixel_width - y1 = y0 + m['height'] * t.pixel_height - all_x0.append(min(x0, x1)) - all_y0.append(min(y0, y1)) - all_x1.append(max(x0, x1)) - all_y1.append(max(y0, y1)) - - mosaic_x0 = min(all_x0) - mosaic_y_top = max(all_y1) # top edge (y increases upward in geo) - mosaic_x1 = max(all_x1) - mosaic_y_bottom = min(all_y0) - - total_w = int(round((mosaic_x1 - mosaic_x0) / abs(res_x))) - total_h = int(round((mosaic_y_top - mosaic_y_bottom) / abs(res_y))) + if dst_offsets is not None: + # Explicit pixel-space placement (non-georeferenced mosaic). + # The mosaic extent is the union of the placed sources; no + # GeoTransform is emitted (every source is non-georeferenced, + # enforced above), so the geo bbox math is skipped entirely. + placements = [(int(x), int(y)) for x, y in dst_offsets] + total_w = max(x + m['width'] + for (x, _y), m in zip(placements, sources_meta)) + total_h = max(y + m['height'] + for (_x, y), m in zip(placements, sources_meta)) + mosaic_x0 = mosaic_y_top = None # no geo origin without georef + else: + # Compute the bounding box of all sources + all_x0, all_y0, all_x1, all_y1 = [], [], [], [] + for m in sources_meta: + t = m['transform'] + x0 = t.origin_x + y0 = t.origin_y + x1 = x0 + m['width'] * t.pixel_width + y1 = y0 + m['height'] * t.pixel_height + all_x0.append(min(x0, x1)) + all_y0.append(min(y0, y1)) + all_x1.append(max(x0, x1)) + all_y1.append(max(y0, y1)) + + mosaic_x0 = min(all_x0) + mosaic_y_top = max(all_y1) # top edge (y increases upward in geo) + mosaic_x1 = max(all_x1) + mosaic_y_bottom = min(all_y0) + + total_w = int(round((mosaic_x1 - mosaic_x0) / abs(res_x))) + total_h = int(round((mosaic_y_top - mosaic_y_bottom) / abs(res_y))) + + # Pixel offset of each source in the virtual raster, derived + # from its GeoTransform. ``origin_y`` is the *top* only for + # north-up rasters (pixel_height < 0); sources with ascending y + # (pixel_height > 0) place origin_y at the bottom, so the top is + # origin_y + height * pixel_height. + placements = [] + for m in sources_meta: + t = m['transform'] + src_top = max( + t.origin_y, + t.origin_y + m['height'] * t.pixel_height, + ) + src_left = min( + t.origin_x, + t.origin_x + m['width'] * t.pixel_width, + ) + placements.append(( + int(round((src_left - mosaic_x0) / abs(res_x))), + int(round((mosaic_y_top - src_top) / abs(res_y))), + )) # Determine VRT dtype via the central TIFF-to-numpy resolver so the # VRT header agrees with what the reader will actually decode. The @@ -2239,24 +2336,7 @@ def _pixel_size_mismatch(a: float, b: float) -> bool: if nd is not None: lines.append(f' {_xml_text(nd)}') - for m in sources_meta: - t = m['transform'] - # Top edge of this source in geo coords -- origin_y is the - # *top* only for north-up rasters (pixel_height < 0). Sources - # with ascending y (pixel_height > 0) place origin_y at the - # bottom, so the top is origin_y + height * pixel_height. - src_top = max( - t.origin_y, - t.origin_y + m['height'] * t.pixel_height, - ) - src_left = min( - t.origin_x, - t.origin_x + m['width'] * t.pixel_width, - ) - # Pixel offset in the virtual raster - dst_x_off = int(round((src_left - mosaic_x0) / abs(res_x))) - dst_y_off = int(round((mosaic_y_top - src_top) / abs(res_y))) - + for m, (dst_x_off, dst_y_off) in zip(sources_meta, placements): fname = m['path'] rel_attr = '0' if relative: diff --git a/xrspatial/geotiff/_writers/eager.py b/xrspatial/geotiff/_writers/eager.py index 46a691713..280c17068 100644 --- a/xrspatial/geotiff/_writers/eager.py +++ b/xrspatial/geotiff/_writers/eager.py @@ -1376,7 +1376,12 @@ def _cleanup_staging(): # Tiles are written into ``staging_dir``; ``tile_names`` records the # bare filenames so the final tile paths (under ``tiles_dir``) can be # rebuilt for the VRT index after the atomic rename below. + # ``tile_offsets`` records each tile's (x_off, y_off) pixel + # placement in lockstep: non-georeferenced tiles carry no + # transform, so the VRT index needs the placement passed + # explicitly or every tile lands at the origin (issue #3116). tile_names = [] + tile_offsets = [] delayed_tasks = [] def _safe_write_tile(*args, **kwargs): @@ -1407,6 +1412,7 @@ def _safe_write_tile(*args, **kwargs): tile_name = f'tile_{ri:0{pad_width}d}_{ci:0{pad_width}d}.tif' tile_path = os.path.join(staging_dir, tile_name) tile_names.append(tile_name) + tile_offsets.append((col_offset, row_offset)) # Compute per-tile geo_transform tile_gt = None @@ -1514,7 +1520,13 @@ def _safe_write_tile(*args, **kwargs): # and validated above; passing it again is idempotent. from .vrt import _build_vrt try: - _build_vrt(vrt_path, tile_paths, relative=True, nodata=nodata) + # ``dst_offsets`` carries the tile placement only for + # non-georeferenced input: georeferenced tiles place via their + # per-tile GeoTransform (computed above), and write_vrt rejects + # the kwarg alongside georeferenced sources (issue #3116). + _build_vrt(vrt_path, tile_paths, relative=True, nodata=nodata, + dst_offsets=( + tile_offsets if geo_transform is None else None)) except BaseException: # The index step failed after the rename. Remove the now-renamed # tile dir too so a retry is not blocked by the leftover-state diff --git a/xrspatial/geotiff/_writers/vrt.py b/xrspatial/geotiff/_writers/vrt.py index dc71d23af..0330caa7c 100644 --- a/xrspatial/geotiff/_writers/vrt.py +++ b/xrspatial/geotiff/_writers/vrt.py @@ -22,7 +22,8 @@ def _build_vrt(path: str = _VRT_PATH_MISSING_SENTINEL, relative: bool = True, crs: int | str | None = None, crs_wkt: str | None = _CRS_WKT_DEPRECATED_SENTINEL, - nodata: float | int | None = None) -> str: + nodata: float | int | None = None, + dst_offsets: list[tuple[int, int]] | None = None) -> str: """Generate a VRT file that mosaics multiple GeoTIFF tiles. [internal-only] This helper backs ``to_geotiff``'s ``.vrt`` write @@ -92,6 +93,14 @@ def _build_vrt(path: str = _VRT_PATH_MISSING_SENTINEL, Integer sentinels (e.g. ``65535`` for uint16, ``-9999`` for int32) are accepted so the surface lines up with the ``nodata`` kwarg on ``to_geotiff`` and ``_write_geotiff_gpu``. + dst_offsets : list of (int, int) or None, optional + [internal-only] Explicit pixel-space placement for + non-georeferenced sources, one ``(x_off, y_off)`` pair per + source file. Forwarded to ``_vrt.write_vrt``; see its docstring + for the contract. ``to_geotiff``'s ``.vrt`` path passes this + for non-georeferenced input so the index places each tile by + its pixel offset instead of the identity transform every such + tile carries (issue #3116). Returns ------- @@ -219,4 +228,5 @@ def _build_vrt(path: str = _VRT_PATH_MISSING_SENTINEL, relative=relative, crs_wkt=resolved_wkt, nodata=nodata, + dst_offsets=dst_offsets, ) diff --git a/xrspatial/geotiff/tests/test_polish.py b/xrspatial/geotiff/tests/test_polish.py index fee8eaa53..d8cb18b8d 100644 --- a/xrspatial/geotiff/tests/test_polish.py +++ b/xrspatial/geotiff/tests/test_polish.py @@ -67,13 +67,14 @@ def test_read_geotiff_dask_handles_vrt_directly(self, tmp_path): arr = np.arange(64, dtype=np.float32).reshape(8, 8) a_path = str(tmp_path / 'a_1488.tif') b_path = str(tmp_path / 'b_1488.tif') - # Two tiles side-by-side via geo-transform attrs would normally be - # generated upstream; for this test we just need both files to - # share a CRS and the writer's default transform. + # The tiles carry no georeferencing, so the writer needs the + # side-by-side layout spelled out: write_vrt now refuses + # multiple non-georeferenced sources without explicit placement + # (issue #3116) instead of stacking them at the origin. write(arr, a_path, compression='none') write(arr, b_path, compression='none') vrt_path = str(tmp_path / 'mosaic_1488.vrt') - wv(vrt_path, [a_path, b_path]) + wv(vrt_path, [a_path, b_path], dst_offsets=[(0, 0), (8, 0)]) result = _read_geotiff_dask(vrt_path, chunks=8) assert result.dims == ('y', 'x') diff --git a/xrspatial/geotiff/tests/vrt/test_non_georef_placement_3116.py b/xrspatial/geotiff/tests/vrt/test_non_georef_placement_3116.py new file mode 100644 index 000000000..8ea40b451 --- /dev/null +++ b/xrspatial/geotiff/tests/vrt/test_non_georef_placement_3116.py @@ -0,0 +1,227 @@ +"""Non-georeferenced VRT tile placement (issue #3116). + +``to_geotiff(da, 'out.vrt', tile_size=N)`` on a non-georeferenced array +spanning more than one tile used to write a corrupt index: placement was +derived from each tile's GeoTransform, and non-georef tiles all carry +the default identity transform, so every ``DstRect`` landed at the +origin and ``rasterXSize`` / ``rasterYSize`` shrank to one tile. Reading +the VRT back silently returned a single tile's data. + +The fix threads each tile's pixel offsets from ``_write_vrt_tiled`` +through ``_build_vrt`` to ``write_vrt`` (``dst_offsets``), and makes +``write_vrt`` refuse multiple non-georeferenced sources without +explicit placement. +""" +from __future__ import annotations + +import importlib.util +import pathlib + +import numpy as np +import pytest +import xarray as xr + +from xrspatial.geotiff import open_geotiff, to_geotiff +from xrspatial.geotiff._attrs import GEOREF_STATUS_NONE +from xrspatial.geotiff._vrt import write_vrt as _write_vrt_internal + + +def _gpu_available() -> bool: + if importlib.util.find_spec("cupy") is None: + return False + try: + import cupy + return bool(cupy.cuda.is_available()) + except Exception: + return False + + +_HAS_GPU = _gpu_available() + +# Reader backends: kwargs for open_geotiff. GPU entries skip when no +# CUDA device is present. +_BACKENDS = [ + pytest.param({}, id="numpy"), + pytest.param({"chunks": 10}, id="dask-numpy"), + pytest.param({"gpu": True}, id="cupy", marks=pytest.mark.skipif( + not _HAS_GPU, reason="cupy + CUDA required")), + pytest.param({"gpu": True, "chunks": 10}, id="dask-cupy", + marks=pytest.mark.skipif( + not _HAS_GPU, reason="cupy + CUDA required")), +] + + +def _materialise(da): + data = da.data + if hasattr(data, "compute"): + data = data.compute() + if hasattr(data, "get"): + data = data.get() + return np.asarray(data) + + +_DATA = np.arange(24 * 32, dtype=np.uint8).reshape(24, 32) + + +@pytest.mark.parametrize("reader_kwargs", _BACKENDS) +def test_non_georef_multi_tile_round_trip(tmp_path, reader_kwargs): + """A 24x32 non-georef array tiled at 16 spans a 2x2 tile grid; the + round trip must return the full array with integer pixel coords.""" + vrt_path = str(tmp_path / "ng_multi_3116.vrt") + to_geotiff(xr.DataArray(_DATA, dims=("y", "x")), vrt_path, tile_size=16) + + out = open_geotiff(vrt_path, **reader_kwargs) + + assert out.shape == _DATA.shape + np.testing.assert_array_equal(_materialise(out), _DATA) + np.testing.assert_array_equal(out.y.values, np.arange(24)) + np.testing.assert_array_equal(out.x.values, np.arange(32)) + assert out.attrs.get("georef_status") == GEOREF_STATUS_NONE + assert "transform" not in out.attrs + + +def test_non_georef_index_places_tiles_by_pixel_offset(tmp_path): + """The emitted XML must size the mosaic at the full array and give + each tile its own DstRect offset.""" + vrt_path = str(tmp_path / "ng_xml_3116.vrt") + to_geotiff(xr.DataArray(_DATA, dims=("y", "x")), vrt_path, tile_size=16) + + xml = pathlib.Path(vrt_path).read_text() + assert 'rasterXSize="32" rasterYSize="24"' in xml + for rect in ('', + '', + '', + ''): + assert rect in xml, f"missing {rect}" + + +def test_non_georef_dask_backed_write_round_trip(tmp_path): + """The dask streaming write tiles by chunk grid; placement must + follow the chunk offsets.""" + da_mod = pytest.importorskip("dask.array") + src = xr.DataArray(da_mod.from_array(_DATA, chunks=(16, 16)), + dims=("y", "x")) + vrt_path = str(tmp_path / "ng_dask_3116.vrt") + to_geotiff(src, vrt_path, tile_size=16) + + out = open_geotiff(vrt_path) + assert out.shape == _DATA.shape + np.testing.assert_array_equal(_materialise(out), _DATA) + + +def test_non_georef_plain_ndarray_write_round_trip(tmp_path): + """A bare ndarray write takes the non-DataArray branch of + ``_write_vrt_tiled`` and is just as non-georeferenced.""" + vrt_path = str(tmp_path / "ng_plain_3116.vrt") + to_geotiff(xr.DataArray(_DATA, dims=("y", "x")).data, vrt_path, + tile_size=16) + + out = open_geotiff(vrt_path) + assert out.shape == _DATA.shape + np.testing.assert_array_equal(_materialise(out), _DATA) + + +def test_non_georef_single_tile_still_works(tmp_path): + """One tile needs no placement; the #2966 behaviour is unchanged.""" + small = _DATA[:10, :12] + vrt_path = str(tmp_path / "ng_single_3116.vrt") + to_geotiff(xr.DataArray(small, dims=("y", "x")), vrt_path, tile_size=16) + + out = open_geotiff(vrt_path) + assert out.shape == small.shape + np.testing.assert_array_equal(_materialise(out), small) + assert out.attrs.get("georef_status") == GEOREF_STATUS_NONE + + +def test_georef_multi_tile_placement_unchanged(tmp_path): + """Georeferenced tiles keep placing via their GeoTransform; the + placement refactor must not move them.""" + h, w = 24, 32 + src = xr.DataArray( + _DATA, dims=("y", "x"), + coords={"y": 4000.0 - 5.0 * (np.arange(h) + 0.5), + "x": 100.0 + 5.0 * (np.arange(w) + 0.5)}, + attrs={"crs": 32633, + "transform": (5.0, 0.0, 100.0, 0.0, -5.0, 4000.0)}) + vrt_path = str(tmp_path / "geo_multi_3116.vrt") + to_geotiff(src, vrt_path, tile_size=16) + + out = open_geotiff(vrt_path) + assert out.shape == (h, w) + np.testing.assert_array_equal(_materialise(out), _DATA) + np.testing.assert_allclose(out.y.values, src.y.values) + np.testing.assert_allclose(out.x.values, src.x.values) + assert out.attrs["transform"] == (5.0, 0.0, 100.0, 0.0, -5.0, 4000.0) + + +# --------------------------------------------------------------------------- +# write_vrt-level contract for dst_offsets +# --------------------------------------------------------------------------- + + +def _write_plain_tile(tmp_path, name, arr): + path = str(tmp_path / name) + to_geotiff(xr.DataArray(arr, dims=("y", "x")), path) + return path + + +def _write_georef_tile(tmp_path, name, arr): + h, w = arr.shape + path = str(tmp_path / name) + to_geotiff( + xr.DataArray( + arr, dims=("y", "x"), + coords={"y": 100.0 - (np.arange(h) + 0.5), + "x": np.arange(w) + 0.5}, + attrs={"crs": 4326, + "transform": (1.0, 0.0, 0.0, 0.0, -1.0, 100.0)}), + path) + return path + + +def test_write_vrt_multiple_non_georef_without_offsets_raises(tmp_path): + a = _write_plain_tile(tmp_path, "a_3116.tif", _DATA[:8, :8]) + b = _write_plain_tile(tmp_path, "b_3116.tif", _DATA[:8, 8:16]) + with pytest.raises(ValueError, match="dst_offsets"): + _write_vrt_internal(str(tmp_path / "amb_3116.vrt"), [a, b]) + + +def test_write_vrt_explicit_offsets_place_non_georef_sources(tmp_path): + a = _write_plain_tile(tmp_path, "left_3116.tif", _DATA[:8, :8]) + b = _write_plain_tile(tmp_path, "right_3116.tif", _DATA[:8, 8:16]) + vrt_path = _write_vrt_internal( + str(tmp_path / "placed_3116.vrt"), [a, b], + dst_offsets=[(0, 0), (8, 0)]) + + out = open_geotiff(vrt_path) + assert out.shape == (8, 16) + np.testing.assert_array_equal(_materialise(out), _DATA[:8, :16]) + + +def test_write_vrt_dst_offsets_with_georef_source_raises(tmp_path): + a = _write_georef_tile(tmp_path, "geo_a_3116.tif", _DATA[:8, :8]) + with pytest.raises(ValueError, match="non-georeferenced"): + _write_vrt_internal(str(tmp_path / "geo_off_3116.vrt"), [a], + dst_offsets=[(0, 0)]) + + +def test_write_vrt_dst_offsets_length_mismatch_raises(tmp_path): + a = _write_plain_tile(tmp_path, "len_a_3116.tif", _DATA[:8, :8]) + b = _write_plain_tile(tmp_path, "len_b_3116.tif", _DATA[:8, 8:16]) + with pytest.raises(ValueError, match="entries"): + _write_vrt_internal(str(tmp_path / "len_3116.vrt"), [a, b], + dst_offsets=[(0, 0)]) + + +@pytest.mark.parametrize("bad", [ + (0,), # wrong arity + (-1, 0), # negative + (True, 0), # bool masquerading as int + (0.0, 0), # float + "00", # not a pair at all +]) +def test_write_vrt_dst_offsets_bad_pair_raises(tmp_path, bad): + a = _write_plain_tile(tmp_path, "bad_a_3116.tif", _DATA[:8, :8]) + with pytest.raises(ValueError, match="dst_offsets"): + _write_vrt_internal(str(tmp_path / "bad_3116.vrt"), [a], + dst_offsets=[bad])