Summary
xrspatial.resample() has two metadata-propagation gaps on the 2D non-identity code path that combine to corrupt chained pipelines built on rioxarray conventions.
1. Non-dimension coords are silently dropped (Cat 2)
The 2D path rebuilds the output xr.DataArray with only the spatial dim-coords:
result = xr.DataArray(
result_data,
name=name,
dims=agg.dims,
coords={ydim: new_y, xdim: new_x},
attrs=new_attrs,
)
Any extra non-dim coord on the input -- most notably rioxarray's spatial_ref
scalar coord, but also scalar time / band selectors -- is gone from the output.
A user pipeline like:
agg = rioxarray.open_rasterio('dem.tif').squeeze()
out = resample(agg, scale_factor=0.5)
out.rio.crs # -> None (was EPSG:32633)
silently loses the CRS even though attrs['crs'] round-trips fine.
The identity path (scale_factor=1.0) uses agg.copy() and preserves the
coord, and the 3D path (per-band recursion + xr.concat) also happens to
preserve it. Only the 2D non-identity path is broken, so the bug is also
path-inconsistent (Cat 5).
2. attrs['nodata'] is read but never refreshed (Cat 4)
_resolve_nodata() reads the nodata sentinel from three sources:
for key in ('_FillValue', 'nodata'):
v = agg.attrs.get(key)
if v is not None:
nodata = v
break
After resampling, sentinel pixels are replaced with NaN. The output post-processing
refreshes _FillValue and nodatavals to NaN but leaves attrs['nodata'] at its
old finite value:
Input attrs: {'nodata': -9999, 'res': (1.0, 1.0)}
Output attrs: {'nodata': -9999, '_FillValue': nan, 'res': (2.0, 2.0)}
A downstream consumer that trusts attrs['nodata'] will mask the wrong cells.
Repro
import numpy as np
import xarray as xr
from xrspatial.resample import resample
# Bug 1: spatial_ref dropped
agg = xr.DataArray(
np.arange(16, dtype=np.float32).reshape(4, 4),
dims=('y', 'x'),
coords={'y': np.linspace(3, 0, 4),
'x': np.linspace(0, 3, 4),
'spatial_ref': 0},
attrs={'res': (1.0, 1.0), 'crs': 'EPSG:4326'},
)
out = resample(agg, scale_factor=0.5)
assert 'spatial_ref' not in out.coords # bug
# Bug 2: stale nodata attr
data = np.array([[-9999, -9999, 10, 10],
[-9999, -9999, 10, 10],
[20, 20, 30, 30],
[20, 20, 30, 30]], dtype=np.float32)
agg = xr.DataArray(
data, dims=('y', 'x'),
coords={'y': np.linspace(3, 0, 4), 'x': np.linspace(0, 3, 4)},
attrs={'res': (1.0, 1.0), 'nodata': -9999},
)
out = resample(agg, scale_factor=0.5, method='nearest')
assert out.attrs['nodata'] == -9999 # bug; data now has NaN there
Proposed fix
In the 2D non-identity path in resample():
- After building the new dim-coord dict, fold in input non-dim coords whose
dims are a subset of (ydim, xdim) or empty (scalar coords) so things
like spatial_ref, scalar time, and scalar band round-trip.
- When
has_nodata is true, also refresh attrs['nodata'] to NaN (matching
the existing _FillValue / nodatavals refresh).
This brings the 2D path in line with the identity path's coord preservation
and closes the attrs['nodata'] half of the sentinel refresh.
Backend coverage
Fix lives in the single xr.DataArray constructor at the end of resample()
so it applies uniformly to numpy / cupy / dask+numpy / dask+cupy.
Discovered via
Deep-sweep metadata propagation audit (sweep-metadata, 2026-05-27).
Summary
xrspatial.resample()has two metadata-propagation gaps on the 2D non-identity code path that combine to corrupt chained pipelines built on rioxarray conventions.1. Non-dimension coords are silently dropped (Cat 2)
The 2D path rebuilds the output
xr.DataArraywith only the spatial dim-coords:Any extra non-dim coord on the input -- most notably rioxarray's
spatial_refscalar coord, but also scalar
time/bandselectors -- is gone from the output.A user pipeline like:
silently loses the CRS even though
attrs['crs']round-trips fine.The identity path (
scale_factor=1.0) usesagg.copy()and preserves thecoord, and the 3D path (per-band recursion +
xr.concat) also happens topreserve it. Only the 2D non-identity path is broken, so the bug is also
path-inconsistent (Cat 5).
2.
attrs['nodata']is read but never refreshed (Cat 4)_resolve_nodata()reads the nodata sentinel from three sources:After resampling, sentinel pixels are replaced with NaN. The output post-processing
refreshes
_FillValueandnodatavalsto NaN but leavesattrs['nodata']at itsold finite value:
A downstream consumer that trusts
attrs['nodata']will mask the wrong cells.Repro
Proposed fix
In the 2D non-identity path in
resample():dims are a subset of
(ydim, xdim)or empty (scalar coords) so thingslike
spatial_ref, scalartime, and scalarbandround-trip.has_nodatais true, also refreshattrs['nodata']to NaN (matchingthe existing
_FillValue/nodatavalsrefresh).This brings the 2D path in line with the identity path's coord preservation
and closes the
attrs['nodata']half of the sentinel refresh.Backend coverage
Fix lives in the single
xr.DataArrayconstructor at the end ofresample()so it applies uniformly to numpy / cupy / dask+numpy / dask+cupy.
Discovered via
Deep-sweep metadata propagation audit (sweep-metadata, 2026-05-27).