Skip to content

resample: drops non-dim coords (spatial_ref) and leaves stale attrs['nodata'] on 2D path #2542

@brendancol

Description

@brendancol

Summary

xrspatial.resample() has two metadata-propagation gaps on the 2D non-identity code path that combine to corrupt chained pipelines built on rioxarray conventions.

1. Non-dimension coords are silently dropped (Cat 2)

The 2D path rebuilds the output xr.DataArray with only the spatial dim-coords:

result = xr.DataArray(
    result_data,
    name=name,
    dims=agg.dims,
    coords={ydim: new_y, xdim: new_x},
    attrs=new_attrs,
)

Any extra non-dim coord on the input -- most notably rioxarray's spatial_ref
scalar coord, but also scalar time / band selectors -- is gone from the output.

A user pipeline like:

agg = rioxarray.open_rasterio('dem.tif').squeeze()
out = resample(agg, scale_factor=0.5)
out.rio.crs  # -> None (was EPSG:32633)

silently loses the CRS even though attrs['crs'] round-trips fine.

The identity path (scale_factor=1.0) uses agg.copy() and preserves the
coord, and the 3D path (per-band recursion + xr.concat) also happens to
preserve it. Only the 2D non-identity path is broken, so the bug is also
path-inconsistent (Cat 5).

2. attrs['nodata'] is read but never refreshed (Cat 4)

_resolve_nodata() reads the nodata sentinel from three sources:

for key in ('_FillValue', 'nodata'):
    v = agg.attrs.get(key)
    if v is not None:
        nodata = v
        break

After resampling, sentinel pixels are replaced with NaN. The output post-processing
refreshes _FillValue and nodatavals to NaN but leaves attrs['nodata'] at its
old finite value:

Input  attrs: {'nodata': -9999, 'res': (1.0, 1.0)}
Output attrs: {'nodata': -9999, '_FillValue': nan, 'res': (2.0, 2.0)}

A downstream consumer that trusts attrs['nodata'] will mask the wrong cells.

Repro

import numpy as np
import xarray as xr
from xrspatial.resample import resample

# Bug 1: spatial_ref dropped
agg = xr.DataArray(
    np.arange(16, dtype=np.float32).reshape(4, 4),
    dims=('y', 'x'),
    coords={'y': np.linspace(3, 0, 4),
            'x': np.linspace(0, 3, 4),
            'spatial_ref': 0},
    attrs={'res': (1.0, 1.0), 'crs': 'EPSG:4326'},
)
out = resample(agg, scale_factor=0.5)
assert 'spatial_ref' not in out.coords  # bug

# Bug 2: stale nodata attr
data = np.array([[-9999, -9999, 10, 10],
                 [-9999, -9999, 10, 10],
                 [20, 20, 30, 30],
                 [20, 20, 30, 30]], dtype=np.float32)
agg = xr.DataArray(
    data, dims=('y', 'x'),
    coords={'y': np.linspace(3, 0, 4), 'x': np.linspace(0, 3, 4)},
    attrs={'res': (1.0, 1.0), 'nodata': -9999},
)
out = resample(agg, scale_factor=0.5, method='nearest')
assert out.attrs['nodata'] == -9999  # bug; data now has NaN there

Proposed fix

In the 2D non-identity path in resample():

  1. After building the new dim-coord dict, fold in input non-dim coords whose
    dims are a subset of (ydim, xdim) or empty (scalar coords) so things
    like spatial_ref, scalar time, and scalar band round-trip.
  2. When has_nodata is true, also refresh attrs['nodata'] to NaN (matching
    the existing _FillValue / nodatavals refresh).

This brings the 2D path in line with the identity path's coord preservation
and closes the attrs['nodata'] half of the sentinel refresh.

Backend coverage

Fix lives in the single xr.DataArray constructor at the end of resample()
so it applies uniformly to numpy / cupy / dask+numpy / dask+cupy.

Discovered via

Deep-sweep metadata propagation audit (sweep-metadata, 2026-05-27).

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions