Summary
xrspatial.reproject.reproject() silently changes the output dtype to float64 when the input is a dask-backed cupy array with an integer dtype. The other three backends (numpy, cupy, dask+numpy) and the chunked dask+cupy fallback all preserve the source integer dtype.
Reproducer
import numpy as np
import xarray as xr
import cupy as cp
import dask.array as da
from xrspatial.reproject import reproject
data = np.arange(64, dtype=np.int16).reshape(8, 8)
raster = xr.DataArray(
da.from_array(cp.asarray(data), chunks=(4, 4)),
dims=['y', 'x'],
coords={'y': np.linspace(5, -5, 8), 'x': np.linspace(-5, 5, 8)},
attrs={'crs': 'EPSG:4326', 'nodata': -32768},
)
result = reproject(raster, 'EPSG:4326', resolution=1.0)
print(result.dtype) # float64 <-- expected int16
Comparison across backends with the same int16 input:
| Backend |
Output dtype |
| numpy |
int16 |
| dask+numpy |
int16 |
| cupy (eager) |
int16 |
| dask+cupy |
float64 |
Root cause
_reproject_dask_cupy in xrspatial/reproject/__init__.py enters an eager fast path when the output fits in GPU memory. That path allocates the output with cp.full(out_shape, nodata, dtype=cp.float64) and writes _resample_cupy_native / _resample_cupy chunks into it without ever casting back to the source dtype.
The chunked fallback (_reproject_dask(is_cupy=True)) already gets this right because _reproject_dask computes out_dtype from raster.dtype and _reproject_chunk_cupy clamps each chunk back to the source integer dtype.
Proposed fix
In _reproject_dask_cupy:
- Compute
out_dtype the same way _reproject_dask does — integer source keeps its dtype, float source stays float64.
- Allocate
result = cp.full(out_shape, nodata, dtype=out_dtype).
- After each chunk resample, clamp and cast back to
out_dtype when the source was integer (mirroring _reproject_chunk_cupy).
Why this matters
A user reading result.attrs['nodata'] == -32768 while the array is float64 has metadata that disagrees with the data on cross-backend round-trips (Cat 5: backend-inconsistent metadata). Memory blowup for large integer rasters on GPU is a secondary concern.
Test coverage
Add a TestDaskCupyDtypeParity class (gated on CUDA + cupy + dask) mirroring TestDaskDtypeParity:
test_dask_cupy_reproject_int8_preserves_dtype
test_dask_cupy_reproject_uint16_preserves_dtype
test_dask_cupy_reproject_float32_stays_float64
Summary
xrspatial.reproject.reproject()silently changes the output dtype tofloat64when the input is a dask-backed cupy array with an integer dtype. The other three backends (numpy, cupy, dask+numpy) and the chunked dask+cupy fallback all preserve the source integer dtype.Reproducer
Comparison across backends with the same int16 input:
Root cause
_reproject_dask_cupyinxrspatial/reproject/__init__.pyenters an eager fast path when the output fits in GPU memory. That path allocates the output withcp.full(out_shape, nodata, dtype=cp.float64)and writes_resample_cupy_native/_resample_cupychunks into it without ever casting back to the source dtype.The chunked fallback (
_reproject_dask(is_cupy=True)) already gets this right because_reproject_daskcomputesout_dtypefromraster.dtypeand_reproject_chunk_cupyclamps each chunk back to the source integer dtype.Proposed fix
In
_reproject_dask_cupy:out_dtypethe same way_reproject_daskdoes — integer source keeps its dtype, float source staysfloat64.result = cp.full(out_shape, nodata, dtype=out_dtype).out_dtypewhen the source was integer (mirroring_reproject_chunk_cupy).Why this matters
A user reading
result.attrs['nodata'] == -32768while the array is float64 has metadata that disagrees with the data on cross-backend round-trips (Cat 5: backend-inconsistent metadata). Memory blowup for large integer rasters on GPU is a secondary concern.Test coverage
Add a
TestDaskCupyDtypeParityclass (gated on CUDA + cupy + dask) mirroringTestDaskDtypeParity:test_dask_cupy_reproject_int8_preserves_dtypetest_dask_cupy_reproject_uint16_preserves_dtypetest_dask_cupy_reproject_float32_stays_float64