Skip to content

reproject: dask+cupy fast path drops integer dtype to float64 #2505

@brendancol

Description

@brendancol

Summary

xrspatial.reproject.reproject() silently changes the output dtype to float64 when the input is a dask-backed cupy array with an integer dtype. The other three backends (numpy, cupy, dask+numpy) and the chunked dask+cupy fallback all preserve the source integer dtype.

Reproducer

import numpy as np
import xarray as xr
import cupy as cp
import dask.array as da
from xrspatial.reproject import reproject

data = np.arange(64, dtype=np.int16).reshape(8, 8)
raster = xr.DataArray(
    da.from_array(cp.asarray(data), chunks=(4, 4)),
    dims=['y', 'x'],
    coords={'y': np.linspace(5, -5, 8), 'x': np.linspace(-5, 5, 8)},
    attrs={'crs': 'EPSG:4326', 'nodata': -32768},
)
result = reproject(raster, 'EPSG:4326', resolution=1.0)
print(result.dtype)  # float64  <-- expected int16

Comparison across backends with the same int16 input:

Backend Output dtype
numpy int16
dask+numpy int16
cupy (eager) int16
dask+cupy float64

Root cause

_reproject_dask_cupy in xrspatial/reproject/__init__.py enters an eager fast path when the output fits in GPU memory. That path allocates the output with cp.full(out_shape, nodata, dtype=cp.float64) and writes _resample_cupy_native / _resample_cupy chunks into it without ever casting back to the source dtype.

The chunked fallback (_reproject_dask(is_cupy=True)) already gets this right because _reproject_dask computes out_dtype from raster.dtype and _reproject_chunk_cupy clamps each chunk back to the source integer dtype.

Proposed fix

In _reproject_dask_cupy:

  1. Compute out_dtype the same way _reproject_dask does — integer source keeps its dtype, float source stays float64.
  2. Allocate result = cp.full(out_shape, nodata, dtype=out_dtype).
  3. After each chunk resample, clamp and cast back to out_dtype when the source was integer (mirroring _reproject_chunk_cupy).

Why this matters

A user reading result.attrs['nodata'] == -32768 while the array is float64 has metadata that disagrees with the data on cross-backend round-trips (Cat 5: backend-inconsistent metadata). Memory blowup for large integer rasters on GPU is a secondary concern.

Test coverage

Add a TestDaskCupyDtypeParity class (gated on CUDA + cupy + dask) mirroring TestDaskDtypeParity:

  • test_dask_cupy_reproject_int8_preserves_dtype
  • test_dask_cupy_reproject_uint16_preserves_dtype
  • test_dask_cupy_reproject_float32_stays_float64

Metadata

Metadata

Assignees

No one assigned

    Labels

    backend-coverageAdding missing dask/cupy/dask+cupy backend supportbugSomething isn't workinggpuCuPy / CUDA GPU support

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions