Skip to content

reproject: preserve integer dtype in dask+cupy fast path (#2505)#2509

Merged
brendancol merged 4 commits into
mainfrom
issue-2505
May 27, 2026
Merged

reproject: preserve integer dtype in dask+cupy fast path (#2505)#2509
brendancol merged 4 commits into
mainfrom
issue-2505

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #2505.

Summary

  • _reproject_dask_cupy eager fast path force-allocated the output buffer as cp.float64 regardless of source dtype, so int16/uint8/etc. inputs were silently promoted to float64 while every other backend kept the source dtype.
  • Compute out_dtype the same way _reproject_dask does (integer source keeps its dtype, float source stays float64), allocate the result with out_dtype, and clamp+cast each chunk before placement -- mirrors the per-band cast in _reproject_chunk_cupy.
  • New TestDaskCupyDtypeParity class with six tests covering int8 / int16 / uint8 / uint16 / float32 plus a direct dask+numpy vs dask+cupy cross-backend dtype check.

Backend coverage

Backend Before After
numpy int16 -> int16 unchanged
dask+numpy int16 -> int16 unchanged
cupy (eager) int16 -> int16 unchanged
dask+cupy int16 -> float64 int16 -> int16

Test plan

  • pytest xrspatial/tests/test_reproject.py::TestDaskCupyDtypeParity -- 6 passing
  • pytest xrspatial/tests/test_reproject.py -- 351 passing (345 baseline + 6 new)
  • New tests fail against pre-fix code, pass against fixed code

_reproject_dask_cupy's eager fast path force-allocated the output as
cp.float64 regardless of source dtype, so int16/uint8/etc. inputs were
silently promoted to float64. The other three backends (numpy, cupy,
dask+numpy) and the chunked dask+cupy fallback already preserve the
source integer dtype.

Compute out_dtype the same way _reproject_dask does (integer source ->
source dtype, float source -> float64), allocate the result buffer
with out_dtype, and clamp+cast each chunk before placement -- mirroring
the per-band cast in _reproject_chunk_cupy.
New TestDaskCupyDtypeParity class mirrors TestDaskDtypeParity. The six
new tests cover int8 / int16 / uint8 / uint16 / float32 inputs plus a
direct dask+numpy vs dask+cupy cross-backend check.

Each test fails on pre-fix code because the eager fast path in
_reproject_dask_cupy always allocated float64. The tests pass on the
fixed code where out_dtype tracks the source dtype.

Gated on HAS_DASK and HAS_CUPY since dask+cupy is required to hit the
specific code path.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 27, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: reproject: preserve integer dtype in dask+cupy fast path (#2505)

Blockers

None.

Suggestions

None.

Nits

  • The new TestDaskCupyDtypeParity tests check result.dtype only. The sibling TestDaskDtypeParity checks both result.data.dtype and result.compute().dtype. Since _reproject_dask_cupy returns an eager cupy ndarray (not a dask array), result.compute() is a no-op so the two forms are equivalent here. Fine as-is; add result.data.dtype if you want exact symmetry with the sibling class.

What looks good

  • Same pattern as _reproject_dask (lines 1839-1843) and _reproject_chunk_cupy (lines 609-611, 633). No new patterns introduced.
  • Clamp+cast on chunk_data runs at the per-chunk write level, so chunks skipped via continue retain the pre-filled out_dtype nodata.
  • The nodata value reaching _reproject_dask_cupy is already integer-compatible when the source is integer (resolved by _detect_nodata(..., dtype=raster.dtype) at line 851), so the cp.full(out_shape, nodata, dtype=int_dtype) allocation can't hit a NaN→int conversion.
  • New tests fail on pre-fix code and pass on fixed code (verified by temporarily reverting).
  • Six tests covering int8 / int16 / uint8 / uint16 / float32 plus a direct dask+numpy vs dask+cupy parity check.

Checklist

  • Pattern matches _reproject_dask and _reproject_chunk_cupy
  • All backends now produce consistent dtypes
  • NaN handling unchanged
  • Edge cases: signed, unsigned, float, skipped chunks
  • Dask chunk boundaries unaffected
  • No premature materialization
  • Benchmark not needed (correctness fix)
  • README feature matrix unaffected
  • Inline comments reference #2505

Mirror the assertion pattern used by TestDaskDtypeParity so the new
TestDaskCupyDtypeParity class checks both result.dtype and
result.data.dtype. The two are equivalent here because the eager fast
path returns a plain cupy ndarray, but the symmetry helps future
readers.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (#2505)

Addressed the nit from the previous review: TestDaskCupyDtypeParity now asserts both result.dtype and result.data.dtype, matching the symmetry used in TestDaskDtypeParity.

No outstanding findings. All 6 new tests still pass.

Disposition of original review:

  • Nit (assertion symmetry): fixed in bd45b82.

@brendancol brendancol merged commit 0fe3816 into main May 27, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject: dask+cupy fast path drops integer dtype to float64

1 participant