Keep integer dtype for empty reproject chunks (#3096)#3133
Open
brendancol wants to merge 4 commits into
Open
Conversation
The empty-chunk fills in _reproject_chunk_numpy, _reproject_chunk_cupy, and the footprint-skip path in _reproject_block_adapter were hardcoded to float64. With an integer source, one no-overlap output chunk promoted the whole computed dask array to float64 while the lazy array (and the eager backend) advertised the integer dtype. Fill empty chunks with the source integer dtype instead; the resolved nodata is guaranteed representable for integer rasters (#2185/#2572). Float sources keep float64 fills.
brendancol
commented
Jun 9, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: Keep integer dtype for empty reproject chunks (#3096)
Blockers (must fix before merge)
None found.
Suggestions (should fix, not blocking)
None.
Nits (optional improvements)
- In
_reproject_block_adapter, the 2-D footprint-skip path on dask+cupy still returns a numpy fill while the data chunks are cupy arrays (xrspatial/reproject/__init__.py:2003, pre-existing; the 3-D branch does returncp.full). The dask meta is numpy either way, so the stitched result behaves, but the array-type mix is worth a follow-up note if dask+cupy empty chunks ever grow a device-side consumer. Out of scope here since this PR only changes the fill dtype. - A non-integral float nodata on an integer raster (e.g. explicit
nodata=0.5with int16) would truncate silently in the newnp.full(..., dtype=int16)fills. The data path's round-and-clip cast already collapses such a sentinel the same way, so this PR doesn't make it worse, and_detect_nodatarange-checks but doesn't integrality-check. Pre-existing input-validation gap, not introduced here.
What looks good
- All eleven empty-fill sites got the same treatment: four in
_reproject_chunk_numpy, five in_reproject_chunk_cupy, and the footprint-skip path in_reproject_block_adapter, so the fix can't be dodged by which early-return a chunk happens to hit. - The dtype rule matches the template logic in
_reproject_daskand the eager cast-back exactly (np.issubdtype(..., np.integer)), so bool and float sources keep float64 fills and nothing else shifts. - Leaning on the #2185/#2572 guarantee that integer rasters always resolve an in-range nodata is the right call; the integer fill is safe by construction and the comment says why.
- Tests pin the contract from four angles: dask advertised-vs-computed dtype with forced empty chunks (plus a sentinel check in the empty corner and a has-real-data guard so the test can't pass vacuously), the eager fully-no-overlap case, a float64 regression guard, and a dask+cupy run.
- Merge paths are unaffected: the canonicalize step already promotes via
np.where(arr == r_nd, np.nan, arr), which handles integer fills the same as the old float64 ones.
Checklist
- Algorithm unchanged (fill dtype only; values identical)
- All implemented backends produce consistent results (numpy, dask+numpy, cupy worker, dask+cupy covered)
- NaN handling is correct (float sources keep NaN fills, guarded by test)
- Edge cases are covered by tests (empty corners, fully-no-overlap output, float guard)
- Dask chunk boundaries handled correctly (per-chunk fills now match the template dtype)
- No premature materialization or unnecessary copies
- Benchmark not needed (no hot-path change; fills were already allocated)
- README feature matrix unchanged (no API or backend-support change)
- Docstrings/comments explain the why, with issue references
The 2-D footprint-skip path returned a numpy fill on dask+cupy while the 3-D branch and the data chunks return cupy arrays. Unify on cp.full when is_cupy so empty blocks match the rest of the graph.
brendancol
commented
Jun 9, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
Follow-up review (after 6a24085)
The first nit from the previous review is addressed: the 2-D footprint-skip path in _reproject_block_adapter now returns cp.full on dask+cupy, matching the 3-D branch and the data chunks. Full reproject suite re-run on a GPU host: 436 + 12 passed.
Remaining item, dismissed with reason: the non-integral-float-nodata-on-integer-raster truncation is a pre-existing validation gap in _detect_nodata (unmodified by this PR), and the data path's round-and-clip cast already collapses such a sentinel identically. No change needed here.
No new findings.
…eproject-2026-06-09-02 # Conflicts: # .claude/sweep-accuracy-state.csv
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3096
_reproject_chunk_numpy,_reproject_chunk_cupy, and the footprint-skip path in_reproject_block_adapternow use the source's integer dtype instead of a hardcoded float64. One no-overlap chunk used to promote the whole computed dask array to float64 while the lazy array (and the eager backend) advertised int16/uint8/etc. Float sources keep float64 fills.TestEmptyChunkDtype: dask+numpy int16 with forced empty chunks (advertised == computed == int16, sentinel in the empty corners), eager numpy with a fully no-overlap output grid, a float64 guard, and a dask+cupy variant.Backend coverage: numpy, cupy, dask+numpy, dask+cupy (the cupy worker and block adapter paths got the same fix).
Test plan:
pytest xrspatial/tests/test_reproject.py -k EmptyChunkDtype(4 passed on a GPU host)pytest xrspatial/tests/test_reproject.py(436 passed)pytest xrspatial/tests/test_reproject_coverage_2026_05_27.py xrspatial/tests/test_reproject_cupy_gate_2564.py(12 passed)