Skip to content

Keep integer dtype for empty reproject chunks (#3096)#3133

Open
brendancol wants to merge 4 commits into
mainfrom
deep-sweep-accuracy-reproject-2026-06-09-02
Open

Keep integer dtype for empty reproject chunks (#3096)#3133
brendancol wants to merge 4 commits into
mainfrom
deep-sweep-accuracy-reproject-2026-06-09-02

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3096

  • Empty-chunk fills in _reproject_chunk_numpy, _reproject_chunk_cupy, and the footprint-skip path in _reproject_block_adapter now use the source's integer dtype instead of a hardcoded float64. One no-overlap chunk used to promote the whole computed dask array to float64 while the lazy array (and the eager backend) advertised int16/uint8/etc. Float sources keep float64 fills.
  • The resolved nodata is always representable for integer rasters (reproject: integer rasters silently corrupt nodata when default NaN is used #2185/reproject: out-of-range explicit nodata corrupts integer output and contradicts attrs #2572), so the integer fill is safe.
  • Adds TestEmptyChunkDtype: dask+numpy int16 with forced empty chunks (advertised == computed == int16, sentinel in the empty corners), eager numpy with a fully no-overlap output grid, a float64 guard, and a dask+cupy variant.
  • Also carries the accuracy-sweep state CSV update for the reproject module.

Backend coverage: numpy, cupy, dask+numpy, dask+cupy (the cupy worker and block adapter paths got the same fix).

Test plan:

  • pytest xrspatial/tests/test_reproject.py -k EmptyChunkDtype (4 passed on a GPU host)
  • pytest xrspatial/tests/test_reproject.py (436 passed)
  • pytest xrspatial/tests/test_reproject_coverage_2026_05_27.py xrspatial/tests/test_reproject_cupy_gate_2564.py (12 passed)

The empty-chunk fills in _reproject_chunk_numpy, _reproject_chunk_cupy,
and the footprint-skip path in _reproject_block_adapter were hardcoded
to float64. With an integer source, one no-overlap output chunk
promoted the whole computed dask array to float64 while the lazy array
(and the eager backend) advertised the integer dtype. Fill empty
chunks with the source integer dtype instead; the resolved nodata is
guaranteed representable for integer rasters (#2185/#2572). Float
sources keep float64 fills.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Keep integer dtype for empty reproject chunks (#3096)

Blockers (must fix before merge)

None found.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • In _reproject_block_adapter, the 2-D footprint-skip path on dask+cupy still returns a numpy fill while the data chunks are cupy arrays (xrspatial/reproject/__init__.py:2003, pre-existing; the 3-D branch does return cp.full). The dask meta is numpy either way, so the stitched result behaves, but the array-type mix is worth a follow-up note if dask+cupy empty chunks ever grow a device-side consumer. Out of scope here since this PR only changes the fill dtype.
  • A non-integral float nodata on an integer raster (e.g. explicit nodata=0.5 with int16) would truncate silently in the new np.full(..., dtype=int16) fills. The data path's round-and-clip cast already collapses such a sentinel the same way, so this PR doesn't make it worse, and _detect_nodata range-checks but doesn't integrality-check. Pre-existing input-validation gap, not introduced here.

What looks good

  • All eleven empty-fill sites got the same treatment: four in _reproject_chunk_numpy, five in _reproject_chunk_cupy, and the footprint-skip path in _reproject_block_adapter, so the fix can't be dodged by which early-return a chunk happens to hit.
  • The dtype rule matches the template logic in _reproject_dask and the eager cast-back exactly (np.issubdtype(..., np.integer)), so bool and float sources keep float64 fills and nothing else shifts.
  • Leaning on the #2185/#2572 guarantee that integer rasters always resolve an in-range nodata is the right call; the integer fill is safe by construction and the comment says why.
  • Tests pin the contract from four angles: dask advertised-vs-computed dtype with forced empty chunks (plus a sentinel check in the empty corner and a has-real-data guard so the test can't pass vacuously), the eager fully-no-overlap case, a float64 regression guard, and a dask+cupy run.
  • Merge paths are unaffected: the canonicalize step already promotes via np.where(arr == r_nd, np.nan, arr), which handles integer fills the same as the old float64 ones.

Checklist

  • Algorithm unchanged (fill dtype only; values identical)
  • All implemented backends produce consistent results (numpy, dask+numpy, cupy worker, dask+cupy covered)
  • NaN handling is correct (float sources keep NaN fills, guarded by test)
  • Edge cases are covered by tests (empty corners, fully-no-overlap output, float guard)
  • Dask chunk boundaries handled correctly (per-chunk fills now match the template dtype)
  • No premature materialization or unnecessary copies
  • Benchmark not needed (no hot-path change; fills were already allocated)
  • README feature matrix unchanged (no API or backend-support change)
  • Docstrings/comments explain the why, with issue references

The 2-D footprint-skip path returned a numpy fill on dask+cupy while
the 3-D branch and the data chunks return cupy arrays. Unify on
cp.full when is_cupy so empty blocks match the rest of the graph.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 6a24085)

The first nit from the previous review is addressed: the 2-D footprint-skip path in _reproject_block_adapter now returns cp.full on dask+cupy, matching the 3-D branch and the data chunks. Full reproject suite re-run on a GPU host: 436 + 12 passed.

Remaining item, dismissed with reason: the non-integral-float-nodata-on-integer-raster truncation is a pre-existing validation gap in _detect_nodata (unmodified by this PR), and the data path's round-and-clip cast already collapses such a sentinel identically. No change needed here.

No new findings.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026
…eproject-2026-06-09-02

# Conflicts:
#	.claude/sweep-accuracy-state.csv
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject: dask backend computes float64 for integer rasters when output chunks miss the source footprint

1 participant