
reproject: pick integer nodata sentinel for int rasters (#2185) #2191

Merged: brendancol merged 2 commits into main from issue-2185 on May 20, 2026


@brendancol (Contributor)

Summary

Closes #2185.

  • _detect_nodata() takes an optional dtype hint. When the hint is an integer dtype and no nodata came from the user arg or attrs, it returns a dtype-appropriate sentinel (dtype.min for signed, dtype.max for unsigned) instead of NaN. Matches the rasterio/GDAL convention.
  • reproject() passes raster.dtype into _detect_nodata. merge() does the same when detecting each input's own sentinel.
  • New parametrized tests over int8/int16/int32/uint8/uint16 check that out-of-bounds pixels equal attrs['nodata'] exactly across numpy, dask+numpy, cupy, and dask+cupy. The GPU paths skip when cupy isn't installed.
  • One regression test reruns the exact repro from #2185 ("reproject: integer rasters silently corrupt nodata when default NaN is used") with warnings.simplefilter('error', RuntimeWarning), so the silent-cast warning that hid the bug now fails the test.
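The fallback in the first bullet can be sketched as follows. This is a minimal illustration, not xrspatial's code. `default_integer_nodata` and `detect_nodata` are hypothetical stand-ins for the private `_default_integer_nodata` and `_detect_nodata` helpers, and the lookup chain is cut down to the user argument plus `attrs['nodata']`. According to the review, the real chain also consults rio metadata.

```python
import numpy as np


def default_integer_nodata(dtype):
    """Hypothetical sentinel picker: dtype.min for signed ints, dtype.max for unsigned."""
    dtype = np.dtype(dtype)
    info = np.iinfo(dtype)
    # Signed integers reserve the most negative value; unsigned ones the largest.
    return float(info.min if np.issubdtype(dtype, np.signedinteger) else info.max)


def detect_nodata(nodata=None, attrs=None, dtype=None):
    """Simplified chain (assumption): explicit arg, then attrs, then a dtype-aware default."""
    if nodata is not None:
        return float(nodata)
    if attrs and attrs.get("nodata") is not None:
        return float(attrs["nodata"])
    # Nothing found: integer rasters get a sentinel, everything else keeps NaN.
    if dtype is not None and np.issubdtype(np.dtype(dtype), np.integer):
        return default_integer_nodata(dtype)
    return float("nan")
```

For example, `detect_nodata(dtype=np.int16)` gives `-32768.0`, `detect_nodata(dtype=np.uint8)` gives `255.0`, and a float32 raster still gets NaN. In this form, an explicit NaN in `attrs` is returned unchanged as NaN. The review below flags exactly that gap.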

Test plan

  • pytest xrspatial/tests/test_reproject.py: 284 passed locally
  • pytest xrspatial/tests/test_reproject.py -k "Integer or integer or 2185": 19 new tests pass

…#2185)

Integer rasters reprojected without an explicit nodata used to lose
out-of-bounds pixels to silent 0s while attrs['nodata'] kept claiming
NaN. The worker rounds and casts back to the input integer dtype, and
NaN doesn't survive that cast. Pass the raster dtype into
_detect_nodata so integer inputs get a dtype-appropriate sentinel
(dtype.min for signed, dtype.max for unsigned) following the
rasterio/GDAL convention.
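The failure mode described above can be reproduced with NumPy alone. The sketch below is illustrative: `cast_back` is a hypothetical stand-in for the worker's "round, then cast back to the input dtype" step, not the actual reproject worker.

```python
import warnings

import numpy as np


def cast_back(resampled, dtype):
    """Hypothetical worker tail: round the float result, then cast to the source dtype."""
    return np.round(resampled).astype(dtype)


src_dtype = np.int16
sentinel = float(np.iinfo(src_dtype).min)  # -32768.0, an exact int16 value

# Float working buffer; the last pixel fell outside the source bounds.
valid = np.array([10.4, 20.6, 0.0])
oob = np.array([False, False, True])

# Sentinel fill: survives rounding and the cast unchanged.
out_sentinel = cast_back(np.where(oob, sentinel, valid), src_dtype)

# NaN fill (the pre-fix default): NaN has no int16 representation, so the OOB
# pixel becomes whatever the platform's conversion yields (the issue saw 0s).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # mimic the silent cast
    out_nan = cast_back(np.where(oob, np.nan, valid), src_dtype)
```

Whatever integer ends up in `out_nan[-1]`, `attrs['nodata']` would still claim NaN. No downstream mask can recover those pixels.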

The same fix applies to merge's per-input nodata detection so mixed
integer inputs canonicalize correctly during the mosaic step.
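This sketch shows why per-input detection matters for `merge()`. It assumes that each input's own sentinel is mapped to NaN in a float working space before a first-valid mosaic. That is a guess about the internals, not a description of them, and `canonicalize` and `first_valid_mosaic` are hypothetical names. With the dtype hint, an int8 tile and a uint16 tile each get their own sentinel, so both mask correctly.

```python
import numpy as np


def canonicalize(arr, nodata):
    """Hypothetical: promote to float64 and replace this input's own sentinel with NaN."""
    out = arr.astype(np.float64)
    if not np.isnan(nodata):
        out[arr == nodata] = np.nan
    return out


def first_valid_mosaic(layers):
    """Hypothetical mosaic: take the first non-NaN value at each pixel, in input order."""
    result = np.full(layers[0].shape, np.nan)
    for layer in layers:
        fill = np.isnan(result) & ~np.isnan(layer)
        result[fill] = layer[fill]
    return result


a = np.array([-128, 5, -128], dtype=np.int8)        # sentinel -128 (int8 min)
b = np.array([65535, 65535, 300], dtype=np.uint16)  # sentinel 65535 (uint16 max)
mosaic = first_valid_mosaic([
    canonicalize(a, float(np.iinfo(np.int8).min)),
    canonicalize(b, float(np.iinfo(np.uint16).max)),
])
```

Pixel 0 is nodata in both inputs and stays NaN. Pixel 1 comes from `a`, and pixel 2 falls through to `b`. If either input had been tagged with NaN instead of its sentinel, its nodata pixels would have leaked into the mosaic as real values.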
github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 20, 2026

@brendancol (Contributor, Author) left a comment

PR Review: reproject: integer nodata sentinel for int rasters (#2185)

Blockers

None.

Suggestions

  • xrspatial/reproject/_crs_utils.py:130-145: the integer-sentinel fallback only kicks in when the attrs/rio chain returns no value. If a user explicitly sets attrs={'nodata': nan} on an int16 raster, the function returns NaN and the bug from #2185 comes back through the back door. Either swap NaN to the sentinel after resolution when the dtype is integer, or call out that explicit NaN attrs on integer rasters are undefined.

  • xrspatial/reproject/__init__.py:666: not a bug, just something to verify. The dtype hint is raster.dtype, but the resolved value is always float(...)-typed, so downstream code that expects a float nd still gets one. Anyone touching this later should double-check that invariant.
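The check in Suggestion 2 can be made concrete. A float64 represents every integer up to 2**53 exactly, so a float-typed sentinel for any of the dtypes this PR tests compares equal to the integer pixels and casts back losslessly. `float_nd_is_exact` is a hypothetical helper for illustration only.

```python
import numpy as np


def float_nd_is_exact(dtype):
    """Return True if the float-typed sentinel matches integer pixels and casts back exactly."""
    dtype = np.dtype(dtype)
    info = np.iinfo(dtype)
    sentinel = info.min if np.issubdtype(dtype, np.signedinteger) else info.max
    nd = float(sentinel)  # the float(...)-typed value downstream code receives
    pixels = np.array([sentinel, 0], dtype=dtype)
    compares = (pixels == nd).tolist() == [True, False]
    casts_back = np.array([nd]).astype(dtype)[0] == sentinel
    return bool(compares and casts_back)
```

This guarantee stops at 53 bits. The maximum of uint64, for instance, does not survive a round trip through float64, but 64-bit dtypes are outside the tested set.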

Nits

  • TestReprojectIntegerNodataNumpy puts @pytest.mark.parametrize on the whole class. Anyone adding a non-dtype-parametric test method to that class will accidentally inherit the parametrize. Splitting per-method, or just renaming the class to flag the intent, would help.

  • _int_raster_with_oob in xrspatial/tests/test_reproject.py: the docstring should note that the function assumes EPSG:32633 as the target CRS, since that is what produces the OOB pixels the tests rely on.

What looks good

  • The fix is small and surgical. _default_integer_nodata is a clean helper with a clear docstring that matches rasterio/GDAL.
  • Tests hit every integer dtype called out in the issue.
  • Both merge() and reproject() get the dtype hint, so merge's per-input canonicalization does not leak the same bug.
  • The warnings.simplefilter('error', RuntimeWarning) regression test catches the exact warning that originally hid the bug.
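At its core, the regression-test pattern credited above runs the cast with `RuntimeWarning` promoted to an error. The helper below is a hypothetical sketch of that idea, not the test in `xrspatial/tests/test_reproject.py`. Whether the NaN cast warns depends on the NumPy version and platform.

```python
import warnings

import numpy as np


def cast_strict(values, dtype):
    """Cast to ``dtype`` with RuntimeWarning promoted to an error (hypothetical helper)."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", RuntimeWarning)
        return np.asarray(values, dtype=np.float64).astype(dtype)


# Sentinel-filled output casts cleanly, so this passes under the strict filter.
clean = cast_strict([3.0, -32768.0], np.int16)

# NaN-filled output is the #2185 failure mode: on NumPy >= 1.24 the cast typically
# emits "invalid value encountered in cast", which the filter turns into an exception.
try:
    cast_strict([3.0, np.nan], np.int16)
    nan_cast_raised = False
except RuntimeWarning:
    nan_cast_raised = True
```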

Checklist

  • Algorithm matches reference (rasterio/GDAL convention)
  • Backends consistent (numpy + dask tested; cupy + dask+cupy tests skip without cupy)
  • NaN handling is correct
  • Edge cases covered
  • Dask chunk boundaries handled (no new dask code, just a sentinel change)
  • No premature materialization
  • No benchmark needed (bug fix, no perf impact)
  • README feature matrix: not applicable
  • Docstrings present and accurate

- `_detect_nodata` now applies the dtype-aware swap after resolving
  the raw value, so explicit `attrs['nodata']=nan` or
  `nodata=float('nan')` on an integer raster also yields the integer
  sentinel. The split-out `_detect_nodata_raw` keeps the lookup chain
  readable.
- Add tests for the NaN-in-attrs and explicit-NaN-arg cases.
- Rename `TestReprojectIntegerNodataNumpy` to flag the class-level
  parametrize, and note that future contributors should not add
  non-parametric methods to it.
- Expand `_int_raster_with_oob` docstring to call out the EPSG:32633
  dependence.
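The split in the first bullet can be pictured as follows. `detect_nodata_raw` and `detect_nodata` are hypothetical stand-ins for `_detect_nodata_raw` and `_detect_nodata`, and the rio-metadata step of the real lookup chain is omitted. The key point is that the NaN-to-sentinel swap runs after resolution. That way it also catches an explicit NaN from the argument or from attrs, not only a missing value.

```python
import numpy as np


def detect_nodata_raw(nodata=None, attrs=None):
    """Hypothetical raw lookup: explicit arg first, then attrs; None if neither is set."""
    if nodata is not None:
        return float(nodata)
    if attrs and attrs.get("nodata") is not None:
        return float(attrs["nodata"])
    return None


def detect_nodata(nodata=None, attrs=None, dtype=None):
    """Resolve the raw value, then apply the dtype-aware swap after resolution."""
    value = detect_nodata_raw(nodata, attrs)
    dt = np.dtype(dtype) if dtype is not None else None
    if dt is not None and np.issubdtype(dt, np.integer):
        # Missing *or* NaN nodata on an integer raster becomes the integer sentinel.
        if value is None or np.isnan(value):
            info = np.iinfo(dt)
            return float(info.min if np.issubdtype(dt, np.signedinteger) else info.max)
    return float("nan") if value is None else value
```

The three follow-up tests map onto this directly: `attrs={'nodata': nan}` on int16 yields `-32768.0`, `nodata=float('nan')` on uint8 yields `255.0`, and a float raster keeps NaN.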

@brendancol (Contributor, Author) left a comment

Follow-up review after 61c3ed9

Disposition of original findings

  • Suggestion 1 (NaN-in-attrs on integer raster) -- fixed in 61c3ed9. _detect_nodata now resolves the value through _detect_nodata_raw, then swaps NaN for the integer sentinel post-resolution. Three new tests lock this in: explicit attrs['nodata']=nan, explicit nodata=float('nan') arg, and float-dtype-keeps-NaN.
  • Suggestion 2 (float-typed nd verification) -- dismissed. Confirmed _detect_nodata always returns float(...), so downstream callers that pass nd into worker functions get the expected float type.
  • Nit 1 (class-level parametrize bleed) -- fixed. Renamed to TestReprojectIntegerNodataNumpyParametrized and added a docstring note that non-parametric tests should live elsewhere.
  • Nit 2 (docstring on _int_raster_with_oob) -- fixed. Docstring now spells out the EPSG:32633 dependence and the corner-OOB construction.

Re-run results

287 passed locally (3 new tests added since the original review).

No new findings.

brendancol merged commit 98e88ac into main on May 20, 2026
5 checks passed


Development: merging this pull request closes #2185, "reproject: integer rasters silently corrupt nodata when default NaN is used".