reproject: pick integer nodata sentinel for int rasters (#2185)#2191
Conversation
…#2185) Integer rasters reprojected without an explicit nodata used to lose out-of-bounds pixels to silent 0s while attrs['nodata'] kept claiming NaN. The worker rounds and casts back to the input integer dtype, and NaN doesn't survive that cast. Pass the raster dtype into _detect_nodata so integer inputs get a dtype-appropriate sentinel (dtype.min for signed, dtype.max for unsigned) following the rasterio/GDAL convention. The same fix applies to merge's per-input nodata detection so mixed integer inputs canonicalize correctly during the mosaic step.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: reproject: integer nodata sentinel for int rasters (#2185)
Blockers
None.
Suggestions
-
xrspatial/reproject/_crs_utils.py:130-145: the integer-sentinel fallback only kicks in when the attrs/rio chain returns no value. If a user explicitly setsattrs={'nodata': nan}on an int16 raster, the function returns NaN and the bug from #2185 comes back through the back door. Either swap NaN to the sentinel after resolution when the dtype is integer, or call out that explicit NaN attrs on integer rasters are undefined. -
xrspatial/reproject/__init__.py:666: this is a verification, not a bug. The dtype hint israster.dtype, but the resolved value is alwaysfloat(...)-typed, so the downstream code that expects a floatndis fine. Worth a quick mental check from anyone touching this in the future.
Nits
-
TestReprojectIntegerNodataNumpyputs@pytest.mark.parametrizeon the whole class. Anyone adding a non-dtype-parametric test method to that class will accidentally inherit the parametrize. Splitting per-method, or just renaming the class to flag the intent, would help. -
_int_raster_with_oobinxrspatial/tests/test_reproject.py: the docstring should note that the function assumes EPSG:32633 as the target CRS, since that is what produces the OOB pixels the tests rely on.
What looks good
- The fix is small and surgical.
_default_integer_nodatais a clean helper with a clear docstring that matches rasterio/GDAL. - Tests hit every integer dtype called out in the issue.
- Both
merge()andreproject()get the dtype hint, so merge's per-input canonicalization does not leak the same bug. - The
warnings.simplefilter('error', RuntimeWarning)regression test catches the exact warning that originally hid the bug.
Checklist
- Algorithm matches reference (rasterio/GDAL convention)
- Backends consistent (numpy + dask tested; cupy + dask+cupy tests skip without cupy)
- NaN handling is correct
- Edge cases covered
- Dask chunk boundaries handled (no new dask code, just a sentinel change)
- No premature materialization
- No benchmark needed (bug fix, no perf impact)
- README feature matrix: not applicable
- Docstrings present and accurate
- `_detect_nodata` now applies the dtype-aware swap after resolving
the raw value, so explicit `attrs['nodata']=nan` or
`nodata=float('nan')` on an integer raster also yields the integer
sentinel. The split-out `_detect_nodata_raw` keeps the lookup chain
readable.
- Add tests for the NaN-in-attrs and explicit-NaN-arg cases.
- Rename `TestReprojectIntegerNodataNumpy` to flag the class-level
parametrize, and note that future contributors should not add
non-parametric methods to it.
- Expand `_int_raster_with_oob` docstring to call out the EPSG:32633
dependence.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review after 61c3ed9
Disposition of original findings
- Suggestion 1 (NaN-in-attrs on integer raster) -- fixed in 61c3ed9.
_detect_nodatanow resolves the value through_detect_nodata_raw, then swaps NaN for the integer sentinel post-resolution. Three new tests lock this in: explicitattrs['nodata']=nan, explicitnodata=float('nan')arg, and float-dtype-keeps-NaN. - Suggestion 2 (float-typed
ndverification) -- dismissed. Confirmed_detect_nodataalways returnsfloat(...), so downstream callers that passndinto worker functions get the expected float type. - Nit 1 (class-level parametrize bleed) -- fixed. Renamed to
TestReprojectIntegerNodataNumpyParametrizedand added a docstring note that non-parametric tests should live elsewhere. - Nit 2 (docstring on
_int_raster_with_oob) -- fixed. Docstring now spells out the EPSG:32633 dependence and the corner-OOB construction.
Re-run results
287 passed locally (3 new tests added since the original review).
No new findings.
Summary
Closes #2185.
_detect_nodata()takes an optionaldtypehint. When the hint is an integer dtype and no nodata came from the user arg or attrs, it returns a dtype-appropriate sentinel (dtype.minfor signed,dtype.maxfor unsigned) instead of NaN. Matches the rasterio/GDAL convention.reproject()passesraster.dtypeinto_detect_nodata.merge()does the same when detecting each input's own sentinel.attrs['nodata']exactly across numpy, dask+numpy, cupy, and dask+cupy. The GPU paths skip when cupy isn't installed.warnings.simplefilter('error', RuntimeWarning)so the silent-cast warning that hid the bug now fails the test.Test plan
pytest xrspatial/tests/test_reproject.py: 284 passed locallypytest xrspatial/tests/test_reproject.py -k "Integer or integer or 2185": 19 new tests pass