Describe the bug
_apply_eager_nodata_mask in xrspatial/geotiff/_attrs.py promotes integer buffers to float64 and then compares against the sentinel:
arr = arr.astype(np.float64)
mask = arr == np.float64(nodata_int)
For int64/uint64 rasters with a sentinel above 2**53, float64 rounding makes nearby valid values compare equal to the sentinel. With nodata=INT64_MAX, every value in [INT64_MAX - 511, INT64_MAX] rounds to 2**63 and gets masked to NaN, a window 512 values wide. With UINT64_MAX the window is 1024 values wide.
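The window widths can be checked in plain NumPy, independent of the geotiff module. The sketch below is illustrative only: float64_collisions and native_hits are hypothetical helper names, not functions in xrspatial. They count how many of the 2048 largest values of each dtype compare equal to the maximum after float64 promotion versus at native width.

```python
import numpy as np

def float64_collisions(vals, sentinel):
    """Count entries equal to the sentinel after float64 promotion."""
    return int(np.count_nonzero(vals.astype(np.float64) == np.float64(sentinel)))

def native_hits(vals, sentinel):
    """Count entries equal to the sentinel at the array's own integer width."""
    return int(np.count_nonzero(vals == vals.dtype.type(sentinel)))

i64max = np.iinfo(np.int64).max
u64max = np.iinfo(np.uint64).max

# The 2048 largest values of each dtype, built without leaving integer width.
top_i64 = np.int64(i64max) - np.arange(2048, dtype=np.int64)
top_u64 = np.uint64(u64max) - np.arange(2048, dtype=np.uint64)

i64_window = float64_collisions(top_i64, i64max)
u64_window = float64_collisions(top_u64, u64max)
print(i64_window, native_hits(top_i64, i64max))
print(u64_window, native_hits(top_u64, u64max))
```

Under IEEE 754 round-to-nearest-even this should report 512 float64 collisions for int64 and 1024 for uint64, against a single native-width hit in each case.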
The dask chunk path (_delayed_read_window in _backends/dask.py), the GPU GDS chunk path (_apply_nodata_mask_gpu in _backends/_gpu_helpers.py), and the VRT path all compute the mask at native integer width before promoting, so they only mask exact sentinel hits. The same file read with masked=True therefore gives different results on the eager backends than on dask:
import numpy as np
import xarray as xr
from xrspatial.geotiff import open_geotiff, to_geotiff  # assumed public import path

path = 'int64_nodata.tif'  # hypothetical output path; any writable location works
i64max = np.iinfo(np.int64).max
data = np.array([[i64max, i64max - 1, i64max - 100],
                 [i64max - 511, i64max - 512, i64max - 513],
                 [1000, 2000, 3000]], dtype=np.int64)
da = xr.DataArray(data, dims=('y', 'x'),
                  coords={'y': [2.5, 1.5, 0.5], 'x': [0.5, 1.5, 2.5]})
to_geotiff(da, path, nodata=i64max, compression='deflate')
open_geotiff(path, masked=True)            # eager: 4 NaNs (3 are valid pixels)
open_geotiff(path, masked=True, chunks=2)  # dask: 1 NaN (correct)
The function is also inconsistent with itself: the mask_nodata=False scan a few lines down compares at native width (arr == arr.dtype.type(nodata_int)), so nodata_pixels_present and the actual masking can disagree about which pixels are nodata.
The write side already defends against exactly this. _overview_kernels.py keeps integer reductions on the numpy path because the sentinel mask has to be computed at native integer width before any float64 promotion (64-bit sentinels like INT64_MAX round when cast), and tests/write/test_overview.py pins that behavior for UINT64_MAX. The eager read path contradicts the module's own convention.
Expected behavior
Eager masked reads mask only pixels that exactly equal the sentinel at the source dtype's width, matching the dask, GPU-chunked, and VRT read paths. Affected backends: numpy eager and cupy eager (both route through _apply_eager_nodata_mask; the GPU eager sites call it via duck-typing from _finalize_eager_read).
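One possible shape for the expected behavior, as a minimal sketch rather than the actual patch: mask_nodata_native is a hypothetical helper name, and the out-of-range guard is an assumption about how an unrepresentable sentinel should be handled. The point is only the ordering: build the mask at the source dtype's width, then promote.

```python
import numpy as np

def mask_nodata_native(arr, nodata_int):
    """Hypothetical helper: mask exact sentinel hits, then promote to float64."""
    if arr.dtype.kind not in "iu":
        raise TypeError("integer input expected")
    info = np.iinfo(arr.dtype)
    if not (info.min <= nodata_int <= info.max):
        # Assumption: a sentinel outside the dtype's range cannot match any pixel.
        return arr.astype(np.float64)
    # Compare at native integer width, before any float64 rounding can occur.
    mask = arr == arr.dtype.type(nodata_int)
    out = arr.astype(np.float64)
    out[mask] = np.nan
    return out

i64max = np.iinfo(np.int64).max
data = np.array([[i64max, i64max - 1, i64max - 100],
                 [i64max - 511, i64max - 512, i64max - 513],
                 [1000, 2000, 3000]], dtype=np.int64)

masked = mask_nodata_native(data, i64max)
nan_count = int(np.isnan(masked).sum())
print(nan_count)
```

On the reproduction array this masks only the pixel at [0, 0]. The promoted neighbours still lose precision in float64, but they stay finite instead of becoming NaN.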
Additional context
Found by the accuracy sweep against the geotiff module. Promoting int64 data above 2**53 to float64 is lossy regardless, but the mask should not convert valid pixels to NaN, and the four backends should agree.