open_geotiff(masked=True) falsely masks valid pixels near 64-bit integer sentinels on eager backends #3098

@brendancol

Describe the bug

_apply_eager_nodata_mask in xrspatial/geotiff/_attrs.py promotes integer buffers to float64 and then compares against the sentinel:

arr = arr.astype(np.float64)
mask = arr == np.float64(nodata_int)

For int64/uint64 rasters with a sentinel above 2**53, float64 rounding makes nearby valid values compare equal to the sentinel. With nodata=INT64_MAX, every value in [INT64_MAX - 511, INT64_MAX] rounds to 2**63 and gets masked to NaN, a window 512 values wide. With UINT64_MAX the window is 1024 values wide.
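The window widths can be checked with NumPy alone. The sketch below is illustrative only: `collision_window` and its `span` parameter are hypothetical names, not part of xrspatial, and it simply repeats the promote-then-compare step on the largest values of each dtype.

```python
import numpy as np

INT64_MAX = np.iinfo(np.int64).max    # 2**63 - 1
UINT64_MAX = np.iinfo(np.uint64).max  # 2**64 - 1


def collision_window(sentinel, dtype, span=4096):
    """Count values at or just below `sentinel` that equal it after a float64 cast.

    Hypothetical helper for illustration; not an xrspatial function.
    """
    # Build the `span` largest values ending at the sentinel, at native width.
    offsets = np.arange(span, dtype=dtype)
    values = dtype(sentinel) - offsets
    # Same order of operations as the eager path: promote first, then compare.
    collides = values.astype(np.float64) == np.float64(sentinel)
    return int(collides.sum())


print(collision_window(INT64_MAX, np.int64))    # 512
print(collision_window(UINT64_MAX, np.uint64))  # 1024
```

The counts include the sentinel itself, so 511 and 1023 valid values respectively are at risk of being masked.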

The dask chunk path (_delayed_read_window in _backends/dask.py), the GPU GDS chunk path (_apply_nodata_mask_gpu in _backends/_gpu_helpers.py), and the VRT path all compute the mask at native integer width before promoting, so they only mask exact sentinel hits. The same file read with masked=True therefore gives different results on the eager backends than on dask:

import numpy as np
import xarray as xr
from xrspatial.geotiff import open_geotiff, to_geotiff  # import path assumed from the module layout

path = 'int64_sentinel.tif'  # placeholder; any writable location

i64max = np.iinfo(np.int64).max
data = np.array([[i64max, i64max - 1, i64max - 100],
                 [i64max - 511, i64max - 512, i64max - 513],
                 [1000, 2000, 3000]], dtype=np.int64)
da = xr.DataArray(data, dims=('y', 'x'),
                  coords={'y': [2.5, 1.5, 0.5], 'x': [0.5, 1.5, 2.5]})
to_geotiff(da, path, nodata=i64max, compression='deflate')

open_geotiff(path, masked=True)            # 4 NaNs (3 are valid pixels)
open_geotiff(path, masked=True, chunks=2)  # 1 NaN (correct)

The function is also inconsistent with itself: the mask_nodata=False scan a few lines down compares at native width (arr == arr.dtype.type(nodata_int)), so nodata_pixels_present and the actual masking can disagree about which pixels are nodata.
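A small NumPy-only sketch of that disagreement, using the two comparisons quoted in this issue on a buffer that contains no sentinel at all. The array contents are made up for illustration; only the two comparison expressions come from the code described above.

```python
import numpy as np

nodata_int = np.iinfo(np.int64).max
# No pixel equals the sentinel; two pixels sit just below it.
arr = np.array([nodata_int - 1, nodata_int - 100, 1000], dtype=np.int64)

# Native-width scan, as in the mask_nodata=False branch.
native_hits = arr == arr.dtype.type(nodata_int)

# Promote-then-compare, as in the masking branch.
float_hits = arr.astype(np.float64) == np.float64(nodata_int)

print(bool(native_hits.any()))  # False: the scan reports no nodata pixels
print(int(float_hits.sum()))    # 2: yet two valid pixels would be masked
```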

The write side already defends against exactly this. _overview_kernels.py keeps integer reductions on the numpy path because the sentinel mask has to be computed at native integer width before any float64 promotion (64-bit sentinels like INT64_MAX round when cast), and tests/write/test_overview.py pins that behavior for UINT64_MAX. The eager read path contradicts the module's own convention.

Expected behavior

Eager masked reads mask only pixels that exactly equal the sentinel at the source dtype's width, matching the dask, GPU-chunked, and VRT read paths. Affected backends: numpy eager and cupy eager (both route through _apply_eager_nodata_mask; the GPU eager sites call it via duck-typing from _finalize_eager_read).
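One possible shape for the fix, sketched against NumPy only. `mask_nodata_native` is a hypothetical name and this is not the actual body of _apply_eager_nodata_mask; it only shows the ordering (compare at native width, then promote) that the other read paths are described as using. The range check is an added assumption so that an out-of-range sentinel cannot overflow the scalar cast.

```python
import numpy as np


def mask_nodata_native(arr, nodata_int):
    """Hypothetical helper: mask exact sentinel hits, then promote to float64."""
    if arr.dtype.kind not in "iu":
        raise TypeError("this sketch only handles integer buffers")
    info = np.iinfo(arr.dtype)
    out = arr.astype(np.float64)
    # A sentinel outside the dtype's range cannot match any pixel.
    if info.min <= nodata_int <= info.max:
        # Compare BEFORE promotion so only exact matches are masked.
        mask = arr == arr.dtype.type(nodata_int)
        out[mask] = np.nan
    return out


i64max = np.iinfo(np.int64).max
data = np.array([i64max, i64max - 1, i64max - 511, 1000], dtype=np.int64)
masked = mask_nodata_native(data, i64max)
print(int(np.isnan(masked).sum()))  # 1
```

The surviving near-sentinel values are still rounded by the float64 promotion, but they stay finite instead of becoming NaN.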

Additional context

Found by the accuracy sweep against the geotiff module. Promoting int64 data above 2**53 to float64 is lossy regardless, but the mask should not convert valid pixels to NaN, and the four backends should agree.

Labels: bug
