Skip to content

resample: preserve integer precision in nodata mask comparison (#2570)#2592

Merged
brendancol merged 2 commits into
mainfrom
issue-2570
May 29, 2026
Merged

resample: preserve integer precision in nodata mask comparison (#2570)#2592
brendancol merged 2 commits into
mainfrom
issue-2570

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #2570.

Summary

  • _resolve_nodata no longer routes integer sentinels through float(); it casts to the input dtype instead so the equality test in _apply_nodata_mask runs in integer space.
  • For int64 sentinels above 2**53 (e.g. nodata = 2**60 + 1), valid cells that share the rounded float representation (e.g. 2**60) are no longer masked out.
  • Float inputs still route through float() so the existing NaN short-circuit keeps working.

Backend coverage

The fix lives in the shared masking helper that every backend goes through (numpy / cupy / dask+numpy / dask+cupy). Verified on numpy and dask+numpy locally.

Test plan

  • New TestNodataIntegerPrecision class in xrspatial/tests/test_resample.py covering:
    • int64 sentinel above 2**53 via explicit kwarg
    • int64 sentinel via _FillValue attr
    • int32 sentinel (negative min)
    • uint16 sentinel (max)
    • float NaN sentinel short-circuit still works
    • float finite sentinel still works
  • Pre-existing TestNodata cases still pass
  • Full test_resample.py suite (184 tests) green
  • test_resample_coverage_2026_05_27.py and test_resample_signature_annot_2544.py green

`_resolve_nodata` cast the sentinel through `float()` and
`_apply_nodata_mask` compared the data array against that float. For
int64 values above 2**53 the cast is lossy, so distinct integers
collided after rounding and valid cells were masked to NaN (e.g.
nodata=2**60+1 also masked a valid 2**60 cell).

Fix: keep the sentinel in the input's native dtype when the input is
integer/bool, and compare in integer space before promoting the
output to float32. Float inputs still route through float() so NaN
semantics keep working.

The masking helper feeds every backend (numpy / cupy / dask+numpy /
dask+cupy), so the change lives in one place.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 28, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: resample integer nodata precision fix

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/resample.py:1162-1165 -- the try/except TypeError around the NaN check is dead code. isinstance(nodata, float) already gates the path, and np.isnan on a Python float will never raise TypeError. Drop the try wrapper and keep the if isinstance(...) and np.isnan(...) check on its own line.

Nits (optional improvements)

  • xrspatial/resample.py:1166 -- np.asarray(nodata).astype(agg.dtype).item() will silently truncate a non-integer float sentinel (e.g. nodata=-9999.5 on int32 data becomes -9999). Realistically nobody passes a fractional sentinel, but a one-line check like if isinstance(nodata, float) and not nodata.is_integer(): raise ValueError(...) would catch the obvious user error. Skip if you'd rather keep the helper minimal.
  • xrspatial/tests/test_resample.py:1456 -- test_int64_sentinel_above_2pow53_preserves_distinct_values is the headline regression test. Consider asserting out.values[1, 1] == np.float32(1 << 60) so a future regression that masks the cell to NaN AND a regression that swaps the surviving value both get caught. The current np.isfinite check is correct but a bit loose.

What looks good

  • The split between float-input and integer-input paths in _resolve_nodata is the right call. It keeps the float NaN short-circuit working while routing integer comparisons through the input dtype.
  • _apply_nodata_mask computes the mask before promoting to float32, which is exactly the surgical change the bug needs. The mask carries over correctly to agg.where(mask) after the cast.
  • Backend coverage is right: the masking helper sits upstream of every per-backend dispatch path, so numpy, cupy, dask+numpy, and dask+cupy all inherit the fix automatically. The PR body confirms numpy and dask+numpy were exercised locally.
  • Test coverage hits the exact failure mode (int64 sentinel above 2**53), the explicit-kwarg and attr paths, and the unaffected float branches.

Checklist

  • Algorithm matches reference / intent (integer equality in input dtype)
  • All implemented backends produce consistent results (single shared helper)
  • NaN handling is correct (float NaN short-circuit preserved; NaN sentinel on int data is a no-op)
  • Edge cases are covered by tests (int64 > 2**53, int32, uint16, float NaN, float finite)
  • No dask chunk boundary concerns (no neighborhood op in this change)
  • No premature materialization or unnecessary copies
  • Benchmark: not applicable for a correctness fix in a helper
  • README feature matrix: not applicable, no new function or backend
  • Docstrings updated to reflect the new dispatch on input dtype

…en test (#2570)

- Remove the `try/except TypeError` wrapper around the NaN check; the
  `isinstance(nodata, float)` gate makes the except branch unreachable.
- Raise ValueError when a fractional float sentinel (e.g. -9999.5) is
  passed for an integer input -- silent truncation to int would mask
  cells the caller never asked to mask.
- Tighten the int64-above-2**53 regression test to assert the surviving
  cell holds the right value, not just that it's finite.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review

Re-ran the review after the address-nits commit (1adea6f). All previous findings are resolved:

  • Suggestion (dead try/except): FIXED -- the wrapper is gone; the isinstance + np.isnan check stands alone.
  • Nit (silent fractional truncation): FIXED -- _resolve_nodata now raises ValueError("nodata=... is not representable in integer dtype ...") when a non-integer float sentinel is passed for an integer input. Added test_fractional_float_sentinel_on_int_input_raises to lock it in.
  • Nit (loose isfinite assertion): FIXED -- the int64-above-2**53 regression test now asserts the surviving cell equals np.float32(1 << 60), so masking-to-NaN and value-swap regressions both get caught.

Full test_resample.py suite passes locally (185 tests). No remaining blockers, suggestions, or nits.

@brendancol brendancol merged commit 6105ca5 into main May 29, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

resample: int64 nodata mask loses precision via float cast

1 participant