Skip to content

Refresh nodata/nodatavals on resample identity fast path (#2662)#2670

Merged
brendancol merged 4 commits into
mainfrom
issue-2662
May 29, 2026
Merged

Refresh nodata/nodatavals on resample identity fast path (#2662)#2670
brendancol merged 4 commits into
mainfrom
issue-2662

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #2662

What changed

  • The resample identity fast path (scale_factor=1.0) and the 3D dispatch path masked nodata sentinels to NaN but refreshed only _FillValue, leaving attrs['nodata'] and attrs['nodatavals'] pointing at the original finite sentinel. Output advertised nodata=-9999 while the data was already NaN.
  • Factored the non-identity path's refresh logic into a _refresh_nodata_attrs helper and called it from all three paths, so masked-to-NaN output never advertises a stale integer sentinel.

Backend coverage

The fix is backend-independent attr handling; the mask itself routes through xarray.where. Regression tests cover numpy and dask+numpy. cupy and dask+cupy masking is exercised by the existing cross-backend nodata coverage.

Test plan

  • New TestIdentityNodataMetadata class covers: identity path refreshes all three attrs, identity matches the non-identity path, absent attrs stay absent, and the 3D path is fixed too.
  • Full resample suite passes (289 tests).

Dedupe duplicate module rows (last-write-wins by last_inspected) and
collapse multi-line notes to single physical lines. The notes had
embedded newlines, which the merge=union .gitattributes strategy splits
record-by-record, corrupting the file into a 156-column phantom row on
parallel-agent appends. One line per record keeps union merges safe.
The identity (scale_factor=1.0) and 3D dispatch paths masked sentinel
values to NaN but only updated _FillValue, leaving attrs['nodata'] and
attrs['nodatavals'] advertising the stale finite sentinel. Factor the
non-identity path's refresh into _refresh_nodata_attrs and call it from
all three paths so masked-to-NaN output never advertises a stale
integer sentinel.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Refresh nodata/nodatavals on resample identity fast path (#2662)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • xrspatial/resample.py identity path (~L1392) and non-identity path (~L1503): the explicit _FillValue = float('nan') line is redundant whenever _FillValue is already in the input attrs, since _refresh_nodata_attrs sets it too. It is still needed for the explicit-nodata=-only case where no _FillValue attr exists, so leaving it is correct. A one-line comment noting that would help the next reader, but it is not required.

What looks good

  • The fix is symmetric across the three code paths: identity, 3D, and non-identity now all run if has_nodata: _FillValue=nan followed by _refresh_nodata_attrs, so they agree on what the output advertises.
  • The shared helper reads from the original src_attrs and writes to dst_attrs. That avoids the order-dependence bug you would hit if it read the partially-updated output attrs instead.
  • Tests hit the regression directly: masked pixel is NaN and no attr advertises the stale sentinel, identity matches the non-identity path, absent attrs stay absent, and the 3D path is covered.
  • The cupy/dask+cupy parametrize was dropped with a documented reason. The attr fix is backend-independent, so numpy + dask+numpy is enough.

Checklist

  • Algorithm matches intended behavior
  • Backends produce consistent results (attr fix is backend-independent)
  • NaN handling is correct
  • Edge cases covered (absent attrs, 3D path)
  • Dask chunk boundaries: not applicable, no map_overlap change
  • No premature materialization or unnecessary copies
  • Benchmark: not needed, no perf-sensitive change
  • README feature matrix: not applicable, no new function
  • Docstrings present and accurate

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (#2662)

The one nit from the first pass is addressed: the identity path now carries a comment explaining that the explicit _FillValue = nan line covers the explicit-nodata=-only case (where the input had no nodata attrs), while _refresh_nodata_attrs handles inputs that declared nodata / nodatavals. Logic is unchanged.

Also merged origin/main into the branch. The merge was clean (no conflicts); git auto-merged resample.py and test_resample.py. The full resample suite passes (222 tests) after the merge.

No new blockers, suggestions, or nits.

@brendancol brendancol merged commit 3e23143 into main May 29, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

resample identity fast path leaves stale nodata/nodatavals metadata

1 participant