Skip to content

Keep standardize piecewise/categorical on the device for cupy inputs#3159

Merged
brendancol merged 2 commits into
mainfrom
deep-sweep-performance-mcda-2026-06-10-02
Jun 10, 2026
Merged

Keep standardize piecewise/categorical on the device for cupy inputs#3159
brendancol merged 2 commits into
mainfrom
deep-sweep-performance-mcda-2026-06-10-02

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Fixes the standardize() part of #3151 (items 1 and 2). The monte_carlo sensitivity crash from that issue is handled separately together with #3152, since both need the same _monte_carlo rework.

  • _piecewise handed numpy lookup tables to cupy.interp, which rejects them (NotImplementedError). The tables are now converted with the data's own array module (a no-op for numpy), so plain-cupy piecewise works and stays on the device.
  • The _piecewise and _categorical dask chunk functions called np.asarray(block) with a comment claiming it handles cupy chunks. cupy refuses implicit conversion, so dask+cupy raised TypeError. The chunk functions now pick the block's own array module; cupy blocks are interpolated/mapped on the GPU and the result keeps its cupy meta.

Backend coverage: numpy and dask+numpy unchanged (same code path, xp is numpy); cupy piecewise and dask+cupy piecewise/categorical go from crashing to working, verified on a CUDA host with results staying in device memory end to end.

Test plan:

  • pytest xrspatial/tests/test_mcda.py (178 passed)
  • New TestStandardizeCupy tests gated on cuda_and_cupy_available: cupy piecewise, dask+cupy piecewise, dask+cupy categorical, each asserting parity with the numpy result and that the output is still a cupy array

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Keep standardize piecewise/categorical on the device for cupy inputs

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/mcda/standardize.py:272 (_interp_block): the breakpoint/value tables are converted with xp_b.asarray inside the chunk function, so cupy chunks re-upload the (tiny) tables on every block. Converting once outside the closure, based on the dask array's _meta, removes the per-chunk transfer. Same applies to keys/vals in _apply_mapping. Minor, since the tables are a few dozen bytes, but it is one line each.

Nits (optional improvements)

  • xrspatial/tests/test_mcda.py (TestStandardizeCupy): the raw fixture is 2x4 with (1, 2) chunks, which exercises multiple blocks but not ragged chunking. Not critical for elementwise ops.

What looks good

  • The fix picks the array module per block instead of guessing, so dask+numpy keeps its existing behavior byte for byte while dask+cupy stops crashing.
  • All three failure modes from #3151 items 1-2 have device-residency assertions (isinstance(..., cupy.ndarray)), not just value parity, so a silent fallback to host arrays would fail the test.

Checklist

  • Algorithm matches reference (np.interp/cupy.interp parity asserted)
  • All implemented backends produce consistent results (numpy path untouched; GPU paths verified on a CUDA host)
  • NaN handling is correct (NaN cell in the fixture, equal_nan parity)
  • Edge cases are covered by tests (unmapped categorical values stay NaN via parity with the numpy reference)
  • Dask chunk boundaries handled correctly (elementwise per block)
  • No premature materialization or unnecessary copies (apart from the per-chunk table upload noted above)
  • Benchmark exists or is not needed (no mcda benchmarks exist; not added here)
  • README feature matrix: standardize GPU cells already marked advanced; no tier change
  • Docstrings present and accurate

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after the second commit:

  • Per-chunk table upload: fixed. _piecewise and _categorical now resolve the array module once from data._meta and convert the lookup tables a single time outside the chunk closure.
  • Ragged-chunk nit: the GPU fixture now chunks (1, 3) on the 2x4 grid, giving an uneven 3+1 split.

No remaining findings. 178 tests pass locally, including the three GPU-gated cases on a CUDA host.

@brendancol brendancol merged commit a4bff3b into main Jun 10, 2026
7 checks passed
brendancol added a commit that referenced this pull request Jun 10, 2026
Union-merge the sweep CSV and test_mcda.py (#3148 API tests + #3149
coverage block). Drop three strict xfail markers that #3154 and #3159
turned into XPASS failures on main (piecewise cupy, categorical
dask+cupy, constrain attrs).
brendancol added a commit that referenced this pull request Jun 10, 2026
…e-mcda-2026-06-10-01

Conflicts:
- .claude/sweep-performance-state.csv: took main's newer geotiff/rasterize/
  reproject rows (LF endings) and re-inserted this branch's mcda row.

Semantic fixups in xrspatial/tests/test_mcda.py (auto-merged textually):
- Removed strict xfail markers that now XPASS: the three owa backend
  tests fixed by this PR, plus standardize cupy piecewise, standardize
  dask+cupy categorical (#3159) and constrain attrs (#3154), whose
  fixes merged before the #3156 xfails landed.
brendancol added a commit that referenced this pull request Jun 10, 2026
…cda-2026-06-10

Conflicts: .claude/sweep-accuracy-state.csv, xrspatial/mcda/sensitivity.py,
xrspatial/mcda/standardize.py, xrspatial/tests/test_mcda.py.

sensitivity.py taken from main (#3160 chunk-bounded monte_carlo already
handles cupy). standardize.py taken from main (#3159) plus this PR's
ascontiguousarray for cupy.interp, which #3159 lacked. test_mcda.py keeps
main's #3156 coverage with the strict xfail markers removed for the paths
this PR fixes (owa cupy/dask, standardize piecewise/categorical GPU,
monte-carlo sensitivity GPU) and for constrain attrs (fixed by #3154);
the de-xfail'd monte_carlo assertions are now NaN-aware. Sweep CSV is
main's rows plus this PR's mcda row.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant