Skip to content

Add xfail'd GPU and dask+GPU pack=True round-trip tests (#3114)#3127

Merged
brendancol merged 3 commits into
mainfrom
deep-sweep-test-coverage-geotiff-2026-06-09
Jun 10, 2026
Merged

Add xfail'd GPU and dask+GPU pack=True round-trip tests (#3114)#3127
brendancol merged 3 commits into
mainfrom
deep-sweep-test-coverage-geotiff-2026-06-09

Conversation

@brendancol

@brendancol brendancol commented Jun 9, 2026

Copy link
Copy Markdown
Contributor
  • Adds test_pack_round_trip_gpu to tests/write/test_pack_3064.py, parametrized over eager GPU and dask+GPU reads: open_geotiff(unpack=True, gpu=True[, chunks=2]) followed by to_geotiff(pack=True), asserting the uint8 source ints round-trip and the repacked file unpacks to the same values as the CPU path.
  • Both legs crash today (to_geotiff(pack=True) crashes on cupy and dask+cupy input #3112: _pack's fillna breaks on cupy-backed arrays), so the tests run behind requires_gpu with xfail(strict=True, raises=(AttributeError, TypeError)) against that issue — pinning to the known exception types so unrelated assertion failures surface as real failures instead of hiding under the known crash. When the crash is fixed they flip to hard failures and the marker comes off.
  • Fixes the module docstring, which still claimed GPU rejects mask_and_scale; that has been stale since Rename open_geotiff mask_and_scale to unpack and add GPU / dask+GPU support #3075.
  • Updates the test-coverage sweep state CSV for the geotiff module.

Test-only change; the source fix stays with #3112.

Backend coverage: numpy and dask+numpy were already tested; this adds the cupy and dask+cupy legs (currently xfail).

Proposed Changes

  • Add test_pack_round_trip_gpu parametrized over eager GPU and dask+GPU backends, gated by requires_gpu with xfail(strict=True, raises=(AttributeError, TypeError)) pinned to the to_geotiff(pack=True) crashes on cupy and dask+cupy input #3112 crash.
  • Pin xfail to raises=(AttributeError, TypeError) so unrelated failures (e.g. a GPU read-side regression dropping the mask_and_scale attr) surface as real failures rather than silently matching the known bug.
  • Fix stale module docstring claiming GPU rejects mask_and_scale.
  • Update .claude/sweep-test-coverage-state.csv for the geotiff module.
  • Merge origin/main into branch, resolving conflict in .claude/sweep-test-coverage-state.csv (kept our updated geotiff row; took main's updated rasterize and reproject rows).

Test plan:

unpack=True reads work on gpu and dask+gpu since #3075, but nothing in
the suite packed a GPU-read array back with to_geotiff(pack=True). Both
legs crash today (#3112), so the new tests run behind requires_gpu and
xfail(strict=True) against that issue; they flip to failures the moment
the _pack fillna crash is fixed. Also fixes the test module docstring,
which still said GPU rejects mask_and_scale.

Sweep state CSV updated for the geotiff module.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Add xfail'd GPU and dask+GPU pack=True round-trip tests (#3114)

Blockers (must fix before merge)

  • None.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/tests/write/test_pack_3064.py:160 -- the bare xfail(strict=True) absorbs any failure, not just the #3112 crash. The assert at line 173 (mask_and_scale_dtype == "uint8") runs before the pack call, so a GPU read-side regression that drops the attr would xfail silently and look like the known bug. Pin the expected failure with raises=(AttributeError, TypeError) so an unrelated assertion failure surfaces as a real failure instead.

Nits (optional improvements)

  • The two failure modes in #3112 may be version-dependent (the eager leg matches the known cupy 13.6 / xarray 2025.12 where/astype incompatibility). If a GPU CI host carries different pins where fillna works, strict=True turns an accidental pass into a CI failure. That failure would be informative (it means the env, not the code, decided the outcome), so keeping strict seems right -- but watch the first GPU CI run.

What looks good

  • The tests were executed on a CUDA host and observed to fail for the exact #3112 crash before the xfail was applied; 13 passed, 2 xfailed.
  • Parametrization, tmp-file naming (*_3114), requires_gpu gating, and the _write_int_tiff fixture reuse all match the surrounding file.
  • The stale module docstring (claiming GPU rejects mask_and_scale) is corrected in the same place the new coverage lands.
  • Test-only: no source changes, the fix stays with #3112.

Checklist

  • Algorithm matches reference (round-trip parity asserted against the CPU eager path)
  • All implemented backends produce consistent results (gpu/dask+gpu pinned as xfail on #3112)
  • NaN handling is correct (sentinel -> NaN -> sentinel round-trip asserted with equal_nan)
  • Edge cases covered (scale/offset + sentinel in one fixture)
  • Dask chunk boundaries handled (chunks=2 leg)
  • No premature materialization (np.asarray only on results under test)
  • Benchmark not needed (test-only PR)
  • README feature matrix unchanged (no API change)
  • Docstrings accurate after the stale-claim fix

Review follow-up: a bare strict xfail would absorb any failure,
including a GPU read-side regression in the mask_and_scale_dtype attr
assert that runs before the pack call. raises=(AttributeError,
TypeError) keeps unrelated failures visible.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after 5751a11:

  • The one suggestion is addressed: the xfail now carries raises=(AttributeError, TypeError) with a comment mapping each exception to its backend leg, so an unrelated assertion failure (e.g. the attr check before the pack call) surfaces instead of hiding under the known #3112 crash.
  • Re-ran the file on the CUDA host: 13 passed, 2 xfailed.
  • The nit (strict xfail vs possible GPU CI dependency pins) stands as a watch item for the first GPU CI run, not a change request.

No remaining findings.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026
@brendancol

Copy link
Copy Markdown
Contributor Author

@copilot resolve the merge conflicts in this pull request

…age-geotiff-2026-06-09

# Conflicts:
#	.claude/sweep-test-coverage-state.csv

Copilot AI commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

@copilot resolve the merge conflicts in this pull request

Merge conflicts resolved in 02b43e2. The only conflict was in .claude/sweep-test-coverage-state.csv — kept our updated geotiff row (with #3112/#3114) and took main's updated rasterize and reproject rows. All 13 non-GPU pack tests still pass.

@brendancol brendancol merged commit 0d93ed6 into main Jun 10, 2026
1 check was pending
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants