Stop downcasting mean() GPU paths to float32 (#3222) #3229
Closed
brendancol wants to merge 5 commits into
brendancol
commented
Jun 10, 2026
PR Review: Stop downcasting mean() GPU paths to float32 (#3222)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
- xrspatial/tests/test_focal.py:101 - test_mean_transfer_function_dask_gpu still compares with rtol=1e-4, a tolerance that existed to absorb the float32 drift this PR removes. It can be tightened to the default now that all four backends compute in float64. Fine to leave if you want to keep that test's scope unchanged.
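To illustrate why a relative tolerance of 1e-4 can absorb float32 drift at this magnitude, here is a minimal stdlib sketch. It uses math.isclose as a stand-in for whatever comparison helper the test actually calls (an assumption), and the values are illustrative rather than taken from the test.

```python
import math

# Illustrative values, not taken from the test suite.
expected = 1e7 + 0.5   # float64 reference value
drifted = 1e7          # the same value after a float32 round-trip

# A relative tolerance scales with magnitude: at ~1e7, a relative
# tolerance of 1e-4 permits an absolute difference of roughly 1000.
allowed_abs_error = 1e-4 * abs(expected)

# math.isclose stands in for the test's comparison helper (assumption).
passes_loose = math.isclose(drifted, expected, rel_tol=1e-4)
passes_exact = drifted == expected

print(f"allowed absolute error at rtol=1e-4: {allowed_abs_error:.1f}")
print(f"loose comparison passes: {passes_loose}, exact comparison passes: {passes_exact}")
```

The loose comparison accepts the 0.5 discrepancy while the exact one rejects it, which is why a tolerance sized for float32 drift is no longer needed once every backend computes in float64.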
Nits (optional improvements)
- The new regression tests cover a single pass. A passes=3 case would also lock in that the error no longer compounds, though single-pass parity already implies it since each pass runs the same code.
What looks good
- The fix reuses _promote_float, the same helper #2805 used for apply()/focal_stats(), so mean() now follows the module convention instead of a one-off cast.
- Casting excludes to data_cu.dtype (focal.py:266) closes a second, subtler mismatch: exclude values were previously compared in float32 on GPU but float64 on CPU.
- Both new tests assert dtype and values, ran on real CUDA hardware, and the +1e7 offset makes the old bug fail loudly (abs error 0.5) rather than slipping under a tolerance.
- _mean_cupy_boundary's pad-then-trim path inherits the fix since _pad_array preserves dtype.
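The 0.5 absolute error mentioned above follows from float32 spacing: between 2^23 and 2^24 adjacent float32 values are 1.0 apart, so a value near 1e7 can land up to 0.5 away from its float64 original. A minimal sketch using only the standard library (the helper name to_float32 and the sample values are illustrative assumptions, not code from the PR):

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


offset = 1e7  # same order of magnitude as the offset used in the new tests

# Half-integer values near the offset are exact in float64 but fall
# midway between two adjacent float32 values.
values = [offset + 0.5, offset + 1.5, offset + 2.5]
rounded = [to_float32(v) for v in values]
errors = [abs(r - v) for r, v in zip(rounded, values)]

for v, r, e in zip(values, rounded, errors):
    print(f"float64 {v!r} -> float32 {r!r} (abs error {e})")
```

Each sample loses exactly 0.5 in the round-trip, so an input with a large offset exposes a float32 downcast even in a single pass.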
Checklist
- Algorithm matches reference (no algorithm change; dtype only)
- All implemented backends produce consistent results (float64 parity verified on GPU)
- NaN handling is correct (unchanged; isnan checks are dtype-agnostic)
- Edge cases are covered by tests (large-offset values; existing suite covers NaN/boundary)
- Dask chunk boundaries handled correctly (depth=(1,1) unchanged; chunked test added)
- No premature materialization or unnecessary copies (one astype, same as before)
- Benchmark exists or is not needed (no perf-relevant change for float32 inputs; float64 inputs now do real float64 work on GPU, which is the point)
- README feature matrix: no change needed
- Docstrings present and accurate (no public API change)
brendancol
commented
Jun 10, 2026
Follow-up review after 62016a0:
- Suggestion (rtol=1e-4 in test_mean_transfer_function_dask_gpu): fixed. The comparison now uses the default tolerance with a comment explaining why the float32 allowance is gone.
- Nit (multi-pass coverage): fixed. test_mean_preserves_float64_gpu_3222 now also compares passes=3 output between numpy and cupy.
All 28 mean-related tests pass on a CUDA host. Nothing further from me.
…ocal-2026-06-10-01
Main's #3221 (issue #3214) landed the same float64-preservation fix for the mean() GPU paths that this branch made for #3222, so the merge takes main's focal.py wholesale. The branch's regression tests survive: multi-pass GPU parity and dask+cupy large-offset parity are not covered by #3221's tests, and the old rtol=1e-4 allowance on the dask+gpu transfer test can tighten now that every backend computes in float64.
Closing as superseded by #3221, which merged first with the same _promote_float change in _mean_cupy/_mean_dask_cupy plus the mean() entry-point fix, excludes dtype handling, and broader dtype tests. The only bits here not on main are a passes=3 compounding test and a dask+cupy large-offset parity test; those can ride along with a future focal test PR if wanted.
Closes #3222
- _mean_cupy and _mean_dask_cupy cast their input to float32, so a float64 raster came back float32 on GPU while the CPU paths returned float64. On values around 1e7 the GPU result was off by up to 0.5, and the error compounded with passes > 1.
- Both paths now go through _promote_float, which keeps float64 and promotes non-float input, matching what #2805 ("Preserve input float dtype in apply() and focal_stats() (#2769)") did for apply() and focal_stats().
Backend coverage: numpy and dask+numpy are unchanged (they already computed in float64). cupy and dask+cupy now match them bit-for-bit on the regression input.
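The dtype rule described above can be sketched on the CPU with NumPy. This is a hypothetical stand-in, not the real _promote_float: the function name promote_float_sketch and the choice of float64 as the target for non-float input are assumptions made for illustration.

```python
import numpy as np


def promote_float_sketch(arr: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for _promote_float (not the library's code).

    Floating-point input keeps its dtype; anything else is promoted.
    Promoting non-float input to float64 is an assumption of this sketch.
    """
    if np.issubdtype(arr.dtype, np.floating):
        return arr
    return arr.astype(np.float64)


# A float64 raster with a large offset, similar in spirit to the regression input.
raster = np.full((3, 3), 1e7 + 0.5, dtype=np.float64)

old_path = raster.astype(np.float32)      # unconditional downcast, as in the bug
new_path = promote_float_sketch(raster)   # dtype preserved

max_abs_error_old = float(np.abs(old_path.astype(np.float64) - raster).max())
max_abs_error_new = float(np.abs(new_path - raster).max())

# Integer input is promoted rather than left as an integer dtype.
int_promoted = promote_float_sketch(np.arange(9, dtype=np.int32).reshape(3, 3))

print(old_path.dtype, max_abs_error_old)
print(new_path.dtype, max_abs_error_new)
print(int_promoted.dtype)
```

The sketch covers only the dtype decision; the actual GPU paths operate on cupy arrays and apply the mean kernel afterwards.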
Test plan:
- test_mean_preserves_float64_gpu_3222 (cupy, runs on GPU)
- test_mean_preserves_float64_dask_gpu_3222 (dask+cupy)
- test_focal.py suite: 235 passed on a CUDA host