Skip to content

Stop downcasting mean() GPU paths to float32 (#3222)#3229

Closed
brendancol wants to merge 5 commits into
mainfrom
deep-sweep-security-focal-2026-06-10-01
Closed

Stop downcasting mean() GPU paths to float32 (#3222)#3229
brendancol wants to merge 5 commits into
mainfrom
deep-sweep-security-focal-2026-06-10-01

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3222

  • _mean_cupy and _mean_dask_cupy cast their input to float32, so a float64 raster came back float32 on GPU while the CPU paths returned float64. On values around 1e7 the GPU result was off by up to 0.5, and the error compounded with passes > 1.
  • Both paths now promote with _promote_float, which keeps float64 and promotes non-float input, matching what Preserve input float dtype in apply() and focal_stats() (#2769) #2805 did for apply() and focal_stats().
  • The state CSV commit is from the security sweep that found this.

Backend coverage: numpy and dask+numpy are unchanged (they already computed in float64). cupy and dask+cupy now match them bit-for-bit on the regression input.

Test plan:

  • test_mean_preserves_float64_gpu_3222 (cupy, runs on GPU)
  • test_mean_preserves_float64_dask_gpu_3222 (dask+cupy)
  • Full test_focal.py suite: 235 passed on a CUDA host

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Stop downcasting mean() GPU paths to float32 (#3222)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/tests/test_focal.py:101 - test_mean_transfer_function_dask_gpu still compares with rtol=1e-4, a tolerance that existed to absorb the float32 drift this PR removes. It can be tightened to the default now that all four backends compute in float64. Fine to leave if you want to keep that test's scope unchanged.

Nits (optional improvements)

  • The new regression tests cover a single pass. A passes=3 case would also lock in that the error no longer compounds, though single-pass parity already implies it since each pass runs the same code.

What looks good

  • The fix reuses _promote_float, the same helper #2805 used for apply()/focal_stats(), so mean() now follows the module convention instead of a one-off cast.
  • Casting excludes to data_cu.dtype (focal.py:266) closes a second, subtler mismatch: exclude values were previously compared in float32 on GPU but float64 on CPU.
  • Both new tests assert dtype and values, ran on real CUDA hardware, and the +1e7 offset makes the old bug fail loudly (abs error 0.5) rather than slipping under a tolerance.
  • _mean_cupy_boundary's pad-then-trim path inherits the fix since _pad_array preserves dtype.

Checklist

  • Algorithm matches reference (no algorithm change; dtype only)
  • All implemented backends produce consistent results (float64 parity verified on GPU)
  • NaN handling is correct (unchanged; isnan checks are dtype-agnostic)
  • Edge cases are covered by tests (large-offset values; existing suite covers NaN/boundary)
  • Dask chunk boundaries handled correctly (depth=(1,1) unchanged; chunked test added)
  • No premature materialization or unnecessary copies (one astype, same as before)
  • Benchmark exists or is not needed (no perf-relevant change for float32 inputs; float64 inputs now do real float64 work on GPU, which is the point)
  • README feature matrix: no change needed
  • Docstrings present and accurate (no public API change)

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after 62016a0:

  • Suggestion (rtol=1e-4 in test_mean_transfer_function_dask_gpu): fixed. The comparison now uses the default tolerance with a comment explaining why the float32 allowance is gone.
  • Nit (multi-pass coverage): fixed. test_mean_preserves_float64_gpu_3222 now also compares passes=3 output between numpy and cupy.

All 28 mean-related tests pass on a CUDA host. Nothing further from me.

Main's #3221 (issue #3214) landed the same float64-preservation fix
for the mean() GPU paths that this branch made for #3222, so the
merge takes main's focal.py wholesale. The branch's regression tests
survive: multi-pass GPU parity and dask+cupy large-offset parity are
not covered by #3221's tests, and the old rtol=1e-4 allowance on the
dask+gpu transfer test can tighten now that every backend computes
in float64.
@brendancol

Copy link
Copy Markdown
Contributor Author

Closing as superseded by #3221, which merged first with the same _promote_float change in _mean_cupy/_mean_dask_cupy plus the mean() entry-point fix, excludes dtype handling, and broader dtype tests. The only bits here not on main are a passes=3 compounding test and a dask+cupy large-offset parity test; those can ride along with a future focal test PR if wanted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

focal mean() computes in float32 on GPU but float64 on CPU

1 participant