Found by the performance sweep against xrspatial/mcda, verified on a CUDA host (cupy 13.6). standardize() documents support for numpy, cupy, dask+numpy, and dask+cupy, but three GPU paths fail outright.
1. standardize(method="piecewise") on cupy input
_piecewise calls xp.interp(data, bp, vl) with xp = cupy while bp and vl are numpy arrays:
NotImplementedError: Only int or ndarray are supported for a
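A minimal sketch of how this call could be kept on a single device. The names `array_module` and `piecewise_on_device` are hypothetical and not part of xrspatial. The sketch assumes the breakpoint and value tables arrive as host-side sequences or numpy arrays, and it falls back to numpy when cupy is not installed.

```python
import numpy as np


def array_module(arr):
    """Return cupy for cupy arrays and numpy otherwise (cupy is optional)."""
    try:
        import cupy
    except ImportError:
        return np
    return cupy.get_array_module(arr)


def piecewise_on_device(data, breakpoints, values):
    """Hypothetical stand-in for _piecewise; the real signature may differ."""
    xp = array_module(data)
    # Move the small lookup tables to wherever `data` lives, so that
    # xp.interp never sees a mix of host and device arrays.
    bp = xp.asarray(breakpoints, dtype=np.float64)
    vl = xp.asarray(values, dtype=np.float64)
    return xp.interp(data, bp, vl)
```

The tables are small, so copying them to the device on each call should cost little compared with moving the raster itself.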
2. standardize(method="piecewise" | "categorical") on dask+cupy input
The map_blocks chunk functions call np.asarray(block) with a comment claiming this "handles cupy chunks". It does not; cupy refuses implicit conversion:
TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
Even if the conversion worked, it would be a device-to-host copy per chunk with the interpolation running on the CPU and the result left there.
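As an illustration of a chunk function that never converts to numpy, here is a sketch for the categorical case. The name `categorical_chunk`, the dict-based `mapping` argument, and the NaN fill for unmapped categories are assumptions made for this example, not the library's actual interface. A piecewise chunk function could select its array module the same way.

```python
import numpy as np


def array_module(block):
    """Return cupy for cupy blocks and numpy otherwise (cupy is optional)."""
    try:
        import cupy
    except ImportError:
        return np
    return cupy.get_array_module(block)


def categorical_chunk(block, mapping, fill=np.nan):
    """Map category codes to scores using the block's own array module.

    `mapping` is a hypothetical {category: score} dict; categories that
    are missing from it receive `fill`.
    """
    xp = array_module(block)
    out = xp.full(block.shape, fill, dtype=np.float64)
    for category, score in mapping.items():
        # Boolean-mask assignment runs on the device for cupy blocks.
        out[block == category] = score
    return out
```

With dask, a function like this could be passed to `map_blocks` on the dask array, for example `data.map_blocks(categorical_chunk, mapping, dtype="float64")`, possibly with a `meta` argument that matches the chunk type.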
3. sensitivity(method="monte_carlo") on cupy input
_monte_carlo reads score.values every iteration, which hits the same implicit-conversion TypeError on cupy-backed data. Were the conversion allowed, it would be one device-to-host transfer per sample (1000 by default) with the Welford accumulation running on the host.
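A sketch of a Welford accumulation that stays in the samples' own array module and transfers to the host only once. The names `welford_mean_std` and `to_numpy` are hypothetical, and the population standard deviation (`ddof=0`) is an assumption; the real `_monte_carlo` may track different statistics.

```python
import numpy as np


def array_module(arr):
    """Return cupy for cupy arrays and numpy otherwise (cupy is optional)."""
    try:
        import cupy
    except ImportError:
        return np
    return cupy.get_array_module(arr)


def to_numpy(arr):
    """Single explicit device-to-host copy; numpy arrays pass through."""
    return arr.get() if hasattr(arr, "get") else np.asarray(arr)


def welford_mean_std(samples):
    """Running mean and std over same-shaped arrays (numpy or cupy).

    In the setting above, each sample would be something like
    `score.data` rather than `score.values`.
    """
    xp = mean = m2 = None
    n = 0
    for sample in samples:
        if mean is None:
            xp = array_module(sample)
            mean = xp.zeros(sample.shape, dtype=np.float64)
            m2 = xp.zeros(sample.shape, dtype=np.float64)
        n += 1
        delta = sample - mean
        mean = mean + delta / n
        m2 = m2 + delta * (sample - mean)
    if n == 0:
        raise ValueError("at least one sample is required")
    std = xp.sqrt(m2 / n)  # population std (ddof=0), an assumption
    # Convert once at the end instead of once per iteration.
    return to_numpy(mean), to_numpy(std)
```

All per-iteration arithmetic uses operators and `xp` functions, so for cupy input nothing crosses the device boundary until the final `to_numpy` calls.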
Proposed fixes
- piecewise on cupy: pass cupy copies of the breakpoint and value tables to cupy.interp so everything stays on device.
- piecewise/categorical chunk functions: detect cupy blocks and use the matching array module instead of converting to numpy.
- monte_carlo: accumulate with the array's own module (score.data rather than .values) and convert to numpy once at the end.