Found by the performance sweep against xrspatial/mcda (dask 2025.7.0). Two problems in xrspatial/mcda/combine.py, both on the dask path.
1. owa() raises AttributeError on dask-backed Datasets
_sort_descending calls da.sort, which does not exist in dask:
import numpy as np
import xarray as xr
from xrspatial.mcda import owa

# Two dask-backed criteria, chunked along y
ds = xr.Dataset({
    "a": xr.DataArray(np.random.rand(20, 20), dims=["y", "x"]).chunk({"y": 10}),
    "b": xr.DataArray(np.random.rand(20, 20), dims=["y", "x"]).chunk({"y": 10}),
})
owa(ds, {"a": 0.5, "b": 0.5}, [0.6, 0.4])
# AttributeError: module 'dask.array' has no attribute 'sort'
This is worse than a missing feature. When the criteria stack won't fit in RAM, the eager path's memory guard raises MemoryError telling the user to "Use a dask-backed Dataset for out-of-core processing", and that path then crashes. There is no dask test for owa (TestDaskChunkAlignment covers standardize, wlc, and fuzzy_overlay only), which is how this slipped through.
Fix: rechunk the criterion axis to a single chunk and sort per block with map_blocks + np.sort(axis=0). Sorting an axis fully contained in one chunk is exact, and peak memory per task stays bounded by the block size (all criteria for one spatial chunk).
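A minimal sketch of that approach, assuming the criteria have already been stacked into a single dask array with the criterion axis first. The helper name sort_descending_axis0 is hypothetical and is not the actual _sort_descending implementation; it only illustrates the rechunk + map_blocks pattern.

```python
import dask.array as da
import numpy as np


def sort_descending_axis0(stacked):
    """Sort a dask array in descending order along axis 0 (hypothetical helper)."""
    # Put the whole criterion axis in one chunk so every block holds all the
    # values it needs to sort; the spatial chunking is left untouched.
    stacked = stacked.rechunk({0: -1})

    def _sort_block(block):
        # Ascending sort, then reverse along axis 0 for descending order.
        return np.sort(block, axis=0)[::-1]

    # Block shapes are unchanged, so only the dtype needs to be declared.
    return stacked.map_blocks(_sort_block, dtype=stacked.dtype)


# Three criterion layers, initially chunked one layer per chunk.
layers = np.random.default_rng(0).random((3, 20, 20))
stacked = da.from_array(layers, chunks=(1, 10, 20))
result = sort_descending_axis0(stacked)
```

The result stays lazy; each task sorts one spatial tile across all criteria, so nothing larger than a single block is materialized at once.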
2. wpm() validation calls .compute() once per criterion
_check_wpm_positive loops over criteria.data_vars and runs float(da.nanmin(arr).compute()) for each variable. With N dask-backed criteria that is N separate scheduler runs, serialized, each a full pass over one layer, all at call time before the actual product is computed. Measured: a 4-criterion dataset triggers 4 scheduler invocations during graph construction.
Fix: collect the nanmin reductions and run them through one dask.compute(*mins) call so validation is a single parallel pass.
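A rough sketch of the batched validation, under two assumptions that the report does not confirm: that the check rejects any criterion whose minimum is <= 0, and that the inputs are available as a name-to-dask-array mapping (for a Dataset, something like {name: var.data for name, var in criteria.data_vars.items()}). The function name find_nonpositive is hypothetical.

```python
import dask
import dask.array as da
import numpy as np


def find_nonpositive(arrays):
    """Return the names of arrays whose NaN-ignoring minimum is <= 0.

    Hypothetical helper; the <= 0 rule is an assumption about what
    _check_wpm_positive enforces.
    """
    names = list(arrays)
    # Build every reduction lazily; nothing is computed yet.
    mins = [da.nanmin(arrays[name]) for name in names]
    # One scheduler run evaluates all reductions together.
    values = dask.compute(*mins)
    return [name for name, value in zip(names, values) if float(value) <= 0]


criteria = {
    "a": da.from_array(np.array([[1.0, 2.0], [3.0, np.nan]]), chunks=(1, 2)),
    "b": da.from_array(np.array([[0.5, -1.0], [2.0, 4.0]]), chunks=(1, 2)),
}
bad = find_nonpositive(criteria)
```

Because all the nanmin graphs are handed to the scheduler in one call, the reductions can run in parallel rather than as N back-to-back passes.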
Backends affected: dask+numpy and dask+cupy. The numpy paths behave the same as before.