mcda: owa() crashes on dask input; wpm() validation runs one compute per criterion #3150

@brendancol

Found by the performance sweep against xrspatial/mcda (dask 2025.7.0). There are two problems in xrspatial/mcda/combine.py, both on the dask path.

1. owa() raises AttributeError on dask-backed Datasets

_sort_descending calls da.sort, which does not exist in dask.array:

```python
import numpy as np
import xarray as xr
from xrspatial.mcda import owa

ds = xr.Dataset({
    "a": xr.DataArray(np.random.rand(20, 20), dims=["y", "x"]).chunk({"y": 10}),
    "b": xr.DataArray(np.random.rand(20, 20), dims=["y", "x"]).chunk({"y": 10}),
})
owa(ds, {"a": 0.5, "b": 0.5}, [0.6, 0.4])
# AttributeError: module 'dask.array' has no attribute 'sort'
```

This is worse than a missing feature. When the criteria stack won't fit in RAM, the eager path's memory guard raises MemoryError telling the user to "Use a dask-backed Dataset for out-of-core processing", and the dask path it recommends then crashes. There is no dask test for owa (TestDaskChunkAlignment covers only standardize, wlc, and fuzzy_overlay), which is how this slipped through.

Fix: rechunk the criterion axis to a single chunk and sort per block with map_blocks + np.sort(axis=0). Sorting an axis fully contained in one chunk is exact, and peak memory stays bounded by the block size.
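
A minimal sketch of that fix, assuming the criteria are stacked along axis 0 of a 3-D (criteria, y, x) dask array. sort_descending_dask is a hypothetical name, not the function in combine.py. Because np.sort is ascending, the block is reversed along axis 0 to get descending order.

```python
import numpy as np
import dask.array as da


def sort_descending_dask(stack: da.Array) -> da.Array:
    """Hypothetical helper: sort a (criteria, y, x) dask array descending along axis 0."""
    # Put the whole criterion axis in a single chunk so every block holds
    # all criteria for its spatial tile; spatial chunks are left unchanged.
    stack = stack.rechunk({0: -1})
    # Sort each block independently (exact, since axis 0 is not split across
    # blocks), then reverse axis 0 for descending order. Block shapes are
    # preserved, so map_blocks needs no explicit chunks argument.
    return stack.map_blocks(
        lambda block: np.sort(block, axis=0)[::-1],
        dtype=stack.dtype,
    )


rng = np.random.default_rng(0)
arr = rng.random((3, 20, 20))
stack = da.from_array(arr, chunks=(1, 10, 10))
result = sort_descending_dask(stack).compute()
```

To build the input from a Dataset, da.stack([ds[v].data for v in ds.data_vars], axis=0) produces the layout assumed here. The sketch does not special-case NaN: np.sort places NaN last, so after the reversal NaN comes first, and the eager path's NaN convention would need to be matched.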

2. wpm() validation calls .compute() once per criterion

_check_wpm_positive loops over criteria.data_vars and runs float(da.nanmin(arr).compute()) for each variable. With N dask-backed criteria, that is N separate, serialized scheduler runs, each a full pass over one layer, all executed at call time before the product itself is computed. Measured: a 4-criterion dataset triggers 4 scheduler invocations during graph construction.
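
The per-criterion pattern can be reproduced with dask's local-scheduler callbacks. check_positive_per_variable is a hypothetical stand-in for _check_wpm_positive (the strict `<= 0` rejection is an assumption), and count_scheduler_runs relies on the start hook of dask.callbacks.Callback, which fires once per run of the local threaded or synchronous scheduler. It does not observe a distributed Client.

```python
import numpy as np
import dask.array as da
import xarray as xr
from dask.callbacks import Callback


def check_positive_per_variable(criteria: xr.Dataset) -> None:
    """Hypothetical stand-in for the current per-variable validation loop."""
    for name in criteria.data_vars:
        arr = criteria[name].data
        # Each .compute() is its own scheduler run, executed one after another.
        if float(da.nanmin(arr).compute()) <= 0:
            raise ValueError(f"criterion {name!r} must be strictly positive")


def count_scheduler_runs(fn, *args) -> int:
    """Return how many times the local scheduler starts while fn(*args) runs."""
    runs = []
    # The start hook is called with the graph at the beginning of every run.
    with Callback(start=lambda dsk: runs.append(dsk)):
        fn(*args)
    return len(runs)


rng = np.random.default_rng(0)
ds = xr.Dataset({
    name: xr.DataArray(rng.random((20, 20)) + 0.1, dims=["y", "x"]).chunk({"y": 10})
    for name in "abcd"
})
n_runs = count_scheduler_runs(check_positive_per_variable, ds)
```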

Fix: collect the nanmin reductions and run them through one dask.compute(*mins) call so validation is a single parallel pass.
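
A minimal sketch of the batched validation, under the same assumptions as above: check_positive_batched is a hypothetical name and the strict `<= 0` rejection is assumed. The reductions are built lazily and evaluated together, so the local scheduler starts once regardless of how many criteria there are.

```python
import dask
import dask.array as da
import numpy as np
import xarray as xr
from dask.callbacks import Callback


def check_positive_batched(criteria: xr.Dataset) -> None:
    """Hypothetical batched validation: one scheduler run for all criteria."""
    names = list(criteria.data_vars)
    # Build every reduction lazily; nothing executes here.
    mins = [da.nanmin(criteria[name].data) for name in names]
    # dask.compute merges the graphs, evaluates them in a single pass and
    # returns concrete results in the same order as the inputs.
    results = dask.compute(*mins)
    for name, value in zip(names, results):
        if float(value) <= 0:
            raise ValueError(f"criterion {name!r} must be strictly positive")


rng = np.random.default_rng(0)
ds = xr.Dataset({
    name: xr.DataArray(rng.random((20, 20)) + 0.1, dims=["y", "x"]).chunk({"y": 10})
    for name in "abcd"
})
runs = []
# Count local-scheduler starts while validating all four criteria.
with Callback(start=lambda dsk: runs.append(dsk)):
    check_positive_batched(ds)
```

Because all reductions go through one merged graph, the scheduler can evaluate them in parallel and share any upstream tasks the criteria have in common.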

Backends affected: dask+numpy and dask+cupy. The numpy paths behave the same as before.

Labels: bug, dask, performance, severity:high, sweep-performance