Skip to content

mcda memory: monte_carlo sensitivity materializes the full dask dataset; constrain() deep-copies its input #3152

@brendancol

Description

@brendancol

Two memory findings from the performance sweep against xrspatial/mcda.

1. sensitivity(method="monte_carlo") calls criteria.compute() on dask input

_monte_carlo materializes the whole dataset up front. The comment next to it is right that the alternative it considered (chaining ~1000 lazy combine graphs) is worse, but the cure means the function only works when the full criteria stack fits in host memory. On a larger-than-memory raster the compute() call OOMs.

A chunk-wise version avoids both problems: draw all n_samples weight vectors up front (they are tiny), stack the criterion layers, and run the sample loop inside one map_blocks chunk function with Welford accumulation per block. Graph size stays at one task per chunk and peak memory is bounded by chunk size. The numpy path can share the same per-block kernel, so the two backends produce identical values for the same seed.

Note this changes the dask return type from eagerly-computed numpy to a lazy dask array, which is the point. test_monte_carlo_dask_returns_numpy codifies the eager behavior and needs updating with it.

2. constrain() deep-copies the suitability raster

constrain starts with result = suitability.copy() (deep by default). The copy is never mutated: xr.where returns new objects, and the only in-place change is the .name assignment at the end. On the eager path that is a full extra allocation and copy of the input for nothing. A shallow copy keeps the name assignment from leaking into the caller's object (the empty exclude case) without duplicating the buffer.

Metadata

Metadata

Assignees

No one assigned

    Labels

    daskDask backend / chunked arraysoomOut-of-memory risk with large datasetsperformancePR touches performance-sensitive codeseverity:mediumSweep finding: MEDIUMsweep-performanceFound by /sweep-performance

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions