cumulative_viewshed recomputes the dask source raster once per observer #3185

@brendancol

Description

cumulative_viewshed in xrspatial/visibility.py loops over observers and calls viewshed(raster, ...) once per observer, accumulating a count. A code comment ("Detect dask backend to keep accumulation lazy") implies the dask path stays lazy. It doesn't.

For a dask-backed raster with no max_distance, each viewshed() call goes through _viewshed_dask Tier B, which calls raster.data.compute() to run the exact CPU sweep. Since cumulative_viewshed hands the same dask raster to every call, the full source raster gets materialized N times, once per observer.

Reproduction

A probe that wraps the source in da.map_blocks and counts block evaluations shows one source compute per observer:

4 observers => 4 eager source computes during cumulative_viewshed

The lazy accumulation buys nothing here: the expensive sweep has already run eagerly N times before the accumulation graph is even built.
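A probe of the kind described above might look roughly like the sketch below. It is not the original reproduction script: `make_counting_source`, `eager_sweep_stand_in`, and `count_source_computes` are hypothetical names, and the stand-in only mimics the relevant behavior of the Tier B path (calling `.compute()` on the source) rather than calling `xrspatial` itself. The counter lives in process memory, so the sketch assumes a local scheduler, not a distributed cluster.

```python
import threading

import dask
import dask.array as da
import numpy as np


def make_counting_source(shape=(64, 64), chunks=(32, 32)):
    """Build a dask array whose blocks bump a counter each time they are evaluated."""
    counter = {"blocks": 0}
    lock = threading.Lock()

    def _probe(block):
        # Ignore zero-size arrays that dask may pass while inferring metadata.
        if block.size:
            with lock:
                counter["blocks"] += 1
        return block

    base = da.from_array(np.zeros(shape, dtype="float64"), chunks=chunks)
    probed = da.map_blocks(
        _probe,
        base,
        dtype=base.dtype,
        meta=np.empty((0, 0), dtype=base.dtype),
    )
    return probed, counter


def eager_sweep_stand_in(data):
    """Hypothetical stand-in for the Tier B path: materialize, then sweep on CPU."""
    grid = data.compute()      # the eager source compute the issue describes
    return np.ones_like(grid)  # placeholder for a real visibility grid


def count_source_computes(n_observers):
    """Return how many times the whole source was materialized."""
    source, counter = make_counting_source()
    n_blocks = source.npartitions
    with dask.config.set(scheduler="synchronous"):
        for _ in range(n_observers):
            eager_sweep_stand_in(source)
    # Every full compute evaluates each block exactly once.
    return counter["blocks"] // n_blocks
```

To probe the real code path, the stand-in loop would be replaced by a call to `cumulative_viewshed` on a DataArray wrapping the probed array, and the counter would be read immediately after that call returns and before the result is computed.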

Impact

The per-observer sweep is unavoidable; each observer needs its own visibility grid. What's redundant is recomputing the same source dask graph N times instead of once. When the source is backed by an expensive graph (chained ops, file reads), that multiplies the source cost by the observer count.

Peak memory stays around one grid, so this is not an OOM problem. The output graph grows by roughly 64 tasks per observer, so a very large observer count produces a long graph.

Proposed fix

When the raster is dask-backed and neither the function-level max_distance nor any per-observer max_distance is set (that is, when every observer would take the full-grid Tier B path), compute raster.data once into a numpy-backed DataArray before the loop and pass that to viewshed(). When any max_distance is present, keep the current per-observer behavior so dask windowing still loads only each observer's window.
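The guard could be sketched along these lines. This is an illustration under stated assumptions, not the actual patch: `materialize_source_once` is a hypothetical helper, and the sketch assumes each observer is a dict that may carry its own `max_distance` key. It also does not handle the dask+cupy case, where `compute()` would not return a numpy array without a further conversion.

```python
import dask.array as da
import numpy as np
import xarray as xr


def materialize_source_once(raster, observers, max_distance=None):
    """Return a numpy-backed copy of raster when every observer needs the full grid.

    Assumption: each observer is a dict with an optional "max_distance" key.
    """
    if not isinstance(raster.data, da.Array):
        return raster  # already eager: nothing to do

    any_window = max_distance is not None or any(
        obs.get("max_distance") is not None for obs in observers
    )
    if any_window:
        return raster  # keep dask windowing so each observer loads only its window

    # One compute for the whole loop instead of one per observer.
    return raster.copy(data=raster.data.compute())


# Usage: a lazy raster with and without a distance limit.
lazy = xr.DataArray(da.zeros((8, 8), chunks=(4, 4)), dims=("y", "x"))
full = materialize_source_once(lazy, [{"x": 1, "y": 1}, {"x": 5, "y": 5}])
windowed = materialize_source_once(lazy, [{"x": 1, "y": 1, "max_distance": 3}])
```

In this sketch `full` is numpy-backed and can be passed to every `viewshed()` call in the loop, while `windowed` is returned unchanged and stays lazy.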

Results stay identical: Tier B already computes to numpy and runs _viewshed_cpu, and the existing test_dask_matches_numpy already asserts dask/numpy parity.

Affected backends: dask+numpy and dask+cupy.

Metadata

Labels

bug (Something isn't working), performance (PR touches performance-sensitive code)
