Compute dask source once in cumulative_viewshed (#3185) by brendancol · Pull Request #3199 · xarray-contrib/xarray-spatial

brendancol · 2026-06-10T15:23:30Z

What changed

cumulative_viewshed called viewshed() once per observer. On the dask path with no max_distance, every call ran raster.data.compute() on the same source, so the source was materialised once per observer.
When every observer takes the full-grid path (no function-level or per-observer max_distance), the source is now computed once up front and the observers run against the in-memory raster. The accumulated count is re-wrapped as dask so the output backend still matches the dask input.
Observers that set a max_distance keep the dask windowing path, which loads only each observer's window.

Backend coverage

numpy: unchanged.
dask+numpy and dask+cupy: source computed once instead of N times on the no-max_distance path; per-observer max_distance behaviour unchanged.
cupy: unchanged.

Test plan

test_dask_source_computed_once asserts a single source compute for four observers and a dask-backed output.
test_dask_per_observer_max_distance_stays_lazy asserts the windowed path stays dask and matches numpy.
Existing test_dask_matches_numpy parity check still passes.
Full test_visibility.py suite: 27 passed.

cumulative_viewshed called viewshed() per observer; on the dask no-max_distance path each call computed the same source raster, so the source was materialised once per observer. Compute it once up front when every observer takes the full-grid path, run the observers against the in-memory raster, then re-wrap the count as dask to preserve the output backend. Observers that set max_distance keep the dask windowing path. Adds tests asserting the source is computed once and that the per-observer max_distance path stays lazy with dask/numpy parity.

brendancol · 2026-06-10T15:26:28Z

PR Review: Compute dask source once in cumulative_viewshed

(Posted as a comment via REST; gh pr review/gh pr comment GraphQL path returned HTTP 401.)

Blockers

None.

Suggestions

The da.from_array(count, chunks=_input_dask_chunks) wrap at the end builds a dask array whose data is already a concrete numpy buffer. Intentional and correct (it restores the dask output type without re-running the sweep), but the comment only says "Restore the dask-backed output type the dask input would have given." A half-line noting the count is already concrete, so nothing lazy is recovered, would save the next reader a double-take. Not blocking.

Nits

The new guard not any('max_distance' in obs for obs in observers) walks the observer list once before the main loop. Fine for normal observer counts.

What looks good

Surgical change matching the issue: compute the dask source once when every observer takes the full-grid path, otherwise keep per-observer dask windowing.
Output backend preserved: a dask input still returns a dask-backed result, output chunks match the input (verified).
Caller's raster left alone; raster.copy() does not compute or mutate the input (verified).
Correctness holds: Tier B already computed to numpy and ran _viewshed_cpu; existing test_dask_matches_numpy still passes.
Tests: test_dask_source_computed_once asserts one source compute for four observers and a dask output; test_dask_per_observer_max_distance_stays_lazy asserts the windowed path stays dask with numpy parity.
Verified locally: source computes dropped from N to 1; computing the result adds zero further source computes.

Checklist

Algorithm matches reference: N/A (scheduling change only)
Backends consistent: yes
NaN handling: unchanged
Edge cases tested: yes (no max_distance, per-observer max_distance, parity)
Dask chunk boundaries: yes (output chunks match input)
No premature materialization or unnecessary copies: improved (removes N-1 source materializations)
Benchmark: none; not needed for a scheduling fix
README feature matrix: N/A
Docstrings: accurate (no signature change)

…e-visibility-2026-06-10

Conflicts: - .claude/sweep-performance-state.csv: main normalized the file to LF and updated the rasterize row; this branch updated the visibility row. Took main's LF file and reapplied the branch's visibility row. xrspatial/visibility.py auto-merged (main side was the isort-only #3189).

# Conflicts: # xrspatial/tests/test_visibility.py # xrspatial/visibility.py

github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026

brendancol added 4 commits June 10, 2026 08:27

Clarify already-concrete count comment in cumulative_viewshed (#3185)

cd56e3d

Merge remote-tracking branch 'origin/main' into deep-sweep-performanc…

013e41a

…e-visibility-2026-06-10

Merge remote-tracking branch 'origin/main' into HEAD

ca6f650

# Conflicts: # xrspatial/tests/test_visibility.py # xrspatial/visibility.py

brendancol merged commit 8408ec8 into main Jun 11, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compute dask source once in cumulative_viewshed (#3185)#3199

Compute dask source once in cumulative_viewshed (#3185)#3199
brendancol merged 5 commits into
mainfrom
deep-sweep-performance-visibility-2026-06-10

brendancol commented Jun 10, 2026

Uh oh!

brendancol commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented Jun 10, 2026

What changed

Backend coverage

Test plan

Uh oh!

brendancol commented Jun 10, 2026

PR Review: Compute dask source once in cumulative_viewshed

Blockers

Suggestions

Nits

What looks good

Checklist

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant