Skip to content

Compute dask source once in cumulative_viewshed (#3185)#3199

Merged
brendancol merged 5 commits into
mainfrom
deep-sweep-performance-visibility-2026-06-10
Jun 11, 2026
Merged

Compute dask source once in cumulative_viewshed (#3185)#3199
brendancol merged 5 commits into
mainfrom
deep-sweep-performance-visibility-2026-06-10

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3185

What changed

  • cumulative_viewshed called viewshed() once per observer. On the dask path with no max_distance, every call ran raster.data.compute() on the same source, so the source was materialised once per observer.
  • When every observer takes the full-grid path (no function-level or per-observer max_distance), the source is now computed once up front and the observers run against the in-memory raster. The accumulated count is re-wrapped as dask so the output backend still matches the dask input.
  • Observers that set a max_distance keep the dask windowing path, which loads only each observer's window.

Backend coverage

  • numpy: unchanged.
  • dask+numpy and dask+cupy: source computed once instead of N times on the no-max_distance path; per-observer max_distance behaviour unchanged.
  • cupy: unchanged.

Test plan

  • test_dask_source_computed_once asserts a single source compute for four observers and a dask-backed output.
  • test_dask_per_observer_max_distance_stays_lazy asserts the windowed path stays dask and matches numpy.
  • Existing test_dask_matches_numpy parity check still passes.
  • Full test_visibility.py suite: 27 passed.

cumulative_viewshed called viewshed() per observer; on the dask
no-max_distance path each call computed the same source raster, so the
source was materialised once per observer. Compute it once up front when
every observer takes the full-grid path, run the observers against the
in-memory raster, then re-wrap the count as dask to preserve the output
backend. Observers that set max_distance keep the dask windowing path.

Adds tests asserting the source is computed once and that the per-observer
max_distance path stays lazy with dask/numpy parity.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026
@brendancol

Copy link
Copy Markdown
Contributor Author

PR Review: Compute dask source once in cumulative_viewshed

(Posted as a comment via REST; gh pr review/gh pr comment GraphQL path returned HTTP 401.)

Blockers

None.

Suggestions

  • The da.from_array(count, chunks=_input_dask_chunks) wrap at the end builds a dask array whose data is already a concrete numpy buffer. Intentional and correct (it restores the dask output type without re-running the sweep), but the comment only says "Restore the dask-backed output type the dask input would have given." A half-line noting the count is already concrete, so nothing lazy is recovered, would save the next reader a double-take. Not blocking.

Nits

  • The new guard not any('max_distance' in obs for obs in observers) walks the observer list once before the main loop. Fine for normal observer counts.

What looks good

  • Surgical change matching the issue: compute the dask source once when every observer takes the full-grid path, otherwise keep per-observer dask windowing.
  • Output backend preserved: a dask input still returns a dask-backed result, output chunks match the input (verified).
  • Caller's raster left alone; raster.copy() does not compute or mutate the input (verified).
  • Correctness holds: Tier B already computed to numpy and ran _viewshed_cpu; existing test_dask_matches_numpy still passes.
  • Tests: test_dask_source_computed_once asserts one source compute for four observers and a dask output; test_dask_per_observer_max_distance_stays_lazy asserts the windowed path stays dask with numpy parity.
  • Verified locally: source computes dropped from N to 1; computing the result adds zero further source computes.

Checklist

  • Algorithm matches reference: N/A (scheduling change only)
  • Backends consistent: yes
  • NaN handling: unchanged
  • Edge cases tested: yes (no max_distance, per-observer max_distance, parity)
  • Dask chunk boundaries: yes (output chunks match input)
  • No premature materialization or unnecessary copies: improved (removes N-1 source materializations)
  • Benchmark: none; not needed for a scheduling fix
  • README feature matrix: N/A
  • Docstrings: accurate (no signature change)

Conflicts:
- .claude/sweep-performance-state.csv: main normalized the file to LF and
  updated the rasterize row; this branch updated the visibility row. Took
  main's LF file and reapplied the branch's visibility row.

xrspatial/visibility.py auto-merged (main side was the isort-only #3189).
# Conflicts:
#	xrspatial/tests/test_visibility.py
#	xrspatial/visibility.py
@brendancol brendancol merged commit 8408ec8 into main Jun 11, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cumulative_viewshed recomputes the dask source raster once per observer

1 participant