Describe the bug
hotspots() is meant to have a lazy Dask path, but the Dask and Dask+CuPy versions call da.compute() on the global mean and standard deviation while the task graph is still being built. See xrspatial/focal.py near lines 1289/1293 (Dask+NumPy) and 1336 (Dask+CuPy).
So hotspots() fires off about a dozen Dask tasks the moment it returns, reading every chunk once before the caller asks for anything. A Dask-backed call should defer all work until .compute(), and this one doesn't.
The z-score step needs a global mean and std, which are whole-array reductions. Right now those get resolved eagerly. They can stay as lazy 0-d Dask reductions folded into the graph, so the normalization and classification stay deferred too.
Expected behavior
Calling hotspots() on a Dask input should build a graph and compute nothing. The global mean and std stay lazy, broadcast into the per-chunk z-score, and the result matches the eager version once you actually compute it.
Additional context
The existing laziness test only checks that the return value is a Dask array (xrspatial/tests/test_dask_laziness.py:115), so it passes even when tasks run eagerly. A better test would assert that nothing executes when hotspots() is called, e.g. with a scheduler callback that counts task runs.
Describe the bug
hotspots()is meant to have a lazy Dask path, but the Dask and Dask+CuPy versions callda.compute()on the global mean and standard deviation while the task graph is still being built. Seexrspatial/focal.pynear lines 1289/1293 (Dask+NumPy) and 1336 (Dask+CuPy).So
hotspots()fires off about a dozen Dask tasks the moment it returns, reading every chunk once before the caller asks for anything. A Dask-backed call should defer all work until.compute(), and this one doesn't.The z-score step needs a global mean and std, which are whole-array reductions. Right now those get resolved eagerly. They can stay as lazy 0-d Dask reductions folded into the graph, so the normalization and classification stay deferred too.
Expected behavior
Calling
hotspots()on a Dask input should build a graph and compute nothing. The global mean and std stay lazy, broadcast into the per-chunk z-score, and the result matches the eager version once you actually compute it.Additional context
The existing laziness test only checks that the return value is a Dask array (
xrspatial/tests/test_dask_laziness.py:115), so it passes even when tasks run eagerly. A better test would assert that nothing executes whenhotspots()is called, e.g. with a scheduler callback that counts task runs.