Make focal memory guard backend-aware for dask input (#3218) #3228

Merged: brendancol merged 3 commits into main from deep-sweep-performance-focal-2026-06-10-01 on Jun 10, 2026

@brendancol

Contributor

Closes #3218

  • _check_kernel_vs_raster_memory() now accepts the dask .chunks tuple and budgets the largest chunk plus the kernel halo instead of the full padded raster. map_overlap only materializes one padded chunk per task, so the full-raster term was a false positive that blocked any dask raster bigger than ~half host RAM from running apply(), focal_stats(), or hotspots().
  • numpy and cupy input keep the existing full-raster budget; those paths really do allocate full-size arrays.
  • The MemoryError message now says "chunk" or "raster" depending on which footprint was charged.
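
The budgeting rule in the bullets above can be sketched in a few lines. This is a minimal illustration, not the code in xrspatial/focal.py: the function names `padded_footprint` and `check_memory`, the explicit `available_bytes` argument, and the exact message wording are assumptions made for the example. Only the 0.5-of-available budget, the kernel-bytes term, and the "largest chunk plus 2*pad halo" footprint come from this PR.

```python
def padded_footprint(shape, kernel_shape, itemsize, chunks=None):
    """Bytes of the padded array one task allocates, plus the unit charged.

    Hypothetical helper: `chunks` is a dask-style tuple of per-axis chunk
    sizes, or None for numpy/cupy input.
    """
    pad_y, pad_x = kernel_shape[0] // 2, kernel_shape[1] // 2
    if chunks is None:
        # numpy / cupy: the whole padded raster is allocated.
        rows, cols, unit = shape[-2], shape[-1], "raster"
    else:
        # dask: only the largest chunk plus its halo is materialized per task.
        rows, cols, unit = max(chunks[-2]), max(chunks[-1]), "chunk"
    nbytes = (rows + 2 * pad_y) * (cols + 2 * pad_x) * itemsize
    return nbytes, unit


def check_memory(shape, kernel_shape, itemsize, available_bytes, chunks=None):
    """Raise MemoryError when the charged footprint exceeds half of memory."""
    footprint, unit = padded_footprint(shape, kernel_shape, itemsize, chunks)
    kernel_bytes = kernel_shape[0] * kernel_shape[1] * itemsize
    needed = footprint + kernel_bytes
    budget = 0.5 * available_bytes
    if needed > budget:
        raise MemoryError(
            f"padded {unit} plus kernel needs {needed} bytes, "
            f"budget is {budget:.0f} bytes"
        )


# 200000 x 200000 float32 raster split into 4096-wide chunks, 64 GB available.
axis_chunks = (4096,) * 48 + (3392,)
check_memory((200_000, 200_000), (3, 3), 4, 64 * 10**9,
             chunks=(axis_chunks, axis_chunks))  # passes: one padded chunk is ~67 MB
```

Calling `check_memory` on the same shape without `chunks` raises, because the full padded raster is charged instead.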

Verified: a 200000x200000 float32 lazy dask raster (160 GB) with a 3x3 kernel now builds graphs for all three entry points. Before the fix, all three raised MemoryError at graph construction while mean() (no guard) worked.
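
For reference, the 160 GB figure follows directly from the shape and dtype. The snippet below is plain arithmetic; the 0.5 budget factor is the headroom mentioned in the review, and the variable names are illustrative.

```python
rows = cols = 200_000
itemsize = 4  # float32

raster_bytes = rows * cols * itemsize
print(raster_bytes / 1e9)  # 160.0 (GB, decimal)

# With a 3x3 kernel the pad is 1 cell per side, so the old guard charged
# the padded raster against half of available memory.
padded_bytes = (rows + 2) * (cols + 2) * itemsize
min_ram_for_old_guard = padded_bytes / 0.5  # just over 320 GB
```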

Backend coverage: dask+numpy and dask+cupy get the per-chunk budget; numpy and cupy are unchanged.

Test plan:

  • New tests: all 3 entry points accept a large dask raster under a patched 1 MB memory probe; numpy input is still rejected; an oversized kernel on dask is still rejected and the message reports the chunk
  • Full xrspatial/tests/test_focal.py: 238 passed
  • GPU sanity run of apply/focal_stats/hotspots on cupy and dask+cupy backends
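
A patched-probe test of the kind listed above might look roughly like this. It is a self-contained sketch: `guard`, `available_bytes`, and `check` are hypothetical stand-ins, not the names used in xrspatial/tests/test_focal.py, and the real tests exercise apply(), focal_stats(), and hotspots() rather than a toy guard.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the module that owns the memory probe.
guard = SimpleNamespace(available_bytes=lambda: 64 * 10**9)


def check(rows, cols, itemsize=4, pad=1):
    """Toy guard: charge one padded array against half of available memory."""
    needed = (rows + 2 * pad) * (cols + 2 * pad) * itemsize
    if needed > 0.5 * guard.available_bytes():
        raise MemoryError(f"needs {needed} bytes")


def test_chunk_accepted_full_raster_rejected():
    # Pretend the host has only 1 MB available.
    with mock.patch.object(guard, "available_bytes", return_value=1_000_000):
        check(256, 256)  # a padded 256x256 float32 chunk fits the 500 kB budget
        try:
            check(200_000, 200_000)  # the full raster does not
        except MemoryError:
            pass
        else:
            raise AssertionError("full raster should be rejected")
```

Because nothing is computed, a test in this style stays fast regardless of the nominal raster size.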

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026

@brendancol brendancol left a comment

Contributor Author

PR Review: Make focal memory guard backend-aware for dask input (#3218)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • xrspatial/focal.py:96-99: the per-chunk budget charges one padded chunk, but the threaded scheduler materializes one per worker concurrently, so true peak is roughly num_workers * padded_chunk. The 0.5-of-available headroom covers this for sane chunk sizes, and tightening it would risk reintroducing false rejections, so leaving it as is seems right. Worth a one-line comment if it ever bites.
  • xrspatial/focal.py:97: dask arrays with unknown chunk sizes (NaN chunks after boolean indexing) make max(chunks[-2]) NaN, and every comparison downstream is False, so the guard silently passes. dask's own map_overlap error fires later, so no crash; just noting the behavior is fall-through rather than explicit.
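
Both nits can be illustrated with plain Python. This is a sketch under assumptions: the chunk sizes, worker count, and the helper name `threaded_peak_bytes` are made up for the example, and the 0.5 factor is the headroom described above.

```python
import math

# Unknown chunk sizes along the row axis, as dask reports after boolean indexing.
nan_chunks = ((math.nan, math.nan), (512, 512))

largest_rows = max(nan_chunks[-2])   # nan
largest_cols = max(nan_chunks[-1])   # 512
footprint = (largest_rows + 2) * (largest_cols + 2) * 4  # nan propagates
budget = 0.5 * 8 * 10**9

# Every ordered comparison with NaN is False, so the guard does not raise.
rejected = footprint > budget


def threaded_peak_bytes(chunk_rows, chunk_cols, pad, itemsize, num_workers):
    """Rough peak when each worker holds one padded chunk at the same time."""
    padded = (chunk_rows + 2 * pad) * (chunk_cols + 2 * pad) * itemsize
    return num_workers * padded
```

For example, `threaded_peak_bytes(4096, 4096, 1, 4, 8)` is about 0.54 GB, which is the kind of multiple the 0.5 headroom is expected to absorb for ordinary chunk sizes.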

What looks good

  • Correct fix scope: numpy/cupy keep the full-raster budget (they really allocate full-size padded arrays), only chunked input switches to the per-task footprint. Mirrors the merge() guard fix from #3048.
  • chunks[-2] / chunks[-1] indexing is safe: 3D input recurses through _apply_per_band before the guard runs, and the negative indices would handle a 3D chunks tuple anyway.
  • The error message now names the unit it budgeted ("chunk" vs "raster"), and a test pins that wording.
  • Tests cover all three entry points accepting a large dask raster under a patched memory probe, the numpy rejection still firing, and the oversized-kernel-on-dask rejection. No .compute() anywhere, so the tests stay fast.
  • The existing #1284 tests (patched return_value=1) still pass, confirming the kernel-bytes term alone keeps rejecting absurd kernels on every backend.

Checklist

  • Algorithm matches reference (per-task footprint = largest chunk + 2*pad halo, which is what map_overlap allocates)
  • All implemented backends produce consistent results (guard change only; GPU sanity run on cupy and dask+cupy)
  • NaN handling is correct (no numeric path touched)
  • Edge cases are covered by tests (accept, reject-numpy, reject-oversized-kernel)
  • Dask chunk boundaries handled correctly
  • No premature materialization or unnecessary copies
  • Benchmark exists or is not needed (guard-only change, no compute path touched)
  • README feature matrix updated (n/a, no API change)
  • Docstrings present and accurate (helper docstring documents the chunks parameter)

@brendancol brendancol left a comment

Contributor Author

Follow-up review after 89a07c2: both nits from the first pass are addressed by the new comment block in _check_kernel_vs_raster_memory (concurrency headroom rationale and the NaN-chunk fall-through behavior). The commit is comment-only; guard tests (3218 + 1284 set) still pass. No new findings.

@brendancol brendancol merged commit 3836bac into main Jun 10, 2026
7 checks passed
brendancol added a commit that referenced this pull request Jun 10, 2026
…ocal-2026-06-10-02

Conflicts: xrspatial/focal.py, xrspatial/tests/test_focal.py.
Combined the dtype-aware itemsize budget (#3223) with main's
chunk-aware budgeting (#3228); kept both sides' new tests and bumped
the _promote_float spy count in the #3231 test by one for the guard's
dtype-only call.

Development

Successfully merging this pull request may close these issues.

focal memory guard rejects large dask rasters the dask path never materializes
