Use largest chunk for clip_polygon rasterize mask on dask (#3191)#3201
Merged
Conversation
With crop=True on a dask raster, _crop_to_bbox slices the array and leaves a tiny partial chunk at the leading edge. clip_polygon picked that first chunk as the rasterize mask chunk size, fragmenting the mask into many narrow chunks and exploding the task graph xarray.where has to align (13169 tasks for a 1500x1500 output vs 1045 with the fix). Pick the largest chunk per axis instead so the mask grid stays coarse and tracks the raster's interior chunk size. Output values are unchanged. Affects dask+numpy and dask+cupy; crop=False and the non-dask numpy/cupy paths are untouched. Adds regression tests asserting the crop-path graph stays bounded (graph construction only, no .compute()) and that crop=True matches crop=False values. Also records the polygon_clip performance sweep state.
brendancol
commented
Jun 10, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: Use largest chunk for clip_polygon rasterize mask on dask (#3191)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None.
Nits (optional improvements)
test_crop_matches_nocrop_valuesruns on an all-zeros raster, so inside the polygon every value is 0.0 and outside is NaN. That checks mask placement and NaN positions but not that real data values survive the coarser chunking.TestClipPolygonDask.test_crop_matches_numpyalready covers value preservation with arange data, so this is fine as written; swapping in arange data here too would just make the one test self-sufficient.
What looks good
- One-line change with a comment that names the root cause: slicing leaves a tiny leading edge chunk, and the old code picked that as the mask chunk size.
max(rc)/max(cc)is safe. Chunk tuples off a dask array are always non-empty with positive ints, and for a single-chunk-per-axis rastermax(rc) == rc[0], so that case is unchanged.- The change only touches the dask branch. numpy/cupy non-dask paths and crop=False are untouched, which matches the PR description.
- The graph-size regression test reads
__dask_graph__()and never calls.compute(), which is the right way to assert graph shape. - Both dask+numpy and dask+cupy hit the changed line and both are tested; the suite passes on a CUDA host.
Checklist
- Algorithm matches reference/paper: N/A, chunking change only
- All implemented backends produce consistent results: yes, crop vs no-crop parity tested
- NaN handling correct: yes, mask placement unchanged
- Edge cases covered: mid-chunk crop covered; single-chunk raster is a no-op by construction
- Dask chunk boundaries handled correctly: yes, that is the fix
- No premature materialization or unnecessary copies: confirmed, no .compute()
- Benchmark exists or not needed: no benchmark, not needed for a chunking fix
- README feature matrix updated: N/A, no API change
- Docstrings present and accurate: unchanged, still accurate
…e-polygon_clip-2026-06-10
…e-polygon_clip-2026-06-10 Conflicts: .claude/sweep-performance-state.csv Took main's CSV (LF line endings, rasterize Pass 4 row) and re-applied this branch's polygon_clip row (#3191). polygon_clip.py and test_polygon_clip.py auto-merged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3191
clip_polygonwithcrop=Trueon a dask raster fragments the rasterize mask into tiny chunks and blows up the task graph._crop_to_bboxslices the dask array, leaving a small partial chunk at the leading edge, and the code picked that first chunk (chunks[-1][0]) as the mask chunk size. A 1500-px-wide output ended up with ~125 narrow mask chunks, andxarray.wherehad to align them against the irregular raster chunks.Changes
max(rc), max(cc)) for the rasterize mask instead of the leading partial chunk, so the mask grid stays coarse and tracks the raster's interior chunk size.box(500, 500, 2000, 2000)on a 2560x2560 raster (256 chunks), this drops the task graph from 13169 to 1045 tasks (graph construction only, no.compute()), with identical output values.numblocksstay bounded without computing, the other checkscrop=Truevalues matchcrop=False.Backend coverage
crop=False: unaffected (chunks stay uniform).Test plan
pytest xrspatial/tests/test_polygon_clip.py(25 passed, including CuPy and dask+cupy)