Description
When clip_polygon runs with crop=True on a dask-backed raster, the dask task graph ends up much bigger than the output size needs. The culprit is the chunk size picked for the internal rasterize mask.
_crop_to_bbox slices the dask raster down to the geometry bounding box. Slicing a dask array leaves irregular chunk sizes at the cut edges. For a 2560x2560 raster chunked at (256, 256), clipping to a box that starts mid-chunk gives x-chunks like (12, 256, 256, 256, 256, 256, 208).
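The chunk pattern above can be reproduced with plain arithmetic. The sketch below is a pure-Python model of how a slice cuts through an existing chunk layout along one axis; `sliced_chunks` is a hypothetical helper written for illustration, not part of dask or of this library.

```python
def sliced_chunks(chunks, start, stop):
    """Chunk sizes left along one axis after slicing [start:stop].

    Hypothetical helper: models the chunk layout only, no dask involved.
    """
    out = []
    offset = 0  # global index where the current chunk begins
    for size in chunks:
        lo = max(start, offset)
        hi = min(stop, offset + size)
        if hi > lo:  # the slice overlaps this chunk
            out.append(hi - lo)
        offset += size
    return tuple(out)


# 2560 px axis chunked at 256, sliced to [500:2000]
x_chunks = sliced_chunks((256,) * 10, 500, 2000)
print(x_chunks)  # (12, 256, 256, 256, 256, 256, 208)
```

The slice starts 12 px before the chunk boundary at 512, which is where the 12 px leading chunk comes from.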
clip_polygon then takes the rasterize mask chunk size from the first chunk of each axis:
rc, cc = raster.data.chunks[-2], raster.data.chunks[-1]
kw.setdefault('chunks', (rc[0], cc[0]))
rc[0] and cc[0] are the leading edge chunks, which after slicing are often tiny partial chunks (12 px in the example above). rasterize builds a uniform mask at that size, so a 1500-px-wide output gets 125 chunks of 12 px each along that axis. xarray.where then has to align the irregular raster chunks against the tiny uniform mask chunks, and the task count blows up.
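To make the misalignment concrete, here is a simplified one-axis model. It assumes that aligning two chunkings splits the axis at every boundary present in either one. `uniform_chunks` and `merged_piece_count` are hypothetical helpers, and dask's actual alignment and task counts will differ in detail.

```python
from itertools import accumulate


def uniform_chunks(length, size):
    """Uniform chunking of an axis, with a shorter trailing chunk if needed."""
    full, rest = divmod(length, size)
    return (size,) * full + ((rest,) if rest else ())


def merged_piece_count(a, b):
    """Pieces left after cutting the axis at every boundary of a and of b."""
    return len(set(accumulate(a)) | set(accumulate(b)))


raster_x = (12, 256, 256, 256, 256, 256, 208)  # x-chunks after the crop
width = sum(raster_x)                           # 1500

results = {}
for size in (raster_x[0], max(raster_x)):
    mask_x = uniform_chunks(width, size)
    results[size] = (len(mask_x), merged_piece_count(raster_x, mask_x))

print(results)  # {12: (125, 129), 256: (6, 12)}
```

Under this model a 12 px mask chunk leaves 129 pieces along the axis, against 12 pieces for a 256 px mask chunk, and the effect compounds across both axes.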
Evidence
Graph construction only, no .compute(). 2560x2560 raster, chunks=(256, 256), clip to box(500, 500, 2000, 2000), crop=True:
- output shape: 1500x1500
- mask chunks per axis: (8, 125)
- task count: 13169
Using the largest chunk per axis (max(rc), max(cc)) instead of the first:
- mask chunks per axis: (6, 6)
- task count: 1045
About a 12.6x smaller graph, same output values.
Impact
- Backends affected: dask+numpy and dask+cupy (both go through the same chunk selection).
- Bottleneck: graph-bound. This is scheduler and graph-build overhead, not peak memory. Peak memory still scales with chunk size, so it is not an OOM risk.
- crop=False is unaffected (no slicing, chunks stay uniform). numpy and cupy non-dask paths are unaffected.
Fix
Pick a representative interior chunk size instead of the leading partial chunk:
kw.setdefault('chunks', (max(rc), max(cc)))
That keeps the mask grid coarse and roughly aligned with the raster's interior chunk size.
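A small sketch comparing the two selection rules on the example from the Evidence section. The column chunks are the ones quoted in the description. The row chunks are an assumption, chosen only so that a 208 px leading chunk reproduces the reported 8 x 125 grid. `mask_chunks` and `grid` are hypothetical helpers, not the library's API.

```python
from math import ceil


def mask_chunks(row_chunks, col_chunks, pick):
    """Apply a selection rule (first chunk, max chunk, ...) to each axis."""
    return pick(row_chunks), pick(col_chunks)


def grid(shape, chunks):
    """Number of uniform mask chunks needed per axis."""
    return tuple(ceil(n / c) for n, c in zip(shape, chunks))


rc = (208, 256, 256, 256, 256, 256, 12)  # assumed row chunks (see lead-in)
cc = (12, 256, 256, 256, 256, 256, 208)  # column chunks from the description
shape = (sum(rc), sum(cc))               # (1500, 1500)

first = mask_chunks(rc, cc, lambda c: c[0])  # current behaviour
largest = mask_chunks(rc, cc, max)           # proposed fix

print(first, grid(shape, first))      # (208, 12) (8, 125)
print(largest, grid(shape, largest))  # (256, 256) (6, 6)
```

max() is a cheap way to recover the interior chunk size here, because slicing can only shrink edge chunks and never enlarges them.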