polygonize dask backend serializes chunks via per-chunk dask.compute()

## What's happening

The dask backend for `polygonize` (`_polygonize_dask`) walks the chunks in a nested Python loop and calls `dask.compute()` once per chunk. Each call blocks until that one chunk is done, so the scheduler only ever has a single chunk's worth of work in flight. Chunks that could run in parallel don't.

## Numbers

On a 1024x1024 int32 raster split into 64 chunks of 128x128, collecting the per-chunk delayed tasks and computing them in one `dask.compute()` call ran in 5.14s versus 8.28s for the current loop, about 1.61x. With more chunks and more workers the gap gets wider.

## The catch

The per-chunk loop isn't an accident. The comment right above it spells out the reason: computing one chunk at a time means only boundary polygons pile up in driver memory, so peak memory tracks boundary-polygon count instead of total polygons times chunk count. If you "fix" this by dumping every chunk into a single `dask.compute()`, you pull all the interior polygons into memory at once. On a huge raster that's a regression, not a win.

## Suggested approach

Batch one row of chunks per `dask.compute()` call. The scheduler can run the chunks in a row concurrently, and peak memory stays bounded by a single row's results rather than the whole raster. Same memory behavior the original loop was after, most of the parallelism back.

## Backends

dask+numpy and dask+cupy both go through `_polygonize_dask`.

## Bottleneck and OOM notes

Compute-bound: CPU boundary tracing in `_scan` dominates on every backend. OOM verdict stays RISKY because the driver still accumulates O(total polygons) interior polygons no matter what this change does. Row-batching holds the existing memory bound; it doesn't make it worse.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

polygonize dask backend serializes chunks via per-chunk dask.compute() #2608

What's happening

Numbers

The catch

Suggested approach

Backends

Bottleneck and OOM notes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

polygonize dask backend serializes chunks via per-chunk dask.compute() #2608

Description

What's happening

Numbers

The catch

Suggested approach

Backends

Bottleneck and OOM notes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions