polygonize: batch dask compute per chunk row to recover parallelism by brendancol · Pull Request #2632 · xarray-contrib/xarray-spatial

brendancol · 2026-05-29T13:54:53Z

What this does

_polygonize_dask called dask.compute() once per chunk inside a nested Python loop. That's one chunk per scheduler round-trip, so on a multi-worker scheduler the workers idle between chunks. This batches one dask.compute() per row of chunks instead, so a row's chunks can run at the same time. Peak driver memory still tracks one row of chunk results rather than the whole raster, so the memory tradeoff the original loop was written for stays intact.

Measurements

4-worker LocalCluster (processes), 1024x1024 int32 / 64 chunks: the per-chunk loop is 2.79x slower than row-batched.
Default threaded scheduler with numba warm: within noise (about 1.03x). The @ngjit kernels release the GIL so threads already overlap there; you need a multi-worker scheduler to see the win.

Output is byte-identical to the old path, checked for both 4- and 8-connectivity.

Backends

dask+numpy and dask+cupy, which both route through _polygonize_dask. The numpy and cupy paths are untouched.

Test plan

New file test_polygonize_dask_row_batch_2608.py: numpy/dask area parity across four chunkings, masked-raster parity, 8-connectivity self-consistency across repeated runs, and a structural check that compute fires exactly once per chunk row.
Full polygonize suite: 299 passed, 16 skipped (GPU and optional-dep gated).

…2608) _polygonize_dask called dask.compute() once per chunk inside a nested Python loop, forcing one chunk per scheduler round-trip. Workers sat idle between chunks under a multi-worker scheduler (measured 2.79x slower than batched on a 4-worker LocalCluster, 1024x1024 / 64 chunks). Issue one dask.compute() per row of chunks so a row's chunks run concurrently. Peak driver memory stays bounded by one row of chunk results rather than the full raster, preserving the memory tradeoff the per-chunk loop was written for. Output is byte-identical (verified for 4- and 8-connectivity). Adds test_polygonize_dask_row_batch_2608.py: numpy/dask parity across chunkings, masked-raster parity, 8-connectivity self-consistency, and a structural check that compute is called once per chunk row.

brendancol

PR Review: polygonize dask compute row-batching

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

test_one_compute_per_chunk_row patches dask.compute on the real dask module (poly_mod.dask is the actual module, not a local alias). monkeypatch restores it afterward so nothing leaks, but the route through sys.modules plus a global-attribute patch is a bit subtle for the next reader. Works as written; flagging only for legibility.

What looks good

Small, contained change: one nested loop reworked, nothing else in the function touched.
dask.compute(*row_tasks) returns results in task order, and the per-row unpacking keeps the exact accumulation order (cols within a row, rows top to bottom), so the output is unchanged. Verified empirically for connectivity 4 and 8.
Keeps the memory behavior the original loop was written for: peak driver memory is bounded by one row of chunk results, not the whole raster.
Tests cover the right ground: numpy parity across four chunkings including ragged ones, masked-raster parity, 8-connectivity self-consistency, and a structural check that compute fires once per row. Splitting 4-conn (parity) from 8-conn (self-consistency) is the right call, since 8-conn dask area is a known approximation against numpy.
Single-column chunk grids work (one task returns a one-tuple the for-loop unpacks fine).

Checklist

Algorithm matches reference (output byte-identical to prior path)
All implemented backends produce consistent results (dask+numpy and dask+cupy share this path; numpy/cupy untouched)
NaN handling is correct (unchanged from prior path)
Edge cases are covered by tests (ragged chunks, mask, single-column grid)
Dask chunk boundaries handled correctly (merge logic unchanged)
No premature materialization or unnecessary copies
Benchmark exists (benchmarks/benchmarks/polygonize.py)
README feature matrix updated (not applicable, no API change)
Docstrings present and accurate (loop comment rewritten to match)

brendancol

Follow-up review (after nit fix)

The one nit from the prior pass is addressed: the comment in test_one_compute_per_chunk_row now spells out why patching dask.compute on poly_mod.dask is the right (and only) seam, and notes monkeypatch handles teardown.

No remaining blockers, suggestions, or nits. The change stays small and output-preserving; tests still pass (8/8 in the new file). Ready from a review standpoint pending CI.

brendancol added 2 commits May 29, 2026 06:53

sweep-performance: record polygonize re-audit (#2608)

8c123ee

github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026

brendancol commented May 29, 2026

View reviewed changes

Address review nit: clarify dask.compute patch rationale in test (#2608)

ed2534d

brendancol commented May 29, 2026

View reviewed changes

brendancol merged commit 2888320 into main May 29, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

polygonize: batch dask compute per chunk row to recover parallelism#2632

polygonize: batch dask compute per chunk row to recover parallelism#2632
brendancol merged 3 commits into
mainfrom
deep-sweep-performance-polygonize-2026-05-29

brendancol commented May 29, 2026

Uh oh!

brendancol left a comment

Uh oh!

brendancol left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented May 29, 2026

What this does

Measurements

Backends

Test plan

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: polygonize dask compute row-batching

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

Follow-up review (after nit fix)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant