Skip to content

Group streaming-write rows into bands so source chunks compute once (#3117)#3136

Merged
brendancol merged 3 commits into
mainfrom
deep-sweep-performance-geotiff-2026-06-09
Jun 10, 2026
Merged

Group streaming-write rows into bands so source chunks compute once (#3117)#3136
brendancol merged 3 commits into
mainfrom
deep-sweep-performance-geotiff-2026-06-09

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3117

  • _write_streaming ran one dask .compute() per 256-row tile-row (or strip), so a source chunk taller than the band re-executed once per band it overlapped: 2x at chunks=512, 4x at chunks=1024, with the whole upstream graph re-running for computed pipelines like slope().
  • New _stream_row_bands groups consecutive tile-rows/strips into row bands sized by the source chunk-row span (including the one-chunk overlap halo, same accounting as Streaming GeoTIFF writer budgets memory from output tiles, not source chunks — OOMs on slope/overlap pipelines #3007) under streaming_buffer_bytes. Each band is computed once and the tiles/strips are carved out of the materialised array, so peak memory stays where it was.
  • Wide rasters that need horizontal segmentation keep the per-tile-row path. streaming_buffer_bytes stays a soft cap.

Backend coverage: only the dask+numpy write path changes (to_geotiff on a dask-backed array, non-COG, string path). The eager, cupy, and dask+cupy writers are untouched.

Test plan:

  • New TestRowBandRecompute3117: instrumented chunk-execution counts equal 1 for tiled and strip layouts; segmented wide-raster round trip; NaN-sentinel parity between banded streaming and eager writes
  • New test_stream_row_bands_3117 unit coverage of the band grouping, budget cutoff, and unknown-chunks fallback
  • xrspatial/geotiff/tests/write/ + integration/ + parity/ suites pass (2195 passed, 44 skipped)
  • Streaming GeoTIFF writer budgets memory from output tiles, not source chunks — OOMs on slope/overlap pipelines #3007 budget tests still pass (peak source bytes per compute stay bounded)
  • Measured: per-chunk executions drop from chunk_height/tile_height to 1 on the default open_geotiff(chunks=512) -> to_geotiff round trip

…3117)

_write_streaming ran one dask .compute() per 256-row tile-row (or
strip), so a source chunk taller than the band was re-read and
re-decoded once per band it overlapped: 2x at chunks=512, 4x at
chunks=1024, with the whole upstream graph re-running for computed
pipelines.

Add _stream_row_bands to group consecutive tile-rows/strips into row
bands sized by the source chunk-row span (with the one-chunk overlap
halo, same accounting as #3007) under streaming_buffer_bytes, compute
each band once, and carve the tiles/strips out of the materialised
band. Wide rasters that need horizontal segmentation keep the
per-tile-row path. Measured per-chunk executions drop from
chunk_height/tile_height to 1 on the default read->write round trip.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Group streaming-write rows into bands so source chunks compute once (#3117)

Blockers (must fix before merge)

  • None found.

Suggestions (should fix, not blocking)

  • .claude/sweep-performance-state.csv: the geotiff row's notes column dropped the 13-pass audit trail that previous sweeps kept (Pass 13 (2026-05-20): ... | Pass 12 ...). Other long-lived rows in this file (rasterize, polygonize, zonal) prepend the new pass and keep the old text. Restore the prior notes behind the new entry so the history survives.
  • xrspatial/geotiff/tests/write/test_streaming.py: the budget contract for the new banded path has no direct peak-bytes test. test_overlap_source_respects_buffer_3007 pins peak source bytes per compute, but only on the segmented path (its tight 8 MB budget turns banding off). A variant with a budget large enough to enable banding but small enough to force several bands would pin that _stream_row_bands actually bounds the per-compute span on the path this PR adds.

Nits (optional improvements)

  • PR has no labels; performance and geotiff fit, and the repo uses them for sweep-driven fixes.
  • _writer.py banded branch: when NaNs are present the sentinel rewrite copies the whole band (band_np.copy()), which can transiently double a budget-sized buffer. Same pattern as the old per-tile-row copy and bounded by 2x the soft cap, so fine as-is; a comment noting the bound would help the next reader.

What looks good

  • _stream_row_bands reuses the #3007 span accounting (touched chunk-rows plus a one-chunk halo) so the banding and the segment budget agree on what a compute materialises.
  • The segmented wide-raster path is untouched: band_np stays None there and the fallback band list is one tile-row per band, byte-identical behaviour to before.
  • Regression tests count actual chunk executions through map_blocks and assert exactly 1, for both tiled and strip layouts; the unit test covers grouping, the budget cutoff, unknown chunks, and ragged tails.
  • The strip path previously had no budget participation at all; it now shares the same banding code instead of growing a second mechanism.
  • 2195 write/integration/parity tests pass, including the #3007 budget tests and the eager-vs-streaming byte parity checks.

Checklist

  • Algorithm matches reference (#3007 span accounting reused)
  • All implemented backends produce consistent results (dask+numpy only path changed; eager/cupy writers untouched)
  • NaN handling is correct (per-band sentinel restore, parity test vs eager writer)
  • Edge cases are covered by tests (ragged tail, unknown chunks, budget below one band)
  • Dask chunk boundaries handled correctly (bands extend only while the span fits)
  • No premature materialization beyond the documented soft cap
  • Benchmark exists or is not needed (no benchmark; covered by deterministic execution-count test instead)
  • README feature matrix update not applicable
  • Docstrings updated (_write_streaming notes the banding for both layouts)

@brendancol brendancol added performance PR touches performance-sensitive code geotiff GeoTIFF module dask Dask backend / chunked arrays labels Jun 10, 2026
… bound (#3117)

- Prepend Pass 14 to the geotiff sweep-state notes instead of dropping
  the 13-pass history (file precedent keeps the trail with ' | ').
- Add test_banded_compute_respects_buffer: banding must group tile-rows
  (fewer computes than tile-rows) while no single compute materialises
  more source bytes than streaming_buffer_bytes.
- Comment the 2x-soft-cap transient bound on the band NaN-sentinel copy.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after ebd155a)

All four findings from the first pass are addressed:

  • CSV audit trail: fixed. The geotiff row now prepends Pass 14 and keeps the 13-pass history behind a | separator, matching the rasterize/polygonize/zonal precedent.
  • Banded budget contract: fixed. test_banded_compute_respects_buffer asserts banding groups tile-rows (computes < tile-rows) and that no single compute materialises more source bytes than streaming_buffer_bytes. The bound is strict because the spy source has no overlap halo.
  • Labels: performance, geotiff, dask added.
  • Copy-bound comment: added at the band NaN-sentinel rewrite, noting the 2x-soft-cap transient and that astype shares the same factor.

Re-checked the follow-up diff: the da.Array.compute spy accumulates inside the compute call boundary, so the before/after delta per compute is race-free under the threaded scheduler. 77 streaming tests pass, flake8 clean. Nothing further from me.

@brendancol brendancol merged commit 78d27c3 into main Jun 10, 2026
7 checks passed
brendancol added a commit that referenced this pull request Jun 10, 2026
Conflicts:
- xrspatial/geotiff/_writer.py: kept main's row-band streaming
  restructure (#3136) and re-applied the compute -> .get() ->
  np.asarray ordering from #3171 to the band computes and the
  segmented wide-raster compute.
- xrspatial/geotiff/tests/gpu/test_writer.py: kept both appended
  test blocks (#3165 regression tests and #3166 warning tests).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dask Dask backend / chunked arrays geotiff GeoTIFF module performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Streaming dask writer recomputes every source chunk once per 256-row band

1 participant