Group streaming-write rows into bands so source chunks compute once (#3117) #3136
Merged
…3117) _write_streaming ran one dask .compute() per 256-row tile-row (or strip), so a source chunk taller than the band was re-read and re-decoded once per band it overlapped: 2x at chunks=512, 4x at chunks=1024, with the whole upstream graph re-running for computed pipelines. Add _stream_row_bands to group consecutive tile-rows/strips into row bands sized by the source chunk-row span (with the one-chunk overlap halo, same accounting as #3007) under streaming_buffer_bytes, compute each band once, and carve the tiles/strips out of the materialised band. Wide rasters that need horizontal segmentation keep the per-tile-row path. Measured per-chunk executions drop from chunk_height/tile_height to 1 on the default read->write round trip.
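The recompute factors quoted above (2x at chunks=512, 4x at chunks=1024) can be reproduced with plain arithmetic. The sketch below is illustrative only: `executions_per_chunk` and `fixed_bands` are hypothetical helpers, not functions from `_writer.py`, and the overlap halo is ignored for simplicity.

```python
def fixed_bands(height, band_height):
    """Half-open (start, stop) row ranges, one per compute call."""
    return [(r, min(r + band_height, height))
            for r in range(0, height, band_height)]


def executions_per_chunk(height, chunk_height, bands):
    """Count how many band computes touch each source chunk-row.

    Hypothetical helper: every compute that overlaps a chunk-row is
    assumed to execute that chunk once (no halo, no caching).
    """
    n_chunks = -(-height // chunk_height)  # ceiling division
    counts = [0] * n_chunks
    for start, stop in bands:
        first = start // chunk_height
        last = (stop - 1) // chunk_height
        for c in range(first, last + 1):
            counts[c] += 1
    return counts


height, tile_height = 2048, 256
for chunk_height in (256, 512, 1024):
    # Old behaviour: one compute per 256-row tile-row.
    per_tile_row = executions_per_chunk(
        height, chunk_height, fixed_bands(height, tile_height))
    # Banded behaviour, assuming the budget allows one band per chunk-row.
    banded = executions_per_chunk(
        height, chunk_height, fixed_bands(height, chunk_height))
    print(chunk_height, max(per_tile_row), max(banded))
# prints:
# 256 1 1
# 512 2 1
# 1024 4 1
```

The per-tile-row count equals chunk_height/tile_height, which is the ratio the commit message reports dropping to 1.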
brendancol commented Jun 9, 2026
brendancol (Contributor, Author) left a comment
PR Review: Group streaming-write rows into bands so source chunks compute once (#3117)
Blockers (must fix before merge)
- None found.
Suggestions (should fix, not blocking)
- `.claude/sweep-performance-state.csv`: the geotiff row's notes column dropped the 13-pass audit trail that previous sweeps kept (`Pass 13 (2026-05-20): ... | Pass 12 ...`). Other long-lived rows in this file (rasterize, polygonize, zonal) prepend the new pass and keep the old text. Restore the prior notes behind the new entry so the history survives.
- `xrspatial/geotiff/tests/write/test_streaming.py`: the budget contract for the new banded path has no direct peak-bytes test. `test_overlap_source_respects_buffer_3007` pins peak source bytes per compute, but only on the segmented path (its tight 8 MB budget turns banding off). A variant with a budget large enough to enable banding but small enough to force several bands would pin that `_stream_row_bands` actually bounds the per-compute span on the path this PR adds.
Nits (optional improvements)
- PR has no labels; `performance` and `geotiff` fit, and the repo uses them for sweep-driven fixes.
- `_writer.py` banded branch: when NaNs are present the sentinel rewrite copies the whole band (`band_np.copy()`), which can transiently double a budget-sized buffer. Same pattern as the old per-tile-row copy and bounded by 2x the soft cap, so fine as-is; a comment noting the bound would help the next reader.
What looks good
- `_stream_row_bands` reuses the #3007 span accounting (touched chunk-rows plus a one-chunk halo) so the banding and the segment budget agree on what a compute materialises.
- The segmented wide-raster path is untouched: `band_np` stays `None` there and the fallback band list is one tile-row per band, byte-identical behaviour to before.
- Regression tests count actual chunk executions through `map_blocks` and assert exactly 1, for both tiled and strip layouts; the unit test covers grouping, the budget cutoff, unknown chunks, and ragged tails.
- The strip path previously had no budget participation at all; it now shares the same banding code instead of growing a second mechanism.
- 2195 write/integration/parity tests pass, including the #3007 budget tests and the eager-vs-streaming byte parity checks.
Checklist
- Algorithm matches reference (#3007 span accounting reused)
- All implemented backends produce consistent results (dask+numpy only path changed; eager/cupy writers untouched)
- NaN handling is correct (per-band sentinel restore, parity test vs eager writer)
- Edge cases are covered by tests (ragged tail, unknown chunks, budget below one band)
- Dask chunk boundaries handled correctly (bands extend only while the span fits)
- No premature materialization beyond the documented soft cap
- Benchmark exists or is not needed (no benchmark; covered by deterministic execution-count test instead)
- README feature matrix update not applicable
- Docstrings updated (`_write_streaming` notes the banding for both layouts)
… bound (#3117)
- Prepend Pass 14 to the geotiff sweep-state notes instead of dropping the 13-pass history (file precedent keeps the trail with ' | ').
- Add test_banded_compute_respects_buffer: banding must group tile-rows (fewer computes than tile-rows) while no single compute materialises more source bytes than streaming_buffer_bytes.
- Comment the 2x-soft-cap transient bound on the band NaN-sentinel copy.
brendancol commented Jun 10, 2026
brendancol (Contributor, Author) left a comment
Follow-up review (after ebd155a)
All four findings from the first pass are addressed:
- CSV audit trail: fixed. The geotiff row now prepends Pass 14 and keeps the 13-pass history behind a `|` separator, matching the rasterize/polygonize/zonal precedent.
- Banded budget contract: fixed. `test_banded_compute_respects_buffer` asserts banding groups tile-rows (computes < tile-rows) and that no single compute materialises more source bytes than `streaming_buffer_bytes`. The bound is strict because the spy source has no overlap halo.
- Labels: `performance`, `geotiff`, `dask` added.
- Copy-bound comment: added at the band NaN-sentinel rewrite, noting the 2x-soft-cap transient and that `astype` shares the same factor.
Re-checked the follow-up diff: the da.Array.compute spy accumulates inside the compute call boundary, so the before/after delta per compute is race-free under the threaded scheduler. 77 streaming tests pass, flake8 clean. Nothing further from me.
brendancol added a commit that referenced this pull request Jun 10, 2026
Conflicts:
- xrspatial/geotiff/_writer.py: kept main's row-band streaming restructure (#3136) and re-applied the compute -> .get() -> np.asarray ordering from #3171 to the band computes and the segmented wide-raster compute.
- xrspatial/geotiff/tests/gpu/test_writer.py: kept both appended test blocks (#3165 regression tests and #3166 warning tests).
Closes #3117
`_write_streaming` ran one dask `.compute()` per 256-row tile-row (or strip), so a source chunk taller than the band re-executed once per band it overlapped: 2x at `chunks=512`, 4x at `chunks=1024`, with the whole upstream graph re-running for computed pipelines like `slope()`.

`_stream_row_bands` groups consecutive tile-rows/strips into row bands sized by the source chunk-row span (including the one-chunk overlap halo, same accounting as #3007 "Streaming GeoTIFF writer budgets memory from output tiles, not source chunks: OOMs on slope/overlap pipelines") under `streaming_buffer_bytes`. Each band is computed once and the tiles/strips are carved out of the materialised array, so peak memory stays where it was. `streaming_buffer_bytes` stays a soft cap.

Backend coverage: only the dask+numpy write path changes (`to_geotiff` on a dask-backed array, non-COG, string path). The eager, cupy, and dask+cupy writers are untouched.

Test plan:
- `TestRowBandRecompute3117`: instrumented chunk-execution counts equal 1 for tiled and strip layouts; segmented wide-raster round trip; NaN-sentinel parity between banded streaming and eager writes
- `test_stream_row_bands_3117` unit coverage of the band grouping, budget cutoff, and unknown-chunks fallback
- `xrspatial/geotiff/tests/write/` + `integration/` + `parity/` suites pass (2195 passed, 44 skipped)
- `open_geotiff(chunks=512)` -> `to_geotiff` round trip
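
For readers who want the grouping idea in concrete form, here is a minimal sketch of budget-bounded band grouping. It is not the actual `_stream_row_bands` code: the function name `group_row_bands`, its parameters, and the greedy strategy are assumptions made for illustration, as is the reading of the halo as one chunk-row on each side of the touched span (clipped to the raster).

```python
def group_row_bands(height, tile_height, chunk_height, row_bytes,
                    budget_bytes, halo_chunks=1):
    """Greedily group consecutive tile-rows into row bands (sketch).

    Hypothetical helper. A band keeps growing while the source
    chunk-rows it touches, plus `halo_chunks` chunk-rows on each side
    (an assumption), fit within `budget_bytes`. A single tile-row
    always forms a band, so the budget acts as a soft cap.
    """
    n_chunks = -(-height // chunk_height)  # ceiling division

    def span_bytes(start, stop):
        # Chunk-rows a compute of rows [start, stop) would materialise.
        first = max(start // chunk_height - halo_chunks, 0)
        last = min((stop - 1) // chunk_height + halo_chunks, n_chunks - 1)
        rows = min((last + 1) * chunk_height, height) - first * chunk_height
        return rows * row_bytes

    bands = []
    start = 0
    while start < height:
        stop = min(start + tile_height, height)  # at least one tile-row
        while stop < height:
            nxt = min(stop + tile_height, height)
            if span_bytes(start, nxt) > budget_bytes:
                break
            stop = nxt
        bands.append((start, stop))
        start = stop
    return bands


# 2048 rows, 256-row tiles, 512-row source chunks, 1024 float32 columns.
bands = group_row_bands(2048, 256, 512, row_bytes=1024 * 4,
                        budget_bytes=6 * 1024 * 1024)
print(bands)  # [(0, 1024), (1024, 2048)]
```

In this example eight tile-rows collapse into two computes. With a budget smaller than a single tile-row's span, the function falls back to one tile-row per band, which mirrors the soft-cap behaviour described above.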