Scope max_pixels to the chunk when chunks= is supplied (#2501) #2502
Merged
`open_geotiff(path, chunks=N, max_pixels=M)` used to reject the read up front when the full windowed extent exceeded `max_pixels`, defeating the point of `chunks=` for large rasters: callers had to widen the cap to the full file just to build the lazy graph, which then disabled the per-chunk safety guard for every task too. Drop the full-extent guard in both the CPU dask reader (`_backends/dask.py`) and the GPU+dask GDS path (`_backends/gpu.py`). Per-chunk decode already enforces `max_pixels` against the chunk window via `_read_to_array` -> `_check_dimensions`; the GPU GDS path keeps its per-TIFF-tile guard, which is a separate hostile-input defense. The eager (no-chunks) path is unchanged. Update the `max_pixels` docstring in `open_geotiff`, `read_geotiff_dask`, and `read_geotiff_gpu`, and rewrite the tests that asserted the old up-front contract to lock in the new per-chunk semantics.
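The contract change described above can be illustrated with a minimal, self-contained sketch. The helper `check_dimensions` is a hypothetical stand-in for the library's `_check_dimensions` (its real signature and exception type are assumptions here), and the raster sizes are invented for illustration. In the real CPU dask reader the per-chunk check runs inside each chunk task; this sketch inlines it to keep the comparison short.

```python
def check_dimensions(width, height, samples, max_pixels):
    """Hypothetical pixel-budget guard: raise if the buffer would be too large."""
    total = width * height * samples
    if total > max_pixels:
        raise ValueError(f"{total} pixels exceeds max_pixels={max_pixels}")


def chunk_windows(width, height, chunk):
    """Yield (w, h) for each chunk window, clipped at the raster edge."""
    for y in range(0, height, chunk):
        for x in range(0, width, chunk):
            yield min(chunk, width - x), min(chunk, height - y)


def old_contract(width, height, samples, chunk, max_pixels):
    # Old behaviour: the full extent is checked before any chunk is planned.
    check_dimensions(width, height, samples, max_pixels)
    return list(chunk_windows(width, height, chunk))


def new_contract(width, height, samples, chunk, max_pixels):
    # New behaviour: only each chunk window is checked against the cap.
    windows = list(chunk_windows(width, height, chunk))
    for w, h in windows:
        check_dimensions(w, h, samples, max_pixels)
    return windows


# A 1000x1000 single-band raster read in 100x100 chunks with a 10_000 px cap.
try:
    old_contract(1000, 1000, 1, 100, 10_000)
except ValueError as exc:
    print("old contract rejects:", exc)

print("new contract plans", len(new_contract(1000, 1000, 1, 100, 10_000)), "chunks")
```

Under the old contract the caller would have had to raise the cap to one million pixels to get past the first check, which is the situation the description calls out.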
brendancol
commented
May 27, 2026
PR Review: Scope max_pixels to the chunk when chunks= is supplied (#2501)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
- `xrspatial/geotiff/_backends/gpu.py:1505-1511` -- the new comment says `max_pixels` "bounds the per-tile decode buffer ... a single tile is the largest contiguous allocation any one task makes." That isn't quite right. The GDS chunk task at `_chunk_task` (gpu.py:1616) allocates `(ch_h, ch_w, samples)` per call, which is the chunk shape, not the tile shape. With `max_pixels=10`, `tile=2`, and `chunks=20` the chunk buffer is 400 pixels, the per-tile check at line 1509 passes (4 < 10), and the old up-front chunk check is gone, so the cap doesn't actually bound the chunk on the GDS path. Either add `_check_dimensions(ch_w, ch_h, samples, max_pixels)` after the chunks are resolved at gpu.py:1551 to mirror the CPU dask path, or rewrite the comment to describe what's actually being bounded (the per-TIFF-tile dimensions, as a hostile-input guard against forged tile widths) rather than implying dask chunks are bounded.
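The arithmetic behind this suggestion can be checked with a few lines. This is only a sketch of the numbers quoted above: `exceeds_cap` is a hypothetical helper, square tiles and chunks are assumed, and a single band is assumed because the 400-pixel figure implies `samples = 1`.

```python
def exceeds_cap(width, height, samples, max_pixels):
    """Hypothetical guard predicate: True when the buffer is over budget."""
    return width * height * samples > max_pixels


max_pixels = 10
tile = 2      # TIFF tile edge length, in pixels
chunk = 20    # dask chunk edge length, in pixels
samples = 1   # assumption: single band

tile_pixels = tile * tile * samples      # buffer size seen by the per-tile check
chunk_pixels = chunk * chunk * samples   # buffer size actually allocated per task

tile_guard_trips = exceeds_cap(tile, tile, samples, max_pixels)
chunk_guard_trips = exceeds_cap(chunk, chunk, samples, max_pixels)

print(tile_pixels, tile_guard_trips)    # 4 False
print(chunk_pixels, chunk_guard_trips)  # 400 True
```

The per-tile check alone lets the read through, while a check against the chunk shape would reject it.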
Nits (optional improvements)
- `xrspatial/geotiff/tests/parity/test_pixel_equality.py:482-501` -- the plan called for multi-band coverage of the `eff_bands` removal, but the new test only uses a single-band fixture. The deleted `eff_bands = (1 if band is not None else (n_bands if n_bands > 0 else 1))` computation isn't directly exercised. Worth adding a 3-band case (or extending `small_multiband_tiff_path` coverage) where `chunks*samples` fits under `max_pixels` but `full_image*samples` does not.
- `xrspatial/geotiff/tests/unit/test_signatures.py:2095-2099` -- both the `read_geotiff_gpu` call and `.compute()` are inside the `pytest.raises` block. The docstring justifies that (either branch can raise depending on KvikIO availability), but the conventional shape is `da = read_geotiff_gpu(...)` above the block, then `with pytest.raises(...): da.compute()` inside. Either form is fine.
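The reason the compound block is defensible can be sketched without the GPU stack. Everything below is a hypothetical stand-in: `LazyRead` and `read_lazy` are not library APIs, `ValueError` is an assumed exception type, and `guard_at_build` simply models "one code path checks while building the graph, the other only at compute time".

```python
class LazyRead:
    """Hypothetical stand-in for a lazy array whose guard fires on compute()."""

    def __init__(self, chunk_pixels, max_pixels):
        self.chunk_pixels = chunk_pixels
        self.max_pixels = max_pixels

    def compute(self):
        if self.chunk_pixels > self.max_pixels:
            raise ValueError("chunk exceeds max_pixels at compute time")
        return self.chunk_pixels


def read_lazy(chunk_pixels, max_pixels, guard_at_build):
    """Hypothetical reader: optionally checks the chunk while building the graph."""
    if guard_at_build and chunk_pixels > max_pixels:
        raise ValueError("chunk exceeds max_pixels at build time")
    return LazyRead(chunk_pixels, max_pixels)


def where_it_raises(chunk_pixels, max_pixels, guard_at_build):
    """Report which stage raised: 'build', 'compute', or 'none'."""
    try:
        lazy = read_lazy(chunk_pixels, max_pixels, guard_at_build)
    except ValueError:
        return "build"
    try:
        lazy.compute()
    except ValueError:
        return "compute"
    return "none"


print(where_it_raises(400, 10, guard_at_build=True))   # build
print(where_it_raises(400, 10, guard_at_build=False))  # compute
```

A test that places the read call above the `pytest.raises` block would error out in the first case, so wrapping both calls is the shape that tolerates either path.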
What looks good
- The CPU dask path change is correct: removing the up-front guard delegates to the per-chunk `_read_to_array` -> `_check_dimensions` enforcement that was already wired through `_delayed_read_window`.
- Docstring updates are precise about what each entry point now bounds.
- Old tests asserting up-front rejection got rewritten to assert the new contract end-to-end (graph builds, compute raises) rather than deleted.
- The `_MAX_DASK_CHUNKS` task-count guard at `_backends/dask.py:547` still prevents a forged multi-billion-pixel image from materialising a graph with billions of tasks, so dropping the full-extent guard does not open a DoS hole here.
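A task-count guard of the kind mentioned in the last point might look roughly like the sketch below. The limit `MAX_TASKS` and both function names are hypothetical; the actual value of `_MAX_DASK_CHUNKS` and the shape of the real check are not shown in this PR.

```python
MAX_TASKS = 1_000_000  # hypothetical limit, not the library's real value


def count_chunks(width, height, chunk):
    """Number of chunk tasks needed to tile a width x height raster."""
    cols = -(-width // chunk)   # integer ceiling division
    rows = -(-height // chunk)
    return cols * rows


def check_task_count(width, height, chunk, max_tasks=MAX_TASKS):
    """Raise before graph construction if the task count is unreasonable."""
    n_tasks = count_chunks(width, height, chunk)
    if n_tasks > max_tasks:
        raise ValueError(f"{n_tasks} chunk tasks exceeds limit {max_tasks}")
    return n_tasks


print(check_task_count(1000, 1000, 256))  # 16

try:
    # Forged header claiming a 2**31 x 2**31 image with small chunks.
    check_task_count(2**31, 2**31, 256)
except ValueError as exc:
    print("rejected:", exc)
```

The point is that this check depends only on header dimensions and the chunk size, so it can run at graph-build time without touching pixel data and without consulting `max_pixels`.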
Checklist
- Algorithm matches reference/paper (no algorithm changes; pure plumbing)
- All implemented backends produce consistent results (CPU dask + GPU dask CPU-fallback now share per-chunk semantics; GPU GDS retains per-tile guard, see suggestion)
- NaN handling is correct (unchanged)
- Edge cases are covered by tests (single-band, multi-chunk; multi-band gap noted as a nit)
- Dask chunk boundaries handled correctly (per-chunk `_read_to_array` already correct)
- No premature materialization or unnecessary copies (removed lines simplify)
- Benchmark exists or is not needed (not needed; no perf delta)
- README feature matrix updated (not applicable; no new functions)
- Docstrings present and accurate
… test (#2501)

* Add `_check_dimensions(ch_w, ch_h, samples, max_pixels)` after the chunk shape is resolved in `_read_geotiff_gpu_chunked_gds`. The prior comment claimed the per-TIFF-tile check bounded chunk decode; it does not, since each `_chunk_task` allocates a chunk-shaped GPU buffer. Now the GPU GDS path matches the CPU dask path's per-chunk semantics. Update the per-tile comment to describe what it actually guards (forged-tile-dim defense).
* Add a multi-band parity test (`test_read_geotiff_dask_max_pixels_chunk_includes_band_count`) that exercises the per-chunk `chunk_h * chunk_w * samples` arithmetic against the 3-band fixture.
* Polish the GDS-path docstring on the GPU chunks max_pixels test to acknowledge the new chunk-extent guard.
brendancol
commented
May 27, 2026
PR Review (follow-up on commit 238cbb2): #2501
Blockers
None.
Suggestions
None -- all three points from the previous review are addressed in this push.
Nits
None.
What looks good
- `_backends/gpu.py:1561` adds the chunk-extent guard exactly where the suggestion landed. The comment at gpu.py:1505-1510 is now accurate about what the per-tile check actually defends (forged tile dimensions, not chunk memory). Net effect: the GPU GDS path matches the CPU dask path's per-chunk semantics, so chunk-scoped `max_pixels` is uniform across CPU dask, GPU GDS, and GPU CPU-fallback.
- `tests/parity/test_pixel_equality.py:505-526` exercises the multi-band arithmetic against the 4x6x3 fixture: chunks=2 gives 12 px per chunk (under cap 20) and a full image of 72 px (above cap), confirming the up-front guard is gone; chunks=4 gives 48 px and trips the per-chunk guard.
- The compound `pytest.raises` block in `test_signatures.py:2095-2099` is left as-is with the docstring updated to reflect that the GDS path now also raises at build time. Either shape was fine in the prior review.
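The multi-band numbers quoted for the 4x6x3 fixture can be reproduced with a short worked example. It assumes the fixture is 4 rows by 6 columns with 3 bands and that edge chunks are clipped to the raster; `chunk_pixel_counts` is a hypothetical helper, not a library function.

```python
def chunk_pixel_counts(height, width, samples, chunk):
    """Return height * width * samples for every chunk window, clipped at the edges."""
    counts = []
    for y in range(0, height, chunk):
        for x in range(0, width, chunk):
            h = min(chunk, height - y)
            w = min(chunk, width - x)
            counts.append(h * w * samples)
    return counts


HEIGHT, WIDTH, SAMPLES, CAP = 4, 6, 3, 20

full_image = HEIGHT * WIDTH * SAMPLES                      # 72, above the cap
small = chunk_pixel_counts(HEIGHT, WIDTH, SAMPLES, 2)      # six chunks of 12
large = chunk_pixel_counts(HEIGHT, WIDTH, SAMPLES, 4)      # [48, 24]

print(full_image > CAP)                 # True: the old up-front guard would reject
print(all(n <= CAP for n in small))     # True: every chunks=2 window fits
print(max(large) > CAP)                 # True: chunks=4 trips the per-chunk guard
```

Note that the band count is part of the product: with a single band, the 2x2 window would be 4 px rather than 12.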
Checklist
- Algorithm matches reference/paper (no algorithm changes)
- All implemented backends produce consistent results (CPU dask + GPU GDS + GPU CPU-fallback all chunk-scoped now)
- NaN handling is correct (unchanged)
- Edge cases are covered by tests (single-band and multi-band per-chunk; eager unchanged)
- Dask chunk boundaries handled correctly
- No premature materialization or unnecessary copies
- Benchmark exists or is not needed (not needed)
- README feature matrix updated (not applicable)
- Docstrings present and accurate
# Conflicts: # xrspatial/geotiff/_backends/dask.py
Summary
- `open_geotiff(path, chunks=N, max_pixels=M)` no longer rejects the read up front when the full image exceeds `max_pixels`. The cap now bounds each chunk's materialised buffer instead of the full lazy region.
- Per-chunk decode already enforces the cap via `_delayed_read_window` -> `_read_to_array` -> `_check_dimensions`. The change drops the redundant full-extent guard in `_backends/dask.py` and the GPU+dask GDS path in `_backends/gpu.py`. The eager (no-chunks) path is unchanged.
- The `max_pixels` docstrings in `open_geotiff`, `read_geotiff_dask`, and `read_geotiff_gpu` describe the new contract.

Backend coverage: numpy (eager, unchanged), dask+numpy (new chunk semantics), dask+cupy (new chunk semantics, GDS fast path and CPU-fallback path). VRT chunked path is out of scope; VRT mosaicing has its own composition rules.
Closes #2501
Test plan
- `pytest xrspatial/geotiff/tests/test_security.py -k max_pixels`
- `pytest xrspatial/geotiff/tests/parity/test_pixel_equality.py -k max_pixels`
- `pytest xrspatial/geotiff/tests/ -k "max_pixels or pixel_safety"` -- 19 passed