open_geotiff: max_pixels should bound the chunk, not the full image, when chunks= is supplied #2501

@brendancol

Describe the bug

open_geotiff(path, chunks=N, max_pixels=M) rejects the read up front when the full image extent exceeds max_pixels, even though dask chunks only materialise one block at a time. The CPU dask reader (_backends/dask.py) and the GPU+dask reader (_backends/gpu.py::_read_geotiff_gpu_chunked) both run _check_dimensions against the full windowed region before any task is scheduled.

That forces callers who want to read a large raster via dask to set max_pixels above the full image size, which also renders the per-chunk safety guard ineffective, since no chunk can exceed a cap set above the full image. The whole point of chunks= is bounded peak memory, so max_pixels should track the chunk size, not the file size.

Expected behavior

When chunks is supplied, max_pixels should apply to each chunk's materialised buffer, not the full image. The eager (no-chunks) path keeps applying max_pixels to the full windowed region.

On a 100x100 single-band TIFF:

  • open_geotiff(path, chunks=10, max_pixels=200) should succeed. Each 10x10 chunk is 100 pixels, under the cap.
  • open_geotiff(path, chunks=20, max_pixels=200) should raise PixelSafetyLimitError when a chunk task runs. 20x20 is 400, over the cap.

Today the first case raises up front because the full extent, 100*100 = 10000 pixels, exceeds 200. Callers have to raise max_pixels to at least 10000 to read the file at all, which then admits any chunk size.
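The arithmetic behind the two cases can be sketched in plain Python. This is only an illustration of the proposed contract, not the library's code: `PixelLimitExceeded`, `check_pixels` and `guard` are hypothetical names, and the sketch assumes the cap is compared against width * height * samples.

```python
class PixelLimitExceeded(ValueError):
    """Hypothetical stand-in for the library's PixelSafetyLimitError."""


def check_pixels(width, height, samples, max_pixels):
    """Raise if a width x height x samples buffer exceeds the cap.

    Assumption: the cap is compared against the plain product.
    """
    total = width * height * samples
    if total > max_pixels:
        raise PixelLimitExceeded(
            f"{total} pixels exceeds max_pixels={max_pixels}"
        )
    return total


def guard(width, height, samples, max_pixels, chunk=None):
    """Proposed contract: bound the chunk if chunk is given, else the full region."""
    if chunk is None:
        # Eager path: the whole windowed region is materialised at once.
        return check_pixels(width, height, samples, max_pixels)
    # Chunked path: the largest buffer is one chunk, clipped to the image.
    return check_pixels(
        min(chunk, width), min(chunk, height), samples, max_pixels
    )


# 100x100 single-band image, max_pixels=200
print(guard(100, 100, 1, 200, chunk=10))  # 10x10 chunk passes the cap

for kwargs in ({"chunk": 20}, {}):
    try:
        guard(100, 100, 1, 200, **kwargs)
    except PixelLimitExceeded as exc:
        print("rejected:", exc)
```

In the real reader the chunked rejection would surface when a chunk task runs rather than at call time; the sketch only shows which buffer size the cap is compared against.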

Additional context

The per-chunk check is already wired: _delayed_read_window forwards max_pixels to _read_to_array, which runs _check_dimensions(out_w, out_h, samples, max_pixels) against the chunk window. The fix is to drop the redundant full-extent guard in the chunked paths and update the docstrings and tests that lock in the old contract.
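A minimal sketch of what "no full-extent guard, per-chunk check only" looks like, using plain callables in place of dask delayed tasks. All names here (`check_dimensions`, `read_window`, `build_tasks`, `PixelLimitExceeded`) are hypothetical stand-ins rather than the module's private helpers, and the placeholder read returns zeros instead of decoding a TIFF.

```python
from functools import partial


class PixelLimitExceeded(ValueError):
    """Hypothetical stand-in for the library's PixelSafetyLimitError."""


def check_dimensions(width, height, samples, max_pixels):
    """Assumed form of the guard: reject buffers larger than max_pixels."""
    if width * height * samples > max_pixels:
        raise PixelLimitExceeded(
            f"{width}x{height}x{samples} exceeds max_pixels={max_pixels}"
        )


def iter_windows(width, height, chunk):
    """Yield (row, col, out_h, out_w) windows, clipping edge chunks."""
    for row in range(0, height, chunk):
        for col in range(0, width, chunk):
            yield row, col, min(chunk, height - row), min(chunk, width - col)


def read_window(window, samples, max_pixels):
    """Per-chunk task: the guard runs here, against the chunk window only."""
    _row, _col, out_h, out_w = window
    check_dimensions(out_w, out_h, samples, max_pixels)
    # Placeholder for the real decode: a zero-filled block of the right shape.
    return [[0] * out_w for _ in range(out_h)]


def build_tasks(width, height, chunk, samples, max_pixels):
    """Build deferred tasks without checking the full extent up front."""
    return [
        partial(read_window, window, samples, max_pixels)
        for window in iter_windows(width, height, chunk)
    ]


# Building the task list never raises; only running an oversized chunk does.
ok_tasks = build_tasks(100, 100, 10, 1, 200)
block = ok_tasks[0]()

bad_tasks = build_tasks(100, 100, 20, 1, 200)
try:
    bad_tasks[0]()
except PixelLimitExceeded as exc:
    print("chunk rejected at run time:", exc)
```

Deferring the check this way keeps task construction cheap and puts the failure at the point where memory would actually be allocated, which matches the expected behavior described for the chunks=20 case.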

Scope: CPU dask path and GPU+dask path. The VRT chunked path is out of scope; VRT mosaicing has its own composition rules.

Labels: bug (Something isn't working), geotiff (GeoTIFF module)