dask+cupy writes auto-dispatch to the GPU writer and materialize the full array, contradicting the streaming contract

The `to_geotiff` docstring says dask-backed DataArrays are "written in streaming mode: one tile-row at a time, without materialising the full array into RAM" (`_writers/eager.py:102-109`). That is not what happens for dask+cupy input.

`_is_gpu_data` detects dask-of-cupy via `_meta` (`_backends/_gpu_helpers.py:23-37`), and the `use_gpu` dispatch at `eager.py:633` and `eager.py:757` runs before the dask streaming branch at `eager.py:918`. As a result, dask+cupy input auto-routes to `_write_geotiff_gpu`, which calls `data.compute()` (`_writers/gpu.py:565`) and materializes the whole array on device. Verified by execution: the write completes eagerly with no warning, and `streaming_buffer_bytes` has no effect on this path.
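To make the ordering problem concrete, here is a minimal stand-in model of the control flow described above. It does not import dask or cupy and is not the library's code: `FakeDeviceArray`, `FakeHostArray`, `FakeLazyArray`, `is_gpu_data`, and `write` are hypothetical names that only mimic the reported behaviour (GPU check first, streaming check second).

```python
# Stand-in model of the reported dispatch order. Every name here is
# hypothetical; nothing below is the library's real implementation.


class FakeDeviceArray:
    """Hypothetical stand-in for a cupy.ndarray."""


class FakeHostArray:
    """Hypothetical stand-in for a numpy.ndarray."""


class FakeLazyArray:
    """Hypothetical stand-in for a dask array carrying a `_meta` sample."""

    def __init__(self, meta):
        self._meta = meta
        self.compute_calls = 0

    def compute(self):
        # Materializes every chunk at once and returns an eager array
        # of the same kind as `_meta`.
        self.compute_calls += 1
        return type(self._meta)()


def is_gpu_data(data):
    # Mimics detecting dask-of-cupy by inspecting `_meta`;
    # eager arrays are checked directly.
    meta = getattr(data, "_meta", data)
    return isinstance(meta, FakeDeviceArray)


def write(data, streaming_buffer_bytes=None):
    # The GPU check runs first, so lazy GPU input never reaches
    # the streaming branch and the buffer size is never consulted.
    if is_gpu_data(data):
        if hasattr(data, "compute"):
            data = data.compute()  # whole array materialized on device
        return "gpu-eager"
    if hasattr(data, "compute"):
        return "streaming"
    return "cpu-eager"


lazy_gpu = FakeLazyArray(FakeDeviceArray())
lazy_cpu = FakeLazyArray(FakeHostArray())

path_gpu = write(lazy_gpu, streaming_buffer_bytes=64 * 2**20)
path_cpu = write(lazy_cpu, streaming_buffer_bytes=64 * 2**20)

print(path_gpu, lazy_gpu.compute_calls)
print(path_cpu, lazy_cpu.compute_calls)
```

In this model the lazy host-backed input takes the streaming branch without a single `compute()` call, while the lazy device-backed input is computed in full regardless of `streaming_buffer_bytes`.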

The no-op is documented only in the private `_write_geotiff_gpu` docstring (`gpu.py:207-210`). The public docs say `streaming_buffer_bytes` is "Only relevant for dask-backed inputs", which is exactly the input type where it gets ignored.

For out-of-core GPU pipelines this defeats the point of chunking. Short term, the public docstring should state that dask+cupy writes materialize on device and that `streaming_buffer_bytes` does not apply. Longer term, the GPU writer could stream per block. This is related to the broken `gpu=False` escape hatch, which is filed separately.
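As a rough illustration of the longer-term idea, the sketch below streams one band of rows at a time instead of computing everything up front. It is plain Python over lists, not a GPU writer: `iter_row_bands`, `stream_write`, and the `compute_rows` callback are hypothetical, with `compute_rows(start, stop)` standing in for whatever would materialize a single band of a dask+cupy array.

```python
# Hypothetical sketch of per-band streaming. Rows are plain lists here;
# a real writer would materialize one band on device and encode it.


def iter_row_bands(height, band_height):
    """Yield (start, stop) row ranges covering [0, height)."""
    for start in range(0, height, band_height):
        yield start, min(start + band_height, height)


def stream_write(compute_rows, height, band_height, sink):
    """Write all rows to `sink`, holding at most one band at a time.

    Returns the largest number of rows resident at once.
    """
    peak_rows = 0
    for start, stop in iter_row_bands(height, band_height):
        band = compute_rows(start, stop)  # only this band is materialized
        peak_rows = max(peak_rows, len(band))
        sink.extend(band)  # stand-in for encoding and writing the band
    return peak_rows


# A 10-row, 4-column "raster" used as the lazy source.
source = [[row * 4 + col for col in range(4)] for row in range(10)]
written = []
peak = stream_write(lambda a, b: source[a:b], height=10, band_height=3, sink=written)

print(peak)
print(written == source)
```

The point of the design is that peak residency is bounded by the band height rather than by the full array, which is the property the streaming contract promises for dask-backed input.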


dask+cupy writes auto-dispatch to the GPU writer and materialize the full array, contradicting the streaming contract #3166
