to_geotiff(pack=True) computes the whole dask graph twice when no nodata sentinel is declared #3235

@brendancol

Describe the bug
_pack() in xrspatial/geotiff/_attrs.py guards the integer restore with an eager NaN check:

elif tgt.kind in ('i', 'u'):
    # ``isnull().any()`` forces a compute on dask; only reached on the
    # error path where no sentinel exists to fill an integer's holes.
    if bool(out.isnull().any()):
        raise ValueError(...)

The comment is wrong about when this runs: the branch fires whenever the packed target dtype is integer and no nodata sentinel is declared, including the normal success path. On dask-backed input, bool(out.isnull().any()) executes the whole upstream graph at to_geotiff(pack=True) call time. The streaming writer then executes it again to produce the pixels.

Measured with a dask pretask counter on a 512x512 int16 source carrying SCALE/OFFSET but no GDAL_NODATA, read via open_geotiff(src, unpack=True, chunks=128) and written via to_geotiff(out, pack=True): 16 chunk-decode tasks for the isnull/any check plus 16 for the write. Every source chunk decodes twice, so a computed pipeline feeding pack=True (say, slope over a mosaic) runs end to end twice.
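The double execution can be reproduced without xrspatial at all. The sketch below is only a model of the situation described above: `decode` is a hypothetical stand-in for a chunk-decode task, the scale/offset values are made up, and a plain call counter replaces the pretask callback used for the real measurement.

```python
import dask
import dask.array as da
import numpy as np

calls = {"decode": 0}

def decode(block):
    # Hypothetical stand-in for one chunk-decode task: count the call,
    # then unpack int16 with an assumed scale (0.1) and offset (5.0).
    calls["decode"] += 1
    return block.astype("float64") * 0.1 + 5.0

# 512x512 int16 source in 128x128 chunks -> a 4x4 grid of 16 chunks.
raw = da.from_array(np.full((512, 512), 7, dtype="int16"), chunks=128)
unpacked = da.map_blocks(
    decode, raw, dtype="float64", meta=np.array((), dtype="float64")
)
# Lazy re-pack back to the integer target dtype (no sentinel involved).
packed = ((unpacked - 5.0) / 0.1).round().astype("int16")

calls["decode"] = 0  # count only work done by the computes below
with dask.config.set(scheduler="synchronous"):
    # Eager guard: bool() on a dask array computes the whole upstream graph.
    has_nan = bool(da.isnan(unpacked).any())
    decodes_for_guard = calls["decode"]
    # The "write": a second, independent compute of the same graph.
    pixels = packed.compute()
    decodes_total = calls["decode"]

print(decodes_for_guard, decodes_total)
```

On this 4x4 chunk grid the counter should show 16 decodes after the guard and 32 after the write, matching the 16 + 16 pattern reported above.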

The sentinel-present path doesn't have this problem: out.fillna(nodata) is lazy.
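For contrast, a rough model of the sentinel-present path, using `da.where` as a stand-in for `out.fillna(nodata)`. The sentinel value -9999 and the `decode` helper are assumptions for illustration only.

```python
import dask
import dask.array as da
import numpy as np

calls = {"decode": 0}

def decode(block):
    # Hypothetical stand-in for a chunk-decode task; only counts calls.
    calls["decode"] += 1
    return block * 1.0

src = np.full((512, 512), 7.0)
src[10, 10] = np.nan  # one hole that the sentinel has to fill
unpacked = da.map_blocks(
    decode, da.from_array(src, chunks=128),
    dtype="float64", meta=np.array((), dtype="float64"),
)

nodata = -9999  # assumed sentinel, playing the role of GDAL_NODATA
calls["decode"] = 0
# Lazy fill: this only extends the graph, nothing is decoded yet.
filled = da.where(da.isnan(unpacked), nodata, unpacked).astype("int16")
decodes_after_fill = calls["decode"]

with dask.config.set(scheduler="synchronous"):
    pixels = filled.compute()  # the single compute done by the write
decodes_after_write = calls["decode"]
```

Here the fill itself triggers no decodes, and each of the 16 chunks is decoded once during the write.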

Expected behavior
Each source chunk computes once per pack round trip. For dask-backed data the NaN guard can run per chunk inside the graph and raise during the write's single compute. The numpy path can keep the eager call-time check.
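One possible shape for the per-chunk guard, sketched with plain dask arrays. This is not the xrspatial implementation or a proposed patch: `guard_and_cast` and `lazy_pack` are hypothetical names, and the error message is invented.

```python
import dask
import dask.array as da
import numpy as np

calls = {"decode": 0}

def decode(block):
    # Hypothetical stand-in for a chunk-decode task; only counts calls.
    calls["decode"] += 1
    return block.astype("float64")

def guard_and_cast(block, target_dtype="int16"):
    # Runs once per chunk inside the graph: check for NaN, then cast.
    if np.isnan(block).any():
        raise ValueError("NaN in data but no nodata sentinel to fill it")
    return block.astype(target_dtype)

def lazy_pack(arr, target_dtype="int16"):
    # Attaches the guard to the graph instead of computing it eagerly.
    return da.map_blocks(
        guard_and_cast, arr, target_dtype=target_dtype,
        dtype=target_dtype, meta=np.array((), dtype=target_dtype),
    )

raw = da.from_array(np.full((512, 512), 7, dtype="int16"), chunks=128)
clean = da.map_blocks(
    decode, raw, dtype="float64", meta=np.array((), dtype="float64")
)
packed = lazy_pack(clean)

calls["decode"] = 0
with dask.config.set(scheduler="synchronous"):
    pixels = packed.compute()  # guard and cast share the write's one compute
decodes_for_write = calls["decode"]

# A hole with no sentinel: building the graph is silent, compute raises.
holes = np.full((512, 512), 7.0)
holes[300, 40] = np.nan
bad = lazy_pack(da.from_array(holes, chunks=128))
try:
    with dask.config.set(scheduler="synchronous"):
        bad.compute()
    raised = False
except ValueError:
    raised = True
```

The trade-off in this sketch is that the error surfaces at write time rather than at call time, which is the behavior the paragraph above asks for on the dask path.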

How to hit it
Any CF-packed integer GeoTIFF with SCALE/OFFSET but no GDAL_NODATA tag, round-tripped via open_geotiff(unpack=True, chunks=...) then to_geotiff(pack=True).

Additional context
Found in performance sweep pass 15 (2026-06-11); the sweep's module verdict is SAFE / IO-bound. Affected backend: the dask (CPU) write path with pack=True. numpy/cupy eager paths are unaffected, and dask+cupy crashes earlier (#3112) so it never reaches this code today.

Labels: bug, geotiff, performance