Describe the bug
_pack() in xrspatial/geotiff/_attrs.py guards the integer restore with an eager NaN check:
    elif tgt.kind in ('i', 'u'):
        # ``isnull().any()`` forces a compute on dask; only reached on the
        # error path where no sentinel exists to fill an integer's holes.
        if bool(out.isnull().any()):
            raise ValueError(...)
The comment is wrong about when this runs: the branch fires whenever the packed target dtype is integer and no nodata sentinel is declared, including the normal success path. On dask-backed input, bool(out.isnull().any()) executes the whole upstream graph at to_geotiff(pack=True) call time. The streaming writer then executes it again to produce the pixels.
Measured with a dask pretask counter on a 512x512 int16 source carrying SCALE/OFFSET but no GDAL_NODATA, read via open_geotiff(src, unpack=True, chunks=128) and written via to_geotiff(out, pack=True): 16 chunk-decode tasks for the isnull/any check plus 16 for the write. Every source chunk decodes twice, so a computed pipeline feeding pack=True (say, slope over a mosaic) runs end to end twice.
The sentinel-present path doesn't have this problem: out.fillna(nodata) is lazy.
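The double execution can be illustrated without xrspatial. The following is a minimal sketch, not the measurement harness used above: `decode`, `SCALE`, and `OFFSET` are hypothetical stand-ins for the upstream chunk decode and the packing attributes, and a plain call counter replaces the dask pretask counter.

```python
import threading

import dask
import dask.array as da
import numpy as np

SCALE, OFFSET = 0.1, 5.0  # hypothetical packing attributes
decode_calls = 0
_lock = threading.Lock()


def decode(block):
    """Stand-in for one upstream chunk decode; counts how often it runs."""
    global decode_calls
    with _lock:
        decode_calls += 1
    return block.astype("float64") * SCALE + OFFSET


src = da.ones((512, 512), chunks=128, dtype="int16")  # 4 x 4 = 16 chunks
# Passing meta explicitly keeps dask from calling decode() during graph construction.
unpacked = src.map_blocks(decode, dtype="float64", meta=np.array((), dtype="float64"))

decode_calls = 0
with dask.config.set(scheduler="synchronous"):
    # Eager guard: the first full pass over the graph.
    has_nan = bool(da.isnan(unpacked).any().compute())
    guard_calls = decode_calls
    # The write: a second full pass over the same graph.
    packed = ((unpacked - OFFSET) / SCALE).round().astype("int16").compute()
    total_calls = decode_calls

print(guard_calls, total_calls)
```

With 16 chunks, the counter should read 16 after the guard and 32 after the write, matching the 16 + 16 task counts reported above.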
Expected behavior
Each source chunk computes once per pack round trip. For dask-backed data the NaN guard can run per chunk inside the graph and raise during the write's single compute. The numpy path can keep the eager call-time check.
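One possible shape for the in-graph guard, sketched on a bare dask array rather than on the actual `_pack()` code: `_raise_on_nan` and `lazy_nan_guard` are hypothetical names, and the error message is a placeholder since the real one is elided above.

```python
import dask.array as da
import numpy as np


def _raise_on_nan(block):
    """Hypothetical per-chunk guard: return the block unchanged, or raise on NaN."""
    if np.isnan(block).any():
        # Placeholder message; the real ValueError text is not shown in this issue.
        raise ValueError("NaN present but no nodata sentinel to fill integer holes")
    return block


def lazy_nan_guard(arr):
    """Attach the guard to a dask array without triggering a compute."""
    meta = np.array((), dtype=arr.dtype)  # explicit meta: no call at graph-build time
    return arr.map_blocks(_raise_on_nan, dtype=arr.dtype, meta=meta)


# Success path: the guard runs once per chunk, inside the write's own compute.
clean = da.ones((512, 512), chunks=128, dtype="float64")
result = lazy_nan_guard(clean).astype("int16").compute()

# Error path: the ValueError surfaces when the graph is computed, not when it is built.
data = np.ones((512, 512), dtype="float64")
data[300, 7] = np.nan
dirty = lazy_nan_guard(da.from_array(data, chunks=128))  # nothing raised yet
try:
    dirty.astype("int16").compute()
    raised = False
except ValueError:
    raised = True
```

The trade-off in this sketch is that the error moves from `to_geotiff(pack=True)` call time to the point where the writer computes the graph, so a partially written output may need cleanup when the guard fires.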
How to hit it
Any CF-packed integer GeoTIFF with SCALE/OFFSET but no GDAL_NODATA tag, round-tripped via open_geotiff(unpack=True, chunks=...) then to_geotiff(pack=True).
Additional context
Found in performance sweep pass 15 (2026-06-11); the sweep's module verdict is SAFE / IO-bound. Affected backend: the dask (CPU) write path with pack=True. numpy/cupy eager paths are unaffected, and dask+cupy crashes earlier (#3112) so it never reaches this code today.