Skip to content

Defer the pack=True NaN guard into the dask graph so chunks compute once#3237

Merged
brendancol merged 2 commits into
mainfrom
issue-3235
Jun 11, 2026
Merged

Defer the pack=True NaN guard into the dask graph so chunks compute once#3237
brendancol merged 2 commits into
mainfrom
issue-3235

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3235.

_pack's no-sentinel integer restore scanned for NaN with an eager bool(out.isnull().any()), which executed the entire upstream dask graph at to_geotiff(pack=True) call time. The streaming writer then executed the graph again for the pixels, so every source chunk computed twice. The guard now maps into the graph per chunk (_pack_guard_no_nan) for dask-backed data and raises from the write's single compute. numpy-backed data keeps the eager call-time check.

Measured on a 512x512 int16 source with SCALE/OFFSET and no GDAL_NODATA, read with unpack=True, chunks=128: decode-task executions drop from 32 to 16 (total task starts 71 to 33), round-trip parity unchanged.

Error timing for the dask NaN-with-no-sentinel case moves from the to_geotiff call to the write's compute; noted in to_geotiff's pack docstring and in _pack. The write stays atomic (temp file plus rename), so a failed pack leaves no partial output at the destination.

Backend coverage: dask (CPU) is the affected path and gets the fix. numpy keeps its eager behavior. The guard itself is backend-agnostic (np.isnan dispatches on the array type, meta= preserves cupy backing) and has a cupy-chunk unit test; full dask+cupy pack round trips remain blocked upstream by #3112.

Also carries the geotiff performance-sweep state update for pass 15 (one CSV row).

Test plan:

  • 9 new tests in tests/write/test_pack_lazy_nan_guard_3235.py: fusion-proof per-chunk execution counter, eager/lazy round-trip parity, dask raise-at-compute, numpy raise-at-call, guard unit tests on numpy and cupy chunks
  • Existing pack suites pass (test_pack_3064, test_pack_band_subset_3161, test_pack_nodata_kwarg_3168, test_mask_and_scale_dtype_parity_3066, test_streaming): 111 passed, 1 skipped, 2 xfailed
  • Full write + attrs suites: 1148 passed, 1 skipped, 2 xfailed
  • GPU-marked tests executed on a CUDA host

…3235)

The no-sentinel integer restore in _pack scanned for NaN with an eager
isnull().any(), executing the whole upstream graph at call time; the
streaming writer then executed it again for the pixels. Map the guard
per chunk into the graph for dask-backed data so it raises from the
write's single compute. numpy keeps the eager call-time check.

Also update the geotiff performance-sweep state for pass 15.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 11, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Defer the pack=True NaN guard into the dask graph so chunks compute once

Blockers (must fix before merge)

  • None found.

Suggestions (should fix, not blocking)

  • tests/write/test_pack_lazy_nan_guard_3235.py:50 -- the execution-counter test pins the tiled .tif path only. A .vrt destination consumes the packed graph through _write_vrt_tiled's delayed tile writes instead of _write_streaming, and nothing covers that the deferred ValueError still surfaces (same message) through _safe_write_tile's collect-and-reraise machinery. One small test writing the NaN no-sentinel dask array to a .vrt path would close that.

Nits (optional improvements)

  • xrspatial/geotiff/_attrs.py:1823 -- the guard is mapped even when out.dtype is non-float, where it can only be a no-op (the (data - offset) / scale arithmetic promotes to float in every reachable case). The chunk-level dtype.kind == 'f' check already makes this safe; this is graph noise in an unreachable case at most. Fine to leave.
  • xrspatial/geotiff/_attrs.py:1707 -- bool(np.isnan(chunk).any()) allocates a chunk-sized bool mask per chunk. np.isnan(np.min(chunk)) would avoid it, but the mask is chunk/8 bytes and the isnan(...).any() form matches the rest of the module. Consistency wins; noting for completeness only.

What looks good

  • The fix lands exactly where the issue points: the eager isnull().any() is gone for dask input, and the measurement in the PR body (decode executions 32 to 16) matches an independent rerun during this review.
  • meta=out.data._meta on the map_blocks call keeps cupy backing intact and avoids the meta-inference probe that would otherwise execute the guard once extra at graph build.
  • The counter test threads a counting identity block through the graph rather than parsing fused task names, so it survives scheduler fusion changes.
  • The error message is byte-identical between the eager and deferred paths, and test_pack_dask_nan_no_sentinel_packed_graph_is_lazy pins that _pack itself stays lazy even when NaN is present.
  • The docstring claim about atomicity was verified against _writer.py (mkstemp + os.replace + unlink on failure) rather than asserted.
  • The sweep CSV change is a single row with LF endings preserved, avoiding the known CRLF whole-file conflict hazard.

Checklist

  • Algorithm matches reference (inverse-scale semantics unchanged; guard placement before round/astype is correct)
  • All implemented backends produce consistent results (numpy eager, dask deferred; cupy guard unit-tested, e2e blocked by #3112 upstream)
  • NaN handling is correct (np.isnan, not == np.nan; float-kind gate)
  • Edge cases covered (clean chunk, NaN chunk, integer chunk, cupy chunk, parity round trip)
  • Dask chunk boundaries handled correctly (elementwise map_blocks, no overlap needed)
  • No premature materialization or unnecessary copies (that is the point of the PR)
  • Benchmark not needed (regression pinned by execution-count test instead)
  • README feature matrix not applicable (no new function, no backend change)
  • Docstrings present and accurate (error-timing change documented on both to_geotiff and _pack)

Review follow-up: the guard's ValueError must come back verbatim
through _write_vrt_tiled's collect-and-reraise path and leave no
tiles dir behind.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 5798562)

The suggestion from the first pass is addressed: test_pack_dask_nan_no_sentinel_raises_through_vrt_write now covers the .vrt destination, asserting the deferred ValueError comes back verbatim through _write_vrt_tiled's collect-and-reraise path and that neither the .vrt file nor the <stem>_tiles dir is left behind (the assertion names match the writer's actual stem + '_tiles' construction at _writers/eager.py:1264). All 10 tests in the file pass.

The two nits stand as dismissed for the reasons given in the first review: the non-float mapping case is unreachable and already handled at chunk level, and the isnan(...).any() form matches the module's prevailing style.

No new findings. Nothing blocking.

@brendancol brendancol merged commit 592f4e9 into main Jun 11, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

to_geotiff(pack=True) computes the whole dask graph twice when no nodata sentinel is declared

1 participant