Rewrite stale per-band SCALE/OFFSET in to_geotiff(pack=True) after band-subset reads by brendancol · Pull Request #3175 · xarray-contrib/xarray-spatial

brendancol · 2026-06-10T13:11:07Z

_pack() kept attrs['gdal_metadata'] verbatim. That's right for a full-file round trip, but after a band-subset unpack=True read the kept metadata still describes the source's band indices, so a pack=True write re-emitted per-band SCALE/OFFSET entries for bands the output file doesn't have. Reading the packed file back raised MixedBandMetadataError, and band=0 applied the wrong band's scale with no error.

The fix: when the array carries unpack state (scale_factor / mask_and_scale_dtype) and the metadata has per-band (SCALE, i) / (OFFSET, i) entries, _pack() replaces them with dataset-level values holding the pair that was actually applied on read. The stale gdal_metadata_xml attr is dropped so the writer rebuilds GDAL_METADATA from the rewritten dict. Dataset-level-only metadata is untouched, including the raw XML. Plain masked reads are also untouched, since they never applied the per-band scale.

The rewrite happens in _pack(), which runs before write dispatch, so numpy and dask writes both get it (mask_and_scale is a CPU eager + dask read feature).

Test plan:

New tests in tests/write/test_pack_band_subset_3161.py: the issue repro (distinct per-band SCALE, band-subset read, pack, re-read full and band=0), per-band OFFSET, selected band without a SCALE entry, full read with uniform per-band scale, dataset-level metadata kept verbatim
Existing pack tests (test_pack_3064.py) still pass
geotiff write/attrs/unit/read/round-trip suites pass locally

…nd-subset reads (#3161)

brendancol

PR Review: Rewrite stale per-band SCALE/OFFSET in to_geotiff(pack=True) after band-subset reads

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

xrspatial/geotiff/_attrs.py:1728-1738 -- the rewrite block only normalizes SCALE/OFFSET. Other per-band items (band descriptions, STATISTICS_*) survive a band-subset pack with their source indices intact, e.g. a single-band packed file can still carry ('STATISTICS_MAXIMUM', 1). They don't affect pixel values, and they can't be re-indexed without knowing which band was read, so leaving them is the right call. Worth one sentence in the comment so the next reader knows it's deliberate rather than missed.
xrspatial/geotiff/tests/write/test_pack_band_subset_3161.py:106 and :130 -- test_pack_band_subset_selected_band_without_scale and test_pack_full_read_uniform_per_band_scale run eager-only while the two tests above them parametrize numpy/dask. The metadata logic is backend-independent, but the parametrize line is cheap and keeps the file symmetric.

What looks good

The trigger is scoped to arrays carrying unpack state (scale_factor / mask_and_scale_dtype), so a plain masked=True read of a scaled file keeps its valid source metadata instead of having it collapsed to identity.
The rewrite builds a new dict rather than mutating attrs['gdal_metadata'] in place. That matters: _finalize_eager_read documents that nested attr values are shared with the caller's seed dict, so in-place mutation would leak into the read-side object.
Dropping gdal_metadata_xml on rewrite is necessary, not incidental: _extract_rich_tags (_attrs.py:1441-1446) prefers the raw XML over the dict, so leaving it would re-emit the stale per-band items anyway.
Verified by hand: a source with dataset-level SCALE=0.5 plus distinct per-band entries keeps the dataset-level value verbatim (it won on read) and drops only the per-band noise; the packed file re-reads to identical values.
Tests hit the exact issue repro (full re-read used to raise MixedBandMetadataError, band=0 used to apply 0.1 instead of 0.2), plus per-band OFFSET, a selected band with no SCALE entry of its own, uniform per-band full reads, and the dataset-level-verbatim guarantee. Tmp names carry the issue number.

Checklist

Algorithm matches reference: the applied (scale, offset) pair is by construction the single pair valid for every band present, and dataset-level entries take precedence in _extract_scale_offset
Backends: rewrite runs in _pack before write dispatch; numpy and dask round trips both tested (mask_and_scale is CPU eager + dask only)
NaN handling unchanged (sentinel fill path untouched)
Edge cases covered (identity scale, offset-only, mixed dataset+per-band)
No dask materialization added (attrs-only logic)
Benchmark not needed (metadata dict work on the write path)
README/docs: no public API change; pack docstring promise now actually holds
Docstrings updated (_pack documents the per-band exception)

…maining pack tests on dask too (#3161)

brendancol

Follow-up review after `9ab38ab`

Both nits from the first pass are addressed:

_attrs.py:1739-1741 now says outright that non-SCALE per-band items (band descriptions, STATISTICS_*) are left alone because they don't affect pixel values and can't be re-indexed without the original band index.
test_pack_band_subset_selected_band_without_scale and test_pack_full_read_uniform_per_band_scale now run on numpy and dask like the rest of the file (22 tests total, all passing locally).

No new findings. The delta is a comment plus two parametrize lines; nothing else changed.

Rewrite stale per-band SCALE/OFFSET in to_geotiff(pack=True) after ba…

dd3452a

…nd-subset reads (#3161)

github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026

brendancol commented Jun 10, 2026

View reviewed changes

Address review nits: note non-SCALE per-band items left as-is, run re…

9ab38ab

…maining pack tests on dask too (#3161)

brendancol commented Jun 10, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into issue-3161

e89f43f

brendancol mentioned this pull request Jun 10, 2026

main is red: strict xfail in test_reproject_streaming_3101 XPASSes after #3111 fixed the #3100 crash #3177

Closed

Merge remote-tracking branch 'origin/main' into issue-3161

cacb429

brendancol merged commit c7a8b64 into main Jun 10, 2026
6 of 7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite stale per-band SCALE/OFFSET in to_geotiff(pack=True) after band-subset reads#3175

Rewrite stale per-band SCALE/OFFSET in to_geotiff(pack=True) after band-subset reads#3175
brendancol merged 4 commits into
mainfrom
issue-3161

brendancol commented Jun 10, 2026

Uh oh!

brendancol left a comment

Uh oh!

brendancol left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented Jun 10, 2026

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

PR Review: Rewrite stale per-band SCALE/OFFSET in to_geotiff(pack=True) after band-subset reads

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

brendancol left a comment

Choose a reason for hiding this comment

Follow-up review after 9ab38ab

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Follow-up review after `9ab38ab`