Skip to content

Place non-georeferenced VRT tiles by explicit pixel offsets (#3116)#3135

Merged
brendancol merged 4 commits into
mainfrom
deep-sweep-metadata-geotiff-2026-06-09
Jun 10, 2026
Merged

Place non-georeferenced VRT tiles by explicit pixel offsets (#3116)#3135
brendancol merged 4 commits into
mainfrom
deep-sweep-metadata-geotiff-2026-06-09

Conversation

@brendancol

@brendancol brendancol commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

to_geotiff(da, 'out.vrt', tile_size=N) on a non-georeferenced array spanning more than one tile wrote a corrupt index. Placement was derived from each tile's GeoTransform, and non-georef tiles all carry the default identity transform, so every DstRect landed at the origin and rasterXSize/rasterYSize shrank to a single tile. Reading the VRT back silently returned one tile's data (a 24x32 input came back as 16x16). Gap left by #2971, whose tests only covered a single non-georef source.

Proposed Changes

  • _write_vrt_tiled records each tile's pixel offsets and passes them through _build_vrt to write_vrt via an internal dst_offsets kwarg when the input has no georeferencing. The dask-backed and plain-ndarray .vrt writes go through the same loop, so both are covered.
  • write_vrt refuses more than one all-non-georef source without explicit placement (identity transforms cannot place the tiles), and rejects dst_offsets alongside georeferenced sources, which place via their GeoTransform. The dst_offsets docstring states that offsets are not checked for overlap or full coverage; direct callers own their layout.
  • Georeferenced placement math is unchanged; the per-source offset computation moved out of the XML loop but produces the same values.
  • Merged origin/main to bring the branch up to date; the only conflict was in .claude/sweep-metadata-state.csv (HEAD geotiff row kept, new mcda row from main included).

Backend coverage: the writer fix is backend-independent; the round-trip test reads the repaired VRT on numpy, cupy, dask+numpy, and dask+cupy.

Test plan:

  • 18 new tests in tests/vrt/test_non_georef_placement_3116.py: 4-backend round trip, dask-backed write, plain-ndarray write, single-tile no-op, XML DstRect assertions, georef placement regression guard, and the write_vrt error contract (ambiguous placement, georef + offsets, length mismatch, malformed pairs).
  • Full VRT suite: 520 passed.
  • Write + round-trip + signature suites: 1292 passed, 1 skipped.
  • flake8 / isort clean on touched files.

Also carries the metadata-sweep state CSV row for the geotiff module (sweep bookkeeping).

to_geotiff(da, 'out.vrt', tile_size=N) on a non-georeferenced array
spanning more than one tile wrote a corrupt index: placement came from
each tile's GeoTransform, and non-georef tiles all carry the default
identity transform, so every DstRect landed at the origin and the
mosaic shrank to one tile. Reading the VRT back silently returned a
single tile's data with wrong dims and coords.

_write_vrt_tiled now records each tile's pixel offsets and passes them
through _build_vrt to write_vrt via an internal dst_offsets kwarg when
the input has no georeferencing. write_vrt refuses more than one
all-non-georef source without explicit placement, and rejects
dst_offsets alongside georeferenced sources (those place via their
GeoTransform). Georeferenced placement math is unchanged.

Also commits the metadata-sweep state row for the geotiff module.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Place non-georeferenced VRT tiles by explicit pixel offsets (#3116)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_vrt.py (dst_offsets docstring): the contract doesn't say what happens when explicit offsets overlap or leave gaps. The only production caller (_write_vrt_tiled) always produces a full non-overlapping cover, but a direct internal caller gets no validation and no documented behaviour. A one-line note ("offsets are not checked for overlap or full coverage") would pin that down.

Nits (optional improvements)

  • tests/vrt/test_non_georef_placement_3116.py:92open(vrt_path).read() leaks the file handle; pathlib.Path(vrt_path).read_text() avoids the ResourceWarning under stricter warning filters.
  • tests/vrt/test_non_georef_placement_3116.py:222 — the hash(str(bad)) & 0xffff filename suffix isn't needed: pytest gives each parametrized case its own tmp_path, so a fixed name can't collide. The hash also varies between runs (string hashing is salted), which makes failure artifacts harder to compare.

What looks good

  • The placement refactor is value-preserving for georeferenced sources: the per-source offset math moved out of the XML loop verbatim, and test_georef_multi_tile_placement_unchanged plus the existing 520-test VRT suite pin it.
  • The fail-closed gate for multiple non-georef sources without offsets matches the _check_no_mixed_georef precedent, and the error message names the recovery path.
  • dst_offsets is rejected alongside georeferenced sources instead of being silently ignored, so the two placement mechanisms can't disagree.
  • The bool/float/negative/arity rejections on offset pairs are each covered by a parametrized test, and the True-as-int case is handled explicitly.
  • Round-trip coverage spans all four reader backends, plus the dask-backed and plain-ndarray write branches, which share the fixed loop.
  • The XML-level test asserts the exact DstRect offsets and mosaic size, so a placement regression can't hide behind a reader that compensates.

Checklist

  • Algorithm matches reference (VRT pixel-space DstRect placement; verified against the emitted XML)
  • All implemented backends produce consistent results (4-backend round-trip test)
  • NaN handling is correct (not applicable; writer-side index placement, no pixel math)
  • Edge cases are covered by tests (single tile, ragged last tile row/column, plain ndarray, dask chunk grid)
  • Dask chunk boundaries handled correctly (offsets follow the chunk grid, not tile_size)
  • No premature materialization or unnecessary copies (metadata-only pass unchanged)
  • Benchmark exists or is not needed (writer bug fix; not needed)
  • README feature matrix updated (not applicable; no new function or backend change)
  • Docstrings present and accurate (modulo the overlap/gap note above)

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after e0da0b1.

All three findings from the first pass are addressed:

  • dst_offsets docstring now states that offsets are not checked for overlap or coverage and that direct callers own their layout (fixed).
  • XML assertion test reads via pathlib.Path.read_text, no leaked handle (fixed).
  • The salted-hash filename suffix in the bad-pair test is gone; tmp_path already isolates parametrized cases (fixed).

Test suite re-run after the changes: 18/18 in test_non_georef_placement_3116.py, flake8 and isort clean. Nothing further from me.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026
test_read_geotiff_dask_handles_vrt_directly builds a 2-source VRT from
non-georeferenced tiles, which the new placement guard now refuses
without dst_offsets. The test only verifies that _read_geotiff_dask
routes .vrt sources to the VRT reader, so spell out the side-by-side
layout its comment already described. Before the guard, this setup
produced the collapsed index #3116 fixed; the test passed only because
it asserts dims alone.
@brendancol

Copy link
Copy Markdown
Contributor Author

@copilot resolve the merge conflicts in this pull request

…eotiff-2026-06-09

# Conflicts:
#	.claude/sweep-metadata-state.csv

Copilot AI commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

@copilot resolve the merge conflicts in this pull request

Merged origin/main into this branch. The only conflict was in .claude/sweep-metadata-state.csv — both sides had modified the geotiff row; I kept the HEAD version (2026-06-09, #3116) and included the new mcda row that main added. Merge commit: d66a228.

@brendancol brendancol merged commit 7d62632 into main Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants