Skip to content

Forward caller chunking to the coregister/auto_reproject output#3236

Merged
brendancol merged 2 commits into
mainfrom
issue-3234
Jun 11, 2026
Merged

Forward caller chunking to the coregister/auto_reproject output#3236
brendancol merged 2 commits into
mainfrom
issue-3234

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3234

  • .xrs.open_geotiff now passes chunk_size to the reproject() call used by coregister=True and auto_reproject=True, so a dask result keeps the caller's chunk layout (or the explicit chunks= kwarg when one is given) instead of reverting to reproject's 512x512 default.
  • Renamed the internal _infer_caller_y_chunk to _infer_caller_chunk(obj, axis) so the x-axis chunk can be inferred too. Single call site, no public API change.
  • Documented the chunking behavior in the DataArray accessor docstring.

Backend coverage: the fix sits in the shared windowed-read path. coregister is CPU-only by design (numpy / dask+numpy); the auto_reproject chunk forwarding also covers dask+cupy callers.

Test plan:

  • test_coregister_dask_template_keeps_caller_chunks: dask template chunked (3, 2) gives output chunksize (3, 2), asymmetric to prove (y, x) ordering
  • test_coregister_explicit_chunks_kwarg: numpy template + chunks=2 gives output chunksize (2, 2)
  • test_auto_reproject_dask_template_keeps_caller_chunks: auto_reproject output follows caller chunks
  • test_open_geotiff_coregister.py, test_open_geotiff_resampling.py, test_accessor.py all pass (90 tests)

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 11, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Forward caller chunking to the coregister/auto_reproject output

Blockers (must fix before merge)

  • xrspatial/accessor.py:323explicit_chunks = kwargs.get('chunks') captures the raw kwarg and forwards it to reproject(chunk_size=...) unconverted. _validate_chunks_arg deliberately accepts np.integer scalars for the read path and coerces them, but _compute_chunk_layout only special-cases builtin int, so chunks=np.int64(2048) with coregister=True now raises TypeError: cannot unpack non-iterable numpy.int64 object. Before this PR that call worked (the read coerced it; reproject never saw it). Coerce integer-likes with int(...) before forwarding, or run the captured value through _validate_chunks_arg(..., allow_none=True).

Suggestions (should fix, not blocking)

  • xrspatial/tests/test_open_geotiff_coregister.py — no test covers a tuple chunks=(3, 2) kwarg flowing through to the reproject output. It is the same forwarding path, but the tuple form is the one _compute_chunk_layout unpacks directly, so a one-line assertion would pin it down.

Nits (optional improvements)

  • xrspatial/accessor.py:980-983 — the new Returns sentence says a dask result chunks "like self", but the windowed read only infers the y-axis chunk (line 328), so a plain read against a template chunked (3, 2) comes back (3, 3). The exact (y, x) match only holds for the reproject output. Scoping the sentence to the reproject step would keep the docstring honest.

What looks good

  • Capturing explicit_chunks before the setdefault is the right move; reading kwargs['chunks'] afterward would have conflated user intent with the inferred value.
  • The asymmetric (3, 2) chunk test pins the (y, x) ordering, which is exactly the kind of thing that silently transposes.
  • Forwarding into both the coregister and auto_reproject branches fixes the same gap in one pass instead of leaving the second branch for a follow-up.
  • None falls through to reproject's existing defaults, so numpy callers without a chunks= kwarg see no behavior change.

Checklist

  • Algorithm matches reference/paper (n/a — chunk plumbing only)
  • All implemented backends produce consistent results (CPU-only coregister guard unchanged; dask+cupy auto_reproject forwards the same value)
  • NaN handling is correct (untouched; nodata_arg path unchanged)
  • Edge cases are covered by tests (np.integer and tuple chunks kwarg are not)
  • Dask chunk boundaries handled correctly (reproject computes per-chunk source windows; chunk size only changes tiling)
  • No premature materialization or unnecessary copies
  • Benchmark exists or is not needed (not needed)
  • README feature matrix updated (n/a)
  • Docstrings present and accurate (modulo the nit above)

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after b4c66f4)

All three findings from the first pass are addressed:

  • Blocker (np.integer chunks crash): fixed. The captured kwarg now goes through _validate_chunks_arg(..., allow_none=True) at xrspatial/accessor.py:325, which coerces np.integer scalars to builtin int before the value reaches _compute_chunk_layout. Verified chunks=np.int64(2) with coregister=True returns chunksize (2, 2) in the new test.
  • Suggestion (tuple chunks coverage): fixed. test_coregister_chunks_kwarg_tuple_and_np_integer covers both the (row, col) tuple form and the np scalar form.
  • Nit (docstring overstatement): fixed. The Returns sentence now scopes the chunk-matching claim to the reproject step.

One side effect worth noting, not a problem: invalid chunks values (e.g. a string) now raise from the accessor before the read starts, instead of from inside the read. The error message is identical because both call the same validator, so callers just see it earlier.

No new issues. 91 tests pass across the coregister, resampling, and accessor suites; flake8 is clean.

@brendancol brendancol merged commit a172608 into main Jun 11, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

coregister=True drops the caller's chunking; reproject output reverts to 512x512

1 participant