accessor: backend-aware .xrs.open_geotiff on DataArray and Dataset#2598

Merged
brendancol merged 4 commits into main from issue-2557 on May 28, 2026
@brendancol

Closes #2557.

Summary

  • Adds da.xrs.open_geotiff(source, *, auto_reproject=False, **kwargs) to the DataArray accessor, mirroring the existing Dataset method.
  • Both accessors now infer the caller's backend (numpy / cupy / dask+numpy / dask+cupy) and pass matching gpu= / chunks= to xrspatial.geotiff.open_geotiff so the returned DataArray matches self. Caller-supplied gpu= / chunks= always override the inference.
  • New auto_reproject flag turns CRS mismatch from "silently wrong window" into either a clear ValueError (default) or a correctly reprojected result that lines up with the caller's CRS (via xrspatial.reproject.reproject).

Backend coverage

numpy / cupy / dask+numpy / dask+cupy. Backend inference reads the caller's y-chunk size via _classify_backend. The reproject path delegates to xrspatial.reproject.reproject, which already supports all four backends.
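
The "caller kwargs override inferred kwargs" rule can be sketched with dict.setdefault. This is a minimal illustration, not the accessor's code: infer_open_kwargs is a hypothetical name, the backend is passed in as a plain string rather than derived from an array, and using the first y-chunk as chunks= is an assumption.

```python
VALID_BACKENDS = {"numpy", "cupy", "dask+numpy", "dask+cupy"}


def infer_open_kwargs(backend, y_chunks=None, **user_kwargs):
    """Fill gpu= / chunks= from the caller's backend without clobbering user input.

    backend  : one of VALID_BACKENDS (hypothetical string labels).
    y_chunks : the caller's chunk sizes along y, e.g. (256, 256, 128).
    """
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    kwargs = dict(user_kwargs)  # copy so the caller's dict is never mutated
    if "cupy" in backend:
        # setdefault only writes when the key is absent, so an explicit gpu= wins
        kwargs.setdefault("gpu", True)
    if backend.startswith("dask") and y_chunks:
        # Assumption: reuse the first y-chunk size as the chunks= value
        kwargs.setdefault("chunks", int(y_chunks[0]))
    return kwargs


inferred = infer_open_kwargs("dask+cupy", y_chunks=(256, 256))
overridden = infer_open_kwargs("dask+numpy", y_chunks=(256,), chunks=512, name="elev")
```

Here inferred carries both gpu and chunks, while overridden keeps the explicit chunks=512 and forwards name= untouched.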

Test plan

  • DataArray accessor windowed read returns the expected slice
  • DataArray accessor raises on missing y/x coords
  • kwargs (e.g. name=) are forwarded to open_geotiff
  • numpy caller -> numpy result
  • dask caller -> dask result with inferred y-chunk size from self.chunks
  • Explicit chunks= overrides inferred chunks
  • Dataset accessor infers backend from the first 2D y/x data variable
  • CRS mismatch raises ValueError by default
  • auto_reproject=True returns a DataArray in the caller's CRS
  • Caller without attrs['crs'] skips the mismatch check (backward compatible)
  • Dataset CRS falls back from ds.attrs['crs'] when the variable lacks one
  • Existing xrspatial/geotiff/tests/integration/test_dask_pipeline.py suite still passes (72 tests, no regressions)
  • Existing xrspatial/tests/test_accessor.py suite still passes (29 tests)

…2557)

Add .xrs.open_geotiff(source, *, auto_reproject=False, **kwargs) to
the DataArray accessor (mirroring the existing Dataset method) and
enhance both accessors to:

- Infer the caller's backend (numpy / cupy / dask+numpy / dask+cupy)
  via xrspatial.utils._classify_backend and pass matching gpu= /
  chunks= to xrspatial.geotiff.open_geotiff so the returned
  DataArray matches the caller. Caller-supplied gpu= / chunks=
  always win.

- Detect CRS mismatch between caller (attrs['crs']) and file
  (read via _read_geo_info). Default behaviour now raises a clear
  ValueError pointing at auto_reproject=True; previously the
  windowing code silently used the wrong bbox. With
  auto_reproject=True, the caller bbox is projected to the file CRS
  for the windowed read and the result is reprojected back to the
  caller's CRS via xrspatial.reproject.reproject.

Both accessor methods share a module-private
_open_geotiff_windowed helper to keep the behaviour identical. The
Dataset method picks a representative 2D y/x data variable for
backend/chunks inference (matching the pattern in to_geotiff) and
falls back to ds.attrs['crs'] when the variable lacks it.
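
The decision flow described in this commit message can be sketched as a small pure function. This is an illustration under assumptions: plan_windowed_read and its return labels are hypothetical, CRS values are compared as plain EPSG ints (as in this first commit), the error wording is invented, and treating a file with no CRS as "nothing to compare" is a guess rather than something the PR states.

```python
def plan_windowed_read(caller_crs, file_crs, auto_reproject=False):
    """Decide how a windowed read should proceed given the two CRS values.

    Returns "read" when the caller bbox can be used as-is, or
    "read+reproject" when the bbox must be projected to the file CRS and
    the result reprojected back. Raises ValueError on an unhandled mismatch.
    """
    if caller_crs is None or file_crs is None:
        # No CRS on one side: skip the check (backward-compatible path)
        return "read"
    if caller_crs == file_crs:
        return "read"
    if not auto_reproject:
        # Hypothetical message; the point is that it names the flag
        raise ValueError(
            f"CRS mismatch: caller EPSG:{caller_crs} vs file EPSG:{file_crs}; "
            "pass auto_reproject=True to reproject the result"
        )
    return "read+reproject"
```

Keeping the decision separate from the I/O makes each branch testable without a GeoTIFF on disk.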

brendancol left a comment

PR Review: accessor: backend-aware .xrs.open_geotiff on DataArray and Dataset

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/accessor.py:132-138 (and the matching int(caller_crs) / int(file_crs) calls at 142-143, 150-152, 186-187): the CRS comparison assumes attrs['crs'] is int-convertible. Non-int values ("EPSG:4326", raw WKT, pyproj.CRS) make int(caller_crs) raise with a confusing ValueError before the mismatch logic even runs. The chained case is the worst version: after auto_reproject=True, xrspatial.reproject.reproject sets result.attrs['crs'] to a WKT string (visible in test_auto_reproject_returns_caller_crs), so da.xrs.open_geotiff(...).xrs.open_geotiff(...) breaks on the second call. Normalize both sides through pyproj.CRS(...) and compare with .equals(...), or reuse xrspatial.geotiff._crs._resolve_crs_to_wkt for both.

  • xrspatial/accessor.py:153-159: the bbox-to-file-CRS projection samples only the 4 corners. For projections with curvature across the bbox (high latitudes, large extents, extents near a pole), the envelope of the projected corners can be a strict subset of the true projected footprint, so the windowed read misses data along the curved edges. At minimum add a docstring caveat; better, sample several points along each edge before taking min/max (e.g. 20 points per side via np.linspace) so the window covers the full footprint.

  • xrspatial/accessor.py:124-125: np.asarray(obj.coords['y'].values) is a double conversion. .values already returns a numpy array. Use obj.coords['y'].values directly (matches the existing Dataset accessor style at line 1094 of the pre-change file).
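
The corner-versus-perimeter point in the second suggestion can be shown with a toy transform. This is a sketch, not the accessor's code: bbox_edge_samples and projected_bounds are illustrative names, and polar_standin is a made-up north-polar mapping used only because it bends straight lon/lat edges; a real implementation would call a proper CRS transformer instead.

```python
import numpy as np


def bbox_edge_samples(xmin, ymin, xmax, ymax, n_per_side=20):
    """Return points along all four bbox edges (corners included)."""
    xs = np.linspace(xmin, xmax, n_per_side)
    ys = np.linspace(ymin, ymax, n_per_side)
    px = np.concatenate([xs, xs, np.full(n_per_side, xmin), np.full(n_per_side, xmax)])
    py = np.concatenate([np.full(n_per_side, ymin), np.full(n_per_side, ymax), ys, ys])
    return px, py


def polar_standin(lon_deg, lat_deg):
    """Toy polar mapping (assumption): radius grows away from the pole."""
    r = 90.0 - lat_deg
    theta = np.radians(lon_deg)
    return r * np.sin(theta), -r * np.cos(theta)


def projected_bounds(px, py, transform):
    """Transform the points and return (xmin, ymin, xmax, ymax) of the result."""
    tx, ty = transform(np.asarray(px, dtype=float), np.asarray(py, dtype=float))
    return float(tx.min()), float(ty.min()), float(tx.max()), float(ty.max())


bbox = (-30.0, 60.0, 30.0, 80.0)  # lon/lat window at high latitude
corner_x = [bbox[0], bbox[2], bbox[0], bbox[2]]
corner_y = [bbox[1], bbox[1], bbox[3], bbox[3]]

corner_bounds = projected_bounds(corner_x, corner_y, polar_standin)
edge_bounds = projected_bounds(*bbox_edge_samples(*bbox), polar_standin)
```

With this mapping the southern edge bulges outward between the corners, so edge_bounds reaches noticeably further in y than corner_bounds, which is the data a corners-only window would miss.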

Nits (optional improvements)

  • xrspatial/geotiff/tests/integration/test_dask_pipeline.py:1166-1188 (test_auto_reproject_returns_caller_crs): result.coords['y'].max() > 1e5 just confirms the y values look like mercator metres rather than degrees, not that the reprojection is numerically correct. Compare the caller's bbox to (result.coords['x'].min/max, result.coords['y'].min/max) within a small tolerance so a future regression in the reprojection direction gets caught.

  • No GPU test path (gated on has_cuda_and_cupy) for backend inference. The existing accessor tests follow the same pattern, so this is consistent with the file, but the main new behaviour in this PR is backend matching, and the cupy / dask+cupy branches at accessor.py:173-178 are currently unexercised. A gated cupy test (e.g. assert that a cupy caller gets back a cupy-backed result) would close the gap.

  • The _open_geotiff_windowed docstring at accessor.py:108-114 doesn't mention the half-pixel extent expansion or the EPSG-int assumption. Both surface as surprises (the mismatch error message names EPSG; a user with a WKT string in attrs['crs'] will be confused).
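
The first nit (magnitude check versus bbox comparison) can be illustrated with plain coordinate arrays. This is a sketch with made-up numbers; bounds_of and bounds_close are hypothetical helper names, not functions from the test suite.

```python
import numpy as np


def bounds_of(x, y):
    """Return (xmin, ymin, xmax, ymax) for 1D coordinate arrays."""
    return float(np.min(x)), float(np.min(y)), float(np.max(x)), float(np.max(y))


def bounds_close(a, b, tol):
    """True when every edge of bbox `a` is within `tol` of bbox `b`."""
    return all(abs(p - q) <= tol for p, q in zip(a, b))


pixel = 1000.0  # illustrative pixel size in metres
template_x = np.arange(4) * pixel
template_y = 5_000_000.0 + np.arange(4) * pixel
expected = bounds_of(template_x, template_y)

# A result shifted by half a pixel is acceptable under a one-pixel tolerance
good = bounds_of(template_x + 0.5 * pixel, template_y - 0.5 * pixel)

# A result displaced by 2000 km still "looks like metres"
wrong = bounds_of(template_x, template_y - 2_000_000.0)
magnitude_check_passes_on_wrong = wrong[3] > 1e5
```

The magnitude check accepts the displaced result, while bounds_close(wrong, expected, pixel) rejects it and bounds_close(good, expected, pixel) still passes.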

What looks good

  • Helper extraction is clean. Dataset and DataArray paths share _open_geotiff_windowed so behaviour is identical in both directions.
  • API hygiene is right: caller kwargs always win over inferred backend kwargs (kwargs.setdefault at 174 and 178), and the new flag has a default that preserves prior behaviour on the matching-CRS path.
  • Default CRS-mismatch error message names the flag (accessor.py:141-146), so callers find the fix without re-reading the docs.
  • Tests cover backend inference, the chunks= override, the no-attrs['crs'] backward-compat path, the Dataset CRS-attr fallback, and the mismatch raise.
  • CHANGELOG entry under #### Added calls out the behaviour change from silently-wrong-window to ValueError, which is the only user-visible regression risk.

Checklist

  • Algorithm matches reference/paper (n/a -- no numerical algorithm here, just windowing + reproject delegation)
  • All implemented backends produce consistent results (numpy/dask only; cupy/dask+cupy paths untested)
  • NaN handling is correct (n/a for this PR; coords are expected to be finite)
  • Edge cases are covered by tests (no-coords, no-CRS, kwargs override, mismatch raise, mismatch + reproject)
  • Dask chunk boundaries handled correctly (chunks inferred from caller's y-axis chunks)
  • No premature materialization or unnecessary copies
  • Benchmark exists or is not needed (accessor-only, delegates to existing benchmarked function)
  • README feature matrix updated (n/a -- accessor enhancement, not a new spatial op)
  • Docstrings present and accurate

…2557)

Review fixes for PR #2598:

- _open_geotiff_windowed: normalize both caller and file CRS through
  pyproj.CRS and compare with .equals() instead of int(crs). Non-int
  attrs['crs'] values (WKT, "EPSG:xxxx", pyproj.CRS) now work, which
  also fixes chained reads: xrspatial.reproject sets attrs['crs'] to
  a WKT string, so the previous int() coercion broke
  da.xrs.open_geotiff(...).xrs.open_geotiff(...) on the second call.

- Bbox transformation now samples the perimeter (20 points per side)
  before taking min/max in the file CRS, so the windowed read covers
  the full footprint when the transform has curvature across the
  bbox (high latitudes, large extents).

- Drop redundant np.asarray() wrap on coord .values arrays.

- Expand _open_geotiff_windowed docstring to call out the half-pixel
  extent expansion and the CRS normalization behaviour.

- Strengthen test_auto_reproject_returns_caller_crs to compare bbox
  bounds within tolerance rather than a coarse magnitude check.

- Add test_chained_open_geotiff_with_wkt_crs: regression for the
  WKT-string attrs['crs'] case (the chained-read failure).

- Add TestOpenGeotiffGPUBackendInference_2557 with a cuda+cupy-gated
  test covering the cupy backend-inference branch.

brendancol left a comment

PR Review: follow-up after review fixes (#2598)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/accessor.py:107-118 (_to_pyproj_crs): the bare except Exception swallows every parse failure and returns None. The mismatch logic then treats None as "no CRS to compare" and skips the check, so a malformed but non-None attrs['crs'] (e.g. "EPSG:99999", a typo in a WKT) silently disables the safety net and reads with the caller's bbox assumed to be in the file CRS. Either narrow the except clause to the specific pyproj exceptions, or surface the parse failure as a ValueError at the same severity as a real mismatch. Today's behaviour is strictly less safe than the previous int() coercion, which would at least have raised on garbage.

Nits (optional improvements)

  • xrspatial/accessor.py:121-139 (_bbox_edge_samples): n_per_side=20 is a magic constant with no override path. Fine for now (the bbox transform is cheap), but worth a comment near the call site at accessor.py:202 explaining the choice, since 20 will surprise the next reader.

  • xrspatial/geotiff/tests/integration/test_dask_pipeline.py:1264-1265 (test_chained_open_geotiff_with_wkt_crs): the final assertion assert result.shape == (4, 4) or result.shape[0] >= 4 is permissive enough to pass on almost anything. The half-pixel expansion can grow the result by one row, so result.shape[0] >= 4 alone is the real check; the == (4, 4) half is misleading. Drop the OR or assert a tight upper bound (e.g. 4 <= result.shape[0] <= 6).
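
The second nit can be checked in isolation: the OR form reduces to the row-count check alone, so it accepts shapes far from the expected one. A small sketch, with permissive and bounded as hypothetical names and the 4..6 bound taken from the example in the nit:

```python
def permissive(shape):
    """The assertion as written: equality OR a lower bound on rows."""
    return shape == (4, 4) or shape[0] >= 4


def bounded(shape, lo=4, hi=6):
    """Tighter variant: the row count must fall inside [lo, hi]."""
    return lo <= shape[0] <= hi


far_off = (400, 1)       # nothing like a 4x4 window
one_extra_row = (5, 4)   # plausible after half-pixel expansion
```

permissive(far_off) is true because (4, 4) already satisfies shape[0] >= 4, which makes the equality half redundant; bounded(far_off) is false while bounded(one_extra_row) remains true.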

What looks good

  • All three Suggestions and all three Nits from the first review pass are addressed.
  • _to_pyproj_crs + pyproj.CRS.equals is the right comparison primitive. It handles int / "EPSG:xxxx" / WKT / pyproj.CRS symmetrically, and test_chained_open_geotiff_with_wkt_crs exercises the chained WKT case directly.
  • Perimeter sampling at accessor.py:197-207 correctly drops the int() coercions and routes both CRSs through pyproj objects for Transformer.from_crs. The 20-point-per-side sampling closes the curvature gap for the worst-realistic case (a large bbox at high latitudes).
  • The strengthened test_auto_reproject_returns_caller_crs now compares result bbox to template bbox within a one-pixel tolerance, which catches regressions in projection direction. Good replacement for the prior magnitude check.
  • The new gated TestOpenGeotiffGPUBackendInference_2557::test_cupy_caller_returns_cupy covers the gpu=True branch (accessor.py:221-222) that was previously unexercised. Local runs skip cleanly when CUDA / cupy is absent.

Checklist

  • Findings from the first review pass are addressed
  • CRS comparison handles int / EPSG-string / WKT / pyproj.CRS uniformly
  • bbox transform covers curved-edge cases
  • Tests strengthened with bbox tolerance and chained-WKT regression
  • GPU backend-inference test added (gated)
  • Existing tests still pass (74 in test_dask_pipeline.py)
  • _to_pyproj_crs silently returns None on parse failure (see Suggestion)

Round 2 review fixes for PR #2598:

- _to_pyproj_crs: replace bare 'except Exception' with a targeted
  pyproj CRSError catch that re-raises as ValueError, so a malformed
  attrs['crs'] (e.g. typo'd EPSG, garbage WKT) surfaces a clear
  error instead of silently disabling the mismatch safety net by
  returning None.

- Add inline comment at the bbox-transform call site explaining the
  20-points-per-side perimeter sampling choice.

- Tighten the chained-WKT test's auto_reproject assertion to bound
  the result shape against the file's native pixel resolution
  rather than the permissive '== (4, 4) or >= 4' check.

- Add test_malformed_crs_raises covering the new "garbage CRS
  raises ValueError" path.
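
The first fix above (catch the specific parse error, re-raise as ValueError, keep None meaning "no CRS") can be sketched without any CRS library. Everything here is a stand-in: CRSParseError plays the role of the library's parse exception, parse_crs is a toy parser that only knows two EPSG codes, and normalize_crs is a hypothetical name rather than the accessor's helper.

```python
class CRSParseError(Exception):
    """Hypothetical stand-in for a CRS library's parse error."""


KNOWN_EPSG = {4326, 3857}  # toy registry, for illustration only


def parse_crs(value):
    """Toy parser: accepts an int or an 'EPSG:<code>' string."""
    if isinstance(value, int):
        code = value
    elif isinstance(value, str) and value.upper().startswith("EPSG:"):
        try:
            code = int(value[5:])
        except ValueError:
            raise CRSParseError(f"bad EPSG code in {value!r}") from None
    else:
        raise CRSParseError(f"unsupported CRS input: {value!r}")
    if code not in KNOWN_EPSG:
        raise CRSParseError(f"unknown EPSG code: {code}")
    return code


def normalize_crs(value):
    """Return a canonical CRS, None when absent, ValueError when malformed."""
    if value is None:
        return None  # genuinely missing: the caller may skip the mismatch check
    try:
        return parse_crs(value)
    except CRSParseError as exc:
        # Only the parse error is caught; unrelated bugs still propagate
        raise ValueError(f"could not parse CRS {value!r}: {exc}") from exc
```

The distinction that matters is that only a truly absent CRS yields None, so a typo such as "EPSG:99999" can no longer disable the mismatch check.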

github-actions bot added the performance label (PR touches performance-sensitive code) on May 28, 2026
brendancol merged commit c32f551 into main on May 28, 2026
7 checks passed
Development

Successfully merging this pull request may close these issues.

accessor: add DataArray-side .xrs.open_geotiff with backend + CRS inference