accessor: backend-aware .xrs.open_geotiff on DataArray and Dataset#2598
Conversation
…2557) Add .xrs.open_geotiff(source, *, auto_reproject=False, **kwargs) to the DataArray accessor (mirroring the existing Dataset method) and enhance both accessors to: - Infer the caller's backend (numpy / cupy / dask+numpy / dask+cupy) via xrspatial.utils._classify_backend and pass matching gpu= / chunks= to xrspatial.geotiff.open_geotiff so the returned DataArray matches the caller. Caller-supplied gpu= / chunks= always win. - Detect CRS mismatch between caller (attrs['crs']) and file (read via _read_geo_info). Default behaviour now raises a clear ValueError pointing at auto_reproject=True; previously the windowing code silently used the wrong bbox. With auto_reproject=True, the caller bbox is projected to the file CRS for the windowed read and the result is reprojected back to the caller's CRS via xrspatial.reproject.reproject. Both accessor methods share a module-private _open_geotiff_windowed helper to keep the behaviour identical. The Dataset method picks a representative 2D y/x data variable for backend/chunks inference (matching the pattern in to_geotiff) and falls back to ds.attrs['crs'] when the variable lacks it.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: accessor: backend-aware .xrs.open_geotiff on DataArray and Dataset
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
-
xrspatial/accessor.py:132-138(and the matchingint(caller_crs)/int(file_crs)calls at 142-143, 150-152, 186-187): the CRS comparison assumesattrs['crs']is int-convertible. Non-int values ("EPSG:4326", raw WKT,pyproj.CRS) makeint(caller_crs)raise with a confusingValueErrorbefore the mismatch logic even runs. The chained case is the worst version: afterauto_reproject=True,xrspatial.reproject.reprojectsetsresult.attrs['crs']to a WKT string (visible intest_auto_reproject_returns_caller_crs), soda.xrs.open_geotiff(...).xrs.open_geotiff(...)breaks on the second call. Normalize both sides throughpyproj.CRS(...)and compare with.equals(...), or reusexrspatial.geotiff._crs._resolve_crs_to_wktfor both. -
xrspatial/accessor.py:153-159: the bbox-to-file-CRS projection samples only the 4 corners. For projections with curvature across the bbox (high latitudes, large extents, anything pre/post-pole), the projected corner envelope is a strict subset of the true bbox and the windowed read misses data along the curved edges. At minimum add a docstring caveat; better, sample several points along each edge before taking min/max (e.g. 20 points per side vianp.linspace) so the window covers the full footprint. -
xrspatial/accessor.py:124-125:np.asarray(obj.coords['y'].values)is a double conversion..valuesalready returns a numpy array. Useobj.coords['y'].valuesdirectly (matches the existing Dataset accessor style at line 1094 of the pre-change file).
Nits (optional improvements)
-
xrspatial/geotiff/tests/integration/test_dask_pipeline.py:1166-1188(test_auto_reproject_returns_caller_crs):result.coords['y'].max() > 1e5just confirms the y values look like mercator metres rather than degrees, not that the reprojection is numerically correct. Compare the caller's bbox to(result.coords['x'].min/max, result.coords['y'].min/max)within a small tolerance so a future regression in the reprojection direction gets caught. -
No GPU test path (gated on
has_cuda_and_cupy) for backend inference. The existing accessor tests follow the same pattern, so this is consistent with the file, but the main new behaviour in this PR is backend matching, and the cupy / dask+cupy branches ataccessor.py:173-178are currently unexercised. A gated cupy test (e.g. assert that a cupy caller gets back a cupy-backed result) would close the gap. -
The
_open_geotiff_windoweddocstring ataccessor.py:108-114doesn't mention the half-pixel extent expansion or the EPSG-int assumption. Both surface as surprises (the mismatch error message names EPSG; a user with a WKT string inattrs['crs']will be confused).
What looks good
- Helper extraction is clean. Dataset and DataArray paths share
_open_geotiff_windowedso behaviour is identical in both directions. - API hygiene is right: caller kwargs always win over inferred backend kwargs (
kwargs.setdefaultat 174 and 178), and the new flag has a default that preserves prior behaviour on the matching-CRS path. - Default CRS-mismatch error message names the flag (
accessor.py:141-146), so callers find the fix without re-reading the docs. - Tests cover backend inference, the
chunks=override, the no-attrs['crs']backward-compat path, the Dataset CRS-attr fallback, and the mismatch raise. - CHANGELOG entry under
#### Addedcalls out the behaviour change from silently-wrong-window to ValueError, which is the only user-visible regression risk.
Checklist
- Algorithm matches reference/paper (n/a -- no numerical algorithm here, just windowing + reproject delegation)
- All implemented backends produce consistent results (numpy/dask only; cupy/dask+cupy paths untested)
- NaN handling is correct (n/a for this PR; coords are expected to be finite)
- Edge cases are covered by tests (no-coords, no-CRS, kwargs override, mismatch raise, mismatch + reproject)
- Dask chunk boundaries handled correctly (chunks inferred from caller's y-axis chunks)
- No premature materialization or unnecessary copies
- Benchmark exists or is not needed (accessor-only, delegates to existing benchmarked function)
- README feature matrix updated (n/a -- accessor enhancement, not a new spatial op)
- Docstrings present and accurate
…2557) Review fixes for PR #2598: - _open_geotiff_windowed: normalize both caller and file CRS through pyproj.CRS and compare with .equals() instead of int(crs). Non-int attrs['crs'] values (WKT, "EPSG:xxxx", pyproj.CRS) now work, which also fixes chained reads: xrspatial.reproject sets attrs['crs'] to a WKT string, so the previous int() coercion broke da.xrs.open_geotiff(...).xrs.open_geotiff(...) on the second call. - Bbox transformation now samples the perimeter (20 points per side) before taking min/max in the file CRS, so the windowed read covers the full footprint when the transform has curvature across the bbox (high latitudes, large extents). - Drop redundant np.asarray() wrap on coord .values arrays. - Expand _open_geotiff_windowed docstring to call out the half-pixel extent expansion and the CRS normalization behaviour. - Strengthen test_auto_reproject_returns_caller_crs to compare bbox bounds within tolerance rather than a coarse magnitude check. - Add test_chained_open_geotiff_with_wkt_crs: regression for the WKT-string attrs['crs'] case (the chained-read failure). - Add TestOpenGeotiffGPUBackendInference_2557 with a cuda+cupy-gated test covering the cupy backend-inference branch.
brendancol
left a comment
There was a problem hiding this comment.
PR Review: follow-up after review fixes (#2598)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
-
xrspatial/accessor.py:107-118(_to_pyproj_crs): the bareexcept Exceptionswallows every parse failure and returnsNone. The mismatch logic then treatsNoneas "no CRS to compare" and skips the check, so a malformed but non-Noneattrs['crs'](e.g."EPSG:99999", a typo in a WKT) silently disables the safety net and reads with the caller's bbox assumed to be in the file CRS. Either narrow the except clause to the specific pyproj exceptions, or surface the parse failure as aValueErrorat the same severity as a real mismatch. Today's behaviour is strictly less safe than the previous int() coercion, which would at least have raised on garbage.
Nits (optional improvements)
-
xrspatial/accessor.py:121-139(_bbox_edge_samples):n_per_side=20is a magic constant with no override path. Fine for now (the bbox transform is cheap), but worth a comment near the call site ataccessor.py:202explaining the choice, since 20 will surprise the next reader. -
xrspatial/geotiff/tests/integration/test_dask_pipeline.py:1264-1265(test_chained_open_geotiff_with_wkt_crs): the final assertionassert result.shape == (4, 4) or result.shape[0] >= 4is permissive enough to pass on almost anything. The half-pixel expansion can grow the result by one row, soresult.shape[0] >= 4alone is the real check; the== (4, 4)half is misleading. Drop the OR or assert a tight upper bound (e.g.4 <= result.shape[0] <= 6).
What looks good
- All three Suggestions and all three Nits from the first review pass are addressed.
_to_pyproj_crs+pyproj.CRS.equalsis the right comparison primitive. It handles int /"EPSG:xxxx"/ WKT /pyproj.CRSsymmetrically, andtest_chained_open_geotiff_with_wkt_crsexercises the chained WKT case directly.- Perimeter sampling at
accessor.py:197-207correctly drops the int() coercions and routes both CRSs through pyproj objects forTransformer.from_crs. The 20-point-per-side sampling closes the curvature gap for the worst-realistic case (a large bbox at high latitudes). - The strengthened
test_auto_reproject_returns_caller_crsnow compares result bbox to template bbox within a one-pixel tolerance, which catches regressions in projection direction. Good replacement for the prior magnitude check. - The new gated
TestOpenGeotiffGPUBackendInference_2557::test_cupy_caller_returns_cupycovers thegpu=Truebranch (accessor.py:221-222) that was previously unexercised. Local runs skip cleanly when CUDA / cupy is absent.
Checklist
- Findings from the first review pass are addressed
- CRS comparison handles int / EPSG-string / WKT / pyproj.CRS uniformly
- bbox transform covers curved-edge cases
- Tests strengthened with bbox tolerance and chained-WKT regression
- GPU backend-inference test added (gated)
- Existing tests still pass (74 in test_dask_pipeline.py)
-
_to_pyproj_crssilently returns None on parse failure (see Suggestion)
Round 2 review fixes for PR #2598: - _to_pyproj_crs: replace bare 'except Exception' with a targeted pyproj CRSError catch that re-raises as ValueError, so a malformed attrs['crs'] (e.g. typo'd EPSG, garbage WKT) surfaces a clear error instead of silently disabling the mismatch safety net by returning None. - Add inline comment at the bbox-transform call site explaining the 20-points-per-side perimeter sampling choice. - Tighten the chained-WKT test's auto_reproject assertion to bound the result shape against the file's native pixel resolution rather than the permissive '== (4, 4) or >= 4' check. - Add test_malformed_crs_raises covering the new "garbage CRS raises ValueError" path.
Closes #2557.
Summary
da.xrs.open_geotiff(source, *, auto_reproject=False, **kwargs)to the DataArray accessor, mirroring the existing Dataset method.gpu=/chunks=toxrspatial.geotiff.open_geotiffso the returned DataArray matchesself. Caller-suppliedgpu=/chunks=always override the inference.auto_reprojectflag turns CRS mismatch from "silently wrong window" into either a clearValueError(default) or a correctly reprojected result that lines up with the caller's CRS (viaxrspatial.reproject.reproject).Backend coverage
numpy / cupy / dask+numpy / dask+cupy. Backend inference reads the caller's y-chunk size via
_classify_backend. The reproject path delegates toxrspatial.reproject.reproject, which already supports all four backends.Test plan
name=) are forwarded toopen_geotiffself.chunkschunks=overrides inferred chunksValueErrorby defaultauto_reproject=Truereturns a DataArray in the caller's CRSattrs['crs']skips the mismatch check (backward compatible)ds.attrs['crs']when the variable lacks onexrspatial/geotiff/tests/integration/test_dask_pipeline.pysuite still passes (72 tests, no regressions)xrspatial/tests/test_accessor.pysuite still passes (29 tests)