Skip to content

accessor: add DataArray-side .xrs.open_geotiff with backend + CRS inference #2557

@brendancol

Description

@brendancol

Reason or Problem

ds.xrs.open_geotiff(source) exists on the Dataset accessor (xrspatial/accessor.py:1068) and reads only the pixel window covered by the Dataset's y/x coords. Three gaps:

  1. There is no DataArray counterpart. da.xrs.to_geotiff(...) is defined but da.xrs.open_geotiff(...) is not, which is inconsistent and forces a Dataset wrap to use the windowed-read shortcut.
  2. Neither accessor adapts the read to the caller's backend. A dask+cupy DataArray that calls the accessor receives an eager numpy DataArray back, which then has to be rechunked and moved to GPU manually.
  3. The windowing path silently assumes the caller's coords are in the file's CRS. If they are not, the computed window is wrong and the read returns the wrong region with no error.

Proposal

Add da.xrs.open_geotiff(source, **kwargs) to the DataArray accessor and enhance both accessors to:

  • Infer the caller's backend (numpy / cupy / dask+numpy / dask+cupy) via _classify_backend and pass matching gpu= / chunks= to xrspatial.geotiff.open_geotiff so the returned DataArray's backend matches self. Caller-supplied gpu= / chunks= always win.
  • On CRS mismatch between caller and file, raise a clear ValueError by default (replacing today's silently-wrong window). With auto_reproject=True, project the caller's bbox to the file's CRS for the windowed read, then reproject the read result back to the caller's CRS via xrspatial.reproject.reproject so the returned DataArray lines up with self.

Design:

  • A new private helper in xrspatial/accessor.py (e.g. _open_geotiff_windowed(obj, source, *, auto_reproject, **kwargs)) holds the shared body. Both accessor methods delegate to it.
  • Backend inference uses _classify_backend(obj) from xrspatial/utils.py:345. Mapping:
    • numpy -> no override
    • cupy -> gpu=True
    • dask+numpy -> chunks=<caller's y-chunk size>
    • dask+cupy -> gpu=True, chunks=<caller's y-chunk size>
  • CRS comes from attrs['crs'] (int EPSG, the convention established by the geotiff reader). File CRS comes from _read_geo_info(source).crs_epsg.
  • The reproject path uses pyproj.Transformer to convert the caller's bbox to the file CRS for windowing, then xrspatial.reproject.reproject(read_da, target_crs=caller_crs, source_crs=file_crs) for the final result. reproject already supports all four backends, so backend match is preserved.
  • The new flag is auto_reproject: bool = False. Matches the allow_* / auto_* style already used in open_geotiff.

Usage:

import xrspatial  # registers .xrs accessor
import xarray as xr

# DataArray-side, backend inferred from caller
da = some_dask_cupy_dataarray  # EPSG:4326
out = da.xrs.open_geotiff("aoi.tif")           # raises if file CRS != 4326
out = da.xrs.open_geotiff("aoi.tif", auto_reproject=True)  # returns dask+cupy in EPSG:4326

# Dataset-side keeps existing behavior; gains the same flag
out = ds.xrs.open_geotiff("aoi.tif", auto_reproject=True)

Value:

  • Closes the symmetry gap between da.xrs.to_geotiff and da.xrs.open_geotiff.
  • Makes the windowed-read shortcut backend-correct by default, removing manual rechunk plus .map_blocks(cupy.asarray) boilerplate at every call site.
  • Turns today's silent CRS mismatch (wrong window, no error) into either a loud error or a correct, reprojected result, controlled by a single flag.

Stakeholders and Impacts

Affects xrspatial/accessor.py only (new helper plus two accessor methods). Uses existing public utilities (_classify_backend, _read_geo_info, _extent_to_window, xrspatial.reproject.reproject). No changes to xrspatial/geotiff/ or xrspatial/reproject/. I will implement.

Drawbacks

  • Default behavior on CRS mismatch changes from "silently wrong window" to "raises ValueError". This is a behavior change for any caller relying on the wrong-but-silent behavior; the upside is that the new error message points directly at auto_reproject=True.
  • Adds an import-time dependency on pyproj.Transformer inside the accessor body. The rest of the repo already depends on pyproj via the reproject module, so this is a no-op for dependency scope.

Alternatives

  • Leave the DataArray accessor empty and document the Dataset wrap workaround. Rejected: keeps the asymmetry and the silent-CRS-bug.
  • Always reproject on mismatch with no flag. Rejected: surprising for callers who explicitly want native-CRS data.

Unresolved Questions

None.

Metadata

Metadata

Assignees

No one assigned

    Labels

    apiAPI design and consistencybackend-coverageAdding missing dask/cupy/dask+cupy backend supportenhancementNew feature or requestgeotiffGeoTIFF module

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions