Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .claude/sweep-api-consistency-state.csv
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@ module,last_inspected,issue,severity_max,categories_found,notes
geotiff,2026-05-18,2106,MEDIUM,3,"Sweep 2026-05-18 (deep-sweep-api-consistency-geotiff-2026-05-18-1779164255). 1 MEDIUM Cat 3 finding fixed in this branch: open_geotiff(max_cloud_bytes=...) was the only kwarg on the public reader/writer surface without a Python type annotation. Docstring already declared ``int or None``; the surface and the docs disagreed. Fix adds ``int | None`` to the annotation; default stays the module-internal _MAX_CLOUD_BYTES_SENTINEL. Regression test in test_open_geotiff_max_cloud_bytes_annot_2106.py pins the immediate gap and parametrises over every public reader/writer to catch future ungenerated annotations. Prior sweep findings (#1922/#1935 kwarg ordering, #2052 mask_nodata parity, #2097 GPU MinIsWhite, #2095 zero-band 3D writes, #1946 write_vrt path/vrt_path shim) all confirmed fixed. Cross-sibling return-type drift (Cat 2): write_vrt returns str while to_geotiff and write_geotiff_gpu return path which is str | BinaryIO -- inspected and still LOW (callers do not substitute writers; the return-type drift is documented in each writer's docstring). Cross-cutting cross-module drift (chunk_size in reproject vs chunks in geotiff; target_crs vs crs) documented but not filed per sweep template (cross-cutting). cuda-validated."
polygonize,2026-05-19,2148,HIGH,1;3,"Sweep 2026-05-19 (deep-sweep-api-consistency-polygonize-2026-05-19). 1 MEDIUM Cat 3 finding fixed in this branch (#2148): polygonize() was the only public vector/raster conversion function without a return type annotation. Sieve/contours/rasterize/clip_polygon all declare one. Fix adds a Union return annotation (numpy tuple | awkward tuple | geopandas GeoDataFrame | spatialpandas GeoDataFrame | geojson dict) using TYPE_CHECKING forward refs for optional deps, and expands the docstring Returns section to enumerate the per-return_type shapes. 1 HIGH Cat 1 finding NOT fixed in this PR -- cross-module rename: polygonize uses `connectivity` (int 4|8) while sieve uses `neighborhood` (int 4|8) for the identical rook/queen pixel-connectivity concept. Industry convention (GDAL, rasterio.features.sieve) favours `connectivity`; the deprecation shim belongs in sieve.py, not polygonize, so this is out of scope for the polygonize-scoped sweep branch. Documented here for the next sieve sweep pass. 1 LOW Cat 1 cross-cutting: polygonize/sieve/clip_polygon use `raster` while contours and many older modules use `agg` for the input DataArray -- library-wide drift, not filed per-module per sweep template. Cat 2 return-shape: polygonize returns tuple/GeoDataFrame/dict by return_type; consistent with contours' tuple/GeoDataFrame dispatch. No Cat 4 (no mutable defaults; connectivity=4 default matches sieve neighborhood=4 default). No Cat 5 (polygonize re-exported in xrspatial/__init__.py; no orphan API; no __all__ but consistent with module convention). cuda-validated: cupy backend accepts identical kwargs, smoke-tested with cupy DataArray on host with CUDA_AVAILABLE."
rasterize,2026-05-21,2250,MEDIUM,3,"Sweep 2026-05-21 (deep-sweep-api-consistency-rasterize-2026-05-21). 1 MEDIUM Cat 3 finding fixed in this branch (#2250): rasterize() was missing type annotations on geometries, columns, and merge (3 of 16 public params); the other 13 plus the return type were annotated. The docstring already declared the intended types so this was a doc-vs-signature drift. Fix annotates geometries: Any (because the accepted GeoDataFrame / dask_geopandas / iterable union spans optional deps), columns: Optional[Sequence[str]], merge: Union[str, Callable]. Regression test in test_rasterize_signature_annot_2250.py pins every param + the return annotation so a future contributor can't silently drop annotations again. Cross-module drift documented but not filed per template: clip_polygon(nodata) vs rasterize(fill) same concept different name; clip_polygon(name: Optional[str]=None) vs rasterize(name: str='rasterize') default convention; polygonize(column_name) vs rasterize(column) column selector. No Cat 1 in-module rename, no Cat 2 return drift (returns xr.DataArray as documented), no Cat 4 mutable defaults, no Cat 5 orphan API (rasterize is the only public symbol from the module and is re-exported in __init__). cuda-validated: cupy backend accepts identical kwargs, smoke-tested with use_cuda=True on host with CUDA_AVAILABLE."
reproject,2026-05-10,1570,HIGH,2;5,"Filed cross-module attrs['vertical_crs'] type collision (string vs EPSG int) vs xrspatial.geotiff. Fixed in PR (TBD): reproject now writes EPSG int and preserves friendly token under vertical_datum. MEDIUM kwarg-order drift (transform_precision vs chunk_size) and missing type hints vs geotiff documented but not fixed (cosmetic, kwarg-only)."
reproject,2026-05-29,2613,MEDIUM,1,"Sweep 2026-05-29 (deep-sweep-api-consistency-reproject-2026-05-29). 1 MEDIUM Cat 1 finding fixed in this branch (#2613, PR #2626): reproject() spelled the source/target concept two ways in one signature -- source_crs/target_crs (full words) for horizontal CRS but src_vertical_crs/tgt_vertical_crs (abbreviated) for the vertical datum. Renamed the vertical kwargs to source_vertical_crs/target_vertical_crs with a deprecation shim: old names still accepted, emit DeprecationWarning, and passing both old+new for one side raises TypeError. Docstring updated; existing vertical-shift tests migrated to new names; added back-compat + conflict tests. Verified on numpy AND cupy entry points (shared signature; backend dispatch is internal). Other findings documented but NOT filed per template: (LOW Cat 1) itrf_transform(src=/tgt=) uses abbreviated keyword-only names for ITRF frame names vs source_crs/target_crs elsewhere -- separate function family (frames, not CRS), left as-is. (LOW cross-cutting Cat 1) first-arg `raster` (reproject)/`rasters` (merge) vs `agg` in terrain modules -- library-wide drift, not per-module. Prior #1570 vertical_crs EPSG-int collision confirmed still fixed. No Cat 2 return drift (reproject/merge both return DataArray as documented; geoid_height scalar/array and itrf_transform tuple are distinct families). No Cat 4 default drift (resampling/transform_precision/chunk_size/bounds_policy/model defaults consistent across siblings). No Cat 5 orphan API (itrf_frames is list_frames aliased in __all__; vertical/itrf funcs namespaced under xrspatial.reproject like geotiff's funcs). cuda-validated: CUDA_AVAILABLE=True on this host."
resample,2026-05-27,2544,MEDIUM,3,"Sweep 2026-05-27 (deep-sweep-api-consistency-resample-2026-05-27). 1 MEDIUM Cat 3 finding fixed in this branch (#2544): resample() was the only public symbol in xrspatial.resample without type annotations on any parameter or return; siblings slope/aspect/hillshade/curvature all annotate `agg: xr.DataArray` and `-> xr.DataArray`. Fix adds annotations matching the docstring (agg: xr.DataArray; scale_factor / target_resolution: float | tuple[float, float] | None; method: str; nodata: float | None; name: str) and a `-> xr.DataArray` return type, plus a docstring note that the @supports_dataset decorator accepts Dataset too. Regression test test_resample_signature_annot_2544.py pins every param and the return annotation. Other findings documented but not filed per template: (MEDIUM Cat 1 cross-module) `method` (resample) vs `resampling` (reproject/merge) -- same conceptual parameter, different name, cross-cutting rename, needs design issue. (LOW Cat 1 cross-cutting) first-arg `agg` (resample/slope/aspect/...) vs `raster` (reproject/rasterize/polygonize/sieve) -- library-wide drift, not per-module. (LOW Cat 5) ALL_METHODS imported by tests but not in __all__ (module has no __all__); borderline orphan but used for test parametrisation only. No Cat 2 (returns xr.DataArray as documented). No Cat 4 mutable defaults. resample is exported in xrspatial/__init__.py. cuda-validated: cupy backend smoke-tested with nearest, bilinear, and average on host with CUDA_AVAILABLE=True."
zonal,2026-05-27,2521,HIGH,1;3;5,"Sweep 2026-05-27 (deep-sweep-api-consistency-zonal-2026-05-27). 1 HIGH Cat 1 finding fixed in this branch (#2521): crop() used zones_ids while stats/crosstab use zone_ids -- pure typo creating a TypeError trap when switching between sibling zonal functions. Fix accepts both, deprecates zones_ids with DeprecationWarning, raises if both supplied, raises if neither. All call sites in tests migrated to canonical zone_ids; legacy zones_ids paths covered by new regression tests. Other findings not fixed in this PR: (HIGH Cat 1+4) nodata vs nodata_values drift across stats/crosstab (nodata_values=None) vs apply/hypsometric_integral (nodata=0) -- different name AND different default, breaks substitutability; cross-function scope, needs a design issue. (MEDIUM Cat 3) crosstab docstring says 'layer: int, default=0' but signature is 'Optional[int] = None'. (MEDIUM Cat 3) hypsometric_integral lacks all type annotations; apply and crop lack return type annotations (siblings have them). (MEDIUM Cat 5) get_full_extent has public-style docstring with 'from xrspatial.zonal import get_full_extent' example but is not in __init__.py -- borderline orphan, but minor utility. (LOW Cat 3) apply() docstring mixes 'values' parameter name with 'agg' prose; example returns np.array shape (not DataArray) while function actually returns a DataArray. Cross-cutting: zones/raster as first-arg name varies (zonal.stats uses zones; zonal.regions/trim use raster). Regions/trim are single-array operations on the zone raster itself, so the rename arguably matches the role. Documented, not filed. cuda-validated: CUDA_AVAILABLE=True on this host."
76 changes: 61 additions & 15 deletions xrspatial/reproject/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,34 @@
'ellipsoidal': 4979, # WGS 84 (3D, ellipsoidal height)
}

# Sentinel marking the deprecated ``src_vertical_crs`` / ``tgt_vertical_crs``
# kwargs as "not passed". Distinct from None so we can tell an explicit
# ``src_vertical_crs=None`` apart from the default and only warn when the
# caller actually used the old name.
_DEPRECATED = object()


def _resolve_deprecated_vertical_kwarg(old_name, old_val, new_name, new_val):
"""Map a deprecated vertical-CRS kwarg onto its renamed replacement.

Emits a ``DeprecationWarning`` when the old name is used and rejects
passing both the old and new spellings at once.
"""
if old_val is _DEPRECATED:
return new_val
import warnings
warnings.warn(
f"reproject(): {old_name!r} is deprecated, use {new_name!r} instead.",
DeprecationWarning,
stacklevel=3,
)
if new_val is not None:
raise TypeError(
f"reproject(): pass either {new_name!r} or the deprecated "
f"{old_name!r}, not both."
)
return old_val


def _find_spatial_dims(raster):
"""Find the y and x dimension names, handling multi-band rasters.
Expand Down Expand Up @@ -653,9 +681,11 @@ def reproject(
chunk_size=None,
name=None,
max_memory=None,
src_vertical_crs=None,
tgt_vertical_crs=None,
source_vertical_crs=None,
target_vertical_crs=None,
bounds_policy="auto",
src_vertical_crs=_DEPRECATED,
tgt_vertical_crs=_DEPRECATED,
):
"""Reproject a raster DataArray to a new coordinate reference system.

Expand Down Expand Up @@ -702,16 +732,22 @@ def reproject(
``'512MB'``. Controls how many output tiles are processed
in parallel for large-dataset streaming mode. Default None
uses 1GB. Has no effect for small datasets that fit in memory.
src_vertical_crs : str or None
source_vertical_crs : str or None
Source vertical datum for height values. One of:

- ``'EGM96'`` -- orthometric heights relative to EGM96 geoid (MSL)
- ``'EGM2008'`` -- orthometric heights relative to EGM2008 geoid
- ``'ellipsoidal'`` -- heights relative to the WGS84 ellipsoid
- ``None`` -- no vertical transformation (default)
tgt_vertical_crs : str or None
Target vertical datum. Same options as *src_vertical_crs*.
target_vertical_crs : str or None
Target vertical datum. Same options as *source_vertical_crs*.
Both must be set to trigger a vertical transformation.
src_vertical_crs : str or None
Deprecated alias for *source_vertical_crs*. Passing it emits a
``DeprecationWarning``.
tgt_vertical_crs : str or None
Deprecated alias for *target_vertical_crs*. Passing it emits a
``DeprecationWarning``.
bounds_policy : {"auto", "raw", "clamp", "percentile"}, default "auto"
How to derive the output extent from the source extent when
``bounds`` is not supplied. Only relevant when projecting near a
Expand Down Expand Up @@ -745,13 +781,13 @@ def reproject(
-------
xr.DataArray
The output ``attrs['crs']`` is in WKT format.
Whenever *tgt_vertical_crs* is set, ``attrs['vertical_crs']``
Whenever *target_vertical_crs* is set, ``attrs['vertical_crs']``
records the target vertical datum's EPSG code (5773 for EGM96,
3855 for EGM2008, 4979 for ellipsoidal WGS84) to match the
convention used by ``xrspatial.geotiff``. The friendly string
token (``'EGM96'`` etc.) is preserved under ``attrs['vertical_datum']``.
Both attrs are written even when no shift is applied (e.g. when
*src_vertical_crs* equals *tgt_vertical_crs*, or when only the
*source_vertical_crs* equals *target_vertical_crs*, or when only the
target is given), so the output's vertical reference is always
explicit.

Expand Down Expand Up @@ -786,6 +822,16 @@ def reproject(
>>> result.attrs['crs'].startswith(('PROJCRS', 'PROJCS'))
True
"""
# Back-compat shim for the old abbreviated kwarg names. These were
# renamed to source_vertical_crs / target_vertical_crs to match the
# source_crs / target_crs spelling used by the rest of the signature.
source_vertical_crs = _resolve_deprecated_vertical_kwarg(
'src_vertical_crs', src_vertical_crs,
'source_vertical_crs', source_vertical_crs)
target_vertical_crs = _resolve_deprecated_vertical_kwarg(
'tgt_vertical_crs', tgt_vertical_crs,
'target_vertical_crs', target_vertical_crs)

_validate_raster(raster, func_name='reproject', name='raster',
ndim=(2, 3))

Expand All @@ -812,8 +858,8 @@ def reproject(

# Reject unknown vertical-datum tokens at the API boundary so we never
# write None into attrs['vertical_crs'] for typos / unsupported values.
for _name, _val in (('src_vertical_crs', src_vertical_crs),
('tgt_vertical_crs', tgt_vertical_crs)):
for _name, _val in (('source_vertical_crs', source_vertical_crs),
('target_vertical_crs', target_vertical_crs)):
if _val is not None and _val not in _VERTICAL_DATUM_EPSG:
raise ValueError(
f"Unknown {_name}={_val!r}; expected one of "
Expand Down Expand Up @@ -970,11 +1016,11 @@ def reproject(
)

# Vertical datum transformation (if requested)
if src_vertical_crs is not None and tgt_vertical_crs is not None:
if src_vertical_crs != tgt_vertical_crs:
if source_vertical_crs is not None and target_vertical_crs is not None:
if source_vertical_crs != target_vertical_crs:
result_data, nd = _apply_vertical_shift(
result_data, y_coords, x_coords,
src_vertical_crs, tgt_vertical_crs, nd,
source_vertical_crs, target_vertical_crs, nd,
tgt_crs_wkt=tgt_wkt,
)

Expand Down Expand Up @@ -1009,13 +1055,13 @@ def reproject(
except TypeError:
n_entries = 1
out_attrs['nodatavals'] = tuple(nd for _ in range(n_entries))
if tgt_vertical_crs is not None:
if target_vertical_crs is not None:
# Align with xrspatial.geotiff: attrs['vertical_crs'] holds the
# EPSG integer code. The friendly string token is preserved under
# attrs['vertical_datum'] so the human-readable name is not lost.
# See GH issue #1570.
out_attrs['vertical_crs'] = _VERTICAL_DATUM_EPSG.get(tgt_vertical_crs)
out_attrs['vertical_datum'] = tgt_vertical_crs
out_attrs['vertical_crs'] = _VERTICAL_DATUM_EPSG.get(target_vertical_crs)
out_attrs['vertical_datum'] = target_vertical_crs

# Handle multi-band output (3D result from multi-band source)
if result_data.ndim == 3:
Expand Down
Loading
Loading