Skip to content

reproject: account for datum shifts in output bounds estimation#2655

Merged
brendancol merged 4 commits into
mainfrom
issue-2649
May 29, 2026
Merged

reproject: account for datum shifts in output bounds estimation#2655
brendancol merged 4 commits into
mainfrom
issue-2649

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #2649

What

  • transform_points() ran its Numba projection kernels in WGS84 and skipped datum shifts. The per-pixel data path (try_numba_transform) applies a Helmert shift for the same CRS pairs, so output bounds estimated without the shift disagreed with the reprojected data by the shift magnitude. For NAD27 (EPSG:4267) that is tens to over a hundred metres in CONUS, which is many pixels on a high-resolution raster.
  • Fix: detect datum-shift pairs via _get_datum_params and bail to pyproj, which applies the shift. The fast path is unchanged for WGS84/NAD83 pairs that need no shift.

Backend coverage

transform_points is the shared CRS-pair dispatcher used by the bounds estimator across all backends (numpy / cupy / dask+numpy / dask+cupy). The change is backend-independent: it routes datum-shift pairs through pyproj before any per-backend array work, so all four paths get correct bounds.

Test plan

  • transform_points returns None (bails to pyproj) for NAD27 <-> Web Mercator, both directions
  • Fast path is preserved for WGS84 and NAD83 (no shift), matching pyproj to 1e-3 m
  • _transform_boundary matches pyproj for NAD27 (was off by ~45 m in x before)
  • End-to-end _compute_output_grid bounds for a high-resolution NAD27 raster sit on the datum-shifted corners, not the unshifted ones
  • Full reproject suite green (500 passed)

Dedupe duplicate module rows (last-write-wins by last_inspected) and
collapse multi-line notes to single physical lines. The notes had
embedded newlines, which the merge=union .gitattributes strategy splits
record-by-record, corrupting the file into a 156-column phantom row on
parallel-agent appends. One line per record keeps union merges safe.
transform_points() ran its Numba projection kernels in WGS84 and skipped
datum shifts, but the per-pixel data path applies a Helmert shift for the
same CRS pairs. Output bounds estimated without the shift disagreed with
the reprojected data by the shift magnitude -- tens to over a hundred
metres for NAD27 (EPSG:4267) in CONUS, which is many pixels on a
high-resolution raster.

Detect datum-shift pairs via _get_datum_params and bail to pyproj, which
applies the shift correctly. The fast path is unchanged for WGS84/NAD83
pairs that need no shift.

Add TestDatumShiftBounds covering the fast-path bail, the no-shift fast
path, boundary agreement with pyproj, and end-to-end grid bounds for a
high-resolution NAD27 raster.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: reproject: account for datum shifts in output bounds estimation

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • xrspatial/reproject/_projections.py:2128: the bail also fires when source and target share the same datum-shift datum (e.g. NAD27 to a NAD27-based projected CRS), where the net shift cancels and the fast path would have been correct anyway. A few extra pairs go through pyproj for no accuracy gain. This is the safe default and the affected pairs are rare, so leaving it is fine.

What looks good

  • Root cause is correct. The fast path skipped the datum shift while the per-pixel data path (try_numba_transform) applies it, so the bounds and the data disagreed. The fix keys on the same _get_datum_params predicate the data path uses, so the two now agree by construction.
  • The second bounds caller at __init__.py:1796 already has a pyproj fallback right after the transform_points call, so it picks up the fix without changes.
  • Tests cover the bail, the preserved no-shift fast path, boundary agreement with pyproj, and end-to-end grid bounds for a high-resolution NAD27 raster, with a guard against regressing to the unshifted bounds.

Checklist

  • Algorithm matches reference (pyproj is the reference; fast path now defers to it for datum shifts)
  • Backend-independent change; no per-backend code paths
  • NaN handling unchanged (pyproj fallback already filters non-finite)
  • Edge cases covered (both directions, no-shift pairs, high-resolution grid)
  • No dask chunk boundary concerns (bounds estimation, not per-pixel)
  • No premature materialization or extra copies
  • Benchmark not needed (bounds estimation runs on a handful of points)
  • README feature matrix not applicable (no new function, no backend change)
  • Docstring updated to explain the datum-shift bail

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 89be27b)

The one nit from the previous pass is addressed: added a comment at xrspatial/reproject/_projections.py:2137 noting the bail is intentionally conservative for same-datum pairs (where the shift cancels), trading a rare fast-path miss for guaranteed correctness. Comment-only change, no logic touched.

No new findings. Datum-shift tests still pass (4/4). Nothing blocking.

@brendancol brendancol merged commit a6dd1c5 into main May 29, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject: output bounds estimation skips datum shifts, wrong bounds for NAD27 and high-res rasters

1 participant