Skip to content

reproject: disable numba fast path for non-WGS84 datums#2657

Merged
brendancol merged 3 commits into
mainfrom
issue-2651
May 29, 2026
Merged

reproject: disable numba fast path for non-WGS84 datums#2657
brendancol merged 3 commits into
mainfrom
issue-2651

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #2651

The numba JIT fast path in the reproject module could silently corrupt coordinates when either CRS used a non-WGS84 datum.

  • The projection kernels run in WGS84. The datum-shift wrapper applied a geocentric Helmert shift to the kernel output as if it were lon/lat in degrees. For a projected source CRS (e.g. OSGB36 / EPSG:27700) the output is easting/northing in metres, so the shift produced nonsense (easting ~5 where pyproj returns ~408701 metres).
  • A non-WGS84 target datum was never handled, so the WGS84 kernels skipped the required target-side shift.

The fix gates try_numba_transform to return None whenever either side has a non-WGS84 datum, so pyproj handles those transforms. The now-dead datum-shift wrapper and the orphaned _apply_datum_shift_inv helper are removed.

Backend coverage: the gate lives in the shared coordinate-transform dispatch used by both the numpy and dask paths. The CUDA path has no datum handling and is unaffected.

Test plan:

  • New regression tests: fast path returns None for projected non-WGS source (EPSG:27700), geographic non-WGS source (EPSG:4267), and non-WGS target; WGS84 UTM path still uses the fast path.
  • End-to-end test confirms per-pixel source coords for an EPSG:27700 target match pyproj to mm.
  • Full reproject suite passes (385 tests), plus reproject coverage and lite_crs suites.

Dedupe duplicate module rows (last-write-wins by last_inspected) and
collapse multi-line notes to single physical lines. The notes had
embedded newlines, which the merge=union .gitattributes strategy splits
record-by-record, corrupting the file into a 156-column phantom row on
parallel-agent appends. One line per record keeps union merges safe.
The numba projection kernels run in WGS84. The datum-shift wrapper
applied a degree-based Helmert shift to the kernel output, which is
wrong when the source CRS is projected (the output is easting/northing
in metres, not lon/lat degrees) and never handled a non-WGS84 target
datum at all. Reprojecting from EPSG:27700 returned eastings near 5
where pyproj returns ~408701 metres.

Gate try_numba_transform to return None whenever either side uses a
non-WGS84 datum so pyproj handles those transforms. Remove the now-dead
datum-shift wrapper and the orphaned _apply_datum_shift_inv helper.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: reproject: disable numba fast path for non-WGS84 datums

I confirmed the bug independently before reviewing. Reprojecting EPSG:27700 to EPSG:4326 through the old numba path returned an easting near 5 where pyproj returns ~408701 metres. The diagnosis holds: the old wrapper applied a degree-based Helmert shift to whatever the kernel returned, which is projected easting/northing for a projected source, and it never handled a non-WGS84 target datum. Gating the fast path to fall back to pyproj is the right call.

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • The grid-shift machinery in xrspatial/reproject/_datum_grids.py is now dead production code. After this change its only callers are tests (xrspatial/tests/test_reproject.py:4705). Nothing in the runtime path reaches find_grid_for_point, get_grid, or apply_grid_shift_inverse anymore. Removing it is out of scope for a P0 correctness fix, so a follow-up issue to delete the unused module is cleaner than expanding this PR.

Nits (optional improvements)

  • _helmert7_inv (xrspatial/reproject/_projections.py) is now unused after _apply_datum_shift_inv was removed. _apply_datum_shift_fwd was already dead before this PR (defined, never called). Both could go in the same follow-up that drops _datum_grids.

What looks good

  • The gate sits in the shared try_numba_transform, so both the numpy and dask chunk paths get the fix. The CUDA path has no datum handling, so it is not affected.
  • transform_points is correctly left alone. It is used for output-bounds estimation and polygonize vertices, not per-pixel resampling, and its docstring already documents the omitted datum shift as a sub-pixel approximation. It never carried the corrupting wrapper.
  • Tests cover projected non-WGS source, geographic non-WGS source, and non-WGS target, plus a positive control that the WGS84 UTM path still uses the fast path, and an end-to-end check that EPSG:27700 source coords match pyproj to mm.
  • Removing the dead _apply_datum_shift_inv helper that the change orphaned keeps the diff clean.

Checklist

  • Algorithm matches reference: fast path now defers to pyproj for non-WGS84 datums
  • All implemented backends consistent: shared dispatch covers numpy and dask
  • NaN handling: unchanged by this PR
  • Edge cases covered by tests
  • Dask chunk boundaries: unchanged
  • No premature materialization or unnecessary copies
  • Benchmark: not needed (fix removes a fast path for an uncommon CRS class)
  • README feature matrix: not applicable (no new public function, no backend support change)
  • Docstrings present and accurate

@brendancol brendancol merged commit ecee550 into main May 29, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject: numba fast path corrupts coordinates for non-WGS84 datums

1 participant