Skip to content

Skip duplicate try_numba_transform probe in the numpy reproject chunk worker#3123

Open
brendancol wants to merge 2 commits into
mainfrom
deep-sweep-performance-reproject-2026-06-09
Open

Skip duplicate try_numba_transform probe in the numpy reproject chunk worker#3123
brendancol wants to merge 2 commits into
mainfrom
deep-sweep-performance-reproject-2026-06-09

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3106

  • _reproject_chunk_numpy probed the numba fast path, then _transform_coords probed it again before building the pyproj control grid. The worker now passes no CRS objects to _transform_coords, which skips the inner retry (it is gated on both CRSes being non-None). Saves one CRS-param parse (~10 pyproj to_dict()/to_authority() round-trips) and four chunk-sized float64 allocations per output chunk on pyproj-fallback CRS pairs, measured at ~11% of the chunk worker for a 512x512 chunk (4326 -> Mollweide).
  • The cupy CPU fallbacks (_reproject_chunk_cupy, _reproject_dask_cupy) are unchanged: they only try the CUDA path first, so the probe inside _transform_coords is their first and only numba attempt.
  • Also commits the performance-sweep state row for the reproject pass that found this (OOM verdict SAFE, compute-bound).

Backends: affects numpy and dask+numpy (plus merge's per-block adapter, which routes through the same worker). cupy and dask+cupy behavior unchanged; validated on a CUDA host (numpy/cupy max abs diff 2e-12, on-device transform confirmed).

Test plan:

  • New TestNoDuplicateNumbaFastPathProbe: spy asserts exactly one try_numba_transform call per chunk on a no-fast-path pair; _transform_coords keeps its own probe when handed CRS objects; pyproj-reference value parity for the fallback pair
  • Full reproject suite: 447 passed (test_reproject.py, coverage, cupy gate)

_reproject_chunk_numpy already probes the numba fast path before
falling back to _transform_coords, which probed it again. Pass no CRS
objects to _transform_coords so the inner retry is skipped. Saves one
CRS-param parse and four chunk-sized coordinate allocations per output
chunk on pyproj-fallback CRS pairs (~11% of the chunk worker). The cupy
CPU fallbacks keep the inner probe -- it is their first numba attempt.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 9, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Skip duplicate try_numba_transform probe in the numpy reproject chunk worker

Blockers (must fix before merge)

  • None.

Suggestions (should fix, not blocking)

  • None. The change is two lines plus a comment, and the gating logic in _transform_coords (xrspatial/reproject/init.py:284) is unchanged for callers that pass CRS objects.

Nits (optional improvements)

  • The state CSV rewrite reorders the interpolate_spline / interpolate-kriging rows. This is just the canonical writer's sorted() ordering (- sorts before _), not data loss, but it makes the diff look bigger than it is.
  • test_fallback_pair_values_match_pyproj_reference exercises _transform_coords directly rather than the chunk worker, so it overlaps a little with TestExactPrecisionEscapeHatch. Fine to keep since it pins the no-CRS-kwargs call form the worker now uses.

Verification performed

  • Reverted xrspatial/reproject/__init__.py to the pre-fix state and re-ran the new test class: test_chunk_numpy_probes_fast_path_exactly_once fails (probe count 2), the other two pass. With the fix it passes. The regression test is genuinely red on the old code.
  • Full reproject suite (test_reproject.py + coverage + cupy gate): 447 passed.

What looks good

  • The fix targets only _reproject_chunk_numpy; the cupy CPU fallbacks (_reproject_chunk_cupy line ~571, _reproject_dask_cupy line ~1790) still pass CRS objects, and the new test_transform_coords_still_probes_when_given_crs pins that contract so a future cleanup can't silently remove their only numba probe.
  • transform_precision=0 semantics are untouched: both the worker's probe and the inner retry were already gated off for the exact path, and the existing TestExactPrecisionEscapeHatch spies still pass.
  • WKT construction moved into a helper so collection does not touch pyproj when it is not installed.

Checklist

  • Algorithm matches reference (identical inputs to both probes; skipping the second cannot change results)
  • All implemented backends produce consistent results (numpy/dask+numpy affected; cupy paths unchanged)
  • NaN handling is correct (untouched)
  • Edge cases covered (probe-count spy, CRS-passing contract, pyproj value parity)
  • Dask chunk boundaries handled correctly (per-chunk worker logic unchanged)
  • No premature materialization or unnecessary copies (removes allocations)
  • Benchmark exists (benchmarks/benchmarks/reproject.py covers reproject)
  • README feature matrix: no change needed (no new function, no backend change)
  • Docstrings: no public API change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject: numpy chunk worker runs try_numba_transform twice per chunk on pyproj-fallback CRS pairs

1 participant