Skip to content

Add tests for the streaming reproject fallback path (#3101)#3118

Merged
brendancol merged 5 commits into
mainfrom
deep-sweep-test-coverage-reproject-2026-06-09
Jun 10, 2026
Merged

Add tests for the streaming reproject fallback path (#3101)#3118
brendancol merged 5 commits into
mainfrom
deep-sweep-test-coverage-reproject-2026-06-09

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3101.

  • reproject()'s streaming fallback (_reproject_streaming, _process_tile_batch, _parse_max_memory) only runs when the source is over 512 MB and dask is not importable, so nothing in CI ever executed it. The new test file calls the helpers directly on small rasters and checks the output against the in-memory numpy path on the same grid: multi-tile threaded, serial (max_memory=1 byte), single tile, and nearest resampling with NaN cells. _parse_max_memory gets 10 unit cases (int, float, KB/MB/GB/TB suffixes, lowercase with whitespace, bare numeric string, None default).
  • The 3-D source crash in the streaming assembly is pinned with a strict xfail referencing _reproject_streaming crashes on 3-D (multi-band) sources #3100. The source fix stays with that issue; this PR is test-only.
  • Backend note: the streaming path is CPU/numpy-only, so no cupy or dask variants apply.

Test plan:

  • pytest xrspatial/tests/test_reproject_streaming_3101.py -> 14 passed, 1 xfailed
  • No source changes (the only non-test diff is the sweep state CSV)

The streaming branch of reproject() (_reproject_streaming,
_process_tile_batch, _parse_max_memory) only runs when the source
exceeds 512 MB and dask is not importable, so no existing test
executed it. Call the helpers directly on small rasters and compare
against the in-memory numpy path on the same output grid: multi-tile
threaded, serial (max_memory=1), single-tile, and nearest+NaN runs,
plus _parse_max_memory unit cases.

The 3-D source crash in the streaming assembly is pinned with a
strict xfail referencing #3100 (source fix tracked there).
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 9, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Add tests for the streaming reproject fallback path (#3101)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/tests/test_reproject_streaming_3101.py:101-123 -- the threaded vs serial split depends on the budget arithmetic inside _process_tile_batch (max_concurrent >= 2 and len(batch) > 1). If the tile_mem formula in _reproject_streaming changes, both tests could silently run the same branch while parity still passes. A cheap guard: assert the computed concurrency in each test (budget // tile_mem >= 2 for the threaded case, == 1 for the serial case) so a formula change that re-routes the branch is visible.
  • xrspatial/tests/test_reproject_streaming_3101.py:140-152 -- when #3100 is fixed, the strict xfail flips and forces a revisit, which is the point. The post-fix assertion only checks ndim/shape[2], though. Compare against the in-memory 3-D reproject() output so the un-xfailed test has real teeth from day one.

Nits (optional improvements)

  • The dask.bag distributed branch of _reproject_streaming (xrspatial/reproject/__init__.py:1592-1619) stays uncovered; it needs a live dask.distributed client. The sweep CSV records it as a carried LOW, fine to leave -- just noting that "streaming path covered" excludes that branch.

What looks good

  • Parity tests compare against the public reproject() output rather than stored fixtures, so the tile assembly is checked against the contract that matters.
  • _parse_max_memory cases cover every suffix plus int, float, bare numeric string, and the None default.
  • The 3-D crash is pinned strict with raises=ValueError and links the source issue; source stays untouched.

Checklist

  • Algorithm matches reference -- N/A, test-only PR
  • Backend consistency -- streaming path is CPU-only by design; no GPU variants apply
  • NaN handling -- covered by the nearest-resampling NaN-block test
  • Edge cases -- single-tile, multi-tile, serial, 3-D strict xfail
  • Dask chunk boundaries -- N/A
  • No premature materialization -- N/A (tests)
  • Benchmark -- not needed for tests
  • README feature matrix -- not applicable
  • Docstrings -- module docstring explains why the helpers are called directly

…#3101)

- Mirror the tile_mem budget arithmetic so the threaded and serial
  tests verify which _process_tile_batch branch they take.
- The 3-D xfail now compares against the in-memory 3-D reproject()
  output, so the test has full parity teeth once #3100 lands.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after d07dd09)

Both suggestions from the first pass are addressed:

  • _max_concurrent mirrors the tile_mem arithmetic from _reproject_streaming / _process_tile_batch, and the threaded/serial tests now assert which batching branch they engage (>= 2 vs == 1), plus a multi-tile shape check on the threaded run. A formula change upstream now fails loudly instead of collapsing both tests onto one branch. Fixed.
  • test_3d_source_streams compares against the in-memory 3-D reproject() output, so the assertion is full parity once #3100 lands and the strict xfail comes off. Fixed.

Remaining open item is the nit about the dask.bag distributed branch (xrspatial/reproject/__init__.py:1592-1619), deferred with reason: it needs a live dask.distributed client fixture, and it is recorded as a carried LOW in the sweep state CSV.

File re-ran clean: 14 passed, 1 xfailed. No blockers.

)

First CI run of the threaded streaming branch aborted on macos-arm64
py3.14: two pool threads entered the @njit(parallel=True) projection
kernels at once and numba's workqueue layer called abort() (filed as
#3141). Route the threaded parity test through transform_precision=0
(exact pyproj, one Transformer per thread) and pin the multi-tile NaN
test and the 3-D xfail to the serial loop so the numba fast path is
still covered, just never entered concurrently.
@brendancol

Copy link
Copy Markdown
Contributor Author

CI note: the first run's macos-3.14 job died with SIGABRT during test_multi_tile_threaded -- two pool threads entered the @njit(parallel=True) projection kernels at once, and numba's workqueue threading layer aborts on concurrent entry. That is a real bug in the streaming path itself (the production ThreadPoolExecutor does the same thing); filed as #3141.

Commit 6432929 keeps the coverage but avoids the unsupported pattern: the threaded parity test runs with transform_precision=0 (exact pyproj, one Transformer per thread), and the multi-tile NaN test plus the 3-D xfail are pinned to the serial loop so the numba fast path is still exercised, just never concurrently. The windows-3.14 "failure" was fail-fast cancellation; its suite had fully passed.

@brendancol brendancol merged commit 8c3d07e into main Jun 10, 2026
7 checks passed
brendancol added a commit that referenced this pull request Jun 10, 2026
…) (#3181)

#3111 fixed the 2-D output buffer crash in _reproject_streaming (#3100)
two minutes before #3118 merged a strict xfail pinning that same crash.
On merged main the test body passes, so the strict marker fails CI with
XPASS. Drop the xfail and let the test assert streaming vs in-memory
parity directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Streaming reproject fallback has zero test coverage

1 participant