Add tests for the streaming reproject fallback path (#3101)#3118
Conversation
The streaming branch of reproject() (_reproject_streaming, _process_tile_batch, _parse_max_memory) only runs when the source exceeds 512 MB and dask is not importable, so no existing test executed it. Call the helpers directly on small rasters and compare against the in-memory numpy path on the same output grid: multi-tile threaded, serial (max_memory=1), single-tile, and nearest+NaN runs, plus _parse_max_memory unit cases. The 3-D source crash in the streaming assembly is pinned with a strict xfail referencing #3100 (source fix tracked there).
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Add tests for the streaming reproject fallback path (#3101)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
-
xrspatial/tests/test_reproject_streaming_3101.py:101-123-- the threaded vs serial split depends on the budget arithmetic inside_process_tile_batch(max_concurrent >= 2 and len(batch) > 1). If thetile_memformula in_reproject_streamingchanges, both tests could silently run the same branch while parity still passes. A cheap guard: assert the computed concurrency in each test (budget // tile_mem >= 2 for the threaded case, == 1 for the serial case) so a formula change that re-routes the branch is visible. -
xrspatial/tests/test_reproject_streaming_3101.py:140-152-- when #3100 is fixed, the strict xfail flips and forces a revisit, which is the point. The post-fix assertion only checksndim/shape[2], though. Compare against the in-memory 3-Dreproject()output so the un-xfailed test has real teeth from day one.
Nits (optional improvements)
- The dask.bag distributed branch of
_reproject_streaming(xrspatial/reproject/__init__.py:1592-1619) stays uncovered; it needs a live dask.distributed client. The sweep CSV records it as a carried LOW, fine to leave -- just noting that "streaming path covered" excludes that branch.
What looks good
- Parity tests compare against the public
reproject()output rather than stored fixtures, so the tile assembly is checked against the contract that matters. _parse_max_memorycases cover every suffix plus int, float, bare numeric string, and the None default.- The 3-D crash is pinned strict with
raises=ValueErrorand links the source issue; source stays untouched.
Checklist
- Algorithm matches reference -- N/A, test-only PR
- Backend consistency -- streaming path is CPU-only by design; no GPU variants apply
- NaN handling -- covered by the nearest-resampling NaN-block test
- Edge cases -- single-tile, multi-tile, serial, 3-D strict xfail
- Dask chunk boundaries -- N/A
- No premature materialization -- N/A (tests)
- Benchmark -- not needed for tests
- README feature matrix -- not applicable
- Docstrings -- module docstring explains why the helpers are called directly
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review (after d07dd09)
Both suggestions from the first pass are addressed:
_max_concurrentmirrors thetile_memarithmetic from_reproject_streaming/_process_tile_batch, and the threaded/serial tests now assert which batching branch they engage (>= 2vs== 1), plus a multi-tile shape check on the threaded run. A formula change upstream now fails loudly instead of collapsing both tests onto one branch. Fixed.test_3d_source_streamscompares against the in-memory 3-Dreproject()output, so the assertion is full parity once #3100 lands and the strict xfail comes off. Fixed.
Remaining open item is the nit about the dask.bag distributed branch (xrspatial/reproject/__init__.py:1592-1619), deferred with reason: it needs a live dask.distributed client fixture, and it is recorded as a carried LOW in the sweep state CSV.
File re-ran clean: 14 passed, 1 xfailed. No blockers.
) First CI run of the threaded streaming branch aborted on macos-arm64 py3.14: two pool threads entered the @njit(parallel=True) projection kernels at once and numba's workqueue layer called abort() (filed as #3141). Route the threaded parity test through transform_precision=0 (exact pyproj, one Transformer per thread) and pin the multi-tile NaN test and the 3-D xfail to the serial loop so the numba fast path is still covered, just never entered concurrently.
…age-reproject-2026-06-09
|
CI note: the first run's macos-3.14 job died with SIGABRT during test_multi_tile_threaded -- two pool threads entered the @njit(parallel=True) projection kernels at once, and numba's workqueue threading layer aborts on concurrent entry. That is a real bug in the streaming path itself (the production ThreadPoolExecutor does the same thing); filed as #3141. Commit 6432929 keeps the coverage but avoids the unsupported pattern: the threaded parity test runs with transform_precision=0 (exact pyproj, one Transformer per thread), and the multi-tile NaN test plus the 3-D xfail are pinned to the serial loop so the numba fast path is still exercised, just never concurrently. The windows-3.14 "failure" was fail-fast cancellation; its suite had fully passed. |
…) (#3181) #3111 fixed the 2-D output buffer crash in _reproject_streaming (#3100) two minutes before #3118 merged a strict xfail pinning that same crash. On merged main the test body passes, so the strict marker fails CI with XPASS. Drop the xfail and let the test assert streaming vs in-memory parity directly.
Closes #3101.
reproject()'s streaming fallback (_reproject_streaming,_process_tile_batch,_parse_max_memory) only runs when the source is over 512 MB and dask is not importable, so nothing in CI ever executed it. The new test file calls the helpers directly on small rasters and checks the output against the in-memory numpy path on the same grid: multi-tile threaded, serial (max_memory=1byte), single tile, and nearest resampling with NaN cells._parse_max_memorygets 10 unit cases (int, float, KB/MB/GB/TB suffixes, lowercase with whitespace, bare numeric string, None default).Test plan:
pytest xrspatial/tests/test_reproject_streaming_3101.py-> 14 passed, 1 xfailed