Skip to content

Match streaming reproject output dtype to the other backends (#3093)#3111

Merged
brendancol merged 5 commits into
mainfrom
deep-sweep-metadata-reproject-2026-06-09-01
Jun 10, 2026
Merged

Match streaming reproject output dtype to the other backends (#3093)#3111
brendancol merged 5 commits into
mainfrom
deep-sweep-metadata-reproject-2026-06-09-01

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3093

_reproject_streaming() (the fallback reproject() uses when dask is not installed and the in-memory source exceeds 512 MB) allocated its assembled output as np.full(out_shape, nodata, dtype=np.float64) in both the local and dask.bag branches.

  • Integer sources now keep their dtype, using the same round-trip rule as _reproject_dask / _reproject_dask_cupy (reproject: dask+cupy fast path drops integer dtype to float64 #2505 parity). The per-tile worker already cast each tile back to the source dtype; the float64 allocation was throwing that away.
  • 3-D (y, x, band) sources now get a (*out_shape, n_bands) allocation instead of crashing with could not broadcast input array from shape (H,W,B) into shape (H,W).

Backend coverage: this only touches the streaming numpy fallback. The numpy, cupy, dask+numpy, and dask+cupy paths are unchanged and their dtype behaviour is already pinned by TestDaskDtypeParity / TestDaskCupyDtypeParity.

Test plan:

  • New TestStreamingDtypeParity: int16/uint8 dtype round-trip, float64 stays float64, value+dtype parity against the in-memory path, 3-D band-axis assembly (the helper had no coverage at all before)
  • Full reproject suite: 449 passed

The streaming fallback allocated its assembled output as float64
regardless of source dtype, so integer rasters lost their dtype on the
one path built for memory-bound inputs. It also allocated 2-D, which
crashed on 3-D (y, x, band) sources. Use the same integer-round-trip
rule as _reproject_dask and carry the band axis through, in both the
local and dask.bag branches.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 9, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Match streaming reproject output dtype to the other backends (#3093)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • The dask.bag distributed branch (xrspatial/reproject/__init__.py:1633) gets the same allocation fix but no test exercises it; the new tests all run the local ThreadPoolExecutor branch. A test that spins up a dask.distributed LocalCluster would cover it, though that pulls distributed into the test deps. If that's not worth it, a comment in the test class noting the distributed branch is verified only by inspection would do.

Nits (optional improvements)

  • _reproject_streaming's docstring (xrspatial/reproject/__init__.py:1565) still says nothing about dtype. The dask paths document the integer round-trip rule in comments; one line in the streaming docstring would save the next reader a trip to _reproject_dask.

What looks good

  • The dtype rule is copied from _reproject_dask / _reproject_dask_cupy verbatim, so the parity claim is structural, not coincidental.
  • Both allocation sites (local and distributed) are fixed, not just the tested one.
  • Empty/non-overlapping tiles return float64 sentinel fills from _reproject_chunk_numpy; assignment into an integer result casts exactly because _detect_nodata guarantees integer-valued, in-range sentinels for integer dtypes.
  • The value-parity test compares the streaming output against the in-memory reproject() result rather than just checking dtype, and the 3-D test pins the previously-crashing case.

Checklist

  • Algorithm matches reference (dtype rule identical to _reproject_dask)
  • Backend parity restored (streaming now matches numpy/cupy/dask/dask+cupy)
  • NaN handling unchanged (float sources keep NaN sentinel semantics)
  • Edge cases covered (int16, uint8, float64, 3-D band stack, multi-tile)
  • Distributed dask.bag branch has no direct test (see suggestion)
  • No premature materialization introduced
  • Benchmark not needed (allocation-only change)
  • README matrix unchanged (no new function, no backend support change)
  • Docstrings: see nit

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after 784bdd3.

Both findings from the first pass are addressed:

  • The distributed dask.bag branch now has direct coverage: test_streaming_distributed_branch_preserves_dtype runs _reproject_streaming under an in-process LocalCluster (16 tiles vs 1 worker, so the bag branch is actually taken) and checks dtype plus exact value parity against the local-branch result. It skips cleanly when distributed is not installed.
  • _reproject_streaming's docstring now states the dtype rule.

No new issues found. The diff is still confined to the two allocation sites, one docstring, and tests.

…eproject-2026-06-09-01

# Conflicts:
#	.claude/sweep-metadata-state.csv
The projection kernels in _projections.py are parallel=True, and
numba's default workqueue threading layer terminates the process when
two host threads enter a parallel region at once. The streaming tile
pool added test coverage for exactly that pattern, and the macos-3.14
job died with SIGABRT inside try_numba_transform. dask's threaded
scheduler launches the same kernels from worker threads, so the lock
lives in _projections.py: try_numba_transform and transform_points now
take a module-level lock before launching kernels. Regression test
hammers both entry points from 4 threads in a subprocess with
NUMBA_THREADING_LAYER=workqueue forced.
@brendancol

Copy link
Copy Markdown
Contributor Author

CI follow-up on the macos-3.14 SIGABRT (run 27243383958):

The new streaming tests exercise _process_tile_batch, which runs _reproject_chunk_numpy from a ThreadPoolExecutor. That calls the parallel=True numba kernels in _projections.py, and numba's default workqueue threading layer is not threadsafe: two host threads entering a parallel region at once terminate the process. Reproduced locally with NUMBA_THREADING_LAYER=workqueue and 4 threads hammering try_numba_transform; numba prints "The workqueue threading layer is not threadsafe and may not be accessed concurrently by multiple threads" before dying. On macOS it aborts outright, which is exactly what the faulthandler dump in the CI log shows (Dispatcher_call -> abort, threads parked in _do_one).

So the streaming path has had this crash latent in production since it was written; the new coverage just made CI roll the dice. The dask threaded scheduler launches the same kernels from worker threads too.

Fix in 72a0f17: try_numba_transform and transform_points now take a module-level lock before launching kernels, and a regression test hammers both entry points from 4 threads in a subprocess with the workqueue layer forced (fails on the unfixed code, passes now). The same pattern exists in _vertical.py's geoid kernels under dask; filed #3142 to keep this PR focused.

The windows-3.14 "failure" was a fail-fast cancellation after macos died, and the Read the Docs failure is "Build terminated due to time out" (RTD resource limit, no docs in this diff; recent unrelated PRs show the same flap).

@brendancol brendancol merged commit 1a63f90 into main Jun 10, 2026
7 checks passed
brendancol added a commit that referenced this pull request Jun 10, 2026
…) (#3181)

#3111 fixed the 2-D output buffer crash in _reproject_streaming (#3100)
two minutes before #3118 merged a strict xfail pinning that same crash.
On merged main the test body passes, so the strict marker fails CI with
XPASS. Drop the xfail and let the test assert streaming vs in-memory
parity directly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

reproject streaming fallback returns float64 for integer sources and crashes on 3-D inputs

1 participant