Drop two avoidable full-raster allocations in rasterize backends (#3107) by brendancol · Pull Request #3131 · xarray-contrib/xarray-spatial

brendancol · 2026-06-09T23:52:29Z

Closes #3107.

All four backends (_run_numpy, _run_cupy, _rasterize_tile_numpy, _rasterize_tile_cupy) now return through astype(dtype, copy=False), so the default float64 case no longer copies the full work buffer. The buffer is local to each function, so nothing aliases it.
The CPU paths allocate the per-pixel order buffer through a new _alloc_order helper: int8 for merges whose predicate never reads it (max/min/sum/count and user callables), int64 only for first/last. The int8 buffer still receives int64 owner-index stores; numba wraps them and nothing reads the values back. GPU paths keep int64 because order is an atomic target there.
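The copy=False behaviour described above can be checked in isolation. This is a minimal sketch that assumes only NumPy; `finish` is a hypothetical stand-in for the final cast in a backend function, not the code in xrspatial/rasterize.py.

```python
import numpy as np


def finish(work, dtype):
    # Hypothetical stand-in for the return statement of a backend.
    # With copy=False, NumPy hands back the input array itself when the
    # dtype already matches, and allocates a new array only when a real
    # conversion is needed.
    return work.astype(dtype, copy=False)


work = np.zeros((4, 4), dtype=np.float64)

same = finish(work, np.float64)   # default case: no conversion needed
cast = finish(work, np.float32)   # explicit narrower dtype: must convert

print(same is work)               # the work buffer is returned as-is
print(cast is work, cast.dtype)   # a new float32 array is returned
```

Because the returned array can be the work buffer itself, this is only safe when nothing else holds a reference to that buffer, which is the point of the "local to each function" remark.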

Measured with tracemalloc on a 4000x4000 numpy rasterize: peak drops from 25 B/px to 10 B/px for merge='sum' and to 17 B/px for merge='last'.
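For readers who want to see how a bytes-per-pixel figure of this kind is obtained, here is a hedged sketch of the measurement technique. It does not call rasterize and will not reproduce the 25, 10, or 17 B/px numbers; `peak_bytes_per_pixel` is a hypothetical helper that allocates a float64 work buffer plus an order buffer of the given dtype, and it relies on NumPy reporting its array allocations to tracemalloc.

```python
import tracemalloc

import numpy as np


def peak_bytes_per_pixel(order_dtype, shape=(1000, 1000)):
    # Hypothetical stand-in for one rasterize call: a float64 work buffer
    # plus a per-pixel order buffer of the requested dtype.
    tracemalloc.start()
    try:
        work = np.zeros(shape, dtype=np.float64)
        order = np.zeros(shape, dtype=order_dtype)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak / (shape[0] * shape[1])


small = peak_bytes_per_pixel(np.int8)
large = peak_bytes_per_pixel(np.int64)
print(f"int8 order: {small:.1f} B/px, int64 order: {large:.1f} B/px")
```

The difference between the two results is roughly the 7 bytes per pixel saved by narrowing the order buffer from int64 to int8.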

Backends: numpy and dask+numpy get both changes; cupy and dask+cupy get the copy=False change. Verified on a CUDA host: cupy last/sum/max and dask+cupy sum match the CPU backend, output stays on device, and the dask graph is unchanged (400 tasks for 100 chunks).

Test plan:

New test_rasterize_alloc_3107.py: _alloc_order dtype selection, output parity with 300+ geometries (int8 owner indices wrap repeatedly), numpy/dask parity for all six merge modes plus a user callable, tracemalloc peak-memory bounds, dtype casts still applied.
Full rasterize suite: 662 passed, 2 skipped (includes the GPU tests on this host).

Found by the performance sweep (Cat 4, memory allocation). The sweep state CSV update for the rasterize row rides along in the first commit.

All four backends now return through astype(dtype, copy=False), so the default float64 case skips a full-raster copy. The CPU paths allocate the per-pixel order buffer as int8 instead of int64 for merges whose predicate never reads it (max/min/sum/count and user callables); first/last keep the int64 buffer. Measured on a 4000x4000 numpy rasterize: peak goes from 25 B/px to 10 B/px for merge='sum' and 17 B/px for merge='last'.

brendancol

Review scope: the two allocation changes in xrspatial/rasterize.py and the new test_rasterize_alloc_3107.py. Checked against the four-backend dispatch, the merge-mode contract (ordered vs order-insensitive predicates), and the dask tile path.

Blockers

None. The copy=False casts return function-local buffers, so no aliasing is possible. The int8 order buffer is only ever selected for _should_write_any, whose cur_idx argument is ignored, and a failed identity check degrades to int64 (the old behavior), never to wrong output. GPU paths are untouched apart from copy=False; order stays int64 where it is an atomic target. Parity for all six merge modes plus a user callable is locked by tests with 300+ geometries, which forces the wrapped int8 stores through their full range.
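The selection rule reviewed here can be sketched as follows. This is an illustration under stated assumptions, not the repository's _alloc_order: the predicate signatures, the `should_write_last` body, the `-1` fill value, and the name `alloc_order_sketch` are all hypothetical. Only the shape of the rule (identity check against the order-insensitive predicate, int64 otherwise) comes from the text above.

```python
import numpy as np


def should_write_any(cur_idx, new_idx):
    # Order-insensitive predicate (assumed signature): it never looks at
    # the stored index, so the order buffer's contents do not matter.
    return True


def should_write_last(cur_idx, new_idx):
    # Ordered predicate (assumed body): it reads the stored index back,
    # so the buffer must hold full-width owner indices.
    return new_idx > cur_idx


def alloc_order_sketch(shape, predicate):
    # Identity check: only the known order-insensitive predicate gets the
    # narrow buffer. Any other object, including an unrecognised one,
    # falls back to int64, which is the pre-change behaviour.
    dtype = np.int8 if predicate is should_write_any else np.int64
    return np.full(shape, -1, dtype=dtype)  # fill value is an assumption


print(alloc_order_sketch((2, 3), should_write_any).dtype)
print(alloc_order_sketch((2, 3), should_write_last).dtype)
print(alloc_order_sketch((2, 3), lambda cur, new: True).dtype)
```

The last call shows the safe direction of failure: a predicate that behaves like `should_write_any` but is a different object costs memory, not correctness.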

Suggestions

order[r, c] = new_idx wraps in numba-compiled code (verified), but in pure-Python mode (NUMBA_DISABLE_JIT=1) NumPy 2 raises OverflowError once new_idx > 127. CI never runs that mode, but someone debugging rasterize that way with a realistic dataset will hit it. Add a sentence to the _alloc_order docstring so the failure is self-explaining.
test_order_insensitive_merge_values closes with assert np.isfinite(inside), which is much weaker than the rest of the test. The point pixel (48, 16) has an exact expected value per merge (sum 45750, count 301, min 1, max 600); assert those instead.
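The wrap-versus-raise difference in the first suggestion can be illustrated without numba. This sketch assumes plain NumPy and that the index arrives as a Python int; an explicit astype cast stands in for the wrapping store that compiled code performs. The exception branch depends on the installed NumPy version, so it is guarded rather than asserted.

```python
import numpy as np

# An explicit int64 -> int8 cast keeps only the low 8 bits, which mirrors
# the wrapping behaviour of the compiled store.
owner_indices = np.array([0, 127, 128, 255, 256, 300], dtype=np.int64)
wrapped = owner_indices.astype(np.int8)
print(wrapped.tolist())  # [0, 127, -128, -1, 0, 44]

# Assigning an out-of-range Python int into an int8 array is different:
# NumPy 2 rejects it instead of wrapping. Older releases may wrap or warn.
order = np.zeros(1, dtype=np.int8)
try:
    order[0] = 300
except OverflowError as exc:
    print("store rejected:", exc)
else:
    print("store wrapped to:", int(order[0]))
```

With more than 127 geometries, any owner index past 127 triggers the second case in pure-Python mode, which is why the docstring note is worth having.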

Nits

The peak-memory thresholds (16 and 21 B/px) are generous relative to the measured 10 and ~17.5, which is the right call given this repo's history with resource-assertion flakes (#2889). The docstring already records the before/after numbers. No change requested.

tracemalloc-based assertions are deterministic (allocation counts, not wall clock), so they don't reintroduce the timing-flake class that #2889 removed.

brendancol

Follow-up pass after a0903e7.

Suggestion 1: fixed. _alloc_order docstring now spells out the NUMBA_DISABLE_JIT=1 OverflowError caveat and the escape hatch (force the int64 branch).
Suggestion 2: fixed. test_order_insensitive_merge_values now asserts exact values at both the boxes-only pixel (60, 4) and the point-burn pixel (48, 16) for all four merges; the weak isfinite check is gone.
Nit 3: no change requested, none made.

All 21 tests in test_rasterize_alloc_3107.py pass after the changes; flake8 clean. Nothing else outstanding.

brendancol added 2 commits June 9, 2026 16:47

Update performance sweep state for rasterize pass 4 (#3107)

05618d1

brendancol commented Jun 9, 2026

Address review: document NUMBA_DISABLE_JIT caveat, assert exact point…

a0903e7

…-pixel values (#3107)

brendancol commented Jun 9, 2026

github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026

Merge remote-tracking branch 'origin/main' into deep-sweep-performanc…

7307d36

…e-rasterize-2026-06-09-01 # Conflicts: # .claude/sweep-performance-state.csv

brendancol merged commit 2f2b758 into main Jun 10, 2026
7 checks passed

brendancol merged 4 commits into main from deep-sweep-performance-rasterize-2026-06-09-01
