rasterize: hoist poly_props/poly_global cupy.asarray above all_touched (#2506) #2510

Merged
brendancol merged 3 commits into xarray-contrib:main from brendancol:deep-sweep-performance-rasterize-2026-05-27
May 28, 2026

Conversation

@brendancol

Contributor

Summary

  • _run_cupy and _rasterize_tile_cupy previously transferred poly_props and poly_global to the GPU twice when all_touched=True (once for the scanline launch, once for the supercover boundary launch). Hoist the cupy.asarray() calls above the scanline / boundary conditional so both launches share the same device buffer.
  • For 10k polygons / 8 columns, the per-tile props transfer drops from 0.218 ms to 0.092 ms (2.4x), saving 720 KB / tile of redundant PCIe traffic. On a 100-tile dask+cupy raster that is ~13 ms and 72 MB saved per call.
  • Closes #2506 (rasterize: duplicate cupy.asarray(poly_props/poly_global) when all_touched=True).
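The hoist described in the first bullet can be sketched roughly as follows. This is an illustration of the pattern, not the code from the PR: `to_device` is a hypothetical stand-in for `cupy.asarray` that only records uploads (so the sketch runs without a GPU), and the function names and the two-element launch tuples are assumptions.

```python
upload_log = []  # one entry per simulated host-to-device upload


def to_device(name, host_array):
    """Hypothetical stand-in for cupy.asarray: record the upload, return the data."""
    upload_log.append(name)
    return host_array


def build_launches_duplicated(poly_props, poly_global, all_touched):
    """Pre-hoist shape: each launch tuple performs its own upload."""
    poly_launch = (to_device("props", poly_props), to_device("global", poly_global))
    boundary_launch = None
    if all_touched:
        # Second upload of identical bytes for the supercover boundary launch.
        boundary_launch = (
            to_device("props", poly_props),
            to_device("global", poly_global),
        )
    return poly_launch, boundary_launch


def build_launches_hoisted(poly_props, poly_global, all_touched):
    """Post-hoist shape: upload once, share the buffers between both launches."""
    d_props = to_device("props", poly_props)
    d_global = to_device("global", poly_global)
    poly_launch = (d_props, d_global)
    boundary_launch = (d_props, d_global) if all_touched else None
    return poly_launch, boundary_launch
```

With `all_touched=True`, the duplicated variant records four uploads per tile and the hoisted variant records two, which is the halving the timing figures reflect.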

Test plan

  • 4 new AST-level assertions in test_rasterize_props_hoist_2506.py confirm each function calls cupy.asarray(poly_props/poly_global) exactly once.
  • 5 new cupy vs numpy parity tests covering last/first/max/min/sum merges under all_touched=True.
  • 3 new dask+cupy smoke tests that exercise the hoisted upload through every per-tile launch.
  • All 470 existing rasterize tests pass alongside the 12 new ones.
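An AST-level assertion of the kind listed in the first bullet might look like the sketch below. It is not the content of test_rasterize_props_hoist_2506.py: the helper name `count_asarray_calls` and the sample function `_run_example` are hypothetical, and the real tests presumably inspect the source of `_run_cupy` and `_rasterize_tile_cupy` instead of an inline string. Because the source is only parsed, never executed, the check does not need cupy to be importable.

```python
import ast

# Hypothetical sample source standing in for the real function under test.
SOURCE = '''
def _run_example(poly_props, poly_global, all_touched):
    d_props = cupy.asarray(poly_props)
    d_global = cupy.asarray(poly_global)
    poly_launch = (d_props, d_global)
    boundary_launch = (d_props, d_global) if all_touched else None
    return poly_launch, boundary_launch
'''


def count_asarray_calls(source, func_name, arg_name):
    """Count cupy.asarray(<arg_name>, ...) calls inside the named function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            break
    else:
        raise ValueError(f"function {func_name!r} not found")

    count = 0
    for call in ast.walk(node):
        if not isinstance(call, ast.Call):
            continue
        func = call.func
        is_cupy_asarray = (
            isinstance(func, ast.Attribute)
            and func.attr == "asarray"
            and isinstance(func.value, ast.Name)
            and func.value.id == "cupy"
        )
        first_arg_matches = (
            len(call.args) >= 1
            and isinstance(call.args[0], ast.Name)
            and call.args[0].id == arg_name
        )
        if is_cupy_asarray and first_arg_matches:
            count += 1
    return count


assert count_asarray_calls(SOURCE, "_run_example", "poly_props") == 1
assert count_asarray_calls(SOURCE, "_run_example", "poly_global") == 1
```

A structural check like this catches a reintroduced duplicate upload even on machines without a GPU, which a timing-based test could not do reliably.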

Notes

A pixel-level parity gap between the dask+cupy path and the eager numpy path under all_touched=True (boundary segments that cross tile borders are handled differently) predates this fix and is not addressed here. The smoke tests therefore assert only that the hoisted launches still produce a populated raster, not full numpy parity.

Discovered via /deep-sweep performance pass on rasterize (2026-05-27).

xarray-contrib#2506)

Both `_run_cupy` and `_rasterize_tile_cupy` previously called
`cupy.asarray(poly_props)` and `cupy.asarray(poly_global)` twice
when `all_touched=True` -- once for the scanline `poly_launch`
tuple and once for the supercover `boundary_launch` tuple. The two
launches operate on the same tile, so the second upload re-transferred
identical bytes for every dask tile.

Stage the device buffers above the conditional so both launches share
them. For 10k polygons / 8 cols the per-tile transfer cost drops from
0.218 ms to 0.092 ms (2.4x) and 720 KB of redundant PCIe traffic per
tile is eliminated; for a 100-tile dask+cupy raster that is ~13 ms and
72 MB saved end to end.
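
The per-call figures follow from the per-tile ones by simple scaling, as the sketch below reproduces. The byte breakdown is an assumption that is not stated in the PR: it supposes 8-byte elements for both arrays, a `poly_props` of shape (polygons, columns), and one `poly_global` value per polygon, which happens to give 720 KB.

```python
n_polygons = 10_000
n_columns = 8
n_tiles = 100

# Assumption (not stated in the PR): 8-byte elements in both arrays,
# poly_props is (n_polygons, n_columns), poly_global has one value per polygon.
props_bytes = n_polygons * n_columns * 8
global_bytes = n_polygons * 8
redundant_bytes_per_tile = props_bytes + global_bytes

before_ms, after_ms = 0.218, 0.092  # per-tile transfer times quoted above
speedup = before_ms / after_ms
saved_ms_per_call = (before_ms - after_ms) * n_tiles
saved_mb_per_call = redundant_bytes_per_tile * n_tiles / 1e6

print(f"{redundant_bytes_per_tile / 1e3:.0f} KB redundant per tile")
print(f"{speedup:.1f}x faster per-tile transfer")
print(f"{saved_ms_per_call:.1f} ms and {saved_mb_per_call:.0f} MB saved per call")
```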

Adds 12 regression tests in test_rasterize_props_hoist_2506.py:
- 4 AST-level assertions that each function calls
  `cupy.asarray(poly_props/poly_global)` exactly once.
- 5 cupy vs numpy all_touched parity tests covering
  last/first/max/min/sum merges.
- 3 dask+cupy smoke tests that exercise the hoisted upload through
  every per-tile launch.

The dask+cupy + all_touched pixel-level parity gap (boundary segments
crossing tile borders behave differently than the eager numpy path)
predates this fix and is not addressed here.
@github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 27, 2026
Tighten the props/global hoist guard added in xarray-contrib#2506 so the host-to-device
transfer is skipped when neither the scanline nor the supercover boundary
launch will consume it (no in-tile edges and not all_touched). Without
this guard the hoist could upload poly_props/poly_global for a tile whose
polygons fall entirely outside the raster bounds even when all_touched
is False, which the pre-hoist code never did.

In _rasterize_tile_cupy the upload also has to move below _extract_edges
so the guard can read len(edge_y_min).
… helper

The _to_numpy helper in the xarray-contrib#2506 regression test carried a comment
referencing a sweep skill's authoring rule about .compute(). That note
belongs in the agent's prompt, not committed test code.
@brendancol

Contributor Author

Self-review pass via /review-pr. 0 blockers, 1 suggestion, 1 nit, both addressed on this branch.

S1: gate the hoist on actual usage

As originally landed, the hoist uploads poly_props / poly_global to the GPU whenever poly_geoms (or poly_wkb) is non-empty. If all polygons fall outside the raster bounds (so edge_y_min is empty) and all_touched is False, neither launch fires, so the upload becomes work the pre-hoist code did not do. Tightened the guard to len(edge_y_min) > 0 or all_touched in both _run_cupy and _rasterize_tile_cupy. The all_touched fast path still gets the single shared upload that #2506 was about. Commit 958a557.
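
The tightened guard can be sketched as below. Only the condition `len(edge_y_min) > 0 or all_touched` comes from this PR; the helper name `maybe_upload`, the injected `to_device` callable (a stand-in for `cupy.asarray`), and the `(None, None)` return for the skipped case are assumptions made for illustration.

```python
def maybe_upload(poly_props, poly_global, edge_y_min, all_touched, to_device):
    """Upload only when the scanline or the boundary launch will use the buffers."""
    if len(edge_y_min) > 0 or all_touched:
        return to_device(poly_props), to_device(poly_global)
    # No in-tile edges and all_touched is False: neither launch fires,
    # so skip the host-to-device transfer entirely.
    return None, None


uploads = []


def counting_to_device(host_array):
    """Hypothetical stand-in for cupy.asarray that counts uploads."""
    uploads.append(host_array)
    return host_array


# Tile whose polygons fall entirely outside the raster bounds, all_touched=False.
d_props, d_global = maybe_upload([1.0], [0], [], False, counting_to_device)
assert d_props is None and d_global is None and len(uploads) == 0
```

Since the guard reads `len(edge_y_min)`, the upload has to come after edge extraction, which is why the second commit moves it below `_extract_edges` in `_rasterize_tile_cupy`.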

N1: strip agent prompt residue from the test

The _to_numpy helper carried a comment referencing "Skill instructions" about .compute(). That is prompt context that leaked into committed code. Removed. Commit 2b43981.

Tests: 464 passed, 2 skipped (full xrspatial/tests/test_rasterize*.py suite) after both fixes.

@brendancol merged commit 343a49c into xarray-contrib:main on May 28, 2026
7 checks passed

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rasterize: duplicate cupy.asarray(poly_props/poly_global) when all_touched=True

1 participant