rasterize: _build_row_csr_numba int32 overflow on large/dense edge inputs #1388

@brendancol

Description

Summary

_build_row_csr_numba in xrspatial/rasterize.py allocates row_ptr and diff as np.int32, then computes total = row_ptr[height] and uses it to size col_idx. For very tall rasters with many long edges, the cumulative sum exceeds the int32 maximum (2**31 - 1, about 2.15e9) and wraps to a negative or an incorrect positive value. The subsequent np.empty(total, dtype=np.int32) then either raises a confusing ValueError (negative size) or allocates an undersized buffer. In the second case the Pass 2 fill writes past the end of that buffer, corrupting memory inside the numba kernel.

Trigger

The worst case is a CSR fan-out where every edge spans most of the raster. With height = 50_000 rows and ~50_000 edges each spanning the full height, total = sum of edges_per_row across rows ~= 2.5e9, which overflows int32. Real polygon rasterizations rarely reach this, but len(edge_y_min) is now bounded only by the raster guard added in #1223, so a deliberately tall raster with many long polygon edges can reach this regime.
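The arithmetic behind the trigger can be checked in plain Python. This is only a sketch: csr_total and wrap_int32 are hypothetical helper names, not functions in xrspatial, and the 66_000 case is an illustrative figure chosen to show the wrong-positive outcome rather than a value taken from this report.

```python
INT32_MAX = 2**31 - 1  # 2_147_483_647


def csr_total(height: int, full_height_edges: int) -> int:
    """Total CSR entries when every edge is active on every row."""
    return height * full_height_edges


def wrap_int32(value: int) -> int:
    """Two's-complement wrap of a Python int into the int32 range."""
    return (value + 2**31) % 2**32 - 2**31


# The case from this issue: 50_000 rows x 50_000 full-height edges.
total = csr_total(50_000, 50_000)
wrapped_negative = wrap_int32(total)

# A larger, purely illustrative case that wraps past zero and comes back positive.
wrapped_positive = wrap_int32(csr_total(66_000, 66_000))

print(total, wrapped_negative, wrapped_positive)
```

A negative wrapped total would surface as a ValueError at allocation time, while a small positive one would pass the allocation and produce the undersized col_idx buffer.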

Site

xrspatial/rasterize.py, _build_row_csr_numba:

diff = np.zeros(height + 1, dtype=np.int32)
...
row_ptr = np.empty(height + 1, dtype=np.int32)
row_ptr[0] = 0
running = np.int32(0)
for r in range(height):
    running += diff[r]
    row_ptr[r + 1] = row_ptr[r] + running

total = row_ptr[height]
col_idx = np.empty(total, dtype=np.int32)

running is forced to int32 by the explicit np.int32(0) cast, and row_ptr storage is int32, so the per-row addition wraps without warning under numba.
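The wrap can be reproduced at small scale with plain NumPy, without numba and without allocating a large buffer. This is a sketch under assumptions: np.cumsum stands in for the hand-written prefix-sum loop, and the three-row edges_per_row array is made-up data sized so that the true total (3e9) exceeds the int32 range.

```python
import numpy as np

# Three rows with 1e9 active-edge entries each: the true total is 3e9.
edges_per_row = np.full(3, 1_000_000_000, dtype=np.int32)

# int32 offsets: the third partial sum exceeds 2**31 - 1 and wraps silently.
row_ptr32 = np.zeros(4, dtype=np.int32)
row_ptr32[1:] = np.cumsum(edges_per_row, dtype=np.int32)

# int64 offsets: the same accumulation stays exact.
row_ptr64 = np.zeros(4, dtype=np.int64)
row_ptr64[1:] = np.cumsum(edges_per_row, dtype=np.int64)

total32 = int(row_ptr32[-1])
total64 = int(row_ptr64[-1])

# Sizing col_idx from the wrapped total fails with the "negative size" error.
try:
    np.empty(total32, dtype=np.int32)
    alloc_error = None
except ValueError as exc:
    alloc_error = str(exc)

print(total32, total64, alloc_error)
```

Here the wrapped total is negative, so the allocation fails loudly; a total that wraps back into the positive range would instead allocate successfully with too few elements.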

Fix

Cast row_ptr, diff, running, and offsets to int64. The CSR offset values index col_idx, so the size domain is what needs to grow; the values stored in col_idx are edge indices and stay int32. Downstream consumers (_scanline_fill_gpu and np.diff(row_ptr).max()) accept int64 without modification.
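A minimal sketch of the int64 offset construction follows. It is not the actual patch: build_row_ptr_int64 is a hypothetical pure-NumPy stand-in for the row_ptr half of _build_row_csr_numba, the elided part of the original function is not reproduced, and the sketch assumes each edge is active on rows edge_y_min[i] through edge_y_max[i] inclusive, already clipped to [0, height).

```python
import numpy as np


def build_row_ptr_int64(edge_y_min, edge_y_max, height):
    """Hypothetical helper: CSR row offsets accumulated in int64."""
    # Difference array: +1 where an edge becomes active, -1 just after it ends.
    diff = np.zeros(height + 1, dtype=np.int64)
    for y0, y1 in zip(edge_y_min, edge_y_max):
        diff[y0] += 1
        diff[y1 + 1] -= 1

    # Prefix sums in int64, so offsets can exceed 2**31 - 1 without wrapping.
    row_ptr = np.empty(height + 1, dtype=np.int64)
    row_ptr[0] = 0
    running = np.int64(0)
    for r in range(height):
        running += diff[r]  # number of edges active on row r
        row_ptr[r + 1] = row_ptr[r] + running
    return row_ptr


height = 4
row_ptr = build_row_ptr_int64([0, 1, 0], [2, 3, 0], height)

# Only the offsets widen; col_idx still stores int32 edge indices.
total = int(row_ptr[height])
col_idx = np.empty(total, dtype=np.int32)

# Downstream-style use: the widest row, computed from int64 offsets.
max_edges_per_row = int(np.diff(row_ptr).max())
print(row_ptr.tolist(), total, max_edges_per_row)
```

Widening only the offsets keeps the memory cost of col_idx unchanged, since the values it stores are edge indices that still fit in int32.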

Audit reference

Flagged as MEDIUM Cat 2 in the rasterize security audit row, deferred from PR #1223 / #1224 (the original HIGH allocation-cap fix).
