focal memory guard rejects large dask rasters the dask path never materializes #3218

@brendancol

Describe the bug

_check_kernel_vs_raster_memory() in xrspatial/focal.py guards apply(), focal_stats(), and hotspots() against kernel/raster combinations that would OOM the host. It budgets kernel_bytes + padded_raster_bytes and raises MemoryError when that exceeds half of available host memory. The padded-raster term comes from the full raster shape, and the check runs on every backend.
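For reference, the budget described above can be sketched in a few lines. This is an illustration, not the code in xrspatial/focal.py: the function names, the item sizes (float32 raster, float64 kernel), and padding by half the kernel size per side are assumptions, and the available-memory figure is passed in rather than queried from the host.

```python
import math


def full_raster_budget(raster_shape, kernel_shape,
                       raster_itemsize=4, kernel_itemsize=8):
    """Hypothetical stand-in for the guard's budget:
    kernel_bytes + padded_raster_bytes, computed from the full raster shape."""
    # Assumption: the raster is padded by half the kernel size on each side.
    pad = tuple(k // 2 for k in kernel_shape)
    padded_shape = tuple(n + 2 * p for n, p in zip(raster_shape, pad))
    kernel_bytes = math.prod(kernel_shape) * kernel_itemsize
    padded_raster_bytes = math.prod(padded_shape) * raster_itemsize
    return kernel_bytes + padded_raster_bytes


def guard_trips(required_bytes, available_bytes):
    """True when the budget exceeds half of available host memory."""
    return required_bytes > 0.5 * available_bytes


required = full_raster_budget((200_000, 200_000), (3, 3))
print(f"{required / 1e9:.1f} GB")                       # 160.0 GB
print(guard_trips(required, available_bytes=45.7e9))    # True
```

With the raster shape and available memory from the repro below, this reproduces the ~160.0 GB figure in the error message.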

That's wrong for dask input. The dask backends go through map_overlap with a per-axis depth of kernel.shape // 2, so peak memory scales with chunk size, not raster size. The padded full raster never exists, but the guard charges for it anyway, so a dask raster bigger than about half of host RAM can't run these three functions even with a 3x3 kernel.
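To put rough numbers on the chunk-scaled claim, here is a small arithmetic sketch. The helper names are hypothetical; it only assumes that map_overlap extends each block by the overlap depth on both sides of each axis and that the data is float32 with 1024x1024 chunks, as in the repro.

```python
import math


def overlap_depth(kernel_shape):
    """Per-axis overlap depth: half the kernel size, rounded down."""
    return tuple(k // 2 for k in kernel_shape)


def padded_block_shape(chunk_shape, depth):
    """Shape of one block after the overlap is added on both sides of each axis."""
    return tuple(c + 2 * d for c, d in zip(chunk_shape, depth))


depth = overlap_depth((3, 3))                      # (1, 1)
block = padded_block_shape((1024, 1024), depth)    # (1026, 1026)

block_bytes = math.prod(block) * 4                 # one float32 block with overlap
full_bytes = 200_000 * 200_000 * 4                 # the whole float32 raster

print(block, block_bytes)    # (1026, 1026) 4210704
print(full_bytes)            # 160000000000
```

A single task works on a few megabytes, while the guard budgets for the full 160 GB array.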

Repro (graph construction only, no compute):

import dask.array as da
import numpy as np
import xarray as xr
from xrspatial.focal import apply

big = xr.DataArray(
    da.zeros((200_000, 200_000), chunks=(1024, 1024), dtype='float32'),
    dims=['y', 'x'])
kernel = np.array([[0., 1., 0.], [1., 1., 1.], [0., 1., 0.]])
apply(big, kernel)
# MemoryError: apply(): kernel of shape (3, 3) on a 200000x200000 raster
# would need ~160.0 GB (kernel + padded raster), but only 45.7 GB is
# available. Use a smaller kernel or a coarser cellsize.

mean() on the same input builds its graph fine (no guard). focal_stats() and hotspots() fail the same way as apply().

Expected behavior

For dask-backed input the guard should budget what a map_overlap task actually allocates: the largest chunk plus 2*pad per side. The numpy and cupy eager paths really do materialize full-size padded arrays, so they should keep the current full-raster check. Same class of fix as the merge() output-size guard in #3048.
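A minimal sketch of what a backend-aware budget could look like, under stated assumptions. The function names and signature are hypothetical, `chunks` stands for a tuple of per-axis chunk sizes in the form dask arrays expose as `.chunks`, and "2*pad per side" is read literally as 2*pad added on each side of each axis (4*pad per axis). If the intent is pad per side, the halo term halves.

```python
import math


def largest_chunk_shape(chunks):
    """Per-axis maximum chunk size; exact for regular chunk grids and an
    upper bound on the largest single chunk otherwise."""
    return tuple(max(axis_chunks) for axis_chunks in chunks)


def guard_budget(raster_shape, kernel_shape, chunks=None,
                 raster_itemsize=4, kernel_itemsize=8):
    """Hypothetical budget: per-chunk for dask input, full raster otherwise."""
    pad = tuple(k // 2 for k in kernel_shape)
    kernel_bytes = math.prod(kernel_shape) * kernel_itemsize
    if chunks is None:
        # Eager numpy/cupy path: the full raster padded by pad on each side.
        shape = tuple(n + 2 * p for n, p in zip(raster_shape, pad))
    else:
        # Dask path: largest chunk plus 2*pad on each side (assumed reading).
        base = largest_chunk_shape(chunks)
        shape = tuple(c + 4 * p for c, p in zip(base, pad))
    return kernel_bytes + math.prod(shape) * raster_itemsize


# Chunk layout of the repro: 195 chunks of 1024 plus a remainder of 320 per axis.
axis = (1024,) * 195 + (320,)
eager = guard_budget((200_000, 200_000), (3, 3))
lazy = guard_budget((200_000, 200_000), (3, 3), chunks=(axis, axis))

print(eager)   # 160003200088
print(lazy)    # 4227208
```

Under these assumptions the dask budget for the repro is about 4.2 MB, so the 3x3 case would pass the half-of-available-memory check, while eager input keeps the existing full-raster behavior.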

Additional context

The performance sweep over the focal module turned this up. The dask compute paths themselves are chunk-scaled (checked by graph inspection); only the guard blocks large workloads.

Labels: bug (Something isn't working), oom (Out-of-memory risk with large datasets), performance (PR touches performance-sensitive code)
