Describe the bug
The numpy backend of proximity() uses the GDAL-ported line-sweep for EUCLIDEAN (the bounded dask+numpy path runs the same kernel per chunk). On some target layouts the sweep's neighbour propagation never delivers the true nearest target to a pixel, so it reports a distance that is too large. The cupy backend, which does an exact brute-force search, matches a float64 ground-truth check; the numpy result does not.
Repro
import numpy as np, xarray as xr
from xrspatial import proximity
data = np.zeros((8, 10))
for i, j in [(0, 4), (0, 8), (1, 2), (2, 1), (6, 6)]:
data[i, j] = 1.0
r = xr.DataArray(data, dims=['y', 'x'])
r['y'] = np.linspace(0, 10, 101)[12:20] # pitch 0.1
r['x'] = np.linspace(0, 10, 103)[28:38] # pitch ~0.098
out = proximity(r)
print(out.data[3, 4]) # 0.3
# true nearest: target at (1, 2), distance 0.28008347
Pixel (3, 4) is 0.28008 away from the target at (1, 2), but the line-sweep reports 0.3, the distance to the target at (0, 4). About 0.2 pixel off, 7% relative error.
On a 101x103 random raster (seed 0, ~600 targets), 4 pixels differ from ground truth by more than 1e-4 (max error 0.0199). It happens with square pixels as well (2 pixels, max error 0.0252), and the bounded dask+numpy path reproduces the same 4 wrong pixels. MANHATTAN came back clean on the same input, but it runs the same sweep, so I would not assume it is immune.
ALLOCATION and DIRECTION are unaffected; they already route through the brute-force kernel (#2881). GREAT_CIRCLE was moved off the line-sweep in #2816. So the only modes still using the sweep are PROXIMITY with EUCLIDEAN or MANHATTAN.
This looks like an inherent limitation of the 4-pass nearest-point propagation (Danielsson-style sweeps have documented worst-case errors of a fraction of a pixel) carried over from GDAL, not a porting mistake.
Expected behavior
proximity() returns the exact nearest-target distance, and numpy, cupy, dask+numpy and dask+cupy agree. cupy is already exact.
Additional context
Found by the security sweep's CPU/GPU parity probe against the proximity module. One possible fix: route numpy PROXIMITY through cKDTree when scipy is available, mirroring the unbounded dask path (distance_upper_bound covers max_distance), with brute force as the fallback when scipy is missing. The dask chunk function should follow whatever the numpy backend does so the backends stay in sync.
Describe the bug
The numpy backend of
proximity()uses the GDAL-ported line-sweep for EUCLIDEAN (the bounded dask+numpy path runs the same kernel per chunk). On some target layouts the sweep's neighbour propagation never delivers the true nearest target to a pixel, so it reports a distance that is too large. The cupy backend, which does an exact brute-force search, matches a float64 ground-truth check; the numpy result does not.Repro
Pixel (3, 4) is 0.28008 away from the target at (1, 2), but the line-sweep reports 0.3, the distance to the target at (0, 4). About 0.2 pixel off, 7% relative error.
On a 101x103 random raster (seed 0, ~600 targets), 4 pixels differ from ground truth by more than 1e-4 (max error 0.0199). It happens with square pixels as well (2 pixels, max error 0.0252), and the bounded dask+numpy path reproduces the same 4 wrong pixels. MANHATTAN came back clean on the same input, but it runs the same sweep, so I would not assume it is immune.
ALLOCATION and DIRECTION are unaffected; they already route through the brute-force kernel (#2881). GREAT_CIRCLE was moved off the line-sweep in #2816. So the only modes still using the sweep are PROXIMITY with EUCLIDEAN or MANHATTAN.
This looks like an inherent limitation of the 4-pass nearest-point propagation (Danielsson-style sweeps have documented worst-case errors of a fraction of a pixel) carried over from GDAL, not a porting mistake.
Expected behavior
proximity()returns the exact nearest-target distance, and numpy, cupy, dask+numpy and dask+cupy agree. cupy is already exact.Additional context
Found by the security sweep's CPU/GPU parity probe against the proximity module. One possible fix: route numpy PROXIMITY through cKDTree when scipy is available, mirroring the unbounded dask path (
distance_upper_boundcovers max_distance), with brute force as the fallback when scipy is missing. The dask chunk function should follow whatever the numpy backend does so the backends stay in sync.