Skip to content

EUCLIDEAN proximity line-sweep on numpy misses the true nearest target for some pixels #3121

@brendancol

Description

@brendancol

Describe the bug

The numpy backend of proximity() uses the GDAL-ported line-sweep for EUCLIDEAN (the bounded dask+numpy path runs the same kernel per chunk). On some target layouts the sweep's neighbour propagation never delivers the true nearest target to a pixel, so it reports a distance that is too large. The cupy backend, which does an exact brute-force search, matches a float64 ground-truth check; the numpy result does not.

Repro

import numpy as np, xarray as xr
from xrspatial import proximity

data = np.zeros((8, 10))
for i, j in [(0, 4), (0, 8), (1, 2), (2, 1), (6, 6)]:
    data[i, j] = 1.0

r = xr.DataArray(data, dims=['y', 'x'])
r['y'] = np.linspace(0, 10, 101)[12:20]   # pitch 0.1
r['x'] = np.linspace(0, 10, 103)[28:38]   # pitch ~0.098

out = proximity(r)
print(out.data[3, 4])   # 0.3
# true nearest: target at (1, 2), distance 0.28008347

Pixel (3, 4) is 0.28008 away from the target at (1, 2), but the line-sweep reports 0.3, the distance to the target at (0, 4). About 0.2 pixel off, 7% relative error.

On a 101x103 random raster (seed 0, ~600 targets), 4 pixels differ from ground truth by more than 1e-4 (max error 0.0199). It happens with square pixels as well (2 pixels, max error 0.0252), and the bounded dask+numpy path reproduces the same 4 wrong pixels. MANHATTAN came back clean on the same input, but it runs the same sweep, so I would not assume it is immune.

ALLOCATION and DIRECTION are unaffected; they already route through the brute-force kernel (#2881). GREAT_CIRCLE was moved off the line-sweep in #2816. So the only modes still using the sweep are PROXIMITY with EUCLIDEAN or MANHATTAN.

This looks like an inherent limitation of the 4-pass nearest-point propagation (Danielsson-style sweeps have documented worst-case errors of a fraction of a pixel) carried over from GDAL, not a porting mistake.

Expected behavior

proximity() returns the exact nearest-target distance, and numpy, cupy, dask+numpy and dask+cupy agree. cupy is already exact.

Additional context

Found by the security sweep's CPU/GPU parity probe against the proximity module. One possible fix: route numpy PROXIMITY through cKDTree when scipy is available, mirroring the unbounded dask path (distance_upper_bound covers max_distance), with brute force as the fallback when scipy is missing. The dask chunk function should follow whatever the numpy backend does so the backends stay in sync.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions