Describe the bug
For float rasters, polygonize on a dask-backed array can produce a different number of polygons than the same raster as a single numpy chunk, when rtol is non-trivial. The result depends on how the input is chunked, so it is not chunk-invariant.
The numpy connected-component predicate in _calculate_regions is direction-aware. For two adjacent pixels it tests abs(value - reference) <= atol + rtol*abs(reference), where reference is the current (higher-ij) pixel in scan order and value is its West/South neighbour. The rtol term scales by the magnitude of the higher-ij pixel.
The dask cross-chunk merge compares unordered (min, max) value ranges via _ranges_close / _values_close, which does not preserve that scan-order orientation. Whichever boundary polygon happens to be iterated first becomes the reference, so the merge decision flips with chunk order instead of matching numpy.
Reproduce
import numpy as np, xarray as xr, dask.array as da
from xrspatial import polygonize
arr = np.array([[10.0, 11.05]])
polygonize(xr.DataArray(arr), atol=0.0, rtol=0.1) # 1 polygon
polygonize(xr.DataArray(da.from_array(arr, chunks=(1, 1))), atol=0.0, rtol=0.1) # 2 polygons
Reverse the values ([[11.05, 10.0]]) and the failure flips: numpy returns 2 polygons, dask returns 1.
Expected behavior
A chunked float raster produces the same polygon partition as the unchunked input for the same atol / rtol. The fix must be direction-aware: it has to replicate numpy's scan-order orientation (higher-ij pixel is the reference) rather than using a symmetric range comparison, so it does not over-merge the cases numpy deliberately splits.
Additional context
Found while auditing the #2583 dask-parity work. The _ranges_close interval-overlap fallback was added to handle transitive value chains across chunks; the orientation bug is separate and affects the per-edge close-value decision.
Describe the bug
For float rasters,
polygonizeon a dask-backed array can produce a different number of polygons than the same raster as a single numpy chunk, whenrtolis non-trivial. The result depends on how the input is chunked, so it is not chunk-invariant.The numpy connected-component predicate in
_calculate_regionsis direction-aware. For two adjacent pixels it testsabs(value - reference) <= atol + rtol*abs(reference), wherereferenceis the current (higher-ij) pixel in scan order andvalueis its West/South neighbour. Thertolterm scales by the magnitude of the higher-ijpixel.The dask cross-chunk merge compares unordered
(min, max)value ranges via_ranges_close/_values_close, which does not preserve that scan-order orientation. Whichever boundary polygon happens to be iterated first becomes thereference, so the merge decision flips with chunk order instead of matching numpy.Reproduce
Reverse the values (
[[11.05, 10.0]]) and the failure flips: numpy returns 2 polygons, dask returns 1.Expected behavior
A chunked float raster produces the same polygon partition as the unchunked input for the same
atol/rtol. The fix must be direction-aware: it has to replicate numpy's scan-order orientation (higher-ijpixel is thereference) rather than using a symmetric range comparison, so it does not over-merge the cases numpy deliberately splits.Additional context
Found while auditing the #2583 dask-parity work. The
_ranges_closeinterval-overlap fallback was added to handle transitive value chains across chunks; the orientation bug is separate and affects the per-edge close-value decision.