Describe the bug
The dask backend of polygonize returns wrong results for connectivity=8 when two same-value regions touch only diagonally across a chunk boundary. The cross-chunk merge fills in the diagonal notch, so the merged polygon covers more pixels than it should. The total polygon area comes out larger than the raster, and the merged ring self-intersects (invalid per shapely).
numpy is fine. dask with connectivity=4 is fine. Only dask connectivity=8 is wrong, and only at chunk corners where the diagonal adjacency crosses a chunk boundary.
Reproduce
import numpy as np
import xarray as xr
import dask.array as da
from shapely.geometry import Polygon
from xrspatial.polygonize import polygonize

# Same-value cells in rows 1 and 2 touch only diagonally.
data = np.array([[1, 1, 1, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 1, 0, 1]], dtype=np.int32)

rn = xr.DataArray(data.copy())
# chunks=(2, 4) puts the chunk boundary between rows 1 and 2.
rd = xr.DataArray(da.from_array(data.copy(), chunks=(2, 4)))

vn, pn = polygonize(rn, connectivity=8)
vd, pd = polygonize(rd, connectivity=8)

# Each polygon is a list of rings: exterior first, then holes.
def area(v, p):
    return sum(Polygon(r[0], r[1:]).area for r in p)

print(area(vn, pn))  # 16.0 (correct, equals raster area)
print(area(vd, pd))  # 17.0 (wrong, larger than raster)
Expected behavior
dask connectivity=8 should match numpy: same per-value area sums, valid (non-self-intersecting) polygons, total area equal to the raster area.
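The per-value area comparison could be checked with something like the sketch below. It assumes the output layout used in the reproducer (a list of values plus a list of polygons, each polygon being a list of rings with the exterior first and holes after). `ring_area` and `area_by_value` are hypothetical helper names, not part of xrspatial, and the sample polygons are hand-built for illustration.

```python
from collections import defaultdict


def ring_area(ring):
    """Absolute area of a ring via the shoelace formula.

    Works whether or not the ring repeats its first point at the end:
    the wrap-around edge is degenerate for an already closed ring.
    """
    pts = [tuple(p) for p in ring]
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def area_by_value(values, polygons):
    """Sum polygon areas per raster value (exterior area minus hole areas)."""
    sums = defaultdict(float)
    for value, rings in zip(values, polygons):
        exterior, holes = rings[0], rings[1:]
        sums[value] += ring_area(exterior) - sum(ring_area(h) for h in holes)
    return dict(sums)


# Hand-built sample covering a 4 x 2 extent (total area 8).
sample_values = [1, 0, 1]
sample_polygons = [
    [[(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]],
    [
        [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],
        [(2.5, 0.5), (3.5, 0.5), (3.5, 1.5), (2.5, 1.5), (2.5, 0.5)],
    ],
    [[(2.5, 0.5), (3.5, 0.5), (3.5, 1.5), (2.5, 1.5), (2.5, 0.5)]],
]
sample_areas = area_by_value(sample_values, sample_polygons)
print(sample_areas)
```

Applied to both backends, `area_by_value(vn, pn) == area_by_value(vd, pd)` would be the per-value check, and the sum of the dictionary values should equal the raster area.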
Root cause
_group_boundary_polygons groups two diagonally adjacent same-value boundary polygons into one group under 8-connectivity. _merge_polygon_rings then edge-cancels and re-traces them into a single ring that fills the diagonal corner. numpy 8-connectivity keeps the notch, because the different-value cells on the opposite diagonal are also 8-connected and split the region the other way.
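To make the ambiguous configuration concrete, here is a minimal sketch that locates the 2x2 "saddle" windows (equal values on one diagonal, a different pair of equal values on the other) and keeps those that straddle a row chunk boundary. `diagonal_saddles` and `saddles_on_row_boundary` are hypothetical helpers, not xrspatial functions, and the sketch assumes uniform row chunks and only checks row boundaries, as in the reproducer.

```python
import numpy as np


def diagonal_saddles(data):
    """Top-left (row, col) of every 2x2 window shaped [[a, b], [b, a]], a != b."""
    tl, tr = data[:-1, :-1], data[:-1, 1:]
    bl, br = data[1:, :-1], data[1:, 1:]
    mask = (tl == br) & (tr == bl) & (tl != tr)
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(mask))]


def saddles_on_row_boundary(data, row_chunk):
    """Keep saddles whose two rows fall in different row chunks.

    Assumes uniform row chunks of size row_chunk, so a window starting at
    row r straddles a boundary when row r + 1 begins a new chunk.
    """
    return [(r, c) for r, c in diagonal_saddles(data)
            if (r + 1) % row_chunk == 0]


# Raster from the reproducer, chunked as chunks=(2, 4).
data = np.array([[1, 1, 1, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 1, 0, 1]], dtype=np.int32)

boundary_saddles = saddles_on_row_boundary(data, row_chunk=2)
print(boundary_saddles)  # [(1, 0), (1, 1)]
```

In each of these windows both the 1s and the 0s are diagonally adjacent under 8-connectivity, which is the situation the cross-chunk merge has to resolve the same way the numpy backend does.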
Additional context
Found by the deep-sweep accuracy audit (Cat 5, backend inconsistency). 4 of 300 random integer rasters with random chunkings showed an area mismatch for connectivity=8; 0 mismatches for connectivity=4.