
polygonize dask connectivity=8 over-fills diagonal notches at chunk boundaries #2606

@brendancol

Describe the bug

The dask backend of polygonize returns wrong results for connectivity=8 when two same-value regions touch only diagonally across a chunk boundary. The cross-chunk merge fills in the diagonal notch, so the merged polygon covers more pixels than it should. The total polygon area comes out larger than the raster area (17.0 vs 16.0 in the repro below), and the merged ring self-intersects, so shapely reports it as invalid.

numpy is fine. dask with connectivity=4 is fine. Only dask connectivity=8 is wrong, and only at chunk corners where the diagonal adjacency crosses a chunk boundary.
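
To narrow down where a given raster might be affected, the condition above can be checked locally. The sketch below is a hypothetical helper (`diagonal_only_touches` is not part of xrspatial): it scans the 2x2 windows that straddle a chunk boundary and flags those where two same-value cells touch only diagonally. It assumes `chunks` is in dask's normalized tuple-of-tuples form, and it only flags candidate sites, since the two cells may still be connected elsewhere.

```python
import numpy as np

def diagonal_only_touches(data, chunks):
    """Return (row, col, value) for 2x2 windows that straddle a chunk
    boundary and hold a same-value diagonal pair with no edge link.

    Hypothetical helper for triage, not an xrspatial function.
    `chunks` is assumed to be in dask's normalized form, e.g. ((2, 2), (4,)).
    (row, col) is the top-left cell of the window.
    """
    data = np.asarray(data)
    # First row/column index of every chunk after the first one.
    row_cuts = set(np.cumsum(chunks[0])[:-1].tolist())
    col_cuts = set(np.cumsum(chunks[1])[:-1].tolist())
    hits = []
    for r in range(data.shape[0] - 1):
        for c in range(data.shape[1] - 1):
            # Skip windows that lie entirely inside one chunk.
            if (r + 1) not in row_cuts and (c + 1) not in col_cuts:
                continue
            a, b = data[r, c], data[r, c + 1]
            d, e = data[r + 1, c], data[r + 1, c + 1]
            # Main diagonal pair (a, e): both off-diagonal cells differ.
            if a == e and b != a and d != a:
                hits.append((r, c, a.item()))
            # Anti-diagonal pair (b, d): both other cells differ.
            if b == d and a != b and e != b:
                hits.append((r, c, b.item()))
    return hits

# The raster from the repro, chunked as (2, 4).
data = np.array([[1, 1, 1, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 1, 0, 1]], dtype=np.int32)
hits = diagonal_only_touches(data, ((2, 2), (4,)))
print(hits)
```

For the repro raster this flags the windows at (1, 0) and (1, 1), each for both values 0 and 1, and nothing when the array is a single chunk. For a dask-backed DataArray, the `chunks` attribute should provide the tuple-of-tuples form used here.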

Reproduce

import numpy as np, xarray as xr
import dask.array as da
from shapely.geometry import Polygon
from xrspatial.polygonize import polygonize

data = np.array([[1,1,1,0],
                 [1,0,1,0],
                 [0,1,0,0],
                 [0,1,0,1]], dtype=np.int32)

rn = xr.DataArray(data.copy())
rd = xr.DataArray(da.from_array(data.copy(), chunks=(2,4)))

vn, pn = polygonize(rn, connectivity=8)
vd, pd = polygonize(rd, connectivity=8)

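def total_area(polygons):
    # Each polygon is a list of rings: exterior first, then any holes.
    return sum(Polygon(rings[0], rings[1:]).area for rings in polygons)

print(total_area(pn))  # 16.0  (correct, equals raster area)
print(total_area(pd))  # 17.0  (wrong, larger than raster)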

Expected behavior

dask connectivity=8 should match numpy: same per-value area sums, valid (non-self-intersecting) polygons, total area equal to the raster area.
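
The per-value area expectation can be written as a check against pixel counts. This is a minimal sketch, not an existing test: `ring_area`, `per_value_area` and `pixel_counts` are hypothetical names, and it assumes unit pixel size, no transform, and the output layout implied by the repro (one list of rings per polygon, exterior first, then holes).

```python
import numpy as np

def ring_area(ring):
    """Unsigned shoelace area of a ring given as (x, y) pairs.
    Works whether or not the ring repeats its first point at the end."""
    ring = np.asarray(ring, dtype=float)
    x, y = ring[:, 0], ring[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def per_value_area(values, polygons):
    """Sum polygon areas per raster value (exterior minus holes)."""
    totals = {}
    for value, rings in zip(values, polygons):
        key = np.asarray(value).item()
        area = ring_area(rings[0]) - sum(ring_area(h) for h in rings[1:])
        totals[key] = totals.get(key, 0.0) + area
    return totals

def pixel_counts(data):
    """Number of pixels per value, as floats for direct comparison."""
    vals, counts = np.unique(np.asarray(data), return_counts=True)
    return {v.item(): float(n) for v, n in zip(vals, counts)}

# Hand-built example: a 2x2 raster split into a top row of 1s and a bottom row of 0s.
raster = np.array([[1, 1], [0, 0]])
values = [1, 0]
polygons = [
    [[(0, 0), (2, 0), (2, 1), (0, 1), (0, 0)]],
    [[(0, 1), (2, 1), (2, 2), (0, 2), (0, 1)]],
]
print(per_value_area(values, polygons) == pixel_counts(raster))
```

With real output, `per_value_area(vd, pd)` would be compared against `pixel_counts(data)`; any value whose area exceeds its pixel count points at an over-filled polygon. Validity is a separate check, for example through shapely's `Polygon.is_valid`.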

Root cause

_group_boundary_polygons groups two diagonally adjacent same-value boundary polygons into one group under 8-connectivity. _merge_polygon_rings then edge-cancels and re-traces them into a single ring that fills the diagonal corner. numpy 8-connectivity keeps the notch, because the different-value cells on the opposite diagonal are also 8-connected and split the region the other way.
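
One way to see why a diagonal contact is awkward for an edge-cancel-and-retrace merge is to look at what survives cancellation. The sketch below is a generic illustration, not the code of `_merge_polygon_rings`; all function names in it are hypothetical. Two pixels that share an edge lose that edge and leave a simple outline where every vertex has two edges. Two pixels that share only a corner lose nothing, and the shared corner ends up with four edges, so a tracer has to decide how to pair them.

```python
from collections import Counter

def unit_square_edges(row, col):
    """Undirected edges of the unit pixel at (row, col), in (x, y) = (col, row)."""
    x, y = col, row
    corners = [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]
    return [frozenset((corners[i], corners[(i + 1) % 4])) for i in range(4)]

def cancel_shared_edges(pixels):
    """Keep edges that belong to exactly one pixel; shared edges cancel."""
    counts = Counter(e for rc in pixels for e in unit_square_edges(*rc))
    return [e for e, n in counts.items() if n == 1]

def vertex_degrees(edges):
    """Count how many surviving edges meet at each vertex."""
    deg = Counter()
    for edge in edges:
        for vertex in edge:
            deg[vertex] += 1
    return deg

# Edge-adjacent pixels: the shared edge cancels, every vertex has degree 2.
side_by_side = cancel_shared_edges([(0, 0), (0, 1)])
# Diagonal pixels: nothing cancels, the shared corner has degree 4.
diagonal = cancel_shared_edges([(0, 0), (1, 1)])

print(len(side_by_side), max(vertex_degrees(side_by_side).values()))
print(len(diagonal), vertex_degrees(diagonal)[(1, 1)])
```

A degree-4 vertex is where a single re-traced ring touches itself. How the merge resolves that pairing is an implementation detail this sketch does not model.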

Additional context

Found by the deep-sweep accuracy audit (Cat 5, backend inconsistency). 4 of 300 random integer rasters with random chunkings showed an area mismatch for connectivity=8; 0 mismatches for connectivity=4.

Labels: backend-coverage (Adding missing dask/cupy/dask+cupy backend support), bug (Something isn't working)
