Skip to content

Dask float polygonize is not chunk-invariant for rtol #2666

@brendancol

Description

@brendancol

Describe the bug

For float rasters, polygonize on a dask-backed array can produce a different number of polygons than the same raster as a single numpy chunk, when rtol is non-trivial. The result depends on how the input is chunked, so it is not chunk-invariant.

The numpy connected-component predicate in _calculate_regions is direction-aware. For two adjacent pixels it tests abs(value - reference) <= atol + rtol*abs(reference), where reference is the current (higher-ij) pixel in scan order and value is its West/South neighbour. The rtol term scales by the magnitude of the higher-ij pixel.

The dask cross-chunk merge compares unordered (min, max) value ranges via _ranges_close / _values_close, which does not preserve that scan-order orientation. Whichever boundary polygon happens to be iterated first becomes the reference, so the merge decision flips with chunk order instead of matching numpy.

Reproduce

import numpy as np, xarray as xr, dask.array as da
from xrspatial import polygonize

arr = np.array([[10.0, 11.05]])
polygonize(xr.DataArray(arr), atol=0.0, rtol=0.1)                               # 1 polygon
polygonize(xr.DataArray(da.from_array(arr, chunks=(1, 1))), atol=0.0, rtol=0.1) # 2 polygons

Reverse the values ([[11.05, 10.0]]) and the failure flips: numpy returns 2 polygons, dask returns 1.

Expected behavior

A chunked float raster produces the same polygon partition as the unchunked input for the same atol / rtol. The fix must be direction-aware: it has to replicate numpy's scan-order orientation (higher-ij pixel is the reference) rather than using a symmetric range comparison, so it does not over-merge the cases numpy deliberately splits.

Additional context

Found while auditing the #2583 dask-parity work. The _ranges_close interval-overlap fallback was added to handle transitive value chains across chunks; the orientation bug is separate and affects the per-edge close-value decision.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingdaskDask backend / chunked arrays

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions