Skip to content

polygonize: derive transform from x/y coords so geopandas CRS matches geometry (#2607)#2630

Merged
brendancol merged 2 commits into
mainfrom
deep-sweep-metadata-polygonize-2026-05-29-01
May 29, 2026
Merged

polygonize: derive transform from x/y coords so geopandas CRS matches geometry (#2607)#2630
brendancol merged 2 commits into
mainfrom
deep-sweep-metadata-polygonize-2026-05-29-01

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Summary

polygonize() auto-detected an affine transform from attrs['transform'] and rio.transform() (added in #2536) but never from the raster's own x/y coordinate values, which is the usual way an xarray raster carries its georeferencing. A raster georeferenced through coords alone, plus an attrs['crs'], produced a return_type='geopandas' result whose .crs claimed projected space while the geometries stayed in pixel space. It's the same CRS/coordinate mismatch as #2536, just reached through coords instead of the transform attr.

  • _detect_raster_transform() now falls back to deriving a rasterio-ordered transform from evenly-spaced 1-D x/y coords. The origin sits at the first pixel corner (first_center - 0.5 * spacing) so the pixel-corner indices from _follow map to cell corners.
  • The rio.transform() identity-affine case now falls through to coords instead of returning None, so coords get used when rioxarray does not recognize the dims as spatial (for example lat/lon).
  • Precedence is unchanged: explicit transform=, then attrs['transform'], then rio.transform(), then coords. The _xrspatial_no_georef marker still suppresses auto-detection.

Backend coverage

Transform detection runs in the public polygonize() before backend dispatch, so all four backends (numpy, cupy, dask+numpy, dask+cupy) behave the same way. Tests cover all four.

Test plan

  • Coords-derived transform applied for numpy and geopandas output
  • geopandas .crs agrees with geometry coordinates
  • attrs['transform'] and explicit transform= still take precedence
  • Irregular / length-1 / no coords fall back safely
  • Non-y/x dim names (lat/lon) georeferenced from coords
  • _xrspatial_no_georef marker suppresses
  • numpy, cupy, dask+numpy, dask+cupy
  • Full existing polygonize suite passes (172 passed, 13 skipped)

Closes #2607

)

polygonize() auto-detected attrs['transform'] and rio.transform() but
never the raster's own x/y coordinate values, the standard xarray /
xrspatial georeferencing channel. A coords-georeferenced raster with a
crs attr produced a geopandas result whose .crs claimed projected space
while the geometries stayed in pixel space -- the same misalignment
#2536 fixed for the transform attr.

_detect_raster_transform() now falls back to deriving a rasterio-ordered
6-tuple from evenly-spaced 1-D x/y coords (origin at the first pixel
corner = first_center - 0.5*spacing). The rio identity-affine case now
falls through to coords instead of returning None, so coords are used
when rioxarray finds no spatial dims. Explicit transform=, attrs[
'transform'], rio.transform(), and the _xrspatial_no_georef marker keep
their precedence. Detection runs before backend dispatch, so all four
backends (numpy, cupy, dask+numpy, dask+cupy) behave identically.

Adds TestPolygonizeCoordsTransform covering numpy/geopandas output,
precedence, irregular/length-1/no-coords fallback, non-y/x dim names,
the no-georef marker, and all four backends.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 29, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: polygonize derive transform from x/y coords

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • The identity-affine fall-through changes behavior for rasters with integer-index coords that used to get pixel-space output. With rioxarray installed, a raster carrying coords={'y': np.arange(3), 'x': np.arange(3)} and no transform now resolves to a transform with origin -0.5 (the pixel-center to corner convention), so output spans -0.5..2.5 instead of 0..3. That is geometrically correct, but the existing test_no_transform_info_yields_pixel_coords (xrspatial/tests/test_polygonize.py:1458) now has a misleading name and docstring: arange coords are transform info after this change, and the test only passes because its assertion uses <= 3.0 with slack. Worth a one-line note in that test's docstring, or tighten its bounds to document the new -0.5 offset.

Nits (optional improvements)

  • The polygonize() docstring line at xrspatial/polygonize.py:2301 wraps mid-sentence ("This applies to all return") at 88 chars. The wrapping is pre-existing and just shifted by the edit, so it is not a new defect, but it reads awkwardly.

What looks good

  • The corner-origin math is right: pixel centers map to cell corners via first_center - 0.5*spacing, matching the existing attrs['transform'] convention.
  • The _coord_spacing guard rejects irregular spacing, length-1 dims, and zero-step coords, returning None instead of emitting a wrong affine from the first two points.
  • Precedence is preserved and tested: explicit transform=, then attrs['transform'], then rio.transform(), then coords.
  • Uses dims[-1]/dims[-2] for x/y instead of hardcoding 'x'/'y', matching get_dataarray_resolution; a lat/lon test covers it.
  • All four backends are tested (numpy, cupy, dask+numpy, dask+cupy). Detection runs before dispatch, so parity is structural.
  • Descending y coords (negative spacing) are handled correctly.

Checklist

  • Algorithm matches reference/convention (corner-origin transform)
  • All implemented backends produce consistent results
  • NaN handling unchanged (not touched by this change)
  • Edge cases covered by tests (irregular, length-1, no coords, no-georef marker)
  • Dask chunk boundaries handled correctly (detection is pre-dispatch)
  • No premature materialization or unnecessary copies
  • Benchmark not needed (detection is O(n) over a 1-D coord array)
  • README feature matrix not applicable (no new function, no backend change)
  • Docstrings present and accurate

- test_no_transform_info_yields_pixel_coords now builds a coordless
  raster instead of one with arange coords, so it exercises the
  genuinely-no-transform path; arange coords are a transform source
  since this change.
- Reflow the polygonize() docstring line that wrapped mid-sentence.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after fixes)

Both findings from the previous review are addressed.

  • Suggestion (test name/docstring): test_no_transform_info_yields_pixel_coords now builds a coordless raster instead of one with np.arange(3) coords, so it actually exercises the no-transform path. The docstring explains why, since arange coords are a transform source after this change.
  • Nit (docstring wrap): the polygonize() docstring paragraph is reflowed; no more mid-sentence break.

Full polygonize suite passes locally: 172 passed, 13 skipped (the 13 skips are optional-dependency / GPU-gated cases not present in CI's matrix slot). No remaining open items.

@brendancol brendancol merged commit b01ac1e into main May 29, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

polygonize: derive transform from x/y coords so geopandas CRS does not lie about pixel-space geometries

1 participant