polygonize: propagate raster CRS to GeoDataFrame output (#2149) #2154

Merged
brendancol merged 2 commits into main from deep-sweep-metadata-polygonize-2026-05-19
May 20, 2026

@brendancol

Contributor

Summary

  • polygonize(raster, return_type="geopandas") now propagates the raster's CRS so GeoDataFrame.crs is set when the input DataArray carries CRS via attrs["crs"], attrs["crs_wkt"], or rioxarray's rio.crs. Before this change the output GeoDataFrame always had crs=None, silently breaking spatial joins, overlays, reprojections, and file writes.
  • New _detect_raster_crs helper mirrors the resolution order in reproject._crs_utils._detect_source_crs without adding a hard pyproj dependency. CRS detection runs at the public API level, before backend dispatch, so all four backends (numpy / cupy / dask+numpy / dask+cupy) emit identical CRS metadata. An unparseable CRS value is caught rather than raised, so the call never crashes; in that case the GeoDataFrame is returned without a CRS.
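
The "never crashes on an unparseable CRS" behaviour described above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: attach_crs_or_skip and FrameStub are hypothetical names, and the stub imitates just one property of GeoDataFrame.set_crs, namely that it raises on a value it cannot parse.

```python
class FrameStub:
    """Hypothetical stand-in for a GeoDataFrame so the sketch runs without geopandas."""

    def __init__(self, crs=None):
        self.crs = crs

    def set_crs(self, crs):
        # Toy parser (assumption): accept an EPSG int or an "EPSG:<code>" string.
        if isinstance(crs, int) and not isinstance(crs, bool):
            return FrameStub(f"EPSG:{crs}")
        if isinstance(crs, str):
            prefix, _, code = crs.partition(":")
            if prefix.upper() == "EPSG" and code.isdigit():
                return FrameStub(f"EPSG:{code}")
        raise ValueError(f"cannot parse CRS: {crs!r}")


def attach_crs_or_skip(frame, raw_crs):
    """Return frame with raw_crs applied, or frame unchanged if that is not possible."""
    if raw_crs is None:
        return frame  # the raster carried no CRS, so there is nothing to propagate
    try:
        return frame.set_crs(raw_crs)
    except Exception:
        # Unparseable CRS: swallow the error and return the frame without a CRS.
        return frame
```

Because the function only relies on a set_crs method, the same shape should work with a real GeoDataFrame, where the parsing is done by geopandas rather than by the toy parser in the stub.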

Test plan

  • New TestPolygonizeCRSPropagation class (8 tests) covers EPSG string, EPSG int, crs_wkt, no-CRS, unparseable CRS, attrs-vs-rioxarray preference, rioxarray-only path, and simplify interaction.
  • Full test_polygonize.py suite passes: 130 passed, 13 skipped (no regressions).
  • CRS propagation verified end-to-end across numpy, cupy, dask+numpy, dask+cupy backends with a hand-run probe.

Closes #2149.

Dispatched by /deep-sweep (sweep-metadata, agent worktree agent-ad1070530d37a4fdf).

polygonize(raster, return_type="geopandas") returned a GeoDataFrame
with crs=None even when the input DataArray carried CRS info via
attrs["crs"], attrs["crs_wkt"], or rioxarray's rio.crs. Downstream
spatial joins, overlays, and file writes silently lost georeferencing.

A new _detect_raster_crs helper mirrors the resolution order in
reproject._crs_utils._detect_source_crs (attrs["crs"] first, then
attrs["crs_wkt"], then rio.crs) and returns the raw value so GeoDataFrame.set_crs
handles parsing. The CRS is detected at the public API level, before
backend dispatch, so all four backends (numpy / cupy / dask+numpy /
dask+cupy) emit the same CRS. An unparseable CRS attribute is caught
so the call never crashes -- the GeoDataFrame is returned without
CRS in that case.
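
A minimal sketch of the resolution order described above, assuming the helper only needs to read .attrs and the optional .rio accessor. The name detect_raster_crs_sketch is hypothetical and the body is a guess at the shape of such a helper, not the merged _detect_raster_crs. SimpleNamespace objects stand in for DataArray inputs so the example runs without xarray or rioxarray.

```python
from types import SimpleNamespace


def detect_raster_crs_sketch(raster):
    """Return the first CRS value found on raster, unparsed, or None."""
    attrs = getattr(raster, "attrs", None) or {}
    # Steps 1 and 2: plain attributes, checked in order.
    for key in ("crs", "crs_wkt"):
        value = attrs.get(key)
        if value is not None:
            return value
    # Step 3: the rioxarray accessor, which may not be registered at all.
    try:
        return raster.rio.crs
    except Exception:
        return None


# Duck-typed stand-ins for DataArray objects (assumption: only .attrs and .rio are read).
both = SimpleNamespace(attrs={"crs": "EPSG:4326"}, rio=SimpleNamespace(crs="EPSG:32633"))
rio_only = SimpleNamespace(attrs={}, rio=SimpleNamespace(crs="EPSG:32633"))

print(detect_raster_crs_sketch(both))      # attrs["crs"] takes precedence
print(detect_raster_crs_sketch(rio_only))  # falls through to rio.crs
```

Returning the raw value, rather than a parsed CRS object, keeps the helper free of a pyproj import and leaves parsing to whatever consumes the value.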

spatialpandas does not expose a CRS slot and GeoJSON RFC 7946 is
WGS84-only, so propagation lives only on the geopandas path.

8 new tests in TestPolygonizeCRSPropagation cover EPSG string and int
attrs, crs_wkt, no-CRS, unparseable CRS, attrs-vs-rioxarray
preference, rioxarray-only detection, and interaction with
simplify_tolerance. Also updates .claude/sweep-metadata-state.csv
with the 2026-05-19 polygonize audit notes.
@github-actions (bot) added the "performance" label (PR touches performance-sensitive code) on May 19, 2026
Adds a Notes paragraph to the public polygonize() docstring describing
the new GeoDataFrame CRS propagation (resolution order, what happens
when the value is unparseable, why spatialpandas/geojson return types
do not carry CRS). Also corrects the comment in
test_crs_prefers_attrs_over_rio: rio.write_crs stores the CRS on a
spatial_ref coord, not in attrs['crs'].

Review feedback on #2154.
@brendancol merged commit ae3794c into main on May 20, 2026
4 of 5 checks passed
@brendancol deleted the deep-sweep-metadata-polygonize-2026-05-19 branch on May 27, 2026 14:49

Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

polygonize: GeoDataFrame output drops input CRS
