Skip to content

zonal.stats cupy backend drops all-NaN zones and desyncs zone/index when zone_ids requests missing zones #2562

@brendancol

Description

@brendancol

Describe the bug

_stats_cupy() in xrspatial/zonal.py behaves differently from the numpy and dask paths in two ways.

First, the cupy path filters values_by_zone for finite / non-nodata entries before it computes unique_zones. If every value in a zone is NaN (or equals the nodata sentinel), the zone vanishes from the result. The numpy path keeps that zone and returns NaN stats for it.

Second, when the caller passes zone_ids containing IDs that aren't in the raster, the cupy path overwrites unique_zones with the full requested list but only accumulates counts/indices for zones it actually found. The downstream loop iterates over unique_zones (the requested list) while indexing into unique_index / unique_counts (the found-only list), so per-zone stats can shift onto the wrong zone ID. The numpy and dask paths filter zone_ids down to zones that do exist in the raster, so the same call returns a consistent (zone, stats) mapping.

Expected behavior

The cupy result should match the numpy result row-for-row for the same inputs:

  • a zone whose values are all NaN or all nodata should appear in the output with NaN stats, not disappear
  • zone_ids=[...] containing IDs not present in the zones raster should not desync the zone/stat alignment

Steps to reproduce

Build a zones array where zone 5 contains only NaN values in the corresponding values array. Call zonal_stats on the numpy and cupy backends with the same input. The numpy result lists zone 5 with NaN stats; the cupy result omits zone 5.

Separately, call zonal_stats(zones, values, zone_ids=[1, 999]) where 999 doesn't appear in zones. The cupy path returns a row labeled 999 but populated with stats from whichever zone landed at that index. The numpy path returns just the zone 1 row.

Fix direction

  1. Compute unique_zones from the raw zones array (filtered only by valid zone IDs, not by value validity), then accumulate stats per zone. All-NaN zones should appear with NaN stats.
  2. When zone_ids includes zones not in the raster, either filter them out (matching numpy/dask) or return NaN at the correct positions. Either way, the zone column and stats columns must stay aligned.
  3. Add cross-backend tests covering both scenarios.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workinggpuCuPy / CUDA GPU support

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions