Describe the bug
_stats_cupy() in xrspatial/zonal.py behaves differently from the numpy and dask paths in two ways.
First, the cupy path filters values_by_zone for finite / non-nodata entries before it computes unique_zones. If every value in a zone is NaN (or equals the nodata sentinel), the zone vanishes from the result. The numpy path keeps that zone and returns NaN stats for it.
Second, when the caller passes zone_ids containing IDs that aren't in the raster, the cupy path overwrites unique_zones with the full requested list but only accumulates counts/indices for zones it actually found. The downstream loop iterates over unique_zones (the requested list) while indexing into unique_index / unique_counts (the found-only list), so per-zone stats can shift onto the wrong zone ID. The numpy and dask paths filter zone_ids down to zones that do exist in the raster, so the same call returns a consistent (zone, stats) mapping.
Expected behavior
The cupy result should match the numpy result row-for-row for the same inputs:
- a zone whose values are all NaN or all nodata should appear in the output with NaN stats, not disappear
zone_ids=[...] containing IDs not present in the zones raster should not desync the zone/stat alignment
Steps to reproduce
Build a zones array where zone 5 contains only NaN values in the corresponding values array. Call zonal_stats on the numpy and cupy backends with the same input. The numpy result lists zone 5 with NaN stats; the cupy result omits zone 5.
Separately, call zonal_stats(zones, values, zone_ids=[1, 999]) where 999 doesn't appear in zones. The cupy path returns a row labeled 999 but populated with stats from whichever zone landed at that index. The numpy path returns just the zone 1 row.
Fix direction
- Compute
unique_zones from the raw zones array (filtered only by valid zone IDs, not by value validity), then accumulate stats per zone. All-NaN zones should appear with NaN stats.
- When
zone_ids includes zones not in the raster, either filter them out (matching numpy/dask) or return NaN at the correct positions. Either way, the zone column and stats columns must stay aligned.
- Add cross-backend tests covering both scenarios.
Describe the bug
_stats_cupy()inxrspatial/zonal.pybehaves differently from the numpy and dask paths in two ways.First, the cupy path filters
values_by_zonefor finite / non-nodata entries before it computesunique_zones. If every value in a zone is NaN (or equals the nodata sentinel), the zone vanishes from the result. The numpy path keeps that zone and returns NaN stats for it.Second, when the caller passes
zone_idscontaining IDs that aren't in the raster, the cupy path overwritesunique_zoneswith the full requested list but only accumulates counts/indices for zones it actually found. The downstream loop iterates overunique_zones(the requested list) while indexing intounique_index/unique_counts(the found-only list), so per-zone stats can shift onto the wrong zone ID. The numpy and dask paths filterzone_idsdown to zones that do exist in the raster, so the same call returns a consistent (zone, stats) mapping.Expected behavior
The cupy result should match the numpy result row-for-row for the same inputs:
zone_ids=[...]containing IDs not present in the zones raster should not desync the zone/stat alignmentSteps to reproduce
Build a zones array where zone 5 contains only NaN values in the corresponding values array. Call
zonal_statson the numpy and cupy backends with the same input. The numpy result lists zone 5 with NaN stats; the cupy result omits zone 5.Separately, call
zonal_stats(zones, values, zone_ids=[1, 999])where 999 doesn't appear inzones. The cupy path returns a row labeled 999 but populated with stats from whichever zone landed at that index. The numpy path returns just the zone 1 row.Fix direction
unique_zonesfrom the raw zones array (filtered only by valid zone IDs, not by value validity), then accumulate stats per zone. All-NaN zones should appear with NaN stats.zone_idsincludes zones not in the raster, either filter them out (matching numpy/dask) or return NaN at the correct positions. Either way, the zone column and stats columns must stay aligned.