Describe the bug
crosstab(zones, values, cat_ids=[...]) returns inflated counts when the cat_ids filter excludes one or more categories that actually appear in the values raster.
The cause is in _single_zone_crosstab_2d in xrspatial/zonal.py (around line 959). The function builds cumulative break indices for every unique category via _strides, but cat_start is only advanced inside the if cat in cat_ids: branch. When an earlier category is filtered out, cat_start is not moved past it (it stays at 0 if no category has been selected yet), so the next selected category's count is computed from that stale start, e.g. zone_cat_breaks[j] - 0 instead of zone_cat_breaks[j] - zone_cat_breaks[j-1].
The helper is shared across all four backends (numpy, dask+numpy, cupy, dask+cupy), so every backend is expected to produce the same wrong counts.
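The arithmetic can be reproduced without xrspatial. The following is a minimal pure-Python sketch of the counting loop as described above, not the library's actual code: buggy_counts is a hypothetical stand-in, and bisect_right is assumed here as a substitute for _strides to obtain the cumulative break indices.

```python
from bisect import bisect_right


def buggy_counts(zone_values, unique_cats, cat_ids):
    """Hypothetical stand-in for the counting loop described above."""
    sorted_values = sorted(zone_values)
    # Cumulative break index per category: number of cells <= cat.
    # Assumption: this mirrors what _strides produces.
    breaks = [bisect_right(sorted_values, cat) for cat in unique_cats]

    counts = {}
    cat_start = 0
    for j, cat in enumerate(unique_cats):
        if cat in cat_ids:
            counts[cat] = breaks[j] - cat_start
            # Bug: cat_start only moves when the category is selected.
            cat_start = breaks[j]
    return counts


zone_values = [1, 2, 2, 3]   # the single zone from the reproduction, flattened
unique_cats = [1, 2, 3]

print(buggy_counts(zone_values, unique_cats, cat_ids=[2]))  # {2: 3}
print(buggy_counts(zone_values, unique_cats, cat_ids=[3]))  # {3: 4}
```

With every category selected, the same loop returns the correct counts, which is presumably why the problem only shows up when cat_ids is a strict subset.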
Expected behavior
The count for category c should equal the number of cells in the zone whose value is exactly c, regardless of which other categories are listed in cat_ids.
Reproduction
import numpy as np, xarray as xr
from xrspatial import zonal
zones = xr.DataArray(np.ones((2, 2), dtype=int))
values = xr.DataArray(np.array([[1, 2], [2, 3]]))
print(zonal.crosstab(zones=zones, values=values, cat_ids=[2]))
# Returns count 3 for cat 2; correct answer is 2.
print(zonal.crosstab(zones=zones, values=values, cat_ids=[3]))
# Returns count 4 for cat 3; correct answer is 1.
Confirmed on the numpy backend; the same code path runs for dask, cupy, and dask+cupy.
Fix direction
Advance cat_start for every category in unique_cats, not only for selected ones. Equivalently, compute each category's count as zone_cat_breaks[j] - (zone_cat_breaks[j-1] if j > 0 else 0).
Add cross-backend tests covering cat_ids that omit an earlier category.
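A rough sketch of the proposed fix and of the kind of check a test could make, in pure Python so it runs without xrspatial. The name fixed_counts is hypothetical, and bisect_right is an assumed substitute for _strides; the real patch would apply the same change inside _single_zone_crosstab_2d.

```python
from bisect import bisect_right


def fixed_counts(zone_values, unique_cats, cat_ids):
    """Hypothetical sketch: advance cat_start for every category."""
    sorted_values = sorted(zone_values)
    # Cumulative break index per category (assumed equivalent to _strides).
    breaks = [bisect_right(sorted_values, cat) for cat in unique_cats]

    counts = {}
    cat_start = 0
    for j, cat in enumerate(unique_cats):
        if cat in cat_ids:
            counts[cat] = breaks[j] - cat_start
        # Moved out of the branch: skipped categories no longer leak
        # their cells into the next selected category.
        cat_start = breaks[j]
    return counts


zone_values = [1, 2, 2, 3]
unique_cats = [1, 2, 3]

# Each selected category must match a direct count of its cells,
# regardless of which other categories are listed in cat_ids.
for cat_ids in ([2], [3], [1, 3], [1, 2, 3]):
    expected = {c: zone_values.count(c) for c in cat_ids}
    assert fixed_counts(zone_values, unique_cats, cat_ids) == expected

print(fixed_counts(zone_values, unique_cats, [2]))  # {2: 2}
print(fixed_counts(zone_values, unique_cats, [3]))  # {3: 1}
```

The cross-backend tests could follow the same pattern: compare each crosstab column against an independent count of matching cells for several cat_ids subsets that skip an earlier category.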