crosstab(cat_ids=[...]) overcounts when earlier categories are filtered out #2560

@brendancol

Describe the bug

crosstab(zones, values, cat_ids=[...]) returns inflated counts when the cat_ids filter excludes one or more categories that actually appear in the values raster.

The cause is in _single_zone_crosstab_2d in xrspatial/zonal.py (around line 959). The function builds cumulative break indices for every unique category via _strides, but cat_start is only advanced inside the if cat in cat_ids: branch. When an earlier category is filtered out, cat_start is left stale (at 0 if no earlier category was selected), so the next selected category's count also absorbs the cells of the skipped categories: it is computed as zone_cat_breaks[j] - 0 instead of zone_cat_breaks[j] - zone_cat_breaks[j-1].
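The mechanism can be illustrated with a minimal sketch. This is a simplified stand-in written for this report, not the actual xrspatial code: the function name buggy_counts and the use of np.unique to build the break indices are assumptions made for illustration, and only the cat_start handling mirrors the behavior described above.

```python
import numpy as np


def buggy_counts(sorted_vals, cat_ids):
    """Hypothetical stand-in for the counting loop; NOT the real xrspatial code."""
    unique_cats, first_idx = np.unique(sorted_vals, return_index=True)
    # Cumulative end index of each category's run within the sorted values.
    breaks = np.append(first_idx[1:], len(sorted_vals))
    counts = {}
    cat_start = 0
    for j, cat in enumerate(unique_cats):
        if cat in cat_ids:
            counts[int(cat)] = int(breaks[j] - cat_start)
            # The bug: cat_start only moves when a category is selected.
            cat_start = breaks[j]
    return counts


# Values of the single zone from the reproduction, sorted.
vals = np.sort(np.array([1, 2, 2, 3]))
print(buggy_counts(vals, [2]))        # {2: 3}, includes the cell with value 1
print(buggy_counts(vals, [3]))        # {3: 4}, includes every cell in the zone
print(buggy_counts(vals, [1, 2, 3]))  # {1: 1, 2: 2, 3: 1}, correct when nothing is skipped
```

The counts are only wrong when a category preceding a selected one is left out of cat_ids, which is why an unfiltered call does not expose the problem.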

The helper is shared across all four backends (numpy, dask+numpy, cupy, dask+cupy), so every backend produces the same wrong counts.

Expected behavior

The count for category c should equal the number of cells in the zone whose value is exactly c, regardless of which other categories are listed in cat_ids.
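As a point of comparison, the expected counts can be computed directly with a boolean mask. The helper below, reference_counts, is a hypothetical name introduced for this sketch and is not part of xrspatial; it only restates the definition above in plain NumPy.

```python
import numpy as np


def reference_counts(zones, values, zone_id, cat_ids):
    """Brute-force count of cells in one zone whose value equals each category."""
    in_zone = zones == zone_id
    return {c: int(np.count_nonzero(in_zone & (values == c))) for c in cat_ids}


zones_arr = np.ones((2, 2), dtype=int)
values_arr = np.array([[1, 2], [2, 3]])

print(reference_counts(zones_arr, values_arr, 1, [2]))  # {2: 2}
print(reference_counts(zones_arr, values_arr, 1, [3]))  # {3: 1}
```

Each count depends only on the zone mask and the category itself, so the result cannot change with the contents of cat_ids.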

Reproduction

import numpy as np, xarray as xr
from xrspatial import zonal

zones = xr.DataArray(np.ones((2, 2), dtype=int))
values = xr.DataArray(np.array([[1, 2], [2, 3]]))

print(zonal.crosstab(zones=zones, values=values, cat_ids=[2]))
# Returns count 3 for cat 2; correct answer is 2.

print(zonal.crosstab(zones=zones, values=values, cat_ids=[3]))
# Returns count 4 for cat 3; correct answer is 1.

Confirmed on the numpy backend; the same code path runs for dask, cupy, and dask+cupy.

Fix direction

Advance cat_start for every category in unique_cats, not only for selected ones. Equivalently, compute each category's count as zone_cat_breaks[j] - (zone_cat_breaks[j-1] if j > 0 else 0).
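A minimal sketch of that fix direction follows, under the same caveat: fixed_counts is a hypothetical simplification for illustration, and the break indices are built with np.unique rather than the real _strides helper. The only point it demonstrates is that the start offset is taken from the previous break for every category, selected or not.

```python
import numpy as np


def fixed_counts(sorted_vals, cat_ids):
    """Hypothetical simplified loop; NOT the real xrspatial implementation."""
    unique_cats, first_idx = np.unique(sorted_vals, return_index=True)
    # Cumulative end index of each category's run within the sorted values.
    breaks = np.append(first_idx[1:], len(sorted_vals))
    counts = {}
    for j, cat in enumerate(unique_cats):
        # Start offset comes from the previous break, independent of cat_ids.
        start = breaks[j - 1] if j > 0 else 0
        if cat in cat_ids:
            counts[int(cat)] = int(breaks[j] - start)
    return counts


sorted_zone_vals = np.sort(np.array([1, 2, 2, 3]))
print(fixed_counts(sorted_zone_vals, [2]))     # {2: 2}
print(fixed_counts(sorted_zone_vals, [3]))     # {3: 1}
print(fixed_counts(sorted_zone_vals, [1, 3]))  # {1: 1, 3: 1}
```

Deriving the start from the break array instead of a running variable removes the dependence on iteration history, so the count for one category no longer varies with which others are selected.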

Add cross-backend tests covering cat_ids that omit an earlier category.

Labels: bug
