Empty-zone count semantics in zonal.stats: NaN vs 0 #2644

## Reason or Problem

In `xrspatial/zonal.py`, `stats()` computes per-zone summary statistics. For a
zone whose cells are all NaN or all `nodata_values` (an "empty" zone), the numpy
path's `_calc_stats` only calls the stat function when `len(zone_values) > 0`,
so `results[i]` stays NaN for every statistic, including `count`. The cupy and
dask paths match this: an empty zone reports `count` as NaN.

For `mean`, `sum`, `std`, and the others, NaN is a defensible answer to "what is
the mean of no values". For `count` it is awkward. A count is a cardinality (the
number of valid cells in the zone), and the natural value for an empty zone is
`0`, not NaN. Downstream code that filters or sums on counts
(`df[df['count'] > 0]`, `df['count'].sum()`) has to special-case NaN: every
comparison against NaN is false, so empty zones silently drop out of filters,
and any aggregation that does not skip NaN returns NaN.
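
To make the downstream effect concrete, here is a minimal sketch using a hand-built DataFrame as a stand-in for `stats()` output. The frame and its values are invented for illustration; it does not call xrspatial.

```python
import numpy as np
import pandas as pd

# Hypothetical stats() result: zone 2 is empty, so its count is NaN today.
df = pd.DataFrame({"zone": [1, 2, 3], "count": [4.0, np.nan, 2.0]})

# NaN compares False against everything, so the empty zone matches neither filter.
non_empty = df[df["count"] > 0]
empty = df[df["count"] == 0]

# pandas skips NaN when summing, but a plain NumPy sum propagates it.
pandas_total = df["count"].sum()
numpy_total = np.sum(df["count"].to_numpy())

# The NaN also forces a float column; casting it to int raises.
try:
    df["count"].astype(int)
    cast_ok = True
except ValueError:
    cast_ok = False
```

Here `empty` has no rows even though zone 2 contains no valid cells, which is the case a caller looking for empty zones would most likely want to find.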

The current behavior is pinned by tests (`test_stats_all_nan_zone`,
`test_stats_all_nan_zone_preserved`), so changing it is a behavior change, not a
bug fix. This needs an explicit decision and documentation either way.

## Proposal

Two options:

**Option A -- keep NaN for empty zones, document it.** Leave the behavior as is.
Add a paragraph to the `stats()` docstring stating that empty zones report NaN
for every statistic, including `count`. No code or test changes beyond the
docstring.

**Option B (recommended for `count` only) -- empty-zone `count` returns 0.**
Treat `count` as a cardinality. An empty zone reports `count = 0` while
`mean`, `min`, `max`, `sum`, `std`, and `var` stay NaN. Document the rule
explicitly and update the tests that pin `count = NaN` for empty zones to expect
`0`, with a comment and commit message explaining the deliberate change.

**Design (Option B):**
- numpy: `_calc_stats` already detects the empty-zone branch (`len == 0`); set
  the result to 0 when the statistic being computed is `count`.
- cupy: `_stats_cupy` has an explicit `zone_values.size == 0` branch that appends
  `float('nan')` per stat; append 0 for `count`.
- dask: the `count` reducer uses `_nanreduce_preserve_allnan(..., np.nansum)`,
  which forces NaN when all blocks are NaN. Count should instead use plain
  `np.nansum` so an all-empty zone sums to 0 across blocks.
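
The design points above can be sketched in plain NumPy. This is an illustration under assumptions, not xrspatial's code: `calc_zone_stat`, `combine_block_counts`, and `combine_preserving_allnan` are hypothetical names, and the last one is only a stand-in for the behavior the issue attributes to `_nanreduce_preserve_allnan`.

```python
import numpy as np


def calc_zone_stat(zone_values, stat_name, stat_func):
    """Hypothetical per-zone helper; not xrspatial's actual _calc_stats."""
    if len(zone_values) > 0:
        return stat_func(zone_values)
    # Empty zone: a count is a cardinality, so report 0; other stats stay NaN.
    return 0 if stat_name == "count" else np.nan


def combine_block_counts(block_counts):
    """Cross-block reduction for count: NaN blocks contribute nothing."""
    return np.nansum(block_counts, axis=0)


def combine_preserving_allnan(block_values, reducer=np.nansum):
    """Stand-in for an all-NaN-preserving reduction (assumed behavior)."""
    block_values = np.asarray(block_values, dtype=float)
    all_nan = np.all(np.isnan(block_values), axis=0)
    reduced = reducer(block_values, axis=0)
    # Force NaN for zones where every block reported NaN.
    return np.where(all_nan, np.nan, reduced)


# Rows are blocks, columns are zones; the second zone is empty in every block.
per_block = np.array([[3.0, np.nan], [2.0, np.nan]])
count_totals = combine_block_counts(per_block)
preserved_totals = combine_preserving_allnan(per_block)
```

With plain `np.nansum` the all-empty zone sums to 0, while the preserving variant keeps it NaN, which is the distinction the dask bullet relies on.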

**Usage:** No API change. `stats(zones, values, stats_funcs=['count', 'mean'])`
returns `count = 0` and `mean = NaN` for an empty zone.

**Value:** `count` becomes safe to use in numeric filters and aggregations
without special-casing NaN, and the empty-zone semantics are documented rather
than implicit.

## Stakeholders and Impacts

Users of `zonal.stats` who request `count`. Impact is limited to empty zones
(all-NaN or all-nodata). Only `count` changes; other statistics keep NaN.
`crosstab` and `apply` have their own count paths and are out of scope unless
they share the affected code.

## Drawbacks

It is a behavior change. Code that currently checks `isnan(count)` to detect
empty zones would need to check `count == 0` instead. The change would be called
out in a docstring note and a migration comment in the tests.
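
For callers who need to work across both behaviors during a transition, one possible approach is a check that accepts either convention. The helper name `empty_zone_mask` is hypothetical and is not part of xrspatial.

```python
import numpy as np
import pandas as pd


def empty_zone_mask(counts):
    """Return True where a zone has no valid cells, under either convention."""
    counts = pd.Series(counts, dtype="float64")
    # Old behavior marks empty zones with NaN, proposed behavior with 0.
    return counts.isna() | (counts == 0)


old_style = empty_zone_mask([4.0, np.nan, 2.0])
new_style = empty_zone_mask([4, 0, 2])
```

Both calls flag only the middle zone, so code written this way would not need to change when the `count` semantics do.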

## Alternatives

Option A (document-only) avoids the behavior change but leaves `count` as NaN,
which is the awkward value this issue flags.

## Unresolved Questions

Whether to extend the 0-for-empty rule to `crosstab`/`apply` count paths. This
proposal scopes the change to `stats()` only.

## Additional Notes or Context

If during implementation Option B turns out to cascade into `crosstab`/`apply`
or other shared code in a risky way, fall back to Option A (document-only) and
record why in the PR. Either way the docstring must state the empty-zone count
semantics precisely.

