Reason or Problem
In xrspatial/zonal.py, stats() computes per-zone summary statistics. For a
zone whose cells are all NaN or all nodata_values (an "empty" zone), the numpy
path's _calc_stats only calls the stat function when len(zone_values) > 0,
so results[i] stays NaN for every statistic, including count. The cupy and
dask paths match this: an empty zone reports count as NaN.
For mean, sum, std, and the others, NaN is a defensible answer to "what is
the mean of no values". For count it is awkward. A count is a cardinality (the
number of valid cells in the zone), and the natural value for an empty zone is
0, not NaN. Downstream code that filters or sums on counts
(df[df['count'] > 0], df['count'].sum()) breaks or silently drops rows when
the count column carries NaN.
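A minimal sketch of the downstream problem, assuming the stats() result has been collected into a pandas DataFrame. The frame below is hand-built for illustration and is not real stats() output; the column names and values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stats() result: zone 2 is empty, so its count is NaN today.
df = pd.DataFrame({"zone": [1, 2, 3], "count": [4.0, np.nan, 6.0]})

# NaN > 0 evaluates to False, so the empty zone silently drops out of the filter.
nonempty = df[df["count"] > 0]

# NaN == 0 is also False, so a "zero cells" check never finds the empty zone.
empty_by_zero = df[df["count"] == 0]

# sum() skips NaN by default; with skipna=False the total itself becomes NaN.
total_default = df["count"].sum()
total_strict = df["count"].sum(skipna=False)

# Casting the column to an integer dtype fails outright while NaN is present.
try:
    df["count"].astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
```

With count = 0 for the empty zone, the zero check finds it, both sums agree, and the integer cast succeeds.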
The current behavior is pinned by tests (test_stats_all_nan_zone,
test_stats_all_nan_zone_preserved), so changing it is a behavior change, not a
bug fix. This needs an explicit decision and documentation either way.
Proposal
Two options:
Option A -- keep NaN for empty zones, document it. Leave the behavior as is.
Add a paragraph to the stats() docstring stating that empty zones report NaN
for every statistic, including count. No code or test changes beyond the
docstring.
Option B (recommended for count only) -- empty-zone count returns 0.
Treat count as a cardinality. An empty zone reports count = 0 while
mean, min, max, sum, std, and var stay NaN. Document the rule
explicitly and update the tests that pin count = NaN for empty zones to expect
0, with a comment and commit message explaining the deliberate change.
Design (Option B):
- numpy: _calc_stats already detects the empty-zone branch (len == 0); set
  the result to 0 when the statistic being computed is count.
- cupy: _stats_cupy has an explicit zone_values.size == 0 branch that appends
  float('nan') per stat; append 0 for count.
- dask: the count reducer uses _nanreduce_preserve_allnan(..., np.nansum),
  which forces NaN when all blocks are NaN. Count should instead use plain
  np.nansum so an all-empty zone sums to 0 across blocks.
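The rule above can be sketched in plain numpy. The function below is a simplified stand-in written for this proposal, not the real _calc_stats; its name, signature, and the empty_value parameter are hypothetical. The last lines illustrate the dask point: plain np.nansum over per-block counts that are all NaN yields 0.

```python
import numpy as np


def calc_stat_sketch(zones, values, zone_ids, func, empty_value=np.nan):
    """Simplified stand-in for the numpy path (not the real _calc_stats)."""
    results = np.full(len(zone_ids), np.nan)
    for i, zone_id in enumerate(zone_ids):
        zone_values = values[zones == zone_id]
        zone_values = zone_values[~np.isnan(zone_values)]  # keep valid cells only
        if len(zone_values) > 0:
            results[i] = func(zone_values)
        else:
            # Empty zone: 0 for count, NaN for every other statistic.
            results[i] = empty_value
    return results


zones = np.array([1, 1, 2, 2])
values = np.array([1.0, 3.0, np.nan, np.nan])  # zone 2 is empty

counts = calc_stat_sketch(zones, values, [1, 2], len, empty_value=0)
means = calc_stat_sketch(zones, values, [1, 2], np.mean)

# Dask-style reduction: the per-block counts for one empty zone are all NaN.
block_counts = np.array([np.nan, np.nan])
combined = np.nansum(block_counts)  # plain nansum treats NaN as 0
```

Here counts holds 2 and 0 for the two zones, while the mean of the empty zone stays NaN.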
Usage: No API change. stats(zones, values, stats_funcs=['count', 'mean'])
returns count = 0 and mean = NaN for an empty zone.
Value: count becomes safe to use in numeric filters and aggregations
without special-casing NaN, and the empty-zone semantics are documented rather
than implicit.
Stakeholders and Impacts
Users of zonal.stats who request count. Impact is limited to empty zones
(all-NaN or all-nodata). Only count changes; other statistics keep NaN.
crosstab and apply have their own count paths and are out of scope unless
they share the affected code.
Drawbacks
It is a behavior change. Code that currently checks isnan(count) to detect
empty zones would need to check count == 0 instead. The change is accompanied
by a clear docstring note and a migration comment in the tests.
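Callers that must work both before and after the change could accept either representation during the transition. A minimal sketch; is_empty_zone is a hypothetical helper and not part of xrspatial.

```python
import numpy as np


def is_empty_zone(count):
    """Return True where a zone is empty under the old (NaN) or new (0) rule."""
    count = np.asarray(count, dtype=float)
    return np.isnan(count) | (count == 0)


# Counts for three zones: populated, empty (old rule), empty (new rule).
mask = is_empty_zone([3, np.nan, 0])
```

Once the old behavior no longer needs to be supported, the check reduces to count == 0.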
Alternatives
Option A (document-only) avoids the behavior change but leaves count as NaN,
which is the awkward value the finding flags.
Unresolved Questions
Whether to extend the 0-for-empty rule to crosstab/apply count paths. This
proposal scopes the change to stats() only.
Additional Notes or Context
If during implementation Option B turns out to cascade into crosstab/apply
or other shared code in a risky way, fall back to Option A (document-only) and
record why in the PR. Either way the docstring must state the empty-zone count
semantics precisely.