zonal.stats: count returns 0 for empty zones (other stats stay NaN) #2656
Merged
Dedupe duplicate module rows (last-write-wins by last_inspected) and collapse multi-line notes to single physical lines. The notes had embedded newlines, which the merge=union .gitattributes strategy splits record-by-record, corrupting the file into a 156-column phantom row on parallel-agent appends. One line per record keeps union merges safe.
A zone that exists in the zones raster but has no valid values (all NaN, or all equal to nodata_values) is "empty". Previously stats() reported NaN for every statistic of an empty zone, including count, because the stat function was only called when the zone had at least one value.
count is a cardinality: an empty zone has zero valid cells, so its count is 0, not undefined. NaN counts also break downstream numeric code that filters or sums on the count column.
This changes count to 0 for empty zones while every other statistic (mean, min, max, sum, std, var, majority, custom callables) stays NaN, since those are undefined over an empty set. The rule holds across numpy, cupy, and dask backends.
- numpy: _calc_stats takes an empty_zone_value; the count stat passes 0.
- cupy: the size == 0 branch appends 0 for count, NaN otherwise.
- dask: count uses a plain nansum reducer so an all-empty zone totals 0 instead of being forced back to NaN.
Tests that pinned NaN counts for empty zones (test_stats_all_nan_zone, test_stats_all_nan_zone_preserved, test_stats_nodata_wipes_zone) now expect 0, with comments noting the deliberate change. The docstring documents the empty-zone semantics explicitly.
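The empty-zone rule described above can be sketched in plain Python. This is an illustration only, not the xrspatial implementation: zone_stats is a hypothetical helper, and it assumes nodata_values is a single scalar and that a zone's cells arrive as a flat list of floats.

```python
import math


def zone_stats(values, nodata_values=None):
    """Hypothetical helper: statistics for one zone's raw cell values."""
    # Keep only valid cells: drop NaN and anything equal to the nodata value.
    valid = [
        v for v in values
        if not math.isnan(v) and (nodata_values is None or v != nodata_values)
    ]
    if not valid:
        # Empty zone: count is a cardinality, so it is 0.
        # Every other statistic is undefined over no values, so it stays NaN.
        return {"count": 0, "sum": math.nan, "mean": math.nan,
                "min": math.nan, "max": math.nan}
    total = sum(valid)
    return {"count": len(valid), "sum": total, "mean": total / len(valid),
            "min": min(valid), "max": max(valid)}


all_nan = zone_stats([math.nan, math.nan])
all_nodata = zone_stats([-9999.0, -9999.0], nodata_values=-9999.0)
mixed = zone_stats([1.0, math.nan, 3.0])
```

With this shape, code that sums or filters on the count column sees an ordinary integer for empty zones instead of a NaN that propagates.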
brendancol
commented
May 29, 2026
brendancol
left a comment
PR Review: zonal.stats count returns 0 for empty zones
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None. The change is small and the three backend paths line up.
Nits (optional improvements)
xrspatial/zonal.py: _empty_zone_value keys on the literal stat name
'count'. A custom stats_funcs dict whose key happens to be 'count' but
whose callable is not the cardinality counter would also get 0 for empty
zones. That is an unlikely corner and the current behavior is defensible
(a column named count should behave like a count), so this is just worth a
mention, not a change.
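A minimal sketch of the corner case in this nit, assuming the empty-zone value is chosen purely from the dict key. The names empty_zone_value and apply_stats are hypothetical stand-ins for illustration, not the functions in xrspatial/zonal.py.

```python
import math


def empty_zone_value(stat_name):
    # Hypothetical stand-in: the decision looks only at the key,
    # never at what the callable actually computes.
    return 0 if stat_name == "count" else math.nan


def apply_stats(values, stats_funcs):
    """Run each stat on the valid cells, or use the empty-zone value."""
    valid = [v for v in values if not math.isnan(v)]
    return {
        name: func(valid) if valid else empty_zone_value(name)
        for name, func in stats_funcs.items()
    }


def value_range(vals):
    return max(vals) - min(vals)


# The same non-cardinality callable registered under two different keys.
custom = {"count": value_range, "range": value_range}

empty_result = apply_stats([math.nan, math.nan], custom)
filled_result = apply_stats([1.0, 4.0], custom)
```

For the empty zone, the entry keyed 'count' comes back as 0 while the identical callable keyed 'range' comes back as NaN, which is the behavior the nit describes.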
What looks good
- The empty-zone rule is consistent across numpy (_calc_stats empty_zone_value),
  cupy (the size == 0 branch), and dask (the dedicated _count_reduce).
  Verified locally: numpy and dask both return count 0 and mean/sum NaN for an
  all-NaN zone, including with ragged chunks that split a zone across blocks.
- Variance is unaffected: the dask var merge reads the raw per-block count
  stack, not the reduced count, so changing the reduced count to 0 does not
  perturb std/var.
- The dask mean stays NaN for an empty zone because sum is NaN and NaN/0 is NaN.
- The docstring documents the empty-zone semantics precisely, and the three
  tests that pinned NaN counts were updated to 0 with comments explaining the
  deliberate change.
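The dask points above can be illustrated with a small pure-Python model of a cross-block reduction. It assumes each block reports NaN for a zone in which it found no valid cells; that layout and the function names are assumptions made for the sketch, not the actual dask code path.

```python
import math


def count_reduce(per_block_counts):
    """nansum-style reducer: ignore NaN blocks; an all-NaN stack totals 0."""
    return sum(c for c in per_block_counts if not math.isnan(c))


def sum_reduce(per_block_sums):
    """Sum the blocks that saw data; stay NaN if no block contributed."""
    present = [s for s in per_block_sums if not math.isnan(s)]
    return sum(present) if present else math.nan


def mean_from(total, count):
    # Array floating-point division yields NaN for NaN/0, but plain Python
    # floats raise ZeroDivisionError, so the empty case is explicit here.
    return total / count if count else math.nan


# A zone split across two blocks, with valid cells only in the first block.
split_count = count_reduce([2.0, math.nan])
split_mean = mean_from(sum_reduce([4.0, math.nan]), split_count)

# A zone that is empty in every block.
empty_count = count_reduce([math.nan, math.nan])
empty_sum = sum_reduce([math.nan, math.nan])
empty_mean = mean_from(empty_sum, empty_count)
```

The count reducer and the sum reducer deliberately differ: only count collapses an all-empty stack to 0, so sum and mean remain NaN for an empty zone.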
Checklist
- Algorithm matches intent (count is a cardinality; 0 for empty)
- All implemented backends produce consistent results
- NaN handling is correct (other stats stay NaN)
- Edge cases covered by tests (all-NaN zone, all-nodata zone)
- Dask chunk boundaries handled correctly (verified with ragged chunks)
- No premature materialization or unnecessary copies
- Benchmark not needed (no new function, no hot-path change)
- README feature matrix not applicable (no new function)
- Docstring present and accurate
Closes #2644
What this does
zonal.stats() used to report NaN for every statistic of an "empty" zone (one
that exists in the zones raster but has no valid values after filtering NaN and
nodata_values), including count. This makes count return 0 for empty
zones instead, while every other statistic stays NaN.
- count is a cardinality: zero valid cells means a count of 0, not undefined.
- mean, min, max, sum, std, var, majority, and custom callables
  remain NaN for empty zones, since those are undefined over no values.
- stats() docstring now documents the empty-zone rule explicitly.
This is a deliberate behavior change. Three tests pinned the old NaN-count
behavior; they now expect 0, with comments and a commit message explaining
why. Issue #2644 frames the tradeoff (keep NaN vs return 0) and recommends 0
for count specifically.
Backend coverage
numpy, cupy, dask+numpy, dask+cupy. The numpy path passes an empty_zone_value
into _calc_stats; the cupy path handles its size == 0 branch; the dask path
uses a plain nansum count reducer so an all-empty zone totals 0. crosstab and
apply are untouched (they do not share the affected count code).
Test plan
- test_stats_all_nan_zone (all 4 backends): empty-zone count is 0
- test_stats_all_nan_zone_preserved (numpy/cupy): count 0 for all-NaN zone
- test_stats_nodata_wipes_zone (all 4 backends): count 0 for all-nodata zone
- test_zonal.py suite passes (169 passed locally; cupy/dask variants
  skip without CUDA)
Skipped steps
No user-guide notebook and no README feature-matrix row: this refines the
documented behavior of an existing function and adds no new public API or
backend support.