zonal.stats: count returns 0 for empty zones (other stats stay NaN)#2656

Merged: brendancol merged 4 commits into main from issue-2644 on May 29, 2026

@brendancol

Closes #2644

What this does

zonal.stats() used to report NaN for every statistic of an "empty" zone (one
that exists in the zones raster but has no valid values after filtering NaN and
nodata_values), including count. This makes count return 0 for empty
zones instead, while every other statistic stays NaN.

  • count is a cardinality: zero valid cells means a count of 0, not undefined.
  • mean, min, max, sum, std, var, majority, and custom callables
    remain NaN for empty zones, since those are undefined over no values.
  • The stats() docstring now documents the empty-zone rule explicitly.
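
A minimal sketch of the empty-zone rule described above, written against plain NumPy arrays. It is illustrative only and is not xrspatial's implementation; the helper name zone_stats_sketch and the sample arrays are assumptions made for this example.

```python
import numpy as np


def zone_stats_sketch(zones, values, zone_id, nodata=None):
    """Illustrative only: count is 0 for an empty zone, other stats are NaN."""
    cells = values[zones == zone_id].astype(float)
    valid = cells[~np.isnan(cells)]          # drop NaN cells
    if nodata is not None:
        valid = valid[valid != nodata]       # drop nodata cells
    if valid.size == 0:
        # Empty zone: count is a cardinality (0); mean and sum are undefined.
        return {"count": 0, "mean": np.nan, "sum": np.nan}
    return {
        "count": int(valid.size),
        "mean": float(valid.mean()),
        "sum": float(valid.sum()),
    }


zones = np.array([[1, 1, 2],
                  [1, 2, 2]])
values = np.array([[1.0, 3.0, np.nan],
                   [5.0, np.nan, -9999.0]])

full = zone_stats_sketch(zones, values, 1, nodata=-9999.0)
empty = zone_stats_sketch(zones, values, 2, nodata=-9999.0)
```

Zone 2 exists in the zones array, but every one of its cells is either NaN or the nodata value, so it is the "empty" case.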

This is a deliberate behavior change. Three tests pinned the old NaN-count
behavior; they now expect 0, with comments and a commit message explaining
why. Issue #2644 frames the tradeoff (keep NaN vs return 0) and recommends 0
for count specifically.

Backend coverage

Covers numpy, cupy, dask+numpy, and dask+cupy. The numpy path passes an
empty_zone_value into _calc_stats; the cupy path's size == 0 branch appends 0
for count and NaN for everything else; the dask path uses a plain nansum count
reducer so an all-empty zone totals 0. crosstab and apply are untouched (they
do not share the affected count code).
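
To see why a plain nansum reducer yields 0 for an all-empty zone, here is a small sketch. The layout of the per-block counts (one row per chunk, one column per zone, NaN where a block has no valid cells for a zone) is an assumption made for illustration, not a description of the actual dask internals.

```python
import numpy as np

# Hypothetical per-block counts: 4 chunks (rows) x 3 zones (columns).
# NaN means "this block contributed no valid cells to this zone".
per_block_counts = np.array([
    [2.0,    np.nan, np.nan],
    [1.0,    4.0,    np.nan],
    [np.nan, 3.0,    np.nan],
    [np.nan, np.nan, np.nan],
])

# np.nansum treats NaN as zero, so a column that is NaN in every block
# (an empty zone) reduces to 0 rather than NaN.
total_counts = np.nansum(per_block_counts, axis=0)
```

The third zone has no valid cells in any block, and its reduced count comes out as 0.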

Test plan

  • test_stats_all_nan_zone (all 4 backends): empty-zone count is 0
  • test_stats_all_nan_zone_preserved (numpy/cupy): count 0 for all-NaN zone
  • test_stats_nodata_wipes_zone (all 4 backends): count 0 for all-nodata zone
  • full test_zonal.py suite passes (169 passed locally; cupy/dask variants
    skip without CUDA)

Skipped steps

No user-guide notebook and no README feature-matrix row: this refines the
documented behavior of an existing function and adds no new public API or
backend support.

Dedupe duplicate module rows (last-write-wins by last_inspected) and
collapse multi-line notes to single physical lines. The notes had
embedded newlines, which the merge=union .gitattributes strategy splits
record-by-record, corrupting the file into a 156-column phantom row on
parallel-agent appends. One line per record keeps union merges safe.
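
A rough sketch of that normalization, assuming each record is a dict with module, last_inspected (an ISO date string), and notes fields. The field names other than last_inspected and the sample data are hypothetical; the real file format is not shown here.

```python
def normalize_rows(rows):
    """Keep the most recently inspected row per module; flatten notes to one line."""
    latest = {}
    for row in rows:
        kept = latest.get(row["module"])
        # Last write wins: a later (or equal) last_inspected replaces the kept row.
        if kept is None or row["last_inspected"] >= kept["last_inspected"]:
            latest[row["module"]] = row
    # Collapse any run of whitespace, including newlines, to a single space
    # so each record occupies exactly one physical line when written out.
    return [
        {**row, "notes": " ".join(row["notes"].split())}
        for row in latest.values()
    ]


rows = [
    {"module": "zonal", "last_inspected": "2026-05-01", "notes": "old entry"},
    {"module": "slope", "last_inspected": "2026-05-10", "notes": "ok"},
    {"module": "zonal", "last_inspected": "2026-05-20", "notes": "count fix\nsee issue"},
]
cleaned = normalize_rows(rows)
```

ISO date strings compare correctly as plain strings, which is why the sketch does not parse them.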
A zone that exists in the zones raster but has no valid values (all NaN,
or all equal to nodata_values) is "empty". Previously stats() reported
NaN for every statistic of an empty zone, including count, because the
stat function was only called when the zone had at least one value.

count is a cardinality: an empty zone has zero valid cells, so its count
is 0, not undefined. NaN counts also break downstream numeric code that
filters or sums on the count column. This changes count to 0 for empty
zones while every other statistic (mean, min, max, sum, std, var,
majority, custom callables) stays NaN, since those are undefined over an
empty set. The rule holds across numpy, cupy, and dask backends.
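
The downstream breakage mentioned above can be shown with plain Python floats. The count columns below are made-up sample data, not output from zonal.stats().

```python
import math

# Hypothetical count column for three zones; the middle zone is empty.
nan_counts = [4.0, math.nan, 2.0]   # old behavior: empty zone reported as NaN
zero_counts = [4, 0, 2]             # new behavior: empty zone reported as 0

# Summing: a single NaN poisons the total.
total_old = sum(nan_counts)
total_new = sum(zero_counts)

# Filtering for empty zones: NaN never compares equal to 0,
# so the empty zone is silently missed under the old behavior.
empty_old = [c == 0 for c in nan_counts]
empty_new = [c == 0 for c in zero_counts]
```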

- numpy: _calc_stats takes an empty_zone_value; the count stat passes 0.
- cupy: the size==0 branch appends 0 for count, NaN otherwise.
- dask: count uses a plain nansum reducer so an all-empty zone totals 0
  instead of being forced back to NaN.

Tests that pinned NaN counts for empty zones (test_stats_all_nan_zone,
test_stats_all_nan_zone_preserved, test_stats_nodata_wipes_zone) now
expect 0, with comments noting the deliberate change. The docstring
documents the empty-zone semantics explicitly.

@github-actions (bot) added the "performance" label (PR touches performance-sensitive code) on May 29, 2026

@brendancol left a comment

PR Review: zonal.stats count returns 0 for empty zones

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None. The change is small and the three backend paths line up.

Nits (optional improvements)

  • xrspatial/zonal.py: _empty_zone_value keys on the literal stat name
    'count'. A custom stats_funcs dict whose key happens to be 'count' but
    whose callable is not the cardinality counter would also get 0 for empty
    zones. That is an unlikely corner case and the current behavior is
    defensible (a column named count should behave like a count), so this is
    worth a mention, not a change.
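
The corner case in this nit can be sketched as follows. The function empty_zone_value_sketch is a hypothetical stand-in for a name-keyed lookup; it does not reproduce the signature or body of _empty_zone_value.

```python
import math


def empty_zone_value_sketch(stat_name):
    # Hypothetical stand-in: decides by the stat's name, not by its callable.
    return 0 if stat_name == "count" else math.nan


# A custom stats_funcs dict where 'count' is NOT a cardinality counter.
stats_funcs = {
    "count": lambda values: len(set(values)),        # number of distinct values
    "range": lambda values: max(values) - min(values),
}

# What each stat would report for an empty zone under a name-keyed rule.
empty_results = {name: empty_zone_value_sketch(name) for name in stats_funcs}
```

The custom 'count' callable is never consulted for the empty zone; it receives 0 purely because of its key.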

What looks good

  • The empty-zone rule is consistent across numpy (_calc_stats empty_zone_value),
    cupy (the size == 0 branch), and dask (the dedicated _count_reduce).
    Verified locally: numpy and dask both return count 0 and mean/sum NaN for an
    all-NaN zone, including with ragged chunks that split a zone across blocks.
  • Variance is unaffected: the dask var merge reads the raw per-block count
    stack, not the reduced count, so changing the reduced count to 0 does not
    perturb std/var.
  • The dask mean stays NaN for an empty zone because sum is NaN and NaN/0 is NaN.
  • The docstring documents the empty-zone semantics precisely, and the three
    tests that pinned NaN counts were updated to 0 with comments explaining the
    deliberate change.
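
The NaN/0 point about the dask mean can be checked with a small NumPy sketch. The reduced sums and counts below are made-up values for two zones (one populated, one empty), not output from the dask path.

```python
import numpy as np

# Hypothetical reduced totals per zone: zone 0 has data, zone 1 is empty.
sums = np.array([9.0, np.nan])    # sum stays NaN for the empty zone
counts = np.array([3.0, 0.0])     # count is now 0 for the empty zone

# NumPy follows IEEE 754 here: NaN / 0 is NaN, not an error or infinity.
# errstate only silences any floating-point warnings; it does not change results.
with np.errstate(divide="ignore", invalid="ignore"):
    means = sums / counts
```

Note that this relies on NumPy semantics; with built-in Python floats, dividing by zero raises ZeroDivisionError instead.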

Checklist

  • Algorithm matches intent (count is a cardinality; 0 for empty)
  • All implemented backends produce consistent results
  • NaN handling is correct (other stats stay NaN)
  • Edge cases covered by tests (all-NaN zone, all-nodata zone)
  • Dask chunk boundaries handled correctly (verified with ragged chunks)
  • No premature materialization or unnecessary copies
  • Benchmark not needed (no new function, no hot-path change)
  • README feature matrix not applicable (no new function)
  • Docstring present and accurate

@brendancol merged commit 7c8a548 into main on May 29, 2026
7 checks passed
Labels

performance (PR touches performance-sensitive code)

Development

Successfully merging this pull request may close these issues.

Empty-zone count semantics in zonal.stats: NaN vs 0
