Raise clear ValueError for empty Dataset in stats() #2642

Merged: brendancol merged 4 commits into main from issue-2637 on May 29, 2026

@brendancol

Contributor

Closes #2637

stats() accepts an xarray Dataset, computes statistics for each data variable, then merges the results. If the Dataset has no data variables, the merge step evaluated result = dfs[0] on an empty list and raised an opaque IndexError that gave the caller no hint about the cause.

This adds an early check in the Dataset branch: when values.data_vars is empty, raise a ValueError saying there is nothing to compute statistics over, before reaching the dfs[0] access.

  • Guard the empty-Dataset case in the stats() Dataset branch with a clear ValueError.
  • Add a regression test that passes an empty Dataset and asserts ValueError.
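
The guard described above can be sketched roughly as follows. This is a minimal stand-in and not the code in xrspatial/zonal.py: stats_over_dataset, per_variable_max, and the SimpleNamespace object are hypothetical names used for illustration, and the error text is only modelled on the "no data variables" wording mentioned in the review.

```python
from types import SimpleNamespace


def stats_over_dataset(values, compute_one):
    """Hypothetical stand-in for the Dataset branch of stats()."""
    # Guard: fail early with a clear message instead of letting dfs[0]
    # raise IndexError on an empty list further down.
    if len(values.data_vars) == 0:
        raise ValueError(
            "Dataset has no data variables; there is nothing to compute "
            "statistics over. Pass a Dataset with at least one data variable."
        )
    # One result per data variable (plain dicts stand in for DataFrames).
    dfs = [compute_one(name, var) for name, var in values.data_vars.items()]
    result = dfs[0]  # safe: dfs is non-empty past the guard
    for df in dfs[1:]:
        result = {**result, **df}
    return result


def per_variable_max(name, var):
    # Hypothetical per-variable statistic used only for this sketch.
    return {f"{name}_max": max(var)}


# An object with an empty data_vars mapping stands in for an empty Dataset.
empty = SimpleNamespace(data_vars={})
message = None
try:
    stats_over_dataset(empty, per_variable_max)
except ValueError as exc:
    message = str(exc)
```

Because the check sits before the list of per-variable results is built, the empty case never reaches the dfs[0] access.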

Backend coverage: the Dataset branch is shared by all backends and the guard runs before any backend dispatch, so numpy / cupy / dask+numpy / dask+cupy are all covered.

Test plan:

  • New test test_stats_empty_dataset_raises_value_error_2637 passes
  • Existing Dataset return_type test still passes
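
A regression test along these lines might look like the sketch below. It uses the standard-library unittest module and a hypothetical stats_stub in place of the real stats() call and fixture, so it shows only the shape of the assertion, not the test added in this PR.

```python
import unittest
from types import SimpleNamespace


def stats_stub(values):
    # Hypothetical stand-in for the guarded Dataset branch of stats().
    if len(values.data_vars) == 0:
        raise ValueError("Dataset has no data variables")
    return list(values.data_vars)


class EmptyDatasetTest(unittest.TestCase):
    def test_empty_dataset_raises_value_error(self):
        empty = SimpleNamespace(data_vars={})
        # Matching on the exception type and the message text means a bare
        # IndexError from dfs[0] would not satisfy the test.
        with self.assertRaisesRegex(ValueError, "no data variables"):
            stats_stub(empty)


# Run the single test case programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EmptyDatasetTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```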

Dedupe duplicate module rows (last-write-wins by last_inspected) and
collapse multi-line notes to single physical lines. The notes had
embedded newlines, which the merge=union .gitattributes strategy splits
record-by-record, corrupting the file into a 156-column phantom row on
parallel-agent appends. One line per record keeps union merges safe.
@github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 29, 2026

@brendancol left a comment

Contributor Author

PR Review: Raise clear ValueError for empty Dataset in stats()

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • xrspatial/zonal.py:861 -- len(values.data_vars) == 0 works and is clear. if not values.data_vars: would be slightly more idiomatic since the data_vars mapping is falsy when empty, but the explicit length check reads fine and matches the surrounding style. Take it or leave it.
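
To illustrate the nit, the sketch below compares the two spellings on a small read-only mapping. DataVars is a hypothetical class written for this example, assuming only that data_vars behaves like a standard Mapping whose truthiness falls back to __len__; it is not the xarray type.

```python
from collections.abc import Mapping


class DataVars(Mapping):
    """Hypothetical read-only mapping standing in for Dataset.data_vars."""

    def __init__(self, variables):
        self._variables = dict(variables)

    def __getitem__(self, key):
        return self._variables[key]

    def __iter__(self):
        return iter(self._variables)

    def __len__(self):
        return len(self._variables)


def is_empty_explicit(data_vars):
    # The form used in the PR: an explicit length check.
    return len(data_vars) == 0


def is_empty_idiomatic(data_vars):
    # The suggested form: Mapping defines no __bool__, so truthiness
    # falls back to __len__ and an empty mapping is falsy.
    return not data_vars


empty = DataVars({})
populated = DataVars({"elevation": [1.0, 2.0]})

# Both spellings agree for the empty and the populated mapping.
agreement = [
    is_empty_explicit(dv) == is_empty_idiomatic(dv)
    for dv in (empty, populated)
]
```

Both forms give the same answer for a mapping like this, so the choice is purely stylistic.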

What looks good

  • The guard is in the right place: after the return_type check and before the dfs[0] access, so the empty case is caught early with a message that names the actual problem (no data variables).
  • The error message tells the caller what to do (pass a Dataset with at least one data variable), not just what went wrong.
  • The regression test reuses the existing small_zones_values_2558 fixture and asserts on the "no data variables" text, so the old IndexError would not satisfy it.
  • Scope is tight: two files, no unrelated edits.

Checklist

  • Algorithm matches reference/paper: n/a (input validation fix)
  • All implemented backends produce consistent results: yes, guard runs before backend dispatch
  • NaN handling is correct: n/a
  • Edge cases are covered by tests: yes, empty Dataset is the case in question
  • Dask chunk boundaries handled correctly: n/a
  • No premature materialization or unnecessary copies: yes, the data_vars length check is cheap
  • Benchmark exists or is not needed: not needed
  • README feature matrix updated: not needed, no new public API
  • Docstrings present and accurate: stats() docstring unchanged, still accurate

@brendancol merged commit cbae648 into main on May 29, 2026
7 checks passed
Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

stats() on an empty Dataset raises an opaque IndexError
