Note: This gap analysis was performed using an AI agent comparing upstream DataFusion v53 documentation against the current datafusion-python codebase.
Summary
Several DataFrame methods from upstream DataFusion v53 are not yet exposed in datafusion-python. This issue covers set operations and query-related methods.
Missing Methods
Set operations:
- distinct_on: deduplicate rows based on specific columns, keeping the first row per group
- except_distinct: set difference with deduplication (complement to existing except_all)
- intersect_distinct: set intersection with deduplication (complement to existing intersect)
- union_by_name: union two DataFrames, matching columns by name rather than position
- union_by_name_distinct: union by name with deduplication
Query/display:
- explain_with_options: explain plan with configurable detail options
- show_limit: display results with a custom row limit
- sort_by: sort by column names (simpler API than sort, which requires Expr)
- with_param_values: bind parameter values for prepared statements
Upstream Reference
Implementation
- Rust bindings: crates/core/src/dataframe.rs
- Python wrappers: python/datafusion/dataframe.py
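To make the intended behavior of the set operations listed under Missing Methods concrete, here is a minimal pure-Python sketch of their semantics as described in this issue. It does not use datafusion-python or the upstream API: the Table type and the helper functions are hypothetical stand-ins, rows are plain tuples, and both inputs to union_by_name are assumed to have the same set of column names. Row order is kept deterministic here only to make the example easy to check; a real query engine may not guarantee it.

```python
from typing import Any, NamedTuple


class Table(NamedTuple):
    """Hypothetical stand-in for a DataFrame: a schema plus positional rows."""
    columns: list[str]
    rows: list[tuple[Any, ...]]


def dedupe(rows: list[tuple[Any, ...]]) -> list[tuple[Any, ...]]:
    # dict.fromkeys keeps the first occurrence of each row, in order.
    return list(dict.fromkeys(rows))


def distinct_on(t: Table, on: list[str]) -> Table:
    # Keep the first row seen for each distinct value of the `on` columns.
    idx = [t.columns.index(c) for c in on]
    first: dict[tuple[Any, ...], tuple[Any, ...]] = {}
    for row in t.rows:
        first.setdefault(tuple(row[i] for i in idx), row)
    return Table(t.columns, list(first.values()))


def except_distinct(a: Table, b: Table) -> Table:
    # Rows of `a` that do not appear in `b`, with duplicates removed.
    exclude = set(b.rows)
    return Table(a.columns, [r for r in dedupe(a.rows) if r not in exclude])


def intersect_distinct(a: Table, b: Table) -> Table:
    # Rows present in both inputs, with duplicates removed.
    keep = set(b.rows)
    return Table(a.columns, [r for r in dedupe(a.rows) if r in keep])


def union_by_name(a: Table, b: Table) -> Table:
    # Reorder b's values to follow a's column names, then concatenate.
    # Assumption: both tables have the same set of column names.
    idx = [b.columns.index(c) for c in a.columns]
    realigned = [tuple(row[i] for i in idx) for row in b.rows]
    return Table(a.columns, a.rows + realigned)


def union_by_name_distinct(a: Table, b: Table) -> Table:
    u = union_by_name(a, b)
    return Table(u.columns, dedupe(u.rows))


a = Table(["id", "name"], [(1, "x"), (1, "y"), (2, "z"), (2, "z")])
b = Table(["name", "id"], [("z", 2), ("w", 3)])  # same columns, different order
c = Table(["id", "name"], [(2, "z")])

print(distinct_on(a, ["id"]).rows)
print(except_distinct(a, c).rows)
print(intersect_distinct(a, c).rows)
print(union_by_name_distinct(a, b).rows)
```

The union_by_name example is the one that differs most from a positional union: b stores its columns as (name, id), so a positional union would pair names with ids, while the by-name version realigns them before combining.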