fix(substrait): preserve window function semantics#22664
Open
bvolpato wants to merge 1 commit into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
Substrait logical-plan conversion can currently return a valid window plan with
different semantics from its input. On current
main(496f2c2065c0), around trip of:
changes expected output
[A], [A], [C], [C], [E]to[A], [A, NULL], [NULL, C], [C, NULL], [NULL, E].Likewise,
window.slt:909uses finite intervalRANGEbounds; round tripchanges bounded counts such as
6, 2, 1into partition-wide counts8, 8, 8.Returning changed results is worse than reporting unsupported conversion.
What changes are included in this PR?
DISTINCTthrough SubstraitAggregationInvocation.FILTER, which the currentSubstrait window expression path does not encode.
offsets instead of widening them to
UNBOUNDED.DISTINCTand fail-closed coverage forIGNORE NULLS,RESPECT NULLS,FILTER, and intervalRANGEframes.Are these changes tested?
cargo fmt --allcargo clippy --all-targets --all-features -- -D warningscargo test -p datafusion-substraitPre-fix reproduction:
cargo test -p datafusion-sqllogictest --test sqllogictests --features substrait -- --substrait-round-trip array_agg_sliding_window.slt:150 window.slt:909array_agg_sliding_window.slt:150produced null-containing arrays despiteIGNORE NULLS.window.slt:909produced partition-wide counts for finite interval bounds.Post-fix targeted checks:
Window functions with IGNORE NULLS are not supported in Substrait.Unsupported Substrait window frame offset: 1 DAY.window.slt:909also runs unrelated existingEXPLAINrecords inthat fixture, which report separate plan-output mismatches after the target
conversion error.
Are there any user-facing changes?
Substrait producers now return explicit unsupported-feature errors for window
null treatment, window
FILTER, and finite non-integer window-frame offsetsinstead of producing plans with altered semantics. Window
DISTINCTnowround-trips correctly.