fix: Correct join cardinality estimation for semi and anti joins with disjoint column ranges#22674

Merged
Dandandan merged 1 commit into
apache:mainfrom
neilconway:neilc/fix-semi-join-disjoint-columns
Jun 1, 2026

@neilconway neilconway commented May 31, 2026

Which issue does this PR close?

Rationale for this change

For semi-joins, estimate_join_cardinality checks whether ANY pair of columns from the two join inputs has disjoint ranges (pairing columns positionally); if so, it estimates that the join returns no rows. This is wrong, for two reasons:

  1. If two columns don't participate in the join key, they have no impact on the cardinality of the join result
  2. Comparing arbitrary columns positionally is not a sensible thing to do in the first place

A similar issue exists for anti-joins, except we assume the anti-join will return the entire join input in this case.

We should instead just check for disjoint ranges between the pairs of columns that make up the join key.
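
To make the difference concrete, here is a minimal sketch of the two checks. It is not the DataFusion implementation: `ColRange`, `any_positional_disjoint`, `any_key_pair_disjoint`, and `semi_anti_shortcut` are hypothetical names invented for illustration, and the statistics are reduced to integer min/max bounds.

```rust
/// Hypothetical min/max statistics for one column.
/// (An assumption for this sketch, not DataFusion's statistics type.)
#[derive(Clone, Copy, Debug)]
struct ColRange {
    min: i64,
    max: i64,
}

/// Two closed ranges are disjoint when one ends before the other begins.
fn ranges_disjoint(a: ColRange, b: ColRange) -> bool {
    a.max < b.min || b.max < a.min
}

/// The behavior described as buggy: pair columns up by position,
/// whether or not they take part in the join key.
fn any_positional_disjoint(left: &[ColRange], right: &[ColRange]) -> bool {
    left.iter()
        .zip(right.iter())
        .any(|(l, r)| ranges_disjoint(*l, *r))
}

/// The proposed behavior: only look at the (left index, right index)
/// pairs that make up the join key.
fn any_key_pair_disjoint(
    left: &[ColRange],
    right: &[ColRange],
    on: &[(usize, usize)],
) -> bool {
    on.iter().any(|&(l, r)| ranges_disjoint(left[l], right[r]))
}

/// Shortcut estimate when the key ranges are known to be disjoint:
/// a semi-join keeps no rows, an anti-join keeps every outer row.
/// Returns `None` when no shortcut applies and a regular estimate is needed.
fn semi_anti_shortcut(
    outer_rows: usize,
    is_anti: bool,
    key_disjoint: bool,
) -> Option<usize> {
    if !key_disjoint {
        return None;
    }
    Some(if is_anti { outer_rows } else { 0 })
}
```

With a left input whose columns span [1, 100] and [0, 10], a right input whose columns span [50, 150] and [1000, 2000], and a join on the first column of each side, the positional check reports a disjoint pair (the second columns) even though the key ranges overlap. The key-only check reports no disjoint pair, so the shortcut does not fire.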

What changes are included in this PR?

  • Fix estimate_join_cardinality so that disjoint ranges in columns that are not part of the join key no longer affect the estimate
  • Refactor estimate_join_cardinality and rename a variable for clarity
  • Add unit test

Are these changes tested?

Yes, new test added.

Are there any user-facing changes?

Better plans: avoids an incorrect cardinality estimate for semi and anti joins.

@github-actions (bot) added the physical-plan label (Changes to the physical-plan crate) May 31, 2026
@nathanb9
Contributor

Nice correctness fix 🚀

@Dandandan Dandandan added this pull request to the merge queue Jun 1, 2026
Merged via the queue into apache:main with commit d167450 Jun 1, 2026
39 checks passed

Development

Successfully merging this pull request may close these issues.

estimate_join_cardinality for semi-joins is buggy for non-join-key disjoint columns

3 participants