bench: add correlated-proxy case to the predicate_eval suite #22919

Open
adriangb wants to merge 1 commit into apache:main from pydantic:predicate-eval-correlated-case

@adriangb

Contributor

Which issue does this PR close?

Rationale for this change

The correlation subgroup's existing cases (q70–q72) use two predicates of equal
cost and equal selectivity. For two conjuncts the evaluation cost of an order is
cost(first) + selectivity(first) × cost(second), which is symmetric here — so
the two orders cost the same and correlation only affects the result cardinality.
These cases measure the overhead of an ordering system, but give it no
opportunity: nothing in the suite rewards (or even detects) correlation-aware
ordering.
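
To make the symmetry concrete, here is a minimal sketch of the two-conjunct cost formula above, generalised to a list of conjuncts. The function name `order_cost` and the unit costs are illustrative assumptions, not part of the benchmark suite.

```python
def order_cost(preds):
    """Expected per-row cost of evaluating conjuncts in the given order.

    preds: list of (cost, selectivity) pairs, where selectivity is the
    fraction of rows that pass. Assumes the predicates are independent.
    """
    total, surviving = 0.0, 1.0
    for cost, selectivity in preds:
        total += surviving * cost   # only surviving rows pay for this predicate
        surviving *= selectivity
    return total


# Equal cost and equal selectivity, as in q70-q72 (values are illustrative).
a = (1.0, 0.3)
b = (1.0, 0.3)
assert order_cost([a, b]) == order_cost([b, a])  # symmetric: no order can win

# Contrast: once costs differ, the order matters.
cheap, costly = (0.1, 0.3), (1.0, 0.3)
assert order_cost([cheap, costly]) < order_cost([costly, cheap])
```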

This adds a case with real, measurable headroom that only joint statistics can
find. A cheap integer predicate (c0 = 1, ~30% selectivity) is a perfect proxy for three
string regexes on s1; a fourth regex on s2 has the same ~30% selectivity and
similar cost but is independent. Marginally, the four regexes are
indistinguishable in any position. Conditionally — behind the proxy — the three
s1 regexes keep every survivor while the s2 regex still discards ~70%.

The query is written in the natural-but-pessimal order (the redundant regexes
grouped with their proxy, the informative one last). On an M-series laptop the
written order runs ~1.9x slower than the hand-optimal order [c0, s2-regex, s1-regexes...] (16.4 ms vs 8.6 ms per iteration), so:

  • an ordering system using marginal per-predicate statistics (or an
    independence assumption) is blind to the difference — every ranking of the four
    regexes looks equivalent;
  • a system measuring the predicates' joint behaviour can reliably recover that ~1.9x.
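
The gap between the two views can be illustrated with a toy cost model. This is a sketch under stated assumptions, not a measurement: the predicate names, the unit regex cost, and the small integer-predicate cost are all hypothetical, and rows are modelled as two independent events (A: c0 = 1 and the s1 regexes match; B: the s2 regex matches), each with probability 0.3.

```python
from itertools import permutations

P = 0.3            # ~30% selectivity, from the description above
REGEX_COST = 1.0   # assumed unit cost for every regex
INT_COST = 0.05    # assumed cost of the cheap integer predicate

# Hypothetical predicate names -> (cost, event the predicate tests).
PREDS = {
    "c0":    (INT_COST, "A"),
    "s1_r1": (REGEX_COST, "A"),
    "s1_r2": (REGEX_COST, "A"),
    "s1_r3": (REGEX_COST, "A"),
    "s2_r":  (REGEX_COST, "B"),
}


def marginal_cost(order):
    """Cost as seen by a model that assumes independent predicates."""
    total, surviving = 0.0, 1.0
    for name in order:
        total += surviving * PREDS[name][0]
        surviving *= P
    return total


def joint_cost(order):
    """Cost when re-testing an already-tested event filters nothing further."""
    total, seen = 0.0, set()
    for name in order:
        cost, event = PREDS[name]
        total += (P ** len(seen)) * cost
        seen.add(event)
    return total


written = ["c0", "s1_r1", "s1_r2", "s1_r3", "s2_r"]
optimal = ["c0", "s2_r", "s1_r1", "s1_r2", "s1_r3"]

# The independence-based model cannot tell the two orders apart.
assert marginal_cost(written) == marginal_cost(optimal)

# The joint model sees the difference, and no permutation beats `optimal`.
ratio = joint_cost(written) / joint_cost(optimal)
best = min(joint_cost(list(p)) for p in permutations(PREDS))
print(f"written/optimal cost ratio in the toy model: {ratio:.2f}")
```

With these assumed costs the toy model puts the written order at roughly 2x the hand-optimal one, in the same ballpark as the ~1.9x measured above; the exact figure depends on the assumed cost of the integer predicate.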

What changes are included in this PR?

  • load/corrproxy.sql — the correlated-proxy dataset (deterministic, generated
    from generate_series like the existing datasets; PRED_ROWS/PRED_FILL
    knobs as elsewhere).
  • queries/correlation/q73.sql, benchmarks/correlation/q73.benchmark — the new
    case, following the suite's existing conventions.

Run with: BENCH_NAME=predicate_eval BENCH_SUBGROUP=correlation cargo bench --bench sql

Are these changes tested?

The suite's shared template asserts the query returns rows; the case runs green
locally alongside q70–q72.

Are there any user-facing changes?

No — benchmark-only.

🤖 Generated with Claude Code

The correlation subgroup's existing cases (q70-q72) use two predicates of
equal cost and equal selectivity, so the two orders cost the same and
correlation only affects the result cardinality - no ordering system can win
or lose on them. They measure overhead, not opportunity.

Add q73: a cheap integer predicate that is a perfect proxy for three string
regexes, plus one independent regex of the same ~30% selectivity and similar
cost. Marginal statistics cannot tell the four regexes apart in any position;
their joint distribution with the proxy is what matters. Written in the
natural-but-pessimal order (redundant regexes grouped with their proxy), the
query runs ~1.9x slower than the hand-optimal order [c0, s2, s1...] on an
M-series laptop, so a correlation-aware ordering system has real, measurable
headroom here while an independence-assuming one is blind to it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
adriangb added a commit to pydantic/datafusion that referenced this pull request Jun 12, 2026
…posal

Marginal per-conjunct statistics are blind to correlation: arrangements with
very different costs can be statistically identical (the new predicate_eval
correlation_q73 case has ~1.9x headroom invisible to any independence-based
ranking), and the fused-vs-compact-once strategy difference is invisible to
per-conjunct numbers entirely.

Borrow the exploration idea from DuckDB's AdaptiveFilter (keep-or-revert
timing of random swaps, src/execution/adaptive_filter.cpp): when a measuring
window ends with nothing material to propose, occasionally put a random
adjacent swap of the incumbent through the existing shared paired A/B trial
instead of freezing. Each position carries a likelihood (halved when a swap
there loses its trial, restored to 100 when one wins), so exploration of
barren positions decays geometrically on top of the re-thaw backoff. The
candidate bypasses the model gates by design — it exists because the model
cannot see it — but adoption still requires the same measured,
confidence-separated end-to-end win as any other proposal, which is a
stronger keep-or-revert rule than DuckDB's strict mean comparison.
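
The likelihood bookkeeping described above can be sketched as follows. This is a hypothetical illustration, not the DataFusion or DuckDB implementation: the class `SwapExplorer`, the floor of 1 on the likelihood, and the `toy_cost` comparison standing in for the paired A/B trial are all assumptions, and the measuring-window trigger and re-thaw backoff are omitted.

```python
import random


class SwapExplorer:
    """Likelihood-gated random adjacent swaps with keep-or-revert."""

    def __init__(self, order, rng):
        self.order = list(order)
        # One likelihood (in percent) per adjacent pair (i, i + 1).
        self.likelihood = [100] * (len(self.order) - 1)
        self.rng = rng

    def propose(self):
        """Pick a random position; return it, or None if the gate declines."""
        i = self.rng.randrange(len(self.likelihood))
        return i if self.rng.randrange(100) < self.likelihood[i] else None

    def candidate(self, i):
        """The incumbent order with positions i and i + 1 swapped."""
        cand = list(self.order)
        cand[i], cand[i + 1] = cand[i + 1], cand[i]
        return cand

    def report(self, i, won):
        """Keep the swap and restore the likelihood on a win; halve it on a loss."""
        if won:
            self.order = self.candidate(i)
            self.likelihood[i] = 100
        else:
            self.likelihood[i] = max(1, self.likelihood[i] // 2)  # floor is an assumption


def toy_cost(order):
    """Assumed cost model: c0 and the s1 regexes test event A, s2 tests event B."""
    events = {"c0": "A", "s1_a": "A", "s1_b": "A", "s1_c": "A", "s2": "B"}
    total, seen = 0.0, set()
    for name in order:
        total += (0.3 ** len(seen)) * (0.05 if name == "c0" else 1.0)
        seen.add(events[name])
    return total


explorer = SwapExplorer(["c0", "s1_a", "s1_b", "s1_c", "s2"], random.Random(0))
for _ in range(10_000):  # generous budget: the point is the mechanism, not the speed
    i = explorer.propose()
    if i is not None:
        # Stand-in for the trial: a strict win is required, so ties count as losses.
        explorer.report(i, toy_cost(explorer.candidate(i)) < toy_cost(explorer.order))
print(explorer.order, explorer.likelihood)
```

In this toy setting only swaps that move the s2 regex forward win their trial, so the incumbent drifts towards the hand-optimal order while the likelihood of the tied s1 positions decays.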

On correlation_q73 (PR apache#22919) this captures 1.28x of the ~1.9x headroom
within each 122-batch query (convergence needs two specific adjacent swaps;
the rest needs cross-query persistence, cf. DuckDB's multi-file adaptive
filter cache, left as future work). Tied micro-queries pay ~4-6% for the
exploration trials they decline.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>