Skip to content

perf: replace KeyError exception with dict.get() in BoundStatement.bind()#755

Closed
mykaul wants to merge 2 commits intoscylladb:masterfrom
mykaul:perf/bind-dict-get
Closed

perf: replace KeyError exception with dict.get() in BoundStatement.bind()#755
mykaul wants to merge 2 commits intoscylladb:masterfrom
mykaul:perf/bind-dict-get

Conversation

@mykaul
Copy link
Copy Markdown

@mykaul mykaul commented Mar 16, 2026

Summary

Replace try/except KeyError with dict.get() + sentinel pattern in the per-column binding loop of BoundStatement.bind(). This loop runs once per column per execute() call for dict-style bindings, making it a hot path. Using dict.get() avoids the overhead of raising and catching KeyError for every missing/optional column.

The sentinel object (_BIND_SENTINEL) is necessary to distinguish a missing key from an explicit None value in the bound dict.

A type(values_dict) is dict check gates the fast path so that dict subclasses with custom __missing__ methods continue to work via the original try/except fallback.

Benchmark results

Column-lookup loop (10 columns, bind() hot path, protocol v4):

Scenario Before (try/except) After (dict.get()) Change
All 10 keys present 244 ns 366 ns 1.50x slower
50% keys missing (5/10) 605 ns 395 ns 1.53x faster
All keys missing (0/10) 1,049 ns 413 ns 2.54x faster

Note: When all keys are present (common case), try/except is faster because no exceptions are raised and the overhead is minimal. The dict.get() path adds overhead from the method call and sentinel comparison. The benefit materializes when columns are missing (partial binds with protocol v4+ UNSET_VALUE), where avoiding exception creation/handling gives a significant speedup.

Testing

  • Added test_dict_subclass_missing_value tests for both v3 and v4 protocol versions to verify dict subclass __missing__ semantics are preserved
  • All 665 unit tests pass (38 skips)

@mykaul mykaul marked this pull request as draft March 16, 2026 16:10
@mykaul mykaul requested a review from Copilot March 16, 2026 17:23
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves the performance of dict-style parameter binding in BoundStatement.bind() by avoiding KeyError exception handling in the per-column binding loop, which is on a hot path during execute() for prepared statements.

Changes:

  • Introduces a private sentinel (_BIND_SENTINEL) to distinguish “missing key” from an explicit None.
  • Replaces try/except KeyError with dict.get(..., sentinel) in the dict-binding loop to avoid exception overhead.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cassandra/query.py
Comment thread cassandra/query.py Outdated
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Mar 16, 2026

Review of Copilot's Comments

Comment 1: Module-level _BIND_SENTINEL exposure

Copilot's concern: The sentinel could be accidentally used by callers.

Assessment: Low severity, mostly noise. The _ prefix is the standard Python convention for private objects. Moving it inside bind() would create a new object() on every call to a hot path — counterproductive for a performance PR. The suggestion to use UNSET_VALUE instead would complicate the logic with separate v3/v4 code paths for marginal benefit.

Comment 2: dict.get() doesn't call __missing__

Copilot's concern: Switching from values_dict[col.name] (which triggers __missing__ for subclasses like defaultdict) to values_dict.get() is a behavior change.

Assessment: Technically correct, but negligible practical impact. Using a defaultdict for binding parameters would be unusual and arguably a misuse — it would silently provide defaults for missing columns instead of raising errors. No evidence in the codebase or tests that __missing__-implementing subclasses are used here.

Additional Observations

  1. The optimization is sound — avoiding try/except KeyError on a hot path is a reasonable micro-optimization.
  2. No bugs introduced — the new logic correctly handles all three cases (key present, key missing on v4+, key missing on v3). Existing unit tests cover these paths.
  3. _BIND_SENTINEL export is a non-issue — no __all__ in the module, and the _ prefix excludes it from wildcard imports.
  4. Missing benchmark data — the PR would be strengthened by including before/after timings, though this isn't blocking.

@mykaul mykaul self-assigned this Mar 16, 2026
@mykaul mykaul force-pushed the perf/bind-dict-get branch from 4b81afe to 68dbf6c Compare March 26, 2026 07:49
mykaul added 2 commits April 7, 2026 14:45
…nd()

Replace try/except KeyError with dict.get() + sentinel pattern in the
per-column binding loop of BoundStatement.bind(). This loop runs once
per column per execute() call for dict-style bindings, making it a hot
path. Using dict.get() avoids the overhead of raising and catching
KeyError for every missing/optional column.

The sentinel object (_BIND_SENTINEL) is necessary to distinguish a
missing key from an explicit None value in the bound dict.
@mykaul mykaul force-pushed the perf/bind-dict-get branch from 68dbf6c to 0b0dfaf Compare April 7, 2026 11:47
@mykaul mykaul changed the title perf: replace KeyError exception with dict.get() in BoundStatement.bi… perf: replace KeyError exception with dict.get() in BoundStatement.bind() Apr 7, 2026
@mykaul
Copy link
Copy Markdown
Author

mykaul commented Apr 7, 2026

Closing in favor of PR #749, which replaces the entire Python bind loop with a Cython serializer fast path — making this Python-level dict.get() optimization largely moot. Benchmarks also showed a 1.5x regression in the most common case (all keys present), which further reduces the value of this change.

@mykaul mykaul closed this Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants