perf: use stdlib bisect and attrgetter in tablets.py (100's of ns, 1.5-5.6x speedup) #757
Merged
Lorak-mmk merged 1 commit into scylladb:master on Apr 9, 2026
According to our README, we support Python 3.10-3.14 right now, so it should be fine to drop the pure-Python implementation and use the builtin one instead.
- Use bisect.bisect_left from stdlib unconditionally (C implementation); drop the bundled pure-Python fallback since we only support Python 3.10+
- Replace per-call lambda closures with module-level operator.attrgetter for first_token/last_token extraction
- Add unit tests for get_tablet_for_key

Benchmark results (get_tablet_for_key hit):

Tablets   Before    After    Speedup
10        517 ns    365 ns   1.42x
100       616 ns    351 ns   1.75x
1000      1008 ns   529 ns   1.91x
10000     1339 ns   610 ns   2.20x
Author
Removed.
Lorak-mmk approved these changes on Apr 9, 2026
mykaul added a commit to mykaul/python-driver that referenced this pull request on Apr 9, 2026
Maintain parallel _first_tokens and _last_tokens dicts alongside _tablets, each mapping (keyspace, table) to a plain list[int]. This lets bisect_left run entirely in C on native ints instead of calling an attrgetter callback on every comparison during binary search.

Follow-up to PR scylladb#757 which identified the opportunity: its own benchmarks showed bisect_left without key= is 2.7-5.7x faster than with key=attrgetter.

Results (best-of-5, Python 3.14):

get_tablet_for_key (hit):

Tablets   Before   After   Saved   Speedup
10        293ns    216ns   78ns    1.36x
100       351ns    233ns   118ns   1.51x
1,000     448ns    267ns   181ns   1.68x
10,000    537ns    282ns   255ns   1.90x

All three dicts are kept in sync by add_tablet, drop_tablets, and drop_tablets_by_host_id. The attrgetter imports are no longer needed and have been removed.
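The parallel-list idea in this commit message can be sketched roughly as follows. This is an illustration only, not the follow-up commit's code: `TabletIndex` is a hypothetical class, tablets are modelled as plain `(first_token, last_token)` tuples, ranges are assumed not to overlap, and `drop_tablets_by_host_id` is left out.

```python
from bisect import bisect_left


class TabletIndex:
    """Hypothetical sketch: tablets plus parallel lists of plain ints."""

    def __init__(self):
        self._tablets = {}       # (keyspace, table) -> list of tablets
        self._first_tokens = {}  # (keyspace, table) -> list[int]
        self._last_tokens = {}   # (keyspace, table) -> list[int]

    def add_tablet(self, keyspace, table, tablet):
        key = (keyspace, table)
        tablets = self._tablets.setdefault(key, [])
        firsts = self._first_tokens.setdefault(key, [])
        lasts = self._last_tokens.setdefault(key, [])
        first, last = tablet
        # Insert at the same index in all three lists so they stay aligned.
        idx = bisect_left(lasts, last)
        tablets.insert(idx, tablet)
        firsts.insert(idx, first)
        lasts.insert(idx, last)

    def drop_tablets(self, keyspace, table):
        key = (keyspace, table)
        # Every structure must be dropped together, or lookups go stale.
        self._tablets.pop(key, None)
        self._first_tokens.pop(key, None)
        self._last_tokens.pop(key, None)

    def get_tablet_for_key(self, keyspace, table, token):
        key = (keyspace, table)
        lasts = self._last_tokens.get(key)
        if not lasts:
            return None
        # No key= callback: the search compares native ints only.
        idx = bisect_left(lasts, token)
        if idx < len(lasts) and self._first_tokens[key][idx] < token:
            return self._tablets[key][idx]
        return None
```

The trade-off is extra memory and the obligation to update all three structures in every mutating method, in exchange for a binary search that never calls back into Python.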
mykaul added a commit to mykaul/python-driver that referenced this pull request on Apr 21, 2026
Summary
- Use stdlib bisect.bisect_left unconditionally (C implementation). Since we only support Python 3.10-3.14, drop the bundled pure-Python fallback entirely.
- Replace per-call lambda closures with module-level operator.attrgetter for first_token/last_token extraction, avoiding repeated function-object allocation.
- Add unit tests for get_tablet_for_key (3 tests).

Benchmark Results
Measured on Intel i7-1270P, Python 3.14.3, CPU-pinned, 200k iterations.
Benchmarks reported: get_tablet_for_key (hit, the primary hot path); bisect_left with key= (isolated); bisect_left without key= (plain ints).