perf: use stdlib bisect and attrgetter in tablets.py (100's of ns, 1.5-5.6x speedup) #757

Merged
Lorak-mmk merged 1 commit into scylladb:master from mykaul:perf/tablets-stdlib-bisect on Apr 9, 2026

@mykaul mykaul commented Mar 20, 2026

Summary

  • Use stdlib bisect.bisect_left unconditionally (C implementation). Since we only support Python 3.10-3.14, drop the bundled pure-Python fallback entirely.
  • Replace per-call lambda closures with module-level operator.attrgetter for first_token / last_token extraction, avoiding repeated function-object allocation.
  • Add unit tests for get_tablet_for_key (3 tests).

Benchmark Results

Measured on Intel i7-1270P, Python 3.14.3, CPU-pinned, 200k iterations.

get_tablet_for_key (hit — the primary hot path)

Tablets   Before (ns)   After (ns)   Speedup
     10           359          240     1.50x
    100           515          296     1.74x
  1,000           823          405     2.03x
 10,000         1,049          450     2.33x

bisect_left with key= (isolated)

   Size   Before (ns)   After (ns)   Speedup
     10           262          120     2.18x
    100           381          166     2.30x
  1,000           607          276     2.20x
 10,000           848          342     2.48x

bisect_left without key= (plain ints)

   Size   Before (ns)   After (ns)   Speedup
     10           164           59     2.78x
    100           255          124     2.06x
  1,000           474          103     4.60x
 10,000           628          111     5.66x

@Lorak-mmk Lorak-mmk self-requested a review March 20, 2026 19:09
@mykaul mykaul force-pushed the perf/tablets-stdlib-bisect branch from 75eaf9b to dd449ca Compare April 1, 2026 20:02
@Lorak-mmk

According to our README, we support Python 3.10-3.14 right now, so it should be fine to drop the pure-python impl and use the builtin one instead.

@mykaul mykaul changed the title from "perf: use stdlib bisect and attrgetter in tablets.py" to "perf: use stdlib bisect and attrgetter in tablets.py (100's of ns, 1.4-6.x speedup)" Apr 7, 2026
- Use bisect.bisect_left from stdlib unconditionally (C implementation);
  drop the bundled pure-Python fallback since we only support Python 3.10+
- Replace per-call lambda closures with module-level operator.attrgetter
  for first_token/last_token extraction
- Add unit tests for get_tablet_for_key

Benchmark results (get_tablet_for_key hit):
  10 tablets:    517 ns -> 365 ns (1.42x)
  100 tablets:   616 ns -> 351 ns (1.75x)
  1000 tablets:  1008 ns -> 529 ns (1.91x)
  10000 tablets: 1339 ns -> 610 ns (2.20x)
@mykaul mykaul force-pushed the perf/tablets-stdlib-bisect branch from dd449ca to 4ba3f2b Compare April 9, 2026 07:55
@mykaul mykaul changed the title from "perf: use stdlib bisect and attrgetter in tablets.py (100's of ns, 1.4-6.x speedup)" to "perf: use stdlib bisect and attrgetter in tablets.py (100's of ns, 1.5-5.6x speedup)" Apr 9, 2026

mykaul commented Apr 9, 2026

> According to our README, we support Python 3.10-3.14 right now, so it should be fine to drop the pure-python impl and use the builtin one instead.

Removed.

@Lorak-mmk Lorak-mmk merged commit 5094118 into scylladb:master Apr 9, 2026
11 of 13 checks passed
mykaul added a commit to mykaul/python-driver that referenced this pull request Apr 9, 2026
Maintain parallel _first_tokens and _last_tokens dicts alongside
_tablets, each mapping (keyspace, table) to a plain list[int].  This
lets bisect_left run entirely in C on native ints instead of calling
an attrgetter callback on every comparison during binary search.

Follow-up to PR scylladb#757 which identified the opportunity: its own
benchmarks showed bisect_left without key= is 2.7-5.7x faster than
with key=attrgetter.

Results (best-of-5, Python 3.14):

  get_tablet_for_key (hit):
  Tablets    Before    After    Saved   Speedup
       10    293ns    216ns     78ns     1.36x
      100    351ns    233ns    118ns     1.51x
    1,000    448ns    267ns    181ns     1.68x
   10,000    537ns    282ns    255ns     1.90x

All three dicts are kept in sync by add_tablet, drop_tablets, and
drop_tablets_by_host_id.  The attrgetter imports are no longer needed
and have been removed.
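
The parallel-list idea in the commit message above can be sketched roughly as follows. `TabletIndex` and its method bodies are hypothetical and written only to illustrate the technique; the method names `add_tablet`, `drop_tablets`, and `get_tablet_for_key` and the `_tablets` / `_first_tokens` / `_last_tokens` attribute names come from the commit message, but the real implementations are not shown in this thread. In particular, this sketch assumes non-overlapping ranges, omits any overlap handling, and leaves out `drop_tablets_by_host_id`.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tablet:
    # Hypothetical stand-in for the driver's tablet object.
    first_token: int
    last_token: int


class TabletIndex:
    """Illustrative container, not the driver's class."""

    def __init__(self):
        # Three dicts keyed by (keyspace, table); the lists at a given key
        # always have the same length and the same ordering.
        self._tablets = defaultdict(list)
        self._first_tokens = defaultdict(list)
        self._last_tokens = defaultdict(list)

    def add_tablet(self, keyspace, table, tablet):
        key = (keyspace, table)
        # Insert at the same position in all three lists to keep them in sync.
        pos = bisect_left(self._last_tokens[key], tablet.last_token)
        self._tablets[key].insert(pos, tablet)
        self._first_tokens[key].insert(pos, tablet.first_token)
        self._last_tokens[key].insert(pos, tablet.last_token)

    def drop_tablets(self, keyspace, table):
        key = (keyspace, table)
        for mapping in (self._tablets, self._first_tokens, self._last_tokens):
            mapping.pop(key, None)

    def get_tablet_for_key(self, keyspace, table, token):
        key = (keyspace, table)
        last_tokens = self._last_tokens.get(key)
        if not last_tokens:
            return None
        # Plain list of ints: no key= callback during the binary search.
        idx = bisect_left(last_tokens, token)
        if idx < len(last_tokens) and token > self._first_tokens[key][idx]:
            return self._tablets[key][idx]
        return None


index = TabletIndex()
for t in (Tablet(100, 200), Tablet(-100, 0), Tablet(0, 100)):
    index.add_tablet("ks", "tbl", t)
found = index.get_tablet_for_key("ks", "tbl", 150)
```

The trade-off is extra memory and three structures to update on every mutation, in exchange for a search loop that compares native ints without calling back into Python.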
mykaul added a commit to mykaul/python-driver that referenced this pull request Apr 21, 2026