perf: Cache cassandra type parsing with LRU caching (hundreds of ns improvements - x1.1-1.6 speedup) by mykaul · Pull Request #690 · scylladb/python-driver

mykaul · 2026-02-06T12:17:39Z

Summary

Cache lookup_casstype_simple() and parse_casstype_args() with @functools.lru_cache() to avoid repeated string manipulation and regex scanning when the same type strings are resolved multiple times (common during schema parsing and query result deserialization).

The fast-path for simple types (without parentheses) was already merged separately. This PR adds caching on top of that.

Also fixes an unused variable warning (prev_names → _).

Includes a pytest-benchmark comparison (cached vs uncached).
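The two mechanisms described above work together: a cheap parenthesis check sends simple names to a cached lookup, and a separately cached parser handles parameterized types. The sketch below shows that combination, but it is not the driver's code and rests on a few assumptions. `lookup`, `lookup_simple`, `parse_args`, and `_split_top_level` are hypothetical names. Results are plain strings and tuples rather than the driver's type classes. A simple character scan stands in for the regex scanner in `cassandra/cqltypes.py`.

```python
import functools

# Hypothetical simplified model: resolves type strings to names/tuples,
# not to the driver's actual type classes.
PREFIX = "org.apache.cassandra.db.marshal."


@functools.lru_cache()
def lookup_simple(casstype):
    """Resolve a type name that has no parameters (result is cached)."""
    casstype = casstype.strip()
    return casstype[len(PREFIX):] if casstype.startswith(PREFIX) else casstype


def _split_top_level(s):
    """Split s on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


@functools.lru_cache()
def parse_args(casstype):
    """Parse 'Name(arg1,arg2,...)' recursively (result is cached)."""
    open_idx = casstype.index("(")
    if not casstype.endswith(")"):
        raise ValueError(f"malformed type string: {casstype!r}")
    name = lookup_simple(casstype[:open_idx])
    inner = casstype[open_idx + 1:-1]
    # Tuples keep cached results immutable, so callers cannot corrupt the cache.
    return name, tuple(lookup(part) for part in _split_top_level(inner))


def lookup(casstype):
    """Entry point: fast path for simple names, parser otherwise."""
    casstype = casstype.strip()
    if "(" not in casstype:
        return lookup_simple(casstype)  # no parameters: skip parsing entirely
    return parse_args(casstype)


print(lookup("MapType(UTF8Type,ListType(Int32Type))"))
# ('MapType', ('UTF8Type', ('ListType', ('Int32Type',))))
```

Only the first lookup of a given string pays for the scan. Later lookups of the same string return the cached tuple, and simple names never reach `parse_args` at all.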

Changes

cassandra/cqltypes.py: Added import functools, @functools.lru_cache() on lookup_casstype_simple() and parse_casstype_args(), fixed unused prev_names variable
benchmarks/test_casstype_cache_benchmark.py: New benchmark file with correctness tests and cached vs uncached performance comparisons

Benchmark results

`lookup_casstype_simple` — clear wins from cache

| Benchmark | Cached | Uncached | Speedup |
| --- | --- | --- | --- |
| Short name (`UTF8Type`) | 135 ns | 217 ns | 1.6x |
| Fully-qualified name (`o.a.c.db.marshal.UTF8Type`) | 274 ns | 414 ns | 1.5x |
| Batch of 10 types | 1.57 µs | 1.83 µs | 1.2x |

`parse_casstype_args` — modest gains for parameterized types

| Benchmark | Cached | Uncached | Speedup |
| --- | --- | --- | --- |
| MapType(UTF8,Int32) | 26.9 µs | 29.1 µs | 1.08x |
| Nested MapType(UTF8,ListType(Int32)) | 46.9 µs | 55.6 µs | 1.19x |

End-to-end `lookup_casstype` (mixed simple + parameterized)

| Benchmark | Cached | Uncached | Speedup |
| --- | --- | --- | --- |
| Mixed batch (simple + parameterized) | 296 µs | 321 µs | 1.08x |

The biggest gains are on lookup_casstype_simple, which is called most frequently (every column in every row). The parse_casstype_args cache helps for parameterized types (maps, lists, sets, tuples) where the regex scanner is the bottleneck.
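You can check whether a cache pays off for a given workload with the `cache_info()` method that `functools.lru_cache` attaches to the decorated function. The sketch below simulates decoding 1,000 rows of a three-column result. It assumes, hypothetically, one simple-type lookup per column per row. `lookup_simple` is again an illustrative stand-in.

```python
import functools

PREFIX = "org.apache.cassandra.db.marshal."


@functools.lru_cache()
def lookup_simple(casstype):
    # Hypothetical stand-in for lookup_casstype_simple().
    return casstype[len(PREFIX):] if casstype.startswith(PREFIX) else casstype


column_types = ["UTF8Type", "Int32Type", PREFIX + "UUIDType"]
for _row in range(1000):           # simulated rows
    for casstype in column_types:  # one lookup per column
        lookup_simple(casstype)

info = lookup_simple.cache_info()
print(f"hits={info.hits} misses={info.misses} size={info.currsize}")
# hits=2997 misses=3 size=3
```

With only a handful of distinct type strings, nearly every call is a hit. The speedups above depend on this access pattern.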

Rationale for unbounded cache

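The cache is intended to be unbounded because the set of distinct type strings is finite per schema, bounded by the number of column types defined in the cluster. Memory use therefore grows with the number of distinct type strings rather than with query volume. Note that `lru_cache()` called with no arguments is not unbounded: `maxsize` defaults to 128. An unbounded cache requires `lru_cache(maxsize=None)` (or `functools.cache` on Python 3.9+). With the default, a schema with more than 128 distinct type strings would see evictions but would still return correct results.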
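The cache bound is easy to get wrong because `functools.lru_cache()` called with no arguments has a default `maxsize` of 128. The sketch below fills a bare `lru_cache()` and an `lru_cache(maxsize=None)` with 200 distinct keys, then compares their sizes via `cache_info()`. The function names and keys are hypothetical.

```python
import functools


@functools.lru_cache()  # bare call: maxsize defaults to 128 (LRU eviction)
def bounded(casstype):
    return casstype.upper()


@functools.lru_cache(maxsize=None)  # unbounded; same as functools.cache (3.9+)
def unbounded(casstype):
    return casstype.upper()


# 200 distinct keys, e.g. a schema with many distinct type strings.
for i in range(200):
    bounded(f"Type{i}")
    unbounded(f"Type{i}")

print(bounded.cache_info().maxsize, bounded.cache_info().currsize)      # 128 128
print(unbounded.cache_info().maxsize, unbounded.cache_info().currsize)  # None 200
```

Both variants return correct results. The bounded one only adds evictions, and therefore repeated misses, once a workload exceeds 128 distinct type strings.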

Copilot

Pull request overview

This pull request optimizes custom type parsing by adding LRU caching to frequently called type lookup functions and by implementing a fast path for simple types without parameters. The optimization aims to reduce repeated string manipulation and regex scanning for type lookups that occur frequently during query execution.

Changes:

Added @functools.lru_cache(maxsize=256) decorators to lookup_casstype_simple and parse_casstype_args functions
Implemented fast-path optimization in lookup_casstype to avoid regex scanning for simple types (those without parentheses)
Removed error handling wrapper from lookup_casstype, changing behavior to return UnrecognizedType instead of raising ValueError for invalid types
Added benchmark script demonstrating cache effectiveness for repeated type lookups

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| cassandra/cqltypes.py | Added LRU caching to type parsing functions, removed error handling wrapper, cleaned up unused variable, optimized type lookup with fast path |
| tests/unit/test_types.py | Updated test to reflect new behavior where invalid type names create UnrecognizedType instead of raising ValueError |
| benchmarks/cache_benefit.py | New benchmark script demonstrating LRU cache benefits for repeated type lookups with various type complexities |



mykaul requested a review from Copilot February 6, 2026 12:17

mykaul added the enhancement New feature or request label Feb 6, 2026

mykaul marked this pull request as draft February 6, 2026 12:17

Copilot started reviewing on behalf of mykaul February 6, 2026 12:17

Copilot AI reviewed Feb 6, 2026

mykaul force-pushed the parse_custom branch from 2e39da1 to 554d421 February 8, 2026 21:54

mykaul mentioned this pull request Mar 14, 2026, from "Tracking: Vector search (VectorType) performance improvement PRs" #746 (Open)

mykaul mentioned this pull request Mar 25, 2026, from "fix: correct 'clustering_key' to 'clustering' in column kind filter" #761 (Merged)

mykaul changed the title ~~(improvement)Optimize custom type parsing with LRU caching~~ Cache cassandra type parsing with LRU caching Apr 2, 2026

mykaul force-pushed the parse_custom branch from 554d421 to 19a1710 April 2, 2026 13:19

mykaul changed the title ~~Cache cassandra type parsing with LRU caching~~ Cache cassandra type parsing with LRU caching (hundreds of ns improvements - x1.1-1.6 speedup) Apr 7, 2026

mykaul changed the title ~~Cache cassandra type parsing with LRU caching (hundreds of ns improvements - x1.1-1.6 speedup)~~ perf: Cache cassandra type parsing with LRU caching (hundreds of ns improvements - x1.1-1.6 speedup) Apr 7, 2026

mykaul wants to merge 1 commit into scylladb:master from mykaul:parse_custom
