perf: Cache cassandra type parsing with LRU caching (hundreds of ns improvements - x1.1-1.6 speedup)#690

Draft
mykaul wants to merge 1 commit into scylladb:master from mykaul:parse_custom
@mykaul mykaul commented Feb 6, 2026

Summary

Cache lookup_casstype_simple() and parse_casstype_args() with @functools.lru_cache() to avoid repeated string manipulation and regex scanning when the same type strings are resolved multiple times (common during schema parsing and query result deserialization).

The fast-path for simple types (without parentheses) was already merged separately. This PR adds caching on top of that.
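The caching pattern described above can be sketched as follows. This is a minimal illustration, not the driver's code: `resolve_type_name` and `_TYPES` are hypothetical stand-ins, and only `functools.lru_cache` and its `cache_info()` method are real APIs.

```python
import functools

# Hypothetical registry standing in for the driver's type table.
_TYPES = {"UTF8Type": str, "Int32Type": int}
_PREFIX = "org.apache.cassandra.db.marshal."


@functools.lru_cache(maxsize=None)
def resolve_type_name(name):
    """Strip the optional package prefix, then look up the short name."""
    short = name[len(_PREFIX):] if name.startswith(_PREFIX) else name
    return _TYPES[short]


fq_name = "org.apache.cassandra.db.marshal.UTF8Type"
resolve_type_name(fq_name)  # first call: cache miss, does the string work
resolve_type_name(fq_name)  # second call: cache hit, no string work
info = resolve_type_name.cache_info()
print(info.hits, info.misses)
```

Because the cache key is the argument itself, the short and fully-qualified spellings of the same type occupy separate entries.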

Also fixes an unused variable warning (prev_names renamed to _).

Includes a pytest-benchmark comparison (cached vs uncached).

Changes

  • cassandra/cqltypes.py: Added import functools, @functools.lru_cache() on lookup_casstype_simple() and parse_casstype_args(), fixed unused prev_names variable
  • benchmarks/test_casstype_cache_benchmark.py: New benchmark file with correctness tests and cached vs uncached performance comparisons
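One way to set up a cached vs uncached comparison is through the `__wrapped__` attribute that `lru_cache` places on the decorated function. The sketch below uses `timeit` rather than pytest-benchmark to stay self-contained, and `tokenize` is a hypothetical stand-in for an expensive parse, not a function from the PR.

```python
import functools
import re
import timeit

_NAME = re.compile(r"[A-Za-z0-9_.]+")


@functools.lru_cache(maxsize=None)
def tokenize(type_string):
    """Hypothetical expensive parse: regex-scan the type names."""
    return tuple(_NAME.findall(type_string))


# lru_cache exposes the undecorated function, which bypasses the cache.
uncached = tokenize.__wrapped__

s = "MapType(UTF8Type,ListType(Int32Type))"
# Correctness check before timing: both paths must agree.
assert tokenize(s) == uncached(s)

n = 20_000
t_cached = timeit.timeit(lambda: tokenize(s), number=n)
t_uncached = timeit.timeit(lambda: uncached(s), number=n)
print(f"cached:   {t_cached / n * 1e9:.0f} ns/call")
print(f"uncached: {t_uncached / n * 1e9:.0f} ns/call")
```

Absolute timings depend on the machine and interpreter, so only the ratio between the two lines is meaningful.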

Benchmark results

lookup_casstype_simple — clear wins from cache

| Benchmark | Cached | Uncached | Speedup |
|---|---|---|---|
| Short name (UTF8Type) | 135 ns | 217 ns | 1.6x |
| Fully-qualified name (o.a.c.db.marshal.UTF8Type) | 274 ns | 414 ns | 1.5x |
| Batch of 10 types | 1.57 µs | 1.83 µs | 1.2x |

parse_casstype_args — modest gains for parameterized types

| Benchmark | Cached | Uncached | Speedup |
|---|---|---|---|
| MapType(UTF8,Int32) | 26.9 µs | 29.1 µs | 1.08x |
| Nested MapType(UTF8,ListType(Int32)) | 46.9 µs | 55.6 µs | 1.19x |

End-to-end lookup_casstype (mixed simple + parameterized)

| Benchmark | Cached | Uncached | Speedup |
|---|---|---|---|
| Mixed batch (simple + parameterized) | 296 µs | 321 µs | 1.08x |

The biggest gains are on lookup_casstype_simple, which is called most frequently (every column in every row). The parse_casstype_args cache helps for parameterized types (maps, lists, sets, tuples) where the regex scanner is the bottleneck.
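For reference, the speedup column is the uncached time divided by the cached time. The short check below recomputes it from the benchmark figures reported above; the labels are abbreviated for the example.

```python
# (label, cached, uncached); each pair uses the same unit, so the ratio
# is unit-free.
results = [
    ("short name", 135, 217),            # ns
    ("fully-qualified name", 274, 414),  # ns
    ("batch of 10", 1.57, 1.83),         # µs
    ("MapType", 26.9, 29.1),             # µs
    ("nested MapType", 46.9, 55.6),      # µs
    ("mixed batch", 296, 321),           # µs
]

speedups = {label: round(unc / cached, 2) for label, cached, unc in results}
for label, ratio in speedups.items():
    print(f"{label}: {ratio}x")
```

The absolute saving per simple lookup is 217 - 135 = 82 ns for the short name and 414 - 274 = 140 ns for the fully-qualified name.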

Rationale for unbounded cache

lru_cache() is used without maxsize because the set of distinct type strings is finite per schema — bounded by the number of column types defined in the cluster. There is no risk of unbounded memory growth.
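One detail worth checking against this rationale: in the standard library, calling `lru_cache()` with no arguments does not make the cache unbounded. The default is `maxsize=128`, and only `maxsize=None` disables eviction. Which form the PR's decorator ends up using is not shown in this description, so the snippet below only illustrates the stdlib behavior; `bounded` and `unbounded` are throwaway names.

```python
import functools


@functools.lru_cache()  # bare call: falls back to the default maxsize
def bounded(x):
    return x


@functools.lru_cache(maxsize=None)  # explicit: entries are never evicted
def unbounded(x):
    return x


print(bounded.cache_info().maxsize)    # 128
print(unbounded.cache_info().maxsize)  # None
```

With the default size, a workload touching more than 128 distinct type strings would start evicting entries, although it would still never grow without bound.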

@mykaul mykaul requested a review from Copilot February 6, 2026 12:17
@mykaul mykaul added the enhancement New feature or request label Feb 6, 2026
@mykaul mykaul marked this pull request as draft February 6, 2026 12:17

Copilot AI left a comment

Pull request overview

This pull request optimizes custom type parsing by adding LRU caching to frequently-called type lookup functions and implementing a fast path for simple types without parameters. The optimization aims to reduce repeated string manipulation and regex scanning for type lookups that occur frequently during query execution.

Changes:

  • Added @functools.lru_cache(maxsize=256) decorators to lookup_casstype_simple and parse_casstype_args functions
  • Implemented fast-path optimization in lookup_casstype to avoid regex scanning for simple types (those without parentheses)
  • Removed error handling wrapper from lookup_casstype, changing behavior to return UnrecognizedType instead of raising ValueError for invalid types
  • Added benchmark script demonstrating cache effectiveness for repeated type lookups
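The fast path and the changed error behavior listed above could look roughly like the sketch below. All names here (`lookup`, `lookup_simple`, `parse_parameterized`, `_KNOWN`, and this `UnrecognizedType` class) are hypothetical placeholders chosen for illustration and are not taken from cassandra/cqltypes.py.

```python
import functools


class UnrecognizedType:
    """Hypothetical placeholder for a name the registry does not know."""

    def __init__(self, name):
        self.name = name


_KNOWN = {"UTF8Type": str, "Int32Type": int}


@functools.lru_cache(maxsize=256)
def lookup_simple(name):
    # Unknown names yield a placeholder object instead of raising.
    cls = _KNOWN.get(name)
    return cls if cls is not None else UnrecognizedType(name)


def parse_parameterized(name):
    # The scanner-based parse is out of scope for this sketch.
    raise NotImplementedError("parameterized parsing omitted")


def lookup(name):
    # Fast path: a name without parentheses takes no parameters,
    # so the regex scanner can be skipped entirely.
    if "(" not in name:
        return lookup_simple(name)
    return parse_parameterized(name)


print(lookup("UTF8Type"))
print(lookup("NoSuchType").name)
```

A side effect of caching the placeholder is that repeated lookups of the same unknown name return the same object. Results are cached only on normal return: `lru_cache` stores nothing when the wrapped function raises.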

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

| File | Description |
|---|---|
| cassandra/cqltypes.py | Added LRU caching to type parsing functions, removed error handling wrapper, cleaned up unused variable, optimized type lookup with fast path |
| tests/unit/test_types.py | Updated test to reflect new behavior where invalid type names create UnrecognizedType instead of raising ValueError |
| benchmarks/cache_benefit.py | New benchmark script demonstrating LRU cache benefits for repeated type lookups with various type complexities |

Review comment threads (10):

  • cassandra/cqltypes.py: 5 threads (3 outdated)
  • tests/unit/test_types.py: 2 threads (both outdated)
  • benchmarks/cache_benefit.py: 3 threads (all outdated)

Cache lookup_casstype_simple() and parse_casstype_args() with
@functools.lru_cache() to avoid repeated string manipulation and
regex scanning when the same type strings are resolved multiple times
(common during schema parsing and query result deserialization).

Also fixes an unused variable warning (prev_names -> _).

Includes a pytest-benchmark comparison (cached vs uncached).
@mykaul mykaul changed the title from "(improvement)Optimize custom type parsing with LRU caching" to "Cache cassandra type parsing with LRU caching" Apr 2, 2026
@mykaul mykaul changed the title from "Cache cassandra type parsing with LRU caching" to "Cache cassandra type parsing with LRU caching (hundreds of ns improvements - x1.1-1.6 speedup)" Apr 7, 2026
@mykaul mykaul changed the title from "Cache cassandra type parsing with LRU caching (hundreds of ns improvements - x1.1-1.6 speedup)" to "perf: Cache cassandra type parsing with LRU caching (hundreds of ns improvements - x1.1-1.6 speedup)" Apr 7, 2026