perf: Cache cassandra type parsing with LRU caching (hundreds of ns improvements - x1.1-1.6 speedup)#690
Draft
mykaul wants to merge 1 commit into
Draft
perf: Cache cassandra type parsing with LRU caching (hundreds of ns improvements - x1.1-1.6 speedup)#690mykaul wants to merge 1 commit into
mykaul wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request optimizes custom type parsing by adding LRU caching to frequently-called type lookup functions and implementing a fast path for simple types without parameters. The optimization aims to reduce repeated string manipulation and regex scanning for type lookups that occur frequently during query execution.
Changes:
- Added
@functools.lru_cache(maxsize=256)decorators tolookup_casstype_simpleandparse_casstype_argsfunctions - Implemented fast-path optimization in
lookup_casstypeto avoid regex scanning for simple types (those without parentheses) - Removed error handling wrapper from
lookup_casstype, changing behavior to return UnrecognizedType instead of raising ValueError for invalid types - Added benchmark script demonstrating cache effectiveness for repeated type lookups
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| cassandra/cqltypes.py | Added LRU caching to type parsing functions, removed error handling wrapper, cleaned up unused variable, optimized type lookup with fast path |
| tests/unit/test_types.py | Updated test to reflect new behavior where invalid type names create UnrecognizedType instead of raising ValueError |
| benchmarks/cache_benefit.py | New benchmark script demonstrating LRU cache benefits for repeated type lookups with various type complexities |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Cache lookup_casstype_simple() and parse_casstype_args() with @functools.lru_cache() to avoid repeated string manipulation and regex scanning when the same type strings are resolved multiple times (common during schema parsing and query result deserialization). Also fixes an unused variable warning (prev_names -> _). Includes a pytest-benchmark comparison (cached vs uncached).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Cache
lookup_casstype_simple()andparse_casstype_args()with@functools.lru_cache()to avoid repeated string manipulation and regex scanning when the same type strings are resolved multiple times (common during schema parsing and query result deserialization).The fast-path for simple types (without parentheses) was already merged separately. This PR adds caching on top of that.
Also fixes an unused variable warning (
prev_names→_).Includes a pytest-benchmark comparison (cached vs uncached).
Changes
cassandra/cqltypes.py: Addedimport functools,@functools.lru_cache()onlookup_casstype_simple()andparse_casstype_args(), fixed unusedprev_namesvariablebenchmarks/test_casstype_cache_benchmark.py: New benchmark file with correctness tests and cached vs uncached performance comparisonsBenchmark results
lookup_casstype_simple— clear wins from cacheUTF8Type)o.a.c.db.marshal.UTF8Type)parse_casstype_args— modest gains for parameterized typesEnd-to-end
lookup_casstype(mixed simple + parameterized)The biggest gains are on
lookup_casstype_simple, which is called most frequently (every column in every row). Theparse_casstype_argscache helps for parameterized types (maps, lists, sets, tuples) where the regex scanner is the bottleneck.Rationale for unbounded cache
lru_cache()is used withoutmaxsizebecause the set of distinct type strings is finite per schema — bounded by the number of column types defined in the cluster. There is no risk of unbounded memory growth.