(improvement) Optimize Cython deserialization primitives and add VectorType Cython deserializer (substantial - 11x-30x speedup mainly via DesVectorType.deserialize_bytes with Cython)#732
Draft
mykaul wants to merge 4 commits intoscylladb:masterfrom
Conversation
8 tasks
….from_bytes Performance improvements to serialization/deserialization hot paths: 1. unpack_num(): Use ntohs()/ntohl() for 16-bit and 32-bit integer types instead of byte-by-byte swapping loop. These compile to single bswap instructions on x86, providing more predictable performance. 2. read_int(): Simplify to use ntohl() directly instead of going through unpack_num() with a temporary Buffer. 3. varint_unpack(): Replace hex string conversion with int.from_bytes(). This eliminates string allocations and provides 4-18x speedup for the function itself (larger gains for longer varints). 4. Remove slice_buffer() and replaced with direct assignment 5. _unpack_len() is now implemented similar to read_int() Also removes unused 'start' and 'end' variables from unpack_num(). End-to-end benchmark shows ~4-5% improvement in row throughput. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
8673d95 to
34dd41e
Compare
…helpers Add buffer bounds validation to Cython deserializers for safety against malformed buffers, refactor to use from_ptr_and_size() helper consistently, and add float ntohl() specialization for consistency with int32/int16 paths. Changes: - subelem(): Add CQL protocol-compliant value handling (NULL/-1, not-set/-2, invalid/<-2) with bounds checking - _unpack_len(): Add bounds check and use memcpy for alignment safety - DesTupleType: Add defensive bounds checking for tuple item lengths - DesCompositeType: Add bounds validation for composite element lengths - Refactor 4 locations to use from_ptr_and_size() instead of manual Buffer field assignment - Add float branch to unpack_num(): reinterpret bits as uint32, ntohl(), reinterpret back (consistent with int16/int32 intrinsic paths) - Add from_ptr_and_size() declaration to buffer.pxd Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
…izer Addded DesVectorType Cython deserializer with C-level optimizations for improved performance in row parsing for vectors. The deserializer uses: - Direct C byte swapping (ntohl, ntohs) for numeric types - Memory operations without Python object overhead - Unified numpy path for large vectors (≥32 elements) - struct.unpack fallback for small vectors (<32 elements) Performance improvements: - Small vectors (3-4 elements): 4.4-4.7x faster - Medium vectors (128 elements): 1.0-1.5x faster - Large vectors (384-1536 elements): 0.9-1.0x (marginal) The Cython deserializer is automatically used by the row parser when available via find_deserializer(). Includes unit tests and benchmark code. Follow-up commits will try to get Numpy arrays, and perhaps more. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
The 'values' list was allocated but never used — the method builds results directly into a pre-allocated tuple via tuple_set(res, i, item). Removes one unnecessary list allocation per tuple deserialization.
34dd41e to
9b7697c
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Optimize foundational Cython byte-unpacking and add a dedicated VectorType Cython deserializer.
Commits (4, squashed from 7)
1. Optimize Cython byte unpacking with ntohs/ntohl and int.from_bytes
unpack_num()withntohs()/ntohl()intrinsics for 16/32-bit types (compiles to singlebswapon x86)varint_unpack()hex-string-based conversion withint.from_bytes(term, 'big', signed=True)— 7.7x speedupread_int()to direct pointer cast +ntohl()slice_buffer(), replace all call sites withfrom_ptr_and_size()#ifdef _WIN32forwinsock2.hvsarpa/inet.h2. Optimize float deserialization with ntohl() intrinsic
uint32_t, applyntohl(), reinterpret back to floatfrom_ptr_and_size()helper consistentlysubelem(), bounds checks in_unpack_len(),DesTupleType,DesCompositeType)3. Optimize VectorType deserialization with Cython deserializer
DesVectorTypeclass with specialized deserialization methods:_deserialize_float(): C-levelmemcpy+ntohl+ pointer-cast (no Python dispatch per element)_deserialize_double()/_deserialize_int64(): 8-byte manual byte-swap_deserialize_int32():memcpy+ntohl+ cast_deserialize_int16():ntohscastfind_deserializer()for the Cython row parser4. Remove dead
values = []in DesTupleType.deserializevalueslist was allocated but never used — results built directly into pre-allocated tuple viatuple_set()Benchmark Results
All benchmarks:
min(timeit.repeat(number=N, repeat=5)), per-call nanoseconds.Machine: idle Linux workstation, Cython extensions compiled.
Primitives (via
CqlType.deserialize())The ntohs/ntohl change replaces a byte-swap loop that was already fast for 2/4-byte types. The big win is
varint_unpack()whereint.from_bytes()replaces hex-string conversion.VectorType — Python path (via
VectorType.deserialize())VectorType — Cython path (via
DesVectorType.deserialize_bytes())The Cython
DesVectorTypeis used by the Cython row parser (find_deserializer()), bypassing the PythonVectorType.deserialize()entirely:Unit tests
640 passed, 49 skipped (baseline: 645 passed, 43 skipped on master).