(improvement) (Cython only) deserializers: use direct PyUnicode_DecodeUTF8/ASCII fr… (tens hundreds of us improvement, x2.5-x3 speedup)#743
Conversation
There was a problem hiding this comment.
Pull request overview
Optimizes text deserialization performance in the Cython row parser by decoding UTF-8/ASCII directly from the underlying buffer, and adds a benchmark module intended to measure the improvement.
Changes:
- Replace
to_bytes(buf).decode(...)withPyUnicode_DecodeASCII/UTF8(buf.ptr, buf.size, NULL)in Cython deserializers. - Add a new pytest-based benchmark suite for UTF-8/ASCII decode and pipeline parsing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
cassandra/deserializers.pyx |
Uses CPython unicode decode APIs to avoid intermediate bytes allocations during ASCII/UTF-8 deserialization. |
benchmarks/test_utf8_decode_benchmark.py |
Adds end-to-end and microbenchmarks plus correctness checks for the new decode approach. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cbb8652 to
97eb4e8
Compare
There was a problem hiding this comment.
Pull request overview
This PR optimizes Cython-based UTF-8 and ASCII text deserialization by decoding directly from the underlying C buffer (avoiding an intermediate bytes allocation), and adds correctness tests plus a benchmark to validate/measure the change.
Changes:
- Switch
DesUTF8Type/DesAsciiTypeto usePyUnicode_DecodeUTF8/PyUnicode_DecodeASCIIonBuffer.ptr/Buffer.size. - Add unit tests covering common and edge-case text decode scenarios (empty, ASCII, multibyte, long strings, NULLs, multiple rows).
- Add a pytest-benchmark benchmark file to compare end-to-end Cython row parsing before/after the optimization.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
cassandra/deserializers.pyx |
Replace to_bytes(...).decode(...) with direct CPython Unicode decoding calls for UTF-8/ASCII deserializers. |
tests/unit/cython/test_deserializers.py |
New correctness-focused unit tests for the optimized Cython decoding path. |
benchmarks/utf8_decode_benchmark.py |
New benchmark suite to measure the performance impact of the UTF-8/ASCII decode optimization. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
97eb4e8 to
a9815c1
Compare
…om C buffer pointer
Replace the two-step to_bytes(buf).decode('utf8') pattern in DesUTF8Type and
DesAsciiType with direct CPython C API calls (PyUnicode_DecodeUTF8 and
PyUnicode_DecodeASCII). This eliminates an intermediate bytes object
allocation per text cell — the old code created a Python bytes object from
the C buffer pointer via to_bytes(buf), then immediately decoded it to str
and discarded the bytes.
Text (UTF8Type/VarcharType) is the most common CQL column type, so this
optimization applies to the majority of cells in typical workloads.
Benchmark results (Cython row parsing pipeline, median times):
| Scenario | Before (original) | After (direct decode) | Speedup |
|---------------------------------|-------------------:|----------------------:|--------:|
| UTF8 1row x 1col short (11B) | 565 ns | 454 ns | 1.24x |
| UTF8 1row x 10col short | 1,594 ns | 1,023 ns | 1.56x |
| UTF8 100rows x 5col medium | 61,396 ns | 28,766 ns | 2.13x |
| UTF8 1000rows x 5col medium | 547,145 ns | 290,361 ns | 1.88x |
| UTF8 100rows x 5col long(200B) | 57,940 ns | 35,680 ns | 1.62x |
| UTF8 100rows x 5col multibyte | 125,149 ns | 103,370 ns | 1.21x |
| ASCII 100rows x 5col medium | 41,608 ns | 35,817 ns | 1.16x |
| ASCII 1000rows x 5col medium | 416,350 ns | 374,341 ns | 1.11x |
| Mixed 100rows 3text+2int | 44,646 ns | 31,189 ns | 1.43x |
All existing unit tests pass (62 type tests, 116 total across key suites).
…uards, add invalid-input tests - Replace hand-rolled try/except ImportError with the project-standard cythontest decorator and HAVE_CYTHON conditional imports, so VERIFY_CYTHON=True CI mode fails loudly instead of silently skipping. - Add pytest.importorskip guards to the benchmark file so it skips gracefully when pytest-benchmark or Cython extensions are missing. - Add test_utf8_invalid_bytes and test_ascii_invalid_bytes to confirm error propagation through the DriverException wrapper.
a9815c1 to
f0ce46c
Compare
(improvement) (Cython only) deserializers: use direct PyUnicode_DecodeUTF8/ASCII from C buffer pointer
Replace the two-step
to_bytes(buf).decode('utf8')pattern inDesUTF8TypeandDesAsciiTypewith direct CPython C API calls (PyUnicode_DecodeUTF8andPyUnicode_DecodeASCII). This eliminates an intermediatebytesobject allocation per text cell — the old code created a Python bytes object from the C buffer pointer viato_bytes(buf), then immediately decoded it tostrand discarded the bytes.Text (
UTF8Type/VarcharType) is the most common CQL column type, so this optimization applies to the majority of cells in typical workloads.Benchmark results
Measured on the same machine, back-to-back (Python 3.14.3, Cython 3.2.4, GCC 15.2.1,
--benchmark-disable-gc). Median times reported.All existing unit tests pass (15 Cython tests, 9 deserializer-specific including invalid-input error propagation tests).
Review comments addressed
All 3 Copilot review comments have been addressed:
test_utf8_decode_benchmark.pytoutf8_decode_benchmark.py(notest_prefix, won't be auto-collected) andpytest.importorskipguards added.tests/unit/cython/test_deserializers.py(decoupled from benchmark dependency).cythontestdecorator adopted instead of hand-rolledtry/except ImportError, consistent with the rest oftests/unit/cython/and respectingVERIFY_CYTHON=TrueCI mode.Pre-review checklist
./docs/source/.Fixes:annotations to PR description.