Skip to content

(improvement) (Cython only) deserializers: use direct PyUnicode_DecodeUTF8/ASCII fr… (tens hundreds of us improvement, x2.5-x3 speedup)#743

Draft
mykaul wants to merge 2 commits into
scylladb:masterfrom
mykaul:perf/direct-utf8-decode
Draft

(improvement) (Cython only) deserializers: use direct PyUnicode_DecodeUTF8/ASCII fr… (tens hundreds of us improvement, x2.5-x3 speedup)#743
mykaul wants to merge 2 commits into
scylladb:masterfrom
mykaul:perf/direct-utf8-decode

Conversation

@mykaul
Copy link
Copy Markdown

@mykaul mykaul commented Mar 13, 2026

(improvement) (Cython only) deserializers: use direct PyUnicode_DecodeUTF8/ASCII from C buffer pointer

Replace the two-step to_bytes(buf).decode('utf8') pattern in DesUTF8Type and DesAsciiType with direct CPython C API calls (PyUnicode_DecodeUTF8 and PyUnicode_DecodeASCII). This eliminates an intermediate bytes object allocation per text cell — the old code created a Python bytes object from the C buffer pointer via to_bytes(buf), then immediately decoded it to str and discarded the bytes.

Text (UTF8Type/VarcharType) is the most common CQL column type, so this optimization applies to the majority of cells in typical workloads.

Benchmark results

Measured on the same machine, back-to-back (Python 3.14.3, Cython 3.2.4, GCC 15.2.1, --benchmark-disable-gc). Median times reported.

Scenario Before (master) After (this PR) Speedup
UTF8 1row x 1col short (11B) 1,158 ns 454 ns 2.55x
UTF8 1row x 10col short 3,262 ns 1,505 ns 2.17x
UTF8 100rows x 5col medium (46B) 129,349 ns 36,270 ns 3.57x
UTF8 1000rows x 5col medium 1,131,159 ns 322,309 ns 3.51x
UTF8 100rows x 5col long (200B) 121,866 ns 40,662 ns 3.00x
UTF8 100rows x 5col multibyte 275,148 ns 113,751 ns 2.42x
ASCII 100rows x 5col medium 86,118 ns 36,401 ns 2.37x
ASCII 1000rows x 5col medium 832,972 ns 348,660 ns 2.39x
Mixed 100rows 3text+2int 82,798 ns 27,792 ns 2.98x

All existing unit tests pass (15 Cython tests, 9 deserializer-specific including invalid-input error propagation tests).

Review comments addressed

All 3 Copilot review comments have been addressed:

  1. Benchmark file renamed from test_utf8_decode_benchmark.py to utf8_decode_benchmark.py (no test_ prefix, won't be auto-collected) and pytest.importorskip guards added.
  2. Correctness tests moved to tests/unit/cython/test_deserializers.py (decoupled from benchmark dependency).
  3. cythontest decorator adopted instead of hand-rolled try/except ImportError, consistent with the rest of tests/unit/cython/ and respecting VERIFY_CYTHON=True CI mode.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@mykaul mykaul marked this pull request as draft March 13, 2026 11:48
@mykaul mykaul changed the title (improvement) deserializers: use direct PyUnicode_DecodeUTF8/ASCII fr… (improvement) (Cython only) deserializers: use direct PyUnicode_DecodeUTF8/ASCII fr… Mar 13, 2026
@mykaul mykaul requested a review from Copilot March 14, 2026 09:24
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Optimizes text deserialization performance in the Cython row parser by decoding UTF-8/ASCII directly from the underlying buffer, and adds a benchmark module intended to measure the improvement.

Changes:

  • Replace to_bytes(buf).decode(...) with PyUnicode_DecodeASCII/UTF8(buf.ptr, buf.size, NULL) in Cython deserializers.
  • Add a new pytest-based benchmark suite for UTF-8/ASCII decode and pipeline parsing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
cassandra/deserializers.pyx Uses CPython unicode decode APIs to avoid intermediate bytes allocations during ASCII/UTF-8 deserialization.
benchmarks/test_utf8_decode_benchmark.py Adds end-to-end and microbenchmarks plus correctness checks for the new decode approach.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread benchmarks/test_utf8_decode_benchmark.py Outdated
Comment thread benchmarks/test_utf8_decode_benchmark.py Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes Cython-based UTF-8 and ASCII text deserialization by decoding directly from the underlying C buffer (avoiding an intermediate bytes allocation), and adds correctness tests plus a benchmark to validate/measure the change.

Changes:

  • Switch DesUTF8Type/DesAsciiType to use PyUnicode_DecodeUTF8 / PyUnicode_DecodeASCII on Buffer.ptr/Buffer.size.
  • Add unit tests covering common and edge-case text decode scenarios (empty, ASCII, multibyte, long strings, NULLs, multiple rows).
  • Add a pytest-benchmark benchmark file to compare end-to-end Cython row parsing before/after the optimization.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
cassandra/deserializers.pyx Replace to_bytes(...).decode(...) with direct CPython Unicode decoding calls for UTF-8/ASCII deserializers.
tests/unit/cython/test_deserializers.py New correctness-focused unit tests for the optimized Cython decoding path.
benchmarks/utf8_decode_benchmark.py New benchmark suite to measure the performance impact of the UTF-8/ASCII decode optimization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/unit/cython/test_deserializers.py Outdated
@mykaul mykaul force-pushed the perf/direct-utf8-decode branch from 97eb4e8 to a9815c1 Compare April 2, 2026 14:30
mykaul added 2 commits April 2, 2026 20:07
…om C buffer pointer

Replace the two-step to_bytes(buf).decode('utf8') pattern in DesUTF8Type and
DesAsciiType with direct CPython C API calls (PyUnicode_DecodeUTF8 and
PyUnicode_DecodeASCII). This eliminates an intermediate bytes object
allocation per text cell — the old code created a Python bytes object from
the C buffer pointer via to_bytes(buf), then immediately decoded it to str
and discarded the bytes.

Text (UTF8Type/VarcharType) is the most common CQL column type, so this
optimization applies to the majority of cells in typical workloads.

Benchmark results (Cython row parsing pipeline, median times):

| Scenario                        | Before (original) | After (direct decode) | Speedup |
|---------------------------------|-------------------:|----------------------:|--------:|
| UTF8 1row x 1col short (11B)   |             565 ns |                454 ns |   1.24x |
| UTF8 1row x 10col short        |           1,594 ns |              1,023 ns |   1.56x |
| UTF8 100rows x 5col medium     |          61,396 ns |             28,766 ns |   2.13x |
| UTF8 1000rows x 5col medium    |         547,145 ns |            290,361 ns |   1.88x |
| UTF8 100rows x 5col long(200B) |          57,940 ns |             35,680 ns |   1.62x |
| UTF8 100rows x 5col multibyte  |         125,149 ns |            103,370 ns |   1.21x |
| ASCII 100rows x 5col medium    |          41,608 ns |             35,817 ns |   1.16x |
| ASCII 1000rows x 5col medium   |         416,350 ns |            374,341 ns |   1.11x |
| Mixed 100rows 3text+2int       |          44,646 ns |             31,189 ns |   1.43x |

All existing unit tests pass (62 type tests, 116 total across key suites).
…uards, add invalid-input tests

- Replace hand-rolled try/except ImportError with the project-standard
  cythontest decorator and HAVE_CYTHON conditional imports, so
  VERIFY_CYTHON=True CI mode fails loudly instead of silently skipping.
- Add pytest.importorskip guards to the benchmark file so it skips
  gracefully when pytest-benchmark or Cython extensions are missing.
- Add test_utf8_invalid_bytes and test_ascii_invalid_bytes to confirm
  error propagation through the DriverException wrapper.
@mykaul mykaul force-pushed the perf/direct-utf8-decode branch from a9815c1 to f0ce46c Compare April 2, 2026 17:07
@mykaul mykaul changed the title (improvement) (Cython only) deserializers: use direct PyUnicode_DecodeUTF8/ASCII fr… (improvement) (Cython only) deserializers: use direct PyUnicode_DecodeUTF8/ASCII fr… (hundreds to tens of thousands of ns improvement, x2.5-x3 speedup) Apr 7, 2026
@mykaul mykaul changed the title (improvement) (Cython only) deserializers: use direct PyUnicode_DecodeUTF8/ASCII fr… (hundreds to tens of thousands of ns improvement, x2.5-x3 speedup) (improvement) (Cython only) deserializers: use direct PyUnicode_DecodeUTF8/ASCII fr… (tens hundreds of us improvement, x2.5-x3 speedup) Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants