perf: inline row decoding and eliminate closures in recv_results_rows (100's to 1000's of ns, x1.3-1.8 speedup, Python only) by mykaul · Pull Request #765 · scylladb/python-driver

mykaul · 2026-03-25T20:01:48Z

Summary

Split recv_results_rows into fast path (no column encryption) and slow path (CE enabled)
Eliminate per-call closure allocation and merge two-pass row processing into single-pass decoding

Note: This optimization applies to the pure Python decode path only. When Cython extensions are compiled (the default for pip-installed packages), FastResultMessage from row_parser.pyx replaces recv_results_rows entirely. Users running without Cython (e.g., environments where C compilation is unavailable, or explicit use of _ProtocolHandler) will benefit from this change.

Details

Problem

The current recv_results_rows has three sources of overhead on every call:

Two passes over row data: First recv_row reads all raw bytes into a list[list[bytes]], then decode_row iterates again to deserialize — doubling iteration and creating intermediate lists that are immediately discarded.
Per-call closures: decode_val and decode_row are defined as closures inside recv_results_rows, meaning Python allocates new function objects on every result set.
Unconditional ColDesc creation: ColDesc namedtuples are built for every column even when column encryption is not configured (the vast majority of deployments).
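
The first two points can be illustrated with a toy example. This is a minimal sketch, not the driver's code: `decode_with_closures` and its inner helpers are hypothetical names, and a big-endian integer decode stands in for the real type deserializers. It shows that nested functions are rebuilt on every call and that the raw rows are fully materialized before the second pass begins.

```python
def decode_with_closures(raw_rows):
    """Toy model of the old shape: closures plus a second pass over raw rows."""

    # Both inner functions are created as new objects on every call.
    def decode_val(raw):
        return None if raw is None else int.from_bytes(raw, "big", signed=True)

    def decode_row(row):
        return tuple(decode_val(v) for v in row)

    # Second pass: raw_rows was already built in full by an earlier pass.
    return [decode_row(r) for r in raw_rows], decode_row


raw_rows = [[b"\x00\x00\x00\x01", None]]
decoded_a, fn_a = decode_with_closures(raw_rows)
decoded_b, fn_b = decode_with_closures(raw_rows)

# Same results, but a distinct function object was allocated for each call.
same_function_object = fn_a is fn_b
```
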

Solution

Fast path (no column encryption — the common case):

_decode_row_inline(f, colcount, col_types, protocol_version) reads each column's size, reads the bytes, and immediately calls from_binary() — one pass, no intermediate list
ColDesc creation is skipped entirely
No closures allocated
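
The single-pass idea can be sketched as follows. This is an illustration under stated assumptions rather than the code in this PR: `decode_row_inline_sketch`, `encode_cell`, and the two type classes are hypothetical stand-ins, and only the argument order `(f, colcount, col_types, protocol_version)` is taken from the description above. Each cell is a 4-byte big-endian signed length followed by that many bytes, with a negative length meaning NULL.

```python
import io
import struct


class Int32Type:
    """Stand-in for a driver type class; only from_binary() is modelled."""

    @staticmethod
    def from_binary(data, protocol_version):
        return struct.unpack(">i", data)[0]


class UTF8Type:
    """Stand-in for a text type class."""

    @staticmethod
    def from_binary(data, protocol_version):
        return data.decode("utf-8")


def decode_row_inline_sketch(f, colcount, col_types, protocol_version):
    """Read and decode one row in a single pass, with no intermediate raw list."""
    row = [None] * colcount
    for i in range(colcount):
        (size,) = struct.unpack(">i", f.read(4))
        if size >= 0:
            # Non-NULL cell: read the payload and decode it immediately.
            row[i] = col_types[i].from_binary(f.read(size), protocol_version)
        # Negative size means NULL: from_binary() is never called.
    return tuple(row)


def encode_cell(payload):
    """Helper for the example only: length-prefix a payload, or encode NULL."""
    if payload is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(payload)) + payload


buf = io.BytesIO(
    encode_cell(struct.pack(">i", 7)) + encode_cell(None) + encode_cell(b"hi")
)
row = decode_row_inline_sketch(buf, 3, [Int32Type, Int32Type, UTF8Type], 4)
```

The NULL branch does no decoding work at all, which is consistent with the larger gains reported for the 50% NULL scenarios.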

Slow path (column encryption enabled):

Preserves the existing two-pass logic (needed because CE must decrypt before type decoding)
decode_val/decode_row moved to module-level functions (_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead
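
A rough sketch of what moving the closures to module level involves: the state they used to capture is passed in as explicit parameters. Everything here is assumed for illustration. `ToyPolicy` is a hypothetical stand-in with an XOR "cipher", the local `ColDesc` namedtuple and its field names are an assumption, and the function signatures are not taken from the PR.

```python
import struct
from collections import namedtuple

# Local stand-in for the driver's column descriptor (field names assumed).
ColDesc = namedtuple("ColDesc", ["ks", "table", "col"])


class Int32Type:
    """Stand-in for a driver type class."""

    @staticmethod
    def from_binary(data, protocol_version):
        return struct.unpack(">i", data)[0]


class ToyPolicy:
    """Hypothetical encryption policy; XOR is used only to keep the demo short."""

    def __init__(self, encrypted_cols):
        self._cols = set(encrypted_cols)

    def contains_column(self, desc):
        return desc in self._cols

    def decrypt(self, desc, raw):
        return bytes(b ^ 0x5A for b in raw)


def decode_val_ce(raw, col_type, desc, policy, protocol_version):
    """Module-level function: defined once, former closure state passed in."""
    if raw is None:
        return None
    if policy.contains_column(desc):
        raw = policy.decrypt(desc, raw)  # decrypt before type decoding
    return col_type.from_binary(raw, protocol_version)


def decode_row_ce(raw_row, col_types, col_descs, policy, protocol_version):
    """Second pass over a raw row that was read in full beforehand."""
    return tuple(
        decode_val_ce(raw, t, d, policy, protocol_version)
        for raw, t, d in zip(raw_row, col_types, col_descs)
    )


descs = [ColDesc("ks", "tbl", "plain"), ColDesc("ks", "tbl", "secret")]
policy = ToyPolicy([descs[1]])
secret_raw = bytes(b ^ 0x5A for b in struct.pack(">i", 42))
raw_row = [struct.pack(">i", 1), secret_raw]
row = decode_row_ce(raw_row, [Int32Type, Int32Type], descs, policy, 4)
```
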

Benchmark results

Measured on CPython 3.14.3, Protocol V4, 300 iterations, 100 warmup. All values in nanoseconds per row.

Scenario	Master (min ns/row)	PR (min ns/row)	Master (median ns/row)	PR (median ns/row)	Speedup (min)	Speedup (median)
5 int cols, 10 rows	2677	1911	3192	2558	1.40x	1.25x
5 int cols, 100 rows	2155	1489	2877	1908	1.45x	1.51x
5 int cols, 1000 rows	2675	1848	3165	2260	1.45x	1.40x
5 mixed cols, 100 rows	2625	2024	3225	2203	1.30x	1.46x
5 mixed cols, 1000 rows	2942	1926	3880	2118	1.53x	1.83x
10 int cols, 100 rows, 50% NULL	4666	3095	5284	3314	1.51x	1.59x
10 int cols, 1000 rows, 50% NULL	4812	2737	6156	3166	1.76x	1.94x
10 int cols, 100 rows, no NULL	5082	3826	5339	4201	1.33x	1.27x
10 int cols, 1000 rows, no NULL	5116	3647	6184	4589	1.40x	1.35x

Across these scenarios the pure Python path is 1.30x to 1.76x faster on minimum timings and 1.25x to 1.94x faster on medians. The largest gains are in the NULL-heavy scenarios, because the inline path skips from_binary() entirely for negative-length (NULL) columns.
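
For readers who want to reproduce this kind of measurement, per-row minimum and median timings could be collected with a harness along these lines. This is a hypothetical sketch, not the benchmark script used for the table: `bench_ns_per_row` and the dummy workload are made up, and only the 300 iterations and 100 warmup runs come from the description above.

```python
import statistics
import time


def bench_ns_per_row(func, rows_per_call, iterations=300, warmup=100):
    """Return (min, median) nanoseconds per row for func()."""
    for _ in range(warmup):
        func()  # warm caches before timing
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        func()
        samples.append((time.perf_counter_ns() - start) / rows_per_call)
    return min(samples), statistics.median(samples)


def dummy_decode():
    """Placeholder workload: 100 rows of 5 small integers each."""
    return [tuple(int.from_bytes(b"\x00\x00\x00\x07", "big") for _ in range(5))
            for _ in range(100)]


min_ns, median_ns = bench_ns_per_row(dummy_decode, rows_per_call=100)

# Speedup is the baseline time divided by the new time, e.g. the first table row:
speedup_min = round(2677 / 1911, 2)
```
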

Merge conflict note

⚠️ This PR modifies the same recv_results_rows method as PR #630, which also splits the method into CE/non-CE branches. If both PRs are accepted, there will be a merge conflict requiring manual resolution.

Testing

All 651 existing unit tests pass (16 pre-existing skips)
Added test for decode error wrapping in the inline path (test_protocol.py)

Split recv_results_rows into fast path (no column encryption) and slow path (column encryption enabled):

Fast path (common case):
- Reads raw column bytes and decodes types in a single pass per row via _decode_row_inline(), eliminating the intermediate list-of-lists
- Skips ColDesc namedtuple creation entirely (only needed for CE)
- No closure allocation per call
- Wraps decode errors with column name/type info for diagnostics

Slow path (column encryption):
- Preserves full CE logic with ColDesc creation
- Moves decode_val/decode_row closures to module-level functions (_decode_val_ce, _decode_row_ce) to avoid per-call closure overhead

Note: This PR modifies the same method as PR scylladb#630 (which also splits recv_results_rows into CE/non-CE branches). There will be a merge conflict that needs manual resolution if both PRs are accepted.

mykaul marked this pull request as draft March 25, 2026 20:33

mykaul force-pushed the perf/inline-row-decode branch from 3c3fea8 to 020c764 Compare April 7, 2026 10:57

mykaul changed the title ~~perf: inline row decoding and eliminate closures in recv_results_rows~~ perf: inline row decoding and eliminate closures in recv_results_rows (100's to 1000's of ns, x1.3-1.8 speedup) Apr 7, 2026

mykaul changed the title ~~perf: inline row decoding and eliminate closures in recv_results_rows (100's to 1000's of ns, x1.3-1.8 speedup)~~ perf: inline row decoding and eliminate closures in recv_results_rows (100's to 1000's of ns, x1.3-1.8 speedup, Python only) Apr 7, 2026

mykaul force-pushed the perf/inline-row-decode branch from 020c764 to 30d3a44 Compare April 9, 2026 17:19

mykaul wants to merge 1 commit into scylladb:master from mykaul:perf/inline-row-decode
