Minor performance improvements in hot paths by alexmv · Pull Request #263 · jaysonsantos/python-binary-memcached

alexmv · 2026-04-29T16:37:53Z

Summary by CodeRabbit

Refactor
- Optimized binary protocol packing and request framing for better performance and more efficient serialization/deserialization while maintaining compatibility.
- Improved multi-operation pipelining and response handling for bulk set/CAS and related commands.
Tests
- Added tests for multi-key set/CAS with numeric values (including very large integers) and round-trip verification.
- Added tests covering single- and multi-key operations with non‑ASCII keys (including CAS).

jaysonsantos · 2026-05-08T15:52:50Z

Sorry I missed this; I will take a look soon.

jaysonsantos · 2026-05-08T15:53:31Z

@coderabbitai do an initial review

coderabbitai · 2026-05-08T15:53:40Z

@jaysonsantos Sure! I'll kick off a review of this PR now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-05-08T15:53:46Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8b6e8d3d-c008-4328-b029-5798c8a5f73c

📥 Commits

Reviewing files that changed from the base of the PR and between dfa7275 and 1bb563c.

📒 Files selected for processing (2)

bmemcached/protocol.py
test/test_simple_functions.py

🚧 Files skipped from review as they are similar to previous changes (1)

test/test_simple_functions.py

📝 Walkthrough

Walkthrough

This PR refactors the binary memcached protocol to use precompiled struct.Struct packers for command headers, appends variable payloads manually, fast-paths pickle serialization/deserialization for the standard pickler, and adds tests for numeric multi-set, very large numeric, and non-ASCII key behavior.

Changes

Protocol Struct Packer Refactoring

Layer / File(s)	Summary
Imports `bmemcached/protocol.py`	Import updated to expose `long` and `pickle` from `bmemcached.compat`.
Command Packer Schema `bmemcached/protocol.py`	`Protocol.COMMANDS` replaced with precompiled `struct.Struct` packers per opcode.
Serialization Utilities `bmemcached/protocol.py`	`serialize()`/`deserialize()` fast-path to `pickle.dumps`/`pickle.loads` for standard pickler/unpickler; otherwise use `BytesIO` with configured pickler/unpickler.
Authentication `bmemcached/protocol.py`	`auth_negotiation()` and `auth_request()` pack fixed headers via packers and append variable method/auth payloads.
Read Operations `bmemcached/protocol.py`	`get()`, `noop()`, and `get_multi()` use packers; `get_multi()` precomputes key bytes, selects opcodes per key, and parses until sentinel.
Set/Add/Replace Core `bmemcached/protocol.py`	`_set_add_replace()` packs fixed header via selected packer and appends key + value tails.
Write Operations (Multi-key) `bmemcached/protocol.py`	`set_multi()` and `set_multi_cas()` use per-opcode packers; `set_multi` appends trailing `noop`, `set_multi_cas` reads a fixed number of responses equal to mappings.
Arithmetic Operations `bmemcached/protocol.py`	`_incr_decr()` uses incr/decr packers with key bytes appended.
Delete Operations `bmemcached/protocol.py`	`delete()` and `delete_multi()` use delete packer; multi-key variant appends noop sentinel and aggregates per-response results.
Administrative Operations `bmemcached/protocol.py`	`flush_all()` and `stats()` use flush/stat packers; `flush_all()` includes time in extras; `stats()` appends optional key bytes.
Tests `test/test_simple_functions.py`	Added `MemcachedTests.testSetMultiNumericValues`, `MemcachedTests.testNonAsciiKeySingle`, `MemcachedTests.testSetLargeNumeric`, and `MemcachedTests.testNonAsciiKeyBulk` validating numeric multi-set/CAS, very large numeric storage, and non-ASCII key handling in single and bulk ops.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through bytes and struct so neat,

packed headers steady, tails complete.
Pickle paths sped, numbers held tight,
non‑ASCII keys stored through the night.
A cheerful rabbit claps for the write!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 59.09% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main objective of the pull request—performance optimizations in frequently-executed code paths through struct packing and serialization improvements.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bmemcached/protocol.py`:
- Around line 777-784: The bulk set paths call self.serialize(...) but may get a
str for numeric scalars and then append it to a bytearray, causing TypeError;
update the bulk branches used by set_multi and set_multi_cas to coerce the
serialized payload to bytes (e.g., after calling serialize in the block that
builds msg with pack_set_prefix/MAGIC_REQ/opcode and before msg += value) by
checking for str and encoding (UTF-8) or otherwise ensuring bytes, mirroring the
approach used in _set_add_replace; apply the same change to the other similar
block (the second bulk path mentioned).
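The failure mode described in this comment can be reproduced outside the library. The following is a minimal sketch, not the PR's code: `append_payload` is a hypothetical helper name, and the two-byte prefix merely stands in for a packed header.

```python
def append_payload(msg, value):
    """Append a serialized value to a request buffer.

    Hypothetical helper: coerces str to UTF-8 bytes first, as the
    review comment suggests for the bulk set paths.
    """
    if isinstance(value, str):
        value = value.encode("utf-8")
    msg += value
    return msg


prefix = bytes([0x80, 0x01])  # stand-in for a packed header, not a real one

# Appending a str to a bytearray fails on Python 3.
try:
    buf = bytearray(prefix)
    buf += "42"
    str_append_failed = False
except TypeError:
    str_append_failed = True

# With the coercion in place the same value is appended as bytes.
fixed = append_payload(bytearray(prefix), "42")
```

Whether the coercion belongs at the call site or inside `serialize()` is a design choice; a later commit message in this thread argues for the latter.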

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 15e4db7c-ef67-4d06-b941-e7d1ed66ea0b

📥 Commits

Reviewing files that changed from the base of the PR and between 1d980c9 and d8c33cd.

📒 Files selected for processing (1)

bmemcached/protocol.py

The struct format spec for incr/decr was sized as COMMANDS[command]['struct'] % len(key), but the value packed into that slot is keybytes (UTF-8). For a non-ASCII key the byte length exceeds the codepoint count, the format spec under-sizes the field, and struct.pack silently truncates the encoded key. The header keylen advertises the full byte length, so the resulting wire packet is shorter than the server expects -- the server blocks reading the "missing" key bytes while the client blocks waiting for a response, deadlocking the connection.
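A small standalone sketch of the truncation described above, using only the standard `struct` module. The key is an arbitrary example, and the format strings are simplified to the key field alone rather than the library's full command format.

```python
import struct

key = "caf\u00e9"                 # 4 code points
keybytes = key.encode("utf-8")    # 5 bytes, since U+00E9 encodes to two bytes

# Field sized by code-point count: struct silently drops the trailing byte.
truncated = struct.pack("%ds" % len(key), keybytes)

# Field sized by encoded length keeps every byte.
full = struct.pack("%ds" % len(keybytes), keybytes)
```

Here `truncated` is one byte shorter than the key length a header would advertise, which is the mismatch that leaves both sides waiting.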

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bmemcached/protocol.py`:
- Around line 424-426: The current branch in the deserialization code uses "if
self.unpickler is pickle.Unpickler" which fails when self.unpickler is None and
leads to a TypeError; update the logic in the method that reads values (the
block referencing self.unpickler and pickle.Unpickler) to treat None as the
default Pickle behavior—i.e. if self.unpickler is None or self.unpickler is
pickle.Unpickler use pickle.loads(value); otherwise assume self.unpickler is a
callable/class and invoke it as before with BytesIO(value)). Ensure you check
for None explicitly (or use truthiness) before calling
self.unpickler(BytesIO(value)).load().
- Around line 378-384: The serialize branch assumes self.pickler is callable but
it can be None; change the condition in the serialize routine to treat None as
the default Pickler (e.g., check "if self.pickler is None or self.pickler is
pickle.Pickler") and use pickle.dumps in that case, otherwise use the custom
pickler wrapper; apply the analogous fix in deserialize by treating
self.unpickler None as pickle.Unpickler (e.g., "if self.unpickler is None or
self.unpickler is pickle.Unpickler") and use pickle.loads for the default,
otherwise construct a BytesIO and call the custom unpickler to load the value.
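The dispatch suggested in these two comments can be sketched as follows. This is an illustration under assumptions, not the library's implementation: `dumps_with`, `loads_with`, and `TracingPickler` are hypothetical names, and the protocol argument is chosen arbitrarily.

```python
import pickle
from io import BytesIO


def dumps_with(pickler, value, protocol=2):
    """Serialize value, treating None like the standard Pickler (hypothetical helper)."""
    if pickler is None or pickler is pickle.Pickler:
        return pickle.dumps(value, protocol)
    buf = BytesIO()
    pickler(buf, protocol).dump(value)
    return buf.getvalue()


def loads_with(unpickler, data):
    """Deserialize data, treating None like the standard Unpickler (hypothetical helper)."""
    if unpickler is None or unpickler is pickle.Unpickler:
        return pickle.loads(data)
    return unpickler(BytesIO(data)).load()


class TracingPickler(pickle.Pickler):
    """Stand-in for a user-supplied pickler class."""


value = {"a": 1, "b": [1, 2, 3]}
via_none = loads_with(None, dumps_with(None, value))
via_default = loads_with(pickle.Unpickler, dumps_with(pickle.Pickler, value))
via_custom = loads_with(None, dumps_with(TracingPickler, value))
```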

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 77fa5f5d-ee39-4178-9236-87cb26fe700a

📥 Commits

Reviewing files that changed from the base of the PR and between 7257cfa and 67730ec.

📒 Files selected for processing (2)

bmemcached/protocol.py
test/test_simple_functions.py

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bmemcached/protocol.py`:
- Around line 378-384: serialize() currently may pass Python3 str objects into
zlib.compress when numeric scalars were converted to strings; update serialize()
so that any payload that is a plain str (scalar string representations produced
for int/long) is encoded to bytes (e.g., utf-8) before any pickling/ compression
steps. Locate the block that checks self.pickler (references: serialize(),
pickler, pickle.Pickler) and ensure after pickling logic you normalize value to
bytes if it's an instance of str, so zlib.compress and downstream functions used
by set(), set_multi(), and set_multi_cas() always receive bytes (also consider
cases where custom pickler returns str). Ensure encoding is applied consistently
before compression calls (zlib.compress).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 04ccc81b-0316-40ce-b08d-15b1ca7be2bf

📥 Commits

Reviewing files that changed from the base of the PR and between 67730ec and dfa7275.

📒 Files selected for processing (1)

bmemcached/protocol.py

For int/long values serialize() did value = str(value), producing a Python 3 str. When that string is large enough to cross the COMPRESSION_THRESHOLD it falls into self.compression.compress(value), which rejects str with "TypeError: a bytes-like object is required".

Encode to bytes inline in the int/long branches so every assignment to value in the type-dispatch already produces bytes; the binary, text, and pickler branches all already do.

Reachable today via e.g. set('k', 10 ** 200): str(10 ** 200) is 201 chars, exceeds the 128-byte threshold, hits the compress branch, and raises before any caller-side coercion runs.

The downstream text_type-to-bytes guards in _set_add_replace, set_multi, and set_multi_cas only cover the small-value path where compression is skipped; they cannot save the compress branch because the crash is inside serialize() itself. Drop the now-redundant guard in _set_add_replace.

The function's :rtype: str was already inaccurate -- the text, binary, and pickler paths returned bytes; only int/long returned str. Update it to bytes now that the contract actually holds across every branch.
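The reachable case above can be reproduced with the standard library alone. This sketch assumes `zlib` as the compression module and uses a hypothetical `serialize_int` in place of the int/long branch of `serialize()`; the threshold is the 128-byte figure quoted in the commit message.

```python
import zlib

COMPRESSION_THRESHOLD = 128  # value quoted in the commit message


def serialize_int(value):
    """Hypothetical stand-in for the int/long branch of serialize()."""
    data = str(value).encode("ascii")  # encode inline so compress() sees bytes
    if len(data) > COMPRESSION_THRESHOLD:
        data = zlib.compress(data)
    return data


big = 10 ** 200
text = str(big)  # 201 characters, above the threshold

# Passing the str straight to zlib.compress fails on Python 3.
try:
    zlib.compress(text)
    str_rejected = False
except TypeError:
    str_rejected = True

payload = serialize_int(big)  # compressed bytes, no exception
```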

Each entry in the COMMANDS table now carries a precompiled struct.Struct for the fixed-size prefix of its wire format (HEADER_STRUCT plus any leading "extras" bytes); variable-length tails (key, value, auth payloads) are concatenated as bytes after packer.pack(...).

Previously each call site built a fresh format string from HEADER_STRUCT + the per-command 'struct' suffix and substituted in the per-call lengths via "%". This costs a string concat plus a % format on every call, and -- because the resulting format string embeds the per-call lengths ('17s', '23s', ...) -- defeats the LRU cache that the struct module keeps over compiled formats. Once the working set of distinct lengths exceeds that cache (100 entries by default in CPython), every call recompiles its format from scratch.

The hot build loops in get_multi, set_multi, and set_multi_cas paid this cost on every key. Additionally, bind packer.pack and the relevant COMMANDS / MAGIC / STATUS lookups to locals.

Microbench (500-key get_multi request build, no network): 1181us -> 441us, ~2.7x.
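The two request-building styles can be compared in isolation. This is a sketch under assumptions rather than the PR's code: the header format string is assumed to be the 24-byte memcached binary-protocol layout, and `build_get_old`, `build_get_new`, `PACK_HEADER`, and the opcode constant are illustrative names.

```python
import struct

HEADER_STRUCT = "!BBHBBHLLQ"  # assumed 24-byte binary-protocol header layout
MAGIC_REQ = 0x80              # request magic byte in the binary protocol
OPCODE_GET = 0x00             # get opcode in the binary protocol


def build_get_old(keybytes):
    """Per-call format string: a new format for every distinct key length."""
    fmt = HEADER_STRUCT + "%ds" % len(keybytes)
    return struct.pack(fmt, MAGIC_REQ, OPCODE_GET, len(keybytes),
                       0, 0, 0, len(keybytes), 0, 0, keybytes)


# Compiled once at import time; only the fixed-size prefix is packed.
PACK_HEADER = struct.Struct(HEADER_STRUCT).pack


def build_get_new(keybytes):
    """Precompiled header packer, with the variable-length key appended as bytes."""
    return PACK_HEADER(MAGIC_REQ, OPCODE_GET, len(keybytes),
                       0, 0, 0, len(keybytes), 0, 0) + keybytes


old_packet = build_get_old(b"hello")
new_packet = build_get_new(b"hello")
```

Both functions produce the same bytes; the second never creates a length-specific format string, so it does not depend on the struct module's format cache.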

…fault. The serialize / deserialize paths construct a BytesIO and a Pickler or Unpickler instance per call to support a user-overridable pickler class. When self.(un)pickler is pickle.(Un)Pickler -- the default -- this is equivalent to pickle.dumps / pickle.loads, which are implemented in C (_pickle) and skip the Python-level allocation. Microbench (round-trip a small dict): BytesIO+Unpickler.load 3.08us vs pickle.loads 1.59us, ~1.94x.
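The interchangeability claimed above can be checked by cross-loading: data written by one path is read back by the other. The sample value and the protocol number are arbitrary choices for illustration.

```python
import pickle
from io import BytesIO

value = {"a": 1, "b": [1, 2, 3]}

# Slow path: explicit BytesIO plus a Pickler instance per call.
buf = BytesIO()
pickle.Pickler(buf, 2).dump(value)
slow_bytes = buf.getvalue()

# Fast path: module-level helper.
fast_bytes = pickle.dumps(value, 2)

# Each path can read what the other wrote.
from_slow = pickle.loads(slow_bytes)
from_fast = pickle.Unpickler(BytesIO(fast_bytes)).load()
```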

jaysonsantos · 2026-05-11T13:20:26Z

thank you @alexmv

coderabbitai Bot reviewed May 8, 2026

Comment thread bmemcached/protocol.py

alexmv force-pushed the perf-hot-paths branch from d8c33cd to 7257cfa Compare May 8, 2026 19:01

alexmv force-pushed the perf-hot-paths branch from 7257cfa to 67730ec Compare May 8, 2026 21:04

coderabbitai Bot reviewed May 8, 2026

Comment thread bmemcached/protocol.py Outdated

Comment thread bmemcached/protocol.py Outdated

alexmv force-pushed the perf-hot-paths branch from 67730ec to dfa7275 Compare May 8, 2026 21:20

coderabbitai Bot reviewed May 8, 2026

Comment thread bmemcached/protocol.py

alexmv added 3 commits May 10, 2026 02:12

alexmv force-pushed the perf-hot-paths branch from dfa7275 to 1bb563c Compare May 10, 2026 02:13

jaysonsantos merged commit 72d2aaf into jaysonsantos:master May 11, 2026
7 checks passed
