Minor performance improvements in hot paths#263
Conversation
|
Sorry I missed this I will take a look soon |
|
@coderabbitai do an initial review |
|
✅ Actions performedReview triggered.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR refactors the binary memcached protocol to use precompiled struct.Struct packers for command headers, appends variable payloads manually, fast-paths pickle serialization/deserialization for the standard pickler, and adds tests for numeric multi-set, very large numeric, and non-ASCII key behavior. ChangesProtocol Struct Packer Refactoring
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bmemcached/protocol.py`:
- Around line 777-784: The bulk set paths call self.serialize(...) but may get a
str for numeric scalars and then append it to a bytearray, causing TypeError;
update the bulk branches used by set_multi and set_multi_cas to coerce the
serialized payload to bytes (e.g., after calling serialize in the block that
builds msg with pack_set_prefix/MAGIC_REQ/opcode and before msg += value) by
checking for str and encoding (UTF-8) or otherwise ensuring bytes, mirroring the
approach used in _set_add_replace; apply the same change to the other similar
block (the second bulk path mentioned).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 15e4db7c-ef67-4d06-b941-e7d1ed66ea0b
📒 Files selected for processing (1)
bmemcached/protocol.py
The struct format spec for incr/decr was sized as COMMANDS[command]['struct'] % len(key), but the value packed into that slot is keybytes (UTF-8). For a non-ASCII key the byte length exceeds the codepoint count, the format spec under-sizes the field, and struct.pack silently truncates the encoded key. The header keylen advertises the full byte length, so the resulting wire packet is shorter than the server expects -- the server blocks reading the "missing" key bytes while the client blocks waiting for a response, deadlocking the connection.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bmemcached/protocol.py`:
- Around line 424-426: The current branch in the deserialization code uses "if
self.unpickler is pickle.Unpickler" which fails when self.unpickler is None and
leads to a TypeError; update the logic in the method that reads values (the
block referencing self.unpickler and pickle.Unpickler) to treat None as the
default Pickle behavior—i.e. if self.unpickler is None or self.unpickler is
pickle.Unpickler use pickle.loads(value); otherwise assume self.unpickler is a
callable/class and invoke it as before with BytesIO(value)). Ensure you check
for None explicitly (or use truthiness) before calling
self.unpickler(BytesIO(value)).load().
- Around line 378-384: The serialize branch assumes self.pickler is callable but
it can be None; change the condition in the serialize routine to treat None as
the default Pickler (e.g., check "if self.pickler is None or self.pickler is
pickle.Pickler") and use pickle.dumps in that case, otherwise use the custom
pickler wrapper; apply the analogous fix in deserialize by treating
self.unpickler None as pickle.Unpickler (e.g., "if self.unpickler is None or
self.unpickler is pickle.Unpickler") and use pickle.loads for the default,
otherwise construct a BytesIO and call the custom unpickler to load the value.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 77fa5f5d-ee39-4178-9236-87cb26fe700a
📒 Files selected for processing (2)
bmemcached/protocol.pytest/test_simple_functions.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bmemcached/protocol.py`:
- Around line 378-384: serialize() currently may pass Python3 str objects into
zlib.compress when numeric scalars were converted to strings; update serialize()
so that any payload that is a plain str (scalar string representations produced
for int/long) is encoded to bytes (e.g., utf-8) before any pickling/ compression
steps. Locate the block that checks self.pickler (references: serialize(),
pickler, pickle.Pickler) and ensure after pickling logic you normalize value to
bytes if it's an instance of str, so zlib.compress and downstream functions used
by set(), set_multi(), and set_multi_cas() always receive bytes (also consider
cases where custom pickler returns str). Ensure encoding is applied consistently
before compression calls (zlib.compress).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 04ccc81b-0316-40ce-b08d-15b1ca7be2bf
📒 Files selected for processing (1)
bmemcached/protocol.py
For int/long values serialize() did value = str(value), producing a
Python 3 str. When that string is large enough to cross the
COMPRESSION_THRESHOLD it falls into self.compression.compress(value),
which rejects str with "TypeError: a bytes-like object is required".
Encode to bytes inline in the int/long branches so every assignment
to value in the type-dispatch already produces bytes; the binary,
text, and pickler branches all already do.
Reachable today via e.g. set('k', 10 ** 200): str(10 ** 200) is 201
chars, exceeds the 128-byte threshold, hits the compress branch, and
raises before any caller-side coercion runs. The downstream
text_type-to-bytes guards in _set_add_replace, set_multi, and
set_multi_cas only cover the small-value path where compression is
skipped; they cannot save the compress branch because the crash is
inside serialize() itself. Drop the now-redundant guard in
_set_add_replace.
The function's :rtype: str was already inaccurate -- the text, binary,
and pickler paths returned bytes; only int/long returned str. Update
it to bytes now that the contract actually holds across every branch.
Each entry in the COMMANDS table now carries a precompiled
struct.Struct for the fixed-size prefix of its wire format
(HEADER_STRUCT plus any leading "extras" bytes); variable-length
tails (key, value, auth payloads) are concatenated as bytes after
packer.pack(...).
Previously each call site built a fresh format string from
HEADER_STRUCT + the per-command 'struct' suffix and substituted in
the per-call lengths via "%". This costs a string concat plus a %
format on every call, and -- because the resulting format string
embeds the per-call lengths ('17s', '23s', ...) -- defeats the LRU
cache that the struct module keeps over compiled formats. Once the
working set of distinct lengths exceeds that cache (100 entries by
default in CPython), every call recompiles its format from scratch.
The hot build loops in get_multi, set_multi, and set_multi_cas paid
this on every key; additionally bind packer.pack and the relevant
COMMANDS / MAGIC / STATUS lookups to locals.
Microbench (500-key get_multi request build, no network): 1181us
-> 441us, ~2.7x.
…fault. The serialize / deserialize paths construct a BytesIO and a Pickler or Unpickler instance per call to support a user-overridable pickler class. When self.(un)pickler is pickle.(Un)Pickler -- the default -- this is equivalent to pickle.dumps / pickle.loads, which are implemented in C (_pickle) and skip the Python-level allocation. Microbench (round-trip a small dict): BytesIO+Unpickler.load 3.08us vs pickle.loads 1.59us, ~1.94x.
|
thank you @alexmv |
Summary by CodeRabbit
Refactor
Tests