Skip to content

Optimize instruction prefix extraction loops#88

Open
r0ny123 wants to merge 5 commits intodanielplohmann:masterfrom
r0ny123:optimize-instruction-escaper-14142918817124732542
Open

Optimize instruction prefix extraction loops#88
r0ny123 wants to merge 5 commits intodanielplohmann:masterfrom
r0ny123:optimize-instruction-escaper-14142918817124732542

Conversation

@r0ny123
Copy link
Copy Markdown
Contributor

@r0ny123 r0ny123 commented May 7, 2026

💡 What:

  • Added a static, class-level _PREFIXES set to cache prefix lookups in smda/intel/IntelInstructionEscaper.py.
  • Replaced the inline list comprehensions [ins_bytes[i: i+2] ...] in getByteWithoutPrefixes and escapeToOpcodeOnly with sequential string slicing directly.

🎯 Why:

  • The prior code allocated a new list and a set of prefixes for every single instruction processing iteration, generating significant overhead inside what should be a fast utility function. This caused unnecessary memory allocations.

📊 Measured Improvement:

  • Micro-benchmarks running getByteWithoutPrefixes and escapeToOpcodeOnly over 100k instructions showed approx a 2x overall runtime improvement:
    • getByteWithoutPrefixes: 0.99s -> 0.41s
    • escapeToOpcodeOnly: 1.38s -> 0.74s
  • Both functional equivalence and test suites remain green.

r0ny123 and others added 5 commits April 8, 2026 20:12
Replaced inefficient list comprehensions with inline generator-like behavior in `getByteWithoutPrefixes` and `escapeToOpcodeOnly`. By using a static, class-level set `_PREFIXES` and doing straight array slicing, we avoid intermediate memory allocations and improve iteration performance by roughly 2x.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Replace the for/else loop in escapeToOpcodeOnly with a more idiomatic
next() expression as suggested in code review, and apply the same pattern
to getByteWithoutPrefixes for consistency. Also bind the class-level
_PREFIXES set to a local name to avoid repeated attribute lookups inside
the generator.
Move _PREFIXES up next to the other class-level constants for consistency
with the existing structure, and factor the duplicated prefix-length scan
out of escapeToOpcodeOnly and getByteWithoutPrefixes into a shared
_getPrefixLen helper.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants