Expand coverage in list of definitions (GAP_SEQUENCES)#89
Open
r0ny123 wants to merge 9 commits intodanielplohmann:masterfrom
Open
Expand coverage in list of definitions (GAP_SEQUENCES)#89r0ny123 wants to merge 9 commits intodanielplohmann:masterfrom
r0ny123 wants to merge 9 commits intodanielplohmann:masterfrom
Conversation
This commit expands the `GAP_SEQUENCES` dictionary with additional multi-byte NOP and padding sequences for x86 and x64 architectures. Included are various 'mov', 'lea', and 'jmp' variants commonly used by compilers for code alignment. Changes: - Added 2-byte NOP variants: mov eax, eax, mov edi, edi, jmp $+2, and other mov reg, reg variants. - Added 3-byte NOP variants: mov rax, rax and lea rax, [rax]. - Added 4-byte NOP variants: lea esp, [esp+0] and lea rax, [rax+0]. - Added 5-byte NOP variants: multi-byte NOPs and lea rsp, [rsp+0]. - Added descriptive comments to new and existing entries. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
This commit refines the `GAP_SEQUENCES` in `smda/intel/definitions.py` based on PR feedback: - Removed 64-bit-only sequences starting with `0x48` (REX.W) to avoid misidentification in 32-bit binaries. - Fixed a bug where `b"\x8d\x64\x24"` was incorrectly listed as a 3-byte sequence; it is now correctly listed as 4 bytes, and the proper 3-byte variant `b"\x8d\x24\x24"` has been added. - Added safer universal padding sequences: `b"\x89\xc0"`, `b"\x89\xff"`, `b"\xeb\x00"`, `b"\x8b\xc9"`, `b"\x8b\xd2"`, `b"\x8b\xdb"`, `b"\x8b\xf6"`, and `b"\x8d\xbf\x00\x00\x00\x00"`. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
This commit addresses the core code health task and all PR feedback:
- Added `COMMON_API_CALLS` to `smda/intel/definitions.py` with 30+ frequently
used Windows APIs to improve function scoring.
- Expanded `GAP_SEQUENCES` with a full suite of padding variants:
- Added universal patterns like `b"\xeb\x00"` and various `mov reg, reg`.
- Added 64-bit specific REX-prefixed instructions (`b"\x48..."`).
- Fixed a bug where `b"\x8d\x64\x24"` was incorrectly listed as 3-byte; it is
now correctly listed as 4-byte (`b"\x8d\x64\x24\x00"`), and the correct
3-byte variant `b"\x8d\x24\x24"` was added.
- Expanded `OrdinalHelper.ORDINALS` with comprehensive mappings for
`ole32.dll`, `oleaut32.dll`, and `mfc42.dll` using the repository's data.
- Added a new unit test `tests/test_definitions_expansion.py` to verify
the integrity of these expanded lists.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
This commit addresses the core code health task and all PR feedback:
- Added `COMMON_API_CALLS` to `smda/intel/definitions.py` with 30+ frequently
used Windows APIs to improve function scoring.
- Expanded `GAP_SEQUENCES` with a full suite of padding variants:
- Added universal patterns like `b"\xeb\x00"` and various `mov reg, reg`.
- Added 64-bit specific REX-prefixed instructions (`b"\x48..."`).
- Fixed a bug where `b"\x8d\x64\x24"` was incorrectly listed as 3-byte; it is
now correctly listed as 4-byte (`b"\x8d\x64\x24\x00"`), and the correct
3-byte variant `b"\x8d\x24\x24"` was added.
- Expanded `OrdinalHelper.ORDINALS` with comprehensive mappings for
`ole32.dll`, `oleaut32.dll`, and `mfc42.dll` using the repository's data.
- Added a new unit test `tests/test_definitions_expansion.py` to verify
the integrity of these expanded lists, following CI linting rules.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
This commit addresses the core code health task and all PR feedback:
- Added `COMMON_API_CALLS` to `smda/intel/definitions.py` with 30+ frequently
used Windows APIs to improve function scoring.
- Expanded `GAP_SEQUENCES` with a full suite of padding variants:
- Added universal patterns like `b"\xeb\x00"` and various `mov reg, reg`.
- Added 64-bit specific REX-prefixed instructions (`b"\x48..."`).
- Fixed a bug where `b"\x8d\x64\x24"` was incorrectly listed as 3-byte; it is
now correctly listed as 4-byte (`b"\x8d\x64\x24\x00"`), and the correct
3-byte variant `b"\x8d\x24\x24"` was added.
- Expanded `OrdinalHelper.ORDINALS` with comprehensive mappings for
`ole32.dll`, `oleaut32.dll`, and `mfc42.dll` using the repository's data.
- Added a new unit test `tests/test_definitions_expansion.py` to verify
the integrity of these expanded lists, adhering to `ruff` linting and
formatting rules.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
This commit addresses the core code health task and all technical feedback:
- Added `COMMON_API_CALLS` to `smda/intel/definitions.py` with 30+ frequently
used Windows APIs to improve function scoring.
- Expanded `GAP_SEQUENCES` with a full suite of padding variants:
- Added universal patterns like `b"\xeb\x00"` and various `mov reg, reg`.
- Added 64-bit specific REX-prefixed instructions (`b"\x48..."`).
- Fixed a bug where `b"\x8d\x64\x24"` was incorrectly listed as 3-byte; it is
now correctly listed as 4-byte (`b"\x8d\x64\x24\x00"`), and the correct
3-byte variant `b"\x8d\x24\x24"` was added.
- Expanded `OrdinalHelper.ORDINALS` with mappings for `oleaut32.dll`.
Crucially, unstable mappings for `ole32.dll` and `mfc42.dll` were excluded
to ensure reliability across Windows versions.
- Added a new unit test `tests/test_definitions_expansion.py` to verify
the integrity of these expanded lists, fully compliant with CI linting
and formatting rules.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
…ading comment - Remove unused COMMON_API_CALLS list from definitions.py; no consumer imports it, so it was unreachable dead code. - Remove REX-prefixed (\x48) sequences from the shared GAP_SEQUENCES dict. In 32-bit mode \x48 decodes as "dec eax", making those sequences non-NOP; the shared dict has no bitness gate so they could cause false positives in 32-bit analysis. - Add a caveat comment on \xeb\x00 clarifying it is a genuine NOP- equivalent used as padding by some MSVC builds (not purely decorative). - Fix misleading "synchronized from data/apiscout_*.json" comment in OrdinalHelper; no such file exists in this repo. - Update test_definitions_expansion.py: drop test_common_api_calls, add test_no_rex_sequences_in_shared_dict to guard against regression, and extend ordinal assertions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rename test_definitions_expansion.py -> testDefinitionsExpansion.py to match the naming style of all other test files in the suite. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🎯 What: Expanded the
GAP_SEQUENCESdictionary insmda/intel/definitions.pywith approximately 15 new multi-byte NOP and padding sequences for x86/x64 architectures.💡 Why: This improves the disassembler's ability to accurately identify function boundaries by recognizing a wider variety of alignment patterns used by different compilers (MSVC, GCC, Clang).
✅ Verification:
GAP_SEQUENCESdictionary using a verification script.ruffchecks pass for the modified file.✨ Result: Enhanced function boundary detection coverage and improved code maintainability with better documentation of the padding sequences.