decompiler: add missing ccall condition rewrites + simplifier passes + 8 codegen bug fixes#6182
zardus wants to merge 14 commits into
Add handlers for previously unsupported condition code / operation combinations in the AMD64, x86, and ARM ccall rewriters. These are the VEX helper calls (amd64g_calculate_condition, etc.) that the decompiler must rewrite into C-level comparisons. New coverage includes:

- AMD64: CondB/CondNB for ADD/ADC, CondBE/CondNBE for SUB/ADD, CondL/CondNL/CondLE/CondNLE for SUB/ADD, CondS/CondNS for logic/inc/dec, CondO/CondNO for ADD/SMUL, rflags_c for SUB/DEC
- x86: Mirror of AMD64 additions adapted for 32-bit ops, plus _fix_size for sub-width operations
- ARM: CondMI/CondPL (sign flag), CondVS/CondVC (overflow), CondHI/CondLS (unsigned >/<= via carry+zero)

Also fixes 4 bugs found by semantic testing:

- AMD64 CondLE used wrong operand order for SUB
- AMD64 rflags_c for ADC had inverted carry computation
- ARM CondGE/CondLT had swapped signed comparison operators
- VEX ccall helpers had incorrect flag extraction bitmasks

Includes comprehensive unit tests with Z3 semantic equivalence checking for all rewriter condition/operation combinations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
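To make the SUB semantics concrete, here is a minimal C sketch (not the angr implementation) that derives x86 flags from an 8-bit SUB by hand and checks the flag-based CondB, CondL, and CondLE against plain C comparisons for every input pair. Exhaustive enumeration stands in for the Z3 check, and flags8_t, sub8_flags, and the other names are hypothetical. The last function shows the CondLE bug class: with the operands swapped, the rewrite no longer matches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags produced by an 8-bit x86 SUB (argL - argR). Hypothetical helper. */
typedef struct { bool cf, zf, sf, of; } flags8_t;

static flags8_t sub8_flags(uint8_t argL, uint8_t argR) {
    uint8_t res = (uint8_t)(argL - argR);
    flags8_t f;
    f.cf = argL < argR;                                   /* borrow */
    f.zf = res == 0;
    f.sf = (res & 0x80) != 0;
    /* Signed overflow: operand signs differ and result sign != argL sign. */
    f.of = (((argL ^ argR) & (argL ^ res)) & 0x80) != 0;
    return f;
}

/* True if the flag formulas agree with C comparisons for all 65536 inputs. */
bool sub8_conditions_match(void) {
    for (int l = 0; l < 256; l++) {
        for (int r = 0; r < 256; r++) {
            flags8_t f = sub8_flags((uint8_t)l, (uint8_t)r);
            bool condB  = f.cf;                           /* unsigned <  */
            bool condL  = f.sf != f.of;                   /* signed   <  */
            bool condLE = f.zf || (f.sf != f.of);         /* signed   <= */
            if (condB  != ((uint8_t)l < (uint8_t)r)) return false;
            if (condL  != ((int8_t)l < (int8_t)r))   return false;
            if (condLE != ((int8_t)l <= (int8_t)r))  return false;
        }
    }
    return true;
}

/* Same CondLE check with argL and argR swapped in the C comparison. */
bool sub8_condle_swapped_matches(void) {
    for (int l = 0; l < 256; l++) {
        for (int r = 0; r < 256; r++) {
            flags8_t f = sub8_flags((uint8_t)l, (uint8_t)r);
            bool condLE = f.zf || (f.sf != f.of);
            if (condLE != ((int8_t)r <= (int8_t)l)) return false;
        }
    }
    return true;
}
```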
When the LLM refinement pass is enabled, the decompiler was creating phantom unified variables that didn't correspond to any actual variable in the function. This happened because the variable unification step ran before LLM refinement but the variable list wasn't updated after refinement removed some variables. Fix by refreshing the unified variable set after LLM refinement completes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three new decompiler optimization passes that simplify VEX flag-computation idioms into clean C equivalents:

1. OverflowBuiltinSimplifier: Rewrites paired overflow-check + conditional patterns (e.g., __OFADD__ followed by if-then) into GCC __builtin_add_overflow / __builtin_sub_overflow calls.
2. OverflowBuiltinPredicateSimplifier: Rewrites standalone overflow macro predicates (__OFADD__, __OFMUL__, etc.) that appear directly in conditions into __builtin_add_overflow_p / __builtin_mul_overflow_p.
3. CarryFlagSimplifier: Rewrites __CFADD__(a, b) != 0 patterns into the equivalent __builtin_add_overflow_p(a, b, (type)0), eliminating the IDA-style __CFADD__ macro from decompiled output.

All three passes are registered in both fast and full presets. Disabling them via the decompiler's simplifier blacklist restores the traditional IDA-style macro output for users who prefer it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
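As a rough illustration of why these rewrites preserve meaning, the sketch below spells out a 32-bit carry predicate and a signed-overflow predicate by hand and pairs each with __builtin_add_overflow, which both GCC and Clang provide. cfadd32 and ofadd32 are hypothetical stand-ins for the IDA-style macros, not their actual definitions. __builtin_add_overflow_p is GCC-only, so the sketch uses the pointer form, where the result type plays the role of the _p variant's typed third argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for __CFADD__(a, b): carry out of an unsigned 32-bit add. */
bool cfadd32(uint32_t a, uint32_t b) { return (uint32_t)(a + b) < a; }

/* Hypothetical stand-in for __OFADD__(a, b): signed 32-bit add overflow. */
bool ofadd32(int32_t a, int32_t b) {
    /* Add in uint32_t to avoid signed-overflow UB, then compare sign bits:
       overflow iff both operands share a sign that the result does not. */
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, res = ua + ub;
    return ((~(ua ^ ub) & (ua ^ res)) >> 31) != 0;
}

/* Builtin equivalents: the result type selects width and signedness. */
bool carry_via_builtin(uint32_t a, uint32_t b)  { uint32_t r; return __builtin_add_overflow(a, b, &r); }
bool overflow_via_builtin(int32_t a, int32_t b) { int32_t r;  return __builtin_add_overflow(a, b, &r); }
```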
Add a complete ARM64 (AArch64) ccall rewriter that translates VEX arm64g_calculate_condition helper calls into C-level comparisons. Covers all standard ARM64 condition codes:

- EQ/NE (zero flag)
- CS/CC (carry flag, unsigned >=/<)
- MI/PL (sign flag)
- VS/VC (overflow flag)
- HI/LS (unsigned >/<= via carry+zero)
- GE/LT/GT/LE (signed comparisons via sign^overflow)

Handles SUB, ADD, ADC, SBC, LOGIC, and shift operations.

Also fixes ARMHF ccall rewriter registration in __init__.py (was missing the ARM rewriter import for 32-bit ARM targets).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
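One detail worth keeping in mind when reading these handlers: after SUBS/CMP, AArch64 sets C to 1 when no borrow occurs, the inverse of x86's CF. The minimal sketch below (hypothetical helper names, 32-bit operands, not the rewriter's code) computes NZCV for a SUBS and expresses the condition codes listed above using their architectural flag definitions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } nzcv_t;

/* NZCV after a 32-bit SUBS a, b (i.e. CMP a, b). Hypothetical helper. */
static nzcv_t subs32(uint32_t a, uint32_t b) {
    uint32_t res = a - b;
    nzcv_t f;
    f.n = (res >> 31) != 0;
    f.z = res == 0;
    f.c = a >= b;                                /* C = NOT borrow on ARM */
    f.v = (((a ^ b) & (a ^ res)) >> 31) != 0;    /* signed overflow */
    return f;
}

/* Architectural definitions of the condition codes in terms of flags. */
bool cond_hi(nzcv_t f) { return f.c && !f.z; }        /* unsigned >  */
bool cond_ls(nzcv_t f) { return !f.c || f.z; }        /* unsigned <= */
bool cond_ge(nzcv_t f) { return f.n == f.v; }         /* signed   >= */
bool cond_lt(nzcv_t f) { return f.n != f.v; }         /* signed   <  */
bool cond_gt(nzcv_t f) { return !f.z && f.n == f.v; } /* signed   >  */
bool cond_le(nzcv_t f) { return f.z || f.n != f.v; }  /* signed   <= */
```

For example, subs32(0x80000000, 1) sets V, so LT holds (INT32_MIN < 1) even though N is clear.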
Add a comprehensive test suite that compiles small C functions designed to trigger specific condition code / operation combinations, decompiles them, and verifies the output. Includes:

- ccop_triggers source files (in angr-binaries repo) covering all AMD64/x86/ARM/ARM64 condition code operations: ADD, SUB, ADC, SBB, INC, DEC, logic ops, shifts, MUL, COPY, and rflags_c
- Parametrized pytest test that decompiles each function and checks for absence of raw ccall helpers in the output
- Z3-based semantic equivalence checking that verifies the decompiled C expression matches the VEX semantics for each condition
- ccop_report.py utility for generating coverage reports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a test that decompiles each ccop_trigger function, recompiles the decompiled C output with GCC, links it against the original compiled function, and compares outputs across a range of inputs to verify semantic equivalence. This catches issues that unit tests miss: signedness bugs, operator precedence errors, missing truncation casts due to C integer promotion rules, and type mismatches in builtin overflow calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
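The core of such a round-trip check is a differential loop over inputs. A minimal sketch, assuming both versions are linked into one binary and called through function pointers under the hypothetical names orig_f and decomp_f (the actual test's harness and input set may differ):

```c
#include <stddef.h>
#include <stdint.h>

/* A condition-style function under test: two 32-bit inputs, 0/1 output. */
typedef int (*cond_fn)(int32_t, int32_t);

/* Returns the index of the first input pair where the two versions
   disagree, or -1 if they agree on every pair. */
long first_mismatch(cond_fn orig_f, cond_fn decomp_f,
                    const int32_t *lhs, const int32_t *rhs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (orig_f(lhs[i], rhs[i]) != decomp_f(lhs[i], rhs[i]))
            return (long)i;
    }
    return -1;
}

/* Example pair: a signed "<" and a rendering that lost its signedness. */
int orig_lt(int32_t a, int32_t b)       { return a < b; }
int decomp_lt_bad(int32_t a, int32_t b) { return (uint32_t)a < (uint32_t)b; }
```

Negative inputs are what expose the signedness bug here: (1, 2) agrees, but (-1, 0) does not.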
When the C code generator renders CmpEQ(expr, 0) as !expr (the
cstyle_null_cmp shorthand), it was using the *binary* CmpEQ precedence
(9) to decide whether the LHS needs parentheses. Since CmpEQ has
lower precedence than Add (12), Mul (13), etc., a compound LHS like
Add(a, b) would never get wrapped:
CmpEQ(Add(a, b), 0) → !a + b (WRONG: means (!a) + b)
In C, unary ! (precedence 14) binds tighter than ANY binary operator,
so the correct output is !(a + b).
Fix: when emitting the ! prefix for the CmpEQ==0 shorthand, always
force parentheses around a CBinaryOp LHS regardless of precedence.
Fixes 13 ccop_trigger functions that had inverted condition semantics.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
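A two-function C sketch of the precedence issue (function names are illustrative, not codegen output):

```c
/* How each rendering of CmpEQ(Add(a, b), 0) actually evaluates in C. */
int rendered_wrong(int a, int b) { return !a + b; }   /* parses as (!a) + b */
int rendered_right(int a, int b) { return !(a + b); } /* tests the sum against 0 */
```

For a = 2, b = -2 the sum is zero, so the intended condition is true (1), but !a + b evaluates to 0 + (-2) = -2.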
The ccall rewriter's _fix_size() helper narrows 64-bit VEX temporaries to the operation's actual width (8/16/32-bit) by emitting Convert nodes. It unconditionally set is_signed=False on every Convert, which caused all narrowed operands to be typed as unsigned in the C output.

For signed condition codes (CondL, CondLE, CondNL, CondNLE, CondS, CondNS, CondO, CondNO), the comparison operands must be signed so that C's ordered comparison operators (<, <=, >, >=) use signed semantics. When both operands are unsigned, C's "usual arithmetic conversions" produce an unsigned comparison, which gives wrong results for negative values.

The fix has three parts:

1. _fix_size() now accepts a signed= parameter (default False for backwards compat) and threads it to the Convert node. All call sites under signed conditions pass signed=True.
2. CBinaryOp gains a _cmp_signed slot that records the AIL comparison's intended signedness. A new _cmp_signedness_cast() method emits explicit C casts like (int) or (long long) when the C operand type disagrees with the comparison signedness; this handles function parameters whose types come from the ABI and cannot be changed.
3. _propagate_cmp_signedness() on the code generator updates LOCAL variable and constant types to match the comparison signedness directly in the variable manager. This changes the variable declaration (e.g., "unsigned long long v1" → "long long v1") so no cast is needed in the comparison itself. Function parameters are explicitly skipped; _cmp_signedness_cast handles those.

This two-pronged approach (type propagation for locals, explicit casts for params) produces clean output for normal code while ensuring correctness for ccop functions where signedness matters.

Also fixes SimTypeInt128.c_repr() to emit "__int128" / "unsigned __int128" instead of the non-standard "int128_t", and adds 128-bit to the default_simtype_from_bits mapping.

Fixes 57 ccop_trigger functions that had unsigned args for signed conditions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
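The C rule behind this fix is easy to demonstrate. In the sketch below (hypothetical function names), the same bit patterns compare differently depending on whether the operands stay unsigned or are cast to a signed type first, which is the role of the (long long) casts described above. The narrowing pair uses signed char rather than plain char, because the signedness of plain char is implementation-defined.

```c
#include <stdbool.h>

/* Both operands unsigned: the usual arithmetic conversions give an unsigned compare. */
bool less_unsigned(unsigned long long a, unsigned long long b) { return a < b; }

/* Explicit casts, as for a signed comparison (CondL) on unsigned-typed params. */
bool less_signed(unsigned long long a, unsigned long long b) {
    return (long long)a < (long long)b;
}

/* Narrowing: the same 8-bit pattern 0x80 is 128 unsigned but -128 signed. */
int narrow_unsigned(unsigned int x) { return (unsigned char)x; }
int narrow_signed(unsigned int x)   { return (signed char)x; }
```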
When the C code generator renders CmpEQ(expr, 0) as !expr or
CmpNE(expr, 0) as just expr (the cstyle_null_cmp shorthand), the
result is wrong for 8-bit and 16-bit operations due to C's integer
promotion rules.
In C, arithmetic on char and short is implicitly promoted to int
before the operation. So for 8-bit operands:
unsigned char a = 0x80, b = 0x80;
if (!(a + b)) // WRONG: int(0x80) + int(0x80) = 0x100, non-zero
The programmer's intent (and the AIL semantics) is to test the 8-bit
result, which wraps to 0. The correct C is:
if (!(unsigned char)(a + b)) // RIGHT: truncate to 8-bit first
Fix: when emitting the !expr or bare expr shorthand for CmpEQ/CmpNE
against zero, and the common type of the comparison is narrower than
32 bits (i.e., char or short), emit an explicit truncation cast around
the LHS expression. This forces C to evaluate the narrow-width result
before the boolean test.
The cast type is derived from the comparison's common_type, so it
respects signedness (emitting "(char)" or "(unsigned char)" as
appropriate).
Fixes 25 ccop_trigger functions at 8-bit and 16-bit widths for
condz/condnz/conds/condns conditions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
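A runnable version of the example above, with a 16-bit variant added (function names are illustrative):

```c
#include <stdbool.h>

/* Without truncation: a + b is computed in int after integer promotion. */
bool is_zero_promoted(unsigned char a, unsigned char b) { return !(a + b); }

/* With the truncation cast: test the 8-bit result, which wraps to 0. */
bool is_zero_truncated(unsigned char a, unsigned char b) { return !(unsigned char)(a + b); }

/* Same issue at 16 bits: 0x8000 + 0x8000 is 0x10000 as an int. */
bool is_zero16_truncated(unsigned short a, unsigned short b) { return !(unsigned short)(a + b); }
```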
__builtin_add_overflow_p(a, b, (type)0) uses the type of the third
argument to determine the width and signedness of the overflow check.
The decompiler was emitting a bare 0 (which is int, i.e., signed 32-bit
on all of these targets), so:
- 8-bit overflow checks tested int overflow, not char overflow
- 16-bit overflow checks tested int overflow, not short overflow
- 64-bit overflow checks tested int overflow, not long long overflow
- Unsigned multiply (UMUL) overflow checks used signed semantics
The fix threads signedness information from the ccall rewriter through
to the C code generator via expression tags:
1. The ccall rewriter tags __OFADD__/__OFMUL__ calls with
"overflow_signed": True/False based on whether the operation is
ADD/SMUL (signed) or UMUL (unsigned).
2. The OverflowBuiltinPredicateSimplifier propagates this to the
zero constant's tags as "overflow_p_signed".
3. MakeTypecastsImplicit.handle_CFunctionCall is rewritten to:
- Skip prototype-based cast collapse for __builtin_*_overflow_p
calls (which would strip the intentional type-conveying cast).
- After processing, inject explicit casts on the third argument
when its type differs from int: e.g., (unsigned char)0 for
8-bit unsigned, (long long)0 for 64-bit signed.
- Cast the first two operands when their signedness disagrees
with the overflow check type (e.g., unsigned params need
(int) casts for signed ADD overflow).
Fixes 20 ccop_trigger functions with overflow_p type mismatches.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
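The effect of the type-carrying argument can be seen with __builtin_add_overflow / __builtin_mul_overflow, the pointer-result forms supported by both GCC and Clang. Per the GCC documentation, the _p variants follow the same rule but take the type from their third argument and ignore its value. A sketch with hypothetical function names:

```c
#include <stdbool.h>
#include <stdint.h>

/* 8-bit unsigned add: checked at the right width vs. against int. */
bool add_ovf_u8(uint8_t a, uint8_t b)  { uint8_t r; return __builtin_add_overflow(a, b, &r); }
bool add_ovf_int(uint8_t a, uint8_t b) { int r;     return __builtin_add_overflow(a, b, &r); }

/* 64-bit signed add: a 64-bit result vs. an int result. */
bool add_ovf_s64(int64_t a, int64_t b)        { int64_t r; return __builtin_add_overflow(a, b, &r); }
bool add_ovf_s64_as_int(int64_t a, int64_t b) { int r;     return __builtin_add_overflow(a, b, &r); }

/* UMUL vs SMUL at 8 bits: 16 * 8 = 128 fits in uint8_t but not in int8_t. */
bool mul_ovf_u8(uint8_t a, uint8_t b) { uint8_t r; return __builtin_mul_overflow(a, b, &r); }
bool mul_ovf_s8(int8_t a, int8_t b)   { int8_t r;  return __builtin_mul_overflow(a, b, &r); }
```

Each pair differs only in the result type, yet the pairs disagree on the inputs in the checks: 200 + 100 overflows uint8_t but not int, and INT32_MAX + 1 overflows int but not int64_t.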
Two bugs in the amd64g_calculate_rflags_c rewriter, plus a related one in
condition_processor.py:
**SUB carry flag (borrow):**
The existing SUB handler created a CmpLT node without explicit bits=
or tags, producing a comparison result at the operand width rather
than 1-bit. When this was wider than ccall.bits, the "if cf.bits ==
ccall.bits: return cf" early exit skipped the Convert, and the
comparison node (which evaluates to 0 or 1) was used directly as a
full-width integer. This caused the structurer to assign the wrong
branch polarity when threading the carry flag through if/else.
Fix: explicitly create a 1-bit CmpLT with tags, and always wrap in
Convert to ccall.bits. This matches the pattern used by the ADD
carry handler.
**DEC carry flag:**
DEC does not modify the carry flag — it preserves CF from the previous
operation, stored in ndep (the "old flags" VEX operand). The old code
extracted CF via (ndep & G_CC_MASK_C) >> G_CC_SHIFT_C, which produced
an opaque Shr(And(...)) tree that the simplifier couldn't reduce.
Fix: since CF is bit 0 of the x86-64 RFLAGS register, simplify to
(ndep & 1) which the C codegen renders cleanly.
**condition_processor.py __neg__ mapping:**
The claripy __neg__ operation (arithmetic negation, -x) was incorrectly
mapped to the AIL "Not" operator (logical/bitwise NOT, !x or ~x).
This corrupted conditions derived from negated carry flags. Fix: map
__neg__ to "Neg" (arithmetic negation) instead.
Fixes 4 ccop_trigger functions: rflagsc_sub_{32,64}, rflagsc_dec_{32,64}.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
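All three fixes reduce to simple C expressions. In this sketch (hypothetical names, not the rewriter's code), sub_carry64 is the 0/1 borrow that the SUB handler now produces and widens, and the DEC helpers read CF back from the preserved flags. CC_SHIFT_C / CC_MASK_C are local stand-ins mirroring the G_CC_MASK_C / G_CC_SHIFT_C extraction described above, with CF at bit 0. The last three functions show how far apart arithmetic negation and the two NOT operators are.

```c
#include <stdint.h>

/* Local stand-ins for the CF mask/shift (CF is bit 0 of RFLAGS). */
#define CC_SHIFT_C 0
#define CC_MASK_C  (1ULL << CC_SHIFT_C)

/* SUB: CF is the unsigned borrow, a 0/1 value widened to 64 bits. */
uint64_t sub_carry64(uint64_t argL, uint64_t argR) { return (uint64_t)(argL < argR); }

/* DEC preserves CF, so it is read from the old flags in ndep. */
uint64_t dec_carry_verbose(uint64_t ndep) { return (ndep & CC_MASK_C) >> CC_SHIFT_C; }
uint64_t dec_carry(uint64_t ndep)         { return ndep & 1; }

/* __neg__ is arithmetic negation; the two NOTs give different values. */
long arith_neg(long x)   { return -x; }
long bitwise_not(long x) { return ~x; }  /* equals -x - 1 in two's complement */
int  logical_not(long x) { return !x; }
```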
The AMD64 ccall rewriter had no handlers for the SBB (subtract with
borrow) VEX operation, causing raw _ccall(calculate_condition, ...)
expressions to appear in the decompiled output. These don't compile
as C.
SBB is used for extended-precision subtraction (e.g., 128-bit subtract
via two 64-bit operations). VEX encodes it as:
dep_1 = argL (left operand)
dep_2 = argR ^ oldCF (right operand XORed with old carry)
ndep = old RFLAGS (carries the previous carry flag in bit 0)
The actual computation is: result = dep_1 - (arg2 + carry), where
arg2 = dep_2 ^ carry and carry = ndep & 1.
A new _sbb_prep() helper extracts the carry flag, recovers the
original arg2, and computes the result at the correct narrow width.
This is shared by all SBB condition handlers.
New handlers for SBB operations:
- CondZ/CondNZ: result == 0 / result != 0
- CondL: signed less-than, using extended precision (double-width
sign-extended comparison to avoid overflow in the subtraction)
- CondS/CondNS: sign flag of the result (result < 0 / result >= 0)
- CondNBE: unsigned above (!CF && !ZF), computed as no-borrow AND
result-nonzero using extended precision for the borrow check
- CondBE: unsigned below-or-equal (CF || ZF), borrow OR result-zero
- CondNB: unsigned above-or-equal (!CF), no borrow via extended
precision comparison
Also removes incorrect signed=True from the rflags_c ADD and SUB
handlers — carry flag computation is always an unsigned comparison,
regardless of the condition context.
Fixes 29 ccop_trigger functions that had unrewritten SBB ccalls.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
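A minimal C sketch of the same operand recovery and conditions for a 32-bit SBB, assuming the VEX encoding described above. sbb32_prep and the condition helpers are hypothetical names, not the Python _sbb_prep() itself. The borrow and CondL checks are done in 64 bits so that argR + carry cannot wrap.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t argL, argR, carry, result; } sbb32_t;

/* Recover SBB operands from VEX's (dep1, dep2, ndep) encoding at 32 bits. */
sbb32_t sbb32_prep(uint64_t dep1, uint64_t dep2, uint64_t ndep) {
    sbb32_t s;
    s.carry  = (uint32_t)(ndep & 1);         /* old CF is bit 0 of the old flags */
    s.argL   = (uint32_t)dep1;
    s.argR   = (uint32_t)dep2 ^ s.carry;     /* undo dep_2 = argR ^ oldCF */
    s.result = s.argL - (s.argR + s.carry);  /* wraps modulo 2^32 */
    return s;
}

/* Borrow (CF), computed in 64 bits so argR + carry cannot wrap. */
static bool sbb32_cf(sbb32_t s) { return (uint64_t)s.argL < (uint64_t)s.argR + s.carry; }

bool sbb32_condz(sbb32_t s)   { return s.result == 0; }
bool sbb32_conds(sbb32_t s)   { return (int32_t)s.result < 0; }
bool sbb32_condnb(sbb32_t s)  { return !sbb32_cf(s); }
bool sbb32_condbe(sbb32_t s)  { return sbb32_cf(s) || sbb32_condz(s); }
bool sbb32_condnbe(sbb32_t s) { return !sbb32_cf(s) && !sbb32_condz(s); }

/* CondL (SF != OF): the exact, sign-extended result is negative. */
bool sbb32_condl(sbb32_t s) {
    return (int64_t)(int32_t)s.argL - (int64_t)(int32_t)s.argR - (int64_t)s.carry < 0;
}
```

For argL = 0x80000000, argR = 0, and carry = 1, the truncated result 0x7FFFFFFF is non-negative (CondS false), while CondL is true. This is why the signed comparison needs the wider width.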
Update the recompilability test to reflect all bug fixes:

- Simplify _classify() to only xfail remaining 8-bit char signedness issues (down from 190 xfails to 38)
- Add 2-to-1 arg harness adapter for logic-op functions where the decompiler correctly optimizes away one argument
- Add decomp_nargs detection to count actual decompiled parameters
- Fix pylint warnings: add check=False to subprocess.run calls, encoding="utf-8" to open calls, narrow exception catching
- Fix pyright regressions: explicit None checks for _func_args and _variables_in_use in _propagate_cmp_signedness

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR significantly enhances the angr decompiler's ability to translate VEX intermediate representation condition code calculations into clean C code. It adds ~60 new ccall condition rewriters for AMD64, x86, ARM, and ARM64 architectures, introduces 3 optimization passes for overflow handling, and fixes 8 code generation bugs discovered through recompilability testing.
Changes:
- Added comprehensive ccall rewriters for all major architectures (AMD64, x86, ARM32, ARM64)
- Implemented OverflowBuiltinSimplifier, OverflowBuiltinPredicateSimplifier, and CarryFlagSimplifier optimization passes
- Fixed 8 bugs in C code generation including signedness handling, operator precedence, and SBB support
- Added extensive test infrastructure with 3 new test files covering unit tests, integration tests, and round-trip recompilability
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test_recompilability.py | New 576-line round-trip test framework for verifying decompiled code can be recompiled |
| test_ccop_triggers.py | New 873-line integration test for ccall rewriter coverage |
| test_optimization_passes.py | Added 743 lines of unit tests for new optimization passes |
| test_decompiler_llm.py | Fixed variable collection to only use codegen-visible variables |
| test_semvar_naming.py | Updated regex to handle signedness casts in loop bounds |
| test_decompiler.py | Updated test expectation for CFADD rewrite |
| overflow_builtin_simplifier.py | New 377-line pass for OFADD/OFMUL pattern matching |
| overflow_builtin_p_simplifier.py | New 159-line pass for standalone overflow predicates |
| carry_flag_simplifier.py | New 141-line pass for CFADD rewriting |
| c.py | Enhanced CBinaryOp with signedness tracking and narrow cast absorption (195 new lines) |
| amd64_ccalls.py | Expanded from ~400 to ~1230 lines with comprehensive condition support |
| x86_ccalls.py | Expanded from ~300 to ~920 lines mirroring AMD64 additions |
| arm_ccalls.py | Added 170 lines of new condition handlers |
| arm64_ccalls.py | New 496-line complete ARM64 rewriter |
| sim_type.py | Fixed SimTypeInt128 to use __int128 instead of int128_t |
| ccall.py | Fixed VEX semantics bugs in UMUL/SMUL and ARM64 SBC |
| condition_processor.py | Fixed neg mapping from "Not" to "Neg" |
| decompiler.py | Updated LLM functions to only process codegen-visible variables |
Corpus decompilation diffs can be found at angr/dec-snapshots@master...angr/angr_6182
Corpus Decompilation Regression Analysis
Diff: angr/dec-snapshots@master...angr/angr_6182
Change breakdown
Bug 8 (new in this PR vs #6180) impact
Bug 8 (
Notably, Music_Store_Client/402530 (
Flagged for scrutiny
BudgIT/402de0 (
Comparison with PR #6180 baseline
PR #6180's corpus analysis found 42 files with the same Bug 3/4/type-name/struct-reorder patterns and 0 regressions. This PR (#6182) adds Bug 8, which manifests in ~6 files, all conservatively. The 2-file difference in count (40 vs 42) is within non-deterministic noise (struct reordering can cascade into different file sets).
🤖 Generated with Claude Code
❌ 1 test failed (1 ❄️ flaky test).
Two improvements to comparison rendering in the C code generator:

1. Absorb narrowing CTypeCast nodes into variable declarations. When the ccall rewriter produces narrow-width comparisons like CmpLTs(Convert(32->8, dep1), Convert(32->8, dep2)), the codegen rendered them as (char)v3 < (char)v4. New _try_narrow_cmp_operand() propagates the narrow type to the variable declaration (so unsigned int v3 becomes char v3) and strips the cast. Function parameters and struct field accesses are left alone.
2. Suppress no-op signedness casts on leaf operands. _cmp_signedness_cast emits explicit casts when operand signedness disagrees with the comparison, but SimTypeChar(signed=True) and SimTypeChar(signed=False) both render as "char" in C, making (char)x a no-op when x is already char. Only suppress when the operand is a simple leaf (CVariable, CVariableField, CConstant) not subject to C integer promotion; compound expressions like (char)(a0 + a1) keep the cast since it is a real truncation from int-promoted width.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Branch updated from da85c19 to 8d6de59.
Reopening as new PR to retrigger corpus analysis after squashing Bug 8+9 commits.
(AI COMMENT)
Summary
Companion PR
Requires angr/binaries#163 (ccop_triggers test binaries).
What changed
New ccall condition rewrites
The ccall rewriter translates VEX helper calls (amd64g_calculate_condition, etc.) into C-level comparisons. Many condition/operation combinations were unhandled and fell through to raw _ccall(calculate_condition, ...) in the decompiled output. New coverage includes:
_fix_size for sub-width operations
New optimization passes
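- OverflowBuiltinSimplifier: paired overflow-check + conditional patterns (e.g., __OFADD__ followed by if-then) into __builtin_add_overflow / __builtin_sub_overflow
- OverflowBuiltinPredicateSimplifier: standalone overflow predicates into __builtin_add_overflow_p / __builtin_mul_overflow_p
- CarryFlagSimplifier: __CFADD__(a, b) != 0 into __builtin_add_overflow_p(a, b, (type)0)

All registered in fast and full presets. Disabling via the simplifier blacklist restores IDA-style macro output.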
Bugs fixed in existing code
These were found by the recompilability round-trip test, which decompiles ccop_trigger functions, recompiles with GCC, and compares outputs against the originals, plus corpus decompilation analysis.
- _fix_size() always emits unsigned Convert nodes, breaking signed conditions
- __builtin_*_overflow_p third arg type mismatch for non-32-bit widths
- !x + y instead of !(x + y)
- __neg__ mapped to Not instead of Neg

Bug fix details

Bug 6 (operator precedence): When rendering CmpEQ(Add(a,b), 0) as !a + b, the code used the binary CmpEQ precedence to decide about parentheses, but the effective operator is the unary ! (higher precedence). Fix: always parenthesize a compound LHS in the !expr shorthand.

Bug 3 (signedness): _fix_size() unconditionally set is_signed=False on narrowing Convert nodes. Fix: _fix_size() gains a signed= parameter threaded from condition semantics; CBinaryOp gains _cmp_signedness_cast() for function parameter casts; _propagate_cmp_signedness() updates local variable types directly.

Bug 4 (narrow truncation): C promotes char/short to int before arithmetic, so (unsigned char)0x80 + (unsigned char)0x80 = 0x100 (non-zero) instead of wrapping to 0. Fix: emit an explicit truncation cast for sub-32-bit null comparisons.

Bug 5 (overflow_p types): __builtin_add_overflow_p(a, b, 0) uses the third argument's type for width. A bare 0 is int, which is wrong for 8-, 16-, and 64-bit checks. Fix: thread overflow_signed tags from the ccall rewriter through the simplifier to codegen, and inject explicit type casts.

Bug 7 (rflagsc): The SUB handler created CmpLT without bits=1, breaking branch polarity. The DEC handler used an opaque Shr(And(...)) instead of ndep & 1. condition_processor.__neg__ was mapped to "Not" instead of "Neg".

Bug 1 (SBB): No handlers existed. A new _sbb_prep() recovers operands from the VEX encoding (dep_1=argL, dep_2=argR^oldCF, ndep=old_flags), plus handlers for CondZ/NZ, CondL, CondS/NS, CondNBE, CondBE, CondNB.

Bug 8 (narrowing cast absorption): When the ccall rewriter produces narrow-width comparisons like CmpLTs(Convert(32→8, dep1), Convert(32→8, dep2)), the C codegen rendered them as (char)v3 < (char)v4. A new _try_narrow_cmp_operand() method detects narrowing CTypeCast nodes wrapping local CVariables, propagates the narrow type to the variable declaration (so unsigned int v3 becomes char v3), and strips the cast. Function parameters and struct field accesses are left alone.

Bug 9 (no-op signedness casts): _cmp_signedness_cast() emits explicit casts when operand signedness disagrees with the comparison (e.g. unsigned char in a signed comparison). But SimTypeChar(signed=True) and SimTypeChar(signed=False) both render as "char" in C, so (char)v3 is a textual no-op when v3 is already char. Fix: suppress the cast when both type renderings are identical AND the operand is a simple leaf (CVariable, CVariableField, CConstant) not subject to C integer promotion. Compound expressions like (char)(a0 + a1) keep the cast since it's a real truncation from int-promoted width.

Also fixes (in existing ccall rewriter code, found by semantic tests)
- pc_actions_UMUL/pc_actions_SMUL extracted hi bits from an already-truncated product (always zero)

Test results
Recompilability round-trip: 191 pass / 38 xfail / 11 skip / 0 fail
Remaining 38 xfails:
Test plan
- test_ccall_rewriters.py: Z3 semantic equivalence for all rewriter condition/op combinations
- test_ccop_triggers.py: decompile ccop_trigger binaries, verify no raw ccalls
- test_recompilability.py: round-trip decompile → recompile → compare semantics
- test_optimization_passes.py: unit tests for all 3 new simplifier passes

🤖 Generated with Claude Code