Skip to content

Latest commit

 

History

History
332 lines (275 loc) · 11.5 KB

File metadata and controls

332 lines (275 loc) · 11.5 KB

Next Steps: Advanced Obfuscation Techniques Implementation

Session Summary

This session focused on integrating advanced JavaScript obfuscation detection and conversion into js-beautify-rs. We've completed the research phase and begun implementation.

Completed

  1. Exhaustive Research (30+ techniques documented)

    • VM-based obfuscation patterns
    • Promise/async/generator patterns
    • Proxy traps and Reflect API abuse
    • Symbol/WeakMap/WeakSet usage
    • Error-based and switch-true control flow
    • JSON/template literal decoders
    • Class features (private fields, static blocks)
    • Optional chaining, nullish coalescing, BigInt
    • Dynamic import, TextEncoder/TextDecoder, crypto.subtle
  2. Deobfuscator Tools Identified (10+ recent tools)

    • webcrack (2K stars) - Obfuscator.io patterns
    • wakaru (1.5K stars) - Unpacker, decompiler
    • humanify (3.1K stars) - LLM-based renaming
    • JSIMPLIFIER (NDSS 2026) - 20 techniques
    • DeCoda (arXiv 2025) - LLM + graph learning
  3. Academic Papers Located (7 papers)

    • NDSS 2026: "From Obfuscated to Obvious"
    • CCS 2025: "JsDeObsBench"
    • arXiv 2025: "Breaking Obfuscation: Cluster-Aware Graph"
    • WWW 2024, DSN 2021, USENIX 2021
  4. Malware Analysis Blogs (8+ posts)

    • Unit 42 (PaloAlto): JSFireTruck, DarkCloud Stealer
    • Forcepoint X-Labs: Q3 2025 threat brief
    • Wiz.io: JSFireTruck campaign analysis
  5. switch(true) Converter Integration

    • Module created and integrated into Phase 16.5
    • Stub implementation ready for full conversion
    • Compiles successfully, tests passing

High-Priority Implementation Tasks

Phase 1: Deterministic String Decoders (HIGH IMPACT, LOW RISK)

1. JSON.parse() Evaluation

  • Pattern: JSON.parse('{"key":"value"}')
  • Risk: LOW (deterministic, no side effects)
  • Frequency: MEDIUM (common in modern obfuscators)
  • Implementation: ~200 lines
  • Tests: 5-10 cases

2. Template Literal Tag Functions

  • Pattern: String.raw`\x48\x65\x6c\x6c\x6f`
  • Risk: LOW (deterministic)
  • Frequency: LOW (emerging pattern)
  • Implementation: ~150 lines
  • Tests: 5-8 cases

3. Optional Chaining Normalization

  • Pattern: obj?.prop?.method?.()
  • Risk: LOW (syntactic transformation)
  • Frequency: HIGH (modern JavaScript)
  • Implementation: ~200 lines
  • Tests: 8-12 cases

4. Nullish Coalescing Chains

  • Pattern: a ?? b ?? c ?? default
  • Risk: LOW (syntactic transformation)
  • Frequency: MEDIUM (modern JavaScript)
  • Implementation: ~150 lines
  • Tests: 6-10 cases

Phase 2: Control Flow Patterns (MEDIUM IMPACT, MEDIUM RISK)

5. Promise Chain Normalization

  • Pattern: .then(x => x).then(y => y).catch(e => e)
  • Risk: MEDIUM (requires data-flow analysis)
  • Frequency: MEDIUM (async obfuscation)
  • Implementation: ~300 lines
  • Tests: 8-15 cases

6. Generator Function Unrolling

  • Pattern: function* gen() { yield x; yield y; }
  • Risk: MEDIUM (requires control-flow graph)
  • Frequency: LOW (uncommon in obfuscators)
  • Implementation: ~250 lines
  • Tests: 5-10 cases

7. Try-Catch State Machine Extraction

  • Pattern: try { ... } catch(e) { ... } chains
  • Risk: MEDIUM (requires semantic analysis)
  • Frequency: MEDIUM (Akamai BMP pattern)
  • Implementation: ~350 lines
  • Tests: 8-12 cases

Phase 3: API Normalization (MEDIUM IMPACT, MEDIUM RISK)

8. Reflect API Normalization

  • Pattern: Reflect.apply(fn, this, [args])
  • Risk: MEDIUM (requires function resolution)
  • Frequency: MEDIUM (Akamai BMP pattern)
  • Implementation: ~200 lines
  • Tests: 6-10 cases

9. Symbol Key Renaming

  • Pattern: const sym = Symbol('key'); obj[sym] = value
  • Risk: MEDIUM (requires symbol tracking)
  • Frequency: LOW (uncommon in obfuscators)
  • Implementation: ~200 lines
  • Tests: 5-8 cases

10. WeakMap/WeakSet Conversion

  • Pattern: const wm = new WeakMap(); wm.set(key, value)
  • Risk: MEDIUM (semantic change)
  • Frequency: LOW (uncommon in obfuscators)
  • Implementation: ~250 lines
  • Tests: 5-8 cases

Implementation Priority Matrix

┌─────────────────────────────────────────────────────────────┐
│ PRIORITY MATRIX: Impact vs Risk                             │
├─────────────────────────────────────────────────────────────┤
│ HIGH IMPACT, LOW RISK (DO FIRST):                           │
│  • JSON.parse() evaluation                                  │
│  • Optional chaining normalization                          │
│  • Nullish coalescing chains                                │
│  • Template literal tags                                    │
│                                                              │
│ MEDIUM IMPACT, MEDIUM RISK (DO SECOND):                     │
│  • Promise chain normalization                              │
│  • Reflect API normalization                                │
│  • Try-catch state machine extraction                       │
│  • Generator function unrolling                             │
│                                                              │
│ LOW IMPACT, HIGH RISK (SKIP):                               │
│  • VM bytecode decompilation (requires ISA reverse-eng)     │
│  • Dynamic import() (can't be statically resolved)          │
│  • new Function() (security risk)                           │
│  • crypto.subtle (can't be statically resolved)             │
│  • Math.random() seeding (can't be statically resolved)     │
│  • Date-based obfuscation (can't be statically resolved)    │
└─────────────────────────────────────────────────────────────┘

Testing Strategy

Unit Tests

  • Each module: 5-15 test cases
  • Coverage: obfuscated + deobfuscated + edge cases
  • Pattern: test_convert_*, test_preserve_*

Integration Tests

  • Real-world samples from:
    • JSFireTruck malware (269K infected sites)
    • DarkCloud Stealer samples
    • SLOW#TEMPEST samples
    • JSIMPLIFIER dataset (44K samples)

Regression Tests

  • Ensure existing 43 modules still work
  • Run full test suite after each addition
  • Benchmark performance on 11MB+ bundles

Architecture Notes

Current Pipeline (20 phases)

Phase 0:   Encrypted eval decryption (pre-AST)
Phase 0.5: Akamai BMP detection & removal
Phase 1:   Control flow unflattening
Phase 2:   String array rotation
Phase 3:   Decoder/string array/dispatcher inlining
Phase 4:   Call proxy inlining
Phase 5:   Operator proxy inlining
Phase 6:   Expression simplification
Phase 7:   Dead code elimination
Phase 8:   Dead variable elimination
Phase 9:   Function inlining
Phase 10:  Array unpacking / dynamic property / ternary / try-catch
Phase 11:  Unicode / boolean / void / object sparsing
Phase 12:  Variable renaming
Phase 13:  Empty statement cleanup
Phase 14:  Sequence expression splitting
Phase 15:  Multi-variable splitting
Phase 16:  Ternary to if/else
Phase 16.5: switch(true) to if/else [NEW]
Phase 17:  Short-circuit to if
Phase 18:  IIFE unwrapping
Phase 19:  esbuild helper detection
Phase 20:  Module annotation

Proposed New Phases

Phase 16.5: switch(true) to if/else [DONE - STUB]
Phase 16.7: JSON.parse() evaluation [TODO]
Phase 16.9: Optional chaining normalization [TODO]
Phase 17.1: Nullish coalescing simplification [TODO]
Phase 17.3: Template literal tag evaluation [TODO]
Phase 17.5: Promise chain normalization [TODO]
Phase 17.7: Generator unrolling [TODO]
Phase 17.9: Reflect API normalization [TODO]
Phase 18.1: Symbol key renaming [TODO]
Phase 18.3: WeakMap/WeakSet conversion [TODO]
Phase 18.5: Try-catch state machine extraction [TODO]

Risk Assessment

Low-Risk Implementations (Safe to Auto-Rewrite)

  • JSON.parse() evaluation (deterministic)
  • Optional chaining normalization (syntactic)
  • Nullish coalescing chains (syntactic)
  • Template literal tags (deterministic if pure)

Medium-Risk Implementations (Warn User)

  • Promise chain normalization (may have side effects)
  • Reflect API normalization (may have side effects)
  • Generator unrolling (changes semantics)
  • Try-catch state machine (may have side effects)

High-Risk Implementations (Skip)

  • WeakMap/WeakSet conversion (changes GC semantics)
  • Private field conversion (changes scoping)
  • Symbol key conversion (changes identity)
  • VM bytecode decompilation (requires ISA knowledge)

Performance Considerations

Current Performance

  • 11MB+ bundles: ~2-5 seconds
  • 20-phase pipeline: ~100-200ms per phase
  • Memory: ~500MB for large bundles

Expected Impact

  • JSON.parse() evaluation: +10-20ms (low overhead)
  • Optional chaining: +5-10ms (syntactic)
  • Promise chains: +50-100ms (requires data-flow)
  • Reflect API: +30-50ms (requires symbol resolution)

Optimization Opportunities

  • Parallel phase execution (phases 16.7, 16.9, 17.1 are independent)
  • Memoization of decoded values
  • Early termination if no patterns found

Documentation Requirements

For Each Implementation

  1. Module Documentation

    • Pattern description with examples
    • Obfuscated vs deobfuscated code
    • AST signature for pattern matching
    • Tools that produce this pattern
    • Risk level and limitations
  2. Test Documentation

    • Test cases with expected outputs
    • Edge cases and boundary conditions
    • Real-world examples from malware
  3. Integration Documentation

    • Phase number and ordering
    • Dependencies on other phases
    • Performance characteristics
    • Known limitations

Success Criteria

Phase 1 (JSON.parse + Optional Chaining + Nullish Coalescing)

  • ✅ All 4 modules compile without warnings
  • ✅ 15+ unit tests passing
  • ✅ 5+ integration tests with real samples
  • ✅ Performance: <50ms overhead on 11MB bundle
  • ✅ Documentation complete

Phase 2 (Promise + Reflect + Try-Catch)

  • ✅ All 3 modules compile without warnings
  • ✅ 25+ unit tests passing
  • ✅ 10+ integration tests with real samples
  • ✅ Performance: <150ms overhead on 11MB bundle
  • ✅ Documentation complete

Phase 3 (Generator + Symbol + WeakMap)

  • ✅ All 3 modules compile without warnings
  • ✅ 20+ unit tests passing
  • ✅ 8+ integration tests with real samples
  • ✅ Performance: <100ms overhead on 11MB bundle
  • ✅ Documentation complete

Timeline Estimate

  • Phase 1 (JSON.parse + Optional Chaining + Nullish Coalescing): 2-3 hours
  • Phase 2 (Promise + Reflect + Try-Catch): 4-5 hours
  • Phase 3 (Generator + Symbol + WeakMap): 3-4 hours
  • Testing & Validation: 2-3 hours
  • Documentation: 1-2 hours

Total: 12-17 hours of implementation work

References

Academic Papers

  • NDSS 2026: "From Obfuscated to Obvious" (JSIMPLIFIER)
  • CCS 2025: "JsDeObsBench" (LLM evaluation)
  • arXiv 2025: "Breaking Obfuscation: Cluster-Aware Graph" (DeCoda)

Malware Analysis

  • Unit 42: JSFireTruck campaign (2025)
  • Unit 42: DarkCloud Stealer (2025)
  • Wiz.io: JSFireTruck analysis

Tools

Next Session Checklist

  • Implement JSON.parse() evaluation module
  • Implement optional chaining normalization
  • Implement nullish coalescing simplification
  • Add 15+ unit tests
  • Test with real-world samples
  • Commit and document progress
  • Run full test suite (should pass all 353+ tests)
  • Benchmark performance
  • Create PR with implementation notes