Token-based pipeline:
Original JS → Tokenize → 20 token passes → `join(" ")` → Broken syntax → Oxc fails
Key finding: Oxc can parse every original bundle without errors. The broken syntax is introduced by our token-based transformations, not by the input.
AST-based pipeline:
Original JS → Parse with Oxc → AST transformations → Codegen with Oxc → Valid JS
- Guaranteed valid output - AST transformations preserve syntax
- Full Oxc optimizations - Can use entire oxc_minifier suite
- Single parse/codegen - Better performance
- More maintainable - Follow oxc_minifier patterns
- Composable - All passes use `Traverse<'a, DeobfuscateState>`
These are the most important passes that handle obfuscator-specific patterns:
- object_dispatcher - Inline dispatcher switch statements (CRITICAL)
- rotation - Deobfuscate rotation patterns
- string_array - Inline string arrays
- decoder - Inline decoder functions
- control_flow - Unflatten control flow
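To make the control-flow pass concrete: flattening obfuscators typically wrap the original statements in a loop-driven `switch`, dispatched from a shuffled order string such as `"3|0|1|2"`. Unflattening replays that order once and emits the blocks linearly. The sketch below is a deliberately simplified model over plain strings, not oxc's AST; `unflatten` and the block representation are illustrative names, not our actual API.

```rust
/// Toy model of control-flow unflattening: given the dispatcher's order
/// string and the out-of-order blocks keyed by state id, recover the
/// original linear statement sequence.
fn unflatten(order: &str, blocks: &[(u32, &str)]) -> Vec<String> {
    order
        .split('|')
        // Parse each state id; skip anything malformed.
        .filter_map(|id| id.parse::<u32>().ok())
        // Look up the block for that state and emit it in replay order.
        .filter_map(|id| {
            blocks
                .iter()
                .find(|(block_id, _)| *block_id == id)
                .map(|(_, stmt)| stmt.to_string())
        })
        .collect()
}
```

The real pass does the same replay on `SwitchStatement` nodes, rewriting the loop into the recovered statement list.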
These improve code quality:
- constant_folding - Fold constant expressions
- expression_simplify - Simplify boolean/arithmetic expressions
- algebraic_simplify - Algebraic identities
- strength_reduction - Replace expensive ops
- dead_code - Remove unreachable code
- dead_var_elimination - Remove unused variables
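The shape of these optimization passes is a bottom-up rewrite: fold children first, then collapse the parent when all operands are constant. A minimal sketch over a toy expression type (illustrative only; the real passes operate on oxc's AST and reuse `oxc_minifier` machinery):

```rust
/// Toy expression type standing in for the AST.
#[derive(Debug, PartialEq, Clone)]
enum Expr {
    Num(f64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Constant folding: recurse into operands, then replace any operator
/// node whose operands are both literals with the computed literal.
fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x + y),
            (a, b) => Expr::Add(Box::new(a), Box::new(b)),
        },
        Expr::Mul(a, b) => match (fold(*a), fold(*b)) {
            (Expr::Num(x), Expr::Num(y)) => Expr::Num(x * y),
            (a, b) => Expr::Mul(Box::new(a), Box::new(b)),
        },
        e => e,
    }
}
```

For example, `2 + 3 * 4` folds to the literal `14`; expressions containing free variables are left with only their constant subtrees collapsed.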
These are mostly syntax normalizations:
- function_inline - Inline single-use functions
- call_proxy - Inline call proxy patterns
- operator_proxy - Inline operator proxies
- array_unpack - Unpack array accesses
- dynamic_property - Convert computed to static properties
- try_catch - Remove empty try-catch
- ternary - Simplify ternary chains
- object_sparsing - Consolidate sparse objects
- unicode_mangling - Normalize unicode
- boolean_literals - Replace !0/!1
- void_replacer - Replace void 0
- ✅ loop_unroll - Unroll constant loops
- ✅ cse - Common subexpression elimination
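The literal-normalization passes (`boolean_literals`, `void_replacer`) invert common minifier spellings. The real passes match AST nodes (a `UnaryExpression` over `0`/`1`, `void 0`), but the mapping itself is simple enough to sketch over already-isolated expression text; `normalize_literal` is an illustrative name, not our actual API:

```rust
/// Map minifier spellings back to readable literals; anything else is
/// returned unchanged.
fn normalize_literal(expr: &str) -> &str {
    match expr.trim() {
        "!0" => "true",
        "!1" => "false",
        "void 0" => "undefined",
        other => other,
    }
}
```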
Each pass follows the oxc_minifier pattern:
```rust
use oxc_traverse::{Traverse, TraverseCtx};

pub struct DispatcherInliner {
    changed: bool,
    dispatchers: Vec<DispatcherInfo>,
}

impl<'a> Traverse<'a, DeobfuscateState> for DispatcherInliner {
    fn enter_function(&mut self, func: &mut Function<'a>, ctx: &mut TraverseCtx<'a, DeobfuscateState>) {
        // Find and inline dispatcher patterns
    }
}
```

Shared analysis results are carried in the traversal state:

```rust
pub struct DeobfuscateState {
    pub changed: bool,
    pub string_arrays: Vec<StringArrayInfo>,
    pub decoders: Vec<DecoderInfo>,
    // ... other analysis results
}
```

The top-level driver owns all passes and runs them over a single parse:

```rust
pub struct AstDeobfuscator {
    // Analysis passes
    string_array_detector: StringArrayDetector,
    decoder_detector: DecoderDetector,
    // Transformation passes
    dispatcher_inliner: DispatcherInliner,
    rotation_deobfuscator: RotationDeobfuscator,
    // ... all other passes
}

impl AstDeobfuscator {
    pub fn deobfuscate(&mut self, code: &str) -> Result<String> {
        let allocator = Allocator::default();
        let source_type = SourceType::mjs();
        let parse_result = Parser::new(&allocator, code, source_type).parse();
        if !parse_result.errors.is_empty() {
            return Err("Parse failed".into());
        }
        let mut program = parse_result.program;

        // Phase 1: Analysis
        let state = self.analyze(&program, &allocator)?;

        // Phase 2: Transformations
        let scoping = SemanticBuilder::new().build(&program).semantic.into_scoping();
        let mut ctx = ReusableTraverseCtx::new(state, scoping, &allocator);
        // Run all transformation passes
        traverse_mut_with_ctx(&mut self.dispatcher_inliner, &mut program, &mut ctx);
        traverse_mut_with_ctx(&mut self.rotation_deobfuscator, &mut program, &mut ctx);
        // ... all other passes

        // Phase 3: Codegen
        let output = Codegen::new().build(&program).code;
        Ok(output)
    }
}
```

- Create new `ast_deobfuscate` module alongside existing `deobfuscate`
- Implement passes one by one, testing against real bundles
- Once complete, make AST version the default
- Keep token-based as legacy fallback option
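The `changed` flags on the passes and state suggest the driver should iterate to a fixed point, since one pass can expose work for another (e.g. decoder inlining enabling constant folding). A hypothetical sketch of that loop, with the passes abstracted as closures and an iteration cap to guard against ping-ponging rewrites (`run_to_fixed_point` is an illustrative name, not our actual API):

```rust
/// Run every pass once per round; stop after the first round in which no
/// pass reports a change, or after `max_iters` rounds. Returns the number
/// of rounds executed.
fn run_to_fixed_point(passes: &mut [Box<dyn FnMut() -> bool>], max_iters: usize) -> usize {
    for iter in 0..max_iters {
        // `|` (not `||`) so every pass runs even after one reports a change.
        let changed = passes.iter_mut().fold(false, |acc, pass| pass() | acc);
        if !changed {
            return iter + 1;
        }
    }
    max_iters
}
```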
Each pass should have:
- Unit tests with simple obfuscated patterns
- Integration tests with real bundle snippets
- Regression tests comparing token vs AST output
- ✅ All original bundles parse successfully
- ✅ All transformations preserve valid JavaScript
- ✅ Output is semantically equivalent to input
- ✅ Deobfuscation quality matches or exceeds token-based version
- ✅ Performance is acceptable (<10s for 5MB bundle)