-
Notifications
You must be signed in to change notification settings - Fork 143
Search Architecture
The HexEditor search pipeline combines an LRU result cache, Boyer-Moore-Horspool pattern matching, SIMD acceleration for single-byte patterns, and parallel multi-core search for large files.
flowchart TD
Req["Search request\nbyte[] pattern + options"]
Cache["SearchCacheService\nLRU cache — O(1) lookup\nkeyed by pattern + options"]
Hit["Return cached positions\ninstant — 10-100x faster on repeat"]
FindSvc["FindReplaceService"]
Size{"File size?"}
Small["Single-thread\nBoyer-Moore-Horspool\nO(n/m) average"]
Large["Parallel.ForEach\nProcessorCount chunks\noverlapping boundaries"]
SIMD{"Single-byte\npattern?"}
AVX["SIMD AVX2/SSE2\n4-8x faster\n32/16 bytes per cycle"]
Merge["Merge + sort results\nfrom all chunks"]
Store["Store in LRU cache"]
HL["HighlightService\nmark all matches"]
Result["Return List<long>\nvirtual offsets"]
Req --> Cache
Cache -->|Hit| Hit
Cache -->|Miss| FindSvc
FindSvc --> Size
Size -->|"< 10 MB"| Small
Size -->|">= 10 MB"| Large
Large --> SIMD
Small --> SIMD
SIMD -->|Yes| AVX
SIMD -->|No| BM["Boyer-Moore-Horspool"]
AVX --> Merge
BM --> Merge
Merge --> Store
Store --> HL
HL --> Result
All search results are stored in an LRU (Least Recently Used) cache:
-
Key:
(byte[] pattern, SearchOptions options)— structural equality - Capacity: configurable (default: 64 entries)
- Eviction: when full, the least recently accessed entry is removed
- Invalidation: the cache is cleared when the file is modified (edit, undo, redo)
// Cache is checked transparently by FindReplaceService
// Direct access for advanced use:
searchCacheService.Invalidate(); // clear on file change
searchCacheService.TryGet(pattern, options, out results); // manual lookupUsed for multi-byte patterns on all file sizes:
- Preprocesses a bad-character skip table from the pattern (O(m))
- Search loop: O(n/m) average, O(n×m) worst case
- Aligned to
Span<byte>reads — no intermediate allocations - Falls back to SIMD when
pattern.Length == 1
For single-byte patterns (pattern.Length == 1), the search uses hardware intrinsics:
| ISA | Width | Throughput |
|---|---|---|
| AVX2 | 256-bit (32 bytes/cycle) | ~4x vs scalar |
| SSE2 | 128-bit (16 bytes/cycle) | ~2x vs scalar |
| Fallback | Scalar (Span<byte>.IndexOf) |
Baseline |
The correct path is selected at runtime via Vector256.IsHardwareAccelerated.
For files ≥ 10 MB, the file is split into Environment.ProcessorCount chunks:
File: [=== chunk 0 ===][=== chunk 1 ===][=== chunk 2 ===][=== chunk 3 ===]
overlap → ←overlap overlap → ←overlap overlap → ←overlap
Each chunk overlaps its neighbors by pattern.Length - 1 bytes to detect matches crossing chunk boundaries.
var results = new ConcurrentBag<long>();
Parallel.ForEach(chunks, chunk =>
{
foreach (long offset in SearchChunk(chunk, pattern))
results.Add(chunk.StartOffset + offset);
});
return results.OrderBy(x => x).ToList();// Inject via DI or access via HexEditor service
IFindReplaceService findReplace = hexEditor.GetService<IFindReplaceService>();
// Search
SearchResult result = await findReplace.FindAllAsync(
pattern: new byte[] { 0xDE, 0xAD, 0xBE, 0xEF },
options: SearchOptions.Default,
progress: new Progress<double>(p => progressBar.Value = p),
ct: cancellationToken);
foreach (long offset in result.Positions)
Console.WriteLine($"Found at: 0x{offset:X8}");
// Navigate (Go to next / previous match)
long? next = findReplace.FindNext(fromOffset: currentPosition);
long? prev = findReplace.FindPrevious(fromOffset: currentPosition);
// Replace
await findReplace.ReplaceAsync(offset, oldPattern, newPattern);
await findReplace.ReplaceAllAsync(oldPattern, newPattern);var options = new SearchOptions
{
CaseSensitive = true, // hex search always case-insensitive
SearchEncoding = Encoding.UTF8, // for string→byte conversion
MatchWholeWord = false,
Direction = SearchDirection.Forward
};The inline search bar (FindReplaceBar.xaml) binds to HexEditorViewModel.SearchCommand:
Ctrl+F → Open inline search bar
Enter / F3 → Find next
Shift+F3 → Find previous
Escape → Close bar, clear highlights
Ctrl+Shift+F → Open advanced search dialog
The advanced dialog supports:
-
Hex pattern input (space-separated bytes:
DE AD BE EF) - Text / string input with encoding selector
-
Wildcard bytes:
??matches any byte - Results list: click to jump to offset
After a search, HighlightService marks all result positions:
// Done automatically by FindReplaceService after search
highlightService.SetSearchHighlights(result.Positions, pattern.Length);
// Clear when search bar closes
highlightService.ClearSearchHighlights();Search highlights render as colored rectangles behind the hex bytes in the viewport — drawn by HexViewport.OnRender() before the text layer.
| Scenario | Result |
|---|---|
| Repeat search (cache hit) | Instant (O(1) lookup) |
| Single-byte SIMD search — 1 GB file | < 300 ms |
| 4-byte pattern — 1 GB file (parallel) | < 800 ms |
| 4-byte pattern — 1 GB file (single-core) | ~2 500 ms |
| Cache speedup on repeat query | 10–100× |
- ByteProvider — data source for search
- Architecture — search in the full stack
-
API Reference —
IFindReplaceService,SearchOptions
✨ Wpf HexEditor user control, by Derek Tremblay (derektremblay666@gmail.com) coded for your fun! 😊🤟
- API Reference
- Performance
- Services
- Core Components
- ByteProvider
- Rendering Engine
- Search Architecture
- whfmt.FileFormatCatalog v1.0.0 NuGet (cross-platform net8.0)
- 690+ .whfmt definitions (schema v2.3)
- Structure Editor — block DataGrid, drag-drop, validation, SmartComplete
- WhfmtBrowser/Catalog panels — browse all embedded formats
- AI Assistant (5 providers, 25 MCP tools)
- Tab Groups, Document Structure, Lazy Plugin Loading
- Window Menu + Win32 Fullscreen (
F11) - Git Integration UI (changes, history, blame)
- Shared Undo Engine (HexEditor ↔ CodeEditor)
- Bracket pair colorization, sticky scroll, peek definition
- Format detection hardening (thread-safe, crash guard)