Skip to content

Search Architecture

abbaye edited this page Mar 7, 2026 · 1 revision

Search Architecture

The HexEditor search pipeline combines an LRU result cache, Boyer-Moore-Horspool pattern matching, SIMD acceleration for single-byte patterns, and parallel multi-core search for large files.


Pipeline Overview

flowchart TD
    Req["Search request\nbyte[] pattern + options"]
    Cache["SearchCacheService\nLRU cache — O(1) lookup\nkeyed by pattern + options"]
    Hit["Return cached positions\ninstant — 10-100x faster on repeat"]
    FindSvc["FindReplaceService"]
    Size{"File size?"}
    Small["Single-thread\nBoyer-Moore-Horspool\nO(n/m) average"]
    Large["Parallel.ForEach\nProcessorCount chunks\noverlapping boundaries"]
    SIMD{"Single-byte\npattern?"}
    AVX["SIMD AVX2/SSE2\n4-8x faster\n32/16 bytes per cycle"]
    Merge["Merge + sort results\nfrom all chunks"]
    Store["Store in LRU cache"]
    HL["HighlightService\nmark all matches"]
    Result["Return List<long>\nvirtual offsets"]

    Req --> Cache
    Cache -->|Hit| Hit
    Cache -->|Miss| FindSvc
    FindSvc --> Size
    Size -->|"< 10 MB"| Small
    Size -->|">= 10 MB"| Large
    Large --> SIMD
    Small --> SIMD
    SIMD -->|Yes| AVX
    SIMD -->|No| BM["Boyer-Moore-Horspool"]
    AVX --> Merge
    BM --> Merge
    Merge --> Store
    Store --> HL
    HL --> Result
Loading

SearchCacheService — LRU Cache

All search results are stored in an LRU (Least Recently Used) cache:

  • Key: (byte[] pattern, SearchOptions options) — structural equality
  • Capacity: configurable (default: 64 entries)
  • Eviction: when full, the least recently accessed entry is removed
  • Invalidation: the cache is cleared when the file is modified (edit, undo, redo)
// Cache is checked transparently by FindReplaceService
// Direct access for advanced use:
searchCacheService.Invalidate();                          // clear on file change
searchCacheService.TryGet(pattern, options, out results); // manual lookup

Boyer-Moore-Horspool

Used for multi-byte patterns on all file sizes:

  • Preprocesses a bad-character skip table from the pattern (O(m))
  • Search loop: O(n/m) average, O(n×m) worst case
  • Aligned to Span<byte> reads — no intermediate allocations
  • Falls back to SIMD when pattern.Length == 1

SIMD Acceleration

For single-byte patterns (pattern.Length == 1), the search uses hardware intrinsics:

ISA Width Throughput
AVX2 256-bit (32 bytes/cycle) ~4x vs scalar
SSE2 128-bit (16 bytes/cycle) ~2x vs scalar
Fallback Scalar (Span<byte>.IndexOf) Baseline

The correct path is selected at runtime via Vector256.IsHardwareAccelerated.


Parallel Search (Large Files)

For files ≥ 10 MB, the file is split into Environment.ProcessorCount chunks:

File: [=== chunk 0 ===][=== chunk 1 ===][=== chunk 2 ===][=== chunk 3 ===]
         overlap →  ←overlap  overlap →  ←overlap  overlap →  ←overlap

Each chunk overlaps its neighbors by pattern.Length - 1 bytes to detect matches crossing chunk boundaries.

var results = new ConcurrentBag<long>();

Parallel.ForEach(chunks, chunk =>
{
    foreach (long offset in SearchChunk(chunk, pattern))
        results.Add(chunk.StartOffset + offset);
});

return results.OrderBy(x => x).ToList();

FindReplaceService API

// Inject via DI or access via HexEditor service
IFindReplaceService findReplace = hexEditor.GetService<IFindReplaceService>();

// Search
SearchResult result = await findReplace.FindAllAsync(
    pattern: new byte[] { 0xDE, 0xAD, 0xBE, 0xEF },
    options:  SearchOptions.Default,
    progress: new Progress<double>(p => progressBar.Value = p),
    ct:       cancellationToken);

foreach (long offset in result.Positions)
    Console.WriteLine($"Found at: 0x{offset:X8}");

// Navigate (Go to next / previous match)
long? next = findReplace.FindNext(fromOffset: currentPosition);
long? prev = findReplace.FindPrevious(fromOffset: currentPosition);

// Replace
await findReplace.ReplaceAsync(offset, oldPattern, newPattern);
await findReplace.ReplaceAllAsync(oldPattern, newPattern);

SearchOptions

var options = new SearchOptions
{
    CaseSensitive = true,           // hex search always case-insensitive
    SearchEncoding = Encoding.UTF8, // for string→byte conversion
    MatchWholeWord = false,
    Direction = SearchDirection.Forward
};

HexEditor Quick Search (Ctrl+F)

The inline search bar (FindReplaceBar.xaml) binds to HexEditorViewModel.SearchCommand:

Ctrl+F          → Open inline search bar
Enter / F3      → Find next
Shift+F3        → Find previous
Escape          → Close bar, clear highlights
Ctrl+Shift+F    → Open advanced search dialog

The advanced dialog supports:

  • Hex pattern input (space-separated bytes: DE AD BE EF)
  • Text / string input with encoding selector
  • Wildcard bytes: ?? matches any byte
  • Results list: click to jump to offset

Highlight Integration

After a search, HighlightService marks all result positions:

// Done automatically by FindReplaceService after search
highlightService.SetSearchHighlights(result.Positions, pattern.Length);

// Clear when search bar closes
highlightService.ClearSearchHighlights();

Search highlights render as colored rectangles behind the hex bytes in the viewport — drawn by HexViewport.OnRender() before the text layer.


Performance Numbers

Scenario Result
Repeat search (cache hit) Instant (O(1) lookup)
Single-byte SIMD search — 1 GB file < 300 ms
4-byte pattern — 1 GB file (parallel) < 800 ms
4-byte pattern — 1 GB file (single-core) ~2 500 ms
Cache speedup on repeat query 10–100×

See Also

Navigation

Getting Started

IDE Documentation

HexEditor Control

Advanced

Development


v0.6.4.75 Highlights

  • whfmt.FileFormatCatalog v1.0.0 NuGet (cross-platform net8.0)
  • 690+ .whfmt definitions (schema v2.3)
  • Structure Editor — block DataGrid, drag-drop, validation, SmartComplete
  • WhfmtBrowser/Catalog panels — browse all embedded formats
  • AI Assistant (5 providers, 25 MCP tools)
  • Tab Groups, Document Structure, Lazy Plugin Loading
  • Window Menu + Win32 Fullscreen (F11)
  • Git Integration UI (changes, history, blame)
  • Shared Undo Engine (HexEditor ↔ CodeEditor)
  • Bracket pair colorization, sticky scroll, peek definition
  • Format detection hardening (thread-safe, crash guard)

Links

Clone this wiki locally