fix: resolve issues #36–#44 — security, robustness, and correctness #49
Merged
farhan-syah merged 10 commits into main, Apr 16, 2026
…tion Replace the LSN-only nonce derivation with a `[4-byte random epoch][8-byte LSN]` scheme. A fresh epoch is generated at construction time via getrandom, ensuring nonces are never reused across WAL lifetimes (process restart, snapshot restore, segment rotation) even when LSNs restart from 1. Update segment decryption to pass the epoch from the key so the AAD binding remains consistent. Refactor mmap_reader and reader for cleaner error paths.
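The nonce layout described in this commit can be sketched as follows. This is a minimal illustration, not the code from `nodedb-wal`: the function name `derive_nonce` and the big-endian LSN encoding are assumptions, and the epoch is passed in as a parameter (the commit obtains it from getrandom at construction time) so the sketch needs no external crate.

```rust
/// Total nonce width: 4-byte epoch + 8-byte LSN.
pub const NONCE_LEN: usize = 12;

/// Builds a nonce laid out as `[4-byte epoch][8-byte LSN]`.
///
/// Assumption: the LSN is encoded big-endian. The commit message does not
/// state the byte order, so treat this as illustrative only.
pub fn derive_nonce(epoch: [u8; 4], lsn: u64) -> [u8; NONCE_LEN] {
    let mut nonce = [0u8; NONCE_LEN];
    // The random epoch distinguishes WAL lifetimes whose LSNs restart from 1.
    nonce[..4].copy_from_slice(&epoch);
    // The LSN distinguishes records within one lifetime.
    nonce[4..].copy_from_slice(&lsn.to_be_bytes());
    nonce
}
```

With this layout, two WAL lifetimes that both start at LSN 1 still produce different nonces as long as their epochs differ.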
- Add a 10-second timeout on TLS handshakes across all listeners (pgwire, native, RESP, ILP) to prevent slow-handshake connection slot exhaustion
- Enforce a 10 MiB per-line limit on ILP ingestion using read_until instead of lines(), dropping connections that exceed it
- Limit recursive expression parsing depth to 128 in both the SQL resolver and the generated-expression parser to prevent stack overflow on deeply-nested malformed ASTs
- Cap ef_search at 8192 in HNSW vector search to prevent DoS via unbounded beam width
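The per-line limit on ILP ingestion could look roughly like the sketch below. It is written against synchronous `std::io` so it stays self-contained; the real listeners are presumably async, and the helper name `read_bounded_line` and the `u64` type of the constant are assumptions.

```rust
use std::io::{self, BufRead, Read};

/// 10 MiB per-line cap (the value named in the PR; the type is an assumption).
pub const MAX_ILP_LINE_BYTES: u64 = 10 * 1024 * 1024;

/// Reads one `\n`-terminated line without its terminator.
/// Returns `Ok(None)` at end of input and an error if the line is longer
/// than `limit` bytes, so the caller can drop the connection.
pub fn read_bounded_line<R: BufRead>(
    reader: &mut R,
    limit: u64,
) -> io::Result<Option<Vec<u8>>> {
    let mut line = Vec::new();
    // `take` caps how much `read_until` may buffer for a single line.
    // One extra byte leaves room for the newline of a maximum-length line.
    let read = reader
        .by_ref()
        .take(limit + 1)
        .read_until(b'\n', &mut line)?;
    if read == 0 {
        return Ok(None);
    }
    if line.last() == Some(&b'\n') {
        line.pop();
    }
    if line.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "line exceeds configured limit",
        ));
    }
    Ok(Some(line))
}
```

Unlike `lines()`, which keeps growing its buffer until a newline arrives, this bounds the memory a single peer can force the server to allocate per line.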
…earch The quantized two-phase search was scanning all vectors to build candidates instead of using the HNSW graph. Replace the O(N) brute-force loop with HNSW graph traversal for O(log N) candidate generation, then rerank with exact FP32 distance as before. Also extend mmap_segment with additional helper methods and add unit tests for SQ8 search correctness via the collection API.
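The second phase (exact FP32 rerank) can be illustrated in isolation. In this sketch the candidate ids are assumed to come from the HNSW graph traversal, which is not shown; `rerank_exact`, `l2_squared`, and the use of squared L2 distance are all assumptions made for illustration.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Reranks candidate ids by exact FP32 distance and keeps the best `k`.
/// `candidates` stands in for the output of the graph-traversal phase.
pub fn rerank_exact(
    query: &[f32],
    vectors: &[Vec<f32>],
    candidates: &[usize],
    k: usize,
) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = candidates
        .iter()
        // Ids that are out of range are skipped rather than panicking.
        .filter_map(|&id| vectors.get(id).map(|v| (id, l2_squared(query, v))))
        .collect();
    // `total_cmp` gives a total order on f32, so NaN cannot break the sort.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

The cost of this phase depends only on the number of candidates, which is why replacing the O(N) candidate scan with graph traversal matters.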
Add richer WAL record variants for columnar operations to support finer-grained replay. Fix mutation handling for edge cases in segment application.
…ng accumulators The previous implementation stored all matching raw document bytes grouped by key, then aggregated at the end, which required O(total_docs × avg_doc_size) memory. Replace with per-group streaming accumulators (accum.rs) that retain only the derived state needed for each aggregate function. Memory is now O(num_groups × num_aggregates) regardless of how many documents match.

Supported functions: count, sum, avg, min, max, count_distinct, stddev/variance (Welford), approx_count_distinct (HLL), approx_percentile (t-digest), approx_topk (space-saving), array_agg, string_agg, percentile_cont. Array-materializing variants are capped at 10,000 items.

Add aggregate_helpers.rs in nodedb-query for the field-extraction primitives used by the accumulator path.
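As an illustration of the streaming approach, here is a minimal Welford accumulator kept per group. It is a sketch under assumptions, not the contents of accum.rs: the `Welford` type, `group_stats`, and the `(key, value)` row shape are hypothetical.

```rust
use std::collections::HashMap;

/// Running mean/variance state (Welford's algorithm). Only three numbers
/// are kept per group, no matter how many rows are folded in.
#[derive(Debug, Default, Clone, Copy)]
pub struct Welford {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
}

impl Welford {
    pub fn push(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    pub fn mean(&self) -> Option<f64> {
        (self.count > 0).then_some(self.mean)
    }

    /// Sample variance (n - 1 denominator); `None` with fewer than two rows.
    pub fn sample_variance(&self) -> Option<f64> {
        (self.count > 1).then(|| self.m2 / (self.count - 1) as f64)
    }
}

/// Folds `(group key, value)` rows into one accumulator per group.
pub fn group_stats<'a, I>(rows: I) -> HashMap<String, Welford>
where
    I: IntoIterator<Item = (&'a str, f64)>,
{
    let mut groups: HashMap<String, Welford> = HashMap::new();
    for (key, value) in rows {
        groups.entry(key.to_owned()).or_default().push(value);
    }
    groups
}
```

Each row is consumed and discarded immediately, which is what keeps memory proportional to the number of groups instead of the number of matching documents.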
…cans When the columnar memtable reaches the flush threshold, drain it into a compressed segment and retain the bytes in memory. Extend scan_normalize to read from flushed segments before falling back to the live memtable, so queries see all rows regardless of whether they have been flushed. This makes columnar scan results consistent across memtable boundaries without requiring a durable write path yet.
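The flush-then-scan behaviour can be sketched with a toy store. Everything here is an assumption for illustration: `ColumnarStore` is hypothetical, rows are plain `i64` values, and compression of the flushed segment is omitted.

```rust
/// Toy columnar store: flushed segments are retained in memory, and scans
/// read them before the live memtable.
pub struct ColumnarStore {
    flush_threshold: usize,
    flushed: Vec<Vec<i64>>, // stand-in for compressed segment bytes
    memtable: Vec<i64>,
}

impl ColumnarStore {
    pub fn new(flush_threshold: usize) -> Self {
        Self {
            // A threshold of zero would flush empty segments; clamp to 1.
            flush_threshold: flush_threshold.max(1),
            flushed: Vec::new(),
            memtable: Vec::new(),
        }
    }

    pub fn insert(&mut self, value: i64) {
        self.memtable.push(value);
        if self.memtable.len() >= self.flush_threshold {
            // Drain the memtable into a new in-memory segment.
            let segment = std::mem::take(&mut self.memtable);
            self.flushed.push(segment);
        }
    }

    pub fn segment_count(&self) -> usize {
        self.flushed.len()
    }

    /// Yields flushed rows first, then rows still in the memtable.
    pub fn scan(&self) -> impl Iterator<Item = i64> + '_ {
        self.flushed
            .iter()
            .flatten()
            .chain(self.memtable.iter())
            .copied()
    }
}
```

Because `scan` chains both sources, a query sees every inserted row whether or not a flush happened between the insert and the scan.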
… result columns Two issues in the prepared-statement extended-query path:
1. DSL statements (SEARCH, GRAPH, MATCH, UPSERT INTO, etc.) were not handled by the Execute phase. Route them through the same DSL dispatcher used by simple queries; bound parameters are intentionally ignored for DSL.
2. When a statement declares typed result columns via Describe, Execute was producing a single-column JSON response against the N-column schema described to the client, causing null values for columns 2..N. Add a reproject step that parses each JSON object and re-encodes it with one pgwire field per declared column, with missing fields sent as SQL NULL.
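The core of the reproject step is a lookup of each declared column in the row's object. The sketch below assumes the JSON object has already been parsed into a string map (JSON parsing and pgwire encoding are left out to avoid external crates); `reproject_row` is a hypothetical name, and `None` stands in for SQL NULL.

```rust
use std::collections::HashMap;

/// Produces one field per declared column, in declaration order.
/// A column that is absent from the object becomes `None` (SQL NULL).
pub fn reproject_row(
    declared_columns: &[&str],
    object: &HashMap<String, String>,
) -> Vec<Option<String>> {
    declared_columns
        .iter()
        .map(|column| object.get(*column).cloned())
        .collect()
}
```

The output always has exactly as many fields as the schema sent in response to Describe, which is the property the single-column JSON response violated.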
…ements Cover GROUP BY with sum/avg/min/max/count over flushed columnar segments, and extended-query protocol correctness for typed result columns and DSL statement passthrough.
This was referenced Apr 16, 2026
Replace nested if-let and if-inside-if-let patterns with let-chain conditions in AggAccum::accumulate. Eliminates unnecessary nesting, makes null-skip and dedup conditions read left-to-right, and avoids a spurious clone of COUNT(*) row in the non-null branch. Also replace manual .max(64).min(MAX_EF_SEARCH) with .clamp() in the ef_search sizing helper for clarity.
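The `.clamp()` change is behaviour-preserving, which a small comparison makes visible. The constant `MAX_EF_SEARCH` and the floor of 64 come from the commit text; the function names and the `usize` type are assumptions.

```rust
pub const MAX_EF_SEARCH: usize = 8192;
pub const MIN_EF_SEARCH: usize = 64;

/// Old style: chained `max` then `min`.
pub fn ef_search_chained(requested: usize) -> usize {
    requested.max(MIN_EF_SEARCH).min(MAX_EF_SEARCH)
}

/// New style: a single `clamp`, which states the allowed range directly.
/// `clamp` panics if the lower bound exceeds the upper bound, which cannot
/// happen with these constants.
pub fn ef_search_clamped(requested: usize) -> usize {
    requested.clamp(MIN_EF_SEARCH, MAX_EF_SEARCH)
}
```

Both functions return the same value for every input; the second form simply makes the bounds readable at a glance.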
…ds checks nodedb-vector/src/mmap_segment.rs: rewrite the mmap offset bounds check from a single nested checked_add/checked_mul expression into sequential early-returns via let-else. Each overflow or out-of-bounds condition is now its own guard, making the failure paths obvious.
nodedb-wal/src/crypto.rs: replace .clone() calls on Copy epoch values in tests with copy-dereference to avoid unnecessary clone on a type that implements Copy.
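A bounds check written as sequential let-else guards might look like the sketch below. The function name, its parameters, the FP32 element size, and the string error type are assumptions; the actual layout of the mmap segment is not described in the commit.

```rust
use std::ops::Range;

/// Computes the byte range of vector `index` in a buffer of `data_len`
/// bytes, where vectors of `dim` f32 values follow a `header_len`-byte header.
pub fn vector_byte_range(
    header_len: usize,
    dim: usize,
    index: usize,
    data_len: usize,
) -> Result<Range<usize>, &'static str> {
    if dim == 0 {
        return Err("dimension must be non-zero");
    }
    // Each arithmetic step is its own guard with its own failure message.
    let Some(stride) = dim.checked_mul(std::mem::size_of::<f32>()) else {
        return Err("stride overflows usize");
    };
    let Some(relative) = index.checked_mul(stride) else {
        return Err("vector offset overflows usize");
    };
    let Some(start) = header_len.checked_add(relative) else {
        return Err("start offset overflows usize");
    };
    let Some(end) = start.checked_add(stride) else {
        return Err("end offset overflows usize");
    };
    if end > data_len {
        return Err("vector extends past end of segment");
    }
    Ok(start..end)
}
```

Compared with one nested `checked_add`/`checked_mul` expression, each guard here can be read, logged, and tested on its own.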
Summary
Fixes 9 issues (14 sub-problems) spanning WAL crypto, memory safety, OOM, performance, DoS hardening, and pgwire correctness.
- `checked_mul`/`checked_add` in header parsing, dim=0 rejection, bounds checks in `get_vector`/`prefetch`
- `should_flush` → `drain_optimized` → `SegmentWriter` persist → scan from flushed segments
- `HashMap<String, Vec<Vec<u8>>>` with streaming `AggAccum` accumulators (O(groups) memory)
- `MAX_EF_SEARCH = 8192`
- `tokio::time::timeout` on all 3 listeners
- `MAX_ILP_LINE_BYTES = 10 MiB` cap with `read_until`
- `MAX_EXPR_DEPTH = 128` in parser + resolver
- `data.len()` offset `reader.rs` and `mmap_reader.rs`
- `read_slice` helper + `MAX_FIELD_LEN` cap
- `is_dsl` flag routes SEARCH/GRAPH/MATCH/UPSERT to DSL dispatcher
- `reproject_response` extracts typed columns from JSON envelope

Also fixes a silent data-loss bug in `encode_row_for_wal` that swallowed serialization errors via `unwrap_or_default()`.

Test plan
- … (`scripts/test.sh`) passes, zero regressions across ~400 query results
- `cargo fmt --all` clean
- `cargo clippy --all-targets` clean

Closes #36, closes #37, closes #38, closes #39, closes #40, closes #41, closes #42, closes #43, closes #44