fix: issues #46 + #47 — parser, planner, and executor correctness bugs by farhan-syah · Pull Request #54 · NodeDB-Lab/nodedb

farhan-syah · 2026-04-16T13:56:34Z

Summary

Fixes all 15 sub-items across #46 (hand-rolled parser bugs) and #47 (planner/executor correctness bugs).

Issue #46 — Parser string-handling and UTF-8 bugs

find_begin_pos now skips single-quoted string literals so 'BEGIN' in a WHEN clause doesn't corrupt the trigger header/body split
trim_matches('(' | ')') replaced with strip_outer_parens() that removes exactly one balanced pair; same fix applied to trim_matches('\'') on RLS value strings
find_matching_brace '' escape: converted for loop to while with i += 2 to actually skip the second quote
Expression tokenizer rewritten from byte-indexing to char_indices(). Also fixed 3 strip_new_prefix / replace_remaining_new_refs functions in check_constraint.rs and validate.rs that had the same &str[byte_index..] panic on multi-byte UTF-8
find_trailing_with_options() scans backward for WITH ( to distinguish options clause from CTEs and column aliases

Issue #47 — Planner and executor correctness

const_fold::fold_binary uses checked_add/sub/mul — overflow returns None (unfoldable) instead of panicking or wrapping
Procedural eval_binary_op uses checked_* for all integer ops; i64::MIN / -1 caught by checked_div; float division guards against non-finite results
ExecutionBudget::unlimited() replaced with trigger_default() (100K iterations, 10s wall-clock) — eliminates the 1-hour DoS vector
RIGHT JOIN rewrite now swaps inline_left/inline_right alongside collections/aliases/keys
Recursive CTE: planner extracts join_link from the recursive branch's JOIN ON clause; executor implements working-table hash-join with frontier iteration; strict-doc binary-tuple-to-msgpack conversion; PlanKind::MultiRow for pgwire response formatting. Value-generating CTEs return explicit unsupported error
extract_join_constraint folds ALL non-equi predicates with AND (was only keeping first). NATURAL JOIN and implicit cross-join return explicit errors. Non-equi conditions serialized as qualified FilterOp::Expr post-filters via sql_expr_to_bridge_expr_qualified (separate from bare-name path used by WHERE/CHECK)
ProcedureBlockCache stores body_sql in CacheEntry and verifies equality on hit — 64-bit hash collisions no longer return wrong procedure body
evaluate_default_expr falls back to parse_expr_string() + const-folder for expressions outside the keyword whitelist

Structural changes

nodedb-query/src/expr_parse.rs split into expr_parse/{mod.rs, tokenizer.rs}
nodedb-sql/src/lib.rs gains parse_expr_string() for the DEFAULT fallback path
sql_expr_to_bridge_expr split into bare-name and _qualified variants

Test plan

New integration tests covering all fixes
Existing integration and unit tests pass with no regressions
scripts/test.sh full SQL suite passes
clippy and fmt clean

Closes #46, closes #47

…tring Move the generated-expression tokenizer/parser from a single nodedb-query/src/expr_parse.rs into nodedb-query/src/expr_parse/ with separate tokenizer.rs and mod.rs files. Expose a new `parse_expr_string` function in nodedb-sql that delegates to sqlparser-rs so arbitrary DEFAULT/CHECK expressions can be parsed and const-folded at plan time without duplicating grammar logic.

Add a join_link field (collection_field, working_table_field) to RecursiveScan that drives proper tree-traversal CTEs. Each iteration now builds a hash-set of values from the frontier's working_field and finds collection rows whose collection_field is in that set, matching the SQL INNER JOIN ON semantics of standard recursive CTEs. Previously the recursive step applied filters to the full collection without any join relationship to the previous iteration, producing incorrect results for parent/child tree queries. Also handle strict (Binary Tuple) encoded collections in the recursive executor by converting through the schema before filter evaluation.

…ypes Fold all non-equi ON predicates with AND instead of silently dropping all but the first, which caused queries with compound ON clauses to produce wrong results. Return explicit errors for NATURAL JOIN and implicit cross-joins (no ON/USING clause) rather than silently succeeding with empty join keys. Add sql_expr_to_bridge_expr_qualified and expr_filter_qualified for join contexts where merged documents use table-qualified field names, and wire the join condition through serialize_join_filters so non-equi ON predicates are evaluated against merged rows alongside WHERE filters.

… cache collisions Replace wrapping integer arithmetic with checked_add/sub/mul/div/rem in the expression evaluator so overflows return None rather than panicking or silently wrapping. Replace the unlimited trigger budget (1-hour wall clock, MAX iterations) with a trigger_default (100k iterations, 10s) to prevent runaway procedural bodies from pinning Control Plane workers indefinitely. Guard the plan cache against 64-bit hash collisions by verifying the cached body SQL matches before returning a cached block, and evicting on mismatch.

The check constraint, constraint validator, RLS, trigger, and materialized view DDL parsers rewrote "NEW.col" references using byte indexing after a to_uppercase() call. This produced incorrect results for non-ASCII identifiers and is unsound for multi-byte UTF-8 sequences. Switch all affected parsers to collect chars and iterate over the char slice, matching the "NEW." prefix with eq_ignore_ascii_case on a 4-char window. Also fix the SQL preprocessor's string literal scanner to advance by 2 on '' escapes instead of continuing without incrementing, which previously caused an infinite loop on inputs with escaped single quotes.

The DEFAULT resolver previously only handled a small set of hard-coded keywords (NOW, UUID, etc.) and returned None for everything else. Add a fallback path that parses the DEFAULT string through sqlparser-rs and runs it through the planner's constant-folding evaluator, enabling DEFAULT values like upper('hello'), 1 + 2, or concat('a', 'b') to resolve at insert time without a Data Plane round-trip. Also replace wrapping integer arithmetic in the const-folder with checked variants to prevent silent overflow on constant expressions.

Cover the bug-fixes landed in this series: - sql_arithmetic_overflow: checked arithmetic in const-fold and eval - sql_default_expressions: DEFAULT parsing and const-fold fallback - sql_join_correctness: non-equi join predicates, NATURAL JOIN error - sql_parser_string_handling: escaped single-quote infinite-loop fix - sql_procedure_cache_safety: plan-cache hash-collision eviction - sql_recursive_cte: join-link tree-traversal correctness - sql_rls_predicate_parse: UTF-8 safe NEW. rewriting in RLS/DDL parsers - sql_trigger_fuel: trigger budget cap prevents unbounded execution - sql_utf8_expressions: multi-byte identifiers in constraint expressions

…while_let, checked_div)

farhan-syah added 7 commits April 16, 2026 21:52

This was referenced Apr 16, 2026

SQL parser & DDL hand-rolled-parse correctness bugs (6 sub-items) #46

Closed

SQL planner correctness & procedural execution safety (8 sub-items) #47

Closed

farhan-syah added 2 commits April 16, 2026 22:01

fix(types): use sort_by_key per clippy unnecessary_sort_by

e8fe4bc

fix: resolve Rust 1.95 clippy lints (collapsible_match, sort_by_key, …

96cbd80

…while_let, checked_div)

farhan-syah merged commit 4c55689 into main Apr 16, 2026
2 checks passed

farhan-syah deleted the fix/issues-46-47-parser-planner-bugs branch April 16, 2026 19:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: issues #46 + #47 — parser, planner, and executor correctness bugs#54

fix: issues #46 + #47 — parser, planner, and executor correctness bugs#54
farhan-syah merged 9 commits intomainfrom
fix/issues-46-47-parser-planner-bugs

farhan-syah commented Apr 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

farhan-syah commented Apr 16, 2026

Summary

Issue #46 — Parser string-handling and UTF-8 bugs

Issue #47 — Planner and executor correctness

Structural changes

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant