
fix: issues #46 + #47 — parser, planner, and executor correctness bugs #54

Merged
farhan-syah merged 9 commits into main from fix/issues-46-47-parser-planner-bugs
Apr 16, 2026


@farhan-syah
Contributor

Summary

Fixes all 15 sub-items across #46 (hand-rolled parser bugs) and #47 (planner/executor correctness bugs).

Issue #46 — Parser string-handling and UTF-8 bugs

  • find_begin_pos now skips single-quoted string literals so 'BEGIN' in a WHEN clause doesn't corrupt the trigger header/body split
  • trim_matches('(' | ')'), which strips every leading and trailing parenthesis, replaced with strip_outer_parens() that removes exactly one balanced pair; same fix applied to trim_matches('\'') on RLS value strings
  • find_matching_brace: the '' escape handling converted from a for loop to a while loop that advances i += 2, so the second quote is actually skipped
  • Expression tokenizer rewritten from byte-indexing to char_indices(). Also fixed three strip_new_prefix / replace_remaining_new_refs functions in check_constraint.rs and validate.rs that had the same &str[byte_index..] panic on multi-byte UTF-8 input
  • find_trailing_with_options() scans backward for WITH ( to distinguish options clause from CTEs and column aliases
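To illustrate the "exactly one balanced pair" rule, here is a minimal Rust sketch. strip_outer_parens below is a hypothetical reimplementation written for this description, not the code merged in this PR, and for brevity it does not treat parentheses inside quoted literals specially.

```rust
/// Hypothetical sketch (not the nodedb implementation): strip exactly one
/// pair of parentheses, and only when that pair wraps the whole string.
/// Parentheses inside quoted literals are not special-cased here.
fn strip_outer_parens(s: &str) -> &str {
    let t = s.trim();
    if !(t.starts_with('(') && t.ends_with(')')) {
        return t;
    }
    let mut depth = 0usize;
    for (i, c) in t.char_indices() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                // The leading '(' closes here; strip only if this is the final ')'.
                if depth == 0 {
                    return if i == t.len() - 1 { &t[1..i] } else { t };
                }
            }
            _ => {}
        }
    }
    t // unbalanced input is returned unchanged
}
```

For comparison, "(a) AND (b)".trim_matches(|c: char| c == '(' || c == ')') yields "a) AND (b", which is the kind of corruption the change removes; the sketch leaves that input untouched and turns "((a))" into "(a)".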

Issue #47 — Planner and executor correctness

  • const_fold::fold_binary uses checked_add/sub/mul — overflow returns None (unfoldable) instead of panicking or wrapping
  • Procedural eval_binary_op uses checked_* for all integer ops; i64::MIN / -1 caught by checked_div; float division guards against non-finite results
  • ExecutionBudget::unlimited() replaced with trigger_default() (100K iterations, 10s wall-clock) — eliminates the 1-hour DoS vector
  • RIGHT JOIN rewrite now swaps inline_left/inline_right alongside collections/aliases/keys
  • Recursive CTE: the planner extracts join_link from the recursive branch's JOIN ON clause; the executor implements a working-table hash-join with frontier iteration; strict-doc collections get binary-tuple-to-msgpack conversion; PlanKind::MultiRow is used for pgwire response formatting. Value-generating CTEs return an explicit unsupported error
  • extract_join_constraint folds ALL non-equi predicates with AND (previously it kept only the first). NATURAL JOIN and implicit cross-joins return explicit errors. Non-equi conditions are serialized as qualified FilterOp::Expr post-filters via sql_expr_to_bridge_expr_qualified (separate from the bare-name path used by WHERE/CHECK)
  • ProcedureBlockCache stores body_sql in CacheEntry and verifies equality on hit — 64-bit hash collisions no longer return wrong procedure body
  • evaluate_default_expr falls back to parse_expr_string() + const-folder for expressions outside the keyword whitelist
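The RIGHT JOIN fix is easiest to see on a toy plan. In this Rust sketch, JoinPlan, JoinType, and rewrite_right_join are hypothetical stand-ins (only the field names echo the PR text); the point is that rewriting `A RIGHT JOIN B` as `B LEFT JOIN A` must swap every per-side field as a unit.

```rust
/// Illustrative join-plan shape. The field names echo the PR text, but the
/// struct itself is a hypothetical stand-in for the planner's real type.
#[derive(Debug, Clone, PartialEq)]
struct JoinPlan {
    left_collection: String,
    right_collection: String,
    left_alias: String,
    right_alias: String,
    left_keys: Vec<String>,
    right_keys: Vec<String>,
    inline_left: Option<Vec<String>>, // pre-materialized input rows, if any
    inline_right: Option<Vec<String>>,
    join_type: JoinType,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Left,
    Right,
}

/// Rewrite `A RIGHT JOIN B` as `B LEFT JOIN A`. In this sketch, forgetting
/// the inline_* swap would pair B's collection and keys with A's rows.
fn rewrite_right_join(mut p: JoinPlan) -> JoinPlan {
    if p.join_type == JoinType::Right {
        std::mem::swap(&mut p.left_collection, &mut p.right_collection);
        std::mem::swap(&mut p.left_alias, &mut p.right_alias);
        std::mem::swap(&mut p.left_keys, &mut p.right_keys);
        std::mem::swap(&mut p.inline_left, &mut p.inline_right);
        p.join_type = JoinType::Left;
    }
    p
}
```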

Structural changes

  • nodedb-query/src/expr_parse.rs split into expr_parse/{mod.rs, tokenizer.rs}
  • nodedb-sql/src/lib.rs gains parse_expr_string() for the DEFAULT fallback path
  • sql_expr_to_bridge_expr split into bare-name and _qualified variants

Test plan

  • New integration tests covering all fixes
  • Existing integration and unit tests pass with no regressions
  • scripts/test.sh full SQL suite passes
  • clippy and fmt clean

Closes #46, closes #47

…tring

Move the generated-expression tokenizer/parser from a single
nodedb-query/src/expr_parse.rs into nodedb-query/src/expr_parse/
with separate tokenizer.rs and mod.rs files.

Expose a new `parse_expr_string` function in nodedb-sql that delegates
to sqlparser-rs so arbitrary DEFAULT/CHECK expressions can be parsed
and const-folded at plan time without duplicating grammar logic.

Add a join_link field (collection_field, working_table_field) to
RecursiveScan that drives proper tree-traversal CTEs. Each iteration
now builds a hash-set of values from the frontier's working_field and
finds collection rows whose collection_field is in that set, matching
the SQL INNER JOIN ON semantics of standard recursive CTEs.

Previously the recursive step applied filters to the full collection
without any join relationship to the previous iteration, producing
incorrect results for parent/child tree queries.

Also handle strict (Binary Tuple) encoded collections in the recursive
executor by converting through the schema before filter evaluation.
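A simplified Rust sketch of the frontier iteration described in this commit, assuming rows are flat field-to-integer maps and UNION ALL semantics (no de-duplication). Row, JoinLink, and recursive_scan are illustrative names; only the collection_field/working_table_field pairing comes from the commit message.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative row: field name -> integer value (real rows are documents).
type Row = HashMap<&'static str, i64>;

/// Hypothetical mirror of the join_link (collection_field, working_table_field).
struct JoinLink {
    collection_field: &'static str,
    working_table_field: &'static str,
}

/// Each iteration joins the collection against only the previous iteration's
/// new rows (the frontier), like `JOIN t ON c.parent_id = t.id`.
/// UNION ALL semantics: no de-duplication; max_iters bounds cyclic data.
fn recursive_scan(collection: &[Row], anchor: Vec<Row>, link: &JoinLink, max_iters: usize) -> Vec<Row> {
    let mut result = anchor.clone();
    let mut frontier = anchor;
    for _ in 0..max_iters {
        if frontier.is_empty() {
            break;
        }
        // Build side: hash-set of the frontier's working-table join values.
        let keys: HashSet<i64> = frontier
            .iter()
            .filter_map(|r| r.get(link.working_table_field).copied())
            .collect();
        // Probe side: collection rows whose join field is in the set (INNER JOIN).
        let next: Vec<Row> = collection
            .iter()
            .filter(|r| r.get(link.collection_field).map_or(false, |v| keys.contains(v)))
            .cloned()
            .collect();
        result.extend(next.iter().cloned());
        frontier = next;
    }
    result
}
```

With a parent_id tree and a link of collection_field "parent_id" and working_table_field "id", anchoring on the root returns the root plus its descendants; rows outside that subtree are never pulled in, because each step probes only with the previous step's ids.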
…ypes

Fold all non-equi ON predicates with AND instead of silently dropping
all but the first, which caused queries with compound ON clauses to
produce wrong results.

Return explicit errors for NATURAL JOIN and implicit cross-joins
(no ON/USING clause) rather than silently succeeding with empty
join keys.

Add sql_expr_to_bridge_expr_qualified and expr_filter_qualified for
join contexts where merged documents use table-qualified field names,
and wire the join condition through serialize_join_filters so non-equi
ON predicates are evaluated against merged rows alongside WHERE filters.
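To make the AND-folding concrete, here is a Rust sketch with simplified, hypothetical Expr and JoinConstraint types and made-up error messages: equality conjuncts become join keys, and every remaining predicate is folded into a single post-filter instead of keeping only the first.

```rust
/// Simplified, hypothetical expression type.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Eq(String, String),
    Gt(String, String),
    And(Box<Expr>, Box<Expr>),
}

/// Hypothetical join constraint as seen by the planner.
#[derive(Debug)]
enum JoinConstraint {
    On(Vec<Expr>), // the ON clause, already split into AND-ed conjuncts
    Natural,
    NoCondition, // implicit cross join: no ON/USING clause
}

type Extracted = (Vec<(String, String)>, Option<Expr>);

/// Split ON conjuncts into equi-join keys plus one residual post-filter
/// that ANDs together every non-equi predicate.
fn extract_join_constraint(c: JoinConstraint) -> Result<Extracted, String> {
    let conjuncts = match c {
        JoinConstraint::On(preds) => preds,
        JoinConstraint::Natural => return Err("NATURAL JOIN is not supported".to_string()),
        JoinConstraint::NoCondition => {
            return Err("JOIN requires an ON or USING clause".to_string())
        }
    };
    let mut keys = Vec::new();
    let mut residual: Option<Expr> = None;
    for pred in conjuncts {
        match pred {
            Expr::Eq(l, r) => keys.push((l, r)),
            other => {
                // Fold instead of overwriting or dropping later predicates.
                residual = Some(match residual {
                    None => other,
                    Some(acc) => Expr::And(Box::new(acc), Box::new(other)),
                });
            }
        }
    }
    Ok((keys, residual))
}
```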
… cache collisions

Replace wrapping integer arithmetic with checked_add/sub/mul/div/rem
in the expression evaluator so overflows return None rather than
panicking or silently wrapping.
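A hedged Rust sketch of the checked-arithmetic pattern. Value, BinOp, and this eval_binary_op signature are assumptions for illustration; the std methods checked_add, checked_sub, checked_mul, checked_div, and checked_rem are real and return None on overflow (including i64::MIN / -1) and on a zero divisor. Unlike the PR text, which mentions the finite-result guard for float division, this sketch applies that guard to every float operator.

```rust
/// Hypothetical value and operator types for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Int(i64),
    Float(f64),
}

#[derive(Debug, Clone, Copy)]
enum BinOp {
    Add,
    Sub,
    Mul,
    Div,
    Rem,
}

/// None means "cannot evaluate": integer overflow, division or remainder by
/// zero, i64::MIN / -1, or a non-finite float result.
fn eval_binary_op(op: BinOp, l: Value, r: Value) -> Option<Value> {
    match (l, r) {
        (Value::Int(a), Value::Int(b)) => {
            let v = match op {
                BinOp::Add => a.checked_add(b),
                BinOp::Sub => a.checked_sub(b),
                BinOp::Mul => a.checked_mul(b),
                // None for b == 0 and for i64::MIN / -1 (or % -1)
                BinOp::Div => a.checked_div(b),
                BinOp::Rem => a.checked_rem(b),
            }?;
            Some(Value::Int(v))
        }
        (Value::Float(a), Value::Float(b)) => {
            let v = match op {
                BinOp::Add => a + b,
                BinOp::Sub => a - b,
                BinOp::Mul => a * b,
                BinOp::Div => a / b,
                BinOp::Rem => a % b,
            };
            // Reject inf/NaN (e.g. 1.0 / 0.0) instead of returning it.
            if v.is_finite() { Some(Value::Float(v)) } else { None }
        }
        _ => None, // mixed Int/Float coercion is out of scope for this sketch
    }
}
```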

Replace the unlimited trigger budget (1-hour wall clock, MAX iterations)
with a trigger_default (100k iterations, 10s) to prevent runaway
procedural bodies from pinning Control Plane workers indefinitely.
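A minimal Rust sketch of a bounded budget, assuming a counter charged once per loop iteration plus a wall-clock check. The field names, new(), tick(), and BudgetError are hypothetical; only the trigger_default() limits (100K iterations, 10s) come from this PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical budget shaped after the PR's trigger_default() limits.
struct ExecutionBudget {
    max_iterations: u64,
    max_wall: Duration,
    used: u64,
    started: Instant,
}

#[derive(Debug, PartialEq)]
enum BudgetError {
    Iterations,
    WallClock,
}

impl ExecutionBudget {
    /// 100K iterations and 10 seconds of wall-clock time.
    fn trigger_default() -> Self {
        Self::new(100_000, Duration::from_secs(10))
    }

    fn new(max_iterations: u64, max_wall: Duration) -> Self {
        Self { max_iterations, max_wall, used: 0, started: Instant::now() }
    }

    /// Charge one iteration; a procedural loop would call this on every pass
    /// and abort the trigger body on Err.
    fn tick(&mut self) -> Result<(), BudgetError> {
        self.used += 1;
        if self.used > self.max_iterations {
            return Err(BudgetError::Iterations);
        }
        if self.started.elapsed() > self.max_wall {
            return Err(BudgetError::WallClock);
        }
        Ok(())
    }
}
```

A runaway `LOOP ... END LOOP` with no exit would then fail after 100,000 passes or 10 seconds, whichever comes first, instead of holding a worker for up to an hour.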

Guard the plan cache against 64-bit hash collisions by verifying the
cached body SQL matches before returning a cached block, and evicting
on mismatch.
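One way to express the collision guard, as a Rust sketch: the cache key is still a 64-bit hash, but each entry keeps body_sql and a hit requires an exact text match. CacheEntry, ProcedureBlockCache, and get_or_compile are modeled on the names in this PR but are hypothetical here; the injectable hash function is an assumption added so a collision can be forced deliberately (for example with a constant hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical entry: stores the body text next to the compiled block so a
/// hit can be verified rather than trusting the 64-bit key alone.
struct CacheEntry<B> {
    body_sql: String,
    block: B,
}

struct ProcedureBlockCache<B> {
    entries: HashMap<u64, CacheEntry<B>>,
    hash: fn(&str) -> u64, // injectable so collisions can be simulated
}

fn default_hash(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

impl<B: Clone> ProcedureBlockCache<B> {
    fn new() -> Self {
        Self::with_hasher(default_hash)
    }

    fn with_hasher(hash: fn(&str) -> u64) -> Self {
        Self { entries: HashMap::new(), hash }
    }

    fn get_or_compile(&mut self, body_sql: &str, compile: impl FnOnce(&str) -> B) -> B {
        let key = (self.hash)(body_sql);
        match self.entries.get(&key) {
            // Genuine hit: same key *and* same source text.
            Some(e) if e.body_sql == body_sql => return e.block.clone(),
            // Miss, or a collision with a different body: fall through.
            _ => {}
        }
        let block = compile(body_sql);
        // insert() replaces any colliding entry, which evicts it.
        self.entries.insert(key, CacheEntry { body_sql: body_sql.to_string(), block: block.clone() });
        block
    }
}
```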

The check constraint, constraint validator, RLS, trigger, and
materialized view DDL parsers rewrote "NEW.col" references using byte
indexing after a to_uppercase() call. Because to_uppercase() can change
a string's byte length for non-ASCII input, this produced incorrect
results for non-ASCII identifiers, and it could panic on multi-byte
UTF-8 sequences (slicing a &str at a non-char boundary panics).

Switch all affected parsers to collect chars and iterate over the char
slice, matching the "NEW." prefix with eq_ignore_ascii_case on a
4-char window.
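A sketch of the char-slice approach in Rust. strip_new_prefix here is a hypothetical reimplementation, with two simplifying assumptions of its own: it only matches NEW. at the start of an identifier, and it does not skip string literals.

```rust
/// Hypothetical sketch: drop every `NEW.` qualifier (ASCII case-insensitive)
/// by scanning a char slice, so a multi-byte identifier is never cut in half.
/// String literals are not special-cased in this simplified version.
fn strip_new_prefix(expr: &str) -> String {
    const PREFIX: [char; 4] = ['N', 'E', 'W', '.'];
    let chars: Vec<char> = expr.chars().collect();
    let mut out = String::with_capacity(expr.len());
    let mut i = 0;
    while i < chars.len() {
        // Only match at an identifier start, so `RENEW.x` is left alone.
        let at_boundary = i == 0 || !(chars[i - 1].is_alphanumeric() || chars[i - 1] == '_');
        // Compare a 4-char window, not 4 bytes.
        let matches_prefix = chars
            .get(i..i + 4)
            .map_or(false, |w| w.iter().zip(PREFIX.iter()).all(|(a, b)| a.eq_ignore_ascii_case(b)));
        if at_boundary && matches_prefix {
            i += 4; // skip "NEW."
            continue;
        }
        out.push(chars[i]);
        i += 1;
    }
    out
}
```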

Also fix the SQL preprocessor's string literal scanner to advance by 2
on '' escapes instead of continuing without incrementing, which
previously caused an infinite loop on inputs with escaped single quotes.
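Both the find_begin_pos fix and the '' escape fix come down to the same scanner shape: track whether the cursor is inside a single-quoted literal, and make every branch advance. A hypothetical Rust sketch (find_keyword_outside_quotes is not a function from the PR); scanning bytes is safe here because every byte of a multi-byte UTF-8 character is >= 0x80 and so never equals `'` or an ASCII keyword byte.

```rust
/// Hypothetical sketch: byte offset of the first whole-word, ASCII
/// case-insensitive `keyword` outside single-quoted literals. Inside a
/// literal, `''` is an escaped quote and the scanner advances past both bytes.
fn find_keyword_outside_quotes(sql: &str, keyword: &str) -> Option<usize> {
    let b = sql.as_bytes();
    let k = keyword.as_bytes();
    // Treat non-ASCII bytes as identifier characters (conservative).
    let is_word = |c: u8| c.is_ascii_alphanumeric() || c == b'_' || c >= 0x80;
    let mut in_str = false;
    let mut i = 0;
    while i < b.len() {
        if in_str {
            if b[i] == b'\'' {
                if b.get(i + 1) == Some(&b'\'') {
                    i += 2; // escaped quote: skip both, stay inside the literal
                    continue;
                }
                in_str = false; // closing quote
            }
            i += 1;
            continue;
        }
        if b[i] == b'\'' {
            in_str = true;
            i += 1;
            continue;
        }
        let end = i + k.len();
        if (i == 0 || !is_word(b[i - 1]))
            && end <= b.len()
            && b[i..end].eq_ignore_ascii_case(k)
            && (end == b.len() || !is_word(b[end]))
        {
            return Some(i);
        }
        i += 1; // every path advances, so the loop always terminates
    }
    None
}
```

Called on `CREATE TRIGGER t BEFORE INSERT ON c WHEN NEW.s = 'BEGIN' BEGIN RETURN; END`, it skips the quoted 'BEGIN' and returns the offset of the body's BEGIN.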

The DEFAULT resolver previously only handled a small set of hard-coded
keywords (NOW, UUID, etc.) and returned None for everything else.

Add a fallback path that parses the DEFAULT string through sqlparser-rs
and runs it through the planner's constant-folding evaluator, enabling
DEFAULT values like upper('hello'), 1 + 2, or concat('a', 'b') to
resolve at insert time without a Data Plane round-trip.

Also replace wrapping integer arithmetic in the const-folder with
checked variants to prevent silent overflow on constant expressions.

Cover the bug fixes landed in this series:

- sql_arithmetic_overflow: checked arithmetic in const-fold and eval
- sql_default_expressions: DEFAULT parsing and const-fold fallback
- sql_join_correctness: non-equi join predicates, NATURAL JOIN error
- sql_parser_string_handling: escaped single-quote infinite-loop fix
- sql_procedure_cache_safety: plan-cache hash-collision eviction
- sql_recursive_cte: join-link tree-traversal correctness
- sql_rls_predicate_parse: UTF-8 safe NEW. rewriting in RLS/DDL parsers
- sql_trigger_fuel: trigger budget cap prevents unbounded execution
- sql_utf8_expressions: multi-byte identifiers in constraint expressions
@farhan-syah farhan-syah merged commit 4c55689 into main Apr 16, 2026
2 checks passed
@farhan-syah farhan-syah deleted the fix/issues-46-47-parser-planner-bugs branch April 16, 2026 19:17


Development

Successfully merging this pull request may close these issues:

  • SQL planner correctness & procedural execution safety (8 sub-items)
  • SQL parser & DDL hand-rolled-parse correctness bugs (6 sub-items)
