fix(browser): three root-cause fixes for Neo4j Browser expand regression#268
fix(browser): three root-cause fixes for Neo4j Browser expand regression#268
Conversation
Three independent bugs caused the browser expand pattern to fail when expanding nodes with mixed-type edges (FOLLOWS: User→User + AUTHORED/LIKED: User→Post in the same UNION ALL): **Bug 1 — VLP branch WHERE aliases out of scope** (`to_sql_query.rs`) `rewrite_vlp_select_aliases` had an early-return guard: when the main plan's FROM was a regular table (e.g., `social.users` for the FOLLOWS branch), it assumed "optional VLP" and returned before reaching the UNION branch rewriting at lines 968-972. The VLP branch kept bare `a.user_id`/`b.post_id` in its WHERE clause — aliases not in scope when `FROM vlp_multi_type_a_b AS t`. Fix: before returning early, run the UNION branch rewriting so VLP branches get their WHERE rewritten to `t.start_id`/`t.end_id`. **Bug 2 — Wrong JOIN ON column for polymorphic endpoint** (`join_builder.rs`, `plan_builder_helpers.rs`) When the right-side endpoint is a Union node (polymorphic: Post from AUTHORED or LIKED), `extract_id_column` returned the first branch's id column (`post_id`) which leaked into the FOLLOWS branch JOIN ON as `b.post_id = r.followed_id`. Fix: `extract_id_column` now returns `None` when Union branches disagree (no consensus). `join_builder` adds a `rel_schemas_for_type` fallback to resolve the correct node label and id column from the schema when label extraction from the plan tree fails. **Bug 3 — VLP context bleeding into non-VLP branches** (`query_context.rs`, `to_sql_query.rs`) Multi-type VLP aliases registered in one UNION branch leaked into sibling branches, causing `JSON_VALUE(b.end_properties, ...)` to appear in the FOLLOWS branch's JOIN ON conditions. Fix: `BranchContextSnapshot`/`activate_scope_context` snapshots and restores the branch-local VLP alias state between UNION branch generation. **Regression tests** (`browser_expand_tests.rs`): 6 new tests covering both-endpoint IN-list filters, mixed-type VLP branching, and assertions that VLP branch segments never contain out-of-scope bare aliases. These would have caught all three regressions before they reached the browser. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
genezhang
left a comment
There was a problem hiding this comment.
Three solid root-cause fixes with good regression coverage. A few things worth addressing before merge, noted inline.
| .map(|ns| ns.node_id.id.clone()) | ||
| .unwrap_or_else(|| Identifier::Single(left_id_col)); | ||
| .or_else(|| left_id_col.map(Identifier::Single)) | ||
| .unwrap_or_else(|| Identifier::Single("id".to_string())); |
There was a problem hiding this comment.
Silent fallback to "id" is risky. The previous code hard-errored when extract_id_column failed; now both schema lookups can silently fall through to a literal "id" string that likely doesn't exist on the table. This produces broken SQL (b.id = r.followed_id) with no log signal. At minimum add a log::warn! here so failures are visible in RUST_LOG=warn output.
| .map(|ns| ns.node_id.id.clone()) | ||
| .unwrap_or_else(|| Identifier::Single(right_id_col)); | ||
| .or_else(|| right_id_col.map(Identifier::Single)) | ||
| .unwrap_or_else(|| Identifier::Single("id".to_string())); |
There was a problem hiding this comment.
Same silent fallback concern as the left-side case above — if right_id_col and right_label are both None, this produces c.id = r.to_id with no warning.
| } | ||
| }); | ||
| let left_node_id_flen: Identifier = left_label | ||
| .as_ref() |
There was a problem hiding this comment.
This polymorphic fallback block (collect from_nodes → consensus check → return first) is duplicated four times in this file (start_label, end_label, left_label, right_label). Suggest extracting:
fn consensus_endpoint_label(
labels: &[String],
schema: &GraphSchema,
side: EndpointSide, // From | To
) -> Option<String> {
let nodes: Vec<String> = labels.iter()
.filter_map(|rel_type| {
schema.rel_schemas_for_type(rel_type).into_iter().next()
.map(|rs| if side == EndpointSide::From { rs.from_node.clone() } else { rs.to_node.clone() })
})
.collect();
if !nodes.is_empty() && nodes.windows(2).all(|w| w[0] == w[1]) {
nodes.into_iter().next()
} else {
None
}
}Also note: .into_iter().next() on rel_schemas_for_type silently picks only the first schema for a given rel_type. If a rel_type can map to multiple schemas this could return the wrong one — worth a comment either way.
| } | ||
|
|
||
| /// Clear the alias→label mapping. | ||
| pub fn clear_alias_label_map() { |
There was a problem hiding this comment.
clear_alias_label_map is defined here but has no callers in the codebase. Either remove it or add #[allow(dead_code)] with a note about its intended use site.
| // (before the outer SELECT), not in the outer JOIN conditions. | ||
| if sql_lower.contains("json_value") { | ||
| // Find where the outer SELECT starts (after CTE declarations) | ||
| let outer_start = sql |
There was a problem hiding this comment.
The rfind("\nSELECT ") approach to find the outer SELECT boundary is copy-pasted ~6 times across these new tests and is fragile — if the SQL generator ever changes its newline convention the assertions silently pass with outer_start = 0 (i.e., checking the entire SQL string). Consider a shared helper:
fn outer_select_fragment(sql: &str) -> &str {
let pos = sql.rfind("\nSELECT ").or_else(|| sql.find("SELECT ")).unwrap_or(0);
&sql[pos..]
}Also: when rfind returns None and falls back to 0, the assertion checks the full SQL including CTE bodies — which may contain json_value legitimately. The test would then be a false positive. Better to assert! that the outer SELECT was actually found.
- join_builder.rs: extract `consensus_endpoint_label()` helper — the four
duplicate polymorphic-fallback blocks (left_label, right_label, start_label,
end_label) are now a single function. Adds doc comment explaining the
`.into_iter().next()` behaviour for multi-variant rel types.
- join_builder.rs: add `log::warn!` on the `"id"` fallback paths for both
left_node_id_flen and right_node_id_flen so failures are visible at
RUST_LOG=warn rather than silently producing broken SQL.
- query_context.rs: remove `clear_alias_label_map` — unused (no callers).
- browser_expand_tests.rs: extract `outer_select_fragment()` helper that
panics with a clear message if no SELECT is found; replaces six copies of
the fragile inline `rfind("\nSELECT ").unwrap_or(0)` pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In the GraphJoins CTE-reference code path (chained WITH clauses where the right node is a pre-computed CTE reference), when the endpoint resolves to a multi-type VLP CTE (`vlp_multi_type_*`), neither extract_node_label_from_viewscan nor consensus_endpoint_label can return a usable label — the Union branches disagree on the endpoint type (User vs Post). Both fell through to the `"id"` sentinel, producing broken JOIN conditions. Fix: detect `vlp_multi_type_*` CTE name on both left and right endpoints and use `start_id` / `end_id` directly. These are the unified columns the VLP CTE already exposes — the same unification scheme used by the VLP SELECT renderer and the toString() JOIN condition wrapping. Adds test_chained_with_vlp_endpoint_uses_end_id asserting that a WITH + VLP undirected expand query uses end_id in the outer query (not a raw node column). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Bug 1 (VLP branch WHERE out of scope):
rewrite_vlp_select_aliasesreturned early when the main plan's FROM was a regular table, skipping UNION branch rewriting entirely. VLP branches kept barea.user_id/b.post_idin their WHERE clause — aliases not in scope whenFROM vlp_multi_type_a_b AS t. Fix: run UNION branch rewriting before the early return.Bug 2 (Wrong JOIN ON column for polymorphic endpoint):
extract_id_columnon a Union node returned the first branch's id column (post_id), leaking into the FOLLOWS branch JOIN ON asb.post_id = r.followed_id. Fix: returnNonewhen Union branches disagree; addrel_schemas_for_typefallback injoin_builder.rsto resolve the correct column from schema.Bug 3 (VLP context bleeding between branches): Multi-type VLP aliases registered in one UNION branch leaked into sibling branches, causing
JSON_VALUE(b.end_properties, ...)in FOLLOWS JOIN conditions. Fix:BranchContextSnapshot/activate_scope_contextinquery_context.rssnapshots and restores branch-local VLP alias state.Files changed
src/clickhouse_query_generator/to_sql_query.rssrc/render_plan/join_builder.rssrc/render_plan/plan_builder_helpers.rsextract_id_columnUnion consensus checksrc/server/query_context.rstests/rust/integration/browser_expand_tests.rsTest plan
cargo test)a.user_id/b.post_idafterFROM vlp_multi_typetest_both_endpoint_in_list_mixed_type_vlpdirectly reproduces the browser regression pattern🤖 Generated with Claude Code