
fix(browser): three root-cause fixes for Neo4j Browser expand regression #268

Merged
genezhang merged 4 commits into main from
fix/browser-expand-vlp-regression
Apr 5, 2026

@genezhang
Owner

Summary

  • Bug 1 (VLP branch WHERE out of scope): rewrite_vlp_select_aliases returned early when the main plan's FROM was a regular table, skipping UNION branch rewriting entirely. VLP branches kept bare a.user_id/b.post_id in their WHERE clause — aliases not in scope when FROM vlp_multi_type_a_b AS t. Fix: run UNION branch rewriting before the early return.

  • Bug 2 (Wrong JOIN ON column for polymorphic endpoint): extract_id_column on a Union node returned the first branch's id column (post_id), leaking into the FOLLOWS branch JOIN ON as b.post_id = r.followed_id. Fix: return None when Union branches disagree; add rel_schemas_for_type fallback in join_builder.rs to resolve the correct column from schema.

  • Bug 3 (VLP context bleeding between branches): Multi-type VLP aliases registered in one UNION branch leaked into sibling branches, causing JSON_VALUE(b.end_properties, ...) in FOLLOWS JOIN conditions. Fix: BranchContextSnapshot/activate_scope_context in query_context.rs snapshots and restores branch-local VLP alias state.

Files changed

| File | Change |
| --- | --- |
| `src/clickhouse_query_generator/to_sql_query.rs` | Bug 1 fix: rewrite union branches before optional-VLP early return |
| `src/render_plan/join_builder.rs` | Bug 2 fix: schema fallback for polymorphic endpoint label/id resolution |
| `src/render_plan/plan_builder_helpers.rs` | Bug 2 fix: extract_id_column Union consensus check |
| `src/server/query_context.rs` | Bug 3 fix: branch context snapshot/restore API |
| `tests/rust/integration/browser_expand_tests.rs` | 6 new regression tests for both-endpoint IN-list filters and VLP scope correctness |

Test plan

  • All 1,613 Rust tests pass (cargo test)
  • 49 browser expand tests pass (43 original + 6 new)
  • New tests specifically assert that VLP branch segments do not contain out-of-scope aliases a.user_id/b.post_id after FROM vlp_multi_type
  • test_both_endpoint_in_list_mixed_type_vlp directly reproduces the browser regression pattern

🤖 Generated with Claude Code

genezhang and others added 2 commits April 5, 2026 09:18
Three independent bugs caused the browser expand pattern to fail when
expanding nodes with mixed-type edges (FOLLOWS: User→User + AUTHORED/LIKED:
User→Post in the same UNION ALL):

**Bug 1 — VLP branch WHERE aliases out of scope** (`to_sql_query.rs`)
`rewrite_vlp_select_aliases` had an early-return guard: when the main plan's
FROM was a regular table (e.g., `social.users` for the FOLLOWS branch), it
assumed "optional VLP" and returned before reaching the UNION branch rewriting
at lines 968-972. The VLP branch kept bare `a.user_id`/`b.post_id` in its
WHERE clause — aliases not in scope when `FROM vlp_multi_type_a_b AS t`.
Fix: before returning early, run the UNION branch rewriting so VLP branches
get their WHERE rewritten to `t.start_id`/`t.end_id`.
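
The ordering change can be sketched roughly as follows. This is a toy model, not the code in `to_sql_query.rs`: the `Plan` struct, its fields, and the hard-coded alias replacements are all hypothetical and exist only to show why the branch rewrite has to run before the early return.

```rust
/// Hypothetical, simplified plan node (not the project's real type).
struct Plan {
    from_is_vlp_cte: bool,
    where_clause: String,
    union_branches: Vec<Plan>,
}

/// Toy alias rewrite; the replacement pairs are assumed for illustration.
fn rewrite_where(where_clause: &str) -> String {
    where_clause
        .replace("a.user_id", "t.start_id")
        .replace("b.post_id", "t.end_id")
}

fn rewrite_vlp_aliases(plan: &mut Plan) {
    // Rewrite UNION branches first, so the early return below cannot skip them.
    for branch in &mut plan.union_branches {
        rewrite_vlp_aliases(branch);
    }
    // Early return: a regular-table FROM keeps its own aliases in scope.
    if !plan.from_is_vlp_cte {
        return;
    }
    plan.where_clause = rewrite_where(&plan.where_clause);
}
```

With the loop placed after the early return instead, a main plan reading from a regular table would leave every VLP branch untouched, which matches the failure described above.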

**Bug 2 — Wrong JOIN ON column for polymorphic endpoint** (`join_builder.rs`,
`plan_builder_helpers.rs`)
When the right-side endpoint is a Union node (polymorphic: Post from AUTHORED
or LIKED), `extract_id_column` returned the first branch's id column (`post_id`)
which leaked into the FOLLOWS branch JOIN ON as `b.post_id = r.followed_id`.
Fix: `extract_id_column` now returns `None` when Union branches disagree
(no consensus). `join_builder` adds a `rel_schemas_for_type` fallback to
resolve the correct node label and id column from the schema when label
extraction from the plan tree fails.
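
A minimal sketch of the consensus idea, assuming each Union branch can report at most one id column. The `Branch` type and `consensus_id_column` name are hypothetical stand-ins; the real `extract_id_column` works on the plan tree and may differ in detail.

```rust
/// Hypothetical stand-in for one branch of a Union plan node.
struct Branch {
    id_column: Option<String>,
}

/// Returns the id column only if every branch reports the same one.
/// Returns None for an empty Union, a branch without an id column,
/// or any disagreement between branches.
fn consensus_id_column(branches: &[Branch]) -> Option<String> {
    let mut columns = branches.iter().map(|b| b.id_column.as_deref());
    let first = columns.next()??;
    if columns.all(|c| c == Some(first)) {
        Some(first.to_string())
    } else {
        None
    }
}
```

Returning `None` on disagreement pushes the decision to the caller, which can then consult the schema instead of inheriting whichever branch happened to come first.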

**Bug 3 — VLP context bleeding into non-VLP branches** (`query_context.rs`,
`to_sql_query.rs`)
Multi-type VLP aliases registered in one UNION branch leaked into sibling
branches, causing `JSON_VALUE(b.end_properties, ...)` to appear in the FOLLOWS
branch's JOIN ON conditions.
Fix: `BranchContextSnapshot`/`activate_scope_context` snapshots and restores
the branch-local VLP alias state between UNION branch generation.
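
The snapshot/restore pattern might look something like the sketch below. `QueryContext`, `Snapshot`, and `render_branches` are hypothetical names chosen for illustration; the actual `BranchContextSnapshot`/`activate_scope_context` API in `query_context.rs` is not shown in this PR description and may be shaped differently.

```rust
use std::collections::HashMap;

/// Hypothetical per-query state holding VLP alias registrations.
#[derive(Default)]
struct QueryContext {
    vlp_aliases: HashMap<String, String>,
}

/// Copy of the branch-local state taken before a branch is rendered.
struct Snapshot {
    vlp_aliases: HashMap<String, String>,
}

impl QueryContext {
    fn snapshot(&self) -> Snapshot {
        Snapshot { vlp_aliases: self.vlp_aliases.clone() }
    }

    fn restore(&mut self, snap: Snapshot) {
        self.vlp_aliases = snap.vlp_aliases;
    }
}

/// Renders each UNION branch with its own view of the alias state:
/// whatever a branch registers is rolled back before the next one runs.
fn render_branches<F>(ctx: &mut QueryContext, branches: &[&str], mut render: F) -> Vec<String>
where
    F: FnMut(&mut QueryContext, &str) -> String,
{
    let mut out = Vec::with_capacity(branches.len());
    for &branch in branches {
        let snap = ctx.snapshot();
        out.push(render(&mut *ctx, branch));
        ctx.restore(snap);
    }
    out
}
```

Cloning the map per branch is the simplest option; a scope guard that restores on drop would be an alternative if branch rendering can return early.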

**Regression tests** (`browser_expand_tests.rs`): 6 new tests covering
both-endpoint IN-list filters, mixed-type VLP branching, and assertions that
VLP branch segments never contain out-of-scope bare aliases. These would have
caught all three regressions before they reached the browser.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner Author

@genezhang genezhang left a comment


Three solid root-cause fixes with good regression coverage. A few things worth addressing before merge, noted inline.

Comment thread src/render_plan/join_builder.rs Outdated
  .map(|ns| ns.node_id.id.clone())
- .unwrap_or_else(|| Identifier::Single(left_id_col));
+ .or_else(|| left_id_col.map(Identifier::Single))
+ .unwrap_or_else(|| Identifier::Single("id".to_string()));
Owner Author


Silent fallback to "id" is risky. The previous code hard-errored when extract_id_column failed; now both schema lookups can silently fall through to a literal "id" string that likely doesn't exist on the table. This produces broken SQL (b.id = r.followed_id) with no log signal. At minimum add a log::warn! here so failures are visible in RUST_LOG=warn output.

Comment thread src/render_plan/join_builder.rs Outdated
  .map(|ns| ns.node_id.id.clone())
- .unwrap_or_else(|| Identifier::Single(right_id_col));
+ .or_else(|| right_id_col.map(Identifier::Single))
+ .unwrap_or_else(|| Identifier::Single("id".to_string()));
Owner Author


Same silent fallback concern as the left-side case above — if right_id_col and right_label are both None, this produces c.id = r.to_id with no warning.
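
One way to make the fallback visible, for either side, is to record which source produced the column. The sketch below is only an illustration: `IdResolution` and `resolve_id_column` are hypothetical names, and `eprintln!` stands in for `log::warn!` so the example needs nothing beyond the standard library.

```rust
/// Hypothetical result type recording where the id column came from.
#[derive(Debug, PartialEq)]
enum IdResolution {
    FromSchema(String),
    FromPlan(String),
    Fallback(String),
}

/// Tries the schema first, then the plan tree, then a literal "id".
/// The last step emits a warning so broken SQL can be traced back here.
fn resolve_id_column(schema_id: Option<&str>, plan_id: Option<&str>, alias: &str) -> IdResolution {
    if let Some(col) = schema_id {
        return IdResolution::FromSchema(col.to_string());
    }
    if let Some(col) = plan_id {
        return IdResolution::FromPlan(col.to_string());
    }
    // Stand-in for log::warn!; the real code would use the logging crate.
    eprintln!("warning: no id column resolved for alias `{}`; falling back to \"id\"", alias);
    IdResolution::Fallback("id".to_string())
}
```

Returning a tagged value rather than a bare string also lets tests assert that the fallback path was not taken for a given query.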

Comment thread src/render_plan/join_builder.rs Outdated
}
});
let left_node_id_flen: Identifier = left_label
.as_ref()
Owner Author


This polymorphic fallback block (collect from_nodes → consensus check → return first) is duplicated four times in this file (start_label, end_label, left_label, right_label). Suggest extracting:

fn consensus_endpoint_label(
    labels: &[String],
    schema: &GraphSchema,
    side: EndpointSide, // From | To
) -> Option<String> {
    let nodes: Vec<String> = labels.iter()
        .filter_map(|rel_type| {
            schema.rel_schemas_for_type(rel_type).into_iter().next()
                .map(|rs| if side == EndpointSide::From { rs.from_node.clone() } else { rs.to_node.clone() })
        })
        .collect();
    if !nodes.is_empty() && nodes.windows(2).all(|w| w[0] == w[1]) {
        nodes.into_iter().next()
    } else {
        None
    }
}

Also note: .into_iter().next() on rel_schemas_for_type silently picks only the first schema for a given rel_type. If a rel_type can map to multiple schemas this could return the wrong one — worth a comment either way.

Comment thread src/server/query_context.rs Outdated
}

/// Clear the alias→label mapping.
pub fn clear_alias_label_map() {
Owner Author


clear_alias_label_map is defined here but has no callers in the codebase. Either remove it or add #[allow(dead_code)] with a note about its intended use site.

// (before the outer SELECT), not in the outer JOIN conditions.
if sql_lower.contains("json_value") {
// Find where the outer SELECT starts (after CTE declarations)
let outer_start = sql
Owner Author


The rfind("\nSELECT ") approach to find the outer SELECT boundary is copy-pasted ~6 times across these new tests and is fragile — if the SQL generator ever changes its newline convention the assertions silently pass with outer_start = 0 (i.e., checking the entire SQL string). Consider a shared helper:

fn outer_select_fragment(sql: &str) -> &str {
    let pos = sql.rfind("\nSELECT ").or_else(|| sql.find("SELECT ")).unwrap_or(0);
    &sql[pos..]
}

Also: when rfind returns None and falls back to 0, the assertion checks the full SQL including CTE bodies — which may contain json_value legitimately. The test would then be a false positive. Better to assert! that the outer SELECT was actually found.
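
A stricter variant that fails loudly instead of falling back to offset 0 could look like this. It is a sketch under the assumption that the outer SELECT always starts at the beginning of a line while SELECTs inside CTE bodies are indented; the panic message is illustrative.

```rust
/// Returns the SQL from the last line-initial `SELECT` onward.
/// Panics if no such line exists, so a change in the generator's
/// newline convention breaks the test instead of weakening it.
fn outer_select_fragment(sql: &str) -> &str {
    let pos = sql
        .rfind("\nSELECT ")
        .expect("outer SELECT not found; has the SQL generator's newline convention changed?");
    // Skip the leading newline so the fragment starts at `SELECT`.
    &sql[pos + 1..]
}
```

A test would then run its `json_value` check against `outer_select_fragment(&sql).to_lowercase()` only, leaving CTE bodies out of the assertion.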

genezhang and others added 2 commits April 5, 2026 09:54
- join_builder.rs: extract `consensus_endpoint_label()` helper — the four
  duplicate polymorphic-fallback blocks (left_label, right_label, start_label,
  end_label) are now a single function. Adds doc comment explaining the
  `.into_iter().next()` behaviour for multi-variant rel types.
- join_builder.rs: add `log::warn!` on the `"id"` fallback paths for both
  left_node_id_flen and right_node_id_flen so failures are visible at
  RUST_LOG=warn rather than silently producing broken SQL.
- query_context.rs: remove `clear_alias_label_map` — unused (no callers).
- browser_expand_tests.rs: extract `outer_select_fragment()` helper that
  panics with a clear message if no SELECT is found; replaces six copies of
  the fragile inline `rfind("\nSELECT ").unwrap_or(0)` pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In the GraphJoins CTE-reference code path (chained WITH clauses where the
right node is a pre-computed CTE reference), when the endpoint resolves to
a multi-type VLP CTE (`vlp_multi_type_*`), neither extract_node_label_from_viewscan
nor consensus_endpoint_label can return a usable label — the Union branches
disagree on the endpoint type (User vs Post). Both fell through to the `"id"`
sentinel, producing broken JOIN conditions.

Fix: detect `vlp_multi_type_*` CTE name on both left and right endpoints and
use `start_id` / `end_id` directly. These are the unified columns the VLP CTE
already exposes — the same unification scheme used by the VLP SELECT renderer
and the toString() JOIN condition wrapping.

Adds test_chained_with_vlp_endpoint_uses_end_id asserting that a WITH + VLP
undirected expand query uses end_id in the outer query (not a raw node column).
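
The detection step can be sketched as a prefix check on the CTE name. The `EndpointSide` enum and the function name are hypothetical, and mapping the left endpoint to `start_id` and the right endpoint to `end_id` is an assumption made for this example; the commit only states that both columns are used directly.

```rust
/// Hypothetical marker for which side of the JOIN is being resolved.
#[derive(Clone, Copy)]
enum EndpointSide {
    Left,
    Right,
}

/// Returns the unified id column for a multi-type VLP CTE, or None
/// when the name does not carry the `vlp_multi_type_` prefix and the
/// usual label/schema resolution should be used instead.
fn vlp_unified_id_column(cte_name: &str, side: EndpointSide) -> Option<&'static str> {
    if !cte_name.starts_with("vlp_multi_type_") {
        return None;
    }
    // Assumed mapping: left endpoint reads start_id, right reads end_id.
    Some(match side {
        EndpointSide::Left => "start_id",
        EndpointSide::Right => "end_id",
    })
}
```

Checking this before the label-based lookups keeps the `"id"` fallback from being reached for endpoints whose Union branches can never agree on a type.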

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@genezhang genezhang merged commit cc5b6c8 into main Apr 5, 2026
4 checks passed
@genezhang genezhang deleted the fix/browser-expand-vlp-regression branch April 5, 2026 18:11

1 participant