Skip to content

Commit aadb804

Browse files
genezhangclaude
andcommitted
fix(review): address Copilot PR review comments
- type_inference.rs: proper IS NULL / IS NOT NULL handling — thread a null_check map through extract_props_from_expr/plan; IS NULL/IS NOT NULL operands go to null_check (not regular). Pruning uses regular ∪ null_check with ANY semantics, but skips pruning entirely when only null-check accesses exist (missing properties return NULL, not an error). Fixes the unsound early return that could drop types in `WHERE n.p IS NOT NULL OR n.q = 5`. - database.rs: revert #[non_exhaustive] (prohibits struct literal construction in external crates even with ..default()); document the All-Option convention as the forward-compat strategy instead. - README.md: use relative links (docs/motivation.md, skills/README.md) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent a05eba7 commit aadb804

3 files changed

Lines changed: 129 additions & 63 deletions

File tree

README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,14 +13,14 @@
1313
- Embedded mode with embedded `chDB`
1414
- Hybrid mode with remote querying and local storage
1515

16-
See [here](https://github.com/genezhang/clickgraph/blob/main/docs/motivation.md) for motivation and rationale.
16+
See [motivation and rationale](docs/motivation.md).
1717

1818
---
1919
## What's New in v0.6.6-dev
2020

2121
- **`cg` CLI tool** — Agent/script-oriented CLI (`clickgraph-tool` crate). Translate and execute Cypher without a running server: `cg sql`, `cg validate`, `cg query`, `cg nl` (NL→Cypher via LLM), `cg schema show/validate/discover/diff`. Config via `~/.config/cg/config.toml`. Designed for agentic callers, CI pipelines, and scripting.
2222
- **`embedded` feature now opt-in**`clickgraph-embedded` compiles without chdb by default. New `Database::new_remote(schema, RemoteConfig)` constructor executes Cypher against external ClickHouse with no chdb dependency — useful for lightweight tooling and the `cg` CLI.
23-
- **Agent skills** — Three publishable skills for any agentic framework (Claude Code, LangChain, AutoGen, CrewAI, OpenAI function calling): `/cypher` (NL→Cypher→execute), `/graph-schema` (show schema), `/schema-discover` (generate schema YAML from ClickHouse via LLM). See [skills/README.md](https://github.com/genezhang/clickgraph/blob/main/skills/README.md).
23+
- **Agent skills** — Three publishable skills for any agentic framework (Claude Code, LangChain, AutoGen, CrewAI, OpenAI function calling): `/cypher` (NL→Cypher→execute), `/graph-schema` (show schema), `/schema-discover` (generate schema YAML from ClickHouse via LLM). See [skills/README.md](skills/README.md).
2424
- **openCypher TCK** — 383/402 openCypher Technology Compatibility Kit scenarios passing (95.3%), 0 failures. The 19 skipped scenarios cover Cypher write clauses (`CREATE`, `SET`, `DELETE`, `MERGE`) — not yet supported as Cypher syntax. Note: a programmatic write API (`create_node()`, `create_edge()`, `upsert_node()`) is already available in embedded mode; Cypher write syntax support is planned.
2525

2626
## What's New in v0.6.5-dev

clickgraph-embedded/src/database.rs

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -56,6 +56,9 @@ impl std::fmt::Debug for RemoteConfig {
5656
/// Configuration for an embedded database session.
5757
///
5858
/// Mirrors `kuzu::SystemConfig`.
59+
///
60+
/// All fields are `Option` so that callers can safely use `..SystemConfig::default()`
61+
/// to forward-compatibly add new fields without breaking struct literals.
5962
#[derive(Debug, Clone, Default)]
6063
pub struct SystemConfig {
6164
/// Directory where chdb stores its session data.

src/query_planner/analyzer/type_inference.rs

Lines changed: 124 additions & 61 deletions
Original file line numberDiff line numberDiff line change
@@ -1381,6 +1381,7 @@ impl TypeInference {
13811381
graph_schema: &GraphSchema,
13821382
relationships: &[RelationshipPattern],
13831383
property_accesses: &HashMap<String, HashSet<String>>,
1384+
null_check_accesses: &HashMap<String, HashSet<String>>,
13841385
) -> AnalyzerResult<Arc<LogicalPlan>> {
13851386
log::info!(
13861387
"🔀 UnifiedTypeInference: Generating UNION for {} untyped nodes",
@@ -1481,29 +1482,47 @@ impl TypeInference {
14811482
//
14821483
// We use ANY-semantics: a type is kept if it has at least one of the accessed
14831484
// properties. This correctly handles OR predicates (`WHERE n.p1 = 12 OR n.p2 = 13`)
1484-
// where different types satisfy different branches. IS NULL accesses are excluded
1485-
// from `property_accesses` entirely (see extract_props_from_expr) because
1486-
// accessing a missing property yields NULL, not an error.
1485+
// where different types satisfy different branches.
14871486
//
1488-
// ALL-semantics would incorrectly eliminate types when properties appear in
1489-
// disjunctive (OR) predicates or when a type has only some of the accessed
1490-
// properties (those it lacks resolve to NULL and are filtered by the WHERE clause).
1491-
if let Some(props) = property_accesses.get(var_name) {
1492-
candidates.retain(|type_name| {
1493-
if let Ok(node_schema) = graph_schema.node_schema(type_name) {
1494-
props
1495-
.iter()
1496-
.any(|prop| node_schema.has_cypher_property(prop))
1497-
} else {
1498-
false
1499-
}
1500-
});
1501-
log::debug!(
1502-
"🔍 Constrained '{}' by properties {:?}: {:?}",
1503-
var_name,
1504-
props,
1505-
candidates
1506-
);
1487+
// Properties are split into two sets:
1488+
// - `regular`: accessed in normal expressions (equality, comparisons, …)
1489+
// - `null_check`: accessed inside IS NULL / IS NOT NULL predicates
1490+
//
1491+
// ALL-semantics would incorrectly eliminate types in disjunctive predicates.
1492+
// Pruning only on `regular` props (ignoring null_check) would incorrectly drop
1493+
// types that only appear in an IS NOT NULL branch of an OR, e.g.:
1494+
// `WHERE n.p IS NOT NULL OR n.q = 5` — type A={p}, type B={q}: both must survive.
1495+
// Using `regular ∪ null_check` with ANY-semantics handles this correctly.
1496+
//
1497+
// Special case: if a variable has ONLY null-check accesses and no regular accesses,
1498+
// skip pruning entirely — missing properties return NULL, not an error
1499+
// (e.g., `WHERE n.missing IS NULL` should not eliminate any type candidates).
1500+
let regular_props = property_accesses.get(var_name);
1501+
let nc_props = null_check_accesses.get(var_name);
1502+
if regular_props.is_some() || nc_props.is_some() {
1503+
let has_regular = regular_props.map_or(false, |p| !p.is_empty());
1504+
if has_regular {
1505+
// Prune using regular ∪ null_check with ANY semantics
1506+
let all_props: HashSet<&String> = regular_props
1507+
.iter()
1508+
.flat_map(|s| s.iter())
1509+
.chain(nc_props.iter().flat_map(|s| s.iter()))
1510+
.collect();
1511+
candidates.retain(|type_name| {
1512+
if let Ok(node_schema) = graph_schema.node_schema(type_name) {
1513+
all_props.iter().any(|prop| node_schema.has_cypher_property(prop))
1514+
} else {
1515+
false
1516+
}
1517+
});
1518+
log::debug!(
1519+
"🔍 Constrained '{}' by properties {:?}: {:?}",
1520+
var_name,
1521+
all_props,
1522+
candidates
1523+
);
1524+
}
1525+
// else: only null-check accesses — skip pruning (missing props → NULL)
15071526
}
15081527

15091528
if candidates.is_empty() {
@@ -3665,6 +3684,13 @@ impl AnalyzerPass for TypeInference {
36653684
.cloned()
36663685
.collect();
36673686
let arm_props: HashMap<_, _> = original_property_accesses
3687+
.regular
3688+
.iter()
3689+
.filter(|(k, _)| arm_untyped.contains(k.as_str()))
3690+
.map(|(k, v)| (k.clone(), v.clone()))
3691+
.collect();
3692+
let arm_null_check: HashMap<_, _> = original_property_accesses
3693+
.null_check
36683694
.iter()
36693695
.filter(|(k, _)| arm_untyped.contains(k.as_str()))
36703696
.map(|(k, v)| (k.clone(), v.clone()))
@@ -3676,6 +3702,7 @@ impl AnalyzerPass for TypeInference {
36763702
graph_schema,
36773703
&arm_rels,
36783704
&arm_props,
3705+
&arm_null_check,
36793706
)?;
36803707
new_inputs.push(expanded);
36813708
}
@@ -3694,7 +3721,8 @@ impl AnalyzerPass for TypeInference {
36943721
plan_ctx,
36953722
graph_schema,
36963723
&original_relationships,
3697-
&original_property_accesses,
3724+
&original_property_accesses.regular,
3725+
&original_property_accesses.null_check,
36983726
)?;
36993727
}
37003728
} else {
@@ -4184,119 +4212,154 @@ fn extract_patterns_recursive(
41844212
}
41854213
}
41864214

4215+
/// Property accesses extracted from a plan, split by whether they appear inside
4216+
/// IS NULL / IS NOT NULL predicates.
4217+
///
4218+
/// - `regular`: properties accessed in normal expressions (equality, comparison, functions, …)
4219+
/// - `null_check`: properties accessed *only* as the operand of IS NULL / IS NOT NULL
4220+
///
4221+
/// Keeping them separate lets the pruning step use ANY-semantics on the full set while
4222+
/// still allowing relaxation when a variable's accesses are exclusively null-checks
4223+
/// (e.g., `WHERE n.missing IS NULL` — missing properties return NULL, not an error).
4224+
struct PropertyAccesses {
4225+
regular: HashMap<String, HashSet<String>>,
4226+
null_check: HashMap<String, HashSet<String>>,
4227+
}
4228+
41874229
/// Extract property accesses from the plan tree, grouped by variable name.
4188-
/// Returns a map: variable_alias → set of Cypher property names accessed.
41894230
/// Used to constrain node type candidates to only those types that have the accessed properties.
4190-
fn extract_property_accesses(plan: &Arc<LogicalPlan>) -> HashMap<String, HashSet<String>> {
4191-
let mut accesses: HashMap<String, HashSet<String>> = HashMap::new();
4192-
extract_props_from_plan(plan.as_ref(), &mut accesses);
4193-
accesses
4231+
fn extract_property_accesses(plan: &Arc<LogicalPlan>) -> PropertyAccesses {
4232+
let mut regular: HashMap<String, HashSet<String>> = HashMap::new();
4233+
let mut null_check: HashMap<String, HashSet<String>> = HashMap::new();
4234+
extract_props_from_plan(plan.as_ref(), &mut regular, &mut null_check);
4235+
PropertyAccesses { regular, null_check }
41944236
}
41954237

4196-
fn extract_props_from_plan(plan: &LogicalPlan, accesses: &mut HashMap<String, HashSet<String>>) {
4238+
fn extract_props_from_plan(
4239+
plan: &LogicalPlan,
4240+
accesses: &mut HashMap<String, HashSet<String>>,
4241+
null_check: &mut HashMap<String, HashSet<String>>,
4242+
) {
41974243
match plan {
41984244
LogicalPlan::Filter(filter) => {
4199-
extract_props_from_expr(&filter.predicate, accesses);
4200-
extract_props_from_plan(&filter.input, accesses);
4245+
extract_props_from_expr(&filter.predicate, accesses, null_check);
4246+
extract_props_from_plan(&filter.input, accesses, null_check);
42014247
}
42024248
LogicalPlan::Projection(proj) => {
42034249
for item in &proj.items {
4204-
extract_props_from_expr(&item.expression, accesses);
4250+
extract_props_from_expr(&item.expression, accesses, null_check);
42054251
}
4206-
extract_props_from_plan(&proj.input, accesses);
4252+
extract_props_from_plan(&proj.input, accesses, null_check);
42074253
}
42084254
LogicalPlan::OrderBy(ob) => {
42094255
for item in &ob.items {
4210-
extract_props_from_expr(&item.expression, accesses);
4256+
extract_props_from_expr(&item.expression, accesses, null_check);
42114257
}
4212-
extract_props_from_plan(&ob.input, accesses);
4258+
extract_props_from_plan(&ob.input, accesses, null_check);
42134259
}
42144260
LogicalPlan::GroupBy(gb) => {
42154261
for item in &gb.expressions {
4216-
extract_props_from_expr(item, accesses);
4262+
extract_props_from_expr(item, accesses, null_check);
42174263
}
42184264
if let Some(ref having) = gb.having_clause {
4219-
extract_props_from_expr(having, accesses);
4265+
extract_props_from_expr(having, accesses, null_check);
42204266
}
4221-
extract_props_from_plan(&gb.input, accesses);
4267+
extract_props_from_plan(&gb.input, accesses, null_check);
42224268
}
42234269
LogicalPlan::GraphRel(rel) => {
4224-
extract_props_from_plan(&rel.left, accesses);
4225-
extract_props_from_plan(&rel.center, accesses);
4226-
extract_props_from_plan(&rel.right, accesses);
4270+
extract_props_from_plan(&rel.left, accesses, null_check);
4271+
extract_props_from_plan(&rel.center, accesses, null_check);
4272+
extract_props_from_plan(&rel.right, accesses, null_check);
42274273
if let Some(ref pred) = rel.where_predicate {
4228-
extract_props_from_expr(pred, accesses);
4274+
extract_props_from_expr(pred, accesses, null_check);
42294275
}
42304276
}
42314277
LogicalPlan::GraphNode(node) => {
4232-
extract_props_from_plan(&node.input, accesses);
4278+
extract_props_from_plan(&node.input, accesses, null_check);
42334279
}
42344280
LogicalPlan::Union(u) => {
42354281
for input in &u.inputs {
4236-
extract_props_from_plan(input, accesses);
4282+
extract_props_from_plan(input, accesses, null_check);
42374283
}
42384284
}
42394285
LogicalPlan::WithClause(wc) => {
4240-
extract_props_from_plan(&wc.input, accesses);
4286+
extract_props_from_plan(&wc.input, accesses, null_check);
42414287
}
42424288
LogicalPlan::CartesianProduct(cp) => {
4243-
extract_props_from_plan(&cp.left, accesses);
4244-
extract_props_from_plan(&cp.right, accesses);
4289+
extract_props_from_plan(&cp.left, accesses, null_check);
4290+
extract_props_from_plan(&cp.right, accesses, null_check);
42454291
}
42464292
LogicalPlan::Limit(lim) => {
4247-
extract_props_from_plan(&lim.input, accesses);
4293+
extract_props_from_plan(&lim.input, accesses, null_check);
42484294
}
42494295
LogicalPlan::Skip(s) => {
4250-
extract_props_from_plan(&s.input, accesses);
4296+
extract_props_from_plan(&s.input, accesses, null_check);
42514297
}
42524298
_ => {}
42534299
}
42544300
}
42554301

4256-
fn extract_props_from_expr(expr: &LogicalExpr, accesses: &mut HashMap<String, HashSet<String>>) {
4302+
fn extract_props_from_expr(
4303+
expr: &LogicalExpr,
4304+
accesses: &mut HashMap<String, HashSet<String>>,
4305+
null_check: &mut HashMap<String, HashSet<String>>,
4306+
) {
42574307
match expr {
42584308
LogicalExpr::PropertyAccessExp(pa) => {
42594309
let var = pa.table_alias.0.clone();
42604310
let prop = pa.column.raw().to_string();
42614311
accesses.entry(var).or_default().insert(prop);
42624312
}
42634313
LogicalExpr::Operator(op) | LogicalExpr::OperatorApplicationExp(op) => {
4264-
// Skip IS NULL / IS NOT NULL — accessing a missing property returns NULL,
4265-
// so the property not existing in a type's schema is semantically valid.
4266-
// Including it would incorrectly eliminate types from the candidate set.
42674314
use crate::query_planner::logical_expr::Operator as Op;
42684315
if matches!(op.operator, Op::IsNull | Op::IsNotNull) {
4316+
// Record properties accessed inside IS NULL / IS NOT NULL in the
4317+
// null_check map, NOT in regular accesses. They still participate in
4318+
// ANY-semantics pruning (fixing the OR disjunction case, e.g.
4319+
// `WHERE n.p IS NOT NULL OR n.q = 5` keeps types with either p or q),
4320+
// but when a variable's ONLY accesses are null-checks the pruning is
4321+
// relaxed (missing properties return NULL, not an error — see pruning
4322+
// logic in generate_union_for_untyped_nodes).
4323+
// IS NULL / IS NOT NULL always takes a PropertyAccessExp operand —
4324+
// extract directly to avoid a double-mutable-borrow of null_check.
4325+
for operand in &op.operands {
4326+
if let LogicalExpr::PropertyAccessExp(pa) = operand {
4327+
let var = pa.table_alias.0.clone();
4328+
let prop = pa.column.raw().to_string();
4329+
null_check.entry(var).or_default().insert(prop);
4330+
}
4331+
}
42694332
return;
42704333
}
42714334
for operand in &op.operands {
4272-
extract_props_from_expr(operand, accesses);
4335+
extract_props_from_expr(operand, accesses, null_check);
42734336
}
42744337
}
42754338
LogicalExpr::ScalarFnCall(f) => {
42764339
for arg in &f.args {
4277-
extract_props_from_expr(arg, accesses);
4340+
extract_props_from_expr(arg, accesses, null_check);
42784341
}
42794342
}
42804343
LogicalExpr::AggregateFnCall(f) => {
42814344
for arg in &f.args {
4282-
extract_props_from_expr(arg, accesses);
4345+
extract_props_from_expr(arg, accesses, null_check);
42834346
}
42844347
}
42854348
LogicalExpr::Case(c) => {
42864349
if let Some(ref e) = c.expr {
4287-
extract_props_from_expr(e, accesses);
4350+
extract_props_from_expr(e, accesses, null_check);
42884351
}
42894352
for (w, t) in &c.when_then {
4290-
extract_props_from_expr(w, accesses);
4291-
extract_props_from_expr(t, accesses);
4353+
extract_props_from_expr(w, accesses, null_check);
4354+
extract_props_from_expr(t, accesses, null_check);
42924355
}
42934356
if let Some(ref e) = c.else_expr {
4294-
extract_props_from_expr(e, accesses);
4357+
extract_props_from_expr(e, accesses, null_check);
42954358
}
42964359
}
42974360
LogicalExpr::List(items) => {
42984361
for item in items {
4299-
extract_props_from_expr(item, accesses);
4362+
extract_props_from_expr(item, accesses, null_check);
43004363
}
43014364
}
43024365
_ => {}

0 commit comments

Comments
 (0)