release: v0.6.6 — TCK 383/402 (95.3%), type inference fix, chdb resource caps#269
release: v0.6.6 — TCK 383/402 (95.3%), type inference fix, chdb resource caps#269
Conversation
…caps - type_inference.rs: switch property-filter from ALL→ANY semantics so OR predicates across multi-label nodes keep all candidate types; exclude IS NULL / IS NOT NULL accesses from candidate pruning (missing props → NULL, not an error) — fixes MatchWhere1 and Null1 TCK failures - clickgraph-embedded: add max_memory_usage_bytes to SystemConfig; actually apply max_threads and max_memory_usage via SET commands at session init - clickgraph-tck: cap TCK session to 4 threads / 4 GiB; remove stale mutations_sync=2 SET; bump chdb-rust 1.3.0 → 1.3.1 - README.md: move motivation to docs/motivation.md, add modes summary, fix skills link to absolute GitHub URL, mention cg CLI in tagline - clickgraph-tck/README.md: update to 383/402 (95.3%), 0 failures Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Release-focused update that improves Cypher type inference correctness for multi-label nodes (especially disjunctive WHERE predicates), adds embedded chdb session resource caps, and updates release/TCK documentation.
Changes:
- Adjusted type candidate pruning to use ANY-semantics for accessed properties and updated property-access extraction logic around NULL checks.
- Added embedded chdb session caps (
max_threads,max_memory_usage) and wired them into session init; updated TCK harness to use caps. - Documentation updates: README restructuring/links and a new
clickgraph-tck/README.md; bumpedchdb-rustto 1.3.1.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/query_planner/analyzer/type_inference.rs | Updates property-based type pruning semantics and property-access extraction behavior. |
| README.md | Restructures “Motivation” section and updates links / v0.6.6 notes. |
| clickgraph-tck/tests/tck.rs | Applies embedded resource caps for the TCK runner; removes an unused session SET. |
| clickgraph-tck/README.md | Adds documentation for the TCK runner and current scenario coverage. |
| clickgraph-tck/Cargo.toml | Forces clickgraph-embedded to build with the embedded feature for TCK tests. |
| clickgraph-embedded/src/write_helpers.rs | Switches generated delete SQL to DELETE FROM ... WHERE ... and updates tests/docs accordingly. |
| clickgraph-embedded/src/database.rs | Extends SystemConfig with max_memory_usage_bytes and applies chdb SETs at session init. |
| clickgraph-embedded/src/connection.rs | Updates delete API docs/tests to match the new delete SQL path. |
| Cargo.toml | Bumps chdb-rust dependency version. |
| Cargo.lock | Locks chdb-rust 1.3.1 and checksum updates. |
| // Skip IS NULL / IS NOT NULL — accessing a missing property returns NULL, | ||
| // so the property not existing in a type's schema is semantically valid. | ||
| // Including it would incorrectly eliminate types from the candidate set. | ||
| use crate::query_planner::logical_expr::Operator as Op; | ||
| if matches!(op.operator, Op::IsNull | Op::IsNotNull) { |
There was a problem hiding this comment.
extract_props_from_expr returns early for IS NULL / IS NOT NULL, which can make type pruning unsound with disjunctions. Example: WHERE n.p IS NOT NULL OR n.q = 5 will only record access to q, causing candidate filtering to drop types that only have p even though the left branch may match. Consider tracking the presence of null-check predicates per variable and disabling property-based pruning for that variable (or handling null-checks specially) instead of returning early here.
| See [here](https://github.com/genezhang/clickgraph/blob/main/docs/motivation.md) for motivation and rationale. | ||
|
|
||
| --- | ||
| ## What's New in v0.6.6-dev | ||
|
|
||
| - **`cg` CLI tool** — Agent/script-oriented CLI (`clickgraph-tool` crate). Translate and execute Cypher without a running server: `cg sql`, `cg validate`, `cg query`, `cg nl` (NL→Cypher via LLM), `cg schema show/validate/discover/diff`. Config via `~/.config/cg/config.toml`. Designed for agentic callers, CI pipelines, and scripting. | ||
| - **`embedded` feature now opt-in** — `clickgraph-embedded` compiles without chdb by default. New `Database::new_remote(schema, RemoteConfig)` constructor executes Cypher against external ClickHouse with no chdb dependency — useful for lightweight tooling and the `cg` CLI. | ||
| - **Agent skills** — Three publishable skills for any agentic framework (Claude Code, LangChain, AutoGen, CrewAI, OpenAI function calling): `/cypher` (NL→Cypher→execute), `/graph-schema` (show schema), `/schema-discover` (generate schema YAML from ClickHouse via LLM). See [skills/README.md](skills/README.md). | ||
| - **Agent skills** — Three publishable skills for any agentic framework (Claude Code, LangChain, AutoGen, CrewAI, OpenAI function calling): `/cypher` (NL→Cypher→execute), `/graph-schema` (show schema), `/schema-discover` (generate schema YAML from ClickHouse via LLM). See [skills/README.md](https://github.com/genezhang/clickgraph/blob/main/skills/README.md). |
There was a problem hiding this comment.
The README links to repo-absolute GitHub URLs (e.g., https://github.com/.../blob/main/docs/motivation.md). This breaks for forks, offline browsing, and non-main branches/tags. Prefer relative links like docs/motivation.md / skills/README.md so the README stays correct across branches and in published crate docs.
| /// Configuration for an embedded database session. | ||
| /// | ||
| /// Mirrors `kuzu::SystemConfig`. | ||
| #[derive(Debug, Clone, Default)] | ||
| pub struct SystemConfig { | ||
| /// Directory where chdb stores its session data. | ||
| /// Defaults to a temporary directory (auto-cleaned on drop). | ||
| pub session_dir: Option<PathBuf>, | ||
|
|
||
| /// Base directory for resolving relative `source:` paths in the schema. | ||
| /// If `None`, relative paths are resolved from the current working directory. | ||
| /// Reserved for future use -- not yet wired into source resolution. | ||
| pub data_dir: Option<PathBuf>, | ||
|
|
||
| /// Maximum number of threads for chdb query execution. | ||
| /// `None` uses the chdb default (typically number of CPU cores). | ||
| /// Reserved for future use -- not yet passed to chdb session. | ||
| pub max_threads: Option<usize>, | ||
|
|
||
| /// Maximum memory a single query may use, in bytes. | ||
| /// `None` uses the chdb/ClickHouse default (no cap). | ||
| /// Set this in test environments to prevent runaway memory usage. | ||
| pub max_memory_usage_bytes: Option<u64>, |
There was a problem hiding this comment.
SystemConfig is a public struct with public fields; adding a new field is a breaking change for downstream users who construct it via struct literals (without ..Default::default()). If this crate aims for semver-compatible minor releases, consider marking SystemConfig as #[non_exhaustive] and/or providing builder-style constructors to allow adding future options without breaking callers.
- type_inference.rs: proper IS NULL / IS NOT NULL handling — thread a null_check map through extract_props_from_expr/plan; IS NULL/IS NOT NULL operands go to null_check (not regular). Pruning uses regular ∪ null_check with ANY semantics, but skips pruning entirely when only null-check accesses exist (missing properties return NULL, not an error). Fixes the unsound early return that could drop types in `WHERE n.p IS NOT NULL OR n.q = 5`. - database.rs: revert #[non_exhaustive] (prohibits struct literal construction in external crates even with ..default()); document the All-Option convention as the forward-compat strategy instead. - README.md: use relative links (docs/motivation.md, skills/README.md) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
WHERE n.p1 = 12 OR n.p2 = 13across multi-label nodes now works correctly: property filtering uses ANY semantics (a type is kept if it has at least one accessed property) instead of ALL, fixing MatchWhere1 and Null1 TCK failuresSystemConfiggainsmax_memory_usage_bytes; bothmax_threadsandmax_memory_usageare now applied viaSETat session init (previouslymax_threadswas documented but never sent to chdb)docs/motivation.md, modes summary added,cgCLI mentioned in taglineTest plan
cargo test -p clickgraph-tck— 383 passed, 19 skipped, 0 failedcargo fmt --all/cargo clippy --all-targetsclean🤖 Generated with Claude Code