Skip to content

release: v0.6.6 — TCK 383/402 (95.3%), type inference fix, chdb resource caps#269

Closed
genezhang wants to merge 4 commits intomainfrom
release/v0.6.6
Closed

release: v0.6.6 — TCK 383/402 (95.3%), type inference fix, chdb resource caps#269
genezhang wants to merge 4 commits intomainfrom
release/v0.6.6

Conversation

@genezhang
Copy link
Copy Markdown
Owner

Summary

  • Type inference ANY fixWHERE n.p1 = 12 OR n.p2 = 13 across multi-label nodes now works correctly: property filtering uses ANY semantics (a type is kept if it has at least one accessed property) instead of ALL, fixing MatchWhere1 and Null1 TCK failures
  • IS NULL / IS NOT NULL excluded from candidate pruning — accessing a missing property returns NULL in Cypher semantics; types shouldn't be eliminated just because a property is absent from their schema
  • TCK: 383/402 passing (95.3%), 0 failures — up from 379/402 (94.3%) with 4 failures; 19 skipped (write clauses, out of scope)
  • chdb resource capsSystemConfig gains max_memory_usage_bytes; both max_threads and max_memory_usage are now applied via SET at session init (previously max_threads was documented but never sent to chdb)
  • chdb-rust 1.3.0 → 1.3.1
  • README: motivation moved to docs/motivation.md, modes summary added, cg CLI mentioned in tagline
  • clickgraph-tck/README.md: new file documenting the TCK runner and current status

Test plan

  • cargo test -p clickgraph-tck — 383 passed, 19 skipped, 0 failed
  • cargo fmt --all / cargo clippy --all-targets clean

🤖 Generated with Claude Code

genezhang and others added 2 commits April 5, 2026 22:10
…caps

- type_inference.rs: switch property-filter from ALL→ANY semantics so OR
  predicates across multi-label nodes keep all candidate types; exclude
  IS NULL / IS NOT NULL accesses from candidate pruning (missing props → NULL,
  not an error) — fixes MatchWhere1 and Null1 TCK failures
- clickgraph-embedded: add max_memory_usage_bytes to SystemConfig; actually
  apply max_threads and max_memory_usage via SET commands at session init
- clickgraph-tck: cap TCK session to 4 threads / 4 GiB; remove stale
  mutations_sync=2 SET; bump chdb-rust 1.3.0 → 1.3.1
- README.md: move motivation to docs/motivation.md, add modes summary,
  fix skills link to absolute GitHub URL, mention cg CLI in tagline
- clickgraph-tck/README.md: update to 383/402 (95.3%), 0 failures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Release-focused update that improves Cypher type inference correctness for multi-label nodes (especially disjunctive WHERE predicates), adds embedded chdb session resource caps, and updates release/TCK documentation.

Changes:

  • Adjusted type candidate pruning to use ANY-semantics for accessed properties and updated property-access extraction logic around NULL checks.
  • Added embedded chdb session caps (max_threads, max_memory_usage) and wired them into session init; updated TCK harness to use caps.
  • Documentation updates: README restructuring/links and a new clickgraph-tck/README.md; bumped chdb-rust to 1.3.1.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/query_planner/analyzer/type_inference.rs Updates property-based type pruning semantics and property-access extraction behavior.
README.md Restructures “Motivation” section and updates links / v0.6.6 notes.
clickgraph-tck/tests/tck.rs Applies embedded resource caps for the TCK runner; removes an unused session SET.
clickgraph-tck/README.md Adds documentation for the TCK runner and current scenario coverage.
clickgraph-tck/Cargo.toml Forces clickgraph-embedded to build with the embedded feature for TCK tests.
clickgraph-embedded/src/write_helpers.rs Switches generated delete SQL to DELETE FROM ... WHERE ... and updates tests/docs accordingly.
clickgraph-embedded/src/database.rs Extends SystemConfig with max_memory_usage_bytes and applies chdb SETs at session init.
clickgraph-embedded/src/connection.rs Updates delete API docs/tests to match the new delete SQL path.
Cargo.toml Bumps chdb-rust dependency version.
Cargo.lock Locks chdb-rust 1.3.1 and checksum updates.

Comment on lines +4264 to +4268
// Skip IS NULL / IS NOT NULL — accessing a missing property returns NULL,
// so the property not existing in a type's schema is semantically valid.
// Including it would incorrectly eliminate types from the candidate set.
use crate::query_planner::logical_expr::Operator as Op;
if matches!(op.operator, Op::IsNull | Op::IsNotNull) {
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_props_from_expr returns early for IS NULL / IS NOT NULL, which can make type pruning unsound with disjunctions. Example: WHERE n.p IS NOT NULL OR n.q = 5 will only record access to q, causing candidate filtering to drop types that only have p even though the left branch may match. Consider tracking the presence of null-check predicates per variable and disabling property-based pruning for that variable (or handling null-checks specially) instead of returning early here.

Copilot uses AI. Check for mistakes.
Comment thread README.md Outdated
Comment on lines +16 to +23
See [here](https://github.com/genezhang/clickgraph/blob/main/docs/motivation.md) for motivation and rationale.

---
## What's New in v0.6.6-dev

- **`cg` CLI tool** — Agent/script-oriented CLI (`clickgraph-tool` crate). Translate and execute Cypher without a running server: `cg sql`, `cg validate`, `cg query`, `cg nl` (NL→Cypher via LLM), `cg schema show/validate/discover/diff`. Config via `~/.config/cg/config.toml`. Designed for agentic callers, CI pipelines, and scripting.
- **`embedded` feature now opt-in** — `clickgraph-embedded` compiles without chdb by default. New `Database::new_remote(schema, RemoteConfig)` constructor executes Cypher against external ClickHouse with no chdb dependency — useful for lightweight tooling and the `cg` CLI.
- **Agent skills** — Three publishable skills for any agentic framework (Claude Code, LangChain, AutoGen, CrewAI, OpenAI function calling): `/cypher` (NL→Cypher→execute), `/graph-schema` (show schema), `/schema-discover` (generate schema YAML from ClickHouse via LLM). See [skills/README.md](skills/README.md).
- **Agent skills** — Three publishable skills for any agentic framework (Claude Code, LangChain, AutoGen, CrewAI, OpenAI function calling): `/cypher` (NL→Cypher→execute), `/graph-schema` (show schema), `/schema-discover` (generate schema YAML from ClickHouse via LLM). See [skills/README.md](https://github.com/genezhang/clickgraph/blob/main/skills/README.md).
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README links to repo-absolute GitHub URLs (e.g., https://github.com/.../blob/main/docs/motivation.md). This breaks for forks, offline browsing, and non-main branches/tags. Prefer relative links like docs/motivation.md / skills/README.md so the README stays correct across branches and in published crate docs.

Copilot uses AI. Check for mistakes.
Comment on lines 56 to +77
/// Configuration for an embedded database session.
///
/// Mirrors `kuzu::SystemConfig`.
#[derive(Debug, Clone, Default)]
pub struct SystemConfig {
/// Directory where chdb stores its session data.
/// Defaults to a temporary directory (auto-cleaned on drop).
pub session_dir: Option<PathBuf>,

/// Base directory for resolving relative `source:` paths in the schema.
/// If `None`, relative paths are resolved from the current working directory.
/// Reserved for future use -- not yet wired into source resolution.
pub data_dir: Option<PathBuf>,

/// Maximum number of threads for chdb query execution.
/// `None` uses the chdb default (typically number of CPU cores).
/// Reserved for future use -- not yet passed to chdb session.
pub max_threads: Option<usize>,

/// Maximum memory a single query may use, in bytes.
/// `None` uses the chdb/ClickHouse default (no cap).
/// Set this in test environments to prevent runaway memory usage.
pub max_memory_usage_bytes: Option<u64>,
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SystemConfig is a public struct with public fields; adding a new field is a breaking change for downstream users who construct it via struct literals (without ..Default::default()). If this crate aims for semver-compatible minor releases, consider marking SystemConfig as #[non_exhaustive] and/or providing builder-style constructors to allow adding future options without breaking callers.

Copilot uses AI. Check for mistakes.
genezhang and others added 2 commits April 5, 2026 22:33
- type_inference.rs: proper IS NULL / IS NOT NULL handling — thread a
  null_check map through extract_props_from_expr/plan; IS NULL/IS NOT NULL
  operands go to null_check (not regular). Pruning uses regular ∪ null_check
  with ANY semantics, but skips pruning entirely when only null-check accesses
  exist (missing properties return NULL, not an error). Fixes the unsound early
  return that could drop types in `WHERE n.p IS NOT NULL OR n.q = 5`.
- database.rs: revert #[non_exhaustive] (prohibits struct literal construction
  in external crates even with ..default()); document the All-Option convention
  as the forward-compat strategy instead.
- README.md: use relative links (docs/motivation.md, skills/README.md)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@genezhang genezhang closed this Apr 6, 2026
@genezhang genezhang deleted the release/v0.6.6 branch April 6, 2026 05:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants