# release: v0.6.6 — TCK 383/402 (95.3%), type inference fix, chdb resource caps #269
```diff
@@ -69,9 +69,13 @@ pub struct SystemConfig {
     /// Maximum number of threads for chdb query execution.
     /// `None` uses the chdb default (typically the number of CPU cores).
     /// Passed to the chdb session as `SET max_threads` on startup.
     pub max_threads: Option<usize>,

     /// Maximum memory a single query may use, in bytes.
     /// `None` uses the chdb/ClickHouse default (no cap).
     /// Set this in test environments to prevent runaway memory usage.
     pub max_memory_usage_bytes: Option<u64>,

     /// Storage credentials for remote sources (S3, GCS, Azure Blob, Iceberg).
     ///
     /// Applied as ClickHouse session-level `SET` commands before any VIEWs are
```
```diff
@@ -160,6 +164,17 @@ impl Database {
         ChdbExecutor::new_with_credentials(&session_dir, auto_cleanup, &config.credentials)
             .map_err(|e| EmbeddedError::Executor(e.to_string()))?;

+        if let Some(threads) = config.max_threads {
+            executor
+                .execute_blocking_ddl(&format!("SET max_threads = {threads}"))
+                .map_err(|e| EmbeddedError::Executor(e.to_string()))?;
+        }
+        if let Some(bytes) = config.max_memory_usage_bytes {
+            executor
+                .execute_blocking_ddl(&format!("SET max_memory_usage = {bytes}"))
+                .map_err(|e| EmbeddedError::Executor(e.to_string()))?;
+        }
+
         let view_count = clickgraph::executor::data_loader::load_schema_sources(&executor, &schema)
             .map_err(|e| EmbeddedError::Executor(e.to_string()))?;
         if view_count > 0 {
```
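The two caps follow the same pattern: an optional config field that, when set, becomes a session-level `SET` statement, and otherwise leaves the chdb default untouched. A minimal runnable sketch of that pattern — the `SystemConfig` here is a stripped-down stand-in for the real struct, and `cap_statements` is a hypothetical helper, not the crate's API:

```rust
// Stand-in for the real `SystemConfig`; only the two cap fields are modeled.
struct SystemConfig {
    max_threads: Option<usize>,
    max_memory_usage_bytes: Option<u64>,
}

/// Build the session-level SET statements the constructor would issue.
/// A `None` field produces no statement, so chdb's defaults stay in effect.
fn cap_statements(config: &SystemConfig) -> Vec<String> {
    let mut stmts = Vec::new();
    if let Some(threads) = config.max_threads {
        stmts.push(format!("SET max_threads = {threads}"));
    }
    if let Some(bytes) = config.max_memory_usage_bytes {
        stmts.push(format!("SET max_memory_usage = {bytes}"));
    }
    stmts
}

fn main() {
    let config = SystemConfig {
        max_threads: Some(4),
        max_memory_usage_bytes: Some(4 * 1024 * 1024 * 1024), // 4 GiB
    };
    for stmt in cap_statements(&config) {
        println!("{stmt}");
    }
}
```

Because the statements run before any views are created, they bound every query the session later executes.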
New file (`@@ -0,0 +1,80 @@`), the crate's README:

# clickgraph-tck

openCypher [Technology Compatibility Kit (TCK)](https://github.com/opencypher/openCypher/tree/master/tck) runner for ClickGraph, using the embedded chdb engine.

## Current status

**383 / 402 scenarios passing (95.3%)** — 19 skipped (`@NegativeTests` / `@skip`), 0 failing.
## What it tests

The TCK is the openCypher project's official conformance suite. Each scenario is a Gherkin `.feature` file that specifies a graph setup, a Cypher query, and expected results. This crate runs a subset of those scenarios against ClickGraph's embedded query engine.

**Current coverage** — 402 scenarios across 20 feature files:

| Category | Feature files | Scenarios |
|----------|--------------|-----------|
| `MATCH` / `OPTIONAL MATCH` | Match1–3, MatchWhere1–2 | 66 |
| `RETURN` / `ORDER BY` / `SKIP`+`LIMIT` | Return1–3, ReturnOrderBy1, ReturnSkipLimit1 | 46 |
| `WITH` | With1–3 | 9 |
| Aggregation (`count`, `min`, `max`) | Aggregation1–2 | 14 |
| Boolean expressions | Boolean1 | 7 |
| Comparison expressions | Comparison1 | 13 |
| List expressions | List1 | 5 |
| Null handling | Null1 | 5 |
| String functions | String1 | 1 |

Scenarios tagged `@NegativeTests`, `@skip`, `@fails`, `@crash`, or `@wip` are skipped.
### Known gaps

Write operations (`SET`, `DELETE`, `MERGE`) are not covered — ClickGraph is a read-query engine. The TCK's write-oriented feature files are not included.
## How it works

1. **Schema generation** — at startup, all `.feature` files are scanned to extract every `CREATE` block. Node labels and relationship types are collected into a universal schema (`SchemaCatalog`), which is written as a ClickGraph YAML schema and used to create `ReplacingMergeTree` tables in chdb.

2. **One chdb session per process** — chdb supports only one active session per process. A single `Database` is created at startup and shared across all scenarios via `LazyLock`. Tables are **truncated** between scenarios rather than recreated.

3. **Test execution** — each scenario follows the standard Cucumber lifecycle:
   - *Given* `an empty graph` / `having executed:` — truncates tables, then runs Cypher `CREATE` statements to populate data
   - *When* `executing query:` — translates Cypher to SQL and executes it via the embedded engine
   - *Then* `the result should be (in any order / in order)` — normalises output (bools, nulls, floats) and compares with the expected Gherkin table
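The one-session-per-process constraint is what forces the `LazyLock` singleton and the truncate-don't-recreate pattern. A minimal sketch of that shape, using an in-memory stand-in for the real `Database` (the types and helpers here are illustrative, not the harness's actual API):

```rust
use std::collections::HashMap;
use std::sync::{LazyLock, Mutex};

// Stand-in for the embedded `Database`; the real type wraps a chdb session.
struct Database {
    rows: Mutex<HashMap<String, Vec<String>>>,
}

// One shared instance per process, created lazily on first use. chdb allows
// only one active session per process, so every scenario borrows this one.
static DB: LazyLock<Database> = LazyLock::new(|| Database {
    rows: Mutex::new(HashMap::new()),
});

fn insert(table: &str, row: &str) {
    DB.rows.lock().unwrap().entry(table.into()).or_default().push(row.into());
}

// Between scenarios, tables are emptied rather than dropped and recreated:
// the schema stays in place, only the data is reset.
fn truncate_all() {
    for rows in DB.rows.lock().unwrap().values_mut() {
        rows.clear();
    }
}

fn row_count(table: &str) -> usize {
    DB.rows.lock().unwrap().get(table).map_or(0, |r| r.len())
}

fn main() {
    insert("nodes", "row1");
    insert("edges", "row1");
    truncate_all(); // the *Given* step's reset between scenarios
    println!("{} {}", row_count("nodes"), row_count("edges")); // prints "0 0"
}
```

Truncating instead of recreating keeps scenario setup cheap, since the `ReplacingMergeTree` tables only need to be created once per process.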
## Running

```bash
# Requires CLICKGRAPH_CHDB_TESTS=1 to opt in to chdb e2e tests
CLICKGRAPH_CHDB_TESTS=1 cargo test -p clickgraph-tck --test tck

# List failing scenarios; the SQL generated for them is written to /tmp/tck_failing_sql.txt
CLICKGRAPH_CHDB_TESTS=1 cargo test -p clickgraph-tck --test tck 2>&1 | grep FAIL
```
> **Important**: Never run multiple instances of these tests concurrently. chdb is a
> full in-process ClickHouse engine and is memory-intensive. The test harness caps
> each session to 4 threads and 4 GiB per query; running several instances in
> parallel will still saturate available RAM.
## Adding feature files

1. Copy the `.feature` file from the [openCypher TCK](https://github.com/opencypher/openCypher/tree/master/tck/features) into `tests/features/clauses/` or `tests/features/expressions/`.
2. Update `tests/features/FEATURES_VERSION` with the source commit.
3. Run the tests — the schema generator picks up new labels/rel-types automatically.
4. Tag scenarios that rely on unsupported features with `@skip` and add a comment explaining why.
## Directory structure

```
clickgraph-tck/
├── Cargo.toml
└── tests/
    ├── tck.rs            # Cucumber test harness (step definitions, world state)
    ├── create_parser.rs  # Re-export of the embedded Cypher CREATE parser
    ├── schema_gen.rs     # Universal schema inference from feature files
    ├── result_fmt.rs     # Result normalisation and Gherkin table parsing
    └── features/
        ├── FEATURES_VERSION
        ├── clauses/      # MATCH, WITH, RETURN, ORDER BY, SKIP/LIMIT
        └── expressions/  # Aggregation, Boolean, Comparison, List, Null, String
```
```diff
@@ -1478,12 +1478,22 @@ impl TypeInference {
         // Constraint 2: Filter by accessed properties
         // Uses property_mappings keys (Cypher names), NOT column_names (ClickHouse names)
+        //
+        // We use ANY-semantics: a type is kept if it has at least one of the accessed
+        // properties. This correctly handles OR predicates (`WHERE n.p1 = 12 OR n.p2 = 13`)
+        // where different types satisfy different branches. IS NULL accesses are excluded
+        // from `property_accesses` entirely (see extract_props_from_expr) because
+        // accessing a missing property yields NULL, not an error.
+        //
+        // ALL-semantics would incorrectly eliminate types when properties appear in
+        // disjunctive (OR) predicates or when a type has only some of the accessed
+        // properties (those it lacks resolve to NULL and are filtered by the WHERE clause).
         if let Some(props) = property_accesses.get(var_name) {
             candidates.retain(|type_name| {
                 if let Ok(node_schema) = graph_schema.node_schema(type_name) {
                     props
                         .iter()
-                        .all(|prop| node_schema.has_cypher_property(prop))
+                        .any(|prop| node_schema.has_cypher_property(prop))
                 } else {
                     false
                 }
```
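The switch from `.all` to `.any` can be illustrated in isolation. A self-contained sketch with a toy schema map — the labels, properties, and `retain_candidates` helper are illustrative stand-ins, not the planner's real types:

```rust
use std::collections::{HashMap, HashSet};

// ANY-semantics: keep a candidate type if it has at least one of the
// accessed properties. `schemas` maps a node label to its property names.
fn retain_candidates<'a>(
    candidates: &mut Vec<&'a str>,
    accessed: &HashSet<&str>,
    schemas: &HashMap<&'a str, HashSet<&'a str>>,
) {
    candidates.retain(|label| {
        schemas
            .get(label)
            .map_or(false, |props| accessed.iter().any(|p| props.contains(*p)))
    });
}

fn main() {
    // WHERE n.p1 = 12 OR n.p2 = 13 — each branch is satisfied by a different type.
    let schemas: HashMap<&str, HashSet<&str>> = HashMap::from([
        ("TypeA", HashSet::from(["p1"])),
        ("TypeB", HashSet::from(["p2"])),
        ("TypeC", HashSet::from(["other"])),
    ]);
    let accessed: HashSet<&str> = HashSet::from(["p1", "p2"]);

    let mut candidates = vec!["TypeA", "TypeB", "TypeC"];
    retain_candidates(&mut candidates, &accessed, &schemas);
    // ANY keeps TypeA and TypeB; ALL-semantics would have dropped both,
    // incorrectly turning the OR query into an empty result.
    println!("{candidates:?}"); // prints ["TypeA", "TypeB"]
}
```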
```diff
@@ -1501,10 +1511,8 @@ impl TypeInference {
                 "🔍 No valid types for '{}' after constraints — query returns empty result",
                 var_name
             );
-            // No type satisfies the constraints for this variable, so the entire
-            // query pattern produces no rows. Return Empty immediately so that
-            // downstream passes (FilterTagging, rendering) produce a correct
-            // zero-row plan (SELECT 1 AS "_empty" WHERE false).
+            // No type has ANY of the accessed properties (or relationship constraints
+            // eliminated all candidates). The query can produce no rows.
             return Ok(Arc::new(LogicalPlan::Empty));
         }
```
```diff
@@ -4253,6 +4261,13 @@ fn extract_props_from_expr(expr: &LogicalExpr, accesses: &mut HashMap<String, Ha
         accesses.entry(var).or_default().insert(prop);
     }
     LogicalExpr::Operator(op) | LogicalExpr::OperatorApplicationExp(op) => {
+        // Skip IS NULL / IS NOT NULL — accessing a missing property returns NULL,
+        // so the property not existing in a type's schema is semantically valid.
+        // Including it would incorrectly eliminate types from the candidate set.
+        use crate::query_planner::logical_expr::Operator as Op;
+        if matches!(op.operator, Op::IsNull | Op::IsNotNull) {
+            return;
+        }
         for operand in &op.operands {
             extract_props_from_expr(operand, accesses);
         }
```
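The IS NULL exclusion can be sketched against a toy expression tree — this `Expr` enum and `extract_props` walker are stand-ins for the real `LogicalExpr` machinery, modeling only the shapes needed here:

```rust
use std::collections::{HashMap, HashSet};

// Toy expression tree: a property access, an IS NULL wrapper, and an OR node.
enum Expr {
    PropertyAccess { var: String, prop: String },
    IsNull(Box<Expr>),
    Or(Vec<Expr>),
}

// Collect `var -> accessed properties`, skipping anything under IS NULL:
// reading a missing property yields NULL rather than an error, so
// `n.p IS NULL` must not eliminate types that lack `p`.
fn extract_props(expr: &Expr, accesses: &mut HashMap<String, HashSet<String>>) {
    match expr {
        Expr::PropertyAccess { var, prop } => {
            accesses.entry(var.clone()).or_default().insert(prop.clone());
        }
        // IS NULL operands are excluded entirely — early return, no recursion.
        Expr::IsNull(_) => {}
        Expr::Or(operands) => {
            for operand in operands {
                extract_props(operand, accesses);
            }
        }
    }
}

fn main() {
    // WHERE n.p1 = 12 OR n.p2 IS NULL  (the equality is modeled as a bare access)
    let expr = Expr::Or(vec![
        Expr::PropertyAccess { var: "n".into(), prop: "p1".into() },
        Expr::IsNull(Box::new(Expr::PropertyAccess { var: "n".into(), prop: "p2".into() })),
    ]);
    let mut accesses = HashMap::new();
    extract_props(&expr, &mut accesses);
    // Only p1 is recorded; p2 sits under IS NULL and is skipped, so types
    // without p2 remain valid candidates.
    println!("{:?}", accesses["n"]);
}
```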
> **Review comment:** The README links to repo-absolute GitHub URLs (e.g. `https://github.com/.../blob/main/docs/motivation.md`). This breaks for forks, offline browsing, and non-`main` branches/tags. Prefer relative links like `docs/motivation.md` or `skills/README.md` so the README stays correct across branches and in published crate docs.