
Commit 4f8100f

genezhang and claude authored
release: v0.6.6-dev — TCK 383/402, type inference ANY fix, chdb resource caps (#270)
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent cc5b6c8 commit 4f8100f

10 files changed

Lines changed: 268 additions & 85 deletions


Cargo.lock

Lines changed: 2 additions & 2 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -49,7 +49,7 @@ lazy_static = "1.5"
 regex = "1.12.3"
 async-trait = "0.1"
 # chdb: embedded ClickHouse (optional — only compiled when feature "embedded" is enabled)
-chdb-rust = { version = "1.3.0", optional = true }
+chdb-rust = { version = "1.3.1", optional = true }
 
 [target.'cfg(not(target_env = "msvc"))'.dependencies]
 tikv-jemallocator = { version = "0.6", optional = true }
```

README.md

Lines changed: 9 additions & 8 deletions
````diff
@@ -4,16 +4,17 @@
 
 # ClickGraph
 
-#### ClickGraph - A high-performance, stateless, read-only graph query service for ClickHouse, written in Rust, with Neo4j ecosystem compatibility - Cypher and Bolt Protocol 5.8 support. Now supports embedded mode with local writes, and exporting query results to external destinations, with Golang, Python bindings, in addition to native Rust.
+#### ClickGraph - A high-performance, stateless, read-only graph query service for ClickHouse, written in Rust, with Neo4j ecosystem compatibility - Cypher and Bolt Protocol 5.8 support. Now supports embedded mode with local writes, and exporting query results to external destinations, with Golang and Python bindings, in addition to native Rust. New `cg` CLI tool supports agentic workflows.
 
 > **Note: ClickGraph dev release is at beta quality for view-based graph analytics applications. Kindly raise an issue if you encounter any problem.**
 
----
-## Motivation and Rationale
-- Viewing ClickHouse databases (including external sources) as graph data with graph analytics capability brings another level of abstraction and boosts productivity with graph tools, and enables agentic GraphRAG support with local writes.
-- Research shows relational analytics with columnar stores and vectorized execution engines like ClickHouse provide superior analytical performance and scalability to graph-native technologies, which usually leverage explicit adjacency representations and are more suitable for local-area graph traversals.
-- View-based graph analytics offer the benefits of zero-ETL without the hassle of data migration and duplicate cost, yet better performance and scalability than most of the native graph analytics options.
-- Neo4j Bolt protocol support gives access to the tools available based on the Bolt protocol.
+`ClickGraph` provides three modes now:
+- Stateless service
+- Embedded mode with embedded `chDB`
+- Hybrid mode with remote querying and local storage
+
+See [motivation and rationale](docs/motivation.md).
+
 ---
 ## What's New in v0.6.6-dev
 
@@ -75,7 +76,7 @@ See [CHANGELOG.md](CHANGELOG.md) for complete release history.
 
 ## Architecture
 
-ClickGraph runs as a lightweight stateless query translator alongside ClickHouse:
+ClickGraph service runs as a lightweight stateless query translator alongside ClickHouse:
 
 ```mermaid
 flowchart LR
````

clickgraph-embedded/src/connection.rs

Lines changed: 3 additions & 5 deletions
```diff
@@ -626,9 +626,8 @@ impl<'db> Connection<'db> {
 
     /// Delete nodes matching the given label and filter criteria.
     ///
-    /// Uses `ALTER TABLE DELETE` which is a ClickHouse mutation — asynchronous
-    /// and resource-heavy. Not suitable for high-frequency use in tight loops.
-    /// For bulk cleanup, prefer fewer calls with broader filters.
+    /// Uses lightweight `DELETE FROM` — synchronous and low-overhead compared to
+    /// the old `ALTER TABLE DELETE` mutation path.
     pub fn delete_nodes(
         &self,
         label: &str,
@@ -653,7 +652,6 @@ impl<'db> Connection<'db> {
     }
 
     /// Delete edges matching the given type and filter criteria.
-    /// See [`delete_nodes`] for mutation performance caveats.
     pub fn delete_edges(
         &self,
         edge_type: &str,
@@ -1569,7 +1567,7 @@ graph_schema:
         assert!(conn.delete_nodes("Person", filters).is_ok());
         let sqls = captured.lock().unwrap();
         let sql = &sqls[0];
-        assert!(sql.contains("ALTER TABLE") && sql.contains("DELETE WHERE"));
+        assert!(sql.contains("DELETE FROM") && sql.contains("WHERE"));
         assert!(sql.contains("full_name") && sql.contains("'Alice'"));
     }
 
```
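The updated `delete_nodes` test checks the generated SQL through a `captured` buffer instead of a live engine. As a rough sketch of how such a capturing test double can work, consider the block below. `SqlExecutor`, `CapturingExecutor`, and the simplified single-filter `delete_nodes` are hypothetical names for illustration, not ClickGraph's actual types, and the escaping rule is an assumption.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical executor interface; ClickGraph's real executor API may differ.
trait SqlExecutor {
    fn execute_sql(&self, sql: &str) -> Result<(), String>;
}

/// Test double that records each SQL string instead of running it.
struct CapturingExecutor {
    captured: Arc<Mutex<Vec<String>>>,
}

impl SqlExecutor for CapturingExecutor {
    fn execute_sql(&self, sql: &str) -> Result<(), String> {
        self.captured.lock().unwrap().push(sql.to_string());
        Ok(())
    }
}

/// Hypothetical, simplified stand-in for `Connection::delete_nodes`: builds a
/// lightweight DELETE for a single equality filter and passes it on.
fn delete_nodes(
    exec: &dyn SqlExecutor,
    table: &str,
    column: &str,
    value: &str,
) -> Result<(), String> {
    // Assumed escaping: backslash-escape `\` first, then `'`, so the value
    // cannot terminate its string literal early.
    let escaped = value.replace('\\', "\\\\").replace('\'', "\\'");
    let sql = format!("DELETE FROM `default`.`{table}` WHERE `{column}` = '{escaped}'");
    exec.execute_sql(&sql)
}

fn main() {
    let captured = Arc::new(Mutex::new(Vec::new()));
    let exec = CapturingExecutor { captured: Arc::clone(&captured) };

    assert!(delete_nodes(&exec, "persons", "full_name", "Alice").is_ok());

    // Inspect what would have been sent to chdb.
    let sqls = captured.lock().unwrap();
    let sql = &sqls[0];
    assert!(sql.contains("DELETE FROM") && sql.contains("WHERE"));
    assert!(sql.contains("full_name") && sql.contains("'Alice'"));
}
```

Sharing the buffer through `Arc<Mutex<_>>` lets the test keep a handle to the recorded SQL after the executor has been handed to the code under test.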

clickgraph-embedded/src/database.rs

Lines changed: 19 additions & 1 deletion
```diff
@@ -56,6 +56,9 @@ impl std::fmt::Debug for RemoteConfig {
 /// Configuration for an embedded database session.
 ///
 /// Mirrors `kuzu::SystemConfig`.
+///
+/// All fields are `Option` so that callers can safely use `..SystemConfig::default()`
+/// to forward-compatibly add new fields without breaking struct literals.
 #[derive(Debug, Clone, Default)]
 pub struct SystemConfig {
     /// Directory where chdb stores its session data.
@@ -69,9 +72,13 @@ pub struct SystemConfig {
 
     /// Maximum number of threads for chdb query execution.
     /// `None` uses the chdb default (typically number of CPU cores).
-    /// Reserved for future use -- not yet passed to chdb session.
     pub max_threads: Option<usize>,
 
+    /// Maximum memory a single query may use, in bytes.
+    /// `None` uses the chdb/ClickHouse default (no cap).
+    /// Set this in test environments to prevent runaway memory usage.
+    pub max_memory_usage_bytes: Option<u64>,
+
     /// Storage credentials for remote sources (S3, GCS, Azure Blob, Iceberg).
     ///
     /// Applied as ClickHouse session-level `SET` commands before any VIEWs are
@@ -160,6 +167,17 @@ impl Database {
             ChdbExecutor::new_with_credentials(&session_dir, auto_cleanup, &config.credentials)
                 .map_err(|e| EmbeddedError::Executor(e.to_string()))?;
 
+        if let Some(threads) = config.max_threads {
+            executor
+                .execute_blocking_ddl(&format!("SET max_threads = {threads}"))
+                .map_err(|e| EmbeddedError::Executor(e.to_string()))?;
+        }
+        if let Some(bytes) = config.max_memory_usage_bytes {
+            executor
+                .execute_blocking_ddl(&format!("SET max_memory_usage = {bytes}"))
+                .map_err(|e| EmbeddedError::Executor(e.to_string()))?;
+        }
+
         let view_count = clickgraph::executor::data_loader::load_schema_sources(&executor, &schema)
             .map_err(|e| EmbeddedError::Executor(e.to_string()))?;
         if view_count > 0 {
```
if view_count > 0 {

clickgraph-embedded/src/write_helpers.rs

Lines changed: 5 additions & 8 deletions
```diff
@@ -103,7 +103,7 @@ pub fn build_insert_sql(
     )
 }
 
-/// Build an `ALTER TABLE ... DELETE WHERE ...` SQL statement.
+/// Build a `DELETE FROM ... WHERE ...` SQL statement (lightweight delete).
 ///
 /// Maps filter keys from Cypher property names to ClickHouse column names using
 /// `property_mappings`, and renders filter values via `Value::to_sql_literal()`.
@@ -146,7 +146,7 @@ pub fn build_delete_sql(
     }
 
     Ok(format!(
-        "ALTER TABLE `{}`.`{}` DELETE WHERE {}",
+        "DELETE FROM `{}`.`{}` WHERE {}",
         db,
         table,
         conditions.join(" AND ")
@@ -393,7 +393,7 @@ mod tests {
         let sql = build_delete_sql("mydb", "users", &filters, &mappings, &["user_id"]).unwrap();
         assert_eq!(
             sql,
-            "ALTER TABLE `mydb`.`users` DELETE WHERE `full_name` = 'Alice'"
+            "DELETE FROM `mydb`.`users` WHERE `full_name` = 'Alice'"
         );
     }
 
@@ -406,10 +406,7 @@ mod tests {
         filters.insert("user_id".to_string(), Value::String("u123".to_string()));
 
         let sql = build_delete_sql("mydb", "users", &filters, &mappings, &["user_id"]).unwrap();
-        assert_eq!(
-            sql,
-            "ALTER TABLE `mydb`.`users` DELETE WHERE `user_id` = 'u123'"
-        );
+        assert_eq!(sql, "DELETE FROM `mydb`.`users` WHERE `user_id` = 'u123'");
     }
 
     #[test]
@@ -426,7 +423,7 @@ mod tests {
         // Keys are sorted for deterministic output
         assert_eq!(
             sql,
-            "ALTER TABLE `mydb`.`users` DELETE WHERE `user_age` = 30 AND `full_name` = 'Bob'"
+            "DELETE FROM `mydb`.`users` WHERE `user_age` = 30 AND `full_name` = 'Bob'"
         );
     }
 
```
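`build_delete_sql` now emits a lightweight `DELETE FROM ... WHERE ...` instead of an `ALTER TABLE ... DELETE WHERE ...` mutation. Below is a simplified, self-contained sketch of such a builder. The `Value` enum, the backslash-escaping rule, and the empty-filter guard are assumptions. The real function also takes an extra column-list argument (`&["user_id"]` in the tests) whose role is not visible in this diff, so the sketch omits it. Iterating a `BTreeMap` orders conditions by Cypher property name, which reproduces the `user_age` before `full_name` ordering the tests expect.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, simplified filter value; ClickGraph's own `Value` is richer.
enum Value {
    Int(i64),
    Str(String),
}

impl Value {
    /// Render as a ClickHouse literal. Strings are single-quoted, with `\`
    /// and `'` backslash-escaped (an assumed escaping rule).
    fn to_sql_literal(&self) -> String {
        match self {
            Value::Int(n) => n.to_string(),
            Value::Str(s) => format!("'{}'", s.replace('\\', "\\\\").replace('\'', "\\'")),
        }
    }
}

/// Sketch of a lightweight-delete builder. Property names are mapped to
/// column names (falling back to the property name itself).
fn build_delete_sql(
    db: &str,
    table: &str,
    filters: &BTreeMap<String, Value>,
    mappings: &HashMap<String, String>,
) -> Result<String, String> {
    // An empty WHERE clause would be invalid SQL, so refuse it up front.
    if filters.is_empty() {
        return Err("refusing to build DELETE without filters".to_string());
    }
    let conditions: Vec<String> = filters
        .iter()
        .map(|(prop, value)| {
            let column = mappings.get(prop).map(String::as_str).unwrap_or(prop.as_str());
            format!("`{}` = {}", column, value.to_sql_literal())
        })
        .collect();
    Ok(format!("DELETE FROM `{}`.`{}` WHERE {}", db, table, conditions.join(" AND ")))
}

fn main() {
    let mut filters = BTreeMap::new();
    filters.insert("name".to_string(), Value::Str("Bob".to_string()));
    filters.insert("age".to_string(), Value::Int(30));
    let mappings = HashMap::from([
        ("name".to_string(), "full_name".to_string()),
        ("age".to_string(), "user_age".to_string()),
    ]);
    let sql = build_delete_sql("mydb", "users", &filters, &mappings).unwrap();
    assert_eq!(sql, "DELETE FROM `mydb`.`users` WHERE `user_age` = 30 AND `full_name` = 'Bob'");
    println!("{sql}");
}
```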

clickgraph-tck/Cargo.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@ path = "tests/tck.rs"
 harness = false
 
 [dev-dependencies]
-clickgraph-embedded = { path = "../clickgraph-embedded" }
+clickgraph-embedded = { path = "../clickgraph-embedded", features = ["embedded"] }
 cucumber = "0.21"
 futures = "0.3"
 uuid = { version = "1", features = ["v4"] }
```

clickgraph-tck/README.md

Lines changed: 80 additions & 0 deletions
````diff
@@ -0,0 +1,80 @@
+# clickgraph-tck
+
+openCypher [Technology Compatibility Kit (TCK)](https://github.com/opencypher/openCypher/tree/master/tck) runner for ClickGraph, using the embedded chdb engine.
+
+## Current status
+
+**383 / 402 scenarios passing (95.3%)** — 19 skipped (`@NegativeTests` / `@skip`), 0 failing.
+
+## What it tests
+
+The TCK is the openCypher project's official conformance suite. Each scenario is a Gherkin `.feature` file that specifies a graph setup, a Cypher query, and expected results. This crate runs a subset of those scenarios against ClickGraph's embedded query engine.
+
+**Current coverage** — 402 scenarios across 20 feature files:
+
+| Category | Feature files | Scenarios |
+|----------|--------------|-----------|
+| `MATCH` / `OPTIONAL MATCH` | Match1–3, MatchWhere1–2 | 66 |
+| `RETURN` / `ORDER BY` / `SKIP`+`LIMIT` | Return1–3, ReturnOrderBy1, ReturnSkipLimit1 | 46 |
+| `WITH` | With1–3 | 9 |
+| Aggregation (`count`, `min`, `max`) | Aggregation1–2 | 14 |
+| Boolean expressions | Boolean1 | 7 |
+| Comparison expressions | Comparison1 | 13 |
+| List expressions | List1 | 5 |
+| Null handling | Null1 | 5 |
+| String functions | String1 | 1 |
+
+Scenarios tagged `@NegativeTests`, `@skip`, `@fails`, `@crash`, or `@wip` are skipped.
+
+### Known gaps
+
+Write operations (`SET`, `DELETE`, `MERGE`) are not covered — ClickGraph is a read-query engine. The TCK's write-oriented feature files are not included.
+
+## How it works
+
+1. **Schema generation** — at startup, all `.feature` files are scanned to extract every `CREATE` block. Node labels and relationship types are collected into a universal schema (`SchemaCatalog`), which is written as a ClickGraph YAML schema and used to create `ReplacingMergeTree` tables in chdb.
+
+2. **One chdb session per process** — chdb supports only one active session per process. A single `Database` is created at startup and shared across all scenarios via `LazyLock`. Tables are **truncated** between scenarios rather than recreated.
+
+3. **Test execution** — each scenario follows the standard Cucumber lifecycle:
+   - *Given* `an empty graph` / `having executed:` — truncates tables, then runs Cypher `CREATE` statements to populate data
+   - *When* `executing query:` — translates Cypher to SQL and executes it via the embedded engine
+   - *Then* `the result should be (in any order / in order)` — normalises output (bools, nulls, floats) and compares with the expected Gherkin table
+
+## Running
+
+```bash
+# Requires CLICKGRAPH_CHDB_TESTS=1 to opt in to chdb e2e tests
+CLICKGRAPH_CHDB_TESTS=1 cargo test -p clickgraph-tck --test tck
+
+# Show SQL generated for failing scenarios (written to /tmp/tck_failing_sql.txt)
+CLICKGRAPH_CHDB_TESTS=1 cargo test -p clickgraph-tck --test tck 2>&1 | grep FAIL
+```
+
+> **Important**: Never run multiple instances of these tests concurrently. chdb is a
+> full in-process ClickHouse engine and is memory-intensive. The test harness caps
+> each session to 4 threads and 4 GiB per query; running several instances in
+> parallel will still saturate available RAM.
+
+## Adding feature files
+
+1. Copy the `.feature` file from the [openCypher TCK](https://github.com/opencypher/openCypher/tree/master/tck/features) into `tests/features/clauses/` or `tests/features/expressions/`.
+2. Update `tests/features/FEATURES_VERSION` with the source commit.
+3. Run the tests — the schema generator picks up new labels/rel-types automatically.
+4. Tag scenarios that rely on unsupported features with `@skip` and add a comment explaining why.
+
+## Directory structure
+
+```
+clickgraph-tck/
+├── Cargo.toml
+└── tests/
+    ├── tck.rs            # Cucumber test harness (step definitions, world state)
+    ├── create_parser.rs  # Re-export of the embedded Cypher CREATE parser
+    ├── schema_gen.rs     # Universal schema inference from feature files
+    ├── result_fmt.rs     # Result normalisation and Gherkin table parsing
+    └── features/
+        ├── FEATURES_VERSION
+        ├── clauses/      # MATCH, WITH, RETURN, ORDER BY, SKIP/LIMIT
+        └── expressions/  # Aggregation, Boolean, Comparison, List, Null, String
+```
````
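Step 3 of the new README says the harness normalises bools, nulls, and floats before comparing results with the expected Gherkin table, either in order or in any order. The sketch below illustrates that idea only; the specific rules in the hypothetical `normalize_cell` are assumptions, since `result_fmt.rs` is not part of this diff. Sorting both sides before comparing treats "in any order" as multiset equality, so duplicate rows must still match in number.

```rust
/// Hypothetical cell normaliser; the real harness's rules are not shown here.
fn normalize_cell(cell: &str) -> String {
    let trimmed = cell.trim();
    match trimmed {
        // Assumed spellings of NULL coming back from the engine or the table.
        "NULL" | "\\N" | "null" => "null".to_string(),
        "TRUE" | "True" | "true" => "true".to_string(),
        "FALSE" | "False" | "false" => "false".to_string(),
        // Canonicalise decimal literals so `1.50` and `1.5` compare equal;
        // anything that does not parse (e.g. a quoted string) is left as-is.
        _ => match trimmed.parse::<f64>() {
            Ok(f) if trimmed.contains('.') => format!("{f:?}"),
            _ => trimmed.to_string(),
        },
    }
}

fn normalize_rows(rows: &[Vec<&str>]) -> Vec<Vec<String>> {
    rows.iter()
        .map(|row| row.iter().map(|c| normalize_cell(c)).collect())
        .collect()
}

/// Compare actual vs expected rows. With `any_order`, both sides are sorted
/// after normalisation, which keeps duplicate rows significant.
fn results_match(actual: &[Vec<&str>], expected: &[Vec<&str>], any_order: bool) -> bool {
    let mut a = normalize_rows(actual);
    let mut e = normalize_rows(expected);
    if any_order {
        a.sort();
        e.sort();
    }
    a == e
}

fn main() {
    let actual = vec![vec!["2", "1.50"], vec!["1", "NULL"]];
    let expected = vec![vec!["1", "null"], vec!["2", "1.5"]];
    assert!(results_match(&actual, &expected, true));
    assert!(!results_match(&actual, &expected, false));
    println!("ok");
}
```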

clickgraph-tck/tests/tck.rs

Lines changed: 9 additions & 3 deletions
```diff
@@ -48,8 +48,15 @@ static SHARED: LazyLock<&'static TckDatabase> = LazyLock::new(|| {
     let schema_path = std::env::temp_dir().join("clickgraph_tck_schema.yaml");
     std::fs::write(&schema_path, &yaml).expect("write TCK schema YAML");
 
-    let db =
-        Database::in_memory(&schema_path, SystemConfig::default()).expect("create TCK database");
+    // Cap resource usage: TCK queries are trivial; 4 threads and 4 GiB per
+    // query is plenty and prevents runaway memory if multiple test processes
+    // are accidentally started in parallel.
+    let config = SystemConfig {
+        max_threads: Some(4),
+        max_memory_usage_bytes: Some(4 * 1024 * 1024 * 1024), // 4 GiB
+        ..SystemConfig::default()
+    };
+    let db = Database::in_memory(&schema_path, config).expect("create TCK database");
 
     // Leak intentionally: chdb SIGABRT on Drop; same pattern as chdb_e2e.rs
     Box::leak(Box::new(TckDatabase { db, tables }))
@@ -67,7 +74,6 @@ fn all_tables() -> &'static [String] {
 fn truncate_all_tables() {
     let db = shared_db();
     if let Ok(conn) = Connection::new(db) {
-        let _ = conn.execute_sql("SET mutations_sync=2");
         for table in all_tables() {
             let r = conn.execute_sql(&format!("TRUNCATE TABLE IF EXISTS `default`.`{table}`"));
             if let Err(ref e) = r {
```
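`tck.rs` keeps a single chdb-backed database in a `LazyLock`, deliberately leaks it with `Box::leak` so its `Drop` never runs, and truncates tables between scenarios. Here is a stripped-down sketch of that pattern using only the standard library (`std::sync::LazyLock` requires Rust 1.80 or later). `SharedDb`, its table names, and the `INITS` counter are hypothetical; the counter exists only to show that initialisation runs once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how many times the expensive shared session was created.
static INITS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `TckDatabase`, which wraps a chdb-backed
/// `Database` plus the generated table list.
struct SharedDb {
    tables: Vec<String>,
}

/// Created on first access, then leaked so its `Drop` never runs.
static SHARED: LazyLock<&'static SharedDb> = LazyLock::new(|| {
    INITS.fetch_add(1, Ordering::SeqCst);
    Box::leak(Box::new(SharedDb {
        tables: vec!["Person".to_string(), "KNOWS".to_string()],
    }))
});

fn shared_db() -> &'static SharedDb {
    *SHARED
}

/// Statements issued between scenarios instead of recreating tables.
fn truncate_statements(db: &SharedDb) -> Vec<String> {
    db.tables
        .iter()
        .map(|t| format!("TRUNCATE TABLE IF EXISTS `default`.`{t}`"))
        .collect()
}

fn main() {
    let a = shared_db();
    let b = shared_db();
    // Both calls see the same leaked instance; the initialiser ran once.
    assert!(std::ptr::eq(a, b));
    assert_eq!(INITS.load(Ordering::SeqCst), 1);
    for stmt in truncate_statements(a) {
        println!("{stmt}");
    }
}
```

Leaking trades a one-time, process-lifetime allocation for never running a destructor that, per the comment in `tck.rs`, would abort the process.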
