This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
ClickGraph is a read-only graph query engine for ClickHouse, written in Rust. It translates Cypher queries into ClickHouse SQL. Write operations (CREATE, SET, DELETE, MERGE) are explicitly out of scope.
Modes of operation:
- Server mode: HTTP (axum) and Bolt v5.8 protocol servers querying a remote ClickHouse instance.
- Embedded mode: in-process serverless execution via chdb (embedded ClickHouse). Queries Parquet, S3, Iceberg, and Delta Lake data directly, without a running server.
- Remote mode: Cypher is translated locally and executed against an external ClickHouse (no chdb needed).
- SQL-only mode: translates Cypher to SQL without executing it (for debugging, testing, or external execution).
Ground rules: (1) Never change query semantics — honestly return what is asked, no more, no less. (2) No shortcuts — fully understand the processing flow before making changes. Quality over speed.
```
clickgraph/          # Main engine crate (Cypher parser, planner, SQL generator, server)
clickgraph-embedded/ # Embedded Rust API (Database/Connection/QueryResult, Kuzu-compatible)
clickgraph-ffi/      # UniFFI FFI layer (cdylib; single source of truth for all bindings)
clickgraph-go/       # Idiomatic Go bindings via cgo + UniFFI-generated C bridge
clickgraph-py/       # Pythonic wrapper over UniFFI-generated ctypes bridge
clickgraph-client/   # Interactive REPL client for querying ClickGraph servers (human use)
clickgraph-tool/     # cg CLI: agent/script-oriented tool (sql, validate, query, nl, schema)
```
Workspace members (in Cargo.toml): clickgraph-client, clickgraph-embedded, clickgraph-ffi, clickgraph-tool
Go and Python bindings are not Cargo workspace members — they consume libclickgraph_ffi.so.
```bash
# Build
cargo build # Debug build (all workspace members)
cargo build --release # Release build
cargo build -p clickgraph-ffi # FFI shared library only

# Format (MANDATORY before push; CI will fail without this)
cargo fmt --all

# Lint
cargo clippy --all-targets

# Rust tests (~1,600 tests across workspace)
cargo test # All Rust tests
cargo test <test_name> # Single test
cargo test -- --nocapture # With output

# Python integration tests (~3,026 tests, requires running ClickHouse + ClickGraph server)
pytest tests/integration/
pytest tests/integration/test_optional_match.py -v # Single file

# Go tests (sql_only, no chdb); subshell keeps the working directory unchanged
(cd clickgraph-go && CGO_LDFLAGS="-L../target/debug" LD_LIBRARY_PATH="../target/debug" go test -v)

# Python binding tests (sql_only, no chdb)
(cd clickgraph-py && LD_LIBRARY_PATH="../target/debug" python3 -m pytest tests/test_bindings.py -v)

# Pre-push checklist (all mandatory)
cargo fmt --all && cargo clippy --all-targets && cargo test

# Run server
export CLICKHOUSE_URL="http://localhost:8123"
export CLICKHOUSE_USER="test_user"
export CLICKHOUSE_PASSWORD="test_pass"
export GRAPH_CONFIG_PATH="./benchmarks/social_network/schemas/social_benchmark.yaml"
cargo run --bin clickgraph

# Debug generated SQL without executing
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{"query":"MATCH (n) RETURN n","sql_only":true}'

# cg CLI: agent/script-oriented tool (no server needed)
cg --schema schema.yaml sql "MATCH (n:Person) RETURN n.name" # translate only
cg --schema schema.yaml validate "MATCH (n:Person) RETURN n" # parse + plan check
cg --schema schema.yaml \
  --clickhouse http://localhost:8123 \
  query "MATCH (n:Person) RETURN n.name LIMIT 10" # execute via remote CH
cg --schema schema.yaml nl "find people with more than 5 friends" # NL → Cypher
cg --schema schema.yaml schema show # agent-friendly schema view
cg schema discover --clickhouse http://localhost:8123 \
  --database mydb --out schema.yaml # LLM-assisted discovery
```

Query processing pipeline: Cypher Query → Parse → Plan → Optimize → Render → Generate SQL → Execute
| Stage | Module | Purpose |
|---|---|---|
| Parse | `src/open_cypher_parser/` | Cypher → AST using nom combinators |
| Plan | `src/query_planner/` | AST → LogicalPlan (analysis, type inference, traversal planning) |
| Optimize | `src/query_planner/optimizer/` | Optimization passes (projection push-down, filter push-down) |
| Render | `src/render_plan/` | LogicalPlan → RenderPlan (CTEs, SELECT, FROM, JOINs) |
| Generate | `src/clickhouse_query_generator/` | RenderPlan → ClickHouse SQL string |
| Execute | `src/server/` | HTTP (axum) + Bolt v5.8 protocol servers, ClickHouse client |
| Schema | `src/graph_catalog/` | YAML graph schema management and validation |
Key files:
- `query_planner/analyzer/`: Type inference (4-phase), view resolution, graph traversal planning
- `query_planner/plan_ctx/`: Query context, variable scoping
- `render_plan/plan_builder_utils.rs`: The largest file (~12K lines); WITH→CTE transformation, expression rewriting. Most regressions originate here. Always run the full test suite after changes.
- `render_plan/variable_scope.rs`: Scope-aware variable resolution across WITH barriers
- `render_plan/cte_manager/`: CTE generation and management
- `clickhouse_query_generator/to_sql_query.rs`: Final SQL rendering, CTE flattening
- `server/bolt_protocol/`: Neo4j Bolt v5.8 wire protocol implementation
- `server/query_context.rs`: Task-local schema and variable registry
```
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│    Go App    │  │  Python App  │  │   Rust App   │  │ Agent/Script │
│    (cgo)     │  │   (ctypes)   │  │   (direct)   │  │   (cg CLI)   │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │                 │
 clickgraph-go     clickgraph-py  clickgraph-embedded clickgraph-tool
       │                 │         (sql_only/remote)         │
       └────────┬────────┘      (chdb: +embedded feat)       │
                │                          │                 │
      ┌─────────▼──────────┐               └────────┬────────┘
      │   clickgraph-ffi   │                        │
      │ (libclickgraph_ffi │                        │
      │   .so / UniFFI)    │                        │
      └─────────┬──────────┘                        │
                └─────────────────┬─────────────────┘
                                  │
                       ┌──────────▼──────────┐
                       │  clickgraph (core)  │
                       │  Parser + Planner + │
                       │    SQL Generator    │
                       └──────────┬──────────┘
                                  │
                       ┌──────────▼──────────┐
                       │  ClickHouse / chdb  │
                       └─────────────────────┘
```
clickgraph-ffi is the single source of truth for all language bindings. It uses UniFFI proc macros (no .udl files) and exports Database, Connection, QueryResult, Row, Value, SystemConfig, ExportOptions, and ClickGraphError.
A method added to clickgraph-ffi becomes available to all bindings once they are regenerated.
Regenerating bindings:

```bash
# Python
uniffi-bindgen generate --library target/debug/libclickgraph_ffi.so --language python -o clickgraph-py/clickgraph/
mv clickgraph-py/clickgraph/clickgraph_ffi.py clickgraph-py/clickgraph/_ffi.py

# Go
uniffi-bindgen-go --library target/debug/libclickgraph_ffi.so --out-dir clickgraph-go/clickgraph_ffi/
```

clickgraph-embedded is the core Rust crate with a Kuzu-compatible sync API (Database → Connection → QueryResult). It has three constructors:
| Constructor | Needs chdb? | Use case |
|---|---|---|
| `Database::sql_only(schema)` | No | Translate Cypher → SQL only |
| `Database::new_remote(schema, RemoteConfig)` | No | Execute against external ClickHouse |
| `Database::new(schema, SystemConfig)` | Yes (`embedded` feature) | In-process chdb execution |
The embedded feature flag is opt-in (default off). clickgraph-ffi and clickgraph-tck enable it; clickgraph-tool does not.
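How an opt-in feature gates the chdb path can be pictured with a minimal sketch. `Backend` and `needs_chdb` are hypothetical names, not the clickgraph-embedded API; the sketch only shows that a variant behind `#[cfg(feature = "embedded")]` does not exist at all in builds without the feature.

```rust
/// Hypothetical backends mirroring the three constructors in the table above.
#[allow(dead_code)]
#[derive(Debug)]
enum Backend {
    /// Translate Cypher to SQL only.
    SqlOnly,
    /// Execute against an external ClickHouse at `url`.
    Remote { url: String },
    /// In-process chdb; this variant only exists when built with the feature.
    #[cfg(feature = "embedded")]
    InProcessChdb,
}

impl Backend {
    /// Whether this backend needs chdb linked into the binary.
    fn needs_chdb(&self) -> bool {
        match self {
            Backend::SqlOnly | Backend::Remote { .. } => false,
            #[cfg(feature = "embedded")]
            Backend::InProcessChdb => true,
        }
    }
}

fn main() {
    let remote = Backend::Remote { url: "http://localhost:8123".to_string() };
    println!("sql_only needs chdb: {}", Backend::SqlOnly.needs_chdb());
    println!("remote needs chdb: {}", remote.needs_chdb());
    // cfg! is a compile-time bool, so this line builds with or without the feature.
    println!("embedded feature enabled: {}", cfg!(feature = "embedded"));
}
```

Because the match arm for the gated variant is itself gated, the same source compiles in both configurations.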
The schema `source:` field supports: local files, `s3://`, `iceberg+s3://`, `delta+s3://`, and `table_function:...`.
clickgraph-go provides an idiomatic Go API via cgo. Module: `github.com/genezhang/clickgraph-go`. Requires `libclickgraph_ffi.so` and `CGO_ENABLED=1`.
clickgraph-py is published on PyPI as `clickgraph`: a thin wrapper over the auto-generated UniFFI ctypes bridge (`_ffi.py` is auto-generated; never edit it by hand).
All CTEs must be top-level siblings under a single WITH RECURSIVE. Never nest CTE definitions inside another CTE body. Enforced by flatten_all_ctes() in to_sql_query.rs.
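`flatten_all_ctes()` is what enforces this. The sketch below is not that function: it uses a hypothetical `Cte` struct and `render` helper to show the idea of hoisting nested definitions into one flat list under a single `WITH RECURSIVE`. Nested definitions are emitted before the CTE that contained them (a post-order walk), so each CTE only references siblings defined earlier.

```rust
/// Hypothetical CTE node that may (incorrectly) carry nested definitions.
struct Cte {
    name: String,
    body: String,
    nested: Vec<Cte>,
}

/// Hoist every CTE to the top level, children before their parent.
fn flatten(ctes: Vec<Cte>, out: &mut Vec<(String, String)>) {
    for cte in ctes {
        flatten(cte.nested, out); // post-order: dependencies first
        out.push((cte.name, cte.body));
    }
}

/// Render all CTEs as siblings under one WITH RECURSIVE.
fn render(ctes: Vec<Cte>, final_select: &str) -> String {
    let mut flat = Vec::new();
    flatten(ctes, &mut flat);
    let defs: Vec<String> = flat
        .iter()
        .map(|(name, body)| format!("{name} AS ({body})"))
        .collect();
    format!("WITH RECURSIVE {} {}", defs.join(", "), final_select)
}

fn main() {
    let tree = vec![Cte {
        name: "cte_outer".to_string(),
        body: "SELECT * FROM cte_inner".to_string(),
        nested: vec![Cte {
            name: "cte_inner".to_string(),
            body: "SELECT 1 AS x".to_string(),
            nested: vec![],
        }],
    }];
    println!("{}", render(tree, "SELECT * FROM cte_outer"));
}
```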
After a WITH→CTE barrier, downstream expressions reference CTE column names directly. No reverse mapping from DB columns back to CTE columns.
Before WITH: p.name → Person.full_name (DB column via schema mapping)
After WITH: p.name → CTE1.p6_person_name (CTE column via property_mapping)
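The two resolution paths above can be sketched as follows. `Scope`, `Mapping`, `key`, and `resolve` are hypothetical names, and the CTE column value is copied from the example above rather than computed. The important part is the missing fallback: after the barrier, a property the CTE did not project resolves to `None` instead of being looked up in the schema again.

```rust
use std::collections::HashMap;

/// (owner, property) -> column name.
type Mapping = HashMap<(String, String), String>;

fn key(owner: &str, property: &str) -> (String, String) {
    (owner.to_string(), property.to_string())
}

/// Hypothetical resolution scope for `alias.property` references.
enum Scope {
    /// Before any WITH: (label, property) -> DB column from the schema.
    Base { schema: Mapping },
    /// After a WITH barrier: (alias, property) -> CTE column. The schema is
    /// deliberately absent, so there is no path back to DB columns.
    AfterWith { cte_name: String, property_mapping: Mapping },
}

fn resolve(scope: &Scope, alias: &str, label: &str, property: &str) -> Option<String> {
    match scope {
        Scope::Base { schema } => schema
            .get(&key(label, property))
            .map(|col| format!("{label}.{col}")),
        Scope::AfterWith { cte_name, property_mapping } => property_mapping
            .get(&key(alias, property))
            .map(|col| format!("{cte_name}.{col}")),
    }
}

fn main() {
    let mut schema = Mapping::new();
    schema.insert(key("Person", "name"), "full_name".to_string());
    schema.insert(key("Person", "email"), "email_address".to_string());
    let before = Scope::Base { schema };

    // The WITH projected only p.name; the column name is copied from the text.
    let mut property_mapping = Mapping::new();
    property_mapping.insert(key("p", "name"), "p6_person_name".to_string());
    let after = Scope::AfterWith { cte_name: "CTE1".to_string(), property_mapping };

    println!("{:?}", resolve(&before, "p", "Person", "name"));
    println!("{:?}", resolve(&after, "p", "Person", "name"));
    // Not projected through the WITH, so it does not resolve even though the schema knows it.
    println!("{:?}", resolve(&after, "p", "Person", "email"));
}
```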
Query-processing code MUST access schema via task-local QueryContext (get_current_schema()), never directly from GLOBAL_SCHEMAS. GLOBAL_SCHEMAS is only for server init and admin endpoints.
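A minimal sketch of the access pattern. `get_current_schema()` is the accessor named above, but its signature here, the `GraphSchema` type, and the `with_schema` helper are assumptions. The real QueryContext is task-local; this sketch uses std's `thread_local!` only so it runs without an async runtime. That substitution would be wrong in real async code, because a task on a multi-threaded runtime can resume on a different worker thread after an `.await`.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a loaded graph schema.
struct GraphSchema {
    name: String,
}

thread_local! {
    // Stand-in for the task-local slot; see the caveat above.
    static CURRENT_SCHEMA: RefCell<Option<Rc<GraphSchema>>> = RefCell::new(None);
}

/// Install `schema` for the duration of `f`, then restore the previous value.
fn with_schema<R>(schema: Rc<GraphSchema>, f: impl FnOnce() -> R) -> R {
    let previous = CURRENT_SCHEMA.with(|slot| slot.replace(Some(schema)));
    let result = f();
    CURRENT_SCHEMA.with(|slot| *slot.borrow_mut() = previous);
    result
}

/// Query-processing code reads the schema only through this accessor.
fn get_current_schema() -> Option<Rc<GraphSchema>> {
    CURRENT_SCHEMA.with(|slot| slot.borrow().clone())
}

fn main() {
    let social = Rc::new(GraphSchema { name: "social_benchmark".to_string() });
    let seen = with_schema(social, || get_current_schema().map(|s| s.name.clone()));
    assert_eq!(seen.as_deref(), Some("social_benchmark"));
    // Outside the scope nothing is installed.
    assert!(get_current_schema().is_none());
}
```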
JOIN ordering uses topological sort based on anchor_connection. When the anchor node is the right connection in OPTIONAL MATCH, FROM and JOIN order must be reversed.
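A minimal sketch of anchor-driven ordering, assuming each join is described by the alias it introduces and the aliases its ON condition needs. `Join`, `order_joins`, and `pattern_joins` are hypothetical; the real planner works from `anchor_connection` and emits LEFT JOINs for OPTIONAL MATCH, which this sketch ignores. The ordering is a simple topological sort rooted at the anchor: repeatedly emit any pending join whose dependencies are already in scope.

```rust
use std::collections::HashSet;

/// Hypothetical join item: the alias it introduces and the aliases its ON condition references.
struct Join {
    alias: &'static str,
    depends_on: Vec<&'static str>,
}

/// Topological order rooted at the anchor (rendered in FROM).
/// Returns None if some join can never be satisfied.
fn order_joins(anchor: &'static str, mut pending: Vec<Join>) -> Option<Vec<&'static str>> {
    let mut in_scope: HashSet<&'static str> = HashSet::from([anchor]);
    let mut order = vec![anchor];
    while !pending.is_empty() {
        let pos = pending
            .iter()
            .position(|j| j.depends_on.iter().all(|d| in_scope.contains(d)))?;
        let next = pending.remove(pos);
        in_scope.insert(next.alias);
        order.push(next.alias);
    }
    Some(order)
}

/// Joins for (left)-[rel]->(right), expressed relative to the anchor side.
/// Deliberately listed out of order so the sort has work to do.
fn pattern_joins(
    left: &'static str,
    rel: &'static str,
    right: &'static str,
    anchor_is_right: bool,
) -> (&'static str, Vec<Join>) {
    if anchor_is_right {
        (right, vec![
            Join { alias: left, depends_on: vec![rel] },
            Join { alias: rel, depends_on: vec![right] },
        ])
    } else {
        (left, vec![
            Join { alias: right, depends_on: vec![rel] },
            Join { alias: rel, depends_on: vec![left] },
        ])
    }
}

fn main() {
    for anchor_is_right in [false, true] {
        let (anchor, joins) = pattern_joins("a", "r", "b", anchor_is_right);
        println!("anchor_is_right={anchor_is_right}: {:?}", order_joins(anchor, joins));
    }
}
```

Anchored on `a`, the order is `a, r, b`; anchored on the right node `b`, it flips to `b, r, a`, which is the reversal described above.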
Five functions must agree on plan traversal for WITH processing (see render_plan/AGENTS.md §6). When adding a new LogicalPlan variant, ensure all five handle it:
- `has_with_clause_in_tree()`
- `plan_contains_with_clause()`
- `find_all_with_clauses_grouped()`
- `needs_processing()`
- `replace_with_clause_with_cte_reference_v2()`
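To see why these functions must agree, consider a hypothetical, heavily simplified plan enum (the real LogicalPlan has many more variants). If one traversal recursed into a `Filter`'s input and another did not, they would disagree about whether the tree contains a WITH. Writing each traversal as an exhaustive `match` with no `_ =>` arm is one way to make the compiler flag every function that must change when a variant is added; whether the real functions are written this way is not stated here.

```rust
/// Hypothetical, heavily simplified stand-in for LogicalPlan.
#[allow(dead_code)]
enum Plan {
    Scan { label: String },
    Filter { input: Box<Plan> },
    With { input: Box<Plan> },
    Union { inputs: Vec<Plan> },
}

/// Traversal 1: does any WITH appear anywhere in the tree?
/// No `_ =>` arm, so a new variant is a compile error here until handled.
fn contains_with(plan: &Plan) -> bool {
    match plan {
        Plan::Scan { .. } => false,
        Plan::With { .. } => true,
        Plan::Filter { input } => contains_with(input),
        Plan::Union { inputs } => inputs.iter().any(contains_with),
    }
}

/// Traversal 2: count WITH nodes. It must walk children exactly like
/// traversal 1, or the two will disagree.
fn count_with_nodes(plan: &Plan) -> usize {
    match plan {
        Plan::Scan { .. } => 0,
        Plan::With { input } => 1 + count_with_nodes(input),
        Plan::Filter { input } => count_with_nodes(input),
        Plan::Union { inputs } => inputs.iter().map(count_with_nodes).sum(),
    }
}

fn main() {
    let plan = Plan::Union {
        inputs: vec![
            Plan::Scan { label: "Person".to_string() },
            Plan::Filter {
                input: Box::new(Plan::With {
                    input: Box::new(Plan::Scan { label: "Post".to_string() }),
                }),
            },
        ],
    };
    // The traversals agree: a WITH exists iff at least one is counted.
    assert_eq!(contains_with(&plan), count_with_nodes(&plan) > 0);
    println!("WITH nodes: {}", count_with_nodes(&plan));
}
```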
WITH CTE columns use unambiguous p{N}_{alias}_{property} format where N = character length of alias. Generated by cte_column_name() in src/utils/cte_column_naming.rs.
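A minimal sketch of the naming rule; the signature is an assumption and the real `cte_column_name()` may differ. The length prefix is what makes the format unambiguous: without it, alias `a_b` with property `c` and alias `a` with property `b_c` would both render as `a_b_c`.

```rust
/// Sketch of the `p{N}_{alias}_{property}` format, N = character length of the alias.
fn cte_column_name(alias: &str, property: &str) -> String {
    // chars().count() counts Unicode scalar values, not bytes.
    let n = alias.chars().count();
    format!("p{}_{}_{}", n, alias, property)
}

fn main() {
    let x = cte_column_name("a_b", "c"); // alias "a_b", property "c"
    let y = cte_column_name("a", "b_c"); // alias "a", property "b_c"
    // Distinct names even though both end in "a_b_c".
    assert_ne!(x, y);
    println!("{x} {y}");
}
```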
Always use the benchmark schema for development: benchmarks/social_network/schemas/social_benchmark.yaml
Property names in Cypher may differ from ClickHouse column names (e.g., u.name → full_name, u.email → email_address). Schema defines these mappings.
Five schema variations exist: Standard, FK-edge, Denormalized, Polymorphic, Composite ID. Bug fixes should be tested against all relevant variations.
- Branch naming: `fix/`, `perf/`, `refactor/`, `test/`, `docs/`, `feature/` prefixes
- Error handling: `thiserror` for error types, `?` operator for propagation, no panics
- Module pattern: `mod.rs` as entry, separate files per component, `errors.rs` per module
- Late-stage project: Reuse existing code before writing new. Investigate thoroughly before claiming code is dead. Add regression tests for fixed bugs.
- serde_json uses the `preserve_order` feature: column order in JSON responses must match SQL SELECT order
| Variable | Purpose |
|---|---|
| `CLICKHOUSE_URL` | ClickHouse connection URL (server mode) |
| `CLICKHOUSE_USER` / `CLICKHOUSE_PASSWORD` | Credentials (server mode) |
| `GRAPH_CONFIG_PATH` | Required for server mode: YAML schema file path |
| `RUST_LOG` | Logging level (`debug`, `info`) |
| `CLICKGRAPH_THREAD_STACK_MB` | Tokio worker thread stack (default 128 MB) |
| `CLICKGRAPH_CHDB_TESTS` | Set to `1` to enable chdb e2e tests |
| `CLICKGRAPH_LLM_PROVIDER` | LLM provider for schema discovery (`anthropic` or `openai`) |
| `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` | API keys for LLM schema discovery |
| `CG_SCHEMA` | Default schema file path for `cg` CLI |
| `CG_CLICKHOUSE_URL` | ClickHouse URL for `cg query` |
| `CG_CLICKHOUSE_USER` / `CG_CLICKHOUSE_PASSWORD` | Credentials for `cg query` |
| `CG_LLM_PROVIDER` | LLM provider for `cg nl` and `cg schema discover` |
| `CG_LLM_MODEL` / `CG_LLM_API_KEY` / `CG_LLM_BASE_URL` | LLM config for `cg` |
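A minimal sketch of reading `CLICKGRAPH_THREAD_STACK_MB` with its 128 MB default. `stack_size_bytes` is a hypothetical helper, and falling back to the default for empty, non-numeric, or zero values is an assumed policy, not documented behavior. The resulting byte count is the kind of value one would hand to Tokio's `runtime::Builder::thread_stack_size`.

```rust
use std::env;

/// Default worker stack size from the table above.
const DEFAULT_STACK_MB: usize = 128;

/// Convert a raw CLICKGRAPH_THREAD_STACK_MB value into bytes.
/// Unset, empty, non-numeric, or zero values fall back to the default (assumed policy).
fn stack_size_bytes(raw: Option<&str>) -> usize {
    let mb = raw
        .and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&mb| mb > 0)
        .unwrap_or(DEFAULT_STACK_MB);
    // saturating_mul avoids overflow on absurdly large inputs.
    mb.saturating_mul(1024 * 1024)
}

fn main() {
    let raw = env::var("CLICKGRAPH_THREAD_STACK_MB").ok();
    let bytes = stack_size_bytes(raw.as_deref());
    println!("worker thread stack: {} bytes", bytes);
}
```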
- `STATUS.md`: Single source of truth for project state
- `CHANGELOG.md`: Release history (Keep-a-Changelog format)
- `DEV_QUICK_START.md`: Essential developer workflow
- `DEVELOPMENT_PROCESS.md`: Detailed 6-phase development process
- `.github/copilot-instructions.md`: Comprehensive architecture guide
- `*/AGENTS.md`: Module-level architecture guides (in `src/`, `src/render_plan/`, `src/server/`, `clickgraph-ffi/`, `clickgraph-embedded/`, `clickgraph-tool/`, `clickgraph-go/`, `clickgraph-py/`, etc.)
- `docs/wiki/cypher-language-reference.md`: Primary feature documentation (must be updated for every feature)