This guide will help you get ClickGraph up and running quickly for your first graph analysis on ClickHouse data.
- ClickHouse: Version 21.3+ running locally or accessible via network
- Docker: For running ClickHouse and ClickGraph (recommended)
- Rust: Version 1.85+ (only if building from source)
Pull pre-built images and start everything:
# Download docker-compose.yaml
curl -o docker-compose.yaml https://raw.githubusercontent.com/genezhang/clickgraph/main/docker-compose.yaml
# Start all services (ClickHouse + ClickGraph)
docker-compose up -d
# Check logs
docker-compose logs -f clickgraph
You should see:
ClickGraph v0.5.1 (fork of Brahmand)
Starting HTTP server on 0.0.0.0:8080
Starting Bolt server on 0.0.0.0:7687
ClickGraph server is running
HTTP API: http://0.0.0.0:8080
Bolt Protocol: bolt://0.0.0.0:7687
🎉 ClickGraph is now running!
What just happened?
- ✅ ClickHouse downloaded and started (ports 8123/9000)
- ✅ ClickGraph image pulled from Docker Hub (genezhang/clickgraph:latest)
- ✅ Pre-configured schema loaded (test_integration.yaml)
- ✅ HTTP API ready at http://localhost:8080
- ✅ Bolt protocol ready at bolt://localhost:7687
Skip to First Graph Query to start querying!
Prerequisites: Rust toolchain 1.85+ and Docker for ClickHouse
# 1. Clone repository
git clone https://github.com/genezhang/clickgraph
cd clickgraph
# 2. Start ONLY ClickHouse (not the clickgraph service)
docker-compose up -d clickhouse-service
# Note: If you accidentally started all services with 'docker-compose up -d',
# stop the clickgraph container first:
# docker-compose stop clickgraph
💡 Why clickhouse-service only? This starts just ClickHouse, allowing you to run ClickGraph from source with cargo run. If you run docker-compose up -d (all services), both the containerized ClickGraph and your local cargo run will try to bind to port 8080, causing a conflict.
Build and Run:
# Build ClickGraph
cargo build --release
# Set required environment variables
export CLICKHOUSE_URL="http://localhost:8123"
export CLICKHOUSE_USER="test_user"
export CLICKHOUSE_PASSWORD="test_pass"
export CLICKHOUSE_DATABASE="brahmand"
export GRAPH_CONFIG_PATH="./benchmarks/social_network/schemas/social_benchmark.yaml"
# Start ClickGraph
cargo run --bin clickgraph
# Or use custom ports if 8080 is already in use:
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688
⚠️ Required: GRAPH_CONFIG_PATH must be set to a valid schema YAML file. Without it, ClickGraph won't know how to map your ClickHouse tables to graph nodes and edges.
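A small pre-flight check can catch a missing variable or a wrong schema path before the server starts. The sketch below is illustrative only: the variable names come from the export lines in this guide, while `check_environment` and `REQUIRED_VARS` are hypothetical helper names, not part of ClickGraph.

```python
import os
from pathlib import Path

# Variable names are taken from the export lines in this guide.
REQUIRED_VARS = (
    "CLICKHOUSE_URL",
    "CLICKHOUSE_USER",
    "CLICKHOUSE_PASSWORD",
    "CLICKHOUSE_DATABASE",
    "GRAPH_CONFIG_PATH",
)


def check_environment(env=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    # Report every variable that is missing or empty.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # The schema path must point to an existing file.
    schema_path = env.get("GRAPH_CONFIG_PATH")
    if schema_path and not Path(schema_path).is_file():
        problems.append(f"GRAPH_CONFIG_PATH does not point to a file: {schema_path}")
    return problems
```

Calling `check_environment()` with no arguments inspects the current process environment; passing a dict makes the check easy to exercise in isolation.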
💡 Multi-Schema Support (NEW in v0.6.1): You can now load multiple independent graph schemas from a single YAML file! See Schema Configuration below for details.
You should see output like:
ClickGraph v0.5.1 (fork of Brahmand)
Starting HTTP server on 0.0.0.0:8080
Starting Bolt server on 0.0.0.0:7687
Successfully bound Bolt listener to 0.0.0.0:7687
ClickGraph server is running
HTTP API: http://0.0.0.0:8080
Bolt Protocol: bolt://0.0.0.0:7687
Bolt server loop starting, listening for connections...
🎉 ClickGraph is now running!
⚠️ Port Conflict? If you see an Address already in use error when running cargo run:
Error: Address already in use (os error 10048)
Cause: The Docker Compose ClickGraph container is running and competing for port 8080.
Solution 1: Stop the containerized ClickGraph:
docker-compose stop clickgraph
Solution 2: Use a different port:
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688
# Then access at: http://localhost:8081
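To find out whether a port is already taken before launching, a quick TCP connection test is usually enough. This is a minimal sketch using only the Python standard library; `port_in_use` and `first_free_port` are hypothetical helper names, and the check only detects listeners that accept connections on the given host.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(candidates, host="127.0.0.1"):
    """Return the first candidate port with no listener, or None."""
    for port in candidates:
        if not port_in_use(port, host):
            return port
    return None
```

For example, `first_free_port([8080, 8081])` yields a value that could be passed to `--http-port`, and the same idea applies to the Bolt port with `[7687, 7688]`.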
⚠️ Normal Startup Warnings: You may see warnings like:
Warning: Failed to connect to ClickHouse, using empty schema
Error fetching remote schema: no rows returned by a query
These are expected warnings about ClickGraph's internal catalog system. They don't affect functionality; your queries will still work correctly.
# Simple test query
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{"query": "RETURN 1 as test, \"Hello ClickGraph!\" as message"}'
Expected response:
{
  "columns": ["test", "message"],
  "data": [
    {"test": 1, "message": "Hello ClickGraph!"}
  ],
  "stats": {
    "execution_time": "2ms"
  }
}
from neo4j import GraphDatabase
# Connect to ClickGraph via Bolt protocol
driver = GraphDatabase.driver("bolt://localhost:7687")
with driver.session() as session:
    result = session.run("RETURN 1 as test, 'Hello ClickGraph!' as message")
    for record in result:
        print(f"Test: {record['test']}, Message: {record['message']}")
driver.close()
-- Connect to ClickHouse and create sample data
CREATE TABLE users (
user_id UInt32,
name String,
age UInt8,
country String,
active UInt8 DEFAULT 1
) ENGINE = MergeTree()
ORDER BY user_id;
CREATE TABLE user_follows (
follower_id UInt32,
followed_id UInt32,
created_date Date
) ENGINE = MergeTree()
ORDER BY (follower_id, followed_id);
-- Insert sample data
INSERT INTO users VALUES
(1, 'Alice', 28, 'USA', 1),
(2, 'Bob', 34, 'Canada', 1),
(3, 'Charlie', 22, 'UK', 1),
(4, 'Diana', 31, 'Australia', 1);
INSERT INTO user_follows VALUES
(1, 2, '2023-01-15'),
(1, 3, '2023-01-20'),
(2, 3, '2023-01-25'),
(3, 4, '2023-02-01'),
(2, 4, '2023-02-05');
Create social_network.yaml:
name: social_network
version: "1.0"
description: "Social network analysis"
graph_schema:
  nodes:
    - label: User
      database: brahmand
      table: users
      node_id: user_id
      property_mappings:
        name: name
        age: age
        country: country
      filters:
        - "active = 1"
  relationships:
    - type: FOLLOWS
      database: brahmand
      table: user_follows
      from_node: User
      to_node: User
      from_id: follower_id
      to_id: followed_id
      property_mappings:
        since: created_date
# Set schema path
export GRAPH_CONFIG_PATH="./social_network.yaml"
# Restart ClickGraph to load the schema
cargo run --bin clickgraph
# Find Alice's friends (in a new terminal)
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"query": "MATCH (alice:User {name: \"Alice\"})-[:FOLLOWS]->(friend:User) RETURN friend.name, friend.age"
}'
# Find mutual connections
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"query": "MATCH (a:User)-[:FOLLOWS]->(mutual:User)<-[:FOLLOWS]-(b:User) WHERE a.name = \"Alice\" AND b.name = \"Bob\" RETURN mutual.name"
}'
# Count followers by country
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"query": "MATCH (u:User)-[:FOLLOWS]->(f:User) RETURN f.country, count(u) as follower_count ORDER BY follower_count DESC"
}'
Load one graph schema per deployment:
export GRAPH_CONFIG_PATH="./schemas/social_network.yaml"
cargo run --bin clickgraph
Load multiple independent graph schemas from a single YAML file, enabling schema isolation and flexible querying:
Example Multi-Schema File (schemas/multi.yaml):
default_schema: social_network
schemas:
  - name: social_network
    graph_schema:
      nodes:
        - label: User
          database: social_db
          table: users
          node_id: user_id
          property_mappings:
            user_id: user_id
            name: name
      edges:
        - type: FOLLOWS
          database: social_db
          table: follows
          from_id: follower_id
          to_id: followed_id
          from_node: User
          to_node: User
  - name: security_logs
    graph_schema:
      nodes:
        - label: IP
          database: security
          table: connections
          node_id: ip_address
          property_mappings:
            ip: ip_address
      edges:
        - type: CONNECTED_TO
          database: security
          table: connections
          from_id: source_ip
          to_id: dest_ip
          from_node: IP
          to_node: IP
Load Multi-Schema File:
export GRAPH_CONFIG_PATH="./schemas/multi.yaml"
cargo run --bin clickgraph
# Verify schemas loaded
curl -s http://localhost:8080/schemas | jq
Query Different Schemas:
-- Query social_network schema
USE social_network
MATCH (u:User)-[:FOLLOWS]->(f:User)
RETURN u.name, f.name
-- Switch to security_logs schema
USE security_logs
MATCH (ip1:IP)-[:CONNECTED_TO]->(ip2:IP)
RETURN ip1.ip, ip2.ip
-- Use default schema (no USE clause needed)
MATCH (u:User) RETURN count(u)
Benefits:
- ✅ Schema Isolation: Each schema maintains independent definitions
- ✅ Flexible Switching: Use USE <schema_name> to switch between schemas
- ✅ Simplified Management: One file for all test/dev environments
- ✅ Backward Compatible: Single-schema YAML files still work
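Schema switching can also be scripted. The sketch below prefixes a query with a USE clause and builds a POST request for the /query endpoint shown earlier in this guide. It assumes the USE prefix is accepted inside the HTTP query string the same way it is in the examples above; `with_schema`, `build_query_request`, and `run_query` are hypothetical helper names.

```python
import json
import urllib.request


def with_schema(schema_name, cypher):
    """Prefix a Cypher query with a USE clause for the given schema."""
    return f"USE {schema_name}\n{cypher}"


def build_query_request(cypher, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for the /query endpoint."""
    body = json.dumps({"query": cypher}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(cypher, base_url="http://localhost:8080", timeout=10):
    """Send the query and return the decoded JSON response."""
    request = build_query_request(cypher, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a running server, `run_query(with_schema("security_logs", "MATCH (ip1:IP)-[:CONNECTED_TO]->(ip2:IP) RETURN ip1.ip, ip2.ip"))` would target the second schema from the example file.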
API Endpoint:
# List all loaded schemas
curl -s http://localhost:8080/schemas | jq '.schemas[] | "\(.name): \(.node_count) nodes, \(.relationship_count) edges"'
# Example output:
# social_network: 2 nodes, 1 edge
# security_logs: 1 node, 1 edge
# default: 2 nodes, 1 edge (alias for default_schema)
See Schema Reference for complete multi-schema format details.
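The same summary can be produced without jq. This sketch assumes the /schemas response has the shape implied by the jq filter above (a top-level `schemas` array whose entries carry `name`, `node_count`, and `relationship_count`); `summarize_schemas` and `fetch_schemas` are hypothetical helper names.

```python
import json
import urllib.request


def summarize_schemas(payload):
    """Turn a /schemas response into 'name: N nodes, M edges' lines.

    The wording mirrors the jq filter literally, so counts of 1 are
    not singularized.
    """
    lines = []
    for schema in payload.get("schemas", []):
        lines.append(
            f"{schema['name']}: {schema['node_count']} nodes, "
            f"{schema['relationship_count']} edges"
        )
    return lines


def fetch_schemas(base_url="http://localhost:8080", timeout=10):
    """Fetch and decode the list of loaded schemas from a running server."""
    with urllib.request.urlopen(f"{base_url}/schemas", timeout=timeout) as response:
        return json.load(response)
```

With the server running, `print("\n".join(summarize_schemas(fetch_schemas())))` lists every loaded schema.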
# Custom ports
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688
# Disable Bolt protocol (HTTP only)
cargo run --bin clickgraph -- --disable-bolt
# Custom host binding
cargo run --bin clickgraph -- --http-host 127.0.0.1
# Show all options
cargo run --bin clickgraph -- --help
# Server settings
export CLICKGRAPH_HOST="127.0.0.1"
export CLICKGRAPH_PORT="8081"
export CLICKGRAPH_BOLT_HOST="127.0.0.1"
export CLICKGRAPH_BOLT_PORT="7688"
export CLICKGRAPH_BOLT_ENABLED="false"
# ClickHouse connection
export CLICKHOUSE_URL="http://your-clickhouse:8123"
export CLICKHOUSE_USER="your_user"
export CLICKHOUSE_PASSWORD="your_password"
export CLICKHOUSE_DATABASE="your_database"
# Graph schema (required for graph queries)
export GRAPH_CONFIG_PATH="/path/to/your/schema.yaml" # Single or multi-schema file
- Open Neo4j Browser
- Connect to bolt://localhost:7687
- Run Cypher queries directly in the browser interface
# Install Neo4j Cypher Shell
# Connect to ClickGraph
cypher-shell -a bolt://localhost:7687
# Run queries interactively
neo4j> MATCH (n:User) RETURN n.name, n.age;
Python
pip install neo4j
JavaScript/Node.js
npm install neo4j-driver
Java
<dependency>
    <groupId>org.neo4j.driver</groupId>
    <artifactId>neo4j-java-driver</artifactId>
    <version>5.x.x</version>
</dependency>
- Use LIMIT clauses to avoid large result sets
- Create indexes on frequently queried columns in ClickHouse
- Use parameterized queries for better performance
- Leverage ClickHouse's columnar storage advantages
- Denormalize data for better graph query performance
- Create materialized views for complex relationships
- Use appropriate ClickHouse table engines (MergeTree, etc.)
- Consider partitioning large tables by date or category
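As an illustration of the first and third tips, the sketch below combines a query parameter with a bounded result set. It assumes a session object with a neo4j-driver-style `run(query, **parameters)` method and assumes ClickGraph accepts `$country` as a parameter placeholder; `users_in_country` is a hypothetical helper name, and the User label and its properties come from the sample schema in this guide.

```python
def users_in_country(session, country, limit=10):
    """Run a parameterized, LIMIT-bounded query and return (name, age) pairs."""
    # The limit is validated and inlined; the country value travels as a parameter.
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    query = (
        "MATCH (u:User) WHERE u.country = $country "
        "RETURN u.name AS name, u.age AS age "
        f"LIMIT {limit}"
    )
    result = session.run(query, country=country)
    return [(record["name"], record["age"]) for record in result]
```

A session opened with `GraphDatabase.driver("bolt://localhost:7687").session()`, as in the Bolt example earlier, can be passed in directly. Keeping the value in a parameter rather than in the query string also avoids quoting problems.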
Connection refused errors:
# Check if ClickGraph is running
curl http://localhost:8080/query
# Check ClickHouse connectivity
curl http://localhost:8123/ping
ClickHouse authentication errors:
# Test ClickHouse connection
curl "http://localhost:8123/?user=test_user&password=test_pass" -d "SELECT 1"
Port conflicts:
# Use different ports
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688
# Enable debug logging
RUST_LOG=debug cargo run --bin clickgraph
Issue: Seeing warnings about "Failed to connect to ClickHouse, using empty schema"
Warning: Failed to connect to ClickHouse, using empty schema
Error fetching remote schema: no rows returned by a query
Status:
Impact: None - core functionality works perfectly.
Action: Continue normally - no fix needed.
Issue: 401 Unauthorized or 403 Forbidden errors
Cause: Incorrect ClickHouse credentials
Solution:
# Use docker-compose credentials
export CLICKHOUSE_USER="test_user"
export CLICKHOUSE_PASSWORD="test_pass"
# Or check your ClickHouse config
Issue: Unable to connect to the remote server
Cause: ClickGraph server not fully initialized
Solution: Wait 5-10 seconds after seeing "ClickGraph server is running"
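Rather than sleeping for a fixed interval, a script can poll until the server answers. This is a minimal sketch using only the standard library; `http_probe` and `wait_until_ready` are hypothetical helper names, and any HTTP response (even an error status) is treated as proof that the server is up.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2):
    """Return True if the server answers at all, even with an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, so it is up
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure, etc.


def wait_until_ready(probe, attempts=10, delay=1.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final failed attempt
    return False
```

For example, `wait_until_ready(lambda: http_probe("http://localhost:8080/query"))` waits up to roughly ten seconds, which matches the 5-10 second window mentioned above.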
Issue: filesystem error: in rename: Permission denied
Cause: Docker volume permissions with MergeTree engine tables
Solutions:
- Use Memory engine for development: ENGINE = Memory
- Fix Docker permissions: sudo chown -R 101:101 ./clickhouse_data
- Recreate Docker volume: docker volume rm clickgraph_clickhouse_data
Issue: Data disappears after restart
Cause: Memory engine tables are not persistent
Solution: Use MergeTree engine for production:
CREATE TABLE users (...) ENGINE = MergeTree() ORDER BY id;
Issue: Slow query responses
Solutions:
- Add ClickHouse indexes on frequently queried columns
- Use appropriate ORDER BY clauses in table definitions
- Enable ClickGraph query optimization features
Issue: Address already in use
Solution: Use different ports
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688
- Read the Documentation: Check out the Features Guide and API Documentation
- Configure Graph Views: Create YAML configurations for your specific data model
- Integrate with Applications: Use HTTP API or Neo4j drivers in your applications
- Optimize Performance: Tune ClickHouse settings and create appropriate indexes
- Join the Community: Contribute to the project and share your use cases
- Documentation: Check the docs/ folder for comprehensive guides
- Issues: Report bugs and feature requests on GitHub
- Examples: See the examples/ folder for more complex configurations
- Community: Join discussions and share your experiences
Happy graph analyzing! 🎉