ClickGraph - Getting Started Guide

This guide will help you get ClickGraph up and running quickly for your first graph analysis on ClickHouse data.

Prerequisites

ClickHouse: Version 21.3+ running locally or accessible via network
Docker: For running ClickHouse and ClickGraph (recommended)
Rust: Version 1.85+ (only if building from source)

Quick Setup (2 minutes)

Option 1: Docker Compose (Recommended - Fastest)

Pull pre-built images and start everything:

# Download docker-compose.yaml
curl -o docker-compose.yaml https://raw.githubusercontent.com/genezhang/clickgraph/main/docker-compose.yaml

# Start all services (ClickHouse + ClickGraph)
docker-compose up -d

# Check logs
docker-compose logs -f clickgraph

You should see:

ClickGraph v0.5.1 (fork of Brahmand)

Starting HTTP server on 0.0.0.0:8080
Starting Bolt server on 0.0.0.0:7687
ClickGraph server is running
  HTTP API: http://0.0.0.0:8080
  Bolt Protocol: bolt://0.0.0.0:7687

🎉 ClickGraph is now running!

What just happened?

✅ ClickHouse downloaded and started (ports 8123/9000)
✅ ClickGraph image pulled from Docker Hub (genezhang/clickgraph:latest)
✅ Pre-configured schema loaded (test_integration.yaml)
✅ HTTP API ready at http://localhost:8080
✅ Bolt protocol ready at bolt://localhost:7687

Skip to First Graph Query to start querying!

Option 2: Build from Source (For Contributors)

Prerequisites: Rust toolchain 1.85+ and Docker for ClickHouse

# 1. Clone repository
git clone https://github.com/genezhang/clickgraph
cd clickgraph

# 2. Start ONLY ClickHouse (not the clickgraph service)
docker-compose up -d clickhouse-service

# Note: If you accidentally started all services with 'docker-compose up -d',
# stop the clickgraph container first:
# docker-compose stop clickgraph

💡 Why clickhouse-service only? This starts just ClickHouse, allowing you to run ClickGraph from source with cargo run. If you run docker-compose up -d (all services), both the containerized ClickGraph and your local cargo run will try to bind to port 8080, causing a conflict.

Build and Run:

# Build ClickGraph
cargo build --release

# Set required environment variables
export CLICKHOUSE_URL="http://localhost:8123"
export CLICKHOUSE_USER="test_user"
export CLICKHOUSE_PASSWORD="test_pass"
export CLICKHOUSE_DATABASE="brahmand"
export GRAPH_CONFIG_PATH="./benchmarks/social_network/schemas/social_benchmark.yaml"

# Start ClickGraph
cargo run --bin clickgraph

# Or use custom ports if 8080 is already in use:
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688

⚠️ Required: GRAPH_CONFIG_PATH must be set to a valid schema YAML file. Without it, ClickGraph won't know how to map your ClickHouse tables to graph nodes and edges.

💡 Multi-Schema Support (NEW in v0.6.1): You can now load multiple independent graph schemas from a single YAML file! See Schema Configuration below for details.

You should see output like:

ClickGraph v0.5.1 (fork of Brahmand)

Starting HTTP server on 0.0.0.0:8080
Starting Bolt server on 0.0.0.0:7687
Successfully bound Bolt listener to 0.0.0.0:7687
ClickGraph server is running
  HTTP API: http://0.0.0.0:8080
  Bolt Protocol: bolt://0.0.0.0:7687
Bolt server loop starting, listening for connections...

🎉 ClickGraph is now running!

Troubleshooting (Build from Source only)

⚠️ Port Conflict? If you see an Address already in use error when running cargo run:
Error: Address already in use (os error 10048)
Cause: The Docker Compose ClickGraph container is already running and holding port 8080. (The OS error code varies by platform; 10048 is the Windows code.)

Solution 1: Stop the containerized ClickGraph:
docker-compose stop clickgraph
Solution 2: Use a different port:
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688
# Then access at: http://localhost:8081
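
Before choosing alternative ports, it can help to confirm which ports are actually free. The following is a minimal sketch using only the Python standard library; `port_is_free` and `first_free_port` are hypothetical helper names and not part of ClickGraph. It checks a port by attempting to bind it, which is the same operation that fails with Address already in use.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if this process can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            # Typically "Address already in use": another process holds the port.
            return False


def first_free_port(candidates, host="0.0.0.0"):
    """Return the first port in candidates that can be bound, else None."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    return None
```

For example, `first_free_port([8080, 8081, 8082])` returns a value you could pass to `--http-port`. The check is only a snapshot: another process may take the port between the check and the server start.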

⚠️ Normal Startup Warnings: You may see warnings like:
Warning: Failed to connect to ClickHouse, using empty schema
Error fetching remote schema: no rows returned by a query
These are expected warnings about ClickGraph's internal catalog system. They don't affect functionality - your queries will work correctly!

First Graph Query

Test with HTTP API

# Simple test query
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{"query": "RETURN 1 as test, \"Hello ClickGraph!\" as message"}'

Expected response:

{
  "columns": ["test", "message"],
  "data": [
    {"test": 1, "message": "Hello ClickGraph!"}
  ],
  "stats": {
    "execution_time": "2ms"
  }
}
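
The same request can be sent from Python without extra dependencies. This is a minimal sketch, assuming the `/query` endpoint and the JSON body shown in the curl example; the helper names `build_query_request` and `run_query` are illustrative and not part of ClickGraph.

```python
import json
import urllib.request


def build_query_request(cypher, base_url="http://localhost:8080"):
    """Build a POST request for the /query endpoint (hypothetical helper)."""
    body = json.dumps({"query": cypher}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(cypher, base_url="http://localhost:8080", timeout=10):
    """Send the query and return the decoded JSON response as a dict."""
    request = build_query_request(cypher, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `run_query('RETURN 1 as test')` should return a dict with the `columns` and `data` keys shown in the expected response. Using `json.dumps` for the body avoids the manual quote escaping needed in the curl command.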

Test with Neo4j Driver

from neo4j import GraphDatabase

# Connect to ClickGraph via Bolt protocol
driver = GraphDatabase.driver("bolt://localhost:7687")

with driver.session() as session:
    result = session.run("RETURN 1 as test, 'Hello ClickGraph!' as message")
    for record in result:
        print(f"Test: {record['test']}, Message: {record['message']}")

driver.close()

Working with Real Data

Example: Social Network Analysis

1. Create Sample Tables in ClickHouse

-- Connect to ClickHouse and create sample data
CREATE TABLE users (
    user_id UInt32,
    name String,
    age UInt8,
    country String,
    active UInt8 DEFAULT 1
) ENGINE = MergeTree()
ORDER BY user_id;

CREATE TABLE user_follows (
    follower_id UInt32,
    followed_id UInt32,
    created_date Date
) ENGINE = MergeTree()
ORDER BY (follower_id, followed_id);

-- Insert sample data
INSERT INTO users VALUES 
    (1, 'Alice', 28, 'USA', 1),
    (2, 'Bob', 34, 'Canada', 1),
    (3, 'Charlie', 22, 'UK', 1),
    (4, 'Diana', 31, 'Australia', 1);

INSERT INTO user_follows VALUES
    (1, 2, '2023-01-15'),
    (1, 3, '2023-01-20'),
    (2, 3, '2023-01-25'),
    (3, 4, '2023-02-01'),
    (2, 4, '2023-02-05');

2. Configure Graph View

Create social_network.yaml:

name: social_network
version: "1.0"
description: "Social network analysis"

graph_schema:
  nodes:
    - label: User
      database: brahmand
      table: users
      node_id: user_id
      property_mappings:
        name: name
        age: age
        country: country
      filters:
        - "active = 1"
          
  relationships:
    - type: FOLLOWS
      database: brahmand
      table: user_follows
      from_node: User
      to_node: User
      from_id: follower_id
      to_id: followed_id
      property_mappings:
        since: created_date
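
A quick consistency check can catch typos in a schema before the server is restarted. The sketch below is an assumption-laden illustration: it mirrors the YAML above as a Python dict (so no YAML library is needed) and uses a hypothetical helper, `check_schema`, that is not a ClickGraph feature. It only verifies that each relationship's from_node and to_node name a node label defined in the same file.

```python
# The schema from social_network.yaml, written as a Python dict so the
# example needs no YAML library. In practice you would load the file with
# a YAML parser of your choice.
schema = {
    "name": "social_network",
    "graph_schema": {
        "nodes": [
            {"label": "User", "database": "brahmand", "table": "users",
             "node_id": "user_id"},
        ],
        "relationships": [
            {"type": "FOLLOWS", "database": "brahmand", "table": "user_follows",
             "from_node": "User", "to_node": "User",
             "from_id": "follower_id", "to_id": "followed_id"},
        ],
    },
}


def check_schema(schema):
    """Return a list of problems found; an empty list means none detected.

    Hypothetical helper: it only checks that every relationship endpoint
    names a node label defined in the same schema.
    """
    graph = schema.get("graph_schema", {})
    labels = {node["label"] for node in graph.get("nodes", [])}
    problems = []
    for rel in graph.get("relationships", []):
        for key in ("from_node", "to_node"):
            if rel.get(key) not in labels:
                problems.append(
                    f"{rel.get('type')}: {key} '{rel.get(key)}' is not a defined node label"
                )
    return problems


print(check_schema(schema))  # prints [] for the schema above
```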

3. Load Schema and Run Graph Queries

# Set schema path
export GRAPH_CONFIG_PATH="./social_network.yaml"

# Restart ClickGraph to load the schema
cargo run --bin clickgraph

# Find Alice's friends (in a new terminal)
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "MATCH (alice:User {name: \"Alice\"})-[:FOLLOWS]->(friend:User) RETURN friend.name, friend.age"
  }'

# Find mutual connections
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "MATCH (a:User)-[:FOLLOWS]->(mutual:User)<-[:FOLLOWS]-(b:User) WHERE a.name = \"Alice\" AND b.name = \"Bob\" RETURN mutual.name"
  }'

# Count followers by country
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "MATCH (u:User)-[:FOLLOWS]->(f:User) RETURN f.country, count(u) as follower_count ORDER BY follower_count DESC"
  }'
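
To verify the responses, it helps to know what the sample data implies for each query. The sketch below recomputes the three results in plain Python from the rows inserted in step 1. It is not ClickGraph output: the row order and column names returned by the server may differ, and the variable names are illustrative.

```python
from collections import Counter

# Sample rows copied from the INSERT statements in step 1 (all users active).
users = {
    1: {"name": "Alice", "age": 28, "country": "USA"},
    2: {"name": "Bob", "age": 34, "country": "Canada"},
    3: {"name": "Charlie", "age": 22, "country": "UK"},
    4: {"name": "Diana", "age": 31, "country": "Australia"},
}
follows = [(1, 2), (1, 3), (2, 3), (3, 4), (2, 4)]  # (follower_id, followed_id)


def followed_by(name):
    """IDs of the users that the named user follows."""
    return {dst for src, dst in follows if users[src]["name"] == name}


# Query 1: name and age of everyone Alice follows.
alice_follows = sorted((users[i]["name"], users[i]["age"]) for i in followed_by("Alice"))

# Query 2: users followed by both Alice and Bob.
mutual = sorted(users[i]["name"] for i in followed_by("Alice") & followed_by("Bob"))

# Query 3: number of FOLLOWS edges, grouped by the followed user's country.
followers_by_country = Counter(users[dst]["country"] for _, dst in follows)

print(alice_follows)               # [('Bob', 34), ('Charlie', 22)]
print(mutual)                      # ['Charlie']
print(dict(followers_by_country))  # {'Canada': 1, 'UK': 2, 'Australia': 2}
```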

Schema Configuration

Single Schema (Traditional)

Load one graph schema per deployment:

export GRAPH_CONFIG_PATH="./schemas/social_network.yaml"
cargo run --bin clickgraph

Multi-Schema Configuration (NEW in v0.6.1)

Load multiple independent graph schemas from a single YAML file, enabling schema isolation and flexible querying:

Example Multi-Schema File (schemas/multi.yaml):

default_schema: social_network
schemas:
  - name: social_network
    graph_schema:
      nodes:
        - label: User
          database: social_db
          table: users
          node_id: user_id
          property_mappings:
            user_id: user_id
            name: name
      edges:
        - type: FOLLOWS
          database: social_db
          table: follows
          from_id: follower_id
          to_id: followed_id
          from_node: User
          to_node: User

  - name: security_logs
    graph_schema:
      nodes:
        - label: IP
          database: security
          table: connections
          node_id: ip_address
          property_mappings:
            ip: ip_address
      edges:
        - type: CONNECTED_TO
          database: security
          table: connections
          from_id: source_ip
          to_id: dest_ip
          from_node: IP
          to_node: IP

Load Multi-Schema File:

export GRAPH_CONFIG_PATH="./schemas/multi.yaml"
cargo run --bin clickgraph

# Verify schemas loaded
curl -s http://localhost:8080/schemas | jq

Query Different Schemas:

-- Query social_network schema
USE social_network
MATCH (u:User)-[:FOLLOWS]->(f:User)
RETURN u.name, f.name

-- Switch to security_logs schema
USE security_logs
MATCH (ip1:IP)-[:CONNECTED_TO]->(ip2:IP)
RETURN ip1.ip, ip2.ip

-- Use default schema (no USE clause needed)
MATCH (u:User) RETURN count(u)

Benefits:

✅ Schema Isolation: Each schema maintains independent definitions
✅ Flexible Switching: Use USE <schema_name> to switch between schemas
✅ Simplified Management: One file for all test/dev environments
✅ Backward Compatible: Single-schema YAML files still work

API Endpoint:

# List all loaded schemas
curl -s http://localhost:8080/schemas | jq '.schemas[] | "\(.name): \(.node_count) nodes, \(.relationship_count) edges"'

# Example output:
# social_network: 2 nodes, 1 edge
# security_logs: 1 node, 1 edge
# default: 2 nodes, 1 edge  (alias for default_schema)
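
If jq is not available, the same summary can be produced in Python. This is a sketch under one assumption: the response shape is inferred only from the fields the jq filter reads (`schemas`, `name`, `node_count`, `relationship_count`), and the sample payload is made up for illustration. `summarize_schemas` is a hypothetical helper.

```python
import json

# Sample payload shaped after the fields the jq filter reads. The real
# /schemas response may contain additional fields.
sample = json.loads("""
{"schemas": [
  {"name": "social_network", "node_count": 2, "relationship_count": 1},
  {"name": "security_logs", "node_count": 1, "relationship_count": 1}
]}
""")


def summarize_schemas(payload):
    """Format one summary line per schema, mirroring the jq filter.

    Like the filter, this does not adjust plurals ("1 nodes", "1 edges").
    """
    return [
        f"{s['name']}: {s['node_count']} nodes, {s['relationship_count']} edges"
        for s in payload.get("schemas", [])
    ]


for line in summarize_schemas(sample):
    print(line)
```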

See Schema Reference for complete multi-schema format details.

Configuration Options

Command-Line Configuration

# Custom ports
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688

# Disable Bolt protocol (HTTP only)
cargo run --bin clickgraph -- --disable-bolt

# Custom host binding
cargo run --bin clickgraph -- --http-host 127.0.0.1

# Show all options
cargo run --bin clickgraph -- --help

Environment Variable Configuration

# Server settings
export CLICKGRAPH_HOST="127.0.0.1"
export CLICKGRAPH_PORT="8081"  
export CLICKGRAPH_BOLT_HOST="127.0.0.1"
export CLICKGRAPH_BOLT_PORT="7688"
export CLICKGRAPH_BOLT_ENABLED="false"

# ClickHouse connection
export CLICKHOUSE_URL="http://your-clickhouse:8123"
export CLICKHOUSE_USER="your_user"
export CLICKHOUSE_PASSWORD="your_password"
export CLICKHOUSE_DATABASE="your_database"

# Graph schema (required for graph queries)
export GRAPH_CONFIG_PATH="/path/to/your/schema.yaml"  # Single or multi-schema file
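
A small pre-flight check can confirm that the connection and schema variables are set before launching the server. The sketch below is illustrative: the list of names comes from the variables this guide exports, treating all five as required is an assumption, and `missing_settings` and `schema_file_exists` are hypothetical helpers.

```python
import os

# Variables this guide exports before starting ClickGraph from source.
REQUIRED_VARS = (
    "CLICKHOUSE_URL",
    "CLICKHOUSE_USER",
    "CLICKHOUSE_PASSWORD",
    "CLICKHOUSE_DATABASE",
    "GRAPH_CONFIG_PATH",
)


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def schema_file_exists(env=None):
    """Return True if GRAPH_CONFIG_PATH points at an existing file."""
    env = os.environ if env is None else env
    path = env.get("GRAPH_CONFIG_PATH", "")
    return bool(path) and os.path.isfile(path)


print("Missing:", missing_settings())
```

Both functions accept a plain dict in place of the process environment, which makes them easy to exercise in tests.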

Neo4j Tool Integration

Neo4j Browser

Open Neo4j Browser
Connect to bolt://localhost:7687
Run Cypher queries directly in the browser interface

Cypher Shell

# Install Neo4j Cypher Shell first (see Neo4j's installation instructions),
# then connect to ClickGraph
cypher-shell -a bolt://localhost:7687

# Run queries interactively
neo4j> MATCH (n:User) RETURN n.name, n.age;

Programming Language Drivers

Python

pip install neo4j

JavaScript/Node.js

npm install neo4j-driver

Java

<dependency>
    <groupId>org.neo4j.driver</groupId>
    <artifactId>neo4j-java-driver</artifactId>
    <version>5.x.x</version>
</dependency>

Performance Tips

Query Optimization

Use LIMIT clauses to avoid large result sets
Choose ClickHouse ORDER BY (sorting key) columns that match frequently filtered columns, and add data-skipping indexes where they help
Use parameterized queries for better performance
Leverage ClickHouse's columnar storage advantages

Data Modeling

Denormalize data for better graph query performance
Create materialized views for complex relationships
Use appropriate ClickHouse table engines (MergeTree, etc.)
Consider partitioning large tables by date or category

Troubleshooting

Common Issues

Connection refused errors:

# Check if ClickGraph is running (any HTTP response, even an error status, means the server is up)
curl http://localhost:8080/query

# Check ClickHouse connectivity
curl http://localhost:8123/ping

ClickHouse authentication errors:

# Test ClickHouse connection
curl "http://localhost:8123/?user=test_user&password=test_pass" -d "SELECT 1"

Port conflicts:

# Use different ports
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688

Debug Mode

# Enable debug logging
RUST_LOG=debug cargo run --bin clickgraph

Troubleshooting Common Issues

Schema Warnings (Normal)

Issue: Seeing warnings about "Failed to connect to ClickHouse, using empty schema"

Warning: Failed to connect to ClickHouse, using empty schema
Error fetching remote schema: no rows returned by a query

Status: ⚠️ Expected behavior - these are cosmetic warnings about ClickGraph's internal catalog.
Impact: None - core functionality works perfectly.
Action: Continue normally - no fix needed.

Authentication Problems

Issue: 401 Unauthorized or 403 Forbidden errors
Cause: Incorrect ClickHouse credentials
Solution:

# Use docker-compose credentials
export CLICKHOUSE_USER="test_user"
export CLICKHOUSE_PASSWORD="test_pass"

# Or check your ClickHouse config

Connection Issues

Issue: Unable to connect to the remote server
Cause: ClickGraph server not fully initialized
Solution: Wait 5-10 seconds after seeing "ClickGraph server is running"
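
Instead of waiting a fixed time, a script can poll until the port accepts connections. This is a minimal sketch using the standard library; `port_is_open` and `wait_for_port` are hypothetical helpers. Note the assumption: an accepted TCP connection only shows that the listener is up, not that every internal component has finished initializing.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, deadline_seconds=15.0, interval=0.5):
    """Poll until host:port accepts connections or the deadline passes."""
    end = time.monotonic() + deadline_seconds
    while time.monotonic() < end:
        if port_is_open(host, port):
            return True
        time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8080)` checks the HTTP API and `wait_for_port("localhost", 7687)` checks the Bolt listener.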

File Permission Errors

Issue: filesystem error: in rename: Permission denied
Cause: Docker volume permissions with MergeTree engine tables
Solutions:

Use Memory engine for development: ENGINE = Memory
Fix Docker permissions: sudo chown -R 101:101 ./clickhouse_data
Recreate Docker volume: docker volume rm clickgraph_clickhouse_data

Memory Engine Data Loss

Issue: Data disappears after restart
Cause: Memory engine tables are not persistent
Solution: Use MergeTree engine for production:

CREATE TABLE users (...) ENGINE = MergeTree() ORDER BY id;

Performance Issues

Issue: Slow query responses
Solutions:

Add ClickHouse data-skipping indexes on frequently queried columns
Use appropriate ORDER BY clauses in table definitions
Enable ClickGraph query optimization features

Port Conflicts

Issue: Address already in use
Solution: Use different ports

cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688

Next Steps

Read the Documentation: Check out the Features Guide and API Documentation
Configure Graph Views: Create YAML configurations for your specific data model
Integrate with Applications: Use HTTP API or Neo4j drivers in your applications
Optimize Performance: Tune ClickHouse settings and create appropriate indexes
Join the Community: Contribute to the project and share your use cases

Need Help?

Documentation: Check the docs/ folder for comprehensive guides
Issues: Report bugs and feature requests on GitHub
Examples: See examples/ folder for more complex configurations
Community: Join discussions and share your experiences

Happy graph analyzing! 🎉
