ClickGraph - Getting Started Guide

This guide will help you get ClickGraph up and running quickly for your first graph analysis on ClickHouse data.

Prerequisites

  • ClickHouse: Version 21.3+ running locally or accessible via network
  • Docker: For running ClickHouse and ClickGraph (recommended)
  • Rust: Version 1.85+ (only if building from source)

Quick Setup (2 minutes)

Option 1: Docker Compose (Recommended - Fastest)

Pull pre-built images and start everything:

# Download docker-compose.yaml
curl -o docker-compose.yaml https://raw.githubusercontent.com/genezhang/clickgraph/main/docker-compose.yaml

# Start all services (ClickHouse + ClickGraph)
docker-compose up -d

# Check logs
docker-compose logs -f clickgraph

You should see:

ClickGraph v0.5.1 (fork of Brahmand)

Starting HTTP server on 0.0.0.0:8080
Starting Bolt server on 0.0.0.0:7687
ClickGraph server is running
  HTTP API: http://0.0.0.0:8080
  Bolt Protocol: bolt://0.0.0.0:7687

🎉 ClickGraph is now running!

What just happened?

  • ✅ ClickHouse downloaded and started (ports 8123/9000)
  • ✅ ClickGraph image pulled from Docker Hub (genezhang/clickgraph:latest)
  • ✅ Pre-configured schema loaded (test_integration.yaml)
  • ✅ HTTP API ready at http://localhost:8080
  • ✅ Bolt protocol ready at bolt://localhost:7687

Skip to First Graph Query to start querying!


Option 2: Build from Source (For Contributors)

Prerequisites: Rust toolchain 1.85+ and Docker for ClickHouse

# 1. Clone repository
git clone https://github.com/genezhang/clickgraph
cd clickgraph

# 2. Start ONLY ClickHouse (not the clickgraph service)
docker-compose up -d clickhouse-service

# Note: If you accidentally started all services with 'docker-compose up -d',
# stop the clickgraph container first:
# docker-compose stop clickgraph

💡 Why clickhouse-service only? This starts just ClickHouse, allowing you to run ClickGraph from source with cargo run. If you run docker-compose up -d (all services), both the containerized ClickGraph and your local cargo run will try to bind to port 8080, causing a conflict.

Build and Run:

# Build ClickGraph
cargo build --release

# Set required environment variables
export CLICKHOUSE_URL="http://localhost:8123"
export CLICKHOUSE_USER="test_user"
export CLICKHOUSE_PASSWORD="test_pass"
export CLICKHOUSE_DATABASE="brahmand"
export GRAPH_CONFIG_PATH="./benchmarks/social_network/schemas/social_benchmark.yaml"

# Start ClickGraph
cargo run --bin clickgraph

# Or use custom ports if 8080 is already in use:
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688

⚠️ Required: GRAPH_CONFIG_PATH must be set to a valid schema YAML file. Without it, ClickGraph won't know how to map your ClickHouse tables to graph nodes and edges.

💡 Multi-Schema Support (NEW in v0.6.1): You can now load multiple independent graph schemas from a single YAML file! See Schema Configuration below for details.

You should see output like:

ClickGraph v0.5.1 (fork of Brahmand)

Starting HTTP server on 0.0.0.0:8080
Starting Bolt server on 0.0.0.0:7687
Successfully bound Bolt listener to 0.0.0.0:7687
ClickGraph server is running
  HTTP API: http://0.0.0.0:8080
  Bolt Protocol: bolt://0.0.0.0:7687
Bolt server loop starting, listening for connections...

🎉 ClickGraph is now running!


Troubleshooting (Build from Source only)

⚠️ Port Conflict? If you see an Address already in use error when running cargo run (the OS error code depends on the platform: 10048 on Windows, 98 on Linux):

Error: Address already in use (os error 10048)

Cause: The Docker Compose ClickGraph container is still running and already holds port 8080.

Solution 1: Stop the containerized ClickGraph:

docker-compose stop clickgraph

Solution 2: Use a different port:

cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688
# Then access at: http://localhost:8081
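
Before picking an alternative port, it can help to confirm which ports are actually taken. The following is a minimal sketch using only the Python standard library; `port_is_free` and `first_free_port` are hypothetical helper names for illustration and are not part of ClickGraph.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "Address already in use": something else holds the port.
            return False
    return True


def first_free_port(candidates, host="0.0.0.0"):
    """Return the first free port from candidates, or None if all are taken."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    return None
```

For example, `first_free_port([8080, 8081, 8082])` returns a value you could pass to `--http-port`. The check is only a snapshot: another process may still claim the port before ClickGraph starts.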

⚠️ Normal Startup Warnings: You may see warnings like:

Warning: Failed to connect to ClickHouse, using empty schema
Error fetching remote schema: no rows returned by a query

These warnings come from ClickGraph's internal catalog system and are expected. They do not affect functionality, and your queries will still work correctly.


First Graph Query

Test with HTTP API

# Simple test query
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{"query": "RETURN 1 as test, \"Hello ClickGraph!\" as message"}'

Expected response:

{
  "columns": ["test", "message"],
  "data": [
    {"test": 1, "message": "Hello ClickGraph!"}
  ],
  "stats": {
    "execution_time": "2ms"
  }
}
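
The same request can be sent from Python without extra dependencies. This is a minimal sketch that assumes the `/query` endpoint and the JSON response shape shown above; `build_query_request` and `run_query` are illustrative helper names, not part of ClickGraph.

```python
import json
import urllib.request


def build_query_request(cypher, base_url="http://localhost:8080"):
    """Build a POST request for the /query endpoint with a JSON body."""
    body = json.dumps({"query": cypher}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(cypher, base_url="http://localhost:8080", timeout=10):
    """Send the query and return the decoded JSON response as a dict."""
    request = build_query_request(cypher, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `run_query('RETURN 1 as test')["data"]` should hold the result rows, assuming the response format shown above.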

Test with Neo4j Driver

from neo4j import GraphDatabase

# Connect to ClickGraph via Bolt protocol
driver = GraphDatabase.driver("bolt://localhost:7687")

with driver.session() as session:
    result = session.run("RETURN 1 as test, 'Hello ClickGraph!' as message")
    for record in result:
        print(f"Test: {record['test']}, Message: {record['message']}")

driver.close()

Working with Real Data

Example: Social Network Analysis

1. Create Sample Tables in ClickHouse

-- Connect to ClickHouse and create sample data
CREATE TABLE users (
    user_id UInt32,
    name String,
    age UInt8,
    country String,
    active UInt8 DEFAULT 1
) ENGINE = MergeTree()
ORDER BY user_id;

CREATE TABLE user_follows (
    follower_id UInt32,
    followed_id UInt32,
    created_date Date
) ENGINE = MergeTree()
ORDER BY (follower_id, followed_id);

-- Insert sample data
INSERT INTO users VALUES 
    (1, 'Alice', 28, 'USA', 1),
    (2, 'Bob', 34, 'Canada', 1),
    (3, 'Charlie', 22, 'UK', 1),
    (4, 'Diana', 31, 'Australia', 1);

INSERT INTO user_follows VALUES
    (1, 2, '2023-01-15'),
    (1, 3, '2023-01-20'),
    (2, 3, '2023-01-25'),
    (3, 4, '2023-02-01'),
    (2, 4, '2023-02-05');

2. Configure Graph View

Create social_network.yaml:

name: social_network
version: "1.0"
description: "Social network analysis"

graph_schema:
  nodes:
    - label: User
      database: brahmand
      table: users
      node_id: user_id
      property_mappings:
        name: name
        age: age
        country: country
      filters:
        - "active = 1"
          
  relationships:
    - type: FOLLOWS
      database: brahmand
      table: user_follows
      from_node: User
      to_node: User
      from_id: follower_id
      to_id: followed_id
      property_mappings:
        since: created_date
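
A common mistake in a schema file is a relationship that points at a node label that was never defined. The sketch below mirrors the YAML above as a Python dict (so it needs no YAML library) and runs a basic consistency check. `find_schema_problems` is a hypothetical helper for illustration; it is not ClickGraph's own validation and only checks the fields shown here.

```python
# Python mirror of social_network.yaml, trimmed to the fields checked below.
SCHEMA = {
    "name": "social_network",
    "graph_schema": {
        "nodes": [
            {"label": "User", "database": "brahmand",
             "table": "users", "node_id": "user_id"},
        ],
        "relationships": [
            {"type": "FOLLOWS", "database": "brahmand", "table": "user_follows",
             "from_node": "User", "to_node": "User",
             "from_id": "follower_id", "to_id": "followed_id"},
        ],
    },
}


def find_schema_problems(schema):
    """Return a list of human-readable problems; an empty list means none found."""
    graph = schema.get("graph_schema", {})
    labels = {node["label"] for node in graph.get("nodes", [])}
    problems = []
    for rel in graph.get("relationships", []):
        # Both endpoints must refer to a node label defined under "nodes".
        for key in ("from_node", "to_node"):
            if rel.get(key) not in labels:
                problems.append(
                    f"{rel.get('type')}: {key} refers to unknown label {rel.get(key)!r}"
                )
        # The edge table and its two ID columns must be named.
        for key in ("table", "from_id", "to_id"):
            if not rel.get(key):
                problems.append(f"{rel.get('type')}: missing {key}")
    return problems
```

For the schema above, `find_schema_problems(SCHEMA)` returns an empty list.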

3. Load Schema and Run Graph Queries

# Set schema path
export GRAPH_CONFIG_PATH="./social_network.yaml"

# Restart ClickGraph to load the schema
cargo run --bin clickgraph

# Find the users Alice follows (run in a new terminal)
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "MATCH (alice:User {name: \"Alice\"})-[:FOLLOWS]->(friend:User) RETURN friend.name, friend.age"
  }'

# Find users followed by both Alice and Bob
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "MATCH (a:User)-[:FOLLOWS]->(mutual:User)<-[:FOLLOWS]-(b:User) WHERE a.name = \"Alice\" AND b.name = \"Bob\" RETURN mutual.name"
  }'

# Count followers, grouped by the followed user's country
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "MATCH (u:User)-[:FOLLOWS]->(f:User) RETURN f.country, count(u) as follower_count ORDER BY follower_count DESC"
  }'
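
To sanity-check the responses, the same three questions can be answered directly from the sample rows inserted in step 1. This is a plain-Python sketch over that sample data only (all four users are active, so the `active = 1` filter removes nothing). It does not talk to ClickGraph, and the function names are made up for illustration.

```python
from collections import Counter

# Sample rows from step 1.
USERS = {
    1: {"name": "Alice", "age": 28, "country": "USA"},
    2: {"name": "Bob", "age": 34, "country": "Canada"},
    3: {"name": "Charlie", "age": 22, "country": "UK"},
    4: {"name": "Diana", "age": 31, "country": "Australia"},
}
FOLLOWS = [(1, 2), (1, 3), (2, 3), (3, 4), (2, 4)]  # (follower_id, followed_id)


def followed_by(name):
    """IDs of the users that the named user follows."""
    source = next(uid for uid, user in USERS.items() if user["name"] == name)
    return {dst for src, dst in FOLLOWS if src == source}


def friends_of(name):
    """(name, age) of everyone the named user follows, sorted by name."""
    return sorted((USERS[uid]["name"], USERS[uid]["age"]) for uid in followed_by(name))


def mutual_follows(name_a, name_b):
    """Names of users followed by both name_a and name_b."""
    return sorted(USERS[uid]["name"] for uid in followed_by(name_a) & followed_by(name_b))


def follower_count_by_country():
    """Number of FOLLOWS edges per country of the followed user."""
    return Counter(USERS[dst]["country"] for _src, dst in FOLLOWS)
```

With the sample rows, `friends_of("Alice")` gives Bob (34) and Charlie (22), `mutual_follows("Alice", "Bob")` gives Charlie, and the per-country counts are Canada 1, UK 2, Australia 2.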

Schema Configuration

Single Schema (Traditional)

Load one graph schema per deployment:

export GRAPH_CONFIG_PATH="./schemas/social_network.yaml"
cargo run --bin clickgraph

Multi-Schema Configuration (NEW in v0.6.1)

Load multiple independent graph schemas from a single YAML file, enabling schema isolation and flexible querying:

Example Multi-Schema File (schemas/multi.yaml):

default_schema: social_network
schemas:
  - name: social_network
    graph_schema:
      nodes:
        - label: User
          database: social_db
          table: users
          node_id: user_id
          property_mappings:
            user_id: user_id
            name: name
      edges:
        - type: FOLLOWS
          database: social_db
          table: follows
          from_id: follower_id
          to_id: followed_id
          from_node: User
          to_node: User

  - name: security_logs
    graph_schema:
      nodes:
        - label: IP
          database: security
          table: connections
          node_id: ip_address
          property_mappings:
            ip: ip_address
      edges:
        - type: CONNECTED_TO
          database: security
          table: connections
          from_id: source_ip
          to_id: dest_ip
          from_node: IP
          to_node: IP

Load Multi-Schema File:

export GRAPH_CONFIG_PATH="./schemas/multi.yaml"
cargo run --bin clickgraph

# Verify schemas loaded
curl -s http://localhost:8080/schemas | jq

Query Different Schemas:

// Query social_network schema
USE social_network
MATCH (u:User)-[:FOLLOWS]->(f:User)
RETURN u.name, f.name

// Switch to security_logs schema
USE security_logs
MATCH (ip1:IP)-[:CONNECTED_TO]->(ip2:IP)
RETURN ip1.ip, ip2.ip

// Use default schema (no USE clause needed)
MATCH (u:User) RETURN count(u)
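
When queries are built in application code, the schema prefix can be added in one place. The sketch below assumes the USE clause is sent as part of the query text, as in the examples above. `with_schema` and `schema_query_payload` are hypothetical helpers, and the name check is a conservative assumption rather than ClickGraph's actual naming rule.

```python
import json


def with_schema(schema_name, cypher):
    """Prefix a Cypher query with a USE clause for the given schema."""
    # Assumption: schema names are letters, digits, and underscores only.
    if not schema_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected schema name: {schema_name!r}")
    return f"USE {schema_name}\n{cypher.strip()}"


def schema_query_payload(schema_name, cypher):
    """JSON body for the HTTP /query endpoint, targeting one schema."""
    return json.dumps({"query": with_schema(schema_name, cypher)})
```

Keeping the prefix in a helper means switching a query between `social_network` and `security_logs` changes one argument rather than the query string.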

Benefits:

  • Schema Isolation: Each schema maintains independent definitions
  • Flexible Switching: Use USE <schema_name> to switch between schemas
  • Simplified Management: One file for all test/dev environments
  • Backward Compatible: Single-schema YAML files still work

API Endpoint:

# List all loaded schemas (-r prints raw strings without JSON quotes)
curl -s http://localhost:8080/schemas | jq -r '.schemas[] | "\(.name): \(.node_count) nodes, \(.relationship_count) edges"'

# Example output:
# social_network: 2 nodes, 1 edges
# security_logs: 1 nodes, 1 edges
# default: 2 nodes, 1 edges  (alias for default_schema)

See Schema Reference for complete multi-schema format details.
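
If jq is not available, the same summary can be produced in Python. This sketch assumes the `/schemas` response has the shape implied by the jq filter above: a `schemas` list whose items carry `name`, `node_count`, and `relationship_count`. `summarize_schemas` is an illustrative name, not part of ClickGraph.

```python
def summarize_schemas(payload):
    """Turn a decoded /schemas response into one summary line per schema."""
    lines = []
    for schema in payload.get("schemas", []):
        lines.append(
            f"{schema['name']}: {schema['node_count']} nodes, "
            f"{schema['relationship_count']} edges"
        )
    return lines
```

Pass it the decoded JSON body, for example the result of `json.loads(...)` on the response to `GET /schemas`.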


Configuration Options

Command-Line Configuration

# Custom ports
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688

# Disable Bolt protocol (HTTP only)
cargo run --bin clickgraph -- --disable-bolt

# Custom host binding
cargo run --bin clickgraph -- --http-host 127.0.0.1

# Show all options
cargo run --bin clickgraph -- --help

Environment Variable Configuration

# Server settings
export CLICKGRAPH_HOST="127.0.0.1"
export CLICKGRAPH_PORT="8081"  
export CLICKGRAPH_BOLT_HOST="127.0.0.1"
export CLICKGRAPH_BOLT_PORT="7688"
export CLICKGRAPH_BOLT_ENABLED="false"

# ClickHouse connection
export CLICKHOUSE_URL="http://your-clickhouse:8123"
export CLICKHOUSE_USER="your_user"
export CLICKHOUSE_PASSWORD="your_password"
export CLICKHOUSE_DATABASE="your_database"

# Graph schema (required for graph queries)
export GRAPH_CONFIG_PATH="/path/to/your/schema.yaml"  # Single or multi-schema file
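
A launcher script can check these variables before starting the server. The sketch below is an assumption-laden illustration: it treats the five variables listed under "Set required environment variables" as mandatory and falls back to the defaults seen in the startup log (0.0.0.0, 8080, 7687). ClickGraph reads the variables itself; `load_settings` is a hypothetical pre-flight helper.

```python
REQUIRED = (
    "CLICKHOUSE_URL",
    "CLICKHOUSE_USER",
    "CLICKHOUSE_PASSWORD",
    "CLICKHOUSE_DATABASE",
    "GRAPH_CONFIG_PATH",
)


def load_settings(env):
    """Collect ClickGraph-related settings from a mapping such as os.environ."""
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return {
        # Defaults below are assumptions taken from the startup log output.
        "http_host": env.get("CLICKGRAPH_HOST", "0.0.0.0"),
        "http_port": int(env.get("CLICKGRAPH_PORT", "8080")),
        "bolt_host": env.get("CLICKGRAPH_BOLT_HOST", "0.0.0.0"),
        "bolt_port": int(env.get("CLICKGRAPH_BOLT_PORT", "7687")),
        "bolt_enabled": env.get("CLICKGRAPH_BOLT_ENABLED", "true").lower() == "true",
        "clickhouse_url": env["CLICKHOUSE_URL"],
        "clickhouse_database": env["CLICKHOUSE_DATABASE"],
        "graph_config_path": env["GRAPH_CONFIG_PATH"],
    }
```

Calling `load_settings(os.environ)` after `import os` fails early with a clear message if, for example, GRAPH_CONFIG_PATH was never exported.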

Neo4j Tool Integration

Neo4j Browser

  1. Open Neo4j Browser
  2. Connect to bolt://localhost:7687
  3. Run Cypher queries directly in the browser interface

Cypher Shell

# Install Neo4j Cypher Shell first, then connect to ClickGraph
cypher-shell -a bolt://localhost:7687

# Run queries interactively
neo4j> MATCH (n:User) RETURN n.name, n.age;

Programming Language Drivers

Python

pip install neo4j

JavaScript/Node.js

npm install neo4j-driver

Java

<dependency>
    <groupId>org.neo4j.driver</groupId>
    <artifactId>neo4j-java-driver</artifactId>
    <version>5.x.x</version>
</dependency>

Performance Tips

Query Optimization

  • Use LIMIT clauses to avoid large result sets
  • Choose ORDER BY keys (the primary index) and data-skipping indexes in ClickHouse that match frequently filtered columns
  • Use parameterized queries for better performance
  • Leverage ClickHouse's columnar storage advantages

Data Modeling

  • Denormalize data for better graph query performance
  • Create materialized views for complex relationships
  • Use appropriate ClickHouse table engines (MergeTree, etc.)
  • Consider partitioning large tables by date or category

Troubleshooting

Common Issues

Connection refused errors:

# Check if ClickGraph is running (any HTTP response, even an error status, means the server is up)
curl http://localhost:8080/query

# Check ClickHouse connectivity
curl http://localhost:8123/ping
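
The two curl checks can be scripted when curl is not at hand. This is a minimal standard-library sketch; `http_reachable` is a hypothetical helper that only answers "did anything respond over HTTP at this URL", which is enough to tell a refused connection from a running server.

```python
import http.client
import urllib.error
import urllib.request


def http_reachable(url, timeout=3.0):
    """Return True if a server answers HTTP at url, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with 2xx (e.g. GET on a POST-only route).
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, or a malformed reply.
        return False
```

For the setup in this guide you would call it with `http://localhost:8080/query` for ClickGraph and `http://localhost:8123/ping` for ClickHouse.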

ClickHouse authentication errors:

# Test ClickHouse connection
curl "http://localhost:8123/?user=test_user&password=test_pass" -d "SELECT 1"

Port conflicts:

# Use different ports
cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688

Debug Mode

# Enable debug logging
RUST_LOG=debug cargo run --bin clickgraph

Troubleshooting Common Issues

Schema Warnings (Normal)

Issue: Seeing warnings about "Failed to connect to ClickHouse, using empty schema"

Warning: Failed to connect to ClickHouse, using empty schema
Error fetching remote schema: no rows returned by a query

Status: ⚠️ Expected behavior - these are cosmetic warnings about ClickGraph's internal catalog.
Impact: None - core functionality works perfectly.
Action: Continue normally - no fix needed.

Authentication Problems

Issue: 401 Unauthorized or 403 Forbidden errors
Cause: Incorrect ClickHouse credentials
Solution:

# Use docker-compose credentials
export CLICKHOUSE_USER="test_user"
export CLICKHOUSE_PASSWORD="test_pass"

# Or check your ClickHouse config

Connection Issues

Issue: Unable to connect to the remote server
Cause: ClickGraph server not fully initialized
Solution: Wait 5-10 seconds after seeing "ClickGraph server is running"
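
Instead of waiting a fixed number of seconds, a script can poll until the port accepts connections. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and an accepted TCP connection is used as a rough proxy for "the server is ready".

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.5):
    """Poll host:port until a TCP connection succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # interval doubles as the per-attempt connect timeout.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 8080)` returns True as soon as the HTTP port is open, or False after roughly ten seconds.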

File Permission Errors

Issue: filesystem error: in rename: Permission denied
Cause: Docker volume permissions with MergeTree engine tables
Solutions:

  1. Use Memory engine for development: ENGINE = Memory
  2. Fix Docker permissions: sudo chown -R 101:101 ./clickhouse_data
  3. Recreate Docker volume: docker volume rm clickgraph_clickhouse_data

Memory Engine Data Loss

Issue: Data disappears after restart
Cause: Memory engine tables are not persistent
Solution: Use MergeTree engine for production:

CREATE TABLE users (...) ENGINE = MergeTree() ORDER BY user_id;

Performance Issues

Issue: Slow query responses
Solutions:

  1. Add ClickHouse data-skipping indexes on frequently filtered columns
  2. Use appropriate ORDER BY clauses in table definitions
  3. Enable ClickGraph query optimization features

Port Conflicts

Issue: Address already in use
Solution: Use different ports

cargo run --bin clickgraph -- --http-port 8081 --bolt-port 7688

Next Steps

  1. Read the Documentation: Check out the Features Guide and API Documentation
  2. Configure Graph Views: Create YAML configurations for your specific data model
  3. Integrate with Applications: Use HTTP API or Neo4j drivers in your applications
  4. Optimize Performance: Tune ClickHouse settings and create appropriate indexes
  5. Join the Community: Contribute to the project and share your use cases

Need Help?

  • Documentation: Check the docs/ folder for comprehensive guides
  • Issues: Report bugs and feature requests on GitHub
  • Examples: See examples/ folder for more complex configurations
  • Community: Join discussions and share your experiences

Happy graph analyzing! 🎉