|
| 1 | +# clickgraph — Python bindings |
| 2 | + |
| 3 | +Embedded graph query engine — run Cypher queries over Parquet, Iceberg, Delta Lake and S3 data without a ClickHouse server. |
| 4 | + |
| 5 | +## Quick Start |
| 6 | + |
| 7 | +```python |
| 8 | +import clickgraph |
| 9 | + |
| 10 | +db = clickgraph.Database("schema.yaml") |
| 11 | +conn = db.connect() |
| 12 | + |
| 13 | +for row in conn.query("MATCH (u:User) RETURN u.name LIMIT 5"): |
| 14 | + print(row["u.name"]) |
| 15 | +``` |
| 16 | + |
| 17 | +## API Compatibility |
| 18 | + |
| 19 | +ClickGraph's Python API is designed to be familiar to users of other graph databases: |
| 20 | + |
| 21 | +| Operation | ClickGraph | Kuzu | Neo4j | |
| 22 | +|-----------|-----------|------|-------| |
| 23 | +| Open database | `Database("schema.yaml")` | `Database("path")` | `GraphDatabase.driver(uri)` | |
| 24 | +| Get connection | `db.connect()` or `Connection(db)` | `Connection(db)` | `driver.session()` | |
| 25 | +| Run query | `conn.query(cypher)` | `conn.execute(cypher)` | `session.run(cypher)` | |
| 26 | +| Iterate rows | `for row in result:` | `while result.has_next():` | `for record in result:` | |
| 27 | +| Access values | `row["col"]` (dict, by name) | `row[0]` (list, by position) | `record["col"]` (dict-like, by name) |
| 28 | + |
| 29 | +All three calling styles work — use whichever feels natural: |
| 30 | + |
| 31 | +```python |
| 32 | +# ClickGraph style |
| 33 | +conn = db.connect() |
| 34 | +result = conn.query("MATCH (u:User) RETURN u.name") |
| 35 | + |
| 36 | +# Kuzu style |
| 37 | +conn = clickgraph.Connection(db) |
| 38 | +result = conn.execute("MATCH (u:User) RETURN u.name") |
| 39 | +while result.has_next(): |
| 40 | + row = result.get_next() |
| 41 | + print(row[0]) |
| 42 | + |
| 43 | +# Neo4j style |
| 44 | +conn = db.connect() |
| 45 | +result = conn.run("MATCH (u:User) RETURN u.name") |
| 46 | +for row in result: |
| 47 | + print(row["u.name"]) |
| 48 | +``` |
| 49 | + |
| 50 | +## API |
| 51 | + |
| 52 | +### `Database(schema_path, **kwargs)` |
| 53 | + |
| 54 | +Open an embedded database from a YAML schema file. |
| 55 | + |
| 56 | +**Keyword arguments** (all optional): |
| 57 | +- `session_dir` — directory for chdb session data (default: temp dir) |
| 58 | +- `data_dir` — base directory for relative `source:` paths |
| 59 | +- `max_threads` — maximum threads for chdb |
| 60 | +- `s3_access_key_id`, `s3_secret_access_key`, `s3_region`, `s3_endpoint_url`, `s3_session_token` — S3 credentials |
| 61 | +- `gcs_access_key_id`, `gcs_secret_access_key` — GCS HMAC credentials |
| 62 | +- `azure_storage_account_name`, `azure_storage_account_key`, `azure_storage_connection_string` — Azure credentials |
| 63 | + |
| 64 | +### `Database.connect() → Connection` |
| 65 | + |
| 66 | +Create a connection for executing queries. |
| 67 | + |
| 68 | +### `Connection(db)` *(Kuzu-compatible constructor)* |
| 69 | + |
| 70 | +Alternative to `db.connect()` — creates a connection from a Database instance. |
| 71 | + |
| 72 | +### `Database.execute(cypher) → QueryResult` |
| 73 | + |
| 74 | +Shorthand — execute a query without creating a separate connection. |
| 75 | + |
| 76 | +### `Connection.query(cypher) → QueryResult` |
| 77 | + |
| 78 | +Execute a Cypher query. Returns a `QueryResult` that iterates as row dicts.
| 79 | + |
| 80 | +### `Connection.execute(cypher) → QueryResult` *(Kuzu-compatible alias)* |
| 81 | + |
| 82 | +Alias for `query()`. |
| 83 | + |
| 84 | +### `Connection.run(cypher) → QueryResult` *(Neo4j-compatible alias)* |
| 85 | + |
| 86 | +Alias for `query()`. |
| 87 | + |
| 88 | +### `Connection.query_to_sql(cypher) → str` |
| 89 | + |
| 90 | +Translate Cypher to ClickHouse SQL without executing. |
| 91 | + |
| 92 | +### `QueryResult` |
| 93 | + |
| 94 | +**Dict-style access** (ClickGraph/Neo4j pattern): |
| 95 | +- Iterable: `for row in result:` — each row is a `dict` |
| 96 | +- `result[i]` — access row by index (supports negative indexing) |
| 97 | +- `result.column_names` — list of column names |
| 98 | +- `result.num_rows` — number of rows |
| 99 | +- `result.as_dicts()` — all rows as a list of dicts |
| 100 | +- `result.get_row(i)` — single row by index as dict |
| 101 | +- `len(result)` — number of rows |
| 102 | + |
| 103 | +**Positional access** (Kuzu pattern):
| 104 | +- `result.has_next()` — True if more rows remain |
| 105 | +- `result.get_next()` — next row as a list of values (column order) |
| 106 | +- `result.reset_iterator()` — restart the cursor |
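
The two access patterns above can be pictured with a small stand-in class. This is only a sketch: `SketchResult` is a hypothetical name, it is not clickgraph's implementation, and it assumes that the cursor used by `has_next()`/`get_next()` is independent of `for row in result:` iteration, which the real `QueryResult` may or may not guarantee.

```python
from typing import Any, Dict, Iterator, List


class SketchResult:
    """Hypothetical stand-in mimicking the QueryResult access patterns.

    NOT the library's code; it only shows how dict-style and cursor-style
    access can share one in-memory row buffer.
    """

    def __init__(self, column_names: List[str], rows: List[List[Any]]) -> None:
        self.column_names = list(column_names)
        self._rows = [list(r) for r in rows]
        self._cursor = 0  # used only by has_next()/get_next()

    # --- dict-style access ---
    @property
    def num_rows(self) -> int:
        return len(self._rows)

    def __len__(self) -> int:
        return len(self._rows)

    def get_row(self, i: int) -> Dict[str, Any]:
        # Negative indexes work because list indexing supports them.
        return dict(zip(self.column_names, self._rows[i]))

    def __getitem__(self, i: int) -> Dict[str, Any]:
        return self.get_row(i)

    def __iter__(self) -> Iterator[Dict[str, Any]]:
        return (self.get_row(i) for i in range(len(self._rows)))

    def as_dicts(self) -> List[Dict[str, Any]]:
        return list(self)

    # --- cursor-style access ---
    def has_next(self) -> bool:
        return self._cursor < len(self._rows)

    def get_next(self) -> List[Any]:
        if not self.has_next():
            raise IndexError("no more rows")
        row = self._rows[self._cursor]
        self._cursor += 1
        return list(row)  # values in column order

    def reset_iterator(self) -> None:
        self._cursor = 0


# Made-up sample data, for illustration only.
result = SketchResult(["u.name", "u.age"], [["Alice", 30], ["Bob", 25]])
names = [row["u.name"] for row in result]  # dict-style
first = result.get_next()                  # cursor-style
```

Keeping the cursor separate from `__iter__` means a `for` loop never disturbs a partially consumed `has_next()` loop; whether clickgraph makes the same choice is not stated in this README.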
| 107 | + |
| 108 | +## Installation |
| 109 | + |
| 110 | +```bash |
| 111 | +# From source (requires Rust toolchain + chdb) |
| 112 | +cd clickgraph-py |
| 113 | +pip install maturin |
| 114 | +maturin develop |
| 115 | +``` |
| 116 | + |
| 117 | +## Example with S3 data |
| 118 | + |
| 119 | +```python |
| 120 | +import clickgraph |
| 121 | + |
| 122 | +db = clickgraph.Database( |
| 123 | + "schema.yaml", |
| 124 | + s3_access_key_id="AKIA...", |
| 125 | + s3_secret_access_key="...", |
| 126 | + s3_region="us-east-1", |
| 127 | +) |
| 128 | + |
| 129 | +conn = db.connect() |
| 130 | +result = conn.query(""" |
| 131 | + MATCH (u:User)-[:FOLLOWS]->(f:User) |
| 132 | + WHERE u.name = 'Alice' |
| 133 | + RETURN f.name, f.email |
| 134 | +""") |
| 135 | + |
| 136 | +for row in result: |
| 137 | + print(f"{row['f.name']}: {row['f.email']}") |
| 138 | +``` |
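
To keep secrets out of source files, the S3 keyword arguments can be assembled from the environment instead of being written inline. The helper below is a sketch: `s3_kwargs_from_env` is a hypothetical name that is not part of clickgraph, and it assumes the standard AWS environment variable names are set in your shell.

```python
import os
from typing import Dict, Mapping, Optional

# Standard AWS environment variables mapped to the Database keyword
# arguments listed in the API section of this README.
_ENV_TO_KWARG = {
    "AWS_ACCESS_KEY_ID": "s3_access_key_id",
    "AWS_SECRET_ACCESS_KEY": "s3_secret_access_key",
    "AWS_SESSION_TOKEN": "s3_session_token",
    "AWS_DEFAULT_REGION": "s3_region",
}


def s3_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return only the S3 kwargs whose environment variable is set and non-empty."""
    env = os.environ if env is None else env
    return {kwarg: env[var] for var, kwarg in _ENV_TO_KWARG.items() if env.get(var)}


# Intended use (requires clickgraph to be installed, so left commented out):
# db = clickgraph.Database("schema.yaml", **s3_kwargs_from_env())
```

Unset or empty variables are skipped, so the call passes only the arguments that are actually configured.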