|
1 | 1 | # Graph Engine |
2 | 2 |
|
3 | | -NodeDB's graph engine uses a native CSR (Compressed Sparse Row) adjacency index with interned node IDs (`u32`) and labels (`u16`) — not recursive JOINs pretending to be a graph. At 1 billion edges, CSR uses ~10 GB vs ~60 GB for naive adjacency lists (6x improvement). Sub-millisecond multi-hop traversals, 13 native algorithms, Cypher-subset pattern matching, and GraphRAG fusion — all in the same process as every other engine. |
| 3 | +NodeDB's graph engine uses a native CSR (Compressed Sparse Row) adjacency index with interned node IDs (`u32`) and labels (`u32`) — not recursive JOINs pretending to be a graph. At 1 billion edges, CSR uses ~10 GB vs ~60 GB for naive adjacency lists (6x improvement). Sub-millisecond multi-hop traversals, 13 native algorithms, Cypher-subset pattern matching, and GraphRAG fusion — all in the same process as every other engine. |
4 | 4 |
|
5 | 5 | --- |
6 | 6 |
|
7 | 7 | ## Storage Model |
8 | 8 |
|
9 | | -Edges are persisted in a redb B-Tree with forward and reverse indexes: |
| 9 | +Edges are persisted in a redb B-Tree with forward and reverse indexes, both keyed by a `(tenant_id, composite)` tuple: |
10 | 10 |
|
11 | 11 | ``` |
12 | 12 | Forward Index (EDGES table): |
13 | | - Key: "src_id\x00edge_label\x00dst_id" |
| 13 | + Key: (tenant_id: u32, "src\x00edge_label\x00dst") |
14 | 14 | Value: edge properties (MessagePack) |
15 | 15 |
|
16 | 16 | Reverse Index (REVERSE_EDGES table): |
17 | | - Key: "dst_id\x00edge_label\x00src_id" |
| 17 | + Key: (tenant_id: u32, "dst\x00edge_label\x00src") |
18 | 18 | Value: [] (existence check only) |
19 | 19 | ``` |
20 | 20 |
|
21 | | -Both tables are updated atomically in a single write transaction. The null-byte separator enables efficient prefix scans for outbound traversal. |
| 21 | +Tenant isolation is structural: the tenant id is a first-class key component, not a lexical prefix on node names. Node names in the composite portion are user-visible strings. Both tables update atomically in a single write transaction, and the null-byte separator enables prefix scans for outbound traversal within a tenant. |
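The contiguity that makes the prefix scan work can be sketched with an ordered map. This is a minimal illustration only: `BTreeMap` stands in for the redb table, and `EdgeKey`, `forward_key`, and `outbound` are hypothetical names, not NodeDB or redb APIs.

```rust
use std::collections::BTreeMap;

/// Hypothetical key type mirroring the (tenant_id, composite) tuple.
type EdgeKey = (u32, String);

/// Build the forward key: tenant id plus "src\0edge_label\0dst".
fn forward_key(tenant_id: u32, src: &str, label: &str, dst: &str) -> EdgeKey {
    (tenant_id, format!("{}\0{}\0{}", src, label, dst))
}

/// Outbound edges of `src` within one tenant, as (edge_label, dst) pairs.
/// Because '\0' sorts before every other byte, all keys starting with
/// "src\0" are adjacent, so one ordered range scan finds them.
fn outbound<'a>(
    edges: &'a BTreeMap<EdgeKey, Vec<u8>>,
    tenant_id: u32,
    src: &str,
) -> Vec<(&'a str, &'a str)> {
    let prefix = format!("{}\0", src);
    let start = (tenant_id, prefix.clone());
    edges
        .range(start..)
        // Stop at the first key that leaves the tenant or the prefix.
        .take_while(|((t, k), _)| *t == tenant_id && k.starts_with(prefix.as_str()))
        .filter_map(|((_, k), _)| {
            let mut parts = k.splitn(3, '\0');
            let _src = parts.next()?;
            let label = parts.next()?;
            let dst = parts.next()?;
            Some((label, dst))
        })
        .collect()
}
```

Since the tenant id is the leading tuple component, the scan for tenant 1 never returns an edge stored under tenant 2, even when the node names are identical.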
22 | 22 |
|
23 | | -At query time, a CSR index is built from the B-Tree for cache-resident bulk operations: |
| 23 | +At query time the in-memory CSR is partitioned by tenant (`ShardedCsrIndex` = one `CsrIndex` per tenant); algorithms and traversals run against a single tenant's partition: |
24 | 24 |
|
25 | 25 | ``` |
26 | | -CSR Layout: |
| 26 | +CsrIndex (per tenant): |
27 | 27 | out_offsets: Vec<u32> [num_nodes + 1] — offset into target array per node |
28 | 28 | out_targets: Vec<u32> [num_edges] — destination node IDs (contiguous) |
29 | | - out_labels: Vec<u16> [num_edges] — edge labels (parallel array) |
| 29 | + out_labels: Vec<u32> [num_edges] — edge labels (parallel array) |
30 | 30 | out_weights: Vec<f64> [num_edges] — optional, allocated only when weighted |
31 | 31 |
|
32 | 32 | in_offsets / in_targets / in_labels / in_weights — symmetric for inbound |
33 | 33 | ``` |
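The outbound half of the layout above can be built with a counting sort over source IDs. The following is a sketch under stated assumptions: the `Csr` struct, `build`, and `neighbors` are illustrative names, weights and the inbound arrays are omitted, and node IDs are assumed to be already interned to dense `u32` values.

```rust
/// Minimal CSR holding only the outbound arrays (illustrative, not NodeDB's type).
struct Csr {
    out_offsets: Vec<u32>, // len = num_nodes + 1
    out_targets: Vec<u32>, // len = num_edges
    out_labels: Vec<u32>,  // len = num_edges, parallel to out_targets
}

impl Csr {
    /// Build from (src, label, dst) triples using a counting sort on src.
    fn build(num_nodes: usize, edges: &[(u32, u32, u32)]) -> Self {
        // 1. Count the out-degree of each node, shifted by one slot.
        let mut out_offsets = vec![0u32; num_nodes + 1];
        for &(src, _, _) in edges {
            out_offsets[src as usize + 1] += 1;
        }
        // 2. Prefix sum turns the counts into start offsets.
        for i in 0..num_nodes {
            out_offsets[i + 1] += out_offsets[i];
        }
        // 3. Scatter each edge into the next free slot of its source node.
        let mut cursor = out_offsets.clone();
        let mut out_targets = vec![0u32; edges.len()];
        let mut out_labels = vec![0u32; edges.len()];
        for &(src, label, dst) in edges {
            let slot = cursor[src as usize] as usize;
            out_targets[slot] = dst;
            out_labels[slot] = label;
            cursor[src as usize] += 1;
        }
        Csr { out_offsets, out_targets, out_labels }
    }

    /// Outbound neighbors of `node`, optionally filtered by edge label.
    fn neighbors(&self, node: u32, label: Option<u32>) -> Vec<u32> {
        let start = self.out_offsets[node as usize] as usize;
        let end = self.out_offsets[node as usize + 1] as usize;
        (start..end)
            .filter(|&i| label.map_or(true, |l| self.out_labels[i] == l))
            .map(|i| self.out_targets[i])
            .collect()
    }
}
```

A neighbor lookup reads two adjacent offsets and then one contiguous slice, which is what keeps traversals cache-friendly.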
34 | 34 |
|
| 35 | +Each `CsrIndex` gets a unique partition tag at construction. Public APIs that return dense node indices hand out `LocalNodeId { id, partition_tag }`; passing a `LocalNodeId` from one partition to another partition's API panics at the boundary. |
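One way such a tag check could look is sketched below. Only the `LocalNodeId { id, partition_tag }` shape comes from the text; the global counter, the `intern` and `name_of` methods, and the `names` field are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical process-wide counter that hands out unique partition tags.
static NEXT_TAG: AtomicU32 = AtomicU32::new(1);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct LocalNodeId {
    id: u32,
    partition_tag: u32,
}

/// Illustrative stand-in for a per-tenant index; only node interning is modeled.
struct CsrIndex {
    partition_tag: u32,
    names: Vec<String>,
}

impl CsrIndex {
    fn new() -> Self {
        CsrIndex {
            partition_tag: NEXT_TAG.fetch_add(1, Ordering::Relaxed),
            names: Vec::new(),
        }
    }

    /// Return the dense id for `name`, tagged with this partition.
    fn intern(&mut self, name: &str) -> LocalNodeId {
        let id = match self.names.iter().position(|n| n.as_str() == name) {
            Some(i) => i as u32,
            None => {
                self.names.push(name.to_string());
                (self.names.len() - 1) as u32
            }
        };
        LocalNodeId { id, partition_tag: self.partition_tag }
    }

    /// Resolve an id back to its name; panics if the id belongs to another partition.
    fn name_of(&self, node: LocalNodeId) -> &str {
        assert_eq!(
            node.partition_tag, self.partition_tag,
            "LocalNodeId used with a foreign partition"
        );
        &self.names[node.id as usize]
    }
}
```

Without the tag, a dense index from tenant A would silently resolve to an unrelated node in tenant B; with it, the mistake fails at the first API call.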
| 36 | + |
35 | 37 | Writes go to a mutable buffer and become visible immediately. Compaction merges the buffer into the dense CSR arrays via double-buffered swap when the buffer exceeds 10% of the dense size. |
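The write path can be pictured with a simplified model. The sketch below uses a flat sorted edge list in place of real CSR arrays, and `EdgeStore` and its methods are hypothetical; only the 10% threshold and the build-then-swap idea are taken from the text.

```rust
/// Simplified model: a compacted edge array plus a small mutable write buffer.
struct EdgeStore {
    dense: Vec<(u32, u32)>,  // compacted (src, dst) pairs, kept sorted
    buffer: Vec<(u32, u32)>, // recent writes, in arrival order
}

impl EdgeStore {
    /// True once the buffer exceeds 10% of the dense size.
    fn needs_compaction(&self) -> bool {
        self.buffer.len() * 10 > self.dense.len()
    }

    /// Append to the buffer, compacting when the threshold is crossed.
    fn insert(&mut self, src: u32, dst: u32) {
        self.buffer.push((src, dst));
        if self.needs_compaction() {
            self.compact();
        }
    }

    /// Build the replacement array off to the side, then swap it in.
    fn compact(&mut self) {
        let mut next = Vec::with_capacity(self.dense.len() + self.buffer.len());
        next.extend_from_slice(&self.dense);
        next.extend_from_slice(&self.buffer);
        next.sort_unstable();
        self.dense = next; // the swap: readers see old or new, never a partial merge
        self.buffer.clear();
    }

    /// Reads consult both structures, so buffered writes are visible immediately.
    fn neighbors(&self, src: u32) -> Vec<u32> {
        self.dense
            .iter()
            .chain(self.buffer.iter())
            .filter(|e| e.0 == src)
            .map(|e| e.1)
            .collect()
    }
}
```

With 20 dense edges, the first two inserts stay in the buffer and the third triggers a merge, since 3 buffered edges exceed 10% of 20.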
36 | 38 |
|
37 | 39 | --- |
@@ -72,11 +74,11 @@ GRAPH TRAVERSE FROM 'users:alice' DEPTH 2 LABEL 'follows' DIRECTION out; |
72 | 74 |
|
73 | 75 | Breadth-first search from a start node. Returns discovered nodes at each depth level. |
74 | 76 |
|
75 | | -| Parameter | Default | Description | |
76 | | -| ----------- | ------- | ------------------------------ | |
77 | | -| `DEPTH` | 2 | Maximum hop count | |
78 | | -| `LABEL` | (any) | Filter by edge label | |
79 | | -| `DIRECTION` | out | `in`, `out`, or `both` | |
| 77 | +| Parameter | Default | Description | |
| 78 | +| ----------- | ------- | ---------------------- | |
| 79 | +| `DEPTH` | 2 | Maximum hop count | |
| 80 | +| `LABEL` | (any) | Filter by edge label | |
| 81 | +| `DIRECTION` | out | `in`, `out`, or `both` | |
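A level-by-level BFS over CSR arrays can be sketched as follows. This is an illustration under assumptions, not the engine's implementation: `bfs_levels` is a hypothetical function, it handles only the `out` direction, and it omits the `LABEL` filter.

```rust
/// Breadth-first search over CSR outbound arrays.
/// Returns the newly discovered nodes at each depth, starting at depth 1.
fn bfs_levels(
    out_offsets: &[u32],
    out_targets: &[u32],
    start: u32,
    max_depth: usize,
) -> Vec<Vec<u32>> {
    let num_nodes = out_offsets.len() - 1;
    let mut visited = vec![false; num_nodes];
    visited[start as usize] = true;

    let mut levels = Vec::new();
    let mut frontier = vec![start];
    for _ in 0..max_depth {
        let mut next = Vec::new();
        for &node in &frontier {
            // The neighbors of `node` are one contiguous slice of out_targets.
            let lo = out_offsets[node as usize] as usize;
            let hi = out_offsets[node as usize + 1] as usize;
            for &target in &out_targets[lo..hi] {
                if !visited[target as usize] {
                    visited[target as usize] = true;
                    next.push(target);
                }
            }
        }
        if next.is_empty() {
            break; // nothing new reachable, stop before max_depth
        }
        levels.push(next.clone());
        frontier = next;
    }
    levels
}
```

For the edges 0→1, 0→2, 1→3, 3→4, a call with `start = 0` and `max_depth = 2` yields `[[1, 2], [3]]`; node 4 is three hops away and is not reached.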
80 | 82 |
|
81 | 83 | ### Neighbors (1-Hop) |
82 | 84 |
|
@@ -132,15 +134,15 @@ NodeDB embeds a Cypher-subset pattern engine. MATCH queries arrive through any p |
132 | 134 |
|
133 | 135 | **Clauses:** |
134 | 136 |
|
135 | | -| Clause | Description | |
136 | | -| ---------------- | ---------------------------------------------------- | |
137 | | -| `MATCH` | Required. Pattern to match against the graph. | |
138 | | -| `OPTIONAL MATCH` | LEFT JOIN semantics — preserves rows with no match. | |
139 | | -| `WHERE` | Filter on bound variables. Supports `=`, `!=`, `<`, `<=`, `>`, `>=`. | |
140 | | -| `WHERE NOT EXISTS { MATCH ... }` | Anti-join — exclude rows matching a sub-pattern. | |
141 | | -| `RETURN` | Project bindings. Supports aliases (`AS`). Default: all bound variables. | |
142 | | -| `ORDER BY` | Sort results. `ASC` or `DESC`. | |
143 | | -| `LIMIT` | Cap result count. | |
| 137 | +| Clause | Description | |
| 138 | +| -------------------------------- | ------------------------------------------------------------------------ | |
| 139 | +| `MATCH` | Required. Pattern to match against the graph. | |
| 140 | +| `OPTIONAL MATCH` | LEFT JOIN semantics — preserves rows with no match. | |
| 141 | +| `WHERE` | Filter on bound variables. Supports `=`, `!=`, `<`, `<=`, `>`, `>=`. | |
| 142 | +| `WHERE NOT EXISTS { MATCH ... }` | Anti-join — exclude rows matching a sub-pattern. | |
| 143 | +| `RETURN` | Project bindings. Supports aliases (`AS`). Default: all bound variables. | |
| 144 | +| `ORDER BY` | Sort results. `ASC` or `DESC`. | |
| 145 | +| `LIMIT` | Cap result count. | |
144 | 146 |
|
145 | 147 | Multiple comma-separated patterns in one `MATCH` clause act as self-joins. |
146 | 148 |
|
@@ -228,21 +230,21 @@ GRAPH ALGO DIAMETER ON web; |
228 | 230 |
|
229 | 231 | ### Algorithm Reference |
230 | 232 |
|
231 | | -| Algorithm | What it computes | Key Parameters | |
232 | | -| ------------------- | ------------------------------------------------------------------ | ------------------------------------------- | |
233 | | -| **PageRank** | Node importance via incoming link structure | `DAMPING` (0.85), `ITERATIONS` (20), `TOLERANCE` (1e-7) | |
234 | | -| **WCC** | Weakly connected components (union-find with path compression) | — | |
235 | | -| **Label Propagation** | Community detection via iterative label spreading | `ITERATIONS` (10) | |
236 | | -| **LCC** | Local clustering coefficient — how tightly neighbors connect | — | |
237 | | -| **SSSP** | Single-source shortest path (Dijkstra, rejects negative weights) | `FROM` (required) | |
238 | | -| **Betweenness** | Bridge nodes with high traffic (Brandes' algorithm) | `SAMPLE` (optional, for approximation) | |
239 | | -| **Closeness** | How close a node is to all others (inverse distance sum) | `SAMPLE` (optional) | |
240 | | -| **Harmonic** | Like closeness, but handles disconnected graphs | `SAMPLE` (optional) | |
241 | | -| **Degree** | Connection count per node | `DIRECTION` (in/out/both) | |
242 | | -| **Louvain** | Community detection via modularity optimization | `ITERATIONS` (10), `RESOLUTION` (1.0) | |
243 | | -| **Triangles** | Triangle count (per-node or global) | `MODE` (global/per_node) | |
244 | | -| **Diameter** | Longest shortest path in the graph | — | |
245 | | -| **k-Core** | Coreness decomposition (peeling algorithm) | — | |
| 233 | +| Algorithm | What it computes | Key Parameters | |
| 234 | +| --------------------- | ---------------------------------------------------------------- | ------------------------------------------------------- | |
| 235 | +| **PageRank** | Node importance via incoming link structure | `DAMPING` (0.85), `ITERATIONS` (20), `TOLERANCE` (1e-7) | |
| 236 | +| **WCC** | Weakly connected components (union-find with path compression) | — | |
| 237 | +| **Label Propagation** | Community detection via iterative label spreading | `ITERATIONS` (10) | |
| 238 | +| **LCC** | Local clustering coefficient — how tightly neighbors connect | — | |
| 239 | +| **SSSP** | Single-source shortest path (Dijkstra, rejects negative weights) | `FROM` (required) | |
| 240 | +| **Betweenness** | Bridge nodes with high traffic (Brandes' algorithm) | `SAMPLE` (optional, for approximation) | |
| 241 | +| **Closeness** | How close a node is to all others (inverse distance sum) | `SAMPLE` (optional) | |
| 242 | +| **Harmonic** | Like closeness, but handles disconnected graphs | `SAMPLE` (optional) | |
| 243 | +| **Degree** | Connection count per node | `DIRECTION` (in/out/both) | |
| 244 | +| **Louvain** | Community detection via modularity optimization | `ITERATIONS` (10), `RESOLUTION` (1.0) | |
| 245 | +| **Triangles** | Triangle count (per-node or global) | `MODE` (global/per_node) | |
| 246 | +| **Diameter** | Longest shortest path in the graph | — | |
| 247 | +| **k-Core** | Coreness decomposition (peeling algorithm) | — | |
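As one concrete instance from the table, WCC via union-find with path compression can be sketched like this. The function names and the choice to label each component by its smallest node ID are assumptions for the example, not documented NodeDB behavior.

```rust
/// Find the root of `x`, then point every node on the path directly at it.
fn find(parent: &mut [u32], mut x: u32) -> u32 {
    let mut root = x;
    while parent[root as usize] != root {
        root = parent[root as usize];
    }
    // Path compression: flatten the chain we just walked.
    while parent[x as usize] != root {
        let next = parent[x as usize];
        parent[x as usize] = root;
        x = next;
    }
    root
}

/// Weakly connected components: edge direction is ignored.
/// Returns, for each node, the smallest node id in its component.
fn wcc(num_nodes: usize, edges: &[(u32, u32)]) -> Vec<u32> {
    let mut parent: Vec<u32> = (0..num_nodes as u32).collect();
    for &(a, b) in edges {
        let ra = find(&mut parent, a);
        let rb = find(&mut parent, b);
        if ra != rb {
            // Attach the larger root under the smaller one for deterministic labels.
            let (lo, hi) = if ra < rb { (ra, rb) } else { (rb, ra) };
            parent[hi as usize] = lo;
        }
    }
    (0..num_nodes as u32).map(|n| find(&mut parent, n)).collect()
}
```

With six nodes and edges (0,1), (2,1), (3,4), the result is `[0, 0, 0, 3, 3, 5]`: two multi-node components and one isolated node.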
246 | 248 |
|
247 | 249 | --- |
248 | 250 |
|
@@ -282,16 +284,16 @@ GRAPH RAG FUSION ON entities |
282 | 284 | MAX_VISITED 1000; |
283 | 285 | ``` |
284 | 286 |
|
285 | | -| Parameter | Description | |
286 | | -| ------------------ | -------------------------------------------------------- | |
287 | | -| `QUERY` | Query embedding vector | |
288 | | -| `VECTOR_TOP_K` | Number of seed nodes from vector search | |
289 | | -| `EXPANSION_DEPTH` | BFS hop count from seeds | |
290 | | -| `EDGE_LABEL` | Optional edge type filter during expansion | |
291 | | -| `DIRECTION` | `in`, `out`, or `both` for BFS | |
292 | | -| `FINAL_TOP_K` | Final result count after fusion | |
293 | | -| `RRF_K` | Weighting constants `(vector_k, graph_k)` for RRF scoring | |
294 | | -| `MAX_VISITED` | Memory budget cap — BFS stops early if exceeded | |
| 287 | +| Parameter | Description | |
| 288 | +| ----------------- | --------------------------------------------------------- | |
| 289 | +| `QUERY` | Query embedding vector | |
| 290 | +| `VECTOR_TOP_K` | Number of seed nodes from vector search | |
| 291 | +| `EXPANSION_DEPTH` | BFS hop count from seeds | |
| 292 | +| `EDGE_LABEL` | Optional edge type filter during expansion | |
| 293 | +| `DIRECTION` | `in`, `out`, or `both` for BFS | |
| 294 | +| `FINAL_TOP_K` | Final result count after fusion | |
| 295 | +| `RRF_K` | Weighting constants `(vector_k, graph_k)` for RRF scoring | |
| 296 | +| `MAX_VISITED` | Memory budget cap — BFS stops early if exceeded | |
295 | 297 |
|
296 | 298 | The response includes a truncation flag if the memory budget forced early termination. |
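The fusion step can be illustrated with the standard reciprocal rank fusion formula, where each ranked list contributes `1 / (k + rank)` per node. How NodeDB applies `RRF_K` internally is not specified here, so treat this as a sketch: `rrf_fuse` is a hypothetical function and the use of `vector_k` and `graph_k` as per-list constants is an assumption.

```rust
use std::collections::HashMap;

/// Fuse two ranked lists of node ids with reciprocal rank fusion.
/// Ranks are 1-based; a node missing from a list contributes nothing for it.
fn rrf_fuse(
    vector_ranked: &[u32],
    graph_ranked: &[u32],
    vector_k: f64,
    graph_k: f64,
    final_top_k: usize,
) -> Vec<(u32, f64)> {
    let mut scores: HashMap<u32, f64> = HashMap::new();
    for (rank, &id) in vector_ranked.iter().enumerate() {
        *scores.entry(id).or_insert(0.0) += 1.0 / (vector_k + (rank + 1) as f64);
    }
    for (rank, &id) in graph_ranked.iter().enumerate() {
        *scores.entry(id).or_insert(0.0) += 1.0 / (graph_k + (rank + 1) as f64);
    }

    let mut fused: Vec<(u32, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by node id for a stable order.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused.truncate(final_top_k);
    fused
}
```

A node that appears in both lists accumulates two terms, so it outranks nodes that score well in only one, which is the point of combining vector seeds with graph expansion.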
297 | 299 |
|
|