Commit 94dae0e

docs(graph): document tenant-partitioned CSR and ShardedCsrIndex storage model
1 parent a05d3ee commit 94dae0e

1 file changed: +49 -47 lines changed

docs/graph.md

Lines changed: 49 additions & 47 deletions
@@ -1,37 +1,39 @@
 # Graph Engine
 
-NodeDB's graph engine uses a native CSR (Compressed Sparse Row) adjacency index with interned node IDs (`u32`) and labels (`u16`) — not recursive JOINs pretending to be a graph. At 1 billion edges, CSR uses ~10 GB vs ~60 GB for naive adjacency lists (6x improvement). Sub-millisecond multi-hop traversals, 13 native algorithms, Cypher-subset pattern matching, and GraphRAG fusion — all in the same process as every other engine.
+NodeDB's graph engine uses a native CSR (Compressed Sparse Row) adjacency index with interned node IDs (`u32`) and labels (`u32`) — not recursive JOINs pretending to be a graph. At 1 billion edges, CSR uses ~10 GB vs ~60 GB for naive adjacency lists (6x improvement). Sub-millisecond multi-hop traversals, 13 native algorithms, Cypher-subset pattern matching, and GraphRAG fusion — all in the same process as every other engine.
 
 ---
 
 ## Storage Model
 
-Edges are persisted in a redb B-Tree with forward and reverse indexes:
+Edges are persisted in a redb B-Tree with forward and reverse indexes, both keyed by a `(tenant_id, composite)` tuple:
 
 ```
 Forward Index (EDGES table):
-Key: "src_id\x00edge_label\x00dst_id"
+Key: (tenant_id: u32, "src\x00edge_label\x00dst")
 Value: edge properties (MessagePack)
 
 Reverse Index (REVERSE_EDGES table):
-Key: "dst_id\x00edge_label\x00src_id"
+Key: (tenant_id: u32, "dst\x00edge_label\x00src")
 Value: [] (existence check only)
 ```
 
-Both tables are updated atomically in a single write transaction. The null-byte separator enables efficient prefix scans for outbound traversal.
+Tenant isolation is structural: the tenant id is a first-class key component, not a lexical prefix on node names. Node names in the composite portion are user-visible strings. Both tables update atomically in a single write transaction, and the null-byte separator enables prefix scans for outbound traversal within a tenant.
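Annotation: the tenant-scoped key layout and the outbound prefix scan it enables can be sketched as follows. This is a minimal illustration, not NodeDB's code: a `BTreeMap` stands in for the redb table, and the names `EdgeKey`, `forward_key`, and `outbound` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical key type: tenant id first, then the null-separated composite.
type EdgeKey = (u32, Vec<u8>);

/// Build the forward-index key (tenant_id, "src\x00edge_label\x00dst").
fn forward_key(tenant_id: u32, src: &str, label: &str, dst: &str) -> EdgeKey {
    let mut composite = Vec::with_capacity(src.len() + label.len() + dst.len() + 2);
    composite.extend_from_slice(src.as_bytes());
    composite.push(0);
    composite.extend_from_slice(label.as_bytes());
    composite.push(0);
    composite.extend_from_slice(dst.as_bytes());
    (tenant_id, composite)
}

/// Outbound traversal as a prefix scan: every key of `tenant_id` whose
/// composite starts with "src\x00". Returns (label, dst) pairs in key order.
fn outbound(
    table: &BTreeMap<EdgeKey, Vec<u8>>,
    tenant_id: u32,
    src: &str,
) -> Vec<(String, String)> {
    let mut prefix = src.as_bytes().to_vec();
    prefix.push(0);
    table
        .range((tenant_id, prefix.clone())..)
        // Stop at the first key outside this tenant or outside the prefix.
        .take_while(|((t, k), _)| *t == tenant_id && k.starts_with(&prefix))
        .filter_map(|((_, k), _)| {
            let rest = &k[prefix.len()..];
            let mut parts = rest.splitn(2, |b| *b == 0);
            let label = String::from_utf8(parts.next()?.to_vec()).ok()?;
            let dst = String::from_utf8(parts.next()?.to_vec()).ok()?;
            Some((label, dst))
        })
        .collect()
}
```

Because the tenant id sorts before the composite bytes, a scan for one tenant never crosses into another tenant's keys, and the trailing `\x00` keeps `users:alice` from matching `users:alicia`.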
 
-At query time, a CSR index is built from the B-Tree for cache-resident bulk operations:
+At query time the in-memory CSR is partitioned by tenant (`ShardedCsrIndex` = one `CsrIndex` per tenant); algorithms and traversals run against a single tenant's partition:
 
 ```
-CSR Layout:
+CsrIndex (per tenant):
 out_offsets: Vec<u32> [num_nodes + 1] — offset into target array per node
 out_targets: Vec<u32> [num_edges] — destination node IDs (contiguous)
-out_labels: Vec<u16> [num_edges] — edge labels (parallel array)
+out_labels: Vec<u32> [num_edges] — edge labels (parallel array)
 out_weights: Vec<f64> [num_edges] — optional, allocated only when weighted
 
 in_offsets / in_targets / in_labels / in_weights — symmetric for inbound
 ```
 
+Each `CsrIndex` gets a unique partition tag at construction. Public APIs that return dense node indices hand out `LocalNodeId { id, partition_tag }` — using a node id from one partition with another partition's API panics at the boundary.
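Annotation: a rough sketch of the outbound half of a per-tenant partition and the tag check described in the added paragraph. The field names follow the layout in the diff; the methods `from_edges`, `node`, and `out_neighbors` are hypothetical, and the caller supplies the tag here because the doc does not say how NodeDB generates it.

```rust
/// Dense node index tagged with the partition that issued it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct LocalNodeId {
    id: u32,
    partition_tag: u32,
}

/// Outbound half of one tenant's CSR partition (weights omitted).
struct CsrIndex {
    partition_tag: u32,
    out_offsets: Vec<u32>, // len = num_nodes + 1
    out_targets: Vec<u32>, // len = num_edges
    out_labels: Vec<u32>,  // parallel to out_targets
}

impl CsrIndex {
    /// Build from (src, label, dst) triples over dense ids 0..num_nodes.
    fn from_edges(partition_tag: u32, num_nodes: usize, edges: &[(u32, u32, u32)]) -> Self {
        // Count out-degree per node, then prefix-sum the counts into offsets.
        let mut out_offsets = vec![0u32; num_nodes + 1];
        for &(src, _, _) in edges {
            out_offsets[src as usize + 1] += 1;
        }
        for i in 0..num_nodes {
            out_offsets[i + 1] += out_offsets[i];
        }
        // Scatter each edge into its node's slot range using a moving cursor.
        let mut cursor = out_offsets.clone();
        let mut out_targets = vec![0u32; edges.len()];
        let mut out_labels = vec![0u32; edges.len()];
        for &(src, label, dst) in edges {
            let slot = cursor[src as usize] as usize;
            out_targets[slot] = dst;
            out_labels[slot] = label;
            cursor[src as usize] += 1;
        }
        CsrIndex { partition_tag, out_offsets, out_targets, out_labels }
    }

    /// Hand out a node id stamped with this partition's tag.
    fn node(&self, id: u32) -> LocalNodeId {
        assert!((id as usize) < self.out_offsets.len() - 1, "node id out of range");
        LocalNodeId { id, partition_tag: self.partition_tag }
    }

    /// Outbound (label, target) pairs; panics if `node` came from another partition.
    fn out_neighbors(&self, node: LocalNodeId) -> Vec<(u32, LocalNodeId)> {
        assert_eq!(
            node.partition_tag, self.partition_tag,
            "LocalNodeId used with a foreign partition"
        );
        let start = self.out_offsets[node.id as usize] as usize;
        let end = self.out_offsets[node.id as usize + 1] as usize;
        (start..end)
            .map(|i| (self.out_labels[i], self.node(self.out_targets[i])))
            .collect()
    }
}
```

A node's neighbors are the contiguous slice `out_targets[out_offsets[n]..out_offsets[n + 1]]`, which is what keeps traversal cache-friendly. The tag comparison costs one integer check per call.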
+
 Writes go to a mutable buffer and become visible immediately. Compaction merges the buffer into the dense CSR arrays via double-buffered swap when the buffer exceeds 10% of the dense size.
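Annotation: the write path in this context line can be illustrated with a simplified model. Everything here is an assumption made for the sketch: `BufferedEdges` is a hypothetical type, a sorted `Vec` stands in for the dense CSR arrays, the 10% rule is read as a comparison of edge counts, and a single assignment stands in for the double-buffered swap.

```rust
/// Simplified model of one partition: `dense` plays the role of the compacted
/// arrays (kept sorted by source), `buffer` holds writes not yet merged.
struct BufferedEdges {
    dense: Vec<(u32, u32)>,
    buffer: Vec<(u32, u32)>,
    compactions: usize,
}

impl BufferedEdges {
    fn new() -> Self {
        BufferedEdges { dense: Vec::new(), buffer: Vec::new(), compactions: 0 }
    }

    /// Append to the buffer; the edge is readable right away via `out_neighbors`.
    fn add_edge(&mut self, src: u32, dst: u32) {
        self.buffer.push((src, dst));
        // Assumed reading of the rule: buffer edge count > 10% of dense edge count.
        if self.buffer.len() * 10 > self.dense.len() {
            self.compact();
        }
    }

    /// Build the replacement array off to the side, then swap it in whole.
    fn compact(&mut self) {
        let mut next = Vec::with_capacity(self.dense.len() + self.buffer.len());
        next.extend_from_slice(&self.dense);
        next.append(&mut self.buffer); // drains the buffer
        next.sort_unstable();
        self.dense = next; // stands in for the double-buffered swap
        self.compactions += 1;
    }

    fn out_neighbors(&self, src: u32) -> Vec<u32> {
        // Dense part: binary-search to the start of the run for `src`.
        let start = self.dense.partition_point(|&(s, _)| s < src);
        let mut out: Vec<u32> = self.dense[start..]
            .iter()
            .take_while(|&&(s, _)| s == src)
            .map(|&(_, d)| d)
            .collect();
        // Buffered part: linear scan over recent writes.
        out.extend(self.buffer.iter().filter(|&&(s, _)| s == src).map(|&(_, d)| d));
        out
    }
}
```

Reads consult both the dense part and the buffer, which is why a write is visible before any compaction runs. The threshold bounds how long the linear buffer scan can get relative to the dense part.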
 
 ---
@@ -72,11 +74,11 @@ GRAPH TRAVERSE FROM 'users:alice' DEPTH 2 LABEL 'follows' DIRECTION out;
 
 Breadth-first search from a start node. Returns discovered nodes at each depth level.
 
-| Parameter | Default | Description |
-| ----------- | ------- | ------------------------------ |
-| `DEPTH` | 2 | Maximum hop count |
-| `LABEL` | (any) | Filter by edge label |
-| `DIRECTION` | out | `in`, `out`, or `both` |
+| Parameter | Default | Description |
+| ----------- | ------- | ---------------------- |
+| `DEPTH` | 2 | Maximum hop count |
+| `LABEL` | (any) | Filter by edge label |
+| `DIRECTION` | out | `in`, `out`, or `both` |
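Annotation: how the three parameters in this table interact can be shown with a level-by-level BFS. This is a sketch under assumptions, not the engine's traversal: `traverse` and `Direction` are hypothetical names, labels are plain integers, and the function scans a flat edge list for brevity where a real engine would walk the CSR arrays.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy)]
enum Direction {
    In,
    Out,
    Both,
}

/// Level-by-level BFS over (src, label, dst) triples.
/// Returns one Vec per depth 1..=max_depth, each sorted; the start node is excluded.
fn traverse(
    edges: &[(u32, u32, u32)],
    start: u32,
    max_depth: usize,
    label: Option<u32>,
    direction: Direction,
) -> Vec<Vec<u32>> {
    let mut visited: HashSet<u32> = HashSet::from([start]);
    let mut frontier = vec![start];
    let mut levels = Vec::new();
    for _ in 0..max_depth {
        let mut next = Vec::new();
        for &node in &frontier {
            for &(src, l, dst) in edges {
                // LABEL filter: `None` means any label.
                if label.map_or(false, |want| want != l) {
                    continue;
                }
                let follow_out =
                    matches!(direction, Direction::Out | Direction::Both) && src == node;
                let follow_in =
                    matches!(direction, Direction::In | Direction::Both) && dst == node;
                // `insert` returns false for nodes already seen at a shallower depth.
                if follow_out && visited.insert(dst) {
                    next.push(dst);
                }
                if follow_in && visited.insert(src) {
                    next.push(src);
                }
            }
        }
        if next.is_empty() {
            break;
        }
        next.sort_unstable();
        frontier = next.clone();
        levels.push(next);
    }
    levels
}
```

Each node is reported once, at the shallowest depth where it is reached, so a result with two levels corresponds to `DEPTH 2`.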
 
 ### Neighbors (1-Hop)
 
@@ -132,15 +134,15 @@ NodeDB embeds a Cypher-subset pattern engine. MATCH queries arrive through any p
 
 **Clauses:**
 
-| Clause | Description |
-| ---------------- | ---------------------------------------------------- |
-| `MATCH` | Required. Pattern to match against the graph. |
-| `OPTIONAL MATCH` | LEFT JOIN semantics — preserves rows with no match. |
-| `WHERE` | Filter on bound variables. Supports `=`, `!=`, `<`, `<=`, `>`, `>=`. |
-| `WHERE NOT EXISTS { MATCH ... }` | Anti-join — exclude rows matching a sub-pattern. |
-| `RETURN` | Project bindings. Supports aliases (`AS`). Default: all bound variables. |
-| `ORDER BY` | Sort results. `ASC` or `DESC`. |
-| `LIMIT` | Cap result count. |
+| Clause | Description |
+| -------------------------------- | ------------------------------------------------------------------------ |
+| `MATCH` | Required. Pattern to match against the graph. |
+| `OPTIONAL MATCH` | LEFT JOIN semantics — preserves rows with no match. |
+| `WHERE` | Filter on bound variables. Supports `=`, `!=`, `<`, `<=`, `>`, `>=`. |
+| `WHERE NOT EXISTS { MATCH ... }` | Anti-join — exclude rows matching a sub-pattern. |
+| `RETURN` | Project bindings. Supports aliases (`AS`). Default: all bound variables. |
+| `ORDER BY` | Sort results. `ASC` or `DESC`. |
+| `LIMIT` | Cap result count. |
 
 Multiple comma-separated patterns in one `MATCH` clause act as self-joins.
 
@@ -228,21 +230,21 @@ GRAPH ALGO DIAMETER ON web;
 
 ### Algorithm Reference
 
-| Algorithm | What it computes | Key Parameters |
-| ------------------- | ------------------------------------------------------------------ | ------------------------------------------- |
-| **PageRank** | Node importance via incoming link structure | `DAMPING` (0.85), `ITERATIONS` (20), `TOLERANCE` (1e-7) |
-| **WCC** | Weakly connected components (union-find with path compression) ||
-| **Label Propagation** | Community detection via iterative label spreading | `ITERATIONS` (10) |
-| **LCC** | Local clustering coefficient — how tightly neighbors connect ||
-| **SSSP** | Single-source shortest path (Dijkstra, rejects negative weights) | `FROM` (required) |
-| **Betweenness** | Bridge nodes with high traffic (Brandes' algorithm) | `SAMPLE` (optional, for approximation) |
-| **Closeness** | How close a node is to all others (inverse distance sum) | `SAMPLE` (optional) |
-| **Harmonic** | Like closeness, but handles disconnected graphs | `SAMPLE` (optional) |
-| **Degree** | Connection count per node | `DIRECTION` (in/out/both) |
-| **Louvain** | Community detection via modularity optimization | `ITERATIONS` (10), `RESOLUTION` (1.0) |
-| **Triangles** | Triangle count (per-node or global) | `MODE` (global/per_node) |
-| **Diameter** | Longest shortest path in the graph ||
-| **k-Core** | Coreness decomposition (peeling algorithm) ||
+| Algorithm | What it computes | Key Parameters |
+| --------------------- | ---------------------------------------------------------------- | ------------------------------------------------------- |
+| **PageRank** | Node importance via incoming link structure | `DAMPING` (0.85), `ITERATIONS` (20), `TOLERANCE` (1e-7) |
+| **WCC** | Weakly connected components (union-find with path compression) | |
+| **Label Propagation** | Community detection via iterative label spreading | `ITERATIONS` (10) |
+| **LCC** | Local clustering coefficient — how tightly neighbors connect | |
+| **SSSP** | Single-source shortest path (Dijkstra, rejects negative weights) | `FROM` (required) |
+| **Betweenness** | Bridge nodes with high traffic (Brandes' algorithm) | `SAMPLE` (optional, for approximation) |
+| **Closeness** | How close a node is to all others (inverse distance sum) | `SAMPLE` (optional) |
+| **Harmonic** | Like closeness, but handles disconnected graphs | `SAMPLE` (optional) |
+| **Degree** | Connection count per node | `DIRECTION` (in/out/both) |
+| **Louvain** | Community detection via modularity optimization | `ITERATIONS` (10), `RESOLUTION` (1.0) |
+| **Triangles** | Triangle count (per-node or global) | `MODE` (global/per_node) |
+| **Diameter** | Longest shortest path in the graph | |
+| **k-Core** | Coreness decomposition (peeling algorithm) | |
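Annotation: as an illustration of one row in this table, the WCC entry's "union-find with path compression" can be sketched as below. The names `DisjointSet` and `wcc` are hypothetical, and union by size is an extra choice of this sketch that the table does not mention.

```rust
/// Union-find with path compression (plus union by size, an added choice).
struct DisjointSet {
    parent: Vec<u32>,
    size: Vec<u32>,
}

impl DisjointSet {
    fn new(n: usize) -> Self {
        DisjointSet { parent: (0..n as u32).collect(), size: vec![1; n] }
    }

    fn find(&mut self, x: u32) -> u32 {
        // First pass: walk up to the root.
        let mut root = x;
        while self.parent[root as usize] != root {
            root = self.parent[root as usize];
        }
        // Second pass: point every node on the path straight at the root.
        let mut cur = x;
        while self.parent[cur as usize] != root {
            let next = self.parent[cur as usize];
            self.parent[cur as usize] = root;
            cur = next;
        }
        root
    }

    fn union(&mut self, a: u32, b: u32) {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return;
        }
        // Attach the smaller tree under the larger one.
        if self.size[ra as usize] < self.size[rb as usize] {
            std::mem::swap(&mut ra, &mut rb);
        }
        self.parent[rb as usize] = ra;
        self.size[ra as usize] += self.size[rb as usize];
    }
}

/// Component id per node. Edge direction is ignored, which is what makes the
/// components "weakly" connected.
fn wcc(num_nodes: usize, edges: &[(u32, u32)]) -> Vec<u32> {
    let mut ds = DisjointSet::new(num_nodes);
    for &(src, dst) in edges {
        ds.union(src, dst);
    }
    (0..num_nodes as u32).map(|n| ds.find(n)).collect()
}
```

Two nodes share a component exactly when `wcc` assigns them the same id. The id itself is an arbitrary member of the component, not a stable label.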
 
 ---
 
@@ -282,16 +284,16 @@ GRAPH RAG FUSION ON entities
 MAX_VISITED 1000;
 ```
 
-| Parameter | Description |
-| ------------------ | -------------------------------------------------------- |
-| `QUERY` | Query embedding vector |
-| `VECTOR_TOP_K` | Number of seed nodes from vector search |
-| `EXPANSION_DEPTH` | BFS hop count from seeds |
-| `EDGE_LABEL` | Optional edge type filter during expansion |
-| `DIRECTION` | `in`, `out`, or `both` for BFS |
-| `FINAL_TOP_K` | Final result count after fusion |
-| `RRF_K` | Weighting constants `(vector_k, graph_k)` for RRF scoring |
-| `MAX_VISITED` | Memory budget cap — BFS stops early if exceeded |
+| Parameter | Description |
+| ----------------- | --------------------------------------------------------- |
+| `QUERY` | Query embedding vector |
+| `VECTOR_TOP_K` | Number of seed nodes from vector search |
+| `EXPANSION_DEPTH` | BFS hop count from seeds |
+| `EDGE_LABEL` | Optional edge type filter during expansion |
+| `DIRECTION` | `in`, `out`, or `both` for BFS |
+| `FINAL_TOP_K` | Final result count after fusion |
+| `RRF_K` | Weighting constants `(vector_k, graph_k)` for RRF scoring |
+| `MAX_VISITED` | Memory budget cap — BFS stops early if exceeded |
 
 The response includes a truncation flag if the memory budget forced early termination.
 