| 1 | +# clickhouse_query_generator Module — Agent Guide |
| 2 | + |
| 3 | +> **Purpose**: Converts `RenderPlan` → ClickHouse SQL string. |
| 4 | +> Contains VLP (variable-length path) CTE generation — the most schema-sensitive code. |
| 5 | + |
| 6 | +## Module Architecture |
| 7 | + |
| 8 | +``` |
| 9 | +RenderPlan (from render_plan) |
| 10 | + │ |
| 11 | + ▼ |
| 12 | +to_sql_query.rs (3.2K) ← Main SQL renderer: SELECT/FROM/JOIN/WHERE/GROUP BY/ORDER BY |
| 13 | + │ Also: VLP alias rewriting, denormalized ORDER BY resolution |
| 14 | + │ |
| 15 | + ├─ variable_length_cte.rs (3.4K) ← Recursive CTE generator for *1..N path patterns |
| 16 | + │ 4 base-case generators × 5 schema variations = complexity |
| 17 | + │ |
| 18 | + ├─ multi_type_vlp_joins.rs (1.3K) ← UNION ALL of explicit JOINs for multi-type traversals |
| 19 | + │ Used when path crosses node types (User→Post via LIKES) |
| 20 | + │ |
| 21 | + ├─ function_translator.rs (952) ← Cypher→ClickHouse function mapping |
| 22 | + ├─ function_registry.rs (1.2K) ← Function signatures & type info |
| 23 | + ├─ json_builder.rs (331) ← formatRowNoNewline JSON blob generation |
| 24 | + ├─ pagerank.rs (387) ← PageRank SQL generation |
| 25 | + └─ mod.rs (50) ← Entry point: generate_sql() |
| 26 | +``` |
| 27 | + |
| 28 | +## variable_length_cte.rs — The Core |
| 29 | + |
| 30 | +### What It Does |
| 31 | +Generates `WITH RECURSIVE` CTEs for Cypher patterns like: |
| 32 | +```cypher |
| 33 | +MATCH (a:User)-[:FOLLOWS*1..3]->(b:User) |
| 34 | +MATCH path = (a)--(o) // browser expand (1-hop, all types) |
| 35 | +``` |
| 36 | + |
| 37 | +### The Generator Struct |
| 38 | +```rust |
| 39 | +VariableLengthCteGenerator { |
| 40 | + schema: &GraphSchema, |
| 41 | + start_node_alias, end_node_alias, // Cypher aliases |
| 42 | + rel_type, start_label, end_label, // Type info |
| 43 | + min_hops, max_hops, // Range bounds |
| 44 | + is_fk_edge: bool, // FK column = edge |
| 45 | + start_is_denormalized: bool, // Start node in edge table |
| 46 | + end_is_denormalized: bool, // End node in edge table |
| 47 | + type_column: Option<String>, // Polymorphic discriminator |
| 48 | + shortest_path_mode: Option<...>, // shortestPath optimization |
| 49 | + // ... more fields |
| 50 | +} |
| 51 | +``` |
| 52 | + |
| 53 | +### 5 Schema Variations × 2 Cases = 10 Code Paths |
| 54 | + |
| 55 | +| Variation | Base Case | Recursive Case | |
| 56 | +|-----------|-----------|----------------| |
| 57 | +| **Standard** | 3-way JOIN (start→edge→end) | Recursive JOIN on prev end_id | |
| 58 | +| **FK-edge** | 2-way JOIN (node→FK target) | Recursive on FK column | |
| 59 | +| **Denormalized** | Single-table scan | Recursive single-table | |
| 60 | +| **Mixed denorm** | Hybrid JOIN | Hybrid recursive | |
| 61 | +| **Polymorphic** | Standard + WHERE type_column = 'X' | Recursive + type filter | |
| 62 | + |
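As a rough illustration of the Standard and Polymorphic rows above, the sketch below builds a recursive CTE over an edge table. It is not the module's actual implementation: `EdgeSpec`, `sketch_recursive_cte`, and the output column names `start_id`, `end_id`, and `hop_count` are assumptions, and the node-table JOINs are omitted for brevity. Only the shape follows the table: base case, `UNION ALL`, recursive case joined on the previous `end_id`, and the type filter repeated in both cases.

```rust
/// Hypothetical description of an edge table; not the real `GraphSchema` type.
struct EdgeSpec<'a> {
    table: &'a str,
    from_col: &'a str,
    to_col: &'a str,
    /// `(type_column, rel_type)` for polymorphic edge tables, `None` otherwise.
    type_filter: Option<(&'a str, &'a str)>,
}

/// Builds `WITH RECURSIVE <cte> AS (base UNION ALL recursive)` for one edge table.
fn sketch_recursive_cte(cte: &str, edge: &EdgeSpec, max_hops: u32) -> String {
    // The same type predicate is appended to BOTH the base and the recursive case.
    let type_filter = |alias: &str| -> String {
        match edge.type_filter {
            Some((col, ty)) => format!(" AND {alias}.{col} = '{ty}'"),
            None => String::new(),
        }
    };

    // Base case: every edge is a path of length 1.
    let base = format!(
        "SELECT e.{from} AS start_id, e.{to} AS end_id, 1 AS hop_count \
         FROM {table} e WHERE 1 = 1{flt}",
        from = edge.from_col,
        to = edge.to_col,
        table = edge.table,
        flt = type_filter("e"),
    );

    // Recursive case: extend each known path by one edge, bounded by max_hops.
    let recursive = format!(
        "SELECT p.start_id, e.{to} AS end_id, p.hop_count + 1 AS hop_count \
         FROM {cte} p JOIN {table} e ON e.{from} = p.end_id \
         WHERE p.hop_count < {max_hops}{flt}",
        from = edge.from_col,
        to = edge.to_col,
        table = edge.table,
        flt = type_filter("e"),
    );

    format!("WITH RECURSIVE {cte} AS (\n  {base}\n  UNION ALL\n  {recursive}\n)")
}
```

With `type_filter: None` the output corresponds to the Standard row. A lower bound such as `min_hops` would typically be applied by the query that reads from the CTE, which this sketch leaves out.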
| 63 | +### Key Functions |
| 64 | + |
| 65 | +``` |
| 66 | +generate_cte() |
| 67 | + └─ generate_recursive_sql() |
| 68 | + ├─ generate_heterogeneous_polymorphic_sql() // 2-CTE approach |
| 69 | + └─ standard path: |
| 70 | + ├─ generate_base_case() // First hop |
| 71 | + └─ generate_recursive_case_with_cte_name() // Subsequent hops |
| 72 | +``` |
| 73 | + |
| 74 | +### Critical Branching Points |
| 75 | + |
| 76 | +```rust |
| 77 | +// These flags decide which SQL shape is generated: |
| 78 | +if self.is_fk_edge { |
| 79 | + // No separate edge table — FK column on node table |
| 80 | + // JOIN: start_table.fk_col = end_table.id |
| 81 | +} |
| 82 | +if self.start_is_denormalized { |
| 83 | + // Start node properties come from edge table, not node table |
| 84 | + // SELECT: edge.start_col AS start_prop (not node.col) |
| 85 | +} |
| 86 | +if self.type_column.is_some() { |
| 87 | + // Polymorphic: add WHERE type_column = 'REL_TYPE' |
| 88 | + // Critical: must appear in BOTH base AND recursive case |
| 89 | +} |
| 90 | +if self.is_heterogeneous_polymorphic_path() { |
| 91 | + // Intermediate hops use different type than final hop |
| 92 | + // Generates TWO CTEs instead of one recursive CTE |
| 93 | +} |
| 94 | +``` |
| 95 | + |
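When working through the schema variations, it can help to name the variation a given flag combination falls into. The sketch below is one way to do that. `Variation`, `Flags`, and `classify` are hypothetical names, and the precedence (FK-edge first, then denormalization, then the type column) is an assumption: the real generator checks the flags independently, so they can combine.

```rust
/// Hypothetical labels matching the five rows of the schema variation table.
#[derive(Debug, PartialEq)]
enum Variation {
    Standard,
    FkEdge,
    Denormalized,
    MixedDenorm,
    Polymorphic,
}

/// Hypothetical subset of the generator's fields.
struct Flags {
    is_fk_edge: bool,
    start_is_denormalized: bool,
    end_is_denormalized: bool,
    has_type_column: bool,
}

/// Picks one label per flag combination (assumed precedence, see lead-in).
fn classify(f: &Flags) -> Variation {
    match (f.is_fk_edge, f.start_is_denormalized, f.end_is_denormalized) {
        (true, _, _) => Variation::FkEdge,
        (false, true, true) => Variation::Denormalized,
        // Exactly one side lives in the edge table.
        (false, true, false) | (false, false, true) => Variation::MixedDenorm,
        (false, false, false) if f.has_type_column => Variation::Polymorphic,
        (false, false, false) => Variation::Standard,
    }
}
```

A helper like this is mostly useful in tests, for example to assert that each checklist query exercises the variation it is meant to cover.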
| 96 | +## multi_type_vlp_joins.rs — Browser Expand |
| 97 | + |
| 98 | +### What It Does |
| 99 | +When the browser sends `MATCH path = (a)--(o)` (undirected, all types), this module generates: |
| 100 | +```sql |
| 101 | +SELECT ... FROM users a JOIN user_follows r ON ... JOIN users u2 ON ... |
| 102 | +UNION ALL |
| 103 | +SELECT ... FROM users a JOIN post_likes r ON ... JOIN posts p2 ON ... |
| 104 | +UNION ALL |
| 105 | +SELECT ... FROM users a JOIN posts p2 ON a.user_id = p2.user_id -- FK-edge |
| 106 | +``` |
| 107 | + |
| 108 | +### Key Function |
| 109 | +``` |
| 110 | +generate_cte_sql(cte_name) |
| 111 | + └─ for each path in enumerate_vlp_paths(): |
| 112 | + generate_path_branch_sql(path, idx) |
| 113 | + └─ generate_select_items(node_type, mode) |
| 114 | +``` |
| 115 | + |
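The loop above can be pictured as follows. This is a minimal sketch, not the real `generate_cte_sql`: `PathBranch` and `sketch_union_all` are hypothetical, and each branch is assumed to arrive with its SELECT items and FROM/JOIN text already rendered. The column-count check is an addition for illustration, since every `UNION ALL` branch must return the same number of columns.

```rust
/// Hypothetical pre-rendered branch for one enumerated path.
struct PathBranch {
    select_items: Vec<String>,
    from_and_joins: String,
}

/// Joins one SELECT per branch with UNION ALL, rejecting mismatched column counts.
fn sketch_union_all(branches: &[PathBranch]) -> Result<String, String> {
    let width = branches
        .first()
        .map(|b| b.select_items.len())
        .ok_or_else(|| "no branches".to_string())?;

    let mut parts = Vec::with_capacity(branches.len());
    for (idx, b) in branches.iter().enumerate() {
        if b.select_items.len() != width {
            return Err(format!(
                "branch {idx} has {} columns, expected {width}",
                b.select_items.len()
            ));
        }
        parts.push(format!(
            "SELECT {} FROM {}",
            b.select_items.join(", "),
            b.from_and_joins
        ));
    }
    Ok(parts.join("\nUNION ALL\n"))
}
```

Failing early on a width mismatch surfaces the "Mismatched SELECT in UNION ALL" bug pattern at generation time rather than as a ClickHouse error.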
| 116 | +### PropertySelectionMode |
| 117 | +```rust |
| 118 | +enum PropertySelectionMode { |
| 119 | + IdOnly, // Just start_id, end_id |
| 120 | + Individual, // Named columns per type |
| 121 | + WholeNode, // JSON blob (formatRowNoNewline) |
| 122 | +} |
| 123 | +``` |
| 124 | +Browser expand uses `WholeNode` because end nodes are heterogeneous (User vs Post): a single JSON blob keeps the column list identical across UNION ALL branches. |
| 125 | + |
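To make the three modes concrete, here is a hedged sketch of how SELECT items might differ per mode. `sketch_select_items` and its parameters are hypothetical and do not mirror the real `generate_select_items` signature. The `end_id` and `end_properties` column names are assumptions. The enum is repeated so the block stands alone.

```rust
enum PropertySelectionMode {
    IdOnly,
    Individual,
    WholeNode,
}

/// Renders the end-node SELECT items for one branch (hypothetical helper).
fn sketch_select_items(
    alias: &str,
    id_col: &str,
    props: &[&str],
    mode: &PropertySelectionMode,
) -> Vec<String> {
    let mut items = vec![format!("{alias}.{id_col} AS end_id")];
    match mode {
        PropertySelectionMode::IdOnly => {}
        // One output column per property: width depends on the node type.
        PropertySelectionMode::Individual => {
            items.extend(props.iter().map(|p| format!("{alias}.{p} AS end_{p}")));
        }
        // One JSON column regardless of how many properties the node type has.
        PropertySelectionMode::WholeNode => {
            let cols: Vec<String> = props.iter().map(|p| format!("{alias}.{p}")).collect();
            items.push(format!(
                "formatRowNoNewline('JSONEachRow', {}) AS end_properties",
                cols.join(", ")
            ));
        }
    }
    items
}
```

With `Individual`, a User branch and a Post branch produce different column counts, which cannot be combined with `UNION ALL`. With `WholeNode`, both produce exactly two columns.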
| 126 | +## to_sql_query.rs — VLP Rewriting |
| 127 | + |
| 128 | +### VLP Alias Rewriting |
| 129 | +After CTEs are generated, SELECT items reference Cypher aliases (`a.name`, `o.name`). |
| 130 | +These must be rewritten to CTE columns (`t.start_name`, `t.end_properties`). |
| 131 | + |
| 132 | +**Critical detection**: |
| 133 | +```rust |
| 134 | +// Standard VLP: FROM is the VLP CTE |
| 135 | +if from_ref.name.starts_with("vlp_") { ... } |
| 136 | + |
| 137 | +// OPTIONAL VLP: FROM is anchor table, VLP is LEFT JOINed |
| 138 | +// Must NOT rewrite anchor node properties! |
| 139 | + |
| 140 | +// WITH+VLP: FROM is VLP CTE, WITH CTE is JOINed |
| 141 | +// Must rewrite JOIN column to WITH CTE's actual ID column |
| 142 | +``` |
| 143 | + |
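The detection rule can be sketched as a small function. This is an illustration under assumptions, not the real `rewrite_vlp_select_aliases`: `sketch_rewrite` is a hypothetical name, expressions are assumed to be simple `alias.prop` strings, and end-node properties are mapped to `t.end_<prop>` (the `Individual` style) rather than to a JSON column.

```rust
/// Rewrites `alias.prop` to a VLP CTE column, but only when FROM is the VLP CTE.
/// When FROM is an anchor table (OPTIONAL VLP), the expression is left unchanged.
fn sketch_rewrite(expr: &str, from_name: &str, start_alias: &str, end_alias: &str) -> String {
    if !from_name.starts_with("vlp_") {
        // Anchor node properties must keep pointing at the anchor table.
        return expr.to_string();
    }
    match expr.split_once('.') {
        Some((alias, prop)) if alias == start_alias => format!("t.start_{prop}"),
        Some((alias, prop)) if alias == end_alias => format!("t.end_{prop}"),
        // Unknown alias or not a property access: leave untouched.
        _ => expr.to_string(),
    }
}
```

For example, `sketch_rewrite("a.name", "vlp_a_o", "a", "o")` yields `t.start_name`, while the same call with `from_name = "users"` returns `a.name` unchanged. The WITH+VLP case (rewriting the JOIN column to the WITH CTE's ID column) is not covered by this sketch.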
| 144 | +## Common Bug Patterns |
| 145 | + |
| 146 | +| Pattern | Symptom | Where | |
| 147 | +|---------|---------|-------| |
| 148 | +| Type filter missing in recursive case | Traverses wrong relationship types | `generate_recursive_case`: polymorphic WHERE | |
| 149 | +| FK-edge self-JOIN | Redundant JOIN on same table | `generate_base_case`: `is_fk_edge` + same table | |
| 150 | +| Wrong property source | Column not found | `start_is_denormalized` vs node table | |
| 151 | +| Heterogeneous path filter loss | Wrong intermediate nodes | `generate_heterogeneous_polymorphic_sql` | |
| 152 | +| JSON vs individual columns | Mismatched SELECT in UNION ALL | `PropertySelectionMode` mismatch across branches | |
| 153 | +| VLP rewriting on WITH CTE | Overwrites WITH CTE columns | `rewrite_vlp_select_aliases` not checking FROM type | |
| 154 | + |
| 155 | +## Testing After Changes |
| 156 | + |
| 157 | +```bash |
| 158 | +# VLP-specific tests: |
| 159 | +cargo test variable_length # VLP unit tests |
| 160 | +cargo test multi_type_vlp # Multi-type VLP tests |
| 161 | +cargo test test_vlp_with_cte # VLP+WITH regression |
| 162 | + |
| 163 | +# Manual: test the browser expand query |
| 164 | +curl -X POST localhost:8080/query -H "Content-Type: application/json" \ |
| 165 | + -d '{"query": "MATCH (a:User) WHERE a.user_id = 1 WITH a, size([(a)--() | 1]) AS c MATCH path = (a)--(o) RETURN path, c LIMIT 10", "sql_only": true}' |
| 166 | + |
| 167 | +# Check: the JOIN condition must reference "a_user_id", not "a_start_id" |
| 168 | +``` |
| 169 | + |
| 170 | +## Schema Variation Checklist |
| 171 | + |
| 172 | +When modifying VLP generation, verify SQL output for: |
| 173 | +- [ ] Standard: `MATCH (a:User)-[:FOLLOWS*1..3]->(b:User)` |
| 174 | +- [ ] FK-edge: `MATCH (o:Order)-[:PLACED_BY]->(c:Customer)` |
| 175 | +- [ ] Denormalized: `MATCH (a:Airport)-[:FLIGHT*1..2]->(b:Airport)` |
| 176 | +- [ ] Polymorphic: `MATCH (u:User)-[:FOLLOWS]->(f:User)` on `social_polymorphic` schema |
| 177 | +- [ ] Multi-type expand: `MATCH (a:User)--(o)` (browser pattern) |
| 178 | +- [ ] Undirected: `MATCH (a:User)-[r]-(b:User)` (UNION ALL both directions) |