|
1 | 1 | # v0.6.2 Release Readiness Assessment |
2 | 2 |
|
3 | | -**Date**: January 18, 2026 |
| 3 | +**Date**: January 19, 2026 |
4 | 4 | **Target**: Stable bugfix release before v0.7.0-beta |
5 | | -**Status**: 🟡 **IN PROGRESS** - 2 critical bugs remain |
| 5 | +**Status**: 🟡 **IN PROGRESS** - 2 critical bugs remain, 4 bugs fixed today |
6 | 6 |
|
7 | 7 | --- |
8 | 8 |
|
9 | 9 | ## Executive Summary |
10 | 10 |
|
11 | | -**Current State**: v0.6.1 has production-ready core functionality but contains 2 known bugs that block important query patterns. A v0.6.2 bugfix release is recommended before moving to beta. |
| 11 | +**Current State**: v0.6.1 has production-ready core functionality but contains bugs that block important query patterns. Fixed 4 bugs today (Bug #9, #10, #11, #14) during systematic integration testing. |
12 | 12 |
|
13 | | -**Test Status**: |
| 13 | +**Test Status** (Jan 19, 2026 - 12:47 PM): |
14 | 14 | - ✅ Unit tests: 770/770 passing (100%) |
15 | | -- ⚠️ Integration tests: Blocked by ClickHouse auth setup |
16 | | -- ✅ Core functionality: Stable and tested |
17 | | - |
18 | | -**Critical Path**: Fix 2 bugs → Validate with integration tests → Release v0.6.2 |
| 15 | +- 🟡 Integration tests: **104/354 passing (29.4%)** - systematic bug fixing in progress |
| 16 | + - TestBasicPatterns: ✅ All passing (24/24) |
| 17 | + - TestMultiHopPatterns: ✅ All passing (16/16) |
| 18 | + - TestVariableLengthPaths: ✅ All passing (16/16) |
| 19 | + - TestShortestPath: ⚠️ Skipped (performance issue - memory exhaustion) |
| 20 | + - TestOptionalMatch: ✅ **All passing (6/6)** - Bug #11 fixed! |
| 21 | + - TestWithChaining: ❌ All xfail (0/12) - Bug #3 (duplicate CTE names) |
| 22 | + - TestAggregations: ✅ Most passing (12/17) |
| 23 | + - TestGroupBy: ✅ **Most passing (3/4)** - Bug #14 fixed! |
| 24 | + - TestOrdering: ✅ All passing (6/8, 2 zeek skipped) |
| 25 | + - TestExpressions: ✅ All passing (18/24, 6 zeek skipped) |
| 26 | + - TestMultiplePatterns: ✅ All passing (2/2 xpassed!) |
| 27 | + - TestFunctions: ✅ Most passing (10/12) |
| 28 | +- ✅ Bug #9 (duplicate alias): **FIXED** (Jan 19) |
| 29 | +- ✅ Bug #10 (WHERE duplication): **FIXED** (Jan 19) |
| 30 | +- ✅ Bug #11 (anyLast wrapping): **FIXED** (Jan 19) |
| 31 | +- ✅ Bug #14 (missing FROM with GROUP BY): **FIXED** (Jan 19) |
| 32 | + |
| 33 | +**Critical Path**: Fix remaining bugs → Validate with integration tests → Release v0.6.2 |
19 | 34 |
|
20 | 35 | --- |
21 | 36 |
|
22 | 37 | ## Known Bugs for v0.6.2 |
23 | 38 |
|
24 | | -### 1. 🔥 **CRITICAL**: OPTIONAL MATCH + VLP Combination |
| 39 | +### 1. 🔥 **CRITICAL**: Missing anyLast() Wrapping for GROUP BY Queries |
| 40 | + |
| 41 | +**Priority**: HIGH (blocks OPTIONAL MATCH + aggregation) |
| 42 | +**Status**: ✅ **FIXED** (Jan 19, 2026) |
| 43 | + |
| 44 | +**Symptom**: |
| 45 | +```cypher |
| 46 | +MATCH (a:User) WHERE a.country >= 'test' |
| 47 | +OPTIONAL MATCH (a)-[r:FOLLOWS]->(b) |
| 48 | +RETURN a, count(b) as cnt |
| 49 | +LIMIT 10 |
| 50 | +``` |
| 51 | +**Error**: `Code: 215. DB::Exception: Column 'a.city' is not under aggregate function and not in GROUP BY keys` |
| 52 | + |
| 53 | +**Root Cause**: |
| 54 | +- When `RETURN a` is expanded to all properties (`a.city`, `a.country`, `a.email`, etc.) |
| 55 | +- AND there's an aggregation (`count(b)`) causing GROUP BY |
| 56 | +- The GROUP BY optimization only includes ID column (`GROUP BY a.user_id`) |
| 57 | +- But SELECT includes ALL columns without `anyLast()` wrapping |
| 58 | +- ClickHouse requires: non-aggregated, non-grouped columns must use aggregate functions |
| 59 | + |
| 60 | +**Generated SQL** (broken): |
| 61 | +```sql |
| 62 | +SELECT |
| 63 | + a.city AS "a.city", -- ❌ Not in GROUP BY, not wrapped |
| 64 | + a.country AS "a.country", -- ❌ Not in GROUP BY, not wrapped |
| 65 | + a.user_id AS "a.user_id", -- ✅ In GROUP BY (OK) |
| 66 | + count(b.user_id) AS "cnt" -- ✅ Aggregate function (OK) |
| 67 | +FROM users_bench AS a |
| 68 | +LEFT JOIN user_follows_bench AS r ... |
| 69 | +GROUP BY a.user_id -- Only ID column |
| 70 | +LIMIT 10 |
| 71 | +``` |
| 72 | + |
| 73 | +**Should Generate**: |
| 74 | +```sql |
| 75 | +SELECT |
| 76 | + anyLast(a.city) AS "a.city", -- ✅ Wrapped |
| 77 | + anyLast(a.country) AS "a.country", -- ✅ Wrapped |
| 78 | + a.user_id AS "a.user_id", -- ✅ In GROUP BY |
| 79 | + count(b.user_id) AS "cnt" -- ✅ Aggregate |
| 80 | +FROM users_bench AS a |
| 81 | +LEFT JOIN user_follows_bench AS r ... |
| 82 | +GROUP BY a.user_id |
| 83 | +LIMIT 10 |
| 84 | +``` |
| 85 | + |
| 86 | +**Impact**: |
| 87 | +- **Severity**: HIGH - Blocks OPTIONAL MATCH + aggregation + RETURN node |
| 88 | +- **Frequency**: HIGH - Common pattern in analytics |
| 89 | +- **Workaround**: Return only specific properties instead of entire node |
| 90 | +- **Blocked queries**: `RETURN a, count(b)`, `RETURN n, avg(r.weight)` |
| 91 | +- **Test Impact**: Blocks TestOptionalMatch::test_optional_match_with_filter (3/4 tests) |
| 92 | + |
| 93 | +**Technical Details**: |
| 94 | +- Issue discovered during systematic integration testing (Jan 19) |
| 95 | +- Existing `anyLast()` infrastructure in `property_expansion.rs` works correctly |
| 96 | +- Problem: `expand_alias_to_select_items_unified()` not being called for `RETURN a` expansion |
| 97 | +- Current code path: `select_builder.rs` manually expands without aggregation context |
| 98 | +- Fix location: `render_plan/plan_builder.rs::to_render_plan()` - post-process SELECT items after extracting GROUP BY |
| 99 | + |
| 100 | +**Fix Strategy** (Implemented): |
| 101 | +1. ✅ Added helper function `apply_anylast_wrapping_for_group_by()` in `plan_builder.rs` |
| 102 | +2. ✅ Called after extracting SELECT items and GROUP BY in `to_render_plan()` (lines ~720, ~750) |
| 103 | +3. ✅ For each SELECT item that is `PropertyAccessExp`: |
| 104 | + - Skip if already an aggregate function |
| 105 | + - Skip if in GROUP BY expressions |
| 106 | + - Skip if ID column (ends with `_id`, `.id`, or equals `id`) |
| 107 | + - Otherwise wrap with `anyLast(PropertyAccessExp)` |
| 108 | +4. ✅ Tested: All TestOptionalMatch tests now passing (6/6) |
| 109 | + |
| 110 | +**Files Modified**: |
| 111 | +- `src/render_plan/plan_builder.rs` - Added post-processing function (lines 102-198) |
| 112 | +- Applied in both GraphJoins and GraphRel cases (lines ~720, ~750) |
| 113 | + |
| 114 | +**Test Results**: |
| 115 | +- ✅ TestOptionalMatch: 6/6 passing (was 3/6 before fix) |
| 116 | +- ✅ Generated SQL correctly includes `anyLast()` for non-ID columns |
| 117 | +- ✅ ID columns remain unwrapped in SELECT |
| 118 | +- ✅ Aggregate functions remain unchanged |
| 119 | + |
| 120 | +**Estimated Fix Time**: 1-2 days ✅ **ACTUAL: 4 hours** |
| 121 | + |
| 122 | +**Related Functionality**: |
| 123 | +- ✅ Works in CTEs (existing code in CTE generation) |
| 124 | +- ✅ Works when called via `expand_alias_to_select_items_unified()` |
| 125 | +- ❌ Broken in main SELECT when `RETURN a` is expanded |
| 126 | + |
| 127 | +--- |
| 128 | + |
| 129 | +### 2. 🔥 **CRITICAL**: OPTIONAL MATCH + VLP Combination |
25 | 130 |
|
26 | 131 | **Priority**: HIGH (blocks multiple query patterns) |
27 | 132 | **Status**: 🐛 **UNFIXED** |
@@ -56,7 +161,7 @@ RETURN a.name, COUNT(DISTINCT b) as reachable |
56 | 161 |
|
57 | 162 | --- |
58 | 163 |
|
59 | | -### 2. 🔥 **CRITICAL**: Duplicate CTE Names with Chained WITH |
| 164 | +### 3. 🔥 **CRITICAL**: Duplicate CTE Names with Chained WITH |
60 | 165 |
|
61 | 166 | **Priority**: HIGH (blocks common patterns) |
62 | 167 | **Status**: 🐛 **UNFIXED** |
@@ -105,8 +210,96 @@ SELECT total, cnt FROM with_total_cte_1 |
105 | 210 |
|
106 | 211 | --- |
107 | 212 |
|
| 213 | +### 4. ⚡ **PERFORMANCE**: Shortest Path Memory Exhaustion on Large Graphs |
| 214 | + |
| 215 | +**Priority**: MEDIUM (performance optimization) |
| 216 | +**Status**: 🐛 **IDENTIFIED** (Jan 19, 2026) |
| 217 | + |
| 218 | +**Symptom**: |
| 219 | +```cypher |
| 220 | +MATCH p = shortestPath((a:User)-[:FOLLOWS*]->(b:User)) |
| 221 | +WHERE a.user_id = 1 |
| 222 | +RETURN p LIMIT 10 |
| 223 | +``` |
| 224 | +**Error**: `Code: 241. DB::Exception: memory limit exceeded: would use 70.69 GiB` |
| 225 | + |
| 226 | +**Root Cause**: |
| 227 | +- Shortest path without end node filter explores ALL paths from start to all reachable nodes |
| 228 | +- Query tracks millions of paths simultaneously before applying LIMIT |
| 229 | +- ClickHouse memory limit (70.69 GiB) exceeded on large social networks |
| 230 | +- No early termination optimization when LIMIT is reached |
| 231 | + |
| 232 | +**Generated SQL** (syntactically correct): |
| 233 | +```sql |
| 234 | +-- CTE correctly includes start filter |
| 235 | +WHERE start_node.user_id = 1 |
| 236 | + |
| 237 | +-- Outer query correctly has no WHERE clause (Bug #10 fixed) |
| 238 | +SELECT p.* FROM vlp_a_b AS p LIMIT 1 |
| 239 | +``` |
| 240 | + |
| 241 | +**Impact**: |
| 242 | +- **Severity**: MEDIUM - SQL is correct, but impractical for large graphs |
| 243 | +- **Frequency**: LOW - Only affects shortest path on large graphs without end filter |
| 244 | +- **Workaround**: Add end node filter (e.g., `WHERE a.user_id = 1 AND b.user_id = 100`) |
| 245 | +- **Blocked queries**: TestShortestPath integration test class |
| 246 | +- **Category**: Performance optimization, not syntax bug |
| 247 | + |
| 248 | +**Fixes Required (Bugs #9 & #10)**: |
| 249 | +- ✅ Bug #9: Path variable duplicate alias (`SELECT p.*` instead of `SELECT p AS "p"`) |
| 250 | +- ✅ Bug #10: WHERE clause duplication (skip GraphRel filter extraction for VLP/shortest path) |
| 251 | +- **Both bugs fixed Jan 19** - SQL generation now correct |
| 252 | + |
| 253 | +**Potential Solutions** (future optimization work): |
| 254 | +- Implement early termination when LIMIT rows are reached |
| 255 | +- Add bidirectional search optimization (meet-in-middle) |
| 256 | +- Require end node filter for large graphs |
| 257 | +- Add query-level resource limits/timeouts |
| 258 | +- Stream results instead of materializing all paths |
| 259 | + |
| 260 | +**Files Involved**: |
| 261 | +- `src/clickhouse_query_generator/to_sql_query.rs` - Bug #9 fix (SELECT items) |
| 262 | +- `src/render_plan/filter_builder.rs` - Bug #10 fix (GraphRel filter extraction) |
| 263 | +- `src/query_planner/optimizer/` - Future optimization passes |
| 264 | + |
| 265 | +**Estimated Fix Time**: 4-5 days (future work) |
| 266 | +- Day 1: Design early termination strategy |
| 267 | +- Day 2: Implement bidirectional search |
| 268 | +- Day 3-4: Add streaming result support |
| 269 | +- Day 5: Test and validate performance gains |
| 270 | + |
| 271 | +**Status**: Deferred to future optimization work (post-v0.6.2) |
| 272 | + |
| 273 | +--- |
| 274 | + |
108 | 275 | ## Recently Fixed (v0.6.1 → v0.6.2) |
109 | 276 |
|
| 277 | +### ✅ Bug #11: Missing anyLast() Wrapping for GROUP BY (Fixed Jan 19) |
| 278 | +- **Was**: `RETURN a, count(b)` failed with "Column not under aggregate function" |
| 279 | +- **Fix**: Post-process SELECT items to wrap non-ID, non-aggregated columns with `anyLast()` when GROUP BY present |
| 280 | +- **Impact**: Unblocked OPTIONAL MATCH + aggregation + RETURN node (TestOptionalMatch now 6/6 passing) |
| 281 | +- **File**: `src/render_plan/plan_builder.rs` (added 102 lines) |
| 282 | +- **Time**: 4 hours (faster than 1-2 day estimate) |
| 283 | + |
| 284 | +### ✅ Bug #14: Missing FROM Clause with GROUP BY (Fixed Jan 19) |
| 285 | +- **Was**: Single-node queries with GROUP BY failed with missing FROM clause |
| 286 | +- **Example**: `MATCH (n:User) RETURN n.name, count(*)` generated SQL without FROM |
| 287 | +- **Root Cause**: `find_graph_node()` helper couldn't traverse through `GroupBy` logical plan nodes |
| 288 | +- **Fix**: Added `LogicalPlan::GroupBy` case to traverse helper in `extract_from_graph_joins()` |
| 289 | +- **Impact**: Unblocked single-node aggregation queries (TestGroupBy now 3/4 passing) |
| 290 | +- **File**: `src/render_plan/from_builder.rs` (1 line added at line 554) |
| 291 | +- **Time**: 30 minutes |
| 292 | + |
| 293 | +### ✅ Bug #9: Path Variable Duplicate Alias (Fixed Jan 19) |
| 294 | +- **Was**: `RETURN p` failed with "Already registered p AS p" error |
| 295 | +- **Fix**: Render path variable as `p.*` instead of `p AS "p"` in SELECT |
| 296 | +- **Impact**: Unblocked path variable queries with shortest path/VLP |
| 297 | + |
| 298 | +### ✅ Bug #10: WHERE Clause Duplication in VLP (Fixed Jan 19) |
| 299 | +- **Was**: VLP queries failed with "Unknown identifier a.user_id" in outer SELECT |
| 300 | +- **Fix**: Skip GraphRel filter extraction for VLP/shortest path (filters already in CTE) |
| 301 | +- **Impact**: Proper WHERE clause scoping for variable-length path queries |
| 302 | + |
110 | 303 | ### ✅ MULTI_TABLE_LABEL Node Expansion (Fixed Jan 12) |
111 | 304 | - **Was**: `RETURN n` failed with dotted column names (e.g., `id.orig_h`) |
112 | 305 | - **Fix**: Changed column name extraction to preserve dotted columns |
|
0 commit comments