Skip to content

Commit fadbff9

Browse files
committed
Improve conclusions in iterative KKT solvers findings doc
Add TL;DR section, summary table of all approaches, and restructure the analysis sections to make conclusions more prominent: why every full-KKT method failed (off-diagonal coupling), why CG on normal equations succeeds (preconditioner captures A^T D^{-1} A), and corrected conditioning analysis.
1 parent 042f9fb commit fadbff9

File tree

1 file changed

+80
-30
lines changed

1 file changed

+80
-30
lines changed

docs/iterative_kkt_solvers.md

Lines changed: 80 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,25 @@ This document records the experiments and findings from investigating iterative
44
(matrix-free) alternatives to direct factorization for solving KKT systems in
55
QTQP's primal-dual interior point method.
66

7+
## TL;DR
8+
9+
**CG on the normal equations is the only iterative method that works.** All
10+
attempts to solve the full (n+m) x (n+m) KKT system iteratively failed --
11+
including MINRES with diagonal scaling (4 variants), PETSc GMRES with block
12+
fieldsplit (6 configurations), full ILU (2 configs), and Schur complement
13+
fieldsplit (5 configs). Every method breaks down at small mu (barrier parameter)
14+
because no tested preconditioner adequately captures the off-diagonal A/A^T
15+
coupling that dominates the KKT system as mu -> 0.
16+
17+
CG on normal equations works by reducing to an n x n SPD system where a cheap
18+
diagonal preconditioner (`diag(P) + mu + colsum(A^2/d)`) captures both
19+
diagonal and off-diagonal contributions. This is implemented as
20+
`LinearSolver.CG` and passes all 218 tests.
21+
22+
**If you are revisiting this work**: the barrier is preconditioning, not the
23+
Krylov method. Any future attempt on the full KKT needs a preconditioner that
24+
accounts for A/A^T coupling -- see "Future Directions" at the end.
25+
726
## Background
827

928
QTQP solves QPs via a primal-dual IPM. Each iteration solves multiple KKT systems
@@ -33,7 +52,21 @@ QTQP adds `mu*I` to both diagonal blocks (not just the (1,1) block). This means:
3352
The condition number of the KKT matrix is O(||A||^2 / mu^2), which grows as
3453
mu -> 0. This is the central challenge for iterative methods.
3554

36-
## Approaches Tested
55+
## Summary of Results
56+
57+
| Approach | Passes tests? | Failure mode |
58+
|----------|:---:|---|
59+
| CG on normal equations | YES | -- |
60+
| MINRES + diagonal preconditioner (scipy) | no | scipy checks preconditioned residual, not true residual |
61+
| MINRES unpreconditioned | no | kappa = O(1/mu^2), cannot converge at small mu |
62+
| MINRES + explicit diagonal scaling | no | unscaling amplifies residual by O(1/sqrt(mu)) |
63+
| MINRES + scaling + iterative refinement | no | each refinement step has same amplification |
64+
| PETSc GMRES + ICC/Jacobi fieldsplit | no | block PC ignores A/A^T coupling, breaks down at mu ~ 1e-4 |
65+
| PETSc GMRES + full ILU | no | ILU quality degrades at small mu |
66+
| PETSc Schur complement (5 variants) | no | inner solve needs normal-equations-quality preconditioner |
67+
| Direct LU (baseline) | YES | -- (but O(nnz^1.5) fill-in) |
68+
69+
## Detailed Experiments
3770

3871
### 1. CG on Normal Equations (WORKS -- shipped as `LinearSolver.CG`)
3972

@@ -241,48 +274,65 @@ Schur complement only uses -(D+mu*I) (the (2,2) block), missing the
241274
for the Schur complement would need `diag(A(P+mu*I)^{-1}A^T)` -- which is
242275
exactly what CgNormalEqSolver already computes.
243276

244-
## Why CG on Normal Equations is the Right Approach
245-
246-
The experiments above show a consistent pattern: all iterative methods on the
247-
full (n+m) x (n+m) KKT system fail at small mu because:
248-
249-
1. **Block preconditioners** ignore the A/A^T coupling that dominates at small mu
250-
2. **Full ILU** quality degrades with the condition number
251-
3. **Diagonal scaling** reduces kappa but introduces unscaling amplification
252-
4. **Schur complement** needs a good inner preconditioner, which is the normal
253-
equations diagonal -- exactly what CG on normal equations already uses
254-
255-
CG on normal equations succeeds because:
256-
257-
1. **SPD structure**: The reduction eliminates the indefinite structure, giving
258-
an n x n SPD system amenable to CG
259-
2. **Effective preconditioner**: `diag(N) = diag(P) + mu + colsum(A^2/d)` captures
260-
both the P and A^T D^{-1} A contributions to N's diagonal
261-
3. **Matrix-free**: N is never formed explicitly; each CG iteration costs
262-
O(nnz(A)) via `N v = (P+mu*I)v + A^T(D^{-1}(Av))`
277+
## Why Every Full-KKT Iterative Method Failed
278+
279+
All tested methods on the full (n+m) x (n+m) KKT system share the same root
280+
cause: **no tested preconditioner captures the off-diagonal A/A^T coupling**.
281+
282+
As mu -> 0, the KKT diagonal blocks shrink to O(mu) while the off-diagonal
283+
blocks A, A^T remain O(||A||). This means the off-diagonal coupling dominates
284+
the system, and any preconditioner that ignores it (block-diagonal, diagonal
285+
scaling, ILU with limited fill) becomes ineffective.
286+
287+
Specifically:
288+
- **Block preconditioners** treat A/A^T as zero -- completely wrong at small mu
289+
- **Diagonal scaling** equalizes the diagonal but cannot touch the off-diagonal
290+
structure, and unscaling amplifies the residual by O(1/sqrt(mu))
291+
- **Full ILU** tries to capture coupling but its quality degrades as
292+
kappa = O(1/mu^2) grows -- the incomplete factorization becomes a poor
293+
approximation of a very ill-conditioned matrix
294+
- **Schur complement** properly accounts for coupling in principle, but the
295+
inner solve on the Schur complement S = D + A H^{-1} A^T needs a
296+
preconditioner that captures `diag(A H^{-1} A^T)` -- which is exactly
297+
the normal equations preconditioner
298+
299+
## Why CG on Normal Equations Succeeds
300+
301+
CG on normal equations avoids the preconditioning barrier by **eliminating the
302+
indefinite structure entirely**:
303+
304+
1. **SPD reduction**: Substituting y out gives N = H + A^T D^{-1} A, an n x n
305+
SPD system. No indefiniteness, no sign-dependent amplification.
306+
2. **The preconditioner captures off-diagonal coupling**: `diag(N) = diag(P) + mu + colsum(A^2/d)` includes the `A^T D^{-1} A` contribution. This is
307+
the key difference -- every failed method lacked this.
308+
3. **Matrix-free**: N is never formed; each CG iteration costs O(nnz(A)) via
309+
`N v = (P+mu*I)v + A^T(D^{-1}(Av))`
263310
4. **No fill-in**: Unlike direct methods, there is no sparse factorization
264311

265-
The Schur complement approach via PETSc is mathematically equivalent but with
266-
more complexity and no advantage over the scipy CG implementation.
312+
The PETSc Schur complement approach is mathematically equivalent (the Schur
313+
complement IS the normal equations) but with more complexity and no advantage
314+
over scipy CG with the diagonal preconditioner.
267315

268316
## Conditioning Analysis
269317

270-
Both the full KKT and normal equations N have condition number O(||A||^2/mu^2):
318+
**Key fact**: Both the full KKT and normal equations N have condition number
319+
**O(||A||^2/mu^2)** -- they are equally ill-conditioned.
271320

272321
- **KKT**: eigenvalues in `[-max(d+mu), -mu] U [mu, max(diag(P)+mu) + ||A||^2/mu]`,
273322
giving `kappa(KKT) = O(||A||^2/mu^2)`
274323
- **Normal equations**: `N = P + mu*I + A^T D^{-1} A` has
275324
`lambda_min >= mu` and `lambda_max >= ||A_eq||^2/mu`,
276325
giving `kappa(N) >= ||A_eq||^2/mu^2`
277326

278-
The diagonal preconditioner for N reduces the effective condition number
279-
significantly in practice, making CG iteration counts manageable across all
280-
mu values encountered in the IPM.
327+
CG on normal equations does NOT succeed because of better conditioning. It
328+
succeeds because its diagonal preconditioner is effective (capturing both
329+
diagonal and off-diagonal contributions to N), while no comparably cheap
330+
preconditioner exists for the full indefinite KKT.
281331

282-
An initial claim that QTQP's mu*I regularization gives O(1/mu) conditioning
283-
(vs O(1/mu^2) for standard IPMs) was incorrect. The mu*I terms bound the
284-
smallest eigenvalue at O(mu), but the largest eigenvalue is dominated by the
285-
off-diagonal coupling ||A||^2/mu, giving O(1/mu^2) regardless.
332+
**Corrected misconception**: An initial claim that QTQP's mu*I regularization
333+
gives O(1/mu) conditioning (vs O(1/mu^2) for standard IPMs) was wrong. The
334+
mu*I terms bound the smallest eigenvalue at O(mu), but the largest eigenvalue
335+
is dominated by the off-diagonal coupling ||A||^2/mu, giving O(1/mu^2).
286336

287337
## Code in This Branch
288338

0 commit comments

Comments
 (0)