Commit f3404f1

Finish fix precon.

1 parent 32aa893
6 files changed: 141 additions & 45 deletions


Changelog.md

Lines changed: 7 additions & 0 deletions

```diff
@@ -6,6 +6,13 @@ The file was started with Version `0.4`.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.5.10] April 4, 2025
+
+### Fixed
+
+* a proper implementation of the preconditioning for `quasi_Newton`, that can be used instead
+  of or in combination with the initial scaling.
+
 ## [0.5.9] March 24, 2025
 
 ### Added
```

Project.toml

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 name = "Manopt"
 uuid = "0fc0a36d-df90-57f3-8f93-d78a9fc72bb5"
 authors = ["Ronny Bergmann <manopt@ronnybergmann.net>"]
-version = "0.5.9"
+version = "0.5.10"
 
 [deps]
 ColorSchemes = "35d6a980-a343-548e-a6ea-1d62b119f2f4"
```

docs/src/solvers/quasi_Newton.md

Lines changed: 15 additions & 14 deletions

````diff
@@ -14,63 +14,63 @@
 The aim is to minimize a real-valued function on a Riemannian manifold, that is
 
 ```math
-\min f(x), \quad x ∈ \mathcal{M}.
+\min f(p), \quad p ∈ \mathcal{M}.
 ```
 
-Riemannian quasi-Newtonian methods are as generalizations of their Euclidean counterparts Riemannian line search methods. These methods determine a search direction ``η_k ∈ T_{x_k} \mathcal{M}`` at the current iterate ``x_k`` and a suitable stepsize ``α_k`` along ``\gamma(α) = R_{x_k}(α η_k)``, where ``R: T \mathcal{M} →\mathcal{M}`` is a retraction. The next iterate is obtained by
+Riemannian quasi-Newtonian methods are as generalizations of their Euclidean counterparts Riemannian line search methods. These methods determine a search direction ``η_k ∈ T_{p_k} \mathcal{M}`` at the current iterate ``p_k`` and a suitable stepsize ``α_k`` along ``\gamma(α) = R_{p_k}(α η_k)``, where ``R: T \mathcal{M} →\mathcal{M}`` is a retraction. The next iterate is obtained by
 
 ```math
-x_{k+1} = R_{x_k}(α_k η_k).
+p_{k+1} = R_{p_k}(α_k η_k).
 ```
 
 In quasi-Newton methods, the search direction is given by
 
 ```math
-η_k = -{\mathcal{H}_k}^{-1}[\operatorname{grad}f (x_k)] = -\mathcal{B}_k [\operatorname{grad} (x_k)],
+η_k = -{\mathcal{H}_k}^{-1}[\operatorname{grad}f (p_k)] = -\mathcal{B}_k [\operatorname{grad} (p_k)],
 ```
 
-where ``\mathcal{H}_k : T_{x_k} \mathcal{M} →T_{x_k} \mathcal{M}`` is a positive definite self-adjoint operator, which approximates the action of the Hessian ``\operatorname{Hess} f (x_k)[⋅]`` and ``\mathcal{B}_k = {\mathcal{H}_k}^{-1}``. The idea of quasi-Newton methods is instead of creating a complete new approximation of the Hessian operator ``\operatorname{Hess} f(x_{k+1})`` or its inverse at every iteration, the previous operator ``\mathcal{H}_k`` or ``\mathcal{B}_k`` is updated by a convenient formula using the obtained information about the curvature of the objective function during the iteration. The resulting operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` acts on the tangent space ``T_{x_{k+1}} \mathcal{M}`` of the freshly computed iterate ``x_{k+1}``.
+where ``\mathcal{H}_k : T_{p_k} \mathcal{M} →T_{p_k} \mathcal{M}`` is a positive definite self-adjoint operator, which approximates the action of the Hessian ``\operatorname{Hess} f (p_k)[⋅]`` and ``\mathcal{B}_k = {\mathcal{H}_k}^{-1}``. The idea of quasi-Newton methods is instead of creating a complete new approximation of the Hessian operator ``\operatorname{Hess} f(p_{k+1})`` or its inverse at every iteration, the previous operator ``\mathcal{H}_k`` or ``\mathcal{B}_k`` is updated by a convenient formula using the obtained information about the curvature of the objective function during the iteration. The resulting operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` acts on the tangent space ``T_{p_{k+1}} \mathcal{M}`` of the freshly computed iterate ``p_{k+1}``.
 In order to get a well-defined method, the following requirements are placed on the new operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` that is created by an update.
-Since the Hessian ``\operatorname{Hess} f(x_{k+1})`` is a self-adjoint operator on the tangent space ``T_{x_{k+1}} \mathcal{M}``, and ``\mathcal{H}_{k+1}`` approximates it, one requirement is, that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is also self-adjoint on ``T_{x_{k+1}} \mathcal{M}``.
+Since the Hessian ``\operatorname{Hess} f(p_{k+1})`` is a self-adjoint operator on the tangent space ``T_{p_{k+1}} \mathcal{M}``, and ``\mathcal{H}_{k+1}`` approximates it, one requirement is, that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is also self-adjoint on ``T_{p_{k+1}} \mathcal{M}``.
 In order to achieve a steady descent, the next requirement is that ``η_k`` is a descent direction in each iteration.
-Hence a further requirement is that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is a positive definite operator on ``T_{x_{k+1}} \mathcal{M}``.
+Hence a further requirement is that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is a positive definite operator on ``T_{p_{k+1}} \mathcal{M}``.
 In order to get information about the curvature of the objective function into the new operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}``, the last requirement is a form of a Riemannian quasi-Newton equation:
 
 ```math
-\mathcal{H}_{k+1} [T_{x_k \rightarrow x_{k+1}}({R_{x_k}}^{-1}(x_{k+1}))] = \operatorname{grad}(x_{k+1}) - T_{x_k \rightarrow x_{k+1}}(\operatorname{grad}f(x_k))
+\mathcal{H}_{k+1} [T_{p_k \rightarrow p_{k+1}}({R_{p_k}}^{-1}(p_{k+1}))] = \operatorname{grad}(p_{k+1}) - T_{p_k \rightarrow p_{k+1}}(\operatorname{grad}f(p_k))
 ```
 
 or
 
 ```math
-\mathcal{B}_{k+1} [\operatorname{grad}f(x_{k+1}) - T_{x_k \rightarrow x_{k+1}}(\operatorname{grad}f(x_k))] = T_{x_k \rightarrow x_{k+1}}({R_{x_k}}^{-1}(x_{k+1}))
+\mathcal{B}_{k+1} [\operatorname{grad}f(p_{k+1}) - T_{p_k \rightarrow p_{k+1}}(\operatorname{grad}f(p_k))] = T_{p_k \rightarrow p_{k+1}}({R_{p_k}}^{-1}(p_{k+1}))
 ```
 
-where ``T_{x_k \rightarrow x_{k+1}} : T_{x_k} \mathcal{M} →T_{x_{k+1}} \mathcal{M}`` and
+where ``T_{p_k \rightarrow p_{k+1}} : T_{p_k} \mathcal{M} →T_{p_{k+1}} \mathcal{M}`` and
 the chosen retraction ``R`` is the associated retraction of ``T``.
 Note that, of course, not all updates in all situations meet these conditions in every iteration.
 For specific quasi-Newton updates, the fulfilment of the Riemannian curvature condition, which requires that
 
 ```math
-g_{x_{k+1}}(s_k, y_k) > 0
+g_{p_{k+1}}(s_k, y_k) > 0
 ```
 
 holds, is a requirement for the inheritance of the self-adjointness and positive definiteness of the ``\mathcal{H}_k`` or ``\mathcal{B}_k`` to the operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}``. Unfortunately, the fulfilment of the Riemannian curvature condition is not given by a step size ``\alpha_k > 0`` that satisfies the generalized Wolfe conditions. However, to create a positive definite operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` in each iteration, the so-called locking condition was introduced in [HuangGallivanAbsil:2015](@cite), which requires that the isometric vector transport ``T^S``, which is used in the update formula, and its associate retraction ``R`` fulfil
 
 ```math
-T^{S}{x, ξ_x}(ξ_x) = β T^{R}{x, ξ_x}(ξ_x), \quad β = \frac{\lVert ξ_x \rVert_x}{\lVert T^{R}{x, ξ_x}(ξ_x) \rVert_{R_{x}(ξ_x)}},
+T^{S}{p, ξ_p}(ξ_p) = β T^{R}{p, ξ_p}(ξ_p), \quad β = \frac{\lVert ξ_p \rVert_p}{\lVert T^{R}{p, ξ_p}(ξ_p) \rVert_{R_{p}(ξ_p)}},
 ```
 
 where ``T^R`` is the vector transport by differentiated retraction. With the requirement that the isometric vector transport ``T^S`` and its associated retraction ``R`` satisfies the locking condition and using the tangent vector
 
 ```math
-y_k = {β_k}^{-1} \operatorname{grad}f(x_{k+1}) - T^{S}{x_k, α_k η_k}(\operatorname{grad}f(x_k)),
+y_k = {β_k}^{-1} \operatorname{grad}f(p_{k+1}) - T^{S}{p_k, α_k η_k}(\operatorname{grad}f(p_k)),
 ```
 
 where
 
 ```math
-β_k = \frac{\lVert α_k η_k \rVert_{x_k}}{\lVert T^{R}{x_k, α_k η_k}(α_k η_k) \rVert_{x_{k+1}}},
+β_k = \frac{\lVert α_k η_k \rVert_{p_k}}{\lVert T^{R}{p_k, α_k η_k}(α_k η_k) \rVert_{p_{k+1}}},
 ```
 
 in the update, it can be shown that choosing a stepsize ``α_k > 0`` that satisfies the Riemannian Wolfe conditions leads to the fulfilment of the Riemannian curvature condition, which in turn implies that the operator generated by the updates is positive definite.
@@ -87,6 +87,7 @@ QuasiNewtonMatrixDirectionUpdate
 QuasiNewtonLimitedMemoryDirectionUpdate
 QuasiNewtonCautiousDirectionUpdate
 Manopt.initialize_update!
+QuasiNewtonPreconditioner
 ```
 
 ## Hessian update rules
````

src/Manopt.jl

Lines changed: 1 addition & 0 deletions

```diff
@@ -425,6 +425,7 @@ export SymmetricLinearSystemObjective
 
 export QuasiNewtonState, QuasiNewtonLimitedMemoryDirectionUpdate
 export QuasiNewtonMatrixDirectionUpdate
+export QuasiNewtonPreconditioner
 export QuasiNewtonCautiousDirectionUpdate,
     BFGS, InverseBFGS, DFP, InverseDFP, SR1, InverseSR1
 export InverseBroyden, Broyden
```

src/plans/quasi_newton_plan.jl

Lines changed: 86 additions & 14 deletions

````diff
@@ -303,6 +303,60 @@ _doc_QN_B_full_system = raw"""
 ```
 """
 
+"""
+    QuasiNewtonPreconditioner{E<:AbstractEvaluationType, F}
+
+Add a preconditioning
+
+# Fields
+
+* `preconditioner::F`: the preconditioner function
+
+# Constructors
+
+    QuasiNewtonPreconditioner(
+        preconditioner;
+        evaluation::AbstractEvaluationType=AllocatingEvaluation()
+    )
+
+Add preconditioning to a gradient problem.
+
+# Input
+
+* `preconditioner`: preconditioner function, either as a `(M, p, X) -> Y` allocating or `(M, Y, p, X) -> Y` mutating function
+
+# Keyword arguments
+
+$(_var(:Keyword, :evaluation))
+"""
+struct QuasiNewtonPreconditioner{E<:AbstractEvaluationType,F}
+    preconditioner::F
+end
+function QuasiNewtonPreconditioner(
+    preconditioner::F; evaluation::E=AllocatingEvaluation()
+) where {E<:AbstractEvaluationType,F}
+    return QuasiNewtonPreconditioner{E,F}(preconditioner)
+end
+#
+#
+# Internally this always works in-place of X
+function (qnp::QuasiNewtonPreconditioner{AllocatingEvaluation})(
+    X, mp::AbstractManoptProblem, s::AbstractGradientSolverState
+)
+    M = get_manifold(mp)
+    p = get_iterate(s)
+    copyto!(M, X, p, qnp.preconditioner(M, p, X))
+    return X
+end
+function (pg::QuasiNewtonPreconditioner{InplaceEvaluation})(
+    X, mp::AbstractManoptProblem, s::AbstractGradientSolverState
+)
+    M = get_manifold(mp)
+    p = get_iterate(s)
+    pg.preconditioner(M, X, p, X)
+    return X
+end
+
 @doc """
     QuasiNewtonMatrixDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate
 
@@ -358,7 +412,7 @@ $(_var(:Field, :vector_transport_method))
 
 ## Keyword arguments
 
-* `initial_scale=1.0`
+* `initial_scale=1.0` – this can also be deactivated by passing `nothing`.
 $(_var(:Keyword, :vector_transport_method))
 
 Generate the Update rule with defaults from a manifold and the names corresponding to the fields.
@@ -374,7 +428,7 @@ mutable struct QuasiNewtonMatrixDirectionUpdate{
     B<:AbstractBasis,
     VT<:AbstractVectorTransportMethod,
     M<:AbstractMatrix,
-    F<:Real,
+    F<:Union{<:Real,Nothing},
 } <: AbstractQuasiNewtonDirectionUpdate
     basis::B
     matrix::M
@@ -383,7 +437,7 @@ mutable struct QuasiNewtonMatrixDirectionUpdate{
     vector_transport_method::VT
 end
 function status_summary(d::QuasiNewtonMatrixDirectionUpdate)
-    return "$(d.update) with initial scaling $(d.initial_scale) and vector transport method $(d.vector_transport_method)."
+    return "$(d.update) with $(!isnothing(d.initial_scale) ? "initial scaling $(d.initial_scale) and" : "") vector transport method $(d.vector_transport_method)."
 end
 function show(io::IO, d::QuasiNewtonMatrixDirectionUpdate)
     s = """
@@ -403,7 +457,7 @@ function QuasiNewtonMatrixDirectionUpdate(
     MT<:AbstractMatrix,
     B<:AbstractBasis,
     V<:AbstractVectorTransportMethod,
-    F<:Real,
+    F<:Union{<:Real,Nothing},
 }
     return QuasiNewtonMatrixDirectionUpdate{U,B,V,MT,F}(
         basis, m, initial_scale, update, vector_transport_method
@@ -419,7 +473,9 @@ function (d::QuasiNewtonMatrixDirectionUpdate{T})(
     M = get_manifold(mp)
     p = get_iterate(st)
     X = get_gradient(st)
-    get_vector!(M, r, p, -d.matrix * get_coordinates(M, p, X, d.basis), d.basis)
+    copyto!(M, r, p, X)
+    st.preconditioner(r, mp, st)
+    get_vector!(M, r, p, -d.matrix * get_coordinates(M, p, r, d.basis), d.basis)
     return r
 end
 function (d::QuasiNewtonMatrixDirectionUpdate{T})(
@@ -481,7 +537,7 @@ function is always included and the old, probably no longer relevant, informatio
 * `memory_y`: set of the stored gradient differences ``$(_math(:Sequence, _tex(:widehat, "y"), "i", "k-m", "k-1"))``.
 * `ξ`: a variable used in the two-loop recursion.
 * `ρ`; a variable used in the two-loop recursion.
-* `initial_scale`: initial scaling of the Hessian
+* `initial_scale`: initial scaling of the Hessian, deactivate (e.g. when using a preconditioner) by passing `nothing`
 $(_var(:Field, :vector_transport_method))
 * `message`: a string containing a potential warning that might have appeared
 * `project!`: a function to stabilize the update by projecting on the tangent space
@@ -509,14 +565,15 @@ mutable struct QuasiNewtonLimitedMemoryDirectionUpdate{
     T,
     F,
     V<:AbstractVector{F},
+    G<:Union{F,Nothing},
     VT<:AbstractVectorTransportMethod,
     Proj,
 } <: AbstractQuasiNewtonDirectionUpdate
     memory_s::CircularBuffer{T}
     memory_y::CircularBuffer{T}
     ξ::Vector{F}
     ρ::Vector{F}
-    initial_scale::F
+    initial_scale::G
    project!::Proj
     vector_transport_method::VT
     message::String
@@ -527,21 +584,30 @@ function QuasiNewtonLimitedMemoryDirectionUpdate(
     ::NT,
     memory_size::Int;
     initial_vector::T=zero_vector(M, p),
-    initial_scale::Real=1.0,
+    initial_scale::G=1.0,
     (project!)::Proj=copyto!,
     vector_transport_method::VTM=default_vector_transport_method(M, typeof(p)),
-) where {NT<:AbstractQuasiNewtonUpdateRule,T,VTM<:AbstractVectorTransportMethod,Proj}
+) where {
+    NT<:AbstractQuasiNewtonUpdateRule,
+    T,
+    VTM<:AbstractVectorTransportMethod,
+    Proj,
+    G<:Union{<:Real,Nothing},
+}
     mT = allocate_result_type(
         M, QuasiNewtonLimitedMemoryDirectionUpdate, (p, initial_vector, initial_scale)
     )
     m1 = zeros(mT, memory_size)
     m2 = zeros(mT, memory_size)
-    return QuasiNewtonLimitedMemoryDirectionUpdate{NT,T,mT,typeof(m1),VTM,Proj}(
+    _initial_state = !isnothing(initial_scale) ? convert(mT, initial_scale) : initial_scale
+    return QuasiNewtonLimitedMemoryDirectionUpdate{
+        NT,T,mT,typeof(m1),typeof(_initial_state),VTM,Proj
+    }(
         CircularBuffer{T}(memory_size),
         CircularBuffer{T}(memory_size),
         m1,
         m2,
-        convert(mT, initial_scale),
+        _initial_state,
         project!,
         vector_transport_method,
         "",
@@ -550,7 +616,7 @@ end
 get_message(d::QuasiNewtonLimitedMemoryDirectionUpdate) = d.message
 function status_summary(d::QuasiNewtonLimitedMemoryDirectionUpdate{T}) where {T}
     s = "limited memory $T (size $(length(d.memory_s)))"
-    (d.initial_scale != 1.0) && (s = "$(s) initial scaling $(d.initial_scale)")
+    !isnothing(d.initial_scale) && (s = "$(s) initial scaling $(d.initial_scale)")
     (d.project! !== copyto!) && (s = "$(s), projections, ")
     s = "$(s)and $(d.vector_transport_method) as vector transport."
     return s
@@ -602,8 +668,14 @@ function (d::QuasiNewtonLimitedMemoryDirectionUpdate{InverseBFGS})(r, mp, st)
         return r
     end
     # initial scaling guess
-    r .*=
-        d.initial_scale / (d.ρ[last_safe_index] * norm(M, p, d.memory_y[last_safe_index])^2)
+    if !isnothing(d.initial_scale)
+        r .*=
+            d.initial_scale /
+            (d.ρ[last_safe_index] * norm(M, p, d.memory_y[last_safe_index])^2)
+    end
+    # precon
+    st.preconditioner(r, mp, st)
+    #
     # forward pass
     for i in eachindex(d.ρ)
         if abs(d.ρ[i]) > 0
````
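The `QuasiNewtonPreconditioner` functors above dispatch on the evaluation type so that the solver can always call the preconditioner in-place of `X`, whether the user supplied an allocating `(M, p, X) -> Y` function or a mutating `(M, Y, p, X) -> Y` one. A minimal Python analogue of that dispatch (hypothetical names for illustration, not the Manopt.jl API; the manifold argument `M` is dropped):

```python
import numpy as np

class Preconditioner:
    # Stores whether the user function allocates its result or mutates it,
    # and exposes a single in-place call to the solver, mirroring the
    # AllocatingEvaluation / InplaceEvaluation functors in the diff above.
    def __init__(self, precond, inplace=False):
        self.precond = precond
        self.inplace = inplace

    def __call__(self, X, p):
        # Internally this always works in-place of X.
        if self.inplace:
            self.precond(X, p, X)      # mutating (Y, p, X) -> Y convention
        else:
            X[:] = self.precond(p, X)  # allocating (p, X) -> Y convention
        return X

# The same diagonal preconditioner, written in both conventions:
diag = np.array([1.0, 0.1, 0.01])

allocating = Preconditioner(lambda p, X: diag * X)

def scale_into(Y, p, X):
    Y[:] = diag * X  # write the result into the provided output vector

mutating = Preconditioner(scale_into, inplace=True)

X1, X2 = np.ones(3), np.ones(3)
allocating(X1, None)
mutating(X2, None)
print(X1, X2)  # both calls scaled their argument in place
```

Keeping the solver-facing call in-place is what lets the direction updates above apply the preconditioner directly to the intermediate vector `r` without extra allocations.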
