Commit f3404f1

Finish fix precon.

1 parent 32aa893
6 files changed: 141 additions & 45 deletions


Changelog.md

Lines changed: 7 additions & 0 deletions

```diff
@@ -6,6 +6,13 @@ The file was started with Version `0.4`.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.5.10] April 4, 2025
+
+### Fixed
+
+* a proper implementation of the preconditioning for `quasi_Newton`, that can be used instead
+  of or in combination with the initial scaling.
+
 ## [0.5.9] March 24, 2025
 
 ### Added
```

Project.toml

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 name = "Manopt"
 uuid = "0fc0a36d-df90-57f3-8f93-d78a9fc72bb5"
 authors = ["Ronny Bergmann <manopt@ronnybergmann.net>"]
-version = "0.5.9"
+version = "0.5.10"
 
 [deps]
 ColorSchemes = "35d6a980-a343-548e-a6ea-1d62b119f2f4"
```

docs/src/solvers/quasi_Newton.md

Lines changed: 15 additions & 14 deletions

````diff
@@ -14,63 +14,63 @@
 The aim is to minimize a real-valued function on a Riemannian manifold, that is
 
 ```math
-\min f(x), \quad x ∈ \mathcal{M}.
+\min f(p), \quad p ∈ \mathcal{M}.
 ```
 
-Riemannian quasi-Newtonian methods are as generalizations of their Euclidean counterparts Riemannian line search methods. These methods determine a search direction ``η_k ∈ T_{x_k} \mathcal{M}`` at the current iterate ``x_k`` and a suitable stepsize ``α_k`` along ``\gamma(α) = R_{x_k}(α η_k)``, where ``R: T \mathcal{M} →\mathcal{M}`` is a retraction. The next iterate is obtained by
+Riemannian quasi-Newtonian methods are as generalizations of their Euclidean counterparts Riemannian line search methods. These methods determine a search direction ``η_k ∈ T_{p_k} \mathcal{M}`` at the current iterate ``p_k`` and a suitable stepsize ``α_k`` along ``\gamma(α) = R_{p_k}(α η_k)``, where ``R: T \mathcal{M} →\mathcal{M}`` is a retraction. The next iterate is obtained by
 
 ```math
-x_{k+1} = R_{x_k}(α_k η_k).
+p_{k+1} = R_{p_k}(α_k η_k).
 ```
 
 In quasi-Newton methods, the search direction is given by
 
 ```math
-η_k = -{\mathcal{H}_k}^{-1}[\operatorname{grad}f (x_k)] = -\mathcal{B}_k [\operatorname{grad} (x_k)],
+η_k = -{\mathcal{H}_k}^{-1}[\operatorname{grad}f (p_k)] = -\mathcal{B}_k [\operatorname{grad} (p_k)],
 ```
 
-where ``\mathcal{H}_k : T_{x_k} \mathcal{M} →T_{x_k} \mathcal{M}`` is a positive definite self-adjoint operator, which approximates the action of the Hessian ``\operatorname{Hess} f (x_k)[⋅]`` and ``\mathcal{B}_k = {\mathcal{H}_k}^{-1}``. The idea of quasi-Newton methods is instead of creating a complete new approximation of the Hessian operator ``\operatorname{Hess} f(x_{k+1})`` or its inverse at every iteration, the previous operator ``\mathcal{H}_k`` or ``\mathcal{B}_k`` is updated by a convenient formula using the obtained information about the curvature of the objective function during the iteration. The resulting operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` acts on the tangent space ``T_{x_{k+1}} \mathcal{M}`` of the freshly computed iterate ``x_{k+1}``.
+where ``\mathcal{H}_k : T_{p_k} \mathcal{M} →T_{p_k} \mathcal{M}`` is a positive definite self-adjoint operator, which approximates the action of the Hessian ``\operatorname{Hess} f (p_k)[⋅]`` and ``\mathcal{B}_k = {\mathcal{H}_k}^{-1}``. The idea of quasi-Newton methods is instead of creating a complete new approximation of the Hessian operator ``\operatorname{Hess} f(p_{k+1})`` or its inverse at every iteration, the previous operator ``\mathcal{H}_k`` or ``\mathcal{B}_k`` is updated by a convenient formula using the obtained information about the curvature of the objective function during the iteration. The resulting operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` acts on the tangent space ``T_{p_{k+1}} \mathcal{M}`` of the freshly computed iterate ``p_{k+1}``.
 In order to get a well-defined method, the following requirements are placed on the new operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` that is created by an update.
-Since the Hessian ``\operatorname{Hess} f(x_{k+1})`` is a self-adjoint operator on the tangent space ``T_{x_{k+1}} \mathcal{M}``, and ``\mathcal{H}_{k+1}`` approximates it, one requirement is, that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is also self-adjoint on ``T_{x_{k+1}} \mathcal{M}``.
+Since the Hessian ``\operatorname{Hess} f(p_{k+1})`` is a self-adjoint operator on the tangent space ``T_{p_{k+1}} \mathcal{M}``, and ``\mathcal{H}_{k+1}`` approximates it, one requirement is, that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is also self-adjoint on ``T_{p_{k+1}} \mathcal{M}``.
 In order to achieve a steady descent, the next requirement is that ``η_k`` is a descent direction in each iteration.
-Hence a further requirement is that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is a positive definite operator on ``T_{x_{k+1}} \mathcal{M}``.
+Hence a further requirement is that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is a positive definite operator on ``T_{p_{k+1}} \mathcal{M}``.
 In order to get information about the curvature of the objective function into the new operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}``, the last requirement is a form of a Riemannian quasi-Newton equation:
 
 ```math
-\mathcal{H}_{k+1} [T_{x_k \rightarrow x_{k+1}}({R_{x_k}}^{-1}(x_{k+1}))] = \operatorname{grad}(x_{k+1}) - T_{x_k \rightarrow x_{k+1}}(\operatorname{grad}f(x_k))
+\mathcal{H}_{k+1} [T_{p_k \rightarrow p_{k+1}}({R_{p_k}}^{-1}(p_{k+1}))] = \operatorname{grad}(p_{k+1}) - T_{p_k \rightarrow p_{k+1}}(\operatorname{grad}f(p_k))
 ```
 
 or
 
 ```math
-\mathcal{B}_{k+1} [\operatorname{grad}f(x_{k+1}) - T_{x_k \rightarrow x_{k+1}}(\operatorname{grad}f(x_k))] = T_{x_k \rightarrow x_{k+1}}({R_{x_k}}^{-1}(x_{k+1}))
+\mathcal{B}_{k+1} [\operatorname{grad}f(p_{k+1}) - T_{p_k \rightarrow p_{k+1}}(\operatorname{grad}f(p_k))] = T_{p_k \rightarrow p_{k+1}}({R_{p_k}}^{-1}(p_{k+1}))
 ```
 
-where ``T_{x_k \rightarrow x_{k+1}} : T_{x_k} \mathcal{M} →T_{x_{k+1}} \mathcal{M}`` and
+where ``T_{p_k \rightarrow p_{k+1}} : T_{p_k} \mathcal{M} →T_{p_{k+1}} \mathcal{M}`` and
 the chosen retraction ``R`` is the associated retraction of ``T``.
 Note that, of course, not all updates in all situations meet these conditions in every iteration.
 For specific quasi-Newton updates, the fulfilment of the Riemannian curvature condition, which requires that
 
 ```math
-g_{x_{k+1}}(s_k, y_k) > 0
+g_{p_{k+1}}(s_k, y_k) > 0
 ```
 
 holds, is a requirement for the inheritance of the self-adjointness and positive definiteness of the ``\mathcal{H}_k`` or ``\mathcal{B}_k`` to the operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}``. Unfortunately, the fulfilment of the Riemannian curvature condition is not given by a step size ``\alpha_k > 0`` that satisfies the generalized Wolfe conditions. However, to create a positive definite operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` in each iteration, the so-called locking condition was introduced in [HuangGallivanAbsil:2015](@cite), which requires that the isometric vector transport ``T^S``, which is used in the update formula, and its associate retraction ``R`` fulfil
 
 ```math
-T^{S}{x, ξ_x}(ξ_x) = β T^{R}{x, ξ_x}(ξ_x), \quad β = \frac{\lVert ξ_x \rVert_x}{\lVert T^{R}{x, ξ_x}(ξ_x) \rVert_{R_{x}(ξ_x)}},
+T^{S}{p, ξ_p}(ξ_p) = β T^{R}{p, ξ_p}(ξ_p), \quad β = \frac{\lVert ξ_p \rVert_p}{\lVert T^{R}{p, ξ_p}(ξ_p) \rVert_{R_{p}(ξ_p)}},
 ```
 
 where ``T^R`` is the vector transport by differentiated retraction. With the requirement that the isometric vector transport ``T^S`` and its associated retraction ``R`` satisfies the locking condition and using the tangent vector
 
 ```math
-y_k = {β_k}^{-1} \operatorname{grad}f(x_{k+1}) - T^{S}{x_k, α_k η_k}(\operatorname{grad}f(x_k)),
+y_k = {β_k}^{-1} \operatorname{grad}f(p_{k+1}) - T^{S}{p_k, α_k η_k}(\operatorname{grad}f(p_k)),
 ```
 
 where
 
 ```math
-β_k = \frac{\lVert α_k η_k \rVert_{x_k}}{\lVert T^{R}{x_k, α_k η_k}(α_k η_k) \rVert_{x_{k+1}}},
+β_k = \frac{\lVert α_k η_k \rVert_{p_k}}{\lVert T^{R}{p_k, α_k η_k}(α_k η_k) \rVert_{p_{k+1}}},
 ```
 
 in the update, it can be shown that choosing a stepsize ``α_k > 0`` that satisfies the Riemannian Wolfe conditions leads to the fulfilment of the Riemannian curvature condition, which in turn implies that the operator generated by the updates is positive definite.
@@ -87,6 +87,7 @@ QuasiNewtonMatrixDirectionUpdate
 QuasiNewtonLimitedMemoryDirectionUpdate
 QuasiNewtonCautiousDirectionUpdate
 Manopt.initialize_update!
+QuasiNewtonPreconditioner
 ```
 
 ## Hessian update rules
````

src/Manopt.jl

Lines changed: 1 addition & 0 deletions

```diff
@@ -425,6 +425,7 @@ export SymmetricLinearSystemObjective
 
 export QuasiNewtonState, QuasiNewtonLimitedMemoryDirectionUpdate
 export QuasiNewtonMatrixDirectionUpdate
+export QuasiNewtonPreconditioner
 export QuasiNewtonCautiousDirectionUpdate,
     BFGS, InverseBFGS, DFP, InverseDFP, SR1, InverseSR1
 export InverseBroyden, Broyden
```

src/plans/quasi_newton_plan.jl

Lines changed: 86 additions & 14 deletions

````diff
@@ -303,6 +303,60 @@ _doc_QN_B_full_system = raw"""
 ```
 """
 
+"""
+    QuasiNewtonPreconditioner{E<:AbstractEvaluationType, F}
+
+Add a preconditioning
+
+# Fields
+
+* `preconditioner::F`: the preconditioner function
+
+# Constructors
+
+    QuasiNewtonPreconditioner(
+        preconditioner;
+        evaluation::AbstractEvaluationType=AllocatingEvaluation()
+    )
+
+Add preconditioning to a gradient problem.
+
+# Input
+
+* `preconditioner`: preconditioner function, either as a `(M, p, X) -> Y` allocating or `(M, Y, p, X) -> Y` mutating function
+
+# Keyword arguments
+
+$(_var(:Keyword, :evaluation))
+"""
+struct QuasiNewtonPreconditioner{E<:AbstractEvaluationType,F}
+    preconditioner::F
+end
+function QuasiNewtonPreconditioner(
+    preconditioner::F; evaluation::E=AllocatingEvaluation()
+) where {E<:AbstractEvaluationType,F}
+    return QuasiNewtonPreconditioner{E,F}(preconditioner)
+end
+#
+#
+# Internally this always works in-place of X
+function (qnp::QuasiNewtonPreconditioner{AllocatingEvaluation})(
+    X, mp::AbstractManoptProblem, s::AbstractGradientSolverState
+)
+    M = get_manifold(mp)
+    p = get_iterate(s)
+    copyto!(M, X, p, qnp.preconditioner(M, p, X))
+    return X
+end
+function (pg::QuasiNewtonPreconditioner{InplaceEvaluation})(
+    X, mp::AbstractManoptProblem, s::AbstractGradientSolverState
+)
+    M = get_manifold(mp)
+    p = get_iterate(s)
+    pg.preconditioner(M, X, p, X)
+    return X
+end
+
 @doc """
     QuasiNewtonMatrixDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate
 
@@ -358,7 +412,7 @@ $(_var(:Field, :vector_transport_method))
 
 ## Keyword arguments
 
-* `initial_scale=1.0`
+* `initial_scale=1.0` – this can also be deactivated by passing `nothing`.
 $(_var(:Keyword, :vector_transport_method))
 
 Generate the Update rule with defaults from a manifold and the names corresponding to the fields.
@@ -374,7 +428,7 @@ mutable struct QuasiNewtonMatrixDirectionUpdate{
     B<:AbstractBasis,
     VT<:AbstractVectorTransportMethod,
     M<:AbstractMatrix,
-    F<:Real,
+    F<:Union{<:Real,Nothing},
 } <: AbstractQuasiNewtonDirectionUpdate
     basis::B
     matrix::M
@@ -383,7 +437,7 @@ mutable struct QuasiNewtonMatrixDirectionUpdate{
     vector_transport_method::VT
 end
 function status_summary(d::QuasiNewtonMatrixDirectionUpdate)
-    return "$(d.update) with initial scaling $(d.initial_scale) and vector transport method $(d.vector_transport_method)."
+    return "$(d.update) with $(!isnothing(d.initial_scale) ? "initial scaling $(d.initial_scale) and" : "") vector transport method $(d.vector_transport_method)."
 end
 function show(io::IO, d::QuasiNewtonMatrixDirectionUpdate)
     s = """
@@ -403,7 +457,7 @@ function QuasiNewtonMatrixDirectionUpdate(
     MT<:AbstractMatrix,
     B<:AbstractBasis,
     V<:AbstractVectorTransportMethod,
-    F<:Real,
+    F<:Union{<:Real,Nothing},
 }
     return QuasiNewtonMatrixDirectionUpdate{U,B,V,MT,F}(
         basis, m, initial_scale, update, vector_transport_method
@@ -419,7 +473,9 @@ function (d::QuasiNewtonMatrixDirectionUpdate{T})(
     M = get_manifold(mp)
     p = get_iterate(st)
     X = get_gradient(st)
-    get_vector!(M, r, p, -d.matrix * get_coordinates(M, p, X, d.basis), d.basis)
+    copyto!(M, r, p, X)
+    st.preconditioner(r, mp, st)
+    get_vector!(M, r, p, -d.matrix * get_coordinates(M, p, r, d.basis), d.basis)
     return r
 end
 function (d::QuasiNewtonMatrixDirectionUpdate{T})(
@@ -481,7 +537,7 @@ function is always included and the old, probably no longer relevant, informatio
 * `memory_y`: set of the stored gradient differences ``$(_math(:Sequence, _tex(:widehat, "y"), "i", "k-m", "k-1"))``.
 * `ξ`: a variable used in the two-loop recursion.
 * `ρ`; a variable used in the two-loop recursion.
-* `initial_scale`: initial scaling of the Hessian
+* `initial_scale`: initial scaling of the Hessian, deactivate (e.g. when using a preconditioner) by passing `nothing`
 $(_var(:Field, :vector_transport_method))
 * `message`: a string containing a potential warning that might have appeared
 * `project!`: a function to stabilize the update by projecting on the tangent space
@@ -509,14 +565,15 @@ mutable struct QuasiNewtonLimitedMemoryDirectionUpdate{
     T,
     F,
     V<:AbstractVector{F},
+    G<:Union{F,Nothing},
     VT<:AbstractVectorTransportMethod,
     Proj,
 } <: AbstractQuasiNewtonDirectionUpdate
     memory_s::CircularBuffer{T}
     memory_y::CircularBuffer{T}
     ξ::Vector{F}
     ρ::Vector{F}
-    initial_scale::F
+    initial_scale::G
    project!::Proj
     vector_transport_method::VT
     message::String
@@ -527,21 +584,30 @@ function QuasiNewtonLimitedMemoryDirectionUpdate(
     ::NT,
     memory_size::Int;
     initial_vector::T=zero_vector(M, p),
-    initial_scale::Real=1.0,
+    initial_scale::G=1.0,
     (project!)::Proj=copyto!,
     vector_transport_method::VTM=default_vector_transport_method(M, typeof(p)),
-) where {NT<:AbstractQuasiNewtonUpdateRule,T,VTM<:AbstractVectorTransportMethod,Proj}
+) where {
+    NT<:AbstractQuasiNewtonUpdateRule,
+    T,
+    VTM<:AbstractVectorTransportMethod,
+    Proj,
+    G<:Union{<:Real,Nothing},
+}
     mT = allocate_result_type(
         M, QuasiNewtonLimitedMemoryDirectionUpdate, (p, initial_vector, initial_scale)
     )
     m1 = zeros(mT, memory_size)
     m2 = zeros(mT, memory_size)
-    return QuasiNewtonLimitedMemoryDirectionUpdate{NT,T,mT,typeof(m1),VTM,Proj}(
+    _initial_state = !isnothing(initial_scale) ? convert(mT, initial_scale) : initial_scale
+    return QuasiNewtonLimitedMemoryDirectionUpdate{
+        NT,T,mT,typeof(m1),typeof(_initial_state),VTM,Proj
+    }(
         CircularBuffer{T}(memory_size),
         CircularBuffer{T}(memory_size),
         m1,
         m2,
-        convert(mT, initial_scale),
+        _initial_state,
         project!,
         vector_transport_method,
         "",
@@ -550,7 +616,7 @@ end
 get_message(d::QuasiNewtonLimitedMemoryDirectionUpdate) = d.message
 function status_summary(d::QuasiNewtonLimitedMemoryDirectionUpdate{T}) where {T}
     s = "limited memory $T (size $(length(d.memory_s)))"
-    (d.initial_scale != 1.0) && (s = "$(s) initial scaling $(d.initial_scale)")
+    !isnothing(d.initial_scale) && (s = "$(s) initial scaling $(d.initial_scale)")
     (d.project! !== copyto!) && (s = "$(s), projections, ")
     s = "$(s)and $(d.vector_transport_method) as vector transport."
     return s
@@ -602,8 +668,14 @@ function (d::QuasiNewtonLimitedMemoryDirectionUpdate{InverseBFGS})(r, mp, st)
         return r
     end
     # initial scaling guess
-    r .*=
-        d.initial_scale / (d.ρ[last_safe_index] * norm(M, p, d.memory_y[last_safe_index])^2)
+    if !isnothing(d.initial_scale)
+        r .*=
+            d.initial_scale /
+            (d.ρ[last_safe_index] * norm(M, p, d.memory_y[last_safe_index])^2)
+    end
+    # precon
+    st.preconditioner(r, mp, st)
+    #
     # forward pass
     for i in eachindex(d.ρ)
         if abs(d.ρ[i]) > 0
````
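The `QuasiNewtonPreconditioner` functors above dispatch on the evaluation type so that the solver can always call the preconditioner in-place of `X`, whether the user supplied an allocating `(M, p, X) -> Y` function or a mutating `(M, Y, p, X) -> Y` one. A minimal Python analogue of that dispatch (hypothetical names for illustration, not the Manopt.jl API; the manifold argument `M` is dropped):

```python
import numpy as np

class Preconditioner:
    # Stores whether the user function allocates its result or mutates it,
    # and exposes a single in-place call to the solver, mirroring the
    # AllocatingEvaluation / InplaceEvaluation functors in the diff above.
    def __init__(self, precond, inplace=False):
        self.precond = precond
        self.inplace = inplace

    def __call__(self, X, p):
        # Internally this always works in-place of X.
        if self.inplace:
            self.precond(X, p, X)      # mutating (Y, p, X) -> Y convention
        else:
            X[:] = self.precond(p, X)  # allocating (p, X) -> Y convention
        return X

# The same diagonal preconditioner, written in both conventions:
diag = np.array([1.0, 0.1, 0.01])

allocating = Preconditioner(lambda p, X: diag * X)

def scale_into(Y, p, X):
    Y[:] = diag * X  # write the result into the provided output vector

mutating = Preconditioner(scale_into, inplace=True)

X1, X2 = np.ones(3), np.ones(3)
allocating(X1, None)
mutating(X2, None)
print(X1, X2)  # both calls scaled their argument in place
```

Keeping the solver-facing call in-place is what lets the direction updates above apply the preconditioner directly to the intermediate vector `r` without extra allocations.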
