
# Riemannian quasi-Newton methods

```@meta
CurrentModule = Manopt
```

```@docs
quasi_Newton
quasi_Newton!
```

## Background

The aim is to minimize a real-valued function on a Riemannian manifold, that is

$$\min f(p), \quad p ∈ \mathcal{M}.$$

Riemannian quasi-Newton methods generalize their Euclidean counterparts and are Riemannian line search methods: they determine a search direction $η_k ∈ T_{p_k}\mathcal{M}$ at the current iterate $p_k$ and a suitable stepsize $α_k$ along $\gamma(α) = R_{p_k}(α η_k)$, where $R\colon T\mathcal{M} → \mathcal{M}$ is a retraction. The next iterate is obtained by

$$p_{k+1} = R_{p_k}(α_k η_k).$$

In quasi-Newton methods, the search direction is given by

$$η_k = -\mathcal{H}_k^{-1}[\operatorname{grad}f(p_k)] = -\mathcal{B}_k [\operatorname{grad}f(p_k)],$$

where $\mathcal{H}_k \colon T_{p_k}\mathcal{M} → T_{p_k}\mathcal{M}$ is a positive definite self-adjoint operator which approximates the action of the Hessian $\operatorname{Hess} f(p_k)[⋅]$, and $\mathcal{B}_k = \mathcal{H}_k^{-1}$. The idea of quasi-Newton methods is that, instead of creating a completely new approximation of the Hessian operator $\operatorname{Hess} f(p_{k+1})$ or its inverse at every iteration, the previous operator $\mathcal{H}_k$ or $\mathcal{B}_k$ is updated by a convenient formula using the information about the curvature of the objective function obtained during the iteration. The resulting operator $\mathcal{H}_{k+1}$ or $\mathcal{B}_{k+1}$ acts on the tangent space $T_{p_{k+1}}\mathcal{M}$ of the freshly computed iterate $p_{k+1}$.

To obtain a well-defined method, the following requirements are placed on the new operator $\mathcal{H}_{k+1}$ or $\mathcal{B}_{k+1}$ created by an update:

  • Since the Hessian $\operatorname{Hess} f(p_{k+1})$ is a self-adjoint operator on the tangent space $T_{p_{k+1}}\mathcal{M}$, and $\mathcal{H}_{k+1}$ approximates it, one requirement is that $\mathcal{H}_{k+1}$ or $\mathcal{B}_{k+1}$ is also self-adjoint on $T_{p_{k+1}}\mathcal{M}$.
  • To achieve a steady descent, $η_k$ should be a descent direction in each iteration; hence a further requirement is that $\mathcal{H}_{k+1}$ or $\mathcal{B}_{k+1}$ is a positive definite operator on $T_{p_{k+1}}\mathcal{M}$.
  • To incorporate information about the curvature of the objective function into the new operator $\mathcal{H}_{k+1}$ or $\mathcal{B}_{k+1}$, the last requirement is a form of the Riemannian quasi-Newton equation:

$$\mathcal{H}_{k+1} [T_{p_k \rightarrow p_{k+1}}({R_{p_k}}^{-1}(p_{k+1}))] = \operatorname{grad}f(p_{k+1}) - T_{p_k \rightarrow p_{k+1}}(\operatorname{grad}f(p_k))$$

or

$$\mathcal{B}_{k+1} [\operatorname{grad}f(p_{k+1}) - T_{p_k \rightarrow p_{k+1}}(\operatorname{grad}f(p_k))] = T_{p_k \rightarrow p_{k+1}}({R_{p_k}}^{-1}(p_{k+1}))$$

where $T_{p_k \rightarrow p_{k+1}} \colon T_{p_k}\mathcal{M} → T_{p_{k+1}}\mathcal{M}$ is a vector transport and the chosen retraction $R$ is its associated retraction. Note that, of course, not all updates in all situations meet these conditions in every iteration. For specific quasi-Newton updates, the fulfilment of the Riemannian curvature condition, which requires that

$$g_{p_{k+1}}(s_k, y_k) > 0$$

holds, is a requirement for the self-adjointness and positive definiteness of $\mathcal{H}_k$ or $\mathcal{B}_k$ to be inherited by $\mathcal{H}_{k+1}$ or $\mathcal{B}_{k+1}$. Here $s_k = T^{S}_{p_k, α_k η_k}(α_k η_k)$ denotes the transported step and $y_k$ the difference of (transported) gradients introduced below. Unfortunately, the fulfilment of the Riemannian curvature condition is not guaranteed by a stepsize $α_k > 0$ that satisfies the generalized Wolfe conditions. However, to create a positive definite operator $\mathcal{H}_{k+1}$ or $\mathcal{B}_{k+1}$ in each iteration, the so-called locking condition was introduced in [HuangGallivanAbsil:2015](@cite). It requires that the isometric vector transport $T^S$, which is used in the update formula, and its associated retraction $R$ fulfil

$$T^{S}_{p, ξ_p}(ξ_p) = β T^{R}_{p, ξ_p}(ξ_p), \quad β = \frac{\lVert ξ_p \rVert_p}{\lVert T^{R}_{p, ξ_p}(ξ_p) \rVert_{R_{p}(ξ_p)}},$$

where $T^R$ is the vector transport by differentiated retraction. If the isometric vector transport $T^S$ and its associated retraction $R$ satisfy the locking condition, and the tangent vector

$$y_k = β_k^{-1} \operatorname{grad}f(p_{k+1}) - T^{S}_{p_k, α_k η_k}(\operatorname{grad}f(p_k)),$$

where

$$β_k = \frac{\lVert α_k η_k \rVert_{p_k}}{\lVert T^{R}_{p_k, α_k η_k}(α_k η_k) \rVert_{p_{k+1}}},$$

is used in the update, it can be shown that choosing a stepsize $α_k > 0$ that satisfies the Riemannian Wolfe conditions implies the Riemannian curvature condition, which in turn implies that the operator generated by the updates is positive definite. In the following, the specific operators are written in matrix notation, using $H_k$ and $B_k$, respectively.
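As a quick illustration of how the pieces above fit together, the following sketch minimizes the Rayleigh quotient on a sphere with `quasi_Newton`. It assumes `Manopt.jl` and `Manifolds.jl` are installed; the matrix `A` is an arbitrary example, and default retraction, vector transport, and stepsize are used (a minimal sketch, not a definitive call):

```julia
using Manopt, Manifolds, LinearAlgebra

# Minimize the Rayleigh quotient f(p) = pᵀ A p on the unit sphere;
# a minimizer is an eigenvector for the smallest eigenvalue of A.
n = 32
A = Symmetric(randn(n, n))
M = Sphere(n - 1)

f(M, p) = p' * A * p
# Riemannian gradient: the Euclidean gradient 2Ap projected onto T_p M.
grad_f(M, p) = 2 .* (A * p .- (p' * A * p) .* p)

p0 = normalize(randn(n))
p_opt = quasi_Newton(M, f, grad_f, p0)
```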

## Direction updates

There are different ways to compute a fixed AbstractQuasiNewtonUpdateRule. These are represented by

```@docs
AbstractQuasiNewtonDirectionUpdate
QuasiNewtonMatrixDirectionUpdate
QuasiNewtonLimitedMemoryDirectionUpdate
QuasiNewtonCautiousDirectionUpdate
Manopt.initialize_update!
QuasiNewtonPreconditioner
QuasiNewtonLimitedMemoryBoxDirectionUpdate
```

## Hessian update rules

Using

```@docs
update_hessian!
```

the following update formulae for either H_{k+1} or B_{k+1} are available.

```@docs
AbstractQuasiNewtonUpdateRule
BFGS
DFP
Broyden
SR1
InverseBFGS
InverseDFP
InverseBroyden
InverseSR1
```
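To make these update formulas concrete, here is a plain-Julia sketch of the *Euclidean* inverse BFGS update applied to a quadratic with an exact line search; it is an illustrative re-implementation, not Manopt's internal code, and the Riemannian version additionally transports the step, the gradient difference, and the operator to the tangent space of the new iterate:

```julia
using LinearAlgebra

# Euclidean inverse BFGS update:
#   B⁺ = (I - ρ s yᵀ) B (I - ρ y sᵀ) + ρ s sᵀ,  with ρ = 1 / (yᵀ s).
function inverse_bfgs_update(B, s, y)
    ρ = 1 / dot(y, s)
    V = I - ρ * (s * y')
    return V * B * V' + ρ * (s * s')
end

# BFGS iteration for the quadratic f(x) = ½ xᵀ A x with gradient A x.
function bfgs_quadratic(A, x0; iters=20)
    x = copy(x0)
    B = Matrix{Float64}(I, length(x0), length(x0))  # initial inverse Hessian
    for _ in 1:iters
        g = A * x
        norm(g) < 1e-10 && break
        η = -B * g                          # quasi-Newton search direction
        α = -dot(g, η) / dot(η, A * η)      # exact stepsize on a quadratic
        x_new = x + α * η
        s, y = x_new - x, A * x_new - g     # step and gradient difference
        # curvature condition yᵀs > 0 keeps B positive definite
        dot(y, s) > 0 && (B = inverse_bfgs_update(B, s, y))
        x = x_new
    end
    return x, B
end

A = [4.0 1.0; 1.0 3.0]
x, B = bfgs_quadratic(A, [2.0, -1.0])
```

With exact line searches on a quadratic, the iteration terminates in at most `n` steps and `B` approaches the inverse Hessian `inv(A)`, mirroring how $\mathcal{B}_k$ is meant to approximate $\operatorname{Hess} f(p_k)^{-1}$.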

## State

The quasi-Newton algorithm is based on a DefaultManoptProblem.

```@docs
QuasiNewtonState
```

## [Technical details](@id sec-qn-technical-details)

The quasi_Newton solver requires the following functions of a manifold to be available:

  • A [retract!](@extref ManifoldsBase :doc:retractions)(M, q, p, X); it is recommended to set the [default_retraction_method](@extref ManifoldsBase.default_retraction_method-Tuple{AbstractManifold}) to a favourite retraction. If this default is set, a retraction_method= does not have to be specified.
  • A [vector_transport_to!](@extref ManifoldsBase :doc:vector_transports)(M, Y, p, X, q); it is recommended to set the [default_vector_transport_method](@extref ManifoldsBase.default_vector_transport_method-Tuple{AbstractManifold}) to a favourite vector transport. If this default is set, a vector_transport_method= does not have to be specified.
  • By default quasi-Newton uses ArmijoLinesearch, which requires max_stepsize(M) to be set and an implementation of [inner](@extref ManifoldsBase.inner-Tuple{AbstractManifold, Any, Any, Any})(M, p, X, Y).
  • The [norm](@extref LinearAlgebra.norm-Tuple{AbstractManifold, Any, Any}) as well, to stop when the norm of the gradient is small; but if you implemented inner, the norm is provided already.
  • A [copyto!](@extref Base.copyto!-Tuple{AbstractManifold, Any, Any})(M, q, p) and [copy](@extref Base.copy-Tuple{AbstractManifold, Any})(M, p) for points, and similarly copy(M, p, X) for tangent vectors.
  • By default the tangent vector storing the gradient is initialized by calling [zero_vector](@extref ManifoldsBase.zero_vector-Tuple{AbstractManifold, Any})(M, p).

Most Hessian approximations further require [get_coordinates](@extref ManifoldsBase.get_coordinates)(M, p, X, b) with respect to the [AbstractBasis](@extref ManifoldsBase.AbstractBasis) b provided, which is [DefaultOrthonormalBasis](@extref ManifoldsBase.DefaultOrthonormalBasis) by default from the basis= keyword.
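For illustration, the following sketch (assuming `Manifolds.jl` is installed) shows how a tangent vector is converted to and from its coordinate vector, which is the representation a matrix-based Hessian approximation operates on; the concrete point and vector are arbitrary examples:

```julia
using Manifolds

# Coordinates of a tangent vector with respect to an orthonormal basis:
# matrix-based updates store H_k / B_k as a matrix in these coordinates.
M = Sphere(2)
p = [0.0, 0.0, 1.0]             # a point on the sphere
X = [1.0, 2.0, 0.0]             # a tangent vector at p (orthogonal to p)

b = DefaultOrthonormalBasis()
c = get_coordinates(M, p, X, b) # coordinate vector in ℝ^{dim(M)} = ℝ²
Y = get_vector(M, p, c, b)      # reconstructs X from its coordinates
```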

## Literature

```@bibliography
Pages = ["quasi_Newton.md"]
Canonical=false
```