diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index ed530cec3f..bb5f3ac1e3 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -18,7 +18,6 @@ jobs: - uses: julia-actions/setup-julia@v2 with: version: ${{ matrix.julia-version }} - arch: x64 - uses: julia-actions/julia-buildpkg@v1 - uses: julia-actions/julia-runtest@v1 - uses: julia-actions/julia-processcoverage@v1 diff --git a/.github/workflows/nightly.yml b/.github/workflows/nightly.yml index d1e08c3c23..a435557be3 100644 --- a/.github/workflows/nightly.yml +++ b/.github/workflows/nightly.yml @@ -15,7 +15,6 @@ jobs: - uses: julia-actions/setup-julia@v2 with: version: ${{ matrix.julia-version }} - arch: x64 - uses: julia-actions/julia-buildpkg@v1 - uses: julia-actions/julia-runtest@v1 env: diff --git a/Changelog.md b/Changelog.md index 9abbf73caa..d78dd80005 100644 --- a/Changelog.md +++ b/Changelog.md @@ -10,6 +10,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 This is a breaking change since the JuMP extension is dropped. +### Added + +* `nonpositive_curvature_behavior` for `QuasiNewtonLimitedMemoryDirectionUpdate` that determines how (y, s) vector pairs are treated after transport; if their inner product becomes too small, it may lead to non-positive-definite Hessian approximations, which need to be avoided. This resolves issue #549. (#554) +* `GeneralizedCauchyDirectionSubsolver` for handling direction selection in the presence of box (`Hyperrectangle`) constraints in quasi-Newton methods. This allows for L-BFGS-B-style box constraint handling. (#554) +* New stopping criteria: `StopWhenRelativeAPosterioriCostChangeLessOrEqual` and `StopWhenProjectedNegativeGradientNormLess`. (#554) +* `HagerZhangLinesearch` stepsize, a state-of-the-art line search for smooth objectives with cubic interpolation and adaptive Wolfe condition checking. (#554) +* Stepsizes can now be initialized using `initialize_stepsize!`, similar to solvers. 
(#554) + + ### Changed * `NonlinearLeastSquaresObjective` is now called `ManifoldNonlinearLeastSquaresObjective` (#569). @@ -26,7 +35,8 @@ discontinue the `JuMP` extension. (#532) * Fixed some text descriptions of a few stopping criteria. * unify naming of fields, `debugDictionary` of the debug state is now called `debug_dictionary` * the `NesterovRule` now also stores an actual `AbstractRetractionMethod` instead of implicitly always using the default one. - +* Line searches now consistently respect the `stop_when_stepsize_exceeds` keyword argument as a hard limit. (#554) +* `StopWhenChangeLess` falsely claimed to indicate convergence. This is now fixed. (#554) ## [0.5.34] March 3, 2026 @@ -86,7 +96,7 @@ Moved the documentation glossaries to using the new [Glossaries.jl](https://gith * Removed `atol` from `DebugFeasibility` and instead use the one newly added `atol` from the `ConstrainedManifoldObjective`. (#546) * Move from CompatHelper to dependabot to keep track of dependency updates in Julia packages. (#547) * moved the `ManoptTestSuite` module to a sub module `Manopt.Test` within `Manopt.jl`, -so it can be easier reused by others as well (#550) + so it can be more easily reused by others as well (#550) * moved to using a `Project.toml` for tests and an overall `[Workspace]`. This also allows finally to run single test files without installing all packages manually, but instead just switching to and instantiating the test environment. (#550) * for compatibility, state also `[source]` entries consistently in the sub `Project.toml` files. 
(#550) diff --git a/docs/make.jl b/docs/make.jl index 6ee570e723..07e7f20dfd 100755 --- a/docs/make.jl +++ b/docs/make.jl @@ -45,6 +45,7 @@ tutorials_menu = "Implement a solver" => "tutorials/ImplementASolver.md", "Optimize on your own manifold" => "tutorials/ImplementOwnManifold.md", "Do constrained optimization" => "tutorials/ConstrainedOptimization.md", + "Do optimization with bounds" => "tutorials/BoxDomain.md", ] # Check whether all tutorials are rendered, issue a warning if not (and quarto if not set) all_tutorials_exist = true @@ -188,6 +189,7 @@ makedocs(; "Douglas—Rachford" => "solvers/DouglasRachford.md", "Exact Penalty Method" => "solvers/exact_penalty_method.md", "Frank-Wolfe" => "solvers/FrankWolfe.md", + "Generalized Cauchy direction subsolver" => "solvers/generalized_cauchy_direction_subsolver.md", "Gradient Descent" => "solvers/gradient_descent.md", "Interior Point Newton" => "solvers/interior_point_Newton.md", "Levenberg–Marquardt" => "solvers/LevenbergMarquardt.md", diff --git a/docs/src/extensions.md b/docs/src/extensions.md index fe4b32e667..4a076cea73 100644 --- a/docs/src/extensions.md +++ b/docs/src/extensions.md @@ -45,10 +45,12 @@ x_opt = quasi_Newton( ) ``` -In general this defines the following new [stepsize](@ref Stepsize) +In general this defines the following new [stepsize](@ref Stepsize) with helper functions for setting and getting the maximum step size: ```@docs Manopt.LineSearchesStepsize +Manopt.linesearches_get_max_alpha +Manopt.linesearches_set_max_alpha ``` ## Manifolds.jl diff --git a/docs/src/references.bib b/docs/src/references.bib index e93d4b007f..b3fb0e5573 100644 --- a/docs/src/references.bib +++ b/docs/src/references.bib @@ -260,6 +260,33 @@ @book{Boumal:2023 ISBN = {978-1-00-916616-4} } +@article{ByrdNocedalSchnabel:1994, + title = {Representations of quasi-{Newton} matrices and their use in limited memory methods}, + volume = {63}, + issn = {1436-4646}, + doi = {10.1007/BF01582063}, + number = {1}, + journal = 
{Mathematical Programming}, + author = {Byrd, Richard H. and Nocedal, Jorge and Schnabel, Robert B.}, + month = jan, + year = {1994}, + pages = {129--156}, +} + +@article{ByrdLuNocedalZhu:1995, + title = {A {Limited} {Memory} {Algorithm} for {Bound} {Constrained} {Optimization}}, + volume = {16}, + issn = {1064-8275}, + doi = {10.1137/0916069}, + number = {5}, + journal = {SIAM Journal on Scientific Computing}, + author = {Byrd, Richard H. and Lu, Peihuang and Nocedal, Jorge and Zhu, Ciyou}, + month = sep, + year = {1995}, + note = {Publisher: Society for Industrial and Applied Mathematics}, + pages = {1190--1208}, +} + % --- C % % @@ -882,4 +909,17 @@ @article{ZhangSra:2018 TITLE = {Towards Riemannian accelerated gradient methods}, URL = {https://arxiv.org/abs/1806.02812}, YEAR = {2018}, -} \ No newline at end of file +} + +@article{ZhuByrdLuNocedal:1997, + title = {Algorithm 778: {L}-{BFGS}-{B}: {Fortran} subroutines for large-scale bound-constrained optimization}, + volume = {23}, + issn = {0098-3500}, + doi = {10.1145/279232.279236}, + number = {4}, + journal = {ACM Trans. Math. Softw.}, + author = {Zhu, Ciyou and Byrd, Richard H. and Lu, Peihuang and Nocedal, Jorge}, + month = dec, + year = {1997}, + pages = {550--560}, +} diff --git a/docs/src/solvers/generalized_cauchy_direction_subsolver.md b/docs/src/solvers/generalized_cauchy_direction_subsolver.md new file mode 100644 index 0000000000..cedee9c719 --- /dev/null +++ b/docs/src/solvers/generalized_cauchy_direction_subsolver.md @@ -0,0 +1,73 @@ +# Generalized Cauchy direction subsolver + +The generalized Cauchy direction (GCD) subsolver is a component in optimization algorithms that handle problems with bound constraints. 
It solves the following problem + +```math +\begin{align*} +\operatorname*{arg\,min}_{Y ∈ T_p (D \times \mathcal{M})}&\ m_p(Y), \qquad m_p(Y) = ⟨X_g, Y⟩_p + \frac{1}{2} ⟨\mathcal{H}_p[Y], Y⟩_p\\ +\text{such that}& \ \exp_p(Y) = \exp_p(\alpha X) \in D \times \mathcal{M} \text{ for some } \alpha \in [0, A] +\end{align*} +``` + +where $X=(X_{\mathrm{D}}, X_{\mathcal{M}})$ is a given direction, the exponential map handles projection of the tangent vector when reaching the boundary, $D$ is a box domain ([`Hyperrectangle`](@extref Manifolds.Hyperrectangle)), $\mathcal{M}$ is a Riemannian manifold, $X_g$ is the gradient of a scalar function $f$ at the point $p=(p_{\mathrm{D}}, p_{\mathcal{M}})$, $A$ is the maximum allowed step size on $\mathcal{M}$ at $p$ in direction $X_{\mathcal{M}}$ (infinity is supported), and $\mathcal{H}_p$ is a linear operator that approximates the Hessian of $f$ at $p$. + +Additionally, the subsolver indicates whether the selected direction $Y$ reaches the boundary of $D$ at some point, in which case the subsequent step size selection in direction $Y$ needs to be limited to the interval $[0, s_{\max}]$, where $1 ≤ s_{\max} ≤ ∞$ is also returned by the subsolver. +Note that the value $s_{\max}=1$ is obtained when the minimum lies at the boundary of $D$, while larger values indicate that the boundary is further away along the selected direction $X$. + +The subsolver is currently intended primarily for internal use by optimization algorithms that require bound-constrained subproblem solutions. + +## Simple stepsize limiting + +In case no Hessian approximation is available, a simple stepsize limiting procedure can be used to limit the stepsize in direction $X$ to the maximum allowed by the boundary of $D$ and the maximum allowed stepsize on $\mathcal{M}$. 
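The box part of this limiting reduces to a per-coordinate minimum. A minimal standalone sketch in plain Julia (`max_step_in_box` is an illustrative helper, not part of the Manopt API):

```julia
# Largest t ≥ 0 such that p + t*d stays inside the box [lb, ub], coordinate-wise.
function max_step_in_box(lb, ub, p, d)
    t = Inf
    for i in eachindex(p, d)
        if d[i] > 0
            t = min(t, (ub[i] - p[i]) / d[i])
        elseif d[i] < 0
            t = min(t, (lb[i] - p[i]) / d[i])
        end # d[i] == 0 imposes no restriction for this coordinate
    end
    return t
end

max_step_in_box([0.0, 0.0], [1.0, 2.0], [0.5, 0.5], [1.0, -1.0])  # 0.5
```

On a product $D \times \mathcal{M}$, the overall limit is then the minimum of this value and the maximum allowed stepsize on $\mathcal{M}$.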
+This procedure is available using the following: + +```@docs +Manopt.MaxStepsizeInDirectionSubsolver +Manopt.find_max_stepsize_in_direction +``` + +## Internal types and methods + +### Symbols related to the GCD computation + +These symbols are directly used by solvers to compute the descent direction corresponding to the generalized Cauchy direction. + +```@docs +Manopt.has_anisotropic_max_stepsize +Manopt.find_generalized_cauchy_direction! +Manopt.GeneralizedCauchyDirectionSubsolver +``` + +### Symbols related to the Hessian approximation + +These symbols are used to evaluate the Hessian approximation at specific tangent vectors during the generalized Cauchy direction computation. + +```@docs +Manopt.hessian_value +Manopt.hessian_value_diag +``` + +### Symbols related to bound handling + +These are internal symbols used to manage and manipulate bound constraints during the GCD computation. + +```@docs +Manopt.init_updater! +Manopt.UnitVector +Manopt.to_coordinate_index +Manopt.AbstractSegmentHessianUpdater +Manopt.GenericSegmentHessianUpdater +Manopt.get_bounds_index +Manopt.get_stepsize_bound +Manopt.get_at_bound_index +Manopt.set_stepsize_bound! +Manopt.set_zero_at_index! +``` + +### Symbols related to specific Hessian approximations + +```@docs +Manopt.LimitedMemorySegmentHessianUpdater +Manopt.hessian_value_from_inner_products +Manopt.update_current_scale! +``` diff --git a/docs/src/solvers/quasi_Newton.md b/docs/src/solvers/quasi_Newton.md index 02c74365a5..2d608a34db 100644 --- a/docs/src/solvers/quasi_Newton.md +++ b/docs/src/solvers/quasi_Newton.md @@ -88,6 +88,7 @@ QuasiNewtonLimitedMemoryDirectionUpdate QuasiNewtonCautiousDirectionUpdate Manopt.initialize_update! 
QuasiNewtonPreconditioner +QuasiNewtonLimitedMemoryBoxDirectionUpdate ``` ## Hessian update rules diff --git a/ext/ManoptLineSearchesExt.jl b/ext/ManoptLineSearchesExt.jl index c58734e57c..5343e05289 100644 --- a/ext/ManoptLineSearchesExt.jl +++ b/ext/ManoptLineSearchesExt.jl @@ -5,6 +5,37 @@ import Manopt: LineSearchesStepsize using ManifoldsBase using LineSearches +Manopt.linesearches_get_max_alpha(ls::LineSearches.HagerZhang) = ls.alphamax +Manopt.linesearches_get_max_alpha(ls::LineSearches.MoreThuente) = ls.alphamax + +function Manopt.linesearches_set_max_alpha(ls::LineSearches.HagerZhang{T, Tm}, max_alpha::T) where {T, Tm} + return HagerZhang{T, Tm}( + delta = ls.delta, + sigma = ls.sigma, + alphamax = max_alpha, + rho = ls.rho, + epsilon = ls.epsilon, + gamma = ls.gamma, + linesearchmax = ls.linesearchmax, + psi3 = ls.psi3, + display = ls.display, + mayterminate = ls.mayterminate, + cache = ls.cache, + check_flatness = ls.check_flatness, + ) +end +function Manopt.linesearches_set_max_alpha(ls::LineSearches.MoreThuente{T}, max_alpha::T) where {T} + return MoreThuente{T}( + f_tol = ls.f_tol, + gtol = ls.gtol, + x_tol = ls.x_tol, + alphamin = ls.alphamin, + alphamax = max_alpha, + maxfev = ls.maxfev, + cache = ls.cache + ) +end + function (cs::Manopt.LineSearchesStepsize)( mp::AbstractManoptProblem, s::AbstractManoptSolverState, @@ -24,6 +55,19 @@ function (cs::Manopt.LineSearchesStepsize)( # guess initial alpha α0 = cs.initial_guess(mp, s, k, cs.last_stepsize, η; lf0 = fp, Dlf0 = dphi_0) + # handle stepsize limit + local ls # COV_EXCL_LINE + if :stop_when_stepsize_exceeds in keys(kwargs) + new_max_alpha = min( + kwargs[:stop_when_stepsize_exceeds], + Manopt.linesearches_get_max_alpha(cs.linesearch), + ) + ls = Manopt.linesearches_set_max_alpha(cs.linesearch, new_max_alpha) + α0 = min(α0, new_max_alpha) + else + ls = cs.linesearch + end + # perform actual line-search function ϕ(α) @@ -41,7 +85,7 @@ function (cs::Manopt.LineSearchesStepsize)( return 
Manopt.get_cost_and_differential(mp, p_tmp, Y_tmp; Y = X_tmp) end - α, fp = cs.linesearch(ϕ, dϕ, ϕdϕ, α0, fp, dphi_0) + α, fp = ls(ϕ, dϕ, ϕdϕ, α0, fp, dphi_0) cs.last_stepsize = α return α end diff --git a/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl b/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl index b1e027d713..a502802751 100644 --- a/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl +++ b/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl @@ -2,7 +2,7 @@ module ManoptManifoldsExt using ManifoldsBase: exp, log, ParallelTransport, vector_transport_to using Manopt -using Manopt: _math, _tex, ManifoldDefaultsFactory, _produce_type +using Manopt: _math, _tex, ManifoldDefaultsFactory, _produce_type, get_stepsize_bound import Manopt: max_stepsize, get_gradient, diff --git a/ext/ManoptManifoldsExt/manifold_functions.jl b/ext/ManoptManifoldsExt/manifold_functions.jl index d063cbdf5b..44b5b2b317 100644 --- a/ext/ManoptManifoldsExt/manifold_functions.jl +++ b/ext/ManoptManifoldsExt/manifold_functions.jl @@ -8,6 +8,33 @@ Manopt.default_point_distance(::Euclidean, p) = norm(p, Inf) Manopt.default_vector_norm(::Euclidean, p, X) = norm(p, Inf) +""" + get_bounds_index(M::Hyperrectangle) + +Get the bound indices of [`Hyperrectangle`](@extref Manifolds.Hyperrectangle) `M`. They are the same as the indices of the +lower (or upper) bounds. +""" +Manopt.get_bounds_index(M::Hyperrectangle) = eachindex(M.lb) +""" + get_stepsize_bound(M::Hyperrectangle, p, d, i) + +Get the upper bound on moving in direction `d` from point `p` on [`Hyperrectangle`](@extref Manifolds.Hyperrectangle) `M`, +for the bound index `i`. There are three cases: + +1. If `d[i] > 0`, the formula reads `(M.ub[i] - p[i]) / d[i]`. +2. If `d[i] < 0`, the formula reads `(M.lb[i] - p[i]) / d[i]`. +3. If `d[i] == 0`, the result is `Inf`. 
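For illustration, the three cases can be evaluated in plain Julia (values chosen arbitrarily; `bound` is a hypothetical stand-in for this method, not the Manopt API):

```julia
# lb/ub: box bounds, p: point, d: direction; mirrors the three cases above.
lb, ub = [0.0, 0.0, 0.0], [2.0, 3.0, 1.0]
p, d = [1.0, 1.0, 0.5], [0.5, -2.0, 0.0]
bound(i) = d[i] > 0 ? (ub[i] - p[i]) / d[i] :
    d[i] < 0 ? (lb[i] - p[i]) / d[i] : Inf
bound.(1:3)  # [2.0, 0.5, Inf]
```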
+""" +function Manopt.get_stepsize_bound(M::Hyperrectangle, p, d, i) + if d[i] > 0 + return (M.ub[i] - p[i]) / d[i] + elseif d[i] < 0 + return (M.lb[i] - p[i]) / d[i] + else + return Inf + end +end + """ max_stepsize(M::TangentBundle, p) @@ -40,17 +67,16 @@ The default maximum stepsize for `Hyperrectangle` manifold with corners is maxim of distances from `p` to each boundary. """ function max_stepsize(M::Hyperrectangle, p) - ms = 0.0 - for i in eachindex(M.lb, p) - dist_ub = M.ub[i] - p[i] - if dist_ub > 0 - ms = max(ms, dist_ub) - end - dist_lb = p[i] - M.lb[i] - if dist_lb > 0 - ms = max(ms, dist_lb) - end - end + lb = M.lb + ub = M.ub + ms = zero(eltype(p)) + @inbounds @simd for i in eachindex(lb, ub, p) + dist_ub = ub[i] - p[i] + dist_lb = p[i] - lb[i] + cand_ub = ifelse(dist_ub > 0, dist_ub, zero(dist_ub)) + cand_lb = ifelse(dist_lb > 0, dist_lb, zero(dist_lb)) + ms = max(ms, max(cand_ub, cand_lb)) + end # COV_EXCL_LINE return ms end function max_stepsize(M::Hyperrectangle) @@ -60,6 +86,9 @@ function max_stepsize(M::Hyperrectangle) end return ms end +function max_stepsize(M::ProbabilitySimplex) + return 1.0 +end """ mid_point(M, p, q, x) @@ -168,3 +197,53 @@ function reflect!( X .*= -1 return retract!(M, q, p, X, retraction_method) end + + +""" + Manopt.set_zero_at_index!(M::Hyperrectangle, d, i) + +Set element of tangent vector `d` on [`Hyperrectangle`](@extref Manifolds.Hyperrectangle) +at index `i` to 0. +""" +function Manopt.set_zero_at_index!(M::Hyperrectangle, d, i) + d[i] = 0 + return d +end + +""" + Manopt.set_stepsize_bound!(M::Hyperrectangle, d_out, p, d, t_current::Real) + +For each element `i` in the tangent vector `d_out`, if the stepsize bound in direction `d` +for that element is less than `t_current`, set the element of `d_out` to the distance from +`p[i]` to the bound in the direction of `d[i]`. If the stepsize bound is non-positive, +set the element to 0. 
+""" +function Manopt.set_stepsize_bound!(M::Hyperrectangle, d_out, p, d, t_current::Real) + + for i in eachindex(d_out, d) + bound = get_stepsize_bound(M, p, d, i) + if bound > 0 + if bound < t_current && d_out[i] != 0 + d_out[i] = d[i] > 0 ? M.ub[i] - p[i] : M.lb[i] - p[i] + end + else + d_out[i] = 0 + end + end + return d_out +end + +""" + Manopt.has_anisotropic_max_stepsize(::Hyperrectangle) + +Returns `true`, as `Hyperrectangle` manifold requires generalized Cauchy point computation in solvers. +""" +Manopt.has_anisotropic_max_stepsize(::Hyperrectangle) = true + +""" + Manopt.get_at_bound_index(::Hyperrectangle, X, b) + +Extract the element of tangent vector `X` to a point on [`Hyperrectangle`](@extref Manifolds.Hyperrectangle) +at index `b`. +""" +Manopt.get_at_bound_index(::Hyperrectangle, X, b) = X[b] diff --git a/src/Manopt.jl b/src/Manopt.jl index 2abba31cc2..b43c83e71e 100644 --- a/src/Manopt.jl +++ b/src/Manopt.jl @@ -16,11 +16,11 @@ import LinearAlgebra: reflect! import ManifoldsBase: embed!, plot_slope, prepare_check_result, find_best_slope_window import ManifoldsBase: base_manifold, base_point, get_basis import ManifoldsBase: project, project! -import LinearAlgebra: cross +import LinearAlgebra: cross, LowerTriangular using ColorSchemes using ColorTypes using Colors -using DataStructures: CircularBuffer, capacity, length, push!, size, isfull +using DataStructures: CircularBuffer, capacity, length, push!, size, isfull, heapify!, heappop! 
using Dates: Millisecond, Nanosecond, Period, canonicalize, value using Glossaries using LinearAlgebra: @@ -142,6 +142,7 @@ using ManifoldsBase: set_component!, shortest_geodesic, shortest_geodesic!, + submanifold_component, submanifold_components, vector_transport_to, vector_transport_to!, @@ -427,7 +428,7 @@ export CondensedKKTVectorField, CondensedKKTVectorFieldJacobian export SymmetricLinearSystemObjective export ProximalGradientNonsmoothCost, ProximalGradientNonsmoothSubgradient -export QuasiNewtonState, QuasiNewtonLimitedMemoryDirectionUpdate +export QuasiNewtonState, QuasiNewtonLimitedMemoryDirectionUpdate, QuasiNewtonLimitedMemoryBoxDirectionUpdate export QuasiNewtonMatrixDirectionUpdate export QuasiNewtonPreconditioner export QuasiNewtonCautiousDirectionUpdate, @@ -550,6 +551,7 @@ export get_stepsize, get_initial_stepsize, get_last_stepsize export InteriorPointCentralityCondition export DomainBackTracking, DomainBackTrackingStepsize, NullStepBackTrackingStepsize export ProximalGradientMethodBacktracking +export HagerZhangLinesearch # # Stopping Criteria export StoppingCriterion, StoppingCriterionSet @@ -572,6 +574,7 @@ export StopAfter, StopWhenGradientChangeLess, StopWhenGradientMappingNormLess, StopWhenGradientNormLess, + StopWhenProjectedNegativeGradientNormLess, StopWhenFirstOrderProgress, StopWhenIterateNaN, StopWhenKKTResidualLess, @@ -583,6 +586,7 @@ export StopAfter, StopWhenPopulationDiverges, StopWhenPopulationStronglyConcentrated, StopWhenProjectedGradientStationary, + StopWhenRelativeAPosterioriCostChangeLessOrEqual, StopWhenRelativeResidualLess, StopWhenRepeated, StopWhenSmallerOrEqual, diff --git a/src/helpers/LineSearchesTypes.jl b/src/helpers/LineSearchesTypes.jl index dc01c98e38..17b3c9c442 100644 --- a/src/helpers/LineSearchesTypes.jl +++ b/src/helpers/LineSearchesTypes.jl @@ -12,7 +12,8 @@ Wrapper for line searches available in the `LineSearches.jl` library. 
Wrap `linesearch` (for example [`HagerZhang`](https://julianlsolvers.github.io/LineSearches.jl/latest/reference/linesearch.html#LineSearches.HagerZhang) or [`MoreThuente`](https://julianlsolvers.github.io/LineSearches.jl/latest/reference/linesearch.html#LineSearches.MoreThuente)). -The initial step selection from Linesearches.jl is not yet supported and the value 1.0 is used. +The initial step selection from LineSearches.jl is not yet supported and `initial_guess` is +always used (by default [`ConstantInitialGuess`](@ref)). # Keyword Arguments @@ -54,6 +55,26 @@ function LineSearchesStepsize( linesearch, initial_guess, retraction_method, vector_transport_method, last_stepsize ) end + +""" + linesearches_get_max_alpha(ls) + +Get the maximum step size for `LineSearches.jl` line search `ls`. +""" +linesearches_get_max_alpha(ls) + +function linesearches_get_max_alpha end + +""" + linesearches_set_max_alpha(ls, max_alpha::Real) + +Set the maximum step size for `LineSearches.jl` line search `ls` to `max_alpha`. +Return a new line search object with the updated maximum step size. +""" +linesearches_set_max_alpha(ls, max_alpha::Real) + +function linesearches_set_max_alpha end + function Base.show(io::IO, cs::LineSearchesStepsize) return print( io, diff --git a/src/plans/box_plan.jl b/src/plans/box_plan.jl new file mode 100644 index 0000000000..8a03898135 --- /dev/null +++ b/src/plans/box_plan.jl @@ -0,0 +1,863 @@ +""" + has_anisotropic_max_stepsize(M::AbstractManifold) + +Return `true` if `M` has a `max_stepsize` that depends on the direction, +for example if `M` is a `Hyperrectangle`-like manifold with corners, or a product of such a +manifold with a standard one. Otherwise return `false`. 
+""" +has_anisotropic_max_stepsize(::AbstractManifold) = false +has_anisotropic_max_stepsize(M::ProductManifold) = any(has_anisotropic_max_stepsize, M.manifolds) + +@doc raw""" + QuasiNewtonLimitedMemoryBoxDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate + +An approximation of the Hessian of a scalar function of the form ``B_0 = θ I``, +``B_{k+1} = B_k - W_k M_k W_k^{\mathrm{T}}``, +where ``θ > 0`` is an initial scaling guess. +Matrix ``M_k = \left(\begin{smallmatrix}M₁₁ & M₂₁^{\mathrm{T}}\\ M₂₁ & M₂₂\end{smallmatrix}\right)`` +is stored using its blocks. +The matrix ``W_k`` is (implicitly) composed from `memory_y` and `memory_s` stored in `qn_du` +of type [`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref). + +The initial scale ``θ`` is stored in the field `initial_scale`, but if the memory is not empty, +the current scale is set to the squared norm of ``y_k`` divided by the inner product of ``s_k`` and ``y_k`` (and by `initial_scale`), +where ``k`` is the most recent index for which the denominator is nonzero. + +`last_gcd_result` stores the result of the last generalized Cauchy direction search. + +See [ByrdNocedalSchnabel:1994](@cite) for details. 
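As a standalone sanity check of the compact representation (plain Julia with arbitrary values, independent of Manopt's types), a single stored pair reproduces the direct BFGS update of ``B_0 = θ I``:

```julia
using LinearAlgebra
# With one (s, y) pair: B₁ = θI - W * inv(Minner) * W' with W = [y  θs] and
# Minner = [-sᵀy 0; 0 θsᵀs] equals the direct BFGS update of B₀ = θI.
θ = 2.0
s, y = [1.0, 0.5], [0.8, 0.3]
W = hcat(y, θ .* s)
Minner = [-dot(s, y) 0.0; 0.0 θ * dot(s, s)]
B_compact = θ * I(2) - W * inv(Minner) * W'
B0 = θ * I(2)
B_bfgs = B0 - (B0 * s) * (s' * B0) / dot(s, B0 * s) + y * y' / dot(y, s)
B_compact ≈ B_bfgs  # true
```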
+""" +mutable struct QuasiNewtonLimitedMemoryBoxDirectionUpdate{ + TDU <: QuasiNewtonLimitedMemoryDirectionUpdate, + F <: Real, + T_HM <: AbstractMatrix, + V <: AbstractVector, + } <: AbstractQuasiNewtonDirectionUpdate + # this approximates inverse Hessian + qn_du::TDU + + # fields for approximating the Hessian + current_scale::F + M_11::T_HM + M_21::T_HM + M_22::T_HM + # buffer for calculating W_k blocks + buffer_inner_Sk_X::V + buffer_inner_Sk_Y::V + buffer_inner_Yk_X::V + buffer_inner_Yk_Y::V + last_gcd_result::Symbol + last_gcd_stepsize::F +end + +function status_summary(d::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + s = "limited memory direction update with support for box constraints; " + s *= "internal direction update status: $(status_summary(d.qn_du))" + return s +end + +function get_parameter(d::QuasiNewtonLimitedMemoryBoxDirectionUpdate, ::Val{:max_stepsize}) + if d.last_gcd_result === :found_limited + return d.last_gcd_stepsize + else + return Inf + end +end + +function QuasiNewtonLimitedMemoryBoxDirectionUpdate( + qn_du::QuasiNewtonLimitedMemoryDirectionUpdate{<:AbstractQuasiNewtonUpdateRule, T, F} + ) where {T, F <: Real} + memory_size = capacity(qn_du.memory_s) + M_11 = zeros(F, memory_size, memory_size) + M_21 = zeros(F, memory_size, memory_size) + M_22 = zeros(F, memory_size, memory_size) + buffer_inner_Sk_X = zeros(F, memory_size) + buffer_inner_Sk_Y = zeros(F, memory_size) + buffer_inner_Yk_X = zeros(F, memory_size) + buffer_inner_Yk_Y = zeros(F, memory_size) + return QuasiNewtonLimitedMemoryBoxDirectionUpdate{ + typeof(qn_du), F, typeof(M_11), typeof(buffer_inner_Sk_X), + }( + qn_du, + qn_du.initial_scale, + M_11, + M_21, + M_22, + buffer_inner_Sk_X, + buffer_inner_Sk_Y, + buffer_inner_Yk_X, + buffer_inner_Yk_Y, + :not_searched, + NaN, + ) +end + +function initialize_update!(ha::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + initialize_update!(ha.qn_du) + ha.last_gcd_result = :not_searched + return ha +end + +function 
(d::QuasiNewtonLimitedMemoryBoxDirectionUpdate)( + mp::AbstractManoptProblem, st + ) + r = zero_vector(get_manifold(mp), get_iterate(st)) + return d(r, mp, st) +end +function (d::QuasiNewtonLimitedMemoryBoxDirectionUpdate)( + r, mp::AbstractManoptProblem, st + ) + d.qn_du(r, mp, st) + M = get_manifold(mp) + p = get_iterate(st) + X = get_gradient(st) + gcd = GeneralizedCauchyDirectionSubsolver(M, p, d) + d.last_gcd_result, d.last_gcd_stepsize = find_generalized_cauchy_direction!(M, gcd, r, p, r, X) + return r +end + +get_update_vector_transport(u::QuasiNewtonLimitedMemoryBoxDirectionUpdate) = get_update_vector_transport(u.qn_du) + +function get_at_bound_index(M::ProductManifold, X, b::Tuple{Int, Any}) + return get_at_bound_index(M.manifolds[b[1]], submanifold_component(M, X, b[1]), b[2]) +end + +@doc raw""" + hessian_value_diag(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X) + +Compute ``⟨X, B X⟩``, where ``B`` is the (1, 1)-Hessian represented by `gh`. +""" +function hessian_value_diag(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X) + m = length(gh.qn_du.memory_s) + num_nonzero_rho = count(!iszero, gh.qn_du.ρ) + + normX_sqr = norm(M, p, X)^2 + + if m == 0 || num_nonzero_rho == 0 + return gh.qn_du.initial_scale \ normX_sqr + end + + ii = 1 + for i in 1:m + iszero(gh.qn_du.ρ[i]) && continue + gh.buffer_inner_Yk_X[ii] = inner(M, p, gh.qn_du.memory_y[i], X) + gh.buffer_inner_Sk_X[ii] = gh.current_scale * inner(M, p, gh.qn_du.memory_s[i], X) + + ii += 1 + end + buffer_inner_Yk = view(gh.buffer_inner_Yk_X, 1:num_nonzero_rho) + buffer_inner_Sk = view(gh.buffer_inner_Sk_X, 1:num_nonzero_rho) + + return hessian_value_from_inner_products(gh, normX_sqr, buffer_inner_Yk, buffer_inner_Sk, buffer_inner_Yk, buffer_inner_Sk) +end + +@doc raw""" + hessian_value_diag(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X::UnitVector) + +Compute ``⟨X, B X⟩``, where ``B`` is the (1, 1)-Hessian represented by 
`gh`, and `X` is the +[`UnitVector`](@ref). +""" +function hessian_value_diag(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X::UnitVector) + b = X.index + m = length(gh.qn_du.memory_s) + num_nonzero_rho = count(!iszero, gh.qn_du.ρ) + + if m == 0 || num_nonzero_rho == 0 + return inv(gh.qn_du.initial_scale) + end + + ii = 1 + for i in 1:m + iszero(gh.qn_du.ρ[i]) && continue + gh.buffer_inner_Yk_X[ii] = get_at_bound_index(M, gh.qn_du.memory_y[i], b) + gh.buffer_inner_Sk_X[ii] = gh.current_scale * get_at_bound_index(M, gh.qn_du.memory_s[i], b) + + ii += 1 + end + buffer_inner_Yk = view(gh.buffer_inner_Yk_X, 1:num_nonzero_rho) + buffer_inner_Sk = view(gh.buffer_inner_Sk_X, 1:num_nonzero_rho) + + return hessian_value_from_inner_products(gh, one(eltype(gh.qn_du.ρ)), buffer_inner_Yk, buffer_inner_Sk, buffer_inner_Yk, buffer_inner_Sk) +end + +@doc raw""" + hessian_value(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X::UnitVector, Y) + +Compute ``⟨X, B Y⟩``, where ``B`` is the (1, 1)-Hessian represented by `gh`, where `X` is the +[`UnitVector`](@ref). 
+""" +function hessian_value(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X::UnitVector, Y) + b = X.index + + m = length(gh.qn_du.memory_s) + num_nonzero_rho = count(!iszero, gh.qn_du.ρ) + + Yb = get_at_bound_index(M, Y, b) + if m == 0 || num_nonzero_rho == 0 + # empty memory: scale by inv(initial_scale), consistent with hessian_value_diag + return gh.qn_du.initial_scale \ Yb + end + + ii = 1 + for i in 1:m + iszero(gh.qn_du.ρ[i]) && continue + gh.buffer_inner_Yk_X[ii] = get_at_bound_index(M, gh.qn_du.memory_y[i], b) + gh.buffer_inner_Sk_X[ii] = gh.current_scale * get_at_bound_index(M, gh.qn_du.memory_s[i], b) + + gh.buffer_inner_Yk_Y[ii] = inner(M, p, gh.qn_du.memory_y[i], Y) + gh.buffer_inner_Sk_Y[ii] = gh.current_scale * inner(M, p, gh.qn_du.memory_s[i], Y) + ii += 1 + end + buffer_inner_Yk_X = view(gh.buffer_inner_Yk_X, 1:num_nonzero_rho) + buffer_inner_Yk_Y = view(gh.buffer_inner_Yk_Y, 1:num_nonzero_rho) + buffer_inner_Sk_X = view(gh.buffer_inner_Sk_X, 1:num_nonzero_rho) + buffer_inner_Sk_Y = view(gh.buffer_inner_Sk_Y, 1:num_nonzero_rho) + + return hessian_value_from_inner_products(gh, Yb, buffer_inner_Yk_X, buffer_inner_Sk_X, buffer_inner_Yk_Y, buffer_inner_Sk_Y) +end + +@doc raw""" + update_current_scale!(M::AbstractManifold, p, gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + +Refresh the scaling factor and blockwise Hessian approximation stored in `gh` using the +nonzero curvature pairs currently in memory. + +- Identifies the most recent index with nonzero ``ρ_i`` to scale the initial Hessian guess + by ``ρ_i‖y_i‖^2 / θ``. +- Builds ``L_k`` and ``S_k^\top S_k`` from the stored ``(s_i, y_i)`` pairs and updates the + block matrices ``M₁₁``, ``M₂₁``, and ``M₂₂`` via the blockwise inverse formula. +- If all ``ρ_i`` vanish, resets `current_scale` to the inverse of `initial_scale` and + clears the block matrices. + +Returns the mutated `gh`. 
+""" +function update_current_scale!(M::AbstractManifold, p, gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + m = length(gh.qn_du.memory_s) + last_safe_index = -1 + for i in eachindex(gh.qn_du.ρ) + if abs(gh.qn_du.ρ[i]) > 0 + last_safe_index = i + end + end + + if (last_safe_index == -1) + # All memory yield zero inner products + gh.current_scale = inv(gh.qn_du.initial_scale) + gh.M_11 = fill(0.0, 0, 0) + gh.M_21 = fill(0.0, 0, 0) + gh.M_22 = fill(0.0, 0, 0) + return gh + end + + invA = Diagonal([-ri for ri in gh.qn_du.ρ if !iszero(ri)]) + num_nonzero_rho = count(!iszero, gh.qn_du.ρ) + + Lk = LowerTriangular(zeros(num_nonzero_rho, num_nonzero_rho)) + + # total scaling factor for the initial Hessian + # written this way to avoid floating point overflow (when ynorm is finite but ynorm^2 is Inf) + # see CUTEst EXPQUAD problem for an example + ynorm = norm(M, p, gh.qn_du.memory_y[last_safe_index]) + gh.current_scale = ((gh.qn_du.ρ[last_safe_index] * ynorm) * ynorm) / gh.qn_du.initial_scale + + tsksk = Symmetric(zeros(num_nonzero_rho, num_nonzero_rho)) + ii = 1 + # fill Dk and Lk + for i in 1:m + iszero(gh.qn_du.ρ[i]) && continue + jj = 1 + for j in 1:m + iszero(gh.qn_du.ρ[j]) && continue + if jj < ii + Lk[ii, jj] = inner(M, p, gh.qn_du.memory_s[i], gh.qn_du.memory_y[j]) + end + if ii <= jj + tsksk.data[ii, jj] = inner(M, p, gh.qn_du.memory_s[i], gh.qn_du.memory_s[j]) + end + jj += 1 + end + ii += 1 + end + tsksk.data .*= gh.current_scale + + # matrix inversion using the blockwise formula for speed + # Schur complement of -Dk is the only non-diagonal matrix we actually need to inverse in this step + W1 = Lk * invA + W2 = W1 * Lk' + gh.M_22 = inv(Symmetric(tsksk - W2)) + W3 = gh.M_22 * W1 + W4 = W1' * W3 + + gh.M_11 = invA + W4 + gh.M_21 = -W3 + + return gh +end + +@doc raw""" + hessian_value_from_inner_products(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, iss::Real, cy1, cs1, cy2, cs2) + +Evaluate the quadratic form defined by the current blockwise Hessian 
approximation stored in +`gh`, given precomputed coordinate vectors. + +Arguments: +- `iss`: inner product of original vectors. +- `cy1`, `cy2`: coordinates of ``y``-like vectors in the ``Y_k`` basis. +- `cs1`, `cs2`: coordinates of ``s``-like vectors in the scaled ``S_k`` basis. + +The result is ``θ·iss - cy₁ᵀ M₁₁ cy₂ - 2·cs₁ᵀ M₂₁ cy₂ - cs₁ᵀ M₂₂ cs₂`` using the blocks +``M₁₁``, ``M₂₁``, ``M₂₂`` stored in `gh` and the current scale ``θ``. Returns the scalar value. +""" +function hessian_value_from_inner_products(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, iss::Real, cy1, cs1, cy2, cs2) + result = gh.current_scale * iss + if length(cy1) == 0 + return result + end + result -= dot(cy1, gh.M_11, cy2) + result -= 2 * dot(cs1, gh.M_21, cy2) + result -= dot(cs1, gh.M_22, cs2) + + return result +end + + +@doc raw""" + update_hessian!(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, p) + +Update Hessian approximation `gh` by moving it to point `p` and updating the stored `s` and +`y` vectors. +""" +function update_hessian!( + gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, + mp::AbstractManoptProblem, + st::AbstractManoptSolverState, + p_old, + k::Int, + ) + (capacity(gh.qn_du.memory_s) == 0) && return gh + update_hessian!(gh.qn_du, mp, st, p_old, k) + update_current_scale!(get_manifold(mp), get_iterate(st), gh) + return gh +end + + +""" + abstract type AbstractSegmentHessianUpdater end + +Abstract type for methods that calculate f' and f'' in the GCD calculation in subsequent +line segments in [`GeneralizedCauchyDirectionSubsolver`](@ref). +""" +abstract type AbstractSegmentHessianUpdater end + +""" + init_updater!(::AbstractManifold, hessian_segment_updater::AbstractSegmentHessianUpdater, p, d, ha::AbstractQuasiNewtonDirectionUpdate) + +Method for initialization of `AbstractSegmentHessianUpdater` `hessian_segment_updater` just before the loop +that examines subsequent intervals for GCD. 
+""" +init_updater!(::AbstractManifold, hessian_segment_updater::AbstractSegmentHessianUpdater, p, d, ha::AbstractQuasiNewtonDirectionUpdate) + +""" + struct GenericSegmentHessianUpdater <: AbstractSegmentHessianUpdater end + +Generic f' and f'' calculation that only relies on `hessian_value` but is relatively slow for +high-dimensional domains. +""" +struct GenericSegmentHessianUpdater{TX} <: AbstractSegmentHessianUpdater + d_z::TX + d_tmp::TX +end + +function get_default_hessian_segment_updater(M::AbstractManifold, p, ::AbstractQuasiNewtonDirectionUpdate) + return GenericSegmentHessianUpdater(zero_vector(M, p), zero_vector(M, p)) +end + +function init_updater!(M::AbstractManifold, hessian_segment_updater::GenericSegmentHessianUpdater, p, d, ha::AbstractQuasiNewtonDirectionUpdate) + zero_vector!(M, hessian_segment_updater.d_z, p) + copyto!(M, hessian_segment_updater.d_tmp, d) + return hessian_segment_updater +end + +@doc raw""" + (upd::GenericSegmentHessianUpdater)(M::AbstractManifold, p, t::Real, dt::Real, b, db, ha::AbstractQuasiNewtonDirectionUpdate) + +Calculate Hessian values ``⟨e_b, B d_z⟩`` and ``⟨e_b, B d_tmp⟩`` for the generalized Cauchy +point line search using the generic approach via `hessian_value` with [`UnitVector`](@ref). +``d_z`` start with 0 and is updated in-place by adding `dt * d` to it. +""" +function (upd::GenericSegmentHessianUpdater)(M::AbstractManifold, p, t::Real, dt::Real, b, db, ha) + upd.d_z .+= dt .* upd.d_tmp + hv_eb_dz = hessian_value(ha, M, p, UnitVector(b), upd.d_z) + hv_eb_d = hessian_value(ha, M, p, UnitVector(b), upd.d_tmp) + + set_zero_at_index!(M, upd.d_tmp, b) + + return hv_eb_dz, hv_eb_d +end + +""" + struct LimitedMemorySegmentHessianUpdater{TV <: AbstractVector} <: AbstractSegmentHessianUpdater + +Hessian value calculation for generalized Cauchy direction line segments that is optimized for +[`QuasiNewtonLimitedMemoryBoxDirectionUpdate`](@ref). It relies on a specific Hessian structure. 
+""" +struct LimitedMemorySegmentHessianUpdater{TV <: AbstractVector} <: AbstractSegmentHessianUpdater + p_s::TV + p_y::TV + c_s::TV + c_y::TV +end + +function get_default_hessian_segment_updater(::AbstractManifold, p, ha::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + return LimitedMemorySegmentHessianUpdater(similar(ha.qn_du.ρ), similar(ha.qn_du.ρ), similar(ha.qn_du.ρ), similar(ha.qn_du.ρ)) +end + +function init_updater!(M::AbstractManifold, hessian_segment_updater::LimitedMemorySegmentHessianUpdater, p, d, ha::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + fill!(hessian_segment_updater.c_s, 0) + fill!(hessian_segment_updater.c_y, 0) + ii = 1 + for i in eachindex(ha.qn_du.ρ) + if iszero(ha.qn_du.ρ[i]) + continue + end + + hessian_segment_updater.p_s[ii] = ha.current_scale * inner(M, p, ha.qn_du.memory_s[i], d) + hessian_segment_updater.p_y[ii] = inner(M, p, ha.qn_du.memory_y[i], d) + ii += 1 + end + return hessian_segment_updater +end + +@doc raw""" + (hessian_segment_updater::LimitedMemorySegmentHessianUpdater)( + M::AbstractManifold, p, + t::Real, dt::Real, b, db, ha::QuasiNewtonLimitedMemoryBoxDirectionUpdate + ) + +Calculate Hessian values ``⟨e_b, B d_z⟩`` and ``⟨e_b, B d⟩`` for the generalized Cauchy +point line search using the limited-memory block Hessian stored in `ha`. +``d_z`` start with 0 and is updated in-place by adding `dt * d` to it. + +## Arguments: + +- `M`: manifold. +- `p`: current iterate. +- `t`: current step length from `p`. +- `dt`: step length increment from the last step. +- `b`: bound index of the current segment. +- `db`: search direction component at the bound index `b`. + +The updater reuses cached coordinate projections in `hessian_segment_updater` to cheaply +evaluate Hessian quadratic forms via `hessian_value_from_inner_products`. 
+""" +function (hessian_segment_updater::LimitedMemorySegmentHessianUpdater)( + M::AbstractManifold, p, + t::Real, dt::Real, b, db, ha::QuasiNewtonLimitedMemoryBoxDirectionUpdate + ) + + m = length(ha.qn_du.memory_s) + num_nonzero_rho = count(!iszero, ha.qn_du.ρ) + + ii = 1 + for i in 1:m + iszero(ha.qn_du.ρ[i]) && continue + # setting _X to w_b from the paper + ha.buffer_inner_Yk_X[ii] = get_at_bound_index(M, ha.qn_du.memory_y[i], b) + ha.buffer_inner_Sk_X[ii] = ha.current_scale * get_at_bound_index(M, ha.qn_du.memory_s[i], b) + + ii += 1 + end + + buffer_inner_Yk_eb = view(ha.buffer_inner_Yk_X, 1:num_nonzero_rho) + buffer_inner_Sk_eb = view(ha.buffer_inner_Sk_X, 1:num_nonzero_rho) + + buffer_inner_cy = view(hessian_segment_updater.c_y, 1:num_nonzero_rho) + buffer_inner_cs = view(hessian_segment_updater.c_s, 1:num_nonzero_rho) + buffer_inner_py = view(hessian_segment_updater.p_y, 1:num_nonzero_rho) + buffer_inner_ps = view(hessian_segment_updater.p_s, 1:num_nonzero_rho) + + buffer_inner_cy .+= dt .* buffer_inner_py + buffer_inner_cs .+= dt .* buffer_inner_ps + + eb_B_z = hessian_value_from_inner_products(ha, t * db, buffer_inner_Yk_eb, buffer_inner_Sk_eb, buffer_inner_cy, buffer_inner_cs) + + eb_B_d = hessian_value_from_inner_products(ha, db, buffer_inner_Yk_eb, buffer_inner_Sk_eb, buffer_inner_py, buffer_inner_ps) + + buffer_inner_py .-= db .* buffer_inner_Yk_eb + buffer_inner_ps .-= db .* buffer_inner_Sk_eb + + return eb_B_z, eb_B_d +end + +struct ProductIndex{T <: Tuple} + ranges::T +end + +Base.iterate(itr::ProductIndex) = _iterate(itr.ranges, 1, nothing) +Base.iterate(itr::ProductIndex, state) = _iterate(itr.ranges, state...) 
+
+function _iterate(ranges, i, st)
+    i > length(ranges) && return nothing
+    if st === nothing
+        it = iterate(ranges[i])
+        it === nothing && return _iterate(ranges, i + 1, nothing)
+        (j, st2) = it
+        return ((i, j), (i, st2))
+    else
+        it = iterate(ranges[i], st)
+        if it === nothing
+            return _iterate(ranges, i + 1, nothing)
+        else
+            (j, st2) = it
+            return ((i, j), (i, st2))
+        end
+    end
+end
+
+
+"""
+    to_coordinate_index(M::AbstractManifold, b::UnitVector{Int}, B::AbstractBasis)
+
+Get the index of the coordinate equal to 1 of [`UnitVector`](@ref) `b` with respect to
+the `AbstractBasis` `B`.
+"""
+to_coordinate_index(::AbstractManifold, b::UnitVector{Int}, ::AbstractBasis) = b.index
+"""
+    to_coordinate_index(M::ProductManifold, b::UnitVector{Tuple{Int, Int}}, B::AbstractBasis)
+
+Get the index of the coordinate equal to 1 of [`UnitVector`](@ref) `b` with respect to
+the `AbstractBasis` `B`, flattened over the components of the `ProductManifold` `M`.
+"""
+function to_coordinate_index(M::ProductManifold, b::UnitVector{Tuple{Int, Int}}, B::AbstractBasis)
+    i, j = b.index
+    offset = sum(k -> number_of_coordinates(M.manifolds[k], B), 1:(i - 1); init = 0)
+    return offset + j
+end
+
+Base.length(itr::ProductIndex) = sum(length, itr.ranges)
+
+
+"""
+    get_bounds_index(M::AbstractManifold)
+
+Get the bound indices of manifold `M`. Standard manifolds don't have bounds, so
+the empty range `Base.OneTo(0)` is returned.
+"""
+get_bounds_index(M::AbstractManifold) = Base.OneTo(0)
+function get_bounds_index(M::ProductManifold)
+    ranges = map(get_bounds_index, M.manifolds)
+    iter = ProductIndex(ranges)
+    return iter
+end
+
+"""
+    get_stepsize_bound(M::AbstractManifold, p, d, i)
+
+Get the upper bound on moving in direction `d` from point `p` on manifold `M`, for the
+bound index `i`.
+""" +get_stepsize_bound(M::AbstractManifold, p, d, i) +function get_stepsize_bound(M::ProductManifold, p, d, i::Tuple{Int, Any}) + i1, i2 = i + return get_stepsize_bound(M.manifolds[i1], submanifold_component(M, p, i1), submanifold_component(M, d, i1), i2) +end + +""" + set_zero_at_index!(M::ProductManifold, d, i::Tuple{Int,Any}) + +Set the element of the `i[1]`th component of `d` at bound index `i[2]` to zero. +""" +function set_zero_at_index!(M::ProductManifold, d, i::Tuple{Int, Any}) + i1, i2 = i + set_zero_at_index!(M.manifolds[i1], submanifold_component(M, d, i1), i2) + return d +end + +""" + set_stepsize_bound!(M::AbstractManifold, d_out, p, d, t_current::Real) + +For each component at index `i` in the tangent vector `d_out`, if the stepsize bound in +direction `d` for that component is less than `t_current`, set the element of `d_out` to +the distance from `p[i]` to the bound in the direction of `d[i]`. If the stepsize bound is +non-positive, set the element to 0. + +By default it does not modify `d_out` because most manifolds don't have direction-specific +stepsize bounds, and general anisotropic bounds are handled differently. +""" +function set_stepsize_bound!(::AbstractManifold, d_out, p, d, t_current::Real) + return d_out +end + +function set_stepsize_bound!(M::ProductManifold, d_out, p, d, t_current::Real) + map( + (N, d_out_c, p_c, d_c) -> set_stepsize_bound!(N, d_out_c, p_c, d_c, t_current), + M.manifolds, + submanifold_components(M, d_out), + submanifold_components(M, p), + submanifold_components(M, d), + ) + return d_out +end + +@doc raw""" + GeneralizedCauchyDirectionSubsolver{TM <: AbstractManifold, TP, T_HA <: AbstractQuasiNewtonDirectionUpdate, TFU <: AbstractSegmentHessianUpdater} + +Helper container for generalized Cauchy direction search. 
Stores the cached
+original descent direction (`d_original`), the quasi-Newton direction update `ha`, and the
+`hessian_segment_updater`, which computes certain values of the Hessian while advancing segments.
+Instances are reused across segments during [`find_generalized_cauchy_direction!`](@ref) to
+avoid allocations.
+"""
+struct GeneralizedCauchyDirectionSubsolver{
+        TX,
+        T_HA <: AbstractQuasiNewtonDirectionUpdate, TFU <: AbstractSegmentHessianUpdater, TFT <: Tuple{<:Real, Any}, TBI,
+        TO <: Base.Order.Ordering,
+    }
+    d_original::TX
+    ha::T_HA
+    hessian_segment_updater::TFU
+    F_list::Vector{TFT}
+    bounds_indices::TBI
+    ordering::TO
+end
+
+function GeneralizedCauchyDirectionSubsolver(
+        M::AbstractManifold, p, ha::AbstractQuasiNewtonDirectionUpdate;
+        hessian_segment_updater::AbstractSegmentHessianUpdater = get_default_hessian_segment_updater(M, p, ha)
+    )
+    bounds_indices = get_bounds_index(M)
+    TInd = eltype(bounds_indices)
+    TF = number_eltype(p)
+    F_list = Tuple{TF, TInd}[]
+    sizehint!(F_list, length(bounds_indices) + 1)
+    ordering = Base.By(first)
+    return GeneralizedCauchyDirectionSubsolver(
+        zero_vector(M, p), ha,
+        hessian_segment_updater, F_list, bounds_indices, ordering
+    )
+end
+
+function collect_isotropic_limits!(::AbstractManifold, ::Vector{<:Tuple{TF, Any}}, p, d)::Tuple{Bool, TF} where {TF <: Real}
+    return false, convert(TF, Inf)
+end
+
+function collect_isotropic_limits!(M::ProductManifold, F_list::Vector{<:Tuple{TF, Any}}, p, d)::Tuple{Bool, TF} where {TF <: Real}
+    has_finite_limit = false
+    smallest_positive_limit = Inf
+    map(M.manifolds, submanifold_components(M, p), submanifold_components(M, d)) do Mi, p_i, d_i
+        if !has_anisotropic_max_stepsize(Mi)
+            max_step = Manopt.max_stepsize(Mi, p_i)
+            if isfinite(max_step)
+                tms = max_step / norm(Mi, p_i, d_i)
+                push!(F_list, (tms, -1))
+                has_finite_limit = true
+                if tms < smallest_positive_limit
+                    smallest_positive_limit = tms
+                end
+            end
+        end
+    end
+    return has_finite_limit,
smallest_positive_limit +end + +""" + find_generalized_cauchy_direction!( + M::AbstractManifold, + gcd::GeneralizedCauchyDirectionSubsolver, d_out, p, d, X + ) + +Find generalized Cauchy direction looking from point `p` on manifold `M` in direction `d` +and save it to `d_out`. Gradient of the objective at `p` is `X`. + +The function returns a pair (status, max_stepsize) where `status` is a symbol describing +the result of the search, and `max_stepsize` is the maximum stepsize that can be taken in +the direction `d_out`. + +The `status` can be one of the following: +* `:found_limited` if the point was found and we can perform a step of length at most 1 + in direction `d_out` afterwards, +* `:found_unlimited` if the point was found and we can perform a step of length at most + `max_stepsize(M, p)` in direction `d_out` afterwards, +* `:not_found` if the search cannot be performed in direction `d`. +""" +function find_generalized_cauchy_direction!( + M::AbstractManifold, + gcd::GeneralizedCauchyDirectionSubsolver{ + <:Any, <:AbstractQuasiNewtonDirectionUpdate, + <:AbstractSegmentHessianUpdater, <:Tuple{TF, Any}, + }, + d_out, p, d, X + ) where {TF <: Real} + copyto!(M, gcd.d_original, d) + copyto!(M, d_out, d) + + ordering = gcd.ordering + F_list = gcd.F_list + empty!(F_list) + + bounds_indices = gcd.bounds_indices + + # isotropic limits + has_finite_limit, smallest_positive_limit = collect_isotropic_limits!(M, F_list, p, d) + # anisotropic limits + for i in bounds_indices + sbi = get_stepsize_bound(M, p, d, i)::TF + + if sbi > 0 + push!(F_list, (sbi, i)) + if sbi < smallest_positive_limit + smallest_positive_limit = sbi + end + end + has_finite_limit |= isfinite(sbi) + end + + # In this case we can't move in the direction `d` at all, though it's usually not + # a problem relevant to the end user because it can be handled by step_solver! that + # uses the GCD subsolver. 
+    if isempty(F_list)
+        return (:not_found, NaN)
+    end
+    heapify!(F_list, ordering)
+
+    f_prime = inner(M, p, X, d)
+    f_double_prime = hessian_value_diag(gcd.ha, M, p, d)
+
+    if iszero(f_prime) || iszero(f_double_prime)
+        return (:not_found, NaN)
+    end
+
+    dt_min = -f_prime / f_double_prime
+    t_old = 0.0
+
+    t_current, b = heappop!(F_list, ordering)
+    dt = t_current - t_old
+
+    init_updater!(M, gcd.hessian_segment_updater, p, d, gcd.ha)
+    # b can be -1 if it corresponds to the max stepsize limit on the manifold part
+    while dt_min > dt && b != -1
+        db = get_at_bound_index(M, d, b)::TF
+        gb = get_at_bound_index(M, X, b)::TF
+
+        hv_eb_dz, hv_eb_d = gcd.hessian_segment_updater(M, p, t_current, dt, b, db, gcd.ha)::Tuple{TF, TF}
+
+        f_prime += dt * f_double_prime - db * (gb + hv_eb_dz)
+        f_double_prime += (2 * -db * hv_eb_d) + db^2 * hessian_value_diag(gcd.ha, M, p, UnitVector(b))
+
+        t_old = t_current
+
+        # if f_prime (or the curvature f_double_prime) is 0, the local minimizer (GCD) is found
+        if iszero(f_prime) || iszero(f_double_prime)
+            # the GCD then lies at the beginning of the current segment, so set dt_min to 0 (stay at that point)
+            dt_min = 0.0
+            break
+        end
+
+        dt_min = -f_prime / f_double_prime
+        isempty(F_list) && break
+
+        t_current, b = heappop!(F_list, ordering)
+        dt = t_current - t_old
+    end
+
+    dt_min = max(dt_min, 0.0)
+    t_old = t_old + dt_min
+    d_out .*= t_old
+    # by construction, there is no bound achievable before stepsize 1.0 in direction d_out;
+    # the first bound after that is achieved at smallest_positive_limit / t_old
+    max_feasible_stepsize = max(1.0, smallest_positive_limit / t_old)
+
+    set_stepsize_bound!(M, d_out, p, gcd.d_original, t_old)
+    if has_finite_limit
+        return (:found_limited, max_feasible_stepsize)
+    else
+        return (:found_unlimited, Inf)
+    end
+end
+
+"""
+    struct MaxStepsizeInDirectionSubsolver{TFT <: Tuple{<:Real, Any}, TBI}
+
+Helper container for finding the maximum stepsize in a direction.
Stores the
+container for the list of bounds `F_list` and the bound indices `bounds_indices`.
+
+## Constructor
+
+    MaxStepsizeInDirectionSubsolver(M::AbstractManifold, p)
+
+Initialize the `MaxStepsizeInDirectionSubsolver` for manifold `M` and point `p`. The `F_list`
+is initialized to be empty and will be populated during the search for the maximum stepsize
+in a direction. The floating point type of the bound values in `F_list` is determined by the
+number type of `p`.
+
+The `MaxStepsizeInDirectionSubsolver` can be reused for multiple different points and
+directions on the same manifold, but it is not thread-safe.
+"""
+struct MaxStepsizeInDirectionSubsolver{TFT <: Tuple{<:Real, Any}, TBI}
+    F_list::Vector{TFT}
+    bounds_indices::TBI
+end
+function MaxStepsizeInDirectionSubsolver(M::AbstractManifold, p)
+    bounds_indices = get_bounds_index(M)
+    TInd = eltype(bounds_indices)
+    TF = number_eltype(p)
+    F_list = Tuple{TF, TInd}[]
+    sizehint!(F_list, length(bounds_indices) + 1)
+    return MaxStepsizeInDirectionSubsolver{Tuple{TF, TInd}, typeof(bounds_indices)}(F_list, bounds_indices)
+end
+
+"""
+    find_max_stepsize_in_direction(M::AbstractManifold, sdf::MaxStepsizeInDirectionSubsolver, p, d)
+
+Find the maximum stepsize that can be performed from point `p` in direction `d`.
+
+The function returns a pair (status, max_stepsize) where `status` is a symbol describing
+the result of the search, and `max_stepsize` is the maximum stepsize that can be taken in
+the direction `d`.
+
+The `status` can be one of the following:
+* `:found_limited` if a finite bound limits the step; `max_stepsize` is then the smallest
+  positive bound in direction `d`,
+* `:found_unlimited` if no finite bound limits the step in direction `d`; `max_stepsize`
+  is then `Inf`,
+* `:not_found` if no positive step can be performed in direction `d`.
+""" +function find_max_stepsize_in_direction( + M::AbstractManifold, + sdf::MaxStepsizeInDirectionSubsolver{<:Tuple{TF, Any}}, + p, d + ) where {TF <: Real} + + F_list = sdf.F_list + empty!(F_list) + bounds_indices = sdf.bounds_indices + + # isotropic limits + has_finite_limit, smallest_positive_limit = collect_isotropic_limits!(M, F_list, p, d) + # anisotropic limits + for i in bounds_indices + sbi = get_stepsize_bound(M, p, d, i)::TF + + if sbi > 0 + push!(F_list, (sbi, i)) + if sbi < smallest_positive_limit + smallest_positive_limit = sbi + end + end + has_finite_limit |= isfinite(sbi) + end + + if isempty(F_list) + return (:not_found, NaN) + end + if has_finite_limit + return (:found_limited, smallest_positive_limit) + else + return (:found_unlimited, Inf) + end + +end + +function show(io::IO, qns::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + print(io, "QuasiNewtonLimitedMemoryBoxDirectionUpdate with internal state:\n") + return print(io, qns.qn_du) +end diff --git a/src/plans/debug.jl b/src/plans/debug.jl index 760f0b46c8..b5104181bb 100644 --- a/src/plans/debug.jl +++ b/src/plans/debug.jl @@ -1252,7 +1252,7 @@ function (d::DebugWarnIfGradientNormTooLarge)( p_inj = d.factor * max_stepsize(M, p) if Xn > p_inj @warn """At iteration #$k - the gradient norm ($Xn) is larger that $(d.factor) times the injectivity radius $(p_inj) at the current iterate. + the gradient norm ($Xn) is larger than $(d.factor) times the injectivity radius $(p_inj) at the current iterate. """ if d.status === :Once @warn "Further warnings will be suppressed, use DebugWarnIfGradientNormTooLarge($(d.factor), :Always) to get all warnings." 
diff --git a/src/plans/plan.jl b/src/plans/plan.jl
index a55c1d7967..b0438faeb8 100644
--- a/src/plans/plan.jl
+++ b/src/plans/plan.jl
@@ -208,6 +208,8 @@ include("higher_order_primal_dual_plan.jl")
 include("stochastic_gradient_plan.jl")
+include("box_plan.jl")
+
 include("embedded_objective.jl")
 include("scaled_objective.jl")
diff --git a/src/plans/quasi_newton_plan.jl b/src/plans/quasi_newton_plan.jl
index b167b01b4b..db4686a74f 100644
--- a/src/plans/quasi_newton_plan.jl
+++ b/src/plans/quasi_newton_plan.jl
@@ -409,8 +409,8 @@ space ``T_{p_{k+1}} $(_tex(:Cal, "M"))``, preferably with an isometric vector tr
 # Provided functors
-* `(mp::AbstractManoptproblem, st::QuasiNewtonState) -> η` to compute the update direction
-* `(η, mp::AbstractManoptproblem, st::QuasiNewtonState) -> η` to compute the update direction in-place of `η`
+* `(mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction
+* `(η, mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction in-place of `η`
 # Fields
@@ -511,6 +511,55 @@ function initialize_update!(d::QuasiNewtonMatrixDirectionUpdate)
     copyto!(d.matrix, I)
     return d
 end
+@doc raw"""
+    hessian_value_diag(d::QuasiNewtonMatrixDirectionUpdate, M, p, X)
+
+Evaluate the quadratic form associated with the stored quasi-Newton matrix.
+Returns the scalar ``c^{\top} B c`` where ``c`` are the coordinates of the
+tangent vector `X` at `p` (in the basis `d.basis`) and ``B`` is `d.matrix`.
+"""
+function hessian_value_diag(d::QuasiNewtonMatrixDirectionUpdate{T}, M::AbstractManifold, p, X) where {T <: Union{BFGS, DFP, SR1, Broyden}}
+    c = get_coordinates(M, p, X, d.basis)
+    return dot(c, d.matrix, c)
+end
+
+"""
+    UnitVector{TB}
+
+A type representing a unit tangent vector on a `Hyperrectangle`-like manifold with corners,
+or a product of it with a standard manifold.
+The field `index` stores the index of the element equal to 1.
+All other elements are equal to 0.
+"""
+struct UnitVector{TB}
+    index::TB
+end
+
+@doc raw"""
+    hessian_value_diag(d::QuasiNewtonMatrixDirectionUpdate, M, p, X::UnitVector)
+
+Evaluate the quadratic form associated with the stored quasi-Newton matrix.
+Returns the scalar ``c^{\top} B c`` where ``c`` are the coordinates of the
+[`UnitVector`](@ref) `X` at `p` (in the basis `d.basis`) and ``B`` is `d.matrix`.
+"""
+function hessian_value_diag(d::QuasiNewtonMatrixDirectionUpdate{T}, M::AbstractManifold, p, X::UnitVector) where {T <: Union{BFGS, DFP, SR1, Broyden}}
+    b = to_coordinate_index(M, X, d.basis)
+    return d.matrix[b, b]
+end
+@doc raw"""
+    hessian_value(d::QuasiNewtonMatrixDirectionUpdate, M, p, X::UnitVector, Y)
+
+Evaluate the quadratic form associated with the stored quasi-Newton matrix.
+Returns the scalar ``c_b^{\top} B c`` where ``c_b`` are the coordinates of the
+[`UnitVector`](@ref) `X` at `p` (assumed to correspond to the basis `d.basis`),
+``c`` are the coordinates of the tangent vector `Y` at `p` (in the basis `d.basis`)
+and ``B`` is `d.matrix`.
+"""
+function hessian_value(d::QuasiNewtonMatrixDirectionUpdate{T}, M::AbstractManifold, p, X::UnitVector, Y) where {T <: Union{BFGS, DFP, SR1, Broyden}}
+    b = to_coordinate_index(M, X, d.basis)
+    return dot(d.matrix[b, :], get_coordinates(M, p, Y, d.basis))
+end
 _doc_QN_B = """
 ```math
@@ -562,6 +611,19 @@ function is always included and the old, probably no longer relevant, informatio
 $(_fields(:vector_transport_method))
 * `message`: a string containing a potential warning that might have appeared
 * `project!`: a function to stabilize the update by projecting on the tangent space
+* `vector_transport_method`: method for transporting stored s and y directions to the new point
+* `nonpositive_curvature_behavior`: how non-positive-definite pairs (s, y) are detected and handled in vector transport.
+  Allowed values are:
+    - `:ignore` (default): pairs whose inner product is zero are
+      omitted from the current Hessian approximation but are
+      retained in memory for further iterations. This may lead
+      to non-positive-definite Hessians and non-descent directions
+      being selected and thus needs to be handled elsewhere.
+    - `:byrd`: pairs such that `inner(M, p, X_s, Y_s) <= sy_tol * norm(M, p, Y_s)^2`
+      are removed from memory (see [ByrdLuNocedalZhu:1995](@cite),
+      Eq. (3.9) and its discussion).
+* `sy_tol`: tolerance for detecting non-positive-definite pairs (X_s, X_y).
+  The pairs may lose positive-definiteness after vector transport.
 # Constructor
@@ -597,6 +659,8 @@ mutable struct QuasiNewtonLimitedMemoryDirectionUpdate{
     initial_scale::G
     project!::Proj
     vector_transport_method::VT
+    nonpositive_curvature_behavior::Symbol
+    sy_tol::F
     message::String
 end
 function QuasiNewtonLimitedMemoryDirectionUpdate(
@@ -608,6 +672,8 @@ function QuasiNewtonLimitedMemoryDirectionUpdate(
     initial_scale::G = 1.0,
     (project!)::Proj = (copyto!),
     vector_transport_method::VTM = default_vector_transport_method(M, typeof(p)),
+    nonpositive_curvature_behavior::Symbol = :ignore,
+    sy_tol::Real = 1.0e-8,
 ) where {
     NT <: AbstractQuasiNewtonUpdateRule,
     T,
@@ -630,6 +696,8 @@ function QuasiNewtonLimitedMemoryDirectionUpdate(
     _initial_state,
     project!,
     vector_transport_method,
+    nonpositive_curvature_behavior,
+    sy_tol,
     "",
 )
 end
@@ -661,20 +729,7 @@ function (d::QuasiNewtonLimitedMemoryDirectionUpdate{InverseBFGS})(
 end
 # backward pass
 for i in m:-1:1
-        # what if division by zero happened here, setting to zero ignores this in the next step
-        # pre-compute in case inner is expensive
-        v = inner(M, p, d.memory_s[i], d.memory_y[i])
-        if iszero(v)
-            d.ρ[i] = zero(eltype(d.ρ))
-            if length(d.message) > 0
-                d.message = replace(d.message, " i=" => " i=$i,")
-                d.message = replace(d.message, "summand in" => "summands in")
-            else
-                d.message = "The inner products ⟨s_i,y_i⟩ ≈ 0, i=$i, ignoring summand
in approximation." - end - else - d.ρ[i] = 1 / v - end + # d.ρ is precomputed in the Hessian update d.ξ[i] = inner(M, p, d.memory_s[i], r) * d.ρ[i] r .-= d.ξ[i] .* d.memory_y[i] end @@ -726,13 +781,17 @@ function initialize_update!(d::QuasiNewtonLimitedMemoryDirectionUpdate) return d end +function show(io::IO, qns::QuasiNewtonLimitedMemoryDirectionUpdate) + return print(io, "QuasiNewtonLimitedMemoryDirectionUpdate with memory size $(length(qns.memory_s)) and $(qns.vector_transport_method) as vector transport.") +end + @doc """ QuasiNewtonCautiousDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate These [`AbstractQuasiNewtonDirectionUpdate`](@ref)s represent any quasi-Newton update rule, which are based on the idea of a so-called cautious update. The search direction is calculated as given in [`QuasiNewtonMatrixDirectionUpdate`](@ref) or [`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref), -butut the update then is only executed if +but the update then is only executed if ```math $(_tex(:frac, "g_{x_{k+1}}(y_k,s_k)", "$(_tex(:norm, "s_k"; index = "x_{k+1}"))^{2}")) ≥ θ $(_tex(:norm, "$(_tex(:grad))f(p_k)"; index = "p_k")), diff --git a/src/plans/stepsize/initial_guess.jl b/src/plans/stepsize/initial_guess.jl index 4c9788b6a8..be6d67b7d1 100644 --- a/src/plans/stepsize/initial_guess.jl +++ b/src/plans/stepsize/initial_guess.jl @@ -145,6 +145,7 @@ function (hzi::HagerZhangInitialGuess{TF})( k::Int, last_stepsize::Real, η; lf0 = get_cost(mp, get_iterate(s)), Dlf0 = get_differential(mp, get_iterate(s), η), + kwargs... 
) where {TF <: Real} M = get_manifold(mp) p = get_iterate(s) @@ -152,6 +153,13 @@ function (hzi::HagerZhangInitialGuess{TF})( alphamax = min(hzi.alphamax, max_stepsize(M, p)) + if :stop_when_stepsize_exceeds in keys(kwargs) + alphamax = min( + kwargs[:stop_when_stepsize_exceeds], + alphamax, + ) + end + if k == 1 point_d = hzi.point_distance(M, p) # Step I0 diff --git a/src/plans/stepsize/linesearch.jl b/src/plans/stepsize/linesearch.jl index 3d2576ea41..d993b9e072 100644 --- a/src/plans/stepsize/linesearch.jl +++ b/src/plans/stepsize/linesearch.jl @@ -28,6 +28,19 @@ function Base.show(io::IO, ::MIME"text/plain", ams::Stepsize) return multiline ? status_summary(io, ams) : show(io, ams) end +""" + initialize_stepsize!(sm::Stepsize) + +Initialize the state of a stepsize functor. This function should be called in the +`initialize_solver!` function for solvers that do possess a stepsize and can be used to +set up internal state of the stepsize functor that is preserved between line searches in +the same optimization, for example adaptive thresholds for Wolfe criteria in Hager-Zhang +line search. + +By default it does nothing. +""" +initialize_stepsize!(sm::Stepsize) = sm + """ default_stepsize(M::AbstractManifold, ams::AbstractManoptSolverState) @@ -51,21 +64,40 @@ function max_stepsize(M::AbstractManifold, p) injectivity_radius(M, p) catch is_tutorial_mode() && - @warn "`max_stepsize was called, but there seems to not be an `injectivity_raidus` available on $M." + @warn "`max_stepsize was called, but there seems to not be an `injectivity_radius` available on $M." Inf end return s end +function max_stepsize(M::ProductManifold, p) + return min(map(max_stepsize, M.manifolds, submanifold_components(M, p))...) 
+end +function max_stepsize(M::AbstractPowerManifold, p) + stepsize = number_eltype(p)(Inf) + rep_size = representation_size(M.manifold) + for i in get_iterator(M) + cur_stepsize = max_stepsize(M.manifold, _read(M, rep_size, p, i)) + stepsize = min(cur_stepsize, stepsize) + end + return stepsize +end function max_stepsize(M::AbstractManifold) s = try injectivity_radius(M) catch is_tutorial_mode() && - @warn "`max_stepsize was called, but there seems to not be an `injectivity_raidus` available on $M." + @warn "`max_stepsize was called, but there seems to not be an `injectivity_radius` available on $M." Inf end return s end +function max_stepsize(M::ProductManifold) + return min(map(max_stepsize, M.manifolds)...) +end +function max_stepsize(M::AbstractPowerManifold) + return max_stepsize(M.manifold) +end + """ Linesearch <: Stepsize @@ -134,8 +166,8 @@ $(_kwargs(:retraction_method)) * `gradient = nothing`: precomputed gradient at point `p` * `report_messages_in::NamedTuple = (; )`: a named tuple of [`StepsizeMessage`](@ref)s to report messages in. 
currently supported keywords are `:non_descent_direction`, `:stepsize_exceeds`, `:stepsize_less`, `:stop_increasing`, `:stop_decreasing` -* `stop_when_stepsize_less=0.0`: to avoid numerical underflow -* `stop_when_stepsize_exceeds=`[`max_stepsize`](@ref)`(M, p) / norm(M, p, η)`) to avoid leaving the injectivity radius on a manifold +* `stop_when_stepsize_less::Real=0.0`: to avoid numerical underflow +* `stop_when_stepsize_exceeds::Real=`[`max_stepsize`](@ref)`(M, p) / norm(M, p, η)`) to avoid leaving the injectivity radius on a manifold or exceeding boundaries on a manifold with corners * `stop_increasing_at_step=100`: stop the initial increase of step size after these many steps * `stop_decreasing_at_step=`1000`: stop the decreasing search after these many steps @@ -157,13 +189,14 @@ end @doc "$_doc_linesearch_backtrack" function linesearch_backtrack!( M::AbstractManifold, q, f::TF, p, s, decrease, contract, η::T; - lf0 = f(M, p), - gradient = nothing, + lf0::Real = f(M, p), gradient = nothing, Dlf0 = isnothing(gradient) ? 
nothing : real(inner(M, p, gradient, η)), retraction_method::AbstractRetractionMethod = default_retraction_method(M, typeof(p)), - additional_increase_condition = (M, p) -> true, additional_decrease_condition = (M, p) -> true, - stop_when_stepsize_less = 0.0, stop_when_stepsize_exceeds = max_stepsize(M, p) / norm(M, p, η), - stop_increasing_at_step = 100, stop_decreasing_at_step = 1000, + additional_increase_condition = (M, p) -> true, + additional_decrease_condition = (M, p) -> true, + stop_when_stepsize_less::Real = 0.0, stop_when_stepsize_exceeds::Real = max_stepsize(M, p) / norm(M, p, η), + stop_increasing_at_step = 100, + stop_decreasing_at_step = 1000, report_messages_in::NamedTuple = (;), ) where {TF, T} ManifoldsBase.retract_fused!(M, q, p, η, s, retraction_method) @@ -177,14 +210,14 @@ function linesearch_backtrack!( while f_q < lf0 + decrease * s * Dlf0 || !additional_increase_condition(M, q) (stop_increasing_at_step == 0) && break i = i + 1 - s = s / contract + s = min(s / contract, stop_when_stepsize_exceeds) ManifoldsBase.retract_fused!(M, q, p, η, s, retraction_method) f_q = f(M, q) if i == stop_increasing_at_step set_message!(report_messages_in, :stop_increasing, at = i, bound = stop_increasing_at_step, value = s) break end - if s > stop_when_stepsize_exceeds + if s >= stop_when_stepsize_exceeds set_message!(report_messages_in, :stepsize_exceeds, at = i, bound = stop_when_stepsize_exceeds, value = s) break end diff --git a/src/plans/stepsize/stepsize.jl b/src/plans/stepsize/stepsize.jl index f24047b2d7..40ef290ec6 100644 --- a/src/plans/stepsize/stepsize.jl +++ b/src/plans/stepsize/stepsize.jl @@ -41,9 +41,10 @@ with the fields keyword arguments and the retraction is set to the default retra $(_kwargs(:retraction_method)) * `contraction_factor=0.95` * `sufficient_decrease=0.1` +* `last_stepsize=initial_stepsize` * `initial_guess=`[`ArmijoInitialGuess`](@ref)`()` * `stop_when_stepsize_less=0.0`: stop when the stepsize decreased below this version. 
-* `stop_when_stepsize_exceeds=[`max_step`](@ref)`(M)`: provide an absolute maximal step size. +* `stop_when_stepsize_exceeds=[`max_stepsize`](@ref)`(M)`: provide an absolute maximal step size. * `stop_increasing_at_step=100`: for the initial increase test, stop after these many steps * `stop_decreasing_at_step=1000`: in the backtrack, stop after these many steps """ @@ -81,17 +82,17 @@ mutable struct ArmijoLinesearchStepsize{TRM <: AbstractRetractionMethod, P, I, F M::AbstractManifold; additional_decrease_condition::DF = (M, p) -> true, additional_increase_condition::IF = (M, p) -> true, candidate_point::P = allocate_result(M, rand), - contraction_factor::Real = 0.95, initial_stepsize::Real = 1.0, + contraction_factor::Real = 0.95, initial_stepsize::Real = 1.0, last_stepsize::Real = initial_stepsize, initial_guess::IGF = ArmijoInitialGuess(), retraction_method::TRM = default_retraction_method(M), stop_when_stepsize_less::Real = 0.0, stop_when_stepsize_exceeds::Real = max_stepsize(M), stop_increasing_at_step::Integer = 100, stop_decreasing_at_step::Integer = 1000, sufficient_decrease::Real = 0.1, ) where {TRM <: AbstractRetractionMethod, P, IGF, DF, IF} R = promote_type( - typeof(contraction_factor), typeof(initial_stepsize), + typeof(contraction_factor), typeof(initial_stepsize), typeof(last_stepsize), typeof(stop_when_stepsize_exceeds), typeof(stop_when_stepsize_less), typeof(sufficient_decrease), ) - cf = convert(R, contraction_factor); is = convert(R, initial_stepsize) + cf = convert(R, contraction_factor); is = convert(R, initial_stepsize); ls = convert(R, last_stepsize) swse = convert(R, stop_when_stepsize_exceeds); swsl = convert(R, stop_when_stepsize_less) sd = convert(R, sufficient_decrease) I = promote_type(typeof(stop_increasing_at_step), typeof(stop_decreasing_at_step)) @@ -104,7 +105,7 @@ mutable struct ArmijoLinesearchStepsize{TRM <: AbstractRetractionMethod, P, I, F return ArmijoLinesearchStepsize(; additional_decrease_condition = 
additional_decrease_condition, additional_increase_condition = additional_increase_condition, - candidate_point = candidate_point, contraction_factor = cf, initial_stepsize = is, last_stepsize = is, + candidate_point = candidate_point, contraction_factor = cf, initial_stepsize = is, last_stepsize = ls, initial_guess = initial_guess, retraction_method = retraction_method, stop_when_stepsize_less = swsl, stop_when_stepsize_exceeds = swse, sufficient_decrease = sd, stop_increasing_at_step = sias, stop_decreasing_at_step = sdas, messages = msgs @@ -120,20 +121,26 @@ function (a::ArmijoLinesearchStepsize)( ) p = get_iterate(s) grad = isnothing(gradient) ? get_gradient(mp, get_iterate(s)) : gradient - return a(mp, p, grad, η; initial_guess = a.initial_guess(mp, s, k, a.last_stepsize, η)) + return a(mp, p, grad, η; initial_guess = a.initial_guess(mp, s, k, a.last_stepsize, η), kwargs...) end function (a::ArmijoLinesearchStepsize)( - mp::AbstractManoptProblem, p, X, η; initial_guess = 1.0, kwargs... + mp::AbstractManoptProblem, p, X, η; initial_guess::Real = 1.0, + stop_when_stepsize_exceeds = nothing, kwargs... 
) reset_messages!(a.messages) l = norm(get_manifold(mp), p, η) + swse = if isnothing(stop_when_stepsize_exceeds) + (a.stop_when_stepsize_exceeds / l) + else + stop_when_stepsize_exceeds + end a.last_stepsize = linesearch_backtrack!( get_manifold(mp), a.candidate_point, (M, p) -> get_cost_function(get_objective(mp))(M, p), p, initial_guess, a.sufficient_decrease, a.contraction_factor, η; gradient = X, retraction_method = a.retraction_method, stop_when_stepsize_less = (a.stop_when_stepsize_less / l), - stop_when_stepsize_exceeds = (a.stop_when_stepsize_exceeds / l), + stop_when_stepsize_exceeds = swse, stop_increasing_at_step = a.stop_increasing_at_step, stop_decreasing_at_step = a.stop_decreasing_at_step, additional_decrease_condition = a.additional_decrease_condition, @@ -685,18 +692,20 @@ end Returns the extremal of the quadratic polynomial ``p`` with ``p'(a.t)=a.df``, ``p'(b.t)=b.df``. +The result is algebraically equivalent to `(a.t * b.df - b.t * a.df) / (b.df - a.df)`, +but the formula used here is more numerically stable. + # Input * `a::UnivariateTriple{R}`: triple of bracket value `a` * `b::UnivariateTriple{R}`: triple of bracket value `b` """ function secant(a::UnivariateTriple{R}, b::UnivariateTriple{R}) where {R} - return (a.t * b.df - b.t * a.df) / (b.df - a.df) + return (a.t + b.t) / 2 + (b.t - a.t) * (a.df + b.df) / (2 * (a.df - b.df)) end """ cubic_stepsize_update_step(a::Real, b::Real, c::Real, τ::Real) - Step function to determine the stepsize update `c` described in [Hager:1989](@cite). @@ -720,6 +729,8 @@ function cubic_stepsize_update_step(a::Real, b::Real, c::Real, τ::Real) end """ + get_univariate_triple!(mp::AbstractManoptProblem, cbls::CubicBracketingLinesearchStepsize, p, η, t::Real) + Get the `UnivariateTriple` of the problem `mp` related to the step with stepsize ``t`` from ``p`` in direction ``η``. @@ -730,7 +741,7 @@ stepsize ``t`` from ``p`` in direction ``η``. 
* `η`: search direction at `p` * `t::Real`: step size """ -function get_univariate_triple!(mp::AbstractManoptProblem, cbls::CubicBracketingLinesearchStepsize, p, η, t) +function get_univariate_triple!(mp::AbstractManoptProblem, cbls::CubicBracketingLinesearchStepsize, p, η, t::Real) M = get_manifold(mp) cbls.last_stepsize = t ManifoldsBase.retract_fused!(M, cbls.candidate_point, p, η, t, cbls.retraction_method) @@ -754,7 +765,11 @@ function (cbls::CubicBracketingLinesearchStepsize)( check_curvature(c::UnivariateTriple) = abs(c.df) < cbls.sufficient_curvature * abs(init.df) n_iter = 0 - t = cbls.last_stepsize + max_step = cbls.max_stepsize + if :stop_when_stepsize_exceeds in keys(kwargs) + max_step = min(max_step, kwargs[:stop_when_stepsize_exceeds]) + end + t = min(cbls.last_stepsize, max_step) c_old = init c = get_univariate_triple!(mp, cbls, p, η, t) a, b = nothing, nothing @@ -769,9 +784,9 @@ function (cbls::CubicBracketingLinesearchStepsize)( (a, b) = c, c_old break end - (t == cbls.max_stepsize) && return t + (t == max_step) && return t t *= cbls.stepsize_increase - t = min(t, cbls.max_stepsize) + t = min(t, max_step) c_old = c c = get_univariate_triple!(mp, cbls, p, η, t) end @@ -1294,7 +1309,7 @@ mutable struct NonmonotoneLinesearchStepsize{ retraction_method::TRM = default_retraction_method(M), stepsize_reduction::R = 0.5, stop_when_stepsize_less::R = 0.0, - stop_when_stepsize_exceeds = real(max_stepsize(M)), + stop_when_stepsize_exceeds::R = real(max_stepsize(M)), stop_increasing_at_step::I = 100, stop_decreasing_at_step::I = 1000, storage::Union{Nothing, StoreStateAction} = StoreStateAction( @@ -1388,7 +1403,8 @@ function (a::NonmonotoneLinesearchStepsize)( η, p_old, X_old, - k, + k; + kwargs..., ) end function (a::NonmonotoneLinesearchStepsize)( @@ -1454,6 +1470,13 @@ function (a::NonmonotoneLinesearchStepsize)( end #compute the new step size with the help of the Barzilai-Borwein step size + l = norm(M, p, η) + local swse # COV_EXCL_LINE + if 
:stop_when_stepsize_exceeds in keys(kwargs) + swse = kwargs[:stop_when_stepsize_exceeds] + else + swse = (a.stop_when_stepsize_exceeds / l) + end a.last_stepsize = linesearch_backtrack!( M, a.candidate_point, @@ -1463,11 +1486,11 @@ function (a::NonmonotoneLinesearchStepsize)( a.sufficient_decrease, a.stepsize_reduction, η; - lf0 = maximum([a.old_costs[j] for j in 1:min(iter, memory_size)]), + lf0 = maximum(view(a.old_costs, 1:min(iter, memory_size))), gradient = X, retraction_method = a.retraction_method, - stop_when_stepsize_less = (a.stop_when_stepsize_less / norm(M, p, η)), - stop_when_stepsize_exceeds = (a.stop_when_stepsize_exceeds / norm(M, p, η)), + stop_when_stepsize_less = (a.stop_when_stepsize_less / l), + stop_when_stepsize_exceeds = swse, stop_increasing_at_step = a.stop_increasing_at_step, stop_decreasing_at_step = a.stop_decreasing_at_step, report_messages_in = a.messages, @@ -1763,7 +1786,11 @@ function (a::WolfePowellLinesearchStepsize)( max_step_increase = ifelse( isfinite(a.max_stepsize), min(1.0e9, a.max_stepsize / grad_norm), 1.0e9 ) + if :stop_when_stepsize_exceeds in keys(kwargs) + max_step_increase = min(max_step_increase, kwargs[:stop_when_stepsize_exceeds]) + end step = ifelse(isfinite(a.max_stepsize), min(1.0, a.max_stepsize / grad_norm), 1.0) + step = min(step, max_step_increase) s_plus = step s_minus = step # clear messages @@ -1790,14 +1817,14 @@ function (a::WolfePowellLinesearchStepsize)( break end end - s_plus = 2.0 * s_minus + s_plus = min(2.0 * s_minus, max_step_increase) else vector_transport_to!(M, a.candidate_direction, p, η, a.candidate_point, a.vector_transport_method) if get_differential(mp, a.candidate_point, a.candidate_direction; Y = Y) < a.sufficient_curvature * l i = 0 while fNew <= f0 + a.sufficient_decrease * step * l && (s_plus < max_step_increase) # increase - s_plus = s_plus * 2.0 + s_plus = min(s_plus * 2.0, max_step_increase) step = s_plus ManifoldsBase.retract_fused!(M, a.candidate_point, p, η, step, 
a.retraction_method) fNew = get_cost(mp, a.candidate_point) @@ -2158,3 +2185,632 @@ end function get_last_stepsize(step::WolfePowellBinaryLinesearchStepsize, ::Any...) return step.last_stepsize end + + +#### Hager-Zhang Linesearch + + +@doc """ + HagerZhangLinesearchStepsize{P,T,R<:Real} <: Linesearch + +Do a bracketing line search to find a step size ``α`` that finds a +local minimum along the search direction ``X`` starting from ``p``, +utilizing cubic polynomial interpolation using the method described in +[HagerZhang:2006:2](@cite). The function [`secant`](@ref) is used to find the minimum of the +cubic polynomial fitted to values of the cost function and its derivative at the endpoints +of the current interval. +See [`HagerZhangLinesearch`](@ref) for the mathematical details. + +# Fields + +$(_fields(:p; name = "candidate_point")) + as temporary storage for candidates +* `initial_stepsize::R`: the step size to start the search with +$(_fields(:retraction_method)) +$(_fields(:vector_transport_method)) +* `initial_guess`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `stepsize_limit`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `max_bracket_iterations`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `start_enforcing_wolfe_conditions_at_bracketing_iteration`: see keyword arguments of + [`HagerZhangLinesearch`](@ref) for details. +* `allow_early_maxstep_termination`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `wolfe_condition_mode`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `ϵ`, `δ`, `σ`, `ω`, `θ`, `γ`, `ρ`, `Δ`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `secant_acceptance_ratio`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. 
+* `candidate_direction`, `temporary_tangent`: as temporary storage for tangent vectors +* `triples`: temporary storage for function and derivative evaluations +* `last_evaluation_index`: to keep track of the number of evaluations performed so far; + points at the last filled entry of `triples`. +* `Qₖ`, `Cₖ`: to keep track of the parameters of the Wolfe condition when in adaptive mode +* `current_mode`: to keep track of the current Wolfe condition mode when in adaptive mode +* `last_stepsize`: last stepsize computed since reset +* `last_cost`: last cost value computed since reset +* `ϵₖ`: the current ϵ parameter used in the approximate Wolfe condition and bracketing + +# Constructor + + HagerZhangLinesearchStepsize(M::AbstractManifold; kwargs...) +""" +mutable struct HagerZhangLinesearchStepsize{ + TF <: Real, + TIG <: AbstractInitialLinesearchGuess, + TRM <: AbstractRetractionMethod, + TVTM <: AbstractVectorTransportMethod, + TP, + TX, + } <: Linesearch + # parameters + initial_guess::TIG + retraction_method::TRM + vector_transport_method::TVTM + stepsize_limit::TF + max_bracket_iterations::Int + start_enforcing_wolfe_conditions_at_bracketing_iteration::Int + allow_early_maxstep_termination::Bool + wolfe_condition_mode::Symbol # :standard, :approximate, :adaptive + ϵ::TF # approximate Wolfe termination parameter + δ::TF # used in approximate Wolfe condition + σ::TF # used in curvature condition + ω::TF + θ::TF # update rule parameter + γ::TF + ρ::TF + Δ::TF + secant_acceptance_ratio::TF + # storage for candidates + candidate_point::TP + candidate_direction::TX + temporary_tangent::TX + # storage for function evaluations + triples::Vector{UnivariateTriple{TF}} + last_evaluation_index::Int + # storage to be kept between outer solver iterations + Qₖ::TF + Cₖ::TF + current_mode::Symbol + # other storage + last_stepsize::TF + last_cost::TF + ϵₖ::TF + function HagerZhangLinesearchStepsize( + M::AbstractManifold; + initial_guess::TIG = HagerZhangInitialGuess(), + 
retraction_method::TRM = default_retraction_method(M), + vector_transport_method::TVTM = default_vector_transport_method(M), + initial_last_stepsize::TF = NaN, + initial_last_cost::TF = NaN, + stepsize_limit::TF = Inf, + candidate_point = allocate_result(M, rand), + candidate_direction = zero_vector(M, candidate_point), + max_bracket_iterations::Int = 10, + start_enforcing_wolfe_conditions_at_bracketing_iteration::Int = initial_guess isa ConstantStepsize ? 2 : 1, + max_function_evaluations::Int = 20, + wolfe_condition_mode::Symbol = :adaptive, + allow_early_maxstep_termination::Bool = true, + ϵ::TF = 1.0e-6, + δ::TF = 0.1, + σ::TF = 0.9, + ω::TF = 1.0e-3, + θ::TF = 0.5, + γ::TF = 0.66, + ρ::TF = 5.0, + Δ::TF = 0.7, + secant_acceptance_ratio::TF = 1.0e-8, + ) where { + TIG <: AbstractInitialLinesearchGuess, TRM <: AbstractRetractionMethod, + TVTM <: AbstractVectorTransportMethod, TF <: Real, + } + + # check parameters + @assert δ > 0 && δ < 0.5 + @assert δ <= σ + @assert σ < 1 + @assert ϵ >= 0 + @assert ω >= 0 && ω <= 1 + @assert Δ >= 0 && Δ <= 1 + @assert θ > 0 && θ < 1 + @assert γ > 0 && γ < 1 + @assert ρ > 1 + @assert stepsize_limit > 0 + @assert wolfe_condition_mode in (:standard, :approximate, :adaptive) + @assert secant_acceptance_ratio >= 0 + + # allocate storage + triples = Vector{UnivariateTriple{TF}}(undef, max_function_evaluations) + + initial_wolfe_mode = wolfe_condition_mode == :adaptive ? 
:standard : wolfe_condition_mode + + return new{TF, TIG, TRM, TVTM, typeof(candidate_point), typeof(candidate_direction)}( + initial_guess, retraction_method, vector_transport_method, stepsize_limit, + max_bracket_iterations, start_enforcing_wolfe_conditions_at_bracketing_iteration, + allow_early_maxstep_termination, wolfe_condition_mode, + ϵ, δ, σ, ω, θ, γ, ρ, Δ, secant_acceptance_ratio, + candidate_point, candidate_direction, zero_vector(M, candidate_point), + triples, 0, + 0.0, 0.0, # Qₖ, Cₖ + initial_wolfe_mode, + initial_last_stepsize, initial_last_cost, ϵ, + ) + end +end + +function initialize_stepsize!(hzls::HagerZhangLinesearchStepsize) + hzls.Qₖ = 0.0 + hzls.Cₖ = 0.0 + hzls.last_stepsize = NaN + hzls.last_cost = NaN + hzls.ϵₖ = hzls.ϵ + hzls.current_mode = hzls.wolfe_condition_mode + if hzls.current_mode === :adaptive + hzls.current_mode = :standard + end + hzls.last_evaluation_index = 0 + return hzls +end + +""" + _hz_evaluate_next_step( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, α::Real + ) + +Evaluate and store the next trial step for the Hager-Zhang linesearch. + +Given the current iterate `p`, search direction `η` (in the tangent space at `p`), and a +candidate step size `α`, this function + +1. Retracts from `p` along `η` by step `α` into `hzls.candidate_point` (using + `hzls.retraction_method`), +2. Vector-transports `η` to the candidate point into `hzls.candidate_direction` (using + `hzls.vector_transport_method`), +3. Evaluates the objective and directional derivative via + `get_cost_and_differential(mp, hzls.candidate_point, hzls.candidate_direction)`, +4. Stores the resulting triple `(α, f, df)` in `hzls.triples` and increments + `hzls.last_evaluation_index`. + +This helper is side-effecting by design; it mutates `hzls`' internal storage. + +# Return value + +Returns a tuple with three values: +- the index `i_k::Int` at which the new evaluation was stored. 
+- `evaluation_limit_termination`: `true` iff the maximum number of stored evaluations + has been reached. +- `wolfe_termination` is `true` iff the (standard or approximate) Wolfe conditions are + satisfied for the current candidate, according to `hzls.current_mode`. + +# Errors + +Throws an error if called more often than the maximum number of allocated function +evaluations (i.e. if `hzls.triples` would overflow). +""" +function _hz_evaluate_next_step( + hzls::HagerZhangLinesearchStepsize, + M::AbstractManifold, + mp::AbstractManoptProblem, + p, + η, + α::Real + ) + triples = hzls.triples + max_evals = length(triples) + if hzls.last_evaluation_index + 1 > max_evals + # this should never happen if the calling code is correct + error("Hager-Zhang linesearch exceeded maximum number of function evaluations $(length(hzls.triples)).") + end + ManifoldsBase.retract_fused!(M, hzls.candidate_point, p, η, α, hzls.retraction_method) + vector_transport_to!( + M, hzls.candidate_direction, p, η, hzls.candidate_point, hzls.vector_transport_method + ) + f, df = get_cost_and_differential(mp, hzls.candidate_point, hzls.candidate_direction; Y = hzls.temporary_tangent) + hzls.last_evaluation_index += 1 + triples[hzls.last_evaluation_index] = UnivariateTriple(α, f, df) + + wolfe_termination = false + evaluation_limit_termination = hzls.last_evaluation_index == max_evals + i_k = hzls.last_evaluation_index + if hzls.current_mode === :standard + # Eq (22) in HagerZhang:2006:2 + # equivalent to the (T1) condition + wolfe_termination = (α * hzls.δ * triples[1].df >= (triples[i_k].f - triples[1].f)) && + (triples[i_k].df >= hzls.σ * triples[1].df) + elseif hzls.current_mode === :approximate + # Eq (23) in HagerZhang:2006:2 + additional criterion in the (T2) condition + wolfe_termination = ((2 * hzls.δ - 1) * triples[1].df >= triples[i_k].df) && + (triples[i_k].df >= hzls.σ * triples[1].df) && triples[i_k].f <= triples[1].f + hzls.ϵₖ + else + error("Unknown Wolfe condition mode 
$(hzls.current_mode).") + end + + return hzls.last_evaluation_index, evaluation_limit_termination, wolfe_termination +end + +""" + _hz_bracket(hzls::HagerZhangLinesearchStepsize, c::Real, max_alpha::Real) + +Perform the bracketing phase of the Hager-Zhang linesearch starting from an initial +stepsize `c` and not exceeding `max_alpha`. + +Returns a tuple `(i_a, i_b, f_eval, f_wolfe, f_early_maxstep)` where `i_a` and `i_b` are +the indices in the stored function evaluations such that the minimum is bracketed between +`triples[i_a].t` and `triples[i_b].t`. `f_eval` is `true` if the maximum number of function +evaluations has been reached during the bracketing phase. `f_wolfe` is `true` if the Wolfe +conditions have been satisfied. `f_early_maxstep` is `true` if the maximum stepsize was +reached early with negative slope and an improvement over the initial point. +""" +function _hz_bracket( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, c::Real, max_alpha::Real + ) + # B0 + current_step = c + local c_index, f_eval, f_wolfe # COV_EXCL_LINE + ls_early_exit = false + for j in 1:hzls.max_bracket_iterations + c_index, f_eval, f_wolfe = _hz_evaluate_next_step(hzls, M, mp, p, η, current_step) + if f_eval || (f_wolfe && j >= hzls.start_enforcing_wolfe_conditions_at_bracketing_iteration) + break + end + if hzls.triples[c_index].df >= 0 + # B1 -- detecting a positive slope + # handled after the loop + break + else + if hzls.triples[c_index].f > hzls.triples[1].f + hzls.ϵₖ + # B2 -- function value gets sufficiently larger than at 0 + # perform main bracketing loop (we can skip U0-U2 checks here) + (i_a_bar, i_b_bar, f_eval, f_wolfe) = _hz_u3(hzls, M, mp, p, η, 1, c_index) + return (i_a_bar, i_b_bar, f_eval, f_wolfe, false) + else + if current_step == max_alpha + # we've reached maximum alpha so we can't expand anymore + # we handle this case after the loop + ls_early_exit = hzls.allow_early_maxstep_termination + break + end + # B3 -- 
widen the bracket + current_step *= hzls.ρ + if current_step > max_alpha + current_step = max_alpha + end + end + end + end + # we detected positive slope, ran out of iterations or reached max stepsize + # B1 seems to be the best choice for all three cases + + if ls_early_exit + # additional termination condition: we reached the maximum stepsize with negative + # slope and an improvement over the initial point, so we can exit early with this step + return (1, c_index, f_eval, f_wolfe, true) + end + + i_min = 1 + for i in 2:(hzls.last_evaluation_index - 1) + if hzls.triples[i].f <= hzls.triples[1].f + hzls.ϵₖ + i_min = i + end + end + return (i_min, c_index, f_eval, f_wolfe, false) +end + +""" + _hz_update( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, i_a::Int, i_b::Int, c::Real + ) + +Perform an update procedure of the Hager-Zhang linesearch given the current bracketing +indices `i_a` and `i_b` and a candidate stepsize `c`. + +Returns indices and termination information `(i_A, i_B, i_c, f_eval, f_wolfe)` where the +minimum is now bracketed between `alpha_values[i_A]` and `alpha_values[i_B]`. Index `i_c` +indicates the position at which evaluation of the candidate `c` was stored. If the +candidate `c` is outside of the current bracket, the last index is returned as `-1`. +`f_eval` is `true` if the maximum number of function evaluations has been reached. +`f_wolfe` is `true` if the Wolfe conditions have been satisfied at the candidate `i_c`. 
+""" +function _hz_update( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, i_a::Int, i_b::Int, c::Real + ) + # U0 + if c < hzls.triples[i_a].t || c > hzls.triples[i_b].t + return (i_a, i_b, -1, false, false) + end + i_c, f_eval, f_wolfe = _hz_evaluate_next_step(hzls, M, mp, p, η, c) + if hzls.triples[i_c].df >= 0 + # U1 + return (i_a, i_c, i_c, f_eval, f_wolfe) + else + if hzls.triples[i_c].f <= hzls.triples[1].f + hzls.ϵₖ + # U2 + return (i_c, i_b, i_c, f_eval, f_wolfe) + else + if f_eval || f_wolfe + # termination condition met + return (i_a, i_b, i_c, f_eval, f_wolfe) + else + # U3 + i_a_bar, i_b_bar, f_eval, f_wolfe = _hz_u3(hzls, M, mp, p, η, i_a, i_c) + return (i_a_bar, i_b_bar, i_c, f_eval, f_wolfe) + end + end + end +end + +function _hz_u3( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, i_a::Int, i_b::Int + ) + i_a_bar = i_a + i_b_bar = i_b + # the loop should typically terminate before exceeding the number of evaluations + f_eval = false + f_wolfe = false + while hzls.last_evaluation_index < length(hzls.triples) + # U3 (a) + d = (1 - hzls.θ) * hzls.triples[i_a_bar].t + hzls.θ * hzls.triples[i_b_bar].t + i_d, f_eval, f_wolfe = _hz_evaluate_next_step(hzls, M, mp, p, η, d) + if hzls.triples[i_d].df >= 0 || f_eval || f_wolfe + return (i_a_bar, i_d, f_eval, f_wolfe) + else + if hzls.triples[i_d].f <= hzls.triples[1].f + hzls.ϵₖ + # U3 (b) + i_a_bar = i_d + else + # U3 (c) + i_b_bar = i_d + end + end + end + return (i_a_bar, i_b_bar, f_eval, f_wolfe) +end + +""" + _hz_secant2( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, i_a::Int, i_b::Int + ) + +Perform the secant-based update in the Hager-Zhang linesearch. + +Computes a trial step using a secant interpolation of the bracketing +endpoints. If the trial step is too close to an endpoint, falls back to a +bisection step. 
Returns the updated bracketing indices and termination flags +from the internal update routine. + +# Arguments +- `hzls`: linesearch state and storage. +- `M`: manifold for retractions and transports. +- `mp`: optimization problem providing cost and differential. +- `p`: current iterate. +- `η`: search direction in the tangent space at `p`. +- `i_a`, `i_b`: indices of the current bracketing interval in `hzls.triples`. + +# Return value +Returns `(i_A, i_B, i_c, f_eval, f_wolfe)` where +- `i_A`, `i_B`: indices bracketing the minimum after the update, +- `i_c`: index of the most recent evaluation (or `-1` if the candidate was out of range), +- `f_eval`: `true` iff the evaluation limit has been reached, +- `f_wolfe`: `true` iff the Wolfe conditions are satisfied. + +# Steps (S1-S4) +- S1: compute a secant trial `c` from the current bracket and accept it unless too close to + an endpoint (otherwise use a bisection step). +- S2/S3: if the trial becomes a new endpoint, perform an update from that side. +- S4: return the updated bracket and termination flags. 
+""" +function _hz_secant2( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, i_a::Int, i_b::Int + ) + # S1 + c = secant(hzls.triples[i_a], hzls.triples[i_b]) + width = hzls.triples[i_b].t - hzls.triples[i_a].t + if abs(c - hzls.triples[i_a].t) < hzls.secant_acceptance_ratio * width || + abs(c - hzls.triples[i_b].t) < hzls.secant_acceptance_ratio * width + # secant too close to an endpoint, use bisection instead + # this case is not present in the original algorithm, but the following steps don't make much sense in this case + c = (hzls.triples[i_a].t + hzls.triples[i_b].t) / 2 + return _hz_update(hzls, M, mp, p, η, i_a, i_b, c) + end + (i_A, i_B, i_c, f_eval, f_wolfe) = _hz_update(hzls, M, mp, p, η, i_a, i_b, c) + if f_eval || f_wolfe + # not present in the original algorithm, but this seems to be the right way to handle this case + return (i_A, i_B, i_c, f_eval, f_wolfe) + end + if i_c == i_B + # S2 + c_bar = secant(hzls.triples[i_b], hzls.triples[i_B]) + # S4, part 1 + return _hz_update(hzls, M, mp, p, η, i_A, i_B, c_bar) + elseif i_c == i_A + # S3 + c_bar = secant(hzls.triples[i_a], hzls.triples[i_A]) + # S4, part 1 + return _hz_update(hzls, M, mp, p, η, i_A, i_B, c_bar) + else + # S4, part 2 + return (i_A, i_B, i_c, f_eval, f_wolfe) + end +end + +function (hzls::HagerZhangLinesearchStepsize)( + mp::AbstractManoptProblem, + s::AbstractManoptSolverState, + k::Int, + η = (-get_gradient(mp, get_iterate(s))); + fp = get_cost(mp, get_iterate(s)), + gradient = nothing, + kwargs..., + ) + M = get_manifold(mp) + p = get_iterate(s) + + dphi_0 = if !isnothing(gradient) + real(inner(M, p, η, gradient)) + else + get_differential(mp, p, η; Y = hzls.temporary_tangent) + end + hzls.triples[1] = UnivariateTriple(0.0, fp, dphi_0) + hzls.last_evaluation_index = 1 + + # update Qₖ, Cₖ + hzls.Qₖ = 1 + hzls.Qₖ * hzls.Δ + hzls.Cₖ += (abs(fp) - hzls.Cₖ) / hzls.Qₖ + + if hzls.wolfe_condition_mode == :adaptive + # Checking the V3 condition + 
if abs(hzls.last_cost - fp) <= hzls.ω * hzls.Cₖ + hzls.current_mode = :approximate + end + end + + # L0, initialization + + # handle stepsize limit + max_alpha = hzls.stepsize_limit + if :stop_when_stepsize_exceeds in keys(kwargs) + max_alpha = min( + kwargs[:stop_when_stepsize_exceeds], + max_alpha, + ) + end + # guess initial alpha + α0 = hzls.initial_guess(mp, s, k, hzls.last_stepsize, η; lf0 = fp, Dlf0 = dphi_0, stop_when_stepsize_exceeds = max_alpha) + + # in case initial_guess does not take into account the stepsize limit, we enforce it here + α0 = min(α0, max_alpha) + + # L0, bracket(c) + local i_a_j, i_b_j, f_eval, f_wolfe # COV_EXCL_LINE + (i_a_j, i_b_j, f_eval, f_wolfe, f_early_maxstep) = _hz_bracket(hzls, M, mp, p, η, α0, max_alpha) + !f_early_maxstep && while !(f_eval || f_wolfe) + # L1 + finite_at_b = isfinite(hzls.triples[i_b_j].f) + if finite_at_b + # _hz_secant2 only makes sense if we have finite function values at both ends + # but _hz_update may still work + (i_a, i_b, _i_c, f_eval, f_wolfe) = _hz_secant2(hzls, M, mp, p, η, i_a_j, i_b_j) + else + (i_a, i_b) = (i_a_j, i_b_j) + end + # L2 + # we additionally check that we can continue narrowing the bracket + if !(f_eval || f_wolfe) && + (!finite_at_b || (hzls.triples[i_b].t - hzls.triples[i_a].t) > hzls.γ * (hzls.triples[i_b_j].t - hzls.triples[i_a_j].t)) + # secant2 did not reduce the bracket sufficiently + # we need to do bisection + (i_a, i_b, _i_c, f_eval, f_wolfe) = _hz_update( + hzls, M, mp, p, η, + i_a, i_b, + (hzls.triples[i_a].t + hzls.triples[i_b].t) / 2, + ) + end + # L3 + i_a_j, i_b_j = i_a, i_b + + # loop terminates when we generate a point satisfying T1 or T2, or when we run out + # of objective evaluations + end + + hzls.last_stepsize = hzls.triples[hzls.last_evaluation_index].t + hzls.last_cost = hzls.triples[hzls.last_evaluation_index].f + return hzls.last_stepsize +end + +function Base.show(io::IO, hzls::HagerZhangLinesearchStepsize) + return print( + io, + """ + 
HagerZhangLinesearch(; + initial_guess = $(hzls.initial_guess), + retraction_method = $(hzls.retraction_method), + vector_transport_method = $(hzls.vector_transport_method), + stepsize_limit = $(hzls.stepsize_limit), + max_bracket_iterations = $(hzls.max_bracket_iterations), + start_enforcing_wolfe_conditions_at_bracketing_iteration = $(hzls.start_enforcing_wolfe_conditions_at_bracketing_iteration), + max_function_evaluations = $(length(hzls.triples)), + wolfe_condition_mode = $(hzls.wolfe_condition_mode), + ϵ = $(hzls.ϵ), δ = $(hzls.δ), σ = $(hzls.σ), + ω = $(hzls.ω), + θ = $(hzls.θ), γ = $(hzls.γ), secant_acceptance_ratio = $(hzls.secant_acceptance_ratio), + ρ = $(hzls.ρ), + Δ = $(hzls.Δ), + )""", + ) +end +function status_summary(hzls::HagerZhangLinesearchStepsize) + return "$(hzls)\nand a computed last stepsize of $(hzls.last_stepsize)" +end + +@doc """ + HagerZhangLinesearch(; kwargs...) + HagerZhangLinesearch(M::AbstractManifold; kwargs...) + +A functor representing the curvature minimizing cubic bracketing scheme introduced +in [HagerZhang:2006:2](@cite). + +The following changes were made to the original algorithm from the paper: +1. The algorithm bails out early from a secant update that is too close to one of the + endpoints and switches to bisection. The original algorithm performs a similar check at + a later stage. This precaution prevents a non-productive evaluation of the objective. +2. Added `start_enforcing_wolfe_conditions_at_bracketing_iteration`, since with a very low + stepsize initialization that satisfies Wolfe conditions we might accept the initial + stepsize and not notice that bracketing could help us reach the minimum earlier. + Setting `start_enforcing_wolfe_conditions_at_bracketing_iteration` to 1 reproduces the + behavior of the original paper. For example, a static initial stepsize equal to 1.0 could + benefit from having this parameter increased. +3. The paper isn't entirely clear about which final stepsize to return. 
This + implementation returns the last evaluated stepsize. +4. The original algorithm doesn't specify what to do when the maximum stepsize is reached + during the bracketing phase with a negative slope and an improvement over the initial + point. This implementation allows for an early termination in this case, which seems + reasonable since we can't expand the bracket anymore and this point is likely close to + the minimum. By default, this early termination is allowed, but it can be turned off via + `allow_early_maxstep_termination`, in which case the algorithm continues with the main + loop. + +## Keyword arguments + +$(_kwargs(:p; name = "candidate_point")) as temporary storage for candidates +$(_kwargs(:retraction_method)) +$(_kwargs(:vector_transport_method)) +* `initial_guess::AbstractInitialLinesearchGuess=HagerZhangInitialGuess()`: initial linesearch guess strategy +* `initial_last_stepsize::Real = NaN`: initial value for the stored last stepsize +* `initial_last_cost::Real = NaN`: initial value for the stored last cost +* `stepsize_limit::Real = Inf`: upper bound for trial stepsizes during bracketing +* `candidate_point = allocate_result(M, rand)`: storage for trial points +* `candidate_direction = zero_vector(M, candidate_point)`: storage for transported directions +* `max_bracket_iterations::Int = 10`: maximum number of bracketing iterations +* `start_enforcing_wolfe_conditions_at_bracketing_iteration::Int = initial_guess isa ConstantStepsize ? 2 : 1`: + bracketing iteration at which the Wolfe conditions start being enforced; + setting to 1 may cause no bracketing to occur when the initial guess satisfies the Wolfe + conditions. +* `max_function_evaluations::Int = 20`: maximum number of function evaluations per linesearch +* `allow_early_maxstep_termination::Bool = true`: whether to allow early termination when + the maximum stepsize is reached with negative slope and an improvement over the initial point. 
+* `wolfe_condition_mode::Symbol = :adaptive`: one of `:standard`, `:approximate`, or `:adaptive`. + Selects between (T1) and (T2) conditions in [HagerZhang:2006:2](@cite). +* `ϵ::Real = 1.0e-6`: initial allowed increase in function value in termination condition (T2). + Allowed range: `ϵ >= 0`. +* `δ::Real = 0.1`: parameter for approximate Wolfe condition. + Allowed range: `0 < δ < 0.5` and `δ <= σ`. +* `σ::Real = 0.9`: curvature condition parameter. Allowed range: `δ <= σ < 1`. +* `ω::Real = 1.0e-3`: interpolation safeguard parameter. Allowed range: `0 <= ω <= 1`. +* `θ::Real = 0.5`: bisection update parameter. Allowed range: `0 < θ < 1`. +* `γ::Real = 0.66`: determines when a bisection step is performed instead of secant. + Allowed range: `0 < γ < 1`. +* `ρ::Real = 5.0`: bracketing expansion factor. Allowed range: `ρ > 1`. +* `Δ::Real = 0.7`: Parameter controlling the rate of change of Qₖ. + Allowed range: `0 <= Δ <= 1`. +* `secant_acceptance_ratio::Real = 1.0e-8`: minimum relative interval length + for accepting secant step. Allowed range: `secant_acceptance_ratio >= 0`. + In case of rejection, a bisection step is performed instead. + +$(_note(:ManifoldDefaultFactory, "HagerZhangLinesearch")) +""" +function HagerZhangLinesearch(args...; kwargs...) + return ManifoldDefaultsFactory(HagerZhangLinesearchStepsize, args...; kwargs...) 
+end diff --git a/src/plans/stopping_criterion.jl b/src/plans/stopping_criterion.jl index 8b4cf666ac..180f4fbdbf 100644 --- a/src/plans/stopping_criterion.jl +++ b/src/plans/stopping_criterion.jl @@ -321,7 +321,6 @@ function (c::StopWhenChangeLess)(mp::AbstractManoptProblem, s::AbstractManoptSol c.storage(mp, s, k) return false end -indicates_convergence(c::StopWhenChangeLess) = false function get_reason(c::StopWhenChangeLess) if (c.last_change < c.threshold) && (c.at_iteration >= 0) return "At iteration $(c.at_iteration) the algorithm performed a step with a change ($(c.last_change)) less than $(c.threshold).\n" @@ -334,8 +333,11 @@ function status_summary(c::StopWhenChangeLess; context::Symbol = :default) s = has_stopped ? "reached" : "not reached" return (_is_inline(context) ? "|Δp| < $(c.threshold):$(_MANOPT_INDENT)" : "A stopping criterion to stop when the change of the iterate is less than $(c.threshold)\n using the $(repr(c.inverse_retraction_method))\n$(_MANOPT_INDENT)") * "$s" end +indicates_convergence(c::StopWhenChangeLess) = false function Base.show(io::IO, c::StopWhenChangeLess) - return print(io, "StopWhenChangeLess($(c.threshold); inverse_retraction_method=$(repr(c.inverse_retraction_method)))") + print(io, "StopWhenChangeLess($(c.threshold); inverse_retraction_method=$(repr(c.inverse_retraction_method))") + !ismissing(c.outer_norm) && print(io, ", outer_norm = ", c.outer_norm) + return print(io, ")") end """ @@ -381,7 +383,7 @@ function (c::StopWhenCostChangeLess)( c.last_change = 2 * c.tolerance end c.last_change = c.last_cost - c.last_cost = get_cost(problem, get_iterate(state)) + c.last_cost = get_cost(problem, state) c.last_change = c.last_change - c.last_cost if abs(c.last_change) < c.tolerance c.at_iteration = iteration @@ -410,11 +412,11 @@ end StopWhenCostLess <: StoppingCriterion store a threshold when to stop looking at the cost function of the -optimization problem from within a [`AbstractManoptProblem`](@ref), i.e 
`get_cost(p,get_iterate(o))`. +optimization problem from within an [`AbstractManoptProblem`](@ref), i.e. `get_cost(p, s)`. # Constructor - StopWhenCostLess(ε) + StopWhenCostLess(ε::Real) initialize the stopping criterion to a threshold `ε`. """ @@ -432,7 +434,7 @@ function (c::StopWhenCostLess)( if k == 0 # reset on init c.at_iteration = -1 end - c.last_cost = get_cost(p, get_iterate(s)) + c.last_cost = get_cost(p, s) if c.last_cost < c.threshold c.at_iteration = k return true @@ -466,6 +468,78 @@ function set_parameter!(c::StopWhenCostLess, ::Val{:MinCost}, v) return c end +""" + StopWhenRelativeAPosterioriCostChangeLessOrEqual <: StoppingCriterion + +A stopping criterion to stop when + +````math +\\frac{f_k - f_{k+1}}{\\max(\\lvert f_k \\rvert, \\lvert f_{k+1} \\rvert, 1)} ≤ tol, +```` + +based on Eq. (1) in [ZhuByrdLuNocedal:1997](@cite). + +# Fields +* `tolerance`: the threshold `tol` in the above formula. +$(_fields([:at_iteration, :last_change])) +* `last_cost`: the last cost value + +# Constructor + + StopWhenRelativeAPosterioriCostChangeLessOrEqual(tolerance::F) + +Initialize the stopping criterion to a threshold `tolerance` for the change of the cost function. + + StopWhenRelativeAPosterioriCostChangeLessOrEqual(; factr::Real=1.0e7) + +Initialize the tolerance to `factr * eps(typeof(factr))`, following the convention in [ZhuByrdLuNocedal:1997](@cite). 
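+
+# Example
+
+A hedged usage sketch: this criterion can be combined with an iteration cap via `|`, just like the other stopping criteria in Manopt.jl; the `factr` value here is only illustrative:
+
+```julia
+sc = StopAfterIteration(1000) | StopWhenRelativeAPosterioriCostChangeLessOrEqual(; factr = 1.0e7)
+```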
+""" +mutable struct StopWhenRelativeAPosterioriCostChangeLessOrEqual{F <: Real} <: StoppingCriterion + tolerance::F + at_iteration::Int + last_cost::F + last_change::F +end +function StopWhenRelativeAPosterioriCostChangeLessOrEqual(tol::F) where {F <: Real} + return StopWhenRelativeAPosterioriCostChangeLessOrEqual{F}(tol, -1, zero(tol), 2 * tol) +end +StopWhenRelativeAPosterioriCostChangeLessOrEqual(; factr::F = 1.0e7) where {F <: Real} = StopWhenRelativeAPosterioriCostChangeLessOrEqual(factr * eps(typeof(factr))) +function (c::StopWhenRelativeAPosterioriCostChangeLessOrEqual)( + problem::AbstractManoptProblem, state::AbstractManoptSolverState, iteration::Int + ) + if iteration <= 0 # reset on init + c.at_iteration = -1 + c.last_cost = Inf + c.last_change = 2 * c.tolerance + end + current_cost = get_cost(problem, state) + c.last_change = (c.last_cost - current_cost) / max(abs(c.last_cost), abs(current_cost), 1) + c.last_cost = current_cost + if iteration > 1 && c.last_change <= c.tolerance + c.at_iteration = iteration + return true + end + return false +end +indicates_convergence(c::StopWhenRelativeAPosterioriCostChangeLessOrEqual) = false +function get_reason(c::StopWhenRelativeAPosterioriCostChangeLessOrEqual) + if c.at_iteration >= 0 + return "At iteration $(c.at_iteration) the algorithm performed a step with a relative a posteriori cost change ($(abs(c.last_change))) less than or equal to $(c.tolerance)." + end + return "" +end +function status_summary(c::StopWhenRelativeAPosterioriCostChangeLessOrEqual) + has_stopped = (c.at_iteration >= 0) + s = has_stopped ? 
"reached" : "not reached" + return "(fₖ- fₖ₊₁)/max(|fₖ|, |fₖ₊₁|, 1) = $(abs(c.last_change)) ≤ $(c.tolerance):\t$s" +end +function Base.show(io::IO, ::MIME"text/plain", c::StopWhenRelativeAPosterioriCostChangeLessOrEqual) + return print( + io, + "StopWhenRelativeAPosterioriCostChangeLessOrEqual with threshold $(c.tolerance).\n $(status_summary(c))", + ) +end + @doc """ StopWhenEntryChangeLess @@ -722,7 +796,7 @@ Create a stopping criterion with threshold `ε` for the gradient, that is, this indicates to stop when [`get_gradient`](@ref) returns a gradient vector of norm less than `ε`, where the norm to use can be specified in the `norm=` keyword. """ -mutable struct StopWhenGradientNormLess{F, TF, N <: Union{Missing, Real}} <: StoppingCriterion +mutable struct StopWhenGradientNormLess{F, TF <: Real, N <: Union{Missing, Real}} <: StoppingCriterion norm::F threshold::TF last_change::TF @@ -730,7 +804,7 @@ mutable struct StopWhenGradientNormLess{F, TF, N <: Union{Missing, Real}} <: Sto outer_norm::N function StopWhenGradientNormLess( ε::TF; norm::F = norm, outer_norm::N = missing - ) where {F, TF, N <: Union{Missing, Real}} + ) where {F, TF <: Real, N <: Union{Missing, Real}} return new{F, TF, N}(norm, ε, zero(ε), -1, outer_norm) end end @@ -768,11 +842,87 @@ end show(io::IO, c::StopWhenGradientNormLess) = print(io, "StopWhenGradientNormLess($(c.threshold))") """ - set_parameter!(c::StopWhenGradientNormLess, :MinGradNorm, v::Float64) + set_parameter!(c::StopWhenGradientNormLess{F,TF}, ::Val{:MinGradNorm}, v::TF) where {F,TF<:Real} Update the minimal gradient norm when an algorithm shall stop """ -function set_parameter!(c::StopWhenGradientNormLess, ::Val{:MinGradNorm}, v::Float64) +function set_parameter!(c::StopWhenGradientNormLess{F, TF}, ::Val{:MinGradNorm}, v::TF) where {F, TF <: Real} + c.threshold = v + return c +end + +""" + StopWhenProjectedNegativeGradientNormLess <: StoppingCriterion + +A stopping criterion similar to [`StopWhenGradientNormLess`](@ref), 
although it checks the +norm of the projected negative gradient. It is primarily useful for optimization involving +[`Hyperrectangle`](@extref Manifolds.Hyperrectangle). + +On manifolds with boundary and manifolds with corners, for a tangent vector ``X``, +``-X`` might not be a valid tangent vector. As an example, consider the objective +``f(x)=x^2`` on the interval ``[1, 2]``. Its gradient at 1 is equal to 2, but because the +point 1 is at the boundary of the interval, the projected negative gradient is equal to 0 +because we can't go in the negative direction. +""" +mutable struct StopWhenProjectedNegativeGradientNormLess{F, TF <: Real, N <: Union{Missing, Real}} <: StoppingCriterion + norm::F + threshold::TF + last_change::TF + at_iteration::Int + outer_norm::N + function StopWhenProjectedNegativeGradientNormLess( + ε::TF; norm::F = norm, outer_norm::N = missing + ) where {F, TF <: Real, N <: Union{Missing, Real}} + return new{F, TF, N}(norm, ε, zero(ε), -1, outer_norm) + end +end + +function (sc::StopWhenProjectedNegativeGradientNormLess)( + mp::AbstractManoptProblem, s::AbstractManoptSolverState, k::Int + ) + M = get_manifold(mp) + if k == 0 # reset on init + sc.at_iteration = -1 + end + if (k > 0) + r = (has_components(M) && !ismissing(sc.outer_norm)) ? (sc.outer_norm,) : () + p = get_iterate(s) + mpg = -get_gradient(s) + embed_project!(M, mpg, p, mpg) + sc.last_change = sc.norm(M, p, mpg, r...) 
+ if sc.last_change < sc.threshold + sc.at_iteration = k + return true + end + end + return false +end +function get_reason(c::StopWhenProjectedNegativeGradientNormLess) + if (c.last_change < c.threshold) && (c.at_iteration >= 0) + return "The algorithm reached an approximately critical point after $(c.at_iteration) iterations; the projected negative gradient norm ($(c.last_change)) is less than $(c.threshold).\n" + end + return "" +end +function status_summary(c::StopWhenProjectedNegativeGradientNormLess; context::Symbol = :default) + (context === :short) && return repr(c) + has_stopped = (c.at_iteration >= 0) + s = has_stopped ? "reached" : "not reached" + (context === :inline) && return "|proj (-grad f)| < $(c.threshold): $s" + return "A StoppingCriterion to stop when the negative projected gradient norm is less than a threshold of $(c.threshold):\n$(_MANOPT_INDENT)$s" +end +indicates_convergence(c::StopWhenProjectedNegativeGradientNormLess) = true +function show(io::IO, c::StopWhenProjectedNegativeGradientNormLess) + print(io, "StopWhenProjectedNegativeGradientNormLess(", c.threshold, "; norm = ", c.norm) + !ismissing(c.outer_norm) && print(io, ", outer_norm = ", c.outer_norm) + return print(io, ")") +end + +""" + set_parameter!(c::StopWhenProjectedNegativeGradientNormLess{F,TF}, ::Val{:MinGradNorm}, v::TF) where {F, TF<:Real} + +Update the minimal gradient norm when an algorithm shall stop. +""" +function set_parameter!(c::StopWhenProjectedNegativeGradientNormLess{F, TF}, ::Val{:MinGradNorm}, v::TF) where {F, TF <: Real} c.threshold = v return c end @@ -840,13 +990,14 @@ end """ StopWhenCostNaN <: StoppingCriterion -stop looking at the cost function of the optimization problem from within a [`AbstractManoptProblem`](@ref), i.e `get_cost(p,get_iterate(o))`. +Stop the solver when the cost function of the optimization problem from within an +[`AbstractManoptProblem`](@ref) is `NaN`. The value is obtained using `get_cost(p, s)`. 
# Constructor StopWhenCostNaN() -initialize the stopping criterion to NaN. +initialize the stopping criterion with `at_iteration` equal to -1. """ mutable struct StopWhenCostNaN <: StoppingCriterion at_iteration::Int @@ -859,7 +1010,7 @@ function (c::StopWhenCostNaN)( c.at_iteration = -1 end # but still verify whether it yields NaN - if isnan(get_cost(p, get_iterate(s))) + if isnan(get_cost(p, s)) c.at_iteration = k return true end @@ -885,13 +1036,15 @@ end """ StopWhenIterateNaN <: StoppingCriterion -stop looking at the cost function of the optimization problem from within a [`AbstractManoptProblem`](@ref), i.e `get_cost(p,get_iterate(o))`. +Stop the solver when the iterate of the optimization problem from within an +[`AbstractManoptProblem`](@ref) contains `NaN` values. +The value is obtained using [`get_iterate`](@ref)`(s)`. # Constructor StopWhenIterateNaN() -initialize the stopping criterion to NaN. +Initialize the stopping criterion. """ mutable struct StopWhenIterateNaN <: StoppingCriterion at_iteration::Int diff --git a/src/solvers/FrankWolfe.jl b/src/solvers/FrankWolfe.jl index a7063bcb8c..5ee47797e1 100644 --- a/src/solvers/FrankWolfe.jl +++ b/src/solvers/FrankWolfe.jl @@ -327,6 +327,7 @@ calls_with_kwargs(::typeof(Frank_Wolfe_method!)) = (decorate_objective!, decorat function initialize_solver!(amp::AbstractManoptProblem, fws::FrankWolfeState) get_gradient!(amp, fws.X, fws.p) + initialize_stepsize!(fws.stepsize) return fws end function step_solver!(amp::AbstractManoptProblem, fws::FrankWolfeState, k) diff --git a/src/solvers/alternating_gradient_descent.jl b/src/solvers/alternating_gradient_descent.jl index aa5cd0ccdf..da43efcba3 100644 --- a/src/solvers/alternating_gradient_descent.jl +++ b/src/solvers/alternating_gradient_descent.jl @@ -268,6 +268,7 @@ function initialize_solver!( get_gradient!(amp, agds.X, agds.p) (agds.order_type == :FixedRandom || agds.order_type == :Random) && (shuffle!(agds.order)) + initialize_stepsize!(agds.stepsize) return 
agds end function step_solver!(amp::AbstractManoptProblem, agds::AlternatingGradientDescentState, k) diff --git a/src/solvers/conjugate_gradient_descent.jl b/src/solvers/conjugate_gradient_descent.jl index 49fcc93600..0fa67e6e2e 100644 --- a/src/solvers/conjugate_gradient_descent.jl +++ b/src/solvers/conjugate_gradient_descent.jl @@ -179,6 +179,7 @@ function initialize_solver!(amp::AbstractManoptProblem, cgs::ConjugateGradientDe cgs.δ = -copy(get_manifold(amp), cgs.p, cgs.X) # remember the first gradient in coefficient calculation cgs.coefficient(amp, cgs, 0) + initialize_stepsize!(cgs.stepsize) cgs.β = 0.0 return cgs end diff --git a/src/solvers/gradient_descent.jl b/src/solvers/gradient_descent.jl index 74512088fc..ddac27e73d 100644 --- a/src/solvers/gradient_descent.jl +++ b/src/solvers/gradient_descent.jl @@ -260,6 +260,7 @@ calls_with_kwargs(::typeof(gradient_descent!)) = (decorate_objective!, decorate_ # function initialize_solver!(mp::AbstractManoptProblem, s::GradientDescentState) get_gradient!(mp, s.X, s.p) + initialize_stepsize!(s.stepsize) return s end function step_solver!(p::AbstractManoptProblem, s::GradientDescentState, k) diff --git a/src/solvers/projected_gradient_method.jl b/src/solvers/projected_gradient_method.jl index d68c2fdbf9..f53bf02102 100644 --- a/src/solvers/projected_gradient_method.jl +++ b/src/solvers/projected_gradient_method.jl @@ -266,6 +266,7 @@ calls_with_kwargs(::typeof(projected_gradient_method!)) = (decorate_objective!, function initialize_solver!(amp::AbstractManoptProblem, pgms::ProjectedGradientMethodState) get_gradient!(amp, pgms.X, pgms.p) + initialize_stepsize!(pgms.stepsize) return pgms end diff --git a/src/solvers/proximal_gradient_method.jl b/src/solvers/proximal_gradient_method.jl index 03bb8ea1a9..201a6916c0 100644 --- a/src/solvers/proximal_gradient_method.jl +++ b/src/solvers/proximal_gradient_method.jl @@ -182,6 +182,7 @@ function initialize_solver!(amp::AbstractManoptProblem, pgms::ProximalGradientMe M = 
get_manifold(amp) zero_vector!(M, pgms.X, pgms.p) copyto!(M, pgms.a, pgms.p) + initialize_stepsize!(pgms.stepsize) return pgms end diff --git a/src/solvers/quasi_Newton.jl b/src/solvers/quasi_Newton.jl index 374f91e7fd..388f0e03f8 100644 --- a/src/solvers/quasi_Newton.jl +++ b/src/solvers/quasi_Newton.jl @@ -202,10 +202,10 @@ $(_args([:M, :f, :grad_f, :p])) # Keyword arguments -* `basis=`[`DefaultOrthonormalBasis`](@extref ManifoldsBase.DefaultOrthonormalBasis)`()`: +* `basis::AbstractBasis=`[`DefaultOrthonormalBasis`](@extref ManifoldsBase.DefaultOrthonormalBasis)`()`: basis to use within each of the the tangent spaces to represent the Hessian (inverse) for the cases where it is stored in full (matrix) form. -* `cautious_update=false`: +* `cautious_update::Bool=false`: whether or not to use the [`QuasiNewtonCautiousDirectionUpdate`](@ref) which wraps the `direction_update`. * `cautious_function=(x) -> x * 1e-4`: @@ -221,7 +221,7 @@ $(_kwargs(:evaluation; add_properties = [:GradientExample])) See also `initial_scale`. * `initial_scale=1.0`: scale initial `s` to use in with $(_doc_QN_init_scaling) in the computation of the limited memory approach. see also `initial_operator` -* `memory_size=20`: limited memory, number of ``s_k, y_k`` to store. +* `memory_size::Int=min(manifold_dimension(M), 20)`: limited memory, number of ``s_k, y_k`` to store. Set to a negative value to use a full memory (matrix) representation * `nondescent_direction_behavior=:reinitialize_direction_update`: specify how non-descent direction is handled. This can be @@ -323,12 +323,15 @@ function quasi_Newton!( ), stopping_criterion::StoppingCriterion = StopAfterIteration(max(1000, memory_size)) | StopWhenGradientNormLess(1.0e-6), + nonpositive_curvature_behavior::Symbol = :ignore, + sy_tol::Real = 1.0e-8, kwargs..., ) where { E <: AbstractEvaluationType, O <: Union{AbstractManifoldFirstOrderObjective{E}, AbstractDecoratedManifoldObjective{E}}, } keywords_accepted(quasi_Newton!; kwargs...) 
+ local local_dir_upd # COV_EXCL_LINE if memory_size >= 0 local_dir_upd = QuasiNewtonLimitedMemoryDirectionUpdate( M, @@ -338,7 +341,12 @@ function quasi_Newton!( initial_scale = initial_scale, (project!) = (project!), vector_transport_method = vector_transport_method, + nonpositive_curvature_behavior = nonpositive_curvature_behavior, + sy_tol = sy_tol, ) + if has_anisotropic_max_stepsize(M) + local_dir_upd = QuasiNewtonLimitedMemoryBoxDirectionUpdate(local_dir_upd) + end else local_dir_upd = QuasiNewtonMatrixDirectionUpdate( M, @@ -377,21 +385,32 @@ function quasi_Newton!( end calls_with_kwargs(::typeof(quasi_Newton!)) = (decorate_objective!, decorate_state!) +function _get_max_stepsize(M::AbstractManifold, qns::QuasiNewtonState) + current_max_stepsize = get_parameter(qns.direction_update, Val(:max_stepsize)) + if !isnothing(current_max_stepsize) && !isfinite(current_max_stepsize) + return max_stepsize(M, qns.p) / norm(M, qns.p, qns.η) + else + return current_max_stepsize + end +end + function initialize_solver!(amp::AbstractManoptProblem, qns::QuasiNewtonState) M = get_manifold(amp) get_gradient!(amp, qns.X, qns.p) copyto!(M, qns.sk, qns.p, qns.X) copyto!(M, qns.yk, qns.p, qns.X) initialize_update!(qns.direction_update) + initialize_stepsize!(qns.stepsize) return qns end function step_solver!(mp::AbstractManoptProblem, qns::QuasiNewtonState, k) M = get_manifold(mp) - get_gradient!(mp, qns.X, qns.p) + # qns.X should be the correct gradient at qns.p from initialization or the previous step qns.direction_update(qns.η, mp, qns) + current_max_stepsize = _get_max_stepsize(M, qns) if !(qns.nondescent_direction_behavior === :ignore) qns.nondescent_direction_value = real(inner(M, qns.p, qns.η, qns.X)) - if qns.nondescent_direction_value > 0 + if qns.nondescent_direction_value >= 0 if qns.nondescent_direction_behavior === :step_towards_negative_gradient || qns.nondescent_direction_behavior === :reinitialize_direction_update copyto!(M, qns.η, qns.X) @@ -400,9 +419,20 @@ function 
step_solver!(mp::AbstractManoptProblem, qns::QuasiNewtonState, k) if qns.nondescent_direction_behavior === :reinitialize_direction_update initialize_update!(qns.direction_update) end + # update direction after reinitialization to get a valid one + if qns.nondescent_direction_behavior === :step_towards_negative_gradient || + qns.nondescent_direction_behavior === :reinitialize_direction_update + qns.direction_update(qns.η, mp, qns) + current_max_stepsize = _get_max_stepsize(M, qns) + end end end - α = qns.stepsize(mp, qns, k, qns.η; gradient = qns.X) + local α # COV_EXCL_LINE + if isnothing(current_max_stepsize) + α = qns.stepsize(mp, qns, k, qns.η; gradient = qns.X) + else + α = qns.stepsize(mp, qns, k, qns.η; gradient = qns.X, stop_when_stepsize_exceeds = current_max_stepsize) + end copyto!(M, qns.p_old, get_iterate(qns)) ManifoldsBase.retract_fused!(M, qns.p, qns.p, qns.η, α, qns.retraction_method) qns.η .*= α @@ -703,6 +733,52 @@ function update_hessian!( return d end +function fill_rho_i!(M::AbstractManifold, p, d::QuasiNewtonLimitedMemoryDirectionUpdate, i::Int) + v = inner(M, p, d.memory_s[i], d.memory_y[i]) + if d.nonpositive_curvature_behavior === :ignore && iszero(v) + d.ρ[i] = zero(eltype(d.ρ)) + if length(d.message) > 0 + d.message = replace(d.message, " i=" => " i=$i,") + d.message = replace(d.message, "summand in" => "summands in") + else + d.message = "The inner products ⟨s_i,y_i⟩ ≈ 0, i=$i, ignoring summand in approximation." + end + elseif d.nonpositive_curvature_behavior === :byrd && v <= d.sy_tol * norm(M, p, d.memory_y[i]) + d.ρ[i] = zero(eltype(d.ρ)) + if length(d.message) > 0 + d.message = replace(d.message, " i=" => " i=$i,") + d.message = replace(d.message, "summand in" => "summands in") + else + d.message = "The inner products ⟨s_i,y_i⟩ <= $(d.sy_tol * norm(M, p, d.memory_y[i])), i=$i, removing summand from approximation." 
+ end + else + d.ρ[i] = 1 / v + end + return d +end + +function _drop_zero_rho_vectors!(d::QuasiNewtonLimitedMemoryDirectionUpdate{U}) where {U <: InverseBFGS} + T = eltype(d.memory_s) + memory_size = capacity(d.memory_s) + new_scb = CircularBuffer{T}(memory_size) + new_ycb = CircularBuffer{T}(memory_size) + new_ρ = similar(d.ρ) + fill!(new_ρ, 0) + j = 1 + for i in 1:length(d.memory_s) + if !iszero(d.ρ[i]) + push!(new_scb, d.memory_s[i]) + push!(new_ycb, d.memory_y[i]) + new_ρ[j] = d.ρ[i] + j += 1 + end + end + d.memory_s = new_scb + d.memory_y = new_ycb + d.ρ = new_ρ + return d +end + # Limited-memory update function update_hessian!( d::QuasiNewtonLimitedMemoryDirectionUpdate{U}, @@ -716,6 +792,7 @@ function update_hessian!( start = length(d.memory_s) == capacity(d.memory_s) ? 2 : 1 M = get_manifold(mp) p = get_iterate(st) + reforming_required = false for i in start:length(d.memory_s) # transport all stored tangent vectors in the tangent space of the next iterate vector_transport_to!( M, d.memory_s[i], p_old, d.memory_s[i], p, d.vector_transport_method ) + vector_transport_to!( M, d.memory_y[i], p_old, d.memory_y[i], p, d.vector_transport_method ) + + # a (near-)zero inner product would cause a division by zero; fill_rho_i! sets ρ[i] to zero instead, so this pair is ignored in the next step + # pre-compute ρ here in case inner is expensive + fill_rho_i!(M, p, d, i) + if d.nonpositive_curvature_behavior === :byrd && iszero(d.ρ[i]) + reforming_required = true + end + end + + if reforming_required + # the first vector pair needs to be transported too, because it most likely won't be + # overwritten by a new pair + if start == 2 + vector_transport_to!( + M, d.memory_s[1], p_old, d.memory_s[1], p, d.vector_transport_method + ) + vector_transport_to!( + M, d.memory_y[1], p_old, d.memory_y[1], p, d.vector_transport_method + ) + fill_rho_i!(M, p, d, 1) + end + _drop_zero_rho_vectors!(d) end # add newest @@ -732,6 +831,7 @@ old_sk = popfirst!(d.memory_s) copyto!(M, old_sk, st.sk) push!(d.memory_s, old_sk) + circshift!(d.ρ, -1) else 
push!(d.memory_s, copy(M, st.sk)) end @@ -742,6 +842,9 @@ function update_hessian!( else push!(d.memory_y, copy(M, st.yk)) end + + fill_rho_i!(M, p, d, length(d.memory_s)) + return d end @@ -770,7 +873,27 @@ function update_hessian!( vector_transport_to!( M, d.update.memory_y[i], p_old, d.update.memory_y[i], p, d.update.vector_transport_method, ) + fill_rho_i!(M, p, d.update, i) end end return d end + +function get_cost( + mp::AbstractManoptProblem, s::QuasiNewtonState{ + P, T, + <:AbstractQuasiNewtonDirectionUpdate, + <:StoppingCriterion, + <:HagerZhangLinesearchStepsize, + } + ) where {P, T} + + hzls = s.stepsize + if hzls.last_evaluation_index === 0 + # if no evaluation was performed, we need to compute the cost + return get_cost(mp, s.p) + else + # we can reuse the stored function value from the linesearch + return hzls.triples[hzls.last_evaluation_index].f + end +end diff --git a/src/solvers/solver.jl b/src/solvers/solver.jl index a26515faae..ec64db0a65 100644 --- a/src/solvers/solver.jl +++ b/src/solvers/solver.jl @@ -189,3 +189,20 @@ end function stop_solver!(p::AbstractManoptProblem, s::ReturnSolverState, k) return stop_solver!(p, s.state, k) end + +""" + get_cost(p::AbstractManoptProblem, s::AbstractManoptSolverState) + +Get cost at the current iterate of the solver state `s` for the problem `p`. +The method may be implemented by particular solvers if they store the cost at the current +iterate in the state, but by default it is obtained by calling `get_cost(p, get_iterate(s))`. 
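+
+For example, a solver whose state caches the cost of the current iterate could specialize this method to avoid re-evaluating the objective. A hedged sketch; the state type `MyCachedState` and its `last_cost` field are hypothetical:
+
+```julia
+function get_cost(p::AbstractManoptProblem, s::MyCachedState)
+    # fall back to evaluating the objective when no cached value is available
+    return isnan(s.last_cost) ? get_cost(p, get_iterate(s)) : s.last_cost
+end
+```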
+""" +function get_cost(p::AbstractManoptProblem, s::AbstractManoptSolverState) + return _get_cost(p, s, dispatch_state_decorator(s)) +end +function _get_cost(p::AbstractManoptProblem, s::AbstractManoptSolverState, ::Val{false}) + return get_cost(p, get_iterate(s)) +end +function _get_cost(p::AbstractManoptProblem, s::AbstractManoptSolverState, ::Val{true}) + return get_cost(p, s.state) +end diff --git a/src/solvers/stochastic_gradient_descent.jl b/src/solvers/stochastic_gradient_descent.jl index 0e40d5fc5c..9448b3d4a4 100644 --- a/src/solvers/stochastic_gradient_descent.jl +++ b/src/solvers/stochastic_gradient_descent.jl @@ -295,6 +295,7 @@ calls_with_kwargs(::typeof(stochastic_gradient_descent!)) = (decorate_objective! function initialize_solver!(::AbstractManoptProblem, s::StochasticGradientDescentState) s.k = 1 (s.order_type == :FixedRandom) && (shuffle!(s.order)) + initialize_stepsize!(s.stepsize) return s end function step_solver!(mp::AbstractManoptProblem, s::StochasticGradientDescentState, iter) diff --git a/src/solvers/subgradient.jl b/src/solvers/subgradient.jl index a2579adbd8..5e4198daea 100644 --- a/src/solvers/subgradient.jl +++ b/src/solvers/subgradient.jl @@ -201,6 +201,7 @@ function initialize_solver!(mp::AbstractManoptProblem, sgs::SubGradientMethodSta M = get_manifold(mp) copyto!(M, sgs.p_star, sgs.p) sgs.X = zero_vector(M, sgs.p) + initialize_stepsize!(sgs.stepsize) return sgs end function step_solver!(mp::AbstractManoptProblem, sgs::SubGradientMethodState, k) diff --git a/test/helpers/test_linesearches.jl b/test/helpers/test_linesearches.jl index e942c3b9f5..1e10f37d6c 100644 --- a/test/helpers/test_linesearches.jl +++ b/test/helpers/test_linesearches.jl @@ -59,4 +59,20 @@ using Test initialize_solver!(mp, st_qn) ls_mt = Manopt.LineSearchesStepsize(M, LineSearches.MoreThuente()) @test_throws ErrorException ls_mt(mp_throw, st_qn, 1; fp = rosenbrock(M, x0)) + + # test max stepsize limit enforcement + @test ls_hz(mp, st_qn, 1, [1.0, 2.0, 3.0, 4.0, 
0.0]; stop_when_stepsize_exceeds = 0.1) == 0.1 + + @testset "max stepsize limit setting" begin + lss = [ + LineSearches.MoreThuente(), + LineSearches.HagerZhang(), + ] + for ls in lss + nls = Manopt.linesearches_set_max_alpha(ls, 0.5) + @test Manopt.linesearches_get_max_alpha(nls) == 0.5 + nls2 = Manopt.linesearches_set_max_alpha(ls, Inf) + @test Manopt.linesearches_get_max_alpha(nls2) == Inf + end + end end diff --git a/test/helpers/test_manifold_extra_functions.jl b/test/helpers/test_manifold_extra_functions.jl index f7a7540429..c775cb1cc9 100644 --- a/test/helpers/test_manifold_extra_functions.jl +++ b/test/helpers/test_manifold_extra_functions.jl @@ -69,6 +69,7 @@ Random.seed!(42) R3 = Euclidean(3) TR3 = TangentBundle(R3) + p = [0.0, 1.0, 0.0] X = [0.0, 0.0, 0.0] @@ -83,6 +84,14 @@ Random.seed!(42) @test Manopt.max_stepsize(R3, p) == Inf @test Manopt.max_stepsize(TR3, ArrayPartition(p, X)) == Inf + S_R3 = ProductManifold(M, R3) + @test Manopt.max_stepsize(S_R3) ≈ π + @test Manopt.max_stepsize(S_R3, ArrayPartition(p, [0.0, 0.0, 0.0])) ≈ π + + S_pow = PowerManifold(M, NestedPowerRepresentation(), 3) + @test Manopt.max_stepsize(S_pow) ≈ π + @test Manopt.max_stepsize(S_pow, [p, p, p]) ≈ π + Mfr = FixedRankMatrices(5, 4, 2) pfr = SVDMPoint( [ @@ -103,6 +112,9 @@ Random.seed!(42) M = Hyperrectangle([-3, -1.5], [3, 1.5]) @test Manopt.max_stepsize(M) ≈ 6.0 @test Manopt.max_stepsize(M, [-1, 0.5]) ≈ 4.0 + + M = ProbabilitySimplex(3) + @test Manopt.max_stepsize(M) == 1.0 end @testset "Vector space default" begin @test Manopt.Rn(Val(:Manopt), 3) isa ManifoldsBase.DefaultManifold diff --git a/test/plans/test_stepsize.jl b/test/plans/test_stepsize.jl index a66b8beafd..60c0f3ce5c 100644 --- a/test/plans/test_stepsize.jl +++ b/test/plans/test_stepsize.jl @@ -87,6 +87,7 @@ end s3 = WolfePowellBinaryLinesearch()(M) @test Manopt.get_message(s3) == "" @test startswith(repr(s3), "WolfePowellBinaryLinesearchStepsize(;") + @test get_last_stepsize(s3) == 0.0 @test 
startswith(Manopt.status_summary(s3), "A Wolfe Powell bisection line search") # no stepsize yet so `repr` and summary are the same s4 = WolfePowellLinesearch()(M) @@ -262,6 +263,534 @@ end clbs = CubicBracketingLinesearch(; sufficient_curvature = 1.0e-16, min_bracket_width = 0.0, initial_stepsize = 0.5)(M) @test clbs(dmp, gs, 1) ≈ 1 / 6 atol = 5.0e-4 end + @testset "secant numerical stability" begin + # Large offset, small interval + a = 1.0e7 + b = a + 1.0e-6 + + # Choose derivatives that differ slightly + ga = 1.0 + gb = nextfloat(ga) # smallest representable difference + + # minimizer using affine formula + x_ref = a - ga * (b - a) / (gb - ga) + + err_secant = abs( + Manopt.secant( + Manopt.UnivariateTriple(a, 0.0, ga), + Manopt.UnivariateTriple(b, 0.0, gb) + ) - x_ref + ) + @test err_secant < 1.0e-6 + end + @testset "HagerZhang Linesearch Stepsize" begin + M = Euclidean(2) + f_sum_sq(M, p) = sum(p .^ 2) + grad_f_sum_sq(M, p) = 2 .* p + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_sum_sq, grad_f_sum_sq)) + p = [1.0, 2.0] + η = -grad_f_sum_sq(M, p) + gs = GradientDescentState(M; p = p) + + hzls = HagerZhangLinesearch()(M) + @test startswith(repr(hzls), "HagerZhangLinesearch(;") + @test startswith(Manopt.status_summary(hzls), "HagerZhangLinesearch(;") + @test Manopt.get_message(hzls) == "" + + α = hzls(dmp, gs, 1, η) + @test isfinite(α) + @test α > 0 + α2 = hzls(dmp, gs, 1, η; gradient = grad_f_sum_sq(M, p)) + @test α2 ≈ α + @test hzls.last_stepsize == α + @test hzls.last_cost <= f_sum_sq(M, p) + 1.0e-12 + + hzls_limit = Manopt.HagerZhangLinesearchStepsize(M; stepsize_limit = 0.05) + α_limit = hzls_limit(dmp, gs, 1, η) + @test α_limit <= 0.05 + eps(0.05) + @test hzls_limit.last_stepsize == α_limit + α_limit_kw = hzls_limit(dmp, gs, 2, η; stop_when_stepsize_exceeds = 0.01) + @test α_limit_kw <= 0.01 + eps(0.01) + @testset "Running out of evaluations in _hz_evaluate_next_step" begin + N = length(hzls_limit.triples) - hzls_limit.last_evaluation_index + 
for i in 1:N + Manopt._hz_evaluate_next_step(hzls_limit, M, dmp, p, η, 0.1) + end + @test_throws ErrorException Manopt._hz_evaluate_next_step(hzls_limit, M, dmp, p, η, 0.1) + end + @testset "Wolfe condition modes" begin + hzls_default = Manopt.HagerZhangLinesearchStepsize(M) + hzls_default.current_mode = :invalid_mode + @test_throws ErrorException hzls_default(dmp, gs, 1, η) + end + + hzls_approx = Manopt.HagerZhangLinesearchStepsize( + M; wolfe_condition_mode = :approximate, stepsize_limit = 0.2 + ) + α_approx = hzls_approx(dmp, gs, 1, η) + @test α_approx > 0 + + @testset "termination modes" begin + hzls_std = Manopt.HagerZhangLinesearchStepsize( + M; + wolfe_condition_mode = :standard, + initial_guess = Manopt.ConstantInitialGuess(0.5), + max_function_evaluations = 5, + ) + α_std = hzls_std(dmp, gs, 1, η) + @test isapprox(α_std, 0.5; rtol = 1.0e-12, atol = 0.0) + @test hzls_std.current_mode == :standard + + hzls_adapt = Manopt.HagerZhangLinesearchStepsize( + M; + wolfe_condition_mode = :adaptive, + initial_guess = Manopt.ConstantInitialGuess(0.5), + initial_last_cost = f_sum_sq(M, p), + ω = 1.0, + max_function_evaluations = 5, + ) + α_adapt = hzls_adapt(dmp, gs, 1, η) + @test α_adapt > 0 + @test hzls_adapt.current_mode == :approximate + + hzls_eval = Manopt.HagerZhangLinesearchStepsize( + M; + wolfe_condition_mode = :standard, + initial_guess = Manopt.ConstantInitialGuess(1.0), + max_function_evaluations = 2, + ) + α_eval = hzls_eval(dmp, gs, 1, η) + @test α_eval > 0 + @test hzls_eval.last_evaluation_index == length(hzls_eval.triples) + end + @testset "B1 bracketing test" begin + M = Euclidean(1) + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 .* p + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [1.0] + η = -grad_f(M, p) + gs = GradientDescentState(M; p = p) + hzls_b1 = Manopt.HagerZhangLinesearchStepsize( + M; + initial_guess = Manopt.ConstantInitialGuess(0.75), + start_enforcing_wolfe_conditions_at_bracketing_iteration = 2, + max_bracket_iterations = 
1, + ) + α_b1 = hzls_b1(dmp, gs, 1, η) + @test α_b1 > 0 + end + @testset "B2 bracketing test" begin + M = Euclidean(1) + # f(x) = -22 x^3 + 33 x^2 - x + # grad_f(x) = -66 x^2 + 66 x - 1 + f(M, p) = -22 * p[1]^3 + 33 * p[1]^2 - p[1] + grad_f(M, p) = [-66 * p[1]^2 + 66 * p[1] - 1] + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [0.0] + η = [1.0] # Descent direction + gs = GradientDescentState(M; p = p) + hzls_b2 = Manopt.HagerZhangLinesearchStepsize( + M; + initial_guess = Manopt.ConstantInitialGuess(1.0), + start_enforcing_wolfe_conditions_at_bracketing_iteration = 2, + max_bracket_iterations = 2, + ) + α = hzls_b2(dmp, gs, 1, η) + @test α > 0 + end + @testset "B3 bracketing test" begin + M = Euclidean(1) + # f(x) = -x + # grad_f(x) = -1 + f(M, p) = -p[1] + grad_f(M, p) = [-1.0] + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [0.0] + η = [1.0] # Descent direction + gs = GradientDescentState(M; p = p) + hzls_b3 = Manopt.HagerZhangLinesearchStepsize( + M; + initial_guess = Manopt.ConstantInitialGuess(1.0), + stepsize_limit = 2.0, + max_bracket_iterations = 2, + ) + α = hzls_b3(dmp, gs, 1, η) + @test α > 0 + end + @testset "U1 trigger test" begin + M = Euclidean(1) + # f(x) = x^2 / 2 + # grad_f(x) = x + f(M, p) = p[1]^2 / 2 + grad_f(M, p) = [p[1]] + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [1.0] + η = [-1.0] # Descent direction + gs = GradientDescentState(M; p = p) + hzls_u1 = Manopt.HagerZhangLinesearchStepsize( + M; + initial_guess = Manopt.ConstantInitialGuess(2.0), + ) + # We expect U1 to be triggered during the update (secant is exact, slope 0 >= 0) + α = hzls_u1(dmp, gs, 1, η) + @test α > 0 + end + @testset "U2 trigger test" begin + M = Euclidean(1) + # We mock f and grad_f to trigger U2 termination + # We need: + # 1. Starting at p=0 with descent direction (df < 0) + # 2. Bracketing finds a point with df > 0 (to finish bracketing) -> p=1.0, df=1.0 + # 3. 
Refinement hits max evaluations at a point with df < 0 and f > f(0)+eps -> p=0.5, f=10.0, df=-0.1 + + function f_u2(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return 0.0 + elseif isapprox(v, 1.0; atol = 1.0e-9) + return 0.0 + elseif isapprox(v, 0.5; atol = 1.0e-9) + return 10.0 + end + return 0.0 + end + + function grad_f_u2(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-9) + return [1.0] + elseif isapprox(v, 0.5; atol = 1.0e-9) + return [-0.1] + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_u2, grad_f_u2)) + p = [0.0] + η = [1.0] + gs = GradientDescentState(M; p = p) + hzls_u2 = Manopt.HagerZhangLinesearchStepsize( + M; initial_guess = Manopt.ConstantInitialGuess(1.0), max_function_evaluations = 3 + ) + α = hzls_u2(dmp, gs, 1, η) + @test α > 0 + end + @testset "U3 trigger test" begin + M = Euclidean(1) + # Trigger U3 by having a point that satisfies conditions for U2 but f_eval is false. 
+ # Same landscape as U2: + # p=0, df=-1 (start) + # p=1, df=1 (end of bracket) + # p=0.5, f=10, df=-0.1 (high function value, negative slope) + + function f_u3(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return 0.0 + elseif isapprox(v, 1.0; atol = 1.0e-9) + return 0.0 + elseif isapprox(v, 0.5; atol = 1.0e-9) + return 10.0 + end + return 0.0 + end + + function grad_f_u3(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-9) + return [1.0] + elseif isapprox(v, 0.5; atol = 1.0e-9) + return [-0.1] + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_u3, grad_f_u3)) + p = [0.0] + η = [1.0] # Descent direction + gs = GradientDescentState(M; p = p) + # Set max_function_evaluations > 3 so we don't hit U2 termination (f_eval=true) + hzls_u3 = Manopt.HagerZhangLinesearchStepsize( + M; initial_guess = Manopt.ConstantInitialGuess(1.0), max_function_evaluations = 5 + ) + α = hzls_u3(dmp, gs, 1, η) + @test α > 0 + end + @testset "U3 (b) trigger test" begin + M = Euclidean(1) + # Force U3 (b) in _hz_u3: + # 1) At d=0.5 we need df < 0 and f(d) <= f(0) + ϵₖ with no termination, + # so i_a_bar gets updated to i_d. + # 2) On the next U3 iteration we return from the loop. 
+ function f_u3b(M, q) + return 0.0 + end + + function grad_f_u3b(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-12) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-12) + return [1.0] + elseif isapprox(v, 0.5; atol = 1.0e-12) + return [-1.0] + elseif isapprox(v, 0.75; atol = 1.0e-12) + return [1.0] + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_u3b, grad_f_u3b)) + p = [0.0] + η = [1.0] + hzls_u3b = Manopt.HagerZhangLinesearchStepsize(M; max_function_evaluations = 4) + Manopt.initialize_stepsize!(hzls_u3b) + Manopt._hz_evaluate_next_step(hzls_u3b, M, dmp, p, η, 0.0) + Manopt._hz_evaluate_next_step(hzls_u3b, M, dmp, p, η, 1.0) + + (i_a, i_b, f_eval, f_wolfe) = Manopt._hz_u3(hzls_u3b, M, dmp, p, η, 1, 2) + @test (i_a, i_b) == (3, 4) + @test f_eval + @test !f_wolfe + end + @testset "U3 (c) info trigger test" begin + M = Euclidean(1) + # Force U3 (c) inside _hz_u3 by making the mid-point have + # negative slope but too large function value. 
+ function f_u3c(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-12) + return 0.0 + elseif isapprox(v, 1.0; atol = 1.0e-12) + return 0.0 + elseif isapprox(v, 0.5; atol = 1.0e-12) + return 1.0 + elseif isapprox(v, 0.25; atol = 1.0e-12) + return 0.0 + end + return 0.0 + end + + function grad_f_u3c(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-12) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-12) + return [1.0] + elseif isapprox(v, 0.5; atol = 1.0e-12) + return [-0.1] + elseif isapprox(v, 0.25; atol = 1.0e-12) + return [0.1] + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_u3c, grad_f_u3c)) + p = [0.0] + η = [1.0] + hzls_u3c = Manopt.HagerZhangLinesearchStepsize(M; max_function_evaluations = 4) + Manopt.initialize_stepsize!(hzls_u3c) + Manopt._hz_evaluate_next_step(hzls_u3c, M, dmp, p, η, 0.0) + Manopt._hz_evaluate_next_step(hzls_u3c, M, dmp, p, η, 1.0) + @test (1, 4, true, false) == Manopt._hz_u3(hzls_u3c, M, dmp, p, η, 1, 2) + end + @testset "U3 max evaluations termination" begin + M = Euclidean(1) + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 .* p + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [0.0] + η = [1.0] + + hzls_u3_max = Manopt.HagerZhangLinesearchStepsize(M; max_function_evaluations = 2) + Manopt.initialize_stepsize!(hzls_u3_max) + Manopt._hz_evaluate_next_step(hzls_u3_max, M, dmp, p, η, 0.0) + Manopt._hz_evaluate_next_step(hzls_u3_max, M, dmp, p, η, 1.0) + @test hzls_u3_max.last_evaluation_index == length(hzls_u3_max.triples) + + (i_a, i_b, f_eval, f_wolfe) = Manopt._hz_u3(hzls_u3_max, M, dmp, p, η, 1, 2) + @test (i_a, i_b) == (1, 2) + @test !f_eval + @test !f_wolfe + end + @testset "U0 out-of-bracket early return" begin + M = Euclidean(1) + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 .* p + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [0.0] + η = [1.0] + + hzls_u0 = Manopt.HagerZhangLinesearchStepsize(M; max_function_evaluations = 5) + 
Manopt.initialize_stepsize!(hzls_u0) + Manopt._hz_evaluate_next_step(hzls_u0, M, dmp, p, η, 0.0) + Manopt._hz_evaluate_next_step(hzls_u0, M, dmp, p, η, 1.0) + + last_eval_before = hzls_u0.last_evaluation_index + + # c is left of bracket [0, 1] -> U0 early return + @test (1, 2, -1, false, false) == Manopt._hz_update(hzls_u0, M, dmp, p, η, 1, 2, -0.1) + @test hzls_u0.last_evaluation_index == last_eval_before + + # c is right of bracket [0, 1] -> U0 early return + @test (1, 2, -1, false, false) == Manopt._hz_update(hzls_u0, M, dmp, p, η, 1, 2, 1.1) + @test hzls_u0.last_evaluation_index == last_eval_before + end + + @testset "S2 trigger test" begin + M = Euclidean(1) + # S2 is triggered within _hz_secant2 when the updated bracket point i_c is the new upper bound i_B + # This happens if slope at c is positive (U1 case in _hz_update). + # Sequence: + # 1. Start p=0, df=-1. + # 2. Initial bracket p=1, df=4 (df > 0 -> bracket found). + # 3. _hz_secant2 calls secant(0, 1) -> c = (0*4 - 1*(-1))/(4 - (-1)) = 0.2. + # 4. At c=0.2, we set df=0.1 (positive slope -> U1 -> i_c = i_B). + # 5. We also need f(0.2) high enough to fail Armijo so we don't return early with f_wolfe=true. + # f(0)=0. f(0.2)=0.5. Armijo check: 0.5 <= 0 + 0.1*0.2*(-1) = -0.02 (False). + + function f_s2(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return 0.0 + elseif isapprox(v, 1.0; atol = 1.0e-9) + return 2.0 # Arbitrary high value + elseif isapprox(v, 0.2; atol = 1.0e-9) + return 0.5 # Fail Armijo + end + return 0.0 # Fallback (e.g. 
for c_bar in S2) + end + + function grad_f_s2(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-9) + return [4.0] + elseif isapprox(v, 0.2; atol = 1.0e-9) + return [0.1] # Positive slope triggers U1 -> i_c = i_B + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_s2, grad_f_s2)) + p = [0.0] + η = [1.0] + gs = GradientDescentState(M; p = p) + hzls_s2 = Manopt.HagerZhangLinesearchStepsize( + M; initial_guess = Manopt.ConstantInitialGuess(1.0) + ) + # We expect the S2 log + α = hzls_s2(dmp, gs, 1, η) + @test α > 0 + end + + @testset "S3 trigger test" begin + M = Euclidean(1) + # S3 is triggered within _hz_secant2 when the updated bracket point i_c is the new lower bound i_A + # (U2 case in _hz_update). We set up: + # 1. Start p=0, df=-1 (descent). + # 2. Bracket at p=1, df=4 (positive slope). + # 3. Secant gives c=0.2. At c, df=-0.1 and f=0 -> U2. + + function f_s3(M, q) + return 0.0 + end + + function grad_f_s3(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-12) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-12) + return [4.0] + elseif isapprox(v, 0.2; atol = 1.0e-12) + return [-0.1] + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_s3, grad_f_s3)) + p = [0.0] + η = [1.0] + hzls_s3 = Manopt.HagerZhangLinesearchStepsize(M; max_function_evaluations = 5) + Manopt.initialize_stepsize!(hzls_s3) + Manopt._hz_evaluate_next_step(hzls_s3, M, dmp, p, η, 0.0) + Manopt._hz_evaluate_next_step(hzls_s3, M, dmp, p, η, 1.0) + + c = Manopt.secant(hzls_s3.triples[1], hzls_s3.triples[2]) + (i_A, i_B, i_c, f_eval, f_wolfe) = Manopt._hz_secant2(hzls_s3, M, dmp, p, η, 1, 2) + @test !f_eval + @test !f_wolfe + @test hzls_s3.triples[i_A].t ≈ c atol = 1.0e-12 + + c_bar = Manopt.secant(hzls_s3.triples[1], hzls_s3.triples[i_A]) + @test hzls_s3.triples[i_c].t ≈ c_bar atol = 1.0e-12 + @test i_A != i_B + end + + @testset "Hager-Zhang infinite at b" begin + 
# A function that is finite for small steps but infinite for larger ones + # and has positive slope where it is infinite to trigger the bracket condition. + + M = Euclidean(1) + + # f(x) = x^2 - x for x < 1.0 + # f(x) = Inf for x >= 1.0 + # Min at x = 0.5, f(0.5) = -0.25 + function f_inf(M, p) + x = p[1] + if x < 1.0 + return x^2 - x + else + return Inf + end + end + + function grad_f_inf(M, p) + x = p[1] + if x < 1.0 + return [2 * x - 1] + else + # Return a positive slope to satisfy _hz_bracket exit condition + return [1.0] + end + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_inf, grad_f_inf)) + + # Start at 0. f(0)=0. grad(0)=-1. Search direction +1. + s = GradientDescentState(M; p = [0.0]) + + # Force initial guess to be 2.0 (in the infinite region) + hzls = HagerZhangLinesearch(; initial_guess = Manopt.ConstantInitialGuess(2.0))(M) + + # Because initial bracket will be [0, 2] with f(2)=Inf. + # Then bisection will eventually find 0.5. + + step = hzls(dmp, s, 1, [1.0]) + @test abs(step - 0.5) < 1.0e-1 + end + + @testset "Hager-Zhang initialize_stepsize!" 
begin + hzls = HagerZhangLinesearch()(M) + hzls.last_evaluation_index = 5 + hzls.Qₖ = 2.0 + hzls.Cₖ = 2.0 + hzls.current_mode = :approximate + Manopt.initialize_stepsize!(hzls) + @test hzls.last_evaluation_index == 0 + @test hzls.Qₖ == 0.0 + @test hzls.Cₖ == 0.0 + @test hzls.current_mode == :standard + end + + end @testset "Distance over Gradients Stepsize" begin @testset "does not use sectional cuvature (Eucludian)" begin M = Euclidean(2) diff --git a/test/plans/test_stopping_criteria.jl b/test/plans/test_stopping_criteria.jl index c65516c2e9..00f5fb66bb 100644 --- a/test/plans/test_stopping_criteria.jl +++ b/test/plans/test_stopping_criteria.jl @@ -1,16 +1,14 @@ using Manifolds, ManifoldsBase, Manopt, Test, ManifoldsBase, Dates -function repl_show_string(e) - a = IOBuffer() - Base.show(a, MIME"text/plain"(), e) - return String(take!(a)) +function to_display_string(obj) + buf = IOBuffer() + Base.show(buf, MIME"text/plain"(), obj) + return String(take!(buf)) end @testset "StoppingCriteria" begin @testset "Generic Tests" begin - @test_throws ErrorException get_stopping_criteria( - Manopt.Test.DummyStoppingCriteriaSet() - ) + @test_throws ErrorException get_stopping_criteria(Manopt.Test.DummyStoppingCriteriaSet()) sa = StopAfterIteration(10) sb = StopWhenChangeLess(Euclidean(), 0.1) s = StopWhenAll(sa, sb) @@ -28,8 +26,7 @@ end s.criteria[2].at_iteration = 3 @test length(get_reason(s.criteria[2])) > 0 s2 = StopWhenAll([StopAfterIteration(10), StopWhenChangeLess(Euclidean(), 0.1)]) - @test get_stopping_criteria(s)[1].max_iterations == - get_stopping_criteria(s2)[1].max_iterations + @test get_stopping_criteria(s)[1].max_iterations == get_stopping_criteria(s2)[1].max_iterations s3 = StopWhenCostLess(0.1) p = DefaultManoptProblem( @@ -392,6 +389,50 @@ end @test length(get_reason(sc)) == 0 end + @testset "StopWhenRelativeAPosterioriCostChangeLessOrEqual" begin + sc = StopWhenRelativeAPosterioriCostChangeLessOrEqual(; factr = 100.0) + prob = DefaultManoptProblem( + 
Euclidean(), ManifoldGradientObjective((M, x) -> x^2, x -> 2x) + ) + s = GradientDescentState(Euclidean(); p = 1.0) + @test !sc(prob, s, -1) + @test !sc(prob, s, 1) + @test length(get_reason(sc)) == 0 + s.p = 1.0 - 1.0e-14 + + @test sc(prob, s, 2) + @test length(get_reason(sc)) > 0 + @test startswith( + to_display_string(sc), + "StopWhenRelativeAPosterioriCostChangeLessOrEqual with threshold", + ) + @test startswith(Manopt.status_summary(sc), "(fₖ- fₖ₊₁)/max(|fₖ|, |fₖ₊₁|, 1) = ") + @test !Manopt.indicates_convergence(sc) + end + + @testset "StopWhenProjectedNegativeGradientNormLess" begin + sc = StopWhenProjectedNegativeGradientNormLess(1.0e-10) + M = Hyperrectangle([1.0], [2.0]) + prob = DefaultManoptProblem( + M, ManifoldGradientObjective((M, x) -> x^2, x -> 2x) + ) + s = GradientDescentState(M; p = [1.0], X = [2.0]) + @test !sc(prob, s, -1) + @test length(get_reason(sc)) == 0 + @test sc(prob, s, 1) + @test length(get_reason(sc)) > 0 + + @test startswith( + to_display_string(sc), + "A StoppingCriterion to stop when the negative projected gradient norm is less than", + ) + @test startswith(Manopt.status_summary(sc; context = :inline), "|proj (-grad f)| < 1.0e-10") + + Manopt.set_parameter!(sc, Val(:MinGradNorm), 1.0e-5) + @test sc.threshold == 1.0e-5 + @test Manopt.indicates_convergence(sc) + end + @testset "has_converged" begin M = Euclidean(1) pr = Manopt.Test.DummyProblem{typeof(M)}() diff --git a/test/runtests.jl b/test/runtests.jl index b5c29ecedf..3657cc12b8 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -64,6 +64,7 @@ using Manifolds, ManifoldsBase, Manopt, Test include("solvers/test_proximal_gradient_method.jl") include("solvers/test_proximal_point.jl") include("solvers/test_quasi_Newton.jl") + include("solvers/test_quasi_Newton_box.jl") include("solvers/test_particle_swarm.jl") include("solvers/test_primal_dual_semismooth_Newton.jl") include("solvers/test_stochastic_gradient_descent.jl") diff --git a/test/solvers/test_cma_es.jl 
b/test/solvers/test_cma_es.jl index 2840740620..c95ca98dea 100644 --- a/test/solvers/test_cma_es.jl +++ b/test/solvers/test_cma_es.jl @@ -34,7 +34,7 @@ flat_example(::AbstractManifold, p) = 0.0 @test griewank(M, p1) < 0.1 p1 = cma_es(M, griewank; σ = 10.0, rng = MersenneTwister(123)) - @test griewank(M, p1) < 0.2 + @test griewank(M, p1) < 0.25 p1 = [10.0, 10.0] cma_es!(M, griewank, p1; σ = 10.0, rng = MersenneTwister(123)) diff --git a/test/solvers/test_quasi_Newton.jl b/test/solvers/test_quasi_Newton.jl index 3ce34c7417..453dd4b60c 100644 --- a/test/solvers/test_quasi_Newton.jl +++ b/test/solvers/test_quasi_Newton.jl @@ -1,5 +1,5 @@ using Manopt, Manifolds, Test -using LinearAlgebra: I, eigvecs, tr, Diagonal +using LinearAlgebra: I, eigvecs, tr, Diagonal, dot mutable struct QuasiNewtonGradientDirectionUpdate{VT <: AbstractVectorTransportMethod} <: AbstractQuasiNewtonDirectionUpdate @@ -193,6 +193,11 @@ end ) @test isapprox(M, x_direction, x_solution; atol = rayleigh_atol) end + + @testset "Byrd's nonpositive rule" begin + x1 = quasi_Newton(M, f, grad_f, x; nonpositive_curvature_behavior = :byrd, sy_tol = 1.0e8) + @test isapprox(M, x1, x_solution; atol = rayleigh_atol) + end end @testset "Brocket" begin @@ -217,7 +222,7 @@ end x_inverseBFGSCautious = quasi_Newton( M, f, grad_f, x; memory_size = 8, vector_transport_method = ProjectionTransport(), retraction_method = QRRetraction(), - cautious_update = true, stopping_criterion = StopWhenGradientNormLess(1.0e-6), + cautious_update = true, stopping_criterion = StopWhenGradientNormLess(1.0e-6) | StopAfterIteration(100), ) x_inverseBFGSHuang = quasi_Newton( @@ -226,7 +231,7 @@ end M; retraction_method = QRRetraction(), vector_transport_method = ProjectionTransport(), ), vector_transport_method = ProjectionTransport(), retraction_method = QRRetraction(), - cautious_update = true, stopping_criterion = StopWhenGradientNormLess(1.0e-6), + cautious_update = true, stopping_criterion = StopWhenGradientNormLess(1.0e-6) | 
StopAfterIteration(100), ) @test isapprox(M, x_inverseBFGSCautious, x_inverseBFGSHuang; atol = 2.0e-4) end @@ -243,9 +248,10 @@ end x = [0.7011245948687502, -0.1726003159556036, 0.38798265967671103, -0.5728026616491424] x_lrbfgs = quasi_Newton( - M, F, grad_f, x; memory_size = -1, + M, F, grad_f, x; basis = get_basis(M, x, DefaultOrthonormalBasis()), - stopping_criterion = StopWhenGradientNormLess(1.0e-9), + memory_size = -1, + stopping_criterion = StopWhenGradientNormLess(1.0e-9) | StopAfterIteration(1000), ) @test norm(abs.(x_lrbfgs) - x_solution) ≈ 0 atol = rayleigh_atol end @@ -346,13 +352,13 @@ end mp = DefaultManoptProblem(M, gmp) qns = QuasiNewtonState(M; p = p) # push zeros to memory - push!(qns.direction_update.memory_s, copy(p)) - push!(qns.direction_update.memory_s, copy(p)) - push!(qns.direction_update.memory_y, copy(p)) - push!(qns.direction_update.memory_y, copy(p)) + qns.yk = copy(p) + qns.sk = copy(p) + update_hessian!(qns.direction_update, mp, qns, p, 1) + update_hessian!(qns.direction_update, mp, qns, p, 2) + @test contains(qns.direction_update.message, "i=2,1,1") qns.direction_update(mp, qns) # Update (1) says at i=1 inner products are zero (2) all are zero -> gradient proposal - @test contains(qns.direction_update.message, "i=1,2") @test contains(qns.direction_update.message, "gradient") end @@ -430,5 +436,82 @@ end # This triggers and cautious update that does not update the Hessian Manopt.update_hessian!(qns.direction_update, mp, qns, p, 1) # But I am not totally sure what to test for afterwards + + @test startswith(repr(qdu), "QuasiNewtonLimitedMemoryDirectionUpdate with memory size") + end + @testset "Removing zero rho vectors" begin + M = Euclidean(2) + p = [0.0, 1.0] + f(M, p) = sum(p .^ 2) + # A wrong gradient + grad_f(M, p) = -2 .* p + gmp = ManifoldGradientObjective(f, grad_f) + mp = DefaultManoptProblem(M, gmp) + qdu = QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 3) + # push three pairs; middle one has zero inner 
product + push!(qdu.memory_y, [1, 0]) + push!(qdu.memory_s, [1, 0]) + + push!(qdu.memory_y, [1, 0]) + push!(qdu.memory_s, [0, 1]) + + push!(qdu.memory_y, [0, 2]) + push!(qdu.memory_s, [0, 2]) + qdu.ρ = [1.0, 0.0, 4.0] + # delete the zero inner product pair and check that the removal was correct + Manopt._drop_zero_rho_vectors!(qdu) + @test length(qdu.memory_y) == 2 + @test length(qdu.memory_s) == 2 + @test qdu.ρ[[1, 2]] == [1.0, 4.0] + @test qdu.memory_y[1] == [1, 0] + @test qdu.memory_y[2] == [0, 2] + end + + @testset "reforming_required + (start == 2)" begin + M = Euclidean(2) + p = [0.0, 0.0] + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 * sum(p) + gmp = ManifoldGradientObjective(f, grad_f) + mp = DefaultManoptProblem(M, gmp) + ha = QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 2; nonpositive_curvature_behavior = :byrd) + qns = QuasiNewtonState(M; p = p, nonpositive_curvature_behavior = :byrd, direction_update = ha) + + qns.yk = [1.0, 1.0] + qns.sk = [1.0, 2.0] + update_hessian!(qns.direction_update, mp, qns, p, 1) + + qns.yk = [2.0, 1.0] + qns.sk = [1.0, 2.0] + update_hessian!(qns.direction_update, mp, qns, p, 2) + ha.memory_s[2] = [0.0, 0.0] # force reforming_required in next step + update_hessian!(qns.direction_update, mp, qns, p, 3) + # test that the zeroes out pair was replaced + @test qns.direction_update.memory_s[1] == [1.0, 2.0] + @test qns.direction_update.memory_s[2] == [1.0, 2.0] + end + @testset "get_cost specialization" begin + M = Euclidean(2) + p = [0.0, 1.0] + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 .* p + gmp = ManifoldGradientObjective(f, grad_f) + mp = DefaultManoptProblem(M, gmp) + ha = QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 2; nonpositive_curvature_behavior = :byrd) + qns = QuasiNewtonState( + M; + p = copy(M, p), + direction_update = ha, + nondescent_direction_behavior = :step_towards_negative_gradient, + stepsize = HagerZhangLinesearch()(M), + ) + @test get_cost(mp, qns) == f(M, get_iterate(qns)) + 
solve!(mp, qns) + @test get_cost(mp, qns) == f(M, get_iterate(qns)) + + @testset "get_cost with DebugSolverState" begin + dqns = DebugSolverState(qns, DebugMessages(:Info, :Always)) + @test get_cost(mp, dqns) == f(M, get_iterate(dqns)) + end end end diff --git a/test/solvers/test_quasi_Newton_box.jl b/test/solvers/test_quasi_Newton_box.jl new file mode 100644 index 0000000000..1430a105bb --- /dev/null +++ b/test/solvers/test_quasi_Newton_box.jl @@ -0,0 +1,355 @@ +using Manopt, Manifolds, Test +using LinearAlgebra: I, eigvecs, tr, Diagonal, dot + +using RecursiveArrayTools + +@testset "Riemannian quasi-Newton Methods with box-like domains" begin + @testset "get_stepsize_bound - basic" begin + M = Hyperrectangle([0.0, 0.0], [2.0, 2.0]) + + # d[i] > 0 + p = [0.0, 1.0]; d = [1.0, 1.0] + @test Manopt.get_stepsize_bound(M, p, d, 1) ≈ (2.0 - 0.0) / 1.0 # = 2.0 + @test Manopt.get_stepsize_bound(M, p, d, 2) ≈ (2.0 - 1.0) / 1.0 # = 1.0 + + # d[i] < 0 + p = [0.0, 1.0]; d = [-1.0, -1.0] + @test Manopt.get_stepsize_bound(M, p, d, 1) ≈ (0.0 - 0.0) / -1.0 # = 0.0 + @test Manopt.get_stepsize_bound(M, p, d, 2) ≈ (0.0 - 1.0) / -1.0 # = 1.0 + + # d[i] = 0 + p = [0.0, 1.0]; d = [0.0, 0.0] + @test Manopt.get_stepsize_bound(M, p, d, 1) ≈ Inf + @test Manopt.get_stepsize_bound(M, p, d, 2) ≈ Inf + end + + + @testset "update_fp_fpp - basic d = -g" begin + M = Hyperrectangle([0.0, 1.0], [3.0, 3.0]) + + grad = [1.0, 4.0] + d = [-1.0, -4.0] + p = [0.0, 0.0] + + # values taken from loop iteration found in test case: "find_gcp! 
- with bounds, single variable is held fixed" + old_f_prime = -17.0 + old_f_double_prime = 34.0 + dt = 0.25 + gb = 4.0 + db = -4.0 # in case of d = -g, db = -gb + ha = QuasiNewtonMatrixDirectionUpdate(M, BFGS(), DefaultOrthonormalBasis(), [2.0 0.0; 0.0 2.0]) + b = 2 + z = [-0.25, -1.0] + + # optimized formula + upd = Manopt.GenericSegmentHessianUpdater(similar(d), similar(d)) + Manopt.init_updater!(M, upd, p, d, ha) + hv_eb_dz, hv_eb_d = upd(M, p, 0 + dt, dt, b, db, ha) + @test hv_eb_dz ≈ -2.0 + @test hv_eb_d ≈ -8.0 + + # original formula + + original_hv_eb_dz = dot([0, 1], ha.matrix, z) + original_hv_eb_d = dot([0, 1], ha.matrix, d) + + @test hv_eb_dz == original_hv_eb_dz + @test hv_eb_d == original_hv_eb_d + end + + + @testset "update_fp_fpp - basic d = [-2.0, -1.0]" begin + M = Hyperrectangle([0.0, 1.0], [3.0, 3.0]) + + grad = [1.0, 4.0] + d = [-2.0, -1.0] + p = [0.0, 0.0] + + old_f_prime = -6.0 + old_f_double_prime = 10.0 + dt = 0.25 + gb = 1.0 + db = -2.0 + ha = QuasiNewtonMatrixDirectionUpdate(M, BFGS(), DefaultOrthonormalBasis(), [2.0 0.0; 0.0 2.0]) + b = 1 + z = [-0.5, -0.25] + + # optimized formula + upd = Manopt.GenericSegmentHessianUpdater(similar(d), similar(d)) + Manopt.init_updater!(M, upd, p, d, ha) + hv_eb_dz, hv_eb_d = upd(M, p, 0 + dt, dt, b, db, ha) + @test hv_eb_dz == -1.0 + @test hv_eb_d == -4.0 + + # original formula + + original_hv_eb_dz = dot([1, 0], ha.matrix, z) + original_hv_eb_d = dot([1, 0], ha.matrix, d) + + @test hv_eb_dz == original_hv_eb_dz + @test hv_eb_d == original_hv_eb_d + end + + @testset "update_fp_fpp - basic d = [-2.0, -1.0] with limited memory update" begin + M = Hyperrectangle([1.0, 4.0], [2.0, 10.0]) + + p = [2.0, 5.0] + ha = QuasiNewtonLimitedMemoryBoxDirectionUpdate(QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 2)) + st = QuasiNewtonState(M) + + @test startswith(repr(ha), "QuasiNewtonLimitedMemoryBoxDirectionUpdate with internal state:") + @test startswith(Manopt.status_summary(ha), "limited memory 
direction update with support for box constraints; internal direction update status: ") + + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 * p + gmp = ManifoldGradientObjective(f, grad_f) + mp = DefaultManoptProblem(M, gmp) + + st.yk = [2.0, 4.0] + st.sk = [4.0, 2.0] + update_hessian!(ha, mp, st, p, 1) + grad = grad_f(M, p) + st.p = p + st.X = grad + + d = similar(grad) + ha(d, mp, st) + + d2 = ha(mp, st) + @test d ≈ d2 + + b = 1 + + old_f_prime = -6.0 + old_f_double_prime = 10.0 + dt = 0.25 + db = d[b] + gb = grad[b] + + t_current = 0 + dt + + # compare the generic and limited memory updater + gupd = Manopt.GenericSegmentHessianUpdater(similar(d), similar(d)) + Manopt.init_updater!(M, gupd, p, d, ha) + hv_eb_dz, hv_eb_d = gupd(M, p, t_current, dt, b, db, ha) + + @test hv_eb_dz ≈ -0.125 + @test hv_eb_d ≈ -0.5 + + lmupd = Manopt.get_default_hessian_segment_updater(M, p, ha) + @test lmupd isa Manopt.LimitedMemorySegmentHessianUpdater + + Manopt.init_updater!(M, lmupd, p, d, ha) + hv_eb_dz_limited, hv_eb_d_limited = lmupd(M, p, t_current, dt, b, db, ha) + + @test hv_eb_dz ≈ hv_eb_dz_limited + @test hv_eb_d ≈ hv_eb_d_limited + + ha.last_gcd_result = :found_unlimited + ha.last_gcd_stepsize = Inf + @test Manopt.get_parameter(ha, Val(:max_stepsize)) == Inf + + @testset "No memory tests" begin + ha2 = QuasiNewtonLimitedMemoryBoxDirectionUpdate(QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 2)) + idx = Manopt.get_bounds_index(M) + @test Manopt.hessian_value(ha2, M, p, Manopt.UnitVector(b), grad) ≈ 4.0 + Manopt.update_current_scale!(M, p, ha2) + @test ha2.current_scale == ha2.qn_du.initial_scale + @test ha2.M_11 == fill(0.0, 0, 0) + @test ha2.M_21 == fill(0.0, 0, 0) + @test ha2.M_22 == fill(0.0, 0, 0) + end + end + + @testset "GeneralizedCauchyDirectionSubsolver" begin + M = Hyperrectangle([-1.0, -2.0, -Inf], [2.0, Inf, 2.0]) + ha = QuasiNewtonMatrixDirectionUpdate(M, BFGS()) + + p = [0.0, 0.0, 0.0] + gf = Manopt.GeneralizedCauchyDirectionSubsolver(M, p, ha) + + X1 = 
[-5.0, 0.0, 0.0] + + d = -X1 + d_out = similar(d) + + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p, d, X1) === (:found_limited, 1.0) + @test d_out ≈ [2.0, 0.0, 0.0] + + d_out = similar(d) + + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p, 0 * d, X1) === (:not_found, NaN) + + d2 = [0.0, 1.0, 0.0] + + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p, d2, [0.0, -1.0, 0.0]) === (:found_unlimited, Inf) + @test d_out ≈ d2 + + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p, [1.0, 1.0, 0.0], [-10.0, -10.0, -10.0]) === (:found_limited, 1.0) + @test d_out ≈ [2.0, 10.0, 0.0] + + p2 = [-1.0, -2.0, 2.0] + gf2 = Manopt.GeneralizedCauchyDirectionSubsolver(M, p2, ha) + + @test Manopt.find_generalized_cauchy_direction!(M, gf2, d_out, p2, [-1.0, -1.0, 1.0], [-10.0, -10.0, -10.0]) === (:not_found, NaN) + + M2 = Hyperrectangle([-10.0], [10.0]) + + ha2 = QuasiNewtonMatrixDirectionUpdate(M2, BFGS(), DefaultOrthonormalBasis(), [100.0;;]) + p3 = [1.0] + gf3 = Manopt.GeneralizedCauchyDirectionSubsolver(M2, p3, ha2) + + d_out = similar(p3) + @test Manopt.find_generalized_cauchy_direction!(M2, gf3, d_out, p3, [1.0], [-10.0]) === (:found_limited, 90.0) + end + + @testset "Hitting multiple bounds at the same time in GCD" begin + M = Hyperrectangle([-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]) + ha = QuasiNewtonMatrixDirectionUpdate(M, BFGS(), DefaultOrthonormalBasis(), [1.0 0 0; 0 1 0; 0 0 1]) + + p = [0.0, 0.0, 0.0] + gf = Manopt.GeneralizedCauchyDirectionSubsolver(M, p, ha) + + d = [-2.0, -2.0, -1.0] + d_out = similar(d) + X = [10.0, 10.0, 10.0] + + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p, d, X) === (:found_limited, 1.0) + @test d_out ≈ [-1.0, -1.0, -1.0] + end + + @testset "Pure Hyperrectangle" begin + M = Hyperrectangle([-1.0, 2.0, -Inf], [2.0, Inf, 2.0]) + f(M, p) = sum(p .^ 2) + function grad_f(M, p) + return project(M, p, 2 .* p) + end + p0 = [0.0, 4.0, 1.0] + p_opt = quasi_Newton(M, f, grad_f, p0; 
stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(10)) + @test p_opt ≈ [0, 2, 0] + + + f2(M, p) = sum(p .^ 4) + function grad_f2(M, p) + return project(M, p, 4 .* (p .^ 3)) + end + p0 = [0.0, 4.0, 1.0] + p_opt = quasi_Newton(M, f2, grad_f2, p0; stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100)) + @test f2(M, p_opt) < 16.1 + + for stepsize in [ArmijoLinesearch(), CubicBracketingLinesearch(), NonmonotoneLinesearch()] + p_opt = quasi_Newton( + M, f2, grad_f2, p0; + stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100), + stepsize = stepsize + ) + @test f2(M, p_opt) < 64.0 + end + + MInf = Hyperrectangle([-Inf, -Inf, -Inf], [Inf, Inf, Inf]) + + f3(M, p) = sum(p .^ 4) - sum(p .^ 2) + function grad_f3(M, p) + return project(MInf, p, 4 .* (p .^ 3) - 2 .* p) + end + p0 = [0.0, 4.0, 1.0] + p_opt = quasi_Newton(MInf, f3, grad_f3, p0; stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100)) + @test f3(MInf, p_opt) < 16.1 + + p_opt = quasi_Newton( + MInf, f3, grad_f3, p0; + stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100), + ) + @test f3(MInf, p_opt) < 64.0 + end + + @testset "has_anisotropic_max_stepsize" begin + @test !Manopt.has_anisotropic_max_stepsize(Sphere(2)) + @test Manopt.has_anisotropic_max_stepsize(Hyperrectangle([1], [2])) + @test Manopt.has_anisotropic_max_stepsize(ProductManifold(Hyperrectangle([1], [2]), Sphere(2))) + end + + @testset "Hyperrectangle × Sphere" begin + S2 = Sphere(2) + px = [0.0, 1.0, 0.0] + Mbox = Hyperrectangle([-1.0, 2.0, -Inf], [2.0, Inf, 2.0]) + M = Mbox × S2 + f(M, p) = sum(p.x[1] .^ 4) + 0.5 * distance(S2, p.x[2], px)^2 + grad_f(M, p) = ArrayPartition(project(Mbox, p.x[1], 4 .* (p.x[1] .^ 3)), -log(S2, p.x[2], px)) + p0 = ArrayPartition([0.0, 4.0, 1.0], [1.0, 0.0, 0.0]) + + @testset "Hessian updater" begin + d = 
-grad_f(M, p0) + ha = QuasiNewtonMatrixDirectionUpdate(M, BFGS(), DefaultOrthonormalBasis()) + gupd = Manopt.GenericSegmentHessianUpdater(similar(d), similar(d)) + Manopt.init_updater!(M, gupd, p0, d, ha) + b = (1, 2) + dt = 0.25 + t_current = 0 + dt + db = d.x[b[1]][b[2]] + hv_eb_dz, hv_eb_d = gupd(M, p0, t_current, dt, b, db, ha) + @test hv_eb_dz ≈ -64.0 + @test hv_eb_d ≈ -256.0 + end + + @testset "GCD check" begin + d = -grad_f(M, p0) + ha = QuasiNewtonLimitedMemoryBoxDirectionUpdate(QuasiNewtonLimitedMemoryDirectionUpdate(M, p0, InverseBFGS(), 2)) + gf = Manopt.GeneralizedCauchyDirectionSubsolver(M, p0, ha) + d_out = similar(d) + X = grad_f(M, p0) + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p0, d, X) === (:found_limited, 1.0) + end + + p_opt = quasi_Newton(M, f, grad_f, p0; stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100)) + @test distance(M, p_opt, ArrayPartition([0, 2, 0], px)) < 0.1 + end + + @testset "Sphere × Hyperrectangle" begin + S2 = Sphere(2) + px = [0.0, 1.0, 0.0] + Mbox = Hyperrectangle([-1.0 2.0; -Inf -Inf], [2.0 Inf; 2.0 Inf]) + M = S2 × Mbox + f(M, p) = sum(p.x[2] .^ 4) + 0.5 * distance(S2, p.x[1], px)^2 + grad_f(M, p) = ArrayPartition(-log(S2, p.x[1], px), project(Mbox, p.x[2], 4 .* (p.x[2] .^ 3))) + p0 = ArrayPartition([1.0, 0.0, 0.0], [0.0 4.0; 1.0 1.0]) + + p_opt = quasi_Newton(M, f, grad_f, p0; stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100)) + @test distance(M, p_opt, ArrayPartition(px, [0 2; 0 0])) < 0.1 + end +end + +@testset "MaxStepsizeInDirection" begin + @testset "found_limited" begin + M = Hyperrectangle([-1.0, -2.0, -Inf], [2.0, Inf, 2.0]) + p = [0.0, 0.0, 0.0] + d = [2.0, 1.0, 1.0] + d_before = copy(d) + + sdf = Manopt.MaxStepsizeInDirectionSubsolver(M, p) + @test Manopt.find_max_stepsize_in_direction(M, sdf, p, d) === (:found_limited, 1.0) + @test d == d_before + end + + @testset "found_unlimited" begin + M = 
Hyperrectangle([-Inf], [Inf])
+        p = [0.0]
+        d = [1.0]
+        d_before = copy(d)
+
+        sdf = Manopt.MaxStepsizeInDirectionSubsolver(M, p)
+        @test Manopt.find_max_stepsize_in_direction(M, sdf, p, d) === (:found_unlimited, Inf)
+        @test d == d_before
+    end
+
+    @testset "not_found" begin
+        M = Hyperrectangle([0.0], [1.0])
+        p = [0.0]
+        d = [-1.0]
+        d_before = copy(d)
+
+        sdf = Manopt.MaxStepsizeInDirectionSubsolver(M, p)
+        @test Manopt.find_max_stepsize_in_direction(M, sdf, p, d) === (:not_found, NaN)
+        @test d == d_before
+    end
+end
diff --git a/tutorials/BoxDomain.qmd b/tutorials/BoxDomain.qmd
new file mode 100644
index 0000000000..eb2d2da574
--- /dev/null
+++ b/tutorials/BoxDomain.qmd
@@ -0,0 +1,78 @@
+---
+title: "How to do optimization on box domains"
+author: "Mateusz Baran"
+---
+
+# Optimization with box domains and products of manifolds and boxes
+
+A ``[`Hyperrectangle`](@extref `Manifolds.Hyperrectangle`)``{=commonmark} is, in general, not a manifold but a manifold with corners, because locally at the boundary it looks like $\mathbb{R}^{n-k} \times \mathbb{R}^k_{\geq 0}$ for some $k > 0$, instead of $\mathbb{R}^n$ as required by the definition of a manifold.
+
+Such spaces require special handling when used as domains in optimization.
+For simple methods like gradient descent, using the projected gradient together with a stopping criterion involving [`StopWhenProjectedNegativeGradientNormLess`](@ref) may be sufficient; methods that approximate the Hessian, however, can benefit from a more advanced approach.
+The core idea is to consider a piecewise quadratic approximation of the objective along the descent direction in the tangent space at the current iterate, and to select its minimizer, the generalized Cauchy direction.
+The points at which the approximation might not be differentiable correspond to hitting new boundaries along the initially selected descent direction.
+Then, we can perform a standard line search from the initial iterate in the generalized Cauchy direction.
+
+Currently `Manopt.jl` can handle domains that are either a ``[`Hyperrectangle`](@extref `Manifolds.Hyperrectangle`)``{=commonmark} or a ``[`ProductManifold`](@extref `ManifoldsBase.ProductManifold`)``{=commonmark} with a ``[`Hyperrectangle`](@extref `Manifolds.Hyperrectangle`)``{=commonmark} as its first factor and other manifolds as subsequent factors.
+
+## Example
+
+Consider the problem of fitting a covariance matrix with box constraints on the variances in the principal directions.
+The objective is the negative log-likelihood of the data under a multivariate normal distribution with zero mean and a covariance matrix given by the optimization variable.
+Although there are better ways to solve this particular problem, expressing it this way allows us to freely extend the objective to more complex scenarios beyond what is possible with closed-form solutions.
+
+First, we set up the problem by generating synthetic data.
+The data is sampled from a multivariate normal distribution with known covariance matrix.
+
+```{julia}
+using Manopt, Manifolds, LinearAlgebra, Random, Distributions
+using ForwardDiff, DifferentiationInterface, RecursiveArrayTools
+Random.seed!(41)
+
+N = 5 # dimensionality of data
+M_spd = SymmetricPositiveDefinite(N)
+M_rot = Rotations(N)
+V = rand(M_rot)
+cov_matrix = Symmetric(V * Diagonal([0.5; 2.0; 5.0; 10.0; 20.0]) * V')
+distr = MvNormal(zeros(N), cov_matrix)
+data = Matrix(rand(distr, 200)') # 200 samples
+```
+
+The objective function is defined as follows, with the gradient computed using automatic differentiation.
+
+```{julia}
+function logprob_cost(::AbstractManifold, p)
+    D, R = p.x
+    log_det = sum(log, D)
+    invΣ = R * Diagonal(1 ./ D) * R'
+    ll = -0.5 * size(data, 1) * log_det
+    for row in eachrow(data)
+        ll -= 0.5 * row' * invΣ * row
+    end
+    return -ll # we minimize the negative log-likelihood
+end
+
+function logprob_gradient(M::AbstractManifold, p)
+    Y = DifferentiationInterface.gradient(q -> logprob_cost(M, q), AutoForwardDiff(), p)
+    return riemannian_gradient(M, p, Y)
+end
+```
+
+Finally, we can solve the optimization problem using a quasi-Newton method with box domain support.
+We restrict the variances (the diagonal elements of the covariance matrix) to lie between 1.0 and 100.0.
+The covariance matrix is represented by its eigendecomposition $\Sigma = R D R^{\top}$, where $D$ is a diagonal matrix of variances and $R$ is an orthogonal matrix of principal directions.
+With these constraints on the variances, the optimization variable belongs to $[1,100]^N \times \mathrm{SO}(N)$.
+
+```{julia}
+M = ProductManifold(Hyperrectangle(fill(1.0, N), fill(100.0, N)), M_rot)
+
+p0 = ArrayPartition(fill(10.0, N), Matrix{Float64}(I(N)))
+p_mle = quasi_Newton(M, logprob_cost, logprob_gradient, p0; stopping_criterion = StopAfterIteration(100) | StopWhenProjectedNegativeGradientNormLess(1e-6))
+println("Estimated variances: $(p_mle.x[1])")
+cov_matrix_mle = p_mle.x[2] * Diagonal(p_mle.x[1]) * p_mle.x[2]'
+println("Estimated covariance matrix:")
+println(cov_matrix_mle)
+nothing
+```
+
+We see that despite the original covariance matrix having variances ranging from 0.5 to 20.0, the estimated covariance matrix respects the box constraint keeping the variances between 1.0 and 100.0.
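The stopping quantity used above can be illustrated with a small sketch. This follows the common L-BFGS-B-style definition (project a negative-gradient step back into the box and measure the resulting displacement); it is an assumption about what `StopWhenProjectedNegativeGradientNormLess` measures in the `Hyperrectangle` case, not Manopt's exact formula:

```julia
using LinearAlgebra

# One common convergence measure for box-constrained problems: take a
# negative-gradient step, project it back into [lb, ub], and measure how far
# the point actually moved. At a constrained stationary point this is zero
# even when the raw gradient norm is not.
function projected_neg_grad_norm(lb, ub, p, g)
    return norm(clamp.(p .- g, lb, ub) .- p)
end

# Sitting on the lower bound with a gradient pushing further down,
# the projected step goes nowhere:
projected_neg_grad_norm([1.0], [100.0], [1.0], [3.0])
# returns 0.0
```

This is why the tutorial stops on this criterion rather than on the plain gradient norm: at an optimum on the boundary of the box, the Riemannian gradient itself need not vanish.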
diff --git a/tutorials/Project.toml b/tutorials/Project.toml
index db96359a8d..a2b2a31981 100644
--- a/tutorials/Project.toml
+++ b/tutorials/Project.toml
@@ -2,8 +2,10 @@
 ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
 BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
 Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
+DifferentiationInterface = "a0c0ee7d-e4b9-4e03-894e-1c5f64a51d63"
 Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
 FiniteDifferences = "26cc04aa-876d-5657-8c51-4c34ba976000"
+ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
 LRUCache = "8ac3fa9e-de4c-5943-b1dc-09c6b5f20637"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 ManifoldDiff = "af67fdf4-a580-4b9f-bbec-742ef357defd"
@@ -21,8 +23,10 @@ Manopt = {path = ".."}
 ADTypes = "1"
 BenchmarkTools = "1"
 Colors = "0.12, 0.13"
+DifferentiationInterface = "0.7"
 Distributions = "0.25"
 FiniteDifferences = "0.12"
+ForwardDiff = "1"
 LRUCache = "1.4"
 ManifoldDiff = "0.4"
 Manifolds = "0.11"