diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index ed530cec3f..bb5f3ac1e3 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -18,7 +18,6 @@ jobs: - uses: julia-actions/setup-julia@v2 with: version: ${{ matrix.julia-version }} - arch: x64 - uses: julia-actions/julia-buildpkg@v1 - uses: julia-actions/julia-runtest@v1 - uses: julia-actions/julia-processcoverage@v1 diff --git a/.github/workflows/nightly.yml b/.github/workflows/nightly.yml index d1e08c3c23..a435557be3 100644 --- a/.github/workflows/nightly.yml +++ b/.github/workflows/nightly.yml @@ -15,7 +15,6 @@ jobs: - uses: julia-actions/setup-julia@v2 with: version: ${{ matrix.julia-version }} - arch: x64 - uses: julia-actions/julia-buildpkg@v1 - uses: julia-actions/julia-runtest@v1 env: diff --git a/Changelog.md b/Changelog.md index 9abbf73caa..d78dd80005 100644 --- a/Changelog.md +++ b/Changelog.md @@ -10,6 +10,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 This is a breaking change since the JuMP extension is dropped. +### Added + +* `nonpositive_curvature_behavior` for `QuasiNewtonLimitedMemoryDirectionUpdate` that determines how (y, s) vector pairs are treated after transport; if their inner product becomes too small, it may lead to non-positive-definite Hessian approximations, which need to be avoided. This resolves issue #549. (#554) +* `GeneralizedCauchyDirectionSubsolver` for handling direction selection in the presence of box (`Hyperrectangle`) constraints in quasi-Newton methods. This allows for L-BFGS-B-style box constraint handling. (#554) +* New stopping criteria: `StopWhenRelativeAPosterioriCostChangeLessOrEqual` and `StopWhenProjectedNegativeGradientNormLess`. (#554) +* `HagerZhangLinesearch` stepsize, a state-of-the-art line search for smooth objectives with cubic interpolation and adaptive Wolfe condition checking. (#554) +* Stepsizes can now be initialized using `initialize_stepsize!`, similar to solvers. 
(#554) + + ### Changed * `NonlinearLeastSquaresObjective` is now called `ManifoldNonlinearLeastSquaresObjective` (#569). @@ -26,7 +35,8 @@ discontinue the `JuMP` extension. (#532) * Fixed some text descriptions of a few stopping criteria. * unify naming of fields, `debugDictionary` of the debug state is now called `debug_dictionary` * the `NesterovRule` now also stores an actual `AbstractRetractionMethod` instead of implicitly always using the default one. - +* Line searches now consistently respect the `stop_when_stepsize_exceeds` keyword argument as a hard limit. (#554) +* `StopWhenChangeLess` falsely claimed to indicate convergence. This is now fixed. (#554) ## [0.5.34] March 3, 2026 @@ -86,7 +96,7 @@ Moved the documentation glossaries to using the new [Glossaries.jl](https://gith * Removed `atol` from `DebugFeasibility` and instead use the one newly added `atol` from the `ConstrainedManifoldObjective`. (#546) * Move from CompatHelper to dependabot to keep track of dependency updates in Julia packages. (#547) * moved the `ManoptTestSuite` module to a sub module `Manopt.Test` within `Manopt.jl`, -so it can be easier reused by others as well (#550) + so it can be more easily reused by others as well (#550) * moved to using a `Project.toml` for tests and an overall `[Workspace]`. This also allows finally to run single test files without installing all packages manually, but instead just switching to and instantiating the test environment. (#550) * for compatibility, state also `[source]` entries consistently in the sub `Project.toml` files. 
(#550) diff --git a/docs/make.jl b/docs/make.jl index 6ee570e723..07e7f20dfd 100755 --- a/docs/make.jl +++ b/docs/make.jl @@ -45,6 +45,7 @@ tutorials_menu = "Implement a solver" => "tutorials/ImplementASolver.md", "Optimize on your own manifold" => "tutorials/ImplementOwnManifold.md", "Do constrained optimization" => "tutorials/ConstrainedOptimization.md", + "Do optimization with bounds" => "tutorials/BoxDomain.md", ] # Check whether all tutorials are rendered, issue a warning if not (and quarto if not set) all_tutorials_exist = true @@ -188,6 +189,7 @@ makedocs(; "Douglas—Rachford" => "solvers/DouglasRachford.md", "Exact Penalty Method" => "solvers/exact_penalty_method.md", "Frank-Wolfe" => "solvers/FrankWolfe.md", + "Generalized Cauchy direction subsolver" => "solvers/generalized_cauchy_direction_subsolver.md", "Gradient Descent" => "solvers/gradient_descent.md", "Interior Point Newton" => "solvers/interior_point_Newton.md", "Levenberg–Marquardt" => "solvers/LevenbergMarquardt.md", diff --git a/docs/src/extensions.md b/docs/src/extensions.md index fe4b32e667..4a076cea73 100644 --- a/docs/src/extensions.md +++ b/docs/src/extensions.md @@ -45,10 +45,12 @@ x_opt = quasi_Newton( ) ``` -In general this defines the following new [stepsize](@ref Stepsize) +In general this defines the following new [stepsize](@ref Stepsize) with helper functions for setting and getting the maximum step size: ```@docs Manopt.LineSearchesStepsize +Manopt.linesearches_get_max_alpha +Manopt.linesearches_set_max_alpha ``` ## Manifolds.jl diff --git a/docs/src/references.bib b/docs/src/references.bib index e93d4b007f..b3fb0e5573 100644 --- a/docs/src/references.bib +++ b/docs/src/references.bib @@ -260,6 +260,33 @@ @book{Boumal:2023 ISBN = {978-1-00-916616-4} } +@article{ByrdNocedalSchnabel:1994, + title = {Representations of quasi-{Newton} matrices and their use in limited memory methods}, + volume = {63}, + issn = {1436-4646}, + doi = {10.1007/BF01582063}, + number = {1}, + journal = 
{Mathematical Programming}, + author = {Byrd, Richard H. and Nocedal, Jorge and Schnabel, Robert B.}, + month = jan, + year = {1994}, + pages = {129--156}, +} + +@article{ByrdLuNocedalZhu:1995, + title = {A {Limited} {Memory} {Algorithm} for {Bound} {Constrained} {Optimization}}, + volume = {16}, + issn = {1064-8275}, + doi = {10.1137/0916069}, + number = {5}, + journal = {SIAM Journal on Scientific Computing}, + author = {Byrd, Richard H. and Lu, Peihuang and Nocedal, Jorge and Zhu, Ciyou}, + month = sep, + year = {1995}, + note = {Publisher: Society for Industrial and Applied Mathematics}, + pages = {1190--1208}, +} + % --- C % % @@ -882,4 +909,17 @@ @article{ZhangSra:2018 TITLE = {Towards Riemannian accelerated gradient methods}, URL = {https://arxiv.org/abs/1806.02812}, YEAR = {2018}, -} \ No newline at end of file +} + +@article{ZhuByrdLuNocedal:1997, + title = {Algorithm 778: {L}-{BFGS}-{B}: {Fortran} subroutines for large-scale bound-constrained optimization}, + volume = {23}, + issn = {0098-3500}, + doi = {10.1145/279232.279236}, + number = {4}, + journal = {ACM Trans. Math. Softw.}, + author = {Zhu, Ciyou and Byrd, Richard H. and Lu, Peihuang and Nocedal, Jorge}, + month = dec, + year = {1997}, + pages = {550--560}, +} diff --git a/docs/src/solvers/generalized_cauchy_direction_subsolver.md b/docs/src/solvers/generalized_cauchy_direction_subsolver.md new file mode 100644 index 0000000000..cedee9c719 --- /dev/null +++ b/docs/src/solvers/generalized_cauchy_direction_subsolver.md @@ -0,0 +1,73 @@ +# Generalized Cauchy direction subsolver + +The generalized Cauchy direction (GCD) subsolver is a component in optimization algorithms that handle problems with bound constraints. 
It solves the following problem + +```math +\begin{align*} +\operatorname*{arg\,min}_{Y ∈ T_p (D \times \mathcal{M})}&\ m_p(Y), \qquad m_p(Y) = ⟨X_g, Y⟩_p + \frac{1}{2} ⟨\mathcal{H}_p[Y], Y⟩_p\\ +\text{such that}& \ \exp_p(Y) = \exp_p(\alpha X) \in D \times \mathcal{M} \text{ for some } \alpha \in [0, A] +\end{align*} +``` + +where $X=(X_{\mathrm{D}}, X_{\mathcal{M}})$ is a given direction, the exponential map handles projection of the tangent vector when reaching the boundary, $D$ is a box domain ([`Hyperrectangle`](@extref Manifolds.Hyperrectangle)), $\mathcal{M}$ is a Riemannian manifold, $X_g$ is the gradient of a scalar function $f$ at the point $p=(p_{\mathrm{D}}, p_{\mathcal{M}})$, $A$ is the maximum allowed step size on $\mathcal{M}$ at $p$ in direction $X_{\mathcal{M}}$ (infinity is supported), and $\mathcal{H}_p$ is a linear operator that approximates the Hessian of $f$ at $p$. + +Additionally, the subsolver indicates whether the selected direction $Y$ reaches the boundary of $D$ at some point, in which case the subsequent step size selection in direction $Y$ needs to be limited to the interval $[0, s_{\max}]$, where $1 ≤ s_{\max} ≤ ∞$ is also returned by the subsolver. +Note that the value $s_{\max}=1$ is obtained when the minimum lies at the boundary of $D$, while larger values indicate that the boundary is further away along the selected direction $X$. + +The subsolver is currently intended primarily for internal use by optimization algorithms that require bound-constrained subproblem solutions. + +## Simple stepsize limiting + +In case no Hessian approximation is available, a simple stepsize limiting procedure can be used to limit the stepsize in direction $X$ to the maximum allowed by the boundary of $D$ and the maximum allowed stepsize on $\mathcal{M}$. 
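The box part of this limiting reduces to a per-coordinate minimum. A minimal standalone sketch in plain Julia (`max_step_in_box` is an illustrative helper, not part of the Manopt API):

```julia
# Largest t ≥ 0 such that p + t*d stays inside the box [lb, ub], coordinate-wise.
function max_step_in_box(lb, ub, p, d)
    t = Inf
    for i in eachindex(p, d)
        if d[i] > 0
            t = min(t, (ub[i] - p[i]) / d[i])
        elseif d[i] < 0
            t = min(t, (lb[i] - p[i]) / d[i])
        end # d[i] == 0 imposes no restriction for this coordinate
    end
    return t
end

max_step_in_box([0.0, 0.0], [1.0, 2.0], [0.5, 0.5], [1.0, -1.0])  # 0.5
```

On a product $D \times \mathcal{M}$, the overall limit is then the minimum of this value and the maximum allowed stepsize on $\mathcal{M}$.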
+This procedure is available using the following: + +```@docs +Manopt.MaxStepsizeInDirectionSubsolver +Manopt.find_max_stepsize_in_direction +``` + +## Internal types and methods + +### Symbols related to the GCD computation + +These symbols are directly used by solvers to compute the descent direction corresponding to the generalized Cauchy direction. + +```@docs +Manopt.has_anisotropic_max_stepsize +Manopt.find_generalized_cauchy_direction! +Manopt.GeneralizedCauchyDirectionSubsolver +``` + +### Symbols related to the Hessian approximation + +These symbols are used to evaluate the Hessian approximation at specific tangent vectors during the generalized Cauchy direction computation. + +```@docs +Manopt.hessian_value +Manopt.hessian_value_diag +``` + +### Symbols related to bound handling + +These are internal symbols used to manage and manipulate bound constraints during the GCD computation. + +```@docs +Manopt.init_updater! +Manopt.UnitVector +Manopt.to_coordinate_index +Manopt.AbstractSegmentHessianUpdater +Manopt.GenericSegmentHessianUpdater +Manopt.get_bounds_index +Manopt.get_stepsize_bound +Manopt.get_at_bound_index +Manopt.set_stepsize_bound! +Manopt.set_zero_at_index! +``` + +### Symbols related to specific Hessian approximations + +```@docs +Manopt.LimitedMemorySegmentHessianUpdater +Manopt.hessian_value_from_inner_products +Manopt.update_current_scale! +``` diff --git a/docs/src/solvers/quasi_Newton.md b/docs/src/solvers/quasi_Newton.md index 02c74365a5..2d608a34db 100644 --- a/docs/src/solvers/quasi_Newton.md +++ b/docs/src/solvers/quasi_Newton.md @@ -88,6 +88,7 @@ QuasiNewtonLimitedMemoryDirectionUpdate QuasiNewtonCautiousDirectionUpdate Manopt.initialize_update! 
QuasiNewtonPreconditioner +QuasiNewtonLimitedMemoryBoxDirectionUpdate ``` ## Hessian update rules diff --git a/ext/ManoptLineSearchesExt.jl b/ext/ManoptLineSearchesExt.jl index c58734e57c..5343e05289 100644 --- a/ext/ManoptLineSearchesExt.jl +++ b/ext/ManoptLineSearchesExt.jl @@ -5,6 +5,37 @@ import Manopt: LineSearchesStepsize using ManifoldsBase using LineSearches +Manopt.linesearches_get_max_alpha(ls::LineSearches.HagerZhang) = ls.alphamax +Manopt.linesearches_get_max_alpha(ls::LineSearches.MoreThuente) = ls.alphamax + +function Manopt.linesearches_set_max_alpha(ls::LineSearches.HagerZhang{T, Tm}, max_alpha::T) where {T, Tm} + return HagerZhang{T, Tm}( + delta = ls.delta, + sigma = ls.sigma, + alphamax = max_alpha, + rho = ls.rho, + epsilon = ls.epsilon, + gamma = ls.gamma, + linesearchmax = ls.linesearchmax, + psi3 = ls.psi3, + display = ls.display, + mayterminate = ls.mayterminate, + cache = ls.cache, + check_flatness = ls.check_flatness, + ) +end +function Manopt.linesearches_set_max_alpha(ls::LineSearches.MoreThuente{T}, max_alpha::T) where {T} + return MoreThuente{T}( + f_tol = ls.f_tol, + gtol = ls.gtol, + x_tol = ls.x_tol, + alphamin = ls.alphamin, + alphamax = max_alpha, + maxfev = ls.maxfev, + cache = ls.cache + ) +end + function (cs::Manopt.LineSearchesStepsize)( mp::AbstractManoptProblem, s::AbstractManoptSolverState, @@ -24,6 +55,19 @@ function (cs::Manopt.LineSearchesStepsize)( # guess initial alpha α0 = cs.initial_guess(mp, s, k, cs.last_stepsize, η; lf0 = fp, Dlf0 = dphi_0) + # handle stepsize limit + local ls # COV_EXCL_LINE + if :stop_when_stepsize_exceeds in keys(kwargs) + new_max_alpha = min( + kwargs[:stop_when_stepsize_exceeds], + Manopt.linesearches_get_max_alpha(cs.linesearch), + ) + ls = Manopt.linesearches_set_max_alpha(cs.linesearch, new_max_alpha) + α0 = min(α0, new_max_alpha) + else + ls = cs.linesearch + end + # perform actual line-search function ϕ(α) @@ -41,7 +85,7 @@ function (cs::Manopt.LineSearchesStepsize)( return 
Manopt.get_cost_and_differential(mp, p_tmp, Y_tmp; Y = X_tmp) end - α, fp = cs.linesearch(ϕ, dϕ, ϕdϕ, α0, fp, dphi_0) + α, fp = ls(ϕ, dϕ, ϕdϕ, α0, fp, dphi_0) cs.last_stepsize = α return α end diff --git a/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl b/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl index b1e027d713..a502802751 100644 --- a/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl +++ b/ext/ManoptManifoldsExt/ManoptManifoldsExt.jl @@ -2,7 +2,7 @@ module ManoptManifoldsExt using ManifoldsBase: exp, log, ParallelTransport, vector_transport_to using Manopt -using Manopt: _math, _tex, ManifoldDefaultsFactory, _produce_type +using Manopt: _math, _tex, ManifoldDefaultsFactory, _produce_type, get_stepsize_bound import Manopt: max_stepsize, get_gradient, diff --git a/ext/ManoptManifoldsExt/manifold_functions.jl b/ext/ManoptManifoldsExt/manifold_functions.jl index d063cbdf5b..44b5b2b317 100644 --- a/ext/ManoptManifoldsExt/manifold_functions.jl +++ b/ext/ManoptManifoldsExt/manifold_functions.jl @@ -8,6 +8,33 @@ Manopt.default_point_distance(::Euclidean, p) = norm(p, Inf) Manopt.default_vector_norm(::Euclidean, p, X) = norm(p, Inf) +""" + get_bounds_index(M::Hyperrectangle) + +Get the bound indices of [`Hyperrectangle`](@extref Manifolds.Hyperrectangle) `M`. They are the same as the indices of the +lower (or upper) bounds. +""" +Manopt.get_bounds_index(M::Hyperrectangle) = eachindex(M.lb) +""" + get_stepsize_bound(M::Hyperrectangle, p, d, i) + +Get the upper bound on moving in direction `d` from point `p` on [`Hyperrectangle`](@extref Manifolds.Hyperrectangle) `M`, +for the bound index `i`. There are three cases: + +1. If `d[i] > 0`, the formula reads `(M.ub[i] - p[i]) / d[i]`. +2. If `d[i] < 0`, the formula reads `(M.lb[i] - p[i]) / d[i]`. +3. If `d[i] == 0`, the result is `Inf`. 
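For illustration, the three cases can be evaluated in plain Julia (values chosen arbitrarily; `bound` is a hypothetical stand-in for this method, not the Manopt API):

```julia
# lb/ub: box bounds, p: point, d: direction; mirrors the three cases above.
lb, ub = [0.0, 0.0, 0.0], [2.0, 3.0, 1.0]
p, d = [1.0, 1.0, 0.5], [0.5, -2.0, 0.0]
bound(i) = d[i] > 0 ? (ub[i] - p[i]) / d[i] :
    d[i] < 0 ? (lb[i] - p[i]) / d[i] : Inf
bound.(1:3)  # [2.0, 0.5, Inf]
```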
+""" +function Manopt.get_stepsize_bound(M::Hyperrectangle, p, d, i) + if d[i] > 0 + return (M.ub[i] - p[i]) / d[i] + elseif d[i] < 0 + return (M.lb[i] - p[i]) / d[i] + else + return Inf + end +end + """ max_stepsize(M::TangentBundle, p) @@ -40,17 +67,16 @@ The default maximum stepsize for `Hyperrectangle` manifold with corners is maxim of distances from `p` to each boundary. """ function max_stepsize(M::Hyperrectangle, p) - ms = 0.0 - for i in eachindex(M.lb, p) - dist_ub = M.ub[i] - p[i] - if dist_ub > 0 - ms = max(ms, dist_ub) - end - dist_lb = p[i] - M.lb[i] - if dist_lb > 0 - ms = max(ms, dist_lb) - end - end + lb = M.lb + ub = M.ub + ms = zero(eltype(p)) + @inbounds @simd for i in eachindex(lb, ub, p) + dist_ub = ub[i] - p[i] + dist_lb = p[i] - lb[i] + cand_ub = ifelse(dist_ub > 0, dist_ub, zero(dist_ub)) + cand_lb = ifelse(dist_lb > 0, dist_lb, zero(dist_lb)) + ms = max(ms, max(cand_ub, cand_lb)) + end # COV_EXCL_LINE return ms end function max_stepsize(M::Hyperrectangle) @@ -60,6 +86,9 @@ function max_stepsize(M::Hyperrectangle) end return ms end +function max_stepsize(M::ProbabilitySimplex) + return 1.0 +end """ mid_point(M, p, q, x) @@ -168,3 +197,53 @@ function reflect!( X .*= -1 return retract!(M, q, p, X, retraction_method) end + + +""" + Manopt.set_zero_at_index!(M::Hyperrectangle, d, i) + +Set element of tangent vector `d` on [`Hyperrectangle`](@extref Manifolds.Hyperrectangle) +at index `i` to 0. +""" +function Manopt.set_zero_at_index!(M::Hyperrectangle, d, i) + d[i] = 0 + return d +end + +""" + Manopt.set_stepsize_bound!(M::Hyperrectangle, d_out, p, d, t_current::Real) + +For each element `i` in the tangent vector `d_out`, if the stepsize bound in direction `d` +for that element is less than `t_current`, set the element of `d_out` to the distance from +`p[i]` to the bound in the direction of `d[i]`. If the stepsize bound is non-positive, +set the element to 0. 
+""" +function Manopt.set_stepsize_bound!(M::Hyperrectangle, d_out, p, d, t_current::Real) + + for i in eachindex(d_out, d) + bound = get_stepsize_bound(M, p, d, i) + if bound > 0 + if bound < t_current && d_out[i] != 0 + d_out[i] = d[i] > 0 ? M.ub[i] - p[i] : M.lb[i] - p[i] + end + else + d_out[i] = 0 + end + end + return d_out +end + +""" + Manopt.has_anisotropic_max_stepsize(::Hyperrectangle) + +Returns `true`, as `Hyperrectangle` manifold requires generalized Cauchy point computation in solvers. +""" +Manopt.has_anisotropic_max_stepsize(::Hyperrectangle) = true + +""" + Manopt.get_at_bound_index(::Hyperrectangle, X, b) + +Extract the element of tangent vector `X` to a point on [`Hyperrectangle`](@extref Manifolds.Hyperrectangle) +at index `b`. +""" +Manopt.get_at_bound_index(::Hyperrectangle, X, b) = X[b] diff --git a/src/Manopt.jl b/src/Manopt.jl index 2abba31cc2..b43c83e71e 100644 --- a/src/Manopt.jl +++ b/src/Manopt.jl @@ -16,11 +16,11 @@ import LinearAlgebra: reflect! import ManifoldsBase: embed!, plot_slope, prepare_check_result, find_best_slope_window import ManifoldsBase: base_manifold, base_point, get_basis import ManifoldsBase: project, project! -import LinearAlgebra: cross +import LinearAlgebra: cross, LowerTriangular using ColorSchemes using ColorTypes using Colors -using DataStructures: CircularBuffer, capacity, length, push!, size, isfull +using DataStructures: CircularBuffer, capacity, length, push!, size, isfull, heapify!, heappop! 
using Dates: Millisecond, Nanosecond, Period, canonicalize, value using Glossaries using LinearAlgebra: @@ -142,6 +142,7 @@ using ManifoldsBase: set_component!, shortest_geodesic, shortest_geodesic!, + submanifold_component, submanifold_components, vector_transport_to, vector_transport_to!, @@ -427,7 +428,7 @@ export CondensedKKTVectorField, CondensedKKTVectorFieldJacobian export SymmetricLinearSystemObjective export ProximalGradientNonsmoothCost, ProximalGradientNonsmoothSubgradient -export QuasiNewtonState, QuasiNewtonLimitedMemoryDirectionUpdate +export QuasiNewtonState, QuasiNewtonLimitedMemoryDirectionUpdate, QuasiNewtonLimitedMemoryBoxDirectionUpdate export QuasiNewtonMatrixDirectionUpdate export QuasiNewtonPreconditioner export QuasiNewtonCautiousDirectionUpdate, @@ -550,6 +551,7 @@ export get_stepsize, get_initial_stepsize, get_last_stepsize export InteriorPointCentralityCondition export DomainBackTracking, DomainBackTrackingStepsize, NullStepBackTrackingStepsize export ProximalGradientMethodBacktracking +export HagerZhangLinesearch # # Stopping Criteria export StoppingCriterion, StoppingCriterionSet @@ -572,6 +574,7 @@ export StopAfter, StopWhenGradientChangeLess, StopWhenGradientMappingNormLess, StopWhenGradientNormLess, + StopWhenProjectedNegativeGradientNormLess, StopWhenFirstOrderProgress, StopWhenIterateNaN, StopWhenKKTResidualLess, @@ -583,6 +586,7 @@ export StopAfter, StopWhenPopulationDiverges, StopWhenPopulationStronglyConcentrated, StopWhenProjectedGradientStationary, + StopWhenRelativeAPosterioriCostChangeLessOrEqual, StopWhenRelativeResidualLess, StopWhenRepeated, StopWhenSmallerOrEqual, diff --git a/src/helpers/LineSearchesTypes.jl b/src/helpers/LineSearchesTypes.jl index dc01c98e38..17b3c9c442 100644 --- a/src/helpers/LineSearchesTypes.jl +++ b/src/helpers/LineSearchesTypes.jl @@ -12,7 +12,8 @@ Wrapper for line searches available in the `LineSearches.jl` library. 
Wrap `linesearch` (for example [`HagerZhang`](https://julianlsolvers.github.io/LineSearches.jl/latest/reference/linesearch.html#LineSearches.HagerZhang) or [`MoreThuente`](https://julianlsolvers.github.io/LineSearches.jl/latest/reference/linesearch.html#LineSearches.MoreThuente)). -The initial step selection from Linesearches.jl is not yet supported and the value 1.0 is used. +The initial step selection from LineSearches.jl is not yet supported and `initial_guess` is +always used (by default [`ConstantInitialGuess`](@ref)). # Keyword Arguments @@ -54,6 +55,26 @@ function LineSearchesStepsize( linesearch, initial_guess, retraction_method, vector_transport_method, last_stepsize ) end + +""" + linesearches_get_max_alpha(ls) + +Get the maximum step size for `LineSearches.jl` line search `ls`. +""" +linesearches_get_max_alpha(ls) + +function linesearches_get_max_alpha end + +""" + linesearches_set_max_alpha(ls, max_alpha::Real) + +Set the maximum step size for `LineSearches.jl` line search `ls` to `max_alpha`. +Return a new line search object with the updated maximum step size. +""" +linesearches_set_max_alpha(ls, max_alpha::Real) + +function linesearches_set_max_alpha end + function Base.show(io::IO, cs::LineSearchesStepsize) return print( io, diff --git a/src/plans/box_plan.jl b/src/plans/box_plan.jl new file mode 100644 index 0000000000..8a03898135 --- /dev/null +++ b/src/plans/box_plan.jl @@ -0,0 +1,863 @@ +""" + has_anisotropic_max_stepsize(M::AbstractManifold) + +Return `true` if `M` has a `max_stepsize` that depends on the direction, +for example if `M` is a `Hyperrectangle`-like manifold with corners, or a product of such a +manifold with a standard one. Otherwise return `false`. 
+""" +has_anisotropic_max_stepsize(::AbstractManifold) = false +has_anisotropic_max_stepsize(M::ProductManifold) = any(has_anisotropic_max_stepsize, M.manifolds) + +@doc raw""" + QuasiNewtonLimitedMemoryBoxDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate + +An approximation of the Hessian of a scalar function of the form ``B_0 = θ I``, +``B_{k+1} = B_k - W_k M_k W_k^{\mathrm{T}}``, +where ``θ > 0`` is an initial scaling guess. +Matrix ``M_k = \left(\begin{smallmatrix}M₁₁ & M₂₁^{\mathrm{T}}\\ M₂₁ & M₂₂\end{smallmatrix}\right)`` +is stored using its blocks. +The matrix ``W_k`` is (implicitly) composed from `memory_y` and `memory_s` stored in `qn_du` +of type [`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref). + +The initial scale ``θ`` is stored in the field `initial_scale`, but if the memory is not empty, +the current scale is set to the squared norm of ``y_k`` divided by the inner product of ``s_k`` and ``y_k`` (and by `initial_scale`), +where ``k`` is the most recent index for which the denominator is nonzero. + +`last_gcd_result` stores the result of the last generalized Cauchy direction search. + +See [ByrdNocedalSchnabel:1994](@cite) for details. 
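As a standalone sanity check of the compact representation (plain Julia with arbitrary values, independent of Manopt's types), a single stored pair reproduces the direct BFGS update of ``B_0 = θ I``:

```julia
using LinearAlgebra
# With one (s, y) pair: B₁ = θI - W * inv(Minner) * W' with W = [y  θs] and
# Minner = [-sᵀy 0; 0 θsᵀs] equals the direct BFGS update of B₀ = θI.
θ = 2.0
s, y = [1.0, 0.5], [0.8, 0.3]
W = hcat(y, θ .* s)
Minner = [-dot(s, y) 0.0; 0.0 θ * dot(s, s)]
B_compact = θ * I(2) - W * inv(Minner) * W'
B0 = θ * I(2)
B_bfgs = B0 - (B0 * s) * (s' * B0) / dot(s, B0 * s) + y * y' / dot(y, s)
B_compact ≈ B_bfgs  # true
```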
+""" +mutable struct QuasiNewtonLimitedMemoryBoxDirectionUpdate{ + TDU <: QuasiNewtonLimitedMemoryDirectionUpdate, + F <: Real, + T_HM <: AbstractMatrix, + V <: AbstractVector, + } <: AbstractQuasiNewtonDirectionUpdate + # this approximates inverse Hessian + qn_du::TDU + + # fields for approximating the Hessian + current_scale::F + M_11::T_HM + M_21::T_HM + M_22::T_HM + # buffer for calculating W_k blocks + buffer_inner_Sk_X::V + buffer_inner_Sk_Y::V + buffer_inner_Yk_X::V + buffer_inner_Yk_Y::V + last_gcd_result::Symbol + last_gcd_stepsize::F +end + +function status_summary(d::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + s = "limited memory direction update with support for box constraints; " + s *= "internal direction update status: $(status_summary(d.qn_du))" + return s +end + +function get_parameter(d::QuasiNewtonLimitedMemoryBoxDirectionUpdate, ::Val{:max_stepsize}) + if d.last_gcd_result === :found_limited + return d.last_gcd_stepsize + else + return Inf + end +end + +function QuasiNewtonLimitedMemoryBoxDirectionUpdate( + qn_du::QuasiNewtonLimitedMemoryDirectionUpdate{<:AbstractQuasiNewtonUpdateRule, T, F} + ) where {T, F <: Real} + memory_size = capacity(qn_du.memory_s) + M_11 = zeros(F, memory_size, memory_size) + M_21 = zeros(F, memory_size, memory_size) + M_22 = zeros(F, memory_size, memory_size) + buffer_inner_Sk_X = zeros(F, memory_size) + buffer_inner_Sk_Y = zeros(F, memory_size) + buffer_inner_Yk_X = zeros(F, memory_size) + buffer_inner_Yk_Y = zeros(F, memory_size) + return QuasiNewtonLimitedMemoryBoxDirectionUpdate{ + typeof(qn_du), F, typeof(M_11), typeof(buffer_inner_Sk_X), + }( + qn_du, + qn_du.initial_scale, + M_11, + M_21, + M_22, + buffer_inner_Sk_X, + buffer_inner_Sk_Y, + buffer_inner_Yk_X, + buffer_inner_Yk_Y, + :not_searched, + NaN, + ) +end + +function initialize_update!(ha::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + initialize_update!(ha.qn_du) + ha.last_gcd_result = :not_searched + return ha +end + +function 
(d::QuasiNewtonLimitedMemoryBoxDirectionUpdate)( + mp::AbstractManoptProblem, st + ) + r = zero_vector(get_manifold(mp), get_iterate(st)) + return d(r, mp, st) +end +function (d::QuasiNewtonLimitedMemoryBoxDirectionUpdate)( + r, mp::AbstractManoptProblem, st + ) + d.qn_du(r, mp, st) + M = get_manifold(mp) + p = get_iterate(st) + X = get_gradient(st) + gcd = GeneralizedCauchyDirectionSubsolver(M, p, d) + d.last_gcd_result, d.last_gcd_stepsize = find_generalized_cauchy_direction!(M, gcd, r, p, r, X) + return r +end + +get_update_vector_transport(u::QuasiNewtonLimitedMemoryBoxDirectionUpdate) = get_update_vector_transport(u.qn_du) + +function get_at_bound_index(M::ProductManifold, X, b::Tuple{Int, Any}) + return get_at_bound_index(M.manifolds[b[1]], submanifold_component(M, X, b[1]), b[2]) +end + +@doc raw""" + hessian_value_diag(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X) + +Compute ``⟨X, B X⟩``, where ``B`` is the (1, 1)-Hessian represented by `gh`. +""" +function hessian_value_diag(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X) + m = length(gh.qn_du.memory_s) + num_nonzero_rho = count(!iszero, gh.qn_du.ρ) + + normX_sqr = norm(M, p, X)^2 + + if m == 0 || num_nonzero_rho == 0 + return gh.qn_du.initial_scale \ normX_sqr + end + + ii = 1 + for i in 1:m + iszero(gh.qn_du.ρ[i]) && continue + gh.buffer_inner_Yk_X[ii] = inner(M, p, gh.qn_du.memory_y[i], X) + gh.buffer_inner_Sk_X[ii] = gh.current_scale * inner(M, p, gh.qn_du.memory_s[i], X) + + ii += 1 + end + buffer_inner_Yk = view(gh.buffer_inner_Yk_X, 1:num_nonzero_rho) + buffer_inner_Sk = view(gh.buffer_inner_Sk_X, 1:num_nonzero_rho) + + return hessian_value_from_inner_products(gh, normX_sqr, buffer_inner_Yk, buffer_inner_Sk, buffer_inner_Yk, buffer_inner_Sk) +end + +@doc raw""" + hessian_value_diag(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X::UnitVector) + +Compute ``⟨X, B X⟩``, where ``B`` is the (1, 1)-Hessian represented by 
`gh`, and `X` is the +[`UnitVector`](@ref). +""" +function hessian_value_diag(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X::UnitVector) + b = X.index + m = length(gh.qn_du.memory_s) + num_nonzero_rho = count(!iszero, gh.qn_du.ρ) + + if m == 0 || num_nonzero_rho == 0 + return inv(gh.qn_du.initial_scale) + end + + ii = 1 + for i in 1:m + iszero(gh.qn_du.ρ[i]) && continue + gh.buffer_inner_Yk_X[ii] = get_at_bound_index(M, gh.qn_du.memory_y[i], b) + gh.buffer_inner_Sk_X[ii] = gh.current_scale * get_at_bound_index(M, gh.qn_du.memory_s[i], b) + + ii += 1 + end + buffer_inner_Yk = view(gh.buffer_inner_Yk_X, 1:num_nonzero_rho) + buffer_inner_Sk = view(gh.buffer_inner_Sk_X, 1:num_nonzero_rho) + + return hessian_value_from_inner_products(gh, one(eltype(gh.qn_du.ρ)), buffer_inner_Yk, buffer_inner_Sk, buffer_inner_Yk, buffer_inner_Sk) +end + +@doc raw""" + hessian_value(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X::UnitVector, Y) + +Compute ``⟨X, B Y⟩``, where ``B`` is the (1, 1)-Hessian represented by `gh`, where `X` is the +[`UnitVector`](@ref). 
+""" +function hessian_value(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, M::AbstractManifold, p, X::UnitVector, Y) + b = X.index + + m = length(gh.qn_du.memory_s) + num_nonzero_rho = count(!iszero, gh.qn_du.ρ) + + Yb = get_at_bound_index(M, Y, b) + if m == 0 || num_nonzero_rho == 0 + # empty memory: scale by inv(initial_scale), consistent with hessian_value_diag + return gh.qn_du.initial_scale \ Yb + end + + ii = 1 + for i in 1:m + iszero(gh.qn_du.ρ[i]) && continue + gh.buffer_inner_Yk_X[ii] = get_at_bound_index(M, gh.qn_du.memory_y[i], b) + gh.buffer_inner_Sk_X[ii] = gh.current_scale * get_at_bound_index(M, gh.qn_du.memory_s[i], b) + + gh.buffer_inner_Yk_Y[ii] = inner(M, p, gh.qn_du.memory_y[i], Y) + gh.buffer_inner_Sk_Y[ii] = gh.current_scale * inner(M, p, gh.qn_du.memory_s[i], Y) + ii += 1 + end + buffer_inner_Yk_X = view(gh.buffer_inner_Yk_X, 1:num_nonzero_rho) + buffer_inner_Yk_Y = view(gh.buffer_inner_Yk_Y, 1:num_nonzero_rho) + buffer_inner_Sk_X = view(gh.buffer_inner_Sk_X, 1:num_nonzero_rho) + buffer_inner_Sk_Y = view(gh.buffer_inner_Sk_Y, 1:num_nonzero_rho) + + return hessian_value_from_inner_products(gh, Yb, buffer_inner_Yk_X, buffer_inner_Sk_X, buffer_inner_Yk_Y, buffer_inner_Sk_Y) +end + +@doc raw""" + update_current_scale!(M::AbstractManifold, p, gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + +Refresh the scaling factor and blockwise Hessian approximation stored in `gh` using the +nonzero curvature pairs currently in memory. + +- Identifies the most recent index with nonzero ``ρ_i`` to scale the initial Hessian guess + by ``ρ_i‖y_i‖^2 / θ``. +- Builds ``L_k`` and ``S_k^\top S_k`` from the stored ``(s_i, y_i)`` pairs and updates the + block matrices ``M₁₁``, ``M₂₁``, and ``M₂₂`` via the blockwise inverse formula. +- If all ``ρ_i`` vanish, resets `current_scale` to the inverse of `initial_scale` and + clears the block matrices. + +Returns the mutated `gh`. 
+""" +function update_current_scale!(M::AbstractManifold, p, gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + m = length(gh.qn_du.memory_s) + last_safe_index = -1 + for i in eachindex(gh.qn_du.ρ) + if abs(gh.qn_du.ρ[i]) > 0 + last_safe_index = i + end + end + + if (last_safe_index == -1) + # All memory yield zero inner products + gh.current_scale = inv(gh.qn_du.initial_scale) + gh.M_11 = fill(0.0, 0, 0) + gh.M_21 = fill(0.0, 0, 0) + gh.M_22 = fill(0.0, 0, 0) + return gh + end + + invA = Diagonal([-ri for ri in gh.qn_du.ρ if !iszero(ri)]) + num_nonzero_rho = count(!iszero, gh.qn_du.ρ) + + Lk = LowerTriangular(zeros(num_nonzero_rho, num_nonzero_rho)) + + # total scaling factor for the initial Hessian + # written this way to avoid floating point overflow (when ynorm is finite but ynorm^2 is Inf) + # see CUTEst EXPQUAD problem for an example + ynorm = norm(M, p, gh.qn_du.memory_y[last_safe_index]) + gh.current_scale = ((gh.qn_du.ρ[last_safe_index] * ynorm) * ynorm) / gh.qn_du.initial_scale + + tsksk = Symmetric(zeros(num_nonzero_rho, num_nonzero_rho)) + ii = 1 + # fill Dk and Lk + for i in 1:m + iszero(gh.qn_du.ρ[i]) && continue + jj = 1 + for j in 1:m + iszero(gh.qn_du.ρ[j]) && continue + if jj < ii + Lk[ii, jj] = inner(M, p, gh.qn_du.memory_s[i], gh.qn_du.memory_y[j]) + end + if ii <= jj + tsksk.data[ii, jj] = inner(M, p, gh.qn_du.memory_s[i], gh.qn_du.memory_s[j]) + end + jj += 1 + end + ii += 1 + end + tsksk.data .*= gh.current_scale + + # matrix inversion using the blockwise formula for speed + # Schur complement of -Dk is the only non-diagonal matrix we actually need to inverse in this step + W1 = Lk * invA + W2 = W1 * Lk' + gh.M_22 = inv(Symmetric(tsksk - W2)) + W3 = gh.M_22 * W1 + W4 = W1' * W3 + + gh.M_11 = invA + W4 + gh.M_21 = -W3 + + return gh +end + +@doc raw""" + hessian_value_from_inner_products(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, iss::Real, cy1, cs1, cy2, cs2) + +Evaluate the quadratic form defined by the current blockwise Hessian 
approximation stored in +`gh`, given precomputed coordinate vectors. + +Arguments: +- `iss`: inner product of original vectors. +- `cy1`, `cy2`: coordinates of ``y``-like vectors in the ``Y_k`` basis. +- `cs1`, `cs2`: coordinates of ``s``-like vectors in the scaled ``S_k`` basis. + +The result is ``θ·iss - cy₁ᵀ M₁₁ cy₂ - 2·cs₁ᵀ M₂₁ cy₂ - cs₁ᵀ M₂₂ cs₂`` using the blocks +``M₁₁``, ``M₂₁``, ``M₂₂`` stored in `gh` and the current scale ``θ``. Returns the scalar value. +""" +function hessian_value_from_inner_products(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, iss::Real, cy1, cs1, cy2, cs2) + result = gh.current_scale * iss + if length(cy1) == 0 + return result + end + result -= dot(cy1, gh.M_11, cy2) + result -= 2 * dot(cs1, gh.M_21, cy2) + result -= dot(cs1, gh.M_22, cs2) + + return result +end + + +@doc raw""" + update_hessian!(gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, p) + +Update Hessian approximation `gh` by moving it to point `p` and updating the stored `s` and +`y` vectors. +""" +function update_hessian!( + gh::QuasiNewtonLimitedMemoryBoxDirectionUpdate, + mp::AbstractManoptProblem, + st::AbstractManoptSolverState, + p_old, + k::Int, + ) + (capacity(gh.qn_du.memory_s) == 0) && return gh + update_hessian!(gh.qn_du, mp, st, p_old, k) + update_current_scale!(get_manifold(mp), get_iterate(st), gh) + return gh +end + + +""" + abstract type AbstractSegmentHessianUpdater end + +Abstract type for methods that calculate f' and f'' in the GCD calculation in subsequent +line segments in [`GeneralizedCauchyDirectionSubsolver`](@ref). +""" +abstract type AbstractSegmentHessianUpdater end + +""" + init_updater!(::AbstractManifold, hessian_segment_updater::AbstractSegmentHessianUpdater, p, d, ha::AbstractQuasiNewtonDirectionUpdate) + +Method for initialization of `AbstractSegmentHessianUpdater` `hessian_segment_updater` just before the loop +that examines subsequent intervals for GCD. 
+""" +init_updater!(::AbstractManifold, hessian_segment_updater::AbstractSegmentHessianUpdater, p, d, ha::AbstractQuasiNewtonDirectionUpdate) + +""" + struct GenericSegmentHessianUpdater <: AbstractSegmentHessianUpdater end + +Generic f' and f'' calculation that only relies on `hessian_value` but is relatively slow for +high-dimensional domains. +""" +struct GenericSegmentHessianUpdater{TX} <: AbstractSegmentHessianUpdater + d_z::TX + d_tmp::TX +end + +function get_default_hessian_segment_updater(M::AbstractManifold, p, ::AbstractQuasiNewtonDirectionUpdate) + return GenericSegmentHessianUpdater(zero_vector(M, p), zero_vector(M, p)) +end + +function init_updater!(M::AbstractManifold, hessian_segment_updater::GenericSegmentHessianUpdater, p, d, ha::AbstractQuasiNewtonDirectionUpdate) + zero_vector!(M, hessian_segment_updater.d_z, p) + copyto!(M, hessian_segment_updater.d_tmp, d) + return hessian_segment_updater +end + +@doc raw""" + (upd::GenericSegmentHessianUpdater)(M::AbstractManifold, p, t::Real, dt::Real, b, db, ha::AbstractQuasiNewtonDirectionUpdate) + +Calculate Hessian values ``⟨e_b, B d_z⟩`` and ``⟨e_b, B d_tmp⟩`` for the generalized Cauchy +point line search using the generic approach via `hessian_value` with [`UnitVector`](@ref). +``d_z`` start with 0 and is updated in-place by adding `dt * d` to it. +""" +function (upd::GenericSegmentHessianUpdater)(M::AbstractManifold, p, t::Real, dt::Real, b, db, ha) + upd.d_z .+= dt .* upd.d_tmp + hv_eb_dz = hessian_value(ha, M, p, UnitVector(b), upd.d_z) + hv_eb_d = hessian_value(ha, M, p, UnitVector(b), upd.d_tmp) + + set_zero_at_index!(M, upd.d_tmp, b) + + return hv_eb_dz, hv_eb_d +end + +""" + struct LimitedMemorySegmentHessianUpdater{TV <: AbstractVector} <: AbstractSegmentHessianUpdater + +Hessian value calculation for generalized Cauchy direction line segments that is optimized for +[`QuasiNewtonLimitedMemoryBoxDirectionUpdate`](@ref). It relies on a specific Hessian structure. 
+""" +struct LimitedMemorySegmentHessianUpdater{TV <: AbstractVector} <: AbstractSegmentHessianUpdater + p_s::TV + p_y::TV + c_s::TV + c_y::TV +end + +function get_default_hessian_segment_updater(::AbstractManifold, p, ha::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + return LimitedMemorySegmentHessianUpdater(similar(ha.qn_du.ρ), similar(ha.qn_du.ρ), similar(ha.qn_du.ρ), similar(ha.qn_du.ρ)) +end + +function init_updater!(M::AbstractManifold, hessian_segment_updater::LimitedMemorySegmentHessianUpdater, p, d, ha::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + fill!(hessian_segment_updater.c_s, 0) + fill!(hessian_segment_updater.c_y, 0) + ii = 1 + for i in eachindex(ha.qn_du.ρ) + if iszero(ha.qn_du.ρ[i]) + continue + end + + hessian_segment_updater.p_s[ii] = ha.current_scale * inner(M, p, ha.qn_du.memory_s[i], d) + hessian_segment_updater.p_y[ii] = inner(M, p, ha.qn_du.memory_y[i], d) + ii += 1 + end + return hessian_segment_updater +end + +@doc raw""" + (hessian_segment_updater::LimitedMemorySegmentHessianUpdater)( + M::AbstractManifold, p, + t::Real, dt::Real, b, db, ha::QuasiNewtonLimitedMemoryBoxDirectionUpdate + ) + +Calculate Hessian values ``⟨e_b, B d_z⟩`` and ``⟨e_b, B d⟩`` for the generalized Cauchy +point line search using the limited-memory block Hessian stored in `ha`. +``d_z`` start with 0 and is updated in-place by adding `dt * d` to it. + +## Arguments: + +- `M`: manifold. +- `p`: current iterate. +- `t`: current step length from `p`. +- `dt`: step length increment from the last step. +- `b`: bound index of the current segment. +- `db`: search direction component at the bound index `b`. + +The updater reuses cached coordinate projections in `hessian_segment_updater` to cheaply +evaluate Hessian quadratic forms via `hessian_value_from_inner_products`. 
+""" +function (hessian_segment_updater::LimitedMemorySegmentHessianUpdater)( + M::AbstractManifold, p, + t::Real, dt::Real, b, db, ha::QuasiNewtonLimitedMemoryBoxDirectionUpdate + ) + + m = length(ha.qn_du.memory_s) + num_nonzero_rho = count(!iszero, ha.qn_du.ρ) + + ii = 1 + for i in 1:m + iszero(ha.qn_du.ρ[i]) && continue + # setting _X to w_b from the paper + ha.buffer_inner_Yk_X[ii] = get_at_bound_index(M, ha.qn_du.memory_y[i], b) + ha.buffer_inner_Sk_X[ii] = ha.current_scale * get_at_bound_index(M, ha.qn_du.memory_s[i], b) + + ii += 1 + end + + buffer_inner_Yk_eb = view(ha.buffer_inner_Yk_X, 1:num_nonzero_rho) + buffer_inner_Sk_eb = view(ha.buffer_inner_Sk_X, 1:num_nonzero_rho) + + buffer_inner_cy = view(hessian_segment_updater.c_y, 1:num_nonzero_rho) + buffer_inner_cs = view(hessian_segment_updater.c_s, 1:num_nonzero_rho) + buffer_inner_py = view(hessian_segment_updater.p_y, 1:num_nonzero_rho) + buffer_inner_ps = view(hessian_segment_updater.p_s, 1:num_nonzero_rho) + + buffer_inner_cy .+= dt .* buffer_inner_py + buffer_inner_cs .+= dt .* buffer_inner_ps + + eb_B_z = hessian_value_from_inner_products(ha, t * db, buffer_inner_Yk_eb, buffer_inner_Sk_eb, buffer_inner_cy, buffer_inner_cs) + + eb_B_d = hessian_value_from_inner_products(ha, db, buffer_inner_Yk_eb, buffer_inner_Sk_eb, buffer_inner_py, buffer_inner_ps) + + buffer_inner_py .-= db .* buffer_inner_Yk_eb + buffer_inner_ps .-= db .* buffer_inner_Sk_eb + + return eb_B_z, eb_B_d +end + +struct ProductIndex{T <: Tuple} + ranges::T +end + +Base.iterate(itr::ProductIndex) = _iterate(itr.ranges, 1, nothing) +Base.iterate(itr::ProductIndex, state) = _iterate(itr.ranges, state...) 
+
+function _iterate(ranges, i, st)
+    i > length(ranges) && return nothing
+    if st === nothing
+        it = iterate(ranges[i])
+        it === nothing && return _iterate(ranges, i + 1, nothing)
+        (j, st2) = it
+        return ((i, j), (i, st2))
+    else
+        it = iterate(ranges[i], st)
+        if it === nothing
+            return _iterate(ranges, i + 1, nothing)
+        else
+            (j, st2) = it
+            return ((i, j), (i, st2))
+        end
+    end
+end
+
+
+"""
+    to_coordinate_index(M::AbstractManifold, b::UnitVector{Int}, B::AbstractBasis)
+
+Get the index of the coordinate equal to 1 of [`UnitVector`](@ref) `b` with respect to
+the `AbstractBasis` `B`.
+"""
+to_coordinate_index(::AbstractManifold, b::UnitVector{Int}, ::AbstractBasis) = b.index
+"""
+    to_coordinate_index(M::ProductManifold, b::UnitVector{Tuple{Int, Int}}, B::AbstractBasis)
+
+Get the index of the coordinate equal to 1 of [`UnitVector`](@ref) `b` with respect to
+the `AbstractBasis` `B`, flattened over the components of the `ProductManifold` `M`.
+"""
+function to_coordinate_index(M::ProductManifold, b::UnitVector{Tuple{Int, Int}}, B::AbstractBasis)
+    i, j = b.index
+    offset = sum(k -> number_of_coordinates(M.manifolds[k], B), 1:(i - 1); init = 0)
+    return offset + j
+end
+
+Base.length(itr::ProductIndex) = sum(length, itr.ranges)
+
+
+"""
+    get_bounds_index(M::AbstractManifold)
+
+Get the bound indices of manifold `M`. Standard manifolds don't have bounds, so
+the empty range `Base.OneTo(0)` is returned.
+"""
+get_bounds_index(M::AbstractManifold) = Base.OneTo(0)
+function get_bounds_index(M::ProductManifold)
+    ranges = map(get_bounds_index, M.manifolds)
+    iter = ProductIndex(ranges)
+    return iter
+end
+
+"""
+    get_stepsize_bound(M::AbstractManifold, p, d, i)
+
+Get the upper bound on moving in direction `d` from point `p` on manifold `M`, for the
+bound index `i`.
+""" +get_stepsize_bound(M::AbstractManifold, p, d, i) +function get_stepsize_bound(M::ProductManifold, p, d, i::Tuple{Int, Any}) + i1, i2 = i + return get_stepsize_bound(M.manifolds[i1], submanifold_component(M, p, i1), submanifold_component(M, d, i1), i2) +end + +""" + set_zero_at_index!(M::ProductManifold, d, i::Tuple{Int,Any}) + +Set the element of the `i[1]`th component of `d` at bound index `i[2]` to zero. +""" +function set_zero_at_index!(M::ProductManifold, d, i::Tuple{Int, Any}) + i1, i2 = i + set_zero_at_index!(M.manifolds[i1], submanifold_component(M, d, i1), i2) + return d +end + +""" + set_stepsize_bound!(M::AbstractManifold, d_out, p, d, t_current::Real) + +For each component at index `i` in the tangent vector `d_out`, if the stepsize bound in +direction `d` for that component is less than `t_current`, set the element of `d_out` to +the distance from `p[i]` to the bound in the direction of `d[i]`. If the stepsize bound is +non-positive, set the element to 0. + +By default it does not modify `d_out` because most manifolds don't have direction-specific +stepsize bounds, and general anisotropic bounds are handled differently. +""" +function set_stepsize_bound!(::AbstractManifold, d_out, p, d, t_current::Real) + return d_out +end + +function set_stepsize_bound!(M::ProductManifold, d_out, p, d, t_current::Real) + map( + (N, d_out_c, p_c, d_c) -> set_stepsize_bound!(N, d_out_c, p_c, d_c, t_current), + M.manifolds, + submanifold_components(M, d_out), + submanifold_components(M, p), + submanifold_components(M, d), + ) + return d_out +end + +@doc raw""" + GeneralizedCauchyDirectionSubsolver{TM <: AbstractManifold, TP, T_HA <: AbstractQuasiNewtonDirectionUpdate, TFU <: AbstractSegmentHessianUpdater} + +Helper container for generalized Cauchy direction search. 
Stores the cached
+original descent direction (`d_original`), the quasi-Newton direction update `ha`, and the
+`hessian_segment_updater`, which computes certain values of the Hessian while advancing segments.
+Instances are reused across segments during [`find_generalized_cauchy_direction!`](@ref) to
+avoid allocations.
+"""
+struct GeneralizedCauchyDirectionSubsolver{
+        TX,
+        T_HA <: AbstractQuasiNewtonDirectionUpdate, TFU <: AbstractSegmentHessianUpdater, TFT <: Tuple{<:Real, Any}, TBI,
+        TO <: Base.Order.Ordering,
+    }
+    d_original::TX
+    ha::T_HA
+    hessian_segment_updater::TFU
+    F_list::Vector{TFT}
+    bounds_indices::TBI
+    ordering::TO
+end
+
+function GeneralizedCauchyDirectionSubsolver(
+        M::AbstractManifold, p, ha::AbstractQuasiNewtonDirectionUpdate;
+        hessian_segment_updater::AbstractSegmentHessianUpdater = get_default_hessian_segment_updater(M, p, ha)
+    )
+    bounds_indices = get_bounds_index(M)
+    TInd = eltype(bounds_indices)
+    TF = number_eltype(p)
+    F_list = Tuple{TF, TInd}[]
+    sizehint!(F_list, length(bounds_indices) + 1)
+    ordering = Base.By(first)
+    return GeneralizedCauchyDirectionSubsolver(
+        zero_vector(M, p), ha,
+        hessian_segment_updater, F_list, bounds_indices, ordering
+    )
+end
+
+function collect_isotropic_limits!(::AbstractManifold, ::Vector{<:Tuple{TF, Any}}, p, d)::Tuple{Bool, TF} where {TF <: Real}
+    return false, convert(TF, Inf)
+end
+
+function collect_isotropic_limits!(M::ProductManifold, F_list::Vector{<:Tuple{TF, Any}}, p, d)::Tuple{Bool, TF} where {TF <: Real}
+    has_finite_limit = false
+    smallest_positive_limit = Inf
+    map(M.manifolds, submanifold_components(M, p), submanifold_components(M, d)) do Mi, p_i, d_i
+        if !has_anisotropic_max_stepsize(Mi)
+            max_step = Manopt.max_stepsize(Mi, p_i)
+            if isfinite(max_step)
+                tms = max_step / norm(Mi, p_i, d_i)
+                push!(F_list, (tms, -1))
+                has_finite_limit = true
+                if tms < smallest_positive_limit
+                    smallest_positive_limit = tms
+                end
+            end
+        end
+    end
+    return has_finite_limit,
smallest_positive_limit +end + +""" + find_generalized_cauchy_direction!( + M::AbstractManifold, + gcd::GeneralizedCauchyDirectionSubsolver, d_out, p, d, X + ) + +Find generalized Cauchy direction looking from point `p` on manifold `M` in direction `d` +and save it to `d_out`. Gradient of the objective at `p` is `X`. + +The function returns a pair (status, max_stepsize) where `status` is a symbol describing +the result of the search, and `max_stepsize` is the maximum stepsize that can be taken in +the direction `d_out`. + +The `status` can be one of the following: +* `:found_limited` if the point was found and we can perform a step of length at most 1 + in direction `d_out` afterwards, +* `:found_unlimited` if the point was found and we can perform a step of length at most + `max_stepsize(M, p)` in direction `d_out` afterwards, +* `:not_found` if the search cannot be performed in direction `d`. +""" +function find_generalized_cauchy_direction!( + M::AbstractManifold, + gcd::GeneralizedCauchyDirectionSubsolver{ + <:Any, <:AbstractQuasiNewtonDirectionUpdate, + <:AbstractSegmentHessianUpdater, <:Tuple{TF, Any}, + }, + d_out, p, d, X + ) where {TF <: Real} + copyto!(M, gcd.d_original, d) + copyto!(M, d_out, d) + + ordering = gcd.ordering + F_list = gcd.F_list + empty!(F_list) + + bounds_indices = gcd.bounds_indices + + # isotropic limits + has_finite_limit, smallest_positive_limit = collect_isotropic_limits!(M, F_list, p, d) + # anisotropic limits + for i in bounds_indices + sbi = get_stepsize_bound(M, p, d, i)::TF + + if sbi > 0 + push!(F_list, (sbi, i)) + if sbi < smallest_positive_limit + smallest_positive_limit = sbi + end + end + has_finite_limit |= isfinite(sbi) + end + + # In this case we can't move in the direction `d` at all, though it's usually not + # a problem relevant to the end user because it can be handled by step_solver! that + # uses the GCD subsolver. 
+    if isempty(F_list)
+        return (:not_found, NaN)
+    end
+    heapify!(F_list, ordering)
+
+    f_prime = inner(M, p, X, d)
+    f_double_prime = hessian_value_diag(gcd.ha, M, p, d)
+
+    if iszero(f_prime) || iszero(f_double_prime)
+        return (:not_found, NaN)
+    end
+
+    dt_min = -f_prime / f_double_prime
+    t_old = 0.0
+
+    t_current, b = heappop!(F_list, ordering)
+    dt = t_current - t_old
+
+    init_updater!(M, gcd.hessian_segment_updater, p, d, gcd.ha)
+    # b can be -1 if it corresponds to the max stepsize limit on the manifold part
+    while dt_min > dt && b != -1
+        db = get_at_bound_index(M, d, b)::TF
+        gb = get_at_bound_index(M, X, b)::TF
+
+        hv_eb_dz, hv_eb_d = gcd.hessian_segment_updater(M, p, t_current, dt, b, db, gcd.ha)::Tuple{TF, TF}
+
+        f_prime += dt * f_double_prime - db * (gb + hv_eb_dz)
+        f_double_prime += (2 * -db * hv_eb_d) + db^2 * hessian_value_diag(gcd.ha, M, p, UnitVector(b))
+
+        t_old = t_current
+
+        # if f_prime (or the curvature f_double_prime) is 0, the local minimizer (GCD) is found
+        if iszero(f_prime) || iszero(f_double_prime)
+            # the GCD then lies at the beginning of the current segment, so set dt_min to 0 (stay at that point)
+            dt_min = 0.0
+            break
+        end
+
+        dt_min = -f_prime / f_double_prime
+        isempty(F_list) && break
+
+        t_current, b = heappop!(F_list, ordering)
+        dt = t_current - t_old
+    end
+
+    dt_min = max(dt_min, 0.0)
+    t_old = t_old + dt_min
+    d_out .*= t_old
+    # by construction, there is no bound achievable before stepsize 1.0 in direction d_out;
+    # the first bound after that is achieved at smallest_positive_limit / t_old
+    max_feasible_stepsize = max(1.0, smallest_positive_limit / t_old)
+
+    set_stepsize_bound!(M, d_out, p, gcd.d_original, t_old)
+    if has_finite_limit
+        return (:found_limited, max_feasible_stepsize)
+    else
+        return (:found_unlimited, Inf)
+    end
+end
+
+"""
+    struct MaxStepsizeInDirectionSubsolver{TFT <: Tuple{<:Real, Any}, TBI}
+
+Helper container for finding the maximum stepsize in a direction.
Stores the
+container for the list of bounds `F_list` and the bound indices `bounds_indices`.
+
+## Constructor
+
+    MaxStepsizeInDirectionSubsolver(M::AbstractManifold, p)
+
+Initialize the `MaxStepsizeInDirectionSubsolver` for manifold `M` and point `p`. The `F_list`
+is initialized to be empty and will be populated during the search for the maximum stepsize
+in a direction. The floating point type of the bound values in `F_list` is determined by the
+number type of `p`.
+
+The `MaxStepsizeInDirectionSubsolver` can be reused for multiple different points and
+directions on the same manifold, but it is not thread-safe.
+"""
+struct MaxStepsizeInDirectionSubsolver{TFT <: Tuple{<:Real, Any}, TBI}
+    F_list::Vector{TFT}
+    bounds_indices::TBI
+end
+function MaxStepsizeInDirectionSubsolver(M::AbstractManifold, p)
+    bounds_indices = get_bounds_index(M)
+    TInd = eltype(bounds_indices)
+    TF = number_eltype(p)
+    F_list = Tuple{TF, TInd}[]
+    sizehint!(F_list, length(bounds_indices) + 1)
+    return MaxStepsizeInDirectionSubsolver{Tuple{TF, TInd}, typeof(bounds_indices)}(F_list, bounds_indices)
+end
+
+"""
+    find_max_stepsize_in_direction(M::AbstractManifold, sdf::MaxStepsizeInDirectionSubsolver, p, d)
+
+Find the maximum stepsize that can be performed from point `p` in direction `d`.
+
+The function returns a pair (status, max_stepsize) where `status` is a symbol describing
+the result of the search, and `max_stepsize` is the maximum stepsize that can be taken in
+the direction `d`.
+
+The `status` can be one of the following:
+* `:found_limited` if a finite bound limits the step; `max_stepsize` is then the smallest
+  positive bound in direction `d`,
+* `:found_unlimited` if no finite bound limits the step in direction `d`; `max_stepsize`
+  is then `Inf`,
+* `:not_found` if no positive step can be performed in direction `d`.
+""" +function find_max_stepsize_in_direction( + M::AbstractManifold, + sdf::MaxStepsizeInDirectionSubsolver{<:Tuple{TF, Any}}, + p, d + ) where {TF <: Real} + + F_list = sdf.F_list + empty!(F_list) + bounds_indices = sdf.bounds_indices + + # isotropic limits + has_finite_limit, smallest_positive_limit = collect_isotropic_limits!(M, F_list, p, d) + # anisotropic limits + for i in bounds_indices + sbi = get_stepsize_bound(M, p, d, i)::TF + + if sbi > 0 + push!(F_list, (sbi, i)) + if sbi < smallest_positive_limit + smallest_positive_limit = sbi + end + end + has_finite_limit |= isfinite(sbi) + end + + if isempty(F_list) + return (:not_found, NaN) + end + if has_finite_limit + return (:found_limited, smallest_positive_limit) + else + return (:found_unlimited, Inf) + end + +end + +function show(io::IO, qns::QuasiNewtonLimitedMemoryBoxDirectionUpdate) + print(io, "QuasiNewtonLimitedMemoryBoxDirectionUpdate with internal state:\n") + return print(io, qns.qn_du) +end diff --git a/src/plans/debug.jl b/src/plans/debug.jl index 760f0b46c8..b5104181bb 100644 --- a/src/plans/debug.jl +++ b/src/plans/debug.jl @@ -1252,7 +1252,7 @@ function (d::DebugWarnIfGradientNormTooLarge)( p_inj = d.factor * max_stepsize(M, p) if Xn > p_inj @warn """At iteration #$k - the gradient norm ($Xn) is larger that $(d.factor) times the injectivity radius $(p_inj) at the current iterate. + the gradient norm ($Xn) is larger than $(d.factor) times the injectivity radius $(p_inj) at the current iterate. """ if d.status === :Once @warn "Further warnings will be suppressed, use DebugWarnIfGradientNormTooLarge($(d.factor), :Always) to get all warnings." 
diff --git a/src/plans/plan.jl b/src/plans/plan.jl
index a55c1d7967..b0438faeb8 100644
--- a/src/plans/plan.jl
+++ b/src/plans/plan.jl
@@ -208,6 +208,8 @@ include("higher_order_primal_dual_plan.jl")
 include("stochastic_gradient_plan.jl")
+include("box_plan.jl")
+
 include("embedded_objective.jl")
 include("scaled_objective.jl")
diff --git a/src/plans/quasi_newton_plan.jl b/src/plans/quasi_newton_plan.jl
index b167b01b4b..db4686a74f 100644
--- a/src/plans/quasi_newton_plan.jl
+++ b/src/plans/quasi_newton_plan.jl
@@ -409,8 +409,8 @@ space ``T_{p_{k+1}} $(_tex(:Cal, "M"))``, preferably with an isometric vector tr
 # Provided functors
-* `(mp::AbstractManoptproblem, st::QuasiNewtonState) -> η` to compute the update direction
-* `(η, mp::AbstractManoptproblem, st::QuasiNewtonState) -> η` to compute the update direction in-place of `η`
+* `(mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction
+* `(η, mp::AbstractManoptProblem, st::QuasiNewtonState) -> η` to compute the update direction in-place of `η`
 # Fields
@@ -511,6 +511,55 @@ function initialize_update!(d::QuasiNewtonMatrixDirectionUpdate)
     copyto!(d.matrix, I)
     return d
 end
+@doc raw"""
+    hessian_value_diag(d::QuasiNewtonMatrixDirectionUpdate, M, p, X)
+
+Evaluate the quadratic form associated with the stored quasi-Newton matrix.
+Returns the scalar ``c^{\top} B c`` where ``c`` are the coordinates of the
+tangent vector `X` at `p` (in the basis `d.basis`) and ``B`` is `d.matrix`.
+"""
+function hessian_value_diag(d::QuasiNewtonMatrixDirectionUpdate{T}, M::AbstractManifold, p, X) where {T <: Union{BFGS, DFP, SR1, Broyden}}
+    c = get_coordinates(M, p, X, d.basis)
+    return dot(c, d.matrix, c)
+end
+
+"""
+    UnitVector{TB}
+
+A type representing a unit tangent vector on a `Hyperrectangle`-like manifold with corners,
+or a product of it with a standard manifold.
+The field `index` stores the index of the element equal to 1.
+All other elements are equal to 0.
+"""
+struct UnitVector{TB}
+    index::TB
+end
+
+@doc raw"""
+    hessian_value_diag(d::QuasiNewtonMatrixDirectionUpdate, M, p, X::UnitVector)
+
+Evaluate the quadratic form associated with the stored quasi-Newton matrix.
+Returns the scalar ``c^{\top} B c`` where ``c`` are the coordinates of the
+[`UnitVector`](@ref) `X` at `p` (in the basis `d.basis`) and ``B`` is `d.matrix`.
+"""
+function hessian_value_diag(d::QuasiNewtonMatrixDirectionUpdate{T}, M::AbstractManifold, p, X::UnitVector) where {T <: Union{BFGS, DFP, SR1, Broyden}}
+    b = to_coordinate_index(M, X, d.basis)
+    return d.matrix[b, b]
+end
+@doc raw"""
+    hessian_value(d::QuasiNewtonMatrixDirectionUpdate, M, p, X::UnitVector, Y)
+
+Evaluate the quadratic form associated with the stored quasi-Newton matrix.
+Returns the scalar ``c_b^{\top} B c`` where ``c_b`` are the coordinates of the
+[`UnitVector`](@ref) `X` at `p` (assumed to correspond to the basis `d.basis`),
+``c`` are the coordinates of the tangent vector `Y` at `p` (in the basis `d.basis`)
+and ``B`` is `d.matrix`.
+"""
+function hessian_value(d::QuasiNewtonMatrixDirectionUpdate{T}, M::AbstractManifold, p, X::UnitVector, Y) where {T <: Union{BFGS, DFP, SR1, Broyden}}
+    b = to_coordinate_index(M, X, d.basis)
+    return dot(d.matrix[b, :], get_coordinates(M, p, Y, d.basis))
+end
 _doc_QN_B = """
 ```math
@@ -562,6 +611,19 @@ function is always included and the old, probably no longer relevant, informatio
 $(_fields(:vector_transport_method))
 * `message`: a string containing a potential warning that might have appeared
 * `project!`: a function to stabilize the update by projecting on the tangent space
+* `vector_transport_method`: method for transporting stored s and y directions to the new point
+* `nonpositive_curvature_behavior`: how non-positive-definite pairs (s, y) are detected and handled in vector transport.
+  Allowed values are:
+    - `:ignore` (default): pairs whose inner product is zero are
+      omitted from the current Hessian approximation but are
+      retained in memory for further iterations. This may lead
+      to non-positive-definite Hessians and non-descent directions
+      being selected and thus needs to be handled elsewhere.
+    - `:byrd`: pairs such that `inner(M, p, X_s, Y_s) <= sy_tol * norm(M, p, Y_s)^2`
+      are removed from memory (see [ByrdLuNocedalZhu:1995](@cite),
+      Eq. (3.9) and its discussion).
+* `sy_tol`: tolerance for detecting non-positive-definite pairs (X_s, X_y).
+  The pairs may lose positive-definiteness after vector transport.
 # Constructor
@@ -597,6 +659,8 @@ mutable struct QuasiNewtonLimitedMemoryDirectionUpdate{
     initial_scale::G
     project!::Proj
     vector_transport_method::VT
+    nonpositive_curvature_behavior::Symbol
+    sy_tol::F
     message::String
 end
 function QuasiNewtonLimitedMemoryDirectionUpdate(
@@ -608,6 +672,8 @@ function QuasiNewtonLimitedMemoryDirectionUpdate(
     initial_scale::G = 1.0,
     (project!)::Proj = (copyto!),
     vector_transport_method::VTM = default_vector_transport_method(M, typeof(p)),
+    nonpositive_curvature_behavior::Symbol = :ignore,
+    sy_tol::Real = 1.0e-8,
 ) where {
     NT <: AbstractQuasiNewtonUpdateRule,
     T,
@@ -630,6 +696,8 @@ function QuasiNewtonLimitedMemoryDirectionUpdate(
     _initial_state,
     project!,
     vector_transport_method,
+    nonpositive_curvature_behavior,
+    sy_tol,
     "",
 )
 end
@@ -661,20 +729,7 @@ function (d::QuasiNewtonLimitedMemoryDirectionUpdate{InverseBFGS})(
 end
 # backward pass
 for i in m:-1:1
-        # what if division by zero happened here, setting to zero ignores this in the next step
-        # pre-compute in case inner is expensive
-        v = inner(M, p, d.memory_s[i], d.memory_y[i])
-        if iszero(v)
-            d.ρ[i] = zero(eltype(d.ρ))
-            if length(d.message) > 0
-                d.message = replace(d.message, " i=" => " i=$i,")
-                d.message = replace(d.message, "summand in" => "summands in")
-            else
-                d.message = "The inner products ⟨s_i,y_i⟩ ≈ 0, i=$i, ignoring summand
in approximation." - end - else - d.ρ[i] = 1 / v - end + # d.ρ is precomputed in the Hessian update d.ξ[i] = inner(M, p, d.memory_s[i], r) * d.ρ[i] r .-= d.ξ[i] .* d.memory_y[i] end @@ -726,13 +781,17 @@ function initialize_update!(d::QuasiNewtonLimitedMemoryDirectionUpdate) return d end +function show(io::IO, qns::QuasiNewtonLimitedMemoryDirectionUpdate) + return print(io, "QuasiNewtonLimitedMemoryDirectionUpdate with memory size $(length(qns.memory_s)) and $(qns.vector_transport_method) as vector transport.") +end + @doc """ QuasiNewtonCautiousDirectionUpdate <: AbstractQuasiNewtonDirectionUpdate These [`AbstractQuasiNewtonDirectionUpdate`](@ref)s represent any quasi-Newton update rule, which are based on the idea of a so-called cautious update. The search direction is calculated as given in [`QuasiNewtonMatrixDirectionUpdate`](@ref) or [`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref), -butut the update then is only executed if +but the update then is only executed if ```math $(_tex(:frac, "g_{x_{k+1}}(y_k,s_k)", "$(_tex(:norm, "s_k"; index = "x_{k+1}"))^{2}")) ≥ θ $(_tex(:norm, "$(_tex(:grad))f(p_k)"; index = "p_k")), diff --git a/src/plans/stepsize/initial_guess.jl b/src/plans/stepsize/initial_guess.jl index 4c9788b6a8..be6d67b7d1 100644 --- a/src/plans/stepsize/initial_guess.jl +++ b/src/plans/stepsize/initial_guess.jl @@ -145,6 +145,7 @@ function (hzi::HagerZhangInitialGuess{TF})( k::Int, last_stepsize::Real, η; lf0 = get_cost(mp, get_iterate(s)), Dlf0 = get_differential(mp, get_iterate(s), η), + kwargs... 
) where {TF <: Real} M = get_manifold(mp) p = get_iterate(s) @@ -152,6 +153,13 @@ function (hzi::HagerZhangInitialGuess{TF})( alphamax = min(hzi.alphamax, max_stepsize(M, p)) + if :stop_when_stepsize_exceeds in keys(kwargs) + alphamax = min( + kwargs[:stop_when_stepsize_exceeds], + alphamax, + ) + end + if k == 1 point_d = hzi.point_distance(M, p) # Step I0 diff --git a/src/plans/stepsize/linesearch.jl b/src/plans/stepsize/linesearch.jl index 3d2576ea41..d993b9e072 100644 --- a/src/plans/stepsize/linesearch.jl +++ b/src/plans/stepsize/linesearch.jl @@ -28,6 +28,19 @@ function Base.show(io::IO, ::MIME"text/plain", ams::Stepsize) return multiline ? status_summary(io, ams) : show(io, ams) end +""" + initialize_stepsize!(sm::Stepsize) + +Initialize the state of a stepsize functor. This function should be called in the +`initialize_solver!` function for solvers that do possess a stepsize and can be used to +set up internal state of the stepsize functor that is preserved between line searches in +the same optimization, for example adaptive thresholds for Wolfe criteria in Hager-Zhang +line search. + +By default it does nothing. +""" +initialize_stepsize!(sm::Stepsize) = sm + """ default_stepsize(M::AbstractManifold, ams::AbstractManoptSolverState) @@ -51,21 +64,40 @@ function max_stepsize(M::AbstractManifold, p) injectivity_radius(M, p) catch is_tutorial_mode() && - @warn "`max_stepsize was called, but there seems to not be an `injectivity_raidus` available on $M." + @warn "`max_stepsize was called, but there seems to not be an `injectivity_radius` available on $M." Inf end return s end +function max_stepsize(M::ProductManifold, p) + return min(map(max_stepsize, M.manifolds, submanifold_components(M, p))...) 
+end +function max_stepsize(M::AbstractPowerManifold, p) + stepsize = number_eltype(p)(Inf) + rep_size = representation_size(M.manifold) + for i in get_iterator(M) + cur_stepsize = max_stepsize(M.manifold, _read(M, rep_size, p, i)) + stepsize = min(cur_stepsize, stepsize) + end + return stepsize +end function max_stepsize(M::AbstractManifold) s = try injectivity_radius(M) catch is_tutorial_mode() && - @warn "`max_stepsize was called, but there seems to not be an `injectivity_raidus` available on $M." + @warn "`max_stepsize was called, but there seems to not be an `injectivity_radius` available on $M." Inf end return s end +function max_stepsize(M::ProductManifold) + return min(map(max_stepsize, M.manifolds)...) +end +function max_stepsize(M::AbstractPowerManifold) + return max_stepsize(M.manifold) +end + """ Linesearch <: Stepsize @@ -134,8 +166,8 @@ $(_kwargs(:retraction_method)) * `gradient = nothing`: precomputed gradient at point `p` * `report_messages_in::NamedTuple = (; )`: a named tuple of [`StepsizeMessage`](@ref)s to report messages in. 
currently supported keywords are `:non_descent_direction`, `:stepsize_exceeds`, `:stepsize_less`, `:stop_increasing`, `:stop_decreasing` -* `stop_when_stepsize_less=0.0`: to avoid numerical underflow -* `stop_when_stepsize_exceeds=`[`max_stepsize`](@ref)`(M, p) / norm(M, p, η)`) to avoid leaving the injectivity radius on a manifold +* `stop_when_stepsize_less::Real=0.0`: to avoid numerical underflow +* `stop_when_stepsize_exceeds::Real=`[`max_stepsize`](@ref)`(M, p) / norm(M, p, η)`) to avoid leaving the injectivity radius on a manifold or exceeding boundaries on a manifold with corners * `stop_increasing_at_step=100`: stop the initial increase of step size after these many steps * `stop_decreasing_at_step=`1000`: stop the decreasing search after these many steps @@ -157,13 +189,14 @@ end @doc "$_doc_linesearch_backtrack" function linesearch_backtrack!( M::AbstractManifold, q, f::TF, p, s, decrease, contract, η::T; - lf0 = f(M, p), - gradient = nothing, + lf0::Real = f(M, p), gradient = nothing, Dlf0 = isnothing(gradient) ? 
nothing : real(inner(M, p, gradient, η)), retraction_method::AbstractRetractionMethod = default_retraction_method(M, typeof(p)), - additional_increase_condition = (M, p) -> true, additional_decrease_condition = (M, p) -> true, - stop_when_stepsize_less = 0.0, stop_when_stepsize_exceeds = max_stepsize(M, p) / norm(M, p, η), - stop_increasing_at_step = 100, stop_decreasing_at_step = 1000, + additional_increase_condition = (M, p) -> true, + additional_decrease_condition = (M, p) -> true, + stop_when_stepsize_less::Real = 0.0, stop_when_stepsize_exceeds::Real = max_stepsize(M, p) / norm(M, p, η), + stop_increasing_at_step = 100, + stop_decreasing_at_step = 1000, report_messages_in::NamedTuple = (;), ) where {TF, T} ManifoldsBase.retract_fused!(M, q, p, η, s, retraction_method) @@ -177,14 +210,14 @@ function linesearch_backtrack!( while f_q < lf0 + decrease * s * Dlf0 || !additional_increase_condition(M, q) (stop_increasing_at_step == 0) && break i = i + 1 - s = s / contract + s = min(s / contract, stop_when_stepsize_exceeds) ManifoldsBase.retract_fused!(M, q, p, η, s, retraction_method) f_q = f(M, q) if i == stop_increasing_at_step set_message!(report_messages_in, :stop_increasing, at = i, bound = stop_increasing_at_step, value = s) break end - if s > stop_when_stepsize_exceeds + if s >= stop_when_stepsize_exceeds set_message!(report_messages_in, :stepsize_exceeds, at = i, bound = stop_when_stepsize_exceeds, value = s) break end diff --git a/src/plans/stepsize/stepsize.jl b/src/plans/stepsize/stepsize.jl index f24047b2d7..40ef290ec6 100644 --- a/src/plans/stepsize/stepsize.jl +++ b/src/plans/stepsize/stepsize.jl @@ -41,9 +41,10 @@ with the fields keyword arguments and the retraction is set to the default retra $(_kwargs(:retraction_method)) * `contraction_factor=0.95` * `sufficient_decrease=0.1` +* `last_stepsize=initial_stepsize` * `initial_guess=`[`ArmijoInitialGuess`](@ref)`()` * `stop_when_stepsize_less=0.0`: stop when the stepsize decreased below this version. 
-* `stop_when_stepsize_exceeds=[`max_step`](@ref)`(M)`: provide an absolute maximal step size. +* `stop_when_stepsize_exceeds=[`max_stepsize`](@ref)`(M)`: provide an absolute maximal step size. * `stop_increasing_at_step=100`: for the initial increase test, stop after these many steps * `stop_decreasing_at_step=1000`: in the backtrack, stop after these many steps """ @@ -81,17 +82,17 @@ mutable struct ArmijoLinesearchStepsize{TRM <: AbstractRetractionMethod, P, I, F M::AbstractManifold; additional_decrease_condition::DF = (M, p) -> true, additional_increase_condition::IF = (M, p) -> true, candidate_point::P = allocate_result(M, rand), - contraction_factor::Real = 0.95, initial_stepsize::Real = 1.0, + contraction_factor::Real = 0.95, initial_stepsize::Real = 1.0, last_stepsize::Real = initial_stepsize, initial_guess::IGF = ArmijoInitialGuess(), retraction_method::TRM = default_retraction_method(M), stop_when_stepsize_less::Real = 0.0, stop_when_stepsize_exceeds::Real = max_stepsize(M), stop_increasing_at_step::Integer = 100, stop_decreasing_at_step::Integer = 1000, sufficient_decrease::Real = 0.1, ) where {TRM <: AbstractRetractionMethod, P, IGF, DF, IF} R = promote_type( - typeof(contraction_factor), typeof(initial_stepsize), + typeof(contraction_factor), typeof(initial_stepsize), typeof(last_stepsize), typeof(stop_when_stepsize_exceeds), typeof(stop_when_stepsize_less), typeof(sufficient_decrease), ) - cf = convert(R, contraction_factor); is = convert(R, initial_stepsize) + cf = convert(R, contraction_factor); is = convert(R, initial_stepsize); ls = convert(R, last_stepsize) swse = convert(R, stop_when_stepsize_exceeds); swsl = convert(R, stop_when_stepsize_less) sd = convert(R, sufficient_decrease) I = promote_type(typeof(stop_increasing_at_step), typeof(stop_decreasing_at_step)) @@ -104,7 +105,7 @@ mutable struct ArmijoLinesearchStepsize{TRM <: AbstractRetractionMethod, P, I, F return ArmijoLinesearchStepsize(; additional_decrease_condition = 
additional_decrease_condition, additional_increase_condition = additional_increase_condition, - candidate_point = candidate_point, contraction_factor = cf, initial_stepsize = is, last_stepsize = is, + candidate_point = candidate_point, contraction_factor = cf, initial_stepsize = is, last_stepsize = ls, initial_guess = initial_guess, retraction_method = retraction_method, stop_when_stepsize_less = swsl, stop_when_stepsize_exceeds = swse, sufficient_decrease = sd, stop_increasing_at_step = sias, stop_decreasing_at_step = sdas, messages = msgs @@ -120,20 +121,26 @@ function (a::ArmijoLinesearchStepsize)( ) p = get_iterate(s) grad = isnothing(gradient) ? get_gradient(mp, get_iterate(s)) : gradient - return a(mp, p, grad, η; initial_guess = a.initial_guess(mp, s, k, a.last_stepsize, η)) + return a(mp, p, grad, η; initial_guess = a.initial_guess(mp, s, k, a.last_stepsize, η), kwargs...) end function (a::ArmijoLinesearchStepsize)( - mp::AbstractManoptProblem, p, X, η; initial_guess = 1.0, kwargs... + mp::AbstractManoptProblem, p, X, η; initial_guess::Real = 1.0, + stop_when_stepsize_exceeds = nothing, kwargs... 
) reset_messages!(a.messages) l = norm(get_manifold(mp), p, η) + swse = if isnothing(stop_when_stepsize_exceeds) + (a.stop_when_stepsize_exceeds / l) + else + stop_when_stepsize_exceeds + end a.last_stepsize = linesearch_backtrack!( get_manifold(mp), a.candidate_point, (M, p) -> get_cost_function(get_objective(mp))(M, p), p, initial_guess, a.sufficient_decrease, a.contraction_factor, η; gradient = X, retraction_method = a.retraction_method, stop_when_stepsize_less = (a.stop_when_stepsize_less / l), - stop_when_stepsize_exceeds = (a.stop_when_stepsize_exceeds / l), + stop_when_stepsize_exceeds = swse, stop_increasing_at_step = a.stop_increasing_at_step, stop_decreasing_at_step = a.stop_decreasing_at_step, additional_decrease_condition = a.additional_decrease_condition, @@ -685,18 +692,20 @@ end Returns the extremal of the quadratic polynomial ``p`` with ``p'(a.t)=a.df``, ``p'(b.t)=b.df``. +The result is algebraically equivalent to `(a.t * b.df - b.t * a.df) / (b.df - a.df)`, +but the formula used here is more numerically stable. + # Input * `a::UnivariateTriple{R}`: triple of bracket value `a` * `b::UnivariateTriple{R}`: triple of bracket value `b` """ function secant(a::UnivariateTriple{R}, b::UnivariateTriple{R}) where {R} - return (a.t * b.df - b.t * a.df) / (b.df - a.df) + return (a.t + b.t) / 2 + (b.t - a.t) * (a.df + b.df) / (2 * (a.df - b.df)) end """ cubic_stepsize_update_step(a::Real, b::Real, c::Real, τ::Real) - Step function to determine the stepsize update `c` described in [Hager:1989](@cite). @@ -720,6 +729,8 @@ function cubic_stepsize_update_step(a::Real, b::Real, c::Real, τ::Real) end """ + get_univariate_triple!(mp::AbstractManoptProblem, cbls::CubicBracketingLinesearchStepsize, p, η, t::Real) + Get the `UnivariateTriple` of the problem `mp` related to the step with stepsize ``t`` from ``p`` in direction ``η``. @@ -730,7 +741,7 @@ stepsize ``t`` from ``p`` in direction ``η``. 
* `η`: search direction at `p` * `t::Real`: step size """ -function get_univariate_triple!(mp::AbstractManoptProblem, cbls::CubicBracketingLinesearchStepsize, p, η, t) +function get_univariate_triple!(mp::AbstractManoptProblem, cbls::CubicBracketingLinesearchStepsize, p, η, t::Real) M = get_manifold(mp) cbls.last_stepsize = t ManifoldsBase.retract_fused!(M, cbls.candidate_point, p, η, t, cbls.retraction_method) @@ -754,7 +765,11 @@ function (cbls::CubicBracketingLinesearchStepsize)( check_curvature(c::UnivariateTriple) = abs(c.df) < cbls.sufficient_curvature * abs(init.df) n_iter = 0 - t = cbls.last_stepsize + max_step = cbls.max_stepsize + if :stop_when_stepsize_exceeds in keys(kwargs) + max_step = min(max_step, kwargs[:stop_when_stepsize_exceeds]) + end + t = min(cbls.last_stepsize, max_step) c_old = init c = get_univariate_triple!(mp, cbls, p, η, t) a, b = nothing, nothing @@ -769,9 +784,9 @@ function (cbls::CubicBracketingLinesearchStepsize)( (a, b) = c, c_old break end - (t == cbls.max_stepsize) && return t + (t == max_step) && return t t *= cbls.stepsize_increase - t = min(t, cbls.max_stepsize) + t = min(t, max_step) c_old = c c = get_univariate_triple!(mp, cbls, p, η, t) end @@ -1294,7 +1309,7 @@ mutable struct NonmonotoneLinesearchStepsize{ retraction_method::TRM = default_retraction_method(M), stepsize_reduction::R = 0.5, stop_when_stepsize_less::R = 0.0, - stop_when_stepsize_exceeds = real(max_stepsize(M)), + stop_when_stepsize_exceeds::R = real(max_stepsize(M)), stop_increasing_at_step::I = 100, stop_decreasing_at_step::I = 1000, storage::Union{Nothing, StoreStateAction} = StoreStateAction( @@ -1388,7 +1403,8 @@ function (a::NonmonotoneLinesearchStepsize)( η, p_old, X_old, - k, + k; + kwargs..., ) end function (a::NonmonotoneLinesearchStepsize)( @@ -1454,6 +1470,13 @@ function (a::NonmonotoneLinesearchStepsize)( end #compute the new step size with the help of the Barzilai-Borwein step size + l = norm(M, p, η) + local swse # COV_EXCL_LINE + if 
:stop_when_stepsize_exceeds in keys(kwargs) + swse = kwargs[:stop_when_stepsize_exceeds] + else + swse = (a.stop_when_stepsize_exceeds / l) + end a.last_stepsize = linesearch_backtrack!( M, a.candidate_point, @@ -1463,11 +1486,11 @@ function (a::NonmonotoneLinesearchStepsize)( a.sufficient_decrease, a.stepsize_reduction, η; - lf0 = maximum([a.old_costs[j] for j in 1:min(iter, memory_size)]), + lf0 = maximum(view(a.old_costs, 1:min(iter, memory_size))), gradient = X, retraction_method = a.retraction_method, - stop_when_stepsize_less = (a.stop_when_stepsize_less / norm(M, p, η)), - stop_when_stepsize_exceeds = (a.stop_when_stepsize_exceeds / norm(M, p, η)), + stop_when_stepsize_less = (a.stop_when_stepsize_less / l), + stop_when_stepsize_exceeds = swse, stop_increasing_at_step = a.stop_increasing_at_step, stop_decreasing_at_step = a.stop_decreasing_at_step, report_messages_in = a.messages, @@ -1763,7 +1786,11 @@ function (a::WolfePowellLinesearchStepsize)( max_step_increase = ifelse( isfinite(a.max_stepsize), min(1.0e9, a.max_stepsize / grad_norm), 1.0e9 ) + if :stop_when_stepsize_exceeds in keys(kwargs) + max_step_increase = min(max_step_increase, kwargs[:stop_when_stepsize_exceeds]) + end step = ifelse(isfinite(a.max_stepsize), min(1.0, a.max_stepsize / grad_norm), 1.0) + step = min(step, max_step_increase) s_plus = step s_minus = step # clear messages @@ -1790,14 +1817,14 @@ function (a::WolfePowellLinesearchStepsize)( break end end - s_plus = 2.0 * s_minus + s_plus = min(2.0 * s_minus, max_step_increase) else vector_transport_to!(M, a.candidate_direction, p, η, a.candidate_point, a.vector_transport_method) if get_differential(mp, a.candidate_point, a.candidate_direction; Y = Y) < a.sufficient_curvature * l i = 0 while fNew <= f0 + a.sufficient_decrease * step * l && (s_plus < max_step_increase) # increase - s_plus = s_plus * 2.0 + s_plus = min(s_plus * 2.0, max_step_increase) step = s_plus ManifoldsBase.retract_fused!(M, a.candidate_point, p, η, step, 
a.retraction_method) fNew = get_cost(mp, a.candidate_point) @@ -2158,3 +2185,632 @@ end function get_last_stepsize(step::WolfePowellBinaryLinesearchStepsize, ::Any...) return step.last_stepsize end + + +#### Hager-Zhang Linesearch + + +@doc """ + HagerZhangLinesearchStepsize{P,T,R<:Real} <: Linesearch + +Do a bracketing line search to find a step size ``α`` that finds a +local minimum along the search direction ``X`` starting from ``p``, +utilizing cubic polynomial interpolation using the method described in +[HagerZhang:2006:2](@cite). The function [`secant`](@ref) is used to find the minimum of the +cubic polynomial fitted to values of the cost function and its derivative at the endpoints +of the current interval. +See [`HagerZhangLinesearch`](@ref) for the mathematical details. + +# Fields + +$(_fields(:p; name = "candidate_point")) + as temporary storage for candidates +* `initial_stepsize::R`: the step size to start the search with +$(_fields(:retraction_method)) +$(_fields(:vector_transport_method)) +* `initial_guess`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `stepsize_limit`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `max_bracket_iterations`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `start_enforcing_wolfe_conditions_at_bracketing_iteration`: see keyword arguments of + [`HagerZhangLinesearch`](@ref) for details. +* `allow_early_maxstep_termination`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `wolfe_condition_mode`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `ϵ`, `δ`, `σ`, `ω`, `θ`, `γ`, `ρ`, `Δ`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. +* `secant_acceptance_ratio`: see keyword arguments of [`HagerZhangLinesearch`](@ref) for details. 
+* `candidate_direction`, `temporary_tangent`: as temporary storage for tangent vectors +* `triples`: temporary storage for function and derivative evaluations +* `last_evaluation_index`: to keep track of the number of evaluations performed so far; + points at the last filled entry of `triples`. +* `Qₖ`, `Cₖ`: to keep track of the parameters of the Wolfe condition when in adaptive mode +* `current_mode`: to keep track of the current Wolfe condition mode when in adaptive mode +* `last_stepsize`: last stepsize computed since reset +* `last_cost`: last cost value computed since reset +* `ϵₖ`: the current ϵ parameter used in the approximate Wolfe condition and bracketing + +# Constructor + + HagerZhangLinesearchStepsize(M::AbstractManifold; kwargs...) +""" +mutable struct HagerZhangLinesearchStepsize{ + TF <: Real, + TIG <: AbstractInitialLinesearchGuess, + TRM <: AbstractRetractionMethod, + TVTM <: AbstractVectorTransportMethod, + TP, + TX, + } <: Linesearch + # parameters + initial_guess::TIG + retraction_method::TRM + vector_transport_method::TVTM + stepsize_limit::TF + max_bracket_iterations::Int + start_enforcing_wolfe_conditions_at_bracketing_iteration::Int + allow_early_maxstep_termination::Bool + wolfe_condition_mode::Symbol # :standard, :approximate, :adaptive + ϵ::TF # approximate Wolfe termination parameter + δ::TF # used in approximate Wolfe condition + σ::TF # used in curvature condition + ω::TF + θ::TF # update rule parameter + γ::TF + ρ::TF + Δ::TF + secant_acceptance_ratio::TF + # storage for candidates + candidate_point::TP + candidate_direction::TX + temporary_tangent::TX + # storage for function evaluations + triples::Vector{UnivariateTriple{TF}} + last_evaluation_index::Int + # storage to be kept between outer solver iterations + Qₖ::TF + Cₖ::TF + current_mode::Symbol + # other storage + last_stepsize::TF + last_cost::TF + ϵₖ::TF + function HagerZhangLinesearchStepsize( + M::AbstractManifold; + initial_guess::TIG = HagerZhangInitialGuess(), + 
retraction_method::TRM = default_retraction_method(M), + vector_transport_method::TVTM = default_vector_transport_method(M), + initial_last_stepsize::TF = NaN, + initial_last_cost::TF = NaN, + stepsize_limit::TF = Inf, + candidate_point = allocate_result(M, rand), + candidate_direction = zero_vector(M, candidate_point), + max_bracket_iterations::Int = 10, + start_enforcing_wolfe_conditions_at_bracketing_iteration::Int = initial_guess isa ConstantStepsize ? 2 : 1, + max_function_evaluations::Int = 20, + wolfe_condition_mode::Symbol = :adaptive, + allow_early_maxstep_termination::Bool = true, + ϵ::TF = 1.0e-6, + δ::TF = 0.1, + σ::TF = 0.9, + ω::TF = 1.0e-3, + θ::TF = 0.5, + γ::TF = 0.66, + ρ::TF = 5.0, + Δ::TF = 0.7, + secant_acceptance_ratio::TF = 1.0e-8, + ) where { + TIG <: AbstractInitialLinesearchGuess, TRM <: AbstractRetractionMethod, + TVTM <: AbstractVectorTransportMethod, TF <: Real, + } + + # check parameters + @assert δ > 0 && δ < 0.5 + @assert δ <= σ + @assert σ < 1 + @assert ϵ >= 0 + @assert ω >= 0 && ω <= 1 + @assert Δ >= 0 && Δ <= 1 + @assert θ > 0 && θ < 1 + @assert γ > 0 && γ < 1 + @assert ρ > 1 + @assert stepsize_limit > 0 + @assert wolfe_condition_mode in (:standard, :approximate, :adaptive) + @assert secant_acceptance_ratio >= 0 + + # allocate storage + triples = Vector{UnivariateTriple{TF}}(undef, max_function_evaluations) + + initial_wolfe_mode = wolfe_condition_mode == :adaptive ? 
:standard : wolfe_condition_mode + + return new{TF, TIG, TRM, TVTM, typeof(candidate_point), typeof(candidate_direction)}( + initial_guess, retraction_method, vector_transport_method, stepsize_limit, + max_bracket_iterations, start_enforcing_wolfe_conditions_at_bracketing_iteration, + allow_early_maxstep_termination, wolfe_condition_mode, + ϵ, δ, σ, ω, θ, γ, ρ, Δ, secant_acceptance_ratio, + candidate_point, candidate_direction, zero_vector(M, candidate_point), + triples, 0, + 0.0, 0.0, # Qₖ, Cₖ + initial_wolfe_mode, + initial_last_stepsize, initial_last_cost, ϵ, + ) + end +end + +function initialize_stepsize!(hzls::HagerZhangLinesearchStepsize) + hzls.Qₖ = 0.0 + hzls.Cₖ = 0.0 + hzls.last_stepsize = NaN + hzls.last_cost = NaN + hzls.ϵₖ = hzls.ϵ + hzls.current_mode = hzls.wolfe_condition_mode + if hzls.current_mode === :adaptive + hzls.current_mode = :standard + end + hzls.last_evaluation_index = 0 + return hzls +end + +""" + _hz_evaluate_next_step( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, α::Real + ) + +Evaluate and store the next trial step for the Hager-Zhang linesearch. + +Given the current iterate `p`, search direction `η` (in the tangent space at `p`), and a +candidate step size `α`, this function + +1. Retracts from `p` along `η` by step `α` into `hzls.candidate_point` (using + `hzls.retraction_method`), +2. Vector-transports `η` to the candidate point into `hzls.candidate_direction` (using + `hzls.vector_transport_method`), +3. Evaluates the objective and directional derivative via + `get_cost_and_differential(mp, hzls.candidate_point, hzls.candidate_direction)`, +4. Stores the resulting triple `(α, f, df)` in `hzls.triples` and increments + `hzls.last_evaluation_index`. + +This helper is side-effecting by design; it mutates `hzls`' internal storage. + +# Return value + +Returns a tuple with three values: +- the index `i_k::Int` at which the new evaluation was stored. 
+- `evaluation_limit_termination`: `true` iff the maximum number of stored evaluations + has been reached. +- `wolfe_termination` is `true` iff the (standard or approximate) Wolfe conditions are + satisfied for the current candidate, according to `hzls.current_mode`. + +# Errors + +Throws an error if called more often than the maximum number of allocated function +evaluations (i.e. if `hzls.triples` would overflow). +""" +function _hz_evaluate_next_step( + hzls::HagerZhangLinesearchStepsize, + M::AbstractManifold, + mp::AbstractManoptProblem, + p, + η, + α::Real + ) + triples = hzls.triples + max_evals = length(triples) + if hzls.last_evaluation_index + 1 > max_evals + # this should never happen if the calling code is correct + error("Hager-Zhang linesearch exceeded maximum number of function evaluations $(length(hzls.triples)).") + end + ManifoldsBase.retract_fused!(M, hzls.candidate_point, p, η, α, hzls.retraction_method) + vector_transport_to!( + M, hzls.candidate_direction, p, η, hzls.candidate_point, hzls.vector_transport_method + ) + f, df = get_cost_and_differential(mp, hzls.candidate_point, hzls.candidate_direction; Y = hzls.temporary_tangent) + hzls.last_evaluation_index += 1 + triples[hzls.last_evaluation_index] = UnivariateTriple(α, f, df) + + wolfe_termination = false + evaluation_limit_termination = hzls.last_evaluation_index == max_evals + i_k = hzls.last_evaluation_index + if hzls.current_mode === :standard + # Eq (22) in HagerZhang:2006:2 + # equivalent to the (T1) condition + wolfe_termination = (α * hzls.δ * triples[1].df >= (triples[i_k].f - triples[1].f)) && + (triples[i_k].df >= hzls.σ * triples[1].df) + elseif hzls.current_mode === :approximate + # Eq (23) in HagerZhang:2006:2 + additional criterion in the (T2) condition + wolfe_termination = ((2 * hzls.δ - 1) * triples[1].df >= triples[i_k].df) && + (triples[i_k].df >= hzls.σ * triples[1].df) && triples[i_k].f <= triples[1].f + hzls.ϵₖ + else + error("Unknown Wolfe condition mode 
$(hzls.current_mode).") + end + + return hzls.last_evaluation_index, evaluation_limit_termination, wolfe_termination +end + +""" + _hz_bracket(hzls::HagerZhangLinesearchStepsize, c::Real, max_alpha::Real) + +Perform the bracketing phase of the Hager-Zhang linesearch starting from an initial +stepsize `c` and not exceeding `max_alpha`. + +Returns a tuple `(i_a, i_b, f_eval, f_wolfe, f_early_maxstep)` where `i_a` and `i_b` are +the indices in the stored function evaluations such that the minimum is bracketed between +`triples[i_a].t` and `triples[i_b].t`. `f_eval` is `true` if the maximum number of function +evaluations has been reached during the bracketing phase. `f_wolfe` is `true` if the Wolfe +conditions have been satisfied. `f_early_maxstep` is `true` if the maximum stepsize was +reached early with negative slope and an improvement over the initial point. +""" +function _hz_bracket( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, c::Real, max_alpha::Real + ) + # B0 + current_step = c + local c_index, f_eval, f_wolfe # COV_EXCL_LINE + ls_early_exit = false + for j in 1:hzls.max_bracket_iterations + c_index, f_eval, f_wolfe = _hz_evaluate_next_step(hzls, M, mp, p, η, current_step) + if f_eval || (f_wolfe && j >= hzls.start_enforcing_wolfe_conditions_at_bracketing_iteration) + break + end + if hzls.triples[c_index].df >= 0 + # B1 -- detecting a positive slope + # handled after the loop + break + else + if hzls.triples[c_index].f > hzls.triples[1].f + hzls.ϵₖ + # B2 -- function value gets sufficiently larger than at 0 + # perform main bracketing loop (we can skip U0-U2 checks here) + (i_a_bar, i_b_bar, f_eval, f_wolfe) = _hz_u3(hzls, M, mp, p, η, 1, c_index) + return (i_a_bar, i_b_bar, f_eval, f_wolfe, false) + else + if current_step == max_alpha + # we've reached maximum alpha so we can't expand anymore + # we handle this case after the loop + ls_early_exit = hzls.allow_early_maxstep_termination + break + end + # B3 -- 
widen the bracket + current_step *= hzls.ρ + if current_step > max_alpha + current_step = max_alpha + end + end + end + end + # we detected positive slope, ran out of iterations or reached max stepsize + # B1 seems to be the best choice for all three cases + + if ls_early_exit + # additional termination condition: we reached the maximum stepsize with negative + # slope and an improvement over the initial point, so we can exit early with this step + return (1, c_index, f_eval, f_wolfe, true) + end + + i_min = 1 + for i in 2:(hzls.last_evaluation_index - 1) + if hzls.triples[i].f <= hzls.triples[1].f + hzls.ϵₖ + i_min = i + end + end + return (i_min, c_index, f_eval, f_wolfe, false) +end + +""" + _hz_update( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, i_a::Int, i_b::Int, c::Real + ) + +Perform an update procedure of the Hager-Zhang linesearch given the current bracketing +indices `i_a` and `i_b` and a candidate stepsize `c`. + +Returns indices and termination information `(i_A, i_B, i_c, f_eval, f_wolfe)` where the +minimum is now bracketed between `alpha_values[i_A]` and `alpha_values[i_B]`. Index `i_c` +indicates the position at which evaluation of the candidate `c` was stored. If the +candidate `c` is outside of the current bracket, the last index is returned as `-1`. +`f_eval` is `true` if the maximum number of function evaluations has been reached. +`f_wolfe` is `true` if the Wolfe conditions have been satisfied at the candidate `i_c`. 
+""" +function _hz_update( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, i_a::Int, i_b::Int, c::Real + ) + # U0 + if c < hzls.triples[i_a].t || c > hzls.triples[i_b].t + return (i_a, i_b, -1, false, false) + end + i_c, f_eval, f_wolfe = _hz_evaluate_next_step(hzls, M, mp, p, η, c) + if hzls.triples[i_c].df >= 0 + # U1 + return (i_a, i_c, i_c, f_eval, f_wolfe) + else + if hzls.triples[i_c].f <= hzls.triples[1].f + hzls.ϵₖ + # U2 + return (i_c, i_b, i_c, f_eval, f_wolfe) + else + if f_eval || f_wolfe + # termination condition met + return (i_a, i_b, i_c, f_eval, f_wolfe) + else + # U3 + i_a_bar, i_b_bar, f_eval, f_wolfe = _hz_u3(hzls, M, mp, p, η, i_a, i_c) + return (i_a_bar, i_b_bar, i_c, f_eval, f_wolfe) + end + end + end +end + +function _hz_u3( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, i_a::Int, i_b::Int + ) + i_a_bar = i_a + i_b_bar = i_b + # the loop should typically terminate before exceeding the number of evaluations + f_eval = false + f_wolfe = false + while hzls.last_evaluation_index < length(hzls.triples) + # U3 (a) + d = (1 - hzls.θ) * hzls.triples[i_a_bar].t + hzls.θ * hzls.triples[i_b_bar].t + i_d, f_eval, f_wolfe = _hz_evaluate_next_step(hzls, M, mp, p, η, d) + if hzls.triples[i_d].df >= 0 || f_eval || f_wolfe + return (i_a_bar, i_d, f_eval, f_wolfe) + else + if hzls.triples[i_d].f <= hzls.triples[1].f + hzls.ϵₖ + # U3 (b) + i_a_bar = i_d + else + # U3 (c) + i_b_bar = i_d + end + end + end + return (i_a_bar, i_b_bar, f_eval, f_wolfe) +end + +""" + _hz_secant2( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, i_a::Int, i_b::Int + ) + +Perform the secant-based update in the Hager-Zhang linesearch. + +Computes a trial step using a secant interpolation of the bracketing +endpoints. If the trial step is too close to an endpoint, falls back to a +bisection step. 
Returns the updated bracketing indices and termination flags +from the internal update routine. + +# Arguments +- `hzls`: linesearch state and storage. +- `M`: manifold for retractions and transports. +- `mp`: optimization problem providing cost and differential. +- `p`: current iterate. +- `η`: search direction in the tangent space at `p`. +- `i_a`, `i_b`: indices of the current bracketing interval in `hzls.triples`. + +# Return value +Returns `(i_A, i_B, i_c, f_eval, f_wolfe)` where +- `i_A`, `i_B`: indices bracketing the minimum after the update, +- `i_c`: index of the most recent evaluation (or `-1` if the candidate was out of range), +- `f_eval`: `true` iff the evaluation limit has been reached, +- `f_wolfe`: `true` iff the Wolfe conditions are satisfied. + +# Steps (S1-S4) +- S1: compute a secant trial `c` from the current bracket and accept it unless too close to + an endpoint (otherwise use a bisection step). +- S2/S3: if the trial becomes a new endpoint, perform an update from that side. +- S4: return the updated bracket and termination flags. 
+""" +function _hz_secant2( + hzls::HagerZhangLinesearchStepsize, M::AbstractManifold, + mp::AbstractManoptProblem, p, η, i_a::Int, i_b::Int + ) + # S1 + c = secant(hzls.triples[i_a], hzls.triples[i_b]) + width = hzls.triples[i_b].t - hzls.triples[i_a].t + if abs(c - hzls.triples[i_a].t) < hzls.secant_acceptance_ratio * width || + abs(c - hzls.triples[i_b].t) < hzls.secant_acceptance_ratio * width + # secant too close to an endpoint, use bisection instead + # this case is not present in the original algorithm, but the following steps don't make much sense in this case + c = (hzls.triples[i_a].t + hzls.triples[i_b].t) / 2 + return _hz_update(hzls, M, mp, p, η, i_a, i_b, c) + end + (i_A, i_B, i_c, f_eval, f_wolfe) = _hz_update(hzls, M, mp, p, η, i_a, i_b, c) + if f_eval || f_wolfe + # not present in the original algorithm, but this seems to be the right way to handle this case + return (i_A, i_B, i_c, f_eval, f_wolfe) + end + if i_c == i_B + # S2 + c_bar = secant(hzls.triples[i_b], hzls.triples[i_B]) + # S4, part 1 + return _hz_update(hzls, M, mp, p, η, i_A, i_B, c_bar) + elseif i_c == i_A + # S3 + c_bar = secant(hzls.triples[i_a], hzls.triples[i_A]) + # S4, part 1 + return _hz_update(hzls, M, mp, p, η, i_A, i_B, c_bar) + else + # S4, part 2 + return (i_A, i_B, i_c, f_eval, f_wolfe) + end +end + +function (hzls::HagerZhangLinesearchStepsize)( + mp::AbstractManoptProblem, + s::AbstractManoptSolverState, + k::Int, + η = (-get_gradient(mp, get_iterate(s))); + fp = get_cost(mp, get_iterate(s)), + gradient = nothing, + kwargs..., + ) + M = get_manifold(mp) + p = get_iterate(s) + + dphi_0 = if !isnothing(gradient) + real(inner(M, p, η, gradient)) + else + get_differential(mp, p, η; Y = hzls.temporary_tangent) + end + hzls.triples[1] = UnivariateTriple(0.0, fp, dphi_0) + hzls.last_evaluation_index = 1 + + # update Qₖ, Cₖ + hzls.Qₖ = 1 + hzls.Qₖ * hzls.Δ + hzls.Cₖ += (abs(fp) - hzls.Cₖ) / hzls.Qₖ + + if hzls.wolfe_condition_mode == :adaptive + # Checking the V3 condition + 
if abs(hzls.last_cost - fp) <= hzls.ω * hzls.Cₖ + hzls.current_mode = :approximate + end + end + + # L0, initialization + + # handle stepsize limit + max_alpha = hzls.stepsize_limit + if :stop_when_stepsize_exceeds in keys(kwargs) + max_alpha = min( + kwargs[:stop_when_stepsize_exceeds], + max_alpha, + ) + end + # guess initial alpha + α0 = hzls.initial_guess(mp, s, k, hzls.last_stepsize, η; lf0 = fp, Dlf0 = dphi_0, stop_when_stepsize_exceeds = max_alpha) + + # in case initial_guess does not take into account the stepsize limit, we enforce it here + α0 = min(α0, max_alpha) + + # L0, bracket(c) + local i_a_j, i_b_j, f_eval, f_wolfe # COV_EXCL_LINE + (i_a_j, i_b_j, f_eval, f_wolfe, f_early_maxstep) = _hz_bracket(hzls, M, mp, p, η, α0, max_alpha) + !f_early_maxstep && while !(f_eval || f_wolfe) + # L1 + finite_at_b = isfinite(hzls.triples[i_b_j].f) + if finite_at_b + # _hz_secant2 only makes sense if we have finite function values at both ends + # but _hz_update may still work + (i_a, i_b, _i_c, f_eval, f_wolfe) = _hz_secant2(hzls, M, mp, p, η, i_a_j, i_b_j) + else + (i_a, i_b) = (i_a_j, i_b_j) + end + # L2 + # we additionally check that we can continue narrowing the bracket + if !(f_eval || f_wolfe) && + (!finite_at_b || (hzls.triples[i_b].t - hzls.triples[i_a].t) > hzls.γ * (hzls.triples[i_b_j].t - hzls.triples[i_a_j].t)) + # secant2 did not reduce the bracket sufficiently + # we need to do bisection + (i_a, i_b, _i_c, f_eval, f_wolfe) = _hz_update( + hzls, M, mp, p, η, + i_a, i_b, + (hzls.triples[i_a].t + hzls.triples[i_b].t) / 2, + ) + end + # L3 + i_a_j, i_b_j = i_a, i_b + + # loop terminates when we generate a point satisfying T1 or T2, or when we run out + # of objective evaluations + end + + hzls.last_stepsize = hzls.triples[hzls.last_evaluation_index].t + hzls.last_cost = hzls.triples[hzls.last_evaluation_index].f + return hzls.last_stepsize +end + +function Base.show(io::IO, hzls::HagerZhangLinesearchStepsize) + return print( + io, + """ + 
HagerZhangLinesearch(; + initial_guess = $(hzls.initial_guess), + retraction_method = $(hzls.retraction_method), + vector_transport_method = $(hzls.vector_transport_method), + stepsize_limit = $(hzls.stepsize_limit), + max_bracket_iterations = $(hzls.max_bracket_iterations), + start_enforcing_wolfe_conditions_at_bracketing_iteration = $(hzls.start_enforcing_wolfe_conditions_at_bracketing_iteration), + max_function_evaluations = $(length(hzls.triples)), + wolfe_condition_mode = $(hzls.wolfe_condition_mode), + ϵ = $(hzls.ϵ), δ = $(hzls.δ), σ = $(hzls.σ), + ω = $(hzls.ω), + θ = $(hzls.θ), γ = $(hzls.γ), secant_acceptance_ratio = $(hzls.secant_acceptance_ratio), + ρ = $(hzls.ρ), + Δ = $(hzls.Δ), + )""", + ) +end +function status_summary(hzls::HagerZhangLinesearchStepsize) + return "$(hzls)\nand a computed last stepsize of $(hzls.last_stepsize)" +end + +@doc """ + HagerZhangLinesearch(; kwargs...) + HagerZhangLinesearch(M::AbstractManifold; kwargs...) + +A functor representing the curvature minimizing cubic bracketing scheme introduced +in [HagerZhang:2006:2](@cite). + +The following changes were made to the original algorithm from the paper: +1. The algorithm bails out early from a secant update that is too close to one of the + endpoints and switches to bisection. The original algorithm performs a similar check at + a later stage. This precaution prevents a non-productive evaluation of the objective. +2. Added `start_enforcing_wolfe_conditions_at_bracketing_iteration`, since with a very low + stepsize initialization that satisfies Wolfe conditions we might accept the initial + stepsize and not notice that bracketing could help us reach the minimum earlier. + Setting `start_enforcing_wolfe_conditions_at_bracketing_iteration` to 1 reproduces the + behavior of the original paper. For example, a static initial stepsize equal to 1.0 could + benefit from having this parameter increased. +3. The paper isn't entirely clear about which final stepsize to return. 
This + implementation returns the last evaluated stepsize. +4. The original algorithm doesn't specify what to do when the maximum stepsize is reached + during the bracketing phase with a negative slope and an improvement over the initial + point. This implementation allows for an early termination in this case, which seems + reasonable since we can't expand the bracket anymore and this point is likely close to + the minimum. By default, this early termination is allowed, but it can be turned off via + `allow_early_maxstep_termination`, in which case the algorithm continues with the main + loop. + +## Keyword arguments + +$(_kwargs(:p; name = "candidate_point")) as temporary storage for candidates +$(_kwargs(:retraction_method)) +$(_kwargs(:vector_transport_method)) +* `initial_guess::AbstractInitialLinesearchGuess=HagerZhangInitialGuess()`: initial linesearch guess strategy +* `initial_last_stepsize::Real = NaN`: initial value for the stored last stepsize +* `initial_last_cost::Real = NaN`: initial value for the stored last cost +* `stepsize_limit::Real = Inf`: upper bound for trial stepsizes during bracketing +* `candidate_point = allocate_result(M, rand)`: storage for trial points +* `candidate_direction = zero_vector(M, candidate_point)`: storage for transported directions +* `max_bracket_iterations::Int = 10`: maximum number of bracketing iterations +* `start_enforcing_wolfe_conditions_at_bracketing_iteration::Int = initial_guess isa ConstantStepsize ? 2 : 1`: + bracketing iteration at which the Wolfe conditions start being enforced; + setting to 1 may cause no bracketing to occur when the initial guess satisfies the Wolfe + conditions. +* `max_function_evaluations::Int = 20`: maximum number of function evaluations per linesearch +* `allow_early_maxstep_termination::Bool = true`: whether to allow early termination when + the maximum stepsize is reached with negative slope and an improvement over the initial point. 
+* `wolfe_condition_mode::Symbol = :adaptive`: one of `:standard`, `:approximate`, or `:adaptive`. + Selects between (T1) and (T2) conditions in [HagerZhang:2006:2](@cite). +* `ϵ::Real = 1.0e-6`: initial allowed increase in function value in termination condition (T2). + Allowed range: `ϵ >= 0`. +* `δ::Real = 0.1`: parameter for approximate Wolfe condition. + Allowed range: `0 < δ < 0.5` and `δ <= σ`. +* `σ::Real = 0.9`: curvature condition parameter. Allowed range: `δ <= σ < 1`. +* `ω::Real = 1.0e-3`: interpolation safeguard parameter. Allowed range: `0 <= ω <= 1`. +* `θ::Real = 0.5`: bisection update parameter. Allowed range: `0 < θ < 1`. +* `γ::Real = 0.66`: determines when a bisection step is performed instead of secant. + Allowed range: `0 < γ < 1`. +* `ρ::Real = 5.0`: bracketing expansion factor. Allowed range: `ρ > 1`. +* `Δ::Real = 0.7`: Parameter controlling the rate of change of Qₖ. + Allowed range: `0 <= Δ <= 1`. +* `secant_acceptance_ratio::Real = 1.0e-8`: minimum relative interval length + for accepting secant step. Allowed range: `secant_acceptance_ratio >= 0`. + In case of rejection, a bisection step is performed instead. + +$(_note(:ManifoldDefaultFactory, "HagerZhangLinesearch")) +""" +function HagerZhangLinesearch(args...; kwargs...) + return ManifoldDefaultsFactory(HagerZhangLinesearchStepsize, args...; kwargs...) 
+end diff --git a/src/plans/stopping_criterion.jl b/src/plans/stopping_criterion.jl index 8b4cf666ac..180f4fbdbf 100644 --- a/src/plans/stopping_criterion.jl +++ b/src/plans/stopping_criterion.jl @@ -321,7 +321,6 @@ function (c::StopWhenChangeLess)(mp::AbstractManoptProblem, s::AbstractManoptSol c.storage(mp, s, k) return false end -indicates_convergence(c::StopWhenChangeLess) = false function get_reason(c::StopWhenChangeLess) if (c.last_change < c.threshold) && (c.at_iteration >= 0) return "At iteration $(c.at_iteration) the algorithm performed a step with a change ($(c.last_change)) less than $(c.threshold).\n" @@ -334,8 +333,11 @@ function status_summary(c::StopWhenChangeLess; context::Symbol = :default) s = has_stopped ? "reached" : "not reached" return (_is_inline(context) ? "|Δp| < $(c.threshold):$(_MANOPT_INDENT)" : "A stopping criterion to stop when the change of the iterate is less than $(c.threshold)\n using the $(repr(c.inverse_retraction_method))\n$(_MANOPT_INDENT)") * "$s" end +indicates_convergence(c::StopWhenChangeLess) = false function Base.show(io::IO, c::StopWhenChangeLess) - return print(io, "StopWhenChangeLess($(c.threshold); inverse_retraction_method=$(repr(c.inverse_retraction_method)))") + print(io, "StopWhenChangeLess($(c.threshold); inverse_retraction_method=$(repr(c.inverse_retraction_method))") + !ismissing(c.outer_norm) && print(io, ", outer_norm = ", c.outer_norm) + return print(io, ")") end """ @@ -381,7 +383,7 @@ function (c::StopWhenCostChangeLess)( c.last_change = 2 * c.tolerance end c.last_change = c.last_cost - c.last_cost = get_cost(problem, get_iterate(state)) + c.last_cost = get_cost(problem, state) c.last_change = c.last_change - c.last_cost if abs(c.last_change) < c.tolerance c.at_iteration = iteration @@ -410,11 +412,11 @@ end StopWhenCostLess <: StoppingCriterion store a threshold when to stop looking at the cost function of the -optimization problem from within a [`AbstractManoptProblem`](@ref), i.e 
`get_cost(p,get_iterate(o))`. +optimization problem from within an [`AbstractManoptProblem`](@ref), i.e. `get_cost(p, s)`. # Constructor - StopWhenCostLess(ε) + StopWhenCostLess(ε::Real) initialize the stopping criterion to a threshold `ε`. """ @@ -432,7 +434,7 @@ function (c::StopWhenCostLess)( if k == 0 # reset on init c.at_iteration = -1 end - c.last_cost = get_cost(p, get_iterate(s)) + c.last_cost = get_cost(p, s) if c.last_cost < c.threshold c.at_iteration = k return true @@ -466,6 +468,78 @@ function set_parameter!(c::StopWhenCostLess, ::Val{:MinCost}, v) return c end +""" + StopWhenRelativeAPosterioriCostChangeLessOrEqual <: StoppingCriterion + +A stopping criterion to stop when + +````math +\\frac{f_k - f_{k+1}}{\\max(\\lvert f_k \\rvert, \\lvert f_{k+1} \\rvert, 1)} ≤ tol, +```` + +based on Eq. (1) in [ZhuByrdLuNocedal:1997](@cite). + +# Fields +* `tolerance`: the threshold `tol` in the above formula. +$(_fields([:at_iteration, :last_change])) +* `last_cost`: the last cost value + +# Constructor + + StopWhenRelativeAPosterioriCostChangeLessOrEqual(tolerance::F) + +Initialize the stopping criterion to a threshold `tolerance` for the change of the cost function. + + StopWhenRelativeAPosterioriCostChangeLessOrEqual(; factr::Real=1.0e7) + +Initialize the tolerance to `factr * eps(typeof(factr))`, following the convention in [ZhuByrdLuNocedal:1997](@cite). 
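+
+# Example
+
+A hedged usage sketch: this criterion can be combined with an iteration cap via `|`, just like the other stopping criteria in Manopt.jl; the `factr` value here is only illustrative:
+
+```julia
+sc = StopAfterIteration(1000) | StopWhenRelativeAPosterioriCostChangeLessOrEqual(; factr = 1.0e7)
+```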
+""" +mutable struct StopWhenRelativeAPosterioriCostChangeLessOrEqual{F <: Real} <: StoppingCriterion + tolerance::F + at_iteration::Int + last_cost::F + last_change::F +end +function StopWhenRelativeAPosterioriCostChangeLessOrEqual(tol::F) where {F <: Real} + return StopWhenRelativeAPosterioriCostChangeLessOrEqual{F}(tol, -1, zero(tol), 2 * tol) +end +StopWhenRelativeAPosterioriCostChangeLessOrEqual(; factr::F = 1.0e7) where {F <: Real} = StopWhenRelativeAPosterioriCostChangeLessOrEqual(factr * eps(typeof(factr))) +function (c::StopWhenRelativeAPosterioriCostChangeLessOrEqual)( + problem::AbstractManoptProblem, state::AbstractManoptSolverState, iteration::Int + ) + if iteration <= 0 # reset on init + c.at_iteration = -1 + c.last_cost = Inf + c.last_change = 2 * c.tolerance + end + current_cost = get_cost(problem, state) + c.last_change = (c.last_cost - current_cost) / max(abs(c.last_cost), abs(current_cost), 1) + c.last_cost = current_cost + if iteration > 1 && c.last_change <= c.tolerance + c.at_iteration = iteration + return true + end + return false +end +indicates_convergence(c::StopWhenRelativeAPosterioriCostChangeLessOrEqual) = false +function get_reason(c::StopWhenRelativeAPosterioriCostChangeLessOrEqual) + if c.at_iteration >= 0 + return "At iteration $(c.at_iteration) the algorithm performed a step with a relative a posteriori cost change ($(abs(c.last_change))) less than or equal to $(c.tolerance)." + end + return "" +end +function status_summary(c::StopWhenRelativeAPosterioriCostChangeLessOrEqual) + has_stopped = (c.at_iteration >= 0) + s = has_stopped ? 
"reached" : "not reached" + return "(fₖ- fₖ₊₁)/max(|fₖ|, |fₖ₊₁|, 1) = $(abs(c.last_change)) ≤ $(c.tolerance):\t$s" +end +function Base.show(io::IO, ::MIME"text/plain", c::StopWhenRelativeAPosterioriCostChangeLessOrEqual) + return print( + io, + "StopWhenRelativeAPosterioriCostChangeLessOrEqual with threshold $(c.tolerance).\n $(status_summary(c))", + ) +end + @doc """ StopWhenEntryChangeLess @@ -722,7 +796,7 @@ Create a stopping criterion with threshold `ε` for the gradient, that is, this indicates to stop when [`get_gradient`](@ref) returns a gradient vector of norm less than `ε`, where the norm to use can be specified in the `norm=` keyword. """ -mutable struct StopWhenGradientNormLess{F, TF, N <: Union{Missing, Real}} <: StoppingCriterion +mutable struct StopWhenGradientNormLess{F, TF <: Real, N <: Union{Missing, Real}} <: StoppingCriterion norm::F threshold::TF last_change::TF @@ -730,7 +804,7 @@ mutable struct StopWhenGradientNormLess{F, TF, N <: Union{Missing, Real}} <: Sto outer_norm::N function StopWhenGradientNormLess( ε::TF; norm::F = norm, outer_norm::N = missing - ) where {F, TF, N <: Union{Missing, Real}} + ) where {F, TF <: Real, N <: Union{Missing, Real}} return new{F, TF, N}(norm, ε, zero(ε), -1, outer_norm) end end @@ -768,11 +842,87 @@ end show(io::IO, c::StopWhenGradientNormLess) = print(io, "StopWhenGradientNormLess($(c.threshold))") """ - set_parameter!(c::StopWhenGradientNormLess, :MinGradNorm, v::Float64) + set_parameter!(c::StopWhenGradientNormLess{F,TF}, ::Val{:MinGradNorm}, v::TF) where {F,TF<:Real} Update the minimal gradient norm when an algorithm shall stop """ -function set_parameter!(c::StopWhenGradientNormLess, ::Val{:MinGradNorm}, v::Float64) +function set_parameter!(c::StopWhenGradientNormLess{F, TF}, ::Val{:MinGradNorm}, v::TF) where {F, TF <: Real} + c.threshold = v + return c +end + +""" + StopWhenProjectedNegativeGradientNormLess <: StoppingCriterion + +A stopping criterion similar to [`StopWhenGradientNormLess`](@ref), 
although it checks the +norm of the projected negative gradient. It is primarily useful for optimization involving +[`Hyperrectangle`](@extref Manifolds.Hyperrectangle). + +On manifolds with boundary and manifolds with corners, for a tangent vector ``X``, +``-X`` might not be a valid tangent vector. As an example, consider the objective +``f(x)=x^2`` on the interval ``[1, 2]``. Its gradient at 1 is equal to 2, but because the +point 1 is at the boundary of the interval, the projected negative gradient is equal to 0 +because we can't go in the negative direction. +""" +mutable struct StopWhenProjectedNegativeGradientNormLess{F, TF <: Real, N <: Union{Missing, Real}} <: StoppingCriterion + norm::F + threshold::TF + last_change::TF + at_iteration::Int + outer_norm::N + function StopWhenProjectedNegativeGradientNormLess( + ε::TF; norm::F = norm, outer_norm::N = missing + ) where {F, TF <: Real, N <: Union{Missing, Real}} + return new{F, TF, N}(norm, ε, zero(ε), -1, outer_norm) + end +end + +function (sc::StopWhenProjectedNegativeGradientNormLess)( + mp::AbstractManoptProblem, s::AbstractManoptSolverState, k::Int + ) + M = get_manifold(mp) + if k == 0 # reset on init + sc.at_iteration = -1 + end + if (k > 0) + r = (has_components(M) && !ismissing(sc.outer_norm)) ? (sc.outer_norm,) : () + p = get_iterate(s) + mpg = -get_gradient(s) + embed_project!(M, mpg, p, mpg) + sc.last_change = sc.norm(M, p, mpg, r...) 
+ if sc.last_change < sc.threshold + sc.at_iteration = k + return true + end + end + return false +end +function get_reason(c::StopWhenProjectedNegativeGradientNormLess) + if (c.last_change < c.threshold) && (c.at_iteration >= 0) + return "The algorithm reached an approximately critical point after $(c.at_iteration) iterations; the projected negative gradient norm ($(c.last_change)) is less than $(c.threshold).\n" + end + return "" +end +function status_summary(c::StopWhenProjectedNegativeGradientNormLess; context::Symbol = :default) + (context === :short) && return repr(c) + has_stopped = (c.at_iteration >= 0) + s = has_stopped ? "reached" : "not reached" + (context === :inline) && return "|proj (-grad f)| < $(c.threshold): $s" + return "A StoppingCriterion to stop when the negative projected gradient norm is less than a threshold of $(c.threshold):\n$(_MANOPT_INDENT)$s" +end +indicates_convergence(c::StopWhenProjectedNegativeGradientNormLess) = true +function show(io::IO, c::StopWhenProjectedNegativeGradientNormLess) + print(io, "StopWhenProjectedNegativeGradientNormLess(", c.threshold, "; norm = ", c.norm) + !ismissing(c.outer_norm) && print(io, ", outer_norm = ", c.outer_norm) + return print(io, ")") +end + +""" + set_parameter!(c::StopWhenProjectedNegativeGradientNormLess{F,TF}, ::Val{:MinGradNorm}, v::TF) where {F, TF<:Real} + +Update the minimal gradient norm when an algorithm shall stop. +""" +function set_parameter!(c::StopWhenProjectedNegativeGradientNormLess{F, TF}, ::Val{:MinGradNorm}, v::TF) where {F, TF <: Real} c.threshold = v return c end @@ -840,13 +990,14 @@ end """ StopWhenCostNaN <: StoppingCriterion -stop looking at the cost function of the optimization problem from within a [`AbstractManoptProblem`](@ref), i.e `get_cost(p,get_iterate(o))`. +Stop the solver when the cost function of the optimization problem from within an +[`AbstractManoptProblem`](@ref) is `NaN`. The value is obtained using `get_cost(p, s)`. 
# Constructor StopWhenCostNaN() -initialize the stopping criterion to NaN. +initialize the stopping criterion with `at_iteration` equal to -1. """ mutable struct StopWhenCostNaN <: StoppingCriterion at_iteration::Int @@ -859,7 +1010,7 @@ function (c::StopWhenCostNaN)( c.at_iteration = -1 end # but still verify whether it yields NaN - if isnan(get_cost(p, get_iterate(s))) + if isnan(get_cost(p, s)) c.at_iteration = k return true end @@ -885,13 +1036,15 @@ end """ StopWhenIterateNaN <: StoppingCriterion -stop looking at the cost function of the optimization problem from within a [`AbstractManoptProblem`](@ref), i.e `get_cost(p,get_iterate(o))`. +Stop the solver when the iterate of the optimization problem from within an +[`AbstractManoptProblem`](@ref) contains `NaN` values. +The value is obtained using [`get_iterate`](@ref)`(s)`. # Constructor StopWhenIterateNaN() -initialize the stopping criterion to NaN. +Initialize the stopping criterion. """ mutable struct StopWhenIterateNaN <: StoppingCriterion at_iteration::Int diff --git a/src/solvers/FrankWolfe.jl b/src/solvers/FrankWolfe.jl index a7063bcb8c..5ee47797e1 100644 --- a/src/solvers/FrankWolfe.jl +++ b/src/solvers/FrankWolfe.jl @@ -327,6 +327,7 @@ calls_with_kwargs(::typeof(Frank_Wolfe_method!)) = (decorate_objective!, decorat function initialize_solver!(amp::AbstractManoptProblem, fws::FrankWolfeState) get_gradient!(amp, fws.X, fws.p) + initialize_stepsize!(fws.stepsize) return fws end function step_solver!(amp::AbstractManoptProblem, fws::FrankWolfeState, k) diff --git a/src/solvers/alternating_gradient_descent.jl b/src/solvers/alternating_gradient_descent.jl index aa5cd0ccdf..da43efcba3 100644 --- a/src/solvers/alternating_gradient_descent.jl +++ b/src/solvers/alternating_gradient_descent.jl @@ -268,6 +268,7 @@ function initialize_solver!( get_gradient!(amp, agds.X, agds.p) (agds.order_type == :FixedRandom || agds.order_type == :Random) && (shuffle!(agds.order)) + initialize_stepsize!(agds.stepsize) return 
agds end function step_solver!(amp::AbstractManoptProblem, agds::AlternatingGradientDescentState, k) diff --git a/src/solvers/conjugate_gradient_descent.jl b/src/solvers/conjugate_gradient_descent.jl index 49fcc93600..0fa67e6e2e 100644 --- a/src/solvers/conjugate_gradient_descent.jl +++ b/src/solvers/conjugate_gradient_descent.jl @@ -179,6 +179,7 @@ function initialize_solver!(amp::AbstractManoptProblem, cgs::ConjugateGradientDe cgs.δ = -copy(get_manifold(amp), cgs.p, cgs.X) # remember the first gradient in coefficient calculation cgs.coefficient(amp, cgs, 0) + initialize_stepsize!(cgs.stepsize) cgs.β = 0.0 return cgs end diff --git a/src/solvers/gradient_descent.jl b/src/solvers/gradient_descent.jl index 74512088fc..ddac27e73d 100644 --- a/src/solvers/gradient_descent.jl +++ b/src/solvers/gradient_descent.jl @@ -260,6 +260,7 @@ calls_with_kwargs(::typeof(gradient_descent!)) = (decorate_objective!, decorate_ # function initialize_solver!(mp::AbstractManoptProblem, s::GradientDescentState) get_gradient!(mp, s.X, s.p) + initialize_stepsize!(s.stepsize) return s end function step_solver!(p::AbstractManoptProblem, s::GradientDescentState, k) diff --git a/src/solvers/projected_gradient_method.jl b/src/solvers/projected_gradient_method.jl index d68c2fdbf9..f53bf02102 100644 --- a/src/solvers/projected_gradient_method.jl +++ b/src/solvers/projected_gradient_method.jl @@ -266,6 +266,7 @@ calls_with_kwargs(::typeof(projected_gradient_method!)) = (decorate_objective!, function initialize_solver!(amp::AbstractManoptProblem, pgms::ProjectedGradientMethodState) get_gradient!(amp, pgms.X, pgms.p) + initialize_stepsize!(pgms.stepsize) return pgms end diff --git a/src/solvers/proximal_gradient_method.jl b/src/solvers/proximal_gradient_method.jl index 03bb8ea1a9..201a6916c0 100644 --- a/src/solvers/proximal_gradient_method.jl +++ b/src/solvers/proximal_gradient_method.jl @@ -182,6 +182,7 @@ function initialize_solver!(amp::AbstractManoptProblem, pgms::ProximalGradientMe M = 
get_manifold(amp) zero_vector!(M, pgms.X, pgms.p) copyto!(M, pgms.a, pgms.p) + initialize_stepsize!(pgms.stepsize) return pgms end diff --git a/src/solvers/quasi_Newton.jl b/src/solvers/quasi_Newton.jl index 374f91e7fd..388f0e03f8 100644 --- a/src/solvers/quasi_Newton.jl +++ b/src/solvers/quasi_Newton.jl @@ -202,10 +202,10 @@ $(_args([:M, :f, :grad_f, :p])) # Keyword arguments -* `basis=`[`DefaultOrthonormalBasis`](@extref ManifoldsBase.DefaultOrthonormalBasis)`()`: +* `basis::AbstractBasis=`[`DefaultOrthonormalBasis`](@extref ManifoldsBase.DefaultOrthonormalBasis)`()`: basis to use within each of the the tangent spaces to represent the Hessian (inverse) for the cases where it is stored in full (matrix) form. -* `cautious_update=false`: +* `cautious_update::Bool=false`: whether or not to use the [`QuasiNewtonCautiousDirectionUpdate`](@ref) which wraps the `direction_update`. * `cautious_function=(x) -> x * 1e-4`: @@ -221,7 +221,7 @@ $(_kwargs(:evaluation; add_properties = [:GradientExample])) See also `initial_scale`. * `initial_scale=1.0`: scale initial `s` to use in with $(_doc_QN_init_scaling) in the computation of the limited memory approach. see also `initial_operator` -* `memory_size=20`: limited memory, number of ``s_k, y_k`` to store. +* `memory_size::Int=min(manifold_dimension(M), 20)`: limited memory, number of ``s_k, y_k`` to store. Set to a negative value to use a full memory (matrix) representation * `nondescent_direction_behavior=:reinitialize_direction_update`: specify how non-descent direction is handled. This can be @@ -323,12 +323,15 @@ function quasi_Newton!( ), stopping_criterion::StoppingCriterion = StopAfterIteration(max(1000, memory_size)) | StopWhenGradientNormLess(1.0e-6), + nonpositive_curvature_behavior::Symbol = :ignore, + sy_tol::Real = 1.0e-8, kwargs..., ) where { E <: AbstractEvaluationType, O <: Union{AbstractManifoldFirstOrderObjective{E}, AbstractDecoratedManifoldObjective{E}}, } keywords_accepted(quasi_Newton!; kwargs...) 
+ local local_dir_upd # COV_EXCL_LINE if memory_size >= 0 local_dir_upd = QuasiNewtonLimitedMemoryDirectionUpdate( M, @@ -338,7 +341,12 @@ function quasi_Newton!( initial_scale = initial_scale, (project!) = (project!), vector_transport_method = vector_transport_method, + nonpositive_curvature_behavior = nonpositive_curvature_behavior, + sy_tol = sy_tol, ) + if has_anisotropic_max_stepsize(M) + local_dir_upd = QuasiNewtonLimitedMemoryBoxDirectionUpdate(local_dir_upd) + end else local_dir_upd = QuasiNewtonMatrixDirectionUpdate( M, @@ -377,21 +385,32 @@ function quasi_Newton!( end calls_with_kwargs(::typeof(quasi_Newton!)) = (decorate_objective!, decorate_state!) +function _get_max_stepsize(M::AbstractManifold, qns::QuasiNewtonState) + current_max_stepsize = get_parameter(qns.direction_update, Val(:max_stepsize)) + if !isnothing(current_max_stepsize) && !isfinite(current_max_stepsize) + return max_stepsize(M, qns.p) / norm(M, qns.p, qns.η) + else + return current_max_stepsize + end +end + function initialize_solver!(amp::AbstractManoptProblem, qns::QuasiNewtonState) M = get_manifold(amp) get_gradient!(amp, qns.X, qns.p) copyto!(M, qns.sk, qns.p, qns.X) copyto!(M, qns.yk, qns.p, qns.X) initialize_update!(qns.direction_update) + initialize_stepsize!(qns.stepsize) return qns end function step_solver!(mp::AbstractManoptProblem, qns::QuasiNewtonState, k) M = get_manifold(mp) - get_gradient!(mp, qns.X, qns.p) + # qns.X should be the correct gradient at qns.p from initialization or the previous step qns.direction_update(qns.η, mp, qns) + current_max_stepsize = _get_max_stepsize(M, qns) if !(qns.nondescent_direction_behavior === :ignore) qns.nondescent_direction_value = real(inner(M, qns.p, qns.η, qns.X)) - if qns.nondescent_direction_value > 0 + if qns.nondescent_direction_value >= 0 if qns.nondescent_direction_behavior === :step_towards_negative_gradient || qns.nondescent_direction_behavior === :reinitialize_direction_update copyto!(M, qns.η, qns.X) @@ -400,9 +419,20 @@ function 
step_solver!(mp::AbstractManoptProblem, qns::QuasiNewtonState, k) if qns.nondescent_direction_behavior === :reinitialize_direction_update initialize_update!(qns.direction_update) end + # update direction after reinitialization to get a valid one + if qns.nondescent_direction_behavior === :step_towards_negative_gradient || + qns.nondescent_direction_behavior === :reinitialize_direction_update + qns.direction_update(qns.η, mp, qns) + current_max_stepsize = _get_max_stepsize(M, qns) + end end end - α = qns.stepsize(mp, qns, k, qns.η; gradient = qns.X) + local α # COV_EXCL_LINE + if isnothing(current_max_stepsize) + α = qns.stepsize(mp, qns, k, qns.η; gradient = qns.X) + else + α = qns.stepsize(mp, qns, k, qns.η; gradient = qns.X, stop_when_stepsize_exceeds = current_max_stepsize) + end copyto!(M, qns.p_old, get_iterate(qns)) ManifoldsBase.retract_fused!(M, qns.p, qns.p, qns.η, α, qns.retraction_method) qns.η .*= α @@ -703,6 +733,52 @@ function update_hessian!( return d end +function fill_rho_i!(M::AbstractManifold, p, d::QuasiNewtonLimitedMemoryDirectionUpdate, i::Int) + v = inner(M, p, d.memory_s[i], d.memory_y[i]) + if d.nonpositive_curvature_behavior === :ignore && iszero(v) + d.ρ[i] = zero(eltype(d.ρ)) + if length(d.message) > 0 + d.message = replace(d.message, " i=" => " i=$i,") + d.message = replace(d.message, "summand in" => "summands in") + else + d.message = "The inner products ⟨s_i,y_i⟩ ≈ 0, i=$i, ignoring summand in approximation." + end + elseif d.nonpositive_curvature_behavior === :byrd && v <= d.sy_tol * norm(M, p, d.memory_y[i]) + d.ρ[i] = zero(eltype(d.ρ)) + if length(d.message) > 0 + d.message = replace(d.message, " i=" => " i=$i,") + d.message = replace(d.message, "summand in" => "summands in") + else + d.message = "The inner products ⟨s_i,y_i⟩ <= $(d.sy_tol * norm(M, p, d.memory_y[i])), i=$i, removing summand from approximation." 
+ end + else + d.ρ[i] = 1 / v + end + return d +end + +function _drop_zero_rho_vectors!(d::QuasiNewtonLimitedMemoryDirectionUpdate{U}) where {U <: InverseBFGS} + T = eltype(d.memory_s) + memory_size = capacity(d.memory_s) + new_scb = CircularBuffer{T}(memory_size) + new_ycb = CircularBuffer{T}(memory_size) + new_ρ = similar(d.ρ) + fill!(new_ρ, 0) + j = 1 + for i in 1:length(d.memory_s) + if !iszero(d.ρ[i]) + push!(new_scb, d.memory_s[i]) + push!(new_ycb, d.memory_y[i]) + new_ρ[j] = d.ρ[i] + j += 1 + end + end + d.memory_s = new_scb + d.memory_y = new_ycb + d.ρ = new_ρ + return d +end + # Limited-memory update function update_hessian!( d::QuasiNewtonLimitedMemoryDirectionUpdate{U}, @@ -716,6 +792,7 @@ function update_hessian!( start = length(d.memory_s) == capacity(d.memory_s) ? 2 : 1 M = get_manifold(mp) p = get_iterate(st) + reforming_required = false for i in start:length(d.memory_s) # transport all stored tangent vectors in the tangent space of the next iterate vector_transport_to!( M, d.memory_s[i], p_old, d.memory_s[i], p, d.vector_transport_method ) + vector_transport_to!( M, d.memory_y[i], p_old, d.memory_y[i], p, d.vector_transport_method ) + + # a (near-)zero inner product would cause a division by zero; fill_rho_i! sets ρ[i] to zero instead, so this pair is ignored in the next step + # pre-compute ρ here in case inner is expensive + fill_rho_i!(M, p, d, i) + if d.nonpositive_curvature_behavior === :byrd && iszero(d.ρ[i]) + reforming_required = true + end + end + + if reforming_required + # the first vector pair needs to be transported too, because it most likely won't be + # overwritten by a new pair + if start == 2 + vector_transport_to!( + M, d.memory_s[1], p_old, d.memory_s[1], p, d.vector_transport_method + ) + vector_transport_to!( + M, d.memory_y[1], p_old, d.memory_y[1], p, d.vector_transport_method + ) + fill_rho_i!(M, p, d, 1) + end + _drop_zero_rho_vectors!(d) end # add newest @@ -732,6 +831,7 @@ old_sk = popfirst!(d.memory_s) copyto!(M, old_sk, st.sk) push!(d.memory_s, old_sk) + circshift!(d.ρ, -1) else 
push!(d.memory_s, copy(M, st.sk)) end @@ -742,6 +842,9 @@ function update_hessian!( else push!(d.memory_y, copy(M, st.yk)) end + + fill_rho_i!(M, p, d, length(d.memory_s)) + return d end @@ -770,7 +873,27 @@ function update_hessian!( vector_transport_to!( M, d.update.memory_y[i], p_old, d.update.memory_y[i], p, d.update.vector_transport_method, ) + fill_rho_i!(M, p, d.update, i) end end return d end + +function get_cost( + mp::AbstractManoptProblem, s::QuasiNewtonState{ + P, T, + <:AbstractQuasiNewtonDirectionUpdate, + <:StoppingCriterion, + <:HagerZhangLinesearchStepsize, + } + ) where {P, T} + + hzls = s.stepsize + if hzls.last_evaluation_index === 0 + # if no evaluation was performed, we need to compute the cost + return get_cost(mp, s.p) + else + # we can reuse the stored function value from the linesearch + return hzls.triples[hzls.last_evaluation_index].f + end +end diff --git a/src/solvers/solver.jl b/src/solvers/solver.jl index a26515faae..ec64db0a65 100644 --- a/src/solvers/solver.jl +++ b/src/solvers/solver.jl @@ -189,3 +189,20 @@ end function stop_solver!(p::AbstractManoptProblem, s::ReturnSolverState, k) return stop_solver!(p, s.state, k) end + +""" + get_cost(p::AbstractManoptProblem, s::AbstractManoptSolverState) + +Get cost at the current iterate of the solver state `s` for the problem `p`. +The method may be implemented by particular solvers if they store the cost at the current +iterate in the state, but by default it is obtained by calling `get_cost(p, get_iterate(s))`. 
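+
+For example, a solver whose state caches the cost of the current iterate could specialize this method to avoid re-evaluating the objective. A hedged sketch; the state type `MyCachedState` and its `last_cost` field are hypothetical:
+
+```julia
+function get_cost(p::AbstractManoptProblem, s::MyCachedState)
+    # fall back to evaluating the objective when no cached value is available
+    return isnan(s.last_cost) ? get_cost(p, get_iterate(s)) : s.last_cost
+end
+```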
+""" +function get_cost(p::AbstractManoptProblem, s::AbstractManoptSolverState) + return _get_cost(p, s, dispatch_state_decorator(s)) +end +function _get_cost(p::AbstractManoptProblem, s::AbstractManoptSolverState, ::Val{false}) + return get_cost(p, get_iterate(s)) +end +function _get_cost(p::AbstractManoptProblem, s::AbstractManoptSolverState, ::Val{true}) + return get_cost(p, s.state) +end diff --git a/src/solvers/stochastic_gradient_descent.jl b/src/solvers/stochastic_gradient_descent.jl index 0e40d5fc5c..9448b3d4a4 100644 --- a/src/solvers/stochastic_gradient_descent.jl +++ b/src/solvers/stochastic_gradient_descent.jl @@ -295,6 +295,7 @@ calls_with_kwargs(::typeof(stochastic_gradient_descent!)) = (decorate_objective! function initialize_solver!(::AbstractManoptProblem, s::StochasticGradientDescentState) s.k = 1 (s.order_type == :FixedRandom) && (shuffle!(s.order)) + initialize_stepsize!(s.stepsize) return s end function step_solver!(mp::AbstractManoptProblem, s::StochasticGradientDescentState, iter) diff --git a/src/solvers/subgradient.jl b/src/solvers/subgradient.jl index a2579adbd8..5e4198daea 100644 --- a/src/solvers/subgradient.jl +++ b/src/solvers/subgradient.jl @@ -201,6 +201,7 @@ function initialize_solver!(mp::AbstractManoptProblem, sgs::SubGradientMethodSta M = get_manifold(mp) copyto!(M, sgs.p_star, sgs.p) sgs.X = zero_vector(M, sgs.p) + initialize_stepsize!(sgs.stepsize) return sgs end function step_solver!(mp::AbstractManoptProblem, sgs::SubGradientMethodState, k) diff --git a/test/helpers/test_linesearches.jl b/test/helpers/test_linesearches.jl index e942c3b9f5..1e10f37d6c 100644 --- a/test/helpers/test_linesearches.jl +++ b/test/helpers/test_linesearches.jl @@ -59,4 +59,20 @@ using Test initialize_solver!(mp, st_qn) ls_mt = Manopt.LineSearchesStepsize(M, LineSearches.MoreThuente()) @test_throws ErrorException ls_mt(mp_throw, st_qn, 1; fp = rosenbrock(M, x0)) + + # test max stepsize limit enforcement + @test ls_hz(mp, st_qn, 1, [1.0, 2.0, 3.0, 4.0, 
0.0]; stop_when_stepsize_exceeds = 0.1) == 0.1 + + @testset "max stepsize limit setting" begin + lss = [ + LineSearches.MoreThuente(), + LineSearches.HagerZhang(), + ] + for ls in lss + nls = Manopt.linesearches_set_max_alpha(ls, 0.5) + @test Manopt.linesearches_get_max_alpha(nls) == 0.5 + nls2 = Manopt.linesearches_set_max_alpha(ls, Inf) + @test Manopt.linesearches_get_max_alpha(nls2) == Inf + end + end end diff --git a/test/helpers/test_manifold_extra_functions.jl b/test/helpers/test_manifold_extra_functions.jl index f7a7540429..c775cb1cc9 100644 --- a/test/helpers/test_manifold_extra_functions.jl +++ b/test/helpers/test_manifold_extra_functions.jl @@ -69,6 +69,7 @@ Random.seed!(42) R3 = Euclidean(3) TR3 = TangentBundle(R3) + p = [0.0, 1.0, 0.0] X = [0.0, 0.0, 0.0] @@ -83,6 +84,14 @@ Random.seed!(42) @test Manopt.max_stepsize(R3, p) == Inf @test Manopt.max_stepsize(TR3, ArrayPartition(p, X)) == Inf + S_R3 = ProductManifold(M, R3) + @test Manopt.max_stepsize(S_R3) ≈ π + @test Manopt.max_stepsize(S_R3, ArrayPartition(p, [0.0, 0.0, 0.0])) ≈ π + + S_pow = PowerManifold(M, NestedPowerRepresentation(), 3) + @test Manopt.max_stepsize(S_pow) ≈ π + @test Manopt.max_stepsize(S_pow, [p, p, p]) ≈ π + Mfr = FixedRankMatrices(5, 4, 2) pfr = SVDMPoint( [ @@ -103,6 +112,9 @@ Random.seed!(42) M = Hyperrectangle([-3, -1.5], [3, 1.5]) @test Manopt.max_stepsize(M) ≈ 6.0 @test Manopt.max_stepsize(M, [-1, 0.5]) ≈ 4.0 + + M = ProbabilitySimplex(3) + @test Manopt.max_stepsize(M) == 1.0 end @testset "Vector space default" begin @test Manopt.Rn(Val(:Manopt), 3) isa ManifoldsBase.DefaultManifold diff --git a/test/plans/test_stepsize.jl b/test/plans/test_stepsize.jl index a66b8beafd..60c0f3ce5c 100644 --- a/test/plans/test_stepsize.jl +++ b/test/plans/test_stepsize.jl @@ -87,6 +87,7 @@ end s3 = WolfePowellBinaryLinesearch()(M) @test Manopt.get_message(s3) == "" @test startswith(repr(s3), "WolfePowellBinaryLinesearchStepsize(;") + @test get_last_stepsize(s3) == 0.0 @test 
startswith(Manopt.status_summary(s3), "A Wolfe Powell bisection line search") # no stepsize yet so `repr` and summary are the same s4 = WolfePowellLinesearch()(M) @@ -262,6 +263,534 @@ end clbs = CubicBracketingLinesearch(; sufficient_curvature = 1.0e-16, min_bracket_width = 0.0, initial_stepsize = 0.5)(M) @test clbs(dmp, gs, 1) ≈ 1 / 6 atol = 5.0e-4 end + @testset "secant numerical stability" begin + # Large offset, small interval + a = 1.0e7 + b = a + 1.0e-6 + + # Choose derivatives that differ slightly + ga = 1.0 + gb = nextfloat(ga) # smallest representable difference + + # minimizer using affine formula + x_ref = a - ga * (b - a) / (gb - ga) + + err_secant = abs( + Manopt.secant( + Manopt.UnivariateTriple(a, 0.0, ga), + Manopt.UnivariateTriple(b, 0.0, gb) + ) - x_ref + ) + @test err_secant < 1.0e-6 + end + @testset "HagerZhang Linesearch Stepsize" begin + M = Euclidean(2) + f_sum_sq(M, p) = sum(p .^ 2) + grad_f_sum_sq(M, p) = 2 .* p + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_sum_sq, grad_f_sum_sq)) + p = [1.0, 2.0] + η = -grad_f_sum_sq(M, p) + gs = GradientDescentState(M; p = p) + + hzls = HagerZhangLinesearch()(M) + @test startswith(repr(hzls), "HagerZhangLinesearch(;") + @test startswith(Manopt.status_summary(hzls), "HagerZhangLinesearch(;") + @test Manopt.get_message(hzls) == "" + + α = hzls(dmp, gs, 1, η) + @test isfinite(α) + @test α > 0 + α2 = hzls(dmp, gs, 1, η; gradient = grad_f_sum_sq(M, p)) + @test α2 ≈ α + @test hzls.last_stepsize == α + @test hzls.last_cost <= f_sum_sq(M, p) + 1.0e-12 + + hzls_limit = Manopt.HagerZhangLinesearchStepsize(M; stepsize_limit = 0.05) + α_limit = hzls_limit(dmp, gs, 1, η) + @test α_limit <= 0.05 + eps(0.05) + @test hzls_limit.last_stepsize == α_limit + α_limit_kw = hzls_limit(dmp, gs, 2, η; stop_when_stepsize_exceeds = 0.01) + @test α_limit_kw <= 0.01 + eps(0.01) + @testset "Running out of evaluations in _hz_evaluate_next_step" begin + N = length(hzls_limit.triples) - hzls_limit.last_evaluation_index + 
for i in 1:N + Manopt._hz_evaluate_next_step(hzls_limit, M, dmp, p, η, 0.1) + end + @test_throws ErrorException Manopt._hz_evaluate_next_step(hzls_limit, M, dmp, p, η, 0.1) + end + @testset "Wolfe condition modes" begin + hzls_default = Manopt.HagerZhangLinesearchStepsize(M) + hzls_default.current_mode = :invalid_mode + @test_throws ErrorException hzls_default(dmp, gs, 1, η) + end + + hzls_approx = Manopt.HagerZhangLinesearchStepsize( + M; wolfe_condition_mode = :approximate, stepsize_limit = 0.2 + ) + α_approx = hzls_approx(dmp, gs, 1, η) + @test α_approx > 0 + + @testset "termination modes" begin + hzls_std = Manopt.HagerZhangLinesearchStepsize( + M; + wolfe_condition_mode = :standard, + initial_guess = Manopt.ConstantInitialGuess(0.5), + max_function_evaluations = 5, + ) + α_std = hzls_std(dmp, gs, 1, η) + @test isapprox(α_std, 0.5; rtol = 1.0e-12, atol = 0.0) + @test hzls_std.current_mode == :standard + + hzls_adapt = Manopt.HagerZhangLinesearchStepsize( + M; + wolfe_condition_mode = :adaptive, + initial_guess = Manopt.ConstantInitialGuess(0.5), + initial_last_cost = f_sum_sq(M, p), + ω = 1.0, + max_function_evaluations = 5, + ) + α_adapt = hzls_adapt(dmp, gs, 1, η) + @test α_adapt > 0 + @test hzls_adapt.current_mode == :approximate + + hzls_eval = Manopt.HagerZhangLinesearchStepsize( + M; + wolfe_condition_mode = :standard, + initial_guess = Manopt.ConstantInitialGuess(1.0), + max_function_evaluations = 2, + ) + α_eval = hzls_eval(dmp, gs, 1, η) + @test α_eval > 0 + @test hzls_eval.last_evaluation_index == length(hzls_eval.triples) + end + @testset "B1 bracketing test" begin + M = Euclidean(1) + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 .* p + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [1.0] + η = -grad_f(M, p) + gs = GradientDescentState(M; p = p) + hzls_b1 = Manopt.HagerZhangLinesearchStepsize( + M; + initial_guess = Manopt.ConstantInitialGuess(0.75), + start_enforcing_wolfe_conditions_at_bracketing_iteration = 2, + max_bracket_iterations = 
1, + ) + α_b1 = hzls_b1(dmp, gs, 1, η) + @test α_b1 > 0 + end + @testset "B2 bracketing test" begin + M = Euclidean(1) + # f(x) = -22 x^3 + 33 x^2 - x + # grad_f(x) = -66 x^2 + 66 x - 1 + f(M, p) = -22 * p[1]^3 + 33 * p[1]^2 - p[1] + grad_f(M, p) = [-66 * p[1]^2 + 66 * p[1] - 1] + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [0.0] + η = [1.0] # Descent direction + gs = GradientDescentState(M; p = p) + hzls_b2 = Manopt.HagerZhangLinesearchStepsize( + M; + initial_guess = Manopt.ConstantInitialGuess(1.0), + start_enforcing_wolfe_conditions_at_bracketing_iteration = 2, + max_bracket_iterations = 2, + ) + α = hzls_b2(dmp, gs, 1, η) + @test α > 0 + end + @testset "B3 bracketing test" begin + M = Euclidean(1) + # f(x) = -x + # grad_f(x) = -1 + f(M, p) = -p[1] + grad_f(M, p) = [-1.0] + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [0.0] + η = [1.0] # Descent direction + gs = GradientDescentState(M; p = p) + hzls_b3 = Manopt.HagerZhangLinesearchStepsize( + M; + initial_guess = Manopt.ConstantInitialGuess(1.0), + stepsize_limit = 2.0, + max_bracket_iterations = 2, + ) + α = hzls_b3(dmp, gs, 1, η) + @test α > 0 + end + @testset "U1 trigger test" begin + M = Euclidean(1) + # f(x) = x^2 / 2 + # grad_f(x) = x + f(M, p) = p[1]^2 / 2 + grad_f(M, p) = [p[1]] + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [1.0] + η = [-1.0] # Descent direction + gs = GradientDescentState(M; p = p) + hzls_u1 = Manopt.HagerZhangLinesearchStepsize( + M; + initial_guess = Manopt.ConstantInitialGuess(2.0), + ) + # We expect U1 to be triggered during the update (secant is exact, slope 0 >= 0) + α = hzls_u1(dmp, gs, 1, η) + @test α > 0 + end + @testset "U2 trigger test" begin + M = Euclidean(1) + # We mock f and grad_f to trigger U2 termination + # We need: + # 1. Starting at p=0 with descent direction (df < 0) + # 2. Bracketing finds a point with df > 0 (to finish bracketing) -> p=1.0, df=1.0 + # 3. 
Refinement hits max evaluations at a point with df < 0 and f > f(0)+eps -> p=0.5, f=10.0, df=-0.1 + + function f_u2(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return 0.0 + elseif isapprox(v, 1.0; atol = 1.0e-9) + return 0.0 + elseif isapprox(v, 0.5; atol = 1.0e-9) + return 10.0 + end + return 0.0 + end + + function grad_f_u2(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-9) + return [1.0] + elseif isapprox(v, 0.5; atol = 1.0e-9) + return [-0.1] + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_u2, grad_f_u2)) + p = [0.0] + η = [1.0] + gs = GradientDescentState(M; p = p) + hzls_u2 = Manopt.HagerZhangLinesearchStepsize( + M; initial_guess = Manopt.ConstantInitialGuess(1.0), max_function_evaluations = 3 + ) + α = hzls_u2(dmp, gs, 1, η) + @test α > 0 + end + @testset "U3 trigger test" begin + M = Euclidean(1) + # Trigger U3 by having a point that satisfies conditions for U2 but f_eval is false. 
+ # Same landscape as U2: + # p=0, df=-1 (start) + # p=1, df=1 (end of bracket) + # p=0.5, f=10, df=-0.1 (high function value, negative slope) + + function f_u3(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return 0.0 + elseif isapprox(v, 1.0; atol = 1.0e-9) + return 0.0 + elseif isapprox(v, 0.5; atol = 1.0e-9) + return 10.0 + end + return 0.0 + end + + function grad_f_u3(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-9) + return [1.0] + elseif isapprox(v, 0.5; atol = 1.0e-9) + return [-0.1] + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_u3, grad_f_u3)) + p = [0.0] + η = [1.0] # Descent direction + gs = GradientDescentState(M; p = p) + # Set max_function_evaluations > 3 so we don't hit U2 termination (f_eval=true) + hzls_u3 = Manopt.HagerZhangLinesearchStepsize( + M; initial_guess = Manopt.ConstantInitialGuess(1.0), max_function_evaluations = 5 + ) + α = hzls_u3(dmp, gs, 1, η) + @test α > 0 + end + @testset "U3 (b) trigger test" begin + M = Euclidean(1) + # Force U3 (b) in _hz_u3: + # 1) At d=0.5 we need df < 0 and f(d) <= f(0) + ϵₖ with no termination, + # so i_a_bar gets updated to i_d. + # 2) On the next U3 iteration we return from the loop. 
+ function f_u3b(M, q) + return 0.0 + end + + function grad_f_u3b(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-12) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-12) + return [1.0] + elseif isapprox(v, 0.5; atol = 1.0e-12) + return [-1.0] + elseif isapprox(v, 0.75; atol = 1.0e-12) + return [1.0] + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_u3b, grad_f_u3b)) + p = [0.0] + η = [1.0] + hzls_u3b = Manopt.HagerZhangLinesearchStepsize(M; max_function_evaluations = 4) + Manopt.initialize_stepsize!(hzls_u3b) + Manopt._hz_evaluate_next_step(hzls_u3b, M, dmp, p, η, 0.0) + Manopt._hz_evaluate_next_step(hzls_u3b, M, dmp, p, η, 1.0) + + (i_a, i_b, f_eval, f_wolfe) = Manopt._hz_u3(hzls_u3b, M, dmp, p, η, 1, 2) + @test (i_a, i_b) == (3, 4) + @test f_eval + @test !f_wolfe + end + @testset "U3 (c) info trigger test" begin + M = Euclidean(1) + # Force U3 (c) inside _hz_u3 by making the mid-point have + # negative slope but too large function value. 
+ function f_u3c(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-12) + return 0.0 + elseif isapprox(v, 1.0; atol = 1.0e-12) + return 0.0 + elseif isapprox(v, 0.5; atol = 1.0e-12) + return 1.0 + elseif isapprox(v, 0.25; atol = 1.0e-12) + return 0.0 + end + return 0.0 + end + + function grad_f_u3c(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-12) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-12) + return [1.0] + elseif isapprox(v, 0.5; atol = 1.0e-12) + return [-0.1] + elseif isapprox(v, 0.25; atol = 1.0e-12) + return [0.1] + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_u3c, grad_f_u3c)) + p = [0.0] + η = [1.0] + hzls_u3c = Manopt.HagerZhangLinesearchStepsize(M; max_function_evaluations = 4) + Manopt.initialize_stepsize!(hzls_u3c) + Manopt._hz_evaluate_next_step(hzls_u3c, M, dmp, p, η, 0.0) + Manopt._hz_evaluate_next_step(hzls_u3c, M, dmp, p, η, 1.0) + @test (1, 4, true, false) == Manopt._hz_u3(hzls_u3c, M, dmp, p, η, 1, 2) + end + @testset "U3 max evaluations termination" begin + M = Euclidean(1) + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 .* p + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [0.0] + η = [1.0] + + hzls_u3_max = Manopt.HagerZhangLinesearchStepsize(M; max_function_evaluations = 2) + Manopt.initialize_stepsize!(hzls_u3_max) + Manopt._hz_evaluate_next_step(hzls_u3_max, M, dmp, p, η, 0.0) + Manopt._hz_evaluate_next_step(hzls_u3_max, M, dmp, p, η, 1.0) + @test hzls_u3_max.last_evaluation_index == length(hzls_u3_max.triples) + + (i_a, i_b, f_eval, f_wolfe) = Manopt._hz_u3(hzls_u3_max, M, dmp, p, η, 1, 2) + @test (i_a, i_b) == (1, 2) + @test !f_eval + @test !f_wolfe + end + @testset "U0 out-of-bracket early return" begin + M = Euclidean(1) + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 .* p + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f, grad_f)) + p = [0.0] + η = [1.0] + + hzls_u0 = Manopt.HagerZhangLinesearchStepsize(M; max_function_evaluations = 5) + 
Manopt.initialize_stepsize!(hzls_u0) + Manopt._hz_evaluate_next_step(hzls_u0, M, dmp, p, η, 0.0) + Manopt._hz_evaluate_next_step(hzls_u0, M, dmp, p, η, 1.0) + + last_eval_before = hzls_u0.last_evaluation_index + + # c is left of bracket [0, 1] -> U0 early return + @test (1, 2, -1, false, false) == Manopt._hz_update(hzls_u0, M, dmp, p, η, 1, 2, -0.1) + @test hzls_u0.last_evaluation_index == last_eval_before + + # c is right of bracket [0, 1] -> U0 early return + @test (1, 2, -1, false, false) == Manopt._hz_update(hzls_u0, M, dmp, p, η, 1, 2, 1.1) + @test hzls_u0.last_evaluation_index == last_eval_before + end + + @testset "S2 trigger test" begin + M = Euclidean(1) + # S2 is triggered within _hz_secant2 when the updated bracket point i_c is the new upper bound i_B + # This happens if slope at c is positive (U1 case in _hz_update). + # Sequence: + # 1. Start p=0, df=-1. + # 2. Initial bracket p=1, df=4 (df > 0 -> bracket found). + # 3. _hz_secant2 calls secant(0, 1) -> c = (0*4 - 1*(-1))/(4 - (-1)) = 0.2. + # 4. At c=0.2, we set df=0.1 (positive slope -> U1 -> i_c = i_B). + # 5. We also need f(0.2) high enough to fail Armijo so we don't return early with f_wolfe=true. + # f(0)=0. f(0.2)=0.5. Armijo check: 0.5 <= 0 + 0.1*0.2*(-1) = -0.02 (False). + + function f_s2(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return 0.0 + elseif isapprox(v, 1.0; atol = 1.0e-9) + return 2.0 # Arbitrary high value + elseif isapprox(v, 0.2; atol = 1.0e-9) + return 0.5 # Fail Armijo + end + return 0.0 # Fallback (e.g. 
for c_bar in S2) + end + + function grad_f_s2(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-9) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-9) + return [4.0] + elseif isapprox(v, 0.2; atol = 1.0e-9) + return [0.1] # Positive slope triggers U1 -> i_c = i_B + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_s2, grad_f_s2)) + p = [0.0] + η = [1.0] + gs = GradientDescentState(M; p = p) + hzls_s2 = Manopt.HagerZhangLinesearchStepsize( + M; initial_guess = Manopt.ConstantInitialGuess(1.0) + ) + # We expect the S2 log + α = hzls_s2(dmp, gs, 1, η) + @test α > 0 + end + + @testset "S3 trigger test" begin + M = Euclidean(1) + # S3 is triggered within _hz_secant2 when the updated bracket point i_c is the new lower bound i_A + # (U2 case in _hz_update). We set up: + # 1. Start p=0, df=-1 (descent). + # 2. Bracket at p=1, df=4 (positive slope). + # 3. Secant gives c=0.2. At c, df=-0.1 and f=0 -> U2. + + function f_s3(M, q) + return 0.0 + end + + function grad_f_s3(M, q) + v = q[1] + if isapprox(v, 0.0; atol = 1.0e-12) + return [-1.0] + elseif isapprox(v, 1.0; atol = 1.0e-12) + return [4.0] + elseif isapprox(v, 0.2; atol = 1.0e-12) + return [-0.1] + end + return [0.0] + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_s3, grad_f_s3)) + p = [0.0] + η = [1.0] + hzls_s3 = Manopt.HagerZhangLinesearchStepsize(M; max_function_evaluations = 5) + Manopt.initialize_stepsize!(hzls_s3) + Manopt._hz_evaluate_next_step(hzls_s3, M, dmp, p, η, 0.0) + Manopt._hz_evaluate_next_step(hzls_s3, M, dmp, p, η, 1.0) + + c = Manopt.secant(hzls_s3.triples[1], hzls_s3.triples[2]) + (i_A, i_B, i_c, f_eval, f_wolfe) = Manopt._hz_secant2(hzls_s3, M, dmp, p, η, 1, 2) + @test !f_eval + @test !f_wolfe + @test hzls_s3.triples[i_A].t ≈ c atol = 1.0e-12 + + c_bar = Manopt.secant(hzls_s3.triples[1], hzls_s3.triples[i_A]) + @test hzls_s3.triples[i_c].t ≈ c_bar atol = 1.0e-12 + @test i_A != i_B + end + + @testset "Hager-Zhang infinite at b" begin + 
# A function that is finite for small steps but infinite for larger ones + # and has positive slope where it is infinite to trigger the bracket condition. + + M = Euclidean(1) + + # f(x) = x^2 - x for x < 1.0 + # f(x) = Inf for x >= 1.0 + # Min at x = 0.5, f(0.5) = -0.25 + function f_inf(M, p) + x = p[1] + if x < 1.0 + return x^2 - x + else + return Inf + end + end + + function grad_f_inf(M, p) + x = p[1] + if x < 1.0 + return [2 * x - 1] + else + # Return a positive slope to satisfy _hz_bracket exit condition + return [1.0] + end + end + + dmp = DefaultManoptProblem(M, ManifoldGradientObjective(f_inf, grad_f_inf)) + + # Start at 0. f(0)=0. grad(0)=-1. Search direction +1. + s = GradientDescentState(M; p = [0.0]) + + # Force initial guess to be 2.0 (in the infinite region) + hzls = HagerZhangLinesearch(; initial_guess = Manopt.ConstantInitialGuess(2.0))(M) + + # Because initial bracket will be [0, 2] with f(2)=Inf. + # Then bisection will eventually find 0.5. + + step = hzls(dmp, s, 1, [1.0]) + @test abs(step - 0.5) < 1.0e-1 + end + + @testset "Hager-Zhang initialize_stepsize!" 
begin + hzls = HagerZhangLinesearch()(M) + hzls.last_evaluation_index = 5 + hzls.Qₖ = 2.0 + hzls.Cₖ = 2.0 + hzls.current_mode = :approximate + Manopt.initialize_stepsize!(hzls) + @test hzls.last_evaluation_index == 0 + @test hzls.Qₖ == 0.0 + @test hzls.Cₖ == 0.0 + @test hzls.current_mode == :standard + end + + end @testset "Distance over Gradients Stepsize" begin @testset "does not use sectional cuvature (Eucludian)" begin M = Euclidean(2) diff --git a/test/plans/test_stopping_criteria.jl b/test/plans/test_stopping_criteria.jl index c65516c2e9..00f5fb66bb 100644 --- a/test/plans/test_stopping_criteria.jl +++ b/test/plans/test_stopping_criteria.jl @@ -1,16 +1,14 @@ using Manifolds, ManifoldsBase, Manopt, Test, ManifoldsBase, Dates -function repl_show_string(e) - a = IOBuffer() - Base.show(a, MIME"text/plain"(), e) - return String(take!(a)) +function to_display_string(obj) + buf = IOBuffer() + Base.show(buf, MIME"text/plain"(), obj) + return String(take!(buf)) end @testset "StoppingCriteria" begin @testset "Generic Tests" begin - @test_throws ErrorException get_stopping_criteria( - Manopt.Test.DummyStoppingCriteriaSet() - ) + @test_throws ErrorException get_stopping_criteria(Manopt.Test.DummyStoppingCriteriaSet()) sa = StopAfterIteration(10) sb = StopWhenChangeLess(Euclidean(), 0.1) s = StopWhenAll(sa, sb) @@ -28,8 +26,7 @@ end s.criteria[2].at_iteration = 3 @test length(get_reason(s.criteria[2])) > 0 s2 = StopWhenAll([StopAfterIteration(10), StopWhenChangeLess(Euclidean(), 0.1)]) - @test get_stopping_criteria(s)[1].max_iterations == - get_stopping_criteria(s2)[1].max_iterations + @test get_stopping_criteria(s)[1].max_iterations == get_stopping_criteria(s2)[1].max_iterations s3 = StopWhenCostLess(0.1) p = DefaultManoptProblem( @@ -392,6 +389,50 @@ end @test length(get_reason(sc)) == 0 end + @testset "StopWhenRelativeAPosterioriCostChangeLessOrEqual" begin + sc = StopWhenRelativeAPosterioriCostChangeLessOrEqual(; factr = 100.0) + prob = DefaultManoptProblem( + 
Euclidean(), ManifoldGradientObjective((M, x) -> x^2, x -> 2x) + ) + s = GradientDescentState(Euclidean(); p = 1.0) + @test !sc(prob, s, -1) + @test !sc(prob, s, 1) + @test length(get_reason(sc)) == 0 + s.p = 1.0 - 1.0e-14 + + @test sc(prob, s, 2) + @test length(get_reason(sc)) > 0 + @test startswith( + to_display_string(sc), + "StopWhenRelativeAPosterioriCostChangeLessOrEqual with threshold", + ) + @test startswith(Manopt.status_summary(sc), "(fₖ- fₖ₊₁)/max(|fₖ|, |fₖ₊₁|, 1) = ") + @test !Manopt.indicates_convergence(sc) + end + + @testset "StopWhenProjectedNegativeGradientNormLess" begin + sc = StopWhenProjectedNegativeGradientNormLess(1.0e-10) + M = Hyperrectangle([1.0], [2.0]) + prob = DefaultManoptProblem( + M, ManifoldGradientObjective((M, x) -> x^2, x -> 2x) + ) + s = GradientDescentState(M; p = [1.0], X = [2.0]) + @test !sc(prob, s, -1) + @test length(get_reason(sc)) == 0 + @test sc(prob, s, 1) + @test length(get_reason(sc)) > 0 + + @test startswith( + to_display_string(sc), + "A StoppingCriterion to stop when the negative projected gradient norm is less than", + ) + @test startswith(Manopt.status_summary(sc; context = :inline), "|proj (-grad f)| < 1.0e-10") + + Manopt.set_parameter!(sc, Val(:MinGradNorm), 1.0e-5) + @test sc.threshold == 1.0e-5 + @test Manopt.indicates_convergence(sc) + end + @testset "has_converged" begin M = Euclidean(1) pr = Manopt.Test.DummyProblem{typeof(M)}() diff --git a/test/runtests.jl b/test/runtests.jl index b5c29ecedf..3657cc12b8 100644 --- a/test/runtests.jl +++ b/test/runtests.jl @@ -64,6 +64,7 @@ using Manifolds, ManifoldsBase, Manopt, Test include("solvers/test_proximal_gradient_method.jl") include("solvers/test_proximal_point.jl") include("solvers/test_quasi_Newton.jl") + include("solvers/test_quasi_Newton_box.jl") include("solvers/test_particle_swarm.jl") include("solvers/test_primal_dual_semismooth_Newton.jl") include("solvers/test_stochastic_gradient_descent.jl") diff --git a/test/solvers/test_cma_es.jl 
b/test/solvers/test_cma_es.jl index 2840740620..c95ca98dea 100644 --- a/test/solvers/test_cma_es.jl +++ b/test/solvers/test_cma_es.jl @@ -34,7 +34,7 @@ flat_example(::AbstractManifold, p) = 0.0 @test griewank(M, p1) < 0.1 p1 = cma_es(M, griewank; σ = 10.0, rng = MersenneTwister(123)) - @test griewank(M, p1) < 0.2 + @test griewank(M, p1) < 0.25 p1 = [10.0, 10.0] cma_es!(M, griewank, p1; σ = 10.0, rng = MersenneTwister(123)) diff --git a/test/solvers/test_quasi_Newton.jl b/test/solvers/test_quasi_Newton.jl index 3ce34c7417..453dd4b60c 100644 --- a/test/solvers/test_quasi_Newton.jl +++ b/test/solvers/test_quasi_Newton.jl @@ -1,5 +1,5 @@ using Manopt, Manifolds, Test -using LinearAlgebra: I, eigvecs, tr, Diagonal +using LinearAlgebra: I, eigvecs, tr, Diagonal, dot mutable struct QuasiNewtonGradientDirectionUpdate{VT <: AbstractVectorTransportMethod} <: AbstractQuasiNewtonDirectionUpdate @@ -193,6 +193,11 @@ end ) @test isapprox(M, x_direction, x_solution; atol = rayleigh_atol) end + + @testset "Byrd's nonpositive rule" begin + x1 = quasi_Newton(M, f, grad_f, x; nonpositive_curvature_behavior = :byrd, sy_tol = 1.0e8) + @test isapprox(M, x1, x_solution; atol = rayleigh_atol) + end end @testset "Brocket" begin @@ -217,7 +222,7 @@ end x_inverseBFGSCautious = quasi_Newton( M, f, grad_f, x; memory_size = 8, vector_transport_method = ProjectionTransport(), retraction_method = QRRetraction(), - cautious_update = true, stopping_criterion = StopWhenGradientNormLess(1.0e-6), + cautious_update = true, stopping_criterion = StopWhenGradientNormLess(1.0e-6) | StopAfterIteration(100), ) x_inverseBFGSHuang = quasi_Newton( @@ -226,7 +231,7 @@ end M; retraction_method = QRRetraction(), vector_transport_method = ProjectionTransport(), ), vector_transport_method = ProjectionTransport(), retraction_method = QRRetraction(), - cautious_update = true, stopping_criterion = StopWhenGradientNormLess(1.0e-6), + cautious_update = true, stopping_criterion = StopWhenGradientNormLess(1.0e-6) | 
StopAfterIteration(100), ) @test isapprox(M, x_inverseBFGSCautious, x_inverseBFGSHuang; atol = 2.0e-4) end @@ -243,9 +248,10 @@ end x = [0.7011245948687502, -0.1726003159556036, 0.38798265967671103, -0.5728026616491424] x_lrbfgs = quasi_Newton( - M, F, grad_f, x; memory_size = -1, + M, F, grad_f, x; basis = get_basis(M, x, DefaultOrthonormalBasis()), - stopping_criterion = StopWhenGradientNormLess(1.0e-9), + memory_size = -1, + stopping_criterion = StopWhenGradientNormLess(1.0e-9) | StopAfterIteration(1000), ) @test norm(abs.(x_lrbfgs) - x_solution) ≈ 0 atol = rayleigh_atol end @@ -346,13 +352,13 @@ end mp = DefaultManoptProblem(M, gmp) qns = QuasiNewtonState(M; p = p) # push zeros to memory - push!(qns.direction_update.memory_s, copy(p)) - push!(qns.direction_update.memory_s, copy(p)) - push!(qns.direction_update.memory_y, copy(p)) - push!(qns.direction_update.memory_y, copy(p)) + qns.yk = copy(p) + qns.sk = copy(p) + update_hessian!(qns.direction_update, mp, qns, p, 1) + update_hessian!(qns.direction_update, mp, qns, p, 2) + @test contains(qns.direction_update.message, "i=2,1,1") qns.direction_update(mp, qns) # Update (1) says at i=1 inner products are zero (2) all are zero -> gradient proposal - @test contains(qns.direction_update.message, "i=1,2") @test contains(qns.direction_update.message, "gradient") end @@ -430,5 +436,82 @@ end # This triggers and cautious update that does not update the Hessian Manopt.update_hessian!(qns.direction_update, mp, qns, p, 1) # But I am not totally sure what to test for afterwards + + @test startswith(repr(qdu), "QuasiNewtonLimitedMemoryDirectionUpdate with memory size") + end + @testset "Removing zero rho vectors" begin + M = Euclidean(2) + p = [0.0, 1.0] + f(M, p) = sum(p .^ 2) + # A wrong gradient + grad_f(M, p) = -2 .* p + gmp = ManifoldGradientObjective(f, grad_f) + mp = DefaultManoptProblem(M, gmp) + qdu = QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 3) + # push three pairs; middle one has zero inner 
product + push!(qdu.memory_y, [1, 0]) + push!(qdu.memory_s, [1, 0]) + + push!(qdu.memory_y, [1, 0]) + push!(qdu.memory_s, [0, 1]) + + push!(qdu.memory_y, [0, 2]) + push!(qdu.memory_s, [0, 2]) + qdu.ρ = [1.0, 0.0, 4.0] + # delete the zero inner product pair and check that the removal was correct + Manopt._drop_zero_rho_vectors!(qdu) + @test length(qdu.memory_y) == 2 + @test length(qdu.memory_s) == 2 + @test qdu.ρ[[1, 2]] == [1.0, 4.0] + @test qdu.memory_y[1] == [1, 0] + @test qdu.memory_y[2] == [0, 2] + end + + @testset "reforming_required + (start == 2)" begin + M = Euclidean(2) + p = [0.0, 0.0] + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 * sum(p) + gmp = ManifoldGradientObjective(f, grad_f) + mp = DefaultManoptProblem(M, gmp) + ha = QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 2; nonpositive_curvature_behavior = :byrd) + qns = QuasiNewtonState(M; p = p, nonpositive_curvature_behavior = :byrd, direction_update = ha) + + qns.yk = [1.0, 1.0] + qns.sk = [1.0, 2.0] + update_hessian!(qns.direction_update, mp, qns, p, 1) + + qns.yk = [2.0, 1.0] + qns.sk = [1.0, 2.0] + update_hessian!(qns.direction_update, mp, qns, p, 2) + ha.memory_s[2] = [0.0, 0.0] # force reforming_required in next step + update_hessian!(qns.direction_update, mp, qns, p, 3) + # test that the zeroes out pair was replaced + @test qns.direction_update.memory_s[1] == [1.0, 2.0] + @test qns.direction_update.memory_s[2] == [1.0, 2.0] + end + @testset "get_cost specialization" begin + M = Euclidean(2) + p = [0.0, 1.0] + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 .* p + gmp = ManifoldGradientObjective(f, grad_f) + mp = DefaultManoptProblem(M, gmp) + ha = QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 2; nonpositive_curvature_behavior = :byrd) + qns = QuasiNewtonState( + M; + p = copy(M, p), + direction_update = ha, + nondescent_direction_behavior = :step_towards_negative_gradient, + stepsize = HagerZhangLinesearch()(M), + ) + @test get_cost(mp, qns) == f(M, get_iterate(qns)) + 
solve!(mp, qns) + @test get_cost(mp, qns) == f(M, get_iterate(qns)) + + @testset "get_cost with DebugSolverState" begin + dqns = DebugSolverState(qns, DebugMessages(:Info, :Always)) + @test get_cost(mp, dqns) == f(M, get_iterate(dqns)) + end end end diff --git a/test/solvers/test_quasi_Newton_box.jl b/test/solvers/test_quasi_Newton_box.jl new file mode 100644 index 0000000000..1430a105bb --- /dev/null +++ b/test/solvers/test_quasi_Newton_box.jl @@ -0,0 +1,355 @@ +using Manopt, Manifolds, Test +using LinearAlgebra: I, eigvecs, tr, Diagonal, dot + +using RecursiveArrayTools + +@testset "Riemannian quasi-Newton Methods with box-like domains" begin + @testset "get_stepsize_bound - basic" begin + M = Hyperrectangle([0.0, 0.0], [2.0, 2.0]) + + # d[i] > 0 + p = [0.0, 1.0]; d = [1.0, 1.0] + @test Manopt.get_stepsize_bound(M, p, d, 1) ≈ (2.0 - 0.0) / 1.0 # = 2.0 + @test Manopt.get_stepsize_bound(M, p, d, 2) ≈ (2.0 - 1.0) / 1.0 # = 1.0 + + # d[i] < 0 + p = [0.0, 1.0]; d = [-1.0, -1.0] + @test Manopt.get_stepsize_bound(M, p, d, 1) ≈ (0.0 - 0.0) / -1.0 # = 0.0 + @test Manopt.get_stepsize_bound(M, p, d, 2) ≈ (0.0 - 1.0) / -1.0 # = 1.0 + + # d[i] = 0 + p = [0.0, 1.0]; d = [0.0, 0.0] + @test Manopt.get_stepsize_bound(M, p, d, 1) ≈ Inf + @test Manopt.get_stepsize_bound(M, p, d, 2) ≈ Inf + end + + + @testset "update_fp_fpp - basic d = -g" begin + M = Hyperrectangle([0.0, 1.0], [3.0, 3.0]) + + grad = [1.0, 4.0] + d = [-1.0, -4.0] + p = [0.0, 0.0] + + # values taken from loop iteration found in test case: "find_gcp! 
- with bounds, single variable is held fixed" + old_f_prime = -17.0 + old_f_double_prime = 34.0 + dt = 0.25 + gb = 4.0 + db = -4.0 # in case of d = -g, db = -gb + ha = QuasiNewtonMatrixDirectionUpdate(M, BFGS(), DefaultOrthonormalBasis(), [2.0 0.0; 0.0 2.0]) + b = 2 + z = [-0.25, -1.0] + + # optimized formula + upd = Manopt.GenericSegmentHessianUpdater(similar(d), similar(d)) + Manopt.init_updater!(M, upd, p, d, ha) + hv_eb_dz, hv_eb_d = upd(M, p, 0 + dt, dt, b, db, ha) + @test hv_eb_dz ≈ -2.0 + @test hv_eb_d ≈ -8.0 + + # original formula + + original_hv_eb_dz = dot([0, 1], ha.matrix, z) + original_hv_eb_d = dot([0, 1], ha.matrix, d) + + @test hv_eb_dz == original_hv_eb_dz + @test hv_eb_d == original_hv_eb_d + end + + + @testset "update_fp_fpp - basic d = [-2.0, -1.0]" begin + M = Hyperrectangle([0.0, 1.0], [3.0, 3.0]) + + grad = [1.0, 4.0] + d = [-2.0, -1.0] + p = [0.0, 0.0] + + old_f_prime = -6.0 + old_f_double_prime = 10.0 + dt = 0.25 + gb = 1.0 + db = -2.0 + ha = QuasiNewtonMatrixDirectionUpdate(M, BFGS(), DefaultOrthonormalBasis(), [2.0 0.0; 0.0 2.0]) + b = 1 + z = [-0.5, -0.25] + + # optimized formula + upd = Manopt.GenericSegmentHessianUpdater(similar(d), similar(d)) + Manopt.init_updater!(M, upd, p, d, ha) + hv_eb_dz, hv_eb_d = upd(M, p, 0 + dt, dt, b, db, ha) + @test hv_eb_dz == -1.0 + @test hv_eb_d == -4.0 + + # original formula + + original_hv_eb_dz = dot([1, 0], ha.matrix, z) + original_hv_eb_d = dot([1, 0], ha.matrix, d) + + @test hv_eb_dz == original_hv_eb_dz + @test hv_eb_d == original_hv_eb_d + end + + @testset "update_fp_fpp - basic d = [-2.0, -1.0] with limited memory update" begin + M = Hyperrectangle([1.0, 4.0], [2.0, 10.0]) + + p = [2.0, 5.0] + ha = QuasiNewtonLimitedMemoryBoxDirectionUpdate(QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 2)) + st = QuasiNewtonState(M) + + @test startswith(repr(ha), "QuasiNewtonLimitedMemoryBoxDirectionUpdate with internal state:") + @test startswith(Manopt.status_summary(ha), "limited memory 
direction update with support for box constraints; internal direction update status: ") + + f(M, p) = sum(p .^ 2) + grad_f(M, p) = 2 * p + gmp = ManifoldGradientObjective(f, grad_f) + mp = DefaultManoptProblem(M, gmp) + + st.yk = [2.0, 4.0] + st.sk = [4.0, 2.0] + update_hessian!(ha, mp, st, p, 1) + grad = grad_f(M, p) + st.p = p + st.X = grad + + d = similar(grad) + ha(d, mp, st) + + d2 = ha(mp, st) + @test d ≈ d2 + + b = 1 + + old_f_prime = -6.0 + old_f_double_prime = 10.0 + dt = 0.25 + db = d[b] + gb = grad[b] + + t_current = 0 + dt + + # compare the generic and limited memory updater + gupd = Manopt.GenericSegmentHessianUpdater(similar(d), similar(d)) + Manopt.init_updater!(M, gupd, p, d, ha) + hv_eb_dz, hv_eb_d = gupd(M, p, t_current, dt, b, db, ha) + + @test hv_eb_dz ≈ -0.125 + @test hv_eb_d ≈ -0.5 + + lmupd = Manopt.get_default_hessian_segment_updater(M, p, ha) + @test lmupd isa Manopt.LimitedMemorySegmentHessianUpdater + + Manopt.init_updater!(M, lmupd, p, d, ha) + hv_eb_dz_limited, hv_eb_d_limited = lmupd(M, p, t_current, dt, b, db, ha) + + @test hv_eb_dz ≈ hv_eb_dz_limited + @test hv_eb_d ≈ hv_eb_d_limited + + ha.last_gcd_result = :found_unlimited + ha.last_gcd_stepsize = Inf + @test Manopt.get_parameter(ha, Val(:max_stepsize)) == Inf + + @testset "No memory tests" begin + ha2 = QuasiNewtonLimitedMemoryBoxDirectionUpdate(QuasiNewtonLimitedMemoryDirectionUpdate(M, p, InverseBFGS(), 2)) + idx = Manopt.get_bounds_index(M) + @test Manopt.hessian_value(ha2, M, p, Manopt.UnitVector(b), grad) ≈ 4.0 + Manopt.update_current_scale!(M, p, ha2) + @test ha2.current_scale == ha2.qn_du.initial_scale + @test ha2.M_11 == fill(0.0, 0, 0) + @test ha2.M_21 == fill(0.0, 0, 0) + @test ha2.M_22 == fill(0.0, 0, 0) + end + end + + @testset "GeneralizedCauchyDirectionSubsolver" begin + M = Hyperrectangle([-1.0, -2.0, -Inf], [2.0, Inf, 2.0]) + ha = QuasiNewtonMatrixDirectionUpdate(M, BFGS()) + + p = [0.0, 0.0, 0.0] + gf = Manopt.GeneralizedCauchyDirectionSubsolver(M, p, ha) + + X1 = 
[-5.0, 0.0, 0.0] + + d = -X1 + d_out = similar(d) + + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p, d, X1) === (:found_limited, 1.0) + @test d_out ≈ [2.0, 0.0, 0.0] + + d_out = similar(d) + + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p, 0 * d, X1) === (:not_found, NaN) + + d2 = [0.0, 1.0, 0.0] + + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p, d2, [0.0, -1.0, 0.0]) === (:found_unlimited, Inf) + @test d_out ≈ d2 + + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p, [1.0, 1.0, 0.0], [-10.0, -10.0, -10.0]) === (:found_limited, 1.0) + @test d_out ≈ [2.0, 10.0, 0.0] + + p2 = [-1.0, -2.0, 2.0] + gf2 = Manopt.GeneralizedCauchyDirectionSubsolver(M, p2, ha) + + @test Manopt.find_generalized_cauchy_direction!(M, gf2, d_out, p2, [-1.0, -1.0, 1.0], [-10.0, -10.0, -10.0]) === (:not_found, NaN) + + M2 = Hyperrectangle([-10.0], [10.0]) + + ha2 = QuasiNewtonMatrixDirectionUpdate(M2, BFGS(), DefaultOrthonormalBasis(), [100.0;;]) + p3 = [1.0] + gf3 = Manopt.GeneralizedCauchyDirectionSubsolver(M2, p3, ha2) + + d_out = similar(p3) + @test Manopt.find_generalized_cauchy_direction!(M2, gf3, d_out, p3, [1.0], [-10.0]) === (:found_limited, 90.0) + end + + @testset "Hitting multiple bounds at the same time in GCD" begin + M = Hyperrectangle([-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]) + ha = QuasiNewtonMatrixDirectionUpdate(M, BFGS(), DefaultOrthonormalBasis(), [1.0 0 0; 0 1 0; 0 0 1]) + + p = [0.0, 0.0, 0.0] + gf = Manopt.GeneralizedCauchyDirectionSubsolver(M, p, ha) + + d = [-2.0, -2.0, -1.0] + d_out = similar(d) + X = [10.0, 10.0, 10.0] + + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p, d, X) === (:found_limited, 1.0) + @test d_out ≈ [-1.0, -1.0, -1.0] + end + + @testset "Pure Hyperrectangle" begin + M = Hyperrectangle([-1.0, 2.0, -Inf], [2.0, Inf, 2.0]) + f(M, p) = sum(p .^ 2) + function grad_f(M, p) + return project(M, p, 2 .* p) + end + p0 = [0.0, 4.0, 1.0] + p_opt = quasi_Newton(M, f, grad_f, p0; 
stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(10)) + @test p_opt ≈ [0, 2, 0] + + + f2(M, p) = sum(p .^ 4) + function grad_f2(M, p) + return project(M, p, 4 .* (p .^ 3)) + end + p0 = [0.0, 4.0, 1.0] + p_opt = quasi_Newton(M, f2, grad_f2, p0; stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100)) + @test f2(M, p_opt) < 16.1 + + for stepsize in [ArmijoLinesearch(), CubicBracketingLinesearch(), NonmonotoneLinesearch()] + p_opt = quasi_Newton( + M, f2, grad_f2, p0; + stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100), + stepsize = stepsize + ) + @test f2(M, p_opt) < 64.0 + end + + MInf = Hyperrectangle([-Inf, -Inf, -Inf], [Inf, Inf, Inf]) + + f3(M, p) = sum(p .^ 4) - sum(p .^ 2) + function grad_f3(M, p) + return project(MInf, p, 4 .* (p .^ 3) - 2 .* p) + end + p0 = [0.0, 4.0, 1.0] + p_opt = quasi_Newton(MInf, f3, grad_f3, p0; stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100)) + @test f3(MInf, p_opt) < 16.1 + + p_opt = quasi_Newton( + MInf, f3, grad_f3, p0; + stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100), + ) + @test f3(MInf, p_opt) < 64.0 + end + + @testset "has_anisotropic_max_stepsize" begin + @test !Manopt.has_anisotropic_max_stepsize(Sphere(2)) + @test Manopt.has_anisotropic_max_stepsize(Hyperrectangle([1], [2])) + @test Manopt.has_anisotropic_max_stepsize(ProductManifold(Hyperrectangle([1], [2]), Sphere(2))) + end + + @testset "Hyperrectangle × Sphere" begin + S2 = Sphere(2) + px = [0.0, 1.0, 0.0] + Mbox = Hyperrectangle([-1.0, 2.0, -Inf], [2.0, Inf, 2.0]) + M = Mbox × S2 + f(M, p) = sum(p.x[1] .^ 4) + 0.5 * distance(S2, p.x[2], px)^2 + grad_f(M, p) = ArrayPartition(project(Mbox, p.x[1], 4 .* (p.x[1] .^ 3)), -log(S2, p.x[2], px)) + p0 = ArrayPartition([0.0, 4.0, 1.0], [1.0, 0.0, 0.0]) + + @testset "Hessian updater" begin + d = 
-grad_f(M, p0) + ha = QuasiNewtonMatrixDirectionUpdate(M, BFGS(), DefaultOrthonormalBasis()) + gupd = Manopt.GenericSegmentHessianUpdater(similar(d), similar(d)) + Manopt.init_updater!(M, gupd, p0, d, ha) + b = (1, 2) + dt = 0.25 + t_current = 0 + dt + db = d.x[b[1]][b[2]] + hv_eb_dz, hv_eb_d = gupd(M, p0, t_current, dt, b, db, ha) + @test hv_eb_dz ≈ -64.0 + @test hv_eb_d ≈ -256.0 + end + + @testset "GCD check" begin + d = -grad_f(M, p0) + ha = QuasiNewtonLimitedMemoryBoxDirectionUpdate(QuasiNewtonLimitedMemoryDirectionUpdate(M, p0, InverseBFGS(), 2)) + gf = Manopt.GeneralizedCauchyDirectionSubsolver(M, p0, ha) + d_out = similar(d) + X = grad_f(M, p0) + @test Manopt.find_generalized_cauchy_direction!(M, gf, d_out, p0, d, X) === (:found_limited, 1.0) + end + + p_opt = quasi_Newton(M, f, grad_f, p0; stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100)) + @test distance(M, p_opt, ArrayPartition([0, 2, 0], px)) < 0.1 + end + + @testset "Sphere × Hyperrectangle" begin + S2 = Sphere(2) + px = [0.0, 1.0, 0.0] + Mbox = Hyperrectangle([-1.0 2.0; -Inf -Inf], [2.0 Inf; 2.0 Inf]) + M = S2 × Mbox + f(M, p) = sum(p.x[2] .^ 4) + 0.5 * distance(S2, p.x[1], px)^2 + grad_f(M, p) = ArrayPartition(-log(S2, p.x[1], px), project(Mbox, p.x[2], 4 .* (p.x[2] .^ 3))) + p0 = ArrayPartition([1.0, 0.0, 0.0], [0.0 4.0; 1.0 1.0]) + + p_opt = quasi_Newton(M, f, grad_f, p0; stopping_criterion = StopWhenProjectedNegativeGradientNormLess(1.0e-6) | StopAfterIteration(100)) + @test distance(M, p_opt, ArrayPartition(px, [0 2; 0 0])) < 0.1 + end +end + +@testset "MaxStepsizeInDirection" begin + @testset "found_limited" begin + M = Hyperrectangle([-1.0, -2.0, -Inf], [2.0, Inf, 2.0]) + p = [0.0, 0.0, 0.0] + d = [2.0, 1.0, 1.0] + d_before = copy(d) + + sdf = Manopt.MaxStepsizeInDirectionSubsolver(M, p) + @test Manopt.find_max_stepsize_in_direction(M, sdf, p, d) === (:found_limited, 1.0) + @test d == d_before + end + + @testset "found_unlimited" begin + M = 
Hyperrectangle([-Inf], [Inf])
+        p = [0.0]
+        d = [1.0]
+        d_before = copy(d)
+
+        sdf = Manopt.MaxStepsizeInDirectionSubsolver(M, p)
+        @test Manopt.find_max_stepsize_in_direction(M, sdf, p, d) === (:found_unlimited, Inf)
+        @test d == d_before
+    end
+
+    @testset "not_found" begin
+        M = Hyperrectangle([0.0], [1.0])
+        p = [0.0]
+        d = [-1.0]
+        d_before = copy(d)
+
+        sdf = Manopt.MaxStepsizeInDirectionSubsolver(M, p)
+        @test Manopt.find_max_stepsize_in_direction(M, sdf, p, d) === (:not_found, NaN)
+        @test d == d_before
+    end
+end
diff --git a/tutorials/BoxDomain.qmd b/tutorials/BoxDomain.qmd
new file mode 100644
index 0000000000..eb2d2da574
--- /dev/null
+++ b/tutorials/BoxDomain.qmd
@@ -0,0 +1,78 @@
+---
+title: "How to do optimization on box domains"
+author: "Mateusz Baran"
+---
+
+# Optimization with box domains and products of manifolds and boxes
+
+A ``[`Hyperrectangle`](@extref `Manifolds.Hyperrectangle`)``{=commonmark} is, in general, not a manifold but a manifold with corners, because locally at the boundary it looks like $\mathbb{R}^{n-k} \times \mathbb{R}^k_{\geq 0}$ for some $k > 0$, instead of $\mathbb{R}^n$ as required by the definition of a manifold.
+
+Such spaces require special handling when used as domains in optimization.
+For simple methods like gradient descent, using the projected gradient together with a stopping criterion involving [`StopWhenProjectedNegativeGradientNormLess`](@ref) may be sufficient; methods that approximate the Hessian, however, can benefit from a more advanced approach.
+The core idea is to consider a piecewise quadratic approximation of the objective along the descent direction in the tangent space at the current iterate, and to select its minimizer, the generalized Cauchy direction.
+The points at which the approximation might not be differentiable correspond to hitting new boundaries along the initially selected descent direction.
+Then, we can perform a standard line search from the initial iterate in the generalized Cauchy direction.
+
+Currently `Manopt.jl` can handle domains that are either a ``[`Hyperrectangle`](@extref `Manifolds.Hyperrectangle`)``{=commonmark} or a ``[`ProductManifold`](@extref `ManifoldsBase.ProductManifold`)``{=commonmark} with a ``[`Hyperrectangle`](@extref `Manifolds.Hyperrectangle`)``{=commonmark} as its first factor and other manifolds as subsequent factors.
+
+## Example
+
+Consider the problem of fitting a covariance matrix with box constraints on the variances in the principal directions.
+The objective is the negative log-likelihood of the data under a multivariate normal distribution with zero mean and a covariance matrix given by the optimization variable.
+Although there are better ways to solve this particular problem, expressing it this way allows us to freely extend the objective to more complex scenarios beyond what is possible with closed-form solutions.
+
+First, we set up the problem by generating synthetic data.
+The data is sampled from a multivariate normal distribution with known covariance matrix.
+
+```{julia}
+using Manopt, Manifolds, LinearAlgebra, Random, Distributions
+using ForwardDiff, DifferentiationInterface, RecursiveArrayTools
+Random.seed!(41)
+
+N = 5 # dimensionality of data
+M_spd = SymmetricPositiveDefinite(N)
+M_rot = Rotations(N)
+V = rand(M_rot)
+cov_matrix = Symmetric(V * Diagonal([0.5; 2.0; 5.0; 10.0; 20.0]) * V')
+distr = MvNormal(zeros(N), cov_matrix)
+data = Matrix(rand(distr, 200)') # 200 samples
+```
+
+The objective function is defined as follows, with the gradient computed using automatic differentiation.
+
+```{julia}
+function logprob_cost(::AbstractManifold, p)
+    D, R = p.x
+    log_det = sum(log, D)
+    invΣ = R * Diagonal(1 ./ D) * R'
+    ll = -0.5 * size(data, 1) * log_det
+    for row in eachrow(data)
+        ll -= 0.5 * row' * invΣ * row
+    end
+    return -ll # we minimize the negative log-likelihood
+end
+
+function logprob_gradient(M::AbstractManifold, p)
+    Y = DifferentiationInterface.gradient(q -> logprob_cost(M, q), AutoForwardDiff(), p)
+    return riemannian_gradient(M, p, Y)
+end
+```
+
+Finally, we can solve the optimization problem using a quasi-Newton method with box domain support.
+We restrict the variances (the diagonal elements of the covariance matrix) to lie between 1.0 and 100.0.
+The covariance matrix is represented by its eigendecomposition $\Sigma = R D R^{\top}$, where $D$ is a diagonal matrix of variances and $R$ is an orthogonal matrix of principal directions.
+With these constraints on the variances, the optimization variable belongs to $[1,100]^N \times \mathrm{SO}(N)$.
+
+```{julia}
+M = ProductManifold(Hyperrectangle(fill(1.0, N), fill(100.0, N)), M_rot)
+
+p0 = ArrayPartition(fill(10.0, N), Matrix{Float64}(I(N)))
+p_mle = quasi_Newton(M, logprob_cost, logprob_gradient, p0; stopping_criterion = StopAfterIteration(100) | StopWhenProjectedNegativeGradientNormLess(1e-6))
+println("Estimated variances: $(p_mle.x[1])")
+cov_matrix_mle = p_mle.x[2] * Diagonal(p_mle.x[1]) * p_mle.x[2]'
+println("Estimated covariance matrix:")
+println(cov_matrix_mle)
+nothing
+```
+
+We see that despite the original covariance matrix having variances ranging from 0.5 to 20.0, the estimated covariance matrix respects the box constraint keeping the variances between 1.0 and 100.0.
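The stopping quantity used above can be illustrated with a small sketch. This follows the common L-BFGS-B-style definition (project a negative-gradient step back into the box and measure the resulting displacement); it is an assumption about what `StopWhenProjectedNegativeGradientNormLess` measures in the `Hyperrectangle` case, not Manopt's exact formula:

```julia
using LinearAlgebra

# One common convergence measure for box-constrained problems: take a
# negative-gradient step, project it back into [lb, ub], and measure how far
# the point actually moved. At a constrained stationary point this is zero
# even when the raw gradient norm is not.
function projected_neg_grad_norm(lb, ub, p, g)
    return norm(clamp.(p .- g, lb, ub) .- p)
end

# Sitting on the lower bound with a gradient pushing further down,
# the projected step goes nowhere:
projected_neg_grad_norm([1.0], [100.0], [1.0], [3.0])
# returns 0.0
```

This is why the tutorial stops on this criterion rather than on the plain gradient norm: at an optimum on the boundary of the box, the Riemannian gradient itself need not vanish.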
diff --git a/tutorials/Project.toml b/tutorials/Project.toml
index db96359a8d..a2b2a31981 100644
--- a/tutorials/Project.toml
+++ b/tutorials/Project.toml
@@ -2,8 +2,10 @@
 ADTypes = "47edcb42-4c32-4615-8424-f2b9edc5f35b"
 BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
 Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
+DifferentiationInterface = "a0c0ee7d-e4b9-4e03-894e-1c5f64a51d63"
 Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
 FiniteDifferences = "26cc04aa-876d-5657-8c51-4c34ba976000"
+ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
 LRUCache = "8ac3fa9e-de4c-5943-b1dc-09c6b5f20637"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 ManifoldDiff = "af67fdf4-a580-4b9f-bbec-742ef357defd"
@@ -21,8 +23,10 @@ Manopt = {path = ".."}
 ADTypes = "1"
 BenchmarkTools = "1"
 Colors = "0.12, 0.13"
+DifferentiationInterface = "0.7"
 Distributions = "0.25"
 FiniteDifferences = "0.12"
+ForwardDiff = "1"
 LRUCache = "1.4"
 ManifoldDiff = "0.4"
 Manifolds = "0.11"