Add GPU solver tests on Gradient Descent and Conjugate Gradient Descent#579
zazabap wants to merge 1 commit into JuliaManifolds:master
Conversation
17 tests verifying that `gradient_descent` and `conjugate_gradient_descent` work transparently with `CuArray`-backed manifold points. No `ManoptCUDAExt` is needed — the `_produce_type` fix from JuliaManifolds#577 handles GPU allocation natively. Tests cover: `ConstantLength`/`ArmijoLinesearch` stepsizes, Float32/Float64, matrix Euclidean, Sphere, recording with `:Cost`, and CPU-vs-GPU equivalence. All tests use known closed-form solutions for verification.
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##           master     #579   +/- ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           91        91
  Lines        10013     10013
=========================================
  Hits         10013     10013
```
How to review these cross-repo changes

Four PRs, recommended review order (bottom-up by dependency):

PRs 1-2 add extension modules + tests. PRs 3-4 add test files only — no runtime code changes. Each PR is independently mergeable and all CUDA tests skip gracefully without a GPU.
Thanks for that start. As a first comment I have two remarks on wording/phrasing.
I would suppose this is either AI slop (then please be more careful with AI) or a language barrier (then never mind, but take my comment into account). I am not yet sure how to review this in general, since I solely work on macOS, where I would only have Metal (see Metal.jl) available. See also Mateusz's comment asking whether it might be better to use https://github.com/JuliaGPU/GPUArrays.jl instead of a direct dependency on CUDA.jl.
Thanks for pointing out the mistake! Manopt.jl has plenty of algorithms, and I checked the simplest one here. For the other algorithms, I also consider the current GPU test coverage insufficient. Since the update in Manifolds.jl indicates that some overrides are not yet implemented, I will gradually add tests as the implementations there are finished.
For me it is also fine if you “start somewhere” and we first have a docs page that describes which solvers/stepsizes/... are already tested/verified. Then just call it neither a test suite nor comprehensive – it would still be a very welcome contribution.
I see, and I would contribute tests in the following direction:
I also modified the title of this PR accordingly.
As discussed already here JuliaManifolds/Manifolds.jl#856 (comment) for Manifolds.jl, I think a major challenge here is making sure the code works correctly and is tested. For now, I do not see how that could be done. Since I solely own two MacBooks, I cannot even test the code myself. Given the current state of this PR, with no actual changes to the package besides a strong dependency on CUDA (why?), I am not yet sure how this continues. By the way, I saw you are (at least partly?) based in Tokyo? I am currently a guest scientist at RIKEN.
@kellertuer Welcome to Tokyo! I am a part-time research associate at RIKEN AIP. If you and the collaborators think it might be better to work in the new ManifoldsGPU.jl package, then contributing on that side might be the better solution : ). I will be back in Tokyo at the end of March. If you lack access to a CUDA environment, I could provide one from April: our lab will purchase a new workstation with an RTX 5060 Ti/RTX 5070 Ti, and I will try to obtain remote access for collaborators after officially discussing it with the PI in my lab.
Yes, working on that in ManifoldsGPU.jl would be best. The simple reason is that I would like to keep Manopt.jl and Manifolds.jl at a very high test coverage, and for now there is no way to test GPU code on CI. If at any time later such a CI is available, we can still think about migrating the ManifoldsGPU code over. I am actually also still in Tokyo at the end of March.
Yes, ManifoldsGPU.jl is the right place for all GPU-related stuff for now. We will link to that repository from Manopt.jl and Manifolds.jl when it's usable enough to promote GPU support. We will also need a short tutorial explaining the caveats of working with a GPU. For example, on the Stiefel manifold the QR retraction is generally the fastest one on CPU, but on GPU it can currently only be partially batched; SVD, on the other hand, works fully batched, so the polar retraction is preferable there. Also, one note for benchmarking: you should use in-place gradients in your benchmark scripts for better performance: https://manoptjl.org/stable/tutorials/InplaceGradient/#Speedup-using-in-place-evaluation.
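The two suggestions above can be combined in one call; here is a minimal sketch (the Rayleigh-quotient-style cost and its gradient are illustrative, not from this PR — on a GPU, `p0` would be a `CuArray`-backed point):

```julia
using Manopt, Manifolds, LinearAlgebra

M = Stiefel(32, 4)
A = Symmetric(randn(32, 32))
f(M, p) = -tr(p' * A * p)            # illustrative cost on the Stiefel manifold
function grad_f!(M, X, p)            # in-place Riemannian gradient
    # project the Euclidean gradient -2*A*p onto the tangent space at p
    X .= project(M, p, -2 .* (A * p))
    return X
end

p0 = rand(M)
q = gradient_descent(
    M, f, grad_f!, p0;
    evaluation=InplaceEvaluation(),       # use the in-place gradient
    retraction_method=PolarRetraction(),  # SVD-based; batches well on GPU
)
```

`InplaceEvaluation()` avoids allocating a new gradient array per iteration, which is the speedup described in the linked tutorial.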
Since we started the separate GPU package, I'll close this one for now. Feel free to reopen it and move it to the other repository, if that feels fitting.
Once the implementation in ManifoldsGPU.jl is finished, we could think of a method to test the Manopt.jl CUDA version. I will mark it as a possible contribution for the future.
The challenge there is that currently there is no CI available (free of charge) that has GPUs activated. Since all the work on these packages is done voluntarily, that is, I get zero money for developing, supporting, and bug-fixing them, I also have zero money to spend on paying for CIs. I do also hope that the main work has to be done in Manifolds.jl; here maybe only a few small bug fixes are required, where the algorithms are not generic enough. One thing we can think about is a small tutorial. That we can pre-render (see for example how that is done in ManoptExamples.jl) and then include in the docs, where all current examples are re-run on every documentation generation.
Currently the only realistic option is to periodically run GPU tests locally and manually. It's still worth having those tests to occasionally check whether GPU support still works, and maybe in the future we somehow get GPU-enabled CI. A pre-rendered example in the tutorials would also be nice, sure. IIRC the hyperparameter optimization example is currently pre-rendered?
I am absolutely not a fan of a manual CI; that was already the reason for the separate package. But sure, manually run docs are fine; some are already like that.

Summary
Add a GPU test suite (17 tests, all verified passing on an RTX 3090) verifying that Manopt.jl solvers work transparently with `CuArray`-backed manifold points. No `ManoptCUDAExt.jl` is needed — the `_produce_type` fix from #577 handles GPU allocation natively. This supersedes PR #574 (which used a CUDA extension approach, now unnecessary).
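A minimal sketch of what such a test looks like (requires a CUDA-capable GPU; the quadratic cost and its minimizer `target` are illustrative, not taken from the actual test file):

```julia
using CUDA, Manopt, Manifolds

n = 128
M = Euclidean(n)
target = CUDA.randn(Float32, n)          # known closed-form minimizer, on the GPU
f(M, p) = sum(abs2, p .- target) / 2     # quadratic cost, minimized at `target`
grad_f(M, p) = p .- target

p0 = CUDA.zeros(Float32, n)              # CuArray-backed starting point
q = gradient_descent(M, f, grad_f, p0; stepsize=ConstantLength(0.5))

@assert q isa CuArray                               # result stays on the GPU
@assert isapprox(Array(q), Array(target); atol=1e-3)
```

No GPU-specific code path is invoked here: the solver allocates and broadcasts on whatever array type `p0` has.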
Which solvers work on GPU
Verified working (tested in this PR)
- `gradient_descent` + `ConstantLength`
- `gradient_descent` + `ArmijoLinesearch(M, p)`
- `gradient_descent` + `ConstantLength` (Float32/Float64, matrix Euclidean, and Sphere variants)
- `gradient_descent` + `ArmijoLinesearch` + `record=[:Cost]`
- `conjugate_gradient_descent` + `ConstantLength`

Expected to work (same code path, not individually tested)
Any first-order solver with a user-provided gradient works if the manifold retraction and gradient are GPU-compatible (`quasi_Newton`, `subgradient_method`, etc.).

Requirements for GPU solvers
- `ConstantStepsize`/`ConstantLength` always work.
- `ArmijoLinesearch(M, p)` works when constructed with a GPU point `p` (allocates a GPU candidate buffer).
- The user-provided gradient must return a `CuArray`. Zygote AD works (see “Add CUDA GPU tests for AD gradient computation” ManifoldDiff.jl#84); ForwardDiff does not (scalar indexing in `seed!`).
- `record=[:Cost]` works transparently.

Why no ManoptCUDAExt.jl?
PR #577 added `_produce_type(factory, M, p)`, which passes the user's point type to stepsize/direction-update constructors. This means `ArmijoLinesearchStepsize(M, p)` automatically does `candidate_point = allocate(p)`, producing a `CuArray` when `p` is a `CuArray`. The solver loop uses broadcasting and ManifoldsBase operations — all GPU-compatible. No Manopt-side overrides are needed.

GPU benchmark results (RTX 3090, 300 iterations)

GPU speedup scales with per-iteration compute intensity: O(n²) matrix operations benefit most.
Tests: 17/17 verified passing

- result `isa CuArray`, converges to known target (`atol=1e-3`)
- `record=[:Cost]` produces a decreasing cost sequence
- `isapprox(Array(gpu_result), cpu_result)` for CPU-vs-GPU equivalence

All tests gracefully skip when CUDA is not available.
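One common way to implement the graceful skip (a sketch; the testset names are illustrative, not necessarily those in `test/test_cuda_ext.jl`):

```julia
using Test, CUDA

@testset "Manopt.jl CUDA solver tests" begin
    if CUDA.functional()            # true only if a usable CUDA GPU is present
        @testset "gradient_descent on CuArrays" begin
            # ... the 17 GPU tests go here ...
        end
    else
        @info "CUDA not functional, skipping GPU solver tests"
    end
end
```

On machines without a GPU the testset body is simply never entered, so `runtests.jl` still passes.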
Changes

- `test/test_cuda_ext.jl` — new test file (17 tests)
- `test/runtests.jl` — added `include("test_cuda_ext.jl")`
- `test/Project.toml` — added CUDA to `[deps]` and `[compat]`
- `Changelog.md` — added entry about GPU test coverage

Related PRs

- Add `_produce_type` to improve GPU compatibility #577 (`_produce_type` fix) — already merged