Add CUDA GPU tests for AD gradient computation#84

Draft

zazabap wants to merge 1 commit into JuliaManifolds:main from zazabap:cuda-gpu-tests
Conversation

@zazabap zazabap commented Feb 20, 2026

Summary

Add GPU tests verifying that the AD gradient pipeline (RiemannianProjectionBackend) works correctly with CuArray inputs. 11 tests, all verified passing on RTX 3090.

No code changes to ManifoldDiff.jl itself — the library already works on GPU because it delegates to DifferentiationInterface.jl and ManifoldsBase operations. This PR adds test coverage to verify and document this.

What works on GPU

| AD backend | GPU status | Notes |
|---|---|---|
| `RiemannianProjectionBackend(AutoZygote())` | ✅ Works | Zygote reverse-mode AD supports `CuArray`s natively |
| `RiemannianProjectionBackend(AutoForwardDiff())` | ❌ Fails | ForwardDiff's `seed!` uses scalar indexing; a fundamental limitation, not version-specific |
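
The distinction can be demonstrated directly (a minimal sketch, not part of the PR; the objective `f` is illustrative, and a CUDA-capable device is assumed):

```julia
using CUDA, Zygote

# Why Zygote works where ForwardDiff does not (sketch).
if CUDA.functional()
    CUDA.allowscalar(false)          # turn scalar indexing into an error
    p = CUDA.randn(Float64, 16)
    # Zygote differentiates whole-array operations, so this stays on the GPU:
    g = Zygote.gradient(x -> sum(abs2, x) / 2, p)[1]
    @assert g isa CuArray
    # ForwardDiff.gradient(x -> sum(abs2, x) / 2, p)
    # ^ would throw: seed! writes Dual partials element-by-element,
    #   i.e. scalar indexing into the CuArray
end
```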

Operations tested

| Operation | Manifold | GPU status |
|---|---|---|
| `ManifoldDiff.gradient(M, f, p, backend)` | `Euclidean` | ✅ Float64 and Float32 |
| `ManifoldDiff.gradient(M, f, p, backend)` | `Sphere` | ✅ Riemannian gradient in tangent space |
| CPU-vs-GPU equivalence | `Euclidean` | ✅ `isapprox(Array(gpu), cpu; atol=1e-12)` |
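
The Euclidean row can be sketched as follows (illustrative code, not the PR's exact test file; the objective `f` and the tolerance are assumptions):

```julia
using CUDA, Manifolds, ManifoldDiff, ADTypes, Zygote

M = Euclidean(1000)
backend = ManifoldDiff.RiemannianProjectionBackend(AutoZygote())
f(p) = sum(abs2, p) / 2          # Euclidean gradient of f at p is p itself

if CUDA.functional()
    p = CuArray(randn(1000))
    grad = ManifoldDiff.gradient(M, f, p, backend)
    @assert grad isa CuArray{Float64}                    # result stays on the GPU
    @assert isapprox(Array(grad), Array(p); atol=1e-12)  # matches analytical gradient
end
```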

End-to-end GPU solver pipeline

With ManifoldDiff, the Manifolds CUDA extension, and Manopt's _produce_type change (#577):

using CUDA, Manifolds, Manopt, ManifoldDiff, Zygote, ADTypes

M = Euclidean(1000)
target = CuArray(randn(1000))
p0 = CuArray(randn(1000))               # initial point, also on the GPU
f(M, p) = sum((p .- target).^2) / 2

# AD-computed gradient, entirely on the GPU
result = gradient_descent(M, f, p0;
    evaluation=InplaceEvaluation())     # uses the ManifoldDiff default backend

This works transparently with CuArrays when Zygote is the AD backend.

Tests: 11/11 verified passing

| Test | What's verified |
|---|---|
| Euclidean gradient, Float64 | `grad isa CuArray{Float64}`, matches the analytical gradient `p` |
| Euclidean gradient, Float32 | `grad isa CuArray{Float32}`, correct with `atol=1e-4` |
| Sphere Riemannian gradient | tangent space: `dot(grad, p) ≈ 0`, matches `a - dot(a,p)*p` |
| CPU-vs-GPU equivalence | `isapprox(Array(grad_gpu), grad_cpu; atol=1e-12)` |
| Quadratic objective | `grad = p - target`, verified against the analytical solution |

All tests use dot(a, p) instead of p[1] to avoid scalar indexing on Sphere.
All tests gracefully skip when CUDA/Zygote is not available.
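
The Sphere test pattern looks roughly like this (a sketch with illustrative names; the PR's actual tests run the same computation on `CuArray` inputs):

```julia
using Manifolds, ManifoldDiff, ADTypes, Zygote, LinearAlgebra

M = Sphere(2)
a = [1.0, 2.0, 3.0]
f(p) = dot(a, p)                     # dot(a, p), not p[1]: no scalar indexing
backend = ManifoldDiff.RiemannianProjectionBackend(AutoZygote())

p = [1.0, 0.0, 0.0]                  # a point on the sphere
grad = ManifoldDiff.gradient(M, f, p, backend)
# The Riemannian gradient is the Euclidean gradient a projected onto T_pM:
@assert isapprox(grad, a - dot(a, p) * p; atol=1e-10)
@assert abs(dot(grad, p)) < 1e-10    # grad lies in the tangent space at p
```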

Changes

  • test/test_cuda_ext.jl — new test file (11 tests)
  • test/runtests.jl — added include("test_cuda_ext.jl")
  • Project.toml — added CUDA to [compat], [extras], and [targets]

New test_cuda_ext.jl verifying RiemannianProjectionBackend(AutoZygote())
works correctly on CuArrays. ForwardDiff is incompatible with CuArrays
due to scalar indexing in seed!; only Zygote (reverse-mode) is tested.

11 tests: Euclidean gradient (Float64/Float32), Sphere Riemannian
gradient with tangent space verification, CPU-vs-GPU equivalence,
quadratic objective with known analytical solution.
codecov Bot commented Feb 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.69%. Comparing base (59ead53) to head (e156e17).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #84   +/-   ##
=======================================
  Coverage   95.69%   95.69%           
=======================================
  Files          13       13           
  Lines         395      395           
=======================================
  Hits          378      378           
  Misses         17       17           


@kellertuer
Member

Thanks for that. The two checks still missing are

  • make sure your code follows the code formatting (on this repository still JuliaFormatter for now)
  • any PR has to have a short entry in the changelog, so all changes are properly documented there.

@gdalle
Contributor

gdalle commented Feb 22, 2026

I have ideas on how to make ForwardDiff gradients work on GPUs via DI, if you're interested. It was low-priority, but it wouldn't take long for me to implement.

@kellertuer
Member

Help on that side would be very welcome.

@gdalle
Contributor

gdalle commented Mar 4, 2026

If you check out the branch from JuliaDiff/DifferentiationInterface.jl#974 and use DI.AutoForwardFromPrimitive(AutoForwardDiff()), the ForwardDiff gradients should run on the GPU (but there will still be data transfer with the CPU, as explained here). Let me know if it works / speeds things up for you or not
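The suggested setup would look roughly like this (a sketch under the stated assumptions: it requires the unreleased branch from DifferentiationInterface.jl#974, and `AutoForwardFromPrimitive` is not in a released DI version):

```julia
using DifferentiationInterface, ADTypes
import DifferentiationInterface as DI

# Wrap ForwardDiff in DI's forward-from-primitive mechanism so the
# gradient computation runs on the GPU (with some CPU data transfer):
backend = DI.AutoForwardFromPrimitive(AutoForwardDiff())
# then pass it on, e.g. ManifoldDiff.RiemannianProjectionBackend(backend)
```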
