Add CUDA GPU tests for AD gradient computation#84

Draft

zazabap wants to merge 1 commit into JuliaManifolds:main from zazabap:cuda-gpu-tests
Conversation

@zazabap zazabap commented Feb 20, 2026

Summary

Add GPU tests verifying that the AD gradient pipeline (RiemannianProjectionBackend) works correctly with CuArray inputs. 11 tests, all verified passing on RTX 3090.

No code changes to ManifoldDiff.jl itself — the library already works on GPU because it delegates to DifferentiationInterface.jl and ManifoldsBase operations. This PR adds test coverage to verify and document this.

What works on GPU

| AD backend | GPU status | Notes |
|---|---|---|
| `RiemannianProjectionBackend(AutoZygote())` | ✅ Works | Zygote reverse-mode AD supports `CuArray`s natively |
| `RiemannianProjectionBackend(AutoForwardDiff())` | ❌ Fails | ForwardDiff's `seed!` uses scalar indexing; a fundamental limitation, not version-specific |
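
The distinction can be demonstrated directly (a minimal sketch, not part of the PR; the objective `f` is illustrative, and a CUDA-capable device is assumed):

```julia
using CUDA, Zygote

# Why Zygote works where ForwardDiff does not (sketch).
if CUDA.functional()
    CUDA.allowscalar(false)          # turn scalar indexing into an error
    p = CUDA.randn(Float64, 16)
    # Zygote differentiates whole-array operations, so this stays on the GPU:
    g = Zygote.gradient(x -> sum(abs2, x) / 2, p)[1]
    @assert g isa CuArray
    # ForwardDiff.gradient(x -> sum(abs2, x) / 2, p)
    # ^ would throw: seed! writes Dual partials element-by-element,
    #   i.e. scalar indexing into the CuArray
end
```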

Operations tested

| Operation | Manifold | GPU status |
|---|---|---|
| `ManifoldDiff.gradient(M, f, p, backend)` | `Euclidean` | ✅ Float64 and Float32 |
| `ManifoldDiff.gradient(M, f, p, backend)` | `Sphere` | ✅ Riemannian gradient in tangent space |
| CPU-vs-GPU equivalence | `Euclidean` | ✅ `isapprox(Array(gpu), cpu; atol=1e-12)` |
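
The Euclidean row can be sketched as follows (illustrative code, not the PR's exact test file; the objective `f` and the tolerance are assumptions):

```julia
using CUDA, Manifolds, ManifoldDiff, ADTypes, Zygote

M = Euclidean(1000)
backend = ManifoldDiff.RiemannianProjectionBackend(AutoZygote())
f(p) = sum(abs2, p) / 2          # Euclidean gradient of f at p is p itself

if CUDA.functional()
    p = CuArray(randn(1000))
    grad = ManifoldDiff.gradient(M, f, p, backend)
    @assert grad isa CuArray{Float64}                    # result stays on the GPU
    @assert isapprox(Array(grad), Array(p); atol=1e-12)  # matches analytical gradient
end
```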

End-to-end GPU solver pipeline

With ManifoldDiff, the Manifolds CUDA extension, and Manopt's _produce_type change (#577):

using CUDA, Manifolds, Manopt, ManifoldDiff, Zygote, ADTypes

M = Euclidean(1000)
target = CuArray(randn(1000))
p0 = CuArray(randn(1000))               # initial point, also on the GPU
f(M, p) = sum((p .- target).^2) / 2

# AD-computed gradient, entirely on the GPU
result = gradient_descent(M, f, p0;
    evaluation=InplaceEvaluation())     # uses the ManifoldDiff default backend

This works transparently with CuArrays when Zygote is the AD backend.

Tests: 11/11 verified passing

| Test | What's verified |
|---|---|
| Euclidean gradient, Float64 | `grad isa CuArray{Float64}`, matches the analytical gradient `p` |
| Euclidean gradient, Float32 | `grad isa CuArray{Float32}`, correct with `atol=1e-4` |
| Sphere Riemannian gradient | tangent space: `dot(grad, p) ≈ 0`, matches `a - dot(a,p)*p` |
| CPU-vs-GPU equivalence | `isapprox(Array(grad_gpu), grad_cpu; atol=1e-12)` |
| Quadratic objective | `grad = p - target`, verified against the analytical solution |

All tests use dot(a, p) instead of p[1] to avoid scalar indexing on Sphere.
All tests gracefully skip when CUDA/Zygote is not available.
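
The Sphere test pattern looks roughly like this (a sketch with illustrative names; the PR's actual tests run the same computation on `CuArray` inputs):

```julia
using Manifolds, ManifoldDiff, ADTypes, Zygote, LinearAlgebra

M = Sphere(2)
a = [1.0, 2.0, 3.0]
f(p) = dot(a, p)                     # dot(a, p), not p[1]: no scalar indexing
backend = ManifoldDiff.RiemannianProjectionBackend(AutoZygote())

p = [1.0, 0.0, 0.0]                  # a point on the sphere
grad = ManifoldDiff.gradient(M, f, p, backend)
# The Riemannian gradient is the Euclidean gradient a projected onto T_pM:
@assert isapprox(grad, a - dot(a, p) * p; atol=1e-10)
@assert abs(dot(grad, p)) < 1e-10    # grad lies in the tangent space at p
```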

Changes

  • test/test_cuda_ext.jl — new test file (11 tests)
  • test/runtests.jl — added include("test_cuda_ext.jl")
  • Project.toml — added CUDA to [compat], [extras], and [targets]

New test_cuda_ext.jl verifying RiemannianProjectionBackend(AutoZygote())
works correctly on CuArrays. ForwardDiff is incompatible with CuArrays
due to scalar indexing in seed!; only Zygote (reverse-mode) is tested.

11 tests: Euclidean gradient (Float64/Float32), Sphere Riemannian
gradient with tangent space verification, CPU-vs-GPU equivalence,
quadratic objective with known analytical solution.
codecov Bot commented Feb 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.69%. Comparing base (59ead53) to head (e156e17).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #84   +/-   ##
=======================================
  Coverage   95.69%   95.69%           
=======================================
  Files          13       13           
  Lines         395      395           
=======================================
  Hits          378      378           
  Misses         17       17           


@kellertuer
Member

Thanks for that. The two checks still missing are

  • make sure your code follows the code formatting (on this repository still JuliaFormatter for now)
  • any PR has to have a short entry in the changelog, so all changes are properly documented there.

@gdalle
Contributor

gdalle commented Feb 22, 2026

I have ideas on how to make ForwardDiff gradients work on GPUs via DI, if you're interested. It was low-priority, but it wouldn't take long for me to implement.

@kellertuer
Member

Help on that side would be very welcome.

@gdalle
Contributor

gdalle commented Mar 4, 2026

If you check out the branch from JuliaDiff/DifferentiationInterface.jl#974 and use DI.AutoForwardFromPrimitive(AutoForwardDiff()), the ForwardDiff gradients should run on the GPU (but there will still be data transfer with the CPU, as explained here). Let me know if it works / speeds things up for you or not
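The suggested setup would look roughly like this (a sketch under the stated assumptions: it requires the unreleased branch from DifferentiationInterface.jl#974, and `AutoForwardFromPrimitive` is not in a released DI version):

```julia
using DifferentiationInterface, ADTypes
import DifferentiationInterface as DI

# Wrap ForwardDiff in DI's forward-from-primitive mechanism so the
# gradient computation runs on the GPU (with some CPU data transfer):
backend = DI.AutoForwardFromPrimitive(AutoForwardDiff())
# then pass it on, e.g. ManifoldDiff.RiemannianProjectionBackend(backend)
```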
