
feat: GPU ops for Grassmann, Sphere, Euclidean, UnitaryMatrices, SPD (PR #856 port)#1

Closed
zazabap wants to merge 7 commits into JuliaManifolds:main from zazabap:main

Conversation


zazabap (Contributor) commented Mar 5, 2026

Summary

Ports GPU-native manifold operations from Manifolds.jl PR #856 into ManifoldsGPU.jl, implementing batched exp! (and log! where applicable) for five manifolds on PowerManifold(M, batch) with ArrayPowerRepresentation and 3D/2D CuArray.

  • Grassmann: exp! via gesvdj! (thin SVD) + gemm_strided_batched; formula exp_p(X) = (p·V·cos(S) + U·sin(S))·V'; polar retraction identical to Stiefel
  • Sphere: exp! and log! via pure broadcasting; handles θ ≈ 0 and d ≈ 0 with ifelse
  • Euclidean: exp! and log! via broadcasting (flat manifold)
  • UnitaryMatrices: exp! via complex Scaling & Squaring Taylor series (_matrix_exp_gpu extended to CuArray{Complex{T},3}); q = p · expm(X)
  • SymmetricPositiveDefinite: exp! via serial CPU eigendecomp per slice (no batched GPU eigendecomp in CUDA.jl yet); output symmetrized to satisfy is_point

Also includes CI-safe JLArray test suite restructure and test quality fixes.

Test Plan

  • All 67 tests pass (julia --project=. -e 'using Pkg; Pkg.test()')
  • JLArray tests run in CI without GPU hardware (verifies CPU fallback paths)
  • CUDA tests run on GPU hardware (verified locally)
  • Float32 and Float64 variants tested for all manifolds
  • is_point membership check passes for all GPU results
  • CPU/GPU agreement within tolerance (atol=2e-14 for Float64, 2e-5 for Float32); a sketch of the check pattern follows this list
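
A minimal sketch of the agreement check pattern (illustrative, single-point version; the actual testsets exercise the batched PowerManifold methods and live in test/):

```julia
using Test, Random, LinearAlgebra, Manifolds, CUDA

Random.seed!(45)
M = Sphere(2)                 # hypothetical manifold choice for illustration
p = rand(M)
X = rand(M; vector_at=p)

q_cpu = exp(M, p, X)                           # CPU reference
q_gpu = Array(exp(M, CuArray(p), CuArray(X)))  # GPU result, copied back
@test isapprox(q_cpu, q_gpu; atol=2e-14)       # Float64 tolerance from above
@test is_point(M, q_gpu)
```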

Notes

  • Grassmann exp tests use 3-arg isapprox(M, A, B) (subspace distance) rather than element-wise comparison, because CPU LAPACK and GPU cuSOLVER Jacobi can produce different orthonormal representatives of the same subspace when singular values are nearly degenerate; see the snippet after this list
  • SPD exp! is marked with a TODO to replace with syevjBatched when CUDA.jl exposes it
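
The manifold-aware comparison from the first note, demonstrated on a single slice (a sketch; the values are illustrative):

```julia
using Test, Manifolds

M = Grassmann(5, 2)
p = rand(M)
q_rot = p * [0 1; 1 0]        # same 2-plane, different orthonormal basis
@test !(p ≈ q_rot)            # entrywise comparison fails...
@test isapprox(M, p, q_rot)   # ...but the subspace distance is zero
```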

🤖 Generated with Claude Code

zazabap and others added 7 commits March 5, 2026 09:20
- Add GPUArrays.jl and JLArrays.jl as test dependencies for JLArray-based testing
- Split tests: jlarray_tests.jl (CI-safe, no GPU needed) + cuda_tests.jl (manual)
- runtests.jl: always run JLArray tests, conditionally run CUDA tests
- JLArray tests verify CPU fallback correctness and package loading
- Enable scalar indexing for JLArray tests (CPU fallback in Manifolds.jl requires it)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
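A sketch of the conditional layout this commit describes (file names from the commit message; exact contents may differ in the repo):

```julia
# test/runtests.jl
using Test, CUDA

include("jlarray_tests.jl")   # always runs: CPU fallback via JLArrays, CI-safe

if CUDA.functional()          # only attempt CUDA testsets on real GPU hardware
    include("cuda_tests.jl")
end
```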
- Add Random.seed!(45) and is_point check to Stiefel CUDA testset
- Fix tolerance 1e-10 -> 2e-14 in basic Stiefel testset
- Add comment explaining 2e-12 tolerance in fallback stress testset
- Remove redundant using declarations from jlarray_tests.jl
- Tighten JLArrays compat from "0.1, 0.2, 0.3" to "0.3"
- Document cross-file seed reuse

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements batched geodesic exponential and polar retraction for
PowerManifold(Grassmann(n, k), batch) on CuArray{T,3}.

exp! uses cuSOLVER gesvdj! (batched Jacobi SVD) to compute the
geodesic formula exp_p(X) = (p·V·cos(S) + U·sin(S))·V', where
X = U·diag(S)·V' is the thin SVD. gesvdj! returns V directly (not V');
the 'T' flag in the final gemm_strided_batched call applies the transpose.

The polar retraction reuses the Stiefel polar formula (SVD projection
onto Stiefel, which represents the Grassmann quotient). Falls back to
per-slice svd! (via cuSOLVER) when gesvdj! exceeds its size limit.

Tests: Grassmann exp and polar retraction (Float32/Float64, JLArray and
CUDA), plus a stress test. Grassmann exp tests use 3-arg isapprox
(manifold-aware subspace distance) to correctly handle cases where CPU
LAPACK and GPU cuSOLVER converge to different orthogonal representatives
of the same degenerate Grassmann subspace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
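For reference, a single-slice CPU version of the geodesic formula above (a sketch of the math only; the PR's implementation batches this through gesvdj! and gemm_strided_batched):

```julia
using LinearAlgebra

# exp_p(X) = (p·V·cos(S) + U·sin(S))·V' with X = U·diag(S)·V' the thin SVD
function grassmann_exp_slice(p::AbstractMatrix, X::AbstractMatrix)
    F = svd(X)                  # thin SVD; F.V is V, matching gesvdj!'s output
    U, S, V = F.U, F.S, F.V
    return (p * V * Diagonal(cos.(S)) + U * Diagonal(sin.(S))) * V'
end
```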
Implements batched geodesic exponential and logarithm for
PowerManifold(Sphere(n), batch) with ArrayPowerRepresentation.

Strategy: pure broadcasting; no LAPACK/cuSOLVER required.
- exp!: cos(θ)·p + (sin(θ)/θ)·X where θ = ||X||; handles θ ≈ 0 via ifelse
- log!: (d/sin(d))·(q - cos(d)·p) where d = arccos(⟨p,q⟩); handles d ≈ 0

Shape: PowerManifold(Sphere(n), batch) uses (n+1, batch) 2D arrays.
Dispatches on CuArray{T,2}.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
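The broadcast pattern, sketched for a generic (n+1, batch) matrix (CuArray included; helper names are illustrative):

```julia
# exp!: cos(θ)·p + (sin(θ)/θ)·X, one column per point; θ = 0 handled by ifelse
function sphere_exp!(q, p, X)
    θ = sqrt.(sum(abs2, X; dims=1))              # 1 × batch row of norms
    s = ifelse.(θ .> 0, sin.(θ) ./ θ, one.(θ))   # sin(θ)/θ → 1 as θ → 0
    q .= cos.(θ) .* p .+ s .* X
    return q
end

# log!: (d/sin(d))·(q - cos(d)·p) with d = arccos(⟨p,q⟩); d = 0 handled likewise
function sphere_log!(X, p, q)
    c = clamp.(sum(p .* q; dims=1), -1, 1)       # ⟨p,q⟩ per column
    d = acos.(c)
    f = ifelse.(d .> 0, d ./ sin.(d), one.(d))
    X .= f .* (q .- c .* p)
    return X
end
```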
Flat manifold: geodesics are straight lines. Pure broadcasting, no LAPACK.
PowerManifold(Euclidean(n), batch) uses (n, batch) 2D arrays;
dispatches on CuArray{T,N} to handle any dimensionality.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
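The flat case reduces to single broadcasts (a sketch; works for arrays of any dimensionality, matching the CuArray{T,N} dispatch):

```julia
euclidean_exp!(q, p, X) = (q .= p .+ X; q)   # straight-line geodesic
euclidean_log!(X, p, q) = (X .= q .- p; X)
```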
…onential

Extends _matrix_exp_gpu in helpers.jl to support CuArray{Complex{T},3}
(Scaling & Squaring with Taylor series, same orders as real: 18 for Float32,
30 for Float64).

exp! for batched UnitaryMatrices: q = p · expm(X) where X is skew-Hermitian.
expm of a skew-Hermitian matrix is unitary, so q inherits p's unitarity.
Supports ComplexF32 (atol=2e-4) and ComplexF64 (atol=2e-12).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
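A single-matrix CPU sketch of the scaling-and-squaring scheme described above (illustrative only; _matrix_exp_gpu applies the same idea batched over CuArray{Complex{T},3}):

```julia
using LinearAlgebra

function expm_taylor(X::AbstractMatrix{T}; order::Int=30) where {T}
    nrm = opnorm(X, 1)
    s = nrm > 1 ? ceil(Int, log2(nrm)) : 0    # scale so ‖X/2^s‖₁ ≤ 1
    A = X / T(2)^s
    E = Matrix{T}(I, size(A)...)
    term = Matrix{T}(I, size(A)...)
    for k in 1:order                          # truncated Taylor series
        term = term * A / k
        E += term
    end
    for _ in 1:s                              # undo scaling by squaring
        E = E * E
    end
    return E
end

# For skew-Hermitian X, expm_taylor(X) is (numerically) unitary, so
# q = p * expm_taylor(X) stays on the unitary group.
```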
Serial batch implementation: each SPD exp uses CPU eigendecomposition
(no batched GPU eigendecomp available in CUDA.jl at time of writing).

Formula: exp_p(X) = p^{1/2} · expm(p^{-1/2} · X · p^{-1/2}) · p^{1/2}
Output is symmetrized (0.5*(result + result')) to ensure exact symmetry
within Manifolds.jl's is_point tolerance.

TODO: replace with cusolverDnSsyevjBatched when exposed in CUDA.jl.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
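The per-slice computation, sketched on the CPU (what the serial fallback does for each batch member; names are illustrative):

```julia
using LinearAlgebra

# exp_p(X) = p^{1/2} · expm(p^{-1/2} · X · p^{-1/2}) · p^{1/2}
function spd_exp_slice(p::AbstractMatrix, X::AbstractMatrix)
    λ, Q = eigen(Symmetric(p))                  # p = Q·Diagonal(λ)·Q'
    ps  = Q * Diagonal(sqrt.(λ)) * Q'           # p^{1/2}
    psi = Q * Diagonal(1 ./ sqrt.(λ)) * Q'      # p^{-1/2}
    r = ps * exp(Symmetric(psi * X * psi)) * ps
    return (r + r') / 2                         # symmetrize for is_point
end
```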
zazabap closed this Mar 6, 2026