
feat: GPU ops for Grassmann, Sphere, Euclidean, UnitaryMatrices, SPD (PR #856 port)#1

Closed
zazabap wants to merge 7 commits into JuliaManifolds:main from zazabap:main

Conversation


zazabap (Contributor) commented Mar 5, 2026

Summary

Ports GPU-native manifold operations from Manifolds.jl PR #856 into ManifoldsGPU.jl, implementing batched exp! (and log! where applicable) for five manifolds on PowerManifold(M, batch) with ArrayPowerRepresentation and 3D/2D CuArray.

  • Grassmann: exp! via gesvdj! (thin SVD) + gemm_strided_batched; formula exp_p(X) = (p·V·cos(S) + U·sin(S))·V'; polar retraction identical to Stiefel
  • Sphere: exp! and log! via pure broadcasting; handles θ ≈ 0 and d ≈ 0 with ifelse
  • Euclidean: exp! and log! via broadcasting (flat manifold)
  • UnitaryMatrices: exp! via complex Scaling & Squaring Taylor series (_matrix_exp_gpu extended to CuArray{Complex{T},3}); q = p · expm(X)
  • SymmetricPositiveDefinite: exp! via serial CPU eigendecomp per slice (no batched GPU eigendecomp in CUDA.jl yet); output symmetrized to satisfy is_point

Also includes CI-safe JLArray test suite restructure and test quality fixes.

Test Plan

  • All 67 tests pass (julia --project=. -e 'using Pkg; Pkg.test()')
  • JLArray tests run in CI without GPU hardware (verifies CPU fallback paths)
  • CUDA tests run on GPU hardware (verified locally)
  • Float32 and Float64 variants tested for all manifolds
  • is_point membership check passes for all GPU results
  • CPU/GPU agreement within tolerance (atol=2e-14 for Float64, 2e-5 for Float32); a sketch of the check pattern follows this list
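
A minimal sketch of the agreement check pattern (illustrative, single-point version; the actual testsets exercise the batched PowerManifold methods and live in test/):

```julia
using Test, Random, LinearAlgebra, Manifolds, CUDA

Random.seed!(45)
M = Sphere(2)                 # hypothetical manifold choice for illustration
p = rand(M)
X = rand(M; vector_at=p)

q_cpu = exp(M, p, X)                           # CPU reference
q_gpu = Array(exp(M, CuArray(p), CuArray(X)))  # GPU result, copied back
@test isapprox(q_cpu, q_gpu; atol=2e-14)       # Float64 tolerance from above
@test is_point(M, q_gpu)
```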

Notes

  • Grassmann exp tests use 3-arg isapprox(M, A, B) (subspace distance) rather than element-wise comparison, because CPU LAPACK and GPU cuSOLVER Jacobi can produce different orthonormal representatives of the same subspace when singular values are nearly degenerate; see the snippet after this list
  • SPD exp! is marked with a TODO to replace with syevjBatched when CUDA.jl exposes it
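
The manifold-aware comparison from the first note, demonstrated on a single slice (a sketch; the values are illustrative):

```julia
using Test, Manifolds

M = Grassmann(5, 2)
p = rand(M)
q_rot = p * [0 1; 1 0]        # same 2-plane, different orthonormal basis
@test !(p ≈ q_rot)            # entrywise comparison fails...
@test isapprox(M, p, q_rot)   # ...but the subspace distance is zero
```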

🤖 Generated with Claude Code

zazabap and others added 7 commits March 5, 2026 09:20
- Add GPUArrays.jl and JLArrays.jl as test dependencies for JLArray-based testing
- Split tests: jlarray_tests.jl (CI-safe, no GPU needed) + cuda_tests.jl (manual)
- runtests.jl: always run JLArray tests, conditionally run CUDA tests
- JLArray tests verify CPU fallback correctness and package loading
- Enable scalar indexing for JLArray tests (CPU fallback in Manifolds.jl requires it)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
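A sketch of the conditional layout this commit describes (file names from the commit message; exact contents may differ in the repo):

```julia
# test/runtests.jl
using Test, CUDA

include("jlarray_tests.jl")   # always runs: CPU fallback via JLArrays, CI-safe

if CUDA.functional()          # only attempt CUDA testsets on real GPU hardware
    include("cuda_tests.jl")
end
```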
- Add Random.seed!(45) and is_point check to Stiefel CUDA testset
- Fix tolerance 1e-10 -> 2e-14 in basic Stiefel testset
- Add comment explaining 2e-12 tolerance in fallback stress testset
- Remove redundant using declarations from jlarray_tests.jl
- Tighten JLArrays compat from "0.1, 0.2, 0.3" to "0.3"
- Document cross-file seed reuse

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements batched geodesic exponential and polar retraction for
PowerManifold(Grassmann(n, k), batch) on CuArray{T,3}.

exp! uses cuSOLVER gesvdj! (batched Jacobi SVD) to compute the
geodesic formula exp_p(X) = (p·V·cos(S) + U·sin(S))·V', where
X = U·diag(S)·V' is the thin SVD. gesvdj! returns V directly (not V');
the 'T' flag in the final gemm_strided_batched call applies the transpose.

The polar retraction reuses the Stiefel polar formula (SVD projection
onto Stiefel, which represents the Grassmann quotient). Falls back to
per-slice svd! (via cuSOLVER) when gesvdj! exceeds its size limit.

Tests: Grassmann exp and polar retraction (Float32/Float64, JLArray and
CUDA), plus a stress test. Grassmann exp tests use 3-arg isapprox
(manifold-aware subspace distance) to correctly handle cases where CPU
LAPACK and GPU cuSOLVER converge to different orthogonal representatives
of the same degenerate Grassmann subspace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
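For reference, a single-slice CPU version of the geodesic formula above (a sketch of the math only; the PR's implementation batches this through gesvdj! and gemm_strided_batched):

```julia
using LinearAlgebra

# exp_p(X) = (p·V·cos(S) + U·sin(S))·V' with X = U·diag(S)·V' the thin SVD
function grassmann_exp_slice(p::AbstractMatrix, X::AbstractMatrix)
    F = svd(X)                  # thin SVD; F.V is V, matching gesvdj!'s output
    U, S, V = F.U, F.S, F.V
    return (p * V * Diagonal(cos.(S)) + U * Diagonal(sin.(S))) * V'
end
```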
Implements batched geodesic exponential and logarithm for
PowerManifold(Sphere(n), batch) with ArrayPowerRepresentation.

Strategy: pure broadcasting; no LAPACK/cuSOLVER required.
- exp!: cos(θ)·p + (sin(θ)/θ)·X where θ = ||X||; handles θ ≈ 0 via ifelse
- log!: (d/sin(d))·(q - cos(d)·p) where d = arccos(⟨p,q⟩); handles d ≈ 0

Shape: PowerManifold(Sphere(n), batch) uses (n+1, batch) 2D arrays.
Dispatches on CuArray{T,2}.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
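The broadcast pattern, sketched for a generic (n+1, batch) matrix (CuArray included; helper names are illustrative):

```julia
# exp!: cos(θ)·p + (sin(θ)/θ)·X, one column per point; θ = 0 handled by ifelse
function sphere_exp!(q, p, X)
    θ = sqrt.(sum(abs2, X; dims=1))              # 1 × batch row of norms
    s = ifelse.(θ .> 0, sin.(θ) ./ θ, one.(θ))   # sin(θ)/θ → 1 as θ → 0
    q .= cos.(θ) .* p .+ s .* X
    return q
end

# log!: (d/sin(d))·(q - cos(d)·p) with d = arccos(⟨p,q⟩); d = 0 handled likewise
function sphere_log!(X, p, q)
    c = clamp.(sum(p .* q; dims=1), -1, 1)       # ⟨p,q⟩ per column
    d = acos.(c)
    f = ifelse.(d .> 0, d ./ sin.(d), one.(d))
    X .= f .* (q .- c .* p)
    return X
end
```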
Flat manifold: geodesics are straight lines. Pure broadcasting, no LAPACK.
PowerManifold(Euclidean(n), batch) uses (n, batch) 2D arrays;
dispatches on CuArray{T,N} to handle any dimensionality.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
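The flat case reduces to single broadcasts (a sketch; works for arrays of any dimensionality, matching the CuArray{T,N} dispatch):

```julia
euclidean_exp!(q, p, X) = (q .= p .+ X; q)   # straight-line geodesic
euclidean_log!(X, p, q) = (X .= q .- p; X)
```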
…onential

Extends _matrix_exp_gpu in helpers.jl to support CuArray{Complex{T},3}
(Scaling & Squaring with Taylor series, same orders as real: 18 for Float32,
30 for Float64).

exp! for batched UnitaryMatrices: q = p · expm(X) where X is skew-Hermitian.
expm of a skew-Hermitian matrix is unitary, so q inherits p's unitarity.
Supports ComplexF32 (atol=2e-4) and ComplexF64 (atol=2e-12).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
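A single-matrix CPU sketch of the scaling-and-squaring scheme described above (illustrative only; _matrix_exp_gpu applies the same idea batched over CuArray{Complex{T},3}):

```julia
using LinearAlgebra

function expm_taylor(X::AbstractMatrix{T}; order::Int=30) where {T}
    nrm = opnorm(X, 1)
    s = nrm > 1 ? ceil(Int, log2(nrm)) : 0    # scale so ‖X/2^s‖₁ ≤ 1
    A = X / T(2)^s
    E = Matrix{T}(I, size(A)...)
    term = Matrix{T}(I, size(A)...)
    for k in 1:order                          # truncated Taylor series
        term = term * A / k
        E += term
    end
    for _ in 1:s                              # undo scaling by squaring
        E = E * E
    end
    return E
end

# For skew-Hermitian X, expm_taylor(X) is (numerically) unitary, so
# q = p * expm_taylor(X) stays on the unitary group.
```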
Serial batch implementation: each SPD exp uses CPU eigendecomposition
(no batched GPU eigendecomp available in CUDA.jl at time of writing).

Formula: exp_p(X) = p^{1/2} · expm(p^{-1/2} · X · p^{-1/2}) · p^{1/2}
Output is symmetrized (0.5*(result + result')) to ensure exact symmetry
within Manifolds.jl's is_point tolerance.

TODO: replace with cusolverDnSsyevjBatched when exposed in CUDA.jl.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
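The per-slice computation, sketched on the CPU (what the serial fallback does for each batch member; names are illustrative):

```julia
using LinearAlgebra

# exp_p(X) = p^{1/2} · expm(p^{-1/2} · X · p^{-1/2}) · p^{1/2}
function spd_exp_slice(p::AbstractMatrix, X::AbstractMatrix)
    λ, Q = eigen(Symmetric(p))                  # p = Q·Diagonal(λ)·Q'
    ps  = Q * Diagonal(sqrt.(λ)) * Q'           # p^{1/2}
    psi = Q * Diagonal(1 ./ sqrt.(λ)) * Q'      # p^{-1/2}
    r = ps * exp(Symmetric(psi * X * psi)) * ps
    return (r + r') / 2                         # symmetrize for is_point
end
```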
zazabap closed this Mar 6, 2026