feat: GPU ops for Grassmann, Sphere, Euclidean, UnitaryMatrices, SPD (PR #856 port) #1
Closed
zazabap wants to merge 7 commits into JuliaManifolds:main from
Conversation
- Add GPUArrays.jl and JLArrays.jl as test dependencies for JLArray-based testing
- Split tests: jlarray_tests.jl (CI-safe, no GPU needed) + cuda_tests.jl (manual)
- runtests.jl: always run JLArray tests, conditionally run CUDA tests
- JLArray tests verify CPU fallback correctness and package loading
- Enable scalar indexing for JLArray tests (CPU fallback in Manifolds.jl requires it)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add Random.seed!(45) and is_point check to Stiefel CUDA testset
- Fix tolerance 1e-10 -> 2e-14 in basic Stiefel testset
- Add comment explaining 2e-12 tolerance in fallback stress testset
- Remove redundant using declarations from jlarray_tests.jl
- Tighten JLArrays compat from "0.1, 0.2, 0.3" to "0.3"
- Document cross-file seed reuse

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements batched geodesic exponential and polar retraction for
PowerManifold(Grassmann(n, k), batch) on CuArray{T,3}.
exp! uses cuSOLVER gesvdj! (batched Jacobi SVD) to compute the
geodesic formula exp_p(X) = (p·V·cos(S) + U·sin(S))·V', where
X = U·diag(S)·V' is the thin SVD. gesvdj! returns V directly (not V');
the 'T' flag in the final gemm_strided_batched call applies the transpose.
The polar retraction reuses the Stiefel polar formula (SVD projection
onto Stiefel, which represents the Grassmann quotient). Falls back to
per-slice svd! (via cuSOLVER) when gesvdj! exceeds its size limit.
Tests: Grassmann exp and polar retraction (Float32/Float64, JLArray and
CUDA), plus a stress test. Grassmann exp tests use 3-arg isapprox
(manifold-aware subspace distance) to correctly handle cases where CPU
LAPACK and GPU cuSOLVER converge to different orthogonal representatives
of the same degenerate Grassmann subspace.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
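The geodesic formula above can be sketched as a CPU reference in NumPy (illustrative only: the PR's implementation is Julia/CUDA via `gesvdj!` and `gemm_strided_batched`, and `grassmann_exp` is a hypothetical name):

```python
import numpy as np

# Hypothetical single-slice CPU reference for the batched Grassmann exponential.
def grassmann_exp(p, X):
    """exp_p(X) = (p·V·cos(S) + U·sin(S))·V', with X = U·diag(S)·V' the thin SVD."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)  # thin SVD of the tangent vector
    V = Vt.T  # like gesvdj!, work with V and apply the transpose at the end
    return (p @ V @ np.diag(np.cos(S)) + U @ np.diag(np.sin(S))) @ V.T

rng = np.random.default_rng(0)
n, k = 5, 2
# Point p: orthonormal columns (a Stiefel representative of the subspace)
p, _ = np.linalg.qr(rng.standard_normal((n, k)))
# Tangent vector: horizontal lift satisfies p' X = 0
Z = rng.standard_normal((n, k))
X = Z - p @ (p.T @ Z)

q = grassmann_exp(p, X)
assert np.allclose(q.T @ q, np.eye(k), atol=1e-10)      # q has orthonormal columns
assert np.allclose(grassmann_exp(p, np.zeros_like(X)), p)  # exp_p(0) = p
```

The batched GPU version applies this formula slice-by-slice across the third array dimension with one `gesvdj!` and a few `gemm_strided_batched` calls instead of a Python loop.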
Implements batched geodesic exponential and logarithm for
PowerManifold(Sphere(n), batch) with ArrayPowerRepresentation.
Strategy: pure broadcasting — no LAPACK/cuSOLVER required.
- exp!: cos(θ)·p + sin(θ)/θ·X where θ=||X||, handles θ≈0 via ifelse
- log!: d·(q - cos(d)·p)/sin(d) where d=arccos(⟨p,q⟩), handles d≈0
Shape: PowerManifold(Sphere(n), batch) uses (n+1, batch) 2D arrays.
Dispatches on CuArray{T,2}.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
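The broadcasting strategy can be sketched in NumPy (an illustrative CPU reference, not the PR's Julia code; `np.sinc` supplies the well-behaved `sin(θ)/θ` in place of the `ifelse` guard):

```python
import numpy as np

# Columns are points on S^n; arrays have shape (n+1, batch), as in the PR.
def sphere_exp(p, X):
    theta = np.linalg.norm(X, axis=0)          # θ = ||X|| per column
    # np.sinc(θ/π) = sin(θ)/θ, and equals 1 at θ = 0 (the θ ≈ 0 case)
    return np.cos(theta) * p + np.sinc(theta / np.pi) * X

def sphere_log(p, q):
    d = np.arccos(np.clip(np.sum(p * q, axis=0), -1.0, 1.0))  # geodesic distance
    return (q - np.cos(d) * p) / np.sinc(d / np.pi)           # d·(q − cos(d)·p)/sin(d)

rng = np.random.default_rng(1)
dim, batch = 4, 3                      # three points on S^3
p = rng.standard_normal((dim, batch))
p /= np.linalg.norm(p, axis=0)         # normalize columns onto the sphere
Z = rng.standard_normal((dim, batch))
X = Z - p * np.sum(p * Z, axis=0)      # project to the tangent space: ⟨p, X⟩ = 0
X *= 0.5 / np.linalg.norm(X, axis=0)   # keep θ = 0.5 < π so log inverts exp

q = sphere_exp(p, X)
assert np.allclose(np.linalg.norm(q, axis=0), 1.0)   # q stays on the sphere
assert np.allclose(sphere_log(p, q), X, atol=1e-10)  # log ∘ exp is the identity
```

Everything here is elementwise or a column-wise reduction, which is why the GPU version needs no LAPACK/cuSOLVER at all.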
Flat manifold — geodesics are straight lines. Pure broadcasting, no LAPACK.
PowerManifold(Euclidean(n), batch) uses (n, batch) 2D arrays;
dispatches on CuArray{T,N} to handle any dimensionality.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
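As a minimal NumPy sketch of the flat case (illustrative only):

```python
import numpy as np

# Flat manifold: geodesics are straight lines, so exp/log reduce to
# broadcast addition/subtraction over any batch shape.
p = np.arange(12.0).reshape(3, 4)   # (n, batch) points in Euclidean(3)
X = np.ones((3, 4))                 # tangent vectors
q = p + X                           # exp_p(X): straight-line geodesic
assert np.allclose(q - p, X)        # log_p(q) recovers the tangent vector
```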
…onential
Extends _matrix_exp_gpu in helpers.jl to support CuArray{Complex{T},3}
(Scaling & Squaring with Taylor series, same orders as real: 18 for Float32,
30 for Float64).
exp! for batched UnitaryMatrices: q = p · expm(X) where X is skew-Hermitian.
expm of a skew-Hermitian matrix is unitary, so q inherits p's unitarity.
Supports ComplexF32 (atol=2e-4) and ComplexF64 (atol=2e-12).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
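The scaling-and-squaring scheme can be sketched as a CPU reference in NumPy (illustrative only; the scaling heuristic and Taylor order here are simplified relative to `_matrix_exp_gpu`, and `matrix_exp_taylor` is a hypothetical name):

```python
import numpy as np

def matrix_exp_taylor(A, order=18):
    """expm(A) via scaling & squaring with a truncated Taylor series."""
    # Scale A by 2^s so the series converges quickly, then square back.
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, 1), 1.0)))) + 1)
    B = A / 2**s
    E = np.eye(A.shape[0], dtype=A.dtype)
    term = E.copy()
    for k in range(1, order + 1):      # Taylor series: sum B^k / k!
        term = term @ B / k
        E = E + term
    for _ in range(s):                 # exp(A) = exp(B)^(2^s)
        E = E @ E
    return E

# Skew-Hermitian X ⇒ expm(X) unitary ⇒ q = p·expm(X) inherits p's unitarity
rng = np.random.default_rng(2)
Z = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X = (Z - Z.conj().T) / 2               # skew-Hermitian: X' = −X
p = np.eye(3, dtype=complex)           # a unitary starting point
q = p @ matrix_exp_taylor(X)
assert np.allclose(q.conj().T @ q, np.eye(3), atol=1e-10)  # q is unitary
```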
Serial batch implementation: each SPD exp uses CPU eigendecomposition
(no batched GPU eigendecomp available in CUDA.jl at time of writing).
Formula: exp_p(X) = p^{1/2} · expm(p^{-1/2} · X · p^{-1/2}) · p^{1/2}
Output is symmetrized (0.5*(result + result')) to ensure exact symmetry
within Manifolds.jl's is_point tolerance.
TODO: replace with cusolverDnSsyevjBatched when exposed in CUDA.jl.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
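The per-slice formula can be sketched as a CPU reference in NumPy (illustrative only; `spd_exp` is a hypothetical name, and the real code runs this once per batch slice):

```python
import numpy as np

def spd_exp(p, X):
    """exp_p(X) = p^{1/2} · expm(p^{-1/2} · X · p^{-1/2}) · p^{1/2}, symmetrized."""
    w, Q = np.linalg.eigh(p)                  # p = Q·diag(w)·Q', w > 0 since p is SPD
    ps = Q @ np.diag(np.sqrt(w)) @ Q.T        # p^{1/2}
    psi = Q @ np.diag(1 / np.sqrt(w)) @ Q.T   # p^{-1/2}
    M = psi @ X @ psi                         # symmetric middle factor
    mw, mQ = np.linalg.eigh(M)
    eM = mQ @ np.diag(np.exp(mw)) @ mQ.T      # expm via eigendecomposition
    r = ps @ eM @ ps
    return 0.5 * (r + r.T)                    # symmetrize to kill rounding asymmetry

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
p = A @ A.T + 3 * np.eye(3)          # an SPD point
S = rng.standard_normal((3, 3))
X = 0.5 * (S + S.T)                  # a symmetric tangent vector

q = spd_exp(p, X)
assert np.allclose(q, q.T)                           # exactly symmetric
assert np.all(np.linalg.eigvalsh(q) > 0)             # positive definite
assert np.allclose(spd_exp(p, np.zeros_like(X)), p)  # exp_p(0) = p
```

The final symmetrization is what keeps the result inside `is_point`'s tolerance even after floating-point rounding in the three matrix products.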
Summary
Ports GPU-native manifold operations from Manifolds.jl PR #856 into ManifoldsGPU.jl, implementing batched `exp!` (and `log!` where applicable) for five manifolds on `PowerManifold(M, batch)` with `ArrayPowerRepresentation` and 3D/2D `CuArray`.

- **Grassmann**: `exp!` via `gesvdj!` (thin SVD) + `gemm_strided_batched`; formula `exp_p(X) = (p·V·cos(S) + U·sin(S))·V'`; polar retraction identical to Stiefel
- **Sphere**: `exp!` and `log!` via pure broadcasting; handles `θ ≈ 0` and `d ≈ 0` with `ifelse`
- **Euclidean**: `exp!` and `log!` via broadcasting (flat manifold)
- **UnitaryMatrices**: `exp!` via complex Scaling & Squaring Taylor series (`_matrix_exp_gpu` extended to `CuArray{Complex{T},3}`); `q = p · expm(X)`
- **SPD**: `exp!` via serial CPU eigendecomp per slice (no batched GPU eigendecomp in CUDA.jl yet); output symmetrized to satisfy `is_point`

Also includes a CI-safe JLArray test suite restructure and test quality fixes.
Test Plan
- `julia --project=. -e 'using Pkg; Pkg.test()'`
- `is_point` membership check passes for all GPU results
- Tolerances: `atol=2e-14` for Float64, `2e-5` for Float32

Notes

- Grassmann `exp` tests use 3-arg `isapprox(M, A, B)` (subspace distance) rather than element-wise comparison, since CPU LAPACK and GPU cuSOLVER Jacobi can produce different orthonormal representatives of the same subspace when singular values are nearly degenerate
- SPD `exp!` is marked with a TODO to replace with `syevjBatched` when CUDA.jl exposes it

🤖 Generated with Claude Code