
add GPU log!, inverse_retract_polar!, parallel_transport_direction!, distance for Grassmann #8

Merged
mateuszbaran merged 2 commits into JuliaManifolds:main from zazabap:feat/grassmann-log-transport
Apr 9, 2026

Conversation

@zazabap (Contributor) commented Apr 9, 2026

Summary

Add batched GPU overrides for Grassmann `log!`, `inverse_retract_polar!`, `parallel_transport_direction!`, and `distance` on `PowerManifold{ℝ, <:Grassmann{ℝ}}` with `CuArray{T,3}`. Covers steps 3 + 4 from the roadmap (#5).

The batched right division `q / (p'q)` is computed as `q * inv(p'q)` via the existing `_batched_inv_gpu` (safe for the small, well-conditioned k×k matrices that arise on Grassmann).
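As a sanity check of the formula above, the polar inverse retraction can be reproduced on the CPU for a single point pair (a minimal sketch with plain `LinearAlgebra`, not the PR's batched CUDA code; `inverse_retract_polar_ref` is an illustrative name):

```julia
using LinearAlgebra

# Polar inverse retraction on Grassmann(n, k): X = q * inv(p'q) - p.
# The GPU version does the same per batch slice: a batched LU inverse
# of the k×k Gram matrices followed by a strided GEMM.
function inverse_retract_polar_ref(p, q)
    return q * inv(p' * q) - p
end

n, k = 8, 3
p = Matrix(qr(randn(n, k)).Q)   # orthonormal n×k basis: a point on Grassmann(8, 3)
q = Matrix(qr(randn(n, k)).Q)
X = inverse_retract_polar_ref(p, q)
```

For `q = p` the Gram matrix `p'q` is the identity, so the resulting tangent vector is exactly zero.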

Operations added

| Method | Algorithm |
| --- | --- |
| `inverse_retract_polar!` | `q * inv(p'q) - p` via batched LU inverse + GEMM |
| `inverse_retract!` | Dispatcher for `PolarInverseRetraction` |
| `log!` | Inverse retract → batched SVD → `U * diag(atan(S)) * V'` |
| `parallel_transport_direction!` | SVD of direction + sin/cos rotation + projection (avoids n×n identity) |
| `vector_transport_direction!` | Dispatcher for `ParallelTransport` |
| `distance` | Inverse retract → batched SVD → `norm(atan.(S))` |
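The `log!` and `distance` rows above share the same SVD step; a single-sample CPU sketch of both formulas (hedged reference code, `log_ref`/`distance_ref` are illustrative names, not the PR's kernels):

```julia
using LinearAlgebra

# Grassmann log: polar inverse retraction, then thin SVD, then atan of
# the singular values: log_p(q) = U * diag(atan.(S)) * V'.
function log_ref(p, q)
    X = q * inv(p' * q) - p
    F = svd(X)
    return F.U * Diagonal(atan.(F.S)) * F.Vt
end

# The geodesic distance reuses the same singular values: norm(atan.(S)).
distance_ref(p, q) = norm(atan.(svd(q * inv(p' * q) - p).S))
```

Since `U` and `V` are orthonormal, the Frobenius norm of the log equals `norm(atan.(S))`, which is why `distance` can skip the final GEMMs.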

Files changed

| File | Change |
| --- | --- |
| `ext/ManifoldsGPUCUDAExt/Grassmann.jl` | +104 lines: 6 new methods |
| `ext/ManifoldsGPUCUDAExt/ManifoldsGPUCUDAExt.jl` | +6 lines: imports for new functions |
| `test/cuda/test_grassmann.jl` | +231 lines: 14 test blocks (Float64/Float32 + large-matrix fallback) |
| `benchmarks/utils.jl` | +32 lines: `_benchmark_distance` helper |
| `benchmarks/Grassmann.jl` | +4 lines: distance benchmark entry |

Benchmarks

Grassmann(32, 16), batch=2048, Float32 — within gesvdj! limit:

| Operation | CPU (ms) | GPU (ms) | Speedup | Relative error |
| --- | --- | --- | --- | --- |
| `exp` | 167.38 | 5.99 | 27.93x | 7.02e-5 |
| `log!` | 151.02 | 3.76 | 40.12x | 1.25e-5 |
| `inner` | 0.59 | 0.10 | 5.88x | 2.64e-6 |
| `norm` | 1.52 | 0.09 | 17.43x | 1.85e-7 |
| `project!` | 2.25 | 0.19 | 12.01x | 1.32e-7 |
| `retract_fused!(PolarRetraction)` | 91.29 | 3.25 | 28.05x | 1.33e-6 |
| `distance` | 146.90 | 3.62 | 40.53x | 0.0 |

Grassmann(64, 32), batch=2048, Float32 — exceeds gesvdj! limit, uses gesvda! fallback:

| Operation | CPU (ms) | GPU (ms) | Speedup | Relative error |
| --- | --- | --- | --- | --- |
| `exp` | 720.21 | 225.22 | 3.20x | 8.21e-5 |
| `log!` | 510.03 | 122.64 | 4.16x | 9.38e-5 |
| `inner` | 2.21 | 0.12 | 18.40x | 3.76e-4 |
| `norm` | 5.75 | 0.11 | 53.52x | 1.31e-7 |
| `project!` | 9.67 | 0.56 | 17.31x | 1.96e-7 |
| `retract_fused!(PolarRetraction)` | 327.76 | 116.23 | 2.82x | 6.84e-7 |
| `distance` | 500.25 | 126.54 | 3.95x | 6.55e-8 |

SVD-heavy operations (`log!`, `distance`, `exp`, polar retract) get a ~40x speedup within the 32×32 `gesvdj!` limit and ~4x with the `gesvda!` fallback for larger matrices.

Tests

  • 14 new CUDA testsets: Float64/Float32 for each operation + large-matrix (64×32) fallback tests for log!, parallel_transport_direction!, and distance
  • All 243 existing JLArray tests pass

@mateuszbaran (Member) left a comment
Thanks, this looks good again 👍 . I only have one small suggestion for a small improvement 🙂 .

Review thread on `ext/ManifoldsGPUCUDAExt/Grassmann.jl` (outdated), addressed by a follow-up commit:

> Replace allocating `gemm_strided_batched` with in-place `gemm_strided_batched!` in `inverse_retract_polar!`, `log!`, and `parallel_transport_direction!` as suggested in PR review.
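The suggestion amounts to preallocating the GEMM output and calling the bang variant; schematically (a hedged sketch of the CUDA.jl CUBLAS calls, assuming 3-D `CuArray` batches and a CUDA-capable device; not the PR's exact code):

```julia
using CUDA

A = CUDA.randn(Float32, 16, 16, 2048)   # batch of 2048 16×16 matrices
B = CUDA.randn(Float32, 16, 16, 2048)
C = CUDA.zeros(Float32, 16, 16, 2048)   # preallocated output buffer

# Allocating variant: returns a freshly allocated batch on every call.
C_alloc = CUDA.CUBLAS.gemm_strided_batched('N', 'N', A, B)

# In-place variant: writes alpha*A*B + beta*C into C, saving one device
# allocation (and GC pressure) per call in the hot path.
CUDA.CUBLAS.gemm_strided_batched!('N', 'N', 1f0, A, B, 0f0, C)
```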
@mateuszbaran mateuszbaran merged commit a13ddfd into JuliaManifolds:main Apr 9, 2026
5 checks passed
