v0.7.0
AMDGPU v0.7.0
Merged pull requests:
- Enable 5.4 JLLs on LLVM <16 (#503) (@jpsamaroo)
- Use refs instead of pointers to get a slightly friendlier abi (#504) (@gbaraldi)
- Bump actions/checkout from 3 to 4 (#506) (@dependabot[bot])
- Add ROCm mixed mode (#508) (@pxl-th)
- Do runtime ROCm discovery (#509) (@pxl-th)
- Switch tests to ReTestItems.jl (#511) (@pxl-th)
- Use non-blocking synchronization by default (#512) (@pxl-th)
- Bump GPUCompiler to 0.25 (#513) (@pxl-th)
- Add a method for getrf! (#514) (@amontoison)
- Use branches instead of 'ifelse' (#519) (@pxl-th)
- Interface getrf_batched and getri_batched (#520) (@amontoison)
- Bring back CI (#523) (@pxl-th)
- Add workgroup synchronization primitives (#524) (@pxl-th)
- Use HIP for retrieving GCN arch (#525) (@pxl-th)
- Mention Julia 1.10+ requirement for Navi 3 (#526) (@pxl-th)
Closed issues:
- Runtime Locking (#64)
- 2x slower AMDGPU.jl kernel compared to HIP (#331)
- sincos() x3.5 slower than separate sin()/cos() calls (#341)
- HSA memory fault using AMDGPU.rand() on device ≠ 1 (#386)
- WARNING: could not import AMDGPU.device_libs_path into Compiler (#434)
- sincos intrinsic is broken with GPUCompiler 0.24 (#502)
- Navi 3 causes malloc(): unsorted double linked list corrupted (#518)