Commit deb2e10

fix(cuda_std): use correct PTX scope suffix in block acqrel fence
This changes the inline assembly in `fence_acqrel_block` from the `.sys` scope suffix to the correct `.cta` (cooperative thread array, i.e. thread block) suffix, so that block-level fences no longer incur the unnecessary overhead of system-wide synchronization.
1 parent 60b86e1 commit deb2e10

1 file changed

Lines changed: 1 addition & 1 deletion

crates/cuda_std/src/atomic/intrinsics.rs

@@ -42,7 +42,7 @@ pub unsafe fn fence_acqrel_device() {
 
 #[gpu_only]
 pub unsafe fn fence_acqrel_block() {
-    asm!("fence.acq_rel.sys;");
+    asm!("fence.acq_rel.cta;");
 }
 
 #[gpu_only]
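
To make the scope distinction in this fix concrete, the following is a minimal host-side sketch of how PTX fence scopes map to instruction suffixes. The names `FenceScope`, `ptx_suffix`, and `acqrel_fence_ptx` are hypothetical and not part of `cuda_std`; the sketch only builds the instruction strings and does not emit any inline assembly.

```rust
/// Hypothetical illustration only: none of these names exist in cuda_std.
/// PTX memory-model scopes, listed from narrowest to widest.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FenceScope {
    /// `.cta`: threads in the same thread block (cooperative thread array).
    Block,
    /// `.gpu`: all threads on the same device.
    Device,
    /// `.sys`: all threads on all devices, plus the host.
    System,
}

impl FenceScope {
    /// Returns the PTX scope suffix (without the leading dot).
    pub const fn ptx_suffix(self) -> &'static str {
        match self {
            FenceScope::Block => "cta",
            FenceScope::Device => "gpu",
            FenceScope::System => "sys",
        }
    }
}

/// Builds the acquire-release fence instruction for the given scope.
pub fn acqrel_fence_ptx(scope: FenceScope) -> String {
    format!("fence.acq_rel.{};", scope.ptx_suffix())
}
```

Under these assumptions, `acqrel_fence_ptx(FenceScope::Block)` yields the string used after this commit, while `FenceScope::System` yields the wider-than-needed instruction it replaces.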