Commit d91ea94

[Bwd,Sm100] Add fence_view_async_shared before LSE release
1 parent 5ded17f commit d91ea94

1 file changed (+1, -0)

flash_attn/cute/flash_bwd_sm100.py

Lines changed: 1 addition & 0 deletions
@@ -3133,6 +3133,7 @@ def compute_loop(
 )
 
 cute.arch.fence_view_async_tmem_store()
+cute.arch.fence_view_async_shared()
 self.compute_sync_barrier.arrive_and_wait()
 if const_expr(not self.tile_hdim == 192):
     # Signal tmem store P completion with pipeline_S_P
