Pull requests: Dao-AILab/flash-attention

layers/rotary: add rope_scaling support (linear, NTK, YaRN)
#2451 · opened Apr 11, 2026 by GitGlimpse895

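A minimal sketch of what "linear" and "NTK" scaling do to the rotary frequencies, assuming an HF-style `rope_scaling` config; YaRN additionally interpolates per-frequency and rescales attention temperature, which is omitted here. Function names are illustrative, not the PR's API.

```python
# Illustrative sketch of RoPE frequency scaling (not the PR's code).
# "linear" compresses positions by the factor; "ntk" stretches the base.
import torch

def scaled_inv_freq(dim, base=10000.0, factor=4.0, mode="linear"):
    if mode == "ntk":
        # NTK-aware scaling: enlarge the base so low frequencies stretch more.
        base = base * factor ** (dim / (dim - 2))
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def rope_angles(positions, inv_freq, factor=4.0, mode="linear"):
    if mode == "linear":
        positions = positions / factor  # map positions into the trained range
    return torch.outer(positions.float(), inv_freq)  # (seq, dim/2) angles
```
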
[Perf] SM103 tcgen05.ld.red for fused TMEM load + row-max in softmax
#2449 · opened Apr 9, 2026 by LopezCastroRoberto

expose num_splits for FA2 and add option for kernel blocksize alignment
#2448 · opened Apr 8, 2026 by liangel-02 (Contributor)

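For context, a hedged usage sketch: in current releases `flash_attn_with_kvcache` already exposes a `num_splits` argument (0 lets the internal heuristic decide), and this PR presumably adds a similar knob to the core FA2 entry points. The shapes below are arbitrary.

```python
# Hedged sketch: num_splits as already exposed on the KV-cache path.
import torch
from flash_attn import flash_attn_with_kvcache

q = torch.randn(1, 1, 8, 128, dtype=torch.float16, device="cuda")
k_cache = torch.randn(1, 4096, 8, 128, dtype=torch.float16, device="cuda")
v_cache = torch.randn(1, 4096, 8, 128, dtype=torch.float16, device="cuda")

# num_splits=4 forces the KV sequence to be processed in 4 splits;
# num_splits=0 (the default) defers to the internal heuristic.
out = flash_attn_with_kvcache(q, k_cache, v_cache, num_splits=4)
```
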
Add forward and backward support for FA4 hdim=256 on SM100
#2447 · opened Apr 8, 2026 by wenkechen

[Cute,Fwd,Sm90] Ceil div in paged kv manager to prevent size 0
#2446 · opened Apr 8, 2026 by imbr92 (Contributor)

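The bug class the title names is easy to show: floor division rounds a short sequence down to zero KV pages, while ceil division guarantees at least one. A minimal sketch with hypothetical helper names:

```python
# Floor division can produce a size-0 page count for short sequences.
def num_pages_floor(seqlen_k, page_size):
    return seqlen_k // page_size                    # 5 // 64 -> 0 (bug)

def num_pages_ceil(seqlen_k, page_size):
    return (seqlen_k + page_size - 1) // page_size  # (5+63) // 64 -> 1

assert num_pages_floor(5, 64) == 0
assert num_pages_ceil(5, 64) == 1
```
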
[FA3][Hopper] Make the split heuristic sequence-aware for low-tile decode
#2445 · opened Apr 8, 2026 by mllopartbsc

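What "sequence-aware" could mean here, as a purely hypothetical sketch (the policy and all names are assumptions, not the PR's code): cap the split count by the number of KV tiles a decode step actually has, so short sequences are not carved into empty splits.

```python
# Hypothetical sequence-aware split heuristic (illustration only).
def choose_num_splits(seqlen_k, block_n, batch_heads, num_sms, max_splits=128):
    n_tiles = (seqlen_k + block_n - 1) // block_n
    # Splits needed to occupy all SMs, given batch*heads blocks already launched.
    wanted = max(1, num_sms // max(1, batch_heads))
    return min(wanted, n_tiles, max_splits)  # never more splits than tiles

# Short decode sequence: only 2 KV tiles, so at most 2 splits despite 132 SMs.
print(choose_num_splits(seqlen_k=256, block_n=128, batch_heads=8, num_sms=132))
```
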
[Cute,Sm100,Fwd] add MLA 64/512 with topk sparsity for MQA 128 heads
#2441 · opened Apr 6, 2026 by jayhshah (Collaborator)

Add dropout support to CuTe DSL attention kernels
#2439 · opened Apr 6, 2026 by blake-snc (Contributor) · 10 of 12 tasks

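The reference semantics such kernels must match: dropout is applied to the softmax probabilities with inverted-dropout rescaling. In-kernel implementations typically use a counter-based RNG so forward and backward regenerate the same mask; this sketch only shows the math.

```python
# Reference semantics for attention dropout (not the kernel implementation).
import torch

def attn_ref_with_dropout(q, k, v, p_drop=0.1, training=True):
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    probs = scores.softmax(dim=-1)
    if training and p_drop > 0:
        keep = (torch.rand_like(probs) >= p_drop).to(probs.dtype)
        probs = probs * keep / (1.0 - p_drop)  # inverted dropout scaling
    return probs @ v
```
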
[CuTe, SM120] Fix forward/backward bugs and add SM120 test guards
#2420 · opened Apr 1, 2026 by 2imi9

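A typical shape for such a guard, assuming pytest-style tests: skip unless the device reports compute capability 12.x. The marker name is illustrative.

```python
# Illustrative arch guard: run a test only on SM120 (compute capability 12.x).
import pytest
import torch

requires_sm120 = pytest.mark.skipif(
    not torch.cuda.is_available()
    or torch.cuda.get_device_capability()[0] != 12,
    reason="requires an SM120 (compute capability 12.x) GPU",
)

@requires_sm120
def test_fwd_bwd_sm120():
    ...
```
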
Add compress_factor for compressed causal attention
#2418 · opened Mar 31, 2026 by jduprat (Contributor)

Fix SM120 forward pass crash: parent __init__ overwrites arch, enabling unsupported TMA path
#2416 · opened Mar 31, 2026 by moghon92

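A minimal repro of the bug class the title describes: the child sets an attribute before calling `super().__init__`, and the parent's default overwrites it. Class names are hypothetical.

```python
# The child's attribute is clobbered because super().__init__ runs afterwards.
class Base:
    def __init__(self):
        self.arch = "sm90"        # parent default

class Sm120Kernel(Base):
    def __init__(self):
        self.arch = "sm120"
        super().__init__()        # bug: resets self.arch to "sm90"

print(Sm120Kernel().arch)          # "sm90", so the wrong (TMA) path is chosen

# Fix: call super().__init__() first, then override the attribute.
```
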
Feat([FA4][CUTE DSL]) Add head_dim=256 support (forward + backward)
#2412 · opened Mar 30, 2026 by wangsiyu

chore(tests): move benchmarks to benchmarks/cute/ and reduce test prints
#2408 · opened Mar 29, 2026 by NJX-njx (Contributor) · 3 tasks

fix(flash_fwd_sm90): zero partial V smem to prevent 0*NaN=NaN in PV GEMM
#2407 · opened Mar 29, 2026 by NJX-njx (Contributor) · 3 tasks

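The IEEE-754 detail behind this fix is worth spelling out: a masked-out probability of exactly 0 multiplied by stale NaN data is still NaN, so uninitialized shared-memory padding in the V tile can poison the PV GEMM. A two-line demonstration:

```python
# 0 * NaN = NaN under IEEE-754, so zeroed probabilities do not mask stale smem.
import torch

p = torch.tensor([0.0])                   # masked-out attention probability
v_uninit = torch.tensor([float("nan")])   # stale/uninitialized smem value
print(p * v_uninit)                       # tensor([nan]): pollutes the output

v_zeroed = torch.zeros(1)                 # zero-filled padding (the fix)
print(p * v_zeroed)                       # tensor([0.]): harmless
```
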
[CuTe, SM120] Forward kernel with optimized TMA path and full features support
#2406 · opened Mar 28, 2026 by sisgrad

feat: setup_context for FlashAttnFunc (torch.func.grad)
#2405 · opened Mar 28, 2026 by NJX-njx (Contributor)

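The `setup_context` pattern (PyTorch 2.0+) splits `forward` from context saving, which is what makes a custom `autograd.Function` usable under `torch.func` transforms such as `grad`. A generic sketch of the pattern, not the PR's FlashAttnFunc code:

```python
# setup_context-style autograd.Function, compatible with torch.func.grad.
import torch

class MyScale(torch.autograd.Function):
    @staticmethod
    def forward(x, scale):            # note: no ctx argument in this style
        return x * scale

    @staticmethod
    def setup_context(ctx, inputs, output):
        _, scale = inputs
        ctx.scale = scale             # stash what backward needs

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out * ctx.scale, None

x = torch.randn(3)
g = torch.func.grad(lambda t: MyScale.apply(t, 2.0).sum())(x)
print(g)                              # all elements are 2.0
```
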
fix(cute): SM120 forward/bwd and atomic add compatibility
#2404 · opened Mar 28, 2026 by NJX-njx (Contributor)

build(windows): MSVC conforming preprocessor for CUDA 13+ and ninja warning
#2403 · opened Mar 28, 2026 by NJX-njx (Contributor)

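The MSVC flag in question is `/Zc:preprocessor` (the conforming preprocessor). A hedged sketch of how a `setup.py` might forward it on Windows; the exact plumbing in this PR may differ.

```python
# Sketch: pass /Zc:preprocessor to the MSVC host compiler on Windows builds.
import sys

extra_compile_args = {"cxx": [], "nvcc": []}
if sys.platform == "win32":
    extra_compile_args["cxx"].append("/Zc:preprocessor")
    # nvcc forwards host-compiler flags via -Xcompiler.
    extra_compile_args["nvcc"].append("-Xcompiler=/Zc:preprocessor")
```
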
[Cute,Sm80,Fwd] Guard kernel body with tile validity check to fix varlen illegal memory access
#2391 · opened Mar 25, 2026 by zhuochenKIDD (Contributor)

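The guard pattern, rendered in Python for readability (the actual fix is kernel-side): in varlen batches the grid is sized for the longest sequence, so a block whose tile start lies beyond this sequence's length must exit before issuing any loads or stores.

```python
# Python rendering of a kernel-side tile validity guard (illustration only).
def kernel_body(m_block, block_m, seqlen_q):
    # The grid covers the longest sequence in the varlen batch, so some
    # blocks land past the end of shorter sequences.
    if m_block * block_m >= seqlen_q:
        return          # invalid tile: exit before any memory access
    ...                 # real work for valid tiles only
```
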
Add SM80/SM120 block-sparse forward attention support
#2389 · opened Mar 25, 2026 by blake-snc (Contributor)

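The semantics block-sparse attention has to reproduce, as a dense reference sketch (function and argument names are illustrative): a boolean tile mask selects which (q_block, k_block) tiles are computed, and masked tiles contribute -inf scores. A fully masked row would yield NaN from softmax; real masks keep at least the diagonal, and fused kernels simply skip masked tiles instead of materializing -inf.

```python
# Dense reference for block-sparse attention semantics (not the kernel).
import torch

def block_sparse_attn_ref(q, k, v, block_mask, block_q=64, block_k=64):
    # q: (seq_q, d), k/v: (seq_k, d), block_mask: (seq_q//block_q, seq_k//block_k)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    # Expand the tile-level mask to element level and mask skipped tiles.
    mask = block_mask.repeat_interleave(block_q, 0).repeat_interleave(block_k, 1)
    scores = scores.masked_fill(~mask, float("-inf"))
    return scores.softmax(dim=-1) @ v
```
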