Commit 21fc296

Added benchmark for LLaMA 3 model for attention tests (#3930)
## Summary

- Added a new benchmark for the new low-precision attention API
- Uses the Llama3-8b model, prefill
- Two-phase test:
  - perplexity using the WikiText-2 test set
  - runtime over different sequence lengths (1024 -> 131072)
- The baseline and test models can each be set to a different backend: fa2, fa3, fa3_fp8, fa4, fa4_fp8
- Options to control torch.compile usage, warmup_iters, sequence length, number of runtime test iterations, and RoPE fusion

## Example Run

```
python benchmarks/prototype/attention/eval_llama3_model.py --baseline fa3 --test fa3_fp8
```
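The options listed in the summary suggest a command-line interface roughly like the sketch below. Only `--baseline` and `--test` appear in the example run; every other flag name, default, and the `build_parser` helper are assumptions for illustration, not the script's actual interface.

```python
import argparse

# Backends named in the commit message.
BACKENDS = ["fa2", "fa3", "fa3_fp8", "fa4", "fa4_fp8"]

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI sketch; flag names beyond --baseline/--test are assumed.
    parser = argparse.ArgumentParser(
        description="Llama3-8b low-precision attention benchmark (sketch)"
    )
    parser.add_argument("--baseline", choices=BACKENDS, required=True,
                        help="attention backend for the baseline model")
    parser.add_argument("--test", choices=BACKENDS, required=True,
                        help="attention backend for the model under test")
    parser.add_argument("--compile", action="store_true",
                        help="run the models under torch.compile (assumed flag)")
    parser.add_argument("--warmup_iters", type=int, default=3,
                        help="warmup iterations before timing (assumed flag)")
    parser.add_argument("--seq_len", type=int, default=1024,
                        help="prefill sequence length (assumed flag)")
    parser.add_argument("--iters", type=int, default=10,
                        help="number of timed runtime iterations (assumed flag)")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args(["--baseline", "fa3", "--test", "fa3_fp8"])
    print(args.baseline, args.test)
```

Constraining both backend flags to the same `choices` list keeps invalid baseline/test pairings from reaching the model-loading code.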
1 parent 1b920d0 commit 21fc296

1 file changed: 482 additions & 0 deletions

0 commit comments