Commit 21fc296
Added benchmark for LLaMA 3 model for attention tests (#3930)
## Summary
- Added a new benchmark for the new low-precision attention API
  - Uses the Llama3-8B model in prefill mode
- Two-phase test:
  - perplexity on the WikiText-2 test set
  - runtime across sequence lengths (1024 -> 131072)
- Baseline and test models can each be set to any of the backends: fa2, fa3, fa3_fp8, fa4, fa4_fp8
- Options to control torch.compile usage, warmup_iters, sequence length, number of runtime test iterations, and RoPE fusion
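The options above suggest a small CLI surface. A minimal sketch of what that argument parsing could look like, assuming argparse; apart from `--baseline` and `--test` (shown in the example run below), all flag names and defaults here are hypothetical, not taken from the actual script:

```python
# Hypothetical sketch of the benchmark CLI; only --baseline/--test are
# confirmed by the commit message, the rest are assumed flag names.
import argparse

BACKENDS = ["fa2", "fa3", "fa3_fp8", "fa4", "fa4_fp8"]

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Llama3-8B attention benchmark (sketch)")
    p.add_argument("--baseline", choices=BACKENDS, required=True)
    p.add_argument("--test", choices=BACKENDS, required=True)
    p.add_argument("--compile", action="store_true", help="wrap the model in torch.compile")
    p.add_argument("--warmup_iters", type=int, default=5, help="untimed warmup iterations")
    p.add_argument("--seq_len", type=int, default=1024, help="sequence length, 1024 .. 131072")
    p.add_argument("--iters", type=int, default=10, help="timed iterations per sequence length")
    p.add_argument("--rope_fusion", action="store_true", help="enable RoPE fusion")
    return p

args = build_parser().parse_args(["--baseline", "fa3", "--test", "fa3_fp8"])
print(args.baseline, args.test)  # fa3 fa3_fp8
```

Restricting both backend flags to the same `choices` list is what lets any backend pair be compared against each other.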
## Example Run
```shell
python benchmarks/prototype/attention/eval_llama3_model.py --baseline fa3 --test fa3_fp8
```

Parent: 1b920d0
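For reference, the perplexity phase boils down to exponentiating the average per-token negative log-likelihood. A minimal sketch of that metric, using made-up log-probabilities rather than actual WikiText-2 model outputs:

```python
# Sketch of the perplexity metric used in the accuracy phase;
# the token log-probabilities below are illustrative, not real data.
import math

def perplexity(token_logprobs):
    """exp of the average negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns uniform probability over a 4-symbol vocabulary
# scores a perplexity of 4.
print(perplexity([math.log(0.25)] * 8))
```

Comparing this number between the baseline and test backends quantifies any accuracy loss from the low-precision attention path.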
1 file changed: 482 additions, 0 deletions