Commit 21fc296
Added benchmark for LLaMA 3 model for attention tests (#3930)
## Summary
- Added a new benchmark for the new low-precision attention API
  - Uses the Llama3-8B model in prefill mode
- Two-phase test:
  - perplexity on the WikiText-2 test set
  - runtime across sequence lengths (1024 -> 131072)
- Baseline and test models can each be set to any of the backends: fa2, fa3, fa3_fp8, fa4, fa4_fp8
- Options to control torch.compile usage, warmup_iters, sequence length, number of runtime test iterations, and RoPE fusion
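The options above suggest a small CLI surface. A minimal sketch of what that argument parsing could look like, assuming argparse; apart from `--baseline` and `--test` (shown in the example run below), all flag names and defaults here are hypothetical, not taken from the actual script:

```python
# Hypothetical sketch of the benchmark CLI; only --baseline/--test are
# confirmed by the commit message, the rest are assumed flag names.
import argparse

BACKENDS = ["fa2", "fa3", "fa3_fp8", "fa4", "fa4_fp8"]

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Llama3-8B attention benchmark (sketch)")
    p.add_argument("--baseline", choices=BACKENDS, required=True)
    p.add_argument("--test", choices=BACKENDS, required=True)
    p.add_argument("--compile", action="store_true", help="wrap the model in torch.compile")
    p.add_argument("--warmup_iters", type=int, default=5, help="untimed warmup iterations")
    p.add_argument("--seq_len", type=int, default=1024, help="sequence length, 1024 .. 131072")
    p.add_argument("--iters", type=int, default=10, help="timed iterations per sequence length")
    p.add_argument("--rope_fusion", action="store_true", help="enable RoPE fusion")
    return p

args = build_parser().parse_args(["--baseline", "fa3", "--test", "fa3_fp8"])
print(args.baseline, args.test)  # fa3 fa3_fp8
```

Restricting both backend flags to the same `choices` list is what lets any backend pair be compared against each other.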
## Example Run
```shell
python benchmarks/prototype/attention/eval_llama3_model.py --baseline fa3 --test fa3_fp8
```

Parent: 1b920d0
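For reference, the perplexity phase boils down to exponentiating the average per-token negative log-likelihood. A minimal sketch of that metric, using made-up log-probabilities rather than actual WikiText-2 model outputs:

```python
# Sketch of the perplexity metric used in the accuracy phase;
# the token log-probabilities below are illustrative, not real data.
import math

def perplexity(token_logprobs):
    """exp of the average negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model that assigns uniform probability over a 4-symbol vocabulary
# scores a perplexity of 4.
print(perplexity([math.log(0.25)] * 8))
```

Comparing this number between the baseline and test backends quantifies any accuracy loss from the low-precision attention path.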
1 file changed: 482 additions, 0 deletions