Add blockwise FP8 roofline benchmark #4344
Conversation
@iamzainhuda nice progress, I haven't looked at the code yet (lmk when it's ready for review), but please be sure to include the benchmark data for the DSV3 671b shapes in the PR description when done.

Added the benchmark results for both compiled and non-compiled runs with the scaled_mm backend. I have another PR with a kernel optimization that should improve the speedup a bit.
@iamzainhuda please fix the ruff/linter issue |
yup fixed |
Summary
Added FP8 blockwise linear into the roofline script. This lets us compare the performance of our FP8 blockwise linear layer against the FP16 linear layer across the DeepSeek V3 shapes.
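For context on what the roofline comparison estimates, here is a toy sketch of a GEMM roofline for one of the DSV3 FFN shapes. The peak-FLOPS and bandwidth constants below are placeholder assumptions (loosely H100-like), not the actual values in torchao's `roofline_utils`:

```python
# Toy roofline estimate for one DSV3 FFN GEMM, (M, K, N) = (4096, 7168, 18432).
# Hardware peaks below are illustrative assumptions, NOT torchao's constants.

def gemm_roofline_time_s(M, K, N, bytes_per_el, peak_flops, peak_mem_bw):
    """Return max(compute-bound, memory-bound) time for a single GEMM."""
    flops = 2 * M * K * N                                  # multiply-accumulates
    bytes_moved = bytes_per_el * (M * K + K * N + M * N)   # read A, B; write C
    return max(flops / peak_flops, bytes_moved / peak_mem_bw)

PEAK_FP8_FLOPS = 1.979e15   # assumed fp8 tensor-core peak, FLOP/s
PEAK_BF16_FLOPS = 0.989e15  # assumed bf16 peak, FLOP/s
PEAK_MEM_BW = 3.35e12       # assumed HBM bandwidth, bytes/s

M, K, N = 4096, 7168, 18432
t_bf16 = gemm_roofline_time_s(M, K, N, 2, PEAK_BF16_FLOPS, PEAK_MEM_BW)
t_fp8 = gemm_roofline_time_s(M, K, N, 1, PEAK_FP8_FLOPS, PEAK_MEM_BW)
print(f"bf16 roofline: {t_bf16 * 1e3:.3f} ms, fp8 roofline: {t_fp8 * 1e3:.3f} ms")
print(f"ideal GEMM speedup: {t_bf16 / t_fp8:.2f}x")
```

At this shape the GEMM is compute-bound under these assumptions, so the ideal speedup approaches the fp8/bf16 peak-FLOPS ratio; the end-to-end roofline in the script additionally models quantization overhead, which is why measured speedups land below this bound.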
Added:
- `blockwise_fp8_training` roofline mode, selected via `--mx_recipe_name=blockwise_fp8_training`.
- `--shape_gen_name=dsv3-16b-671b`: M = seq_len = 4096, dim = 7168, inter_dim = 18432.
- `roofline_utils` hardware specs, including FP8 GEMM time and modeled quantization overhead.
- `--blockwise_use_triton=False` uses the scaled_mm backend; `--blockwise_use_triton=True` uses the custom Triton GEMM backend.
- `--blockwise_compile_benchmarks=True` compiles the blockwise benchmark path.
- Roofline comparison column added as `b_fp8_e2e_spdp_pct_of_r`: measured speedup / roofline speedup * 100.

Results
torch: 2.13.0a0+gitd129991, torchao: 0.17.0+gitcf0b50ae1
shape_gen_name: `dsv3-16b-671b`, backend: `blockwise_scaled_mm`
Measured times: 4.209 ms, 2.103 ms, 0.647 ms, 2.750 ms; speedup: 1.530x

Non-compiled command:
| name | (M, K, N) |
| --- | --- |
| dsv3.ffn.w1 | (4096, 7168, 18432) |
| dsv3.ffn.w2 | (4096, 18432, 7168) |
| dsv3.ffn.w3 | (4096, 7168, 18432) |

Compiled command:
| name | (M, K, N) |
| --- | --- |
| dsv3.ffn.w1 | (4096, 7168, 18432) |
| dsv3.ffn.w2 | (4096, 18432, 7168) |
| dsv3.ffn.w3 | (4096, 7168, 18432) |

Note: the compiled run emits a Dynamo warning for `_scaled_mm_v2` on this script-only branch. The custom-op routing that removes the warning is part of an upcoming PR (will link when ready).