Add "32x1 transposed" variant to MXFP8 3D quantization kernel (#4356)

alexsamardzic wants to merge 2 commits into gh/alexsamardzic/1/base
Conversation
Helpful Links: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4356

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure: as of commit 04b49da with merge base 9052ece, the following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Benchmarking results:
@alexsamardzic can you benchmark this against the two-stage approach we use in ao/torchao/prototype/moe_training/mxfp8_grouped_mm.py (lines 555 to 558 at 9052ece)?
Here is an adapted benchmarking script to compare the two: bench_quantize_3d_vs_triton.py. And here are the results:
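For context on what the two layouts compute: MXFP8 shares one power-of-two (E8M0-style) scale per block of 32 contiguous elements, and a "32x1 transposed" variant blocks along the other matrix dimension of each 3D slice, so the scales match the layout needed by the transposed operand. The sketch below is illustrative only; `mx_block_scales` and the use of 448 (the FP8 E4M3 max finite value) as the scaling target are assumptions, not this kernel's actual code.

```python
import numpy as np

def mx_block_scales(x, block_dim, block=32):
    # Illustrative sketch: compute one power-of-two (E8M0-style) scale per
    # block of `block` contiguous elements along `block_dim`. Assumes that
    # dimension is divisible by `block`; not the kernel's implementation.
    x = np.moveaxis(x, block_dim, -1)
    blocked = x.reshape(x.shape[:-1] + (x.shape[-1] // block, block))
    amax = np.abs(blocked).max(axis=-1)
    # Pick the power of two mapping the block amax to at most the FP8 E4M3
    # max finite value (448); clamp to avoid log2(0) on all-zero blocks.
    exp = np.floor(np.log2(np.maximum(amax, 2.0 ** -127) / 448.0))
    return 2.0 ** exp

x = np.random.randn(4, 64, 128).astype(np.float32)  # (E, M, K)
s_1x32 = mx_block_scales(x, block_dim=-1)  # "1x32": blocks along K
s_32x1 = mx_block_scales(x, block_dim=-2)  # "32x1 transposed": blocks along M
print(s_1x32.shape)  # (4, 64, 4)
print(s_32x1.shape)  # (4, 128, 2)
```

A fused "32x1" kernel produces the second set of scales (and the quantized data in transposed layout) in one pass, whereas the two-stage approach quantizes and then transposes, touching the tensor twice.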
Stack from ghstack (oldest at bottom):