[moe training] default pad_token_groups_for_grouped_mm=False by danielvegamyhre · Pull Request #4080 · pytorch/ao

danielvegamyhre · 2026-03-13T20:12:41Z

Summary

Default pad_token_groups_for_grouped_mm to False so that users are not surprised by the overhead of the extra pad/unpad kernels. Padding is now often handled upstream of the quantization + grouped MM step by systems such as HybridEP, so it is better for users to request padding explicitly via the flag when they need it.
Update tests and benchmarks accordingly.
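To make the overhead concrete, here is a minimal pure-Python sketch of what padding token groups for a grouped MM involves. It assumes tokens are already sorted into contiguous per-expert groups and that each group must be padded up to a multiple of a fixed row alignment. A default of 32 is assumed here (the MX scaling block size); the alignment torchao actually uses may differ. The helper names `padded_group_sizes`, `groups_are_aligned`, `pad_token_groups`, and `unpad_token_groups` are hypothetical and are not torchao APIs. The real pad/unpad kernels operate on GPU tensors.

```python
# Hypothetical helpers illustrating token-group padding; NOT torchao APIs.
# Assumption: rows are grouped contiguously per expert, and each group must be
# padded to a multiple of `alignment` rows before the grouped MM.
from typing import List, Sequence, Tuple

Row = List[float]


def padded_group_sizes(group_sizes: Sequence[int], alignment: int = 32) -> List[int]:
    """Round each token-group size up to the next multiple of `alignment`."""
    return [((n + alignment - 1) // alignment) * alignment for n in group_sizes]


def groups_are_aligned(group_sizes: Sequence[int], alignment: int = 32) -> bool:
    """True if every group already satisfies the alignment, so no padding is needed."""
    return all(n % alignment == 0 for n in group_sizes)


def pad_token_groups(
    rows: Sequence[Row],
    group_sizes: Sequence[int],
    alignment: int = 32,
    pad_value: float = 0.0,
) -> Tuple[List[Row], List[int]]:
    """Copy each group into an aligned slot and fill the slot's tail with pad rows."""
    if sum(group_sizes) != len(rows):
        raise ValueError("group sizes must sum to the number of rows")
    width = len(rows[0]) if rows else 0
    padded_sizes = padded_group_sizes(group_sizes, alignment)
    out: List[Row] = []
    start = 0
    for n, padded_n in zip(group_sizes, padded_sizes):
        out.extend(list(r) for r in rows[start:start + n])  # real tokens
        out.extend([pad_value] * width for _ in range(padded_n - n))  # pad rows
        start += n
    return out, padded_sizes


def unpad_token_groups(
    padded_rows: Sequence[Row], group_sizes: Sequence[int], alignment: int = 32
) -> List[Row]:
    """Drop the pad rows again, recovering the original row order."""
    out: List[Row] = []
    start = 0
    for n, padded_n in zip(group_sizes, padded_group_sizes(group_sizes, alignment)):
        out.extend(list(r) for r in padded_rows[start:start + n])  # keep real tokens
        start += padded_n  # skip this group's pad rows
    return out
```

For example, with group sizes [3, 0, 5] and an alignment of 4, the 8 input rows become 12 padded rows, and every row is copied twice (pad, then unpad). If an upstream dispatcher already guarantees aligned groups, `groups_are_aligned` returns True and this round trip is pure overhead. That is the case the new default avoids.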

Tests

pytest test/prototype/moe_training/test_training.py -s

pytorch-bot · 2026-03-13T20:12:46Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4080

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit b6bf1a5 with merge base eb64bfb:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Run Regression Tests / test-nightly (CPU Nightly, linux.4xlarge, --pre torch --index-url https://download.pytorch.org/wh... / linux-job (gh) (trunk failure)
test/quantization/pt2e/test_x86inductor_fusion.py::TestDynamicPatternMatcher::test_q_attention_block
Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh) (trunk failure)
test/quantization/pt2e/test_x86inductor_fusion.py::TestDynamicPatternMatcher::test_q_attention_block

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Mar 13, 2026

danielvegamyhre force-pushed the defaultoff branch 2 times, most recently from f6a27bd to 10362fc, March 13, 2026 20:16

drisspg approved these changes Mar 13, 2026

danielvegamyhre added the mx and module: training labels Mar 13, 2026

danielvegamyhre added this to the MXFP8 Training milestone Mar 13, 2026

danielvegamyhre force-pushed the defaultoff branch 3 times, most recently from 7911823 to 5eb9e5e, March 13, 2026 22:29

[mxfp8 moe training] default pad_token_groups_for_grouped_mm to False

b6bf1a5

danielvegamyhre force-pushed the defaultoff branch from 5eb9e5e to b6bf1a5, March 13, 2026 22:29

danielvegamyhre mentioned this pull request Mar 13, 2026

[ROCm] Enable MXFP8/MXFP4 emulation tests on ROCm (MI300+) #4041 (open)

danielvegamyhre merged commit 960f307 into main Mar 14, 2026
21 of 23 checks passed

danielvegamyhre merged 1 commit into main from defaultoff
