
nvfp4 gptq for bmm #4327

Merged
vkuzo merged 58 commits into main from gh/vkuzo/260/head
Apr 27, 2026

Conversation

@vkuzo
Contributor

@vkuzo vkuzo commented Apr 24, 2026

Summary:

Extend GPTQ coverage for bmm, formulating the bmm as a 3d case of mm.
This involves:

  1. refactoring the 2d code to make it easily extendable to 3d
  2. fixing the existing bmm logic, which was numerically incorrect
     (it used a single Hessian): it now uses E K-by-K Hessians for an
     (E, N, K) input shape, routing to the 2D Hessian logic E times.
     This is slow, but we can optimize later.

We test numerical correctness by bitwise-matching the E separate 2D Hessian
calculations against the 3D calculation.
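The per-expert routing described in the summary can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual torchao code: the function names are hypothetical, the Hessian is accumulated as a plain x^T x sum, and GPTQ's damping and normalization steps are omitted.

```python
import torch

def accumulate_hessian_2d(H: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Accumulate a [K, K] Hessian from a 2D activation batch x of shape [N, K].

    Sketch of a GPTQ-style update (x^T x sum); damping/normalization omitted.
    """
    return H + x.t() @ x

def accumulate_hessian_3d(H: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Accumulate E separate [K, K] Hessians from a bmm input x of shape [E, N, K].

    Routes each of the E slices through the 2D logic E times, as the PR
    description says; this loop is the slow-but-correct version that can be
    optimized later.
    """
    for e in range(x.shape[0]):
        H[e] = accumulate_hessian_2d(H[e], x[e])
    return H
```

Because each 3D slice runs exactly the same operations as the standalone 2D path, the E per-expert results match the 2D results bitwise, which is the correctness check the PR describes.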

Test Plan:

torchao/prototype/gptq/gptq_nvfp4_llama3_2_1b_nonsequential_wikitext.sh
// gptq accuracy unchanged

vkuzo added 30 commits April 20, 2026 20:52
[ghstack-poisoned]
vkuzo added 10 commits April 23, 2026 18:44
[ghstack-poisoned]
@vkuzo
Contributor Author

vkuzo commented Apr 24, 2026

@vkuzo vkuzo requested a review from jerryzh168 as a code owner April 24, 2026 12:52
@pytorch-bot

pytorch-bot Bot commented Apr 24, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4327

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 24, 2026
@vkuzo vkuzo added the module: not user facing Use this tag if you don't want this PR to show up in release notes label Apr 24, 2026
assert observer.total_batches == 0

@pytest.mark.skipif(not torch.cuda.is_available(), reason="Need CUDA available")
def test_observer_tensor_attributes(self):
Contributor Author


this test is not that useful, deleting instead of updating with new contents of the observer tensor

vkuzo added 2 commits April 24, 2026 14:04
[ghstack-poisoned]
return torch.Tensor._make_wrapper_subclass(cls, shape, **kwargs) # type: ignore[attr-defined]

def __init__(self, hp_data: torch.Tensor, total_batches: int, hessian=None):
def __init__(self, hp_data: torch.Tensor, total_batches, hessian=None):
Contributor


can you add some docs for total_batches?

Contributor Author


sure, let me do that in a future PR (planning today), as we also need to change the definition of this for grouped_mm to sample each token equally instead of sampling each batch equally
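For reference, a hedged sketch of what the requested `total_batches` documentation could look like. The class name and the exact semantics below are assumptions drawn from this thread (the Hessian-averaging role and the planned per-token counting for grouped_mm), not the actual torchao implementation.

```python
import torch

class ObserverTensorSketch:
    # Hypothetical stand-in for the observer tensor subclass discussed above;
    # the real class is a wrapper subclass built via
    # torch.Tensor._make_wrapper_subclass.
    def __init__(self, hp_data: torch.Tensor, total_batches: int, hessian=None):
        """
        Args:
            hp_data: high-precision data wrapped by the observer.
            total_batches: number of calibration batches observed so far,
                assumed here to be used for averaging the accumulated Hessian.
                Per the discussion, this may later change to per-token
                counting for grouped_mm, so each token is sampled equally.
            hessian: optional running [K, K] Hessian accumulator, or None if
                no calibration data has been seen yet.
        """
        self.hp_data = hp_data
        self.total_batches = total_batches
        self.hessian = hessian
```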

[ghstack-poisoned]
@vkuzo vkuzo requested a review from danielvegamyhre as a code owner April 27, 2026 12:07
@vkuzo vkuzo changed the base branch from gh/vkuzo/259/head to main April 27, 2026 12:07
@vkuzo vkuzo merged commit f9a5f28 into main Apr 27, 2026
38 of 66 checks passed