
fix: bugs and code quality improvements across prototype modules#4332

Open
tatavishnurao wants to merge 3 commits into pytorch:main from tatavishnurao:fix/prototype-bugs-and-cleanup

Conversation

@tatavishnurao

Diagnostics: Bug fixes and code quality improvements across prototype modules

Summary

Fixes several bugs and code quality issues found across torchao/prototype/attention/, torchao/prototype/quantization/int4/, and torchao/prototype/quantization/embedding/.

Changes

Bug fixes

  • Replace print() with logger in fusion_utils.py — The production fusion pass used print() to report RoPE fusion results, which cannot be suppressed and floods console output during torch.compile. Switched to logger.info() (see the first sketch after this list).

    • torchao/prototype/attention/shared_utils/fusion_utils.py:965-1108
  • Add kernel availability guard to from_hp_da8w4() — Calling Int4OpaqueTensor.from_hp_da8w4() directly bypassed the _dispatch_dump assertion that protects the config handler path, resulting in a confusing RuntimeError from the C++ kernel. Added an early check with a clear error message (guard pattern sketched after this list).

    • torchao/prototype/quantization/int4/int4_opaque_tensor.py
  • Replace mutable default argument kwargs={} — _replace_embedding_with_quantized_embedding() used a mutable dict as a default argument (kwargs={}). While harmless in this case (the dict is only read, never mutated), it is a well-known Python antipattern and a source of subtle bugs (fix sketched after this list).

    • torchao/prototype/quantization/embedding/api.py:145
  • Add kernel availability guard to QuantizedLinear — QuantizedLinear.forward() dynamically resolves torch.ops.torchao._linear_8bit_act_{N}bit_weight, which only exists when the C++ kernels are built. Unlike QuantizedEmbedding, there was no guard to prevent an AttributeError crash. Added an _is_kernel_library_loaded() assertion (same guard pattern as sketched after this list).

    • torchao/prototype/quantization/embedding/api.py:253
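
A minimal sketch of the logging pattern from the first bullet; the helper name and message format are illustrative, not the actual fusion_utils.py code:

```python
import logging

logger = logging.getLogger(__name__)

def report_rope_fusion(num_fused: int) -> None:
    # Before: print(f"{num_fused} fused with RoPE")  -- unconditionally hits stdout
    # After: an INFO-level record that callers can filter, redirect, or silence
    logger.info("%d fused with RoPE", num_fused)
```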
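A hedged sketch of the kernel availability guard used for both from_hp_da8w4() and QuantizedLinear. The _is_kernel_library_loaded() body below is only a stand-in for the existing torchao helper named above, and the error wording is illustrative:

```python
import torch

def _is_kernel_library_loaded() -> bool:
    # Stand-in for the existing torchao helper; here it simply probes for one
    # of the dynamically registered C++ ops (lookup raises AttributeError when
    # the op is not registered, so hasattr() returns False).
    return hasattr(torch.ops.torchao, "_linear_8bit_act_4bit_weight")

def _require_cpp_kernels() -> None:
    # Fail early with a readable message instead of a later AttributeError
    # or an opaque RuntimeError from inside the C++ kernel.
    if not _is_kernel_library_loaded():
        raise RuntimeError(
            "torchao C++ kernels are not available; build torchao with "
            "USE_CPP_KERNELS=1 to use this code path."
        )
```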
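And the standard fix for the mutable default argument; the function name matches the bullet above, the rest of the signature and body are elided:

```python
from typing import Any, Optional

def _replace_embedding_with_quantized_embedding(
    module: Any, kwargs: Optional[dict] = None
) -> None:
    # A fresh dict per call, instead of a single dict created once at
    # definition time and shared by every call that omits the argument.
    kwargs = {} if kwargs is None else kwargs
    ...
```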

Test robustness

  • Fix fragile stdout capture in test_rope_fusion_detection.py — The test asserted "1 fused with RoPE" appeared in stdout, which coupled the test to the logging mechanism. Updated to capture logger output instead, keeping the test working after the print() → logger migration (see the sketch below).
    • test/prototype/attention/test_rope_fusion_detection.py:134-141
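
A hedged sketch of the logger-capture approach using pytest's caplog fixture; the logger name, stand-in function, and assertion string are illustrative, not the actual test code:

```python
import logging

logger = logging.getLogger("torchao.prototype.attention")

def run_fusion_pass() -> None:
    # Hypothetical stand-in for the fusion code under test
    logger.info("1 fused with RoPE")

def test_rope_fusion_is_reported(caplog):
    # caplog captures log records regardless of how handlers are configured,
    # so the test no longer depends on the message reaching stdout.
    with caplog.at_level(logging.INFO, logger="torchao.prototype.attention"):
        run_fusion_pass()
    assert "fused with RoPE" in caplog.text
```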

Code quality

  • Add public API exports to embedding/__init__.py — The module was empty, forcing users to import from api.py directly. Added re-exports for EmbeddingQuantizer, TiedEmbeddingQuantizer, QuantizedEmbedding, QuantizedEmbeddingFallback, and QuantizedLinear to match the pattern used by int4/__init__.py (re-exports sketched after this list).

    • torchao/prototype/quantization/embedding/__init__.py
  • Remove unused _is_blackwell() — _is_blackwell() in attention/utils.py was defined but never called anywhere. Removed dead code.

    • torchao/prototype/attention/utils.py
  • Fix misconfigured Triton @autotune decorators — hadamard_single_phase1_kernel, hadamard_v_phase1_kernel, and hadamard_rope_single_phase1_kernel had @triton.autotune configs with an empty {} (no tunable parameters) and key=["D"], where D is passed as a tl.constexpr. These decorators add launch overhead for zero tuning benefit. Removed the @triton.autotune decorators and inlined the num_warps value from the first config (before/after sketched after this list).

    • torchao/prototype/attention/quantization/triton_hadamard_qkv_quantization.py
    • torchao/prototype/attention/quantization/triton_hadamard_rope_qkv_quantization.py
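
For reference, the re-export pattern described in the first code-quality bullet, assuming all five symbols live in api.py as stated:

```python
# torchao/prototype/quantization/embedding/__init__.py
from .api import (
    EmbeddingQuantizer,
    QuantizedEmbedding,
    QuantizedEmbeddingFallback,
    QuantizedLinear,
    TiedEmbeddingQuantizer,
)

__all__ = [
    "EmbeddingQuantizer",
    "TiedEmbeddingQuantizer",
    "QuantizedEmbedding",
    "QuantizedEmbeddingFallback",
    "QuantizedLinear",
]
```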
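And an illustrative before/after for the @triton.autotune removal; the kernel name, signature, and body below are placeholders, only the decorator shape and the launch-time num_warps mirror the bullet above:

```python
import triton
import triton.language as tl

# Before (illustrative): a single empty config means there is nothing to tune,
# yet autotune still inserts a benchmarking/dispatch layer keyed on D.
#
# @triton.autotune(configs=[triton.Config({}, num_warps=4)], key=["D"])

@triton.jit
def _hadamard_phase1_kernel(x_ptr, out_ptr, D: tl.constexpr):
    offs = tl.arange(0, D)
    tl.store(out_ptr + offs, tl.load(x_ptr + offs))

# After: pass the single num_warps value directly at launch time, e.g.
# _hadamard_phase1_kernel[(1,)](x, out, D=128, num_warps=4)
```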

Issues not fixed (noted for future work)

  • torch._C._dispatch_dump is undocumented PyTorch internal API — Used in inference_workflow.py:135 and test_ops.py:118-119,360,383,405-407 to check if a C++ kernel is registered. Works reliably but is not part of PyTorch's public API. No better alternative exists currently.
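
A hedged sketch of what such a registration check might look like, assuming torch._C._dispatch_dump raises for unknown operator names (which is how the existing call sites appear to rely on it); the helper name is hypothetical and this remains an internal, unsupported API:

```python
import torch

def _cpp_op_is_registered(qualname: str) -> bool:
    # qualname like "torchao::_linear_8bit_act_4bit_weight"
    try:
        torch._C._dispatch_dump(qualname)  # internal API; may change without notice
        return True
    except RuntimeError:
        return False
```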

Test plan

  • pytest test/prototype/attention/test_rope_fusion_detection.py — verify fusion detection tests still pass after the print() → logger migration
  • pytest test/prototype/quantization/test_int4_opaque_tensor.py — verify A16W4 path still works
  • pytest test/prototype/quantization/test_embedding.py — verify embedding quantization paths
  • pytest test/quantization/test_quant_api.py — verify PrototypeInt4WeightOnlyConfig and Int8DynamicActivationInt4WeightConfig still register correctly

- Replace print() with logger.info() in fusion_utils.py (output could not be suppressed and flooded the console during torch.compile)
- Add kernel availability guard to Int4OpaqueTensor.from_hp_da8w4() with clear error message
- Add kernel availability guard to QuantizedLinear._forward_2d() to prevent AttributeError
- Replace mutable default kwargs={} in _replace_embedding_with_quantized_embedding()
- Fix fragile stdout capture in test_rope_fusion_detection.py to use logger capture
- Add public API exports to embedding/__init__.py (EmbeddingQuantizer, QuantizedLinear, etc.)
- Remove unused _is_blackwell() from attention/utils.py
- Remove misconfigured @triton.autotune decorators (empty configs, constexpr key)
@pytorch-bot

pytorch-bot Bot commented Apr 25, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4332

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla

meta-cla Bot commented Apr 25, 2026

Hi @tatavishnurao!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla

meta-cla Bot commented Apr 25, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

meta-cla Bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Apr 25, 2026
Collaborator

@Xia-Weiwen left a comment


The change to torchao/prototype/quantization/int4/int4_opaque_tensor.py looks good to me with the build flag fix.

Comment thread on torchao/prototype/quantization/int4/int4_opaque_tensor.py (outdated)
Co-authored-by: Xia Weiwen <xia.weiwen@hotmail.com>
@tatavishnurao
Author

Thanks for the review! I have applied the suggested USE_CPP_KERNELS=1 wording fix.
