
fix: bugs and code quality improvements across prototype modules#4332

Open
tatavishnurao wants to merge 3 commits into pytorch:main from tatavishnurao:fix/prototype-bugs-and-cleanup

Conversation

@tatavishnurao

Diagnostics: Bug fixes and code quality improvements across prototype modules

Summary

Fixes several bugs and code quality issues found across torchao/prototype/attention/, torchao/prototype/quantization/int4/, and torchao/prototype/quantization/embedding/.

Changes

Bug fixes

  • Replace print() with logger in fusion_utils.py — The production fusion pass used print() to report RoPE fusion results, which cannot be suppressed and floods console output during torch.compile. Switched to logger.info() (see the first sketch after this list).

    • torchao/prototype/attention/shared_utils/fusion_utils.py:965-1108
  • Add kernel availability guard to from_hp_da8w4() — Calling Int4OpaqueTensor.from_hp_da8w4() directly bypassed the _dispatch_dump assertion that protects the config handler path, resulting in a confusing RuntimeError from the C++ kernel. Added an early check with a clear error message (guard pattern sketched after this list).

    • torchao/prototype/quantization/int4/int4_opaque_tensor.py
  • Replace mutable default argument kwargs={} — _replace_embedding_with_quantized_embedding() used a mutable dict as a default argument (kwargs={}). While harmless in this case (the dict is only read, never mutated), it is a well-known Python antipattern and a source of subtle bugs (fix sketched after this list).

    • torchao/prototype/quantization/embedding/api.py:145
  • Add kernel availability guard to QuantizedLinear — QuantizedLinear.forward() dynamically resolves torch.ops.torchao._linear_8bit_act_{N}bit_weight, which only exists when the C++ kernels are built. Unlike QuantizedEmbedding, there was no guard to prevent an AttributeError crash. Added an _is_kernel_library_loaded() assertion (same guard pattern as sketched after this list).

    • torchao/prototype/quantization/embedding/api.py:253
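
A minimal sketch of the logging pattern from the first bullet; the helper name and message format are illustrative, not the actual fusion_utils.py code:

```python
import logging

logger = logging.getLogger(__name__)

def report_rope_fusion(num_fused: int) -> None:
    # Before: print(f"{num_fused} fused with RoPE")  -- unconditionally hits stdout
    # After: an INFO-level record that callers can filter, redirect, or silence
    logger.info("%d fused with RoPE", num_fused)
```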
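A hedged sketch of the kernel availability guard used for both from_hp_da8w4() and QuantizedLinear. The _is_kernel_library_loaded() body below is only a stand-in for the existing torchao helper named above, and the error wording is illustrative:

```python
import torch

def _is_kernel_library_loaded() -> bool:
    # Stand-in for the existing torchao helper; here it simply probes for one
    # of the dynamically registered C++ ops (lookup raises AttributeError when
    # the op is not registered, so hasattr() returns False).
    return hasattr(torch.ops.torchao, "_linear_8bit_act_4bit_weight")

def _require_cpp_kernels() -> None:
    # Fail early with a readable message instead of a later AttributeError
    # or an opaque RuntimeError from inside the C++ kernel.
    if not _is_kernel_library_loaded():
        raise RuntimeError(
            "torchao C++ kernels are not available; build torchao with "
            "USE_CPP_KERNELS=1 to use this code path."
        )
```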
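And the standard fix for the mutable default argument; the function name matches the bullet above, the rest of the signature and body are elided:

```python
from typing import Any, Optional

def _replace_embedding_with_quantized_embedding(
    module: Any, kwargs: Optional[dict] = None
) -> None:
    # A fresh dict per call, instead of a single dict created once at
    # definition time and shared by every call that omits the argument.
    kwargs = {} if kwargs is None else kwargs
    ...
```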

Test robustness

  • Fix fragile stdout capture in test_rope_fusion_detection.py — The test asserted "1 fused with RoPE" appeared in stdout, which coupled the test to the logging mechanism. Updated to capture logger output instead, keeping the test working after the print() → logger migration (see the sketch below).
    • test/prototype/attention/test_rope_fusion_detection.py:134-141
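
A hedged sketch of the logger-capture approach using pytest's caplog fixture; the logger name, stand-in function, and assertion string are illustrative, not the actual test code:

```python
import logging

logger = logging.getLogger("torchao.prototype.attention")

def run_fusion_pass() -> None:
    # Hypothetical stand-in for the fusion code under test
    logger.info("1 fused with RoPE")

def test_rope_fusion_is_reported(caplog):
    # caplog captures log records regardless of how handlers are configured,
    # so the test no longer depends on the message reaching stdout.
    with caplog.at_level(logging.INFO, logger="torchao.prototype.attention"):
        run_fusion_pass()
    assert "fused with RoPE" in caplog.text
```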

Code quality

  • Add public API exports to embedding/__init__.py — The module was empty, forcing users to import from api.py directly. Added re-exports for EmbeddingQuantizer, TiedEmbeddingQuantizer, QuantizedEmbedding, QuantizedEmbeddingFallback, and QuantizedLinear to match the pattern used by int4/__init__.py (re-exports sketched after this list).

    • torchao/prototype/quantization/embedding/__init__.py
  • Remove unused _is_blackwell() — _is_blackwell() in attention/utils.py was defined but never called anywhere. Removed dead code.

    • torchao/prototype/attention/utils.py
  • Fix misconfigured Triton @autotune decorators — hadamard_single_phase1_kernel, hadamard_v_phase1_kernel, and hadamard_rope_single_phase1_kernel had @triton.autotune configs with an empty {} (no tunable parameters) and key=["D"], where D is passed as a tl.constexpr. These decorators add launch overhead for zero tuning benefit. Removed the @triton.autotune decorators and inlined the num_warps value from the first config (before/after sketched after this list).

    • torchao/prototype/attention/quantization/triton_hadamard_qkv_quantization.py
    • torchao/prototype/attention/quantization/triton_hadamard_rope_qkv_quantization.py
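
For reference, the re-export pattern described in the first code-quality bullet, assuming all five symbols live in api.py as stated:

```python
# torchao/prototype/quantization/embedding/__init__.py
from .api import (
    EmbeddingQuantizer,
    QuantizedEmbedding,
    QuantizedEmbeddingFallback,
    QuantizedLinear,
    TiedEmbeddingQuantizer,
)

__all__ = [
    "EmbeddingQuantizer",
    "TiedEmbeddingQuantizer",
    "QuantizedEmbedding",
    "QuantizedEmbeddingFallback",
    "QuantizedLinear",
]
```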
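And an illustrative before/after for the @triton.autotune removal; the kernel name, signature, and body below are placeholders, only the decorator shape and the launch-time num_warps mirror the bullet above:

```python
import triton
import triton.language as tl

# Before (illustrative): a single empty config means there is nothing to tune,
# yet autotune still inserts a benchmarking/dispatch layer keyed on D.
#
# @triton.autotune(configs=[triton.Config({}, num_warps=4)], key=["D"])

@triton.jit
def _hadamard_phase1_kernel(x_ptr, out_ptr, D: tl.constexpr):
    offs = tl.arange(0, D)
    tl.store(out_ptr + offs, tl.load(x_ptr + offs))

# After: pass the single num_warps value directly at launch time, e.g.
# _hadamard_phase1_kernel[(1,)](x, out, D=128, num_warps=4)
```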

Issues not fixed (noted for future work)

  • torch._C._dispatch_dump is undocumented PyTorch internal API — Used in inference_workflow.py:135 and test_ops.py:118-119,360,383,405-407 to check if a C++ kernel is registered. Works reliably but is not part of PyTorch's public API. No better alternative exists currently.
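
A hedged sketch of what such a registration check might look like, assuming torch._C._dispatch_dump raises for unknown operator names (which is how the existing call sites appear to rely on it); the helper name is hypothetical and this remains an internal, unsupported API:

```python
import torch

def _cpp_op_is_registered(qualname: str) -> bool:
    # qualname like "torchao::_linear_8bit_act_4bit_weight"
    try:
        torch._C._dispatch_dump(qualname)  # internal API; may change without notice
        return True
    except RuntimeError:
        return False
```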

Test plan

  • pytest test/prototype/attention/test_rope_fusion_detection.py — verify fusion detection tests still pass after the print() → logger migration
  • pytest test/prototype/quantization/test_int4_opaque_tensor.py — verify A16W4 path still works
  • pytest test/prototype/quantization/test_embedding.py — verify embedding quantization paths
  • pytest test/quantization/test_quant_api.py — verify PrototypeInt4WeightOnlyConfig and Int8DynamicActivationInt4WeightConfig still register correctly

- Replace print() with logger.info() in fusion_utils.py (output could not be suppressed and flooded the console during torch.compile)
- Add kernel availability guard to Int4OpaqueTensor.from_hp_da8w4() with clear error message
- Add kernel availability guard to QuantizedLinear._forward_2d() to prevent AttributeError
- Replace mutable default kwargs={} in _replace_embedding_with_quantized_embedding()
- Fix fragile stdout capture in test_rope_fusion_detection.py to use logger capture
- Add public API exports to embedding/__init__.py (EmbeddingQuantizer, QuantizedLinear, etc.)
- Remove unused _is_blackwell() from attention/utils.py
- Remove misconfigured @triton.autotune decorators (empty configs, constexpr key)
@pytorch-bot

pytorch-bot Bot commented Apr 25, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4332

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla

meta-cla Bot commented Apr 25, 2026

Hi @tatavishnurao!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@meta-cla

meta-cla Bot commented Apr 25, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

meta-cla Bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Apr 25, 2026
Collaborator

@Xia-Weiwen left a comment


The change to torchao/prototype/quantization/int4/int4_opaque_tensor.py looks good to me with the build flag fix.

Comment thread on torchao/prototype/quantization/int4/int4_opaque_tensor.py (outdated)
Co-authored-by: Xia Weiwen <xia.weiwen@hotmail.com>
@tatavishnurao
Author

Thanks for the review! I have applied the suggested USE_CPP_KERNELS=1 wording fix.
