
Add reduce_range to avoid overflow in int8 tensor #4266

Merged
Xia-Weiwen merged 16 commits into pytorch:main from cyxlily:reduce_range on Apr 28, 2026

Conversation

@cyxlily
Contributor

@cyxlily cyxlily commented Apr 10, 2026

Add an optional reduce_range flag (default False). If True, use the reduced int8 range [-64, 63] instead of the full range [-128, 127] to reduce overflow risk on platforms without VNNI instructions. Kept optional for backward compatibility with older call sites and serialized configs.
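For intuition, here's a minimal sketch of what the flag changes (illustrative only; variable names are not the PR's code):

import torch

x = torch.randn(128, 64)
amax = x.abs().max()
scale_full = amax / 127     # full range [-128, 127]
scale_reduced = amax / 63   # reduced range [-64, 63]: int8 values are half as
                            # large, so int8 products are far less likely to
                            # overflow the int16 accumulation on non-VNNI CPUs
q = torch.clamp(torch.round(x / scale_reduced), -64, 63).to(torch.int8)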

@pytorch-bot

pytorch-bot Bot commented Apr 10, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4266

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 4 New Failures

As of commit 4877828 with merge base b3e0db2:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Apr 10, 2026
@cyxlily cyxlily marked this pull request as draft April 10, 2026 03:34
Add reduce_range optional flag. Default False. If True, use reduced int8 range
[-64, 63] instead of full range [-128, 127] to reduce overflow risk on
platforms without VNNI instructions. Kept optional for backward compatibility
with older call sites and serialized configs.

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
4 outdated comment threads on test/quantization/quantize_/workflows/int8/test_int8_tensor.py
@Xia-Weiwen
Collaborator

@claude review

@Xia-Weiwen Xia-Weiwen added the module: not user facing label Apr 10, 2026
@claude

claude Bot commented Apr 10, 2026

Claude finished @Xia-Weiwen's task in 2m 56s


Review of PR #4266: Add reduce_range to avoid overflow in int8 tensor

  • Read changed files and gather context
  • Review implementation in int8_tensor.py
  • Review config changes in quant_api.py
  • Review test changes
  • Post review feedback

Summary

This PR adds a reduce_range optional flag that narrows the int8 quantization range from [-128, 127] to [-64, 63] to avoid overflow on CPUs without VNNI instructions. The changes are well-scoped and the plumbing is consistent across both dynamic and static int8 quantization paths.

Issues

1. Optional[bool] type annotation is misleading

reduce_range is typed as Optional[bool] across all locations (QuantizeTensorToInt8Kwargs, Int8Tensor, config classes), but the validation in __post_init__ asserts it must be True or False; None is never accepted:

# quant_api.py:976
assert self.reduce_range in (True, False), (
    "`reduce_range` must be True or False"
)

Since None is disallowed, the type should be just bool, not Optional[bool]. Using Optional[bool] with a default of False suggests None is a valid sentinel value, which contradicts the validation. The docstring mentions "Kept optional for backward compatibility with older call sites and serialized configs" — if this is about deserialization of older configs that lack the field, the default value of False already handles that. Optional[bool] isn't needed for that purpose.
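A sketch of the suggested simplification (hypothetical config class, not the PR's actual code):

from dataclasses import dataclass

@dataclass
class ExampleConfig:
    reduce_range: bool = False  # plain bool; the default already covers old configs

    def __post_init__(self):
        # rejects None and any non-bool value at construction time
        assert isinstance(self.reduce_range, bool), "`reduce_range` must be True or False"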


2. Test skip logic may be too aggressive

In the test at test/quantization/quantize_/workflows/int8/test_int8_tensor.py:539:

if not reduce_range and not _cpu_is_vnni_supported():
    self.skipTest(
        "Only test reduce_range=True on CPUs without VNNI support to avoid int8 overflow."
    )

This skips the reduce_range=False test case on CPUs without VNNI. This is reasonable to avoid overflow failures in CI, but it means the reduce_range=False path is never tested on non-VNNI machines. Consider adding a comment explaining that reduce_range=False is the standard path already covered by the existing test classes (TestInt8DynamicQuant, TestInt8StaticQuant), so this skip is acceptable.

3. Test uses SQNR threshold instead of assertAllClose

Xia-Weiwen's review comment suggested using assertAllClose with a fake-quantized reference rather than SQNR. The current code at line 597 still uses:

assert compute_error(output_fp, output_quantized) > 20

This only validates that quantization doesn't catastrophically degrade quality, but doesn't verify correctness against a reference implementation. For a feature that changes quantization ranges, verifying that the output matches fake-quantized reference output would be a stronger test.
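A sketch of the fake-quantized-reference style of check being suggested (the helper and tolerances are illustrative, not the test's code):

import torch

def fake_quantize(w: torch.Tensor, qmin: int = -64, qmax: int = 63) -> torch.Tensor:
    # quantize -> dequantize round trip, used as a reference for the int8 path
    scale = w.abs().amax() / max(abs(qmin), qmax)
    q = torch.clamp(torch.round(w / scale), qmin, qmax)
    return q * scale

w, x = torch.randn(16, 8), torch.randn(4, 8)
ref = x @ fake_quantize(w).t()  # reference output with fake-quantized weight
# a real test would compare the library's int8 output against ref:
# torch.testing.assert_close(out_quantized, ref, atol=1e-2, rtol=1e-2)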

Nits / Minor Suggestions

4. (-64, 63) magic numbers appear in multiple places

The reduced range (-64, 63) is hardcoded in four locations:

  • int8_tensor.py:193 (from_hp)
  • int8_tensor.py:248 (dequantize)
  • test_int8_tensor.py:560 (test)
  • Docstrings

Consider defining these as module-level constants (e.g., _REDUCED_QUANT_MIN = -64, _REDUCED_QUANT_MAX = 63) to avoid duplication and make the reduced range easy to find and change in the future. This is a minor style nit and not blocking.
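For concreteness, the constants and a tiny selector could look like this (the constant names are from this review; _quant_bounds is a hypothetical helper):

_REDUCED_QUANT_MIN, _REDUCED_QUANT_MAX = -64, 63
_FULL_QUANT_MIN, _FULL_QUANT_MAX = -128, 127

def _quant_bounds(reduce_range: bool) -> tuple[int, int]:
    # single place that maps the flag to the int8 quantization bounds
    if reduce_range:
        return _REDUCED_QUANT_MIN, _REDUCED_QUANT_MAX
    return _FULL_QUANT_MIN, _FULL_QUANT_MAX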

5. Int8WeightOnlyConfig not updated

Int8WeightOnlyConfig (line 814 of quant_api.py) doesn't get the reduce_range parameter. This is likely intentional since weight-only quantization uses a floating-point activation path (no int8 accumulation overflow risk), but it may be worth a comment in the PR description or docstring clarifying this.

What looks good

  • The reduce_range flag is correctly threaded through all relevant code paths: from_hp, dequantize, config classes, act_quant_kwargs, and the _choose_quant_func_and_quantize_tensor dispatch.
  • Backward compatibility is preserved — the default is False and existing serialized models will work.
  • The reduce_range attribute is properly included in optional_tensor_attribute_names, __repr__, and all aten op implementations (slice, select, _pin_memory).
  • The test covers a comprehensive parameter matrix (dtype, compile, config_mode, granularity, mapping_type, reduce_range).


Cui, Lily added 2 commits April 10, 2026 06:56
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
@cyxlily
Contributor Author

cyxlily commented Apr 13, 2026

Thanks for the suggestion.

  1. We need to keep reduce_range as Optional[bool] for backward compatibility with old quantized models. Otherwise we get this error:
    [FAIL] 1 Int8Tensor(s) failed dequantize(): ['fc.weight']
  2. I updated the CPU test so we still run the setup/quantization/forward path on non-VNNI CPUs, but we skip the SQNR (compute_error) assertion specifically when reduce_range=False and VNNI is unavailable. TestInt8DynamicQuant and TestInt8StaticQuant only test GPU.
  3. In this test, I intentionally did not use a fake-quantized reference with assertAllClose. The goal here is to measure end-to-end degradation from quantization by comparing the FP output and the quantized output, rather than checking closeness to a specific reference path.
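For reference, the SQNR-style metric in point 3 works roughly like this (a sketch; torchao's actual compute_error may differ in detail):

import torch

def sqnr_db(ref: torch.Tensor, pred: torch.Tensor) -> torch.Tensor:
    # signal-to-quantization-noise ratio in dB; higher means less degradation
    return 20 * torch.log10(ref.norm() / (ref - pred).norm())

ref = torch.randn(32)
assert sqnr_db(ref, ref + 1e-3 * torch.randn(32)) > 20  # ~20 dB threshold as in the test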

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
Comment thread test/quantization/quantize_/workflows/int8/test_int8_tensor.py Outdated
@Xia-Weiwen
Collaborator

@claude review again

@claude

claude Bot commented Apr 13, 2026

Claude finished @Xia-Weiwen's task in 2m 28s


Re-review of PR #4266: Add reduce_range to avoid overflow in int8 tensor

  • Read all changed files and gather context
  • Review implementation in int8_tensor.py
  • Review config changes in quant_api.py
  • Review kwargs changes
  • Review test changes
  • Post review feedback

Summary

The PR is in good shape. The reduce_range flag is correctly threaded through all code paths, the constants are properly extracted, and the test covers a comprehensive parameter matrix. Here are the remaining items to address:

Issues

1. Unresolved reviewer comment: move _cpu_is_vnni_supported to torchao/utils.py

@Xia-Weiwen commented that _cpu_is_vnni_supported (currently imported from torchao/kernel/intmm.py) should be moved to torchao/utils.py. This hasn't been addressed yet. The function at torchao/kernel/intmm.py:121 is a general CPU capability check, not specific to int matmul, so torchao/utils.py is arguably a better home for it.


2. __all__ cleanup in workflows/__init__.py is unrelated to reduce_range

The diff removes duplicate "Int8Tensor" and "QuantizeTensorToInt8Kwargs" entries from __all__ in torchao/quantization/quantize_/workflows/__init__.py. This is a correct cleanup (the base had duplicates), but it's unrelated to the reduce_range feature. Consider noting this in the PR description or splitting it into a separate commit for clarity. Not blocking.

3. Optional[bool] type vs validation — author's response is reasonable

My previous review flagged that Optional[bool] is misleading when None is rejected by validation. The author responded that Optional[bool] is needed for backward compatibility with deserialized old models. This is a valid concern — if an old serialized Int8Tensor lacks the reduce_range attribute, the Optional type combined with the default False ensures smooth loading. The __post_init__ validation on the config classes (which are constructed by users, not deserialized) catches misuse at config creation time. This is acceptable as-is.

Nit

4. Test SQNR assertion silently passes on non-VNNI CPUs with reduce_range=False

At test_int8_tensor.py:602-605:

if reduce_range or _cpu_is_vnni_supported():
    assert compute_error(output_fp, output_quantized) > 20, (...)

When reduce_range=False on a non-VNNI CPU, the test runs the full quantization + forward pass but skips the quality assertion entirely. This means a complete regression (e.g., NaN outputs) would go undetected on non-VNNI CI machines. Consider adding a basic sanity check for the skipped case, such as:

if reduce_range or _cpu_is_vnni_supported():
    assert compute_error(output_fp, output_quantized) > 20, (...)
else:
    # At minimum, verify no NaN/Inf in output
    assert not torch.isnan(output_quantized).any(), "Output contains NaN"
    assert not torch.isinf(output_quantized).any(), "Output contains Inf"

What looks good

  • The reduce_range flag is correctly propagated through all paths: from_hp, dequantize, both config classes, act_quant_kwargs, _choose_quant_func_and_quantize_tensor, and all aten ops (slice, select, _pin_memory).
  • Module-level constants (_REDUCED_QUANT_MIN, _REDUCED_QUANT_MAX, _FULL_QUANT_MIN, _FULL_QUANT_MAX) replace all magic numbers — addresses the nit from the previous review.
  • The _pin_memory implementation now correctly handles act_pre_scale (was previously missing, unrelated to reduce_range but a good bugfix).
  • The test covers a wide parameter matrix: dtype × compile × config_mode × granularity × mapping_type × reduce_range.
  • Backward compatibility is preserved via Optional[bool] with default False.

@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review April 13, 2026 06:56
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
Args:
granularity: the granularity for the Tensor, currently either PerRow() or PerTensor()
mapping_type: whether to use symmetric or asymmetric quant
reduce_range: optional flag. If True, use reduced int8 range [-64, 63]
Contributor

nit: we should be consistent about using numbers vs. _REDUCED_QUANT_MIN etc. Right now quant_api is using the variables but here it's using the raw numbers.

Contributor

OK to use variables but might also be good to mention what these values are in the docstring as well

Contributor

as commented in #3784 (comment), I'm wondering if we can just set it automatically and print out a warning, instead of leaving this to the user?

Contributor Author

Thanks for the comments, updated.



@common_utils.instantiate_parametrized_tests
class TestInt8TensorCPU(TorchAOIntegrationTestCase):
Contributor

nit: maybe create a new test_int8_tensor_cpu.py; it will be easier to separate tests by device for CI in the future if we want.

Contributor Author

Thanks for the comments, updated.

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
_FULL_QUANT_MAX = 127


def _should_use_reduced_range(tensor: torch.Tensor) -> bool:
Contributor

I think the user should set this and torchao can make it easy to pick the right setting; automagically setting this based on hardware is a footgun.

Contributor

@jerryzh168 jerryzh168 Apr 14, 2026


OK, we can expose this as an arg; by default it will be auto-picked based on device, and we don't require people to set it?

# reduce_range auto-picked, but can be overwritten
# by default, auto picked based on device
config = Int8DynamicActivationInt8WeightConfig()

# explicit overwrite
config = Int8DynamicActivationInt8WeightConfig(reduce_range=True/False)

or are you talking about the user always explicitly setting this arg in the config, like changing all callsites to:

config = Int8DynamicActivationInt8WeightConfig(reduce_range=get_recommended_reduce_range())
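A sketch of what the helper named in the second option could look like (hypothetical wrapper; it assumes the _cpu_is_vnni_supported check discussed elsewhere in this thread):

import torch
from torchao.utils import _cpu_is_vnni_supported  # location per later review discussion

def get_recommended_reduce_range(device: torch.device) -> bool:
    # the reduced range only matters on CPUs lacking VNNI; other devices
    # typically accumulate int8 products safely in int32
    return device.type == "cpu" and not _cpu_is_vnni_supported()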

Collaborator

@vkuzo @jerryzh168 Thanks for the comments. To make UX better, we can probably do as @jerryzh168 suggested:

  • if users don't set it explicitly, it is decided automatically (reduce_range=True if on a CPU without VNNI, False otherwise)
  • if users set it explicitly, we just use whichever the users set and the users are responsible for the results.

How does that sound to you? Thanks.

Cui, Lily added 2 commits April 15, 2026 04:02
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
@Xia-Weiwen Xia-Weiwen requested review from jerryzh168 and vkuzo April 16, 2026 02:35
WeightOnlyInt8Quant doesn't have overflow risk; set reduce_range = False by default.

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
Args:
granularity: the granularity for the Tensor, currently either PerRow() or PerTensor()
mapping_type: whether to use symmetric or asymmetric quant
reduce_range: If None, choose the int8 quantization range automatically.
Contributor

I'm going to quote "simple over easy" here: let's make it simple by requiring the user to specify numerics upfront. It's ok to ask users on hardware that needs reduce_range to specify the required flag manually.

Collaborator

@Xia-Weiwen Xia-Weiwen Apr 16, 2026


@vkuzo Thanks for the comments. Just to confirm, you don't have concern if reduce_range is optional and if it's None we use reduce_range=False, do you? This is for backward compatibility in case where people saved their quantized model without this flag and want to load it with a new version of torchao.

Contributor

yes, that sounds good. It's also fine to provide a helper function which returns the "correct" value based on hardware, but the default value should be simple and deterministic (reduce_range False).

Contributor Author

Thanks for the comments. I've set reduce_range default to False. @vkuzo @jerryzh168 please review.

Set reduce_range default to False. Users can use should_reduce_range() to determine the recommended setting.
@Xia-Weiwen
Collaborator

@claude review

@claude

claude Bot commented Apr 20, 2026

Claude finished @Xia-Weiwen's task in 2m 26s


Review of PR #4266: Add reduce_range to avoid overflow in int8 tensor

  • Read changed files and gather context
  • Review implementation in int8_tensor.py
  • Review config changes in quant_api.py
  • Review utility changes (utils.py, intmm.py, kwargs)
  • Review test changes
  • Post review feedback

Summary

The PR is well-structured. The reduce_range flag is consistently threaded through all code paths (configs, from_hp, QuantizeTensorToInt8Kwargs, aten ops), backward compatibility is maintained, and the resolution of the auto-detect vs. manual-set discussion aligns with the "simple over easy" principle. One issue that should be fixed before merge, and a few minor items:

Issue: Dead code reintroduced from deleted PR #4211

check_cpu_version and check_xpu_version were intentionally removed from the codebase in PR #4211 (Remove check_cpu_version and check_xpu_version helpers). This PR re-adds them in torchao/utils.py:1173-1195, but neither function is used anywhere — not in this PR or in the rest of the codebase. This appears to be a merge artifact from rebasing onto main after #4211 landed.

These should be removed to avoid contradicting the explicit cleanup in #4211.


Minor items (non-blocking)

1. is_pinned doesn't check act_pre_scale

The aten.is_pinned.default implementation (int8_tensor.py:446-455) checks qdata, scale, zero_point, act_quant_scale, and act_quant_zero_point — but not act_pre_scale. The _pin_memory implementation correctly handles act_pre_scale (good bugfix in this PR), but is_pinned is now inconsistent. If act_pre_scale is present and not pinned, is_pinned would return True while the tensor isn't fully pinned. This predates this PR but is worth fixing while you're in this code.

2. should_reduce_range is not exported from the public torchao.quantization namespace

The docstrings for both Int8DynamicActivationInt8WeightConfig and Int8StaticActivationInt8WeightConfig reference should_reduce_range() as a helper users can call, but it's only importable from the deep internal path torchao.quantization.quantize_.workflows.int8.int8_tensor. Consider re-exporting it from torchao.quantization so users can do:

from torchao.quantization import should_reduce_range

3. None handling in reduce_range

The validation accepts None (assert self.reduce_range in (True, False, None)), and the type is Optional[bool]. When reduce_range=None, it's falsy in Python so if reduce_range: in from_hp will use the full range — equivalent to False. This works correctly by accident but is worth a brief comment in the code, or normalizing None to False in __post_init__ for explicitness.
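A sketch of the explicit normalization suggested here (hypothetical dataclass shape, not the PR's class):

from dataclasses import dataclass
from typing import Optional

@dataclass
class ExampleConfig:
    reduce_range: Optional[bool] = False

    def __post_init__(self):
        # old serialized configs may deserialize with reduce_range=None;
        # normalize early so downstream code only ever sees a real bool
        if self.reduce_range is None:
            self.reduce_range = False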

What looks good

  • _cpu_is_vnni_supported moved from intmm.py to utils.py with the import updated — clean separation of concerns.
  • reduce_range correctly propagated to all aten op implementations (slice, select, _pin_memory).
  • Removing hardcoded quant_min=-128, quant_max=127 from dequantize() is correct — dequantize_affine doesn't use these values in computation (the formula is simply (input - zero_point) * scale), so the default None → full dtype range works fine.
  • The _pin_memory bugfix for act_pre_scale is a good incidental fix.
  • Test moved to a separate test_int8_tensor_cpu.py file per @jerryzh168's suggestion — good for future CI device separation.
  • The test uses should_reduce_range() to dynamically determine the flag, ensuring it passes on both VNNI and non-VNNI CI machines.

Cui, Lily added 2 commits April 20, 2026 02:33
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
@Xia-Weiwen
Collaborator

@claude review

@claude

claude Bot commented Apr 20, 2026

Claude finished @Xia-Weiwen's task in 1m 59s


Review of PR #4266: Add reduce_range to avoid overflow in int8 tensor

  • Read changed files and gather context
  • Review implementation in int8_tensor.py
  • Review config changes in quant_api.py
  • Review utility changes (utils.py, intmm.py, kwargs)
  • Review test changes
  • Check resolution of previous review items
  • Post review feedback

Summary

The PR is in good shape and ready for merge. All major items from previous reviews have been addressed. The reduce_range flag is correctly and consistently threaded through the entire quantization pipeline — configs, from_hp, kwargs dispatch, aten ops, and tests.

Previous review items — status

  1. Dead code (check_cpu_version/check_xpu_version) — Fixed. These are no longer present in utils.py.
  2. _cpu_is_vnni_supported moved to utils.py — Done. The import in intmm.py is updated accordingly.
  3. Test moved to separate test_int8_tensor_cpu.py — Done per @jerryzh168's suggestion.
  4. reduce_range default behavior settled — Per @vkuzo's guidance, the default is False (simple and deterministic), with should_reduce_range() as a helper users can call.
  5. is_pinned now checks act_pre_scale — Fixed at int8_tensor.py:455-456.
  6. _pin_memory now handles act_pre_scale — Fixed at int8_tensor.py:477-479.

Remaining items (non-blocking)

1. should_reduce_range not re-exported from torchao.quantization

The docstrings for both config classes reference should_reduce_range() as a helper, but users must import it from torchao.utils or the deep internal path torchao.quantization.quantize_.workflows.int8.int8_tensor. Re-exporting from torchao.quantization would improve discoverability:

from torchao.quantization import should_reduce_range

The function is in int8_tensor.py's __all__ (line 39) but isn't propagated up through workflows/__init__.py or torchao/quantization/__init__.py. This can be a follow-up.

2. Duplicate entries in workflows/__init__.py:__all__

Lines 44-45 and 50-51 have duplicate "Int8Tensor" and "QuantizeTensorToInt8Kwargs" entries. This predates this PR but is worth cleaning up. Not blocking.

3. None handling in reduce_range is implicit

When reduce_range=None, __post_init__ normalizes it to False. In from_hp, if reduce_range: naturally treats None as falsy. This works correctly but relies on implicit Python truthiness. A brief inline comment in from_hp noting this would help readability, but not blocking.

What looks good

  • reduce_range correctly propagated through all paths: from_hp, dequantize, both config classes, act_quant_kwargs, _choose_quant_func_and_quantize_tensor, and all aten ops (slice, select, _pin_memory, is_pinned).
  • Removing hardcoded quant_min=-128, quant_max=127 from dequantize() is correct — _dequantize_affine_no_dtype_check doesn't use these values in computation (formula is (input - zero_point) * scale), so defaulting to None → full dtype range is fine.
  • should_reduce_range() is a clean, simple helper that follows the "simple over easy" principle — users explicitly opt in.
  • Backward compatibility preserved via Optional[bool] with default False and None → False normalization.
  • Test uses should_reduce_range() to dynamically determine the flag, ensuring CI passes on both VNNI and non-VNNI machines.
  • The act_pre_scale bugfixes in is_pinned and _pin_memory are good incidental improvements.
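Putting the merged pieces together, end-user code would look roughly like this (the should_reduce_range import path is the internal one noted above and may change if the re-export lands):

import torch
from torchao.quantization import quantize_, Int8DynamicActivationInt8WeightConfig
from torchao.quantization.quantize_.workflows.int8.int8_tensor import should_reduce_range

model = torch.nn.Linear(64, 64)
config = Int8DynamicActivationInt8WeightConfig(
    reduce_range=should_reduce_range(torch.device("cpu")),
)
quantize_(model, config)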

@cyxlily cyxlily requested a review from vkuzo April 20, 2026 03:33
Cui, Lily added 2 commits April 20, 2026 03:45
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
Signed-off-by: Cui, Lily <yuxin.cui@intel.com>

granularity: Granularity
mapping_type: MappingType = MappingType.SYMMETRIC
reduce_range: Optional[bool] = False
Contributor

can you add a comment on why Optional, and whether we can remove it in the future?

I don't think we need to guarantee BC for old checkpoints btw

Contributor Author

Thanks for the comments, removed Optional as it's not needed to guarantee BC for old checkpoints.

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
@cyxlily cyxlily requested a review from jerryzh168 April 23, 2026 08:16
@Xia-Weiwen
Collaborator

Hi @jerryzh168 Could you please review this PR again?

Comment thread torchao/quantization/quant_api.py Outdated
Comment on lines +890 to +892
assert self.reduce_range in (True, False), (
"`reduce_range` must be True or False. None is defaulted to False."
)
Contributor

probably not needed?

Comment thread torchao/quantization/quant_api.py Outdated
)
if self.reduce_range is None:
self.reduce_range = False
assert self.reduce_range in (True, False), (
Contributor

same here

reduce_range=reduce_range,
)
else:
act_granularity, _ = Int8Tensor._normalize_granularity(granularity)
Contributor

nit: add assert for config_mode == "static"

input_tensor = torch.randn(M, K, dtype=dtype, device=device)
model = ToyTwoLinearModel(K, N, K, dtype=dtype, device=device).eval()
model_q = copy.deepcopy(model)
reduce_range = should_reduce_range(input_tensor.device)
Contributor

what's the effect when reduce_range is not set correctly for cpu? should we test that as well?

Contributor Author

Thanks for the comments. On CPUs without VNNI, when using torch.compile with int8 matmul, there may be accuracy degradation compared to eager mode due to overflow in oneDNN qlinear. The reduce_range setting is designed to prevent this.
We don't need to test incorrect settings because this UT focuses on the recommended path, and I've already provided the helper function should_reduce_range() and clear UT Notes.
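For background on that overflow, the arithmetic is roughly this (illustrative numbers, not oneDNN code): pre-VNNI x86 int8 matmul multiplies u8 activations by s8 weights and sums pairs of products into int16 (vpmaddubsw semantics), which can exceed the int16 range unless the weight magnitude is halved:

worst_full    = 255 * 128 * 2   # 65280 > 32767: can overflow int16
worst_reduced = 255 * 64 * 2    # 32640 <= 32767: always fits
print(worst_full, worst_reduced)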

Contributor

@jerryzh168 jerryzh168 left a comment


looks good overall

Signed-off-by: Cui, Lily <yuxin.cui@intel.com>
@Xia-Weiwen
Collaborator

Merge as CI failures are unrelated.

@Xia-Weiwen Xia-Weiwen merged commit e094ce3 into pytorch:main Apr 28, 2026
15 of 19 checks passed
@cyxlily cyxlily deleted the reduce_range branch April 29, 2026 05:22

Labels

CLA Signed: managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.
module: not user facing: use this tag if you don't want this PR to show up in release notes.

4 participants