Update on "Add UIntxBitPackedTensor, UIntxWeightOnlyConfig, and Int8DynamicActivationUIntxWeightConfig"

jerryzh168 · jerryzh168 · commit 2207d6b6dc3f · 2026-03-13T14:07:51.000-07:00
Add a v2 tensor subclass, UIntxBitPackedTensor(TorchAOBaseTensor), that uses
gemlite bit-packing and Triton GEMM kernels. It replaces the old AQT-based
GemliteUIntXWeightOnlyConfig path, which keeps working for backward compatibility.

- UIntxBitPackedTensor: tensor subclass with from_hp(), dequantize(),
  and aten.linear/t/slice dispatch implementations
- UIntxWeightOnlyConfig: weight-only quantization (4-bit/8-bit)
- Int8DynamicActivationUIntxWeightConfig: int8 dynamic activation + uintx weight
- Tests for both configs covering 4-bit, 8-bit, slice, and non-standard shapes
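
To illustrate the bit-packing idea behind the 4-bit path, here is a minimal pure-Python sketch that stores two 4-bit values per byte and recovers them again. The names `pack_uint4` and `unpack_uint4` are hypothetical, and the low-nibble-first layout is an assumption made for illustration; it is not necessarily the layout gemlite uses.

```python
def pack_uint4(values):
    """Pack 4-bit unsigned ints (0..15) into bytes, two values per byte.

    Assumed layout: the first value of each pair goes in the low nibble,
    the second in the high nibble.
    """
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("each value must fit in 4 bits (0..15)")
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        packed.append(lo | (hi << 4))
    return bytes(packed)


def unpack_uint4(packed):
    """Inverse of pack_uint4: split each byte back into two 4-bit values."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)  # low nibble holds the first value
        out.append(byte >> 4)    # high nibble holds the second value
    return out
```

Packing halves the storage for 4-bit weights; `unpack_uint4(pack_uint4(v))` returns `v` for any even-length list of values in 0..15.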

Test Plan:
- python test/prototype/test_uintx_bit_packed_tensor.py
- Tests cover UIntxWeightOnlyConfig: 4-bit (group64/128, pack32/8), 8-bit (per-channel, pack32/8)
- Tests cover Int8DynamicActivationUIntxWeightConfig: same bit_width/group_size/packing combinations
- Tests cover slice dim0/dim1 for tensor parallelism
- Tests cover non-standard shapes (1024x1025)
- Verified backward compat: old GemliteUIntXWeightOnlyConfig still works
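
The `bit_width` check that this update adds to both configs' `__post_init__` can be exercised in isolation. The sketch below uses a hypothetical stand-in dataclass, `ExampleUIntxConfig`, rather than the real torchao classes, and its `group_size` field is an assumption included only to make the example look like a config.

```python
from dataclasses import dataclass


@dataclass
class ExampleUIntxConfig:
    """Hypothetical stand-in for illustration; not the real torchao config."""

    bit_width: int = 4
    group_size: int = 64  # assumed field, for illustration only

    def __post_init__(self):
        # Same check as in the diff: only 4-bit and 8-bit are accepted.
        assert self.bit_width in [4, 8], (
            f"bit_width must be 4 or 8, got {self.bit_width}"
        )


def is_supported(bit_width):
    """Return True if a config with this bit_width can be constructed."""
    try:
        ExampleUIntxConfig(bit_width=bit_width)
    except AssertionError:
        return False
    return True
```

With this stand-in, `is_supported(4)` and `is_supported(8)` succeed while other widths are rejected at construction time. Because `assert` statements are skipped when Python runs with `-O`, raising `ValueError` would be a stricter alternative if the check must always run.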

[ghstack-poisoned]
diff --git a/torchao/prototype/quantization/quant_api.py b/torchao/prototype/quantization/quant_api.py
@@ -134,6 +134,9 @@ class UIntxWeightOnlyConfig(AOBaseConfig):
 
     def __post_init__(self):
         torch._C._log_api_usage_once("torchao.quantization.UIntxWeightOnlyConfig")
+        assert self.bit_width in [4, 8], (
+            f"bit_width must be 4 or 8, got {self.bit_width}"
+        )
 
 
 @register_quantize_module_handler(UIntxWeightOnlyConfig)
@@ -184,6 +187,9 @@ def __post_init__(self):
         torch._C._log_api_usage_once(
             "torchao.quantization.Int8DynamicActivationUIntxWeightConfig"
         )
+        assert self.bit_width in [4, 8], (
+            f"bit_width must be 4 or 8, got {self.bit_width}"
+        )
 
 
 @register_quantize_module_handler(Int8DynamicActivationUIntxWeightConfig)