Commit 2207d6b

Update on "Add UIntxBitPackedTensor, UIntxWeightOnlyConfig, and Int8DynamicActivationUIntxWeightConfig"
Add v2 tensor subclass UIntxBitPackedTensor(TorchAOBaseTensor) using gemlite bit-packing and Triton GEMM kernels, replacing the old AQT-based GemliteUIntXWeightOnlyConfig path.

- UIntxBitPackedTensor: tensor subclass with from_hp(), dequantize(), and aten.linear/t/slice dispatch implementations
- UIntxWeightOnlyConfig: weight-only quantization (4-bit/8-bit)
- Int8DynamicActivationUIntxWeightConfig: int8 dynamic activation + uintx weight
- Tests for both configs covering 4-bit, 8-bit, slice, and non-standard shapes

Test Plan:
- python test/prototype/test_uintx_bit_packed_tensor.py
- Tests cover UIntxWeightOnlyConfig: 4-bit (group64/128, pack32/8), 8-bit (perchannel, pack32/8)
- Tests cover Int8DynamicActivationUIntxWeightConfig: same bit_width/group_size/packing combos
- Tests cover slice dim0/dim1 for tensor parallelism
- Tests cover non-standard shapes (1024x1025)
- Verified backward compat: old GemliteUIntXWeightOnlyConfig still works

[ghstack-poisoned]
1 parent 3808c73 commit 2207d6b

1 file changed

Lines changed: 6 additions & 0 deletions

torchao/prototype/quantization/quant_api.py

@@ -134,6 +134,9 @@ class UIntxWeightOnlyConfig(AOBaseConfig):

     def __post_init__(self):
         torch._C._log_api_usage_once("torchao.quantization.UIntxWeightOnlyConfig")
+        assert self.bit_width in [4, 8], (
+            f"bit_width must be 4 or 8, got {self.bit_width}"
+        )


 @register_quantize_module_handler(UIntxWeightOnlyConfig)
@@ -184,6 +187,9 @@ def __post_init__(self):
         torch._C._log_api_usage_once(
             "torchao.quantization.Int8DynamicActivationUIntxWeightConfig"
         )
+        assert self.bit_width in [4, 8], (
+            f"bit_width must be 4 or 8, got {self.bit_width}"
+        )


 @register_quantize_module_handler(Int8DynamicActivationUIntxWeightConfig)
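
The effect of the added check can be sketched without torchao installed. The snippet below is a minimal illustration, assuming the config classes are dataclasses (implied by `__post_init__`, but not shown in this diff). `ToyUIntxConfig` and `try_build` are hypothetical names introduced only for this example and are not part of torchao.

```python
from dataclasses import dataclass


@dataclass
class ToyUIntxConfig:
    """Hypothetical stand-in for the config classes in the diff; not torchao code."""

    bit_width: int = 4

    def __post_init__(self):
        # Same check the commit adds: reject anything other than 4 or 8 bits.
        assert self.bit_width in [4, 8], (
            f"bit_width must be 4 or 8, got {self.bit_width}"
        )


def try_build(bit_width):
    """Return the config, or the assertion message if validation fails."""
    try:
        return ToyUIntxConfig(bit_width=bit_width)
    except AssertionError as err:
        return str(err)


print(try_build(8))  # a valid config object
print(try_build(3))  # the assertion message for an unsupported width
```

With this check, an unsupported width fails when the config is constructed instead of later, inside packing or kernel code. One design note: `assert` statements are skipped when Python runs with `-O`, so raising `ValueError` would be an alternative if the validation must always run.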
