Commit 2207d6b
Update on "Add UIntxBitPackedTensor, UIntxWeightOnlyConfig, and Int8DynamicActivationUIntxWeightConfig"
Add v2 tensor subclass UIntxBitPackedTensor(TorchAOBaseTensor) using
gemlite bit-packing and Triton GEMM kernels, replacing the old AQT-based
GemliteUIntXWeightOnlyConfig path.
- UIntxBitPackedTensor: tensor subclass with from_hp(), dequantize(),
and aten.linear/t/slice dispatch implementations
- UIntxWeightOnlyConfig: weight-only quantization (4-bit/8-bit)
- Int8DynamicActivationUIntxWeightConfig: int8 dynamic activation + uintx weight
- Tests for both configs covering 4-bit, 8-bit, slice, and non-standard shapes
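For reference, the core idea behind the 4-bit path (per-group quantization, then two codes packed per byte in the pack8 layout) can be sketched in plain NumPy. The helper names and exact signatures below are illustrative only, not the gemlite or UIntxBitPackedTensor API:

```python
import numpy as np

def quantize_u4(w, group_size=64):
    """Per-group asymmetric 4-bit quantization: w ~= (q - zero) * scale.
    Illustrative sketch; not the torchao implementation."""
    w = w.reshape(-1, group_size)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0          # 4-bit range: codes 0..15
    zero = np.round(-wmin / scale)
    q = np.clip(np.round(w / scale) + zero, 0, 15).astype(np.uint8)
    return q, scale, zero

def pack_u4(q):
    """Pack two 4-bit codes per uint8 (pack8-style; gemlite can also pack into int32)."""
    q = q.reshape(-1)
    return (q[0::2] << 4 | q[1::2]).astype(np.uint8)

def unpack_u4(packed):
    """Recover the interleaved 4-bit codes from the packed bytes."""
    hi = packed >> 4
    lo = packed & 0x0F
    return np.stack([hi, lo], axis=1).reshape(-1)
```

Dequantization is then `(q - zero) * scale` per group, which is what a `dequantize()` on such a subclass conceptually computes.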
Test Plan:
- python test/prototype/test_uintx_bit_packed_tensor.py
- Tests cover UIntxWeightOnlyConfig: 4-bit (group64/128, pack32/8), 8-bit (perchannel, pack32/8)
- Tests cover Int8DynamicActivationUIntxWeightConfig: same bit_width/group_size/packing combos
- Tests cover slice dim0/dim1 for tensor parallelism
- Tests cover non-standard shapes (1024x1025)
- Verified backward compat: old GemliteUIntXWeightOnlyConfig still works
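The dim0 slice tests above lean on row slicing commuting with per-row bit-packing, which is why sharding packed weights for tensor parallelism is safe along that dim. A toy NumPy sketch of that property (illustrative helpers only, not the actual aten.slice dispatch or the gemlite int32 layout):

```python
import numpy as np

def pack_rows(q):
    """Pack pairs of 4-bit codes along dim1; dim0 (output channels) is untouched."""
    return (q[:, 0::2] << 4 | q[:, 1::2]).astype(np.uint8)

def unpack_rows(p):
    """Invert pack_rows, restoring the original (out, in) code matrix."""
    return np.stack([p >> 4, p & 0x0F], axis=2).reshape(p.shape[0], -1)

rng = np.random.default_rng(0)
q = rng.integers(0, 16, size=(8, 16), dtype=np.uint8)
p = pack_rows(q)

# dim0 slicing commutes with packing: shard rows, then unpack, and you get
# exactly the corresponding rows of the unpacked codes.
assert np.array_equal(unpack_rows(p[2:6]), q[2:6])
```

Slicing dim1 is the trickier case, since a slice boundary can fall mid-byte; that is the situation the dim1 tests need to exercise.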
[ghstack-poisoned]

1 parent: 3808c73
1 file changed
Lines changed: 6 additions & 0 deletions
(Diff table omitted: the 6 added lines fall at file positions 137-139 and 190-192.)