# TorchAO

> PyTorch-native library for quantization, sparsity, and low-precision training. Provides the `quantize_()` API with Config classes for int4/int8/float8/MX weight and activation quantization, composable with `torch.compile`.

## Docs

- [Quick Start](https://docs.pytorch.org/ao/stable/quick_start.html)
- [Workflows Matrix](https://docs.pytorch.org/ao/main/workflows.html): Status of every dtype x hardware combination
- [API Reference](https://docs.pytorch.org/ao/stable/api_reference/index.html)
- [Inference Quantization](https://docs.pytorch.org/ao/main/workflows/inference.html)
- [Float8 Training](https://docs.pytorch.org/ao/main/workflows/training.html)
- [QAT](https://docs.pytorch.org/ao/main/workflows/qat.html)
- [Quantization Overview](https://docs.pytorch.org/ao/main/contributing/quantization_overview.html): Architecture and internals
- [Contributor Guide](https://docs.pytorch.org/ao/main/contributing/contributor_guide.html): How to add tensors, kernels, configs
- [PT2E Quantization](https://docs.pytorch.org/ao/main/pt2e_quantization/index.html): PyTorch 2 Export quantization for deployment backends (X86, XPU, ExecuTorch)

## Code

- [quantize_() and Config classes](https://github.com/pytorch/ao/blob/main/torchao/quantization/quant_api.py): Main entry point
- [Tensor subclasses](https://github.com/pytorch/ao/tree/main/torchao/quantization/quantize_/workflows): Int4Tensor, Int8Tensor, Float8Tensor, etc.
- [Granularity](https://github.com/pytorch/ao/blob/main/torchao/quantization/granularity.py): PerTensor, PerRow, PerGroup, PerBlock, PerToken
- [Float8 training](https://github.com/pytorch/ao/tree/main/torchao/float8): Scaled float8 training recipes
- [Sparsity](https://github.com/pytorch/ao/tree/main/torchao/sparsity): Semi-structured 2:4 sparsity
- [Quantized optimizers](https://github.com/pytorch/ao/tree/main/torchao/optim): AdamW8bit, AdamW4bit, AdamWFp8
- [QAT](https://github.com/pytorch/ao/tree/main/torchao/quantization/qat): Quantization-aware training
- [MX formats](https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats): MXFP8, MXFP4, NVFP4 (prototype)
- [MoE training](https://github.com/pytorch/ao/tree/main/torchao/prototype/moe_training): MXFP8 MoE training (prototype)

## Deprecated APIs

Do not use or recommend these:
- `AffineQuantizedTensor` (AQT) in `torchao/dtypes/`: old v1 system, being removed. New tensor types inherit from `TorchAOBaseTensor`
- `autoquant()`: deleted
- Layout registration system (`PlainLayout`, `Float8Layout`, `TensorCoreTiledLayout`, etc.): deleted
- `TorchAODType`: deprecated
- `change_linear_weights_to_int4_woqtensors`: deleted; use `quantize_(model, Int4WeightOnlyConfig())` instead

## Optional

- [Tutorials](https://github.com/pytorch/ao/tree/main/tutorials)
- [Benchmarks](https://github.com/pytorch/ao/tree/main/benchmarks)
- [Contributing](https://github.com/pytorch/ao/blob/main/CONTRIBUTING.md)
- [MSLK kernels](https://github.com/pytorch/MSLK): Optional accelerated kernels