Commit 325585d
Update on "Add FP8-INT4 checkpoint upload code"
Summary:
As the title says. FP8-INT4 support was added in #3714.
checkpoint: https://huggingface.co/jerryzh168/Qwen3-8B-FP8-INT4
Test Plan:
```
sh release.sh --model_id $MODEL --push_to_hub --populate_model_card_template --quants FP8-INT4
```
produced checkpoint: https://huggingface.co/jerryzh168/Qwen3-8B-FP8-INT4
Benchmark:
```
vllm bench throughput --model jerryzh168/Qwen3-8B-FP8-INT4
```
```
Throughput: 33.03 requests/s, 38055.86 total tokens/s, 4228.43 output tokens/s
Total num prompt tokens: 1024000
Total num output tokens: 128000
```
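As a quick sanity check on the benchmark output above, the reported rates and token counts can be cross-checked against each other. The sketch below uses only the figures printed by the benchmark; the variable names are illustrative and not part of vLLM or torchao.

```python
# Figures copied verbatim from the benchmark output above.
requests_per_s = 33.03
total_tokens_per_s = 38055.86
output_tokens_per_s = 4228.43
prompt_tokens = 1_024_000
output_tokens = 128_000

total_tokens = prompt_tokens + output_tokens

# Elapsed time implied by each reported rate; the two should agree
# if the rates and the token counts describe the same run.
elapsed_from_total = total_tokens / total_tokens_per_s
elapsed_from_output = output_tokens / output_tokens_per_s

# Number of requests implied by the request rate and the elapsed time.
implied_requests = requests_per_s * elapsed_from_total

print(f"elapsed from total tokens:  {elapsed_from_total:.2f} s")
print(f"elapsed from output tokens: {elapsed_from_output:.2f} s")
print(f"implied request count:      {implied_requests:.0f}")
```

Both elapsed-time estimates come out at about 30.3 s, and the implied request count is close to 1000, which is consistent with roughly 1024 prompt tokens and 128 output tokens per request.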
Reviewers:
Subscribers:
Tasks:
Tags:
[ghstack-poisoned]
1 parent 83d1561 commit 325585d
2 files changed
Lines changed: 3 additions & 2 deletions
File tree
- .github/scripts/torchao_model_releases
- torchao/quantization
Lines changed: 2 additions & 1 deletion
- Original line 22 removed
- New lines 785 and 786 added

Lines changed: 1 addition & 1 deletion
- Original line 880 replaced by new line 880