- These performance data were collected with the CPU and NPU of each platform running at their maximum frequencies.
- The script for setting the frequencies is located in the `scripts` directory.
- All models should be converted with `optimization_level` set to 0 to achieve the best runtime performance.

| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT (ms) | Tokens/s | Memory (MB) |
|---|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w8a8 | 128 | 64 | 143.83 | 42.58 | 654.26 |
| MiniCPM4 | 0.5B | w8a8 | 128 | 64 | 128.46 | 45.13 | 524.55 |
| Qwen3 | 0.6B | w8a8 | 128 | 64 | 213.50 | 32.16 | 773.77 |
| TinyLLAMA | 1.1B | w8a8 | 128 | 64 | 239.00 | 24.49 | 1085.21 |
| Qwen2.5 | 1.5B | w8a8 | 128 | 64 | 412.27 | 16.32 | 1659.15 |
| RWKV7 | 1.5B | w8a8 | 128 | 64 | 788.00 | 13.33 | 1450.29 |
| InternLM2 | 1.8B | w8a8 | 128 | 64 | 374.00 | 15.58 | 1765.71 |
| Gemma2 | 2B | w8a8 | 128 | 64 | 679.90 | 9.80 | 2765.30 |
| Gemma3n | 2B | w8a8 | 128 | 64 | 1220.40 | 9.46 | 2709.25 |
| TeleChat2 | 3B | w8a8 | 128 | 64 | 649.60 | 10.22 | 2777.00 |
| Phi3 | 3.8B | w8a8 | 128 | 64 | 1022.00 | 7.50 | 3747.73 |
| MiniCPM3 | 4B | w8a8 | 128 | 64 | 1385.92 | 5.99 | 4339.61 |
| ChatGLM3 | 6B | w8a8 | 128 | 64 | 1395.34 | 4.94 | 5976.43 |
| Qwen3-VL | 2B | w8a8 | 128 | 64 | 391 | 15.12 | 1892.13 |
| DeepSeekOCR | 3B(A570M) | w8a8 | 128 | 64 | 696.21 | 31.81 | 3028.66 |
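
TTFT and Tokens/s can be combined into a rough estimate of total response time. The sketch below assumes that TTFT already covers the first generated token and that Tokens/s is a constant decode rate for the remaining `New_tokens - 1` tokens. These are assumptions about how the columns were measured, not documented behavior, and `estimate_generation_ms` is a hypothetical helper.

```python
def estimate_generation_ms(ttft_ms: float, tokens_per_s: float, new_tokens: int) -> float:
    """Rough end-to-end latency: prefill (TTFT) plus decoding the remaining tokens.

    Assumption: TTFT includes the first generated token, and the other
    new_tokens - 1 tokens are decoded at a steady tokens_per_s rate.
    """
    if tokens_per_s <= 0 or new_tokens < 1:
        raise ValueError("tokens_per_s must be positive and new_tokens >= 1")
    decode_ms = (new_tokens - 1) / tokens_per_s * 1000.0
    return ttft_ms + decode_ms


# (TTFT in ms, Tokens/s) copied from the w8a8 table above (Seqlen=128, New_tokens=64).
rows = {
    "Qwen2 0.5B": (143.83, 42.58),
    "Qwen2.5 1.5B": (412.27, 16.32),
    "ChatGLM3 6B": (1395.34, 4.94),
}

for name, (ttft, tps) in rows.items():
    total_s = estimate_generation_ms(ttft, tps, 64) / 1000.0
    print(f"{name}: ~{total_s:.2f} s for 64 new tokens")
```

Under those assumptions, 64 new tokens take about 1.6 s for Qwen2 0.5B and about 14.1 s for ChatGLM3 6B with the w8a8 numbers above.
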

| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT (ms) | Tokens/s | Memory (MB) |
|---|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w4a16 | 128 | 64 | 327.72 | 34.24 | 426.24 |
| | 0.5B | w4a16_g128 | 128 | 64 | 363.58 | 33.22 | 445.95 |
| | 0.5B | w8a8 | 128 | 64 | 334.26 | 22.95 | 661.1 |
| MiniCPM4 | 0.5B | w4a16 | 128 | 64 | 348.87 | 35.8 | 322.41 |
| | 0.5B | w4a16_g128 | 128 | 64 | 371.96 | 32.88 | 362.23 |
| | 0.5B | w8a8 | 128 | 64 | 337.52 | 23.71 | 528.96 |
| Qwen3 | 0.6B | w4a16 | 128 | 64 | 482.82 | 25.16 | 495.99 |
| | 0.6B | w4a16_g128 | 128 | 64 | 512.36 | 24.3 | 528.48 |
| | 0.6B | w8a8 | 128 | 64 | 448.94 | 17.09 | 779.62 |
| TinyLLAMA | 1.1B | w4a16 | 128 | 64 | 517.82 | 21.32 | 591 |
| | 1.1B | w4a16_g128 | 128 | 64 | 658.78 | 18.89 | 681 |
| | 1.1B | w8a8 | 128 | 64 | 537.82 | 12.63 | 1082.83 |
| RWKV7 | 1.5B | w4a16 | 128 | 64 | 1779.65 | 9.96 | 799.89 |
| | 1.5B | w4a16_g128 | 128 | 64 | 1877.95 | 9.37 | 890.16 |
| | 1.5B | w8a8 | 128 | 64 | 1718.8 | 6.96 | 1458.48 |
| InternLM2 | 1.8B | w4a16 | 128 | 64 | 771.6 | 13.65 | 966.12 |
| | 1.8B | w4a16_g128 | 128 | 64 | 1001.23 | 12.18 | 1061.57 |
| | 1.8B | w8a8 | 128 | 64 | 777.86 | 7.91 | 1773.23 |
| Gemma2 | 2B | w4a16 | 128 | 64 | 1119.51 | 8.45 | 1529.03 |
| | 2B | w4a16_g128 | 128 | 64 | 1407.31 | 7.76 | 1616.45 |
| | 2B | w8a8 | 128 | 64 | 1052.77 | 5.01 | 2771.54 |
| Gemma3n | 2B | w4a16 | 128 | 64 | 3187 | 7.38 | 1574.34 |
| | 2B | w8a8 | 128 | 64 | 3229.16 | 4.75 | 2722.76 |
| TeleChat2 | 3B | w4a16 | 128 | 64 | 1143.73 | 9.05 | 1514.98 |
| | 3B | w4a16_g128 | 128 | 64 | 1422.38 | 7.91 | 1633.54 |
| | 3B | w8a8 | 128 | 64 | 1035.37 | 5.15 | 2783.73 |
| Phi3 | 3.8B | w4a16 | 128 | 64 | 1800.92 | 6.52 | 1985.75 |
| | 3.8B | w4a16_g128 | 128 | 64 | 2236.9 | 5.96 | 2141.89 |
| | 3.8B | w8a8 | 128 | 64 | 1591.59 | 3.76 | 3757.22 |
| MiniCPM3 | 4B | w4a16 | 128 | 64 | 2484.63 | 4.94 | 2336.73 |
| | 4B | w4a16_g128 | 128 | 64 | 3053.52 | 4.49 | 2618.14 |
| | 4B | w8a8 | 128 | 64 | 2509.27 | 3.04 | 4366.85 |
| ChatGLM3 | 6B | w4a16 | 128 | 64 | 2121.26 | 4.7 | 3014.38 |
| | 6B | w4a16_g128 | 128 | 64 | 2958.88 | 4.03 | 3244.15 |
| | 6B | w8a8 | 128 | 64 | 1920.97 | 2.5 | 5958.65 |
| Qwen3-VL | 2B | w4a16 | 128 | 64 | 791.20 | 12.88 | 1082.65 |
| | 2B | w4a16_g128 | 128 | 64 | 1026.31 | 11.62 | 1170.89 |
| | 2B | w8a8 | 128 | 64 | 799.09 | 7.67 | 1900.80 |
| DeepSeekOCR | 3B(A570M) | w4a16 | 128 | 64 | 1010.15 | 24.85 | 1756.13 |
| | 3B(A570M) | w8a8 | 128 | 64 | 1312.00 | 16.21 | 3072.33 |
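
Because the table above lists the same models in w4a16, w4a16_g128, and w8a8 form, the quantization trade-off can be quantified directly from its rows. A minimal sketch that computes decode speedup, memory saving, and TTFT ratio of w4a16 relative to w8a8 for a few rows copied from the table; `Result` and `compare` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    ttft_ms: float
    tokens_per_s: float
    memory_mb: float


def compare(w4: Result, w8: Result) -> dict:
    """Relative cost and benefit of w4a16 versus w8a8 for one model on one platform."""
    return {
        # > 1.0 means w4a16 decodes faster than w8a8
        "decode_speedup": w4.tokens_per_s / w8.tokens_per_s,
        # percentage of w8a8 memory saved by w4a16
        "memory_saving_pct": (1.0 - w4.memory_mb / w8.memory_mb) * 100.0,
        # > 1.0 means w4a16 has a longer time to first token
        "ttft_ratio": w4.ttft_ms / w8.ttft_ms,
    }


# (w4a16, w8a8) rows copied from the table above.
table = {
    "Qwen2 0.5B": (Result(327.72, 34.24, 426.24), Result(334.26, 22.95, 661.1)),
    "Phi3 3.8B": (Result(1800.92, 6.52, 1985.75), Result(1591.59, 3.76, 3757.22)),
    "ChatGLM3 6B": (Result(2121.26, 4.7, 3014.38), Result(1920.97, 2.5, 5958.65)),
}

for name, (w4, w8) in table.items():
    c = compare(w4, w8)
    print(f"{name}: decode x{c['decode_speedup']:.2f}, "
          f"memory -{c['memory_saving_pct']:.1f}%, TTFT x{c['ttft_ratio']:.2f}")
```

On these three rows, w4a16 decodes roughly 1.5x to 1.9x faster than w8a8 and needs roughly 35% to 50% less memory, while TTFT stays within about 15% of the w8a8 value.
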

| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT (ms) | Tokens/s | Memory (MB) |
|---|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w8a8 | 128 | 64 | 650.37 | 12.94 | 632.48 |
| MiniCPM4 | 0.5B | w8a8 | 128 | 64 | 689.88 | 11.78 | 500.54 |
| Qwen3 | 0.6B | w8a8 | 128 | 64 | 901.09 | 10.00 | 756.72 |

| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT (ms) | Tokens/s |
|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w4a16 | 128 | 64 | 650.69 | 21.43 |
| | 0.5B | w4a16_g128 | 128 | 64 | 679.78 | 18.18 |
| | 0.5B | w8a8 | 128 | 64 | 636.90 | 13.91 |
| MiniCPM4 | 0.5B | w4a16 | 128 | 64 | 654.20 | 22.97 |
| | 0.5B | w4a16_g128 | 128 | 64 | 691.57 | 18.78 |
| | 0.5B | w8a8 | 128 | 64 | 663.41 | 15.12 |
| Qwen3 | 0.6B | w4a16 | 128 | 64 | 955.94 | 15.41 |
| | 0.6B | w4a16_g128 | 128 | 64 | 1019.94 | 12.60 |
| | 0.6B | w8a8 | 128 | 64 | 945.18 | 10.55 |
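
In the table above, the w4a16_g128 rows are consistently slower than plain w4a16. Reading the `_g128` suffix as group-wise quantization with a group size of 128 is an assumption based on common naming, not something this page states. The sketch below measures the gap for the three models listed; `g128_overhead` is a hypothetical helper.

```python
# (TTFT in ms, Tokens/s) pairs copied from the table above.
rows = {
    "Qwen2 0.5B": {"w4a16": (650.69, 21.43), "w4a16_g128": (679.78, 18.18)},
    "MiniCPM4 0.5B": {"w4a16": (654.20, 22.97), "w4a16_g128": (691.57, 18.78)},
    "Qwen3 0.6B": {"w4a16": (955.94, 15.41), "w4a16_g128": (1019.94, 12.60)},
}


def g128_overhead(base: tuple[float, float], grouped: tuple[float, float]) -> tuple[float, float]:
    """Return (TTFT increase %, Tokens/s drop %) of w4a16_g128 relative to w4a16."""
    base_ttft, base_tps = base
    grp_ttft, grp_tps = grouped
    ttft_increase_pct = (grp_ttft / base_ttft - 1.0) * 100.0
    decode_drop_pct = (1.0 - grp_tps / base_tps) * 100.0
    return ttft_increase_pct, decode_drop_pct


for name, r in rows.items():
    ttft_pct, drop_pct = g128_overhead(r["w4a16"], r["w4a16_g128"])
    print(f"{name}: TTFT +{ttft_pct:.1f}%, Tokens/s -{drop_pct:.1f}%")
```

For these three rows the g128 variant adds about 4.5% to 6.7% to TTFT and lowers Tokens/s by about 15% to 18%.
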

| Model | Stage | RK3588 (w8a8) | RK3576 (w4a16) |
|---|---|---|---|
| Qwen2-VL-2B | img-encoder (392×392) | 3.28 s | 3.55 s |
| | Prefill (len=196) | 632.6 ms | 1234.9 ms |
| | Decode | 16.6 tokens/s | 14.57 tokens/s |
| Qwen2.5-VL-3B | img-encoder (392×392) | 2.93 s | 2.87 s |
| | Prefill (len=196) | 1120 ms | 2130 ms |
| | Decode | 8.66 tokens/s | 7.87 tokens/s |
| MiniCPM-V-2_6 | img-encoder (448×448) | 3.27 s | 2.4 s |
| | Prefill (len=64) | 826 ms | 1230 ms |
| | Decode | 4.18 tokens/s | 3.85 tokens/s |
| SmolVLM-256M | img-encoder (512×512) | 842 ms | 768 ms |
| | Prefill (len=128) | 77.3 ms | 180 ms |
| | Decode | 78 tokens/s | 57.73 tokens/s |
| Qwen3-VL-2B | img-encoder (448×448) | 2.08 s | 1.61 s |
| | Prefill (len=196) | 649 ms | 1587 ms |
| | Decode | 14.91 tokens/s | 10.36 tokens/s |
| DeepSeekOCR-3B(A570M) | img-encoder (448×448) | 2.09 s | 2.27 s |
| | Prefill (len=128) | 696 ms | 1010 ms |
| | Decode | 31.8 tokens/s | 22.3 tokens/s |

- The img-encoder runs on RKNN in FP16 precision, and its timings were measured with all NPU cores enabled.
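
The multimodal table breaks each request into img-encoder, prefill, and decode stages, so the per-stage figures can be summed into a rough per-request latency. This is a sketch that assumes the stages run strictly one after another with no overlap and that decoding `new_tokens` tokens takes `new_tokens / tokens_per_s` seconds; `vlm_latency_s` and the 128-token output length are hypothetical choices for illustration.

```python
def vlm_latency_s(encoder_s: float, prefill_ms: float, decode_tps: float, new_tokens: int) -> float:
    """Rough end-to-end time for one image plus prompt.

    Assumption: img-encoder, prefill, and decode run sequentially, and the
    decode stage produces new_tokens tokens at a steady decode_tps rate.
    """
    if decode_tps <= 0 or new_tokens < 0:
        raise ValueError("decode_tps must be positive and new_tokens non-negative")
    return encoder_s + prefill_ms / 1000.0 + new_tokens / decode_tps


# Qwen2-VL-2B stage figures copied from the table above.
NEW_TOKENS = 128  # hypothetical response length
rk3588 = vlm_latency_s(3.28, 632.6, 16.6, NEW_TOKENS)
rk3576 = vlm_latency_s(3.55, 1234.9, 14.57, NEW_TOKENS)
print(f"Qwen2-VL-2B, {NEW_TOKENS} tokens: RK3588 ~{rk3588:.1f} s, RK3576 ~{rk3576:.1f} s")
```

For Qwen2-VL-2B this gives about 11.6 s on RK3588 and about 13.6 s on RK3576, with decoding accounting for most of the time at this output length.
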