Model Performance Benchmark

  • These performance data were collected with the CPU and NPU of each platform running at their maximum frequencies.
  • The script for setting the frequencies is located in the scripts directory.
  • All models should be converted with optimization_level set to 0 to enable optimized runtime performance.
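
As an illustration of the conversion setting above, here is a minimal sketch of where `optimization_level` would be passed, assuming the RKLLM-Toolkit Python API (`RKLLM`, `load_huggingface`, `build`, `export_rkllm`). The other keyword values, the platform string, and the helper name `convert` are illustrative assumptions, and exact signatures may differ between toolkit versions.

```python
# Keyword arguments for the build step. Only optimization_level=0 comes from
# the note above; the remaining values are illustrative assumptions.
BUILD_KWARGS = {
    "do_quantization": True,
    "optimization_level": 0,      # setting used for these benchmarks
    "quantized_dtype": "w8a8",    # one of the dtypes listed in the tables
    "target_platform": "rk3588",  # assumed platform string
}


def convert(model_dir: str, output_path: str) -> None:
    """Convert a Hugging Face checkpoint to an .rkllm file (hypothetical helper)."""
    # Imported lazily so this module still loads without the toolkit installed.
    from rkllm.api import RKLLM

    llm = RKLLM()
    if llm.load_huggingface(model=model_dir) != 0:
        raise RuntimeError("failed to load model")
    if llm.build(**BUILD_KWARGS) != 0:
        raise RuntimeError("failed to build model")
    if llm.export_rkllm(output_path) != 0:
        raise RuntimeError("failed to export model")
```

Keeping the build arguments in one dictionary makes it easy to confirm that every converted model used the same `optimization_level` before its numbers are compared.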

RK3588

| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT(ms) | Tokens/s | Memory(MB) |
|---|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w8a8 | 128 | 64 | 143.83 | 42.58 | 654.26 |
| MiniCPM4 | 0.5B | w8a8 | 128 | 64 | 128.46 | 45.13 | 524.55 |
| Qwen3 | 0.6B | w8a8 | 128 | 64 | 213.50 | 32.16 | 773.77 |
| TinyLLAMA | 1.1B | w8a8 | 128 | 64 | 239.00 | 24.49 | 1085.21 |
| Qwen2.5 | 1.5B | w8a8 | 128 | 64 | 412.27 | 16.32 | 1659.15 |
| RWKV7 | 1.5B | w8a8 | 128 | 64 | 788.00 | 13.33 | 1450.29 |
| InternLM2 | 1.8B | w8a8 | 128 | 64 | 374.00 | 15.58 | 1765.71 |
| Gemma2 | 2B | w8a8 | 128 | 64 | 679.90 | 9.80 | 2765.30 |
| Gemma3n | 2B | w8a8 | 128 | 64 | 1220.40 | 9.46 | 2709.25 |
| TeleChat2 | 3B | w8a8 | 128 | 64 | 649.60 | 10.22 | 2777.00 |
| Phi3 | 3.8B | w8a8 | 128 | 64 | 1022.00 | 7.50 | 3747.73 |
| MiniCPM3 | 4B | w8a8 | 128 | 64 | 1385.92 | 5.99 | 4339.61 |
| ChatGLM3 | 6B | w8a8 | 128 | 64 | 1395.34 | 4.94 | 5976.43 |
| Qwen3-VL | 2B | w8a8 | 128 | 64 | 391 | 15.12 | 1892.13 |
| DeepSeekOCR | 3B(A570M) | w8a8 | 128 | 64 | 696.21 | 31.81 | 3028.66 |
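
To relate the TTFT and Tokens/s columns to what a user would experience, the sketch below derives an approximate prefill rate and a rough end-to-end latency for the Qwen2 0.5B row on RK3588. It assumes that all 64 new tokens are decoded after the TTFT interval and that the decode rate is constant. Both are simplifications, and the function names are illustrative.

```python
def prefill_rate(seqlen: int, ttft_ms: float) -> float:
    """Approximate prompt-processing speed in tokens/s (Seqlen / TTFT)."""
    return seqlen / (ttft_ms / 1000.0)


def estimated_latency_s(ttft_ms: float, new_tokens: int, tokens_per_s: float) -> float:
    """Rough total time: TTFT plus new_tokens decoded at a constant rate."""
    return ttft_ms / 1000.0 + new_tokens / tokens_per_s


# Qwen2 0.5B, w8a8, RK3588: Seqlen=128, New_tokens=64, TTFT=143.83 ms, 42.58 tokens/s
rate = prefill_rate(128, 143.83)
total = estimated_latency_s(143.83, 64, 42.58)
print(f"prefill: {rate:.1f} tokens/s, estimated total: {total:.2f} s")
```

Under these assumptions the row works out to roughly 890 tokens/s of prefill and about 1.65 s for the whole 128-in, 64-out request.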

RK3576

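| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT(ms) | Tokens/s | Memory(MB) |
|---|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w4a16 | 128 | 64 | 327.72 | 34.24 | 426.24 |
| Qwen2 | 0.5B | w4a16_g128 | 128 | 64 | 363.58 | 33.22 | 445.95 |
| Qwen2 | 0.5B | w8a8 | 128 | 64 | 334.26 | 22.95 | 661.1 |
| MiniCPM4 | 0.5B | w4a16 | 128 | 64 | 348.87 | 35.8 | 322.41 |
| MiniCPM4 | 0.5B | w4a16_g128 | 128 | 64 | 371.96 | 32.88 | 362.23 |
| MiniCPM4 | 0.5B | w8a8 | 128 | 64 | 337.52 | 23.71 | 528.96 |
| Qwen3 | 0.6B | w4a16 | 128 | 64 | 482.82 | 25.16 | 495.99 |
| Qwen3 | 0.6B | w4a16_g128 | 128 | 64 | 512.36 | 24.3 | 528.48 |
| Qwen3 | 0.6B | w8a8 | 128 | 64 | 448.94 | 17.09 | 779.62 |
| TinyLLAMA | 1.1B | w4a16 | 128 | 64 | 517.82 | 21.32 | 591 |
| TinyLLAMA | 1.1B | w4a16_g128 | 128 | 64 | 658.78 | 18.89 | 681 |
| TinyLLAMA | 1.1B | w8a8 | 128 | 64 | 537.82 | 12.63 | 1082.83 |
| RWKV7 | 1.5B | w4a16 | 128 | 64 | 1779.65 | 9.96 | 799.89 |
| RWKV7 | 1.5B | w4a16_g128 | 128 | 64 | 1877.95 | 9.37 | 890.16 |
| RWKV7 | 1.5B | w8a8 | 128 | 64 | 1718.8 | 6.96 | 1458.48 |
| InternLM2 | 1.8B | w4a16 | 128 | 64 | 771.6 | 13.65 | 966.12 |
| InternLM2 | 1.8B | w4a16_g128 | 128 | 64 | 1001.23 | 12.18 | 1061.57 |
| InternLM2 | 1.8B | w8a8 | 128 | 64 | 777.86 | 7.91 | 1773.23 |
| Gemma2 | 2B | w4a16 | 128 | 64 | 1119.51 | 8.45 | 1529.03 |
| Gemma2 | 2B | w4a16_g128 | 128 | 64 | 1407.31 | 7.76 | 1616.45 |
| Gemma2 | 2B | w8a8 | 128 | 64 | 1052.77 | 5.01 | 2771.54 |
| Gemma-3n | 2B | w4a16 | 128 | 64 | 3187 | 7.38 | 1574.34 |
| Gemma-3n | 2B | w8a8 | 128 | 64 | 3229.16 | 4.75 | 2722.76 |
| TeleChat2 | 3B | w4a16 | 128 | 64 | 1143.73 | 9.05 | 1514.98 |
| TeleChat2 | 3B | w4a16_g128 | 128 | 64 | 1422.38 | 7.91 | 1633.54 |
| TeleChat2 | 3B | w8a8 | 128 | 64 | 1035.37 | 5.15 | 2783.73 |
| Phi3 | 3.8B | w4a16 | 128 | 64 | 1800.92 | 6.52 | 1985.75 |
| Phi3 | 3.8B | w4a16_g128 | 128 | 64 | 2236.9 | 5.96 | 2141.89 |
| Phi3 | 3.8B | w8a8 | 128 | 64 | 1591.59 | 3.76 | 3757.22 |
| MiniCPM3 | 4B | w4a16 | 128 | 64 | 2484.63 | 4.94 | 2336.73 |
| MiniCPM3 | 4B | w4a16_g128 | 128 | 64 | 3053.52 | 4.49 | 2618.14 |
| MiniCPM3 | 4B | w8a8 | 128 | 64 | 2509.27 | 3.04 | 4366.85 |
| ChatGLM3 | 6B | w4a16 | 128 | 64 | 2121.26 | 4.7 | 3014.38 |
| ChatGLM3 | 6B | w4a16_g128 | 128 | 64 | 2958.88 | 4.03 | 3244.15 |
| ChatGLM3 | 6B | w8a8 | 128 | 64 | 1920.97 | 2.5 | 5958.65 |
| Qwen3-VL | 2B | w4a16 | 128 | 64 | 791.20 | 12.88 | 1082.65 |
| Qwen3-VL | 2B | w4a16_g128 | 128 | 64 | 1026.31 | 11.62 | 1170.89 |
| Qwen3-VL | 2B | w8a8 | 128 | 64 | 799.09 | 7.67 | 1900.80 |
| DeepSeekOCR | 3B(A570M) | w4a16 | 128 | 64 | 1010.15 | 24.85 | 1756.13 |
| DeepSeekOCR | 3B(A570M) | w8a8 | 128 | 64 | 1312.00 | 16.21 | 3072.33 |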
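
The RK3576 results list several quantization types per model, so a quick way to read them is to compare w4a16 against w8a8 directly. The sketch below does this for two rows copied from the table above. The dictionary layout and the function name are illustrative assumptions, not part of any benchmark tooling.

```python
# (tokens/s, memory in MB) per dtype, copied from the RK3576 table.
RK3576 = {
    "Qwen2 0.5B": {"w4a16": (34.24, 426.24), "w8a8": (22.95, 661.1)},
    "ChatGLM3 6B": {"w4a16": (4.7, 3014.38), "w8a8": (2.5, 5958.65)},
}


def compare(model: str, fast: str = "w4a16", base: str = "w8a8") -> tuple[float, float]:
    """Return (decode speedup, memory ratio) of `fast` relative to `base`."""
    fast_tps, fast_mem = RK3576[model][fast]
    base_tps, base_mem = RK3576[model][base]
    return fast_tps / base_tps, fast_mem / base_mem


for name in RK3576:
    speedup, mem_ratio = compare(name)
    print(f"{name}: {speedup:.2f}x decode speed, {mem_ratio:.0%} of w8a8 memory")
```

For these two rows, w4a16 decodes about 1.5x to 1.9x faster than w8a8 while using roughly half to two thirds of the memory. The table does not report accuracy, so this comparison covers speed and footprint only.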

RK3562

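| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT(ms) | Tokens/s | Memory(MB) |
|---|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w8a8 | 128 | 64 | 650.37 | 12.94 | 632.48 |
| MiniCPM4 | 0.5B | w8a8 | 128 | 64 | 689.88 | 11.78 | 500.54 |
| Qwen3 | 0.6B | w8a8 | 128 | 64 | 901.09 | 10.00 | 756.72 |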

RV1126B

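| Model | Model Size | Dtype | Seqlen | New_tokens | TTFT(ms) | Tokens/s |
|---|---|---|---|---|---|---|
| Qwen2 | 0.5B | w4a16 | 128 | 64 | 650.69 | 21.43 |
| Qwen2 | 0.5B | w4a16_g128 | 128 | 64 | 679.78 | 18.18 |
| Qwen2 | 0.5B | w8a8 | 128 | 64 | 636.90 | 13.91 |
| MiniCPM4 | 0.5B | w4a16 | 128 | 64 | 654.20 | 22.97 |
| MiniCPM4 | 0.5B | w4a16_g128 | 128 | 64 | 691.57 | 18.78 |
| MiniCPM4 | 0.5B | w8a8 | 128 | 64 | 663.41 | 15.12 |
| Qwen3 | 0.6B | w4a16 | 128 | 64 | 955.94 | 15.41 |
| Qwen3 | 0.6B | w4a16_g128 | 128 | 64 | 1019.94 | 12.60 |
| Qwen3 | 0.6B | w8a8 | 128 | 64 | 945.18 | 10.55 |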

Multimodal

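| Model | Stage | RK3588(w8a8) | RK3576(w4a16) |
|---|---|---|---|
| Qwen2-VL-2B | img-encoder(392*392) | 3.28s | 3.55s |
| Qwen2-VL-2B | Prefill(len=196) | 632.6ms | 1234.9ms |
| Qwen2-VL-2B | Decode | 16.6 tokens/s | 14.57 tokens/s |
| Qwen2.5-VL-3B | img-encoder(392*392) | 2.93s | 2.87s |
| Qwen2.5-VL-3B | Prefill(len=196) | 1120ms | 2130ms |
| Qwen2.5-VL-3B | Decode | 8.66 tokens/s | 7.87 tokens/s |
| MiniCPM-V-2_6 | img-encoder(448*448) | 3.27s | 2.4s |
| MiniCPM-V-2_6 | Prefill(len=64) | 826ms | 1230ms |
| MiniCPM-V-2_6 | Decode | 4.18 tokens/s | 3.85 tokens/s |
| SmolVLM-256M | img-encoder(512*512) | 842ms | 768ms |
| SmolVLM-256M | Prefill(len=128) | 77.3ms | 180ms |
| SmolVLM-256M | Decode | 78 tokens/s | 57.73 tokens/s |
| Qwen3-VL-2B | img-encoder(448*448) | 2.08s | 1.61s |
| Qwen3-VL-2B | Prefill(len=196) | 649ms | 1587ms |
| Qwen3-VL-2B | Decode | 14.91 tokens/s | 10.36 tokens/s |
| DeepSeekOCR-3B(A570M) | img-encoder(448*448) | 2.09s | 2.27s |
| DeepSeekOCR-3B(A570M) | Prefill(len=128) | 696ms | 1010ms |
| DeepSeekOCR-3B(A570M) | Decode | 31.8 tokens/s | 22.3 tokens/s |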
  • The img-encoder runs inference on RKNN with FP16, tested using all NPU cores.
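
The three multimodal stages run in sequence for a single image query, so their timings can be combined into a rough end-to-end figure. The sketch below does this for the Qwen2-VL-2B rows. It assumes 64 generated tokens (the multimodal table does not state an output length), a constant decode rate, and no overhead between stages. The class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class VlmTimings:
    encoder_s: float            # image encoder time, seconds
    prefill_ms: float           # prefill time, milliseconds
    decode_tokens_per_s: float  # decode throughput

    def end_to_end_s(self, new_tokens: int) -> float:
        """Encoder + prefill + decode time for one image query, in seconds."""
        return (
            self.encoder_s
            + self.prefill_ms / 1000.0
            + new_tokens / self.decode_tokens_per_s
        )


# Qwen2-VL-2B rows from the multimodal table.
qwen2_vl_rk3588 = VlmTimings(3.28, 632.6, 16.6)
qwen2_vl_rk3576 = VlmTimings(3.55, 1234.9, 14.57)

for name, t in [("RK3588", qwen2_vl_rk3588), ("RK3576", qwen2_vl_rk3576)]:
    print(f"{name}: about {t.end_to_end_s(64):.2f} s for 64 new tokens")
```

With these assumptions the estimate is roughly 7.8 s on RK3588 and 9.2 s on RK3576, with the image encoder accounting for a large share of the total at short output lengths.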