
release-v1.2.0

@hiyhc released this 08 Apr 10:05
· 7 commits to main since this release
  • Supports custom model conversion.
  • Supports chat_template configuration.
  • Supports multi-turn dialogue interactions.
  • Supports automatic prompt cache reuse for improved inference efficiency.
  • Expands the maximum context length to 16K.
  • Supports storing embeddings in flash to reduce memory usage.
  • Introduces the GRQ Int4 quantization algorithm.
  • Supports GPTQ-Int8 model conversion.
  • Adds compatibility with the RK3562 platform.
  • Adds support for visual multimodal models such as InternVL2, Janus, and Qwen2.5-VL.
  • Supports CPU core configuration.
  • Adds support for Gemma3.
  • Adds support for Python 3.9/3.11/3.12.
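To illustrate the prompt-cache reuse mentioned above: when a new request shares a prefix (e.g. the same system prompt) with an earlier one, only the uncached suffix needs to be processed. The sketch below is purely illustrative, assuming a toy cache keyed by token prefixes; the `PromptCache` class and its methods are hypothetical, not this toolkit's actual API.

```python
# Hypothetical sketch of prompt-cache reuse. When a request shares a
# prefix with a previously processed prompt, the cached state for that
# prefix is reused and only the new suffix is computed.
# `PromptCache` and its methods are illustrative, not the real API.

class PromptCache:
    def __init__(self):
        self._cache = {}  # maps token-prefix tuples to a (simulated) KV state

    def process(self, tokens):
        """Return (state, number_of_tokens_actually_computed)."""
        # Find the longest already-cached prefix of `tokens`.
        best = 0
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._cache:
                best = n
                break
        state = list(self._cache.get(tuple(tokens[:best]), []))
        computed = 0
        # "Compute" only the uncached suffix, caching each new prefix.
        for i in range(best, len(tokens)):
            state.append(tokens[i])  # stand-in for real KV-cache entries
            self._cache[tuple(tokens[:i + 1])] = list(state)
            computed += 1
        return state, computed

cache = PromptCache()
system = [1, 2, 3, 4]  # shared system-prompt tokens
_, first = cache.process(system + [10, 11])   # cold: computes all 6 tokens
_, second = cache.process(system + [20, 21])  # warm: reuses the 4-token prefix
```

Here the second request computes only its 2-token suffix, which is the efficiency win the release notes refer to.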