Releases
release-v1.1.0
hiyhc released this on 11 Oct 08:53
- Added support for grouped quantization (w4a16 group sizes of 32/64/128; w8a8 group sizes of 128/256/512).
- Added the GDQ algorithm to improve 4-bit quantization accuracy.
- Added a hybrid quantization algorithm that combines grouped and non-grouped quantization at a specified ratio.
- Added support for the Llama3, Gemma2, and MiniCPM3 models.
- Added support for GGUF model conversion (currently q4_0 and fp16 only).
- Added support for LoRA models.
- Added storage and loading of the prompt cache.
- Added PC-side emulation accuracy testing and an inference interface for rkllm-toolkit.
- Fixed the catastrophic-forgetting issue that occurred when the token count exceeded max_context.
- Optimized prefill speed.
- Optimized generation speed.
- Optimized model initialization time.
- Added support for four input interfaces: prompt, embedding, token, and multimodal.
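Grouped quantization, as in the first item above, assigns one scale per contiguous group of weights (e.g. every 32 values for w4a16) rather than one scale per tensor or per channel, which limits how far an outlier can distort its neighbors. A minimal NumPy sketch of symmetric 4-bit grouped quantization, for illustration only; the function names and the symmetric [-8, 7] integer mapping are assumptions here, not the toolkit's actual implementation:

```python
import numpy as np

def quantize_grouped_w4(weights: np.ndarray, group_size: int = 32):
    """Quantize a 1-D float weight vector to 4-bit integers,
    with one scale per group of `group_size` values (sketch only)."""
    assert weights.size % group_size == 0
    groups = weights.reshape(-1, group_size)
    # Symmetric 4-bit integer range: [-8, 7]; scale chosen so the
    # largest magnitude in each group maps to +/-7.
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid divide-by-zero for all-zero groups
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_grouped(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct the float weights from quantized groups and scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(128).astype(np.float32)
q, s = quantize_grouped_w4(w, group_size=32)
w_hat = dequantize_grouped(q, s)
```

With per-group scales, the worst-case rounding error on any weight is half of its own group's scale, so a smaller group size trades more scale storage for tighter error bounds.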