Motivation
TEN Framework is a powerful conversational voice AI framework. FunASR (16K+ stars) could be a great addition as an STT extension; it is particularly strong in Chinese and multilingual scenarios.
Why FunASR
- OpenAI-compatible API: /v1/audio/transcriptions, easy to integrate as an STT service
- 50+ languages with auto language detection
- Non-autoregressive: SenseVoice decodes in a single forward pass, which lowers latency for conversational AI
- Built-in VAD: FSMN-VAD, no external dependency needed
- Built-in speaker diarization: cam++ model for multi-speaker scenarios
- Emotion detection: SenseVoice detects speech emotions — useful for conversational AI context
- Streaming ASR: WebSocket-based real-time recognition
- Free & self-hosted: No API costs
Integration as TEN Extension
# Start FunASR server
pip install funasr vllm
funasr-server --device cuda # OpenAI-compatible at :8000
Since TEN already supports OpenAI-compatible STT services, FunASR can be used as a drop-in backend. A dedicated TEN extension could provide:
- Real-time streaming via WebSocket
- Emotion metadata for agent reactions
- Speaker identification in multi-party conversations
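To make the drop-in claim concrete, here is a minimal sketch of a client posting audio to the /v1/audio/transcriptions endpoint using only the Python standard library. It assumes the server from the command above is listening on localhost:8000 and that it accepts the same multipart fields (`model`, `file`) and returns the same JSON `text` field as OpenAI's transcription API. The model name `sensevoice` and the helper names are placeholders for illustration, not confirmed FunASR or TEN identifiers.

```python
import io
import json
import uuid
import urllib.request
import wave

# Assumption: FunASR server address, taken from the ":8000" comment above.
BASE_URL = "http://localhost:8000"


def make_silent_wav(seconds: float = 0.5, rate: int = 16000) -> bytes:
    """Return mono 16-bit PCM silence as WAV bytes (placeholder input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def build_transcription_request(
    audio: bytes, model: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a multipart/form-data POST shaped like OpenAI's transcription API."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode() + audio + b"\r\n"
    closing = f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=model_part + file_part + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(audio: bytes, model: str = "sensevoice") -> str:
    """Send the request and return the transcript.

    Assumes the response is JSON with a "text" field, as in OpenAI's API.
    The default model name is a hypothetical placeholder.
    """
    req = build_transcription_request(audio, model)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["text"]
```

With a server running, `transcribe(make_silent_wav())` would exercise the round trip. Anything beyond plain text (streaming, emotion metadata, speaker labels) falls outside this request shape, which is where a dedicated extension would add value.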