Feature request
TEN Framework is for building conversational voice AI agents, where ASR latency directly affects conversation quality. SenseVoice (8.3K+ GitHub stars) could significantly reduce the speech-to-text (STT) bottleneck.
Why SenseVoice for TEN
- Fast inference: the SenseVoice README reports SenseVoice-Small as more than 5x faster than Whisper-Small and 15x faster than Whisper-Large, which means lower latency for more natural voice conversations
- Non-autoregressive: a single forward pass, so latency is predictable and consistent
- Built-in emotion detection: agents can adapt responses to the user's emotional state
- Audio event detection: detects laughter, music, etc. for richer context
- 50+ languages with auto-detection: multilingual agent support out of the box
- VAD through FunASR's FSMN-VAD model: segments long audio and handles silence detection efficiently
Integration
OpenAI-compatible API
pip install funasr
funasr-server --device cuda
# Serves at http://localhost:8000/v1/audio/transcriptions
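On the agent side, a client call could look roughly like the sketch below. Only the endpoint URL comes from the snippet above. The form field names (`file`, `model`) and the `text` key in the JSON response follow the OpenAI transcription API; the model name `SenseVoiceSmall` and the helper names `build_multipart` and `transcribe` are assumptions for illustration, not confirmed server behavior.

```python
import json
import urllib.request
import uuid

# URL taken from the issue text; everything else in this sketch is an assumption.
ENDPOINT = "http://localhost:8000/v1/audio/transcriptions"


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode plain form fields plus one file as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path, model="SenseVoiceSmall", endpoint=ENDPOINT):
    """POST one audio file and return the transcript text."""
    with open(wav_path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart({"model": model}, "file", "audio.wav", audio)
    req = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # Assumes an OpenAI-style response body: {"text": "..."}
        return json.loads(resp.read())["text"]
```

A TEN extension would call `transcribe("utterance.wav")` once per user turn. The standard library is used here only to keep the sketch dependency-free; any OpenAI-compatible client pointed at the same base URL should be equivalent if the server follows that API.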
Python API (for direct integration)
from funasr import AutoModel
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
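result = model.generate(input=audio_chunk)  # audio_chunk: an audio file path or a 16 kHz waveform
# Returns a list of dicts; result[0]["text"] holds the transcript together with language, emotion, and audio-event tags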
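For an agent to act on the emotion and event labels, they need to be separated from the transcript. A minimal sketch, assuming the raw text starts with tags in the form `<|en|><|HAPPY|><|Speech|><|withitn|>` (language, emotion, audio event, ITN flag, in that order); `parse_sensevoice` is a hypothetical helper, not part of FunASR:

```python
import re

# Matches one SenseVoice-style tag such as <|en|> or <|HAPPY|> (format assumed).
TAG = re.compile(r"<\|([^|]*)\|>")


def parse_sensevoice(raw):
    """Split a tagged transcript into metadata fields and clean text."""
    tags = TAG.findall(raw)
    # Assumed tag order: language, emotion, audio event, ITN flag.
    keys = ("language", "emotion", "event", "itn")
    parsed = dict.fromkeys(keys)  # missing tags stay None
    parsed.update(zip(keys, tags))  # extra tags beyond the first four are ignored
    parsed["text"] = TAG.sub("", raw).strip()
    return parsed


example = parse_sensevoice("<|en|><|HAPPY|><|Speech|><|withitn|>Hello there.")
print(example["emotion"], "|", example["text"])
```

With VAD enabled, long audio may come back as several tagged segments joined in one string; this sketch only reads the first tag group. FunASR also ships `rich_transcription_postprocess` in `funasr.utils.postprocess_utils` for turning the tags into readable text.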
Streaming (for real-time agents)
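FunASR also supports streaming for low-latency scenarios, including a WebSocket service. The Python snippet here uses the streaming Paraformer model rather than SenseVoice:
from funasr import AutoModel
model = AutoModel(model="paraformer-zh-streaming")
chunk_size = [0, 10, 5]  # 600 ms chunks, per the FunASR README
cache = {}  # carries model state from one chunk to the next
# Call once per incoming chunk; pass is_final=True on the last one
result = model.generate(input=audio_chunk, cache=cache, is_final=False, chunk_size=chunk_size)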
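To make the chunked flow concrete, here is a sketch of how a buffered waveform could be cut into streaming-sized pieces. It assumes 16 kHz audio and the FunASR README convention that the chunk stride is `chunk_size[1] * 960` samples (600 ms for `[0, 10, 5]`); `iter_chunks` is a hypothetical helper name.

```python
def iter_chunks(samples, chunk_size=(0, 10, 5), samples_per_unit=960):
    """Yield (chunk, is_final) pairs sized for streaming recognition.

    samples: any sliceable sequence of 16 kHz audio samples (list, NumPy array, ...).
    """
    stride = chunk_size[1] * samples_per_unit  # 10 * 960 = 9600 samples = 600 ms
    total = max(1, -(-len(samples) // stride))  # ceiling division, at least one chunk
    for i in range(total):
        yield samples[i * stride:(i + 1) * stride], i == total - 1


# Intended use with the streaming model (left as comments so the sketch runs without funasr):
# cache = {}
# for chunk, is_final in iter_chunks(waveform):
#     res = model.generate(input=chunk, cache=cache, is_final=is_final, chunk_size=[0, 10, 5])

if __name__ == "__main__":
    fake_audio = [0.0] * 20000  # 1.25 s of silence at 16 kHz
    for chunk, is_final in iter_chunks(fake_audio):
        print(len(chunk), is_final)
```

In a live agent the chunks would arrive from the microphone or RTC transport rather than a buffer, so the final-chunk flag would be driven by end-of-turn detection instead.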
References
pip install funasr (1M+ monthly downloads)