feat: Add SenseVoice/FunASR as STT extension — 5x faster than Whisper, emotion detection

## Feature request

TEN Framework is for building conversational voice AI agents — ASR latency directly impacts conversation quality. [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) (8.3K+ stars) could significantly reduce the STT bottleneck.

## Why SenseVoice for TEN

1. **About 5x faster than Whisper-Small** (and about 15x faster than Whisper-Large, per the SenseVoice README): lower latency for more natural voice conversations
2. **Non-autoregressive**: single forward pass, so latency is predictable and consistent
3. **Built-in emotion detection**: agents can adapt responses to the user's emotional state
4. **Audio event detection**: detects laughter, music, etc. for richer context
5. **Multilingual with auto-detection**: the SenseVoice README cites 50+ languages; the open `SenseVoiceSmall` checkpoint used below covers Mandarin, Cantonese, English, Japanese, and Korean
6. **VAD via FSMN-VAD**: FunASR pairs SenseVoice with its FSMN-VAD model (`vad_model="fsmn-vad"`) for efficient silence detection
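
To make the emotion and audio-event points concrete: SenseVoice returns this metadata as inline tags at the start of the transcript. Below is a minimal sketch of splitting them out, assuming a `<|lang|><|EMOTION|><|Event|><|itn|>` prefix layout. `parse_sensevoice` and `Parsed` are hypothetical names for illustration, not part of FunASR, and the sample string in the usage note is made up.

```python
import re
from typing import NamedTuple

class Parsed(NamedTuple):
    language: str
    emotion: str
    event: str
    text: str

# Assumed layout: <|lang|><|EMOTION|><|Event|><|withitn or woitn|>transcript
_TAG = re.compile(r"<\|([^|]*)\|>")

def parse_sensevoice(raw: str) -> Parsed:
    """Split leading <|...|> tags from the transcript (hypothetical helper)."""
    tags = []
    pos = 0
    # Consume tags only while they appear back to back at the start.
    while (m := _TAG.match(raw, pos)) is not None:
        tags.append(m.group(1))
        pos = m.end()
    # Pad so a short or missing tag prefix still yields a result.
    tags += [""] * (3 - len(tags))
    return Parsed(tags[0], tags[1], tags[2], raw[pos:].strip())
```

For example, `parse_sensevoice("<|en|><|HAPPY|><|Speech|><|withitn|>Hello there.")` would give an agent separate `language`, `emotion`, and `event` fields to branch on, alongside the plain text.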

## Integration

### OpenAI-compatible API
```bash
pip install funasr
funasr-server --device cuda
# Serves at http://localhost:8000/v1/audio/transcriptions
```
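
A client for that endpoint could look roughly like the sketch below, which uses only the standard library. The URL comes from the snippet above; the `file` and `model` form fields follow OpenAI's transcription API. Whether the server accepts this exact `model` value is an assumption, and `build_request` is a hypothetical helper.

```python
import urllib.request
import uuid

# Endpoint from the server snippet above.
URL = "http://localhost:8000/v1/audio/transcriptions"

def build_request(audio: bytes, filename: str = "audio.wav",
                  model: str = "iic/SenseVoiceSmall") -> urllib.request.Request:
    """Build a multipart/form-data POST with `model` and `file` fields."""
    boundary = uuid.uuid4().hex
    parts = [
        (f'--{boundary}\r\nContent-Disposition: form-data; name="model"'
         f'\r\n\r\n{model}\r\n').encode(),
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n'
         f'Content-Type: application/octet-stream\r\n\r\n').encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    req = urllib.request.Request(URL, data=b"".join(parts), method="POST")
    req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return req

# To send it (requires the server to be running):
# with urllib.request.urlopen(build_request(open("sample.wav", "rb").read())) as resp:
#     print(resp.read().decode())
```

Because the request shape matches OpenAI's, an existing Whisper-over-HTTP extension could in principle be pointed at this URL with few changes.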

### Python API (for direct integration)
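```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# FSMN-VAD splits long audio into segments before recognition
model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad")
# audio_chunk: a file path or audio samples supplied by the caller
result = model.generate(input=audio_chunk, language="auto", use_itn=True)
# result[0]["text"] carries language, emotion, and audio-event tags inline with the transcript
text = rich_transcription_postprocess(result[0]["text"])
```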

### Streaming (for real-time agents)
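FunASR also supports streaming recognition for low-latency scenarios, both through its WebSocket runtime and through chunked inference in Python. The snippet below shows the Python form with FunASR's streaming Paraformer model (`paraformer-zh-streaming`, Chinese) rather than SenseVoice. `chunk_size` is an argument to `generate`, and `cache` carries state across chunks:
```python
from funasr import AutoModel

chunk_size = [0, 10, 5]  # 600 ms chunks ([0, 8, 4] gives 480 ms)
model = AutoModel(model="paraformer-zh-streaming")
cache = {}  # reused across calls so the model keeps its streaming state
# speech_chunk: the next chunk_size[1] * 960 samples of 16 kHz audio; set is_final=True on the last chunk
result = model.generate(input=speech_chunk, cache=cache, is_final=False, chunk_size=chunk_size)
```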
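
For a real-time agent, the incoming audio has to be cut into fixed-size chunks, with the last one flagged so the model can flush its state. A minimal sketch of that slicing, assuming 16 kHz audio, where the `[0, 10, 5]` setting corresponds to 10 × 960 = 9600 samples (600 ms) per chunk. `iter_chunks` is a hypothetical helper, not a FunASR function.

```python
from typing import Iterator, Sequence, Tuple

def iter_chunks(samples: Sequence, stride: int) -> Iterator[Tuple[Sequence, bool]]:
    """Yield (chunk, is_final) pairs of at most `stride` samples each."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    # Ceiling division; empty input still yields one (empty) final chunk.
    total = max(1, -(-len(samples) // stride))
    for i in range(total):
        yield samples[i * stride:(i + 1) * stride], i == total - 1

chunk_size = [0, 10, 5]
stride = chunk_size[1] * 960  # 9600 samples = 600 ms at 16 kHz
```

Each `(chunk, is_final)` pair would then be passed to `model.generate(input=chunk, cache=cache, is_final=is_final, chunk_size=chunk_size)`, reusing the same `cache` dict for the whole utterance.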

## References

- SenseVoice: https://github.com/FunAudioLLM/SenseVoice
- FunASR: https://github.com/modelscope/FunASR (16K+ stars)
- PyPI: `pip install funasr` (1M+ monthly downloads)
