Feature: Add FunASR/SenseVoice as STT extension #2176

## Motivation

TEN Framework is a powerful conversational voice AI framework. [FunASR](https://github.com/modelscope/FunASR) (16K+ stars) would make a useful STT extension: it is particularly strong in Chinese and multilingual scenarios.

## Why FunASR

- **OpenAI-compatible API**: `/v1/audio/transcriptions` — easy to integrate as an STT service
- **50+ languages** with auto language detection
- **Non-autoregressive**: SenseVoice processes audio in a single forward pass, which lowers latency for conversational AI
- **Built-in VAD**: FSMN-VAD, no external dependency needed
- **Built-in speaker diarization**: cam++ model for multi-speaker scenarios
- **Emotion detection**: SenseVoice detects speech emotions — useful for conversational AI context
- **Streaming ASR**: WebSocket-based real-time recognition
- **Free & self-hosted**: No API costs
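
To make the emotion and language-detection points concrete, here is a minimal sketch of how an extension might split a tagged SenseVoice transcript into text plus metadata. It assumes the raw output prefixes the text with `<|...|>` tags (language, emotion, event type), as SenseVoice raw output typically does; the exact tag set below, and the names `Utterance` and `parse_tagged`, are illustrative assumptions rather than part of FunASR.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches tags of the form <|...|> (assumed tag syntax).
TAG = re.compile(r"<\|([^|]*)\|>")

# Assumed emotion labels; adjust to whatever the deployed model emits.
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL", "DISGUSTED", "SURPRISED"}


@dataclass
class Utterance:
    text: str
    language: Optional[str]
    emotion: Optional[str]


def parse_tagged(raw: str) -> Utterance:
    """Split a tagged transcript into plain text, language, and emotion."""
    tags = TAG.findall(raw)
    emotion = next((t for t in tags if t in EMOTIONS), None)
    # Assumption: the language code is a short lowercase tag such as "zh" or "en".
    language = next((t for t in tags if t.islower() and len(t) <= 3), None)
    text = TAG.sub("", raw).strip()
    return Utterance(text=text, language=language, emotion=emotion)
```

An agent could then branch on `Utterance.emotion` (for example, softening its reply when the user sounds angry) while passing `Utterance.text` downstream unchanged.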

## Integration as TEN Extension

```bash
# Start FunASR server
pip install funasr vllm
funasr-server --device cuda  # OpenAI-compatible at :8000
```
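
With the server running, a request in the OpenAI transcription style could look roughly like the sketch below. It uses only the Python standard library. The port follows the comment in the snippet above, while the `model` value `"sensevoice"` and the helper names are placeholders assumed for illustration, not confirmed FunASR identifiers.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"      # assumed from the ":8000" comment above
ENDPOINT = "/v1/audio/transcriptions"


def build_transcription_request(
    audio: bytes, filename: str, model: str = "sensevoice"  # placeholder model id
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the audio file and model name."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + closing
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def transcribe(path: str) -> str:
    """Send a local audio file and return the "text" field of the JSON reply."""
    with open(path, "rb") as f:
        req = build_transcription_request(f.read(), os.path.basename(path))
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

Calling `transcribe("clip.wav")` requires a live server; `build_transcription_request` can be inspected offline to check the payload.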

Since TEN already supports OpenAI-compatible STT services, FunASR can be used as a drop-in backend. A dedicated TEN extension could provide:
- Real-time streaming via WebSocket
- Emotion metadata for agent reactions
- Speaker identification in multi-party conversations
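
For the streaming case, the client side of such an extension mostly comes down to slicing microphone audio into small frames and sending them as binary WebSocket messages between a start and a stop signal. The sketch below shows only that framing logic, without any network code. The 16 kHz, 16-bit mono format, the 60 ms chunk size, and the `is_speaking` control messages are assumptions for illustration; the real field names depend on the server's protocol.

```python
import json
from typing import Iterator, Union

SAMPLE_RATE = 16_000    # Hz (assumed input format)
BYTES_PER_SAMPLE = 2    # 16-bit mono PCM (assumed)


def pcm_chunks(pcm: bytes, chunk_ms: int = 60) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM, each chunk_ms long (last may be shorter)."""
    size = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def stream_messages(pcm: bytes, chunk_ms: int = 60) -> Iterator[Union[str, bytes]]:
    """Yield the message sequence a client would send over the socket.

    Text messages are JSON control frames; bytes messages are audio.
    """
    # Hypothetical control messages marking the start and end of speech.
    yield json.dumps({"is_speaking": True})
    yield from pcm_chunks(pcm, chunk_ms)
    yield json.dumps({"is_speaking": False})
```

Keeping the framing separate from the transport makes it straightforward to plug into whichever WebSocket client the TEN extension runtime provides.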

## References

- GitHub: https://github.com/modelscope/FunASR
- Streaming ASR: WebSocket endpoint
- Models: SenseVoice (50+ langs), Paraformer (zh/en high accuracy)
