Feature: Add FunASR/SenseVoice as STT extension #2176

@LauraGPT

Motivation

TEN Framework is a powerful conversational voice AI framework. FunASR (16K+ stars) could be a great addition as an STT extension, since it is particularly strong in Chinese and multilingual scenarios.

Why FunASR

  • OpenAI-compatible API: /v1/audio/transcriptions — easy to integrate as an STT service
  • 50+ languages with auto language detection
  • Non-autoregressive: SenseVoice processes audio in a single forward pass, which lowers latency for conversational AI
  • Built-in VAD: FSMN-VAD, no external dependency needed
  • Built-in speaker diarization: cam++ model for multi-speaker scenarios
  • Emotion detection: SenseVoice detects speech emotions — useful for conversational AI context
  • Streaming ASR: WebSocket-based real-time recognition
  • Free & self-hosted: No API costs

Integration as TEN Extension

# Start FunASR server
pip install funasr vllm
funasr-server --device cuda  # OpenAI-compatible at :8000
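
Once the server is up, a client call to the OpenAI-compatible endpoint could look roughly like the sketch below. It uses only the Python standard library. The `file` and `model` form fields and the `text` response key follow the OpenAI transcription API convention and are assumed to be what FunASR's server accepts and returns; the model name `SenseVoiceSmall` and the helper names are placeholders for illustration.

```python
import io
import json
import urllib.request
import uuid
import wave


def make_silent_wav(seconds: float = 0.5, sample_rate: int = 16000) -> bytes:
    """Return a mono 16-bit PCM WAV of silence (placeholder audio only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def build_transcription_request(base_url: str, audio: bytes, model: str,
                                filename: str = "audio.wav") -> urllib.request.Request:
    """Build a multipart/form-data POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # "model" form field (name assumed from the OpenAI API convention)
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # "file" form field carrying the raw audio bytes
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode(),
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url: str, audio: bytes, model: str) -> str:
    """Send the request and return the transcript (assumes a JSON body with "text")."""
    req = build_transcription_request(base_url, audio, model)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["text"]


# Usage, with a server listening on :8000 (model name is a placeholder):
#   print(transcribe("http://localhost:8000", make_silent_wav(), "SenseVoiceSmall"))
```

Building the request separately from sending it keeps the payload inspectable without a running server.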

Since TEN already supports OpenAI-compatible STT services, FunASR can be used as a drop-in backend. A dedicated TEN extension could provide:

  • Real-time streaming via WebSocket
  • Emotion metadata for agent reactions
  • Speaker identification in multi-party conversations
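
As a rough illustration of the emotion-metadata idea, the sketch below splits a tagged transcript into plain text plus language and emotion fields. It assumes SenseVoice-style raw output in which the text is prefixed by tags such as `<|en|><|HAPPY|><|Speech|><|withitn|>`; the exact tag set, the `SttResult` type, and the function name are illustrative assumptions, and how the result maps onto TEN's message types is left open.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches tags of the form <|...|> (assumed SenseVoice-style markup).
TAG_RE = re.compile(r"<\|([^|]*)\|>")

# Emotion labels assumed for illustration; the real set may differ.
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL", "DISGUSTED", "SURPRISED"}


@dataclass
class SttResult:
    """Hypothetical container an extension could forward to the agent."""
    text: str
    language: Optional[str] = None
    emotion: Optional[str] = None


def parse_tagged_transcript(raw: str) -> SttResult:
    """Strip <|...|> tags from raw output and keep language/emotion as metadata."""
    tags = TAG_RE.findall(raw)
    text = TAG_RE.sub("", raw).strip()
    # Treat the first short lowercase tag (e.g. "en", "zh", "yue") as the language.
    language = next((t for t in tags if re.fullmatch(r"[a-z]{2,3}", t)), None)
    # Treat the first tag that is a known emotion label as the emotion.
    emotion = next((t for t in tags if t in EMOTIONS), None)
    return SttResult(text=text, language=language, emotion=emotion)


result = parse_tagged_transcript("<|en|><|HAPPY|><|Speech|><|withitn|>Nice to meet you.")
print(result)
```

An agent could then branch on `result.emotion`, for example softening its reply when the value is `SAD` or `ANGRY`.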
