Related work: FunASR SenseVoice — multi-task speech model with emotion recognition #36

Description

@LauraGPT

Hi! Interesting open-source speech language model.

Wanted to share a related project that might be of interest for comparison or collaboration:

SenseVoice

SenseVoice is a multi-task speech foundation model that handles:

  • ASR (50+ languages)
  • Speech emotion recognition (happy, sad, angry, neutral, etc.)
  • Audio event detection (laughter, applause, music, etc.)

All three tasks run in a single model with very low latency (~70 ms to process 10 s of audio).
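To make the multi-task output concrete, here is a minimal sketch of how a combined result could be split into its parts. It assumes the model emits leading `<|...|>` tags (language, emotion, event type, text-normalization flag) before the transcript; that layout, the `split_tags` helper, and the sample string are illustrative assumptions, not taken from the SenseVoice documentation.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches one leading tag of the assumed form <|...|>.
TAG = re.compile(r"<\|([^|]*)\|>")


@dataclass
class TaggedTranscript:
    tags: List[str]  # tag contents in order of appearance
    text: str        # transcript with the leading tags removed


def split_tags(raw: str) -> TaggedTranscript:
    """Separate leading <|...|> tags from the transcript text (hypothetical helper)."""
    tags = []
    pos = 0
    while True:
        match = TAG.match(raw, pos)
        if match is None:
            break
        tags.append(match.group(1))
        pos = match.end()
    return TaggedTranscript(tags=tags, text=raw[pos:].strip())


# Made-up sample string in the assumed tag layout.
sample = "<|en|><|HAPPY|><|Speech|><|withitn|>Thanks, that works great."
parsed = split_tags(sample)
print(parsed.tags)
print(parsed.text)
```

If the real output format differs, only the `TAG` pattern should need adjusting.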

Comparison areas

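|              | Tada                 | SenseVoice             |
|--------------|----------------------|------------------------|
| Architecture | Speech LM            | Multi-task encoder     |
| Tasks        | Speech understanding | ASR + emotion + events |
| Latency      | -                    | ~70 ms / 10 s          |
| Languages    | -                    | 50+                    |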
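For comparison purposes, the latency figure is easier to read as a real-time factor (processing time divided by audio duration). A rough sketch, using only the ~70 ms / 10 s number quoted above; the function name is just illustrative:

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


# ~70 ms of processing for 10 s of audio, as quoted in this issue.
rtf = real_time_factor(0.070, 10.0)
print(f"RTF = {rtf:.3f}")
print(f"about {1 / rtf:.0f}x faster than real time")
```

An RTF well below 1 means the model processes audio much faster than it plays back; the figure depends on hardware and batch settings, which the issue does not specify.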

Also relevant

Paper: https://arxiv.org/abs/2407.04051
