feat(eot): add audio models AGT-2919#1719
Open
chenghao-mou wants to merge 29 commits into
🦋 Changeset detected
Latest commit: 012da27
The changes in this PR will be included in the next version bump.
8 tasks
ed2e02f to 3e2fb33
add audio eot model and local inference support, deprecating silero and turn detector plugins
…frame

The AudioFrame emitted on START_OF_SPEECH / END_OF_SPEECH sliced off the prefix-padding samples but still reported `samplesPerChannel = speechBufferIndex`, so the frame's metadata claimed more samples than its data contained and downstream consumers (STT, transcription) lost the pre-roll context the buffer machinery is designed to preserve.

Slice from 0 instead so data length matches samplesPerChannel and the prefix-padding pre-roll is delivered, matching the Python original.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
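A minimal sketch of the slicing fix described in this commit message, assuming mono 16-bit audio. `FrameSketch` and `buildSpeechFrame` are hypothetical names used only for illustration; they are not the real `AudioFrame` class or the code in this PR.

```typescript
// Hypothetical stand-in for an audio frame; not the real AudioFrame class.
interface FrameSketch {
  data: Int16Array;
  sampleRate: number;
  channels: number;
  samplesPerChannel: number;
}

// Builds the frame emitted on a speech boundary (mono audio assumed).
// `speechBuffer` holds the prefix-padding pre-roll followed by speech samples;
// `speechBufferIndex` is the number of valid samples written so far.
function buildSpeechFrame(
  speechBuffer: Int16Array,
  speechBufferIndex: number,
  sampleRate: number,
): FrameSketch {
  // Slice from 0 (not from the end of the prefix padding) so the pre-roll is
  // kept and data.length equals the reported samplesPerChannel.
  const data = speechBuffer.slice(0, speechBufferIndex);
  return { data, sampleRate, channels: 1, samplesPerChannel: speechBufferIndex };
}
```

With a 10-sample buffer and `speechBufferIndex = 6`, the returned frame carries 6 samples starting at the first pre-roll sample, so the metadata and the payload agree.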
… to version
Rename the unreleased `inference.AudioTurnDetector` to `inference.TurnDetector`
and replace its `model` constructor option with `version` (`'v1' | 'v1-mini'`).
The `version` is the constructor knob only; the `model` field/getter is kept and
now holds the full model name (`turn-detector-v1` / `turn-detector-v1-mini`),
which telemetry/billing read via `detector.model` (metric `modelName` →
`EOTModelUsage.model` → remote sessions) unchanged.
Mirrors the upstream Python rename. The private base peers are renamed to the
modality-agnostic streaming scheme: `BaseStreamingTurnDetector`,
`BaseStreamingTurnDetectorStream`, `StreamingTurnDetectionTransport`,
`BaseStreamingTurnDetectorCallbacks`, `BaseStreamingTurnDetectorOptions`
(resolving the public-opts `TurnDetectorOptions` collision). Adds
`TurnDetectorVersion`; keeps `TurnDetectorModel` with updated values.
Also folds in in-flight AGT-2520 EOU work: VAD slow-inference guard fix,
`turnDetection: null` opt-out preserved distinctly from `undefined`,
silero `VAD.load()` delegating to `inference.VAD({ model: 'silero' })` for
16 kHz, and a `LocalTransport` cleanup refactor.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
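Two points from this commit message can be sketched briefly: the `version` option mapping to the full `model` name, and `turnDetection: null` staying distinct from `undefined`. The type names and string values come from the message above; `TurnDetectorSketch`, `MODEL_BY_VERSION`, and `resolveTurnDetection` are hypothetical and only illustrate the idea, not the PR's actual implementation.

```typescript
// Type names and values as listed in the commit message.
type TurnDetectorVersion = 'v1' | 'v1-mini';
type TurnDetectorModel = 'turn-detector-v1' | 'turn-detector-v1-mini';

// Hypothetical lookup from the constructor knob to the full model name.
const MODEL_BY_VERSION: Record<TurnDetectorVersion, TurnDetectorModel> = {
  v1: 'turn-detector-v1',
  'v1-mini': 'turn-detector-v1-mini',
};

// Hypothetical stand-in for inference.TurnDetector.
class TurnDetectorSketch {
  // Telemetry and billing would keep reading this field unchanged.
  readonly model: TurnDetectorModel;

  constructor(options: { version: TurnDetectorVersion }) {
    this.model = MODEL_BY_VERSION[options.version];
  }
}

// Assumed semantics: undefined means "not specified, use the default",
// while null means "explicitly opted out" and must not be overwritten.
function resolveTurnDetection<T>(option: T | null | undefined, fallback: T): T | null {
  return option === undefined ? fallback : option;
}
```

Note that `option ?? fallback` would collapse `null` and `undefined` into the same branch, which is the behaviour the opt-out has to avoid.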
Add a copy of the turn detection model license, call it out in the root README alongside the Apache-2.0 license, and annotate it in REUSE.toml to keep the REUSE-3.2 lint green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
main dropped the flat `export *` re-exports (AgentSession, tool, logMetrics) in favor of namespace-only exports, and does not have the 1.5.0 Toolset API (Agent.create / array-style tools). Adapt basic_agent.ts to main's namespace conventions (new voice.Agent, object tools, voice.*/metrics.* prefixes) while preserving the multimodal-EOU session config. Regenerate pnpm-lock.yaml against the rebased package.json set.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
d3ee830 to 7e24939
toubatbrian reviewed Jun 12, 2026
toubatbrian reviewed Jun 12, 2026
toubatbrian reviewed Jun 12, 2026
}

/**
 * Speaking-guard wrapper for the bounce-EOU task, mirroring Python's
Contributor
Should we remove the comment phrasing that references "mirroring Python's", etc.?
toubatbrian reviewed Jun 12, 2026
Comment on lines +583 to +585
// A different stream means a fresh request lifecycle: drop any held
// prediction future and re-arm so the adopting recognition starts its own
// request on the next VAD event.
Contributor
Claude tends to add a lot of inline comments; it would be nice to clean them up and leave only those that are necessary.
Member
Author
I will do another sweep of comments
…dal-EOU

# Conflicts:
# agents/src/inference/utils.ts
# agents/src/voice/agent_activity.ts
# agents/src/voice/audio_recognition.ts
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
add audio eot model and local inference support, deprecating silero and turn detector plugins

## Description

## Changes Made

Adds streaming audio end-of-turn detection. Single user-facing `AudioTurnDetector` that selects between two backends:

- `turn-detector`
- `turn-detector-mini`

On cloud transport error or `predict_end_of_turn` timeout, the session swaps to mini/local for the rest of the stream (sticky per session, one warning per failure mode).

Local failures emit the default `1.0` prediction and retry on the next turn.

A user-set `unlikely_threshold` is scaled multiplicatively against the cloud default so the operating point survives a fallback.

## Pre-Review Checklist

## Testing

- `restaurant_agent.ts` and `realtime_agent.ts` work properly (for major changes)

## Additional Notes
Python PR: livekit/agents#4722
Note to reviewers: Please ensure the pre-review checklist is completed before starting your review.
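For reviewers, a rough sketch of the fallback behaviour described under Changes Made: a sticky cloud-to-local swap, a default `1.0` prediction on local failure, and threshold scaling. Everything here (`createFallbackPredictor`, `scaleThreshold`, `withTimeout`, the `Predictor` type) is a hypothetical illustration, and the scaling formula is one assumed reading of "scaled multiplicatively against the cloud default"; none of it is taken from the PR's code.

```typescript
type Predictor = (audio: Float32Array) => Promise<number>;
type FailureMode = 'transport_error' | 'timeout';

// Assumed reading: keep the user's ratio to the cloud default and apply it
// to the local default. Both defaults are caller-supplied placeholders.
function scaleThreshold(userThreshold: number, cloudDefault: number, localDefault: number): number {
  return (userThreshold / cloudDefault) * localDefault;
}

// Rejects with an Error named 'TimeoutError' if `promise` takes longer than `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      const err = new Error(`timed out after ${ms} ms`);
      err.name = 'TimeoutError';
      reject(err);
    }, ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

function createFallbackPredictor(cloud: Predictor, local: Predictor, timeoutMs: number) {
  let usingLocal = false; // sticky once set
  const warned = new Set<FailureMode>(); // dedupes warnings per failure mode
  const warnings: string[] = [];

  async function predict(audio: Float32Array): Promise<number> {
    if (!usingLocal) {
      try {
        return await withTimeout(cloud(audio), timeoutMs);
      } catch (err) {
        const mode: FailureMode =
          err instanceof Error && err.name === 'TimeoutError' ? 'timeout' : 'transport_error';
        usingLocal = true;
        if (!warned.has(mode)) {
          warned.add(mode);
          warnings.push(`cloud turn detection failed (${mode}); falling back to local`);
        }
      }
    }
    try {
      return await local(audio);
    } catch {
      // Local failure: emit the default prediction; the next call retries local.
      return 1.0;
    }
  }

  return { predict, warnings, isUsingLocal: () => usingLocal };
}
```

After the first cloud failure, later `predict` calls skip the cloud predictor entirely, which is what "sticky per session" would look like in this sketch. The warning set is keyed by failure mode so a repeated failure of the same kind would not log twice.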