automatic STT keyterm detection #6039

Open · longcw wants to merge 10 commits into main from longc/auto-stt-keyterms

longcw (Contributor) commented Jun 10, 2026

Adds automatic keyterm detection to AgentSession, biasing the STT toward the correct spelling of distinctive words (names, companies, products, jargon) as they come up in the conversation.

Overview

  • New keyterm_options on AgentSession: user-defined terms plus a detection config (enabled, llm, turn_interval, max_keyterms, instructions).
  • KeytermDetector runs a background LLM pass per user turn over the recent transcript and maintains the keyterm set behind a confirmation gate: a new term starts as pending and biases the STT only once later transcript evidence confirms it; removal applies only to spellings the user explicitly corrected, and the replacement goes through the same confirmation flow.
  • The default prompt treats USER lines as untrusted STT output and ASSISTANT lines as authoritative spelling, and explicitly rejects misrecognitions: sound-alike variants of already-tracked terms, garbled phrases the assistant never adopts, and fragments from interrupted lines.
  • Detection state is owned by the session so keyterms survive agent handoffs; user-defined terms are shown to the detection LLM as applied but are never modified by it.
  • New STT.update_keyterms() with a keyterms capability flag, implemented for deepgram (v1/v2), assemblyai, google, and livekit inference STT; the fallback and stream adapters forward it.
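The confirmation gate can be pictured as a small state machine. The sketch below is only an illustration, not the KeytermDetector code: the `KeytermState` class, its method names, and the rule that a single later case-insensitive mention confirms a pending term are all assumptions (the real detector asks the LLM to judge the evidence).

```python
from dataclasses import dataclass, field


@dataclass
class KeytermState:
    """Hypothetical sketch of a pending -> confirmed keyterm gate."""

    user_terms: set[str] = field(default_factory=set)  # developer-defined, never modified
    pending: set[str] = field(default_factory=set)  # proposed, not yet biasing the STT
    confirmed: set[str] = field(default_factory=set)  # currently pushed to the STT

    def propose(self, term: str) -> None:
        # New terms start as pending unless they are already tracked.
        if term not in self.user_terms and term not in self.confirmed:
            self.pending.add(term)

    def observe(self, transcript_line: str) -> None:
        # Assumption: one later mention in the transcript confirms a pending term.
        lowered = transcript_line.lower()
        for term in list(self.pending):
            if term.lower() in lowered:
                self.pending.discard(term)
                self.confirmed.add(term)

    def correct(self, wrong: str, right: str) -> None:
        # User-defined terms are never removed by detection.
        if wrong in self.user_terms:
            return
        # Drop the corrected spelling; the replacement re-enters the gate as pending.
        self.confirmed.discard(wrong)
        self.pending.discard(wrong)
        self.propose(right)

    def active(self) -> list[str]:
        # Only user-defined and confirmed terms would be sent to the STT.
        return sorted(self.user_terms | self.confirmed)
```

Keeping pending terms out of `active()` is what prevents a one-off misrecognition from immediately biasing the recognizer toward the wrong spelling.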

longcw added 6 commits June 9, 2026 21:04
Show user-defined terms to the detection LLM as applied so it stops re-proposing them every pass; add misrecognition rules to the default prompt (sound-alike variants of tracked terms, user-only garbled phrases, interrupted-line fragments) and route corrections through the normal confirmation gate.
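A minimal sketch of how the detection input might be rendered so that applied terms are listed up front and USER and ASSISTANT lines are labeled with different trust levels. The function name, the label wording, and the `max_turns` window are assumptions for illustration, not the PR's actual prompt.

```python
def format_detection_input(
    turns: list[tuple[str, str]], applied_terms: list[str], max_turns: int = 6
) -> str:
    """Hypothetical: render the recent transcript for the detection LLM."""
    # Listing applied terms (including user-defined ones) discourages re-proposals.
    applied = ", ".join(applied_terms) or "(none)"
    lines = [f"Applied keyterms (do not re-propose): {applied}"]
    for role, text in turns[-max_turns:]:
        if role == "user":
            # USER lines are raw STT output and may contain misrecognitions.
            lines.append(f"USER (untrusted STT): {text}")
        else:
            # ASSISTANT lines are treated as the authoritative spelling.
            lines.append(f"ASSISTANT: {text}")
    return "\n".join(lines)
```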
devin-ai-integration[bot] left a comment (marked as resolved).

User-defined keyterms were silently dropped unless detection.enabled was set: start() returned before binding the STT, so the initial push and later set_user_keyterms() were no-ops. Bind the STT unconditionally (skipping the push when there are no terms, so sessions without keyterms see no capability warning or reconnect) and gate only the detection setup on enabled.
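The fixed ordering in `start()` can be sketched as follows. `DetectorSketch` and `KeytermSTT` are hypothetical stand-ins; only the method name `set_user_keyterms` comes from the commit message, and its body here is an assumption.

```python
from typing import Optional, Protocol


class KeytermSTT(Protocol):
    def update_keyterms(self, keyterms: list[str]) -> None: ...


class DetectorSketch:
    """Hypothetical sketch of the start() ordering described in the commit."""

    def __init__(self, user_terms: list[str], detection_enabled: bool) -> None:
        self._user_terms = list(user_terms)
        self._enabled = detection_enabled
        self._stt: Optional[KeytermSTT] = None
        self.detection_running = False

    def start(self, stt: KeytermSTT) -> None:
        # Bind the STT unconditionally so user-defined terms are never dropped.
        self._stt = stt
        # Skip the initial push when there is nothing to send, avoiding a
        # needless capability warning or reconnect.
        if self._user_terms:
            stt.update_keyterms(self._user_terms)
        # Only the LLM detection loop is gated on `enabled`.
        if self._enabled:
            self.detection_running = True

    def set_user_keyterms(self, terms: list[str]) -> None:
        self._user_terms = list(terms)
        # Works even with detection disabled, because the STT is already bound.
        if self._stt is not None:
            self._stt.update_keyterms(self._user_terms)
```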
keyterm_options={
    "terms": ["LiveKit"],
    "detection": {"enabled": True, "turn_interval": 1},
},

A member commented:

For conversations where some context or user information is available before the call (e.g., from a patient or customer profile loaded at startup), should we allow extracting keyterms from that context first?

longcw (Contributor, PR author) replied:

Perhaps keep that on the developer side: they can pass context such as the address or user name via keyterm_options={"terms": [...]} once the profile loads.
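On the developer side, that could look like the sketch below. `profile_to_keyterms`, the profile field names, and the sample values are hypothetical; the resulting `keyterm_options` dict has the shape shown in the diff above and would be passed to `AgentSession(..., keyterm_options=...)`.

```python
def profile_to_keyterms(
    profile: dict[str, str], fields: tuple[str, ...] = ("name", "company", "street")
) -> list[str]:
    """Hypothetical helper: collect distinctive strings from a preloaded profile."""
    terms: list[str] = []
    seen: set[str] = set()
    for key in fields:
        value = profile.get(key, "").strip()
        # Skip empty fields and case-insensitive duplicates.
        if value and value.lower() not in seen:
            seen.add(value.lower())
            terms.append(value)
    return terms


# Illustrative profile; a real one would come from the developer's own backend.
profile = {"name": "Siobhan Ni Bhriain", "company": "Acme Therapeutics", "street": ""}
keyterm_options = {
    "terms": profile_to_keyterms(profile),
    "detection": {"enabled": True, "turn_interval": 1},
}
```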

"enabled": False,
"llm": None,
"turn_interval": 1,
"max_keyterms": None,

A member commented:

Some vendors impose a maximum number of keyterms; maybe we should enforce that limit in the extract-for-model function.
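One way to enforce a per-provider cap is sketched below. `cap_keyterms` is hypothetical, and no real vendor limits are encoded; the `limit` value would have to come from each STT plugin. User-defined terms are placed first so that truncation drops detected terms before developer-supplied ones.

```python
from typing import Optional


def cap_keyterms(
    user_terms: list[str], detected: list[str], limit: Optional[int]
) -> list[str]:
    """Hypothetical: merge terms and trim to a provider's maximum count."""
    merged: list[str] = []
    for term in [*user_terms, *detected]:
        # Keep the first occurrence; user-defined terms come first.
        if term not in merged:
            merged.append(term)
    # None means the provider imposes no limit.
    return merged if limit is None else merged[:limit]
```

Because truncation drops from the end, the order in which detected terms are supplied effectively sets their priority.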

devin-ai-integration[bot] left a review: Devin Review found 1 new potential issue (8 additional findings are available in Devin Review).

Comment on lines +553 to +556
def update_keyterms(self, keyterms: list[str]) -> None:
    # Google biases recognition via (phrase, boost) pairs; apply a moderate
    # default boost since the common keyterms API carries no per-term weight.
    self.update_options(keywords=[(term, _DEFAULT_KEYTERM_BOOST) for term in keyterms])

devin-ai-integration[bot] commented Jun 11, 2026:

📝 Info: Google STT _update_keyterms merges user keywords with auto-detected keyterms using a default boost

At livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py:556-570, Google STT's _update_keyterms merges the provider-agnostic keyterms (which carry no per-term weight) with the user's manually tuned keywords list (which has explicit boosts). A default boost of 10.0 (_DEFAULT_KEYTERM_BOOST at line 74) is applied to auto-detected terms. This is a reasonable heuristic but may need tuning: Google accepts boosts roughly in the 0–20 range, and 10.0 is moderate. Users who need different boost values should use the Google-specific keywords parameter directly.
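The merge described in this note can be sketched as below. Only `_DEFAULT_KEYTERM_BOOST = 10.0` comes from the note; `merge_keywords` and the case-insensitive rule that a user-tuned boost wins over the default are assumptions, and the plugin's actual code may differ.

```python
_DEFAULT_KEYTERM_BOOST = 10.0  # default boost value cited in the note


def merge_keywords(
    user_keywords: list[tuple[str, float]], keyterms: list[str]
) -> list[tuple[str, float]]:
    """Sketch: keep explicit user boosts, give detected terms the default boost."""
    merged = list(user_keywords)
    seen = {phrase.lower() for phrase, _ in user_keywords}
    for term in keyterms:
        # A user-tuned boost takes precedence for the same phrase.
        if term.lower() not in seen:
            merged.append((term, _DEFAULT_KEYTERM_BOOST))
            seen.add(term.lower())
    return merged
```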


devin-ai-integration[bot] left a review: Devin Review found 1 new potential issue.

Comment on lines +553 to +556
def _update_keyterms(self, keyterms: list[str]) -> None:
    # Google biases recognition via (phrase, boost) pairs; apply a moderate
    # default boost since the common keyterms API carries no per-term weight.
    self.update_options(keywords=[(term, _DEFAULT_KEYTERM_BOOST) for term in keyterms])

devin-ai-integration[bot] commented Jun 11, 2026:

🚩 Google STT claims keyterms=True even when adaptation would shadow keywords

The Google STT plugin now unconditionally sets keyterms=True in its capabilities (livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py:234). However, _update_keyterms applies terms via update_options(keywords=...), which is shadowed by an existing adaptation config (stt.py:106-109: build_adaptation returns adaptation first and ignores keywords). If a user configures both adaptation and keyterm detection, detected terms are stored but never reach the recognizer. The existing warning at stt.py:512-515 partially covers this, but the keyterms capability claim may lead the keyterm detector to run LLM passes whose results are silently discarded.
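One possible fix is to advertise the capability only when keyword updates can actually reach the recognizer. The sketch below is hypothetical: `GoogleSTTOptionsSketch`, `build_adaptation_source`, and `supports_keyterms` are illustrative names, and the precedence rule simply mirrors what the finding describes for `build_adaptation`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GoogleSTTOptionsSketch:
    """Hypothetical subset of Google STT options relevant to keyterms."""

    keywords: Optional[list[tuple[str, float]]] = None
    adaptation: Optional[Any] = None


def build_adaptation_source(opts: GoogleSTTOptionsSketch) -> str:
    # Mirrors the described precedence: an adaptation config shadows keywords.
    if opts.adaptation is not None:
        return "adaptation"
    if opts.keywords:
        return "keywords"
    return "none"


def supports_keyterms(opts: GoogleSTTOptionsSketch) -> bool:
    # Claim keyterm support only if keyword updates would not be shadowed.
    return opts.adaptation is None
```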


endpoint_url=endpoint_url,
)

def _update_keyterms(self, keyterms: list[str]) -> None:

A member commented:

Only the Flux models support mid-stream keyterm updates (https://developers.deepgram.com/docs/keyterm#dynamic-keyterm-updates-flux-only). Should we disable this for Nova models?
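A model-based gate could look like the sketch below. It assumes Flux model ids start with `flux` (e.g. `flux-general-en`) and Nova ids with `nova` (e.g. `nova-3`); both function names and the returned action strings are hypothetical. For non-Flux models the sketch falls back to applying terms on the next connection or a reconnect, rather than sending an update on the open stream.

```python
def supports_midstream_keyterms(model: str) -> bool:
    """Hypothetical: only Flux models accept keyterm updates on an open stream."""
    return model.lower().startswith("flux")


def plan_keyterm_update(model: str, stream_open: bool) -> str:
    """Hypothetical: choose how to deliver new keyterms for a Deepgram model."""
    if not stream_open:
        # Nothing is connected yet; send keyterms with the next connection.
        return "apply_on_connect"
    if supports_midstream_keyterms(model):
        # Flux: push the update on the live stream.
        return "send_update"
    # Nova and others: a reconnect (or waiting for the next stream) is needed.
    return "reconnect_or_defer"
```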
