- Add `on_low_language_confidence` property to `LanguageDetectionOptions`. Controls behavior when language confidence is below threshold. Either "error" (default) or "fallback". When set to "fallback", the transcription will use the fallback language instead of erroring when confidence is low.
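
The two modes can be pictured with a small local model. This is only a sketch of the behavior described above, not the SDK's or the API's implementation: `DetectionResult`, `DetectionPolicy`, and `resolveLanguage` are hypothetical names, and treating a confidence equal to the threshold as acceptable is an assumption.

```typescript
// The two option values come from the changelog entry; everything else is hypothetical.
type OnLowLanguageConfidence = "error" | "fallback";

interface DetectionResult {
  detectedLanguage: string; // hypothetical field, e.g. "es"
  confidence: number; // 0.0 (low) to 1.0 (high)
}

interface DetectionPolicy {
  threshold: number;
  fallbackLanguage: string;
  onLowLanguageConfidence?: OnLowLanguageConfidence; // treated as "error" when omitted
}

// Returns the language to transcribe with, or throws when confidence is too low
// and the policy is "error".
function resolveLanguage(result: DetectionResult, policy: DetectionPolicy): string {
  if (result.confidence >= policy.threshold) {
    return result.detectedLanguage;
  }
  const mode: OnLowLanguageConfidence = policy.onLowLanguageConfidence ?? "error";
  if (mode === "fallback") {
    return policy.fallbackLanguage;
  }
  throw new Error(
    `Language confidence ${result.confidence} is below threshold ${policy.threshold}`
  );
}
```

With `"fallback"`, a low-confidence detection resolves to the fallback language; with the default, the same input raises an error.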
- Add `multichannel` property to `TranscriptParams`
- Add `multichannel` and `audio_channels` properties to `Transcript`
- Add `channel` property to `TranscriptWord`, `TranscriptUtterance`, `TranscriptSentence`, and `SentimentAnalysisResult`
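
As a rough illustration of what the new `channel` property allows, the sketch below groups words by channel. `WordLike` and `textByChannel` are hypothetical, and the assumption that `channel` is an optional string is not confirmed by this changelog.

```typescript
// Hypothetical minimal shape: only `text` and the new `channel` field are used.
interface WordLike {
  text: string;
  channel?: string | null; // assumed to be a string label such as "1" or "2"
}

// Joins the words of each channel into one string, keyed by channel label.
function textByChannel(words: WordLike[]): Record<string, string> {
  const grouped: Record<string, string[]> = {};
  for (const word of words) {
    const key = word.channel ?? "unknown";
    if (!(key in grouped)) {
      grouped[key] = [];
    }
    grouped[key].push(word.text);
  }
  const result: Record<string, string> = {};
  for (const key of Object.keys(grouped)) {
    result[key] = grouped[key].join(" ");
  }
  return result;
}
```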
- Log a warning when a user tries to use API key authentication in the browser to connect to the real-time Streaming STT API.
- Update dependencies
- Use assembly.ai short URL for sample files
- Add `language_confidence_threshold` to `Transcript`, `TranscriptParams`, and `TranscriptOptionalParams`. The confidence threshold for the automatically detected language. An error will be returned if the language confidence is below this threshold.
- Add `language_confidence` to `Transcript`. The confidence score for the detected language, between 0.0 (low confidence) and 1.0 (high confidence).
Using these new fields, you can determine the confidence of the language detection model (enabled by setting `language_detection` to `true`) and fail the transcript if it doesn't meet your desired threshold.
Learn more about the new automatic language detection model and feature improvements on our blog.
- Change `RealtimeErrorType` from enum to const object.
- Add `RealtimeErrorTypeCodes` which is a union of `RealtimeErrorType` values
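
The general TypeScript pattern behind this change looks roughly like the sketch below. The three entries are an illustrative subset chosen for the example and should not be read as the SDK's actual list; `describeCloseCode` is a hypothetical helper.

```typescript
// A const object instead of an enum. The entries are illustrative only.
const RealtimeErrorType = {
  BadSampleRate: 4000,
  AuthFailed: 4001,
  SessionNotFound: 4004,
} as const;

// Union of the object's values: 4000 | 4001 | 4004 in this sketch.
type RealtimeErrorTypeCodes =
  (typeof RealtimeErrorType)[keyof typeof RealtimeErrorType];

// Hypothetical helper: maps a numeric close code back to its name.
function describeCloseCode(code: number): string {
  const names = Object.keys(RealtimeErrorType) as (keyof typeof RealtimeErrorType)[];
  for (const name of names) {
    if (RealtimeErrorType[name] === code) {
      return name;
    }
  }
  return "Unknown";
}
```

Unlike an enum, the const object is a plain JavaScript value at runtime, and the union type is derived from it rather than declared separately.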
- Remove `conformer-2` from `SpeechModel` union type.
- Remove `conformer-2` deprecation warning
- Add more TSDoc comments for `RealtimeService` documentation
- Add new LeMUR models
- Add `TranscriptWebhookNotification` which is a union of `TranscriptReadyNotification` or `RedactedAudioNotification`
- Add `RedactedAudioNotification` which represents the body of the PII redacted audio webhook notification.
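
A union like this is usually handled by narrowing on a discriminating field. The sketch below declares local stand-ins for the two notification types; the field names and status literals are assumptions made for the example and may not match the SDK's definitions exactly.

```typescript
// Local stand-ins; field names and status values are assumed, not taken from the SDK.
type TranscriptReadyNotification = {
  transcript_id: string;
  status: "completed" | "error";
};
type RedactedAudioNotification = {
  redacted_audio_url: string;
  status: "redacted_audio_ready";
};
type TranscriptWebhookNotification =
  | TranscriptReadyNotification
  | RedactedAudioNotification;

// Narrows the union on `status` and returns a short description of the event.
function handleNotification(body: TranscriptWebhookNotification): string {
  if (body.status === "redacted_audio_ready") {
    return `redacted audio available at ${body.redacted_audio_url}`;
  }
  return `transcript ${body.transcript_id} finished with status ${body.status}`;
}
```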
- You can now retrieve previous LeMUR responses using `client.lemur.getResponse<LemurTask>("YOUR_REQUEST_ID")`.
- LeMUR functions now return `usage` with the number of `input_tokens` and `output_tokens`.
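
One possible use of the `usage` field is to total token counts across several LeMUR responses. The sketch below assumes only the two counters named above; `LemurUsage` and `totalUsage` are hypothetical local names.

```typescript
// Hypothetical local shape holding the two counters named in the changelog.
interface LemurUsage {
  input_tokens: number;
  output_tokens: number;
}

// Sums token counts across several responses' usage objects.
function totalUsage(usages: LemurUsage[]): LemurUsage {
  const total: LemurUsage = { input_tokens: 0, output_tokens: 0 };
  for (const usage of usages) {
    total.input_tokens += usage.input_tokens;
    total.output_tokens += usage.output_tokens;
  }
  return total;
}
```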
- Rename `TranscriptService.redactions` function to `TranscriptService.redactedAudio`.
- Add `TranscriptService.redactedAudioFile` function.
- Add `workerd` export to fix `cache` issue with `fetch` on Cloudflare Workers.
- Fix Rollup exports so `__SDK_VERSION__` is properly replaced with the version of the SDK.
- Add new `PiiPolicy` enum values
- Add an export that only includes the Streaming STT code. You can use the export
  - by importing `assemblyai/streaming`,
  - or by loading the `assemblyai.streaming.umd.js` file, or `assemblyai.streaming.umd.min.js` file in a script-tag.
- Add new `EntityType` enum values
- Add react-native exports that resolve to the browser version of the library.
- Caching is disabled for all HTTP requests made by the SDK
- Accept data-URIs in `client.files.upload(dataUri)`, `client.transcripts.submit(audio: dataUri)`, `client.transcripts.transcribe(audio: dataUri)`.
- Change how the WebSocket libraries are imported for better compatibility across frameworks and runtimes.
  The library no longer relies on an internal `#ws` import, and instead compiles the imports into the dist bundles. Browser builds will use the native `WebSocket`, other builds will use the `ws` package.
- Deprecate `enableExtraSessionInformation` parameter in `CreateRealtimeTranscriberParams` type
- Add `disablePartialTranscripts` parameter to `CreateRealtimeTranscriberParams`
- Add `enableExtraSessionInformation` parameter to `CreateRealtimeTranscriberParams`
- Add `session_information` event to `RealtimeTranscriber.on()`
⚠️ Deprecate `conformer-2` literal for `TranscriptParams.speech_model` property
- Add missing
  - `status` property to `AutoHighlightsResult`
  - `SpeechModel.Best` enum
  - `TranscriptListItem.error` property
- Make `PageDetails.prev_url` nullable
- Rename Realtime to Streaming inside code documentation
- More inline code documentation
- Rename `SubstitutionPolicy` literal "entity_type" to "entity_name"
- Fix the pagination example in "List transcripts" sample on README
- GitHub action to generate API reference
- Generate API reference with Typedoc and host on GitHub Pages
- Add `conformer-2` to `SpeechModel` type
- Change `language_code` field to accept any string
- Move from JSDoc to TSDoc
- Update `ws` to 8.13.0
- Update dev dependencies (no public facing changes)
- Add `audio_url` property to `TranscribeParams` in addition to the `audio` property. You can use one or the other. `audio_url` only accepts a URL string.
- Add `TranscriptReadyNotification` type for the transcript webhook body.
- Update codebase to use TSDoc
- Update README.md with more samples
- Add `RealtimeTranscriber.configureEndUtteranceSilenceThreshold` function
- Add `RealtimeTranscriber.forceEndUtterance` function
- Add `end_utterance_silence_threshold` property to `CreateRealtimeTranscriberParams` and `RealtimeTranscriberParams` types.
- Add `speech_model` field to `TranscriptParams` and add `SpeechModel` type.
- Windows paths passed to `client.transcripts.transcribe` and `client.transcripts.submit` will work as expected.
- Add `answer_format` to `LemurActionItemsParams` type
- Rename `RealtimeService` to `RealtimeTranscriber`, `RealtimeServiceFactory` to `RealtimeTranscriberFactory`, `RealtimeTranscriberFactory.createService()` to `RealtimeTranscriberFactory.transcriber()`. Deprecated aliases are provided for all old types and functions for backwards compatibility.
- Restrict the type for `redact_pii_audio_quality` from `string` to `RedactPiiAudioQuality`, an enum string.
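
A deprecated alias for a renamed class can be expressed in TypeScript roughly as follows. This is a generic sketch of the technique, not the SDK's source: the class body and its `label` method are placeholders.

```typescript
// Placeholder class standing in for the renamed type.
class RealtimeTranscriber {
  label(): string {
    return "transcriber";
  }
}

/**
 * @deprecated Use RealtimeTranscriber instead.
 */
const RealtimeService = RealtimeTranscriber;
/**
 * @deprecated Use RealtimeTranscriber instead.
 */
type RealtimeService = RealtimeTranscriber;

// Old call sites keep compiling, and the instance is the new class.
const legacyInstance: RealtimeService = new RealtimeService();
```

Declaring both a value alias and a type alias keeps `new RealtimeService()` and `let x: RealtimeService` working, while editors surface the deprecation notice from the TSDoc tag.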
- Add `content_safety_confidence` to `TranscriptParams` & `TranscriptOptionalParams`.
- The `RealtimeService` now sends audio as binary instead of a base64-encoded JSON object.
- Add `"anthropic/claude-2-1"` to `LemurModel` type
- Add `encoding` option to the real-time service and factory. `encoding` can be `"pcm_s16le"` or `"pcm_mulaw"`.
  - `"pcm_mulaw"` is a newly supported audio encoding for the real-time service.
- Allow any string into `final_model` for LeMUR requests
- Add `"assemblyai/mistral-7b"` to `LemurModel` type
- Update types with `@example`
- Update types with `Format: uuid` if applicable
- Add `node`, `deno`, `bun`, `browser`, and `workerd` (Cloudflare Workers) exports to package.json. These exports are compatible versions of the SDK, with a few limitations in some cases. For more details, consult the SDK Compatibility document.
- Add `dist/assemblyai.umd.js` and `dist/assemblyai.umd.min.js`. You can reference these script files directly in the browser and the SDK will be available at the global `assemblyai` variable.
- `RealtimeService.sendAudio` accepts audio via type `ArrayBufferLike`.
- Breaking: `RealtimeService.stream` returns a WHATWG Streams Standard stream, instead of a Node stream. In the browser, the native web standard stream will be used.
- `ws` is used as the WebSocket client as before, but in the browser, the native WebSocket client is used.
- Rename Node SDK to JavaScript SDK as the SDK is compatible with more runtimes now.
- Add `client.transcripts.transcribe` function to transcribe an audio file with polling until transcript status is `completed` or `error`. This function takes an `audio` option which can be an audio file URL, path, stream, or buffer.
- Add `client.transcripts.submit` function to queue a transcript. You can use `client.transcripts.waitUntilReady` to poll the transcript returned by `submit`. This function also takes an `audio` option which can be an audio file URL, path, stream, or buffer.
- Deprecated `client.transcripts.create` in favor of `transcribe` and `submit`, to be more consistent with other AssemblyAI SDKs.
- Renamed types
  - Renamed `Parameters` type suffix to `Params` type suffix
  - Renamed `CreateTranscriptParameters` to `TranscriptParams`
  - Renamed `CreateTranscriptOptionalParameters` to `TranscriptOptionalParams`.
  - Added deprecated aliases for the aforementioned types
- Improved type docs
- Add `AssemblyAI.transcripts.waitUntilReady` function to wait until a transcript is ready, meaning `status` is `completed` or `error`.
- Add `chars_per_caption` parameter to `AssemblyAI.transcripts.subtitles` function.
- Add `input_text` property to LeMUR functions. Instead of using `transcript_ids`, you can use `input_text` to provide custom formatted transcripts as input to LeMUR.
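
As one example of a "custom formatted transcript", the sketch below turns speaker-labelled utterances into a single string that could be passed as `input_text`. `UtteranceLike` and `formatInputText` are hypothetical, and the `Speaker X: ...` layout is just one possible format.

```typescript
// Hypothetical minimal utterance shape: a speaker label and the spoken text.
interface UtteranceLike {
  speaker: string;
  text: string;
}

// Builds one line per utterance, prefixed with the speaker label.
function formatInputText(utterances: UtteranceLike[]): string {
  return utterances
    .map((utterance) => `Speaker ${utterance.speaker}: ${utterance.text}`)
    .join("\n");
}

// The resulting string would take the place of `transcript_ids` in a LeMUR request.
const lemurInput = {
  input_text: formatInputText([
    { speaker: "A", text: "How are you?" },
    { speaker: "B", text: "Fine, thanks." },
  ]),
};
```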
- Change default timeout from 3 minutes to infinite (-1). Fixes #17
- Correctly serialize the keywords for `client.transcripts.wordSearch`.
- Use more widely compatible syntax for wildcard exporting types. Fixes #18.
- The SDK uses `fetch` instead of Axios. This removes the Axios dependency. Axios relies on XMLHttpRequest which isn't supported in Cloudflare Workers, Deno, Bun, etc. By using `fetch`, the SDK is now more compatible with the aforementioned runtimes.
- The SDK uses relative imports instead of using path aliases, to make the library transpilable with tsc for consumers. Fixes #14.
- Added `speaker` property to the `TranscriptUtterance` type, and removed `channel` property.
- `AssemblyAI.files.upload` accepts streams and buffers, in addition to a string (path to file).
- Breaking: The module does not have a default export anymore, because of inconsistent functionality across module systems. Instead, use `AssemblyAI` as a named import like this: `import { AssemblyAI } from 'assemblyai'`.
- `AssemblyAI.transcripts.wordSearch` searches for keywords in the transcript.
- `AssemblyAI.lemur.purgeRequestData` deletes data related to your LeMUR request.
- `RealtimeService.stream` creates a writable stream that you can write audio data to instead of using `RealtimeService.sendAudio`.
- The AssemblyAI class would be exported as default named export instead in certain module systems.
Re-implement the Node SDK in TypeScript and add all AssemblyAI APIs.
- Transcript API client
- LeMUR API client
- Real-time transcript client