Add Amazon Polly and ElevenLabs TTS support #108

Open
xermitik wants to merge 1 commit into gexgd0419:master from xermitik:add-polly-elevenlabs-tts

@xermitik

Pull Request: Amazon Polly & ElevenLabs TTS Support

Overview

This PR adds support for two cloud TTS providers — Amazon Polly and ElevenLabs — as online voice sources in NaturalVoiceSAPIAdapter. Both providers are fully integrated into the existing SAPI voice enumeration and synthesis pipeline and are configurable through the installer UI.

Additionally, several bug fixes are included (see below).

Related issues:


New Features

Amazon Polly

Files added:

  • NaturalVoiceSAPIAdapter/AmazonPollyAPI.h / .cpp — HTTP client for the Polly REST API:
    • AWS Signature Version 4 authentication
    • HTTPS via ASIO + OpenSSL (same stack as the existing WebSocket connection pool)
    • MP3 audio output, decoded to PCM via Windows ACM
    • Voice list pagination (GET /v1/voices, iterates NextToken)
    • SSML pre-processing: strips unsupported <prosody> tags before sending
  • Installer/PollyKeyDlg.cpp — installer dialog for entering AWS Access Key ID, Secret Key, region, and engine type
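
The voice list pagination mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `VoicePage`, `FetchPage`, and `ListAllVoices` are hypothetical names, and the HTTP request and JSON parsing are hidden behind a callback so that the loop itself stays visible. It assumes that an empty `NextToken` marks the last page.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one page of the voice list response.
struct VoicePage {
    std::vector<std::string> voiceIds;  // voice IDs found on this page
    std::string nextToken;              // empty when there are no more pages
};

// Hypothetical callback: performs the request for one page.
// An empty token asks for the first page.
using FetchPage = std::function<VoicePage(const std::string& nextToken)>;

// Collects every voice ID by following NextToken until it comes back empty.
std::vector<std::string> ListAllVoices(const FetchPage& fetch)
{
    std::vector<std::string> all;
    std::string token;  // empty on the first request
    do {
        VoicePage page = fetch(token);
        all.insert(all.end(), page.voiceIds.begin(), page.voiceIds.end());
        token = page.nextToken;
    } while (!token.empty());
    return all;
}
```

Passing the fetch step in as a callback also makes the loop easy to exercise with canned pages, without network access.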

Files modified:

  • NaturalVoiceSAPIAdapter/TTSEngine.h / .cpp: InitPollyVoice, SetupPollyEvents; dispatch in SpeakAsync / Stop
  • NaturalVoiceSAPIAdapter/VoiceTokenEnumerator.cpp: EnumPollyVoices fetches voices, filters them by language, and creates SAPI tokens (Polly;Cloud, registry key Polly-{VoiceId}, credentials stored in NaturalVoiceConfig)
  • NaturalVoiceSAPIAdapter/NaturalVoiceSAPIAdapter.vcxproj + .filters — added new source files
  • Installer/Installer.rc — "Enable Amazon Polly online voices" checkbox, "Set Polly keys…" button, IDD_POLLYKEY dialog (bilingual CN/EN)
  • Installer/resource.h — new resource IDs: IDD_POLLYKEY, IDC_CHK_POLLY_VOICES, IDC_SET_POLLY_KEY, IDC_POLLY_ACCESS_KEY, IDC_POLLY_SECRET_KEY, IDC_POLLY_REGION, IDC_POLLY_ENGINE
  • Installer/MainDlg.cpp: UpdateEnableStates / UpdateDisplay / SaveChanges for Polly; handler for IDC_SET_POLLY_KEY
  • Installer/Installer.vcxproj + .filters — added PollyKeyDlg.cpp

Registry keys (HKCU\Software\NaturalVoiceSAPIAdapter\Enumerator):

| Value | Type | Description |
| --- | --- | --- |
| NoPollyVoices | DWORD | Set to 1 to disable Polly voice enumeration |
| PollyAccessKey | STRING | AWS Access Key ID |
| PollySecretKey | STRING | AWS Secret Access Key |
| PollyRegion | STRING | AWS region, e.g. us-east-1 |
| PollyEngine | STRING | neural / standard / long-form / generative |
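
Because PollyEngine is stored as free text, the adapter has to map it onto one of the four listed engines at some point. The sketch below shows one way to do that. The enum and function names are hypothetical, and the matching is deliberately strict (exact, case-sensitive); how the PR handles an unrecognised value is not stated, so the sketch only reports failure and leaves the fallback to the caller.

```cpp
#include <string>

// Hypothetical enum mirroring the four engine values listed in the table.
enum class PollyEngine { Standard, Neural, LongForm, Generative };

// Maps the registry string to an engine. Returns false, leaving `out`
// untouched, when the value is not one of the four known strings.
bool ParsePollyEngine(const std::string& value, PollyEngine& out)
{
    if (value == "standard")   { out = PollyEngine::Standard;   return true; }
    if (value == "neural")     { out = PollyEngine::Neural;     return true; }
    if (value == "long-form")  { out = PollyEngine::LongForm;   return true; }
    if (value == "generative") { out = PollyEngine::Generative; return true; }
    return false;
}
```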

ElevenLabs

Files added:

  • NaturalVoiceSAPIAdapter/ElevenLabsAPI.h / .cpp — HTTP client for the ElevenLabs REST API:
    • Authentication via xi-api-key HTTP header
    • HTTPS via ASIO + OpenSSL
    • PCM 24 kHz 16-bit mono output (output_format=pcm_24000) — no decoder required
    • SSML not supported by ElevenLabs: all XML tags are stripped, plain text is sent (SsmlToPlainText)
    • XML entities emitted by the SSML builder are decoded back to plain text before calling ElevenLabs
    • Structured error parsing: {"detail": {"message": "..."}} and {"detail": "..."}
    • Voice list pagination (GET /v2/voices?page_size=100, loops via next_page_token until has_more=false)
    • Language detection with 3-level fallback: verified_languages[0].locale → labels["language"] (ISO 639-1 → BCP-47 table, ~30 languages) → en-US
  • Installer/ElevenLabsKeyDlg.cpp — installer dialog for entering API key and selecting model; includes a link to https://elevenlabs.io/app/settings/api-keys
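
The SSML-to-plain-text step described above (strip every tag, then turn XML entities back into literal characters) can be illustrated with a single pass over the input. This is a sketch under assumptions, not the PR's SsmlToPlainText: it works on narrow strings, handles only the five predefined XML entities, and assumes the input is well-formed SSML produced by the adapter's own builder. Decoding in the same pass as tag removal means that an escaped `&lt;` in the text is emitted as a literal `<` and is never mistaken for the start of a tag.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Sketch: removes all XML tags and decodes the five predefined XML entities.
// Unknown entities are copied through unchanged.
std::string SsmlToPlainText(const std::string& ssml)
{
    static const struct { const char* entity; char ch; } kEntities[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&quot;", '"'}, {"&apos;", '\''},
    };

    std::string out;
    out.reserve(ssml.size());
    for (std::size_t i = 0; i < ssml.size();) {
        if (ssml[i] == '<') {
            // Skip the whole tag; an unterminated tag drops the rest of the input.
            std::size_t end = ssml.find('>', i);
            if (end == std::string::npos) break;
            i = end + 1;
            continue;
        }
        if (ssml[i] == '&') {
            bool matched = false;
            for (const auto& e : kEntities) {
                std::size_t len = std::strlen(e.entity);
                if (ssml.compare(i, len, e.entity) == 0) {
                    out += e.ch;
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (matched) continue;
        }
        out += ssml[i++];  // ordinary character, or '&' of an unknown entity
    }
    return out;
}
```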

Files modified:

  • NaturalVoiceSAPIAdapter/TTSEngine.h / .cpp: InitElevenLabsVoice, SetupElevenLabsEvents; dispatch in SpeakAsync / Stop
  • NaturalVoiceSAPIAdapter/VoiceTokenEnumerator.cpp: EnumElevenLabsVoices fetches voices and creates SAPI tokens (ElevenLabs;Cloud, registry key ElevenLabs-{VoiceId}, credentials stored in NaturalVoiceConfig)
  • NaturalVoiceSAPIAdapter/NaturalVoiceSAPIAdapter.vcxproj + .filters — added new source files
  • NaturalVoiceSAPIAdapter/pch.h — added #include <cwctype> (required for std::towupper in locale detection)
  • Installer/Installer.rc — "Enable ElevenLabs online voices" checkbox, "Set ElevenLabs key…" button, IDD_ELEVENKEY dialog; IDD_MAIN height extended to accommodate new controls (bilingual CN/EN)
  • Installer/resource.h — new resource IDs: IDD_ELEVENKEY, IDC_CHK_ELEVENLABS_VOICES, IDC_SET_ELEVENLABS_KEY, IDC_ELEVENLABS_LINK, IDC_ELEVENLABS_API_KEY, IDC_ELEVENLABS_MODEL
  • Installer/MainDlg.cpp: UpdateEnableStates / UpdateDisplay / SaveChanges for ElevenLabs; handler for IDC_SET_ELEVENLABS_KEY
  • Installer/Installer.vcxproj + .filters — added ElevenLabsKeyDlg.cpp

Registry keys (HKCU\Software\NaturalVoiceSAPIAdapter\Enumerator):

| Value | Type | Description |
| --- | --- | --- |
| NoElevenLabsVoices | DWORD | Set to 1 to disable ElevenLabs voice enumeration |
| ElevenLabsApiKey | STRING | xi-api-key value |
| ElevenLabsModel | STRING | Model ID, default eleven_multilingual_v2 |

Diagnostics

  • Debug logging reports the selected provider voice/model/engine and received audio byte counts.
  • Trace logging can include provider request bodies and truncated API error/list responses for troubleshooting.
  • Trace-level request bodies may contain the text being synthesized, so trace logging should be enabled only while debugging.
  • API keys are not written into request bodies; Polly uses SigV4 headers and ElevenLabs uses the xi-api-key header.

Bug Fixes

  • NaturalVoiceSAPIAdapter/TTSEngine.cpp — fixed invalid SSML sent to Azure when the caller (e.g. .NET System.Speech) wraps its input in a <speak> root element: SAPI forwards such tags via SPVA_ParseUnknownTag when it does not recognise the namespace/version attributes, which previously caused a nested <speak> to appear in the SSML payload. The fix detects and skips any <speak> tag in the unknown-tag path before appending it to the SSML being built.

  • NaturalVoiceSAPIAdapter/Mp3Decoder.cpp — set minimum ACM stream buffer to 16 384 bytes; fixes a crash when Polly returns a short MP3 clip

  • NaturalVoiceSAPIAdapter/TaskScheduler.h — thread-safe one-time initialization via std::call_once; task tracking by TaskHandle instead of raw pointer

  • NaturalVoiceSAPIAdapter/WSConnectionPool.cpp — moved connectionChanged.notify_all() before RemoveConnection() in close/error handlers; eliminates a race condition on connection teardown

  • Installer/MainDlg.cpp — Azure and Polly checkboxes now require all mandatory credentials before showing those providers as enabled
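
The nested `<speak>` fix in the first item above comes down to recognising a `<speak>` wrapper in the unknown-tag path and not copying it into the output. The sketch below shows that check in isolation. `IsSpeakTag` and `AppendUnknownTag` are hypothetical names, and the tag text is passed as a plain `std::wstring` rather than as a SAPI text fragment, so this shows the idea rather than the code in TTSEngine.cpp.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Returns true if `tag` is an opening or closing <speak> element,
// e.g. L"<speak version=\"1.0\">" or L"</speak>".
bool IsSpeakTag(const std::wstring& tag)
{
    std::size_t i = 0;
    if (tag.empty() || tag[i] != L'<') return false;
    ++i;
    if (i < tag.size() && tag[i] == L'/') ++i;  // closing tag

    static const std::wstring name = L"speak";
    if (tag.compare(i, name.size(), name) != 0) return false;
    i += name.size();

    // The element name must end here, so that <speaker> does not match.
    if (i == tag.size()) return true;
    wchar_t c = tag[i];
    return c == L'>' || c == L'/' || std::iswspace(c) != 0;
}

// Appends an unknown tag to the SSML being built, unless it is a <speak>
// wrapper that would end up nested inside the adapter's own root element.
void AppendUnknownTag(std::wstring& ssml, const std::wstring& tag)
{
    if (IsSpeakTag(tag)) return;
    ssml += tag;
}
```

The comparison is case-sensitive because XML element names are case-sensitive.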

Development

Successfully merging this pull request may close these issues.

Elevenlabs NaturalVoice SapiAdapter
