Baseten is Canopy Labs' preferred inference provider for running Orpheus TTS in production.
To deploy the model, go to https://www.baseten.co/library/orpheus-tts/ and use the one-click deploy option.
Baseten supports both fp8 (default for performance) and fp16 (full fidelity) versions of Orpheus.
If you want to customize the model serving code, you can instead deploy the prepackaged model from Baseten's example repository.
The call_orpheus.py file contains sample inference code for running the Orpheus TTS model with multiple parallel requests.
Prerequisites:
- Paste the `model_id` from your deployed model into the `call_orpheus.py` script.
- Save your `BASETEN_API_KEY` as an environment variable.
Then, you can call the model with `python call_orpheus.py`.
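If you prefer to write your own client instead of using `call_orpheus.py`, the sketch below shows one way to send several requests in parallel to a Baseten-deployed model. It uses Baseten's standard production endpoint URL pattern (`https://model-{model_id}.api.baseten.co/production/predict`) and `Api-Key` authorization header; the request payload shape (`{"prompt": ...}`) and the `MODEL_ID` placeholder are assumptions — check your deployment's input schema before using this.

```python
import json
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Placeholder: replace with the model_id from your Baseten deployment.
MODEL_ID = "abcd1234"


def build_request(model_id: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build a POST request against Baseten's production predict endpoint."""
    url = f"https://model-{model_id}.api.baseten.co/production/predict"
    # Payload shape is an assumption; your deployed model's schema may differ.
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize(prompt: str) -> bytes:
    """Send one synthesis request and return the raw response body."""
    req = build_request(MODEL_ID, os.environ["BASETEN_API_KEY"], prompt)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    prompts = ["Hello there!", "Testing parallel requests.", "Goodbye."]
    # Issue the requests concurrently from a small thread pool.
    with ThreadPoolExecutor(max_workers=3) as pool:
        for audio in pool.map(synthesize, prompts):
            print(f"received {len(audio)} bytes")
```

A thread pool is sufficient here because each request spends nearly all of its time waiting on the network, so threads give real concurrency without async machinery.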