First, thank you for open-sourcing this project.
Issue
I'd like to suggest an improvement to README.md. The current README provides a great quickstart for text-to-speech generation, but it doesn't mention how to adjust the advanced acoustic parameters (such as noise_temperature, acoustic_cfg_scale, or num_flow_matching_steps) via the Python API.
Users who see these settings in the Gradio demo (or in blog posts) might try passing them directly as kwargs to model.generate(), which raises a TypeError (unexpected keyword argument).
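To illustrate the failure mode without loading the model, here is a minimal sketch. Both `InferenceOptions` and `generate()` below are hypothetical stand-ins, not the library's code: the field names are the ones mentioned in this issue, while the default values and the function signature are assumptions made only for the demonstration.

```python
from dataclasses import dataclass


@dataclass
class InferenceOptions:
    # Stand-in only: field names come from this issue, default values are made up.
    noise_temperature: float = 1.0
    acoustic_cfg_scale: float = 1.0
    num_flow_matching_steps: int = 10


def generate(prompt, text, inference_options=None):
    # Hypothetical signature: tuning parameters are accepted only via the options object.
    options = inference_options or InferenceOptions()
    return f"{text} (steps={options.num_flow_matching_steps})"


# Passing a tuning parameter directly as a keyword argument fails.
try:
    generate(prompt=None, text="Hello", noise_temperature=0.95)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Wrapping the same kind of parameter in the options object works.
result = generate(
    prompt=None,
    text="Hello",
    inference_options=InferenceOptions(num_flow_matching_steps=15),
)
```

Any function that does not declare `**kwargs` behaves this way, which is why the options object is easy to miss if the README never mentions it.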
Suggested Solution
It would be incredibly helpful to add a short section to the README (perhaps under "Run Inference") demonstrating how to import and use the InferenceOptions dataclass to customize the voice generation.
Example snippet that could be added:
import torch
import torchaudio
from tada.modules.encoder import Encoder
from tada.modules.tada import TadaForCausalLM, InferenceOptions

device = "cuda"

# ... (load encoder, model, and prompt as usual) ...

# Configure human voice dynamics and flow matching steps
custom_options = InferenceOptions(
    noise_temperature=0.95,      # Higher = more emotional/varied micro-expressions
    acoustic_cfg_scale=2.5,      # Adherence to the reference voice tone
    duration_cfg_scale=1.5,      # Adherence to the pacing/rhythm of the reference
    num_flow_matching_steps=15,  # Higher = better audio fidelity
)

output = model.generate(
    prompt=prompt,
    text="Your text here.",
    inference_options=custom_options,
)
I ran into this while trying to make the voice output sound less rigid and more natural. Discovering the InferenceOptions object solved my issue, so surfacing it in the docs should help other developers get the most out of the model.