TTS (Text-to-Speech)
This package handles speech synthesis for PicoClaw.
If you are new to TTS setup, the simplest workflow is:
- Add a TTS-capable entry to model_list.
- Point voice.tts_model_name at that entry.
- Put the API key in .security.yml.
Quick Recommendation
For most users, these are the best starting points:
| Provider | Why start here |
| --- | --- |
| OpenAI | Best-supported path in PicoClaw today. The current TTS implementation is built around the OpenAI-compatible /audio/speech API shape, and OpenAI is the safest default. |
| Xiaomi MiMo | A good second option if you want an OpenAI-compatible provider endpoint and are already using MiMo models in the rest of your stack. |
How TTS Configuration Works
PicoClaw does not keep TTS API keys inside voice.
Instead:
- voice.tts_model_name selects a named entry from model_list.
- That model_list entry provides the provider, model ID, API base, and proxy settings.
- .security.yml stores the API key for the same named model entry.
This is the recommended and supported configuration pattern.
Recommended Setup
Option A: OpenAI
config.json
{
  "voice": {
    "tts_model_name": "openai-tts"
  },
  "model_list": [
    {
      "model_name": "openai-tts",
      "model": "openai/tts-1"
    }
  ]
}
.security.yml
model_list:
  openai-tts:
    api_keys:
      - "sk-openai-your-key"
Option B: Xiaomi MiMo
config.json
{
  "voice": {
    "tts_model_name": "mimo-tts"
  },
  "model_list": [
    {
      "model_name": "mimo-tts",
      "model": "mimo/mimo-v2-tts"
    }
  ]
}
.security.yml
model_list:
  mimo-tts:
    api_keys:
      - "your-mimo-key"
If you use a custom MiMo endpoint, you can also set api_base explicitly on the model_list entry. Otherwise, PicoClaw uses the provider default.
What PicoClaw Sends Today
The current TTS runtime uses an OpenAI-compatible speech request with these defaults:
- Endpoint: /audio/speech
- Response format: opus
- Voice: alloy
- Model: taken from the selected model_list entry
That means:
- openai/tts-1 works naturally.
- Other OpenAI-compatible providers can work if they accept the same request format.
- PicoClaw currently does not expose a user-facing config field for changing the TTS voice from alloy.
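To make the defaults above concrete, here is a minimal sketch of the request body an OpenAI-compatible /audio/speech call would carry. The field names (model, input, voice, response_format) follow the OpenAI speech API. The helper name build_speech_request is hypothetical, and stripping the provider prefix from the model string is an assumption, not confirmed PicoClaw behavior.

```python
import json

# Defaults described in this section.
DEFAULT_VOICE = "alloy"
DEFAULT_RESPONSE_FORMAT = "opus"


def build_speech_request(model: str, text: str) -> dict:
    """Build a JSON body for an OpenAI-compatible /audio/speech request.

    Assumption: the provider prefix (e.g. "openai/") is dropped before the
    model ID is sent. PicoClaw's actual handling may differ.
    """
    model_id = model.split("/", 1)[1] if "/" in model else model
    return {
        "model": model_id,
        "input": text,
        "voice": DEFAULT_VOICE,
        "response_format": DEFAULT_RESPONSE_FORMAT,
    }


if __name__ == "__main__":
    body = build_speech_request("openai/tts-1", "Hello from PicoClaw")
    print(json.dumps(body, indent=2))
```

A provider that rejects any of these four fields, or that does not know the voice name alloy, is unlikely to work without changes.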
How PicoClaw Chooses a TTS Provider
DetectTTS resolves TTS in this order:
- Preferred path: resolve voice.tts_model_name against model_list.
- If a matching model entry exists and has an API key, PicoClaw creates an OpenAI-compatible TTS provider using that model's settings.
- Fallback path: if voice.tts_model_name is not set or cannot be resolved, PicoClaw scans model_list for the first entry whose model string contains tts and has an API key.
Fallback scanning exists for compatibility. New configs should set voice.tts_model_name explicitly.
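The resolution order can be sketched roughly as follows. This is an illustration of the described behavior, not PicoClaw's DetectTTS code: the function name detect_tts and the api_keys mapping (model name to list of keys, standing in for the parsed .security.yml) are hypothetical. It also assumes that a named entry without an API key counts as unresolved, and that the tts match is a case-sensitive substring check.

```python
from typing import Optional


def detect_tts(config: dict, api_keys: dict) -> Optional[dict]:
    """Return the model_list entry to use for TTS, or None if none qualifies.

    config   -- parsed config.json
    api_keys -- hypothetical mapping of model_name -> list of API keys
    """
    entries = config.get("model_list", [])

    def has_key(entry: dict) -> bool:
        return bool(api_keys.get(entry.get("model_name"), []))

    # Preferred path: resolve voice.tts_model_name against model_list.
    wanted = config.get("voice", {}).get("tts_model_name")
    if wanted:
        for entry in entries:
            if entry.get("model_name") == wanted and has_key(entry):
                return entry

    # Fallback path: first entry whose model string contains "tts"
    # and that has an API key.
    for entry in entries:
        if "tts" in entry.get("model", "") and has_key(entry):
            return entry
    return None
```

Because the fallback picks the first matching entry, its result depends on the order of model_list, which is one more reason to set voice.tts_model_name explicitly.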
Notes About API Base Handling
PicoClaw normalizes the configured base URL for TTS:
- For OpenAI, a base like https://api.openai.com or https://api.openai.com/v1 becomes https://api.openai.com/v1/audio/speech.
- For other OpenAI-compatible providers, PicoClaw preserves the configured base path and ensures it ends with /audio/speech.
- If api_base is omitted, PicoClaw uses the provider default base when the model prefix is known.
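The first two rules can be approximated with a short sketch. The function name normalize_tts_url is hypothetical, and detecting OpenAI by the host api.openai.com is an assumption made for illustration. PicoClaw may identify the provider differently, for example by model prefix.

```python
from urllib.parse import urlsplit, urlunsplit

SPEECH_PATH = "/audio/speech"


def normalize_tts_url(api_base: str) -> str:
    """Turn a configured API base into a full speech endpoint URL."""
    parts = urlsplit(api_base.strip())
    path = parts.path.rstrip("/")

    # OpenAI: make sure the /v1 prefix is present (assumed host check).
    if parts.netloc == "api.openai.com" and not path.startswith("/v1"):
        path = "/v1" + path

    # All providers: keep the configured path and append /audio/speech
    # only if it is not already there.
    if not path.endswith(SPEECH_PATH):
        path += SPEECH_PATH

    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for base in ("https://api.openai.com", "https://api.openai.com/v1"):
        print(base, "->", normalize_tts_url(base))
```

Under these assumptions, both OpenAI bases print https://api.openai.com/v1/audio/speech, and a base that already ends in /audio/speech is returned unchanged.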
Common Mistakes
- Setting voice.tts_model_name to a name that does not exist in model_list.
- Adding a TTS model but forgetting to put its API key in .security.yml.
- Assuming PicoClaw will automatically use provider-specific custom voices.
- Using a provider endpoint that is not compatible with the OpenAI /audio/speech request format.
Minimal Checklist
Before testing send_tts, make sure:
- voice.tts_model_name matches a model_list[].model_name.
- The matching .security.yml entry contains a valid API key.
- The chosen provider supports an OpenAI-compatible speech synthesis endpoint.
- Your selected model is actually a TTS-capable model.
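The first two checklist items can be checked offline with a small script. The sketch below is not part of PicoClaw. It assumes both files have already been parsed into dictionaries (config.json with json.load, .security.yml with any YAML parser) and that .security.yml has the shape shown in the setup examples. The function name check_tts_config is hypothetical. Whether a key is actually valid, and whether the provider and model support speech synthesis, can only be confirmed by a real request.

```python
def check_tts_config(config: dict, security: dict) -> list:
    """Return a list of human-readable problems; an empty list means the
    name and key wiring looks consistent."""
    name = (config.get("voice") or {}).get("tts_model_name")
    if not name:
        return ["voice.tts_model_name is not set"]

    problems = []

    # Checklist item 1: the name must match a model_list[].model_name.
    entries = config.get("model_list") or []
    if not any(entry.get("model_name") == name for entry in entries):
        problems.append(
            f"voice.tts_model_name {name!r} does not match any model_list[].model_name"
        )

    # Checklist item 2: .security.yml must hold a non-empty key for that name.
    keys = ((security.get("model_list") or {}).get(name) or {}).get("api_keys") or []
    if not any(isinstance(k, str) and k.strip() for k in keys):
        problems.append(f".security.yml has no API key for {name!r}")

    return problems


if __name__ == "__main__":
    config = {
        "voice": {"tts_model_name": "openai-tts"},
        "model_list": [{"model_name": "openai-tts", "model": "openai/tts-1"}],
    }
    security = {"model_list": {"openai-tts": {"api_keys": ["sk-openai-your-key"]}}}
    print(check_tts_config(config, security) or "basic wiring looks fine")
```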