tts

package

Version: v0.2.6 Published: Apr 8, 2026 License: MIT Imports: 17 Imported by: 0

Repository

github.com/sipeed/picoclaw

README

TTS (Text-to-Speech)

This package handles speech synthesis for PicoClaw.

If you are new to TTS setup, the simplest workflow is:

1. Add a TTS-capable entry to model_list.
2. Point voice.tts_model_name at that entry.
3. Put the API key in .security.yml.

Quick Recommendation

For most users, these are the best starting points:

Provider	Why start here
OpenAI	Best-supported path in PicoClaw today. The current TTS implementation is built around the OpenAI-compatible `/audio/speech` API shape, and OpenAI is the safest default.
Xiaomi MiMo	A good second option if you want an OpenAI-compatible provider endpoint and are already using MiMo models in the rest of your stack.

How TTS Configuration Works

PicoClaw does not keep TTS API keys inside the voice section of config.json.

Instead:

voice.tts_model_name selects a named entry from model_list.
That model_list entry provides the provider, model ID, API base, and proxy settings.
.security.yml stores the API key for the same named model entry.

This is the recommended and supported configuration pattern.
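The lookup described above can be sketched in Go as follows. This is a minimal illustration, not PicoClaw's actual code: the ModelEntry and Secrets types and the ResolveTTS function are hypothetical names that mirror the config.json and .security.yml shapes shown in this README.

```go
package main

import (
	"errors"
	"fmt"
)

// ModelEntry mirrors one model_list item from config.json.
// Hypothetical type for illustration; not PicoClaw's real config struct.
type ModelEntry struct {
	ModelName string // "model_name", the name that voice.tts_model_name refers to
	Model     string // "model", e.g. "openai/tts-1"
	APIBase   string // optional "api_base"
}

// Secrets mirrors the model_list section of .security.yml:
// model name -> list of API keys. Hypothetical type.
type Secrets map[string][]string

// ResolveTTS finds the model_list entry named by ttsModelName and pairs it
// with the first API key stored under the same name.
func ResolveTTS(ttsModelName string, models []ModelEntry, secrets Secrets) (ModelEntry, string, error) {
	if ttsModelName == "" {
		return ModelEntry{}, "", errors.New("voice.tts_model_name is not set")
	}
	for _, m := range models {
		if m.ModelName != ttsModelName {
			continue
		}
		keys := secrets[m.ModelName]
		if len(keys) == 0 || keys[0] == "" {
			return ModelEntry{}, "", fmt.Errorf("no API key for %q in .security.yml", m.ModelName)
		}
		return m, keys[0], nil
	}
	return ModelEntry{}, "", fmt.Errorf("%q not found in model_list", ttsModelName)
}

func main() {
	models := []ModelEntry{{ModelName: "openai-tts", Model: "openai/tts-1"}}
	secrets := Secrets{"openai-tts": {"sk-openai-your-key"}}

	entry, key, err := ResolveTTS("openai-tts", models, secrets)
	fmt.Println(entry.Model, key != "", err)
}
```

The point of the sketch is that the same name ties all three pieces together: voice.tts_model_name, model_list[].model_name, and the key under model_list in .security.yml.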

Recommended Setup

Option A: OpenAI

config.json

{
  "voice": {
    "tts_model_name": "openai-tts"
  },
  "model_list": [
    {
      "model_name": "openai-tts",
      "model": "openai/tts-1"
    }
  ]
}

.security.yml

model_list:
  openai-tts:
    api_keys:
      - "sk-openai-your-key"

Option B: Xiaomi MiMo

config.json

{
  "voice": {
    "tts_model_name": "mimo-tts"
  },
  "model_list": [
    {
      "model_name": "mimo-tts",
      "model": "mimo/mimo-v2-tts"
    }
  ]
}

.security.yml

model_list:
  mimo-tts:
    api_keys:
      - "your-mimo-key"

If you use a custom MiMo endpoint, you can also set api_base explicitly. Otherwise PicoClaw will use the provider default.

What PicoClaw Sends Today

The current TTS runtime uses an OpenAI-compatible speech request with these defaults:

Endpoint: /audio/speech
Response format: opus
Voice: alloy
Model: taken from the selected model_list entry

That means:

openai/tts-1 works without any extra configuration.
Other OpenAI-compatible providers can work if they accept the same request format.
PicoClaw currently does not expose a user-facing config field for changing the TTS voice from alloy.
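For reference, a request with these defaults might look like the Go sketch below. The JSON field names (model, input, voice, response_format) follow the public OpenAI speech API. The function names are hypothetical, and passing the model ID without its provider prefix (tts-1 rather than openai/tts-1) is an assumption about how PicoClaw maps the config value, not something this README states.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// speechRequest is the JSON body of an OpenAI-compatible /audio/speech call.
type speechRequest struct {
	Model          string `json:"model"`
	Input          string `json:"input"`
	Voice          string `json:"voice"`
	ResponseFormat string `json:"response_format"`
}

// SpeechBody encodes the request body using the defaults listed above:
// voice "alloy" and response format "opus".
func SpeechBody(model, text string) ([]byte, error) {
	return json.Marshal(speechRequest{
		Model:          model,
		Input:          text,
		Voice:          "alloy",
		ResponseFormat: "opus",
	})
}

// NewSpeechRequest builds, but does not send, the HTTP request.
// endpoint is expected to already end in /audio/speech.
func NewSpeechRequest(ctx context.Context, endpoint, apiKey, model, text string) (*http.Request, error) {
	body, err := SpeechBody(model, text)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := NewSpeechRequest(
		context.Background(),
		"https://api.openai.com/v1/audio/speech",
		"sk-openai-your-key",
		"tts-1", // assumption: provider prefix already stripped
		"Hello from PicoClaw",
	)
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```

A provider that rejects any of these four fields, or that expects a different voice name, will likely fail even if its URL looks OpenAI-compatible.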

How PicoClaw Chooses a TTS Provider

DetectTTS resolves TTS in this order:

1. Preferred path: resolve voice.tts_model_name against model_list. If a matching model entry exists and has an API key, PicoClaw creates an OpenAI-compatible TTS provider using that model's settings.
2. Fallback path: if voice.tts_model_name is not set or cannot be resolved, PicoClaw scans model_list and picks the first entry whose model string contains tts and which has an API key.

Fallback scanning exists for compatibility. New configs should set voice.tts_model_name explicitly.
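The resolution order can be sketched as a small Go function. This is an illustration under stated assumptions rather than the body of DetectTTS: ModelEntry and PickTTSModel are hypothetical names, the substring match is written as case-sensitive, and a named entry without an API key is treated as unresolved so that the fallback scan runs.

```go
package main

import (
	"fmt"
	"strings"
)

// ModelEntry mirrors one model_list item (hypothetical type).
type ModelEntry struct {
	ModelName string // "model_name"
	Model     string // "model", e.g. "mimo/mimo-v2-tts"
}

// PickTTSModel applies the two-step order described above.
// keys maps a model name to its API key ("" or absent means no key).
func PickTTSModel(ttsModelName string, models []ModelEntry, keys map[string]string) (ModelEntry, bool) {
	// Preferred path: the explicitly named entry, if it exists and has a key.
	if ttsModelName != "" {
		for _, m := range models {
			if m.ModelName == ttsModelName && keys[m.ModelName] != "" {
				return m, true
			}
		}
	}
	// Fallback path: first entry whose model string contains "tts" and has a key.
	// Assumption: the match is case-sensitive.
	for _, m := range models {
		if strings.Contains(m.Model, "tts") && keys[m.ModelName] != "" {
			return m, true
		}
	}
	return ModelEntry{}, false
}

func main() {
	models := []ModelEntry{
		{ModelName: "chat", Model: "openai/gpt-4o"},
		{ModelName: "openai-tts", Model: "openai/tts-1"},
		{ModelName: "mimo-tts", Model: "mimo/mimo-v2-tts"},
	}
	keys := map[string]string{"openai-tts": "key-a", "mimo-tts": "key-b"}

	explicit, _ := PickTTSModel("mimo-tts", models, keys)
	fallback, _ := PickTTSModel("", models, keys)
	fmt.Println(explicit.ModelName, fallback.ModelName)
}
```

Because the fallback depends on list order, setting voice.tts_model_name explicitly keeps the choice stable when model_list is reordered.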

Notes About API Base Handling

PicoClaw normalizes the configured base URL for TTS:

For OpenAI, a base like https://api.openai.com or https://api.openai.com/v1 becomes https://api.openai.com/v1/audio/speech.
For other OpenAI-compatible providers, PicoClaw preserves the configured base path and ensures it ends with /audio/speech.
If api_base is omitted, PicoClaw uses the provider default base when the model prefix is known.
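The normalization rules above can be approximated with the Go sketch below. SpeechEndpoint is a hypothetical helper, and the isOpenAI flag stands in for however PicoClaw actually recognizes the OpenAI provider; only the input and output URLs come from this README.

```go
package main

import (
	"fmt"
	"strings"
)

// SpeechEndpoint turns a configured api_base into a full speech endpoint.
// Hypothetical helper; isOpenAI is an assumed way to select the OpenAI rule.
func SpeechEndpoint(apiBase string, isOpenAI bool) string {
	base := strings.TrimRight(apiBase, "/")

	// A base that already points at the speech endpoint is left alone.
	if strings.HasSuffix(base, "/audio/speech") {
		return base
	}
	// OpenAI: make sure the /v1 segment is present.
	if isOpenAI && !strings.HasSuffix(base, "/v1") {
		base += "/v1"
	}
	// All providers: keep the configured path and append /audio/speech.
	return base + "/audio/speech"
}

func main() {
	fmt.Println(SpeechEndpoint("https://api.openai.com", true))
	fmt.Println(SpeechEndpoint("https://api.openai.com/v1", true))
	fmt.Println(SpeechEndpoint("https://example.com/api/v2/", false))
}
```

The example.com URL is a placeholder for a custom OpenAI-compatible endpoint; its /api/v2 path is preserved rather than rewritten to /v1.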

Common Mistakes

Setting voice.tts_model_name to a name that does not exist in model_list.
Adding a TTS model but forgetting to put its API key in .security.yml.
Assuming PicoClaw will automatically use provider-specific custom voices.
Using a provider endpoint that is not compatible with the OpenAI /audio/speech request format.

Minimal Checklist

Before testing send_tts, make sure:

voice.tts_model_name matches a model_list[].model_name.
The matching .security.yml entry contains a valid API key.
The chosen provider supports an OpenAI-compatible speech synthesis endpoint.
Your selected model is actually a TTS-capable model.

Documentation

Index

func SynthesizeAndStore(ctx context.Context, provider TTSProvider, store media.MediaStore, text string, ...) (string, error)
type MimoTTSProvider
- func NewMimoTTSProvider(apiKey string, apiBase string, model string, proxyURL string) *MimoTTSProvider
- func (t *MimoTTSProvider) Name() string
- func (t *MimoTTSProvider) Synthesize(ctx context.Context, text string) (io.ReadCloser, error)
type OpenAITTSProvider
- func NewOpenAITTSProvider(apiKey string, apiBase string, proxyURL string, model string) *OpenAITTSProvider
- func (t *OpenAITTSProvider) Name() string
- func (t *OpenAITTSProvider) Synthesize(ctx context.Context, text string) (io.ReadCloser, error)
type TTSProvider
- func DetectTTS(cfg *config.Config) TTSProvider

Constants

This section is empty.

Variables

This section is empty.

Functions

func SynthesizeAndStore

func SynthesizeAndStore(
	ctx context.Context,
	provider TTSProvider,
	store media.MediaStore,
	text string,
	filename string,
	channel string,
	chatID string,
) (string, error)

SynthesizeAndStore synthesizes text to speech with the given provider, registers the resulting audio in the media store, and returns the media reference.

Types

type MimoTTSProvider

type MimoTTSProvider struct {
	// contains filtered or unexported fields
}

func NewMimoTTSProvider

func NewMimoTTSProvider(apiKey string, apiBase string, model string, proxyURL string) *MimoTTSProvider

func (*MimoTTSProvider) Name

func (t *MimoTTSProvider) Name() string

func (*MimoTTSProvider) Synthesize

func (t *MimoTTSProvider) Synthesize(ctx context.Context, text string) (io.ReadCloser, error)

type OpenAITTSProvider

type OpenAITTSProvider struct {
	// contains filtered or unexported fields
}

func NewOpenAITTSProvider

func NewOpenAITTSProvider(apiKey string, apiBase string, proxyURL string, model string) *OpenAITTSProvider

func (*OpenAITTSProvider) Name

func (t *OpenAITTSProvider) Name() string

func (*OpenAITTSProvider) Synthesize

func (t *OpenAITTSProvider) Synthesize(ctx context.Context, text string) (io.ReadCloser, error)

type TTSProvider

type TTSProvider interface {
	Name() string
	Synthesize(ctx context.Context, text string) (io.ReadCloser, error)
}
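
Any type with these two methods satisfies the interface, which makes it easy to substitute a stand-in during tests. The sketch below redeclares TTSProvider locally so it compiles on its own; echoProvider, ReadAllAudio, and Demo are hypothetical names and are not part of this package.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// TTSProvider is copied from the declaration above so this sketch is self-contained.
type TTSProvider interface {
	Name() string
	Synthesize(ctx context.Context, text string) (io.ReadCloser, error)
}

// echoProvider is a hypothetical test double: instead of real audio it
// returns the input text as the "audio" stream.
type echoProvider struct{}

// Compile-time check that echoProvider implements TTSProvider.
var _ TTSProvider = echoProvider{}

func (echoProvider) Name() string { return "echo" }

func (echoProvider) Synthesize(ctx context.Context, text string) (io.ReadCloser, error) {
	// Honor cancellation the way a network-backed provider would.
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return io.NopCloser(strings.NewReader(text)), nil
}

// ReadAllAudio drains the stream returned by Synthesize and closes it.
func ReadAllAudio(ctx context.Context, p TTSProvider, text string) ([]byte, error) {
	rc, err := p.Synthesize(ctx, text)
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	return io.ReadAll(rc)
}

// Demo runs the stand-in provider with a background context.
func Demo(text string) (string, error) {
	data, err := ReadAllAudio(context.Background(), echoProvider{}, text)
	return string(data), err
}

func main() {
	out, err := Demo("hello")
	fmt.Println(out, err)
}
```

Callers own the returned io.ReadCloser and should close it once the audio has been read, as ReadAllAudio does with defer.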

func DetectTTS

func DetectTTS(cfg *config.Config) TTSProvider
