tts

package
v0.2.6
Published: Apr 8, 2026 License: MIT Imports: 17 Imported by: 0

README

TTS (Text-to-Speech)

This package handles speech synthesis for PicoClaw.

If you are new to TTS setup, the simplest workflow is:

  1. Add a TTS-capable entry to model_list.
  2. Point voice.tts_model_name at that entry.
  3. Put the API key in .security.yml.

Quick Recommendation

For most users, these are the best starting points:

Provider      Why start here
OpenAI        Best-supported path in PicoClaw today. The current TTS implementation is built around the OpenAI-compatible /audio/speech API shape, and OpenAI is the safest default.
Xiaomi MiMo   A good second option if you want an OpenAI-compatible provider endpoint and are already using MiMo models in the rest of your stack.

How TTS Configuration Works

PicoClaw does not keep TTS API keys inside voice.

Instead:

  • voice.tts_model_name selects a named entry from model_list.
  • That model_list entry provides the provider, model ID, API base, and proxy settings.
  • .security.yml stores the API key for the same named model entry.

This is the recommended and supported configuration pattern.
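
To make the relationship between the three pieces concrete, here is a minimal Go sketch of the lookup. The types Config, ModelEntry, and Secrets and the helpers parseConfig and lookupTTS are hypothetical names made up for illustration; they are not PicoClaw's real config types. The .security.yml file is represented as an already-parsed map so the sketch needs only the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors only the config.json fields shown in this README.
// The Go type and field names are illustrative assumptions.
type Config struct {
	Voice struct {
		TTSModelName string `json:"tts_model_name"`
	} `json:"voice"`
	ModelList []ModelEntry `json:"model_list"`
}

// ModelEntry is one element of model_list.
type ModelEntry struct {
	ModelName string `json:"model_name"`
	Model     string `json:"model"`
	APIBase   string `json:"api_base,omitempty"`
}

// Secrets stands in for a parsed .security.yml: model_name -> api_keys.
type Secrets map[string][]string

// parseConfig decodes a config.json document into Config.
func parseConfig(raw string) (Config, error) {
	var cfg Config
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

// lookupTTS finds the model_list entry named by voice.tts_model_name
// and pairs it with the first API key stored under the same name.
func lookupTTS(cfg Config, sec Secrets) (ModelEntry, string, error) {
	name := cfg.Voice.TTSModelName
	for _, m := range cfg.ModelList {
		if m.ModelName != name {
			continue
		}
		keys := sec[name]
		if len(keys) == 0 || keys[0] == "" {
			return ModelEntry{}, "", fmt.Errorf("no API key for %q in .security.yml", name)
		}
		return m, keys[0], nil
	}
	return ModelEntry{}, "", fmt.Errorf("voice.tts_model_name %q not found in model_list", name)
}

func main() {
	raw := `{"voice":{"tts_model_name":"openai-tts"},
	         "model_list":[{"model_name":"openai-tts","model":"openai/tts-1"}]}`
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	entry, _, err := lookupTTS(cfg, Secrets{"openai-tts": {"sk-openai-your-key"}})
	if err != nil {
		panic(err)
	}
	fmt.Println("resolved model:", entry.Model)
}
```

The point of the sketch is that the name in voice.tts_model_name is the join key: it must match both a model_list entry and a .security.yml entry.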

Option A: OpenAI

config.json

{
  "voice": {
    "tts_model_name": "openai-tts"
  },
  "model_list": [
    {
      "model_name": "openai-tts",
      "model": "openai/tts-1"
    }
  ]
}

.security.yml

model_list:
  openai-tts:
    api_keys:
      - "sk-openai-your-key"

Option B: Xiaomi MiMo

config.json

{
  "voice": {
    "tts_model_name": "mimo-tts"
  },
  "model_list": [
    {
      "model_name": "mimo-tts",
      "model": "mimo/mimo-v2-tts"
    }
  ]
}

.security.yml

model_list:
  mimo-tts:
    api_keys:
      - "your-mimo-key"

If you use a custom MiMo endpoint, you can also set api_base explicitly. Otherwise PicoClaw will use the provider default.

What PicoClaw Sends Today

The current TTS runtime uses an OpenAI-compatible speech request with these defaults:

  • Endpoint: /audio/speech
  • Response format: opus
  • Voice: alloy
  • Model: taken from the selected model_list entry

That means:

  • openai/tts-1 works as-is, because these defaults follow the OpenAI request shape.
  • Other OpenAI-compatible providers can work if they accept the same request format.
  • PicoClaw currently does not expose a user-facing config field for changing the TTS voice from alloy.

How PicoClaw Chooses a TTS Provider

DetectTTS resolves TTS in this order:

  1. Preferred path: resolve voice.tts_model_name against model_list.
  2. If a matching model entry exists and has an API key, PicoClaw creates an OpenAI-compatible TTS provider using that model's settings.
  3. Fallback path: if voice.tts_model_name is not set or cannot be resolved, PicoClaw scans model_list and uses the first entry whose model string contains the substring tts and that has an API key.

Fallback scanning exists for compatibility. New configs should set voice.tts_model_name explicitly.
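
The resolution order above can be sketched in a few lines of Go. This is not the DetectTTS source: modelEntry and pickTTSModel are hypothetical names, the substring match is assumed to be case-sensitive, and a named entry without an API key is assumed to count as unresolved and fall through to the scan.

```go
package main

import (
	"fmt"
	"strings"
)

// modelEntry is an illustrative flattened view of one model_list
// entry joined with its key from .security.yml.
type modelEntry struct {
	Name   string // model_list[].model_name
	Model  string // for example "openai/tts-1"
	APIKey string // empty when .security.yml has no key for Name
}

// pickTTSModel tries the preferred name first, then falls back to
// the first entry whose model string contains "tts" and has a key.
func pickTTSModel(preferred string, models []modelEntry) (modelEntry, bool) {
	// Preferred path: an explicit voice.tts_model_name.
	if preferred != "" {
		for _, m := range models {
			if m.Name == preferred && m.APIKey != "" {
				return m, true
			}
		}
	}
	// Fallback path: scan in list order.
	for _, m := range models {
		if strings.Contains(m.Model, "tts") && m.APIKey != "" {
			return m, true
		}
	}
	return modelEntry{}, false
}

func main() {
	models := []modelEntry{
		{Name: "chat", Model: "openai/some-chat-model", APIKey: "k1"},
		{Name: "mimo-tts", Model: "mimo/mimo-v2-tts", APIKey: "k2"},
		{Name: "openai-tts", Model: "openai/tts-1", APIKey: "k3"},
	}
	if m, ok := pickTTSModel("openai-tts", models); ok {
		fmt.Println("explicit:", m.Name)
	}
	if m, ok := pickTTSModel("", models); ok {
		fmt.Println("fallback:", m.Name)
	}
}
```

Note how the fallback depends on list order: with no explicit name, whichever TTS-looking entry comes first wins, which is one reason to set voice.tts_model_name explicitly.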

Notes About API Base Handling

PicoClaw normalizes the configured base URL for TTS:

  • For OpenAI, a base like https://api.openai.com or https://api.openai.com/v1 becomes https://api.openai.com/v1/audio/speech.
  • For other OpenAI-compatible providers, PicoClaw preserves the configured base path and ensures it ends with /audio/speech.
  • If api_base is omitted, PicoClaw uses the provider default base when the model prefix is known.
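
A minimal sketch of this kind of normalization follows. speechEndpoint is a hypothetical function, not PicoClaw's implementation; how the runtime decides that a base belongs to OpenAI is not documented here, so the sketch takes that as a boolean parameter. The non-OpenAI host in the example is a placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

const speechPath = "/audio/speech"

// speechEndpoint turns a configured api_base into a full speech URL.
// For OpenAI it makes sure the /v1 prefix is present; for any other
// provider it keeps the configured path and only appends the suffix.
func speechEndpoint(apiBase string, isOpenAI bool) string {
	base := strings.TrimRight(apiBase, "/")
	if strings.HasSuffix(base, speechPath) {
		return base // already a full endpoint
	}
	if isOpenAI && !strings.HasSuffix(base, "/v1") {
		base += "/v1"
	}
	return base + speechPath
}

func main() {
	fmt.Println(speechEndpoint("https://api.openai.com", true))
	fmt.Println(speechEndpoint("https://api.openai.com/v1", true))
	fmt.Println(speechEndpoint("https://tts.example.com/api/v2", false))
}
```

Both OpenAI inputs resolve to the same URL, while the custom base keeps its /api/v2 path.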

Common Mistakes

  • Setting voice.tts_model_name to a name that does not exist in model_list.
  • Adding a TTS model but forgetting to put its API key in .security.yml.
  • Assuming PicoClaw will automatically use provider-specific custom voices.
  • Using a provider endpoint that is not compatible with the OpenAI /audio/speech request format.

Minimal Checklist

Before testing send_tts, make sure:

  • voice.tts_model_name matches a model_list[].model_name.
  • The matching .security.yml entry contains a valid API key.
  • The chosen provider supports an OpenAI-compatible speech synthesis endpoint.
  • Your selected model is actually a TTS-capable model.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func SynthesizeAndStore

func SynthesizeAndStore(
	ctx context.Context,
	provider TTSProvider,
	store media.MediaStore,
	text string,
	filename string,
	channel string,
	chatID string,
) (string, error)

SynthesizeAndStore synthesizes text to speech and registers it in the media store, returning the media reference.

Types

type MimoTTSProvider

type MimoTTSProvider struct {
	// contains filtered or unexported fields
}

func NewMimoTTSProvider

func NewMimoTTSProvider(apiKey string, apiBase string, model string, proxyURL string) *MimoTTSProvider

func (*MimoTTSProvider) Name

func (t *MimoTTSProvider) Name() string

func (*MimoTTSProvider) Synthesize

func (t *MimoTTSProvider) Synthesize(ctx context.Context, text string) (io.ReadCloser, error)

type OpenAITTSProvider

type OpenAITTSProvider struct {
	// contains filtered or unexported fields
}

func NewOpenAITTSProvider

func NewOpenAITTSProvider(apiKey string, apiBase string, proxyURL string, model string) *OpenAITTSProvider

func (*OpenAITTSProvider) Name

func (t *OpenAITTSProvider) Name() string

func (*OpenAITTSProvider) Synthesize

func (t *OpenAITTSProvider) Synthesize(ctx context.Context, text string) (io.ReadCloser, error)

type TTSProvider

type TTSProvider interface {
	Name() string
	Synthesize(ctx context.Context, text string) (io.ReadCloser, error)
}

func DetectTTS

func DetectTTS(cfg *config.Config) TTSProvider
