tts

package
v0.40.7
Published: May 28, 2026 License: Apache-2.0 Imports: 21 Imported by: 0

Documentation

Overview

Package tts implements the SpeechKit text-to-speech surface: a small Provider interface plus concrete adapters for OpenAI, Google, Hugging Face, and a local Piper subprocess. The Router picks a provider per request based on Strategy (StrategyCloudFirst, StrategyLocalFirst, StrategyCloudOnly, StrategyLocalOnly) and falls back to the next provider when one is unavailable.

All providers must go through github.com/kombifyio/SpeechKit/internal/netsec for outbound HTTP. STT has a parallel routing layer in package router; this package is TTS-only.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func PreferredProviderForProfileID added in v0.37.8

func PreferredProviderForProfileID(profileID string) string

PreferredProviderForProfileID maps a Voice-Output profile ID (e.g. "tts.google.studio-o-de") to the Provider.Name() value the TTS router uses internally. Returns the empty string when the profile is unknown or unset — callers should treat that as "no preference, use default strategy ordering".

Mapping (kept in sync with pkg/speechkit/catalog.go TTS entries):

tts.openai.*                     → "openai"
tts.google.*                     → "google"
tts.huggingface.*                → "huggingface"
tts.openedai.*                   → "kokoro"           (OpenAI-compatible self-hosted endpoint)
tts.local.kokoro-*               → "kokoro_local"     (v0.37.3 ONNX in-process, Phase-3 runtime)
tts.local.supertonic-*           → "supertonic_local" (v0.37.3 ONNX in-process, multilingual)
tts.local.chatterbox-*           → "chatterbox_local" (v0.37.3 ONNX in-process, voice-clone)
tts.local.piper                  → "piper"            (HA-compatible Piper subprocess)
tts.local.piper-*                → "piper"            (HA-compatible Piper subprocess)
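
The table above reads as a prefix lookup. The following is a minimal sketch of that idea, not the package's actual implementation: `preferredProvider`, `prefixRule`, and `rules` are hypothetical names, and only the profile-ID prefixes and provider names are taken from the table.

```go
package main

import "strings"

// prefixRule pairs a profile-ID prefix with the provider name it selects.
// Hypothetical helper type, not part of package tts.
type prefixRule struct {
	prefix   string
	provider string
}

// rules mirrors the documented mapping table.
var rules = []prefixRule{
	{"tts.openai.", "openai"},
	{"tts.google.", "google"},
	{"tts.huggingface.", "huggingface"},
	{"tts.openedai.", "kokoro"},
	{"tts.local.kokoro-", "kokoro_local"},
	{"tts.local.supertonic-", "supertonic_local"},
	{"tts.local.chatterbox-", "chatterbox_local"},
	{"tts.local.piper-", "piper"},
}

// preferredProvider is an illustrative stand-in for
// PreferredProviderForProfileID. It returns "" for unknown or unset IDs.
func preferredProvider(profileID string) string {
	if profileID == "tts.local.piper" { // bare ID without a voice suffix
		return "piper"
	}
	for _, r := range rules {
		if strings.HasPrefix(profileID, r.prefix) {
			return r.provider
		}
	}
	return "" // no preference: caller uses default strategy ordering
}
```

Under these assumptions, `preferredProvider("tts.google.studio-o-de")` yields "google", and an ID outside the table yields the empty string.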

Types

type Google

type Google struct {
	BaseURL    string
	Validation netsec.ValidationOptions
	// contains filtered or unexported fields
}

Google implements Provider using the Google Cloud Text-to-Speech API.

BaseURL is configurable for testing. It is validated against Validation on every request. Default Validation is strict (public https only).
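
To illustrate what a "public https only" check can look like, here is a rough sketch. It is not the netsec API: `validatePublicHTTPS` is a hypothetical function, and it only inspects the URL text (scheme, the name "localhost", and literal IP addresses). A production validator would also need to check the addresses a hostname resolves to.

```go
package main

import (
	"errors"
	"net/netip"
	"net/url"
	"strings"
)

// validatePublicHTTPS is a hypothetical approximation of a strict BaseURL
// check. It performs no DNS resolution.
func validatePublicHTTPS(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "https" {
		return errors.New("base URL must use https")
	}
	host := u.Hostname() // strips the port and IPv6 brackets
	if host == "" {
		return errors.New("base URL has no host")
	}
	if strings.EqualFold(host, "localhost") {
		return errors.New("localhost is not a public host")
	}
	// Literal IPs are rejected when they are obviously non-public.
	if addr, err := netip.ParseAddr(host); err == nil {
		if addr.IsLoopback() || addr.IsPrivate() ||
			addr.IsLinkLocalUnicast() || addr.IsUnspecified() {
			return errors.New("base URL points at a non-public address")
		}
	}
	return nil
}
```

A test that points BaseURL at a local server would, under this model, need a relaxed Validation value, which is presumably why the field is exported.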

func NewGoogle

func NewGoogle(opts GoogleOpts) *Google

NewGoogle creates a Google Cloud TTS provider.

func (*Google) CloseIdleConnections added in v0.40.1

func (g *Google) CloseIdleConnections()

func (*Google) Health

func (g *Google) Health(ctx context.Context) error

func (*Google) Kind added in v0.40.1

func (g *Google) Kind() models.ProviderKind

func (*Google) Name

func (g *Google) Name() string

func (*Google) Synthesize

func (g *Google) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

type GoogleOpts

type GoogleOpts struct {
	APIKey string
	Voice  string // e.g. "de-DE-Neural2-B", "en-US-Neural2-J"
}

GoogleOpts configures the Google TTS provider.

type HuggingFace

type HuggingFace struct {
	BaseURL    string
	Validation netsec.ValidationOptions
	// contains filtered or unexported fields
}

HuggingFace implements Provider using the HuggingFace Inference API with text-to-speech models (e.g. parler-tts).

BaseURL is configurable for testing. It is validated against Validation on every request. Default Validation is strict (public https only).

func NewHuggingFace

func NewHuggingFace(opts HuggingFaceOpts) *HuggingFace

NewHuggingFace creates a HuggingFace TTS provider.

func (*HuggingFace) CloseIdleConnections added in v0.40.1

func (h *HuggingFace) CloseIdleConnections()

func (*HuggingFace) Health

func (h *HuggingFace) Health(ctx context.Context) error

func (*HuggingFace) Kind added in v0.40.1

func (h *HuggingFace) Kind() models.ProviderKind

func (*HuggingFace) Name

func (h *HuggingFace) Name() string

func (*HuggingFace) Synthesize

func (h *HuggingFace) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

type HuggingFaceOpts

type HuggingFaceOpts struct {
	Token string // HF API token
	Model string // Model ID, e.g. "parler-tts/parler-tts-mini-multilingual-v1.1"
}

HuggingFaceOpts configures the HuggingFace TTS provider.

type OpenAI

type OpenAI struct {
	BaseURL    string
	Validation netsec.ValidationOptions
	// contains filtered or unexported fields
}

OpenAI implements Provider using the OpenAI TTS API.

BaseURL is configurable for testing. It is validated against Validation on every request. Default Validation is strict (public https only).

func NewOpenAI

func NewOpenAI(opts OpenAIOpts) *OpenAI

NewOpenAI creates an OpenAI TTS provider.

func (*OpenAI) CloseIdleConnections added in v0.40.1

func (o *OpenAI) CloseIdleConnections()

func (*OpenAI) Health

func (o *OpenAI) Health(ctx context.Context) error

func (*OpenAI) Kind added in v0.40.1

func (o *OpenAI) Kind() models.ProviderKind

func (*OpenAI) Name

func (o *OpenAI) Name() string

func (*OpenAI) Synthesize

func (o *OpenAI) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

type OpenAIOpts

type OpenAIOpts struct {
	APIKey string
	Model  string // "tts-1" or "tts-1-hd"
	Voice  string // alloy, echo, fable, onyx, nova, shimmer
}

OpenAIOpts configures the OpenAI TTS provider.

type Piper added in v0.37.8

type Piper struct {
	// contains filtered or unexported fields
}

Piper implements Provider via the `piper` command-line binary. The binary writes a raw PCM stream to stdout when invoked with

piper --model <voice.onnx> --output-raw < input.txt

or a WAV-wrapped stream when `--output-raw` is replaced by `--output_file -`. Phase 3 of the voice-companion roadmap uses Piper as the all-local TTS so the Voice-Companion can talk without any cloud key.

The implementation does NOT bundle voice models. Operators run `scripts/prepare-piper-voices.ps1` once to download the desired voices into PiperOpts.VoiceDir. The default voice maps to <VoiceDir>/en_US-amy-medium.onnx; locale "de" looks up de_DE-thorsten-medium.onnx.
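
The voice lookup and subprocess invocation described above can be sketched roughly as follows. `resolveVoice`, `piperCommand`, and `demoArgs` are hypothetical helpers, not the package's code; the `--model` and `--output_file -` flags and the voice filenames come from the text above, and feeding the text on stdin is an assumption based on the `< input.txt` form.

```go
package main

import (
	"context"
	"os/exec"
	"path/filepath"
	"strings"
)

// fallbackVoice is the documented default model filename.
const fallbackVoice = "en_US-amy-medium.onnx"

// resolveVoice picks a model path for a locale such as "de" or "de-DE".
// Locales without an entry fall back to the en_US Amy medium voice.
func resolveVoice(voiceDir string, defaults map[string]string, locale string) string {
	lang := strings.ToLower(locale)
	if i := strings.IndexAny(lang, "-_"); i >= 0 {
		lang = lang[:i] // keep only the language part
	}
	if file, ok := defaults[lang]; ok {
		return filepath.Join(voiceDir, file)
	}
	return filepath.Join(voiceDir, fallbackVoice)
}

// piperCommand builds, but does not run, a piper invocation that reads the
// text from stdin and writes a WAV-wrapped stream to stdout.
func piperCommand(ctx context.Context, binary, modelPath, text string) *exec.Cmd {
	if binary == "" {
		binary = "piper" // must be on $PATH
	}
	cmd := exec.CommandContext(ctx, binary, "--model", modelPath, "--output_file", "-")
	cmd.Stdin = strings.NewReader(text)
	return cmd
}

// demoArgs returns the argv for a German request with the default binary.
func demoArgs() []string {
	defaults := map[string]string{"de": "de_DE-thorsten-medium.onnx"}
	model := resolveVoice("voices", defaults, "de-DE")
	return piperCommand(context.Background(), "", model, "Guten Tag").Args
}
```

A caller would typically wrap the context in `context.WithTimeout` and collect the audio with `cmd.Output()`; a missing model file would then surface as a subprocess error at synthesis time.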

func NewPiper added in v0.37.8

func NewPiper(opts PiperOpts) (*Piper, error)

NewPiper validates the options and returns a ready Provider. The voice directory must already exist; missing voice models surface as a clear "voice file not found" error at Synthesize time, not at construction, so a missing de_DE voice does not prevent the provider from answering en_US requests.

func (*Piper) Health added in v0.37.8

func (p *Piper) Health(ctx context.Context) error

Health verifies the piper binary can be located. Voice-model presence is not checked here — operators may have a partial set installed and we still want /readyz to be green for the locales they DO have. Synthesize surfaces missing-voice errors per request.

func (*Piper) Kind added in v0.40.1

func (*Piper) Kind() models.ProviderKind

func (*Piper) Name added in v0.37.8

func (*Piper) Name() string

Name returns the provider identifier.

func (*Piper) Probe added in v0.37.8

func (p *Piper) Probe(ctx context.Context, voice, locale string) (*Result, error)

Probe runs piper with a short test phrase to verify the binary and the requested voice work end-to-end. Returns the synthesized WAV bytes (small — a few seconds of speech) on success. Used by the Settings UI to preview a voice without persisting any text.

func (*Piper) Synthesize added in v0.37.8

func (p *Piper) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

Synthesize runs the piper subprocess for one utterance. The returned Result.Audio is a complete RIFF/WAVE PCM blob suitable for direct playback or HTTP delivery.

type PiperOpts added in v0.37.8

type PiperOpts struct {
	// Binary is the absolute or PATH-relative `piper` executable.
	// Empty defaults to "piper" (must be on $PATH).
	Binary string

	// VoiceDir is the filesystem root containing the .onnx voice
	// model files. Required.
	VoiceDir string

	// DefaultVoices maps a locale code (e.g. "en", "de") to a
	// voice-model filename inside VoiceDir. Falls back to the en_US
	// Amy medium voice when an entry is missing.
	DefaultVoices map[string]string

	// Timeout caps the subprocess execution. Zero defaults to 30 s
	// — generous because Piper warm-load on CPU can take several
	// seconds for a first synthesis.
	Timeout time.Duration
}

PiperOpts configures the local Piper subprocess.

type PiperVoiceInfo added in v0.37.8

type PiperVoiceInfo struct {
	Filename string `json:"filename"`
	Locale   string `json:"locale"`  // short code, e.g. "en", "de"
	Region   string `json:"region"`  // e.g. "US", "DE"; empty when missing
	Name     string `json:"name"`    // e.g. "amy", "thorsten"
	Quality  string `json:"quality"` // e.g. "low", "medium", "high"
	SizeKB   int    `json:"size_kb"`
}

PiperVoiceInfo is one entry from a voice-directory scan. Filename is the basename inside VoiceDir (e.g. "en_US-amy-medium.onnx"). The remaining fields are best-effort parses of the rhasspy/piper-voices naming convention <lang>_<REGION>-<name>-<quality>.onnx; when the filename does not follow that pattern only Filename is populated and the other fields remain empty.
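
A best-effort parse of that naming convention might look like the sketch below. `parseVoiceFilename` and `voiceInfo` are hypothetical names; the real parser may treat edge cases differently (for example, names containing extra hyphens).

```go
package main

import "strings"

// voiceInfo mirrors the string fields of PiperVoiceInfo (SizeKB omitted).
type voiceInfo struct {
	Filename, Locale, Region, Name, Quality string
}

// parseVoiceFilename splits <lang>_<REGION>-<name>-<quality>.onnx.
// When the filename does not match, only Filename is populated.
func parseVoiceFilename(filename string) voiceInfo {
	info := voiceInfo{Filename: filename}
	if !strings.HasSuffix(filename, ".onnx") {
		return info
	}
	base := strings.TrimSuffix(filename, ".onnx")
	parts := strings.Split(base, "-")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return info
	}
	// The region is optional: "de" parses with an empty Region.
	lang, region, _ := strings.Cut(parts[0], "_")
	if lang == "" {
		return info
	}
	info.Locale, info.Region = lang, region
	info.Name, info.Quality = parts[1], parts[2]
	return info
}
```

With this sketch, "en_US-amy-medium.onnx" parses to locale "en", region "US", name "amy", quality "medium", while "custom.onnx" keeps only its Filename.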

func ListPiperVoices added in v0.37.8

func ListPiperVoices(voiceDir string) ([]PiperVoiceInfo, error)

ListPiperVoices scans voiceDir for *.onnx files and parses each filename using the piper-voices naming convention. The list is sorted by Filename so callers can present a stable UI. Returns an empty slice (not an error) when voiceDir is empty or missing — the UI surfaces "no voices installed" rather than a hard error.
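
The "empty slice instead of an error" behaviour can be sketched as below. `listVoiceFiles` is a hypothetical helper that returns only basenames; it relies on `os.ReadDir` returning entries sorted by filename, so no extra sort is needed.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
)

// listVoiceFiles returns the *.onnx basenames found directly in voiceDir.
// An empty or missing directory yields an empty, non-nil slice and no error.
func listVoiceFiles(voiceDir string) ([]string, error) {
	names := []string{}
	if voiceDir == "" {
		return names, nil
	}
	entries, err := os.ReadDir(voiceDir) // entries come back sorted by name
	if errors.Is(err, fs.ErrNotExist) {
		return names, nil
	}
	if err != nil {
		return nil, err // e.g. permission denied: a real failure
	}
	for _, e := range entries {
		if !e.IsDir() && filepath.Ext(e.Name()) == ".onnx" {
			names = append(names, e.Name())
		}
	}
	return names, nil
}
```

Returning a non-nil empty slice also means the result encodes to `[]` rather than `null` in JSON, which tends to be easier for a UI to handle.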

type Provider

type Provider interface {
	// Synthesize converts text to audio bytes.
	Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

	// Name returns the provider identifier (e.g. "openai", "google", "kokoro").
	Name() string

	// Kind returns the provider deployment class used by routing strategies.
	Kind() models.ProviderKind

	// Health checks if the provider is reachable and ready.
	Health(ctx context.Context) error
}

Provider defines the interface for all text-to-speech backends.

func OrderByPreferredProvider added in v0.37.8

func OrderByPreferredProvider(providers []Provider, preferred string) []Provider

OrderByPreferredProvider returns providers reordered so that the one whose Provider.Name() matches `preferred` comes first; the remaining providers retain their original relative order. Used by the bootstrap layer to honour the [model_selection.tts] PrimaryProfileID without rewriting the strategy/fallback logic. Empty preferred or no-match returns the input slice unchanged.
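
The reordering rule can be sketched as a stable move-to-front. This is an illustration under assumptions, not the package source: `named`, `stub`, and `orderByPreferred` are hypothetical, and the sketch copies the slice rather than reordering in place.

```go
package main

// named is the only part of Provider this sketch needs.
type named interface{ Name() string }

// stub is a hypothetical provider used only for illustration.
type stub string

func (s stub) Name() string { return string(s) }

// orderByPreferred moves the first provider whose name equals preferred to
// the front and keeps the others in their original relative order.
func orderByPreferred(providers []named, preferred string) []named {
	if preferred == "" {
		return providers
	}
	for i, p := range providers {
		if p.Name() == preferred {
			out := make([]named, 0, len(providers))
			out = append(out, p)
			out = append(out, providers[:i]...)
			out = append(out, providers[i+1:]...)
			return out
		}
	}
	return providers // no match: input returned unchanged
}
```

For the input openai, google, piper with preferred "piper", the result is piper, openai, google, and the input slice is left untouched.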

type Result

type Result struct {
	Audio      []byte
	Format     string        // Actual format of the audio data
	SampleRate int           // Sample rate in Hz (e.g. 24000)
	Duration   time.Duration // Estimated duration of the audio
	Provider   string
	Voice      string
}

Result holds the output of a TTS synthesis.

type Router

type Router struct {
	// contains filtered or unexported fields
}

Router selects and falls back between TTS providers.

func NewRouter

func NewRouter(strategy Strategy, providers ...Provider) *Router

NewRouter creates a TTS router with the given strategy and providers. Providers are tried in order according to the strategy.

func (*Router) CloseIdleConnections added in v0.40.1

func (r *Router) CloseIdleConnections()

CloseIdleConnections asks HTTP-backed providers to drop idle connection pools. It is safe to call when a router is no longer referenced by any active mode runtime.

func (*Router) HealthCheck

func (r *Router) HealthCheck(ctx context.Context) map[string]error

HealthCheck returns health status for all providers.

func (*Router) SetProviders

func (r *Router) SetProviders(providers ...Provider)

SetProviders replaces the provider list (thread-safe for runtime reconfiguration).
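
One common way to make a provider list swappable at runtime is a read-write mutex around the slice. The sketch below assumes that approach; the Router's unexported fields are not documented, so `miniRouter`, `healthChecker`, and `static` are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"sync"
)

// healthChecker is the subset of Provider used here.
type healthChecker interface {
	Name() string
	Health(ctx context.Context) error
}

// static is an illustrative provider with a fixed health result.
type static struct {
	name string
	err  error
}

func (s static) Name() string                 { return s.name }
func (s static) Health(context.Context) error { return s.err }

// miniRouter guards its provider list with an RWMutex.
type miniRouter struct {
	mu        sync.RWMutex
	providers []healthChecker
}

// SetProviders swaps in a private copy of the new list.
func (r *miniRouter) SetProviders(ps ...healthChecker) {
	cp := append([]healthChecker(nil), ps...)
	r.mu.Lock()
	r.providers = cp
	r.mu.Unlock()
}

// HealthCheck snapshots the list, then probes each provider without
// holding the lock, so a slow provider cannot block SetProviders.
func (r *miniRouter) HealthCheck(ctx context.Context) map[string]error {
	r.mu.RLock()
	snapshot := r.providers
	r.mu.RUnlock()
	out := make(map[string]error, len(snapshot))
	for _, p := range snapshot {
		out[p.Name()] = p.Health(ctx)
	}
	return out
}

// demoHealth reports one healthy and one failing provider.
func demoHealth() map[string]error {
	r := &miniRouter{}
	r.SetProviders(static{"openai", nil}, static{"piper", errors.New("binary not found")})
	return r.HealthCheck(context.Background())
}
```

Because SetProviders installs a fresh slice instead of mutating the old one, a snapshot taken before the swap stays valid for the reader that holds it.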

func (*Router) Synthesize

func (r *Router) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

Synthesize tries each provider in order until one succeeds.
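
The try-in-order behaviour can be sketched as a plain fallback loop. The types here are trimmed-down, hypothetical versions of Provider and Result, and how the real Router reports the case where every provider fails is not documented; joining the errors is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// synthesizer is a reduced, hypothetical version of Provider.
type synthesizer interface {
	Name() string
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// fake either fails with err or returns "<name>:<text>" as audio.
type fake struct {
	name string
	err  error
}

func (f fake) Name() string { return f.name }

func (f fake) Synthesize(ctx context.Context, text string) ([]byte, error) {
	if f.err != nil {
		return nil, f.err
	}
	return []byte(f.name + ":" + text), nil
}

// synthesizeWithFallback returns the first successful result. If every
// provider fails, the per-provider errors are joined into one.
func synthesizeWithFallback(ctx context.Context, ps []synthesizer, text string) ([]byte, error) {
	if len(ps) == 0 {
		return nil, errors.New("tts: no providers configured")
	}
	var errs []error
	for _, p := range ps {
		if err := ctx.Err(); err != nil {
			return nil, err // caller cancelled: stop trying further providers
		}
		audio, err := p.Synthesize(ctx, text)
		if err == nil {
			return audio, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	return nil, errors.Join(errs...)
}

// demo shows a failing cloud provider falling through to a local one.
func demo() (string, error) {
	ps := []synthesizer{
		fake{name: "openai", err: errors.New("unavailable")},
		fake{name: "piper"},
	}
	audio, err := synthesizeWithFallback(context.Background(), ps, "hi")
	return string(audio), err
}

// demoAllFail shows the joined error when no provider succeeds.
func demoAllFail() error {
	ps := []synthesizer{
		fake{name: "openai", err: errors.New("unavailable")},
		fake{name: "google", err: errors.New("quota exceeded")},
	}
	_, err := synthesizeWithFallback(context.Background(), ps, "hi")
	return err
}
```

Wrapping each error with the provider name keeps the final message traceable to the backend that produced it.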

type Strategy

type Strategy string

Strategy determines how the TTS router selects a provider.

const (
	StrategyCloudFirst Strategy = "cloud-first" // Default: try cloud providers first
	StrategyLocalFirst Strategy = "local-first" // Try local (Kokoro) first, cloud fallback
	StrategyCloudOnly  Strategy = "cloud-only"  // Cloud providers only; no local fallback
	StrategyLocalOnly  Strategy = "local-only"  // Local providers only; no cloud fallback
)
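
As a rough mental model, each strategy can be read as a filter plus an ordering over providers grouped by kind. The sketch below is speculative: the values of models.ProviderKind are not shown on this page, so the `kind` type with "cloud" and "local" is an assumption, as are `entry` and `arrange`.

```go
package main

// Strategy and its constants are redeclared locally so the sketch compiles
// on its own; the values match the documented ones.
type Strategy string

const (
	StrategyCloudFirst Strategy = "cloud-first"
	StrategyLocalFirst Strategy = "local-first"
	StrategyCloudOnly  Strategy = "cloud-only"
	StrategyLocalOnly  Strategy = "local-only"
)

// kind is a hypothetical stand-in for models.ProviderKind.
type kind string

const (
	kindCloud kind = "cloud"
	kindLocal kind = "local"
)

// entry is a hypothetical provider descriptor.
type entry struct {
	name string
	kind kind
}

// arrange returns the providers a strategy would try, in order.
// Unknown strategies are treated as cloud-first, the documented default.
func arrange(s Strategy, ps []entry) []entry {
	var cloud, local []entry
	for _, p := range ps {
		if p.kind == kindLocal {
			local = append(local, p)
		} else {
			cloud = append(cloud, p)
		}
	}
	switch s {
	case StrategyLocalFirst:
		return append(local, cloud...)
	case StrategyCloudOnly:
		return cloud
	case StrategyLocalOnly:
		return local
	default:
		return append(cloud, local...)
	}
}
```

Within each group the original order is preserved, so a preferred provider placed first by the caller stays first among providers of its kind.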

type SynthesizeOpts

type SynthesizeOpts struct {
	Locale string  // "de-DE", "en-US", "auto"
	Voice  string  // Provider-specific voice ID; empty = default
	Speed  float64 // 0.25 - 4.0, default 1.0
	Format string  // "wav", "mp3", "opus", "pcm"; default "mp3"
}

SynthesizeOpts configures a single TTS request.
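
The defaults in the field comments can be applied with a small normalisation step before a request is sent. This sketch makes two assumptions that the page does not confirm: a zero Speed means "use the default", and out-of-range values are rejected rather than clamped. `synthOpts` and `normalize` are hypothetical names.

```go
package main

import "fmt"

// synthOpts mirrors the documented SynthesizeOpts fields.
type synthOpts struct {
	Locale string
	Voice  string
	Speed  float64
	Format string
}

// normalize fills in documented defaults and validates the ranges.
func normalize(o synthOpts) (synthOpts, error) {
	if o.Speed == 0 {
		o.Speed = 1.0 // documented default
	}
	if o.Speed < 0.25 || o.Speed > 4.0 {
		return o, fmt.Errorf("speed %.2f outside 0.25-4.0", o.Speed)
	}
	if o.Format == "" {
		o.Format = "mp3" // documented default
	}
	switch o.Format {
	case "wav", "mp3", "opus", "pcm":
		// supported formats listed in the field comment
	default:
		return o, fmt.Errorf("unsupported format %q", o.Format)
	}
	return o, nil
}
```

Since Result.Format is documented as the actual format of the audio data, callers should read the format from the Result rather than assume the requested one was honoured.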
