Documentation ¶
Overview ¶
Package tts implements the SpeechKit text-to-speech surface: a small provider interface plus concrete adapters for OpenAI, Google, and Hugging Face. The router in this file picks a provider per request based on Strategy (CloudFirst, LocalFirst) and degrades gracefully when a provider is unavailable.
All providers must go through github.com/kombifyio/SpeechKit/internal/netsec for outbound HTTP. STT has a parallel routing layer in package router; this package is TTS-only.
Functions ¶
func PreferredProviderForProfileID ¶ added in v0.37.8
PreferredProviderForProfileID maps a Voice-Output profile ID (e.g. "tts.google.studio-o-de") to the Provider.Name() value the TTS router uses internally. Returns the empty string when the profile is unknown or unset — callers should treat that as "no preference, use default strategy ordering".
Mapping (kept in sync with pkg/speechkit/catalog.go TTS entries):
tts.openai.* → "openai"
tts.google.* → "google"
tts.huggingface.* → "huggingface"
tts.openedai.* → "kokoro" (OpenAI-compatible self-hosted endpoint)
tts.local.kokoro-* → "kokoro_local" (v0.37.3 ONNX in-process, Phase-3 runtime)
tts.local.supertonic-* → "supertonic_local" (v0.37.3 ONNX in-process, multilingual)
tts.local.chatterbox-* → "chatterbox_local" (v0.37.3 ONNX in-process, voice-clone)
tts.local.piper / tts.local.piper-* → "piper" (HA-compatible Piper subprocess)
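The mapping above amounts to a prefix lookup. The following is a minimal sketch of that idea, not the package's actual implementation: `preferredProvider` and `prefixToProvider` are hypothetical names, and the real function's signature is not shown in this documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// prefixToProvider mirrors the documented profile-ID prefixes.
// The names in this sketch are hypothetical, not part of package tts.
var prefixToProvider = []struct{ prefix, provider string }{
	{"tts.openai.", "openai"},
	{"tts.google.", "google"},
	{"tts.huggingface.", "huggingface"},
	{"tts.openedai.", "kokoro"},
	{"tts.local.kokoro-", "kokoro_local"},
	{"tts.local.supertonic-", "supertonic_local"},
	{"tts.local.chatterbox-", "chatterbox_local"},
	{"tts.local.piper-", "piper"},
}

// preferredProvider returns the provider name for a profile ID, or the
// empty string when the ID is unknown or unset.
func preferredProvider(profileID string) string {
	// The bare "tts.local.piper" ID is documented alongside the
	// "tts.local.piper-*" family, so it is matched exactly.
	if profileID == "tts.local.piper" {
		return "piper"
	}
	for _, m := range prefixToProvider {
		if strings.HasPrefix(profileID, m.prefix) {
			return m.provider
		}
	}
	return ""
}

func main() {
	fmt.Println(preferredProvider("tts.google.studio-o-de")) // google
	fmt.Printf("%q\n", preferredProvider("stt.unknown"))     // ""
}
```

An empty return value signals "no preference", so a caller would fall back to the default strategy ordering.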
Types ¶
type Google ¶
type Google struct {
BaseURL string
Validation netsec.ValidationOptions
// contains filtered or unexported fields
}
Google implements Provider using the Google Cloud Text-to-Speech API.
BaseURL is configurable for testing. It is validated against Validation on every request. Default Validation is strict (public https only).
func NewGoogle ¶
func NewGoogle(opts GoogleOpts) *Google
NewGoogle creates a Google Cloud TTS provider.
func (*Google) CloseIdleConnections ¶ added in v0.40.1
func (g *Google) CloseIdleConnections()
func (*Google) Kind ¶ added in v0.40.1
func (g *Google) Kind() models.ProviderKind
func (*Google) Synthesize ¶
type GoogleOpts ¶
GoogleOpts configures the Google TTS provider.
type HuggingFace ¶
type HuggingFace struct {
BaseURL string
Validation netsec.ValidationOptions
// contains filtered or unexported fields
}
HuggingFace implements Provider using the HuggingFace Inference API with text-to-speech models (e.g. parler-tts).
BaseURL is configurable for testing. It is validated against Validation on every request. Default Validation is strict (public https only).
func NewHuggingFace ¶
func NewHuggingFace(opts HuggingFaceOpts) *HuggingFace
NewHuggingFace creates a HuggingFace TTS provider.
func (*HuggingFace) CloseIdleConnections ¶ added in v0.40.1
func (h *HuggingFace) CloseIdleConnections()
func (*HuggingFace) Kind ¶ added in v0.40.1
func (h *HuggingFace) Kind() models.ProviderKind
func (*HuggingFace) Name ¶
func (h *HuggingFace) Name() string
func (*HuggingFace) Synthesize ¶
func (h *HuggingFace) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)
type HuggingFaceOpts ¶
type HuggingFaceOpts struct {
Token string // HF API token
Model string // Model ID, e.g. "parler-tts/parler-tts-mini-multilingual-v1.1"
}
HuggingFaceOpts configures the HuggingFace TTS provider.
type OpenAI ¶
type OpenAI struct {
BaseURL string
Validation netsec.ValidationOptions
// contains filtered or unexported fields
}
OpenAI implements Provider using the OpenAI TTS API.
BaseURL is configurable for testing. It is validated against Validation on every request. Default Validation is strict (public https only).
func (*OpenAI) CloseIdleConnections ¶ added in v0.40.1
func (o *OpenAI) CloseIdleConnections()
func (*OpenAI) Kind ¶ added in v0.40.1
func (o *OpenAI) Kind() models.ProviderKind
func (*OpenAI) Synthesize ¶
type OpenAIOpts ¶
type OpenAIOpts struct {
APIKey string
Model string // "tts-1" or "tts-1-hd"
Voice string // alloy, echo, fable, onyx, nova, shimmer
}
OpenAIOpts configures the OpenAI TTS provider.
type Piper ¶ added in v0.37.8
type Piper struct {
// contains filtered or unexported fields
}
Piper implements Provider via the `piper` command-line binary. The binary writes a raw PCM stream to stdout when invoked with
piper --model <voice.onnx> --output-raw < input.txt
or a WAV-wrapped stream when invoked with `--output_file -` instead. Phase 3 of the voice-companion roadmap uses Piper as the all-local TTS so the Voice-Companion can talk without any cloud key.
The implementation does NOT bundle voice models. Operators run `scripts/prepare-piper-voices.ps1` once to download the desired voices into PiperOpts.VoiceDir. The default voice maps to <VoiceDir>/en_US-amy-medium.onnx; locale "de" looks up de_DE-thorsten-medium.
func NewPiper ¶ added in v0.37.8
NewPiper validates the options and returns a ready Provider. The voice directory must already exist; missing voice models surface as a clear "voice file not found" error at Synthesize time, not at construction, so a missing de_DE voice does not prevent the provider from answering en_US requests.
func (*Piper) Health ¶ added in v0.37.8
Health verifies the piper binary can be located. Voice-model presence is not checked here — operators may have a partial set installed and we still want /readyz to be green for the locales they DO have. Synthesize surfaces missing-voice errors per request.
func (*Piper) Kind ¶ added in v0.40.1
func (*Piper) Kind() models.ProviderKind
func (*Piper) Probe ¶ added in v0.37.8
Probe runs piper with a short test phrase to verify the binary and the requested voice work end-to-end. Returns the synthesized WAV bytes (small — a few seconds of speech) on success. Used by the Settings UI to preview a voice without persisting any text.
func (*Piper) Synthesize ¶ added in v0.37.8
Synthesize runs the piper subprocess for one utterance. The returned Result.Audio is a complete RIFF/WAVE PCM blob suitable for direct playback or HTTP delivery.
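As a rough illustration of the subprocess call described above, the sketch below builds a `piper` command that reads text from stdin and writes a WAV stream to stdout. The helper names `piperArgs` and `piperCommand` are hypothetical; only the `--model` and `--output_file -` flags come from the documentation above, and the real implementation may differ.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
	"time"
)

// piperArgs assembles the flags for a WAV-on-stdout invocation.
// Hypothetical helper for illustration only.
func piperArgs(voiceDir, voiceFile string) []string {
	return []string{
		"--model", filepath.Join(voiceDir, voiceFile),
		"--output_file", "-",
	}
}

// piperCommand prepares (but does not start) the subprocess. The
// utterance is fed on stdin; ctx bounds the execution time.
func piperCommand(ctx context.Context, binary, voiceDir, voiceFile, text string) *exec.Cmd {
	cmd := exec.CommandContext(ctx, binary, piperArgs(voiceDir, voiceFile)...)
	cmd.Stdin = strings.NewReader(text)
	return cmd
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	cmd := piperCommand(ctx, "piper", "voices", "en_US-amy-medium.onnx", "Hello from Piper.")
	// Only the argument vector is printed here; nothing is executed.
	fmt.Println(cmd.Args)
}
```

Calling `cmd.Output()` on the prepared command would run the binary and return whatever it wrote to stdout, which in this configuration is the WAV data.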
type PiperOpts ¶ added in v0.37.8
type PiperOpts struct {
// Binary is the absolute or PATH-relative `piper` executable.
// Empty defaults to "piper" (must be on $PATH).
Binary string
// VoiceDir is the filesystem root containing the .onnx voice
// model files. Required.
VoiceDir string
// DefaultVoices maps a locale code (e.g. "en", "de") to a
// voice-model filename inside VoiceDir. Falls back to the en_US
// Amy medium voice when an entry is missing.
DefaultVoices map[string]string
// Timeout caps the subprocess execution. Zero defaults to 30 s
// — generous because Piper warm-load on CPU can take several
// seconds for a first synthesis.
Timeout time.Duration
}
PiperOpts configures the local Piper subprocess.
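The defaults documented in the field comments could be applied along these lines. This is a sketch under assumptions: `piperOpts`, `withDefaults`, and `voiceFor` are hypothetical stand-ins, and the built-in locale table (for example the "de" lookup) is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// fallbackVoice is the en_US Amy medium voice named in the docs.
const fallbackVoice = "en_US-amy-medium.onnx"

// piperOpts is a local stand-in for PiperOpts (hypothetical copy).
type piperOpts struct {
	Binary        string
	VoiceDir      string
	DefaultVoices map[string]string
	Timeout       time.Duration
}

// withDefaults fills in the documented zero-value defaults and rejects
// a missing VoiceDir, which the docs mark as required.
func (o piperOpts) withDefaults() (piperOpts, error) {
	if o.VoiceDir == "" {
		return o, errors.New("piper: VoiceDir is required")
	}
	if o.Binary == "" {
		o.Binary = "piper" // resolved via $PATH
	}
	if o.Timeout == 0 {
		o.Timeout = 30 * time.Second
	}
	return o, nil
}

// voiceFor returns the configured voice file for a short locale code,
// falling back to the Amy voice when no entry exists.
func (o piperOpts) voiceFor(locale string) string {
	if v, ok := o.DefaultVoices[locale]; ok && v != "" {
		return v
	}
	return fallbackVoice
}

func main() {
	opts, err := piperOpts{
		VoiceDir:      "voices",
		DefaultVoices: map[string]string{"de": "de_DE-thorsten-medium.onnx"},
	}.withDefaults()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(opts.Binary, opts.Timeout) // piper 30s
	fmt.Println(opts.voiceFor("de"))       // de_DE-thorsten-medium.onnx
	fmt.Println(opts.voiceFor("fr"))       // en_US-amy-medium.onnx
}
```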
type PiperVoiceInfo ¶ added in v0.37.8
type PiperVoiceInfo struct {
Filename string `json:"filename"`
Locale string `json:"locale"` // short code, e.g. "en", "de"
Region string `json:"region"` // e.g. "US", "DE"; empty when missing
Name string `json:"name"` // e.g. "amy", "thorsten"
Quality string `json:"quality"` // e.g. "low", "medium", "high"
SizeKB int `json:"size_kb"`
}
PiperVoiceInfo is one entry from a voice-directory scan. Filename is the basename inside VoiceDir (e.g. "en_US-amy-medium.onnx"). The remaining fields are best-effort parses of the rhasspy/piper-voices naming convention <lang>_<REGION>-<name>-<quality>.onnx; when the filename does not follow that pattern only Filename is populated and the other fields remain empty.
func ListPiperVoices ¶ added in v0.37.8
func ListPiperVoices(voiceDir string) ([]PiperVoiceInfo, error)
ListPiperVoices scans voiceDir for *.onnx files and parses each filename using the piper-voices naming convention. The list is sorted by Filename so callers can present a stable UI. Returns an empty slice (not an error) when voiceDir is empty or missing — the UI surfaces "no voices installed" rather than a hard error.
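The filename parsing described for PiperVoiceInfo might look like the sketch below. `voiceInfo` and `parseVoiceFilename` are hypothetical names; the actual parser may handle edge cases (such as hyphenated voice names) differently.

```go
package main

import (
	"fmt"
	"strings"
)

// voiceInfo is a trimmed-down stand-in for PiperVoiceInfo (hypothetical).
type voiceInfo struct {
	Filename, Locale, Region, Name, Quality string
}

// parseVoiceFilename applies the <lang>_<REGION>-<name>-<quality>.onnx
// convention. When the name does not fit, only Filename is populated.
func parseVoiceFilename(filename string) voiceInfo {
	info := voiceInfo{Filename: filename}
	if !strings.HasSuffix(filename, ".onnx") {
		return info
	}
	base := strings.TrimSuffix(filename, ".onnx")
	parts := strings.Split(base, "-")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return info
	}
	// Region stays empty when the language part has no "_REGION" suffix.
	lang, region, _ := strings.Cut(parts[0], "_")
	if lang == "" {
		return info
	}
	info.Locale, info.Region = lang, region
	info.Name, info.Quality = parts[1], parts[2]
	return info
}

func main() {
	fmt.Printf("%+v\n", parseVoiceFilename("de_DE-thorsten-medium.onnx"))
	fmt.Printf("%+v\n", parseVoiceFilename("custom-model.onnx"))
}
```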
type Provider ¶
type Provider interface {
// Synthesize converts text to audio bytes.
Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)
// Name returns the provider identifier (e.g. "openai", "google", "kokoro").
Name() string
// Kind returns the provider deployment class used by routing strategies.
Kind() models.ProviderKind
// Health checks if the provider is reachable and ready.
Health(ctx context.Context) error
}
Provider defines the interface for all text-to-speech backends.
func OrderByPreferredProvider ¶ added in v0.37.8
OrderByPreferredProvider returns providers reordered so that the one whose Provider.Name() matches `preferred` comes first; the remaining providers retain their original relative order. Used by the bootstrap layer to honour the [model_selection.tts] PrimaryProfileID without rewriting the strategy/fallback logic. Empty preferred or no-match returns the input slice unchanged.
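The reordering rule can be sketched as a stable move-to-front. The `named` interface and `orderByPreferred` function here are hypothetical simplifications; the real function operates on Provider values and its exact signature is not shown above.

```go
package main

import "fmt"

// named is the only part of Provider this sketch needs (hypothetical).
type named interface{ Name() string }

type stub string

func (s stub) Name() string { return string(s) }

// orderByPreferred moves the first provider whose Name matches preferred
// to the front and keeps the others in their original relative order.
// An empty preference or no match returns the input slice unchanged.
func orderByPreferred(providers []named, preferred string) []named {
	if preferred == "" {
		return providers
	}
	idx := -1
	for i, p := range providers {
		if p.Name() == preferred {
			idx = i
			break
		}
	}
	if idx < 0 {
		return providers
	}
	// Build a new slice so the caller's slice is not mutated.
	out := make([]named, 0, len(providers))
	out = append(out, providers[idx])
	out = append(out, providers[:idx]...)
	out = append(out, providers[idx+1:]...)
	return out
}

func main() {
	in := []named{stub("openai"), stub("google"), stub("piper")}
	var names []string
	for _, p := range orderByPreferred(in, "piper") {
		names = append(names, p.Name())
	}
	fmt.Println(names) // [piper openai google]
}
```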
type Result ¶
type Result struct {
Audio []byte
Format string // Actual format of the audio data
SampleRate int // Sample rate in Hz (e.g. 24000)
Duration time.Duration // Estimated duration of the audio
Provider string
Voice string
}
Result holds the output of a TTS synthesis.
type Router ¶
type Router struct {
// contains filtered or unexported fields
}
Router selects and falls back between TTS providers.
func NewRouter ¶
NewRouter creates a TTS router with the given strategy and providers. Providers are tried in order according to the strategy.
func (*Router) CloseIdleConnections ¶ added in v0.40.1
func (r *Router) CloseIdleConnections()
CloseIdleConnections asks HTTP-backed providers to drop idle connection pools. It is safe to call when a router is no longer referenced by any active mode runtime.
func (*Router) HealthCheck ¶
HealthCheck returns health status for all providers.
func (*Router) SetProviders ¶
SetProviders replaces the provider list (thread-safe for runtime reconfiguration).
func (*Router) Synthesize ¶
Synthesize tries each provider in order until one succeeds.
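A fallback loop of this kind could be sketched as follows. The `synthesizer` interface is a reduced, hypothetical version of Provider, and returning a joined error when every provider fails is an assumption of this sketch, not documented behaviour (`errors.Join` requires Go 1.20 or later).

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// synthesizer is a reduced stand-in for Provider (hypothetical).
type synthesizer interface {
	Name() string
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

var errDown = errors.New("provider unavailable")

// fake either fails with err or echoes "<name>:<text>" as audio bytes.
type fake struct {
	name string
	err  error
}

func (f fake) Name() string { return f.name }

func (f fake) Synthesize(_ context.Context, text string) ([]byte, error) {
	if f.err != nil {
		return nil, f.err
	}
	return []byte(f.name + ":" + text), nil
}

// synthesizeWithFallback tries providers in order and returns the first
// success. If all fail, the per-provider errors are joined.
func synthesizeWithFallback(ctx context.Context, providers []synthesizer, text string) ([]byte, error) {
	if len(providers) == 0 {
		return nil, errors.New("tts: no providers configured")
	}
	var errs []error
	for _, p := range providers {
		// Stop early when the caller has cancelled or timed out.
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		audio, err := p.Synthesize(ctx, text)
		if err == nil {
			return audio, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	return nil, errors.Join(errs...)
}

// demo runs the fallback loop with a background context.
func demo(providers []synthesizer, text string) (string, error) {
	audio, err := synthesizeWithFallback(context.Background(), providers, text)
	return string(audio), err
}

func main() {
	out, err := demo([]synthesizer{fake{"openai", errDown}, fake{"piper", nil}}, "hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // piper:hello
}
```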
type SynthesizeOpts ¶
type SynthesizeOpts struct {
Locale string // "de-DE", "en-US", "auto"
Voice string // Provider-specific voice ID; empty = default
Speed float64 // 0.25 - 4.0, default 1.0
Format string // "wav", "mp3", "opus", "pcm"; default "mp3"
}
SynthesizeOpts configures a single TTS request.
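The defaults and ranges in the field comments suggest a normalization step along these lines. This is only a sketch: `synthesizeOpts` and `normalize` are hypothetical, and whether the package rejects or clamps out-of-range values is not stated, so rejection is an assumption here.

```go
package main

import "fmt"

// synthesizeOpts is a local copy of the documented fields (hypothetical).
type synthesizeOpts struct {
	Locale string
	Voice  string
	Speed  float64
	Format string
}

// normalize applies the documented defaults (speed 1.0, format "mp3")
// and rejects values outside the documented ranges.
func normalize(o synthesizeOpts) (synthesizeOpts, error) {
	if o.Speed == 0 {
		o.Speed = 1.0
	}
	if o.Speed < 0.25 || o.Speed > 4.0 {
		return o, fmt.Errorf("tts: speed %.2f outside 0.25-4.0", o.Speed)
	}
	if o.Format == "" {
		o.Format = "mp3"
	}
	switch o.Format {
	case "wav", "mp3", "opus", "pcm":
		// Supported per the field comment.
	default:
		return o, fmt.Errorf("tts: unsupported format %q", o.Format)
	}
	return o, nil
}

func main() {
	o, err := normalize(synthesizeOpts{Locale: "de-DE"})
	fmt.Println(o.Speed, o.Format, err) // 1 mp3 <nil>

	_, err = normalize(synthesizeOpts{Speed: 5})
	fmt.Println(err != nil) // true
}
```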