tts

package

Version: v0.40.7 (latest) | Published: May 28, 2026 | License: Apache-2.0 | Imports: 21 | Imported by: 0

Repository

github.com/kombifyio/SpeechKit

Documentation ¶

Overview ¶

Package tts implements the SpeechKit text-to-speech surface: a small Provider interface plus concrete adapters for OpenAI, Google, Hugging Face, and a local Piper subprocess. The Router picks a provider per request based on its Strategy (cloud-first, local-first, cloud-only, or local-only) and degrades gracefully when a provider is unavailable.

All outbound HTTP from providers must go through github.com/kombifyio/SpeechKit/internal/netsec. STT has a parallel routing layer in package router; this package is TTS-only.

Index ¶

func PreferredProviderForProfileID(profileID string) string
type Google
- func NewGoogle(opts GoogleOpts) *Google
- func (g *Google) CloseIdleConnections()
- func (g *Google) Health(ctx context.Context) error
- func (g *Google) Kind() models.ProviderKind
- func (g *Google) Name() string
- func (g *Google) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)
type GoogleOpts
type HuggingFace
- func NewHuggingFace(opts HuggingFaceOpts) *HuggingFace
- func (h *HuggingFace) CloseIdleConnections()
- func (h *HuggingFace) Health(ctx context.Context) error
- func (h *HuggingFace) Kind() models.ProviderKind
- func (h *HuggingFace) Name() string
- func (h *HuggingFace) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)
type HuggingFaceOpts
type OpenAI
- func NewOpenAI(opts OpenAIOpts) *OpenAI
- func (o *OpenAI) CloseIdleConnections()
- func (o *OpenAI) Health(ctx context.Context) error
- func (o *OpenAI) Kind() models.ProviderKind
- func (o *OpenAI) Name() string
- func (o *OpenAI) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)
type OpenAIOpts
type Piper
- func NewPiper(opts PiperOpts) (*Piper, error)
- func (p *Piper) Health(ctx context.Context) error
- func (*Piper) Kind() models.ProviderKind
- func (*Piper) Name() string
- func (p *Piper) Probe(ctx context.Context, voice, locale string) (*Result, error)
- func (p *Piper) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)
type PiperOpts
type PiperVoiceInfo
- func ListPiperVoices(voiceDir string) ([]PiperVoiceInfo, error)
type Provider
- func OrderByPreferredProvider(providers []Provider, preferred string) []Provider
type Result
type Router
- func NewRouter(strategy Strategy, providers ...Provider) *Router
- func (r *Router) CloseIdleConnections()
- func (r *Router) HealthCheck(ctx context.Context) map[string]error
- func (r *Router) SetProviders(providers ...Provider)
- func (r *Router) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)
type Strategy
type SynthesizeOpts

Functions ¶

func PreferredProviderForProfileID ¶ added in v0.37.8

func PreferredProviderForProfileID(profileID string) string

PreferredProviderForProfileID maps a Voice-Output profile ID (e.g. "tts.google.studio-o-de") to the Provider.Name() value the TTS router uses internally. Returns the empty string when the profile is unknown or unset — callers should treat that as "no preference, use default strategy ordering".

Mapping (kept in sync with pkg/speechkit/catalog.go TTS entries):

tts.openai.*                        → "openai"
tts.google.*                        → "google"
tts.huggingface.*                   → "huggingface"
tts.openedai.*                      → "kokoro"           (OpenAI-compatible self-hosted endpoint)
tts.local.kokoro-*                  → "kokoro_local"     (v0.37.3 ONNX in-process, Phase-3 runtime)
tts.local.supertonic-*              → "supertonic_local" (v0.37.3 ONNX in-process, multilingual)
tts.local.chatterbox-*              → "chatterbox_local" (v0.37.3 ONNX in-process, voice-clone)
tts.local.piper / tts.local.piper-* → "piper"            (HA-compatible Piper subprocess)
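
The mapping table can be read as a list of prefix rules. The sketch below is a self-contained illustration of that reading, not the package's implementation: preferredProvider is a hypothetical stand-in for PreferredProviderForProfileID, and it assumes each `*` in the table means "any suffix".

```go
package main

import (
	"fmt"
	"strings"
)

// preferredProvider is a hypothetical stand-in that mirrors the documented
// mapping table. It is not the package's own code.
func preferredProvider(profileID string) string {
	// Piper is documented with both an exact ID and a prefixed form.
	if profileID == "tts.local.piper" {
		return "piper"
	}
	// Prefix rules, in the order the table lists them.
	rules := []struct{ prefix, provider string }{
		{"tts.openai.", "openai"},
		{"tts.google.", "google"},
		{"tts.huggingface.", "huggingface"},
		{"tts.openedai.", "kokoro"},
		{"tts.local.kokoro-", "kokoro_local"},
		{"tts.local.supertonic-", "supertonic_local"},
		{"tts.local.chatterbox-", "chatterbox_local"},
		{"tts.local.piper-", "piper"},
	}
	for _, r := range rules {
		if strings.HasPrefix(profileID, r.prefix) {
			return r.provider
		}
	}
	return "" // unknown or unset: no preference
}

func main() {
	fmt.Printf("%q\n", preferredProvider("tts.google.studio-o-de"))
	fmt.Printf("%q\n", preferredProvider("tts.local.piper"))
	fmt.Printf("%q\n", preferredProvider("not-a-profile"))
}
```

An empty return value is the signal callers use to keep the default strategy ordering.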

Types ¶

type Google ¶

type Google struct {
	BaseURL    string
	Validation netsec.ValidationOptions
	// contains filtered or unexported fields
}

Google implements Provider using the Google Cloud Text-to-Speech API.

BaseURL is configurable for testing. It is validated against Validation on every request. Default Validation is strict (public https only).

func NewGoogle ¶

func NewGoogle(opts GoogleOpts) *Google

NewGoogle creates a Google Cloud TTS provider.

func (*Google) CloseIdleConnections ¶ added in v0.40.1

func (g *Google) CloseIdleConnections()

func (*Google) Health ¶

func (g *Google) Health(ctx context.Context) error

func (*Google) Kind ¶ added in v0.40.1

func (g *Google) Kind() models.ProviderKind

func (*Google) Name ¶

func (g *Google) Name() string

func (*Google) Synthesize ¶

func (g *Google) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

type GoogleOpts ¶

type GoogleOpts struct {
	APIKey string
	Voice  string // e.g. "de-DE-Neural2-B", "en-US-Neural2-J"
}

GoogleOpts configures the Google TTS provider.

type HuggingFace ¶

type HuggingFace struct {
	BaseURL    string
	Validation netsec.ValidationOptions
	// contains filtered or unexported fields
}

HuggingFace implements Provider using the HuggingFace Inference API with text-to-speech models (e.g. parler-tts).

BaseURL is configurable for testing. It is validated against Validation on every request. Default Validation is strict (public https only).

func NewHuggingFace ¶

func NewHuggingFace(opts HuggingFaceOpts) *HuggingFace

NewHuggingFace creates a HuggingFace TTS provider.

func (*HuggingFace) CloseIdleConnections ¶ added in v0.40.1

func (h *HuggingFace) CloseIdleConnections()

func (*HuggingFace) Health ¶

func (h *HuggingFace) Health(ctx context.Context) error

func (*HuggingFace) Kind ¶ added in v0.40.1

func (h *HuggingFace) Kind() models.ProviderKind

func (*HuggingFace) Name ¶

func (h *HuggingFace) Name() string

func (*HuggingFace) Synthesize ¶

func (h *HuggingFace) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

type HuggingFaceOpts ¶

type HuggingFaceOpts struct {
	Token string // HF API token
	Model string // Model ID, e.g. "parler-tts/parler-tts-mini-multilingual-v1.1"
}

HuggingFaceOpts configures the HuggingFace TTS provider.

type OpenAI ¶

type OpenAI struct {
	BaseURL    string
	Validation netsec.ValidationOptions
	// contains filtered or unexported fields
}

OpenAI implements Provider using the OpenAI TTS API.

BaseURL is configurable for testing. It is validated against Validation on every request. Default Validation is strict (public https only).

func NewOpenAI ¶

func NewOpenAI(opts OpenAIOpts) *OpenAI

NewOpenAI creates an OpenAI TTS provider.

func (*OpenAI) CloseIdleConnections ¶ added in v0.40.1

func (o *OpenAI) CloseIdleConnections()

func (*OpenAI) Health ¶

func (o *OpenAI) Health(ctx context.Context) error

func (*OpenAI) Kind ¶ added in v0.40.1

func (o *OpenAI) Kind() models.ProviderKind

func (*OpenAI) Name ¶

func (o *OpenAI) Name() string

func (*OpenAI) Synthesize ¶

func (o *OpenAI) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

type OpenAIOpts ¶

type OpenAIOpts struct {
	APIKey string
	Model  string // "tts-1" or "tts-1-hd"
	Voice  string // alloy, echo, fable, onyx, nova, shimmer
}

OpenAIOpts configures the OpenAI TTS provider.

type Piper ¶ added in v0.37.8

type Piper struct {
	// contains filtered or unexported fields
}

Piper implements Provider via the `piper` command-line binary. The binary writes a raw PCM stream to stdout when invoked with

piper --model <voice.onnx> --output-raw < input.txt

and a WAV-wrapped stream when invoked with `--output_file -` instead. Phase 3 of the voice-companion roadmap uses Piper as the all-local TTS so the Voice-Companion can talk without any cloud key.

The implementation does NOT bundle voice models. Operators run `scripts/prepare-piper-voices.ps1` once to download the desired voices into PiperOpts.VoiceDir. Default voice maps to <VoiceDir>/en_US-amy-medium.onnx; locale "de" looks up de_DE-thorsten-medium.

func NewPiper ¶ added in v0.37.8

func NewPiper(opts PiperOpts) (*Piper, error)

NewPiper validates the options and returns a ready Provider. The voice directory must already exist; missing voice models surface as a clear "voice file not found" error at Synthesize time, not at construction, so a missing de_DE voice does not prevent the provider from answering en_US requests.

func (*Piper) Health ¶ added in v0.37.8

func (p *Piper) Health(ctx context.Context) error

Health verifies the piper binary can be located. Voice-model presence is not checked here — operators may have a partial set installed and we still want /readyz to be green for the locales they DO have. Synthesize surfaces missing-voice errors per request.

func (*Piper) Kind ¶ added in v0.40.1

func (*Piper) Kind() models.ProviderKind

func (*Piper) Name ¶ added in v0.37.8

func (*Piper) Name() string

Name returns the provider identifier.

func (*Piper) Probe ¶ added in v0.37.8

func (p *Piper) Probe(ctx context.Context, voice, locale string) (*Result, error)

Probe runs piper with a short test phrase to verify the binary and the requested voice work end-to-end. Returns the synthesized WAV bytes (small — a few seconds of speech) on success. Used by the Settings UI to preview a voice without persisting any text.

func (*Piper) Synthesize ¶ added in v0.37.8

func (p *Piper) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

Synthesize runs the piper subprocess for one utterance. The returned Result.Audio is a complete RIFF/WAVE PCM blob suitable for direct playback or HTTP delivery.
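
One way to picture the subprocess call is the sketch below. It is an assumption-laden illustration, not the package's code: piperArgs and piperCommand are hypothetical helpers, the text is assumed to be fed on stdin, and only the `--model` and `--output_file -` flags and the 30 s default timeout come from this page.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
	"time"
)

// piperArgs builds the argument list for a WAV-on-stdout invocation.
// Hypothetical helper; the flags are the ones named in the Piper type docs.
func piperArgs(voiceDir, voiceFile string) []string {
	return []string{"--model", filepath.Join(voiceDir, voiceFile), "--output_file", "-"}
}

// piperCommand prepares (but does not run) the subprocess. The caller must
// invoke the returned cancel function to release the timeout context.
func piperCommand(ctx context.Context, binary, voiceDir, voiceFile, text string, timeout time.Duration) (*exec.Cmd, context.CancelFunc) {
	if binary == "" {
		binary = "piper" // documented default: resolved via $PATH
	}
	if timeout == 0 {
		timeout = 30 * time.Second // documented default
	}
	ctx, cancel := context.WithTimeout(ctx, timeout)
	cmd := exec.CommandContext(ctx, binary, piperArgs(voiceDir, voiceFile)...)
	cmd.Stdin = strings.NewReader(text) // assumption: text is piped on stdin
	return cmd, cancel
}

func main() {
	cmd, cancel := piperCommand(context.Background(), "", "voices", "en_US-amy-medium.onnx", "Hello from Piper.", 0)
	defer cancel()
	fmt.Println(cmd.Args)
	// With piper installed, cmd.Output() would return the WAV bytes.
}
```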

type PiperOpts ¶ added in v0.37.8

type PiperOpts struct {
	// Binary is the absolute or PATH-relative `piper` executable.
	// Empty defaults to "piper" (must be on $PATH).
	Binary string

	// VoiceDir is the filesystem root containing the .onnx voice
	// model files. Required.
	VoiceDir string

	// DefaultVoices maps a locale code (e.g. "en", "de") to a
	// voice-model filename inside VoiceDir. Falls back to the en_US
	// Amy medium voice when an entry is missing.
	DefaultVoices map[string]string

	// Timeout caps the subprocess execution. Zero defaults to 30 s
	// — generous because Piper warm-load on CPU can take several
	// seconds for a first synthesis.
	Timeout time.Duration
}

PiperOpts configures the local Piper subprocess.
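
The DefaultVoices lookup with its documented fallback might look roughly like the following. This is a sketch under stated assumptions: resolveVoiceFile is a hypothetical helper, and reducing "de-DE" to the short code "de" before the lookup is an assumed normalisation that this page does not spell out.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// fallbackVoice is the documented default: the en_US Amy medium voice.
const fallbackVoice = "en_US-amy-medium.onnx"

// resolveVoiceFile picks a voice-model filename for a locale.
// Hypothetical helper, not part of the package API.
func resolveVoiceFile(defaults map[string]string, locale string) string {
	// Assumed normalisation: "de-DE" or "de_DE" becomes "de".
	short := strings.ToLower(locale)
	if i := strings.IndexAny(short, "-_"); i >= 0 {
		short = short[:i]
	}
	if name := defaults[short]; name != "" {
		return name
	}
	return fallbackVoice
}

func main() {
	defaults := map[string]string{"de": "de_DE-thorsten-medium.onnx"}
	for _, locale := range []string{"de-DE", "en-US", "fr"} {
		path := filepath.Join("voices", resolveVoiceFile(defaults, locale))
		fmt.Println(locale, "->", path)
	}
}
```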

type PiperVoiceInfo ¶ added in v0.37.8

type PiperVoiceInfo struct {
	Filename string `json:"filename"`
	Locale   string `json:"locale"`  // short code, e.g. "en", "de"
	Region   string `json:"region"`  // e.g. "US", "DE"; empty when missing
	Name     string `json:"name"`    // e.g. "amy", "thorsten"
	Quality  string `json:"quality"` // e.g. "low", "medium", "high"
	SizeKB   int    `json:"size_kb"`
}

PiperVoiceInfo is one entry from a voice-directory scan. Filename is the basename inside VoiceDir (e.g. "en_US-amy-medium.onnx"). The remaining fields are best-effort parses of the rhasspy/piper-voices naming convention <lang>_<REGION>-<name>-<quality>.onnx; when the filename does not follow that pattern only Filename is populated and the other fields remain empty.
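
The best-effort filename parse can be sketched as below. The voiceInfo type and parseVoiceFilename function are hypothetical stand-ins, and the rule that a conforming name has exactly three hyphen-separated parts is an assumption drawn from the <lang>_<REGION>-<name>-<quality>.onnx pattern.

```go
package main

import (
	"fmt"
	"strings"
)

// voiceInfo is a trimmed, hypothetical stand-in for PiperVoiceInfo.
type voiceInfo struct {
	Filename, Locale, Region, Name, Quality string
}

// parseVoiceFilename fills the extra fields only when the filename follows
// <lang>_<REGION>-<name>-<quality>.onnx; otherwise only Filename is set.
func parseVoiceFilename(filename string) voiceInfo {
	info := voiceInfo{Filename: filename}
	if !strings.HasSuffix(filename, ".onnx") {
		return info
	}
	parts := strings.Split(strings.TrimSuffix(filename, ".onnx"), "-")
	if len(parts) != 3 || parts[1] == "" || parts[2] == "" {
		return info
	}
	// Region stays empty when the language part has no underscore.
	lang, region, _ := strings.Cut(parts[0], "_")
	if lang == "" {
		return info
	}
	info.Locale, info.Region = lang, region
	info.Name, info.Quality = parts[1], parts[2]
	return info
}

func main() {
	fmt.Printf("%+v\n", parseVoiceFilename("en_US-amy-medium.onnx"))
	fmt.Printf("%+v\n", parseVoiceFilename("custom.onnx"))
}
```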

func ListPiperVoices ¶ added in v0.37.8

func ListPiperVoices(voiceDir string) ([]PiperVoiceInfo, error)

ListPiperVoices scans voiceDir for *.onnx files and parses each filename using the piper-voices naming convention. The list is sorted by Filename so callers can present a stable UI. Returns an empty slice (not an error) when voiceDir is empty or missing — the UI surfaces "no voices installed" rather than a hard error.
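
The "empty slice rather than error" behaviour can be illustrated with a small directory scan. listVoiceFiles below is a hypothetical helper that returns bare filenames; it only mirrors the documented contract and is not the package's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sort"
	"strings"
)

// listVoiceFiles returns the sorted *.onnx filenames in voiceDir.
// An empty or missing directory yields an empty, non-nil slice and no error.
func listVoiceFiles(voiceDir string) ([]string, error) {
	files := []string{}
	if voiceDir == "" {
		return files, nil
	}
	entries, err := os.ReadDir(voiceDir)
	if err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return files, nil // "no voices installed", not a hard error
		}
		return nil, err
	}
	for _, e := range entries {
		if !e.IsDir() && strings.HasSuffix(e.Name(), ".onnx") {
			files = append(files, e.Name())
		}
	}
	sort.Strings(files) // stable order for UI presentation
	return files, nil
}

func main() {
	files, err := listVoiceFiles("voices")
	fmt.Println(len(files), err)
}
```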

type Provider ¶

type Provider interface {
	// Synthesize converts text to audio bytes.
	Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

	// Name returns the provider identifier (e.g. "openai", "google", "kokoro").
	Name() string

	// Kind returns the provider deployment class used by routing strategies.
	Kind() models.ProviderKind

	// Health checks if the provider is reachable and ready.
	Health(ctx context.Context) error
}

Provider defines the interface for all text-to-speech backends.

func OrderByPreferredProvider ¶ added in v0.37.8

func OrderByPreferredProvider(providers []Provider, preferred string) []Provider

OrderByPreferredProvider returns providers reordered so that the one whose Provider.Name() matches `preferred` comes first; the remaining providers retain their original relative order. Used by the bootstrap layer to honour the [model_selection.tts] PrimaryProfileID without rewriting the strategy/fallback logic. Empty preferred or no-match returns the input slice unchanged.
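
The reordering rule is small enough to sketch in full. The named interface, the stub type, and orderByPreferred are hypothetical stand-ins that reduce Provider to its Name method; the logic follows the description above rather than the package source.

```go
package main

import "fmt"

// named is a hypothetical reduction of Provider to the one method needed.
type named interface{ Name() string }

// stub is a test double whose name is its own string value.
type stub string

func (s stub) Name() string { return string(s) }

// orderByPreferred moves the provider matching preferred to the front and
// keeps the relative order of the rest. Empty preferred or no match returns
// the input slice unchanged.
func orderByPreferred(providers []named, preferred string) []named {
	if preferred == "" {
		return providers
	}
	for i, p := range providers {
		if p.Name() == preferred {
			out := make([]named, 0, len(providers))
			out = append(out, p)
			out = append(out, providers[:i]...)
			out = append(out, providers[i+1:]...)
			return out
		}
	}
	return providers
}

func main() {
	in := []named{stub("openai"), stub("google"), stub("piper")}
	for _, p := range orderByPreferred(in, "piper") {
		fmt.Println(p.Name())
	}
}
```

Building a fresh slice leaves the caller's input untouched, which matters if the same provider list is shared elsewhere.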

type Result ¶

type Result struct {
	Audio      []byte
	Format     string        // Actual format of the audio data
	SampleRate int           // Sample rate in Hz (e.g. 24000)
	Duration   time.Duration // Estimated duration of the audio
	Provider   string
	Voice      string
}

Result holds the output of a TTS synthesis.

type Router ¶

type Router struct {
	// contains filtered or unexported fields
}

Router selects and falls back between TTS providers.

func NewRouter ¶

func NewRouter(strategy Strategy, providers ...Provider) *Router

NewRouter creates a TTS router with the given strategy and providers. Providers are tried in order according to the strategy.

func (*Router) CloseIdleConnections ¶ added in v0.40.1

func (r *Router) CloseIdleConnections()

CloseIdleConnections asks HTTP-backed providers to drop idle connection pools. It is safe to call when a router is no longer referenced by any active mode runtime.

func (*Router) HealthCheck ¶

func (r *Router) HealthCheck(ctx context.Context) map[string]error

HealthCheck returns health status for all providers.
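
A plausible shape for the returned map is sketched below. It assumes, without confirmation from this page, that the map is keyed by Provider.Name() and that a nil error means the provider is healthy. The healthChecker interface, stubProvider type, and both functions are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// healthChecker is a hypothetical reduction of Provider.
type healthChecker interface {
	Name() string
	Health(ctx context.Context) error
}

// stubProvider reports a fixed health result.
type stubProvider struct {
	name string
	err  error
}

func (s stubProvider) Name() string                 { return s.name }
func (s stubProvider) Health(context.Context) error { return s.err }

// healthCheck asks every provider for its status.
// Assumption: keys are provider names, nil means healthy.
func healthCheck(ctx context.Context, providers []healthChecker) map[string]error {
	status := make(map[string]error, len(providers))
	for _, p := range providers {
		status[p.Name()] = p.Health(ctx)
	}
	return status
}

// demoHealth wires two stubs together for illustration.
func demoHealth() map[string]error {
	return healthCheck(context.Background(), []healthChecker{
		stubProvider{name: "openai"},
		stubProvider{name: "piper", err: errors.New("piper binary not found")},
	})
}

func main() {
	for name, err := range demoHealth() {
		fmt.Printf("%s: %v\n", name, err)
	}
}
```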

func (*Router) SetProviders ¶

func (r *Router) SetProviders(providers ...Provider)

SetProviders replaces the provider list (thread-safe for runtime reconfiguration).

func (*Router) Synthesize ¶

func (r *Router) Synthesize(ctx context.Context, text string, opts SynthesizeOpts) (*Result, error)

Synthesize tries each provider in order until one succeeds.
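
The try-in-order behaviour can be sketched as a simple loop. Everything here is a hypothetical stand-in (result, synthesizer, stub, synthesizeWithFallback), the options argument is dropped for brevity, and the choice to report only the last error is an assumption rather than documented behaviour.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// result is a trimmed, hypothetical stand-in for Result.
type result struct {
	Audio    []byte
	Provider string
}

// synthesizer is a hypothetical reduction of Provider.
type synthesizer interface {
	Name() string
	Synthesize(ctx context.Context, text string) (*result, error)
}

// stub either fails or echoes the text back as "audio".
type stub struct {
	name string
	fail bool
}

func (s stub) Name() string { return s.name }

func (s stub) Synthesize(_ context.Context, text string) (*result, error) {
	if s.fail {
		return nil, errors.New("unavailable")
	}
	return &result{Audio: []byte(text), Provider: s.name}, nil
}

// synthesizeWithFallback returns the first successful result.
func synthesizeWithFallback(ctx context.Context, providers []synthesizer, text string) (*result, error) {
	var lastErr error
	for _, p := range providers {
		if err := ctx.Err(); err != nil {
			return nil, err // stop early once the request is cancelled
		}
		res, err := p.Synthesize(ctx, text)
		if err == nil {
			return res, nil
		}
		lastErr = fmt.Errorf("%s: %w", p.Name(), err)
	}
	if lastErr == nil {
		return nil, errors.New("no TTS providers configured")
	}
	return nil, fmt.Errorf("all providers failed, last error: %w", lastErr)
}

// demoFallback runs a failing provider followed by a working one.
func demoFallback() (string, error) {
	providers := []synthesizer{stub{name: "openai", fail: true}, stub{name: "google"}}
	res, err := synthesizeWithFallback(context.Background(), providers, "hello")
	if err != nil {
		return "", err
	}
	return res.Provider, nil
}

func main() {
	fmt.Println(demoFallback())
}
```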

type Strategy ¶

type Strategy string

Strategy determines how the TTS router selects a provider.

const (
	StrategyCloudFirst Strategy = "cloud-first" // Default: try cloud providers first
	StrategyLocalFirst Strategy = "local-first" // Try local (Kokoro) first, cloud fallback
	StrategyCloudOnly  Strategy = "cloud-only"
	StrategyLocalOnly  Strategy = "local-only"
)
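
How a strategy might turn into a provider order is sketched below. This is a guess at the semantics, not the package's logic: the kind values "cloud" and "local" are assumed (models.ProviderKind is not shown on this page), and entry, strategy, and orderFor are hypothetical names.

```go
package main

import "fmt"

// strategy mirrors the documented Strategy string values.
type strategy string

const (
	cloudFirst strategy = "cloud-first"
	localFirst strategy = "local-first"
	cloudOnly  strategy = "cloud-only"
	localOnly  strategy = "local-only"
)

// kind is an assumed two-value stand-in for models.ProviderKind.
type kind string

const (
	kindCloud kind = "cloud"
	kindLocal kind = "local"
)

// entry pairs a provider name with its assumed deployment class.
type entry struct {
	Name string
	Kind kind
}

// orderFor partitions providers by kind, keeping their relative order, and
// arranges or filters the partitions according to the strategy.
func orderFor(s strategy, providers []entry) []entry {
	var cloud, local []entry
	for _, p := range providers {
		if p.Kind == kindLocal {
			local = append(local, p)
		} else {
			cloud = append(cloud, p)
		}
	}
	switch s {
	case localFirst:
		return append(local, cloud...)
	case cloudOnly:
		return cloud
	case localOnly:
		return local
	default: // cloud-first is the documented default
		return append(cloud, local...)
	}
}

func main() {
	providers := []entry{{"openai", kindCloud}, {"piper", kindLocal}, {"google", kindCloud}}
	for _, s := range []strategy{cloudFirst, localFirst, cloudOnly, localOnly} {
		fmt.Println(s, orderFor(s, providers))
	}
}
```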

type SynthesizeOpts ¶

type SynthesizeOpts struct {
	Locale string  // "de-DE", "en-US", "auto"
	Voice  string  // Provider-specific voice ID; empty = default
	Speed  float64 // 0.25 - 4.0, default 1.0
	Format string  // "wav", "mp3", "opus", "pcm"; default "mp3"
}

SynthesizeOpts configures a single TTS request.
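
The field comments imply defaults that a caller or provider could apply before synthesis. The sketch below uses a hypothetical local copy of the struct and a hypothetical withDefaults helper; clamping out-of-range speeds to the 0.25 to 4.0 bounds is an assumption, since this page states the range but not what happens outside it.

```go
package main

import "fmt"

// synthesizeOpts is a local, hypothetical copy of SynthesizeOpts.
type synthesizeOpts struct {
	Locale string
	Voice  string
	Speed  float64
	Format string
}

// withDefaults fills zero values with the documented defaults.
// Assumption: speeds outside 0.25-4.0 are clamped rather than rejected.
func withDefaults(o synthesizeOpts) synthesizeOpts {
	if o.Speed == 0 {
		o.Speed = 1.0
	}
	if o.Speed < 0.25 {
		o.Speed = 0.25
	}
	if o.Speed > 4.0 {
		o.Speed = 4.0
	}
	if o.Format == "" {
		o.Format = "mp3"
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", withDefaults(synthesizeOpts{Locale: "de-DE"}))
	fmt.Printf("%+v\n", withDefaults(synthesizeOpts{Speed: 9, Format: "wav"}))
}
```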
