Documentation
Overview ¶
Package tts provides a unified interface for Text-to-Speech providers.
Index ¶
- Variables
- type Client
- func (c *Client) Provider(name string) (Provider, bool)
- func (c *Client) SetFallbacks(names ...string)
- func (c *Client) SetPrimary(name string)
- func (c *Client) Synthesize(ctx context.Context, text string, config SynthesisConfig) (*SynthesisResult, error)
- func (c *Client) SynthesizeStream(ctx context.Context, text string, config SynthesisConfig) (<-chan StreamChunk, error)
- type Provider
- type StreamChunk
- type StreamingProvider
- type SynthesisConfig
- type SynthesisResult
- type Voice
Constants ¶
This section is empty.
Variables ¶
var (
// ErrNoAvailableProvider is returned when no provider is available.
ErrNoAvailableProvider = errors.New("tts: no available provider")
// ErrVoiceNotFound is returned when a voice ID is not found.
ErrVoiceNotFound = errors.New("tts: voice not found")
// ErrInvalidConfig is returned when the synthesis config is invalid.
ErrInvalidConfig = errors.New("tts: invalid configuration")
// ErrRateLimited is returned when the provider rate limits the request.
ErrRateLimited = errors.New("tts: rate limited")
// ErrQuotaExceeded is returned when the provider quota is exceeded.
ErrQuotaExceeded = errors.New("tts: quota exceeded")
// ErrStreamClosed is returned when attempting to use a closed stream.
ErrStreamClosed = errors.New("tts: stream closed")
)
Functions ¶
This section is empty.
Types ¶
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
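Client provides a unified interface across multiple TTS providers.
func (*Client) Provider ¶
func (c *Client) Provider(name string) (Provider, bool)
Provider looks up a provider by name; the boolean reports whether one was found.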
func (*Client) SetFallbacks ¶
func (c *Client) SetFallbacks(names ...string)
SetFallbacks sets the order in which fallback providers are tried, by name.
func (*Client) SetPrimary ¶
func (c *Client) SetPrimary(name string)
SetPrimary sets the primary provider by name.
func (*Client) Synthesize ¶
func (c *Client) Synthesize(ctx context.Context, text string, config SynthesisConfig) (*SynthesisResult, error)
Synthesize converts text to speech using the primary provider, automatically falling back to the providers set by SetFallbacks, in order.
func (*Client) SynthesizeStream ¶
func (c *Client) SynthesizeStream(ctx context.Context, text string, config SynthesisConfig) (<-chan StreamChunk, error)
SynthesizeStream streams synthesized audio from the primary provider, with the same automatic fallback to the providers set by SetFallbacks.
type Provider ¶
type Provider interface {
// Name returns the provider name.
Name() string
// Synthesize converts text to speech and returns audio data.
Synthesize(ctx context.Context, text string, config SynthesisConfig) (*SynthesisResult, error)
// SynthesizeStream converts text to speech with streaming output.
SynthesizeStream(ctx context.Context, text string, config SynthesisConfig) (<-chan StreamChunk, error)
// ListVoices returns available voices from this provider.
ListVoices(ctx context.Context) ([]Voice, error)
// GetVoice returns a specific voice by ID.
GetVoice(ctx context.Context, voiceID string) (*Voice, error)
}
Provider defines the interface for TTS providers.
type StreamChunk ¶
type StreamChunk struct {
// Audio is a chunk of audio data.
Audio []byte
// IsFinal indicates whether this is the last chunk.
IsFinal bool
// Error contains any error that occurred during streaming.
Error error
}
StreamChunk represents a chunk of streaming audio.
type StreamingProvider ¶
type StreamingProvider interface {
Provider
// SynthesizeFromReader reads text from a reader and streams audio output.
// Useful for streaming LLM output directly to TTS.
SynthesizeFromReader(ctx context.Context, reader io.Reader, config SynthesisConfig) (<-chan StreamChunk, error)
}
StreamingProvider extends Provider with input streaming support.
type SynthesisConfig ¶
type SynthesisConfig struct {
// VoiceID is the voice to use for synthesis.
VoiceID string
// Model is the provider-specific model identifier (optional).
Model string
// OutputFormat specifies the audio format ("mp3", "pcm", "wav", "opus").
OutputFormat string
// SampleRate is the audio sample rate in Hz (e.g., 22050, 44100).
SampleRate int
// Speed is the speech speed multiplier (1.0 = normal).
Speed float64
// Pitch adjusts the voice pitch (-1.0 to 1.0, 0 = normal).
Pitch float64
// Stability controls voice consistency (0.0 to 1.0, provider-specific).
Stability float64
// SimilarityBoost enhances voice similarity (0.0 to 1.0, provider-specific).
SimilarityBoost float64
}
SynthesisConfig configures a TTS synthesis request.
type SynthesisResult ¶
type SynthesisResult struct {
// Audio is the synthesized audio data.
Audio []byte
// Format is the audio format of the result.
Format string
// SampleRate is the sample rate of the audio in Hz.
SampleRate int
// DurationMs is the duration of the audio in milliseconds.
DurationMs int
// CharacterCount is the number of characters processed.
CharacterCount int
}
SynthesisResult contains the result of a TTS synthesis.
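For uncompressed PCM, DurationMs follows directly from the byte count and sample format: duration_ms = bytes * 1000 / (sampleRate * channels * bytesPerSample). pcmDurationMs is a hypothetical helper; the channel count and bit depth are assumptions, since SynthesisResult does not report them. Compressed formats such as mp3 or opus cannot be timed this way, and wav output would need its header excluded first.

```go
package main

import "fmt"

// pcmDurationMs (hypothetical) returns the playback length in milliseconds
// of raw interleaved PCM, truncating any fractional millisecond.
func pcmDurationMs(audioBytes, sampleRate, channels, bitsPerSample int) int {
	bytesPerSecond := sampleRate * channels * bitsPerSample / 8
	if bytesPerSecond == 0 {
		return 0 // unknown or invalid format
	}
	return audioBytes * 1000 / bytesPerSecond
}

func main() {
	// One second of 16-bit mono audio at 22050 Hz is 44100 bytes.
	fmt.Println(pcmDurationMs(44100, 22050, 1, 16)) // 1000
	fmt.Println(pcmDurationMs(22050, 22050, 1, 16)) // 500
}
```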
type Voice ¶
type Voice struct {
// ID is the provider-specific voice identifier.
ID string
// Name is a human-readable name for the voice.
Name string
// Language is the BCP-47 language code (e.g., "en-US").
Language string
// Gender is the voice gender ("male", "female", "neutral").
Gender string
// Provider is the name of the TTS provider.
Provider string
// Metadata contains provider-specific additional information.
Metadata map[string]any
}
Voice represents a voice configuration for TTS.
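Because Language is a BCP-47 tag, a caller can narrow a ListVoices result to one language. voicesForLanguage is a hypothetical helper that matches the requested tag itself or any more specific tag ("en" matches "en-US"), case-insensitively since BCP-47 tags are case-insensitive. For full matching with regional fallbacks, golang.org/x/text/language provides a Matcher.

```go
package main

import (
	"fmt"
	"strings"
)

// Trimmed copy of the package type, for a standalone build.
type Voice struct {
	ID, Name, Language, Gender, Provider string
}

// voicesForLanguage (hypothetical) keeps voices whose tag equals lang or
// extends it with further subtags; "en" never matches "enx-..." or "e".
func voicesForLanguage(voices []Voice, lang string) []Voice {
	want := strings.ToLower(lang)
	var out []Voice
	for _, v := range voices {
		tag := strings.ToLower(v.Language)
		if tag == want || strings.HasPrefix(tag, want+"-") {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	voices := []Voice{
		{ID: "1", Name: "A", Language: "en-US"},
		{ID: "2", Name: "B", Language: "en-GB"},
		{ID: "3", Name: "C", Language: "de-DE"},
	}
	for _, v := range voicesForLanguage(voices, "EN") {
		fmt.Println(v.ID, v.Language)
	}
	// Output:
	// 1 en-US
	// 2 en-GB
}
```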
Directories
| Path | Synopsis |
|---|---|
| providertest | Package providertest provides conformance tests for TTS provider implementations. |