Documentation ¶
Overview ¶
Package pipeline provides components for connecting voice processing stages.
The pipeline package connects STT, LLM, and TTS providers to transport connections and takes care of audio streaming, buffering, and error reporting.
Variables ¶
var (
	// ErrPipelineActive is returned when trying to start a pipeline that is already active.
	ErrPipelineActive = errors.New("pipeline is already active")
)
Types ¶
type STTPipeline ¶
type STTPipeline struct {
	// contains filtered or unexported fields
}
STTPipeline connects a transport connection's audio output to an STT provider. It handles streaming audio from the caller to the STT service and forwards transcripts.
func NewSTTPipeline ¶
func NewSTTPipeline(provider stt.StreamingProvider, config STTPipelineConfig) *STTPipeline
NewSTTPipeline creates a new STT pipeline.
func (*STTPipeline) IsActive ¶
func (p *STTPipeline) IsActive() bool
IsActive returns whether the pipeline is currently transcribing.
func (*STTPipeline) StartFromConnection ¶
func (p *STTPipeline) StartFromConnection(ctx context.Context, conn transport.Connection) error
StartFromConnection starts transcribing audio from the connection. The call is non-blocking: transcription runs in background goroutines. Call Stop() to end transcription.
type STTPipelineConfig ¶
type STTPipelineConfig struct {
	// Model is the STT model to use (provider-specific).
	Model string
	// Language is the BCP-47 language code (e.g., "en-US").
	Language string
	// Encoding is the audio encoding (e.g., "mulaw", "pcm").
	// Use "mulaw" for Twilio Media Streams.
	Encoding string
	// SampleRate is the audio sample rate.
	// Use 8000 for telephony (mu-law).
	SampleRate int
	// Channels is the number of audio channels (1 = mono).
	Channels int
	// OnTranscript is called when a transcript is received.
	// The bool indicates if it's a final (non-interim) result.
	OnTranscript func(transcript string, isFinal bool)
	// OnError is called when an error occurs during streaming.
	OnError func(error)
	// OnSpeechStart is called when speech is detected.
	OnSpeechStart func()
	// OnSpeechEnd is called when speech ends (utterance complete).
	OnSpeechEnd func()
}
STTPipelineConfig configures the STT pipeline.
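To illustrate how the callback fields might be wired up, the sketch below copies the documented fields into a local sttConfig type so it compiles on its own. transcriptLog and newTelephonyConfig are hypothetical helpers. The mutex assumes callbacks may be invoked from a background goroutine, which the documentation suggests but does not guarantee.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// sttConfig copies the documented STTPipelineConfig fields so this sketch
// builds without the real package.
type sttConfig struct {
	Model         string
	Language      string
	Encoding      string
	SampleRate    int
	Channels      int
	OnTranscript  func(transcript string, isFinal bool)
	OnError       func(error)
	OnSpeechStart func()
	OnSpeechEnd   func()
}

// transcriptLog (hypothetical) collects final transcripts. Access is guarded
// by a mutex on the assumption that callbacks run on another goroutine.
type transcriptLog struct {
	mu     sync.Mutex
	finals []string
}

// add stores final results and ignores interim ones.
func (l *transcriptLog) add(text string, isFinal bool) {
	if !isFinal {
		return
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	l.finals = append(l.finals, text)
}

// String joins the collected final transcripts with spaces.
func (l *transcriptLog) String() string {
	l.mu.Lock()
	defer l.mu.Unlock()
	return strings.Join(l.finals, " ")
}

// newTelephonyConfig (hypothetical) fills in the telephony values suggested
// by the field comments: mu-law, 8000 Hz, mono.
func newTelephonyConfig(log *transcriptLog) sttConfig {
	return sttConfig{
		Language:     "en-US",
		Encoding:     "mulaw",
		SampleRate:   8000,
		Channels:     1,
		OnTranscript: log.add,
		OnError:      func(err error) { fmt.Println("stt error:", err) },
	}
}

func main() {
	log := &transcriptLog{}
	cfg := newTelephonyConfig(log)
	// Simulate what a provider might deliver: one interim, two final results.
	cfg.OnTranscript("hel", false)
	cfg.OnTranscript("hello there", true)
	cfg.OnTranscript("goodbye", true)
	fmt.Println(log.String())
}
```

With the real package, starting from DefaultSTTConfig() and overriding only the callbacks would likely be the shorter route, since its defaults already target telephony.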
func DefaultSTTConfig ¶
func DefaultSTTConfig() STTPipelineConfig
DefaultSTTConfig returns sensible defaults for telephony.
type StreamingTTSPipeline ¶
type StreamingTTSPipeline struct {
	*TTSPipeline
	// contains filtered or unexported fields
}
StreamingTTSPipeline extends TTSPipeline with input streaming support. It pipes text from an io.Reader (e.g., streaming LLM output) through the TTS provider to a transport connection.
func NewStreamingTTSPipeline ¶
func NewStreamingTTSPipeline(provider tts.StreamingProvider, config TTSPipelineConfig) *StreamingTTSPipeline
NewStreamingTTSPipeline creates a pipeline that accepts streaming text input.
func (*StreamingTTSPipeline) StreamToConnection ¶
func (p *StreamingTTSPipeline) StreamToConnection(ctx context.Context, textReader io.Reader, conn transport.Connection) error
StreamToConnection reads text from a reader and streams synthesized audio to the connection. This enables low-latency streaming from LLM output to voice output.
type TTSPipeline ¶
type TTSPipeline struct {
	// contains filtered or unexported fields
}
TTSPipeline connects a TTS provider's streaming output to a transport connection. It handles buffering, error handling, and graceful shutdown.
func NewTTSPipeline ¶
func NewTTSPipeline(provider tts.Provider, config TTSPipelineConfig) *TTSPipeline
NewTTSPipeline creates a new TTS pipeline.
func (*TTSPipeline) IsActive ¶
func (p *TTSPipeline) IsActive() bool
IsActive returns whether the pipeline is currently synthesizing.
func (*TTSPipeline) SynthesizeToConnection ¶
func (p *TTSPipeline) SynthesizeToConnection(ctx context.Context, text string, conn transport.Connection) error
SynthesizeToConnection synthesizes text and streams audio to the connection. The call is non-blocking: synthesis runs in a background goroutine.
type TTSPipelineConfig ¶
type TTSPipelineConfig struct {
	// VoiceID is the voice to use for synthesis.
	VoiceID string
	// OutputFormat is the audio format (e.g., "ulaw", "pcm").
	// Use "ulaw" for Twilio Media Streams.
	OutputFormat string
	// SampleRate is the audio sample rate.
	// Use 8000 for telephony (mu-law).
	SampleRate int
	// Model is the TTS model to use (provider-specific).
	Model string
	// OnError is called when an error occurs during streaming.
	OnError func(error)
	// OnComplete is called when synthesis completes.
	OnComplete func()
}
TTSPipelineConfig configures the TTS pipeline.
func DefaultTTSConfig ¶
func DefaultTTSConfig() TTSPipelineConfig
DefaultTTSConfig returns sensible defaults for telephony.