services

package
v0.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 16, 2026 License: Apache-2.0 Imports: 32 Imported by: 0

README

Services layer

This package provides LLM, STT, TTS, and realtime service abstractions aligned with common LLM/STT/TTS service patterns. Use the factory and config.Config to construct implementations by provider name.

API interaction (streaming)

sequenceDiagram
    participant P as Pipeline
    participant C as Provider Client
    participant A as External API

    P->>C: Call(ctx, input)
    C->>A: HTTP POST (streaming)
    loop chunks
        P-->>C: Write(chunk)
        C-->>A: stream bytes
    end
    A-->>C: response stream
    C-->>P: Result
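
The response half of the diagram above (the external API streaming chunks back to the provider client) can be sketched as follows. This is a minimal illustration and not the package's actual client: `streamChunks` and `collect` are hypothetical names, the local `httptest` server stands in for the external API, and the one-line-per-chunk wire format is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// streamChunks POSTs body to url and calls onChunk once per line of the
// streamed response. Hypothetical helper; the line-based format is assumed.
func streamChunks(url, body string, onChunk func(string)) error {
	resp, err := http.Post(url, "text/plain", strings.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		onChunk(sc.Text())
	}
	return sc.Err()
}

// collect starts a fake API that streams the request body back one word per
// line, then returns the chunks the client received.
func collect(input string) ([]string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		flusher, _ := w.(http.Flusher)
		for _, word := range strings.Fields(string(body)) {
			fmt.Fprintln(w, word)
			if flusher != nil {
				flusher.Flush() // push each chunk to the client immediately
			}
		}
	}))
	defer srv.Close()

	var got []string
	err := streamChunks(srv.URL, input, func(s string) { got = append(got, s) })
	return got, err
}

func main() {
	chunks, err := collect("hello streaming world")
	if err != nil {
		panic(err)
	}
	fmt.Println(chunks)
}
```

The callback plays the role of the `Result` arrow back to the pipeline; a real provider client would typically parse provider-specific events rather than plain lines.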

Provider registry

graph TD
    Factory["Factory\nNew*FromConfig"] --> LLM["LLM Interface"]
    Factory --> STT["STT Interface"]
    Factory --> TTS["TTS Interface"]
    LLM --> OpenAI_LLM["openai"]
    LLM --> Groq_LLM["groq"]
    LLM --> Anthropic["anthropic"]
    LLM --> Mistral["mistral"]
    STT --> OpenAI_STT["openai"]
    STT --> Groq_STT["groq"]
    STT --> Sarvam_STT["sarvam"]
    TTS --> OpenAI_TTS["openai"]
    TTS --> Sarvam_TTS["sarvam"]
    TTS --> Groq_TTS["groq"]

Interfaces

  • LLMService — chat completion with optional streaming (Chat(ctx, messages, onToken)).
  • STTService — transcription (Transcribe(ctx, audio, sampleRate, numChannels)). Optional STTStreamingService adds TranscribeStream.
  • TTSService — text-to-speech (Speak(ctx, text, sampleRate)). Optional TTSStreamingService adds SpeakStream.
  • RealtimeService — creates RealtimeSession (SendText, SendAudio, Events, Close). Use realtime.NewFromConfig(cfg, provider) to construct (lives in pkg/realtime to avoid import cycles).
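
Because the streaming interfaces are optional, a caller can probe for them with a type assertion and fall back to the batch method. The sketch below uses local stand-in types so that it is self-contained: `audioFrame`, `ttsService`, `ttsStreamingService`, `batchOnly`, `streaming`, `speak`, and `demo` are hypothetical names, and closing `outCh` after `SpeakStream` returns is an assumption about ownership rather than documented behavior.

```go
package main

import (
	"context"
	"fmt"
)

// audioFrame is a hypothetical stand-in for frames.TTSAudioRawFrame.
type audioFrame struct{ PCM []byte }

// ttsService mirrors the shape of TTSService using the stand-in frame type.
type ttsService interface {
	Speak(ctx context.Context, text string, sampleRate int) ([]*audioFrame, error)
}

// ttsStreamingService mirrors the shape of TTSStreamingService.
type ttsStreamingService interface {
	ttsService
	SpeakStream(ctx context.Context, text string, sampleRate int, outCh chan<- *audioFrame)
}

// batchOnly implements only Speak.
type batchOnly struct{}

func (batchOnly) Speak(_ context.Context, text string, _ int) ([]*audioFrame, error) {
	return []*audioFrame{{PCM: []byte(text)}}, nil
}

// streaming implements Speak (via embedding) and SpeakStream.
type streaming struct{ batchOnly }

func (streaming) SpeakStream(ctx context.Context, text string, _ int, outCh chan<- *audioFrame) {
	for i := 0; i < len(text); i++ {
		select {
		case outCh <- &audioFrame{PCM: []byte{text[i]}}:
		case <-ctx.Done():
			return
		}
	}
}

// speak prefers SpeakStream when the service supports it, otherwise falls
// back to Speak. It returns the number of frames produced.
func speak(ctx context.Context, svc ttsService, text string) (int, error) {
	if s, ok := svc.(ttsStreamingService); ok {
		outCh := make(chan *audioFrame)
		go func() {
			defer close(outCh) // assumption: the caller owns and closes outCh
			s.SpeakStream(ctx, text, 16000, outCh)
		}()
		n := 0
		for range outCh {
			n++
		}
		return n, nil
	}
	frames, err := svc.Speak(ctx, text, 16000)
	return len(frames), err
}

// demo runs both paths and returns the frame counts.
func demo() (batchFrames, streamFrames int) {
	ctx := context.Background()
	batchFrames, _ = speak(ctx, batchOnly{}, "hi")
	streamFrames, _ = speak(ctx, streaming{}, "hi")
	return
}

func main() {
	fmt.Println(demo())
}
```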

Supported providers (Go implementation)

These are the providers currently implemented in this Go port.

Provider LLM STT TTS Realtime
openai
groq
sarvam
grok
cerebras
elevenlabs
aws
mistral
deepseek
ollama
qwen
whisper
asyncai
camb
fish
gradium
hume ✓ (stub)
inworld ✓ (stub)
minimax
moondream
neuphonic
openpipe
soniox
xtts

Constants: ProviderOpenAI, ProviderGroq, ProviderSarvam, ProviderGrok, ProviderCerebras, ProviderElevenLabs, ProviderAWS, ProviderMistral, ProviderDeepSeek, ProviderOllama, ProviderQwen, ProviderWhisper, ProviderAsyncAI, ProviderCamb, ProviderFish, ProviderGradium, ProviderHume, ProviderInworld, ProviderMinimax, ProviderMoondream, ProviderNeuphonic, ProviderOpenPipe, ProviderSoniox, ProviderXTTS. Realtime: SupportedRealtimeProviders ("openai", "hume", "inworld").

Upstream providers and Go coverage

The upstream Python services expose many more providers. The table below inventories those providers by capability and indicates whether they currently have a Go implementation in this repository.

Legend:

  • — capability provided by the upstream Python services.
  • — capability not provided (or not primary) for that provider.
  • Go — whether this capability is implemented in the Go services layer.
Provider Upstream LLM Upstream STT Upstream TTS Upstream Realtime Go LLM Go STT Go TTS Go Realtime
anthropic
assemblyai
asyncai
aws
aws_nova_sonic
azure
camb
cartesia
cerebras
deepgram
deepseek
elevenlabs
fal
fireworks
fish
gemini_multimodal_live
gladia
google
gradium
grok
groq
hathora
heygen
hume ✓ (stub)
inworld ✓ (stub)
kokoro
lmnt
mem0
minimax
mistral
moondream
neuphonic
nim
nvidia
ollama
openai
openai_realtime
openai_realtime_beta
openpipe
openrouter
perplexity
piper
qwen
resembleai
rime
riva
sambanova
sarvam
simli
soniox
speechmatics
tavus
together
ultravox
whisper
xtts

Configuration

Use config.Config (JSON or env):

  • provider — default for all tasks.
  • stt_provider, llm_provider, tts_provider — override per task.
  • model — chat/LLM model (e.g. gpt-3.5-turbo, mistral-small-latest, deepseek-chat).
  • stt_model, tts_model, tts_voice — task-specific when supported.
  • api_keys — map of service name to API key; otherwise keys are read from environment.
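
As an illustration of the JSON form, the sketch below decodes a small config using the keys listed above. `sketchConfig` is a hypothetical stand-in for config.Config (its Go field names are assumptions, only the JSON keys come from the list), and the use of "openai" as an `api_keys` entry name is likewise assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sketchConfig is a hypothetical stand-in for config.Config. Only the JSON
// keys are taken from the README; the Go field names are illustrative.
type sketchConfig struct {
	Provider    string            `json:"provider"`
	STTProvider string            `json:"stt_provider"`
	LLMProvider string            `json:"llm_provider"`
	TTSProvider string            `json:"tts_provider"`
	Model       string            `json:"model"`
	STTModel    string            `json:"stt_model"`
	TTSModel    string            `json:"tts_model"`
	TTSVoice    string            `json:"tts_voice"`
	APIKeys     map[string]string `json:"api_keys"`
}

// exampleJSON sets a default provider and overrides only the STT provider.
const exampleJSON = `{
  "provider": "openai",
  "stt_provider": "groq",
  "model": "gpt-3.5-turbo",
  "api_keys": {"openai": "sk-placeholder"}
}`

// parseConfig decodes raw JSON into the stand-in struct.
func parseConfig(raw string) (sketchConfig, error) {
	var c sketchConfig
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	c, err := parseConfig(exampleJSON)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Provider, c.STTProvider, c.Model)
}
```
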
Environment variables (fallback when not in api_keys)
Provider Env var
openai OPENAI_API_KEY
groq GROQ_API_KEY
sarvam SARVAM_API_KEY
grok (xai) XAI_API_KEY
cerebras CEREBRAS_API_KEY
elevenlabs ELEVENLABS_API_KEY
aws AWS_SECRET_ACCESS_KEY, AWS_REGION (optional, default us-east-1)
mistral MISTRAL_API_KEY
deepseek DEEPSEEK_API_KEY
ollama OLLAMA_API_KEY (optional), OLLAMA_BASE_URL (optional, default http://localhost:11434/v1)
qwen DASHSCOPE_API_KEY or QWEN_API_KEY, DASHSCOPE_BASE_URL (optional)
whisper WHISPER_API_KEY or OPENAI_API_KEY, WHISPER_BASE_URL (optional)
asyncai ASYNC_AI_API_KEY, ASYNC_AI_BASE_URL (optional)
camb CAMB_API_KEY, CAMB_BASE_URL (optional)
fish FISH_API_KEY, FISH_BASE_URL (optional)
gradium GRADIUM_API_KEY, GRADIUM_BASE_URL (optional)
hume HUME_API_KEY
inworld INWORLD_API_KEY
minimax MINIMAX_API_KEY, MINIMAX_BASE_URL (optional)
moondream MOONDREAM_API_KEY, MOONDREAM_BASE_URL (optional)
neuphonic NEUPHONIC_API_KEY, NEUPHONIC_BASE_URL (optional)
openpipe OPENPIPE_API_KEY
soniox SONIOX_API_KEY, SONIOX_WS_URL (optional), SONIOX_MODEL (optional)
xtts XTTS_BASE_URL (optional, default http://localhost:8000 for local server)
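
The lookup order described above (api_keys first, environment second) can be sketched as below. `resolveAPIKey` is a hypothetical helper, not a function of this package; the order in which alternative variables such as DASHSCOPE_API_KEY and QWEN_API_KEY are tried is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// resolveAPIKey returns the key for provider from apiKeys when present,
// otherwise the first non-empty value among envVars as reported by lookup.
// Hypothetical helper; lookup is injectable so the logic is easy to test.
func resolveAPIKey(apiKeys map[string]string, provider string,
	lookup func(string) (string, bool), envVars ...string) string {
	if k := apiKeys[provider]; k != "" {
		return k
	}
	for _, name := range envVars {
		if v, ok := lookup(name); ok && v != "" {
			return v
		}
	}
	return ""
}

func main() {
	keys := map[string]string{"openai": "from-config"}

	// Found in api_keys, so the environment is not consulted.
	fmt.Println(resolveAPIKey(keys, "openai", os.LookupEnv, "OPENAI_API_KEY"))

	// Not in api_keys: fall back to the environment (two candidates for qwen).
	k := resolveAPIKey(keys, "qwen", os.LookupEnv, "DASHSCOPE_API_KEY", "QWEN_API_KEY")
	fmt.Println(k != "")
}
```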

Usage

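// Assuming the second return value of LoadConfig is an error:
cfg, err := config.LoadConfig("config.json")
if err != nil {
    log.Fatal(err) // do not discard the load error
}
// Or build manually (plain assignment, since cfg is already declared above):
cfg = &config.Config{
    LlmProvider: services.ProviderMistral,
    Model:       "mistral-small-latest",
}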

llm := services.NewLLMFromConfig(cfg, cfg.LLMProvider(), cfg.Model)
stt := services.NewSTTFromConfig(cfg, cfg.STTProvider())
tts := services.NewTTSFromConfig(cfg, cfg.TTSProvider(), cfg.TTSModel, cfg.TTSVoice)

// Realtime (e.g. OpenAI Realtime WebSocket API):
realtimeSvc, err := realtime.NewFromConfig(cfg, "openai")

One-shot construction for all three:

llm, stt, tts := services.NewServicesFromConfig(cfg)

Tests

  • tests/pkg/services/ — factory construction tests for all supported providers; Sarvam integration test (requires SARVAM_API_KEY).
  • tests/pkg/realtime/ — realtime.NewFromConfig for openai and unsupported provider.


Documentation

Overview

Package services defines interfaces and implementations for LLM, STT, and TTS. Use the factory functions (NewLLMFromConfig, NewSTTFromConfig, NewTTSFromConfig) to construct services by provider name; see the Supported*Providers variables for the capability matrix. For RealtimeService, use realtime.NewFromConfig(cfg, provider), which lives in a separate package to avoid an import cycle.

The interfaces align conceptually with common LLM/STT/TTS service abstractions and with websocket/realtime session handling. See pkg/services/factory.go for provider wiring.


Constants

const (
	ProviderOpenAI       = "openai"
	ProviderGroq         = "groq"
	ProviderSarvam       = "sarvam"
	ProviderGrok         = "grok"
	ProviderCerebras     = "cerebras"
	ProviderElevenLabs   = "elevenlabs"
	ProviderAWS          = "aws"
	ProviderMistral      = "mistral"
	ProviderDeepSeek     = "deepseek"
	ProviderAnthropic    = "anthropic"
	ProviderGoogle       = "google"
	ProviderGoogleVertex = "google_vertex"
	ProviderOllama       = "ollama"
	ProviderQwen         = "qwen"
	ProviderWhisper      = "whisper"
	// Pipecat-integrated providers
	ProviderAsyncAI   = "asyncai"
	ProviderCamb      = "camb"
	ProviderFish      = "fish"
	ProviderGradium   = "gradium"
	ProviderHume      = "hume"
	ProviderInworld   = "inworld"
	ProviderMinimax   = "minimax"
	ProviderMoondream = "moondream"
	ProviderNeuphonic = "neuphonic"
	ProviderOpenPipe  = "openpipe"
	ProviderSoniox    = "soniox"
	ProviderXTTS      = "xtts"
)

Variables

SupportedLLMProviders lists provider keys that can be passed to NewLLMFromConfig.

var SupportedRealtimeProviders = []string{ProviderOpenAI, ProviderHume, ProviderInworld}

SupportedRealtimeProviders lists provider keys for realtime (use realtime.NewFromConfig to construct).

SupportedSTTProviders lists provider keys that can be passed to NewSTTFromConfig.

SupportedTTSProviders lists provider keys that can be passed to NewTTSFromConfig.
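
A caller may want to validate a provider key against one of these lists before constructing a service. The sketch below copies the SupportedRealtimeProviders values into a local slice so that it runs standalone; `supportedRealtime` and `isSupportedRealtime` are hypothetical names.

```go
package main

import "fmt"

// supportedRealtime is a local copy of the values shown for
// SupportedRealtimeProviders, so the sketch compiles without the package.
var supportedRealtime = []string{"openai", "hume", "inworld"}

// isSupportedRealtime reports whether provider appears in the list.
func isSupportedRealtime(provider string) bool {
	for _, p := range supportedRealtime {
		if p == provider {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isSupportedRealtime("openai"))
	fmt.Println(isSupportedRealtime("groq"))
}
```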

Functions

func NewServicesFromConfig

func NewServicesFromConfig(cfg *config.Config) (LLMService, STTService, TTSService)

NewServicesFromConfig returns LLM, STT, and TTS services based on cfg. It resolves the provider for each task (stt_provider, llm_provider, or tts_provider, falling back to provider) and uses the task-specific model and voice when they are set.
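
The per-task fallback can be pictured with the following sketch. It is not the package's code: `taskProviders`, `pick`, and `resolve` are hypothetical names, and treating an empty string as "not set" is an assumption.

```go
package main

import "fmt"

// taskProviders holds hypothetical stand-ins for the provider-related
// config values (provider, stt_provider, llm_provider, tts_provider).
type taskProviders struct {
	Default, STT, LLM, TTS string
}

// pick returns override when it is non-empty, otherwise def.
func pick(override, def string) string {
	if override != "" {
		return override
	}
	return def
}

// resolve returns the effective provider for each task.
func resolve(p taskProviders) (llm, stt, tts string) {
	return pick(p.LLM, p.Default), pick(p.STT, p.Default), pick(p.TTS, p.Default)
}

func main() {
	llm, stt, tts := resolve(taskProviders{Default: "openai", STT: "groq"})
	fmt.Println(llm, stt, tts) // prints: openai groq openai
}
```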

Types

type LLMService

type LLMService = llmapi.LLMService

LLMService provides chat completion; may stream text frames. Re-exported from llmapi.

func NewLLMFromConfig

func NewLLMFromConfig(cfg *config.Config, provider, model string) LLMService

NewLLMFromConfig returns an LLMService for the given provider and model. Provider must be one of SupportedLLMProviders; model is the chat model (e.g. cfg.Model).

type LLMServiceWithTools

type LLMServiceWithTools = llmapi.LLMServiceWithTools

LLMServiceWithTools is an LLM service that supports registering tools. Re-exported from llmapi.

type RealtimeConfig

type RealtimeConfig struct {
	Provider string           // e.g. "openai"
	Model    string           // e.g. "gpt-4o-realtime" or regular chat model
	Voice    string           // TTS voice, if applicable
	Tools    []map[string]any // optional function calling tools
}

RealtimeConfig configures a realtime session for a given provider/model.

type RealtimeEvent

type RealtimeEvent struct {
	Text  *frames.LLMTextFrame
	Audio *frames.TTSAudioRawFrame
	Frame frames.Frame
}

RealtimeEvent represents a high-level event emitted by a realtime session. It can carry LLM text, TTS audio, or generic frames for extensibility.

type RealtimeService

type RealtimeService interface {
	NewSession(ctx context.Context, cfg RealtimeConfig) (RealtimeSession, error)
}

RealtimeService creates realtime sessions.

type RealtimeSession

type RealtimeSession interface {
	// SendText sends text input into the session (e.g. user message).
	SendText(ctx context.Context, text string) error
	// SendAudio sends raw audio input into the session (e.g. microphone audio).
	SendAudio(ctx context.Context, audio []byte, sampleRate, numChannels int) error
	// Events returns a channel of high-level events from the session.
	Events() <-chan RealtimeEvent
	// Close terminates the session and closes the events channel.
	Close(ctx context.Context) error
}

RealtimeSession is a bidirectional, long-lived conversation with an AI service.
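
A typical consumer sends input and ranges over Events until Close closes the channel. The sketch below shows that loop against a fake in-memory session; `event`, `session`, `echoSession`, and `runDemo` are hypothetical stand-ins (the real interface also has SendAudio and uses RealtimeEvent), and the buffered channel is an implementation choice of the fake only.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for RealtimeEvent carrying text only.
type event struct{ Text string }

// session mirrors a subset of RealtimeSession (SendAudio omitted for brevity).
type session interface {
	SendText(ctx context.Context, text string) error
	Events() <-chan event
	Close(ctx context.Context) error
}

// echoSession is a fake session that echoes each text input as an event.
type echoSession struct {
	events chan event
	once   sync.Once
}

func newEchoSession() *echoSession {
	return &echoSession{events: make(chan event, 8)}
}

func (s *echoSession) SendText(ctx context.Context, text string) error {
	select {
	case s.events <- event{Text: "echo: " + text}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func (s *echoSession) Events() <-chan event { return s.events }

// Close closes the events channel exactly once, matching the documented
// contract that Close terminates the session and closes the channel.
func (s *echoSession) Close(context.Context) error {
	s.once.Do(func() { close(s.events) })
	return nil
}

// runDemo sends two messages, closes the session, and drains the events.
func runDemo() []string {
	ctx := context.Background()
	var sess session = newEchoSession()
	_ = sess.SendText(ctx, "hello")
	_ = sess.SendText(ctx, "world")
	_ = sess.Close(ctx)

	var got []string
	for ev := range sess.Events() { // loop ends once the channel is closed and drained
		got = append(got, ev.Text)
	}
	return got
}

func main() {
	fmt.Println(runDemo())
}
```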

type STTService

type STTService interface {
	Transcribe(ctx context.Context, audio []byte, sampleRate, numChannels int) ([]*frames.TranscriptionFrame, error)
}

STTService transcribes audio to text (transcription frames).

func NewSTTFromConfig

func NewSTTFromConfig(cfg *config.Config, provider string) STTService

NewSTTFromConfig returns an STTService for the given provider. Provider must be one of SupportedSTTProviders; cfg.STTModel is used when supported (e.g. Groq).

type STTStreamingService

type STTStreamingService interface {
	STTService
	// TranscribeStream sends transcription frames (interim and final) to outCh as audio is received on audioCh.
	TranscribeStream(ctx context.Context, audioCh <-chan []byte, sampleRate, numChannels int, outCh chan<- frames.Frame)
}

STTStreamingService optionally supports streaming transcription (interim + final frames).
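
The channel-based shape of TranscribeStream can be exercised as in the sketch below. The fake service, the `transcript` type (a stand-in for the frames sent on outCh), and `runStream` are hypothetical; it is also an assumption that TranscribeStream blocks until audioCh is closed and that the caller closes outCh afterwards.

```go
package main

import (
	"context"
	"fmt"
)

// transcript is a hypothetical stand-in for a transcription frame.
type transcript struct {
	Text  string
	Final bool
}

// fakeStreamingSTT emits one interim result per audio chunk and a final
// result once audioCh is closed.
type fakeStreamingSTT struct{}

func (fakeStreamingSTT) TranscribeStream(ctx context.Context, audioCh <-chan []byte,
	sampleRate, numChannels int, outCh chan<- transcript) {
	total := 0
	for {
		select {
		case <-ctx.Done():
			return
		case chunk, ok := <-audioCh:
			if !ok {
				outCh <- transcript{Text: fmt.Sprintf("%d bytes total", total), Final: true}
				return
			}
			total += len(chunk)
			outCh <- transcript{Text: fmt.Sprintf("%d bytes so far", total)}
		}
	}
}

// runStream feeds chunks to the fake service and collects every result.
func runStream(chunks [][]byte) []transcript {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	audioCh := make(chan []byte)
	outCh := make(chan transcript)

	go func() {
		defer close(outCh) // assumption: the caller closes outCh when the stream ends
		fakeStreamingSTT{}.TranscribeStream(ctx, audioCh, 16000, 1, outCh)
	}()
	go func() {
		defer close(audioCh) // closing audioCh signals end of audio
		for _, c := range chunks {
			audioCh <- c
		}
	}()

	var out []transcript
	for t := range outCh {
		out = append(out, t)
	}
	return out
}

func main() {
	for _, t := range runStream([][]byte{{1, 2}, {3, 4, 5}}) {
		fmt.Println(t.Final, t.Text)
	}
}
```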

type TTSService

type TTSService interface {
	Speak(ctx context.Context, text string, sampleRate int) ([]*frames.TTSAudioRawFrame, error)
}

TTSService converts text to speech (audio frames).

func NewTTSFromConfig

func NewTTSFromConfig(cfg *config.Config, provider, model, voice string) TTSService

NewTTSFromConfig returns a TTSService for the given provider, model, and voice. Provider must be one of SupportedTTSProviders; model and voice are typically cfg.TTSModel and cfg.TTSVoice.

type TTSStreamingService

type TTSStreamingService interface {
	TTSService
	// SpeakStream streams TTS audio frames to outCh as they are produced.
	SpeakStream(ctx context.Context, text string, sampleRate int, outCh chan<- frames.Frame)
}

TTSStreamingService optionally supports streaming TTS (incremental audio to outCh).

type ToolHandler

type ToolHandler = llmapi.ToolHandler

ToolHandler is called when the LLM requests a tool call. Re-exported from llmapi.

Directories

Path Synopsis
Package camb provides Camb AI speech-to-text.
Package cerebras provides Cerebras inference API-backed LLM via OpenAI-compatible API.
Package deepseek provides DeepSeek-backed LLM via OpenAI-compatible API.
Package google provides Google Gemini LLM, Vertex AI LLM, and Google Cloud STT/TTS services.
Package gradium provides Gradium speech-to-text (WebSocket or REST).
Package grok provides xAI Grok-backed LLM via OpenAI-compatible API.
Package groq provides Groq-backed LLM, STT, and TTS via OpenAI-compatible API.
Package hume provides Hume (Hume AI) text-to-speech.
Package inworld provides Inworld text-to-speech (and LLM).
Package llmapi defines LLM and tool-calling interfaces so that implementers (e.g.
Package minimax provides Minimax text-to-speech.
Package mistral provides Mistral AI-backed LLM via OpenAI-compatible API.
Package mock provides mock STT, LLM, and TTS services for testing and stress testing without calling real APIs.
Package neuphonic provides Neuphonic text-to-speech (HTTP SSE streaming).
Package ollama provides Ollama-backed LLM via OpenAI-compatible API (localhost or custom base URL).
Package openai provides OpenAI-based LLM (and optionally STT/TTS) for Voxray.
Package openpipe provides OpenPipe-backed LLM via OpenAI-compatible API.
Package qwen provides Alibaba DashScope Qwen LLM via OpenAI-compatible API.
Package sarvam provides Sarvam AI TTS and STT service implementations.
Package soniox provides Soniox speech-to-text (WebSocket API used for batch Transcribe).
Package stt provides STT service implementations (OpenAI Whisper, Groq Whisper).
Package tts provides TTS service implementations (OpenAI TTS, Groq TTS).
Package whisper provides Whisper API-backed STT (OpenAI or self-hosted compatible) with configurable base URL.
Package xtts provides Coqui XTTS text-to-speech via local streaming server.
