Package services

v0.2.0 (latest) | Published: May 16, 2026 | License: Apache-2.0 | Imports: 32 | Imported by: 0

Repository: github.com/Voxray-AI/Voxray

README

Services layer

This package provides LLM, STT, TTS, and realtime service abstractions aligned with common LLM/STT/TTS service patterns. Use the factory functions with a config.Config to construct implementations by provider name.

API interaction (streaming)

sequenceDiagram
    participant P as Pipeline
    participant C as Provider Client
    participant A as External API

    P->>C: Call(ctx, input)
    C->>A: HTTP POST (streaming)
    loop chunks
        P-->>C: Write(chunk)
        C-->>A: stream bytes
    end
    A-->>C: response stream
    C-->>P: Result

Provider registry

graph TD
    Factory["Factory\nNew*FromConfig"] --> LLM["LLM Interface"]
    Factory --> STT["STT Interface"]
    Factory --> TTS["TTS Interface"]
    LLM --> OpenAI_LLM["openai"]
    LLM --> Groq_LLM["groq"]
    LLM --> Anthropic["anthropic"]
    LLM --> Mistral["mistral"]
    STT --> OpenAI_STT["openai"]
    STT --> Groq_STT["groq"]
    STT --> Sarvam_STT["sarvam"]
    TTS --> OpenAI_TTS["openai"]
    TTS --> Sarvam_TTS["sarvam"]
    TTS --> Groq_TTS["groq"]

Interfaces

LLMService — chat completion with optional streaming (Chat(ctx, messages, onToken)).
STTService — transcription (Transcribe(ctx, audio, sampleRate, numChannels)). Optional STTStreamingService adds TranscribeStream.
TTSService — text-to-speech (Speak(ctx, text, sampleRate)). Optional TTSStreamingService adds SpeakStream.
RealtimeService — creates RealtimeSession (SendText, SendAudio, Events, Close). Use realtime.NewFromConfig(cfg, provider) to construct (lives in pkg/realtime to avoid import cycles).
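Callers can detect the optional streaming interfaces with a type assertion and fall back to the batch method otherwise. The sketch below mirrors the documented TTSService and TTSStreamingService signatures. It uses local stand-ins for the frames types, whose fields are assumptions. The names batchOnly, streaming, and mode are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// Frame and TTSAudioRawFrame stand in for the frames package types; the
// fields are assumptions for this sketch.
type Frame any

type TTSAudioRawFrame struct {
	Audio      []byte
	SampleRate int
}

// Mirrors of the documented TTSService and TTSStreamingService.
type TTSService interface {
	Speak(ctx context.Context, text string, sampleRate int) ([]*TTSAudioRawFrame, error)
}

type TTSStreamingService interface {
	TTSService
	SpeakStream(ctx context.Context, text string, sampleRate int, outCh chan<- Frame)
}

// batchOnly implements only the batch interface.
type batchOnly struct{}

func (batchOnly) Speak(_ context.Context, text string, sr int) ([]*TTSAudioRawFrame, error) {
	return []*TTSAudioRawFrame{{Audio: []byte(text), SampleRate: sr}}, nil
}

// streaming embeds batchOnly (so it still has Speak) and adds SpeakStream.
type streaming struct{ batchOnly }

func (streaming) SpeakStream(_ context.Context, text string, sr int, outCh chan<- Frame) {
	outCh <- &TTSAudioRawFrame{Audio: []byte(text), SampleRate: sr}
}

// Compile-time checks that the fakes satisfy the interfaces.
var (
	_ TTSService          = batchOnly{}
	_ TTSStreamingService = streaming{}
)

// mode reports which path a caller would take: upgrade to streaming via a
// type assertion when the service supports it, otherwise use batch Speak.
func mode(svc TTSService) string {
	if _, ok := svc.(TTSStreamingService); ok {
		return "stream"
	}
	return "batch"
}

func main() {
	fmt.Println(mode(batchOnly{}), mode(streaming{})) // batch stream
}
```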

Supported providers (Go implementation)

These are the providers currently implemented in this Go port.

Provider	LLM	STT	TTS	Realtime
openai	✓	✓	✓	✓
groq	✓	✓	✓	—
sarvam	—	✓	✓	—
grok	✓	—	—	—
cerebras	✓	—	—	—
elevenlabs	—	✓	✓	—
aws	✓	✓	✓	—
mistral	✓	—	—	—
deepseek	✓	—	—	—
ollama	✓	—	—	—
qwen	✓	—	—	—
whisper	—	✓	—	—
asyncai	✓	—	—	—
camb	—	✓	—	—
fish	✓	—	—	—
gradium	—	✓	—	—
hume	—	—	✓	✓ (stub)
inworld	✓	—	✓	✓ (stub)
minimax	✓	—	✓	—
moondream	✓	—	—	—
neuphonic	—	—	✓	—
openpipe	✓	—	—	—
soniox	—	✓	—	—
xtts	—	—	✓	—
anthropic	✓	—	—	—
google	✓	✓	✓	—
google_vertex	✓	—	—	—

Constants: ProviderOpenAI, ProviderGroq, ProviderSarvam, ProviderGrok, ProviderCerebras, ProviderElevenLabs, ProviderAWS, ProviderMistral, ProviderDeepSeek, ProviderAnthropic, ProviderGoogle, ProviderGoogleVertex, ProviderOllama, ProviderQwen, ProviderWhisper, ProviderAsyncAI, ProviderCamb, ProviderFish, ProviderGradium, ProviderHume, ProviderInworld, ProviderMinimax, ProviderMoondream, ProviderNeuphonic, ProviderOpenPipe, ProviderSoniox, ProviderXTTS. Realtime: SupportedRealtimeProviders ("openai", "hume", "inworld").

Upstream providers and Go coverage

The upstream Python services expose many more providers. The table below inventories those providers by capability and indicates whether they currently have a Go implementation in this repository.

Legend:

✓: supported. In the Upstream columns, the upstream Python services provide the capability; in the Go columns, this repository's Go services layer implements it.
—: not supported (or not a primary capability) for that provider.
✓ (stub): a stub Go implementation.

Provider	Upstream LLM	Upstream STT	Upstream TTS	Upstream Realtime	Go LLM	Go STT	Go TTS	Go Realtime
anthropic	✓	—	—	—	✓	—	—	—
assemblyai	—	✓	—	—	—	—	—	—
asyncai	✓	—	—	—	✓	—	—	—
aws	✓	✓	✓	—	✓	✓	✓	—
aws_nova_sonic	✓	—	✓	—	—	—	—	—
azure	✓	✓	✓	—	—	—	—	—
camb	—	✓	—	—	—	✓	—	—
cartesia	—	—	✓	—	—	—	—	—
cerebras	✓	—	—	—	✓	—	—	—
deepgram	—	✓	—	—	—	—	—	—
deepseek	✓	—	—	—	✓	—	—	—
elevenlabs	—	✓	✓	—	—	✓	✓	—
fal	✓	—	—	—	—	—	—	—
fireworks	✓	—	—	—	—	—	—	—
fish	✓	—	—	—	—	—	—	—
gemini_multimodal_live	✓	✓	✓	✓	—	—	—	—
gladia	—	✓	—	—	—	—	—	—
google	✓	✓	✓	✓	✓	✓	✓	—
gradium	—	✓	—	—	—	—	—	—
grok	✓	—	—	—	✓	—	—	—
groq	✓	✓	✓	—	✓	✓	✓	—
hathora	—	—	—	✓	—	—	—	—
heygen	—	—	✓	✓	—	—	—	—
hume	—	—	✓	✓	—	—	✓	✓ (stub)
inworld	✓	—	✓	✓	✓	—	✓	✓ (stub)
kokoro	—	—	✓	—	—	—	—	—
lmnt	—	—	✓	—	—	—	—	—
mem0	—	—	—	—	—	—	—	—
minimax	✓	—	✓	—	✓	—	✓	—
mistral	✓	—	—	—	✓	—	—	—
moondream	✓	—	—	—	✓	—	—	—
neuphonic	—	—	✓	—	—	—	✓	—
nim	✓	—	—	—	—	—	—	—
nvidia	✓	✓	✓	—	—	—	—	—
ollama	✓	—	—	—	✓	—	—	—
openai	✓	✓	✓	✓	✓	✓	✓	✓
openai_realtime	✓	✓	✓	✓	—	—	—	—
openai_realtime_beta	✓	✓	✓	✓	—	—	—	—
openpipe	✓	—	—	—	✓	—	—	—
openrouter	✓	—	—	—	—	—	—	—
perplexity	✓	—	—	—	—	—	—	—
piper	—	—	✓	—	—	—	—	—
qwen	✓	—	—	—	✓	—	—	—
resembleai	—	—	✓	—	—	—	—	—
rime	—	✓	—	—	—	—	—	—
riva	—	✓	✓	—	—	—	—	—
sambanova	✓	—	—	—	—	—	—	—
sarvam	—	✓	✓	—	—	✓	✓	—
simli	—	—	✓	✓	—	—	—	—
soniox	—	✓	—	—	—	✓	—	—
speechmatics	—	✓	—	—	—	—	—	—
tavus	—	—	✓	✓	—	—	—	—
together	✓	—	—	—	—	—	—	—
ultravox	—	—	✓	—	—	—	—	—
whisper	—	✓	—	—	—	✓	—	—
xtts	—	—	✓	—	—	—	✓	—

Configuration

Use config.Config (JSON or env):

provider — default for all tasks.
stt_provider, llm_provider, tts_provider — override per task.
model — chat/LLM model (e.g. gpt-3.5-turbo, mistral-small-latest, deepseek-chat).
stt_model, tts_model, tts_voice — task-specific when supported.
api_keys — map of service name to API key; otherwise keys are read from the environment.

Environment variables (fallback when not in `api_keys`)

Provider	Env var
openai	OPENAI_API_KEY
groq	GROQ_API_KEY
sarvam	SARVAM_API_KEY
grok (xai)	XAI_API_KEY
cerebras	CEREBRAS_API_KEY
elevenlabs	ELEVENLABS_API_KEY
aws	AWS_SECRET_ACCESS_KEY, AWS_REGION (optional, default us-east-1)
mistral	MISTRAL_API_KEY
deepseek	DEEPSEEK_API_KEY
ollama	OLLAMA_API_KEY (optional), OLLAMA_BASE_URL (optional, default http://localhost:11434/v1)
qwen	DASHSCOPE_API_KEY or QWEN_API_KEY, DASHSCOPE_BASE_URL (optional)
whisper	WHISPER_API_KEY or OPENAI_API_KEY, WHISPER_BASE_URL (optional)
asyncai	ASYNC_AI_API_KEY, ASYNC_AI_BASE_URL (optional)
camb	CAMB_API_KEY, CAMB_BASE_URL (optional)
fish	FISH_API_KEY, FISH_BASE_URL (optional)
gradium	GRADIUM_API_KEY, GRADIUM_BASE_URL (optional)
hume	HUME_API_KEY
inworld	INWORLD_API_KEY
minimax	MINIMAX_API_KEY, MINIMAX_BASE_URL (optional)
moondream	MOONDREAM_API_KEY, MOONDREAM_BASE_URL (optional)
neuphonic	NEUPHONIC_API_KEY, NEUPHONIC_BASE_URL (optional)
openpipe	OPENPIPE_API_KEY
soniox	SONIOX_API_KEY, SONIOX_WS_URL (optional), SONIOX_MODEL (optional)
xtts	XTTS_BASE_URL (optional, default http://localhost:8000 for local server)

Usage

cfg, err := config.LoadConfig("config.json")
if err != nil {
    log.Fatalf("load config: %v", err)
}

// Or build the config manually instead of loading it:
cfg = &config.Config{
    LlmProvider: services.ProviderMistral,
    Model:       "mistral-small-latest",
}

llm := services.NewLLMFromConfig(cfg, cfg.LLMProvider(), cfg.Model)
stt := services.NewSTTFromConfig(cfg, cfg.STTProvider())
tts := services.NewTTSFromConfig(cfg, cfg.TTSProvider(), cfg.TTSModel, cfg.TTSVoice)

// Realtime (e.g. OpenAI Realtime WebSocket API):
realtimeSvc, err := realtime.NewFromConfig(cfg, services.ProviderOpenAI)
if err != nil {
    log.Fatalf("realtime: %v", err)
}

One-shot construction for all three:

llm, stt, tts := services.NewServicesFromConfig(cfg)

Tests

tests/pkg/services/ — factory construction tests for all supported providers, plus a Sarvam integration test (requires SARVAM_API_KEY).
tests/pkg/realtime/ — tests for realtime.NewFromConfig with openai and with an unsupported provider.

Documentation

Overview

Package services defines interfaces and implementations for LLM, STT, and TTS, aligned conceptually with common LLM/STT/TTS service abstractions and websocket/realtime session handling. Use the factory functions (NewLLMFromConfig, NewSTTFromConfig, NewTTSFromConfig) to construct services by provider name; the Supported*Providers variables form the capability matrix. For a RealtimeService, use realtime.NewFromConfig(cfg, provider), which lives in a separate package to avoid an import cycle. Provider wiring is in pkg/services/factory.go.

Index

Constants
Variables
func NewServicesFromConfig(cfg *config.Config) (LLMService, STTService, TTSService)
type LLMService
- func NewLLMFromConfig(cfg *config.Config, provider, model string) LLMService
type LLMServiceWithTools
type RealtimeConfig
type RealtimeEvent
type RealtimeService
type RealtimeSession
type STTService
- func NewSTTFromConfig(cfg *config.Config, provider string) STTService
type STTStreamingService
type TTSService
- func NewTTSFromConfig(cfg *config.Config, provider, model, voice string) TTSService
type TTSStreamingService
type ToolHandler

Constants

const (
	ProviderOpenAI       = "openai"
	ProviderGroq         = "groq"
	ProviderSarvam       = "sarvam"
	ProviderGrok         = "grok"
	ProviderCerebras     = "cerebras"
	ProviderElevenLabs   = "elevenlabs"
	ProviderAWS          = "aws"
	ProviderMistral      = "mistral"
	ProviderDeepSeek     = "deepseek"
	ProviderAnthropic    = "anthropic"
	ProviderGoogle       = "google"
	ProviderGoogleVertex = "google_vertex"
	ProviderOllama       = "ollama"
	ProviderQwen         = "qwen"
	ProviderWhisper      = "whisper"
	// Pipecat-integrated providers
	ProviderAsyncAI   = "asyncai"
	ProviderCamb      = "camb"
	ProviderFish      = "fish"
	ProviderGradium   = "gradium"
	ProviderHume      = "hume"
	ProviderInworld   = "inworld"
	ProviderMinimax   = "minimax"
	ProviderMoondream = "moondream"
	ProviderNeuphonic = "neuphonic"
	ProviderOpenPipe  = "openpipe"
	ProviderSoniox    = "soniox"
	ProviderXTTS      = "xtts"
)

Variables

var SupportedLLMProviders = []string{
	ProviderOpenAI,
	ProviderGroq,
	ProviderGrok,
	ProviderCerebras,
	ProviderAWS,
	ProviderMistral,
	ProviderDeepSeek,
	ProviderAnthropic,
	ProviderGoogle,
	ProviderGoogleVertex,
	ProviderOllama,
	ProviderQwen,
	ProviderAsyncAI,
	ProviderFish,
	ProviderInworld,
	ProviderMinimax,
	ProviderMoondream,
	ProviderOpenPipe,
}

SupportedLLMProviders lists provider keys that can be passed to NewLLMFromConfig.

var SupportedRealtimeProviders = []string{ProviderOpenAI, ProviderHume, ProviderInworld}

SupportedRealtimeProviders lists provider keys for realtime (use realtime.NewFromConfig to construct).

var SupportedSTTProviders = []string{
	ProviderOpenAI, ProviderGroq, ProviderSarvam, ProviderElevenLabs, ProviderAWS, ProviderGoogle, ProviderWhisper,
	ProviderCamb, ProviderGradium, ProviderSoniox,
}

SupportedSTTProviders lists provider keys that can be passed to NewSTTFromConfig.

var SupportedTTSProviders = []string{
	ProviderOpenAI, ProviderGroq, ProviderSarvam, ProviderElevenLabs, ProviderAWS, ProviderGoogle,
	ProviderHume, ProviderInworld, ProviderMinimax, ProviderNeuphonic, ProviderXTTS,
}

SupportedTTSProviders lists provider keys that can be passed to NewTTSFromConfig.

Functions

func NewServicesFromConfig

func NewServicesFromConfig(cfg *config.Config) (LLMService, STTService, TTSService)

NewServicesFromConfig returns LLM, STT, and TTS services based on cfg. It resolves the provider for each task from stt_provider, llm_provider, or tts_provider, falling back to provider, and uses the task-specific model and voice when they are set.

Types

type LLMService

type LLMService = llmapi.LLMService

LLMService provides chat completion and may stream text frames. It is re-exported from llmapi.

func NewLLMFromConfig

func NewLLMFromConfig(cfg *config.Config, provider, model string) LLMService

NewLLMFromConfig returns an LLMService for the given provider and model. Provider must be one of SupportedLLMProviders; model is the chat model (e.g. cfg.Model).

type LLMServiceWithTools

type LLMServiceWithTools = llmapi.LLMServiceWithTools

LLMServiceWithTools is an LLM service that supports registering tools. Re-exported from llmapi.

type RealtimeConfig

type RealtimeConfig struct {
	Provider string           // e.g. "openai"
	Model    string           // e.g. "gpt-4o-realtime" or regular chat model
	Voice    string           // TTS voice, if applicable
	Tools    []map[string]any // optional function calling tools
}

RealtimeConfig configures a realtime session for a given provider/model.

type RealtimeEvent

type RealtimeEvent struct {
	Text  *frames.LLMTextFrame
	Audio *frames.TTSAudioRawFrame
	Frame frames.Frame
}

RealtimeEvent represents a high-level event emitted by a realtime session. It can carry LLM text, TTS audio, or generic frames for extensibility.

type RealtimeService

type RealtimeService interface {
	NewSession(ctx context.Context, cfg RealtimeConfig) (RealtimeSession, error)
}

RealtimeService creates realtime sessions.

type RealtimeSession

type RealtimeSession interface {
	// SendText sends text input into the session (e.g. user message).
	SendText(ctx context.Context, text string) error
	// SendAudio sends raw audio input into the session (e.g. microphone audio).
	SendAudio(ctx context.Context, audio []byte, sampleRate, numChannels int) error
	// Events returns a channel of high-level events from the session.
	Events() <-chan RealtimeEvent
	// Close terminates the session and closes the events channel.
	Close(ctx context.Context) error
}

RealtimeSession is a bidirectional, long-lived conversation with an AI service.

type STTService

type STTService interface {
	Transcribe(ctx context.Context, audio []byte, sampleRate, numChannels int) ([]*frames.TranscriptionFrame, error)
}

STTService transcribes audio to text (transcription frames).

func NewSTTFromConfig

func NewSTTFromConfig(cfg *config.Config, provider string) STTService

NewSTTFromConfig returns an STTService for the given provider. Provider must be one of SupportedSTTProviders; cfg.STTModel is used when supported (e.g. Groq).

type STTStreamingService

type STTStreamingService interface {
	STTService
	// TranscribeStream sends transcription frames (interim and final) to outCh as audio is received on audioCh.
	TranscribeStream(ctx context.Context, audioCh <-chan []byte, sampleRate, numChannels int, outCh chan<- frames.Frame)
}

STTStreamingService optionally supports streaming transcription (interim + final frames).

type TTSService

type TTSService interface {
	Speak(ctx context.Context, text string, sampleRate int) ([]*frames.TTSAudioRawFrame, error)
}

TTSService converts text to speech (audio frames).

func NewTTSFromConfig

func NewTTSFromConfig(cfg *config.Config, provider, model, voice string) TTSService

NewTTSFromConfig returns a TTSService for the given provider, model, and voice. Provider must be one of SupportedTTSProviders; model and voice are typically cfg.TTSModel and cfg.TTSVoice.

type TTSStreamingService

type TTSStreamingService interface {
	TTSService
	// SpeakStream streams TTS audio frames to outCh as they are produced.
	SpeakStream(ctx context.Context, text string, sampleRate int, outCh chan<- frames.Frame)
}

TTSStreamingService optionally supports streaming TTS (incremental audio to outCh).

type ToolHandler

type ToolHandler = llmapi.ToolHandler

ToolHandler is called when the LLM requests a tool call. Re-exported from llmapi.

Directories

Path	Synopsis
anthropic
asyncai
aws
camb	Package camb provides Camb AI speech-to-text.
cerebras	Package cerebras provides Cerebras inference API-backed LLM via OpenAI-compatible API.
deepseek	Package deepseek provides DeepSeek-backed LLM via OpenAI-compatible API.
elevenlabs
fish
google	Package google provides Google Gemini LLM, Vertex AI LLM, and Google Cloud STT/TTS services.
gradium	Package gradium provides Gradium speech-to-text (WebSocket or REST).
grok	Package grok provides xAI Grok-backed LLM via OpenAI-compatible API.
groq	Package groq provides Groq-backed LLM, STT, and TTS via OpenAI-compatible API.
hume	Package hume provides Hume (Hume AI) text-to-speech.
inworld	Package inworld provides Inworld text-to-speech (and LLM).
llmapi	Package llmapi defines LLM and tool-calling interfaces so that implementers (e.g.
minimax	Package minimax provides Minimax text-to-speech.
mistral	Package mistral provides Mistral AI-backed LLM via OpenAI-compatible API.
mock	Package mock provides mock STT, LLM, and TTS services for testing and stress testing without calling real APIs.
moondream
neuphonic	Package neuphonic provides Neuphonic text-to-speech (HTTP SSE streaming).
ollama	Package ollama provides Ollama-backed LLM via OpenAI-compatible API (localhost or custom base URL).
openai	Package openai provides OpenAI-based LLM (and optionally STT/TTS) for Voxray.
openpipe	Package openpipe provides OpenPipe-backed LLM via OpenAI-compatible API.
qwen	Package qwen provides Alibaba DashScope Qwen LLM via OpenAI-compatible API.
sarvam	Package sarvam provides Sarvam AI TTS and STT service implementations.
soniox	Package soniox provides Soniox speech-to-text (WebSocket API used for batch Transcribe).
stt	Package stt provides STT service implementations (OpenAI Whisper, Groq Whisper).
tts	Package tts provides TTS service implementations (OpenAI TTS, Groq TTS).
whisper	Package whisper provides Whisper API-backed STT (OpenAI or self-hosted compatible) with configurable base URL.
xtts	Package xtts provides Coqui XTTS text-to-speech via local streaming server.
