sarvam package

Version: v0.2.0 (latest) | Published: May 16, 2026 | License: Apache-2.0 | Imports: 18 | Imported by: 0

Repository

github.com/Voxray-AI/Voxray

Documentation ¶

Overview ¶

Package sarvam provides Sarvam AI TTS and STT service implementations.

Index ¶

Constants
type SarvamSTTService
- func NewSTT(apiKey, model string) *SarvamSTTService
- func NewSTTWithLanguage(apiKey, model, languageCode string) *SarvamSTTService
- func (s *SarvamSTTService) Transcribe(ctx context.Context, audio []byte, sampleRate, numChannels int) ([]*frames.TranscriptionFrame, error)
- func (s *SarvamSTTService) TranscribeStream(ctx context.Context, audioCh <-chan []byte, sampleRate, numChannels int, ...)
type SarvamTTSService
- func NewTTS(apiKey, model, voice string) *SarvamTTSService
- func (s *SarvamTTSService) Speak(ctx context.Context, text string, sampleRate int) ([]*frames.TTSAudioRawFrame, error)
- func (s *SarvamTTSService) SpeakStream(ctx context.Context, text string, sampleRate int, outCh chan<- frames.Frame)

Constants ¶

const (
	// DefaultBaseURL is the default Sarvam AI API base URL.
	DefaultBaseURL = "https://api.sarvam.ai"
)

const DefaultSarvamSTTModel = "saarika:v2.5"

DefaultSarvamSTTModel is the default Sarvam STT model when none is specified. It matches the REST API default (saarika:v2.5).

const DefaultSarvamTTSModel = "bulbul:v2"

DefaultSarvamTTSModel is the default Sarvam TTS model (bulbul v2).

const DefaultSarvamTTSSpeaker = "anushka"

DefaultSarvamTTSSpeaker is the default Sarvam TTS speaker.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type SarvamSTTService ¶

type SarvamSTTService struct {
	// contains filtered or unexported fields
}

SarvamSTTService implements services.STTService (and STTStreamingService via TranscribeStream) using Sarvam AI's speech-to-text REST API.

It uses:

POST https://api.sarvam.ai/speech-to-text (multipart/form-data)

with fields:

file: binary audio (WAV or raw PCM; format must match input_audio_codec)
model: e.g. "saarika:v2.5" or "saaras:v3"
input_audio_codec: "wav" when sending WAV bytes, "pcm_s16le" for raw PCM
language_code: optional, e.g. "en-IN", "hi-IN"; empty means auto-detect

func NewSTT ¶

func NewSTT(apiKey, model string) *SarvamSTTService

NewSTT creates a Sarvam STT service. If apiKey is empty, config.GetEnv("SARVAM_API_KEY", "") is used. If model is empty, DefaultSarvamSTTModel is used.

func NewSTTWithLanguage ¶

func NewSTTWithLanguage(apiKey, model, languageCode string) *SarvamSTTService

NewSTTWithLanguage creates a Sarvam STT service with an optional language code such as "en-IN" or "hi-IN". An empty languageCode leaves language detection to the API.

func (*SarvamSTTService) Transcribe ¶

func (s *SarvamSTTService) Transcribe(ctx context.Context, audio []byte, sampleRate, numChannels int) ([]*frames.TranscriptionFrame, error)

Transcribe sends audio to Sarvam's REST STT API and returns a single, final TranscriptionFrame.

func (*SarvamSTTService) TranscribeStream ¶

func (s *SarvamSTTService) TranscribeStream(ctx context.Context, audioCh <-chan []byte, sampleRate, numChannels int, outCh chan<- frames.Frame)

TranscribeStream uses Sarvam's WebSocket streaming STT API: it connects to the streaming endpoint, sends audio from audioCh (as base64), and pushes TranscriptionFrame(s) to outCh as transcript messages arrive. When audioCh closes, the buffered audio is sent and the connection is closed. For one-off transcription use Transcribe (REST) instead.

type SarvamTTSService ¶

type SarvamTTSService struct {
	// contains filtered or unexported fields
}

SarvamTTSService implements services.TTSService (and TTSStreamingService via SpeakStream) using Sarvam AI's text-to-speech HTTP API.

It mirrors the behavior of the Python SarvamHttpTTSService at a high level:

- POST https://api.sarvam.ai/text-to-speech with JSON payload
- Audio is returned as base64-encoded WAV/PCM in "audios"[0]
- We decode the base64 and strip WAV headers when present, returning raw PCM.

func NewTTS ¶

func NewTTS(apiKey, model, voice string) *SarvamTTSService

NewTTS creates a Sarvam TTS service. If apiKey is empty, config.GetEnv("SARVAM_API_KEY", "") is used. If model or voice is empty, DefaultSarvamTTSModel or DefaultSarvamTTSSpeaker is used, respectively.

func (*SarvamTTSService) Speak ¶

func (s *SarvamTTSService) Speak(ctx context.Context, text string, sampleRate int) ([]*frames.TTSAudioRawFrame, error)

Speak requests TTS from Sarvam, decodes base64 audio (WAV or PCM) and returns TTSAudioRawFrame(s).

func (*SarvamTTSService) SpeakStream ¶

func (s *SarvamTTSService) SpeakStream(ctx context.Context, text string, sampleRate int, outCh chan<- frames.Frame)

SpeakStream runs TTS using Sarvam's WebSocket streaming API and sends TTSAudioRawFrame(s) to outCh as audio chunks arrive.
