sarvam

package
v0.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 16, 2026 License: Apache-2.0 Imports: 18 Imported by: 0

Documentation

Overview

Package sarvam provides Sarvam AI TTS and STT service implementations.

Index

Constants

const (
	// DefaultBaseURL is the default Sarvam AI API base URL.
	DefaultBaseURL = "https://api.sarvam.ai"
)
const DefaultSarvamSTTModel = "saarika:v2.5"

DefaultSarvamSTTModel is the default Sarvam STT model when none is specified. It matches the REST API default (saarika:v2.5).

const DefaultSarvamTTSModel = "bulbul:v2"

DefaultSarvamTTSModel is the default Sarvam TTS model (bulbul v2).

const DefaultSarvamTTSSpeaker = "anushka"

DefaultSarvamTTSSpeaker is the default Sarvam TTS speaker.

Variables

This section is empty.

Functions

This section is empty.

Types

type SarvamSTTService

type SarvamSTTService struct {
	// contains filtered or unexported fields
}

SarvamSTTService implements services.STTService (and STTStreamingService via TranscribeStream) using Sarvam AI's speech-to-text REST API.

It uses:

POST https://api.sarvam.ai/speech-to-text (multipart/form-data)

with fields:

  • file: binary audio (WAV or raw PCM; format must match input_audio_codec)
  • model: e.g. "saarika:v2.5" or "saaras:v3"
  • input_audio_codec: "wav" when sending WAV bytes, "pcm_s16le" for raw PCM
  • language_code: optional, e.g. "en-IN", "hi-IN"; empty means auto-detect

func NewSTT

func NewSTT(apiKey, model string) *SarvamSTTService

NewSTT creates a Sarvam STT service. If apiKey is empty, config.GetEnv("SARVAM_API_KEY", "") is used. If model is empty, DefaultSarvamSTTModel is used.

func NewSTTWithLanguage

func NewSTTWithLanguage(apiKey, model, languageCode string) *SarvamSTTService

NewSTTWithLanguage creates a Sarvam STT service with an optional language code such as "en-IN" or "hi-IN". An empty languageCode means the language is auto-detected.

func (*SarvamSTTService) Transcribe

func (s *SarvamSTTService) Transcribe(ctx context.Context, audio []byte, sampleRate, numChannels int) ([]*frames.TranscriptionFrame, error)

Transcribe sends audio to Sarvam's REST STT API and returns a single, final TranscriptionFrame.

func (*SarvamSTTService) TranscribeStream

func (s *SarvamSTTService) TranscribeStream(ctx context.Context, audioCh <-chan []byte, sampleRate, numChannels int, outCh chan<- frames.Frame)

TranscribeStream uses Sarvam's WebSocket streaming STT API: it connects to the streaming endpoint, sends audio from audioCh (as base64), and pushes TranscriptionFrame(s) to outCh as transcript messages arrive. When audioCh closes, the buffered audio is sent and the connection is closed. For one-off transcription use Transcribe (REST) instead.

type SarvamTTSService

type SarvamTTSService struct {
	// contains filtered or unexported fields
}

SarvamTTSService implements services.TTSService (and TTSStreamingService via SpeakStream) using Sarvam AI's text-to-speech HTTP API.

It mirrors the behavior of the Python SarvamHttpTTSService at a high level:

  • POST https://api.sarvam.ai/text-to-speech with a JSON payload
  • Audio is returned as base64-encoded WAV/PCM in "audios"[0]
  • The base64 is decoded and WAV headers are stripped when present, returning raw PCM

func NewTTS

func NewTTS(apiKey, model, voice string) *SarvamTTSService

NewTTS creates a Sarvam TTS service. If apiKey is empty, config.GetEnv("SARVAM_API_KEY", "") is used. If model is empty, DefaultSarvamTTSModel is used; if voice is empty, DefaultSarvamTTSSpeaker is used.

func (*SarvamTTSService) Speak

func (s *SarvamTTSService) Speak(ctx context.Context, text string, sampleRate int) ([]*frames.TTSAudioRawFrame, error)

Speak requests TTS from Sarvam, decodes base64 audio (WAV or PCM) and returns TTSAudioRawFrame(s).

func (*SarvamTTSService) SpeakStream

func (s *SarvamTTSService) SpeakStream(ctx context.Context, text string, sampleRate int, outCh chan<- frames.Frame)

SpeakStream runs TTS using Sarvam's WebSocket streaming API and sends TTSAudioRawFrame(s) to outCh as audio chunks arrive.
