Overview ¶
Package sarvam provides Sarvam AI TTS and STT service implementations.
Constants ¶
const (
// DefaultBaseURL is the default Sarvam AI API base URL.
DefaultBaseURL = "https://api.sarvam.ai"
)
const DefaultSarvamSTTModel = "saarika:v2.5"
DefaultSarvamSTTModel is the default Sarvam STT model when none is specified. It matches the REST API default (saarika:v2.5).
const DefaultSarvamTTSModel = "bulbul:v2"
DefaultSarvamTTSModel is the default Sarvam TTS model (bulbul v2).
const DefaultSarvamTTSSpeaker = "anushka"
DefaultSarvamTTSSpeaker is the default Sarvam TTS speaker.
Types ¶
type SarvamSTTService ¶
type SarvamSTTService struct {
// contains filtered or unexported fields
}
SarvamSTTService implements services.STTService (and STTStreamingService via TranscribeStream) using Sarvam AI's speech-to-text REST API.
It uses:
POST https://api.sarvam.ai/speech-to-text (multipart/form-data)
with fields:
- file: binary audio (WAV or raw PCM; format must match input_audio_codec)
- model: e.g. "saarika:v2.5" or "saaras:v3"
- input_audio_codec: "wav" when sending WAV bytes, "pcm_s16le" for raw PCM
- language_code: optional, e.g. "en-IN", "hi-IN"; empty means auto-detect
func NewSTT ¶
func NewSTT(apiKey, model string) *SarvamSTTService
NewSTT creates a Sarvam STT service. If apiKey is empty, config.GetEnv("SARVAM_API_KEY", "") is used. If model is empty, DefaultSarvamSTTModel is used.
func NewSTTWithLanguage ¶
func NewSTTWithLanguage(apiKey, model, languageCode string) *SarvamSTTService
NewSTTWithLanguage creates a Sarvam STT service with an optional language code (e.g. "en-IN", "hi-IN"). An empty languageCode means auto-detect.
func (*SarvamSTTService) Transcribe ¶
func (s *SarvamSTTService) Transcribe(ctx context.Context, audio []byte, sampleRate, numChannels int) ([]*frames.TranscriptionFrame, error)
Transcribe sends audio to Sarvam's REST STT API and returns a single, final TranscriptionFrame.
func (*SarvamSTTService) TranscribeStream ¶
func (s *SarvamSTTService) TranscribeStream(ctx context.Context, audioCh <-chan []byte, sampleRate, numChannels int, outCh chan<- frames.Frame)
TranscribeStream uses Sarvam's WebSocket streaming STT API: it connects to the streaming endpoint, sends audio from audioCh (as base64), and pushes TranscriptionFrame(s) to outCh as transcript messages arrive. When audioCh closes, the buffered audio is sent and the connection is closed. For one-off transcription use Transcribe (REST) instead.
type SarvamTTSService ¶
type SarvamTTSService struct {
// contains filtered or unexported fields
}
SarvamTTSService implements services.TTSService (and TTSStreamingService via SpeakStream) using Sarvam AI's text-to-speech HTTP API.
It mirrors the behavior of the Python SarvamHttpTTSService at a high level:
- POST https://api.sarvam.ai/text-to-speech with JSON payload
- Audio is returned as base64-encoded WAV/PCM in "audios"[0]
- We decode the base64 and strip WAV headers when present, returning raw PCM.
func NewTTS ¶
func NewTTS(apiKey, model, voice string) *SarvamTTSService
NewTTS creates a Sarvam TTS service. If apiKey is empty, config.GetEnv("SARVAM_API_KEY", "") is used. If model or voice is empty, the package defaults are used (DefaultSarvamTTSModel and DefaultSarvamTTSSpeaker, respectively).
func (*SarvamTTSService) Speak ¶
func (s *SarvamTTSService) Speak(ctx context.Context, text string, sampleRate int) ([]*frames.TTSAudioRawFrame, error)
Speak requests TTS from Sarvam, decodes base64 audio (WAV or PCM) and returns TTSAudioRawFrame(s).
func (*SarvamTTSService) SpeakStream ¶
func (s *SarvamTTSService) SpeakStream(ctx context.Context, text string, sampleRate int, outCh chan<- frames.Frame)
SpeakStream runs TTS using Sarvam's WebSocket streaming API and sends TTSAudioRawFrame(s) to outCh as audio chunks arrive.