providers

package
v1.1.6 Latest
Published: Dec 23, 2025 License: Apache-2.0 Imports: 13 Imported by: 10

Documentation

Overview

Package providers implements multi-LLM provider support with unified interfaces.

This package provides a common abstraction for predict-based LLM providers including OpenAI, Anthropic Claude, and Google Gemini. It handles:

  • Predict completion requests with streaming support
  • Tool/function calling with provider-specific formats
  • Cost tracking and token usage calculation
  • Rate limiting and error handling

All providers implement the Provider interface for basic predict calls, and the ToolSupport interface for function calling capabilities.
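
For example, a basic non-streaming call through the Provider interface looks roughly like this (the provider type, ID, and model strings are illustrative assumptions; concrete constructors live in the subpackages):

p, err := CreateProviderFromSpec(ProviderSpec{
    ID:    "openai-main", // illustrative provider ID
    Type:  "openai",      // assumed provider type string
    Model: "gpt-4o-mini", // assumed model name
})
if err != nil {
    return err
}
defer p.Close()

resp, err := p.Predict(ctx, PredictionRequest{
    System:    "You are a helpful assistant.",
    Messages:  []types.Message{ /* conversation messages */ },
    MaxTokens: 256,
})
if err != nil {
    return err
}
fmt.Println(resp.Content)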

Index

Constants

const (
	ContentTypeHeader   = "Content-Type"
	AuthorizationHeader = "Authorization"
	ApplicationJSON     = "application/json"
	BearerPrefix        = "Bearer "
)

Common HTTP constants for embedding providers.

Variables

This section is empty.

Functions

func CheckHTTPError added in v1.1.0

func CheckHTTPError(resp *http.Response) error

CheckHTTPError checks if the HTTP response is an error and returns a formatted error that includes the response body.

func ExtractOrderedEmbeddings added in v1.1.6

func ExtractOrderedEmbeddings[T any](
	data []T,
	getIndex func(T) int,
	getEmbedding func(T) []float32,
	expectedCount int,
) ([][]float32, error)

ExtractOrderedEmbeddings extracts embeddings from indexed response data and places them in the correct order. Returns an error if the number of embeddings doesn't match expectedCount.
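
For example, a provider implementation might use it like this (the item type and field names are hypothetical, standing in for the provider's decoded API response):

// Hypothetical response item type; field names are illustrative.
type embeddingItem struct {
    Index     int       `json:"index"`
    Embedding []float32 `json:"embedding"`
}

embeddings, err := ExtractOrderedEmbeddings(
    items, // []embeddingItem decoded from the provider response
    func(it embeddingItem) int { return it.Index },
    func(it embeddingItem) []float32 { return it.Embedding },
    len(req.Texts),
)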

func HasAudioSupport added in v1.1.0

func HasAudioSupport(p Provider) bool

HasAudioSupport checks if a provider supports audio inputs

func HasImageSupport added in v1.1.0

func HasImageSupport(p Provider) bool

HasImageSupport checks if a provider supports image inputs

func HasVideoSupport added in v1.1.0

func HasVideoSupport(p Provider) bool

HasVideoSupport checks if a provider supports video inputs

func IsFormatSupported added in v1.1.0

func IsFormatSupported(p Provider, contentType, mimeType string) bool

IsFormatSupported checks if a provider supports a specific media format (MIME type)
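
For example, a capability check before attaching a PNG image (the "image" content type label here is an assumption; see MultimodalCapabilities for the supported MIME type lists):

if HasImageSupport(p) && IsFormatSupported(p, "image", "image/png") {
    // safe to include the PNG part in the request
}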

func IsValidationAbort

func IsValidationAbort(err error) bool

IsValidationAbort checks if an error is a validation abort

func LoadFileAsBase64 deprecated added in v1.1.0

func LoadFileAsBase64(filePath string) (string, error)

LoadFileAsBase64 reads a file and returns its content as a base64-encoded string.

Deprecated: Use MediaLoader.GetBase64Data instead for better functionality including storage reference support, URL loading, and proper context handling.

This function is kept for backward compatibility but will be removed in a future version. It now delegates to the new MediaLoader implementation.

func LogEmbeddingRequest added in v1.1.6

func LogEmbeddingRequest(provider, model string, textCount int, start time.Time)

LogEmbeddingRequest logs a completed embedding request with common fields.

func LogEmbeddingRequestWithTokens added in v1.1.6

func LogEmbeddingRequestWithTokens(provider, model string, textCount, tokens int, start time.Time)

LogEmbeddingRequestWithTokens logs a completed embedding request with token count.

func MarshalRequest added in v1.1.6

func MarshalRequest(req any) ([]byte, error)

MarshalRequest marshals a request body to JSON with standardized error handling.

func RegisterProviderFactory added in v1.1.0

func RegisterProviderFactory(providerType string, factory ProviderFactory)

RegisterProviderFactory registers a factory function for a provider type

func SetErrorResponse added in v1.1.0

func SetErrorResponse(predictResp *PredictionResponse, respBody []byte, start time.Time)

SetErrorResponse sets latency and raw body on error responses

func StringPtr added in v1.1.0

func StringPtr(s string) *string

StringPtr is a helper function that returns a pointer to a string. This is commonly used across provider implementations for optional fields.

func SupportsMultimodal added in v1.1.0

func SupportsMultimodal(p Provider) bool

SupportsMultimodal checks if a provider implements multimodal support

func UnmarshalJSON added in v1.1.0

func UnmarshalJSON(respBody []byte, v interface{}, predictResp *PredictionResponse, start time.Time) error

UnmarshalJSON unmarshals JSON with error recovery that sets latency and raw response

func UnmarshalResponse added in v1.1.6

func UnmarshalResponse(body []byte, resp any) error

UnmarshalResponse unmarshals a response body from JSON with standardized error handling.

func ValidateMultimodalMessage added in v1.1.0

func ValidateMultimodalMessage(p Provider, msg types.Message) error

ValidateMultimodalMessage checks if a message's multimodal content is supported by the provider

func ValidateMultimodalRequest added in v1.1.0

func ValidateMultimodalRequest(p MultimodalSupport, req PredictionRequest) error

ValidateMultimodalRequest validates all messages in a predict request for multimodal compatibility. This is a helper function to reduce duplication across provider implementations.

Types

type AudioStreamingCapabilities added in v1.1.0

type AudioStreamingCapabilities struct {
	// SupportedEncodings lists supported audio encodings
	// Common values: "pcm", "opus", "mp3", "aac"
	SupportedEncodings []string `json:"supported_encodings"`

	// SupportedSampleRates lists supported sample rates in Hz
	// Common values: 8000, 16000, 24000, 44100, 48000
	SupportedSampleRates []int `json:"supported_sample_rates"`

	// SupportedChannels lists supported channel counts
	// Common values: 1 (mono), 2 (stereo)
	SupportedChannels []int `json:"supported_channels"`

	// SupportedBitDepths lists supported bit depths
	// Common values: 16, 24, 32
	SupportedBitDepths []int `json:"supported_bit_depths,omitempty"`

	// PreferredEncoding is the recommended encoding for best quality/latency
	PreferredEncoding string `json:"preferred_encoding"`

	// PreferredSampleRate is the recommended sample rate
	PreferredSampleRate int `json:"preferred_sample_rate"`
}

AudioStreamingCapabilities describes audio streaming support.

type BaseEmbeddingProvider added in v1.1.6

type BaseEmbeddingProvider struct {
	ProviderModel string
	BaseURL       string
	APIKey        string
	HTTPClient    *http.Client
	Dimensions    int
	ProviderID    string
	BatchSize     int
}

BaseEmbeddingProvider provides common functionality for embedding providers. Embed this struct in provider-specific implementations to reduce duplication.

func NewBaseEmbeddingProvider added in v1.1.6

func NewBaseEmbeddingProvider(
	providerID, defaultModel, defaultBaseURL string,
	defaultDimensions, defaultBatchSize int,
	defaultTimeout time.Duration,
) *BaseEmbeddingProvider

NewBaseEmbeddingProvider creates a base embedding provider with defaults.

func (*BaseEmbeddingProvider) DoEmbeddingRequest added in v1.1.6

func (b *BaseEmbeddingProvider) DoEmbeddingRequest(
	ctx context.Context,
	cfg HTTPRequestConfig,
) ([]byte, error)

DoEmbeddingRequest performs a common HTTP POST request for embeddings. Returns the response body and any error.

func (*BaseEmbeddingProvider) EmbedWithEmptyCheck added in v1.1.6

func (b *BaseEmbeddingProvider) EmbedWithEmptyCheck(
	ctx context.Context,
	req EmbeddingRequest,
	embedFn EmbedFunc,
) (EmbeddingResponse, error)

EmbedWithEmptyCheck wraps embedding logic with empty request handling.

func (*BaseEmbeddingProvider) EmbeddingDimensions added in v1.1.6

func (b *BaseEmbeddingProvider) EmbeddingDimensions() int

EmbeddingDimensions returns the dimensionality of embedding vectors.

func (*BaseEmbeddingProvider) EmptyResponseForModel added in v1.1.6

func (b *BaseEmbeddingProvider) EmptyResponseForModel(model string) EmbeddingResponse

EmptyResponseForModel returns an empty EmbeddingResponse with the given model. Use this for handling empty input cases.

func (*BaseEmbeddingProvider) HandleEmptyRequest added in v1.1.6

func (b *BaseEmbeddingProvider) HandleEmptyRequest(
	req EmbeddingRequest,
) (EmbeddingResponse, bool)

HandleEmptyRequest checks if the request has no texts and returns early if so. Returns (response, true) if empty, (zero, false) if not empty.

func (*BaseEmbeddingProvider) ID added in v1.1.6

func (b *BaseEmbeddingProvider) ID() string

ID returns the provider identifier.

func (*BaseEmbeddingProvider) MaxBatchSize added in v1.1.6

func (b *BaseEmbeddingProvider) MaxBatchSize() int

MaxBatchSize returns the maximum texts per single API request.

func (*BaseEmbeddingProvider) Model added in v1.1.6

func (b *BaseEmbeddingProvider) Model() string

Model returns the current embedding model.

func (*BaseEmbeddingProvider) ResolveModel added in v1.1.6

func (b *BaseEmbeddingProvider) ResolveModel(reqModel string) string

ResolveModel returns the model to use, preferring the request model over the default.
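
Put together, a provider that embeds BaseEmbeddingProvider might implement Embed roughly as follows. The provider struct, wire format, and endpoint path are illustrative assumptions, not part of this package:

func (p *myEmbeddingProvider) Embed(ctx context.Context, req EmbeddingRequest) (EmbeddingResponse, error) {
    // myEmbeddingProvider is assumed to embed *BaseEmbeddingProvider.
    return p.EmbedWithEmptyCheck(ctx, req, func(ctx context.Context, texts []string, model string) (EmbeddingResponse, error) {
        start := time.Now()

        // Hypothetical upstream request body.
        body, err := MarshalRequest(map[string]any{"model": model, "input": texts})
        if err != nil {
            return EmbeddingResponse{}, err
        }

        respBody, err := p.DoEmbeddingRequest(ctx, HTTPRequestConfig{
            URL:       p.BaseURL + "/embeddings", // assumed endpoint path
            Body:      body,
            UseAPIKey: true,
        })
        if err != nil {
            return EmbeddingResponse{}, err
        }

        // Hypothetical upstream response shape.
        var parsed struct {
            Embeddings [][]float32 `json:"embeddings"`
        }
        if err := UnmarshalResponse(respBody, &parsed); err != nil {
            return EmbeddingResponse{}, err
        }

        LogEmbeddingRequest(p.ID(), model, len(texts), start)
        return EmbeddingResponse{Embeddings: parsed.Embeddings, Model: model}, nil
    })
}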

type BaseProvider added in v1.1.0

type BaseProvider struct {
	// contains filtered or unexported fields
}

BaseProvider provides common functionality shared across all provider implementations. It should be embedded in concrete provider structs to avoid code duplication.

func NewBaseProvider added in v1.1.0

func NewBaseProvider(id string, includeRawOutput bool, client *http.Client) BaseProvider

NewBaseProvider creates a new BaseProvider with common fields

func NewBaseProviderWithAPIKey added in v1.1.0

func NewBaseProviderWithAPIKey(id string, includeRawOutput bool, primaryKey, fallbackKey string) (provider BaseProvider, apiKey string)

NewBaseProviderWithAPIKey creates a BaseProvider and retrieves the API key from the environment. It tries the primary key first, then falls back to the secondary key if the primary is empty.

func (*BaseProvider) Close added in v1.1.0

func (b *BaseProvider) Close() error

Close closes the HTTP client's idle connections

func (*BaseProvider) GetHTTPClient added in v1.1.0

func (b *BaseProvider) GetHTTPClient() *http.Client

GetHTTPClient returns the underlying HTTP client for provider-specific use

func (*BaseProvider) ID added in v1.1.0

func (b *BaseProvider) ID() string

ID returns the provider ID

func (*BaseProvider) ShouldIncludeRawOutput added in v1.1.0

func (b *BaseProvider) ShouldIncludeRawOutput() bool

ShouldIncludeRawOutput returns whether to include raw API responses in output

func (*BaseProvider) SupportsStreaming added in v1.1.0

func (b *BaseProvider) SupportsStreaming() bool

SupportsStreaming returns true by default (can be overridden by providers that don't support streaming)

type EmbedFunc added in v1.1.6

type EmbedFunc func(ctx context.Context, texts []string, model string) (EmbeddingResponse, error)

EmbedFunc is the signature for provider-specific embedding logic.

type EmbeddingProvider added in v1.1.6

type EmbeddingProvider interface {
	// Embed generates embeddings for the given texts.
	// The response contains one embedding vector per input text, in the same order.
	// Implementations should handle batching internally if the request exceeds MaxBatchSize.
	Embed(ctx context.Context, req EmbeddingRequest) (EmbeddingResponse, error)

	// EmbeddingDimensions returns the dimensionality of embedding vectors.
	// Common values: 1536 (OpenAI ada-002/3-small), 768 (Gemini), 3072 (OpenAI 3-large)
	EmbeddingDimensions() int

	// MaxBatchSize returns the maximum number of texts per single API request.
	// Callers should batch requests appropriately, or rely on the provider
	// to handle splitting internally.
	MaxBatchSize() int

	// ID returns the provider identifier (e.g., "openai-embedding", "gemini-embedding")
	ID() string
}

EmbeddingProvider generates text embeddings for semantic similarity operations. Implementations exist for OpenAI, Gemini, and other embedding APIs.

Embeddings are dense vector representations of text that capture semantic meaning. Similar texts will have embeddings with high cosine similarity scores.

Example usage:

provider, _ := openai.NewEmbeddingProvider()
resp, err := provider.Embed(ctx, providers.EmbeddingRequest{
    Texts: []string{"Hello world", "Hi there"},
})
similarity := CosineSimilarity(resp.Embeddings[0], resp.Embeddings[1])

type EmbeddingRequest added in v1.1.6

type EmbeddingRequest struct {
	// Texts to embed (batched for efficiency)
	Texts []string

	// Model override for embedding model (optional, uses provider default if empty)
	Model string
}

EmbeddingRequest represents a request for text embeddings.

type EmbeddingResponse added in v1.1.6

type EmbeddingResponse struct {
	// Embeddings contains one vector per input text, in the same order
	Embeddings [][]float32

	// Model is the model that was used for embedding
	Model string

	// Usage contains token consumption information (optional)
	Usage *EmbeddingUsage
}

EmbeddingResponse contains the embedding vectors from a provider.

type EmbeddingUsage added in v1.1.6

type EmbeddingUsage struct {
	// TotalTokens is the total number of tokens processed
	TotalTokens int
}

EmbeddingUsage tracks token consumption for embedding requests.

type ExecutionResult

type ExecutionResult interface{}

ExecutionResult is a forward declaration to avoid circular import.

type HTTPRequestConfig added in v1.1.6

type HTTPRequestConfig struct {
	URL         string
	Body        []byte
	UseAPIKey   bool   // If true, adds Authorization: Bearer <APIKey> header
	ContentType string // Defaults to application/json
}

HTTPRequestConfig configures how to make an HTTP request.

type ImageDetail added in v1.1.0

type ImageDetail string

ImageDetail specifies the level of detail for image processing

const (
	ImageDetailLow  ImageDetail = "low"  // Faster, less detailed analysis
	ImageDetailHigh ImageDetail = "high" // Slower, more detailed analysis
	ImageDetailAuto ImageDetail = "auto" // Provider chooses automatically
)

Image detail levels for multimodal processing.

type MediaLoader added in v1.1.2

type MediaLoader struct {
	// contains filtered or unexported fields
}

MediaLoader handles loading media content from various sources (inline data, files, URLs, storage). It provides a unified interface for providers to access media regardless of the source.

func NewMediaLoader added in v1.1.2

func NewMediaLoader(config MediaLoaderConfig) *MediaLoader

NewMediaLoader creates a new MediaLoader with the given configuration.

func (*MediaLoader) GetBase64Data added in v1.1.2

func (ml *MediaLoader) GetBase64Data(ctx context.Context, media *types.MediaContent) (string, error)

GetBase64Data loads media content and returns it as base64-encoded data. It handles all media sources: inline data, file paths, URLs, and storage references.
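
A short sketch of loading a media part through MediaLoader (media is assumed to be a *types.MediaContent taken from a message part):

ml := NewMediaLoader(MediaLoaderConfig{
    HTTPTimeout:     30 * time.Second,
    MaxURLSizeBytes: 50 << 20, // 50 MB
})

b64, err := ml.GetBase64Data(ctx, media)
if err != nil {
    return err
}
// b64 now holds the base64 payload for the provider-specific request.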

type MediaLoaderConfig added in v1.1.2

type MediaLoaderConfig struct {
	// StorageService is optional - required only for loading from storage references
	StorageService storage.MediaStorageService

	// HTTPTimeout for URL fetching (default: 30s)
	HTTPTimeout time.Duration

	// MaxURLSizeBytes is the maximum size for URL-based media (default: 50MB)
	MaxURLSizeBytes int64
}

MediaLoaderConfig configures the MediaLoader behavior.

type MultimodalCapabilities added in v1.1.0

type MultimodalCapabilities struct {
	SupportsImages bool     // Provider can process image inputs
	SupportsAudio  bool     // Provider can process audio inputs
	SupportsVideo  bool     // Provider can process video inputs
	ImageFormats   []string // Supported image MIME types (e.g., "image/jpeg", "image/png")
	AudioFormats   []string // Supported audio MIME types (e.g., "audio/mpeg", "audio/wav")
	VideoFormats   []string // Supported video MIME types (e.g., "video/mp4")
	MaxImageSizeMB int      // Maximum image size in megabytes (0 = unlimited/unknown)
	MaxAudioSizeMB int      // Maximum audio size in megabytes (0 = unlimited/unknown)
	MaxVideoSizeMB int      // Maximum video size in megabytes (0 = unlimited/unknown)
}

MultimodalCapabilities describes what types of multimodal content a provider supports

type MultimodalSupport added in v1.1.0

type MultimodalSupport interface {
	Provider // Extends the base Provider interface

	// GetMultimodalCapabilities returns what types of multimodal content this provider supports
	GetMultimodalCapabilities() MultimodalCapabilities

	// PredictMultimodal performs a predict request with multimodal message content
	// Messages in the request can contain Parts with images, audio, or video
	PredictMultimodal(ctx context.Context, req PredictionRequest) (PredictionResponse, error)

	// PredictMultimodalStream performs a streaming predict request with multimodal content
	PredictMultimodalStream(ctx context.Context, req PredictionRequest) (<-chan StreamChunk, error)
}

MultimodalSupport interface for providers that support multimodal inputs

func GetMultimodalProvider added in v1.1.0

func GetMultimodalProvider(p Provider) MultimodalSupport

GetMultimodalProvider safely casts a provider to MultimodalSupport. It returns nil if the provider doesn't support multimodal input.
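
For example:

if mp := GetMultimodalProvider(p); mp != nil {
    resp, err := mp.PredictMultimodal(ctx, req)
    if err != nil {
        return err
    }
    fmt.Println(resp.Content)
}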

type MultimodalToolSupport added in v1.1.0

type MultimodalToolSupport interface {
	MultimodalSupport // Extends multimodal support
	ToolSupport       // Extends tool support

	// PredictMultimodalWithTools performs a predict request with both multimodal content and tools
	PredictMultimodalWithTools(ctx context.Context, req PredictionRequest, tools interface{}, toolChoice string) (PredictionResponse, []types.MessageToolCall, error)
}

MultimodalToolSupport interface for providers that support both multimodal and tools

type PredictionRequest added in v1.1.0

type PredictionRequest struct {
	System      string                 `json:"system"`
	Messages    []types.Message        `json:"messages"`
	Temperature float32                `json:"temperature"`
	TopP        float32                `json:"top_p"`
	MaxTokens   int                    `json:"max_tokens"`
	Seed        *int                   `json:"seed,omitempty"`
	Metadata    map[string]interface{} `json:"metadata,omitempty"` // Optional metadata for provider-specific context
}

PredictionRequest represents a request to a predict provider

type PredictionResponse added in v1.1.0

type PredictionResponse struct {
	Content    string                  `json:"content"`
	Parts      []types.ContentPart     `json:"parts,omitempty"`     // Multimodal content parts (text, image, audio, video)
	CostInfo   *types.CostInfo         `json:"cost_info,omitempty"` // Cost breakdown for this response (includes token counts)
	Latency    time.Duration           `json:"latency"`
	Raw        []byte                  `json:"raw,omitempty"`
	RawRequest interface{}             `json:"raw_request,omitempty"` // Raw API request (for debugging)
	ToolCalls  []types.MessageToolCall `json:"tool_calls,omitempty"`  // Tools called in this response
}

PredictionResponse represents a response from a predict provider

type Pricing

type Pricing struct {
	InputCostPer1K  float64
	OutputCostPer1K float64
}

Pricing defines cost per 1K tokens for input and output

type Provider

type Provider interface {
	ID() string

	Predict(ctx context.Context, req PredictionRequest) (PredictionResponse, error)

	// Streaming support
	PredictStream(ctx context.Context, req PredictionRequest) (<-chan StreamChunk, error)

	SupportsStreaming() bool

	ShouldIncludeRawOutput() bool

	Close() error // Close cleans up provider resources (e.g., HTTP connections)

	// CalculateCost calculates cost breakdown for given token counts
	CalculateCost(inputTokens, outputTokens, cachedTokens int) types.CostInfo
}

Provider interface defines the contract for predict providers

func CreateProviderFromSpec

func CreateProviderFromSpec(spec ProviderSpec) (Provider, error)

CreateProviderFromSpec creates a provider implementation from a spec. Returns an error if the provider type is unsupported.

type ProviderDefaults

type ProviderDefaults struct {
	Temperature float32
	TopP        float32
	MaxTokens   int
	Pricing     Pricing
}

ProviderDefaults holds default parameters for providers

type ProviderFactory added in v1.1.0

type ProviderFactory func(spec ProviderSpec) (Provider, error)

ProviderFactory is a function that creates a provider from a spec
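
A sketch of registering a custom factory (the "custom" type string and newCustomProvider constructor are hypothetical):

RegisterProviderFactory("custom", func(spec ProviderSpec) (Provider, error) {
    return newCustomProvider(spec.Model, spec.BaseURL, spec.Defaults)
})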

type ProviderSpec

type ProviderSpec struct {
	ID               string
	Type             string
	Model            string
	BaseURL          string
	Defaults         ProviderDefaults
	IncludeRawOutput bool
	AdditionalConfig map[string]interface{} // Flexible key-value pairs for provider-specific configuration
}

ProviderSpec holds the configuration needed to create a provider instance

type Registry

type Registry struct {
	// contains filtered or unexported fields
}

Registry manages available providers

func NewRegistry

func NewRegistry() *Registry

NewRegistry creates a new provider registry

func (*Registry) Close

func (r *Registry) Close() error

Close closes all registered providers and cleans up their resources. Returns the first error encountered, if any.

func (*Registry) Get

func (r *Registry) Get(id string) (Provider, bool)

Get retrieves a provider by ID, returning the provider and a boolean indicating if it was found.

func (*Registry) List

func (r *Registry) List() []string

List returns all registered provider IDs

func (*Registry) Register

func (r *Registry) Register(provider Provider)

Register adds a provider to the registry using its ID as the key.
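
A minimal registry lifecycle looks like this (p is any Provider implementation):

reg := NewRegistry()
defer reg.Close()

reg.Register(p)
if got, ok := reg.Get(p.ID()); ok {
    fmt.Println("registered:", got.ID())
}
fmt.Println(reg.List()) // all registered provider IDs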

type SSEScanner

type SSEScanner struct {
	// contains filtered or unexported fields
}

SSEScanner scans Server-Sent Events (SSE) streams

func NewSSEScanner

func NewSSEScanner(r io.Reader) *SSEScanner

NewSSEScanner creates a new SSE scanner

func (*SSEScanner) Data

func (s *SSEScanner) Data() string

Data returns the current event data

func (*SSEScanner) Err

func (s *SSEScanner) Err() error

Err returns any scanning error

func (*SSEScanner) Scan

func (s *SSEScanner) Scan() bool

Scan advances to the next SSE event
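
A typical consumption loop over an SSE response body:

scanner := NewSSEScanner(resp.Body)
for scanner.Scan() {
    data := scanner.Data()
    if data == "[DONE]" { // end-of-stream sentinel used by some providers; illustrative
        break
    }
    // unmarshal data into a provider-specific chunk struct
}
if err := scanner.Err(); err != nil {
    return err
}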

type StreamChunk

type StreamChunk struct {
	// Content is the accumulated content so far
	Content string `json:"content"`

	// Delta is the new content in this chunk
	Delta string `json:"delta"`

	// MediaDelta contains new media content in this chunk (audio, video, images)
	// Uses the same MediaContent type as non-streaming messages for API consistency.
	MediaDelta *types.MediaContent `json:"media_delta,omitempty"`

	// TokenCount is the total number of tokens so far
	TokenCount int `json:"token_count"`

	// DeltaTokens is the number of tokens in this delta
	DeltaTokens int `json:"delta_tokens"`

	// ToolCalls contains accumulated tool calls (for assistant messages that invoke tools)
	ToolCalls []types.MessageToolCall `json:"tool_calls,omitempty"`

	// FinishReason is nil until stream is complete
	// Values: "stop", "length", "content_filter", "tool_calls", "error", "validation_failed", "cancelled"
	FinishReason *string `json:"finish_reason,omitempty"`

	// Interrupted indicates the response was interrupted (e.g., user started speaking)
	// When true, clients should clear any buffered audio and prepare for a new response
	Interrupted bool `json:"interrupted,omitempty"`

	// Error is set if an error occurred during streaming
	Error error `json:"error,omitempty"`

	// Metadata contains provider-specific metadata
	Metadata map[string]interface{} `json:"metadata,omitempty"`

	// FinalResult contains the complete execution result (only set in the final chunk)
	FinalResult ExecutionResult `json:"final_result,omitempty"`

	// CostInfo contains cost breakdown (only present in final chunk when FinishReason != nil)
	CostInfo *types.CostInfo `json:"cost_info,omitempty"`
}

StreamChunk represents a batch of tokens with metadata
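
Consuming a streaming prediction typically looks like this:

chunks, err := p.PredictStream(ctx, req)
if err != nil {
    return err
}
for chunk := range chunks {
    if chunk.Error != nil {
        return chunk.Error
    }
    fmt.Print(chunk.Delta)
    if chunk.FinishReason != nil {
        // Final chunk: CostInfo (and FinalResult, if set) arrive here.
        break
    }
}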

type StreamEvent

type StreamEvent struct {
	// Type is the event type: "chunk", "complete", "error"
	Type string `json:"type"`

	// Chunk contains the stream chunk data
	Chunk *StreamChunk `json:"chunk,omitempty"`

	// Error is set for error events
	Error error `json:"error,omitempty"`

	// Timestamp is when the event occurred
	Timestamp time.Time `json:"timestamp"`
}

StreamEvent is sent to observers for monitoring

type StreamInputSession added in v1.1.0

type StreamInputSession interface {
	// SendChunk sends a media chunk to the provider.
	// Returns an error if the chunk cannot be sent or the session is closed.
	// This method is safe to call from multiple goroutines.
	SendChunk(ctx context.Context, chunk *types.MediaChunk) error

	// SendText sends a text message to the provider during the streaming session.
	// This is useful for sending text prompts or instructions during audio streaming.
	// Note: This marks the turn as complete, triggering a response.
	SendText(ctx context.Context, text string) error

	// SendSystemContext sends a text message as context without completing the turn.
	// Use this for system prompts that provide context but shouldn't trigger an immediate response.
	// The audio/text that follows will be processed with this context in mind.
	SendSystemContext(ctx context.Context, text string) error

	// Response returns a receive-only channel for streaming responses.
	// The channel is closed when the session ends or encounters an error.
	// Consumers should read from this channel in a separate goroutine.
	Response() <-chan StreamChunk

	// Close ends the streaming session and releases resources.
	// After calling Close, SendChunk and SendText will return errors.
	// The Response channel will be closed.
	// Close is safe to call multiple times.
	Close() error

	// Error returns any error that occurred during the session.
	// Returns nil if no error has occurred.
	Error() error

	// Done returns a channel that's closed when the session ends.
	// This is useful for select statements to detect session completion.
	Done() <-chan struct{}
}

StreamInputSession manages a bidirectional streaming session with a provider. The session allows sending media chunks (e.g., audio from a microphone) and receiving streaming responses from the LLM.

Example usage:

session, err := provider.CreateStreamSession(ctx, &StreamingInputConfig{
    Config: types.StreamingMediaConfig{
        Type:       types.ContentTypeAudio,
        ChunkSize:  8192,
        SampleRate: 16000,
        Encoding:   "pcm",
        Channels:   1,
    },
})
if err != nil {
    return err
}
defer session.Close()

// Send audio chunks in a goroutine
go func() {
    for chunk := range micInput {
        if err := session.SendChunk(ctx, chunk); err != nil {
            log.Printf("send error: %v", err)
            break
        }
    }
}()

// Receive responses
for chunk := range session.Response() {
    if chunk.Error != nil {
        log.Printf("response error: %v", chunk.Error)
        break
    }
    fmt.Print(chunk.Delta)
}

type StreamInputSupport added in v1.1.0

type StreamInputSupport interface {
	Provider // Extends the base Provider interface

	// CreateStreamSession creates a new bidirectional streaming session.
	// The session remains active until Close() is called or an error occurs.
	// Returns an error if the provider doesn't support the requested media type.
	CreateStreamSession(ctx context.Context, req *StreamingInputConfig) (StreamInputSession, error)

	// SupportsStreamInput returns the media types supported for streaming input.
	// Common values: types.ContentTypeAudio, types.ContentTypeVideo
	SupportsStreamInput() []string

	// GetStreamingCapabilities returns detailed information about streaming support.
	// This includes supported codecs, sample rates, and other constraints.
	GetStreamingCapabilities() StreamingCapabilities
}

StreamInputSupport extends the Provider interface for bidirectional streaming. Providers that implement this interface can handle streaming media input (e.g., real-time audio) and provide streaming responses.

type StreamObserver

type StreamObserver interface {
	OnChunk(chunk StreamChunk)
	OnComplete(totalTokens int, duration time.Duration)
	OnError(err error)
}

StreamObserver receives stream events for monitoring

type StreamingCapabilities added in v1.1.0

type StreamingCapabilities struct {
	// SupportedMediaTypes lists the media types that can be streamed
	// Values: types.ContentTypeAudio, types.ContentTypeVideo
	SupportedMediaTypes []string `json:"supported_media_types"`

	// Audio capabilities
	Audio *AudioStreamingCapabilities `json:"audio,omitempty"`

	// Video capabilities
	Video *VideoStreamingCapabilities `json:"video,omitempty"`

	// BidirectionalSupport indicates if the provider supports full bidirectional streaming
	BidirectionalSupport bool `json:"bidirectional_support"`

	// MaxSessionDuration is the maximum duration for a streaming session (in seconds)
	// Zero means no limit
	MaxSessionDuration int `json:"max_session_duration,omitempty"`

	// MinChunkSize is the minimum chunk size in bytes
	MinChunkSize int `json:"min_chunk_size,omitempty"`

	// MaxChunkSize is the maximum chunk size in bytes
	MaxChunkSize int `json:"max_chunk_size,omitempty"`
}

StreamingCapabilities describes what streaming features a provider supports.

type StreamingInputConfig added in v1.1.6

type StreamingInputConfig struct {
	// Config specifies the media streaming configuration (codec, sample rate, etc.)
	Config types.StreamingMediaConfig `json:"config"`

	// SystemInstruction is the system prompt to configure the model's behavior.
	// For Gemini Live API, this is included in the setup message.
	SystemInstruction string `json:"system_instruction,omitempty"`

	// Tools defines functions the model can call during the session.
	// When configured, the model returns structured tool calls instead of
	// speaking them as text. Supported by Gemini Live API.
	Tools []StreamingToolDefinition `json:"tools,omitempty"`

	// Metadata contains provider-specific session configuration
	// Example: {"response_modalities": ["TEXT", "AUDIO"]} for Gemini
	Metadata map[string]interface{} `json:"metadata,omitempty"`
}

StreamingInputConfig configures a new streaming input session.

func (*StreamingInputConfig) Validate added in v1.1.6

func (r *StreamingInputConfig) Validate() error

Validate checks if the StreamingInputConfig is valid.

type StreamingToolDefinition added in v1.1.6

type StreamingToolDefinition struct {
	Name        string                 `json:"name"`
	Description string                 `json:"description,omitempty"`
	Parameters  map[string]interface{} `json:"parameters,omitempty"` // JSON Schema
}

StreamingToolDefinition represents a function/tool available in streaming sessions.

type ToolDescriptor

type ToolDescriptor struct {
	Name         string          `json:"name"`
	Description  string          `json:"description"`
	InputSchema  json.RawMessage `json:"input_schema"`
	OutputSchema json.RawMessage `json:"output_schema"`
}

ToolDescriptor represents a tool that can be used by providers

type ToolResponse added in v1.1.6

type ToolResponse struct {
	ToolCallID string `json:"tool_call_id"`
	Result     string `json:"result"`
	IsError    bool   `json:"is_error,omitempty"` // True if the tool execution failed
}

ToolResponse represents a single tool execution result.

type ToolResponseSupport added in v1.1.6

type ToolResponseSupport interface {
	// SendToolResponse sends the result of a tool execution back to the model.
	// The toolCallID must match the ID from the MessageToolCall.
	// The result is typically JSON-encoded but the format depends on the tool.
	// After receiving the tool response, the model will continue generating.
	SendToolResponse(ctx context.Context, toolCallID string, result string) error

	// SendToolResponses sends multiple tool results at once (for parallel tool calls).
	// This is more efficient than sending individual responses for providers that
	// support batched tool responses.
	SendToolResponses(ctx context.Context, responses []ToolResponse) error
}

ToolResponseSupport is an optional interface for streaming sessions that support tool calling. When the model returns a tool call, the caller can execute the tool and send the result back using this interface. The session will then continue generating a response based on the tool result.

Use type assertion to check if a StreamInputSession supports this interface:

if toolSession, ok := session.(ToolResponseSupport); ok {
    err := toolSession.SendToolResponse(ctx, toolCallID, result)
}

type ToolResult

type ToolResult = types.MessageToolResult

ToolResult represents the result of a tool execution. This is an alias to types.MessageToolResult for provider-specific context.

type ToolSupport

type ToolSupport interface {
	Provider // Extends the base Provider interface

	// BuildTooling converts tool descriptors to provider-native format
	BuildTooling(descriptors []*ToolDescriptor) (interface{}, error)

	// PredictWithTools performs a predict request with tool support
	PredictWithTools(ctx context.Context, req PredictionRequest, tools interface{}, toolChoice string) (PredictionResponse, []types.MessageToolCall, error)

	// PredictStreamWithTools performs a streaming predict request with tool support
	PredictStreamWithTools(
		ctx context.Context,
		req PredictionRequest,
		tools interface{},
		toolChoice string,
	) (<-chan StreamChunk, error)
}

ToolSupport interface for providers that support tool/function calling
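
A sketch of the tool-calling round trip (the descriptor content and the "auto" tool choice value are illustrative assumptions):

descriptors := []*ToolDescriptor{{
    Name:        "get_weather",
    Description: "Look up the current weather for a city",
    InputSchema: json.RawMessage(`{"type":"object","properties":{"city":{"type":"string"}}}`),
}}

tools, err := tp.BuildTooling(descriptors) // tp implements ToolSupport
if err != nil {
    return err
}

resp, toolCalls, err := tp.PredictWithTools(ctx, req, tools, "auto")
if err != nil {
    return err
}
for _, call := range toolCalls {
    // Execute the tool and send the result back in a follow-up request.
    _ = call
}
fmt.Println(resp.Content)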

type UnsupportedContentError added in v1.1.0

type UnsupportedContentError struct {
	Provider    string // Provider ID
	ContentType string // "image", "audio", "video", or "multimodal"
	Message     string // Human-readable error message
	PartIndex   int    // Index of the unsupported content part (if applicable)
	MIMEType    string // Specific MIME type that's unsupported (if applicable)
}

UnsupportedContentError is returned when a provider doesn't support certain content types

func (*UnsupportedContentError) Error added in v1.1.0

func (e *UnsupportedContentError) Error() string

type UnsupportedProviderError

type UnsupportedProviderError struct {
	ProviderType string
}

UnsupportedProviderError is returned when a provider type is not recognized

func (*UnsupportedProviderError) Error

func (e *UnsupportedProviderError) Error() string

Error returns the error message for this unsupported provider error.

type ValidationAbortError

type ValidationAbortError struct {
	Reason string
	Chunk  StreamChunk
}

ValidationAbortError is returned when a streaming validator aborts a stream

func (*ValidationAbortError) Error

func (e *ValidationAbortError) Error() string

Error returns the error message for this validation abort error.

type VideoResolution added in v1.1.0

type VideoResolution struct {
	Width  int `json:"width"`
	Height int `json:"height"`
}

VideoResolution represents a video resolution.

func (VideoResolution) String added in v1.1.0

func (r VideoResolution) String() string

String returns a string representation of the resolution (e.g., "1920x1080")

type VideoStreamingCapabilities added in v1.1.0

type VideoStreamingCapabilities struct {
	// SupportedEncodings lists supported video encodings
	// Common values: "h264", "vp8", "vp9", "av1"
	SupportedEncodings []string `json:"supported_encodings"`

	// SupportedResolutions lists supported resolutions (width x height)
	SupportedResolutions []VideoResolution `json:"supported_resolutions"`

	// SupportedFrameRates lists supported frame rates
	// Common values: 15, 24, 30, 60
	SupportedFrameRates []int `json:"supported_frame_rates"`

	// PreferredEncoding is the recommended encoding
	PreferredEncoding string `json:"preferred_encoding"`

	// PreferredResolution is the recommended resolution
	PreferredResolution VideoResolution `json:"preferred_resolution"`

	// PreferredFrameRate is the recommended frame rate
	PreferredFrameRate int `json:"preferred_frame_rate"`
}

VideoStreamingCapabilities describes video streaming support.

Directories

Path       Synopsis
claude     Package claude provides Anthropic Claude LLM provider integration.
gemini     Package gemini provides Gemini Live API streaming support.
imagen     Package imagen provides Google Imagen image generation provider integration.
mock       Package mock provides mock provider implementation for testing and development.
openai     Package openai provides OpenAI LLM provider integration.
replay     Package replay provides a provider that replays recorded sessions deterministically.
voyageai   Package voyageai provides embedding generation via the Voyage AI API.
