Documentation ¶
Index ¶
- Constants
- func CountMediaParts(msg Message) int
- func CountPartsByType(msg Message, contentType string) int
- func ExtractTextContent(msg Message) string
- func HasOnlyTextContent(msg Message) bool
- func MigrateMessagesToLegacy(messages []Message) error
- func MigrateMessagesToMultimodal(messages []Message)
- func MigrateToLegacy(msg *Message) error
- func MigrateToMultimodal(msg *Message)
- type ChunkReader
- type ChunkWriter
- type ContentPart
- func NewAudioPart(filePath string) (ContentPart, error)
- func NewAudioPartFromData(base64Data, mimeType string) ContentPart
- func NewImagePart(filePath string, detail *string) (ContentPart, error)
- func NewImagePartFromData(base64Data, mimeType string, detail *string) ContentPart
- func NewImagePartFromURL(url string, detail *string) ContentPart
- func NewTextPart(text string) ContentPart
- func NewVideoPart(filePath string) (ContentPart, error)
- func NewVideoPartFromData(base64Data, mimeType string) ContentPart
- func SplitMultimodalMessage(msg Message) (text string, mediaParts []ContentPart)
- type CostInfo
- type MediaChunk
- type MediaContent
- type MediaItemSummary
- type MediaSummary
- type Message
- func (m *Message) AddAudioPart(filePath string) error
- func (m *Message) AddImagePart(filePath string, detail *string) error
- func (m *Message) AddImagePartFromURL(url string, detail *string)
- func (m *Message) AddPart(part ContentPart)
- func (m *Message) AddTextPart(text string)
- func (m *Message) AddVideoPart(filePath string) error
- func (m *Message) GetContent() string
- func (m *Message) HasMediaContent() bool
- func (m *Message) IsMultimodal() bool
- func (m Message) MarshalJSON() ([]byte, error)
- func (m *Message) SetMultimodalContent(parts []ContentPart)
- func (m *Message) SetTextContent(text string)
- type MessageToolCall
- type MessageToolResult
- type StreamingMediaConfig
- type ToolDef
- type ToolStats
- type ValidationError
- type ValidationResult
Constants ¶
const (
	ContentTypeText  = "text"
	ContentTypeImage = "image"
	ContentTypeAudio = "audio"
	ContentTypeVideo = "video"
)
ContentType constants for the different content part types.
const (
	MIMETypeImageJPEG = "image/jpeg"
	MIMETypeImagePNG  = "image/png"
	MIMETypeImageGIF  = "image/gif"
	MIMETypeImageWebP = "image/webp"
	MIMETypeAudioMP3  = "audio/mpeg"
	MIMETypeAudioWAV  = "audio/wav"
	MIMETypeAudioOgg  = "audio/ogg"
	MIMETypeAudioWebM = "audio/webm"
	MIMETypeVideoMP4  = "video/mp4"
	MIMETypeVideoWebM = "video/webm"
	MIMETypeVideoOgg  = "video/ogg"
)
Common MIME types.
Variables ¶
This section is empty.
Functions ¶
func CountMediaParts ¶ added in v1.1.0
CountMediaParts returns the number of media parts (image, audio, video) in a message
func CountPartsByType ¶ added in v1.1.0
CountPartsByType returns the number of parts of a specific type in a message
func ExtractTextContent ¶ added in v1.1.0
ExtractTextContent extracts all text content from a message, regardless of format. This is useful for backward compatibility when you need just the text.
func HasOnlyTextContent ¶ added in v1.1.0
HasOnlyTextContent returns true if the message contains only text (no media)
func MigrateMessagesToLegacy ¶ added in v1.1.0
MigrateMessagesToLegacy converts a slice of multimodal messages to legacy format in-place. Returns an error if any message contains media content.
func MigrateMessagesToMultimodal ¶ added in v1.1.0
func MigrateMessagesToMultimodal(messages []Message)
MigrateMessagesToMultimodal converts a slice of legacy messages to multimodal format in-place
func MigrateToLegacy ¶ added in v1.1.0
MigrateToLegacy converts a multimodal message back to legacy text-only format. This is useful for backward compatibility with systems that don't support multimodal. Returns an error if the message contains non-text content.
func MigrateToMultimodal ¶ added in v1.1.0
func MigrateToMultimodal(msg *Message)
MigrateToMultimodal converts a legacy text-only message to use the Parts structure. This is useful when transitioning existing code to the new multimodal API.
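The round trip between the legacy and multimodal formats can be sketched as follows. This is not the package's implementation: `message`, `contentPart`, `toMultimodal`, and `toLegacy` are hypothetical local stand-ins that only mirror the documented behavior. Clearing `Content` after migration and joining text parts without a separator are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// contentPart and message are simplified stand-ins for the documented
// ContentPart and Message types; they are not the package's definitions.
type contentPart struct {
	Type string
	Text *string
}

type message struct {
	Role    string
	Content string
	Parts   []contentPart
}

// toMultimodal moves legacy Content into a single text part (assumed behavior).
func toMultimodal(m *message) {
	if len(m.Parts) > 0 || m.Content == "" {
		return // already multimodal, or nothing to migrate
	}
	text := m.Content
	m.Parts = []contentPart{{Type: "text", Text: &text}}
	m.Content = ""
}

// toLegacy collapses text parts back into Content and fails on any media part.
// The message is left untouched when an error is returned.
func toLegacy(m *message) error {
	var sb strings.Builder
	for _, p := range m.Parts {
		if p.Type != "text" {
			return errors.New("message contains non-text content: " + p.Type)
		}
		if p.Text != nil {
			sb.WriteString(*p.Text)
		}
	}
	if len(m.Parts) > 0 {
		m.Content = sb.String()
		m.Parts = nil
	}
	return nil
}

func main() {
	m := message{Role: "user", Content: "hello"}
	toMultimodal(&m)
	fmt.Println(len(m.Parts), m.Content == "")

	if err := toLegacy(&m); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(m.Content)
}
```

Checking every part before mutating anything keeps a failed legacy migration from leaving a message half-converted.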
Types ¶
type ChunkReader ¶ added in v1.1.0
type ChunkReader struct {
// contains filtered or unexported fields
}
ChunkReader reads from an io.Reader and produces MediaChunks. Useful for converting continuous streams (e.g., microphone input) into chunks.
Example usage:
reader := NewChunkReader(micInput, config)
for {
	chunk, err := reader.NextChunk(ctx)
	if err == io.EOF {
		break
	}
	if err != nil {
		return err
	}
	session.SendChunk(ctx, chunk)
}
func NewChunkReader ¶ added in v1.1.0
func NewChunkReader(r io.Reader, config StreamingMediaConfig) *ChunkReader
NewChunkReader creates a new ChunkReader that reads from the given reader and produces MediaChunks according to the config.
func (*ChunkReader) NextChunk ¶ added in v1.1.0
func (cr *ChunkReader) NextChunk(ctx context.Context) (*MediaChunk, error)
NextChunk reads the next chunk from the reader. Returns io.EOF when the stream is complete. The returned chunk's IsLast field will be true on the final chunk.
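To make the `IsLast` and `io.EOF` contract concrete, here is a minimal sketch of a fixed-size chunker built only on the standard library. `fixedChunker`, `chunk`, and `chunkString` are hypothetical names, context handling is omitted, and the real ChunkReader may well work differently.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// chunk is a trimmed-down stand-in for the documented MediaChunk.
type chunk struct {
	Data        []byte
	SequenceNum int64
	IsLast      bool
}

// fixedChunker is a hypothetical reader that cuts a stream into fixed-size chunks.
type fixedChunker struct {
	r    *bufio.Reader
	size int
	seq  int64
	done bool
}

func newFixedChunker(r io.Reader, size int) *fixedChunker {
	if size <= 0 {
		size = 4096 // fallback so that reads always make progress
	}
	return &fixedChunker{r: bufio.NewReader(r), size: size}
}

// next returns the next chunk, or io.EOF once the stream is exhausted.
func (c *fixedChunker) next() (*chunk, error) {
	if c.done {
		return nil, io.EOF
	}
	buf := make([]byte, c.size)
	n, err := io.ReadFull(c.r, buf)
	if n == 0 {
		// Nothing left (io.EOF) or a read failure before any data arrived.
		c.done = true
		return nil, err
	}
	if err != nil && err != io.ErrUnexpectedEOF {
		return nil, err
	}
	// Peek one byte ahead to learn whether more data follows this chunk.
	// Any peek failure is treated as end of stream in this sketch.
	_, peekErr := c.r.Peek(1)
	out := &chunk{Data: buf[:n], SequenceNum: c.seq, IsLast: peekErr != nil}
	c.seq++
	c.done = out.IsLast
	return out, nil
}

// chunkString is a small driver that chunks an in-memory string.
func chunkString(s string, size int) ([]chunk, error) {
	c := newFixedChunker(strings.NewReader(s), size)
	var out []chunk
	for {
		ch, err := c.next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, *ch)
	}
}

func main() {
	chunks, err := chunkString("abcdefghij", 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ch := range chunks {
		fmt.Printf("%d %q last=%v\n", ch.SequenceNum, ch.Data, ch.IsLast)
	}
}
```

The one-byte lookahead is what allows the final chunk to carry `IsLast`. On a live source such as a microphone it would block until the next byte arrives, so a real implementation might prefer to signal the end differently.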
type ChunkWriter ¶ added in v1.1.0
type ChunkWriter struct {
// contains filtered or unexported fields
}
ChunkWriter writes MediaChunks to an io.Writer. Useful for converting chunks back into continuous streams (e.g., speaker output).
Example usage:
writer := NewChunkWriter(speakerOutput)
for chunk := range session.Response() {
	if chunk.MediaDelta != nil {
		// WriteChunk returns (int, error); the byte count is not needed here.
		_, err := writer.WriteChunk(chunk.MediaDelta)
		if err != nil {
			return err
		}
	}
}
func NewChunkWriter ¶ added in v1.1.0
func NewChunkWriter(w io.Writer) *ChunkWriter
NewChunkWriter creates a new ChunkWriter that writes to the given writer.
func (*ChunkWriter) Flush ¶ added in v1.1.0
func (cw *ChunkWriter) Flush() error
Flush flushes any buffered data to the underlying writer (if it supports flushing).
func (*ChunkWriter) WriteChunk ¶ added in v1.1.0
func (cw *ChunkWriter) WriteChunk(chunk *MediaChunk) (int, error)
WriteChunk writes a MediaChunk to the underlying writer. Returns the number of bytes written and any error encountered.
type ContentPart ¶ added in v1.1.0
type ContentPart struct {
Type string `json:"type"` // "text", "image", "audio", "video"
// For text content
Text *string `json:"text,omitempty"`
// For media content (image, audio, video)
Media *MediaContent `json:"media,omitempty"`
}
ContentPart represents a single piece of content in a multimodal message. A message can contain multiple parts: text, images, audio, video, etc.
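The struct tags above determine how a part looks on the wire. The sketch below copies the documented tags into local stand-in types (`contentPart`, and `mediaContent` with only a subset of the MediaContent fields) and marshals a text part and an image part. It assumes ContentPart has no custom marshaler; `encode` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mediaContent copies a subset of the documented MediaContent fields and tags.
type mediaContent struct {
	URL      *string `json:"url,omitempty"`
	MIMEType string  `json:"mime_type"`
	Detail   *string `json:"detail,omitempty"`
}

// contentPart copies the documented ContentPart fields and tags.
type contentPart struct {
	Type  string        `json:"type"`
	Text  *string       `json:"text,omitempty"`
	Media *mediaContent `json:"media,omitempty"`
}

// encode marshals a part and returns the JSON as a string.
func encode(p contentPart) string {
	b, err := json.Marshal(p)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	text := "Describe this image"
	url := "https://example.com/cat.png"

	// Prints {"type":"text","text":"Describe this image"}
	fmt.Println(encode(contentPart{Type: "text", Text: &text}))

	// Prints {"type":"image","media":{"url":"https://example.com/cat.png","mime_type":"image/png"}}
	fmt.Println(encode(contentPart{
		Type:  "image",
		Media: &mediaContent{URL: &url, MIMEType: "image/png"},
	}))
}
```

Because `Text` and `Media` are pointers tagged `omitempty`, whichever one is nil disappears from the output, so each part serializes only the field that matches its type.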
func NewAudioPart ¶ added in v1.1.0
func NewAudioPart(filePath string) (ContentPart, error)
NewAudioPart creates a ContentPart with audio content from a file path
func NewAudioPartFromData ¶ added in v1.1.0
func NewAudioPartFromData(base64Data, mimeType string) ContentPart
NewAudioPartFromData creates a ContentPart with base64-encoded audio data
func NewImagePart ¶ added in v1.1.0
func NewImagePart(filePath string, detail *string) (ContentPart, error)
NewImagePart creates a ContentPart with image content from a file path
func NewImagePartFromData ¶ added in v1.1.0
func NewImagePartFromData(base64Data, mimeType string, detail *string) ContentPart
NewImagePartFromData creates a ContentPart with base64-encoded image data
func NewImagePartFromURL ¶ added in v1.1.0
func NewImagePartFromURL(url string, detail *string) ContentPart
NewImagePartFromURL creates a ContentPart with image content from a URL
func NewTextPart ¶ added in v1.1.0
func NewTextPart(text string) ContentPart
NewTextPart creates a ContentPart with text content
func NewVideoPart ¶ added in v1.1.0
func NewVideoPart(filePath string) (ContentPart, error)
NewVideoPart creates a ContentPart with video content from a file path
func NewVideoPartFromData ¶ added in v1.1.0
func NewVideoPartFromData(base64Data, mimeType string) ContentPart
NewVideoPartFromData creates a ContentPart with base64-encoded video data
func SplitMultimodalMessage ¶ added in v1.1.0
func SplitMultimodalMessage(msg Message) (text string, mediaParts []ContentPart)
SplitMultimodalMessage splits a multimodal message into separate text and media parts. Returns the text content and a slice of media content parts.
func (*ContentPart) Validate ¶ added in v1.1.0
func (cp *ContentPart) Validate() error
Validate checks if the ContentPart is valid
type CostInfo ¶
type CostInfo struct {
InputTokens int `json:"input_tokens"` // Number of input tokens consumed
OutputTokens int `json:"output_tokens"` // Number of output tokens generated
CachedTokens int `json:"cached_tokens,omitempty"` // Number of cached tokens used (reduces cost)
InputCostUSD float64 `json:"input_cost_usd"` // Cost of input tokens in USD
OutputCostUSD float64 `json:"output_cost_usd"` // Cost of output tokens in USD
CachedCostUSD float64 `json:"cached_cost_usd,omitempty"` // Cost savings from cached tokens
TotalCost float64 `json:"total_cost_usd"` // Total cost in USD
}
CostInfo tracks token usage and associated costs for LLM operations. All cost values are in USD. Used for both individual messages and aggregated tracking.
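For the aggregated-tracking case, one plausible approach is to sum the per-message values field by field. The sketch below uses a local `costInfo` stand-in and a hypothetical `addCosts` helper; the package itself may aggregate differently.

```go
package main

import "fmt"

// costInfo mirrors the documented CostInfo fields (JSON tags omitted).
type costInfo struct {
	InputTokens, OutputTokens, CachedTokens               int
	InputCostUSD, OutputCostUSD, CachedCostUSD, TotalCost float64
}

// addCosts is a hypothetical helper that sums per-message costs field by field.
func addCosts(items ...costInfo) costInfo {
	var total costInfo
	for _, c := range items {
		total.InputTokens += c.InputTokens
		total.OutputTokens += c.OutputTokens
		total.CachedTokens += c.CachedTokens
		total.InputCostUSD += c.InputCostUSD
		total.OutputCostUSD += c.OutputCostUSD
		total.CachedCostUSD += c.CachedCostUSD
		total.TotalCost += c.TotalCost
	}
	return total
}

func main() {
	first := costInfo{InputTokens: 100, OutputTokens: 50, InputCostUSD: 0.25, OutputCostUSD: 0.5, TotalCost: 0.75}
	second := costInfo{InputTokens: 200, OutputTokens: 25, InputCostUSD: 0.5, OutputCostUSD: 0.25, TotalCost: 0.75}

	sum := addCosts(first, second)
	fmt.Printf("tokens in=%d out=%d total=$%.2f\n", sum.InputTokens, sum.OutputTokens, sum.TotalCost)
}
```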
type MediaChunk ¶ added in v1.1.0
type MediaChunk struct {
// Data contains the raw media bytes for this chunk
Data []byte `json:"data"`
// SequenceNum is the sequence number for ordering chunks (starts at 0)
SequenceNum int64 `json:"sequence_num"`
// Timestamp indicates when this chunk was created
Timestamp time.Time `json:"timestamp"`
// IsLast indicates if this is the final chunk in the stream
IsLast bool `json:"is_last"`
// Metadata contains chunk-specific metadata (MIME type, encoding, etc.)
Metadata map[string]string `json:"metadata,omitempty"`
}
MediaChunk represents a chunk of streaming media data. Used for bidirectional streaming where media is sent or received in chunks.
Example usage:
chunk := &MediaChunk{
Data: audioData,
SequenceNum: 1,
Timestamp: time.Now(),
IsLast: false,
Metadata: map[string]string{"mime_type": "audio/pcm"},
}
type MediaContent ¶ added in v1.1.0
type MediaContent struct {
// Data source - exactly one should be set
Data *string `json:"data,omitempty"` // Base64-encoded media data
FilePath *string `json:"file_path,omitempty"` // Local file path
URL *string `json:"url,omitempty"` // External URL (http/https)
// Storage backend reference (used when media is externalized)
StorageReference *string `json:"storage_reference,omitempty"` // Backend-specific storage reference
// Media metadata
MIMEType string `json:"mime_type"` // e.g., "image/jpeg", "audio/mpeg", "video/mp4"
Format *string `json:"format,omitempty"` // Optional format hint (e.g., "png", "mp3", "mp4")
SizeKB *int64 `json:"size_kb,omitempty"` // Optional size in kilobytes
Detail *string `json:"detail,omitempty"` // Optional detail level for images: "low", "high", "auto"
Caption *string `json:"caption,omitempty"` // Optional caption/description
Duration *int `json:"duration,omitempty"` // Optional duration in seconds (for audio/video)
BitRate *int `json:"bit_rate,omitempty"` // Optional bit rate in kbps (for audio/video)
Channels *int `json:"channels,omitempty"` // Optional number of channels (for audio)
Width *int `json:"width,omitempty"` // Optional width in pixels (for image/video)
Height *int `json:"height,omitempty"` // Optional height in pixels (for image/video)
FPS *int `json:"fps,omitempty"` // Optional frames per second (for video)
PolicyName *string `json:"policy_name,omitempty"` // Retention policy name
}
MediaContent represents media data (image, audio, video) in a message. Supports both inline base64 data and external file/URL references.
func (*MediaContent) GetBase64Data ¶ deprecated, added in v1.1.0
func (mc *MediaContent) GetBase64Data() (string, error)
GetBase64Data returns the base64-encoded data for this media content. If the data is already base64-encoded, it returns it directly. If the data is from a file, it reads and encodes the file. If the data is from a URL or StorageReference, it returns an error (caller should use MediaLoader).
Deprecated: For new code, use providers.MediaLoader.GetBase64Data which supports all sources including storage references and URLs with proper context handling.
func (*MediaContent) ReadData ¶ added in v1.1.0
func (mc *MediaContent) ReadData() (io.ReadCloser, error)
ReadData returns an io.ReadCloser for the media content; the caller should close it when done. For base64 data, it decodes and returns a reader. For file paths, it opens and returns the file. For URLs, it returns an error (caller should fetch separately).
func (*MediaContent) Validate ¶ added in v1.1.0
func (mc *MediaContent) Validate() error
Validate checks if the MediaContent is valid
type MediaItemSummary ¶ added in v1.1.0
type MediaItemSummary struct {
Type string `json:"type"` // Content type: "image", "audio", "video"
Source string `json:"source"` // Source description (file path, URL, or "inline data")
MIMEType string `json:"mime_type"` // MIME type
SizeBytes int `json:"size_bytes"` // Size in bytes (0 if unknown)
Detail string `json:"detail,omitempty"` // Detail level for images
Loaded bool `json:"loaded"` // Whether media was successfully loaded
Error string `json:"error,omitempty"` // Error message if loading failed
}
MediaItemSummary provides details about a single media item in a message.
type MediaSummary ¶ added in v1.1.0
type MediaSummary struct {
TotalParts int `json:"total_parts"` // Total number of content parts
TextParts int `json:"text_parts"` // Number of text parts
ImageParts int `json:"image_parts"` // Number of image parts
AudioParts int `json:"audio_parts"` // Number of audio parts
VideoParts int `json:"video_parts"` // Number of video parts
MediaItems []MediaItemSummary `json:"media_items,omitempty"` // Details of each media item
}
MediaSummary provides a high-level overview of media content in a message. This is included in JSON output to make multimodal messages more observable.
type Message ¶
type Message struct {
Role string `json:"role"` // "system", "user", "assistant", "tool"
Content string `json:"content"` // Message content (legacy text-only, maintained for backward compatibility)
// Multimodal content parts (text, images, audio, video)
// If Parts is non-empty, it takes precedence over Content.
// For backward compatibility, if Parts is empty, Content will be used.
Parts []ContentPart `json:"parts,omitempty"`
// Tool invocations (for assistant messages that call tools)
ToolCalls []MessageToolCall `json:"tool_calls,omitempty"`
// Tool result (for tool role messages)
// When Role="tool", this contains the tool execution result
ToolResult *MessageToolResult `json:"tool_result,omitempty"`
// Source indicates where this message originated (runtime-only, not persisted in JSON)
// Values: "statestore" (loaded from StateStore), "pipeline" (created during execution), "" (user input)
Source string `json:"-"`
// Metadata for observability and tracking
Timestamp time.Time `json:"timestamp,omitempty"` // When the message was created
LatencyMs int64 `json:"latency_ms,omitempty"` // Time taken to generate (for assistant messages)
CostInfo *CostInfo `json:"cost_info,omitempty"` // Token usage and cost tracking
Meta map[string]interface{} `json:"meta,omitempty"` // Custom metadata
// Validation results (for assistant messages)
Validations []ValidationResult `json:"validations,omitempty"`
}
Message represents a single message in a conversation. This is the canonical message type used throughout the system.
func CloneMessage ¶ added in v1.1.0
CloneMessage creates a deep copy of a message
func CombineTextAndMedia ¶ added in v1.1.0
func CombineTextAndMedia(role, text string, mediaParts []ContentPart) Message
CombineTextAndMedia creates a multimodal message from separate text and media parts. This is the inverse of SplitMultimodalMessage.
func ConvertTextToMultimodal ¶ added in v1.1.0
ConvertTextToMultimodal is a convenience function that creates a multimodal message from a role and text content. This helps with code migration.
func (*Message) AddAudioPart ¶ added in v1.1.0
AddAudioPart adds an audio content part from a file path
func (*Message) AddImagePart ¶ added in v1.1.0
AddImagePart adds an image content part from a file path
func (*Message) AddImagePartFromURL ¶ added in v1.1.0
AddImagePartFromURL adds an image content part from a URL
func (*Message) AddPart ¶ added in v1.1.0
func (m *Message) AddPart(part ContentPart)
AddPart adds a content part to the message. If this is the first part added, it clears the legacy Content field.
func (*Message) AddTextPart ¶ added in v1.1.0
AddTextPart adds a text content part to the message
func (*Message) AddVideoPart ¶ added in v1.1.0
AddVideoPart adds a video content part from a file path
func (*Message) GetContent ¶ added in v1.1.0
GetContent returns the content of the message. If Parts is non-empty, it returns only the text parts concatenated. Otherwise, it returns the legacy Content field.
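The precedence rule can be illustrated with a small sketch. `msg`, `part`, and `textOf` are hypothetical stand-ins, and concatenating text parts without a separator is an assumption; the actual GetContent may join them differently.

```go
package main

import (
	"fmt"
	"strings"
)

// part and msg are minimal stand-ins for ContentPart and Message.
type part struct {
	Type string
	Text string
}

type msg struct {
	Content string
	Parts   []part
}

// textOf applies the documented precedence: non-empty Parts win over Content,
// and only parts of type "text" contribute to the result.
func textOf(m msg) string {
	if len(m.Parts) == 0 {
		return m.Content
	}
	var sb strings.Builder
	for _, p := range m.Parts {
		if p.Type == "text" {
			sb.WriteString(p.Text)
		}
	}
	return sb.String()
}

func main() {
	legacy := msg{Content: "plain text"}
	mixed := msg{
		Content: "ignored",
		Parts: []part{
			{Type: "text", Text: "What is "},
			{Type: "image"},
			{Type: "text", Text: "this?"},
		},
	}
	fmt.Println(textOf(legacy)) // plain text
	fmt.Println(textOf(mixed))  // What is this?
}
```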
func (*Message) HasMediaContent ¶ added in v1.1.0
HasMediaContent returns true if the message contains any media (image, audio, video)
func (*Message) IsMultimodal ¶ added in v1.1.0
IsMultimodal returns true if the message contains multimodal content (Parts)
func (Message) MarshalJSON ¶ added in v1.1.0
MarshalJSON implements custom JSON marshaling for Message. This enhances the output by:
1. Populating the Content field with a human-readable summary when Parts exist
2. Adding a MediaSummary field for observability of multimodal content
func (*Message) SetMultimodalContent ¶ added in v1.1.0
func (m *Message) SetMultimodalContent(parts []ContentPart)
SetMultimodalContent sets the message content to multimodal parts. This clears the legacy Content field.
func (*Message) SetTextContent ¶ added in v1.1.0
SetTextContent sets the message content to simple text. This clears any existing Parts and sets the legacy Content field.
type MessageToolCall ¶
type MessageToolCall struct {
ID string `json:"id"` // Unique identifier for this tool call
Name string `json:"name"` // Name of the tool to invoke
Args json.RawMessage `json:"args"` // JSON-encoded tool arguments
}
MessageToolCall represents a request to call a tool within a Message. The Args field contains the JSON-encoded arguments for the tool.
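Because `Args` is a `json.RawMessage`, decoding happens in two steps: first the call envelope, then the tool-specific arguments. The sketch below uses a local `toolCall` copy of the documented fields; the `get_weather` tool, `weatherArgs`, and `decodeWeatherCall` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall mirrors the documented MessageToolCall fields and JSON tags.
type toolCall struct {
	ID   string          `json:"id"`
	Name string          `json:"name"`
	Args json.RawMessage `json:"args"`
}

// weatherArgs is a hypothetical argument shape for an imaginary "get_weather" tool.
type weatherArgs struct {
	City  string `json:"city"`
	Units string `json:"units"`
}

// decodeWeatherCall parses the call envelope, then its deferred Args payload.
func decodeWeatherCall(raw string) (toolCall, weatherArgs, error) {
	var call toolCall
	var args weatherArgs
	if err := json.Unmarshal([]byte(raw), &call); err != nil {
		return call, args, err
	}
	if err := json.Unmarshal(call.Args, &args); err != nil {
		return call, args, err
	}
	return call, args, nil
}

func main() {
	raw := `{"id":"call_1","name":"get_weather","args":{"city":"Oslo","units":"metric"}}`
	call, args, err := decodeWeatherCall(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(call.Name, args.City, args.Units)
}
```

Keeping `Args` raw lets a dispatcher route on `Name` first and pick the matching argument type afterwards.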
type MessageToolResult ¶
type MessageToolResult struct {
ID string `json:"id"` // References the MessageToolCall.ID that triggered this result
Name string `json:"name"` // Tool name that was executed
Content string `json:"content"` // Result content or error message
Error string `json:"error,omitempty"` // Error message if tool execution failed
LatencyMs int64 `json:"latency_ms"` // Tool execution latency in milliseconds
}
MessageToolResult represents the result of a tool execution in a Message. When embedded in Message, the Message.Role should be "tool".
type StreamingMediaConfig ¶ added in v1.1.0
type StreamingMediaConfig struct {
// Type specifies the media type being streamed
// Values: ContentTypeAudio, ContentTypeVideo
Type string `json:"type"`
// ChunkSize is the target size in bytes for each chunk
// Typical values: 4096-8192 for audio, 32768-65536 for video
ChunkSize int `json:"chunk_size"`
// SampleRate is the audio sample rate in Hz
// Common values: 8000 (phone quality), 16000 (wideband), 44100 (CD quality), 48000 (pro audio)
SampleRate int `json:"sample_rate,omitempty"`
// Encoding specifies the audio encoding format
// Values: "pcm" (raw), "opus", "mp3", "aac"
Encoding string `json:"encoding,omitempty"`
// Channels is the number of audio channels
// Values: 1 (mono), 2 (stereo)
Channels int `json:"channels,omitempty"`
// BitDepth is the audio bit depth in bits
// Common values: 16, 24, 32
BitDepth int `json:"bit_depth,omitempty"`
// Width is the video width in pixels
Width int `json:"width,omitempty"`
// Height is the video height in pixels
Height int `json:"height,omitempty"`
// FrameRate is the video frame rate (FPS)
// Common values: 24, 30, 60
FrameRate int `json:"frame_rate,omitempty"`
// BufferSize is the maximum number of chunks to buffer
// Larger values increase latency but provide more stability
// Typical values: 5-20
BufferSize int `json:"buffer_size,omitempty"`
// FlushInterval is how often to flush buffered data (if applicable)
FlushInterval time.Duration `json:"flush_interval,omitempty"`
// Metadata contains additional provider-specific configuration
Metadata map[string]interface{} `json:"metadata,omitempty"`
}
StreamingMediaConfig configures streaming media input parameters. Used to configure audio/video streaming sessions with providers.
Example usage for audio streaming:
config := &StreamingMediaConfig{
Type: ContentTypeAudio,
ChunkSize: 8192, // 8KB chunks
SampleRate: 16000, // 16kHz audio
Encoding: "pcm", // Raw PCM audio
Channels: 1, // Mono
BufferSize: 10, // Buffer 10 chunks
}
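When choosing ChunkSize for raw PCM, it helps to know how much audio one chunk carries. Uncompressed PCM uses SampleRate × Channels × BitDepth / 8 bytes per second, so the arithmetic can be sketched as below. `pcmChunkDuration` is a hypothetical helper, and the 16-bit depth is an assumption because the example config leaves BitDepth unset. The formula does not apply to compressed encodings such as opus or mp3.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pcmChunkDuration reports how much audio a chunk of raw PCM holds.
// bytes per second = sampleRate * channels * bitDepth / 8
func pcmChunkDuration(chunkSize, sampleRate, channels, bitDepth int) (time.Duration, error) {
	bytesPerSecond := sampleRate * channels * bitDepth / 8
	if chunkSize <= 0 || bytesPerSecond <= 0 {
		return 0, errors.New("chunk size, sample rate, channels and bit depth must be positive")
	}
	return time.Duration(chunkSize) * time.Second / time.Duration(bytesPerSecond), nil
}

func main() {
	// 8192-byte chunks of 16 kHz mono audio, assuming 16-bit samples:
	// 16000 * 1 * 16 / 8 = 32000 bytes per second, so 8192 / 32000 = 0.256 s.
	d, err := pcmChunkDuration(8192, 16000, 1, 16)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d) // 256ms
}
```

With BufferSize set to 10, such a configuration could hold roughly 2.5 seconds of audio in its buffer, which is the latency cost noted in the BufferSize field comment.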
func (*StreamingMediaConfig) Validate ¶ added in v1.1.0
func (c *StreamingMediaConfig) Validate() error
Validate checks if the StreamingMediaConfig is valid
type ToolDef ¶
type ToolDef struct {
Name string `json:"name"` // Unique tool name
Description string `json:"description"` // Human-readable description of what the tool does
InputSchema json.RawMessage `json:"input_schema"` // JSON Schema for input validation
OutputSchema json.RawMessage `json:"output_schema,omitempty"` // Optional JSON Schema for output validation
}
ToolDef represents a tool definition that can be provided to an LLM. The InputSchema and OutputSchema use JSON Schema format for validation.
type ToolStats ¶
type ToolStats struct {
TotalCalls int `json:"total_calls"` // Total number of tool calls
ByTool map[string]int `json:"by_tool"` // Count of calls per tool name
}
ToolStats tracks tool usage statistics across a conversation or run. Useful for monitoring which tools are being used and how frequently.
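A minimal sketch of how such statistics could be accumulated from the tool names seen in a conversation follows. `toolStats` copies the documented fields, while `tally` is a hypothetical helper rather than part of the package.

```go
package main

import "fmt"

// toolStats mirrors the documented ToolStats fields and JSON tags.
type toolStats struct {
	TotalCalls int            `json:"total_calls"`
	ByTool     map[string]int `json:"by_tool"`
}

// tally counts one call per tool name, both overall and per tool.
func tally(names []string) toolStats {
	stats := toolStats{ByTool: make(map[string]int)}
	for _, name := range names {
		stats.TotalCalls++
		stats.ByTool[name]++
	}
	return stats
}

func main() {
	stats := tally([]string{"search", "get_weather", "search"})
	fmt.Println(stats.TotalCalls, stats.ByTool["search"], stats.ByTool["get_weather"])
}
```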
type ValidationError ¶
type ValidationError struct {
Type string `json:"type"` // Error type: "args_invalid" | "result_invalid" | "policy_violation"
Tool string `json:"tool"` // Name of the tool that failed validation
Detail string `json:"detail"` // Human-readable error details
}
ValidationError represents a validation failure in tool usage or message content. Used to provide structured error information when validation fails.
type ValidationResult ¶
type ValidationResult struct {
ValidatorType string `json:"validator_type"` // Type of validator
Passed bool `json:"passed"` // Whether the validation passed
Details map[string]interface{} `json:"details,omitempty"` // Validator-specific details
Timestamp time.Time `json:"timestamp,omitempty"` // When validation was performed
}
ValidationResult represents the outcome of a validator check on a message. These are attached to assistant messages to show which validations passed or failed.