types

package
v1.1.6
Published: Dec 23, 2025 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Constants

const (
	ContentTypeText  = "text"
	ContentTypeImage = "image"
	ContentTypeAudio = "audio"
	ContentTypeVideo = "video"
)

ContentType constants for different content part types

const (
	MIMETypeImageJPEG = "image/jpeg"
	MIMETypeImagePNG  = "image/png"
	MIMETypeImageGIF  = "image/gif"
	MIMETypeImageWebP = "image/webp"

	MIMETypeAudioMP3  = "audio/mpeg"
	MIMETypeAudioWAV  = "audio/wav"
	MIMETypeAudioOgg  = "audio/ogg"
	MIMETypeAudioWebM = "audio/webm"

	MIMETypeVideoMP4  = "video/mp4"
	MIMETypeVideoWebM = "video/webm"
	MIMETypeVideoOgg  = "video/ogg"
)

Common MIME types
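
As an illustration of how these constants might be used, the sketch below maps a file extension to one of the MIME strings listed above. The helper name mimeForExt and its extension table are hypothetical and not part of this package; a few constants are redeclared locally so the snippet compiles on its own.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Local copies of a few of the constants documented above.
const (
	MIMETypeImageJPEG = "image/jpeg"
	MIMETypeImagePNG  = "image/png"
	MIMETypeAudioMP3  = "audio/mpeg"
	MIMETypeVideoMP4  = "video/mp4"
)

// mimeForExt is a hypothetical helper that picks a MIME type from the
// file extension, ignoring case. It reports false for unknown extensions.
func mimeForExt(path string) (string, bool) {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".jpg", ".jpeg":
		return MIMETypeImageJPEG, true
	case ".png":
		return MIMETypeImagePNG, true
	case ".mp3":
		return MIMETypeAudioMP3, true
	case ".mp4":
		return MIMETypeVideoMP4, true
	}
	return "", false
}

func main() {
	m, ok := mimeForExt("clip.MP3")
	fmt.Println(m, ok)
}
```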

Variables

This section is empty.

Functions

func CountMediaParts added in v1.1.0

func CountMediaParts(msg Message) int

CountMediaParts returns the number of media parts (image, audio, video) in a message

func CountPartsByType added in v1.1.0

func CountPartsByType(msg Message, contentType string) int

CountPartsByType returns the number of parts of a specific type in a message
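
The counting helpers above can be pictured roughly as follows. This is a minimal sketch, not the package's implementation: Message and ContentPart are trimmed local stand-ins, and the lower-case function names are hypothetical.

```go
package main

import "fmt"

// ContentPart and Message are trimmed stand-ins for the documented types.
type ContentPart struct {
	Type string
}

type Message struct {
	Content string
	Parts   []ContentPart
}

// countPartsByType counts the parts whose Type equals contentType.
func countPartsByType(msg Message, contentType string) int {
	n := 0
	for _, p := range msg.Parts {
		if p.Type == contentType {
			n++
		}
	}
	return n
}

// countMediaParts counts image, audio, and video parts together.
func countMediaParts(msg Message) int {
	return countPartsByType(msg, "image") +
		countPartsByType(msg, "audio") +
		countPartsByType(msg, "video")
}

func main() {
	msg := Message{Parts: []ContentPart{
		{Type: "text"}, {Type: "image"}, {Type: "image"}, {Type: "audio"},
	}}
	fmt.Println(countPartsByType(msg, "image"), countMediaParts(msg))
}
```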

func ExtractTextContent added in v1.1.0

func ExtractTextContent(msg Message) string

ExtractTextContent extracts all text content from a message, regardless of format. This is useful for backward compatibility when you need just the text.

func HasOnlyTextContent added in v1.1.0

func HasOnlyTextContent(msg Message) bool

HasOnlyTextContent returns true if the message contains only text (no media)

func MigrateMessagesToLegacy added in v1.1.0

func MigrateMessagesToLegacy(messages []Message) error

MigrateMessagesToLegacy converts a slice of multimodal messages to legacy format in-place. Returns an error if any message contains media content.

func MigrateMessagesToMultimodal added in v1.1.0

func MigrateMessagesToMultimodal(messages []Message)

MigrateMessagesToMultimodal converts a slice of legacy messages to multimodal format in-place

func MigrateToLegacy added in v1.1.0

func MigrateToLegacy(msg *Message) error

MigrateToLegacy converts a multimodal message back to legacy text-only format. This is useful for backward compatibility with systems that don't support multimodal. Returns an error if the message contains non-text content.

func MigrateToMultimodal added in v1.1.0

func MigrateToMultimodal(msg *Message)

MigrateToMultimodal converts a legacy text-only message to use the Parts structure. This is useful when transitioning existing code to the new multimodal API.
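
To make the two migration directions concrete, here is a small sketch of the round trip using local stand-in types. The functions toMultimodal and toLegacy are hypothetical; clearing Content after migration and joining text parts without a separator are assumptions of this sketch, not documented behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Trimmed stand-ins for the documented types.
type ContentPart struct {
	Type string
	Text *string
}

type Message struct {
	Role    string
	Content string
	Parts   []ContentPart
}

// toMultimodal moves legacy Content into a single text part.
// Clearing Content afterwards is an assumption of this sketch.
func toMultimodal(msg *Message) {
	if len(msg.Parts) > 0 || msg.Content == "" {
		return
	}
	text := msg.Content
	msg.Parts = []ContentPart{{Type: "text", Text: &text}}
	msg.Content = ""
}

// toLegacy joins text parts back into Content and fails on any media part,
// leaving the message untouched in that case.
func toLegacy(msg *Message) error {
	if len(msg.Parts) == 0 {
		return nil
	}
	var sb strings.Builder
	for _, p := range msg.Parts {
		if p.Type != "text" {
			return errors.New("message contains non-text content: " + p.Type)
		}
		if p.Text != nil {
			sb.WriteString(*p.Text)
		}
	}
	msg.Content = sb.String()
	msg.Parts = nil
	return nil
}

func main() {
	m := Message{Role: "user", Content: "hello"}
	toMultimodal(&m)
	fmt.Println(len(m.Parts), m.Content == "")
	fmt.Println(toLegacy(&m), m.Content)
}
```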

Types

type ChunkReader added in v1.1.0

type ChunkReader struct {
	// contains filtered or unexported fields
}

ChunkReader reads from an io.Reader and produces MediaChunks. Useful for converting continuous streams (e.g., microphone input) into chunks.

Example usage:

reader := NewChunkReader(micInput, config)
for {
    chunk, err := reader.NextChunk(ctx)
    if err == io.EOF {
        break
    }
    if err != nil {
        return err
    }
    session.SendChunk(ctx, chunk)
}

func NewChunkReader added in v1.1.0

func NewChunkReader(r io.Reader, config StreamingMediaConfig) *ChunkReader

NewChunkReader creates a new ChunkReader that reads from the given reader and produces MediaChunks according to the config.

func (*ChunkReader) NextChunk added in v1.1.0

func (cr *ChunkReader) NextChunk(ctx context.Context) (*MediaChunk, error)

NextChunk reads the next chunk from the reader. Returns io.EOF when the stream is complete. The returned chunk's IsLast field will be true on the final chunk.
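
One way to obtain the IsLast behaviour described above is to look one byte ahead after each read. The sketch below shows that idea with local stand-in types; chunkReader, nextChunk, and splitString are hypothetical names, and the real ChunkReader may work differently.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"io"
	"strings"
)

// MediaChunk is a trimmed stand-in for the documented type.
type MediaChunk struct {
	Data        []byte
	SequenceNum int64
	IsLast      bool
}

type chunkReader struct {
	r    *bufio.Reader
	size int
	seq  int64
	err  error // read error seen during lookahead, reported on the next call
}

func newChunkReader(r io.Reader, size int) *chunkReader {
	if size <= 0 {
		size = 4096 // fallback chosen for this sketch only
	}
	return &chunkReader{r: bufio.NewReader(r), size: size}
}

func (cr *chunkReader) nextChunk(ctx context.Context) (*MediaChunk, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if cr.err != nil {
		return nil, cr.err
	}
	buf := make([]byte, cr.size)
	n, err := io.ReadFull(cr.r, buf)
	if err != nil && err != io.ErrUnexpectedEOF {
		return nil, err // io.EOF when nothing is left, or a real read error
	}
	chunk := &MediaChunk{Data: buf[:n], SequenceNum: cr.seq}
	cr.seq++
	// Peek one byte to learn whether more data follows this chunk.
	if _, perr := cr.r.Peek(1); perr == io.EOF {
		chunk.IsLast = true
	} else if perr != nil {
		cr.err = perr
	}
	return chunk, nil
}

// splitString drains a reader over s and collects the chunks.
func splitString(s string, size int) ([]*MediaChunk, error) {
	cr := newChunkReader(strings.NewReader(s), size)
	var out []*MediaChunk
	for {
		c, err := cr.nextChunk(context.Background())
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, c)
	}
}

func main() {
	chunks, _ := splitString("abcdefghij", 4)
	for _, c := range chunks {
		fmt.Println(c.SequenceNum, string(c.Data), c.IsLast)
	}
}
```

Note that the lookahead blocks until another byte or EOF arrives, so on a live source such as a microphone this approach delays each chunk until the next data is available.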

type ChunkWriter added in v1.1.0

type ChunkWriter struct {
	// contains filtered or unexported fields
}

ChunkWriter writes MediaChunks to an io.Writer. Useful for converting chunks back into continuous streams (e.g., speaker output).

Example usage:

writer := NewChunkWriter(speakerOutput)
for chunk := range session.Response() {
    if chunk.MediaDelta != nil {
        // WriteChunk returns (bytes written, error); only the error is needed here.
        if _, err := writer.WriteChunk(chunk.MediaDelta); err != nil {
            return err
        }
    }
}

func NewChunkWriter added in v1.1.0

func NewChunkWriter(w io.Writer) *ChunkWriter

NewChunkWriter creates a new ChunkWriter that writes to the given writer.

func (*ChunkWriter) Flush added in v1.1.0

func (cw *ChunkWriter) Flush() error

Flush flushes any buffered data to the underlying writer (if it supports flushing).

func (*ChunkWriter) WriteChunk added in v1.1.0

func (cw *ChunkWriter) WriteChunk(chunk *MediaChunk) (int, error)

WriteChunk writes a MediaChunk to the underlying writer. Returns the number of bytes written and any error encountered.
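
The "if it supports flushing" wording on Flush suggests an optional-interface check. A minimal sketch of that pattern, with a local chunkWriter stand-in and hypothetical method names, might look like this; the real ChunkWriter may buffer differently.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// MediaChunk is a trimmed stand-in for the documented type.
type MediaChunk struct {
	Data []byte
}

type chunkWriter struct {
	w io.Writer
}

// writeChunk forwards the chunk bytes and reports how many were written.
func (cw *chunkWriter) writeChunk(c *MediaChunk) (int, error) {
	if c == nil {
		return 0, nil
	}
	return cw.w.Write(c.Data)
}

// flush calls Flush on the underlying writer only if it has such a method.
func (cw *chunkWriter) flush() error {
	if f, ok := cw.w.(interface{ Flush() error }); ok {
		return f.Flush()
	}
	return nil
}

// writeAll writes each part as one chunk through a buffered writer.
func writeAll(parts ...string) (string, error) {
	var out bytes.Buffer
	cw := &chunkWriter{w: bufio.NewWriter(&out)}
	for _, p := range parts {
		if _, err := cw.writeChunk(&MediaChunk{Data: []byte(p)}); err != nil {
			return "", err
		}
	}
	// Without this flush the bytes would still sit in the bufio.Writer.
	if err := cw.flush(); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	s, err := writeAll("ab", "cd", "ef")
	fmt.Println(s, err)
}
```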

type ContentPart added in v1.1.0

type ContentPart struct {
	Type string `json:"type"` // "text", "image", "audio", "video"

	// For text content
	Text *string `json:"text,omitempty"`

	// For media content (image, audio, video)
	Media *MediaContent `json:"media,omitempty"`
}

ContentPart represents a single piece of content in a multimodal message. A message can contain multiple parts: text, images, audio, video, etc.
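
For orientation, the sketch below shows the JSON shape implied by the struct tags above. It assumes ContentPart is encoded by plain encoding/json with no custom marshaler, and it uses a MediaContent stand-in trimmed to two fields; the encode helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MediaContent is trimmed to two of the documented fields.
type MediaContent struct {
	URL      *string `json:"url,omitempty"`
	MIMEType string  `json:"mime_type"`
}

// ContentPart copies the documented field names and tags.
type ContentPart struct {
	Type  string        `json:"type"`
	Text  *string       `json:"text,omitempty"`
	Media *MediaContent `json:"media,omitempty"`
}

// encode marshals a part and returns the JSON as a string.
func encode(p ContentPart) string {
	b, err := json.Marshal(p)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	text := "Describe this image"
	url := "https://example.com/cat.png"
	// Nil pointer fields are dropped because of omitempty.
	fmt.Println(encode(ContentPart{Type: "text", Text: &text}))
	fmt.Println(encode(ContentPart{
		Type:  "image",
		Media: &MediaContent{URL: &url, MIMEType: "image/png"},
	}))
}
```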

func NewAudioPart added in v1.1.0

func NewAudioPart(filePath string) (ContentPart, error)

NewAudioPart creates a ContentPart with audio content from a file path

func NewAudioPartFromData added in v1.1.0

func NewAudioPartFromData(base64Data, mimeType string) ContentPart

NewAudioPartFromData creates a ContentPart with base64-encoded audio data

func NewImagePart added in v1.1.0

func NewImagePart(filePath string, detail *string) (ContentPart, error)

NewImagePart creates a ContentPart with image content from a file path

func NewImagePartFromData added in v1.1.0

func NewImagePartFromData(base64Data, mimeType string, detail *string) ContentPart

NewImagePartFromData creates a ContentPart with base64-encoded image data

func NewImagePartFromURL added in v1.1.0

func NewImagePartFromURL(url string, detail *string) ContentPart

NewImagePartFromURL creates a ContentPart with image content from a URL

func NewTextPart added in v1.1.0

func NewTextPart(text string) ContentPart

NewTextPart creates a ContentPart with text content

func NewVideoPart added in v1.1.0

func NewVideoPart(filePath string) (ContentPart, error)

NewVideoPart creates a ContentPart with video content from a file path

func NewVideoPartFromData added in v1.1.0

func NewVideoPartFromData(base64Data, mimeType string) ContentPart

NewVideoPartFromData creates a ContentPart with base64-encoded video data

func SplitMultimodalMessage added in v1.1.0

func SplitMultimodalMessage(msg Message) (text string, mediaParts []ContentPart)

SplitMultimodalMessage splits a multimodal message into separate text and media parts. Returns the text content and a slice of media content parts.

func (*ContentPart) Validate added in v1.1.0

func (cp *ContentPart) Validate() error

Validate checks if the ContentPart is valid

type CostInfo

type CostInfo struct {
	InputTokens   int     `json:"input_tokens"`              // Number of input tokens consumed
	OutputTokens  int     `json:"output_tokens"`             // Number of output tokens generated
	CachedTokens  int     `json:"cached_tokens,omitempty"`   // Number of cached tokens used (reduces cost)
	InputCostUSD  float64 `json:"input_cost_usd"`            // Cost of input tokens in USD
	OutputCostUSD float64 `json:"output_cost_usd"`           // Cost of output tokens in USD
	CachedCostUSD float64 `json:"cached_cost_usd,omitempty"` // Cost savings from cached tokens
	TotalCost     float64 `json:"total_cost_usd"`            // Total cost in USD
}

CostInfo tracks token usage and associated costs for LLM operations. All cost values are in USD. Used for both individual messages and aggregated tracking.
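
Since CostInfo is used for aggregated tracking as well as single messages, a field-wise sum is one plausible way to combine two values. The addCost helper below is hypothetical and only illustrates that idea; how the package itself aggregates costs is not shown in this documentation.

```go
package main

import "fmt"

// CostInfo copies the documented fields (JSON tags omitted for brevity).
type CostInfo struct {
	InputTokens   int
	OutputTokens  int
	CachedTokens  int
	InputCostUSD  float64
	OutputCostUSD float64
	CachedCostUSD float64
	TotalCost     float64
}

// addCost returns the field-wise sum of a and b.
func addCost(a, b CostInfo) CostInfo {
	return CostInfo{
		InputTokens:   a.InputTokens + b.InputTokens,
		OutputTokens:  a.OutputTokens + b.OutputTokens,
		CachedTokens:  a.CachedTokens + b.CachedTokens,
		InputCostUSD:  a.InputCostUSD + b.InputCostUSD,
		OutputCostUSD: a.OutputCostUSD + b.OutputCostUSD,
		CachedCostUSD: a.CachedCostUSD + b.CachedCostUSD,
		TotalCost:     a.TotalCost + b.TotalCost,
	}
}

func main() {
	first := CostInfo{InputTokens: 100, OutputTokens: 50, InputCostUSD: 0.25, OutputCostUSD: 0.5, TotalCost: 0.75}
	second := CostInfo{InputTokens: 100, OutputTokens: 50, InputCostUSD: 0.25, OutputCostUSD: 0.25, TotalCost: 0.5}
	total := addCost(first, second)
	fmt.Println(total.InputTokens, total.OutputTokens, total.TotalCost)
}
```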

type MediaChunk added in v1.1.0

type MediaChunk struct {
	// Data contains the raw media bytes for this chunk
	Data []byte `json:"data"`

	// SequenceNum is the sequence number for ordering chunks (starts at 0)
	SequenceNum int64 `json:"sequence_num"`

	// Timestamp indicates when this chunk was created
	Timestamp time.Time `json:"timestamp"`

	// IsLast indicates if this is the final chunk in the stream
	IsLast bool `json:"is_last"`

	// Metadata contains chunk-specific metadata (MIME type, encoding, etc.)
	Metadata map[string]string `json:"metadata,omitempty"`
}

MediaChunk represents a chunk of streaming media data. Used for bidirectional streaming where media is sent or received in chunks.

Example usage:

chunk := &MediaChunk{
    Data:        audioData,
    SequenceNum: 1,
    Timestamp:   time.Now(),
    IsLast:      false,
    Metadata:    map[string]string{"mime_type": "audio/pcm"},
}

type MediaContent added in v1.1.0

type MediaContent struct {
	// Data source - exactly one should be set
	Data     *string `json:"data,omitempty"`      // Base64-encoded media data
	FilePath *string `json:"file_path,omitempty"` // Local file path
	URL      *string `json:"url,omitempty"`       // External URL (http/https)

	// Storage backend reference (used when media is externalized)
	StorageReference *string `json:"storage_reference,omitempty"` // Backend-specific storage reference

	// Media metadata
	MIMEType   string  `json:"mime_type"`             // e.g., "image/jpeg", "audio/mpeg", "video/mp4"
	Format     *string `json:"format,omitempty"`      // Optional format hint (e.g., "png", "mp3", "mp4")
	SizeKB     *int64  `json:"size_kb,omitempty"`     // Optional size in kilobytes
	Detail     *string `json:"detail,omitempty"`      // Optional detail level for images: "low", "high", "auto"
	Caption    *string `json:"caption,omitempty"`     // Optional caption/description
	Duration   *int    `json:"duration,omitempty"`    // Optional duration in seconds (for audio/video)
	BitRate    *int    `json:"bit_rate,omitempty"`    // Optional bit rate in kbps (for audio/video)
	Channels   *int    `json:"channels,omitempty"`    // Optional number of channels (for audio)
	Width      *int    `json:"width,omitempty"`       // Optional width in pixels (for image/video)
	Height     *int    `json:"height,omitempty"`      // Optional height in pixels (for image/video)
	FPS        *int    `json:"fps,omitempty"`         // Optional frames per second (for video)
	PolicyName *string `json:"policy_name,omitempty"` // Retention policy name
}

MediaContent represents media data (image, audio, video) in a message. Supports both inline base64 data and external file/URL references.
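
The "exactly one should be set" rule for the data source can be checked with a simple count. The sketch below uses a trimmed stand-in type and a hypothetical checkSource helper; it considers only Data, FilePath, and URL, and makes no claim about how the real Validate treats StorageReference.

```go
package main

import "fmt"

// MediaContent is trimmed to the source fields plus MIMEType.
type MediaContent struct {
	Data     *string
	FilePath *string
	URL      *string
	MIMEType string
}

// checkSource returns an error unless exactly one of Data, FilePath,
// or URL is set.
func checkSource(mc MediaContent) error {
	n := 0
	for _, src := range []*string{mc.Data, mc.FilePath, mc.URL} {
		if src != nil {
			n++
		}
	}
	if n != 1 {
		return fmt.Errorf("expected exactly one data source, got %d", n)
	}
	return nil
}

func main() {
	data := "aGVsbG8="
	url := "https://example.com/a.png"
	fmt.Println(checkSource(MediaContent{Data: &data, MIMEType: "image/png"}))
	fmt.Println(checkSource(MediaContent{Data: &data, URL: &url, MIMEType: "image/png"}))
}
```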

func (*MediaContent) GetBase64Data deprecated added in v1.1.0

func (mc *MediaContent) GetBase64Data() (string, error)

GetBase64Data returns the base64-encoded data for this media content. If the data is already base64-encoded, it returns it directly. If the data is from a file, it reads and encodes the file. If the data is from a URL or StorageReference, it returns an error (caller should use MediaLoader).

Deprecated: For new code, use providers.MediaLoader.GetBase64Data which supports all sources including storage references and URLs with proper context handling.

func (*MediaContent) ReadData added in v1.1.0

func (mc *MediaContent) ReadData() (io.ReadCloser, error)

ReadData returns an io.ReadCloser for the media content; the caller is responsible for closing it. For base64 data, it decodes and returns a reader. For file paths, it opens and returns the file. For URLs, it returns an error (caller should fetch separately).
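
The three cases described for ReadData can be sketched with the standard library alone. The readData and decodeInline functions below are hypothetical stand-ins operating on a trimmed MediaContent; standard (padded) base64 encoding is an assumption of this sketch.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"io"
	"os"
	"strings"
)

// MediaContent is trimmed to the three source fields.
type MediaContent struct {
	Data     *string
	FilePath *string
	URL      *string
}

// readData returns a stream over the media bytes for inline and file
// sources, and an error for URLs.
func readData(mc *MediaContent) (io.ReadCloser, error) {
	switch {
	case mc.Data != nil:
		// Decode lazily while the caller reads; there is nothing to close.
		dec := base64.NewDecoder(base64.StdEncoding, strings.NewReader(*mc.Data))
		return io.NopCloser(dec), nil
	case mc.FilePath != nil:
		f, err := os.Open(*mc.FilePath)
		if err != nil {
			return nil, err
		}
		return f, nil
	case mc.URL != nil:
		return nil, errors.New("URL sources must be fetched by the caller")
	}
	return nil, errors.New("no data source set")
}

// decodeInline reads all bytes from an inline base64 source.
func decodeInline(b64 string) (string, error) {
	rc, err := readData(&MediaContent{Data: &b64})
	if err != nil {
		return "", err
	}
	defer rc.Close()
	b, err := io.ReadAll(rc)
	return string(b), err
}

func main() {
	s, err := decodeInline("aGVsbG8=")
	fmt.Println(s, err)
}
```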

func (*MediaContent) Validate added in v1.1.0

func (mc *MediaContent) Validate() error

Validate checks if the MediaContent is valid

type MediaItemSummary added in v1.1.0

type MediaItemSummary struct {
	Type      string `json:"type"`             // Content type: "image", "audio", "video"
	Source    string `json:"source"`           // Source description (file path, URL, or "inline data")
	MIMEType  string `json:"mime_type"`        // MIME type
	SizeBytes int    `json:"size_bytes"`       // Size in bytes (0 if unknown)
	Detail    string `json:"detail,omitempty"` // Detail level for images
	Loaded    bool   `json:"loaded"`           // Whether media was successfully loaded
	Error     string `json:"error,omitempty"`  // Error message if loading failed
}

MediaItemSummary provides details about a single media item in a message.

type MediaSummary added in v1.1.0

type MediaSummary struct {
	TotalParts int                `json:"total_parts"`           // Total number of content parts
	TextParts  int                `json:"text_parts"`            // Number of text parts
	ImageParts int                `json:"image_parts"`           // Number of image parts
	AudioParts int                `json:"audio_parts"`           // Number of audio parts
	VideoParts int                `json:"video_parts"`           // Number of video parts
	MediaItems []MediaItemSummary `json:"media_items,omitempty"` // Details of each media item
}

MediaSummary provides a high-level overview of media content in a message. This is included in JSON output to make multimodal messages more observable.

type Message

type Message struct {
	Role    string `json:"role"`    // "system", "user", "assistant", "tool"
	Content string `json:"content"` // Message content (legacy text-only, maintained for backward compatibility)

	// Multimodal content parts (text, images, audio, video)
	// If Parts is non-empty, it takes precedence over Content.
	// For backward compatibility, if Parts is empty, Content will be used.
	Parts []ContentPart `json:"parts,omitempty"`

	// Tool invocations (for assistant messages that call tools)
	ToolCalls []MessageToolCall `json:"tool_calls,omitempty"`

	// Tool result (for tool role messages)
	// When Role="tool", this contains the tool execution result
	ToolResult *MessageToolResult `json:"tool_result,omitempty"`

	// Source indicates where this message originated (runtime-only, not persisted in JSON)
	// Values: "statestore" (loaded from StateStore), "pipeline" (created during execution), "" (user input)
	Source string `json:"-"`

	// Metadata for observability and tracking
	Timestamp time.Time              `json:"timestamp,omitempty"`  // When the message was created
	LatencyMs int64                  `json:"latency_ms,omitempty"` // Time taken to generate (for assistant messages)
	CostInfo  *CostInfo              `json:"cost_info,omitempty"`  // Token usage and cost tracking
	Meta      map[string]interface{} `json:"meta,omitempty"`       // Custom metadata

	// Validation results (for assistant messages)
	Validations []ValidationResult `json:"validations,omitempty"`
}

Message represents a single message in a conversation. This is the canonical message type used throughout the system.

func CloneMessage added in v1.1.0

func CloneMessage(msg Message) Message

CloneMessage creates a deep copy of a message

func CombineTextAndMedia added in v1.1.0

func CombineTextAndMedia(role, text string, mediaParts []ContentPart) Message

CombineTextAndMedia creates a multimodal message from separate text and media parts. This is the inverse of SplitMultimodalMessage.

func ConvertTextToMultimodal added in v1.1.0

func ConvertTextToMultimodal(role, content string) Message

ConvertTextToMultimodal is a convenience function that creates a multimodal message from a role and text content. This helps with code migration.

func (*Message) AddAudioPart added in v1.1.0

func (m *Message) AddAudioPart(filePath string) error

AddAudioPart adds an audio content part from a file path

func (*Message) AddImagePart added in v1.1.0

func (m *Message) AddImagePart(filePath string, detail *string) error

AddImagePart adds an image content part from a file path

func (*Message) AddImagePartFromURL added in v1.1.0

func (m *Message) AddImagePartFromURL(url string, detail *string)

AddImagePartFromURL adds an image content part from a URL

func (*Message) AddPart added in v1.1.0

func (m *Message) AddPart(part ContentPart)

AddPart adds a content part to the message. If this is the first part added, it clears the legacy Content field.

func (*Message) AddTextPart added in v1.1.0

func (m *Message) AddTextPart(text string)

AddTextPart adds a text content part to the message

func (*Message) AddVideoPart added in v1.1.0

func (m *Message) AddVideoPart(filePath string) error

AddVideoPart adds a video content part from a file path

func (*Message) GetContent added in v1.1.0

func (m *Message) GetContent() string

GetContent returns the content of the message. If Parts is non-empty, it returns only the text parts concatenated. Otherwise, it returns the legacy Content field.

func (*Message) HasMediaContent added in v1.1.0

func (m *Message) HasMediaContent() bool

HasMediaContent returns true if the message contains any media (image, audio, video)

func (*Message) IsMultimodal added in v1.1.0

func (m *Message) IsMultimodal() bool

IsMultimodal returns true if the message contains multimodal content (Parts)

func (Message) MarshalJSON added in v1.1.0

func (m Message) MarshalJSON() ([]byte, error)

MarshalJSON implements custom JSON marshaling for Message. This enhances the output by:

1. Populating the Content field with a human-readable summary when Parts exist
2. Adding a MediaSummary field for observability of multimodal content
3. Omitting Content field when ToolResult is present to avoid duplication

func (*Message) SetMultimodalContent added in v1.1.0

func (m *Message) SetMultimodalContent(parts []ContentPart)

SetMultimodalContent sets the message content to multimodal parts. This clears the legacy Content field.

func (*Message) SetTextContent added in v1.1.0

func (m *Message) SetTextContent(text string)

SetTextContent sets the message content to simple text. This clears any existing Parts and sets the legacy Content field.

func (*Message) UnmarshalJSON added in v1.1.5

func (m *Message) UnmarshalJSON(data []byte) error

UnmarshalJSON implements custom JSON unmarshaling for Message. After unmarshaling, if ToolResult is present, it copies ToolResult.Content to Message.Content for provider compatibility (providers expect the Content field to be populated).
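
A common way to write this kind of post-processing UnmarshalJSON is to decode into a method-less alias type and then adjust the result. The sketch below shows that technique on trimmed stand-in types; it is not the package's source, and the decode helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed stand-ins for the documented types.
type MessageToolResult struct {
	ID      string `json:"id"`
	Name    string `json:"name"`
	Content string `json:"content"`
}

type Message struct {
	Role       string             `json:"role"`
	Content    string             `json:"content"`
	ToolResult *MessageToolResult `json:"tool_result,omitempty"`
}

// UnmarshalJSON decodes into an alias type (same fields, no methods, so
// json.Unmarshal does not call this method again) and then copies the tool
// result content into Content.
func (m *Message) UnmarshalJSON(data []byte) error {
	type plain Message
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return err
	}
	*m = Message(p)
	if m.ToolResult != nil {
		m.Content = m.ToolResult.Content
	}
	return nil
}

// decode parses one message from a JSON string.
func decode(s string) (Message, error) {
	var m Message
	err := json.Unmarshal([]byte(s), &m)
	return m, err
}

func main() {
	m, err := decode(`{"role":"tool","tool_result":{"id":"call_1","name":"lookup","content":"42"}}`)
	fmt.Println(m.Role, m.Content, err)
}
```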

type MessageToolCall

type MessageToolCall struct {
	ID   string          `json:"id"`   // Unique identifier for this tool call
	Name string          `json:"name"` // Name of the tool to invoke
	Args json.RawMessage `json:"args"` // JSON-encoded tool arguments
}

MessageToolCall represents a request to call a tool within a Message. The Args field contains the JSON-encoded arguments for the tool.
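
Because Args is a json.RawMessage, a tool handler typically decodes it into its own argument struct. The sketch below illustrates this with a hypothetical get_weather tool; weatherArgs and parseWeatherArgs are invented for the example and are not part of this package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MessageToolCall copies the documented struct.
type MessageToolCall struct {
	ID   string          `json:"id"`
	Name string          `json:"name"`
	Args json.RawMessage `json:"args"`
}

// weatherArgs is a hypothetical argument shape for a "get_weather" tool.
type weatherArgs struct {
	City string `json:"city"`
	Days int    `json:"days"`
}

// parseWeatherArgs decodes the raw Args of a call into weatherArgs.
func parseWeatherArgs(call MessageToolCall) (weatherArgs, error) {
	var args weatherArgs
	if err := json.Unmarshal(call.Args, &args); err != nil {
		return weatherArgs{}, fmt.Errorf("tool %q: invalid args: %w", call.Name, err)
	}
	return args, nil
}

func main() {
	call := MessageToolCall{
		ID:   "call_1",
		Name: "get_weather",
		Args: json.RawMessage(`{"city":"Oslo","days":3}`),
	}
	args, err := parseWeatherArgs(call)
	fmt.Println(args.City, args.Days, err)
}
```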

type MessageToolResult

type MessageToolResult struct {
	ID        string `json:"id"`              // References the MessageToolCall.ID that triggered this result
	Name      string `json:"name"`            // Tool name that was executed
	Content   string `json:"content"`         // Result content or error message
	Error     string `json:"error,omitempty"` // Error message if tool execution failed
	LatencyMs int64  `json:"latency_ms"`      // Tool execution latency in milliseconds
}

MessageToolResult represents the result of a tool execution in a Message. When embedded in Message, the Message.Role should be "tool".

type StreamingMediaConfig added in v1.1.0

type StreamingMediaConfig struct {
	// Type specifies the media type being streamed
	// Values: ContentTypeAudio, ContentTypeVideo
	Type string `json:"type"`

	// ChunkSize is the target size in bytes for each chunk
	// Typical values: 4096-8192 for audio, 32768-65536 for video
	ChunkSize int `json:"chunk_size"`

	// SampleRate is the audio sample rate in Hz
	// Common values: 8000 (phone quality), 16000 (wideband), 44100 (CD quality), 48000 (pro audio)
	SampleRate int `json:"sample_rate,omitempty"`

	// Encoding specifies the audio encoding format
	// Values: "pcm" (raw), "opus", "mp3", "aac"
	Encoding string `json:"encoding,omitempty"`

	// Channels is the number of audio channels
	// Values: 1 (mono), 2 (stereo)
	Channels int `json:"channels,omitempty"`

	// BitDepth is the audio bit depth in bits
	// Common values: 16, 24, 32
	BitDepth int `json:"bit_depth,omitempty"`

	// Width is the video width in pixels
	Width int `json:"width,omitempty"`

	// Height is the video height in pixels
	Height int `json:"height,omitempty"`

	// FrameRate is the video frame rate (FPS)
	// Common values: 24, 30, 60
	FrameRate int `json:"frame_rate,omitempty"`

	// BufferSize is the maximum number of chunks to buffer
	// Larger values increase latency but provide more stability
	// Typical values: 5-20
	BufferSize int `json:"buffer_size,omitempty"`

	// FlushInterval is how often to flush buffered data (if applicable)
	FlushInterval time.Duration `json:"flush_interval,omitempty"`

	// Metadata contains additional provider-specific configuration
	Metadata map[string]interface{} `json:"metadata,omitempty"`
}

StreamingMediaConfig configures streaming media input parameters. Used to configure audio/video streaming sessions with providers.

Example usage for audio streaming:

config := &StreamingMediaConfig{
    Type:       ContentTypeAudio,
    ChunkSize:  8192,    // 8KB chunks
    SampleRate: 16000,   // 16kHz audio
    Encoding:   "pcm",   // Raw PCM audio
    Channels:   1,       // Mono
    BufferSize: 10,      // Buffer 10 chunks
}
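
To get a feel for what the values in this example mean in time, the chunk size can be converted to a duration. The sketch below assumes uncompressed PCM and a 16-bit sample depth (the example above does not set BitDepth); chunkDuration is a hypothetical helper on a trimmed stand-in struct.

```go
package main

import (
	"fmt"
	"time"
)

// StreamingMediaConfig is trimmed to the fields used in the calculation.
type StreamingMediaConfig struct {
	ChunkSize  int
	SampleRate int
	Channels   int
	BitDepth   int
}

// chunkDuration returns how much audio one chunk holds, assuming raw PCM:
// bytes per second = SampleRate * Channels * BitDepth / 8.
func chunkDuration(c StreamingMediaConfig) time.Duration {
	bytesPerSecond := c.SampleRate * c.Channels * c.BitDepth / 8
	if bytesPerSecond <= 0 {
		return 0
	}
	return time.Duration(c.ChunkSize) * time.Second / time.Duration(bytesPerSecond)
}

func main() {
	cfg := StreamingMediaConfig{
		ChunkSize:  8192,
		SampleRate: 16000,
		Channels:   1,
		BitDepth:   16, // assumed; not set in the example above
	}
	fmt.Println(chunkDuration(cfg)) // 256ms
}
```

Under the same assumptions, a BufferSize of 10 would correspond to up to roughly 2.56 seconds of buffered audio.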

func (*StreamingMediaConfig) Validate added in v1.1.0

func (c *StreamingMediaConfig) Validate() error

Validate checks if the StreamingMediaConfig is valid

type ToolDef

type ToolDef struct {
	Name         string          `json:"name"`                    // Unique tool name
	Description  string          `json:"description"`             // Human-readable description of what the tool does
	InputSchema  json.RawMessage `json:"input_schema"`            // JSON Schema for input validation
	OutputSchema json.RawMessage `json:"output_schema,omitempty"` // Optional JSON Schema for output validation
}

ToolDef represents a tool definition that can be provided to an LLM. The InputSchema and OutputSchema use JSON Schema format for validation.

type ToolStats

type ToolStats struct {
	TotalCalls int            `json:"total_calls"` // Total number of tool calls
	ByTool     map[string]int `json:"by_tool"`     // Count of calls per tool name
}

ToolStats tracks tool usage statistics across a conversation or run. Useful for monitoring which tools are being used and how frequently.
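
As a small illustration, statistics like these can be updated once per tool call. The record method below is hypothetical and defined on a local copy of the struct; it also shows that the ByTool map must be initialised before the first write.

```go
package main

import "fmt"

// ToolStats copies the documented struct.
type ToolStats struct {
	TotalCalls int            `json:"total_calls"`
	ByTool     map[string]int `json:"by_tool"`
}

// record counts one call to the named tool, creating the map on first use.
func (s *ToolStats) record(tool string) {
	if s.ByTool == nil {
		s.ByTool = make(map[string]int)
	}
	s.TotalCalls++
	s.ByTool[tool]++
}

func main() {
	var stats ToolStats
	for _, name := range []string{"search", "search", "calculator"} {
		stats.record(name)
	}
	fmt.Println(stats.TotalCalls, stats.ByTool["search"], stats.ByTool["calculator"])
}
```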

type ValidationError

type ValidationError struct {
	Type   string `json:"type"`   // Error type: "args_invalid" | "result_invalid" | "policy_violation"
	Tool   string `json:"tool"`   // Name of the tool that failed validation
	Detail string `json:"detail"` // Human-readable error details
}

ValidationError represents a validation failure in tool usage or message content. Used to provide structured error information when validation fails.

type ValidationResult

type ValidationResult struct {
	ValidatorType string                 `json:"validator_type"`      // Type of validator
	Passed        bool                   `json:"passed"`              // Whether the validation passed
	Details       map[string]interface{} `json:"details,omitempty"`   // Validator-specific details
	Timestamp     time.Time              `json:"timestamp,omitempty"` // When validation was performed
}

ValidationResult represents the outcome of a validator check on a message. These are attached to assistant messages to show which validations passed or failed.
