gemini

package
v0.13.0-beta.0
Warning

This package is not in the latest version of its module.

Published: Dec 6, 2025 License: Apache-2.0 Imports: 19 Imported by: 0

Documentation

Overview

Package gemini implements the Google Gemini backend for WaveTerm's AI chat system.

This package provides a complete implementation of the UseChatBackend interface for Google's Gemini API, including:

  • Streaming chat responses via Server-Sent Events (SSE)
  • Function calling (tool use) support
  • Multi-modal input support (text, images, PDFs)
  • Message conversion and state management

API Type

The Gemini backend uses the API type constant:

uctypes.APIType_GoogleGemini = "google-gemini"

Supported Features

  • Text messages
  • Image uploads (JPEG, PNG, etc.) - inline base64 encoding
  • PDF document uploads - inline base64 encoding
  • Text file attachments
  • Directory listings
  • Function/tool calling with structured arguments
  • Streaming responses with real-time token delivery
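The inline base64 encoding used for images and PDFs can be sketched as follows. This is a minimal illustration, not the package's own code: inlineData is a local mirror of the JSON shape of GeminiInlineData, and newInlineData is a hypothetical helper name.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// inlineData is a local mirror of the JSON shape of GeminiInlineData,
// reduced to the two required fields.
type inlineData struct {
	MimeType string `json:"mimeType"`
	Data     string `json:"data"` // base64 encoded
}

// newInlineData (hypothetical helper) base64-encodes raw bytes with the
// standard alphabet so they can travel inside the JSON request body.
func newInlineData(mimeType string, raw []byte) inlineData {
	return inlineData{
		MimeType: mimeType,
		Data:     base64.StdEncoding.EncodeToString(raw),
	}
}

func main() {
	// The first four bytes of a PNG file stand in for a real image.
	part := newInlineData("image/png", []byte{0x89, 'P', 'N', 'G'})
	out, err := json.Marshal(part)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the whole file travels inside the request body, large attachments increase the request size by roughly a third after encoding.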

Usage

The backend is automatically registered and can be obtained via:

backend, err := aiusechat.GetBackendByAPIType(uctypes.APIType_GoogleGemini)

To use the Gemini API:

  1. Obtain a Google AI API key
  2. Set Config.APIType to APIType_GoogleGemini
  3. Set Config.Model (e.g., "gemini-2.0-flash-exp")
  4. Provide the API key in the Config.APIToken field

Configuration Example

chatOpts := uctypes.WaveChatOpts{
    ChatId:   "my-chat-id",
    ClientId: "my-client-id",
    Config: uctypes.AIOptsType{
        APIType:      uctypes.APIType_GoogleGemini,
        Model:        "gemini-2.0-flash-exp",
        APIToken:     "your-google-api-key",
        MaxTokens:    8192,
        Capabilities: []string{
            uctypes.AICapabilityTools,
            uctypes.AICapabilityImages,
            uctypes.AICapabilityPdfs,
        },
    },
    Tools:        []uctypes.ToolDefinition{...},
    SystemPrompt: []string{"You are a helpful assistant."},
}

Message Format

The Gemini backend uses the GeminiChatMessage type internally, which stores:

  • MessageId: Unique identifier for idempotency
  • Role: "user" or "model" (model is Gemini's term for assistant)
  • Parts: Array of message parts (text, inline data, function calls/responses)
  • Usage: Token usage metadata
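To make the stored shape concrete, the sketch below serializes one message. The types are local mirrors that copy the JSON tags of GeminiChatMessage and reduce parts to text only; encodeMessage is a hypothetical helper, not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usageMetadata mirrors a subset of GeminiUsageMetadata.
type usageMetadata struct {
	PromptTokenCount     int `json:"promptTokenCount"`
	CandidatesTokenCount int `json:"candidatesTokenCount"`
	TotalTokenCount      int `json:"totalTokenCount"`
}

// messagePart mirrors only the text field of GeminiMessagePart.
type messagePart struct {
	Text string `json:"text,omitempty"`
}

// chatMessage mirrors the JSON tags of GeminiChatMessage.
type chatMessage struct {
	MessageId string         `json:"messageid"`
	Role      string         `json:"role"` // "user" or "model"
	Parts     []messagePart  `json:"parts"`
	Usage     *usageMetadata `json:"usage,omitempty"`
}

// encodeMessage (hypothetical helper) returns the JSON form of a message.
func encodeMessage(m chatMessage) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	msg := chatMessage{
		MessageId: "msg-1",
		Role:      "model",
		Parts:     []messagePart{{Text: "Hello"}},
	}
	s, err := encodeMessage(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

A nil Usage pointer is dropped from the output by omitempty, so messages without token accounting stay compact.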

Function Calling

Function calling is supported via Gemini's native function calling feature:

  • Tools are converted to Gemini's FunctionDeclaration format
  • Function calls are streamed with real-time argument updates
  • Function responses are sent back as user messages with FunctionResponse parts
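The last point can be sketched as follows. This is an illustration under assumptions, not the package's ConvertToolResultsToGeminiChatMessage: toolResult is a hypothetical stand-in for uctypes.AIToolResult (whose fields are not shown on this page), and the "output" key inside the response map is likewise assumed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// functionResponse mirrors the name and response fields of GeminiFunctionResponse.
type functionResponse struct {
	Name     string         `json:"name"`
	Response map[string]any `json:"response"`
}

// messagePart mirrors the fields of GeminiMessagePart used here.
type messagePart struct {
	Text             string            `json:"text,omitempty"`
	FunctionResponse *functionResponse `json:"functionResponse,omitempty"`
}

// content mirrors GeminiContent.
type content struct {
	Role  string        `json:"role,omitempty"`
	Parts []messagePart `json:"parts"`
}

// toolResult is a hypothetical stand-in for uctypes.AIToolResult.
type toolResult struct {
	ToolName string
	Output   string
}

// toolResultsToContent wraps each result in a FunctionResponse part and
// returns them as a single message with role "user".
func toolResultsToContent(results []toolResult) content {
	parts := make([]messagePart, 0, len(results))
	for _, r := range results {
		parts = append(parts, messagePart{
			FunctionResponse: &functionResponse{
				Name:     r.ToolName,
				Response: map[string]any{"output": r.Output}, // assumed key
			},
		})
	}
	return content{Role: "user", Parts: parts}
}

func main() {
	c := toolResultsToContent([]toolResult{{ToolName: "read_file", Output: "ok"}})
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```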

API Endpoint

By default, the backend uses:

https://generativelanguage.googleapis.com/v1beta/models/{model}:streamGenerateContent

You can override this by setting Config.BaseURL.
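A minimal sketch of how the endpoint could be resolved. It assumes that a non-empty BaseURL replaces the whole URL rather than only the host; the page does not say which. endpointFor and defaultEndpointTemplate are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// defaultEndpointTemplate is the default URL quoted above.
const defaultEndpointTemplate = "https://generativelanguage.googleapis.com/v1beta/models/{model}:streamGenerateContent"

// endpointFor (hypothetical helper) returns baseURL unchanged when it is set.
// Otherwise it substitutes the path-escaped model name into the template.
func endpointFor(baseURL, model string) string {
	if baseURL != "" {
		return baseURL
	}
	return strings.Replace(defaultEndpointTemplate, "{model}", url.PathEscape(model), 1)
}

func main() {
	fmt.Println(endpointFor("", "gemini-2.0-flash-exp"))
	fmt.Println(endpointFor("http://localhost:8080/stream", "gemini-2.0-flash-exp"))
}
```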

Error Handling

The backend handles the following error conditions:

  • Content blocking/safety filters
  • Token limit errors
  • Network errors
  • Malformed responses
  • Context cancellation

All errors are propagated through the SSE stream.
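As an illustration of the malformed-response and API-error cases, the sketch below decodes an error body into local mirrors of GeminiErrorResponse and GeminiError. parseAPIError and its message format are hypothetical; the package's actual error wording is not shown on this page.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// apiError mirrors GeminiError.
type apiError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
	Status  string `json:"status,omitempty"`
}

// errorResponse mirrors GeminiErrorResponse.
type errorResponse struct {
	Error *apiError `json:"error,omitempty"`
}

// parseAPIError (hypothetical helper) turns an error body into a Go error.
// A body that is not valid JSON is reported as malformed.
func parseAPIError(body []byte) error {
	var resp errorResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return fmt.Errorf("malformed error response: %w", err)
	}
	if resp.Error == nil {
		return errors.New("error response has no error object")
	}
	return fmt.Errorf("gemini api error %d (%s): %s",
		resp.Error.Code, resp.Error.Status, resp.Error.Message)
}

func main() {
	body := []byte(`{"error":{"code":429,"message":"quota exceeded","status":"RESOURCE_EXHAUSTED"}}`)
	fmt.Println(parseAPIError(body))
}
```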

Limitations

  • File uploads must be provided as base64-encoded inline data
  • Images and PDFs use inline data, not file upload URIs
  • Multi-turn conversations require proper role alternation (user/model)
  • Some advanced Gemini features like caching are not yet implemented
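The role alternation requirement can be checked before a request is sent. The sketch below assumes that "proper" alternation means no two consecutive messages share a role. checkAlternation is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// checkAlternation (hypothetical helper) reports the first position at which
// a role is unknown or repeats the role of the previous message.
func checkAlternation(roles []string) error {
	for i, r := range roles {
		if r != "user" && r != "model" {
			return fmt.Errorf("message %d: unknown role %q", i, r)
		}
		if i > 0 && roles[i-1] == r {
			return fmt.Errorf("message %d: role %q repeats", i, r)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkAlternation([]string{"user", "model", "user"}))
	fmt.Println(checkAlternation([]string{"user", "user"}))
}
```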

Constants

const (
	GeminiDefaultMaxTokens = 8192
)
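A plausible use of this constant is as a fallback when no token limit is configured. The sketch below assumes that a zero or negative MaxTokens means "unset", which this page does not confirm. effectiveMaxTokens is a hypothetical helper.

```go
package main

import "fmt"

// defaultMaxTokens has the same value as GeminiDefaultMaxTokens.
const defaultMaxTokens = 8192

// effectiveMaxTokens (hypothetical helper) falls back to the default when
// the configured value is not positive.
func effectiveMaxTokens(configured int) int {
	if configured <= 0 {
		return defaultMaxTokens
	}
	return configured
}

func main() {
	fmt.Println(effectiveMaxTokens(0))
	fmt.Println(effectiveMaxTokens(1024))
}
```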

Functions

func ConvertAIChatToUIChat

func ConvertAIChatToUIChat(aiChat uctypes.AIChat) (*uctypes.UIChat, error)

ConvertAIChatToUIChat converts an AIChat to a UIChat for Gemini

func GetFunctionCallInputByToolCallId

func GetFunctionCallInputByToolCallId(aiChat uctypes.AIChat, toolCallId string) *uctypes.AIFunctionCallInput

GetFunctionCallInputByToolCallId returns the function call input associated with the given tool call ID

func UpdateToolUseData

func UpdateToolUseData(chatId string, toolCallId string, toolUseData uctypes.UIMessageDataToolUse) error

UpdateToolUseData updates the tool use data for a specific tool call in the chat

Types

type GeminiCandidate

type GeminiCandidate struct {
	Content           *GeminiContent           `json:"content,omitempty"`
	FinishReason      string                   `json:"finishReason,omitempty"`
	Index             int                      `json:"index,omitempty"`
	SafetyRatings     []GeminiSafetyRating     `json:"safetyRatings,omitempty"`
	GroundingMetadata *GeminiGroundingMetadata `json:"groundingMetadata,omitempty"`
}

GeminiCandidate represents a candidate response

type GeminiChatMessage

type GeminiChatMessage struct {
	MessageId string               `json:"messageid"`
	Role      string               `json:"role"` // "user", "model"
	Parts     []GeminiMessagePart  `json:"parts"`
	Usage     *GeminiUsageMetadata `json:"usage,omitempty"`
}

GeminiChatMessage represents a stored chat message for Gemini backend

func ConvertAIMessageToGeminiChatMessage

func ConvertAIMessageToGeminiChatMessage(aiMsg uctypes.AIMessage) (*GeminiChatMessage, error)

ConvertAIMessageToGeminiChatMessage converts an AIMessage to a GeminiChatMessage. These messages always have role "user".

func ConvertToolResultsToGeminiChatMessage

func ConvertToolResultsToGeminiChatMessage(toolResults []uctypes.AIToolResult) (*GeminiChatMessage, error)

ConvertToolResultsToGeminiChatMessage converts AIToolResult slice to GeminiChatMessage

func RunGeminiChatStep

RunGeminiChatStep executes a chat step using the Gemini API

func (*GeminiChatMessage) GetMessageId

func (m *GeminiChatMessage) GetMessageId() string

func (*GeminiChatMessage) GetRole

func (m *GeminiChatMessage) GetRole() string

func (*GeminiChatMessage) GetUsage

func (m *GeminiChatMessage) GetUsage() *uctypes.AIUsage

type GeminiContent

type GeminiContent struct {
	Role  string              `json:"role,omitempty"`
	Parts []GeminiMessagePart `json:"parts"`
}

GeminiContent represents a content message for the API

func (*GeminiContent) Clean

func (c *GeminiContent) Clean() *GeminiContent

Clean removes internal fields from all parts
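The idea behind Clean can be sketched with local mirror types. The internal fields here are the ones GeminiMessagePart marks as not sent to the API (PreviewUrl and FileName; ToolUseData is left out for brevity). Returning a copy instead of modifying the receiver is an assumption of this sketch, not a documented property of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// messagePart mirrors a subset of GeminiMessagePart, including two of its
// internal fields.
type messagePart struct {
	Text       string `json:"text,omitempty"`
	PreviewUrl string `json:"previewurl,omitempty"` // internal field
	FileName   string `json:"filename,omitempty"`   // internal field
}

// content mirrors GeminiContent.
type content struct {
	Role  string        `json:"role,omitempty"`
	Parts []messagePart `json:"parts"`
}

// clean returns a copy of c whose parts have their internal fields zeroed.
// The receiver is left unchanged.
func (c *content) clean() *content {
	if c == nil {
		return nil
	}
	out := &content{Role: c.Role, Parts: make([]messagePart, len(c.Parts))}
	for i, p := range c.Parts {
		p.PreviewUrl = ""
		p.FileName = ""
		out.Parts[i] = p
	}
	return out
}

func main() {
	stored := &content{
		Role:  "user",
		Parts: []messagePart{{Text: "see attachment", PreviewUrl: "blob:1", FileName: "a.png"}},
	}
	b, err := json.Marshal(stored.clean())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```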

type GeminiError

type GeminiError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
	Status  string `json:"status,omitempty"`
}

GeminiError represents an error

type GeminiErrorResponse

type GeminiErrorResponse struct {
	Error *GeminiError `json:"error,omitempty"`
}

GeminiErrorResponse represents an error response

type GeminiFileData

type GeminiFileData struct {
	MimeType    string `json:"mimeType"`
	FileUri     string `json:"fileUri"`               // gs:// URI from file upload
	DisplayName string `json:"displayName,omitempty"` // for multimodal function responses
}

GeminiFileData represents uploaded file reference

type GeminiFunctionCall

type GeminiFunctionCall struct {
	Name string         `json:"name"`
	Args map[string]any `json:"args,omitempty"`
}

GeminiFunctionCall represents a function call from the model

type GeminiFunctionCallingConfig

type GeminiFunctionCallingConfig struct {
	Mode string `json:"mode,omitempty"` // "AUTO", "ANY", "NONE"
}

GeminiFunctionCallingConfig represents function calling configuration
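The three modes listed in the struct comment can be validated before a request is built. In the sketch below the types are local mirrors of GeminiToolConfig and GeminiFunctionCallingConfig, and newToolConfig is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// functionCallingConfig mirrors GeminiFunctionCallingConfig.
type functionCallingConfig struct {
	Mode string `json:"mode,omitempty"` // "AUTO", "ANY", "NONE"
}

// toolConfig mirrors GeminiToolConfig.
type toolConfig struct {
	FunctionCallingConfig *functionCallingConfig `json:"functionCallingConfig,omitempty"`
}

// newToolConfig (hypothetical helper) accepts only the modes named in the
// struct comment and rejects anything else.
func newToolConfig(mode string) (*toolConfig, error) {
	switch mode {
	case "AUTO", "ANY", "NONE":
		return &toolConfig{FunctionCallingConfig: &functionCallingConfig{Mode: mode}}, nil
	default:
		return nil, fmt.Errorf("unsupported function calling mode %q", mode)
	}
}

func main() {
	cfg, err := newToolConfig("ANY")
	if err != nil {
		panic(err)
	}
	b, err := json.Marshal(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```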

type GeminiFunctionDeclaration

type GeminiFunctionDeclaration struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters,omitempty"`
}

GeminiFunctionDeclaration represents a function schema

func ConvertToolDefinitionToGemini

func ConvertToolDefinitionToGemini(tool uctypes.ToolDefinition) GeminiFunctionDeclaration

ConvertToolDefinitionToGemini converts a Wave ToolDefinition to Gemini format

type GeminiFunctionResponse

type GeminiFunctionResponse struct {
	Name     string              `json:"name"`
	Response map[string]any      `json:"response"`
	Parts    []GeminiMessagePart `json:"parts,omitempty"` // nested parts for multimodal content (Gemini 3 Pro and later)
}

GeminiFunctionResponse represents a function execution result

type GeminiGenerationConfig

type GeminiGenerationConfig struct {
	Temperature     float32               `json:"temperature,omitempty"`
	TopP            float32               `json:"topP,omitempty"`
	TopK            int32                 `json:"topK,omitempty"`
	CandidateCount  int32                 `json:"candidateCount,omitempty"`
	MaxOutputTokens int32                 `json:"maxOutputTokens,omitempty"`
	StopSequences   []string              `json:"stopSequences,omitempty"`
	ThinkingConfig  *GeminiThinkingConfig `json:"thinkingConfig,omitempty"` // for Gemini 3+ models
}

GeminiGenerationConfig represents generation parameters
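One consequence of the omitempty tags on this struct is worth seeing concretely: encoding/json drops zero values, so a Temperature of exactly 0 is omitted from the request rather than sent as 0. The sketch uses a local mirror with a subset of the fields; encodeConfig is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// generationConfig mirrors a subset of GeminiGenerationConfig with the same tags.
type generationConfig struct {
	Temperature     float32  `json:"temperature,omitempty"`
	MaxOutputTokens int32    `json:"maxOutputTokens,omitempty"`
	StopSequences   []string `json:"stopSequences,omitempty"`
}

// encodeConfig (hypothetical helper) returns the JSON form of the config.
func encodeConfig(c generationConfig) (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Temperature is zero here, so it does not appear in the output.
	zero, err := encodeConfig(generationConfig{MaxOutputTokens: 8192})
	if err != nil {
		panic(err)
	}
	fmt.Println(zero)

	// A non-zero temperature is encoded as usual.
	half, err := encodeConfig(generationConfig{Temperature: 0.5, MaxOutputTokens: 8192})
	if err != nil {
		panic(err)
	}
	fmt.Println(half)
}
```

With these tags an explicit temperature of 0 cannot be expressed; a pointer field would be one way to distinguish "unset" from zero.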

type GeminiGoogleSearch

type GeminiGoogleSearch struct{}

GeminiGoogleSearch represents Google Search configuration (empty for default)

type GeminiGroundingMetadata

type GeminiGroundingMetadata struct {
	WebSearchQueries []string `json:"webSearchQueries,omitempty"`
}

GeminiGroundingMetadata represents grounding metadata with web search results

type GeminiInlineData

type GeminiInlineData struct {
	MimeType    string `json:"mimeType"`
	Data        string `json:"data"`                  // base64 encoded
	DisplayName string `json:"displayName,omitempty"` // for multimodal function responses
}

GeminiInlineData represents inline binary data

type GeminiMessagePart

type GeminiMessagePart struct {
	// Text part
	Text string `json:"text,omitempty"`

	// Inline data (images, PDFs, etc.)
	InlineData *GeminiInlineData `json:"inlineData,omitempty"`

	// File data (for uploaded files)
	FileData *GeminiFileData `json:"fileData,omitempty"`

	// Function call (assistant calling a tool)
	FunctionCall *GeminiFunctionCall `json:"functionCall,omitempty"`

	// Function response (result of tool execution)
	FunctionResponse *GeminiFunctionResponse `json:"functionResponse,omitempty"`

	// Thought signature (for thinking models - applies to text and function calls)
	ThoughtSignature string `json:"thoughtSignature,omitempty"`

	// Internal fields (not sent to API)
	PreviewUrl  string                        `json:"previewurl,omitempty"`  // internal field
	FileName    string                        `json:"filename,omitempty"`    // internal field
	ToolUseData *uctypes.UIMessageDataToolUse `json:"toolusedata,omitempty"` // internal field
}

GeminiMessagePart represents different types of content in a message

func (*GeminiMessagePart) Clean

Clean removes internal fields before sending to API

type GeminiPromptFeedback

type GeminiPromptFeedback struct {
	BlockReason   string               `json:"blockReason,omitempty"`
	SafetyRatings []GeminiSafetyRating `json:"safetyRatings,omitempty"`
}

GeminiPromptFeedback represents feedback about the prompt

type GeminiRequest

type GeminiRequest struct {
	Contents          []GeminiContent         `json:"contents"`
	SystemInstruction *GeminiContent          `json:"systemInstruction,omitempty"`
	GenerationConfig  *GeminiGenerationConfig `json:"generationConfig,omitempty"`
	Tools             []GeminiTool            `json:"tools,omitempty"`
	ToolConfig        *GeminiToolConfig       `json:"toolConfig,omitempty"`
}

GeminiRequest represents a request to the Gemini API

type GeminiSafetyRating

type GeminiSafetyRating struct {
	Category    string `json:"category"`
	Probability string `json:"probability"`
}

GeminiSafetyRating represents a safety rating

type GeminiStreamResponse

type GeminiStreamResponse struct {
	Candidates        []GeminiCandidate        `json:"candidates,omitempty"`
	PromptFeedback    *GeminiPromptFeedback    `json:"promptFeedback,omitempty"`
	UsageMetadata     *GeminiUsageMetadata     `json:"usageMetadata,omitempty"`
	GroundingMetadata *GeminiGroundingMetadata `json:"groundingMetadata,omitempty"`
}

GeminiStreamResponse represents a streaming response chunk
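A minimal sketch of decoding one SSE line into a chunk and collecting its text. The types are local mirrors reduced to the fields used, and parseSSELine and chunkText are hypothetical helpers; the package's actual stream handling is not shown on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Local mirrors of the response types, reduced to the fields used here.
type messagePart struct {
	Text string `json:"text,omitempty"`
}

type content struct {
	Role  string        `json:"role,omitempty"`
	Parts []messagePart `json:"parts"`
}

type candidate struct {
	Content      *content `json:"content,omitempty"`
	FinishReason string   `json:"finishReason,omitempty"`
}

type streamResponse struct {
	Candidates []candidate `json:"candidates,omitempty"`
}

// parseSSELine decodes a "data:" line. The boolean is false for lines that
// carry no payload (comments, blank lines, other SSE fields).
func parseSSELine(line string) (*streamResponse, bool, error) {
	if !strings.HasPrefix(line, "data:") {
		return nil, false, nil
	}
	payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
	if payload == "" {
		return nil, false, nil
	}
	var chunk streamResponse
	if err := json.Unmarshal([]byte(payload), &chunk); err != nil {
		return nil, false, fmt.Errorf("malformed stream chunk: %w", err)
	}
	return &chunk, true, nil
}

// chunkText concatenates the text parts of all candidates in a chunk.
func chunkText(c *streamResponse) string {
	var sb strings.Builder
	for _, cand := range c.Candidates {
		if cand.Content == nil {
			continue
		}
		for _, p := range cand.Content.Parts {
			sb.WriteString(p.Text)
		}
	}
	return sb.String()
}

func main() {
	line := `data: {"candidates":[{"content":{"role":"model","parts":[{"text":"Hi"}]},"finishReason":"STOP"}]}`
	chunk, ok, err := parseSSELine(line)
	if err != nil {
		panic(err)
	}
	if ok {
		fmt.Println(chunkText(chunk))
	}
}
```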

type GeminiThinkingConfig

type GeminiThinkingConfig struct {
	ThinkingLevel string `json:"thinkingLevel,omitempty"` // "low" or "high"
}

GeminiThinkingConfig represents thinking configuration for Gemini 3+ models

type GeminiTool

type GeminiTool struct {
	FunctionDeclarations []GeminiFunctionDeclaration `json:"functionDeclarations,omitempty"`
	GoogleSearch         *GeminiGoogleSearch         `json:"googleSearch,omitempty"`
}

GeminiTool represents a tool definition (function declarations or Google Search)

type GeminiToolConfig

type GeminiToolConfig struct {
	FunctionCallingConfig *GeminiFunctionCallingConfig `json:"functionCallingConfig,omitempty"`
}

GeminiToolConfig represents tool choice configuration

type GeminiUsageMetadata

type GeminiUsageMetadata struct {
	Model                   string `json:"model,omitempty"` // internal field
	PromptTokenCount        int    `json:"promptTokenCount"`
	CachedContentTokenCount int    `json:"cachedContentTokenCount,omitempty"`
	CandidatesTokenCount    int    `json:"candidatesTokenCount"`
	TotalTokenCount         int    `json:"totalTokenCount"`
}

GeminiUsageMetadata represents token usage
