multimodal

v1.2.0
Published: Feb 15, 2026 License: Apache-2.0 Imports: 4 Imported by: 0

README

Multimodal SDK Example

This example demonstrates multimodal (vision) capabilities using the PromptKit SDK with streaming responses.

Features

  • Image Analysis: Send images with text prompts for visual analysis
  • Streaming Responses: Get real-time streaming output as the model analyzes images
  • Conversation Context: Follow-up questions maintain context about previously analyzed images
  • Multiple Input Methods: Support for image URLs, file paths, and raw image data

Prerequisites

  1. A Google Gemini API key (for vision capabilities)
  2. Go 1.21 or later

Setup

export GEMINI_API_KEY=your-gemini-api-key

Running the Example

cd sdk/examples/multimodal
go run .

How It Works

Opening a Multimodal Conversation

// Load the pack file and open a conversation with the "vision-analyst" prompt.
conv, err := sdk.Open("./multimodal.pack.json", "vision-analyst")
if err != nil {
    log.Fatalf("Failed to open pack: %v", err)
}
defer conv.Close()

Streaming Image Analysis

// Attach the image to the prompt and print each chunk as it arrives.
for chunk := range conv.Stream(ctx, "What do you see in this image?",
    sdk.WithImageURL("https://example.com/image.jpg"),
) {
    if chunk.Error != nil {
        log.Printf("Error: %v", chunk.Error)
        break
    }
    if chunk.Type == sdk.ChunkDone {
        break // the stream is complete
    }
    fmt.Print(chunk.Text)
}

Non-Streaming Image Analysis

// Send returns the complete response in a single call.
resp, err := conv.Send(ctx, "Describe this image",
    sdk.WithImageURL("https://example.com/image.jpg"),
)
if err != nil {
    log.Fatalf("Error: %v", err)
}
fmt.Println(resp.Text())

Image Input Options

The SDK supports multiple ways to provide images:

From URL

sdk.WithImageURL("https://example.com/image.jpg")

From File

sdk.WithImageFile("/path/to/local/image.png")

From Raw Data

sdk.WithImageData(imageBytes, "image/png")
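
The raw-data option takes a MIME type alongside the bytes. When the bytes come from an upload or a cache rather than a file with a known extension, the type can be sniffed with the standard library. The sketch below assumes nothing about the SDK: `imageMIME` is a hypothetical helper, and whether `WithImageData` validates the type itself is not stated in this README.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// imageMIME is a hypothetical helper (not an SDK function). It sniffs the
// content type from the leading bytes and rejects anything that is not an image.
// http.DetectContentType inspects at most the first 512 bytes.
func imageMIME(data []byte) (string, error) {
	mime := http.DetectContentType(data)
	if !strings.HasPrefix(mime, "image/") {
		return "", fmt.Errorf("not an image: detected %s", mime)
	}
	return mime, nil
}

func main() {
	// The 8-byte PNG signature is enough for detection.
	pngHeader := []byte("\x89PNG\r\n\x1a\n")
	mime, err := imageMIME(pngHeader)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("detected:", mime)
}
```

The detected string could then be passed as the second argument in place of a hard-coded "image/png".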

Supported Providers

Multimodal capabilities require a provider that supports vision:

  • Gemini (recommended): Full multimodal support with streaming
  • OpenAI GPT-4V: Vision capabilities with GPT-4 Vision models
  • Claude: Vision support with Claude 3 models

Pack Configuration

The pack file configures the vision analyst prompt:

{
  "prompts": {
    "vision-analyst": {
      "id": "vision-analyst",
      "name": "Vision Analyst",
      "system_template": "You are an expert visual analyst...",
      "parameters": {
        "temperature": 0.7,
        "max_tokens": 1024
      }
    }
  }
}
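
To inspect or sanity-check a pack file before handing it to the SDK, the fragment above can be decoded with `encoding/json`. This is a sketch that models only the fields shown in this README; the struct names are hypothetical, and the real pack schema may contain more fields than are listed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// These types mirror only the fields shown in the README fragment.
// They are illustrative, not the SDK's own pack types.
type promptParams struct {
	Temperature float64 `json:"temperature"`
	MaxTokens   int     `json:"max_tokens"`
}

type prompt struct {
	ID             string       `json:"id"`
	Name           string       `json:"name"`
	SystemTemplate string       `json:"system_template"`
	Parameters     promptParams `json:"parameters"`
}

type pack struct {
	Prompts map[string]prompt `json:"prompts"`
}

// parsePack decodes pack JSON into the illustrative types above.
func parsePack(data []byte) (pack, error) {
	var p pack
	if err := json.Unmarshal(data, &p); err != nil {
		return pack{}, fmt.Errorf("parse pack: %w", err)
	}
	return p, nil
}

// samplePack repeats the README fragment verbatim.
const samplePack = `{
  "prompts": {
    "vision-analyst": {
      "id": "vision-analyst",
      "name": "Vision Analyst",
      "system_template": "You are an expert visual analyst...",
      "parameters": {"temperature": 0.7, "max_tokens": 1024}
    }
  }
}`

func main() {
	p, err := parsePack([]byte(samplePack))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	va := p.Prompts["vision-analyst"]
	fmt.Printf("%s: temperature=%.1f max_tokens=%d\n",
		va.Name, va.Parameters.Temperature, va.Parameters.MaxTokens)
}
```

The map key ("vision-analyst") is the same name passed as the second argument to `sdk.Open` earlier in this README.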

Notes

  • Image analysis typically requires more tokens than text-only requests
  • Large images may be resized by the provider for processing
  • Some providers have limits on image size and format
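
Because limits differ by provider, a local check before sending can give a clearer error than a rejected request. The sketch below is hypothetical throughout: the `limits` type, the `checkImage` helper, and the numbers in `main` are illustrative values, not limits documented by any provider or by the SDK.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	_ "image/jpeg" // registers the JPEG decoder for image.DecodeConfig
	"image/png"    // registers the PNG decoder; also builds the demo image
)

// limits holds caller-supplied bounds. The values are assumptions the caller
// must look up for the chosen provider.
type limits struct {
	MaxBytes int // maximum encoded size in bytes
	MaxSide  int // maximum width or height in pixels
}

// checkImage reports whether data is a decodable PNG or JPEG within the limits.
// It reads only the image header, not the full pixel data.
func checkImage(data []byte, lim limits) error {
	if len(data) > lim.MaxBytes {
		return fmt.Errorf("image is %d bytes, limit is %d", len(data), lim.MaxBytes)
	}
	cfg, format, err := image.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return fmt.Errorf("unrecognised image format: %w", err)
	}
	if cfg.Width > lim.MaxSide || cfg.Height > lim.MaxSide {
		return fmt.Errorf("%s image is %dx%d, side limit is %d",
			format, cfg.Width, cfg.Height, lim.MaxSide)
	}
	return nil
}

// samplePNG encodes a blank w×h PNG in memory for demonstration.
func samplePNG(w, h int) []byte {
	var buf bytes.Buffer
	if err := png.Encode(&buf, image.NewRGBA(image.Rect(0, 0, w, h))); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	lim := limits{MaxBytes: 1 << 20, MaxSide: 64} // illustrative values only
	fmt.Println("32x16:", checkImage(samplePNG(32, 16), lim))
	fmt.Println("128x16:", checkImage(samplePNG(128, 16), lim))
}
```

Other formats would need their decoders registered in the same way before `image.DecodeConfig` can recognise them.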

Documentation

Overview

Package main demonstrates multimodal capabilities with the PromptKit SDK.

This example shows:

  • Sending images with text prompts using WithImageURL
  • Streaming multimodal responses
  • Using the Gemini provider for vision analysis

Run with:

export GEMINI_API_KEY=your-key
go run .
