multimodal (command)

v1.2.0 (latest) | Published: Feb 15, 2026 | License: Apache-2.0 | Imports: 4 | Imported by: 0

Repository

github.com/AltairaLabs/PromptKit

README

Multimodal SDK Example

This example demonstrates multimodal (vision) capabilities using the PromptKit SDK with streaming responses.

Features

Image Analysis: Send images with text prompts for visual analysis
Streaming Responses: Get real-time streaming output as the model analyzes images
Conversation Context: Follow-up questions maintain context about previously analyzed images
Multiple Input Methods: Support for image URLs, file paths, and raw image data

Prerequisites

A Google Gemini API key (for vision capabilities)
Go 1.21 or later

Setup

export GEMINI_API_KEY=your-gemini-api-key

Running the Example

cd sdk/examples/multimodal
go run .

How It Works

Opening a Multimodal Conversation

conv, err := sdk.Open("./multimodal.pack.json", "vision-analyst")
if err != nil {
    log.Fatalf("Failed to open pack: %v", err)
}
defer conv.Close()

Streaming Image Analysis

// ctx is a context.Context, for example context.Background().
for chunk := range conv.Stream(ctx, "What do you see in this image?",
    sdk.WithImageURL("https://example.com/image.jpg"),
) {
    if chunk.Error != nil {
        log.Printf("Error: %v", chunk.Error)
        break
    }
    if chunk.Type == sdk.ChunkDone {
        break
    }
    fmt.Print(chunk.Text)
}

Non-Streaming Image Analysis

resp, err := conv.Send(ctx, "Describe this image",
    sdk.WithImageURL("https://example.com/image.jpg"),
)
if err != nil {
    log.Fatalf("Error: %v", err)
}
fmt.Println(resp.Text())

Image Input Options

The SDK supports multiple ways to provide images:

From URL

sdk.WithImageURL("https://example.com/image.jpg")

From File

sdk.WithImageFile("/path/to/local/image.png")

From Raw Data

sdk.WithImageData(imageBytes, "image/png")
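
The second argument above is the image's MIME type. If the type is not known in advance, one option is to sniff it from the leading bytes with the standard library. This is a sketch under that assumption: `imageMIME` is a hypothetical helper, and in a real program the bytes would typically come from `os.ReadFile` or an upload.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// imageMIME sniffs the content type from the leading bytes of data and
// reports whether the detected type is an image type.
// Hypothetical helper for illustration; not part of the SDK.
func imageMIME(data []byte) (string, bool) {
	mime := http.DetectContentType(data)
	return mime, strings.HasPrefix(mime, "image/")
}

func main() {
	// The eight-byte PNG signature is enough for detection.
	pngHeader := []byte("\x89PNG\r\n\x1a\n")
	mime, ok := imageMIME(pngHeader)
	fmt.Println(mime, ok)
}
```

`http.DetectContentType` inspects at most the first 512 bytes, so it is cheap to call even on large files. Rejecting non-image data locally avoids a round trip to the provider.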

Supported Providers

Multimodal capabilities require a provider that supports vision:

Gemini (recommended): Full multimodal support with streaming
OpenAI GPT-4V: Vision capabilities with GPT-4 Vision models
Claude: Vision support with Claude 3 models

Pack Configuration

The pack file configures the vision analyst prompt:

{
  "prompts": {
    "vision-analyst": {
      "id": "vision-analyst",
      "name": "Vision Analyst",
      "system_template": "You are an expert visual analyst...",
      "parameters": {
        "temperature": 0.7,
        "max_tokens": 1024
      }
    }
  }
}
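
To check a pack file before handing it to the SDK, the JSON can be decoded with `encoding/json`. The struct types below are assumptions that mirror only the fields shown in the excerpt above. The real pack schema may contain more fields, and the SDK's own types are not used here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// These types mirror only the fields shown in the README excerpt.
// They are illustrative assumptions, not the SDK's pack schema.
type promptParams struct {
	Temperature float64 `json:"temperature"`
	MaxTokens   int     `json:"max_tokens"`
}

type prompt struct {
	ID             string       `json:"id"`
	Name           string       `json:"name"`
	SystemTemplate string       `json:"system_template"`
	Parameters     promptParams `json:"parameters"`
}

type pack struct {
	Prompts map[string]prompt `json:"prompts"`
}

// lookupPrompt decodes pack JSON and returns the prompt stored under name.
func lookupPrompt(raw []byte, name string) (prompt, error) {
	var p pack
	if err := json.Unmarshal(raw, &p); err != nil {
		return prompt{}, fmt.Errorf("decode pack: %w", err)
	}
	pr, ok := p.Prompts[name]
	if !ok {
		return prompt{}, fmt.Errorf("prompt %q not found in pack", name)
	}
	return pr, nil
}

const samplePack = `{
  "prompts": {
    "vision-analyst": {
      "id": "vision-analyst",
      "name": "Vision Analyst",
      "system_template": "You are an expert visual analyst...",
      "parameters": {"temperature": 0.7, "max_tokens": 1024}
    }
  }
}`

func main() {
	pr, err := lookupPrompt([]byte(samplePack), "vision-analyst")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pr.Name, pr.Parameters.Temperature, pr.Parameters.MaxTokens)
}
```

The name passed to `lookupPrompt` corresponds to the key under `prompts`, which is also the second argument to `sdk.Open` in the example above.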

Notes

Image analysis typically requires more tokens than text-only requests
Large images may be resized by the provider for processing
Some providers have limits on image size and format
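
Because limits differ between providers, it can help to inspect an image locally before sending it. The sketch below reads only the image header to get the format and pixel dimensions. `describeImage` and `samplePNG` are hypothetical helpers, and any thresholds applied to the result would need to come from the chosen provider's documentation.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/png" // importing image/png also registers the PNG decoder
)

// describeImage reads the image header and returns the format name and
// pixel dimensions without decoding the full image.
// Only PNG is registered here; add `_ "image/jpeg"` for JPEG input.
func describeImage(data []byte) (string, int, int, error) {
	cfg, format, err := image.DecodeConfig(bytes.NewReader(data))
	if err != nil {
		return "", 0, 0, fmt.Errorf("unrecognized image: %w", err)
	}
	return format, cfg.Width, cfg.Height, nil
}

// samplePNG builds a blank in-memory PNG so the example runs without
// a file on disk.
func samplePNG(w, h int) ([]byte, error) {
	var buf bytes.Buffer
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	if err := png.Encode(&buf, img); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := samplePNG(640, 480)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	format, w, h, err := describeImage(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s %dx%d, %d bytes\n", format, w, h, len(data))
}
```

With the format, dimensions, and byte length in hand, a caller can downscale or reject an image before spending tokens on it.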

Documentation

Overview

Package main demonstrates multimodal capabilities with the PromptKit SDK.

This example shows:

Sending images with text prompts using WithImageURL
Streaming multimodal responses
Using Gemini provider for vision analysis

Run with:

export GEMINI_API_KEY=your-key
go run .

Source Files

main_interactive.go
