vision-mcp

command module
Version: v0.0.0-...-57ab18a
Published: Mar 13, 2026 | License: AGPL-3.0 | Imports: 1 | Imported by: 0

An MCP (Model Context Protocol) server providing image vision analysis via OpenAI-compatible APIs.

Features

  • Multiple image sources: Accepts local file paths, HTTP(S) URLs, or data URLs
  • Region cropping: Crop to specific regions using fractional coordinates before analysis
  • OpenAI-compatible: Works with any OpenAI-compatible API endpoint (local models like Ollama, or cloud providers)
  • Configurable detail levels: Control analysis fidelity with low, high, or auto detail settings
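
The three source forms listed above can be told apart from the string alone. The following is a minimal sketch of one way to do that, not the package's actual logic: the `SourceKind` type and `classifySource` function are hypothetical names introduced here, and treating anything that is not a data URL or an HTTP(S) URL as a file path is an assumption.

```go
package main

import (
	"net/url"
	"strings"
)

// SourceKind is a hypothetical label for the three source forms
// named in the feature list; it is not part of vision-mcp's API.
type SourceKind string

const (
	SourceDataURL SourceKind = "data-url"
	SourceHTTP    SourceKind = "http"
	SourceFile    SourceKind = "file"
)

// classifySource guesses which kind of image source a string is.
// Assumption: anything that is neither a data URL nor an HTTP(S) URL
// is treated as a local file path.
func classifySource(source string) SourceKind {
	if strings.HasPrefix(strings.ToLower(source), "data:") {
		return SourceDataURL
	}
	// Require a host so that inputs like "C:\images\cat.png" are not
	// mistaken for URLs with a one-letter scheme.
	if u, err := url.Parse(source); err == nil && u.Host != "" {
		switch strings.ToLower(u.Scheme) {
		case "http", "https":
			return SourceHTTP
		}
	}
	return SourceFile
}
```

With this sketch, `classifySource("https://example.com/cat.png")` yields `SourceHTTP` and `classifySource("/tmp/cat.png")` yields `SourceFile`.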

Installation

go install github.com/SecKatie/vision-mcp@latest

Configuration

Required environment variables:

  • VISION_API_BASE_URL — Base URL of your OpenAI-compatible API (e.g., http://localhost:11434/v1)
  • VISION_API_KEY — API key or bearer token

Optional environment variables:

  • VISION_API_MODEL — Model name to use (default: gpt-4.1-mini)
  • VISION_API_MAX_TOKENS — Default max response tokens (default: 1024)
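
As an illustration of how these variables and their defaults fit together, here is a minimal sketch of a loader. The `Config` struct and `loadConfig` function are hypothetical names, and the validation rules (rejecting a non-positive or non-numeric token limit) are assumptions; the real server may handle bad values differently.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Config mirrors the four documented environment variables.
// The struct itself is illustrative, not vision-mcp's own type.
type Config struct {
	BaseURL   string
	APIKey    string
	Model     string
	MaxTokens int
}

// loadConfig reads settings through getenv (pass os.Getenv in real use)
// and applies the documented defaults for the optional variables.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		BaseURL:   getenv("VISION_API_BASE_URL"),
		APIKey:    getenv("VISION_API_KEY"),
		Model:     "gpt-4.1-mini", // documented default
		MaxTokens: 1024,           // documented default
	}
	if cfg.BaseURL == "" {
		return Config{}, errors.New("VISION_API_BASE_URL is required")
	}
	if cfg.APIKey == "" {
		return Config{}, errors.New("VISION_API_KEY is required")
	}
	if v := getenv("VISION_API_MODEL"); v != "" {
		cfg.Model = v
	}
	if v := getenv("VISION_API_MAX_TOKENS"); v != "" {
		n, err := strconv.Atoi(v)
		// Assumption: only positive integers are meaningful here.
		if err != nil || n <= 0 {
			return Config{}, fmt.Errorf("VISION_API_MAX_TOKENS must be a positive integer, got %q", v)
		}
		cfg.MaxTokens = n
	}
	return cfg, nil
}
```

Taking the lookup function as a parameter keeps the sketch testable without touching the process environment.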

Usage

The server exposes a single tool, see, with the following schema:

Input

| Field | Type | Description |
| --- | --- | --- |
| source | string | Required. Image source: local file path, HTTP(S) URL, or data URL |
| question | string | What to ask about the image (defaults to "Describe this image in detail.") |
| detail | string | API image detail level: low, high, or auto (default: auto) |
| max_tokens | int | Maximum tokens in the response for this request |
| crop | object | Crop region as fractional coordinates (0.0–1.0, see below) |

Crop Region

When cropping, all coordinates are fractional (0.0–1.0) relative to image dimensions:

| Field | Type | Description |
| --- | --- | --- |
| x | number | Required. Left edge fraction, 0.0–1.0 |
| y | number | Required. Top edge fraction, 0.0–1.0 |
| width | number | Required. Width fraction, 0.0–1.0 |
| height | number | Required. Height fraction, 0.0–1.0 |

Output
{
  "text": "...",
  "model": "...",
  "prompt_tokens": 1234,
  "completion_tokens": 567
}
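
A client written in Go could model the tool's input and output with plain structs. The sketch below is an assumption-laden illustration: the type names `SeeInput`, `CropRegion`, and `SeeOutput` and the two helper functions are hypothetical, and only the JSON field names come from the schema above. How the output JSON is wrapped inside the MCP tool result is not specified here, so the decoder simply takes the raw JSON text.

```go
package main

import "encoding/json"

// CropRegion, SeeInput, and SeeOutput are hypothetical client-side
// mirrors of the documented schema; only the JSON keys are from the README.
type CropRegion struct {
	X      float64 `json:"x"`
	Y      float64 `json:"y"`
	Width  float64 `json:"width"`
	Height float64 `json:"height"`
}

type SeeInput struct {
	Source    string      `json:"source"`
	Question  string      `json:"question,omitempty"`
	Detail    string      `json:"detail,omitempty"`
	MaxTokens int         `json:"max_tokens,omitempty"`
	Crop      *CropRegion `json:"crop,omitempty"`
}

type SeeOutput struct {
	Text             string `json:"text"`
	Model            string `json:"model"`
	PromptTokens     int    `json:"prompt_tokens"`
	CompletionTokens int    `json:"completion_tokens"`
}

// encodeInput renders the tool arguments as JSON, omitting unset
// optional fields so the server-side defaults apply.
func encodeInput(in SeeInput) (string, error) {
	b, err := json.Marshal(in)
	return string(b), err
}

// decodeOutput parses the tool's JSON output into a SeeOutput.
func decodeOutput(raw string) (SeeOutput, error) {
	var out SeeOutput
	err := json.Unmarshal([]byte(raw), &out)
	return out, err
}
```

Using a pointer for `Crop` lets the field be left out entirely when no cropping is wanted, rather than sending a zero-sized region.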

Example MCP Client Configuration

Add to your Claude Desktop MCP configuration:

{
  "mcpServers": {
    "vision": {
      "command": "vision-mcp",
      "env": {
        "VISION_API_BASE_URL": "http://localhost:11434/v1",
        "VISION_API_KEY": "your-api-key",
        "VISION_API_MODEL": "llava"
      }
    }
  }
}

Development

Build:

go build ./...

Test:

go test ./...

Lint:

golangci-lint run

Format:

gofumpt -w .
goimports -w .

License

AGPL-3.0 — See LICENSE for details.

Documentation

Overview

Copyright © 2026 Katie Mulliken <katie@mulliken.net>

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Directories

| Path | Synopsis |
| --- | --- |
| cmd | Package cmd provides the CLI entry point for vision-mcp. |
| internal/client | Package client provides an HTTP client for OpenAI-compatible vision APIs. |
| internal/image | Package image provides helpers for loading and transforming images. |
| internal/vision | Package vision registers the "see" MCP tool for image analysis. |
