vision-mcp

command module

v0.0.0-...-57ab18a · Published: Mar 13, 2026 · License: AGPL-3.0 · Imports: 1 · Imported by: 0

Repository

github.com/SecKatie/vision-mcp

README

vision-mcp

An MCP (Model Context Protocol) server providing image vision analysis via OpenAI-compatible APIs.

Features

Multiple image sources: Accepts local file paths, HTTP(S) URLs, or data URLs
Region cropping: Crop to specific regions using fractional coordinates before analysis
OpenAI-compatible: Works with any OpenAI-compatible API endpoint (local models like Ollama, or cloud providers)
Configurable detail levels: Control analysis fidelity with low, high, or auto detail settings
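
The three source forms listed above can be told apart by prefix. The following is a minimal sketch of one way to do that, not the server's actual logic; the `SourceKind` type and `classifySource` function are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// SourceKind is a hypothetical label for the three source forms the README lists.
type SourceKind string

const (
	SourceDataURL SourceKind = "data-url"
	SourceHTTP    SourceKind = "http"
	SourceFile    SourceKind = "file"
)

// classifySource guesses the kind of an image source from its prefix.
// Anything that is not a data URL or an HTTP(S) URL is treated as a local path.
func classifySource(source string) SourceKind {
	lower := strings.ToLower(source)
	switch {
	case strings.HasPrefix(lower, "data:"):
		return SourceDataURL
	case strings.HasPrefix(lower, "http://"), strings.HasPrefix(lower, "https://"):
		return SourceHTTP
	default:
		return SourceFile
	}
}

func main() {
	sources := []string{
		"./shot.png",
		"https://example.com/a.jpg",
		"data:image/png;base64,AAAA",
	}
	for _, s := range sources {
		fmt.Printf("%s -> %s\n", s, classifySource(s))
	}
}
```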

Installation

go install github.com/SecKatie/vision-mcp@latest

Configuration

Required environment variables:

VISION_API_BASE_URL — Base URL of your OpenAI-compatible API (e.g., http://localhost:11434/v1)
VISION_API_KEY — API key or bearer token

Optional environment variables:

VISION_API_MODEL — Model name to use (default: gpt-4.1-mini)
VISION_API_MAX_TOKENS — Default max response tokens (default: 1024)
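
As an illustration of how these variables and their defaults fit together, here is a minimal sketch that reads them into a struct. The `Config` type and `loadConfig` function are hypothetical and are not taken from the project's source; only the variable names and default values come from the lists above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config is a hypothetical holder for the settings listed above.
type Config struct {
	BaseURL   string
	APIKey    string
	Model     string
	MaxTokens int
}

// loadConfig reads settings through a lookup function (os.Getenv in production)
// so the logic can be exercised without touching the real environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		BaseURL:   getenv("VISION_API_BASE_URL"),
		APIKey:    getenv("VISION_API_KEY"),
		Model:     "gpt-4.1-mini", // documented default
		MaxTokens: 1024,           // documented default
	}
	if cfg.BaseURL == "" {
		return Config{}, errors.New("VISION_API_BASE_URL is required")
	}
	if cfg.APIKey == "" {
		return Config{}, errors.New("VISION_API_KEY is required")
	}
	if m := getenv("VISION_API_MODEL"); m != "" {
		cfg.Model = m
	}
	if raw := getenv("VISION_API_MAX_TOKENS"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n <= 0 {
			return Config{}, fmt.Errorf("VISION_API_MAX_TOKENS must be a positive integer, got %q", raw)
		}
		cfg.MaxTokens = n
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("model=%s max_tokens=%d base=%s\n", cfg.Model, cfg.MaxTokens, cfg.BaseURL)
}
```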

Usage

The server exposes a single tool, `see`, with the following input and output schema:

Input

Field	Type	Description
`source`	string	Required. Image source: local file path, HTTP(S) URL, or data URL
`question`	string	What to ask about the image (defaults to "Describe this image in detail.")
`detail`	string	API image detail level: `low`, `high`, or `auto` (default: `auto`)
`max_tokens`	int	Maximum tokens in the response for this request; overrides the VISION_API_MAX_TOKENS default
`crop`	object	Optional crop region as fractional coordinates (0.0–1.0); see Crop Region below

Crop Region

When cropping, all coordinates are fractional (0.0–1.0) relative to image dimensions:

Field	Type	Description
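`x`	number	Required. Left edge as a fraction of the image width (0.0–1.0)
`y`	number	Required. Top edge as a fraction of the image height (0.0–1.0)
`width`	number	Required. Crop width as a fraction of the image width (0.0–1.0)
`height`	number	Required. Crop height as a fraction of the image height (0.0–1.0)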

Output

{
  "text": "...",
  "model": "...",
  "prompt_tokens": 1234,
  "completion_tokens": 567
}
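
A client can decode this payload into a small struct. The following sketch assumes the result arrives as the JSON object shown above; the `SeeResult` type and its helpers are hypothetical names, not types exported by this module.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SeeResult is a hypothetical mirror of the output object shown above.
type SeeResult struct {
	Text             string `json:"text"`
	Model            string `json:"model"`
	PromptTokens     int    `json:"prompt_tokens"`
	CompletionTokens int    `json:"completion_tokens"`
}

// TotalTokens sums prompt and completion usage for simple cost tracking.
func (r SeeResult) TotalTokens() int {
	return r.PromptTokens + r.CompletionTokens
}

// parseSeeResult decodes the raw JSON payload returned by the tool.
func parseSeeResult(raw []byte) (SeeResult, error) {
	var res SeeResult
	if err := json.Unmarshal(raw, &res); err != nil {
		return SeeResult{}, fmt.Errorf("decode see result: %w", err)
	}
	return res, nil
}

func main() {
	raw := []byte(`{"text": "A cat on a sofa.", "model": "llava", "prompt_tokens": 1234, "completion_tokens": 567}`)
	res, err := parseSeeResult(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s (model %s, %d tokens)\n", res.Text, res.Model, res.TotalTokens())
}
```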

Example MCP Client Configuration

Add to your Claude Desktop MCP configuration:

{
  "mcpServers": {
    "vision": {
      "command": "vision-mcp",
      "env": {
        "VISION_API_BASE_URL": "http://localhost:11434/v1",
        "VISION_API_KEY": "your-api-key",
        "VISION_API_MODEL": "llava"
      }
    }
  }
}

Development

Build:

go build ./...

Test:

go test ./...

Lint:

golangci-lint run

Format:

gofumpt -w .
goimports -w .

License

AGPL-3.0 — See LICENSE for details.

Documentation

Overview

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Source Files

main.go

Directories

Path	Synopsis
cmd	Package cmd provides the CLI entry point for vision-mcp.
internal/client	Package client provides an HTTP client for OpenAI-compatible vision APIs.
internal/image	Package image provides helpers for loading and transforming images.
internal/vision	Package vision registers the "see" MCP tool for image analysis.