ocrtool

Version: v0.1.7 (Latest)
Warning

This package is not in the latest version of its module.

Published: Mar 10, 2026 | License: Apache-2.0 | Imports: 10 | Imported by: 0

Documentation

Overview

Package ocrtool provides optical character recognition (OCR) tools for agents. It extracts text from images (screenshots, photos, scanned documents, whiteboard captures) using the Tesseract OCR engine.

Problem: Agents receive image attachments containing text (screenshots of error messages, photos of whiteboards, scanned receipts) that they cannot read natively. This tool converts images to machine-readable text so agents can process, search, and reason about visual content.

Supported image formats:

  • PNG, JPEG/JPG, TIFF, BMP, WebP, PNM
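A caller may want to reject unsupported files before spending any OCR time on them. The following is a minimal sketch that matches by file extension only. The `isSupportedImage` helper and the exact extension spellings (`.jpg`/`.jpeg`, `.tif`/`.tiff`) are assumptions for illustration, not part of this package's API, and the real tool may detect formats differently.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExts is a hypothetical allow-list mirroring the formats above.
// Matching is by extension only; file contents are not inspected.
var supportedExts = map[string]bool{
	".png": true, ".jpg": true, ".jpeg": true,
	".tif": true, ".tiff": true, ".bmp": true,
	".webp": true, ".pnm": true,
}

// isSupportedImage reports whether path has an allowed extension,
// ignoring case (so "SCAN.TIFF" is accepted).
func isSupportedImage(path string) bool {
	ext := strings.ToLower(filepath.Ext(path))
	return supportedExts[ext]
}

func main() {
	for _, p := range []string{"error.png", "receipt.JPG", "notes.pdf"} {
		fmt.Printf("%s supported: %v\n", p, isSupportedImage(p))
	}
}
```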

Safety guards:

  • File size capped at 20 MB to prevent memory exhaustion
  • 60-second processing timeout
  • Output truncated at 32 KB to limit LLM context consumption

Dependencies:

  • tesseract CLI: the widely used open-source Tesseract OCR engine (Apache 2.0)
  • Install on macOS: brew install tesseract
  • Install on Debian: apt-get install tesseract-ocr
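  • Additional languages on macOS: brew install tesseract-lang
  • Additional languages on Debian: apt-get install tesseract-ocr-<lang> (for example, tesseract-ocr-deu for German)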
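Because the tool shells out to an external binary, a host application may want to fail early with an install hint when tesseract is missing. A minimal sketch, assuming a hypothetical `requireBinary` helper; it uses only `exec.LookPath` and the real `tesseract --list-langs` flag to show which language packs are installed.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// requireBinary resolves name on PATH and, if it is missing, returns an
// error that includes an install hint for the user.
func requireBinary(name, hint string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH (%s): %w", name, hint, err)
	}
	return path, nil
}

func main() {
	path, err := requireBinary("tesseract",
		"macOS: brew install tesseract; Debian: apt-get install tesseract-ocr")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// --list-langs prints the installed language packs. CombinedOutput
	// captures the list whether tesseract writes it to stdout or stderr.
	langs, err := exec.Command(path, "--list-langs").CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, "listing languages failed:", err)
		os.Exit(1)
	}
	fmt.Printf("using %s\n%s", path, langs)
}
```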

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ToolProvider

type ToolProvider struct{}

ToolProvider wraps the OCR tool and satisfies the tools.ToolProviders interface.

func NewToolProvider

func NewToolProvider() *ToolProvider

NewToolProvider creates a ToolProvider for the OCR tool.

func (*ToolProvider) GetTools

func (p *ToolProvider) GetTools() []tool.Tool

GetTools returns the OCR tool.
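The provider pattern above can be illustrated with a self-contained sketch. The real `tool.Tool` and `tools.ToolProviders` types live elsewhere in the module and their methods are not documented on this page, so the `Tool` and `ToolProviders` interfaces below, along with the `ocrTool` type and its `"ocr"` name, are hypothetical stand-ins. Only the shapes of `ToolProvider`, `NewToolProvider`, and `GetTools` follow the documentation.

```go
package main

import "fmt"

// Tool is a hypothetical stand-in for tool.Tool.
type Tool interface {
	Name() string
	Description() string
}

// ToolProviders is a hypothetical stand-in for tools.ToolProviders.
type ToolProviders interface {
	GetTools() []Tool
}

// ocrTool is an illustrative OCR tool; its name is an assumption.
type ocrTool struct{}

func (ocrTool) Name() string        { return "ocr" }
func (ocrTool) Description() string { return "Extract text from an image using Tesseract" }

// ToolProvider mirrors the documented shape: a stateless empty struct.
type ToolProvider struct{}

// NewToolProvider creates a ToolProvider for the OCR tool.
func NewToolProvider() *ToolProvider { return &ToolProvider{} }

// GetTools returns a slice holding the single OCR tool.
func (p *ToolProvider) GetTools() []Tool { return []Tool{ocrTool{}} }

// Compile-time check that *ToolProvider satisfies the provider interface.
var _ ToolProviders = (*ToolProvider)(nil)

func main() {
	for _, t := range NewToolProvider().GetTools() {
		fmt.Printf("%s: %s\n", t.Name(), t.Description())
	}
}
```

Keeping `ToolProvider` stateless means a registry can construct it freely; any configuration (languages, limits) would have to come from elsewhere, which is consistent with the empty struct shown in the documentation.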
