Documentation ¶
Overview ¶
Package ocrtool provides optical character recognition (OCR) tools for agents. It extracts text from images (screenshots, photos, scanned documents, whiteboard captures) using the Tesseract OCR engine.
Problem: Agents receive image attachments containing text (screenshots of error messages, photos of whiteboards, scanned receipts) that they cannot read natively. This tool converts images to machine-readable text so agents can process, search, and reason about visual content.
Supported image formats:
- PNG, JPEG/JPG, TIFF, BMP, WebP, PNM
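A caller may want to reject unsupported files before handing them to the OCR tool. The sketch below shows one way to do that by file extension. The helper name isSupportedImage and the exact extension set (for example, accepting both .tif and .tiff) are assumptions for illustration, not part of this package's API.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExts maps lowercase extensions to the formats listed above.
// The exact set is an assumption; the package may detect formats differently.
var supportedExts = map[string]bool{
	".png": true, ".jpg": true, ".jpeg": true,
	".tif": true, ".tiff": true, ".bmp": true,
	".webp": true, ".pnm": true,
}

// isSupportedImage is a hypothetical helper, not part of ocrtool.
// It reports whether the path's extension matches a supported format.
func isSupportedImage(path string) bool {
	return supportedExts[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, p := range []string{"error.PNG", "scan.tiff", "notes.pdf"} {
		fmt.Printf("%s supported: %v\n", p, isSupportedImage(p))
	}
}
```

An extension check is only a cheap first filter; a file's contents can still disagree with its name.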
Safety guards:
- File size capped at 20 MB to prevent memory exhaustion
- 60-second processing timeout
- Output truncated at 32 KB to limit LLM context consumption
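The three guards above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual implementation: the names checkSize, truncate, and runOCR are hypothetical, "20 MB" and "32 KB" are interpreted as binary units, and the engine is invoked as "tesseract <image> stdout", which writes recognized text to standard output.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
	"unicode/utf8"
)

// Limits taken from the safety guards above; binary units are an assumption.
const (
	maxFileBytes   = 20 << 20         // 20 MB input cap
	maxOutputBytes = 32 << 10         // 32 KB output cap
	ocrTimeout     = 60 * time.Second // processing timeout
)

// checkSize (hypothetical) rejects oversized files before any OCR work starts.
func checkSize(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if info.Size() > maxFileBytes {
		return fmt.Errorf("%s is %d bytes, over the %d-byte limit", path, info.Size(), maxFileBytes)
	}
	return nil
}

// truncate (hypothetical) caps s at limit bytes without splitting a UTF-8 rune.
func truncate(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	// Back up until the first dropped byte starts a rune.
	for limit > 0 && !utf8.RuneStart(s[limit]) {
		limit--
	}
	return s[:limit]
}

// runOCR (hypothetical) applies all three guards around one tesseract call.
func runOCR(path string) (string, error) {
	if err := checkSize(path); err != nil {
		return "", err
	}
	ctx, cancel := context.WithTimeout(context.Background(), ocrTimeout)
	defer cancel()
	// CommandContext kills the process when the deadline passes.
	out, err := exec.CommandContext(ctx, "tesseract", path, "stdout").Output()
	if errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return "", fmt.Errorf("OCR timed out after %s", ocrTimeout)
	}
	if err != nil {
		return "", err
	}
	return truncate(string(out), maxOutputBytes), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: ocr <image>")
		return
	}
	text, err := runOCR(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "ocr failed:", err)
		os.Exit(1)
	}
	fmt.Print(text)
}
```

Checking the size before starting the process keeps a large file from ever reaching the engine, and truncating on a rune boundary keeps the capped output valid UTF-8.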
Dependencies:
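- tesseract CLI: the widely used open-source OCR engine (Apache 2.0)
- Install on macOS: brew install tesseract
- Install on Debian: apt-get install tesseract-ocr
- Additional languages on macOS: brew install tesseract-lang; on Debian, install the per-language packages named tesseract-ocr-<lang> (for example, tesseract-ocr-deu)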
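Because the tool depends on an external binary, it can help to verify the installation before relying on it. The sketch below looks up tesseract on PATH and lists its installed languages with the --list-langs flag. The helper name requireBinary is hypothetical and is not part of this package.

```go
package main

import (
	"fmt"
	"os/exec"
)

// requireBinary (hypothetical) returns the full path of name on PATH,
// or an error that wraps the lookup failure.
func requireBinary(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := requireBinary("tesseract")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("tesseract found at", path)

	// --list-langs prints the language data available to the engine.
	out, err := exec.Command(path, "--list-langs").CombinedOutput()
	if err != nil {
		fmt.Println("could not list languages:", err)
		return
	}
	fmt.Print(string(out))
}
```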
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ToolProvider ¶
type ToolProvider struct{}
ToolProvider wraps the OCR tool and satisfies the tools.ToolProviders interface.
func NewToolProvider ¶
func NewToolProvider() *ToolProvider
NewToolProvider creates a ToolProvider for the OCR tool.
func (*ToolProvider) GetTools ¶
func (p *ToolProvider) GetTools() []tool.Tool
GetTools returns the OCR tool.