ocrtool

package

v0.1.7 (latest) Published: Mar 10, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

Repository

github.com/stackgenhq/genie

Documentation ¶

Overview ¶

Package ocrtool provides optical character recognition (OCR) tools for agents. It extracts text from images (screenshots, photos, scanned documents, whiteboard captures) using the Tesseract OCR engine.

Problem: Agents receive image attachments containing text (screenshots of error messages, photos of whiteboards, scanned receipts) that they cannot read natively. This tool converts images to machine-readable text so agents can process, search, and reason about visual content.

Supported image formats:

PNG, JPEG/JPG, TIFF, BMP, WebP, PNM
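
A caller might pre-filter attachments by file extension before handing them to the OCR tool. The sketch below is not the package's own code: the helper name isSupportedImage, the extension table, and the inclusion of .tif and the PNM family extensions (.pbm, .pgm, .ppm) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExts mirrors the formats listed above. The .tif spelling and the
// PNM family extensions (.pbm, .pgm, .ppm) are assumptions, not confirmed
// by the package documentation.
var supportedExts = map[string]bool{
	".png": true, ".jpg": true, ".jpeg": true,
	".tiff": true, ".tif": true, ".bmp": true, ".webp": true,
	".pnm": true, ".pbm": true, ".pgm": true, ".ppm": true,
}

// isSupportedImage (hypothetical helper) reports whether path ends in one of
// the supported extensions, ignoring case.
func isSupportedImage(path string) bool {
	return supportedExts[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, p := range []string{"error.PNG", "receipt.jpeg", "notes.pdf"} {
		fmt.Printf("%s supported: %v\n", p, isSupportedImage(p))
	}
}
```

An extension check is only a cheap first filter; it does not prove the file contents are a valid image.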

Safety guards:

- File size capped at 20 MB to prevent memory exhaustion
- Processing timeout of 60 seconds
- Output truncated at 32 KB to limit LLM context consumption

Dependencies:

- tesseract CLI: a widely used open-source OCR engine (Apache 2.0)
  - Install on macOS: brew install tesseract
  - Install on Debian: apt-get install tesseract-ocr
  - Additional languages (macOS, Homebrew): brew install tesseract-lang
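
Because the tool depends on an external binary, a host application may want to confirm that tesseract is installed before registering the tool. The following is a small sketch of such a check; findBinary and its hint text are hypothetical and not part of ocrtool.

```go
package main

import (
	"fmt"
	"os/exec"
)

// findBinary (hypothetical helper) resolves name on PATH and wraps the lookup
// error with an install hint for the operator.
func findBinary(name, hint string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH (%s): %w", name, hint, err)
	}
	return path, nil
}

func main() {
	path, err := findBinary("tesseract",
		"try: brew install tesseract, or apt-get install tesseract-ocr")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("tesseract found at", path)
}
```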

Index ¶

type ToolProvider
- func NewToolProvider() *ToolProvider
- func (p *ToolProvider) GetTools() []tool.Tool

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type ToolProvider ¶

type ToolProvider struct{}

ToolProvider wraps the OCR tool and satisfies the tools.ToolProviders interface.

func NewToolProvider ¶

func NewToolProvider() *ToolProvider

NewToolProvider creates a ToolProvider for the OCR tool.

func (*ToolProvider) GetTools ¶

func (p *ToolProvider) GetTools() []tool.Tool

GetTools returns the OCR tool.
