ocr

package
v0.5.0 Latest
Warning

This package is not in the latest version of its module.
Published: Mar 29, 2026 License: MIT Imports: 15 Imported by: 0

Documentation

Overview

Package ocr provides text recognition using Apple's Vision framework.

It wraps VNRecognizeTextRequest to detect text in images and return observations with pixel coordinates, bounding boxes, and confidence scores. Search functions support region filtering and ranking preferences for locating specific text on screen.

Basic usage:

svc := ocr.NewService(false)
obs, err := svc.RecognizeText(img)
if err != nil {
	log.Fatal(err)
}
for _, o := range obs {
	fmt.Printf("%s at (%d,%d)\n", o.Text, o.Center.X, o.Center.Y)
}

Finding text by content:

x, y, found := svc.FindText(img, "Continue")

Constants

This section is empty.

Variables

This section is empty.

Functions

func DrawRect

func DrawRect(img *image.RGBA, x1, y1, x2, y2 int, c color.RGBA)

DrawRect draws a 2px rectangle outline on an RGBA image.

func SaveDebugScreenshot

func SaveDebugScreenshot(img image.Image, observations []TextObservation, dir, prefix string) error

SaveDebugScreenshot saves a screenshot with OCR bounding boxes overlaid. It writes both a PNG image and a JSON file with the observations.

Types

type DebugEntry

type DebugEntry struct {
	Timestamp    string            `json:"timestamp"`
	Step         string            `json:"step,omitempty"`
	Observations []TextObservation `json:"observations"`
	ScreenState  string            `json:"screen_state,omitempty"`
}

DebugEntry represents a single OCR debug log entry.
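
The struct tags above determine the JSON layout of a debug entry. The sketch below mirrors DebugEntry with a local copy so it runs without the package; the observation type is a simplified stand-in for TextObservation (the real type also carries a bounding box and center), and encodeEntry is a hypothetical helper. It shows that fields tagged omitempty are dropped when empty.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// observation is a simplified stand-in for TextObservation.
type observation struct {
	Text       string
	Confidence float32
}

// debugEntry is a local copy of DebugEntry with the same JSON tags.
type debugEntry struct {
	Timestamp    string        `json:"timestamp"`
	Step         string        `json:"step,omitempty"`
	Observations []observation `json:"observations"`
	ScreenState  string        `json:"screen_state,omitempty"`
}

// encodeEntry (hypothetical helper) serializes one entry to a JSON string.
func encodeEntry(e debugEntry) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	out, err := encodeEntry(debugEntry{
		Timestamp:    "2026-03-29T12:00:00Z",
		Observations: []observation{{Text: "Continue", Confidence: 0.5}},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// Step and ScreenState are empty, so neither key appears in the output.
	fmt.Println(out)
}
```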

type Region

type Region struct {
	MinX float64
	MinY float64
	MaxX float64
	MaxY float64
}

Region describes a normalized screen rectangle (0-1, top-left origin).
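
To make the normalized convention concrete, here is a small sketch that maps a region onto a pixel grid and tests whether a normalized point falls inside it. The local region type copies the fields of Region; the methods contains and toPixels are hypothetical helpers, and the 0.05 strip height in main is only an illustrative value, not the package's menu bar setting.

```go
package main

import (
	"fmt"
	"image"
	"math"
)

// region is a local copy of Region: normalized 0-1, top-left origin.
type region struct {
	MinX, MinY, MaxX, MaxY float64
}

// contains reports whether the normalized point (x, y) lies in the region.
func (r region) contains(x, y float64) bool {
	return x >= r.MinX && x <= r.MaxX && y >= r.MinY && y <= r.MaxY
}

// toPixels scales the region to a width x height pixel grid.
func (r region) toPixels(width, height int) image.Rectangle {
	scale := func(v float64, size int) int {
		return int(math.Round(v * float64(size)))
	}
	return image.Rect(
		scale(r.MinX, width), scale(r.MinY, height),
		scale(r.MaxX, width), scale(r.MaxY, height),
	)
}

func main() {
	// Illustrative strip across the top 5% of the screen.
	strip := region{MinX: 0, MinY: 0, MaxX: 1, MaxY: 0.05}
	fmt.Println(strip.toPixels(2000, 1000))
	fmt.Println(strip.contains(0.5, 0.02), strip.contains(0.5, 0.5))
}
```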

type SearchOptions

type SearchOptions struct {
	Region    *Region
	PreferTop bool
}

SearchOptions controls OCR text matching behavior.

func MenuSearchOptions

func MenuSearchOptions() SearchOptions

MenuSearchOptions returns options tuned for menu bar targeting.

func ParseSearchOptions

func ParseSearchOptions(regionSpec string) (SearchOptions, error)

ParseSearchOptions parses a region selector for OCR commands. Supported selectors:

  • "" / "screen" / "full": whole screen
  • "menu" / "menubar": top menu bar strip
  • "x1,y1,x2,y2": normalized rectangle coordinates
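
The coordinate form of the selector can be parsed along these lines. This is a sketch under stated assumptions, not the package's parser: parseRect is a hypothetical helper, it handles only the "x1,y1,x2,y2" form, and the validation rules (values within 0-1, min strictly below max, surrounding spaces tolerated) are guesses at reasonable behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// region is a local copy of Region: normalized 0-1, top-left origin.
type region struct {
	MinX, MinY, MaxX, MaxY float64
}

// parseRect (hypothetical) parses "x1,y1,x2,y2" into a region.
func parseRect(spec string) (region, error) {
	parts := strings.Split(spec, ",")
	if len(parts) != 4 {
		return region{}, fmt.Errorf("expected 4 comma-separated values, got %d", len(parts))
	}
	var v [4]float64
	for i, p := range parts {
		f, err := strconv.ParseFloat(strings.TrimSpace(p), 64)
		if err != nil {
			return region{}, fmt.Errorf("value %d: %w", i+1, err)
		}
		// Written as a negated range check so that NaN is rejected too.
		if !(f >= 0 && f <= 1) {
			return region{}, fmt.Errorf("value %d (%v) is outside 0-1", i+1, f)
		}
		v[i] = f
	}
	if v[0] >= v[2] || v[1] >= v[3] {
		return region{}, errors.New("empty region: min must be below max")
	}
	return region{MinX: v[0], MinY: v[1], MaxX: v[2], MaxY: v[3]}, nil
}

func main() {
	r, err := parseRect("0.1, 0.2, 0.9, 0.8")
	fmt.Println(r, err)

	_, err = parseRect("0,0,2,1")
	fmt.Println(err)
}
```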

type Service

type Service struct {
	// contains filtered or unexported fields
}

Service performs text recognition using Apple's Vision framework.

func NewService

func NewService(verbose bool) *Service

NewService creates a new OCR service.

func (*Service) AllText

func (s *Service) AllText(img image.Image) string

AllText returns all recognized text as a single string with lines separated by newlines.

func (*Service) FindText

func (s *Service) FindText(img image.Image, needle string) (x, y float64, found bool)

FindText searches for text on screen and returns its center pixel coordinates. When multiple observations contain the needle, it prefers:

  1. Exact full-text matches (observation text equals needle, case-insensitive)
  2. Shorter observation text (closer to the needle itself)
  3. Observations closer to the bottom of the screen

Returns found=false if the text is not visible.
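
The three preferences above can be expressed as a single comparison function. The sketch below is one possible reading, not the package's implementation: the obs type is a stand-in for TextObservation, bestOf is a hypothetical helper, substring matching is assumed to be case-insensitive, and pixel Y is assumed to grow downward so that a larger Y means closer to the bottom.

```go
package main

import (
	"fmt"
	"image"
	"sort"
	"strings"
)

// obs is a stand-in for TextObservation with only the fields ranking needs.
type obs struct {
	Text   string
	Center image.Point
}

// bestOf (hypothetical) returns the preferred observation containing needle.
func bestOf(all []obs, needle string) (obs, bool) {
	lower := strings.ToLower(needle)
	var hits []obs
	for _, o := range all {
		if strings.Contains(strings.ToLower(o.Text), lower) {
			hits = append(hits, o)
		}
	}
	if len(hits) == 0 {
		return obs{}, false
	}
	sort.SliceStable(hits, func(i, j int) bool {
		a, b := hits[i], hits[j]
		// 1. Exact (case-insensitive) matches come first.
		ae, be := strings.EqualFold(a.Text, needle), strings.EqualFold(b.Text, needle)
		if ae != be {
			return ae
		}
		// 2. Then shorter observation text.
		if len(a.Text) != len(b.Text) {
			return len(a.Text) < len(b.Text)
		}
		// 3. Then the observation lower on the screen (larger Y, assumed).
		return a.Center.Y > b.Center.Y
	})
	return hits[0], true
}

func main() {
	all := []obs{
		{Text: "Continue with Apple", Center: image.Pt(100, 900)},
		{Text: "continue", Center: image.Pt(100, 200)},
		{Text: "Continue", Center: image.Pt(100, 700)},
	}
	best, found := bestOf(all, "Continue")
	fmt.Println(best.Text, best.Center, found)
}
```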

func (*Service) FindTextNormalized

func (s *Service) FindTextNormalized(img image.Image, needle string) (x, y float64, found bool)

FindTextNormalized searches for text and returns its center in normalized coordinates (0-1, top-left origin) suitable for mouse input commands.
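
The relationship between the pixel result of FindText and the normalized result here is a simple division by the image size. A minimal sketch, assuming the image origin is at (0,0) and pixel Y grows downward; normalize is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// normalize (hypothetical) converts a pixel position to 0-1 coordinates
// for an image of the given size. It reports ok=false for an empty image.
func normalize(px, py float64, width, height int) (x, y float64, ok bool) {
	if width <= 0 || height <= 0 {
		return 0, 0, false
	}
	return px / float64(width), py / float64(height), true
}

func main() {
	x, y, ok := normalize(960, 270, 1920, 1080)
	fmt.Println(x, y, ok)
}
```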

func (*Service) FindTextNormalizedWithOptions

func (s *Service) FindTextNormalizedWithOptions(img image.Image, needle string, opts SearchOptions) (x, y float64, found bool)

FindTextNormalizedWithOptions searches for text with options and returns the center in normalized coordinates (0-1, top-left origin).

func (*Service) FindTextWithOptions

func (s *Service) FindTextWithOptions(img image.Image, needle string, opts SearchOptions) (x, y float64, found bool)

FindTextWithOptions searches for text with optional region and ranking preferences.

func (*Service) RecognizeText

func (s *Service) RecognizeText(img image.Image) ([]TextObservation, error)

RecognizeText runs OCR on an image and returns all recognized text observations.

type TextObservation

type TextObservation struct {
	Text        string
	Confidence  float32
	BoundingBox corefoundation.CGRect // normalized coordinates (0-1), origin at bottom-left
	Center      image.Point           // center in screen pixel coordinates
}

TextObservation holds a recognized text region from OCR.
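
Note that BoundingBox uses a bottom-left origin while Region and the normalized search results use a top-left origin, so drawing a box on an image requires flipping the Y axis. The sketch below shows that conversion with a flat stand-in for the CGRect type (normRect, with origin X, Y and size W, H); toPixelRect is a hypothetical helper and the rounding choice is an assumption.

```go
package main

import (
	"fmt"
	"image"
	"math"
)

// normRect is a flat stand-in for a normalized CGRect:
// origin (X, Y) at the bottom-left corner, plus width W and height H.
type normRect struct {
	X, Y, W, H float64
}

// toPixelRect (hypothetical) converts a bottom-left-origin normalized box
// into a top-left-origin pixel rectangle for a width x height image.
func toPixelRect(r normRect, width, height int) image.Rectangle {
	w, h := float64(width), float64(height)
	left := r.X * w
	right := (r.X + r.W) * w
	top := (1 - r.Y - r.H) * h // top edge of the box, flipped to top-left origin
	bottom := (1 - r.Y) * h    // bottom edge of the box, flipped likewise
	return image.Rect(
		int(math.Round(left)), int(math.Round(top)),
		int(math.Round(right)), int(math.Round(bottom)),
	)
}

func main() {
	// A box in the upper half of the screen in bottom-left coordinates
	// lands near the top of the pixel image.
	box := normRect{X: 0.25, Y: 0.75, W: 0.5, H: 0.125}
	fmt.Println(toPixelRect(box, 800, 400))
}
```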

func BestMatch

func BestMatch(observations []TextObservation, needle string, opts SearchOptions, bounds image.Rectangle) (TextObservation, bool)

BestMatch finds the best matching observation for needle among the given observations, applying search options and ranking heuristics.
