Documentation ¶
Overview ¶
Package ocr provides text recognition using Apple's Vision framework.
It wraps VNRecognizeTextRequest to detect text in images and return observations with pixel coordinates, bounding boxes, and confidence scores. Search functions support region filtering and ranking preferences for locating specific text on screen.
Basic usage:
svc := ocr.NewService(false)
obs, err := svc.RecognizeText(img)
if err != nil {
	log.Fatal(err) // recognition failed; obs is not usable
}
for _, o := range obs {
	fmt.Printf("%s at (%d,%d)\n", o.Text, o.Center.X, o.Center.Y)
}
Finding text by content:
x, y, found := svc.FindText(img, "Continue")
Index ¶
- func DrawRect(img *image.RGBA, x1, y1, x2, y2 int, c color.RGBA)
- func SaveDebugScreenshot(img image.Image, observations []TextObservation, dir, prefix string) error
- type DebugEntry
- type Region
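- type SearchOptions
- func MenuSearchOptions() SearchOptions
- func ParseSearchOptions(regionSpec string) (SearchOptions, error)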
- type Service
- func (s *Service) AllText(img image.Image) string
- func (s *Service) FindText(img image.Image, needle string) (x, y float64, found bool)
- func (s *Service) FindTextNormalized(img image.Image, needle string) (x, y float64, found bool)
- func (s *Service) FindTextNormalizedWithOptions(img image.Image, needle string, opts SearchOptions) (x, y float64, found bool)
- func (s *Service) FindTextWithOptions(img image.Image, needle string, opts SearchOptions) (x, y float64, found bool)
- func (s *Service) RecognizeText(img image.Image) ([]TextObservation, error)
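- type TextObservation
- func BestMatch(observations []TextObservation, needle string, opts SearchOptions, bounds image.Rectangle) (TextObservation, bool)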
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func SaveDebugScreenshot ¶
func SaveDebugScreenshot(img image.Image, observations []TextObservation, dir, prefix string) error
SaveDebugScreenshot saves a screenshot with OCR bounding boxes overlaid. It writes both a PNG image and a JSON file with the observations.
Types ¶
type DebugEntry ¶
type DebugEntry struct {
	Timestamp    string            `json:"timestamp"`
	Step         string            `json:"step,omitempty"`
	Observations []TextObservation `json:"observations"`
	ScreenState  string            `json:"screen_state,omitempty"`
}
DebugEntry represents a single OCR debug log entry.
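To illustrate how the struct tags shape the JSON output, here is a small sketch. It uses local copies rather than the real types: debugEntry mirrors the tags of DebugEntry, while observation is a simplified, hypothetical stand-in for TextObservation. The timestamp value is made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// observation is a simplified, hypothetical stand-in for TextObservation.
type observation struct {
	Text       string
	Confidence float32
}

// debugEntry is a local copy that mirrors the JSON tags of DebugEntry.
type debugEntry struct {
	Timestamp    string        `json:"timestamp"`
	Step         string        `json:"step,omitempty"`
	Observations []observation `json:"observations"`
	ScreenState  string        `json:"screen_state,omitempty"`
}

// encodeEntry marshals one entry to a compact JSON string.
func encodeEntry(e debugEntry) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	s, err := encodeEntry(debugEntry{
		Timestamp:    "2024-01-01T00:00:00Z", // example value only
		Observations: []observation{{Text: "Continue", Confidence: 0.5}},
	})
	if err != nil {
		log.Fatal(err)
	}
	// Step and ScreenState are empty, so omitempty drops both keys.
	fmt.Println(s)
}
```

Because of omitempty, entries logged without a step or screen state contain only the timestamp and observations keys.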
type SearchOptions ¶
SearchOptions controls OCR text matching behavior.
func MenuSearchOptions ¶
func MenuSearchOptions() SearchOptions
MenuSearchOptions returns options tuned for menu bar targeting.
func ParseSearchOptions ¶
func ParseSearchOptions(regionSpec string) (SearchOptions, error)
ParseSearchOptions parses a region selector for OCR commands. Supported selectors:
- "" / "screen" / "full": whole screen
- "menu" / "menubar": top menu bar strip
- "x1,y1,x2,y2": normalized rectangle coordinates
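The selector grammar above could be parsed roughly as follows. This is a sketch under stated assumptions, not the package's parser: the region type and parseRegion function are hypothetical, and the validation rules (values within 0-1, x1 < x2, y1 < y2, surrounding whitespace ignored) are guesses, since this page does not show the fields of SearchOptions or the error cases.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// region is a hypothetical result type standing in for SearchOptions.
type region struct {
	Kind string     // "screen", "menu", or "rect"
	Rect [4]float64 // x1, y1, x2, y2 (normalized); set only when Kind is "rect"
}

// parseRegion is a hypothetical parser for the selectors listed above.
func parseRegion(spec string) (region, error) {
	switch strings.TrimSpace(spec) {
	case "", "screen", "full":
		return region{Kind: "screen"}, nil
	case "menu", "menubar":
		return region{Kind: "menu"}, nil
	}
	parts := strings.Split(spec, ",")
	if len(parts) != 4 {
		return region{}, fmt.Errorf("invalid region %q: want x1,y1,x2,y2", spec)
	}
	r := region{Kind: "rect"}
	for i, p := range parts {
		v, err := strconv.ParseFloat(strings.TrimSpace(p), 64)
		if err != nil {
			return region{}, fmt.Errorf("invalid region %q: %w", spec, err)
		}
		// Written as a negated range check so that NaN is rejected too.
		if !(v >= 0 && v <= 1) {
			return region{}, fmt.Errorf("invalid region %q: %v is outside 0-1", spec, v)
		}
		r.Rect[i] = v
	}
	// Assumed rule: the rectangle must have positive width and height.
	if r.Rect[0] >= r.Rect[2] || r.Rect[1] >= r.Rect[3] {
		return region{}, fmt.Errorf("invalid region %q: empty rectangle", spec)
	}
	return r, nil
}

func main() {
	for _, spec := range []string{"", "menubar", "0,0,0.5,0.25", "1,2,3"} {
		r, err := parseRegion(spec)
		fmt.Printf("%q -> %+v, err=%v\n", spec, r, err)
	}
}
```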
type Service ¶
type Service struct {
	// contains filtered or unexported fields
}
Service performs text recognition using Apple's Vision framework.
func (*Service) AllText ¶
func (s *Service) AllText(img image.Image) string
AllText returns all recognized text as a single string with lines separated by newlines.
func (*Service) FindText ¶
func (s *Service) FindText(img image.Image, needle string) (x, y float64, found bool)
FindText searches for text on screen and returns its center pixel coordinates. When multiple observations contain the needle, it prefers:
- Exact full-text matches (observation text equals needle, case-insensitive)
- Shorter observation text (closer to the needle itself)
- Observations closer to the bottom of the screen
Returns found=false if the text is not visible.
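One way to read the preferences above is as successive tie-breakers. The sketch below makes that assumption explicit, and also assumes that substring matching is case-insensitive and that y grows downward (top-left origin). The candidate type and pickBest function are hypothetical and are not the package's ranking code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// candidate is a hypothetical, reduced observation: its text and the y pixel
// coordinate of its center (top-left origin, so larger y is lower on screen).
type candidate struct {
	Text string
	Y    int
}

// pickBest keeps candidates whose text contains needle, then orders them by:
// exact match first, shorter text next, larger y (closer to the bottom) last.
func pickBest(cands []candidate, needle string) (candidate, bool) {
	n := strings.ToLower(needle)
	var matches []candidate
	for _, c := range cands {
		if strings.Contains(strings.ToLower(c.Text), n) {
			matches = append(matches, c)
		}
	}
	if len(matches) == 0 {
		return candidate{}, false
	}
	sort.SliceStable(matches, func(i, j int) bool {
		a, b := matches[i], matches[j]
		ea, eb := strings.EqualFold(a.Text, needle), strings.EqualFold(b.Text, needle)
		if ea != eb {
			return ea // exact match wins
		}
		if len(a.Text) != len(b.Text) {
			return len(a.Text) < len(b.Text) // shorter text wins
		}
		return a.Y > b.Y // lower on screen wins
	})
	return matches[0], true
}

func main() {
	cands := []candidate{
		{"Continue with Apple", 300},
		{"Continue", 200},
		{"Continue", 640},
		{"Cancel", 640},
	}
	best, ok := pickBest(cands, "continue")
	fmt.Println(best.Text, best.Y, ok) // prints "Continue 640 true"
}
```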
func (*Service) FindTextNormalized ¶
func (s *Service) FindTextNormalized(img image.Image, needle string) (x, y float64, found bool)
FindTextNormalized searches for text and returns its center in normalized coordinates (0-1, top-left origin) suitable for mouse input commands.
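The relationship between pixel and normalized coordinates can be sketched as a division by the image dimensions. This assumes the normalized value is simply the pixel center divided by the image width and height; normalize is a hypothetical helper, not a function exported by the package.

```go
package main

import "fmt"

// normalize converts a pixel position into normalized coordinates in the
// 0-1 range. Both spaces use a top-left origin, so no axis flip is needed.
// ok is false when the image dimensions are not positive.
func normalize(px, py, width, height int) (x, y float64, ok bool) {
	if width <= 0 || height <= 0 {
		return 0, 0, false
	}
	return float64(px) / float64(width), float64(py) / float64(height), true
}

func main() {
	// A point at (960, 270) on a 1920x1080 image.
	x, y, ok := normalize(960, 270, 1920, 1080)
	fmt.Println(x, y, ok) // prints "0.5 0.25 true"
}
```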
func (*Service) FindTextNormalizedWithOptions ¶
func (s *Service) FindTextNormalizedWithOptions(img image.Image, needle string, opts SearchOptions) (x, y float64, found bool)
FindTextNormalizedWithOptions searches for text with options and returns the center in normalized coordinates (0-1, top-left origin).
func (*Service) FindTextWithOptions ¶
func (s *Service) FindTextWithOptions(img image.Image, needle string, opts SearchOptions) (x, y float64, found bool)
FindTextWithOptions searches for text with optional region and ranking preferences.
func (*Service) RecognizeText ¶
func (s *Service) RecognizeText(img image.Image) ([]TextObservation, error)
RecognizeText runs OCR on an image and returns all recognized text observations.
type TextObservation ¶
type TextObservation struct {
	Text        string
	Confidence  float32
	BoundingBox corefoundation.CGRect // normalized coordinates (0-1), origin at bottom-left
	Center      image.Point           // center in screen pixel coordinates
}
TextObservation holds a recognized text region from OCR.
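BoundingBox uses a bottom-left origin while Go images use a top-left origin, so drawing a box requires flipping the y axis. The sketch below shows that conversion under stated assumptions: normRect is a hypothetical stand-in for the CGRect value (origin plus size), and toPixelRect is an illustrative helper, not part of the package.

```go
package main

import (
	"fmt"
	"image"
	"math"
)

// normRect is a hypothetical stand-in for a normalized CGRect: (X, Y) is the
// bottom-left corner and W, H are the size, all in the 0-1 range.
type normRect struct {
	X, Y, W, H float64
}

// toPixelRect converts a bottom-left-origin normalized rectangle into a
// top-left-origin pixel rectangle for an image of the given size.
func toPixelRect(r normRect, width, height int) image.Rectangle {
	w, h := float64(width), float64(height)
	left := r.X * w
	right := (r.X + r.W) * w
	top := (1 - r.Y - r.H) * h // the box's upper edge, measured from the top
	bottom := (1 - r.Y) * h    // the box's lower edge, measured from the top
	return image.Rect(
		int(math.Round(left)), int(math.Round(top)),
		int(math.Round(right)), int(math.Round(bottom)),
	)
}

func main() {
	// A box whose bottom edge sits at half the image height.
	box := normRect{X: 0.25, Y: 0.5, W: 0.5, H: 0.25}
	fmt.Println(toPixelRect(box, 800, 400)) // prints "(200,100)-(600,200)"
}
```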
func BestMatch ¶
func BestMatch(observations []TextObservation, needle string, opts SearchOptions, bounds image.Rectangle) (TextObservation, bool)
BestMatch finds the best matching observation for needle among the given observations, applying search options and ranking heuristics.