ocr

package
v0.101.0 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 20, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package ocr provides OCR (Optical Character Recognition) functionality using Tesseract. This is used to extract text from images for full-text search.

Constants

This section is empty.

Variables

var SupportedMimeTypes = []string{
	"image/png",
	"image/jpeg",
	"image/jpg",
	"image/gif",
	"image/bmp",
	"image/webp",
}

SupportedMimeTypes lists the image MIME types accepted for OCR.
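
As an illustration of how a caller might pre-check input against this list, here is a minimal sketch. The helper `isSupported` is hypothetical and is not the package's `IsSupported` method; in particular, the lowercasing and parameter stripping shown here are assumptions, since the documentation does not say whether the package normalizes its input.

```go
package main

import (
	"fmt"
	"strings"
)

// supportedMimeTypes mirrors the SupportedMimeTypes list documented above.
var supportedMimeTypes = []string{
	"image/png",
	"image/jpeg",
	"image/jpg",
	"image/gif",
	"image/bmp",
	"image/webp",
}

// isSupported is a hypothetical helper (an assumption, not the package API).
// It drops any parameters after ";" and lowercases the type before comparing.
func isSupported(mimeType string) bool {
	base, _, _ := strings.Cut(mimeType, ";")
	base = strings.ToLower(strings.TrimSpace(base))
	for _, t := range supportedMimeTypes {
		if t == base {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isSupported("image/png"))         // true
	fmt.Println(isSupported("IMAGE/JPEG; q=0.8")) // true
	fmt.Println(isSupported("application/pdf"))   // false
}
```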

Functions

func FormatOutput

func FormatOutput(result *Result, format string) (string, error)

FormatOutput formats the OCR output.

func GetLanguageName

func GetLanguageName(code string) string

GetLanguageName returns the full language name for a Tesseract language code.

Types

type Box

type Box struct {
	X      int32 `json:"x"`
	Y      int32 `json:"y"`
	Width  int32 `json:"width"`
	Height int32 `json:"height"`
}

Box represents a bounding box.

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client provides OCR functionality.

func NewClient

func NewClient(config *Config) *Client

NewClient creates a new OCR client.

func (*Client) ExtractText

func (c *Client) ExtractText(ctx context.Context, image []byte, mimeType string) (string, error)

ExtractText extracts text from an image using Tesseract OCR.

func (*Client) ExtractTextToHOCR

func (c *Client) ExtractTextToHOCR(ctx context.Context, image []byte, mimeType string) (string, error)

ExtractTextToHOCR extracts text in hOCR format (HTML with position information).

func (*Client) ExtractTextWithLayout

func (c *Client) ExtractTextWithLayout(ctx context.Context, image []byte, mimeType string) (*Result, error)

ExtractTextWithLayout extracts text with layout information (boxes).

func (*Client) GetAvailableLanguages

func (c *Client) GetAvailableLanguages(ctx context.Context) ([]string, error)

GetAvailableLanguages returns the list of languages available to Tesseract.

func (*Client) GetVersion

func (c *Client) GetVersion(ctx context.Context) (string, error)

GetVersion returns the Tesseract version.

func (*Client) IsAvailable

func (c *Client) IsAvailable(ctx context.Context) bool

IsAvailable checks if Tesseract is available.

func (*Client) IsSupported

func (c *Client) IsSupported(mimeType string) bool

IsSupported checks if a MIME type is supported for OCR.

type Config

type Config struct {
	// TesseractPath is the path to the tesseract executable
	TesseractPath string
	// DataPath is the path to the tessdata directory (optional)
	DataPath string
	// Languages are the languages to use for OCR (e.g., "chi_sim+eng")
	Languages string
}

Config holds the OCR configuration.

func ConfigFromEnv

func ConfigFromEnv() *Config

ConfigFromEnv creates OCR config from environment variables.

func DefaultConfig

func DefaultConfig() *Config

DefaultConfig returns the default OCR configuration.

type Line

type Line struct {
	BoundingBox *Box   `json:"bounding_box,omitempty"`
	Text        string `json:"text"`
	Words       []Word `json:"words,omitempty"`
}

Line represents a line of text.

type Result

type Result struct {
	Text       string  `json:"text"`
	Languages  string  `json:"languages,omitempty"`
	Words      []Word  `json:"words,omitempty"`
	Lines      []Line  `json:"lines,omitempty"`
	Confidence float64 `json:"confidence,omitempty"`
}

Result represents the OCR result with metadata.

func Merge

func Merge(results []*Result) *Result

Merge merges multiple OCR results.

func ParseHOCR

func ParseHOCR(hocr string) (*Result, error)

ParseHOCR parses hOCR output and returns structured data.

func (*Result) MarshalJSON

func (r *Result) MarshalJSON() ([]byte, error)

MarshalJSON implements custom JSON marshaling.

func (*Result) Validate

func (r *Result) Validate() error

Validate validates the OCR result.

type Word

type Word struct {
	BoundingBox *Box    `json:"bounding_box,omitempty"`
	Text        string  `json:"text"`
	Confidence  float64 `json:"confidence,omitempty"`
}

Word represents a single word with position.
