extraction

package
v1.2.0
Warning

This package is not in the latest version of its module.

Published: Mar 17, 2026 License: Apache-2.0 Imports: 5 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Extensions

func Extensions() []string

Extensions returns the supported document extensions (e.g. ".pdf", ".docx").

func IsDocument

func IsDocument(ext string) bool

IsDocument reports whether ext is a supported document extension, i.e. one of the values returned by Extensions.
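
The following is a minimal sketch of how an extension lookup of this shape could work. It is not the package's actual implementation. The `supported` set, the lowercase `extensions` and `isDocument` helpers, and the case-insensitive matching are all assumptions made for illustration; the documentation only names ".pdf" and ".docx" as examples.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// supported is a hypothetical stand-in for the package's internal list.
// Only ".pdf" and ".docx" are named in the documentation.
var supported = map[string]struct{}{
	".pdf":  {},
	".docx": {},
}

// extensions mirrors the signature of Extensions(). It builds a fresh,
// sorted slice on every call so callers cannot mutate the underlying set.
func extensions() []string {
	out := make([]string, 0, len(supported))
	for ext := range supported {
		out = append(out, ext)
	}
	sort.Strings(out)
	return out
}

// isDocument mirrors the signature of IsDocument(ext).
// Lowercasing the input is an assumption, not documented behavior.
func isDocument(ext string) bool {
	_, ok := supported[strings.ToLower(ext)]
	return ok
}

func main() {
	fmt.Println(extensions())                           // [.docx .pdf]
	fmt.Println(isDocument(".PDF"), isDocument(".txt")) // true false
}
```

Callers would typically pass the result of `filepath.Ext(path)`, which includes the leading dot, matching the ".pdf" form shown above.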

Types

type DocumentText

type DocumentText struct{}

DocumentText extracts plain text from binary document files using tabula.

func NewDocumentText

func NewDocumentText() *DocumentText

NewDocumentText creates a new DocumentText.

func (*DocumentText) Text

func (d *DocumentText) Text(path string) (string, error)

Text extracts readable text from the file at the given path. It validates that the file exists, has a supported extension, and is within the maximum size limit before passing it to tabula.
