Documentation
Overview
Package pdf provides a Normaliser implementation for PDF files. It uses the pdftotext utility from poppler-utils for text extraction.
Constants
This section is empty.
Variables
var ErrPDFToolNotFound = errors.New("pdftotext not found: install poppler-utils")
ErrPDFToolNotFound is returned when pdftotext is not installed.
Functions
func CheckAvailable
func CheckAvailable() error
CheckAvailable returns nil if pdftotext is installed, and an error otherwise.
func InstallInstructions
func InstallInstructions() string
InstallInstructions returns platform-specific install instructions.
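One way to produce platform-specific text is to switch on runtime.GOOS. In this sketch, installInstructions is hypothetical and takes the OS name as an argument so each branch can be exercised; the exact messages are assumptions, not the package's wording. The package names themselves are real: Homebrew ships pdftotext in the poppler formula, and Debian and Ubuntu ship it in poppler-utils.

```go
package main

import (
	"fmt"
	"runtime"
)

// installInstructions is a hypothetical stand-in for InstallInstructions.
// The OS name is a parameter so that every branch can be tested; the
// message wording is illustrative only.
func installInstructions(goos string) string {
	switch goos {
	case "darwin":
		return "brew install poppler"
	case "linux":
		return "sudo apt-get install poppler-utils (Debian/Ubuntu; use your distribution's package manager otherwise)"
	case "windows":
		return "install a Poppler build for Windows and add its bin directory to PATH"
	default:
		return "install poppler-utils, which provides pdftotext, with your system's package manager"
	}
}

func main() {
	// Print the instructions for the platform this program runs on.
	fmt.Println(installInstructions(runtime.GOOS))
}
```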
Types
type CommandRunner
type CommandRunner interface {
	Run(ctx context.Context, name string, args ...string) ([]byte, error)
}
CommandRunner abstracts command execution for testing.
type Normaliser
type Normaliser struct {
	// contains filtered or unexported fields
}
Normaliser handles PDF documents using pdftotext.
func NewWithRunner
func NewWithRunner(runner CommandRunner) *Normaliser
NewWithRunner creates a PDF normaliser with a custom command runner (for testing).
func (*Normaliser) Normalise
func (n *Normaliser) Normalise(ctx context.Context, raw *domain.RawDocument) (*driven.NormaliseResult, error)
Normalise converts a PDF document to a normalised document.
func (*Normaliser) Priority
func (n *Normaliser) Priority() int
Priority returns the selection priority.
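The documentation does not say how the priority value is interpreted. Assuming a selector prefers the higher value among normalisers that accept a document, the choice reduces to a maximum search, sketched below with a hypothetical normaliser interface and made-up priority numbers.

```go
package main

import "fmt"

// normaliser is a hypothetical minimal view of a normaliser as seen by a
// selector; Name exists only to make the result readable.
type normaliser interface {
	Name() string
	Priority() int
}

type stub struct {
	name     string
	priority int
}

func (s stub) Name() string  { return s.name }
func (s stub) Priority() int { return s.priority }

// pickHighest returns the candidate with the largest Priority, assuming a
// higher value means "preferred". On a tie the earlier candidate is kept.
// The boolean is false when there are no candidates.
func pickHighest(candidates []normaliser) (normaliser, bool) {
	if len(candidates) == 0 {
		return nil, false
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if c.Priority() > best.Priority() {
			best = c
		}
	}
	return best, true
}

func main() {
	best, _ := pickHighest([]normaliser{stub{"plaintext", 10}, stub{"pdf", 50}})
	fmt.Println(best.Name())
}
```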
func (*Normaliser) SupportedConnectorTypes
func (n *Normaliser) SupportedConnectorTypes() []string
SupportedConnectorTypes returns connector types for specialised handling.
func (*Normaliser) SupportedMIMETypes
func (n *Normaliser) SupportedMIMETypes() []string
SupportedMIMETypes returns the MIME types this normaliser handles.
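Matching a document against the declared MIME types would need Content-Type parameters stripped first, because a value such as "application/pdf; charset=binary" does not compare equal to "application/pdf". This sketch uses the standard library's mime.ParseMediaType, which also lower-cases the media type. The supports helper is hypothetical, the list containing "application/pdf" is an assumption about what SupportedMIMETypes returns, and whether the package normalises MIME types this way is not documented.

```go
package main

import (
	"fmt"
	"mime"
)

// supports reports whether contentType matches one of the supported MIME
// types. mime.ParseMediaType drops parameters such as "; charset=binary"
// and lower-cases the media type; unparsable values never match.
func supports(supported []string, contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	for _, s := range supported {
		if s == mediaType {
			return true
		}
	}
	return false
}

func main() {
	supported := []string{"application/pdf"} // assumed contents of SupportedMIMETypes()
	fmt.Println(supports(supported, "application/pdf; charset=binary"))
}
```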