package pdf

Version: v0.2.3
Published: Dec 17, 2025 | License: Apache-2.0 | Imports: 12 | Imported by: 0

Documentation

Overview

Package pdf provides a Normaliser implementation for PDF files. It uses the pdftotext utility from poppler-utils for text extraction.

Variables

var ErrPDFToolNotFound = errors.New("pdftotext not found: install poppler-utils")

ErrPDFToolNotFound is returned when pdftotext is not installed.

Functions

func CheckAvailable

func CheckAvailable() error

CheckAvailable returns nil if pdftotext is installed and an error otherwise.

func InstallInstructions

func InstallInstructions() string

InstallInstructions returns platform-specific install instructions.

Types

type CommandRunner

type CommandRunner interface {
	Run(ctx context.Context, name string, args ...string) ([]byte, error)
}

CommandRunner abstracts command execution for testing.

type DefaultRunner

type DefaultRunner struct{}

DefaultRunner executes commands using os/exec.

func (*DefaultRunner) Run

func (r *DefaultRunner) Run(ctx context.Context, name string, args ...string) ([]byte, error)

Run executes a command and returns its output.

type Normaliser

type Normaliser struct {
	// contains filtered or unexported fields
}

Normaliser handles PDF documents using pdftotext.

func New

func New() *Normaliser

New creates a new PDF normaliser.

func NewWithRunner

func NewWithRunner(runner CommandRunner) *Normaliser

NewWithRunner creates a PDF normaliser with a custom command runner (for testing).

func (*Normaliser) Normalise

Normalise converts a PDF document to a normalised document.

func (*Normaliser) Priority

func (n *Normaliser) Priority() int

Priority returns the selection priority.

func (*Normaliser) SupportedConnectorTypes

func (n *Normaliser) SupportedConnectorTypes() []string

SupportedConnectorTypes returns connector types for specialised handling.

func (*Normaliser) SupportedMIMETypes

func (n *Normaliser) SupportedMIMETypes() []string

SupportedMIMETypes returns the MIME types this normaliser handles.
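
The page does not describe how Priority and SupportedMIMETypes are consumed. The sketch below shows one way a caller might combine them, assuming that a higher priority wins and that the PDF normaliser reports "application/pdf". The candidate interface, the stub type, and pick are all hypothetical.

```go
package main

import "fmt"

// candidate is a hypothetical interface covering two of the documented methods.
type candidate interface {
	Priority() int
	SupportedMIMETypes() []string
}

// stub is a hypothetical normaliser used only for this illustration.
type stub struct {
	name  string
	prio  int
	mimes []string
}

func (s stub) Priority() int                { return s.prio }
func (s stub) SupportedMIMETypes() []string { return s.mimes }

// pick returns the candidate with the highest priority that lists mime,
// or nil when no candidate supports it.
// Assumption: larger Priority values are preferred.
func pick(cands []candidate, mime string) candidate {
	var best candidate
	for _, c := range cands {
		for _, m := range c.SupportedMIMETypes() {
			if m == mime && (best == nil || c.Priority() > best.Priority()) {
				best = c
			}
		}
	}
	return best
}

func main() {
	cands := []candidate{
		stub{name: "generic", prio: 1, mimes: []string{"application/pdf", "text/plain"}},
		stub{name: "pdf", prio: 10, mimes: []string{"application/pdf"}},
	}
	if c := pick(cands, "application/pdf"); c != nil {
		fmt.Println("selected:", c.(stub).name)
	}
}
```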
