mupdf

package
v0.0.0-...-1b76ffa Latest
Warning

This package is not in the latest version of its module.

Published: Mar 4, 2026 License: Apache-2.0 Imports: 16 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var MuPDFLock sync.Mutex

Functions

func WithConfig

func WithConfig(config PDFOptions) func(o *PDFOptions)

WithConfig sets the PDF loader configuration.

func WithDisablePageMerge

func WithDisablePageMerge() func(o *PDFOptions)

WithDisablePageMerge disables page merging (see PDFOptions.EnablePageMerge).
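Both option functions return a `func(o *PDFOptions)`, which is the usual shape of Go's functional options pattern. The sketch below is self-contained and does not import this package: `PDFOptions` is a trimmed local copy, the function bodies are assumptions based only on the documented signatures, and `applyOptions` and the default values are hypothetical.

```go
package main

import "fmt"

// PDFOptions is a local stand-in that mirrors a subset of the documented struct.
type PDFOptions struct {
	Password        string
	StartPage       uint
	EnablePageMerge bool
	ChunkSize       int
}

// WithConfig mirrors the documented signature. The body is an assumption:
// it replaces the whole options value with the supplied config.
func WithConfig(config PDFOptions) func(o *PDFOptions) {
	return func(o *PDFOptions) { *o = config }
}

// WithDisablePageMerge mirrors the documented signature. The body is an
// assumption: it clears the EnablePageMerge flag.
func WithDisablePageMerge() func(o *PDFOptions) {
	return func(o *PDFOptions) { o.EnablePageMerge = false }
}

// applyOptions is a hypothetical helper showing how a constructor such as
// NewPDF might apply option functions, in order, on top of its defaults.
func applyOptions(defaults PDFOptions, optFns ...func(o *PDFOptions)) PDFOptions {
	opts := defaults
	for _, fn := range optFns {
		fn(&opts)
	}
	return opts
}

func main() {
	// Hypothetical defaults; only StartPage = 1 is stated in the documentation.
	defaults := PDFOptions{StartPage: 1, EnablePageMerge: true, ChunkSize: 1024}

	opts := applyOptions(defaults,
		WithConfig(PDFOptions{StartPage: 3, EnablePageMerge: true, ChunkSize: 512}),
		WithDisablePageMerge(),
	)
	fmt.Printf("%+v\n", opts)
}
```

Under these assumptions, option order matters: `WithConfig` overwrites every field, so an option such as `WithDisablePageMerge` only takes effect when it is passed after it.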

Types

type PDF

type PDF struct {
	Opts      PDFOptions
	Document  *fitz.Document
	Converter *mdconv.Converter
	Lock      *sync.Mutex
	Tokenizer *tiktoken.Tiktoken
}

PDF represents a PDF Document loader that implements the DocumentLoader interface.

func NewPDF

func NewPDF(r io.Reader, optFns ...func(o *PDFOptions)) (*PDF, error)

NewPDF creates a new PDF loader with the given options.

func (*PDF) Load

func (l *PDF) Load(ctx context.Context) ([]vs.Document, error)

Load loads the PDF Document and returns a slice of vs.Document containing the page contents and metadata.
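Because `PDF` is described as implementing a `DocumentLoader` interface, calling code can depend on the interface rather than on the concrete type. The following is a minimal sketch under stated assumptions, not this package's code: `Document` is a hypothetical stand-in for `vs.Document` (its real fields are not shown here), `fakeLoader` and `loadAll` are illustrative names, and whether the real `Load` honors context cancellation is not documented.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Document is a hypothetical stand-in for vs.Document.
type Document struct {
	Content  string
	Metadata map[string]any
}

// DocumentLoader mirrors the shape of the documented Load method.
type DocumentLoader interface {
	Load(ctx context.Context) ([]Document, error)
}

// fakeLoader is a hypothetical loader that returns one Document per page.
type fakeLoader struct{ pages []string }

func (f fakeLoader) Load(ctx context.Context) ([]Document, error) {
	docs := make([]Document, 0, len(f.pages))
	for i, p := range f.pages {
		// Stop early if the caller cancelled or the deadline has passed.
		if err := ctx.Err(); err != nil {
			return nil, err
		}
		docs = append(docs, Document{
			Content:  p,
			Metadata: map[string]any{"page": i + 1},
		})
	}
	return docs, nil
}

// loadAll is a hypothetical helper that runs any loader under a timeout.
func loadAll(l DocumentLoader, timeout time.Duration) ([]Document, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return l.Load(ctx)
}

func main() {
	loader := fakeLoader{pages: []string{"first page", "second page"}}
	docs, err := loadAll(loader, time.Second)
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	for _, d := range docs {
		fmt.Println(d.Metadata["page"], d.Content)
	}
}
```

With the real package, the value returned by `NewPDF` would take the place of `fakeLoader`, provided the interface it satisfies has this method set.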

type PDFOptions

type PDFOptions struct {
	// Password for encrypted PDF files.
	Password string

	// Page number to start loading from (default is 1).
	StartPage uint

	// Number of goroutines used to load PDF documents.
	NumThread int

	// EnablePageMerge - whether pages are merged into larger documents; see ChunkSize.
	EnablePageMerge bool

	// ChunkSize - maximum number of tokens allowed in a single document
	ChunkSize int

	ChunkOverlap int

	// TokenEncoding - encoding for Tokenizer to use for page merging
	TokenEncoding string

	// TokenModel - target model for Tokenizer to use for page merging
	TokenModel string
}
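The `ChunkSize` and tokenizer fields suggest that page merging is bounded by a token budget. The sketch below shows one way such merging could work: a greedy pass that joins consecutive pages until the budget would be exceeded. It is an assumption for illustration, not this package's algorithm. `mergePages` is a hypothetical name, `countTokens` counts whitespace-separated words as a stand-in for a real tokenizer, and `ChunkOverlap` is left out because its semantics are not documented here.

```go
package main

import (
	"fmt"
	"strings"
)

// countTokens is a hypothetical stand-in for a real tokenizer:
// it counts whitespace-separated words.
func countTokens(s string) int { return len(strings.Fields(s)) }

// mergePages greedily joins consecutive pages while the running token
// count stays within chunkSize. A page that alone exceeds chunkSize is
// emitted as its own chunk rather than being split.
func mergePages(pages []string, chunkSize int) []string {
	var chunks []string
	var current []string
	used := 0
	for _, p := range pages {
		n := countTokens(p)
		// Flush the current chunk if adding this page would exceed the budget.
		if len(current) > 0 && used+n > chunkSize {
			chunks = append(chunks, strings.Join(current, "\n"))
			current, used = nil, 0
		}
		current = append(current, p)
		used += n
	}
	if len(current) > 0 {
		chunks = append(chunks, strings.Join(current, "\n"))
	}
	return chunks
}

func main() {
	pages := []string{"one two", "three", "four five six", "seven"}
	for i, c := range mergePages(pages, 3) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

With a budget of 3 tokens, the first two pages fit together, the third fills a chunk on its own, and the last page starts a new chunk.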
