pdf

package
v0.9.2
Warning

This package is not in the latest version of its module.

Published: Jan 14, 2026 | License: MIT | Imports: 10 | Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExtractPDFContent

func ExtractPDFContent(filePath string, chunkSize int, skipSmallImages bool, minImageSize int, noTips bool) ([]PDFTextData, []PDFImageData, error)

ExtractPDFContent extracts both text and images from the PDF file at filePath.
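The following is a minimal sketch of calling a function with this signature and handling its three return values. This page does not show the module's import path, so the sketch takes the extractor as a function value and supplies a hypothetical stand-in, fakeExtract; in real code you would call pdf.ExtractPDFContent directly. The local PDFTextData and PDFImageData types are trimmed mirrors of the documented structs, and the argument values (1000, true, 100, true) are illustrative only, because their units and defaults are not documented here.

```go
package main

import (
	"fmt"
	"log"
)

// Trimmed local mirrors of the documented result types (only fields used here).
type PDFTextData struct {
	Content    string
	PageNumber int
}

type PDFImageData struct {
	PageNumber int
	ImageIndex int
}

// extractFunc has the same parameter list as ExtractPDFContent.
type extractFunc func(filePath string, chunkSize int, skipSmallImages bool,
	minImageSize int, noTips bool) ([]PDFTextData, []PDFImageData, error)

// summarize runs an extractor and reports how many text chunks and images it returned.
func summarize(extract extractFunc, path string) (string, error) {
	// Argument values are illustrative; their meaning is not documented on this page.
	texts, images, err := extract(path, 1000, true, 100, true)
	if err != nil {
		return "", fmt.Errorf("extracting %s: %w", path, err)
	}
	return fmt.Sprintf("%s: text chunks=%d, images=%d", path, len(texts), len(images)), nil
}

// fakeExtract is a hypothetical stand-in so the sketch runs without the real package.
func fakeExtract(filePath string, chunkSize int, skipSmallImages bool,
	minImageSize int, noTips bool) ([]PDFTextData, []PDFImageData, error) {
	texts := []PDFTextData{{Content: "Intro", PageNumber: 1}, {Content: "Body", PageNumber: 2}}
	images := []PDFImageData{{PageNumber: 2, ImageIndex: 0}}
	return texts, images, nil
}

func main() {
	s, err := summarize(fakeExtract, "report.pdf")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(s) // report.pdf: text chunks=2, images=1
}
```

Checking err before touching either slice mirrors the usual Go convention; whether the function can return partial results alongside a non-nil error is not stated on this page.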

Types

type PDFImageData

type PDFImageData struct {
	ID              string
	ImageData       string // Base64-encoded image data
	Image           string // Image as a data URL
	URL             string
	Metadata        map[string]interface{}
	OCRText         string
	EXIFData        map[string]interface{}
	Caption         string
	PageNumber      int
	ImageIndex      int
	SourcePDF       string
	SurroundingText string // Text context from the page where the image appears
	SectionHeading  string // Section/chapter heading if available
}

PDFImageData represents an image extracted from a PDF.
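The Image field is documented only as "Data URL format". Assuming it holds a base64 data URL of the form data:<mime>;base64,<payload>, the hypothetical helper decodeDataURL below recovers the MIME type and raw bytes using only the standard library. The ImageData field, documented as base64-encoded, could likewise be passed straight to base64.StdEncoding.DecodeString, assuming the standard padded alphabet.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// decodeDataURL splits a base64 data URL ("data:<mime>;base64,<payload>") into its
// MIME type and decoded bytes. It assumes the Image field uses this form.
func decodeDataURL(s string) (string, []byte, error) {
	const prefix = "data:"
	if !strings.HasPrefix(s, prefix) {
		return "", nil, errors.New("not a data URL")
	}
	meta, payload, ok := strings.Cut(s[len(prefix):], ",")
	if !ok {
		return "", nil, errors.New("data URL has no comma")
	}
	mime, isBase64 := strings.CutSuffix(meta, ";base64")
	if !isBase64 {
		return "", nil, errors.New("data URL is not base64-encoded")
	}
	data, err := base64.StdEncoding.DecodeString(payload)
	if err != nil {
		return "", nil, fmt.Errorf("decoding payload: %w", err)
	}
	return mime, data, nil
}

func main() {
	// "iVBORw0KGgo=" is base64 for the 8-byte PNG file signature.
	mime, data, err := decodeDataURL("data:image/png;base64,iVBORw0KGgo=")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mime, len(data)) // image/png 8
}
```

strings.CutSuffix requires Go 1.20 or later.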

type PDFTextData

type PDFTextData struct {
	ID         string
	Content    string
	URL        string
	Metadata   map[string]interface{}
	PageNumber int
	SourcePDF  string
}

PDFTextData represents text extracted from a PDF.
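Because each PDFTextData carries a PageNumber, chunks can be regrouped per page. The following is a minimal sketch using a hypothetical helper, textByPage, and a trimmed local copy of the struct. It assumes ExtractPDFContent returns chunks in reading order, which this page does not state.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PDFTextData mirrors the documented struct; only the fields used here are kept.
type PDFTextData struct {
	Content    string
	PageNumber int
}

// textByPage joins each page's chunks, newline-separated, in slice order.
func textByPage(chunks []PDFTextData) map[int]string {
	parts := make(map[int][]string)
	for _, c := range chunks {
		parts[c.PageNumber] = append(parts[c.PageNumber], c.Content)
	}
	pages := make(map[int]string, len(parts))
	for page, p := range parts {
		pages[page] = strings.Join(p, "\n")
	}
	return pages
}

func main() {
	chunks := []PDFTextData{
		{Content: "Title", PageNumber: 1},
		{Content: "Body text", PageNumber: 2},
		{Content: "Abstract", PageNumber: 1},
	}
	pages := textByPage(chunks)

	// Map iteration order is unspecified in Go, so sort page numbers before printing.
	nums := make([]int, 0, len(pages))
	for n := range pages {
		nums = append(nums, n)
	}
	sort.Ints(nums)
	for _, n := range nums {
		fmt.Printf("page %d: %q\n", n, pages[n])
	}
}
```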