Documentation
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
func ExtractPDFContent
func ExtractPDFContent(filePath string, chunkSize int, skipSmallImages bool, minImageSize int, noTips bool) ([]PDFTextData, []PDFImageData, error)
ExtractPDFContent extracts both the text and the images from the PDF file at filePath and returns them as PDFTextData and PDFImageData slices.
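The documentation does not say how chunkSize is applied to the extracted text. As a hedged illustration only, this sketch assumes the text is split into consecutive chunks of at most chunkSize characters (runes). chunkText is a hypothetical helper, not part of this package, and the real function may count words or tokens instead.

```go
package main

import "fmt"

// chunkText is a hypothetical helper. It splits s into consecutive pieces of
// at most chunkSize runes, ASSUMING chunkSize counts characters; the
// ExtractPDFContent docs do not specify the unit.
func chunkText(s string, chunkSize int) []string {
	if chunkSize <= 0 {
		// No usable chunk size: return the whole text as one chunk.
		return []string{s}
	}
	runes := []rune(s) // work on runes so multi-byte characters are not split
	var chunks []string
	for start := 0; start < len(runes); start += chunkSize {
		end := start + chunkSize
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
	}
	return chunks
}

func main() {
	for i, c := range chunkText("Hello, PDF world!", 5) {
		fmt.Printf("%d: %q\n", i, c)
	}
}
```

Splitting on runes rather than bytes keeps non-ASCII text intact; a word- or token-based splitter would be an equally plausible reading of chunkSize.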
Types
type PDFImageData
type PDFImageData struct {
	ID              string
	ImageData       string // Base64 encoded image data
	Image           string // Data URL format
	URL             string
	Metadata        map[string]interface{}
	OCRText         string
	EXIFData        map[string]interface{}
	Caption         string
	PageNumber      int
	ImageIndex      int
	SourcePDF       string
	SurroundingText string // Text context from the page where the image appears
	SectionHeading  string // Section/chapter heading if available
}
PDFImageData represents an image extracted from a PDF.