package pdf

v0.11.3 | Published: Mar 9, 2026 | License: MIT | Imports: 12 | Imported by: 0

Repository

github.com/maximilien/weave-cli

Documentation

Index

func ExtractPDFContent(filePath string, chunkSize int, skipSmallImages bool, minImageSize int, ...) ([]PDFTextData, []PDFImageData, error)
type PDFImageData
type PDFTextData

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExtractPDFContent

func ExtractPDFContent(filePath string, chunkSize int, skipSmallImages bool, minImageSize int, maxMetadataLength int, noTips bool) ([]PDFTextData, []PDFImageData, error)

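ExtractPDFContent extracts both text and images from the PDF file at filePath. It returns the extracted text as a slice of PDFTextData, the extracted images as a slice of PDFImageData, and a non-nil error if extraction fails.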
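The six positional arguments are easy to mix up at a call site, so a caller might wrap them in a named options struct. The following is a minimal sketch of that idea, not part of the package: ExtractOptions, summarize, and fakeExtract are hypothetical names, the types are trimmed local mirrors of the ones documented here, and the option values (and their units) are illustrative guesses based only on the parameter names. With the real package you would use its own types and pass ExtractPDFContent itself instead of the stand-in.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed local mirrors of the documented types, so the sketch compiles alone.
type PDFTextData struct {
	ID         string
	Content    string
	PageNumber int
	SourcePDF  string
}

type PDFImageData struct {
	ID         string
	PageNumber int
	SourcePDF  string
}

// extractFunc has the same parameter list as ExtractPDFContent.
type extractFunc func(filePath string, chunkSize int, skipSmallImages bool,
	minImageSize int, maxMetadataLength int, noTips bool) ([]PDFTextData, []PDFImageData, error)

// ExtractOptions is a hypothetical struct that names the positional arguments.
type ExtractOptions struct {
	ChunkSize         int
	SkipSmallImages   bool
	MinImageSize      int
	MaxMetadataLength int
	NoTips            bool
}

// summarize runs the extractor and reports how many items came back.
func summarize(extract extractFunc, path string, o ExtractOptions) (string, error) {
	texts, images, err := extract(path, o.ChunkSize, o.SkipSmallImages,
		o.MinImageSize, o.MaxMetadataLength, o.NoTips)
	if err != nil {
		return "", fmt.Errorf("extract %s: %w", path, err)
	}
	return fmt.Sprintf("%s: %d text items, %d images", path, len(texts), len(images)), nil
}

// fakeExtract is a stand-in extractor so the sketch runs without a real PDF.
func fakeExtract(filePath string, chunkSize int, skipSmallImages bool,
	minImageSize int, maxMetadataLength int, noTips bool) ([]PDFTextData, []PDFImageData, error) {
	if filePath == "" {
		return nil, nil, errors.New("empty file path")
	}
	texts := []PDFTextData{{ID: "t1", Content: "hello", PageNumber: 1, SourcePDF: filePath}}
	images := []PDFImageData{{ID: "i1", PageNumber: 1, SourcePDF: filePath}}
	return texts, images, nil
}

func main() {
	// Illustrative values only; the package does not document units or defaults.
	opts := ExtractOptions{ChunkSize: 1000, SkipSmallImages: true, MinImageSize: 100, MaxMetadataLength: 500, NoTips: true}
	summary, err := summarize(fakeExtract, "report.pdf", opts)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(summary)
}
```

Checking the error before touching either slice follows the usual Go convention; the documentation does not say whether partial results are returned alongside an error.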

Types

type PDFImageData

type PDFImageData struct {
	ID              string
	ImageData       string // Base64 encoded image data
	Image           string // Data URL format
	URL             string
	Metadata        map[string]interface{}
	OCRText         string
	EXIFData        map[string]interface{}
	Caption         string
	PageNumber      int
	ImageIndex      int
	SourcePDF       string
	SurroundingText string // Text context from the page where the image appears
	SectionHeading  string // Section/chapter heading if available
}

PDFImageData represents an extracted image from a PDF.
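Because ImageData is documented as Base64 and Image as a data URL, a consumer will usually need to turn them back into raw bytes and a media type. The sketch below shows one way to do that under two assumptions the documentation does not confirm: that ImageData uses the standard padded Base64 alphabet, and that Image has the usual "data:<media type>;base64,<payload>" shape. The struct is a trimmed local mirror, and decodeImage and mediaType are hypothetical helpers.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// Trimmed local mirror of the documented struct.
type PDFImageData struct {
	ID         string
	ImageData  string // Base64 encoded image data
	Image      string // Data URL format
	PageNumber int
	ImageIndex int
}

// decodeImage returns the raw image bytes.
// Assumption: ImageData uses standard, padded Base64.
func decodeImage(img PDFImageData) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(img.ImageData)
	if err != nil {
		return nil, fmt.Errorf("image %s: %w", img.ID, err)
	}
	return raw, nil
}

// mediaType extracts the media type from a data URL such as
// "data:image/png;base64,....". It reports false if the string
// does not look like a data URL.
func mediaType(dataURL string) (string, bool) {
	rest, ok := strings.CutPrefix(dataURL, "data:")
	if !ok {
		return "", false
	}
	header, _, ok := strings.Cut(rest, ",")
	if !ok {
		return "", false
	}
	mt, _, _ := strings.Cut(header, ";")
	return mt, true
}

func main() {
	// "aGk=" is the Base64 encoding of the two bytes "hi", used as a tiny placeholder payload.
	img := PDFImageData{ID: "img-1", ImageData: "aGk=", Image: "data:image/png;base64,aGk=", PageNumber: 1}

	raw, err := decodeImage(img)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	mt, _ := mediaType(img.Image)
	fmt.Printf("%s: %d bytes, media type %s\n", img.ID, len(raw), mt)
}
```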

type PDFTextData

type PDFTextData struct {
	ID         string
	Content    string
	URL        string
	Metadata   map[string]interface{}
	PageNumber int
	SourcePDF  string
}

PDFTextData represents extracted text from a PDF.
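Since each PDFTextData carries a PageNumber, a common follow-up step is to regroup the extracted text by page. The sketch below assumes that several items may share a page (which the chunkSize parameter suggests but the documentation does not state); the struct is a local mirror of the one above, and textByPage and sortedPages are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Local mirror of the documented struct.
type PDFTextData struct {
	ID         string
	Content    string
	URL        string
	Metadata   map[string]interface{}
	PageNumber int
	SourcePDF  string
}

// textByPage joins the Content of all items that share a PageNumber,
// keeping the input order within each page.
func textByPage(items []PDFTextData) map[int]string {
	parts := make(map[int][]string)
	for _, it := range items {
		parts[it.PageNumber] = append(parts[it.PageNumber], it.Content)
	}
	joined := make(map[int]string, len(parts))
	for page, p := range parts {
		joined[page] = strings.Join(p, "\n")
	}
	return joined
}

// sortedPages returns the page numbers in ascending order.
func sortedPages(byPage map[int]string) []int {
	pages := make([]int, 0, len(byPage))
	for p := range byPage {
		pages = append(pages, p)
	}
	sort.Ints(pages)
	return pages
}

func main() {
	items := []PDFTextData{
		{ID: "a", Content: "Intro", PageNumber: 1},
		{ID: "b", Content: "Details", PageNumber: 2},
		{ID: "c", Content: "More intro", PageNumber: 1},
	}
	byPage := textByPage(items)
	for _, p := range sortedPages(byPage) {
		fmt.Printf("page %d: %q\n", p, byPage[p])
	}
}
```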
