extract

package

Version: v0.8.0 (latest) | Published: Jan 11, 2026 | License: MIT | Imports: 10 | Imported by: 0

Repository

github.com/benedoc-inc/pdfer

Documentation

Overview

Package extraction provides comprehensive PDF content extraction. It extracts all content types from PDFs into structured data models that can be serialized to JSON.

Index

func CompareTextElements(extracted, expected []types.TextElement) bool
func CreateTestPDFWithComplexText() ([]byte, []types.TextElement, error)
func CreateTestPDFWithGraphics() ([]byte, []types.Graphic, error)
func CreateTestPDFWithText(texts []TestText) ([]byte, []types.TextElement, error)
func ExtractAllImages(pdfBytes []byte, password []byte, verbose bool) ([]types.Image, error)
func ExtractBookmarks(pdfBytes []byte, pdf *parse.PDF, verbose bool) ([]types.Bookmark, error)
func ExtractContent(pdfBytes []byte, password []byte, verbose bool) (*types.ContentDocument, error)
func ExtractContentToJSON(pdfBytes []byte, password []byte, verbose bool) (string, error)
func ExtractMetadata(pdfBytes []byte, pdf *parse.PDF, verbose bool) (*types.DocumentMetadata, error)
func ExtractPages(pdfBytes []byte, pdf *parse.PDF, verbose bool) ([]types.Page, error)
func ParseTestPDF(pdfBytes []byte) (*parse.PDF, error)
type TestText

Functions

func CompareTextElements (added in v0.8.0)

func CompareTextElements(extracted, expected []types.TextElement) bool

CompareTextElements compares extracted text elements with the expected ones. It returns true if they match, allowing for small differences in width calculations.
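The tolerance-based comparison described above could look roughly like the following. This is a minimal sketch, not the package's implementation: the `textElement` struct is a hypothetical stand-in for `types.TextElement` (whose fields are not shown here), and the `widthTolerance` value is an assumption chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// textElement is a hypothetical stand-in for types.TextElement.
// The real struct's fields are not listed in this documentation.
type textElement struct {
	Text  string
	X, Y  float64
	Width float64
}

// widthTolerance is an assumed value, not taken from the package.
const widthTolerance = 0.5

// compareTextElements reports whether both slices have the same length and
// each pair agrees exactly on text and position, and on width within
// widthTolerance.
func compareTextElements(extracted, expected []textElement) bool {
	if len(extracted) != len(expected) {
		return false
	}
	for i := range expected {
		got, want := extracted[i], expected[i]
		if got.Text != want.Text || got.X != want.X || got.Y != want.Y {
			return false
		}
		if math.Abs(got.Width-want.Width) > widthTolerance {
			return false
		}
	}
	return true
}

func main() {
	expected := []textElement{{Text: "Hello", X: 72, Y: 720, Width: 27.5}}
	extracted := []textElement{{Text: "Hello", X: 72, Y: 720, Width: 27.8}}
	fmt.Println(compareTextElements(extracted, expected)) // prints "true"
}
```

Only the width is compared approximately in this sketch, since the description singles out width calculations as the source of small differences.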

func CreateTestPDFWithComplexText (added in v0.8.0)

func CreateTestPDFWithComplexText() ([]byte, []types.TextElement, error)

CreateTestPDFWithComplexText creates a PDF with complex text operations for testing.

func CreateTestPDFWithGraphics (added in v0.8.0)

func CreateTestPDFWithGraphics() ([]byte, []types.Graphic, error)

CreateTestPDFWithGraphics creates a PDF with graphics for testing extraction.

func CreateTestPDFWithText (added in v0.8.0)

func CreateTestPDFWithText(texts []TestText) ([]byte, []types.TextElement, error)

CreateTestPDFWithText creates a simple PDF with known text content for testing extraction. It returns the PDF bytes and the expected text elements.

func ExtractAllImages (added in v0.8.0)

func ExtractAllImages(pdfBytes []byte, password []byte, verbose bool) ([]types.Image, error)

ExtractAllImages extracts all images from a PDF document.

func ExtractBookmarks

func ExtractBookmarks(pdfBytes []byte, pdf *parse.PDF, verbose bool) ([]types.Bookmark, error)

ExtractBookmarks extracts bookmarks (outlines) from a PDF.

func ExtractContent

func ExtractContent(pdfBytes []byte, password []byte, verbose bool) (*types.ContentDocument, error)

ExtractContent extracts all content from a PDF into a ContentDocument. This is the main entry point for content extraction.

func ExtractContentToJSON

func ExtractContentToJSON(pdfBytes []byte, password []byte, verbose bool) (string, error)

ExtractContentToJSON extracts content and returns it as a JSON string.
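To illustrate the "structured data model serialized to JSON" idea, here is a minimal sketch using only the standard library. The `contentDocument` and `page` types and their JSON field names are hypothetical stand-ins; the real `types.ContentDocument` layout is not documented on this page, and the package may marshal differently (for example, with indentation).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// page and contentDocument are hypothetical stand-ins for the package's
// structured data models; the field names and tags are assumptions.
type page struct {
	Number int      `json:"number"`
	Text   []string `json:"text"`
}

type contentDocument struct {
	Title string `json:"title"`
	Pages []page `json:"pages"`
}

// toJSON mirrors the (string, error) return shape of ExtractContentToJSON:
// it marshals the document and returns the result as a string.
func toJSON(doc *contentDocument) (string, error) {
	b, err := json.Marshal(doc)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	doc := &contentDocument{
		Title: "Sample",
		Pages: []page{{Number: 1, Text: []string{"Hello"}}},
	}
	out, err := toJSON(doc)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
	// prints: {"title":"Sample","pages":[{"number":1,"text":["Hello"]}]}
}
```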

func ExtractMetadata

func ExtractMetadata(pdfBytes []byte, pdf *parse.PDF, verbose bool) (*types.DocumentMetadata, error)

ExtractMetadata extracts document metadata.

func ExtractPages

func ExtractPages(pdfBytes []byte, pdf *parse.PDF, verbose bool) ([]types.Page, error)

ExtractPages extracts all pages from a PDF.

func ParseTestPDF (added in v0.8.0)

func ParseTestPDF(pdfBytes []byte) (*parse.PDF, error)

ParseTestPDF parses a PDF created by the test helpers.

Types

type TestText (added in v0.8.0)

type TestText struct {
	Text     string
	X        float64
	Y        float64
	FontSize float64
	Width    float64 // Expected width (approximate)
}

TestText represents text to be added to a test PDF.
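As an illustration of how a TestText value maps onto PDF text placement, the sketch below turns a slice of TestText into standard PDF content-stream text operators (`BT`, `Tf`, `Td`, `Tj`, `ET`). This is an assumption about what a test helper might emit, not the package's actual code: the `contentStream` and `escapePDFString` functions are hypothetical, as is the `/F1` font resource name, and the local TestText copy only mirrors the struct shown above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// TestText mirrors the struct documented above.
type TestText struct {
	Text     string
	X        float64
	Y        float64
	FontSize float64
	Width    float64 // Expected width (approximate)
}

// escapePDFString escapes the characters that are special inside a PDF
// literal string: backslash and both parentheses.
func escapePDFString(s string) string {
	return strings.NewReplacer(`\`, `\\`, `(`, `\(`, `)`, `\)`).Replace(s)
}

// num formats a float without trailing zeros, e.g. 72 rather than 72.000000.
func num(v float64) string {
	return strconv.FormatFloat(v, 'f', -1, 64)
}

// contentStream (hypothetical) emits one text object per TestText, selecting
// the assumed font resource /F1 at the given size and moving to (X, Y).
func contentStream(texts []TestText) string {
	var b strings.Builder
	for _, t := range texts {
		fmt.Fprintf(&b, "BT\n/F1 %s Tf\n%s %s Td\n(%s) Tj\nET\n",
			num(t.FontSize), num(t.X), num(t.Y), escapePDFString(t.Text))
	}
	return b.String()
}

func main() {
	fmt.Print(contentStream([]TestText{
		{Text: "Hello (PDF)", X: 72, Y: 720, FontSize: 12},
	}))
}
```

The Width field is not written to the stream in this sketch; per the struct comment it records the approximate width a test expects the extractor to report.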
