pdf

package
v0.24.0
Warning: This package is not in the latest version of its module.
Published: Aug 7, 2025 | License: Apache-2.0 | Imports: 15 | Imported by: 0

Documentation

Constants

const (
	// PDF security limits
	DefaultMaxFileSize      = int64(200 * 1024 * 1024)      // 200MB default file size limit
	DefaultMaxMemoryLimit   = int64(5 * 1024 * 1024 * 1024) // 5GB default memory limit
	PDFMaxFileSizeEnvVar    = "PDF_MAX_FILE_SIZE"
	PDFMaxMemoryLimitEnvVar = "PDF_MAX_MEMORY_LIMIT"
)
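
The defaults above work out to 209,715,200 bytes (200 × 1024 × 1024) and 5,368,709,120 bytes (5 × 1024³). The page does not show how the two environment variables are read, so the following is only a minimal sketch of one plausible approach: treat the variable as a byte count and fall back to the default when it is unset or invalid. The helpers `parseLimit` and `limitFromEnv` are hypothetical and are not part of this package.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Values mirrored from the package constants.
const (
	DefaultMaxFileSize   = int64(200 * 1024 * 1024)
	PDFMaxFileSizeEnvVar = "PDF_MAX_FILE_SIZE"
)

// parseLimit is a hypothetical helper. It interprets raw as a positive
// byte count and returns def when raw is empty, unparsable, or not positive.
func parseLimit(raw string, def int64) int64 {
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil || n <= 0 {
		return def
	}
	return n
}

// limitFromEnv is a hypothetical helper that applies parseLimit to the
// named environment variable. Assumption: the variable holds plain bytes.
func limitFromEnv(name string, def int64) int64 {
	return parseLimit(os.Getenv(name), def)
}

func main() {
	fmt.Println(limitFromEnv(PDFMaxFileSizeEnvVar, DefaultMaxFileSize))
}
```

Falling back to the default on bad input keeps a typo in the environment from disabling the limit entirely. The real package may accept other formats or fail differently.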

Variables

This section is empty.

Functions

This section is empty.

Types

type PDFRequest

type PDFRequest struct {
	// FilePath is the absolute path to the PDF file to process
	FilePath string `json:"file_path"`

	// OutputDir is the directory where markdown and images will be saved
	OutputDir string `json:"output_dir"`

	// ExtractImages indicates whether to extract images from the PDF
	ExtractImages bool `json:"extract_images"`

	// Pages specifies which pages to process (e.g., "1-5", "1,3,5", "all")
	Pages string `json:"pages"`
}

PDFRequest represents a request to process a PDF file
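
To illustrate the wire format implied by the struct tags, here is a small sketch that mirrors the struct locally and encodes it with `encoding/json`. The helper `encodeRequest` and the file paths are hypothetical. Only the field names and tags come from the declaration above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PDFRequest mirrors the documented struct so the example is self-contained.
type PDFRequest struct {
	FilePath      string `json:"file_path"`
	OutputDir     string `json:"output_dir"`
	ExtractImages bool   `json:"extract_images"`
	Pages         string `json:"pages"`
}

// encodeRequest is a hypothetical helper that returns the JSON form of req.
func encodeRequest(req PDFRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Hypothetical paths, used only for illustration.
	s, err := encodeRequest(PDFRequest{
		FilePath:      "/tmp/report.pdf",
		OutputDir:     "/tmp/out",
		ExtractImages: true,
		Pages:         "1-5",
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
}
```

The keys appear in struct field order: `file_path`, `output_dir`, `extract_images`, `pages`.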

type PDFResponse

type PDFResponse struct {
	// FilePath is the original PDF file that was processed
	FilePath string `json:"file_path"`

	// MarkdownFile is the path to the generated markdown file
	MarkdownFile string `json:"markdown_file"`

	// ExtractedImages is a list of extracted image file paths
	ExtractedImages []string `json:"extracted_images"`

	// PagesProcessed is the number of pages that were processed
	PagesProcessed int `json:"pages_processed"`

	// TotalPages is the total number of pages in the PDF
	TotalPages int `json:"total_pages"`

	// OutputDir is the directory where files were saved
	OutputDir string `json:"output_dir"`
}

PDFResponse represents the result of PDF processing

type PDFTool

type PDFTool struct{}

PDFTool implements PDF processing with pdfcpu

func (*PDFTool) Definition

func (t *PDFTool) Definition() mcp.Tool

Definition returns the tool's definition for MCP registration

func (*PDFTool) Execute

func (t *PDFTool) Execute(ctx context.Context, logger *logrus.Logger, cache *sync.Map, args map[string]interface{}) (*mcp.CallToolResult, error)

Execute processes the PDF file

func (*PDFTool) ExtractTextFromPDFOperation added in v0.18.0

func (t *PDFTool) ExtractTextFromPDFOperation(operation string) []string

ExtractTextFromPDFOperation extracts all text strings from a PDF operation line
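
The page does not show how the extraction is done. As a rough sketch of the general idea, the hypothetical function `extractStrings` below collects the parenthesised literal strings from a content-stream line such as `(Hello) Tj` or `[(Hel) -20 (lo)] TJ`. It assumes literal strings only: hex strings (`<...>`) are ignored, and escapes such as `\n` or octal codes are not decoded, only unwrapped.

```go
package main

import "fmt"

// extractStrings is a hypothetical, simplified stand-in for the documented
// method. It returns the contents of every top-level (...) literal in op.
func extractStrings(op string) []string {
	var out []string
	var cur []byte
	depth := 0 // nesting depth of unescaped parentheses
	for i := 0; i < len(op); i++ {
		c := op[i]
		switch {
		case depth > 0 && c == '\\' && i+1 < len(op):
			// Keep the byte after a backslash literally, e.g. \( \) \\.
			i++
			cur = append(cur, op[i])
		case c == '(':
			if depth > 0 {
				cur = append(cur, c) // nested parenthesis is part of the string
			}
			depth++
		case c == ')' && depth > 0:
			depth--
			if depth == 0 {
				out = append(out, string(cur))
				cur = cur[:0]
			} else {
				cur = append(cur, c)
			}
		case depth > 0:
			cur = append(cur, c)
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", extractStrings("[(Hel) -20 (lo)] TJ"))
}
```

Tracking nesting depth matters because PDF literal strings may contain balanced, unescaped parentheses.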

func (*PDFTool) GetMaxFileSize added in v0.21.2

func (t *PDFTool) GetMaxFileSize() int64

GetMaxFileSize returns the configured maximum file size in bytes

func (*PDFTool) GetMaxMemoryLimit added in v0.21.2

func (t *PDFTool) GetMaxMemoryLimit() int64

GetMaxMemoryLimit returns the configured maximum memory limit in bytes

func (*PDFTool) ParsePageSelection added in v0.18.0

func (t *PDFTool) ParsePageSelection(pages string, maxPage int) ([]int, error)

ParsePageSelection parses page selection string into a slice of page numbers
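
The accepted syntax is only hinted at by the `Pages` field comment (`"1-5"`, `"1,3,5"`, `"all"`). The sketch below shows one way such a selection could be parsed. The function `parsePages` is hypothetical, and its choices (1-based pages, duplicates dropped, out-of-range pages rejected rather than clamped) are assumptions, not documented behavior.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePages is a hypothetical sketch of a page-selection parser.
// It accepts "all", single pages, and inclusive ranges, separated by commas.
func parsePages(pages string, maxPage int) ([]int, error) {
	pages = strings.TrimSpace(pages)
	var out []int
	if strings.EqualFold(pages, "all") {
		for p := 1; p <= maxPage; p++ {
			out = append(out, p)
		}
		return out, nil
	}
	seen := make(map[int]bool)
	for _, part := range strings.Split(pages, ",") {
		part = strings.TrimSpace(part)
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return nil, fmt.Errorf("invalid page selection %q", part)
		}
		end := start
		if isRange {
			end, err = strconv.Atoi(strings.TrimSpace(hi))
			if err != nil {
				return nil, fmt.Errorf("invalid page range %q", part)
			}
		}
		if start < 1 || end < start || end > maxPage {
			return nil, fmt.Errorf("selection %q is outside 1-%d", part, maxPage)
		}
		for p := start; p <= end; p++ {
			if !seen[p] { // assumption: drop duplicates, keep first-seen order
				seen[p] = true
				out = append(out, p)
			}
		}
	}
	return out, nil
}

func main() {
	got, err := parsePages("1-3,5", 10)
	fmt.Println(got, err)
}
```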

func (*PDFTool) ParseRequest added in v0.18.0

func (t *PDFTool) ParseRequest(args map[string]interface{}) (*PDFRequest, error)

ParseRequest parses and validates the tool arguments
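
As an illustration of what parsing and validating the argument map might involve, the sketch below reads the four keys named by the `PDFRequest` JSON tags. The function `parseArgs` is hypothetical. The defaults it applies (output next to the source file, `"all"` pages) and the absolute-path check are assumptions drawn from the field comments, not confirmed behavior.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// PDFRequest mirrors the documented struct so the example is self-contained.
type PDFRequest struct {
	FilePath      string `json:"file_path"`
	OutputDir     string `json:"output_dir"`
	ExtractImages bool   `json:"extract_images"`
	Pages         string `json:"pages"`
}

// parseArgs is a hypothetical sketch of argument parsing and validation.
func parseArgs(args map[string]interface{}) (*PDFRequest, error) {
	path, ok := args["file_path"].(string)
	if !ok || path == "" {
		return nil, errors.New("file_path is required and must be a string")
	}
	if !filepath.IsAbs(path) {
		return nil, fmt.Errorf("file_path must be absolute, got %q", path)
	}
	// Assumed defaults: write next to the source file, process every page.
	req := &PDFRequest{FilePath: path, OutputDir: filepath.Dir(path), Pages: "all"}
	if v, ok := args["output_dir"].(string); ok && v != "" {
		req.OutputDir = v
	}
	if v, ok := args["extract_images"].(bool); ok {
		req.ExtractImages = v
	}
	if v, ok := args["pages"].(string); ok && v != "" {
		req.Pages = v
	}
	return req, nil
}

func main() {
	// Hypothetical arguments, as an MCP client might send them.
	req, err := parseArgs(map[string]interface{}{
		"file_path":      "/tmp/report.pdf",
		"extract_images": true,
	})
	fmt.Printf("%+v %v\n", req, err)
}
```

The comma-ok type assertions keep a missing or wrongly typed argument from panicking. Optional values simply keep their defaults.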

func (*PDFTool) ProvideExtendedInfo added in v0.22.0

func (t *PDFTool) ProvideExtendedInfo() *tools.ExtendedHelp

ProvideExtendedInfo provides detailed usage information for the PDF tool

func (*PDFTool) ValidateFileSize added in v0.21.2

func (t *PDFTool) ValidateFileSize(fileSize int64) error

ValidateFileSize validates that the file size is within limits
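
A size check of this kind usually reduces to a comparison against the configured limit. The sketch below is a hypothetical stand-alone version that takes the limit as a parameter. The documented method has no such parameter and presumably reads the limit from `GetMaxFileSize`. The error wording is invented.

```go
package main

import "fmt"

// Default mirrored from the package constants.
const DefaultMaxFileSize = int64(200 * 1024 * 1024)

// validateFileSize is a hypothetical sketch. It rejects negative sizes
// and sizes strictly greater than maxSize.
func validateFileSize(fileSize, maxSize int64) error {
	if fileSize < 0 {
		return fmt.Errorf("invalid file size: %d bytes", fileSize)
	}
	if fileSize > maxSize {
		return fmt.Errorf("file size %d bytes exceeds limit of %d bytes", fileSize, maxSize)
	}
	return nil
}

func main() {
	fmt.Println(validateFileSize(10*1024*1024, DefaultMaxFileSize))
	fmt.Println(validateFileSize(DefaultMaxFileSize+1, DefaultMaxFileSize))
}
```

In this sketch a file exactly at the limit is accepted. Whether the real method treats the boundary the same way is not stated on this page.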
