docprocessing

package

v0.16.11 (latest) Published: Jul 15, 2025 License: Apache-2.0 Imports: 21 Imported by: 0

Repository

github.com/sammcj/mcp-devtools

README ¶

Document Processing Tool

The Document Processing tool converts PDF, DOCX, XLSX, PPTX, HTML, CSV, PNG, and JPG files to structured Markdown using the Docling library. It preserves formatting and extracts tables, images, and metadata.

Experimental! This tool is in active development and has more than a few rough edges.

Features

Multi-format Support: PDF, DOCX, XLSX, PPTX, HTML, CSV, PNG, JPG document processing
Processing Profiles: Simplified interface with preset configurations for common use cases
Intelligent Conversion: Preserves document structure and formatting
OCR Support: Extract text from scanned documents
Hardware Acceleration: Supports MPS (macOS), CUDA, and CPU processing
Caching System: Intelligent caching to avoid reprocessing identical documents
Metadata Extraction: Extracts document metadata (title, author, page count, etc.)
Table & Image Extraction: Preserves tables and images in markdown format
Diagram Analysis: Advanced diagram detection and description using vision models
Mermaid Generation: Convert diagrams to editable Mermaid syntax by using an external LLM provider
Auto-Save: Automatically saves processed content to files by default

Installation

Prerequisites

Note: mcp-devtools will attempt to install the docling package if it's unavailable.

Python 3.10+ (ideally 3.13+) with Docling installed:
```
pip install docling
```
Optional: Hardware Acceleration
- macOS: Install PyTorch with MPS support
- NVIDIA GPUs: Install PyTorch with CUDA support
- CPU: Works out of the box

Configuration

The tool can be configured via environment variables:

# Python Configuration
DOCLING_PYTHON_PATH="/path/to/python"  # Auto-detected if not set

# Cache Configuration
DOCLING_CACHE_DIR="~/.mcp-devtools/docling-cache"
DOCLING_CACHE_ENABLED="true"

# Hardware Acceleration
DOCLING_HARDWARE_ACCELERATION="auto"  # auto, mps, cuda, cpu

# Processing Configuration
DOCLING_TIMEOUT="300"        # 5 minutes
DOCLING_MAX_FILE_SIZE="100"  # 100 MB

# OCR Configuration
DOCLING_OCR_LANGUAGES="en,fr,de"

# Vision Model Configuration
DOCLING_VISION_MODEL="SmolDocling"

# Certificate Configuration (for MITM proxies)
DOCLING_EXTRA_CA_CERTS="/path/to/mitm-ca-bundle.pem"

# LLM Configuration (for advanced diagram processing)
DOCLING_VLM_API_URL="http://localhost:11434/v1"
DOCLING_VLM_MODEL="qwen2.5vl:7b-q8_0"
DOCLING_VLM_API_KEY="your-api-key-here"
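
As a rough illustration of how these variables could be read with fallbacks, here is a minimal Go sketch. The `settings` struct and the `envOr`, `intOr`, and `loadSettings` helpers are hypothetical names for this example, and treating the values listed above as defaults is an assumption; the package's own `LoadConfig` may behave differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// settings is a hypothetical subset of the documented options.
type settings struct {
	CacheEnabled bool
	Acceleration string
	TimeoutSecs  int
	MaxFileMB    int
	OCRLanguages []string
}

// envOr returns the value of key, or fallback when key is unset or empty.
func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// intOr parses s as an integer and returns fallback if parsing fails.
func intOr(s string, fallback int) int {
	n, err := strconv.Atoi(s)
	if err != nil {
		return fallback
	}
	return n
}

// loadSettings reads the environment, assuming the listed values are defaults.
func loadSettings() settings {
	return settings{
		CacheEnabled: envOr("DOCLING_CACHE_ENABLED", "true") == "true",
		Acceleration: envOr("DOCLING_HARDWARE_ACCELERATION", "auto"),
		TimeoutSecs:  intOr(envOr("DOCLING_TIMEOUT", "300"), 300),
		MaxFileMB:    intOr(envOr("DOCLING_MAX_FILE_SIZE", "100"), 100),
		OCRLanguages: strings.Split(envOr("DOCLING_OCR_LANGUAGES", "en,fr,de"), ","),
	}
}

func main() {
	fmt.Printf("%+v\n", loadSettings())
}
```

Falling back on a parse failure keeps a malformed value such as `DOCLING_TIMEOUT="five"` from stopping the tool; a stricter implementation might report it instead.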

Usage

Simple Usage (Recommended)

The tool now features a simplified interface using processing profiles that automatically configure all necessary parameters:

{
  "source": "/path/to/document.pdf"
}

This uses the default text-and-image profile and automatically saves the processed content to /path/to/document.md.
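
A client could build these arguments programmatically. The sketch below marshals a request with Go's `encoding/json`; the `request` struct and `encode` helper are hypothetical names for this example and cover only the `source`, `profile`, and `save_to` fields that this README shows.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request mirrors the JSON arguments shown in this README.
// Empty optional fields are omitted so the tool's defaults apply.
type request struct {
	Source  string `json:"source"`
	Profile string `json:"profile,omitempty"`
	SaveTo  string `json:"save_to,omitempty"`
}

// encode returns the compact JSON form of r.
func encode(r request) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encode(request{Source: "/path/to/document.pdf"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"source":"/path/to/document.pdf"}
}
```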

Processing Profiles

Choose from preset profiles that configure multiple parameters automatically:

`basic` - Fast Text Extraction

{
  "source": "/path/to/document.pdf",
  "profile": "basic"
}

Text extraction only
Fastest processing
No image or diagram analysis

`text-and-image` - Balanced Processing (Default)

{
  "source": "/path/to/document.pdf",
  "profile": "text-and-image"
}

Text and image extraction
Table processing
Good balance of speed and features

`scanned` - OCR Processing

{
  "source": "/path/to/scanned-document.pdf",
  "profile": "scanned"
}

Optimised for scanned documents
OCR enabled by default
Best for image-based PDFs

`llm-smoldocling` - Vision Enhancement

{
  "source": "/path/to/document.pdf",
  "profile": "llm-smoldocling"
}

Enhanced with SmolDocling vision model
Diagram detection and description
Chart data extraction
No external LLM required
Slower than text-and-image

`llm-external` - Advanced Diagram Processing

{
  "source": "/path/to/document.pdf",
  "profile": "llm-external"
}

Full diagram-to-Mermaid conversion
Requires LLM environment variables
Most advanced processing capabilities
Slower processing time
Best for documents with diagrams and charts
Only available when the DOCLING_VLM_API_URL, DOCLING_VLM_MODEL, and DOCLING_VLM_API_KEY environment variables are configured

Output Control

Save to File (Default)

{
  "source": "/path/to/document.pdf"
}

Automatically saves to /path/to/document.md; any extracted images are saved in the same directory
Returns success message with file path

Custom Save Location

{
  "source": "/path/to/document.pdf",
  "save_to": "/custom/path/output.md"
}

Saves to specified location
Must be an absolute path
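
The two save rules (default `.md` path next to the source, or an absolute custom path) can be sketched as below. `resolveSavePath` is a hypothetical helper written for this example, not a function of the package, and the tool's actual validation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// resolveSavePath returns saveTo when it is given (rejecting relative
// paths); otherwise it returns source with its extension replaced by ".md".
func resolveSavePath(source, saveTo string) (string, error) {
	if saveTo != "" {
		if !filepath.IsAbs(saveTo) {
			return "", errors.New("save_to must be an absolute path")
		}
		return saveTo, nil
	}
	ext := filepath.Ext(source)
	return strings.TrimSuffix(source, ext) + ".md", nil
}

func main() {
	p, err := resolveSavePath("/path/to/document.pdf", "")
	fmt.Println(p, err)

	_, err = resolveSavePath("/path/to/document.pdf", "relative/output.md")
	fmt.Println(err)
}
```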

Return Content Inline

{
  "source": "/path/to/document.pdf",
  "inline": true
}

Returns content in the response
No file is saved

OCR (Optical Character Recognition)

The tool supports OCR processing for extracting text from scanned documents and images. Understanding when to use OCR is important for optimal results:

OCR Disabled (Default for most profiles)

Best for: Digital documents (native PDFs, Word documents, Excel files)
How it works: Extracts text directly from the document's digital structure
Advantages:
- Faster processing
- Perfect text accuracy (no recognition errors)
- Preserves original formatting and fonts
- Lower resource usage
Limitations: Cannot process scanned documents or image-based PDFs

OCR Enabled (Default for `scanned` profile)

Best for: Scanned documents, image-based PDFs, photos of documents
How it works: Uses computer vision to recognise text from images
Advantages:
- Can process any document type, including scanned pages
- Handles handwritten text (with varying accuracy)
- Works with photos and screenshots of documents
Limitations:
- Slower processing
- May introduce text recognition errors
- Formatting may not be perfectly preserved
- Higher resource usage

OCR Language Support

When using the scanned profile or enabling OCR manually, you can specify languages:

{
  "profile": "scanned",
  "ocr_languages": ["en", "fr", "de", "es"]
}

Supported languages include: English (en), French (fr), German (de), Spanish (es), Italian (it), Portuguese (pt), Dutch (nl), Russian (ru), Chinese (zh), Japanese (ja), Korean (ko), and many others.

Diagram Analysis and Mermaid Generation

Basic Diagram Analysis (`llm-smoldocling` profile)

Uses the built-in SmolDocling vision model for diagram detection and description:

{
  "source": "/path/to/document.pdf",
  "profile": "llm-smoldocling"
}

Advanced Mermaid Generation (`llm-external` profile)

For diagram-to-Mermaid conversion, first configure external LLM integration:

# Required environment variables
export DOCLING_VLM_API_URL="http://localhost:11434/v1"   # Any OpenAI-compatible endpoint
export DOCLING_VLM_MODEL="qwen2.5vl:7b-q8_0"                     # Vision-capable model
export DOCLING_VLM_API_KEY="your-api-key-here"            # API key

# Optional configuration
export DOCLING_LLM_MAX_TOKENS="16384"        # Maximum tokens for LLM response
export DOCLING_LLM_TEMPERATURE="0.1"         # Temperature for LLM inference
export DOCLING_LLM_TIMEOUT="240"             # Timeout for LLM requests in seconds

Then use the llm-external profile:

{
  "source": "/path/to/document.pdf",
  "profile": "llm-external"
}

Supported LLM Providers

The tool supports any OpenAI-compatible API endpoint, for example:

Ollama (local): http://localhost:11434/v1
LM Studio (local): http://localhost:1234/v1
OpenAI: https://api.openai.com/v1
OpenRouter: https://openrouter.ai/api/v1

Ensure you select a model that supports vision input (e.g., qwen2.5vl:7b-q8_0, gpt-4-vision-preview, claude-3-sonnet).
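
For orientation, an OpenAI-compatible vision request carries the image as a data URL inside a chat message. The sketch below only builds the request body, which would be sent as a POST to `<DOCLING_VLM_API_URL>/chat/completions`. The type names and `buildDiagramRequest` are hypothetical, and whether the tool sends exactly these fields is an assumption; the token and temperature values are the documented defaults.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

type imageURL struct {
	URL string `json:"url"`
}

// contentPart is either a text part or an image part.
type contentPart struct {
	Type     string    `json:"type"`
	Text     string    `json:"text,omitempty"`
	ImageURL *imageURL `json:"image_url,omitempty"`
}

type message struct {
	Role    string        `json:"role"`
	Content []contentPart `json:"content"`
}

type chatRequest struct {
	Model       string    `json:"model"`
	Messages    []message `json:"messages"`
	MaxTokens   int       `json:"max_tokens"`
	Temperature float64   `json:"temperature"`
}

// buildDiagramRequest returns a chat-completions body containing the
// prompt and the PNG image encoded as a base64 data URL.
func buildDiagramRequest(model, prompt string, png []byte) ([]byte, error) {
	dataURL := "data:image/png;base64," + base64.StdEncoding.EncodeToString(png)
	req := chatRequest{
		Model: model,
		Messages: []message{{
			Role: "user",
			Content: []contentPart{
				{Type: "text", Text: prompt},
				{Type: "image_url", ImageURL: &imageURL{URL: dataURL}},
			},
		}},
		MaxTokens:   16384, // documented default for DOCLING_LLM_MAX_TOKENS
		Temperature: 0.1,   // documented default for DOCLING_LLM_TEMPERATURE
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildDiagramRequest("qwen2.5vl:7b-q8_0", "Describe this diagram.", []byte{1, 2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```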

Diagram Analysis Features

Automatic Detection: Identifies diagrams, flowcharts, architecture diagrams, and charts
Type Classification: Classifies diagram types with confidence scoring
Mermaid Conversion: Generates valid Mermaid syntax for diagrams
Element Extraction: Extracts text elements and structural components
AWS Colour Coding: Applies consistent colour schemes for architecture diagrams
Validation: Validates generated Mermaid syntax for correctness
Fallback Handling: Gracefully falls back to basic analysis if LLM is unavailable

Response Format

File Save Response (Default)

{
  "success": true,
  "message": "Content successfully exported to file",
  "save_path": "/path/to/document.md",
  "source": "/path/to/document.pdf",
  "cache_hit": false,
  "metadata": {
    "file_size": 15420,
    "document_title": "Document Title",
    "document_author": "Author Name",
    "page_count": 10,
    "word_count": 1500
  },
  "processing_info": {
    "processing_mode": "advanced",
    "processing_method": "advanced+vision:standard",
    "hardware_acceleration": "mps",
    "ocr_enabled": false,
    "processing_time": 2.5,
    "timestamp": "2025-07-09T22:12:15+10:00"
  }
}
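
A caller might decode this response as follows. The `saveResponse` struct is a hypothetical type for this example that covers only some of the fields shown above; the package's own `DocumentProcessingResponse` is the authoritative definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// saveResponse picks out a few fields of the file save response.
type saveResponse struct {
	Success  bool   `json:"success"`
	Message  string `json:"message"`
	SavePath string `json:"save_path"`
	CacheHit bool   `json:"cache_hit"`
	Metadata struct {
		PageCount int `json:"page_count"`
		WordCount int `json:"word_count"`
	} `json:"metadata"`
	ProcessingInfo struct {
		HardwareAcceleration string  `json:"hardware_acceleration"`
		ProcessingTime       float64 `json:"processing_time"`
	} `json:"processing_info"`
}

// parseSaveResponse decodes data; unknown fields are ignored.
func parseSaveResponse(data []byte) (saveResponse, error) {
	var r saveResponse
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	raw := []byte(`{"success": true, "save_path": "/path/to/document.md",
		"metadata": {"page_count": 10, "word_count": 1500},
		"processing_info": {"hardware_acceleration": "mps", "processing_time": 2.5}}`)
	r, err := parseSaveResponse(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.SavePath, r.Metadata.PageCount, r.ProcessingInfo.ProcessingTime)
}
```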

Inline Content Response

{
  "source": "/path/to/document.pdf",
  "content": "# Document Title\n\nDocument content in markdown...",
  "cache_hit": false,
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "subject": "Document Subject",
    "page_count": 10,
    "word_count": 1500
  },
  "images": [
    {
      "id": "image_1",
      "type": "picture",
      "caption": "Figure 1",
      "file_path": "/path/to/extracted/image_1.png",
      "width": 800,
      "height": 600
    }
  ],
  "diagrams": [
    {
      "id": "diagram_1",
      "type": "flowchart",
      "description": "Process flow diagram showing...",
      "mermaid_code": "flowchart TD\n    A[Start] --> B[Process]\n    B --> C[End]",
      "confidence": 0.95
    }
  ],
  "processing_info": {
    "processing_mode": "advanced",
    "processing_method": "advanced+vision:smoldocling+llm:enhanced",
    "hardware_acceleration": "mps",
    "ocr_enabled": false,
    "processing_time": 8.2,
    "timestamp": "2025-07-09T22:12:15+10:00"
  }
}

Error Handling

The tool provides detailed error information:

{
  "source": "/path/to/document.pdf",
  "error": "Processing failed: File not found",
  "system_info": {
    "platform": "darwin",
    "python_available": true,
    "docling_available": false,
    "hardware_acceleration": ["cpu", "mps"]
  }
}
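
The `system_info` block can be used to suggest a remedy. The sketch below maps its flags to the fixes listed under Troubleshooting; `errorResponse` and `hint` are hypothetical names, and the mapping is an illustration rather than behaviour of the tool itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorResponse covers the error fields shown above.
type errorResponse struct {
	Source     string `json:"source"`
	Error      string `json:"error"`
	SystemInfo struct {
		Platform         string `json:"platform"`
		PythonAvailable  bool   `json:"python_available"`
		DoclingAvailable bool   `json:"docling_available"`
	} `json:"system_info"`
}

// hint suggests a fix based on system_info. A missing system_info block
// decodes as all-false and is therefore treated as "Python unavailable".
func hint(data []byte) (string, error) {
	var r errorResponse
	if err := json.Unmarshal(data, &r); err != nil {
		return "", err
	}
	switch {
	case r.Error == "":
		return "no error reported", nil
	case !r.SystemInfo.PythonAvailable:
		return "install Python or set DOCLING_PYTHON_PATH", nil
	case !r.SystemInfo.DoclingAvailable:
		return "install Docling: pip install docling", nil
	default:
		return r.Error, nil
	}
}

func main() {
	raw := []byte(`{"source": "/path/to/document.pdf",
		"error": "Processing failed: File not found",
		"system_info": {"platform": "darwin", "python_available": true, "docling_available": false}}`)
	h, err := hint(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(h) // prints: install Docling: pip install docling
}
```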

Architecture

Components

DocumentProcessorTool: Main MCP tool interface with simplified profile system
Config: Configuration management with environment variable support
CacheManager: Intelligent caching system with TTL support
LLMClient: External LLM integration for advanced diagram processing
Python Wrapper: Subprocess interface to Docling Python library

File Structure

internal/tools/docprocessing/
├── README.md                    # This file
├── document_processor.go        # Main tool implementation
├── types.go                     # Type definitions including profiles
├── config.go                    # Configuration management
├── cache.go                     # Caching system
├── llm_client.go                # LLM integration for diagram processing
└── python/
    ├── docling_processor.py     # Main Python wrapper script
    ├── image_processing.py      # Image extraction and processing
    └── table_processing.py      # Table extraction and formatting

Processing Profiles Implementation

The profile system automatically configures multiple parameters:

Profile Selection: Chooses appropriate processing mode, vision settings, and features
Dependency Resolution: Automatically enables required services and parameters
Environment Awareness: llm-external profile only available when LLM is configured
Backward Compatibility: Individual parameters still work for advanced users

Performance

Caching

The tool implements intelligent caching based on:

Document source (file path/URL)
Processing parameters (profile, mode, OCR settings, etc.)
File modification time (for local files)

Cache entries have a 24-hour TTL by default and are stored as JSON files.
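
One way to combine those three inputs into a key, and to apply the TTL, is sketched below. The SHA-256 scheme and the `cacheKey` and `isFresh` helpers are assumptions made for this example; the README does not describe how `GenerateCacheKey` actually derives its keys.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// cacheKey hashes the source, a string form of the processing parameters,
// and the file modification time (Unix nanoseconds, for example from
// os.FileInfo.ModTime().UnixNano()). NUL separators keep fields distinct.
func cacheKey(source, params string, modUnixNano int64) string {
	h := sha256.New()
	fmt.Fprintf(h, "%s\x00%s\x00%d", source, params, modUnixNano)
	return hex.EncodeToString(h.Sum(nil))
}

// isFresh reports whether an entry stored at storedAt is still within ttl.
func isFresh(storedAt time.Time, ttl time.Duration, now time.Time) bool {
	return now.Sub(storedAt) < ttl
}

func main() {
	now := time.Now()
	fmt.Println(cacheKey("/path/to/document.pdf", "profile=basic", now.UnixNano()))

	// An entry written 25 hours ago is past the 24-hour default TTL.
	fmt.Println(isFresh(now.Add(-25*time.Hour), 24*time.Hour, now)) // prints false
}
```

Including the modification time in the key means an edited file simply misses the cache, without any explicit invalidation step.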

Hardware Acceleration

Processing performance varies by hardware:

CPU: Baseline performance, works everywhere
MPS (macOS): 2-5x faster on Apple Silicon
CUDA: 3-10x faster on NVIDIA GPUs

Profile Performance Comparison

basic: Fastest
text-and-image: Moderate
scanned: Slower
llm-smoldocling: Moderate (slower than text-and-image)
llm-external: Slowest

Use With Custom MITM Certs

The document processing tool performs a pip install docling (if it's not found) and downloads models, so corporate environments with MITM proxies may need additional certs. Set the DOCLING_EXTRA_CA_CERTS environment variable:

export DOCLING_EXTRA_CA_CERTS="/path/to/mitm-ca-bundle.pem"
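
A common way to make a Python subprocess trust an extra CA bundle is to pass it through the standard `SSL_CERT_FILE`, `REQUESTS_CA_BUNDLE`, and `PIP_CERT` variables. The sketch below shows that pattern; `certEnv` is a hypothetical helper, and whether `GetCertificateEnvironment` sets these particular variables is an assumption.

```go
package main

import "fmt"

// certEnv returns KEY=value entries that could be appended to a
// subprocess environment (for example exec.Cmd.Env) so that OpenSSL,
// the requests library, and pip all use the given CA bundle.
// It returns nil when no bundle is configured.
func certEnv(bundle string) []string {
	if bundle == "" {
		return nil
	}
	return []string{
		"SSL_CERT_FILE=" + bundle,
		"REQUESTS_CA_BUNDLE=" + bundle,
		"PIP_CERT=" + bundle,
	}
}

func main() {
	for _, kv := range certEnv("/path/to/mitm-ca-bundle.pem") {
		fmt.Println(kv)
	}
}
```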

Supported Certificate Formats

.pem - PEM encoded certificates
.crt - Certificate files
.cer - Certificate files
.ca-bundle - Certificate bundles

Troubleshooting

Common Issues

"Python path is required but not found"
- Install Python 3.10+ (ideally 3.13+) and ensure it's in PATH
- Or set DOCLING_PYTHON_PATH environment variable
"Docling not available"
- Install Docling: pip install docling
- Verify installation: python -c "import docling; print('OK')"
"Processing timeout"
- Increase timeout with DOCLING_TIMEOUT environment variable
- Or pass timeout parameter in request
"Hardware acceleration not working"
- Install appropriate PyTorch version for your hardware
- Check device availability with python -c "import torch; print(torch.backends.mps.is_available())" (MPS) or python -c "import torch; print(torch.cuda.is_available())" (CUDA)
"LLM external profile not available"
- Ensure the required DOCLING_VLM_API_URL, DOCLING_VLM_MODEL, and DOCLING_VLM_API_KEY environment variables are set
- Verify LLM endpoint is accessible
- Check model supports vision input
"Certificate path does not exist"
- Verify the path specified in DOCLING_EXTRA_CA_CERTS exists
- Ensure the certificate file or directory is readable

Debug Mode

Enable debug mode to see detailed processing information:

{
  "source": "/path/to/document.pdf",
  "debug": true
}

Potential Future Enhancements

Document Structure Enhancement

Reading Order Detection: Improve paragraph and section ordering algorithms
Metadata Extraction: Enhanced title, author, reference detection using NLP
Language Detection: Automatic document language identification with confidence scores
Figure-Caption Matching: Automatic association of figures with their captions using proximity and semantic analysis

Processing Pipeline Options

Batch Processing: Support for processing multiple documents efficiently with shared model loading
Resource Limits: Configurable page limits, file size limits, CPU thread limits for enterprise deployment
Remote Services: Optional integration with cloud-based OCR or vision services (Azure, AWS, GCP)
Custom Model Pipelines: Extensible architecture for adding new models via plugin system

Advanced Output Formats

Custom Chunking: Integration with HybridChunker for RAG applications
Semantic Markup: Add semantic tags for better downstream processing

Diagram/Chart Processing (External Integration)

External Service Integration: Use services like "Diagram to Mermaid Converter" APIs
Vision Model Integration: Potentially add support for using an external LLM API for diagram processing
OCR + Pattern Recognition: Extract text from diagrams and attempt to reconstruct logical structure
Flowchart Recognition: Specific support for flowchart-to-Mermaid conversion

Performance and Scalability

Streaming Processing: Support for processing large documents in chunks
Distributed Processing: Support for processing across multiple nodes

Quality and Accuracy Improvements

Confidence Scoring: Add confidence scores for all extracted elements
Quality Metrics: Implement quality assessment for extracted content
Error Recovery: Better handling of corrupted or unusual document formats

Smart Defaults and Auto-Detection

Language Detection: Auto-detect instead of requiring ocr_languages
Processing Mode: Auto-select based on document analysis
Table Processing: Always use optimal settings

License

This tool is part of the mcp-devtools project and follows the same license terms.

Documentation ¶

Index ¶

Constants
func CleanupEmbeddedScripts() error
func GetEmbeddedScriptPath() (string, error)
func IsEmbeddedScriptsAvailable() bool
func IsLLMConfigured() bool
func ReadEmbeddedFile(path string) ([]byte, error)
type BatchProcessingRequest
type BatchProcessingResponse
type BatchSummary
type BoundingBox
type CacheManager
- func NewCacheManager(config *Config) *CacheManager
- func (cm *CacheManager) CleanExpired() error
- func (cm *CacheManager) CleanOldFiles(maxAge time.Duration) error
- func (cm *CacheManager) Clear() error
- func (cm *CacheManager) ClearFileCache(source string) error
- func (cm *CacheManager) Delete(cacheKey string) error
- func (cm *CacheManager) GenerateCacheKey(req *DocumentProcessingRequest) string
- func (cm *CacheManager) Get(cacheKey string) (*DocumentProcessingResponse, bool)
- func (cm *CacheManager) GetCacheFilePath(cacheKey string) string
- func (cm *CacheManager) GetStats() (*CacheStats, error)
- func (cm *CacheManager) PerformMaintenance(maxAge time.Duration) error
- func (cm *CacheManager) Set(cacheKey string, response *DocumentProcessingResponse) error
type CacheStats
type CachedResponse
type Config
- func DefaultConfig() *Config
- func LoadConfig() *Config
- func (c *Config) EnsureCacheDir() error
- func (c *Config) GetCertificateEnvironment() []string
- func (c *Config) GetScriptPath() string
- func (c *Config) GetSystemInfo() *SystemInfo
- func (c *Config) ResolveHardwareAcceleration() HardwareAcceleration
- func (c *Config) Validate() error
- func (c *Config) ValidateCertificates() error
type DiagramAnalysis
type DiagramElement
type DiagramLLMClient
- func NewDiagramLLMClient() (*DiagramLLMClient, error)
- func (c *DiagramLLMClient) AnalyseDiagram(diagram *ExtractedDiagram) (*DiagramAnalysis, error)
type DocumentMetadata
type DocumentProcessingRequest
type DocumentProcessingResponse
type DocumentProcessorTool
- func (t *DocumentProcessorTool) Definition() mcp.Tool
- func (t *DocumentProcessorTool) Execute(ctx context.Context, logger *logrus.Logger, cache *sync.Map, ...) (*mcp.CallToolResult, error)
type ErrorInfo
type ExtractedDiagram
type ExtractedImage
type ExtractedTable
type HardwareAcceleration
type LLMConfig
type OutputFormat
type ProcessingInfo
type ProcessingMode
type ProcessingProfile
type SystemInfo
type TableFormerMode
type TokenUsage
type VisionProcessingMode

Constants ¶

const (
	EnvOpenAIAPIBase  = "DOCLING_VLM_API_URL"     // e.g., "https://api.openai.com/v1"
	EnvOpenAIModel    = "DOCLING_VLM_MODEL"       // e.g., "gpt-4-vision-preview"
	EnvOpenAIAPIKey   = "DOCLING_VLM_API_KEY"     // API key for the provider (consistent with VLM naming)
	EnvLLMMaxTokens   = "DOCLING_LLM_MAX_TOKENS"  // Maximum tokens for LLM response (default: 16384)
	EnvLLMTemperature = "DOCLING_LLM_TEMPERATURE" // Temperature for LLM inference (default: 0.1)
	EnvLLMTimeout     = "DOCLING_LLM_TIMEOUT"     // Timeout for LLM requests in seconds (default: 240)

	// Prompt configuration environment variables
	EnvPromptBase         = "DOCLING_LLM_PROMPT_BASE"         // Base prompt for diagram analysis
	EnvPromptFlowchart    = "DOCLING_LLM_PROMPT_FLOWCHART"    // Flowchart-specific prompt
	EnvPromptArchitecture = "DOCLING_LLM_PROMPT_ARCHITECTURE" // Architecture diagram prompt
	EnvPromptChart        = "DOCLING_LLM_PROMPT_CHART"        // Chart analysis prompt
	EnvPromptGeneric      = "DOCLING_LLM_PROMPT_GENERIC"      // Generic diagram prompt
)

Environment variable constants for LLM integration

const (
	DefaultMaxTokens   = 16384
	DefaultTemperature = 0.1
	DefaultTimeout     = 240
)

Default LLM configuration values

const (
	// VLM Pipeline Configuration
	EnvVLMAPIURL        = "DOCLING_VLM_API_URL"        // User-provided API endpoint URL (e.g., "http://localhost:1234/v1")
	EnvVLMModel         = "DOCLING_VLM_MODEL"          // Model name/ID (e.g., "gpt-4-vision-preview", "SmolVLM-Instruct")
	EnvVLMAPIKey        = "DOCLING_VLM_API_KEY"        // Authentication key for external APIs
	EnvVLMTimeout       = "DOCLING_VLM_TIMEOUT"        // Request timeout in seconds (default: 240)
	EnvVLMFallbackLocal = "DOCLING_VLM_FALLBACK_LOCAL" // Enable local model fallback (default: true)

	// Image Processing Configuration
	EnvImageScale = "DOCLING_IMAGE_SCALE" // Image resolution scale factor (default: 3.0, range: 1.0-4.0)

	// Performance Optimisation Configuration
	EnvDisablePictureClassification = "DOCLING_DISABLE_PICTURE_CLASSIFICATION" // Disable picture classification to speed up processing (default: false)
	EnvDisablePictureDescription    = "DOCLING_DISABLE_PICTURE_DESCRIPTION"    // Disable picture description to speed up processing (default: false)
	EnvAcceleratorProcesses         = "DOCLING_ACCELERATOR_PROCESSES"          // Number of accelerator processes (default: CPU cores - 1)
)

Environment variable constants for VLM Pipeline integration and image processing

const (
	DefaultDiagramPrompt = `` /* 514-byte string literal not displayed */

)

Default prompts

Functions ¶

func CleanupEmbeddedScripts ¶

func CleanupEmbeddedScripts() error

CleanupEmbeddedScripts removes the temporary directory containing extracted scripts. This should be called during graceful shutdown, but the OS will clean up temp files anyway.

func GetEmbeddedScriptPath ¶

func GetEmbeddedScriptPath() (string, error)

GetEmbeddedScriptPath extracts the embedded Python scripts to a temporary directory and returns the path to the main docling_processor.py script. This is thread-safe and only extracts once per process.

func IsEmbeddedScriptsAvailable ¶

func IsEmbeddedScriptsAvailable() bool

IsEmbeddedScriptsAvailable checks if the embedded Python scripts are available

func IsLLMConfigured ¶

func IsLLMConfigured() bool

IsLLMConfigured checks if the required environment variables are set
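
A check of this kind could look like the sketch below. It assumes the required variables are the three marked as required in the README (`DOCLING_VLM_API_URL`, `DOCLING_VLM_MODEL`, `DOCLING_VLM_API_KEY`); `llmConfigured` and `requiredVars` are hypothetical names, and the real function may test a different set.

```go
package main

import (
	"fmt"
	"os"
)

// requiredVars lists the variables the README marks as required.
var requiredVars = []string{
	"DOCLING_VLM_API_URL",
	"DOCLING_VLM_MODEL",
	"DOCLING_VLM_API_KEY",
}

// llmConfigured reports whether every required variable is non-empty.
// The lookup function is injected so the check can be exercised
// without touching the process environment.
func llmConfigured(getenv func(string) string) bool {
	for _, key := range requiredVars {
		if getenv(key) == "" {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(llmConfigured(os.Getenv))
}
```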

func ReadEmbeddedFile ¶

func ReadEmbeddedFile(path string) ([]byte, error)

ReadEmbeddedFile reads an embedded file and returns its content

Types ¶

type BatchProcessingRequest ¶

type BatchProcessingRequest struct {
	Sources        []string       `json:"sources"`                   // Multiple document sources
	ProcessingMode ProcessingMode `json:"processing_mode,omitempty"` // Processing mode for all documents
	OutputFormat   OutputFormat   `json:"output_format,omitempty"`   // Output format for all documents
	EnableOCR      bool           `json:"enable_ocr,omitempty"`      // Enable OCR for all documents
	OCRLanguages   []string       `json:"ocr_languages,omitempty"`   // OCR languages for all documents
	PreserveImages bool           `json:"preserve_images,omitempty"` // Extract images from all documents
	CacheEnabled   *bool          `json:"cache_enabled,omitempty"`   // Cache setting for all documents
	Timeout        *int           `json:"timeout,omitempty"`         // Timeout for each document
	MaxConcurrency int            `json:"max_concurrency,omitempty"` // Maximum concurrent processing
}

BatchProcessingRequest represents a request to process multiple documents
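
The `MaxConcurrency` field suggests bounded parallelism. A minimal sketch of that pattern, using a buffered channel as a semaphore, follows; `processAll` and its `process` callback are hypothetical, and the page does not show how (or whether) the package implements batch processing.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll runs process on every source with at most maxConcurrency
// calls in flight and returns the results in input order.
func processAll(sources []string, maxConcurrency int, process func(string) string) []string {
	if maxConcurrency < 1 {
		maxConcurrency = 1
	}
	results := make([]string, len(sources))
	sem := make(chan struct{}, maxConcurrency)
	var wg sync.WaitGroup
	for i, src := range sources {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrency workers are busy
		go func(i int, src string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = process(src) // each goroutine writes its own index
		}(i, src)
	}
	wg.Wait()
	return results
}

func main() {
	out := processAll([]string{"a.pdf", "b.docx", "c.pptx"}, 2, func(s string) string {
		return "processed " + s
	})
	fmt.Println(out)
}
```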

type BatchProcessingResponse ¶

type BatchProcessingResponse struct {
	Results   []DocumentProcessingResponse `json:"results"`    // Individual processing results
	Summary   BatchSummary                 `json:"summary"`    // Batch processing summary
	TotalTime time.Duration                `json:"total_time"` // Total processing time
	Timestamp time.Time                    `json:"timestamp"`  // Batch processing timestamp
}

BatchProcessingResponse represents the response from batch processing

type BatchSummary ¶

type BatchSummary struct {
	TotalDocuments  int `json:"total_documents"`  // Total number of documents
	SuccessfulCount int `json:"successful_count"` // Number of successfully processed documents
	FailedCount     int `json:"failed_count"`     // Number of failed documents
	CacheHitCount   int `json:"cache_hit_count"`  // Number of cache hits
	TotalPages      int `json:"total_pages"`      // Total pages processed
	TotalWords      int `json:"total_words"`      // Total words processed
	TotalImages     int `json:"total_images"`     // Total images extracted
	TotalTables     int `json:"total_tables"`     // Total tables extracted
}

BatchSummary provides summary statistics for batch processing

type BoundingBox ¶

type BoundingBox struct {
	X      float64 `json:"x"`      // X coordinate (left)
	Y      float64 `json:"y"`      // Y coordinate (top)
	Width  float64 `json:"width"`  // Width
	Height float64 `json:"height"` // Height
}

BoundingBox represents the position and size of an element on a page

type CacheManager ¶

type CacheManager struct {
	// contains filtered or unexported fields
}

CacheManager handles caching of document processing results

func NewCacheManager ¶

func NewCacheManager(config *Config) *CacheManager

NewCacheManager creates a new cache manager

func (*CacheManager) CleanExpired ¶

func (cm *CacheManager) CleanExpired() error

CleanExpired removes expired cache entries

func (*CacheManager) CleanOldFiles ¶

func (cm *CacheManager) CleanOldFiles(maxAge time.Duration) error

CleanOldFiles removes cache files older than the specified duration. This is useful for cleaning up files that may not have proper TTL metadata.

func (*CacheManager) Clear ¶

func (cm *CacheManager) Clear() error

Clear removes all cached results

func (*CacheManager) ClearFileCache ¶

func (cm *CacheManager) ClearFileCache(source string) error

ClearFileCache removes all cache entries for a specific source file

func (*CacheManager) Delete ¶

func (cm *CacheManager) Delete(cacheKey string) error

Delete removes a cached result

func (*CacheManager) GenerateCacheKey ¶

func (cm *CacheManager) GenerateCacheKey(req *DocumentProcessingRequest) string

GenerateCacheKey generates a cache key for the given request

func (*CacheManager) Get ¶

func (cm *CacheManager) Get(cacheKey string) (*DocumentProcessingResponse, bool)

Get retrieves a cached result if it exists and is valid

func (*CacheManager) GetCacheFilePath ¶

func (cm *CacheManager) GetCacheFilePath(cacheKey string) string

GetCacheFilePath returns the file path for a cache key

func (*CacheManager) GetStats ¶

func (cm *CacheManager) GetStats() (*CacheStats, error)

GetStats returns cache statistics

func (*CacheManager) PerformMaintenance ¶

func (cm *CacheManager) PerformMaintenance(maxAge time.Duration) error

PerformMaintenance performs routine cache maintenance: removing expired entries and removing old files (older than maxAge).

func (*CacheManager) Set ¶

func (cm *CacheManager) Set(cacheKey string, response *DocumentProcessingResponse) error

Set stores a result in the cache

type CacheStats ¶

type CacheStats struct {
	Enabled      bool   `json:"enabled"`
	Directory    string `json:"directory"`
	TotalFiles   int    `json:"total_files"`
	TotalSize    int64  `json:"total_size"`    // Size in bytes
	ExpiredFiles int    `json:"expired_files"` // Number of expired files
}

CacheStats provides statistics about the cache

type CachedResponse ¶

type CachedResponse struct {
	Response  DocumentProcessingResponse `json:"response"`
	CacheKey  string                     `json:"cache_key"`
	Timestamp time.Time                  `json:"timestamp"`
	TTL       time.Duration              `json:"ttl"` // Time to live
}

CachedResponse represents a cached document processing response

type Config ¶

type Config struct {
	// Python Configuration
	PythonPath string // Path to Python executable with Docling installed

	// Cache Configuration
	CacheDir     string // Directory for caching processed documents
	CacheEnabled bool   // Enable/disable caching

	// Hardware Configuration
	HardwareAcceleration HardwareAcceleration // Hardware acceleration mode

	// Processing Configuration
	Timeout     int // Processing timeout in seconds
	MaxFileSize int // Maximum file size in MB

	// OCR Configuration
	OCRLanguages []string // Default OCR languages

	// Vision Model Configuration
	VisionModel string // Vision model to use

	// Certificate Configuration
	ExtraCACerts string // Path to additional CA certificates file or directory
}

Config holds the configuration for document processing

func DefaultConfig ¶

func DefaultConfig() *Config

DefaultConfig returns the default configuration

func LoadConfig ¶

func LoadConfig() *Config

LoadConfig loads configuration from environment variables

func (*Config) EnsureCacheDir ¶

func (c *Config) EnsureCacheDir() error

EnsureCacheDir creates the cache directory if it doesn't exist

func (*Config) GetCertificateEnvironment ¶

func (c *Config) GetCertificateEnvironment() []string

GetCertificateEnvironment returns environment variables for certificate configuration

func (*Config) GetScriptPath ¶

func (c *Config) GetScriptPath() string

GetScriptPath returns the path to the Python wrapper script

func (*Config) GetSystemInfo ¶

func (c *Config) GetSystemInfo() *SystemInfo

GetSystemInfo returns system information for diagnostics

func (*Config) ResolveHardwareAcceleration ¶

func (c *Config) ResolveHardwareAcceleration() HardwareAcceleration

ResolveHardwareAcceleration resolves the hardware acceleration setting
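
One plausible resolution order for the `auto` setting is sketched below: an explicit choice wins, otherwise CUDA if present, then MPS on Apple Silicon, then CPU. `resolveAcceleration` is a hypothetical function and this ordering is an assumption; the page does not document the package's actual logic.

```go
package main

import (
	"fmt"
	"runtime"
)

// resolveAcceleration maps a DOCLING_HARDWARE_ACCELERATION value to a
// concrete device. Platform facts are passed in so the logic stays
// deterministic and easy to exercise.
func resolveAcceleration(setting, goos, goarch string, hasCUDA bool) string {
	switch setting {
	case "mps", "cuda", "cpu":
		return setting // explicit choice is used as-is
	}
	// "auto" (or any unrecognised value): pick the best available device.
	if hasCUDA {
		return "cuda"
	}
	if goos == "darwin" && goarch == "arm64" {
		return "mps"
	}
	return "cpu"
}

func main() {
	// CUDA detection is out of scope for this sketch, so it is passed as false.
	fmt.Println(resolveAcceleration("auto", runtime.GOOS, runtime.GOARCH, false))
}
```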

func (*Config) Validate ¶

func (c *Config) Validate() error

Validate validates the configuration

func (*Config) ValidateCertificates ¶

func (c *Config) ValidateCertificates() error

ValidateCertificates validates the certificate configuration

type DiagramAnalysis ¶

type DiagramAnalysis struct {
	Description    string                 `json:"description"`
	DiagramType    string                 `json:"diagram_type"`
	MermaidCode    string                 `json:"mermaid_code"`
	Elements       []DiagramElement       `json:"elements"`
	Confidence     float64                `json:"confidence"`
	Properties     map[string]interface{} `json:"properties"`
	ProcessingTime time.Duration          `json:"processing_time"`
	TokenUsage     *TokenUsage            `json:"token_usage,omitempty"` // Token usage from LLM provider (if available)
}

DiagramAnalysis represents the result of LLM-based diagram analysis

type DiagramElement ¶

type DiagramElement struct {
	Type        string       `json:"type"`                   // Element type (text, shape, connector, etc.)
	Content     string       `json:"content,omitempty"`      // Text content of the element
	Position    string       `json:"position,omitempty"`     // Position description within diagram
	BoundingBox *BoundingBox `json:"bounding_box,omitempty"` // Position within the diagram
}

DiagramElement represents a text or structural element within a diagram

type DiagramLLMClient ¶

type DiagramLLMClient struct {
	// contains filtered or unexported fields
}

DiagramLLMClient handles LLM-based diagram analysis using OpenAI API

func NewDiagramLLMClient ¶

func NewDiagramLLMClient() (*DiagramLLMClient, error)

NewDiagramLLMClient creates a new LLM client for diagram analysis using OpenAI API

func (*DiagramLLMClient) AnalyseDiagram ¶

func (c *DiagramLLMClient) AnalyseDiagram(diagram *ExtractedDiagram) (*DiagramAnalysis, error)

AnalyseDiagram performs LLM-based analysis of a diagram

type DocumentMetadata ¶

type DocumentMetadata struct {
	Title        string            `json:"title,omitempty"`         // Document title
	Author       string            `json:"author,omitempty"`        // Document author
	Subject      string            `json:"subject,omitempty"`       // Document subject
	Creator      string            `json:"creator,omitempty"`       // Document creator
	Producer     string            `json:"producer,omitempty"`      // Document producer
	CreationDate *time.Time        `json:"creation_date,omitempty"` // Creation date
	ModifiedDate *time.Time        `json:"modified_date,omitempty"` // Last modified date
	PageCount    int               `json:"page_count,omitempty"`    // Number of pages
	WordCount    int               `json:"word_count,omitempty"`    // Estimated word count
	Language     string            `json:"language,omitempty"`      // Detected language
	Format       string            `json:"format"`                  // Original document format
	FileSize     int64             `json:"file_size,omitempty"`     // File size in bytes
	Properties   map[string]string `json:"properties,omitempty"`    // Additional properties
}

DocumentMetadata contains metadata about the processed document
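
The optional date fields are pointers, so a missing date decodes to nil rather than a zero time. A minimal sketch of reading such a payload follows. The `metadata` type is a trimmed local mirror written for illustration, `describeDate` is a hypothetical helper, and the RFC 3339 timestamp format is an assumption based on how `encoding/json` handles `time.Time`, not something the package documents.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// metadata is a trimmed local mirror of DocumentMetadata, for illustration only.
type metadata struct {
	Title        string     `json:"title,omitempty"`
	CreationDate *time.Time `json:"creation_date,omitempty"`
	ModifiedDate *time.Time `json:"modified_date,omitempty"`
	PageCount    int        `json:"page_count,omitempty"`
	Format       string     `json:"format"`
}

// decodeMetadata parses a metadata payload. encoding/json expects RFC 3339
// timestamps for time.Time fields; that the tool emits them is an assumption.
func decodeMetadata(payload []byte) (metadata, error) {
	var m metadata
	err := json.Unmarshal(payload, &m)
	return m, err
}

// describeDate formats an optional date and reports "unknown" for nil.
func describeDate(t *time.Time) string {
	if t == nil {
		return "unknown"
	}
	return t.Format("2006-01-02")
}

func main() {
	payload := []byte(`{"title":"Q3 Report","creation_date":"2025-07-15T09:30:00Z","page_count":12,"format":"pdf"}`)
	m, err := decodeMetadata(payload)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(m.Title, m.PageCount, describeDate(m.CreationDate), describeDate(m.ModifiedDate))
}
```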

type DocumentProcessingRequest ¶

type DocumentProcessingRequest struct {
	Source                   string               `json:"source"`                                // File path, URL, or base64 content
	Profile                  ProcessingProfile    `json:"profile,omitempty"`                     // Processing profile (replaces multiple parameters)
	ProcessingMode           ProcessingMode       `json:"processing_mode,omitempty"`             // Processing mode (default: basic)
	OutputFormat             OutputFormat         `json:"output_format,omitempty"`               // Output format (default: markdown)
	EnableOCR                bool                 `json:"enable_ocr,omitempty"`                  // Enable OCR processing
	OCRLanguages             []string             `json:"ocr_languages,omitempty"`               // OCR language codes
	PreserveImages           bool                 `json:"preserve_images,omitempty"`             // Extract and preserve images
	Timeout                  *int                 `json:"timeout,omitempty"`                     // Processing timeout in seconds
	MaxFileSize              *int                 `json:"max_file_size,omitempty"`               // Maximum file size in MB
	ReturnInlineOnly         *bool                `json:"return_inline_only,omitempty"`          // Return content inline in the response only. When false (default), the tool will save the processed content to a file in the same directory as the source file, and also return the content inline.
	SaveTo                   string               `json:"save_to,omitempty"`                     // File path to save content when return_inline_only=false
	ClearFileCache           bool                 `json:"clear_file_cache,omitempty"`            // Force clear all cache entries for this source file before processing
	TableFormerMode          TableFormerMode      `json:"table_former_mode,omitempty"`           // TableFormer processing mode for table structure recognition
	CellMatching             *bool                `json:"cell_matching,omitempty"`               // Control table cell matching (true: use PDF cells, false: use predicted cells)
	VisionMode               VisionProcessingMode `json:"vision_mode,omitempty"`                 // Vision processing mode for enhanced document understanding
	DiagramDescription       bool                 `json:"diagram_description,omitempty"`         // Enable diagram and chart description using vision models
	ChartDataExtraction      bool                 `json:"chart_data_extraction,omitempty"`       // Enable data extraction from charts and graphs
	EnableRemoteServices     bool                 `json:"enable_remote_services,omitempty"`      // Allow communication with external vision model services
	ConvertDiagramsToMermaid bool                 `json:"convert_diagrams_to_mermaid,omitempty"` // Convert detected diagrams to Mermaid syntax using AI vision models
	GenerateDiagrams         bool                 `json:"generate_diagrams,omitempty"`           // Generate enhanced diagram analysis using external LLM (requires DOCLING_VLM_API_URL, DOCLING_VLM_MODEL, DOCLING_VLM_API_KEY environment variables)
	ExtractImages            bool                 `json:"extract_images,omitempty"`              // Extract individual images, charts, and diagrams as base64-encoded data with AI recreation prompts
	Debug                    bool                 `json:"debug,omitempty"`                       // Return debug information including environment variables (secrets masked)
}

DocumentProcessingRequest represents the input parameters for document processing
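
Because most fields carry `omitempty`, a request that sets only `Source` serialises to a single key, while pointer fields such as `Timeout` and `ReturnInlineOnly` let a caller send an explicit `false` or `0`. The sketch below illustrates this with a trimmed local mirror of the struct. The `request` type, the `ptr` helper, and the file paths are illustrative assumptions, not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request is a trimmed local mirror of DocumentProcessingRequest that keeps a
// few fields and their JSON tags; it is not the package's own type.
type request struct {
	Source           string   `json:"source"`
	Profile          string   `json:"profile,omitempty"`
	EnableOCR        bool     `json:"enable_ocr,omitempty"`
	OCRLanguages     []string `json:"ocr_languages,omitempty"`
	Timeout          *int     `json:"timeout,omitempty"`
	ReturnInlineOnly *bool    `json:"return_inline_only,omitempty"`
	SaveTo           string   `json:"save_to,omitempty"`
}

// ptr returns a pointer to v, which is convenient for optional pointer fields.
func ptr[T any](v T) *T { return &v }

// encode marshals a request to its JSON form.
func encode(r request) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	// Only the required field is set, so every omitempty field is dropped.
	minimal, _ := encode(request{Source: "/tmp/report.pdf"})
	fmt.Println(minimal)

	// A non-nil pointer is kept even when it points at false.
	explicit, _ := encode(request{
		Source:           "/tmp/scan.pdf",
		Profile:          "scanned",
		Timeout:          ptr(120),
		ReturnInlineOnly: ptr(false),
	})
	fmt.Println(explicit)
}
```

Using pointers for `Timeout`, `MaxFileSize`, `ReturnInlineOnly`, and `CellMatching` is what allows "not set" to be distinguished from an explicit zero value.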

type DocumentProcessingResponse ¶

type DocumentProcessingResponse struct {
	Source         string             `json:"source"`             // Original source
	Content        string             `json:"content"`            // Processed content (markdown)
	Metadata       *DocumentMetadata  `json:"metadata,omitempty"` // Document metadata
	Images         []ExtractedImage   `json:"images,omitempty"`   // Extracted images
	Tables         []ExtractedTable   `json:"tables,omitempty"`   // Extracted tables
	Diagrams       []ExtractedDiagram `json:"diagrams,omitempty"` // Extracted diagrams
	ProcessingInfo ProcessingInfo     `json:"processing_info"`    // Processing information
	CacheHit       bool               `json:"cache_hit"`          // Whether result came from cache
	Error          string             `json:"error,omitempty"`    // Error message if processing failed
}

DocumentProcessingResponse represents the output from document processing
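
Since `Error` is populated only when processing failed, a caller can treat a non-empty value as a failure. The following is a minimal sketch of that check, assuming the response arrives as JSON. The `response` type is a trimmed local mirror and `decode` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// response is a trimmed local mirror of DocumentProcessingResponse.
type response struct {
	Source   string `json:"source"`
	Content  string `json:"content"`
	CacheHit bool   `json:"cache_hit"`
	Error    string `json:"error,omitempty"`
}

// decode parses a response payload and converts a non-empty Error field into
// a Go error. Unknown keys such as processing_info are ignored by default.
func decode(payload []byte) (*response, error) {
	var r response
	if err := json.Unmarshal(payload, &r); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	if r.Error != "" {
		return &r, errors.New(r.Error)
	}
	return &r, nil
}

func main() {
	ok := []byte(`{"source":"a.pdf","content":"# Title","cache_hit":true,"processing_info":{}}`)
	if r, err := decode(ok); err == nil {
		fmt.Println("content:", r.Content, "cached:", r.CacheHit)
	}

	failed := []byte(`{"source":"b.pdf","content":"","cache_hit":false,"error":"file not found"}`)
	if _, err := decode(failed); err != nil {
		fmt.Println("processing failed:", err)
	}
}
```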

type DocumentProcessorTool ¶

type DocumentProcessorTool struct {
	// contains filtered or unexported fields
}

DocumentProcessorTool implements document processing using Docling via Python subprocess

func (*DocumentProcessorTool) Definition ¶

func (t *DocumentProcessorTool) Definition() mcp.Tool

Definition returns the MCP tool definition

func (*DocumentProcessorTool) Execute ¶

func (t *DocumentProcessorTool) Execute(ctx context.Context, logger *logrus.Logger, cache *sync.Map, args map[string]interface{}) (*mcp.CallToolResult, error)

Execute processes the document using the Python wrapper

type ErrorInfo ¶

type ErrorInfo struct {
	Code        string            `json:"code"`              // Error code
	Message     string            `json:"message"`           // Error message
	Details     string            `json:"details,omitempty"` // Additional error details
	Source      string            `json:"source,omitempty"`  // Source that caused the error
	Timestamp   time.Time         `json:"timestamp"`         // When the error occurred
	Context     map[string]string `json:"context,omitempty"` // Additional context
	Recoverable bool              `json:"recoverable"`       // Whether the error is recoverable
}

ErrorInfo represents detailed error information
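
The `Recoverable` flag suggests a natural retry decision. The sketch below shows one way a caller might use it, assuming the error information is carried as a Go error value. The `errorInfo` type, its `Error` method, `shouldRetry`, and the error codes are all hypothetical, since the package's `ErrorInfo` is not documented as implementing the `error` interface.

```go
package main

import (
	"errors"
	"fmt"
)

// errorInfo is a trimmed local mirror of ErrorInfo. The Error method is added
// here for illustration so the value can travel as a Go error.
type errorInfo struct {
	Code        string
	Message     string
	Recoverable bool
}

func (e errorInfo) Error() string { return e.Code + ": " + e.Message }

// shouldRetry reports whether err wraps an errorInfo marked as recoverable.
func shouldRetry(err error) bool {
	var info errorInfo
	return errors.As(err, &info) && info.Recoverable
}

func main() {
	// The codes below are made up for the example.
	timeout := fmt.Errorf("processing a.pdf: %w",
		errorInfo{Code: "TIMEOUT", Message: "took too long", Recoverable: true})
	badFile := errorInfo{Code: "UNSUPPORTED_FORMAT", Message: "cannot read .xyz"}

	fmt.Println(shouldRetry(timeout), shouldRetry(badFile))
}
```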

type ExtractedDiagram ¶

type ExtractedDiagram struct {
	ID          string                 `json:"id"`                     // Unique diagram identifier
	Type        string                 `json:"type"`                   // Type of diagram (flowchart, chart, diagram, etc.)
	Caption     string                 `json:"caption,omitempty"`      // Diagram caption if available
	Description string                 `json:"description,omitempty"`  // Generated description of the diagram
	DiagramType string                 `json:"diagram_type,omitempty"` // Classified diagram type (flowchart, chart, etc.)
	MermaidCode string                 `json:"mermaid_code,omitempty"` // Generated Mermaid syntax for the diagram
	Base64Data  string                 `json:"base64_data,omitempty"`  // Base64-encoded image data for LLM vision processing
	Elements    []DiagramElement       `json:"elements,omitempty"`     // Text elements within the diagram
	PageNumber  int                    `json:"page_number,omitempty"`  // Page number where diagram appears
	BoundingBox *BoundingBox           `json:"bounding_box,omitempty"` // Position on page
	Confidence  float64                `json:"confidence,omitempty"`   // Confidence score for diagram analysis
	Properties  map[string]interface{} `json:"properties,omitempty"`   // Additional diagram-specific properties
}

ExtractedDiagram represents a diagram extracted from the document
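
A caller that only wants reliable Mermaid output can filter on `MermaidCode` and `Confidence`. This is a minimal sketch using a trimmed local `diagram` type. The `selectDiagrams` helper and the 0 to 1 confidence scale are assumptions, as the range of `Confidence` is not documented here.

```go
package main

import "fmt"

// diagram is a trimmed local mirror of ExtractedDiagram.
type diagram struct {
	ID          string
	MermaidCode string
	Confidence  float64
}

// selectDiagrams returns the IDs of diagrams that have Mermaid code and whose
// confidence is at least minConfidence (assumed to be on a 0 to 1 scale).
func selectDiagrams(ds []diagram, minConfidence float64) []string {
	var ids []string
	for _, d := range ds {
		if d.MermaidCode == "" || d.Confidence < minConfidence {
			continue
		}
		ids = append(ids, d.ID)
	}
	return ids
}

func main() {
	diagrams := []diagram{
		{ID: "d1", MermaidCode: "graph TD; A-->B", Confidence: 0.92},
		{ID: "d2", MermaidCode: "", Confidence: 0.99},
		{ID: "d3", MermaidCode: "graph LR; X-->Y", Confidence: 0.40},
	}
	fmt.Println(selectDiagrams(diagrams, 0.8))
}
```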

type ExtractedImage ¶

type ExtractedImage struct {
	ID            string       `json:"id"`                       // Unique image identifier
	Type          string       `json:"type"`                     // Type of image (picture, table, chart, diagram)
	Caption       string       `json:"caption,omitempty"`        // Image caption if available
	AltText       string       `json:"alt_text,omitempty"`       // Alternative text
	Format        string       `json:"format"`                   // Image format (PNG, JPEG, etc.)
	Width         int          `json:"width,omitempty"`          // Image width in pixels
	Height        int          `json:"height,omitempty"`         // Image height in pixels
	Size          int64        `json:"size,omitempty"`           // Image size in bytes
	FilePath      string       `json:"file_path,omitempty"`      // Path to saved image file
	PageNumber    int          `json:"page_number,omitempty"`    // Page number where image appears
	BoundingBox   *BoundingBox `json:"bounding_box,omitempty"`   // Position on page
	ExtractedText []string     `json:"extracted_text,omitempty"` // Text elements extracted from the image
}

ExtractedImage represents an image extracted from the document

type ExtractedTable ¶

type ExtractedTable struct {
	ID          string       `json:"id"`                     // Unique table identifier
	Caption     string       `json:"caption,omitempty"`      // Table caption if available
	Headers     []string     `json:"headers,omitempty"`      // Column headers
	Rows        [][]string   `json:"rows"`                   // Table data rows
	PageNumber  int          `json:"page_number,omitempty"`  // Page number where table appears
	BoundingBox *BoundingBox `json:"bounding_box,omitempty"` // Position on page
	Markdown    string       `json:"markdown,omitempty"`     // Markdown representation
	CSV         string       `json:"csv,omitempty"`          // CSV representation
}

ExtractedTable represents a table extracted from the document
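
`ExtractedTable` exposes both the structured `Headers` and `Rows` and optional pre-rendered `Markdown` and `CSV` strings. When the `CSV` string is empty, a caller could rebuild it from the structured fields with the standard library, as sketched below. The `table` type and `toCSV` helper are illustrative, and whether the package renders its own `CSV` field the same way is not stated.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// table is a trimmed local mirror of ExtractedTable.
type table struct {
	Headers []string
	Rows    [][]string
}

// toCSV renders the headers (if any) followed by the rows as CSV text.
func toCSV(t table) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if len(t.Headers) > 0 {
		if err := w.Write(t.Headers); err != nil {
			return "", err
		}
	}
	// WriteAll writes every record and then flushes the writer.
	if err := w.WriteAll(t.Rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := toCSV(table{
		Headers: []string{"Region", "Revenue"},
		Rows:    [][]string{{"EMEA", "1,200"}, {"APAC", "950"}},
	})
	if err != nil {
		fmt.Println("csv failed:", err)
		return
	}
	fmt.Print(out)
}
```

Fields that contain commas are quoted by `encoding/csv`, so a value such as `1,200` stays in a single column.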

type HardwareAcceleration ¶

type HardwareAcceleration string

HardwareAcceleration defines the hardware acceleration mode

const (
	HardwareAccelerationAuto HardwareAcceleration = "auto" // Auto-detect best option
	HardwareAccelerationMPS  HardwareAcceleration = "mps"  // Metal Performance Shaders (macOS)
	HardwareAccelerationCUDA HardwareAcceleration = "cuda" // CUDA (NVIDIA GPUs)
	HardwareAccelerationCPU  HardwareAcceleration = "cpu"  // CPU-only processing
)

type LLMConfig ¶

type LLMConfig struct {
	Provider string
	Model    string
	APIKey   string
	BaseURL  string
}

LLMConfig contains configuration for the LLM client

type OutputFormat ¶

type OutputFormat string

OutputFormat defines the output format for processed documents

const (
	OutputFormatMarkdown OutputFormat = "markdown" // Markdown output (default)
	OutputFormatJSON     OutputFormat = "json"     // JSON metadata
	OutputFormatBoth     OutputFormat = "both"     // Both markdown and JSON
)

type ProcessingInfo ¶

type ProcessingInfo struct {
	ProcessingMode       ProcessingMode       `json:"processing_mode"`           // Mode used for processing
	ProcessingMethod     string               `json:"processing_method"`         // Concise description of processing method used
	HardwareAcceleration HardwareAcceleration `json:"hardware_acceleration"`     // Hardware acceleration used
	VisionModel          string               `json:"vision_model,omitempty"`    // Vision model used (if any)
	OCREnabled           bool                 `json:"ocr_enabled"`               // Whether OCR was enabled
	OCRLanguages         []string             `json:"ocr_languages,omitempty"`   // OCR languages used
	ProcessingTime       float64              `json:"processing_time"`           // Time taken to process in seconds
	PythonVersion        string               `json:"python_version,omitempty"`  // Python version used
	DoclingVersion       string               `json:"docling_version,omitempty"` // Docling version used
	CacheKey             string               `json:"cache_key,omitempty"`       // Cache key used
	Timestamp            time.Time            `json:"timestamp"`                 // Processing timestamp
	TokenUsage           *TokenUsage          `json:"token_usage,omitempty"`     // Token usage from external LLM (if available)
}

ProcessingInfo contains information about the processing operation

type ProcessingMode ¶

type ProcessingMode string

ProcessingMode defines the type of document processing to perform

const (
	ProcessingModeBasic    ProcessingMode = "basic"    // Fast, code-only processing
	ProcessingModeAdvanced ProcessingMode = "advanced" // Vision model with layout preservation
	ProcessingModeOCR      ProcessingMode = "ocr"      // OCR for scanned documents
	ProcessingModeTables   ProcessingMode = "tables"   // Table extraction focus
	ProcessingModeImages   ProcessingMode = "images"   // Image extraction focus
)

type ProcessingProfile ¶

type ProcessingProfile string

ProcessingProfile defines preset configurations for common document processing scenarios

const (
	ProfileBasic          ProcessingProfile = "basic"           // Text extraction only (fast processing)
	ProfileTextAndImage   ProcessingProfile = "text-and-image"  // Text and image extraction with tables
	ProfileScanned        ProcessingProfile = "scanned"         // OCR-focused processing for scanned documents
	ProfileLLMSmolDocling ProcessingProfile = "llm-smoldocling" // Text and image extraction enhanced with SmolDocling vision model
	ProfileLLMExternal    ProcessingProfile = "llm-external"    // Text and image extraction enhanced with external vision LLM for diagram conversion to Mermaid
)
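
Because `ProcessingProfile` is a string type, any string converts to it without complaint. A caller that accepts profile names from user input may want to validate them first. The sketch below redeclares the type and constants locally so it is self-contained. `parseProfile` is a hypothetical helper, and leaving an empty value unset so the tool applies its own default is an assumption about how callers would use it.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the documented type and constants, so the example compiles
// on its own.
type ProcessingProfile string

const (
	ProfileBasic          ProcessingProfile = "basic"
	ProfileTextAndImage   ProcessingProfile = "text-and-image"
	ProfileScanned        ProcessingProfile = "scanned"
	ProfileLLMSmolDocling ProcessingProfile = "llm-smoldocling"
	ProfileLLMExternal    ProcessingProfile = "llm-external"
)

// parseProfile normalises s and checks it against the known profiles.
// An empty string is returned unchanged so the tool can apply its default.
func parseProfile(s string) (ProcessingProfile, error) {
	p := ProcessingProfile(strings.ToLower(strings.TrimSpace(s)))
	switch p {
	case "", ProfileBasic, ProfileTextAndImage, ProfileScanned,
		ProfileLLMSmolDocling, ProfileLLMExternal:
		return p, nil
	}
	return "", fmt.Errorf("unknown processing profile %q", s)
}

func main() {
	for _, in := range []string{" Scanned ", "llm-external", "fancy"} {
		p, err := parseProfile(in)
		fmt.Printf("%q -> %q, err=%v\n", in, p, err)
	}
}
```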

type SystemInfo ¶

type SystemInfo struct {
	Platform             string                 `json:"platform"`                        // Operating system
	Architecture         string                 `json:"architecture"`                    // CPU architecture
	PythonPath           string                 `json:"python_path,omitempty"`           // Path to Python executable
	PythonVersion        string                 `json:"python_version,omitempty"`        // Python version
	DoclingVersion       string                 `json:"docling_version,omitempty"`       // Docling version
	DoclingAvailable     bool                   `json:"docling_available"`               // Whether Docling is available
	HardwareAcceleration []HardwareAcceleration `json:"hardware_acceleration_available"` // Available acceleration options
	CacheDirectory       string                 `json:"cache_directory,omitempty"`       // Cache directory path
	CacheEnabled         bool                   `json:"cache_enabled"`                   // Whether caching is enabled
	MaxFileSize          int                    `json:"max_file_size"`                   // Maximum file size in MB
	DefaultTimeout       int                    `json:"default_timeout"`                 // Default timeout in seconds
}

SystemInfo represents system information for diagnostics
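
The `HardwareAcceleration` slice in `SystemInfo` lists what is available on the host. One way a caller might choose among those options is sketched below. The type and constants are redeclared locally so the example is self-contained, and the preference order (CUDA, then MPS, then CPU) is an assumption of this sketch, not the package's documented auto-detection logic.

```go
package main

import "fmt"

// Local copies of the documented type and constants.
type HardwareAcceleration string

const (
	HardwareAccelerationAuto HardwareAcceleration = "auto"
	HardwareAccelerationMPS  HardwareAcceleration = "mps"
	HardwareAccelerationCUDA HardwareAcceleration = "cuda"
	HardwareAccelerationCPU  HardwareAcceleration = "cpu"
)

// pickAcceleration returns the first preferred accelerator present in
// available and falls back to CPU when none is found.
func pickAcceleration(available []HardwareAcceleration) HardwareAcceleration {
	preferred := []HardwareAcceleration{HardwareAccelerationCUDA, HardwareAccelerationMPS}
	for _, want := range preferred {
		for _, have := range available {
			if have == want {
				return want
			}
		}
	}
	return HardwareAccelerationCPU
}

func main() {
	fmt.Println(pickAcceleration([]HardwareAcceleration{HardwareAccelerationCPU, HardwareAccelerationMPS}))
	fmt.Println(pickAcceleration(nil))
}
```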

type TableFormerMode ¶

type TableFormerMode string

TableFormerMode defines the TableFormer processing mode for table structure recognition

const (
	TableFormerModeFast     TableFormerMode = "fast"     // Faster but less accurate table processing
	TableFormerModeAccurate TableFormerMode = "accurate" // More accurate but slower table processing (default)
)

type TokenUsage ¶

type TokenUsage struct {
	PromptTokens     int `json:"prompt_tokens,omitempty"`     // Tokens used in prompts
	CompletionTokens int `json:"completion_tokens,omitempty"` // Tokens used in completions
	TotalTokens      int `json:"total_tokens,omitempty"`      // Total tokens used
}

TokenUsage represents token consumption from external LLM providers
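
When several documents are processed with an external LLM, the per-document usage can be totalled. `ProcessingInfo.TokenUsage` is a pointer and may be absent, so the sketch below skips nil entries. The `tokenUsage` type is a local mirror and `sumUsage` is a hypothetical helper.

```go
package main

import "fmt"

// tokenUsage is a local mirror of TokenUsage.
type tokenUsage struct {
	PromptTokens     int
	CompletionTokens int
	TotalTokens      int
}

// sumUsage adds up the usage of every non-nil entry.
func sumUsage(usages ...*tokenUsage) tokenUsage {
	var total tokenUsage
	for _, u := range usages {
		if u == nil {
			continue // no external LLM was used for this document
		}
		total.PromptTokens += u.PromptTokens
		total.CompletionTokens += u.CompletionTokens
		total.TotalTokens += u.TotalTokens
	}
	return total
}

func main() {
	total := sumUsage(
		&tokenUsage{PromptTokens: 10, CompletionTokens: 5, TotalTokens: 15},
		nil,
		&tokenUsage{PromptTokens: 20, CompletionTokens: 10, TotalTokens: 30},
	)
	fmt.Printf("%+v\n", total)
}
```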

type VisionProcessingMode ¶

type VisionProcessingMode string

VisionProcessingMode defines the vision model processing mode for enhanced document understanding

const (
	VisionModeStandard    VisionProcessingMode = "standard"    // Standard vision processing
	VisionModeSmolDocling VisionProcessingMode = "smoldocling" // Compact vision-language model (256M parameters)
	VisionModeAdvanced    VisionProcessingMode = "advanced"    // Advanced vision processing with remote services
)
