pdf_chat

command

v1.0.0 Latest Latest Go to latest Published: Nov 11, 2025 License: MIT Imports: 4 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/liuzl/ai

Links

Open Source Insights

README ¶

PDF Document Analysis Example

This example demonstrates how to analyze PDF documents with AI models. PDF support is available with Gemini and Anthropic models.

Supported Providers

✅ Gemini - Native PDF support
✅ Anthropic - PDF support (Claude 3.5 Sonnet and later)
❌ OpenAI - Not natively supported (requires text extraction first)

Prerequisites

For Gemini

export AI_PROVIDER=gemini
export GEMINI_API_KEY="your-gemini-api-key"
export GEMINI_MODEL="gemini-2.0-flash-exp"  # or gemini-1.5-pro, gemini-1.5-flash

For Anthropic

export AI_PROVIDER=anthropic
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export ANTHROPIC_MODEL="claude-3-5-sonnet-20241022"

Running the Example

go run main.go

Use Cases

Document Summarization: Quickly summarize long PDF documents
Question Answering: Ask specific questions about PDF content
Information Extraction: Extract specific data, tables, figures
Research Analysis: Analyze research papers, find key findings
Contract Review: Review legal documents and contracts
Report Analysis: Analyze business reports, financial statements
Educational: Answer questions about textbooks, study materials
Technical Documentation: Understand technical manuals and specifications
Multi-Document Analysis: Compare multiple PDF documents
Citation Extraction: Extract references and citations

Example Output

=== PDF Document Analysis Example ===

Analyzing PDF document...
PDF URL: https://arxiv.org/pdf/1706.03762.pdf
(This may take a moment as the PDF is being processed...)

AI Response:
1. **Title and Authors**:
   - Title: "Attention Is All You Need"
   - Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
     Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin (Google Brain and Google Research)

2. **Main Contribution Summary**:
   This paper introduces the Transformer, a novel neural network architecture for
   sequence transduction tasks that relies entirely on attention mechanisms, dispensing
   with recurrence and convolutions. The model achieves state-of-the-art results on
   machine translation tasks while being more parallelizable and requiring significantly
   less time to train.

3. **Key Innovation**:
   The key innovation is the self-attention mechanism that allows the model to process
   all positions in a sequence simultaneously, unlike RNNs which process sequentially.
   This parallel processing enables:
   - Faster training
   - Better handling of long-range dependencies
   - More efficient use of computational resources

=== Question Answering ===
Detailed Explanation:

1. **What is the Transformer architecture?**
   The Transformer is an encoder-decoder architecture that uses stacked self-attention
   and point-wise fully connected layers. It processes input sequences in parallel
   rather than sequentially, using multi-head attention mechanisms to model dependencies.

2. **Advantages over RNNs**:
   - **Parallelization**: Can process entire sequences simultaneously
   - **Long-range dependencies**: Direct connections between all positions
   - **Training speed**: Much faster to train
   - **Computational efficiency**: Better use of modern hardware (GPUs/TPUs)
   - **No vanishing gradients**: Shorter paths for gradient flow

3. **Key Components**:
   - **Multi-Head Attention**: Allows model to attend to different positions
   - **Positional Encoding**: Injects sequence order information
   - **Feed-Forward Networks**: Applied to each position independently
   - **Layer Normalization**: Stabilizes training
   - **Residual Connections**: Helps with gradient flow

=== Information Extraction ===
Extracted Information:

1. **BLEU Scores**:
   - English-to-German: 28.4 BLEU (new state-of-the-art)
   - English-to-French: 41.0 BLEU

2. **Training Details**:
   - Dataset: WMT 2014 (4.5M sentence pairs for EN-DE, 36M for EN-FR)
   - Training time: 3.5 days on 8 P100 GPUs for the base model
   - 12 hours for the big model on 8 P100 GPUs

3. **Model Parameters**:
   - Base model: ~65 million parameters
   - Big model: ~213 million parameters

Code Example with Base64

For local PDF files:

package main

import (
	"context"
	"encoding/base64"
	"fmt"
	"log"
	"os"

	"github.com/liuzl/ai"
)

func main() {
	// Read local PDF file
	pdfData, err := os.ReadFile("path/to/document.pdf")
	if err != nil {
		log.Fatal(err)
	}

	// Convert to base64
	base64PDF := base64.StdEncoding.EncodeToString(pdfData)

	// Create client
	client, err := ai.NewClientFromEnv()
	if err != nil {
		log.Fatal(err)
	}

	// Use base64 PDF
	req := &ai.Request{
		Messages: []ai.Message{
			ai.NewMultimodalMessage(ai.RoleUser, []ai.ContentPart{
				ai.NewTextPart("Summarize this document"),
				ai.NewPDFPartFromBase64(base64PDF),
			}),
		},
	}

	resp, err := client.Generate(context.Background(), req)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(resp.Text)
}

Multi-Document Analysis

Compare multiple PDFs:

req := &ai.Request{
	Messages: []ai.Message{
		ai.NewMultimodalMessage(ai.RoleUser, []ai.ContentPart{
			ai.NewTextPart("Compare these two research papers and highlight the key differences in their approaches:"),
			ai.NewPDFPartFromURL("paper1.pdf"),
			ai.NewPDFPartFromURL("paper2.pdf"),
		}),
	},
}

Conversational Document Q&A

Have a conversation about a document:

req := &ai.Request{
	Messages: []ai.Message{
		// First question
		ai.NewMultimodalMessage(ai.RoleUser, []ai.ContentPart{
			ai.NewTextPart("What is the main topic of this document?"),
			ai.NewPDFPartFromURL("document.pdf"),
		}),

		// AI response (from previous call)
		ai.NewTextMessage(ai.RoleAssistant, previousResponse),

		// Follow-up question (document doesn't need to be sent again)
		ai.NewTextMessage(ai.RoleUser, "Can you elaborate on section 3?"),
	},
}

Notes

The example uses a famous AI research paper (Attention Is All You Need)
PDF files are automatically downloaded and converted for the API
Large PDFs may take longer to process and consume more tokens
Some complex PDFs with unusual formatting may have parsing issues
Consider the token limits of your chosen model when processing long documents

Limitations

Maximum PDF size varies by provider (check provider documentation)
Scanned PDFs (images) may require OCR capabilities
Complex layouts (multi-column, tables) may affect accuracy
Some special characters or fonts may not be processed correctly

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

main.go

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL