image-extraction

command

Version: v0.6.0 (latest) | Published: Feb 24, 2026 | License: MIT | Imports: 5 | Imported by: 0

Repository

github.com/coregx/gxpdf

README

Image Extraction Example

This example demonstrates how to extract images from PDF documents using GxPDF.

Features Demonstrated

Extract all images from a document
- Get all images from all pages
- Save to disk automatically
Extract images from a specific page
- Target specific pages
- Control output location
Process image metadata
- Get image dimensions
- Check color space and encoding
- Access raw image data

Usage

# Extract all images from a PDF
go run main.go document.pdf

# Extract to specific directory
go run main.go document.pdf ./extracted_images

# Show usage help (run with no arguments)
go run main.go
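
The invocations above imply a small argument contract: one required PDF path, one optional output directory, and a usage message when nothing is passed. The following is a minimal sketch of that contract using only the standard library. It is not the example's actual main.go; the function name parseArgs and the default output directory "." are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// parseArgs interprets the arguments that follow the program name.
// Hypothetical helper: the default output directory "." is an assumption.
func parseArgs(args []string) (pdfPath, outDir string, err error) {
	switch len(args) {
	case 1:
		// Only the PDF path was given; fall back to the default directory.
		return args[0], ".", nil
	case 2:
		// PDF path plus an explicit output directory.
		return args[0], args[1], nil
	default:
		// No arguments (or too many): report how to call the program.
		return "", "", errors.New("usage: go run main.go <input.pdf> [output_dir]")
	}
}

func main() {
	pdfPath, outDir, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("input: %s, output directory: %s\n", pdfPath, outDir)
}
```

Keeping the parsing in its own function makes the three cases easy to check without touching a real PDF.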

API Reference

Document-level Extraction

doc, err := gxpdf.Open("document.pdf")
if err != nil {
    log.Fatal(err)
}
defer doc.Close()

// Get all images from all pages
images := doc.GetImages()

// With error handling
images, err = doc.GetImagesWithError()
if err != nil {
    log.Fatal(err)
}

Page-level Extraction

page := doc.Page(0) // First page

// Get all images from this page
images := page.GetImages()

// With error handling
images, err := page.GetImagesWithError()

Image Operations

for _, img := range images {
    // Get metadata
    fmt.Printf("%dx%d, %s\n", img.Width(), img.Height(), img.ColorSpace())

    // Save to file (format determined by extension)
    img.SaveToFile("output.jpg")  // JPEG
    img.SaveToFile("output.png")  // PNG

    // Convert to Go image for processing
    goImg, err := img.ToGoImage()
    if err != nil {
        continue // skip images that cannot be converted
    }
    fmt.Println(goImg.Bounds())
}

Supported Image Formats

GxPDF can extract images with the following encodings:

DCTDecode (JPEG) - Direct extraction, no re-encoding
FlateDecode (zlib) - Decompressed and converted
Uncompressed - Direct extraction
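
To make the FlateDecode entry concrete: FlateDecode streams are zlib-compressed, so the "decompress" step can be illustrated with Go's compress/zlib package. This is a standalone sketch, not GxPDF's internal code. The helper names inflate and deflate are hypothetical, and the sketch ignores the optional predictor step that some PDF streams apply after decompression.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
	"os"
)

// inflate decompresses a zlib stream, the compression that PDF calls FlateDecode.
func inflate(data []byte) ([]byte, error) {
	r, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

// deflate compresses data with zlib. It exists only to build demo input.
func deflate(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Stand-in for a 2x2 DeviceGray image: one byte per pixel.
	pixels := []byte{0, 85, 170, 255}

	compressed, err := deflate(pixels)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	raw, err := inflate(compressed)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(raw)
}
```

DCTDecode data needs no such step, since the stream is already a JPEG file body.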

Color Spaces

DeviceRGB - RGB color images
DeviceGray - Grayscale images
DeviceCMYK - CMYK images (conversion to RGB coming soon)
Indexed - Palette-based images (expansion coming soon)
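
Until CMYK conversion and Indexed expansion are available in the library, a caller with access to the raw samples could do both by hand. The sketch below uses only the standard library and is not GxPDF code. It assumes 8-bit interleaved CMYK samples, 8-bit palette indices, and a palette stored as consecutive RGB triples. The function names are hypothetical, and the CMYK conversion is the naive formula from image/color, with no ICC profile handling.

```go
package main

import (
	"fmt"
	"image/color"
)

// cmykToRGB converts interleaved 8-bit CMYK samples to interleaved RGB.
func cmykToRGB(cmyk []byte) ([]byte, error) {
	if len(cmyk)%4 != 0 {
		return nil, fmt.Errorf("CMYK data length %d is not a multiple of 4", len(cmyk))
	}
	rgb := make([]byte, 0, len(cmyk)/4*3)
	for i := 0; i < len(cmyk); i += 4 {
		r, g, b := color.CMYKToRGB(cmyk[i], cmyk[i+1], cmyk[i+2], cmyk[i+3])
		rgb = append(rgb, r, g, b)
	}
	return rgb, nil
}

// expandIndexed replaces each 8-bit palette index with its RGB triple.
func expandIndexed(indices, palette []byte) ([]byte, error) {
	rgb := make([]byte, 0, len(indices)*3)
	for _, idx := range indices {
		off := int(idx) * 3
		if off+3 > len(palette) {
			return nil, fmt.Errorf("palette index %d out of range", idx)
		}
		rgb = append(rgb, palette[off:off+3]...)
	}
	return rgb, nil
}

func main() {
	// One pure-cyan CMYK pixel.
	rgb, err := cmykToRGB([]byte{255, 0, 0, 0})
	fmt.Println(rgb, err)

	// Two-entry palette (red, blue) and two pixels that reference it.
	palette := []byte{255, 0, 0, 0, 0, 255}
	rgb, err = expandIndexed([]byte{1, 0}, palette)
	fmt.Println(rgb, err)
}
```

Both helpers return plain RGB bytes, which can then be wrapped in an image.RGBA or image.NRGBA for encoding.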

Output Formats

Images can be saved as:

JPEG (.jpg, .jpeg) - Best for photos and DCTDecode images
PNG (.png) - Best for lossless images and transparency

For DCTDecode images saved as JPEG, the original compressed data is used directly, preserving quality without re-encoding.
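
Choosing the encoder from the file extension can be sketched with the standard library as follows. This is an illustration of the general technique, not how SaveToFile is implemented: it always re-encodes, so it does not reproduce the direct pass-through described above for DCTDecode data. The names formatForExt and encodeByExtension, and the JPEG quality of 90, are assumptions.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
	"image/png"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// formatForExt maps a file name to an output format based on its extension.
func formatForExt(filename string) (string, error) {
	switch strings.ToLower(filepath.Ext(filename)) {
	case ".jpg", ".jpeg":
		return "jpeg", nil
	case ".png":
		return "png", nil
	default:
		return "", fmt.Errorf("unsupported output extension %q", filepath.Ext(filename))
	}
}

// encodeByExtension writes img to w in the format implied by filename.
func encodeByExtension(w io.Writer, filename string, img image.Image) error {
	format, err := formatForExt(filename)
	if err != nil {
		return err
	}
	if format == "jpeg" {
		// Quality 90 is an arbitrary choice for this sketch.
		return jpeg.Encode(w, img, &jpeg.Options{Quality: 90})
	}
	return png.Encode(w, img)
}

func main() {
	// Tiny 2x2 grayscale image with one white pixel.
	img := image.NewGray(image.Rect(0, 0, 2, 2))
	img.SetGray(1, 1, color.Gray{Y: 255})

	var buf bytes.Buffer
	if err := encodeByExtension(&buf, "output.png", img); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("encoded %d bytes as PNG\n", buf.Len())
}
```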

Example Output

=== Example 1: Extract All Images from Document ===
Found 5 images in document
  Saved: image_0.jpg (800x600, DeviceRGB, /DCTDecode)
  Saved: image_1.jpg (1024x768, DeviceRGB, /DCTDecode)
  Saved: image_2.png (200x200, DeviceGray, /FlateDecode)
  Saved: image_3.jpg (640x480, DeviceRGB, /DCTDecode)
  Saved: image_4.png (100x100, DeviceRGB, /FlateDecode)

=== Example 2: Extract Images from Specific Page ===
Found 2 images on page 1
  Saved: page1_image_0.jpg (800x600)
  Saved: page1_image_1.jpg (1024x768)

=== Example 3: Process Images (Metadata Only) ===
Page 1:
  Image 0:
    Name: /Im1
    Dimensions: 800x600 pixels
    Color Space: DeviceRGB
    Bits per Component: 8
    Filter: /DCTDecode
    Go Image Bounds: (0,0)-(800,600)
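
In the sample output, images with the /DCTDecode filter are saved as .jpg and the others as .png. A naming rule of that kind might look like the sketch below. It is inferred from the output only, not taken from main.go, and the function name outputName is hypothetical.

```go
package main

import "fmt"

// outputName builds a file name such as "image_0.jpg" from a prefix,
// an image index, and the PDF filter name reported for the image.
func outputName(prefix string, index int, filter string) string {
	ext := ".png" // lossless default for everything that is not JPEG data
	if filter == "/DCTDecode" {
		ext = ".jpg" // JPEG data can be written out without re-encoding
	}
	return fmt.Sprintf("%s_%d%s", prefix, index, ext)
}

func main() {
	filters := []string{"/DCTDecode", "/FlateDecode"}
	for i, f := range filters {
		fmt.Println(outputName("image", i, f))
	}
}
```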

Notes

Images are extracted in the order they appear in the PDF's resource dictionary
Inline images are not yet supported (coming soon)
CMYK to RGB conversion is not yet implemented
Indexed color space expansion is not yet implemented

Documentation

Overview

Package main demonstrates how to extract images from PDF documents.

Source Files

main.go