rasterization

package
v1.3.6
Warning

This package is not in the latest version of its module.

Published: Apr 21, 2026 License: Apache-2.0 Imports: 20 Imported by: 0

Documentation

Overview

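Package rasterization converts document pages to images. It provides a generic Rasterizer interface and a Registry that dispatches by file extension. It currently includes rasterizers for PDFs (PdfiumRasterizer), plain image files (StandaloneImage), and images embedded in Office Open XML documents (OfficeImageExtractor), and is designed to extend to spreadsheets, presentations, and other document types.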

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Cache

type Cache struct {
	// contains filtered or unexported fields
}

Cache renders document pages on demand and caches the resulting JPEGs on the filesystem. The cache is ephemeral: entries are lost when the container restarts and are recomputed on the next access.

func NewCache

func NewCache(registry *Registry, cacheDir string) (*Cache, error)

NewCache creates a Cache backed by the given Registry. The cacheDir is created if it does not exist.

func (*Cache) Image

func (c *Cache) Image(path string, page int) ([]byte, error)

Image returns JPEG bytes for the given page of the document at path. Results are cached on disk; a cache miss triggers rendering.

func (*Cache) Supports

func (c *Cache) Supports(ext string) bool

Supports returns true if the registry can rasterize files with the given extension.

type OfficeImageExtractor added in v1.3.1

type OfficeImageExtractor struct{}

OfficeImageExtractor extracts embedded images from Office Open XML documents (docx, pptx, xlsx). These formats are ZIP archives with media files at predictable paths.

func NewOfficeImageExtractor added in v1.3.1

func NewOfficeImageExtractor() *OfficeImageExtractor

NewOfficeImageExtractor creates an OfficeImageExtractor.

func (*OfficeImageExtractor) Close added in v1.3.1

func (o *OfficeImageExtractor) Close() error

Close is a no-op; OfficeImageExtractor holds no persistent state.

func (*OfficeImageExtractor) PageCount added in v1.3.1

func (o *OfficeImageExtractor) PageCount(path string) (int, error)

PageCount returns the number of extractable images in the document.

func (*OfficeImageExtractor) Render added in v1.3.1

func (o *OfficeImageExtractor) Render(path string, page int) (image.Image, error)

Render returns the Nth (1-based) embedded image as an image.Image.

type PdfiumRasterizer

type PdfiumRasterizer struct {
	// contains filtered or unexported fields
}

PdfiumRasterizer renders PDF pages using the PDFium library via WebAssembly (Wazero). No CGO or system libraries are required: the PDFium WASM binary is embedded in the go-pdfium module.

The underlying pdfium.Pdfium instance wraps a single WASM module whose internal state is not safe for concurrent use: parallel calls corrupt the module's memory and function tables, after which every subsequent call fails until the process restarts. An internal mutex (the unexported field mu) therefore serialises all calls into the instance, so concurrent callers wait for one another rather than running in parallel.

func NewPdfiumRasterizer

func NewPdfiumRasterizer() (*PdfiumRasterizer, error)

NewPdfiumRasterizer initialises PDFium via the Wazero WebAssembly runtime and returns a Rasterizer for PDF files.

func (*PdfiumRasterizer) Close

func (r *PdfiumRasterizer) Close() error

Close releases all PDFium resources.

func (*PdfiumRasterizer) PageCount

func (r *PdfiumRasterizer) PageCount(path string) (int, error)

PageCount returns the number of pages in the PDF at the given path.

func (*PdfiumRasterizer) Render

func (r *PdfiumRasterizer) Render(path string, page int) (image.Image, error)

Render returns the given 1-based page of the PDF as an image.

type Rasterizer

type Rasterizer interface {
	io.Closer

	// PageCount returns the number of renderable pages in the document.
	PageCount(path string) (int, error)

	// Render returns the given 1-based page as an image.
	Render(path string, page int) (image.Image, error)
}

Rasterizer converts document pages to images. For PDFs this means pages; for spreadsheets, sheets; for presentations, slides.

type Registry

type Registry struct {
	// contains filtered or unexported fields
}

Registry maps file extensions to Rasterizer implementations.

func NewRegistry

func NewRegistry() *Registry

NewRegistry creates an empty Registry.

func (*Registry) Close

func (r *Registry) Close() error

Close closes all registered rasterizers.

func (*Registry) For

func (r *Registry) For(ext string) (Rasterizer, bool)

For returns the Rasterizer for the given extension, or nil and false if none is registered.

func (*Registry) Register

func (r *Registry) Register(ext string, rasterizer Rasterizer)

Register associates a file extension (e.g. ".pdf") with a Rasterizer.

func (*Registry) Supports

func (r *Registry) Supports(ext string) bool

Supports returns true if a Rasterizer is registered for the given extension.
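The Registry methods above can be approximated with a map keyed by extension. In this sketch the type registry, the helpers norm and forPath, the RWMutex, case-insensitive matching, and closing each distinct rasterizer once with errors.Join are all assumptions; the documentation does not say how the real Registry normalises extensions or handles one rasterizer registered under several of them.

```go
package main

import (
	"errors"
	"fmt"
	"image"
	"io"
	"path/filepath"
	"strings"
	"sync"
)

// Rasterizer mirrors the package's interface so the sketch compiles alone.
type Rasterizer interface {
	io.Closer
	PageCount(path string) (int, error)
	Render(path string, page int) (image.Image, error)
}

// registry is a hypothetical stand-in for Registry.
type registry struct {
	mu sync.RWMutex
	m  map[string]Rasterizer
}

func newRegistry() *registry { return &registry{m: make(map[string]Rasterizer)} }

// norm lower-cases extensions so ".PDF" matches ".pdf" (an assumption).
func norm(ext string) string { return strings.ToLower(ext) }

func (r *registry) Register(ext string, ras Rasterizer) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.m[norm(ext)] = ras
}

// For returns nil, false when nothing is registered for ext.
func (r *registry) For(ext string) (Rasterizer, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ras, ok := r.m[norm(ext)]
	return ras, ok
}

func (r *registry) Supports(ext string) bool {
	_, ok := r.For(ext)
	return ok
}

// forPath dispatches on the extension of a file path.
func (r *registry) forPath(path string) (Rasterizer, bool) {
	return r.For(filepath.Ext(path))
}

// Close closes each distinct rasterizer once (one instance may serve several
// extensions) and joins any errors. Deduplicating assumes comparable
// implementations such as pointers.
func (r *registry) Close() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	seen := make(map[Rasterizer]bool)
	var errs []error
	for _, ras := range r.m {
		if seen[ras] {
			continue
		}
		seen[ras] = true
		if err := ras.Close(); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// stub is a toy Rasterizer that counts Close calls.
type stub struct {
	name   string
	closed int
}

func (s *stub) Close() error                  { s.closed++; return nil }
func (s *stub) PageCount(string) (int, error) { return 1, nil }
func (s *stub) Render(string, int) (image.Image, error) {
	return image.NewGray(image.Rect(0, 0, 1, 1)), nil
}

func main() {
	reg := newRegistry()
	office := &stub{name: "office"}
	reg.Register(".pdf", &stub{name: "pdf"})
	for _, ext := range []string{".docx", ".pptx", ".xlsx"} {
		reg.Register(ext, office)
	}
	ras, ok := reg.forPath("decks/Q3.PPTX")
	fmt.Println(ok, ras.(*stub).name) // true office
	fmt.Println(reg.Supports(".odt")) // false
	err := reg.Close()
	fmt.Println(err, office.closed) // <nil> 1
}
```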

type StandaloneImage added in v1.3.1

type StandaloneImage struct{}

StandaloneImage is a Rasterizer for plain image files. Each file is treated as a single "page".

func NewStandaloneImage added in v1.3.1

func NewStandaloneImage() *StandaloneImage

NewStandaloneImage creates a StandaloneImage rasterizer.

func (*StandaloneImage) Close added in v1.3.1

func (s *StandaloneImage) Close() error

Close is a no-op.

func (*StandaloneImage) PageCount added in v1.3.1

func (s *StandaloneImage) PageCount(_ string) (int, error)

PageCount always returns 1, since a standalone image is a single page.

func (*StandaloneImage) Render added in v1.3.1

func (s *StandaloneImage) Render(path string, page int) (image.Image, error)

Render decodes the image file and returns it.
