extract

command

v0.0.1 Latest Latest Go to latest Published: Nov 21, 2025 License: MIT Imports: 9 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/wudi/pdfkit

Links

Open Source Insights

README ¶

Extract Command

cmd/extract is a utility that exercises the github.com/wudi/pdfkit/extractor helpers to pull common artifacts out of PDFs without writing custom code. It runs on top of the existing parser/decoder pipeline, so it benefits from any filter/security updates automatically.

Usage

go run ./cmd/extract [flags] <pdf>

Flags:

-text — emit page text extracted from Tj/TJ operators.
-images — decode image XObjects and write them under -out/images.
-annotations — list per-page annotations, URIs, and colors.
-metadata — dump Info dictionary fields, tagging flags, and XMP bytes (base64 encoded within JSON).
-bookmarks — report the outline tree.
-toc — flatten the table of contents (outline + page labels).
-fonts — list unique fonts plus the pages they appear on.
-attachments — export embedded files into -out/attachments.
-out — destination directory for binary artifacts (defaults to extract_output).
-password — password to open encrypted PDFs.

If no feature flags are provided, the tool runs all extractors.

Examples

Extract everything from testdata/basic.pdf:

go run ./cmd/extract testdata/basic.pdf

Extract only annotations and attachments into a custom directory:

go run ./cmd/extract -annotations -attachments -out tmp/out testdata/basic.pdf

The command prints JSON summaries for each feature and writes binary payloads (images/attachments) to disk for further inspection.

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

main.go

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL