clip

v0.0.0-...-dc79f52
Warning

This package is not in the latest version of its module.

Published: Nov 23, 2020 License: MIT Imports: 14 Imported by: 0

README

Clip

A tool for clipping the content of web pages into PDF files.

Overview

Clip can be used as a CLI tool, a REST service, an AWS Lambda function, or a Go library.

Prerequisites

  • A working Go installation (version 1.15 or higher).
  • wkhtmltopdf must be in your PATH. Alternatively, set WKHTMLTOPDF_PATH to the directory containing wkhtmltopdf, or place the executable in clip's directory (see the go-wkhtmltopdf reference).

Installation

CLI:

go install github.com/dinalt/clip/cmd/clip

REST service:

go install github.com/dinalt/clip/cmd/clip-serve

Usage examples

CLI

Run clip -h to list the available formatting arguments.

Clip an article from habr.com using presets:

clip -presets-path ./presets.json -p auto,margins:a4 https://habr.com/en/post/510746/ habr.pdf

Clip the index article from restfulapi.net:

clip -query .content -remove .comment-respond -custom-styles '.content{width:auto}' https://restfulapi.net/ rest.pdf

REST service

Run clip-serve -h to list the REST service's launch arguments.

Launch the service:

clip-serve -a :8080 -p ./presets.json -w 5

Clip an article from habr.com using presets:

curl 'http://localhost:8080/v0/clip?presets=auto,margins:a4&url=https://habr.com/ru/post/263897/' --output habr.pdf

POST requests are also accepted, with parameters passed either as form fields or as a JSON object (with Content-Type: application/json set).

Presets

Presets are shortcuts for commonly used sets of parameters. Sample definitions can be found in presets.json in the root of this repository.

To use a custom preset (or a list of presets), pass the -p flag (CLI) or the presets parameter (REST query) with comma-separated preset names, for example presets=habr:post,margins:a4.

Presets are applied in the order given, and a preset never overwrites a setting that is already set. Settings specified explicitly as query or CLI parameters take precedence over all presets.
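The precedence rule can be illustrated with a self-contained sketch. It uses a cut-down, hypothetical miniParams type with two pointer fields (mirroring the pointer-typed fields of Params) and an addFrom helper that follows the documented behavior of Params.AddFrom: copy a value only when the target field is still nil. This illustrates the rule; it is not the package's actual code.

```go
package main

import "fmt"

// miniParams is a hypothetical two-field stand-in for clip.Params.
// As in Params, a nil pointer means "not set".
type miniParams struct {
	MarginTop *uint
	PageSize  *string
}

// addFrom copies values from o only into fields of p that are still nil,
// following the documented behavior of Params.AddFrom.
func (p *miniParams) addFrom(o *miniParams) {
	if p.MarginTop == nil {
		p.MarginTop = o.MarginTop
	}
	if p.PageSize == nil {
		p.PageSize = o.PageSize
	}
}

// resolve starts from the explicit (CLI or query) settings and folds in the
// presets in order: explicit values win, and an earlier preset beats a later one.
func resolve(explicit miniParams, presets ...miniParams) miniParams {
	result := explicit
	for i := range presets {
		result.addFrom(&presets[i])
	}
	return result
}

func uintPtr(v uint) *uint { return &v }

func strPtr(v string) *string { return &v }

func main() {
	explicit := miniParams{MarginTop: uintPtr(5)}
	first := miniParams{MarginTop: uintPtr(20), PageSize: strPtr("A4")}
	second := miniParams{PageSize: strPtr("Letter")}

	r := resolve(explicit, first, second)
	fmt.Println(*r.MarginTop, *r.PageSize) // 5 A4
}
```

A fill-only-if-unset merge is what makes the order matter: the first source that sets a field owns it, so explicit parameters must be the starting point rather than the last step.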

Auto

auto is a special preset that tells clip (the CLI, REST service, or Lambda function) to infer the preset from the site's URL, using the url_regexp field of the preset JSON object (see the examples in presets.json).
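A rough sketch of how such inference could work: compile each preset's url_regexp and pick the first preset whose pattern matches the URL. The preset type, the first-match rule, and the example pattern for habr:post are all assumptions for illustration; presets.json and the package source define the real format and matching order.

```go
package main

import (
	"fmt"
	"regexp"
)

// preset is a hypothetical in-memory form of a presets.json entry.
type preset struct {
	Name      string
	URLRegexp string // corresponds to the url_regexp JSON field
}

// inferPreset returns the name of the first preset whose url_regexp matches pageURL.
// ASSUMPTION: first match wins; presets without a pattern are skipped.
func inferPreset(presets []preset, pageURL string) (string, bool, error) {
	for _, p := range presets {
		if p.URLRegexp == "" {
			continue
		}
		re, err := regexp.Compile(p.URLRegexp)
		if err != nil {
			return "", false, fmt.Errorf("preset %q: %w", p.Name, err)
		}
		if re.MatchString(pageURL) {
			return p.Name, true, nil
		}
	}
	return "", false, nil
}

func main() {
	presets := []preset{
		{Name: "margins:a4"},
		{Name: "habr:post", URLRegexp: `^https://habr\.com/.+/post/\d+/?$`},
	}
	name, ok, err := inferPreset(presets, "https://habr.com/en/post/510746/")
	fmt.Println(name, ok, err) // habr:post true <nil>
}
```

Compiling the patterns on every call keeps the sketch short; a real implementation would more likely compile them once when presets.json is loaded.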

Build for AWS Lambda

git clone https://github.com/dinAlt/clip
cd clip
CGO_ENABLED=0 go build -o clip-lambda cmd/clip-lambda/main.go

More info

  • All sizes (margins, page width and height) are integer values (in millimeters).
  • The query, remove, no_break_before, no_break_inside, and no_break_after values must be valid CSS selectors.
  • JavaScript is disabled for page rendering by default; you can enable it by setting the enable_javascript parameter to true.
  • Use the custom_styles parameter to adjust the appearance of the resulting PDF document.
  • The query and remove parameters don't work for progressive web apps (PWAs), because they modify the DOM before JavaScript is executed. In that case, try custom_styles instead.

Supported OS

  • Tested on Linux.
  • Should also work on macOS (untested).
  • May work on Windows if wkhtmltopdf is in PATH; otherwise it is unlikely to work because the executable name is hardcoded without the .exe extension.

Contribution

Pull requests and issues (especially for adding new presets) are welcome!

Documentation

Constants

This section is empty.

Variables

var (
	PrintArgs           bool   // print wkhtmltopdf args
	SaveProcessedHTMLTo string // save processed HTML to directory (used if not empty)
)
var (
	ErrBadStatus       = errors.New("bad status")
	ErrNoQueryResult   = errors.New("no result")
	ErrNoURL           = errors.New("url is required")
	ErrBadURLScheme    = errors.New("bad URL scheme")
	ErrInvalidPageSize = errors.New("invalid page size")
)

Package errors.

Functions

func ToPDF

func ToPDF(url string, w io.Writer, p *Params) error

func ToPDFCtx

func ToPDFCtx(ctx context.Context, url string, w io.Writer, p *Params) error

ToPDFCtx downloads the page at url, converts it to PDF via wkhtmltopdf, and writes the result to w.

Types

type IgnoredError

type IgnoredError struct {
	// contains filtered or unexported fields
}

IgnoredError is returned only for logging purposes.

func (*IgnoredError) Error

func (e *IgnoredError) Error() string

Error implements the error interface.

type Params

type Params struct {
	Query             *string `json:"query,omitempty" desc:"elements to include in result document"`                                  // CSS selector of elements to include in the resulting PDF document
	Remove            *string `json:"remove,omitempty" desc:"elements to remove from result document"`                                // css selector of elements to be removed
	NoBreakBefore     *string `json:"no_break_before,omitempty" desc:"elements to disable break page before"`                         // css selector for elements to set break-before:avoid-page
	NoBreakInside     *string `json:"no_break_inside,omitempty" desc:"elements to disable break page inside"`                         // css selector for elements to set break-inside:avoid-page
	NoBreakAfter      *string `json:"no_break_after,omitempty" desc:"elements to disable break page after"`                           // css selector for elements to set break-after:avoid-page
	CustomStyles      *string `json:"custom_styles,omitempty" desc:"custom css stylesheet (will be included in <head>)"`              // custom css styles to be injected into doc
	WithContainers    *bool   `json:"with_containers,omitempty" desc:"preserve doc containers structure (useful when -query is set)"` // preserve all containers from the document body down to the query result
	ForceImageLoading *bool   `json:"force_image_loading,omitempty" desc:"replace img[src] attribute value by value of data-src"`     // replace img[src] with the img[data-src] content
	// global options
	Grayscale    *bool   `json:"grayscale,omitempty"`
	MarginBottom *uint   `json:"margin_bottom,omitempty"`
	MarginLeft   *uint   `json:"margin_left,omitempty"`
	MarginRight  *uint   `json:"margin_right,omitempty"`
	MarginTop    *uint   `json:"margin_top,omitempty"`
	Orientation  *string `json:"orientation,omitempty"`
	PageHeight   *uint   `json:"page_height,omitempty"`
	PageWidth    *uint   `json:"page_width,omitempty"`
	PageSize     *string `json:"page_size,omitempty"`
	Title        *string `json:"title,omitempty"`
	// page options
	DisableExternalLinks *bool    `json:"disable_external_links,omitempty"`
	DisableInternalLinks *bool    `json:"disable_internal_links,omitempty"`
	EnableJavascript     *bool    `json:"enable_javascript,omitempty"`
	NoBackground         *bool    `json:"no_background,omitempty"`
	NoImages             *bool    `json:"no_images,omitempty"`
	PageOffset           *uint    `json:"page_offset,omitempty"`
	Zoom                 *float64 `json:"zoom,omitempty"`
	ViewportSize         *string  `json:"viewport_size,omitempty"`
}

Params are used to tweak ToPDF output.

func (*Params) AddFrom

func (p *Params) AddFrom(o *Params)

AddFrom copies values from o into the fields that are unset in p. It does not overwrite existing values.

func (*Params) String

func (p *Params) String() string

String pretty-prints the struct's values.

type URLError

type URLError struct {
	// contains filtered or unexported fields
}

URLError wraps an error returned by url.Parse.

func (*URLError) Error

func (e *URLError) Error() string

Error implements the error interface.

type ValidationError

type ValidationError struct {
	Message string
}

func (*ValidationError) Error

func (e *ValidationError) Error() string

Error implements the error interface.

Directories

Path               Synopsis
cmd/clip           (command)
cmd/clip-lambda    (command)
cmd/clip-serve     (command)
