scrape

package
v0.0.0-...-fb615b9 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 20, 2026 License: MIT Imports: 18 Imported by: 0

Documentation

Overview

Package scrape provides tools to scrape web page content using Playwright and convert the results to Markdown.

Constants

This section is empty.

Variables

var DefaultOptions = Options{
	Timeout:           15 * time.Second,
	MaxAge:            8 * time.Hour,
	MaxSpeed:          time.Second,
	WaitUntil:         *playwright.WaitUntilStateLoad,
	Locale:            "en-GB",
	Timezone:          "Europe/London",
	IgnoreHttpsErrors: true,
}

Default options, used unless overridden in New or Scrape calls.

Functions

This section is empty.

Types

type Browser

type Browser struct {
	// contains filtered or unexported fields
}

Browser instance with default options and cache.

func New

func New(options ...Option) (*Browser, error)

Launch a new Firefox browser. Options can be given to override those in DefaultOptions.

func (*Browser) LastURL

func (b *Browser) LastURL() string

Most recently scraped URL

func (*Browser) Line

func (b *Browser) Line() int

Line at which the document is currently positioned.

func (*Browser) Scrape

func (b *Browser) Scrape(ctx context.Context, uri string, options ...Option) (r Response, err error)

Scrape HTML content from the given URL and convert it to Markdown. Options can be given to override those passed to New.

func (*Browser) SetLine

func (b *Browser) SetLine(line int)

Update the current line position, as reported by the browser tool.

func (*Browser) Shutdown

func (b *Browser) Shutdown()

Close the browser and stop Playwright.

type Option

type Option func(*Options)

func WithIgnoreHttpsErrors

func WithIgnoreHttpsErrors(on bool) Option

func WithLocale

func WithLocale(loc string) Option

func WithMaxAge

func WithMaxAge(d time.Duration) Option

func WithMaxSpeed

func WithMaxSpeed(d time.Duration) Option

func WithReferer

func WithReferer(ref string) Option

func WithTimeout

func WithTimeout(d time.Duration) Option

func WithTimezone

func WithTimezone(zone string) Option

func WithWaitFor

func WithWaitFor(d time.Duration) Option

func WithWaitUntil

func WithWaitUntil(s *playwright.WaitUntilState) Option

type Options

type Options struct {
	Timeout           time.Duration // Timeout for each goto request
	MaxAge            time.Duration // Use cached response if its age is less than this
	MaxSpeed          time.Duration // Minimum delay between requests from same host
	WaitFor           time.Duration // Wait after load has completed
	WaitUntil         playwright.WaitUntilState
	Referer           string
	Locale            string
	Timezone          string
	IgnoreHttpsErrors bool
}

Options for the Playwright browser.

type Response

type Response struct {
	URL        string
	Title      string
	Markdown   string
	RawHTML    string
	MainHTML   string
	Status     int
	StatusText string
	Timestamp  time.Time
}

Response data for a scraped page.
