crawler

package
v1.5.0
Warning

This package is not in the latest version of its module.

Published: Mar 10, 2026 License: MIT Imports: 29 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var ErrElementNotVisible = errors.New("element not visible")
var ErrEmptyPage = errors.New("page is empty")
var ErrNoCrawlingAction = errors.New("no more actions to crawl")
var ErrNoNavigationPossible = errors.New("no navigation possible")

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

func New

func New(opts Options) (*Crawler, error)

func (*Crawler) Close

func (c *Crawler) Close()

func (*Crawler) Crawl

func (c *Crawler) Crawl(URL string) error

func (*Crawler) GetCrawlGraph

func (c *Crawler) GetCrawlGraph() *graph.CrawlGraph
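
The method set above suggests a create, crawl, close lifecycle. The following is a minimal sketch of that pattern, not the package's own example: pageCrawler is a hypothetical interface that matches the documented Crawl and Close signatures, and fakeCrawler is a stand-in so the code runs without a browser. The real value returned by New would be used in place of the fake.

```go
package main

import (
	"errors"
	"fmt"
)

// pageCrawler is a hypothetical interface covering two documented methods.
// A *Crawler would satisfy it given the signatures listed on this page.
type pageCrawler interface {
	Crawl(URL string) error
	Close()
}

// fakeCrawler is a test double; it records URLs instead of driving a browser.
type fakeCrawler struct {
	visited []string
	closed  bool
}

func (f *fakeCrawler) Crawl(URL string) error {
	if URL == "" {
		return errors.New("empty URL")
	}
	f.visited = append(f.visited, URL)
	return nil
}

func (f *fakeCrawler) Close() { f.closed = true }

// run crawls one URL and always releases the crawler afterwards,
// including when Crawl returns an error.
func run(c pageCrawler, url string) error {
	defer c.Close()
	if err := c.Crawl(url); err != nil {
		return fmt.Errorf("crawl %q: %w", url, err)
	}
	return nil
}

func main() {
	c := &fakeCrawler{}
	if err := run(c, "https://example.com"); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(c.visited, c.closed)
}
```

Deferring Close immediately after obtaining the crawler keeps cleanup on every return path, which matters when each crawler may own browser processes.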

type Options

type Options struct {
	ChromiumPath        string
	MaxBrowsers         int
	MaxDepth            int
	PageMaxTimeout      time.Duration
	NoSandbox           bool
	ShowBrowser         bool
	SlowMotion          bool
	MaxCrawlDuration    time.Duration
	MaxFailureCount     int
	Trace               bool
	CookieConsentBypass bool
	AutomaticFormFill   bool

	// EnableDiagnostics enables diagnostics mode, which writes
	// diagnostic information to a directory. The directory can
	// optionally be set with DiagnosticsDir.
	EnableDiagnostics bool
	DiagnosticsDir    string

	Proxy           string
	Logger          *slog.Logger
	ScopeValidator  browser.ScopeValidator
	RequestCallback func(*output.Result)
	ChromeUser      *user.User
	CaptchaHandler  *captcha.Handler
}

Directories

Path Synopsis
simhash
Package simhash implements SimHash algorithm for near-duplicate detection.
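
To illustrate the general idea behind SimHash, here is a small self-contained sketch. It is not the simhash subpackage's API: fingerprint and distance are hypothetical names, and hashing whitespace-separated tokens with FNV-1a is an assumption chosen for brevity. Each token votes on every bit of a 64-bit fingerprint, and two inputs count as near-duplicates when the Hamming distance between their fingerprints is small.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
)

// fingerprint computes a 64-bit SimHash over lowercased,
// whitespace-separated tokens (hypothetical helper).
func fingerprint(text string) uint64 {
	var weights [64]int
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New64a()
		h.Write([]byte(tok))
		sum := h.Sum64()
		// Each token votes +1 or -1 on every bit position.
		for i := 0; i < 64; i++ {
			if sum&(uint64(1)<<uint(i)) != 0 {
				weights[i]++
			} else {
				weights[i]--
			}
		}
	}
	// A bit is set in the fingerprint when its votes sum to a positive value.
	var fp uint64
	for i, w := range weights {
		if w > 0 {
			fp |= uint64(1) << uint(i)
		}
	}
	return fp
}

// distance returns the Hamming distance between two fingerprints.
func distance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	a := fingerprint("the quick brown fox jumps over the lazy dog")
	b := fingerprint("the quick brown fox jumped over the lazy dog")
	c := fingerprint("completely unrelated text about database indexes")
	fmt.Println("similar pair distance:", distance(a, b))
	fmt.Println("unrelated pair distance:", distance(a, c))
}
```

In a crawler, fingerprints like these could let pages that differ only in minor details be treated as already seen; the threshold that counts as "near" is a tuning choice.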
