crawler

package
v1.6.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 4, 2026 License: MIT Imports: 30 Imported by: 0

Documentation

Constants

This section is empty.

Variables

var ErrElementNotVisible = errors.New("element not visible")
var ErrEmptyPage = errors.New("page is empty")
var ErrNoCrawlingAction = errors.New("no more actions to crawl")
var ErrNoNavigationPossible = errors.New("no navigation possible")

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	// contains filtered or unexported fields
}

func New

func New(opts Options) (*Crawler, error)

func (*Crawler) Close

func (c *Crawler) Close()

func (*Crawler) Crawl

func (c *Crawler) Crawl(URL string) error

func (*Crawler) GetCrawlGraph

func (c *Crawler) GetCrawlGraph() *graph.CrawlGraph

type Options

type Options struct {
	ChromiumPath        string
	MaxBrowsers         int
	MaxDepth            int
	PageMaxTimeout      time.Duration
	NoSandbox           bool
	NoIncognito         bool
	ShowBrowser         bool
	SlowMotion          bool
	MaxCrawlDuration    time.Duration
	MaxFailureCount     int
	Trace               bool
	CookieConsentBypass bool
	AutomaticFormFill   bool
	PageLoadStrategy    string
	ChromeWSUrl         string
	DOMWaitTime         int
	UserDataDir         string

	// EnableDiagnostics enables diagnostics mode, which writes
	// diagnostic information to a directory. The directory can
	// optionally be set with DiagnosticsDir.
	EnableDiagnostics bool
	DiagnosticsDir    string

	Proxy           string
	Logger          *slog.Logger
	ScopeValidator  browser.ScopeValidator
	RequestCallback func(*output.Result)
	ChromeUser      *user.User
	CaptchaHandler  *captcha.Handler
	UserArguments   map[string]string

	AuthUsername  string
	AuthPassword  string
	DitClassifier *dit.Classifier
}

Directories

Path Synopsis
simhash
Package simhash implements SimHash algorithm for near-duplicate detection.
