Documentation
Overview
Package hybrid implements the functionality for a hybrid-headless crawler. It uses both a headless browser and net/http for making requests, and goquery for processing raw and DOM-rendered web page HTML.
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
func FetchContinueRequest
func FetchContinueRequest(page *rod.Page, e *proto.FetchRequestPaused) error
FetchContinueRequest continues a paused fetch request.
func FetchGetResponseBody
FetchGetResponseBody gets the response body of a paused fetch request.
Types
type Crawler
type Crawler struct {
// contains filtered or unexported fields
}
Crawler is a standard crawler instance
func New
func New(options *types.CrawlerOptions) (*Crawler, error)
New returns a new standard crawler instance
func (*Crawler) Do added in v1.4.0
func (c *Crawler) Do(crawlSession *common.CrawlSession, doRequest common.DoRequestFunc) error
Do executes the crawling loop with browser-safe concurrency. Unlike the base implementation, this uses sequential processing (concurrency=1) because Chrome DevTools Protocol operations cannot safely run concurrently on the same browser instance. Multiple concurrent page operations cause race conditions, navigation conflicts, and network interception issues.
type Hijack
type Hijack struct {
// contains filtered or unexported fields
}
Hijack is a hijack handler
func (*Hijack) SetPattern
func (h *Hijack) SetPattern(pattern *proto.FetchRequestPattern)
SetPattern sets the fetch request pattern directly.
type HijackHandler
type HijackHandler = func(e *proto.FetchRequestPaused) error
HijackHandler is the handler function type invoked for each paused fetch request.