crawler

package
v1.4.0 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 3, 2026 | License: Apache-2.0 | Imports: 16 | Imported by: 0

Documentation

Overview

Package crawler provides the core crawler engine for request scheduling and page downloading.


Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler interface {
	Init(*spider.Spider) Crawler // Init initializes the crawler engine
	Run()                        // Run executes the task
	Stop()                       // Stop terminates the crawler
	CanStop() bool               // CanStop reports whether the crawler can be stopped
	GetID() int                  // GetID returns the engine ID
}

Crawler is the core crawler engine.
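
The method set suggests a lifecycle of Init, then Run, then Stop guarded by CanStop. The following is a minimal sketch of that call order, not the package's engine. It uses a hypothetical local Spider type in place of spider.Spider and a toy implementation named toyCrawler. The meaning given to CanStop here (true while the engine is running) is an assumption.

```go
package main

import "fmt"

// Spider is a hypothetical stand-in for spider.Spider, whose definition
// is not shown in this document.
type Spider struct{ Name string }

// Crawler mirrors the documented interface, with the stand-in Spider type.
type Crawler interface {
	Init(*Spider) Crawler
	Run()
	Stop()
	CanStop() bool
	GetID() int
}

// toyCrawler is an illustrative implementation that only records calls.
type toyCrawler struct {
	id      int
	sp      *Spider
	running bool
	log     []string
}

// Init stores the spider and returns the receiver so calls can be chained.
func (c *toyCrawler) Init(sp *Spider) Crawler {
	c.sp = sp
	c.log = append(c.log, "init:"+sp.Name)
	return c
}

// Run marks the engine as running; a real engine would schedule requests here.
func (c *toyCrawler) Run() {
	c.running = true
	c.log = append(c.log, "run")
}

// Stop marks the engine as stopped.
func (c *toyCrawler) Stop() {
	c.running = false
	c.log = append(c.log, "stop")
}

// CanStop is assumed (for this sketch only) to be true while running.
func (c *toyCrawler) CanStop() bool { return c.running }

// GetID returns the engine ID.
func (c *toyCrawler) GetID() int { return c.id }

// lifecycle drives one engine through Init, Run, and a guarded Stop,
// and returns the recorded call log.
func lifecycle() []string {
	c := &toyCrawler{id: 1}
	var eng Crawler = c
	eng.Init(&Spider{Name: "demo"}).Run()
	if eng.CanStop() {
		eng.Stop()
	}
	return c.log
}

func main() {
	fmt.Println(lifecycle()) // prints [init:demo run stop]
}
```

Because Init returns a Crawler, the chained form `Init(sp).Run()` works against the interface without a type assertion.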

func New

func New(id int, dl downloader.Downloader, outType string, batchCap int) Crawler

New creates a new Crawler with the given ID, Downloader, and pipeline config.

type CrawlerPool

type CrawlerPool interface {
	Reset(spiderNum int) int
	SetPipelineConfig(outType string, batchCap int)
	Use() Crawler
	UseOpt() option.Option[Crawler]
	Free(Crawler)
	Stop()
}

CrawlerPool manages a pool of crawler engines.
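
Use and Free read as a borrow-and-return pair. The following is a minimal sketch of how a caller might pair them so that an engine always goes back to the pool. The toyPool type, the withCrawler helper, and the choice to return nil from Use when no engine is idle are all assumptions made for illustration. The real pool may block or behave differently.

```go
package main

import "fmt"

// Crawler is reduced to the one method this sketch needs.
type Crawler interface{ GetID() int }

// engine is a trivial Crawler used only for this example.
type engine struct{ id int }

func (e *engine) GetID() int { return e.id }

// toyPool is an illustrative, non-thread-safe pool of idle engines.
// Assumption: Use returns nil when no engine is idle.
type toyPool struct{ idle []Crawler }

// newToyPool creates a pool holding n idle engines with IDs 0..n-1.
func newToyPool(n int) *toyPool {
	p := &toyPool{}
	for i := 0; i < n; i++ {
		p.idle = append(p.idle, &engine{id: i})
	}
	return p
}

// Use takes an idle engine out of the pool, or returns nil if none is left.
func (p *toyPool) Use() Crawler {
	if len(p.idle) == 0 {
		return nil
	}
	c := p.idle[len(p.idle)-1]
	p.idle = p.idle[:len(p.idle)-1]
	return c
}

// Free puts an engine back into the pool.
func (p *toyPool) Free(c Crawler) { p.idle = append(p.idle, c) }

// withCrawler borrows an engine, runs fn with it, and returns the engine
// to the pool even if fn panics. It reports whether an engine was available.
func withCrawler(p *toyPool, fn func(Crawler)) bool {
	c := p.Use()
	if c == nil {
		return false
	}
	defer p.Free(c)
	fn(c)
	return true
}

func main() {
	p := newToyPool(1)
	ok := withCrawler(p, func(c Crawler) {
		fmt.Println("borrowed engine", c.GetID())
	})
	fmt.Println(ok, len(p.idle))
}
```

Wrapping the pair in a helper with `defer` keeps every Use matched by a Free, which is the main thing a caller of such a pool has to get right.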

func NewCrawlerPool

func NewCrawlerPool(dl downloader.Downloader) CrawlerPool

NewCrawlerPool creates a new crawler pool with the given Downloader.

type SpiderQueue

type SpiderQueue interface {
	Reset() // Reset clears the queue
	Add(*spider.Spider)
	AddAll([]*spider.Spider)
	AddKeyins(string) // AddKeyins assigns Keyin to queue members that have not been assigned yet
	GetByIndex(int) *spider.Spider
	GetByIndexOpt(int) option.Option[*spider.Spider]
	GetByName(string) *spider.Spider
	GetByNameOpt(string) option.Option[*spider.Spider]
	GetAll() []*spider.Spider
	Len() int // Len returns the queue length
}

SpiderQueue holds the spider rule queue for the crawler engine.
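
The lookup methods come in pairs: one returns a *spider.Spider, the other an option.Option[*spider.Spider]. One plausible reading is that the Opt variants report a missing spider through the option value instead of a nil pointer. The following is a minimal sketch under that assumption. Spider, Option, Some, None, Get, and toyQueue are hypothetical local stand-ins. The real option package's API is not shown in this document.

```go
package main

import "fmt"

// Spider is a hypothetical stand-in for spider.Spider.
type Spider struct{ Name string }

// Option is a hypothetical minimal stand-in for option.Option[T].
type Option[T any] struct {
	val T
	ok  bool
}

// Some wraps a present value.
func Some[T any](v T) Option[T] { return Option[T]{val: v, ok: true} }

// None represents an absent value.
func None[T any]() Option[T] { return Option[T]{} }

// Get returns the value and whether it is present.
func (o Option[T]) Get() (T, bool) { return o.val, o.ok }

// toyQueue is an illustrative slice-backed queue, not the package's type.
type toyQueue struct{ list []*Spider }

// Add appends a spider to the queue.
func (q *toyQueue) Add(sp *Spider) { q.list = append(q.list, sp) }

// Len returns the queue length.
func (q *toyQueue) Len() int { return len(q.list) }

// GetByIndexOpt returns None for an out-of-range index instead of panicking.
func (q *toyQueue) GetByIndexOpt(i int) Option[*Spider] {
	if i < 0 || i >= len(q.list) {
		return None[*Spider]()
	}
	return Some(q.list[i])
}

// GetByNameOpt returns the first spider with the given name, if any.
func (q *toyQueue) GetByNameOpt(name string) Option[*Spider] {
	for _, sp := range q.list {
		if sp.Name == name {
			return Some(sp)
		}
	}
	return None[*Spider]()
}

func main() {
	q := &toyQueue{}
	q.Add(&Spider{Name: "news"})
	if sp, ok := q.GetByNameOpt("news").Get(); ok {
		fmt.Println("found", sp.Name)
	}
	_, ok := q.GetByIndexOpt(5).Get()
	fmt.Println("index 5 present:", ok)
}
```

With an option-style return, the caller has to handle the absent case at the call site, instead of risking a nil dereference later.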

func NewSpiderQueue

func NewSpiderQueue() SpiderQueue

NewSpiderQueue creates a new spider queue.
