Documentation
Overview
Package crawler provides the core crawler engine for request scheduling and page downloading.
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Crawler
type Crawler interface {
	Init(*spider.Spider) Crawler // Init initializes the crawler engine
	Run()                        // Run executes the task
	Stop()                       // Stop terminates the crawler
	CanStop() bool               // CanStop reports whether the crawler can be stopped
	GetID() int                  // GetID returns the engine ID
}
Crawler is the core crawler engine.
func New
func New(id int, dl downloader.Downloader, outType string, batchCap int) Crawler
New creates a new Crawler with the given ID, Downloader, and pipeline config.
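Since the downloader and spider packages aren't reproduced here, the lifecycle implied by the interface (New → Init → Run → Stop) can be sketched with a toy stand-in; the simplified New signature, the Spider struct, and the engine type below are assumptions for illustration only.

```go
package main

import "fmt"

// Spider is a toy stand-in for *spider.Spider, which carries the crawl rules.
type Spider struct{ Name string }

// Crawler mirrors the documented interface.
type Crawler interface {
	Init(*Spider) Crawler
	Run()
	Stop()
	CanStop() bool
	GetID() int
}

// engine is a minimal implementation used only to show the call order.
type engine struct {
	id      int
	sp      *Spider
	stopped bool
}

// New mirrors the documented constructor; the downloader and pipeline
// arguments are omitted in this sketch.
func New(id int) Crawler { return &engine{id: id} }

func (e *engine) Init(sp *Spider) Crawler { e.sp = sp; e.stopped = false; return e }
func (e *engine) Run()                    { fmt.Printf("engine %d crawling %q\n", e.id, e.sp.Name) }
func (e *engine) Stop()                   { e.stopped = true }
func (e *engine) CanStop() bool           { return !e.stopped }
func (e *engine) GetID() int              { return e.id }

func main() {
	// Init returns the Crawler, so construction and setup chain naturally.
	c := New(1).Init(&Spider{Name: "news"})
	c.Run()
	c.Stop()
}
```

Init returning the Crawler itself is what allows the chained `New(...).Init(...)` style above.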
type CrawlerPool
type CrawlerPool interface {
	Reset(spiderNum int) int                        // Reset resizes the pool for spiderNum spiders and returns the usable capacity
	SetPipelineConfig(outType string, batchCap int) // SetPipelineConfig sets the pipeline output type and batch capacity
	Use() Crawler                                   // Use takes an idle crawler from the pool
	UseOpt() option.Option[Crawler]                 // UseOpt is the Option-returning variant of Use
	Free(Crawler)                                   // Free returns a crawler to the pool
	Stop()                                          // Stop terminates all crawlers in the pool
}
CrawlerPool manages a pool of crawler engines.
func NewCrawlerPool
func NewCrawlerPool(dl downloader.Downloader) CrawlerPool
NewCrawlerPool creates a new crawler pool with the given Downloader.
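The Use/Free pairing suggests a checkout/check-in pool. A minimal sketch of that pattern, assuming a channel-backed pool (the real implementation may differ):

```go
package main

import "fmt"

// Crawler is a stand-in for the package's Crawler interface.
type Crawler interface {
	GetID() int
}

type engine struct{ id int }

func (e *engine) GetID() int { return e.id }

// pool is a toy channel-backed crawler pool: Use blocks until an
// engine is free, Free returns it for reuse.
type pool struct{ ch chan Crawler }

func newPool(size int) *pool {
	p := &pool{ch: make(chan Crawler, size)}
	for i := 0; i < size; i++ {
		p.ch <- &engine{id: i}
	}
	return p
}

func (p *pool) Use() Crawler   { return <-p.ch }
func (p *pool) Free(c Crawler) { p.ch <- c }

func main() {
	p := newPool(2)
	c := p.Use() // check an engine out
	fmt.Println("got engine", c.GetID())
	p.Free(c) // check it back in for reuse
}
```

A caller that forgets Free leaks an engine slot, which is why the real API pairs the two explicitly.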
type SpiderQueue
type SpiderQueue interface {
	Reset()                                            // Reset clears the queue
	Add(*spider.Spider)                                // Add appends a spider to the queue
	AddAll([]*spider.Spider)                           // AddAll appends all given spiders to the queue
	AddKeyins(string)                                  // AddKeyins assigns Keyin to queue members that have not been assigned yet
	GetByIndex(int) *spider.Spider                     // GetByIndex returns the spider at the given index
	GetByIndexOpt(int) option.Option[*spider.Spider]   // GetByIndexOpt is the Option-returning variant of GetByIndex
	GetByName(string) *spider.Spider                   // GetByName returns the spider with the given name
	GetByNameOpt(string) option.Option[*spider.Spider] // GetByNameOpt is the Option-returning variant of GetByName
	GetAll() []*spider.Spider                          // GetAll returns all spiders in the queue
	Len() int                                          // Len returns the queue length
}
SpiderQueue holds the spider rule queue for the crawler engine.
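The AddKeyins behavior (assign only to members without a Keyin) can be sketched with a toy slice-backed queue; the Spider struct fields and the slice implementation below are assumptions, not the package's actual code.

```go
package main

import "fmt"

// Spider is a stand-in for *spider.Spider with just the fields
// needed to show the queue semantics.
type Spider struct {
	Name  string
	Keyin string
}

// queue is a toy slice-backed SpiderQueue.
type queue struct{ list []*Spider }

func (q *queue) Reset()               { q.list = q.list[:0] }
func (q *queue) Add(sp *Spider)       { q.list = append(q.list, sp) }
func (q *queue) AddAll(sps []*Spider) { q.list = append(q.list, sps...) }

// AddKeyins assigns keyin only to members that have none yet,
// leaving already-assigned spiders untouched.
func (q *queue) AddKeyins(keyin string) {
	for _, sp := range q.list {
		if sp.Keyin == "" {
			sp.Keyin = keyin
		}
	}
}

func (q *queue) GetByName(name string) *Spider {
	for _, sp := range q.list {
		if sp.Name == name {
			return sp
		}
	}
	return nil
}

func (q *queue) Len() int { return len(q.list) }

func main() {
	q := &queue{}
	q.AddAll([]*Spider{{Name: "a"}, {Name: "b", Keyin: "preset"}})
	q.AddKeyins("golang")
	fmt.Println(q.GetByName("a").Keyin) // newly assigned
	fmt.Println(q.GetByName("b").Keyin) // left as "preset"
}
```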