Documentation
Overview
FILE: pkg/crawler/crawler.go
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Crawler
type Crawler struct {
// contains filtered or unexported fields
}
Crawler orchestrates the web crawling process for a single configured site.
func NewCrawler
func NewCrawler(
	appCfg config.AppConfig,
	siteCfg config.SiteConfig,
	siteKey string,
	baseLogger *logrus.Logger,
	store storage.VisitedStore,
	fetcher *fetch.Fetcher,
	rateLimiter *fetch.RateLimiter,
	crawlCtx context.Context,
	cancelCrawl context.CancelFunc,
	resume bool,
) (*Crawler, error)
NewCrawler creates and initializes a new Crawler instance and its components.
func NewCrawlerWithOptions
func NewCrawlerWithOptions(
	appCfg config.AppConfig,
	siteCfg config.SiteConfig,
	siteKey string,
	baseLogger *logrus.Logger,
	store storage.VisitedStore,
	fetcher *fetch.Fetcher,
	rateLimiter *fetch.RateLimiter,
	crawlCtx context.Context,
	cancelCrawl context.CancelFunc,
	resume bool,
	opts *CrawlerOptions,
) (*Crawler, error)
NewCrawlerWithOptions creates a new Crawler with optional configuration.
func (*Crawler) FoundSitemap
FoundSitemap implements fetch.SitemapDiscoverer for the RobotsHandler callback. It is called by RobotsHandler when a sitemap URL is found in robots.txt.
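The documentation does not show the fetch.SitemapDiscoverer interface itself, so the sketch below assumes the minimal shape that the description implies: a single callback taking the discovered sitemap URL. The robots.txt parsing here is a toy stand-in for the real RobotsHandler, included only to show how the callback is driven.

```go
package main

import (
	"fmt"
	"strings"
)

// SitemapDiscoverer mirrors the shape that fetch.SitemapDiscoverer is assumed
// to have: one callback invoked per sitemap URL found in robots.txt.
type SitemapDiscoverer interface {
	FoundSitemap(url string)
}

// collectDiscoverer is a minimal implementation; the real Crawler would
// enqueue the discovered sitemap for fetching instead of just recording it.
type collectDiscoverer struct {
	urls []string
}

func (c *collectDiscoverer) FoundSitemap(url string) {
	c.urls = append(c.urls, url)
}

// discoverSitemaps is a toy stand-in for the RobotsHandler side of the
// contract: scan a robots.txt body for "Sitemap:" directives and report each.
func discoverSitemaps(robotsTxt string, d SitemapDiscoverer) {
	for _, line := range strings.Split(robotsTxt, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "Sitemap:") {
			d.FoundSitemap(strings.TrimSpace(strings.TrimPrefix(line, "Sitemap:")))
		}
	}
}

func main() {
	robots := "User-agent: *\nDisallow: /private/\nSitemap: https://example.com/sitemap.xml\n"
	d := &collectDiscoverer{}
	discoverSitemaps(robots, d)
	fmt.Println("sitemaps found:", d.urls) // → [https://example.com/sitemap.xml]
}
```

Keeping the discoverer as an interface lets the fetch package report sitemaps without importing the crawler package, which avoids a dependency cycle between the two.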
func (*Crawler) GetProgress
func (c *Crawler) GetProgress() CrawlerProgress
GetProgress returns the current progress of the crawler.
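The CrawlerProgress fields are not shown in this documentation, so the sketch below invents a single hypothetical counter (PagesVisited) purely to illustrate the usual consumption pattern: poll GetProgress from a reporting goroutine on a ticker while the crawl runs.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// CrawlerProgress is a hypothetical stand-in; the real struct's fields are
// not listed in the documentation.
type CrawlerProgress struct {
	PagesVisited int64
}

// progressSource mimics the part of the Crawler that GetProgress reads from.
// An atomic counter keeps the read safe while worker goroutines update it.
type progressSource struct {
	visited atomic.Int64
}

func (p *progressSource) GetProgress() CrawlerProgress {
	return CrawlerProgress{PagesVisited: p.visited.Load()}
}

func main() {
	src := &progressSource{}
	done := make(chan struct{})

	// Simulated crawl: visit 5 "pages", then finish.
	go func() {
		for i := 0; i < 5; i++ {
			src.visited.Add(1)
			time.Sleep(10 * time.Millisecond)
		}
		close(done)
	}()

	// Reporting loop: poll GetProgress on a ticker until the crawl ends.
	ticker := time.NewTicker(20 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			fmt.Println("final pages visited:", src.GetProgress().PagesVisited) // → 5
			return
		case <-ticker.C:
			fmt.Println("pages visited so far:", src.GetProgress().PagesVisited)
		}
	}
}
```

Because GetProgress returns a value rather than a pointer, each call hands the caller a consistent snapshot that cannot race with later updates.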
type CrawlerOptions
type CrawlerOptions struct {
// If nil, the crawler creates its own semaphore based on appCfg.MaxRequests
SharedSemaphore *semaphore.Weighted
}
CrawlerOptions contains optional parameters for NewCrawler.
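SharedSemaphore lets several per-site crawlers draw from one global concurrency budget instead of each enforcing its own. The real field is a *semaphore.Weighted from golang.org/x/sync; the stdlib-only sketch below shows the same pattern with a buffered channel standing in for the semaphore, with made-up site names and page counts.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedLimiter caps in-flight requests across all sites, standing in for the
// *semaphore.Weighted that CrawlerOptions.SharedSemaphore actually holds.
type sharedLimiter chan struct{}

func (l sharedLimiter) acquire() { l <- struct{}{} }
func (l sharedLimiter) release() { <-l }

// crawlSite simulates one per-site crawler that borrows slots from the shared
// limiter instead of creating its own, as a nil SharedSemaphore would cause.
func crawlSite(site string, pages int, lim sharedLimiter, wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < pages; i++ {
		lim.acquire()
		// ... fetch one page here ...
		lim.release()
	}
	fmt.Println(site, "done")
}

func main() {
	// One limiter shared by every crawler; at most 4 requests in flight
	// globally, no matter how many sites are being crawled at once.
	lim := make(sharedLimiter, 4)

	var wg sync.WaitGroup
	for _, site := range []string{"site-a", "site-b", "site-c"} {
		wg.Add(1)
		go crawlSite(site, 10, lim, &wg)
	}
	wg.Wait()
	fmt.Println("all crawls finished")
}
```

The nil default described in the struct comment is the per-site behavior: each crawler builds its own limiter sized by appCfg.MaxRequests, so one slow site cannot starve the others, at the cost of a higher total request ceiling.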