Documentation
Overview
Package ratelimit provides rate limiting for the crawler: a core Limiter with per-domain rates and delays, an AdaptiveRateLimiter that tunes its rate from request outcomes, and robots.txt support via RobotsManager and RobotsRules.
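For orientation, here is a minimal usage sketch. The import path and the NewLimiter arguments are assumptions (this page does not show the constructor's parameters); Wait and its context behavior come from the API documented below.

package main

import (
	"context"
	"fmt"
	"time"

	// Hypothetical import path; substitute the crawler's real module path.
	"example.com/crawler/ratelimit"
)

func main() {
	// Assumed constructor arguments, mirroring SetRate(requestsPerSecond, burst).
	l := ratelimit.NewLimiter(2.0, 4)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	for i := 0; i < 5; i++ {
		// Wait blocks until the limiter admits the request or ctx is cancelled.
		if err := l.Wait(ctx); err != nil {
			fmt.Println("rate limited:", err)
			return
		}
		fmt.Println("request", i, "allowed at", time.Now().Format(time.RFC3339Nano))
	}
}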
Index
- type AdaptiveRateLimiter
- type Limiter
- func (l *Limiter) Allow() bool
- func (l *Limiter) AllowDomain(domain string) bool
- func (l *Limiter) GetCrawlDelay(domain, userAgent string) time.Duration
- func (l *Limiter) GetRobots() *RobotsManager
- func (l *Limiter) IsAllowed(ctx context.Context, domain, path, userAgent string, respectRobots bool) bool
- func (l *Limiter) Reserve() *rate.Reservation
- func (l *Limiter) SetDomainDelay(delay time.Duration)
- func (l *Limiter) SetDomainRate(domain string, requestsPerSecond float64, burst int)
- func (l *Limiter) SetRate(requestsPerSecond float64, burst int)
- func (l *Limiter) Stats() LimiterStats
- func (l *Limiter) Wait(ctx context.Context) error
- func (l *Limiter) WaitDomain(ctx context.Context, domain string) error
- type LimiterStats
- type RobotsManager
- type RobotsRules
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type AdaptiveRateLimiter
type AdaptiveRateLimiter struct {
	*Limiter
	// contains filtered or unexported fields
}
AdaptiveRateLimiter adjusts its rate based on response times and errors.
func NewAdaptiveRateLimiter
func NewAdaptiveRateLimiter(minRate, maxRate float64, burst int) *AdaptiveRateLimiter
NewAdaptiveRateLimiter creates a new adaptive rate limiter.
func (*AdaptiveRateLimiter) CurrentRate
func (a *AdaptiveRateLimiter) CurrentRate() float64
CurrentRate returns the current rate in requests per second.
func (*AdaptiveRateLimiter) RecordError
func (a *AdaptiveRateLimiter) RecordError()
RecordError records a failed request.
func (*AdaptiveRateLimiter) RecordSuccess
func (a *AdaptiveRateLimiter) RecordSuccess()
RecordSuccess records a successful request.
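A sketch of the intended feedback loop, reusing the hypothetical import path from the overview. How far RecordError and RecordSuccess move the rate is an unexported detail; the comments below state the assumed direction only.

import (
	"context"
	"log"
	"net/http"

	"example.com/crawler/ratelimit" // hypothetical import path
)

// adaptiveFetch is a hypothetical helper, not part of the package.
func adaptiveFetch(ctx context.Context, a *ratelimit.AdaptiveRateLimiter, url string) error {
	// AdaptiveRateLimiter embeds *Limiter, so Wait is available directly.
	if err := a.Wait(ctx); err != nil {
		return err
	}
	resp, err := http.Get(url)
	if err != nil {
		a.RecordError() // assumed to back the rate off toward minRate
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusTooManyRequests || resp.StatusCode >= 500 {
		a.RecordError()
	} else {
		a.RecordSuccess() // assumed to nudge the rate back toward maxRate
	}
	log.Printf("current rate: %.2f req/s", a.CurrentRate())
	return nil
}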
type Limiter
type Limiter struct {
	// contains filtered or unexported fields
}
Limiter implements rate limiting for crawling.
func NewLimiter
NewLimiter creates a new rate limiter.
func (*Limiter) Allow
func (l *Limiter) Allow() bool
Allow checks if a request is allowed without blocking.
func (*Limiter) AllowDomain
func (l *Limiter) AllowDomain(domain string) bool
AllowDomain checks if a request to a domain is allowed without blocking.
func (*Limiter) GetCrawlDelay
func (l *Limiter) GetCrawlDelay(domain, userAgent string) time.Duration
GetCrawlDelay returns the crawl delay for a domain from robots.txt.
func (*Limiter) GetRobots
func (l *Limiter) GetRobots() *RobotsManager
GetRobots returns the robots.txt manager.
func (*Limiter) IsAllowed
func (l *Limiter) IsAllowed(ctx context.Context, domain, path, userAgent string, respectRobots bool) bool
IsAllowed checks if a URL is allowed by rate limit and robots.txt.
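For example, given the Limiter l and context ctx from the overview sketch (domain, path, and user agent are illustrative):

// respectRobots is true, so this consults both the rate limit and the
// domain's robots.txt rules before approving the request.
if l.IsAllowed(ctx, "example.com", "/private/report", "mycrawler/1.0", true) {
	// safe to fetch
} else {
	// over the rate limit, or disallowed by robots.txt
}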
func (*Limiter) Reserve
func (l *Limiter) Reserve() *rate.Reservation
Reserve reserves a token for later use.
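Assuming Reserve follows golang.org/x/time/rate semantics (the *rate.Reservation return type suggests it does), the reservation can pace work without blocking inside the limiter:

r := l.Reserve()
if !r.OK() {
	// The limiter can never satisfy this reservation, e.g. the burst is too small.
	return
}
time.Sleep(r.Delay()) // sleep out the delay ourselves instead of calling Wait
// ... perform the request ...
// If plans change before the delay elapses, r.Cancel() returns the token.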
func (*Limiter) SetDomainDelay
func (l *Limiter) SetDomainDelay(delay time.Duration)
SetDomainDelay sets the minimum delay between requests to the same domain.
func (*Limiter) SetDomainRate
func (l *Limiter) SetDomainRate(domain string, requestsPerSecond float64, burst int)
SetDomainRate sets a custom rate limit for a specific domain.
func (*Limiter) SetRate
func (l *Limiter) SetRate(requestsPerSecond float64, burst int)
SetRate sets the default rate limit.
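A configuration sketch combining the setters; the constructor arguments, domain name, and numbers are illustrative assumptions:

l := ratelimit.NewLimiter(5.0, 10) // assumed arguments: default rate, burst

// Lower the default for all domains without a custom limit.
l.SetRate(2.0, 4)

// A host known to throttle aggressively gets its own, slower bucket.
l.SetDomainRate("api.example.com", 0.5, 1)

// Additionally enforce a minimum gap between hits to any one domain.
l.SetDomainDelay(500 * time.Millisecond)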
func (*Limiter) Stats
func (l *Limiter) Stats() LimiterStats
Stats returns rate limiter statistics.
func (*Limiter) Wait
func (l *Limiter) Wait(ctx context.Context) error
Wait blocks until a request is allowed or the context is cancelled.
func (*Limiter) WaitDomain
func (l *Limiter) WaitDomain(ctx context.Context, domain string) error
WaitDomain blocks until a request to the given domain is allowed or the context is cancelled.
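Given l and ctx from the earlier sketches (plus encoding/json and fmt imported), pacing per domain and then inspecting the limiter's state might look like:

for _, d := range []string{"a.example.com", "b.example.com"} {
	// Blocks until a request to d is admitted or ctx is cancelled.
	if err := l.WaitDomain(ctx, d); err != nil {
		break
	}
	// fetch from d ...
}

b, _ := json.MarshalIndent(l.Stats(), "", "  ")
fmt.Println(string(b)) // DomainDelay marshals as integer nanoseconds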
type LimiterStats
type LimiterStats struct {
	DomainCount  int           `json:"domain_count"`
	DefaultRate  float64       `json:"default_rate"`
	DefaultBurst int           `json:"default_burst"`
	DomainDelay  time.Duration `json:"domain_delay"`
}
LimiterStats contains rate limiter statistics.
type RobotsManager
type RobotsManager struct {
	// contains filtered or unexported fields
}
RobotsManager manages robots.txt rules for multiple domains.
func NewRobotsManager
func NewRobotsManager() *RobotsManager
NewRobotsManager creates a new robots.txt manager.
func (*RobotsManager) Fetch
func (m *RobotsManager) Fetch(ctx context.Context, domain string) error
Fetch fetches and parses robots.txt for a domain.
func (*RobotsManager) GetCrawlDelay
func (m *RobotsManager) GetCrawlDelay(domain, userAgent string) time.Duration
GetCrawlDelay returns the crawl delay for a domain.
func (*RobotsManager) GetSitemaps
func (m *RobotsManager) GetSitemaps(domain string) []string
GetSitemaps returns sitemap URLs for a domain.
func (*RobotsManager) IsAllowed
func (m *RobotsManager) IsAllowed(domain, path, userAgent string) bool
IsAllowed checks if a path is allowed by robots.txt.
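A standalone usage sketch (standard-library imports plus the hypothetical package path assumed above). Whether IsAllowed fetches robots.txt lazily or requires a prior Fetch is not documented here, so the sketch calls Fetch explicitly:

m := ratelimit.NewRobotsManager()

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Populate the cache for the domain before consulting it.
if err := m.Fetch(ctx, "example.com"); err != nil {
	log.Printf("robots.txt fetch failed: %v", err)
}

if m.IsAllowed("example.com", "/search", "mycrawler/1.0") {
	// Honor the site's requested spacing between requests, if any.
	time.Sleep(m.GetCrawlDelay("example.com", "mycrawler/1.0"))
	// fetch https://example.com/search ...
}

for _, s := range m.GetSitemaps("example.com") {
	fmt.Println("sitemap:", s)
}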
type RobotsRules
type RobotsRules struct {
	Disallow   []*regexp.Regexp
	Allow      []*regexp.Regexp
	CrawlDelay time.Duration
	Sitemaps   []string
	FetchedAt  time.Time
}
RobotsRules represents parsed robots.txt rules.
func ParseRobots
func ParseRobots(r io.Reader, userAgent string) (*RobotsRules, error)
ParseRobots parses robots.txt content.
func (*RobotsRules) IsAllowed
func (r *RobotsRules) IsAllowed(path string) bool
IsAllowed checks if a path is allowed by the rules.
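A parsing sketch; the robots.txt content is illustrative, and the expected outputs are inferences from the field names rather than documented behavior:

const robotsTxt = `User-agent: *
Disallow: /private/
Crawl-delay: 2
Sitemap: https://example.com/sitemap.xml
`

rules, err := ratelimit.ParseRobots(strings.NewReader(robotsTxt), "mycrawler/1.0")
if err != nil {
	log.Fatal(err)
}
fmt.Println(rules.IsAllowed("/private/data")) // presumably false
fmt.Println(rules.IsAllowed("/public/page"))  // presumably true
fmt.Println(rules.CrawlDelay)                 // presumably 2s
fmt.Println(rules.Sitemaps)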