ratelimit

package
v1.0.0
Published: Feb 1, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package ratelimit provides rate limiting for the crawler, including global and per-domain limits, an adaptive limiter, and robots.txt handling.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AdaptiveRateLimiter

type AdaptiveRateLimiter struct {
	*Limiter
	// contains filtered or unexported fields
}

AdaptiveRateLimiter adjusts its rate, within the configured bounds, based on response times and errors.

func NewAdaptiveRateLimiter

func NewAdaptiveRateLimiter(minRate, maxRate float64, burst int) *AdaptiveRateLimiter

NewAdaptiveRateLimiter creates a new adaptive rate limiter.

func (*AdaptiveRateLimiter) CurrentRate

func (a *AdaptiveRateLimiter) CurrentRate() float64

CurrentRate returns the current requests-per-second rate.

func (*AdaptiveRateLimiter) RecordError

func (a *AdaptiveRateLimiter) RecordError()

RecordError records a failed request.

func (*AdaptiveRateLimiter) RecordSuccess

func (a *AdaptiveRateLimiter) RecordSuccess()

RecordSuccess records a successful request.
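
A minimal usage sketch; the import path and numeric values are assumptions, not part of the documented API:

package main

import (
	"fmt"

	"example.com/crawler/ratelimit" // hypothetical import path
)

func main() {
	// Allow between 0.5 and 10 req/s with a burst of 5 (illustrative values).
	a := ratelimit.NewAdaptiveRateLimiter(0.5, 10, 5)

	a.RecordSuccess() // a successful request may nudge the rate up
	a.RecordError()   // a failure may throttle it back down

	fmt.Printf("current rate: %.2f req/s\n", a.CurrentRate())
}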

type Limiter

type Limiter struct {
	// contains filtered or unexported fields
}

Limiter implements rate limiting for crawling.

func NewLimiter

func NewLimiter(requestsPerSecond float64, burst int) *Limiter

NewLimiter creates a new rate limiter.

func (*Limiter) Allow

func (l *Limiter) Allow() bool

Allow reports whether a request is allowed, without blocking.
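
A non-blocking sketch, using only the documented constructor and Allow (import path and values illustrative):

package main

import (
	"fmt"

	"example.com/crawler/ratelimit" // hypothetical import path
)

func main() {
	l := ratelimit.NewLimiter(2, 4) // 2 req/s, burst of 4 (illustrative)

	if l.Allow() {
		fmt.Println("token available; proceed with the request")
	} else {
		fmt.Println("over the limit; skip or retry later")
	}
}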

func (*Limiter) AllowDomain

func (l *Limiter) AllowDomain(domain string) bool

AllowDomain reports whether a request to the given domain is allowed, without blocking.

func (*Limiter) GetCrawlDelay

func (l *Limiter) GetCrawlDelay(domain, userAgent string) time.Duration

GetCrawlDelay returns the crawl delay for a domain from robots.txt.

func (*Limiter) GetRobots

func (l *Limiter) GetRobots() *RobotsManager

GetRobots returns the robots.txt manager.

func (*Limiter) IsAllowed

func (l *Limiter) IsAllowed(ctx context.Context, domain, path, userAgent string, respectRobots bool) bool

IsAllowed reports whether a request to the given domain and path is allowed by the rate limit and, when respectRobots is true, by robots.txt.
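
A sketch of the combined check; the domain, path, and user agent are placeholders:

package main

import (
	"context"
	"fmt"

	"example.com/crawler/ratelimit" // hypothetical import path
)

func main() {
	l := ratelimit.NewLimiter(1, 2)

	// With respectRobots set, the check consults robots.txt as well as the rate limit.
	if l.IsAllowed(context.Background(), "example.com", "/about", "mybot/1.0", true) {
		fmt.Println("fetch permitted")
	}
}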

func (*Limiter) Reserve

func (l *Limiter) Reserve() *rate.Reservation

Reserve reserves a token for later use.
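
Since Reserve returns a *rate.Reservation from golang.org/x/time/rate, the usual OK/Delay pattern applies; a sketch:

package main

import (
	"time"

	"example.com/crawler/ratelimit" // hypothetical import path
)

func main() {
	l := ratelimit.NewLimiter(1, 1)

	r := l.Reserve()
	if !r.OK() {
		return // the reservation cannot be satisfied (e.g. exceeds the burst)
	}
	time.Sleep(r.Delay()) // wait out the reservation, then issue the request
}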

func (*Limiter) SetDomainDelay

func (l *Limiter) SetDomainDelay(delay time.Duration)

SetDomainDelay sets the minimum delay between requests to the same domain.

func (*Limiter) SetDomainRate

func (l *Limiter) SetDomainRate(domain string, requestsPerSecond float64, burst int)

SetDomainRate sets a custom rate limit for a specific domain.
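
A configuration sketch combining SetDomainDelay and SetDomainRate (values and host name illustrative):

package main

import (
	"time"

	"example.com/crawler/ratelimit" // hypothetical import path
)

func main() {
	l := ratelimit.NewLimiter(5, 10)

	l.SetDomainDelay(500 * time.Millisecond)    // minimum gap between hits to any one domain
	l.SetDomainRate("slow.example.com", 0.5, 1) // a stricter limit for a fragile host
}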

func (*Limiter) SetRate

func (l *Limiter) SetRate(requestsPerSecond float64, burst int)

SetRate updates the global rate limit.

func (*Limiter) Stats

func (l *Limiter) Stats() LimiterStats

Stats returns rate limiter statistics.
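
The struct tags on LimiterStats (shown below) make the snapshot straightforward to log as JSON; a sketch:

package main

import (
	"encoding/json"
	"fmt"
	"os"

	"example.com/crawler/ratelimit" // hypothetical import path
)

func main() {
	l := ratelimit.NewLimiter(2, 4)

	s := l.Stats()
	fmt.Printf("tracking %d domain limiter(s) at a default %.1f req/s\n", s.DomainCount, s.DefaultRate)

	_ = json.NewEncoder(os.Stdout).Encode(s) // emits the json-tagged fields
}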

func (*Limiter) Wait

func (l *Limiter) Wait(ctx context.Context) error

Wait blocks until a request is allowed or the context is cancelled.

func (*Limiter) WaitDomain

func (l *Limiter) WaitDomain(ctx context.Context, domain string) error

WaitDomain blocks until a request to the given domain is allowed or the context is cancelled.
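
A blocking crawl-loop sketch; the domain list is a placeholder:

package main

import (
	"context"
	"fmt"

	"example.com/crawler/ratelimit" // hypothetical import path
)

func main() {
	l := ratelimit.NewLimiter(2, 4)
	ctx := context.Background()

	for _, domain := range []string{"a.example.com", "b.example.com"} {
		// Blocks until the per-domain limiter admits the request, or ctx is cancelled.
		if err := l.WaitDomain(ctx, domain); err != nil {
			fmt.Println("cancelled:", err)
			return
		}
		// issue the HTTP request to domain here
	}
}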

type LimiterStats

type LimiterStats struct {
	DomainCount  int           `json:"domain_count"`
	DefaultRate  float64       `json:"default_rate"`
	DefaultBurst int           `json:"default_burst"`
	DomainDelay  time.Duration `json:"domain_delay"`
}

LimiterStats contains rate limiter statistics.

type RobotsManager

type RobotsManager struct {
	// contains filtered or unexported fields
}

RobotsManager manages robots.txt rules for multiple domains.

func NewRobotsManager

func NewRobotsManager() *RobotsManager

NewRobotsManager creates a new robots.txt manager.

func (*RobotsManager) Fetch

func (m *RobotsManager) Fetch(ctx context.Context, domain string) error

Fetch fetches and parses robots.txt for a domain.

func (*RobotsManager) GetCrawlDelay

func (m *RobotsManager) GetCrawlDelay(domain, userAgent string) time.Duration

GetCrawlDelay returns the crawl delay for a domain.

func (*RobotsManager) GetSitemaps

func (m *RobotsManager) GetSitemaps(domain string) []string

GetSitemaps returns sitemap URLs for a domain.

func (*RobotsManager) IsAllowed

func (m *RobotsManager) IsAllowed(domain, path, userAgent string) bool

IsAllowed reports whether a path is allowed by the domain's robots.txt rules.
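
A fetch-then-check sketch for RobotsManager; the domain, path, and user agent are placeholders:

package main

import (
	"context"
	"fmt"

	"example.com/crawler/ratelimit" // hypothetical import path
)

func main() {
	m := ratelimit.NewRobotsManager()

	if err := m.Fetch(context.Background(), "example.com"); err != nil {
		fmt.Println("robots.txt fetch failed:", err)
		return
	}

	fmt.Println(m.IsAllowed("example.com", "/private", "mybot/1.0"))
	fmt.Println(m.GetCrawlDelay("example.com", "mybot/1.0"))
	fmt.Println(m.GetSitemaps("example.com"))
}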

type RobotsRules

type RobotsRules struct {
	Disallow   []*regexp.Regexp
	Allow      []*regexp.Regexp
	CrawlDelay time.Duration
	Sitemaps   []string
	FetchedAt  time.Time
}

RobotsRules represents parsed robots.txt rules.

func ParseRobots

func ParseRobots(r io.Reader, userAgent string) (*RobotsRules, error)

ParseRobots parses robots.txt content.

func (*RobotsRules) IsAllowed

func (r *RobotsRules) IsAllowed(path string) bool

IsAllowed reports whether a path is allowed by the parsed rules.
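
A parsing sketch over an in-memory robots.txt; the content and the expected results are illustrative, since the exact matching semantics are not documented here:

package main

import (
	"fmt"
	"strings"

	"example.com/crawler/ratelimit" // hypothetical import path
)

func main() {
	const robots = "User-agent: *\nDisallow: /admin\nCrawl-delay: 2\n"

	rules, err := ratelimit.ParseRobots(strings.NewReader(robots), "mybot/1.0")
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}

	fmt.Println(rules.IsAllowed("/admin"))  // presumably false
	fmt.Println(rules.IsAllowed("/public")) // presumably true
	fmt.Println(rules.CrawlDelay)           // presumably 2s
}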
