Documentation
Overview
Package ratelimit provides rate limiting for the crawler: a core Limiter with per-domain rates and delays, an AdaptiveRateLimiter that tunes its rate from request outcomes, and robots.txt support via RobotsManager and RobotsRules.
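For orientation, here is a minimal usage sketch. The import path and the NewLimiter arguments are assumptions (this page does not show the constructor's parameters); Wait and its context behavior come from the API documented below.

package main

import (
	"context"
	"fmt"
	"time"

	// Hypothetical import path; substitute the crawler's real module path.
	"example.com/crawler/ratelimit"
)

func main() {
	// Assumed constructor arguments, mirroring SetRate(requestsPerSecond, burst).
	l := ratelimit.NewLimiter(2.0, 4)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	for i := 0; i < 5; i++ {
		// Wait blocks until the limiter admits the request or ctx is cancelled.
		if err := l.Wait(ctx); err != nil {
			fmt.Println("rate limited:", err)
			return
		}
		fmt.Println("request", i, "allowed at", time.Now().Format(time.RFC3339Nano))
	}
}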
Index
- type AdaptiveRateLimiter
- type Limiter
- func (l *Limiter) Allow() bool
- func (l *Limiter) AllowDomain(domain string) bool
- func (l *Limiter) GetCrawlDelay(domain, userAgent string) time.Duration
- func (l *Limiter) GetRobots() *RobotsManager
- func (l *Limiter) IsAllowed(ctx context.Context, domain, path, userAgent string, respectRobots bool) bool
- func (l *Limiter) Reserve() *rate.Reservation
- func (l *Limiter) SetDomainDelay(delay time.Duration)
- func (l *Limiter) SetDomainRate(domain string, requestsPerSecond float64, burst int)
- func (l *Limiter) SetRate(requestsPerSecond float64, burst int)
- func (l *Limiter) Stats() LimiterStats
- func (l *Limiter) Wait(ctx context.Context) error
- func (l *Limiter) WaitDomain(ctx context.Context, domain string) error
- type LimiterStats
- type RobotsManager
- type RobotsRules
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type AdaptiveRateLimiter
type AdaptiveRateLimiter struct {
	*Limiter
	// contains filtered or unexported fields
}
AdaptiveRateLimiter adjusts its rate based on response times and errors.
func NewAdaptiveRateLimiter
func NewAdaptiveRateLimiter(minRate, maxRate float64, burst int) *AdaptiveRateLimiter
NewAdaptiveRateLimiter creates a new adaptive rate limiter.
func (*AdaptiveRateLimiter) CurrentRate
func (a *AdaptiveRateLimiter) CurrentRate() float64
CurrentRate returns the current rate in requests per second.
func (*AdaptiveRateLimiter) RecordError
func (a *AdaptiveRateLimiter) RecordError()
RecordError records a failed request.
func (*AdaptiveRateLimiter) RecordSuccess
func (a *AdaptiveRateLimiter) RecordSuccess()
RecordSuccess records a successful request.
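A sketch of the intended feedback loop, reusing the hypothetical import path from the overview. How far RecordError and RecordSuccess move the rate is an unexported detail; the comments below state the assumed direction only.

import (
	"context"
	"log"
	"net/http"

	"example.com/crawler/ratelimit" // hypothetical import path
)

// adaptiveFetch is a hypothetical helper, not part of the package.
func adaptiveFetch(ctx context.Context, a *ratelimit.AdaptiveRateLimiter, url string) error {
	// AdaptiveRateLimiter embeds *Limiter, so Wait is available directly.
	if err := a.Wait(ctx); err != nil {
		return err
	}
	resp, err := http.Get(url)
	if err != nil {
		a.RecordError() // assumed to back the rate off toward minRate
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusTooManyRequests || resp.StatusCode >= 500 {
		a.RecordError()
	} else {
		a.RecordSuccess() // assumed to nudge the rate back toward maxRate
	}
	log.Printf("current rate: %.2f req/s", a.CurrentRate())
	return nil
}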
type Limiter
type Limiter struct {
	// contains filtered or unexported fields
}
Limiter implements rate limiting for crawling.
func NewLimiter
NewLimiter creates a new rate limiter.
func (*Limiter) Allow
func (l *Limiter) Allow() bool
Allow checks if a request is allowed without blocking.
func (*Limiter) AllowDomain
func (l *Limiter) AllowDomain(domain string) bool
AllowDomain checks if a request to a domain is allowed without blocking.
func (*Limiter) GetCrawlDelay
func (l *Limiter) GetCrawlDelay(domain, userAgent string) time.Duration
GetCrawlDelay returns the crawl delay for a domain from robots.txt.
func (*Limiter) GetRobots
func (l *Limiter) GetRobots() *RobotsManager
GetRobots returns the robots.txt manager.
func (*Limiter) IsAllowed
func (l *Limiter) IsAllowed(ctx context.Context, domain, path, userAgent string, respectRobots bool) bool
IsAllowed checks if a URL is allowed by rate limit and robots.txt.
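For example, given the Limiter l and context ctx from the overview sketch (domain, path, and user agent are illustrative):

// respectRobots is true, so this consults both the rate limit and the
// domain's robots.txt rules before approving the request.
if l.IsAllowed(ctx, "example.com", "/private/report", "mycrawler/1.0", true) {
	// safe to fetch
} else {
	// over the rate limit, or disallowed by robots.txt
}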
func (*Limiter) Reserve
func (l *Limiter) Reserve() *rate.Reservation
Reserve reserves a token for later use.
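Assuming Reserve follows golang.org/x/time/rate semantics (the *rate.Reservation return type suggests it does), the reservation can pace work without blocking inside the limiter:

r := l.Reserve()
if !r.OK() {
	// The limiter can never satisfy this reservation, e.g. the burst is too small.
	return
}
time.Sleep(r.Delay()) // sleep out the delay ourselves instead of calling Wait
// ... perform the request ...
// If plans change before the delay elapses, r.Cancel() returns the token.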
func (*Limiter) SetDomainDelay
func (l *Limiter) SetDomainDelay(delay time.Duration)
SetDomainDelay sets the minimum delay between requests to the same domain.
func (*Limiter) SetDomainRate
func (l *Limiter) SetDomainRate(domain string, requestsPerSecond float64, burst int)
SetDomainRate sets a custom rate limit for a specific domain.
func (*Limiter) SetRate
func (l *Limiter) SetRate(requestsPerSecond float64, burst int)
SetRate sets the default rate limit.
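A configuration sketch combining the setters; the constructor arguments, domain name, and numbers are illustrative assumptions:

l := ratelimit.NewLimiter(5.0, 10) // assumed arguments: default rate, burst

// Lower the default for all domains without a custom limit.
l.SetRate(2.0, 4)

// A host known to throttle aggressively gets its own, slower bucket.
l.SetDomainRate("api.example.com", 0.5, 1)

// Additionally enforce a minimum gap between hits to any one domain.
l.SetDomainDelay(500 * time.Millisecond)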
func (*Limiter) Stats
func (l *Limiter) Stats() LimiterStats
Stats returns rate limiter statistics.
func (*Limiter) Wait
func (l *Limiter) Wait(ctx context.Context) error
Wait blocks until a request is allowed or the context is cancelled.
func (*Limiter) WaitDomain
func (l *Limiter) WaitDomain(ctx context.Context, domain string) error
WaitDomain blocks until a request to the given domain is allowed or the context is cancelled.
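Given l and ctx from the earlier sketches (plus encoding/json and fmt imported), pacing per domain and then inspecting the limiter's state might look like:

for _, d := range []string{"a.example.com", "b.example.com"} {
	// Blocks until a request to d is admitted or ctx is cancelled.
	if err := l.WaitDomain(ctx, d); err != nil {
		break
	}
	// fetch from d ...
}

b, _ := json.MarshalIndent(l.Stats(), "", "  ")
fmt.Println(string(b)) // DomainDelay marshals as integer nanoseconds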
type LimiterStats
type LimiterStats struct {
	DomainCount  int           `json:"domain_count"`
	DefaultRate  float64       `json:"default_rate"`
	DefaultBurst int           `json:"default_burst"`
	DomainDelay  time.Duration `json:"domain_delay"`
}
LimiterStats contains rate limiter statistics.
type RobotsManager
type RobotsManager struct {
	// contains filtered or unexported fields
}
RobotsManager manages robots.txt rules for multiple domains.
func NewRobotsManager
func NewRobotsManager() *RobotsManager
NewRobotsManager creates a new robots.txt manager.
func (*RobotsManager) Fetch
func (m *RobotsManager) Fetch(ctx context.Context, domain string) error
Fetch fetches and parses robots.txt for a domain.
func (*RobotsManager) GetCrawlDelay
func (m *RobotsManager) GetCrawlDelay(domain, userAgent string) time.Duration
GetCrawlDelay returns the crawl delay for a domain.
func (*RobotsManager) GetSitemaps
func (m *RobotsManager) GetSitemaps(domain string) []string
GetSitemaps returns sitemap URLs for a domain.
func (*RobotsManager) IsAllowed
func (m *RobotsManager) IsAllowed(domain, path, userAgent string) bool
IsAllowed checks if a path is allowed by robots.txt.
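A standalone usage sketch (standard-library imports plus the hypothetical package path assumed above). Whether IsAllowed fetches robots.txt lazily or requires a prior Fetch is not documented here, so the sketch calls Fetch explicitly:

m := ratelimit.NewRobotsManager()

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

// Populate the cache for the domain before consulting it.
if err := m.Fetch(ctx, "example.com"); err != nil {
	log.Printf("robots.txt fetch failed: %v", err)
}

if m.IsAllowed("example.com", "/search", "mycrawler/1.0") {
	// Honor the site's requested spacing between requests, if any.
	time.Sleep(m.GetCrawlDelay("example.com", "mycrawler/1.0"))
	// fetch https://example.com/search ...
}

for _, s := range m.GetSitemaps("example.com") {
	fmt.Println("sitemap:", s)
}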
type RobotsRules
type RobotsRules struct {
	Disallow   []*regexp.Regexp
	Allow      []*regexp.Regexp
	CrawlDelay time.Duration
	Sitemaps   []string
	FetchedAt  time.Time
}
RobotsRules represents parsed robots.txt rules.
func ParseRobots
func ParseRobots(r io.Reader, userAgent string) (*RobotsRules, error)
ParseRobots parses robots.txt content.
func (*RobotsRules) IsAllowed
func (r *RobotsRules) IsAllowed(path string) bool
IsAllowed checks if a path is allowed by the rules.
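A parsing sketch; the robots.txt content is illustrative, and the expected outputs are inferences from the field names rather than documented behavior:

const robotsTxt = `User-agent: *
Disallow: /private/
Crawl-delay: 2
Sitemap: https://example.com/sitemap.xml
`

rules, err := ratelimit.ParseRobots(strings.NewReader(robotsTxt), "mycrawler/1.0")
if err != nil {
	log.Fatal(err)
}
fmt.Println(rules.IsAllowed("/private/data")) // presumably false
fmt.Println(rules.IsAllowed("/public/page"))  // presumably true
fmt.Println(rules.CrawlDelay)                 // presumably 2s
fmt.Println(rules.Sitemaps)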