Documentation ¶
Index ¶
- type CrawlURL
- type Frontier
- func (f *Frontier) Add(crawlURL CrawlURL) bool
- func (f *Frontier) Close()
- func (f *Frontier) Delay() time.Duration
- func (f *Frontier) Len() int
- func (f *Frontier) MarkSeen(url string)
- func (f *Frontier) Next() *CrawlURL
- func (f *Frontier) SeenCount() int
- func (f *Frontier) SetDelay(delay time.Duration)
- type HostQueue
- type URLDb
Types ¶
type CrawlURL ¶
type CrawlURL struct {
	URL      string
	Priority int // lower = higher priority
	Depth    int
	FoundOn  string
	Attempt  int // retry attempt number (0 = first try)
	// contains filtered or unexported fields
}
CrawlURL represents a URL to be crawled with priority and metadata.
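The "lower = higher priority" convention on the Priority field maps naturally onto a min-heap. The following is a minimal sketch of that ordering using the standard container/heap package. The urlHeap type and the popOrder helper are hypothetical names introduced for illustration; how the package actually stores its queue is not documented here.

```go
package main

import (
	"container/heap"
	"fmt"
)

// CrawlURL repeats only the exported fields this sketch needs.
type CrawlURL struct {
	URL      string
	Priority int // lower = higher priority
}

// urlHeap is a hypothetical min-heap keyed on Priority.
type urlHeap []CrawlURL

func (h urlHeap) Len() int           { return len(h) }
func (h urlHeap) Less(i, j int) bool { return h[i].Priority < h[j].Priority }
func (h urlHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *urlHeap) Push(x any)        { *h = append(*h, x.(CrawlURL)) }
func (h *urlHeap) Pop() any {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

// popOrder pushes every item and returns the URLs in dequeue order.
func popOrder(items []CrawlURL) []string {
	h := &urlHeap{}
	for _, it := range items {
		heap.Push(h, it)
	}
	var out []string
	for h.Len() > 0 {
		out = append(out, heap.Pop(h).(CrawlURL).URL)
	}
	return out
}

func main() {
	for _, u := range popOrder([]CrawlURL{
		{URL: "https://example.com/low", Priority: 9},
		{URL: "https://example.com/high", Priority: 0},
		{URL: "https://example.com/mid", Priority: 4},
	}) {
		fmt.Println(u) // high, then mid, then low
	}
}
```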
type Frontier ¶
type Frontier struct {
	// contains filtered or unexported fields
}
Frontier manages the URL queue, providing priority ordering, deduplication, and per-host politeness delays.
func (*Frontier) Add ¶
func (f *Frontier) Add(crawlURL CrawlURL) bool
Add adds crawlURL to the frontier if its URL has not been seen before and reports whether it was added. If the URL has already been seen, Add still updates the minimum-depth tracking, so that the URL is dequeued with its true shortest-path depth.
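The dedup-plus-minimum-depth behavior described for Add can be sketched with a single map from URL to the shallowest depth seen so far. This is an illustration only: seenSet, newSeenSet, and add are hypothetical names, and the real Frontier may store this state differently (for example in the URLDb type listed in the index).

```go
package main

import "fmt"

// CrawlURL repeats only the exported fields this sketch needs.
type CrawlURL struct {
	URL   string
	Depth int
}

// seenSet is a hypothetical stand-in for the frontier's dedup state.
// It maps each URL to the smallest depth at which it was discovered.
type seenSet struct {
	minDepth map[string]int
}

func newSeenSet() *seenSet {
	return &seenSet{minDepth: make(map[string]int)}
}

// add reports whether the URL is new. For a URL that was already seen,
// it still lowers the recorded depth when the new discovery is shallower.
func (s *seenSet) add(u CrawlURL) bool {
	d, seen := s.minDepth[u.URL]
	if !seen {
		s.minDepth[u.URL] = u.Depth
		return true
	}
	if u.Depth < d {
		s.minDepth[u.URL] = u.Depth
	}
	return false
}

func main() {
	s := newSeenSet()
	fmt.Println(s.add(CrawlURL{URL: "https://example.com/a", Depth: 3})) // true
	fmt.Println(s.add(CrawlURL{URL: "https://example.com/a", Depth: 1})) // false
	fmt.Println(s.minDepth["https://example.com/a"])                     // 1
}
```

Recording the minimum on every call, including rejected duplicates, is what lets a page first found through a long path still be reported at its shortest-path depth.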
func (*Frontier) Close ¶
func (f *Frontier) Close()
Close closes the frontier, preventing new URLs from being added.
func (*Frontier) MarkSeen ¶
func (f *Frontier) MarkSeen(url string)
MarkSeen records url in the dedup database without adding it to the queue.
func (*Frontier) Next ¶
func (f *Frontier) Next() *CrawlURL
Next returns the next URL that is ready to be fetched, respecting the per-host delay. It returns nil if no URL is ready or if the frontier is empty.
type HostQueue ¶
type HostQueue struct {
	// contains filtered or unexported fields
}
HostQueue manages per-host politeness delays.
func NewHostQueue ¶
NewHostQueue creates a new HostQueue with the given default delay.
func (*HostQueue) CanFetch ¶
CanFetch reports whether the politeness delay has elapsed since the last recorded fetch to this host.
func (*HostQueue) RecordFetch ¶
RecordFetch records that a fetch was made to this host.