Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
func FindNextPageURL
FindNextPageURL extracts the next page URL from the current page using the pagination rule.
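The function's signature and the form of the pagination rule are not shown on this page, so the following is only a rough, self-contained sketch of the general idea: find a next-page link in the fetched HTML and resolve it against the current page URL. The name `nextPageURL`, the `nextLinkRule` regular expression, and the `(string, bool)` return shape are assumptions made for illustration; they are not this package's API, and the real rule may well be selector-based.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// nextLinkRule is a hypothetical pagination rule: it captures the href of an
// <a> element that also carries rel="next", in either attribute order.
var nextLinkRule = regexp.MustCompile(
	`<a\s[^>]*\brel="next"[^>]*\bhref="([^"]+)"|<a\s[^>]*\bhref="([^"]+)"[^>]*\brel="next"`)

// nextPageURL returns the absolute URL of the next page, or false when the
// rule does not match or either URL fails to parse.
func nextPageURL(current, html string) (string, bool) {
	m := nextLinkRule.FindStringSubmatch(html)
	if m == nil {
		return "", false
	}
	// Exactly one of the two capture groups is filled, depending on which
	// attribute order matched.
	href := m[1]
	if href == "" {
		href = m[2]
	}
	base, err := url.Parse(current)
	if err != nil {
		return "", false
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", false
	}
	// ResolveReference turns a relative href into an absolute URL.
	return base.ResolveReference(ref).String(), true
}

func main() {
	page := `<a class="n" href="/items?page=3" rel="next">Next</a>`
	next, ok := nextPageURL("https://example.com/items?page=2", page)
	fmt.Println(next, ok)
}
```

Resolving against the current URL matters because next-page links are frequently relative. A regular expression is used here only to keep the sketch dependency-free; it does not decode HTML entities or handle unquoted attributes.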
Types
type Options
type Options struct {
	URL        string
	FetchOpts  fetch.Options
	Strategy   *strategy.ExtractionStrategy // pre-loaded strategy (nil = derive)
	FieldNames []string
	FieldDescs map[string]string
	Query      string // natural language query (--query mode)
	Provider   string
	Model      string
	APIKey     string
	MaxPages   int
	NoCache    bool
	NoHeal     bool
	Verbose    bool
}
Options configures a crawl run.
type Result
type Result struct {
	Strategy *strategy.ExtractionStrategy
	Extract  *extract.Result
	Pages    int
}
Result holds the output of a full crawl run.
type WorkerPool
type WorkerPool struct {
	Concurrency int
	FetchOpts   fetch.Options
	Strategy    *strategy.ExtractionStrategy
	// contains filtered or unexported fields
}
WorkerPool manages concurrent page fetching and extraction.
func (*WorkerPool) ProcessURLs
func (wp *WorkerPool) ProcessURLs(urls []string) ([]*extract.Result, []error)
ProcessURLs fetches and extracts data from multiple URLs concurrently.
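How the pool schedules work internally is not documented here. As a minimal sketch of the general pattern, assuming a fixed number of worker goroutines that pull URL indexes from a channel, the example below returns successful results and errors as two separate slices, mirroring the shape of the signature above. The names `processURLs`, `processFunc`, and `errEmptyURL` are hypothetical stand-ins, and plain strings replace `*extract.Result` so that the example runs on the standard library alone.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errEmptyURL is a hypothetical error used only by this example.
var errEmptyURL = errors.New("empty URL")

// processFunc stands in for "fetch one page and extract data from it".
type processFunc func(url string) (string, error)

// processURLs runs fn over urls with at most `concurrency` workers.
// Successful results are returned in input order; errors are collected
// separately.
func processURLs(urls []string, concurrency int, fn processFunc) ([]string, []error) {
	if concurrency < 1 {
		concurrency = 1
	}
	results := make([]string, len(urls))
	errsByIndex := make([]error, len(urls))

	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < concurrency; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is delivered to exactly one worker, so the
			// writes below never touch the same slice element twice.
			for i := range jobs {
				results[i], errsByIndex[i] = fn(urls[i])
			}
		}()
	}
	for i := range urls {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	var out []string
	var errs []error
	for i := range urls {
		if errsByIndex[i] != nil {
			errs = append(errs, errsByIndex[i])
			continue
		}
		out = append(out, results[i])
	}
	return out, errs
}

func main() {
	urls := []string{"https://example.com/a", "", "https://example.com/b"}
	results, errs := processURLs(urls, 2, func(u string) (string, error) {
		if u == "" {
			return "", errEmptyURL
		}
		return "extracted:" + u, nil
	})
	fmt.Println(len(results), "results,", len(errs), "errors")
}
```

Writing into pre-sized slices by index avoids a mutex and keeps results in input order. Whether the real `ProcessURLs` preserves order, or how it pairs errors with URLs, is not stated on this page.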