crawl

package
v0.1.1
Warning: this package is not in the latest version of its module.
Published: Mar 10, 2026 · License: MIT · Imports: 11 · Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func FindNextPageURL

func FindNextPageURL(html []byte, baseURL string, rule *strategy.PaginationRule) (string, error)

FindNextPageURL extracts the next page URL from the current page using the pagination rule.
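
Example (illustrative sketch): the module's import paths are not shown on this page, so the paths below are assumptions, and PaginationRule's fields are likewise undocumented here, so a zero-value rule stands in as a placeholder.

package main

import (
	"fmt"
	"log"

	"example.com/crawler/crawl"    // assumed import path
	"example.com/crawler/strategy" // assumed import path
)

func main() {
	// HTML of the page currently being crawled.
	html := []byte(`<html><a rel="next" href="/page/2">Next</a></html>`)

	// PaginationRule's fields are not documented on this page; a zero
	// value is used purely as a placeholder.
	var rule strategy.PaginationRule

	next, err := crawl.FindNextPageURL(html, "https://example.com/page/1", &rule)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("next page:", next)
}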

Types

type Options

type Options struct {
	URL        string
	FetchOpts  fetch.Options
	Strategy   *strategy.ExtractionStrategy // pre-loaded strategy (nil = derive)
	FieldNames []string
	FieldDescs map[string]string
	Query      string // natural language query (--query mode)
	Provider   string
	Model      string
	APIKey     string
	MaxPages   int
	NoCache    bool
	NoHeal     bool
	Verbose    bool
}

Options configures a crawl run.

type Result

type Result struct {
	Strategy *strategy.ExtractionStrategy
	Extract  *extract.Result
	Pages    int
}

Result holds the output of a full crawl run.

func Run

func Run(ctx context.Context, opts Options) (*Result, error)

Run executes the full crawl pipeline: fetch, analyze, derive/load strategy, extract.
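
Example (illustrative sketch): a full crawl run built from the documented Options fields only. Import path, provider, model, and key values are assumptions; Strategy is left nil so Run derives one, per the field comment above.

package main

import (
	"context"
	"fmt"
	"log"

	"example.com/crawler/crawl" // assumed import path
)

func main() {
	opts := crawl.Options{
		URL:        "https://example.com/listings",
		FieldNames: []string{"title", "price"},
		FieldDescs: map[string]string{"price": "listing price, including currency"},
		Provider:   "openai", // provider/model/key values are illustrative
		Model:      "gpt-4o-mini",
		APIKey:     "sk-...",
		MaxPages:   3,
		// Strategy is left nil so Run derives one.
	}

	res, err := crawl.Run(context.Background(), opts)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("crawled %d page(s)\n", res.Pages)
	// res.Strategy holds the strategy that was used or derived; res.Extract
	// holds the extracted data (see the extract package for its fields).
}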

type WorkerPool

type WorkerPool struct {
	Concurrency int
	FetchOpts   fetch.Options
	Strategy    *strategy.ExtractionStrategy
	// contains filtered or unexported fields
}

WorkerPool manages concurrent page fetching and extraction.

func (*WorkerPool) ProcessURLs

func (wp *WorkerPool) ProcessURLs(urls []string) ([]*extract.Result, []error)

ProcessURLs fetches and extracts data from multiple URLs concurrently.
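
Example (illustrative sketch): derive a strategy from one page via Run, then reuse it across many URLs concurrently. The import path is an assumption, and since WorkerPool contains unexported fields, a bare struct literal may not be the intended construction pattern; treat this as a sketch, not an established usage.

package main

import (
	"context"
	"fmt"
	"log"

	"example.com/crawler/crawl" // assumed import path
)

func main() {
	// Derive a strategy from one page first, then reuse it across many URLs.
	res, err := crawl.Run(context.Background(), crawl.Options{
		URL:        "https://example.com/page/1",
		FieldNames: []string{"title", "price"},
	})
	if err != nil {
		log.Fatal(err)
	}

	// WorkerPool has unexported fields, so a bare literal may not fully
	// initialize it; this construction is an assumption.
	wp := &crawl.WorkerPool{
		Concurrency: 4,
		Strategy:    res.Strategy,
	}

	results, errs := wp.ProcessURLs([]string{
		"https://example.com/page/2",
		"https://example.com/page/3",
	})
	for _, err := range errs {
		log.Println("page failed:", err)
	}
	fmt.Printf("extracted %d result(s)\n", len(results))
}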
