Documentation ¶
Overview ¶
Package webfetch provides URL content fetching with HTML-to-Markdown conversion. It supports configurable timeouts, size limits, robots.txt/agents.txt compliance, and optional enhanced fetchers (e.g. Jina Reader) for JS-heavy pages.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ChainFetcher ¶
type ChainFetcher struct {
	// contains filtered or unexported fields
}
ChainFetcher tries the primary fetcher first. If the resulting content appears empty or too short (common with JS-rendered pages), it falls back to an enhanced fetcher.
func NewChainFetcher ¶
func NewChainFetcher(primary, fallback Fetcher, logger *slog.Logger) *ChainFetcher
NewChainFetcher creates a fetcher that chains primary → fallback. If fallback is nil, it behaves identically to the primary fetcher.
func (*ChainFetcher) Fetch ¶
func (c *ChainFetcher) Fetch(ctx context.Context, url string) (*FetchResult, error)
Fetch tries the primary fetcher first and falls back to the enhanced fetcher if the primary result appears empty or too short (likely a JS-rendered page).
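The fallback rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the types are local stand-ins that mirror the documented Fetcher and FetchResult, and stubFetcher, chainFetch, usedFetcher, and the minContentLen threshold are all hypothetical names and values chosen for the example. The sketch returns primary errors directly, which is also an assumption, since the documentation only describes the empty-content case.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// FetchResult and Fetcher are local stand-ins mirroring the documented
// types so the sketch compiles on its own.
type FetchResult struct {
	URL     string
	Content string
}

type Fetcher interface {
	Fetch(ctx context.Context, url string) (*FetchResult, error)
	Name() string
}

// stubFetcher is a hypothetical test double that returns fixed content.
type stubFetcher struct {
	name    string
	content string
}

func (s stubFetcher) Fetch(_ context.Context, url string) (*FetchResult, error) {
	return &FetchResult{URL: url, Content: s.content}, nil
}

func (s stubFetcher) Name() string { return s.name }

// minContentLen is an assumed threshold; the package does not document its value.
const minContentLen = 20

// chainFetch tries primary first and switches to fallback only when the
// primary succeeded but returned too little content. It also reports the
// name of the fetcher that produced the result.
func chainFetch(ctx context.Context, primary, fallback Fetcher, url string) (*FetchResult, string, error) {
	res, err := primary.Fetch(ctx, url)
	if err != nil || fallback == nil {
		return res, primary.Name(), err
	}
	if res != nil && len(strings.TrimSpace(res.Content)) >= minContentLen {
		return res, primary.Name(), nil
	}
	res, err = fallback.Fetch(ctx, url)
	return res, fallback.Name(), err
}

// usedFetcher runs the chain against two stubs and returns the name of
// the fetcher that served the request.
func usedFetcher(primaryContent string) string {
	primary := stubFetcher{name: "default", content: primaryContent}
	fallback := stubFetcher{name: "enhanced", content: "# Rendered page\n\nBody text from the enhanced fetcher."}
	_, name, _ := chainFetch(context.Background(), primary, fallback, "https://example.com")
	return name
}

func main() {
	fmt.Println(usedFetcher("   "))                                        // whitespace only: falls back
	fmt.Println(usedFetcher("# Title\n\nA page with enough static content.")) // long enough: primary wins
}
```

Trimming whitespace before measuring length keeps a page that rendered only blank lines from counting as real content.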
func (*ChainFetcher) Name ¶
func (c *ChainFetcher) Name() string
Name returns the fetcher identifier.
type DefaultFetcher ¶
type DefaultFetcher struct {
	// contains filtered or unexported fields
}
DefaultFetcher fetches URLs via HTTP and converts HTML to Markdown.
func NewDefaultFetcher ¶
func NewDefaultFetcher(opts Options) *DefaultFetcher
NewDefaultFetcher creates a fetcher with the given options.
func (*DefaultFetcher) Fetch ¶
func (f *DefaultFetcher) Fetch(ctx context.Context, rawURL string) (*FetchResult, error)
Fetch retrieves a URL and converts its content to Markdown.
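The overview mentions configurable timeouts and size limits. One common way to enforce both with the standard library is a context deadline plus io.LimitReader, sketched below. This is an illustration under stated assumptions, not DefaultFetcher's actual code: fetchCapped and demoBytesRead are hypothetical helpers, the fields of Options are not shown in this documentation, and the HTML-to-Markdown step is omitted.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchCapped issues a GET bound to ctx and reads at most maxBytes of the
// response body. Anything beyond the cap is silently dropped.
func fetchCapped(ctx context.Context, client *http.Client, url string, maxBytes int64) ([]byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(io.LimitReader(resp.Body, maxBytes))
}

// demoBytesRead serves a 1000-byte page from a local test server and
// returns how many bytes fetchCapped reads under the given cap.
func demoBytesRead(maxBytes int64) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, strings.Repeat("x", 1000))
	}))
	defer srv.Close()

	// The 5-second timeout is an arbitrary value chosen for the example.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	body, err := fetchCapped(ctx, srv.Client(), srv.URL, maxBytes)
	return len(body), err
}

func main() {
	n, err := demoBytesRead(64)
	fmt.Println(n, err)
}
```

Tying the deadline to the request context means it covers the whole exchange, including reading the body, rather than only the connection phase.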
func (*DefaultFetcher) Name ¶
func (f *DefaultFetcher) Name() string
Name returns the fetcher identifier.
type FetchResult ¶
type FetchResult struct {
	URL          string `json:"url"`
	Title        string `json:"title"`
	Content      string `json:"content"` // Markdown
	ContentType  string `json:"content_type"`
	BytesFetched int    `json:"bytes_fetched"`
}
FetchResult holds the fetched and converted content from a URL.
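The struct tags determine the JSON field names when a result is serialized. The sketch below copies the documented struct into a standalone program to show the encoded shape. The encode helper is hypothetical and the field values are made up for illustration; in particular, what the package actually stores in ContentType is not stated here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FetchResult is a local copy of the documented struct, including its JSON tags.
type FetchResult struct {
	URL          string `json:"url"`
	Title        string `json:"title"`
	Content      string `json:"content"` // Markdown
	ContentType  string `json:"content_type"`
	BytesFetched int    `json:"bytes_fetched"`
}

// encode marshals a result to its compact JSON form.
func encode(r FetchResult) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	out, err := encode(FetchResult{
		URL:          "https://example.com",
		Title:        "Example",
		Content:      "# Example",
		ContentType:  "text/html",
		BytesFetched: 1256,
	})
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println(out)
	// {"url":"https://example.com","title":"Example","content":"# Example","content_type":"text/html","bytes_fetched":1256}
}
```

None of the tags use omitempty, so empty fields still appear in the output as empty strings or zero.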
type Fetcher ¶
type Fetcher interface {
	Fetch(ctx context.Context, url string) (*FetchResult, error)
	Name() string
}
Fetcher retrieves content from a URL and returns it as Markdown.
type JinaFetcher ¶
type JinaFetcher struct {
	// contains filtered or unexported fields
}
JinaFetcher uses the Jina Reader API (r.jina.ai) to fetch URLs and convert them to Markdown. Jina handles JavaScript rendering, making it suitable for JS-heavy pages that the DefaultFetcher cannot process.
func NewJinaFetcher ¶
func NewJinaFetcher(timeout time.Duration, logger *slog.Logger) *JinaFetcher
NewJinaFetcher creates a Jina Reader fetcher.
func (*JinaFetcher) Fetch ¶
func (j *JinaFetcher) Fetch(ctx context.Context, rawURL string) (*FetchResult, error)
Fetch retrieves a URL via Jina Reader and returns Markdown content.
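Jina Reader is addressed by prefixing the target URL with https://r.jina.ai/. The sketch below shows only that URL construction and makes no network call. How JinaFetcher builds its request internally is not documented here, so jinaReaderURL and its scheme check are hypothetical and stand for one plausible approach.

```go
package main

import (
	"fmt"
	"net/url"
)

// jinaReaderBase is the public Jina Reader endpoint named in the type documentation.
const jinaReaderBase = "https://r.jina.ai/"

// jinaReaderURL validates rawURL and returns the Reader URL for it.
// Restricting input to absolute http(s) URLs is an assumption of this sketch.
func jinaReaderURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", rawURL, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.Host == "" {
		return "", fmt.Errorf("missing host in %q", rawURL)
	}
	// The target URL is appended as-is after the Reader prefix.
	return jinaReaderBase + rawURL, nil
}

func main() {
	readerURL, err := jinaReaderURL("https://example.com/docs?page=2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(readerURL) // https://r.jina.ai/https://example.com/docs?page=2
}
```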