package webfetch

Version: v0.9.0 | Published: Apr 3, 2026 | License: Apache-2.0 | Imports: 12 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/Temikus/denkeeper

Documentation

Overview

Package webfetch provides URL content fetching with HTML-to-Markdown conversion. It supports configurable timeouts, size limits, robots.txt/agents.txt compliance, and optional enhanced fetchers (e.g. Jina Reader) for JS-heavy pages.

Index

type ChainFetcher
- func NewChainFetcher(primary, fallback Fetcher, logger *slog.Logger) *ChainFetcher
- func (c *ChainFetcher) Fetch(ctx context.Context, url string) (*FetchResult, error)
- func (c *ChainFetcher) Name() string
type DefaultFetcher
- func NewDefaultFetcher(opts Options) *DefaultFetcher
- func (f *DefaultFetcher) Fetch(ctx context.Context, rawURL string) (*FetchResult, error)
- func (f *DefaultFetcher) Name() string
type FetchResult
type Fetcher
type JinaFetcher
- func NewJinaFetcher(timeout time.Duration, logger *slog.Logger) *JinaFetcher
- func (j *JinaFetcher) Fetch(ctx context.Context, rawURL string) (*FetchResult, error)
- func (j *JinaFetcher) Name() string
type Options

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ChainFetcher

type ChainFetcher struct {
	// contains filtered or unexported fields
}

ChainFetcher tries the primary fetcher first. If the result content looks empty (common with JS-rendered pages), it falls back to an enhanced fetcher.

func NewChainFetcher

func NewChainFetcher(primary, fallback Fetcher, logger *slog.Logger) *ChainFetcher

NewChainFetcher creates a fetcher that chains primary → fallback. If fallback is nil, it behaves identically to the primary fetcher.

func (*ChainFetcher) Fetch

func (c *ChainFetcher) Fetch(ctx context.Context, url string) (*FetchResult, error)

Fetch tries the primary fetcher and falls back to the enhanced fetcher if the primary result appears empty or too short (likely a JS-rendered page).
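The fallback rule can be pictured with a small self-contained sketch. It redeclares FetchResult and Fetcher locally so it compiles without the module; everything else is an assumption: chainFetcher, staticFetcher and chainedContent are hypothetical names, minContentLen = 200 is an invented cutoff (the package does not document its threshold), and the handling of primary and fallback errors is a guess rather than ChainFetcher's documented behavior.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// FetchResult and Fetcher mirror the documented webfetch types so this
// sketch compiles on its own (JSON tags omitted for brevity).
type FetchResult struct {
	URL          string
	Title        string
	Content      string
	ContentType  string
	BytesFetched int
}

type Fetcher interface {
	Fetch(ctx context.Context, url string) (*FetchResult, error)
	Name() string
}

// minContentLen is a hypothetical "too short" cutoff; the real threshold is not documented.
const minContentLen = 200

// chainFetcher is an illustrative stand-in for ChainFetcher, not its source.
type chainFetcher struct {
	primary, fallback Fetcher
}

// Name returns a hypothetical identifier; the real ChainFetcher's name is not documented.
func (c *chainFetcher) Name() string { return "chain" }

func (c *chainFetcher) Fetch(ctx context.Context, url string) (*FetchResult, error) {
	res, err := c.primary.Fetch(ctx, url)
	// No fallback: behave exactly like the primary fetcher.
	// Assumption: primary errors are returned unchanged rather than retried.
	if c.fallback == nil || err != nil {
		return res, err
	}
	if res != nil && len(strings.TrimSpace(res.Content)) >= minContentLen {
		return res, nil
	}
	// Empty or very short content (likely a JS-rendered shell): try the fallback.
	if fb, fbErr := c.fallback.Fetch(ctx, url); fbErr == nil {
		return fb, nil
	}
	// Assumption: if the fallback also fails, keep the thin primary result.
	return res, nil
}

// staticFetcher is a hypothetical stub that always returns the same content.
type staticFetcher struct{ name, content string }

func (s staticFetcher) Name() string { return s.name }

func (s staticFetcher) Fetch(_ context.Context, url string) (*FetchResult, error) {
	return &FetchResult{URL: url, Content: s.content, BytesFetched: len(s.content)}, nil
}

// sampleArticle is long enough to pass the hypothetical minContentLen check.
var sampleArticle = strings.Repeat("Rendered article text. ", 20)

// chainedContent runs the chain over stub fetchers and returns the winning content.
func chainedContent(primaryContent, fallbackContent string, withFallback bool) string {
	var fallback Fetcher // stays a nil interface unless withFallback is set
	if withFallback {
		fallback = staticFetcher{name: "enhanced", content: fallbackContent}
	}
	c := &chainFetcher{primary: staticFetcher{name: "default", content: primaryContent}, fallback: fallback}
	res, err := c.Fetch(context.Background(), "https://example.com")
	if err != nil || res == nil {
		return ""
	}
	return res.Content
}

func main() {
	fmt.Println(chainedContent("Loading...", sampleArticle, true) == sampleArticle) // true: thin primary, fallback used
	fmt.Println(chainedContent("Loading...", sampleArticle, false))                 // Loading...: nil fallback
	fmt.Println(chainedContent(sampleArticle, "unused", true) == sampleArticle)     // true: primary long enough
}
```

This sketch hands back the fallback's result whenever the fallback succeeds, even if that result is also short; the real ChainFetcher may compare the two.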

func (*ChainFetcher) Name

func (c *ChainFetcher) Name() string

Name returns the fetcher identifier.

type DefaultFetcher

type DefaultFetcher struct {
	// contains filtered or unexported fields
}

DefaultFetcher fetches URLs via HTTP and converts HTML to Markdown.

func NewDefaultFetcher

func NewDefaultFetcher(opts Options) *DefaultFetcher

NewDefaultFetcher creates a fetcher with the given options.

func (*DefaultFetcher) Fetch

func (f *DefaultFetcher) Fetch(ctx context.Context, rawURL string) (*FetchResult, error)

func (*DefaultFetcher) Name

func (f *DefaultFetcher) Name() string

Name returns the fetcher identifier.

type FetchResult

type FetchResult struct {
	URL          string `json:"url"`
	Title        string `json:"title"`
	Content      string `json:"content"` // Markdown
	ContentType  string `json:"content_type"`
	BytesFetched int    `json:"bytes_fetched"`
}

FetchResult holds the fetched and converted content from a URL.
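Because every FetchResult field carries a JSON tag, a result serializes with snake_case keys. A quick check with encoding/json, using illustrative sample values (encodeResult is a hypothetical helper, and the struct is copied locally so the example compiles standalone):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FetchResult is copied from the documented type so the example compiles standalone.
type FetchResult struct {
	URL          string `json:"url"`
	Title        string `json:"title"`
	Content      string `json:"content"` // Markdown
	ContentType  string `json:"content_type"`
	BytesFetched int    `json:"bytes_fetched"`
}

// encodeResult marshals r using the struct tags above.
func encodeResult(r FetchResult) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	// Illustrative sample values, not real fetch output.
	s, err := encodeResult(FetchResult{
		URL:          "https://example.com",
		Title:        "Example Domain",
		Content:      "# Example Domain",
		ContentType:  "text/html",
		BytesFetched: 1256,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// {"url":"https://example.com","title":"Example Domain","content":"# Example Domain","content_type":"text/html","bytes_fetched":1256}
}
```

None of the tags use omitempty, so empty fields such as a missing title still appear in the output as empty strings.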

type Fetcher

type Fetcher interface {
	Fetch(ctx context.Context, url string) (*FetchResult, error)
	Name() string
}

Fetcher retrieves content from a URL and returns it as Markdown.
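Any type with these two methods satisfies Fetcher, which makes it easy to substitute a fake in tests. The sketch redeclares the interface locally; mapFetcher and describe are hypothetical names, and returning an error for unknown URLs is a design choice of the example, not something the package specifies.

```go
package main

import (
	"context"
	"fmt"
)

// FetchResult and Fetcher mirror the documented webfetch types (JSON tags omitted).
type FetchResult struct {
	URL          string
	Title        string
	Content      string
	ContentType  string
	BytesFetched int
}

type Fetcher interface {
	Fetch(ctx context.Context, url string) (*FetchResult, error)
	Name() string
}

// mapFetcher is a hypothetical in-memory Fetcher keyed by URL, useful in tests
// that must not touch the network.
type mapFetcher map[string]string

// Compile-time check that mapFetcher satisfies Fetcher.
var _ Fetcher = mapFetcher(nil)

func (m mapFetcher) Name() string { return "map" }

func (m mapFetcher) Fetch(ctx context.Context, url string) (*FetchResult, error) {
	if err := ctx.Err(); err != nil {
		return nil, err // honor cancellation like a real fetcher would
	}
	md, ok := m[url]
	if !ok {
		return nil, fmt.Errorf("%s: no content for %q", m.Name(), url)
	}
	return &FetchResult{URL: url, Content: md, ContentType: "text/markdown", BytesFetched: len(md)}, nil
}

// describe fetches url through any Fetcher and summarizes the outcome.
func describe(f Fetcher, url string) string {
	res, err := f.Fetch(context.Background(), url)
	if err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprintf("%s fetched %d bytes", f.Name(), res.BytesFetched)
}

func main() {
	f := mapFetcher{"https://example.com": "# Hello"}
	fmt.Println(describe(f, "https://example.com"))     // map fetched 7 bytes
	fmt.Println(describe(f, "https://missing.example")) // error: map: no content for "https://missing.example"
}
```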

type JinaFetcher

type JinaFetcher struct {
	// contains filtered or unexported fields
}

JinaFetcher uses the Jina Reader API (r.jina.ai) to fetch URLs and convert them to Markdown. Jina handles JavaScript rendering, making it suitable for JS-heavy pages that the DefaultFetcher cannot process.
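Jina Reader is typically called by prefixing the target URL with https://r.jina.ai/. The sketch below assumes JinaFetcher builds its request that way; jinaReaderURL and newJinaRequest are hypothetical helpers, the http(s)-only validation is an illustrative choice, and any headers or API keys the package may send are not shown. It builds the request without sending it.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/url"
)

// jinaBase is the public Jina Reader endpoint. Prefixing a page URL with it
// asks Jina to render the page and return its readable content.
const jinaBase = "https://r.jina.ai/"

// jinaReaderURL builds the Reader URL for rawURL, accepting only absolute http(s) URLs.
func jinaReaderURL(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return "", fmt.Errorf("not an absolute http(s) URL: %q", rawURL)
	}
	return jinaBase + u.String(), nil
}

// newJinaRequest prepares, but does not send, a GET request to Jina Reader.
func newJinaRequest(ctx context.Context, rawURL string) (*http.Request, error) {
	target, err := jinaReaderURL(rawURL)
	if err != nil {
		return nil, err
	}
	return http.NewRequestWithContext(ctx, http.MethodGet, target, nil)
}

func main() {
	req, err := newJinaRequest(context.Background(), "https://example.com/docs")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String()) // https://r.jina.ai/https://example.com/docs
	// Sending req with an http.Client whose Timeout matches NewJinaFetcher's
	// timeout argument would perform the actual network call.
}
```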

func NewJinaFetcher

func NewJinaFetcher(timeout time.Duration, logger *slog.Logger) *JinaFetcher

NewJinaFetcher creates a Jina Reader fetcher.

func (*JinaFetcher) Fetch

func (j *JinaFetcher) Fetch(ctx context.Context, rawURL string) (*FetchResult, error)

Fetch retrieves a URL via Jina Reader and returns Markdown content.

func (*JinaFetcher) Name

func (j *JinaFetcher) Name() string

Name returns the fetcher identifier.

type Options

type Options struct {
	Timeout          time.Duration
	MaxSizeBytes     int64
	UserAgent        string
	RespectRobotsTxt bool
	RespectAgentsTxt bool
	Logger           *slog.Logger
}

Options configures a DefaultFetcher.
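RespectRobotsTxt implies checking a site's robots.txt before fetching. As a rough sketch (not the package's code), parseRobots and allowed are hypothetical helpers that apply a simplified version of the RFC 9309 matching rules: the group for the exact product token is used if present, otherwise the * group; the longest matching path prefix wins and Allow wins ties. Wildcards (* and $ in paths), downloading /robots.txt itself, caching, and agents.txt handling are all left out.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// rule is one Allow or Disallow line.
type rule struct {
	allow  bool
	prefix string
}

// parseRobots groups rules by lower-cased user-agent token. Groups with no
// rules are still recorded, since an empty group means "allow everything".
func parseRobots(body string) map[string][]rule {
	groups := map[string][]rule{}
	var agents []string
	inRules := false // set once the current group has seen a rule line
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop comments
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		switch key {
		case "user-agent":
			if inRules { // a User-agent after rules starts a new group
				agents, inRules = nil, false
			}
			name := strings.ToLower(val)
			agents = append(agents, name)
			if _, seen := groups[name]; !seen {
				groups[name] = nil
			}
		case "allow", "disallow":
			inRules = true
			if val == "" {
				continue // an empty rule matches nothing
			}
			for _, a := range agents {
				groups[a] = append(groups[a], rule{allow: key == "allow", prefix: val})
			}
		}
	}
	return groups
}

// allowed applies the matching group (exact token, else "*"): the longest
// matching prefix wins, and Allow wins a tie. No matching rule means allowed.
func allowed(groups map[string][]rule, token, path string) bool {
	rules, ok := groups[strings.ToLower(token)]
	if !ok {
		rules = groups["*"]
	}
	best, allow := -1, true
	for _, r := range rules {
		if !strings.HasPrefix(path, r.prefix) {
			continue
		}
		if n := len(r.prefix); n > best || (n == best && r.allow) {
			best, allow = n, r.allow
		}
	}
	return allow
}

func main() {
	robots := "User-agent: *\nDisallow: /private\nAllow: /private/public\n\nUser-agent: webfetch-sketch\nDisallow:\n"
	g := parseRobots(robots)
	fmt.Println(allowed(g, "other-bot", "/private/x"))           // false
	fmt.Println(allowed(g, "other-bot", "/private/public/page")) // true
	fmt.Println(allowed(g, "webfetch-sketch", "/private/x"))     // true
}
```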
