fetch

package
v0.1.7 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 2, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Overview

Package fetch provides URL content extraction using go-defuddle. It fetches a web page via HTTP (with optional browser fallback) and extracts clean content (HTML or Markdown).

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Fetcher

type Fetcher struct {
	// contains filtered or unexported fields
}

Fetcher extracts clean content from web pages.

func New

func New(tp *transport.Transport) (*Fetcher, error)

New creates a new Fetcher backed by the given transport. Call Close() when done.

func (*Fetcher) Close

func (f *Fetcher) Close()

Close releases the resources held by the Fetcher.

func (*Fetcher) Fetch

func (f *Fetcher) Fetch(ctx context.Context, url string, opts *Options) (*Result, error)

Fetch retrieves a URL and extracts clean content. It tries a plain HTTP fetch first and falls back to browser-based fetching if the HTTP result is poor.

type Options

type Options struct {
	// Markdown converts extracted HTML to Markdown.
	Markdown bool
	// UseBrowser forces browser-based fetching (level 2).
	UseBrowser bool
	// PauseFunc runs after browser navigation and before HTML extraction.
	PauseFunc transport.PauseFunc
}

Options controls fetch behavior.

type Result

type Result struct {
	// Title is the page title.
	Title string `json:"title"`
	// Description is the meta description.
	Description string `json:"description"`
	// Domain is the hostname.
	Domain string `json:"domain"`
	// Author is the author name.
	Author string `json:"author"`
	// Published is the publication date.
	Published string `json:"published"`
	// Content is the extracted main content as clean HTML.
	Content string `json:"content"`
	// Markdown is the content converted to Markdown.
	Markdown string `json:"markdown,omitempty"`
	// WordCount is the number of words in the extracted content.
	WordCount int `json:"wordCount"`
}

Result holds the extracted content from a web page.
