Documentation ¶
Overview ¶
Package fetch provides URL content extraction using go-defuddle. It fetches a web page over HTTP, with an optional browser fallback, and extracts the clean content as HTML or Markdown.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Fetcher ¶
type Fetcher struct {
	// contains filtered or unexported fields
}
Fetcher extracts clean content from web pages.
type Options ¶
type Options struct {
	// Markdown converts extracted HTML to Markdown.
	Markdown bool
	// UseBrowser forces browser-based fetching (level 2).
	UseBrowser bool
	// PauseFunc runs after browser navigation and before HTML extraction.
	PauseFunc transport.PauseFunc
}
Options controls fetch behavior.
type Result ¶
type Result struct {
	// Title is the page title.
	Title string `json:"title"`
	// Description is the meta description.
	Description string `json:"description"`
	// Domain is the hostname.
	Domain string `json:"domain"`
	// Author is the author name.
	Author string `json:"author"`
	// Published is the publish date.
	Published string `json:"published"`
	// Content is the extracted main content as clean HTML.
	Content string `json:"content"`
	// Markdown is the content converted to Markdown.
	Markdown string `json:"markdown,omitempty"`
	// WordCount is the word count of extracted content.
	WordCount int `json:"wordCount"`
}
Result holds the extracted content from a web page.
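The struct tags above determine how a Result serializes to JSON. The following is a minimal sketch of that behavior, not part of the package: it declares a local copy of Result so that it compiles on its own, and the helper encodeResult and the sample values are hypothetical. Real code would use the package's own Result type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result is a local copy of the documented struct, declared here only so
// the sketch is self-contained. It is an assumption that real callers
// would import the package's Result instead.
type Result struct {
	Title       string `json:"title"`
	Description string `json:"description"`
	Domain      string `json:"domain"`
	Author      string `json:"author"`
	Published   string `json:"published"`
	Content     string `json:"content"`
	Markdown    string `json:"markdown,omitempty"`
	WordCount   int    `json:"wordCount"`
}

// encodeResult is a hypothetical helper that serializes a Result with
// encoding/json, honoring the struct tags shown in the documentation.
func encodeResult(r Result) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Sample values are invented for illustration.
	r := Result{
		Title:     "Example",
		Domain:    "example.com",
		Content:   "Hello world",
		WordCount: 2,
	}

	out, err := encodeResult(r)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// Markdown is empty here, so the "markdown" key is omitted entirely.
	fmt.Println(out)
}
```

Because only Markdown carries omitempty, every other field appears in the output even when it holds its zero value. Note also that json.Marshal escapes characters such as < and > by default, so HTML in Content will appear as \u003c and \u003e sequences unless a json.Encoder with SetEscapeHTML(false) is used.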