Documentation ¶
Index ¶
Constants ¶
This section is empty.
Variables ¶
var (
	DefaultUserAgent = "Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
	DefaultTimeout   = 30 * time.Second
	DefaultOptions   = &Options{
		FallbackConfig: trafilaturaFallback,
		HttpClient:     &http.Client{},
		Timeout:        DefaultTimeout,
		Transport:      nil,
		UserAgent:      fetch.DefaultUserAgent,
	}
)
Functions ¶
Types ¶
type Options ¶
type Options struct {
	FallbackConfig *trafilatura.FallbackConfig
	HttpClient     *http.Client
	UserAgent      string
	Transport      http.RoundTripper
	Timeout        time.Duration
}
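This page does not show how the fetcher applies these fields internally. As a rough sketch of one common way that HttpClient, Transport, Timeout and UserAgent can be combined into a single client, the example below uses only the standard library. The names clientOptions, userAgentTransport, buildClient and echoUserAgent are hypothetical and are not part of this package.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// clientOptions mirrors the HTTP-related fields of Options.
// It is a hypothetical stand-in used only for this illustration.
type clientOptions struct {
	HttpClient *http.Client
	UserAgent  string
	Transport  http.RoundTripper
	Timeout    time.Duration
}

// userAgentTransport wraps another RoundTripper and sets the User-Agent header.
type userAgentTransport struct {
	base      http.RoundTripper
	userAgent string
}

func (t *userAgentTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so work on a clone.
	clone := req.Clone(req.Context())
	clone.Header.Set("User-Agent", t.userAgent)
	return t.base.RoundTrip(clone)
}

// buildClient is an assumed helper, not the library's own wiring.
func buildClient(o clientOptions) *http.Client {
	client := &http.Client{}
	if o.HttpClient != nil {
		copied := *o.HttpClient // copy so the caller's client is not mutated
		client = &copied
	}
	base := o.Transport
	if base == nil {
		base = client.Transport
	}
	if base == nil {
		base = http.DefaultTransport
	}
	client.Transport = &userAgentTransport{base: base, userAgent: o.UserAgent}
	client.Timeout = o.Timeout
	return client
}

// echoUserAgent sends one request to a local test server that replies with
// the User-Agent it received, and returns that reply.
func echoUserAgent(o clientOptions) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.UserAgent())
	}))
	defer srv.Close()

	resp, err := buildClient(o).Get(srv.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	ua, err := echoUserAgent(clientOptions{
		UserAgent: "example-agent/1.0",
		Timeout:   30 * time.Second,
	})
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(ua) // prints: example-agent/1.0
}
```

Wrapping the transport, rather than setting the header at each call site, keeps the User-Agent consistent for every request the client makes, including redirects.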
type TrafilaturaFetcher ¶
type TrafilaturaFetcher struct {
// contains filtered or unexported fields
}
func NewTrafilaturaFetcher ¶
func NewTrafilaturaFetcher(options Options) *TrafilaturaFetcher
func (*TrafilaturaFetcher) Close ¶
func (f *TrafilaturaFetcher) Close() error
func (*TrafilaturaFetcher) Fetch ¶
Fetch fetches a URL and returns a WebPage resource. The page is fetched and parsed using the Trafilatura library, and the returned resource contains the extracted metadata and content text. The request's StatusCode is set to the HTTP status code returned by the server. If fetching the page fails, Fetch returns an error and the *resource.WebPage still contains partial data pertaining to the request.
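The signature of Fetch and the fields of resource.WebPage are not shown on this page, so the following is only a sketch of the "partial result plus error" contract described above, written against the standard library. WebPage, fetchPage and demoNotFound are hypothetical stand-ins, and treating any status of 400 or above as an error is an assumption made for the example.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// WebPage is a hypothetical stand-in for resource.WebPage.
type WebPage struct {
	URL        string
	StatusCode int
	FetchedAt  time.Time
}

// fetchPage is illustrative only; it is not the library's Fetch method.
// It always returns a non-nil *WebPage, even when it also returns an error,
// so callers can inspect whatever request data was collected.
func fetchPage(ctx context.Context, client *http.Client, url string) (*WebPage, error) {
	page := &WebPage{URL: url, FetchedAt: time.Now()}

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return page, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return page, err // StatusCode stays 0: no response was received
	}
	defer resp.Body.Close()

	page.StatusCode = resp.StatusCode
	if resp.StatusCode >= 400 {
		return page, fmt.Errorf("unexpected status %d fetching %s", resp.StatusCode, url)
	}
	return page, nil
}

// demoNotFound fetches from a local test server that always answers 404.
func demoNotFound() (*WebPage, error) {
	srv := httptest.NewServer(http.NotFoundHandler())
	defer srv.Close()
	return fetchPage(context.Background(), srv.Client(), srv.URL)
}

func main() {
	page, err := demoNotFound()
	// The page is usable even though err is non-nil.
	fmt.Println(page.StatusCode, err != nil) // prints: 404 true
}
```

A caller following this pattern checks the error first, then reads the partial resource for logging or retry decisions instead of discarding it.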