discovery

package
v0.18.0
Warning

This package is not in the latest version of its module.

Published: Apr 29, 2026 License: GPL-3.0 Imports: 16 Imported by: 0

Documentation

Constants

const (
	SourceRSSLink      model.DiscoverySource = "rss-link"
	SourceSitemap      model.DiscoverySource = "sitemap"
	SourceDOMContainer model.DiscoverySource = "dom-container"
)

Source constants for DiscoveredFeed.Source.

const (

	// AcceptThreshold is the confidence score above which results are accepted without sampling.
	AcceptThreshold = 0.70
)

Variables

This section is empty.

Functions

func RefineByUrlPattern

func RefineByUrlPattern(allEntries, validEntries []*model.Recipe) (string, int)

RefineByUrlPattern derives the common URL path prefix from validEntries and returns that pattern together with the number of entries in allEntries whose URL matches it.
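The behaviour described above can be approximated in a few lines. This is a minimal sketch, not the package's implementation: `recipe`, `pathOf`, and `refineByPrefix` are hypothetical stand-ins, and it assumes that "matches" means a plain path-prefix check and that the pattern always ends on a `/` boundary.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// recipe is a hypothetical stand-in for model.Recipe; only the URL matters here.
type recipe struct{ Url string }

// pathOf returns the path component of raw, or raw itself if it does not parse.
func pathOf(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return raw
	}
	return u.Path
}

// refineByPrefix derives a shared path prefix from valid and counts how many
// entries in all fall under it. Assumption: matching is a simple prefix test.
func refineByPrefix(all, valid []*recipe) (string, int) {
	if len(valid) == 0 {
		return "", 0
	}
	prefix := pathOf(valid[0].Url)
	for _, r := range valid[1:] {
		p := pathOf(r.Url)
		n := 0
		for n < len(prefix) && n < len(p) && prefix[n] == p[n] {
			n++
		}
		prefix = prefix[:n]
	}
	// Cut back to the last slash so the pattern ends on a directory boundary.
	prefix = prefix[:strings.LastIndex(prefix, "/")+1]

	count := 0
	for _, r := range all {
		if strings.HasPrefix(pathOf(r.Url), prefix) {
			count++
		}
	}
	return prefix, count
}

func main() {
	valid := []*recipe{{Url: "/recipes/pasta"}, {Url: "/recipes/chicken"}}
	all := []*recipe{
		{Url: "/recipes/pasta"}, {Url: "/recipes/chicken"}, {Url: "/recipes/soup"},
		{Url: "/about"}, {Url: "/blog/news"},
	}
	pattern, n := refineByPrefix(all, valid)
	fmt.Println(pattern, n) // prints "/recipes/ 3"
}
```

Note that when the valid entries share nothing beyond the root, this sketch returns `/` and counts every entry, which may or may not be what the real function does.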

func ReplayDiscovered

func ReplayDiscovered(data *model.DataInput, feed *model.Feed, d *model.DiscoveredFeed) error

ReplayDiscovered replays a previously discovered feed configuration.

func ScrapeFeed

func ScrapeFeed(data *model.DataInput, feed *model.Feed, sampling SamplingOptions) error

ScrapeFeed runs the discovery pipeline against the given DataInput. On success, feed.Entries contains stub recipes (Url only) and feed.Discovered describes how they were found. When sampling.Validator is set, it is called with a sample of URLs from each DOM candidate group before that group is committed. A group in which fewer than half of the sampled URLs are recipes is skipped, and the next-best group is tried instead.
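The group-selection rule can be illustrated in isolation. The following is a sketch under stated assumptions rather than the pipeline's actual code: `recipe`, `groupValidator`, and `pickGroup` are hypothetical names, groups are assumed to arrive ranked best first, and the sample is simply the first `sampleSize` URLs (the real sampling strategy is not documented here).

```go
package main

import (
	"fmt"
	"strings"
)

// recipe and groupValidator are hypothetical mirrors of model.Recipe and
// GroupValidator so that the example compiles on its own.
type recipe struct{ Url string }

type groupValidator func(urls []string) []*recipe

// pickGroup returns the index of the first candidate group that survives
// sampling, or -1 if none does. Groups are assumed to be ranked best first.
func pickGroup(groups [][]string, validate groupValidator, sampleSize int) int {
	for i, urls := range groups {
		if len(urls) == 0 {
			continue
		}
		if validate == nil || sampleSize <= 0 {
			return i // sampling disabled: take the best-ranked group as-is
		}
		sample := urls
		if len(sample) > sampleSize {
			sample = sample[:sampleSize] // assumption: sample the first N URLs
		}
		confirmed := validate(sample)
		if len(confirmed)*2 < len(sample) {
			continue // fewer than half confirmed: try the next-best group
		}
		return i
	}
	return -1
}

func main() {
	groups := [][]string{
		{"/tag/quick", "/tag/vegan", "/recipes/soup"},
		{"/recipes/pasta", "/recipes/chicken", "/recipes/salad"},
	}
	// Toy validator: treats anything under /recipes/ as a confirmed recipe.
	validate := func(urls []string) []*recipe {
		var out []*recipe
		for _, u := range urls {
			if strings.HasPrefix(u, "/recipes/") {
				out = append(out, &recipe{Url: u})
			}
		}
		return out
	}
	fmt.Println(pickGroup(groups, validate, 3)) // prints 1
}
```

The first group is rejected because only one of its three sampled URLs is confirmed; the second passes with three of three.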

func UrlPathPattern

func UrlPathPattern(urls []string) string

UrlPathPattern returns the common path prefix of URLs as a string. Example: ["/recipes/pasta", "/recipes/chicken"] → "/recipes/"
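One way to compute such a prefix is to compare paths segment by segment. The sketch below is an illustration, not the package's source: `commonPathPrefix` is a hypothetical name, and dropping each URL's final segment (the page itself) before comparing is an assumption made so that the result always names a directory.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// commonPathPrefix returns the directory prefix shared by the paths of urls.
// It is a hypothetical stand-in for UrlPathPattern, written for illustration.
func commonPathPrefix(urls []string) string {
	var common []string
	seen := false
	for _, raw := range urls {
		u, err := url.Parse(raw)
		if err != nil {
			continue // skip inputs that are not parseable URLs
		}
		segs := strings.Split(strings.Trim(u.Path, "/"), "/")
		dirs := segs[:len(segs)-1] // drop the final segment (the page itself)
		if !seen {
			common, seen = dirs, true
			continue
		}
		n := 0
		for n < len(common) && n < len(dirs) && common[n] == dirs[n] {
			n++
		}
		common = common[:n]
	}
	if !seen {
		return ""
	}
	if len(common) == 0 {
		return "/"
	}
	return "/" + strings.Join(common, "/") + "/"
}

func main() {
	fmt.Println(commonPathPrefix([]string{"/recipes/pasta", "/recipes/chicken"}))
	// prints "/recipes/"
	fmt.Println(commonPathPrefix([]string{
		"https://example.com/blog/2024/a",
		"https://example.com/blog/2023/b",
	}))
	// prints "/blog/"
}
```

Comparing whole segments rather than raw characters avoids returning a prefix that stops in the middle of a path component.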

Types

type GroupValidator added in v0.18.0

type GroupValidator func(urls []string) []*model.Recipe

GroupValidator samples a slice of candidate URLs and returns the subset that are confirmed recipe pages (already scraped, ready to merge into feed entries).

type SamplingOptions added in v0.18.0

type SamplingOptions struct {
	// Validator is called with a sample of candidate URLs; nil disables validation.
	Validator GroupValidator
	// SampleSize is the number of URLs to sample per candidate group; 0 disables sampling.
	SampleSize int
}

SamplingOptions configures optional URL validation during DOM discovery. The zero value disables sampling entirely.
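To make the zero-value behaviour concrete, the sketch below mirrors the two types locally so that it compiles without the package. `samplingOptions`, `enabled`, and `validatorFrom` are hypothetical names; a real GroupValidator would return fully scraped recipes, whereas this toy version only fills in the URL.

```go
package main

import (
	"fmt"
	"strings"
)

// Local, hypothetical mirrors of model.Recipe, GroupValidator and SamplingOptions.
type recipe struct{ Url string }

type groupValidator func(urls []string) []*recipe

type samplingOptions struct {
	Validator  groupValidator
	SampleSize int
}

// enabled reports whether sampling would run under the documented rules:
// a nil Validator or a SampleSize of 0 turns it off.
func (o samplingOptions) enabled() bool {
	return o.Validator != nil && o.SampleSize > 0
}

// validatorFrom adapts a per-URL predicate into a groupValidator.
func validatorFrom(isRecipe func(string) bool) groupValidator {
	return func(urls []string) []*recipe {
		var out []*recipe
		for _, u := range urls {
			if isRecipe(u) {
				out = append(out, &recipe{Url: u})
			}
		}
		return out
	}
}

func main() {
	var zero samplingOptions
	fmt.Println(zero.enabled()) // prints false

	opts := samplingOptions{
		Validator: validatorFrom(func(u string) bool {
			return strings.HasPrefix(u, "/recipes/")
		}),
		SampleSize: 5,
	}
	fmt.Println(opts.enabled()) // prints true
	fmt.Println(len(opts.Validator([]string{"/recipes/pasta", "/about"}))) // prints 1
}
```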
