scraperutil

package
v0.2.0-alpha
Published: Apr 5, 2026 License: MIT Imports: 0 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetDefaultScraperSettings

func GetDefaultScraperSettings() map[string]any

GetDefaultScraperSettings returns a copy of all registered scraper default settings as a map[string]any keyed by scraper name. Callers (config.go) type-assert each value to config.ScraperSettings.

func GetPriorities

func GetPriorities() []string

GetPriorities returns scraper names sorted by priority, highest first. It derives the order from defaultScraperSettingsRegistry so that the registry remains the single source of truth. Scraper packages populate that registry by calling RegisterDefaultScraperSettings, which is chained from scraper.RegisterScraperDefaults in scraper/plugin.go.
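The ordering described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the entry type, the registry variable, and the register and priorities helpers are hypothetical stand-ins, and breaking ties by name is an assumption made only to keep the output deterministic. Locking is omitted for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical registry record; the real internal layout is not documented.
type entry struct {
	settings any
	priority int
}

// registry stands in for defaultScraperSettingsRegistry.
var registry = map[string]entry{}

// register mirrors the shape of RegisterDefaultScraperSettings.
func register(name string, settings any, priority int) {
	registry[name] = entry{settings: settings, priority: priority}
}

// priorities returns the registered names sorted by priority, highest first.
// Equal priorities are ordered by name (an assumption, for determinism).
func priorities() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		pi, pj := registry[names[i]].priority, registry[names[j]].priority
		if pi != pj {
			return pi > pj
		}
		return names[i] < names[j]
	})
	return names
}

func main() {
	register("alpha", nil, 10)
	register("beta", nil, 50)
	register("gamma", nil, 30)
	fmt.Println(priorities()) // [beta gamma alpha]
}
```

Because the order is computed from the one registry on every call, there is no second priority table that could drift out of sync with the registered settings.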

func GetScraperConfigs

func GetScraperConfigs() map[string]ScraperConfigAccessor

GetScraperConfigs returns a copy of all registered scraper config accessors.

func RegisterConfigFactory

func RegisterConfigFactory(name string, factory ConfigFactory)

RegisterConfigFactory registers a scraper config factory function. Called from each scraper package's init() function. The factory returns a new empty instance of the scraper's config struct.

func RegisterDefaultScraperSettings

func RegisterDefaultScraperSettings(name string, settings any, priority int)

RegisterDefaultScraperSettings registers a scraper's default settings and priority. Called from each scraper package's init() function. The settings value is stored as any to avoid an import cycle with the config package.
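A minimal sketch of the register-in-init, assert-on-read pattern follows. ScraperSettings here is a hypothetical stand-in for config.ScraperSettings, and defaults, registerDefaults, and defaultsCopy are illustrative names rather than the package's internals. The priority argument is left out to keep the example short.

```go
package main

import "fmt"

// ScraperSettings is a hypothetical stand-in for config.ScraperSettings.
type ScraperSettings struct {
	Enabled      bool
	RequestDelay int
}

// defaults stands in for the package's settings registry.
// Values are stored as any so the registry needs no import of the config package.
var defaults = map[string]any{}

// registerDefaults mirrors RegisterDefaultScraperSettings without the priority.
func registerDefaults(name string, settings any) {
	defaults[name] = settings
}

// defaultsCopy returns a shallow copy, so callers cannot add or remove registry entries.
func defaultsCopy() map[string]any {
	out := make(map[string]any, len(defaults))
	for k, v := range defaults {
		out[k] = v
	}
	return out
}

// init plays the role of a scraper package's init() function.
func init() {
	registerDefaults("example", ScraperSettings{Enabled: true, RequestDelay: 500})
}

func main() {
	for name, raw := range defaultsCopy() {
		// The consumer recovers the concrete type with a checked assertion.
		s, ok := raw.(ScraperSettings)
		if !ok {
			fmt.Printf("%s: unexpected type %T\n", name, raw)
			continue
		}
		fmt.Printf("%s: enabled=%t delay=%d\n", name, s.Enabled, s.RequestDelay)
	}
}
```

The checked (comma-ok) form of the assertion is worth the extra line: a scraper that registers the wrong type is reported instead of causing a panic.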

func RegisterFlattenFunc

func RegisterFlattenFunc(name string, fn FlattenFunc)

RegisterFlattenFunc registers a flatten function for a scraper.

func RegisterScraperConfig

func RegisterScraperConfig(name string, accessor ScraperConfigAccessor)

RegisterScraperConfig registers a scraper config field accessor. Called from each scraper package's init() function. The accessor function handles the type assertion from any to *config.ScrapersConfig.

func RegisterScraperOptions

func RegisterScraperOptions(name string, provider ScraperOptionsProvider)

RegisterScraperOptions registers a scraper's display name and options. Called from each scraper package's init() function to self-register UI metadata. This enables plug-and-play scrapers: no central switch statement is needed.

func RegisterValidator

func RegisterValidator(name string, fn ValidatorFunc)

RegisterValidator registers a scraper config validator function. This is called from each scraper package's init() function.

func ResetConfigFactories

func ResetConfigFactories()

ResetConfigFactories clears the config factory registry. Primarily used for test isolation.

func ResetDefaults

func ResetDefaults()

ResetDefaults clears the default scraper settings registry. Primarily used for test isolation. Note: the separate priorityRegistry has been removed; GetPriorities now derives its order from scraper.GetRegisteredDefaults().

func ResetFlattenFuncs

func ResetFlattenFuncs()

ResetFlattenFuncs clears the flatten registry. Primarily used for test isolation.

func ResetScraperConfigs

func ResetScraperConfigs()

ResetScraperConfigs clears the scraper config registry. Primarily used for test isolation.

func ResetValidators

func ResetValidators()

ResetValidators clears the validator registry. Primarily used for test isolation.
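The Reset functions above exist so that one test's registrations do not leak into the next. The sketch below shows the idea with a hypothetical validator registry; the mutex, the helper names, and withCleanRegistry are assumptions for illustration, not the package's code.

```go
package main

import (
	"fmt"
	"sync"
)

// ValidatorFunc matches the signature documented for this package.
type ValidatorFunc func(any) error

var (
	mu         sync.RWMutex
	validators = map[string]ValidatorFunc{} // hypothetical stand-in registry
)

func registerValidator(name string, fn ValidatorFunc) {
	mu.Lock()
	defer mu.Unlock()
	validators[name] = fn
}

func validatorCount() int {
	mu.RLock()
	defer mu.RUnlock()
	return len(validators)
}

// resetValidators replaces the map, dropping every registration.
func resetValidators() {
	mu.Lock()
	defer mu.Unlock()
	validators = map[string]ValidatorFunc{}
}

// withCleanRegistry runs fn against an empty registry and clears it again afterwards.
func withCleanRegistry(fn func()) {
	resetValidators()
	defer resetValidators()
	fn()
}

func main() {
	registerValidator("stale", func(any) error { return nil })
	withCleanRegistry(func() {
		registerValidator("example", func(any) error { return nil })
		fmt.Println("inside:", validatorCount())
	})
	fmt.Println("after:", validatorCount())
}
```

In a real test the same effect is usually achieved by calling the Reset function at the start of the test and registering it for teardown, for example with t.Cleanup(scraperutil.ResetValidators).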

Types

type ConfigFactory

type ConfigFactory func() any

ConfigFactory is a function that returns a new empty scraper config instance. The returned value is a pointer to a struct that implements yaml.Unmarshaler. Uses func() any to break the import cycle between scraperutil and config packages.

func GetConfigFactory

func GetConfigFactory(name string) ConfigFactory

GetConfigFactory returns the factory function for the named scraper.
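A sketch of how a factory might be used when decoding configuration. The documentation says the returned value implements yaml.Unmarshaler; to stay within the standard library, this example swaps in encoding/json for the decoding step. exampleConfig, factories, registerFactory, and decode are hypothetical names, and reporting an error for an unregistered name is an assumption about how a caller would treat a missing factory.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ConfigFactory matches the signature documented for this package.
type ConfigFactory func() any

// exampleConfig is a hypothetical scraper config struct.
type exampleConfig struct {
	Enabled   bool   `json:"enabled"`
	UserAgent string `json:"user_agent"`
}

// factories stands in for the package's factory registry.
var factories = map[string]ConfigFactory{}

func registerFactory(name string, f ConfigFactory) {
	factories[name] = f
}

// decode builds a fresh config instance for name and fills it from raw.
func decode(name string, raw []byte) (any, error) {
	factory, ok := factories[name]
	if !ok {
		return nil, fmt.Errorf("no config factory registered for %q", name)
	}
	cfg := factory() // a pointer, so the decoder can populate it in place
	if err := json.Unmarshal(raw, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	registerFactory("example", func() any { return &exampleConfig{} })
	cfg, err := decode("example", []byte(`{"enabled":true,"user_agent":"demo/1.0"}`))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Returning a new pointer on every call matters: decoding into a shared instance would let one load overwrite another.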

type FlattenFunc

type FlattenFunc func(any) any

FlattenFunc converts a scraper-specific flat config to the unified *config.ScraperSettings. Uses any to break the import cycle between scraperutil and config packages. The function type-asserts the any to the scraper's concrete config type internally.

func GetFlattenFunc

func GetFlattenFunc(name string) FlattenFunc

GetFlattenFunc returns the flatten function for a scraper.
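A flatten function might look like the sketch below. ScraperSettings stands in for config.ScraperSettings, and exampleConfig, its fields, and the Extra map are hypothetical; returning nil for an unexpected input type is an assumption, since the documentation does not say how a mismatch is handled.

```go
package main

import "fmt"

// FlattenFunc matches the signature documented for this package.
type FlattenFunc func(any) any

// ScraperSettings is a hypothetical stand-in for config.ScraperSettings.
type ScraperSettings struct {
	Enabled   bool
	UserAgent string
	Extra     map[string]any
}

// exampleConfig is a hypothetical scraper-specific flat config.
type exampleConfig struct {
	Enabled   bool
	UserAgent string
	Language  string
}

// flattenExample type-asserts its input to the concrete config type,
// then maps the fields onto the unified settings struct.
func flattenExample(raw any) any {
	cfg, ok := raw.(*exampleConfig)
	if !ok || cfg == nil {
		return nil // assumption: a mismatch yields nil rather than a panic
	}
	return &ScraperSettings{
		Enabled:   cfg.Enabled,
		UserAgent: cfg.UserAgent,
		Extra:     map[string]any{"language": cfg.Language},
	}
}

func main() {
	var fn FlattenFunc = flattenExample
	out := fn(&exampleConfig{Enabled: true, UserAgent: "demo/1.0", Language: "en"})
	if settings, ok := out.(*ScraperSettings); ok {
		fmt.Println(settings.UserAgent, settings.Extra["language"])
	}
}
```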

type ScraperConfigAccessor

type ScraperConfigAccessor func(any) any

ScraperConfigAccessor is a function that returns the scraper-specific *ScraperSettings from the shared ScrapersConfig. Uses any to break the import cycle between scraperutil and config packages. The accessor function is defined in the scraper package (which already imports config).
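An accessor could be written roughly as follows. ScrapersConfig and ScraperSettings are hypothetical stand-ins for the config package's types, and the Example field is invented for illustration. The explicit nil check is a deliberate choice: returning a nil *ScraperSettings through an any result would produce a non-nil interface value.

```go
package main

import "fmt"

// ScraperConfigAccessor matches the signature documented for this package.
type ScraperConfigAccessor func(any) any

// ScraperSettings and ScrapersConfig are hypothetical stand-ins for config types.
type ScraperSettings struct {
	Enabled bool
}

type ScrapersConfig struct {
	Example *ScraperSettings // hypothetical per-scraper field
}

// exampleAccessor asserts the shared config type and returns this scraper's settings.
func exampleAccessor(raw any) any {
	cfg, ok := raw.(*ScrapersConfig)
	if !ok || cfg == nil || cfg.Example == nil {
		return nil // untyped nil, so callers can compare the result with nil
	}
	return cfg.Example
}

func main() {
	var access ScraperConfigAccessor = exampleAccessor
	shared := &ScrapersConfig{Example: &ScraperSettings{Enabled: true}}
	if s, ok := access(shared).(*ScraperSettings); ok {
		fmt.Println("enabled:", s.Enabled)
	}
	fmt.Println(access(&ScrapersConfig{}) == nil)
}
```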

type ScraperConfigFlattenFunc

type ScraperConfigFlattenFunc func(ScraperConfigInterface) any

ScraperConfigFlattenFunc converts a ScraperConfigInterface to *config.ScraperSettings. Uses any to break the import cycle since scraperutil cannot import config.

type ScraperConfigInterface

type ScraperConfigInterface interface {
	IsEnabled() bool
	GetUserAgent() string
	GetRequestDelay() int
	GetMaxRetries() int
	GetProxy() any         // Returns *config.ProxyConfig
	GetDownloadProxy() any // Returns *config.ProxyConfig
}

ScraperConfigInterface is implemented by scraper-specific config types (both from scraper packages and config package). This breaks the type dependency cycle and allows FlattenFunc to work across packages.

Implement this interface on config package types by adding wrapper methods. Scraper package types already have compatible fields.
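The wrapper-method approach mentioned above might look like this. The interface is copied from the declaration; exampleConfig and ProxyConfig are hypothetical types standing in for a scraper config and config.ProxyConfig.

```go
package main

import "fmt"

// ScraperConfigInterface is copied from the package documentation.
type ScraperConfigInterface interface {
	IsEnabled() bool
	GetUserAgent() string
	GetRequestDelay() int
	GetMaxRetries() int
	GetProxy() any
	GetDownloadProxy() any
}

// ProxyConfig is a hypothetical stand-in for config.ProxyConfig.
type ProxyConfig struct {
	URL string
}

// exampleConfig is a hypothetical config type with compatible fields.
type exampleConfig struct {
	Enabled       bool
	UserAgent     string
	RequestDelay  int
	MaxRetries    int
	Proxy         *ProxyConfig
	DownloadProxy *ProxyConfig
}

// Wrapper methods expose the fields through the interface.
func (c *exampleConfig) IsEnabled() bool       { return c.Enabled }
func (c *exampleConfig) GetUserAgent() string  { return c.UserAgent }
func (c *exampleConfig) GetRequestDelay() int  { return c.RequestDelay }
func (c *exampleConfig) GetMaxRetries() int    { return c.MaxRetries }
func (c *exampleConfig) GetProxy() any         { return c.Proxy }
func (c *exampleConfig) GetDownloadProxy() any { return c.DownloadProxy }

// Compile-time check that the type satisfies the interface.
var _ ScraperConfigInterface = (*exampleConfig)(nil)

func main() {
	var cfg ScraperConfigInterface = &exampleConfig{
		Enabled:   true,
		UserAgent: "demo/1.0",
		Proxy:     &ProxyConfig{URL: "http://127.0.0.1:8080"},
	}
	if p, ok := cfg.GetProxy().(*ProxyConfig); ok && p != nil {
		fmt.Println("proxy:", p.URL)
	}
	// An unset proxy comes back as a typed nil pointer inside the any,
	// so check the asserted pointer rather than comparing the any with nil.
	if d, ok := cfg.GetDownloadProxy().(*ProxyConfig); ok && d == nil {
		fmt.Println("no download proxy configured")
	}
}
```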

type ScraperOptionsProvider

type ScraperOptionsProvider struct {
	DisplayName string // Human-readable name (e.g., "DMM/Fanza")
	Options     []any  // All configurable options for this scraper (API type-asserts to contracts.ScraperOption)
}

ScraperOptionsProvider holds a scraper's UI metadata for the API. Uses any to break the import cycle; the API layer handles the type assertion to contracts.ScraperOption.

func GetScraperOptions

func GetScraperOptions(name string) (ScraperOptionsProvider, bool)

GetScraperOptions retrieves a scraper's options provider. Used by the API to dynamically fetch scraper metadata. Returns (provider, exists), where exists is false if the scraper is not registered.
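The (provider, exists) lookup and the API-side assertion can be sketched like this. ScraperOption is a hypothetical stand-in for contracts.ScraperOption, and options, registerOptions, getOptions, and optionKeys are illustrative names. Silently skipping entries of the wrong type is an assumption.

```go
package main

import "fmt"

// ScraperOptionsProvider is copied from the package documentation.
type ScraperOptionsProvider struct {
	DisplayName string
	Options     []any
}

// ScraperOption is a hypothetical stand-in for contracts.ScraperOption.
type ScraperOption struct {
	Key   string
	Label string
}

// options stands in for the package's options registry.
var options = map[string]ScraperOptionsProvider{}

func registerOptions(name string, p ScraperOptionsProvider) {
	options[name] = p
}

// getOptions mirrors the (provider, exists) shape of GetScraperOptions.
func getOptions(name string) (ScraperOptionsProvider, bool) {
	p, ok := options[name]
	return p, ok
}

// optionKeys returns the keys of every option that asserts to ScraperOption.
func optionKeys(name string) ([]string, bool) {
	p, ok := getOptions(name)
	if !ok {
		return nil, false
	}
	keys := make([]string, 0, len(p.Options))
	for _, raw := range p.Options {
		if opt, ok := raw.(ScraperOption); ok {
			keys = append(keys, opt.Key)
		}
	}
	return keys, true
}

func main() {
	registerOptions("example", ScraperOptionsProvider{
		DisplayName: "Example",
		Options:     []any{ScraperOption{Key: "language", Label: "Language"}},
	})
	if keys, ok := optionKeys("example"); ok {
		fmt.Println(keys)
	}
	if _, ok := optionKeys("missing"); !ok {
		fmt.Println("scraper not registered")
	}
}
```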

type ValidatorFunc

type ValidatorFunc func(any) error

ValidatorFunc is a function that validates a scraper config. Uses any to break the import cycle between config and scraper packages. The validator function should type-assert the any to *config.ScraperSettings internally.

func GetValidator

func GetValidator(name string) ValidatorFunc

GetValidator returns the validator function for the named scraper.
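A validator following the description above might be written as in this sketch. ScraperSettings is a hypothetical stand-in for config.ScraperSettings, and the specific rules (non-negative delay and retry count) are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ValidatorFunc matches the signature documented for this package.
type ValidatorFunc func(any) error

// ScraperSettings is a hypothetical stand-in for config.ScraperSettings.
type ScraperSettings struct {
	RequestDelay int
	MaxRetries   int
}

// validateExample type-asserts its input and then applies illustrative rules.
func validateExample(raw any) error {
	s, ok := raw.(*ScraperSettings)
	if !ok || s == nil {
		return fmt.Errorf("example: expected non-nil *ScraperSettings, got %T", raw)
	}
	if s.RequestDelay < 0 {
		return errors.New("example: request delay must not be negative")
	}
	if s.MaxRetries < 0 {
		return errors.New("example: max retries must not be negative")
	}
	return nil
}

func main() {
	var validate ValidatorFunc = validateExample
	fmt.Println(validate(&ScraperSettings{RequestDelay: 100, MaxRetries: 3}))
	fmt.Println(validate(&ScraperSettings{RequestDelay: -1}))
	fmt.Println(validate("not settings"))
}
```

Returning an error for a failed assertion, rather than panicking, keeps a misregistered validator from taking down config loading.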
