Documentation ¶
Index ¶
- func CloseBrowser()
- func ClosePlaywright()
- func ExtractContentWithCSS(content, includeSelector string, excludeSelectors []string) (string, error)
- func FetchWebpageContent(urlStr string) (string, error)
- func InitBrowser() error
- func InitPlaywright() error
- func NormalizePathForFilename(urlPath string) string
- func ProcessHTMLContent(htmlContent string, config Config) (string, error)
- func SaveToFiles(content map[string]struct{ ... }, config Config) error
- func ScrapeSites(config Config) error
- func SetupLogger(verbose bool)
- type Config
- type PathOverride
- type ScrapeConfig
- type SiteConfig
- type URLConfig
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ClosePlaywright ¶
func ClosePlaywright()
ClosePlaywright closes the browser and stops Playwright
func ExtractContentWithCSS ¶
func ExtractContentWithCSS(content, includeSelector string, excludeSelectors []string) (string, error)
ExtractContentWithCSS extracts content from HTML using a CSS selector
func FetchWebpageContent ¶
func FetchWebpageContent(urlStr string) (string, error)
FetchWebpageContent retrieves the content of a webpage using Playwright
func InitPlaywright ¶
func InitPlaywright() error
InitPlaywright initializes Playwright and launches the browser
func NormalizePathForFilename ¶ added in v0.1.1
func NormalizePathForFilename(urlPath string) string
NormalizePathForFilename converts a URL path into a valid filename component
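The exact normalization rules are not documented here, so the following is only a minimal sketch of what such a conversion could look like. The helper name normalizePath and its rules are assumptions for illustration, not the package's implementation. It trims surrounding slashes, replaces every character outside [A-Za-z0-9._-] with an underscore, and falls back to "index" for an empty path.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePath is a hypothetical stand-in for NormalizePathForFilename.
// Assumed rules (not taken from the package): trim surrounding slashes,
// map every character outside [A-Za-z0-9._-] to '_', and return "index"
// when nothing is left after trimming.
func normalizePath(urlPath string) string {
	trimmed := strings.Trim(urlPath, "/")
	if trimmed == "" {
		return "index"
	}
	var b strings.Builder
	for _, r := range trimmed {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9',
			r == '.', r == '-', r == '_':
			b.WriteRune(r)
		default:
			b.WriteByte('_')
		}
	}
	return b.String()
}

func main() {
	for _, p := range []string{"/", "/docs/getting-started/", "/api/v1/users?id=1"} {
		fmt.Printf("%q -> %q\n", p, normalizePath(p))
	}
}
```

Under these assumed rules, "/docs/getting-started/" becomes "docs_getting-started". The real function may treat separators, query strings, or empty paths differently.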
func ProcessHTMLContent ¶
func ProcessHTMLContent(htmlContent string, config Config) (string, error)
ProcessHTMLContent converts HTML content to Markdown
func SaveToFiles ¶ added in v0.1.1
func SaveToFiles(content map[string]struct {
	content string
	site    SiteConfig
}, config Config) error
SaveToFiles writes the scraped content to files based on output type
func ScrapeSites ¶ added in v0.0.2
func ScrapeSites(config Config) error
func SetupLogger ¶
func SetupLogger(verbose bool)
SetupLogger initializes the logger based on the verbose flag
Types ¶
type Config ¶
type Config struct {
	Sites      []SiteConfig
	OutputType string
	Verbose    bool
	Scrape     ScrapeConfig
}
Config holds the scraper configuration
type PathOverride ¶ added in v0.0.2
PathOverride holds path-specific overrides
type ScrapeConfig ¶ added in v0.0.2
ScrapeConfig holds the scraping-specific configuration
type SiteConfig ¶ added in v0.0.2
type SiteConfig struct {
	BaseURL          string
	CSSLocator       string
	ExcludeSelectors []string
	AllowedPaths     []string
	ExcludePaths     []string
	FileNamePrefix   string
	PathOverrides    []PathOverride
}
SiteConfig holds configuration for a single site
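To show how these fields might fit together, here is a sketch that mirrors the documented Config and SiteConfig types locally so it compiles on its own. Several parts are assumptions:

- PathOverride and ScrapeConfig are empty placeholders, because their fields are not shown here.
- The OutputType value "single" is illustrative only.
- The helper pathAllowed guesses that AllowedPaths and ExcludePaths are matched as path prefixes, with exclusions taking precedence.

```go
package main

import (
	"fmt"
	"strings"
)

// Local mirrors of the documented types. PathOverride and ScrapeConfig are
// empty placeholders because their fields are not listed in this reference.
type PathOverride struct{}
type ScrapeConfig struct{}

type SiteConfig struct {
	BaseURL          string
	CSSLocator       string
	ExcludeSelectors []string
	AllowedPaths     []string
	ExcludePaths     []string
	FileNamePrefix   string
	PathOverrides    []PathOverride
}

type Config struct {
	Sites      []SiteConfig
	OutputType string
	Verbose    bool
	Scrape     ScrapeConfig
}

// pathAllowed is hypothetical. It ASSUMES prefix matching: a path is rejected
// if it starts with any ExcludePaths entry; otherwise it is accepted when
// AllowedPaths is empty or when it starts with one of its entries.
func pathAllowed(site SiteConfig, path string) bool {
	for _, p := range site.ExcludePaths {
		if strings.HasPrefix(path, p) {
			return false
		}
	}
	if len(site.AllowedPaths) == 0 {
		return true
	}
	for _, p := range site.AllowedPaths {
		if strings.HasPrefix(path, p) {
			return true
		}
	}
	return false
}

// exampleConfig builds a one-site configuration with illustrative values.
func exampleConfig() Config {
	return Config{
		Sites: []SiteConfig{{
			BaseURL:          "https://example.com",
			CSSLocator:       "main",
			ExcludeSelectors: []string{"nav", "footer"},
			AllowedPaths:     []string{"/docs"},
			ExcludePaths:     []string{"/docs/archive"},
			FileNamePrefix:   "example",
		}},
		OutputType: "single", // assumed value; valid options are not documented here
		Verbose:    true,
	}
}

func main() {
	site := exampleConfig().Sites[0]
	for _, p := range []string{"/docs/intro", "/docs/archive/2019", "/blog"} {
		fmt.Println(p, pathAllowed(site, p))
	}
}
```

With real code, the same literal would be built from the package's own types and passed to ScrapeSites. The matching semantics of the path fields should be checked against the package source before relying on them.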