Documentation ¶
Index ¶
- type SiteMapper
- func NewSiteMapper(options *SiteMapperOptions) *SiteMapper
- func (mapper *SiteMapper) EmptySitemapXML(baseDomain string) string
- func (mapper *SiteMapper) GenerateSitemap(baseDomain string, filterPattern string) (string, error)
- func (mapper *SiteMapper) RecrawlSite()
- type SiteMapperOptions
- func DefaultOptions() *SiteMapperOptions
- func (options *SiteMapperOptions) SetCallbackFunction(callback func(*SiteMapper))
- func (options *SiteMapperOptions) SetCrawlInterval(interval time.Duration) error
- func (options *SiteMapperOptions) SetDomain(domain string) error
- func (options *SiteMapperOptions) SetDurationBeforeFirstCrawl(duration time.Duration) error
- func (options *SiteMapperOptions) SetErrorLogger(logger func(error))
- func (options *SiteMapperOptions) SetInfoLogger(logger func(string))
- func (options *SiteMapperOptions) SetLinkAttributes(attributes ...string) error
- func (options *SiteMapperOptions) SetStartingURL(urlPath string) error
Types ¶
type SiteMapper ¶
type SiteMapper struct {
// contains filtered or unexported fields
}
SiteMapper is responsible for managing the crawling and sitemap generation for a specified domain. It schedules periodic crawls and allows manual recrawling.
SiteMapper uses a crawler to traverse the site and build the sitemap based on the configuration options.
func NewSiteMapper ¶
func NewSiteMapper(options *SiteMapperOptions) *SiteMapper
NewSiteMapper initializes and returns a new SiteMapper instance configured with the provided options.
The SiteMapper starts its first crawl after the delay configured with SetDurationBeforeFirstCrawl (stored in `options.durationBeforeFirstCrawl`) and then recrawls the site at the interval configured with SetCrawlInterval (stored in `options.crawlInterval`).
Parameters:
	options *sitemapper.SiteMapperOptions // Configuration options for the SiteMapper instance.
Returns:
	*sitemapper.SiteMapper // A new SiteMapper instance.
func (*SiteMapper) EmptySitemapXML ¶
func (mapper *SiteMapper) EmptySitemapXML(baseDomain string) string
func (*SiteMapper) GenerateSitemap ¶
func (mapper *SiteMapper) GenerateSitemap(baseDomain string, filterPattern string) (string, error)
func (*SiteMapper) RecrawlSite ¶
func (mapper *SiteMapper) RecrawlSite()
RecrawlSite triggers a manual recrawl of the site, bypassing the scheduled interval.
type SiteMapperOptions ¶
type SiteMapperOptions struct {
// contains filtered or unexported fields
}
SiteMapperOptions defines the configuration options for SiteMapper.
These options allow customization of crawling behavior, logging, and site-specific details.
func DefaultOptions ¶
func DefaultOptions() *SiteMapperOptions
DefaultOptions creates an instance of SiteMapperOptions with pre-defined default values.
- Domain defaults to "http://localhost:8080".
- Duration Before First Crawl defaults to 3 seconds.
- Crawl Interval defaults to one week.
- Starting URL defaults to "/".
- Link Attributes defaults to an empty list.
- Logging functions are empty by default and can be set later.
- Callback function is empty by default and can be set later.
func (*SiteMapperOptions) SetCallbackFunction ¶ added in v1.1.0
func (options *SiteMapperOptions) SetCallbackFunction(callback func(*SiteMapper))
SetCallbackFunction assigns a callback function that will be called after each website crawl. Example:
// Requires the fmt, net/http, and net/url packages.
options.SetCallbackFunction(func(mapper *SiteMapper) {
	// Tell the search engine that the sitemap has been regenerated.
	sitemapURL := "https://example.com/sitemap.xml"
	googlePingURL := "https://www.google.com/ping?sitemap=" + url.QueryEscape(sitemapURL)
	resp, err := http.Get(googlePingURL)
	if err != nil {
		fmt.Println("Error sending request:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("Google Sitemap Ping Response:", resp.Status)
})
func (*SiteMapperOptions) SetCrawlInterval ¶
func (options *SiteMapperOptions) SetCrawlInterval(interval time.Duration) error
SetCrawlInterval sets the interval for recrawling the site and updating the sitemap. Example:
options.SetCrawlInterval(time.Hour * 24) // for daily crawling.
func (*SiteMapperOptions) SetDomain ¶
func (options *SiteMapperOptions) SetDomain(domain string) error
SetDomain sets the base URL of the site to crawl.
Only the "http" and "https" schemes are allowed, and the value must not include a path (for example, "https://example.com" rather than "https://example.com/blog").
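The rule above can be checked with `net/url`. The following is a minimal sketch of such a check, not the package's own validation code; `validateDomain` and its error messages are hypothetical, and rejecting a bare trailing slash is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateDomain is a hypothetical stand-in for the check SetDomain might
// perform: http or https scheme, a host, and no path component.
func validateDomain(domain string) error {
	u, err := url.Parse(domain)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("domain must use the http or https scheme")
	}
	if u.Host == "" {
		return errors.New("domain must include a host")
	}
	// Assumption: even a bare trailing slash counts as a path.
	if u.Path != "" {
		return errors.New("domain must not include a path")
	}
	return nil
}

func main() {
	for _, d := range []string{
		"http://localhost:8080",
		"https://example.com/blog",
		"ftp://example.com",
	} {
		if err := validateDomain(d); err != nil {
			fmt.Printf("%-28s rejected: %v\n", d, err)
		} else {
			fmt.Printf("%-28s accepted\n", d)
		}
	}
}
```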
func (*SiteMapperOptions) SetDurationBeforeFirstCrawl ¶
func (options *SiteMapperOptions) SetDurationBeforeFirstCrawl(duration time.Duration) error
SetDurationBeforeFirstCrawl updates the time delay before the initial crawl occurs. This is useful to control when the first crawl starts after initialization.
func (*SiteMapperOptions) SetErrorLogger ¶
func (options *SiteMapperOptions) SetErrorLogger(logger func(error))
SetErrorLogger assigns a logging function to handle error messages. Example:
options.SetErrorLogger(func(err error) {
	log.Println(err.Error())
})
func (*SiteMapperOptions) SetInfoLogger ¶
func (options *SiteMapperOptions) SetInfoLogger(logger func(string))
SetInfoLogger assigns a logging function to handle informational messages. Example:
options.SetInfoLogger(func(msg string) {
	log.Println(msg)
})
func (*SiteMapperOptions) SetLinkAttributes ¶
func (options *SiteMapperOptions) SetLinkAttributes(attributes ...string) error
SetLinkAttributes specifies which HTML attributes the crawler should inspect for URLs. For example:
options.SetLinkAttributes("hx-get", "src")
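To make the idea of "attributes inspected for URLs" concrete, the sketch below collects the values of a chosen set of attributes from a small, well-formed HTML fragment. It is an illustration only: `extractLinks` is a hypothetical function, it uses the standard library's `encoding/xml` decoder in its lenient HTML mode rather than whatever parser the crawler actually uses, and it inspects only the attributes it is given.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// extractLinks is a hypothetical helper. It returns the values of the named
// attributes in document order. It is only suitable for simple, well-formed
// markup; a real crawler would likely use a dedicated HTML parser.
func extractLinks(page string, attributes ...string) ([]string, error) {
	wanted := make(map[string]bool, len(attributes))
	for _, a := range attributes {
		wanted[strings.ToLower(a)] = true
	}

	dec := xml.NewDecoder(strings.NewReader(page))
	dec.Strict = false                // tolerate HTML-style markup
	dec.AutoClose = xml.HTMLAutoClose // treat <img>, <br>, ... as self-closing
	dec.Entity = xml.HTMLEntity       // resolve entities such as &nbsp;

	var links []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return links, nil
		}
		if err != nil {
			return links, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok {
			continue
		}
		for _, attr := range start.Attr {
			if wanted[strings.ToLower(attr.Name.Local)] {
				links = append(links, attr.Value)
			}
		}
	}
}

func main() {
	page := `<div><a href="/about">About</a><button hx-get="/load">Load</button><img src="/logo.png"></div>`
	links, err := extractLinks(page, "href", "hx-get", "src")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(links)
}
```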
func (*SiteMapperOptions) SetStartingURL ¶
func (options *SiteMapperOptions) SetStartingURL(urlPath string) error
SetStartingURL sets the URL where the crawler begins its process.
Only relative paths (e.g., "/path") are allowed.
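A check for the relative-path rule might look like the sketch below. `validateStartingURL` is a hypothetical function, not the package's code, and requiring a leading "/" is an assumption based on the "/path" example and the "/" default.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// validateStartingURL is a hypothetical stand-in for the check
// SetStartingURL might perform: no scheme, no host, leading slash.
func validateStartingURL(urlPath string) error {
	u, err := url.Parse(urlPath)
	if err != nil {
		return err
	}
	// Reject absolute URLs ("https://...") and scheme-relative ones ("//host/...").
	if u.IsAbs() || u.Host != "" {
		return errors.New("starting URL must be a relative path")
	}
	// Assumption: paths are rooted at the domain, as in "/path".
	if !strings.HasPrefix(u.Path, "/") {
		return errors.New(`starting URL must begin with "/"`)
	}
	return nil
}

func main() {
	for _, p := range []string{"/", "/blog", "https://example.com/", "//example.com/x", "blog"} {
		if err := validateStartingURL(p); err != nil {
			fmt.Printf("%-22s rejected: %v\n", p, err)
		} else {
			fmt.Printf("%-22s accepted\n", p)
		}
	}
}
```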