Documentation
Index
- func Collector(ctx context.Context, url string, projectPath string, cookieJar *cookiejar.Jar, ...) error
- func CollectorWithSizeLimit(ctx context.Context, url string, projectPath string, cookieJar *cookiejar.Jar, ...) error
- func Crawl(ctx context.Context, site string, projectPath string, cookieJar *cookiejar.Jar, ...) error
- func CrawlWithConfig(ctx context.Context, site string, projectPath string, cookieJar *cookiejar.Jar, ...) error
- func Extractor(link string, projectPath string)
- func HTMLExtractor(link string, projectPath string)
- func HTMLExtractorFromResponse(link string, projectPath string, bodyData []byte)
- type CrawlConfig
Constants
This section is empty.
Variables
This section is empty.
Functions
func Collector
func Collector(ctx context.Context, url string, projectPath string, cookieJar *cookiejar.Jar, proxyString string, userAgent string) error
Collector searches the given link for CSS, JavaScript, and image files. TODO: improve performance.
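This documentation does not show how Collector works internally. The following is a minimal sketch that assumes the asset references have already been pulled out of href and src attributes. It resolves each reference against the page URL using the standard net/url package, then sorts the results into CSS, JavaScript, and image buckets by file extension. The helper names collectAssets and classifyAsset, and the extension list, are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// classifyAsset is a hypothetical helper: it buckets a URL by file
// extension into "css", "js", or "img", or returns "" for anything else.
func classifyAsset(u *url.URL) string {
	switch strings.ToLower(path.Ext(u.Path)) {
	case ".css":
		return "css"
	case ".js":
		return "js"
	case ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico":
		return "img"
	default:
		return ""
	}
}

// collectAssets is a hypothetical helper: it resolves each raw reference
// (as found in href/src attributes) against the page URL and groups the
// resulting absolute URLs by asset type. Unrecognized references are skipped.
func collectAssets(pageURL string, refs []string) (map[string][]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	assets := make(map[string][]string)
	for _, ref := range refs {
		r, err := url.Parse(ref)
		if err != nil {
			continue // skip malformed references rather than aborting
		}
		abs := base.ResolveReference(r)
		if kind := classifyAsset(abs); kind != "" {
			assets[kind] = append(assets[kind], abs.String())
		}
	}
	return assets, nil
}

func main() {
	assets, err := collectAssets("https://example.com/blog/post.html",
		[]string{"../static/site.css", "/app.js", "logo.png", "about.html"})
	if err != nil {
		panic(err)
	}
	fmt.Println(assets["css"], assets["js"], assets["img"])
}
```

Relative references such as logo.png resolve against the page's directory (https://example.com/blog/logo.png), which is why resolution happens before classification.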
func CollectorWithSizeLimit
func CollectorWithSizeLimit(ctx context.Context, url string, projectPath string, cookieJar *cookiejar.Jar, proxyString string, userAgent string, maxFolderSize int64) error
CollectorWithSizeLimit is a collector with a size limit, set by maxFolderSize.
func Crawl
func Crawl(ctx context.Context, site string, projectPath string, cookieJar *cookiejar.Jar, proxyString string, userAgent string) error
Crawl invokes the crawlers needed to collect the links required to rebuild the web page.
func CrawlWithConfig
func CrawlWithConfig(ctx context.Context, site string, projectPath string, cookieJar *cookiejar.Jar, config CrawlConfig) error
CrawlWithConfig crawls using a configuration object and supports size checking.
func Extractor
func Extractor(link string, projectPath string)
Extractor visits a link, determines whether it is a page or a sublink, and downloads its contents to the correct directory in the project folder. TODO: add functionality for determining whether a link is a page or a sublink.
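func HTMLExtractor
func HTMLExtractor(link string, projectPath string)
func HTMLExtractorFromResponse
func HTMLExtractorFromResponse(link string, projectPath string, bodyData []byte)
HTMLExtractorFromResponse extracts HTML content from a colly response.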
Types
type CrawlConfig
type CrawlConfig interface {
	GetProxyString() string
	GetUserAgent() string
	GetMaxFolderSize() int64
}
CrawlConfig is the interface for crawl configuration.
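Any type that has these three methods satisfies CrawlConfig. The following is a minimal sketch of a fixed-value implementation. staticConfig and its field names are hypothetical. This documentation does not confirm that GetMaxFolderSize returns a value in bytes, so treat that as an assumption.

```go
package main

import "fmt"

// CrawlConfig mirrors the interface documented above.
type CrawlConfig interface {
	GetProxyString() string
	GetUserAgent() string
	GetMaxFolderSize() int64
}

// staticConfig is a hypothetical implementation holding fixed values.
type staticConfig struct {
	Proxy         string
	UserAgent     string
	MaxFolderSize int64 // assumed to be in bytes
}

func (c staticConfig) GetProxyString() string  { return c.Proxy }
func (c staticConfig) GetUserAgent() string    { return c.UserAgent }
func (c staticConfig) GetMaxFolderSize() int64 { return c.MaxFolderSize }

// Compile-time check that staticConfig satisfies CrawlConfig.
var _ CrawlConfig = staticConfig{}

func main() {
	var cfg CrawlConfig = staticConfig{UserAgent: "my-cloner/1.0", MaxFolderSize: 50 << 20}
	fmt.Println(cfg.GetUserAgent(), cfg.GetMaxFolderSize()) // my-cloner/1.0 52428800
}
```

A value like this would be passed as the config argument of CrawlWithConfig. The methods use value receivers, so both staticConfig and *staticConfig satisfy the interface.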