Documentation
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type AntiCrawl
type AntiCrawl struct {
// contains filtered or unexported fields
}
AntiCrawl is the anti-crawl strategy manager. It encapsulates anti-crawling mechanisms such as rate limiting, randomized request headers, and delay control.
func NewAntiCrawl
func NewAntiCrawl(cfg *config.SteamConfig) *AntiCrawl
NewAntiCrawl creates an anti-crawl strategy instance from the given configuration.
type Parser
type Parser struct{}
Parser is the HTML parser. It wraps goquery parsing logic and provides general-purpose HTML parsing and text-cleaning capabilities.
func (*Parser) CleanText
CleanText cleans parsed text by trimming leading and trailing spaces and newlines, which improves readability.
Parameters:
- text: raw parsed text
Returns:
- string: cleaned text
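A minimal sketch of the trimming described above, assuming CleanText behaves like strings.TrimSpace. The function name cleanText is hypothetical, and TrimSpace also strips tabs and other Unicode whitespace, which may be broader than the real implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanText is a hypothetical stand-in for (*Parser).CleanText: it strips
// leading and trailing whitespace (spaces, tabs, newlines) and leaves
// whitespace inside the text untouched.
func cleanText(text string) string {
	return strings.TrimSpace(text)
}

func main() {
	raw := "\n    Example Game\n  "
	fmt.Printf("%q\n", cleanText(raw)) // prints "Example Game"
}
```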
type ProxyRotator
type ProxyRotator struct {
// contains filtered or unexported fields
}
ProxyRotator is the proxy rotation manager. It supports two proxy selection strategies, round-robin and random, and manages a dynamic proxy pool.
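The two selection strategies can be sketched as follows. This is an illustration under assumptions, not ProxyRotator's implementation: the selector type, its fields, the pick method, and errEmptyPool are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sync"
)

// errEmptyPool is a hypothetical error returned when no proxy is available.
var errEmptyPool = errors.New("proxy pool is empty")

// selector sketches round-robin and random selection over a proxy pool.
type selector struct {
	mu     sync.Mutex
	pool   []string
	next   int
	random bool
	rng    *rand.Rand // optional; the global source is used when nil
}

// pick returns the next proxy: in order for round-robin, or uniformly at
// random when random is true.
func (s *selector) pick() (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := len(s.pool)
	if n == 0 {
		return "", errEmptyPool
	}
	if s.random {
		if s.rng != nil {
			return s.pool[s.rng.Intn(n)], nil
		}
		return s.pool[rand.Intn(n)], nil
	}
	p := s.pool[s.next%n] // modulo guards against a pool that has shrunk
	s.next = (s.next + 1) % n
	return p, nil
}

func main() {
	rr := &selector{pool: []string{"http://127.0.0.1:8001", "http://127.0.0.1:8002"}}
	for i := 0; i < 3; i++ {
		p, _ := rr.pick()
		fmt.Println("round-robin:", p)
	}
	rnd := &selector{pool: rr.pool, random: true, rng: rand.New(rand.NewSource(7))}
	p, _ := rnd.pick()
	fmt.Println("random:", p)
}
```

Round-robin spreads load evenly and predictably; random selection makes the request pattern harder to fingerprint at the cost of uneven short-term distribution.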
func NewProxyRotator
func NewProxyRotator(cfg *config.SteamConfig) *ProxyRotator
NewProxyRotator creates a proxy rotation instance.
Parameters:
- cfg: global configuration instance
Returns:
- *ProxyRotator: proxy rotation instance
func (*ProxyRotator) AddProxy
func (p *ProxyRotator) AddProxy(proxyAddr string)
AddProxy adds a proxy to the pool at runtime.
Parameters:
- proxyAddr: proxy address
func (*ProxyRotator) GetProxyFunc
GetProxyFunc returns a Colly-compatible ProxyFunc. The returned function selects a proxy automatically on each request and falls back when selection fails.
Returns:
- func(r *http.Request) (*url.URL, error): Colly proxy function
func (*ProxyRotator) Pool
func (p *ProxyRotator) Pool() []string
Pool returns the current proxy pool.
Returns:
- []string: list of proxy addresses
func (*ProxyRotator) RemoveProxy
func (p *ProxyRotator) RemoveProxy(proxyAddr string)
RemoveProxy removes a proxy from the pool at runtime.
Parameters:
- proxyAddr: proxy address
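AddProxy, RemoveProxy, and Pool together imply a pool that can change while requests are in flight. A minimal thread-safe sketch, assuming a read-write mutex and copy-on-read; the proxyPool type, its methods, and the deduplication in Add are hypothetical and may not match the real behaviour.

```go
package main

import (
	"fmt"
	"sync"
)

// proxyPool sketches dynamic pool management guarded by a RWMutex.
type proxyPool struct {
	mu    sync.RWMutex
	addrs []string
}

// Add appends addr unless it is already present (deduplication is assumed).
func (p *proxyPool) Add(addr string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, a := range p.addrs {
		if a == addr {
			return
		}
	}
	p.addrs = append(p.addrs, addr)
}

// Remove deletes every occurrence of addr, keeping the others in order.
func (p *proxyPool) Remove(addr string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	kept := p.addrs[:0] // filter in place, reusing the backing array
	for _, a := range p.addrs {
		if a != addr {
			kept = append(kept, a)
		}
	}
	p.addrs = kept
}

// Snapshot returns a copy so callers cannot mutate the shared slice.
func (p *proxyPool) Snapshot() []string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return append([]string(nil), p.addrs...)
}

func main() {
	pool := &proxyPool{}
	pool.Add("http://127.0.0.1:8001")
	pool.Add("http://127.0.0.1:8002")
	pool.Add("http://127.0.0.1:8001") // duplicate, ignored
	pool.Remove("http://127.0.0.1:8001")
	fmt.Println(pool.Snapshot()) // [http://127.0.0.1:8002]
}
```

Returning a copy from the Pool-style accessor avoids data races when a caller iterates the list while another goroutine adds or removes proxies.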
type Storage
type Storage struct {
// contains filtered or unexported fields
}
Storage is the HTML storage manager. It stores crawled HTML files in per-date directories so that the file layout stays consistent.
func NewStorage
NewStorage creates a storage manager instance.
Parameters:
- baseDir: base storage directory
Returns:
- *Storage: storage manager instance