Documentation
Overview ¶
Package common provides HTML cleaning, form parsing, and other utility functions for spider rules.
Index ¶
- func CleanHtml(str string, depth int) string
- func ConvertToString(src string, srcCode string, tagCode string) string
- func DecodeString(src, charset string) string
- func Deprive(s string) string
- func Deprive2(s string) string
- func DepriveBreak(s string) string
- func DepriveMutiBreak(s string) string
- func EncodeString(src, charset string) string
- func ExtractArticle(html string) string
- func Floor(f float64, n int) float64
- func GBKToUTF8(src string) string
- func GetHref(baseURL string, url string, href string, mustBase bool) string
- func HrefSub(src string, sub string) string
- func MakeUrl(path string, schemeAndHost ...string) (string, bool)
- func Ping(address string, timeoutSecond int) result.Result[ping.PingResult]
- func Pinger(address string, timeoutSecond int) result.VoidResult
- func ProcessHtml(html string) string
- func SplitCookies(cookieStr string) (cookies []*http.Cookie)
- func Unicode16ToUTF8(str string) string
- func UnicodeToUTF8(str string) string
- type Form
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DecodeString ¶
func DepriveBreak ¶
DepriveBreak removes all line-break characters (both actual and literal escape sequences).
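As an illustration of the behavior described above, here is a minimal sketch. It is not the package's implementation: the name stripBreaks is hypothetical, and it assumes that the "literal escape sequences" are the two-character sequences `\r` and `\n`.

```go
package main

import "strings"

// stripBreaks is a hypothetical stand-in for DepriveBreak. It removes real
// CR and LF characters, and also the two-character literal sequences
// backslash+r and backslash+n (an assumption about what "literal escape
// sequences" covers).
func stripBreaks(s string) string {
	r := strings.NewReplacer(
		"\r", "", // carriage return character
		"\n", "", // line feed character
		`\r`, "", // literal backslash followed by 'r'
		`\n`, "", // literal backslash followed by 'n'
	)
	return r.Replace(s)
}
```

Note that a sketch like this also strips a backslash followed by n or r in places where it was not meant as an escape, such as a Windows path.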
func DepriveMutiBreak ¶
DepriveMutiBreak collapses consecutive blank lines into a single newline.
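One way to get this kind of collapsing is a single regular expression. The sketch below is illustrative only: collapseBlankLines is a hypothetical name, and treating whitespace-only lines as blank is an assumption.

```go
package main

import "regexp"

// blankRun matches a line break, any whitespace (including further line
// breaks), and a final line break. Because the match must end on a line
// break, indentation on the following line is preserved.
var blankRun = regexp.MustCompile(`\n\s*\n`)

// collapseBlankLines is a hypothetical stand-in for DepriveMutiBreak: every
// run of blank (or whitespace-only) lines is replaced by one newline.
func collapseBlankLines(s string) string {
	return blankRun.ReplaceAllString(s, "\n")
}
```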
func EncodeString ¶
func ExtractArticle ¶
ExtractArticle extracts the main article body from an HTML page. Heuristic: the parent of the tag with the longest text node is treated as the article body.
func GetHref ¶ added in v1.4.0
GetHref resolves a relative or absolute href against a base URL and current page URL.
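The core of href resolution can be sketched with the standard net/url package. This is an assumption-laden illustration, not GetHref itself: resolveHref is a hypothetical helper that takes only the current page URL and the href, and it ignores the separate baseURL and mustBase parameters.

```go
package main

import "net/url"

// resolveHref is a hypothetical helper that resolves href against pageURL
// using RFC 3986 reference resolution. An absolute href is returned as is;
// a relative one is merged with the page URL.
func resolveHref(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}
```

For example, resolving "../c?x=1" against "https://example.com/a/b.html" yields "https://example.com/c?x=1".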
func MakeUrl ¶
Example arguments: schemeAndHost "https://www.baidu.com", path "/search?w=x".
func ProcessHtml ¶
ProcessHtml removes comments from an HTML string.
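Comment stripping of this kind can be approximated with a regular expression. The following is a rough sketch under that assumption; stripComments is a hypothetical name and ProcessHtml may work differently.

```go
package main

import "regexp"

// commentRe matches an HTML comment. (?s) lets '.' span line breaks and the
// non-greedy quantifier stops at the first closing "-->".
var commentRe = regexp.MustCompile(`(?s)<!--.*?-->`)

// stripComments is a hypothetical stand-in for ProcessHtml: it deletes every
// <!-- ... --> span from the input.
func stripComments(html string) string {
	return commentRe.ReplaceAllString(html, "")
}
```

A regular expression does not understand document structure, so comment-like text inside script or style elements would be removed as well.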
func SplitCookies ¶
SplitCookies parses a cookie string (e.g. "mt=ci%3D-1_0; thw=cn; v=0;") into []*http.Cookie.
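A minimal sketch of this kind of parsing, using only the standard library, might look as follows. parseCookieString is a hypothetical name, and leaving values percent-encoded is an assumption about the desired behavior.

```go
package main

import (
	"net/http"
	"strings"
)

// parseCookieString is a hypothetical stand-in for SplitCookies. It splits
// the input on ';', skips empty segments, and splits each remaining segment
// on the first '=' into a cookie name and value. Values are kept as written
// (no percent-decoding).
func parseCookieString(s string) []*http.Cookie {
	var cookies []*http.Cookie
	for _, part := range strings.Split(s, ";") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		name, value, _ := strings.Cut(part, "=")
		cookies = append(cookies, &http.Cookie{Name: name, Value: value})
	}
	return cookies
}
```

With the example string above, this sketch returns three cookies named mt, thw, and v.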
func Unicode16ToUTF8 ¶
Unicode16ToUTF8 converts \uXXXX escape sequences in a string to UTF-8 characters.
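The conversion can be sketched with regexp and strconv. This is an illustration rather than the package's code: decodeUEscapes is a hypothetical name, and the sketch decodes each escape on its own, so UTF-16 surrogate pairs are not combined.

```go
package main

import (
	"regexp"
	"strconv"
)

// escapeRe matches a literal backslash, 'u', and exactly four hex digits.
var escapeRe = regexp.MustCompile(`\\u([0-9a-fA-F]{4})`)

// decodeUEscapes is a hypothetical stand-in for Unicode16ToUTF8. Each \uXXXX
// escape is parsed as a 16-bit code unit and replaced by its UTF-8 encoding.
// Text that is not an escape is left unchanged.
func decodeUEscapes(s string) string {
	return escapeRe.ReplaceAllStringFunc(s, func(m string) string {
		n, err := strconv.ParseUint(m[2:], 16, 16) // skip the leading `\u`
		if err != nil {
			return m
		}
		return string(rune(n))
	})
}
```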
func UnicodeToUTF8 ¶
UnicodeToUTF8 converts HTML numeric character references (e.g. "&#21654;&#21857;", which decodes to "咖啡") to UTF-8.
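For comparison, the standard library's html.UnescapeString performs a similar conversion. The wrapper below is only a sketch; decodeNumericRefs is a hypothetical name, and unlike a function limited to numeric references it also decodes named entities such as "&amp;amp;".

```go
package main

import "html"

// decodeNumericRefs is a hypothetical stand-in for UnicodeToUTF8. It relies
// on html.UnescapeString, which decodes decimal (&#21654;) and hexadecimal
// (&#x5496;) character references, as well as named entities.
func decodeNumericRefs(s string) string {
	return html.UnescapeString(s)
}
```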
Types ¶
type Form ¶
type Form struct {
// contains filtered or unexported fields
}
Form is the default form element.
func NewForm ¶
func NewForm(ctx *spider.Context, rule string, u string, form *goquery.Selection, schemeAndHost ...string) *Form
NewForm creates and returns a *Form.