Documentation ¶
Overview ¶
Package pagser is a simple, extensible, configurable parser that maps HTML to Go structs, based on goquery and struct tags. It is the parser library from scrago (https://github.com/foolin/scrago).
Features ¶
* Simple - Uses Go struct tag syntax.
* Easy - Easy to use in your spider/crawler/colly application.
* Extensible - Supports extension functions.
* Struct tag grammar - The grammar is simple, like `pagser:"a->attr(href)"`.
* Nested structure - Supports nested structs for nodes.
* Configurable - Supports custom configuration.
* GoQuery/Colly - Works with goquery-based projects, such as go-colly.
More info: https://github.com/foolin/pagser
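To make the struct tag grammar concrete, the sketch below splits a tag value such as `a->attr(href)` into a selector, a function name, and arguments. It is only an illustration using the standard library: `tagParts` and `splitTag` are hypothetical names, and this is not pagser's own parser.

```go
package main

import (
	"fmt"
	"strings"
)

// tagParts is a hypothetical holder for the pieces of one tag value.
type tagParts struct {
	Selector string
	FuncName string
	Args     []string
}

// splitTag splits a tag value at the first occurrence of funcSymbol.
// Illustrative sketch only; pagser's real parsing may differ.
func splitTag(tag, funcSymbol string) tagParts {
	selector, call, found := strings.Cut(tag, funcSymbol)
	parts := tagParts{Selector: strings.TrimSpace(selector)}
	if !found {
		return parts // selector only, no function call
	}
	name, rest, hasArgs := strings.Cut(strings.TrimSpace(call), "(")
	parts.FuncName = strings.TrimSpace(name)
	if !hasArgs {
		return parts
	}
	rest = strings.TrimSpace(strings.TrimSuffix(strings.TrimSpace(rest), ")"))
	if rest == "" {
		return parts // empty argument list, e.g. text()
	}
	for _, arg := range strings.Split(rest, ",") {
		parts.Args = append(parts.Args, strings.TrimSpace(arg))
	}
	return parts
}

func main() {
	fmt.Printf("%+v\n", splitTag("a->attr(href)", "->"))
	fmt.Printf("%+v\n", splitTag("h1", "->"))
}
```

The separator is passed in as a parameter because the package lets you configure the function symbol.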
Index ¶
- type CallFunc
- type Config
- type Pagser
- func (p *Pagser) Parse(v interface{}, document string) (err error)
- func (p *Pagser) ParseDocument(v interface{}, document *goquery.Document) (err error)
- func (p *Pagser) ParseReader(v interface{}, reader io.Reader) (err error)
- func (p *Pagser) ParseSelection(v interface{}, selection *goquery.Selection) (err error)
- func (p *Pagser) RegisterFunc(name string, fn CallFunc) error
- type Tager
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CallFunc ¶
CallFunc is the function type for functions that can be called from struct tags.
Builtin Functions ¶
- html() gets the inner HTML.
- outerHtml() gets the outer HTML.
- text() gets the inner text.
- attr(name) gets the value of the named attribute.
- attrInt(name, defaultValue) gets the value of the named attribute and converts it to an int.
- value() gets the value of the `value` attribute, e.g. <input value='xxxx' />.
- split(sep) gets the text and splits it by the separator into a string slice.
- attrSplit(name, sep) gets the attribute value and splits it by the separator into a string slice.
- join() gets the text of the elements matched by the selector and joins it into one string.
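As a rough illustration of what the conversion-style builtins describe, the sketch below applies the same ideas to strings that have already been extracted from an element. `attrIntValue` and `splitValue` are hypothetical helpers, not part of pagser. Falling back to the default when the value is not a valid integer is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// attrIntValue converts an extracted attribute string to an int.
// Assumption: the default is returned when the string is not an integer.
func attrIntValue(raw string, defaultValue int) int {
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return defaultValue
	}
	return n
}

// splitValue splits an extracted string by sep into a string slice,
// in the spirit of split(sep) and attrSplit(name, sep).
func splitValue(raw, sep string) []string {
	return strings.Split(raw, sep)
}

func main() {
	fmt.Println(attrIntValue("42", -1))
	fmt.Println(attrIntValue("n/a", -1))
	fmt.Println(splitValue("go,html,parser", ","))
}
```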
Define Global Function ¶
func MyFunc(node *goquery.Selection, args ...string) (out interface{}, err error) {
	// TODO: compute a value from node and args
	return "Hello", nil
}

// Register the function on a *Pagser instance p
p.RegisterFunc("MyFunc", MyFunc)

// Use the function in a struct tag
type MyStruct struct {
	Text string `pagser:"h1->MyFunc()"`
}
Define Struct Function ¶
func MyFunc(node *goquery.Selection, args ...string) (out interface{}, err error) {
	// TODO: compute a value from node and args
	return "Hello", nil
}

// Register the function on a *Pagser instance p
p.RegisterFunc("MyFunc", MyFunc)

// Use the function in a struct tag
type MyStruct struct {
	Text string `pagser:"h1->MyFunc()"`
}
Use this signature to define your own functions.
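The register-then-call pattern can be sketched without any dependencies as a map from function names to functions. Everything here is hypothetical: `node` stands in for `*goquery.Selection`, and `registry`, `register`, and `call` are illustrative names rather than pagser's internals. The validation rules are assumptions as well.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for *goquery.Selection.
type node struct{ text string }

// callFunc has the same shape as the signature shown above,
// with node in place of *goquery.Selection.
type callFunc func(n *node, args ...string) (out interface{}, err error)

// registry maps function names used in struct tags to functions.
type registry struct{ funcs map[string]callFunc }

func newRegistry() *registry {
	return &registry{funcs: make(map[string]callFunc)}
}

// register stores fn under name. Rejecting empty names and nil
// functions is an assumption of this sketch.
func (r *registry) register(name string, fn callFunc) error {
	if name == "" {
		return errors.New("function name is empty")
	}
	if fn == nil {
		return errors.New("function is nil")
	}
	r.funcs[name] = fn
	return nil
}

// call looks up name and invokes the function with n and args.
func (r *registry) call(name string, n *node, args ...string) (interface{}, error) {
	fn, ok := r.funcs[name]
	if !ok {
		return nil, fmt.Errorf("function %q is not registered", name)
	}
	return fn(n, args...)
}

func main() {
	r := newRegistry()
	err := r.register("MyFunc", func(n *node, args ...string) (interface{}, error) {
		return "Hello " + n.text, nil
	})
	if err != nil {
		fmt.Println("register failed:", err)
		return
	}
	out, err := r.call("MyFunc", &node{text: "World"})
	fmt.Println(out, err)
}
```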
type Config ¶
type Config struct {
	TagerName    string // struct tag name, default is `pagser`
	FuncSymbol   string // function symbol, default is `->`
	IgnoreSymbol string // ignore symbol, default is `-`
	Debug        bool   // debug mode prints some log output, default is `false`
}
Config holds the parser configuration.
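To show how a configurable tag name and ignore symbol can drive field selection, the sketch below reads struct tags with the standard `reflect` package. The `config` and `page` types and the `fieldTags` helper are hypothetical. Skipping untagged fields is an assumption of this sketch, not a statement about pagser's behavior.

```go
package main

import (
	"fmt"
	"reflect"
)

// config mirrors two of the fields described above (hypothetical copy).
type config struct {
	TagerName    string
	IgnoreSymbol string
}

// page is a hypothetical model used only for this illustration.
type page struct {
	Title string `pagser:"h1"`
	Link  string `pagser:"a->attr(href)"`
	Note  string `pagser:"-"`
	Raw   string
}

// fieldTags returns field name -> tag value for fields that carry the
// configured tag and are not marked with the ignore symbol.
func fieldTags(v interface{}, cfg config) map[string]string {
	out := make(map[string]string)
	t := reflect.TypeOf(v)
	if t == nil {
		return out
	}
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return out
	}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		tag, ok := f.Tag.Lookup(cfg.TagerName)
		if !ok || tag == cfg.IgnoreSymbol {
			continue // untagged or explicitly ignored
		}
		out[f.Name] = tag
	}
	return out
}

func main() {
	cfg := config{TagerName: "pagser", IgnoreSymbol: "-"}
	for name, tag := range fieldTags(&page{}, cfg) {
		fmt.Println(name, "=>", tag)
	}
}
```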
type Pagser ¶
type Pagser struct {
// contains filtered or unexported fields
}
Pagser is the page parser.
func MustNewWithConfig ¶
MustNewWithConfig creates a Pagser with the given config.
Example ¶
cfg := Config{
	TagerName:    "pagser",
	FuncSymbol:   "->",
	IgnoreSymbol: "-",
	Debug:        false,
}
p, err := NewWithConfig(cfg)
if err != nil {
	log.Fatal(err)
}

// data parser model
var page ExampPage
// parse html data
err = p.Parse(&page, rawPageHtml)
// check error
if err != nil {
	log.Fatal(err)
}
func NewWithConfig ¶
NewWithConfig creates a Pagser with the given config and returns an error if creation fails.
func (*Pagser) Parse ¶
Parse parses an HTML string into a struct.
Example ¶
// new parser with default config
p := New()

// data parser model
var page ExampPage
// parse html data
err := p.Parse(&page, rawPageHtml)
// check error
if err != nil {
	log.Fatal(err)
}
log.Printf("%v", page)
func (*Pagser) ParseDocument ¶
ParseDocument parses a goquery document into a struct.
func (*Pagser) ParseReader ¶ added in v0.0.3
ParseReader parses HTML read from an io.Reader into a struct.
Example ¶
resp, err := http.Get("https://raw.githubusercontent.com/foolin/pagser/master/_examples/pages/demo.html")
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close()

// new parser with default config
p := New()

// data parser model
var page ExampPage
// parse html data
err = p.ParseReader(&page, resp.Body)
// check error
if err != nil {
	panic(err)
}
log.Printf("%v", page)
func (*Pagser) ParseSelection ¶
ParseSelection parses a goquery selection into a struct.
