Documentation ¶
Overview ¶
Package seltabl provides a simple way to parse HTML tables into structs.
Index ¶
- func New[T any](doc *goquery.Document) ([]T, error)
- func NewCh[T any](doc *goquery.Document, ch chan T) error
- func NewChFn[T any, F func(T) bool](doc *goquery.Document, ch chan T, fn F) error
- func NewChFnErr[T any, F func(T) bool](doc *goquery.Document, ch chan T, fn F) error
- func NewFromBytes[T any](b []byte) ([]T, error)
- func NewFromBytesCh[T any](b []byte, ch chan T) error
- func NewFromBytesChFn[T any, F func(T) bool](b []byte, ch chan T, fn F) error
- func NewFromReader[T any](r io.Reader) ([]T, error)
- func NewFromReaderCh[T any](r io.Reader, ch chan T) error
- func NewFromReaderChFn[T any, F func(T) bool](r io.Reader, ch chan T, fn F) error
- func NewFromString[T any](htmlInput string) ([]T, error)
- func NewFromStringCh[T any](htmlInput string, ch chan T) error
- func NewFromStringChFn[T any, F func(T) bool](htmlInput string, ch chan T, fn F) error
- func NewFromStringChFnErr[T any](htmlInput string, ch chan T, fn func(T) bool) error
- func NewFromURL[T any](url string) ([]T, error)
- func NewFromURLCh[T any](url string, ch chan T) error
- func SetStructField[T any](structPtr *T, structField reflect.StructField, cellValue *goquery.Selection, ...) error
- type Decoder
- type ErrNoDataFound
- type ErrParsing
- type ErrSelectorNotFound
- type SelectorConfig
- type SelectorI
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func New ¶
New parses a goquery document into a slice of structs.
The struct type given as the type parameter must have a field with the tag seltabl, a header selector with the tag hSel, and a data selector with the tag dSel.
The selectors' responsibilities:
- header selector (hSel): used to find the header row and column for the field in the given struct.
- data selector (dSel): used to find the data column for the field in the given struct.
- query selector (qSel): used to query for the inner text or attribute of the cell.
- control selector (cSel): controls which part of the cell, its inner text or an attribute, is queried; the examples use $text.
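The requirement that every field carries a header selector (hSel) and a data selector (dSel) can be checked before any HTML is parsed. The sketch below is a hypothetical helper, missingTags, built only on the standard library's reflect package; it is not part of seltabl, it checks only hSel and dSel (not the seltabl tag), and it assumes the type parameter is a struct.

```go
package main

import (
	"reflect"
)

// requiredTags lists the tags this sketch treats as mandatory, following
// the documentation's requirement of a header selector and a data selector.
var requiredTags = []string{"hSel", "dSel"}

// missingTags is a hypothetical helper (not part of seltabl). For each field
// of the struct type T it reports which required tags the field lacks.
// Fields that declare every required tag are left out of the result.
// It assumes T is a struct type; NumField panics otherwise.
func missingTags[T any]() map[string][]string {
	missing := map[string][]string{}
	t := reflect.TypeOf((*T)(nil)).Elem()
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		for _, name := range requiredTags {
			if _, ok := f.Tag.Lookup(name); !ok {
				missing[f.Name] = append(missing[f.Name], name)
			}
		}
	}
	return missing
}

// Row mirrors the FixtureStruct used in the examples, except that B
// deliberately omits its dSel tag.
type Row struct {
	A string `hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(1)" cSel:"$text"`
	B string `hSel:"tr:nth-child(1)" cSel:"$text"`
}
```

Calling missingTags[Row]() reports that field B lacks dSel, while A is omitted because it declares both tags.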
Example:
package main

import (
	"fmt"
	"strings"

	"github.com/PuerkitoBio/goquery"
	"github.com/conneroisu/seltabl"
)

var fixture = `
<table>
<tr> <td>a</td> <td>b</td> </tr>
<tr> <td>1</td> <td>2</td> </tr>
<tr> <td>3</td> <td>4</td> </tr>
<tr> <td>5</td> <td>6</td> </tr>
<tr> <td>7</td> <td>8</td> </tr>
</table>
`

type FixtureStruct struct {
	A string `json:"a" hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(1)" cSel:"$text"`
	B string `json:"b" hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(2)" cSel:"$text"`
}

func main() {
	// New expects a parsed goquery document, not a raw HTML string.
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(fixture))
	if err != nil {
		panic(err)
	}
	p, err := seltabl.New[FixtureStruct](doc)
	if err != nil {
		panic(err)
	}
	for _, pp := range p {
		fmt.Printf("pp %+v\n", pp)
	}
}
func NewCh ¶ added in v0.7.6
NewCh parses a goquery document into structs and sends each one to the given channel.
The struct type given as the type parameter must have a field with the tag seltabl, a header selector with the tag hSel, and a data selector with the tag dSel.
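The channel variants return only an error and deliver results on a channel the caller supplies; this documentation does not say whether they close that channel. A minimal consumer sketch, assuming the function has finished sending by the time it returns: the hypothetical collect helper runs any producer with that shape in a goroutine and gathers what it sends, whether or not the channel gets closed.

```go
package main

// collect runs a producer shaped like the Ch variants in its own goroutine
// and gathers every value it sends. It stops when the producer returns,
// assuming the producer has finished sending by then. It handles both a
// producer that closes ch and one that leaves it open.
func collect[T any](produce func(ch chan T) error) ([]T, error) {
	ch := make(chan T)
	errCh := make(chan error, 1) // buffered so the goroutine never blocks here
	go func() { errCh <- produce(ch) }()

	var out []T
	for {
		select {
		case v, ok := <-ch:
			if !ok {
				ch = nil // closed by the producer; a nil channel is never selected
				continue
			}
			out = append(out, v)
		case err := <-errCh:
			return out, err // partial results are returned alongside any error
		}
	}
}
```

With seltabl this might be called as collect(func(ch chan Row) error { return seltabl.NewCh[Row](doc, ch) }), where Row and doc are placeholders for your struct type and parsed document.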
func NewChFn ¶ added in v0.7.7
NewChFn parses a goquery document into a channel of structs.
It also applies a function to each struct before adding it to the channel.
func NewChFnErr ¶ added in v0.9.6
NewChFnErr parses a goquery document into a channel of structs.
It also applies a function to each struct before adding it to the channel.
It ignores errors that occur in individual selected rows.
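The per-row error handling described for NewChFnErr (and NewFromStringChFnErr) can be illustrated with a small standard-library sketch: each row is converted independently, and a row that fails is skipped so one bad cell does not abort the whole table. Pair and parseRows are hypothetical names; this shows the general idea, not seltabl's implementation.

```go
package main

import (
	"strconv"
	"strings"
)

// Pair is a hypothetical row type with two integer columns.
type Pair struct{ A, B int }

// parseRows converts each row of cell text into a Pair and sends it on ch.
// A row that cannot be converted is skipped rather than ending the parse.
func parseRows(rows [][]string, ch chan<- Pair) {
	for _, cells := range rows {
		if len(cells) != 2 {
			continue // malformed row: skip it
		}
		a, errA := strconv.Atoi(strings.TrimSpace(cells[0]))
		b, errB := strconv.Atoi(strings.TrimSpace(cells[1]))
		if errA != nil || errB != nil {
			continue // unparsable cell: skip this row only
		}
		ch <- Pair{A: a, B: b}
	}
}
```

Given the rows {" 1 ", "2"}, {"x", "4"} and {"5", "6"}, only the first and third rows reach the channel.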
func NewFromBytes ¶ added in v0.7.4
NewFromBytes parses a byte slice into a slice of structs of the given generic type.
The byte slice must contain a valid HTML page with a single table.
The type parameter must be a struct with valid selectors for the table and data (hSel, dSel, cSel).
The selectors' responsibilities:
- header selector (hSel): used to find the header row and column for the field in the given struct.
- data selector (dSel): used to find the data column for the field in the given struct.
- query selector (qSel): used to query for the inner text or attribute of the cell.
- control selector (cSel): controls which part of the cell, its inner text or an attribute, is queried; the examples use $text.
Example:
package main

import (
	"fmt"

	"github.com/conneroisu/seltabl"
)

type FixtureStruct struct {
	A string `json:"a" hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(1)" cSel:"$text"`
	B string `json:"b" hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(2)" cSel:"$text"`
}

func main() {
	p, err := seltabl.NewFromBytes[FixtureStruct]([]byte(`
<table>
<tr> <td>a</td> <td>b</td> </tr>
<tr> <td>1</td> <td>2</td> </tr>
<tr> <td>3</td> <td>4</td> </tr>
<tr> <td>5</td> <td>6</td> </tr>
<tr> <td>7</td> <td>8</td> </tr>
</table>
`))
	if err != nil {
		panic(err)
	}
	for _, pp := range p {
		fmt.Printf("pp %+v\n", pp)
	}
}
func NewFromBytesCh ¶ added in v0.7.6
NewFromBytesCh parses a byte slice into structs of the given generic type and sends each one to the given channel.
func NewFromBytesChFn ¶ added in v0.7.7
NewFromBytesChFn parses a byte slice into a channel of structs. It also applies a function to each struct before adding it to the channel.
func NewFromReader ¶
NewFromReader parses a reader into a slice of structs.
The reader must supply a valid HTML page with a single table.
The type parameter must be a struct with valid selectors for the table and data (hSel, dSel, cSel).
The selectors' responsibilities:
- header selector (hSel): used to find the header row and column for the field in the given struct.
- data selector (dSel): used to find the data column for the field in the given struct.
- query selector (qSel): used to query for the inner text or attribute of the cell.
- control selector (cSel): controls which part of the cell, its inner text or an attribute, is queried; the examples use $text.
Example:
package main

import (
	"fmt"
	"strings"

	"github.com/conneroisu/seltabl"
)

type TableStruct struct {
	A string `json:"a" hSel:"tr:nth-child(1) td:nth-child(1)" dSel:"tr td:nth-child(1)" cSel:"$text"`
	B string `json:"b" hSel:"tr:nth-child(1) td:nth-child(2)" dSel:"tr td:nth-child(2)" cSel:"$text"`
}

func main() {
	p, err := seltabl.NewFromReader[TableStruct](strings.NewReader(`
<table>
<tr> <td>a</td> <td>b</td> </tr>
<tr> <td>1</td> <td>2</td> </tr>
<tr> <td>3</td> <td>4</td> </tr>
<tr> <td>5</td> <td>6</td> </tr>
<tr> <td>7</td> <td>8</td> </tr>
</table>
`))
	if err != nil {
		panic(err)
	}
	for _, pp := range p {
		fmt.Printf("pp %+v\n", pp)
	}
}
func NewFromReaderCh ¶ added in v0.7.6
NewFromReaderCh parses a reader into structs and sends each one to the given channel.
func NewFromReaderChFn ¶ added in v0.7.7
NewFromReaderChFn parses a reader into a channel of structs. It also applies a function to each struct before adding it to the channel.
func NewFromString ¶
NewFromString parses a string into a slice of structs.
The struct must have a field with the tag seltabl, a header selector with the tag hSel, and a data selector with the tag dSel.
The selectors' responsibilities:
- header selector (hSel): used to find the header row and column for the field in the given struct.
- data selector (dSel): used to find the data column for the field in the given struct.
- query selector (qSel): used to query for the inner text or attribute of the cell.
- control selector (cSel): controls which part of the cell, its inner text or an attribute, is queried; the examples use $text.
Example:
package main

import (
	"fmt"

	"github.com/conneroisu/seltabl"
)

type FixtureStruct struct {
	A string `json:"a" hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(1)" cSel:"$text"`
	B string `json:"b" hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(2)" cSel:"$text"`
}

func main() {
	p, err := seltabl.NewFromString[FixtureStruct](`
<table>
<tr> <td>a</td> <td>b</td> </tr>
<tr> <td>1</td> <td>2</td> </tr>
<tr> <td>3</td> <td>4</td> </tr>
<tr> <td>5</td> <td>6</td> </tr>
<tr> <td>7</td> <td>8</td> </tr>
</table>
`)
	if err != nil {
		panic(err)
	}
	for _, pp := range p {
		fmt.Printf("pp %+v\n", pp)
	}
}
func NewFromStringCh ¶ added in v0.7.6
NewFromStringCh parses a string into structs and sends each one to the given channel.
func NewFromStringChFn ¶ added in v0.7.7
NewFromStringChFn parses a string into a channel of structs. It also applies a function to each struct before adding it to the channel.
func NewFromStringChFnErr ¶ added in v0.9.7
NewFromStringChFnErr parses a string into a channel of structs.
It also applies a function to each struct before adding it to the channel.
It ignores errors that occur in individual selected rows.
func NewFromURL ¶
NewFromURL fetches the HTML at the given URL and parses it into a slice of structs of the given generic type.
The URL must point to a valid HTML page with a single table.
The type parameter must be a struct with valid selectors for the table and data (hSel, dSel, cSel).
The selectors' responsibilities:
- header selector (hSel): used to find the header row and column for the field in the given struct.
- data selector (dSel): used to find the data column for the field in the given struct.
- query selector (qSel): used to query for the inner text or attribute of the cell.
- control selector (cSel): controls which part of the cell, its inner text or an attribute, is queried; the examples use $text.
Example:
package main

import (
	"fmt"

	"github.com/conneroisu/seltabl"
)

type FixtureStruct struct {
	A string `json:"a" hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(1)" cSel:"$text"`
	B string `json:"b" hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(2)" cSel:"$text"`
}

func main() {
	p, err := seltabl.NewFromURL[FixtureStruct]("https://github.com/conneroisu/seltabl/blob/main/testdata/ab_num_table.html")
	if err != nil {
		panic(err)
	}
	for _, pp := range p {
		fmt.Printf("pp %+v\n", pp)
	}
}
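NewFromURL performs the HTTP request itself. If you need control over the request, such as a timeout, one option is to download the page yourself and pass the bytes to NewFromBytes (or the response body to NewFromReader). fetchHTML is a hypothetical helper that uses only net/http; the 10-second timeout is an arbitrary choice for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetchHTML is a hypothetical helper (not part of seltabl). It downloads a
// page with an explicit timeout, rejects non-200 responses, and returns the
// raw body so it can be handed to a parser such as NewFromBytes.
func fetchHTML(url string) ([]byte, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetching %s: unexpected status %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}
```

The result could then be parsed with p, err := seltabl.NewFromBytes[FixtureStruct](body).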
func NewFromURLCh ¶ added in v0.7.7
NewFromURLCh fetches the HTML at the given URL, parses it into structs of the given generic type, and sends each one to the given channel.
The URL must point to a valid HTML page with a single table.
The type parameter must be a struct with valid selectors for the table and data (hSel, dSel, cSel).
The selectors' responsibilities:
- header selector (hSel): used to find the header row and column for the field in the given struct.
- data selector (dSel): used to find the data column for the field in the given struct.
- query selector (qSel): used to query for the inner text or attribute of the cell.
- control selector (cSel): controls which part of the cell, its inner text or an attribute, is queried; the examples use $text.
Example:
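package main

import (
	"fmt"

	"github.com/conneroisu/seltabl"
)

type FixtureStruct struct {
	A string `json:"a" hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(1)" cSel:"$text"`
	B string `json:"b" hSel:"tr:nth-child(1)" dSel:"table tr:not(:first-child) td:nth-child(2)" cSel:"$text"`
}

func main() {
	ch := make(chan FixtureStruct)
	errCh := make(chan error, 1)
	// NewFromURLCh returns only an error and sends results on ch, so it runs in
	// its own goroutine (assumes it has finished sending when it returns).
	go func() {
		errCh <- seltabl.NewFromURLCh[FixtureStruct]("https://github.com/conneroisu/seltabl/blob/main/testdata/ab_num_table.html", ch)
	}()
	for {
		select {
		case pp, ok := <-ch:
			if !ok {
				ch = nil // channel closed; wait for the error result
				continue
			}
			fmt.Printf("pp %+v\n", pp)
		case err := <-errCh:
			if err != nil {
				panic(err)
			}
			return
		}
	}
}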
func SetStructField ¶
func SetStructField[T any](structPtr *T, structField reflect.StructField, cellValue *goquery.Selection, selector SelectorI) error
SetStructField sets a struct field to a value. The type parameter T specifies the struct type, and structField identifies the field to set. It uses the selector interface to find the value and the type of the selector to parse and set it.
It is used by the NewFromString function.
Types ¶
type Decoder ¶
type Decoder[T any] struct {
	// contains filtered or unexported fields
}
Decoder decodes a reader into a slice of structs.
It is created by the NewDecoder function and is not intended to be used directly.
Example:
package main

import (
	"fmt"
	"strings"

	"github.com/conneroisu/seltabl"
)

type TableStruct struct {
	A string `json:"a" seltabl:"a" hSel:"tr:nth-child(1) td:nth-child(1)" dSel:"tr td:nth-child(1)" cSel:"$text"`
	B string `json:"b" seltabl:"b" hSel:"tr:nth-child(1) td:nth-child(2)" dSel:"tr td:nth-child(2)" cSel:"$text"`
}

func main() {
	r := strings.NewReader(`
<table>
<tr>
<td>a</td>
<td>b</td>
</tr>
<tr>
<td> 1 </td>
<td>2</td>
</tr>
<tr>
<td>3 </td>
<td> 4</td>
</tr>
<tr>
<td> 5 </td>
<td> 6</td>
</tr>
<tr>
<td>7 </td>
<td> 8</td>
</tr>
</table>
`)
	// Decoder is not meant to be used directly; NewFromReader uses it internally.
	p, err := seltabl.NewFromReader[TableStruct](r)
	if err != nil {
		panic(err)
	}
	for _, pp := range p {
		fmt.Printf("pp %+v\n", pp)
	}
}
func NewDecoder ¶
func NewDecoder[T any](r io.ReadCloser) *Decoder[T]
NewDecoder returns a Decoder that parses the given reader into a slice of structs.
It is used by the NewFromReader function.
It works similarly to json.Decoder in the standard library.
type ErrNoDataFound ¶ added in v0.5.1
type ErrNoDataFound struct {
Typ reflect.Type
Field reflect.StructField
Cfg *SelectorConfig
}
ErrNoDataFound is returned when no data is found for a selector.
func (ErrNoDataFound) Error ¶ added in v0.5.1
func (e ErrNoDataFound) Error() string
Error implements the error interface for ErrNoDataFound.
type ErrParsing ¶ added in v0.6.3
ErrParsing is returned when a field's value cannot be parsed.
func (ErrParsing) Error ¶ added in v0.6.3
func (e ErrParsing) Error() string
Error returns the error message. It implements the error interface.
type ErrSelectorNotFound ¶ added in v0.5.1
type ErrSelectorNotFound struct {
Typ reflect.Type // type of the struct
Field reflect.StructField // field of the struct
Cfg *SelectorConfig // selector config
}
ErrSelectorNotFound is returned when a selector is not found.
func (ErrSelectorNotFound) Error ¶ added in v0.5.1
func (e ErrSelectorNotFound) Error() string
Error implements the error interface for ErrSelectorNotFound.
type SelectorConfig ¶ added in v0.2.9
type SelectorConfig struct {
DataSelector string // selector for the data cell
HeadSelector string // selector for the header cell
QuerySelector string // selector used to query within the data cell
ControlTag string // tag used to signify selecting aspects of a cell
}
SelectorConfig is a struct for configuring a selector.
func NewSelectorConfig ¶ added in v0.2.9
func NewSelectorConfig(tag reflect.StructTag) *SelectorConfig
NewSelectorConfig parses a struct tag and returns a SelectorConfig.
Directories ¶
| Path | Synopsis |
|---|---|
| examples/huggingface-leader-board (command) | Package main shows how to use the seltabl package to scrape a table from a given URL. |
| examples/ncaa (command) | Package main is an example of how to use the seltabl package. |
| examples/penguins-wikipedia (command) | Package main is an example of how to use the seltabl package. |
| tools/seltabl-lsp (module) | |
| tools/seltabls (module) | |