Documentation ¶
Overview ¶
Package storage provides access to annotation data for form classification training.
Index ¶
- func GetDomain(rawURL string) string
- type AnnotationSchema
- type FormAnnotation
- type IterOptions
- func DefaultIterOptions() IterOptions
- type PageAnnotation
- type PageStorage
- func NewPageStorage(folder string) *PageStorage
- func (s *PageStorage) GetPageIndex() (map[string]pageIndexEntry, error)
- func (s *PageStorage) GetPageSchema() (*AnnotationSchema, error)
- func (s *PageStorage) IterPageAnnotations(opts IterOptions) ([]PageAnnotation, error)
- type Storage
- func (s *Storage) GetConfig() (*configJSON, error)
- func (s *Storage) GetFieldSchema() (*AnnotationSchema, error)
- func (s *Storage) GetFormSchema() (*AnnotationSchema, error)
- func (s *Storage) GetIndex() (map[string]indexEntry, error)
- func (s *Storage) IterAnnotations(opts IterOptions) ([]FormAnnotation, error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type AnnotationSchema ¶
type AnnotationSchema struct {
    Types       map[string]string // full_name -> short_name
    TypesInv    map[string]string // short_name -> full_name
    NAValue     string
    SkipValue   string
    SimplifyMap map[string]string
}
AnnotationSchema holds the types and their mappings for form or field annotations.
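As an illustration of how the two type maps relate, the following is a minimal sketch rather than the package's own code. The struct is re-declared locally so the example compiles on its own. The type names ("login", "search", "registration"), the NAValue and SkipValue strings, and the helpers invert, simplify, and exampleSchema are all hypothetical. Treating SimplifyMap as keyed by short names is also an assumption; the documentation above does not say.

```go
package main

import "fmt"

// AnnotationSchema mirrors the documented struct so the sketch is self-contained.
type AnnotationSchema struct {
	Types       map[string]string // full_name -> short_name
	TypesInv    map[string]string // short_name -> full_name
	NAValue     string
	SkipValue   string
	SimplifyMap map[string]string
}

// invert builds the short_name -> full_name map from a full_name -> short_name map.
func invert(types map[string]string) map[string]string {
	inv := make(map[string]string, len(types))
	for full, short := range types {
		inv[short] = full
	}
	return inv
}

// simplify returns the SimplifyMap replacement for a short type, or the type
// unchanged when no mapping exists. Keying by short name is an assumption.
func simplify(s *AnnotationSchema, short string) string {
	if to, ok := s.SimplifyMap[short]; ok {
		return to
	}
	return short
}

// exampleSchema returns a schema filled with made-up values.
func exampleSchema() *AnnotationSchema {
	types := map[string]string{"login": "l", "search": "s", "registration": "r"}
	return &AnnotationSchema{
		Types:       types,
		TypesInv:    invert(types),
		NAValue:     "X",
		SkipValue:   "-",
		SimplifyMap: map[string]string{"r": "l"},
	}
}

func main() {
	s := exampleSchema()
	fmt.Println(s.TypesInv["s"])  // search
	fmt.Println(simplify(s, "r")) // l
	fmt.Println(simplify(s, "s")) // s
}
```

In real use the schema would come from GetFormSchema or GetFieldSchema rather than being built by hand.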
type FormAnnotation ¶
type FormAnnotation struct {
    FormHTML       string
    URL            string
    Type           string            // short form type
    TypeFull       string            // full form type
    FormIndex      int               // index of form on the page
    FieldTypes     map[string]string // field_name -> short_type
    FieldTypesFull map[string]string // field_name -> full_type
    FormSchema     *AnnotationSchema
    FieldSchema    *AnnotationSchema

    // Computed
    FormAnnotated   bool
    FieldsAnnotated bool
}
FormAnnotation represents a single annotated form.
type IterOptions ¶
type IterOptions struct {
    DropDuplicates     bool
    DropNA             bool
    DropSkipped        bool
    SimplifyFormTypes  bool
    SimplifyFieldTypes bool
    Verbose            bool
}
IterOptions controls annotation iteration behavior.
func DefaultIterOptions ¶
func DefaultIterOptions() IterOptions
DefaultIterOptions returns the default options for iterating annotations.
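The page does not spell out what each IterOptions flag does. One plausible reading of DropNA and DropSkipped, sketched below, is that annotations whose type equals the schema's NAValue or SkipValue are left out of the result. This is an assumption, not the package's confirmed behavior. The types are trimmed local stand-ins, filterForms is a hypothetical helper, and the sample values are made up.

```go
package main

import "fmt"

// Local stand-ins for the documented types, trimmed to the fields used here.
type IterOptions struct {
	DropNA      bool
	DropSkipped bool
}

type AnnotationSchema struct {
	NAValue   string
	SkipValue string
}

type FormAnnotation struct {
	URL  string
	Type string // short form type
}

// filterForms is a hypothetical helper: it keeps every annotation except those
// whose type matches NAValue (when DropNA is set) or SkipValue (when
// DropSkipped is set).
func filterForms(anns []FormAnnotation, schema AnnotationSchema, opts IterOptions) []FormAnnotation {
	kept := make([]FormAnnotation, 0, len(anns))
	for _, a := range anns {
		if opts.DropNA && a.Type == schema.NAValue {
			continue
		}
		if opts.DropSkipped && a.Type == schema.SkipValue {
			continue
		}
		kept = append(kept, a)
	}
	return kept
}

func main() {
	schema := AnnotationSchema{NAValue: "X", SkipValue: "-"}
	anns := []FormAnnotation{
		{URL: "http://example.com/a", Type: "l"},
		{URL: "http://example.com/b", Type: "X"},
		{URL: "http://example.com/c", Type: "-"},
	}
	kept := filterForms(anns, schema, IterOptions{DropNA: true, DropSkipped: true})
	fmt.Println(len(kept)) // 1
}
```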
type PageAnnotation ¶ added in v0.0.3
type PageAnnotation struct {
    HTML     string
    URL      string
    Type     string // short page type
    TypeFull string // full page type
}
PageAnnotation represents a single annotated page.
type PageStorage ¶ added in v0.0.3
type PageStorage struct {
    Folder string
}
PageStorage wraps the page annotation data folder.
func NewPageStorage ¶ added in v0.0.3
func NewPageStorage(folder string) *PageStorage
NewPageStorage creates a PageStorage for the given data folder.
func (*PageStorage) GetPageIndex ¶ added in v0.0.3
func (s *PageStorage) GetPageIndex() (map[string]pageIndexEntry, error)
GetPageIndex reads the page index file.
func (*PageStorage) GetPageSchema ¶ added in v0.0.3
func (s *PageStorage) GetPageSchema() (*AnnotationSchema, error)
GetPageSchema reads the page type schema from config.json.
func (*PageStorage) IterPageAnnotations ¶ added in v0.0.3
func (s *PageStorage) IterPageAnnotations(opts IterOptions) ([]PageAnnotation, error)
IterPageAnnotations returns the PageAnnotation values in the storage as a slice.
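Since IterPageAnnotations returns a plain slice, callers can post-process it with ordinary Go code. The sketch below groups pages by host, which may be useful for keeping pages from one site together when splitting data. PageAnnotation is a trimmed local stand-in, and hostOf and groupByHost are hypothetical helpers built on the standard net/url package. The package's own GetDomain, listed in the index, may behave differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// PageAnnotation is a local stand-in with only the fields used here.
type PageAnnotation struct {
	URL      string
	TypeFull string
}

// hostOf returns the host part of rawURL without the port,
// or "" when the URL cannot be parsed.
func hostOf(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

// groupByHost buckets page annotations by the host of their URL.
func groupByHost(pages []PageAnnotation) map[string][]PageAnnotation {
	groups := make(map[string][]PageAnnotation)
	for _, p := range pages {
		h := hostOf(p.URL)
		groups[h] = append(groups[h], p)
	}
	return groups
}

func main() {
	// In real use this slice would come from (*PageStorage).IterPageAnnotations.
	pages := []PageAnnotation{
		{URL: "https://example.com/login", TypeFull: "login"},
		{URL: "https://example.com/search", TypeFull: "search"},
		{URL: "http://sub.example.org:8080/x", TypeFull: "other"},
	}
	groups := groupByHost(pages)
	fmt.Println(len(groups["example.com"])) // 2
}
```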
type Storage ¶
type Storage struct {
    Folder string
}
Storage wraps the annotation data folder.
func NewStorage ¶
NewStorage creates a Storage for the given data folder.
func (*Storage) GetFieldSchema ¶
func (s *Storage) GetFieldSchema() (*AnnotationSchema, error)
GetFieldSchema returns the field annotation schema.
func (*Storage) GetFormSchema ¶
func (s *Storage) GetFormSchema() (*AnnotationSchema, error)
GetFormSchema returns the form annotation schema.
func (*Storage) IterAnnotations ¶
func (s *Storage) IterAnnotations(opts IterOptions) ([]FormAnnotation, error)
IterAnnotations returns the FormAnnotation values in the storage as a slice.
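A common first step with the returned slice is checking how the form types are distributed. The following is a minimal sketch under stated assumptions: FormAnnotation is a trimmed local stand-in, countByType and formatCounts are hypothetical helpers, and the sample annotations are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// FormAnnotation is a local stand-in with only the fields used here.
type FormAnnotation struct {
	URL      string
	TypeFull string // full form type
}

// countByType tallies annotations per full form type.
func countByType(anns []FormAnnotation) map[string]int {
	counts := make(map[string]int)
	for _, a := range anns {
		counts[a.TypeFull]++
	}
	return counts
}

// formatCounts renders the tally as "type: n" lines sorted by type name,
// so the output does not depend on Go's randomized map iteration order.
func formatCounts(counts map[string]int) []string {
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names)
	lines := make([]string, 0, len(names))
	for _, name := range names {
		lines = append(lines, fmt.Sprintf("%s: %d", name, counts[name]))
	}
	return lines
}

func main() {
	// In real use this slice would come from (*Storage).IterAnnotations.
	anns := []FormAnnotation{
		{URL: "http://example.com/a", TypeFull: "login"},
		{URL: "http://example.com/b", TypeFull: "search"},
		{URL: "http://example.com/c", TypeFull: "login"},
	}
	for _, line := range formatCounts(countByType(anns)) {
		fmt.Println(line)
	}
	// Prints:
	// login: 2
	// search: 1
}
```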