preprocess

Version: v1.4.0
Warning

This package is not in the latest version of its module.

Published: Sep 4, 2021 | License: MIT | Imports: 7 | Imported by: 0

Documentation

Overview

Package preprocess filters and modifies a scientific name before it is parsed.

Index

Constants

This section is empty.

Variables

var AnnotationException = map[string]string{
	"Acrostichum":      "nudum",
	"Adiantum":         "nudum",
	"Africanthion":     "nudum",
	"Agathidium":       "nudum",
	"Aphaniosoma":      "nudum",
	"Aspidium":         "nudum",
	"Athyrium":         "nudum",
	"Blechnum":         "nudum",
	"Bottaria":         "nudum",
	"Gnathopleustes":   "den",
	"Lycopodium":       "nudum",
	"Nephrodium":       "nudum",
	"Paralvinella":     "dela",
	"Polypodium":       "nudum",
	"Polystichum":      "nudum",
	"Psilotum":         "nudum",
	"Ruteloryctes":     "bis",
	"Selenops":         "ab",
	"Tortolena":        "dela",
	"Trachyphloeosoma": "nudum",
	"Zodarion":         "van",
}
var NoParseException = map[string]string{
	"Navicula": "bacterium",
}
var VirusException = map[string]string{
	"Aspilota":      "vector",
	"Bembidion":     "satellites",
	"Bolivina":      "prion",
	"Ceylonesmus":   "vector",
	"Cryptops":      "vector",
	"Culex":         "vector",
	"Dasyproctus":   "cevirus",
	"Desmoxytes":    "vector",
	"Dicathais":     "vector",
	"Erateina":      "satellites",
	"Euragallia":    "prion",
	"Exochus":       "virus",
	"Hilara":        "vector",
	"Ithomeis":      "satellites",
	"Microgoneplax": "prion",
	"Neoaemula":     "vector",
	"Nephodia":      "satellites",
	"Ophion":        "virus",
	"Psenulus":      "trevirus",
	"Tidabius":      "vector",
}

Functions

func Annotation

func Annotation(bs []byte) int

Annotation returns the index where the unparsed part of the input starts. If the full string can be parsed, it returns the index of the end of the input.
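The contract above suggests splitting the input at the returned index: everything before it is parsed, everything from it on is kept as an unparsed tail. The sketch below illustrates that split under stated assumptions: `annotationIndex` and its three markers (`sensu`, `nom. nud.`, `auct.`) are hypothetical stand-ins, not the package's actual patterns.

```go
package main

import (
	"fmt"
	"regexp"
)

// reAnnot is a hypothetical, deliberately small set of markers that start an
// unparsable tail. The real package uses its own rules.
var reAnnot = regexp.MustCompile(`\s+(?:sensu|nom\. nud\.|auct\.)(?:\s|$)`)

// annotationIndex mimics the documented contract of Annotation: it returns
// the index where the unparsed part starts, or len(bs) if the whole input
// can be parsed.
func annotationIndex(bs []byte) int {
	if loc := reAnnot.FindIndex(bs); loc != nil {
		return loc[0]
	}
	return len(bs)
}

func main() {
	name := []byte("Aus bus L. sensu Smith 1990")
	i := annotationIndex(name)
	body, tail := name[:i], name[i:]
	// Prints: body="Aus bus L." tail=" sensu Smith 1990"
	fmt.Printf("body=%q tail=%q\n", body, tail)
}
```

Returning `len(bs)` when nothing matches lets callers always slice `bs[:i]` and `bs[i:]` without a special case; the tail is then simply empty.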

func CleanupStream

func CleanupStream(in <-chan string, out chan<- *CleanupResult, wn int)

CleanupStream reads names from the in channel and, for each name, sends a *CleanupResult to the out channel that holds the original name and the cleaned-up name.
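A minimal sketch of the worker pattern such a function could use, assuming `wn` is the number of concurrent workers (the parameter is not documented here). `cleanupStream` and its simple tag regex are hypothetical; closing `out` once all workers finish is a design choice of this sketch, not documented behavior.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// CleanupResult mirrors the documented type.
type CleanupResult struct {
	Input  string // original name
	Output string // name after tag removal
}

// reTags is a hypothetical stand-in for the package's tag stripping.
var reTags = regexp.MustCompile(`</?[a-zA-Z]+>`)

// cleanupStream starts wn workers; each reads names from in until it is
// closed and sends one *CleanupResult per name to out. out is closed after
// every worker has returned.
func cleanupStream(in <-chan string, out chan<- *CleanupResult, wn int) {
	var wg sync.WaitGroup
	wg.Add(wn)
	for i := 0; i < wn; i++ {
		go func() {
			defer wg.Done()
			for name := range in {
				out <- &CleanupResult{Input: name, Output: reTags.ReplaceAllString(name, "")}
			}
		}()
	}
	wg.Wait()
	close(out)
}

func main() {
	in := make(chan string)
	out := make(chan *CleanupResult)
	go func() {
		for _, n := range []string{"<i>Homo sapiens</i>", "Bubo bubo"} {
			in <- n
		}
		close(in)
	}()
	go cleanupStream(in, out, 2)
	for r := range out {
		fmt.Printf("%s -> %s\n", r.Input, r.Output)
	}
}
```

Because the sketch blocks until all workers are done, the caller runs it in its own goroutine and ranges over `out`; with `wn` greater than 1, results may arrive in any order.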

func IsException added in v1.3.3

func IsException(name string, names map[string]string) bool
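IsException is not documented on this page. Judging from the exception maps under Variables (for example, VirusException maps `Culex` to `vector`), one plausible reading is that each key is a genus and each value is a word that, when it directly follows that genus, belongs to a real species name rather than triggering virus, annotation, or no-parse handling. The hypothetical `isException` below sketches that reading only; the real function may match differently.

```go
package main

import (
	"fmt"
	"strings"
)

// isException is a hypothetical reading of IsException: it reports whether
// the first word of name is a genus listed in names and the second word is
// exactly the word recorded for that genus.
func isException(name string, names map[string]string) bool {
	words := strings.Fields(name)
	if len(words) < 2 {
		return false
	}
	word, ok := names[words[0]]
	return ok && words[1] == word
}

func main() {
	// A subset of VirusException from the Variables section.
	virusExceptions := map[string]string{"Culex": "vector", "Ophion": "virus"}
	fmt.Println(isException("Culex vector", virusExceptions))  // true
	fmt.Println(isException("Culex pipiens", virusExceptions)) // false
}
```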

func IsVirus

func IsVirus(data []byte) bool

func NoParse

func NoParse(data []byte) bool

func StripTags

func StripTags(s string) string

StripTags takes a string and returns it with common HTML tags removed and HTML entities unescaped. Uncommon tags are kept intact so that the parser can deal with them.
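A sketch of the described behavior, assuming a hypothetical list of "common" tags (`i`, `b`, `u`, `em`, `strong`, `small`, `p`, `br`). Tags outside that list, such as `<span>`, pass through unchanged, and entities are decoded with the standard library's `html.UnescapeString`. The real tag list and the order of the two steps are assumptions.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// reCommonTags matches a hypothetical list of common formatting tags in
// opening, closing, or self-closing form, case-insensitively.
var reCommonTags = regexp.MustCompile(`(?i)<\s*/?\s*(?:i|b|u|em|strong|small|p|br)\s*/?\s*>`)

// stripTags removes the common tags above, leaves every other tag as is,
// and then decodes HTML entities such as &amp;.
func stripTags(s string) string {
	s = reCommonTags.ReplaceAllString(s, "")
	return html.UnescapeString(s)
}

func main() {
	fmt.Println(stripTags("<i>Homo sapiens</i> Linnaeus, 1758")) // Homo sapiens Linnaeus, 1758
	fmt.Println(stripTags("Aus <span>bus</span> &amp; Cus"))     // Aus <span>bus</span> & Cus
}
```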

func UnderscoreToSpace

func UnderscoreToSpace(bs []byte) (bool, error)

UnderscoreToSpace takes a slice of bytes. If the string contains underscores but no spaces, it replaces the underscores with spaces in place. If any spaces are present, the slice is left unmodified.
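A minimal sketch of the documented rule. Two points are assumptions because the page does not describe them: the returned `bool` is taken to mean "the slice was modified", and only the ASCII space counts as a space. The `error` result is kept to mirror the signature and is always `nil` here.

```go
package main

import (
	"bytes"
	"fmt"
)

// underscoreToSpace replaces every '_' with ' ' in place, but only when bs
// contains at least one underscore and no spaces.
func underscoreToSpace(bs []byte) (bool, error) {
	if bytes.IndexByte(bs, ' ') >= 0 || bytes.IndexByte(bs, '_') < 0 {
		return false, nil
	}
	for i, b := range bs {
		if b == '_' {
			bs[i] = ' '
		}
	}
	return true, nil
}

func main() {
	name := []byte("Homo_sapiens_sapiens")
	changed, err := underscoreToSpace(name)
	fmt.Println(string(name), changed, err) // Homo sapiens sapiens true <nil>
}
```

Modifying the slice in place avoids an allocation, which is why the caller keeps using the same `bs` after the call.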

Types

type CleanupResult

type CleanupResult struct {
	// Input is the original name.
	Input string
	// Output is the name after the tag removal.
	Output string
}

CleanupResult holds the result of removing some HTML tags from a name.

type Preprocessor

type Preprocessor struct {
	Virus       bool
	Underscore  bool
	NoParse     bool
	Approximate bool
	Annotation  bool
	Body        []byte
	Tail        []byte
}

Preprocessor holds the results of preprocessing a name.

func Preprocess

func Preprocess(bs []byte) *Preprocessor

Preprocess runs a series of regular expressions over the input to determine features of the input before parsing.
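The sketch below shows how the Preprocessor fields might be filled in one pass: each check sets a flag, and the input is split into `Body` (to be parsed) and `Tail` (left unparsed). The hypothetical `preprocess` uses deliberately crude stand-ins (a `virus` substring, the underscore rule, a single ` sensu ` marker) rather than the package's real regular expressions, and it leaves `NoParse` and `Approximate` unset.

```go
package main

import (
	"bytes"
	"fmt"
)

// Preprocessor copies the documented fields.
type Preprocessor struct {
	Virus       bool
	Underscore  bool
	NoParse     bool
	Approximate bool
	Annotation  bool
	Body        []byte
	Tail        []byte
}

// preprocess is a hypothetical stand-in for Preprocess.
func preprocess(bs []byte) *Preprocessor {
	pr := &Preprocessor{}
	// Stand-in virus check: any occurrence of "virus", ignoring case.
	pr.Virus = bytes.Contains(bytes.ToLower(bs), []byte("virus"))
	// Underscores count as word separators only when there are no spaces.
	if !bytes.Contains(bs, []byte(" ")) && bytes.Contains(bs, []byte("_")) {
		bs = bytes.ReplaceAll(bs, []byte("_"), []byte(" "))
		pr.Underscore = true
	}
	// Stand-in annotation marker: everything from " sensu " on is the tail.
	i := len(bs)
	if j := bytes.Index(bs, []byte(" sensu ")); j >= 0 {
		i = j
		pr.Annotation = true
	}
	pr.Body, pr.Tail = bs[:i], bs[i:]
	return pr
}

func main() {
	for _, s := range []string{"Aus_bus", "Aus bus sensu Smith 1990", "Tobacco mosaic virus"} {
		pr := preprocess([]byte(s))
		fmt.Printf("%q virus=%v underscore=%v annotation=%v body=%q tail=%q\n",
			s, pr.Virus, pr.Underscore, pr.Annotation, pr.Body, pr.Tail)
	}
}
```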
