Documentation ¶
Overview ¶
Package token breaks a text into tokens. It repairs names that were split by line breaks by concatenating their pieces. Each token carries a set of properties, which the heuristic and Bayes' approaches use to find names.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewTokenSN ¶
NewTokenSN is a factory and a wrapper. It takes a gner.TokenNER object and wraps it in the TokenSN interface.
func SetIndices ¶
func SetIndices(ts []TokenSN, d *dict.Dictionary)
SetIndices takes a slice of tokens that corresponds to a name candidate. It analyses the tokens and sets Token.Indices according to how feasibly the input tokens could form a scientific name. It checks whether the tokens contain a possible species, rank, and infraspecies.
func UpperIndex ¶
UpperIndex takes the index of a token and the length of the tokens slice and returns the upper index of what could be a slice of a name. We expect that most names will fit into 5 words. Other cases would require more thorough algorithms that we can run later as plugins.
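The windowing idea described for UpperIndex can be illustrated with a minimal sketch. The function name `upperIndex`, its signature, the constant `maxNameWords`, and the choice of an exclusive upper bound are all assumptions made for illustration; the package's actual implementation may differ.

```go
package main

import "fmt"

// maxNameWords is an assumed window size, taken from the remark that most
// names are expected to fit into 5 words.
const maxNameWords = 5

// upperIndex is a hypothetical stand-in for UpperIndex. Given the index i of
// a token and the length l of the tokens slice, it returns an exclusive upper
// bound for a candidate name that starts at i, capped at l.
func upperIndex(i, l int) int {
	upper := i + maxNameWords
	if upper > l {
		upper = l
	}
	return upper
}

func main() {
	words := []string{"The", "owl", "Bubo", "bubo", "lives", "in", "forests", "of", "Eurasia"}
	i := 2 // index of the token that might start a name
	// The window covers at most maxNameWords tokens starting at i.
	fmt.Println(words[i:upperIndex(i, len(words))])
}
```

Capping the window keeps the per-token work constant, at the cost of missing unusually long names.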
Types ¶
type Decision ¶
type Decision int
Decision defines possible kinds of name candidates.
const (
NotName Decision = iota
Uninomial
PossibleUninomial
Binomial
PossibleBinomial
Trinomial
BayesUninomial
BayesBinomial
BayesTrinomial
)
Possible Decisions
func (Decision) Cardinality ¶
Cardinality returns the number of elements in the canonical form of a scientific name: 1 for a uninomial, 2 for a binomial, 3 for a trinomial.
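A minimal sketch of how such a mapping could look is given below. It redeclares a local `Decision` type so that it is self-contained. The `int` return type, the grouping of the `Possible*` and `Bayes*` variants with their base kinds, and the value 0 for `NotName` are assumptions, not behaviour confirmed by the documentation above.

```go
package main

import "fmt"

// Decision mirrors the constants listed above; it is redeclared here only to
// keep the sketch self-contained.
type Decision int

const (
	NotName Decision = iota
	Uninomial
	PossibleUninomial
	Binomial
	PossibleBinomial
	Trinomial
	BayesUninomial
	BayesBinomial
	BayesTrinomial
)

// Cardinality returns the assumed number of canonical-form elements for d.
// Treating Possible* and Bayes* variants like their base kinds, and returning
// 0 for NotName, are assumptions of this sketch.
func (d Decision) Cardinality() int {
	switch d {
	case Uninomial, PossibleUninomial, BayesUninomial:
		return 1
	case Binomial, PossibleBinomial, BayesBinomial:
		return 2
	case Trinomial, BayesTrinomial:
		return 3
	default:
		return 0
	}
}

func main() {
	fmt.Println(Binomial.Cardinality(), BayesTrinomial.Cardinality(), NotName.Cardinality())
}
```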
type Features ¶
type Features struct {
// IsCapitalized is true if the first rune that is a letter is capitalized.
IsCapitalized bool
// HasDash is true if token contains dash
HasDash bool
// HasStartParens is true if token starts with '('
HasStartParens bool
// HasEndParens is true if token ends with ')'
HasEndParens bool
// Abbr feature: token ends with a period.
Abbr bool
// PotentialBinomialGenus feature: the token might be the genus of a name.
PotentialBinomialGenus bool
// StartsWithLetter feature: the token has the qualities necessary to start
// the species part of a binomial. It is assumed to be lower-case and two
// letters or longer.
StartsWithLetter bool
// EndsWithLetter feature: the token has the quality necessary to be the
// species part of a trinomial.
EndsWithLetter bool
// RankLike is true if token is a known infraspecific rank
RankLike bool
// UninomialDict defines which Genera or Uninomials dictionary (if any)
// contained the token.
UninomialDict dict.DictionaryType
// SpeciesDict defines which Species dictionary (if any) contained the token.
SpeciesDict dict.DictionaryType
// GenSpInAmbigDict shows how many specific/infraspecific epithets of a putative
// name matched bi-/tri- nomials in a full name dictionary for grey genera.
// For example "Bubo bubo" name would set it to 1, and "Bubo bubo bubo" would
// set it to 2.
GenSpInAmbigDict int
}
Features keeps the properties of a token that make it a possible candidate for a part of a name.
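Several of these fields describe only the surface form of a token, so they can be derived from the raw string alone. The sketch below shows one way to do that, following the field comments above. The type `surfaceFeatures` and the function `newSurfaceFeatures` are hypothetical names introduced for illustration; the dictionary-backed fields are left out because they depend on dict.Dictionary.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// surfaceFeatures is a hypothetical, reduced version of Features that keeps
// only the fields derivable from the raw token text.
type surfaceFeatures struct {
	IsCapitalized  bool
	HasDash        bool
	HasStartParens bool
	HasEndParens   bool
	Abbr           bool
}

// newSurfaceFeatures computes the surface features of a raw token, following
// the field descriptions in the Features struct.
func newSurfaceFeatures(raw string) surfaceFeatures {
	f := surfaceFeatures{
		HasDash:        strings.ContainsRune(raw, '-'),
		HasStartParens: strings.HasPrefix(raw, "("),
		HasEndParens:   strings.HasSuffix(raw, ")"),
		Abbr:           strings.HasSuffix(raw, "."),
	}
	// IsCapitalized looks at the first rune that is a letter, skipping any
	// leading punctuation such as an opening parenthesis.
	for _, r := range raw {
		if unicode.IsLetter(r) {
			f.IsCapitalized = unicode.IsUpper(r)
			break
		}
	}
	return f
}

func main() {
	for _, s := range []string{"(Bubo)", "B.", "novae-zelandiae"} {
		fmt.Printf("%-16s %+v\n", s, newSurfaceFeatures(s))
	}
}
```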
func (*Features) SetSpeciesDict ¶
func (p *Features) SetSpeciesDict(cleaned string, d *dict.Dictionary)
func (*Features) SetUninomialDict ¶
func (p *Features) SetUninomialDict(cleaned string, d *dict.Dictionary)
type NLP ¶
type NLP struct {
// Odds are posterior odds.
Odds float64
// ClassCases is used to calculate prior odds of names appearing in a
// document.
ClassCases map[feature.Class]int
// OddsDetails are used for calculating final odds for detected names and
// for displaying results in the output.
boutput.OddsDetails
}
NLP collects data received from Bayes' algorithm.
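The relationship between prior and posterior odds can be sketched with the standard naive Bayes formulation: posterior odds equal prior odds multiplied by the likelihood ratio of each observed feature. The code below is an illustration only. The `class` type, the class labels, the helper names, and the way class counts are turned into prior odds are assumptions, and the numbers are made up for the example.

```go
package main

import (
	"fmt"
	"math"
)

// class is a hypothetical stand-in for feature.Class.
type class string

// priorOdds derives prior odds for target from per-class case counts:
// cases in the target class divided by cases in all other classes.
func priorOdds(cases map[class]int, target class) float64 {
	others := 0
	for c, n := range cases {
		if c != target {
			others += n
		}
	}
	if others == 0 {
		// No counter-examples: the odds are unbounded.
		return math.Inf(1)
	}
	return float64(cases[target]) / float64(others)
}

// posteriorOdds multiplies the prior odds by the likelihood ratio of every
// observed feature, as in a naive Bayes classifier.
func posteriorOdds(prior float64, likelihoodRatios []float64) float64 {
	odds := prior
	for _, lr := range likelihoodRatios {
		odds *= lr
	}
	return odds
}

func main() {
	// Illustrative counts: 1 name token for every 9 non-name tokens.
	cases := map[class]int{"name": 1, "notName": 9}
	prior := priorOdds(cases, "name")
	// Illustrative likelihood ratios for three observed features.
	post := posteriorOdds(prior, []float64{4, 0.5, 18})
	fmt.Printf("prior odds: %.3f, posterior odds: %.3f\n", prior, post)
}
```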