Documentation ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CalculateProperties ¶ added in v0.1.1
func CalculateProperties(raw, cleaned []rune, p *Properties)
CalculateProperties takes the raw and cleaned values of a token, computes the properties of these values, and saves them into the given Properties object.
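As a rough illustration of the kind of work this function does, the sketch below derives a handful of flags from a raw token. It is not the package's implementation: `props` and `calcProps` are hypothetical names, only a subset of the documented fields is covered, and the sketch assumes every flag can be read from the raw runes alone, whereas the real function also receives the cleaned value.

```go
package main

import (
	"fmt"
	"unicode"
)

// props is a hypothetical, trimmed-down stand-in for Properties.
type props struct {
	HasStartParens bool
	HasEndParens   bool
	HasDigits      bool
	HasLetters     bool
	HasDash        bool
	IsCapitalized  bool
}

// calcProps is an illustrative helper, not the package's CalculateProperties.
// Assumption: all flags below are derived from the raw runes only.
func calcProps(raw []rune) props {
	var p props
	if len(raw) == 0 {
		return p
	}
	p.HasStartParens = raw[0] == '('
	p.HasEndParens = raw[len(raw)-1] == ')'

	seenLetter := false
	for _, ch := range raw {
		switch {
		case ch >= '0' && ch <= '9':
			p.HasDigits = true
		case ch == '-':
			p.HasDash = true
		case unicode.IsLetter(ch):
			p.HasLetters = true
			// Only the first letter decides capitalization; it does not
			// have to be the first character of the token.
			if !seenLetter {
				p.IsCapitalized = unicode.IsUpper(ch)
				seenLetter = true
			}
		}
	}
	return p
}

func main() {
	p := calcProps([]rune("(Pardosa-1)"))
	fmt.Printf("%+v\n", p)
}
```

Passing a pointer to a caller-owned struct, as the documented signature does, would let a caller reuse one Properties value across tokens; the sketch returns a value only to stay short.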
Types ¶
type Properties ¶
type Properties struct {
// HasStartParens token starts with '('.
HasStartParens bool
// HasEndParens token ends with ')'.
HasEndParens bool
// HasStartSqParens token starts with '['.
HasStartSqParens bool
// HasEndSqParens token ends with ']'.
HasEndSqParens bool
// HasEndDot token ends with '.'
HasEndDot bool
// HasEndComma token ends with ','
HasEndComma bool
// HasDigits token includes at least one '0-9'.
HasDigits bool
// HasLetters token includes at least one character for which
// unicode.IsLetter(ch) is true.
HasLetters bool
// HasDash token includes '-'
HasDash bool
// HasSpecialChars internal part of a token includes non-letters, non-digits.
HasSpecialChars bool
// IsCapitalized is true if the first letter of a token is capitalized.
// The first letter does not have to be the first character.
IsCapitalized bool
// IsNumber internal part of a token has only numbers.
IsNumber bool
// IsWord internal part of a token includes only letters.
IsWord bool
}
Properties is a fixed set of general properties determined during the text traversal.
type TokenJSON ¶
type TokenJSON struct {
Line int `json:"lineNumber"`
Raw string `json:"raw"`
Cleaned string `json:"cleaned"`
Start int `json:"start"`
End int `json:"end"`
}
TokenJSON provides a presentation view for a Token.
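To show how the struct tags shape the output, the following minimal sketch marshals a TokenJSON value with the standard `encoding/json` package. The struct is repeated locally so the example compiles on its own; `encodeToken` is a hypothetical helper and the field values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TokenJSON mirrors the documented struct so the example is self-contained.
type TokenJSON struct {
	Line    int    `json:"lineNumber"`
	Raw     string `json:"raw"`
	Cleaned string `json:"cleaned"`
	Start   int    `json:"start"`
	End     int    `json:"end"`
}

// encodeToken is a hypothetical helper that returns the JSON text of a token.
func encodeToken(t TokenJSON) (string, error) {
	b, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Illustrative values: an 8-rune raw token starting at index 10,
	// so the index of its last rune is 17.
	out, err := encodeToken(TokenJSON{
		Line: 3, Raw: "(Pardosa", Cleaned: "Pardosa", Start: 10, End: 17,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
	// prints {"lineNumber":3,"raw":"(Pardosa","cleaned":"Pardosa","start":10,"end":17}
}
```

Note that the `Line` field is emitted under the key `lineNumber`, while the other keys are the lowercased field names.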
type TokenNER ¶ added in v0.1.1
type TokenNER interface {
// Raw is a verbatim presentation of a token as it appears in a text.
Raw() []rune
// Start is the index of the first rune of a token. The first rune
// does not have to be alpha-numeric.
Start() int
// End is the index of the last rune of a token. The last rune does not
// have to be alpha-numeric.
End() int
// Line is the line number in the text.
Line() int
// SetLine sets the line number
SetLine(int)
// Cleaned is a presentation of a token after normalization.
Cleaned() string
// SetCleaned substitutes the existing cleaned text with a new one.
SetCleaned(string)
// Properties is a fixed set of general properties that we determine during
// the text traversal.
Properties() *Properties
// SetProperties substitutes existing properties with new ones.
SetProperties(*Properties)
// ProcessRaw computes a clean version of a name as well as properties
// of the token.
ProcessRaw()
// ToJSON converts a TokenNER object into its JSON representation.
ToJSON() ([]byte, error)
}
TokenNER represents a word separated by spaces in a text. Words split by new lines are concatenated.
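A type satisfying this interface could look roughly like the sketch below. Everything in it beyond the documented method set is an assumption: `simpleToken` and `newSimpleToken` are hypothetical names, Properties is cut down to two fields, and ProcessRaw assumes that "cleaning" means trimming non-alphanumeric runes from both ends, which may differ from the package's actual normalization.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode"
)

// Properties is cut down to two fields to keep the sketch short.
type Properties struct {
	HasDigits, HasLetters bool
}

// TokenJSON mirrors the documented presentation struct.
type TokenJSON struct {
	Line    int    `json:"lineNumber"`
	Raw     string `json:"raw"`
	Cleaned string `json:"cleaned"`
	Start   int    `json:"start"`
	End     int    `json:"end"`
}

// TokenNER repeats the documented method set.
type TokenNER interface {
	Raw() []rune
	Start() int
	End() int
	Line() int
	SetLine(int)
	Cleaned() string
	SetCleaned(string)
	Properties() *Properties
	SetProperties(*Properties)
	ProcessRaw()
	ToJSON() ([]byte, error)
}

// simpleToken is a hypothetical implementation, not the package's own type.
type simpleToken struct {
	raw              []rune
	start, end, line int
	cleaned          string
	props            *Properties
}

// newSimpleToken is a hypothetical constructor; end is the last rune's index.
func newSimpleToken(raw string, start int) *simpleToken {
	r := []rune(raw)
	return &simpleToken{raw: r, start: start, end: start + len(r) - 1}
}

func (t *simpleToken) Raw() []rune                 { return t.raw }
func (t *simpleToken) Start() int                  { return t.start }
func (t *simpleToken) End() int                    { return t.end }
func (t *simpleToken) Line() int                   { return t.line }
func (t *simpleToken) SetLine(n int)               { t.line = n }
func (t *simpleToken) Cleaned() string             { return t.cleaned }
func (t *simpleToken) SetCleaned(s string)         { t.cleaned = s }
func (t *simpleToken) Properties() *Properties     { return t.props }
func (t *simpleToken) SetProperties(p *Properties) { t.props = p }

// ProcessRaw assumes that cleaning trims non-alphanumeric runes from both
// ends; the package's real normalization may differ.
func (t *simpleToken) ProcessRaw() {
	t.cleaned = strings.TrimFunc(string(t.raw), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	p := &Properties{}
	for _, r := range t.raw {
		p.HasDigits = p.HasDigits || (r >= '0' && r <= '9')
		p.HasLetters = p.HasLetters || unicode.IsLetter(r)
	}
	t.props = p
}

// ToJSON marshals the token through the TokenJSON presentation struct.
func (t *simpleToken) ToJSON() ([]byte, error) {
	return json.Marshal(TokenJSON{
		Line: t.line, Raw: string(t.raw), Cleaned: t.cleaned,
		Start: t.start, End: t.end,
	})
}

func main() {
	// Assigning to the interface type checks the method set at compile time.
	var tok TokenNER = newSimpleToken("(Pardosa,", 10)
	tok.SetLine(3)
	tok.ProcessRaw()
	js, err := tok.ToJSON()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(js))
}
```

Storing Raw as `[]rune` keeps Start and End consistent as rune indices, which matters for text containing multi-byte characters.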