token

package
v0.0.1
Published: Oct 29, 2020 License: MIT Imports: 3 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Feature

type Feature interface {
	Analyse(token *Token)
	Value(obj interface{})
	String() string
}
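A Feature lets callers attach per-token analyses that run during traversal. As an illustrative sketch only: the `runeCount` type below is a hypothetical Feature that counts the runes in a token, and `token`/`feature` are local stand-ins for the package's `Token` and `Feature` types so the example is self-contained.

```go
package main

import "fmt"

// token is a local stand-in for the package's Token, carrying only the
// field this example needs.
type token struct {
	Raw []rune
}

// feature mirrors the package's Feature interface over the local token
// type, purely for illustration.
type feature interface {
	Analyse(t *token)
	Value(obj interface{})
	String() string
}

// runeCount is a hypothetical Feature that records how many runes a
// token contains.
type runeCount struct {
	n int
}

func (f *runeCount) Analyse(t *token) { f.n = len(t.Raw) }

func (f *runeCount) Value(obj interface{}) {
	// Copy the computed value into a caller-supplied destination.
	if p, ok := obj.(*int); ok {
		*p = f.n
	}
}

func (f *runeCount) String() string { return "runeCount" }

func main() {
	var f feature = &runeCount{}
	f.Analyse(&token{Raw: []rune("hello")})
	var n int
	f.Value(&n)
	fmt.Println(f.String(), n) // runeCount 5
}
```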

type Properties

type Properties struct {
	// HasStartParens token starts with '('.
	HasStartParens bool

	// HasEndParens token ends with ')'.
	HasEndParens bool

	// HasStartSqParens token starts with '['.
	HasStartSqParens bool

	// HasEndSqParens token ends with ']'.
	HasEndSqParens bool

	// HasEndDot token ends with '.'
	HasEndDot bool

	// HasEndComma token ends with ','
	HasEndComma bool

	// HasDigits token includes at least one '0-9'.
	HasDigits bool

	// HasLetters token includes at least one character for which
	// unicode.IsLetter(ch) is true.
	HasLetters bool

	// HasDash token includes '-'
	HasDash bool

	// HasSpecialChars internal part of a token includes characters that
	// are neither letters nor digits.
	HasSpecialChars bool

	// IsNumber internal part of a token contains only digits.
	IsNumber bool

	// IsWord internal part of a token contains only letters.
	IsWord bool
}

Properties is a fixed set of general properties determined during the text traversal.
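How these flags get set is not shown in the documentation; the following is a minimal sketch of how a few of them could be derived from a token's runes, assuming the hypothetical helper `analyse` and a local `props` struct that mirrors a subset of Properties.

```go
package main

import (
	"fmt"
	"unicode"
)

// props mirrors a few Properties fields; the real package fills these
// during text traversal, so this is only an illustrative sketch.
type props struct {
	HasDigits  bool
	HasLetters bool
	HasEndDot  bool
}

// analyse is a hypothetical helper deriving the flags from raw runes.
func analyse(raw []rune) props {
	var p props
	for _, ch := range raw {
		if unicode.IsDigit(ch) {
			p.HasDigits = true
		}
		if unicode.IsLetter(ch) {
			p.HasLetters = true
		}
	}
	if len(raw) > 0 && raw[len(raw)-1] == '.' {
		p.HasEndDot = true
	}
	return p
}

func main() {
	p := analyse([]rune("v2.0."))
	fmt.Println(p.HasDigits, p.HasLetters, p.HasEndDot) // true true true
}
```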

type Token

type Token struct {
	// Line is the line number in the text.
	Line int

	// Raw is a verbatim presentation of a token as it appears in a text.
	Raw []rune

	// Start is the index of the first rune of a token. The first rune
	// does not have to be alpha-numeric.
	Start int

	// End is the index of the last rune of a token. The last rune does not
	// have to be alpha-numeric.
	End int

	// Cleaned is a presentation of a token after normalization.
	Cleaned string

	// Properties is a fixed set of general properties that we determine during
	// the text traversal.
	Properties

	// Features is the map of features as values with their string
	// representations as keys.
	Features map[string]Feature
	// contains filtered or unexported fields
}

Token represents a word separated by spaces in a text. Words split by new lines are concatenated.

func NewToken

func NewToken(raw []rune, start int, end int, feat ...Feature) Token

NewToken constructs a new Token object.

func Tokenize

func Tokenize(text []rune) []Token

Tokenize creates a slice containing tokens for every word in the document.
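The steps above can be sketched with a plain whitespace split that records the inclusive start/end rune indices each Token documents. The local `tok` type and `tokenize` function are stand-ins for illustration; the real Tokenize additionally builds Properties, the cleaned form, and concatenates words split across new lines.

```go
package main

import (
	"fmt"
	"unicode"
)

// tok is a minimal stand-in for the package's Token: the raw word plus
// the indices of its first and last rune in the original text.
type tok struct {
	Raw   string
	Start int
	End   int
}

// tokenize splits text on whitespace, recording rune offsets.
func tokenize(text []rune) []tok {
	var out []tok
	start := -1 // index of the current word's first rune, or -1
	for i, ch := range text {
		if unicode.IsSpace(ch) {
			if start >= 0 {
				out = append(out, tok{string(text[start:i]), start, i - 1})
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 { // flush a trailing word
		out = append(out, tok{string(text[start:]), start, len(text) - 1})
	}
	return out
}

func main() {
	for _, t := range tokenize([]rune("one two")) {
		fmt.Printf("%s %d-%d\n", t.Raw, t.Start, t.End)
	}
	// one 0-2
	// two 4-6
}
```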

func (*Token) ToJson

func (t *Token) ToJson() ([]byte, error)

ToJson serializes the token to a JSON string.

type TokenJSON

type TokenJSON struct {
	Line    int    `json:"lineNumber"`
	Raw     string `json:"raw"`
	Cleaned string `json:"cleaned"`
	Start   int    `json:"start"`
	End     int    `json:"end"`
}

TokenJSON provides a presentation view for a Token.
