token

package
v0.0.1
Published: Oct 29, 2020 License: MIT Imports: 3 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Feature

type Feature interface {
	Analyse(token *Token)
	Value(obj interface{})
	String() string
}
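A Feature lets callers attach per-token analyses that run during traversal. As an illustrative sketch only: the `runeCount` type below is a hypothetical Feature that counts the runes in a token, and `token`/`feature` are local stand-ins for the package's `Token` and `Feature` types so the example is self-contained.

```go
package main

import "fmt"

// token is a local stand-in for the package's Token, carrying only the
// field this example needs.
type token struct {
	Raw []rune
}

// feature mirrors the package's Feature interface over the local token
// type, purely for illustration.
type feature interface {
	Analyse(t *token)
	Value(obj interface{})
	String() string
}

// runeCount is a hypothetical Feature that records how many runes a
// token contains.
type runeCount struct {
	n int
}

func (f *runeCount) Analyse(t *token) { f.n = len(t.Raw) }

func (f *runeCount) Value(obj interface{}) {
	// Copy the computed value into a caller-supplied destination.
	if p, ok := obj.(*int); ok {
		*p = f.n
	}
}

func (f *runeCount) String() string { return "runeCount" }

func main() {
	var f feature = &runeCount{}
	f.Analyse(&token{Raw: []rune("hello")})
	var n int
	f.Value(&n)
	fmt.Println(f.String(), n) // runeCount 5
}
```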

type Properties

type Properties struct {
	// HasStartParens token starts with '('.
	HasStartParens bool

	// HasEndParens token ends with ')'.
	HasEndParens bool

	// HasStartSqParens token starts with '['.
	HasStartSqParens bool

	// HasEndSqParens token ends with ']'.
	HasEndSqParens bool

	// HasEndDot token ends with '.'
	HasEndDot bool

	// HasEndComma token ends with ','
	HasEndComma bool

	// HasDigits token includes at least one '0-9'.
	HasDigits bool

	// HasLetters token includes at least one character for which
	// unicode.IsLetter(ch) is true.
	HasLetters bool

	// HasDash token includes '-'
	HasDash bool

	// HasSpecialChars internal part of a token includes characters that
	// are neither letters nor digits.
	HasSpecialChars bool

	// IsNumber internal part of a token contains only digits.
	IsNumber bool

	// IsWord internal part of a token contains only letters.
	IsWord bool
}

Properties is a fixed set of general properties determined during the text traversal.
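How these flags get set is not shown in the documentation; the following is a minimal sketch of how a few of them could be derived from a token's runes, assuming the hypothetical helper `analyse` and a local `props` struct that mirrors a subset of Properties.

```go
package main

import (
	"fmt"
	"unicode"
)

// props mirrors a few Properties fields; the real package fills these
// during text traversal, so this is only an illustrative sketch.
type props struct {
	HasDigits  bool
	HasLetters bool
	HasEndDot  bool
}

// analyse is a hypothetical helper deriving the flags from raw runes.
func analyse(raw []rune) props {
	var p props
	for _, ch := range raw {
		if unicode.IsDigit(ch) {
			p.HasDigits = true
		}
		if unicode.IsLetter(ch) {
			p.HasLetters = true
		}
	}
	if len(raw) > 0 && raw[len(raw)-1] == '.' {
		p.HasEndDot = true
	}
	return p
}

func main() {
	p := analyse([]rune("v2.0."))
	fmt.Println(p.HasDigits, p.HasLetters, p.HasEndDot) // true true true
}
```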

type Token

type Token struct {
	// Line is the line number in the text.
	Line int

	// Raw is a verbatim presentation of a token as it appears in a text.
	Raw []rune

	// Start is the index of the first rune of a token. The first rune
	// does not have to be alpha-numeric.
	Start int

	// End is the index of the last rune of a token. The last rune does not
	// have to be alpha-numeric.
	End int

	// Cleaned is a presentation of a token after normalization.
	Cleaned string

	// Properties is a fixed set of general properties that we determine during
	// the text traversal.
	Properties

	// Features is the map of features as values with their string
	// representations as keys.
	Features map[string]Feature
	// contains filtered or unexported fields
}

Token represents a word separated by spaces in a text. Words split by new lines are concatenated.

func NewToken

func NewToken(raw []rune, start int, end int, feat ...Feature) Token

NewToken constructs a new Token object.

func Tokenize

func Tokenize(text []rune) []Token

Tokenize creates a slice containing tokens for every word in the document.
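The steps above can be sketched with a plain whitespace split that records the inclusive start/end rune indices each Token documents. The local `tok` type and `tokenize` function are stand-ins for illustration; the real Tokenize additionally builds Properties, the cleaned form, and concatenates words split across new lines.

```go
package main

import (
	"fmt"
	"unicode"
)

// tok is a minimal stand-in for the package's Token: the raw word plus
// the indices of its first and last rune in the original text.
type tok struct {
	Raw   string
	Start int
	End   int
}

// tokenize splits text on whitespace, recording rune offsets.
func tokenize(text []rune) []tok {
	var out []tok
	start := -1 // index of the current word's first rune, or -1
	for i, ch := range text {
		if unicode.IsSpace(ch) {
			if start >= 0 {
				out = append(out, tok{string(text[start:i]), start, i - 1})
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
	if start >= 0 { // flush a trailing word
		out = append(out, tok{string(text[start:]), start, len(text) - 1})
	}
	return out
}

func main() {
	for _, t := range tokenize([]rune("one two")) {
		fmt.Printf("%s %d-%d\n", t.Raw, t.Start, t.End)
	}
	// one 0-2
	// two 4-6
}
```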

func (*Token) ToJson

func (t *Token) ToJson() ([]byte, error)

ToJson serializes the token to a JSON string.

type TokenJSON

type TokenJSON struct {
	Line    int    `json:"lineNumber"`
	Raw     string `json:"raw"`
	Cleaned string `json:"cleaned"`
	Start   int    `json:"start"`
	End     int    `json:"end"`
}

TokenJSON provides a presentation view for a Token.
