Documentation ¶
Index ¶
- Variables
- func BuildTermFromRunes(runes []rune) []byte
- func DeleteRune(in []rune, pos int) []rune
- func InsertRune(in []rune, pos int, r rune) []rune
- func RunesEndsWith(input []rune, suffix string) bool
- func TruncateRunes(input []byte, num int) []byte
- type Analyzer
- type ByteArrayConverter
- type CharFilter
- type DateTimeParser
- type Token
- type TokenFilter
- type TokenFreq
- type TokenFrequencies
- type TokenLocation
- type TokenMap
- type TokenStream
- type TokenType
- type Tokenizer
Constants ¶
This section is empty.
Variables ¶
var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")
Functions ¶
func BuildTermFromRunes ¶
func BuildTermFromRunes(runes []rune) []byte
func DeleteRune ¶
func DeleteRune(in []rune, pos int) []rune
func InsertRune ¶
func InsertRune(in []rune, pos int, r rune) []rune
func RunesEndsWith ¶
func RunesEndsWith(input []rune, suffix string) bool
func TruncateRunes ¶
func TruncateRunes(input []byte, num int) []byte
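A small sketch of how the rune helpers might be combined. It assumes the semantics suggested by the names (DeleteRune removes the rune at pos, InsertRune inserts r at pos) and an import path of github.com/blevesearch/bleve/analysis, which is not shown on this page:

    package main

    import (
        "fmt"

        "github.com/blevesearch/bleve/analysis" // assumed import path
    )

    func main() {
        runes := []rune("colour")
        runes = analysis.DeleteRune(runes, 4)      // assumed behaviour: remove the rune at index 4, leaving "color"
        runes = analysis.InsertRune(runes, 5, 's') // assumed behaviour: insert 's' at index 5, giving "colors"
        if analysis.RunesEndsWith(runes, "s") {
            term := analysis.BuildTermFromRunes(runes) // convert back to a []byte term
            fmt.Println(string(term))                  // prints "colors"
        }
    }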
Types ¶
type Analyzer ¶
type Analyzer struct {
CharFilters []CharFilter
Tokenizer Tokenizer
TokenFilters []TokenFilter
}
func (*Analyzer) Analyze ¶
func (a *Analyzer) Analyze(input []byte) TokenStream
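The struct suggests a pipeline: CharFilters rewrite the raw input, the Tokenizer splits it into a TokenStream, and the TokenFilters post-process that stream. A minimal sketch of wiring an Analyzer together, using a hypothetical whitespace tokenizer (spaceTokenizer is not part of this package) and the assumed import path github.com/blevesearch/bleve/analysis:

    package main

    import (
        "fmt"

        "github.com/blevesearch/bleve/analysis" // assumed import path
    )

    // spaceTokenizer is a hypothetical Tokenizer that splits the input on
    // single spaces, recording byte offsets and 1-based positions.
    type spaceTokenizer struct{}

    func (spaceTokenizer) Tokenize(input []byte) analysis.TokenStream {
        rv := make(analysis.TokenStream, 0)
        pos := 1
        start := 0
        for start < len(input) {
            // skip leading spaces
            for start < len(input) && input[start] == ' ' {
                start++
            }
            if start >= len(input) {
                break
            }
            end := start
            for end < len(input) && input[end] != ' ' {
                end++
            }
            rv = append(rv, &analysis.Token{
                Term:     input[start:end],
                Start:    start,
                End:      end,
                Position: pos,
            })
            pos++
            start = end
        }
        return rv
    }

    func main() {
        a := &analysis.Analyzer{
            Tokenizer: spaceTokenizer{}, // no char filters or token filters in this sketch
        }
        for _, t := range a.Analyze([]byte("hello analysis world")) {
            fmt.Printf("%s [%d:%d] pos=%d\n", t.Term, t.Start, t.End, t.Position)
        }
    }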
type ByteArrayConverter ¶
type CharFilter ¶
type DateTimeParser ¶
type Token ¶
type Token struct {
// Start specifies the byte offset of the beginning of the term in the
// field.
Start int `json:"start"`
// End specifies the byte offset of the end of the term in the field.
End int `json:"end"`
Term []byte `json:"term"`
// Position specifies the 1-based index of the token in the sequence of
// occurrences of its term in the field.
Position int `json:"position"`
Type TokenType `json:"type"`
KeyWord bool `json:"keyword"`
}
Token represents one occurrence of a term at a particular location in a field.
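For example, analyzing the field value "hello world" might yield a token for "hello" along these lines (the analysis qualifier assumes the import from the earlier sketches; the TokenType value is omitted because this page lists no TokenType constants):

    tok := analysis.Token{
        Start:    0,                // byte offset of 'h' in the field
        End:      5,                // byte offset just past the final 'o'
        Term:     []byte("hello"),
        Position: 1,                // first token/occurrence in the field
    }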
type TokenFilter ¶
type TokenFilter interface {
Filter(TokenStream) TokenStream
}
A TokenFilter adds, transforms or removes tokens from a token stream.
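A sketch of implementing the interface: a hypothetical filter that drops tokens whose term is shorter than a minimum number of bytes (minLengthFilter is not part of this package):

    // minLengthFilter is a hypothetical TokenFilter: it removes tokens whose
    // term is shorter than n bytes and passes everything else through.
    type minLengthFilter struct {
        n int
    }

    func (f minLengthFilter) Filter(input analysis.TokenStream) analysis.TokenStream {
        rv := make(analysis.TokenStream, 0, len(input))
        for _, tok := range input {
            if len(tok.Term) >= f.n {
                rv = append(rv, tok)
            }
        }
        return rv
    }

Such a filter would be placed in Analyzer.TokenFilters so it runs after the tokenizer.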
type TokenFreq ¶
type TokenFreq struct {
Term []byte
Locations []*TokenLocation
// contains filtered or unexported fields
}
TokenFreq represents all the occurrences of a term in all fields of a document.
type TokenFrequencies ¶
TokenFrequencies maps document terms to their combined frequencies from all fields.
func TokenFrequency ¶
func TokenFrequency(tokens TokenStream, arrayPositions []uint64, includeTermVectors bool) TokenFrequencies
func (TokenFrequencies) MergeAll ¶
func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)
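A sketch of combining frequencies from two fields of the same document. The field names and token streams are illustrative, and passing nil for arrayPositions assumes the field values are not array elements:

    // titleTokens and bodyTokens are TokenStreams produced by an Analyzer
    // for two illustrative fields of one document.
    titleFreqs := analysis.TokenFrequency(titleTokens, nil, true) // include term vectors
    bodyFreqs := analysis.TokenFrequency(bodyTokens, nil, true)

    // fold the body frequencies into the title map, recording "body" as the
    // field the merged locations came from
    titleFreqs.MergeAll("body", bodyFreqs)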
type TokenLocation ¶
TokenLocation represents one occurrence of a term at a particular location in a field. Start, End and Position have the same meaning as in analysis.Token. Field and ArrayPositions identify the field value in the source document. See document.Field for details.
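As a rough illustration (the exact struct layout is not shown on this page, and the field names and values below are made up), one occurrence of a term in the second element of an array-valued field "tags" might be described as:

    loc := analysis.TokenLocation{
        Field:          "tags",      // name of the field in the source document
        ArrayPositions: []uint64{1}, // second element of the array value
        Start:          6,           // byte offsets within that field value
        End:            11,
        Position:       3,           // token position, as in analysis.Token
    }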
type TokenMap ¶
func NewTokenMap ¶
func NewTokenMap() TokenMap
func (TokenMap) LoadBytes ¶
LoadBytes reads in a list of tokens from memory, one per line. Comments are supported using `#` or `|`
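A sketch of building a stop-word map in memory. It assumes LoadBytes has the signature LoadBytes([]byte) error, which this page does not show:

    stopWords := []byte("# a few common English stop words\nand\nthe\nof\n")

    tm := analysis.NewTokenMap()
    if err := tm.LoadBytes(stopWords); err != nil { // assumed signature: LoadBytes([]byte) error
        log.Fatal(err)
    }
    // tm can now be handed to a filter such as stop_tokens_filter (listed
    // under Directories below) to drop those tokens from a TokenStream.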
type TokenStream ¶
type TokenStream []*Token
type TokenType ¶
type Tokenizer ¶
type Tokenizer interface {
Tokenize([]byte) TokenStream
}
A Tokenizer splits an input string into tokens, the usual behaviour being to map words to tokens.
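A minimal sketch of implementing the interface: a hypothetical tokenizer that emits the entire input as a single token, which can be useful for keyword-style fields:

    // wholeInputTokenizer is a hypothetical Tokenizer: it returns the whole
    // input as one token instead of splitting it into words.
    type wholeInputTokenizer struct{}

    func (wholeInputTokenizer) Tokenize(input []byte) analysis.TokenStream {
        return analysis.TokenStream{
            &analysis.Token{
                Term:     input,
                Start:    0,
                End:      len(input),
                Position: 1,
            },
        }
    }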
Directories ¶

| Path | Synopsis |
|---|---|
| analyzers | |
| byte_array_converters | |
| char_filters | |
| datetime_parsers | |
| language | |
| language/en | Package en implements an analyzer with reasonable defaults for processing English text. |
| token_filters | |
| token_filters/lower_case_filter | Package lower_case_filter implements a TokenFilter which converts tokens to lower case according to unicode rules. |
| token_filters/stop_tokens_filter | package stop_tokens_filter implements a TokenFilter removing tokens found in a TokenMap. |
| token_map | package token_map implements a generic TokenMap, often used in conjunction with filters to remove or process specific tokens. |
| tokenizers | |
| tokenizers/exception | package exception implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then inserts the extracted parts back into the token stream. |