Documentation ¶
Index ¶
- Variables
- func BuildTermFromRunes(runes []rune) []byte
- func DeleteRune(in []rune, pos int) []rune
- func InsertRune(in []rune, pos int, r rune) []rune
- func RunesEndsWith(input []rune, suffix string) bool
- func TruncateRunes(input []byte, num int) []byte
- type Analyzer
- type ByteArrayConverter
- type CharFilter
- type DateTimeParser
- type Token
- type TokenFilter
- type TokenFreq
- type TokenFrequencies
- type TokenLocation
- type TokenMap
- type TokenStream
- type TokenType
- type Tokenizer
Constants ¶
This section is empty.
Variables ¶
var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")
Functions ¶
func BuildTermFromRunes ¶
func BuildTermFromRunes(runes []rune) []byte
func DeleteRune ¶
func DeleteRune(in []rune, pos int) []rune
func InsertRune ¶
func InsertRune(in []rune, pos int, r rune) []rune
func RunesEndsWith ¶
func RunesEndsWith(input []rune, suffix string) bool
func TruncateRunes ¶
func TruncateRunes(input []byte, num int) []byte
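A small sketch of how the rune helpers might be combined. It assumes the semantics suggested by the names (DeleteRune removes the rune at pos, InsertRune inserts r at pos) and an import path of github.com/blevesearch/bleve/analysis, which is not shown on this page:

    package main

    import (
        "fmt"

        "github.com/blevesearch/bleve/analysis" // assumed import path
    )

    func main() {
        runes := []rune("colour")
        runes = analysis.DeleteRune(runes, 4)      // assumed behaviour: remove the rune at index 4, leaving "color"
        runes = analysis.InsertRune(runes, 5, 's') // assumed behaviour: insert 's' at index 5, giving "colors"
        if analysis.RunesEndsWith(runes, "s") {
            term := analysis.BuildTermFromRunes(runes) // convert back to a []byte term
            fmt.Println(string(term))                  // prints "colors"
        }
    }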
Types ¶
type Analyzer ¶
type Analyzer struct {
CharFilters []CharFilter
Tokenizer Tokenizer
TokenFilters []TokenFilter
}
func (*Analyzer) Analyze ¶
func (a *Analyzer) Analyze(input []byte) TokenStream
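The struct suggests a pipeline: CharFilters rewrite the raw input, the Tokenizer splits it into a TokenStream, and the TokenFilters post-process that stream. A minimal sketch of wiring an Analyzer together, using a hypothetical whitespace tokenizer (spaceTokenizer is not part of this package) and the assumed import path github.com/blevesearch/bleve/analysis:

    package main

    import (
        "fmt"

        "github.com/blevesearch/bleve/analysis" // assumed import path
    )

    // spaceTokenizer is a hypothetical Tokenizer that splits the input on
    // single spaces, recording byte offsets and 1-based positions.
    type spaceTokenizer struct{}

    func (spaceTokenizer) Tokenize(input []byte) analysis.TokenStream {
        rv := make(analysis.TokenStream, 0)
        pos := 1
        start := 0
        for start < len(input) {
            // skip leading spaces
            for start < len(input) && input[start] == ' ' {
                start++
            }
            if start >= len(input) {
                break
            }
            end := start
            for end < len(input) && input[end] != ' ' {
                end++
            }
            rv = append(rv, &analysis.Token{
                Term:     input[start:end],
                Start:    start,
                End:      end,
                Position: pos,
            })
            pos++
            start = end
        }
        return rv
    }

    func main() {
        a := &analysis.Analyzer{
            Tokenizer: spaceTokenizer{}, // no char filters or token filters in this sketch
        }
        for _, t := range a.Analyze([]byte("hello analysis world")) {
            fmt.Printf("%s [%d:%d] pos=%d\n", t.Term, t.Start, t.End, t.Position)
        }
    }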
type ByteArrayConverter ¶
type CharFilter ¶
type DateTimeParser ¶
type Token ¶
type Token struct {
// Start specifies the byte offset of the beginning of the term in the
// field.
Start int `json:"start"`
// End specifies the byte offset of the end of the term in the field.
End int `json:"end"`
Term []byte `json:"term"`
// Position specifies the 1-based index of the token in the sequence of
// occurrences of its term in the field.
Position int `json:"position"`
Type TokenType `json:"type"`
KeyWord bool `json:"keyword"`
}
Token represents one occurrence of a term at a particular location in a field.
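For example, analyzing the field value "hello world" might yield a token for "hello" along these lines (the analysis qualifier assumes the import from the earlier sketches; the TokenType value is omitted because this page lists no TokenType constants):

    tok := analysis.Token{
        Start:    0,                // byte offset of 'h' in the field
        End:      5,                // byte offset just past the final 'o'
        Term:     []byte("hello"),
        Position: 1,                // first token/occurrence in the field
    }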
type TokenFilter ¶
type TokenFilter interface {
Filter(TokenStream) TokenStream
}
A TokenFilter adds, transforms or removes tokens from a token stream.
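A sketch of implementing the interface: a hypothetical filter that drops tokens whose term is shorter than a minimum number of bytes (minLengthFilter is not part of this package):

    // minLengthFilter is a hypothetical TokenFilter: it removes tokens whose
    // term is shorter than n bytes and passes everything else through.
    type minLengthFilter struct {
        n int
    }

    func (f minLengthFilter) Filter(input analysis.TokenStream) analysis.TokenStream {
        rv := make(analysis.TokenStream, 0, len(input))
        for _, tok := range input {
            if len(tok.Term) >= f.n {
                rv = append(rv, tok)
            }
        }
        return rv
    }

Such a filter would be placed in Analyzer.TokenFilters so it runs after the tokenizer.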
type TokenFreq ¶
type TokenFreq struct {
Term []byte
Locations []*TokenLocation
// contains filtered or unexported fields
}
TokenFreq represents all the occurrences of a term in all fields of a document.
type TokenFrequencies ¶
TokenFrequencies maps document terms to their combined frequencies from all fields.
func TokenFrequency ¶
func TokenFrequency(tokens TokenStream, arrayPositions []uint64, includeTermVectors bool) TokenFrequencies
func (TokenFrequencies) MergeAll ¶
func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)
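A sketch of combining frequencies from two fields of the same document. The field names and token streams are illustrative, and passing nil for arrayPositions assumes the field values are not array elements:

    // titleTokens and bodyTokens are TokenStreams produced by an Analyzer
    // for two illustrative fields of one document.
    titleFreqs := analysis.TokenFrequency(titleTokens, nil, true) // include term vectors
    bodyFreqs := analysis.TokenFrequency(bodyTokens, nil, true)

    // fold the body frequencies into the title map, recording "body" as the
    // field the merged locations came from
    titleFreqs.MergeAll("body", bodyFreqs)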
type TokenLocation ¶
TokenLocation represents one occurrence of a term at a particular location in a field. Start, End and Position have the same meaning as in analysis.Token. Field and ArrayPositions identify the field value in the source document. See document.Field for details.
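As a rough illustration (the exact struct layout is not shown on this page, and the field names and values below are made up), one occurrence of a term in the second element of an array-valued field "tags" might be described as:

    loc := analysis.TokenLocation{
        Field:          "tags",      // name of the field in the source document
        ArrayPositions: []uint64{1}, // second element of the array value
        Start:          6,           // byte offsets within that field value
        End:            11,
        Position:       3,           // token position, as in analysis.Token
    }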
type TokenMap ¶
func NewTokenMap ¶
func NewTokenMap() TokenMap
func (TokenMap) LoadBytes ¶
LoadBytes reads in a list of tokens from memory, one per line. Comments are supported using `#` or `|`
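A sketch of building a stop-word map in memory. It assumes LoadBytes has the signature LoadBytes([]byte) error, which this page does not show:

    stopWords := []byte("# a few common English stop words\nand\nthe\nof\n")

    tm := analysis.NewTokenMap()
    if err := tm.LoadBytes(stopWords); err != nil { // assumed signature: LoadBytes([]byte) error
        log.Fatal(err)
    }
    // tm can now be handed to a filter such as stop_tokens_filter (listed
    // under Directories below) to drop those tokens from a TokenStream.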
type TokenStream ¶
type TokenStream []*Token
type TokenType ¶
type Tokenizer ¶
type Tokenizer interface {
Tokenize([]byte) TokenStream
}
A Tokenizer splits an input string into tokens, the usual behaviour being to map words to tokens.
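A minimal sketch of implementing the interface: a hypothetical tokenizer that emits the entire input as a single token, which can be useful for keyword-style fields:

    // wholeInputTokenizer is a hypothetical Tokenizer: it returns the whole
    // input as one token instead of splitting it into words.
    type wholeInputTokenizer struct{}

    func (wholeInputTokenizer) Tokenize(input []byte) analysis.TokenStream {
        return analysis.TokenStream{
            &analysis.Token{
                Term:     input,
                Start:    0,
                End:      len(input),
                Position: 1,
            },
        }
    }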
Directories ¶

| Path | Synopsis |
|---|---|
| analyzers | |
| byte_array_converters | |
| char_filters | |
| datetime_parsers | |
| language | |
| language/en | Package en implements an analyzer with reasonable defaults for processing English text. |
| token_filters | |
| token_filters/lower_case_filter | Package lower_case_filter implements a TokenFilter which converts tokens to lower case according to unicode rules. |
| token_filters/stop_tokens_filter | package stop_tokens_filter implements a TokenFilter removing tokens found in a TokenMap. |
| token_map | package token_map implements a generic TokenMap, often used in conjunction with filters to remove or process specific tokens. |
| tokenizers | |
| tokenizers/exception | package exception implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then inserts the extracted parts back into the token stream. |