Documentation ¶
Index ¶
- Variables
- func AddCustomDict(className string, configs []*models.TokenizerUserDictConfig) error
- func InitOptionalTokenizers()
- func NewUserDictFromModel(config *models.TokenizerUserDictConfig) (*dict.UserDict, error)
- func Tokenize(tokenization string, in string) []string
- func TokenizeAndCountDuplicatesForClass(tokenization string, in string, class string) ([]string, []int)
- func TokenizeForClass(tokenization string, in string, class string) []string
- func TokenizeWithWildcardsForClass(tokenization string, in string, class string) []string
- type KagomeTokenizers
Constants ¶
This section is empty.
Variables ¶
var (
	UseGse   = false // Load Japanese dictionary and prepare tokenizer
	UseGseCh = false // Load Chinese dictionary and prepare tokenizer

	// The Tokenizer Libraries can consume a lot of memory, so we limit the number of parallel tokenizers
	ApacTokenizerThrottle = chan struct{}(nil) // Throttle for tokenizers
)
var Tokenizations []string = []string{
	models.PropertyTokenizationWord,
	models.PropertyTokenizationLowercase,
	models.PropertyTokenizationWhitespace,
	models.PropertyTokenizationField,
	models.PropertyTokenizationTrigram,
}
Optional tokenizers can be enabled with an environment variable of the form 'ENABLE_TOKENIZER_XXX', e.g. 'ENABLE_TOKENIZER_GSE', 'ENABLE_TOKENIZER_KAGOME_KR', 'ENABLE_TOKENIZER_KAGOME_JA'.
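For illustration, a minimal sketch of opting in to one of the optional tokenizers, assuming InitOptionalTokenizers reads these environment variables at startup; the import path is a placeholder for this package's actual module path.

package main

import (
	"os"

	tokenizer "example.com/your/module/tokenizer" // hypothetical import path for this package
)

func main() {
	// Opt in to the Japanese Kagome tokenizer before initialization
	// (assumption: the variable is read when InitOptionalTokenizers runs).
	os.Setenv("ENABLE_TOKENIZER_KAGOME_JA", "true")

	// Load dictionaries and prepare any tokenizers enabled above.
	tokenizer.InitOptionalTokenizers()
}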
Functions ¶
func AddCustomDict ¶ added in v1.34.1
func AddCustomDict(className string, configs []*models.TokenizerUserDictConfig) error
func InitOptionalTokenizers ¶ added in v1.34.1
func InitOptionalTokenizers()
func NewUserDictFromModel ¶ added in v1.34.1
func NewUserDictFromModel(config *models.TokenizerUserDictConfig) (*dict.UserDict, error)
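As a hedged sketch only, the helper below shows one plausible way to combine the two user-dictionary functions; registerUserDict is a hypothetical name, the import paths are placeholders, and it assumes the config value is taken from the class schema rather than constructed here.

package userdictexample

import (
	"github.com/weaviate/weaviate/entities/models" // assumed models import path

	tokenizer "example.com/your/module/tokenizer" // hypothetical import path for this package
)

// registerUserDict is a hypothetical helper combining NewUserDictFromModel
// and AddCustomDict; how the two interact in practice is an assumption.
func registerUserDict(className string, cfg *models.TokenizerUserDictConfig) error {
	// Build a Kagome user dictionary from the schema-provided config.
	userDict, err := tokenizer.NewUserDictFromModel(cfg)
	if err != nil {
		return err
	}
	_ = userDict // the *dict.UserDict could be cached or inspected by the caller

	// Register the config for the class so class-scoped tokenization can use it.
	return tokenizer.AddCustomDict(className, []*models.TokenizerUserDictConfig{cfg})
}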
func Tokenize ¶
func Tokenize(tokenization string, in string) []string
func TokenizeAndCountDuplicatesForClass ¶ added in v1.34.1
func TokenizeAndCountDuplicatesForClass(tokenization string, in string, class string) ([]string, []int)
func TokenizeForClass ¶ added in v1.34.1
func TokenizeForClass(tokenization string, in string, class string) []string
func TokenizeWithWildcardsForClass ¶
func TokenizeWithWildcardsForClass(tokenization string, in string, class string) []string
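A minimal usage sketch of Tokenize and TokenizeForClass, assuming the import paths shown; the class name "Article" is a placeholder and the exact output depends on the chosen tokenization.

package main

import (
	"fmt"

	"github.com/weaviate/weaviate/entities/models" // assumed models import path

	tokenizer "example.com/your/module/tokenizer" // hypothetical import path for this package
)

func main() {
	in := "Hello, Weaviate tokenizers!"

	// Tokenize by name; models.PropertyTokenizationWord is one of the
	// values listed in the Tokenizations variable above.
	fmt.Println(tokenizer.Tokenize(models.PropertyTokenizationWord, in))

	// The *ForClass variants additionally take a class name, presumably so
	// per-class settings such as custom user dictionaries can be applied.
	fmt.Println(tokenizer.TokenizeForClass(models.PropertyTokenizationWord, in, "Article"))
}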
Types ¶
type KagomeTokenizers ¶
type KagomeTokenizers struct {
Korean *kagomeTokenizer.Tokenizer
Japanese *kagomeTokenizer.Tokenizer
}