Documentation
Overview
Package sentencepiece implements a tokenizers.Tokenizer based on the SentencePiece tokenizer.
Index

Constants
This section is empty.
Variables
This section is empty.
Functions
Types
type Tokenizer
type Tokenizer struct {
*esentencepiece.Processor
Info *esentencepiece.ModelInfo
}
Tokenizer implements the tokenizers.Tokenizer interface, based on the SentencePiece tokenizer by Google.
func (*Tokenizer) Decode
Decode returns the text from a sequence of ids. It implements sampler.Vocabulary.
func (*Tokenizer) Encode
Encode returns the text encoded into a sequence of ids. It implements sampler.Vocabulary.
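Encode and Decode are expected to round-trip: decoding the ids produced by Encode recovers the original text. A minimal, self-contained sketch of that contract with a toy word-level vocabulary (the toyTokenizer type and its fields are illustrative stand-ins, not this package's actual implementation, which delegates to the embedded *esentencepiece.Processor):

```go
package main

import (
	"fmt"
	"strings"
)

// toyTokenizer mimics the Encode/Decode shape of sentencepiece.Tokenizer,
// but uses a whitespace-separated word vocabulary instead of subword pieces.
type toyTokenizer struct {
	idToToken []string       // id -> token text
	tokenToID map[string]int // token text -> id
}

func newToyTokenizer(vocab []string) *toyTokenizer {
	t := &toyTokenizer{idToToken: vocab, tokenToID: map[string]int{}}
	for id, tok := range vocab {
		t.tokenToID[tok] = id
	}
	return t
}

// Encode returns the text encoded into a sequence of ids.
func (t *toyTokenizer) Encode(text string) []int {
	var ids []int
	for _, tok := range strings.Fields(text) {
		ids = append(ids, t.tokenToID[tok])
	}
	return ids
}

// Decode returns the text from a sequence of ids.
func (t *toyTokenizer) Decode(ids []int) string {
	toks := make([]string, len(ids))
	for i, id := range ids {
		toks[i] = t.idToToken[id]
	}
	return strings.Join(toks, " ")
}

func main() {
	tok := newToyTokenizer([]string{"hello", "world"})
	ids := tok.Encode("hello world")
	fmt.Println(ids)             // [0 1]
	fmt.Println(tok.Decode(ids)) // hello world
}
```

The real tokenizer operates on subword pieces learned from a SentencePiece model file, so its id sequences are generally longer than one id per word; the round-trip property is the same.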
func (*Tokenizer) SpecialTokenID
func (p *Tokenizer) SpecialTokenID(token api.SpecialToken) (int, error)
SpecialTokenID returns the id for the given special token symbol, or an error if it is not known to the model.
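A sketch of the lookup-with-error contract that SpecialTokenID describes. The SpecialToken type and its constants below are illustrative stand-ins for api.SpecialToken (the real enum lives in the tokenizers api package and its values are not shown here):

```go
package main

import "fmt"

// SpecialToken stands in for api.SpecialToken; the names below are
// hypothetical, chosen only to illustrate the lookup.
type SpecialToken int

const (
	TokBeginningOfSentence SpecialToken = iota
	TokEndOfSentence
	TokPad
)

// specialTokenID mimics Tokenizer.SpecialTokenID: it returns the id for a
// symbol the model defines, or an error for one it does not.
func specialTokenID(ids map[SpecialToken]int, token SpecialToken) (int, error) {
	id, ok := ids[token]
	if !ok {
		return 0, fmt.Errorf("special token %d not defined by the model", token)
	}
	return id, nil
}

func main() {
	// Toy model mapping: only BOS and EOS are defined.
	ids := map[SpecialToken]int{TokBeginningOfSentence: 1, TokEndOfSentence: 2}

	id, err := specialTokenID(ids, TokEndOfSentence)
	fmt.Println(id, err) // 2 <nil>

	_, err = specialTokenID(ids, TokPad) // not defined -> error
	fmt.Println(err != nil)              // true
}
```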