Documentation

Overview
Package sentencepiece implements a tokenizers.Tokenizer based on the SentencePiece tokenizer.
Index

Constants

This section is empty.

Variables

This section is empty.

Functions

Types
type Tokenizer
type Tokenizer struct {
*esentencepiece.Processor
Info *esentencepiece.ModelInfo
}
Tokenizer implements the tokenizers.Tokenizer interface, based on Google's SentencePiece tokenizer.
func (*Tokenizer) Decode
Decode returns the text from a sequence of ids. It implements sampler.Vocabulary.
func (*Tokenizer) Encode
Encode returns the text encoded into a sequence of ids. It implements sampler.Vocabulary.
func (*Tokenizer) SpecialTokenID
func (p *Tokenizer) SpecialTokenID(token api.SpecialToken) (int, error)
SpecialTokenID returns the token ID for the given symbol, or an error if it is not known.
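To illustrate the Encode/Decode round-trip contract that Tokenizer satisfies, here is a minimal sketch using a toy whitespace vocabulary in place of a real SentencePiece model. The toyTokenizer type, its word-level splitting, and the fixed vocabulary are all hypothetical stand-ins; the real Tokenizer wraps an esentencepiece.Processor and a trained model file instead.

```go
package main

import (
	"fmt"
	"strings"
)

// toyTokenizer is a hypothetical stand-in for Tokenizer: it maps whole
// words to ids instead of SentencePiece subword pieces, but exposes the
// same Encode/Decode shape described above.
type toyTokenizer struct {
	idToTok []string
	tokToID map[string]int
}

func newToyTokenizer(vocab []string) *toyTokenizer {
	t := &toyTokenizer{idToTok: vocab, tokToID: map[string]int{}}
	for id, tok := range vocab {
		t.tokToID[tok] = id
	}
	return t
}

// Encode returns the text encoded into a sequence of ids.
func (t *toyTokenizer) Encode(text string) []int {
	var ids []int
	for _, tok := range strings.Fields(text) {
		ids = append(ids, t.tokToID[tok])
	}
	return ids
}

// Decode returns the text from a sequence of ids.
func (t *toyTokenizer) Decode(ids []int) string {
	toks := make([]string, len(ids))
	for i, id := range ids {
		toks[i] = t.idToTok[id]
	}
	return strings.Join(toks, " ")
}

func main() {
	tok := newToyTokenizer([]string{"hello", "world"})
	ids := tok.Encode("hello world")
	fmt.Println(ids)             // ids in vocabulary order
	fmt.Println(tok.Decode(ids)) // round-trips back to the input
}
```

A sampler that only needs Encode and Decode (as in sampler.Vocabulary) can be written against this shape without caring whether ids come from a toy table or a SentencePiece model.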
Directories

| Path | Synopsis |
|---|---|
| private |  |
| protos | Package protos contains the Protocol Buffer code for the sentencepiece_model.proto file, downloaded from https://github.com/google/sentencepiece/blob/master/src/sentencepiece_model.proto. |