tokenizer

package
v0.18.2
Published: Mar 18, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer handles BPE and SentencePiece tokenization.

func LoadFromBytes

func LoadFromBytes(data []byte) (*Tokenizer, error)

LoadFromBytes loads a tokenizer from tokenizer.json bytes. This is useful when loading from blob storage where the file content is already in memory. Note: This won't load special token config from companion files. Use LoadFromBytesWithConfig to provide tokenizer_config.json data for proper PAD/EOS token loading.

func LoadFromBytesWithConfig

func LoadFromBytesWithConfig(data []byte, config *TokenizerConfig) (*Tokenizer, error)

LoadFromBytesWithConfig loads a tokenizer from tokenizer.json bytes with additional config files. This is useful when loading from blob storage where companion config files are also blobs.

func (*Tokenizer) AddBOS added in v0.18.2

func (t *Tokenizer) AddBOS() bool

AddBOS returns whether a BOS token should be prepended during encoding.

func (*Tokenizer) BOS

func (t *Tokenizer) BOS() int32

BOS returns the beginning-of-sequence token ID.

func (*Tokenizer) Decode

func (t *Tokenizer) Decode(ids []int32) string

Decode converts token IDs back to text.

func (*Tokenizer) EOS

func (t *Tokenizer) EOS() int32

EOS returns the first end-of-sequence token ID (for backwards compatibility).

func (*Tokenizer) EOSTokens

func (t *Tokenizer) EOSTokens() []int32

EOSTokens returns all end-of-sequence token IDs.

func (*Tokenizer) Encode

func (t *Tokenizer) Encode(s string, addBOS bool) []int32

Encode tokenizes text to token IDs. Parallel encoding is used only for very large inputs with enough chunks per worker.

func (*Tokenizer) GetSpecialToken

func (t *Tokenizer) GetSpecialToken(name string) (int32, bool)

GetSpecialToken returns the token ID for a special token string.

func (*Tokenizer) IsEOS

func (t *Tokenizer) IsEOS(id int32) bool

IsEOS reports whether the token ID is an end-of-sequence token.

func (*Tokenizer) PAD

func (t *Tokenizer) PAD() int32

PAD returns the padding token ID, or -1 if not set.

func (*Tokenizer) VocabSize

func (t *Tokenizer) VocabSize() int

VocabSize returns the vocabulary size.

type TokenizerConfig

type TokenizerConfig struct {
	TokenizerConfigJSON  []byte // tokenizer_config.json content
	GenerationConfigJSON []byte // generation_config.json content
	SpecialTokensMapJSON []byte // special_tokens_map.json content
	ConfigJSON           []byte // config.json content
}

TokenizerConfig holds optional configuration data that can be passed to LoadFromBytesWithConfig.

type TokenizerType

type TokenizerType int

TokenizerType identifies the tokenization algorithm.

const (
	TokenizerBPE           TokenizerType = iota // GPT-2 style byte-level BPE
	TokenizerSentencePiece                      // SentencePiece with ▁ for spaces
)

type Vocabulary

type Vocabulary struct {
	Values  []string
	Reverse map[string]int32
	Merges  map[string]int

	BOS    int32
	EOS    []int32 // Multiple EOS tokens supported (e.g., Gemma has <eos> and <end_of_turn>)
	PAD    int32   // Padding token (often <|endoftext|> or <pad>)
	AddBOS bool
	AddEOS bool
	// contains filtered or unexported fields
}

Vocabulary holds the tokenizer vocabulary and merges.
