tokenizers

package
v0.4.0
Published: Mar 5, 2026 License: MIT Imports: 25 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func LoadLibrary

func LoadLibrary(path string) error

LoadLibrary is a no-op retained for backward compatibility; pure-tokenizers handles library loading automatically.

Types

type EncodeOption

type EncodeOption func(eo *encodeOpts)

func WithReturnAllAttributes

func WithReturnAllAttributes() EncodeOption

func WithReturnAttentionMask

func WithReturnAttentionMask() EncodeOption

func WithReturnOffsets

func WithReturnOffsets() EncodeOption

func WithReturnSpecialTokensMask

func WithReturnSpecialTokensMask() EncodeOption

func WithReturnTokens

func WithReturnTokens() EncodeOption

func WithReturnTypeIDs

func WithReturnTypeIDs() EncodeOption

type Encoding

type Encoding struct {
	IDs               []uint32
	TypeIDs           []uint32
	SpecialTokensMask []uint32
	AttentionMask     []uint32
	Tokens            []string
	Offsets           []Offset
}

Encoding represents the result of tokenizing text.

type Offset

type Offset [2]uint

Offset represents a character offset range [start, end].

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer wraps pure-tokenizers with a backward-compatible API.

func FromBytes

func FromBytes(data []byte, opts ...TokenizerOption) (*Tokenizer, error)

FromBytes creates a tokenizer from in-memory tokenizer configuration bytes.

func FromBytesWithTruncation

func FromBytesWithTruncation(data []byte, maxLen uint32, dir TruncationDirection) (*Tokenizer, error)

FromBytesWithTruncation creates a tokenizer with truncation settings applied.

func FromFile

func FromFile(path string) (*Tokenizer, error)

FromFile creates a tokenizer from a file path.

func (*Tokenizer) Close

func (t *Tokenizer) Close() error

Close closes the tokenizer and frees its resources.

func (*Tokenizer) Decode

func (t *Tokenizer) Decode(tokenIDs []uint32, skipSpecialTokens bool) (string, error)

Decode converts token IDs back to text.

func (*Tokenizer) Encode

func (t *Tokenizer) Encode(str string, addSpecialTokens bool) ([]uint32, []string, error)

Encode tokenizes text, returning the token IDs and token strings.

func (*Tokenizer) EncodeWithOptions

func (t *Tokenizer) EncodeWithOptions(str string, addSpecialTokens bool, opts ...EncodeOption) (Encoding, error)

EncodeWithOptions tokenizes text with full control over encoding options.

func (*Tokenizer) VocabSize

func (t *Tokenizer) VocabSize() (uint32, error)

VocabSize returns the vocabulary size.

type TokenizerOption

type TokenizerOption func(to *tokenizerOpts)

func WithEncodeSpecialTokens

func WithEncodeSpecialTokens() TokenizerOption

type TruncationDirection

type TruncationDirection int

const (
	TruncationDirectionLeft TruncationDirection = iota
	TruncationDirectionRight
)

TruncationDirection controls which side of the input is dropped when truncating.
