tokenizers

package
v0.3.0
Warning

This package is not in the latest version of its module.

Published: Aug 25, 2025 License: Apache-2.0 Imports: 5 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type TokenizerNode

type TokenizerNode struct {
	// contains filtered or unexported fields
}

TokenizerNode converts a tensor of strings into a tensor of integer token IDs. NOTE: This implementation assumes a flexible Node interface that can handle different tensor types, not one strictly tied to numerics.

func NewTokenizerNode

func NewTokenizerNode(vocab map[string]int32, unkTokenID int32) *TokenizerNode

NewTokenizerNode creates a new node for tokenization. The vocabulary maps string tokens to their integer IDs. unkTokenID is the ID to use for tokens not found in the vocabulary.
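The lookup rule described above (a known token maps to its vocabulary ID, anything else maps to unkTokenID) can be sketched with a plain Go map. This is only an illustration: lookupID is a hypothetical helper, not part of this package, and the sample vocabulary is made up.

```go
package main

import "fmt"

// lookupID is a hypothetical helper (not part of the tokenizers package).
// It returns the vocabulary ID for token, or unkTokenID when the token
// is not present in the vocabulary.
func lookupID(vocab map[string]int32, unkTokenID int32, token string) int32 {
	if id, ok := vocab[token]; ok {
		return id
	}
	return unkTokenID
}

func main() {
	// Illustrative vocabulary; the IDs are arbitrary.
	vocab := map[string]int32{"<unk>": 0, "hello": 1, "world": 2}
	const unkTokenID int32 = 0

	fmt.Println(lookupID(vocab, unkTokenID, "hello"))  // prints 1
	fmt.Println(lookupID(vocab, unkTokenID, "gopher")) // prints 0 (unknown token)
}
```

A map of the same shape, together with the chosen unknown-token ID, is what NewTokenizerNode takes as its two arguments.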

func (*TokenizerNode) Attributes

func (n *TokenizerNode) Attributes() map[string]any

Attributes returns no attributes for this node.

func (*TokenizerNode) Backward

func (n *TokenizerNode) Backward(ctx context.Context, mode types.BackwardMode, outputGradient tensor.Tensor) ([]tensor.Tensor, error)

Backward is not implemented for TokenizerNode as it is not a differentiable operation.

func (*TokenizerNode) Forward

func (n *TokenizerNode) Forward(ctx context.Context, inputs ...tensor.Tensor) (tensor.Tensor, error)

Forward performs the tokenization. It expects a single input: a 1D TensorString. It outputs a 2D TensorNumeric[int32] with shape [1, sequence_length].

func (*TokenizerNode) OpType

func (n *TokenizerNode) OpType() string

OpType returns the type of the node.

func (*TokenizerNode) OutputShape

func (n *TokenizerNode) OutputShape() []int

OutputShape returns the shape of the output tensor. Because the sequence length depends on the input, that dimension is represented by -1.
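One way a caller might work with a -1 dimension is to treat it as a wildcard when comparing a declared shape against a concrete one. The sketch below assumes the declared shape is []int{1, -1}, which matches the [1, sequence_length] output described for Forward but is not confirmed by this page. shapeMatches is a hypothetical helper, not part of this package.

```go
package main

import "fmt"

// shapeMatches is a hypothetical helper. It reports whether a concrete
// shape is compatible with a declared shape in which -1 marks a dynamic
// dimension that accepts any size.
func shapeMatches(declared, actual []int) bool {
	if len(declared) != len(actual) {
		return false
	}
	for i, d := range declared {
		if d != -1 && d != actual[i] {
			return false
		}
	}
	return true
}

func main() {
	// Assumed declared shape: batch of 1, dynamic sequence length.
	declared := []int{1, -1}

	fmt.Println(shapeMatches(declared, []int{1, 7})) // prints true
	fmt.Println(shapeMatches(declared, []int{2, 7})) // prints false
	fmt.Println(shapeMatches(declared, []int{7}))    // prints false (rank differs)
}
```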
