Documentation
Index
- type TokenizerNode
  - func NewTokenizerNode(vocab map[string]int32, unkTokenID int32) *TokenizerNode
- func (n *TokenizerNode) Attributes() map[string]any
- func (n *TokenizerNode) Backward(ctx context.Context, mode types.BackwardMode, outputGradient tensor.Tensor) ([]tensor.Tensor, error)
- func (n *TokenizerNode) Forward(ctx context.Context, inputs ...tensor.Tensor) (tensor.Tensor, error)
- func (n *TokenizerNode) OpType() string
- func (n *TokenizerNode) OutputShape() []int
Types
type TokenizerNode
type TokenizerNode struct {
// contains filtered or unexported fields
}
TokenizerNode converts a tensor of strings into a tensor of integer token IDs. NOTE: This implementation assumes a Node interface flexible enough to accept non-numeric tensor types (such as string tensors), rather than one restricted to numeric tensors.
func NewTokenizerNode
func NewTokenizerNode(vocab map[string]int32, unkTokenID int32) *TokenizerNode
NewTokenizerNode creates a new node for tokenization. vocab maps string tokens to their integer IDs, and unkTokenID is the ID used for tokens not found in vocab.
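The vocabulary lookup with an unknown-token fallback can be sketched in plain Go. This is a minimal illustration, not the package's code: `tokenIDs` is a hypothetical helper, and it assumes the tokens are already available as a `[]string` rather than wrapped in the package's tensor types.

```go
package main

import "fmt"

// tokenIDs is a hypothetical helper: it maps each token to its ID in vocab,
// falling back to unkTokenID for tokens missing from the vocabulary.
func tokenIDs(tokens []string, vocab map[string]int32, unkTokenID int32) []int32 {
	ids := make([]int32, len(tokens))
	for i, tok := range tokens {
		if id, ok := vocab[tok]; ok {
			ids[i] = id
		} else {
			ids[i] = unkTokenID
		}
	}
	return ids
}

func main() {
	vocab := map[string]int32{"hello": 1, "world": 2}
	fmt.Println(tokenIDs([]string{"hello", "there", "world"}, vocab, 0)) // prints [1 0 2]
}
```

The comma-ok form of the map lookup matters here: a plain `vocab[tok]` would silently return 0 for unknown tokens, which would be indistinguishable from a real token whose ID is 0.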
func (*TokenizerNode) Attributes
func (n *TokenizerNode) Attributes() map[string]any
Attributes returns no attributes for this node.
func (*TokenizerNode) Backward
func (n *TokenizerNode) Backward(ctx context.Context, mode types.BackwardMode, outputGradient tensor.Tensor) ([]tensor.Tensor, error)
Backward is not implemented for TokenizerNode because tokenization is not a differentiable operation.
func (*TokenizerNode) Forward
func (n *TokenizerNode) Forward(ctx context.Context, inputs ...tensor.Tensor) (tensor.Tensor, error)
Forward performs the tokenization. It expects a single input: a 1D TensorString. It outputs a 2D TensorNumeric[int32] with shape [1, sequence_length].
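The input/output contract of Forward (one 1D string input, one `[1, sequence_length]` int32 output) can be sketched without the package's tensor types. This sketch assumes each element of the 1D input is already a single token. The documentation does not say whether strings are split further. `int32Tensor` is a hypothetical stand-in for `TensorNumeric[int32]`, and `forward` is illustrative only.

```go
package main

import "fmt"

// int32Tensor is a hypothetical stand-in for TensorNumeric[int32]:
// a flat row-major data slice plus its shape.
type int32Tensor struct {
	Shape []int
	Data  []int32
}

// forward checks that exactly one input was given, maps each token to its ID
// (using unkTokenID for unknown tokens), and adds a leading dimension of 1.
func forward(vocab map[string]int32, unkTokenID int32, inputs ...[]string) (*int32Tensor, error) {
	if len(inputs) != 1 {
		return nil, fmt.Errorf("tokenizer: expected 1 input, got %d", len(inputs))
	}
	tokens := inputs[0]
	data := make([]int32, len(tokens))
	for i, tok := range tokens {
		id, ok := vocab[tok]
		if !ok {
			id = unkTokenID
		}
		data[i] = id
	}
	// Output shape is [1, sequence_length], as documented for Forward.
	return &int32Tensor{Shape: []int{1, len(tokens)}, Data: data}, nil
}

func main() {
	vocab := map[string]int32{"the": 4, "cat": 9}
	out, err := forward(vocab, 1, []string{"the", "cat", "sat"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.Shape, out.Data) // prints [1 3] [4 9 1]
}
```

Rejecting anything other than exactly one input up front mirrors the documented expectation of "a single input". This makes a miswired graph fail loudly instead of tokenizing only the first tensor.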
func (*TokenizerNode) OpType
func (n *TokenizerNode) OpType() string
OpType returns the type of the node.
func (*TokenizerNode) OutputShape
func (n *TokenizerNode) OutputShape() []int
OutputShape returns the shape of the output tensor. Because the sequence length is dynamic, that dimension is represented as -1.
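A declared shape that uses -1 for a dynamic dimension can be checked against a concrete runtime shape as sketched below. This assumes OutputShape reports `[1, -1]`, consistent with Forward's documented `[1, sequence_length]` output. `dynamicDim` and `shapeMatches` are hypothetical names and are not part of the package.

```go
package main

import "fmt"

// dynamicDim marks a dimension whose size is only known at run time.
const dynamicDim = -1

// shapeMatches reports whether a concrete shape is compatible with a declared
// shape in which dynamicDim stands for "any size".
func shapeMatches(declared, actual []int) bool {
	if len(declared) != len(actual) {
		return false
	}
	for i, d := range declared {
		if d != dynamicDim && d != actual[i] {
			return false
		}
	}
	return true
}

func main() {
	declared := []int{1, dynamicDim} // assumed OutputShape for a [1, sequence_length] output
	fmt.Println(shapeMatches(declared, []int{1, 7})) // true
	fmt.Println(shapeMatches(declared, []int{2, 7})) // false
}
```

Rank is compared first, so a 1D result such as `[7]` never matches a 2D declaration, even though `-1` would accept any length in that position.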