tokenizer

package
v0.21.0 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 16, 2026 License: MIT Imports: 2 Imported by: 0

Documentation

Overview

Package tokenizer provides token counting for brfit output.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type NoOpTokenizer

type NoOpTokenizer struct{}

NoOpTokenizer is a tokenizer whose Count always returns 0. It is used as the default when token counting is disabled or unavailable.

func NewNoOpTokenizer

func NewNoOpTokenizer() *NoOpTokenizer

NewNoOpTokenizer creates a new NoOpTokenizer.

Example
package main

import (
	"fmt"

	"github.com/indigo-net/Brf.it/pkg/tokenizer"
)

func main() {
	t := tokenizer.NewNoOpTokenizer()
	fmt.Println(t.Name())

	count, err := t.Count([]byte("hello world"))
	fmt.Println(count, err)
}
Output:
noop
0 <nil>

func (*NoOpTokenizer) Count

func (t *NoOpTokenizer) Count(_ []byte) (int, error)

Count ignores its input and always returns 0 and a nil error.

func (*NoOpTokenizer) Name

func (t *NoOpTokenizer) Name() string

Name returns "noop".
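Because NoOpTokenizer reports 0 for every input, a caller that prints counts may want to distinguish "counting disabled" from "zero tokens". The sketch below does this by checking Name() against the documented "noop" value. It assumes a local copy of the documented Tokenizer interface and a local stand-in for NoOpTokenizer so it compiles without the module; describeCount is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Tokenizer mirrors the documented tokenizer.Tokenizer interface so this
// sketch compiles on its own; real code would import the package instead.
type Tokenizer interface {
	Count(text []byte) (int, error)
	Name() string
}

// noop is a local stand-in for tokenizer.NoOpTokenizer.
type noop struct{}

func (noop) Count(_ []byte) (int, error) { return 0, nil }
func (noop) Name() string                { return "noop" }

// describeCount is a hypothetical helper: for the no-op tokenizer it reports
// that counting is disabled instead of printing a misleading count of 0.
func describeCount(t Tokenizer, text []byte) (string, error) {
	if t.Name() == "noop" {
		return "token count: disabled", nil
	}
	n, err := t.Count(text)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("token count (%s): %d", t.Name(), n), nil
}

func main() {
	s, err := describeCount(noop{}, []byte("hello world"))
	fmt.Println(s, err)
}
```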

type TiktokenTokenizer

type TiktokenTokenizer struct {
	// contains filtered or unexported fields
}

TiktokenTokenizer implements Tokenizer using tiktoken with cl100k_base encoding. This encoding is compatible with GPT-4 and GPT-3.5-turbo models.

func NewTiktokenTokenizer

func NewTiktokenTokenizer() (*TiktokenTokenizer, error)

NewTiktokenTokenizer creates a new TiktokenTokenizer with the cl100k_base encoding. It returns an error if the encoding fails to load.

func (*TiktokenTokenizer) Count

func (t *TiktokenTokenizer) Count(text []byte) (int, error)

Count returns the number of tokens in the given text.
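Count operates on a single byte slice, so totaling several inputs (for example, several files) means summing per-input counts and deciding what to do on error. The sketch below stops at the first error and reports which input failed. It assumes a local mirror of the Tokenizer interface and a toy wordCounter that counts whitespace-separated words; wordCounter and countAll are hypothetical and are not tiktoken. With a BPE encoding such as cl100k_base, the sum of per-input counts may differ slightly from the count of the concatenated text, because tokens can merge across the boundary.

```go
package main

import (
	"bytes"
	"fmt"
)

// Tokenizer mirrors the documented tokenizer.Tokenizer interface.
type Tokenizer interface {
	Count(text []byte) (int, error)
	Name() string
}

// wordCounter is a toy Tokenizer that counts whitespace-separated words.
// It is NOT tiktoken; it only exists so the sketch runs without the module.
type wordCounter struct{}

func (wordCounter) Count(text []byte) (int, error) { return len(bytes.Fields(text)), nil }
func (wordCounter) Name() string                   { return "words" }

// countAll sums per-document token counts and stops at the first error,
// wrapping it with the index of the document that failed.
func countAll(t Tokenizer, docs [][]byte) (int, error) {
	total := 0
	for i, d := range docs {
		n, err := t.Count(d)
		if err != nil {
			return 0, fmt.Errorf("document %d: %w", i, err)
		}
		total += n
	}
	return total, nil
}

func main() {
	docs := [][]byte{[]byte("func main() {}"), []byte("package tokenizer")}
	total, err := countAll(wordCounter{}, docs)
	fmt.Println(total, err) // prints "5 <nil>"
}
```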

func (*TiktokenTokenizer) Name

func (t *TiktokenTokenizer) Name() string

Name returns the tokenizer name, including the encoding (for example, "tiktoken-cl100k").

type Tokenizer

type Tokenizer interface {
	// Count returns the number of tokens in the given text.
	// Returns 0 and a non-nil error if counting fails.
	Count(text []byte) (int, error)

	// Name returns the tokenizer name (e.g., "tiktoken-cl100k", "noop").
	Name() string
}

Tokenizer defines the interface for counting tokens in text.
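Any type with these two methods satisfies Tokenizer. As an illustration, the sketch below defines a hypothetical ApproxTokenizer that estimates one token per four bytes, rounded up, and uses a compile-time assertion to confirm it implements the interface. The four-bytes-per-token ratio is only a rough rule of thumb, not a real encoding, and the interface is mirrored locally so the sketch compiles without the module.

```go
package main

import "fmt"

// Tokenizer mirrors the documented tokenizer.Tokenizer interface.
type Tokenizer interface {
	Count(text []byte) (int, error)
	Name() string
}

// ApproxTokenizer is a hypothetical Tokenizer that estimates the token count
// as one token per four bytes, rounded up. It is a heuristic, not tiktoken.
type ApproxTokenizer struct{}

// Compile-time check that *ApproxTokenizer satisfies Tokenizer.
var _ Tokenizer = (*ApproxTokenizer)(nil)

// Count returns ceil(len(text) / 4) and never fails.
func (t *ApproxTokenizer) Count(text []byte) (int, error) {
	return (len(text) + 3) / 4, nil
}

// Name identifies the heuristic, following the "name-encoding" style of the
// documented examples.
func (t *ApproxTokenizer) Name() string { return "approx-4bytes" }

func main() {
	var t Tokenizer = &ApproxTokenizer{}
	n, err := t.Count([]byte("hello world"))
	fmt.Println(t.Name(), n, err) // prints "approx-4bytes 3 <nil>"
}
```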
