tokenizer

package

Version: v0.21.0 (latest) | Published: Mar 16, 2026 | License: MIT | Imports: 2 | Imported by: 0

Repository

github.com/indigo-net/Brf.it

Documentation ¶

Overview ¶

Package tokenizer provides token counting for brfit output.

Index ¶

type NoOpTokenizer
- func NewNoOpTokenizer() *NoOpTokenizer
- func (t *NoOpTokenizer) Count(_ []byte) (int, error)
- func (t *NoOpTokenizer) Name() string
type TiktokenTokenizer
- func NewTiktokenTokenizer() (*TiktokenTokenizer, error)
- func (t *TiktokenTokenizer) Count(text []byte) (int, error)
- func (t *TiktokenTokenizer) Name() string
type Tokenizer

Examples ¶

NewNoOpTokenizer

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type NoOpTokenizer ¶

type NoOpTokenizer struct{}

NoOpTokenizer is a Tokenizer whose Count always returns 0. It is used as the default when token counting is disabled or unavailable.

func NewNoOpTokenizer ¶

func NewNoOpTokenizer() *NoOpTokenizer

NewNoOpTokenizer creates a new NoOpTokenizer.

Example ¶

package main

import (
	"fmt"

	"github.com/indigo-net/Brf.it/pkg/tokenizer"
)

func main() {
	t := tokenizer.NewNoOpTokenizer()
	fmt.Println(t.Name())

	count, err := t.Count([]byte("hello world"))
	fmt.Println(count, err)
}

Output:
noop
0 <nil>

func (*NoOpTokenizer) Count ¶

func (t *NoOpTokenizer) Count(_ []byte) (int, error)

Count ignores its input and always returns 0 and a nil error.

func (*NoOpTokenizer) Name ¶

func (t *NoOpTokenizer) Name() string

Name returns "noop".

type TiktokenTokenizer ¶

type TiktokenTokenizer struct {
	// contains filtered or unexported fields
}

TiktokenTokenizer implements Tokenizer using tiktoken with cl100k_base encoding. This encoding is compatible with GPT-4 and GPT-3.5-turbo models.

func NewTiktokenTokenizer ¶

func NewTiktokenTokenizer() (*TiktokenTokenizer, error)

NewTiktokenTokenizer creates a new TiktokenTokenizer with cl100k_base encoding. It returns an error if the encoding fails to load.

func (*TiktokenTokenizer) Count ¶

func (t *TiktokenTokenizer) Count(text []byte) (int, error)

Count returns the number of tokens in the given text.

func (*TiktokenTokenizer) Name ¶

func (t *TiktokenTokenizer) Name() string

Name returns the tokenizer name, which includes the encoding (the Tokenizer interface comment cites "tiktoken-cl100k" as an example name).

type Tokenizer ¶

type Tokenizer interface {
	// Count returns the number of tokens in the given text.
	// Returns 0 and error if counting fails.
	Count(text []byte) (int, error)

	// Name returns the tokenizer name (e.g., "tiktoken-cl100k", "noop").
	Name() string
}

Tokenizer defines the interface for counting tokens in text.
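Any type with these two methods satisfies the interface. As a rough illustration, the sketch below defines a hypothetical wordTokenizer that counts whitespace-separated words. It is not part of this package, and the interface is redeclared locally only so the snippet runs standalone; describe is likewise an illustrative helper.

```go
package main

import (
	"bytes"
	"fmt"
)

// Tokenizer mirrors the documented interface; real code would import
// it from the tokenizer package instead of redeclaring it.
type Tokenizer interface {
	Count(text []byte) (int, error)
	Name() string
}

// wordTokenizer is a hypothetical implementation that treats each
// whitespace-separated word as one token.
type wordTokenizer struct{}

func (wordTokenizer) Count(text []byte) (int, error) {
	return len(bytes.Fields(text)), nil
}

func (wordTokenizer) Name() string { return "words" }

// Compile-time check that wordTokenizer satisfies Tokenizer.
var _ Tokenizer = wordTokenizer{}

// describe works with any Tokenizer and wraps a counting error with
// the tokenizer's name.
func describe(t Tokenizer, text []byte) (string, error) {
	n, err := t.Count(text)
	if err != nil {
		return "", fmt.Errorf("%s: %w", t.Name(), err)
	}
	return fmt.Sprintf("%s: %d tokens", t.Name(), n), nil
}

func main() {
	s, err := describe(wordTokenizer{}, []byte("package tokenizer provides token counting"))
	fmt.Println(s, err) // words: 5 tokens <nil>
}
```

Code written against the interface rather than a concrete type, as describe is here, can switch between a real tokenizer and a no-op one without changes.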
