Documentation ¶
Overview ¶
Package tokenizer provides token counting for brfit output.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type NoOpTokenizer ¶
type NoOpTokenizer struct{}
NoOpTokenizer is a Tokenizer whose Count always returns 0. It is used as the default when token counting is disabled or unavailable.
func NewNoOpTokenizer ¶
func NewNoOpTokenizer() *NoOpTokenizer
NewNoOpTokenizer creates a new NoOpTokenizer.
Example ¶
package main

import (
	"fmt"

	"github.com/indigo-net/Brf.it/pkg/tokenizer"
)

func main() {
	// The no-op tokenizer reports its name and always counts zero tokens.
	t := tokenizer.NewNoOpTokenizer()
	fmt.Println(t.Name())

	count, err := t.Count([]byte("hello world"))
	fmt.Println(count, err)
}

Output:
noop
0 <nil>
type TiktokenTokenizer ¶
type TiktokenTokenizer struct {
	// contains filtered or unexported fields
}
TiktokenTokenizer implements Tokenizer using tiktoken with cl100k_base encoding. This encoding is compatible with GPT-4 and GPT-3.5-turbo models.
func NewTiktokenTokenizer ¶
func NewTiktokenTokenizer() (*TiktokenTokenizer, error)
NewTiktokenTokenizer creates a new TiktokenTokenizer with the cl100k_base encoding. It returns an error if the encoding fails to load.
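Because the constructor can fail, callers may want to fall back to a no-op tokenizer instead of aborting. The following is a minimal sketch of that pattern, not the package's own code: the local Tokenizer interface mirrors the one documented on this page, while noop, errLoad, and orNoOp are hypothetical stand-ins introduced only so the example compiles without the real package.

```go
package main

import (
	"errors"
	"fmt"
)

// Tokenizer mirrors the interface documented on this page.
type Tokenizer interface {
	Count(text []byte) (int, error)
	Name() string
}

// noop is a local stand-in for NoOpTokenizer (assumption: same behavior).
type noop struct{}

func (noop) Count([]byte) (int, error) { return 0, nil }
func (noop) Name() string              { return "noop" }

// errLoad simulates a constructor failure; the real error value is unknown.
var errLoad = errors.New("encoding failed to load")

// orNoOp is a hypothetical helper: it returns tok when construction
// succeeded and a no-op tokenizer otherwise.
func orNoOp(tok Tokenizer, err error) Tokenizer {
	if err != nil || tok == nil {
		return noop{}
	}
	return tok
}

func main() {
	// Simulate NewTiktokenTokenizer returning an error.
	tok := orNoOp(nil, errLoad)

	n, err := tok.Count([]byte("hello world"))
	fmt.Println(tok.Name(), n, err)
}
```

With the fallback in place, downstream code can call Count unconditionally and treat a count of 0 from the "noop" tokenizer as "counting unavailable".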
func (*TiktokenTokenizer) Count ¶
func (t *TiktokenTokenizer) Count(text []byte) (int, error)
Count returns the number of tokens in the given text.
func (*TiktokenTokenizer) Name ¶
func (t *TiktokenTokenizer) Name() string
Name returns the tokenizer name, including the encoding it uses.
type Tokenizer ¶
type Tokenizer interface {
	// Count returns the number of tokens in the given text.
	// It returns 0 and a non-nil error if counting fails.
	Count(text []byte) (int, error)

	// Name returns the tokenizer name (e.g., "tiktoken-cl100k", "noop").
	Name() string
}
Tokenizer defines the interface for counting tokens in text.
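Since Tokenizer is a two-method interface, any type with matching Count and Name methods satisfies it. The sketch below illustrates this with a hypothetical wordTokenizer that counts whitespace-separated fields; it is not part of the package and is only a rough stand-in for real token counting. The interface is redeclared locally so the example is self-contained, and report is likewise an illustrative helper.

```go
package main

import (
	"bytes"
	"fmt"
)

// Tokenizer mirrors the interface documented on this page.
type Tokenizer interface {
	Count(text []byte) (int, error)
	Name() string
}

// wordTokenizer is a hypothetical implementation that treats each
// whitespace-separated field as one token.
type wordTokenizer struct{}

// Count returns the number of whitespace-separated fields in text.
func (wordTokenizer) Count(text []byte) (int, error) {
	return len(bytes.Fields(text)), nil
}

// Name identifies this implementation.
func (wordTokenizer) Name() string { return "words" }

// report formats a count using any Tokenizer implementation.
func report(tok Tokenizer, text []byte) string {
	n, err := tok.Count(text)
	if err != nil {
		return fmt.Sprintf("%s: error: %v", tok.Name(), err)
	}
	return fmt.Sprintf("%s: %d", tok.Name(), n)
}

func main() {
	fmt.Println(report(wordTokenizer{}, []byte("hello brave new world")))
	// prints "words: 4"
}
```

Code written against the interface, like report here, does not need to know whether it received a tiktoken-backed tokenizer, a no-op, or a custom implementation.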