token

package
v0.1.0-beta.5
Published: Apr 8, 2026 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Overview

Package token provides token counting for MCP tool call content.

The default implementation uses cl100k_base, the BPE vocabulary used by GPT-4 and related models. Claude 3+ uses a different, unpublished vocabulary, so cl100k_base counts are only an approximation for Claude models, typically within 10-15% for English and code content. This trade-off is intentional: the Counter interface allows swapping implementations without changing consumers, and cl100k_base is a meaningful improvement over the 4-bytes/token heuristic it replaces.

Users who need exact Claude token counts can enable gateway.tokenizer: api in their stack.yaml, which routes counting through Anthropic's count_tokens endpoint (Anthropic-specific, requires network access).

Constants

This section is empty.

Variables

This section is empty.

Functions

func CountJSON

func CountJSON(c Counter, v any) int

CountJSON estimates the token count for a value by marshaling it to JSON first. This is useful for counting tool call arguments (map[string]any).
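
The marshal-then-count idea can be sketched as follows. This is a minimal illustration, not the package's source: countJSON and byteCounter are hypothetical names, and returning 0 when marshaling fails is an assumption, since the documentation does not say how CountJSON handles that case.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Counter mirrors the interface documented in this package.
type Counter interface {
	Count(text string) int
}

// byteCounter is a hypothetical stand-in that reports one token per byte.
// It exists only to keep this sketch self-contained.
type byteCounter struct{}

func (byteCounter) Count(text string) int { return len(text) }

// countJSON serializes v to JSON and counts the resulting text.
// Assumption: a marshaling failure yields 0; the real behavior is not documented.
func countJSON(c Counter, v any) int {
	data, err := json.Marshal(v)
	if err != nil {
		return 0
	}
	return c.Count(string(data))
}

func main() {
	// Tool call arguments are typically a map[string]any.
	args := map[string]any{"path": "/tmp/report.txt", "limit": 10}
	fmt.Println(countJSON(byteCounter{}, args))
}
```

Because the count is taken over the serialized form, keys, quotes, and punctuation all contribute to the total, not just the values.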

Types

type APICounter

type APICounter struct {
	// contains filtered or unexported fields
}

APICounter counts tokens via Anthropic's count_tokens endpoint.

Note: CountJSON passes a JSON-marshaled string as the message content, so the count covers only the serialized JSON text. This is an approximation: the real context also includes tool definitions and system prompts that are not visible here. It is still significantly more accurate than the 4-bytes/token heuristic.

This counter is Anthropic-specific: it returns incorrect results when the gateway is routing to non-Anthropic models (Gemini, local endpoints, etc.). Use gateway.tokenizer: embedded for model-agnostic deployments.

func NewAPICounter

func NewAPICounter(apiKey string) (*APICounter, error)

NewAPICounter creates an APICounter using the given Anthropic API key. The HTTP client is initialized with a 5-second timeout. A TiktokenCounter is also initialized as a fallback for network or API errors.

func (*APICounter) Count

func (c *APICounter) Count(text string) int

Count returns the token count for the given text by calling the Anthropic count_tokens endpoint. On any error (network, non-200, parse), it falls back to the embedded TiktokenCounter and logs a warning.

type Counter

type Counter interface {
	// Count returns the estimated number of tokens in the given text.
	Count(text string) int
}

Counter estimates token counts for text content. Implementations vary in accuracy; the interface allows swapping one for another (for example, a heuristic counter for a tiktoken-based one) without changing consumers.
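
A short sketch of what swapping implementations looks like from the consumer's side. The two counters and the withinBudget function are hypothetical and deliberately crude; they are not the implementations shipped in this package.

```go
package main

import (
	"fmt"
	"strings"
)

// Counter mirrors the interface documented in this package.
type Counter interface {
	Count(text string) int
}

// wordCounter is a hypothetical implementation: one token per
// whitespace-separated word.
type wordCounter struct{}

func (wordCounter) Count(text string) int { return len(strings.Fields(text)) }

// byteCounter is a hypothetical implementation: one token per byte.
type byteCounter struct{}

func (byteCounter) Count(text string) int { return len(text) }

// withinBudget is a hypothetical consumer. It depends only on the
// interface, so any Counter can be passed in without changing it.
func withinBudget(c Counter, text string, budget int) bool {
	return c.Count(text) <= budget
}

func main() {
	text := "list files in the working directory"
	for _, c := range []Counter{wordCounter{}, byteCounter{}} {
		fmt.Printf("%T: %v\n", c, withinBudget(c, text, 10))
	}
}
```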

type HeuristicCounter

type HeuristicCounter struct {
	// contains filtered or unexported fields
}

HeuristicCounter estimates tokens using a simple bytes-per-token ratio. This is fast and zero-dependency but approximate. Suitable for visibility purposes where exact counts are not required.

func NewHeuristicCounter

func NewHeuristicCounter(bytesPerToken int) *HeuristicCounter

NewHeuristicCounter creates a counter that estimates tokens at the given bytes-per-token ratio. A ratio of 4 is a reasonable default for English text.

func (*HeuristicCounter) Count

func (c *HeuristicCounter) Count(text string) int

Count returns the estimated token count for the given text.
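
The arithmetic behind a bytes-per-token estimate can be sketched as below. ratioCounter is a hypothetical name, and two details are assumptions because the documentation does not specify them: the estimate rounds up, and a ratio below 1 is replaced with 4.

```go
package main

import "fmt"

// ratioCounter is a hypothetical stand-in for a bytes-per-token heuristic.
type ratioCounter struct {
	bytesPerToken int
}

// newRatioCounter guards against a zero or negative ratio.
// Assumption: falling back to 4 is this sketch's choice, not documented behavior.
func newRatioCounter(bytesPerToken int) *ratioCounter {
	if bytesPerToken < 1 {
		bytesPerToken = 4
	}
	return &ratioCounter{bytesPerToken: bytesPerToken}
}

// Count divides the byte length by the ratio, rounding up so that any
// non-empty text counts as at least one token. len(text) is a byte
// length, so multi-byte UTF-8 characters weigh more than ASCII ones.
func (c *ratioCounter) Count(text string) int {
	return (len(text) + c.bytesPerToken - 1) / c.bytesPerToken
}

func main() {
	c := newRatioCounter(4)
	fmt.Println(c.Count("hello world"))
}
```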

type TiktokenCounter

type TiktokenCounter struct {
	// contains filtered or unexported fields
}

TiktokenCounter counts tokens using the cl100k_base BPE encoding. This is the vocabulary used by GPT-4 and related models. For Claude models, whose vocabulary is unpublished, it is an approximation (typically within 10-15% for English and code content).

func NewTiktokenCounter

func NewTiktokenCounter() (*TiktokenCounter, error)

NewTiktokenCounter creates a counter using the cl100k_base encoding. The vocabulary is loaded eagerly at construction time to surface any initialization failure at startup rather than during request handling.

func (*TiktokenCounter) Count

func (c *TiktokenCounter) Count(text string) int

Count returns the token count for the given text using cl100k_base encoding.
