package ollamatokenizer

Version: v0.28.6 (latest) | Published: Jun 9, 2026 | License: Apache-2.0 | Imports: 9 | Imported by: 0

Repository

github.com/contenox/runtime

Documentation ¶

Overview ¶

Package ollamatokenizer provides Tokenizer implementations used by llmrepo to count and split tokens for a given model.

NewHTTPClient returns a client that talks to an Ollama-compatible tokenizer endpoint; EstimateTokenizer is a dependency-free heuristic fallback; MockTokenizer is intended for tests.

Index ¶

type ConfigHTTP
type EstimateTokenizer
- func NewEstimateTokenizer() *EstimateTokenizer
type HTTPClient
type MockTokenizer
type Tokenizer
- func NewHTTPClient(ctx context.Context, cfg ConfigHTTP) (Tokenizer, func() error, error)
- func WithActivityTracker(client Tokenizer, tracker libtracker.ActivityTracker) Tokenizer

Types ¶

type ConfigHTTP ¶

type ConfigHTTP struct {
	BaseURL string
}

ConfigHTTP contains configuration for the HTTP client.

type EstimateTokenizer ¶

type EstimateTokenizer struct{}

EstimateTokenizer implements Tokenizer using a simple character-based estimate. It is intended for local single-process mode, where no tokenizer service is available. CountTokens assumes roughly 4 runes per token, a rough heuristic for typical LLM tokenizers.

func NewEstimateTokenizer ¶

func NewEstimateTokenizer() *EstimateTokenizer

NewEstimateTokenizer returns a tokenizer that estimates token counts without a remote service.

func (*EstimateTokenizer) CountTokens ¶

func (e *EstimateTokenizer) CountTokens(ctx context.Context, modelName string, prompt string) (int, error)

CountTokens returns an estimated token count (runes / 4, min 1).
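The heuristic described above can be sketched in a few lines. This is a minimal illustration, not the package's source: estimateTokens is a hypothetical helper, and the use of integer division and the floor of 1 for empty input are assumptions read from the "runes / 4, min 1" description.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens approximates a token count as runes/4 with a floor of 1.
// Hypothetical helper: integer division and the floor for empty input are
// assumptions, not confirmed behavior of EstimateTokenizer.
func estimateTokens(prompt string) int {
	n := utf8.RuneCountInString(prompt) / 4
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	fmt.Println(estimateTokens("hello world!"))     // 12 runes
	fmt.Println(estimateTokens("hi"))               // shorter than 4 runes, floor applies
	fmt.Println(estimateTokens("日本語のテキスト")) // 8 runes, 24 bytes
}
```

Counting runes rather than bytes keeps the estimate from inflating on multi-byte text: the last call above sees 8 runes, not 24 bytes.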

func (*EstimateTokenizer) OptimalModel ¶

func (e *EstimateTokenizer) OptimalModel(ctx context.Context, baseModel string) (string, error)

OptimalModel returns the base model unchanged (no proxy tokenizer model).

func (*EstimateTokenizer) Tokenize ¶

func (e *EstimateTokenizer) Tokenize(ctx context.Context, modelName string, prompt string) ([]int, error)

Tokenize returns a dummy slice of length equal to the estimated token count. No caller in the task engine uses the actual token IDs; the length is sufficient.
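A placeholder slice of this kind might look like the following sketch. dummyTokens is a hypothetical name, and filling the slice with zeros is an assumption; the description above only says that the length matches the estimated count.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// dummyTokens returns a slice whose length equals the estimated token
// count (runes/4, minimum 1). The zero-valued IDs carry no meaning;
// that choice of filler is an assumption made for this sketch.
func dummyTokens(prompt string) []int {
	n := utf8.RuneCountInString(prompt) / 4
	if n < 1 {
		n = 1
	}
	return make([]int, n)
}

func main() {
	ids := dummyTokens("The quick brown fox jumps over the lazy dog")
	// Callers that only need a count can rely on len(ids).
	fmt.Println(len(ids))
}
```

Code that inspects the IDs themselves (for example, to decode them back to text) would not work with such a slice, which is why the estimate is only suitable where the length alone matters.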

type HTTPClient ¶

type HTTPClient struct {
	// contains filtered or unexported fields
}

HTTPClient implements the Tokenizer interface using HTTP calls to the tokenizer service.

func (*HTTPClient) CountTokens ¶

func (c *HTTPClient) CountTokens(ctx context.Context, modelName string, prompt string) (int, error)

CountTokens uses the dedicated /count endpoint, with a backward-compatible fallback to /tokenize.

func (*HTTPClient) OptimalModel ¶

func (c *HTTPClient) OptimalModel(ctx context.Context, baseModel string) (string, error)

OptimalModel returns the model best suited to tokenizing prompts for the given base model. The selection runs on the client and mirrors the server's logic.

func (*HTTPClient) Tokenize ¶

func (c *HTTPClient) Tokenize(ctx context.Context, modelName string, prompt string) ([]int, error)

Tokenize sends a tokenization request to the HTTP service.

type MockTokenizer ¶

type MockTokenizer struct {
	FixedTokenCount int
	FixedModel      string
	CustomTokens    map[string][]int
}

MockTokenizer is a mock implementation of the Tokenizer interface.

func (MockTokenizer) CountTokens ¶

func (m MockTokenizer) CountTokens(ctx context.Context, modelName string, prompt string) (int, error)

func (MockTokenizer) OptimalModel ¶

func (m MockTokenizer) OptimalModel(ctx context.Context, baseModel string) (string, error)

func (MockTokenizer) Tokenize ¶

func (m MockTokenizer) Tokenize(ctx context.Context, modelName string, prompt string) ([]int, error)

type Tokenizer ¶

type Tokenizer interface {
	Tokenize(ctx context.Context, modelName string, prompt string) ([]int, error)
	CountTokens(ctx context.Context, modelName string, prompt string) (int, error)
	OptimalModel(ctx context.Context, baseModel string) (string, error)
}

func NewHTTPClient ¶

func NewHTTPClient(ctx context.Context, cfg ConfigHTTP) (Tokenizer, func() error, error)

NewHTTPClient creates a new HTTP-based tokenizer client.

func WithActivityTracker ¶

func WithActivityTracker(client Tokenizer, tracker libtracker.ActivityTracker) Tokenizer

WithActivityTracker decorates the given Tokenizer with activity tracking.
