Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
func NewApproximationTokenizer
NewApproximationTokenizer creates a new ApproximationTokenizer with default settings, using a ratio of 4.0 characters per token.
func NewApproximationTokenizerWithRatio
NewApproximationTokenizerWithRatio creates a new ApproximationTokenizer with a custom characters-per-token ratio. This is useful for tuning estimation accuracy for specific languages or domains.
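The two constructors differ only in the ratio they store. The sketch below models that with a hypothetical `approxTokenizer` type, because the real struct's fields are unexported. Truncating the division toward zero and falling back to the default for a non-positive ratio are assumptions made for illustration, not documented behavior.

```go
package main

import "fmt"

// approxTokenizer is a hypothetical stand-in for ApproximationTokenizer;
// the real type's fields are unexported and not documented.
type approxTokenizer struct {
	charsPerToken float64
}

// defaultCharsPerToken mirrors the documented default ratio of 4.0.
const defaultCharsPerToken = 4.0

// newApprox mimics NewApproximationTokenizer: it uses the default ratio.
func newApprox() *approxTokenizer {
	return &approxTokenizer{charsPerToken: defaultCharsPerToken}
}

// newApproxWithRatio mimics NewApproximationTokenizerWithRatio.
// Assumption: a non-positive ratio falls back to the default so the
// estimate never divides by zero.
func newApproxWithRatio(ratio float64) *approxTokenizer {
	if ratio <= 0 {
		ratio = defaultCharsPerToken
	}
	return &approxTokenizer{charsPerToken: ratio}
}

// estimate divides the byte length by the ratio and truncates toward zero
// (assumed rounding rule).
func (a *approxTokenizer) estimate(text string) int {
	return int(float64(len(text)) / a.charsPerToken)
}

func main() {
	text := "The quick brown fox jumps over the lazy dog." // 44 bytes
	fmt.Println(newApprox().estimate(text))             // 44 / 4.0 = 11
	fmt.Println(newApproxWithRatio(3.0).estimate(text)) // 44 / 3.0 ≈ 14.67, truncated to 14
}
```

Lowering the ratio raises the estimate for the same text, which suits languages or domains where tokens tend to be shorter.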
Types
type ApproximationTokenizer
type ApproximationTokenizer struct {
// contains filtered or unexported fields
}
ApproximationTokenizer implements ports.Tokenizer using character-based estimation. It provides fast, approximate token counting as a fallback when exact tokenization is unavailable or unnecessary. The heuristic is simple: roughly 4 characters per token by default.
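The definition of ports.Tokenizer is not shown here, but both documented types expose the same four methods. The sketch below is a hypothetical reconstruction of that interface from those method sets, together with a minimal character-based type and a compile-time assertion that it satisfies the interface. The type `charEstimator` and the model name `"char-approx"` are invented for illustration; the real interface may have more or different methods.

```go
package main

import "fmt"

// Tokenizer is a hypothetical reconstruction of ports.Tokenizer, inferred
// from the methods both documented types expose. The real interface may differ.
type Tokenizer interface {
	CountTokens(text string) (int, error)
	CountTurnsTokens(turns []string) (int, error)
	IsEstimate() bool
	ModelName() string
}

// charEstimator is a hypothetical minimal implementation used only to show
// that a character-based type can satisfy the interface.
type charEstimator struct{ charsPerToken float64 }

func (c *charEstimator) CountTokens(text string) (int, error) {
	return int(float64(len(text)) / c.charsPerToken), nil
}

func (c *charEstimator) CountTurnsTokens(turns []string) (int, error) {
	total := 0
	for _, t := range turns {
		n, err := c.CountTokens(t)
		if err != nil {
			return 0, err
		}
		total += n
	}
	return total, nil
}

func (c *charEstimator) IsEstimate() bool  { return true }
func (c *charEstimator) ModelName() string { return "char-approx" } // hypothetical name

// Compile-time check that *charEstimator implements Tokenizer.
var _ Tokenizer = (*charEstimator)(nil)

func main() {
	var tok Tokenizer = &charEstimator{charsPerToken: 4.0}
	n, _ := tok.CountTokens("hello, world") // 12 bytes / 4.0 = 3
	fmt.Println(n, tok.IsEstimate())        // 3 true
}
```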
func (*ApproximationTokenizer) CountTokens
func (a *ApproximationTokenizer) CountTokens(text string) (int, error)
CountTokens returns an approximate token count based on the length of text, computed as token_count ≈ len(text) / charsPerToken.
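The formula as written uses len(text), which in Go counts bytes, not characters. If the implementation follows it literally, non-ASCII text yields a larger estimate than a Unicode code-point count would. The sketch below contrasts the two readings; both functions are hypothetical, and truncation toward zero is an assumed rounding rule.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateBytes applies the documented formula literally: len(text) is a
// byte count in Go. Truncating toward zero is an assumption.
func estimateBytes(text string, charsPerToken float64) int {
	return int(float64(len(text)) / charsPerToken)
}

// estimateRunes is a hypothetical variant that counts Unicode code points.
func estimateRunes(text string, charsPerToken float64) int {
	return int(float64(utf8.RuneCountInString(text)) / charsPerToken)
}

func main() {
	ascii := "tokenizer!"   // 10 bytes, 10 runes
	japanese := "こんにちは世界" // 7 runes, 21 bytes in UTF-8
	fmt.Println(estimateBytes(ascii, 4.0), estimateRunes(ascii, 4.0))       // 2 2
	fmt.Println(estimateBytes(japanese, 4.0), estimateRunes(japanese, 4.0)) // 5 1
}
```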
func (*ApproximationTokenizer) CountTurnsTokens
func (a *ApproximationTokenizer) CountTurnsTokens(turns []string) (int, error)
CountTurnsTokens returns the total approximate token count across multiple conversation turns. It sums the estimated token count of each turn.
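Because each turn is estimated separately, any rounding happens once per turn. Under the assumption that each per-turn estimate is truncated, the sum can fall below the estimate for the concatenated text. The sketch below shows that effect with hypothetical helpers `estimate` and `countTurns`.

```go
package main

import (
	"fmt"
	"strings"
)

// estimate is a hypothetical per-text estimate: byte length / ratio,
// truncated toward zero (assumed rounding rule).
func estimate(text string, charsPerToken float64) int {
	return int(float64(len(text)) / charsPerToken)
}

// countTurns sums the per-turn estimates, as the documentation describes.
func countTurns(turns []string, charsPerToken float64) int {
	total := 0
	for _, turn := range turns {
		total += estimate(turn, charsPerToken)
	}
	return total
}

func main() {
	turns := []string{"Hi!", "Hello.", "How are you?"}   // 3, 6 and 12 bytes
	fmt.Println(countTurns(turns, 4.0))                 // 0 + 1 + 3 = 4
	fmt.Println(estimate(strings.Join(turns, ""), 4.0)) // 21 / 4.0, truncated to 5
}
```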
func (*ApproximationTokenizer) IsEstimate
func (a *ApproximationTokenizer) IsEstimate() bool
IsEstimate returns true because this tokenizer produces approximate counts.
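A caller can use IsEstimate to decide how much to trust a count. The sketch below is a hypothetical budget check, not part of the documented API: when the count is only an estimate, it pads it by a percentage before comparing it to a limit. The `counter` interface, the `fixedCounter` stub, and the 10% margin are all illustrative assumptions.

```go
package main

import "fmt"

// counter is a hypothetical subset of the tokenizer method set.
type counter interface {
	CountTokens(text string) (int, error)
	IsEstimate() bool
}

// fixedCounter is a stub that returns a fixed count, used only to drive the example.
type fixedCounter struct {
	n        int
	estimate bool
}

func (f fixedCounter) CountTokens(string) (int, error) { return f.n, nil }
func (f fixedCounter) IsEstimate() bool               { return f.estimate }

// fitsBudget is a hypothetical caller-side check. When the count is only an
// estimate, it inflates it by marginPct percent (integer arithmetic) before
// comparing it to the limit.
func fitsBudget(c counter, text string, limit, marginPct int) (bool, error) {
	n, err := c.CountTokens(text)
	if err != nil {
		return false, err
	}
	if c.IsEstimate() {
		n += n * marginPct / 100
	}
	return n <= limit, nil
}

func main() {
	exact := fixedCounter{n: 95, estimate: false}
	approx := fixedCounter{n: 95, estimate: true}
	ok1, _ := fitsBudget(exact, "example text", 100, 10)  // 95 <= 100
	ok2, _ := fitsBudget(approx, "example text", 100, 10) // 95 + 9 = 104 > 100
	fmt.Println(ok1, ok2)                                 // true false
}
```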
func (*ApproximationTokenizer) ModelName
func (a *ApproximationTokenizer) ModelName() string
ModelName returns the identifier for this approximation method.
type TiktokenTokenizer
type TiktokenTokenizer struct {
// contains filtered or unexported fields
}
TiktokenTokenizer implements ports.Tokenizer using the pkoukk/tiktoken-go library. It provides accurate token counting for OpenAI-compatible models.
func (*TiktokenTokenizer) CountTokens
func (t *TiktokenTokenizer) CountTokens(text string) (int, error)
CountTokens returns the exact number of tokens in the given text.
func (*TiktokenTokenizer) CountTurnsTokens
func (t *TiktokenTokenizer) CountTurnsTokens(turns []string) (int, error)
CountTurnsTokens returns the total token count across multiple conversation turns.
func (*TiktokenTokenizer) IsEstimate
func (t *TiktokenTokenizer) IsEstimate() bool
IsEstimate returns false because tiktoken provides exact counts.
func (*TiktokenTokenizer) ModelName
func (t *TiktokenTokenizer) ModelName() string
ModelName returns the tiktoken model identifier.