Documentation ¶
Overview ¶
Package tok provides high-performance text compression for LLM context windows.
Usage:
	compressed, stats := tok.Compress(text, tok.Aggressive)
	compressed, stats := tok.Compress(text, tok.WithBudget(4000))

	c := tok.NewCompressor(tok.Adaptive)
	compressed, stats := c.Compress(text)
Index ¶
- Constants
- Variables
- func BuildCompactionPrompt(context string, maxChars int) string
- func ClassifyContent(text string) string
- func CompressJSON(text string, maxItems int) string
- func CompressLog(text string) string
- func DetectLanguageByExtension(path string) string
- func EstimateCompression(content string, ratio float64) string
- func EstimateTokens(text string) int
- func EstimateTokensPrecise(text string) int
- func FormatResult(result *OptimizationResult) string
- func FormatUsageBar(pct float64, width int) string
- func GetLanguagePatterns(lang string) []string
- func IsHighEntropy(s string, threshold float64) bool
- func OptimalBudget(contentType string, importance float64) int
- func RecommendMode(contentType string, budget int) string
- func RegisterChunker(ext string, fn ChunkerFunc)
- func RegisterLanguagePatterns(lang string, patterns []string)
- func ShannonEntropy(s string) float64
- func SuggestBudget(blocks []ContentBlock) int
- func WarmupTokenizer()
- func WastedTokens(events []CompressionEvent) int
- type Alert
- type ChunkOptions
- type ChunkerFunc
- type CodeChunk
- type CompactionSchema
- type CompressionAdvisor
- type CompressionEvent
- type Compressor
- type ContentBlock
- type ContextOptimizer
- type LayerStat
- type Mode
- type OptimizationResult
- type Option
- type Recommendation
- type SecretDetector
- type SecretMatch
- type SeparatorKeep
- type Stats
- type StrategyStats
- type StreamCompressor
- type Tier
- type UsageEntry
- type UsageSummary
- type UsageTracker
- func (u *UsageTracker) CanProceed() (bool, string)
- func (u *UsageTracker) CheckThresholds()
- func (u *UsageTracker) EstimateRemaining(tokensPerRequest int) int
- func (u *UsageTracker) FormatSummary() string
- func (u *UsageTracker) GetUsage() UsageSummary
- func (u *UsageTracker) PruneOld()
- func (u *UsageTracker) Record(tokens int, costUSD float64, provider, model string)
- func (u *UsageTracker) Reset()
Constants ¶
const CompactionSystemPrompt = `` /* 662-byte string literal not displayed */
CompactionSystemPrompt is the system prompt for LLM-based structured compaction.
const DefaultMinChunkSize = 250
DefaultMinChunkSize is the default minimum chunk size in tokens.
Variables ¶
var Version = strings.TrimSpace(versionFile)
Version of the tok library. Sourced from the VERSION file at the repo root. Release builds may override it via ldflags:
-X github.com/GrayCodeAI/tok.Version={{.Version}}
Do not edit this variable directly — bump the VERSION file instead, or let release-please/goreleaser do it.
Functions ¶
func BuildCompactionPrompt ¶ added in v0.2.0
BuildCompactionPrompt builds a prompt for LLM-based structured compaction.
func ClassifyContent ¶ added in v0.2.0
ClassifyContent determines the content type of a text string.
func CompressJSON ¶
CompressJSON samples large JSON arrays, keeping error/failure items, first 2, last 2, and a random sample of the middle. If maxItems <= 0, defaults to 20. Non-array input is returned unchanged.
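The sampling strategy described above can be sketched in plain Go. This is an illustrative re-implementation, not the tok source: `sampleArray` and `containsAny` are hypothetical names, and the real CompressJSON additionally takes a random middle sample and honors a maxItems budget.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sampleArray keeps the first 2 and last 2 items of a JSON array plus any
// middle item whose text mentions an error or failure. Non-array input is
// returned unchanged, matching the documented behavior.
func sampleArray(data string) (string, error) {
	var items []json.RawMessage
	if err := json.Unmarshal([]byte(data), &items); err != nil {
		return data, nil // not a JSON array: pass through unchanged
	}
	if len(items) <= 4 {
		return data, nil // nothing to sample
	}
	// Copy the first two items into a fresh slice so later appends
	// cannot clobber the shared backing array.
	keep := append([]json.RawMessage{}, items[:2]...)
	for _, it := range items[2 : len(items)-2] {
		if containsAny(string(it), "error", "fail") {
			keep = append(keep, it)
		}
	}
	keep = append(keep, items[len(items)-2:]...)
	out, err := json.Marshal(keep)
	return string(out), err
}

func containsAny(s string, subs ...string) bool {
	for _, sub := range subs {
		if strings.Contains(strings.ToLower(s), sub) {
			return true
		}
	}
	return false
}

func main() {
	in := `[{"id":1},{"id":2},{"id":3},{"id":4,"status":"error"},{"id":5},{"id":6},{"id":7}]`
	out, _ := sampleArray(in)
	fmt.Println(out) // first 2, the error item, last 2
}
```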
func CompressLog ¶
CompressLog preserves ERROR/WARN/FATAL lines and stack traces, collapsing runs of 3+ similar INFO/DEBUG lines into a summary.
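The run-collapsing idea can be sketched as follows. This is a simplified stand-in, not the tok implementation: `collapseLog` and `isNoise` are hypothetical names, and the real function also groups lines by similarity and preserves stack traces.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseLog keeps every non-INFO/DEBUG line and collapses runs of 3 or
// more consecutive INFO/DEBUG lines into the first line plus a summary.
func collapseLog(log string) string {
	lines := strings.Split(log, "\n")
	var out []string
	i := 0
	for i < len(lines) {
		if !isNoise(lines[i]) {
			out = append(out, lines[i])
			i++
			continue
		}
		// Measure the run of consecutive noisy lines starting at i.
		j := i
		for j < len(lines) && isNoise(lines[j]) {
			j++
		}
		if j-i >= 3 {
			out = append(out, lines[i],
				fmt.Sprintf("... (%d similar lines collapsed)", j-i-1))
		} else {
			out = append(out, lines[i:j]...)
		}
		i = j
	}
	return strings.Join(out, "\n")
}

func isNoise(line string) bool {
	return strings.Contains(line, "INFO") || strings.Contains(line, "DEBUG")
}

func main() {
	log := "INFO start\nINFO step 1\nINFO step 2\nERROR disk full\nINFO done"
	fmt.Println(collapseLog(log))
}
```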
func DetectLanguageByExtension ¶
DetectLanguageByExtension returns the programming language name for a file path based on its extension. Returns "" for unknown extensions.
func EstimateCompression ¶ added in v0.2.0
EstimateCompression applies ratio-based truncation, keeping the first and last portions of the content and dropping the middle.
func EstimateTokens ¶
EstimateTokens returns the estimated token count for the given text.
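Fast token estimators commonly use the rule of thumb of roughly 4 characters per token for English text. A minimal sketch of that heuristic, assuming tok uses something in this spirit (`estimateTokens` is a hypothetical name; the actual internal heuristic is not documented here):

```go
package main

import "fmt"

// estimateTokens approximates a BPE token count with the common
// chars/4 heuristic, rounding up so short non-empty strings count
// as at least one token.
func estimateTokens(text string) int {
	if len(text) == 0 {
		return 0
	}
	return (len(text) + 3) / 4
}

func main() {
	fmt.Println(estimateTokens("hello world")) // 11 chars -> 3
}
```

For exact counts, the package offers EstimateTokensPrecise, which runs the BPE tokenizer at the cost of speed.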
func EstimateTokensPrecise ¶
EstimateTokensPrecise uses BPE tokenization (slower, more accurate).
func FormatResult ¶ added in v0.2.0
func FormatResult(result *OptimizationResult) string
FormatResult produces a human-readable summary of the optimization result.
func FormatUsageBar ¶ added in v0.2.0
FormatUsageBar creates an ASCII progress bar.
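A sketch of what such a bar might look like; the exact characters and layout FormatUsageBar produces are not documented here, so `usageBar` below is a hypothetical rendering:

```go
package main

import (
	"fmt"
	"strings"
)

// usageBar renders an ASCII progress bar for a percentage in [0, 100],
// clamping out-of-range values.
func usageBar(pct float64, width int) string {
	if pct < 0 {
		pct = 0
	}
	if pct > 100 {
		pct = 100
	}
	filled := int(pct / 100 * float64(width))
	return fmt.Sprintf("[%s%s] %.0f%%",
		strings.Repeat("#", filled),
		strings.Repeat("-", width-filled),
		pct)
}

func main() {
	fmt.Println(usageBar(62.5, 20))
}
```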
func GetLanguagePatterns ¶
GetLanguagePatterns returns custom patterns if registered, else built-in pattern strings.
func IsHighEntropy ¶ added in v0.2.0
IsHighEntropy returns true if the string's Shannon entropy exceeds the given threshold. A threshold of 4.5 is generally a good indicator for detecting random secrets/keys. Most English text has entropy around 3.5-4.0, while random base64/hex strings typically have entropy above 4.5.
func OptimalBudget ¶ added in v0.2.0
OptimalBudget recommends how many tokens to allocate based on content type and importance (0.0-1.0).
func RecommendMode ¶ added in v0.2.0
RecommendMode recommends the best compression mode for a given content type and token budget.
func RegisterChunker ¶
func RegisterChunker(ext string, fn ChunkerFunc)
RegisterChunker registers a custom chunker for a file extension (e.g. ".go").
func RegisterLanguagePatterns ¶
RegisterLanguagePatterns registers custom boundary patterns for a language. These override the built-in patterns when looking up boundaries.
func ShannonEntropy ¶ added in v0.2.0
ShannonEntropy calculates the Shannon entropy of a string in bits per character. Higher values indicate more randomness, which is characteristic of secrets and keys.
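The standard Shannon formula over character frequencies is sketched below. This is an illustrative re-implementation (`shannonEntropy` is a hypothetical name); tok's version may differ in details such as rune vs. byte handling. A string of one repeated character has entropy 0, and a string of n distinct characters has entropy log2(n), which is why random base64/hex keys clear the 4.5-bit threshold.

```go
package main

import (
	"fmt"
	"math"
)

// shannonEntropy returns the Shannon entropy of s in bits per character:
// H = -sum(p_i * log2(p_i)) over the frequency p_i of each distinct rune.
func shannonEntropy(s string) float64 {
	if len(s) == 0 {
		return 0
	}
	freq := make(map[rune]float64)
	var n float64
	for _, r := range s {
		freq[r]++
		n++
	}
	var h float64
	for _, count := range freq {
		p := count / n
		h -= p * math.Log2(p)
	}
	return h
}

func main() {
	fmt.Printf("%.2f\n", shannonEntropy("aaaaaaaa"))                 // uniform: 0.00
	fmt.Printf("%.2f\n", shannonEntropy("the quick brown fox"))      // English-like, well under 4.5
	fmt.Printf("%.2f\n", shannonEntropy("A9f3kQz8Lw1Xc7Rt2Vb5Nm0P")) // key-like, above 4.5
}
```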
func SuggestBudget ¶ added in v0.2.0
func SuggestBudget(blocks []ContentBlock) int
SuggestBudget recommends an optimal budget based on content blocks.
func WarmupTokenizer ¶
func WarmupTokenizer()
WarmupTokenizer pre-initializes the BPE tokenizer in the background. Call at application startup to avoid latency on the first Compress call.
func WastedTokens ¶ added in v0.2.0
func WastedTokens(events []CompressionEvent) int
WastedTokens calculates how many tokens could have been saved with optimal compression strategies.
Types ¶
type Alert ¶ added in v0.2.0
type Alert struct {
Level string // "warning", "critical", "limit_reached"
Message string
Timestamp time.Time
Threshold float64 // what % triggered it
}
Alert represents a usage threshold alert.
type ChunkOptions ¶
type ChunkOptions struct {
MaxTokens int
MinTokens int
MinChunkSize int // hard minimum; chunks below this get heavy DP penalty (default: DefaultMinChunkSize)
Language string
Overlap int // number of tokens worth of content to repeat from previous chunk
KeepSeparator SeparatorKeep // controls boundary line placement (default: SepLeft)
}
ChunkOptions configures the code chunking behavior.
func DefaultChunkOptions ¶
func DefaultChunkOptions() ChunkOptions
DefaultChunkOptions returns sensible defaults for code chunking.
type ChunkerFunc ¶
ChunkerFunc is a custom chunker that takes a file path and content, returning the detected language and code chunks.
type CodeChunk ¶
type CodeChunk struct {
Content string `json:"content"`
StartLine int `json:"start_line"`
EndLine int `json:"end_line"`
Symbol string `json:"symbol,omitempty"`
Tokens int `json:"tokens"`
}
CodeChunk represents a semantically meaningful chunk of source code.
func ChunkCode ¶
func ChunkCode(source string, opts ChunkOptions) []CodeChunk
ChunkCode splits source code into semantically meaningful chunks based on language-aware boundary detection (function/class/method definitions). If a custom chunker is registered for the file extension in opts.Language, it is used instead of the default pipeline.
func ChunkCodePath ¶
func ChunkCodePath(path, source string, opts ChunkOptions) []CodeChunk
ChunkCodePath is like ChunkCode but accepts a file path for registry lookup.
type CompactionSchema ¶ added in v0.2.0
type CompactionSchema struct {
TaskOverview string `json:"task_overview"`
CurrentState string `json:"current_state"`
ImportantDiscoveries []string `json:"important_discoveries"`
NextSteps []string `json:"next_steps"`
ContextToPreserve []string `json:"context_to_preserve"`
}
CompactionSchema is a 5-field structured schema for LLM-based context compaction. It preserves critical information while maximizing compression.
func ParseCompactionResponse ¶ added in v0.2.0
func ParseCompactionResponse(response string) (*CompactionSchema, error)
ParseCompactionResponse parses an LLM's JSON response into a CompactionSchema.
func (*CompactionSchema) ToPrompt ¶ added in v0.2.0
func (s *CompactionSchema) ToPrompt() string
ToPrompt renders the schema as a structured prompt for reinsertion into context.
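The round trip through an LLM can be sketched with plain encoding/json. The struct below mirrors the documented 5-field schema; `parseCompaction` is a hypothetical stand-in for ParseCompactionResponse, whose actual error handling and fence stripping may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// CompactionSchema mirrors the 5-field schema from the package docs.
type CompactionSchema struct {
	TaskOverview         string   `json:"task_overview"`
	CurrentState         string   `json:"current_state"`
	ImportantDiscoveries []string `json:"important_discoveries"`
	NextSteps            []string `json:"next_steps"`
	ContextToPreserve    []string `json:"context_to_preserve"`
}

// parseCompaction extracts the outermost JSON object from an LLM response,
// tolerating surrounding prose the model may have added.
func parseCompaction(response string) (*CompactionSchema, error) {
	start := strings.Index(response, "{")
	end := strings.LastIndex(response, "}")
	if start < 0 || end <= start {
		return nil, fmt.Errorf("no JSON object in response")
	}
	var s CompactionSchema
	if err := json.Unmarshal([]byte(response[start:end+1]), &s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	resp := `Sure! {"task_overview":"refactor auth","current_state":"tests passing",` +
		`"important_discoveries":["token expiry bug"],"next_steps":["add refresh flow"],` +
		`"context_to_preserve":["API v2 only"]}`
	s, err := parseCompaction(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.TaskOverview, s.NextSteps)
}
```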
type CompressionAdvisor ¶ added in v0.2.0
type CompressionAdvisor struct {
History []CompressionEvent
Strategies map[string]*StrategyStats
// contains filtered or unexported fields
}
CompressionAdvisor analyzes compression sessions and recommends optimal strategies.
func NewCompressionAdvisor ¶ added in v0.2.0
func NewCompressionAdvisor() *CompressionAdvisor
NewCompressionAdvisor creates a new CompressionAdvisor instance.
func (*CompressionAdvisor) Analyze ¶ added in v0.2.0
func (ca *CompressionAdvisor) Analyze() []Recommendation
Analyze examines the history and produces recommendations per content type.
func (*CompressionAdvisor) FormatAdvice ¶ added in v0.2.0
func (ca *CompressionAdvisor) FormatAdvice() string
FormatAdvice returns a human-readable report of compression advice.
func (*CompressionAdvisor) Record ¶ added in v0.2.0
func (ca *CompressionAdvisor) Record(event CompressionEvent)
Record records a compression event and updates strategy statistics.
type CompressionEvent ¶ added in v0.2.0
type CompressionEvent struct {
Input string
Output string
Mode string
InputTokens int
OutputTokens int
Savings float64
Timestamp time.Time
ContentType string
}
CompressionEvent records a single compression operation.
type Compressor ¶
type Compressor struct {
// contains filtered or unexported fields
}
Compressor is a reusable compression instance. Reuses internal caches across calls for better performance.
func NewCompressor ¶
func NewCompressor(opts ...Option) *Compressor
NewCompressor creates a reusable compressor.
type ContentBlock ¶ added in v0.2.0
type ContentBlock struct {
ID string
Content string
Tokens int
Priority float64
Category string // "system", "memory", "conversation", "tool_output", "context"
Compressible bool
CompressedContent string
}
ContentBlock represents a unit of content that can be optimized.
func CompressBlock ¶ added in v0.2.0
func CompressBlock(block ContentBlock, targetTokens int) ContentBlock
CompressBlock applies compression to a block proportional to how many tokens must be saved. targetTokens is the desired token count after compression.
type ContextOptimizer ¶ added in v0.2.0
type ContextOptimizer struct {
Budget int
Strategy string // "greedy", "balanced", "priority"
// contains filtered or unexported fields
}
ContextOptimizer maximizes information density within a token budget.
func NewContextOptimizer ¶ added in v0.2.0
func NewContextOptimizer(budget int) *ContextOptimizer
NewContextOptimizer creates an optimizer with the given token budget.
func (*ContextOptimizer) Optimize ¶ added in v0.2.0
func (co *ContextOptimizer) Optimize(blocks []ContentBlock) *OptimizationResult
Optimize selects the best strategy and optimizes content blocks within budget.
type OptimizationResult ¶ added in v0.2.0
type OptimizationResult struct {
Kept []ContentBlock
Dropped []ContentBlock
Compressed []ContentBlock
TotalTokens int
BudgetUsed float64
Savings int
}
OptimizationResult holds the outcome of an optimization pass.
func BalancedOptimize ¶ added in v0.2.0
func BalancedOptimize(blocks []ContentBlock, budget int) *OptimizationResult
BalancedOptimize ensures each category gets minimum representation, then fills by priority.
func GreedyOptimize ¶ added in v0.2.0
func GreedyOptimize(blocks []ContentBlock, budget int) *OptimizationResult
GreedyOptimize takes highest priority blocks until budget is full.
func PriorityOptimize ¶ added in v0.2.0
func PriorityOptimize(blocks []ContentBlock, budget int) *OptimizationResult
PriorityOptimize uses strict priority ordering, compressing before dropping.
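The greedy strategy is the simplest of the three and can be sketched in a few lines. `Block` and `greedyOptimize` below are pared-down stand-ins for tok.ContentBlock and GreedyOptimize; the real version also tracks compressed blocks and budget statistics in an OptimizationResult.

```go
package main

import (
	"fmt"
	"sort"
)

// Block is a pared-down stand-in for tok.ContentBlock.
type Block struct {
	ID       string
	Tokens   int
	Priority float64
}

// greedyOptimize keeps the highest-priority blocks that fit within the
// token budget and drops everything else.
func greedyOptimize(blocks []Block, budget int) (kept, dropped []Block) {
	sorted := append([]Block(nil), blocks...) // sort a copy, not the input
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})
	used := 0
	for _, b := range sorted {
		if used+b.Tokens <= budget {
			kept = append(kept, b)
			used += b.Tokens
		} else {
			dropped = append(dropped, b)
		}
	}
	return kept, dropped
}

func main() {
	blocks := []Block{
		{ID: "system", Tokens: 500, Priority: 1.0},
		{ID: "history", Tokens: 3000, Priority: 0.4},
		{ID: "tool_output", Tokens: 1200, Priority: 0.7},
	}
	kept, dropped := greedyOptimize(blocks, 2000)
	fmt.Println(len(kept), len(dropped)) // system and tool_output fit; history is dropped
}
```

BalancedOptimize and PriorityOptimize refine this baseline: the former reserves minimum space per category, the latter compresses a block before resorting to dropping it.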
type Option ¶
type Option interface {
// contains filtered or unexported methods
}
Option configures compression behavior.
var (
	Minimal    Option = WithMode(ModeMinimal)
	Aggressive Option = WithMode(ModeAggressive)
	Surface    Option = WithTier(TierSurface)
	Adaptive   Option = WithTier(TierAdaptive)
	Code       Option = WithTier(TierCode)
	Log        Option = WithTier(TierLog)
)
Pre-built option presets (use directly as options).
func WithBudget ¶
WithBudget sets a hard token limit for the output.
type Recommendation ¶ added in v0.2.0
type Recommendation struct {
Strategy string
Reason string
ExpectedSavings float64
ContentType string
Priority int
}
Recommendation represents an advisor suggestion for improving compression.
type SecretDetector ¶ added in v0.2.0
type SecretDetector struct {
// contains filtered or unexported fields
}
SecretDetector detects and redacts secrets from text using compiled regex patterns.
func DefaultSecretDetector ¶ added in v0.2.0
func DefaultSecretDetector() *SecretDetector
DefaultSecretDetector returns the singleton SecretDetector instance with all built-in patterns.
func NewSecretDetector ¶ added in v0.2.0
func NewSecretDetector() *SecretDetector
NewSecretDetector creates a SecretDetector with all built-in patterns compiled and ready.
func (*SecretDetector) DetectAndRedactWithEntropy ¶ added in v0.2.0
func (sd *SecretDetector) DetectAndRedactWithEntropy(text string, entropyThreshold float64) string
DetectAndRedactWithEntropy performs secret detection using both patterns and entropy analysis. It applies entropy-based detection for segments that match a generic high-entropy pattern but were not caught by specific detectors.
func (*SecretDetector) DetectSecrets ¶ added in v0.2.0
func (sd *SecretDetector) DetectSecrets(text string) []SecretMatch
DetectSecrets scans the input text and returns all detected secret matches. Matches are returned sorted by StartPos in ascending order.
func (*SecretDetector) RedactSecrets ¶ added in v0.2.0
func (sd *SecretDetector) RedactSecrets(text string) string
RedactSecrets replaces all detected secrets in the text with [REDACTED:<type>] placeholders.
type SecretMatch ¶ added in v0.2.0
type SecretMatch struct {
Type string // The category/type of the secret (e.g., "AWS Access Key", "GitHub Token")
Value string // The matched secret value
Masked string // A masked version for safe display
StartPos int // Start byte position in the original text
EndPos int // End byte position in the original text
}
SecretMatch represents a detected secret within text.
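The pattern-based redaction described for RedactSecrets can be sketched with two illustrative patterns. The `[REDACTED:<type>]` format follows the docs above; the two regexes are simplified examples, and tok ships a much larger built-in set.

```go
package main

import (
	"fmt"
	"regexp"
)

// patterns maps a secret type name to its detection regex. Both patterns
// here are simplified illustrations of common key formats.
var patterns = map[string]*regexp.Regexp{
	"AWS Access Key": regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`),
	"GitHub Token":   regexp.MustCompile(`\bghp_[A-Za-z0-9]{36}\b`),
}

// redactSecrets replaces every pattern match with a [REDACTED:<type>]
// placeholder, as the RedactSecrets docs describe.
func redactSecrets(text string) string {
	for name, re := range patterns {
		text = re.ReplaceAllString(text, "[REDACTED:"+name+"]")
	}
	return text
}

func main() {
	// AKIAIOSFODNN7EXAMPLE is AWS's documented placeholder key, not a real secret.
	fmt.Println(redactSecrets("key=AKIAIOSFODNN7EXAMPLE ok"))
}
```

For secrets with no known pattern, DetectAndRedactWithEntropy layers the Shannon-entropy check on top of this regex pass.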
type SeparatorKeep ¶
type SeparatorKeep int
SeparatorKeep controls what happens to boundary separators during splitting.
const (
	SepLeft    SeparatorKeep = iota // separator stays with preceding chunk (default)
	SepRight                        // separator stays with following chunk
	SepDiscard                      // separator is removed from output
)
type Stats ¶
type Stats struct {
OriginalTokens int
FinalTokens int
TokensSaved int
ReductionPercent float64
Layers map[string]LayerStat
}
Stats contains compression statistics.
type StrategyStats ¶ added in v0.2.0
type StrategyStats struct {
Name string
TotalEvents int
AvgSavings float64
BestSavings float64
WorstSavings float64
AvgQuality float64
}
StrategyStats tracks performance statistics for a compression strategy.
type StreamCompressor ¶
type StreamCompressor struct {
// contains filtered or unexported fields
}
StreamCompressor maintains a background-compressed version of accumulating content. As new content is appended, it re-compresses in the background so a compressed snapshot is always available without blocking.
func NewStreamCompressor ¶
func NewStreamCompressor(threshold int, opts ...Option) *StreamCompressor
NewStreamCompressor creates a background compressor that keeps compressed output ready at all times. Threshold is the token count that triggers background re-compression. If threshold <= 0, it defaults to 2000 tokens.
func (*StreamCompressor) Append ¶
func (sc *StreamCompressor) Append(content string)
Append adds new content. If accumulated tokens exceed the threshold, triggers background re-compression.
func (*StreamCompressor) Close ¶
func (sc *StreamCompressor) Close()
Close shuts down the background compressor and waits for any in-progress compression to finish.
func (*StreamCompressor) Raw ¶
func (sc *StreamCompressor) Raw() string
Raw returns all accumulated raw content.
func (*StreamCompressor) Snapshot ¶
func (sc *StreamCompressor) Snapshot() (string, Stats)
Snapshot returns the current compressed output without blocking. If compression hasn't run yet, returns the raw content joined.
func (*StreamCompressor) TokenCount ¶
func (sc *StreamCompressor) TokenCount() int
TokenCount returns estimated token count of raw content.
type Tier ¶
type Tier string
Tier selects a pre-built pipeline profile.
const (
	TierSurface  Tier = "surface"  // 4 layers, fast
	TierTrim     Tier = "trim"     // 8 layers, balanced
	TierExtract  Tier = "extract"  // 20 layers, max compression
	TierCore     Tier = "core"     // 20 layers, quality-first
	TierCode     Tier = "code"     // code-aware
	TierLog      Tier = "log"      // log-aware
	TierThread   Tier = "thread"   // conversation-aware
	TierAdaptive Tier = "adaptive" // auto-detect content type
)
type UsageEntry ¶ added in v0.2.0
type UsageEntry struct {
Tokens int
CostUSD float64
Timestamp time.Time
Provider string
Model string
}
UsageEntry represents a single recorded usage event.
type UsageSummary ¶ added in v0.2.0
type UsageSummary struct {
HourlyTokens int
HourlyRemaining int
DailyTokens int
DailyRemaining int
SessionTokens int
SessionRemaining int
DailyCostUSD float64
CostRemaining float64
HourlyPct float64
DailyPct float64
}
UsageSummary provides a snapshot of current usage across all windows.
type UsageTracker ¶ added in v0.2.0
type UsageTracker struct {
DailyLimit int
HourlyLimit int
SessionLimit int
CostLimitUSD float64
Alerts []Alert
// contains filtered or unexported fields
}
UsageTracker tracks API usage across sessions and prevents surprise bills.
func NewUsageTracker ¶ added in v0.2.0
func NewUsageTracker() *UsageTracker
NewUsageTracker creates a UsageTracker with sensible defaults.
func (*UsageTracker) CanProceed ¶ added in v0.2.0
func (u *UsageTracker) CanProceed() (bool, string)
CanProceed checks all limits and returns whether another request is allowed. If not, it returns false and a reason string.
func (*UsageTracker) CheckThresholds ¶ added in v0.2.0
func (u *UsageTracker) CheckThresholds()
CheckThresholds evaluates current usage against threshold levels and generates alerts.
func (*UsageTracker) EstimateRemaining ¶ added in v0.2.0
func (u *UsageTracker) EstimateRemaining(tokensPerRequest int) int
EstimateRemaining estimates how many more requests of the given size fit in the budget.
func (*UsageTracker) FormatSummary ¶ added in v0.2.0
func (u *UsageTracker) FormatSummary() string
FormatSummary returns a human-readable usage summary.
func (*UsageTracker) GetUsage ¶ added in v0.2.0
func (u *UsageTracker) GetUsage() UsageSummary
GetUsage returns a snapshot of current usage.
func (*UsageTracker) PruneOld ¶ added in v0.2.0
func (u *UsageTracker) PruneOld()
PruneOld removes entries older than their respective windows.
func (*UsageTracker) Record ¶ added in v0.2.0
func (u *UsageTracker) Record(tokens int, costUSD float64, provider, model string)
Record adds a usage entry and checks thresholds.
func (*UsageTracker) Reset ¶ added in v0.2.0
func (u *UsageTracker) Reset()
Reset clears the session counter and alerts.
Source Files ¶
Directories ¶
| Path | Synopsis |
|---|---|
| internal | |
| internal/cache | Package cache provides persistent query caching for tok. |
| internal/config | Package config provides configuration management for tok. |
| internal/core | Package core provides core interfaces and utilities for tok. |
| internal/filter | Package filter provides LRU caching using the unified cache package. |
| internal/simd | Package simd provides performance-optimized string operations using manual loop unrolling (processing 16 bytes per iteration). |