Documentation ¶
Overview ¶
Package tokenopt reduces the token footprint of session raw.jsonl streams before they are handed to a summarization LLM. Deterministic, streaming, single-pass, no LLM calls.
The primary entry point is Compress (CompressWith exposes tunable options). It reads a raw.jsonl stream from r, applies a fixed set of transforms, and writes the compressed stream to w. Memory stays bounded (a small LRU of content hashes) regardless of session size.
This package is intentionally self-contained — it depends only on the standard library and a small LRU — so it can be used by the CLI, the sessionsummary package, and a future server-side distiller with no coupling. No persistence, no sidecar manifest: the raw.jsonl on disk is the audit trail.
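A minimal usage sketch. This page does not show the Compress signature; the example assumes Compress(r io.Reader, w io.Writer) (Stats, error), matching the description above, and uses a hypothetical import path:

package main

import (
	"bytes"
	"log"
	"log/slog"
	"os"

	"example.com/internal/tokenopt" // hypothetical import path
)

func main() {
	in, err := os.Open("raw.jsonl")
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	var out bytes.Buffer
	stats, err := tokenopt.Compress(in, &out) // assumed signature: (io.Reader, io.Writer) (Stats, error)
	if err != nil {
		log.Fatal(err)
	}
	slog.Info("token_optimize", "stats", stats) // key=value telemetry via Stats.LogValue
	// out.Bytes() now holds the compressed jsonl for the summarization LLM.
}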
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Mode ¶
type Mode int
Mode controls how aggressively Compress reduces the stream.
const (
	// ModeConversationOnly (default) emits user + assistant entries verbatim,
	// drops header and system entries, and selectively replaces tool entries:
	//
	//   - Tool calls with a non-empty `description` field in tool_input
	//     (Bash, Agent, Task, WebFetch, ...) emit
	//     `{type:"tool_mark", description:"..."}`. Adjacent calls with the
	//     same description collapse via the count field.
	//
	//   - Tool calls without a description (Edit, Read, Write, Glob, Grep, ...)
	//     produce NO tool_mark at all. Their actions are recoverable from
	//     surrounding assistant prose ("I'll edit foo.go and run tests"),
	//     so the marker would only be redundant noise.
	//
	// The descriptions are agent-authored intent strings — high
	// signal-per-byte, and far cheaper than the previous 120-char brief
	// extracted heuristically from the most-meaningful input field.
	ModeConversationOnly Mode = 0

	// ModeLossless keeps every entry but applies content-level transforms
	// (ANSI strip, progress collapse, image elision, large-Read truncation,
	// system-reminder + tool_result dedup). Use when a downstream consumer
	// still needs tool details (replay, debugging, contract-testing).
	ModeLossless Mode = 1
)
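Concretely, the selective replacement looks roughly like this. Entry shapes other than the tool_mark quoted above are illustrative assumptions, not quoted from the package:

// Illustrative ModeConversationOnly transform (entry shapes assumed):
//
//   {"type":"user", ...}                                       -> kept verbatim
//   {"type":"assistant", ...}                                  -> kept verbatim
//   {"type":"tool", "tool_input":{"description":"run tests"}}  -> {"type":"tool_mark","description":"run tests"}
//   {"type":"tool", "tool_input":{"description":"run tests"}}  -> collapsed into the prior mark, count: 2
//   {"type":"tool", "tool_input":{"file_path":"foo.go"}}       -> dropped (no description)
//   {"type":"system", ...}                                     -> dropped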
type Options ¶
type Options struct {
// Mode selects the compression strategy. Defaults to ModeConversationOnly.
Mode Mode
// LargeReadMaxLines is the line threshold above which a Read tool_result
// body is truncated to head+tail. ModeLossless only. Defaults to 120.
LargeReadMaxLines int
// LargeReadKeepLines is how many lines to keep from the start and end when
// a large Read body is truncated. ModeLossless only. Defaults to 40.
LargeReadKeepLines int
// ToolResultLRUSize caps the content-hash LRU for tool_result dedup.
// ModeLossless only. Defaults to 1024 unique payloads.
ToolResultLRUSize int
// ToolResultMinBytes is the minimum content size worth deduping.
// ModeLossless only. Defaults to 512.
ToolResultMinBytes int
}
Options configures a Compress run. Zero value is a reasonable default (ModeConversationOnly).
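A sketch of a tuned lossless run, assuming CompressWith takes (r io.Reader, w io.Writer, opts Options) and returns (Stats, error), with in and out as in the Overview example:

opts := tokenopt.Options{
	Mode:               tokenopt.ModeLossless,
	LargeReadMaxLines:  200, // truncate Read bodies only past 200 lines (default 120)
	LargeReadKeepLines: 60,  // keep 60 head + 60 tail lines (default 40)
	ToolResultLRUSize:  4096,
	ToolResultMinBytes: 256, // dedup payloads down to 256 bytes (default 512)
}
// Assumed signature: CompressWith(r io.Reader, w io.Writer, opts Options) (Stats, error).
stats, err := tokenopt.CompressWith(in, out, opts)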
type Stats ¶
type Stats struct {
EntriesIn int
EntriesOut int // differs from EntriesIn in ModeConversationOnly where tool entries collapse to markers
BytesIn int64
BytesOut int64
ToolsMarked int // ModeConversationOnly: count of tool entries replaced with compact markers
ToolsBatched int // ModeConversationOnly: count of adjacent identical tool_marks collapsed into prior with count++
SystemDropped int // ModeConversationOnly: count of system entries dropped
HeaderDropped int // ModeConversationOnly: count of header entries dropped (header carries metadata the LLM doesn't need; daemon stamps it from stored.Meta directly)
ANSIStripped int // ModeLossless: entries with ANSI sequences removed
ProgressCollapsed int // ModeLossless: entries with \r progress frames collapsed
ImagesElided int // ModeLossless: base64 image payloads replaced
LargeReadsElided int // ModeLossless: large Read tool_result bodies truncated
RemindersDeduped int // ModeLossless: <system-reminder> blocks replaced with a ref
ToolResultsRefd int // ModeLossless: tool_result bodies replaced with a tool_ref
// Token estimates use a ~4 chars/token heuristic from internal/tokens.
// They exist so callers can log a token-reduction number alongside the
// byte-reduction one. Tokens are the actual LLM cost driver; bytes are
// a proxy. Swap the estimator (or wire in a real BPE tokenizer) without
// changing any caller — the field shape stays stable.
TokensInEstimate int64
TokensOutEstimate int64
}
Stats reports what Compress did. Zero values are meaningful (no matches).
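The estimate heuristic itself is roughly this (a sketch; internal/tokens' actual rounding or weighting may differ):

// Sketch of the ~4 chars/token heuristic described above; the actual
// internal/tokens estimator may round or weight differently.
func estimateTokens(b []byte) int64 {
	return int64(len(b)) / 4
}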
func Compress ¶
Compress reads raw.jsonl entries from r, applies streaming transforms, and writes compressed jsonl to w. Returns Stats describing what was done.
Guarantees (both modes):
- Single pass over r. Surviving entries emit in original order.
- Constant memory relative to session size.
- User turns are preserved verbatim, always. Header entries survive only in ModeLossless; ModeConversationOnly drops them, as HeaderDropped records.
Mode-specific behavior:
- ModeConversationOnly (default): assistant turns verbatim; tool calls with a description collapse to compact tool_mark entries carrying that agent-authored description; undescribed tool calls, system entries, and the header are dropped. Typical reduction 50–80% on real sessions.
- ModeLossless: every entry preserved; content-level transforms only (ANSI strip, image elision, dedup, etc.).
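One way to choose a mode is to run both over the same bytes and compare, as in this sketch (uses CompressWith, below; Reduction's return shape is assumed to be saved bytes plus percentage):

raw, err := os.ReadFile("raw.jsonl")
if err != nil {
	log.Fatal(err)
}

var convOut, lossOut bytes.Buffer
conv, err := tokenopt.Compress(bytes.NewReader(raw), &convOut)
if err != nil {
	log.Fatal(err)
}
loss, err := tokenopt.CompressWith(bytes.NewReader(raw), &lossOut,
	tokenopt.Options{Mode: tokenopt.ModeLossless})
if err != nil {
	log.Fatal(err)
}

// Reduction's return shape is assumed: (bytes saved, percent saved).
_, convPct := conv.Reduction()
_, lossPct := loss.Reduction()
slog.Info("mode_compare", "conversation_pct", convPct, "lossless_pct", lossPct)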
func CompressWith ¶
CompressWith is Compress with tunable options.
func (*Stats) Add ¶
Add accumulates other into s. Useful for aggregating across many sessions (e.g., a daemon summarizing a nightly batch).
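A sketch of nightly-batch aggregation; sessionPaths and Add's pointer parameter are assumptions:

var total tokenopt.Stats
for _, path := range sessionPaths { // hypothetical []string of raw.jsonl paths
	in, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	stats, err := tokenopt.Compress(in, io.Discard)
	in.Close()
	if err != nil {
		log.Fatal(err)
	}
	total.Add(&stats) // parameter shape assumed; the doc says only "accumulates other into s"
}
slog.Info("nightly_batch", "stats", total)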
func (Stats) LogValue ¶
LogValue implements slog.LogValuer. Enables callers (CLI, daemon) to emit single-line key=value compression telemetry with:
slog.Info("token_optimize", "stats", stats)
func (Stats) Reduction ¶
Reduction returns bytes saved and percentage. Safe to call when BytesIn is 0.
func (Stats) TokenReduction ¶ added in v0.7.2
TokenReduction returns estimated tokens saved and percentage. The cost dashboard wants this — bytes are a proxy, tokens are the actual LLM cost. Code-heavy sessions tokenize at a different ratio than prose-heavy ones, so byte-percentage trends mislead operators on cost.
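A sketch of dashboard telemetry that reports both views side by side, making any byte-vs-token divergence visible (return shapes assumed: saved count plus percentage):

// Return shapes assumed: (saved count, percent saved) for both methods.
bytesSaved, bytesPct := stats.Reduction()
toksSaved, toksPct := stats.TokenReduction()
slog.Info("compression_cost",
	"bytes_saved", bytesSaved, "bytes_pct", bytesPct,
	"tokens_saved", toksSaved, "tokens_pct", toksPct)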