Documentation
¶
Overview ¶
Package tokenstrip is a streaming, token-aware compaction stage for session raw.jsonl streams. It sits downstream of tokenopt in the session pipeline and reduces token count — not bytes — via a small set of intentionally conservative transforms.
Why a separate package ¶
tokenopt produces a byte-reduced stream (ANSI strip, image elision, tool-result dedup, etc.). Those transforms save bytes but rarely save tokens in proportion. tokenstrip attacks the tokenizer directly: NFC-normalize, eliminate zero-width characters, canonicalize whitespace, and — strictly inside assistant <thinking> blocks — drop stop words and optionally substitute high-token phrases with shorter synonyms.
Safety model — precise contract by transform ¶
Some transforms here are lossy. The package is therefore OFF by default in upstream callers and gated behind explicit opt-in.
Fields NEVER mutated, regardless of transform or config:
- header entries (session metadata)
- user turns, in their entirety (intent signal is sacred)
- tool_name, tool_input, tool_mark.brief (summarizer scaffolding)
For assistant entries, the applicability depends on whether a transform is lossless or lossy:
Lossless transforms (apply to assistant content globally): - NFC Unicode normalization — round-trippable canonical form - Zero-width + unusual whitespace strip — information-free glyphs - Whitespace canonicalization — multiple spaces/newlines → one Lossy transforms (apply ONLY to text inside <thinking>…</thinking>): - Stop-word removal - Synonym substitution (opt-in even when tokenstrip is enabled)
This means assistant prose OUTSIDE <thinking> may see its whitespace canonicalized and zero-width chars removed (lossless-safe), but its words will never be dropped or rewritten (preserves the answer to the user verbatim). Assistant prose INSIDE <thinking> may additionally lose stop words / have synonyms substituted (lossy but scoped to reasoning).
Streaming ¶
Compress is single-pass over r, bounded memory, tolerant of oversized entries (>64KB). Unknown top-level JSON fields on each entry round-trip via map[string]json.RawMessage so downstream consumers keep whatever schema extensions upstream added.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DefaultSynonymTable ¶
DefaultSynonymTable returns the baseline high-token phrase → shorter form mapping. Kept short and conservative; callers wanting more aggressive shortening should pass their own table.
Types ¶
type DropThinkingMode ¶ added in v0.7.2
type DropThinkingMode int
DropThinkingMode controls the aggressive reduction of <thinking> blocks. See Options.DropThinkingMode for the rationale.
const ( // DropThinkingNone preserves the entire <thinking> block (default). // Stop-word removal still applies to the inner text. DropThinkingNone DropThinkingMode = 0 // DropThinkingFirstSentence keeps the first sentence of each block // and replaces the rest with an elision marker. Sentence boundary // is the first ".", "!", or "?" followed by whitespace or end-of-block. DropThinkingFirstSentence DropThinkingMode = 1 // DropThinkingAll removes the entire block (including the surrounding // <thinking> tags). Maximum aggression — use only with telemetry- // backed evidence that quality_score isn't impacted. DropThinkingAll DropThinkingMode = 2 )
type Options ¶
type Options struct {
// EnableSynonymSub turns on phrase→synonym substitution inside assistant
// <thinking> blocks. Off by default even when tokenstrip itself is on,
// because the table is opinionated and can produce awkward reasoning
// text; callers should opt in explicitly.
EnableSynonymSub bool
// DropThinkingMode controls whether <thinking>...</thinking> blocks in
// assistant content are reduced more aggressively than the default
// stop-word strip. Default is DropThinkingNone (preserve current
// behavior — full block kept, with stop-word removal applied to its
// contents).
//
// On long Sonnet/Opus sessions, thinking blocks can be 30–50% of
// total assistant prose. The summary schema (title, key_actions,
// chapter_titles, aha_moments) doesn't require them, but the chain
// of reasoning sometimes contains the framing for an aha_moment, so
// we offer a conservative middle ground:
//
// - DropThinkingNone: keep the entire block (current behavior).
// - DropThinkingFirstSentence: keep only the first sentence of
// each block (typically the framing — "Let me work out X" or
// "I need to figure out Y") and elide the body. Preserves the
// hint of where the reasoning was going without the deliberation
// cost.
// - DropThinkingAll: drop the block entirely. Maximum savings,
// maximum risk.
//
// Recommended rollout: DropThinkingFirstSentence behind an env-var
// flag for a few weeks, A/B against EventSummarization quality_score
// distribution, flip to default if there's no measurable quality
// drop on the cohort that has it enabled.
DropThinkingMode DropThinkingMode
// SynonymTable overrides the default substitution table. Keys are
// matched case-insensitively as whole words. A nil map falls back to
// DefaultSynonymTable().
SynonymTable map[string]string
// StopWordLanguage is an ISO-639-1 language code (e.g. "en", "fr").
// Empty string defaults to "en".
StopWordLanguage string
}
Options configures a Compress run. Zero value is a reasonable default (English stop words, synonym substitution OFF, thinking blocks kept).
type Stats ¶
type Stats struct {
EntriesIn int
EntriesOut int
BytesIn int64
BytesOut int64
NFCNormalized int // entries where NFC normalization changed content
ZeroWidthStripped int // entries where zero-width / unusual whitespace was removed
WhitespaceCanonicalized int // entries where whitespace collapse changed content
StopWordsRemoved int // <thinking> blocks where stop words were removed
SynonymsSubstituted int // <thinking> blocks where synonym substitution fired
// ThinkingBlocksTrimmed counts <thinking> blocks reduced under
// DropThinkingFirstSentence (kept first sentence only).
ThinkingBlocksTrimmed int
// ThinkingBlocksDropped counts <thinking> blocks removed entirely
// under DropThinkingAll.
ThinkingBlocksDropped int
// Token estimates use a ~4 chars/token heuristic (Anthropic's rule of
// thumb). They exist so callers can log a rough token-reduction number
// without pulling in a BPE-heavy tokenizer. When a real tokenizer is
// wired in later, swap the estimator in transforms.go and these fields
// will reflect actual counts.
TokensInEstimate int64
TokensOutEstimate int64
}
Stats reports what Compress did. Zero values mean no matches.
func Compress ¶
Compress reads raw.jsonl entries from r, applies token-aware transforms, and writes the transformed stream to w. Equivalent to CompressWith with a zero Options value.
func CompressWith ¶
CompressWith is Compress with tunable options.
Guarantees:
- Single pass over r, bounded memory.
- Entry order preserved; nothing is dropped.
- User turns and header entries are byte-identical on output.
- Unknown top-level JSON fields survive round-trip.
func (Stats) LogValue ¶
LogValue implements slog.LogValuer. Enables single-line key=value telemetry:
slog.Info("tokenstrip", "stats", stats)
func (Stats) TokenReduction ¶
TokenReduction returns estimated tokens saved and percentage.