Documentation ¶
Overview ¶
Package contextpacker provides token-aware context packing for LLMs.
The DOMCompressor reduces token usage by stripping non-essential HTML elements and attributes from raw DOM strings (e.g., from playwright-go).
This is critical for browser automation agents where:
- playwright-go returns massive DOM strings
- Raw DOM quickly exhausts token limits
- LLMs are more likely to hallucinate when the context is too large
Example:
compressor := contextpacker.NewDOMCompressor()
compressed, err := compressor.Compress(rawDOM)
// Use compressed string for LLM context
Index ¶
- Constants
- Variables
- type CompressionStats
- type DOMCompressor
- type DOMCompressorOption
- func WithComments(remove bool) DOMCompressorOption
- func WithFlattenDivs(flatten bool) DOMCompressorOption
- func WithKeepAttributes(attrs ...string) DOMCompressorOption
- func WithMaxDepth(depth int) DOMCompressorOption
- func WithRemoveAttributes(attrs ...string) DOMCompressorOption
- func WithScriptTags(remove bool) DOMCompressorOption
- func WithStyleTags(remove bool) DOMCompressorOption
- type GreedyStrategy
- type ImportanceStrategy
- type MetadataStrategy
- type Option
- type PackedResult
- type Packer
- func (p *Packer) Pack(ctx context.Context, docs []schema.Document) (PackedResult, error)
- func (p *Packer) PackScored(ctx context.Context, docs []schema.ScoredDocument) (PackedResult, error)
- func (p *Packer) PackWithScores(ctx context.Context, docs []schema.Document, scores []float64) (PackedResult, error)
- type PackingStrategy
- type TokenStats
- type UsedDocument
Constants ¶
const CompactTemplate = `{{.Content}}`
CompactTemplate is a minimal format without metadata.
const DefaultTemplate = `{{.Content}}
---
Metadata: {{.Metadata}}
---`
DefaultTemplate is the default format for packing documents.
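As a rough illustration of how a template such as DefaultTemplate could be rendered, the sketch below uses the standard text/template package. The docData struct and the render helper are hypothetical; this page does not show the value the package actually passes to the template, so only the field names Content and Metadata are taken from the constant.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// defaultTemplate mirrors the DefaultTemplate constant documented above.
const defaultTemplate = `{{.Content}}
---
Metadata: {{.Metadata}}
---`

// docData is a hypothetical stand-in for the template data; only the
// field names come from the documented constant.
type docData struct {
	Content  string
	Metadata map[string]string
}

// render parses tmpl and executes it against d.
func render(tmpl string, d docData) (string, error) {
	t, err := template.New("doc").Parse(tmpl)
	if err != nil {
		return "", fmt.Errorf("parse template: %w", err)
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, d); err != nil {
		return "", fmt.Errorf("execute template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	out, err := render(defaultTemplate, docData{
		Content:  "Go is a compiled language.",
		Metadata: map[string]string{"source": "faq.md"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because the metadata is printed with the default formatting of text/template, a map appears in Go's map[key:value] form; a custom template would be needed for any other layout.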
Variables ¶
var (
// ErrNilTokenizer is returned when a nil tokenizer is provided.
ErrNilTokenizer = errors.New("tokenizer cannot be nil")
// ErrInvalidMaxTokens is returned when maxTokens is less than or equal to zero.
ErrInvalidMaxTokens = errors.New("maxTokens must be greater than zero")
// ErrTokenCountFailed is returned when token counting fails.
ErrTokenCountFailed = errors.New("failed to count tokens")
// ErrTemplateParse is returned when template parsing fails.
ErrTemplateParse = errors.New("failed to parse template")
// ErrTemplateExecute is returned when template execution fails.
ErrTemplateExecute = errors.New("failed to execute template")
)
Functions ¶
This section is empty.
Types ¶
type CompressionStats ¶
type CompressionStats struct {
OriginalLength int // Original DOM length in characters
CompressedLength int // Compressed DOM length in characters
ReductionPercent float64 // Percentage reduction in size
TokensSaved int // Estimated tokens saved
}
CompressionStats contains compression statistics.
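The fields of CompressionStats relate to each other in a simple way, which the sketch below tries to make concrete. It is not the package's implementation: the computeStats helper is hypothetical, and the figure of four characters per token is an assumed rule of thumb, since the documentation does not say how TokensSaved is estimated.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// compressionStats mirrors the documented CompressionStats fields.
type compressionStats struct {
	OriginalLength   int
	CompressedLength int
	ReductionPercent float64
	TokensSaved      int
}

// charsPerToken is an assumed heuristic, not a value taken from the package.
const charsPerToken = 4

// computeStats derives the statistics from the two strings, measuring
// length in characters (runes) rather than bytes.
func computeStats(original, compressed string) compressionStats {
	o := utf8.RuneCountInString(original)
	c := utf8.RuneCountInString(compressed)
	s := compressionStats{OriginalLength: o, CompressedLength: c}
	if o > 0 {
		s.ReductionPercent = float64(o-c) / float64(o) * 100
	}
	s.TokensSaved = (o - c) / charsPerToken
	return s
}

func main() {
	s := computeStats("<div class=\"x\"><p>Hi</p></div>", "<div><p>Hi</p></div>")
	fmt.Printf("%+v\n", s)
}
```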
type DOMCompressor ¶
type DOMCompressor struct {
// RemoveStyleTags removes <style> and </style> tags and their content.
RemoveStyleTags bool
// RemoveScriptTags removes <script> and </script> tags and their content.
RemoveScriptTags bool
// RemoveComments removes HTML comments <!-- ... -->.
RemoveComments bool
// RemoveAttributes lists attributes to remove (e.g., "class", "style", "data-*").
// To preserve an attribute, list it in KeepAttributes.
RemoveAttributes []string
// KeepAttributes lists attributes to preserve (e.g., "id", "name", "type", "aria-label").
KeepAttributes []string
// FlattenDivs removes deeply nested <div> chains that are purely for layout.
FlattenDivs bool
// PreserveSemanticTags keeps semantic HTML5 tags (article, nav, section, etc.).
PreserveSemanticTags bool
// MaxDepth limits nesting depth (0 = unlimited).
MaxDepth int
}
DOMCompressor strips non-essential HTML elements and attributes to reduce token usage while preserving semantic structure.
func NewDOMCompressor ¶
func NewDOMCompressor(opts ...DOMCompressorOption) *DOMCompressor
NewDOMCompressor creates a compressor with sensible defaults.
Default configuration:
- Removes style and script tags
- Removes HTML comments
- Removes class, style, and data-* attributes
- Keeps id, name, type, aria-label, href, src, alt attributes
- Flattens deeply nested divs
- Preserves semantic HTML5 tags
func (*DOMCompressor) Compress ¶
func (c *DOMCompressor) Compress(dom string) (string, error)
Compress reduces a raw DOM string by removing non-essential elements and attributes while preserving semantic structure.
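To give a feel for the kind of reduction described here, the following sketch strips script blocks, style blocks, and comments with regular expressions from the standard library. It is a deliberate simplification and not how Compress is known to work; the stripNoise helper is hypothetical, and regex-based stripping can misfire on unusual markup where a real HTML parser would not.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// (?is) makes the match case-insensitive and lets . span newlines.
	scriptRe  = regexp.MustCompile(`(?is)<script\b[^>]*>.*?</script\s*>`)
	styleRe   = regexp.MustCompile(`(?is)<style\b[^>]*>.*?</style\s*>`)
	commentRe = regexp.MustCompile(`(?s)<!--.*?-->`)
	spaceRe   = regexp.MustCompile(`\s+`)
)

// stripNoise removes script and style elements with their content, removes
// HTML comments, and collapses runs of whitespace to a single space.
func stripNoise(dom string) string {
	out := scriptRe.ReplaceAllString(dom, "")
	out = styleRe.ReplaceAllString(out, "")
	out = commentRe.ReplaceAllString(out, "")
	out = spaceRe.ReplaceAllString(out, " ")
	return strings.TrimSpace(out)
}

func main() {
	raw := `<div><!-- nav --><script>alert(1)</script><p id="a">Hi</p><style>p{}</style></div>`
	fmt.Println(stripNoise(raw))
}
```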
func (*DOMCompressor) CompressWithStats ¶
func (c *DOMCompressor) CompressWithStats(dom string) (string, CompressionStats, error)
CompressWithStats returns compression statistics along with the result.
func (*DOMCompressor) MustCompress ¶
func (c *DOMCompressor) MustCompress(dom string) string
MustCompress compresses the DOM or panics on error.
type DOMCompressorOption ¶
type DOMCompressorOption func(*DOMCompressor)
DOMCompressorOption configures the DOMCompressor.
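The type above follows the functional options pattern: each option is a function that mutates the value under construction, and the constructor applies the options on top of its defaults. The sketch below reproduces the pattern with trimmed-down, hypothetical local types (compressor, option, newCompressor) so that it compiles without the package; the default values shown are assumptions modeled on the documented defaults.

```go
package main

import "fmt"

// compressor is a reduced, hypothetical stand-in for DOMCompressor.
type compressor struct {
	RemoveScriptTags bool
	RemoveComments   bool
	MaxDepth         int
}

// option has the same shape as DOMCompressorOption.
type option func(*compressor)

// withComments configures comment removal.
func withComments(remove bool) option {
	return func(c *compressor) { c.RemoveComments = remove }
}

// withMaxDepth sets the maximum nesting depth (0 = unlimited).
func withMaxDepth(depth int) option {
	return func(c *compressor) { c.MaxDepth = depth }
}

// newCompressor starts from defaults and then applies each option in order,
// so later options override earlier ones.
func newCompressor(opts ...option) *compressor {
	c := &compressor{RemoveScriptTags: true, RemoveComments: true}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := newCompressor(withComments(false), withMaxDepth(8))
	fmt.Printf("%+v\n", *c)
}
```

One advantage of this design is that new settings can be added later without changing the constructor's signature.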
func WithComments ¶
func WithComments(remove bool) DOMCompressorOption
WithComments configures HTML comment removal.
func WithFlattenDivs ¶
func WithFlattenDivs(flatten bool) DOMCompressorOption
WithFlattenDivs configures div flattening.
func WithKeepAttributes ¶
func WithKeepAttributes(attrs ...string) DOMCompressorOption
WithKeepAttributes sets attributes to preserve.
func WithMaxDepth ¶
func WithMaxDepth(depth int) DOMCompressorOption
WithMaxDepth sets maximum nesting depth (0 = unlimited).
func WithRemoveAttributes ¶
func WithRemoveAttributes(attrs ...string) DOMCompressorOption
WithRemoveAttributes sets additional attributes to remove. Supports data-* pattern for all data attributes.
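One plausible reading of the data-* pattern is a trailing-wildcard prefix match. The sketch below shows that interpretation with two hypothetical helpers, matchesAttr and shouldRemove; it also assumes that the keep list takes precedence over the remove list and that names are compared case-insensitively, neither of which this documentation states explicitly.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAttr reports whether name matches pattern. A pattern ending in "*"
// matches any name that starts with the text before the star.
func matchesAttr(pattern, name string) bool {
	pattern = strings.ToLower(pattern)
	name = strings.ToLower(name)
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(name, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == name
}

// shouldRemove decides whether an attribute is dropped. The keep list wins
// over the remove list (an assumption made for this sketch).
func shouldRemove(name string, remove, keep []string) bool {
	for _, k := range keep {
		if matchesAttr(k, name) {
			return false
		}
	}
	for _, r := range remove {
		if matchesAttr(r, name) {
			return true
		}
	}
	return false
}

func main() {
	remove := []string{"class", "style", "data-*"}
	keep := []string{"id", "aria-label"}
	for _, attr := range []string{"id", "class", "data-testid", "href"} {
		fmt.Println(attr, shouldRemove(attr, remove, keep))
	}
}
```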
func WithScriptTags ¶
func WithScriptTags(remove bool) DOMCompressorOption
WithScriptTags configures script tag removal.
func WithStyleTags ¶
func WithStyleTags(remove bool) DOMCompressorOption
WithStyleTags configures style tag removal.
type GreedyStrategy ¶
type GreedyStrategy struct{}
GreedyStrategy packs documents in their original order.
func (GreedyStrategy) Name ¶
func (GreedyStrategy) Name() string
type ImportanceStrategy ¶
type ImportanceStrategy struct{}
ImportanceStrategy packs documents ordered by score (highest first).
func (ImportanceStrategy) Name ¶
func (ImportanceStrategy) Name() string
type MetadataStrategy ¶
MetadataStrategy packs documents ordered by a metadata field.
func (MetadataStrategy) Name ¶
func (s MetadataStrategy) Name() string
type Option ¶
type Option func(*options)
Option configures a Packer.
func WithStrategy ¶
func WithStrategy(strategy PackingStrategy) Option
WithStrategy sets the packing strategy.
func WithTemplate ¶
WithTemplate sets the document template. Use {{.Content}} for document content and {{.Metadata}} for metadata, matching the fields used by CompactTemplate and DefaultTemplate.
type PackedResult ¶
type PackedResult struct {
// Content is the final packed context string.
Content string
// UsedDocuments are the documents that fit in the token budget.
UsedDocuments []UsedDocument
// TokenStats contains token usage statistics.
TokenStats TokenStats
// Truncated indicates whether some documents were dropped.
Truncated bool
}
PackedResult contains the result of packing documents.
type Packer ¶
type Packer struct {
// contains filtered or unexported fields
}
Packer packs documents into a context window within token limits.
func (*Packer) Pack ¶
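func (p *Packer) Pack(ctx context.Context, docs []schema.Document) (PackedResult, error)
Pack packs documents into a context string within token limits. Documents are packed atomically: each one is either included in full or dropped.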
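Atomic packing under a token budget can be sketched as a single pass over the documents. The code below is an illustration only: the result type, the pack function, and the whitespace-based countTokens are hypothetical stand-ins, and it assumes that a document which does not fit is skipped while later, smaller documents may still be packed. The package may instead stop at the first document that does not fit.

```go
package main

import (
	"fmt"
	"strings"
)

// result is a reduced, hypothetical counterpart to PackedResult.
type result struct {
	Content   string
	Packed    int
	Tokens    int
	Truncated bool
}

// countTokens approximates tokens as whitespace-separated words. A real
// tokenizer would give different counts.
func countTokens(s string) int {
	return len(strings.Fields(s))
}

// pack adds each document whole if it fits in the remaining budget and
// skips it otherwise; documents are never split.
func pack(docs []string, maxTokens int) result {
	var parts []string
	var res result
	for _, d := range docs {
		n := countTokens(d)
		if res.Tokens+n > maxTokens {
			res.Truncated = true // at least one document was dropped
			continue
		}
		parts = append(parts, d)
		res.Tokens += n
		res.Packed++
	}
	res.Content = strings.Join(parts, "\n")
	return res
}

func main() {
	res := pack([]string{"a b c", "d e f g", "h"}, 4)
	fmt.Printf("%+v\n", res)
}
```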
func (*Packer) PackScored ¶
func (p *Packer) PackScored(ctx context.Context, docs []schema.ScoredDocument) (PackedResult, error)
PackScored packs ScoredDocuments using their scores for importance ordering.
func (*Packer) PackWithScores ¶
func (p *Packer) PackWithScores(ctx context.Context, docs []schema.Document, scores []float64) (PackedResult, error)
PackWithScores packs documents with optional scores for importance ordering. If scores is nil or empty, the default strategy ordering is used.
type PackingStrategy ¶
type PackingStrategy interface {
// Name returns the strategy name for logging and debugging.
Name() string
// Order returns documents in the order they should be packed.
Order(docs []schema.Document, scores []float64) []schema.Document
}
PackingStrategy defines how documents are ordered before packing.
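A custom strategy only needs the two methods above. The sketch below implements score-descending ordering against a local copy of the interface; the document type is a hypothetical replacement for schema.Document, and falling back to the original order when the scores do not line up with the documents is an assumption of this example.

```go
package main

import (
	"fmt"
	"sort"
)

// document is a hypothetical stand-in for schema.Document.
type document struct {
	Content string
}

// packingStrategy mirrors the documented interface using the local type.
type packingStrategy interface {
	Name() string
	Order(docs []document, scores []float64) []document
}

// byScore orders documents from highest to lowest score.
type byScore struct{}

func (byScore) Name() string { return "by_score" }

// Order returns a new slice and leaves the input untouched. If scores does
// not have one entry per document, the original order is kept.
func (byScore) Order(docs []document, scores []float64) []document {
	out := make([]document, len(docs))
	copy(out, docs)
	if len(scores) != len(docs) {
		return out
	}
	idx := make([]int, len(docs))
	for i := range idx {
		idx[i] = i
	}
	// A stable sort keeps equally scored documents in their input order.
	sort.SliceStable(idx, func(a, b int) bool {
		return scores[idx[a]] > scores[idx[b]]
	})
	for i, j := range idx {
		out[i] = docs[j]
	}
	return out
}

func main() {
	var s packingStrategy = byScore{}
	docs := []document{{"a"}, {"b"}, {"c"}}
	fmt.Println(s.Name(), s.Order(docs, []float64{0.2, 0.9, 0.5}))
}
```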
type TokenStats ¶
type TokenStats struct {
// TotalTokens is the total tokens used in the packed content.
TotalTokens int
// MaxTokens is the maximum allowed tokens.
MaxTokens int
// DocumentsConsidered is the number of documents evaluated.
DocumentsConsidered int
// DocumentsPacked is the number of documents that fit.
DocumentsPacked int
}
TokenStats holds token usage statistics.