contextpacker

package
v0.34.4

This package is not in the latest version of its module.

Published: Mar 11, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package contextpacker provides token-aware context packing for LLMs.

The DOMCompressor reduces token usage by stripping non-essential HTML elements and attributes from raw DOM strings (e.g., from playwright-go).

This is critical for browser automation agents where:

  • playwright-go returns massive DOM strings
  • Raw DOM quickly exhausts token limits
  • LLMs are more likely to hallucinate when the context is too large

Example:

compressor := contextpacker.NewDOMCompressor()
compressed, err := compressor.Compress(rawDOM) // rawDOM is the page HTML, e.g. from playwright-go
if err != nil {
	return err
}
// Use the compressed string as LLM context.

Constants

const CompactTemplate = `{{.Content}}`

CompactTemplate is a minimal format without metadata.

const DefaultTemplate = `{{.Content}}

---
Metadata: {{.Metadata}}
---`

DefaultTemplate is the default format for packing documents.
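
The following is a minimal sketch of how a template such as DefaultTemplate could be rendered with the standard text/template package. The templateData struct and the render helper are hypothetical stand-ins for illustration; the package's internal data type is not documented here and may differ.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// DefaultTemplate mirrors the constant documented above.
const DefaultTemplate = `{{.Content}}

---
Metadata: {{.Metadata}}
---`

// templateData is a hypothetical stand-in for the value passed to the template.
type templateData struct {
	Content  string
	Metadata map[string]any
}

// render parses tmpl and executes it against d.
func render(tmpl string, d templateData) (string, error) {
	t, err := template.New("doc").Parse(tmpl)
	if err != nil {
		return "", fmt.Errorf("failed to parse template: %w", err)
	}
	var sb strings.Builder
	if err := t.Execute(&sb, d); err != nil {
		return "", fmt.Errorf("failed to execute template: %w", err)
	}
	return sb.String(), nil
}

func main() {
	out, err := render(DefaultTemplate, templateData{
		Content:  "Go is a compiled language.",
		Metadata: map[string]any{"source": "intro.md"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because text/template is case-sensitive, the field names in a custom template must match the exported names exactly.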

Variables

var (
	// ErrNilTokenizer is returned when a nil tokenizer is provided.
	ErrNilTokenizer = errors.New("tokenizer cannot be nil")
	// ErrInvalidMaxTokens is returned when maxTokens is less than or equal to zero.
	ErrInvalidMaxTokens = errors.New("maxTokens must be greater than zero")
	// ErrTokenCountFailed is returned when token counting fails.
	ErrTokenCountFailed = errors.New("failed to count tokens")
	// ErrTemplateParse is returned when template parsing fails.
	ErrTemplateParse = errors.New("failed to parse template")
	// ErrTemplateExecute is returned when template execution fails.
	ErrTemplateExecute = errors.New("failed to execute template")
)

Functions

This section is empty.

Types

type CompressionStats

type CompressionStats struct {
	OriginalLength   int     // Original DOM length in characters
	CompressedLength int     // Compressed DOM length in characters
	ReductionPercent float64 // Percentage reduction in size
	TokensSaved      int     // Estimated tokens saved
}

CompressionStats contains compression statistics.
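
As an illustration of how these fields relate, the sketch below derives them from an original and a compressed string. The stats type is a local stand-in, and the four-characters-per-token heuristic is an assumption; the package's own estimate of TokensSaved may be computed differently.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// stats is a local stand-in mirroring the CompressionStats fields.
type stats struct {
	OriginalLength   int
	CompressedLength int
	ReductionPercent float64
	TokensSaved      int
}

// charsPerToken is an assumed heuristic, not a value taken from the package.
const charsPerToken = 4

// computeStats measures both strings in characters (runes) and derives
// the percentage reduction and a rough token saving.
func computeStats(original, compressed string) stats {
	s := stats{
		OriginalLength:   utf8.RuneCountInString(original),
		CompressedLength: utf8.RuneCountInString(compressed),
	}
	if s.OriginalLength > 0 {
		saved := s.OriginalLength - s.CompressedLength
		s.ReductionPercent = float64(saved) / float64(s.OriginalLength) * 100
		s.TokensSaved = saved / charsPerToken
	}
	return s
}

func main() {
	s := computeStats(`<div class="x"><p>hi</p></div>`, `<p>hi</p>`)
	fmt.Printf("%d -> %d chars (%.1f%% smaller, ~%d tokens saved)\n",
		s.OriginalLength, s.CompressedLength, s.ReductionPercent, s.TokensSaved)
}
```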

type DOMCompressor

type DOMCompressor struct {
	// RemoveStyleTags removes <style> elements, including their content.
	RemoveStyleTags bool
	// RemoveScriptTags removes <script> elements, including their content.
	RemoveScriptTags bool
	// RemoveComments removes HTML comments <!-- ... -->.
	RemoveComments bool
	// RemoveAttributes lists attributes to remove (e.g., "class", "style", "data-*").
	// List attributes in KeepAttributes to preserve them.
	RemoveAttributes []string
	// KeepAttributes lists attributes to preserve (e.g., "id", "name", "type", "aria-label").
	KeepAttributes []string
	// FlattenDivs removes deeply nested <div> chains that are purely for layout.
	FlattenDivs bool
	// PreserveSemanticTags keeps semantic HTML5 tags (article, nav, section, etc.).
	PreserveSemanticTags bool
	// MaxDepth limits nesting depth (0 = unlimited).
	MaxDepth int
}

DOMCompressor strips non-essential HTML elements and attributes to reduce token usage while preserving semantic structure.

func NewDOMCompressor

func NewDOMCompressor(opts ...DOMCompressorOption) *DOMCompressor

NewDOMCompressor creates a compressor with sensible defaults.

Default configuration:

  • Removes style and script tags
  • Removes HTML comments
  • Removes class, style, and data-* attributes
  • Keeps id, name, type, aria-label, href, src, alt attributes
  • Flattens deeply nested divs
  • Preserves semantic HTML5 tags
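
To make the first two defaults concrete, here is a rough sketch of stripping script elements, style elements, and comments with regular expressions. It is not the package's implementation, which is not shown on this page; stripNoise and the patterns are illustrative only, and a regex pass like this does not handle every HTML edge case.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// (?is) makes matching case-insensitive and lets "." span newlines.
	scriptRe  = regexp.MustCompile(`(?is)<script\b[^>]*>.*?</script\s*>`)
	styleRe   = regexp.MustCompile(`(?is)<style\b[^>]*>.*?</style\s*>`)
	commentRe = regexp.MustCompile(`(?s)<!--.*?-->`)
)

// stripNoise removes script elements, style elements, and HTML comments,
// including their content.
func stripNoise(dom string) string {
	dom = scriptRe.ReplaceAllString(dom, "")
	dom = styleRe.ReplaceAllString(dom, "")
	return commentRe.ReplaceAllString(dom, "")
}

func main() {
	raw := `<p id="a">hi</p><!-- tracking --><script>var x = 1;</script><style>p{color:red}</style>`
	fmt.Println(stripNoise(raw))
}
```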

func (*DOMCompressor) Compress

func (c *DOMCompressor) Compress(dom string) (string, error)

Compress reduces a raw DOM string by removing non-essential elements and attributes while preserving semantic structure.

func (*DOMCompressor) CompressWithStats

func (c *DOMCompressor) CompressWithStats(dom string) (string, CompressionStats, error)

CompressWithStats returns compression statistics along with the result.

func (*DOMCompressor) MustCompress

func (c *DOMCompressor) MustCompress(dom string) string

MustCompress compresses the DOM or panics on error.

type DOMCompressorOption

type DOMCompressorOption func(*DOMCompressor)

DOMCompressorOption configures the DOMCompressor.
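
DOMCompressorOption follows the functional options pattern. The sketch below reproduces that pattern with local stand-in types (compressor, option, withComments, withMaxDepth) so it runs without the package; the defaults chosen here are illustrative and only loosely mirror those listed under NewDOMCompressor.

```go
package main

import "fmt"

// compressor is a local stand-in with two of the documented fields.
type compressor struct {
	RemoveComments bool
	MaxDepth       int
}

// option mutates a compressor during construction.
type option func(*compressor)

func withComments(remove bool) option {
	return func(c *compressor) { c.RemoveComments = remove }
}

func withMaxDepth(depth int) option {
	return func(c *compressor) { c.MaxDepth = depth }
}

// newCompressor applies defaults first, then each option in order,
// so later options override earlier ones.
func newCompressor(opts ...option) *compressor {
	c := &compressor{RemoveComments: true, MaxDepth: 0}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := newCompressor(withComments(false), withMaxDepth(8))
	fmt.Printf("%+v\n", *c)
}
```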

func WithComments

func WithComments(remove bool) DOMCompressorOption

WithComments configures HTML comment removal.

func WithFlattenDivs

func WithFlattenDivs(flatten bool) DOMCompressorOption

WithFlattenDivs configures div flattening.

func WithKeepAttributes

func WithKeepAttributes(attrs ...string) DOMCompressorOption

WithKeepAttributes sets attributes to preserve.

func WithMaxDepth

func WithMaxDepth(depth int) DOMCompressorOption

WithMaxDepth sets maximum nesting depth (0 = unlimited).

func WithRemoveAttributes

func WithRemoveAttributes(attrs ...string) DOMCompressorOption

WithRemoveAttributes sets additional attributes to remove. It supports the data-* pattern, which matches all data attributes.
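
One plausible way to interpret a pattern such as data-* is as a prefix match on the attribute name. The sketch below assumes that reading, and also assumes that the keep list takes precedence over the remove list; neither detail is confirmed by this page, and matchesAttr and shouldRemove are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesAttr reports whether name matches pattern. A trailing "*" is
// treated as a prefix wildcard; anything else must match exactly.
// Comparison is case-insensitive, as HTML attribute names are.
func matchesAttr(pattern, name string) bool {
	pattern, name = strings.ToLower(pattern), strings.ToLower(name)
	if strings.HasSuffix(pattern, "*") {
		return strings.HasPrefix(name, strings.TrimSuffix(pattern, "*"))
	}
	return pattern == name
}

// shouldRemove checks the keep list first (assumed precedence),
// then the remove list.
func shouldRemove(name string, remove, keep []string) bool {
	for _, k := range keep {
		if matchesAttr(k, name) {
			return false
		}
	}
	for _, r := range remove {
		if matchesAttr(r, name) {
			return true
		}
	}
	return false
}

func main() {
	remove := []string{"class", "style", "data-*"}
	keep := []string{"id", "href"}
	for _, attr := range []string{"class", "data-testid", "id", "title"} {
		fmt.Printf("%-12s remove=%v\n", attr, shouldRemove(attr, remove, keep))
	}
}
```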

func WithScriptTags

func WithScriptTags(remove bool) DOMCompressorOption

WithScriptTags configures script tag removal.

func WithStyleTags

func WithStyleTags(remove bool) DOMCompressorOption

WithStyleTags configures style tag removal.

type GreedyStrategy

type GreedyStrategy struct{}

GreedyStrategy packs documents in their original order.

func (GreedyStrategy) Name

func (GreedyStrategy) Name() string

func (GreedyStrategy) Order

func (GreedyStrategy) Order(docs []schema.Document, _ []float64) []schema.Document

type ImportanceStrategy

type ImportanceStrategy struct{}

ImportanceStrategy packs documents ordered by score (highest first).

func (ImportanceStrategy) Name

func (ImportanceStrategy) Name() string

func (ImportanceStrategy) Order

func (ImportanceStrategy) Order(docs []schema.Document, scores []float64) []schema.Document
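
A highest-score-first ordering can be sketched as a stable sort over indices. The document type below is a local stand-in for schema.Document, and falling back to the original order when the scores do not line up with the documents is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// document is a local stand-in for schema.Document.
type document struct{ PageContent string }

// orderByScore returns a copy of docs sorted by descending score.
// If scores does not match docs in length, the original order is kept.
func orderByScore(docs []document, scores []float64) []document {
	out := make([]document, len(docs))
	if len(scores) != len(docs) {
		copy(out, docs)
		return out
	}
	idx := make([]int, len(docs))
	for i := range idx {
		idx[i] = i
	}
	// Stable sort keeps the input order for documents with equal scores.
	sort.SliceStable(idx, func(a, b int) bool {
		return scores[idx[a]] > scores[idx[b]]
	})
	for i, j := range idx {
		out[i] = docs[j]
	}
	return out
}

func main() {
	docs := []document{{"low"}, {"high"}, {"mid"}}
	for _, d := range orderByScore(docs, []float64{0.2, 0.9, 0.5}) {
		fmt.Println(d.PageContent)
	}
}
```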

type MetadataStrategy

type MetadataStrategy struct {
	Field     string
	Ascending bool
}

MetadataStrategy packs documents ordered by a metadata field.

func (MetadataStrategy) Name

func (s MetadataStrategy) Name() string

func (MetadataStrategy) Order

func (s MetadataStrategy) Order(docs []schema.Document, _ []float64) []schema.Document
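
Ordering by a metadata field might look like the sketch below. How the package compares values of different types is not documented on this page, so the comparison rule here (numbers numerically, everything else by string form) is an assumption, and document, less, and orderByField are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// document is a local stand-in for schema.Document.
type document struct {
	PageContent string
	Metadata    map[string]any
}

// toFloat converts common numeric types to float64.
func toFloat(v any) (float64, bool) {
	switch n := v.(type) {
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	case float64:
		return n, true
	}
	return 0, false
}

// less compares numbers numerically and falls back to string comparison.
func less(a, b any) bool {
	if x, ok := toFloat(a); ok {
		if y, ok := toFloat(b); ok {
			return x < y
		}
	}
	return fmt.Sprint(a) < fmt.Sprint(b)
}

// orderByField returns a sorted copy of docs, leaving the input untouched.
func orderByField(docs []document, field string, ascending bool) []document {
	out := append([]document(nil), docs...)
	sort.SliceStable(out, func(i, j int) bool {
		a, b := out[i].Metadata[field], out[j].Metadata[field]
		if ascending {
			return less(a, b)
		}
		return less(b, a)
	})
	return out
}

func main() {
	docs := []document{
		{"second", map[string]any{"year": 2021}},
		{"first", map[string]any{"year": 2019}},
		{"third", map[string]any{"year": 2023}},
	}
	for _, d := range orderByField(docs, "year", true) {
		fmt.Println(d.PageContent, d.Metadata["year"])
	}
}
```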

type Option

type Option func(*options)

Option configures a Packer.

func WithLogger

func WithLogger(logger *slog.Logger) Option

WithLogger sets a custom logger.

func WithStrategy

func WithStrategy(strategy PackingStrategy) Option

WithStrategy sets the packing strategy.

func WithTemplate

func WithTemplate(tmpl string) Option

WithTemplate sets the document template. Use {{.Content}} for document content and {{.Metadata}} for metadata, as in DefaultTemplate and CompactTemplate.

type PackedResult

type PackedResult struct {
	// Content is the final packed context string.
	Content string
	// UsedDocuments are the documents that fit in the token budget.
	UsedDocuments []UsedDocument
	// TokenStats contains token usage statistics.
	TokenStats TokenStats
	// Truncated indicates whether some documents were dropped.
	Truncated bool
}

PackedResult contains the result of packing documents.

type Packer

type Packer struct {
	// contains filtered or unexported fields
}

Packer packs documents into a context window within token limits.

func New

func New(tokenizer llms.Tokenizer, maxTokens int, opts ...Option) (*Packer, error)

New creates a new Packer with the given tokenizer and token limit.

func (*Packer) Pack

func (p *Packer) Pack(ctx context.Context, docs []schema.Document) (PackedResult, error)

Pack packs documents into a context string within token limits. Documents are packed atomically: each one is either included in full or dropped.
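
Atomic packing under a token budget can be sketched as follows. This is not the package's code: countTokens is a hypothetical whitespace tokenizer standing in for llms.Tokenizer, the packed type only loosely mirrors PackedResult, and continuing past a document that does not fit (rather than stopping there) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// packed loosely mirrors a few PackedResult fields.
type packed struct {
	Content   string
	Used      int
	Tokens    int
	Truncated bool
}

// countTokens is a hypothetical tokenizer that counts whitespace-separated words.
func countTokens(s string) int { return len(strings.Fields(s)) }

// pack adds whole documents in order while they fit in maxTokens.
// A document that would exceed the budget is dropped entirely, never cut.
func pack(docs []string, maxTokens int) packed {
	var parts []string
	var res packed
	for _, d := range docs {
		n := countTokens(d)
		if res.Tokens+n > maxTokens {
			res.Truncated = true
			continue // assumed: a later, smaller document may still fit
		}
		parts = append(parts, d)
		res.Tokens += n
	}
	res.Used = len(parts)
	res.Content = strings.Join(parts, "\n\n")
	return res
}

func main() {
	res := pack([]string{"a b c", "d e f g", "h"}, 4)
	fmt.Printf("used=%d tokens=%d truncated=%v\n%s\n",
		res.Used, res.Tokens, res.Truncated, res.Content)
}
```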

func (*Packer) PackScored

func (p *Packer) PackScored(ctx context.Context, docs []schema.ScoredDocument) (PackedResult, error)

PackScored packs ScoredDocuments using their scores for importance ordering.

func (*Packer) PackWithScores

func (p *Packer) PackWithScores(ctx context.Context, docs []schema.Document, scores []float64) (PackedResult, error)

PackWithScores packs documents with optional scores for importance ordering. If scores is nil or empty, the default strategy ordering is used.

type PackingStrategy

type PackingStrategy interface {
	// Name returns the strategy name for logging and debugging.
	Name() string
	// Order returns documents in the order they should be packed.
	Order(docs []schema.Document, scores []float64) []schema.Document
}

PackingStrategy defines how documents are ordered before packing.
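
A custom strategy only needs the two methods above. The sketch below implements a hypothetical shortest-first ordering against a local copy of the interface; document stands in for schema.Document, and the strategy itself is an illustration, not something the package provides.

```go
package main

import (
	"fmt"
	"sort"
)

// document is a local stand-in for schema.Document.
type document struct{ PageContent string }

// packingStrategy is a local copy of the documented interface shape.
type packingStrategy interface {
	Name() string
	Order(docs []document, scores []float64) []document
}

// shortestFirst is a hypothetical strategy that packs short documents
// first, which tends to fit more documents into a fixed budget.
type shortestFirst struct{}

func (shortestFirst) Name() string { return "shortest_first" }

func (shortestFirst) Order(docs []document, _ []float64) []document {
	out := append([]document(nil), docs...) // do not mutate the caller's slice
	sort.SliceStable(out, func(i, j int) bool {
		return len(out[i].PageContent) < len(out[j].PageContent)
	})
	return out
}

// Compile-time check that shortestFirst satisfies the interface.
var _ packingStrategy = shortestFirst{}

func main() {
	var s packingStrategy = shortestFirst{}
	docs := []document{{"a long document"}, {"tiny"}, {"medium one"}}
	fmt.Println(s.Name())
	for _, d := range s.Order(docs, nil) {
		fmt.Println(d.PageContent)
	}
}
```

With the real package, such a type would be passed to WithStrategy, provided its Order method uses schema.Document.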

type TokenStats

type TokenStats struct {
	// TotalTokens is the total tokens used in the packed content.
	TotalTokens int
	// MaxTokens is the maximum allowed tokens.
	MaxTokens int
	// DocumentsConsidered is the number of documents evaluated.
	DocumentsConsidered int
	// DocumentsPacked is the number of documents that fit.
	DocumentsPacked int
}

TokenStats holds token usage statistics.

type UsedDocument

type UsedDocument struct {
	// Content is the formatted document content.
	Content string
	// TokenCount is the token count for this document.
	TokenCount int
	// Source is the original document.
	Source schema.Document
}

UsedDocument tracks a document that was packed.
