package pruner

v0.8.0 Latest
Warning

This package is not in the latest version of its module.

Published: Mar 29, 2026 | License: MIT | Imports: 6 | Imported by: 0

Documentation

Overview

Package pruner strips boilerplate from web content using the Tier 0 (0.8B-parameter) model.

Web pages contain 30-50% non-technical noise: navigation menus, cookie banners, footers, sidebars, and ads. Sending this noise to the distillation pipeline wastes LLM compute and dilutes the resulting summary. The Pruner extracts only the core technical paragraphs before handing content to the Ingestor.
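To check a given page against the 30-50% noise estimate, one rough measure is how many characters pruning removed. The sketch below is illustrative only: `reductionRatio` is a hypothetical helper, not part of this package, and it counts removed runes rather than judging what was actually noise.

```go
package main

import "unicode/utf8"

// reductionRatio is a hypothetical helper (not part of package pruner).
// It reports the fraction of runes removed by pruning:
// 0.0 means nothing was stripped, 0.4 means 40% of the text was dropped.
func reductionRatio(raw, pruned string) float64 {
	n := utf8.RuneCountInString(raw)
	if n == 0 {
		return 0 // empty input: nothing could have been removed
	}
	kept := utf8.RuneCountInString(pruned)
	if kept >= n {
		return 0 // an LLM may return text as long as (or longer than) the input; clamp to 0
	}
	return 1 - float64(kept)/float64(n)
}
```

Counting runes instead of bytes keeps the ratio meaningful for pages with non-ASCII text.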

This is a Tier 0 (Reflex) task: simple extraction, no reasoning, no JSON output. The 0.8B model is fast enough (<3s on CPU) and accurate enough for this job.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Pruner

type Pruner struct {
	// contains filtered or unexported fields
}

Pruner strips boilerplate from web page text using a small LLM.

func New

func New(client llm.LLMClient, timeout time.Duration) *Pruner

New creates a Pruner backed by the given LLM client. The timeout argument is the per-request deadline; if it is <= 0, New uses a default of 10s.

func (*Pruner) Prune

func (p *Pruner) Prune(ctx context.Context, content string) (string, error)

Prune extracts core technical content from raw web page text. It returns the pruned content, or the original content if the LLM call fails. The returned string is always non-empty if the input was non-empty.
