Documentation
Overview
Package pruner strips boilerplate from web content using the Tier 0 (0.8B) model.
Web pages contain 30-50% non-technical noise: navigation menus, cookie banners, footers, sidebars, and ads. Sending this noise to the distillation pipeline wastes LLM compute and dilutes the resulting summary. The Pruner extracts only the core technical paragraphs before handing content to the Ingestor.
This is a Tier 0 (Reflex) task: simple extraction, no reasoning, no JSON output. The 0.8B model is fast enough (<3s on CPU) and accurate enough for this job.
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types