normalize

package
v1.0.0 Latest
Published: Nov 26, 2025 License: MIT Imports: 4 Imported by: 0

Documentation

Overview

internal/normalize/doc.go

Package normalize implements Aether's internal normalization pipeline.

Normalization is the process of taking heterogeneous, source-specific structures (search results, extracted articles, feeds, entities, API responses) and converting them into a single, LLM-ready document model (internal/model.Document).

High-level goals:

  • Present a stable, source-agnostic schema to the public aether package.
  • Encode rich structure (sections, metadata, provenance) in a form that is easy for large language models to consume.
  • Support both JSON and TOON serialization from the same canonical model.
  • Allow multiple input types (SearchResult, Article, Feed, Entity) to be merged into one normalized document when appropriate.
  • Provide clear extension points for future plugins and transforms.

Directory layout:

normalize.go
    Orchestrates the main normalization pipeline. Entry points here
    accept high-level types such as aether.SearchResult and delegate
    to schema-specific normalizers.

merge.go
    Contains logic for merging multiple normalized views (e.g. a primary
    search document plus article details plus feed items) into a single
    internal/model.Document instance.

schema_search.go
    Normalizes aether.SearchResult and its SearchDocument into the
    canonical document model. This is the main path used when callers
    ask Aether to normalize search results.

schema_article.go
    Normalizes Article structures from the readability/extractText
    subsystem into one or more body sections within a Document.

schema_feed.go
    Normalizes Feed and feed items (RSS/Atom) into sections with
    a "feed_item" role, preserving links, timestamps, and titles.

schema_entity.go
    Reserved for entity- and API-centric normalization (e.g. OpenAPI
    responses, Wikidata, or future structured entities). This allows
    Aether to represent non-article data in a consistent format.

util.go
    Shared helper functions such as whitespace normalization, safe
    metadata mapping, excerpt generation, and role constants.
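
The helpers described for util.go are not shown on this page. As a rough illustration only, whitespace normalization and excerpt generation could look like the sketch below. The names collapseWhitespace and excerpt, the rune-based limit, and the "..." suffix are assumptions made for this example, not the package's actual helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseWhitespace is a hypothetical helper: it trims the input and
// replaces every run of whitespace (spaces, tabs, newlines) with one space.
func collapseWhitespace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// excerpt is a hypothetical helper: it returns at most maxRunes runes of the
// whitespace-normalized text, backing up to the previous space when one
// exists, and appends "..." when the text was shortened.
// A non-positive maxRunes means "no limit".
func excerpt(s string, maxRunes int) string {
	s = collapseWhitespace(s)
	r := []rune(s)
	if maxRunes <= 0 || len(r) <= maxRunes {
		return s
	}
	cut := string(r[:maxRunes])
	if i := strings.LastIndex(cut, " "); i > 0 {
		cut = cut[:i]
	}
	return cut + "..."
}

func main() {
	fmt.Println(collapseWhitespace("  Hello,\n\tworld  "))
	fmt.Println(excerpt("The quick brown fox jumps over the lazy dog", 15))
}
```

Working on runes rather than bytes keeps multi-byte UTF-8 text from being cut in the middle of a character.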

This package is intentionally internal. The public aether package exposes normalization via methods such as:

(*aether.Client).NormalizeSearchResult
(*aether.Client).MarshalSearchResultJSON
(*aether.Client).MarshalSearchResultTOON

Those methods delegate into this package, which allows Aether's internal normalization logic to evolve without breaking the public API.

Constants

This section is empty.

Variables

This section is empty.

Functions

func Pipeline

func Pipeline(sr *SearchResult) *model.Document

Pipeline orchestrates top-level normalization.

sr may carry a primary SearchDocument, an extracted Article, Feed data, and structured Entities. Each schema_* normalizer can emit a partial model.Document; mergeDocuments() then combines the partial documents into one canonical Document.
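
The flow described above (per-schema normalizers producing partial documents, then a merge step) can be sketched in a self-contained way. Everything in this example is an assumption for illustration: Document and Section stand in for internal/model types whose fields are not documented here, the input structs are trimmed copies of the ones listed under Types, and the merge policy (first non-empty title wins, sections are appended in order, earlier metadata keys take precedence) is a guess rather than the package's actual behavior.

```go
package main

import "fmt"

// Section and Document are hypothetical stand-ins for internal/model types.
type Section struct {
	Role string
	Text string
}

type Document struct {
	Title    string
	Sections []Section
	Metadata map[string]string
}

// Trimmed copies of the input types documented in this package.
type SearchDocument struct{ URL, Title, Content string }
type Article struct{ Title, Content string }

type SearchResult struct {
	PrimaryDocument *SearchDocument
	Article         *Article
}

// normalizeSearch emits a partial Document for the primary search document.
func normalizeSearch(d *SearchDocument) *Document {
	if d == nil {
		return nil
	}
	return &Document{
		Title:    d.Title,
		Sections: []Section{{Role: "summary", Text: d.Content}},
		Metadata: map[string]string{"url": d.URL},
	}
}

// normalizeArticle emits a partial Document holding the article body.
func normalizeArticle(a *Article) *Document {
	if a == nil {
		return nil
	}
	return &Document{
		Title:    a.Title,
		Sections: []Section{{Role: "body", Text: a.Content}},
	}
}

// mergeDocuments combines partial documents; nil parts are skipped.
func mergeDocuments(parts ...*Document) *Document {
	out := &Document{Metadata: map[string]string{}}
	for _, p := range parts {
		if p == nil {
			continue
		}
		if out.Title == "" {
			out.Title = p.Title
		}
		out.Sections = append(out.Sections, p.Sections...)
		for k, v := range p.Metadata {
			if _, exists := out.Metadata[k]; !exists {
				out.Metadata[k] = v
			}
		}
	}
	return out
}

// pipeline mirrors the shape of Pipeline: normalize each input, then merge.
func pipeline(sr *SearchResult) *Document {
	if sr == nil {
		return mergeDocuments()
	}
	return mergeDocuments(
		normalizeSearch(sr.PrimaryDocument),
		normalizeArticle(sr.Article),
	)
}

func main() {
	doc := pipeline(&SearchResult{
		PrimaryDocument: &SearchDocument{URL: "https://example.com/a", Title: "Example", Content: "Short summary."},
		Article:         &Article{Title: "Example (full)", Content: "Full article text."},
	})
	fmt.Println(doc.Title, len(doc.Sections), doc.Metadata["url"])
}
```

Letting each normalizer return nil for a missing input keeps the merge step as the only place that has to reason about which sources were present.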

Types

type Article

type Article struct {
	Title   string
	Content string
	Meta    map[string]string
}

Article is the article extracted by the readability-based extractor.

type Entity

type Entity struct {
	ID       string
	Label    string
	Summary  string
	URL      string
	Metadata map[string]string
}

Entity represents a structured API response (Wikidata, etc.).

type Feed

type Feed struct {
	Items []FeedItem
}

Feed is the normalized RSS/Atom representation.

type FeedItem

type FeedItem struct {
	Title       string
	Link        string
	Author      string
	GUID        string
	Description string
	Content     string
	Published   int64
	Updated     int64
}
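
The directory layout notes that feed items become sections with a "feed_item" role that preserve links, timestamps, and titles. A minimal sketch of that mapping follows. The Section type, the feedItemSection function, the metadata keys, and the reading of Published and Updated as Unix seconds are all assumptions for this example; the page does not state the timestamp unit or the section layout.

```go
package main

import (
	"fmt"
	"time"
)

// FeedItem copies the struct documented above.
type FeedItem struct {
	Title       string
	Link        string
	Author      string
	GUID        string
	Description string
	Content     string
	Published   int64 // assumed here to be Unix seconds
	Updated     int64 // assumed here to be Unix seconds
}

// Section is a hypothetical stand-in for a section of the canonical model.
type Section struct {
	Role  string
	Title string
	Text  string
	Meta  map[string]string
}

// roleFeedItem is the role named in the directory layout.
const roleFeedItem = "feed_item"

// feedItemSection is a hypothetical converter. It prefers Content over
// Description for the text and records the link and timestamps as metadata.
func feedItemSection(it FeedItem) Section {
	text := it.Content
	if text == "" {
		text = it.Description
	}
	meta := map[string]string{}
	if it.Link != "" {
		meta["link"] = it.Link
	}
	if it.Published > 0 {
		meta["published"] = time.Unix(it.Published, 0).UTC().Format(time.RFC3339)
	}
	if it.Updated > 0 {
		meta["updated"] = time.Unix(it.Updated, 0).UTC().Format(time.RFC3339)
	}
	return Section{Role: roleFeedItem, Title: it.Title, Text: text, Meta: meta}
}

func main() {
	s := feedItemSection(FeedItem{
		Title:       "Release notes",
		Link:        "https://example.com/feed/1",
		Description: "Summary only.",
		Published:   86400,
	})
	fmt.Println(s.Role, s.Title, s.Meta["published"])
}
```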

type NormalizePlan

type NormalizePlan struct {
	Intent string
}

NormalizePlan records meta-intent for the search pipeline (e.g., "lookup", "news", "article"), which is incorporated into metadata.
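
How the intent ends up in metadata is not shown on this page. One plausible shape, offered only as a sketch, is a small function that copies Intent into the document's metadata map. The applyPlan name and the "intent" key are assumptions for this example.

```go
package main

import "fmt"

// NormalizePlan copies the struct documented above.
type NormalizePlan struct {
	Intent string
}

// applyPlan is a hypothetical helper: it records a non-empty Intent under
// the assumed "intent" key and allocates the map when it is nil.
func applyPlan(meta map[string]string, plan NormalizePlan) map[string]string {
	if meta == nil {
		meta = map[string]string{}
	}
	if plan.Intent != "" {
		meta["intent"] = plan.Intent
	}
	return meta
}

func main() {
	meta := applyPlan(nil, NormalizePlan{Intent: "news"})
	fmt.Println(meta["intent"])
}
```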

type SearchDocument

type SearchDocument struct {
	URL      string
	Title    string
	Excerpt  string
	Content  string
	Metadata map[string]string
	Kind     string // e.g., "article", "feed", "html", …
}

SearchDocument represents the extracted or fetched source document from Aether.Search. It mirrors the structure used in aether/search.go.

type SearchResult

type SearchResult struct {
	PrimaryDocument *SearchDocument
	Article         *Article
	Feed            *Feed
	Entities        []*Entity
	Plan            NormalizePlan
}

SearchResult is the minimal subset of aether.SearchResult that the normalization pipeline needs. We define a tiny internal mirror here so the normalize package does not depend on the full aether API.
