Documentation
Overview
Package synthesizer is Angela's doc-enrichment framework.
A Synthesizer proposes a new block of content (an HTTP+JSON example, a SQL query, an env-var template...) assembled from information already present in a documentation file. It never invents content: every field in every output traces back to a literal character span in a source doc via an Evidence.
Five invariants govern the framework:
- I4 zero-hallucination: every output field has >=1 Evidence with Rule == "literal" pointing to an exact span in the source.
- I5 security-first: fields declared as server-injected in the doc's Security section are excluded by construction.
- I5-bis fail-safe on missing Security: degraded mode filters a configurable well-known list and emits a mandatory review finding.
- I6 idempotency: two runs on an unchanged doc produce byte-identical output; the frontmatter signature is a function of the source spans.
- I7 no silent merge: changes to previously accepted output flow through the interactive diff review. The framework never overwrites user edits.
Concrete implementations live under impls/<name>/. The first one, api-postman, ships in impls/apipostman.
Index
- Constants
- Variables
- func CanonicalJSON(v any) (string, error)
- func ComputeHash(evs []Evidence) string
- func IsFresh(sig Signature, currentEvs []Evidence, version string) bool
- func IsMetaField(field string) bool
- func MetaFromDoc(doc *Doc, signatures map[string]Signature) domain.DocMeta
- func RequiredFieldKey(fieldName string) string
- func RunInvariantContracts(t *testing.T, s Synthesizer, fixtures []ContractFixture)
- func SignaturesFromMeta(meta domain.DocMeta) map[string]Signature
- func SignaturesToMeta(meta *domain.DocMeta, signatures map[string]Signature)
- func SortedStrings(in []string) []string
- type Block
- type Candidate
- type Config
- type ContractFixture
- type Doc
- type EnabledConfig
- type Evidence
- type FixtureReport
- type OrderedMap
- type Registry
- type Section
- type Signature
- type Synthesizer
- type Warning
Constants
const (
	MetaFieldPrefix         = "_meta."
	MetaFieldEndpoint       = MetaFieldPrefix + "endpoint"
	MetaFieldRequiredSuffix = ".required"
)
MetaFieldPrefix is reserved for Evidence.Field values that do NOT map to a JSON output key (anchors, requiredness proofs, etc.). Using a single reserved prefix makes it trivial to filter pseudo-evidences out of a downstream view that only cares about the emitted output keys (code review finding #9 - previously these used ad-hoc conventions like "__endpoint__" and "<name>#required").
const WarningCodeMissingSecuritySection = "missing-security-section"
WarningCodeMissingSecuritySection is emitted by synthesizers running in degraded mode (I5-bis) - the source doc declares endpoints but lacks a Security section, so the framework relies on the well-known list to project server-injected fields out of the output.
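As a rough sketch of the I5-bis degraded mode, assuming a synthesizer holds its candidate output field names as a plain slice, filtering against the well-known list and attaching the mandatory doc-scoped warning could look like this. `filterDegraded`, the local `Warning` copy and the field names `userId`/`tenantId` are hypothetical, not package API.

```go
package main

import "fmt"

// WarningCodeMissingSecuritySection mirrors the package constant.
const WarningCodeMissingSecuritySection = "missing-security-section"

// Warning is a trimmed-down local copy of synthesizer.Warning.
type Warning struct {
	Code string
	Line int
}

// filterDegraded is a hypothetical helper: it drops every field listed in
// wellKnown (exact, case-sensitive match) and always emits the doc-scoped
// (Line 0) warning that I5-bis makes mandatory.
func filterDegraded(fields, wellKnown []string) ([]string, []Warning) {
	drop := make(map[string]bool, len(wellKnown))
	for _, f := range wellKnown {
		drop[f] = true
	}
	kept := make([]string, 0, len(fields))
	for _, f := range fields {
		if !drop[f] {
			kept = append(kept, f)
		}
	}
	return kept, []Warning{{Code: WarningCodeMissingSecuritySection, Line: 0}}
}

func main() {
	kept, warns := filterDegraded(
		[]string{"month", "userId", "creditAmountMin", "tenantId"},
		[]string{"userId", "tenantId"},
	)
	fmt.Println(kept, warns[0].Code) // [month creditAmountMin] missing-security-section
}
```

The warning is emitted even when nothing was filtered, because the review finding signals the missing Security section itself, not a removal.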
Variables
var DefaultRegistry = NewRegistry()
DefaultRegistry is the process-wide registry populated via init() from the impls/<name>/ subpackages. Pipelines read from it.
Functions
func CanonicalJSON
func CanonicalJSON(v any) (string, error)
CanonicalJSON renders v as JSON in a byte-stable, deterministic form. Key order follows the declared order of any OrderedMap in v (rather than alphabetical sorting), so synthesizers can present required fields before optional ones while still getting identical output across runs for identical inputs.
Strict format guarantees:
- 2-space indent (no tabs).
- LF line endings (no CRLF).
- No trailing whitespace.
- Trailing newline at end of output.
- Sub-maps keyed by string are recursed in their own declared order when wrapped via OrderedMap; otherwise standard json.Marshal applies.
CanonicalJSON is the cornerstone of invariant I6 (idempotency): two runs on identical evidence MUST produce byte-identical output, and that output must remain stable across Go versions and platform line endings.
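The format guarantees above can be approximated with encoding/json alone. This is a sketch, not the package's implementation: `canonicalJSON` is hypothetical, and because it feeds a plain Go map, keys come out alphabetically; declared order would need an OrderedMap value instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canonicalJSON is a hypothetical approximation of the documented format.
func canonicalJSON(v any) (string, error) {
	// MarshalIndent with a two-space indent emits LF line breaks and no
	// trailing whitespace; plain Go maps come out with sorted keys.
	b, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil // trailing newline at end of output
}

func main() {
	in := map[string]any{"month": "2024-01", "creditAmountMin": 100}
	a, _ := canonicalJSON(in)
	b, _ := canonicalJSON(in)
	fmt.Print(a)
	fmt.Println(a == b) // two runs, identical bytes
}
```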
func ComputeHash
func ComputeHash(evs []Evidence) string
ComputeHash derives a stable, deterministic sha256 over the canonical form of evs. The hash is EVIDENCE-BASED, not output-based: it is a function of the (Field, File, Line, ColStart, ColEnd, Snippet, Rule) tuples, sorted in a canonical order.
Why evidence-based:
- I6 (idempotency): the synthesizer's output may change for cosmetic reasons (version strings, template tweaks) that should NOT by themselves trigger a regeneration; backward-incompatible format changes are handled separately through Signature.Version. What matters for "do we need to re-emit" is whether the source spans changed.
- I7 (no silent merge): the framework compares evidence hashes to decide whether to propose a diff. Output-based hashing would erroneously flag user-edited values as drift.
Field is included so that adding a new field (with new evidence) shifts the hash even when the other evidences stayed put. Snippet is included so that if the source text at a span changes (rename, doc edit), the hash flips. Rule is included as well.
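A minimal sketch of an evidence-based hash, assuming one canonical text line per evidence and a plain lexical sort. The exact serialization and sort order ComputeHash uses are not documented here, so `computeHash` is hypothetical; it only shows the two properties that matter: input order is irrelevant, and a changed snippet flips the hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Evidence mirrors the exported fields of synthesizer.Evidence.
type Evidence struct {
	Field    string
	File     string
	Line     int
	ColStart int
	ColEnd   int
	Snippet  string
	Rule     string
}

// computeHash is a hypothetical evidence-based hash: one canonical line per
// evidence, lines sorted so input order is irrelevant, then sha256.
func computeHash(evs []Evidence) string {
	lines := make([]string, 0, len(evs))
	for _, e := range evs {
		// %q quotes and escapes strings, so a "|" or "\n" inside a snippet
		// cannot be confused with the separators.
		lines = append(lines, fmt.Sprintf("%q|%q|%d|%d|%d|%q|%q",
			e.Field, e.File, e.Line, e.ColStart, e.ColEnd, e.Snippet, e.Rule))
	}
	sort.Strings(lines)
	sum := sha256.Sum256([]byte(strings.Join(lines, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := Evidence{Field: "month", File: "doc.md", Line: 12, ColStart: 2, ColEnd: 7, Snippet: "month", Rule: "literal"}
	b := Evidence{Field: "creditAmountMin", File: "doc.md", Line: 13, ColStart: 2, ColEnd: 17, Snippet: "creditAmountMin", Rule: "literal"}
	fmt.Println(computeHash([]Evidence{a, b}) == computeHash([]Evidence{b, a})) // true: order-independent

	edited := a
	edited.Snippet, edited.ColEnd = "months", 8 // simulated doc edit
	fmt.Println(computeHash([]Evidence{a, b}) == computeHash([]Evidence{edited, b})) // false
}
```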
func IsFresh
func IsFresh(sig Signature, currentEvs []Evidence, version string) bool
IsFresh reports whether sig still matches the evidence list currentEvs. Used by the polish hook to decide skip vs regenerate.
Returns false when:
- sig.Hash is empty (never synthesized)
- currentEvs hash differs from sig.Hash
- sig.Version differs from version (synthesizer output format bumped)
Returns true only when both the source spans AND the format version are unchanged - guarantees identical regeneration on cache hit (I6).
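The three rules translate almost directly into code. In this sketch `isFresh` is hypothetical and takes a precomputed hash of the current evidence instead of the []Evidence list the real IsFresh receives. The empty-hash check comes first on purpose: without it, a never-synthesized doc with no evidence would compare "" == "" and look fresh.

```go
package main

import "fmt"

// Signature keeps only the fields IsFresh consults.
type Signature struct {
	Hash    string
	Version string
}

// isFresh is a hypothetical sketch; currentHash stands in for the
// ComputeHash result over the current evidence list.
func isFresh(sig Signature, currentHash, version string) bool {
	if sig.Hash == "" { // never synthesized
		return false
	}
	if sig.Hash != currentHash { // source spans changed
		return false
	}
	return sig.Version == version // output format bumped => regenerate
}

func main() {
	sig := Signature{Hash: "abc", Version: "1"}
	fmt.Println(isFresh(sig, "abc", "1"), isFresh(sig, "abc", "2"), isFresh(Signature{}, "", "1"))
	// true false false
}
```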
func IsMetaField
func IsMetaField(field string) bool
IsMetaField reports whether an Evidence.Field value targets framework metadata (anchors, requiredness) rather than a real JSON output key.
func MetaFromDoc
func MetaFromDoc(doc *Doc, signatures map[string]Signature) domain.DocMeta
MetaFromDoc returns a domain.DocMeta populated with the given doc's metadata plus an updated Synthesized map. Used by the polish hook to produce the new frontmatter before the storage layer writes it.
func RequiredFieldKey
func RequiredFieldKey(fieldName string) string
RequiredFieldKey returns the reserved evidence Field value that documents the requiredness claim for an output field.
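How the reserved prefix keeps pseudo-evidences out of an output-key view can be sketched as follows. The key shape `"_meta." + name + ".required"` is an assumption built from the exported constants, not a documented format; `requiredFieldKey`, `isMetaField` and `outputFields` are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	MetaFieldPrefix         = "_meta."
	MetaFieldEndpoint       = MetaFieldPrefix + "endpoint"
	MetaFieldRequiredSuffix = ".required"
)

// requiredFieldKey assumes the key is prefix + field name + suffix.
func requiredFieldKey(fieldName string) string {
	return MetaFieldPrefix + fieldName + MetaFieldRequiredSuffix
}

// isMetaField assumes a plain prefix test on the reserved prefix.
func isMetaField(field string) bool {
	return strings.HasPrefix(field, MetaFieldPrefix)
}

// outputFields drops pseudo-evidence fields, keeping only real output keys.
func outputFields(fields []string) []string {
	var out []string
	for _, f := range fields {
		if !isMetaField(f) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	fields := []string{MetaFieldEndpoint, "month", requiredFieldKey("month")}
	fmt.Println(requiredFieldKey("month"), outputFields(fields)) // _meta.month.required [month]
}
```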
func RunInvariantContracts
func RunInvariantContracts(t *testing.T, s Synthesizer, fixtures []ContractFixture)
RunInvariantContracts exercises the I4-I7 invariants against a concrete synthesizer using the provided fixtures. Concrete test packages call this at the top of their test suite to inherit the framework's safety net.
Calling pattern (from impls/<name>/<name>_test.go):
func TestAPIPostman_Contracts(t *testing.T) {
	fixtures := []synthesizer.ContractFixture{ /* ... */ }
	synthesizer.RunInvariantContracts(t, &APIPostman{}, fixtures)
}
The function uses subtests so a single failed invariant on a single fixture surfaces as one targeted failure (e.g. "TestAPIPostman_Contracts/no-security/I5bis_warning_emitted").
func SignaturesFromMeta
func SignaturesFromMeta(meta domain.DocMeta) map[string]Signature
SignaturesFromMeta converts the generic frontmatter map (as stored on domain.DocMeta) into a typed map keyed by synthesizer name. Returns an empty map when the frontmatter has no synthesized key, never nil.
Unknown extra fields inside a synthesizer's signature subtree are silently ignored. Newer binaries can therefore write fields that older binaries drop on round-trip, without hard errors.
func SignaturesToMeta
func SignaturesToMeta(meta *domain.DocMeta, signatures map[string]Signature)
SignaturesToMeta writes a typed signature map back into the generic frontmatter shape on domain.DocMeta. Empty input deletes the "synthesized" key (omitempty in the YAML tag handles serialization).
The function MUTATES meta. Callers that need an unmodified copy should clone meta beforehand.
func SortedStrings
func SortedStrings(in []string) []string
SortedStrings returns a sorted copy of in. Convenience for callers that need stable list rendering for hashes or signature fields.
Types
type Block
type Block struct {
	// Title is the subsection heading the framework will generate for the
	// block (e.g. "Example d'appel (Postman) - minimal"). The framework
	// prepends the heading level matching the parent section.
	Title string
	// Language is the fenced-code info string used in the generated markdown
	// fence (e.g. "http+json", "sql", "bash", "mermaid"). Empty for prose
	// blocks (not used in MVP v1).
	Language string
	// Content is the block's body WITHOUT the fence. The framework wraps it
	// with the fence markers when rendering. Must be canonical and
	// deterministic: byte-identical for equal inputs across runs (I6).
	Content string
	// Notes is an ordered list of caveat lines appended under the fenced
	// block as a bullet list ("- note"). Typical content: server-injected
	// fields, degraded mode disclosure, variable convention reminders.
	Notes []string
	// InsertAfterHeading identifies where in the doc the block belongs.
	// Value matches a Section.Heading verbatim (same text, same leading
	// hashes). An empty value means "end of doc" - never used in MVP v1.
	InsertAfterHeading string
}
Block is the output produced by a Synthesizer for one Candidate. The framework inserts it into the doc AFTER the heading matched by InsertAfterHeading when polish accepts the proposal.
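To make the field comments concrete, here is a hypothetical renderer showing how Title, Language, Content and Notes could combine into markdown. The heading depth is passed in by the caller because the exact rule the framework derives from the parent section is not spelled out here; `renderBlock` is an illustration, not the framework's renderer.

```go
package main

import (
	"fmt"
	"strings"
)

// Block copies the exported fields of synthesizer.Block.
type Block struct {
	Title              string
	Language           string
	Content            string
	Notes              []string
	InsertAfterHeading string
}

// renderBlock is a hypothetical renderer: heading, fenced content, then the
// notes as a bullet list.
func renderBlock(b Block, level int) string {
	fence := strings.Repeat("`", 3)
	var sb strings.Builder
	sb.WriteString(strings.Repeat("#", level) + " " + b.Title + "\n\n")
	sb.WriteString(fence + b.Language + "\n")
	sb.WriteString(strings.TrimSuffix(b.Content, "\n") + "\n") // Content has no fence
	sb.WriteString(fence + "\n")
	if len(b.Notes) > 0 {
		sb.WriteString("\n")
		for _, n := range b.Notes {
			sb.WriteString("- " + n + "\n")
		}
	}
	return sb.String()
}

func main() {
	b := Block{
		Title:    "Example (Postman) - minimal",
		Language: "http+json",
		Content:  "POST /api/account-statement/search",
		Notes:    []string{"userId is injected server-side and omitted."},
	}
	fmt.Print(renderBlock(b, 3))
}
```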
type Candidate
type Candidate struct {
	// Key is a stable, unique-per-doc identifier for this candidate (e.g.
	// "POST /api/account-statement/search"). Used for diff-per-section
	// routing and idempotency signatures.
	Key string
	// Anchor is the primary evidence span - typically the endpoint declaration
	// line for api-postman, the schema table row for sql-query, etc.
	Anchor Evidence
	// Extra carries synthesizer-specific data from Detect to Synthesize. The
	// framework treats it as opaque; concrete synthesizers own its schema.
	Extra map[string]any
}
Candidate is a synthesis opportunity detected in a doc. The shape is deliberately opaque: Extra lets synthesizers carry implementation-specific context from Detect to Synthesize without a second parse pass.
type Config
type Config struct {
	// WellKnownServerFields is the list used by synthesizers to filter
	// likely server-injected fields when the source doc has no Security
	// section (I5-bis). Editable in AngelaConfig.Synthesizers.
	WellKnownServerFields []string
	// PerSynthesizer is a bag of synthesizer-specific options keyed by
	// synthesizer Name. Implementations look up their own key; unknown
	// keys are ignored by design (allows shipping options before the
	// synthesizer that reads them).
	PerSynthesizer map[string]map[string]any
}
Config carries per-run settings passed to Synthesize. It is derived from AngelaConfig.Synthesizers at command entry and holds the minimal subset concrete synthesizers need (avoids leaking the whole AngelaConfig surface to each implementation).
type ContractFixture
type ContractFixture struct {
	// Name identifies the case in test output (e.g.
	// "complete-with-security", "no-security-section").
	Name string
	// Path is the synthetic file path attached to the parsed Doc.
	Path string
	// Content is the raw markdown (frontmatter + body) the framework will
	// parse with ParseDoc.
	Content string
	// ServerInjected lists the field names the doc explicitly declares as
	// server-injected (typically extracted from the Security section). The
	// I5 contract asserts NONE of these appear in the synthesizer's
	// Evidence.Field list. Empty when the fixture has no Security section.
	ServerInjected []string
	// WellKnown is the list passed to Config.WellKnownServerFields when
	// running this fixture. The I5-bis contract uses this list to assert
	// the degraded mode filter actually fires.
	WellKnown []string
	// ExpectMissingSecurity declares whether this fixture lacks a Security
	// section. When true, the I5-bis contract is exercised: every
	// Synthesize call MUST emit a Warning with Code ==
	// WarningCodeMissingSecuritySection.
	ExpectMissingSecurity bool
	// MinCandidates is the minimum number of Candidates Detect must return
	// for this fixture. Zero disables the check.
	MinCandidates int
}
ContractFixture describes a single test case for the invariant contract suite. Concrete synthesizer test packages provide a slice of fixtures and invoke RunInvariantContracts to exercise the framework's I4-I7 contracts against their implementation.
type Doc
type Doc struct {
	// Path is the source file path, relative to the corpus root when known,
	// absolute otherwise. Used for evidence File references.
	Path string
	// Meta is the parsed YAML frontmatter. Synthesized (from the frontmatter
	// "synthesized" key) is surfaced separately via Signatures because it is
	// not part of domain.DocMeta yet.
	Meta domain.DocMeta
	// Body is the markdown content AFTER the frontmatter terminator.
	Body string
	// Lines is Body split on "\n", 1-indexed at Lines[1]. Lines[0] is empty
	// to make line numbers cited in Evidence directly indexable.
	Lines []string
	// Sections is the flat list of top-level and sub-level headings found in
	// Body, in source order. Synthesizers use FuzzyFindSection (task 5) on
	// this slice rather than re-parsing.
	Sections []Section
	// Signatures holds previously-recorded synthesizer signatures, keyed by
	// synthesizer Name. Empty when the doc has never been synthesized. The
	// registry compares current evidence hashes against these to decide
	// skip vs regenerate (I6).
	Signatures map[string]Signature
}
Doc is the framework's pre-parsed representation of a markdown file. It is built by the registry hook (review/polish/draft pipelines) from the raw bytes and handed to every synthesizer that Applies to it.
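The 1-indexed Lines convention amounts to prepending one empty element before the split, so that `Lines[n]` is line n of Body. `oneIndexedLines` is a hypothetical helper for illustration; note that with this helper a Body ending in "\n" also yields a final empty element, a detail the real ParseDoc may handle differently.

```go
package main

import (
	"fmt"
	"strings"
)

// oneIndexedLines is a hypothetical helper mirroring Doc.Lines: Lines[0] is
// an empty placeholder, so Lines[n] is line n of body.
func oneIndexedLines(body string) []string {
	return append([]string{""}, strings.Split(body, "\n")...)
}

func main() {
	body := "# Title\n\n## Endpoints\nPOST /api/account-statement/search"
	lines := oneIndexedLines(body)
	fmt.Println(len(lines), lines[3]) // 5 ## Endpoints
}
```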
func ParseDoc
ParseDoc builds a Doc from raw markdown bytes. It separates frontmatter from body using storage.UnmarshalPermissive (synthesizers must work on docs that lack a fully-validated lore frontmatter, e.g. mkdocs/hugo corpora in standalone mode), splits the body into lines, and parses out the section tree.
path is recorded as Doc.Path verbatim (relative or absolute, caller's choice). data is the raw file content.
type EnabledConfig
type EnabledConfig struct {
	// Enabled is the ordered list of synthesizer names to activate. Empty
	// means "no synthesizers this run" (the default before 8-18 merges;
	// after 8-18, defaults.go sets this to ["api-postman"]).
	Enabled []string
	// Disabled forces all synthesizers off, overriding Enabled. Mapped from
	// the --no-synthesizers CLI flag.
	Disabled bool
}
EnabledConfig is the minimal selection surface the Registry needs. It is populated from AngelaConfig.Synthesizers and (optionally) CLI overrides in the command layer, so the Registry itself does not depend on the config package.
type Evidence
type Evidence struct {
	// Field is the output key this evidence supports (e.g. "month",
	// "creditAmountMin"). Empty for anchor-only evidences attached to
	// Candidates and Warnings.
	Field string
	// File is the source file path, matching Doc.Path when the evidence
	// comes from the doc being synthesized. Different for cross-file
	// evidence (not used in MVP v1).
	File string
	// Line is the 1-based line number. ColStart/ColEnd are byte offsets
	// into that line.
	Line     int
	ColStart int
	ColEnd   int
	// Snippet is the literal source text at File:Line[ColStart:ColEnd]. The
	// framework validates Snippet == doc.Lines[Line][ColStart:ColEnd] before
	// accepting the evidence.
	Snippet string
	// Rule is the reason the evidence is valid. MVP v1 accepts only
	// "literal". Future relaxations would introduce alternative rules with
	// their own invariants.
	Rule string
}
Evidence anchors one piece of synthesized content to a literal character span in a source doc.
Invariant I4 (zero-hallucination) requires: for every key in a Block's generated output, at least one Evidence exists in the Synthesize return with Field equal to that key, Rule == "literal", and Snippet byte-equal to doc.Lines[Line][ColStart:ColEnd].
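The I4 rule can be read as a two-pass check: validate every evidence against the source lines, then require each output key to be covered. `checkI4` below is a hypothetical checker written against a local copy of Evidence; lines is assumed to be 1-indexed like Doc.Lines, and the column bounds are checked before slicing so a bad span returns an error instead of panicking.

```go
package main

import "fmt"

// Evidence mirrors the exported fields of synthesizer.Evidence.
type Evidence struct {
	Field    string
	File     string
	Line     int
	ColStart int
	ColEnd   int
	Snippet  string
	Rule     string
}

// checkI4 is a hypothetical I4 checker. lines is 1-indexed like Doc.Lines;
// outputKeys are the keys the block emits.
func checkI4(lines []string, outputKeys []string, evs []Evidence) error {
	supported := map[string]bool{}
	for _, e := range evs {
		if e.Rule != "literal" {
			return fmt.Errorf("field %q: rule %q is not literal", e.Field, e.Rule)
		}
		if e.Line < 1 || e.Line >= len(lines) {
			return fmt.Errorf("field %q: line %d out of range", e.Field, e.Line)
		}
		l := lines[e.Line]
		if e.ColStart < 0 || e.ColStart > e.ColEnd || e.ColEnd > len(l) {
			return fmt.Errorf("field %q: columns out of range", e.Field)
		}
		if l[e.ColStart:e.ColEnd] != e.Snippet {
			return fmt.Errorf("field %q: snippet mismatch at line %d", e.Field, e.Line)
		}
		supported[e.Field] = true
	}
	for _, k := range outputKeys {
		if !supported[k] {
			return fmt.Errorf("output key %q has no literal evidence", k)
		}
	}
	return nil
}

func main() {
	lines := []string{"", "| month | string | required |"}
	ev := Evidence{Field: "month", File: "doc.md", Line: 1, ColStart: 2, ColEnd: 7, Snippet: "month", Rule: "literal"}
	fmt.Println(checkI4(lines, []string{"month"}, []Evidence{ev}))         // <nil>
	fmt.Println(checkI4(lines, []string{"month", "year"}, []Evidence{ev})) // output key "year" has no literal evidence
}
```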
type FixtureReport
FixtureReport captures the per-invariant outcome of EvaluateFixture. nil values mean "passed". The map keys are stable subtest names so callers (and CI consumers) can assert a specific invariant.
func EvaluateFixture
func EvaluateFixture(s Synthesizer, fx ContractFixture) (FixtureReport, error)
EvaluateFixture runs a synthesizer against a single fixture and returns per-invariant results without touching testing.T. RunInvariantContracts uses this internally; negative-path tests can call EvaluateFixture directly to assert that a deliberately-broken synthesizer fails the expected invariant.
type OrderedMap
type OrderedMap struct {
	// contains filtered or unexported fields
}
OrderedMap is a key-ordered string-keyed map used to render JSON objects in declared (not alphabetical) order. The standard library's json package sorts map keys alphabetically; OrderedMap implements json.Marshaler to emit keys in insertion order.
Use it when the order conveys meaning to the reader (required fields before optional, identity fields before metadata).
func NewOrderedMap
func NewOrderedMap(size int) *OrderedMap
NewOrderedMap returns an empty OrderedMap with capacity for size keys.
func (*OrderedMap) Get
func (m *OrderedMap) Get(key string) (any, bool)
Get returns the value at key and whether the key was set.
func (*OrderedMap) Keys
func (m *OrderedMap) Keys() []string
Keys returns a copy of the keys in declared order.
func (*OrderedMap) MarshalJSON
func (m *OrderedMap) MarshalJSON() ([]byte, error)
MarshalJSON implements json.Marshaler. It emits keys in declared order using the same indentation conventions as CanonicalJSON's pretty-print. Nested OrderedMap values keep their own declared order; nested standard maps fall back to json's alphabetical sort, so mix the two only when needed.
func (*OrderedMap) Set
func (m *OrderedMap) Set(key string, value any)
Set adds or replaces the value at key. Re-setting an existing key keeps its original position - it does NOT move it to the end. This matches what most synthesizers want: stable position regardless of update order.
type Registry
type Registry struct {
	// contains filtered or unexported fields
}
Registry is the thread-safe catalog of available synthesizers. Concrete implementations register themselves in their package init() via DefaultRegistry.Register. At command entry the Angela pipeline calls Enabled(cfg) to narrow the catalog to the synthesizers activated by AngelaConfig.Synthesizers.Enabled for the current run.
func NewRegistry
func NewRegistry() *Registry
NewRegistry builds an empty registry. Tests use this to avoid leaking state between cases; production code uses DefaultRegistry.
func (*Registry) Enabled
func (r *Registry) Enabled(cfg EnabledConfig) []Synthesizer
Enabled returns the synthesizers activated by cfg. The returned slice is in the order declared by cfg.Enabled (not alphabetical) so that callers can present findings in a stable, user-controlled sequence.
Unknown names in cfg.Enabled are silently skipped - this lets config files reference a synthesizer that ships in a later binary release without breaking older binaries.
func (*Registry) ForDoc
func (r *Registry) ForDoc(doc *Doc, cfg EnabledConfig) []Synthesizer
ForDoc returns the synthesizers enabled AND applicable to doc. The order matches Enabled's order, so review/polish findings are presented in a stable sequence.
func (*Registry) Get
func (r *Registry) Get(name string) Synthesizer
Get returns the synthesizer registered under name, or nil if absent.
func (*Registry) Names
Names returns the list of registered synthesizer names, sorted for determinism.
func (*Registry) Register
func (r *Registry) Register(s Synthesizer)
Register adds s under s.Name(). Duplicate names panic - a name collision across implementation packages is a programming error caught at startup.
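The documented Registry behavior (panic on duplicate names, Enabled following cfg.Enabled order while skipping unknown names, sorted Names) fits in a small mutex-guarded map. `registry` and `named` are hypothetical, and applying Disabled inside Enabled is an assumption: the text only says Disabled overrides Enabled, not which layer enforces it.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Synthesizer is trimmed to the one method the registry needs here.
type Synthesizer interface{ Name() string }

type EnabledConfig struct {
	Enabled  []string
	Disabled bool
}

// registry is a hypothetical sketch of the documented Registry behavior.
type registry struct {
	mu     sync.RWMutex
	byName map[string]Synthesizer
}

func newRegistry() *registry { return &registry{byName: map[string]Synthesizer{}} }

// Register panics on duplicate names: a collision is a programming error.
func (r *registry) Register(s Synthesizer) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, dup := r.byName[s.Name()]; dup {
		panic("synthesizer: duplicate name " + s.Name())
	}
	r.byName[s.Name()] = s
}

// Enabled follows cfg.Enabled order, skips unknown names, honors Disabled.
func (r *registry) Enabled(cfg EnabledConfig) []Synthesizer {
	if cfg.Disabled {
		return nil
	}
	r.mu.RLock()
	defer r.mu.RUnlock()
	var out []Synthesizer
	for _, name := range cfg.Enabled {
		if s, ok := r.byName[name]; ok {
			out = append(out, s)
		}
	}
	return out
}

// Names returns registered names sorted for determinism.
func (r *registry) Names() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.byName))
	for n := range r.byName {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// named is a trivial Synthesizer used only for the demo.
type named string

func (n named) Name() string { return string(n) }

func main() {
	r := newRegistry()
	r.Register(named("sql-query"))
	r.Register(named("api-postman"))
	got := r.Enabled(EnabledConfig{Enabled: []string{"api-postman", "future-thing", "sql-query"}})
	fmt.Println(len(got), got[0].Name(), got[1].Name(), r.Names())
	// 2 api-postman sql-query [api-postman sql-query]
}
```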
type Section
type Section struct {
	// Heading is the heading text WITH leading hashes, e.g. "### Endpoints".
	Heading string
	// Level is the heading depth (1 for #, 2 for ##, ...).
	Level int
	// Title is Heading without the leading "#+ " prefix.
	Title string
	// StartLine is the 1-based line number of the heading itself.
	StartLine int
	// EndLine is the 1-based line number of the last line BEFORE the next
	// heading at the same or shallower level. It is inclusive: line EndLine
	// belongs to this section.
	EndLine int
	// Content is the raw content between the heading and EndLine (excluding
	// the heading line). Trailing newline trimmed.
	Content string
}
Section is one markdown heading with its content span in the Body.
func FuzzyFindSection
FuzzyFindSection scans doc.Sections for a heading whose normalized title matches one of the provided candidate patterns. Returns the best match, a confidence in [0.0, 1.0], and whether a match was found.
Confidence model:
- 1.0 when the normalized heading equals the canonical form of a candidate.
- 0.85 when the heading's normalized title contains a candidate as a whole-word substring (e.g. "API endpoints" matches "endpoints").
- 0.7 when a candidate regex matches anywhere in the heading.
- Below 0.7, no match is returned.
Candidates may be plain words ("endpoints", "security") or regex alternatives ("(?i)endpoints?|routes?"). Plain words match case-insensitively; plural tolerance is not automatic, so callers either write a regex (e.g. "endpoints?") or list both forms.
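The confidence model can be sketched as a per-pair score followed by a max over all (title, candidate) pairs. Everything here is hypothetical: the real function works on doc.Sections and returns a Section, the normalization rule is assumed (lowercase, collapsed whitespace), and treating candidates that start with "(?" as regexes is just one way to tell the two candidate kinds apart.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// normalize is an assumed normalization: lowercase and collapse whitespace.
func normalize(s string) string {
	return strings.Join(strings.Fields(strings.ToLower(s)), " ")
}

// score applies the documented tiers to one title/candidate pair.
func score(title, candidate string) float64 {
	if strings.HasPrefix(candidate, "(?") { // assumed regex marker
		re, err := regexp.Compile(candidate)
		if err == nil && re.MatchString(title) {
			return 0.7
		}
		return 0
	}
	t, c := normalize(title), normalize(candidate)
	if t == c {
		return 1.0
	}
	// \b is an ASCII word boundary in Go's RE2 syntax.
	if regexp.MustCompile(`\b` + regexp.QuoteMeta(c) + `\b`).MatchString(t) {
		return 0.85
	}
	return 0
}

// bestMatch returns the best-scoring title, its confidence and whether it
// clears the 0.7 floor. Ties keep the earliest title in source order.
func bestMatch(titles, candidates []string) (string, float64, bool) {
	best, bestScore := "", 0.0
	for _, t := range titles {
		for _, c := range candidates {
			if s := score(t, c); s > bestScore {
				best, bestScore = t, s
			}
		}
	}
	if bestScore < 0.7 {
		return "", 0, false
	}
	return best, bestScore, true
}

func main() {
	titles := []string{"Overview", "API endpoints", "Security"}
	fmt.Println(bestMatch(titles, []string{"endpoints"}))              // API endpoints 0.85 true
	fmt.Println(bestMatch(titles, []string{"(?i)routes?|endpoints?"})) // API endpoints 0.7 true
}
```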
func ParseSections
ParseSections walks body and returns a flat list of headings in source order. EndLine of each section points at the last line BEFORE the next heading at the same or shallower level, so Content captures the entire subtree under the heading (including deeper sub-headings).
Lines are 1-indexed (consistent with Evidence.Line).
Recognized headings: ATX style only ("#" through "######"). Setext headings (underlines) are not supported in MVP - feature docs use ATX.
Fenced code blocks (```lang ... ```) are skipped so that lines like "# Full" inside an http/json fence do not become phantom headings and prematurely close the enclosing section. Both "```" and "~~~" fences are recognized; tildes are accepted for compatibility with CommonMark even though lore generators emit backticks.
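The rules above (ATX headings only, fences skipped, EndLine closing at the next heading of the same or shallower level) can be sketched in one pass plus a fix-up loop. `parseSections` is hypothetical and deliberately simplified relative to CommonMark: it ignores closing "#" sequences and indented headings, and it closes a fence on any line made only of the opening fence character.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Section keeps the positional fields of synthesizer.Section.
type Section struct {
	Heading   string
	Level     int
	Title     string
	StartLine int
	EndLine   int
}

// atxHeading matches "#" through "######", at least one space, then a title.
var atxHeading = regexp.MustCompile(`^(#{1,6})[ \t]+(.*\S)[ \t]*$`)

// parseSections is a hypothetical sketch, not the package's implementation.
func parseSections(body string) []Section {
	lines := strings.Split(body, "\n")
	var secs []Section
	fence := "" // opening marker while inside a fenced block
	for i, l := range lines {
		t := strings.TrimSpace(l)
		if fence != "" {
			// A closing fence is a line made only of the fence character.
			if strings.HasPrefix(t, fence) && strings.Trim(t, fence[:1]) == "" {
				fence = ""
			}
			continue
		}
		if strings.HasPrefix(t, "```") || strings.HasPrefix(t, "~~~") {
			fence = t[:3]
			continue
		}
		if m := atxHeading.FindStringSubmatch(l); m != nil {
			secs = append(secs, Section{
				Heading:   strings.TrimRight(l, " \t"),
				Level:     len(m[1]),
				Title:     m[2],
				StartLine: i + 1, // 1-indexed, like Evidence.Line
			})
		}
	}
	// EndLine (inclusive): the line before the next heading at the same or a
	// shallower level, else the last line of the body.
	for i := range secs {
		secs[i].EndLine = len(lines)
		for j := i + 1; j < len(secs); j++ {
			if secs[j].Level <= secs[i].Level {
				secs[i].EndLine = secs[j].StartLine - 1
				break
			}
		}
	}
	return secs
}

func main() {
	body := "# API\n## Endpoints\n```http\n# Full\n```\n## Security\nuserId is injected"
	for _, s := range parseSections(body) {
		fmt.Println(s.Level, s.Title, s.StartLine, s.EndLine)
	}
	// 1 API 1 7
	// 2 Endpoints 2 5
	// 2 Security 6 7
}
```

The "# Full" line inside the http fence does not become a heading, so Endpoints keeps its full span up to the closing fence.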
type Signature
type Signature struct {
	// Hash is a hex-encoded sha256 over the canonical list of evidence
	// spans the synthesizer used (see ComputeHash).
	// Evidence-based, not output-based: the hash remains stable across
	// cosmetic output tweaks and shifts only when source spans change.
	Hash string `yaml:"hash"`
	// At is the RFC3339 timestamp of the last successful Synthesize call
	// that produced output matching this signature.
	At string `yaml:"at"`
	// Version is the synthesizer's own output-format version. Bumped when a
	// backward-incompatible output change ships; forces regeneration even
	// when the source evidences are unchanged.
	Version string `yaml:"version"`
	// Sections is the ordered list of source section titles consulted for
	// this synthesis. Purely informational; not part of Hash.
	Sections []string `yaml:"sections,omitempty"`
	// EvidenceCount is the number of Evidence records backing the output.
	// Informational, for audit visibility in frontmatter diffs.
	EvidenceCount int `yaml:"evidence_count"`
	// Warnings is the list of Warning.Code values emitted for this run. Used
	// by the review hook to avoid re-emitting findings already present in
	// the frontmatter and by audits to spot docs shipped in degraded mode.
	Warnings []string `yaml:"warnings,omitempty"`
}
Signature is what a synthesizer persists into the doc's frontmatter under the "synthesized.<name>" key. See Task 4.
func MakeSignature
MakeSignature builds a Signature for a synthesizer run from its evidence list, the sections it consulted, and the warnings it emitted. Used by both the polish hook (when proposing a new block) and tests.
version is the synthesizer's own output-format version (passed by the concrete synthesizer). At is the current UTC time formatted as RFC3339 with seconds precision - sub-second jitter would defeat I6 on machines with high-resolution clocks.
type Synthesizer
type Synthesizer interface {
	// Name is the stable identifier used in config, frontmatter signatures and
	// CLI flags (e.g., "api-postman", "sql-query"). Must be lowercase kebab-case.
	Name() string
	// Applies is a fast, side-effect-free gate. It returns true when the doc
	// looks like a candidate for this synthesizer (frontmatter type, presence
	// of characteristic sections, regex hits). It must NOT parse the body in
	// depth - Detect does that.
	Applies(doc *Doc) bool
	// Detect enumerates synthesis opportunities inside the doc. Each Candidate
	// carries a primary anchor (evidence span) and any synthesizer-specific
	// extras needed to later synthesize without re-parsing. Detect may return
	// an empty slice and nil error when the doc applies but has no candidate
	// (e.g., endpoints section present but empty).
	Detect(doc *Doc) ([]Candidate, error)
	// Synthesize turns a Candidate into a ready-to-insert Block plus the list
	// of Evidences supporting every field in the block's output and any
	// non-fatal Warnings (degraded mode, fuzzy heading match, etc.).
	//
	// The contract for I4 lives here: every output key produced by the
	// synthesizer must appear at least once in the returned []Evidence with
	// Rule == "literal" and Snippet equal to the source file content at the
	// declared span.
	Synthesize(c Candidate, cfg Config) (Block, []Evidence, []Warning, error)
}
Synthesizer is a doc-enrichment unit. Implementations are registered via Registry.Register and activated per-run through AngelaConfig.
type Warning
type Warning struct {
	// Code is a short, stable identifier ("missing-security-section",
	// "fuzzy-heading-match", "endpoints-without-field-source"). Used as the
	// finding category and as a lookup key in severity_override.
	Code string
	// Message is a human-readable explanation localized by the framework at
	// render time. Concrete synthesizers emit the CODE only; the
	// framework's i18n layer translates.
	Message string
	// Line is the 1-based line number the warning attaches to, or 0 when it
	// is doc-scoped (e.g. missing-security-section is doc-scoped).
	Line int
}
Warning is a non-fatal signal emitted during Detect or Synthesize. It flows into the review pipeline as a ReviewFinding and into the frontmatter signature (see Signature.Warnings).
Directories
| Path | Synopsis |
|---|---|
| impls/apipostman | Package apipostman implements the first concrete ExampleSynthesizer: an api-postman synthesizer that composes ready-to-import HTTP+JSON request examples from information already present in a feature doc's Endpoints / Filters / Security sections. |