researchsynthesize

package

Version: v1.0.0-beta.110 (latest) | Published: Jun 14, 2026 | License: MIT | Imports: 23 | Imported by: 0

Repository

github.com/c360studio/semstreams

Documentation ¶

Overview ¶

Package researchsynthesize implements the synthesize_answer component from ADR-045 Phase 1 (PR 5 of six per docs/operations/22-adr045-phase1-plan.md).

synthesize_answer is the chain's terminal LLM stage. It receives a publish trigger on component.synthesize_answer.<loop_id> and reads the upstream research.Intent and research.ExecutionOutput payloads (plus the research.RouteDecision when present, for the DecompTrace audit). It then builds a structured-emit prompt, calls the configured research_synthesis LLM endpoint, parses the model's JSON output, runs quote-back validation, and writes the research.SearchResult envelope at search_result.complete.<loop_id> per ADR-045 R6. The continuation rule (R6, in PR 6) watches this terminal trigger key to deliver the result back to the parent loop. The trigger key names the payload shape, not the producing component, so the continuation rule's key_pattern stays stable across synthesizer reshuffles.
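
The two key layouts named above can be sketched as small helpers. The function names triggerKey and resultKey and the loop ID "rg_123" are hypothetical; only the key layouts come from this overview.

```go
package main

import "fmt"

// triggerKey and resultKey are hypothetical helper names used for
// illustration; only the key layouts come from the package overview.

// triggerKey returns the key on which the synthesize stage is triggered.
func triggerKey(loopID string) string {
	return "component.synthesize_answer." + loopID
}

// resultKey returns the key at which the SearchResult envelope is written.
// It names the payload shape, not the producing component.
func resultKey(loopID string) string {
	return "search_result.complete." + loopID
}

func main() {
	fmt.Println(triggerKey("rg_123")) // component.synthesize_answer.rg_123
	fmt.Println(resultKey("rg_123"))  // search_result.complete.rg_123
}
```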

Architectural notes:

- LLM-wrapping component, not agent-wrapping. Direct LLM call via the configured CapabilityResearchSynthesis endpoint.
- Quote-back validation: every ObjectStoreRef and EntityID the model emits in evidence_refs MUST appear in the input evidence array. Refs that do not match are stripped and a Warn is emitted; if the model emits zero valid refs against a non-empty evidence array, the synthesis is downgraded to degraded so operators can spot drift quickly. This enforces ADR-045's "no fabricated refs" contract.
- DecompTrace is constructed server-side from the RouteDecision + ExecutionOutput; the model is not asked to spell it. The chain is the authoritative source of structural-decision history.
- Per-loop evidence sample is bounded by config (MaxEvidenceInPrompt) so the prompt stays within small-model context windows. Evidence is rendered with stable ordering (Score descending; ties broken by EntityID) so the same loop replayed produces byte-identical prompts.
- Authored synthesis prose carries no rule-matchable predicate: rules pattern-match on the SearchResult schema (Synthesis present, evidence_refs present), not on prose content.
- All public methods are safe for concurrent use across loops; the component holds no per-call mutable state. Same pattern as research-graph-assess + research-graph-route.
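
The bounded, stably ordered evidence sample described in the notes above could look roughly like the sketch below. The Evidence struct and selectEvidence are hypothetical stand-ins, not the package's own types. The sketch assumes ties are broken by ascending EntityID and that the snippet cap counts runes; the notes specify neither.

```go
package main

import (
	"fmt"
	"sort"
)

// Evidence is a hypothetical stand-in for one ExecutionOutput evidence
// item; only the field names are taken from the package docs.
type Evidence struct {
	EntityID    string
	Score       float64
	SnippetText string
}

// selectEvidence sorts by Score descending (ties broken by EntityID
// ascending, an assumption), keeps at most maxItems entries, and
// truncates each snippet to maxSnippetChars runes. Both caps are
// assumed to be non-negative. The input slice is left untouched.
func selectEvidence(in []Evidence, maxItems, maxSnippetChars int) []Evidence {
	out := make([]Evidence, len(in))
	copy(out, in)
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].EntityID < out[j].EntityID
	})
	if len(out) > maxItems {
		out = out[:maxItems]
	}
	for i := range out {
		if r := []rune(out[i].SnippetText); len(r) > maxSnippetChars {
			out[i].SnippetText = string(r[:maxSnippetChars])
		}
	}
	return out
}

func main() {
	ev := []Evidence{
		{EntityID: "b", Score: 0.5, SnippetText: "beta snippet"},
		{EntityID: "a", Score: 0.5, SnippetText: "alpha snippet"},
		{EntityID: "c", Score: 0.9, SnippetText: "gamma snippet"},
	}
	// Keeps "c" (highest score), then "a" (tie with "b", lower EntityID).
	for _, e := range selectEvidence(ev, 2, 5) {
		fmt.Println(e.EntityID, e.SnippetText)
	}
}
```

Because the comparison depends only on Score and EntityID, the same evidence set yields the same order regardless of arrival order, which is what makes a replayed prompt byte-identical.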

Index ¶

Constants
func NewProcessor(rawConfig json.RawMessage, deps component.Dependencies) (component.Discoverable, error)
func Register(registry *component.Registry) error
type Component
type Config
- func DefaultConfig() Config
- func (c *Config) ApplyDefaults()
- func (c *Config) Validate() error
type LoopStore
type Synthesizer

Constants ¶

const (
	// DefaultSynthesizeTimeout caps the structured-emit LLM call.
	// Set to 30s because natsclient.Client.Subscribe wraps the
	// per-message handler in a 30-second timeout context
	// (natsclient/client.go ~line 684) — any synthesize_timeout
	// above 30s is silently clipped by the subscription wrapper
	// before reaching the LLM call. Operators tuning past 30s
	// should raise the natsclient cap; the component-side knob
	// alone won't reach the wire.
	DefaultSynthesizeTimeout = 30 * time.Second

	// DefaultMaxResponseTokens caps the LLM's response budget. The
	// synthesis text can be long; 2048 is the production default,
	// operators can raise via capability.research_synthesis.
	DefaultMaxResponseTokens = 2048

	// DefaultMaxEvidenceInPrompt caps how many ExecutionOutput
	// evidence items the prompt embeds. Higher than assess's cap
	// because synthesis needs to ground its prose in concrete
	// references — but still bounded so the prompt fits common
	// context windows.
	DefaultMaxEvidenceInPrompt = 30

	// DefaultMaxSnippetCharsInPrompt caps per-evidence SnippetText
	// chars rendered in the prompt. Higher than assess because
	// synthesis quotes snippets directly into the answer; truncated
	// snippets shrink the answer's grounding.
	DefaultMaxSnippetCharsInPrompt = 480
)

Default knobs surfaced as exported constants so the prompt-builder tests and operator docs can reference them by name rather than duplicating literals.

const ComponentName = "research-graph-synthesize"

ComponentName is the canonical registry name + log subsystem.

const SystemPromptMarker = "You are the synthesis stage of a graph-search pipeline"

SystemPromptMarker is the first sentence of buildSystemPrompt's output, exported for the e2e mock LLM marker-matching. See processor/research-graph-route/prompt.go SystemPromptMarker for the full rationale.
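
A mock LLM could use the marker to decide which canned response to serve. The sketch below assumes a prefix match, since the marker is described as the first sentence of the prompt. The helper isSynthesisPrompt and the prompt text after the marker are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// SystemPromptMarker is copied from the package constant.
const SystemPromptMarker = "You are the synthesis stage of a graph-search pipeline"

// isSynthesisPrompt is a hypothetical helper for a mock LLM: it reports
// whether a system prompt starts with the synthesis-stage marker.
func isSynthesisPrompt(systemPrompt string) bool {
	return strings.HasPrefix(systemPrompt, SystemPromptMarker)
}

func main() {
	// The sentence after the marker is invented for this example.
	prompt := SystemPromptMarker + ". Answer using only the supplied evidence."
	fmt.Println(isSynthesisPrompt(prompt))                       // true
	fmt.Println(isSynthesisPrompt("You are the routing stage.")) // false
}
```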

Variables ¶

This section is empty.

Functions ¶

func NewProcessor ¶

func NewProcessor(rawConfig json.RawMessage, deps component.Dependencies) (component.Discoverable, error)

NewProcessor is the component-factory shape registered with the component registry.

func Register ¶

func Register(registry *component.Registry) error

Register registers the synthesize_answer processor with the supplied component registry. Called from componentregistry.Register at process bootstrap.

Types ¶

type Component ¶

type Component struct {
	// contains filtered or unexported fields
}

Component implements the synthesize_answer processor. Same structural shape as research-graph-route + research-graph-assess.

func (*Component) ConfigSchema ¶

func (c *Component) ConfigSchema() component.ConfigSchema

ConfigSchema implements Discoverable.

func (*Component) DataFlow ¶

func (c *Component) DataFlow() component.FlowMetrics

DataFlow implements Discoverable.

func (*Component) Health ¶

func (c *Component) Health() component.HealthStatus

Health implements Discoverable.

func (*Component) Initialize ¶

func (c *Component) Initialize() error

Initialize is part of the LifecycleComponent contract.

func (*Component) InputPorts ¶

func (c *Component) InputPorts() []component.Port

InputPorts implements Discoverable.

func (*Component) Meta ¶

func (c *Component) Meta() component.Metadata

Meta implements Discoverable.

func (*Component) OutputPorts ¶

func (c *Component) OutputPorts() []component.Port

OutputPorts implements Discoverable. synthesize_answer has no NATS-publishing output port: emits are KV writes to AGENT_LOOPS.

func (*Component) Start ¶

func (c *Component) Start(ctx context.Context) error

Start opens AGENT_LOOPS, wires the LLM synthesizer, subscribes inputs, reports idle.

func (*Component) Stop ¶

func (c *Component) Stop(timeout time.Duration) error

Stop drains subscriptions, closes the LLM client.

type Config ¶

type Config struct {
	Ports *component.PortConfig `` /* 177-byte string literal not displayed */

	LoopsBucket string `` /* 175-byte string literal not displayed */

	SynthesizeTimeout time.Duration `` /* 330-byte string literal not displayed */

	MaxResponseTokens int `` /* 158-byte string literal not displayed */

	MaxEvidenceInPrompt int `` /* 289-byte string literal not displayed */

	MaxSnippetCharsInPrompt int `` /* 214-byte string literal not displayed */
}

Config holds operator-tunable knobs for the synthesize_answer component.

LLM wiring follows the same model-registry capability seam as route_search + assess_sufficiency: operator declares CapabilityResearchSynthesis in the model registry; the component resolves it at Start() time. Absence is a startup error.

func DefaultConfig ¶

func DefaultConfig() Config

DefaultConfig returns a default Config skeleton with the standard synthesize_answer input port.

func (*Config) ApplyDefaults ¶

func (c *Config) ApplyDefaults()

ApplyDefaults fills in defaults for unset fields.

func (*Config) Validate ¶

func (c *Config) Validate() error

Validate validates the configuration. Negative caps are rejected; zero values fall through to ApplyDefaults.

type LoopStore ¶

type LoopStore interface {
	GetIntent(ctx context.Context, loopID string) (*research.Intent, error)
	GetExecutionOutput(ctx context.Context, loopID string) (*research.ExecutionOutput, error)
	// GetRouteDecision returns (nil, nil) when the key is absent so
	// the synthesizer can degrade gracefully — RouteDecision is
	// observability-grade, not load-bearing for synthesis itself.
	GetRouteDecision(ctx context.Context, loopID string) (*research.RouteDecision, error)

	PutSearchResult(ctx context.Context, loopID string, envelope []byte) error
	PutSnapshot(ctx context.Context, loopID string, envelope []byte) error

	// PutLoopCompletion writes the SearchResult envelope at the
	// COMPLETE_<loopID> key — the convention the existing
	// read_loop_result tool uses (see
	// processor/agentic-tools/loop_result.go completeKeyPrefix). Lets
	// the R6 continuation rule's spawned parent agent fetch the
	// SearchResult via read_loop_result(loop_id=<rg_xxx>) without a
	// new tool. The duplicate-write cost is one extra KV put per
	// chain; the alternative (custom read_research_result tool) is
	// Phase 2 scope.
	//
	// Failure is BEST-EFFORT: the handler logs Warn and continues so
	// the orchestration triple still lands, R6 still fires, and the
	// parent's read_loop_result call surfaces a clean key-not-found
	// (which the parent's persona can degrade against — e.g.,
	// "synthesis arrived but the body wasn't fetchable, here's what
	// we know from the trajectory"). The chain-advance invariant
	// belongs at the triple stamp, not the read-side envelope. A
	// fatal-here treatment would short-circuit R6 and leave the
	// parent waiting forever — strictly worse than degraded read.
	PutLoopCompletion(ctx context.Context, loopID string, envelope []byte) error
}

LoopStore is the AGENT_LOOPS read/write surface this component consumes. The RouteDecision read is optional — when present, the DecompTrace captures the router's structural choice; when missing (synthesize_directly fast-path, or a misordered chain), the trace is built from the ExecutionOutput alone.

type Synthesizer ¶

type Synthesizer interface {
	Synthesize(ctx context.Context, systemPrompt, userPrompt string, maxResponseTokens int) (content string, reason string, err error)
}

Synthesizer is the narrow LLM surface this component consumes. Production satisfies it via llmSynthesizerAdapter wrapping a real graph/llm.Client; tests substitute a fake.

Synthesize returns the raw response Content + a short reason string the adapter can use for error diagnostics. The handler parses the Content as JSON, applies quote-back validation, and folds the result into a research.SearchResult.
