researchassess

package
v1.0.0-beta.99
Published: Jun 7, 2026 License: MIT Imports: 23 Imported by: 0

Documentation

Overview

Package researchassess implements the assess_sufficiency component from ADR-045 Phase 1 (PR 5 of six per docs/operations/22-adr045-phase1-plan.md).

assess_sufficiency is the second LLM-judgment stage of the graph-search rule chain (after route_search). It receives a publish trigger on component.assess_sufficiency.<loop_id>, reads the upstream research.Intent and research.ExecutionOutput payloads from AGENT_LOOPS, builds a structured-emit prompt, and calls the configured research_assessment LLM endpoint. It then parses the model's JSON output into a research.AssessmentOutput and writes the AssessmentOutput envelope plus an assess.complete.<loop_id> trigger key. R3 (PR 6) watches that key to dispatch synthesize_answer (Sufficient=true) or the refine loop (Sufficient=false).

Architectural notes:

  • LLM-wrapping component, not agent-wrapping. Direct LLM call via the configured CapabilityResearchAssessment endpoint; the chain does not delegate decision-making to a free-form ReAct loop.

  • Prompt is structured per discipline memory feedback_persona_prose_needs_decision_criteria: per-decision framing for Sufficient vs Refine, concrete examples for the "looks adequate but isn't" trap, and negative-shape definition for Sufficient=true (when NOT to short-circuit).

  • RefinedQueries are intent-shaped per feedback_tool_signature_intent_not_structure: the model emits short natural-language gap descriptors; the backend constructs typed dispatches downstream. The model is not asked to spell sub-query types it can't reliably produce.

  • Per-loop ExecutionOutput sample is bounded by config (MaxEvidenceInPrompt) so the prompt stays within small-model context windows. Evidence is rendered with stable ordering (Score descending; ties broken by EntityID) so the same loop replayed produces byte-identical prompts.

  • Authored Rationale predicate is rule-opaque per feedback_llm_authored_predicates_rule_opaque — rules branch on the typed Sufficient boolean only.

  • All public methods are safe for concurrent use across loops; the component holds no per-call mutable state. Same pattern as research-graph-route + research-graph-execute.
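
The evidence-ordering rule in the notes above (Score descending, ties broken by EntityID, capped at MaxEvidenceInPrompt) can be sketched as follows. This is a minimal illustration, not the package's implementation: the evidence struct and selectForPrompt are hypothetical names, and breaking ties in ascending EntityID order is an assumption, since the notes do not state the direction.

```go
package main

import (
	"fmt"
	"sort"
)

// evidence is a hypothetical stand-in for one ExecutionOutput evidence item;
// only the two fields named in the ordering rule are modelled.
type evidence struct {
	EntityID string
	Score    float64
}

// selectForPrompt returns at most maxItems items ordered by Score descending,
// with ties broken by EntityID ascending (assumed direction). It sorts a copy
// so the caller's slice is left untouched.
func selectForPrompt(items []evidence, maxItems int) []evidence {
	out := make([]evidence, len(items))
	copy(out, items)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].EntityID < out[j].EntityID
	})
	if maxItems >= 0 && len(out) > maxItems {
		out = out[:maxItems]
	}
	return out
}

func main() {
	items := []evidence{
		{EntityID: "b", Score: 0.9},
		{EntityID: "c", Score: 0.5},
		{EntityID: "a", Score: 0.9},
	}
	for _, e := range selectForPrompt(items, 2) {
		fmt.Println(e.EntityID, e.Score)
	}
}
```

Because the comparator falls back to EntityID, the result does not depend on input order as long as EntityID values are unique, which is what makes a replayed loop render the same prompt.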

Constants

const (
	// DefaultAssessTimeout caps the structured-emit LLM call.
	// Generous enough for slower self-hosted endpoints; operators can
	// tighten for faster hosted models.
	DefaultAssessTimeout = 30 * time.Second

	// DefaultMaxResponseTokens caps the LLM's response budget. The
	// JSON-shaped output is small (decision + ~5 refined queries +
	// rationale); 1024 covers the worst case across both decision
	// paths.
	DefaultMaxResponseTokens = 1024

	// DefaultMaxEvidenceInPrompt caps how many ExecutionOutput
	// evidence items the prompt embeds. The assessor doesn't need
	// the full evidence array to decide — the top-N by score is
	// enough — and a tight cap keeps the prompt fitting frontier-tier
	// and small-model context windows alike.
	DefaultMaxEvidenceInPrompt = 20

	// DefaultMaxSnippetCharsInPrompt caps how many SnippetText
	// characters render per evidence item in the prompt. Per-item
	// rendering past this length truncates with an ellipsis. Keeps
	// the prompt bounded when individual SnippetText fields are
	// long.
	DefaultMaxSnippetCharsInPrompt = 280
)

Default knobs surfaced as exported constants so the prompt-builder tests and operator docs can reference them by name rather than duplicating literals.
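
As a rough illustration of the per-item cap described for DefaultMaxSnippetCharsInPrompt, the sketch below truncates a snippet and appends an ellipsis. truncateSnippet is a hypothetical helper, not a function of this package. Counting runes rather than bytes, and treating a non-positive cap as "no cap", are both assumptions made for the sketch.

```go
package main

import "fmt"

// truncateSnippet returns s unchanged when it has at most maxChars
// characters; otherwise it keeps the first maxChars characters and appends
// an ellipsis. Counting runes avoids splitting a multi-byte UTF-8 character.
// A non-positive maxChars is treated as "no cap" (an assumption of this
// sketch, not documented behaviour).
func truncateSnippet(s string, maxChars int) string {
	if maxChars <= 0 {
		return s
	}
	r := []rune(s)
	if len(r) <= maxChars {
		return s
	}
	return string(r[:maxChars]) + "…"
}

func main() {
	fmt.Println(truncateSnippet("short snippet", 280))
	fmt.Println(truncateSnippet("héllo wörld", 5))
}
```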

const ComponentName = "research-graph-assess"

ComponentName is the canonical registry name + log subsystem.

const SystemPromptMarker = "You are the sufficiency-assessment stage of a graph-search pipeline"

SystemPromptMarker is the first sentence of buildSystemPrompt's output, exported so the e2e mock LLM can match on it. See SystemPromptMarker in processor/research-graph-route/prompt.go for the full rationale.

Variables

This section is empty.

Functions

func NewProcessor

func NewProcessor(rawConfig json.RawMessage, deps component.Dependencies) (component.Discoverable, error)

NewProcessor is the component-factory shape registered with the component registry. Parses + validates config, applies defaults, and constructs the Component with the injected production adapters. The LLM assessor is wired in Start() because the model registry isn't available at construction time.

func Register

func Register(registry *component.Registry) error

Register registers the assess_sufficiency processor with the supplied component registry. Called from componentregistry.Register at process bootstrap so production binaries pick the component up without extra wiring.

Types

type Assessor

type Assessor interface {
	Assess(ctx context.Context, systemPrompt, userPrompt string, maxResponseTokens int) (content string, reason string, err error)
}

Assessor is the narrow LLM surface this component consumes. Production satisfies it via llmAssessorAdapter wrapping a real graph/llm.Client; tests substitute a fake that returns a deterministic raw JSON response per scenario without standing up any LLM infrastructure.

Assess returns the raw response Content + a short reason string the adapter can use for error diagnostics (model identifier, finish reason, etc.). The handler parses the Content as JSON into an AssessmentOutput and validates separately so prompt iteration and response-shape drift surface as separate test failures.
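
A test double for this interface might look like the sketch below. The Assessor interface is restated locally from the signature above; fakeAssessor, assessment, and parseAssessment are hypothetical names, and the JSON field names are assumed rather than taken from research.AssessmentOutput.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// Assessor restates the interface documented above.
type Assessor interface {
	Assess(ctx context.Context, systemPrompt, userPrompt string, maxResponseTokens int) (content string, reason string, err error)
}

// fakeAssessor returns a canned response regardless of the prompts.
type fakeAssessor struct {
	content string
	reason  string
	err     error
}

func (f fakeAssessor) Assess(_ context.Context, _, _ string, _ int) (string, string, error) {
	return f.content, f.reason, f.err
}

// assessment is a hypothetical stand-in for research.AssessmentOutput; the
// JSON field names are assumptions.
type assessment struct {
	Sufficient     bool     `json:"sufficient"`
	RefinedQueries []string `json:"refined_queries"`
	Rationale      string   `json:"rationale"`
}

// parseAssessment only decodes the raw content; validating the decoded value
// would be a separate step, mirroring the split described above.
func parseAssessment(content string) (assessment, error) {
	var out assessment
	if err := json.Unmarshal([]byte(content), &out); err != nil {
		return assessment{}, fmt.Errorf("parse assessment: %w", err)
	}
	return out, nil
}

func main() {
	var a Assessor = fakeAssessor{
		content: `{"sufficient":false,"refined_queries":["who maintains the service"],"rationale":"no ownership evidence"}`,
		reason:  "fake-model stop",
	}
	content, reason, err := a.Assess(context.Background(), "system", "user", 1024)
	if err != nil {
		fmt.Println("assess failed:", reason, err)
		return
	}
	out, err := parseAssessment(content)
	if err != nil {
		fmt.Println("bad response shape:", reason, err)
		return
	}
	fmt.Println(out.Sufficient, out.RefinedQueries)
}
```

Keeping the fake free of any parsing logic means a malformed canned response fails in parseAssessment, not in the fake, so shape drift shows up in the place the doc comment says it should.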

type Component

type Component struct {
	// contains filtered or unexported fields
}

Component implements the assess_sufficiency processor. Same shape as research-graph-route: lifecycle methods own the NATS / model-registry plumbing and the per-message handler hands off to the pure assessSufficiency function in handler.go.

func (*Component) ConfigSchema

func (c *Component) ConfigSchema() component.ConfigSchema

ConfigSchema implements Discoverable.

func (*Component) DataFlow

func (c *Component) DataFlow() component.FlowMetrics

DataFlow implements Discoverable.

func (*Component) Health

func (c *Component) Health() component.HealthStatus

Health implements Discoverable.

func (*Component) Initialize

func (c *Component) Initialize() error

Initialize is part of the LifecycleComponent contract. Nothing needs to happen before Start.

func (*Component) InputPorts

func (c *Component) InputPorts() []component.Port

InputPorts implements Discoverable.

func (*Component) Meta

func (c *Component) Meta() component.Metadata

Meta implements Discoverable.

func (*Component) OutputPorts

func (c *Component) OutputPorts() []component.Port

OutputPorts implements Discoverable. assess_sufficiency has no NATS-publishing output port: emits are KV writes to AGENT_LOOPS.

func (*Component) Start

func (c *Component) Start(ctx context.Context) error

Start opens the AGENT_LOOPS bucket, wires the LLM assessor from the model registry, subscribes to the configured input ports, and reports idle.

func (*Component) Stop

func (c *Component) Stop(timeout time.Duration) error

Stop drains subscriptions, closes the LLM client, and flips `started` under c.mu.

type Config

type Config struct {
	Ports *component.PortConfig `` /* 179-byte string literal not displayed */

	LoopsBucket string `` /* 175-byte string literal not displayed */

	AssessTimeout time.Duration `` /* 216-byte string literal not displayed */

	MaxResponseTokens int `` /* 216-byte string literal not displayed */

	MaxEvidenceInPrompt int `` /* 286-byte string literal not displayed */

	MaxSnippetCharsInPrompt int `` /* 211-byte string literal not displayed */
}

Config holds operator-tunable knobs for the assess_sufficiency component.

LLM wiring follows the same model-registry capability seam as route_search: operator declares CapabilityResearchAssessment in the model registry; the component resolves it at Start() time. Absence is a startup error — assess_sufficiency has no keyword-only fallback (its entire job is the LLM judgment).

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns a default Config skeleton with the standard assess_sufficiency input port.

func (*Config) ApplyDefaults

func (c *Config) ApplyDefaults()

ApplyDefaults fills in defaults for unset fields.

func (*Config) Validate

func (c *Config) Validate() error

Validate validates the configuration. Negative caps are rejected; zero values fall through to ApplyDefaults (treated as "use default").
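
The validate-then-default contract can be sketched on a trimmed-down stand-in for Config. The knobs type, its lowercase methods, and the error messages are hypothetical; the default values are the ones listed under Constants. Rejecting a negative timeout is an assumption, since the doc comment only mentions negative caps.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	defaultAssessTimeout     = 30 * time.Second
	defaultMaxResponseTokens = 1024
)

// knobs is a hypothetical, trimmed-down stand-in for Config.
type knobs struct {
	AssessTimeout     time.Duration
	MaxResponseTokens int
}

// validate rejects negative values and lets zero through, because zero
// means "use the default" and is filled in later by applyDefaults.
func (k *knobs) validate() error {
	if k.AssessTimeout < 0 {
		return errors.New("assess timeout must not be negative")
	}
	if k.MaxResponseTokens < 0 {
		return errors.New("max response tokens must not be negative")
	}
	return nil
}

// applyDefaults fills in defaults for fields left at their zero value.
func (k *knobs) applyDefaults() {
	if k.AssessTimeout == 0 {
		k.AssessTimeout = defaultAssessTimeout
	}
	if k.MaxResponseTokens == 0 {
		k.MaxResponseTokens = defaultMaxResponseTokens
	}
}

func main() {
	k := knobs{MaxResponseTokens: 512}
	if err := k.validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	k.applyDefaults()
	fmt.Println(k.AssessTimeout, k.MaxResponseTokens)
}
```

One consequence of this shape is that an operator cannot set a knob to a literal zero, because zero is indistinguishable from "unset".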

type LoopStore

type LoopStore interface {
	// GetIntent loads the research_intent payload from the
	// research.requested.<loopID> key. Returned for the prompt's
	// topic + hints context.
	GetIntent(ctx context.Context, loopID string) (*research.Intent, error)

	// GetExecutionOutput loads the upstream ExecutionOutput from the
	// execute.complete.<loopID> trigger key. The assessor needs the
	// evidence array to decide sufficient / refine.
	GetExecutionOutput(ctx context.Context, loopID string) (*research.ExecutionOutput, error)

	// PutAssessmentOutput writes the AssessmentOutput envelope at R3's
	// trigger key assess.complete.<loopID>.
	PutAssessmentOutput(ctx context.Context, loopID string, envelope []byte) error

	// PutSnapshot writes the envelope at a stable non-trigger key
	// assess.snapshot.<loopID> so operators / downstream queryability
	// can read the full assessment without racing R3's wildcard
	// watcher.
	PutSnapshot(ctx context.Context, loopID string, envelope []byte) error
}

LoopStore is the AGENT_LOOPS read/write surface this component consumes. Production wraps natsclient.KVStore; tests substitute an in-memory map.
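
An in-memory substitute of the kind mentioned above could look like the following sketch. The intent and executionOutput types are hypothetical stand-ins for research.Intent and research.ExecutionOutput with invented fields, memStore and writeBoth are illustrative names, and the order in which writeBoth issues its two writes is an arbitrary choice for the sketch. The key prefixes are the ones named in the interface comments.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// intent and executionOutput are hypothetical stand-ins for research.Intent
// and research.ExecutionOutput; their fields are invented for this sketch.
type intent struct{ Topic string }
type executionOutput struct{ Evidence []string }

// loopStore restates the documented LoopStore interface against the stand-ins.
type loopStore interface {
	GetIntent(ctx context.Context, loopID string) (*intent, error)
	GetExecutionOutput(ctx context.Context, loopID string) (*executionOutput, error)
	PutAssessmentOutput(ctx context.Context, loopID string, envelope []byte) error
	PutSnapshot(ctx context.Context, loopID string, envelope []byte) error
}

// memStore is an in-memory fake keyed by the key names the docs mention.
type memStore struct {
	mu      sync.Mutex
	intents map[string]*intent
	outputs map[string]*executionOutput
	kv      map[string][]byte
}

var _ loopStore = (*memStore)(nil) // compile-time interface check

func newMemStore() *memStore {
	return &memStore{
		intents: map[string]*intent{},
		outputs: map[string]*executionOutput{},
		kv:      map[string][]byte{},
	}
}

func (s *memStore) GetIntent(_ context.Context, loopID string) (*intent, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.intents["research.requested."+loopID]
	if !ok {
		return nil, fmt.Errorf("no intent for loop %q", loopID)
	}
	return v, nil
}

func (s *memStore) GetExecutionOutput(_ context.Context, loopID string) (*executionOutput, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.outputs["execute.complete."+loopID]
	if !ok {
		return nil, fmt.Errorf("no execution output for loop %q", loopID)
	}
	return v, nil
}

// put stores a copy of envelope so callers can reuse their slice.
func (s *memStore) put(key string, envelope []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.kv[key] = append([]byte(nil), envelope...)
	return nil
}

func (s *memStore) PutAssessmentOutput(_ context.Context, loopID string, envelope []byte) error {
	return s.put("assess.complete."+loopID, envelope)
}

func (s *memStore) PutSnapshot(_ context.Context, loopID string, envelope []byte) error {
	return s.put("assess.snapshot."+loopID, envelope)
}

// writeBoth writes the envelope to the snapshot key and the trigger key.
func writeBoth(s loopStore, loopID string, envelope []byte) error {
	ctx := context.Background()
	if err := s.PutSnapshot(ctx, loopID, envelope); err != nil {
		return err
	}
	return s.PutAssessmentOutput(ctx, loopID, envelope)
}

func main() {
	s := newMemStore()
	if err := writeBoth(s, "loop-1", []byte(`{"sufficient":true}`)); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println(len(s.kv), "keys written")
}
```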
