researchclassify

package
v1.0.0-beta.84
Warning

This package is not in the latest version of its module.

Published: May 23, 2026 License: MIT Imports: 21 Imported by: 0

Documentation

Overview

Package researchclassify implements the nl_classify component from ADR-045 Phase 1 (PR 2 of six per docs/operations/22-adr045-phase1-plan.md).

nl_classify is the thinnest stage of the graph-search rule chain. It receives a publish trigger on component.nl_classify.<loop_id>, reads the research.Intent payload from AGENT_LOOPS, runs the existing graph/query.ClassifierChain (keyword + optional embedding + optional LLM variants) on the topic, executes the resulting SearchOptions against the graph to retrieve initial candidates, and writes a research.ClassifierOutput payload that R1 (PR 6) will fire route_search on.

Architectural notes:

  • The component wraps the existing classifier primitive; no new LLM calls beyond what the classifier chain already performs.
  • SearchOptions execution flows through the existing graph.query.searchGraph NATS surface so PR 4 (execute_subqueries) and this component share one evidence schema (Candidate ≡ subset of GlobalSearchResponse.EntityDigests).
  • All public methods are safe for concurrent use across loops; the component holds no per-call mutable state.
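The read, classify, retrieve, write sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's handler: the `stages` struct, the `output` envelope, and the choice to treat a nil classification as an error are all hypothetical, and the real research.Intent and research.ClassifierOutput schemas are not shown in this documentation.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
)

// stages groups the four steps as plain functions. Every name here is a
// hypothetical stand-in, not one of the package's real types.
type stages struct {
	getIntent func(ctx context.Context, loopID string) (topic string, err error)
	classify  func(ctx context.Context, topic string) map[string]any
	retrieve  func(ctx context.Context, topic string, hints map[string]any, limit int) ([]string, error)
	putOutput func(ctx context.Context, loopID string, envelope []byte) error
}

// output is a hypothetical envelope used only by this sketch.
type output struct {
	Topic      string         `json:"topic"`
	Hints      map[string]any `json:"hints"`
	Candidates []string       `json:"candidates"`
}

// classifyAndRetrieve runs read -> classify -> retrieve -> write in order
// and stops at the first failing step.
func classifyAndRetrieve(ctx context.Context, s stages, loopID string, limit int) error {
	topic, err := s.getIntent(ctx, loopID)
	if err != nil {
		return fmt.Errorf("get intent: %w", err)
	}
	hints := s.classify(ctx, topic)
	if hints == nil {
		// Assumption: this sketch treats a missing classification as an error.
		return errors.New("classifier returned no result")
	}
	candidates, err := s.retrieve(ctx, topic, hints, limit)
	if err != nil {
		return fmt.Errorf("retrieve: %w", err)
	}
	envelope, err := json.Marshal(output{Topic: topic, Hints: hints, Candidates: candidates})
	if err != nil {
		return fmt.Errorf("marshal: %w", err)
	}
	return s.putOutput(ctx, loopID, envelope)
}

// runDemo wires in-memory stages and returns the envelope that was written.
func runDemo() (string, error) {
	var written string
	s := stages{
		getIntent: func(_ context.Context, _ string) (string, error) { return "sensor drift", nil },
		classify: func(_ context.Context, _ string) map[string]any {
			return map[string]any{"strategy": "keyword"}
		},
		retrieve: func(_ context.Context, _ string, _ map[string]any, limit int) ([]string, error) {
			all := []string{"e1", "e2", "e3"}
			if limit < len(all) {
				all = all[:limit]
			}
			return all, nil
		},
		putOutput: func(_ context.Context, _ string, envelope []byte) error {
			written = string(envelope)
			return nil
		},
	}
	err := classifyAndRetrieve(context.Background(), s, "loop-1", 2)
	return written, err
}

func main() {
	out, err := runDemo()
	fmt.Println(out, err)
}
```

Keeping each stage behind a function value mirrors the way the component takes its store, classifier, and retriever as injected dependencies, so each step can be faked independently in tests.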


Constants

const ComponentName = "research-graph-classify"

ComponentName is the canonical registry name and log subsystem.

Variables

This section is empty.

Functions

func NewProcessor

func NewProcessor(rawConfig json.RawMessage, deps component.Dependencies) (component.Discoverable, error)

NewProcessor is the component factory registered with the component registry. It parses and validates the config, applies defaults, and constructs the Component with the injected production adapters (NATS-backed LoopStore and searchGraphRetriever). The classifier chain is built keyword-only at construction; the LLM tier is wired in Start() when the model registry supplies a query_classification capability.

func Register

func Register(registry *component.Registry) error

Register registers the nl_classify processor with the supplied component registry. Called from componentregistry.Register at process bootstrap so production binaries pick the component up without extra wiring.

Types

type CandidateRetriever

type CandidateRetriever interface {
	FetchCandidates(ctx context.Context, topic string, hints map[string]any, limit int) (CandidateSet, error)
}

CandidateRetriever fetches initial candidate entities for a topic. Production wraps a NATS request to graph.query.searchGraph; tests supply an in-memory fake so the per-variant pathway tests don't need a live graph stack.

hints carries the classifier-derived SearchOptions facets (already flattened by ClassificationResult.Options) so the implementation can pass them through to the downstream search. Returning Degraded lets the implementation flag fallback paths (e.g. semantic fallback fired) without losing the candidates that came back.
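As an illustration of the in-memory fake mentioned above, the sketch below returns scripted candidates per topic and uses Degraded to flag a fallback path while still returning the fallback candidates. `Candidate` is a hypothetical stand-in for research.Candidate (its fields are not shown here), and `fakeRetriever`, `newFake`, and `fetch` are names invented for this example.

```go
package main

import (
	"context"
	"fmt"
)

// Candidate is a hypothetical stand-in for research.Candidate.
type Candidate struct{ ID string }

// CandidateSet and CandidateRetriever mirror the documented shapes,
// using the stand-in Candidate type.
type CandidateSet struct {
	Candidates     []Candidate
	Degraded       bool
	DegradedReason string
}

type CandidateRetriever interface {
	FetchCandidates(ctx context.Context, topic string, hints map[string]any, limit int) (CandidateSet, error)
}

// fakeRetriever is a hypothetical in-memory fake: scripted topics return
// their candidates, anything else returns a fallback list flagged as degraded.
type fakeRetriever struct {
	byTopic  map[string][]Candidate
	fallback []Candidate
}

func (f fakeRetriever) FetchCandidates(ctx context.Context, topic string, _ map[string]any, limit int) (CandidateSet, error) {
	if err := ctx.Err(); err != nil {
		return CandidateSet{}, err
	}
	found, ok := f.byTopic[topic]
	set := CandidateSet{Candidates: found}
	if !ok {
		// Keep the fallback candidates and flag the path instead of erroring.
		set = CandidateSet{Candidates: f.fallback, Degraded: true, DegradedReason: "no exact match; fallback list used"}
	}
	if limit > 0 && len(set.Candidates) > limit {
		set.Candidates = set.Candidates[:limit]
	}
	return set, nil
}

// newFake builds a fake with one scripted topic and a two-item fallback.
func newFake() fakeRetriever {
	return fakeRetriever{
		byTopic:  map[string][]Candidate{"sensor drift": {{ID: "e1"}, {ID: "e2"}}},
		fallback: []Candidate{{ID: "f1"}, {ID: "f2"}},
	}
}

// fetch returns the candidate count and degraded flag for a topic.
func fetch(r CandidateRetriever, topic string, limit int) (int, bool, error) {
	set, err := r.FetchCandidates(context.Background(), topic, nil, limit)
	return len(set.Candidates), set.Degraded, err
}

func main() {
	r := newFake()
	fmt.Println(fetch(r, "sensor drift", 10))
	fmt.Println(fetch(r, "unknown topic", 1))
}
```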

type CandidateSet

type CandidateSet struct {
	Candidates     []research.Candidate
	Degraded       bool
	DegradedReason string
}

CandidateSet is the retriever's return shape — separate type so the interface can evolve (add Strategy field, etc.) without changing the retriever signature.

type Classifier

type Classifier interface {
	ClassifyQuery(ctx context.Context, topic string) *query.ClassificationResult
}

Classifier is the narrow surface this component consumes from graph/query. Production satisfies it with *query.ClassifierChain (via classifierChainAdapter so the nil-check + Tier expansion live in adapters.go); tests substitute a fake that returns deterministic ClassificationResult shapes per scenario.
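A test fake for this interface can be very small. The sketch below is an assumption-laden illustration: `ClassificationResult` is a local stand-in for query.ClassificationResult with two guessed fields (`Tier`, `Options`), and `fakeClassifier`, `newFakeClassifier`, and `hintsFor` are hypothetical names. It also shows the nil check a caller needs, since the interface returns a pointer and no error.

```go
package main

import (
	"context"
	"fmt"
)

// ClassificationResult is a hypothetical stand-in for
// query.ClassificationResult; only fields this sketch needs are modelled.
type ClassificationResult struct {
	Tier    int            // which tier produced the result (assumed field)
	Options map[string]any // flattened search facets (assumed shape)
}

// Classifier mirrors the documented interface, using the stand-in result type.
type Classifier interface {
	ClassifyQuery(ctx context.Context, topic string) *ClassificationResult
}

// fakeClassifier returns a scripted result per topic and nil otherwise.
type fakeClassifier map[string]*ClassificationResult

func (f fakeClassifier) ClassifyQuery(_ context.Context, topic string) *ClassificationResult {
	return f[topic] // nil when the topic has no scripted result
}

// newFakeClassifier scripts a single deterministic result.
func newFakeClassifier() fakeClassifier {
	return fakeClassifier{
		"sensor drift": {Tier: 1, Options: map[string]any{"strategy": "keyword"}},
	}
}

// hintsFor returns the options for a topic, or false when the classifier
// produced no result.
func hintsFor(c Classifier, topic string) (map[string]any, bool) {
	res := c.ClassifyQuery(context.Background(), topic)
	if res == nil {
		return nil, false
	}
	return res.Options, true
}

func main() {
	c := newFakeClassifier()
	fmt.Println(hintsFor(c, "sensor drift"))
	fmt.Println(hintsFor(c, "unscripted topic"))
}
```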

type Component

type Component struct {
	// contains filtered or unexported fields
}

Component implements the nl_classify processor. The struct field set is intentionally small; the lifecycle methods (Start/Stop) own the NATS / model-registry plumbing and the per-message handler hands off to the pure classifyAndRetrieve function in handler.go.

func (*Component) ConfigSchema

func (c *Component) ConfigSchema() component.ConfigSchema

ConfigSchema implements Discoverable.

func (*Component) DataFlow

func (c *Component) DataFlow() component.FlowMetrics

DataFlow implements Discoverable.

func (*Component) Health

func (c *Component) Health() component.HealthStatus

Health implements Discoverable.

func (*Component) Initialize

func (c *Component) Initialize() error

Initialize is part of the LifecycleComponent contract. nl_classify has nothing to initialise before Start: opening the bucket and wiring the LLM client happen in Start(), where they can fail loudly and propagate without breaking other components in the same Initialize phase.

func (*Component) InputPorts

func (c *Component) InputPorts() []component.Port

InputPorts implements Discoverable.

func (*Component) Meta

func (c *Component) Meta() component.Metadata

Meta implements Discoverable.

func (*Component) OutputPorts

func (c *Component) OutputPorts() []component.Port

OutputPorts implements Discoverable. nl_classify has no NATS-publishing output port: it emits only KV writes to AGENT_LOOPS. The empty slice surfaces honestly in flow-discovery so operators don't expect a NATS subject they'll never see.

func (*Component) Start

func (c *Component) Start(ctx context.Context) error

Start opens the AGENT_LOOPS bucket, optionally wires the LLM tier into the classifier chain, subscribes to the configured input ports, and reports idle.

func (*Component) Stop

func (c *Component) Stop(timeout time.Duration) error

Stop drains subscriptions, closes the LLM client, and reports shutdown. It flips `started` under c.mu so a concurrent Health or DataFlow call cannot observe a torn lifecycle flag.
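The locking pattern described for the lifecycle flag can be sketched in isolation. This is not the component's code: the `lifecycle` type and `stopWithReaders` helper are hypothetical, the real Stop also takes a timeout and drains subscriptions, and whether c.mu is a sync.Mutex or sync.RWMutex is not stated in this documentation (an RWMutex is assumed here).

```go
package main

import (
	"fmt"
	"sync"
)

// lifecycle is a hypothetical reduction of the component's lifecycle state.
type lifecycle struct {
	mu      sync.RWMutex
	started bool
}

// Start and Stop write the flag under the write lock.
func (l *lifecycle) Start() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.started = true
}

func (l *lifecycle) Stop() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.started = false
}

// Started reads the flag under the read lock, as a Health or DataFlow
// style accessor would.
func (l *lifecycle) Started() bool {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.started
}

// stopWithReaders stops the lifecycle while n goroutines poll Started and
// returns the flag value once everything has finished.
func stopWithReaders(n int) bool {
	l := &lifecycle{}
	l.Start()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = l.Started() // safe concurrent read
		}()
	}
	l.Stop()
	wg.Wait()
	return l.Started()
}

func main() {
	fmt.Println(stopWithReaders(8))
}
```

Running a variant of this under the race detector (`go run -race`) is a quick way to confirm that every access to the flag goes through the lock.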

type Config

type Config struct {
	Ports *component.PortConfig `` /* 165-byte string literal not displayed */

	LoopsBucket string `` /* 180-byte string literal not displayed */

	ClassifyTimeout time.Duration `` /* 233-byte string literal not displayed */

	RetrieveTimeout time.Duration `` /* 197-byte string literal not displayed */

	MaxCandidates int `` /* 240-byte string literal not displayed */

	// EnableLLMClassifier toggles wiring the T3 LLM tier into the
	// classifier chain when a query_classification capability is
	// configured in the model registry. Default false matches PR 2's
	// "thinnest viable" stance — operators opt in once they've
	// observed which topics the keyword tier handles.
	EnableLLMClassifier bool `` /* 230-byte string literal not displayed */
}

Config holds operator-tunable knobs for the nl_classify component.

LLM classifier wiring follows the same model-registry capability seam as processor/graph-query: the operator declares CapabilityQueryClassification in the model registry; the component resolves it at Start() time. Absence is a clean disable (keyword-only chain) rather than a startup error, the same shape as graph-query's initLLMClassifier.
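The opt-in and clean-disable behaviour can be illustrated with a small decision function. This is a sketch under assumptions: the registry is modelled as a plain map, the capability key string is taken from the `query_classification` name used elsewhere in this documentation, and `chainTiers` and the tier labels are hypothetical.

```go
package main

import "fmt"

// capabilityQueryClassification is assumed to match the capability name
// mentioned in the docs; the real constant lives in another package.
const capabilityQueryClassification = "query_classification"

// chainTiers returns the classifier tiers to wire. The keyword tier is
// always present; the LLM tier is added only when the operator opted in
// and the (hypothetical) registry map supplies a model for the capability.
func chainTiers(enableLLM bool, registry map[string]string) []string {
	tiers := []string{"keyword"}
	if !enableLLM {
		return tiers
	}
	if model, ok := registry[capabilityQueryClassification]; ok && model != "" {
		tiers = append(tiers, "llm:"+model)
	}
	// A missing capability is not an error: the chain stays keyword-only.
	return tiers
}

func main() {
	registry := map[string]string{capabilityQueryClassification: "small-model"}
	fmt.Println(chainTiers(false, registry))
	fmt.Println(chainTiers(true, map[string]string{}))
	fmt.Println(chainTiers(true, registry))
}
```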

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns a default Config skeleton with the standard nl_classify input port. Operators typically copy this and add a chain-specific port name override.

func (*Config) ApplyDefaults

func (c *Config) ApplyDefaults()

ApplyDefaults fills in defaults for unset fields.

func (*Config) Validate

func (c *Config) Validate() error

Validate validates the configuration. MaxCandidates < 0 is rejected; 0 falls through to ApplyDefaults (treated as "use default"). The schema tag intentionally omits `min:1` so JSON-omitted MaxCandidates keys round-trip cleanly through validation.
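The MaxCandidates rule can be shown with a reduced Config. This is only a sketch: the one-field struct, the `resolve` helper, and the default value of 10 are assumptions made for illustration; the real default is not shown in this documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultMaxCandidates is a hypothetical value chosen for this sketch.
const defaultMaxCandidates = 10

// Config is reduced to the single field under discussion.
type Config struct {
	MaxCandidates int
}

// Validate rejects negative values and lets zero through.
func (c *Config) Validate() error {
	if c.MaxCandidates < 0 {
		return errors.New("max candidates must not be negative")
	}
	return nil
}

// ApplyDefaults treats zero as "use default".
func (c *Config) ApplyDefaults() {
	if c.MaxCandidates == 0 {
		c.MaxCandidates = defaultMaxCandidates
	}
}

// resolve validates first, then applies defaults, and returns the
// effective limit.
func resolve(maxCandidates int) (int, error) {
	cfg := Config{MaxCandidates: maxCandidates}
	if err := cfg.Validate(); err != nil {
		return 0, err
	}
	cfg.ApplyDefaults()
	return cfg.MaxCandidates, nil
}

func main() {
	fmt.Println(resolve(0))
	fmt.Println(resolve(5))
	fmt.Println(resolve(-1))
}
```

Because a JSON document that omits the key decodes to zero, accepting zero in Validate is what lets an omitted key and an explicit default behave the same way.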

type LoopStore

type LoopStore interface {
	GetIntent(ctx context.Context, loopID string) (*research.Intent, error)
	PutClassifierOutput(ctx context.Context, loopID string, envelope []byte) error
	PutSnapshot(ctx context.Context, loopID string, envelope []byte) error
}

LoopStore is the AGENT_LOOPS read/write surface this component consumes. Production wraps natsclient.KVStore; tests substitute an in-memory map.

Three methods so the component can:

  • GetIntent: load the research_intent payload from the research.requested.<loopID> key written by research_graph (PR 1).
  • PutClassifierOutput: persist the ClassifierOutput envelope at the classify.complete.<loopID> key — the trigger R1 watches.
  • PutSnapshot: optionally stash a copy of the classifier output at a stable, non-trigger key (classify.snapshot.<loopID>) so downstream components / human operators can read the full output without racing R1's wildcard watcher. PR 6 may unify this with the completion key once the rule contract is finalised; for PR 2 the snapshot key is a no-op in tests unless explicitly observed.
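An in-memory substitute of the kind described above might look like the following. It is a sketch, not the test helper from this package: `Intent` is a stand-in for research.Intent with one guessed field, the intent is assumed to be stored as JSON, and `memStore`, `put`, `has`, and `roundTrip` are hypothetical names. The three key patterns are the ones listed above.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"sync"
)

// Intent is a hypothetical stand-in for research.Intent.
type Intent struct {
	Topic string `json:"topic"`
}

// memStore is a map-backed store guarded by a mutex.
type memStore struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newMemStore() *memStore { return &memStore{data: map[string][]byte{}} }

// put stores a copy of value so callers can reuse their buffer.
func (m *memStore) put(key string, value []byte) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = append([]byte(nil), value...)
}

func (m *memStore) has(key string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	_, ok := m.data[key]
	return ok
}

// GetIntent reads and decodes the research.requested.<loopID> key.
func (m *memStore) GetIntent(_ context.Context, loopID string) (*Intent, error) {
	m.mu.Lock()
	raw, ok := m.data["research.requested."+loopID]
	m.mu.Unlock()
	if !ok {
		return nil, fmt.Errorf("no intent for loop %q", loopID)
	}
	var intent Intent
	if err := json.Unmarshal(raw, &intent); err != nil {
		return nil, err
	}
	return &intent, nil
}

// PutClassifierOutput writes the trigger key; PutSnapshot the stable copy.
func (m *memStore) PutClassifierOutput(_ context.Context, loopID string, envelope []byte) error {
	m.put("classify.complete."+loopID, envelope)
	return nil
}

func (m *memStore) PutSnapshot(_ context.Context, loopID string, envelope []byte) error {
	m.put("classify.snapshot."+loopID, envelope)
	return nil
}

// roundTrip seeds an intent, reads it back, and writes both output keys.
func roundTrip() (string, bool, bool, error) {
	ctx := context.Background()
	store := newMemStore()
	store.put("research.requested.loop-1", []byte(`{"topic":"sensor drift"}`))
	intent, err := store.GetIntent(ctx, "loop-1")
	if err != nil {
		return "", false, false, err
	}
	envelope := []byte(`{"candidates":[]}`)
	if err := store.PutClassifierOutput(ctx, "loop-1", envelope); err != nil {
		return "", false, false, err
	}
	if err := store.PutSnapshot(ctx, "loop-1", envelope); err != nil {
		return "", false, false, err
	}
	return intent.Topic, store.has("classify.complete.loop-1"), store.has("classify.snapshot.loop-1"), nil
}

func main() {
	fmt.Println(roundTrip())
}
```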
