researchclassify

package

Version: v1.0.0-beta.84 Published: May 23, 2026 License: MIT Imports: 21 Imported by: 0

Repository

github.com/c360studio/semstreams

Documentation ¶

Overview ¶

Package researchclassify implements the nl_classify component from ADR-045 Phase 1 (PR 2 of six per docs/operations/22-adr045-phase1-plan.md).

nl_classify is the thinnest stage of the graph-search rule chain. It receives a publish trigger on component.nl_classify.<loop_id> and reads the research.Intent payload from AGENT_LOOPS. It then runs the existing graph/query.ClassifierChain (keyword, plus optional embedding and LLM variants) on the topic and executes the resulting SearchOptions against the graph to retrieve initial candidates. Finally, it writes a research.ClassifierOutput payload, on which R1 (PR 6) will fire route_search.

Architectural notes:

The component wraps the existing classifier primitive; it makes no LLM calls beyond those the classifier chain already performs.
SearchOptions execution flows through the existing graph.query.searchGraph NATS surface so PR 4 (execute_subqueries) and this component share one evidence schema (Candidate ≡ subset of GlobalSearchResponse.EntityDigests).
All public methods are safe for concurrent use across loops; the component holds no per-call mutable state.
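The flow described in this overview can be sketched as four sequential steps. This is a minimal illustration, not the package's handler: the Intent, Candidate, Classification, Output and Stage types below are hypothetical stand-ins for research.Intent, research.Candidate, query.ClassificationResult, research.ClassifierOutput and the component's injected adapters, and their fields are assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Stand-ins for research.Intent, research.Candidate,
// query.ClassificationResult and research.ClassifierOutput.
// Field names are assumptions made for this sketch.
type Intent struct{ Topic string }
type Candidate struct{ ID string }
type Classification struct {
	Tier  string
	Hints map[string]any
}
type Output struct {
	Topic      string
	Tier       string
	Candidates []Candidate
}

// Stage bundles hypothetical function seams that play the roles of
// LoopStore, Classifier and CandidateRetriever.
type Stage struct {
	LoadIntent func(ctx context.Context, loopID string) (*Intent, error)
	Classify   func(ctx context.Context, topic string) *Classification
	Retrieve   func(ctx context.Context, topic string, hints map[string]any, limit int) ([]Candidate, error)
	Store      func(ctx context.Context, loopID string, out Output) error
}

// Run reads the intent, classifies its topic, retrieves candidates with
// the classifier's hints, and stores the combined output.
func (s Stage) Run(ctx context.Context, loopID string, limit int) error {
	intent, err := s.LoadIntent(ctx, loopID)
	if err != nil {
		return fmt.Errorf("load intent: %w", err)
	}
	cls := s.Classify(ctx, intent.Topic)
	if cls == nil {
		// Treating a nil result as an error is an assumption of this sketch.
		return errors.New("classifier returned no result")
	}
	cands, err := s.Retrieve(ctx, intent.Topic, cls.Hints, limit)
	if err != nil {
		return fmt.Errorf("retrieve candidates: %w", err)
	}
	return s.Store(ctx, loopID, Output{Topic: intent.Topic, Tier: cls.Tier, Candidates: cands})
}

// runDemo wires Run to in-memory closures and returns what was stored.
func runDemo() (Output, error) {
	var stored Output
	s := Stage{
		LoadIntent: func(_ context.Context, _ string) (*Intent, error) {
			return &Intent{Topic: "pump failures"}, nil
		},
		Classify: func(_ context.Context, _ string) *Classification {
			return &Classification{Tier: "keyword", Hints: map[string]any{"type": "pump"}}
		},
		Retrieve: func(_ context.Context, _ string, _ map[string]any, limit int) ([]Candidate, error) {
			all := []Candidate{{ID: "a"}, {ID: "b"}, {ID: "c"}}
			if limit >= 0 && limit < len(all) {
				all = all[:limit]
			}
			return all, nil
		},
		Store: func(_ context.Context, _ string, out Output) error {
			stored = out
			return nil
		},
	}
	err := s.Run(context.Background(), "loop-1", 2)
	return stored, err
}

func main() {
	out, err := runDemo()
	fmt.Println(out.Tier, len(out.Candidates), err)
}
```

Because every step is an injected function, the sequence can be exercised without NATS or a live graph stack, which mirrors the testing approach the interface types in this package describe.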

Index ¶

Constants
func NewProcessor(rawConfig json.RawMessage, deps component.Dependencies) (component.Discoverable, error)
func Register(registry *component.Registry) error
type CandidateRetriever
type CandidateSet
type Classifier
type Component
type Config
- func DefaultConfig() Config
- func (c *Config) ApplyDefaults()
- func (c *Config) Validate() error
type LoopStore

Constants ¶

const ComponentName = "research-graph-classify"

ComponentName is the canonical registry name and log subsystem.

Functions ¶

func NewProcessor ¶

func NewProcessor(rawConfig json.RawMessage, deps component.Dependencies) (component.Discoverable, error)

NewProcessor is the component-factory shape registered with the component registry. It parses and validates the config, applies defaults, and constructs the Component with the injected production adapters (NATS-backed LoopStore + searchGraphRetriever). The classifier chain is built keyword-only at construction; the LLM tier is wired in Start() when the model registry supplies a query_classification capability.

func Register ¶

func Register(registry *component.Registry) error

Register registers the nl_classify processor with the supplied component registry. Called from componentregistry.Register at process bootstrap so production binaries pick the component up without extra wiring.

Types ¶

type CandidateRetriever ¶

type CandidateRetriever interface {
	FetchCandidates(ctx context.Context, topic string, hints map[string]any, limit int) (CandidateSet, error)
}

CandidateRetriever fetches initial candidate entities for a topic. Production wraps a NATS request to graph.query.searchGraph; tests supply an in-memory fake so the per-variant pathway tests don't need a live graph stack.

hints carries the classifier-derived SearchOptions facets (already flattened by ClassificationResult.Options) so the implementation can pass them through to the downstream search. Setting CandidateSet.Degraded lets the implementation flag fallback paths (e.g. semantic fallback fired) without losing the candidates that came back.
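An in-memory fake of the kind mentioned above might look like the following sketch. The Candidate type is a one-field stand-in for research.Candidate, fakeRetriever and demoFetch are hypothetical names, and both the "unknown topic means degraded" rule and the handling of limit are assumptions chosen only to show how Degraded can travel alongside real candidates.

```go
package main

import (
	"context"
	"fmt"
)

// Candidate is a stand-in for research.Candidate (assumed shape).
type Candidate struct{ ID string }

type CandidateSet struct {
	Candidates     []Candidate
	Degraded       bool
	DegradedReason string
}

type CandidateRetriever interface {
	FetchCandidates(ctx context.Context, topic string, hints map[string]any, limit int) (CandidateSet, error)
}

// fakeRetriever serves canned candidates per topic. Unknown topics get
// the fallback list and are flagged as degraded.
type fakeRetriever struct {
	byTopic  map[string][]Candidate
	fallback []Candidate
}

func (f fakeRetriever) FetchCandidates(ctx context.Context, topic string, hints map[string]any, limit int) (CandidateSet, error) {
	if err := ctx.Err(); err != nil {
		return CandidateSet{}, err
	}
	set := CandidateSet{}
	cands, ok := f.byTopic[topic]
	if !ok {
		// Keep the fallback candidates but tell the caller how they were found.
		cands = f.fallback
		set.Degraded = true
		set.DegradedReason = "semantic fallback fired"
	}
	if limit > 0 && len(cands) > limit {
		cands = cands[:limit]
	}
	set.Candidates = cands
	return set, nil
}

// demoFetch runs one lookup against a small canned data set.
func demoFetch(topic string, limit int) (CandidateSet, error) {
	var r CandidateRetriever = fakeRetriever{
		byTopic:  map[string][]Candidate{"pumps": {{ID: "p1"}, {ID: "p2"}, {ID: "p3"}}},
		fallback: []Candidate{{ID: "generic"}},
	}
	return r.FetchCandidates(context.Background(), topic, nil, limit)
}

func main() {
	set, err := demoFetch("unknown", 10)
	fmt.Println(len(set.Candidates), set.Degraded, set.DegradedReason, err)
}
```

Returning the flag inside the result rather than as an error lets a caller keep using the candidates while still recording that a fallback path was taken.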

type CandidateSet ¶

type CandidateSet struct {
	Candidates     []research.Candidate
	Degraded       bool
	DegradedReason string
}

CandidateSet is the retriever's return shape — separate type so the interface can evolve (add Strategy field, etc.) without changing the retriever signature.

type Classifier ¶

type Classifier interface {
	ClassifyQuery(ctx context.Context, topic string) *query.ClassificationResult
}

Classifier is the narrow surface this component consumes from graph/query. Production satisfies it with *query.ClassifierChain (via classifierChainAdapter so the nil-check + Tier expansion live in adapters.go); tests substitute a fake that returns deterministic ClassificationResult shapes per scenario.
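A deterministic test fake for this interface could be as small as a map from topic to result. In the sketch below, ClassificationResult is a local stand-in for query.ClassificationResult with assumed fields, and scriptedClassifier, hintsFor and demoHints are hypothetical names; the nil check in hintsFor only illustrates the kind of guard the text attributes to the adapter.

```go
package main

import (
	"context"
	"fmt"
)

// ClassificationResult is a stand-in for query.ClassificationResult;
// both fields are assumptions made for this sketch.
type ClassificationResult struct {
	Tier    int
	Options map[string]any
}

type Classifier interface {
	ClassifyQuery(ctx context.Context, topic string) *ClassificationResult
}

// scriptedClassifier returns a fixed result per topic and nil otherwise.
type scriptedClassifier map[string]*ClassificationResult

func (s scriptedClassifier) ClassifyQuery(_ context.Context, topic string) *ClassificationResult {
	return s[topic]
}

// hintsFor nil-checks the classifier result before reading its options.
func hintsFor(c Classifier, topic string) (map[string]any, bool) {
	res := c.ClassifyQuery(context.Background(), topic)
	if res == nil {
		return nil, false
	}
	return res.Options, true
}

// demoHints classifies one topic with a single scripted scenario.
func demoHints(topic string) (map[string]any, bool) {
	c := scriptedClassifier{
		"pump failures": {Tier: 1, Options: map[string]any{"entity_type": "pump"}},
	}
	return hintsFor(c, topic)
}

func main() {
	hints, ok := demoHints("pump failures")
	fmt.Println(hints, ok)
	hints, ok = demoHints("something else")
	fmt.Println(hints, ok)
}
```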

type Component ¶

type Component struct {
	// contains filtered or unexported fields
}

Component implements the nl_classify processor. The struct field set is intentionally small; the lifecycle methods (Start/Stop) own the NATS / model-registry plumbing and the per-message handler hands off to the pure classifyAndRetrieve function in handler.go.

func (*Component) ConfigSchema ¶

func (c *Component) ConfigSchema() component.ConfigSchema

ConfigSchema implements Discoverable.

func (*Component) DataFlow ¶

func (c *Component) DataFlow() component.FlowMetrics

DataFlow implements Discoverable.

func (*Component) Health ¶

func (c *Component) Health() component.HealthStatus

Health implements Discoverable.

func (*Component) Initialize ¶

func (c *Component) Initialize() error

Initialize is part of the LifecycleComponent contract. nl_classify has nothing to initialise pre-Start: the bucket open and LLM client wiring happen in Start(), where they can fail loudly and propagate without breaking other components in the same Initialize phase.

func (*Component) InputPorts ¶

func (c *Component) InputPorts() []component.Port

InputPorts implements Discoverable.

func (*Component) Meta ¶

func (c *Component) Meta() component.Metadata

Meta implements Discoverable.

func (*Component) OutputPorts ¶

func (c *Component) OutputPorts() []component.Port

OutputPorts implements Discoverable. nl_classify has no NATS-publishing output port: its only outputs are KV writes to AGENT_LOOPS. The empty slice surfaces honestly in flow-discovery so operators don't expect a NATS subject they'll never see.

func (*Component) Start ¶

func (c *Component) Start(ctx context.Context) error

Start opens the AGENT_LOOPS bucket, optionally wires the LLM tier into the classifier chain, subscribes to the configured input ports, and reports idle.

func (*Component) Stop ¶

func (c *Component) Stop(timeout time.Duration) error

Stop drains subscriptions, closes the LLM client, and reports shutdown. It flips `started` under c.mu so that a concurrent Health / DataFlow call can't observe a torn read of the lifecycle flag.
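The locking pattern described for the lifecycle flag can be illustrated in isolation. This is a sketch under assumptions: the lifecycle type and its Running method are hypothetical, and the choice of sync.RWMutex is a guess, since the text only says the flag is flipped under c.mu.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// lifecycle guards a started flag with a mutex so readers and writers
// never race on it.
type lifecycle struct {
	mu      sync.RWMutex
	started bool
}

// Start flips the flag on, rejecting a second Start.
func (l *lifecycle) Start() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.started {
		return errors.New("already started")
	}
	l.started = true
	return nil
}

// Stop flips the flag off under the write lock.
func (l *lifecycle) Stop() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.started = false
}

// Running is what a Health-style reader would call; it takes the read lock.
func (l *lifecycle) Running() bool {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.started
}

func main() {
	var l lifecycle
	var wg sync.WaitGroup
	// Concurrent readers overlap with Start and Stop without a data race.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = l.Running()
		}()
	}
	_ = l.Start()
	l.Stop()
	wg.Wait()
	fmt.Println("running after stop:", l.Running())
}
```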

type Config ¶

type Config struct {
	Ports *component.PortConfig `` /* 165-byte string literal not displayed */

	LoopsBucket string `` /* 180-byte string literal not displayed */

	ClassifyTimeout time.Duration `` /* 233-byte string literal not displayed */

	RetrieveTimeout time.Duration `` /* 197-byte string literal not displayed */

	MaxCandidates int `` /* 240-byte string literal not displayed */

	// EnableLLMClassifier toggles wiring the T3 LLM tier into the
	// classifier chain when a query_classification capability is
	// configured in the model registry. Default false matches PR 2's
	// "thinnest viable" stance — operators opt in once they've
	// observed which topics the keyword tier handles.
	EnableLLMClassifier bool `` /* 230-byte string literal not displayed */
}

Config holds operator-tunable knobs for the nl_classify component.

LLM classifier wiring follows the same model-registry capability seam as processor/graph-query: the operator declares CapabilityQueryClassification in the model registry; the component resolves it at Start() time. Absence is a clean disable (keyword-only chain) rather than a startup error, the same shape as graph-query's initLLMClassifier.

func DefaultConfig ¶

func DefaultConfig() Config

DefaultConfig returns a default Config skeleton with the standard nl_classify input port. Operators typically copy this and add a chain-specific port name override.

func (*Config) ApplyDefaults ¶

func (c *Config) ApplyDefaults()

ApplyDefaults fills in defaults for unset fields.

func (*Config) Validate ¶

func (c *Config) Validate() error

Validate validates the configuration. MaxCandidates < 0 is rejected; 0 falls through to ApplyDefaults (treated as "use default"). The schema tag intentionally omits `min:1` so that JSON-omitted MaxCandidates keys round-trip cleanly through validation.
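The interplay between Validate and ApplyDefaults for MaxCandidates can be sketched with a one-field config. The JSON key max_candidates, the default of 20 and the load helper are assumptions for illustration; the real Config has more fields and its struct tags are not shown on this page.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// defaultMaxCandidates is a hypothetical default chosen for this sketch.
const defaultMaxCandidates = 20

// Config mirrors only the MaxCandidates knob; the JSON key is assumed.
type Config struct {
	MaxCandidates int `json:"max_candidates"`
}

// Validate rejects negative values and lets zero through.
func (c *Config) Validate() error {
	if c.MaxCandidates < 0 {
		return fmt.Errorf("max_candidates must be >= 0, got %d", c.MaxCandidates)
	}
	return nil
}

// ApplyDefaults treats zero as "use the default".
func (c *Config) ApplyDefaults() {
	if c.MaxCandidates == 0 {
		c.MaxCandidates = defaultMaxCandidates
	}
}

// load parses, validates, then applies defaults, in that order.
func load(raw []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(raw, &c); err != nil {
		return Config{}, fmt.Errorf("parse config: %w", err)
	}
	if err := c.Validate(); err != nil {
		return Config{}, err
	}
	c.ApplyDefaults()
	return c, nil
}

func main() {
	for _, raw := range []string{`{}`, `{"max_candidates": 5}`, `{"max_candidates": -1}`} {
		c, err := load([]byte(raw))
		fmt.Println(c.MaxCandidates, err)
	}
}
```

Accepting zero at validation time is what allows an omitted key, which unmarshals to the zero value, to pass through and pick up the default afterwards.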

type LoopStore ¶

type LoopStore interface {
	GetIntent(ctx context.Context, loopID string) (*research.Intent, error)
	PutClassifierOutput(ctx context.Context, loopID string, envelope []byte) error
	PutSnapshot(ctx context.Context, loopID string, envelope []byte) error
}

LoopStore is the AGENT_LOOPS read/write surface this component consumes. Production wraps natsclient.KVStore; tests substitute an in-memory map.

Three methods so the component can:

GetIntent: load the research_intent payload from the research.requested.<loopID> key written by research_graph (PR 1).
PutClassifierOutput: persist the ClassifierOutput envelope at the classify.complete.<loopID> key — the trigger R1 watches.
PutSnapshot: optionally stash a copy of the classifier output at a stable, non-trigger key (classify.snapshot.<loopID>) so downstream components / human operators can read the full output without racing R1's wildcard watcher. PR 6 may unify this with the completion key once the rule contract is finalised; for PR 2 the snapshot key is a no-op in tests unless explicitly observed.
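An in-memory substitute along the lines mentioned above might look like this sketch. The key prefixes come from the method descriptions; everything else is assumed for illustration, including the memStore and demo names, the one-field Intent stand-in for research.Intent, and the idea that the stored intent is plain JSON.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"sync"
)

// Intent is a stand-in for research.Intent; its shape is assumed.
type Intent struct {
	Topic string `json:"topic"`
}

// memStore keeps AGENT_LOOPS-style keys in a mutex-guarded map.
type memStore struct {
	mu sync.Mutex
	kv map[string][]byte
}

// GetIntent reads and decodes the research.requested.<loopID> key.
func (m *memStore) GetIntent(_ context.Context, loopID string) (*Intent, error) {
	m.mu.Lock()
	raw, ok := m.kv["research.requested."+loopID]
	m.mu.Unlock()
	if !ok {
		return nil, fmt.Errorf("no intent for loop %q", loopID)
	}
	var in Intent
	if err := json.Unmarshal(raw, &in); err != nil {
		return nil, err
	}
	return &in, nil
}

// put stores a copy so later caller mutations don't leak into the map.
func (m *memStore) put(key string, envelope []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.kv[key] = append([]byte(nil), envelope...)
	return nil
}

// PutClassifierOutput writes the trigger key.
func (m *memStore) PutClassifierOutput(_ context.Context, loopID string, envelope []byte) error {
	return m.put("classify.complete."+loopID, envelope)
}

// PutSnapshot writes the stable, non-trigger key.
func (m *memStore) PutSnapshot(_ context.Context, loopID string, envelope []byte) error {
	return m.put("classify.snapshot."+loopID, envelope)
}

// demo seeds one intent, reads it back, and writes both output keys.
func demo() (*memStore, *Intent, error) {
	ctx := context.Background()
	s := &memStore{kv: map[string][]byte{
		"research.requested.loop-1": []byte(`{"topic":"pump failures"}`),
	}}
	in, err := s.GetIntent(ctx, "loop-1")
	if err != nil {
		return nil, nil, err
	}
	env := []byte(`{"topic":"pump failures","candidates":[]}`)
	if err := s.PutClassifierOutput(ctx, "loop-1", env); err != nil {
		return nil, nil, err
	}
	if err := s.PutSnapshot(ctx, "loop-1", env); err != nil {
		return nil, nil, err
	}
	return s, in, nil
}

func main() {
	s, in, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(in.Topic, len(s.kv))
}
```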
