toolsindex

package
v1.75.1
Published: May 31, 2026 License: Apache-2.0 Imports: 5 Imported by: 0

Documentation

Overview

Package toolsindex is the tools-discovery consumer of the indexjobs framework (#440). It embeds every globally-visible MCP tool's descriptor (name + description + parameter-schema summary) under source_kind "tools" and ranks them by cosine similarity for platform_find_tools.

Unlike the api-catalog consumer, the tool corpus is not a DB table: tools are registered in-process from compiled-in toolkits plus admin visibility config. So the Source (in pkg/platform) enumerates the live registry; this package owns only the vector storage (a Sink over tool_embeddings) and the query-time ranking. The expected-count breadcrumb the reconciler diffs against lives in the framework-owned index_sources table (migration 000053).

Constants

const SourceID = "platform"

SourceID is the single logical tool-corpus identifier. There is one tool registry per deployment, identical across replicas (same binary plus the same DB-backed visibility config), so a constant source_id is sufficient; vectors keyed on it are shared by every replica.

const SourceKind = "tools"

SourceKind is the indexjobs source_kind this package serves.

Variables

This section is empty.

Functions

This section is empty.

Types

type CurrentItemsFunc added in v1.75.1

type CurrentItemsFunc func(ctx context.Context) ([]indexjobs.Item, error)

CurrentItemsFunc returns the live tool corpus to index: the same items the worker's Source.LoadItems produces. The Sink calls it from FindGaps to diff the running registry against the persisted vectors, so gap detection sees in-process changes (description edits, deny flips) that a DB count never could.

type ScoredTool

type ScoredTool struct {
	ToolName string
	Score    float64
}

ScoredTool is one tool name with its cosine similarity to a query, returned by the store's similarity ranking. Score is in [-1, 1] (1 = identical direction); for the platform's embeddings it is effectively in [0, 1].

type Sink

type Sink struct {
	// contains filtered or unexported fields
}

Sink implements indexjobs.Sink for the tools kind over the tool_embeddings table. The key's SourceID is used verbatim as the tool_embeddings source_id; there is no composite encoding because, unlike api-catalog, the tool corpus is a single flat set per source.

func NewSink

func NewSink(store *Store, currentItems CurrentItemsFunc) *Sink

NewSink returns a Sink backed by the given store. currentItems supplies the live tool corpus for content-drift gap detection; when nil, FindGaps falls back to re-syncing every sweep so a wiring mistake never silently stops indexing.

func (*Sink) Coverage added in v1.73.0

func (s *Sink) Coverage(ctx context.Context) (indexjobs.Coverage, error)

Coverage reports the number of indexed tool vectors. ExpectedKnown is false: the tools kind stamps no expected count (FindGaps detects drift by comparing embed-text hashes against the live registry rather than by count), so the dashboard shows a sync indicator from the latest job status rather than an indexed/expected ratio.

func (*Sink) FindGaps

func (s *Sink) FindGaps(ctx context.Context) ([]string, error)

FindGaps reports whether the live tool corpus has drifted from the persisted vectors, returning the single tools source when it has and an empty slice when the index is in sync.

The tool set lives in the running process (compiled-in toolkits plus admin visibility and description config), and it drifts in ways a count comparison cannot see: a description edit or a deny flip changes the live descriptor without changing the stored vector count. So rather than the api-catalog count diff, this enumerates the live tools and compares each one's embed-text hash against the persisted vector, returning the source on any add, removal, or edit. A steady-state registry produces no gap, so the reconciler stops enqueuing the every-sweep no-op job the unconditional predecessor produced (issue #511); an actual change still converges within one reconcile interval.

A nil currentItems is a wiring fault, not a steady state, so FindGaps fails safe by reporting the source for re-sync rather than silently reporting no gap.
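The drift check described above can be sketched roughly as follows. This is not the package's implementation: liveTool stands in for indexjobs.Item (whose fields are not shown here), the stored map stands in for the persisted vectors' text hashes, SHA-256 is an assumed hash function, and tool names are assumed to be unique within the live set.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

const sourceID = "platform"

// liveTool is a hypothetical stand-in for indexjobs.Item.
type liveTool struct {
	Name      string
	EmbedText string
}

// textHash is an assumed hash of the embed text (SHA-256, hex-encoded).
func textHash(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// findGaps returns the single source ID when the live tools differ from the
// stored name->hash map (add, removal, or edit), and an empty slice otherwise.
// A nil current func is treated as a wiring fault and reports the source.
func findGaps(current func() ([]liveTool, error), stored map[string]string) ([]string, error) {
	if current == nil {
		return []string{sourceID}, nil // fail safe: re-sync
	}
	live, err := current()
	if err != nil {
		return nil, err
	}
	if len(live) != len(stored) {
		return []string{sourceID}, nil // add or removal
	}
	for _, t := range live {
		if h, ok := stored[t.Name]; !ok || h != textHash(t.EmbedText) {
			return []string{sourceID}, nil // new name or edited text
		}
	}
	return []string{}, nil // in sync
}

func main() {
	stored := map[string]string{"search": textHash("Search the catalog")}
	same := func() ([]liveTool, error) {
		return []liveTool{{"search", "Search the catalog"}}, nil
	}
	edited := func() ([]liveTool, error) {
		return []liveTool{{"search", "Search the data catalog"}}, nil
	}
	fmt.Println(findGaps(same, stored))   // no gap
	fmt.Println(findGaps(edited, stored)) // description edit, same count
	fmt.Println(findGaps(nil, stored))    // wiring fault
}
```

The edited case is the one a count diff would miss: one live tool, one stored vector, but a different embed-text hash.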

func (*Sink) Kind

func (*Sink) Kind() string

Kind reports the tools source kind.

func (*Sink) ListExisting

func (s *Sink) ListExisting(ctx context.Context, key indexjobs.Key) (map[string]indexjobs.Vector, error)

ListExisting returns the persisted vectors keyed by tool name for the worker's dedup pass.

func (*Sink) StampExpected

func (*Sink) StampExpected(context.Context, indexjobs.Key, int) error

StampExpected is a no-op for the tools kind. The framework calls it after a successful embed to record an expected item count for count-based gap detection, but the tools kind detects gaps by diffing the live registry against the persisted vectors (see Sink.FindGaps), so there is no count to record.

func (*Sink) Upsert

func (s *Sink) Upsert(ctx context.Context, key indexjobs.Key, rows []indexjobs.Vector) error

Upsert atomically replaces the full vector set for the source, so a tool dropped from the registry has its stale vector removed.

func (*Sink) UpsertBatch

func (s *Sink) UpsertBatch(ctx context.Context, key indexjobs.Key, rows []indexjobs.Vector) error

UpsertBatch writes one chunk in place without disturbing rows outside it (the worker's incremental progress persistence).

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store persists tool embedding vectors (tool_embeddings) and the expected-count breadcrumb (index_sources) and answers the query-time cosine ranking. Backed by PostgreSQL + pgvector.

func NewStore

func NewStore(db *sql.DB) *Store

NewStore returns a Store over the given database.

func (*Store) Coverage added in v1.73.0

func (s *Store) Coverage(ctx context.Context) (int, error)

Coverage returns the number of indexed tool vectors across every source (one source today, "platform"). The tools kind stamps no expected count, because Sink.FindGaps detects drift by diffing the live registry against the persisted vectors rather than by count, so only the indexed total is meaningful; the admin dashboard pairs it with the latest job status to show a sync indicator rather than an indexed/expected ratio.

func (*Store) ListVectors

func (s *Store) ListVectors(ctx context.Context, sourceID string) (map[string]indexjobs.Vector, error)

ListVectors returns every persisted vector for the source, keyed by tool name, for the worker's text-hash dedup pass.

func (*Store) RankBySimilarity

func (s *Store) RankBySimilarity(ctx context.Context, sourceID string, queryVec []float32) ([]ScoredTool, error)

RankBySimilarity returns every indexed tool for the source ordered by cosine similarity to queryVec (most similar first). pgvector's `<=>` is the cosine-distance operator, so 1 - distance is the similarity. No LIMIT is applied: the corpus is small (low hundreds) and the caller filters by persona before capping, which must happen on the full ranked set to avoid a denied tool consuming a top-K slot.
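The reason for filtering before capping can be shown with a small caller-side sketch. It assumes a hypothetical allowed predicate standing in for the persona check and redeclares ScoredTool locally so the example is self-contained; topKAllowed is not part of this package.

```go
package main

import "fmt"

// ScoredTool mirrors the documented struct, redeclared so the sketch compiles alone.
type ScoredTool struct {
	ToolName string
	Score    float64
}

// topKAllowed walks the full ranked list (most similar first), skips tools the
// persona may not see, and stops once k allowed tools have been collected.
func topKAllowed(ranked []ScoredTool, allowed func(name string) bool, k int) []ScoredTool {
	if k <= 0 {
		return nil
	}
	out := make([]ScoredTool, 0, k)
	for _, t := range ranked {
		if len(out) == k {
			break
		}
		if allowed(t.ToolName) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	ranked := []ScoredTool{
		{"admin_delete", 0.9}, // denied for this persona
		{"search", 0.8},
		{"list", 0.7},
	}
	allowed := func(name string) bool { return name != "admin_delete" }
	for _, t := range topKAllowed(ranked, allowed, 2) {
		fmt.Println(t.ToolName, t.Score)
	}
}
```

Had the list been truncated to two entries before filtering, the denied admin_delete would have taken a slot and only search would have survived; filtering the full ranked set keeps both search and list.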

func (*Store) Replace

func (s *Store) Replace(ctx context.Context, sourceID string, rows []indexjobs.Vector) error

Replace atomically swaps the full vector set for the source: it deletes every existing row for source_id and inserts the supplied set in one transaction, so a tool removed from the registry has its stale vector dropped.

func (*Store) UpsertBatch

func (s *Store) UpsertBatch(ctx context.Context, sourceID string, rows []indexjobs.Vector) error

UpsertBatch inserts or updates the supplied rows in place without deleting rows outside the batch (incremental progress for the worker's per-chunk persistence).
