Documentation ¶
Overview ¶
Package toolsindex is the tools-discovery consumer of the indexjobs framework (#440). It embeds the descriptor of every globally visible MCP tool (name, description, and parameter-schema summary) under source_kind "tools" and ranks the tools by cosine similarity to the query for platform_find_tools.
Unlike the api-catalog consumer, the tool corpus is not a DB table: tools are registered in-process from compiled-in toolkits plus admin visibility config. So the Source (in pkg/platform) enumerates the live registry; this package owns only the vector storage (a Sink over tool_embeddings) and the query-time ranking. The expected-count breadcrumb the reconciler diffs against lives in the framework-owned index_sources table (migration 000053).
Index ¶
- Constants
- type ScoredTool
- type Sink
- func (s *Sink) FindGaps(ctx context.Context) ([]string, error)
- func (*Sink) Kind() string
- func (s *Sink) ListExisting(ctx context.Context, key indexjobs.Key) (map[string]indexjobs.Vector, error)
- func (*Sink) StampExpected(context.Context, indexjobs.Key, int) error
- func (s *Sink) Upsert(ctx context.Context, key indexjobs.Key, rows []indexjobs.Vector) error
- func (s *Sink) UpsertBatch(ctx context.Context, key indexjobs.Key, rows []indexjobs.Vector) error
- type Store
- func (*Store) FindGaps(_ context.Context) ([]string, error)
- func (s *Store) ListVectors(ctx context.Context, sourceID string) (map[string]indexjobs.Vector, error)
- func (s *Store) RankBySimilarity(ctx context.Context, sourceID string, queryVec []float32) ([]ScoredTool, error)
- func (s *Store) Replace(ctx context.Context, sourceID string, rows []indexjobs.Vector) error
- func (s *Store) UpsertBatch(ctx context.Context, sourceID string, rows []indexjobs.Vector) error
Constants ¶
const SourceID = "platform"
SourceID is the single logical tool-corpus identifier. There is one tool registry per deployment, identical across replicas (same binary plus the same DB-backed visibility config), so a constant source_id is sufficient; vectors keyed on it are shared by every replica.
const SourceKind = "tools"
SourceKind is the indexjobs source_kind this package serves.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type ScoredTool ¶
ScoredTool is one tool name with its cosine similarity to a query, returned by the store's similarity ranking. Score is in [-1, 1] (1 = identical direction); for the platform's normalized embeddings it is effectively [0, 1].
type Sink ¶
type Sink struct {
// contains filtered or unexported fields
}
Sink implements indexjobs.Sink for the tools kind over the tool_embeddings table (vectors) and index_sources (expected count). The key's SourceID is used verbatim as the tool_embeddings source_id; there is no composite encoding because, unlike api-catalog, the tool corpus is a single flat set per source.
func (*Sink) FindGaps ¶
func (s *Sink) FindGaps(ctx context.Context) ([]string, error)
FindGaps returns the source IDs the reconciler should re-sync. For the tools kind the decision is not count-based; see Store.FindGaps.
func (*Sink) ListExisting ¶
func (s *Sink) ListExisting(ctx context.Context, key indexjobs.Key) (map[string]indexjobs.Vector, error)
ListExisting returns the persisted vectors keyed by tool name for the worker's dedup pass.
func (*Sink) StampExpected ¶
func (*Sink) StampExpected(context.Context, indexjobs.Key, int) error
StampExpected is a no-op for the tools kind. The framework calls it after a successful embed to record an expected item count for count-based gap detection, but tools detects gaps by always re-syncing (see Store.FindGaps), so there is no count to record.
type Store ¶
type Store struct {
// contains filtered or unexported fields
}
Store persists tool embedding vectors (tool_embeddings) and the expected-count breadcrumb (index_sources) and answers the query-time cosine ranking. Backed by PostgreSQL + pgvector.
func (*Store) FindGaps ¶
func (*Store) FindGaps(_ context.Context) ([]string, error)
FindGaps always returns the single tools source, so the reconciler re-syncs the tool index on every sweep.
Unlike a DB-backed corpus, the tool set lives in the running process (compiled-in toolkits plus admin visibility/description config), and it drifts in ways a count comparison cannot see: a description edit or a visibility flip changes the live descriptors without changing the stored vector count, so an expected-vs-indexed count diff would report "no gap" while the index is stale.
Returning the source unconditionally makes the worker re-enumerate the live registry on each sweep. Its text-hash dedup (pkg/indexjobs/embed.go) skips the embedding provider for unchanged tools, so a no-change pass costs one in-memory tools/list plus a row rewrite, and any add, remove, edit, or deny-flip converges within one reconcile interval.
The content-blind count check is left to DB-backed consumers (#441+), whose corpus is a table the gap query can compare against directly.
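The reasoning above can be illustrated with a small sketch of hash-based dedup. This is not the code in pkg/indexjobs/embed.go: the `storedVector` type, the `textHash` and `planSync` helpers, and the choice of SHA-256 are all assumptions made for the example. It shows a case where the live and stored counts agree, yet one tool still needs re-embedding.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// storedVector is a hypothetical stand-in for a persisted row: the hash of
// the text that was embedded, plus the vector itself.
type storedVector struct {
	TextHash string
	Values   []float32
}

// textHash fingerprints a descriptor's text (SHA-256 is an assumption here).
func textHash(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

// planSync compares live descriptors (name -> text) with stored rows and
// returns the tools that need a new embedding and the stored rows that no
// longer correspond to a live tool. Both results are sorted by name.
func planSync(live map[string]string, stored map[string]storedVector) (toEmbed, toDelete []string) {
	for name, text := range live {
		row, ok := stored[name]
		if !ok || row.TextHash != textHash(text) {
			toEmbed = append(toEmbed, name) // new tool or edited text
		}
	}
	for name := range stored {
		if _, ok := live[name]; !ok {
			toDelete = append(toDelete, name) // removed or now hidden
		}
	}
	sort.Strings(toEmbed)
	sort.Strings(toDelete)
	return toEmbed, toDelete
}

func main() {
	stored := map[string]storedVector{
		"search": {TextHash: textHash("search: find documents")},
		"export": {TextHash: textHash("export: dump a table")},
	}
	// Same count as stored, but one description was edited.
	live := map[string]string{
		"search": "search: find documents by keyword",
		"export": "export: dump a table",
	}
	toEmbed, toDelete := planSync(live, stored)
	fmt.Println(len(live) == len(stored), toEmbed, toDelete) // true [search] []
}
```

A count diff sees 2 versus 2 and reports nothing, while the hash comparison flags `search` and leaves `export` untouched, so only one call to the embedding provider would be needed.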
func (*Store) ListVectors ¶
func (s *Store) ListVectors(ctx context.Context, sourceID string) (map[string]indexjobs.Vector, error)
ListVectors returns every persisted vector for the source, keyed by tool name, for the worker's text-hash dedup pass.
func (*Store) RankBySimilarity ¶
func (s *Store) RankBySimilarity(ctx context.Context, sourceID string, queryVec []float32) ([]ScoredTool, error)
RankBySimilarity returns every indexed tool for the source ordered by cosine similarity to queryVec (most similar first). pgvector's `<=>` is the cosine-distance operator, so 1 - distance is the similarity. No LIMIT is applied: the corpus is small (low hundreds) and the caller filters by persona before capping, which must happen on the full ranked set to avoid a denied tool consuming a top-K slot.
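A caller-side sketch of why filtering must precede capping follows. The `scored` type, the `topKAllowed` helper, the tool names, and the `allowed` predicate are hypothetical; the real persona check lives outside this package and its shape is not shown in this documentation.

```go
package main

import "fmt"

// scored is a hypothetical stand-in for a ranked result: a tool name plus
// its similarity score. The field names are assumptions.
type scored struct {
	Name  string
	Score float64
}

// topKAllowed walks a list already ordered most-similar-first, drops tools
// the persona may not see, and stops once k results are collected.
func topKAllowed(ranked []scored, allowed func(string) bool, k int) []scored {
	if k <= 0 {
		return nil
	}
	out := make([]scored, 0, k)
	for _, t := range ranked {
		if len(out) == k {
			break
		}
		if allowed(t.Name) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	ranked := []scored{
		{"admin_purge", 0.93},
		{"query_table", 0.88},
		{"list_tables", 0.71},
	}
	denied := map[string]bool{"admin_purge": true}
	allowed := func(name string) bool { return !denied[name] }

	for _, t := range topKAllowed(ranked, allowed, 2) {
		fmt.Println(t.Name, t.Score)
	}
	// Prints:
	// query_table 0.88
	// list_tables 0.71
}
```

Had the list been capped to two rows before filtering, the denied `admin_purge` would have taken a slot and the caller would have been left with a single result.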