Documentation ¶
Overview ¶
Package toolsindex is the tools-discovery consumer of the indexjobs framework (#440). It embeds the descriptor of every globally visible MCP tool (name + description + parameter-schema summary) under source_kind "tools" and ranks the tools by cosine similarity to the query for platform_find_tools.
Unlike the api-catalog consumer, the tool corpus is not a DB table: tools are registered in-process from compiled-in toolkits plus admin visibility config. So the Source (in pkg/platform) enumerates the live registry; this package owns only the vector storage (a Sink over tool_embeddings) and the query-time ranking. The expected-count breadcrumb the reconciler diffs against lives in the framework-owned index_sources table (migration 000053).
Index ¶
- Constants
- type CurrentItemsFunc
- type ScoredTool
- type Sink
- func (s *Sink) Coverage(ctx context.Context) (indexjobs.Coverage, error)
- func (s *Sink) FindGaps(ctx context.Context) ([]string, error)
- func (*Sink) Kind() string
- func (s *Sink) ListExisting(ctx context.Context, key indexjobs.Key) (map[string]indexjobs.Vector, error)
- func (*Sink) StampExpected(context.Context, indexjobs.Key, int) error
- func (s *Sink) Upsert(ctx context.Context, key indexjobs.Key, rows []indexjobs.Vector) error
- func (s *Sink) UpsertBatch(ctx context.Context, key indexjobs.Key, rows []indexjobs.Vector) error
- type Store
- func (s *Store) Coverage(ctx context.Context) (int, error)
- func (s *Store) ListVectors(ctx context.Context, sourceID string) (map[string]indexjobs.Vector, error)
- func (s *Store) RankBySimilarity(ctx context.Context, sourceID string, queryVec []float32) ([]ScoredTool, error)
- func (s *Store) Replace(ctx context.Context, sourceID string, rows []indexjobs.Vector) error
- func (s *Store) UpsertBatch(ctx context.Context, sourceID string, rows []indexjobs.Vector) error
Constants ¶
const SourceID = "platform"
SourceID is the single logical tool-corpus identifier. There is one tool registry per deployment, identical across replicas (same binary plus the same DB-backed visibility config), so a constant source_id is sufficient; vectors keyed on it are shared by every replica.
const SourceKind = "tools"
SourceKind is the indexjobs source_kind this package serves.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CurrentItemsFunc ¶ added in v1.75.1
CurrentItemsFunc returns the live tool corpus to index: the same items the worker's Source.LoadItems produces. The Sink calls it from FindGaps to diff the running registry against the persisted vectors, so gap detection sees changes to the in-process tool set (description edits, deny flips) that a DB count never could.
type ScoredTool ¶
ScoredTool is one tool name with its cosine similarity to a query, returned by the store's similarity ranking. Score is in [-1, 1] (1 = identical direction); for the platform's normalized embeddings it is effectively [0, 1].
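As an illustration of the score range, the following is a minimal sketch of cosine similarity in plain Go. The function name cosineSimilarity and its zero-vector handling are assumptions made for this example; the package itself computes the score in PostgreSQL via pgvector rather than in Go.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|).
// Returning 0 for mismatched lengths or a zero-norm vector is a choice
// made for this sketch only, not documented package behavior.
func cosineSimilarity(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	q := []float32{1, 0}
	fmt.Println(cosineSimilarity(q, []float32{2, 0}))  // 1: same direction
	fmt.Println(cosineSimilarity(q, []float32{0, 3}))  // 0: orthogonal
	fmt.Println(cosineSimilarity(q, []float32{-1, 0})) // -1: opposite direction
}
```

Magnitude does not affect the result: only the angle between the two vectors matters, which is why the first call yields 1 even though the vectors differ in length.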
type Sink ¶
type Sink struct {
// contains filtered or unexported fields
}
Sink implements indexjobs.Sink for the tools kind over the tool_embeddings table. The key's SourceID is used verbatim as the tool_embeddings source_id; there is no composite encoding because, unlike api-catalog, the tool corpus is a single flat set per source.
func NewSink ¶
func NewSink(store *Store, currentItems CurrentItemsFunc) *Sink
NewSink returns a Sink backed by the given store. currentItems supplies the live tool corpus for content-drift gap detection; when nil, FindGaps falls back to re-syncing every sweep so a wiring mistake never silently stops indexing.
func (*Sink) Coverage ¶ added in v1.73.0
func (s *Sink) Coverage(ctx context.Context) (indexjobs.Coverage, error)
Coverage reports the number of indexed tool vectors. ExpectedKnown is false: the tools kind stamps no expected count (FindGaps detects drift by comparing the live registry against the persisted vectors, not by a count diff), so the dashboard shows a sync indicator from the latest job status rather than an indexed/expected ratio.
func (*Sink) FindGaps ¶
func (s *Sink) FindGaps(ctx context.Context) ([]string, error)
FindGaps reports whether the live tool corpus has drifted from the persisted vectors, returning the single tools source when it has and an empty slice when the index is in sync.
The tool set lives in the running process (compiled-in toolkits plus admin visibility and description config), and it drifts in ways a count comparison cannot see: a description edit or a deny flip changes the live descriptor without changing the stored vector count. So instead of the count diff that api-catalog uses, FindGaps enumerates the live tools and compares each one's embed-text hash against the persisted vector, returning the source on any addition, removal, or edit. A steady-state registry produces no gap, so the reconciler no longer enqueues the no-op job that the unconditional predecessor produced on every sweep (issue #511); an actual change still converges within one reconcile interval.
currentItems nil is a wiring fault, not a steady state, so FindGaps fails safe by re-syncing rather than silently reporting no gap.
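The drift check and the nil fallback described above can be sketched roughly as follows. This is not the package's implementation: the item type, the SHA-256 textHash helper, the stored map of tool name to hash, and the findGaps signature are all hypothetical stand-ins chosen to keep the example self-contained.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// item is a hypothetical stand-in for one live tool descriptor.
type item struct {
	Name      string
	EmbedText string
}

// textHash is an assumed hashing scheme (SHA-256 over the embed text).
func textHash(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// hasDrift reports whether the live items differ from the stored
// name -> hash map. It assumes tool names are unique within live.
func hasDrift(live []item, stored map[string]string) bool {
	if len(live) != len(stored) {
		return true // a tool was added or removed
	}
	for _, it := range live {
		h, ok := stored[it.Name]
		if !ok || h != textHash(it.EmbedText) {
			return true // unknown name or edited embed text
		}
	}
	return false
}

// findGaps returns the source when it needs re-indexing, else nothing.
// A nil currentItems is treated as a wiring fault and fails safe.
func findGaps(currentItems func() []item, stored map[string]string, sourceID string) []string {
	if currentItems == nil || hasDrift(currentItems(), stored) {
		return []string{sourceID}
	}
	return nil
}

func main() {
	live := []item{{"query_sales", "Run a sales query"}, {"list_tables", "List tables"}}
	stored := map[string]string{}
	for _, it := range live {
		stored[it.Name] = textHash(it.EmbedText)
	}
	current := func() []item { return live }

	fmt.Println(len(findGaps(current, stored, "platform"))) // 0: in sync
	live[0].EmbedText = "Run a sales query (read-only)"
	fmt.Println(findGaps(current, stored, "platform")) // [platform]: description edit
	fmt.Println(findGaps(nil, stored, "platform"))     // [platform]: nil fails safe
}
```

Note that the description edit leaves the item count unchanged, which is exactly the case a count-based diff would miss.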
func (*Sink) ListExisting ¶
func (s *Sink) ListExisting(ctx context.Context, key indexjobs.Key) (map[string]indexjobs.Vector, error)
ListExisting returns the persisted vectors keyed by tool name for the worker's dedup pass.
func (*Sink) StampExpected ¶
func (*Sink) StampExpected(context.Context, indexjobs.Key, int) error
StampExpected is a no-op for the tools kind. The framework calls it after a successful embed to record an expected item count for count-based gap detection, but the tools kind detects gaps by diffing the live registry against the persisted vectors (see Sink.FindGaps), so there is no count to record.
type Store ¶
type Store struct {
// contains filtered or unexported fields
}
Store persists tool embedding vectors (tool_embeddings) and the expected-count breadcrumb (index_sources) and answers the query-time cosine ranking. Backed by PostgreSQL + pgvector.
func (*Store) Coverage ¶ added in v1.73.0
func (s *Store) Coverage(ctx context.Context) (int, error)
Coverage returns the number of indexed tool vectors across every source (one source today, "platform"). The tools kind stamps no expected count, because gap detection diffs the live registry against the persisted vectors (see Sink.FindGaps), so only the indexed total is meaningful; the admin dashboard pairs it with the latest job status to show a sync indicator rather than an indexed/expected ratio.
func (*Store) ListVectors ¶
func (s *Store) ListVectors(ctx context.Context, sourceID string) (map[string]indexjobs.Vector, error)
ListVectors returns every persisted vector for the source, keyed by tool name, for the worker's text-hash dedup pass.
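A text-hash dedup pass over such a map might look like the sketch below. The storedVector and candidate types and their TextHash fields are hypothetical; the actual fields of indexjobs.Vector are not shown in this documentation.

```go
package main

import "fmt"

// storedVector is a hypothetical stand-in for indexjobs.Vector.
type storedVector struct {
	TextHash  string
	Embedding []float32
}

// candidate is a hypothetical live item with its precomputed text hash.
type candidate struct {
	Name     string
	TextHash string
}

// needsEmbedding returns the candidates that are new or whose text hash
// differs from the persisted one; unchanged items are skipped so they
// are not sent to the embedding model again.
func needsEmbedding(items []candidate, existing map[string]storedVector) []candidate {
	var out []candidate
	for _, it := range items {
		if v, ok := existing[it.Name]; ok && v.TextHash == it.TextHash {
			continue // unchanged: reuse the stored vector
		}
		out = append(out, it)
	}
	return out
}

func main() {
	existing := map[string]storedVector{
		"query_sales": {TextHash: "h1"},
		"list_tables": {TextHash: "h2"},
	}
	items := []candidate{
		{"query_sales", "h1"},     // unchanged
		{"list_tables", "h2-new"}, // description edited
		{"describe_table", "h3"},  // newly registered
	}
	for _, it := range needsEmbedding(items, existing) {
		fmt.Println(it.Name)
	}
}
```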
func (*Store) RankBySimilarity ¶
func (s *Store) RankBySimilarity(ctx context.Context, sourceID string, queryVec []float32) ([]ScoredTool, error)
RankBySimilarity returns every indexed tool for the source ordered by cosine similarity to queryVec (most similar first). pgvector's `<=>` is the cosine-distance operator, so 1 - distance is the similarity. No LIMIT is applied: the corpus is small (low hundreds) and the caller filters by persona before capping, which must happen on the full ranked set to avoid a denied tool consuming a top-K slot.
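To see why the persona filter has to run on the full ranked set before capping, consider the following sketch of the caller side. The scored type, the topKAllowed helper, and the allowed predicate are hypothetical names for this example and are not part of the package API.

```go
package main

import "fmt"

// scored is a hypothetical stand-in for ScoredTool.
type scored struct {
	Name  string
	Score float64
}

// topKAllowed walks ranked (most similar first) and keeps the first k
// tools the persona may see. Filtering happens before the cap, so a
// denied tool never occupies one of the k slots.
func topKAllowed(ranked []scored, allowed func(name string) bool, k int) []scored {
	if k <= 0 {
		return nil
	}
	out := make([]scored, 0, k)
	for _, s := range ranked {
		if len(out) == k {
			break
		}
		if allowed(s.Name) {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	ranked := []scored{
		{"admin_reset", 0.92}, // denied for this persona
		{"query_sales", 0.88},
		{"list_tables", 0.75},
	}
	allowed := func(name string) bool { return name != "admin_reset" }

	for _, s := range topKAllowed(ranked, allowed, 2) {
		fmt.Println(s.Name, s.Score)
	}
}
```

Had the ranking been capped to two rows first, the denied admin_reset would have taken a slot and the caller would be left with a single result instead of two.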