Documentation ¶
Overview ¶
Package toolsindex is the tools-discovery consumer of the indexjobs framework (#440). It embeds every globally visible MCP tool's descriptor (name + description + parameter-schema summary) under source_kind "tools" and ranks them by cosine similarity for platform_find_tools.
Unlike the api-catalog consumer, the tool corpus is not a DB table: tools are registered in-process from compiled-in toolkits plus admin visibility config. The Source (in pkg/platform) therefore enumerates the live registry, while this package owns the vector storage (a Sink over tool_embeddings) and the query-time ranking. A successful index writes the complete registered set atomically, so the indexed vector count is also the expected count, and Coverage reports both halves of the dashboard ratio from it. Gap detection diffs the live registry against the persisted vectors by descriptor hash (see Sink.FindGaps) rather than comparing against a stored count.
Index ¶
- Constants
- type CurrentItemsFunc
- type ScoredTool
- type Sink
- func NewSink(store *Store, currentItems CurrentItemsFunc) *Sink
- func (s *Sink) Coverage(ctx context.Context) (indexjobs.Coverage, error)
- func (s *Sink) FindGaps(ctx context.Context) ([]string, error)
- func (*Sink) Kind() string
- func (s *Sink) ListExisting(ctx context.Context, key indexjobs.Key) (map[string]indexjobs.Vector, error)
- func (*Sink) StampExpected(context.Context, indexjobs.Key, int) error
- func (s *Sink) Upsert(ctx context.Context, key indexjobs.Key, rows []indexjobs.Vector) error
- func (s *Sink) UpsertBatch(ctx context.Context, key indexjobs.Key, rows []indexjobs.Vector) error
- type Store
- func (s *Store) Coverage(ctx context.Context, sourceID string) (int, error)
- func (s *Store) ListVectors(ctx context.Context, sourceID string) (map[string]indexjobs.Vector, error)
- func (s *Store) RankBySimilarity(ctx context.Context, sourceID string, queryVec []float32) ([]ScoredTool, error)
- func (s *Store) Replace(ctx context.Context, sourceID string, rows []indexjobs.Vector) error
- func (s *Store) UpsertBatch(ctx context.Context, sourceID string, rows []indexjobs.Vector) error
Constants ¶
const SourceID = "platform"
SourceID is the single logical tool-corpus identifier. There is one tool registry per deployment, identical across replicas (same binary plus the same DB-backed visibility config), so a constant source_id is sufficient; vectors keyed on it are shared by every replica.
const SourceKind = "tools"
SourceKind is the indexjobs source_kind this package serves.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type CurrentItemsFunc ¶ added in v1.75.1
CurrentItemsFunc returns the live tool corpus to index: the same items the worker's Source.LoadItems produces. The Sink calls it from FindGaps to diff the running registry against the persisted vectors, so gap detection sees changes to the in-process tool set (description edits, deny flips) that a DB count never could.
type ScoredTool ¶
ScoredTool is one tool name with its cosine similarity to a query, returned by the store's similarity ranking. Score is in [-1, 1] (1 = identical direction); for the platform's normalized embeddings it is effectively [0, 1].
type Sink ¶
type Sink struct {
// contains filtered or unexported fields
}
Sink implements indexjobs.Sink for the tools kind over the tool_embeddings table. The key's SourceID is used verbatim as the tool_embeddings source_id; there is no composite encoding because, unlike api-catalog, the tool corpus is a single flat set per source.
func NewSink ¶
func NewSink(store *Store, currentItems CurrentItemsFunc) *Sink
NewSink returns a Sink backed by the given store. currentItems supplies the live tool corpus for content-drift gap detection; when nil, FindGaps falls back to re-syncing every sweep so a wiring mistake never silently stops indexing.
func (*Sink) Coverage ¶ added in v1.73.0
func (s *Sink) Coverage(ctx context.Context) (indexjobs.Coverage, error)
Coverage reports the tools kind's indexed/expected ratio. Because a successful index writes the complete registered set atomically (Store.Replace), the indexed vector count is also the expected count: the dashboard shows a real N / N · 100% just like the other kinds, with no separately-stored expected that could drift out of sync. The drift signal (a tool added, removed, or edited since the last index) is the content-hash gap check (FindGaps), which moves the verdict to Indexing while the reconciler catches up.
func (*Sink) FindGaps ¶
func (s *Sink) FindGaps(ctx context.Context) ([]string, error)
FindGaps reports whether the live tool corpus has drifted from the persisted vectors, returning the single tools source when it has and an empty slice when the index is in sync.
The tool set lives in the running process (compiled-in toolkits plus admin visibility and description config), and it drifts in ways a count comparison cannot see: a description edit or a deny flip changes the live descriptor without changing the stored vector count. So rather than the api-catalog count diff, this enumerates the live tools and compares each one's embed-text hash against the persisted vector, returning the source on any add, removal, or edit. A steady-state registry produces no gap, so the reconciler no longer enqueues a no-op job on every sweep, as the unconditional predecessor did (issue #511); an actual change still converges within one reconcile interval.
A nil currentItems is a wiring fault, not a steady state, so FindGaps fails safe by re-syncing rather than silently reporting no gap.
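The hash diff described above can be sketched roughly as follows. This is a simplified illustration, not the package's implementation: hashText, hasDrifted, and the plain string maps are hypothetical, SHA-256 stands in for whatever hash the real code uses, and a nil live map models the unwired currentItems case.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashText is a hypothetical stand-in for the embed-text hash.
// SHA-256 is an assumption; the real hash function is not documented here.
func hashText(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

// hasDrifted reports whether live (tool name -> embed text) differs from
// stored (tool name -> persisted text hash) by any add, removal, or edit.
// A nil live map models a missing currentItems and fails safe by
// reporting drift.
func hasDrifted(live, stored map[string]string) bool {
	if live == nil {
		return true
	}
	if len(live) != len(stored) {
		return true // a tool was added or removed
	}
	for name, text := range live {
		h, ok := stored[name]
		if !ok || h != hashText(text) {
			return true // renamed tool or edited descriptor
		}
	}
	return false
}

func main() {
	live := map[string]string{
		"search": "search: find documents",
		"fetch":  "fetch: load a URL",
	}
	stored := map[string]string{}
	for name, text := range live {
		stored[name] = hashText(text)
	}
	fmt.Println(hasDrifted(live, stored)) // in sync

	live["search"] = "search: find documents by keyword"
	fmt.Println(hasDrifted(live, stored)) // description edit, same count

	fmt.Println(hasDrifted(nil, stored)) // unwired source
}
```

The second call is the case a count comparison would miss: both maps still hold two entries, but one descriptor's hash no longer matches.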
func (*Sink) ListExisting ¶
func (s *Sink) ListExisting(ctx context.Context, key indexjobs.Key) (map[string]indexjobs.Vector, error)
ListExisting returns the persisted vectors keyed by tool name for the worker's dedup pass.
func (*Sink) StampExpected ¶
func (*Sink) StampExpected(context.Context, indexjobs.Key, int) error
StampExpected is a no-op for the tools kind. The framework calls it after a successful embed to record an expected item count, but the tools kind derives its expected count from the indexed vector total (Coverage), which the atomic full-set Replace keeps equal to the item count, so there is nothing separate to record.
type Store ¶
type Store struct {
// contains filtered or unexported fields
}
Store persists tool embedding vectors (tool_embeddings) and answers the query-time cosine ranking. Backed by PostgreSQL + pgvector.
func (*Store) Coverage ¶ added in v1.73.0
func (s *Store) Coverage(ctx context.Context, sourceID string) (int, error)
Coverage returns the number of indexed tool vectors for the source. The tools corpus is written as a complete set on every successful index (Replace deletes absent rows, then inserts), so this count is both "how many are indexed" and "how many there are": indexed equals expected by construction. The Sink reports it as both halves of the dashboard ratio, so there is no separate expected count to store or to drift out of sync.
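To see why a full-set replace keeps indexed equal to expected, consider this in-memory sketch. The replace function and the map-based store are hypothetical; the real Store runs against PostgreSQL, and this sketch does not model its atomicity.

```go
package main

import "fmt"

// replace is a hypothetical in-memory analogue of a full-set replace:
// it deletes entries absent from rows, then inserts or overwrites the
// rest, so stored ends up holding exactly the names in rows.
func replace(stored, rows map[string][]float32) {
	for name := range stored {
		if _, ok := rows[name]; !ok {
			delete(stored, name) // tool no longer registered
		}
	}
	for name, vec := range rows {
		stored[name] = vec
	}
}

func main() {
	stored := map[string][]float32{
		"search": {0.1, 0.2},
		"legacy": {0.3, 0.4},
	}
	rows := map[string][]float32{
		"search": {0.1, 0.2},
		"fetch":  {0.5, 0.6},
		"upload": {0.7, 0.8},
	}
	replace(stored, rows)

	// After a replace, the stored count equals the registered count.
	fmt.Println(len(stored), len(rows))
}
```

Because the stored set is rebuilt from the registered set on every successful index, counting the stored rows answers both "how many are indexed" and "how many there are".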
func (*Store) ListVectors ¶
func (s *Store) ListVectors(ctx context.Context, sourceID string) (map[string]indexjobs.Vector, error)
ListVectors returns every persisted vector for the source, keyed by tool name, for the worker's text-hash dedup pass.
func (*Store) RankBySimilarity ¶
func (s *Store) RankBySimilarity(ctx context.Context, sourceID string, queryVec []float32) ([]ScoredTool, error)
RankBySimilarity returns every indexed tool for the source ordered by cosine similarity to queryVec (most similar first). pgvector's `<=>` is the cosine-distance operator, so 1 - distance is the similarity. No LIMIT is applied: the corpus is small (low hundreds) and the caller filters by persona before capping, which must happen on the full ranked set to avoid a denied tool consuming a top-K slot.
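The reason for filtering before capping can be shown with a small sketch. The names scoredTool, topKAllowed, and the allowed predicate are hypothetical stand-ins for the caller's persona filter, which lives outside this package.

```go
package main

import "fmt"

// scoredTool is a hypothetical mirror of a ranked result; the field
// names are assumptions for this sketch.
type scoredTool struct {
	Name  string
	Score float64
}

// topKAllowed walks ranked (most similar first), skips tools the
// persona may not see, and stops once k allowed tools are collected.
func topKAllowed(ranked []scoredTool, allowed func(name string) bool, k int) []scoredTool {
	if k <= 0 {
		return nil
	}
	out := make([]scoredTool, 0, k)
	for _, t := range ranked {
		if len(out) == k {
			break
		}
		if allowed(t.Name) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	ranked := []scoredTool{
		{Name: "admin_purge", Score: 0.9}, // denied for this persona
		{Name: "search", Score: 0.8},
		{Name: "fetch", Score: 0.7},
	}
	allowed := func(name string) bool { return name != "admin_purge" }

	// Filtering first still fills both slots with allowed tools.
	for _, t := range topKAllowed(ranked, allowed, 2) {
		fmt.Println(t.Name, t.Score)
	}
}
```

Had the list been capped to two entries before filtering, the denied tool would have taken one slot and only a single allowed tool would remain.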