Documentation ¶
Overview ¶
Package semantic provides semantic indexing and search for tool discovery.
It defines pluggable search strategies (BM25, embeddings, hybrid) without enforcing any specific vector backend or network dependency. This allows users to bring their own embedding provider (OpenAI, Ollama, local models).
Core Interfaces ¶
The package defines four key interfaces:
- Strategy: Scores a document against a query (BM25, embedding, or hybrid)
- Searcher: Performs search over indexed documents using a strategy
- Indexer: Stores and retrieves documents for search
- Embedder: Generates vector embeddings from text (user-provided)
Search Strategies ¶
Three built-in strategies are provided:
- BM25 (lexical): Token overlap scoring, no external dependencies
- Embedding (semantic): Cosine similarity of vector embeddings
- Hybrid: Weighted combination of BM25 and embedding scores
Create strategies using the constructor functions:
bm25 := semantic.NewBM25Strategy(nil)                 // nil uses default scorer
emb := semantic.NewEmbeddingStrategy(embedder)        // requires Embedder impl
hybrid, _ := semantic.NewHybridStrategy(bm25, emb, 0.7) // 70% BM25
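As a rough illustration of what the embedding and hybrid strategies compute, the stand-alone sketch below implements cosine similarity and a weighted blend of two scores. The helper names `cosine` and `hybridScore` are hypothetical, and the formula `alpha*lexical + (1-alpha)*embedding` is an assumption based on the "70% BM25" comment, not the package's confirmed implementation.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
// It returns 0 when the lengths differ or either vector has zero magnitude.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// hybridScore blends a lexical score with an embedding score.
// alpha is the weight on the lexical (BM25) side; the exact formula
// used by the package is assumed here, not confirmed.
func hybridScore(lexical, embedding, alpha float64) float64 {
	return alpha*lexical + (1-alpha)*embedding
}

func main() {
	sim := cosine([]float32{1, 0, 0}, []float32{1, 0, 0})
	fmt.Printf("cosine: %.2f\n", sim)
	fmt.Printf("hybrid: %.2f\n", hybridScore(2.0, sim, 0.7))
}
```

Under this assumed formula, an alpha of 1 reduces to the lexical score alone and an alpha of 0 to the embedding score alone.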
Document Model ¶
Document represents a tool for semantic indexing:
doc := semantic.Document{
	ID:          "github:create-issue",
	Name:        "create-issue",
	Namespace:   "github",
	Description: "Create a new GitHub issue",
	Tags:        []string{"github", "issues"},
	Category:    "vcs",
}
Use Document.Normalized to prepare documents for indexing. It returns a copy of the document with its tags trimmed, lowercased, and sorted, and with the combined Text field built.
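The normalization step can be approximated outside the package as follows. This is a minimal sketch: `normalizeTags` and `buildText` are hypothetical helpers, the whitespace trimming is inferred from the output of the Normalized example, and the field order used for the text (name, description, tags) is a guess rather than documented behavior.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalizeTags trims, lowercases, and sorts tags, dropping empty entries.
// It returns a new slice so the caller's input is left untouched.
func normalizeTags(tags []string) []string {
	out := make([]string, 0, len(tags))
	for _, t := range tags {
		t = strings.ToLower(strings.TrimSpace(t))
		if t != "" {
			out = append(out, t)
		}
	}
	sort.Strings(out)
	return out
}

// buildText joins the non-empty parts with single spaces.
// The field order is an assumption made for illustration.
func buildText(name, description string, tags []string) string {
	parts := make([]string, 0, 2+len(tags))
	for _, p := range append([]string{name, description}, tags...) {
		if p = strings.TrimSpace(p); p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	tags := normalizeTags([]string{"Search", "FILES", " filesystem "})
	fmt.Println("Tags:", tags)
	fmt.Println("Text:", buildText("Search Files", "Find files matching a pattern", tags))
}
```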
Basic Usage ¶
// Create index and add documents
idx := semantic.NewInMemoryIndex()
idx.Add(ctx, doc1)
idx.Add(ctx, doc2)
// Create searcher with BM25 strategy
strategy := semantic.NewBM25Strategy(nil)
searcher := semantic.NewSearcher(idx, strategy)
// Search
results, err := searcher.Search(ctx, "create issue")
if err != nil {
	log.Fatal(err)
}
for _, r := range results {
	fmt.Printf("[%.2f] %s\n", r.Score, r.Document.ID)
}
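To make the flow above more concrete without depending on the package, here is a stand-alone sketch of lexical scoring followed by deterministic ranking. `tokenOverlap`, `rank`, and the `scored` type are hypothetical names; the scoring is a plain distinct-token count, which may differ from the package's default scorer, and keeping zero-score documents in the result is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// scored pairs a document ID with its score for a query.
type scored struct {
	ID    string
	Score float64
}

// tokenOverlap counts how many distinct query tokens occur in text.
func tokenOverlap(query, text string) float64 {
	seen := make(map[string]bool)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		seen[tok] = true
	}
	counted := make(map[string]bool)
	var score float64
	for _, tok := range strings.Fields(strings.ToLower(query)) {
		if seen[tok] && !counted[tok] {
			counted[tok] = true
			score++
		}
	}
	return score
}

// rank scores every document and orders results by score (highest first),
// breaking ties by ID so identical inputs always give the same order.
func rank(query string, texts map[string]string) []scored {
	results := make([]scored, 0, len(texts))
	for id, text := range texts {
		results = append(results, scored{ID: id, Score: tokenOverlap(query, text)})
	}
	sort.Slice(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].ID < results[j].ID
	})
	return results
}

func main() {
	texts := map[string]string{
		"git:status": "status git show working tree status",
		"git:commit": "commit git record changes to repository",
		"docker:ps":  "ps docker list containers",
	}
	for _, r := range rank("git status", texts) {
		fmt.Printf("[%.2f] %s\n", r.Score, r.ID)
	}
}
```

The ID tie-break matters because Go map iteration order is random; without it, equal scores could come back in a different order on each run.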
Implementing Custom Embedder ¶
To use embedding-based or hybrid search, implement the Embedder interface:
type MyEmbedder struct {
	client *openai.Client
}

func (e *MyEmbedder) Embed(ctx context.Context, text string) ([]float32, error) {
	resp, err := e.client.CreateEmbedding(ctx, openai.EmbeddingRequest{
		Model: "text-embedding-3-small",
		Input: []string{text},
	})
	if err != nil {
		return nil, err
	}
	return resp.Data[0].Embedding, nil
}
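For tests or offline experiments, an embedder with no network dependency can be handy. The sketch below is a toy, assuming only the `Embed(ctx, text) ([]float32, error)` method shape shown above: `HashEmbedder` and `hashEmbed` are hypothetical names, and the hashed bag-of-words vector captures word overlap, not meaning.

```go
package main

import (
	"context"
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

// hashEmbed maps text to a fixed-size, L2-normalized bag-of-words vector.
// Tokens that hash to the same bucket collide, so this is only a toy.
func hashEmbed(text string, dims int) []float32 {
	if dims <= 0 {
		return nil
	}
	vec := make([]float32, dims)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(tok)) // Write on an fnv hash never returns an error
		vec[h.Sum32()%uint32(dims)]++
	}
	var sumSq float64
	for _, v := range vec {
		sumSq += float64(v) * float64(v)
	}
	if sumSq == 0 {
		return vec // empty text: leave the zero vector as-is
	}
	norm := float32(math.Sqrt(sumSq))
	for i := range vec {
		vec[i] /= norm
	}
	return vec
}

// HashEmbedder is a hypothetical, deterministic embedder for experiments.
type HashEmbedder struct{ Dims int }

// Embed checks the context first so cancellation and deadlines are honored.
func (e HashEmbedder) Embed(ctx context.Context, text string) ([]float32, error) {
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	if e.Dims <= 0 {
		return nil, fmt.Errorf("hash embedder: dims must be positive, got %d", e.Dims)
	}
	return hashEmbed(text, e.Dims), nil
}

func main() {
	e := HashEmbedder{Dims: 8}
	vec, err := e.Embed(context.Background(), "create issue")
	fmt.Println(len(vec), err)

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // a cancelled context makes Embed return early
	_, err = e.Embed(ctx, "create issue")
	fmt.Println(err)
}
```

Because the value type holds no mutable state, concurrent calls to Embed do not interfere with each other.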
Filtering Results ¶
Use the filter functions to narrow results:
docs := idx.List(ctx)
gitDocs := semantic.FilterByNamespace(docs, "git")
vcsDocs := semantic.FilterByTags(docs, []string{"vcs"})
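The "any of the provided tags" behavior of a tag filter can be sketched with a set lookup. This is an illustration only: `doc` is a pared-down stand-in for semantic.Document, `filterByAnyTag` is a hypothetical name, and details such as case sensitivity and the result for an empty tag list are assumptions.

```go
package main

import "fmt"

// doc is a pared-down, hypothetical stand-in for semantic.Document.
type doc struct {
	ID   string
	Tags []string
}

// filterByAnyTag keeps documents carrying at least one wanted tag,
// preserving input order. Matching is exact and case-sensitive here.
func filterByAnyTag(docs []doc, wanted []string) []doc {
	set := make(map[string]struct{}, len(wanted))
	for _, t := range wanted {
		set[t] = struct{}{}
	}
	var out []doc
	for _, d := range docs {
		for _, t := range d.Tags {
			if _, ok := set[t]; ok {
				out = append(out, d)
				break // one match is enough; avoid adding d twice
			}
		}
	}
	return out
}

func main() {
	docs := []doc{
		{ID: "tool1", Tags: []string{"vcs", "git"}},
		{ID: "tool2", Tags: []string{"containers", "docker"}},
		{ID: "tool3", Tags: []string{"vcs", "svn"}},
	}
	for _, d := range filterByAnyTag(docs, []string{"vcs", "svn"}) {
		fmt.Println(d.ID)
	}
}
```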
Integration with index Package ¶
The [adapter.go] file provides conversion between index.SearchDoc and semantic.Document, enabling seamless integration:
// Convert index docs to semantic docs
semDocs := semantic.DocumentsFromSearchDocs(searchDocs)
// Convert back after processing
searchDocs = semantic.SearchDocsFromDocuments(semDocs)
Thread Safety ¶
All types in this package are safe for concurrent use:
- InMemoryIndex uses sync.RWMutex for thread-safe document storage
- InMemorySearcher is stateless and safe for concurrent Search calls
- All Strategy implementations are safe for concurrent Score calls
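The locking pattern described for InMemoryIndex can be sketched as a map guarded by a sync.RWMutex, with listing done by copying under the read lock. The `memIndex` type and its methods below are hypothetical and simplified; they show the general technique, not the package's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// doc is a pared-down, hypothetical stand-in for semantic.Document.
type doc struct {
	ID   string
	Name string
}

// memIndex guards a map with a sync.RWMutex: many readers or one writer.
type memIndex struct {
	mu   sync.RWMutex
	docs map[string]doc
}

func newMemIndex() *memIndex { return &memIndex{docs: make(map[string]doc)} }

// add inserts or replaces a document under the write lock.
func (m *memIndex) add(d doc) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.docs[d.ID] = d
}

// list copies documents out under the read lock and sorts them by ID,
// so callers get a stable snapshot they can modify freely.
func (m *memIndex) list() []doc {
	m.mu.RLock()
	out := make([]doc, 0, len(m.docs))
	for _, d := range m.docs {
		out = append(out, d)
	}
	m.mu.RUnlock()
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	idx := newMemIndex()
	var wg sync.WaitGroup
	for _, id := range []string{"git:status", "docker:ps", "git:commit"} {
		wg.Add(1)
		go func(id string) { // concurrent writers are serialized by the mutex
			defer wg.Done()
			idx.add(doc{ID: id})
		}(id)
	}
	wg.Wait()
	for _, d := range idx.list() {
		fmt.Println(d.ID)
	}
}
```

Sorting happens after the lock is released, which keeps the critical section short; the copy is what makes the result a point-in-time snapshot.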
Error Handling ¶
The package defines these sentinel errors:
- ErrInvalidSearcher: Searcher missing index or strategy
- ErrInvalidDocumentID: Document ID is empty
- ErrInvalidEmbedder: Embedder is nil when required
- ErrInvalidHybridConfig: Invalid hybrid strategy configuration
Use errors.Is for error checking:
if errors.Is(err, semantic.ErrInvalidEmbedder) {
	// handle missing embedder
}
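The reason to prefer errors.Is over `==` is that a sentinel may arrive wrapped with extra context. The sketch below demonstrates this with a local sentinel; `errInvalidHybridConfig`, `checkAlpha`, and `isConfigError` are hypothetical names, and whether the package itself wraps its sentinels is not stated in this documentation.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// errInvalidHybridConfig is a local stand-in for a package-level sentinel.
var errInvalidHybridConfig = errors.New("hybrid strategy requires alpha in [0,1]")

// checkAlpha wraps the sentinel with %w so callers keep the detail
// and can still match the error with errors.Is.
func checkAlpha(alpha float64) error {
	if math.IsNaN(alpha) || alpha < 0 || alpha > 1 {
		return fmt.Errorf("alpha=%v: %w", alpha, errInvalidHybridConfig)
	}
	return nil
}

// isConfigError reports whether err is, or wraps, the sentinel.
func isConfigError(err error) bool {
	return errors.Is(err, errInvalidHybridConfig)
}

func main() {
	err := checkAlpha(1.5)
	fmt.Println("error:", err)
	fmt.Println("errors.Is match:", isConfigError(err))
	fmt.Println("== match:", err == errInvalidHybridConfig)
	fmt.Println("valid alpha:", checkAlpha(0.7))
}
```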
Index ¶
- Variables
- func SearchDocFromDocument(doc Document) index.SearchDoc
- func SearchDocsFromDocuments(docs []Document) []index.SearchDoc
- type BM25Scorer
- type Document
- func DocumentFromSearchDoc(doc index.SearchDoc) Document
- func DocumentsFromSearchDocs(docs []index.SearchDoc) []Document
- func FilterByCategory(docs []Document, category string) []Document
- func FilterByNamespace(docs []Document, namespace string) []Document
- func FilterByTags(docs []Document, tags []string) []Document
- type Embedder
- type InMemoryIndex
- func (i *InMemoryIndex) Add(_ context.Context, doc Document) error
- func (i *InMemoryIndex) Get(_ context.Context, id string) (Document, bool)
- func (i *InMemoryIndex) List(_ context.Context) []Document
- func (i *InMemoryIndex) Remove(_ context.Context, id string) error
- func (i *InMemoryIndex) Update(ctx context.Context, doc Document) error
- type InMemorySearcher
- type Indexer
- type Result
- type Searcher
- type Strategy
Constants ¶
This section is empty.
Variables ¶
var (
	ErrInvalidEmbedder     = errors.New("semantic: embedder is required")
	ErrInvalidHybridConfig = errors.New("semantic: hybrid strategy requires bm25, embedding, and alpha in [0,1]")
)
var ErrInvalidDocumentID = errors.New("semantic: document id is required")
var ErrInvalidSearcher = errors.New("semantic: searcher requires index and strategy")
Functions ¶
func SearchDocFromDocument ¶ added in v0.2.0
SearchDocFromDocument converts a semantic.Document back to an index.SearchDoc. This enables results from semantic search to be used with index package APIs.
The conversion maps fields as follows:
- ID: Canonical tool ID (preserved)
- DocText: From Text field (or rebuilt if empty)
- Summary.ID: Same as ID
- Summary.Name: From Name
- Summary.Namespace: From Namespace
- Summary.ShortDescription: From Description (truncated to 120 chars)
- Summary.Summary: Same as ShortDescription
- Summary.Tags: From Tags (copied)
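Truncating a description to a fixed length is easy to get wrong with multi-byte text. The sketch below shows one safe approach; `truncateRunes` is a hypothetical helper, and counting runes rather than bytes is an assumption about what "120 chars" means in the mapping above.

```go
package main

import (
	"fmt"
	"strings"
)

// truncateRunes returns s cut to at most limit runes, so a multi-byte
// character is never split in half.
func truncateRunes(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	r := []rune(s)
	if len(r) <= limit {
		return s
	}
	return string(r[:limit])
}

func main() {
	long := strings.Repeat("é", 200) // each "é" is two bytes in UTF-8
	short := truncateRunes(long, 120)
	fmt.Println("runes:", len([]rune(short)), "bytes:", len(short))
}
```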
Example ¶
package main

import (
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	// Convert from semantic.Document to index.SearchDoc
	doc := semantic.Document{
		ID:          "slack:send-message",
		Name:        "send-message",
		Namespace:   "slack",
		Description: "Send a message to a Slack channel",
		Tags:        []string{"slack", "messaging"},
		Text:        "send-message slack send a message slack messaging",
	}
	searchDoc := semantic.SearchDocFromDocument(doc)
	fmt.Println("ID:", searchDoc.ID)
	fmt.Println("Summary Name:", searchDoc.Summary.Name)
	fmt.Println("DocText:", searchDoc.DocText)
}
Output:
ID: slack:send-message
Summary Name: send-message
DocText: send-message slack send a message slack messaging
func SearchDocsFromDocuments ¶ added in v0.2.0
SearchDocsFromDocuments converts a slice of semantic.Document to index.SearchDoc. Returns nil for nil or empty input.
Types ¶
type BM25Scorer ¶
BM25Scorer scores documents for a query using a lexical strategy.
Contract:
- Concurrency: implementations must be safe for concurrent use.
- Determinism: identical inputs must yield stable scores.
type Document ¶
type Document struct {
	ID          string
	Namespace   string
	Name        string
	Description string
	Tags        []string
	Category    string
	Text        string // normalized combined text
}
Document describes a tool for semantic indexing.
func DocumentFromSearchDoc ¶ added in v0.2.0
DocumentFromSearchDoc converts an index.SearchDoc to a semantic.Document. This enables seamless integration between the index package's search infrastructure and the semantic package's embedding-based search.
The conversion maps fields as follows:
- ID: Canonical tool ID (preserved)
- Name: From Summary.Name
- Namespace: From Summary.Namespace
- Description: From Summary.Summary (fallback to ShortDescription)
- Tags: From Summary.Tags (copied)
- Category: From Summary.Category
- Text: From DocText (pre-normalized search text)
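Two details of this mapping, the description fallback and the copied tags, can be illustrated in isolation. The helpers `firstNonEmpty` and `copyTags` below are hypothetical; they sketch the general technique and are not taken from the package source.

```go
package main

import "fmt"

// firstNonEmpty returns the first argument that is not the empty string.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

// copyTags returns an independent copy of tags; nil stays nil.
func copyTags(tags []string) []string {
	if tags == nil {
		return nil
	}
	out := make([]string, len(tags))
	copy(out, tags)
	return out
}

func main() {
	// An empty summary falls back to the short description.
	desc := firstNonEmpty("", "Create a new GitHub issue")
	fmt.Println("Description:", desc)

	src := []string{"github", "issues"}
	dst := copyTags(src)
	dst[0] = "changed" // does not touch src, because dst has its own backing array
	fmt.Println(src, dst)
}
```

Copying the slice means later edits to the converted document's tags cannot silently alter the original, which a plain assignment would allow.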
Example ¶
package main

import (
	"fmt"

	"github.com/jonwraymond/tooldiscovery/index"
	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	// Convert from index.SearchDoc to semantic.Document
	searchDoc := index.SearchDoc{
		ID:      "github:create-issue",
		DocText: "create-issue github create issue tracker",
		Summary: index.Summary{
			ID:               "github:create-issue",
			Name:             "create-issue",
			Namespace:        "github",
			ShortDescription: "Create a new GitHub issue",
			Tags:             []string{"github", "issues"},
		},
	}
	doc := semantic.DocumentFromSearchDoc(searchDoc)
	fmt.Println("ID:", doc.ID)
	fmt.Println("Name:", doc.Name)
	fmt.Println("Namespace:", doc.Namespace)
}
Output:
ID: github:create-issue
Name: create-issue
Namespace: github
func DocumentsFromSearchDocs ¶ added in v0.2.0
DocumentsFromSearchDocs converts a slice of index.SearchDoc to semantic.Document. Returns nil for nil or empty input.
func FilterByCategory ¶
FilterByCategory returns documents matching the category.
func FilterByNamespace ¶
FilterByNamespace returns documents matching the namespace.
Example ¶
package main

import (
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	docs := []semantic.Document{
		{ID: "git:status", Namespace: "git"},
		{ID: "git:commit", Namespace: "git"},
		{ID: "docker:ps", Namespace: "docker"},
	}
	filtered := semantic.FilterByNamespace(docs, "git")
	fmt.Println("Git documents:", len(filtered))
}
Output:
Git documents: 2
func FilterByTags ¶
FilterByTags returns documents that contain any of the provided tags.
Example ¶
package main

import (
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	docs := []semantic.Document{
		{ID: "tool1", Tags: []string{"vcs", "git"}},
		{ID: "tool2", Tags: []string{"containers", "docker"}},
		{ID: "tool3", Tags: []string{"vcs", "svn"}},
	}
	// Find all documents with "vcs" tag
	filtered := semantic.FilterByTags(docs, []string{"vcs"})
	fmt.Println("VCS documents:", len(filtered))
}
Output:
VCS documents: 2
func (Document) Normalized ¶
Normalized returns a copy of the document with normalized tags and text.
Example ¶
package main

import (
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	doc := semantic.Document{
		ID:          "example",
		Name:        "Search Files",
		Description: "Find files matching a pattern",
		Tags:        []string{"Search", "FILES", " filesystem "},
	}
	normalized := doc.Normalized()
	fmt.Println("Tags:", normalized.Tags)
	fmt.Println("Text:", normalized.Text)
}
Output:
Tags: [files filesystem search]
Text: Search Files Find files matching a pattern files filesystem search
type Embedder ¶
Embedder produces embeddings for text.
Contract:
- Concurrency: implementations must be safe for concurrent use.
- Context: must honor cancellation/deadlines.
- Errors: return an error for invalid input or provider failure.
type InMemoryIndex ¶
type InMemoryIndex struct {
	// contains filtered or unexported fields
}
InMemoryIndex is a thread-safe in-memory document index.
Example ¶
package main

import (
	"context"
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	idx := semantic.NewInMemoryIndex()
	ctx := context.Background()

	// Add documents
	doc := semantic.Document{
		ID:          "files:search",
		Name:        "search",
		Namespace:   "files",
		Description: "Search for files matching a pattern",
		Tags:        []string{"search", "filesystem"},
	}
	_ = idx.Add(ctx, doc)

	// Retrieve document
	retrieved, found := idx.Get(ctx, "files:search")
	fmt.Println("Found:", found)
	fmt.Println("Name:", retrieved.Name)

	// List all documents
	all := idx.List(ctx)
	fmt.Println("Total documents:", len(all))
}
Output:
Found: true
Name: search
Total documents: 1
func NewInMemoryIndex ¶
func NewInMemoryIndex() *InMemoryIndex
NewInMemoryIndex creates a new in-memory index.
func (*InMemoryIndex) Add ¶
func (i *InMemoryIndex) Add(_ context.Context, doc Document) error
Add inserts or updates a document in the index.
func (*InMemoryIndex) List ¶
func (i *InMemoryIndex) List(_ context.Context) []Document
List returns documents sorted by ID for deterministic ordering. The returned slice is a point-in-time snapshot; concurrent modifications will not affect the returned documents.
type InMemorySearcher ¶
type InMemorySearcher struct {
	// contains filtered or unexported fields
}
InMemorySearcher is a deterministic searcher over an Indexer.
func NewSearcher ¶
func NewSearcher(index Indexer, strategy Strategy) *InMemorySearcher
NewSearcher creates a new searcher.
Example ¶
package main

import (
	"context"
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	// Create an index and add documents
	idx := semantic.NewInMemoryIndex()
	ctx := context.Background()
	docs := []semantic.Document{
		{ID: "git:status", Name: "status", Namespace: "git", Description: "Show working tree status", Tags: []string{"vcs"}},
		{ID: "git:commit", Name: "commit", Namespace: "git", Description: "Record changes to repository", Tags: []string{"vcs"}},
		{ID: "docker:ps", Name: "ps", Namespace: "docker", Description: "List containers", Tags: []string{"containers"}},
	}
	for _, doc := range docs {
		_ = idx.Add(ctx, doc)
	}

	// Create a searcher with BM25 strategy
	strategy := semantic.NewBM25Strategy(nil) // nil uses default scorer
	searcher := semantic.NewSearcher(idx, strategy)

	// Search for git-related tools
	results, _ := searcher.Search(ctx, "git status")
	fmt.Println("Found:", len(results), "results")
	if len(results) > 0 {
		fmt.Println("Top result:", results[0].Document.ID)
	}
}
Output:
Found: 3 results
Top result: git:status
type Indexer ¶
type Indexer interface {
	Add(ctx context.Context, doc Document) error
	Update(ctx context.Context, doc Document) error
	Remove(ctx context.Context, id string) error
	Get(ctx context.Context, id string) (Document, bool)
	List(ctx context.Context) []Document
}
Indexer defines indexing operations for tool documents.
Contract:
- Concurrency: implementations must be safe for concurrent use.
- Context: methods must honor cancellation/deadlines where applicable.
- Errors: invalid IDs should return ErrInvalidDocumentID.
- Determinism: List returns stable ordering.
type Searcher ¶
Searcher performs semantic search over indexed documents.
Contract:
- Concurrency: implementations must be safe for concurrent use.
- Context: must honor cancellation/deadlines.
- Determinism: ordering must be stable for identical inputs.
type Strategy ¶
Strategy scores a document for a given query.
Contract:
- Concurrency: implementations must be safe for concurrent use.
- Context: must honor cancellation/deadlines.
- Determinism: identical inputs must yield stable scores.
func NewBM25Strategy ¶
func NewBM25Strategy(scorer BM25Scorer) Strategy
NewBM25Strategy creates a BM25-only strategy. If scorer is nil, a default token-overlap scorer is used.
func NewEmbeddingStrategy ¶
NewEmbeddingStrategy creates an embedding-only strategy.
func NewHybridStrategy ¶
NewHybridStrategy creates a weighted hybrid strategy.
Example ¶
package main

import (
	"context"
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	// Create mock embedder for demonstration
	embedder := &mockEmbedder{}

	// Create individual strategies
	bm25 := semantic.NewBM25Strategy(nil)
	embedding := semantic.NewEmbeddingStrategy(embedder)

	// Combine with hybrid strategy (0.7 = 70% BM25, 30% embedding)
	hybrid, err := semantic.NewHybridStrategy(bm25, embedding, 0.7)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	// Use the hybrid strategy
	doc := semantic.Document{
		ID:          "test",
		Name:        "search_files",
		Description: "Search for files in the filesystem",
	}
	ctx := context.Background()
	score, _ := hybrid.Score(ctx, "search files", doc.Normalized())
	fmt.Printf("Hybrid score: %.2f\n", score)
}

// mockEmbedder is a simple embedder for examples
type mockEmbedder struct{}

func (m *mockEmbedder) Embed(_ context.Context, _ string) ([]float32, error) {
	return []float32{1.0, 0.0, 0.0}, nil
}
Output:
Hybrid score: 1.70