semantic

package
v0.3.2 Latest
Warning

This package is not in the latest version of its module.

Published: Feb 6, 2026 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package semantic provides semantic indexing and search for tool discovery.

It defines pluggable search strategies (BM25, embeddings, hybrid) without tying you to a specific vector backend or network dependency. You bring your own embedding provider (OpenAI, Ollama, local models) by implementing Embedder.

Core Interfaces

The package defines four core interfaces (plus BM25Scorer for plugging in a custom lexical scorer):

  • Strategy: Scores a document against a query (BM25, embedding, or hybrid)
  • Searcher: Performs search over indexed documents using a strategy
  • Indexer: Stores and retrieves documents for search
  • Embedder: Generates vector embeddings from text (user-provided)

Search Strategies

Three built-in strategies are provided:

  • BM25 (lexical): lexical scoring with no external dependencies; the default scorer counts token overlap
  • Embedding (semantic): Cosine similarity of vector embeddings
  • Hybrid: Weighted combination of BM25 and embedding scores

Create strategies using the constructor functions:

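bm25 := semantic.NewBM25Strategy(nil)          // nil uses the default scorer
emb := semantic.NewEmbeddingStrategy(embedder) // requires an Embedder implementation
hybrid, err := semantic.NewHybridStrategy(bm25, emb, 0.7) // 70% BM25, 30% embedding
if err != nil {
    // handle the error (for example, alpha outside [0,1])
}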

Document Model

Document represents a tool for semantic indexing:

doc := semantic.Document{
    ID:          "github:create-issue",
    Name:        "create-issue",
    Namespace:   "github",
    Description: "Create a new GitHub issue",
    Tags:        []string{"github", "issues"},
    Category:    "vcs",
}

Use Document.Normalized to prepare a document for indexing. It returns a copy whose tags are trimmed, lowercased, and sorted, and whose combined Text field has been built.

Basic Usage

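// Assumes ctx, doc1, and doc2 exist and this runs inside a function returning error.

// Create an index and add normalized documents.
idx := semantic.NewInMemoryIndex()
for _, d := range []semantic.Document{doc1, doc2} {
    if err := idx.Add(ctx, d.Normalized()); err != nil {
        return err // e.g. ErrInvalidDocumentID for an empty ID
    }
}

// Create a searcher with the BM25 strategy.
strategy := semantic.NewBM25Strategy(nil)
searcher := semantic.NewSearcher(idx, strategy)

// Search and print results, highest score first.
results, err := searcher.Search(ctx, "create issue")
if err != nil {
    return err
}
for _, r := range results {
    fmt.Printf("[%.2f] %s\n", r.Score, r.Document.ID)
}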

Implementing Custom Embedder

To use embedding-based or hybrid search, implement the Embedder interface:

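// MyEmbedder wraps an OpenAI client. This sketch assumes the
// github.com/sashabaranov/go-openai package.
type MyEmbedder struct {
    client *openai.Client
}

// Embed requests an embedding for text; ctx is passed through to the API call.
func (e *MyEmbedder) Embed(ctx context.Context, text string) ([]float32, error) {
    resp, err := e.client.CreateEmbeddings(ctx, openai.EmbeddingRequest{
        Model: "text-embedding-3-small",
        Input: []string{text},
    })
    if err != nil {
        return nil, err
    }
    if len(resp.Data) == 0 {
        return nil, fmt.Errorf("openai: empty embedding response")
    }
    return resp.Data[0].Embedding, nil
}

// Compile-time check that MyEmbedder satisfies semantic.Embedder.
var _ semantic.Embedder = (*MyEmbedder)(nil)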

Filtering Results

Use the filter functions (FilterByNamespace, FilterByTags, FilterByCategory) to narrow a slice of documents:

docs := idx.List(ctx)
gitDocs := semantic.FilterByNamespace(docs, "git")
vcsDocs := semantic.FilterByTags(docs, []string{"vcs"})
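The filter functions operate on []Document, not on search results. To narrow a []Result while keeping its score order, one option is a small helper like the hypothetical filterResultsByTags below, which applies the same any-of-the-tags rule that FilterByTags documents. It uses local stand-ins for Document and Result so it compiles on its own, and it assumes exact, case-sensitive tag matching, which the package documentation does not state.

```go
package main

import "fmt"

// Local stand-ins for semantic.Document and semantic.Result (fields trimmed).
type Document struct {
	ID   string
	Tags []string
}

type Result struct {
	Document Document
	Score    float64
}

// filterResultsByTags is a hypothetical helper: it keeps results whose
// document has any of the given tags, preserving the original order.
func filterResultsByTags(results []Result, tags []string) []Result {
	want := make(map[string]struct{}, len(tags))
	for _, t := range tags {
		want[t] = struct{}{}
	}
	var out []Result
	for _, r := range results {
		for _, t := range r.Document.Tags {
			if _, ok := want[t]; ok {
				out = append(out, r)
				break // one matching tag is enough
			}
		}
	}
	return out
}

func main() {
	results := []Result{
		{Document{ID: "git:commit", Tags: []string{"vcs", "git"}}, 2},
		{Document{ID: "docker:ps", Tags: []string{"containers"}}, 1},
		{Document{ID: "svn:log", Tags: []string{"vcs"}}, 0.5},
	}
	// Prints git:commit, then svn:log.
	for _, r := range filterResultsByTags(results, []string{"vcs"}) {
		fmt.Println(r.Document.ID)
	}
}
```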

Integration with index Package

The adapter functions in adapter.go convert between index.SearchDoc and semantic.Document, so documents can move between the two packages:

// Convert index docs to semantic docs
semDocs := semantic.DocumentsFromSearchDocs(searchDocs)

// Convert back after processing
backToIndex := semantic.SearchDocsFromDocuments(semDocs)

Thread Safety

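The built-in implementations are safe for concurrent use:

  • InMemoryIndex guards its document storage with a sync.RWMutex
  • InMemorySearcher is stateless and safe for concurrent Search calls
  • The built-in Strategy implementations are safe for concurrent Score calls; the embedding and hybrid strategies also rely on the supplied Embedder being safe for concurrent use, as the Embedder contract requires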

Error Handling

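The package defines these sentinel errors:

  • ErrInvalidEmbedder: an embedder is required but missing
  • ErrInvalidHybridConfig: the hybrid strategy is missing a component or alpha is outside [0,1]
  • ErrInvalidDocumentID: a document ID is required but missing
  • ErrInvalidSearcher: a searcher was configured without an index or strategy

Use errors.Is for error checking: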

if errors.Is(err, semantic.ErrInvalidEmbedder) {
    // handle missing embedder
}

Constants

This section is empty.

Variables

var (
	ErrInvalidEmbedder     = errors.New("semantic: embedder is required")
	ErrInvalidHybridConfig = errors.New("semantic: hybrid strategy requires bm25, embedding, and alpha in [0,1]")
)
var ErrInvalidDocumentID = errors.New("semantic: document id is required")
var ErrInvalidSearcher = errors.New("semantic: searcher requires index and strategy")

Functions

func SearchDocFromDocument added in v0.2.0

func SearchDocFromDocument(doc Document) index.SearchDoc

SearchDocFromDocument converts a semantic.Document back to an index.SearchDoc. This enables results from semantic search to be used with index package APIs.

The conversion maps fields as follows:

  • ID: Canonical tool ID (preserved)
  • DocText: From Text field (or rebuilt if empty)
  • Summary.ID: Same as ID
  • Summary.Name: From Name
  • Summary.Namespace: From Namespace
  • Summary.ShortDescription: From Description (truncated to 120 chars)
  • Summary.Summary: Same as ShortDescription
  • Summary.Tags: From Tags (copied)
Example
package main

import (
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	// Convert from semantic.Document to index.SearchDoc
	doc := semantic.Document{
		ID:          "slack:send-message",
		Name:        "send-message",
		Namespace:   "slack",
		Description: "Send a message to a Slack channel",
		Tags:        []string{"slack", "messaging"},
		Text:        "send-message slack send a message slack messaging",
	}

	searchDoc := semantic.SearchDocFromDocument(doc)
	fmt.Println("ID:", searchDoc.ID)
	fmt.Println("Summary Name:", searchDoc.Summary.Name)
	fmt.Println("DocText:", searchDoc.DocText)
}
Output:

ID: slack:send-message
Summary Name: send-message
DocText: send-message slack send a message slack messaging

func SearchDocsFromDocuments added in v0.2.0

func SearchDocsFromDocuments(docs []Document) []index.SearchDoc

SearchDocsFromDocuments converts a slice of semantic.Document to a slice of index.SearchDoc. Returns nil for nil or empty input.

Types

type BM25Scorer

type BM25Scorer interface {
	Score(query string, doc Document) float64
}

BM25Scorer scores documents for a query using a lexical strategy.

Contract:

  • Concurrency: implementations must be safe for concurrent use.
  • Determinism: identical inputs must yield stable scores.
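The default lexical scorer counts token overlap. If you want classic Okapi BM25 instead, one way to satisfy the BM25Scorer signature is to precompute corpus statistics once and only read them in Score. The okapiBM25 type below is a hypothetical sketch using common parameter defaults (k1 = 1.2, b = 0.75) and the non-negative IDF variant ln(1 + (N - df + 0.5)/(df + 0.5)); whitespace tokenization of Text is an assumption, not something the package prescribes.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Document is a trimmed local stand-in for semantic.Document.
type Document struct {
	ID   string
	Text string // combined text, as built by Document.Normalized
}

// okapiBM25 is hypothetical; its Score method has the semantic.BM25Scorer
// signature. Statistics are written only in newOkapiBM25 and read afterwards,
// so Score is safe for concurrent use and deterministic.
type okapiBM25 struct {
	k1, b   float64
	n       float64        // number of documents in the corpus
	avgLen  float64        // average document length in tokens
	docFreq map[string]int // number of documents containing each term
}

// tokenize lowercases s and splits it on whitespace.
func tokenize(s string) []string { return strings.Fields(strings.ToLower(s)) }

func newOkapiBM25(corpus []Document) *okapiBM25 {
	s := &okapiBM25{k1: 1.2, b: 0.75, n: float64(len(corpus)), docFreq: map[string]int{}}
	total := 0
	for _, d := range corpus {
		toks := tokenize(d.Text)
		total += len(toks)
		seen := map[string]bool{}
		for _, t := range toks {
			if !seen[t] { // count each term once per document
				seen[t] = true
				s.docFreq[t]++
			}
		}
	}
	if len(corpus) > 0 {
		s.avgLen = float64(total) / s.n
	}
	return s
}

// Score sums, over query terms present in doc,
// IDF(t) * tf*(k1+1) / (tf + k1*(1 - b + b*len/avgLen)).
func (s *okapiBM25) Score(query string, doc Document) float64 {
	toks := tokenize(doc.Text)
	if len(toks) == 0 || s.avgLen == 0 {
		return 0
	}
	tf := map[string]float64{}
	for _, t := range toks {
		tf[t]++
	}
	lenNorm := s.k1 * (1 - s.b + s.b*float64(len(toks))/s.avgLen)
	score := 0.0
	for _, q := range tokenize(query) {
		f := tf[q]
		if f == 0 {
			continue
		}
		df := float64(s.docFreq[q])
		idf := math.Log(1 + (s.n-df+0.5)/(df+0.5))
		score += idf * f * (s.k1 + 1) / (f + lenNorm)
	}
	return score
}

func main() {
	corpus := []Document{
		{ID: "git:status", Text: "status git show working tree status"},
		{ID: "git:commit", Text: "commit git record changes to repository"},
		{ID: "docker:ps", Text: "ps docker list containers"},
	}
	s := newOkapiBM25(corpus)
	for _, d := range corpus {
		fmt.Printf("%-10s %.3f\n", d.ID, s.Score("git status", d))
	}
}
```

Because the statistics are fixed at construction, a scorer like this has to be rebuilt whenever the indexed corpus changes. Its scores are also unbounded, which is worth keeping in mind before mixing them with cosine scores in a hybrid strategy.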

type Document

type Document struct {
	ID          string
	Namespace   string
	Name        string
	Description string
	Tags        []string
	Category    string
	Text        string // normalized combined text
}

Document describes a tool for semantic indexing.

func DocumentFromSearchDoc added in v0.2.0

func DocumentFromSearchDoc(doc index.SearchDoc) Document

DocumentFromSearchDoc converts an index.SearchDoc to a semantic.Document. This enables seamless integration between the index package's search infrastructure and the semantic package's embedding-based search.

The conversion maps fields as follows:

  • ID: Canonical tool ID (preserved)
  • Name: From Summary.Name
  • Namespace: From Summary.Namespace
  • Description: From Summary.Summary (fallback to ShortDescription)
  • Tags: From Summary.Tags (copied)
  • Category: From Summary.Category
  • Text: From DocText (pre-normalized search text)
Example
package main

import (
	"fmt"

	"github.com/jonwraymond/tooldiscovery/index"
	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	// Convert from index.SearchDoc to semantic.Document
	searchDoc := index.SearchDoc{
		ID:      "github:create-issue",
		DocText: "create-issue github create issue tracker",
		Summary: index.Summary{
			ID:               "github:create-issue",
			Name:             "create-issue",
			Namespace:        "github",
			ShortDescription: "Create a new GitHub issue",
			Tags:             []string{"github", "issues"},
		},
	}

	doc := semantic.DocumentFromSearchDoc(searchDoc)
	fmt.Println("ID:", doc.ID)
	fmt.Println("Name:", doc.Name)
	fmt.Println("Namespace:", doc.Namespace)
}
Output:

ID: github:create-issue
Name: create-issue
Namespace: github

func DocumentsFromSearchDocs added in v0.2.0

func DocumentsFromSearchDocs(docs []index.SearchDoc) []Document

DocumentsFromSearchDocs converts a slice of index.SearchDoc to a slice of semantic.Document. Returns nil for nil or empty input.

func FilterByCategory

func FilterByCategory(docs []Document, category string) []Document

FilterByCategory returns documents matching the category.

func FilterByNamespace

func FilterByNamespace(docs []Document, namespace string) []Document

FilterByNamespace returns documents matching the namespace.

Example
package main

import (
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	docs := []semantic.Document{
		{ID: "git:status", Namespace: "git"},
		{ID: "git:commit", Namespace: "git"},
		{ID: "docker:ps", Namespace: "docker"},
	}

	filtered := semantic.FilterByNamespace(docs, "git")
	fmt.Println("Git documents:", len(filtered))
}
Output:

Git documents: 2

func FilterByTags

func FilterByTags(docs []Document, tags []string) []Document

FilterByTags returns documents that contain any of the provided tags.

Example
package main

import (
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	docs := []semantic.Document{
		{ID: "tool1", Tags: []string{"vcs", "git"}},
		{ID: "tool2", Tags: []string{"containers", "docker"}},
		{ID: "tool3", Tags: []string{"vcs", "svn"}},
	}

	// Find all documents with "vcs" tag
	filtered := semantic.FilterByTags(docs, []string{"vcs"})
	fmt.Println("VCS documents:", len(filtered))
}
Output:

VCS documents: 2

func (Document) Normalized

func (d Document) Normalized() Document

Normalized returns a copy of the document with normalized tags and text.

Example
package main

import (
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	doc := semantic.Document{
		ID:          "example",
		Name:        "Search Files",
		Description: "Find files matching a pattern",
		Tags:        []string{"Search", "FILES", "  filesystem  "},
	}

	normalized := doc.Normalized()
	fmt.Println("Tags:", normalized.Tags)
	fmt.Println("Text:", normalized.Text)
}
Output:

Tags: [files filesystem search]
Text: Search Files Find files matching a pattern files filesystem search

type Embedder

type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
}

Embedder produces embeddings for text.

Contract:

  • Concurrency: implementations must be safe for concurrent use.
  • Context: must honor cancellation/deadlines.
  • Errors: return an error for invalid input or provider failure.

type InMemoryIndex

type InMemoryIndex struct {
	// contains filtered or unexported fields
}

InMemoryIndex is a thread-safe in-memory document index.

Example
package main

import (
	"context"
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	idx := semantic.NewInMemoryIndex()
	ctx := context.Background()

	// Add documents
	doc := semantic.Document{
		ID:          "files:search",
		Name:        "search",
		Namespace:   "files",
		Description: "Search for files matching a pattern",
		Tags:        []string{"search", "filesystem"},
	}
	_ = idx.Add(ctx, doc)

	// Retrieve document
	retrieved, found := idx.Get(ctx, "files:search")
	fmt.Println("Found:", found)
	fmt.Println("Name:", retrieved.Name)

	// List all documents
	all := idx.List(ctx)
	fmt.Println("Total documents:", len(all))
}
Output:

Found: true
Name: search
Total documents: 1

func NewInMemoryIndex

func NewInMemoryIndex() *InMemoryIndex

NewInMemoryIndex creates a new in-memory index.

func (*InMemoryIndex) Add

func (i *InMemoryIndex) Add(_ context.Context, doc Document) error

Add inserts or updates a document in the index.

func (*InMemoryIndex) Get

func (i *InMemoryIndex) Get(_ context.Context, id string) (Document, bool)

Get retrieves a document by ID.

func (*InMemoryIndex) List

func (i *InMemoryIndex) List(_ context.Context) []Document

List returns documents sorted by ID for deterministic ordering. The returned slice is a point-in-time snapshot; concurrent modifications will not affect the returned documents.

func (*InMemoryIndex) Remove

func (i *InMemoryIndex) Remove(_ context.Context, id string) error

Remove deletes a document by ID.

func (*InMemoryIndex) Update

func (i *InMemoryIndex) Update(ctx context.Context, doc Document) error

Update updates a document by ID. If it doesn't exist, it is inserted.

type InMemorySearcher

type InMemorySearcher struct {
	// contains filtered or unexported fields
}

InMemorySearcher is a deterministic searcher over an Indexer.

func NewSearcher

func NewSearcher(index Indexer, strategy Strategy) *InMemorySearcher

NewSearcher creates a new searcher.

Example
package main

import (
	"context"
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	// Create an index and add documents
	idx := semantic.NewInMemoryIndex()
	ctx := context.Background()

	docs := []semantic.Document{
		{ID: "git:status", Name: "status", Namespace: "git", Description: "Show working tree status", Tags: []string{"vcs"}},
		{ID: "git:commit", Name: "commit", Namespace: "git", Description: "Record changes to repository", Tags: []string{"vcs"}},
		{ID: "docker:ps", Name: "ps", Namespace: "docker", Description: "List containers", Tags: []string{"containers"}},
	}

	for _, doc := range docs {
		_ = idx.Add(ctx, doc)
	}

	// Create a searcher with BM25 strategy
	strategy := semantic.NewBM25Strategy(nil) // nil uses default scorer
	searcher := semantic.NewSearcher(idx, strategy)

	// Search for git-related tools
	results, _ := searcher.Search(ctx, "git status")
	fmt.Println("Found:", len(results), "results")
	if len(results) > 0 {
		fmt.Println("Top result:", results[0].Document.ID)
	}
}
Output:

Found: 3 results
Top result: git:status

func (*InMemorySearcher) Search

func (s *InMemorySearcher) Search(ctx context.Context, query string) ([]Result, error)

Search scores all documents and returns results ordered by score desc, ID asc.
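The ordering rule (score descending, then ID ascending) is what keeps results deterministic when scores tie, which happens often with overlap counts. A sketch of such a comparator over local stand-ins for Result and Document follows; sortResults is hypothetical and not necessarily how InMemorySearcher sorts internally.

```go
package main

import (
	"fmt"
	"sort"
)

// Local stand-ins for semantic.Document and semantic.Result.
type Document struct{ ID string }

type Result struct {
	Document Document
	Score    float64
}

// sortResults orders higher scores first and breaks ties by ascending ID,
// so identical inputs always produce the same order.
func sortResults(rs []Result) {
	sort.Slice(rs, func(i, j int) bool {
		if rs[i].Score != rs[j].Score {
			return rs[i].Score > rs[j].Score
		}
		return rs[i].Document.ID < rs[j].Document.ID
	})
}

func main() {
	rs := []Result{
		{Document{"git:status"}, 1},
		{Document{"docker:ps"}, 0},
		{Document{"git:commit"}, 1},
	}
	sortResults(rs)
	// Prints "1 git:commit", "1 git:status", "0 docker:ps".
	for _, r := range rs {
		fmt.Printf("%.0f %s\n", r.Score, r.Document.ID)
	}
}
```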

type Indexer

type Indexer interface {
	Add(ctx context.Context, doc Document) error
	Update(ctx context.Context, doc Document) error
	Remove(ctx context.Context, id string) error
	Get(ctx context.Context, id string) (Document, bool)
	List(ctx context.Context) []Document
}

Indexer defines indexing operations for tool documents.

Contract:

  • Concurrency: implementations must be safe for concurrent use.
  • Context: methods must honor cancellation/deadlines where applicable.
  • Errors: invalid IDs should return ErrInvalidDocumentID.
  • Determinism: List returns stable ordering.

type Result

type Result struct {
	Document Document
	Score    float64
}

Result represents a scored search result.

type Searcher

type Searcher interface {
	Search(ctx context.Context, query string) ([]Result, error)
}

Searcher performs semantic search over indexed documents.

Contract:

  • Concurrency: implementations must be safe for concurrent use.
  • Context: must honor cancellation/deadlines.
  • Determinism: ordering must be stable for identical inputs.

type Strategy

type Strategy interface {
	Score(ctx context.Context, query string, doc Document) (float64, error)
}

Strategy scores a document for a given query.

Contract:

  • Concurrency: implementations must be safe for concurrent use.
  • Context: must honor cancellation/deadlines.
  • Determinism: identical inputs must yield stable scores.

func NewBM25Strategy

func NewBM25Strategy(scorer BM25Scorer) Strategy

NewBM25Strategy creates a BM25-only strategy. If scorer is nil, a default token-overlap scorer is used.
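The documentation does not define the default scorer's tokenization or counting rules. A minimal token-overlap scorer, assuming lowercase tokens split on non-alphanumeric characters and one point per distinct query token found in the text, could look like this (overlapScore is hypothetical, not the package's default scorer):

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokens lowercases s and splits on any rune that is not a letter or digit.
func tokens(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// overlapScore counts how many distinct query tokens appear in text.
func overlapScore(query, text string) float64 {
	inDoc := make(map[string]bool)
	for _, t := range tokens(text) {
		inDoc[t] = true
	}
	seen := make(map[string]bool)
	score := 0.0
	for _, q := range tokens(query) {
		if inDoc[q] && !seen[q] {
			seen[q] = true
			score++
		}
	}
	return score
}

func main() {
	text := "search_files Search for files in the filesystem"
	fmt.Println(overlapScore("search files", text))    // 2
	fmt.Println(overlapScore("list containers", text)) // 0
}
```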

func NewEmbeddingStrategy

func NewEmbeddingStrategy(embedder Embedder) Strategy

NewEmbeddingStrategy creates an embedding-only strategy.

func NewHybridStrategy

func NewHybridStrategy(bm25 Strategy, embedding Strategy, alpha float64) (Strategy, error)

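NewHybridStrategy creates a weighted hybrid strategy. alpha is the weight of the BM25 score (0.7 means 70% BM25, 30% embedding). Both strategies must be non-nil and alpha must be in [0,1]; otherwise ErrInvalidHybridConfig is returned.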

Example
package main

import (
	"context"
	"fmt"

	"github.com/jonwraymond/tooldiscovery/semantic"
)

func main() {
	// Create mock embedder for demonstration
	embedder := &mockEmbedder{}

	// Create individual strategies
	bm25 := semantic.NewBM25Strategy(nil)
	embedding := semantic.NewEmbeddingStrategy(embedder)

	// Combine with hybrid strategy (0.7 = 70% BM25, 30% embedding)
	hybrid, err := semantic.NewHybridStrategy(bm25, embedding, 0.7)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	// Use the hybrid strategy
	doc := semantic.Document{
		ID:          "test",
		Name:        "search_files",
		Description: "Search for files in the filesystem",
	}

	ctx := context.Background()
	score, _ := hybrid.Score(ctx, "search files", doc.Normalized())
	fmt.Printf("Hybrid score: %.2f\n", score)
}

// mockEmbedder is a simple embedder for examples
type mockEmbedder struct{}

func (m *mockEmbedder) Embed(_ context.Context, _ string) ([]float32, error) {
	// Every text maps to the same vector, so cosine similarity is always 1.
	return []float32{1.0, 0.0, 0.0}, nil
}
Output:

Hybrid score: 1.70
