docsearch

package
v0.14.2 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 26, 2026 License: Apache-2.0 Imports: 14 Imported by: 0

Documentation

Overview

Package docsearch provides semantic search over design documentation. It implements a two-stage pipeline: a SimHash shortlist selects candidate documents, then lazily computed embeddings are ranked by cosine similarity.
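The two stages can be sketched as follows. This is a minimal illustration and not the package's implementation: whitespace tokenization, the FNV-1a token hash, and the helper names simHash, hamming, and cosine are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"math/bits"
	"strings"
)

// simHash computes a 64-bit SimHash over lowercased, whitespace-separated
// tokens. Tokenization and the FNV-1a token hash are assumptions.
func simHash(text string) uint64 {
	var counts [64]int
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New64a()
		h.Write([]byte(tok))
		sum := h.Sum64()
		for i := 0; i < 64; i++ {
			if sum&(uint64(1)<<uint(i)) != 0 {
				counts[i]++
			} else {
				counts[i]--
			}
		}
	}
	var out uint64
	for i, c := range counts {
		if c > 0 {
			out |= uint64(1) << uint(i)
		}
	}
	return out
}

// hamming returns the number of differing bits between two fingerprints.
// A small distance puts a document on the stage-one shortlist.
func hamming(a, b uint64) int { return bits.OnesCount64(a ^ b) }

// cosine returns the cosine similarity of two equal-length vectors. It
// returns 0 if the lengths differ or either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	q := simHash("embedding cache design")
	d := simHash("design of the embedding cache")
	fmt.Println("hamming distance:", hamming(q, d))
	fmt.Println("cosine:", cosine([]float64{1, 0}, []float64{1, 0}))
}
```

Stage one is cheap because it compares 64-bit fingerprints, so embeddings only need to be computed for the documents that survive the shortlist.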

Constants

This section is empty.

Variables

This section is empty.

Functions

Search executes a documentation search with the given options. It returns results sorted by score in descending order, together with search statistics. The context controls the overall timeout for neural search: if it expires, Search returns partial results gracefully, falling back to SimHash.

Types

type CacheInfo

type CacheInfo struct {
	CorpusPath    string
	CacheFile     string
	Model         string
	EntryCount    int
	CacheSize     int64
	LastUpdated   time.Time
	OrphanedCount int
}

CacheInfo contains cache statistics.

func GetCacheInfo

func GetCacheInfo(corpusPath string) (*CacheInfo, error)

GetCacheInfo returns cache statistics for a corpus.

type CachedEmbedding

type CachedEmbedding struct {
	Path        string    `json:"path"`
	Embedding   []float32 `json:"embedding"`
	Model       string    `json:"model"`
	ContentHash string    `json:"content_hash"` // SHA256 of content for staleness detection
	UpdatedAt   time.Time `json:"updated_at"`
}

CachedEmbedding stores an embedding with metadata.
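The ContentHash field is documented as a SHA256 of the content used for staleness detection. A minimal sketch of that check is shown below. The helper names contentHash and isStale are hypothetical, and hex encoding of the digest is an assumption; the package may encode it differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex-encoded SHA-256 digest of a document body.
// Hex encoding is an assumption made for this sketch.
func contentHash(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// isStale reports whether a cached hash no longer matches the content.
func isStale(cachedHash, content string) bool {
	return cachedHash != contentHash(content)
}

func main() {
	h := contentHash("# Design\n\nFirst draft.")
	fmt.Println(isStale(h, "# Design\n\nFirst draft."))  // false
	fmt.Println(isStale(h, "# Design\n\nSecond draft.")) // true
}
```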

type CleanupResult

type CleanupResult struct {
	RemovedCount int
	RemovedPaths []string
	OldSize      int64
	NewSize      int64
}

CleanupResult contains cleanup operation results.

func CleanupCache

func CleanupCache(corpusPath string) (*CleanupResult, error)

CleanupCache removes orphaned entries from the cache.
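The documentation does not define "orphaned". The sketch below assumes it means a cache entry whose document path no longer exists, and shows one way such entries could be pruned. The pruneOrphans helper and its exists callback are hypothetical; a real check might call os.Stat on each path.

```go
package main

import (
	"fmt"
	"sort"
)

// pruneOrphans deletes every entry whose path fails the exists check and
// returns the removed paths in sorted order. Treating "orphaned" as
// "path no longer exists" is an assumption made for this sketch.
func pruneOrphans(entries map[string][]float32, exists func(path string) bool) []string {
	var removed []string
	for path := range entries {
		if !exists(path) {
			removed = append(removed, path)
			delete(entries, path) // deleting during range is safe in Go
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	entries := map[string][]float32{
		"docs/a.md": {0.1},
		"docs/b.md": {0.2},
	}
	onDisk := map[string]bool{"docs/b.md": true}

	removed := pruneOrphans(entries, func(p string) bool { return onDisk[p] })
	fmt.Println(removed, len(entries))
}
```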

type DocFrame

type DocFrame struct {
	Path               string    // Full path to document
	Title              string    // Document title
	Content            string    // Full text content
	SimHash            uint64    // 64-bit SimHash of content
	Embedding          []float64 // Neural embedding (lazy computed)
	EmbeddingModel     string    // Model used for embedding
	EmbeddingUpdatedAt time.Time // When embedding was computed
}

DocFrame represents a document with its metadata and optional embedding.

type EmbeddingCache

type EmbeddingCache struct {
	// contains filtered or unexported fields
}

EmbeddingCache stores document embeddings with model versioning.

func NewEmbeddingCache

func NewEmbeddingCache(model, corpus string) *EmbeddingCache

NewEmbeddingCache creates a new embedding cache for the given model and corpus.

func (*EmbeddingCache) Close

func (c *EmbeddingCache) Close() error

Close saves the cache to disk.

func (*EmbeddingCache) Get

func (c *EmbeddingCache) Get(path, contentHash string) ([]float32, bool)

Get retrieves a cached embedding if an entry exists for the path, its model matches, and its content is fresh.

func (*EmbeddingCache) Set

func (c *EmbeddingCache) Set(path string, embedding []float32, contentHash string)

Set stores an embedding in the cache together with its content hash.
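The lookup rules for Get and Set can be illustrated with an in-memory stand-in. The memCache type below is hypothetical and ignores persistence; it also assumes that "fresh" means the supplied content hash equals the stored one.

```go
package main

import "fmt"

// entry mirrors the CachedEmbedding fields that matter for a lookup.
type entry struct {
	Embedding   []float32
	Model       string
	ContentHash string
}

// memCache is a hypothetical in-memory stand-in for EmbeddingCache.
type memCache struct {
	model   string
	entries map[string]entry
}

func newMemCache(model string) *memCache {
	return &memCache{model: model, entries: make(map[string]entry)}
}

// Set records the embedding under path, tagged with the cache's model.
func (c *memCache) Set(path string, embedding []float32, contentHash string) {
	c.entries[path] = entry{Embedding: embedding, Model: c.model, ContentHash: contentHash}
}

// Get returns a hit only if the entry exists, was produced by the same
// model, and was computed from content with the same hash.
func (c *memCache) Get(path, contentHash string) ([]float32, bool) {
	e, ok := c.entries[path]
	if !ok || e.Model != c.model || e.ContentHash != contentHash {
		return nil, false
	}
	return e.Embedding, true
}

func main() {
	c := newMemCache("example-model") // hypothetical model name
	c.Set("docs/a.md", []float32{0.1, 0.2}, "hash-v1")

	_, hit := c.Get("docs/a.md", "hash-v1")
	fmt.Println(hit) // true
	_, hit = c.Get("docs/a.md", "hash-v2")
	fmt.Println(hit) // false: content changed
}
```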

type EmbeddingCacheFile

type EmbeddingCacheFile struct {
	Model   string                      `json:"model"`
	Entries map[string]*CachedEmbedding `json:"entries"`
}

EmbeddingCacheFile is the JSON structure for persisting the cache.

type SearchOptions

type SearchOptions struct {
	Query            string   // Search query text
	DocsPath         string   // Path to document corpus directory
	ExtraPaths       []string // Additional corpus directories to search (e.g., changelogs/)
	Subdir           string   // Filter by subdirectory pattern (e.g., "planned", "guides")
	Neural           bool     // Use neural embeddings (requires Ollama)
	NeuralCandidates int      // Max candidates for neural search
	Limit            int      // Max results to return
	JSON             bool     // Output as JSON
	Rebuild          bool     // Force rebuild of all embeddings (ignore cache)
}

SearchOptions configures a documentation search.

type SearchResult

type SearchResult struct {
	Path  string  // Full path to document
	Title string  // Document title (from # heading)
	Score float64 // Similarity score (0-1)
}

SearchResult represents a single search match.

type SearchStats

type SearchStats struct {
	TotalDocs          int    // Total docs in corpus
	SimHashCandidates  int    // Candidates from SimHash shortlist
	EmbeddingsComputed int    // New embeddings computed this search
	EmbeddingsReused   int    // Cached embeddings reused
	EmbeddingModel     string // Model used for embeddings
	SearchTimeMs       int64  // Total search time in milliseconds
}

SearchStats provides search performance metrics.
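One use of these counters is to measure how well the embedding cache is working. The sketch below redeclares SearchStats locally and derives a reuse ratio from EmbeddingsComputed and EmbeddingsReused. The reuseRatio helper is hypothetical and not part of the package.

```go
package main

import "fmt"

// Local copy of the documented type.
type SearchStats struct {
	TotalDocs          int
	SimHashCandidates  int
	EmbeddingsComputed int
	EmbeddingsReused   int
	EmbeddingModel     string
	SearchTimeMs       int64
}

// reuseRatio returns the fraction of embeddings served from the cache,
// or 0 when the search needed no embeddings at all.
func reuseRatio(s SearchStats) float64 {
	total := s.EmbeddingsComputed + s.EmbeddingsReused
	if total == 0 {
		return 0
	}
	return float64(s.EmbeddingsReused) / float64(total)
}

func main() {
	s := SearchStats{EmbeddingsComputed: 1, EmbeddingsReused: 3}
	fmt.Println(reuseRatio(s)) // 0.75
}
```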

type WarmupResult

type WarmupResult struct {
	TotalDocs     int
	AlreadyCached int
	NewlyEmbedded int
	Failed        int
	Model         string
}

WarmupResult contains statistics from a cache warmup operation.

func WarmupCache

func WarmupCache(ctx context.Context, docsPath string, quiet bool) (*WarmupResult, error)

WarmupCache pre-computes embeddings for all docs in a corpus, populating the cache. This prevents cold-cache hangs during interactive neural search. The context controls the overall timeout — warmup stops gracefully if it expires. If quiet is true, progress is not printed to stderr.
