Documentation ¶
Overview ¶
Package docsearch provides semantic search over design documentation. It implements a two-stage pipeline: SimHash shortlist → lazy embedding → cosine similarity.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Search ¶
func Search(ctx context.Context, opts SearchOptions) ([]SearchResult, SearchStats, error)
Search executes a documentation search with the given options. It returns results sorted by score, descending, along with search statistics. The context controls the overall timeout for neural search; if it expires, Search falls back to SimHash ranking and returns partial results gracefully.
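The ordering contract above can be illustrated with a small stand-alone sketch. SearchResult is redeclared locally here so the snippet is self-contained; Search itself returns its slice already ordered this way.

```go
package main

import (
	"fmt"
	"sort"
)

// SearchResult is redeclared locally for illustration; it mirrors the
// package's type with the fields used for ordering.
type SearchResult struct {
	Path  string
	Title string
	Score float64
}

// sortByScore orders results by similarity score, descending, matching
// the order Search documents for its return value.
func sortByScore(results []SearchResult) {
	sort.Slice(results, func(i, j int) bool {
		return results[i].Score > results[j].Score
	})
}

func main() {
	rs := []SearchResult{
		{Path: "a.md", Score: 0.41},
		{Path: "b.md", Score: 0.87},
		{Path: "c.md", Score: 0.63},
	}
	sortByScore(rs)
	for _, r := range rs {
		fmt.Println(r.Path, r.Score)
	}
	// b.md 0.87
	// c.md 0.63
	// a.md 0.41
}
```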
Types ¶
type CacheInfo ¶
type CacheInfo struct {
CorpusPath string
CacheFile string
Model string
EntryCount int
CacheSize int64
LastUpdated time.Time
OrphanedCount int
}
CacheInfo contains cache statistics.
func GetCacheInfo ¶
GetCacheInfo returns cache statistics for a corpus.
type CachedEmbedding ¶
type CachedEmbedding struct {
Path string `json:"path"`
Embedding []float32 `json:"embedding"`
Model string `json:"model"`
ContentHash string `json:"content_hash"` // SHA256 of content for staleness detection
UpdatedAt time.Time `json:"updated_at"`
}
CachedEmbedding stores an embedding with its metadata.
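The ContentHash field's staleness check can be sketched as follows. This is an illustrative stand-alone sketch, not the package's code: the cache compares the stored SHA-256 of the content against the hash of the document currently on disk, and a mismatch means the embedding must be recomputed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex SHA-256 of a document's content, the
// kind of fingerprint CachedEmbedding stores in ContentHash.
func contentHash(content string) string {
	sum := sha256.Sum256([]byte(content))
	return hex.EncodeToString(sum[:])
}

// isStale reports whether a cached embedding no longer matches the
// document: if the content changed, the hashes differ and the
// embedding must be recomputed.
func isStale(cachedHash, currentContent string) bool {
	return cachedHash != contentHash(currentContent)
}

func main() {
	cached := contentHash("v1 of the doc")
	fmt.Println(isStale(cached, "v1 of the doc")) // false
	fmt.Println(isStale(cached, "v2 of the doc")) // true
}
```

Hashing content rather than comparing timestamps makes the check robust to files being copied or touched without actually changing.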
type CleanupResult ¶
CleanupResult contains the results of a cleanup operation.
func CleanupCache ¶
func CleanupCache(corpusPath string) (*CleanupResult, error)
CleanupCache removes orphaned entries from the cache.
type DocFrame ¶
type DocFrame struct {
Path string // Full path to document
Title string // Document title
Content string // Full text content
SimHash uint64 // 64-bit SimHash of content
Embedding []float64 // Neural embedding (lazy computed)
EmbeddingModel string // Model used for embedding
EmbeddingUpdatedAt time.Time // When embedding was computed
}
DocFrame represents a document with its metadata and optional embedding.
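The SimHash field holds a 64-bit fingerprint of the content. A minimal sketch of how such a fingerprint is typically computed (the package's actual tokenization and weighting may differ): each token votes its hash bits into a 64-slot tally, and the sign of each tally becomes one bit of the result, so similar content yields fingerprints at a small Hamming distance.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// simHash computes a 64-bit SimHash over whitespace-separated tokens.
// Each token's FNV-1a hash votes +1/-1 per bit position into a tally;
// the sign of each tally becomes one bit of the fingerprint.
func simHash(content string) uint64 {
	var tally [64]int
	for _, tok := range strings.Fields(content) {
		h := fnv.New64a()
		h.Write([]byte(tok))
		sum := h.Sum64()
		for i := 0; i < 64; i++ {
			if sum&(1<<uint(i)) != 0 {
				tally[i]++
			} else {
				tally[i]--
			}
		}
	}
	var fp uint64
	for i := 0; i < 64; i++ {
		if tally[i] > 0 {
			fp |= 1 << uint(i)
		}
	}
	return fp
}

func main() {
	a := simHash("semantic search over design docs")
	b := simHash("semantic search over design docs")
	fmt.Println(a == b) // true: identical content, identical fingerprint
}
```

Unlike a cryptographic hash, SimHash is locality-sensitive: changing a few tokens flips only a few fingerprint bits, which is what makes the Hamming-distance shortlist work.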
type EmbeddingCache ¶
type EmbeddingCache struct {
// contains filtered or unexported fields
}
EmbeddingCache stores document embeddings with model versioning.
func NewEmbeddingCache ¶
func NewEmbeddingCache(model, corpus string) *EmbeddingCache
NewEmbeddingCache creates a new embedding cache for a specific corpus.
type EmbeddingCacheFile ¶
type EmbeddingCacheFile struct {
Model string `json:"model"`
Entries map[string]*CachedEmbedding `json:"entries"`
}
EmbeddingCacheFile is the JSON structure for persisting the cache.
type SearchOptions ¶
type SearchOptions struct {
Query string // Search query text
DocsPath string // Path to document corpus directory
ExtraPaths []string // Additional corpus directories to search (e.g., changelogs/)
Subdir string // Filter by subdirectory pattern (e.g., "planned", "guides")
Neural bool // Use neural embeddings (requires Ollama)
NeuralCandidates int // Max candidates for neural search
Limit int // Max results to return
JSON bool // Output as JSON
Rebuild bool // Force rebuild of all embeddings (ignore cache)
}
SearchOptions configures a documentation search.
type SearchResult ¶
type SearchResult struct {
Path string // Full path to document
Title string // Document title (from # heading)
Score float64 // Similarity score (0-1)
}
SearchResult represents a single search match.
type SearchStats ¶
type SearchStats struct {
TotalDocs int // Total docs in corpus
SimHashCandidates int // Candidates from SimHash shortlist
EmbeddingsComputed int // New embeddings computed this search
EmbeddingsReused int // Cached embeddings reused
EmbeddingModel string // Model used for embeddings
SearchTimeMs int64 // Total search time in milliseconds
}
SearchStats provides search performance metrics.
type WarmupResult ¶
type WarmupResult struct {
TotalDocs int
AlreadyCached int
NewlyEmbedded int
Failed int
Model string
}
WarmupResult contains statistics from a cache warmup operation.
func WarmupCache ¶
WarmupCache pre-computes embeddings for all docs in a corpus, populating the cache. This prevents cold-cache hangs during interactive neural search. The context controls the overall timeout; warmup stops gracefully if it expires. If quiet is true, progress is not printed to stderr.