docsearch

package

Version: v0.14.2 (latest) | Published: Apr 26, 2026 | License: Apache-2.0 | Imports: 14 | Imported by: 0

Repository

github.com/sunholo-data/ailang

Documentation ¶

Overview ¶

Package docsearch provides semantic search over design documentation. It implements a two-stage pipeline: SimHash shortlist → lazy embedding → cosine similarity.
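The first stage of the pipeline can be illustrated with a minimal SimHash sketch. This is not the package's implementation: the tokenization (lowercased, whitespace-separated words), the FNV-1a token hash, and the names `simHash64` and `hammingDistance` are all assumptions made for illustration. Documents whose fingerprints are within a small Hamming distance of the query fingerprint would form the shortlist.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
)

// simHash64 computes a 64-bit SimHash over lowercased, whitespace-separated
// tokens. Tokenization and the FNV-1a hash are assumptions of this sketch.
func simHash64(text string) uint64 {
	var counts [64]int
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New64a()
		h.Write([]byte(tok))
		sum := h.Sum64()
		// Each token votes +1 or -1 on every bit position.
		for i := 0; i < 64; i++ {
			if sum&(1<<uint(i)) != 0 {
				counts[i]++
			} else {
				counts[i]--
			}
		}
	}
	var out uint64
	for i, c := range counts {
		if c > 0 {
			out |= 1 << uint(i)
		}
	}
	return out
}

// hammingDistance counts the bit positions where a and b differ.
func hammingDistance(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	query := simHash64("embedding cache design")
	doc := simHash64("design of the embedding cache")
	fmt.Println("hamming distance:", hammingDistance(query, doc))
}
```

Because comparing two fingerprints is a single XOR and popcount, the shortlist can be built over the whole corpus before any embedding is computed.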

Index ¶

func Search(ctx context.Context, opts SearchOptions) ([]SearchResult, SearchStats, error)
type CacheInfo
- func GetCacheInfo(corpusPath string) (*CacheInfo, error)
type CachedEmbedding
type CleanupResult
- func CleanupCache(corpusPath string) (*CleanupResult, error)
type DocFrame
type EmbeddingCache
- func NewEmbeddingCache(model, corpus string) *EmbeddingCache
type EmbeddingCacheFile
type SearchOptions
type SearchResult
type SearchStats
type WarmupResult
- func WarmupCache(ctx context.Context, docsPath string, quiet bool) (*WarmupResult, error)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Search ¶

func Search(ctx context.Context, opts SearchOptions) ([]SearchResult, SearchStats, error)

Search executes a documentation search with the given options. It returns results sorted by score in descending order, together with search statistics. The context bounds the overall time spent on neural search: if it expires, Search degrades gracefully and returns partial results, falling back to SimHash.

Types ¶

type CacheInfo ¶

type CacheInfo struct {
	CorpusPath    string
	CacheFile     string
	Model         string
	EntryCount    int
	CacheSize     int64
	LastUpdated   time.Time
	OrphanedCount int
}

CacheInfo contains cache statistics.

func GetCacheInfo ¶

func GetCacheInfo(corpusPath string) (*CacheInfo, error)

GetCacheInfo returns cache statistics for a corpus.

type CachedEmbedding ¶

type CachedEmbedding struct {
	Path        string    `json:"path"`
	Embedding   []float32 `json:"embedding"`
	Model       string    `json:"model"`
	ContentHash string    `json:"content_hash"` // SHA256 of content for staleness detection
	UpdatedAt   time.Time `json:"updated_at"`
}

CachedEmbedding stores an embedding with metadata.
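The `ContentHash` field is documented as a SHA256 of the content used for staleness detection. A minimal sketch of that idea follows; the hex encoding and the helper names `contentHash` and `isStale` are assumptions, since the documentation does not specify how the digest is encoded or compared.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex-encoded SHA-256 digest of a document's content.
// Hex encoding is an assumption of this sketch.
func contentHash(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// isStale reports whether a stored hash no longer matches the current content.
func isStale(storedHash string, current []byte) bool {
	return storedHash != contentHash(current)
}

func main() {
	stored := contentHash([]byte("# Title\n\nBody"))
	fmt.Println(isStale(stored, []byte("# Title\n\nBody")))
	fmt.Println(isStale(stored, []byte("# Title\n\nEdited body")))
}
```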

type CleanupResult ¶

type CleanupResult struct {
	RemovedCount int
	RemovedPaths []string
	OldSize      int64
	NewSize      int64
}

CleanupResult contains cleanup operation results.

func CleanupCache ¶

func CleanupCache(corpusPath string) (*CleanupResult, error)

CleanupCache removes orphaned entries from the cache.
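The documentation does not define "orphaned". One plausible reading, assumed here, is a cache entry whose document no longer exists in the corpus. The sketch below shows that pruning step on an in-memory map; `removeOrphans` and its parameters are hypothetical and do not reflect the package's internals.

```go
package main

import (
	"fmt"
	"sort"
)

// removeOrphans deletes every cache entry whose path is not in the set of
// existing documents and returns the removed paths in sorted order.
// Treating "missing from the corpus" as "orphaned" is an assumption.
func removeOrphans(entries map[string][]float32, existing map[string]bool) []string {
	var removed []string
	for path := range entries {
		if !existing[path] {
			removed = append(removed, path)
		}
	}
	for _, path := range removed {
		delete(entries, path)
	}
	sort.Strings(removed)
	return removed
}

func main() {
	entries := map[string][]float32{
		"docs/a.md": {0.1},
		"docs/b.md": {0.2},
	}
	existing := map[string]bool{"docs/a.md": true}
	fmt.Println(removeOrphans(entries, existing), len(entries))
}
```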

type DocFrame ¶

type DocFrame struct {
	Path               string    // Full path to document
	Title              string    // Document title
	Content            string    // Full text content
	SimHash            uint64    // 64-bit SimHash of content
	Embedding          []float64 // Neural embedding (lazy computed)
	EmbeddingModel     string    // Model used for embedding
	EmbeddingUpdatedAt time.Time // When embedding was computed
}

DocFrame represents a document with its metadata and optional embedding.

type EmbeddingCache ¶

type EmbeddingCache struct {
	// contains filtered or unexported fields
}

EmbeddingCache stores document embeddings with model versioning.

func NewEmbeddingCache ¶

func NewEmbeddingCache(model, corpus string) *EmbeddingCache

NewEmbeddingCache creates a new embedding cache for a specific corpus.

func (*EmbeddingCache) Close ¶

func (c *EmbeddingCache) Close() error

Close saves the cache to disk.

func (*EmbeddingCache) Get ¶

func (c *EmbeddingCache) Get(path, contentHash string) ([]float32, bool)

Get retrieves a cached embedding if it exists, the model matches, and the content is fresh.

func (*EmbeddingCache) Set ¶

func (c *EmbeddingCache) Set(path string, embedding []float32, contentHash string)

Set stores an embedding in the cache together with its content hash.
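The three conditions checked by Get (entry exists, model matches, content is fresh) can be sketched with a small in-memory stand-in. The types `entry` and `memCache` are hypothetical; the real EmbeddingCache has unexported fields and also persists to disk, which this sketch omits.

```go
package main

import "fmt"

// entry mirrors the metadata needed for a validity check.
type entry struct {
	embedding   []float32
	model       string
	contentHash string
}

// memCache is a hypothetical in-memory stand-in for EmbeddingCache.
type memCache struct {
	model   string
	entries map[string]entry
}

// get returns a hit only if the entry exists, was produced by the cache's
// current model, and was computed from content with the same hash.
func (c *memCache) get(path, contentHash string) ([]float32, bool) {
	e, ok := c.entries[path]
	if !ok || e.model != c.model || e.contentHash != contentHash {
		return nil, false
	}
	return e.embedding, true
}

// set stores an embedding tagged with the current model and content hash.
func (c *memCache) set(path string, embedding []float32, contentHash string) {
	c.entries[path] = entry{embedding: embedding, model: c.model, contentHash: contentHash}
}

func main() {
	c := &memCache{model: "example-model", entries: map[string]entry{}} // hypothetical model name
	c.set("docs/a.md", []float32{0.1, 0.2}, "hash-v1")

	_, hit := c.get("docs/a.md", "hash-v1")
	fmt.Println("same content:", hit)

	_, hit = c.get("docs/a.md", "hash-v2")
	fmt.Println("edited content:", hit)
}
```

Tagging each entry with the model means that switching embedding models invalidates old entries without requiring an explicit cache wipe.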

type EmbeddingCacheFile ¶

type EmbeddingCacheFile struct {
	Model   string                      `json:"model"`
	Entries map[string]*CachedEmbedding `json:"entries"`
}

EmbeddingCacheFile is the JSON structure for persisting the cache.

type SearchOptions ¶

type SearchOptions struct {
	Query            string   // Search query text
	DocsPath         string   // Path to document corpus directory
	ExtraPaths       []string // Additional corpus directories to search (e.g., changelogs/)
	Subdir           string   // Filter by subdirectory pattern (e.g., "planned", "guides")
	Neural           bool     // Use neural embeddings (requires Ollama)
	NeuralCandidates int      // Max candidates for neural search
	Limit            int      // Max results to return
	JSON             bool     // Output as JSON
	Rebuild          bool     // Force rebuild of all embeddings (ignore cache)
}

SearchOptions configures a documentation search.

type SearchResult ¶

type SearchResult struct {
	Path  string  // Full path to document
	Title string  // Document title (from # heading)
	Score float64 // Similarity score (0-1)
}

SearchResult represents a single search match.
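The final stage of the pipeline scores candidates by cosine similarity and returns them in descending order. A minimal sketch follows, with hypothetical names `cosine`, `rank`, and `result`. Raw cosine similarity ranges from -1 to 1; how the package maps it to the documented 0-1 score is not specified, so no normalization is applied here.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// result is a reduced stand-in for SearchResult.
type result struct {
	Path  string
	Score float64
}

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// differ in length, are empty, or have zero magnitude.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// rank scores every document against the query, sorts by score descending
// (ties broken by path), and keeps at most limit results when limit > 0.
func rank(query []float32, docs map[string][]float32, limit int) []result {
	results := make([]result, 0, len(docs))
	for path, emb := range docs {
		results = append(results, result{Path: path, Score: cosine(query, emb)})
	}
	sort.Slice(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].Path < results[j].Path
	})
	if limit > 0 && len(results) > limit {
		results = results[:limit]
	}
	return results
}

func main() {
	docs := map[string][]float32{
		"a.md": {1, 0},
		"b.md": {0, 1},
		"c.md": {1, 1},
	}
	for _, r := range rank([]float32{1, 0}, docs, 2) {
		fmt.Printf("%s %.3f\n", r.Path, r.Score)
	}
}
```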

type SearchStats ¶

type SearchStats struct {
	TotalDocs          int    // Total docs in corpus
	SimHashCandidates  int    // Candidates from SimHash shortlist
	EmbeddingsComputed int    // New embeddings computed this search
	EmbeddingsReused   int    // Cached embeddings reused
	EmbeddingModel     string // Model used for embeddings
	SearchTimeMs       int64  // Total search time in milliseconds
}

SearchStats provides search performance metrics.

type WarmupResult ¶

type WarmupResult struct {
	TotalDocs     int
	AlreadyCached int
	NewlyEmbedded int
	Failed        int
	Model         string
}

WarmupResult contains statistics from a cache warmup operation.

func WarmupCache ¶

func WarmupCache(ctx context.Context, docsPath string, quiet bool) (*WarmupResult, error)

WarmupCache pre-computes embeddings for all docs in a corpus and populates the cache, which prevents cold-cache hangs during interactive neural search. The context bounds the overall run time: if it expires, warmup stops gracefully. If quiet is true, progress is not printed to stderr.
