Documentation ¶
Overview ¶
Package rag provides Retrieval-Augmented Generation (RAG) capabilities.
Architecture ¶
The RAG package follows a layered architecture:
┌──────────────────────────────────────────────────────────────────────────┐
│ SearchEngine (rag/search.go)                                             │
│ • Query processing, retrieval, reranking                                 │
├──────────────────────────────────────────────────────────────────────────┤
│ Chunker (rag/chunker.go)                                                 │
│ • Content splitting strategies                                           │
├──────────────────────────────────────────────────────────────────────────┤
│ Shared Foundation                                                        │
│ ┌───────────────────────────┐  ┌───────────────────────────┐             │
│ │ vector/provider.go        │  │ embedder/embedder.go      │             │
│ └───────────────────────────┘  └───────────────────────────┘             │
└──────────────────────────────────────────────────────────────────────────┘
Usage ¶
Basic usage for document ingestion and search. The calls below match the documented IngestDocument and Search signatures; the SearchRequest field names are illustrative (see the SearchRequest type below):

// Create search engine
engine, err := rag.NewSearchEngine(rag.SearchEngineConfig{
	Provider: vectorProvider,
	Embedder: embedder,
})
if err != nil {
	// handle error
}

// Ingest document
err = engine.IngestDocument(ctx, rag.Document{
	ID:       "doc1",
	Content:  "Document content...",
	Metadata: metadata,
})

// Search
results, err := engine.Search(ctx, rag.SearchRequest{Query: "query", TopK: 10})
Integration with Memory ¶
The RAG package shares the same vector.Provider abstraction as the memory package, allowing both to use the same vector database backend.
Index ¶
- Constants
- func DoWithResult[T any](ctx context.Context, r *Retryer, operation string, fn func() (T, error)) (T, error)
- func IsRetryExhausted(err error) bool
- func NewVectorProviderFromConfig(cfg *config.VectorStoreConfig) (vector.Provider, error)
- type APIAuthConfig
- type APIEndpointConfig
- type APISource
- func (a *APISource) Close() error
- func (a *APISource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
- func (a *APISource) GetLastModified(ctx context.Context, id string) (time.Time, error)
- func (a *APISource) ReadDocument(ctx context.Context, id string) (*Document, error)
- func (a *APISource) SupportsIncrementalIndexing() bool
- func (a *APISource) Type() string
- type BinaryExtractor
- type BlobSource
- func (bs *BlobSource) Close() error
- func (bs *BlobSource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
- func (bs *BlobSource) GetFilter() FileFilter
- func (bs *BlobSource) GetLastModified(ctx context.Context, id string) (time.Time, error)
- func (bs *BlobSource) GetPrefix() string
- func (bs *BlobSource) GetURL() string
- func (bs *BlobSource) ReadDocument(ctx context.Context, id string) (*Document, error)
- func (bs *BlobSource) SupportsIncrementalIndexing() bool
- func (bs *BlobSource) Type() string
- type BlobSourceConfig
- type Chunk
- type ChunkContext
- type Chunker
- type ChunkerConfig
- type ChunkerStrategy
- type ChunkingError
- type CodeMetadata
- type CollectionSource
- func (cs *CollectionSource) Close() error
- func (cs *CollectionSource) CollectionName() string
- func (cs *CollectionSource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
- func (cs *CollectionSource) GetLastModified(ctx context.Context, id string) (time.Time, error)
- func (cs *CollectionSource) ReadDocument(ctx context.Context, id string) (*Document, error)
- func (cs *CollectionSource) SupportsIncrementalIndexing() bool
- func (cs *CollectionSource) Type() string
- type ContentExtractor
- type DBPoolAdapter
- type DataSource
- type DirectorySource
- func (ds *DirectorySource) Close() error
- func (ds *DirectorySource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
- func (ds *DirectorySource) GetBasePath() string
- func (ds *DirectorySource) GetFilter() FileFilter
- func (ds *DirectorySource) GetLastModified(ctx context.Context, id string) (time.Time, error)
- func (ds *DirectorySource) ReadDocument(ctx context.Context, id string) (*Document, error)
- func (ds *DirectorySource) SupportsIncrementalIndexing() bool
- func (ds *DirectorySource) Type() string
- type DirectorySourceConfig
- type Document
- type DocumentEvent
- type DocumentEventType
- type DocumentStore
- func (s *DocumentStore) Clear(ctx context.Context) error
- func (s *DocumentStore) Close() error
- func (s *DocumentStore) Collection() string
- func (s *DocumentStore) Config() DocumentStoreConfig
- func (s *DocumentStore) GetDocument(ctx context.Context, id string) (*SearchResult, error)
- func (s *DocumentStore) GetSearchEngine() *SearchEngine
- func (s *DocumentStore) HealthCheck(ctx context.Context) HealthCheck
- func (s *DocumentStore) Index(ctx context.Context) error
- func (s *DocumentStore) Metrics() IndexMetricsSnapshot
- func (s *DocumentStore) Name() string
- func (s *DocumentStore) RefreshDocument(ctx context.Context, docID string) error
- func (s *DocumentStore) RegisterExtractor(e ContentExtractor)
- func (s *DocumentStore) Search(ctx context.Context, req SearchRequest) (*SearchResponse, error)
- func (s *DocumentStore) SearchWithFilter(ctx context.Context, query string, topK int, filter map[string]any) (*SearchResponse, error)
- func (s *DocumentStore) StartWatching(ctx context.Context) error
- func (s *DocumentStore) Stats() DocumentStoreStats
- func (s *DocumentStore) StopWatching()
- type DocumentStoreConfig
- type DocumentStoreError
- type DocumentStoreStats
- type ExtractedContent
- type ExtractionError
- type ExtractorRegistry
- func (r *ExtractorRegistry) Extract(ctx context.Context, doc Document) (*ExtractedContent, error)
- func (r *ExtractorRegistry) ExtractContent(ctx context.Context, path string, mimeType string, fileSize int64) (*ExtractedContent, error)
- func (r *ExtractorRegistry) GetExtractors() []ContentExtractor
- func (r *ExtractorRegistry) HasExtractorForFile(path string, mimeType string) bool
- func (r *ExtractorRegistry) Register(extractor ContentExtractor)
- type FactoryDeps
- type FileCheckpoint
- type FileFilter
- type FileWatcher
- type FileWatcherConfig
- type FunctionInfo
- type GoMetadataExtractor
- type HealthCheck
- type HealthChecker
- type HealthStatus
- type HyDE
- type IndexCheckpoint
- type IndexCheckpointManager
- func (cm *IndexCheckpointManager) ClearCheckpoint() error
- func (cm *IndexCheckpointManager) ForceSave() error
- func (cm *IndexCheckpointManager) FormatCheckpointInfo(checkpoint *IndexCheckpoint) string
- func (cm *IndexCheckpointManager) GetProcessedCount() int
- func (cm *IndexCheckpointManager) IsEnabled() bool
- func (cm *IndexCheckpointManager) LoadCheckpoint() (*IndexCheckpoint, error)
- func (cm *IndexCheckpointManager) RecordFile(path string, size int64, modTime time.Time, status string)
- func (cm *IndexCheckpointManager) SaveCheckpoint() error
- func (cm *IndexCheckpointManager) SetTotalFiles(total int)
- func (cm *IndexCheckpointManager) ShouldProcessFile(path string, size int64, modTime time.Time) bool
- type IndexError
- type IndexMetrics
- func (m *IndexMetrics) IncrementErrors()
- func (m *IndexMetrics) IncrementIndexed()
- func (m *IndexMetrics) IncrementSkipped()
- func (m *IndexMetrics) IncrementTotal()
- func (m *IndexMetrics) RecordSearch(latency time.Duration)
- func (m *IndexMetrics) Reset()
- func (m *IndexMetrics) SetEndTime(t time.Time)
- func (m *IndexMetrics) SetStartTime(t time.Time)
- func (m *IndexMetrics) Snapshot() IndexMetricsSnapshot
- type IndexMetricsSnapshot
- type LLMQueryExpander
- type MCPExtractor
- type MCPExtractorConfig
- type MetadataExtractor
- type MetadataExtractorRegistry
- type MultiQueryExpander
- type NativeParseResult
- type NativeParser
- type NativeParserRegistry
- type NilChunker
- type NilDataSource
- func (NilDataSource) Close() error
- func (NilDataSource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
- func (NilDataSource) GetLastModified(ctx context.Context, id string) (time.Time, error)
- func (NilDataSource) ReadDocument(ctx context.Context, id string) (*Document, error)
- func (NilDataSource) SupportsIncrementalIndexing() bool
- func (NilDataSource) Type() string
- type NilMultiQueryExpander
- type NilQueryExpander
- type NilReranker
- type OverlappingChunker
- type PaginationConfig
- type PatternCache
- type PatternFilter
- type ProgressStats
- type ProgressTracker
- func (pt *ProgressTracker) GetExtractorStats() map[string]int64
- func (pt *ProgressTracker) GetStats() ProgressStats
- func (pt *ProgressTracker) IncrementDeleted()
- func (pt *ProgressTracker) IncrementFailed()
- func (pt *ProgressTracker) IncrementIndexed()
- func (pt *ProgressTracker) IncrementProcessed()
- func (pt *ProgressTracker) IncrementSkipped()
- func (pt *ProgressTracker) RecordExtractorUsage(extractorName string)
- func (pt *ProgressTracker) SetCurrentFile(filename string)
- func (pt *ProgressTracker) SetTotalFiles(total int64)
- func (pt *ProgressTracker) Start()
- func (pt *ProgressTracker) Stop()
- type QueryExpander
- type RankingDecision
- type RerankResult
- type Reranker
- type RetryConfig
- type RetryError
- type Retryer
- type SQLSource
- func (s *SQLSource) Close() error
- func (s *SQLSource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
- func (s *SQLSource) GetLastModified(ctx context.Context, id string) (time.Time, error)
- func (s *SQLSource) ReadDocument(ctx context.Context, id string) (*Document, error)
- func (s *SQLSource) SupportsIncrementalIndexing() bool
- func (s *SQLSource) Type() string
- type SQLSourceOptions
- type SQLTableConfig
- type SearchEngine
- func (e *SearchEngine) Clear(ctx context.Context) error
- func (e *SearchEngine) Close() error
- func (e *SearchEngine) Collection() string
- func (e *SearchEngine) DeleteByFilter(ctx context.Context, filter map[string]any) error
- func (e *SearchEngine) DeleteDocument(ctx context.Context, documentID string) error
- func (e *SearchEngine) HealthCheck(ctx context.Context) HealthCheck
- func (e *SearchEngine) IngestDocument(ctx context.Context, doc Document) error
- func (e *SearchEngine) IngestDocuments(ctx context.Context, docs []Document) error
- func (e *SearchEngine) Search(ctx context.Context, req SearchRequest) (*SearchResponse, error)
- func (e *SearchEngine) Status() map[string]any
- type SearchEngineConfig
- type SearchError
- type SearchMetrics
- type SearchMetricsSnapshot
- type SearchOptions
- type SearchRequest
- type SearchResponse
- type SearchResult
- type SemanticChunker
- type SimpleChunker
- type SourceDocument
- type TextExtractor
- type Tool
- type ToolCaller
- type ToolInfo
- type ToolParameter
- type ToolResult
- type TypeInfo
Constants ¶
const (
	// MinQueryLength is the minimum allowed query length.
	MinQueryLength = 2

	// MaxQueryLength is the maximum allowed query length.
	MaxQueryLength = 10000
)
Query validation constants (from legacy).
Variables ¶
This section is empty.
Functions ¶
func DoWithResult ¶
func DoWithResult[T any](ctx context.Context, r *Retryer, operation string, fn func() (T, error)) (T, error)
DoWithResult executes an operation that returns a value.
func IsRetryExhausted ¶
func IsRetryExhausted(err error) bool
IsRetryExhausted checks if an error is a retry exhaustion error.
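A minimal sketch of how these two helpers compose. The construction of *Retryer is not shown in this overview, so r is assumed to exist, and fetchEmbedding is a hypothetical fallible operation:

data, err := rag.DoWithResult(ctx, r, "fetch-embedding", func() ([]float32, error) {
	return fetchEmbedding(ctx) // hypothetical operation that may fail transiently
})
if err != nil && rag.IsRetryExhausted(err) {
	// all retry attempts failed; surface the error
}
_ = data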
func NewVectorProviderFromConfig ¶
func NewVectorProviderFromConfig(cfg *config.VectorStoreConfig) (vector.Provider, error)
NewVectorProviderFromConfig creates a vector provider from configuration.
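A sketch of building one provider and sharing it, per the Integration with Memory note above; vectorCfg is assumed to be a populated *config.VectorStoreConfig:

provider, err := rag.NewVectorProviderFromConfig(vectorCfg)
if err != nil {
	// handle error
}
engine, err := rag.NewSearchEngine(rag.SearchEngineConfig{
	Provider: provider,
	Embedder: embedder,
})
// The same provider value can also back the memory package.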
Types ¶
type APIAuthConfig ¶
type APIAuthConfig struct {
Type string `yaml:"type"` // "bearer", "basic", "apikey", "oauth2"
Token string `yaml:"token"` // Token/API key
User string `yaml:"user"` // Username (for basic auth)
Pass string `yaml:"pass"` // Password (for basic auth)
Header string `yaml:"header"` // Header name (for apikey type)
Extra map[string]string `yaml:"extra"` // Additional auth parameters
}
APIAuthConfig defines authentication for API requests.
Direct port from legacy pkg/context/indexing/api_source.go
type APIEndpointConfig ¶
type APIEndpointConfig struct {
Path string `yaml:"path"` // API path (relative to baseURL)
Method string `yaml:"method"` // HTTP method (default: GET)
Params map[string]string `yaml:"params"` // Query parameters
Headers map[string]string `yaml:"headers"` // Additional headers
Body string `yaml:"body"` // Request body (for POST/PUT)
IDField string `yaml:"id_field"` // JSON field to use as document ID
ContentField string `yaml:"content_field"` // JSON field(s) to use as content (comma-separated or JSONPath)
MetadataFields []string `yaml:"metadata_fields"` // JSON fields to include as metadata
UpdatedField string `yaml:"updated_field"` // JSON field for last modified time
Pagination *PaginationConfig `yaml:"pagination"` // Pagination configuration
Transform string `yaml:"transform"` // Optional JavaScript-like transform function (future)
}
APIEndpointConfig defines an API endpoint to index.
Direct port from legacy pkg/context/indexing/api_source.go
type APISource ¶
type APISource struct {
// contains filtered or unexported fields
}
APISource implements DataSource for REST API endpoints.
Direct port from legacy pkg/context/indexing/api_source.go
func NewAPISource ¶
func NewAPISource(baseURL string, endpoints []APIEndpointConfig, auth *APIAuthConfig) *APISource
NewAPISource creates a new REST API data source.
Direct port from legacy pkg/context/indexing/api_source.go
func (*APISource) DiscoverDocuments ¶
func (a *APISource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
DiscoverDocuments returns channels of discovered documents and errors.
Direct port from legacy pkg/context/indexing/api_source.go
func (*APISource) GetLastModified ¶
func (a *APISource) GetLastModified(ctx context.Context, id string) (time.Time, error)
GetLastModified returns the last modification time for a document.
func (*APISource) ReadDocument ¶
func (a *APISource) ReadDocument(ctx context.Context, id string) (*Document, error)
ReadDocument retrieves a specific document by its ID.
func (*APISource) SupportsIncrementalIndexing ¶
func (a *APISource) SupportsIncrementalIndexing() bool
SupportsIncrementalIndexing returns true if UpdatedField is configured.
type BinaryExtractor ¶
type BinaryExtractor struct {
// contains filtered or unexported fields
}
BinaryExtractor handles binary files like PDF, DOCX, XLSX using native parsers.
Direct port from legacy pkg/context/extraction/binary_extractor.go
func NewBinaryExtractor ¶
func NewBinaryExtractor(nativeParsers NativeParser) *BinaryExtractor
NewBinaryExtractor creates a new binary extractor.
func (*BinaryExtractor) CanExtract ¶
func (be *BinaryExtractor) CanExtract(path string, mimeType string) bool
CanExtract checks if this extractor can handle the file.
func (*BinaryExtractor) Extract ¶
func (be *BinaryExtractor) Extract(ctx context.Context, path string, fileSize int64) (*ExtractedContent, error)
Extract uses native parsers to extract content from binary files.
func (*BinaryExtractor) Name ¶
func (be *BinaryExtractor) Name() string
Name returns the extractor name.
func (*BinaryExtractor) Priority ¶
func (be *BinaryExtractor) Priority() int
Priority returns medium priority (5).
type BlobSource ¶ added in v1.21.0
type BlobSource struct {
// contains filtered or unexported fields
}
BlobSource implements DataSource for blob storage (local files, S3, GCS, Azure Blob, etc.). Uses gocloud.dev/blob to provide unified access to multiple storage backends.
Replaces and extends DirectorySource to support both local and cloud storage:

- file:///path/to/dir - Local filesystem
- s3://bucket/prefix?region=us-east-1 - AWS S3
- gs://bucket/prefix - Google Cloud Storage
- azblob://container/prefix - Azure Blob Storage
func NewBlobSource ¶ added in v1.21.0
func NewBlobSource(ctx context.Context, cfg BlobSourceConfig) (*BlobSource, error)
NewBlobSource creates a new blob storage data source.
URL examples:
- s3://my-bucket/docs?region=us-east-1
- gs://my-bucket/docs
- azblob://my-container/docs
Environment variables for credentials:
- AWS: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
- GCS: GOOGLE_APPLICATION_CREDENTIALS
- Azure: AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_KEY
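For example, a sketch of an S3-backed source built from defaults; the bucket, prefix, and size limit are illustrative:

cfg := rag.DefaultBlobSourceConfig("s3://my-bucket/docs?region=us-east-1")
cfg.Prefix = "manuals/"    // optional prefix filter
cfg.MaxFileSize = 10 << 20 // 10 MiB
src, err := rag.NewBlobSource(ctx, cfg)
if err != nil {
	// handle error
}
defer src.Close()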
func (*BlobSource) Close ¶ added in v1.21.0
func (bs *BlobSource) Close() error
Close releases any resources held by the data source.
func (*BlobSource) DiscoverDocuments ¶ added in v1.21.0
func (bs *BlobSource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
DiscoverDocuments returns channels of discovered documents and errors. Documents are discovered asynchronously and sent through the channel.
func (*BlobSource) GetFilter ¶ added in v1.21.0
func (bs *BlobSource) GetFilter() FileFilter
GetFilter returns the file filter.
func (*BlobSource) GetLastModified ¶ added in v1.21.0
func (bs *BlobSource) GetLastModified(ctx context.Context, id string) (time.Time, error)
GetLastModified returns the last modification time for a document.
func (*BlobSource) GetPrefix ¶ added in v1.21.0
func (bs *BlobSource) GetPrefix() string
GetPrefix returns the blob prefix filter.
func (*BlobSource) GetURL ¶ added in v1.21.0
func (bs *BlobSource) GetURL() string
GetURL returns the blob storage URL.
func (*BlobSource) ReadDocument ¶ added in v1.21.0
func (bs *BlobSource) ReadDocument(ctx context.Context, id string) (*Document, error)
ReadDocument retrieves a specific document by its ID. ID format: blob:<storage_type>:<key>
func (*BlobSource) SupportsIncrementalIndexing ¶ added in v1.21.0
func (bs *BlobSource) SupportsIncrementalIndexing() bool
SupportsIncrementalIndexing returns true as blob sources support incremental indexing.
func (*BlobSource) Type ¶ added in v1.21.0
func (bs *BlobSource) Type() string
Type returns the data source type.
type BlobSourceConfig ¶ added in v1.21.0
type BlobSourceConfig struct {
// URL is the blob storage URL (file://, s3://, gs://, azblob://)
URL string
// Prefix filters to blobs with this prefix (optional)
Prefix string
// Include patterns for files (same as DirectorySource)
Include []string
// Exclude patterns for files (same as DirectorySource)
Exclude []string
// MaxFileSize limits file size in bytes
MaxFileSize int64
}
BlobSourceConfig configures a blob storage source.
func DefaultBlobSourceConfig ¶ added in v1.21.0
func DefaultBlobSourceConfig(url string) BlobSourceConfig
DefaultBlobSourceConfig returns default configuration for blob source.
type Chunk ¶
type Chunk struct {
// Content is the actual text content of this chunk.
Content string `json:"content"`
// Index is the chunk's position within the document (0-based).
Index int `json:"index"`
// Total is the total number of chunks for this document.
Total int `json:"total"`
// StartLine is the starting line number in the source document (1-based).
StartLine int `json:"start_line"`
// EndLine is the ending line number in the source document (1-based).
EndLine int `json:"end_line"`
// StartByte is the byte offset where this chunk begins (optional).
StartByte int `json:"start_byte,omitempty"`
// EndByte is the byte offset where this chunk ends (optional).
EndByte int `json:"end_byte,omitempty"`
// Context provides semantic context for the chunk (function name, type, etc.).
Context *ChunkContext `json:"context,omitempty"`
// Metadata contains additional chunk-specific information.
Metadata map[string]any `json:"metadata,omitempty"`
}
Chunk represents a piece of content with position and context information.
Chunks are the fundamental unit of retrieval in RAG systems. Each chunk:
- Contains a portion of the original document
- Tracks its position within the source
- Preserves semantic context for better retrieval
Derived from legacy pkg/context/chunking/chunker.go:Chunk
type ChunkContext ¶
type ChunkContext struct {
// FunctionName is the containing function/method name (for code).
FunctionName string `json:"function_name,omitempty"`
// TypeName is the containing type/class name (for code).
TypeName string `json:"type_name,omitempty"`
// FilePath is the source file path.
FilePath string `json:"file_path,omitempty"`
// Language is the detected programming language (for code).
Language string `json:"language,omitempty"`
// Section is the document section name (for prose documents).
Section string `json:"section,omitempty"`
// ParentID links to a parent chunk (for hierarchical retrieval).
ParentID string `json:"parent_id,omitempty"`
}
ChunkContext provides semantic context for a chunk.
This is especially useful for code files where understanding the function or type a chunk belongs to improves retrieval quality.
type Chunker ¶
type Chunker interface {
// Chunk splits content into pieces.
//
// The content is split according to the chunker's strategy.
// Each chunk includes position information (line numbers, byte offsets)
// for source mapping.
//
// Parameters:
// - content: the text to split
// - ctx: optional context (e.g., from metadata extraction)
//
// Returns chunks ordered by position in the original content.
Chunk(content string, ctx *ChunkContext) ([]Chunk, error)
// Strategy returns the chunker strategy name.
Strategy() ChunkerStrategy
// Config returns the chunker configuration.
Config() ChunkerConfig
}
Chunker splits content into smaller pieces for indexing.
Chunking is critical for RAG quality:
- Too small: loses context, retrieves fragments
- Too large: wastes tokens, dilutes relevance
- Good chunking: preserves semantic units, enables precise retrieval
Derived from legacy pkg/context/chunking/chunker.go:Chunker
func NewChunker ¶
func NewChunker(cfg ChunkerConfig) (Chunker, error)
NewChunker creates a chunker from configuration.
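A short sketch of configuring an overlapping chunker and splitting a string; documentText is assumed, and the nil *ChunkContext relies on the context parameter being optional:

chunker, err := rag.NewChunker(rag.ChunkerConfig{
	Strategy: rag.ChunkerOverlapping,
	Size:     1000,
	Overlap:  200,
})
if err != nil {
	// handle error
}
chunks, err := chunker.Chunk(documentText, nil) // ctx is optional
if err != nil {
	// handle error
}
for _, c := range chunks {
	fmt.Printf("chunk %d/%d (lines %d-%d)\n", c.Index+1, c.Total, c.StartLine, c.EndLine)
}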
func NewChunkerFromConfig ¶
func NewChunkerFromConfig(cfg *config.ChunkingConfig) (Chunker, error)
NewChunkerFromConfig creates a chunker from configuration.
type ChunkerConfig ¶
type ChunkerConfig struct {
// Strategy is the chunking strategy.
// Values: "simple", "overlapping", "semantic"
// Default: "simple"
Strategy ChunkerStrategy `yaml:"strategy,omitempty"`
// Size is the target chunk size in characters.
// Default: 1000
Size int `yaml:"size,omitempty"`
// Overlap is the overlap size in characters (for overlapping strategy).
// Default: 200
Overlap int `yaml:"overlap,omitempty"`
// MinSize is the minimum chunk size (chunks smaller than this are merged).
// Default: 100
MinSize int `yaml:"min_size,omitempty"`
// MaxSize is the maximum chunk size (hard limit).
// Default: 2000
MaxSize int `yaml:"max_size,omitempty"`
// Separators are the preferred split points for semantic chunking.
// Default: ["\n\n", "\n", ". ", " "]
Separators []string `yaml:"separators,omitempty"`
// PreserveWords avoids splitting in the middle of words.
// Default: true
PreserveWords bool `yaml:"preserve_words,omitempty"`
}
ChunkerConfig configures chunking behavior.
func DefaultChunkerConfig ¶
func DefaultChunkerConfig() ChunkerConfig
DefaultChunkerConfig returns sensible defaults.
func (*ChunkerConfig) SetDefaults ¶
func (c *ChunkerConfig) SetDefaults()
SetDefaults applies default values.
func (*ChunkerConfig) Validate ¶
func (c *ChunkerConfig) Validate() error
Validate checks the configuration for errors.
type ChunkerStrategy ¶
type ChunkerStrategy string
ChunkerStrategy identifies a chunking strategy.
const (
	// ChunkerSimple splits content by fixed character count.
	// Fast but may split mid-sentence/word.
	ChunkerSimple ChunkerStrategy = "simple"

	// ChunkerOverlapping splits with overlap between chunks.
	// Better for retrieval as context is preserved at boundaries.
	ChunkerOverlapping ChunkerStrategy = "overlapping"

	// ChunkerSemantic splits at natural boundaries (paragraphs, sections).
	// Best quality but more complex and slower.
	ChunkerSemantic ChunkerStrategy = "semantic"
)
type ChunkingError ¶
type ChunkingError struct {
Strategy string // Chunking strategy
DocumentID string // Document ID
Message string // Error message
Err error // Underlying error
}
ChunkingError represents an error during document chunking.
func NewChunkingError ¶
func NewChunkingError(strategy, documentID, message string, err error) *ChunkingError
NewChunkingError creates a new ChunkingError.
func (*ChunkingError) Error ¶
func (e *ChunkingError) Error() string
Error implements the error interface.
func (*ChunkingError) Unwrap ¶
func (e *ChunkingError) Unwrap() error
Unwrap returns the underlying error.
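Because ChunkingError implements Error and Unwrap, it composes with the standard errors package, for example:

var chunkErr *rag.ChunkingError
if errors.As(err, &chunkErr) {
	log.Printf("chunking %q with strategy %s failed: %v",
		chunkErr.DocumentID, chunkErr.Strategy, chunkErr.Err)
}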
type CodeMetadata ¶
type CodeMetadata struct {
Functions []FunctionInfo `json:"functions,omitempty"`
Types []TypeInfo `json:"types,omitempty"`
Imports []string `json:"imports,omitempty"`
Symbols map[string]interface{} `json:"symbols,omitempty"`
Custom map[string]interface{} `json:"custom,omitempty"`
}
CodeMetadata contains extracted code structure information.
Direct port from legacy pkg/context/metadata/extractor.go
type CollectionSource ¶
type CollectionSource struct {
// contains filtered or unexported fields
}
CollectionSource implements DataSource for collection-only stores. It is a no-op source that indexes nothing; it is used when a document store points to an existing collection that is already populated.
Direct port from legacy pkg/context/indexing/collection_source.go
func NewCollectionSource ¶
func NewCollectionSource(collectionName string) *CollectionSource
NewCollectionSource creates a new collection-only data source.
func (*CollectionSource) Close ¶
func (cs *CollectionSource) Close() error
Close closes the collection source.
func (*CollectionSource) CollectionName ¶
func (cs *CollectionSource) CollectionName() string
CollectionName returns the collection name.
func (*CollectionSource) DiscoverDocuments ¶
func (cs *CollectionSource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
DiscoverDocuments returns empty channels - no documents to index.
func (*CollectionSource) GetLastModified ¶
func (cs *CollectionSource) GetLastModified(ctx context.Context, id string) (time.Time, error)
GetLastModified returns the zero time; not supported for collection sources.
func (*CollectionSource) ReadDocument ¶
func (cs *CollectionSource) ReadDocument(ctx context.Context, id string) (*Document, error)
ReadDocument returns an error; not supported for collection sources.
func (*CollectionSource) SupportsIncrementalIndexing ¶
func (cs *CollectionSource) SupportsIncrementalIndexing() bool
SupportsIncrementalIndexing returns false.
func (*CollectionSource) Type ¶
func (cs *CollectionSource) Type() string
Type returns the data source type.
type ContentExtractor ¶
type ContentExtractor interface {
// Name returns the extractor name for logging/debugging.
Name() string
// CanExtract determines if this extractor can handle the given file.
CanExtract(path string, mimeType string) bool
// Extract extracts content from the file.
Extract(ctx context.Context, path string, fileSize int64) (*ExtractedContent, error)
// Priority returns the priority (higher = preferred when multiple extractors match).
Priority() int
}
ContentExtractor defines the interface for extracting content from files.
Direct port from legacy pkg/context/extraction/extractor.go
type DBPoolAdapter ¶
type DBPoolAdapter struct {
// contains filtered or unexported fields
}
DBPoolAdapter wraps config.DBPool to provide sql.DB connections.
func NewDBPoolAdapter ¶
func NewDBPoolAdapter(pool *config.DBPool, databaseDSN string) *DBPoolAdapter
NewDBPoolAdapter creates an adapter for the DBPool.
type DataSource ¶
type DataSource interface {
// Type returns the type of data source (e.g., "directory", "sql", "api", "s3")
Type() string
// DiscoverDocuments returns a channel of discovered documents and a channel of errors.
// Documents are discovered asynchronously and sent through the channel.
// For file sources, content should be read from files.
// For SQL/API sources, content should already be populated.
DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
// ReadDocument retrieves a specific document by its ID.
// The ID format depends on the source type (file path, SQL row ID, API endpoint, etc.)
ReadDocument(ctx context.Context, id string) (*Document, error)
// SupportsIncrementalIndexing indicates if this source supports incremental updates
// based on modification timestamps or change tracking.
SupportsIncrementalIndexing() bool
// GetLastModified returns the last modification time for a document, if available.
// Returns zero time if not supported or document doesn't exist.
GetLastModified(ctx context.Context, id string) (time.Time, error)
// Close releases any resources held by the data source.
Close() error
}
DataSource represents a generic source of documents to be indexed. It abstracts over filesystem, SQL databases, REST APIs, and cloud storage.
Direct port from legacy pkg/context/indexing/data_source.go
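A sketch of draining both channels from any DataSource; it assumes the source closes them when discovery finishes, which the interface documentation implies but does not state explicitly:

docs, errs := src.DiscoverDocuments(ctx)
for docs != nil || errs != nil {
	select {
	case doc, ok := <-docs:
		if !ok {
			docs = nil // channel drained
			continue
		}
		_ = doc // process the document
	case err, ok := <-errs:
		if !ok {
			errs = nil // channel drained
			continue
		}
		log.Printf("discovery error: %v", err)
	}
}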
func NewBlobSourceFromConfig ¶ added in v1.21.0
func NewBlobSourceFromConfig(ctx context.Context, cfg BlobSourceConfig) (DataSource, error)
NewBlobSourceFromConfig creates a blob source from BlobSourceConfig.
func NewDataSourceFromConfig ¶
func NewDataSourceFromConfig(cfg *config.DocumentSourceConfig, deps *FactoryDeps) (DataSource, error)
NewDataSourceFromConfig creates a data source from configuration.
func NewDirectorySourceFromConfig ¶
func NewDirectorySourceFromConfig(cfg DirectorySourceConfig) (DataSource, error)
NewDirectorySourceFromConfig creates a directory source from config.
type DirectorySource ¶
type DirectorySource struct {
// contains filtered or unexported fields
}
DirectorySource implements DataSource for local filesystem directories.
Direct port from legacy pkg/context/indexing/directory_source.go
func NewDirectorySource ¶
func NewDirectorySource(basePath string, filter FileFilter, maxFileSize int64) *DirectorySource
NewDirectorySource creates a new directory-based data source.
Direct port from legacy pkg/context/indexing/directory_source.go
func (*DirectorySource) Close ¶
func (ds *DirectorySource) Close() error
Close releases any resources held by the data source.
func (*DirectorySource) DiscoverDocuments ¶
func (ds *DirectorySource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
DiscoverDocuments returns channels of discovered documents and errors. Documents are discovered asynchronously and sent through the channel.
Direct port from legacy pkg/context/indexing/directory_source.go
func (*DirectorySource) GetBasePath ¶
func (ds *DirectorySource) GetBasePath() string
GetBasePath returns the base directory path (helper method).
func (*DirectorySource) GetFilter ¶
func (ds *DirectorySource) GetFilter() FileFilter
GetFilter returns the file filter (helper method).
func (*DirectorySource) GetLastModified ¶
func (ds *DirectorySource) GetLastModified(ctx context.Context, id string) (time.Time, error)
GetLastModified returns the last modification time for a document.
func (*DirectorySource) ReadDocument ¶
func (ds *DirectorySource) ReadDocument(ctx context.Context, id string) (*Document, error)
ReadDocument retrieves a specific document by its ID (file path).
Direct port from legacy pkg/context/indexing/directory_source.go
func (*DirectorySource) SupportsIncrementalIndexing ¶
func (ds *DirectorySource) SupportsIncrementalIndexing() bool
SupportsIncrementalIndexing returns true as directory sources support incremental indexing.
func (*DirectorySource) Type ¶
func (ds *DirectorySource) Type() string
Type returns the data source type.
type DirectorySourceConfig ¶
type DirectorySourceConfig struct {
Path string
Include []string
Exclude []string
MaxFileSize int64 // Max file size in bytes to process (0 for no limit)
}
DirectorySourceConfig configures a directory data source.
func DefaultDirectorySourceConfig ¶
func DefaultDirectorySourceConfig(path string) DirectorySourceConfig
DefaultDirectorySourceConfig returns sensible defaults for directory source. Includes both text-based source code files and binary document formats that can be parsed by native parsers (PDF, DOCX, XLSX).
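A sketch of a directory source built from defaults; the exclude pattern syntax is assumed to be glob-like:

cfg := rag.DefaultDirectorySourceConfig("./docs")
cfg.Exclude = append(cfg.Exclude, "**/vendor/**")
src, err := rag.NewDirectorySourceFromConfig(cfg)
if err != nil {
	// handle error
}
defer src.Close()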
type Document ¶
type Document struct {
// ID is the unique identifier for this document.
ID string `json:"id"`
// Content is the text content to be indexed.
Content string `json:"content"`
// Title is the document title (optional).
Title string `json:"title,omitempty"`
// SourcePath is the path to the source file (for file-based documents).
SourcePath string `json:"source_path,omitempty"`
// MimeType is the content type (e.g., "text/plain", "text/markdown").
MimeType string `json:"mime_type,omitempty"`
// Size is the content size in bytes.
Size int64 `json:"size"`
// Metadata contains additional document information.
Metadata map[string]any `json:"metadata,omitempty"`
}
Document represents a document to be indexed.
Documents go through the following pipeline:
- Content extraction (if binary)
- Chunking (split into searchable pieces)
- Embedding (convert to vectors)
- Indexing (store in vector database)
type DocumentEvent ¶
type DocumentEvent struct {
Type DocumentEventType
Document Document
Error error
}
DocumentEvent represents a change in a document.
type DocumentEventType ¶
type DocumentEventType string
DocumentEventType indicates the type of change.
const (
	DocumentEventCreate DocumentEventType = "create"
	DocumentEventUpdate DocumentEventType = "update"
	DocumentEventDelete DocumentEventType = "delete"
	DocumentEventError  DocumentEventType = "error"
)
type DocumentStore ¶
type DocumentStore struct {
// contains filtered or unexported fields
}
DocumentStore manages document indexing and search.
It combines:
- DataSource: Where documents come from
- ContentExtractor: How to extract text from documents
- SearchEngine: How to index and search
- File watching: Automatic re-indexing on changes
- Concurrent indexing with configurable worker pool
- Retry logic for transient failures
- Checkpoint/resume for interrupted indexing
- Progress tracking with ETA
Direct port from legacy pkg/context/document_store.go
func NewDocumentStore ¶
func NewDocumentStore(cfg DocumentStoreConfig) (*DocumentStore, error)
NewDocumentStore creates a new document store.
func NewDocumentStoreFromConfig ¶
func NewDocumentStoreFromConfig(
	name string,
	storeCfg *config.DocumentStoreConfig,
	deps *FactoryDeps,
) (*DocumentStore, error)
NewDocumentStoreFromConfig creates a document store from configuration.
func (*DocumentStore) Clear ¶
func (s *DocumentStore) Clear(ctx context.Context) error
Clear removes all indexed documents.
func (*DocumentStore) Close ¶
func (s *DocumentStore) Close() error
Close stops watching and releases resources.
func (*DocumentStore) Collection ¶
func (s *DocumentStore) Collection() string
Collection returns the collection name.
func (*DocumentStore) Config ¶
func (s *DocumentStore) Config() DocumentStoreConfig
Config returns the store configuration.
func (*DocumentStore) GetDocument ¶
func (s *DocumentStore) GetDocument(ctx context.Context, id string) (*SearchResult, error)
GetDocument retrieves a specific document by ID.
Direct port from legacy pkg/context/document_store.go
func (*DocumentStore) GetSearchEngine ¶
func (s *DocumentStore) GetSearchEngine() *SearchEngine
GetSearchEngine returns the underlying search engine.
Direct port from legacy pkg/context/document_store.go
func (*DocumentStore) HealthCheck ¶
func (s *DocumentStore) HealthCheck(ctx context.Context) HealthCheck
HealthCheck checks the health of the DocumentStore.
func (*DocumentStore) Index ¶
func (s *DocumentStore) Index(ctx context.Context) error
Index indexes all documents from the source with concurrent processing.
Uses the channel-based DiscoverDocuments from the legacy architecture, with a worker pool for concurrent indexing (like the legacy indexingSemaphore). Supports checkpoint/resume for interrupted indexing.
Direct port from legacy pkg/context/document_store_indexing.go
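Putting the pieces together, a sketch of indexing a source through a store, with src and engine constructed as in the earlier examples:

store, err := rag.NewDocumentStore(rag.DocumentStoreConfig{
	Name:         "docs",
	Source:       src,
	SearchEngine: engine,
	Watch:        true,
})
if err != nil {
	// handle error
}
defer store.Close()

if err := store.Index(ctx); err != nil {
	// handle error
}
fmt.Printf("%+v\n", store.Stats())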
func (*DocumentStore) Metrics ¶
func (s *DocumentStore) Metrics() IndexMetricsSnapshot
Metrics returns detailed indexing metrics.
func (*DocumentStore) RefreshDocument ¶
func (s *DocumentStore) RefreshDocument(ctx context.Context, docID string) error
RefreshDocument re-indexes a single document by path.
Direct port from legacy pkg/context/document_store.go
func (*DocumentStore) RegisterExtractor ¶
func (s *DocumentStore) RegisterExtractor(e ContentExtractor)
RegisterExtractor adds a custom content extractor.
func (*DocumentStore) Search ¶
func (s *DocumentStore) Search(ctx context.Context, req SearchRequest) (*SearchResponse, error)
Search searches for documents.
func (*DocumentStore) SearchWithFilter ¶
func (s *DocumentStore) SearchWithFilter(ctx context.Context, query string, topK int, filter map[string]any) (*SearchResponse, error)
SearchWithFilter searches with metadata filtering.
func (*DocumentStore) StartWatching ¶
func (s *DocumentStore) StartWatching(ctx context.Context) error
StartWatching starts watching for document changes.
Direct port from legacy pkg/context/document_store.go
func (*DocumentStore) Stats ¶
func (s *DocumentStore) Stats() DocumentStoreStats
Stats returns indexing statistics.
func (*DocumentStore) StopWatching ¶
func (s *DocumentStore) StopWatching()
StopWatching stops watching for changes.
type DocumentStoreConfig ¶
type DocumentStoreConfig struct {
// Name identifies this store.
Name string
// Description describes the store (used by SearchTool).
Description string
// Source provides documents.
Source DataSource
// SearchEngine for indexing and search.
SearchEngine *SearchEngine
// Chunker for splitting documents (optional, defaults to engine's chunker).
Chunker Chunker
// Collection name (optional, defaults to store name).
Collection string
// SourcePath is the base path for checkpoints (auto-detected from directory source).
SourcePath string
// Watch enables file watching for automatic re-indexing.
Watch bool
// IncrementalIndexing only re-indexes changed documents.
IncrementalIndexing bool
// EnableCheckpoints enables resume capability for interrupted indexing.
// Checkpoints are saved to .hector/checkpoints/ in the source path.
// Default: true for directory sources
EnableCheckpoints bool
// EnableProgress enables progress display during indexing.
// Default: true
EnableProgress bool
// Search configuration for advanced features.
Search *SearchOptions
// MaxConcurrentIndexing limits parallel document processing (default: NumCPU).
// Set to 1 for sequential indexing (legacy behavior).
MaxConcurrentIndexing int
// RetryConfig for transient failure handling (optional).
RetryConfig *RetryConfig
}
DocumentStoreConfig configures a document store.
type DocumentStoreError ¶
type DocumentStoreError struct {
StoreName string // Name of the document store
Operation string // Operation that failed
Message string // Error message
FilePath string // File path if applicable
Err error // Underlying error
Timestamp time.Time // When the error occurred
}
DocumentStoreError represents an error in document store operations.
Inspired by legacy pkg/context/document_store.go error handling
func NewDocumentStoreError ¶
func NewDocumentStoreError(storeName, operation, message, filePath string, err error) *DocumentStoreError
NewDocumentStoreError creates a new DocumentStoreError.
func (*DocumentStoreError) Error ¶
func (e *DocumentStoreError) Error() string
Error implements the error interface.
func (*DocumentStoreError) Unwrap ¶
func (e *DocumentStoreError) Unwrap() error
Unwrap returns the underlying error.
type DocumentStoreStats ¶
type DocumentStoreStats struct {
Name string `json:"name"`
Collection string `json:"collection"`
IndexedCount int `json:"indexed_count"`
WatchEnabled bool `json:"watch_enabled"`
SourceType string `json:"source_type"`
TotalDocs int64 `json:"total_docs"`
SkippedDocs int64 `json:"skipped_docs"`
ErrorDocs int64 `json:"error_docs"`
DocsPerSecond float64 `json:"docs_per_second"`
SearchCount int64 `json:"search_count"`
}
DocumentStoreStats contains store statistics.
type ExtractedContent ¶
type ExtractedContent struct {
Content string // The extracted text content
Title string // Document title (if available)
Author string // Document author (if available)
Metadata map[string]string // Additional metadata
ProcessingTimeMs int64 // Time taken to extract
ExtractorName string // Name of extractor used
}
ExtractedContent represents extracted file content with metadata.
Direct port from legacy pkg/context/extraction/extractor.go
type ExtractionError ¶
type ExtractionError struct {
Extractor string // Extractor name
FilePath string // File path
Message string // Error message
Err error // Underlying error
}
ExtractionError represents an error during content extraction.
func NewExtractionError ¶
func NewExtractionError(extractor, filePath, message string, err error) *ExtractionError
NewExtractionError creates a new ExtractionError.
func (*ExtractionError) Error ¶
func (e *ExtractionError) Error() string
Error implements the error interface.
func (*ExtractionError) Unwrap ¶
func (e *ExtractionError) Unwrap() error
Unwrap returns the underlying error.
type ExtractorRegistry ¶
type ExtractorRegistry struct {
// contains filtered or unexported fields
}
ExtractorRegistry manages multiple content extractors.
Direct port from legacy pkg/context/extraction/extractor.go
func NewExtractorRegistry ¶
func NewExtractorRegistry() *ExtractorRegistry
NewExtractorRegistry creates a new extractor registry with default extractors. Registers:
- BinaryExtractor (priority 5): PDF, DOCX, XLSX via native parsers
- TextExtractor (priority 1): Plain text files
func (*ExtractorRegistry) Extract ¶
func (r *ExtractorRegistry) Extract(ctx context.Context, doc Document) (*ExtractedContent, error)
Extract tries to extract content using the best available extractor. Adapts the document-based interface for store.go compatibility.
func (*ExtractorRegistry) ExtractContent ¶
func (r *ExtractorRegistry) ExtractContent(ctx context.Context, path string, mimeType string, fileSize int64) (*ExtractedContent, error)
ExtractContent tries to extract content using the best available extractor.
func (*ExtractorRegistry) GetExtractors ¶
func (r *ExtractorRegistry) GetExtractors() []ContentExtractor
GetExtractors returns all registered extractors (for debugging).
func (*ExtractorRegistry) HasExtractorForFile ¶
func (r *ExtractorRegistry) HasExtractorForFile(path string, mimeType string) bool
HasExtractorForFile checks if any extractor can handle the given file. This is useful for determining if a file can be indexed before attempting extraction.
func (*ExtractorRegistry) Register ¶
func (r *ExtractorRegistry) Register(extractor ContentExtractor)
Register adds an extractor to the registry.
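A sketch of a custom extractor, a hypothetical CSV passthrough, satisfying ContentExtractor:

type csvExtractor struct{}

func (csvExtractor) Name() string { return "csv" }

func (csvExtractor) CanExtract(path string, mimeType string) bool {
	return strings.HasSuffix(path, ".csv") || mimeType == "text/csv"
}

func (csvExtractor) Extract(ctx context.Context, path string, fileSize int64) (*rag.ExtractedContent, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return &rag.ExtractedContent{Content: string(b), ExtractorName: "csv"}, nil
}

func (csvExtractor) Priority() int { return 3 }

// Usage, on a registry created via NewExtractorRegistry:
registry := rag.NewExtractorRegistry()
registry.Register(csvExtractor{})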
type FactoryDeps ¶
type FactoryDeps struct {
// DBPool provides database connections.
DBPool *config.DBPool
// DatabaseDSN for database connections.
DatabaseDSN string
// VectorProviders maps provider names to instances.
VectorProviders map[string]vector.Provider
// Embedders maps embedder names to instances.
Embedders map[string]embedder.Embedder
// LLMs maps LLM names to instances.
LLMs map[string]model.LLM
// ToolCaller provides access to MCP tools for document parsing.
// Optional - only needed if MCPParsers is configured.
ToolCaller ToolCaller
}
FactoryDeps provides dependencies for creating RAG components.
type FileCheckpoint ¶
type FileCheckpoint struct {
Path string `json:"path"`
Hash string `json:"hash"`
Size int64 `json:"size"`
ModTime time.Time `json:"mod_time"`
Status string `json:"status"` // "indexed", "skipped", "failed"
ProcessedAt time.Time `json:"processed_at"`
}
FileCheckpoint contains information about a processed file.
Direct port from legacy pkg/context/checkpoint.go
type FileFilter ¶
FileFilter determines if a file should be indexed.
Direct port from legacy pkg/context/indexing/data_source.go:FileFilter
type FileWatcher ¶
type FileWatcher struct {
// contains filtered or unexported fields
}
FileWatcher watches a directory for file changes using fsnotify.
Direct port from legacy pkg/context/document_store.go fsnotify watching
func NewFileWatcher ¶
func NewFileWatcher(cfg FileWatcherConfig) (*FileWatcher, error)
NewFileWatcher creates a new file watcher.
func (*FileWatcher) IsWatching ¶
func (fw *FileWatcher) IsWatching() bool
IsWatching returns whether the watcher is active.
func (*FileWatcher) Start ¶
func (fw *FileWatcher) Start(ctx context.Context) (<-chan DocumentEvent, error)
Start begins watching the directory for changes.
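A sketch of consuming watch events; the Filter field is omitted here because the FileFilter definition is not shown in this overview:

fw, err := rag.NewFileWatcher(rag.FileWatcherConfig{
	BasePath:      "./docs",
	DebounceDelay: 200 * time.Millisecond,
})
if err != nil {
	// handle error
}
events, err := fw.Start(ctx)
if err != nil {
	// handle error
}
for ev := range events {
	switch ev.Type {
	case rag.DocumentEventCreate, rag.DocumentEventUpdate:
		// re-index ev.Document
	case rag.DocumentEventDelete:
		// remove ev.Document from the index
	case rag.DocumentEventError:
		log.Printf("watch error: %v", ev.Error)
	}
}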
type FileWatcherConfig ¶
type FileWatcherConfig struct {
BasePath string
Filter FileFilter
DebounceDelay time.Duration // Delay before processing events (default: 100ms)
}
FileWatcherConfig configures the file watcher.
type FunctionInfo ¶
type FunctionInfo struct {
Name string `json:"name"`
Signature string `json:"signature,omitempty"`
StartLine int `json:"start_line"`
EndLine int `json:"end_line"`
Receiver string `json:"receiver,omitempty"` // For methods
IsExported bool `json:"is_exported,omitempty"`
DocComment string `json:"doc_comment,omitempty"`
}
FunctionInfo contains information about a function.
Direct port from legacy pkg/context/metadata/extractor.go
type GoMetadataExtractor ¶
type GoMetadataExtractor struct{}
GoMetadataExtractor extracts metadata from Go source files using AST parsing.
Direct port from legacy pkg/context/metadata/go_extractor.go
func NewGoMetadataExtractor ¶
func NewGoMetadataExtractor() *GoMetadataExtractor
NewGoMetadataExtractor creates a new Go metadata extractor.
func (*GoMetadataExtractor) CanExtract ¶
func (ge *GoMetadataExtractor) CanExtract(language string) bool
CanExtract checks if this extractor can handle the language.
func (*GoMetadataExtractor) Extract ¶
func (ge *GoMetadataExtractor) Extract(content string, filePath string) (*CodeMetadata, error)
Extract parses Go source code and extracts metadata.
func (*GoMetadataExtractor) Name ¶
func (ge *GoMetadataExtractor) Name() string
Name returns the extractor name.
type HealthCheck ¶
type HealthCheck struct {
// Component name.
Component string `json:"component"`
// Status of the component.
Status HealthStatus `json:"status"`
// Message provides details about the status.
Message string `json:"message,omitempty"`
// Latency of the health check.
Latency time.Duration `json:"latency_ms"`
// Timestamp of the check.
Timestamp time.Time `json:"timestamp"`
// Details contains component-specific health information.
Details map[string]any `json:"details,omitempty"`
}
HealthCheck represents the result of a health check.
func (HealthCheck) IsHealthy ¶
func (h HealthCheck) IsHealthy() bool
IsHealthy returns true if the status is healthy.
type HealthChecker ¶
type HealthChecker interface {
HealthCheck(ctx context.Context) HealthCheck
}
HealthChecker is an interface for components that support health checking.
type HealthStatus ¶
type HealthStatus string
HealthStatus represents the health state of a component.
const (
	// HealthStatusHealthy indicates the component is functioning normally.
	HealthStatusHealthy HealthStatus = "healthy"

	// HealthStatusDegraded indicates the component is functioning but with issues.
	HealthStatusDegraded HealthStatus = "degraded"

	// HealthStatusUnhealthy indicates the component is not functioning.
	HealthStatusUnhealthy HealthStatus = "unhealthy"
)
type HyDE ¶
type HyDE struct {
// contains filtered or unexported fields
}
HyDE implements Hypothetical Document Embeddings.
Instead of searching with the query embedding directly, HyDE:
- Uses an LLM to generate a hypothetical document that would answer the query
- Embeds the hypothetical document
- Uses that embedding for search
This can significantly improve retrieval for questions, as the hypothetical document's embedding is closer to actual relevant documents than the query embedding.
Paper: "Precise Zero-Shot Dense Retrieval without Relevance Labels" https://arxiv.org/abs/2212.10496
Derived from legacy pkg/context/hyde.go
func (*HyDE) EnhancedSearch ¶
func (h *HyDE) EnhancedSearch(ctx context.Context, query string) (hypotheticalDoc string, err error)
EnhancedSearch performs HyDE-enhanced search.
This is a convenience method that:
- Generates a hypothetical document
- Returns both the hypothetical doc and the original query
The caller should embed the hypothetical doc instead of the query.
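A sketch of the intended flow; h is an assumed, already-constructed *HyDE, since its constructor is not shown in this overview:

hypo, err := h.EnhancedSearch(ctx, "how do I rotate API keys?")
if err != nil {
	// handle error
}
// Embed hypo rather than the raw query, then search the vector
// provider with that embedding.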
type IndexCheckpoint ¶
type IndexCheckpoint struct {
Version string `json:"version"`
StoreName string `json:"store_name"`
SourcePath string `json:"source_path"`
StartTime time.Time `json:"start_time"`
LastUpdate time.Time `json:"last_update"`
ProcessedFiles map[string]FileCheckpoint `json:"processed_files"`
TotalFiles int `json:"total_files"`
IndexedCount int `json:"indexed_count"`
SkippedCount int `json:"skipped_count"`
FailedCount int `json:"failed_count"`
}
IndexCheckpoint represents a saved indexing checkpoint.
Direct port from legacy pkg/context/checkpoint.go
type IndexCheckpointManager ¶
type IndexCheckpointManager struct {
// contains filtered or unexported fields
}
IndexCheckpointManager manages indexing checkpoints.
Direct port from legacy pkg/context/checkpoint.go
func NewIndexCheckpointManager ¶
func NewIndexCheckpointManager(storeName, sourcePath string, enabled bool) *IndexCheckpointManager
NewIndexCheckpointManager creates a new checkpoint manager.
Direct port from legacy pkg/context/checkpoint.go
func (*IndexCheckpointManager) ClearCheckpoint ¶
func (cm *IndexCheckpointManager) ClearCheckpoint() error
ClearCheckpoint removes the checkpoint file.
func (*IndexCheckpointManager) ForceSave ¶
func (cm *IndexCheckpointManager) ForceSave() error
ForceSave forces a checkpoint save regardless of the save interval.
func (*IndexCheckpointManager) FormatCheckpointInfo ¶
func (cm *IndexCheckpointManager) FormatCheckpointInfo(checkpoint *IndexCheckpoint) string
FormatCheckpointInfo returns a human-readable checkpoint summary.
func (*IndexCheckpointManager) GetProcessedCount ¶
func (cm *IndexCheckpointManager) GetProcessedCount() int
GetProcessedCount returns the number of processed files.
func (*IndexCheckpointManager) IsEnabled ¶
func (cm *IndexCheckpointManager) IsEnabled() bool
IsEnabled returns whether checkpointing is enabled.
func (*IndexCheckpointManager) LoadCheckpoint ¶
func (cm *IndexCheckpointManager) LoadCheckpoint() (*IndexCheckpoint, error)
LoadCheckpoint attempts to load an existing checkpoint.
func (*IndexCheckpointManager) RecordFile ¶
func (cm *IndexCheckpointManager) RecordFile(path string, size int64, modTime time.Time, status string)
RecordFile records a processed file in the checkpoint.
func (*IndexCheckpointManager) SaveCheckpoint ¶
func (cm *IndexCheckpointManager) SaveCheckpoint() error
SaveCheckpoint saves the current checkpoint.
func (*IndexCheckpointManager) SetTotalFiles ¶
func (cm *IndexCheckpointManager) SetTotalFiles(total int)
SetTotalFiles sets the total file count.
func (*IndexCheckpointManager) ShouldProcessFile ¶
func (cm *IndexCheckpointManager) ShouldProcessFile(path string, size int64, modTime time.Time) bool
ShouldProcessFile checks if a file should be processed (not in checkpoint or changed).
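A sketch of the resume flow; files is an assumed slice whose elements carry Path, Size, and ModTime:

cm := rag.NewIndexCheckpointManager("docs", "./docs", true)
if cp, err := cm.LoadCheckpoint(); err == nil && cp != nil {
	fmt.Println(cm.FormatCheckpointInfo(cp))
}
for _, f := range files {
	if !cm.ShouldProcessFile(f.Path, f.Size, f.ModTime) {
		continue // unchanged since the last run
	}
	// ... index the file ...
	cm.RecordFile(f.Path, f.Size, f.ModTime, "indexed")
}
_ = cm.ForceSave()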
type IndexError ¶
type IndexError struct {
StoreName string // Document store name
DocumentID string // Document ID
Operation string // Operation (e.g., "embed", "upsert", "delete")
Message string // Error message
Err error // Underlying error
}
IndexError represents an error during indexing operations.
func NewIndexError ¶
func NewIndexError(storeName, documentID, operation, message string, err error) *IndexError
NewIndexError creates a new IndexError.
func (*IndexError) Error ¶
func (e *IndexError) Error() string
Error implements the error interface.
func (*IndexError) Unwrap ¶
func (e *IndexError) Unwrap() error
Unwrap returns the underlying error.
type IndexMetrics ¶
type IndexMetrics struct {
// contains filtered or unexported fields
}
IndexMetrics tracks document store indexing metrics.
Thread-safe for concurrent access during indexing.
func NewIndexMetrics ¶
func NewIndexMetrics(storeName string) *IndexMetrics
NewIndexMetrics creates a new metrics tracker.
func (*IndexMetrics) IncrementErrors ¶
func (m *IndexMetrics) IncrementErrors()
IncrementErrors increments error count.
func (*IndexMetrics) IncrementIndexed ¶
func (m *IndexMetrics) IncrementIndexed()
IncrementIndexed increments indexed document count.
func (*IndexMetrics) IncrementSkipped ¶
func (m *IndexMetrics) IncrementSkipped()
IncrementSkipped increments skipped document count.
func (*IndexMetrics) IncrementTotal ¶
func (m *IndexMetrics) IncrementTotal()
IncrementTotal increments total document count.
func (*IndexMetrics) RecordSearch ¶
func (m *IndexMetrics) RecordSearch(latency time.Duration)
RecordSearch records a search operation with latency.
func (*IndexMetrics) SetEndTime ¶
func (m *IndexMetrics) SetEndTime(t time.Time)
SetEndTime sets the indexing end time.
func (*IndexMetrics) SetStartTime ¶
func (m *IndexMetrics) SetStartTime(t time.Time)
SetStartTime sets the indexing start time.
func (*IndexMetrics) Snapshot ¶
func (m *IndexMetrics) Snapshot() IndexMetricsSnapshot
Snapshot returns a point-in-time copy of all metrics.
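A minimal sketch of the metrics lifecycle:

m := rag.NewIndexMetrics("docs")
m.SetStartTime(time.Now())
m.IncrementTotal()
m.IncrementIndexed()
m.SetEndTime(time.Now())

snap := m.Snapshot()
fmt.Printf("indexed %d/%d docs at %.1f docs/s\n",
	snap.IndexedDocs, snap.TotalDocs, snap.DocsPerSecond)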
type IndexMetricsSnapshot ¶
type IndexMetricsSnapshot struct {
StoreName string `json:"store_name"`
TotalDocs int64 `json:"total_docs"`
IndexedDocs int64 `json:"indexed_docs"`
SkippedDocs int64 `json:"skipped_docs"`
ErrorDocs int64 `json:"error_docs"`
DocsPerSecond float64 `json:"docs_per_second"`
StartTime time.Time `json:"start_time,omitempty"`
EndTime time.Time `json:"end_time,omitempty"`
SearchCount int64 `json:"search_count"`
AvgSearchLatency time.Duration `json:"avg_search_latency_ns"`
MaxSearchLatency time.Duration `json:"max_search_latency_ns"`
LastSearchLatency time.Duration `json:"last_search_latency_ns"`
}
IndexMetricsSnapshot is a point-in-time copy of metrics.
type LLMQueryExpander ¶
type LLMQueryExpander struct {
// contains filtered or unexported fields
}
LLMQueryExpander uses an LLM to generate query variations.
Direct port from legacy pkg/context/query_expansion.go
func NewLLMQueryExpander ¶
func NewLLMQueryExpander(llm model.LLM) *LLMQueryExpander
NewLLMQueryExpander creates a new LLM-based query expander.
type MCPExtractor ¶
type MCPExtractor struct {
// contains filtered or unexported fields
}
MCPExtractor handles document parsing via MCP tools. This allows using any MCP service (Docling, etc.) for document parsing.
Direct port from legacy pkg/context/extraction/mcp_extractor.go
func NewMCPExtractor ¶
func NewMCPExtractor(config MCPExtractorConfig) (*MCPExtractor, error)
NewMCPExtractor creates a new MCP-based extractor.
Direct port from legacy pkg/context/extraction/mcp_extractor.go
func (*MCPExtractor) CanExtract ¶
func (e *MCPExtractor) CanExtract(path string, mimeType string) bool
CanExtract checks if this extractor can handle the file.
func (*MCPExtractor) Extract ¶
func (e *MCPExtractor) Extract(ctx context.Context, path string, fileSize int64) (*ExtractedContent, error)
Extract uses MCP tools to extract content from files.
func (*MCPExtractor) Priority ¶
func (e *MCPExtractor) Priority() int
Priority returns the extractor priority.
type MCPExtractorConfig ¶
type MCPExtractorConfig struct {
ToolCaller ToolCaller
ParserToolNames []string // Tool names to try (e.g., ["parse_document", "docling_parse"])
SupportedExts []string // File extensions this extractor handles (empty = all)
Priority int // Priority (higher = preferred)
LocalBasePath string // Local base path of the document store (e.g., "/Users/user/workspace/hector/test-docs")
PathPrefix string // Remote path prefix for containerized MCP services (e.g., "/docs")
}
MCPExtractorConfig configures an MCP extractor.
Direct port from legacy pkg/context/extraction/mcp_extractor.go
type MetadataExtractor ¶
type MetadataExtractor interface {
// Name returns the extractor name
Name() string
// CanExtract determines if this extractor can handle the given language
CanExtract(language string) bool
// Extract extracts metadata from source code
Extract(content string, filePath string) (*CodeMetadata, error)
}
MetadataExtractor defines the interface for extracting metadata from source code.
Direct port from legacy pkg/context/metadata/extractor.go
type MetadataExtractorRegistry ¶
type MetadataExtractorRegistry struct {
// contains filtered or unexported fields
}
MetadataExtractorRegistry manages metadata extractors.
Direct port from legacy pkg/context/metadata/extractor.go
func NewMetadataExtractorRegistry ¶
func NewMetadataExtractorRegistry() *MetadataExtractorRegistry
NewMetadataExtractorRegistry creates a new metadata extractor registry.
func (*MetadataExtractorRegistry) ExtractMetadata ¶
func (r *MetadataExtractorRegistry) ExtractMetadata(language string, content string, filePath string) (*CodeMetadata, error)
ExtractMetadata tries to extract metadata using the appropriate extractor.
func (*MetadataExtractorRegistry) GetExtractors ¶
func (r *MetadataExtractorRegistry) GetExtractors() []MetadataExtractor
GetExtractors returns all registered extractors.
func (*MetadataExtractorRegistry) Register ¶
func (r *MetadataExtractorRegistry) Register(extractor MetadataExtractor)
Register adds a metadata extractor for specific languages.
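A sketch of extracting Go metadata; the Go extractor is registered explicitly because default registrations are not documented here, and srcCode is an assumed string of Go source:

reg := rag.NewMetadataExtractorRegistry()
reg.Register(rag.NewGoMetadataExtractor())

meta, err := reg.ExtractMetadata("go", srcCode, "pkg/rag/search.go")
if err != nil {
	// handle error
}
for _, fn := range meta.Functions {
	fmt.Printf("%s (lines %d-%d)\n", fn.Name, fn.StartLine, fn.EndLine)
}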
type MultiQueryExpander ¶
type MultiQueryExpander struct {
// contains filtered or unexported fields
}
MultiQueryExpander generates multiple query variants for better recall.
Multi-query retrieval improves recall by:
- Generating alternative phrasings of the query
- Searching with each variant
- Combining and deduplicating results
This helps when:
- Queries are ambiguous
- Relevant documents use different terminology
- Users don't know exact terms used in documents
Derived from legacy pkg/context/multi_query.go
func NewMultiQueryExpander ¶
func NewMultiQueryExpander(llm model.LLM, numQueries int) *MultiQueryExpander
NewMultiQueryExpander creates a new multi-query expander.
func (*MultiQueryExpander) ExpandQuery ¶
ExpandQuery generates multiple query variants.
type NativeParseResult ¶
type NativeParseResult struct {
Success bool
Content string
Title string
Author string
Metadata map[string]string
Error string
ProcessingTimeMs int64
}
NativeParseResult represents the result from a native parser.
Direct port from legacy pkg/context/extraction/binary_extractor.go
type NativeParser ¶
type NativeParser interface {
ParseDocument(ctx context.Context, filePath string, fileSize int64) (*NativeParseResult, error)
}
NativeParser interface for parsing binary documents.
Direct port from legacy pkg/context/extraction/binary_extractor.go
type NativeParserRegistry ¶
type NativeParserRegistry struct {
// contains filtered or unexported fields
}
NativeParserRegistry manages native document parsers for PDF, DOCX, XLSX.
Ported from legacy pkg/context/native_parsers.go
func NewNativeParserRegistry ¶
func NewNativeParserRegistry() *NativeParserRegistry
NewNativeParserRegistry creates a new native parser registry with built-in parsers.
func (*NativeParserRegistry) GetSupportedExtensions ¶
func (r *NativeParserRegistry) GetSupportedExtensions() []string
GetSupportedExtensions returns all supported file extensions.
func (*NativeParserRegistry) ParseDocument ¶
func (r *NativeParserRegistry) ParseDocument(ctx context.Context, filePath string, fileSize int64) (*NativeParseResult, error)
ParseDocument finds the appropriate parser and extracts content. Implements NativeParser interface.
type NilChunker ¶
type NilChunker struct{}
NilChunker returns the entire content as a single chunk.
func (NilChunker) Chunk ¶
func (NilChunker) Chunk(content string, ctx *ChunkContext) ([]Chunk, error)
func (NilChunker) Config ¶
func (NilChunker) Config() ChunkerConfig
func (NilChunker) Strategy ¶
func (NilChunker) Strategy() ChunkerStrategy
type NilDataSource ¶
type NilDataSource struct{}
NilDataSource is a no-op data source that returns no documents.
func (NilDataSource) Close ¶
func (NilDataSource) Close() error
func (NilDataSource) DiscoverDocuments ¶
func (NilDataSource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
func (NilDataSource) GetLastModified ¶
func (NilDataSource) GetLastModified(ctx context.Context, id string) (time.Time, error)
func (NilDataSource) ReadDocument ¶
func (NilDataSource) ReadDocument(ctx context.Context, id string) (*Document, error)
func (NilDataSource) SupportsIncrementalIndexing ¶
func (NilDataSource) SupportsIncrementalIndexing() bool
func (NilDataSource) Type ¶
func (NilDataSource) Type() string
type NilMultiQueryExpander ¶
type NilMultiQueryExpander struct{}
NilMultiQueryExpander returns the original query unchanged.
func (NilMultiQueryExpander) ExpandQuery ¶
ExpandQuery returns the original query unchanged.
type NilQueryExpander ¶
type NilQueryExpander struct{}
NilQueryExpander returns the original query unchanged.
type NilReranker ¶
type NilReranker struct{}
NilReranker returns results unchanged.
func (NilReranker) Rerank ¶
func (NilReranker) Rerank(ctx context.Context, query string, results []SearchResult) (*RerankResult, error)
type OverlappingChunker ¶
type OverlappingChunker struct {
// contains filtered or unexported fields
}
OverlappingChunker implements chunking with configurable overlap.
This is a direct port of legacy pkg/context/chunking/overlapping_chunker.go. Overlap helps preserve context at chunk boundaries, improving retrieval quality when relevant information spans two chunks.
Use when:
- Retrieval quality is important
- Content has flowing prose
- You can afford slightly more storage
func NewOverlappingChunker ¶
func NewOverlappingChunker(cfg ChunkerConfig) *OverlappingChunker
NewOverlappingChunker creates a new overlapping chunker.
func (*OverlappingChunker) Chunk ¶
func (c *OverlappingChunker) Chunk(content string, ctx *ChunkContext) ([]Chunk, error)
Chunk splits content into overlapping chunks. Direct port from legacy pkg/context/chunking/overlapping_chunker.go
func (*OverlappingChunker) Config ¶
func (c *OverlappingChunker) Config() ChunkerConfig
func (*OverlappingChunker) Strategy ¶
func (c *OverlappingChunker) Strategy() ChunkerStrategy
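A minimal usage sketch; cfg is a ChunkerConfig prepared earlier, and passing an empty ChunkContext is an assumption about what the chunker requires:
chunker := rag.NewOverlappingChunker(cfg)
chunks, err := chunker.Chunk(content, &rag.ChunkContext{}) // empty context: assumption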
type PaginationConfig ¶
type PaginationConfig struct {
Type string `yaml:"type"` // "offset", "cursor", "page", "link"
PageParam string `yaml:"page_param"` // Query parameter name for page/offset
SizeParam string `yaml:"size_param"` // Query parameter name for page size
MaxPages int `yaml:"max_pages"` // Maximum pages to fetch (0 = unlimited)
PageSize int `yaml:"page_size"` // Items per page
NextField string `yaml:"next_field"` // JSON field containing next page URL/cursor
DataField string `yaml:"data_field"` // JSON field containing array of items (if nested)
}
PaginationConfig defines how to handle paginated API responses.
Direct port from legacy pkg/context/indexing/api_source.go
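For example, a page-numbered API that nests items under a JSON field might be described like this (values are illustrative):
pagination := rag.PaginationConfig{
	Type:      "page",
	PageParam: "page",
	SizeParam: "per_page",
	PageSize:  100,
	MaxPages:  50,
	DataField: "items",
}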
type PatternCache ¶
type PatternCache struct {
// contains filtered or unexported fields
}
PatternCache provides fast pattern matching.
Direct port from legacy pkg/context/indexing/pattern_filter.go
type PatternFilter ¶
type PatternFilter struct {
// contains filtered or unexported fields
}
PatternFilter implements FileFilter using include/exclude patterns.
Direct port from legacy pkg/context/indexing/pattern_filter.go
func NewPatternFilter ¶
func NewPatternFilter(sourcePath string, includePatterns, excludePatterns []string) (*PatternFilter, error)
NewPatternFilter creates a new pattern-based filter with validation.
Direct port from legacy pkg/context/indexing/pattern_filter.go
func (*PatternFilter) ShouldExclude ¶
func (pf *PatternFilter) ShouldExclude(path string) bool
ShouldExclude checks if a file matches exclude patterns.
func (*PatternFilter) ShouldInclude ¶
func (pf *PatternFilter) ShouldInclude(path string) bool
ShouldInclude checks if a file matches include patterns.
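A usage sketch; the doublestar-style glob syntax shown here is an assumption about the supported pattern format:
filter, err := rag.NewPatternFilter("/repo", []string{"**/*.go"}, []string{"vendor/**"})
if err != nil {
	// invalid pattern
}
if filter.ShouldInclude("cmd/main.go") && !filter.ShouldExclude("cmd/main.go") {
	// index the file
}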
type ProgressStats ¶
type ProgressStats struct {
TotalFiles int64
ProcessedFiles int64
IndexedFiles int64
SkippedFiles int64
FailedFiles int64
DeletedFiles int64
CurrentFile string
ElapsedTime time.Duration
}
ProgressStats contains progress statistics.
type ProgressTracker ¶
type ProgressTracker struct {
// contains filtered or unexported fields
}
ProgressTracker tracks indexing progress with real-time statistics.
Direct port from legacy pkg/context/progress_tracker.go
func NewProgressTracker ¶
func NewProgressTracker(enabled bool, verbose bool) *ProgressTracker
NewProgressTracker creates a new progress tracker.
func (*ProgressTracker) GetExtractorStats ¶
func (pt *ProgressTracker) GetExtractorStats() map[string]int64
GetExtractorStats returns extractor usage statistics.
func (*ProgressTracker) GetStats ¶
func (pt *ProgressTracker) GetStats() ProgressStats
GetStats returns current statistics.
func (*ProgressTracker) IncrementDeleted ¶
func (pt *ProgressTracker) IncrementDeleted()
IncrementDeleted increments the deleted files counter.
func (*ProgressTracker) IncrementFailed ¶
func (pt *ProgressTracker) IncrementFailed()
IncrementFailed increments the failed files counter.
func (*ProgressTracker) IncrementIndexed ¶
func (pt *ProgressTracker) IncrementIndexed()
IncrementIndexed increments the indexed files counter.
func (*ProgressTracker) IncrementProcessed ¶
func (pt *ProgressTracker) IncrementProcessed()
IncrementProcessed increments the processed files counter.
func (*ProgressTracker) IncrementSkipped ¶
func (pt *ProgressTracker) IncrementSkipped()
IncrementSkipped increments the skipped files counter.
func (*ProgressTracker) RecordExtractorUsage ¶
func (pt *ProgressTracker) RecordExtractorUsage(extractorName string)
RecordExtractorUsage records which extractor was used for a document.
func (*ProgressTracker) SetCurrentFile ¶
func (pt *ProgressTracker) SetCurrentFile(filename string)
SetCurrentFile sets the currently processing file.
func (*ProgressTracker) SetTotalFiles ¶
func (pt *ProgressTracker) SetTotalFiles(total int64)
SetTotalFiles sets the total number of files to process.
func (*ProgressTracker) Start ¶
func (pt *ProgressTracker) Start()
Start begins the progress display loop.
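A typical indexing loop, sketched (files is a placeholder slice of paths):
pt := rag.NewProgressTracker(true, false)
pt.SetTotalFiles(int64(len(files)))
pt.Start()
for _, f := range files {
	pt.SetCurrentFile(f)
	// ... extract, chunk, and index f ...
	pt.IncrementProcessed()
	pt.IncrementIndexed()
}
stats := pt.GetStats()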
type QueryExpander ¶
type QueryExpander interface {
// Expand generates multiple query variations from the original query.
Expand(ctx context.Context, query string, numVariations int) ([]string, error)
}
QueryExpander expands a single query into multiple query variations.
Direct port from legacy pkg/context/query_expansion.go
type RankingDecision ¶
type RankingDecision struct {
// Index is the original result index.
Index int `json:"index"`
// Relevance is the LLM-assigned relevance score (1-10).
Relevance int `json:"relevance"`
// Reason explains why this ranking was assigned.
Reason string `json:"reason,omitempty"`
}
RankingDecision represents the LLM's ranking for a single result.
type RerankResult ¶
type RerankResult struct {
// Results are the reranked search results.
Results []SearchResult
// Rankings contains the LLM's ranking decisions.
Rankings []RankingDecision
}
RerankResult contains the reranked results and the LLM's ranking decisions.
type Reranker ¶
type Reranker struct {
// contains filtered or unexported fields
}
Reranker re-ranks search results using an LLM.
Reranking improves search quality by:
- Using deeper semantic understanding than vector similarity
- Evaluating actual relevance to the query
- Considering context that embeddings might miss
Trade-offs:
- Adds latency (100-500ms per search)
- Incurs LLM API costs
- Only practical for small result sets (10-20 items)
Derived from legacy pkg/context/reranking/reranker.go
func NewReranker ¶
NewReranker creates a new reranker.
func (*Reranker) Rerank ¶
func (r *Reranker) Rerank(ctx context.Context, query string, results []SearchResult) (*RerankResult, error)
Rerank re-orders results based on LLM assessment.
The process:
- Format results and query for the LLM
- Ask LLM to rank results by relevance
- Parse LLM response and reorder results
- Assign new scores based on ranking position
After reranking:
- Scores are position-based (1st=1.0, 2nd=0.95, etc.)
- Original vector similarity scores are replaced
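A usage sketch; r is an already-constructed *Reranker and coarseResults is a small result set from an initial vector search:
reranked, err := r.Rerank(ctx, "rotate credentials", coarseResults)
if err != nil {
	// fall back to coarseResults, which keep their vector scores
}
for _, d := range reranked.Rankings {
	// d.Index is the original position; d.Relevance is the 1-10 LLM score
	_ = d
}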
type RetryConfig ¶
type RetryConfig struct {
// MaxRetries is the maximum number of retry attempts (default: 3).
MaxRetries int
// BaseDelay is the initial delay between retries (default: 1s).
BaseDelay time.Duration
// MaxDelay is the maximum delay between retries (default: 30s).
MaxDelay time.Duration
// JitterFactor adds randomness to delays (0.0-1.0, default: 0.1).
JitterFactor float64
// RetryableErrors are error substrings that indicate retryable failures.
RetryableErrors []string
}
RetryConfig configures retry behavior.
Reuses patterns from httpclient for consistency.
func DefaultRetryConfig ¶
func DefaultRetryConfig() RetryConfig
DefaultRetryConfig returns sensible defaults for RAG operations.
type RetryError ¶
RetryError represents an error returned after retry attempts are exhausted.
func (*RetryError) Error ¶
func (e *RetryError) Error() string
func (*RetryError) Unwrap ¶
func (e *RetryError) Unwrap() error
type Retryer ¶
type Retryer struct {
// contains filtered or unexported fields
}
Retryer handles retry logic with exponential backoff.
Based on httpclient patterns but generalized for any operation.
func NewRetryer ¶
func NewRetryer(cfg RetryConfig) *Retryer
NewRetryer creates a new retryer with the given config.
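A sketch combining the retryer with the package-level DoWithResult helper; the embedder.Embed call stands in for any flaky operation and its signature is assumed:
retryer := rag.NewRetryer(rag.DefaultRetryConfig())
vec, err := rag.DoWithResult(ctx, retryer, "embed_chunk", func() ([]float32, error) {
	return embedder.Embed(ctx, "chunk text") // assumed embedder method
})
if rag.IsRetryExhausted(err) {
	// every attempt failed; vec holds the zero value
}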
type SQLSource ¶
type SQLSource struct {
// contains filtered or unexported fields
}
SQLSource implements DataSource for SQL databases using database/sql.
Direct port from legacy pkg/context/indexing/sql_source.go
func NewSQLSource ¶
func NewSQLSource(opts SQLSourceOptions) (*SQLSource, error)
NewSQLSource creates a new SQL data source.
Direct port from legacy pkg/context/indexing/sql_source.go
func (*SQLSource) Close ¶
func (s *SQLSource) Close() error
Close closes the underlying database connection. Note: in most cases the connection is managed externally (e.g., by DBPool), so this is a no-op; the caller should manage the lifecycle.
func (*SQLSource) DiscoverDocuments ¶
func (s *SQLSource) DiscoverDocuments(ctx context.Context) (<-chan Document, <-chan error)
DiscoverDocuments returns channels of discovered documents and errors.
Direct port from legacy pkg/context/indexing/sql_source.go
func (*SQLSource) GetLastModified ¶
func (s *SQLSource) GetLastModified(ctx context.Context, id string) (time.Time, error)
GetLastModified returns the last modification time for a document.
func (*SQLSource) ReadDocument ¶
func (s *SQLSource) ReadDocument(ctx context.Context, id string) (*Document, error)
ReadDocument retrieves a specific document by its ID.
Direct port from legacy pkg/context/indexing/sql_source.go
func (*SQLSource) SupportsIncrementalIndexing ¶
func (s *SQLSource) SupportsIncrementalIndexing() bool
SupportsIncrementalIndexing returns true if UpdatedColumn is configured.
type SQLSourceOptions ¶
type SQLSourceOptions struct {
DB *sql.DB
Driver string
Tables []SQLTableConfig
MaxRows int
}
SQLSourceOptions configures the SQL source.
type SQLTableConfig ¶
type SQLTableConfig struct {
Table string `yaml:"table"`
Columns []string `yaml:"columns"` // Columns to concatenate for content
IDColumn string `yaml:"id_column"` // Primary key or unique identifier
UpdatedColumn string `yaml:"updated_column"` // Column for tracking updates (e.g., updated_at)
WhereClause string `yaml:"where_clause"` // Optional WHERE clause for filtering
MetadataColumns []string `yaml:"metadata_columns"` // Columns to include as metadata
}
SQLTableConfig defines which tables and columns to index.
Direct port from legacy pkg/context/indexing/sql_source.go
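A configuration sketch with illustrative table and column names (db is an open *sql.DB):
src, err := rag.NewSQLSource(rag.SQLSourceOptions{
	DB:     db,
	Driver: "postgres",
	Tables: []rag.SQLTableConfig{{
		Table:           "articles",
		Columns:         []string{"title", "body"},
		IDColumn:        "id",
		UpdatedColumn:   "updated_at",
		MetadataColumns: []string{"author"},
	}},
	MaxRows: 10000,
})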
type SearchEngine ¶
type SearchEngine struct {
// contains filtered or unexported fields
}
SearchEngine provides document indexing and semantic search.
It combines:
- Document ingestion with chunking
- Vector similarity search
- Optional hybrid search (vector + keyword)
- Optional query enhancement (HyDE, multi-query)
- Optional reranking
Derived from legacy pkg/context/search.go:SearchEngine
func NewSearchEngine ¶
func NewSearchEngine(cfg SearchEngineConfig) (*SearchEngine, error)
NewSearchEngine creates a new search engine.
func NewSearchEngineFromConfig ¶
func NewSearchEngineFromConfig(storeCfg *config.DocumentStoreConfig, deps *FactoryDeps, collectionName string) (*SearchEngine, error)
NewSearchEngineFromConfig creates a search engine from configuration. collectionName is used as the default if storeCfg.Collection is empty.
func (*SearchEngine) Clear ¶
func (e *SearchEngine) Clear(ctx context.Context) error
Clear removes all documents from the index.
func (*SearchEngine) Collection ¶
func (e *SearchEngine) Collection() string
Collection returns the collection name.
func (*SearchEngine) DeleteByFilter ¶
DeleteByFilter removes documents matching the filter.
Direct port from legacy pkg/context/search.go
func (*SearchEngine) DeleteDocument ¶
func (e *SearchEngine) DeleteDocument(ctx context.Context, documentID string) error
DeleteDocument removes a document and all its chunks from the index.
func (*SearchEngine) HealthCheck ¶
func (e *SearchEngine) HealthCheck(ctx context.Context) HealthCheck
HealthCheck reports the health of the SearchEngine.
func (*SearchEngine) IngestDocument ¶
func (e *SearchEngine) IngestDocument(ctx context.Context, doc Document) error
IngestDocument indexes a document for search.
The document is:
- Split into chunks using the configured chunker
- Each chunk is embedded
- Chunks are stored in the vector database
Document ID should be stable across re-indexing to enable updates.
func (*SearchEngine) IngestDocuments ¶
func (e *SearchEngine) IngestDocuments(ctx context.Context, docs []Document) error
IngestDocuments indexes multiple documents concurrently.
func (*SearchEngine) Search ¶
func (e *SearchEngine) Search(ctx context.Context, req SearchRequest) (*SearchResponse, error)
Search finds documents matching the query.
func (*SearchEngine) Status ¶
func (e *SearchEngine) Status() map[string]any
Status returns the current status of the search engine.
Direct port from legacy pkg/context/search.go (GetStatus)
type SearchEngineConfig ¶
type SearchEngineConfig struct {
// Provider for vector storage and search (required).
Provider vector.Provider
// Embedder for generating embeddings (required).
Embedder embedder.Embedder
// Chunker for splitting documents (optional, defaults to simple).
Chunker Chunker
// Collection name for storing documents (optional, defaults to "rag_documents").
Collection string
// DefaultTopK is the default number of results (default: 10).
DefaultTopK int
// DefaultThreshold filters results below this score (default: 0.0).
DefaultThreshold float32
// HyDE for hypothetical document embedding (optional).
HyDE *HyDE
// Reranker for LLM-based result reranking (optional).
Reranker *Reranker
// MultiQuery for query expansion (optional).
MultiQuery *MultiQueryExpander
}
SearchEngineConfig configures the search engine.
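Beyond the required Provider and Embedder, the optional enhancers plug in as fields. A sketch, assuming reranker and expander were constructed earlier:
engine, err := rag.NewSearchEngine(rag.SearchEngineConfig{
	Provider:    vectorProvider,
	Embedder:    embedder,
	Collection:  "kb_docs",
	DefaultTopK: 5,
	Reranker:    reranker,
	MultiQuery:  expander,
})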
type SearchError ¶
type SearchError struct {
Component string // Component that failed (e.g., "embedder", "vector_db", "reranker")
Operation string // Operation that failed
Message string // Error message
Query string // Query that caused the error
Err error // Underlying error
}
SearchError represents an error during search operations.
Inspired by legacy pkg/context error handling
func NewSearchError ¶
func NewSearchError(component, operation, message, query string, err error) *SearchError
NewSearchError creates a new SearchError.
func (*SearchError) Error ¶
func (e *SearchError) Error() string
Error implements the error interface.
func (*SearchError) Unwrap ¶
func (e *SearchError) Unwrap() error
Unwrap returns the underlying error.
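A sketch for inspecting which component failed, using the standard errors package:
var serr *rag.SearchError
if errors.As(err, &serr) {
	log.Printf("search failed: component=%s op=%s query=%q: %v",
		serr.Component, serr.Operation, serr.Query, serr.Err)
}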
type SearchMetrics ¶
type SearchMetrics struct {
// contains filtered or unexported fields
}
SearchMetrics tracks search engine metrics.
func NewSearchMetrics ¶
func NewSearchMetrics(engineName string) *SearchMetrics
NewSearchMetrics creates a new search metrics tracker.
func (*SearchMetrics) RecordSearch ¶
func (m *SearchMetrics) RecordSearch(latency time.Duration, resultCount int, opts *SearchOptions)
RecordSearch records a search operation.
func (*SearchMetrics) Snapshot ¶
func (m *SearchMetrics) Snapshot() SearchMetricsSnapshot
Snapshot returns a point-in-time copy of search metrics.
type SearchMetricsSnapshot ¶
type SearchMetricsSnapshot struct {
EngineName string `json:"engine_name"`
TotalSearches int64 `json:"total_searches"`
SuccessfulHits int64 `json:"successful_hits"`
EmptyResults int64 `json:"empty_results"`
AvgLatency time.Duration `json:"avg_latency_ns"`
MaxLatency time.Duration `json:"max_latency_ns"`
MinLatency time.Duration `json:"min_latency_ns"`
HyDEUsage int64 `json:"hyde_usage"`
RerankUsage int64 `json:"rerank_usage"`
MultiQueryUsage int64 `json:"multi_query_usage"`
}
SearchMetricsSnapshot is a point-in-time copy of search metrics.
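A minimal sketch; passing nil options is assumed to record a plain vector search:
m := rag.NewSearchMetrics("kb")
m.RecordSearch(42*time.Millisecond, 7, nil)
snap := m.Snapshot()
// snap.TotalSearches == 1; snap.AvgLatency == 42ms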
type SearchOptions ¶
type SearchOptions struct {
// Mode specifies the search mode: "vector", "keyword", "hybrid".
Mode string `json:"mode,omitempty"`
// EnableHyDE enables Hypothetical Document Embeddings.
EnableHyDE bool `json:"enable_hyde,omitempty"`
// EnableRerank enables LLM-based reranking.
EnableRerank bool `json:"enable_rerank,omitempty"`
// EnableMultiQuery enables query expansion.
EnableMultiQuery bool `json:"enable_multi_query,omitempty"`
// NumQueries is the number of query variants for multi-query.
NumQueries int `json:"num_queries,omitempty"`
}
SearchOptions configures search behavior.
type SearchRequest ¶
type SearchRequest struct {
// Query is the search query text.
Query string `json:"query"`
// Collection scopes the search to a specific collection.
Collection string `json:"collection,omitempty"`
// TopK is the maximum number of results to return.
TopK int `json:"top_k,omitempty"`
// Threshold filters results below this score.
Threshold float32 `json:"threshold,omitempty"`
// Filter applies metadata filtering.
Filter map[string]any `json:"filter,omitempty"`
// Options contains search-specific options.
Options *SearchOptions `json:"options,omitempty"`
}
SearchRequest represents a search query.
func (*SearchRequest) SetDefaults ¶
func (r *SearchRequest) SetDefaults()
SetDefaults applies default values to SearchRequest.
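A request sketch; the metadata filter key is illustrative:
resp, err := engine.Search(ctx, rag.SearchRequest{
	Query:     "how are credentials rotated?",
	TopK:      5,
	Threshold: 0.2,
	Filter:    map[string]any{"source": "handbook"},
	Options:   &rag.SearchOptions{Mode: "hybrid", EnableRerank: true},
})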
type SearchResponse ¶
type SearchResponse struct {
// Results contains the matched documents/chunks.
Results []SearchResult `json:"results"`
// TotalMatches is the total number of matches (before limit).
TotalMatches int `json:"total_matches,omitempty"`
// SearchTimeMs is the search duration in milliseconds.
SearchTimeMs int64 `json:"search_time_ms,omitempty"`
// QueryExpansions contains expanded queries (if multi-query enabled).
QueryExpansions []string `json:"query_expansions,omitempty"`
}
SearchResponse contains search results.
type SearchResult ¶
type SearchResult struct {
// ID is the chunk/document identifier.
ID string `json:"id"`
// Content is the matched content.
Content string `json:"content"`
// Score represents relevance (higher is better).
Score float32 `json:"score"`
// DocumentID is the parent document identifier.
DocumentID string `json:"document_id,omitempty"`
// ChunkIndex is the chunk position within the document.
ChunkIndex int `json:"chunk_index,omitempty"`
// Metadata contains additional result information.
Metadata map[string]any `json:"metadata,omitempty"`
// Highlights contains matched text spans (optional).
Highlights []string `json:"highlights,omitempty"`
}
SearchResult represents a single search result.
Results are ordered by Score (highest first). The Score semantics depend on whether reranking was applied:
- Without reranking: vector similarity (0.0 to 1.0)
- With reranking: LLM-determined position score
func CombineResults ¶
func CombineResults(resultSets [][]SearchResult) []SearchResult
CombineResults merges results from multiple queries.
Deduplicates by document ID and keeps the highest score for each.
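For example, merging the result sets from two query variants:
combined := rag.CombineResults([][]rag.SearchResult{respA.Results, respB.Results})
// combined is deduplicated by document ID, keeping each document's best score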
type SemanticChunker ¶
type SemanticChunker struct {
// contains filtered or unexported fields
}
SemanticChunker implements AST-aware chunking that respects code structure.
This is a direct port of legacy pkg/context/chunking/semantic_chunker.go. It attempts to keep functions and types together when possible, using metadata to identify semantic boundaries.
Use when:
- Chunking code files
- Retrieval quality is paramount
- Variable chunk sizes are acceptable
func NewSemanticChunker ¶
func NewSemanticChunker(cfg ChunkerConfig) *SemanticChunker
NewSemanticChunker creates a new semantic chunker.
func (*SemanticChunker) Chunk ¶
func (c *SemanticChunker) Chunk(content string, ctx *ChunkContext) ([]Chunk, error)
Chunk splits content into semantically meaningful chunks. It uses metadata to identify function and type boundaries. Direct port from legacy pkg/context/chunking/semantic_chunker.go
func (*SemanticChunker) Config ¶
func (c *SemanticChunker) Config() ChunkerConfig
func (*SemanticChunker) Strategy ¶
func (c *SemanticChunker) Strategy() ChunkerStrategy
type SimpleChunker ¶
type SimpleChunker struct {
// contains filtered or unexported fields
}
SimpleChunker implements basic line-based chunking.
This is a direct port of legacy pkg/context/chunking/simple_chunker.go. It splits content by lines first, then groups lines into chunks of the configured size. This ensures chunks never split mid-line.
Use when:
- Speed is critical
- Content has uniform structure
- Line boundaries should be preserved
func NewSimpleChunker ¶
func NewSimpleChunker(cfg ChunkerConfig) *SimpleChunker
NewSimpleChunker creates a new simple chunker.
func (*SimpleChunker) Chunk ¶
func (c *SimpleChunker) Chunk(content string, ctx *ChunkContext) ([]Chunk, error)
Chunk splits content into chunks based on line count. Direct port from legacy pkg/context/chunking/simple_chunker.go
func (*SimpleChunker) Config ¶
func (c *SimpleChunker) Config() ChunkerConfig
func (*SimpleChunker) Strategy ¶
func (c *SimpleChunker) Strategy() ChunkerStrategy
type SourceDocument ¶
type SourceDocument struct {
// ID is a unique identifier for the document (format depends on source type)
ID string
// Content is the text content to be indexed.
// For file sources, this should be populated by reading the file.
// For SQL/API sources, this is populated during discovery.
Content string
// Metadata contains source-specific metadata (file path, table name, API endpoint, etc.)
Metadata map[string]interface{}
// LastModified is the last modification time, if available
LastModified time.Time
// Size is the size of the document in bytes (approximate for non-file sources)
Size int64
// ShouldIndex indicates whether this document should be indexed (after filtering)
ShouldIndex bool
// SourcePath is the original source path (file path, table name, API endpoint, etc.)
// This is used for relative path calculations and display purposes
SourcePath string
}
SourceDocument represents a document from any source (file, SQL row, API response, etc.)
Direct port from legacy pkg/context/indexing/data_source.go:Document. Renamed to SourceDocument to avoid a conflict with the rag.Document type.
type TextExtractor ¶
type TextExtractor struct{}
TextExtractor handles plain text files.
Direct port from legacy pkg/context/extraction/text_extractor.go
func NewTextExtractor ¶
func NewTextExtractor() *TextExtractor
NewTextExtractor creates a new text extractor.
func (*TextExtractor) CanExtract ¶
func (te *TextExtractor) CanExtract(path string, mimeType string) bool
CanExtract checks if this is a text file.
func (*TextExtractor) Extract ¶
func (te *TextExtractor) Extract(ctx context.Context, path string, fileSize int64) (*ExtractedContent, error)
Extract reads and cleans text content.
func (*TextExtractor) Name ¶
func (te *TextExtractor) Name() string
Name returns the extractor name.
func (*TextExtractor) Priority ¶
func (te *TextExtractor) Priority() int
Priority returns lower priority (1) so specific extractors can override.
type Tool ¶
type Tool interface {
GetInfo() ToolInfo
Execute(ctx context.Context, args map[string]interface{}) (ToolResult, error)
}
Tool is a minimal interface for executing tools.
Direct port from legacy pkg/context/extraction/mcp_extractor.go
type ToolCaller ¶
ToolCaller is a minimal interface for calling tools without creating import cycles. This allows MCP extractors to work with any tool registry implementation.
Direct port from legacy pkg/context/extraction/mcp_extractor.go
type ToolInfo ¶
type ToolInfo struct {
Name string
Description string
Parameters []ToolParameter
}
ToolInfo contains information about a tool.
Direct port from legacy pkg/context/extraction/mcp_extractor.go
type ToolParameter ¶
ToolParameter describes a tool parameter.
Direct port from legacy pkg/context/extraction/mcp_extractor.go
type ToolResult ¶
ToolResult contains the result of tool execution.
Direct port from legacy pkg/context/extraction/mcp_extractor.go
type TypeInfo ¶
type TypeInfo struct {
Name string `json:"name"`
Kind string `json:"kind"` // "struct", "interface", "alias", etc.
StartLine int `json:"start_line"`
EndLine int `json:"end_line"`
Fields []string `json:"fields,omitempty"`
Methods []string `json:"methods,omitempty"`
IsExported bool `json:"is_exported,omitempty"`
DocComment string `json:"doc_comment,omitempty"`
}
TypeInfo contains information about a type (struct, interface, etc.).
Direct port from legacy pkg/context/metadata/extractor.go
Source Files ¶
- api_source.go
- binary_extractor.go
- blob_source.go
- checkpoint.go
- chunk.go
- chunker.go
- chunker_simple.go
- collection_source.go
- data_source.go
- directory_source.go
- errors.go
- extractor.go
- factory.go
- health.go
- hyde.go
- mcp_extractor.go
- metadata.go
- metadata_go.go
- metrics.go
- multiquery.go
- native_parsers.go
- pattern_filter.go
- progress_tracker.go
- query_expansion.go
- reranker.go
- retry.go
- sanitize.go
- search.go
- sql_source.go
- store.go
- util.go
- watcher.go