Overview
Package repository provides repository indexing functionality for semantic code search.
The package walks file trees, filters files based on patterns and size limits, and creates searchable checkpoints for each indexed file. This enables semantic search over codebases using vector embeddings.
Security
The package implements defense-in-depth security:
- Path traversal prevention: paths are cleaned with filepath.Clean() and validated before use
- File size limits (1MB default, 10MB maximum)
- Glob pattern validation
- Multi-tenant isolation via project-scoped checkpoints
- Binary file detection (skips invalid UTF-8)
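As a rough illustration of the path and binary checks listed above, the sketch below shows one way to implement them with the standard library. The helper names (`cleanRepoPath`, `isInsideRoot`, `looksBinary`) are hypothetical and not part of this package's API. Note that `filepath.Clean` on its own only normalizes a path (`/repo/../etc` becomes `/etc`); it is the extra containment check that actually rejects escapes.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
	"unicode/utf8"
)

// escapesUp reports whether a cleaned relative path climbs above its base.
func escapesUp(rel string) bool {
	return rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// cleanRepoPath (hypothetical) normalizes a user-supplied repository path and
// rejects relative paths that start by climbing out of the working directory.
func cleanRepoPath(p string) (string, error) {
	if p == "" {
		return "", errors.New("empty repository path")
	}
	clean := filepath.Clean(p)
	if escapesUp(clean) {
		return "", fmt.Errorf("path %q escapes the working directory", p)
	}
	return clean, nil
}

// isInsideRoot (hypothetical) reports whether candidate, after cleaning,
// still lies under root. Both paths should be absolute, or both relative.
func isInsideRoot(root, candidate string) bool {
	rel, err := filepath.Rel(root, candidate)
	return err == nil && !escapesUp(rel)
}

// looksBinary mirrors the documented rule: content that is not valid UTF-8
// is treated as binary and skipped.
func looksBinary(content []byte) bool {
	return !utf8.Valid(content)
}

func main() {
	p, err := cleanRepoPath("/path/to/repo/./src/..")
	fmt.Println(p, err)                                       // /path/to/repo <nil>
	fmt.Println(isInsideRoot("/path/to/repo", "/etc/passwd")) // false
	fmt.Println(looksBinary([]byte{0xff, 0xfe}))              // true
}
```

Cleaning first and then checking containment with `filepath.Rel` keeps the two concerns separate: normalization never rejects anything, and the containment test works on already-normalized paths.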
Usage
Basic indexing example:
	ctx := context.Background()
	// checkpointService is the backing service, constructed elsewhere by the caller.
	svc := repository.NewService(checkpointService)
	opts := repository.IndexOptions{
		IncludePatterns: []string{"*.go", "*.md"},
		ExcludePatterns: []string{"vendor/**", "*_test.go"},
		MaxFileSize:     1024 * 1024, // 1MB
	}
	result, err := svc.IndexRepository(ctx, "/path/to/repo", opts)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Indexed %d files\n", result.FilesIndexed)
Pattern Matching
Include patterns specify which files to index. If empty, all files are included (subject to exclude patterns). Exclude patterns take precedence over include patterns.
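Patterns use Go's filepath.Match syntax, in which * matches any run of characters except the path separator:
- "*.go" matches Go files in the current directory (not in subdirectories)
- "*_test.go" matches test files (names ending in _test.go)
- "vendor/**" matches everything under vendor/ recursively; ** is not part of filepath.Match syntax and is handled specially by this package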
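A minimal sketch of the include/exclude rules above, assuming patterns are matched against the slash-separated path relative to the repository root and that a trailing `/**` means "this directory and everything below it". The function names are hypothetical. It uses `path.Match`, which has the same syntax as `filepath.Match` but always treats `/` as the separator; `path.Match` also returns `path.ErrBadPattern` for malformed patterns, which gives a cheap validation step.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// validatePatterns rejects malformed globs such as "[" up front.
func validatePatterns(patterns []string) error {
	for _, p := range patterns {
		if _, err := path.Match(p, ""); err != nil {
			return fmt.Errorf("invalid pattern %q: %w", p, err)
		}
	}
	return nil
}

// matchPattern matches one pattern against a slash-separated relative path.
// "dir/**" is handled as a recursive prefix match; everything else uses
// path.Match, where "*" never crosses a "/".
func matchPattern(pattern, relPath string) bool {
	if strings.HasSuffix(pattern, "/**") {
		dir := strings.TrimSuffix(pattern, "/**")
		return relPath == dir || strings.HasPrefix(relPath, dir+"/")
	}
	ok, err := path.Match(pattern, relPath)
	return err == nil && ok
}

// matchAny reports whether any pattern matches relPath.
func matchAny(patterns []string, relPath string) bool {
	for _, p := range patterns {
		if matchPattern(p, relPath) {
			return true
		}
	}
	return false
}

// shouldIndex applies the documented precedence: excludes win, and an empty
// include list admits every file that is not excluded.
func shouldIndex(relPath string, include, exclude []string) bool {
	if matchAny(exclude, relPath) {
		return false
	}
	return len(include) == 0 || matchAny(include, relPath)
}

func main() {
	include := []string{"*.go", "*.md"}
	exclude := []string{"vendor/**", "*_test.go"}
	for _, f := range []string{"main.go", "main_test.go", "vendor/lib/x.go", "README.md"} {
		fmt.Println(f, shouldIndex(f, include, exclude))
	}
}
```

Checking excludes before includes is what makes "exclude takes precedence" hold regardless of how the two lists overlap.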
Performance
The current implementation walks files sequentially and creates one checkpoint per file. Planned optimizations:
- Batch embedding generation (estimated 10x speedup)
- Parallel processing with worker pools (estimated 20x speedup)
- Incremental indexing (skip unchanged files)
Index
- type GrepOptions
- type GrepResult
- type IndexOptions
- type IndexRepositoryFunc
- type IndexResult
- type RepoSearchResult
- type SearchOptions
- type Service
- func NewServiceWithStoreProvider(stores vectorstore.StoreProvider) *Service
- func (s *Service) Grep(ctx context.Context, pattern string, opts GrepOptions) ([]GrepResult, error)
- func (s *Service) IndexRepository(ctx context.Context, path string, opts IndexOptions) (*IndexResult, error)
- func (s *Service) Search(ctx context.Context, query string, opts SearchOptions) ([]RepoSearchResult, error)
- type ServiceAdapter
- func NewServiceAdapter(service *Service) *ServiceAdapter
- func (a *ServiceAdapter) AsFunc() IndexRepositoryFunc
- type Store
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type GrepOptions (added in v0.3.0)
type GrepOptions struct {
	ProjectPath     string
	IncludePatterns []string
	ExcludePatterns []string
	CaseSensitive   bool
}
GrepOptions configures repository grep behavior.
type GrepResult (added in v0.3.0)
type GrepResult struct {
	FilePath   string `json:"file_path"`
	Content    string `json:"content"`
	LineNumber int    `json:"line_number"`
}
GrepResult is a single matching line returned by Service.Grep.
type IndexOptions
type IndexOptions struct {
	// TenantID is the tenant identifier for multi-tenant isolation.
	// If empty, uses default from git user.name or OS username.
	TenantID string

	// Branch is the git branch to associate with indexed files.
	// If empty, auto-detects current branch from repository.
	Branch string

	// IncludePatterns are glob patterns for files to include (e.g., ["*.md", "*.go"]).
	// If empty, all files are included (subject to exclude patterns and size limit).
	IncludePatterns []string

	// ExcludePatterns are glob patterns for files to exclude (e.g., ["*.log", "node_modules/**"]).
	// Takes precedence over include patterns.
	ExcludePatterns []string

	// MaxFileSize is the maximum file size in bytes to index.
	// Default: 1MB (1048576), Maximum: 10MB (10485760).
	MaxFileSize int64
}
IndexOptions configures repository indexing behavior.
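The defaults described in the field comments can be sketched as below. Two behaviours here are assumptions: values above the 10MB maximum are clamped rather than rejected, and branch auto-detection is shown as parsing the contents of `.git/HEAD` (a real implementation might run `git` instead). Both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	defaultMaxFileSize int64 = 1 << 20  // 1MB (1048576)
	maxAllowedFileSize int64 = 10 << 20 // 10MB (10485760)
)

// effectiveMaxFileSize applies the documented default and cap.
// Clamping (rather than returning an error) is an assumption.
func effectiveMaxFileSize(requested int64) int64 {
	switch {
	case requested <= 0:
		return defaultMaxFileSize
	case requested > maxAllowedFileSize:
		return maxAllowedFileSize
	default:
		return requested
	}
}

// branchFromHEAD parses the contents of .git/HEAD. A symbolic ref such as
// "ref: refs/heads/main" yields "main"; a detached HEAD (a bare commit hash)
// yields "" so the caller can fall back to some other default.
func branchFromHEAD(head string) string {
	const prefix = "ref: refs/heads/"
	head = strings.TrimSpace(head)
	if strings.HasPrefix(head, prefix) {
		return strings.TrimPrefix(head, prefix)
	}
	return ""
}

func main() {
	fmt.Println(effectiveMaxFileSize(0))                       // 1048576
	fmt.Println(effectiveMaxFileSize(50 << 20))                // 10485760
	fmt.Println(branchFromHEAD("ref: refs/heads/feature/x\n")) // feature/x
}
```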
type IndexRepositoryFunc
type IndexRepositoryFunc func(ctx context.Context, path string, opts interface{}) (interface{}, error)
IndexRepositoryFunc is a generic function signature for repository indexing. It can be implemented by adapters with different type definitions.
type IndexResult
type IndexResult struct {
	// Path is the repository path that was indexed.
	Path string

	// Branch is the git branch that was indexed.
	Branch string

	// CollectionName is the vector store collection where files were stored.
	CollectionName string

	// FilesIndexed is the number of files successfully indexed.
	FilesIndexed int

	// IncludePatterns used during indexing.
	IncludePatterns []string

	// ExcludePatterns used during indexing.
	ExcludePatterns []string

	// MaxFileSize applied during indexing.
	MaxFileSize int64

	// IndexedAt is the timestamp when indexing completed.
	IndexedAt time.Time
}
IndexResult contains the results of a repository indexing operation.
type RepoSearchResult (added in v0.3.0)
type RepoSearchResult struct {
	FilePath string                 `json:"file_path"`
	Content  string                 `json:"content"`
	Score    float32                `json:"score"`
	Branch   string                 `json:"branch"`
	Metadata map[string]interface{} `json:"metadata"`
}
RepoSearchResult is a single hit returned by Service.Search.
type SearchOptions (added in v0.3.0)
type SearchOptions struct {
	CollectionName string // Preferred: direct collection name from repository_index
	ProjectPath    string // Required if CollectionName not provided
	TenantID       string // Required if CollectionName not provided
	Branch         string // Optional: filter by branch (empty = all branches)
	Limit          int    // Max results (default: 10)
}
SearchOptions configures repository search behavior.
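One plausible reading of the SearchOptions rules, as a sketch: an explicit CollectionName wins; otherwise TenantID and ProjectPath are both required and combined into the `{tenant}_{project}_codebase` name used by IndexRepository. Deriving `{project}` from the last element of ProjectPath, and the `"branch"` metadata key used for filtering, are assumptions. The filter map has the same `map[string]interface{}` shape as the `filters` argument of `Store.SearchInCollection`.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// searchOptions is a local copy of the documented SearchOptions fields.
type searchOptions struct {
	CollectionName string
	ProjectPath    string
	TenantID       string
	Branch         string
	Limit          int
}

// resolveCollection (hypothetical) picks the collection to search.
func resolveCollection(opts searchOptions) (string, error) {
	if opts.CollectionName != "" {
		return opts.CollectionName, nil
	}
	if opts.ProjectPath == "" || opts.TenantID == "" {
		return "", errors.New("either CollectionName or both ProjectPath and TenantID are required")
	}
	project := filepath.Base(filepath.Clean(opts.ProjectPath)) // assumption: project = last path element
	return fmt.Sprintf("%s_%s_codebase", opts.TenantID, project), nil
}

// effectiveLimit applies the documented default of 10 results.
func effectiveLimit(limit int) int {
	if limit <= 0 {
		return 10
	}
	return limit
}

// branchFilter returns nil (search all branches) for an empty branch.
func branchFilter(branch string) map[string]interface{} {
	if branch == "" {
		return nil
	}
	return map[string]interface{}{"branch": branch} // "branch" key is an assumption
}

func main() {
	name, err := resolveCollection(searchOptions{ProjectPath: "/home/dev/myrepo", TenantID: "alice"})
	fmt.Println(name, err)            // alice_myrepo_codebase <nil>
	fmt.Println(effectiveLimit(0))    // 10
	fmt.Println(branchFilter("main")) // map[branch:main]
}
```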
type Service
type Service struct {
	// contains filtered or unexported fields
}
Service provides repository indexing functionality.
It walks file trees, filters files based on patterns and size limits, and stores them in a dedicated _codebase collection with branch awareness.
func NewService
NewService creates a new repository indexing service.
func NewServiceWithStoreProvider (added in v0.3.0)
func NewServiceWithStoreProvider(stores vectorstore.StoreProvider) *Service
NewServiceWithStoreProvider creates a repository service using StoreProvider for database-per-project isolation.
With StoreProvider, each project gets its own chromem.DB instance, and the collection name is simplified to just "codebase".
func (*Service) Grep (added in v0.3.0)
func (s *Service) Grep(ctx context.Context, pattern string, opts GrepOptions) ([]GrepResult, error)
Grep performs a regex search over repository files.
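A per-file version of this regex search might look like the sketch below. It uses Go's `regexp` package (RE2 syntax) and implements `CaseSensitive: false` by prepending the `(?i)` flag; 1-based line numbers and the `grepFile` name are assumptions. The scanner buffer is raised because `bufio.Scanner` rejects lines longer than 64KB by default, while indexed files may be up to 10MB.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// grepResult mirrors the documented GrepResult fields.
type grepResult struct {
	FilePath   string
	Content    string
	LineNumber int
}

// grepFile (hypothetical) returns every line of content matching pattern.
func grepFile(filePath, content, pattern string, caseSensitive bool) ([]grepResult, error) {
	if !caseSensitive {
		pattern = "(?i)" + pattern // RE2 flag: case-insensitive matching
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern: %w", err)
	}
	var results []grepResult
	sc := bufio.NewScanner(strings.NewReader(content))
	sc.Buffer(make([]byte, 0, 64*1024), 10<<20) // allow lines up to 10MB
	for line := 1; sc.Scan(); line++ {
		if text := sc.Text(); re.MatchString(text) {
			results = append(results, grepResult{FilePath: filePath, Content: text, LineNumber: line})
		}
	}
	return results, sc.Err()
}

func main() {
	src := "package main\n\nfunc Main() {}\nfunc helper() {}\n"
	hits, err := grepFile("main.go", src, `^func main`, false)
	fmt.Println(len(hits), err) // 1 <nil>
}
```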
func (*Service) IndexRepository
func (s *Service) IndexRepository(ctx context.Context, path string, opts IndexOptions) (*IndexResult, error)
IndexRepository indexes all files in a repository matching the given options.
Files are stored in a dedicated {tenant}_{project}_codebase collection (or in a collection named simply "codebase" when the service was created with NewServiceWithStoreProvider), with branch metadata for filtering.
Security: The path is cleaned and validated to prevent path traversal attacks. Multi-tenant isolation is maintained through project-specific collections.
Returns IndexResult with statistics, or an error if indexing fails.
func (*Service) Search (added in v0.3.0)
func (s *Service) Search(ctx context.Context, query string, opts SearchOptions) ([]RepoSearchResult, error)
Search performs semantic search over indexed repository files.
type ServiceAdapter
type ServiceAdapter struct {
	// contains filtered or unexported fields
}
ServiceAdapter adapts repository.Service to work with other type definitions.
This allows the repository service to be used with different IndexOptions and IndexResult types without tightly coupling packages.
func NewServiceAdapter
func NewServiceAdapter(service *Service) *ServiceAdapter
NewServiceAdapter creates an adapter for the repository service.
func (*ServiceAdapter) AsFunc
func (a *ServiceAdapter) AsFunc() IndexRepositoryFunc
AsFunc returns a function that accepts generic options interface{} and returns generic result interface{}, allowing for type adaptation.
type Store (added in v0.3.0)
type Store interface {
	// AddDocuments adds documents to the vector store.
	// Documents with Collection field set will be stored in that collection.
	AddDocuments(ctx context.Context, docs []vectorstore.Document) ([]string, error)

	// SearchInCollection performs semantic search in a specific collection.
	SearchInCollection(ctx context.Context, collectionName string, query string, k int, filters map[string]interface{}) ([]vectorstore.SearchResult, error)
}
Store defines the interface for vector store operations. This allows the repository service to store and search indexed files.