Documentation
Overview ¶
Package repository provides repository indexing functionality for semantic code search.
The package walks file trees, filters files based on patterns and size limits, and creates searchable checkpoints for each indexed file. This enables semantic search over codebases using vector embeddings.
Security ¶
The package implements defense-in-depth security:
- Path traversal prevention via filepath.Clean()
- File size limits (1MB default, 10MB maximum)
- Glob pattern validation
- Multi-tenant isolation via project-scoped checkpoints
- Binary file detection (files whose content is not valid UTF-8 are skipped)
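Two of the checks listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's actual code: the helper names `insideRoot` and `looksLikeText` are hypothetical, and the real implementation may validate paths differently.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"unicode/utf8"
)

// insideRoot reports whether candidate, once cleaned, still lies under root.
// Hypothetical helper: it only illustrates the idea of cleaning a path
// before trusting it.
func insideRoot(root, candidate string) bool {
	rel, err := filepath.Rel(filepath.Clean(root), filepath.Clean(candidate))
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// looksLikeText treats content that is not valid UTF-8 as binary.
func looksLikeText(content []byte) bool {
	return utf8.Valid(content)
}

func main() {
	fmt.Println(insideRoot("/repo", "/repo/src/../main.go"))
	fmt.Println(insideRoot("/repo", "/repo/../etc/passwd"))
	fmt.Println(looksLikeText([]byte("package main\n")))
	fmt.Println(looksLikeText([]byte{0xff, 0xfe, 0x00, 0x01}))
}
```

Cleaning a path only normalizes it. The comparison against the root is what rejects a path that escapes the repository.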
Usage ¶
Basic indexing example:
	svc := repository.NewService(checkpointService)
	opts := repository.IndexOptions{
		IncludePatterns: []string{"*.go", "*.md"},
		ExcludePatterns: []string{"vendor/**", "*_test.go"},
		MaxFileSize:     1024 * 1024, // 1MB
	}
	result, err := svc.IndexRepository(ctx, "/path/to/repo", opts)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Indexed %d files\n", result.FilesIndexed)
Pattern Matching ¶
Include patterns specify which files to index. If empty, all files are included (subject to exclude patterns). Exclude patterns take precedence over include patterns.
Patterns use Go's filepath.Match syntax:
- "*.go" matches all Go files in the current directory
- "*_test.go" matches all test files
- "vendor/**" matches the vendor directory recursively (custom handling)
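The precedence rules above can be sketched as follows. This is a minimal sketch, not the package's implementation. It assumes that paths are slash-separated and relative to the repository root, that a trailing "/**" means "everything under this directory", and that other patterns are tried against both the full relative path and the base name. The names `matchPattern` and `shouldIndex` are hypothetical. The sketch uses `path.Match`, which shares its pattern syntax with `filepath.Match` for slash-separated paths.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPattern reports whether relPath matches pattern.
// Assumption: "dir/**" is handled as a directory prefix, since "*" in
// Match syntax never crosses a path separator.
func matchPattern(pattern, relPath string) bool {
	if strings.HasSuffix(pattern, "/**") {
		dir := strings.TrimSuffix(pattern, "/**")
		return relPath == dir || strings.HasPrefix(relPath, dir+"/")
	}
	if ok, _ := path.Match(pattern, relPath); ok {
		return true
	}
	ok, _ := path.Match(pattern, path.Base(relPath))
	return ok
}

// shouldIndex checks exclude patterns first, so they take precedence.
// An empty include list admits every file that was not excluded.
func shouldIndex(relPath string, include, exclude []string) bool {
	for _, p := range exclude {
		if matchPattern(p, relPath) {
			return false
		}
	}
	if len(include) == 0 {
		return true
	}
	for _, p := range include {
		if matchPattern(p, relPath) {
			return true
		}
	}
	return false
}

func main() {
	include := []string{"*.go", "*.md"}
	exclude := []string{"vendor/**", "*_test.go"}
	for _, p := range []string{"main.go", "pkg/a.go", "pkg/a_test.go", "vendor/lib/x.go", "notes.txt"} {
		fmt.Printf("%-16s %v\n", p, shouldIndex(p, include, exclude))
	}
}
```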
Performance ¶
The current implementation walks files sequentially and creates one checkpoint per file. Planned optimizations:
- Batch embedding generation (projected 10x speedup)
- Parallel processing with worker pools (projected 20x speedup)
- Incremental indexing (skip unchanged files)
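As an illustration of the worker-pool idea only, and not of any code that exists in the package, a fixed number of goroutines could consume file paths from a channel. The function `indexAll`, its `indexFile` callback, and `errSkip` are hypothetical names introduced for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSkip is a stand-in error used by the demo callback.
var errSkip = errors.New("skipped")

// indexAll runs indexFile for every path using a fixed number of workers
// and returns how many calls completed without error.
func indexAll(paths []string, workers int, indexFile func(string) error) int {
	if workers < 1 {
		workers = 1 // avoid blocking forever with no consumers
	}
	jobs := make(chan string)
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		indexed int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				if err := indexFile(p); err == nil {
					mu.Lock()
					indexed++
					mu.Unlock()
				}
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return indexed
}

func main() {
	paths := []string{"a.go", "b.go", "broken.bin"}
	n := indexAll(paths, 2, func(p string) error {
		if p == "broken.bin" {
			return errSkip
		}
		return nil
	})
	fmt.Println("indexed:", n)
}
```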
Index ¶
Types ¶
type CheckpointService ¶
	type CheckpointService interface {
		Save(ctx context.Context, req *checkpoint.SaveRequest) (*checkpoint.Checkpoint, error)
	}
CheckpointService defines the interface for checkpoint operations.
This allows the repository service to create checkpoints without depending on the concrete checkpoint.Service implementation.
type IndexOptions ¶
	type IndexOptions struct {
		// TenantID is the tenant identifier for multi-tenant isolation.
		// If empty, uses default from git user.name or OS username.
		TenantID string

		// IncludePatterns are glob patterns for files to include (e.g., ["*.md", "*.go"]).
		// If empty, all files are included (subject to exclude patterns and size limit).
		IncludePatterns []string

		// ExcludePatterns are glob patterns for files to exclude (e.g., ["*.log", "node_modules/**"]).
		// Takes precedence over include patterns.
		ExcludePatterns []string

		// MaxFileSize is the maximum file size in bytes to index.
		// Default: 1MB (1048576), Maximum: 10MB (10485760).
		MaxFileSize int64
	}
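The documented default and maximum for MaxFileSize could be resolved along these lines. The sketch rests on two assumptions the documentation does not confirm: a zero or negative value selects the default, and a value above the maximum is clamped rather than rejected with an error. The name `effectiveMaxFileSize` and both constants are hypothetical.

```go
package main

import "fmt"

const (
	defaultMaxFileSize int64 = 1 << 20  // 1MB (1048576 bytes)
	maxAllowedFileSize int64 = 10 << 20 // 10MB (10485760 bytes)
)

// effectiveMaxFileSize resolves a caller-supplied limit.
// Assumed behavior: non-positive values fall back to the default and
// oversized values are clamped to the maximum.
func effectiveMaxFileSize(requested int64) int64 {
	switch {
	case requested <= 0:
		return defaultMaxFileSize
	case requested > maxAllowedFileSize:
		return maxAllowedFileSize
	default:
		return requested
	}
}

func main() {
	fmt.Println(effectiveMaxFileSize(0))
	fmt.Println(effectiveMaxFileSize(4096))
	fmt.Println(effectiveMaxFileSize(50 << 20))
}
```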
IndexOptions configures repository indexing behavior.
type IndexRepositoryFunc ¶
type IndexRepositoryFunc func(ctx context.Context, path string, opts interface{}) (interface{}, error)
IndexRepositoryFunc is a generic function signature for repository indexing. It can be implemented by adapters with different type definitions.
type IndexResult ¶
	type IndexResult struct {
		// Path is the repository path that was indexed.
		Path string

		// FilesIndexed is the number of files successfully indexed.
		FilesIndexed int

		// IncludePatterns used during indexing.
		IncludePatterns []string

		// ExcludePatterns used during indexing.
		ExcludePatterns []string

		// MaxFileSize applied during indexing.
		MaxFileSize int64

		// IndexedAt is the timestamp when indexing completed.
		IndexedAt time.Time
	}
IndexResult contains the results of a repository indexing operation.
type Service ¶
	type Service struct {
		// contains filtered or unexported fields
	}
Service provides repository indexing functionality.
It walks file trees, filters files based on patterns and size limits, and creates searchable checkpoints for each indexed file.
func NewService ¶
func NewService(checkpointSvc CheckpointService) *Service
NewService creates a new repository indexing service.
func (*Service) IndexRepository ¶
func (s *Service) IndexRepository(ctx context.Context, path string, opts IndexOptions) (*IndexResult, error)
IndexRepository indexes all files in a repository matching the given options.
The function walks the file tree at path, filters files according to include/exclude patterns and size limits, and creates a checkpoint for each indexed file.
Security: The path is cleaned and validated to prevent path traversal attacks. Multi-tenant isolation is maintained through project-specific checkpoints.
Returns IndexResult with statistics, or an error if indexing fails.
type ServiceAdapter ¶
	type ServiceAdapter struct {
		// contains filtered or unexported fields
	}
ServiceAdapter adapts repository.Service to work with other type definitions.
This allows the repository service to be used with different IndexOptions and IndexResult types without tightly coupling packages.
func NewServiceAdapter ¶
func NewServiceAdapter(service *Service) *ServiceAdapter
NewServiceAdapter creates an adapter for the repository service.
func (*ServiceAdapter) AsFunc ¶
func (a *ServiceAdapter) AsFunc() IndexRepositoryFunc
AsFunc returns a function that accepts generic options interface{} and returns generic result interface{}, allowing for type adaptation.