repository

Package repository, version v0.1.2
Warning

This package is not in the latest version of its module.

Published: Dec 4, 2025 License: MIT Imports: 9 Imported by: 0

Documentation

Overview

Package repository provides repository indexing functionality for semantic code search.

The package walks file trees, filters files based on patterns and size limits, and creates searchable checkpoints for each indexed file. This enables semantic search over codebases using vector embeddings.

Security

The package implements defense-in-depth security:

  • Path traversal prevention: the input path is normalized with filepath.Clean() and validated
  • File size limits (1MB default, 10MB maximum)
  • Glob pattern validation
  • Multi-tenant isolation via project-scoped checkpoints
  • Binary file detection (files that are not valid UTF-8 are skipped)
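Two of the checks above can be sketched using only the standard library. The helper names looksBinary and withinRoot are hypothetical. Using filepath.Rel to confirm that a walked path stays under the cleaned root is an assumption about what "validation" involves. This is not the package's own code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"unicode/utf8"
)

// looksBinary applies the rule stated above: content that is not valid
// UTF-8 is treated as binary and skipped.
func looksBinary(content []byte) bool {
	return !utf8.Valid(content)
}

// withinRoot reports whether target, after cleaning, still lies inside root.
// Illustrative traversal check only (assumed, not taken from the package).
func withinRoot(root, target string) bool {
	rel, err := filepath.Rel(filepath.Clean(root), filepath.Clean(target))
	if err != nil {
		return false
	}
	// A relative path that climbs out of root starts with "..".
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	fmt.Println(looksBinary([]byte("package main\n")))      // false
	fmt.Println(looksBinary([]byte{0xff, 0xfe, 0x00}))      // true
	fmt.Println(withinRoot("/repo", "/repo/src/main.go"))   // true
	fmt.Println(withinRoot("/repo", "/repo/../etc/passwd")) // false
}
```

One consequence of the UTF-8 rule is that NUL bytes are valid UTF-8. A file made entirely of zero bytes would therefore pass this check and be treated as text.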

Usage

Basic indexing example:

// checkpointService is any value that implements repository.CheckpointService.
ctx := context.Background()
svc := repository.NewService(checkpointService)
opts := repository.IndexOptions{
    IncludePatterns: []string{"*.go", "*.md"},
    ExcludePatterns: []string{"vendor/**", "*_test.go"},
    MaxFileSize:     1024 * 1024, // 1MB
}
result, err := svc.IndexRepository(ctx, "/path/to/repo", opts)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Indexed %d files\n", result.FilesIndexed)

Pattern Matching

Include patterns specify which files to index. If empty, all files are included (subject to exclude patterns). Exclude patterns take precedence over include patterns.
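These precedence rules can be written as a small filter. The sketch assumes patterns are matched against the file's base name. The package may instead match against the path relative to the repository root. shouldIndex is a hypothetical name.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// shouldIndex applies the rules described above: exclude patterns win, an
// empty include list admits everything, and otherwise a file must match at
// least one include pattern. Base-name matching is an assumption here.
// Malformed patterns are simply treated as non-matching in this sketch.
func shouldIndex(relPath string, include, exclude []string) bool {
	name := filepath.Base(relPath)
	for _, p := range exclude {
		if ok, _ := filepath.Match(p, name); ok {
			return false
		}
	}
	if len(include) == 0 {
		return true
	}
	for _, p := range include {
		if ok, _ := filepath.Match(p, name); ok {
			return true
		}
	}
	return false
}

func main() {
	inc := []string{"*.go", "*.md"}
	exc := []string{"*_test.go"}
	fmt.Println(shouldIndex("pkg/service.go", inc, exc))      // true
	fmt.Println(shouldIndex("pkg/service_test.go", inc, exc)) // false: exclude wins
	fmt.Println(shouldIndex("Makefile", inc, exc))            // false: no include match
	fmt.Println(shouldIndex("Makefile", nil, exc))            // true: empty include list
}
```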

Patterns use Go's filepath.Match syntax, plus custom handling for "**":

  • "*.go" matches all Go files in the current directory
  • "*_test.go" matches all test files
  • "vendor/**" matches the vendor directory and everything beneath it (filepath.Match itself has no recursive wildcard, so this is handled separately)
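In filepath.Match, "**" behaves like "*": neither matches across a path separator. Recursive patterns therefore need separate handling. The sketch below shows one approach. It assumes the custom handling covers only a trailing "/**" and that paths are relative to the repository root. matchPattern is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// matchPattern extends filepath.Match with one assumed rule: a pattern ending
// in "/**" matches the named directory and everything beneath it. Other
// patterns are matched against the base name. The package's actual handling
// of "**" may differ.
func matchPattern(pattern, relPath string) bool {
	relPath = filepath.ToSlash(relPath)
	if prefix, ok := strings.CutSuffix(pattern, "/**"); ok {
		return relPath == prefix || strings.HasPrefix(relPath, prefix+"/")
	}
	ok, err := filepath.Match(pattern, filepath.Base(relPath))
	return err == nil && ok
}

func main() {
	fmt.Println(matchPattern("vendor/**", "vendor/github.com/x/y.go")) // true
	fmt.Println(matchPattern("vendor/**", "vendor"))                   // true
	fmt.Println(matchPattern("vendor/**", "vendors/a.go"))             // false
	fmt.Println(matchPattern("*.go", "cmd/main.go"))                   // true
}
```

Under this rule, "vendor/**" matches only a top-level vendor directory. A nested one such as "internal/vendor" would not match.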

Performance

The current implementation walks the file tree sequentially and creates one checkpoint per file. Planned optimizations:

  • Batch embedding generation (estimated 10x speedup)
  • Parallel processing with worker pools (estimated 20x speedup)
  • Incremental indexing (skip unchanged files)
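Worker-pool processing is only planned, so the following is a generic sketch of the pattern, not the package's code. indexFile and indexParallel are hypothetical names. indexFile is a placeholder for the real per-file work of reading, embedding, and saving a checkpoint.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// indexFile stands in for the per-file work. It is a hypothetical
// placeholder that always succeeds.
func indexFile(path string) error {
	return nil
}

// indexParallel fans paths out to a fixed number of worker goroutines and
// returns how many files were indexed successfully.
func indexParallel(paths []string, workers int) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	var indexed atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				if err := indexFile(p); err == nil {
					indexed.Add(1)
				}
			}
		}()
	}
	for _, p := range paths {
		jobs <- p // blocks until a worker is free
	}
	close(jobs) // lets workers exit their range loops
	wg.Wait()
	return int(indexed.Load())
}

func main() {
	paths := []string{"a.go", "b.go", "c.md", "d.md"}
	fmt.Println(indexParallel(paths, 2)) // 4
}
```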

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CheckpointService

type CheckpointService interface {
	Save(ctx context.Context, req *checkpoint.SaveRequest) (*checkpoint.Checkpoint, error)
}

CheckpointService defines the interface for checkpoint operations.

This allows the repository service to create checkpoints without depending on the concrete checkpoint.Service implementation.

type IndexOptions

type IndexOptions struct {
	// TenantID is the tenant identifier for multi-tenant isolation.
	// If empty, defaults to the git user.name, falling back to the OS username.
	TenantID string

	// IncludePatterns are glob patterns for files to include (e.g., ["*.md", "*.go"]).
	// If empty, all files are included (subject to exclude patterns and size limit).
	IncludePatterns []string

	// ExcludePatterns are glob patterns for files to exclude (e.g., ["*.log", "node_modules/**"]).
	// Takes precedence over include patterns.
	ExcludePatterns []string

	// MaxFileSize is the maximum file size in bytes to index.
	// Default: 1MB (1048576), Maximum: 10MB (10485760).
	MaxFileSize int64
}

IndexOptions configures repository indexing behavior.
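The documented default and maximum for MaxFileSize could be applied as shown below. The sketch treats zero or negative values as "use the default" and caps larger values at 10MB instead of rejecting them with an error. Both behaviors are assumptions. effectiveMaxFileSize is a hypothetical name.

```go
package main

import "fmt"

const (
	defaultMaxFileSize int64 = 1 << 20  // 1MB (1048576 bytes)
	maxAllowedFileSize int64 = 10 << 20 // 10MB (10485760 bytes)
)

// effectiveMaxFileSize resolves IndexOptions.MaxFileSize. Defaulting
// non-positive values and clamping (rather than erroring) are assumptions.
func effectiveMaxFileSize(requested int64) int64 {
	switch {
	case requested <= 0:
		return defaultMaxFileSize
	case requested > maxAllowedFileSize:
		return maxAllowedFileSize
	default:
		return requested
	}
}

func main() {
	fmt.Println(effectiveMaxFileSize(0))        // 1048576
	fmt.Println(effectiveMaxFileSize(512))      // 512
	fmt.Println(effectiveMaxFileSize(50 << 20)) // 10485760
}
```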

type IndexRepositoryFunc

type IndexRepositoryFunc func(ctx context.Context, path string, opts interface{}) (interface{}, error)

IndexRepositoryFunc is a loosely typed function signature for repository indexing that uses interface{} for both options and results. Adapters with their own option and result types can implement it.

type IndexResult

type IndexResult struct {
	// Path is the repository path that was indexed.
	Path string

	// FilesIndexed is the number of files successfully indexed.
	FilesIndexed int

	// IncludePatterns used during indexing.
	IncludePatterns []string

	// ExcludePatterns used during indexing.
	ExcludePatterns []string

	// MaxFileSize applied during indexing.
	MaxFileSize int64

	// IndexedAt is the timestamp when indexing completed.
	IndexedAt time.Time
}

IndexResult contains the results of a repository indexing operation.

type Service

type Service struct {
	// contains filtered or unexported fields
}

Service provides repository indexing functionality.

It walks file trees, filters files based on patterns and size limits, and creates searchable checkpoints for each indexed file.

func NewService

func NewService(checkpointSvc CheckpointService) *Service

NewService creates a new repository indexing service.

func (*Service) IndexRepository

func (s *Service) IndexRepository(ctx context.Context, path string, opts IndexOptions) (*IndexResult, error)

IndexRepository indexes all files in a repository matching the given options.

The function walks the file tree at path, filters files according to include/exclude patterns and size limits, and creates a checkpoint for each indexed file.

Security: The path is cleaned and validated to prevent path traversal attacks. Multi-tenant isolation is maintained through project-specific checkpoints.

Returns IndexResult with statistics, or an error if indexing fails.

type ServiceAdapter

type ServiceAdapter struct {
	// contains filtered or unexported fields
}

ServiceAdapter adapts repository.Service to work with other type definitions.

This allows the repository service to be used with different IndexOptions and IndexResult types without tightly coupling packages.

func NewServiceAdapter

func NewServiceAdapter(service *Service) *ServiceAdapter

NewServiceAdapter creates an adapter for the repository service.

func (*ServiceAdapter) AsFunc

func (a *ServiceAdapter) AsFunc() IndexRepositoryFunc

AsFunc returns an IndexRepositoryFunc that takes options as interface{} and returns the result as interface{}. This lets callers with their own option and result types adapt to the service.
