package repository

Version: v0.3.4 (latest)
Warning

This package is not in the latest version of its module.

Published: Jan 14, 2026 License: GPL-3.0 Imports: 14 Imported by: 0

Documentation

Overview

Package repository provides repository indexing functionality for semantic code search.

The package walks file trees, filters files based on patterns and size limits, and creates searchable checkpoints for each indexed file. This enables semantic search over codebases using vector embeddings.

Security

The package implements defense-in-depth security:

  • Path traversal prevention via filepath.Clean()
  • File size limits (1MB default, 10MB maximum)
  • Glob pattern validation
  • Multi-tenant isolation via project-scoped checkpoints
  • Binary file detection (skips invalid UTF-8)
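Two of the checks above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual code: the helper names insideRoot and looksBinary are hypothetical, and the containment test via filepath.Rel is an assumption added on top of filepath.Clean, which normalizes a path but does not by itself confine it to a root.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"unicode/utf8"
)

// insideRoot reports whether candidate, joined onto root and cleaned,
// still lies within root. (Hypothetical helper, not part of the package.)
func insideRoot(root, candidate string) bool {
	root = filepath.Clean(root)
	joined := filepath.Join(root, candidate) // Join cleans the result
	rel, err := filepath.Rel(root, joined)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// looksBinary treats any content that is not valid UTF-8 as binary.
// (Hypothetical helper, not part of the package.)
func looksBinary(content []byte) bool {
	return !utf8.Valid(content)
}

func main() {
	fmt.Println(insideRoot("/repo", "src/main.go"))
	fmt.Println(insideRoot("/repo", "../etc/passwd"))
	fmt.Println(looksBinary([]byte("package main")))
	fmt.Println(looksBinary([]byte{0xff, 0xfe, 0x00}))
}
```

A real walker would also have to decide how to treat symbolic links, which this sketch ignores.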

Usage

Basic indexing example:

// store is any value that implements repository.Store.
svc := repository.NewService(store)
opts := repository.IndexOptions{
    IncludePatterns: []string{"*.go", "*.md"},
    ExcludePatterns: []string{"vendor/**", "*_test.go"},
    MaxFileSize:     1024 * 1024, // 1MB
}
ctx := context.Background()
result, err := svc.IndexRepository(ctx, "/path/to/repo", opts)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Indexed %d files\n", result.FilesIndexed)

Pattern Matching

Include patterns specify which files to index. If empty, all files are included (subject to exclude patterns). Exclude patterns take precedence over include patterns.

Patterns use Go's filepath.Match syntax:

  • "*.go" matches all Go files in the current directory
  • "*_test.go" matches all test files
  • "vendor/**" matches the vendor directory recursively (custom handling)
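The precedence rules above can be sketched as follows. This is an illustration under stated assumptions, not the package's matcher: matchPattern and shouldIndex are hypothetical names, the trailing "/**" is treated as a plain directory prefix, and patterns are tried against both the relative path and the base name (the source does not say which the package uses).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// matchPattern reports whether relPath (slash-separated, relative to the
// repository root) matches pattern. Assumption: a trailing "/**" means
// "this directory and everything below it"; every other pattern goes
// through filepath.Match against the full path and then the base name.
func matchPattern(pattern, relPath string) bool {
	if strings.HasSuffix(pattern, "/**") {
		dir := strings.TrimSuffix(pattern, "/**")
		return relPath == dir || strings.HasPrefix(relPath, dir+"/")
	}
	if ok, err := filepath.Match(pattern, relPath); err == nil && ok {
		return true
	}
	ok, err := filepath.Match(pattern, filepath.Base(relPath))
	return err == nil && ok
}

// shouldIndex applies exclude patterns first, then include patterns.
// An empty include list admits every file that was not excluded.
func shouldIndex(relPath string, include, exclude []string) bool {
	for _, p := range exclude {
		if matchPattern(p, relPath) {
			return false
		}
	}
	if len(include) == 0 {
		return true
	}
	for _, p := range include {
		if matchPattern(p, relPath) {
			return true
		}
	}
	return false
}

func main() {
	include := []string{"*.go", "*.md"}
	exclude := []string{"vendor/**", "*_test.go"}
	for _, p := range []string{"main.go", "main_test.go", "vendor/lib/x.go", "logo.png"} {
		fmt.Printf("%-16s %v\n", p, shouldIndex(p, include, exclude))
	}
}
```

Checking excludes first is what makes them take precedence: "main_test.go" matches the include pattern "*.go" but is rejected before the include list is consulted.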

Performance

The current implementation walks files sequentially and creates one checkpoint per file. Planned optimizations:

  • Batch embedding generation (projected 10x speedup)
  • Parallel processing with worker pools (projected 20x speedup)
  • Incremental indexing (skip unchanged files)
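For readers unfamiliar with the worker-pool idea mentioned above, here is a generic sketch of the pattern. It does not describe the package, which currently indexes sequentially; indexAll, errSkip, and the indexFile callback are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSkip is a placeholder error used by the demo callback below.
var errSkip = errors.New("file skipped")

// indexAll runs indexFile over paths using a fixed number of worker
// goroutines and returns how many calls succeeded.
func indexAll(paths []string, workers int, indexFile func(string) error) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		indexed int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				if err := indexFile(p); err == nil {
					mu.Lock()
					indexed++
					mu.Unlock()
				}
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return indexed
}

func main() {
	paths := []string{"a.go", "b.bin", "c.md"}
	n := indexAll(paths, 2, func(p string) error {
		if p == "b.bin" {
			return errSkip
		}
		return nil
	})
	fmt.Println("indexed:", n)
}
```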

Types

type GrepOptions added in v0.3.0

type GrepOptions struct {
	ProjectPath     string
	IncludePatterns []string
	ExcludePatterns []string
	CaseSensitive   bool
}

GrepOptions configures repository grep behavior.

type GrepResult added in v0.3.0

type GrepResult struct {
	FilePath   string `json:"file_path"`
	Content    string `json:"content"`
	LineNumber int    `json:"line_number"`
}

GrepResult is a single match returned by Grep.

type IndexOptions

type IndexOptions struct {
	// TenantID is the tenant identifier for multi-tenant isolation.
	// If empty, uses default from git user.name or OS username.
	TenantID string

	// Branch is the git branch to associate with indexed files.
	// If empty, auto-detects current branch from repository.
	Branch string

	// IncludePatterns are glob patterns for files to include (e.g., ["*.md", "*.go"]).
	// If empty, all files are included (subject to exclude patterns and size limit).
	IncludePatterns []string

	// ExcludePatterns are glob patterns for files to exclude (e.g., ["*.log", "node_modules/**"]).
	// Takes precedence over include patterns.
	ExcludePatterns []string

	// MaxFileSize is the maximum file size in bytes to index.
	// Default: 1MB (1048576), Maximum: 10MB (10485760).
	MaxFileSize int64
}

IndexOptions configures repository indexing behavior.
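The MaxFileSize comment gives a default of 1MB and a maximum of 10MB. One way such a rule could be applied is sketched below. The function name effectiveMaxFileSize is hypothetical, and so is the behavior for out-of-range values: the source does not say whether the package rejects or clamps them, so this sketch assumes rejection.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	defaultMaxFileSize int64 = 1 << 20  // 1MB (1048576), from the field comment
	maxAllowedFileSize int64 = 10 << 20 // 10MB (10485760), from the field comment
)

var errInvalidSize = errors.New("max file size out of range")

// effectiveMaxFileSize resolves a requested limit. Assumptions: zero means
// "use the default", and negative or over-limit values are rejected.
func effectiveMaxFileSize(requested int64) (int64, error) {
	switch {
	case requested == 0:
		return defaultMaxFileSize, nil
	case requested < 0 || requested > maxAllowedFileSize:
		return 0, errInvalidSize
	default:
		return requested, nil
	}
}

func main() {
	for _, req := range []int64{0, 5 << 20, 11 << 20} {
		size, err := effectiveMaxFileSize(req)
		fmt.Println(req, size, err)
	}
}
```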

type IndexRepositoryFunc

type IndexRepositoryFunc func(ctx context.Context, path string, opts interface{}) (interface{}, error)

IndexRepositoryFunc is a generic function signature for repository indexing. It can be implemented by adapters with different type definitions.

type IndexResult

type IndexResult struct {
	// Path is the repository path that was indexed.
	Path string

	// Branch is the git branch that was indexed.
	Branch string

	// CollectionName is the Qdrant collection where files were stored.
	CollectionName string

	// FilesIndexed is the number of files successfully indexed.
	FilesIndexed int

	// IncludePatterns used during indexing.
	IncludePatterns []string

	// ExcludePatterns used during indexing.
	ExcludePatterns []string

	// MaxFileSize applied during indexing.
	MaxFileSize int64

	// IndexedAt is the timestamp when indexing completed.
	IndexedAt time.Time
}

IndexResult contains the results of a repository indexing operation.

type RepoSearchResult added in v0.3.0

type RepoSearchResult struct {
	FilePath string                 `json:"file_path"`
	Content  string                 `json:"content"`
	Score    float32                `json:"score"`
	Branch   string                 `json:"branch"`
	Metadata map[string]interface{} `json:"metadata"`
}

RepoSearchResult is a single result returned by Search.

type SearchOptions added in v0.3.0

type SearchOptions struct {
	CollectionName string // Preferred: direct collection name from repository_index
	ProjectPath    string // Required if CollectionName not provided
	TenantID       string // Required if CollectionName not provided
	Branch         string // Optional: filter by branch (empty = all branches)
	Limit          int    // Max results (default: 10)
}

SearchOptions configures repository search behavior.
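The field comments imply a small validation rule: either CollectionName is set, or both ProjectPath and TenantID are, and Limit falls back to 10. A sketch of that rule follows, using a local copy of the struct so the example compiles on its own. The normalize function and the error value are hypothetical, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// searchOptions mirrors the documented SearchOptions fields.
type searchOptions struct {
	CollectionName string
	ProjectPath    string
	TenantID       string
	Branch         string
	Limit          int
}

var errMissingTarget = errors.New("need CollectionName, or both ProjectPath and TenantID")

// normalize checks that a search target is identified and applies the
// documented default limit of 10. (Hypothetical helper.)
func normalize(opts searchOptions) (searchOptions, error) {
	if opts.CollectionName == "" && (opts.ProjectPath == "" || opts.TenantID == "") {
		return opts, errMissingTarget
	}
	if opts.Limit <= 0 {
		opts.Limit = 10
	}
	return opts, nil
}

func main() {
	opts, err := normalize(searchOptions{ProjectPath: "/path/to/repo", TenantID: "alice"})
	fmt.Println(opts.Limit, err)

	_, err = normalize(searchOptions{ProjectPath: "/path/to/repo"})
	fmt.Println(err)
}
```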

type Service

type Service struct {
	// contains filtered or unexported fields
}

Service provides repository indexing functionality.

It walks file trees, filters files based on patterns and size limits, and stores them in a dedicated _codebase collection with branch awareness.

func NewService

func NewService(store Store) *Service

NewService creates a new repository indexing service.

func NewServiceWithStoreProvider added in v0.3.0

func NewServiceWithStoreProvider(stores vectorstore.StoreProvider) *Service

NewServiceWithStoreProvider creates a repository service using StoreProvider for database-per-project isolation.

With StoreProvider, each project gets its own chromem.DB instance, and the collection name is simplified to just "codebase".

func (*Service) Grep added in v0.3.0

func (s *Service) Grep(ctx context.Context, pattern string, opts GrepOptions) ([]GrepResult, error)

Grep performs a regex search over repository files.
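To illustrate how the GrepOptions and GrepResult fields relate, here is a sketch of a line-by-line regex search over in-memory content. It is not the package's implementation: grepContent is a hypothetical helper, the local grepResult type only mirrors the documented fields, and both the 1-based line numbers and the use of the "(?i)" flag for case-insensitive matching are assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// grepResult mirrors the documented GrepResult fields.
type grepResult struct {
	FilePath   string
	Content    string
	LineNumber int
}

// grepContent returns every line of content that matches pattern.
// Assumptions: line numbers start at 1, and case-insensitive search is
// done by prefixing the pattern with the regexp flag "(?i)".
func grepContent(filePath, content, pattern string, caseSensitive bool) ([]grepResult, error) {
	if !caseSensitive {
		pattern = "(?i)" + pattern
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern: %w", err)
	}
	var results []grepResult
	scanner := bufio.NewScanner(strings.NewReader(content))
	for line := 1; scanner.Scan(); line++ {
		if text := scanner.Text(); re.MatchString(text) {
			results = append(results, grepResult{FilePath: filePath, Content: text, LineNumber: line})
		}
	}
	return results, scanner.Err()
}

func main() {
	src := "package main\nfunc Foo() {}\nfunc bar() {}\n"
	matches, err := grepContent("main.go", src, `func \w+`, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range matches {
		fmt.Printf("%s:%d: %s\n", m.FilePath, m.LineNumber, m.Content)
	}
}
```

Note that bufio.Scanner has a default token limit of 64KB per line, so very long lines would need a larger buffer.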

func (*Service) IndexRepository

func (s *Service) IndexRepository(ctx context.Context, path string, opts IndexOptions) (*IndexResult, error)

IndexRepository indexes all files in a repository matching the given options.

Files are stored in a dedicated {tenant}_{project}_codebase collection, with branch metadata for filtering.

Security: The path is cleaned and validated to prevent path traversal attacks. Multi-tenant isolation is maintained through project-specific collections.

Returns IndexResult with statistics, or an error if indexing fails.
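The {tenant}_{project}_codebase naming scheme could be derived along these lines. This is only a sketch: collectionName and sanitize are hypothetical helpers, and both the use of the path's base name as the project and the character sanitization are assumptions the source does not confirm.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// sanitize lowercases s and replaces every character outside [a-z0-9]
// with an underscore. (Assumed rule, for illustration only.)
func sanitize(s string) string {
	return strings.Map(func(r rune) rune {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') {
			return r
		}
		return '_'
	}, strings.ToLower(s))
}

// collectionName builds "{tenant}_{project}_codebase", taking the project
// name from the last element of the cleaned repository path.
func collectionName(tenantID, projectPath string) string {
	project := filepath.Base(filepath.Clean(projectPath))
	return sanitize(tenantID) + "_" + sanitize(project) + "_codebase"
}

func main() {
	fmt.Println(collectionName("Alice", "/home/alice/my-app"))
}
```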

func (*Service) Search added in v0.3.0

func (s *Service) Search(ctx context.Context, query string, opts SearchOptions) ([]RepoSearchResult, error)

Search performs semantic search over indexed repository files.

type ServiceAdapter

type ServiceAdapter struct {
	// contains filtered or unexported fields
}

ServiceAdapter adapts repository.Service to work with other type definitions.

This allows the repository service to be used with different IndexOptions and IndexResult types without tightly coupling packages.

func NewServiceAdapter

func NewServiceAdapter(service *Service) *ServiceAdapter

NewServiceAdapter creates an adapter for the repository service.

func (*ServiceAdapter) AsFunc

func (a *ServiceAdapter) AsFunc() IndexRepositoryFunc

AsFunc returns a function that accepts generic options interface{} and returns generic result interface{}, allowing for type adaptation.
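The adapter idea can be shown with local stand-in types. The sketch below wraps a typed indexing function in the generic signature and recovers the typed options with a type assertion. How the real adapter converts between types is not documented here, so the assertion-based approach, the asFunc helper, and fakeIndex are all assumptions.

```go
package main

import (
	"context"
	"fmt"
)

// Local stand-ins for the package's types.
type indexOptions struct{ MaxFileSize int64 }
type indexResult struct{ FilesIndexed int }

type indexRepositoryFunc func(ctx context.Context, path string, opts interface{}) (interface{}, error)

// asFunc wraps a typed indexing function in the generic signature.
// Options of an unexpected type are rejected with an error.
func asFunc(index func(context.Context, string, indexOptions) (*indexResult, error)) indexRepositoryFunc {
	return func(ctx context.Context, path string, opts interface{}) (interface{}, error) {
		typed, ok := opts.(indexOptions)
		if !ok {
			return nil, fmt.Errorf("unexpected options type %T", opts)
		}
		res, err := index(ctx, path, typed)
		if err != nil {
			// Return an untyped nil so callers can compare the result to nil.
			return nil, err
		}
		return res, nil
	}
}

// fakeIndex is a placeholder that pretends three files were indexed.
func fakeIndex(ctx context.Context, path string, opts indexOptions) (*indexResult, error) {
	return &indexResult{FilesIndexed: 3}, nil
}

func main() {
	fn := asFunc(fakeIndex)
	out, err := fn(context.Background(), "/path/to/repo", indexOptions{MaxFileSize: 1 << 20})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.(*indexResult).FilesIndexed)
}
```

The cost of the generic signature is that type errors move from compile time to run time, which is why the wrapper checks the assertion result instead of assuming it.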

type Store added in v0.3.0

type Store interface {
	// AddDocuments adds documents to the vector store.
	// Documents with Collection field set will be stored in that collection.
	AddDocuments(ctx context.Context, docs []vectorstore.Document) ([]string, error)

	// SearchInCollection performs semantic search in a specific collection.
	SearchInCollection(ctx context.Context, collectionName string, query string, k int, filters map[string]interface{}) ([]vectorstore.SearchResult, error)
}

Store defines the interface for vector store operations. This allows the repository service to store and search indexed files.
