indexer

package
v0.8.0
Published: Mar 29, 2026 | License: MIT | Imports: 9 | Imported by: 0

Documentation

Overview

Package indexer clones and indexes GitHub repositories using the Synapses CLI, caching results to disk so re-runs skip already-indexed repos.

Designed for RepoBench-R: clone the repos referenced in the dataset, index each with `synapses index --path <dir>`, then let the benchmark binary call tools with `?project=<dir>` per sample.

Usage:

results, err := indexer.Run(indexer.Options{
    ReposDir:    "/tmp/repobench_repos",
    CacheFile:   "/tmp/repobench_index_cache.json",
    Repos:       []string{"sissaschool/elementpath" /* , ... */},
    Workers:     8,
    SynapsesBin: "/Users/itachi/.synapses/bin/synapses",
})
if err != nil {
    log.Fatal(err)
}
indexer.Summary(results)

Constants

This section is empty.

Variables

This section is empty.

Functions

func LocalPath

func LocalPath(reposDir, repo string) string

LocalPath returns the canonical local directory for a GitHub repo. For example, "owner/repo" maps to "<reposDir>/owner/repo".
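The mapping can be sketched with the standard library as follows. The function `localPath` is a hypothetical stand-in written for illustration; the package's actual implementation is not shown on this page and may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// localPath is a hypothetical stand-in for indexer.LocalPath. It joins
// reposDir with the "owner/repo" slug, converting the slash to the
// platform's path separator.
func localPath(reposDir, repo string) string {
	return filepath.Join(reposDir, filepath.FromSlash(repo))
}

func main() {
	fmt.Println(localPath("/tmp/repobench_repos", "sissaschool/elementpath"))
}
```

On a Unix-like system this prints `/tmp/repobench_repos/sissaschool/elementpath`.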

func Summary

func Summary(results []Result)

Summary prints a summary of indexer results.

Types

type Cache

type Cache struct {
	// contains filtered or unexported fields
}

Cache persists indexed repo paths across runs.

func LoadCache

func LoadCache(path string) (*Cache, error)

LoadCache loads or creates a cache file.

func (*Cache) Get

func (c *Cache) Get(repo string) string

Get returns the local path for a repo, or "" if not cached.

func (*Cache) Set

func (c *Cache) Set(repo, localPath string) error

Set records a repo → local path mapping and flushes to disk.

type Options

type Options struct {
	// ReposDir is the directory where repos are cloned.
	ReposDir string
	// CacheFile is the JSON file tracking which repos have been indexed.
	CacheFile string
	// Repos is the list of "owner/repo" strings to clone and index.
	Repos []string
	// Workers is the number of parallel clone+index workers (default 8).
	Workers int
	// SynapsesBin is the path to the synapses binary (default: auto-detect).
	SynapsesBin string
	// SkipIndex skips the `synapses index` step (clone only).
	SkipIndex bool
	// TimeoutPerRepo is the max time per clone+index operation (default 3 min).
	TimeoutPerRepo time.Duration
	// Verbose prints per-repo progress.
	Verbose bool
}

Options controls the indexer.

type Result

type Result struct {
	Repo      string  `json:"repo"`
	LocalPath string  `json:"local_path"`
	Indexed   bool    `json:"indexed"`
	Skipped   bool    `json:"skipped"` // already cached
	Error     string  `json:"error,omitempty"`
	DurationS float64 `json:"duration_s"`
}

Result holds the outcome of indexing a single repo.

func Run

func Run(opts Options) ([]Result, error)

Run clones and indexes all repos using parallel workers. It returns a Result for every repo, including those skipped because they were already cached.
