package agent

v0.0.7

This package is not in the latest version of its module.

Published: Feb 12, 2026 License: MIT Imports: 14 Imported by: 0

Documentation

Overview

Package agent provides C7 agent evaluation infrastructure for headless Claude Code execution.

Index

Constants

const CompletenessPrompt = `` /* 762-byte string literal not displayed */

CompletenessPrompt evaluates documentation completeness. Criteria: essential sections present, API coverage, troubleshooting.

const CrossRefCoherencePrompt = `` /* 922-byte string literal not displayed */

CrossRefCoherencePrompt evaluates documentation cross-reference quality. Criteria: consistent terminology, valid internal links, coherent structure.

const ExampleQualityPrompt = `` /* 853-byte string literal not displayed */

ExampleQualityPrompt evaluates code example quality. Criteria: runnability, clarity, best practices demonstration.

const ReadmeClarityPrompt = `` /* 944-byte string literal not displayed */

ReadmeClarityPrompt evaluates README documentation clarity. Criteria: purpose clarity, quickstart quality, structure, inline examples.

Variables

This section is empty.

Functions

func CheckClaudeCLI

func CheckClaudeCLI() error

CheckClaudeCLI verifies that the Claude CLI is installed and accessible. Returns nil if available, or a descriptive error with installation instructions.
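A sketch of this kind of check, using exec.LookPath (illustrative only; the package's actual implementation is not shown on this page and may do more, such as running the CLI to confirm it works):

```go
package main

import (
	"fmt"
	"os/exec"
)

// checkClaudeCLI mirrors the documented contract: nil when the CLI is on
// PATH, otherwise a descriptive error with an installation hint.
// Sketch only, not the package's source.
func checkClaudeCLI() error {
	if _, err := exec.LookPath("claude"); err != nil {
		return fmt.Errorf("claude CLI not found in PATH: %w (see the Claude Code installation docs)", err)
	}
	return nil
}

func main() {
	if err := checkClaudeCLI(); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("claude CLI available")
}
```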

func CreateWorkspace

func CreateWorkspace(projectDir string) (workDir string, cleanup func(), err error)

CreateWorkspace creates an isolated directory for agent execution. It attempts to use git worktree for efficient isolation. If the project is not a git repository, it falls back to read-only mode using the original directory (agent tasks use read-only tools, so this is safe).

Returns:

  • workDir: the directory path for agent execution
  • cleanup: function to call when done (removes worktree if created)
  • err: error if workspace creation fails

func LoadResponses

func LoadResponses(debugDir string) (map[string]DebugResponse, error)

LoadResponses reads all JSON response files from debugDir and returns them keyed by "{metric_id}_{sample_index}".

func SaveResponses

func SaveResponses(debugDir string, results []metrics.MetricResult) error

SaveResponses persists metric results as individual JSON files in debugDir. Each sample is saved as {metric_id}_{sample_index}.json.

Types

type C7Progress

type C7Progress struct {
	// contains filtered or unexported fields
}

C7Progress displays real-time progress for C7 agent evaluation. Thread-safe for concurrent metric updates.

func NewC7Progress

func NewC7Progress(w *os.File, metricIDs []string, metricNames []string) *C7Progress

NewC7Progress creates a new progress display. If w is not a TTY, display operations are no-ops.


func (*C7Progress) AddTokens

func (p *C7Progress) AddTokens(tokens int)

AddTokens adds to the running token count.

func (*C7Progress) SetMetricComplete

func (p *C7Progress) SetMetricComplete(id string, score int)

SetMetricComplete marks a metric as complete with its final score.

func (*C7Progress) SetMetricFailed

func (p *C7Progress) SetMetricFailed(id string, err string)

SetMetricFailed marks a metric as failed with an error message.

func (*C7Progress) SetMetricRunning

func (p *C7Progress) SetMetricRunning(id string, totalSamples int)

SetMetricRunning marks a metric as running and sets total samples.

func (*C7Progress) SetMetricSample

func (p *C7Progress) SetMetricSample(id string, current int)

SetMetricSample updates the current sample number for a running metric.

func (*C7Progress) Start

func (p *C7Progress) Start()

Start begins the progress display refresh loop.

func (*C7Progress) Stop

func (p *C7Progress) Stop()

Stop halts the progress display and prints a final summary.

func (*C7Progress) TotalTokens

func (p *C7Progress) TotalTokens() int

TotalTokens returns the current token count.
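C7Progress is documented as thread-safe for concurrent metric updates; a minimal mutex-based sketch of the AddTokens/TotalTokens pair (one way to get that guarantee, not the package's code):

```go
package main

import (
	"fmt"
	"sync"
)

// tokenCounter sketches a concurrency-safe running token count.
type tokenCounter struct {
	mu     sync.Mutex
	tokens int
}

// AddTokens adds to the running count under the lock.
func (c *tokenCounter) AddTokens(n int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tokens += n
}

// TotalTokens reads the count under the same lock.
func (c *tokenCounter) TotalTokens() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.tokens
}

func main() {
	var c tokenCounter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.AddTokens(10) // concurrent metric updates
		}()
	}
	wg.Wait()
	fmt.Println(c.TotalTokens()) // → 1000
}
```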

type CLIStatus

type CLIStatus struct {
	Available   bool   // whether CLI is usable
	Version     string // CLI version string (e.g., "claude 2.1.12")
	Error       string // error message if not available
	InstallHint string // installation instructions
}

CLIStatus represents the availability and version of the Claude CLI.

func DetectCLI

func DetectCLI() CLIStatus

DetectCLI checks if the Claude CLI is installed and returns its status. This is a convenience wrapper around detectCLIWithContext using a 5-second timeout.

func GetCLIStatus

func GetCLIStatus() CLIStatus

GetCLIStatus returns cached CLI status, detecting on first call. This is efficient for repeated checks within a single process.
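The detect-once-then-cache behavior can be implemented with sync.Once; a sketch with a stubbed detector standing in for DetectCLI (the version string reuses the example from the CLIStatus docs):

```go
package main

import (
	"fmt"
	"sync"
)

// CLIStatus is trimmed to the fields relevant to this sketch.
type CLIStatus struct {
	Available bool
	Version   string
}

var (
	once   sync.Once
	cached CLIStatus
	calls  int
)

// getCLIStatus runs detection on the first call only; later calls return
// the cached result, making repeated checks cheap within one process.
func getCLIStatus() CLIStatus {
	once.Do(func() {
		calls++ // counts how often detection actually ran
		cached = CLIStatus{Available: true, Version: "claude 2.1.12"}
	})
	return cached
}

func main() {
	getCLIStatus()
	getCLIStatus()
	fmt.Println(calls) // detection ran exactly once → 1
}
```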

type DebugResponse

type DebugResponse struct {
	MetricID    string  `json:"metric_id"`
	SampleIndex int     `json:"sample_index"`
	FilePath    string  `json:"file_path"`
	Prompt      string  `json:"prompt"`
	Response    string  `json:"response"`
	Duration    float64 `json:"duration_seconds"`
	Error       string  `json:"error,omitempty"`
}

DebugResponse represents a single captured C7 metric response for persistence and replay.

type EvaluationResult

type EvaluationResult struct {
	Score  int    `json:"score"`  // 1-10
	Reason string `json:"reason"` // Brief explanation
}

EvaluationResult holds the result of content quality evaluation.

type Evaluator

type Evaluator struct {
	// contains filtered or unexported fields
}

Evaluator performs content quality evaluation using the Claude CLI.

func NewEvaluator

func NewEvaluator(timeout time.Duration) *Evaluator

NewEvaluator creates an Evaluator with the specified timeout. If timeout is 0, a default of 60 seconds is used.

func (*Evaluator) EvaluateContent

func (e *Evaluator) EvaluateContent(ctx context.Context, systemPrompt, content string) (EvaluationResult, error)

EvaluateContent runs content evaluation using the Claude CLI. The systemPrompt provides evaluation criteria, and content is the material to evaluate.

func (*Evaluator) EvaluateWithRetry

func (e *Evaluator) EvaluateWithRetry(ctx context.Context, systemPrompt, content string) (EvaluationResult, error)

EvaluateWithRetry runs EvaluateContent with one retry on failure.

func (*Evaluator) SetCommandRunner

func (e *Evaluator) SetCommandRunner(fn commandRunnerFunc)

SetCommandRunner replaces the command execution function (for testing).

type ParallelResult

type ParallelResult struct {
	Results     []metrics.MetricResult
	TotalTokens int
	Errors      []error
}

ParallelResult holds the complete outcome of parallel metric execution.

func RunMetricsParallel

func RunMetricsParallel(
	ctx context.Context,
	workDir string,
	targets []*types.AnalysisTarget,
	progress *C7Progress,
	executor metrics.Executor,
) ParallelResult

RunMetricsParallel executes all metrics concurrently with progress updates. It does not abort on individual metric failures; all metrics run to completion. If executor is nil, a default CLIExecutorAdapter is created for live CLI execution.

func RunMetricsSequential

func RunMetricsSequential(
	ctx context.Context,
	workDir string,
	targets []*types.AnalysisTarget,
	progress *C7Progress,
	executor metrics.Executor,
) ParallelResult

RunMetricsSequential executes all metrics sequentially (fallback/debugging). If executor is nil, a default CLIExecutorAdapter is created for live CLI execution.
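The run-everything-to-completion semantics (no early abort on individual failures) can be sketched with a WaitGroup and a mutex-guarded result slice; the metric functions below are stubs, and this is not the package's implementation:

```go
package main

import (
	"fmt"
	"sync"
)

type result struct {
	id    string
	score int
	err   error
}

// runParallel executes every metric concurrently and never aborts early:
// failures are collected alongside successes, as RunMetricsParallel is
// documented to do.
func runParallel(metrics map[string]func() (int, error)) []result {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results []result
	)
	for id, fn := range metrics {
		wg.Add(1)
		go func(id string, fn func() (int, error)) {
			defer wg.Done()
			score, err := fn()
			mu.Lock()
			results = append(results, result{id, score, err})
			mu.Unlock()
		}(id, fn)
	}
	wg.Wait()
	return results
}

func main() {
	results := runParallel(map[string]func() (int, error){
		"readme_clarity":  func() (int, error) { return 8, nil },
		"example_quality": func() (int, error) { return 0, fmt.Errorf("timeout") },
	})
	fmt.Println(len(results)) // the failure did not stop the other metric → 2
}
```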

type ReplayExecutor

type ReplayExecutor struct {
	// contains filtered or unexported fields
}

ReplayExecutor replays previously captured responses instead of calling Claude CLI. It implements the metrics.Executor interface.

func NewReplayExecutor

func NewReplayExecutor(responses map[string]DebugResponse) *ReplayExecutor

NewReplayExecutor creates a ReplayExecutor from a map of captured responses.

func (*ReplayExecutor) ExecutePrompt

func (r *ReplayExecutor) ExecutePrompt(ctx context.Context, workDir, prompt, tools string, timeout time.Duration) (string, error)

ExecutePrompt replays a captured response by identifying the metric from the prompt text. Implements metrics.Executor interface.
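Identifying the metric from the prompt text can be as simple as matching captured metric IDs or prompts against the incoming prompt; a sketch of that replay lookup (naive substring matching, not the package's implementation; DebugResponse is trimmed to the relevant fields):

```go
package main

import (
	"fmt"
	"strings"
)

// DebugResponse is reduced to the fields this sketch needs.
type DebugResponse struct {
	MetricID string
	Prompt   string
	Response string
}

// replay returns the captured response whose metric ID appears in the
// incoming prompt (or whose prompt matches exactly), instead of calling
// the Claude CLI.
func replay(responses map[string]DebugResponse, prompt string) (string, error) {
	for _, r := range responses {
		if strings.Contains(prompt, r.MetricID) || r.Prompt == prompt {
			return r.Response, nil
		}
	}
	return "", fmt.Errorf("no captured response for prompt")
}

func main() {
	responses := map[string]DebugResponse{
		"readme_clarity_0": {MetricID: "readme_clarity", Response: `{"score":8,"reason":"ok"}`},
	}
	out, err := replay(responses, "Evaluate readme_clarity for this project")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```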

Directories

Path Synopsis
Package metrics provides 5 MECE agent evaluation metrics for C7.
