Documentation ¶
Overview ¶
Package agent provides C7 agent evaluation infrastructure for headless Claude Code execution.
Index ¶
- Constants
- func CheckClaudeCLI() error
- func CreateWorkspace(projectDir string) (workDir string, cleanup func(), err error)
- func LoadResponses(debugDir string) (map[string]DebugResponse, error)
- func SaveResponses(debugDir string, results []metrics.MetricResult) error
- type C7Progress
- func (p *C7Progress) AddTokens(tokens int)
- func (p *C7Progress) SetMetricComplete(id string, score int)
- func (p *C7Progress) SetMetricFailed(id string, err string)
- func (p *C7Progress) SetMetricRunning(id string, totalSamples int)
- func (p *C7Progress) SetMetricSample(id string, current int)
- func (p *C7Progress) Start()
- func (p *C7Progress) Stop()
- func (p *C7Progress) TotalTokens() int
- type CLIStatus
- type DebugResponse
- type EvaluationResult
- type Evaluator
- type ParallelResult
- type ReplayExecutor
Constants ¶
const CompletenessPrompt = `` /* 762-byte string literal not displayed */
CompletenessPrompt evaluates documentation completeness. Criteria: essential sections present, API coverage, troubleshooting.
const CrossRefCoherencePrompt = `` /* 922-byte string literal not displayed */
CrossRefCoherencePrompt evaluates documentation cross-reference quality. Criteria: consistent terminology, valid internal links, coherent structure.
const ExampleQualityPrompt = `` /* 853-byte string literal not displayed */
ExampleQualityPrompt evaluates code example quality. Criteria: runnability, clarity, best practices demonstration.
const ReadmeClarityPrompt = `` /* 944-byte string literal not displayed */
ReadmeClarityPrompt evaluates README documentation clarity. Criteria: purpose clarity, quickstart quality, structure, inline examples.
Variables ¶
This section is empty.
Functions ¶
func CheckClaudeCLI ¶
func CheckClaudeCLI() error
CheckClaudeCLI verifies that the Claude CLI is installed and accessible. Returns nil if available, or a descriptive error with installation instructions.
func CreateWorkspace ¶
func CreateWorkspace(projectDir string) (workDir string, cleanup func(), err error)
CreateWorkspace creates an isolated directory for agent execution. It attempts to use git worktree for efficient isolation. If the project is not a git repository, it falls back to read-only mode using the original directory (agent tasks use read-only tools, so this is safe).
Returns:
- workDir: the directory path for agent execution
- cleanup: function to call when done (removes worktree if created)
- err: error if workspace creation fails
func LoadResponses ¶
func LoadResponses(debugDir string) (map[string]DebugResponse, error)
LoadResponses reads all JSON response files from debugDir and returns them keyed by "{metric_id}_{sample_index}".
func SaveResponses ¶
func SaveResponses(debugDir string, results []metrics.MetricResult) error
SaveResponses persists metric results as individual JSON files in debugDir. Each sample is saved as {metric_id}_{sample_index}.json.
Types ¶
type C7Progress ¶
type C7Progress struct {
// contains filtered or unexported fields
}
C7Progress displays real-time progress for C7 agent evaluation. Thread-safe for concurrent metric updates.
func NewC7Progress ¶
func NewC7Progress(w *os.File, metricIDs []string, metricNames []string) *C7Progress
NewC7Progress creates a new progress display. If w is not a TTY, display operations are no-ops.
func (*C7Progress) AddTokens ¶
func (p *C7Progress) AddTokens(tokens int)
AddTokens adds to the running token count.
func (*C7Progress) SetMetricComplete ¶
func (p *C7Progress) SetMetricComplete(id string, score int)
SetMetricComplete marks a metric as complete with its final score.
func (*C7Progress) SetMetricFailed ¶
func (p *C7Progress) SetMetricFailed(id string, err string)
SetMetricFailed marks a metric as failed with an error message.
func (*C7Progress) SetMetricRunning ¶
func (p *C7Progress) SetMetricRunning(id string, totalSamples int)
SetMetricRunning marks a metric as running and sets total samples.
func (*C7Progress) SetMetricSample ¶
func (p *C7Progress) SetMetricSample(id string, current int)
SetMetricSample updates the current sample number for a running metric.
func (*C7Progress) Start ¶
func (p *C7Progress) Start()
Start begins the progress display refresh loop.
func (*C7Progress) Stop ¶
func (p *C7Progress) Stop()
Stop halts the progress display and prints a final summary.
func (*C7Progress) TotalTokens ¶
func (p *C7Progress) TotalTokens() int
TotalTokens returns the current token count.
type CLIStatus ¶
type CLIStatus struct {
Available bool // whether CLI is usable
Version string // CLI version string (e.g., "claude 2.1.12")
Error string // error message if not available
InstallHint string // installation instructions
}
CLIStatus represents the availability and version of the Claude CLI.
func DetectCLI ¶
func DetectCLI() CLIStatus
DetectCLI checks if the Claude CLI is installed and returns its status. This is a convenience wrapper around detectCLIWithContext using a 5-second timeout.
func GetCLIStatus ¶
func GetCLIStatus() CLIStatus
GetCLIStatus returns cached CLI status, detecting on first call. This is efficient for repeated checks within a single process.
type DebugResponse ¶
type DebugResponse struct {
MetricID string `json:"metric_id"`
SampleIndex int `json:"sample_index"`
FilePath string `json:"file_path"`
Prompt string `json:"prompt"`
Response string `json:"response"`
Duration float64 `json:"duration_seconds"`
Error string `json:"error,omitempty"`
}
DebugResponse represents a single captured C7 metric response for persistence and replay.
type EvaluationResult ¶
type EvaluationResult struct {
Score int `json:"score"` // 1-10
Reason string `json:"reason"` // Brief explanation
}
EvaluationResult holds the result of content quality evaluation.
type Evaluator ¶
type Evaluator struct {
// contains filtered or unexported fields
}
Evaluator performs content quality evaluation using the Claude CLI.
func NewEvaluator ¶
NewEvaluator creates an Evaluator with the specified timeout. If timeout is 0, a default of 60 seconds is used.
func (*Evaluator) EvaluateContent ¶
func (e *Evaluator) EvaluateContent(ctx context.Context, systemPrompt, content string) (EvaluationResult, error)
EvaluateContent runs content evaluation using the Claude CLI. The systemPrompt provides evaluation criteria, and content is the material to evaluate.
func (*Evaluator) EvaluateWithRetry ¶
func (e *Evaluator) EvaluateWithRetry(ctx context.Context, systemPrompt, content string) (EvaluationResult, error)
EvaluateWithRetry runs EvaluateContent with one retry on failure.
func (*Evaluator) SetCommandRunner ¶
func (e *Evaluator) SetCommandRunner(fn commandRunnerFunc)
SetCommandRunner replaces the command execution function (for testing).
type ParallelResult ¶
type ParallelResult struct {
Results []metrics.MetricResult
TotalTokens int
Errors []error
}
ParallelResult holds the complete outcome of parallel metric execution.
func RunMetricsParallel ¶
func RunMetricsParallel(ctx context.Context, workDir string, targets []*types.AnalysisTarget, progress *C7Progress, executor metrics.Executor) ParallelResult
RunMetricsParallel executes all metrics concurrently with progress updates. It does not abort on individual metric failures; all metrics run to completion. If executor is nil, a default CLIExecutorAdapter is created for live CLI execution.
func RunMetricsSequential ¶
func RunMetricsSequential(ctx context.Context, workDir string, targets []*types.AnalysisTarget, progress *C7Progress, executor metrics.Executor) ParallelResult
RunMetricsSequential executes all metrics sequentially (fallback/debugging). If executor is nil, a default CLIExecutorAdapter is created for live CLI execution.
type ReplayExecutor ¶
type ReplayExecutor struct {
// contains filtered or unexported fields
}
ReplayExecutor replays previously captured responses instead of calling Claude CLI. It implements the metrics.Executor interface.
func NewReplayExecutor ¶
func NewReplayExecutor(responses map[string]DebugResponse) *ReplayExecutor
NewReplayExecutor creates a ReplayExecutor from a map of captured responses.
func (*ReplayExecutor) ExecutePrompt ¶
func (r *ReplayExecutor) ExecutePrompt(ctx context.Context, workDir, prompt, tools string, timeout time.Duration) (string, error)
ExecutePrompt replays a captured response by identifying the metric from the prompt text. Implements metrics.Executor interface.