Documentation ¶
Overview ¶
Package agent provides C7 agent evaluation infrastructure for headless Claude Code execution.
Index ¶
- Constants
- func CheckClaudeCLI() error
- func CreateWorkspace(projectDir string) (workDir string, cleanup func(), err error)
- func LoadResponses(debugDir string) (map[string]DebugResponse, error)
- func SaveResponses(debugDir string, results []metrics.MetricResult) error
- type C7Progress
- func (p *C7Progress) AddTokens(tokens int)
- func (p *C7Progress) SetMetricComplete(id string, score int)
- func (p *C7Progress) SetMetricFailed(id string, err string)
- func (p *C7Progress) SetMetricRunning(id string, totalSamples int)
- func (p *C7Progress) SetMetricSample(id string, current int)
- func (p *C7Progress) Start()
- func (p *C7Progress) Stop()
- func (p *C7Progress) TotalTokens() int
- type CLIStatus
- type DebugResponse
- type EvaluationResult
- type Evaluator
- type ParallelResult
- type ReplayExecutor
Constants ¶
const CompletenessPrompt = `` /* 762-byte string literal not displayed */
CompletenessPrompt evaluates documentation completeness. Criteria: essential sections present, API coverage, troubleshooting.
const CrossRefCoherencePrompt = `` /* 922-byte string literal not displayed */
CrossRefCoherencePrompt evaluates documentation cross-reference quality. Criteria: consistent terminology, valid internal links, coherent structure.
const ExampleQualityPrompt = `` /* 853-byte string literal not displayed */
ExampleQualityPrompt evaluates code example quality. Criteria: runnability, clarity, best practices demonstration.
const ReadmeClarityPrompt = `` /* 944-byte string literal not displayed */
ReadmeClarityPrompt evaluates README documentation clarity. Criteria: purpose clarity, quickstart quality, structure, inline examples.
Variables ¶
This section is empty.
Functions ¶
func CheckClaudeCLI ¶
func CheckClaudeCLI() error
CheckClaudeCLI verifies that the Claude CLI is installed and accessible. Returns nil if available, or a descriptive error with installation instructions.
func CreateWorkspace ¶
func CreateWorkspace(projectDir string) (workDir string, cleanup func(), err error)
CreateWorkspace creates an isolated directory for agent execution. It attempts to use git worktree for efficient isolation. If the project is not a git repository, it falls back to read-only mode using the original directory (agent tasks use read-only tools, so this is safe).
Returns:
- workDir: the directory path for agent execution
- cleanup: function to call when done (removes worktree if created)
- err: error if workspace creation fails
func LoadResponses ¶
func LoadResponses(debugDir string) (map[string]DebugResponse, error)
LoadResponses reads all JSON response files from debugDir and returns them keyed by "{metric_id}_{sample_index}".
func SaveResponses ¶
func SaveResponses(debugDir string, results []metrics.MetricResult) error
SaveResponses persists metric results as individual JSON files in debugDir. Each sample is saved as {metric_id}_{sample_index}.json.
Types ¶
type C7Progress ¶
type C7Progress struct {
// contains filtered or unexported fields
}
C7Progress displays real-time progress for C7 agent evaluation. Thread-safe for concurrent metric updates.
func NewC7Progress ¶
func NewC7Progress(w *os.File, metricIDs []string, metricNames []string) *C7Progress
NewC7Progress creates a new progress display. If w is not a TTY, display operations are no-ops.
func (*C7Progress) AddTokens ¶
func (p *C7Progress) AddTokens(tokens int)
AddTokens adds to the running token count.
func (*C7Progress) SetMetricComplete ¶
func (p *C7Progress) SetMetricComplete(id string, score int)
SetMetricComplete marks a metric as complete with its final score.
func (*C7Progress) SetMetricFailed ¶
func (p *C7Progress) SetMetricFailed(id string, err string)
SetMetricFailed marks a metric as failed with an error message.
func (*C7Progress) SetMetricRunning ¶
func (p *C7Progress) SetMetricRunning(id string, totalSamples int)
SetMetricRunning marks a metric as running and sets total samples.
func (*C7Progress) SetMetricSample ¶
func (p *C7Progress) SetMetricSample(id string, current int)
SetMetricSample updates the current sample number for a running metric.
func (*C7Progress) Start ¶
func (p *C7Progress) Start()
Start begins the progress display refresh loop.
func (*C7Progress) Stop ¶
func (p *C7Progress) Stop()
Stop halts the progress display and prints a final summary.
func (*C7Progress) TotalTokens ¶
func (p *C7Progress) TotalTokens() int
TotalTokens returns the current token count.
type CLIStatus ¶
type CLIStatus struct {
Available bool // whether CLI is usable
Version string // CLI version string (e.g., "claude 2.1.12")
Error string // error message if not available
InstallHint string // installation instructions
}
CLIStatus represents the availability and version of the Claude CLI.
func DetectCLI ¶
func DetectCLI() CLIStatus
DetectCLI checks if the Claude CLI is installed and returns its status. This is a convenience wrapper around detectCLIWithContext using a 5-second timeout.
func GetCLIStatus ¶
func GetCLIStatus() CLIStatus
GetCLIStatus returns cached CLI status, detecting on first call. This is efficient for repeated checks within a single process.
type DebugResponse ¶
type DebugResponse struct {
MetricID string `json:"metric_id"`
SampleIndex int `json:"sample_index"`
FilePath string `json:"file_path"`
Prompt string `json:"prompt"`
Response string `json:"response"`
Duration float64 `json:"duration_seconds"`
Error string `json:"error,omitempty"`
}
DebugResponse represents a single captured C7 metric response for persistence and replay.
type EvaluationResult ¶
type EvaluationResult struct {
Score int `json:"score"` // 1-10
Reason string `json:"reason"` // Brief explanation
}
EvaluationResult holds the result of content quality evaluation.
type Evaluator ¶
type Evaluator struct {
// contains filtered or unexported fields
}
Evaluator performs content quality evaluation using the Claude CLI.
func NewEvaluator ¶
NewEvaluator creates an Evaluator with the specified timeout. If timeout is 0, a default of 60 seconds is used.
func (*Evaluator) EvaluateContent ¶
func (e *Evaluator) EvaluateContent(ctx context.Context, systemPrompt, content string) (EvaluationResult, error)
EvaluateContent runs content evaluation using the Claude CLI. The systemPrompt provides evaluation criteria, and content is the material to evaluate.
func (*Evaluator) EvaluateWithRetry ¶
func (e *Evaluator) EvaluateWithRetry(ctx context.Context, systemPrompt, content string) (EvaluationResult, error)
EvaluateWithRetry runs EvaluateContent with one retry on failure.
func (*Evaluator) SetCommandRunner ¶
func (e *Evaluator) SetCommandRunner(fn commandRunnerFunc)
SetCommandRunner replaces the command execution function (for testing).
type ParallelResult ¶
type ParallelResult struct {
Results []metrics.MetricResult
TotalTokens int
Errors []error
}
ParallelResult holds the complete outcome of parallel metric execution.
func RunMetricsParallel ¶
func RunMetricsParallel(ctx context.Context, workDir string, targets []*types.AnalysisTarget, progress *C7Progress, executor metrics.Executor) ParallelResult
RunMetricsParallel executes all metrics concurrently with progress updates. It does not abort on individual metric failures; all metrics run to completion. If executor is nil, a default CLIExecutorAdapter is created for live CLI execution.
func RunMetricsSequential ¶
func RunMetricsSequential(ctx context.Context, workDir string, targets []*types.AnalysisTarget, progress *C7Progress, executor metrics.Executor) ParallelResult
RunMetricsSequential executes all metrics sequentially (fallback/debugging). If executor is nil, a default CLIExecutorAdapter is created for live CLI execution.
type ReplayExecutor ¶
type ReplayExecutor struct {
// contains filtered or unexported fields
}
ReplayExecutor replays previously captured responses instead of calling Claude CLI. It implements the metrics.Executor interface.
func NewReplayExecutor ¶
func NewReplayExecutor(responses map[string]DebugResponse) *ReplayExecutor
NewReplayExecutor creates a ReplayExecutor from a map of captured responses.
func (*ReplayExecutor) ExecutePrompt ¶
func (r *ReplayExecutor) ExecutePrompt(ctx context.Context, workDir, prompt, tools string, timeout time.Duration) (string, error)
ExecutePrompt replays a captured response by identifying the metric from the prompt text. Implements metrics.Executor interface.