benchmark

package
v0.3.1
Published: May 4, 2026 License: MIT Imports: 22 Imported by: 0

Documentation

Overview

Package benchmark implements a multi-recipe benchmark runner with isolated workspaces and comparable metrics collection.

Package benchmark — OTLP/gRPC integration.

OTLPGRPCExporter ships iterion run events to an OpenTelemetry collector over gRPC. It coexists with the Prometheus exporter (different observability planes — Prometheus for metric scraping, OTLP for per-event traces / log aggregation), so wiring both is supported.

The exporter is a thin adapter over claw-code-go's pkg/apikit/telemetry/otlpgrpc Exporter — batching, retries on transient gRPC errors, and shutdown drain are inherited from the official OpenTelemetry SDK upstream.
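
The two can be fanned out from a single observer callback. A minimal sketch, assuming promExp and otlpExp were constructed as shown under their constructors below:

	promObs := promExp.EventObserver() // promExp: a *PrometheusExporter (see NewPrometheusExporter)
	otlpObs := otlpExp.EventObserver() // otlpExp: an *OTLPGRPCExporter (see NewOTLPGRPCExporter)

	// One fan-out callback feeds both planes.
	observer := func(ev store.Event) {
		promObs(ev) // metric scraping plane
		otlpObs(ev) // per-event trace / log plane
	}
	_ = observer // hand this to runtime.WithEventObserver when assembling the engine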

Package benchmark — Prometheus integration.

PrometheusExporter implements the metric contract described in docs/observability/README.md. The Grafana dashboard (docs/observability/grafana/iterion-workflow.json) queries the metric names defined here, so any rename must update both ends in lock-step.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func IsEndpointMissing

func IsEndpointMissing(err error) bool

IsEndpointMissing reports whether err signals the operator did not configure an OTLP endpoint. Convenience wrapper over errors.Is so CLI wiring stays a single line.
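
For example, in CLI wiring (the log message and surrounding flow are illustrative):

	exp, err := benchmark.NewOTLPGRPCExporter(runID, cfg)
	if benchmark.IsEndpointMissing(err) {
		log.Println("no OTLP endpoint configured; continuing without OTLP export")
	} else if err != nil {
		log.Fatalf("otlp exporter: %v", err) // real configuration errors still surface
	}
	_ = exp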

func RenderReport

func RenderReport(w io.Writer, report *BenchmarkReport)

RenderReport writes a human-readable text table comparing recipe results.
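
For example (report here would come from (*Runner).Run or (*MetricsStore).LoadReport, both documented below):

	benchmark.RenderReport(os.Stdout, report) // prints the per-recipe comparison table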

Types

type BenchmarkReport

type BenchmarkReport struct {
	ID        string        `json:"id"`
	CreatedAt time.Time     `json:"created_at"`
	CaseLabel string        `json:"case_label"`
	Results   []*RunMetrics `json:"results"`
}

BenchmarkReport is a persisted comparison of multiple recipe runs.

type MetricsStore

type MetricsStore struct {
	// contains filtered or unexported fields
}

MetricsStore persists benchmark reports to disk.

func NewMetricsStore

func NewMetricsStore(root string) (*MetricsStore, error)

NewMetricsStore creates a MetricsStore rooted at the given directory.

func (*MetricsStore) ListReports

func (ms *MetricsStore) ListReports() ([]string, error)

ListReports returns all benchmark report IDs, sorted alphabetically.

func (*MetricsStore) LoadReport

func (ms *MetricsStore) LoadReport(id string) (*BenchmarkReport, error)

LoadReport reads a benchmark report by ID.

func (*MetricsStore) SaveReport

func (ms *MetricsStore) SaveReport(report *BenchmarkReport) error

SaveReport persists a benchmark report to disk.
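
A round trip through the store, with error handling abbreviated and the root directory as an illustrative path:

	ms, err := benchmark.NewMetricsStore("/var/lib/iterion/benchmarks")
	if err != nil {
		log.Fatal(err)
	}
	if err := ms.SaveReport(report); err != nil { // report: *BenchmarkReport from a finished run
		log.Fatal(err)
	}
	ids, _ := ms.ListReports()         // IDs sorted alphabetically
	loaded, _ := ms.LoadReport(ids[0]) // read one back by ID
	benchmark.RenderReport(os.Stdout, loaded)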

type OTLPGRPCExporter

type OTLPGRPCExporter struct {
	// contains filtered or unexported fields
}

OTLPGRPCExporter wraps an upstream OTLP/gRPC exporter and translates iterion store events into apikit.TelemetryEvent analytics records. Every event in events.jsonl produces a corresponding analytics event on the OTLP wire so collectors can index per-event flow without needing to tail the JSONL files directly.

func NewOTLPGRPCExporter

func NewOTLPGRPCExporter(runID string, cfg otlpgrpc.Config) (*OTLPGRPCExporter, error)

NewOTLPGRPCExporter constructs an exporter from the given config. It returns the typed otlpgrpc.ErrEndpointMissing when the endpoint is blank so callers can distinguish "user opted out" from "config invalid" using errors.Is.

func (*OTLPGRPCExporter) EventObserver

func (e *OTLPGRPCExporter) EventObserver() func(store.Event)

EventObserver returns a callback compatible with runtime.WithEventObserver. Every store event is translated into an apikit.TelemetryEvent of analytics shape — namespace "iterion", action set to the event type — with run_id, node_id, branch_id, and the event's Data map flattened into Properties. Marshalling errors in the Data map are silently dropped: telemetry must never abort a run.

func (*OTLPGRPCExporter) Stop

func (e *OTLPGRPCExporter) Stop(ctx context.Context) error

Stop drains the underlying batch processor and closes the gRPC client. Safe to call multiple times. A nil receiver is a no-op so callers can defer Stop unconditionally.
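
Since a nil receiver is a no-op, shutdown can be deferred right after construction, whether or not an endpoint was configured (a sketch; error handling is shown under NewOTLPGRPCExporter above):

	exp, _ := benchmark.NewOTLPGRPCExporter(runID, cfg) // exp may be nil when no endpoint is set
	defer exp.Stop(ctx)                                 // no-op on nil; otherwise drains and closes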

type PrometheusExporter

type PrometheusExporter struct {
	// contains filtered or unexported fields
}

PrometheusExporter wires iterion's EventHooks and event emitter to Prometheus counters, histograms, and gauges. Construct one per run (so run_id labels stay attached) and chain it onto the existing store-backed observers via model.ChainHooks and (*PrometheusExporter).WrapEmitter.

func NewPrometheusExporter

func NewPrometheusExporter(runID string, registry *prometheus.Registry) *PrometheusExporter

NewPrometheusExporter constructs an exporter and registers all metrics on the given registry. Pass nil to use a fresh prometheus.NewRegistry() (the default for /metrics endpoints that should not leak the global registry's other collectors).
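
A minimal per-run construction; how the returned hooks are chained onto the store-backed ones via model.ChainHooks is only sketched in the comment, since that helper's signature is not shown here:

	p := benchmark.NewPrometheusExporter(runID, nil) // nil registry => fresh prometheus.NewRegistry()
	hooks := p.EventHooks()
	_ = hooks // combine with the existing store-backed hooks (e.g. via model.ChainHooks) before the run starts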

func (*PrometheusExporter) EventHooks

func (p *PrometheusExporter) EventHooks() model.EventHooks

EventHooks returns the model.EventHooks that drive the per-node counters (cost, tokens, requests, retries, tool calls, duration).

func (*PrometheusExporter) EventObserver

func (p *PrometheusExporter) EventObserver() func(store.Event)

EventObserver returns a callback compatible with runtime.WithEventObserver. It updates the parallel-branches gauge based on branch_started / branch_finished events.

The callback runs synchronously on the engine's emit path, so we recover from any panic in the metrics layer to avoid taking down a running workflow because of an observability hiccup.

func (*PrometheusExporter) Handler

func (p *PrometheusExporter) Handler() http.Handler

Handler returns an http.Handler serving the Prometheus text-exposition format for this exporter's metrics. Mount it at /metrics.
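
For example, served from a small dedicated listener (address and mux are illustrative):

	mux := http.NewServeMux()
	mux.Handle("/metrics", p.Handler()) // p: the *PrometheusExporter for this run
	go func() {
		_ = http.ListenAndServe("localhost:9464", mux) // scrape target for Prometheus
	}()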

func (*PrometheusExporter) Registry

func (p *PrometheusExporter) Registry() *prometheus.Registry

Registry exposes the prometheus.Registry so callers can layer extra collectors (or use a custom HTTP handler).
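
For instance, the standard Go runtime and process collectors from prometheus/client_golang can be layered on before serving:

	p.Registry().MustRegister(
		collectors.NewGoCollector(),
		collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}),
	)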

type RecipeRun

type RecipeRun struct {
	Recipe  *recipe.RecipeSpec
	RunID   string
	Store   *store.RunStore
	Metrics *RunMetrics
	Err     error
}

RecipeRun holds the result of a single recipe's execution.

type RunMetrics

type RunMetrics struct {
	RecipeName   string        `json:"recipe_name"`
	RunID        string        `json:"run_id"`
	Status       string        `json:"status"`
	Verdict      string        `json:"verdict"`
	TotalCostUSD float64       `json:"total_cost_usd"`
	TotalTokens  int           `json:"total_tokens"`
	Iterations   int           `json:"iterations"`
	Duration     time.Duration `json:"duration_ns"`
	DurationStr  string        `json:"duration"`
	Retries      int           `json:"retries"`
	ModelCalls   int           `json:"model_calls"`
}

RunMetrics holds the aggregated metrics for a single recipe run.

func CollectMetrics

func CollectMetrics(s *store.RunStore, runID, recipeName string, evalPrimary string) (*RunMetrics, error)

CollectMetrics extracts aggregated metrics from a run's persisted events and run metadata.

type Runner

type Runner struct {
	// contains filtered or unexported fields
}

Runner orchestrates multi-recipe benchmarks with isolated workspaces.

func NewRunner

func NewRunner(cfg RunnerConfig) (*Runner, error)

NewRunner creates a benchmark runner from the given configuration.

func (*Runner) Run

func (r *Runner) Run(ctx context.Context, storeRoot string) (*BenchmarkReport, error)

Run executes all recipes sequentially, each in its own isolated store, collects metrics, and returns a BenchmarkReport.

type RunnerConfig

type RunnerConfig struct {
	// CaseLabel is a human-readable label for the benchmark case (e.g. PR title).
	CaseLabel string

	// Workflow is the base compiled workflow all recipes share.
	Workflow *ir.Workflow

	// Recipes to compare. At least two are required for a meaningful benchmark.
	Recipes []*recipe.RecipeSpec

	// Inputs are the common run-time inputs passed to every recipe.
	Inputs map[string]interface{}

	// ExecutorFactory creates a fresh NodeExecutor for each recipe run,
	// ensuring complete isolation (no shared caches, sessions, etc.).
	ExecutorFactory func() runtime.NodeExecutor
}

RunnerConfig configures a benchmark run.
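
Putting it together, a caller might wire a benchmark like this; the recipes, workflow, and executor factory stand in for project-specific setup:

	cfg := benchmark.RunnerConfig{
		CaseLabel:       "compare fast vs. thorough recipes",
		Workflow:        wf,                                   // *ir.Workflow compiled elsewhere
		Recipes:         []*recipe.RecipeSpec{fast, thorough}, // at least two for a meaningful comparison
		Inputs:          map[string]interface{}{"goal": "fix flaky test"},
		ExecutorFactory: newIsolatedExecutor,                  // func() runtime.NodeExecutor, fresh per recipe
	}
	runner, err := benchmark.NewRunner(cfg)
	if err != nil {
		log.Fatal(err)
	}
	report, err := runner.Run(ctx, storeRoot) // storeRoot: directory for per-recipe run stores
	if err != nil {
		log.Fatal(err)
	}
	benchmark.RenderReport(os.Stdout, report)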
