Documentation ¶
Overview ¶
Package benchmark implements a multi-recipe benchmark runner with isolated workspaces and comparable metrics collection.
Package benchmark — OTLP/gRPC integration.
OTLPGRPCExporter ships iterion run events to an OpenTelemetry collector over gRPC. It coexists with the Prometheus exporter (different observability planes — Prometheus for metric scraping, OTLP for per-event traces / log aggregation), so wiring both is supported.
The exporter is a thin adapter over claw-code-go's pkg/apikit/telemetry/otlpgrpc Exporter — batching, retries on transient gRPC errors, and shutdown drain are inherited from the official OpenTelemetry SDK upstream.
Package benchmark — Prometheus integration.
PrometheusExporter implements the metric contract described in docs/observability/README.md. The Grafana dashboard (docs/observability/grafana/iterion-workflow.json) queries the metric names defined here, so any rename must update both ends in lock-step.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func IsEndpointMissing ¶
func IsEndpointMissing(err error) bool

IsEndpointMissing reports whether err signals that the operator did not configure an OTLP endpoint. It is a convenience wrapper over errors.Is so CLI wiring stays a single line.
func RenderReport ¶
func RenderReport(w io.Writer, report *BenchmarkReport)
RenderReport writes a human-readable text table comparing recipe results.
Types ¶
type BenchmarkReport ¶
type BenchmarkReport struct {
ID string `json:"id"`
CreatedAt time.Time `json:"created_at"`
CaseLabel string `json:"case_label"`
Results []*RunMetrics `json:"results"`
}
BenchmarkReport is a persisted comparison of multiple recipe runs.
type MetricsStore ¶
type MetricsStore struct {
// contains filtered or unexported fields
}
MetricsStore persists benchmark reports to disk.
func NewMetricsStore ¶
func NewMetricsStore(root string) (*MetricsStore, error)
NewMetricsStore creates a MetricsStore rooted at the given directory.
func (*MetricsStore) ListReports ¶
func (ms *MetricsStore) ListReports() ([]string, error)
ListReports returns all benchmark report IDs, sorted alphabetically.
func (*MetricsStore) LoadReport ¶
func (ms *MetricsStore) LoadReport(id string) (*BenchmarkReport, error)
LoadReport reads a benchmark report by ID.
func (*MetricsStore) SaveReport ¶
func (ms *MetricsStore) SaveReport(report *BenchmarkReport) error
SaveReport persists a benchmark report to disk.
type OTLPGRPCExporter ¶
type OTLPGRPCExporter struct {
// contains filtered or unexported fields
}
OTLPGRPCExporter wraps an upstream OTLP/gRPC exporter and translates iterion store events into apikit.TelemetryEvent analytics records. Every event in events.jsonl produces a corresponding analytics event on the OTLP wire so collectors can index per-event flow without needing to tail the JSONL files directly.
func NewOTLPGRPCExporter ¶
func NewOTLPGRPCExporter(runID string, cfg otlpgrpc.Config) (*OTLPGRPCExporter, error)
NewOTLPGRPCExporter constructs an exporter from the given config. It returns the typed otlpgrpc.ErrEndpointMissing when the endpoint is blank so callers can distinguish "user opted out" from "config invalid" using errors.Is.
func (*OTLPGRPCExporter) EventObserver ¶
func (e *OTLPGRPCExporter) EventObserver() func(store.Event)
EventObserver returns a callback compatible with runtime.WithEventObserver. Every store event is translated into an apikit.TelemetryEvent of analytics shape — namespace "iterion", action set to the event type — with run_id, node_id, branch_id, and the event's Data map flattened into Properties. Marshalling errors in the Data map are silently dropped: telemetry must never abort a run.
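The Data-map flattening with silent drops can be sketched with the stdlib alone. The property names run_id and node_id come from the doc comment above; everything else here (the helper name, the string-valued Properties map) is an assumption for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flattenData converts an event's Data map into flat string properties.
// Values that fail to marshal are silently dropped, matching the
// documented rule that telemetry must never abort a run.
func flattenData(runID, nodeID string, data map[string]any) map[string]string {
	props := map[string]string{
		"run_id":  runID,
		"node_id": nodeID,
	}
	for k, v := range data {
		b, err := json.Marshal(v)
		if err != nil {
			continue // drop unmarshalable values rather than failing the event
		}
		props[k] = string(b)
	}
	return props
}

func main() {
	props := flattenData("run-42", "node-a", map[string]any{
		"tokens": 128,
		"bad":    func() {}, // functions cannot be marshalled: dropped
	})
	fmt.Println(props["tokens"], props["run_id"])
}
```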
type PrometheusExporter ¶
type PrometheusExporter struct {
// contains filtered or unexported fields
}
PrometheusExporter wires iterion's EventHooks and event emitter to Prometheus counters, histograms, and gauges. Construct one per run (so run_id labels stay attached) and chain it onto the existing store-backed observers via model.ChainHooks and (*PrometheusExporter).WrapEmitter.
func NewPrometheusExporter ¶
func NewPrometheusExporter(runID string, registry *prometheus.Registry) *PrometheusExporter
NewPrometheusExporter constructs an exporter and registers all metrics on the given registry. Pass nil to use a fresh prometheus.NewRegistry() (the default for /metrics endpoints that should not leak the global registry's other collectors).
func (*PrometheusExporter) EventHooks ¶
func (p *PrometheusExporter) EventHooks() model.EventHooks
EventHooks returns the model.EventHooks that drive the per-node counters (cost, tokens, requests, retries, tool calls, duration).
func (*PrometheusExporter) EventObserver ¶
func (p *PrometheusExporter) EventObserver() func(store.Event)
EventObserver returns a callback compatible with runtime.WithEventObserver. It updates the parallel-branches gauge based on branch_started / branch_finished events.
The callback runs synchronously on the engine's emit path, so we recover from any panic in the metrics layer to avoid taking down a running workflow because of an observability hiccup.
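The recover guard described above is a standard Go pattern. A stdlib-only sketch (Event stands in for store.Event; safeObserver is a hypothetical name, not this package's API):

```go
package main

import "fmt"

// Event stands in for store.Event in this self-contained sketch.
type Event struct{ Type string }

// safeObserver wraps a metrics callback so a panic in the observability
// layer is swallowed instead of crashing the engine's synchronous emit
// path, as the doc comment above describes.
func safeObserver(inner func(Event)) func(Event) {
	return func(ev Event) {
		defer func() {
			if r := recover(); r != nil {
				// Log and continue; the workflow must keep running.
				fmt.Println("observer panic recovered:", r)
			}
		}()
		inner(ev)
	}
}

func main() {
	obs := safeObserver(func(ev Event) {
		panic("broken gauge update")
	})
	obs(Event{Type: "branch_started"}) // does not crash the caller
	fmt.Println("workflow continues")
}
```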
func (*PrometheusExporter) Handler ¶
func (p *PrometheusExporter) Handler() http.Handler
Handler returns an http.Handler serving the Prometheus text-exposition format for this exporter's metrics. Mount it at /metrics.
func (*PrometheusExporter) Registry ¶
func (p *PrometheusExporter) Registry() *prometheus.Registry
Registry exposes the prometheus.Registry so callers can layer extra collectors (or use a custom HTTP handler).
type RecipeRun ¶
type RecipeRun struct {
Recipe *recipe.RecipeSpec
RunID string
Store *store.RunStore
Metrics *RunMetrics
Err error
}
RecipeRun holds the result of a single recipe's execution.
type RunMetrics ¶
type RunMetrics struct {
RecipeName string `json:"recipe_name"`
RunID string `json:"run_id"`
Status string `json:"status"`
Verdict string `json:"verdict"`
TotalCostUSD float64 `json:"total_cost_usd"`
TotalTokens int `json:"total_tokens"`
Iterations int `json:"iterations"`
Duration time.Duration `json:"duration_ns"`
DurationStr string `json:"duration"`
Retries int `json:"retries"`
ModelCalls int `json:"model_calls"`
}
RunMetrics holds the aggregated metrics for a single recipe run.
func CollectMetrics ¶
func CollectMetrics(s *store.RunStore, runID, recipeName string, evalPrimary string) (*RunMetrics, error)
CollectMetrics extracts aggregated metrics from a run's persisted events and run metadata.
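The aggregation amounts to one pass over the run's persisted events. A sketch with an illustrative event shape (the store's actual event schema is not documented here) folding into the RunMetrics totals defined above:

```go
package main

import (
	"fmt"
	"time"
)

// Event is an illustrative stand-in for a persisted store event; the
// field names are assumptions, not the store's actual schema.
type Event struct {
	Type    string
	CostUSD float64
	Tokens  int
	Retries int
}

// RunMetrics carries a subset of the aggregate fields documented for
// this package.
type RunMetrics struct {
	RecipeName   string
	RunID        string
	TotalCostUSD float64
	TotalTokens  int
	ModelCalls   int
	Retries      int
	Duration     time.Duration
}

// collectMetrics sketches the kind of aggregation CollectMetrics
// performs: summing cost, tokens, and retries, and counting model calls.
func collectMetrics(runID, recipeName string, events []Event) *RunMetrics {
	m := &RunMetrics{RecipeName: recipeName, RunID: runID}
	for _, ev := range events {
		m.TotalCostUSD += ev.CostUSD
		m.TotalTokens += ev.Tokens
		m.Retries += ev.Retries
		if ev.Type == "model_call" {
			m.ModelCalls++
		}
	}
	return m
}

func main() {
	m := collectMetrics("run-1", "baseline", []Event{
		{Type: "model_call", CostUSD: 0.02, Tokens: 900},
		{Type: "model_call", CostUSD: 0.03, Tokens: 1400, Retries: 1},
	})
	fmt.Printf("%+v\n", *m)
}
```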
type Runner ¶
type Runner struct {
// contains filtered or unexported fields
}
Runner orchestrates multi-recipe benchmarks with isolated workspaces.
func NewRunner ¶
func NewRunner(cfg RunnerConfig) (*Runner, error)
NewRunner creates a benchmark runner from the given configuration.
type RunnerConfig ¶
type RunnerConfig struct {
// CaseLabel is a human-readable label for the benchmark case (e.g. PR title).
CaseLabel string
// Workflow is the base compiled workflow all recipes share.
Workflow *ir.Workflow
// Recipes to compare. At least two are required for a meaningful benchmark.
Recipes []*recipe.RecipeSpec
// Inputs are the common run-time inputs passed to every recipe.
Inputs map[string]interface{}
// ExecutorFactory creates a fresh NodeExecutor for each recipe run,
// ensuring complete isolation (no shared caches, sessions, etc.).
ExecutorFactory func() runtime.NodeExecutor
}
RunnerConfig configures a benchmark run.
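Putting the fields together, a hedged wiring sketch (not runnable on its own; compiledWorkflow, baseline, candidate, and newExecutor are hypothetical values the caller would supply):

```go
cfg := benchmark.RunnerConfig{
	CaseLabel: "tighten retry budget",                       // e.g. the PR title
	Workflow:  compiledWorkflow,                             // *ir.Workflow shared by all recipes
	Recipes:   []*recipe.RecipeSpec{baseline, candidate},    // at least two for a comparison
	Inputs:    map[string]interface{}{"target": "./..."},
	// A fresh executor per recipe run keeps caches and sessions isolated.
	ExecutorFactory: func() runtime.NodeExecutor { return newExecutor() },
}
runner, err := benchmark.NewRunner(cfg)
```

Returning a new NodeExecutor from ExecutorFactory on every call is what gives each recipe the isolation the field comment promises; returning a shared instance would defeat it.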