benchmark

package
v0.3.1
Published: May 4, 2026 License: MIT Imports: 22 Imported by: 0

Documentation

Overview

Package benchmark implements a multi-recipe benchmark runner with isolated workspaces and comparable metrics collection.

Package benchmark — OTLP/gRPC integration.

OTLPGRPCExporter ships iterion run events to an OpenTelemetry collector over gRPC. It coexists with the Prometheus exporter (different observability planes — Prometheus for metric scraping, OTLP for per-event traces / log aggregation), so wiring both is supported.

The exporter is a thin adapter over claw-code-go's pkg/apikit/telemetry/otlpgrpc Exporter — batching, retries on transient gRPC errors, and shutdown drain are inherited from the official OpenTelemetry SDK upstream.
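
The two can be fanned out from a single observer callback. A minimal sketch, assuming promExp and otlpExp were constructed as shown under their constructors below:

	promObs := promExp.EventObserver() // promExp: a *PrometheusExporter (see NewPrometheusExporter)
	otlpObs := otlpExp.EventObserver() // otlpExp: an *OTLPGRPCExporter (see NewOTLPGRPCExporter)

	// One fan-out callback feeds both planes.
	observer := func(ev store.Event) {
		promObs(ev) // metric scraping plane
		otlpObs(ev) // per-event trace / log plane
	}
	_ = observer // hand this to runtime.WithEventObserver when assembling the engine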

Package benchmark — Prometheus integration.

PrometheusExporter implements the metric contract described in docs/observability/README.md. The Grafana dashboard (docs/observability/grafana/iterion-workflow.json) queries the metric names defined here, so any rename must update both ends in lock-step.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func IsEndpointMissing

func IsEndpointMissing(err error) bool

IsEndpointMissing reports whether err signals the operator did not configure an OTLP endpoint. Convenience wrapper over errors.Is so CLI wiring stays a single line.
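
For example, in CLI wiring (the log message and surrounding flow are illustrative):

	exp, err := benchmark.NewOTLPGRPCExporter(runID, cfg)
	if benchmark.IsEndpointMissing(err) {
		log.Println("no OTLP endpoint configured; continuing without OTLP export")
	} else if err != nil {
		log.Fatalf("otlp exporter: %v", err) // real configuration errors still surface
	}
	_ = exp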

func RenderReport

func RenderReport(w io.Writer, report *BenchmarkReport)

RenderReport writes a human-readable text table comparing recipe results.
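
For example (report here would come from (*Runner).Run or (*MetricsStore).LoadReport, both documented below):

	benchmark.RenderReport(os.Stdout, report) // prints the per-recipe comparison table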

Types

type BenchmarkReport

type BenchmarkReport struct {
	ID        string        `json:"id"`
	CreatedAt time.Time     `json:"created_at"`
	CaseLabel string        `json:"case_label"`
	Results   []*RunMetrics `json:"results"`
}

BenchmarkReport is a persisted comparison of multiple recipe runs.

type MetricsStore

type MetricsStore struct {
	// contains filtered or unexported fields
}

MetricsStore persists benchmark reports to disk.

func NewMetricsStore

func NewMetricsStore(root string) (*MetricsStore, error)

NewMetricsStore creates a MetricsStore rooted at the given directory.

func (*MetricsStore) ListReports

func (ms *MetricsStore) ListReports() ([]string, error)

ListReports returns all benchmark report IDs, sorted alphabetically.

func (*MetricsStore) LoadReport

func (ms *MetricsStore) LoadReport(id string) (*BenchmarkReport, error)

LoadReport reads a benchmark report by ID.

func (*MetricsStore) SaveReport

func (ms *MetricsStore) SaveReport(report *BenchmarkReport) error

SaveReport persists a benchmark report to disk.
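
A round trip through the store, with error handling abbreviated and the root directory as an illustrative path:

	ms, err := benchmark.NewMetricsStore("/var/lib/iterion/benchmarks")
	if err != nil {
		log.Fatal(err)
	}
	if err := ms.SaveReport(report); err != nil { // report: *BenchmarkReport from a finished run
		log.Fatal(err)
	}
	ids, _ := ms.ListReports()         // IDs sorted alphabetically
	loaded, _ := ms.LoadReport(ids[0]) // read one back by ID
	benchmark.RenderReport(os.Stdout, loaded)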

type OTLPGRPCExporter

type OTLPGRPCExporter struct {
	// contains filtered or unexported fields
}

OTLPGRPCExporter wraps an upstream OTLP/gRPC exporter and translates iterion store events into apikit.TelemetryEvent analytics records. Every event in events.jsonl produces a corresponding analytics event on the OTLP wire so collectors can index per-event flow without needing to tail the JSONL files directly.

func NewOTLPGRPCExporter

func NewOTLPGRPCExporter(runID string, cfg otlpgrpc.Config) (*OTLPGRPCExporter, error)

NewOTLPGRPCExporter constructs an exporter from the given config. It returns the typed otlpgrpc.ErrEndpointMissing when the endpoint is blank so callers can distinguish "user opted out" from "config invalid" using errors.Is.

func (*OTLPGRPCExporter) EventObserver

func (e *OTLPGRPCExporter) EventObserver() func(store.Event)

EventObserver returns a callback compatible with runtime.WithEventObserver. Every store event is translated into an apikit.TelemetryEvent of analytics shape — namespace "iterion", action set to the event type — with run_id, node_id, branch_id, and the event's Data map flattened into Properties. Marshalling errors in the Data map are silently dropped: telemetry must never abort a run.

func (*OTLPGRPCExporter) Stop

func (e *OTLPGRPCExporter) Stop(ctx context.Context) error

Stop drains the underlying batch processor and closes the gRPC client. Safe to call multiple times. A nil receiver is a no-op so callers can defer Stop unconditionally.
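
Since a nil receiver is a no-op, shutdown can be deferred right after construction, whether or not an endpoint was configured (a sketch; error handling is shown under NewOTLPGRPCExporter above):

	exp, _ := benchmark.NewOTLPGRPCExporter(runID, cfg) // exp may be nil when no endpoint is set
	defer exp.Stop(ctx)                                 // no-op on nil; otherwise drains and closes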

type PrometheusExporter

type PrometheusExporter struct {
	// contains filtered or unexported fields
}

PrometheusExporter wires iterion's EventHooks and event emitter to Prometheus counters, histograms, and gauges. Construct one per run (so run_id labels stay attached) and chain it onto the existing store-backed observers via model.ChainHooks and (*PrometheusExporter).WrapEmitter.

func NewPrometheusExporter

func NewPrometheusExporter(runID string, registry *prometheus.Registry) *PrometheusExporter

NewPrometheusExporter constructs an exporter and registers all metrics on the given registry. Pass nil to use a fresh prometheus.NewRegistry() (the default for /metrics endpoints that should not leak the global registry's other collectors).
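
A minimal per-run construction; how the returned hooks are chained onto the store-backed ones via model.ChainHooks is only sketched in the comment, since that helper's signature is not shown here:

	p := benchmark.NewPrometheusExporter(runID, nil) // nil registry => fresh prometheus.NewRegistry()
	hooks := p.EventHooks()
	_ = hooks // combine with the existing store-backed hooks (e.g. via model.ChainHooks) before the run starts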

func (*PrometheusExporter) EventHooks

func (p *PrometheusExporter) EventHooks() model.EventHooks

EventHooks returns the model.EventHooks that drive the per-node counters (cost, tokens, requests, retries, tool calls, duration).

func (*PrometheusExporter) EventObserver

func (p *PrometheusExporter) EventObserver() func(store.Event)

EventObserver returns a callback compatible with runtime.WithEventObserver. It updates the parallel-branches gauge based on branch_started / branch_finished events.

The callback runs synchronously on the engine's emit path, so we recover from any panic in the metrics layer to avoid taking down a running workflow because of an observability hiccup.

func (*PrometheusExporter) Handler

func (p *PrometheusExporter) Handler() http.Handler

Handler returns an http.Handler serving the Prometheus text-exposition format for this exporter's metrics. Mount it at /metrics.
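
For example, served from a small dedicated listener (address and mux are illustrative):

	mux := http.NewServeMux()
	mux.Handle("/metrics", p.Handler()) // p: the *PrometheusExporter for this run
	go func() {
		_ = http.ListenAndServe("localhost:9464", mux) // scrape target for Prometheus
	}()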

func (*PrometheusExporter) Registry

func (p *PrometheusExporter) Registry() *prometheus.Registry

Registry exposes the prometheus.Registry so callers can layer extra collectors (or use a custom HTTP handler).
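
For instance, the standard Go runtime and process collectors from prometheus/client_golang can be layered on before serving:

	p.Registry().MustRegister(
		collectors.NewGoCollector(),
		collectors.NewProcessCollector(collectors.ProcessCollectorOpts{}),
	)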

type RecipeRun

type RecipeRun struct {
	Recipe  *recipe.RecipeSpec
	RunID   string
	Store   *store.RunStore
	Metrics *RunMetrics
	Err     error
}

RecipeRun holds the result of a single recipe's execution.

type RunMetrics

type RunMetrics struct {
	RecipeName   string        `json:"recipe_name"`
	RunID        string        `json:"run_id"`
	Status       string        `json:"status"`
	Verdict      string        `json:"verdict"`
	TotalCostUSD float64       `json:"total_cost_usd"`
	TotalTokens  int           `json:"total_tokens"`
	Iterations   int           `json:"iterations"`
	Duration     time.Duration `json:"duration_ns"`
	DurationStr  string        `json:"duration"`
	Retries      int           `json:"retries"`
	ModelCalls   int           `json:"model_calls"`
}

RunMetrics holds the aggregated metrics for a single recipe run.

func CollectMetrics

func CollectMetrics(s *store.RunStore, runID, recipeName string, evalPrimary string) (*RunMetrics, error)

CollectMetrics extracts aggregated metrics from a run's persisted events and run metadata.

type Runner

type Runner struct {
	// contains filtered or unexported fields
}

Runner orchestrates multi-recipe benchmarks with isolated workspaces.

func NewRunner

func NewRunner(cfg RunnerConfig) (*Runner, error)

NewRunner creates a benchmark runner from the given configuration.

func (*Runner) Run

func (r *Runner) Run(ctx context.Context, storeRoot string) (*BenchmarkReport, error)

Run executes all recipes sequentially, each in its own isolated store, collects metrics, and returns a BenchmarkReport.

type RunnerConfig

type RunnerConfig struct {
	// CaseLabel is a human-readable label for the benchmark case (e.g. PR title).
	CaseLabel string

	// Workflow is the base compiled workflow all recipes share.
	Workflow *ir.Workflow

	// Recipes to compare. At least two are required for a meaningful benchmark.
	Recipes []*recipe.RecipeSpec

	// Inputs are the common run-time inputs passed to every recipe.
	Inputs map[string]interface{}

	// ExecutorFactory creates a fresh NodeExecutor for each recipe run,
	// ensuring complete isolation (no shared caches, sessions, etc.).
	ExecutorFactory func() runtime.NodeExecutor
}

RunnerConfig configures a benchmark run.
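
Putting it together, a caller might wire a benchmark like this; the recipes, workflow, and executor factory stand in for project-specific setup:

	cfg := benchmark.RunnerConfig{
		CaseLabel:       "compare fast vs. thorough recipes",
		Workflow:        wf,                                   // *ir.Workflow compiled elsewhere
		Recipes:         []*recipe.RecipeSpec{fast, thorough}, // at least two for a meaningful comparison
		Inputs:          map[string]interface{}{"goal": "fix flaky test"},
		ExecutorFactory: newIsolatedExecutor,                  // func() runtime.NodeExecutor, fresh per recipe
	}
	runner, err := benchmark.NewRunner(cfg)
	if err != nil {
		log.Fatal(err)
	}
	report, err := runner.Run(ctx, storeRoot) // storeRoot: directory for per-recipe run stores
	if err != nil {
		log.Fatal(err)
	}
	benchmark.RenderReport(os.Stdout, report)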
