agent-sdk-observability

module
v0.0.1
Published: Apr 16, 2026 License: GPL-3.0

README

agent-sdk-observability

A small, opinionated observability module shared by the ethPandaOps agent SDKs (ethpandaops/claude-agent-sdk-go, ethpandaops/codex-agent-sdk-go, ethpandaops/openrouter-agent-sdk-go, ethpandaops/vllm-agent-sdk-go) and the applications that consume them.

What's in here: the pieces that do not exist upstream in OpenTelemetry — a pluggable error classifier, an opinionated histogram aggregation selector, a tracer wrapper that pairs the two, a one-line Prometheus bridge, and free-function span-attribute constructors for the GenAI and HTTP namespaces.

What isn't in here: OTel metric-instrument wrappers. Upstream already packages GenAI and HTTP instruments as typed structs (genaiconv.ClientOperationDuration, built with genaiconv.NewClientOperationDuration, etc.); use those directly. Span attributes are a gap upstream does not fill, which is why this module provides semconv/genaiconv and semconv/httpconv. The openai.* extensions (semconv/openaiconv) are also maintained here, temporarily, until upstream ships an openaiconv sub-package.

Packages

Package Purpose
errclass Pluggable sentinel-based classifier that produces a closed-set error.type label. Libraries register their own mappings.
tracer Thin wrapper over trace.Tracer whose RecordError call classifies via errclass, sets error.type, and marks span status in one step. Noop-safe default.
histograms Opinionated base-2 exponential-histogram aggregation selector (bucket factor ≈ 1.09). Plugs into the OTel Prometheus exporter.
promexporter One-liner that bridges an OTel MeterProvider into a prometheus.Registerer with trace-based exemplars and the histogram selector.
testkit In-memory harnesses for metric and trace assertions in unit tests.
semconv/genaiconv Span-attribute constructors for the GenAI namespace. Lifts upstream's typed enums into attribute.KeyValue.
semconv/httpconv Span-attribute constructors for the HTTP namespace. Same rationale as genaiconv.
semconv/openaiconv Temporary. The openai.* attribute extensions from the GenAI spec. Deletes once upstream ships openaiconv.

Design in one paragraph

SDKs use upstream OpenTelemetry for metric instruments (via upstream's instrument structs such as genaiconv.NewClientOperationDuration) and use this module's semconv/genaiconv and semconv/httpconv for span attributes. Upstream packages attribute helpers as methods on instrument structs, which covers metrics well but leaves span-level instrumentation with raw attribute.String(key, val) calls. This module's semconv sub-packages close that gap. SDKs additionally import errclass to classify their errors and tracer to wire that classification into spans. Applications that surface metrics to Prometheus use promexporter to obtain a MeterProvider; applications that ship traces over OTLP set that up with the OTel SDK directly. Everything defaults to noop so observability is fully opt-in.

Quickstart — inside an SDK

package mysdk

import (
    "context"
    "errors"
    "time"

    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/metric"
    noopmetric "go.opentelemetry.io/otel/metric/noop"
    "go.opentelemetry.io/otel/trace"
    upstreamgenai "go.opentelemetry.io/otel/semconv/v1.40.0/genaiconv"

    "github.com/ethpandaops/agent-sdk-observability/errclass"
    "github.com/ethpandaops/agent-sdk-observability/semconv/genaiconv"
    "github.com/ethpandaops/agent-sdk-observability/tracer"
)

const scopeName = "github.com/ethpandaops/mysdk"

var (
    ErrRateLimited  = errors.New("rate limited")
    ErrUnauthorized = errors.New("unauthorized")
)

type Client struct {
    opDuration upstreamgenai.ClientOperationDuration
    tokenUsage upstreamgenai.ClientTokenUsage
    tracer     *tracer.Recorder
}

func New(mp metric.MeterProvider, tp trace.TracerProvider, version string) (*Client, error) {
    if mp == nil {
        mp = noopmetric.NewMeterProvider()
    }

    classes := errclass.New()
    classes.RegisterDefaults() // context.Canceled -> Canceled, context.DeadlineExceeded -> Timeout
    classes.RegisterSentinel(ErrRateLimited, errclass.RateLimited)
    classes.RegisterSentinel(ErrUnauthorized, errclass.Auth)

    meter := mp.Meter(scopeName, metric.WithInstrumentationVersion(version))

    // Metrics: upstream instrument structs own name/unit/description.
    opDuration, err := upstreamgenai.NewClientOperationDuration(meter)
    if err != nil {
        return nil, err
    }
    tokenUsage, err := upstreamgenai.NewClientTokenUsage(meter)
    if err != nil {
        return nil, err
    }

    return &Client{
        opDuration: opDuration,
        tokenUsage: tokenUsage,
        tracer:     tracer.New(tp, scopeName, version, classes), // nil tp falls back to noop
    }, nil
}

func (c *Client) Chat(ctx context.Context, model string) error {
    // Span attributes: this module's genaiconv lifts upstream typed enums
    // into attribute.KeyValue values.
    ctx, span := c.tracer.Start(ctx, "chat", trace.SpanKindClient,
        genaiconv.OperationName(upstreamgenai.OperationNameChat),
        genaiconv.ProviderName(upstreamgenai.ProviderNameAnthropic),
        genaiconv.RequestModel(model),
    )
    defer span.End()

    start := time.Now()
    err := c.doCall(ctx)
    elapsed := time.Since(start).Seconds()

    // Metric: upstream's Record takes operation name AND provider name
    // positionally (both are required by the signature); optional attributes
    // come from instrument-scoped methods (AttrRequestModel, AttrErrorType).
    attrs := []attribute.KeyValue{c.opDuration.AttrRequestModel(model)}
    if err != nil {
        // Classify + record + set span status, then reuse the class as
        // error.type. Successful calls carry no error.type at all.
        class := span.RecordError(err)
        attrs = append(attrs, c.opDuration.AttrErrorType(upstreamgenai.ErrorTypeAttr(class)))
    }

    c.opDuration.Record(ctx, elapsed,
        upstreamgenai.OperationNameChat,
        upstreamgenai.ProviderNameAnthropic,
        attrs...,
    )

    return err
}

// doCall stands in for the SDK's real provider request (placeholder).
func (c *Client) doCall(ctx context.Context) error {
    return nil
}

Key points:

  • SDKs depend on the OTel API (otel/metric, otel/trace) plus upstream semconv (constants, typed enums, and instrument structs that sit directly on the metric API). No OTel SDK dependency.
  • Both providers default to noop: pass nil and the SDK emits nothing.
  • span.RecordError(err) returns the classified Class; reuse it for metric labels to avoid classifying twice.
  • For span attributes, use semconv/genaiconv and semconv/httpconv in this module. For metric instruments, use upstream's genaiconv / httpconv instrument structs (NewClientOperationDuration, etc.).

Quickstart — inside the application

Prometheus
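import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "go.opentelemetry.io/otel/trace"

    "github.com/ethpandaops/agent-sdk-observability/promexporter"
    "github.com/ethpandaops/mysdk" // the example SDK from the previous section
)

func setup(tp trace.TracerProvider, version string) (*mysdk.Client, error) {
    reg := prometheus.NewRegistry()
    mp, err := promexporter.NewMeterProvider(reg)
    if err != nil {
        return nil, err
    }

    client, err := mysdk.New(mp, tp, version) // tp may be nil (noop)
    if err != nil {
        return nil, err
    }

    http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{
        EnableOpenMetrics: true, // needed for exemplars in the text exposition
    }))
    return client, nil
}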

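Metrics land on reg alongside any native client_golang collectors. Histograms are base-2 exponential with bucket factor ≈ 1.09 and are exposed as Prometheus native histograms. To keep the exponential buckets, Prometheus has to scrape with the protobuf exposition format and have native-histogram ingestion enabled; the text formats, OpenMetrics included, do not carry native histogram buckets.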

OTLP traces

Wire OTLP directly with the OTel SDK: one otlptracegrpc.New plus a sdktrace.NewTracerProvider with the chosen sampler and batch span processor. No wrapper is provided here — the OTel SDK constructors are already the minimal API.

Design rationale

Closed-set error.type

errclass exists because OTel's error.type attribute is meant to be low-cardinality, so as a metric label it needs a bounded set of values. Free-form error strings create a new time series for every distinct message, so cardinality grows without bound. The standard classes cover the universal cases:

timeout, canceled, invalid_request, rate_limited, auth,
permission_denied, upstream_5xx, network, unknown

Each SDK registers sentinel mappings for its own errors. First matcher wins, so register specific matchers before general ones. RegisterDefaults() wires context.DeadlineExceeded → Timeout and context.Canceled → Canceled; call it once per registry.

Consuming SDKs may define their own Class constants when genuinely warranted, as long as the set stays bounded. A rising share of unknown on a dashboard means sentinel coverage is lagging behind the errors the SDK actually returns.

Exponential histograms, factor ≈ 1.09

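The histograms package hard-codes MaxScale=3, MaxSize=160, which yields a bucket growth factor of 2^(1/8) ≈ 1.0905 at full resolution, with a 160-bucket memory cap per series. Scale 3 is the finest resolution used: if observations spread wider than 160 buckets can hold, the aggregation lowers the scale (coarser buckets) instead of allocating more. MaxScale=8, the finest schema Prometheus native histograms support, gives 2^(1/256) ≈ 1.003, which is overkill for latency and token observations and wastes memory; the OTel specification's default MaxScale of 20 is finer still. The selector plugs into the Prometheus exporter and falls back to OTel defaults for non-histogram instruments.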

Trace-based exemplars

promexporter uses exemplar.TraceBasedFilter. Only measurements made inside a sampled span carry exemplars, so the metric-to-trace link in Grafana stays honest: a latency-spike exemplar always resolves to a trace that was actually recorded, never to a trace ID that was dropped by sampling.

Thin tracer wrapper, no meter wrapper

tracer.Recorder exists because RecordError(err) doing classify + span.RecordError + SetAttributes(error.type=…) + SetStatus(codes.Error, class) as one call is a real ergonomic win. That pattern shows up at every failure site in every SDK.

A meter wrapper was considered and rejected. The only value would have been noop-safety (handled by otel/metric/noop) and eliding metric.WithAttributes(...) (a familiar OTel idiom).

No exporter wrappers for OTLP

OTLP trace and OTLP metric exporters have nothing opinionated to add beyond the OTel SDK's own constructors. A wrapper would be a thin shim that grows stale whenever OTel adds new knobs.

Span attributes vs metric instruments

Upstream go.opentelemetry.io/otel/semconv/v1.40.0/genaiconv (and its HTTP sibling) packages attributes as methods on instrument structs — great for metrics, useless for spans. Rather than force every SDK to write attribute.String("gen_ai.request.model", m) at every span site, this module provides free-function constructors (genaiconv.RequestModel(m)). Upstream-typed enums lift cleanly via helpers that take the typed value (genaiconv.OperationName(upstream.OperationNameChat)). No metric-instrument machinery is duplicated — use upstream directly for that.

openaiconv is temporary

semconv/openaiconv is the only semconv package maintained locally against a spec that upstream has not packaged yet. The OpenAI extensions are in the spec but the OTel Go tree has not shipped a corresponding sub-package. This directory deletes when upstream ships it.

Cardinality discipline

Safe as labels: gen_ai.provider.name, gen_ai.operation.name, gen_ai.request.model, error.type (closed set), http.request.method, http.response.status_code.

Never as labels: raw error messages, request IDs, trace IDs, user IDs, URLs with path parameters, prompt / response text, anything unbounded.

Unbounded values belong on spans, not in metric labels.

Testing

package mysdk_test

import (
    "context"
    "testing"

    "github.com/ethpandaops/agent-sdk-observability/testkit"
    "github.com/ethpandaops/mysdk" // the example SDK from the quickstart
)

func TestSDKEmitsMetrics(t *testing.T) {
    ctx := context.Background()
    metricsH := testkit.NewMetricsHarness()
    tracesH := testkit.NewTracesHarness()

    client, err := mysdk.New(metricsH.Provider(), tracesH.Provider(), "v0-test")
    if err != nil {
        t.Fatal(err)
    }
    _ = client.Chat(ctx, "claude-3-5-sonnet")

    names, _ := metricsH.MetricNames(ctx)
    t.Logf("metrics: %v", names) // assert "gen_ai.client.operation.duration" in names

    points, _ := metricsH.HistogramPoints(ctx, "gen_ai.client.operation.duration")
    t.Logf("points: %+v", points) // inspect points[].Attributes and Count

    for _, s := range tracesH.Summaries() {
        t.Logf("span %v trace=%v attrs=%v events=%v", s.Name, s.TraceID, s.Attributes, s.Events)
    }
}

testkit wraps sdkmetric.ManualReader and tracetest.SpanRecorder with a narrow set of assertion helpers (MetricNames, Int64Points, HistogramPoints, Summaries). Drop down to OTel's own test types when richer assertions are needed.

Stability

  • The OpenTelemetry GenAI semantic conventions are in "Development" status. Attribute and metric names may shift before they stabilize. Pin an upstream semconv version and upgrade intentionally.
  • This module's surface is intentionally small. The goal is to shrink over time, not grow. openaiconv deletes when upstream ships it, and any piece that OTel eventually provides natively gets deleted here too.

Contributing

Before adding a package, ask:

  1. Does OTel already provide this? If yes, use OTel.
  2. Would every SDK write this boilerplate otherwise? If no, skip.
  3. Is it opinion (histogram tuning, error classes, exemplar filter) or glue for a genuine upstream gap? If neither, reconsider.

Licence

See LICENSE.

Directories

Path Synopsis
Package errclass implements a pluggable classifier that maps errors to a closed set of OpenTelemetry-compatible error.type labels.
Package histograms holds the opinionated default aggregation used across the ethPandaOps agent SDKs.
Package promexporter is the one-liner bridge from an OTel MeterProvider into a prometheus.Registerer.
semconv
genaiconv
Package genaiconv exposes free-function constructors for OpenTelemetry GenAI span attributes.
httpconv
Package httpconv exposes free-function constructors for OpenTelemetry HTTP span attributes.
openaiconv
Package openaiconv exposes the OpenAI extensions to the OpenTelemetry GenAI semantic conventions.
Package testkit provides in-memory OTel harnesses for metric and trace assertions in unit tests.
Package tracer wraps an OpenTelemetry Tracer with a noop default and an errclass Registry so a single RecordError call can classify, record, and set span status.
