agent-sdk-observability
A small, opinionated observability module shared by the ethPandaOps agent SDKs
(ethpandaops/claude-agent-sdk-go,
ethpandaops/codex-agent-sdk-go,
ethpandaops/openrouter-agent-sdk-go,
ethpandaops/vllm-agent-sdk-go)
and the applications that consume them.
What's in here: the pieces that do not exist upstream in OpenTelemetry —
a pluggable error classifier, an opinionated histogram aggregation selector,
a tracer wrapper that pairs the two, a one-line Prometheus bridge, and
free-function span-attribute constructors for the GenAI and HTTP namespaces.
What isn't in here: OTel metric-instrument wrappers. Upstream already
packages GenAI and HTTP instruments as typed structs
(genaiconv.NewClientOperationDuration, etc.) — use those directly. Span
attributes are a gap upstream does not fill, which is why this module
provides semconv/genaiconv and semconv/httpconv. The openai.*
extensions are also maintained here temporarily, only until upstream ships
an openaiconv sub-package.
Packages
| Package | Purpose |
| --- | --- |
| errclass | Pluggable sentinel-based classifier that produces a closed-set error.type label. Libraries register their own mappings. |
| tracer | Thin wrapper over trace.Tracer whose RecordError call classifies via errclass, sets error.type, and marks span status in one step. Noop-safe default. |
| histograms | Opinionated base-2 exponential-histogram aggregation selector (bucket factor ≈ 1.09). Plugs into the OTel Prometheus exporter. |
| promexporter | One-liner that bridges an OTel MeterProvider into a prometheus.Registerer with trace-based exemplars and the histogram selector. |
| testkit | In-memory harnesses for metric and trace assertions in unit tests. |
| semconv/genaiconv | Span-attribute constructors for the GenAI namespace. Lifts upstream's typed enums into attribute.KeyValue. |
| semconv/httpconv | Span-attribute constructors for the HTTP namespace. Same rationale as genaiconv. |
| semconv/openaiconv | Temporary. The openai.* attribute extensions from the GenAI spec. Deletes once upstream ships openaiconv. |
Design in one paragraph
SDKs use upstream OpenTelemetry for metric instruments (via upstream's
instrument structs such as genaiconv.NewClientOperationDuration) and use
this module's semconv/genaiconv and semconv/httpconv for span attributes.
Upstream packages attribute helpers as methods on instrument structs, which
covers metrics well but leaves span-level instrumentation with raw
attribute.String(key, val) calls. This module's semconv sub-packages close
that gap. SDKs additionally import errclass to classify their errors and
tracer to wire that classification into spans. Applications that surface
metrics to Prometheus use promexporter to obtain a MeterProvider;
applications that ship traces over OTLP set that up with the OTel SDK
directly. Everything defaults to noop so observability is fully opt-in.
Quickstart — inside an SDK
package mysdk

import (
	"context"
	"errors"
	"time"

	"go.opentelemetry.io/otel/metric"
	noopmetric "go.opentelemetry.io/otel/metric/noop"
	"go.opentelemetry.io/otel/trace"
	upstreamgenai "go.opentelemetry.io/otel/semconv/v1.40.0/genaiconv"

	"github.com/ethpandaops/agent-sdk-observability/errclass"
	"github.com/ethpandaops/agent-sdk-observability/semconv/genaiconv"
	"github.com/ethpandaops/agent-sdk-observability/tracer"
)

const scopeName = "github.com/ethpandaops/mysdk"

var (
	ErrRateLimited  = errors.New("rate limited")
	ErrUnauthorized = errors.New("unauthorized")
)

type Client struct {
	opDuration upstreamgenai.ClientOperationDuration
	tokenUsage upstreamgenai.ClientTokenUsage
	tracer     *tracer.Recorder
}

func New(mp metric.MeterProvider, tp trace.TracerProvider, version string) (*Client, error) {
	if mp == nil {
		mp = noopmetric.NewMeterProvider()
	}

	classes := errclass.New()
	classes.RegisterDefaults() // context.Canceled -> Canceled, context.DeadlineExceeded -> Timeout
	classes.RegisterSentinel(ErrRateLimited, errclass.RateLimited)
	classes.RegisterSentinel(ErrUnauthorized, errclass.Auth)

	meter := mp.Meter(scopeName, metric.WithInstrumentationVersion(version))

	// Metrics: upstream instrument structs own name/unit/description.
	opDuration, err := upstreamgenai.NewClientOperationDuration(meter)
	if err != nil {
		return nil, err
	}
	tokenUsage, err := upstreamgenai.NewClientTokenUsage(meter)
	if err != nil {
		return nil, err
	}

	return &Client{
		opDuration: opDuration,
		tokenUsage: tokenUsage,
		tracer:     tracer.New(tp, scopeName, version, classes),
	}, nil
}

func (c *Client) Chat(ctx context.Context, model string) error {
	// Span attributes: thin genaiconv lifts upstream typed enums into
	// attribute.KeyValue values.
	ctx, span := c.tracer.Start(ctx, "chat", trace.SpanKindClient,
		genaiconv.OperationName(upstreamgenai.OperationNameChat),
		genaiconv.ProviderName(upstreamgenai.ProviderNameAnthropic),
		genaiconv.RequestModel(model),
	)
	defer span.End()

	start := time.Now()
	err := c.doCall(ctx)

	var class errclass.Class
	if err != nil {
		class = span.RecordError(err) // classify + record + set status + return class
	}

	// Metric: upstream's Record takes operation name AND provider name
	// positionally (both are required-by-signature); optional attributes
	// are instrument-scoped methods (AttrRequestModel, AttrErrorType).
	c.opDuration.Record(ctx, time.Since(start).Seconds(),
		upstreamgenai.OperationNameChat,
		upstreamgenai.ProviderNameAnthropic,
		c.opDuration.AttrRequestModel(model),
		c.opDuration.AttrErrorType(upstreamgenai.ErrorTypeAttr(class)),
	)
	return err
}
Key points:
- SDKs depend on the OTel API (otel/metric, otel/trace) plus upstream
semconv (const strings and typed enums, no runtime cost). No OTel SDK
dependency.
- Both providers default to noop: pass nil or omit and the SDK emits
nothing.
- span.RecordError(err) returns the classified Class; reuse it for metric
labels to avoid classifying twice.
- For span attributes, use semconv/genaiconv and semconv/httpconv in this
module. For metric instruments, use upstream's genaiconv / httpconv
instrument structs (NewClientOperationDuration, etc.).
Quickstart — inside the application
Prometheus
import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"

	"github.com/ethpandaops/agent-sdk-observability/promexporter"
)

reg := prometheus.NewRegistry()
mp, err := promexporter.NewMeterProvider(reg)
if err != nil {
	return err
}

// tp (a trace.TracerProvider) and version are assumed to be in scope.
// mysdk.New returns (*Client, error), so the error must be handled.
client, err := mysdk.New(mp, tp, version)
if err != nil {
	return err
}

http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{
	EnableOpenMetrics: true, // required for exemplar serialization
}))
Metrics land on reg alongside any native client_golang collectors.
Histograms are base-2 exponential with bucket factor ≈ 1.09; the OpenMetrics
content type serializes them as Prometheus native histograms.
OTLP traces
Wire OTLP directly with the OTel SDK: one otlptracegrpc.New plus a
sdktrace.NewTracerProvider with the chosen sampler and batch span processor.
No wrapper is provided here — the OTel SDK constructors are already the
minimal API.
Design rationale
Closed-set error.type
errclass exists because OTel's error.type semantic convention needs a
bounded set of label values. Free-form error strings blow up cardinality on
every Prometheus scrape. The standard classes cover the universal cases:
timeout, canceled, invalid_request, rate_limited, auth,
permission_denied, upstream_5xx, network, unknown
Each SDK registers sentinel mappings for its own errors. First matcher wins,
so register specific matchers before general ones. RegisterDefaults()
wires context.DeadlineExceeded → Timeout and context.Canceled → Canceled;
call it once per registry.
Consuming SDKs may define their own Class constants when genuinely
warranted. Keep cardinality bounded. Unknown climbing on a dashboard means
sentinel coverage is lagging.
Exponential histograms, factor ≈ 1.09
The histograms package hard-codes MaxScale=3, MaxSize=160, which yields
a bucket growth factor of 2^(1/8) ≈ 1.0905 per bucket with a 160-bucket
memory cap per series. MaxScale=3 caps at factor 1.09; MaxScale=8 (the OTel
default) gives 1.003, which is overkill for latency and token observations
and wastes memory. The selector plugs into the Prometheus exporter and falls
back to OTel defaults for non-histogram instruments.
Trace-based exemplars
promexporter uses exemplar.TraceBasedFilter. Only samples that are part
of recorded traces carry exemplars, so the metric-to-trace link in Grafana
stays honest: a latency-spike exemplar always resolves to an actual trace,
not an arbitrary one.
Thin tracer wrapper, no meter wrapper
tracer.Recorder exists because RecordError(err) doing
classify + span.RecordError + SetAttributes(error.type=…) +
SetStatus(codes.Error, class) as one call is a real ergonomic win. That
pattern shows up at every failure site in every SDK.
A meter wrapper was considered and rejected. The only value would have
been noop-safety (already handled by otel/metric/noop) and eliding
metric.WithAttributes(...) (a familiar OTel idiom). Neither justifies an
extra layer.
No exporter wrappers for OTLP
OTLP trace and OTLP metric exporters have nothing opinionated to add beyond
the OTel SDK's own constructors. A wrapper would be a thin shim that grows
stale whenever OTel adds new knobs.
Span attributes vs metric instruments
Upstream go.opentelemetry.io/otel/semconv/v1.40.0/genaiconv (and its HTTP
sibling) packages attributes as methods on instrument structs — great for
metrics, useless for spans. Rather than force every SDK to write
attribute.String("gen_ai.request.model", m) at every span site, this
module provides free-function constructors (genaiconv.RequestModel(m)).
Upstream-typed enums lift cleanly via helpers that take the typed value
(genaiconv.OperationName(upstream.OperationNameChat)). No metric-instrument
machinery is duplicated — use upstream directly for that.
openaiconv is temporary
semconv/openaiconv is the only semconv package maintained locally against
a spec that upstream has not packaged yet. The OpenAI extensions are in the
spec, but the OTel Go tree has not shipped a corresponding sub-package. This
directory will be deleted when upstream ships it.
Cardinality discipline
| Safe as labels | Never as labels |
| --- | --- |
| gen_ai.provider.name | raw error messages |
| gen_ai.operation.name | request IDs, trace IDs |
| gen_ai.request.model | user IDs |
| error.type (closed set) | URLs with path parameters |
| http.request.method | prompt / response text |
| http.response.status_code | anything unbounded |
Unbounded values belong on spans, not in metric labels.
Testing
import (
	"context"
	"testing"

	"github.com/ethpandaops/agent-sdk-observability/testkit"
)

func TestSDKEmitsMetrics(t *testing.T) {
	ctx := context.Background()
	metricsH := testkit.NewMetricsHarness()
	tracesH := testkit.NewTracesHarness()
	client, err := mysdk.New(metricsH.Provider(), tracesH.Provider(), "v0-test")
	if err != nil {
		t.Fatal(err)
	}
	_ = client.Chat(ctx, "claude-3-5-sonnet")

	names, _ := metricsH.MetricNames(ctx)
	// assert "gen_ai.client.operation.duration" in names
	points, _ := metricsH.HistogramPoints(ctx, "gen_ai.client.operation.duration")
	// inspect points[].Attributes and Count
	for _, s := range tracesH.Summaries() {
		// s.Name, s.Attributes, s.Events, s.TraceID
	}
}
testkit wraps sdkmetric.ManualReader and sdktrace.SpanRecorder with a
narrow set of assertion helpers (MetricNames, Int64Points,
HistogramPoints, Summaries). Drop down to OTel's own test types when
richer assertions are needed.
Stability
- The OpenTelemetry GenAI semantic conventions are in "Development" status.
Attribute and metric names may shift before they stabilize. Pin an upstream
semconv version and upgrade intentionally.
- This module's surface is intentionally small. The goal is to shrink over
time, not grow. openaiconv will be deleted when upstream ships it, and any
piece that OTel eventually provides natively gets deleted here too.
Contributing
Before adding a package, ask:
- Does OTel already provide this? If yes, use OTel.
- Would every SDK write this boilerplate otherwise? If no, skip.
- Is it opinion (histogram tuning, error classes, exemplar filter) or glue
for a genuine upstream gap? If neither, reconsider.
Licence
See LICENSE.