telemetry

package
v0.3.13 Latest
Warning

This package is not in the latest version of its module.

Published: May 14, 2026 License: MIT Imports: 23 Imported by: 0

Documentation

Overview

Package telemetry provides OpenTelemetry initialization and global accessor functions for traces, metrics, and logs.

Three pipelines are initialised independently via InitTracer, InitMeter, and InitLog (InitAll wires all three in one call). When no exporter is configured, the SDK is still installed so that valid trace/span IDs are available for structured log correlation, while the export itself is discarded (near-zero overhead: IDs are generated, but nothing is exported).


Constants

const (

	// AttrPodID identifies the sdk/pod runtime instance an operation
	// belongs to. Set by the pod controller; absent when the
	// operation runs outside any pod (e.g. a bare agent.Run).
	AttrPodID = "pod.id"

	// AttrAgentID identifies the agent (sdk/agent.Agent.ID) executing
	// the operation. Stable across runs of the same logical agent.
	AttrAgentID = "agent.id"

	// AttrTenantID identifies the tenant on whose behalf the
	// operation is running. Producers should populate this when a
	// tenant boundary is meaningful (multi-tenant SaaS deployments).
	AttrTenantID = "tenant.id"

	// AttrRunID identifies one engine.Run execution
	// (engine.Run.ID). Used as the routing key for engine event
	// envelopes (engine.run.<run_id>.*) and as the correlation
	// key in run-summary spans.
	AttrRunID = "run.id"

	// AttrParentRunID identifies the parent run when one engine.Run
	// dispatches another (multi-agent call chain). Empty for
	// top-level runs.
	AttrParentRunID = "parent.run.id"

	// AttrTaskID identifies the A2A-aligned task an operation
	// belongs to (sdk/agent.Request.TaskID, mirrored into
	// sdk/agent.Result.TaskID). Promoted by sdk/agent.Run into
	// engine.Run.Attributes so engines / nodes / observers can
	// recover it without reaching back through agent state.
	// Optional: empty when the upstream Request did not carry a
	// task identifier.
	AttrTaskID = "task.id"

	// AttrEngineKind identifies the concrete engine.Engine
	// implementation (graph runner, future script engine, remote
	// A2A bridge, ...). Producers SHOULD use a stable short token
	// like "graph", "script", "a2a-remote".
	AttrEngineKind = "engine.kind"

	// AttrRunStatus reports the terminal status of a run. Suggested
	// values: "ok" (clean completion), "interrupted" (cooperative
	// stop), "cancelled" (ctx cancellation), "failed" (any other
	// non-nil error). Consumers SHOULD treat unknown values as
	// "failed".
	AttrRunStatus = "run.status"

	// AttrGraphName identifies the graph definition (graph.GraphDefinition.Name)
	// being executed. Emitted by sdk/graph/runner; absent for
	// non-graph engines.
	AttrGraphName = "graph.name"

	// AttrNodeID identifies one graph node (graph.Node.ID) inside a
	// graph run. Emitted on per-node spans, metrics and log records.
	AttrNodeID = "node.id"

	// AttrActorID is the legacy spelling of AttrAgentID. The "actor"
	// terminology predates the agent / step-actor distinction settled
	// in v0.4. The producer identity (envelope agent_id / span
	// attribute) and the engine.SubjectStep* "actor" segment (graph
	// runner: agent.id + ".node." + node id) are different
	// dimensions, and a single "actor" name conflated them.
	//
	// Deprecated: use [AttrAgentID]. Removed in v0.5.0.
	AttrActorID = "actor.id"

	// AttrToolName identifies the dispatched tool (tool.Tool.Name).
	// Emitted on tool dispatch spans / metrics.
	AttrToolName = "tool.name"

	// AttrToolCallID identifies a single tool invocation
	// (model.ToolCall.ID assigned by the LLM). Use to correlate the
	// tool_call event envelope with its tool_result.
	AttrToolCallID = "tool.call_id"

	// AttrLLMProvider identifies the LLM vendor / SDK family that
	// served a call ("openai", "anthropic", "bytedance", "ollama",
	// "azure", "deepseek", "minimax", "qwen", ...). The pod
	// controller filters/aggregates on this dimension to apply
	// per-provider rate limits, circuit breakers, and cost
	// tracking; producers MUST use the lowercase short token form
	// for cross-package join-ability.
	AttrLLMProvider = "llm.provider"

	// AttrLLMModel identifies the resolved LLM model name a call
	// targets. Emitted by sdk/llm dispatch spans and by the
	// run-summary span when usage is reported.
	AttrLLMModel = "llm.model"

	// AttrLLMInputTokens / AttrLLMOutputTokens / AttrLLMTotalTokens
	// mirror the model.TokenUsage fields. Producers MUST use these
	// exact keys when reporting LLM usage so dashboards can sum
	// across packages without per-package translation rules.
	AttrLLMInputTokens  = "llm.tokens.input"
	AttrLLMOutputTokens = "llm.tokens.output"
	AttrLLMTotalTokens  = "llm.tokens.total"

	// AttrLLMCachedInputTokens mirrors model.TokenUsage.CachedInputTokens —
	// the subset of input tokens served from the provider's prompt
	// cache. It is always a subset of AttrLLMInputTokens (enforced
	// by the adapter normalisation in sdkx/llm) so dashboards can
	// compute a uniform hit-rate as cached / input without
	// provider-specific branching. Producers MUST omit the
	// attribute when zero (no cache hit reported, or provider
	// does not expose a cache breakdown) to match the
	// model.TokenUsage `omitempty` wire convention and keep span
	// payloads slim on the common path.
	AttrLLMCachedInputTokens = "llm.tokens.input.cached"

	// AttrLLMCostMicros is the cost of the call in micro-units of
	// the configured currency (e.g. micro-USD = USD * 1_000_000).
	// Integer math avoids float drift in cumulative budgets. Zero
	// when the host has no pricing catalog configured.
	AttrLLMCostMicros = "llm.cost.micros"

	// AttrLLMLatencyMs is the wall-clock duration of the call in
	// milliseconds.
	AttrLLMLatencyMs = "llm.latency.ms"

	// AttrConversationID identifies the conversation an operation
	// belongs to. Shared by sdk/history (transcript / DAG / archive),
	// sdk/recall (long-term memory writes keyed by conversation),
	// sdk/kanban (when the kanban scope mirrors a conversation), and
	// the future sdk/pod controller (multi-agent pods that share a
	// conversation context). Producers MUST use this constant
	// instead of legacy snake_case "conversation_id" string literals
	// so dashboards can join across the four packages by a single
	// dimension.
	AttrConversationID = "conversation.id"

	// AttrDatasetID identifies a knowledge dataset. Emitted by
	// sdk/knowledge (rebuild / write / delete), the knowledgenode
	// graph node, and any retrieval span that targets one specific
	// dataset. Cross-package dimension; needed for "errors per
	// dataset" / "latency per dataset" splits in the dashboard.
	AttrDatasetID = "dataset.id"

	// AttrErrorMessage carries the human-readable error string on
	// log records and span events. Aligned with OTel semantic-
	// conventions `exception.message` semantically, but kept under
	// the shorter `error.message` key because flowcraft logs do not
	// otherwise emit the OTel exception-event shape (no
	// `exception.type` / `exception.stacktrace`); a single canonical
	// key for the message text is enough.
	//
	// Producers MUST use this constant rather than the legacy
	// "error" key so dashboards can filter by "error.message exists"
	// uniformly. The "error" key was used inconsistently across the
	// SDK (sometimes the message, sometimes a code) — switching to
	// a single canonical name makes the intent unambiguous.
	AttrErrorMessage = "error.message"

	// AttrKanbanCardID identifies one kanban Card (kanban.Card.ID).
	AttrKanbanCardID = "kanban.card.id"

	// AttrKanbanCardKind identifies the card kind ("task" / "signal" / ...).
	AttrKanbanCardKind = "kanban.card.kind"

	// AttrKanbanProducerID identifies the agent that produced a
	// card; mirrors kanban.WithProducerID.
	AttrKanbanProducerID = "kanban.producer.id"

	// AttrKanbanTargetAgentID identifies the consumer agent a task
	// card is targeted at.
	AttrKanbanTargetAgentID = "kanban.target.agent.id"
)
const (
	InstrumentationName = "flowcraft"
	ServiceName         = "flowcraft"
	ServiceVersion      = "0.1.0"
)

Variables

This section is empty.

Functions

func ConsoleProcessors added in v0.1.9

func ConsoleProcessors(min otellog.Severity) []sdklog.Processor

ConsoleProcessors returns the canonical stdout/stderr split sink: records in [min, Warn) go to stdout, records in [Warn, +∞) go to stderr — mirroring POSIX conventions.

Each side is a NewPlainTextExporter wrapped in sdklog.NewBatchProcessor for async batching and shutdown draining, then gated by NewSeverityFilter so OTel's Enabled protocol can short-circuit dropped records before formatting.

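WithLogProcessor takes a single processor, so wrap each element of the returned slice:

var opts []telemetry.LogOption
for _, p := range telemetry.ConsoleProcessors(otellog.SeverityInfo) {
    opts = append(opts, telemetry.WithLogProcessor(p))
}
shutdown, err := telemetry.InitLog(ctx, opts...)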

Each call returns a fresh slice of processors with their own batchers and exporters; do not share the returned processors across multiple LoggerProviders.

func Debug

func Debug(ctx context.Context, msg string, attrs ...otellog.KeyValue)

func Disable

func Disable()

Disable globally disables all convenience log functions (useful in tests).

func Enable

func Enable()

Enable re-enables convenience log functions.

func Error

func Error(ctx context.Context, msg string, attrs ...otellog.KeyValue)

func FormatPlainTextRecordLine added in v0.1.9

func FormatPlainTextRecordLine(record *sdklog.Record) []byte

FormatPlainTextRecordLine renders an OTel log record as a single line in the canonical plain-text format used by NewPlainTextExporter:

RFC3339Nano SEVERITY message k=v ...

Exposed so downstream sinks (file exporters, custom processors) can match the on-screen format.

func Info

func Info(ctx context.Context, msg string, attrs ...otellog.KeyValue)

func InitAll

func InitAll(ctx context.Context, opts ...Option) (shutdown func(context.Context) error, err error)

InitAll initializes tracing, metrics, and logging in one call. It returns a single shutdown function that tears down all three in reverse order.

func InitLog

func InitLog(ctx context.Context, opts ...LogOption) (func(context.Context) error, error)

InitLog initializes the OpenTelemetry LoggerProvider.

Sinks are exactly the processors supplied via WithLogProcessor, in any order; batching and filtering policy is the caller's to control. If no processor is supplied, a discardProcessor (noop) is installed so global log calls remain safe.

Typical usage (WithLogProcessor takes one processor, so each console processor is wrapped individually):

var opts []telemetry.LogOption
for _, p := range telemetry.ConsoleProcessors(otellog.SeverityInfo) {
    opts = append(opts, telemetry.WithLogProcessor(p))
}
shutdown, err := telemetry.InitLog(ctx, opts...)

To wire an OTLP / file / custom exporter, wrap it in the OTel BatchProcessor (or your own processor) and pass it via WithLogProcessor.
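The fan-out and discard-fallback behaviour can be sketched with minimal stand-in types. `record`, `processor`, `provider` and `collect` are hypothetical simplifications; the real sdklog.Processor.OnEmit takes a context and a *sdklog.Record. Dispatching in registration order is also an assumption:

```go
package main

import "fmt"

// record and processor are hypothetical, minimal stand-ins for
// sdklog.Record and sdklog.Processor.
type record struct{ Body string }

type processor interface {
	OnEmit(r record) error
}

// discardProcessor drops everything; used when no sink is supplied
// so that global log calls stay safe.
type discardProcessor struct{}

func (discardProcessor) OnEmit(record) error { return nil }

// collect is a test sink that stores record bodies.
type collect struct{ got []string }

func (c *collect) OnEmit(r record) error { c.got = append(c.got, r.Body); return nil }

// provider fans each record out to every registered processor.
type provider struct{ procs []processor }

func newProvider(procs ...processor) *provider {
	if len(procs) == 0 {
		procs = []processor{discardProcessor{}}
	}
	return &provider{procs: procs}
}

func (p *provider) emit(r record) {
	for _, proc := range p.procs {
		_ = proc.OnEmit(r) // one failing sink must not block the others
	}
}

func main() {
	a, b := &collect{}, &collect{}
	newProvider(a, b).emit(record{Body: "hello"})
	fmt.Println(a.got, b.got) // [hello] [hello]

	newProvider().emit(record{Body: "dropped"}) // safe: discarded
}
```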

func InitMeter

func InitMeter(ctx context.Context, opts ...MeterOption) (func(context.Context) error, error)

InitMeter initializes the OpenTelemetry MeterProvider.

With an Exporter it creates a PeriodicReader for regular metric collection. Without one the provider is created with no reader (noop — instruments are valid but never exported).

func InitTracer

func InitTracer(ctx context.Context, opts ...TraceOption) (func(context.Context) error, error)

InitTracer initializes the OpenTelemetry TracerProvider.

With an Exporter the provider uses WithBatcher for async export. Without one it installs a real SDK provider backed by discardExporter (via WithSyncer to avoid background goroutine overhead) so that valid trace/span IDs are still generated for structured log correlation.

func Logger

func Logger(name string) otellog.Logger

Logger returns an OpenTelemetry Logger from the global LoggerProvider.

func Meter

func Meter() metric.Meter

Meter returns a meter scoped to the framework.

func MeterWithSuffix

func MeterWithSuffix(suffix string) metric.Meter

MeterWithSuffix returns a named sub-meter.

func NewPlainTextExporter added in v0.1.9

func NewPlainTextExporter(w io.Writer) sdklog.Exporter

NewPlainTextExporter returns an sdklog.Exporter that formats each record via FormatPlainTextRecordLine and writes it to w. All severities are written to the same w; for stdout/stderr splitting use ConsoleProcessors or compose two exporters with NewSeverityFilter.

The exporter writes synchronously per Export call but is intended to be wrapped in sdklog.NewBatchProcessor (which provides asynchronous batching, queueing, and shutdown draining). ConsoleProcessors and the internal default sink already do this wrapping.

A nil w is treated as io.Discard.

func NewSeverityFilter added in v0.1.9

func NewSeverityFilter(base sdklog.Processor, min, max otellog.Severity) sdklog.Processor

NewSeverityFilter wraps base with a severity gate. Records with severity < min, or severity >= max when max != 0, are dropped before reaching base.

Use max = 0 for "no upper bound" (the common case).

The filter implements OTel's Enabled protocol, so callers that check Enabled can skip constructing dropped records entirely, saving CPU and allocations across the pipeline (formatting, batching, exporting).

Returns a noop processor if base is nil.
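The gate predicate, and the reason Enabled saves work, can be shown with a small stand-in. `severityGate` and `logf` are hypothetical names, and severity values follow the OTel log data model numbering:

```go
package main

import "fmt"

type Severity int

const (
	SeverityDebug Severity = 5
	SeverityInfo  Severity = 9
	SeverityWarn  Severity = 13
	SeverityError Severity = 17
)

// severityGate reproduces the documented predicate: drop when
// sev < min, or when max != 0 and sev >= max.
type severityGate struct{ min, max Severity }

func (g severityGate) Enabled(sev Severity) bool {
	return sev >= g.min && (g.max == 0 || sev < g.max)
}

// logf is a hypothetical caller that consults Enabled before doing any
// formatting work; that early return is where the savings come from.
func logf(g severityGate, sev Severity, build func() string, sink *[]string) {
	if !g.Enabled(sev) {
		return // build is never called for dropped records
	}
	*sink = append(*sink, build())
}

func main() {
	stdoutGate := severityGate{min: SeverityInfo, max: SeverityWarn} // [Info, Warn)
	var out []string
	builds := 0
	build := func() string { builds++; return "formatted" }

	logf(stdoutGate, SeverityDebug, build, &out) // below min
	logf(stdoutGate, SeverityInfo, build, &out)  // kept
	logf(stdoutGate, SeverityError, build, &out) // at or above max
	fmt.Println(len(out), builds)                // 1 1
}
```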

func RecordRunSummary added in v0.2.3

func RecordRunSummary(ctx context.Context, summary RunSummary)

RecordRunSummary emits a short-lived "engine.run.summary" span summarising one engine.Run. The span is started and ended synchronously inside this call — its only job is to carry the summary attributes; no real work happens between Start and End.

Why a span and not a metric or log:

  • it inherits the active TraceID from ctx, so dashboards can drill from the per-run summary back to the per-step spans the engine emitted during execution without separate correlation logic;
  • exporters route it through the existing OTLP / file / stdout pipeline configured by InitTracer — no new sink to wire;
  • duration is a first-class span attribute, no extra attribute key needed.

When ctx carries no active TracerProvider this still creates a span against the global noop tracer — the call is a no-op but always safe.

RecordRunSummary is the lowest-level helper; sdk/agent and sdk/pod will likely wrap it with their own typed entry points (e.g. one that takes an agent.Result and pre-fills RunID / Status / Err).

func SetLoggerName

func SetLoggerName(name string)

SetLoggerName sets the scope name for the convenience log functions.

func Trace

func Trace(ctx context.Context, msg string, attrs ...otellog.KeyValue)

func Tracer

func Tracer() trace.Tracer

Tracer returns a tracer scoped to the framework.

func TracerWithSuffix

func TracerWithSuffix(suffix string) trace.Tracer

TracerWithSuffix returns a named sub-tracer (e.g. "flowcraft/store").

func Warn

func Warn(ctx context.Context, msg string, attrs ...otellog.KeyValue)

Types

type LogOption

type LogOption func(*logOptions)

LogOption configures InitLog behaviour.

func WithLogProcessor added in v0.1.9

func WithLogProcessor(p sdklog.Processor) LogOption

WithLogProcessor registers an OTel log processor. May be called multiple times to stack independent destinations (file, OTLP, custom routing).

This mirrors OTel's own sdklog.NewLoggerProvider(WithProcessor(...)) design and is the canonical way to attach log destinations.

func WithLogServiceName

func WithLogServiceName(name string) LogOption

func WithLogServiceVersion

func WithLogServiceVersion(version string) LogOption

func WithOTLPLogProcessor added in v0.2.9

func WithOTLPLogProcessor(cfg OTLPConfig) LogOption

WithOTLPLogProcessor is the log-side counterpart of WithOTLPTraceExporter. It wraps an OTLP/HTTP log exporter in a batch processor and appends it to the LoggerProvider's processor list.

type MeterOption

type MeterOption func(*meterOptions)

MeterOption configures InitMeter behaviour.

func WithMeterExporter

func WithMeterExporter(exp sdkmetric.Exporter) MeterOption

func WithMeterServiceName

func WithMeterServiceName(name string) MeterOption

func WithMeterServiceVersion

func WithMeterServiceVersion(version string) MeterOption

func WithOTLPMeterExporter added in v0.2.9

func WithOTLPMeterExporter(cfg OTLPConfig) MeterOption

WithOTLPMeterExporter is the metric-side counterpart of WithOTLPTraceExporter. Sends OTLP/HTTP metric exports to the configured collector.

type OTLPConfig added in v0.2.9

type OTLPConfig struct {
	// Endpoint is the OTLP collector endpoint host[:port].
	//
	// Examples:
	//   - "otel-collector:4318"             (HTTP, default port)
	//   - "api.honeycomb.io"                 (managed, TLS)
	//   - "localhost:4318"                   (local dev)
	//
	// MUST NOT include a scheme — Insecure controls TLS. Path
	// suffixes are supported by the underlying exporter via
	// URLPath below.
	Endpoint string

	// URLPath optionally overrides the default OTLP HTTP path
	// (/v1/traces, /v1/metrics, /v1/logs). Empty = use defaults.
	URLPath string

	// Headers are sent on every export request. Use for managed
	// SaaS auth (e.g. Honeycomb's "x-honeycomb-team", Grafana
	// Cloud basic auth). Keys are case-insensitive per HTTP.
	Headers map[string]string

	// Insecure disables TLS. Use only for in-cluster collectors.
	// Default false (TLS on).
	Insecure bool
}

OTLPConfig is the small wire-protocol-agnostic surface shared by the three WithOTLP* shortcuts in this package. It covers the common case (HTTP, optional headers, optional auth) without exposing the full set of transport-specific options of the underlying OTel exporter packages. Callers that need finer control should use WithExporter / WithMeterExporter / WithLogProcessor directly with a self-built exporter.
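The following stdlib-only sketch shows how the fields plausibly combine into a request URL. The real work happens inside the OTel OTLP/HTTP exporters, and `exportURL` and its error messages are hypothetical. It follows the documented rules: Endpoint carries no scheme, Insecure selects http, and URLPath replaces the per-signal default path:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// OTLPConfig copies the fields of the documented struct.
type OTLPConfig struct {
	Endpoint string
	URLPath  string
	Headers  map[string]string
	Insecure bool
}

// defaultPaths are the standard OTLP/HTTP signal paths.
var defaultPaths = map[string]string{
	"traces":  "/v1/traces",
	"metrics": "/v1/metrics",
	"logs":    "/v1/logs",
}

// exportURL shows how the fields combine into a request URL.
func exportURL(cfg OTLPConfig, signal string) (string, error) {
	if cfg.Endpoint == "" {
		return "", errors.New("otlp: empty endpoint")
	}
	if strings.Contains(cfg.Endpoint, "://") {
		return "", fmt.Errorf("otlp: endpoint %q must not include a scheme", cfg.Endpoint)
	}
	path, ok := defaultPaths[signal]
	if !ok {
		return "", fmt.Errorf("otlp: unknown signal %q", signal)
	}
	if cfg.URLPath != "" {
		path = cfg.URLPath
	}
	scheme := "https"
	if cfg.Insecure {
		scheme = "http"
	}
	return scheme + "://" + cfg.Endpoint + path, nil
}

func main() {
	u, _ := exportURL(OTLPConfig{Endpoint: "localhost:4318", Insecure: true}, "traces")
	fmt.Println(u) // http://localhost:4318/v1/traces

	_, err := exportURL(OTLPConfig{Endpoint: "https://api.honeycomb.io"}, "logs")
	fmt.Println(err) // otlp: endpoint "https://api.honeycomb.io" must not include a scheme
}
```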

type Option

type Option func(*initAllOpts)

func LoggerOpts

func LoggerOpts(opts ...LogOption) Option

func MeterOpts

func MeterOpts(opts ...MeterOption) Option

func TracerOpts

func TracerOpts(opts ...TraceOption) Option

type RunSummary added in v0.2.3

type RunSummary struct {
	// RunID is the engine.Run.ID (or any other stable per-execution
	// identifier the producer uses). Omitted from span attributes
	// when empty.
	RunID string

	// ParentRunID, when non-empty, identifies the calling run in a
	// multi-agent dispatch chain.
	ParentRunID string

	// AgentID identifies the agent that owned the run (sdk/agent.Agent.ID).
	AgentID string

	// PodID identifies the sdk/pod runtime instance, when applicable.
	PodID string

	// EngineKind is a short stable token for the executing engine
	// implementation ("graph" / "script" / "a2a-remote" / ...).
	EngineKind string

	// Status reports the terminal outcome. Recommended values are
	// the same as documented on AttrRunStatus: "ok", "interrupted",
	// "cancelled", "failed". Empty defaults to "ok".
	Status string

	// Err is the error returned by Engine.Execute, if any. When
	// non-nil the span status is set to codes.Error and Err.Error()
	// is recorded; Status defaults to "failed" if not set
	// explicitly by the caller.
	Err error

	// StartedAt / EndedAt bracket the execution wall clock. When
	// EndedAt is zero it defaults to time.Now(); when StartedAt is
	// zero the resulting span has zero duration but is still
	// emitted.
	StartedAt time.Time
	EndedAt   time.Time

	// LLMModel, when non-empty, identifies the dominant model used
	// in the run (typically the only one). Producers handling
	// multi-model runs SHOULD emit one summary per model or pick
	// the most-used one.
	LLMModel string

	// Token / cost / latency totals, mirroring the AttrLLM* keys.
	// Zero values are omitted from the span.
	InputTokens  int64
	OutputTokens int64
	TotalTokens  int64
	CostMicros   int64

	// CachedInputTokens, when > 0, mirrors
	// model.TokenUsage.CachedInputTokens summed across the run —
	// the subset of InputTokens served from the provider's prompt
	// cache. Always <= InputTokens (enforced by the per-call
	// adapter normalisation in sdkx/llm). Producers that aggregate
	// per-call usage SHOULD sum this field via TokenUsage.Add so
	// the run-summary span exposes a uniform cache hit-rate
	// (CachedInputTokens / InputTokens) without dashboards needing
	// to drill into per-call spans. Omitted from span attributes
	// when zero.
	CachedInputTokens int64

	// Extra carries additional caller-supplied attributes that
	// should land on the same span (tenant id, custom dimensions,
	// …). Use the Attr* constants when applicable.
	Extra []attribute.KeyValue
}

RunSummary captures the outcome of one engine.Run for telemetry purposes. It is intentionally engine-neutral and OTel-SDK-shaped (no dependency on sdk/model or sdk/engine) so this helper can stay in the leaf telemetry package.

All fields are optional. RecordRunSummary fills sensible defaults: Status="ok" when empty, or "failed" when Err is non-nil; EndedAt=time.Now() when zero, so the duration runs from StartedAt to the call. It also tolerates a fully zero value, in which case it records a bare-minimum span.

type TraceOption

type TraceOption func(*options)

TraceOption configures InitTracer behaviour.

func WithExporter

func WithExporter(exp sdktrace.SpanExporter) TraceOption

func WithOTLPTraceExporter added in v0.2.9

func WithOTLPTraceExporter(cfg OTLPConfig) TraceOption

WithOTLPTraceExporter is the lazy alternative to manually constructing an otlptracehttp.New exporter and feeding it to WithExporter. It returns a TraceOption that installs an OTLP/HTTP trace exporter pointing at the configured endpoint.

Use it via InitTracer or InitAll:

shutdown, err := telemetry.InitAll(ctx,
    telemetry.TracerOpts(
        telemetry.WithOTLPTraceExporter(telemetry.OTLPConfig{
            Endpoint: "otel-collector:4318",
            Insecure: true,
        }),
    ),
)

Construction errors (bad endpoint format, etc.) surface from InitTracer rather than from this helper.
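Errors surface from InitTracer because the option only records how to build the exporter; the init function then runs that deferred constructor. Below is a hypothetical sketch of this lazy-option pattern (`withLazyExporter`, `initTracer` and the scheme check are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

type exporter struct{ endpoint string }

type traceOptions struct {
	build []func() (*exporter, error) // deferred constructors
}

type traceOption func(*traceOptions)

// withLazyExporter records how to build the exporter but does not build
// it, so it cannot fail at option-construction time.
func withLazyExporter(endpoint string) traceOption {
	return func(o *traceOptions) {
		o.build = append(o.build, func() (*exporter, error) {
			if strings.Contains(endpoint, "://") {
				return nil, fmt.Errorf("bad endpoint %q: scheme not allowed", endpoint)
			}
			return &exporter{endpoint: endpoint}, nil
		})
	}
}

// initTracer runs the deferred constructors; this is where errors surface.
func initTracer(opts ...traceOption) ([]*exporter, error) {
	var o traceOptions
	for _, opt := range opts {
		opt(&o)
	}
	var exps []*exporter
	for _, b := range o.build {
		e, err := b()
		if err != nil {
			return nil, fmt.Errorf("init tracer: %w", err)
		}
		exps = append(exps, e)
	}
	return exps, nil
}

func main() {
	opt := withLazyExporter("http://collector:4318") // no error here
	_, err := initTracer(opt)
	fmt.Println(err) // init tracer: bad endpoint "http://collector:4318": scheme not allowed
}
```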

func WithServiceName

func WithServiceName(name string) TraceOption

func WithServiceVersion

func WithServiceVersion(version string) TraceOption
