metrics

package
v0.4.0-rc1
Published: Nov 5, 2025 License: Apache-2.0 Imports: 18 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewMetricsFromEnv added in v0.4.0

func NewMetricsFromEnv(ctx context.Context, stdout io.Writer, promReader sdkmetric.Reader) (metric.Meter, func(context.Context) error, error)

NewMetricsFromEnv configures an OpenTelemetry MeterProvider based on environment variables, always incorporating the provided Prometheus reader. It optionally includes additional exporters (e.g., console or OTLP) if enabled via environment variables. The function returns a metric.Meter for instrumentation and a shutdown function to gracefully close the provider.

The stdout parameter directs output for the console exporter (use os.Stdout in production). Environment variables checked directly include:

  • OTEL_SDK_DISABLED: If "true", disables OTEL exporters.
  • OTEL_METRICS_EXPORTER: Supported values are "none", "console", "prometheus", "otlp".
  • OTEL_EXPORTER_OTLP_ENDPOINT or OTEL_EXPORTER_OTLP_METRICS_ENDPOINT: Enables OTLP if set.

Prometheus is always enabled via the provided promReader; other exporters are added conditionally.

Types

type ChatCompletionMetrics added in v0.3.0

type ChatCompletionMetrics interface {
	// StartRequest initializes timing for a new request.
	StartRequest(headers map[string]string)
	// SetOriginalModel sets the original model from the incoming request body before any virtualization applies.
	// This is usually called after parsing the request body. Example: gpt-5
	SetOriginalModel(originalModel internalapi.OriginalModel)
	// SetRequestModel sets the model from the request. This is usually called after parsing the request body.
	// Example: gpt-5-nano
	SetRequestModel(requestModel internalapi.RequestModel)
	// SetResponseModel sets the model that ultimately generated the response.
	// Example: gpt-5-nano-2025-08-07
	SetResponseModel(responseModel internalapi.ResponseModel)
	// SetBackend sets the selected backend when the routing decision has been made. This is usually called
	// after parsing the request body to determine the model and invoke the routing logic.
	SetBackend(backend *filterapi.Backend)

	// RecordTokenUsage records token usage metrics.
	RecordTokenUsage(ctx context.Context, inputTokens, cachedInputTokens, outputTokens uint32, requestHeaderLabelMapping map[string]string)
	// RecordRequestCompletion records latency metrics for the entire request.
	RecordRequestCompletion(ctx context.Context, success bool, requestHeaderLabelMapping map[string]string)
	// RecordTokenLatency records latency metrics for token generation.
	RecordTokenLatency(ctx context.Context, tokens uint32, endOfStream bool, requestHeaderLabelMapping map[string]string)
	// GetTimeToFirstTokenMs returns the time to first token in stream mode in milliseconds.
	GetTimeToFirstTokenMs() float64
	// GetInterTokenLatencyMs returns the inter token latency in stream mode in milliseconds.
	GetInterTokenLatencyMs() float64
}

ChatCompletionMetrics is the interface for the chat completion AI Gateway metrics.

type ChatCompletionMetricsFactory added in v0.4.0

type ChatCompletionMetricsFactory func() ChatCompletionMetrics

ChatCompletionMetricsFactory is a closure that creates a new ChatCompletionMetrics instance.

func NewChatCompletionFactory added in v0.4.0

func NewChatCompletionFactory(meter metric.Meter, requestHeaderLabelMapping map[string]string) ChatCompletionMetricsFactory

NewChatCompletionFactory returns a closure to create a new ChatCompletionMetrics instance.

type CompletionMetrics added in v0.4.0

type CompletionMetrics interface {
	// StartRequest initializes timing for a new request.
	StartRequest(headers map[string]string)
	// SetOriginalModel sets the original model from the incoming request body before any virtualization applies.
	// This is usually called after parsing the request body. Example: gpt-3.5-turbo-instruct
	SetOriginalModel(originalModel internalapi.OriginalModel)
	// SetRequestModel sets the model from the request. This is usually called after parsing the request body.
	// Example: gpt-3.5-turbo-instruct-override
	SetRequestModel(requestModel internalapi.RequestModel)
	// SetResponseModel sets the model that ultimately generated the response.
	// Example: gpt-3.5-turbo-instruct-0914
	SetResponseModel(responseModel internalapi.ResponseModel)
	// SetBackend sets the selected backend when the routing decision has been made. This is usually called
	// after parsing the request body to determine the model and invoke the routing logic.
	SetBackend(backend *filterapi.Backend)

	// RecordTokenUsage records token usage metrics.
	RecordTokenUsage(ctx context.Context, inputTokens, outputTokens uint32, requestHeaderLabelMapping map[string]string)
	// RecordRequestCompletion records latency metrics for the entire request.
	RecordRequestCompletion(ctx context.Context, success bool, requestHeaderLabelMapping map[string]string)
	// RecordTokenLatency records latency metrics for token generation.
	RecordTokenLatency(ctx context.Context, tokens uint32, endOfStream bool, requestHeaderLabelMapping map[string]string)
	// GetTimeToFirstTokenMs returns the time to first token in stream mode in milliseconds.
	GetTimeToFirstTokenMs() float64
	// GetInterTokenLatencyMs returns the inter token latency in stream mode in milliseconds.
	GetInterTokenLatencyMs() float64
}

CompletionMetrics is the interface for the completion AI Gateway metrics.

type CompletionMetricsFactory added in v0.4.0

type CompletionMetricsFactory func() CompletionMetrics

CompletionMetricsFactory is a closure that creates a new CompletionMetrics instance.

func NewCompletionFactory added in v0.4.0

func NewCompletionFactory(meter metric.Meter, requestHeaderLabelMapping map[string]string) CompletionMetricsFactory

NewCompletionFactory returns a closure to create a new CompletionMetrics instance.

type EmbeddingsMetrics added in v0.3.0

type EmbeddingsMetrics interface {
	// StartRequest initializes timing for a new request.
	StartRequest(headers map[string]string)
	// SetOriginalModel sets the original model from the incoming request body before any virtualization applies.
	// This is usually called after parsing the request body. Example: text-embedding-3-small
	SetOriginalModel(originalModel internalapi.OriginalModel)
	// SetRequestModel sets the model from the request. This is usually called after parsing the request body.
	// Example: text-embedding-3-small
	SetRequestModel(requestModel internalapi.RequestModel)
	// SetResponseModel sets the model that ultimately generated the response.
	// Example: text-embedding-3-small-2025-02-18
	SetResponseModel(responseModel internalapi.ResponseModel)
	// SetBackend sets the selected backend when the routing decision has been made. This is usually called
	// after parsing the request body to determine the model and invoke the routing logic.
	SetBackend(backend *filterapi.Backend)

	// RecordTokenUsage records token usage metrics for embeddings (only input tokens are relevant).
	RecordTokenUsage(ctx context.Context, inputTokens uint32, requestHeaderLabelMapping map[string]string)
	// RecordRequestCompletion records latency metrics for the entire request.
	RecordRequestCompletion(ctx context.Context, success bool, requestHeaderLabelMapping map[string]string)
}

EmbeddingsMetrics is the interface for the embeddings AI Gateway metrics.

type EmbeddingsMetricsFactory added in v0.4.0

type EmbeddingsMetricsFactory func() EmbeddingsMetrics

EmbeddingsMetricsFactory is a closure that creates a new EmbeddingsMetrics instance.

func NewEmbeddingsFactory added in v0.4.0

func NewEmbeddingsFactory(meter metric.Meter, requestHeaderAttributeMapping map[string]string) EmbeddingsMetricsFactory

NewEmbeddingsFactory returns a closure to create a new EmbeddingsMetrics instance.

type ImageGenerationMetrics added in v0.4.0

type ImageGenerationMetrics interface {
	// StartRequest initializes timing for a new request.
	StartRequest(headers map[string]string)
	// SetOriginalModel sets the original model from the incoming request body before any virtualization applies.
	// This is usually called after parsing the request body. Example: dall-e-3
	SetOriginalModel(originalModel internalapi.OriginalModel)
	// SetRequestModel sets the request model name.
	SetRequestModel(requestModel internalapi.RequestModel)
	// SetResponseModel sets the response model name.
	SetResponseModel(responseModel internalapi.ResponseModel)
	// SetBackend sets the selected backend when the routing decision has been made. This is usually called
	// after parsing the request body to determine the model and invoke the routing logic.
	SetBackend(backend *filterapi.Backend)

	// RecordTokenUsage records token usage metrics (image generation typically reports zero tokens, but usage is supported).
	RecordTokenUsage(ctx context.Context, inputTokens, outputTokens uint32, requestHeaderLabelMapping map[string]string)
	// RecordRequestCompletion records latency metrics for the entire request.
	RecordRequestCompletion(ctx context.Context, success bool, requestHeaderLabelMapping map[string]string)
	// RecordImageGeneration records metrics specific to image generation (request duration only).
	RecordImageGeneration(ctx context.Context, requestHeaderLabelMapping map[string]string)
}

ImageGenerationMetrics is the interface for the image generation AI Gateway metrics.

type ImageGenerationMetricsFactory added in v0.4.0

type ImageGenerationMetricsFactory func() ImageGenerationMetrics

ImageGenerationMetricsFactory is a closure that creates a new ImageGenerationMetrics instance.

func NewImageGenerationFactory added in v0.4.0

func NewImageGenerationFactory(meter metric.Meter, requestHeaderLabelMapping map[string]string) ImageGenerationMetricsFactory

NewImageGenerationFactory returns a closure to create a new ImageGenerationMetrics instance.

type MCPErrorType added in v0.4.0

type MCPErrorType string

MCPErrorType defines the type of error that occurred during an MCP request.

const (
	// MCPErrorUnsupportedProtocolVersion indicates that the protocol version is not supported.
	MCPErrorUnsupportedProtocolVersion MCPErrorType = "unsupported_protocol_version"
	// MCPErrorInvalidJSONRPC indicates that the JSON-RPC request is invalid.
	MCPErrorInvalidJSONRPC MCPErrorType = "invalid_json_rpc"
	// MCPErrorUnsupportedMethod indicates that the method is not supported.
	MCPErrorUnsupportedMethod MCPErrorType = "unsupported_method"
	// MCPErrorUnsupportedResponse indicates that the response is not supported.
	MCPErrorUnsupportedResponse MCPErrorType = "unsupported_response"
	// MCPErrorInvalidParam indicates that a parameter is invalid.
	MCPErrorInvalidParam MCPErrorType = "invalid_param"
	// MCPErrorInvalidSessionID indicates that the session ID is invalid.
	MCPErrorInvalidSessionID MCPErrorType = "invalid_session_id"
	// MCPErrorInternal indicates that an internal error occurred.
	MCPErrorInternal MCPErrorType = "internal_error"
)

type MCPMetrics added in v0.4.0

type MCPMetrics interface {
	// WithRequestAttributes returns a new MCPMetrics instance with default attributes extracted from the HTTP request.
	WithRequestAttributes(req *http.Request) MCPMetrics
	// RecordRequestDuration records the duration of a successful MCP request.
	RecordRequestDuration(ctx context.Context, startAt *time.Time, meta mcpsdk.Params)
	// RecordRequestErrorDuration records the duration of an MCP request that resulted in an error.
	RecordRequestErrorDuration(ctx context.Context, startAt *time.Time, errType MCPErrorType, meta mcpsdk.Params)
	// RecordMethodCount records the count of method invocations.
	RecordMethodCount(ctx context.Context, methodName string, meta mcpsdk.Params)
	// RecordMethodErrorCount records the count of method invocations with error status.
	RecordMethodErrorCount(ctx context.Context, meta mcpsdk.Params)
	// RecordInitializationDuration records the duration of MCP initialization.
	RecordInitializationDuration(ctx context.Context, startAt *time.Time, meta mcpsdk.Params)
	// RecordClientCapabilities records the negotiated client capabilities.
	RecordClientCapabilities(ctx context.Context, capabilities *mcpsdk.ClientCapabilities, meta mcpsdk.Params)
	// RecordServerCapabilities records the negotiated server capabilities.
	RecordServerCapabilities(ctx context.Context, capabilities *mcpsdk.ServerCapabilities, meta mcpsdk.Params)
	// RecordProgress records a progress notification sent/received.
	RecordProgress(ctx context.Context, meta mcpsdk.Params)
}

MCPMetrics holds metrics for MCP.

func NewMCP added in v0.4.0

func NewMCP(meter metric.Meter, requestHeaderAttributeMapping map[string]string) MCPMetrics

NewMCP creates a new MCPMetrics instance.

type MessagesMetrics added in v0.4.0

type MessagesMetrics interface {
	ChatCompletionMetrics
}

MessagesMetrics is the interface for the /messages endpoint AI Gateway metrics.

Semantically, it is identical to ChatCompletionMetrics, so it embeds that interface.

The only difference is that it uses the operation name "messages" instead of "chat".

type MessagesMetricsFactory added in v0.4.0

type MessagesMetricsFactory func() MessagesMetrics

MessagesMetricsFactory is a closure that creates a new MessagesMetrics instance.

func NewMessagesFactory added in v0.4.0

func NewMessagesFactory(meter metric.Meter, requestHeaderLabelMapping map[string]string) MessagesMetricsFactory

NewMessagesFactory returns a closure that creates a new MessagesMetrics instance.

type RerankMetrics added in v0.4.0

type RerankMetrics interface {
	// StartRequest initializes timing for a new request.
	StartRequest(headers map[string]string)
	// SetOriginalModel sets the original model from the incoming request body before any virtualization applies.
	// This is usually called after parsing the request body. Example: rerank-english-v3
	SetOriginalModel(originalModel internalapi.OriginalModel)
	// SetRequestModel sets the model from the request. This is usually called after parsing the request body.
	// Example: rerank-english-v3
	SetRequestModel(requestModel internalapi.RequestModel)
	// SetResponseModel sets the model that ultimately generated the response.
	// Example: rerank-english-v3-2025-02-18
	SetResponseModel(responseModel internalapi.ResponseModel)
	// SetBackend sets the selected backend when the routing decision has been made. This is usually called
	// after parsing the request body to determine the model and invoke the routing logic.
	SetBackend(backend *filterapi.Backend)

	// RecordTokenUsage records token usage metrics for rerank (only input tokens are relevant).
	RecordTokenUsage(ctx context.Context, inputTokens uint32, requestHeaderLabelMapping map[string]string)
	// RecordRequestCompletion records latency metrics for the entire request.
	RecordRequestCompletion(ctx context.Context, success bool, requestHeaderLabelMapping map[string]string)
}

RerankMetrics is the interface for the rerank AI Gateway metrics.

type RerankMetricsFactory added in v0.4.0

type RerankMetricsFactory func() RerankMetrics

RerankMetricsFactory is a closure that creates a new RerankMetrics instance.

func NewRerankFactory added in v0.4.0

func NewRerankFactory(meter metric.Meter, requestHeaderAttributeMapping map[string]string) RerankMetricsFactory

NewRerankFactory returns a closure to create a new RerankMetrics instance.
