Documentation ¶
Overview ¶
Package metrics provides token usage metrics collection and aggregation.
Index ¶
- type Accumulator
- func (a *Accumulator) Clear()
- func (a *Accumulator) ClearCost()
- func (a *Accumulator) CostMicroSnapshot() map[string]CostMicroUSDCounts
- func (a *Accumulator) CostSnapshot() CostUsage
- func (a *Accumulator) Query(duration time.Duration) TimeSeriesResponse
- func (a *Accumulator) QueryCost(duration time.Duration) CostTimeSeriesResponse
- func (a *Accumulator) QueryCostByClient(duration time.Duration) CostTimeSeriesResponse
- func (a *Accumulator) Record(serverName string, inputTokens, outputTokens int)
- func (a *Accumulator) RecordCost(serverName string, replicaID int, cost CostBreakdown)
- func (a *Accumulator) RecordCostWithClient(serverName string, replicaID int, clientID string, cost CostBreakdown)
- func (a *Accumulator) RecordFormatSavings(serverName string, originalTokens, formattedTokens int)
- func (a *Accumulator) RecordReplica(serverName string, replicaID, inputTokens, outputTokens int)
- func (a *Accumulator) RecordReplicaWithClient(serverName string, replicaID int, clientID string, ...)
- func (a *Accumulator) RecordToolCall(serverName, toolName string)
- func (a *Accumulator) ReplaySnapshot(serverName string, ts time.Time, inputTokens, outputTokens, costMicro int64)
- func (a *Accumulator) Restore(perServer map[string]TokenCounts)
- func (a *Accumulator) RestoreCost(perServer map[string]CostMicroUSDCounts)
- func (a *Accumulator) Snapshot() TokenUsage
- func (a *Accumulator) StartedAt() time.Time
- func (a *Accumulator) ToolUsageSnapshot() map[string]map[string]ToolStat
- type CostBreakdown
- type CostCounts
- type CostDataPoint
- type CostMicroUSDCounts
- type CostTimeSeriesResponse
- type CostUsage
- type DataPoint
- type FormatSavings
- type ModelResolver
- type Observer
- type TimeSeriesResponse
- type TokenCounts
- type TokenUsage
- type ToolStat
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Accumulator ¶
type Accumulator struct {
// contains filtered or unexported fields
}
Accumulator collects token usage metrics with thread-safe operations. Session totals use atomic counters. Historical data is stored in a ring buffer of pre-aggregated 1-minute time buckets.
func NewAccumulator ¶
func NewAccumulator(maxDataPoints int) *Accumulator
NewAccumulator creates a metrics accumulator with the given ring buffer capacity. Each slot holds one minute of aggregated data, so 10000 slots ≈ 7 days of history.
func (*Accumulator) Clear ¶
func (a *Accumulator) Clear()
Clear resets all metrics — session totals, per-server totals, and history.
func (*Accumulator) ClearCost ¶
func (a *Accumulator) ClearCost()
ClearCost resets cost counters and cost ring-buffer values without touching token counters or format-savings state. Used by the `DELETE /api/metrics/cost` endpoint so operators can wipe cost data without losing token history.
func (*Accumulator) CostMicroSnapshot ¶
func (a *Accumulator) CostMicroSnapshot() map[string]CostMicroUSDCounts
CostMicroSnapshot returns per-server cumulative cost in the int64 micro-USD shape used by the persistence layer. Skipping the float USD round-trip avoids any precision loss between the in-memory atomics and the on-disk schema. Used by telemetry.MetricsFlusher.flushOnce to compute a cost diff against prevCost in the same units that get written to metrics.jsonl, and consumed symmetrically by SeedFromFile via RestoreCost. Session totals are not returned — they are derivable as the sum across servers, which RestoreCost re-derives on rehydrate.
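The precision argument for the int64 micro-USD shape can be demonstrated with a standalone sketch. `usdToMicro` is a hypothetical conversion helper, not this package's code; the point is that integer micro-USD sums stay exact where repeated float64 addition drifts:

```go
package main

import "fmt"

// usdToMicro converts a USD amount to integer micro-USD (1e-6 USD),
// rounding half away from zero. Hypothetical helper mirroring the
// int64 shape described above.
func usdToMicro(usd float64) int64 {
	if usd >= 0 {
		return int64(usd*1e6 + 0.5)
	}
	return int64(usd*1e6 - 0.5)
}

func main() {
	// Summing a million one-micro-USD charges: the float64 running sum
	// accumulates representation error; the integer sum is exact.
	var f float64
	var micro int64
	for i := 0; i < 1_000_000; i++ {
		f += 0.000001
		micro += usdToMicro(0.000001)
	}
	fmt.Println(micro) // 1000000: exact
	fmt.Printf("float sum drifted: %v\n", f != 1.0)
}
```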
func (*Accumulator) CostSnapshot ¶
func (a *Accumulator) CostSnapshot() CostUsage
CostSnapshot returns the current cost usage summary in USD. The shape mirrors Snapshot()'s TokenUsage so API responses can carry both side by side. Cache fields are non-zero only when RecordCost recorded cache usage; otherwise they are omitted from JSON via omitempty.
func (*Accumulator) Query ¶
func (a *Accumulator) Query(duration time.Duration) TimeSeriesResponse
Query returns historical time-series data for the given duration. For ranges > 6h, data points are downsampled to hourly buckets.
func (*Accumulator) QueryCost ¶
func (a *Accumulator) QueryCost(duration time.Duration) CostTimeSeriesResponse
QueryCost returns historical cost-over-time data for the given duration. For ranges > 6h, data points are downsampled to hourly buckets, matching the Query (token) behavior so charts can share the same time-range selector. The PerClient map on the response is left nil; call QueryCostByClient when the caller asks for per-client grouping.
func (*Accumulator) QueryCostByClient ¶
func (a *Accumulator) QueryCostByClient(duration time.Duration) CostTimeSeriesResponse
QueryCostByClient is QueryCost with per-client grouping enabled. The returned response has its PerClient field populated alongside PerServer so consumers can render either dimension off a single response.
func (*Accumulator) Record ¶
func (a *Accumulator) Record(serverName string, inputTokens, outputTokens int)
Record adds a token usage observation from a tool call. Equivalent to RecordReplica with replicaID=-1 (i.e. do not attribute to a replica).
func (*Accumulator) RecordCost ¶
func (a *Accumulator) RecordCost(serverName string, replicaID int, cost CostBreakdown)
RecordCost adds a per-call USD cost observation alongside the token observation that RecordReplica records. Pass replicaID < 0 to skip the per-replica update, mirroring RecordReplica.
Cost MUST be computed at observation time, not derived from stored token totals at read time: a model change mid-window would otherwise mis-price earlier calls. Cache-read and cache-write components arrive as separate fields on CostBreakdown so the Snapshot shape can preserve the split.
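The mis-pricing hazard can be shown with a minimal standalone sketch. The `call` record, `observedCost`, and `derivedCost` are hypothetical names invented for illustration; the rates are made-up per-token prices standing in for two different models:

```go
package main

import "fmt"

// call is a hypothetical per-call record: tokens used, and the
// USD-per-token rate in effect when the call was observed.
type call struct {
	tokens  int64
	rateUSD float64
}

// observedCost prices each call at its observation-time rate —
// the invariant RecordCost requires.
func observedCost(calls []call) float64 {
	var total float64
	for _, c := range calls {
		total += float64(c.tokens) * c.rateUSD
	}
	return total
}

// derivedCost reprices the whole token total at the current rate —
// the read-time derivation the doc warns against.
func derivedCost(calls []call, currentRateUSD float64) float64 {
	var tokens int64
	for _, c := range calls {
		tokens += c.tokens
	}
	return float64(tokens) * currentRateUSD
}

func main() {
	calls := []call{
		{tokens: 1000, rateUSD: 0.000003}, // priced under the old model
		{tokens: 1000, rateUSD: 0.000015}, // model changed mid-window
	}
	fmt.Printf("observed=%.4f derived=%.4f\n",
		observedCost(calls), derivedCost(calls, 0.000015))
	// observed=0.0180 derived=0.0300 — repricing inflates the first call
}
```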
func (*Accumulator) RecordCostWithClient ¶
func (a *Accumulator) RecordCostWithClient(serverName string, replicaID int, clientID string, cost CostBreakdown)
RecordCostWithClient is the client-aware variant of RecordCost. The cost is added to the per-client cost aggregates and per-client cost ring buffer in addition to the session, per-server, and per-replica aggregates. An empty clientID skips the per-client update.
func (*Accumulator) RecordFormatSavings ¶
func (a *Accumulator) RecordFormatSavings(serverName string, originalTokens, formattedTokens int)
RecordFormatSavings records token counts before and after format conversion. Normal token usage tracking is handled separately by the ToolCallObserver; this method only tracks the format savings delta.
func (*Accumulator) RecordReplica ¶
func (a *Accumulator) RecordReplica(serverName string, replicaID, inputTokens, outputTokens int)
RecordReplica adds a token usage observation attributed to a specific replica. Per-server aggregates are updated in all cases. Pass replicaID < 0 to skip the per-replica update (used for servers that are not part of a replica set).
func (*Accumulator) RecordReplicaWithClient ¶
func (a *Accumulator) RecordReplicaWithClient(serverName string, replicaID int, clientID string, inputTokens, outputTokens int)
RecordReplicaWithClient is the client-aware variant of RecordReplica. It updates the per-client token counters in addition to session, per-server, and per-replica aggregates. An empty clientID skips the per-client update, matching the replicaID < 0 convention so callers without attribution can continue to use the same code path.
func (*Accumulator) RecordToolCall ¶
func (a *Accumulator) RecordToolCall(serverName, toolName string)
RecordToolCall increments per-(server, tool) call counters and stamps the last-called timestamp. Used by pkg/optimize's unused_tool heuristic.
An empty serverName or toolName is a no-op so callers without per-tool attribution (legacy ToolCallObserver path) can invoke unconditionally.
func (*Accumulator) ReplaySnapshot ¶
func (a *Accumulator) ReplaySnapshot(serverName string, ts time.Time, inputTokens, outputTokens, costMicro int64)
ReplaySnapshot adds a historical observation to the time-series ring buffers (aggregate + per-server) without touching cumulative counters. Used by telemetry.MetricsFlusher.SeedFromFile to rehydrate per-minute bucket history from each persisted Diff line — the chart shows pre-restart activity continuously alongside live data instead of resetting to a single post-restart point.
costMicro is the rolled-up total cost for the minute (sum of the four CostBreakdown components) in int64 micro-USD, matching the live RecordCost path which also calls addCostToBucket(now, totalMicro). Pass 0 for token-only replays (legacy persistence files predate the cost field). Cost-only replays — non-zero costMicro with zero token counts — are supported so a minute that recorded a priced fixture without token attribution still hydrates its cost bucket on seed.
Cumulative counters are restored separately via Restore + RestoreCost. Calling both with the same source data reproduces the on-disk state.
ts is bucketed to the minute via the same key the live Record path uses, so chronological replay produces one bucket per flush minute and live observations after replay continue advancing the same ring naturally.
func (*Accumulator) Restore ¶
func (a *Accumulator) Restore(perServer map[string]TokenCounts)
Restore replaces per-server token totals with the supplied map and recomputes session totals as the sum across all servers (matching the invariant Record/RecordReplica maintains). Used on daemon startup to repopulate cumulative counters from a persisted metrics.jsonl file.
Existing per-server counters are overwritten for any server present in the map; servers absent from the map retain their current state. Replicas and format-savings counters are not restored — those carry no on-disk equivalent in the snapshot format. Time-series ring buckets are populated separately via ReplaySnapshot.
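The sum-across-servers invariant can be sketched standalone. `sessionFromServers` is a hypothetical helper illustrating the recomputation, not the package's implementation; `TokenCounts` here just mirrors the documented snapshot shape:

```go
package main

import "fmt"

// TokenCounts mirrors the snapshot shape documented below.
type TokenCounts struct {
	InputTokens, OutputTokens, TotalTokens int64
}

// sessionFromServers recomputes session totals as the sum across all
// servers — the invariant Restore re-establishes after rehydrating
// per-server counters.
func sessionFromServers(perServer map[string]TokenCounts) TokenCounts {
	var s TokenCounts
	for _, c := range perServer {
		s.InputTokens += c.InputTokens
		s.OutputTokens += c.OutputTokens
		s.TotalTokens += c.TotalTokens
	}
	return s
}

func main() {
	session := sessionFromServers(map[string]TokenCounts{
		"github": {InputTokens: 120, OutputTokens: 80, TotalTokens: 200},
		"jira":   {InputTokens: 30, OutputTokens: 20, TotalTokens: 50},
	})
	fmt.Println(session.TotalTokens) // 250
}
```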
func (*Accumulator) RestoreCost ¶
func (a *Accumulator) RestoreCost(perServer map[string]CostMicroUSDCounts)
RestoreCost is the cost analogue of Restore: it overwrites per-server cost component atomics with the supplied map and recomputes session cost totals as the sum across all servers (matching the invariant RecordCost maintains). Used on daemon startup by telemetry.MetricsFlusher.SeedFromFile to repopulate cumulative cost counters from a persisted metrics.jsonl file so the Cost KPI card reflects pre-restart spend the moment the UI loads.
Per-component splitting (input / output / cache-read / cache-write) is preserved on the cumulative atomics so CostSnapshot.Session can render the breakdown without recomputing — same trade-off live RecordCost makes. The time-series ring buffers are populated separately via ReplaySnapshot, which carries only the rolled-up total per bucket.
Servers absent from the map retain their current cost state. Replicas, format-savings, and per-client cost have no on-disk equivalent in the snapshot format and are not restored.
func (*Accumulator) Snapshot ¶
func (a *Accumulator) Snapshot() TokenUsage
Snapshot returns the current token usage summary.
func (*Accumulator) StartedAt ¶
func (a *Accumulator) StartedAt() time.Time
StartedAt returns the wall-clock time the accumulator was created. Clear and ClearCost do not reset this value — the start-of-observation window stays anchored to the gateway lifetime, which is what pkg/optimize uses to gate "<24h of data" findings.
func (*Accumulator) ToolUsageSnapshot ¶
func (a *Accumulator) ToolUsageSnapshot() map[string]map[string]ToolStat
ToolUsageSnapshot returns a deep copy of the per-(server, tool) call counters. Empty when no per-tool calls have been recorded (typical for gateways still on the legacy ToolCallObserver path).
type CostBreakdown ¶
CostBreakdown is the per-call USD cost split passed to RecordCost. Cache fields are priced separately from input tokens to match LiteLLM's cache rate fields — conflating them mis-prices providers like Anthropic by roughly an order of magnitude.
func (CostBreakdown) IsValid ¶
func (c CostBreakdown) IsValid() bool
IsValid reports whether every component is finite and non-negative. A misconfigured Source could in theory return NaN/Inf rates or a negative Calculate result; recording those into atomic counters would permanently corrupt the snapshot. RecordCost drops invalid breakdowns.
func (CostBreakdown) IsZero ¶
func (c CostBreakdown) IsZero() bool
IsZero reports whether all components are zero. Used by RecordCost to short-circuit accumulator updates when a tool call has no priceable usage (unknown model, all-zero token counts).
type CostCounts ¶
type CostCounts struct {
InputUSD float64 `json:"input_usd"`
OutputUSD float64 `json:"output_usd"`
CacheReadUSD float64 `json:"cache_read_usd,omitempty"`
CacheWriteUSD float64 `json:"cache_write_usd,omitempty"`
TotalUSD float64 `json:"total_usd"`
}
CostCounts is the snapshot shape for a single dimension (session, per-server, per-replica) of cost accumulation. All values are USD. Cache fields are omitempty so consumers that only care about input/output costs are not forced to render zeroes.
type CostDataPoint ¶
CostDataPoint is the time-series shape for cost-over-time queries.
type CostMicroUSDCounts ¶
type CostMicroUSDCounts struct {
InputMicroUSD int64 `json:"input_micro_usd,omitempty"`
OutputMicroUSD int64 `json:"output_micro_usd,omitempty"`
CacheReadMicroUSD int64 `json:"cache_read_micro_usd,omitempty"`
CacheWriteMicroUSD int64 `json:"cache_write_micro_usd,omitempty"`
}
CostMicroUSDCounts is the int64 micro-USD shape used by the persistence layer to round-trip the four cost components without float precision loss. Mirrors the in-memory atomic representation on serverCounters.
func (CostMicroUSDCounts) IsZero ¶
func (c CostMicroUSDCounts) IsZero() bool
IsZero reports whether all four cost components are zero.
func (CostMicroUSDCounts) TotalMicroUSD ¶
func (c CostMicroUSDCounts) TotalMicroUSD() int64
TotalMicroUSD returns the rolled-up sum of the four components — the shape ReplaySnapshot stores per bucket, matching addCostToBucket's live behavior of writing a single total per minute.
type CostTimeSeriesResponse ¶
type CostTimeSeriesResponse struct {
Range string `json:"range"`
Interval string `json:"interval"`
Points []CostDataPoint `json:"data_points"`
PerServer map[string][]CostDataPoint `json:"per_server"`
// PerClient groups cost over time by originating MCP client. Populated
// only when the API caller requests per-client grouping (the
// `per_client=true` query parameter on /api/metrics/cost) so the JSON
// stays compact for the common per-server view.
PerClient map[string][]CostDataPoint `json:"per_client,omitempty"`
}
CostTimeSeriesResponse is the cost analogue of TimeSeriesResponse. The `Range` and `Interval` strings reuse the same vocabulary as the token time-series so charts can share a time-range selector.
type CostUsage ¶
type CostUsage struct {
Session CostCounts `json:"session"`
PerServer map[string]CostCounts `json:"per_server"`
PerReplica map[string]map[int]CostCounts `json:"per_replica,omitempty"`
// PerClient groups USD cost by the originating MCP client. omitempty so
// pre-attribution consumers keep their existing JSON shape.
PerClient map[string]CostCounts `json:"per_client,omitempty"`
}
CostUsage is the top-level cost snapshot. The shape mirrors TokenUsage so API consumers can render cost charts beside token charts.
type DataPoint ¶
type DataPoint struct {
Timestamp time.Time `json:"timestamp"`
InputTokens int64 `json:"input_tokens"`
OutputTokens int64 `json:"output_tokens"`
TotalTokens int64 `json:"total_tokens"`
}
DataPoint is a single time-series data point with token counts.
type FormatSavings ¶
type FormatSavings struct {
OriginalTokens int64 `json:"original_tokens"`
FormattedTokens int64 `json:"formatted_tokens"`
SavedTokens int64 `json:"saved_tokens"`
SavingsPercent float64 `json:"savings_percent"`
}
FormatSavings tracks token savings from output formatting.
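The relationship between the fields can be sketched standalone. `savingsPercent` is a hypothetical helper showing how `savings_percent` plausibly derives from the two token counts, including a zero-denominator guard; it is not the package's actual code:

```go
package main

import "fmt"

// savingsPercent derives the percentage saved by format conversion from
// the before/after token counts, guarding against division by zero when
// nothing has been recorded yet.
func savingsPercent(original, formatted int64) float64 {
	if original == 0 {
		return 0
	}
	return float64(original-formatted) / float64(original) * 100
}

func main() {
	fmt.Println(savingsPercent(4000, 3000)) // 25
	fmt.Println(savingsPercent(0, 0))       // 0
}
```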
type ModelResolver ¶
ModelResolver returns the configured model ID for a server, or "" when the server has no model attribution. Used by the Observer when a tool result does not carry a model in its CallUsage metadata. Resolvers must be safe for concurrent calls.
type Observer ¶
type Observer struct {
// contains filtered or unexported fields
}
Observer implements mcp.ToolCallObserver and mcp.ClientObserver by counting tokens, pricing the call against the active pricing.Source, and recording both into an Accumulator.
func NewObserver ¶
func NewObserver(counter token.Counter, accumulator *Accumulator) *Observer
NewObserver creates a ToolCallObserver that counts tokens and records metrics. The cost path is wired but inert until SetModelResolver installs a server -> model mapping or tool results carry CallUsage with a model field. Until then RecordCost is called only when both the call reports a model and that model is known to the active pricing.Source.
func (*Observer) ObserveToolCall ¶
func (o *Observer) ObserveToolCall(serverName string, replicaID int, arguments map[string]any, result *mcp.ToolCallResult)
ObserveToolCall counts input/output tokens and records them, then prices the call against the active pricing.Source and records the per-component USD breakdown alongside the tokens.
The cost path is best-effort: a call against an unknown model records tokens normally and skips RecordCost. Cache-read and cache-write tokens reported in result._meta (CallUsage) are priced via the provider's cache rates rather than rolled into the input rate.
func (*Observer) ObserveToolCallWithClient ¶
func (o *Observer) ObserveToolCallWithClient(_ context.Context, obs mcp.ToolCallObservation) mcp.ToolCallSummary
ObserveToolCallWithClient is the ClientObserver entry point. It records the same tokens + cost as ObserveToolCall, additionally attributes them to the supplied client, and returns a summary the gateway uses to populate OTel GenAI semantic span attributes without re-counting tokens.
func (*Observer) SetModelResolver ¶
func (o *Observer) SetModelResolver(r ModelResolver)
SetModelResolver installs the server -> model resolver used as a fallback when a tool result does not carry a model in its CallUsage. Passing nil clears the resolver, after which only call-level model attribution is honored.
type TimeSeriesResponse ¶
type TimeSeriesResponse struct {
Range string `json:"range"`
Interval string `json:"interval"`
Points []DataPoint `json:"data_points"`
PerServer map[string][]DataPoint `json:"per_server"`
}
TimeSeriesResponse is returned by the historical metrics endpoint.
type TokenCounts ¶
type TokenCounts struct {
InputTokens int64 `json:"input_tokens"`
OutputTokens int64 `json:"output_tokens"`
TotalTokens int64 `json:"total_tokens"`
}
TokenCounts holds input/output/total token counts.
type TokenUsage ¶
type TokenUsage struct {
Session TokenCounts `json:"session"`
PerServer map[string]TokenCounts `json:"per_server"`
PerReplica map[string]map[int]TokenCounts `json:"per_replica,omitempty"`
// PerClient groups token usage by the originating MCP client (for example
// "claude-code", "cursor"). The field is omitempty so consumers built
// before per-client attribution shipped continue to see the same JSON
// shape. Future per-user / per-team dimensions land as sibling fields
// (per_user, per_team) under this same shape rather than reshaping
// per_client.
PerClient map[string]TokenCounts `json:"per_client,omitempty"`
FormatSavings FormatSavings `json:"format_savings"`
}
TokenUsage is the top-level token usage snapshot returned by the API.
type ToolStat ¶
type ToolStat struct {
Calls int64 `json:"calls"`
LastCalledAt time.Time `json:"last_called_at,omitempty"`
}
ToolStat is the snapshot shape for per-(server, tool) call tracking. Used by pkg/optimize to detect tools that have not seen any calls inside a freshness window. Calls is the cumulative count since the accumulator was created or last cleared; LastCalledAt is the wall-clock time the most recent call was recorded, or the zero value when no calls have been recorded.