Documentation ¶
Overview ¶
Package measure aggregates per-request measurements in memory so shim can answer the question "what are you actually doing to my traffic?" loudly, in a single JSON payload exposed at /v1/metrics.
Stage 1 measurements (Fork 2-a, 3-a in todos/shim-stage1-plan.md):
- Per-endpoint latency reservoir (fixed-size, percentile-at-read).
- Per-endpoint token-delta totals (shim cl100k_base count vs upstream usage.*_tokens).
- Rewrite-event counts (model rewrites, stop_sequences truncations, ...).
All state is in-memory; restart resets to zero. Persistence is a Stage 2 decision.
Index ¶
- Constants
- type Collector
- func New() *Collector
- func (c *Collector) RecordLatency(endpoint string, d time.Duration)
- func (c *Collector) RecordRequestSeen(endpoint string)
- func (c *Collector) RecordRewriteEvent(kind string)
- func (c *Collector) RecordTokenDelta(endpoint string, shimCount, upstreamPrompt, upstreamCompletion int)
- func (c *Collector) RecordUpstreamError(endpoint string, status int)
- func (c *Collector) Snapshot() Snapshot
- type LatencyStats
- type Snapshot
- type TokenStats
- type UpstreamErrorStats
Constants ¶
const (
	RewriteModel         = "model"
	RewriteStopSequences = "stop_sequences"
)
Rewrite event kinds. To add a new kind, add a constant here, wire the call site, and extend the rewrites table in the README.
Types ¶
type Collector ¶
type Collector struct {
	// contains filtered or unexported fields
}
Collector aggregates measurements across goroutines. All public methods are safe to call concurrently; percentile computation is deferred to Snapshot.
func New ¶
func New() *Collector
New returns an empty Collector. The reservoir RNG is seeded from time.Now() at construction. Tests that need determinism can build their own RNG and replace the unexported rng field, which is reachable from within the package.
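As a rough illustration of why swapping in a seeded RNG gives determinism, the sketch below draws from two math/rand generators built with the same seed. The draws helper and the seed value are hypothetical and are not part of this package; how the Collector consumes its rng field is not shown in these docs.

```go
package main

import (
	"fmt"
	"math/rand"
)

// draws returns the first n values of Intn(bound) from a generator seeded
// with seed. Hypothetical helper, for illustration only.
func draws(seed int64, n, bound int) []int {
	rng := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = rng.Intn(bound)
	}
	return out
}

func main() {
	a := draws(42, 5, 1000)
	b := draws(42, 5, 1000)
	fmt.Println(a)
	fmt.Println(b) // same sequence as a, because the seeds match
}
```

A test that fixes the seed this way would see the same replacement decisions on every run.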
func (*Collector) RecordLatency ¶
func (c *Collector) RecordLatency(endpoint string, d time.Duration)
RecordLatency adds an observation for endpoint. The duration is stored as floating-point milliseconds.
func (*Collector) RecordRequestSeen ¶
func (c *Collector) RecordRequestSeen(endpoint string)
RecordRequestSeen increments the per-endpoint request counter. It is called at handler entry, before any parsing or validation. The count is the denominator for any per-endpoint ratio (errors/seen, rewrites/seen) that operators want to compute from /v1/metrics.
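A consumer of /v1/metrics might compute such a ratio as in the sketch below. The errorRatio helper is hypothetical, and returning 0 when no requests have been seen is an assumption made here to avoid a division by zero; the package itself does not compute ratios.

```go
package main

import "fmt"

// errorRatio returns errorsTotal/requestsSeen for one endpoint.
// It returns 0 when requestsSeen is 0 (assumed convention).
func errorRatio(errorsTotal, requestsSeen int) float64 {
	if requestsSeen == 0 {
		return 0
	}
	return float64(errorsTotal) / float64(requestsSeen)
}

func main() {
	fmt.Printf("%.2f\n", errorRatio(3, 100)) // 0.03
	fmt.Printf("%.2f\n", errorRatio(0, 0))   // 0.00
}
```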
func (*Collector) RecordRewriteEvent ¶
func (c *Collector) RecordRewriteEvent(kind string)
RecordRewriteEvent increments the counter for kind. Use the Rewrite* constants; arbitrary strings are accepted but break grep-ability.
func (*Collector) RecordTokenDelta ¶
func (c *Collector) RecordTokenDelta(endpoint string, shimCount, upstreamPrompt, upstreamCompletion int)
RecordTokenDelta adds one (shim-count, upstream-claimed) pair for endpoint. shimCount is the cl100k_base BPE count from tokens.Count; upstreamPrompt and upstreamCompletion are the upstream usage fields. The running totals and the pair count are reported at Snapshot time, where the delta can be read off.
RecordTokenDelta is a no-op when both upstreamPrompt and upstreamCompletion are zero. In that case the upstream omitted the usage block, and recording zeros would dilute the shim/upstream comparison without adding signal. The README discloses this.
func (*Collector) RecordUpstreamError ¶
func (c *Collector) RecordUpstreamError(endpoint string, status int)
RecordUpstreamError records a non-2xx response from the upstream. status is the upstream's HTTP code (e.g., 400, 502, 429). Errors are aggregated by class (4xx/5xx) and per status for drill-down, so operators can answer "what's failing" without reading raw logs.
type LatencyStats ¶
type LatencyStats struct {
	P50 float64 `json:"p50"`
	P95 float64 `json:"p95"`
	P99 float64 `json:"p99"`
	N   int     `json:"n"`
}
LatencyStats reports percentiles in milliseconds. N is the total number of observations recorded for this endpoint and is not capped at the reservoir size. Once N exceeds reservoirCap, new observations replace retained ones at random, so the percentiles are estimates from a sample.
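A fixed-size reservoir with percentile-at-read could look like the sketch below. It assumes the classic "Algorithm R" replacement rule and a nearest-rank percentile; the package's actual sampling rule, percentile method, and reservoirCap value are not stated in these docs, and every name here is hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sort"
)

// reservoir keeps at most limit samples while counting every observation.
type reservoir struct {
	samples []float64
	limit   int
	n       int // total observations, not capped at limit
	rng     *rand.Rand
}

func newReservoir(limit int, seed int64) *reservoir {
	return &reservoir{
		samples: make([]float64, 0, limit),
		limit:   limit,
		rng:     rand.New(rand.NewSource(seed)),
	}
}

// add records v. Past the limit, v replaces a random slot with
// probability limit/n (Algorithm R).
func (r *reservoir) add(v float64) {
	r.n++
	if len(r.samples) < r.limit {
		r.samples = append(r.samples, v)
		return
	}
	if j := r.rng.Intn(r.n); j < r.limit {
		r.samples[j] = v
	}
}

// percentile sorts a copy at read time and returns the nearest-rank value
// for p in (0, 100]. It returns 0 for an empty reservoir.
func (r *reservoir) percentile(p float64) float64 {
	if len(r.samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), r.samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	r := newReservoir(100, 1)
	for i := 1; i <= 1000; i++ {
		r.add(float64(i))
	}
	fmt.Println("n:", r.n, "retained:", len(r.samples)) // n: 1000 retained: 100
	fmt.Println("p50 estimate:", r.percentile(50))
}
```

Sorting only at read time keeps each write cheap, which matches the doc's note that percentile computation is deferred to Snapshot.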
type Snapshot ¶
type Snapshot struct {
	Latency        map[string]LatencyStats       `json:"latency"`
	Tokens         map[string]TokenStats         `json:"token_delta"`
	Rewrites       map[string]int                `json:"rewrites"`
	UpstreamErrors map[string]UpstreamErrorStats `json:"upstream_errors"`
	RequestsSeen   map[string]int                `json:"requests_seen"`
}
Snapshot is the JSON-marshallable view returned by Collector.Snapshot. The wire shape is committed in todos/shim-stage1-plan.md §1 step 6; breaking changes need a CHANGELOG entry.
type TokenStats ¶
type TokenStats struct {
	ShimTotal               int `json:"shim_total"`
	UpstreamPromptTotal     int `json:"upstream_prompt_total"`
	UpstreamCompletionTotal int `json:"upstream_completion_total"`
	N                       int `json:"n"`
}
TokenStats reports cumulative sums. The delta compares shim_total with upstream_prompt_total; a wide gap signals that the cl100k_base count diverges meaningfully from the upstream's actual tokenizer for this traffic shape.
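A reader of the payload might turn those sums into a gap figure as sketched below. The promptDelta helper and the choice to express the gap relative to upstream_prompt_total are assumptions for illustration; the package reports only the raw totals.

```go
package main

import "fmt"

// tokenStats mirrors the documented TokenStats fields.
type tokenStats struct {
	ShimTotal, UpstreamPromptTotal, UpstreamCompletionTotal, N int
}

// promptDelta returns the absolute gap (shim minus upstream prompt) and the
// gap as a fraction of the upstream prompt total. The fraction is 0 when
// the upstream total is 0 (assumed convention).
func promptDelta(t tokenStats) (abs int, rel float64) {
	abs = t.ShimTotal - t.UpstreamPromptTotal
	if t.UpstreamPromptTotal == 0 {
		return abs, 0
	}
	return abs, float64(abs) / float64(t.UpstreamPromptTotal)
}

func main() {
	// Made-up totals, not real measurements.
	abs, rel := promptDelta(tokenStats{ShimTotal: 1100, UpstreamPromptTotal: 1000, UpstreamCompletionTotal: 400, N: 10})
	fmt.Printf("abs=%d rel=%.2f\n", abs, rel) // abs=100 rel=0.10
}
```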
type UpstreamErrorStats ¶
type UpstreamErrorStats struct {
	Total    int            `json:"total"`
	Class4xx int            `json:"class_4xx"`
	Class5xx int            `json:"class_5xx"`
	ByStatus map[string]int `json:"by_status"`
}
UpstreamErrorStats reports counts of upstream non-2xx responses per endpoint. ByStatus keys are stringified codes ("400", "502", ...) so the JSON shape is canonical. Total equals Class4xx + Class5xx when every recorded status falls in the 4xx or 5xx range; 3xx and other out-of-range codes are counted only in Total and ByStatus.
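The counting rules above can be sketched as follows. The errorStats type and its record method are hypothetical stand-ins with locking omitted; the class boundaries (400 to 499 and 500 to 599) are the usual HTTP ranges and are assumed to be what the package uses.

```go
package main

import (
	"fmt"
	"strconv"
)

// errorStats mirrors the documented UpstreamErrorStats fields.
type errorStats struct {
	Total    int
	Class4xx int
	Class5xx int
	ByStatus map[string]int
}

// record counts one upstream status. Every status lands in Total and
// ByStatus; only 4xx and 5xx also land in a class counter.
func (e *errorStats) record(status int) {
	if e.ByStatus == nil {
		e.ByStatus = make(map[string]int)
	}
	e.Total++
	e.ByStatus[strconv.Itoa(status)]++
	switch {
	case status >= 400 && status <= 499:
		e.Class4xx++
	case status >= 500 && status <= 599:
		e.Class5xx++
	}
}

func main() {
	var e errorStats
	for _, s := range []int{400, 429, 502, 302} {
		e.record(s)
	}
	fmt.Println(e.Total, e.Class4xx, e.Class5xx) // 4 2 1
	fmt.Println(e.ByStatus["302"])               // 1
}
```

Here the 302 is why Total (4) exceeds Class4xx + Class5xx (3).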