metrics

package
v0.5.0 Latest
Warning

This package is not in the latest version of its module.

Published: May 7, 2026 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Package metrics provides a lightweight in-process metrics registry with Prometheus text-format output. It requires zero external dependencies.

Instrumented points on the hot path:

  • agentguard_checks_total — counter, by decision label
  • agentguard_request_duration_ms — histogram, end-to-end /v1/check
  • agentguard_policy_eval_duration_ms — histogram, Engine.Check only
  • agentguard_audit_write_duration_ms — histogram, Logger.Log only
  • agentguard_pending_approvals — gauge, current queue depth


Constants

const (
	ApprovalEvictedLRUResolved = "lru_resolved"
	ApprovalEvictedQueueFull   = "queue_full"
)

Well-known reason labels for IncApprovalEvicted. When the approval queue is at capacity, either an old resolved entry is dropped to make room (lru_resolved) or the request is refused with 503 because nothing was resolved (queue_full). Both paths increment this counter so operators can distinguish "we need a bigger queue" from "we need more approvers".
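The two eviction paths described above could look roughly like the following. This is a sketch under stated assumptions: the entry and queue types, the oldest-first ordering, and the in-struct counter map are all hypothetical and only mirror the behaviour the paragraph describes.

```go
package main

import "fmt"

const (
	approvalEvictedLRUResolved = "lru_resolved"
	approvalEvictedQueueFull   = "queue_full"
)

// entry is a hypothetical pending-approval record.
type entry struct {
	id       string
	resolved bool
}

// queue is a hypothetical bounded approval queue, ordered oldest first.
type queue struct {
	capacity int
	entries  []entry
	evicted  map[string]uint64 // reason label -> count
}

// add inserts e. At capacity it drops the oldest resolved entry, or refuses
// the insert when nothing is resolved. It returns false on refusal.
func (q *queue) add(e entry) bool {
	if len(q.entries) >= q.capacity {
		idx := -1
		for i, old := range q.entries {
			if old.resolved {
				idx = i
				break
			}
		}
		if idx < 0 {
			q.evicted[approvalEvictedQueueFull]++
			return false // the caller would answer 503 here
		}
		q.entries = append(q.entries[:idx], q.entries[idx+1:]...)
		q.evicted[approvalEvictedLRUResolved]++
	}
	q.entries = append(q.entries, e)
	return true
}

func main() {
	q := &queue{capacity: 2, evicted: map[string]uint64{}}
	q.add(entry{id: "a", resolved: true})
	q.add(entry{id: "b"})
	fmt.Println(q.add(entry{id: "c"})) // room is made by dropping resolved "a"
	fmt.Println(q.add(entry{id: "d"})) // refused: no resolved entry to drop
	fmt.Println(q.evicted)
}
```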

const (
	MigrationStatusRan     = "ran"
	MigrationStatusSkipped = "skipped"
	MigrationStatusFailed  = "failed"
)

Well-known status labels for SetMigrationStatus and MigrationStatusFor.

const (
	NotifyDroppedQueueFull = "queue_full"
)

Well-known reason labels for IncNotifyDropped. Kept bounded so the Prometheus series cardinality stays predictable.

const (
	RejectedBodyTooLarge = "body_too_large"
)

Well-known reason labels for IncRequestRejected. Other reasons are allowed but callers must keep the cardinality bounded.

const (
	// SSEDroppedSlowConsumer labels a broadcast that was discarded because
	// the per-subscriber channel was full (the subscriber isn't draining
	// fast enough). This is the fail-fast drop in broadcastLocked's
	// default case.
	SSEDroppedSlowConsumer = "slow_consumer"
)
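The fail-fast drop mentioned in the comment above is the usual Go non-blocking send: a select with a default case. The sketch below is a minimal illustration, not the package's broadcastLocked; the broadcast function, the channel layout, and the counter map are assumptions.

```go
package main

import "fmt"

const sseDroppedSlowConsumer = "slow_consumer"

// broadcast sends ev to every subscriber without blocking. A subscriber whose
// buffered channel is full misses the event, and the drop is counted.
func broadcast(subs []chan string, ev string, dropped map[string]uint64) {
	for _, ch := range subs {
		select {
		case ch <- ev:
		default:
			dropped[sseDroppedSlowConsumer]++
		}
	}
}

func main() {
	fast := make(chan string, 2)
	slow := make(chan string, 1)
	subs := []chan string{fast, slow}
	dropped := map[string]uint64{}

	broadcast(subs, "e1", dropped)
	broadcast(subs, "e2", dropped) // slow is already full, so it misses e2
	fmt.Println(len(fast), len(slow), dropped[sseDroppedSlowConsumer])
}
```

The design choice is that one stalled client costs a counter increment rather than stalling every other subscriber.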

Variables

var (
	ChecksTotal      uint64 // all /v1/check requests
	AllowedTotal     uint64
	DeniedTotal      uint64
	ApprovalTotal    uint64 // REQUIRE_APPROVAL decisions
	RateLimitedTotal uint64 // rate-limit denies

)
var (
	AuditReplayEntriesTotal uint64
	AuditRotationsTotal     uint64
)

Audit replay + rotation counters. Replay happens once at startup (seeding in-memory decision counters from the audit log); rotations happen inline on FileLogger.Log when the size threshold is crossed.

var (
	RequestDuration    = newHistogram(durationBuckets)
	PolicyEvalDuration = newHistogram(durationBuckets)
	AuditWriteDuration = newHistogram(durationBuckets)
)

Package-level histograms.

var ApprovalReplayMismatchTotal uint64

ApprovalReplayMismatchTotal counts /v1/check requests that carried an approval_id whose corresponding PendingAction.Request did not match the retry's operationally-meaningful fields (agent_id / scope / command / path / domain / url / action). Mismatches are NOT short-circuited to the cached decision — the request falls through to normal Engine.Check evaluation. This metric is the security signal: legitimate retries match shape and never increment it; a non-zero rate means either a buggy gateway is reusing ids across distinct actions or an attacker who learned an approved id is replaying it against unrelated commands.

See V05 audit B1 (R-Sec H1, R-Stub C3) for the underlying gating-bypass finding the validator closes.
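A shape check of this kind might be sketched as follows. The checkRequest type and the sameShape helper are hypothetical; the field set is taken from the list above, but the real PendingAction.Request type and its comparison rules are not shown in this documentation.

```go
package main

import "fmt"

// checkRequest is a hypothetical request shape holding the operationally
// meaningful fields named in the documentation.
type checkRequest struct {
	AgentID, Scope, Command, Path, Domain, URL, Action string
}

// sameShape reports whether a retry matches the originally approved request.
// Every field is a comparable string, so struct equality is sufficient here.
func sameShape(orig, retry checkRequest) bool {
	return orig == retry
}

func main() {
	approved := checkRequest{AgentID: "agent-1", Scope: "shell", Command: "ls /tmp"}
	retry := approved // a legitimate retry repeats the same fields
	replay := approved
	replay.Command = "rm -rf /" // same approval id, different action

	var mismatches uint64
	for _, r := range []checkRequest{retry, replay} {
		if !sameShape(approved, r) {
			mismatches++ // the request would fall through to normal evaluation
		}
	}
	fmt.Println(mismatches)
}
```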

var AuditCorruptLinesTotal uint64

AuditCorruptLinesTotal counts audit log lines that failed JSON parse during Query() and were skipped. Rare in practice — the usual cause is a crash between the write syscall and the newline flush, or disk corruption. Kept visible via /metrics so operators can spot silent audit-file degradation instead of discovering it when a query returns fewer entries than expected.
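The skip-and-count behaviour can be illustrated with a small line-oriented JSON reader. This is only a sketch: auditEntry and readEntries are hypothetical, and the real audit schema and Query() signature are not shown here.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// auditEntry is a hypothetical audit record; the real schema may differ.
type auditEntry struct {
	Decision string `json:"decision"`
}

// readEntries parses one JSON object per line. Lines that fail to parse are
// skipped, and the number of skipped lines is returned alongside the entries.
func readEntries(data string) (entries []auditEntry, corrupt uint64) {
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var e auditEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			corrupt++
			continue
		}
		entries = append(entries, e)
	}
	return entries, corrupt
}

func main() {
	// The second line is truncated, as it might be after a crash mid-write.
	data := "{\"decision\":\"ALLOW\"}\n{\"decision\":\"DE\n{\"decision\":\"DENY\"}\n"
	entries, corrupt := readEntries(data)
	fmt.Println(len(entries), corrupt)
}
```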

Functions

func AddAuditReplayEntries added in v0.5.0

func AddAuditReplayEntries(n uint64)

AddAuditReplayEntries records entries processed during replay. The count is cumulative, so it keeps growing if replay is ever re-entered (a pathological case); in the normal case of one replay per process it equals that replay's entry count.

func ApprovalEvictedFor added in v0.5.0

func ApprovalEvictedFor(reason string) uint64

ApprovalEvictedFor returns the count for a specific reason (for tests).

func DecSSESubscribers added in v0.5.0

func DecSSESubscribers()

DecSSESubscribers is the counterpart to IncSSESubscribers.

func IncApprovalEvicted added in v0.5.0

func IncApprovalEvicted(reason string)

IncApprovalEvicted increments agentguard_approvals_evicted_total{reason=...}. Cardinality is bounded to the ApprovalEvicted* constants above.

func IncApprovalReplayMismatch added in v0.5.0

func IncApprovalReplayMismatch()

IncApprovalReplayMismatch increments agentguard_approval_replay_mismatch_total. Called from pkg/proxy.handleCheck when the approval-id round-trip lookup hits an entry but the retry request's shape differs from the original.

func IncAuditCorruptLine added in v0.5.0

func IncAuditCorruptLine()

IncAuditCorruptLine bumps agentguard_audit_corrupt_lines_total.

func IncAuditRotation added in v0.5.0

func IncAuditRotation()

IncAuditRotation increments agentguard_audit_rotations_total. Called from the FileLogger rotation success path after the new live file is open.

func IncDecision

func IncDecision(decision string)

IncDecision increments the appropriate decision counter.

func IncLLMProxyBufferOverflow added in v0.5.0

func IncLLMProxyBufferOverflow(provider string)

IncLLMProxyBufferOverflow increments agentguard_llmproxy_buffer_overflow_total{provider=...}. Provider MUST be "openai" or "anthropic" — the LLM proxy enforces that upstream so cardinality stays bounded.

func IncLLMProxyNonStreamingOverflow added in v0.5.0

func IncLLMProxyNonStreamingOverflow(provider string)

IncLLMProxyNonStreamingOverflow increments agentguard_llmproxy_non_streaming_overflow_total{provider=...}. Provider MUST be "openai" or "anthropic" — the LLM proxy enforces that upstream so cardinality stays bounded.

func IncLLMProxyStreamsRejected added in v0.5.0

func IncLLMProxyStreamsRejected()

IncLLMProxyStreamsRejected bumps agentguard_llmproxy_streams_rejected_total. Called once per streaming request that was refused with 503 because the global cap was already at MaxConcurrentStreams.

func IncNotifyDropped added in v0.5.0

func IncNotifyDropped(notifier, reason string)

IncNotifyDropped increments the labeled counter for a notification drop. notifier should be a bounded-cardinality notifier type ("webhook"/"slack"/"console"/"log"); reason should be a stable NotifyDropped* constant. Callers MUST NOT pass agent- or user-supplied strings here — that would explode Prometheus cardinality.
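A labeled counter keyed by a (notifier, reason) pair can be sketched with a struct-keyed map guarded by a mutex. The dropKey and dropCounter types below are hypothetical and not this package's internals; the point is that the series count equals the number of distinct key pairs, which is why both labels must come from small fixed sets.

```go
package main

import (
	"fmt"
	"sync"
)

const notifyDroppedQueueFull = "queue_full"

// dropKey is a hypothetical composite label key.
type dropKey struct{ notifier, reason string }

// dropCounter is a hypothetical labeled counter; one series exists per key.
type dropCounter struct {
	mu     sync.Mutex
	counts map[dropKey]uint64
}

// inc bumps the series for (notifier, reason), creating it on first use.
func (c *dropCounter) inc(notifier, reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts == nil {
		c.counts = make(map[dropKey]uint64)
	}
	c.counts[dropKey{notifier, reason}]++
}

// get returns the current count for (notifier, reason).
func (c *dropCounter) get(notifier, reason string) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[dropKey{notifier, reason}]
}

// series returns how many distinct label combinations exist.
func (c *dropCounter) series() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.counts)
}

func main() {
	var c dropCounter
	c.inc("webhook", notifyDroppedQueueFull)
	c.inc("webhook", notifyDroppedQueueFull)
	c.inc("slack", notifyDroppedQueueFull)
	fmt.Println(c.get("webhook", notifyDroppedQueueFull), c.series())
}
```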

func IncRateLimitBucketEvicted added in v0.5.0

func IncRateLimitBucketEvicted(scope string)

IncRateLimitBucketEvicted increments agentguard_ratelimit_bucket_evictions_total{scope=...}. Cardinality is bounded by the set of policy scopes (typically < 20 across a deployment).

func IncRateLimited

func IncRateLimited()

IncRateLimited increments the rate-limit-specific counter.

It used to also bump ChecksTotal/DeniedTotal, which double-counted rate-limited requests because logAndRespond unconditionally calls IncDecision("DENY") for the synthetic rate-limit DENY result. As of v0.5 the unified logAndRespond path owns ChecksTotal/DeniedTotal for every decision (including the synthetic rate-limit DENY); IncRateLimited only touches the rate-limit-specific series.

Closes R3 #21 (audit finding "rate-limited requests double-count ChecksTotal and DeniedTotal").

func IncRequestRejected added in v0.5.0

func IncRequestRejected(reason string)

IncRequestRejected increments agentguard_request_rejected_total{reason=...}.

func IncSSEEventDropped added in v0.5.0

func IncSSEEventDropped(reason string)

IncSSEEventDropped bumps the labeled counter for an SSE broadcast drop.

func IncSSESubscribers added in v0.5.0

func IncSSESubscribers()

IncSSESubscribers is called on Subscribe. The matching DecSSESubscribers call runs on Unsubscribe, so the gauge stays accurate even if a client drops without the server side noticing (Unsubscribe is always called from the SSE handler's defer).
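The increment-then-deferred-decrement pairing can be sketched as below. The subscribers variable and serveStream function are hypothetical stand-ins for the package's gauge and the SSE handler.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// subscribers is a hypothetical gauge of connected SSE clients.
var subscribers atomic.Int64

// serveStream registers a subscriber for the lifetime of stream and returns
// the gauge value observed while the stream was registered.
func serveStream(stream func()) (during int64) {
	subscribers.Add(1)
	defer subscribers.Add(-1) // runs on every exit path of this function
	during = subscribers.Load()
	stream()
	return during
}

func main() {
	during := serveStream(func() {})
	fmt.Println(during, subscribers.Load())
}
```

Because the decrement is deferred, it cannot be skipped by an early return added to the handler later.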

func LLMProxyBufferOverflowFor added in v0.5.0

func LLMProxyBufferOverflowFor(provider string) uint64

LLMProxyBufferOverflowFor returns the current count (for tests).

func LLMProxyNonStreamingOverflowFor added in v0.5.0

func LLMProxyNonStreamingOverflowFor(provider string) uint64

LLMProxyNonStreamingOverflowFor returns the current count (for tests).

func LLMProxyStreamsActive added in v0.5.0

func LLMProxyStreamsActive() int64

LLMProxyStreamsActive returns the current active-streams gauge value (for tests).

func LLMProxyStreamsRejectedTotal added in v0.5.0

func LLMProxyStreamsRejectedTotal() uint64

LLMProxyStreamsRejectedTotal returns the rejected-streams counter (for tests).

func MigrationStatusFor added in v0.5.0

func MigrationStatusFor(from, to, status string) int64

MigrationStatusFor returns the gauge value for a (from, to, status) triple (for tests).

func NotifyDroppedFor added in v0.5.0

func NotifyDroppedFor(notifier, reason string) uint64

NotifyDroppedFor returns the count for a specific (notifier, reason) pair (for tests).

func NotifyDroppedSnapshot added in v0.5.0

func NotifyDroppedSnapshot() map[notifyDroppedKey]uint64

NotifyDroppedSnapshot returns a copy of the current counts (for tests).

func ObserveNotifyDispatch added in v0.5.0

func ObserveNotifyDispatch(notifier string, seconds float64)

ObserveNotifyDispatch records a dispatch latency in seconds for the named notifier type. A missing histogram is created lazily; cardinality is bounded to the notifierType() domain in pkg/notify.

func RateLimitBucketEvictedFor added in v0.5.0

func RateLimitBucketEvictedFor(scope string) uint64

RateLimitBucketEvictedFor returns the eviction count for a scope (for tests).

func RequestRejectedSnapshot added in v0.5.0

func RequestRejectedSnapshot() map[string]uint64

RequestRejectedSnapshot returns a copy of the current counts (for tests).

func SSEEventDroppedFor added in v0.5.0

func SSEEventDroppedFor(reason string) uint64

SSEEventDroppedFor returns the count for a specific reason (for tests).

func SetAuditReplayDuration added in v0.5.0

func SetAuditReplayDuration(d time.Duration)

SetAuditReplayDuration records the duration of the startup audit replay. Expressed in seconds in the Prometheus output; nanoseconds are stored atomically under the hood so the setter is a single atomic store.

func SetLLMProxyStreamsActive added in v0.5.0

func SetLLMProxyStreamsActive(n int64)

SetLLMProxyStreamsActive updates the active-streams gauge. Called from the llmproxy server on every stream entry/exit (which atomically also updates the underlying server-side counter — this metric mirrors that counter). 0 is a valid value (no streams in flight).

func SetMigrationStatus added in v0.5.0

func SetMigrationStatus(from, to, status string, value int64)

SetMigrationStatus updates the migration-status gauge for a given (from, to, status) triple. Callers typically record one ran/skipped/failed value per migration per startup.

func SetNotifyQueueDepth added in v0.5.0

func SetNotifyQueueDepth(n int)

SetNotifyQueueDepth updates the notify dispatch queue depth gauge.

func SetPendingApprovals

func SetPendingApprovals(n int)

SetPendingApprovals sets the current queue depth gauge.

func SetRateLimitBuckets added in v0.5.0

func SetRateLimitBuckets(n int)

SetRateLimitBuckets updates the rate-limit bucket gauge. Called from the /metrics handler with Limiter.BucketCount() so operators can see bucket growth without exporting the limiter internals.

func WritePrometheus

func WritePrometheus(w io.Writer)

WritePrometheus writes all metrics to w in the Prometheus text exposition format (https://prometheus.io/docs/instrumenting/exposition_formats/).
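Because the function takes an io.Writer, it can be handed an http.ResponseWriter directly. The sketch below wires a stand-in with the same signature into a handler; writePrometheus, metricsHandler, and scrape are hypothetical names, and the fixed sample value is made up for the illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// writePrometheus is a hypothetical stand-in with the same signature as
// WritePrometheus; it emits one fixed gauge sample.
func writePrometheus(w io.Writer) {
	fmt.Fprintln(w, "# TYPE agentguard_pending_approvals gauge")
	fmt.Fprintln(w, "agentguard_pending_approvals 3")
}

// metricsHandler serves the text exposition format over HTTP.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	writePrometheus(w)
}

// scrape runs the handler in-process and returns the content type and body.
func scrape() (contentType, body string) {
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Header().Get("Content-Type"), rec.Body.String()
}

func main() {
	_, body := scrape()
	fmt.Print(body)
}
```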

Types

type Histogram

type Histogram struct {
	// contains filtered or unexported fields
}

Histogram tracks a distribution using cumulative bucket counts. Each bucket counts observations with value ≤ the bucket bound, which is the Prometheus histogram convention.

func (*Histogram) Observe

func (h *Histogram) Observe(ms float64)

Observe records one observation in milliseconds.

func (*Histogram) Snapshot

func (h *Histogram) Snapshot() (buckets []float64, counts []uint64, sum float64, total uint64)

Snapshot returns a copy of internal state under the lock.
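The cumulative-bucket convention, together with Observe and Snapshot, can be sketched as follows. This is an assumption-laden illustration: the lowercase histogram type is hypothetical, the bucket bounds are made up, and the real type may accumulate counts at render time rather than on each observation.

```go
package main

import (
	"fmt"
	"sync"
)

// histogram is a hypothetical cumulative histogram.
type histogram struct {
	mu     sync.Mutex
	bounds []float64 // upper bounds in ascending order
	counts []uint64  // counts[i] is the number of observations <= bounds[i]
	sum    float64
	total  uint64 // plays the role of the +Inf bucket
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe records one value, incrementing every bucket whose bound covers it.
func (h *histogram) observe(ms float64) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for i, b := range h.bounds {
		if ms <= b {
			h.counts[i]++
		}
	}
	h.sum += ms
	h.total++
}

// snapshot copies the state under the lock so callers never see the live slices.
func (h *histogram) snapshot() (buckets []float64, counts []uint64, sum float64, total uint64) {
	h.mu.Lock()
	defer h.mu.Unlock()
	buckets = append([]float64(nil), h.bounds...)
	counts = append([]uint64(nil), h.counts...)
	return buckets, counts, h.sum, h.total
}

func main() {
	h := newHistogram([]float64{1, 5, 10})
	for _, v := range []float64{0.5, 3, 7, 20} {
		h.observe(v)
	}
	_, counts, sum, total := h.snapshot()
	fmt.Println(counts, sum, total)
}
```

The observation of 20 exceeds every bound, so it appears only in the total, which is what a scraper would read as the +Inf bucket.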
