prom

package
v0.18.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 10, 2026 License: MIT Imports: 17 Imported by: 6

Documentation

Index

Examples

Constants

This section is empty.

Variables

View Source
var Namespace = "parapet"

Namespace is the prometheus namespace

Functions

func Cache added in v0.18.0

func Cache() cache.ResultFunc

Cache returns a cache.ResultFunc that records cache observability metrics on the shared registry, for wiring into cache.Options.OnResult:

store := cache.NewMemory(256 << 20)
c := cache.New(store, cache.Options{OnResult: prom.Cache()})

It registers two metrics (lazily, once per process):

{namespace}_cache_total{host,result}             counter of cache outcomes
    (result = HIT|MISS|STALE|STALE_ERROR|BYPASS — a hit ratio is
     sum(HIT) / sum(all), the otherwise-invisible BYPASS path included)
{namespace}_cache_fill_duration_seconds{host}    histogram of origin-fill latency
    (observed only when the origin was contacted, i.e. MISS and stale-if-error)

The host label matches prom.Requests so the two can be joined. Keeping the metric wiring here leaves pkg/cache free of any Prometheus dependency. Compose it with cache.LogResult to also emit a per-request cacheStatus log field:

metrics := prom.Cache()
c := cache.New(store, cache.Options{
	OnResult: func(r *http.Request, info cache.ResultInfo) {
		metrics(r, info)
		cache.LogResult(r, info)
	},
})
Example

Cache observability: count outcomes and fill latency, and tag access logs.

store := cache.NewMemory(256 << 20)

metrics := Cache() // prom.Cache()
_ = cache.New(store, cache.Options{
	// Compose metrics with the cacheStatus log field; pass prom.Cache() alone if
	// you only want metrics.
	OnResult: func(r *http.Request, info cache.ResultInfo) {
		metrics(r, info)
		cache.LogResult(r, info)
	},
})

func Connections

func Connections(s *parapet.Server)

Connections collects connection metrics from server

func Handler

func Handler() http.Handler

Handler returns prom handler

func Mirror added in v0.18.0

func Mirror() mirror.MirrorFunc

Mirror returns a mirror.MirrorFunc that records shadow-traffic metrics on the shared registry, for wiring into Mirror.Observe — keeping pkg/mirror Prometheus-free (the prom.Upstream convention). mirror_total{outcome} counts each decision/result; mirror_request_duration_seconds observes completed round-trips.

Example
mr := mirror.New()
mr.Observe = Mirror() // prom.Mirror(): count by outcome, observe completed-round-trip latency
_ = mr                // mr.Use(canary); s.Use(mr)

func Networks added in v0.6.0

func Networks(s *parapet.Server)

Networks tracks network io

func RateLimit added in v0.18.0

func RateLimit() ratelimit.ObserveFunc

RateLimit returns a ratelimit.ObserveFunc that records rate-limit decisions on the shared registry, for wiring into RateLimiter.Observe — keeping pkg/ratelimit Prometheus-free (the prom.Mirror/prom.UpstreamState convention).

rl := ratelimit.FixedWindowPerSecond(100)
rl.Name = "api"            // bounded label; "" is fine
rl.Observe = prom.RateLimit()
s.Use(rl)

It records one metric (lazily, once per process):

{namespace}_ratelimit_total{name,result}  counter, result = allowed | limited

The hook is NON-replacing: it fires on every Take decision IN ADDITION to ExceededHandler, so an operator can count rejections (result="limited") without reimplementing the 429 response. Both labels are bounded — name is operator-set (RateLimiter.Name) and result is the two-value closed set; the client key, whose cardinality is unbounded, is never a label.

IMPORTANT (the silent fail-open): a RedisFixedWindowStrategy with FailOpen=true (the constructor default) ADMITS every request when Redis is down — and those admits land in result="allowed", indistinguishable here from genuinely-under-limit admits. Also wire RateLimitRedisError into RedisFixedWindowStrategy.OnError to make Redis-down a distinct, alertable series.

Example
rl := ratelimit.FixedWindowPerSecond(100)
rl.Name = "api"          // bounded label carried on every Event; "" is fine
rl.Observe = RateLimit() // prom.RateLimit(): count decisions by name+result
_ = rl                   // s.Use(rl)

func RateLimitRedisError added in v0.18.0

func RateLimitRedisError() func(error)

RateLimitRedisError returns a func(error) for wiring into RedisFixedWindowStrategy.OnError, so a swallowed Redis error (timeout, dial, script error) — which RedisFixedWindowStrategy.Take folds into the FailOpen bool and the RateLimiter therefore cannot see — surfaces as a distinct counter on the shared registry:

{namespace}_ratelimit_redis_errors_total  counter

s := &ratelimit.RedisFixedWindowStrategy{Runner: r, Max: 100, Size: time.Second, FailOpen: true}
s.OnError = prom.RateLimitRedisError()
rl := ratelimit.New(s)
rl.Observe = prom.RateLimit()

It is a SEPARATE counter rather than a result="error" row on ratelimit_total by design: the OnError hook fires on the strategy, which has no access to the RateLimiter.Name, and at a different granularity (once per failed Redis op, while a fail-open request ALSO increments result="allowed") — folding the two together would carry a permanently-empty name label and double-count the request. As its own series it stays honest: when Redis degrades it climbs while result="limited" stays flat, which is exactly the otherwise-invisible silent-admit signal to alert on. The error value is intentionally not labelled (unbounded). Optional and zero-cost when unset (a nil OnError is never called).

Example
s := &ratelimit.RedisFixedWindowStrategy{Max: 100, Size: time.Second, FailOpen: true}
s.OnError = RateLimitRedisError() // count Redis-down so silent fail-open admits are visible
rl := ratelimit.New(s)
rl.Name = "api"
rl.Observe = RateLimit()
_ = rl // s.Use(rl)

func Registry

func Registry() *prometheus.Registry

Registry returns prometheus registry

func Requests

func Requests() parapet.Middleware

Requests collects request count

func Start

func Start(addr string) error

Start starts prom server

func Upstream added in v0.18.0

func Upstream() upstream.RoundTripFunc

Upstream returns an upstream.RoundTripFunc that records per-backend origin metrics on the shared registry, for wiring into Upstream.OnRoundTrip:

u := upstream.New(lb)
u.OnRoundTrip = prom.Upstream()
s.Use(u)

It registers two metrics (lazily, once per process):

{namespace}_upstream_requests{host,status}                  counter of attempts
    (status = the origin's numeric code, or "error" for a transport failure;
     an origin-error rate is sum(status=~"5..") + sum(status="error"))
{namespace}_upstream_request_duration_seconds{host}         histogram of TTFB
    (transport round-trip latency: connect + send + time to response headers)
{namespace}_upstream_fast_rejects_total{host}               counter of requests
    a reliability balancer shed before any round-trip (ErrUnavailable). The host
    is "" for a shed before any pick; pair with prom.UpstreamState for circuit
    and ejection state.

It fires once per attempt, so retries are counted individually. The host label is the resolved upstream target (operator-configured, bounded), distinct from the client-facing host of prom.Requests. Keeping the wiring here leaves pkg/upstream free of any Prometheus dependency.

Example

Per-backend origin metrics: request count by status and time-to-first-byte.

lb := upstream.NewRoundRobinLoadBalancer([]*upstream.Target{
	{Host: "10.0.0.1:8080", Transport: &upstream.HTTPTransport{}},
	{Host: "10.0.0.2:8080", Transport: &upstream.HTTPTransport{}},
})

u := upstream.New(lb)
u.OnRoundTrip = Upstream() // prom.Upstream(): count by host+status, observe TTFB
_ = u                      // s.Use(u)

func UpstreamInflight added in v0.18.0

func UpstreamInflight(lb *upstream.LeastConnLoadBalancer)

UpstreamInflight registers a LeastConnLoadBalancer with a scrape-time collector that exports its live bulkhead occupancy on the shared registry:

lb := upstream.NewLeastConnLoadBalancer(targets)
prom.UpstreamInflight(lb)
s.Use(upstream.New(lb))

It exports two gauges, both read live at scrape time from lb.Inflight() — nothing runs on the claim/dec hot path:

{namespace}_upstream_inflight{host}           in-flight requests on the target right now
{namespace}_upstream_inflight_capacity{host}  the target's MaxConcurrent cap (bounded targets only)

Saturation of a target is inflight/inflight_capacity; a target pinned at 1.0 is the one driving any upstream_shed_total{reason="saturated"} (see prom.UpstreamShed). The host label is the operator-configured upstream target (bounded). Call it for each LeastConn pool you want observed; the single underlying collector is registered once per process. Host labels must be unique across registered pools: if the same host appears in two pools the collector dedups it (first registered pool wins, INCLUDING its inflight and capacity values) to keep the scrape valid. Keeping the wiring here leaves pkg/upstream free of any Prometheus dependency.

Example

Bulkhead saturation observability: a live per-target in-flight gauge plus a shed-by-cause counter, so a dashboard shows which target is pinned at its cap and whether the balancer is shedding because it's saturated vs an empty/dark pool.

lb := upstream.NewLeastConnLoadBalancer([]*upstream.Target{
	{Host: "10.0.0.1:8080", Transport: &upstream.HTTPTransport{}, MaxConcurrent: 100},
	{Host: "10.0.0.2:8080", Transport: &upstream.HTTPTransport{}, MaxConcurrent: 100},
})
UpstreamInflight(lb)       // gauges: parapet_upstream_inflight{host} + _capacity{host}
lb.OnShed = UpstreamShed() // counter: parapet_upstream_shed_total{reason}
_ = lb                     // s.Use(upstream.New(lb))

func UpstreamShed added in v0.18.0

func UpstreamShed() upstream.ShedFunc

UpstreamShed returns an upstream.ShedFunc that counts LeastConnLoadBalancer sheds on the shared registry, for wiring into LeastConnLoadBalancer.OnShed:

lb := upstream.NewLeastConnLoadBalancer(targets)
lb.OnShed = prom.UpstreamShed()
s.Use(upstream.New(lb))

It registers one metric (lazily, once per process):

{namespace}_upstream_shed_total{reason}  counter of pre-round-trip sheds, reason:
    "saturated" — every gate-up target was at its MaxConcurrent cap (the brownout)
    "empty"     — the pool had no targets configured
    "all_dark"  — the active-HC gate marked the whole pool down

This disambiguates a capacity brownout (reason="saturated") from a dead/empty pool (reason="empty"/"all_dark") and from a circuit-breaker all-open shed (a different balancer, reported via prom.UpstreamState) — all of which otherwise collapse into prom.Upstream's host-less upstream_fast_rejects_total{host=""}. Alert on the RATE of reason="saturated" to catch sustained bulkhead overload. The reason label is a closed three-value set, so the series can never grow unbounded. The wiring here leaves pkg/upstream free of any Prometheus dependency.

func UpstreamState added in v0.18.0

func UpstreamState() upstream.StateChangeFunc

UpstreamState returns an upstream.StateChangeFunc that records per-target circuit-breaker / ejection state on the shared registry, for wiring into a reliability balancer's OnStateChange:

lb := upstream.NewCircuitBreakingLoadBalancer(targets)
lb.OnStateChange = prom.UpstreamState()
s.Use(upstream.New(lb))

It registers three metrics (lazily, once per process):

{namespace}_upstream_breaker_state{host}                           gauge: 0 closed, 1 open, 2 half_open
{namespace}_upstream_state_transitions_total{host,from,to,reason}  counter of transitions
{namespace}_upstream_probe_down_total{host,cause}                  counter of active-HC probe-down events by cause

The transitions counter is the authoritative signal: its increments are exact and order-independent, so alert on it. The gauge is a best-effort convenience — under concurrent transitions on the same host the emit-after-commit ordering can leave it momentarily, or until the next transition, showing a stale value; and for the ejecting balancers it reflects committed eject/recover events, not cooldown-expiry rotation membership (a target whose cooldown expired but has not yet served a successful request reads open). A target that has never transitioned has no gauge sample. The host label is the operator-configured upstream target (bounded).

The probe_down counter breaks ActiveHealthCheck down-events out by classified failure cause (one of: timeout, refused, reset, dns, tls, status, error — a bounded closed set) for mid-incident triage; it is populated only by ActiveHealthCheck.OnStateChange and never by the circuit-breaker or ejecting balancers. Probe transitions still flow into transitions_total as reason="probe_down"/"probe_recover" with no change to that counter.

Example

Make circuit-breaker / ejection state observable: count transitions and track the current state per backend.

lb := upstream.NewCircuitBreakingLoadBalancer([]*upstream.Target{
	{Host: "10.0.0.1:8080", Transport: &upstream.HTTPTransport{}},
	{Host: "10.0.0.2:8080", Transport: &upstream.HTTPTransport{}},
})
lb.OnStateChange = UpstreamState() // prom.UpstreamState()

u := upstream.New(lb)
u.OnRoundTrip = Upstream() // prom.Upstream() — also counts fast-reject 503s
_ = u

func WAF added in v0.18.0

func WAF() waf.ObserveFunc

WAF returns a waf.ObserveFunc that records per-request WAF rule-evaluation latency on the shared registry, for wiring into WAF.Observe — keeping pkg/waf Prometheus-free (the prom.Mirror / prom.Cache convention):

w := waf.New()
w.Observe = prom.WAF()

It registers one metric (lazily, once per process):

{namespace}_waf_eval_duration_seconds{outcome}   histogram of rule-eval latency
    (outcome = pass|allow|block|error — pass is the no-match fall-through
     plus log-only matches plus FailOpen-swallowed errors; error is only a
     FailClosed terminating error). The span is rule evaluation ONLY: it
     excludes client body buffering and geo/ASN/request-map construction,
     matching waf.MatchEvent.Elapsed; the no-rules fast path is not recorded.
     Each outcome's _count child doubles as a per-outcome request rate, so no
     separate counter is registered.

Buckets are sub-ms..25ms, sized to the default 5ms EvalTimeout (DefBuckets would collapse the whole distribution into one bucket). A per-outcome tail-latency panel is:

histogram_quantile(0.99, sum by (le, outcome)
  (rate(parapet_waf_eval_duration_seconds_bucket[5m])))
Example

WAF observability: a per-request rule-eval latency histogram, split by outcome (pass|allow|block|error), so a dashboard can show whether the WAF is adding tail latency and on which path.

w := waf.New()
w.Observe = WAF() // prom.WAF(): records parapet_waf_eval_duration_seconds{outcome}
_ = w             // s.Use(w)

Types

This section is empty.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL