quotaheaders

package
v0.10.10 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 6, 2026 License: MIT Imports: 4 Imported by: 0

Documentation

Overview

Package quotaheaders parses provider-supplied rate-limit / quota response headers into a structured Signal. The Signal feeds the per-provider quota state machine introduced in fizeau-92b4b823 so dispatch can route around providers whose subscription/daily cap is hit (or imminently will be).

This package is intentionally limited to the subscription/daily exhaustion case. Per-second / per-minute throttling (a 429 with a short Retry-After) stays in the per-request feedback path.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Signal

type Signal struct {
	// Present is true when the response carried at least one recognized
	// rate-limit header. Consumers must check Present before reading the
	// numeric fields.
	Present bool

	// RemainingTokens is the remaining token budget in the current window.
	// -1 means the provider did not report a token budget on this response.
	RemainingTokens int64
	// RemainingRequests is the remaining request budget in the current
	// window. -1 means the provider did not report a request budget.
	RemainingRequests int64

	// ResetTime is the absolute wall-clock instant when the smaller of the
	// remaining-tokens / remaining-requests window resets. Zero when no
	// reset header was supplied.
	ResetTime time.Time

	// RetryAfter is the duration carried by an explicit Retry-After header.
	// Zero when absent. RetryAfter > 0 always indicates the provider has
	// already returned a 429-class response and wants the caller to wait.
	RetryAfter time.Duration
}

Signal is the structured shape returned by every per-provider parser. All fields are zero values when the corresponding header is missing.

The router treats a Signal as actionable only when Present is true; that guards against treating a response with no rate-limit headers (e.g. a 200 from a self-hosted endpoint) as "remaining tokens = 0".

func ParseAnthropic

func ParseAnthropic(h http.Header, now time.Time) Signal

ParseAnthropic decodes Anthropic Messages-API rate-limit headers.

Recognized canonical headers (current Anthropic docs):

anthropic-ratelimit-requests-{limit,remaining,reset}
anthropic-ratelimit-tokens-{limit,remaining,reset}
anthropic-ratelimit-input-tokens-{limit,remaining,reset}
anthropic-ratelimit-output-tokens-{limit,remaining,reset}
retry-after

The bead also references the legacy "X-RateLimit-Remaining-*" family; those names are accepted for forward-compatibility but the canonical "anthropic-ratelimit-*" names are preferred when both are present.

Reset values are RFC3339 timestamps (Anthropic) and are returned as the minimum of the per-axis reset times so the consumer wakes at the earliest recovery boundary.

func ParseOpenAI

func ParseOpenAI(h http.Header, now time.Time) Signal

ParseOpenAI decodes OpenAI Chat Completions rate-limit headers.

Recognized headers (https://platform.openai.com/docs/guides/rate-limits):

x-ratelimit-limit-requests
x-ratelimit-limit-tokens
x-ratelimit-remaining-requests
x-ratelimit-remaining-tokens
x-ratelimit-reset-requests   (duration: "1s", "100ms", "2m30s")
x-ratelimit-reset-tokens     (duration)
retry-after

Reset values are durations; this parser converts them to absolute times using `now`.

func ParseOpenRouter

func ParseOpenRouter(h http.Header, now time.Time) Signal

ParseOpenRouter decodes OpenRouter rate-limit headers.

Recognized headers (https://openrouter.ai/docs/limits):

x-ratelimit-limit
x-ratelimit-remaining
x-ratelimit-reset    (Unix ms timestamp)
retry-after

OpenRouter expresses one combined budget rather than separate token / request axes; the parser maps it onto RemainingRequests and leaves RemainingTokens at -1 ("not reported").

func (Signal) IsExhausted

func (s Signal) IsExhausted(now time.Time) (bool, time.Time)

IsExhausted applies the conservative "remaining tokens/requests would not last to the next reset" heuristic from the bead. The current rule:

  • explicit Retry-After > 0 (provider already returned 429) — exhausted
  • RemainingRequests reported and == 0 — exhausted
  • RemainingTokens reported and == 0 — exhausted

Returns the wall-clock retry-after time (zero when nothing actionable was signaled). The "will-exhaust-before-reset" predicate is intentionally conservative: only zero-budget triggers exhaustion. That keeps false positives out of the dispatch path while still catching the cases where the provider has authoritatively said "no more headroom in this window."

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL