ratelimit

package ratelimit

Version: v1.0.10 (latest)
Published: Nov 30, 2025
License: MIT
Imports: 7
Imported by: 0

Repository

github.com/cecil-the-coder/ai-provider-kit

Documentation

Overview

Package ratelimit provides rate limiting functionality for AI provider API requests. It supports tracking rate limits across multiple providers including Anthropic, OpenAI, Cerebras, OpenRouter, and others.

Index

func FormatDuration(d time.Duration) string
func FormatQwenInfo(info *Info) string
type AnthropicParser
- func NewAnthropicParser() *AnthropicParser
- func (p *AnthropicParser) Parse(headers http.Header, model string) (*Info, error)
- func (p *AnthropicParser) ParseAndValidate(headers http.Header, model string) (*Info, error)
- func (p *AnthropicParser) ProviderName() string
type CerebrasParser
- func (p *CerebrasParser) Parse(headers http.Header, model string) (*Info, error)
- func (p *CerebrasParser) ProviderName() string
type GeminiParser
- func NewGeminiParser() *GeminiParser
- func (p *GeminiParser) Parse(headers http.Header, model string) (*Info, error)
- func (p *GeminiParser) ProviderName() string
type Info
type OpenAIParser
- func NewOpenAIParser() *OpenAIParser
- func (p *OpenAIParser) Parse(headers http.Header, model string) (*Info, error)
- func (p *OpenAIParser) ProviderName() string
type OpenRouterParser
- func NewOpenRouterParser() *OpenRouterParser
- func (p *OpenRouterParser) Parse(headers http.Header, model string) (*Info, error)
- func (p *OpenRouterParser) ProviderName() string
type Parser
type QwenParser
- func NewQwenParser(logHeaders bool) *QwenParser
- func (p *QwenParser) Parse(headers http.Header, model string) (*Info, error)
- func (p *QwenParser) ProviderName() string
type Tracker
- func NewTracker() *Tracker
- func (t *Tracker) CanMakeRequest(model string, estimatedTokens int) bool
- func (t *Tracker) Get(model string) (*Info, bool)
- func (t *Tracker) GetWaitTime(model string) time.Duration
- func (t *Tracker) ShouldThrottle(model string, threshold float64) bool
- func (t *Tracker) Update(info *Info)

Constants

This section is empty.

Variables

This section is empty.

Functions

func FormatDuration

func FormatDuration(d time.Duration) string

FormatDuration formats a duration as a human-readable string. This is useful for displaying reset times.

func FormatQwenInfo

func FormatQwenInfo(info *Info) string

FormatQwenInfo formats rate limit info for human-readable display. This is useful for logging and debugging.

Types

type AnthropicParser

type AnthropicParser struct{}

AnthropicParser implements the Parser interface for Anthropic's rate limit headers. Anthropic uses RFC 3339 timestamps for reset times and tracks input/output tokens separately.

Header format:

anthropic-ratelimit-requests-limit: Maximum requests allowed
anthropic-ratelimit-requests-remaining: Requests remaining
anthropic-ratelimit-requests-reset: RFC 3339 timestamp when limit resets
anthropic-ratelimit-tokens-limit: Maximum total tokens allowed
anthropic-ratelimit-tokens-remaining: Total tokens remaining
anthropic-ratelimit-tokens-reset: RFC 3339 timestamp when token limit resets
anthropic-ratelimit-input-tokens-limit: Maximum input tokens allowed
anthropic-ratelimit-input-tokens-remaining: Input tokens remaining
anthropic-ratelimit-input-tokens-reset: RFC 3339 timestamp when input token limit resets
anthropic-ratelimit-output-tokens-limit: Maximum output tokens allowed
anthropic-ratelimit-output-tokens-remaining: Output tokens remaining
anthropic-ratelimit-output-tokens-reset: RFC 3339 timestamp when output token limit resets
request-id: Unique request identifier
retry-after: Seconds to wait before retrying (on 429 responses)

func NewAnthropicParser

func NewAnthropicParser() *AnthropicParser

NewAnthropicParser creates a new Anthropic rate limit parser.

func (*AnthropicParser) Parse

func (p *AnthropicParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from Anthropic API response headers. It handles both standard token limits and separate input/output token limits. Missing headers are handled gracefully by leaving the corresponding fields at zero values.

func (*AnthropicParser) ParseAndValidate

func (p *AnthropicParser) ParseAndValidate(headers http.Header, model string) (*Info, error)

ParseAndValidate is a convenience method that parses headers and validates that at least some rate limit information was found.

func (*AnthropicParser) ProviderName

func (p *AnthropicParser) ProviderName() string

ProviderName returns "anthropic" as the provider identifier.

type CerebrasParser

type CerebrasParser struct{}

CerebrasParser implements the Parser interface for Cerebras API rate limits. Cerebras tracks both daily requests and per-minute limits for requests and tokens.

Cerebras Rate Limit Headers:

x-ratelimit-limit-requests-day: Daily request limit
x-ratelimit-remaining-requests-day: Remaining daily requests
x-ratelimit-reset-requests-day: Daily limit reset time (float seconds)
x-ratelimit-limit-requests-minute: Per-minute request limit
x-ratelimit-remaining-requests-minute: Remaining per-minute requests
x-ratelimit-reset-requests-minute: Per-minute limit reset time (float seconds)
x-ratelimit-limit-tokens-minute: Per-minute token limit
x-ratelimit-remaining-tokens-minute: Remaining per-minute tokens
x-ratelimit-reset-tokens-minute: Per-minute token reset time (float seconds)

Custom Headers:

cerebras-request-id: Unique request identifier
cerebras-processing-time: Processing time in seconds
cerebras-region: Data center region

func (*CerebrasParser) Parse

func (p *CerebrasParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from Cerebras API response headers. The model parameter is stored in the Info but doesn't affect parsing logic.

Note: Cerebras uses FLOAT SECONDS for reset times (e.g., "33011.382867"), measured from the time of the response; these are converted to absolute time.Time values by adding them to time.Now().

func (*CerebrasParser) ProviderName

func (p *CerebrasParser) ProviderName() string

ProviderName returns "cerebras" as the provider identifier.

type GeminiParser

type GeminiParser struct{}

GeminiParser implements the Parser interface for Google's Gemini API rate limit headers.

IMPORTANT LIMITATION: The Gemini API does NOT provide rate limit information in normal API responses. Unlike OpenAI, Anthropic, and other providers, Gemini does not include headers like x-ratelimit-limit-requests or x-ratelimit-remaining-requests in successful responses (200 OK).

This parser can only extract information from error responses (429 Too Many Requests):

retry-after: Number of seconds to wait before retrying (or HTTP date)

For proactive rate limiting with Gemini, client-side tracking is required:

Track your own request counts and timing
Implement token bucket or leaky bucket algorithms
Use official quota limits from Google Cloud Console
Monitor usage through Google Cloud Console/API

The Gemini API follows these rate limits (as of 2024):

Free tier: 15 RPM (requests per minute), 1 million TPM (tokens per minute)
Pay-as-you-go: 360 RPM, 4 million TPM (varies by model)
Limits are per project and can be viewed in Google Cloud Console

Since these limits are not provided in headers, this parser primarily serves to:

Extract retry-after duration from 429 error responses
Provide a consistent interface with other provider parsers
Return minimal Info with Provider and Model for tracking purposes

func NewGeminiParser

func NewGeminiParser() *GeminiParser

NewGeminiParser creates a new Gemini rate limit parser.

func (*GeminiParser) Parse

func (p *GeminiParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from Gemini API response headers.

Unlike other providers, Gemini does not return proactive rate limit headers. This parser only extracts the retry-after header when present (typically in 429 responses).

The retry-after header can be in two formats:

Integer seconds: "60" (wait 60 seconds)
HTTP date: "Wed, 21 Oct 2015 07:28:00 GMT" (wait until this time)

For normal successful responses (200 OK), this will return minimal Info with:

Provider: "gemini"
Model: the provided model name
All limit/remaining fields: 0 (unknown)
RetryAfter: 0 (no retry needed)

Parameters:

headers: HTTP response headers from Gemini API
model: The model identifier (e.g., "gemini-pro", "gemini-pro-vision")

Returns:

Info with minimal rate limit information
error is always nil (this parser doesn't fail)

func (*GeminiParser) ProviderName

func (p *GeminiParser) ProviderName() string

ProviderName returns "gemini" as the provider identifier.

type Info

type Info struct {
	// Provider is the name of the AI provider (e.g., "anthropic", "openai")
	Provider string `json:"provider"`

	// Model is the specific model identifier (e.g., "claude-3-opus-20240229")
	Model string `json:"model"`

	// Timestamp is when this rate limit information was captured
	Timestamp time.Time `json:"timestamp"`

	// RequestsLimit is the maximum number of requests allowed in the current window
	RequestsLimit int `json:"requests_limit"`

	// RequestsRemaining is the number of requests remaining in the current window
	RequestsRemaining int `json:"requests_remaining"`

	// RequestsReset is when the request limit counter will reset
	RequestsReset time.Time `json:"requests_reset"`

	// TokensLimit is the maximum number of tokens allowed in the current window
	TokensLimit int `json:"tokens_limit"`

	// TokensRemaining is the number of tokens remaining in the current window
	TokensRemaining int `json:"tokens_remaining"`

	// TokensReset is when the token limit counter will reset
	TokensReset time.Time `json:"tokens_reset"`

	// InputTokensLimit is the maximum number of input tokens allowed (Anthropic)
	InputTokensLimit int `json:"input_tokens_limit,omitempty"`

	// InputTokensRemaining is the number of input tokens remaining (Anthropic)
	InputTokensRemaining int `json:"input_tokens_remaining,omitempty"`

	// InputTokensReset is when the input token limit resets (Anthropic)
	InputTokensReset time.Time `json:"input_tokens_reset,omitempty"`

	// OutputTokensLimit is the maximum number of output tokens allowed (Anthropic)
	OutputTokensLimit int `json:"output_tokens_limit,omitempty"`

	// OutputTokensRemaining is the number of output tokens remaining (Anthropic)
	OutputTokensRemaining int `json:"output_tokens_remaining,omitempty"`

	// OutputTokensReset is when the output token limit resets (Anthropic)
	OutputTokensReset time.Time `json:"output_tokens_reset,omitempty"`

	// DailyRequestsLimit is the maximum number of requests per day (Cerebras)
	DailyRequestsLimit int `json:"daily_requests_limit,omitempty"`

	// DailyRequestsRemaining is the number of daily requests remaining (Cerebras)
	DailyRequestsRemaining int `json:"daily_requests_remaining,omitempty"`

	// DailyRequestsReset is when the daily request limit resets (Cerebras)
	DailyRequestsReset time.Time `json:"daily_requests_reset,omitempty"`

	// CreditsLimit is the maximum credits available (OpenRouter)
	CreditsLimit float64 `json:"credits_limit,omitempty"`

	// CreditsRemaining is the number of credits remaining (OpenRouter)
	CreditsRemaining float64 `json:"credits_remaining,omitempty"`

	// IsFreeTier indicates if the account is on the free tier (OpenRouter)
	IsFreeTier bool `json:"is_free_tier,omitempty"`

	// RequestID is the unique identifier for the request that generated this info
	RequestID string `json:"request_id,omitempty"`

	// RetryAfter indicates how long to wait before retrying (from Retry-After header)
	RetryAfter time.Duration `json:"retry_after,omitempty"`

	// CustomData holds any additional provider-specific data that doesn't fit standard fields
	CustomData map[string]interface{} `json:"custom_data,omitempty"`
}

Info contains rate limit information for a specific model from an AI provider. It includes standard rate limit fields as well as provider-specific fields to accommodate the varying rate limit schemes used by different AI providers.
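One encoding/json detail affects Info's tags: omitempty never omits struct-typed values, so zero time.Time fields such as InputTokensReset or DailyRequestsReset are still emitted as "0001-01-01T00:00:00Z" even though they carry omitempty. The hypothetical two-field struct below demonstrates this with the standard library alone; it does not import the package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// resetFields is a hypothetical stand-in reusing two of Info's tags.
type resetFields struct {
	// A zero int tagged omitempty is dropped from the output.
	DailyRequestsLimit int `json:"daily_requests_limit,omitempty"`
	// A zero time.Time tagged omitempty is still encoded, because omitempty
	// does not apply to struct values.
	DailyRequestsReset time.Time `json:"daily_requests_reset,omitempty"`
}

// encode marshals v and returns the JSON text (or the error message).
func encode(v resetFields) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(resetFields{}))
	// {"daily_requests_reset":"0001-01-01T00:00:00Z"}
}
```

Go 1.24 added an omitzero tag option that does omit zero time.Time values; without it, consumers of marshaled Info may need to treat the zero timestamp as "unknown".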

type OpenAIParser

type OpenAIParser struct{}

OpenAIParser implements the Parser interface for OpenAI's rate limit headers. OpenAI provides rate limit information in the following format:

x-ratelimit-limit-requests: Maximum requests allowed per time window
x-ratelimit-remaining-requests: Requests remaining in current window
x-ratelimit-reset-requests: Duration until window resets (e.g., "6m0s", "1h30m")
x-ratelimit-limit-tokens: Maximum tokens allowed per time window
x-ratelimit-remaining-tokens: Tokens remaining in current window
x-ratelimit-reset-tokens: Duration until token window resets
x-request-id: Unique identifier for the request
retry-after: Optional retry delay in seconds

Example

ExampleOpenAIParser demonstrates parsing OpenAI rate limit headers.

package main

import (
	"fmt"
	"net/http"
	"time"

	"github.com/cecil-the-coder/ai-provider-kit/pkg/ratelimit"
)

func main() {
	// Simulate OpenAI response headers
	headers := http.Header{
		"X-Ratelimit-Limit-Requests":     []string{"60"},
		"X-Ratelimit-Remaining-Requests": []string{"58"},
		"X-Ratelimit-Reset-Requests":     []string{"6m0s"},
		"X-Ratelimit-Limit-Tokens":       []string{"90000"},
		"X-Ratelimit-Remaining-Tokens":   []string{"85000"},
		"X-Ratelimit-Reset-Tokens":       []string{"1m30s"},
		"X-Request-Id":                   []string{"req_abc123"},
	}

	// Create parser and parse headers
	parser := ratelimit.NewOpenAIParser()
	info, err := parser.Parse(headers, "gpt-4")
	if err != nil {
		fmt.Printf("Error parsing headers: %v\n", err)
		return
	}

	// Display parsed information
	fmt.Printf("Provider: %s\n", info.Provider)
	fmt.Printf("Model: %s\n", info.Model)
	fmt.Printf("Requests: %d / %d remaining\n", info.RequestsRemaining, info.RequestsLimit)
	fmt.Printf("Tokens: %d / %d remaining\n", info.TokensRemaining, info.TokensLimit)
	fmt.Printf("Request ID: %s\n", info.RequestID)
	fmt.Printf("Requests reset in: %v\n", time.Until(info.RequestsReset).Round(time.Second))
	fmt.Printf("Tokens reset in: %v\n", time.Until(info.TokensReset).Round(time.Second))

}

Output:
Provider: openai
Model: gpt-4
Requests: 58 / 60 remaining
Tokens: 85000 / 90000 remaining
Request ID: req_abc123
Requests reset in: 6m0s
Tokens reset in: 1m30s

Example (WithTracker)

ExampleOpenAIParser_withTracker demonstrates using the parser with a rate limit tracker.

package main

import (
	"fmt"
	"net/http"
	"time"

	"github.com/cecil-the-coder/ai-provider-kit/pkg/ratelimit"
)

func main() {
	// Create a tracker to manage rate limits
	tracker := ratelimit.NewTracker()

	// Simulate parsing response headers
	headers := http.Header{
		"X-Ratelimit-Limit-Requests":     []string{"60"},
		"X-Ratelimit-Remaining-Requests": []string{"5"},
		"X-Ratelimit-Reset-Requests":     []string{"30s"},
		"X-Ratelimit-Limit-Tokens":       []string{"90000"},
		"X-Ratelimit-Remaining-Tokens":   []string{"1000"},
		"X-Ratelimit-Reset-Tokens":       []string{"30s"},
	}

	parser := ratelimit.NewOpenAIParser()
	info, err := parser.Parse(headers, "gpt-4")
	if err != nil {
		fmt.Printf("Error parsing headers: %v\n", err)
		return
	}

	// Update tracker with parsed info
	tracker.Update(info)

	// Check if we can make a request
	if tracker.CanMakeRequest("gpt-4", 500) {
		fmt.Println("Request allowed")
	} else {
		waitTime := tracker.GetWaitTime("gpt-4")
		fmt.Printf("Rate limited. Retry after: %v\n", waitTime.Round(time.Second))
	}

	// Check if we should throttle (99% threshold)
	if tracker.ShouldThrottle("gpt-4", 0.99) {
		fmt.Println("Approaching rate limits - consider throttling")
	}

}

Output:
Request allowed

func NewOpenAIParser

func NewOpenAIParser() *OpenAIParser

NewOpenAIParser creates a new OpenAI rate limit parser.

func (*OpenAIParser) Parse

func (p *OpenAIParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from OpenAI response headers. It handles both request-based and token-based rate limits. Reset times are provided as duration strings (e.g., "6m0s") which are parsed and converted to absolute timestamps.

func (*OpenAIParser) ProviderName

func (p *OpenAIParser) ProviderName() string

ProviderName returns "openai" as the provider identifier.

type OpenRouterParser

type OpenRouterParser struct{}

OpenRouterParser implements the Parser interface for OpenRouter's rate limit headers. OpenRouter provides rate limit information in the following format:

x-ratelimit-limit: Maximum credits or requests allowed per time window
x-ratelimit-remaining: Credits or requests remaining in current window
x-ratelimit-reset: Milliseconds since epoch when the limit resets
x-ratelimit-requests: Optional request count limit
x-ratelimit-tokens: Optional token count limit

OpenRouter uses a credit-based system where different models consume different amounts of credits per request. The free tier has different limits than paid tiers.

Note: OpenRouter also provides a proactive rate limit checking endpoint at /api/v1/key which can be used to query rate limits without making actual model requests. This parser only handles rate limit information from response headers.

func NewOpenRouterParser

func NewOpenRouterParser() *OpenRouterParser

NewOpenRouterParser creates a new OpenRouter rate limit parser.

func (*OpenRouterParser) Parse

func (p *OpenRouterParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from OpenRouter response headers. OpenRouter uses a hybrid system that can include both credit-based and request-based limits.

Key differences from other providers:

Reset time is in MILLISECONDS since epoch (not seconds or duration)
May have credit-based limits (x-ratelimit-limit/remaining as floats)
May have request-based limits (x-ratelimit-requests)
May have token-based limits (x-ratelimit-tokens)
Free tier accounts may have different limits

func (*OpenRouterParser) ProviderName

func (p *OpenRouterParser) ProviderName() string

ProviderName returns "openrouter" as the provider identifier.

type Parser

type Parser interface {
	// Parse extracts rate limit information from HTTP response headers.
	// It takes the response headers and model name as input and returns
	// a populated Info struct or an error if parsing fails.
	Parse(headers http.Header, model string) (*Info, error)

	// ProviderName returns the name of the provider this parser handles.
	// This is used for logging and tracking purposes.
	ProviderName() string
}

Parser is the interface that must be implemented by provider-specific rate limit parsers. Each AI provider has different header formats and schemes for communicating rate limits, so each provider needs its own parser implementation.

type QwenParser

type QwenParser struct {
	// contains filtered or unexported fields
}

QwenParser implements the Parser interface for Qwen's rate limit headers.

Qwen (DashScope API) uses a combination of:

OpenAI-compatible headers in compatible-mode (x-ratelimit-*)
DashScope-specific headers (dashscope-*, x-dashscope-*)
Request tracking headers (x-request-id, req-cost-time)
Standard retry-after headers for rate limit recovery

DISCOVERED HEADERS (through API testing):

x-request-id: Unique request identifier
req-cost-time: Request processing time in milliseconds
req-arrive-time: Request arrival timestamp
resp-start-time: Response start timestamp

LIKELY RATE LIMIT HEADERS (based on OpenAI-compatible mode):

x-ratelimit-limit-requests: Maximum requests allowed per time window
x-ratelimit-remaining-requests: Requests remaining in current window
x-ratelimit-reset-requests: Duration until window resets
x-ratelimit-limit-tokens: Maximum tokens allowed per time window
x-ratelimit-remaining-tokens: Tokens remaining in current window
x-ratelimit-reset-tokens: Duration until token window resets
retry-after: Seconds to wait before retrying (on 429 responses)

POTENTIAL DASHSCOPE-SPECIFIC HEADERS:

dashscope-ratelimit-*: Possible DashScope-specific rate limit headers
x-dashscope-*: Alternative DashScope header prefix

func NewQwenParser

func NewQwenParser(logHeaders bool) *QwenParser

NewQwenParser creates a new Qwen rate limit parser. Set logHeaders to true to enable header logging for debugging and documentation.

func (*QwenParser) Parse

func (p *QwenParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from Qwen response headers. It attempts multiple parsing strategies to handle undocumented header formats:

Standard x-ratelimit-* headers (OpenAI-compatible)
Qwen-specific qwen-ratelimit-* headers
Retry-After header for backoff timing

The implementation is deliberately flexible to accommodate various possible formats.

func (*QwenParser) ProviderName

func (p *QwenParser) ProviderName() string

ProviderName returns "qwen" as the provider identifier.

type Tracker

type Tracker struct {
	// contains filtered or unexported fields
}

Tracker provides thread-safe tracking of rate limit information across multiple models. It maintains the current rate limit state for each model and provides methods to check if requests can be made and when to retry.
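Thread-safe per-model state of this kind is commonly built by guarding a map with a sync.RWMutex. The sketch below assumes that design (the package's fields are unexported, so this is not its code); miniTracker and limitInfo are hypothetical stand-ins, and copying values in and out is an assumed precaution so callers cannot mutate shared state without holding the lock. limitInfo has no reference-typed fields, so a shallow copy suffices; a type with a map, such as Info's CustomData, would need a deeper copy.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// limitInfo is a trimmed, hypothetical stand-in for Info.
type limitInfo struct {
	Model             string
	RequestsRemaining int
	RequestsReset     time.Time
}

// miniTracker guards a per-model map: writers take the exclusive lock,
// readers share the read lock.
type miniTracker struct {
	mu    sync.RWMutex
	state map[string]*limitInfo
}

func newMiniTracker() *miniTracker {
	return &miniTracker{state: make(map[string]*limitInfo)}
}

// Update stores a copy of info keyed by its model.
func (t *miniTracker) Update(info *limitInfo) {
	if info == nil {
		return
	}
	cp := *info
	t.mu.Lock()
	defer t.mu.Unlock()
	t.state[info.Model] = &cp
}

// Get returns a copy of the stored info and whether the model was found.
func (t *miniTracker) Get(model string) (*limitInfo, bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	info, ok := t.state[model]
	if !ok {
		return nil, false
	}
	cp := *info
	return &cp, true
}

func main() {
	tr := newMiniTracker()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			tr.Update(&limitInfo{Model: "gpt-4", RequestsRemaining: n})
		}(i)
	}
	wg.Wait()
	info, ok := tr.Get("gpt-4")
	fmt.Println(ok, info.RequestsRemaining)
}
```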

func NewTracker

func NewTracker() *Tracker

NewTracker creates a new Tracker instance for tracking rate limits.

func (*Tracker) CanMakeRequest

func (t *Tracker) CanMakeRequest(model string, estimatedTokens int) bool

CanMakeRequest checks if a request can be made for the given model with the estimated number of tokens. It returns true if the request is likely to succeed based on current rate limits, false otherwise. This method is thread-safe.

func (*Tracker) Get

func (t *Tracker) Get(model string) (*Info, bool)

Get retrieves the rate limit information for a specific model. It returns the Info and a boolean indicating whether the model was found. This method is thread-safe and can be called concurrently.

func (*Tracker) GetWaitTime

func (t *Tracker) GetWaitTime(model string) time.Duration

GetWaitTime returns the duration to wait before the next request can be made for the given model. If no waiting is required, it returns 0. This method is thread-safe.

func (*Tracker) ShouldThrottle

func (t *Tracker) ShouldThrottle(model string, threshold float64) bool

ShouldThrottle determines whether requests should be throttled based on the current rate limit usage. The threshold parameter is a value between 0 and 1 representing the fraction of limits consumed at which throttling should begin. For example, threshold=0.8 means throttle when 80% of limits are consumed. This method is thread-safe.
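The threshold compares against the consumed fraction, 1 - remaining/limit. With the WithTracker example's figures, 5 of 60 requests remaining is about 0.917 consumed and 1000 of 90000 tokens remaining is about 0.989, so a 0.99 threshold does not trigger, consistent with that example printing no throttling message. The sketch assumes throttling starts once any single limit reaches the threshold (compared with >=); window, usedFraction and shouldThrottle are hypothetical names, not the package's internals.

```go
package main

import "fmt"

// window is a hypothetical (limit, remaining) pair for one rate limit.
type window struct{ limit, remaining int }

// usedFraction reports how much of a limit has been consumed, clamped to
// [0, 1]. An unknown limit (<= 0) counts as nothing consumed.
func usedFraction(w window) float64 {
	if w.limit <= 0 {
		return 0
	}
	used := float64(w.limit-w.remaining) / float64(w.limit)
	if used < 0 {
		return 0
	}
	if used > 1 {
		return 1
	}
	return used
}

// shouldThrottle reports whether any window has reached the threshold.
func shouldThrottle(threshold float64, windows ...window) bool {
	for _, w := range windows {
		if usedFraction(w) >= threshold {
			return true
		}
	}
	return false
}

func main() {
	requests := window{limit: 60, remaining: 5}
	tokens := window{limit: 90000, remaining: 1000}
	fmt.Printf("requests %.3f, tokens %.3f\n", usedFraction(requests), usedFraction(tokens))
	fmt.Println(shouldThrottle(0.99, requests, tokens)) // false
	fmt.Println(shouldThrottle(0.8, requests, tokens))  // true
}
```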

func (*Tracker) Update

func (t *Tracker) Update(info *Info)

Update updates the rate limit information for a model. This method is thread-safe and can be called concurrently.
