Documentation ¶
Overview ¶
Package ratelimit provides rate limiting functionality for AI provider API requests. It supports tracking rate limits across multiple providers including Anthropic, OpenAI, Cerebras, OpenRouter, and others.
Functions ¶
func FormatDuration ¶
FormatDuration formats a duration string in a human-readable way. This is useful for displaying reset times.
func FormatQwenInfo ¶
FormatQwenInfo formats rate limit info for human-readable display. This is useful for logging and debugging.
Types ¶
type AnthropicParser ¶
type AnthropicParser struct{}
AnthropicParser implements the Parser interface for Anthropic's rate limit headers. Anthropic uses RFC 3339 timestamps for reset times and tracks input/output tokens separately.
Header format:
- anthropic-ratelimit-requests-limit: Maximum requests allowed
- anthropic-ratelimit-requests-remaining: Requests remaining
- anthropic-ratelimit-requests-reset: RFC 3339 timestamp when limit resets
- anthropic-ratelimit-tokens-limit: Maximum total tokens allowed
- anthropic-ratelimit-tokens-remaining: Total tokens remaining
- anthropic-ratelimit-tokens-reset: RFC 3339 timestamp when token limit resets
- anthropic-ratelimit-input-tokens-limit: Maximum input tokens allowed
- anthropic-ratelimit-input-tokens-remaining: Input tokens remaining
- anthropic-ratelimit-input-tokens-reset: RFC 3339 timestamp when input token limit resets
- anthropic-ratelimit-output-tokens-limit: Maximum output tokens allowed
- anthropic-ratelimit-output-tokens-remaining: Output tokens remaining
- anthropic-ratelimit-output-tokens-reset: RFC 3339 timestamp when output token limit resets
- request-id: Unique request identifier
- retry-after: Seconds to wait before retrying (on 429 responses)
func NewAnthropicParser ¶
func NewAnthropicParser() *AnthropicParser
NewAnthropicParser creates a new Anthropic rate limit parser.
func (*AnthropicParser) Parse ¶
Parse extracts rate limit information from Anthropic API response headers. It handles both standard token limits and separate input/output token limits. Missing headers are handled gracefully by leaving the corresponding fields at zero values.
func (*AnthropicParser) ParseAndValidate ¶
ParseAndValidate is a convenience method that parses headers and validates that at least some rate limit information was found.
func (*AnthropicParser) ProviderName ¶
func (p *AnthropicParser) ProviderName() string
ProviderName returns "anthropic" as the provider identifier.
type CerebrasParser ¶
type CerebrasParser struct{}
CerebrasParser implements the Parser interface for Cerebras API rate limits. Cerebras tracks a daily request limit as well as per-minute limits for requests and tokens.
Cerebras Rate Limit Headers:
- x-ratelimit-limit-requests-day: Daily request limit
- x-ratelimit-remaining-requests-day: Remaining daily requests
- x-ratelimit-reset-requests-day: Daily limit reset time (float seconds)
- x-ratelimit-limit-requests-minute: Per-minute request limit
- x-ratelimit-remaining-requests-minute: Remaining per-minute requests
- x-ratelimit-reset-requests-minute: Per-minute limit reset time (float seconds)
- x-ratelimit-limit-tokens-minute: Per-minute token limit
- x-ratelimit-remaining-tokens-minute: Remaining per-minute tokens
- x-ratelimit-reset-tokens-minute: Per-minute token reset time (float seconds)
Custom Headers:
- cerebras-request-id: Unique request identifier
- cerebras-processing-time: Processing time in seconds
- cerebras-region: Data center region
func (*CerebrasParser) Parse ¶
Parse extracts rate limit information from Cerebras API response headers. The model parameter is stored in the Info but doesn't affect parsing logic.
Note: Cerebras uses FLOAT SECONDS for reset times (e.g., "33011.382867"), which are converted to absolute time.Time values by adding to time.Now().
func (*CerebrasParser) ProviderName ¶
func (p *CerebrasParser) ProviderName() string
ProviderName returns "cerebras" as the provider identifier.
type GeminiParser ¶
type GeminiParser struct{}
GeminiParser implements the Parser interface for Google's Gemini API rate limit headers.
IMPORTANT LIMITATION: The Gemini API does NOT provide rate limit information in normal API responses. Unlike OpenAI, Anthropic, and other providers, Gemini does not include headers like x-ratelimit-limit-requests or x-ratelimit-remaining-requests in successful responses (200 OK).
This parser can only extract information from error responses (429 Too Many Requests):
- retry-after: Number of seconds to wait before retrying (or HTTP date)
For proactive rate limiting with Gemini, client-side tracking is required:
- Track your own request counts and timing
- Implement token bucket or leaky bucket algorithms
- Use official quota limits from Google Cloud Console
- Monitor usage through Google Cloud Console/API
The Gemini API follows these rate limits (as of 2024):
- Free tier: 15 RPM (requests per minute), 1 million TPM (tokens per minute)
- Pay-as-you-go: 360 RPM, 4 million TPM (varies by model)
- Limits are per project and can be viewed in Google Cloud Console
Since these limits are not provided in headers, this parser primarily serves to:
- Extract retry-after duration from 429 error responses
- Provide a consistent interface with other provider parsers
- Return minimal Info with Provider and Model for tracking purposes
func NewGeminiParser ¶
func NewGeminiParser() *GeminiParser
NewGeminiParser creates a new Gemini rate limit parser.
func (*GeminiParser) Parse ¶
Parse extracts rate limit information from Gemini API response headers.
Unlike other providers, Gemini does not return proactive rate limit headers. This parser only extracts the retry-after header when present (typically in 429 responses).
The retry-after header can be in two formats:
- Integer seconds: "60" (wait 60 seconds)
- HTTP date: "Wed, 21 Oct 2015 07:28:00 GMT" (wait until this time)
For normal successful responses (200 OK), this will return minimal Info with:
- Provider: "gemini"
- Model: the provided model name
- All limit/remaining fields: 0 (unknown)
- RetryAfter: 0 (no retry needed)
Parameters:
- headers: HTTP response headers from Gemini API
- model: The model identifier (e.g., "gemini-pro", "gemini-pro-vision")
Returns:
- Info with minimal rate limit information
- error is always nil (this parser doesn't fail)
func (*GeminiParser) ProviderName ¶
func (p *GeminiParser) ProviderName() string
ProviderName returns "gemini" as the provider identifier.
type Info ¶
type Info struct {
	// Provider is the name of the AI provider (e.g., "anthropic", "openai")
	Provider string `json:"provider"`
	// Model is the specific model identifier (e.g., "claude-3-opus-20240229")
	Model string `json:"model"`
	// Timestamp is when this rate limit information was captured
	Timestamp time.Time `json:"timestamp"`
	// RequestsLimit is the maximum number of requests allowed in the current window
	RequestsLimit int `json:"requests_limit"`
	// RequestsRemaining is the number of requests remaining in the current window
	RequestsRemaining int `json:"requests_remaining"`
	// RequestsReset is when the request limit counter will reset
	RequestsReset time.Time `json:"requests_reset"`
	// TokensLimit is the maximum number of tokens allowed in the current window
	TokensLimit int `json:"tokens_limit"`
	// TokensRemaining is the number of tokens remaining in the current window
	TokensRemaining int `json:"tokens_remaining"`
	// TokensReset is when the token limit counter will reset
	TokensReset time.Time `json:"tokens_reset"`
	// InputTokensLimit is the maximum number of input tokens allowed (Anthropic)
	InputTokensLimit int `json:"input_tokens_limit,omitempty"`
	// InputTokensRemaining is the number of input tokens remaining (Anthropic)
	InputTokensRemaining int `json:"input_tokens_remaining,omitempty"`
	// InputTokensReset is when the input token limit resets (Anthropic)
	InputTokensReset time.Time `json:"input_tokens_reset,omitempty"`
	// OutputTokensLimit is the maximum number of output tokens allowed (Anthropic)
	OutputTokensLimit int `json:"output_tokens_limit,omitempty"`
	// OutputTokensRemaining is the number of output tokens remaining (Anthropic)
	OutputTokensRemaining int `json:"output_tokens_remaining,omitempty"`
	// OutputTokensReset is when the output token limit resets (Anthropic)
	OutputTokensReset time.Time `json:"output_tokens_reset,omitempty"`
	// DailyRequestsLimit is the maximum number of requests per day (Cerebras)
	DailyRequestsLimit int `json:"daily_requests_limit,omitempty"`
	// DailyRequestsRemaining is the number of daily requests remaining (Cerebras)
	DailyRequestsRemaining int `json:"daily_requests_remaining,omitempty"`
	// DailyRequestsReset is when the daily request limit resets (Cerebras)
	DailyRequestsReset time.Time `json:"daily_requests_reset,omitempty"`
	// CreditsLimit is the maximum credits available (OpenRouter)
	CreditsLimit float64 `json:"credits_limit,omitempty"`
	// CreditsRemaining is the number of credits remaining (OpenRouter)
	CreditsRemaining float64 `json:"credits_remaining,omitempty"`
	// IsFreeTier indicates if the account is on the free tier (OpenRouter)
	IsFreeTier bool `json:"is_free_tier,omitempty"`
	// RequestID is the unique identifier for the request that generated this info
	RequestID string `json:"request_id,omitempty"`
	// RetryAfter indicates how long to wait before retrying (from Retry-After header)
	RetryAfter time.Duration `json:"retry_after,omitempty"`
	// CustomData holds any additional provider-specific data that doesn't fit standard fields
	CustomData map[string]interface{} `json:"custom_data,omitempty"`
}
Info contains rate limit information for a specific model from an AI provider. It includes standard rate limit fields as well as provider-specific fields to accommodate the varying rate limit schemes used by different AI providers.
type OpenAIParser ¶
type OpenAIParser struct{}
OpenAIParser implements the Parser interface for OpenAI's rate limit headers. OpenAI provides rate limit information in the following format:
- x-ratelimit-limit-requests: Maximum requests allowed per time window
- x-ratelimit-remaining-requests: Requests remaining in current window
- x-ratelimit-reset-requests: Duration until window resets (e.g., "6m0s", "1h30m")
- x-ratelimit-limit-tokens: Maximum tokens allowed per time window
- x-ratelimit-remaining-tokens: Tokens remaining in current window
- x-ratelimit-reset-tokens: Duration until token window resets
- x-request-id: Unique identifier for the request
- retry-after: Optional retry delay in seconds
Example ¶
ExampleOpenAIParser demonstrates parsing OpenAI rate limit headers.
package main

import (
	"fmt"
	"net/http"
	"time"

	"github.com/cecil-the-coder/ai-provider-kit/pkg/ratelimit"
)

func main() {
	// Simulate OpenAI response headers
	headers := http.Header{
		"X-Ratelimit-Limit-Requests":     []string{"60"},
		"X-Ratelimit-Remaining-Requests": []string{"58"},
		"X-Ratelimit-Reset-Requests":     []string{"6m0s"},
		"X-Ratelimit-Limit-Tokens":       []string{"90000"},
		"X-Ratelimit-Remaining-Tokens":   []string{"85000"},
		"X-Ratelimit-Reset-Tokens":       []string{"1m30s"},
		"X-Request-Id":                   []string{"req_abc123"},
	}

	// Create parser and parse headers
	parser := ratelimit.NewOpenAIParser()
	info, err := parser.Parse(headers, "gpt-4")
	if err != nil {
		fmt.Printf("Error parsing headers: %v\n", err)
		return
	}

	// Display parsed information
	fmt.Printf("Provider: %s\n", info.Provider)
	fmt.Printf("Model: %s\n", info.Model)
	fmt.Printf("Requests: %d / %d remaining\n", info.RequestsRemaining, info.RequestsLimit)
	fmt.Printf("Tokens: %d / %d remaining\n", info.TokensRemaining, info.TokensLimit)
	fmt.Printf("Request ID: %s\n", info.RequestID)
	fmt.Printf("Requests reset in: %v\n", time.Until(info.RequestsReset).Round(time.Second))
	fmt.Printf("Tokens reset in: %v\n", time.Until(info.TokensReset).Round(time.Second))
}
Output:
Provider: openai
Model: gpt-4
Requests: 58 / 60 remaining
Tokens: 85000 / 90000 remaining
Request ID: req_abc123
Requests reset in: 6m0s
Tokens reset in: 1m30s
Example (WithTracker) ¶
ExampleOpenAIParser_withTracker demonstrates using the parser with a rate limit tracker.
package main

import (
	"fmt"
	"net/http"
	"time"

	"github.com/cecil-the-coder/ai-provider-kit/pkg/ratelimit"
)

func main() {
	// Create a tracker to manage rate limits
	tracker := ratelimit.NewTracker()

	// Simulate parsing response headers
	headers := http.Header{
		"X-Ratelimit-Limit-Requests":     []string{"60"},
		"X-Ratelimit-Remaining-Requests": []string{"5"},
		"X-Ratelimit-Reset-Requests":     []string{"30s"},
		"X-Ratelimit-Limit-Tokens":       []string{"90000"},
		"X-Ratelimit-Remaining-Tokens":   []string{"1000"},
		"X-Ratelimit-Reset-Tokens":       []string{"30s"},
	}
	parser := ratelimit.NewOpenAIParser()
	info, err := parser.Parse(headers, "gpt-4")
	if err != nil {
		// Do not feed the tracker if parsing failed
		fmt.Printf("Error parsing headers: %v\n", err)
		return
	}

	// Update tracker with parsed info
	tracker.Update(info)

	// Check if we can make a request that is estimated to use 500 tokens
	if tracker.CanMakeRequest("gpt-4", 500) {
		fmt.Println("Request allowed")
	} else {
		waitTime := tracker.GetWaitTime("gpt-4")
		fmt.Printf("Rate limited. Retry after: %v\n", waitTime.Round(time.Second))
	}

	// Check if we should throttle (99% threshold)
	if tracker.ShouldThrottle("gpt-4", 0.99) {
		fmt.Println("Approaching rate limits - consider throttling")
	}
}
Output: Request allowed
func NewOpenAIParser ¶
func NewOpenAIParser() *OpenAIParser
NewOpenAIParser creates a new OpenAI rate limit parser.
func (*OpenAIParser) Parse ¶
Parse extracts rate limit information from OpenAI response headers. It handles both request-based and token-based rate limits. Reset times are provided as duration strings (e.g., "6m0s") which are parsed and converted to absolute timestamps.
func (*OpenAIParser) ProviderName ¶
func (p *OpenAIParser) ProviderName() string
ProviderName returns "openai" as the provider identifier.
type OpenRouterParser ¶
type OpenRouterParser struct{}
OpenRouterParser implements the Parser interface for OpenRouter's rate limit headers. OpenRouter provides rate limit information in the following format:
- x-ratelimit-limit: Maximum credits or requests allowed per time window
- x-ratelimit-remaining: Credits or requests remaining in current window
- x-ratelimit-reset: Milliseconds since epoch when the limit resets
- x-ratelimit-requests: Optional request count limit
- x-ratelimit-tokens: Optional token count limit
OpenRouter uses a credit-based system where different models consume different amounts of credits per request. The free tier has different limits than paid tiers.
Note: OpenRouter also provides a proactive rate limit checking endpoint at /api/v1/key which can be used to query rate limits without making actual model requests. This parser only handles rate limit information from response headers.
func NewOpenRouterParser ¶
func NewOpenRouterParser() *OpenRouterParser
NewOpenRouterParser creates a new OpenRouter rate limit parser.
func (*OpenRouterParser) Parse ¶
Parse extracts rate limit information from OpenRouter response headers. OpenRouter uses a hybrid system that can include both credit-based and request-based limits.
Key differences from other providers:
- Reset time is in MILLISECONDS since epoch (not seconds or duration)
- May have credit-based limits (x-ratelimit-limit/remaining as floats)
- May have request-based limits (x-ratelimit-requests)
- May have token-based limits (x-ratelimit-tokens)
- Free tier accounts may have different limits
func (*OpenRouterParser) ProviderName ¶
func (p *OpenRouterParser) ProviderName() string
ProviderName returns "openrouter" as the provider identifier.
type Parser ¶
type Parser interface {
	// Parse extracts rate limit information from HTTP response headers.
	// It takes the response headers and model name as input and returns
	// a populated Info struct or an error if parsing fails.
	Parse(headers http.Header, model string) (*Info, error)
	// ProviderName returns the name of the provider this parser handles.
	// This is used for logging and tracking purposes.
	ProviderName() string
}
Parser is the interface that must be implemented by provider-specific rate limit parsers. Each AI provider has different header formats and schemes for communicating rate limits, so each provider needs its own parser implementation.
type QwenParser ¶
type QwenParser struct {
	// contains filtered or unexported fields
}
QwenParser implements the Parser interface for Qwen's rate limit headers.
Qwen (DashScope API) uses a combination of:
- OpenAI-compatible headers in compatible-mode (x-ratelimit-*)
- DashScope-specific headers (dashscope-*, x-dashscope-*)
- Request tracking headers (x-request-id, req-cost-time)
- Standard retry-after headers for rate limit recovery
DISCOVERED HEADERS (through API testing):
- x-request-id: Unique request identifier
- req-cost-time: Request processing time in milliseconds
- req-arrive-time: Request arrival timestamp
- resp-start-time: Response start timestamp
LIKELY RATE LIMIT HEADERS (based on OpenAI-compatible mode):
- x-ratelimit-limit-requests: Maximum requests allowed per time window
- x-ratelimit-remaining-requests: Requests remaining in current window
- x-ratelimit-reset-requests: Duration until window resets
- x-ratelimit-limit-tokens: Maximum tokens allowed per time window
- x-ratelimit-remaining-tokens: Tokens remaining in current window
- x-ratelimit-reset-tokens: Duration until token window resets
- retry-after: Seconds to wait before retrying (on 429 responses)
POTENTIAL DASHSCOPE-SPECIFIC HEADERS:
- dashscope-ratelimit-*: Possible DashScope-specific rate limit headers
- x-dashscope-*: Alternative DashScope header prefix
func NewQwenParser ¶
func NewQwenParser(logHeaders bool) *QwenParser
NewQwenParser creates a new Qwen rate limit parser. Set logHeaders to true to enable header logging for debugging and documentation.
func (*QwenParser) Parse ¶
Parse extracts rate limit information from Qwen response headers. It attempts multiple parsing strategies to handle undocumented header formats:
- Standard x-ratelimit-* headers (OpenAI-compatible)
- Qwen-specific qwen-ratelimit-* headers
- Retry-After header for backoff timing
The implementation is deliberately flexible to accommodate various possible formats.
func (*QwenParser) ProviderName ¶
func (p *QwenParser) ProviderName() string
ProviderName returns "qwen" as the provider identifier.
type Tracker ¶
type Tracker struct {
	// contains filtered or unexported fields
}
Tracker provides thread-safe tracking of rate limit information across multiple models. It maintains the current rate limit state for each model and provides methods to check if requests can be made and when to retry.
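Since Tracker's fields are unexported, its locking strategy is not visible here. A common way to get this kind of thread safety is a map guarded by a sync.RWMutex, sketched below; `safeTracker` and `modelState` are hypothetical types for illustration and say nothing definite about how Tracker is actually implemented.

```go
package main

import (
	"fmt"
	"sync"
)

// modelState is a hypothetical, trimmed-down stand-in for Info.
type modelState struct {
	RequestsRemaining int
}

// safeTracker is a hypothetical sketch, not the package's Tracker.
type safeTracker struct {
	mu     sync.RWMutex
	states map[string]modelState
}

func newSafeTracker() *safeTracker {
	return &safeTracker{states: make(map[string]modelState)}
}

// update stores the latest state for a model under the write lock.
func (t *safeTracker) update(model string, s modelState) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.states[model] = s
}

// get returns a copy of the state under the read lock, so many readers
// can proceed concurrently.
func (t *safeTracker) get(model string) (modelState, bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	s, ok := t.states[model]
	return s, ok
}

func main() {
	t := newSafeTracker()

	// Update from several goroutines at once.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			t.update(fmt.Sprintf("model-%d", n), modelState{RequestsRemaining: n})
		}(i)
	}
	wg.Wait()

	s, ok := t.get("model-3")
	fmt.Println(s.RequestsRemaining, ok)
}
```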
func NewTracker ¶
func NewTracker() *Tracker
NewTracker creates a new Tracker instance for tracking rate limits.
func (*Tracker) CanMakeRequest ¶
CanMakeRequest checks if a request can be made for the given model with the estimated number of tokens. It returns true if the request is likely to succeed based on current rate limits, false otherwise. This method is thread-safe.
func (*Tracker) Get ¶
Get retrieves the rate limit information for a specific model. It returns the Info and a boolean indicating whether the model was found. This method is thread-safe and can be called concurrently.
func (*Tracker) GetWaitTime ¶
GetWaitTime returns the duration to wait before the next request can be made for the given model. If no waiting is required, it returns 0. This method is thread-safe.
func (*Tracker) ShouldThrottle ¶
ShouldThrottle determines if requests should be throttled based on the current rate limit usage. The threshold parameter is a value between 0 and 1 representing the fraction of limits consumed at which throttling should begin. For example, threshold=0.8 means throttle when 80% of limits are consumed. This method is thread-safe.
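The threshold arithmetic can be sketched as follows. This is an assumption about the calculation, not the package's code: it treats usage as (limit - remaining) / limit, throttles when either the request or token usage reaches the threshold, and uses hypothetical function names.

```go
package main

import "fmt"

// usedFraction returns the share of a limit already consumed, clamped to [0, 1].
// An unknown limit (zero or negative) is treated as no usage.
func usedFraction(limit, remaining int) float64 {
	if limit <= 0 {
		return 0
	}
	used := float64(limit-remaining) / float64(limit)
	if used < 0 {
		return 0
	}
	if used > 1 {
		return 1
	}
	return used
}

// shouldThrottle reports whether request or token usage has reached the threshold.
func shouldThrottle(reqLimit, reqRemaining, tokLimit, tokRemaining int, threshold float64) bool {
	return usedFraction(reqLimit, reqRemaining) >= threshold ||
		usedFraction(tokLimit, tokRemaining) >= threshold
}

func main() {
	// 55 of 60 requests used (about 91.7%), 89000 of 90000 tokens used (about 98.9%).
	fmt.Println(shouldThrottle(60, 5, 90000, 1000, 0.8))  // true
	fmt.Println(shouldThrottle(60, 5, 90000, 1000, 0.99)) // false
}
```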