ratelimit

package
v1.0.62
Published: Dec 24, 2025 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package ratelimit provides rate limiting functionality for AI provider API requests. It parses rate limit headers from providers including Anthropic, OpenAI, Cerebras, OpenRouter, Gemini, and Qwen, and tracks the resulting limits per model.

Functions

func FormatDuration

func FormatDuration(d time.Duration) string

FormatDuration formats a duration in a human-readable way. This is useful for displaying reset times.

func FormatQwenInfo

func FormatQwenInfo(info *Info) string

FormatQwenInfo formats rate limit info for human-readable display. This is useful for logging and debugging.

func MustParseTime added in v1.0.60

func MustParseTime(s string) time.Time

MustParseTime parses an RFC 3339 timestamp and panics if s is not valid.

func WithCredits added in v1.0.60

func WithCredits(limit, remaining float64) func(*Info)

WithCredits returns a setter that populates the OpenRouter credit fields.

func WithCustomData added in v1.0.60

func WithCustomData(data map[string]interface{}) func(*Info)

WithCustomData returns a setter that populates the custom data map.

func WithDailyRequests added in v1.0.60

func WithDailyRequests(limit, remaining int, reset string) func(*Info)

WithDailyRequests returns a setter that populates the Cerebras daily request fields.

func WithFreeTier added in v1.0.60

func WithFreeTier(freeTier bool) func(*Info)

WithFreeTier returns a setter that populates the OpenRouter free tier flag.

func WithInputTokens added in v1.0.60

func WithInputTokens(limit, remaining int, reset string) func(*Info)

WithInputTokens returns a setter that populates the Anthropic input token fields.

func WithOutputTokens added in v1.0.60

func WithOutputTokens(limit, remaining int, reset string) func(*Info)

WithOutputTokens returns a setter that populates the Anthropic output token fields.

func WithRequestID added in v1.0.60

func WithRequestID(id string) func(*Info)

WithRequestID returns a setter that populates the request ID.

func WithRequests added in v1.0.60

func WithRequests(limit, remaining int, reset string) func(*Info)

WithRequests returns a setter that populates the request-related rate limit fields.

func WithRetryAfter added in v1.0.60

func WithRetryAfter(d time.Duration) func(*Info)

WithRetryAfter returns a setter that populates the retry-after duration.

func WithTokens added in v1.0.60

func WithTokens(limit, remaining int, reset string) func(*Info)

WithTokens returns a setter that populates the token-related rate limit fields.

Types

type AnthropicInfo added in v1.0.60

type AnthropicInfo struct {
	// InputTokensLimit is the maximum number of input tokens allowed
	InputTokensLimit int `json:"input_tokens_limit,omitempty"`

	// InputTokensRemaining is the number of input tokens remaining
	InputTokensRemaining int `json:"input_tokens_remaining,omitempty"`

	// InputTokensReset is when the input token limit resets
	InputTokensReset time.Time `json:"input_tokens_reset,omitempty"`

	// OutputTokensLimit is the maximum number of output tokens allowed
	OutputTokensLimit int `json:"output_tokens_limit,omitempty"`

	// OutputTokensRemaining is the number of output tokens remaining
	OutputTokensRemaining int `json:"output_tokens_remaining,omitempty"`

	// OutputTokensReset is when the output token limit resets
	OutputTokensReset time.Time `json:"output_tokens_reset,omitempty"`
}

AnthropicInfo contains Anthropic-specific rate limit fields. Anthropic tracks input and output tokens separately from aggregate tokens.

type AnthropicParser

type AnthropicParser struct{}

AnthropicParser implements the Parser interface for Anthropic's rate limit headers. Anthropic uses RFC 3339 timestamps for reset times and tracks input/output tokens separately.

Header format:

  • anthropic-ratelimit-requests-limit: Maximum requests allowed
  • anthropic-ratelimit-requests-remaining: Requests remaining
  • anthropic-ratelimit-requests-reset: RFC 3339 timestamp when limit resets
  • anthropic-ratelimit-tokens-limit: Maximum total tokens allowed
  • anthropic-ratelimit-tokens-remaining: Total tokens remaining
  • anthropic-ratelimit-tokens-reset: RFC 3339 timestamp when token limit resets
  • anthropic-ratelimit-input-tokens-limit: Maximum input tokens allowed
  • anthropic-ratelimit-input-tokens-remaining: Input tokens remaining
  • anthropic-ratelimit-input-tokens-reset: RFC 3339 timestamp when input token limit resets
  • anthropic-ratelimit-output-tokens-limit: Maximum output tokens allowed
  • anthropic-ratelimit-output-tokens-remaining: Output tokens remaining
  • anthropic-ratelimit-output-tokens-reset: RFC 3339 timestamp when output token limit resets
  • request-id: Unique request identifier
  • retry-after: Seconds to wait before retrying (on 429 responses)

func NewAnthropicParser

func NewAnthropicParser() *AnthropicParser

NewAnthropicParser creates a new Anthropic rate limit parser.

func (*AnthropicParser) Parse

func (p *AnthropicParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from Anthropic API response headers. It handles both standard token limits and separate input/output token limits. Missing headers are handled gracefully by leaving the corresponding fields at zero values.
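The graceful handling of missing headers described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's implementation: intOrZero and rfc3339OrZero are hypothetical helpers, and the header values are made up.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// intOrZero is a hypothetical helper (not part of the package). It returns 0
// when the value is missing or malformed, so absent headers leave zero values.
func intOrZero(v string) int {
	n, err := strconv.Atoi(v)
	if err != nil {
		return 0
	}
	return n
}

// rfc3339OrZero is a hypothetical helper. It returns the zero time.Time when
// the value is missing or is not a valid RFC 3339 timestamp.
func rfc3339OrZero(v string) time.Time {
	t, err := time.Parse(time.RFC3339, v)
	if err != nil {
		return time.Time{}
	}
	return t
}

func main() {
	// Made-up header values for illustration.
	h := http.Header{}
	h.Set("anthropic-ratelimit-input-tokens-limit", "40000")
	h.Set("anthropic-ratelimit-input-tokens-reset", "2025-01-01T00:01:00Z")

	limit := intOrZero(h.Get("anthropic-ratelimit-input-tokens-limit"))
	reset := rfc3339OrZero(h.Get("anthropic-ratelimit-input-tokens-reset"))
	// The output-token headers are absent, so this stays at the zero value.
	outReset := rfc3339OrZero(h.Get("anthropic-ratelimit-output-tokens-reset"))

	fmt.Println(limit, reset.Format(time.RFC3339), outReset.IsZero())
}
```

Callers that need to distinguish "header missing" from "limit is zero" would have to check the zero time with IsZero, as the last line does.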

func (*AnthropicParser) ParseAndValidate

func (p *AnthropicParser) ParseAndValidate(headers http.Header, model string) (*Info, error)

ParseAndValidate is a convenience method that parses headers and validates that at least some rate limit information was found.

func (*AnthropicParser) ProviderName

func (p *AnthropicParser) ProviderName() string

ProviderName returns "anthropic" as the provider identifier.

type BaseInfo added in v1.0.60

type BaseInfo struct {
	// Provider is the name of the AI provider (e.g., "anthropic", "openai")
	Provider string `json:"provider"`

	// Model is the specific model identifier (e.g., "claude-3-opus-20240229")
	Model string `json:"model"`

	// Timestamp is when this rate limit information was captured
	Timestamp time.Time `json:"timestamp"`

	// RequestsLimit is the maximum number of requests allowed in the current window
	RequestsLimit int `json:"requests_limit"`

	// RequestsRemaining is the number of requests remaining in the current window
	RequestsRemaining int `json:"requests_remaining"`

	// RequestsReset is when the request limit counter will reset
	RequestsReset time.Time `json:"requests_reset"`

	// TokensLimit is the maximum number of tokens allowed in the current window
	TokensLimit int `json:"tokens_limit"`

	// TokensRemaining is the number of tokens remaining in the current window
	TokensRemaining int `json:"tokens_remaining"`

	// TokensReset is when the token limit counter will reset
	TokensReset time.Time `json:"tokens_reset"`

	// RequestID is the unique identifier for the request that generated this info
	RequestID string `json:"request_id,omitempty"`

	// RetryAfter indicates how long to wait before retrying (from Retry-After header)
	RetryAfter time.Duration `json:"retry_after,omitempty"`

	// CustomData holds any additional provider-specific data that doesn't fit standard fields
	CustomData map[string]interface{} `json:"custom_data,omitempty"`
}

BaseInfo contains the common rate limit fields shared across all providers. This provides the foundation for provider-specific rate limit information.

type CerebrasInfo added in v1.0.60

type CerebrasInfo struct {
	// DailyRequestsLimit is the maximum number of requests per day
	DailyRequestsLimit int `json:"daily_requests_limit,omitempty"`

	// DailyRequestsRemaining is the number of daily requests remaining
	DailyRequestsRemaining int `json:"daily_requests_remaining,omitempty"`

	// DailyRequestsReset is when the daily request limit resets
	DailyRequestsReset time.Time `json:"daily_requests_reset,omitempty"`
}

CerebrasInfo contains Cerebras-specific rate limit fields. Cerebras provides daily request limits in addition to standard rate limits.

type CerebrasParser

type CerebrasParser struct{}

CerebrasParser implements the Parser interface for Cerebras API rate limits. Cerebras tracks both daily requests and per-minute limits for requests and tokens.

Cerebras Rate Limit Headers:

  • x-ratelimit-limit-requests-day: Daily request limit
  • x-ratelimit-remaining-requests-day: Remaining daily requests
  • x-ratelimit-reset-requests-day: Daily limit reset time (float seconds)
  • x-ratelimit-limit-requests-minute: Per-minute request limit
  • x-ratelimit-remaining-requests-minute: Remaining per-minute requests
  • x-ratelimit-reset-requests-minute: Per-minute limit reset time (float seconds)
  • x-ratelimit-limit-tokens-minute: Per-minute token limit
  • x-ratelimit-remaining-tokens-minute: Remaining per-minute tokens
  • x-ratelimit-reset-tokens-minute: Per-minute token reset time (float seconds)

Custom Headers:

  • cerebras-request-id: Unique request identifier
  • cerebras-processing-time: Processing time in seconds
  • cerebras-region: Data center region

func (*CerebrasParser) Parse

func (p *CerebrasParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from Cerebras API response headers. The model parameter is stored in the Info but doesn't affect parsing logic.

Note: Cerebras uses FLOAT SECONDS for reset times (e.g., "33011.382867"), which are converted to absolute time.Time values by adding to time.Now().
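A minimal sketch of the float-seconds conversion described in the note, using only the standard library. floatSecondsToDuration is a hypothetical helper, not the package's own code, and the rejection of negative or non-finite values is an assumption made for this example.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"time"
)

// floatSecondsToDuration is a hypothetical helper (not part of the package).
// It converts a value such as "33011.382867" into a time.Duration and reports
// false for missing, malformed, negative, or non-finite input.
func floatSecondsToDuration(v string) (time.Duration, bool) {
	secs, err := strconv.ParseFloat(v, 64)
	if err != nil || math.IsNaN(secs) || math.IsInf(secs, 0) || secs < 0 {
		return 0, false
	}
	return time.Duration(secs * float64(time.Second)), true
}

func main() {
	now := time.Now()
	if d, ok := floatSecondsToDuration("33011.382867"); ok {
		// The absolute reset time is relative to the moment of parsing.
		reset := now.Add(d)
		fmt.Println("resets in", d.Round(time.Second), "at", reset.Format(time.RFC3339))
	}
}
```

Because the reset time is derived from the local clock at parse time, it drifts by however long the response spent in transit.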

func (*CerebrasParser) ProviderName

func (p *CerebrasParser) ProviderName() string

ProviderName returns "cerebras" as the provider identifier.

type GeminiParser

type GeminiParser struct{}

GeminiParser implements the Parser interface for Google's Gemini API rate limit headers.

IMPORTANT LIMITATION: The Gemini API does NOT provide rate limit information in normal API responses. Unlike OpenAI, Anthropic, and other providers, Gemini does not include headers like x-ratelimit-limit-requests or x-ratelimit-remaining-requests in successful responses (200 OK).

This parser can only extract information from error responses (429 Too Many Requests):

  • retry-after: Number of seconds to wait before retrying (or HTTP date)

For proactive rate limiting with Gemini, client-side tracking is required:

  • Track your own request counts and timing
  • Implement token bucket or leaky bucket algorithms
  • Use official quota limits from Google Cloud Console
  • Monitor usage through Google Cloud Console/API

The Gemini API follows these rate limits (as of 2024):

  • Free tier: 15 RPM (requests per minute), 1 million TPM (tokens per minute)
  • Pay-as-you-go: 360 RPM, 4 million TPM (varies by model)
  • Limits are per project and can be viewed in Google Cloud Console

Since these limits are not provided in headers, this parser primarily serves to:

  1. Extract retry-after duration from 429 error responses
  2. Provide a consistent interface with other provider parsers
  3. Return minimal Info with Provider and Model for tracking purposes

func NewGeminiParser

func NewGeminiParser() *GeminiParser

NewGeminiParser creates a new Gemini rate limit parser.

func (*GeminiParser) Parse

func (p *GeminiParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from Gemini API response headers.

Unlike other providers, Gemini does not return proactive rate limit headers. This parser only extracts the retry-after header when present (typically in 429 responses).

The retry-after header can be in two formats:

  • Integer seconds: "60" (wait 60 seconds)
  • HTTP date: "Wed, 21 Oct 2015 07:28:00 GMT" (wait until this time)

For normal successful responses (200 OK), this will return minimal Info with:

  • Provider: "gemini"
  • Model: the provided model name
  • All limit/remaining fields: 0 (unknown)
  • RetryAfter: 0 (no retry needed)

Parameters:

  • headers: HTTP response headers from Gemini API
  • model: The model identifier (e.g., "gemini-pro", "gemini-pro-vision")

Returns:

  • Info with minimal rate limit information
  • error is always nil (this parser doesn't fail)

func (*GeminiParser) ProviderName

func (p *GeminiParser) ProviderName() string

ProviderName returns "gemini" as the provider identifier.

type Info

type Info struct {
	BaseInfo

	// Anthropic-specific fields (embedded for direct access)
	AnthropicInfo

	// Cerebras-specific fields (embedded for direct access)
	CerebrasInfo

	// OpenRouter-specific fields (embedded for direct access)
	OpenRouterInfo
}

Info contains rate limit information for a specific model from an AI provider. It uses composition to include common fields along with provider-specific fields. This structure allows for clean separation while maintaining backward compatibility.
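To illustrate the composition described above, the sketch below uses trimmed-down local copies of the types (only a few fields each, defined here for the example) to show how embedding promotes fields and how encoding/json flattens them into one object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed-down local copies of the package's types, for illustration only.
type BaseInfo struct {
	Provider      string `json:"provider"`
	RequestsLimit int    `json:"requests_limit"`
}

type AnthropicInfo struct {
	InputTokensLimit int `json:"input_tokens_limit,omitempty"`
}

type Info struct {
	BaseInfo
	AnthropicInfo
}

// encode marshals info to JSON; embedded fields are flattened into one object.
func encode(info Info) string {
	b, err := json.Marshal(info)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	info := Info{
		BaseInfo:      BaseInfo{Provider: "anthropic", RequestsLimit: 50},
		AnthropicInfo: AnthropicInfo{InputTokensLimit: 40000},
	}
	// Promoted fields are reachable without naming the embedded struct.
	fmt.Println(info.Provider, info.InputTokensLimit)
	// {"provider":"anthropic","requests_limit":50,"input_tokens_limit":40000}
	fmt.Println(encode(info))
}
```

One general encoding/json caveat: omitempty never omits a struct value such as time.Time, so a zero reset time is still emitted as "0001-01-01T00:00:00Z".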

func MakeTestInfo added in v1.0.60

func MakeTestInfo(provider, model string, setters ...func(*Info)) *Info

MakeTestInfo creates a test Info struct for the given provider and model, then applies each setter to it.

type OpenAIParser

type OpenAIParser struct{}

OpenAIParser implements the Parser interface for OpenAI's rate limit headers. OpenAI provides rate limit information in the following format:

  • x-ratelimit-limit-requests: Maximum requests allowed per time window
  • x-ratelimit-remaining-requests: Requests remaining in current window
  • x-ratelimit-reset-requests: Duration until window resets (e.g., "6m0s", "1h30m")
  • x-ratelimit-limit-tokens: Maximum tokens allowed per time window
  • x-ratelimit-remaining-tokens: Tokens remaining in current window
  • x-ratelimit-reset-tokens: Duration until token window resets
  • x-request-id: Unique identifier for the request
  • retry-after: Optional retry delay in seconds

Example

ExampleOpenAIParser demonstrates parsing OpenAI rate limit headers

package main

import (
	"fmt"
	"net/http"
	"time"

	"github.com/cecil-the-coder/ai-provider-kit/pkg/ratelimit"
)

func main() {
	// Simulate OpenAI response headers
	headers := http.Header{
		"X-Ratelimit-Limit-Requests":     []string{"60"},
		"X-Ratelimit-Remaining-Requests": []string{"58"},
		"X-Ratelimit-Reset-Requests":     []string{"6m0s"},
		"X-Ratelimit-Limit-Tokens":       []string{"90000"},
		"X-Ratelimit-Remaining-Tokens":   []string{"85000"},
		"X-Ratelimit-Reset-Tokens":       []string{"1m30s"},
		"X-Request-Id":                   []string{"req_abc123"},
	}

	// Create parser and parse headers
	parser := ratelimit.NewOpenAIParser()
	info, err := parser.Parse(headers, "gpt-4")
	if err != nil {
		fmt.Printf("Error parsing headers: %v\n", err)
		return
	}

	// Display parsed information
	fmt.Printf("Provider: %s\n", info.Provider)
	fmt.Printf("Model: %s\n", info.Model)
	fmt.Printf("Requests: %d / %d remaining\n", info.RequestsRemaining, info.RequestsLimit)
	fmt.Printf("Tokens: %d / %d remaining\n", info.TokensRemaining, info.TokensLimit)
	fmt.Printf("Request ID: %s\n", info.RequestID)
	fmt.Printf("Requests reset in: %v\n", time.Until(info.RequestsReset).Round(time.Second))
	fmt.Printf("Tokens reset in: %v\n", time.Until(info.TokensReset).Round(time.Second))

}

Output:

Provider: openai
Model: gpt-4
Requests: 58 / 60 remaining
Tokens: 85000 / 90000 remaining
Request ID: req_abc123
Requests reset in: 6m0s
Tokens reset in: 1m30s

Example (WithTracker)

ExampleOpenAIParser_withTracker demonstrates using the parser with a rate limit tracker

package main

import (
	"fmt"
	"net/http"
	"time"

	"github.com/cecil-the-coder/ai-provider-kit/pkg/ratelimit"
)

func main() {
	// Create a tracker to manage rate limits
	tracker := ratelimit.NewTracker()

	// Simulate parsing response headers
	headers := http.Header{
		"X-Ratelimit-Limit-Requests":     []string{"60"},
		"X-Ratelimit-Remaining-Requests": []string{"5"},
		"X-Ratelimit-Reset-Requests":     []string{"30s"},
		"X-Ratelimit-Limit-Tokens":       []string{"90000"},
		"X-Ratelimit-Remaining-Tokens":   []string{"1000"},
		"X-Ratelimit-Reset-Tokens":       []string{"30s"},
	}

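	parser := ratelimit.NewOpenAIParser()
	info, err := parser.Parse(headers, "gpt-4")
	if err != nil {
		// Do not pass a possibly nil Info to the tracker.
		fmt.Printf("Error parsing headers: %v\n", err)
		return
	}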

	// Update tracker with parsed info
	tracker.Update(info)

	// Check if we can make a request
	if tracker.CanMakeRequest("gpt-4", 500) {
		fmt.Println("Request allowed")
	} else {
		waitTime := tracker.GetWaitTime("gpt-4")
		fmt.Printf("Rate limited. Retry after: %v\n", waitTime.Round(time.Second))
	}

	// Check if we should throttle (99% threshold)
	if tracker.ShouldThrottle("gpt-4", 0.99) {
		fmt.Println("Approaching rate limits - consider throttling")
	}

}

Output:

Request allowed

func NewOpenAIParser

func NewOpenAIParser() *OpenAIParser

NewOpenAIParser creates a new OpenAI rate limit parser.

func (*OpenAIParser) Parse

func (p *OpenAIParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from OpenAI response headers. It handles both request-based and token-based rate limits. Reset times are provided as duration strings (e.g., "6m0s") which are parsed and converted to absolute timestamps.

func (*OpenAIParser) ProviderName

func (p *OpenAIParser) ProviderName() string

ProviderName returns "openai" as the provider identifier.

type OpenRouterInfo added in v1.0.60

type OpenRouterInfo struct {
	// CreditsLimit is the maximum credits available
	CreditsLimit float64 `json:"credits_limit,omitempty"`

	// CreditsRemaining is the number of credits remaining
	CreditsRemaining float64 `json:"credits_remaining,omitempty"`

	// IsFreeTier indicates if the account is on the free tier
	IsFreeTier bool `json:"is_free_tier,omitempty"`
}

OpenRouterInfo contains OpenRouter-specific rate limit fields. OpenRouter uses a credit-based system for rate limiting.

type OpenRouterParser

type OpenRouterParser struct{}

OpenRouterParser implements the Parser interface for OpenRouter's rate limit headers. OpenRouter provides rate limit information in the following format:

  • x-ratelimit-limit: Maximum credits or requests allowed per time window
  • x-ratelimit-remaining: Credits or requests remaining in current window
  • x-ratelimit-reset: Milliseconds since epoch when the limit resets
  • x-ratelimit-requests: Optional request count limit
  • x-ratelimit-tokens: Optional token count limit

OpenRouter uses a credit-based system where different models consume different amounts of credits per request. The free tier has different limits than paid tiers.

Note: OpenRouter also provides a proactive rate limit checking endpoint at /api/v1/key which can be used to query rate limits without making actual model requests. This parser only handles rate limit information from response headers.

func NewOpenRouterParser

func NewOpenRouterParser() *OpenRouterParser

NewOpenRouterParser creates a new OpenRouter rate limit parser.

func (*OpenRouterParser) Parse

func (p *OpenRouterParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from OpenRouter response headers. OpenRouter uses a hybrid system that can include both credit-based and request-based limits.

Key differences from other providers:

  • Reset time is in MILLISECONDS since epoch (not seconds or duration)
  • May have credit-based limits (x-ratelimit-limit/remaining as floats)
  • May have request-based limits (x-ratelimit-requests)
  • May have token-based limits (x-ratelimit-tokens)
  • Free tier accounts may have different limits
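Because the reset value is in milliseconds since the epoch, it needs time.UnixMilli rather than time.Unix. A minimal sketch, where resetFromEpochMillis is a hypothetical helper and the header value is made up:

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// resetFromEpochMillis is a hypothetical helper (not part of the package). It
// interprets v as milliseconds since the Unix epoch.
func resetFromEpochMillis(v string) (time.Time, error) {
	ms, err := strconv.ParseInt(v, 10, 64)
	if err != nil {
		return time.Time{}, fmt.Errorf("invalid reset value %q: %w", v, err)
	}
	return time.UnixMilli(ms).UTC(), nil
}

func main() {
	reset, err := resetFromEpochMillis("1735689660000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reset.Format(time.RFC3339)) // 2025-01-01T00:01:00Z
}
```

Passing the same number to time.Unix as seconds would yield a date tens of thousands of years in the future, which is a useful sanity check when debugging.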

func (*OpenRouterParser) ProviderName

func (p *OpenRouterParser) ProviderName() string

ProviderName returns "openrouter" as the provider identifier.

type Parser

type Parser interface {
	// Parse extracts rate limit information from HTTP response headers.
	// It takes the response headers and model name as input and returns
	// a populated Info struct or an error if parsing fails.
	Parse(headers http.Header, model string) (*Info, error)

	// ProviderName returns the name of the provider this parser handles.
	// This is used for logging and tracking purposes.
	ProviderName() string
}

Parser is the interface that must be implemented by provider-specific rate limit parsers. Each AI provider has different header formats and schemes for communicating rate limits, so each provider needs its own parser implementation.

type QwenParser

type QwenParser struct {
	// contains filtered or unexported fields
}

QwenParser implements the Parser interface for Qwen's rate limit headers.

Qwen (DashScope API) uses a combination of:

  1. OpenAI-compatible headers in compatible-mode (x-ratelimit-*)
  2. DashScope-specific headers (dashscope-*, x-dashscope-*)
  3. Request tracking headers (x-request-id, req-cost-time)
  4. Standard retry-after headers for rate limit recovery

DISCOVERED HEADERS (through API testing):

  • x-request-id: Unique request identifier
  • req-cost-time: Request processing time in milliseconds
  • req-arrive-time: Request arrival timestamp
  • resp-start-time: Response start timestamp

LIKELY RATE LIMIT HEADERS (based on OpenAI-compatible mode):

  • x-ratelimit-limit-requests: Maximum requests allowed per time window
  • x-ratelimit-remaining-requests: Requests remaining in current window
  • x-ratelimit-reset-requests: Duration until window resets
  • x-ratelimit-limit-tokens: Maximum tokens allowed per time window
  • x-ratelimit-remaining-tokens: Tokens remaining in current window
  • x-ratelimit-reset-tokens: Duration until token window resets
  • retry-after: Seconds to wait before retrying (on 429 responses)

POTENTIAL DASHSCOPE-SPECIFIC HEADERS:

  • dashscope-ratelimit-*: Possible DashScope-specific rate limit headers
  • x-dashscope-*: Alternative DashScope header prefix

func NewQwenParser

func NewQwenParser(logHeaders bool) *QwenParser

NewQwenParser creates a new Qwen rate limit parser. Set logHeaders to true to enable header logging for debugging and documentation.

func (*QwenParser) Parse

func (p *QwenParser) Parse(headers http.Header, model string) (*Info, error)

Parse extracts rate limit information from Qwen response headers. It attempts multiple parsing strategies to handle undocumented header formats:

  1. Standard x-ratelimit-* headers (OpenAI-compatible)
  2. Qwen-specific qwen-ratelimit-* headers
  3. Retry-After header for backoff timing

The implementation is deliberately flexible to accommodate various possible formats.

func (*QwenParser) ProviderName

func (p *QwenParser) ProviderName() string

ProviderName returns "qwen" as the provider identifier.

type Tracker

type Tracker struct {
	// contains filtered or unexported fields
}

Tracker provides thread-safe tracking of rate limit information across multiple models. It maintains the current rate limit state for each model and provides methods to check if requests can be made and when to retry.

func NewTracker

func NewTracker() *Tracker

NewTracker creates a new Tracker instance for tracking rate limits.

func (*Tracker) CanMakeRequest

func (t *Tracker) CanMakeRequest(model string, estimatedTokens int) bool

CanMakeRequest checks if a request can be made for the given model with the estimated number of tokens. It returns true if the request is likely to succeed based on current rate limits, false otherwise. This method is thread-safe.

func (*Tracker) Get

func (t *Tracker) Get(model string) (*Info, bool)

Get retrieves the rate limit information for a specific model. It returns the Info and a boolean indicating whether the model was found. This method is thread-safe and can be called concurrently.

func (*Tracker) GetWaitTime

func (t *Tracker) GetWaitTime(model string) time.Duration

GetWaitTime returns the duration to wait before the next request can be made for the given model. If no waiting is required, it returns 0. This method is thread-safe.

func (*Tracker) ShouldThrottle

func (t *Tracker) ShouldThrottle(model string, threshold float64) bool

ShouldThrottle determines if requests should be throttled based on the current rate limit usage. The threshold parameter is a value between 0 and 1 representing the fraction of a limit that must be consumed before throttling begins. For example, threshold=0.8 means throttle when 80% of limits are consumed. This method is thread-safe.
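The threshold arithmetic can be sketched as follows. This is one plausible reading, not the Tracker's confirmed logic: consumedFraction and shouldThrottle are hypothetical, and both the >= comparison and the rule that either limit alone can trigger throttling are assumptions.

```go
package main

import "fmt"

// consumedFraction is a hypothetical helper. It returns the share of a limit
// that has been used, or 0 when the limit is unknown (zero or negative).
func consumedFraction(limit, remaining int) float64 {
	if limit <= 0 {
		return 0
	}
	return float64(limit-remaining) / float64(limit)
}

// shouldThrottle assumes throttling starts once either the request limit or
// the token limit has been consumed up to the threshold.
func shouldThrottle(reqLimit, reqRemaining, tokLimit, tokRemaining int, threshold float64) bool {
	return consumedFraction(reqLimit, reqRemaining) >= threshold ||
		consumedFraction(tokLimit, tokRemaining) >= threshold
}

func main() {
	// 55 of 60 requests used (about 91.7%), 89000 of 90000 tokens used (about 98.9%).
	fmt.Println(shouldThrottle(60, 5, 90000, 1000, 0.99)) // false
	fmt.Println(shouldThrottle(60, 5, 90000, 1000, 0.8))  // true
}
```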

func (*Tracker) Update

func (t *Tracker) Update(info *Info)

Update updates the rate limit information for a model. This method is thread-safe and can be called concurrently.
