Documentation
Overview ¶
Package retry provides exponential backoff retry logic for handling transient errors.
This package implements retry mechanisms with exponential backoff and jitter, designed specifically for handling rate limit and quota errors from external APIs. It integrates with the workqueue package to enable graceful backoff when retries are exhausted.
Features ¶
- Exponential backoff with configurable base and maximum delays
- Random jitter to prevent thundering herd problems
- Context-aware cancellation during retry loops
- Integration with workqueue for deferred retry after exhaustion
- Generic retry function supporting any return type
- Configurable retry predicates to distinguish retryable from permanent errors
Usage ¶
The primary entry point is RetryWithBackoff, which executes a function with automatic retry on transient errors:
cfg := retry.DefaultRetryConfig()
result, err := retry.RetryWithBackoff(
    ctx,
    cfg,
    "fetch_data",
    isRetryable,
    func() (string, error) {
        return api.FetchData()
    },
)
For workqueue integration, use RequeueIfRetryable to convert exhausted retries into a delayed requeue:
err := doWorkWithRetries(ctx)
if requeueErr := retry.RequeueIfRetryable(ctx, err, isRetryable, "OpenAI"); requeueErr != nil {
    return requeueErr
}
return err
Configuration ¶
RetryConfig controls retry behavior:
- MaxRetries: Maximum number of retry attempts (0 disables retries)
- BaseBackoff: Initial backoff duration (doubled each attempt)
- MaxBackoff: Maximum backoff duration (caps exponential growth)
- MaxJitter: Maximum random jitter added to each backoff
Use DefaultRetryConfig() for sensible defaults tuned for quota-based rate limits, or construct a custom RetryConfig for specific requirements.
Integration Patterns ¶
Retry with workqueue integration:
func (w *Worker) Process(ctx context.Context, item string) error {
    result, err := retry.RetryWithBackoff(
        ctx,
        retry.DefaultRetryConfig(),
        "process_item",
        isQuotaError,
        func() (string, error) {
            return w.api.Process(item)
        },
    )
    if err != nil {
        if requeueErr := retry.RequeueIfRetryable(ctx, err, isQuotaError, "API"); requeueErr != nil {
            return requeueErr
        }
        return err
    }
    return w.store.Save(result)
}
Custom retry predicate:
func isRetryable(err error) bool {
    var apiErr *APIError
    if errors.As(err, &apiErr) {
        return apiErr.Code == 429 || apiErr.Code >= 500
    }
    return false
}
Constants ¶
const LLMBackoffDelay = 5 * time.Minute
LLMBackoffDelay is the base delay for requeueing work after LLM API errors exhaust inner retries. This prevents the workqueue from immediately retrying and contributing to API overload.
Variables ¶
This section is empty.
Functions ¶
func RequeueIfRetryable ¶ added in v0.2.0
func RequeueIfRetryable(ctx context.Context, err error, isRetryable func(error) bool, provider string) error
RequeueIfRetryable checks whether err is a retryable LLM API error and, if so, returns a workqueue.RequeueAfter to signal the workqueue to back off instead of immediately retrying. If the error is not retryable, it returns nil and the caller should handle the error normally.
Example ¶
ExampleRequeueIfRetryable demonstrates using RequeueIfRetryable to convert exhausted retries into a workqueue requeue.
package main

import (
    "context"
    "errors"
    "fmt"

    "chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
    ctx := context.Background()

    // Simulate an error after retries are exhausted
    apiErr := errors.New("429 rate limit exceeded")
    isRetryable := func(err error) bool {
        return errors.Is(err, apiErr)
    }

    // Check if the error should trigger a requeue
    requeueErr := retry.RequeueIfRetryable(ctx, apiErr, isRetryable, "OpenAI")
    if requeueErr != nil {
        fmt.Println("requeue requested")
        return
    }
    fmt.Println("no requeue needed")
}
Output: requeue requested
Example (NonRetryable) ¶
ExampleRequeueIfRetryable_nonRetryable demonstrates that non-retryable errors do not trigger a requeue.
package main

import (
    "context"
    "errors"
    "fmt"

    "chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
    ctx := context.Background()

    // Permanent error that should not be retried
    permErr := errors.New("permission denied")
    isRetryable := func(err error) bool {
        return false // Not retryable
    }

    requeueErr := retry.RequeueIfRetryable(ctx, permErr, isRetryable, "API")
    if requeueErr != nil {
        fmt.Println("requeue requested")
        return
    }
    fmt.Println("no requeue needed")
}
Output: no requeue needed
func RetryWithBackoff ¶
func RetryWithBackoff[T any](ctx context.Context, cfg RetryConfig, operation string, isRetryable func(error) bool, fn func() (T, error)) (T, error)
RetryWithBackoff executes the given function with exponential backoff retry. It only retries on errors that are classified as retryable by the provided isRetryable function.
Example ¶
ExampleRetryWithBackoff demonstrates using RetryWithBackoff to handle transient API errors with exponential backoff.
package main

import (
    "context"
    "errors"
    "fmt"

    "chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
    ctx := context.Background()
    cfg := retry.DefaultRetryConfig()

    // Define what errors are retryable
    isRetryable := func(err error) bool {
        return errors.Is(err, errRateLimited)
    }

    // Simulate an API call that may fail transiently
    result, err := retry.RetryWithBackoff(
        ctx,
        cfg,
        "fetch_user",
        isRetryable,
        func() (string, error) {
            // Your API call here
            return "user-data", nil
        },
    )
    if err != nil {
        fmt.Printf("failed: %v\n", err)
        return
    }
    fmt.Printf("success: %s\n", result)
}
// errRateLimited is a sentinel error for examples.
var errRateLimited = errors.New("rate limited")
Output: success: user-data
Example (CustomConfig) ¶
ExampleRetryWithBackoff_customConfig demonstrates using a custom RetryConfig for specific retry requirements.
package main

import (
    "context"
    "errors"
    "fmt"
    "time"

    "chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
    ctx := context.Background()

    // Custom configuration for aggressive retries
    cfg := retry.RetryConfig{
        MaxRetries:  3,
        BaseBackoff: 100 * time.Millisecond,
        MaxBackoff:  5 * time.Second,
        MaxJitter:   100 * time.Millisecond,
    }

    isRetryable := func(err error) bool {
        return errors.Is(err, errRateLimited)
    }

    result, err := retry.RetryWithBackoff(
        ctx,
        cfg,
        "quick_operation",
        isRetryable,
        func() (int, error) {
            return 42, nil
        },
    )
    if err != nil {
        fmt.Printf("failed: %v\n", err)
        return
    }
    fmt.Printf("result: %d\n", result)
}
// errRateLimited is a sentinel error for examples.
var errRateLimited = errors.New("rate limited")
Output: result: 42
Types ¶
type RetryConfig ¶
type RetryConfig struct {
    // MaxRetries is the maximum number of retry attempts (default: 5)
    // 0 means do not retry at all.
    MaxRetries int

    // BaseBackoff is the initial backoff duration (default: 1s, higher than typical due to quota nature)
    BaseBackoff time.Duration

    // MaxBackoff is the maximum backoff duration (default: 60s)
    MaxBackoff time.Duration

    // MaxJitter is the maximum random jitter added to backoff (default: 500ms)
    MaxJitter time.Duration
}
RetryConfig configures retry behavior for API calls. This is particularly useful for handling rate limit and transient server errors.
func DefaultRetryConfig ¶
func DefaultRetryConfig() RetryConfig
DefaultRetryConfig returns a retry configuration suitable for quota and rate limit errors. Uses longer backoffs than typical retry configs because quota-based rate limits often require more time to recover.
Example ¶
ExampleDefaultRetryConfig demonstrates getting the default retry configuration.
package main

import (
    "fmt"

    "chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
    cfg := retry.DefaultRetryConfig()
    fmt.Printf("MaxRetries: %d\n", cfg.MaxRetries)
    fmt.Printf("BaseBackoff: %v\n", cfg.BaseBackoff)
    fmt.Printf("MaxBackoff: %v\n", cfg.MaxBackoff)
    fmt.Printf("MaxJitter: %v\n", cfg.MaxJitter)
}
Output:

MaxRetries: 5
BaseBackoff: 1s
MaxBackoff: 1m0s
MaxJitter: 500ms
func (RetryConfig) Validate ¶
func (c RetryConfig) Validate() error
Validate checks that the retry configuration has valid values.
Example ¶
ExampleRetryConfig_Validate demonstrates validating a retry configuration.
package main

import (
    "fmt"
    "time"

    "chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
    // Valid configuration
    validCfg := retry.RetryConfig{
        MaxRetries:  5,
        BaseBackoff: time.Second,
        MaxBackoff:  time.Minute,
        MaxJitter:   500 * time.Millisecond,
    }
    if err := validCfg.Validate(); err != nil {
        fmt.Printf("validation failed: %v\n", err)
    } else {
        fmt.Println("valid configuration")
    }

    // Invalid configuration with negative values
    invalidCfg := retry.RetryConfig{
        MaxRetries:  -1,
        BaseBackoff: time.Second,
        MaxBackoff:  time.Minute,
        MaxJitter:   500 * time.Millisecond,
    }
    if err := invalidCfg.Validate(); err != nil {
        fmt.Printf("validation failed: %v\n", err)
    }
}
Output:

valid configuration
validation failed: max retries cannot be negative