package retry

v0.2.0
Published: Apr 2, 2026 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation


Package retry provides exponential backoff retry logic for handling transient errors.

Overview

This package implements retry mechanisms with exponential backoff and jitter, designed specifically for handling rate limit and quota errors from external APIs. It integrates with the workqueue package to enable graceful backoff when retries are exhausted.

Features

  • Exponential backoff with configurable base and maximum delays
  • Random jitter to prevent thundering herd problems
  • Context-aware cancellation during retry loops
  • Integration with workqueue for deferred retry after exhaustion
  • Generic retry function supporting any return type
  • Configurable retry predicates to distinguish retryable from permanent errors

Usage

The primary entry point is RetryWithBackoff, which executes a function with automatic retry on transient errors:

cfg := retry.DefaultRetryConfig()
result, err := retry.RetryWithBackoff(
	ctx,
	cfg,
	"fetch_data",
	isRetryable,
	func() (string, error) {
		return api.FetchData()
	},
)

For workqueue integration, use RequeueIfRetryable to convert exhausted retries into a delayed requeue:

err := doWorkWithRetries(ctx)
if requeueErr := retry.RequeueIfRetryable(ctx, err, isRetryable, "OpenAI"); requeueErr != nil {
	return requeueErr
}
return err

Configuration

RetryConfig controls retry behavior:

  • MaxRetries: Maximum number of retry attempts (0 disables retries)
  • BaseBackoff: Initial backoff duration (doubled each attempt)
  • MaxBackoff: Maximum backoff duration (caps exponential growth)
  • MaxJitter: Maximum random jitter added to each backoff

Use DefaultRetryConfig() for sensible defaults tuned for quota-based rate limits, or construct a custom RetryConfig for specific requirements.

Integration Patterns

Retry with workqueue integration:

func (w *Worker) Process(ctx context.Context, item string) error {
	result, err := retry.RetryWithBackoff(
		ctx,
		retry.DefaultRetryConfig(),
		"process_item",
		isQuotaError,
		func() (string, error) {
			return w.api.Process(item)
		},
	)
	if err != nil {
		if requeueErr := retry.RequeueIfRetryable(ctx, err, isQuotaError, "API"); requeueErr != nil {
			return requeueErr
		}
		return err
	}
	return w.store.Save(result)
}

Custom retry predicate:

func isRetryable(err error) bool {
	var apiErr *APIError // APIError stands in for your provider's error type carrying an HTTP status code
	if errors.As(err, &apiErr) {
		// Retry rate limiting (429) and server errors (5xx); treat everything else as permanent.
		return apiErr.Code == 429 || apiErr.Code >= 500
	}
	return false
}


Constants

const LLMBackoffDelay = 5 * time.Minute

LLMBackoffDelay is the base delay for requeueing work after LLM API errors exhaust inner retries. This prevents the workqueue from immediately retrying and contributing to API overload.

Variables

This section is empty.

Functions

func RequeueIfRetryable added in v0.2.0

func RequeueIfRetryable(ctx context.Context, err error, isRetryable func(error) bool, provider string) error

RequeueIfRetryable checks whether err is a retryable LLM API error and, if so, returns a workqueue.RequeueAfter to signal the workqueue to back off instead of immediately retrying. If the error is not retryable, it returns nil and the caller should handle the error normally.

Example

ExampleRequeueIfRetryable demonstrates using RequeueIfRetryable to convert exhausted retries into a workqueue requeue.

package main

import (
	"context"
	"errors"
	"fmt"

	"chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
	ctx := context.Background()

	// Simulate an error after retries are exhausted
	apiErr := errors.New("429 rate limit exceeded")

	isRetryable := func(err error) bool {
		return errors.Is(err, apiErr)
	}

	// Check if the error should trigger a requeue
	requeueErr := retry.RequeueIfRetryable(ctx, apiErr, isRetryable, "OpenAI")
	if requeueErr != nil {
		fmt.Println("requeue requested")
		return
	}
	fmt.Println("no requeue needed")
}
Output:
requeue requested

Example (NonRetryable)

ExampleRequeueIfRetryable_nonRetryable demonstrates that non-retryable errors do not trigger a requeue.

package main

import (
	"context"
	"errors"
	"fmt"

	"chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
	ctx := context.Background()

	// Permanent error that should not be retried
	permErr := errors.New("permission denied")

	isRetryable := func(err error) bool {
		return false // Not retryable
	}

	requeueErr := retry.RequeueIfRetryable(ctx, permErr, isRetryable, "API")
	if requeueErr != nil {
		fmt.Println("requeue requested")
		return
	}
	fmt.Println("no requeue needed")
}
Output:
no requeue needed

func RetryWithBackoff

func RetryWithBackoff[T any](ctx context.Context, cfg RetryConfig, operation string, isRetryable func(error) bool, fn func() (T, error)) (T, error)

RetryWithBackoff executes the given function with exponential backoff retry. It only retries on errors that are classified as retryable by the provided isRetryable function.

Example

ExampleRetryWithBackoff demonstrates using RetryWithBackoff to handle transient API errors with exponential backoff.

package main

import (
	"context"
	"errors"
	"fmt"

	"chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
	ctx := context.Background()
	cfg := retry.DefaultRetryConfig()

	// Define what errors are retryable
	isRetryable := func(err error) bool {
		return errors.Is(err, errRateLimited)
	}

	// Simulate an API call that may fail transiently
	result, err := retry.RetryWithBackoff(
		ctx,
		cfg,
		"fetch_user",
		isRetryable,
		func() (string, error) {
			// Your API call here
			return "user-data", nil
		},
	)
	if err != nil {
		fmt.Printf("failed: %v\n", err)
		return
	}
	fmt.Printf("success: %s\n", result)
}

// errRateLimited is a sentinel error for examples.
var errRateLimited = errors.New("rate limited")
Output:
success: user-data

Example (CustomConfig)

ExampleRetryWithBackoff_customConfig demonstrates using a custom RetryConfig for specific retry requirements.

package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
	ctx := context.Background()

	// Custom configuration for aggressive retries
	cfg := retry.RetryConfig{
		MaxRetries:  3,
		BaseBackoff: 100 * time.Millisecond,
		MaxBackoff:  5 * time.Second,
		MaxJitter:   100 * time.Millisecond,
	}

	isRetryable := func(err error) bool {
		return errors.Is(err, errRateLimited)
	}

	result, err := retry.RetryWithBackoff(
		ctx,
		cfg,
		"quick_operation",
		isRetryable,
		func() (int, error) {
			return 42, nil
		},
	)
	if err != nil {
		fmt.Printf("failed: %v\n", err)
		return
	}
	fmt.Printf("result: %d\n", result)
}

// errRateLimited is a sentinel error for examples.
var errRateLimited = errors.New("rate limited")
Output:
result: 42

Types

type RetryConfig

type RetryConfig struct {
	// MaxRetries is the maximum number of retry attempts (default: 5)
	// 0 means do not retry at all.
	MaxRetries int
	// BaseBackoff is the initial backoff duration (default: 1s; higher than typical because quota-based limits take longer to recover)
	BaseBackoff time.Duration
	// MaxBackoff is the maximum backoff duration (default: 60s)
	MaxBackoff time.Duration
	// MaxJitter is the maximum random jitter added to backoff (default: 500ms)
	MaxJitter time.Duration
}

RetryConfig configures retry behavior for API calls. This is particularly useful for handling rate limit and transient server errors.

func DefaultRetryConfig

func DefaultRetryConfig() RetryConfig

DefaultRetryConfig returns a retry configuration suitable for quota and rate limit errors. Uses longer backoffs than typical retry configs because quota-based rate limits often require more time to recover.

Example

ExampleDefaultRetryConfig demonstrates getting the default retry configuration.

package main

import (
	"fmt"

	"chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
	cfg := retry.DefaultRetryConfig()

	fmt.Printf("MaxRetries: %d\n", cfg.MaxRetries)
	fmt.Printf("BaseBackoff: %v\n", cfg.BaseBackoff)
	fmt.Printf("MaxBackoff: %v\n", cfg.MaxBackoff)
	fmt.Printf("MaxJitter: %v\n", cfg.MaxJitter)
}
Output:
MaxRetries: 5
BaseBackoff: 1s
MaxBackoff: 1m0s
MaxJitter: 500ms

func (RetryConfig) Validate

func (c RetryConfig) Validate() error

Validate checks that the retry configuration has valid values.

Example

ExampleRetryConfig_Validate demonstrates validating a retry configuration.

package main

import (
	"fmt"
	"time"

	"chainguard.dev/driftlessaf/agents/executor/retry"
)

func main() {
	// Valid configuration
	validCfg := retry.RetryConfig{
		MaxRetries:  5,
		BaseBackoff: time.Second,
		MaxBackoff:  time.Minute,
		MaxJitter:   500 * time.Millisecond,
	}
	if err := validCfg.Validate(); err != nil {
		fmt.Printf("validation failed: %v\n", err)
	} else {
		fmt.Println("valid configuration")
	}

	// Invalid configuration with negative values
	invalidCfg := retry.RetryConfig{
		MaxRetries:  -1,
		BaseBackoff: time.Second,
		MaxBackoff:  time.Minute,
		MaxJitter:   500 * time.Millisecond,
	}
	if err := invalidCfg.Validate(); err != nil {
		fmt.Printf("validation failed: %v\n", err)
	}
}
Output:
valid configuration
validation failed: max retries cannot be negative
