llmretry

package
v0.28.14 Latest
Warning

This package is not in the latest version of its module.
Published: Jun 11, 2026 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Overview

Package llmretry wraps a single LLM call with classified retry, exponential backoff, and an optional model fallback. It has no contenox-internal dependencies and is safe to use from any task handler.

The classifier inspects formatted error strings because contenox's provider clients (modelrepo/{openai,vllm,gemini,...}) return errors as fmt.Errorf-wrapped strings of the shape:

"OpenAI API returned non-200 status: 429, body: …"

Substring matching keeps llmretry decoupled from any specific provider.

Types

type Duration

type Duration time.Duration

Duration is a time.Duration that JSON-decodes from either a numeric nanosecond value or a duration string ("1s", "500ms", "2m"). This lets chain JSON files express timeouts in human form.

func (Duration) D

func (d Duration) D() time.Duration

D returns the underlying time.Duration.

func (Duration) MarshalJSON

func (d Duration) MarshalJSON() ([]byte, error)

MarshalJSON serializes as a duration string for readability.

func (*Duration) UnmarshalJSON

func (d *Duration) UnmarshalJSON(b []byte) error

UnmarshalJSON accepts either a JSON number (interpreted as nanoseconds, the stdlib default) or a JSON string parsed with time.ParseDuration.

type ErrorClass

type ErrorClass string

ErrorClass is a coarse classification of an LLM call failure used for retry and fallback decisions. Empty (ClassNone) means no error.

const (
	// ClassNone is returned for nil errors.
	ClassNone ErrorClass = ""
	// ClassRateLimit is HTTP 429 / 529 (Anthropic overload). Retried with a
	// longer floor (RateLimitMinWait).
	ClassRateLimit ErrorClass = "rate_limit"
	// ClassServerError is HTTP 5xx. Retried with normal backoff.
	ClassServerError ErrorClass = "server_error"
	// ClassTimeout is context.DeadlineExceeded or i/o timeout. Retried.
	ClassTimeout ErrorClass = "timeout"
	// ClassAuth is HTTP 401/403 or "invalid api key". Never retried.
	ClassAuth ErrorClass = "auth"
	// ClassCapacity is a context-length / token-overflow error. Never retried.
	ClassCapacity ErrorClass = "capacity"
	// ClassCanceled is context.Canceled. Never retried.
	ClassCanceled ErrorClass = "canceled"
	// ClassPermanent is anything that does not match a known transient pattern.
	// Never retried by default.
	ClassPermanent ErrorClass = "permanent"
)

func ClassifyError

func ClassifyError(err error) ErrorClass

ClassifyError maps err to an ErrorClass. It returns ClassNone for nil errors and ClassPermanent for anything that matches no known pattern. Detection is intentionally permissive (substring match against the formatted error) because providers do not expose typed errors.

func (ErrorClass) IsRetryable

func (c ErrorClass) IsRetryable() bool

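IsRetryable reports whether an error of class c warrants another attempt. Per the class definitions above, ClassRateLimit, ClassServerError, and ClassTimeout are retried; all other classes are not.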
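
Going by the comments on the class constants, the retry decision reduces to a three-way allow list. The sketch below uses hypothetical lowercase names and shows only the default behavior; it does not model any override the package may offer for ClassPermanent.

```go
package main

import "fmt"

// errorClass is a hypothetical copy of the documented ErrorClass.
type errorClass string

const (
	classRateLimit   errorClass = "rate_limit"
	classServerError errorClass = "server_error"
	classTimeout     errorClass = "timeout"
	classAuth        errorClass = "auth"
	classPermanent   errorClass = "permanent"
)

// isRetryable returns true only for the classes documented as retried.
func (c errorClass) isRetryable() bool {
	switch c {
	case classRateLimit, classServerError, classTimeout:
		return true
	default:
		return false
	}
}

func main() {
	for _, c := range []errorClass{classRateLimit, classAuth, classPermanent} {
		fmt.Printf("%s retryable=%v\n", c, c.isRetryable())
	}
}
```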

type Outcome

type Outcome struct {
	Attempts       int
	UsedFallback   bool
	LastErrorClass ErrorClass
	Elapsed        time.Duration
}

Outcome reports what happened during Do. It is set even on error so callers can record retry/fallback usage in caveats or telemetry.

func Do

func Do(ctx context.Context, p RetryPolicy, primaryModel string, call func(modelID string) (any, error)) (any, Outcome, error)

Do invokes call with primaryModel, retrying on transient errors per p. After p.FallbackAfter consecutive failures, it switches to p.FallbackModelID (when set) for remaining attempts. Auth, capacity, canceled, and permanent errors never retry.

call receives the model id to use; on fallback, that id is p.FallbackModelID. The caller's closure is responsible for plumbing the id into the underlying provider call (e.g. by overriding the Request.ModelNames slice).

type RetryPolicy

type RetryPolicy struct {
	// MaxAttempts is the total attempts including the first. 0 or 1 disables retry.
	MaxAttempts int `yaml:"max_attempts,omitempty" json:"max_attempts,omitempty"`
	// InitialBackoff is the wait before the second attempt; doubled (capped at
	// MaxBackoff) before each subsequent attempt. Defaults to 500ms when zero.
	InitialBackoff Duration `yaml:"initial_backoff,omitempty" json:"initial_backoff,omitempty"`
	// MaxBackoff caps the exponential backoff. 0 = no cap.
	MaxBackoff Duration `yaml:"max_backoff,omitempty" json:"max_backoff,omitempty"`
	// Jitter is a 0..1 fraction added to backoff (uniform random).
	Jitter float64 `yaml:"jitter,omitempty" json:"jitter,omitempty"`
	// RateLimitMinWait sets a floor for ClassRateLimit backoff.
	RateLimitMinWait Duration `yaml:"rate_limit_min_wait,omitempty" json:"rate_limit_min_wait,omitempty"`
	// FallbackModelID is the alternate model id used after FallbackAfter
	// consecutive failures. Empty disables fallback.
	FallbackModelID string `yaml:"fallback_model_id,omitempty" json:"fallback_model_id,omitempty"`
	// FallbackAfter is the consecutive-failure threshold that triggers the
	// fallback swap. 0 disables fallback regardless of FallbackModelID.
	FallbackAfter int `yaml:"fallback_after,omitempty" json:"fallback_after,omitempty"`
}

RetryPolicy controls Do's retry/backoff/fallback behavior. The zero value disables retry (MaxAttempts = 0 → 1 attempt total).
