llmretry

package

v0.28.1 (latest) Published: Jun 6, 2026 License: Apache-2.0 Imports: 7 Imported by: 0

Repository

github.com/contenox/runtime

Documentation ¶

Overview ¶

Package llmretry wraps a single LLM call with classified retry, exponential backoff, and an optional model fallback. It has no contenox-internal dependencies and is safe to use from any task handler.

The classifier inspects formatted error strings because contenox's provider clients (modelrepo/{openai,vllm,gemini,...}) return errors as fmt.Errorf-wrapped strings of the shape:

"OpenAI API returned non-200 status: 429, body: …"

Substring matching keeps llmretry decoupled from any specific provider.

Index ¶

type Duration
type ErrorClass
- func ClassifyError(err error) ErrorClass
- func (c ErrorClass) IsRetryable() bool
type Outcome
- func Do(ctx context.Context, p RetryPolicy, primaryModel string, ...) (any, Outcome, error)
type RetryPolicy

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Duration ¶

type Duration time.Duration

Duration is a time.Duration that JSON-decodes from either a numeric nanosecond value or a duration string ("1s", "500ms", "2m"). This lets chain JSON files express timeouts in human form.

func (Duration) D ¶

func (d Duration) D() time.Duration

D returns the underlying time.Duration.

func (Duration) MarshalJSON ¶

func (d Duration) MarshalJSON() ([]byte, error)

MarshalJSON serializes as a duration string for readability.

func (*Duration) UnmarshalJSON ¶

func (d *Duration) UnmarshalJSON(b []byte) error

UnmarshalJSON accepts either a JSON number (interpreted as nanoseconds, the stdlib default) or a JSON string parsed with time.ParseDuration.
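
The dual decoding described above can be illustrated with a minimal, self-contained sketch. It re-declares a local Duration type that follows the documented behavior; it is not the package's source, and the helpers decodeNanos and encodeJSON are hypothetical names added only for the demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Duration is a local stand-in that follows the documented behavior.
// It is a sketch, not a copy of the llmretry implementation.
type Duration time.Duration

// D returns the underlying time.Duration.
func (d Duration) D() time.Duration { return time.Duration(d) }

// MarshalJSON writes the duration as a string such as "2m0s".
func (d Duration) MarshalJSON() ([]byte, error) {
	return json.Marshal(time.Duration(d).String())
}

// UnmarshalJSON accepts a JSON number (nanoseconds) or a duration string.
func (d *Duration) UnmarshalJSON(b []byte) error {
	var v any
	if err := json.Unmarshal(b, &v); err != nil {
		return err
	}
	switch x := v.(type) {
	case float64:
		// JSON numbers decode as float64; treat the value as nanoseconds.
		*d = Duration(time.Duration(x))
		return nil
	case string:
		parsed, err := time.ParseDuration(x)
		if err != nil {
			return err
		}
		*d = Duration(parsed)
		return nil
	default:
		return fmt.Errorf("invalid duration: %s", string(b))
	}
}

// decodeNanos is a hypothetical helper: it decodes one JSON value and
// returns the result in nanoseconds, or -1 if decoding fails.
func decodeNanos(js string) int64 {
	var d Duration
	if err := json.Unmarshal([]byte(js), &d); err != nil {
		return -1
	}
	return int64(d.D())
}

// encodeJSON is a hypothetical helper that returns the JSON form of d.
func encodeJSON(d Duration) string {
	out, err := json.Marshal(d)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	fmt.Println(decodeNanos(`"500ms"`))               // 500000000
	fmt.Println(decodeNanos(`1000000000`))            // 1000000000
	fmt.Println(encodeJSON(Duration(2 * time.Minute))) // "2m0s"
}
```

Because the sketch routes numbers through float64, very large nanosecond values can lose precision; whether the real package handles this differently is not stated in the documentation.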

type ErrorClass ¶

type ErrorClass string

ErrorClass is a coarse classification of an LLM call failure used for retry and fallback decisions. Empty (ClassNone) means no error.

const (
	// ClassNone is returned for nil errors.
	ClassNone ErrorClass = ""
	// ClassRateLimit is HTTP 429 / 529 (Anthropic overload). Retried with a
	// longer floor (RateLimitMinWait).
	ClassRateLimit ErrorClass = "rate_limit"
	// ClassServerError is HTTP 5xx. Retried with normal backoff.
	ClassServerError ErrorClass = "server_error"
	// ClassTimeout is context.DeadlineExceeded or i/o timeout. Retried.
	ClassTimeout ErrorClass = "timeout"
	// ClassAuth is HTTP 401/403 or "invalid api key". Never retried.
	ClassAuth ErrorClass = "auth"
	// ClassCapacity is a context-length / token-overflow error. Never retried.
	ClassCapacity ErrorClass = "capacity"
	// ClassCanceled is context.Canceled. Never retried.
	ClassCanceled ErrorClass = "canceled"
	// ClassPermanent is anything that does not match a known transient pattern.
	// Never retried by default.
	ClassPermanent ErrorClass = "permanent"
)

func ClassifyError ¶

func ClassifyError(err error) ErrorClass

ClassifyError maps err to an ErrorClass, returning ClassNone for a nil error. Detection is intentionally permissive (substring match against the formatted error) because providers do not expose typed errors.

func (ErrorClass) IsRetryable ¶

func (c ErrorClass) IsRetryable() bool

IsRetryable reports whether an error of class c warrants another attempt.
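
To make the classification and retry decision concrete, here is a minimal sketch of a substring-based classifier. The class names and the retryable set follow the constants documented above; the exact substrings checked, their order, and the function names classify, isRetryable, and classifyString are assumptions for illustration and may differ from what ClassifyError actually matches. The response bodies in the sample errors are made up.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

type ErrorClass string

const (
	ClassNone        ErrorClass = ""
	ClassRateLimit   ErrorClass = "rate_limit"
	ClassServerError ErrorClass = "server_error"
	ClassTimeout     ErrorClass = "timeout"
	ClassAuth        ErrorClass = "auth"
	ClassCapacity    ErrorClass = "capacity"
	ClassCanceled    ErrorClass = "canceled"
	ClassPermanent   ErrorClass = "permanent"
)

// classify is a hypothetical classifier. The patterns below are assumed,
// not taken from the package source.
func classify(err error) ErrorClass {
	if err == nil {
		return ClassNone
	}
	if errors.Is(err, context.Canceled) {
		return ClassCanceled
	}
	if errors.Is(err, context.DeadlineExceeded) {
		return ClassTimeout
	}
	msg := strings.ToLower(err.Error())
	switch {
	case strings.Contains(msg, "status: 429"), strings.Contains(msg, "status: 529"):
		return ClassRateLimit
	case strings.Contains(msg, "status: 401"), strings.Contains(msg, "status: 403"),
		strings.Contains(msg, "invalid api key"):
		return ClassAuth
	case strings.Contains(msg, "context length"): // assumed overflow wording
		return ClassCapacity
	case strings.Contains(msg, "i/o timeout"):
		return ClassTimeout
	case strings.Contains(msg, "status: 5"): // remaining 5xx codes
		return ClassServerError
	}
	return ClassPermanent
}

// isRetryable mirrors the documented rule: only rate-limit, server-error,
// and timeout classes are retried.
func isRetryable(c ErrorClass) bool {
	switch c {
	case ClassRateLimit, ClassServerError, ClassTimeout:
		return true
	}
	return false
}

// classifyString is a hypothetical convenience wrapper for plain messages.
func classifyString(msg string) ErrorClass {
	return classify(errors.New(msg))
}

func main() {
	samples := []error{
		errors.New("OpenAI API returned non-200 status: 429, body: slow down"),
		errors.New("OpenAI API returned non-200 status: 503, body: unavailable"),
		fmt.Errorf("request failed: %w", context.DeadlineExceeded),
		errors.New("OpenAI API returned non-200 status: 401, body: bad key"),
	}
	for _, err := range samples {
		c := classify(err)
		fmt.Printf("%-12s retryable=%v\n", c, isRetryable(c))
	}
}
```

Checking the 429/529 patterns before the generic 5xx pattern matters in this sketch, since 529 would otherwise be classified as a server error rather than a rate limit.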

type Outcome ¶

type Outcome struct {
	Attempts       int
	UsedFallback   bool
	LastErrorClass ErrorClass
	Elapsed        time.Duration
}

Outcome reports what happened during Do. It is set even on error so callers can record retry/fallback usage in caveats or telemetry.

func Do ¶

func Do(ctx context.Context, p RetryPolicy, primaryModel string, call func(modelID string) (any, error)) (any, Outcome, error)

Do invokes call with primaryModel, retrying on transient errors per p. After p.FallbackAfter consecutive failures, it switches to p.FallbackModelID (when set) for the remaining attempts. Auth, capacity, canceled, and permanent errors are never retried.

call receives the model id to use; on fallback, that id is p.FallbackModelID. The caller's closure is responsible for plumbing the id into the underlying provider call (e.g. by overriding the Request.ModelNames slice).
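
The retry and fallback flow described above can be sketched with a simplified, self-contained loop. This is not the package's implementation: the policy and outcome types, the errTransient sentinel (standing in for a retryable classification), and the helpers runDemo and runPermanent are all hypothetical, backoff waits are omitted, and the exact attempt at which the real Do swaps models is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// errTransient stands in for any error the classifier would mark retryable.
var errTransient = errors.New("transient failure")

// policy and outcome are reduced stand-ins for RetryPolicy and Outcome.
type policy struct {
	MaxAttempts     int
	FallbackModelID string
	FallbackAfter   int
}

type outcome struct {
	Attempts     int
	UsedFallback bool
}

// do is a simplified sketch: no sleeping, and only errTransient is retried.
func do(ctx context.Context, p policy, primary string,
	call func(modelID string) (any, error)) (any, outcome, error) {
	attempts := p.MaxAttempts
	if attempts < 1 {
		attempts = 1 // 0 or 1 means a single attempt
	}
	var out outcome
	var lastErr error
	model := primary
	for failures := 0; failures < attempts; failures++ {
		if err := ctx.Err(); err != nil {
			return nil, out, err
		}
		// Assumed rule: swap once FallbackAfter consecutive failures occurred.
		if p.FallbackModelID != "" && p.FallbackAfter > 0 && failures >= p.FallbackAfter {
			model = p.FallbackModelID
			out.UsedFallback = true
		}
		out.Attempts++
		res, err := call(model)
		if err == nil {
			return res, out, nil
		}
		lastErr = err
		if !errors.Is(err, errTransient) {
			break // non-retryable: stop immediately
		}
	}
	return nil, out, lastErr
}

// runDemo fails on the primary model and succeeds on the fallback.
func runDemo() (string, outcome, error) {
	var tried []string
	p := policy{MaxAttempts: 4, FallbackModelID: "backup", FallbackAfter: 2}
	_, out, err := do(context.Background(), p, "primary", func(modelID string) (any, error) {
		tried = append(tried, modelID)
		if modelID == "primary" {
			return nil, errTransient
		}
		return "ok", nil
	})
	return strings.Join(tried, ","), out, err
}

// runPermanent returns how many attempts a non-retryable error consumes.
func runPermanent() int {
	p := policy{MaxAttempts: 4}
	_, out, _ := do(context.Background(), p, "primary", func(string) (any, error) {
		return nil, errors.New("bad request")
	})
	return out.Attempts
}

func main() {
	tried, out, err := runDemo()
	fmt.Println(tried, out.Attempts, out.UsedFallback, err)
	fmt.Println(runPermanent())
}
```

Note that the closure passed as call is what actually routes the model id to the provider, matching the responsibility the documentation assigns to the caller.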

type RetryPolicy ¶

type RetryPolicy struct {
	// MaxAttempts is the total attempts including the first. 0 or 1 disables retry.
	MaxAttempts int `yaml:"max_attempts,omitempty" json:"max_attempts,omitempty"`
	// InitialBackoff is the wait before the second attempt; doubled (capped at
	// MaxBackoff) before each subsequent attempt. Defaults to 500ms when zero.
	InitialBackoff Duration `yaml:"initial_backoff,omitempty" json:"initial_backoff,omitempty"`
	// MaxBackoff caps the exponential backoff. 0 = no cap.
	MaxBackoff Duration `yaml:"max_backoff,omitempty" json:"max_backoff,omitempty"`
	// Jitter is a 0..1 fraction added to backoff (uniform random).
	Jitter float64 `yaml:"jitter,omitempty" json:"jitter,omitempty"`
	// RateLimitMinWait sets a floor for ClassRateLimit backoff.
	RateLimitMinWait Duration `yaml:"rate_limit_min_wait,omitempty" json:"rate_limit_min_wait,omitempty"`
	// FallbackModelID is the alternate model id used after FallbackAfter
	// consecutive failures. Empty disables fallback.
	FallbackModelID string `yaml:"fallback_model_id,omitempty" json:"fallback_model_id,omitempty"`
	// FallbackAfter is the consecutive-failure threshold that triggers the
	// fallback swap. 0 disables fallback regardless of FallbackModelID.
	FallbackAfter int `yaml:"fallback_after,omitempty" json:"fallback_after,omitempty"`
}

RetryPolicy controls Do's retry/backoff/fallback behavior. The zero value disables retry (MaxAttempts = 0 → 1 attempt total).
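
As a worked illustration of the backoff fields, the sketch below computes the wait before a given attempt. The field comments document the doubling, the 500ms default, the cap, and the rate-limit floor; the order in which cap, floor, and jitter are applied, the reading of Jitter as "add up to Jitter × backoff", and the names policy, waitBefore, and waitMillis are assumptions, not the package's code.

```go
package main

import (
	"fmt"
	"time"
)

// policy is a reduced stand-in for the backoff fields of RetryPolicy.
type policy struct {
	InitialBackoff   time.Duration
	MaxBackoff       time.Duration
	RateLimitMinWait time.Duration
	Jitter           float64
}

// waitBefore returns the wait before attempt n (n >= 2). rateLimited says
// whether the previous failure was a rate-limit error; rnd is a uniform
// sample in [0, 1), passed in so the sketch stays deterministic.
func waitBefore(p policy, n int, rateLimited bool, rnd float64) time.Duration {
	capped := func(d time.Duration) time.Duration {
		if p.MaxBackoff > 0 && d > p.MaxBackoff {
			return p.MaxBackoff
		}
		return d
	}
	wait := p.InitialBackoff
	if wait <= 0 {
		wait = 500 * time.Millisecond // documented default
	}
	wait = capped(wait)
	for i := 2; i < n; i++ {
		wait = capped(wait * 2) // double before each subsequent attempt
	}
	if rateLimited && wait < p.RateLimitMinWait {
		wait = p.RateLimitMinWait // floor for rate-limit errors
	}
	// Assumed jitter model: add a random fraction of the current wait.
	wait += time.Duration(float64(wait) * p.Jitter * rnd)
	return wait
}

// waitMillis is a hypothetical helper using a fixed example policy:
// default initial backoff, 2s cap, 5s rate-limit floor.
func waitMillis(n int, rateLimited bool, jitter, rnd float64) int64 {
	p := policy{
		MaxBackoff:       2 * time.Second,
		RateLimitMinWait: 5 * time.Second,
		Jitter:           jitter,
	}
	return waitBefore(p, n, rateLimited, rnd).Milliseconds()
}

func main() {
	for n := 2; n <= 5; n++ {
		fmt.Println(n, waitMillis(n, false, 0, 0)) // 500, 1000, 2000, 2000
	}
	fmt.Println(waitMillis(2, true, 0, 0))      // 5000
	fmt.Println(waitMillis(2, false, 0.5, 0.5)) // 625
}
```

With this example policy the schedule is 500ms, 1s, 2s, 2s for attempts 2 through 5, and any wait after a rate-limit failure is raised to at least 5s.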

Source Files ¶

retry.go
