Documentation
Overview
Package llmretry wraps a single LLM call with classified retry, exponential backoff, and an optional model fallback. It has no contenox-internal dependencies and is safe to use from any task handler.
The classifier inspects formatted error strings because contenox's provider clients (modelrepo/{openai,vllm,gemini,...}) return errors as fmt.Errorf-wrapped strings of the shape:
"OpenAI API returned non-200 status: 429, body: …"
Substring matching keeps llmretry decoupled from any specific provider.
Types
type Duration
Duration is a time.Duration that JSON-decodes from either a numeric nanosecond value or a duration string ("1s", "500ms", "2m"). This lets chain JSON files express timeouts in human-readable form.
func (Duration) MarshalJSON
MarshalJSON serializes the value as a duration string for readability.
func (*Duration) UnmarshalJSON
UnmarshalJSON accepts either a JSON number (interpreted as nanoseconds, the stdlib default) or a JSON string parsed with time.ParseDuration.
type ErrorClass
type ErrorClass string
ErrorClass is a coarse classification of an LLM call failure used for retry and fallback decisions. Empty (ClassNone) means no error.
const (
	// ClassNone is returned for nil errors.
	ClassNone ErrorClass = ""
	// ClassRateLimit is HTTP 429 / 529 (Anthropic overload). Retried with a
	// longer floor (RateLimitMinWait).
	ClassRateLimit ErrorClass = "rate_limit"
	// ClassServerError is HTTP 5xx. Retried with normal backoff.
	ClassServerError ErrorClass = "server_error"
	// ClassTimeout is context.DeadlineExceeded or i/o timeout. Retried.
	ClassTimeout ErrorClass = "timeout"
	// ClassAuth is HTTP 401/403 or "invalid api key". Never retried.
	ClassAuth ErrorClass = "auth"
	// ClassCapacity is a context-length / token-overflow error. Never retried.
	ClassCapacity ErrorClass = "capacity"
	// ClassCanceled is context.Canceled. Never retried.
	ClassCanceled ErrorClass = "canceled"
	// ClassPermanent is anything that does not match a known transient pattern.
	// Never retried by default.
	ClassPermanent ErrorClass = "permanent"
)
func ClassifyError
func ClassifyError(err error) ErrorClass
ClassifyError maps err to an ErrorClass by checking it against known error patterns, both transient and non-transient. It returns ClassNone for nil errors. Detection is intentionally permissive (substring matching against the formatted error string) because providers do not expose typed errors.
func (ErrorClass) IsRetryable
func (c ErrorClass) IsRetryable() bool
IsRetryable reports whether an error of class c warrants another attempt.
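Going only by the per-constant comments above (rate limits, 5xx, and timeouts are retried; auth, capacity, canceled, and permanent are not), IsRetryable reduces to a small switch. This sketch re-declares ErrorClass locally so it compiles on its own; it is an illustration of those documented rules, not the package source.

```go
package main

import "fmt"

type ErrorClass string

const (
	ClassNone        ErrorClass = ""
	ClassRateLimit   ErrorClass = "rate_limit"
	ClassServerError ErrorClass = "server_error"
	ClassTimeout     ErrorClass = "timeout"
	ClassAuth        ErrorClass = "auth"
	ClassCapacity    ErrorClass = "capacity"
	ClassCanceled    ErrorClass = "canceled"
	ClassPermanent   ErrorClass = "permanent"
)

// IsRetryable returns true only for the classes documented as retried.
// ClassNone (no error) is not retryable because there is nothing to retry.
func (c ErrorClass) IsRetryable() bool {
	switch c {
	case ClassRateLimit, ClassServerError, ClassTimeout:
		return true
	default:
		return false
	}
}

func main() {
	for _, c := range []ErrorClass{ClassRateLimit, ClassAuth, ClassNone} {
		fmt.Printf("%q retryable=%v\n", c, c.IsRetryable())
	}
}
```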
type Outcome
type Outcome struct {
	Attempts       int
	UsedFallback   bool
	LastErrorClass ErrorClass
	Elapsed        time.Duration
}
Outcome reports what happened during Do. It is populated even when Do returns an error, so callers can record retry/fallback usage in caveats or telemetry.
func Do
func Do(ctx context.Context, p RetryPolicy, primaryModel string, call func(modelID string) (any, error)) (any, Outcome, error)
Do invokes call with primaryModel, retrying on transient errors per p. After p.FallbackAfter consecutive failures, it switches to p.FallbackModelID (when set) for the remaining attempts. Auth, capacity, canceled, and permanent errors are never retried.
call receives the model id to use; on fallback, that id is p.FallbackModelID. The caller's closure is responsible for plumbing the id into the underlying provider call (e.g. by overriding the Request.ModelNames slice).
type RetryPolicy
type RetryPolicy struct {
	// MaxAttempts is the total attempts including the first. 0 or 1 disables retry.
	MaxAttempts int `yaml:"max_attempts,omitempty" json:"max_attempts,omitempty"`
	// InitialBackoff is the wait before the second attempt; doubled (capped at
	// MaxBackoff) before each subsequent attempt. Defaults to 500ms when zero.
	InitialBackoff Duration `yaml:"initial_backoff,omitempty" json:"initial_backoff,omitempty"`
	// MaxBackoff caps the exponential backoff. 0 = no cap.
	MaxBackoff Duration `yaml:"max_backoff,omitempty" json:"max_backoff,omitempty"`
	// Jitter is a 0..1 fraction added to backoff (uniform random).
	Jitter float64 `yaml:"jitter,omitempty" json:"jitter,omitempty"`
	// RateLimitMinWait sets a floor for ClassRateLimit backoff.
	RateLimitMinWait Duration `yaml:"rate_limit_min_wait,omitempty" json:"rate_limit_min_wait,omitempty"`
	// FallbackModelID is the alternate model id used after FallbackAfter
	// consecutive failures. Empty disables fallback.
	FallbackModelID string `yaml:"fallback_model_id,omitempty" json:"fallback_model_id,omitempty"`
	// FallbackAfter is the consecutive-failure threshold that triggers the
	// fallback swap. 0 disables fallback regardless of FallbackModelID.
	FallbackAfter int `yaml:"fallback_after,omitempty" json:"fallback_after,omitempty"`
}
RetryPolicy controls Do's retry/backoff/fallback behavior. The zero value disables retry (MaxAttempts = 0 → 1 attempt total).