longrun

package
v0.7.0
Published: Mar 24, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

README

longrun

A self-contained Go package for long-running tasks with interval scheduling, per-error retry, and exponential backoff.

Zero external dependencies beyond golang.org/x/sync.

Overview

longrun provides two primitives:

  • Task — a self-contained unit of work: one-shot or interval, with optional per-error retry and backoff. Can be used standalone via Wait(ctx).
  • Runner — orchestrates N tasks. When any task dies permanently, cancels all others and runs shutdown hooks.

Task execution model

Task.Wait(ctx)
  └→ runWithPolicy (restart loop + backoff)
       └→ runLoop (ticker or one-shot)
            └→ runOnce (single invocation ± timeout)

Constructors

NewOneShotTask

Execute once. If rules is nil — no retries, any error is fatal.

// Simple one-shot (useful in Runner for coordination)
task := longrun.NewOneShotTask("migrate", db.AutoMigrate, nil)

// One-shot with retry
task := longrun.NewOneShotTask("migrate", db.AutoMigrate, []longrun.TransientRule{
    {Err: ErrConnRefused, MaxRetries: 5, Backoff: longrun.DefaultBackoff()},
})

NewIntervalTask

Ticker loop. If rules is nil — any error kills the task.

// Interval without retry
task := longrun.NewIntervalTask("healthcheck", 30*time.Second, check, nil)

// Interval with per-error retry
task := longrun.NewIntervalTask("poll", 10*time.Second, w.poll, []longrun.TransientRule{
    // GitHub API — might be under load, retry carefully
    {Err: ErrFetchIssues, MaxRetries: 5, Backoff: longrun.BackoffConfig{
        Initial: 2 * time.Second, Max: 60 * time.Second, Multiplier: 3.0,
    }},
    // Local DB — not loaded, retry aggressively
    {Err: ErrStoreIssues, MaxRetries: longrun.UnlimitedRetries, Backoff: longrun.BackoffConfig{
        Initial: 100 * time.Millisecond, Max: 2 * time.Second, Multiplier: 2.0,
    }},
}, longrun.WithLogger(logger))

TransientRule

Each rule binds an error to its retry settings. Different errors can have different retry budgets and backoff curves.

type TransientRule struct {
    Err        error         // error sentinel (errors.Is) or typed nil pointer (*T)(nil) (errors.As)
    MaxRetries int           // 0 = default (3), -1 = unlimited
    Backoff    BackoffConfig
}

The Err field accepts two forms:

  • error value (sentinel) → matched via errors.Is
  • *T where T implements error → matched via errors.As

Examples:

{Err: ErrTimeout}           // sentinel → errors.Is
{Err: (*net.OpError)(nil)}  // pointer-to-type → errors.As

Passing nil or an unsupported type panics at construction time: "longrun.NewMatcher: errVal must be an error value or pointer to error type (*T), got: %T"

Each rule has its own attempt budget. MaxRetries limits consecutive failures for a given rule. When an interval task completes a successful tick, all rule trackers reset to zero — so intermittent failures separated by successful ticks never accumulate toward MaxRetries. For one-shot tasks the budget is never reset mid-execution.
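The budget rule for one-shot tasks can be illustrated with a minimal, standard-library-only sketch. It does not use longrun itself: the `rule` type, the `retry` and `demo` functions, and the `errBusy` sentinel are hypothetical names made up for this illustration, and a fixed delay stands in for exponential backoff.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errBusy is a hypothetical transient error, used only for this illustration.
var errBusy = errors.New("busy")

// rule is a simplified stand-in for a retry rule: one sentinel, one budget,
// and a fixed delay in place of exponential backoff.
type rule struct {
	err        error
	maxRetries int
	delay      time.Duration
	failures   int // consecutive failures charged to this rule
}

// retry calls work until it succeeds, fails with an error no rule matches,
// or exhausts the budget of the matching rule.
func retry(ctx context.Context, rules []*rule, work func(context.Context) error) error {
	for {
		err := work(ctx)
		if err == nil {
			return nil
		}
		var matched *rule
		for _, r := range rules {
			if errors.Is(err, r.err) {
				matched = r
				break
			}
		}
		if matched == nil {
			return err // not whitelisted: treated as permanent
		}
		if matched.failures >= matched.maxRetries {
			return fmt.Errorf("retry budget exhausted: %w", err)
		}
		matched.failures++
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(matched.delay):
		}
	}
}

// demo runs a work function that fails twice with errBusy and then succeeds.
func demo() (calls int, err error) {
	rules := []*rule{{err: errBusy, maxRetries: 3, delay: time.Millisecond}}
	err = retry(context.Background(), rules, func(context.Context) error {
		calls++
		if calls <= 2 {
			return fmt.Errorf("attempt %d: %w", calls, errBusy)
		}
		return nil
	})
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err) // 3 <nil>
}
```

Because each rule carries its own failures counter, failures matched by one rule never consume the budget of another. An interval task would additionally zero every counter after a successful tick.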

Building blocks

The package exposes low-level building blocks used internally by Task. They are exported for testability and advanced use cases, but most users should only create Tasks.

Type           Purpose
Matcher        Compiles an error pattern (passed as any) into an errors.Is/errors.As check
RuleTracker    Per-rule retry budget with OnFailure()/Reset()
BackoffConfig  Exponential backoff with Duration(attempt) and Wait(ctx, attempt)

Options

longrun.WithTimeout(30 * time.Second)  // per-invocation timeout
longrun.WithShutdown(server.Shutdown)  // graceful shutdown hook
longrun.WithDelay(5 * time.Second)     // delay before first execution
longrun.WithLogger(logger)             // custom slog.Logger

WithDelay

Delays the first execution by the given duration.

  • For interval tasks: first tick fires after delay, then every interval.
  • For one-shot tasks: execution starts after delay.
  • Delay is independent of interval.
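One way to read the schedule described above is as a list of start offsets. The helper below is a hypothetical illustration, not part of longrun, and it assumes the time spent inside the work function is negligible.

```go
package main

import (
	"fmt"
	"time"
)

// fireOffsets returns when the first n executions start, measured from the
// moment the task is started: the first after delay, the rest one interval apart.
func fireOffsets(delay, interval time.Duration, n int) []time.Duration {
	offsets := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		offsets = append(offsets, delay+time.Duration(i)*interval)
	}
	return offsets
}

func main() {
	// A 5s delay with a 30s interval.
	fmt.Println(fireOffsets(5*time.Second, 30*time.Second, 3)) // [5s 35s 1m5s]
}
```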

Runner

Orchestrates multiple tasks. Does NOT handle OS signals — pass a cancellable context.

ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
defer cancel()

runner := longrun.NewRunner(longrun.RunnerOptions{Logger: logger})
runner.Add(migrate)
runner.Add(poll)
runner.Add(server)

err := runner.Wait(ctx)

When any task returns a permanent error, Runner cancels all remaining tasks via context, waits for all goroutines to finish, then runs shutdown hooks in LIFO order (reverse of Add).
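The cancel-then-shutdown sequence can be sketched with the standard library alone. This is an illustration under stated assumptions, not longrun's implementation: `task`, `runAll`, and `demo` are hypothetical names, the shutdown hook is simplified to a plain func(), and sync.WaitGroup stands in for whatever grouping primitive the package uses.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// task is a hypothetical stand-in for a registered task.
type task struct {
	name     string
	work     func(ctx context.Context) error
	shutdown func() // simplified: the documented hook takes a ctx and returns an error
}

// runAll starts every task, cancels the rest on the first error, waits for
// all goroutines, then calls shutdown hooks in reverse registration order.
func runAll(parent context.Context, tasks []task) error {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()

	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, t := range tasks {
		wg.Add(1)
		go func(t task) {
			defer wg.Done()
			if err := t.work(ctx); err != nil {
				once.Do(func() {
					firstErr = fmt.Errorf("%s: %w", t.name, err)
					cancel() // stop every other task
				})
			}
		}(t)
	}
	wg.Wait() // hooks never overlap with running tasks

	for i := len(tasks) - 1; i >= 0; i-- { // LIFO, like defer
		if tasks[i].shutdown != nil {
			tasks[i].shutdown()
		}
	}
	return firstErr
}

// demo registers three tasks; "poll" fails and brings the others down.
func demo() ([]string, error) {
	var order []string
	hook := func(name string) func() {
		return func() { order = append(order, name) }
	}
	tasks := []task{
		{"migrate", func(context.Context) error { return nil }, hook("migrate")},
		{"poll", func(context.Context) error { return errors.New("boom") }, hook("poll")},
		{"server", func(ctx context.Context) error { <-ctx.Done(); return ctx.Err() }, hook("server")},
	}
	err := runAll(context.Background(), tasks)
	return order, err
}

func main() {
	order, err := demo()
	fmt.Println(order) // [server poll migrate]
	fmt.Println(err)   // poll: boom
}
```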

Package structure

pkg/longrun/
├── backoff.go       BackoffConfig, DefaultBackoff(), constants
├── matcher.go       Matcher — errors.Is / errors.As pattern matching
├── tracker.go       RuleTracker — per-rule retry budget
├── rule.go          TransientRule (user config) + ruleState (internal)
├── option.go        Functional options (WithTimeout, WithDelay, ...)
├── task.go          Task, NewOneShotTask, NewIntervalTask, Wait
├── runner.go        Runner, NewRunner, Add, Wait, LIFO shutdown
├── *_test.go        Blackbox tests (package longrun_test)
├── README.md
└── TODO.md

Observability

Every invocation of the work function is automatically wrapped in an OpenTelemetry span inside runOnce. The span is named after the task (name parameter from the constructor).

  • No SDK configured → otel.Tracer returns a no-op tracer, zero overhead.
  • SDK configured → every invocation, retry, and error is visible in the tracing backend. The span records errors automatically: span.RecordError(err) + span.SetStatus(codes.Error, ...) on failure. Users get full observability without writing any OTEL code in their work functions.

[longrun/task: "polling issues"]           ← automatic span from longrun
  └─[IssuePolling.work]                    ← user's child span (optional)
      └─[Parser.Run]                       ← domain span (optional)
          └─[SQL INSERT]                   ← infra span (optional)

Combined with a slog.Handler that extracts span context (trace_id, span_id, scope), every log line emitted via logger.InfoContext(ctx, ...) is automatically correlated with the active trace — zero boilerplate in business code.

Design decisions

  • Transient errors whitelist — empty rules = all errors permanent. Lower layers provide sentinel errors, orchestrator decides what to retry.
  • Per-error retry — different errors can have different MaxRetries and BackoffConfig. Careful retry for loaded external APIs, aggressive retry for local resources.
  • Own backoff: math.Pow based, no external dependencies.
  • Signals are not the package's responsibility — Runner accepts ctx, caller handles signals.
  • Shutdown after all tasks stop — shutdown hooks run after grp.Wait(), never concurrently with running tasks.
  • LIFO shutdown — last added task shuts down first (like defer). Will transition to reverse topological order when DependsOn is implemented.
  • Typed nil pointer for type matching: (*MyError)(nil) triggers the errors.As path; non-nil error values trigger errors.Is. The typed-nil check runs before the error-interface check to avoid ambiguity.

MaxRetries semantics

Value                  Meaning
0 (zero-value)         DefaultMaxRetries (3), the safe default
-1 (UnlimitedRetries)  No limit (explicit opt-in)
> 0                    Exact retry count

Future

See TODO.md for planned features.

Documentation

Index

Constants

const (
	// UnlimitedRetries disables the retry limit — the task retries forever
	// (until a permanent error or context cancellation).
	// Use with caution: set this explicitly to opt in.
	UnlimitedRetries = -1

	// DefaultMaxRetries is used when MaxRetries is 0 (zero-value).
	DefaultMaxRetries = 3
)

Variables

This section is empty.

Functions

This section is empty.

Types

type BackoffConfig added in v0.5.0

type BackoffConfig struct {
	// Initial is the delay before the first retry (attempt 0).
	// Must be > 0. Validated at Task construction time.
	Initial time.Duration
	// Max caps the computed delay. 0 means no cap (use with caution —
	// combined with UnlimitedRetries, delay grows until overflow).
	Max time.Duration
	// Multiplier scales the delay on each consecutive attempt.
	// Must be > 0. Validated at Task construction time.
	// 1.0 = constant delay, 2.0 = classic exponential, 1.5 = gentle growth.
	Multiplier float64
}

BackoffConfig controls exponential backoff between retry attempts.

Formula: delay = Initial * Multiplier^attempt

Example with Initial=1s, Multiplier=2.0, Max=30s:

attempt 0: 1s * 2^0 = 1s
attempt 1: 1s * 2^1 = 2s
attempt 2: 1s * 2^2 = 4s
attempt 3: 1s * 2^3 = 8s
...
attempt 5: 1s * 2^5 = 32s → capped at 30s

func DefaultBackoff added in v0.5.0

func DefaultBackoff() BackoffConfig

DefaultBackoff returns a sensible default backoff configuration.

Configured as Initial=1s, Max=30s, Multiplier=2.0

With these values, 5 retries wait 1s, 2s, 4s, 8s, and 16s (31s in total), all below the 30s cap.

func (*BackoffConfig) Duration added in v0.6.0

func (b *BackoffConfig) Duration(attempt int) time.Duration

Duration returns the backoff duration for the given 0-based attempt index.

When Max > 0, the result is capped at Max. When Max is 0 (no cap) and the computed duration overflows (e.g. after thousands of attempts with UnlimitedRetries), Duration clamps to math.MaxInt64.
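The formula, the cap, and the overflow clamp can be reproduced in a few lines. This is a sketch of the documented behaviour, not the package source: the `backoff` type and its `duration` method are local stand-ins that mirror the three documented fields.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// backoff mirrors the three documented fields; it is a local stand-in,
// not the package's own type.
type backoff struct {
	Initial    time.Duration
	Max        time.Duration
	Multiplier float64
}

// duration computes Initial * Multiplier^attempt, capped at Max when Max > 0
// and clamped to the largest time.Duration on overflow.
func (b backoff) duration(attempt int) time.Duration {
	d := float64(b.Initial) * math.Pow(b.Multiplier, float64(attempt))
	if b.Max > 0 && d > float64(b.Max) {
		return b.Max
	}
	if d >= float64(math.MaxInt64) { // also true for +Inf
		return time.Duration(math.MaxInt64)
	}
	return time.Duration(d)
}

func main() {
	b := backoff{Initial: time.Second, Max: 30 * time.Second, Multiplier: 2.0}
	for attempt := 0; attempt <= 5; attempt++ {
		fmt.Println(attempt, b.duration(attempt)) // 1s, 2s, 4s, 8s, 16s, then 30s (capped)
	}
}
```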

func (*BackoffConfig) Wait added in v0.6.0

func (b *BackoffConfig) Wait(ctx context.Context, attempt int) error

Wait blocks for the backoff duration of the given attempt, or until ctx is cancelled.
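A context-aware sleep of this kind is commonly written with a timer and a select. The `wait` and `demo` functions below are hypothetical illustrations of the pattern and take the duration directly instead of an attempt index.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// wait blocks for d or until ctx is cancelled, whichever comes first.
func wait(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}

// demo returns the result of an uninterrupted wait and of a wait on an
// already-cancelled context.
func demo() (completed, cancelled error) {
	completed = wait(context.Background(), 5*time.Millisecond)

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	cancelled = wait(ctx, time.Hour)
	return completed, cancelled
}

func main() {
	completed, cancelled := demo()
	fmt.Println(completed) // <nil>
	fmt.Println(cancelled) // context canceled
}
```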

type Matcher added in v0.6.0

type Matcher struct {
	// contains filtered or unexported fields
}

Matcher checks whether an error matches a given pattern.

Two forms are supported:

  • error value (sentinel): matched via errors.Is
  • *T where T implements error: matched via errors.As

Examples:

NewMatcher(ErrTimeout)          // sentinel → errors.Is
NewMatcher((*net.OpError)(nil)) // pointer-to-type → errors.As

func NewMatcher added in v0.6.0

func NewMatcher(errVal any) Matcher

NewMatcher compiles an error pattern into a Matcher.

The errVal argument must be one of:

  • an error value (for errors.Is matching)
  • a pointer to an error type, i.e. *T where T implements error (for errors.As matching)

Panics if errVal is nil or an unsupported type.
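How a single any argument can select between the two matching modes is sketched below using reflection. This is one possible approach, not necessarily the one longrun uses: `newMatcher`, `errTimeout`, and `queryError` are hypothetical names introduced for the illustration.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// errTimeout is a hypothetical sentinel used only for this illustration.
var errTimeout = errors.New("timeout")

// queryError is a hypothetical error type used only for this illustration.
type queryError struct{ Query string }

func (e *queryError) Error() string { return "query failed: " + e.Query }

var errorType = reflect.TypeOf((*error)(nil)).Elem()

// newMatcher returns a predicate for the given pattern.
// A nil pointer whose type implements error selects the errors.As path;
// any other error value selects the errors.Is path.
func newMatcher(pattern any) func(error) bool {
	if pattern == nil {
		panic("pattern must not be nil")
	}
	t := reflect.TypeOf(pattern)
	v := reflect.ValueOf(pattern)
	if t.Kind() == reflect.Pointer && v.IsNil() && t.Implements(errorType) {
		return func(err error) bool {
			target := reflect.New(t) // a fresh **T for errors.As to fill
			return errors.As(err, target.Interface())
		}
	}
	if sentinel, ok := pattern.(error); ok {
		return func(err error) bool { return errors.Is(err, sentinel) }
	}
	panic(fmt.Sprintf("unsupported pattern type %T", pattern))
}

func main() {
	isTimeout := newMatcher(errTimeout)
	isQuery := newMatcher((*queryError)(nil))

	wrapped := fmt.Errorf("poll: %w", errTimeout)
	fmt.Println(isTimeout(wrapped), isQuery(wrapped)) // true false

	wrappedQuery := fmt.Errorf("store: %w", &queryError{Query: "INSERT"})
	fmt.Println(isTimeout(wrappedQuery), isQuery(wrappedQuery)) // false true
}
```

The typed-nil test comes first because a nil *queryError also satisfies the error interface; checking the interface first would send it down the errors.Is path.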

func (Matcher) Match added in v0.6.0

func (m Matcher) Match(err error) bool

Match reports whether err matches the pattern.

type Option added in v0.6.0

type Option func(*taskConfig)

Option configures a Task. Use With* functions to create options.

func WithDelay added in v0.6.0

func WithDelay(d time.Duration) Option

WithDelay delays the first execution by the given duration. For interval tasks: first tick fires after delay, then every interval. For one-shot tasks: execution starts after delay. Delay is independent of interval.

func WithLogger added in v0.6.0

func WithLogger(l *slog.Logger) Option

WithLogger sets a custom logger for the task. Defaults to slog.Default().

func WithShutdown added in v0.6.0

func WithShutdown(fn ShutdownFunc) Option

WithShutdown registers a graceful shutdown hook for the task. The hook is called by Runner after all task goroutines have stopped.

func WithTimeout added in v0.6.0

func WithTimeout(d time.Duration) Option

WithTimeout sets a per-invocation timeout for the work function. Each call to work gets its own context with this deadline.
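A per-invocation deadline is typically derived with context.WithTimeout around each call. The sketch below shows that pattern with hypothetical names (`runOnce`, `slowWork`, `demo`); treating a zero timeout as "no deadline" is an assumption of the sketch, not documented behaviour.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runOnce invokes work once. When timeout > 0, the call gets its own
// context whose deadline is released as soon as work returns.
func runOnce(ctx context.Context, timeout time.Duration, work func(context.Context) error) error {
	if timeout > 0 { // assumption: zero means no per-invocation deadline
		var cancel context.CancelFunc
		ctx, cancel = context.WithTimeout(ctx, timeout)
		defer cancel()
	}
	return work(ctx)
}

// slowWork pretends to need an hour but gives up when ctx ends.
func slowWork(ctx context.Context) error {
	select {
	case <-time.After(time.Hour):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo runs slowWork under a 10ms per-invocation timeout.
func demo() error {
	return runOnce(context.Background(), 10*time.Millisecond, slowWork)
}

func main() {
	fmt.Println(demo()) // context deadline exceeded
}
```

The work function only benefits from the timeout if it observes ctx, as slowWork does here.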

type RuleTracker added in v0.6.0

type RuleTracker struct {
	// contains filtered or unexported fields
}

RuleTracker tracks retry attempts for a single TransientRule.

Each rule has its own independent budget. The tracker is created internally by Task from TransientRule.MaxRetries.

func NewRuleTracker added in v0.6.0

func NewRuleTracker(maxRetries int) *RuleTracker

NewRuleTracker creates a tracker with the given max retries.

MaxRetries semantics:

0 (zero-value) → DefaultMaxRetries (3).
-1 (UnlimitedRetries) → no limit.
>0 → exact limit.

func (*RuleTracker) Attempt added in v0.6.0

func (rt *RuleTracker) Attempt() int

Attempt returns the current attempt count.

func (*RuleTracker) Max added in v0.6.0

func (rt *RuleTracker) Max() int

Max returns the resolved max retries.

func (*RuleTracker) OnFailure added in v0.6.0

func (rt *RuleTracker) OnFailure() (int, bool)

OnFailure records a failure and returns the 0-based attempt index and whether the caller is allowed to retry.

Example with max=3:

1st call: attempt=0, ok=true
2nd call: attempt=1, ok=true
3rd call: attempt=2, ok=true
4th call: attempt=3, ok=false (budget exhausted)
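The counting rule above, together with the documented MaxRetries semantics, fits in a small counter. The `tracker` type below is a hypothetical re-implementation for illustration; in particular, whether the real tracker keeps counting after the budget is exhausted is an assumption.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for a per-rule retry budget.
type tracker struct {
	max     int // resolved limit; a negative value means unlimited
	attempt int
}

// newTracker resolves the documented semantics: 0 means the default of 3,
// -1 means unlimited, and a positive value is the exact limit.
func newTracker(maxRetries int) *tracker {
	if maxRetries == 0 {
		maxRetries = 3
	}
	return &tracker{max: maxRetries}
}

// onFailure returns the 0-based index of this failure and whether the
// caller may retry.
func (t *tracker) onFailure() (int, bool) {
	attempt := t.attempt
	t.attempt++ // assumption: the counter keeps growing after exhaustion
	return attempt, t.max < 0 || attempt < t.max
}

// reset starts a fresh budget, e.g. after a successful tick.
func (t *tracker) reset() { t.attempt = 0 }

func main() {
	tr := newTracker(3)
	for i := 0; i < 4; i++ {
		fmt.Println(tr.onFailure()) // 0 true, 1 true, 2 true, 3 false
	}
	tr.reset()
	fmt.Println(tr.onFailure()) // 0 true
}
```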

func (*RuleTracker) Reset added in v0.6.0

func (rt *RuleTracker) Reset()

Reset sets the attempt counter back to zero (e.g. after healthy progress).

type Runner

type Runner struct {
	// contains filtered or unexported fields
}

Runner orchestrates N tasks. When any task returns a permanent error the runner cancels all remaining tasks and performs graceful shutdown.

Runner does NOT handle OS signals — pass a cancellable context (e.g. via signal.NotifyContext).

func NewRunner

func NewRunner(opts RunnerOptions) *Runner

NewRunner creates a Runner with the given options.

func (*Runner) Add

func (r *Runner) Add(task *Task)

Add registers a task for concurrent execution.

func (*Runner) Wait

func (r *Runner) Wait(ctx context.Context) error

Wait starts all tasks concurrently and blocks until they all finish. When any task returns an error, all other tasks are cancelled via ctx. After all goroutines finish, shutdown hooks are called in LIFO order (reverse of Add). The ctx passed in controls the lifetime — the runner does NOT listen for OS signals; use signal.NotifyContext in the caller.

type RunnerOptions added in v0.5.0

type RunnerOptions struct {
	ShutdownTimeout time.Duration // default 30s
	Logger          *slog.Logger  // nil = slog.Default()
}

RunnerOptions configures a Runner.

type ShutdownFunc added in v0.5.0

type ShutdownFunc func(ctx context.Context) error

ShutdownFunc is called during graceful shutdown.

type Task added in v0.5.0

type Task struct {
	// contains filtered or unexported fields
}

Task is a self-contained unit of work with interval, retry and backoff support. It can be used standalone (via Wait) or managed by a Runner.

Task is NOT safe for concurrent use — call Wait from a single goroutine. Runner handles this automatically (one goroutine per task).

func NewIntervalTask added in v0.6.0

func NewIntervalTask(name string, interval time.Duration, work WorkFunc, rules []TransientRule, opts ...Option) *Task

NewIntervalTask creates a task that runs on a ticker loop. If rules is nil — any error kills the task. If rules is provided — transient errors are retried per their configuration, permanent errors (no matching rule) kill the task.

Each TransientRule binds an error to its own retry budget and backoff curve. TransientRule.MaxRetries limits consecutive failures for that rule. When a tick completes successfully, all rule trackers reset — so intermittent failures separated by successful ticks never accumulate toward MaxRetries.

Panics if work is nil or interval <= 0. Panics if any rule has nil Err, unsupported Err type, or Backoff.Initial <= 0.

func NewOneShotTask added in v0.6.0

func NewOneShotTask(name string, work WorkFunc, rules []TransientRule, opts ...Option) *Task

NewOneShotTask creates a task that executes once. If rules is nil — no retries, any error is fatal. If rules is provided — transient errors are retried per their configuration.

Each TransientRule binds an error to its own retry budget and backoff curve. TransientRule.MaxRetries limits consecutive failures for that rule — the budget is never reset mid-execution for one-shot tasks.

Panics if work is nil. Panics if any rule has nil Err, unsupported Err type, or Backoff.Initial <= 0.

func (*Task) Wait added in v0.5.0

func (t *Task) Wait(ctx context.Context) error

Wait runs the task to completion, respecting the configured retry policy, backoff and interval. It blocks until the task finishes or ctx is cancelled.

type TransientRule added in v0.6.0

type TransientRule struct {
	// Err is the error to match.
	// Must be an error value (for errors.Is) or a pointer to an error type (for errors.As).
	// Passing nil or an unsupported type panics at construction time.
	// Examples:
	//
	//	{Err: ErrTimeout}           // sentinel → errors.Is
	//	{Err: (*net.OpError)(nil)}  // pointer-to-type → errors.As
	Err error

	// MaxRetries limits consecutive retry attempts for this rule.
	//   0 (zero-value) → DefaultMaxRetries (3) — safe default.
	//  -1 (UnlimitedRetries) → no limit — explicit opt-in.
	//  >0 → exact retry count.
	MaxRetries int

	Backoff BackoffConfig
}

TransientRule binds an error to its retry settings. Different errors can have different retry budgets and backoff curves.

The Err field accepts two forms:

  • error value (sentinel): matched via errors.Is
  • *T where T implements error: matched via errors.As

Examples:

{Err: ErrTimeout}           // sentinel → errors.Is
{Err: (*net.OpError)(nil)}  // pointer-to-type → errors.As

func TransientGroup added in v0.7.0

func TransientGroup(maxRetries int, backoff BackoffConfig, errs ...error) []TransientRule

TransientGroup creates N rules with identical MaxRetries and BackoffConfig. Each rule gets its own independent retry budget — failures of one error do not count toward the budget of another.

Each error in errs must be a valid Err value (sentinel or typed nil pointer). See TransientRule.Err for details.

Example:

longrun.TransientGroup(longrun.UnlimitedRetries, longrun.DefaultBackoff(),
    (*net.OpError)(nil),
    ErrFetchIssues,
    ErrStoreIssues,
)

type WorkFunc added in v0.5.0

type WorkFunc func(ctx context.Context) error

WorkFunc is the function that performs the actual work of a task.
