s3lect package module v1.0.2
Published: Feb 6, 2026 License: Apache-2.0 Imports: 15 Imported by: 0

README

S3lect

S3lect is a leader election package for Go that uses S3 (or S3-compatible object storage) as its coordination mechanism.

It is designed for cloud-native applications that require reliable leader election with minimal operational overhead and cost-effective scaling.

S3lect uses S3's consistency guarantees and atomic operations to provide safe leader election across multiple instances, with configurable polling intervals and an optional peer-polling mode that reduces S3 reads and cost.

S3lect was created by Nadrama for use in components of its Open Source container PaaS.

Key Features

  • S3-based coordination: No additional infrastructure required beyond S3/S3-compatible object storage
  • Dual-interval optimization: Frequent polling during transitions, infrequent or peer-polling during stable periods
  • Peer communication mode: Optional HTTP-based leader health checks to minimize S3 read operations during polling
  • Configurable timeouts: Tunable leader detection and failover timing
  • Simple retry logic: Built-in resilience to transient failures such as networking errors

Leader Election Algorithm

Core Algorithm
  1. Read leader lockfile from S3 at the configured lockfile key/path

  2. Followers evaluate leader status:

    • If no lockfile exists → attempt to become leader, using an empty ETag
    • If the leader hasn't updated within the timeout period → attempt to become leader, using the last known ETag
  3. Attempting to become leader:

    • Write a new leader record with a conditional S3 operation (using the ETag from step 2 above); abort the attempt on failure
    • Enforce a minimum grace period after becoming leader before acting on leadership, to prevent "split-brain" scenarios
  4. Remaining a leader:

    • Leaders continuously update their timestamp in S3 at a configurable interval
    • Failed updates (after configurable retries/timeouts) result in automatic resignation of leadership

Dual-Interval System

S3lect operates in two intervals, and the latter interval can use two different modes to balance cost and performance:

Frequent Interval (default: 5 seconds)

  • Used during leadership transitions and instability
  • All instances poll S3 directly at this interval for leader status
  • Ensures fast failover detection (typically 11-15 seconds)
  • Automatically engaged when peer communication fails

Infrequent Interval (default: 30 seconds)

  • Used during stable periods with established leadership
  • Reduces S3 operations significantly
  • Two sub-modes available:
    • S3 Mode: Standard S3 polling at reduced frequency; all instances poll S3 directly at this interval for leadership status
    • Peer Mode: Followers check leader health via the leader's HTTP API, and fall back to polling S3 on failure/timeout

S3 File Format

The S3 file format is a JSON document with the following fields:

  • leaderID: Unique identifier for the current leader instance
  • leaderAddr: Network address of the current leader (for peer communication)
  • lastUpdated: Timestamp of the last update to the leader record

e.g.

{
  "leaderID": "server-001",
  "leaderAddr": "10.0.1.42:8443",
  "lastUpdated": "2024-10-27T10:30:45Z"
}

Peer Communication Protocol

When peer mode is enabled, followers in infrequent interval will:

  1. Attempt peer health check: HTTPS GET to https://{leaderAddr}{peerHealthPath} using cached leader address (default peer health path: /health/leadership)
  2. On success: Use the leader data from peer response, skip S3 read entirely, continue infrequent polling
  3. On failure: Fall back to S3 read to get current leader info, switch to frequent interval with direct S3 polling

The peer health endpoint returns the same JSON document as is stored in S3 (see the previous section).

  • We opted for a JSON HTTP API rather than gRPC or an alternative, for simplicity and for parity with the S3 lockfile format

Resilience and Retry Logic

All network operations (S3 and peer communication) include automatic retry:

  1. Immediate attempt
  2. 100ms delayed retry on failure
  3. 1-second delayed retry on second failure
  4. Give up and continue with election logic

This provides resilience against transient network issues while maintaining responsive failover timing.

Configuration

S3lect is configured through the ElectorConfig structure:

  • LockfilePath: S3 object key/path for the leader lockfile (e.g., "leader/my-group.json")
  • ServerID: Unique identifier for this instance
  • ServerAddr: Network address for peer communication (e.g., "10.0.1.42:8443")
  • FrequentInterval: Polling interval during transitions (default: 5s)
  • InfrequentInterval: Polling interval during stable periods (default: 30s)
  • LeaderTimeout: Time before considering leader failed (default: 15s)
  • PeerMode: Enable HTTP-based leader health checks (default: false)
  • PeerHealthPath: HTTP path for the peer leader health check endpoint (default: "/health/leadership")
  • PeerTimeout: Timeout for peer health check requests (default: 3s)
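Putting the fields together, a configuration using the documented defaults might look like the following sketch (field names come from the ElectorConfig reference; the surrounding application wiring is assumed):

```go
cfg := &s3lect.ElectorConfig{
	LockfilePath:       "leader/my-group.json",
	ServerID:           "server-001",
	ServerAddr:         "10.0.1.42:8443",
	FrequentInterval:   5 * time.Second,
	InfrequentInterval: 30 * time.Second,
	LeaderTimeout:      15 * time.Second,
	PeerMode:           true,
	PeerHealthPath:     "/health/leadership",
	PeerTimeout:        3 * time.Second,
}
```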

Integration Requirements

Storage Interface

S3lect requires an S3-compatible storage implementation providing:

  • Get(ctx, key) - Read object and return it with its ETag
  • PutIfMatch(ctx, key, data, etag) - Conditional put operation (using ETag from Get)

S3lect accepts the storage implementation via the S3ElectorOptions structure, and if none is specified it falls back to the S3 implementation in the AWS SDK for Go v2.

Leadership Callbacks

Applications can register callbacks to receive leadership change notifications, enabling immediate response to election events.
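Concretely, the OnAcquireLeadership and OnLoseLeadership hooks on ElectorConfig serve this purpose; a sketch (the hook bodies are placeholders):

```go
cfg.OnAcquireLeadership = func(ctx context.Context) error {
	// Perform critical leader-only setup here (e.g. attach a network
	// interface). Returning an error causes automatic resignation.
	return nil
}
cfg.OnLoseLeadership = func(ctx context.Context) error {
	// Best-effort cleanup; errors are logged but do not block the transition.
	return nil
}
```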

Operational Characteristics

  • Failover time: 11-15 seconds typical, up to 30 seconds worst-case
  • Scalability: Follower count doesn't significantly impact leader S3 operations in peer mode
  • Dependencies: Only requires S3-compatible storage and standard HTTP client

License

S3lect is licensed under the Apache License, Version 2.0. Copyright 2025 Nadrama Pty Ltd. See the LICENSE file for details.

Documentation

Overview

Package s3lect provides peer health server functionality for leader election.

The health server can be used in two ways:

  1. As a standalone HTTPS server:

     server, _ := s3lect.NewHealthServer(s3lect.HealthServerConfig{...})
     server.Start(ctx)

  2. As an HTTP handler embedded in existing servers:

     mux.HandleFunc(elector.GetConfig().PeerHealthPath, s3lect.NewLeadershipHandler(elector, logger))

Index

Constants

This section is empty.

Variables

var (
	ErrStorageNotFound     = errors.New("object not found")
	ErrStoragePrecondition = errors.New("precondition failed")
)

Functions

func NewLeadershipHandler

func NewLeadershipHandler(elector Elector, logger *slog.Logger) http.HandlerFunc

NewLeadershipHandler creates an HTTP handler for embedding in existing servers. This allows you to add the leadership health endpoint to an existing HTTP server.

Example:

mux := http.NewServeMux()
mux.HandleFunc(elector.GetConfig().PeerHealthPath, s3lect.NewLeadershipHandler(elector, logger))
server := &http.Server{Handler: mux, ...}

Types

type Elector

type Elector interface {
	// Start begins the leader election process
	Start(ctx context.Context) error

	// Stop stops the leader election process
	Stop() error

	// IsLeader returns true if this instance is currently the leader
	IsLeader() bool

	// WaitForLeadership blocks until this instance becomes leader
	WaitForLeadership(ctx context.Context) error

	// WaitForNextElection blocks until an election cycle completes after the given timestamp
	// If the timestamp is zero/empty, blocks until the first-ever election cycle completes
	// Returns the leadership status after the election completes
	WaitForNextElection(ctx context.Context, since time.Time) (*LeadershipStatus, error)

	// LeaderID returns the current leader's identity
	LeaderID() string

	// GetLeadershipStatus returns detailed leadership status
	GetLeadershipStatus() *LeadershipStatus

	// EnablePeerMode enables peer mode with the provided CA certificate
	EnablePeerMode(caCert []byte) error

	// UpdateConfig allows dynamic reconfiguration of the elector
	UpdateConfig(newConfig ElectorConfig) error

	// GetConfig returns the current configuration
	GetConfig() *ElectorConfig
}

Elector handles leader election for a distributed system

type ElectorConfig

type ElectorConfig struct {
	// LockfilePath is the S3 object key/path for the leader lockfile (e.g., "leader/my-group.json")
	LockfilePath string

	// ServerID is this instance's unique identifier
	ServerID string

	// ServerAddr is the address to advertise for peer health checks
	ServerAddr string

	// FrequentInterval is how often to check for leadership during transitions (default: 5s)
	FrequentInterval time.Duration

	// InfrequentInterval is how often to check for leadership during stable periods (default: 30s)
	InfrequentInterval time.Duration

	// LeaderTimeout is how long to wait before considering leader dead (default: 15s)
	LeaderTimeout time.Duration

	// PeerMode enables HTTP-based leader health checks to reduce S3 operations
	PeerMode bool

	// PeerHealthPath is the HTTP path for the peer leader health check endpoint (default: "/health/leadership")
	PeerHealthPath string

	// PeerTimeout is timeout for peer health check requests (default: 3s)
	PeerTimeout time.Duration

	// PeerCACert is the CA certificate for validating peer HTTPS connections
	PeerCACert []byte

	// OnAcquireLeadership is called after successfully claiming leadership in S3.
	// If this hook returns an error, leadership will be automatically resigned.
	// This is useful for critical operations like attaching network interfaces.
	OnAcquireLeadership func(ctx context.Context) error

	// OnLoseLeadership is called after losing leadership.
	// Errors are logged but do not prevent leadership transition (best effort cleanup).
	// This is useful for detaching network interfaces or stopping leader-only services.
	OnLoseLeadership func(ctx context.Context) error
}

ElectorConfig contains configuration for leader election

func (*ElectorConfig) Validate

func (c *ElectorConfig) Validate() error

Validate validates the elector configuration

type HealthServer

type HealthServer struct {
	// contains filtered or unexported fields
}

HealthServer provides the peer health endpoint for leader election

func NewHealthServer

func NewHealthServer(config HealthServerConfig) (*HealthServer, error)

NewHealthServer creates a new standalone peer health server. This creates a dedicated HTTPS server for the leadership health endpoint.

Example:

cert, _ := tls.X509KeyPair(certPEM, keyPEM)
server, err := s3lect.NewHealthServer(s3lect.HealthServerConfig{
    BindAddress:   "0.0.0.0:8993",
    Certificate:   cert,
    Elector:       elector,
    Logger:        logger,
})
server.Start(ctx)

func (*HealthServer) Start

func (hs *HealthServer) Start(ctx context.Context) error

Start starts the health server

func (*HealthServer) Stop

func (hs *HealthServer) Stop(ctx context.Context) error

Stop stops the health server

type HealthServerConfig

type HealthServerConfig struct {
	BindAddress string
	Certificate tls.Certificate
	Elector     Elector
	Logger      *slog.Logger
}

HealthServerConfig contains configuration for the health server

type LeaderRecord

type LeaderRecord struct {
	LeaderID    string               `json:"leaderID,omitempty"`
	LeaderAddr  string               `json:"leaderAddr,omitempty"`
	LastUpdated time.Time            `json:"lastUpdated"`
	Metadata    LeaderRecordMetadata `json:"-"`
}

LeaderRecord represents the leader information stored in S3

type LeaderRecordMetadata

type LeaderRecordMetadata struct {
	ETag string
}

LeaderRecordMetadata represents the S3 metadata of the LeaderRecord file

type LeadershipStatus

type LeadershipStatus struct {
	ServerID         string    `json:"serverID"`
	LockfilePath     string    `json:"lockfilePath"`
	IsLeader         bool      `json:"isLeader"`
	LeaderID         string    `json:"leaderID"`
	LeaderAddr       string    `json:"leaderAddr"`
	LeaderLastSeen   time.Time `json:"leaderLastSeen"`
	LastElectionTime time.Time `json:"lastElectionTime"`
	ConsecutiveFails int       `json:"consecutiveFails"`
}

LeadershipStatus contains detailed information about current leadership

type MockStorage

type MockStorage struct {
	// contains filtered or unexported fields
}

MockStorage implements Storage interface using local filesystem for testing

func NewMock

func NewMock() *MockStorage

NewMock creates a new mock storage instance with a temporary directory

func NewMockStorage

func NewMockStorage(baseDir string) (*MockStorage, error)

NewMockStorage creates a new mock storage instance

func (*MockStorage) Cleanup

func (m *MockStorage) Cleanup() error

Cleanup removes all files from mock storage

func (*MockStorage) Get

func (m *MockStorage) Get(ctx context.Context, key string) ([]byte, string, error)

Get retrieves an object from mock storage. Returns ErrStorageNotFound if the object does not exist.

func (*MockStorage) PutIfMatch

func (m *MockStorage) PutIfMatch(ctx context.Context, key string, data []byte, etag string) error

PutIfMatch stores an object only if the ETag matches (optimistic locking). An empty etag means the object must not exist. Returns ErrStoragePrecondition if the ETag doesn't match.

type S3Elector

type S3Elector struct {
	// contains filtered or unexported fields
}

S3Elector implements leader election using S3 as the coordination mechanism

func NewS3Elector

func NewS3Elector(opts S3ElectorOptions) (*S3Elector, error)

NewS3Elector creates a new S3-based leader elector

func (*S3Elector) EnablePeerMode

func (e *S3Elector) EnablePeerMode(caCert []byte) error

EnablePeerMode enables peer mode with the provided CA certificate

func (*S3Elector) GetConfig

func (e *S3Elector) GetConfig() *ElectorConfig

GetConfig returns the current configuration

func (*S3Elector) GetLeadershipStatus

func (e *S3Elector) GetLeadershipStatus() *LeadershipStatus

GetLeadershipStatus returns detailed leadership status

func (*S3Elector) IsLeader

func (e *S3Elector) IsLeader() bool

IsLeader returns true if this instance is currently the leader

func (*S3Elector) LeaderID

func (e *S3Elector) LeaderID() string

LeaderID returns the current leader's identity

func (*S3Elector) Start

func (e *S3Elector) Start(ctx context.Context) error

Start begins the leader election process

func (*S3Elector) Stop

func (e *S3Elector) Stop() error

Stop stops the leader election process

func (*S3Elector) UpdateConfig

func (e *S3Elector) UpdateConfig(newConfig ElectorConfig) error

UpdateConfig allows dynamic reconfiguration of the elector

func (*S3Elector) WaitForLeadership

func (e *S3Elector) WaitForLeadership(ctx context.Context) error

WaitForLeadership blocks until this instance becomes leader

func (*S3Elector) WaitForNextElection added in v1.0.1

func (e *S3Elector) WaitForNextElection(ctx context.Context, since time.Time) (*LeadershipStatus, error)

WaitForNextElection blocks until an election cycle completes after the given timestamp. If the timestamp is zero/empty, it blocks until the first-ever election cycle completes. Returns the leadership status after the election completes.

type S3ElectorOptions

type S3ElectorOptions struct {
	Config  *ElectorConfig
	Storage Storage
	Logger  *slog.Logger
}

S3ElectorOptions contains options for creating a new S3Elector

type Storage

type Storage interface {
	// Get retrieves an object from storage and returns its contents and its etag
	// Returns ErrStorageNotFound if object does not exist
	Get(ctx context.Context, key string) (data []byte, etag string, err error)

	// PutIfMatch stores an object only if the ETag matches (empty string = must not exist)
	// Returns ErrStoragePrecondition if ETag doesn't match
	PutIfMatch(ctx context.Context, key string, data []byte, etag string) error
}

Storage provides an abstraction for object storage operations
