Documentation ¶
Index ¶
- Constants
- type AllocationResources
- type ClusterCapacity
- type Config
- type ConsulClient
- type FailsafeMode
- type GroupScalingPolicy
- type JobScalingPolicies
- type NodeAllocation
- type NodeRegistry
- type NomadClient
- type Notification
- type ScalingMetric
- type ScalingState
- type TaskAllocation
- type Telemetry
- type WorkerPool
Constants ¶
const (
NodeStatusInit = "initializing"
NodeStatusReady = "ready"
NodeStatusDown = "down"
)
Set of possible states for a node.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AllocationResources ¶
type AllocationResources struct {
MemoryMB int
CPUMHz int
DiskMB int
MemoryPercent float64
CPUPercent float64
DiskPercent float64
}
AllocationResources represents the allocation resource utilization.
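The struct carries both absolute values and percentages. As a rough illustration of how the two might relate, the sketch below derives the percent fields from a used and a total allocation. The helpers `percentOf` and `utilization` and the local copy of the struct are assumptions made for this example and are not part of the package.

```go
package main

import "fmt"

// AllocationResources is a local copy of the documented struct so the
// example is self-contained.
type AllocationResources struct {
	MemoryMB      int
	CPUMHz        int
	DiskMB        int
	MemoryPercent float64
	CPUPercent    float64
	DiskPercent   float64
}

// percentOf is a hypothetical helper: it returns used as a percentage of
// total, or 0 when total is not positive.
func percentOf(used, total int) float64 {
	if total <= 0 {
		return 0
	}
	return float64(used) / float64(total) * 100
}

// utilization returns a copy of used with its percent fields filled in
// relative to total.
func utilization(used, total AllocationResources) AllocationResources {
	used.MemoryPercent = percentOf(used.MemoryMB, total.MemoryMB)
	used.CPUPercent = percentOf(used.CPUMHz, total.CPUMHz)
	used.DiskPercent = percentOf(used.DiskMB, total.DiskMB)
	return used
}

func main() {
	total := AllocationResources{MemoryMB: 8192, CPUMHz: 4000, DiskMB: 100000}
	used := AllocationResources{MemoryMB: 2048, CPUMHz: 3000, DiskMB: 25000}
	u := utilization(used, total)
	fmt.Printf("mem=%.1f%% cpu=%.1f%% disk=%.1f%%\n", u.MemoryPercent, u.CPUPercent, u.DiskPercent)
}
```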
type ClusterCapacity ¶ added in v0.0.2
type ClusterCapacity struct {
// NodeCount is the number of worker nodes in a ready and non-draining state
// across the cluster.
NodeCount int
// ScalingMetric indicates the most-utilized allocation resource across the
// cluster. The most-utilized resource is prioritized when making scaling
// decisions like identifying the least-allocated worker node.
ScalingMetric ScalingMetric
// MaxAllowedUtilization represents the max allowed cluster utilization after
// considering node fault-tolerance and task group scaling overhead.
MaxAllowedUtilization int
// TotalCapacity is the total allocation capacity across
// the cluster.
TotalCapacity AllocationResources
// UsedCapacity is the consumed allocation capacity across
// the cluster.
UsedCapacity AllocationResources
// TaskAllocation represents the total allocation requirements of a single
// instance (count 1) of all running jobs across the cluster. This is used to
// proactively ensure the cluster has sufficient available capacity to scale
// each task by +1 if an increase in capacity is required.
TaskAllocation AllocationResources
// NodeList is a list of all worker nodes in a known good state.
NodeList []string
// NodeAllocations is a slice of node allocations.
NodeAllocations []*NodeAllocation
// ScalingDirection is the direction (in or out) of cluster scaling required
// after performing the proper evaluation.
ScalingDirection string
}
ClusterCapacity is the central object used to track and evaluate cluster capacity and utilization, and it stores the data required to make scaling decisions. All data stored in this object is disposable and is generated during each evaluation.
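The field comments describe two ideas: picking the most-utilized resource, and checking that every task could be scaled by +1 within the allowed utilization. The sketch below shows one way these could be computed. The functions `mostUtilized` and `hasHeadroom`, the resource labels, and the exact way the fields are combined are assumptions for illustration; the package's own evaluation logic may differ.

```go
package main

import "fmt"

// AllocationResources is a trimmed local copy of the documented struct.
type AllocationResources struct {
	MemoryMB int
	CPUMHz   int
	DiskMB   int
}

// mostUtilized returns the label and utilization percentage of the resource
// with the highest used/total ratio. It returns "" and -1 when no resource
// has a positive total. The labels are hypothetical.
func mostUtilized(used, total AllocationResources) (string, float64) {
	candidates := []struct {
		name        string
		used, total int
	}{
		{"cpu", used.CPUMHz, total.CPUMHz},
		{"memory", used.MemoryMB, total.MemoryMB},
		{"disk", used.DiskMB, total.DiskMB},
	}
	bestName, bestPct := "", -1.0
	for _, c := range candidates {
		if c.total <= 0 {
			continue
		}
		pct := float64(c.used) / float64(c.total) * 100
		if pct > bestPct {
			bestName, bestPct = c.name, pct
		}
	}
	return bestName, bestPct
}

// hasHeadroom reports whether used plus one instance of every task stays at
// or below maxAllowedPercent of total. Integer math avoids rounding issues.
func hasHeadroom(used, task, total, maxAllowedPercent int) bool {
	if total <= 0 {
		return false
	}
	return (used+task)*100 <= maxAllowedPercent*total
}

func main() {
	total := AllocationResources{MemoryMB: 8000, CPUMHz: 4000, DiskMB: 100000}
	used := AllocationResources{MemoryMB: 6000, CPUMHz: 2000, DiskMB: 10000}
	task := AllocationResources{MemoryMB: 500, CPUMHz: 250, DiskMB: 1000}

	name, pct := mostUtilized(used, total)
	fmt.Printf("most utilized: %s (%.0f%%)\n", name, pct)
	fmt.Println("room for +1 of each task:",
		hasHeadroom(used.MemoryMB, task.MemoryMB, total.MemoryMB, 80))
}
```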
type Config ¶
type Config struct {
// ClusterScalingDisable is a global parameter that can be used to disable
// Replicator from undertaking any cluster scaling evaluations.
ClusterScalingDisable bool `mapstructure:"cluster_scaling_disable"`
// ClusterScalingInterval is the period in seconds at which the ticker will
// run.
ClusterScalingInterval int `mapstructure:"cluster_scaling_interval"`
// Consul is the location of the Consul instance or cluster endpoint to query
// (may be an IP address or FQDN) with port.
Consul string `mapstructure:"consul"`
// ConsulClient provides a client to interact with the Consul API.
ConsulClient ConsulClient
// ConsulKeyRoot is the Consul key root location where Replicator stores
// and fetches critical information from.
ConsulKeyRoot string `mapstructure:"consul_key_root"`
// ConsulToken is the Consul ACL token used to access KeyValues from a
// secure Consul installation.
ConsulToken string `mapstructure:"consul_token"`
// JobScalingDisable is a global parameter that can be used to disable
// Replicator from undertaking any job scaling evaluations.
JobScalingDisable bool `mapstructure:"job_scaling_disable"`
// JobScalingInterval is the period in seconds at which the ticker will
// run.
JobScalingInterval int `mapstructure:"job_scaling_interval"`
// LogLevel is the level at which the application should log from.
LogLevel string `mapstructure:"log_level"`
// Nomad is the location of the Nomad instance or cluster endpoint to query
// (may be an IP address or FQDN) with port.
Nomad string `mapstructure:"nomad"`
// NomadClient provides a client to interact with the Nomad API.
NomadClient NomadClient
// Notification contains Replicator's notification configuration params and
// initialized backends.
Notification *Notification `mapstructure:"notification"`
// Telemetry is the configuration struct that controls the telemetry settings.
Telemetry *Telemetry `mapstructure:"telemetry"`
}
Config is the main configuration struct used to configure the replicator application.
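Both interval fields are documented as periods in seconds that drive a ticker. A minimal sketch of converting them for use with `time.NewTicker` follows. The helper `interval` and the trimmed copy of `Config` are assumptions for this example; how the package itself builds its tickers is not shown in this documentation.

```go
package main

import (
	"fmt"
	"time"
)

// Config is a trimmed local copy holding only the scaling switches and
// intervals from the documented struct.
type Config struct {
	ClusterScalingDisable  bool
	ClusterScalingInterval int
	JobScalingDisable      bool
	JobScalingInterval     int
}

// interval is a hypothetical helper that converts a period in seconds to a
// time.Duration. It returns false when the evaluation is disabled or the
// period is not positive, since time.NewTicker panics on such values.
func interval(seconds int, disabled bool) (time.Duration, bool) {
	if disabled || seconds <= 0 {
		return 0, false
	}
	return time.Duration(seconds) * time.Second, true
}

func main() {
	cfg := Config{ClusterScalingInterval: 10, JobScalingInterval: 5, JobScalingDisable: true}

	if d, ok := interval(cfg.ClusterScalingInterval, cfg.ClusterScalingDisable); ok {
		ticker := time.NewTicker(d)
		defer ticker.Stop()
		fmt.Println("cluster scaling evaluations every", d)
	}
	if _, ok := interval(cfg.JobScalingInterval, cfg.JobScalingDisable); !ok {
		fmt.Println("job scaling evaluations disabled")
	}
}
```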
type ConsulClient ¶
type ConsulClient interface {
// AcquireLeadership attempts to acquire a Consul leadership lock using the
// provided session. If the lock is already taken this will return false to
// show that there is already a leader.
AcquireLeadership(string, *string) bool
// CreateSession creates a Consul session for use in the Leadership locking
// process and will spawn off the renewing of the session in order to ensure
// leadership can be maintained.
CreateSession(int, chan struct{}) (string, error)
// PersistState is responsible for persistently storing scaling
// state information in the Consul Key/Value Store.
PersistState(*ScalingState) error
// ReadState attempts to read state tracking information from the Consul
// Key/Value Store from the path provided.
ReadState(*ScalingState, bool)
// ResignLeadership attempts to remove the leadership lock upon shutdown of the
// replicator daemon. If this is unsuccessful there is little that can be done,
// so nothing is returned.
ResignLeadership(string, string)
}
The ConsulClient interface is used to provide common method signatures for interacting with the Consul API.
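To make the leadership flow in the method comments concrete, the sketch below exercises the three leadership methods against an in-memory fake. The method signatures are copied from the interface, but the parameter meanings (a lock key, a session ID, a TTL and a stop channel), the `fakeConsul` type, the `leaderElector` subset interface and the key name are all assumptions; the fake does not talk to Consul.

```go
package main

import (
	"fmt"
	"sync"
)

// leaderElector is a hypothetical subset of ConsulClient limited to the
// leadership methods, with signatures as documented.
type leaderElector interface {
	AcquireLeadership(string, *string) bool
	CreateSession(int, chan struct{}) (string, error)
	ResignLeadership(string, string)
}

// fakeConsul is an in-memory stand-in used only for this example.
type fakeConsul struct {
	mu     sync.Mutex
	holder map[string]string // lock key -> session ID holding the lock
	next   int
}

func newFakeConsul() *fakeConsul {
	return &fakeConsul{holder: make(map[string]string)}
}

// CreateSession hands out a new session ID. The TTL and stop channel are
// accepted but ignored by this fake.
func (f *fakeConsul) CreateSession(ttl int, stop chan struct{}) (string, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.next++
	return fmt.Sprintf("session-%d", f.next), nil
}

// AcquireLeadership succeeds when the lock is free or already held by the
// same session, and returns false when another session holds it.
func (f *fakeConsul) AcquireLeadership(key string, session *string) bool {
	if session == nil || *session == "" {
		return false
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	if cur, ok := f.holder[key]; ok && cur != *session {
		return false
	}
	f.holder[key] = *session
	return true
}

// ResignLeadership releases the lock if the given session holds it.
func (f *fakeConsul) ResignLeadership(key, session string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.holder[key] == session {
		delete(f.holder, key)
	}
}

func main() {
	var c leaderElector = newFakeConsul()
	stop := make(chan struct{})
	defer close(stop)

	a, _ := c.CreateSession(15, stop)
	b, _ := c.CreateSession(15, stop)
	key := "replicator/leader" // hypothetical key name

	fmt.Println(c.AcquireLeadership(key, &a)) // true: lock was free
	fmt.Println(c.AcquireLeadership(key, &b)) // false: a is already leader
	c.ResignLeadership(key, a)
	fmt.Println(c.AcquireLeadership(key, &b)) // true: lock released by a
}
```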
type FailsafeMode ¶ added in v0.0.2
type FailsafeMode struct {
// Config stores partial configuration required to interact with Consul.
Config *Config
// Disable instructs the failsafe CLI command to disable failsafe mode.
Disable bool
// Enable instructs the failsafe CLI command to enable failsafe mode.
Enable bool
// Force suppresses confirmation prompts when enabling/disabling failsafe.
Force bool
// Verb represents the action to be displayed during confirmation prompts.
Verb string
}
FailsafeMode is the configuration struct for administratively interacting with the distributed failsafe lock.
type GroupScalingPolicy ¶
type GroupScalingPolicy struct {
Cooldown time.Duration `mapstructure:"replicator_cooldown"`
Enabled bool `mapstructure:"replicator_enabled"`
GroupName string
Max int `mapstructure:"replicator_max"`
Min int `mapstructure:"replicator_min"`
ScaleDirection string `hash:"ignore"`
ScaleInCPU float64 `mapstructure:"replicator_scalein_cpu"`
ScaleInMem float64 `mapstructure:"replicator_scalein_mem"`
ScalingMetric string `hash:"ignore"`
ScaleOutCPU float64 `mapstructure:"replicator_scaleout_cpu"`
ScaleOutMem float64 `mapstructure:"replicator_scaleout_mem"`
Tasks TaskAllocation `hash:"ignore"`
UID string `mapstructure:"replicator_notification_uid"`
}
GroupScalingPolicy represents all the information needed to make JobTaskGroup scaling decisions.
func NewGroupScalingPolicy ¶ added in v1.0.0
func NewGroupScalingPolicy() *GroupScalingPolicy
NewGroupScalingPolicy is a constructor method that provides a pointer to a new group scaling policy object.
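The four threshold fields suggest a comparison of observed CPU and memory utilization against scale-in and scale-out limits. The sketch below shows one plausible rule: scale out when either resource reaches its scale-out threshold, scale in only when both are at or below their scale-in thresholds. The function `scaleDirection`, the direction labels and the rule itself are assumptions; the package's EvaluateJobScaling may decide differently.

```go
package main

import "fmt"

// GroupScalingPolicy is a trimmed local copy holding only the threshold
// fields from the documented struct.
type GroupScalingPolicy struct {
	ScaleInCPU  float64
	ScaleInMem  float64
	ScaleOutCPU float64
	ScaleOutMem float64
}

// Direction labels are hypothetical; the documentation does not list the
// values stored in ScaleDirection.
const (
	directionOut  = "out"
	directionIn   = "in"
	directionNone = "none"
)

// scaleDirection compares utilization percentages against the policy.
// Scale-out is checked first so that pressure on one resource wins over
// slack on the other.
func scaleDirection(p *GroupScalingPolicy, cpuPct, memPct float64) string {
	switch {
	case cpuPct >= p.ScaleOutCPU || memPct >= p.ScaleOutMem:
		return directionOut
	case cpuPct <= p.ScaleInCPU && memPct <= p.ScaleInMem:
		return directionIn
	default:
		return directionNone
	}
}

func main() {
	p := &GroupScalingPolicy{ScaleInCPU: 30, ScaleInMem: 30, ScaleOutCPU: 80, ScaleOutMem: 80}
	fmt.Println(scaleDirection(p, 85, 40)) // out
	fmt.Println(scaleDirection(p, 20, 25)) // in
	fmt.Println(scaleDirection(p, 50, 25)) // none
}
```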
type JobScalingPolicies ¶
type JobScalingPolicies struct {
LastChangeIndex uint64
Lock sync.RWMutex
Policies map[string][]*GroupScalingPolicy
}
JobScalingPolicies tracks Replicator's view of Job scaling policies and states, with a Lock to safeguard reads, writes and deletes on the Policies map.
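Because Policies is a plain map, callers are expected to hold Lock around every access. A minimal sketch of that pattern is shown below, using the write lock for updates and the read lock for lookups. The helper functions `setPolicies` and `groupNames` and the trimmed struct copies are assumptions for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// GroupScalingPolicy is a trimmed local copy of the documented struct.
type GroupScalingPolicy struct {
	GroupName string
	Min, Max  int
}

// JobScalingPolicies is a local copy of the documented struct.
type JobScalingPolicies struct {
	LastChangeIndex uint64
	Lock            sync.RWMutex
	Policies        map[string][]*GroupScalingPolicy
}

// setPolicies replaces the policies for a job under the write lock.
func setPolicies(p *JobScalingPolicies, job string, groups []*GroupScalingPolicy, index uint64) {
	p.Lock.Lock()
	defer p.Lock.Unlock()
	if p.Policies == nil {
		p.Policies = make(map[string][]*GroupScalingPolicy)
	}
	p.Policies[job] = groups
	p.LastChangeIndex = index
}

// groupNames lists the group names for a job under the read lock, which
// allows concurrent readers.
func groupNames(p *JobScalingPolicies, job string) []string {
	p.Lock.RLock()
	defer p.Lock.RUnlock()
	names := make([]string, 0, len(p.Policies[job]))
	for _, g := range p.Policies[job] {
		names = append(names, g.GroupName)
	}
	return names
}

func main() {
	var p JobScalingPolicies
	var wg sync.WaitGroup
	for i, job := range []string{"web", "worker"} {
		wg.Add(1)
		go func(index uint64, job string) {
			defer wg.Done()
			groups := []*GroupScalingPolicy{{GroupName: job + "-group", Min: 1, Max: 5}}
			setPolicies(&p, job, groups, index)
		}(uint64(i+1), job)
	}
	wg.Wait()
	fmt.Println(groupNames(&p, "web"))
}
```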
type NodeAllocation ¶
type NodeAllocation struct {
// NodeID is the unique ID of the worker node.
NodeID string
// NodeIP is the private IP of the worker node.
NodeIP string
// UsedCapacity represents the percentage of total cluster resources consumed
// by the worker node.
UsedCapacity AllocationResources
}
NodeAllocation describes the resource consumption of a specific worker node.
type NodeRegistry ¶ added in v1.0.0
type NodeRegistry struct {
LastChangeIndex uint64
Lock sync.RWMutex
RegisteredNodes map[string]string
RegisteredNodesHash uint64
WorkerPools map[string]*WorkerPool
}
NodeRegistry tracks worker pools and nodes discovered by Replicator. The object contains a lock to provide mutual exclusion protection.
type NomadClient ¶
type NomadClient interface {
ClusterScalingSafe(*ClusterCapacity, *WorkerPool) bool
// DrainNode places a worker node in drain mode to stop future allocations
// and migrate existing allocations to other worker nodes.
DrainNode(string) error
// EvaluatePoolScaling evaluates a worker pool capacity and utilization,
// and determines whether a scaling operation is required and safe to
// implement.
EvaluatePoolScaling(*ClusterCapacity, *WorkerPool, *JobScalingPolicies) (bool, error)
// EvaluateJobScaling compares the consumed resource percentages of a Job
// group against its scaling policy to determine whether a scaling event is
// required.
EvaluateJobScaling(string, []*GroupScalingPolicy) error
// GetAllocationStats discovers the resources consumed by a particular Nomad
// allocation.
GetAllocationStats(*nomad.Allocation, *GroupScalingPolicy)
// GetJobAllocations identifies all allocations for an active job.
GetJobAllocations([]*nomad.AllocationListStub, *GroupScalingPolicy)
// IsJobInDeployment checks to see whether the supplied Nomad job is currently
// in the process of a deployment.
IsJobInDeployment(string) bool
// JobGroupScale scales a particular job group, confirming that the action
// completes successfully.
JobGroupScale(string, *GroupScalingPolicy, *ScalingState)
// JobWatcher is the main entry point into Replicator's process of reading and
// updating its JobScalingPolicies tracking.
JobWatcher(*JobScalingPolicies)
// LeastAllocatedNode determines which worker pool node is consuming the
// least amount of the cluster's most-utilized resource.
LeastAllocatedNode(*ClusterCapacity, *ScalingState) (string, string)
// NodeReverseLookup provides a method to get the ID of the worker pool node
// running a given allocation.
NodeReverseLookup(string) (string, error)
// NodeWatcher provides an automated mechanism to discover worker pools and
// nodes and populate the node registry.
NodeWatcher(*NodeRegistry)
// MostUtilizedResource calculates which resource is most-utilized across the
// cluster. The worst-case allocation resource is prioritized when making
// scaling decisions.
MostUtilizedResource(*ClusterCapacity)
// VerifyNodeHealth evaluates whether a specified worker node is a healthy
// member of the Nomad cluster.
VerifyNodeHealth(string) bool
}
NomadClient exposes all API methods needed to interact with the Nomad API, evaluate cluster capacity and allocations and make scaling decisions.
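As an illustration of the selection described for LeastAllocatedNode, the sketch below picks the node with the lowest utilization of a chosen metric while skipping a protected node. The function `leastAllocated`, the metric labels, and the interpretation of the two returned strings as node ID and node IP are assumptions; the real method takes a ClusterCapacity and a ScalingState and may apply further checks.

```go
package main

import "fmt"

// AllocationResources is a trimmed local copy of the documented struct.
type AllocationResources struct {
	MemoryPercent float64
	CPUPercent    float64
}

// NodeAllocation is a local copy of the documented struct.
type NodeAllocation struct {
	NodeID       string
	NodeIP       string
	UsedCapacity AllocationResources
}

// Metric labels are hypothetical.
const (
	metricCPU    = "cpu"
	metricMemory = "memory"
)

// leastAllocated returns the ID and IP of the node with the lowest usage of
// the given metric, ignoring the protected node. Any metric other than
// metricMemory is treated as CPU. It returns empty strings when no node is
// eligible.
func leastAllocated(nodes []*NodeAllocation, metric, protected string) (string, string) {
	var best *NodeAllocation
	var bestPct float64
	for _, n := range nodes {
		if n == nil || n.NodeID == protected {
			continue
		}
		pct := n.UsedCapacity.CPUPercent
		if metric == metricMemory {
			pct = n.UsedCapacity.MemoryPercent
		}
		if best == nil || pct < bestPct {
			best, bestPct = n, pct
		}
	}
	if best == nil {
		return "", ""
	}
	return best.NodeID, best.NodeIP
}

func main() {
	nodes := []*NodeAllocation{
		{NodeID: "n1", NodeIP: "10.0.0.1", UsedCapacity: AllocationResources{CPUPercent: 40, MemoryPercent: 70}},
		{NodeID: "n2", NodeIP: "10.0.0.2", UsedCapacity: AllocationResources{CPUPercent: 10, MemoryPercent: 90}},
		{NodeID: "n3", NodeIP: "10.0.0.3", UsedCapacity: AllocationResources{CPUPercent: 25, MemoryPercent: 50}},
	}
	// n2 has the lowest CPU usage but is protected, so n3 is chosen.
	id, ip := leastAllocated(nodes, metricCPU, "n2")
	fmt.Println(id, ip)
}
```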
type Notification ¶ added in v0.0.2
type Notification struct {
// ClusterIdentifier is a friendly name which is used when sending
// notifications for easy human identification.
ClusterIdentifier string `mapstructure:"cluster_identifier"`
// PagerDutyServiceKey is the PD integration key for the Events API v1.
PagerDutyServiceKey string `mapstructure:"pagerduty_service_key"`
// Notifiers is where our initialized notification backends are stored so they
// can be used on the fly when required.
Notifiers []notifier.Notifier
}
Notification is the control struct for Replicator notifications.
func (*Notification) Merge ¶ added in v0.0.2
func (n *Notification) Merge(b *Notification) *Notification
Merge is used to merge two Notification configurations together.
type ScalingMetric ¶ added in v1.0.1
ScalingMetric tracks information about the prioritized scaling metric.
type ScalingState ¶ added in v1.0.0
type ScalingState struct {
// FailsafeAdmin tracks whether failsafe mode is being toggled via the CLI
// tools.
FailsafeAdmin bool `json:"failsafe_admin"`
// FailsafeMode represents the status of the failsafe circuit breaker. This
// will be tripped automatically when enough consecutive failures are
// encountered.
FailsafeMode bool `json:"failsafe_mode"`
// FailureCount tracks the number of worker nodes that have failed to
// successfully join the worker pool after a scale-out operation.
FailureCount int `json:"failure_count"`
// LastNotificationEvent tracks the time a notification was last sent
// for this state object.
LastNotificationEvent time.Time `json:"last_notification_event"`
// LastScalingEvent represents the last time the daemon successfully
// completed a cluster scaling action.
LastScalingEvent time.Time `json:"last_scaling_event"`
// LastUpdated tracks the last time the state tracking data was updated.
LastUpdated time.Time `json:"last_updated"`
// Lock provides a mutex lock to protect concurrent read/write
// access to the object.
Lock sync.RWMutex `json:"-"`
// ProtectedNode represents the Nomad agent node on which the Replicator
// leader is running. This node will be excluded when identifying an eligible
// node for termination during scaling actions.
ProtectedNode string `json:"protected_node"`
// ResourceName provides a shortcut method for identifying the resource
// this state is associated with.
ResourceName string `json:"resource_name"`
// ResourceType represents the type of resource being tracked by this object.
ResourceType string `json:"resource_type"`
// ScaleInRequests tracks the number of consecutive times replicator
// has indicated the cluster worker pool should be scaled in.
ScaleInRequests int `json:"scalein_requests"`
// ScaleOutRequests tracks the number of consecutive times replicator
// has indicated the cluster worker pool should be scaled out.
ScaleOutRequests int `json:"scaleout_requests"`
// StatePath stores the path where the object should be persisted.
StatePath string `json:"state_path"`
}
ScalingState provides a state object that represents the state of a scalable worker pool or job group.
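The struct tags determine how the state is serialized: each field is written under its `json` name and Lock is excluded via `json:"-"`. The sketch below marshals a trimmed copy of the struct with `encoding/json` and lists the resulting keys. The helper `encodedKeys` is an assumption for this example, and holding the read lock while encoding is a suggested pattern rather than documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"sync"
	"time"
)

// ScalingState is a trimmed local copy of the documented struct, keeping
// the original tags for the fields it includes.
type ScalingState struct {
	FailsafeMode     bool         `json:"failsafe_mode"`
	FailureCount     int          `json:"failure_count"`
	LastScalingEvent time.Time    `json:"last_scaling_event"`
	Lock             sync.RWMutex `json:"-"`
	ResourceName     string       `json:"resource_name"`
}

// encodedKeys marshals the state under its read lock and returns the sorted
// JSON keys that were produced.
func encodedKeys(s *ScalingState) ([]string, error) {
	s.Lock.RLock()
	raw, err := json.Marshal(s)
	s.Lock.RUnlock()
	if err != nil {
		return nil, err
	}
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(raw, &fields); err != nil {
		return nil, err
	}
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

func main() {
	s := &ScalingState{ResourceName: "example-pool", LastScalingEvent: time.Now()}
	keys, err := encodedKeys(s)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// Lock does not appear among the keys.
	fmt.Println(keys)
}
```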
type TaskAllocation ¶
type TaskAllocation struct {
// TaskName is the name given to the task within the job specification.
TaskName string
// Resources tracks the resource requirements defined in the job spec and the
// real-time utilization of those resources.
Resources AllocationResources
}
TaskAllocation describes the resource requirements defined in the job specification.
type Telemetry ¶
type Telemetry struct {
// StatsdAddress specifies the address of a statsd server to forward metrics
// to and should include the port.
StatsdAddress string `mapstructure:"statsd_address"`
}
Telemetry is the struct that controls the telemetry configuration. If a value is present then telemetry is enabled. Currently only statsd is supported for sending telemetry.
type WorkerPool ¶ added in v1.0.0
type WorkerPool struct {
Cooldown int `mapstructure:"replicator_cooldown"`
FaultTolerance int `mapstructure:"replicator_node_fault_tolerance"`
Max int `mapstructure:"replicator_max"`
Min int `mapstructure:"replicator_min"`
Name string `mapstructure:"replicator_worker_pool"`
Nodes map[string]*nomad.Node `hash:"ignore"`
NotificationUID string `mapstructure:"replicator_notification_uid"`
ProtectedNode string `hash:"ignore"`
Region string `mapstructure:"replicator_region"`
RetryThreshold int `mapstructure:"replicator_retry_threshold"`
ScalingEnabled bool `mapstructure:"replicator_enabled"`
ScalingThreshold int `mapstructure:"replicator_scaling_threshold"`
}
WorkerPool represents the scaling configuration of a discovered worker pool and its associated node membership.
func NewWorkerPool ¶ added in v1.0.0
func NewWorkerPool() *WorkerPool
NewWorkerPool is a constructor method that provides a pointer to a new worker pool object.
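Cooldown and ScalingThreshold read like gates on when a pool may be scaled. The sketch below combines them: a scaling action is allowed only after enough consecutive requests and once the cooldown has elapsed since the last scaling event. Treating Cooldown as seconds, comparing ScalingThreshold against a consecutive request count, and the helpers `cooldownDuration` and `readyToScale` are all assumptions; the documentation does not state units or the exact rule.

```go
package main

import (
	"fmt"
	"time"
)

// WorkerPool is a trimmed local copy of the documented struct.
type WorkerPool struct {
	Cooldown         int
	ScalingThreshold int
}

// cooldownDuration interprets Cooldown as seconds (an assumption).
func (p *WorkerPool) cooldownDuration() time.Duration {
	return time.Duration(p.Cooldown) * time.Second
}

// readyToScale reports whether the number of consecutive scaling requests
// has reached the threshold and the cooldown has elapsed.
func readyToScale(p *WorkerPool, requests int, sinceLastEvent time.Duration) bool {
	if requests < p.ScalingThreshold {
		return false
	}
	return sinceLastEvent >= p.cooldownDuration()
}

func main() {
	pool := &WorkerPool{Cooldown: 300, ScalingThreshold: 3}
	now := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	lastEvent := now.Add(-10 * time.Minute)

	fmt.Println(readyToScale(pool, 3, now.Sub(lastEvent))) // true
	fmt.Println(readyToScale(pool, 3, time.Minute))        // false: still cooling down
	fmt.Println(readyToScale(pool, 1, now.Sub(lastEvent))) // false: too few requests
}
```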