structs

package

Version: v0.0.2-alpha2 | Published: Jun 19, 2017 | License: MIT | Imports: 3 | Imported by: 6

Repository

github.com/elsevier-core-engineering/replicator

Documentation ¶

Index ¶

type AllocationResources
type ClusterCapacity
type ClusterScaling
- func (c *ClusterScaling) Merge(b *ClusterScaling) *ClusterScaling
type Config
- func (c *Config) Merge(b *Config) *Config
type ConsulClient
type FailsafeMode
type GroupScalingPolicy
type JobScaling
- func (j *JobScaling) Merge(b *JobScaling) *JobScaling
type JobScalingPolicies
type JobScalingPolicy
type NodeAllocation
type NomadClient
type Notification
- func (n *Notification) Merge(b *Notification) *Notification
type Scaling
type State
type TaskAllocation
type Telemetry
- func (t *Telemetry) Merge(b *Telemetry) *Telemetry

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type AllocationResources ¶

type AllocationResources struct {
	MemoryMB      int
	CPUMHz        int
	DiskMB        int
	MemoryPercent float64
	CPUPercent    float64
	DiskPercent   float64
}

AllocationResources represents the allocation resource utilization.

type ClusterCapacity ¶ added in v0.0.2

type ClusterCapacity struct {
	// NodeCount is the number of worker nodes in a ready and non-draining state
	// across the cluster.
	NodeCount int

	// ScalingMetric indicates the most-utilized allocation resource across the
	// cluster. The most-utilized resource is prioritized when making scaling
	// decisions like identifying the least-allocated worker node.
	ScalingMetric string

	// MaxAllowedUtilization represents the max allowed cluster utilization after
	// considering node fault-tolerance and task group scaling overhead.
	MaxAllowedUtilization int

	// TotalCapacity is the total allocation capacity across
	// the cluster.
	TotalCapacity AllocationResources

	// UsedCapacity is the consumed allocation capacity across
	// the cluster.
	UsedCapacity AllocationResources

	// TaskAllocation represents the total allocation requirements of a single
	// instance (count 1) of all running jobs across the cluster. This is used to
	// proactively ensure the cluster has sufficient available capacity to scale
	// each task by +1 if an increase in capacity is required.
	TaskAllocation AllocationResources

	// NodeList is a list of all worker nodes in a known good state.
	NodeList []string

	// NodeAllocations is a slice of node allocations.
	NodeAllocations []*NodeAllocation

	// ScalingDirection is the direction (in or out) of cluster scaling required
	// after performing the proper evaluation.
	ScalingDirection string
}

ClusterCapacity is the central object used to track and evaluate cluster capacity and utilization, and it stores the data required to make scaling decisions. All data stored in this object is disposable and is generated during each evaluation.
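
The ScalingMetric field names the most-utilized resource, but this page does not show how it is chosen. The following is a minimal sketch of one way to derive it from UsedCapacity and TotalCapacity. The helper names (percent, mostUtilized) and the metric strings ("memory", "cpu", "disk") are assumptions made for illustration, not Replicator's implementation.

```go
package main

import "fmt"

// AllocationResources mirrors the subset of fields this sketch needs.
type AllocationResources struct {
	MemoryMB int
	CPUMHz   int
	DiskMB   int
}

// percent returns used as a percentage of total, or 0 when total is not
// positive.
func percent(used, total int) float64 {
	if total <= 0 {
		return 0
	}
	return float64(used) / float64(total) * 100
}

// mostUtilized is a hypothetical helper. It returns the name of the resource
// with the highest used/total percentage together with that percentage.
// The metric names are illustrative assumptions.
func mostUtilized(used, total AllocationResources) (string, float64) {
	metric, highest := "memory", percent(used.MemoryMB, total.MemoryMB)
	if p := percent(used.CPUMHz, total.CPUMHz); p > highest {
		metric, highest = "cpu", p
	}
	if p := percent(used.DiskMB, total.DiskMB); p > highest {
		metric, highest = "disk", p
	}
	return metric, highest
}

func main() {
	total := AllocationResources{MemoryMB: 16000, CPUMHz: 10000, DiskMB: 100000}
	used := AllocationResources{MemoryMB: 4000, CPUMHz: 7500, DiskMB: 20000}
	metric, pct := mostUtilized(used, total)
	fmt.Printf("%s %.1f%%\n", metric, pct) // prints: cpu 75.0%
}
```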

type ClusterScaling ¶

type ClusterScaling struct {
	// Enabled indicates whether cluster scaling actions are permitted.
	Enabled bool `mapstructure:"enabled"`

	// MaxSize is the maximum number of instances the nomad node worker count is
	// allowed to reach. This stops runaway increases in size due to misbehaviour
	// but should be set high enough to accommodate usual workload peaks.
	MaxSize int `mapstructure:"max_size"`

	// MinSize is the minimum number of instances that should be present within
	// the nomad node worker pool.
	MinSize int `mapstructure:"min_size"`

	// CoolDown is the number of seconds after a scaling activity completes before
	// another can begin.
	CoolDown float64 `mapstructure:"cool_down"`

	// NodeFaultTolerance is the number of Nomad worker nodes the cluster can
	// support losing, whilst still maintaining all existing workload.
	NodeFaultTolerance int `mapstructure:"node_fault_tolerance"`

	// AutoscalingGroup is the name of the ASG assigned to the Nomad worker nodes.
	AutoscalingGroup string `mapstructure:"autoscaling_group"`

	// RetryThreshold is the number of times Replicator will retry scale-out when
	// new nodes do not join the worker pool and reach the join timeout.
	RetryThreshold int `mapstructure:"retry_threshold"`

	// ScalingThreshold is the number of consecutive times Replicator determines
	// a cluster scaling action should occur before that request is allowed to
	// be enforced.
	ScalingThreshold int `mapstructure:"scaling_threshold"`
}

ClusterScaling is the configuration struct for the Nomad worker node scaling activities.

func (*ClusterScaling) Merge ¶ added in v0.0.2

func (c *ClusterScaling) Merge(b *ClusterScaling) *ClusterScaling

Merge is used to merge two ClusterScaling configurations together.
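
The page does not describe the merge semantics. The sketch below shows one plausible reading, assuming that non-zero values in b override the corresponding values in c and that a new struct is returned. The function name merge, the trimmed field set, and the sample values are hypothetical. Note that with this approach a boolean such as Enabled can be switched on by b but not switched off, because false is indistinguishable from "not set".

```go
package main

import "fmt"

// ClusterScaling mirrors a subset of the fields shown above.
type ClusterScaling struct {
	Enabled  bool
	MaxSize  int
	MinSize  int
	CoolDown float64
}

// merge is a hypothetical stand-in for Merge. It assumes that non-zero
// values in b override the values in c, and it returns a new struct so that
// neither input is modified.
func merge(c, b *ClusterScaling) *ClusterScaling {
	result := *c // start from a copy of the base configuration
	if b == nil {
		return &result
	}
	if b.Enabled {
		result.Enabled = true
	}
	if b.MaxSize != 0 {
		result.MaxSize = b.MaxSize
	}
	if b.MinSize != 0 {
		result.MinSize = b.MinSize
	}
	if b.CoolDown != 0 {
		result.CoolDown = b.CoolDown
	}
	return &result
}

func main() {
	defaults := &ClusterScaling{MaxSize: 10, MinSize: 5, CoolDown: 600}
	overrides := &ClusterScaling{Enabled: true, MaxSize: 20}
	merged := merge(defaults, overrides)
	fmt.Printf("%+v\n", *merged) // prints: {Enabled:true MaxSize:20 MinSize:5 CoolDown:600}
}
```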

type Config ¶

type Config struct {
	// Consul is the location of the Consul instance or cluster endpoint to query
	// (may be an IP address or FQDN) with port.
	Consul string `mapstructure:"consul"`

	// ConsulKeyLocation is the Consul key root location where Replicator stores
	// and fetches critical information from.
	ConsulKeyLocation string `mapstructure:"consul_key_location"`

	// ConsulToken is the Consul ACL token used to access KeyValues from a
	// secure Consul installation.
	ConsulToken string `mapstructure:"consul_token"`

	// Nomad is the location of the Nomad instance or cluster endpoint to query
	// (may be an IP address or FQDN) with port.
	Nomad string `mapstructure:"nomad"`

	// LogLevel is the level at which the application should log.
	LogLevel string `mapstructure:"log_level"`

	// ScalingInterval is the duration in seconds between Replicator runs and thus
	// scaling requirement checks.
	ScalingInterval int `mapstructure:"scaling_interval"`

	// Region represents the AWS region the cluster resides in.
	Region string `mapstructure:"aws_region"`

	// ClusterScaling is the configuration struct that controls the basic Nomad
	// worker node scaling.
	ClusterScaling *ClusterScaling `mapstructure:"cluster_scaling"`

	// JobScaling is the configuration struct that controls the basic Nomad
	// job scaling.
	JobScaling *JobScaling `mapstructure:"job_scaling"`

	// Telemetry is the configuration struct that controls the telemetry settings.
	Telemetry *Telemetry `mapstructure:"telemetry"`

	// Notification is the configuration struct that controls notifications.
	Notification *Notification `mapstructure:"notification"`

	// ConsulClient provides a client to interact with the Consul API.
	ConsulClient ConsulClient

	// NomadClient provides a client to interact with the Nomad API.
	NomadClient NomadClient
}

Config is the main configuration struct used to configure the replicator application.

func (*Config) Merge ¶

func (c *Config) Merge(b *Config) *Config

Merge merges two configurations.

type ConsulClient ¶

type ConsulClient interface {
	// GetJobScalingPolicies provides a list of Nomad jobs with a defined scaling
	// policy document at a specified Consul Key/Value Store location. Supports
	// the use of an ACL token if required by the Consul cluster.
	GetJobScalingPolicies(*Config, NomadClient) ([]*JobScalingPolicy, error)

	// WriteState is responsible for persistently storing state tracking
	// information in the Consul Key/Value Store.
	WriteState(*Config, *State) error

	// LoadState attempts to read state tracking information from the Consul
	// Key/Value Store. If state tracking information is present, it will be
	// preferred. If no persistent data is available, the method returns the
	// state tracking object unmodified.
	LoadState(*Config, *State) *State

	CreateSession(int, chan struct{}) (string, error)

	AcquireLeadership(string, string) bool

	ResignLeadership(string, string)
}

The ConsulClient interface is used to provide common method signatures for interacting with the Consul API.
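
The LoadState contract described above (prefer stored state, otherwise return the passed-in object unmodified) can be illustrated with an in-memory stand-in, which may also be handy in tests. This is a sketch under assumptions: memoryStore, stateKey, the "/state" key suffix, and the trimmed Config and State types are all hypothetical and do not reflect how Replicator talks to Consul.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config and State are trimmed-down local mirrors of the types on this page.
type Config struct {
	ConsulKeyLocation string
}

type State struct {
	FailsafeMode     bool `json:"failsafe_mode"`
	NodeFailureCount int  `json:"node_failure_count"`
}

// memoryStore is a hypothetical in-memory stand-in for the two state methods
// of ConsulClient. It keeps JSON documents in a map keyed by path.
type memoryStore struct {
	kv map[string][]byte
}

// stateKey builds the key used by this sketch. The "/state" suffix is an
// assumption, not the path Replicator uses.
func stateKey(c *Config) string { return c.ConsulKeyLocation + "/state" }

// WriteState stores the state as JSON under the configured key root.
func (m *memoryStore) WriteState(c *Config, s *State) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	m.kv[stateKey(c)] = data
	return nil
}

// LoadState returns the stored state when present; otherwise it returns the
// passed-in state object unmodified.
func (m *memoryStore) LoadState(c *Config, s *State) *State {
	data, ok := m.kv[stateKey(c)]
	if !ok {
		return s
	}
	stored := &State{}
	if err := json.Unmarshal(data, stored); err != nil {
		return s
	}
	return stored
}

func main() {
	store := &memoryStore{kv: map[string][]byte{}}
	cfg := &Config{ConsulKeyLocation: "replicator/config"}

	fresh := &State{}
	fmt.Println(store.LoadState(cfg, fresh) == fresh) // true: nothing stored yet

	if err := store.WriteState(cfg, &State{FailsafeMode: true, NodeFailureCount: 2}); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	loaded := store.LoadState(cfg, fresh)
	fmt.Println(loaded.FailsafeMode, loaded.NodeFailureCount) // true 2
}
```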

type FailsafeMode ¶ added in v0.0.2

type FailsafeMode struct {
	// Config stores partial configuration required to interact with Consul.
	Config *Config

	// Disable instructs the failsafe CLI command to disable failsafe mode.
	Disable bool

	// Enable instructs the failsafe CLI command to enable failsafe mode.
	Enable bool

	// Force suppresses confirmation prompts when enabling/disabling failsafe.
	Force bool

	// Verb represents the action to be displayed during confirmation prompts.
	Verb string
}

FailsafeMode is the configuration struct for administratively interacting with the distributed failsafe lock.

type GroupScalingPolicy ¶

type GroupScalingPolicy struct {
	// GroupName is the job's group name which this scaling policy represents.
	GroupName string `json:"name"`

	// Tasks holds the task resource allocation details for the group.
	Tasks TaskAllocation `json:"task_resources"`

	// ScalingMetric represents the most-utilized resource within the task group.
	ScalingMetric string

	// Scaling is the scaling policy of the group.
	Scaling *Scaling
}

GroupScalingPolicy represents the scaling policy of an individual group within a single job.

type JobScaling ¶

type JobScaling struct {
	// Enabled indicates whether job scaling actions are permitted.
	Enabled bool `mapstructure:"enabled"`
}

JobScaling is the configuration struct for the Nomad job scaling activities.

func (*JobScaling) Merge ¶ added in v0.0.2

func (j *JobScaling) Merge(b *JobScaling) *JobScaling

Merge is used to merge two JobScaling configurations together.

type JobScalingPolicies ¶

type JobScalingPolicies []*JobScalingPolicy

JobScalingPolicies is a list of JobScalingPolicy objects.

type JobScalingPolicy ¶

type JobScalingPolicy struct {
	// JobName is the name of the Nomad job represented by the Consul Key/Value.
	JobName string

	// Enabled is a boolean flag which dictates whether scaling events for the job
	// should be enforced and is used for testing purposes.
	Enabled bool `json:"enabled"`

	// GroupScalingPolicies is a list of GroupScalingPolicy objects.
	GroupScalingPolicies []*GroupScalingPolicy `json:"groups"`
}

JobScalingPolicy is a struct which represents an individual job scaling policy document.
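
The JSON tags on these types suggest how a policy document maps onto the structs. The sketch below decodes a small document with encoding/json using local mirrors of the types. The document contents, the decodePolicy helper, and the job name are hypothetical; the unexported scaleout and scalein types are omitted because their fields are not shown on this page. The untagged Scaling field is matched case-insensitively by encoding/json, so the "scaling" key used here is an assumption about the document layout.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local mirrors of the types above, limited to the fields used here.
type Scaling struct {
	Min int `json:"min"`
	Max int `json:"max"`
}

type GroupScalingPolicy struct {
	GroupName string `json:"name"`
	Scaling   *Scaling
}

type JobScalingPolicy struct {
	JobName              string
	Enabled              bool                  `json:"enabled"`
	GroupScalingPolicies []*GroupScalingPolicy `json:"groups"`
}

// decodePolicy is a hypothetical helper. It unmarshals a policy document and
// sets JobName separately, on the assumption that the name comes from the
// Consul key rather than from the document body.
func decodePolicy(jobName string, doc []byte) (*JobScalingPolicy, error) {
	policy := &JobScalingPolicy{}
	if err := json.Unmarshal(doc, policy); err != nil {
		return nil, err
	}
	policy.JobName = jobName
	return policy, nil
}

func main() {
	// Illustrative document only; not taken from the Replicator repository.
	doc := []byte(`{"enabled": true, "groups": [{"name": "web", "scaling": {"min": 2, "max": 6}}]}`)

	policy, err := decodePolicy("example-job", doc)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, g := range policy.GroupScalingPolicies {
		if g.Scaling == nil {
			continue // group without a scaling section
		}
		fmt.Println(policy.JobName, policy.Enabled, g.GroupName, g.Scaling.Min, g.Scaling.Max)
	}
	// prints: example-job true web 2 6
}
```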

type NodeAllocation ¶

type NodeAllocation struct {
	// NodeID is the unique ID of the worker node.
	NodeID string

	// NodeIP is the private IP of the worker node.
	NodeIP string

	// UsedCapacity represents the percentage of total cluster resources consumed
	// by the worker node.
	UsedCapacity AllocationResources
}

NodeAllocation describes the resource consumption of a specific worker node.

type NomadClient ¶

type NomadClient interface {
	// ClusterAllocationCapacity determines the total cluster capacity and current
	// number of worker nodes.
	ClusterAllocationCapacity(*ClusterCapacity) error

	// ClusterAssignedAllocation determines the consumed capacity across the
	// cluster and tracks the resource consumption of each worker node.
	ClusterAssignedAllocation(*ClusterCapacity) error

	// DrainNode places a worker node in drain mode to stop future allocations and
	// migrate existing allocations to other worker nodes.
	DrainNode(string) error

	// EvaluateClusterCapacity determines if a cluster scaling action is required.
	EvaluateClusterCapacity(*ClusterCapacity, *Config) (bool, error)

	// EvaluateJobScaling compares the consumed resource percentages of a Job group
	// against its scaling policy to determine whether a scaling event is required.
	EvaluateJobScaling([]*JobScalingPolicy)

	// GetAllocationStats discovers the resources consumed by a particular Nomad
	// allocation.
	GetAllocationStats(*nomad.Allocation, *GroupScalingPolicy)

	// GetJobAllocations identifies all allocations for an active job.
	GetJobAllocations([]*nomad.AllocationListStub, *GroupScalingPolicy)

	// IsJobRunning checks whether the specified jobID has any task groups
	// currently running on the cluster.
	IsJobRunning(string) bool

	// JobScale takes a scaling policy and then attempts to scale the desired job
	// to the appropriate level whilst ensuring the event will not exceed any job
	// thresholds set.
	JobScale(*JobScalingPolicy)

	// LeastAllocatedNode determines which worker pool node is consuming the
	// least amount of the cluster's most-utilized resource. If Replicator is
	// running as a Nomad job, the worker node running the Replicator leader will
	// be excluded.
	LeastAllocatedNode(*ClusterCapacity, *State) (string, string)

	// NodeReverseLookup provides a method to get the ID of the worker pool node
	// running a given allocation.
	NodeReverseLookup(string) (string, error)

	// MostUtilizedResource calculates which resource is most-utilized across the
	// cluster. The worst-case allocation resource is prioritized when making
	// scaling decisions.
	MostUtilizedResource(*ClusterCapacity)

	// TaskAllocationTotals calculates the allocations required by each running
	// job and the amount of resources required if the count of each job were
	// increased by one. This allows the cluster to proactively ensure it has
	// sufficient capacity for scaling events and to deal with potential node
	// failures.
	TaskAllocationTotals(*ClusterCapacity) error

	// VerifyNodeHealth evaluates whether a specified worker node is a healthy
	// member of the Nomad cluster.
	VerifyNodeHealth(string) bool
}

NomadClient exposes all API methods needed to interact with the Nomad API, evaluate cluster capacity and allocations and make scaling decisions.
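
As an illustration of the selection described for LeastAllocatedNode, the sketch below picks the node with the lowest consumption of a given metric while skipping a protected node. It is not the package's implementation: the helper names, the metric strings, and the reading of the two returned strings as node ID and node IP are assumptions.

```go
package main

import "fmt"

// Local mirrors of the types above, limited to the fields used here.
type AllocationResources struct {
	MemoryPercent float64
	CPUPercent    float64
	DiskPercent   float64
}

type NodeAllocation struct {
	NodeID       string
	NodeIP       string
	UsedCapacity AllocationResources
}

// usage returns the node's consumption of the given metric. The metric names
// are assumptions; anything other than "cpu" or "disk" falls back to memory.
func usage(n *NodeAllocation, metric string) float64 {
	switch metric {
	case "cpu":
		return n.UsedCapacity.CPUPercent
	case "disk":
		return n.UsedCapacity.DiskPercent
	default:
		return n.UsedCapacity.MemoryPercent
	}
}

// leastAllocated is a hypothetical helper. It returns the ID and IP of the
// node with the lowest usage of metric, skipping the protected node. It
// returns empty strings when no node is eligible.
func leastAllocated(nodes []*NodeAllocation, metric, protected string) (string, string) {
	var best *NodeAllocation
	for _, n := range nodes {
		if n.NodeID == protected {
			continue
		}
		if best == nil || usage(n, metric) < usage(best, metric) {
			best = n
		}
	}
	if best == nil {
		return "", ""
	}
	return best.NodeID, best.NodeIP
}

func main() {
	nodes := []*NodeAllocation{
		{NodeID: "node-a", NodeIP: "10.0.0.1", UsedCapacity: AllocationResources{CPUPercent: 10}},
		{NodeID: "node-b", NodeIP: "10.0.0.2", UsedCapacity: AllocationResources{CPUPercent: 30}},
		{NodeID: "node-c", NodeIP: "10.0.0.3", UsedCapacity: AllocationResources{CPUPercent: 55}},
	}
	// node-a has the lowest usage but is protected, so node-b is selected.
	id, ip := leastAllocated(nodes, "cpu", "node-a")
	fmt.Println(id, ip) // prints: node-b 10.0.0.2
}
```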

type Notification ¶ added in v0.0.2

type Notification struct {
	// ClusterScalingUID is the UID to associate with the cluster scaling alert.
	ClusterScalingUID string `mapstructure:"cluster_scaling_uid"`

	// ClusterIdentifier is a friendly name which is used when sending
	// notifications for easy human identification.
	ClusterIdentifier string `mapstructure:"cluster_identifier"`

	// PagerDutyServiceKey is the PD integration key for the Events API v1.
	PagerDutyServiceKey string `mapstructure:"pagerduty_service_key"`

	// Notifiers is where our initialized notification backends are stored so they
	// can be used on the fly when required.
	Notifiers []notifier.Notifier
}

Notification is the control struct for Replicator notifications.

func (*Notification) Merge ¶ added in v0.0.2

func (n *Notification) Merge(b *Notification) *Notification

Merge is used to merge two Notification configurations together.

type Scaling ¶

type Scaling struct {
	// Min is the minimum number of tasks the job should have running at any one
	// time.
	Min int `json:"min"`

	// Max is the maximum number of tasks the job should have running at any one
	// time.
	Max int `json:"max"`

	// ScaleDirection is populated by either out/in/none depending on the evaluation
	// of a scaling event happening.
	ScaleDirection string

	// ScaleOut is the job scaling out policy which will contain the thresholds
	// which control scaling activities.
	ScaleOut *scaleout `json:"scaleout"`

	// ScaleIn is the job scaling in policy which will contain the thresholds
	// which control scaling activities.
	ScaleIn *scalein `json:"scalein"`
}

Scaling struct represents the scaling policy of a Nomad Job Group as well as details of any scaling activities which should take place during the current daemon run.
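
To show how Min, Max and ScaleDirection might interact, here is a minimal sketch that computes the next task count for a group. The nextCount helper and the step size of one are assumptions; the direction strings follow the out/in/none values mentioned in the field comment.

```go
package main

import "fmt"

// Scaling mirrors the fields of the type above that this sketch needs.
type Scaling struct {
	Min            int
	Max            int
	ScaleDirection string
}

// nextCount is a hypothetical helper. It moves the current count by one in
// the requested direction and then clamps the result to [Min, Max].
func nextCount(current int, s *Scaling) int {
	next := current
	switch s.ScaleDirection {
	case "out":
		next = current + 1
	case "in":
		next = current - 1
	}
	if next > s.Max {
		next = s.Max
	}
	if next < s.Min {
		next = s.Min
	}
	return next
}

func main() {
	out := &Scaling{Min: 2, Max: 4, ScaleDirection: "out"}
	in := &Scaling{Min: 2, Max: 4, ScaleDirection: "in"}

	fmt.Println(nextCount(3, out)) // 4
	fmt.Println(nextCount(4, out)) // 4: already at Max
	fmt.Println(nextCount(2, in))  // 2: already at Min
}
```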

type State ¶ added in v0.0.2

type State struct {
	// ClusterScaleInRequests tracks the number of consecutive times replicator
	// has indicated the cluster worker pool should be scaled in.
	ClusterScaleInRequests int `json:"cluster_scalein_requests"`

	// ClusterScaleOutRequests tracks the number of consecutive times replicator
	// has indicated the cluster worker pool should be scaled out.
	ClusterScaleOutRequests int `json:"cluster_scaleout_requests"`

	// FailsafeMode tracks whether the daemon has exceeded the fault threshold
	// while attempting to perform scaling operations. When operating in failsafe
	// mode, the daemon will decline to take scaling actions of any type.
	FailsafeMode bool `json:"failsafe_mode"`

	// FailsafeModeAdmin tracks whether the last failsafe mode change was
	// initiated by an operator via the CLI.
	FailsafeModeAdmin bool `json:"failsafe_mode_admin"`

	// LastFailedNode allows us to track the last node which was launched which
	// failed to join the cluster.
	LastFailedNode string `json:"last_failed_node"`

	// LastNodeFailure represents the last time a new worker node was launched
	// and failed to successfully join the worker pool.
	LastNodeFailure time.Time `json:"last_node_failure"`

	// LastScalingEvent represents the last time the daemon successfully
	// completed a cluster scaling action.
	LastScalingEvent time.Time `json:"last_scaling_event"`

	// LastUpdated tracks the last time the state tracking data was updated.
	LastUpdated time.Time `json:"last_updated"`

	// NodeFailureCount tracks the number of worker nodes that have failed to
	// successfully join the worker pool after a scale-out operation.
	NodeFailureCount int `json:"node_failure_count"`

	// ProtectedNode represents the Nomad agent node on which the Replicator
	// leader is running. This node will be excluded when identifying an eligible
	// node for termination during scaling actions.
	ProtectedNode string `json:"protected_node"`
}

State is the central object for managing and storing all cluster scaling state information.

type TaskAllocation ¶

type TaskAllocation struct {
	// TaskName is the name given to the task within the job specification.
	TaskName string

	// Resources tracks the resource requirements defined in the job spec and the
	// real-time utilization of those resources.
	Resources AllocationResources
}

TaskAllocation describes the resource requirements defined in the job specification.

type Telemetry ¶

type Telemetry struct {
	// StatsdAddress specifies the address of a statsd server to forward metrics
	// to and should include the port.
	StatsdAddress string `mapstructure:"statsd_address"`
}

Telemetry is the struct that controls the telemetry configuration. If a value is present then telemetry is enabled. Currently, statsd is the only supported telemetry backend.

func (*Telemetry) Merge ¶ added in v0.0.2

func (t *Telemetry) Merge(b *Telemetry) *Telemetry

Merge is used to merge two Telemetry configurations together.
