failovermanager

package
Version: v1.4.1
Published: Jun 27, 2026 License: Apache-2.0 Imports: 25 Imported by: 0

Documentation

Index

Constants

const (
	// FailoverWorkflowV2TypeName is the registered workflow type for FailoverWorkflowV2.
	FailoverWorkflowV2TypeName = "cadence-sys-failover-v2-workflow"
	// FailoverWorkflowV2ID is the fixed workflow ID, reused so only one run is active at a time.
	FailoverWorkflowV2ID = "cadence-failover-v2"
)
const (
	// RebalanceWorkflowV2TypeName is the registered workflow type for RebalanceWorkflowV2.
	RebalanceWorkflowV2TypeName = "cadence-sys-rebalance-v2-workflow"
	// RebalanceWorkflowV2ID is the fixed workflow ID, reused so only one run is active at a time.
	RebalanceWorkflowV2ID = "cadence-rebalance-v2"
)
const (

	// TaskListName is the name of the failover manager task list
	TaskListName = "cadence-sys-failoverManager-tasklist"
	// FailoverWorkflowTypeName is the failover workflow type name
	FailoverWorkflowTypeName = "cadence-sys-failoverManager-workflow"
	// RebalanceWorkflowTypeName is the rebalance workflow type name
	RebalanceWorkflowTypeName = "cadence-sys-rebalance-workflow"
	// The workflow IDs below are reused to ensure that only one workflow of each kind runs at a time
	FailoverWorkflowID  = "cadence-failover-manager"
	RebalanceWorkflowID = "cadence-rebalance-workflow"
	DrillWorkflowID     = FailoverWorkflowID + "-drill"

	// QueryType is the query type for the failover workflow
	QueryType = "state"
	// PauseSignal is the signal name for pausing the workflow
	PauseSignal = "pause"
	// ResumeSignal is the signal name for resuming the workflow
	ResumeSignal = "resume"

	// WorkflowInitialized state
	WorkflowInitialized = "initialized"
	// WorkflowRunning state
	WorkflowRunning = "running"
	// WorkflowPaused state
	WorkflowPaused = "paused"
	// WorkflowCompleted state
	WorkflowCompleted = "complete"
	// WorkflowAborted state
	WorkflowAborted = "aborted"
)
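
The constants above can be exercised in isolation. The following is a minimal sketch that copies a few of them locally so it compiles on its own. `isTerminalState` is a hypothetical helper, not part of the package, and it assumes that only the "complete" and "aborted" states mean the workflow has finished.

```go
package main

import "fmt"

// Local copies of the documented constants, so the sketch is self-contained.
const (
	FailoverWorkflowID = "cadence-failover-manager"
	DrillWorkflowID    = FailoverWorkflowID + "-drill"

	WorkflowInitialized = "initialized"
	WorkflowRunning     = "running"
	WorkflowPaused      = "paused"
	WorkflowCompleted   = "complete"
	WorkflowAborted     = "aborted"
)

// isTerminalState is a hypothetical helper (not part of the package). It
// assumes that only "complete" and "aborted" mean the workflow has finished.
func isTerminalState(state string) bool {
	return state == WorkflowCompleted || state == WorkflowAborted
}

func main() {
	// The drill workflow ID is derived from the failover workflow ID.
	fmt.Println(DrillWorkflowID) // cadence-failover-manager-drill

	states := []string{
		WorkflowInitialized, WorkflowRunning, WorkflowPaused,
		WorkflowCompleted, WorkflowAborted,
	}
	for _, s := range states {
		fmt.Printf("%s terminal=%t\n", s, isTerminalState(s))
	}
}
```

Note that the value of WorkflowCompleted is "complete", not "completed", so callers comparing a queried state against a hand-written string should use the constant.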

Variables

This section is empty.

Functions

func GetDomainsActivity

func GetDomainsActivity(ctx context.Context, params *GetDomainsActivityParams) ([]string, error)

GetDomainsActivity is the activity definition that returns the names of the domains selected by GetDomainsActivityParams.

Types

type BootstrapParams

type BootstrapParams struct {
	// Config contains the configuration for the failover manager
	Config Config
	// ServiceClient is an instance of cadence service client
	ServiceClient workflowserviceclient.Interface
	// MetricsClient is an instance of metrics object for emitting stats
	MetricsClient metrics.Client
	Logger        log.Logger
	// TallyScope is an instance of tally metrics scope
	TallyScope tally.Scope
	// ClientBean is an instance of client.Bean for a collection of clients
	ClientBean client.Bean
}

BootstrapParams contains the set of params needed to bootstrap failover manager

type ClusterAttributePreference added in v1.4.1

type ClusterAttributePreference struct {
	Scope            string `json:"scope"`
	Name             string `json:"name"`
	PreferredCluster string `json:"preferredCluster"`
}

ClusterAttributePreference declares the preferred active cluster for a single scope:name cluster attribute pair. It is the stored form in domain data (constants.DomainDataKeyForClusterAttributePreferences) and the unit of an attribute-level update carried in DomainFailoverPreferences.
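
To illustrate the stored form, here is a minimal sketch that mirrors the struct and its JSON tags locally and serializes one preference. Encoding the preferences as a JSON array is an assumption of this sketch; the exact layout under the domain data key is not stated above. `encodePreferences` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterAttributePreference mirrors the documented struct and its JSON tags.
type ClusterAttributePreference struct {
	Scope            string `json:"scope"`
	Name             string `json:"name"`
	PreferredCluster string `json:"preferredCluster"`
}

// encodePreferences is a hypothetical helper that serializes a list of
// preferences to a JSON string. The array layout is an assumption.
func encodePreferences(prefs []ClusterAttributePreference) (string, error) {
	b, err := json.Marshal(prefs)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	s, err := encodePreferences([]ClusterAttributePreference{
		{Scope: "region", Name: "us-west1", PreferredCluster: "prod-us-west1"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	// [{"scope":"region","name":"us-west1","preferredCluster":"prod-us-west1"}]
}
```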

type ClusterAttributeRebalanceMap added in v1.4.1

type ClusterAttributeRebalanceMap map[clusterAttributeScope]ClusterAttributeToClusterMap

ClusterAttributeRebalanceMap defines a mapping of cluster attribute to its preferred cluster. This allows Active-Active domains to be rebalanced to the preferred cluster for each attribute, without using the domain level preferred cluster. Example: {"region": {"us-west1": "prod-us-west1", "us-east1": "prod-us-east1"}}

type ClusterAttributeToClusterMap added in v1.4.1

type ClusterAttributeToClusterMap map[clusterAttributeName]string

ClusterAttributeToClusterMap specifies which ClusterAttribute names map to which cadence cluster
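
The example mapping given for ClusterAttributeRebalanceMap can be decoded and queried as sketched below. The key types are unexported in the package, so this sketch assumes they are string-based. `parseRebalanceMap` and `preferredCluster` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The real key types are unexported; this sketch assumes they are strings.
type (
	clusterAttributeScope string
	clusterAttributeName  string
)

type ClusterAttributeToClusterMap map[clusterAttributeName]string
type ClusterAttributeRebalanceMap map[clusterAttributeScope]ClusterAttributeToClusterMap

// parseRebalanceMap decodes the JSON form shown in the type documentation.
func parseRebalanceMap(raw string) (ClusterAttributeRebalanceMap, error) {
	var m ClusterAttributeRebalanceMap
	err := json.Unmarshal([]byte(raw), &m)
	return m, err
}

// preferredCluster returns the preferred cluster for a scope:name pair and
// reports whether the map holds an entry for it.
func preferredCluster(m ClusterAttributeRebalanceMap, scope, name string) (string, bool) {
	// Indexing a missing scope yields a nil inner map, which is safe to read.
	c, ok := m[clusterAttributeScope(scope)][clusterAttributeName(name)]
	return c, ok
}

func main() {
	raw := `{"region": {"us-west1": "prod-us-west1", "us-east1": "prod-us-east1"}}`
	m, err := parseRebalanceMap(raw)
	if err != nil {
		panic(err)
	}
	cluster, ok := preferredCluster(m, "region", "us-east1")
	fmt.Println(cluster, ok) // prod-us-east1 true
}
```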

type Config

type Config struct {
	AdminOperationToken dynamicproperties.StringPropertyFn
	// ClusterMetadata contains the metadata for this cluster
	ClusterMetadata cluster.Metadata
}

Config defines the configuration for failover

type DomainFailoverFailure added in v1.4.1

type DomainFailoverFailure struct {
	DomainName string
	Error      string
}

DomainFailoverFailure records a domain that failed to fail over, along with the error that caused it. Error is a string (not error) so it survives JSON serialization across the activity boundary and through the workflow query/result.
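
The reason for the string field can be seen with the standard encoding/json package. In this minimal sketch, `lossyFailure` is a hypothetical counter-example that stores an error value, and `newFailure`, `mustJSON`, and `demo` are illustrative helpers rather than package functions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// DomainFailoverFailure mirrors the documented struct.
type DomainFailoverFailure struct {
	DomainName string
	Error      string
}

// lossyFailure is a hypothetical counter-example that keeps the error value.
type lossyFailure struct {
	DomainName string
	Error      error
}

// newFailure converts the error to its message at the point of failure.
func newFailure(domain string, err error) DomainFailoverFailure {
	return DomainFailoverFailure{DomainName: domain, Error: err.Error()}
}

// mustJSON marshals v and panics on failure, which is enough for a sketch.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// demo returns the JSON for the string-based and the error-based variants.
func demo() (kept, lost string) {
	cause := errors.New("update domain timed out")
	kept = mustJSON(newFailure("orders", cause))
	lost = mustJSON(lossyFailure{DomainName: "orders", Error: cause})
	return kept, lost
}

func main() {
	kept, lost := demo()
	fmt.Println(kept) // {"DomainName":"orders","Error":"update domain timed out"}
	fmt.Println(lost) // {"DomainName":"orders","Error":{}}
}
```

The error created by errors.New has no exported fields, so encoding/json writes it as an empty object and the message is lost.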

type DomainFailoverPreferences added in v1.4.1

type DomainFailoverPreferences struct {
	// DomainName identifies the domain this instruction applies to.
	DomainName string
	// TargetCluster, when non-empty, is the destination cluster set as the domain-level
	// ActiveClusterName. During a failover it is the cluster being failed over to; during a
	// rebalance it is the domain's stored preferred cluster.
	TargetCluster string
	// ClusterAttributeUpdates lists all ClusterAttribute level changes.
	ClusterAttributeUpdates []ClusterAttributePreference
}

DomainFailoverPreferences carries the failover instruction for a single domain. It defines all the changes to be applied to the domain.

func GetDomainsForRebalanceV2Activity added in v1.4.1

func GetDomainsForRebalanceV2Activity(ctx context.Context) ([]DomainFailoverPreferences, error)

GetDomainsForRebalanceV2Activity collects the domains that have drifted from their stored preferences. A domain is included when its domain-level active cluster differs from PreferredCluster, or one or more cluster attributes differ from the ClusterAttributePreferences stored in domain data.
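
The inclusion rule can be sketched as a pure function. This is an illustration under stated assumptions, not the activity's implementation: `domainState` is a hypothetical, simplified view of a domain, `driftFor` is a hypothetical helper, and an attribute that is missing from the live configuration is treated as drifted.

```go
package main

import "fmt"

// Simplified local mirrors of the documented types.
type ClusterAttributePreference struct {
	Scope            string
	Name             string
	PreferredCluster string
}

type DomainFailoverPreferences struct {
	DomainName              string
	TargetCluster           string
	ClusterAttributeUpdates []ClusterAttributePreference
}

// domainState is hypothetical: a domain's live configuration next to the
// preferences stored in its domain data.
type domainState struct {
	Name             string
	ActiveCluster    string
	PreferredCluster string
	// LiveAttributes maps "scope:name" to the cluster currently active for it.
	LiveAttributes map[string]string
	Stored         []ClusterAttributePreference
}

// driftFor returns the changes needed to bring d back to its preferences,
// and false when nothing has drifted.
func driftFor(d domainState) (DomainFailoverPreferences, bool) {
	p := DomainFailoverPreferences{DomainName: d.Name}
	if d.PreferredCluster != "" && d.ActiveCluster != d.PreferredCluster {
		p.TargetCluster = d.PreferredCluster
	}
	for _, want := range d.Stored {
		// Assumption: a missing live entry counts as drift.
		if d.LiveAttributes[want.Scope+":"+want.Name] != want.PreferredCluster {
			p.ClusterAttributeUpdates = append(p.ClusterAttributeUpdates, want)
		}
	}
	return p, p.TargetCluster != "" || len(p.ClusterAttributeUpdates) > 0
}

func main() {
	p, drifted := driftFor(domainState{
		Name:             "orders",
		ActiveCluster:    "prod-us-east1",
		PreferredCluster: "prod-us-west1",
	})
	fmt.Println(drifted, p.TargetCluster) // true prod-us-west1
}
```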

type DomainFailoverSuccess added in v1.4.1

type DomainFailoverSuccess struct {
	DomainName string
}

DomainFailoverSuccess records a domain that was successfully failed over.

type DomainRebalanceData added in v0.22.0

type DomainRebalanceData struct {
	DomainName       string
	PreferredCluster string
}

DomainRebalanceData is the per-domain result returned by GetDomainsForRebalanceActivity.

func GetDomainsForRebalanceActivity added in v0.22.0

func GetDomainsForRebalanceActivity(ctx context.Context) ([]*DomainRebalanceData, error)

GetDomainsForRebalanceActivity is the activity that fetches the domains to rebalance.

type DomainSnapshot added in v1.4.1

type DomainSnapshot struct {
	// DomainName identifies the domain.
	DomainName string
	// PreviousActiveCluster is the domain-level ActiveClusterName before failover; empty when
	// the domain-level active cluster was not changed.
	PreviousActiveCluster string
	// PreviousClusterAttributes lists each changed attribute with the cluster it was on before
	// failover, suitable for replaying through FailoverActivityV2 to restore.
	PreviousClusterAttributes []ClusterAttributePreference
}

DomainSnapshot records a single domain's pre-failover state so a later restore can put it back exactly. Only the parts that were changed are populated.
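
Because a snapshot holds the previous values in the same shape as a failover instruction, turning one into a restore instruction is a field-by-field copy. The sketch below uses simplified local mirrors of the types. `restorePreferences` is a hypothetical helper; no restore workflow is documented here.

```go
package main

import "fmt"

// Simplified local mirrors of the documented types.
type ClusterAttributePreference struct {
	Scope            string
	Name             string
	PreferredCluster string
}

type DomainFailoverPreferences struct {
	DomainName              string
	TargetCluster           string
	ClusterAttributeUpdates []ClusterAttributePreference
}

type DomainSnapshot struct {
	DomainName                string
	PreviousActiveCluster     string
	PreviousClusterAttributes []ClusterAttributePreference
}

// restorePreferences is a hypothetical helper that turns a snapshot into the
// instruction that would undo the recorded failover. An empty
// PreviousActiveCluster yields an empty TargetCluster, so the domain-level
// active cluster is left alone.
func restorePreferences(s DomainSnapshot) DomainFailoverPreferences {
	return DomainFailoverPreferences{
		DomainName:              s.DomainName,
		TargetCluster:           s.PreviousActiveCluster,
		ClusterAttributeUpdates: s.PreviousClusterAttributes,
	}
}

func main() {
	snap := DomainSnapshot{
		DomainName: "orders",
		PreviousClusterAttributes: []ClusterAttributePreference{
			{Scope: "region", Name: "us-west1", PreferredCluster: "prod-us-west1"},
		},
	}
	p := restorePreferences(snap)
	fmt.Printf("%q %d\n", p.TargetCluster, len(p.ClusterAttributeUpdates)) // "" 1
}
```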

type FailoverActivityParams

type FailoverActivityParams struct {
	Domains                          []string
	TargetCluster                    string
	GracefulFailoverTimeoutInSeconds *int32
	// ClusterAttributes specifies which attributes to fail over to the TargetCluster across the environment.
	// All active-active domains that have matching attributes (e.g., scope:name) will be failed over.
	// Optional.
	ClusterAttributes []types.ClusterAttribute
}

FailoverActivityParams contains the parameters for FailoverActivity.

type FailoverActivityResult

type FailoverActivityResult struct {
	SuccessDomains []string
	FailedDomains  []string
}

FailoverActivityResult is the result of FailoverActivity.

func FailoverActivity

func FailoverActivity(ctx context.Context, params *FailoverActivityParams) (*FailoverActivityResult, error)

FailoverActivity is the activity definition that fails over the given domains to TargetCluster.

type FailoverActivityV2Params added in v1.4.1

type FailoverActivityV2Params struct {
	DomainPreferences []DomainFailoverPreferences
}

FailoverActivityV2Params is the arg for the shared FailoverActivityV2.

type FailoverActivityV2Result added in v1.4.1

type FailoverActivityV2Result struct {
	SuccessDomains []DomainFailoverSuccess
	FailedDomains  []DomainFailoverFailure
}

FailoverActivityV2Result is the result of the shared FailoverActivityV2.

func FailoverActivityV2 added in v1.4.1

func FailoverActivityV2(ctx context.Context, params *FailoverActivityV2Params) (*FailoverActivityV2Result, error)

FailoverActivityV2 is the single apply activity shared by FailoverWorkflowV2 and RebalanceWorkflowV2. It applies each DomainFailoverPreferences entry via FailoverDomain.

type FailoverManager

type FailoverManager struct {
	// contains filtered or unexported fields
}

FailoverManager is the failover manager of the cadence worker service.

func New

func New(params *BootstrapParams) *FailoverManager

New returns a new instance of FailoverManager

func (*FailoverManager) Start

func (s *FailoverManager) Start() error

Start starts the worker

func (*FailoverManager) Stop

func (s *FailoverManager) Stop()

Stop stops the worker

type FailoverParams

type FailoverParams struct {
	// TargetCluster is the destination of failover
	TargetCluster string
	// SourceCluster is the cluster in which the domains are active before failover
	SourceCluster string
	// BatchFailoverSize is the number of domains to fail over in one batch
	BatchFailoverSize int
	// BatchFailoverWaitTimeInSeconds is the wait time between batches
	BatchFailoverWaitTimeInSeconds int
	// Domains lists the candidate domains to fail over
	Domains []string
	// DrillWaitTime defines the wait time of a failover drill
	DrillWaitTime time.Duration
	// GracefulFailoverTimeoutInSeconds is the graceful failover timeout, in seconds
	GracefulFailoverTimeoutInSeconds *int32
	// ClusterAttributeRebalanceMap defines a mapping of cluster attribute to its preferred cluster.
	// Optional; carried as metadata for the active-active rebalance path.
	ClusterAttributeRebalanceMap ClusterAttributeRebalanceMap `json:",omitempty"`
	// ClusterAttributes, when non-empty, triggers failover for active-active domains.
	// Each listed scope+name pair is moved to TargetCluster via UpdateDomain.ActiveClusters instead of ActiveClusterName.
	// GetDomainsActivity also switches to active-active mode when this is set.
	ClusterAttributes []types.ClusterAttribute `json:",omitempty"`
}

FailoverParams is the arg for FailoverWorkflow.

type FailoverResult

type FailoverResult struct {
	SuccessDomains      []string
	FailedDomains       []string
	SuccessResetDomains []string
	FailedResetDomains  []string
}

FailoverResult is the result of FailoverWorkflow.

func FailoverWorkflow

func FailoverWorkflow(ctx workflow.Context, params *FailoverParams) (*FailoverResult, error)

FailoverWorkflow is the workflow that fails over all domains with IsManagedByCadence=true.

type FailoverV2Params added in v1.4.1

type FailoverV2Params struct {
	// SourceClusters are the clusters being evacuated; only domains active in one of these are moved.
	SourceClusters []string
	// TargetCluster is where evacuated domains and attributes are moved to.
	TargetCluster string
	// BatchSize is the number of domains failed over per batch.
	BatchSize int
	// WaitBetweenBatchSeconds is the pause between successive batches.
	WaitBetweenBatchSeconds int
	// Domains optionally restricts the run to a specific subset of domain names.
	Domains []string
	// ClusterAttributes specifies which cluster attributes should be included for failover.
	// If empty, cluster attributes are not included.
	ClusterAttributes []types.ClusterAttribute
}

FailoverV2Params is the arg for FailoverWorkflowV2.

type FailoverV2Result added in v1.4.1

type FailoverV2Result struct {
	SuccessDomains []DomainFailoverSuccess
	FailedDomains  []DomainFailoverFailure
	// Snapshots holds the pre-failover state of every moved domain. It is queryable while the
	// workflow runs and readable from history afterwards, and is the input a future restore
	// workflow would replay.
	Snapshots []DomainSnapshot
}

FailoverV2Result is the result of FailoverWorkflowV2.

func FailoverWorkflowV2 added in v1.4.1

func FailoverWorkflowV2(ctx workflow.Context, params *FailoverV2Params) (*FailoverV2Result, error)

FailoverWorkflowV2 fails all managed domains out of SourceClusters and onto TargetCluster, in batches, with pause/resume support. It is N-region safe: domains not currently active in one of SourceClusters are left untouched. It records a per-domain snapshot of everything it changes so the failover can be reversed later.
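
The batching described here amounts to cutting the list of domains into slices of at most BatchSize entries and pausing between them. The following minimal sketch shows only the splitting step. `batches` is a hypothetical helper, and treating a non-positive size as a single batch is an assumption rather than documented behavior.

```go
package main

import "fmt"

// batches splits items into consecutive slices of at most size elements.
// A non-positive size is treated as one batch containing everything, which is
// an assumption of this sketch.
func batches(items []string, size int) [][]string {
	if size <= 0 {
		size = len(items)
	}
	var out [][]string
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	domains := []string{"a", "b", "c", "d", "e"}
	// With BatchSize 2, five domains are processed in three batches. A workflow
	// would wait WaitBetweenBatchSeconds between consecutive batches.
	fmt.Println(batches(domains, 2)) // [[a b] [c d] [e]]
}
```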

type GetDomainsActivityParams

type GetDomainsActivityParams struct {
	TargetCluster string
	SourceCluster string
	Domains       []string
	// ClusterAttributes specifies which cluster attributes to match for active-active failover.
	// All active-active domains that have matching attributes (e.g., scope:name) will be returned.
	// Optional.
	ClusterAttributes []types.ClusterAttribute
}

GetDomainsActivityParams contains the parameters for GetDomainsActivity.

type GetDomainsForFailoverV2Params added in v1.4.1

type GetDomainsForFailoverV2Params struct {
	// SourceClusters are the clusters being evacuated; only domains active in one of these are moved.
	SourceClusters []string
	// TargetCluster will be used as the new cluster for all domains that are failed over.
	TargetCluster string
	// Domains optionally restricts the run to a specific subset of domain names.
	Domains []string
	// ClusterAttributes specifies which cluster attributes should be included for failover.
	// If empty, cluster attributes are not included.
	ClusterAttributes []types.ClusterAttribute
}

GetDomainsForFailoverV2Params is the arg for GetDomainsForFailoverV2Activity.

type GetDomainsForFailoverV2Result added in v1.4.1

type GetDomainsForFailoverV2Result struct {
	Preferences []DomainFailoverPreferences
	Snapshots   []DomainSnapshot
}

GetDomainsForFailoverV2Result is what GetDomainsForFailoverV2Activity returns: the per-domain preferences to apply plus the snapshots of what is being changed.

func GetDomainsForFailoverV2Activity added in v1.4.1

func GetDomainsForFailoverV2Activity(ctx context.Context, params *GetDomainsForFailoverV2Params) (*GetDomainsForFailoverV2Result, error)

GetDomainsForFailoverV2Activity collects the domains to fail out of SourceClusters. For each managed domain, any domain-level active cluster or cluster attribute currently on one of SourceClusters is marked to move to TargetCluster; a snapshot of the prior values is recorded for restore. Domains not active in any of SourceClusters are skipped, keeping the operation N-region safe.
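
The selection rule can be sketched for a single domain as follows. This is an illustration under stated assumptions, not the activity's implementation: `liveDomain`, `liveAttribute`, and `planFailover` are hypothetical, and the sketch ignores the Domains and ClusterAttributes filters as well as the managed-domain check.

```go
package main

import "fmt"

// Simplified local mirrors of the documented types.
type ClusterAttributePreference struct{ Scope, Name, PreferredCluster string }

type DomainFailoverPreferences struct {
	DomainName              string
	TargetCluster           string
	ClusterAttributeUpdates []ClusterAttributePreference
}

type DomainSnapshot struct {
	DomainName                string
	PreviousActiveCluster     string
	PreviousClusterAttributes []ClusterAttributePreference
}

// liveAttribute and liveDomain are hypothetical views of the current state.
type liveAttribute struct{ Scope, Name, ActiveCluster string }

type liveDomain struct {
	Name          string
	ActiveCluster string
	Attributes    []liveAttribute
}

// planFailover marks everything currently on one of sources to move to target
// and snapshots the prior values. It returns false for a domain that is not
// active in any source cluster, so such domains are left untouched.
func planFailover(d liveDomain, sources []string, target string) (DomainFailoverPreferences, DomainSnapshot, bool) {
	inSource := func(cluster string) bool {
		for _, s := range sources {
			if s == cluster {
				return true
			}
		}
		return false
	}
	p := DomainFailoverPreferences{DomainName: d.Name}
	snap := DomainSnapshot{DomainName: d.Name}
	if inSource(d.ActiveCluster) {
		p.TargetCluster = target
		snap.PreviousActiveCluster = d.ActiveCluster
	}
	for _, a := range d.Attributes {
		if !inSource(a.ActiveCluster) {
			continue
		}
		p.ClusterAttributeUpdates = append(p.ClusterAttributeUpdates,
			ClusterAttributePreference{Scope: a.Scope, Name: a.Name, PreferredCluster: target})
		snap.PreviousClusterAttributes = append(snap.PreviousClusterAttributes,
			ClusterAttributePreference{Scope: a.Scope, Name: a.Name, PreferredCluster: a.ActiveCluster})
	}
	return p, snap, p.TargetCluster != "" || len(p.ClusterAttributeUpdates) > 0
}

func main() {
	d := liveDomain{Name: "orders", ActiveCluster: "cluster-c"}
	_, _, moved := planFailover(d, []string{"cluster-a"}, "cluster-b")
	fmt.Println(moved) // false
}
```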

type QueryResult

type QueryResult struct {
	TotalDomains        int
	Success             int
	Failed              int
	State               string
	TargetCluster       string
	SourceCluster       string
	SuccessDomains      []string // SuccessDomains are guaranteed to have been processed successfully
	FailedDomains       []string // FailedDomains may contain false positives
	SuccessResetDomains []string // SuccessResetDomains are domains successfully reset in drill mode
	FailedResetDomains  []string // FailedResetDomains may contain false positives in drill mode
	Operator            string
}

QueryResult is the query result that reports failover progress.

type RebalanceParams added in v0.22.0

type RebalanceParams struct {
	// BatchFailoverSize is the number of domains to fail over in one batch
	BatchFailoverSize int
	// BatchFailoverWaitTimeInSeconds is the wait time between batches
	BatchFailoverWaitTimeInSeconds int
}

RebalanceParams contains parameters for rebalance workflow

type RebalanceResult added in v0.22.0

type RebalanceResult struct {
	SuccessDomains []string
	FailedDomains  []string
}

RebalanceResult contains the result from the rebalance workflow

func RebalanceWorkflow added in v0.22.0

func RebalanceWorkflow(ctx workflow.Context, params *RebalanceParams) (*RebalanceResult, error)

RebalanceWorkflow rebalances domains across clusters according to the rebalance policy.

type RebalanceV2Params added in v1.4.1

type RebalanceV2Params struct {
	// BatchSize is the number of domains rebalanced per batch.
	BatchSize int
	// WaitBetweenBatchSeconds is the pause between successive batches.
	WaitBetweenBatchSeconds int
}

RebalanceV2Params is the arg for RebalanceWorkflowV2.

type RebalanceV2Result added in v1.4.1

type RebalanceV2Result struct {
	SuccessDomains []DomainFailoverSuccess
	FailedDomains  []DomainFailoverFailure
}

RebalanceV2Result is the result of RebalanceWorkflowV2.

func RebalanceWorkflowV2 added in v1.4.1

func RebalanceWorkflowV2(ctx workflow.Context, params *RebalanceV2Params) (*RebalanceV2Result, error)

RebalanceWorkflowV2 corrects every managed domain whose live configuration has drifted from its stored preferences (the domain-level PreferredCluster, the per-attribute ClusterAttributePreferences, or both), moving each back to its preferred cluster in batches, with pause/resume support. It returns the lists of successfully and unsuccessfully rebalanced domains.
