aws

package
v1.6.5 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 27, 2026 License: Apache-2.0 Imports: 29 Imported by: 0

Documentation


Constants

const (
	AWSTagNameASG            = "aws:autoscaling:groupName"
	AWSTagNameNeuwerkLeader  = "neuwerk:leader"
	AWSTagNameNeuwerkVIP     = "neuwerk:vip"
	AWSTagNameNeuwerkCluster = "neuwerk:cluster"
	AWSTagEgress             = "neuwerk:egress"

	ManagementDeviceDescription = "management"
	IngressDeviceDescription    = "ingress"
	EgressDeviceDescription     = "egress"
)
const (
	// GENEVEPort is the UDP port used by AWS GWLB for GENEVE encapsulation
	GENEVEPort = 6081
)

Variables

var ErrNoPriorityAssigned = errors.New("no priority set")

Functions

func Apply

func Apply(ctx context.Context, ctrlConfig *controller.ControllerConfig) error

Apply orchestrates AWS GWLB integration on startup. It discovers the GWLB target group, registers with it, detects the GENEVE tunnel interface, updates controller configuration, and configures peer discovery.

Steps:

  1. Query instance metadata via IMDSv2
  2. Discover GWLB target group via tag-based search
  3. Register with target group
  4. Find GENEVE tunnel interface (created after registration)
  5. Update ctrlConfig.IngressDevice to tunnel interface
  6. Set ctrlConfig.TunnelMode to "geneve"
  7. Configure peer discovery via AWS discovery with VPC filter

Returns error if any step fails (startup should abort).

func ApplyLegacy

func ApplyLegacy(ctx context.Context, ctrlConfig *controller.ControllerConfig) error

ApplyLegacy is the legacy route-table based integration for AWS. This is kept for backward compatibility but is deprecated in favor of GWLB integration (coordinator.Apply).

The integration mechanism does the following:

  1. Discover peers by listing EC2 instances matching well-known tags
  2. If this is a leader: modify the ingress subnet route table and point the default route

Note: please refer to tf/ directory to see how it is supposed to be set up.

func Cleanup

func Cleanup(ctx context.Context) error

Cleanup deregisters from the GWLB target group on shutdown. Called by SIGTERM handler in cmd/root.go SetupSignalHandler.

It sets the ShuttingDown flag (health endpoint returns 503 immediately), then calls DeregisterWithDraining which waits for AWS to confirm draining state and completes the 60-second drain period.

Returns an error if deregistration fails. Callers may treat the error as non-fatal, since a forced shutdown is acceptable.
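The effect of the ShuttingDown flag on the health endpoint can be sketched with the standard library alone. The names `shuttingDown`, `healthHandler`, and the `/healthz` path are assumptions made for illustration; the package's actual flag and handler are not shown in this documentation.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// shuttingDown is a hypothetical stand-in for the package's ShuttingDown flag.
var shuttingDown atomic.Bool

// healthHandler reports 200 while serving and 503 once shutdown has begun,
// so health checks start failing before deregistration completes.
func healthHandler(w http.ResponseWriter, _ *http.Request) {
	if shuttingDown.Load() {
		http.Error(w, "shutting down", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe issues an in-memory request against the handler and returns the status code.
func probe() int {
	rec := httptest.NewRecorder()
	healthHandler(rec, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	return rec.Code
}

func main() {
	fmt.Println("before shutdown:", probe())
	shuttingDown.Store(true) // what a cleanup routine would do first
	fmt.Println("after shutdown:", probe())
}
```

An atomic flag lets the signal handler flip the state without taking a lock on the request path.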

func CountHealthyTargets

func CountHealthyTargets(ctx context.Context, targetGroupARN string) (int, error)

CountHealthyTargets returns the number of healthy targets in the target group. Healthy means TargetHealth.State == "healthy" (passed health checks).

func DeregisterWithDraining

func DeregisterWithDraining(ctx context.Context, cfg aws.Config, gwlbConfig *GWLBConfig) error

DeregisterWithDraining deregisters this instance from the GWLB target group and waits for AWS to confirm draining state before starting the 60-second drain period. This ensures graceful connection draining per CONTEXT.md.

Deregistration flow per RESEARCH.md Pattern 4 and CONTEXT.md requirements:

  1. Initiate deregistration via DeregisterTargets API
  2. Wait for AWS to confirm target is in "draining" state (30s timeout)
  3. After draining state confirmed, wait 60 seconds for drain period
  4. If parent context cancelled (second SIGTERM), interrupt drain and return error

Draining state confirmation per CONTEXT.md:

  • Poll DescribeTargetHealth every 2 seconds to check target state
  • States: "draining" (proceed to drain period), "unused" (already deregistered - return success)
  • If 30s timeout expires: log warning "timeout waiting for draining state - forcing shutdown", return error
  • Rationale: "Wait for AWS to confirm target is in 'draining' state before starting drain timer"

Drain period per CONTEXT.md:

  • 60 second wait after draining state confirmed (matches AWS deregistration_delay.timeout_seconds default)
  • If parent context cancelled during drain: log warning "drain period interrupted", return error
  • This enables second SIGTERM to force immediate shutdown per CONTEXT.md requirement

Parameters:

  • ctx: context for API calls and drain period (parent context cancellation forces shutdown)
  • cfg: AWS SDK config (already loaded with credentials)
  • gwlbConfig: GWLB configuration from DiscoverTargetGroup

Returns:

  • error: deregistration failures, draining state timeout, or drain period interruption

func DescribeInstanceState

func DescribeInstanceState(ctx context.Context, instanceID string) (string, error)

DescribeInstanceState returns the EC2 instance state. Returns state string ("running", "pending", "stopped", etc.)

func ExecuteRemoteCommand

func ExecuteRemoteCommand(ctx context.Context, instanceID string, command string, timeout time.Duration) (string, error)

ExecuteRemoteCommand executes a shell command on an EC2 instance via AWS Systems Manager (SSM) Run Command and returns the command output (stdout) or an error.

Uses SSM SendCommand with AWS-RunShellScript document, which executes bash commands on the target instance. Waits for command completion with configurable timeout.

Example:

output, err := ExecuteRemoteCommand(ctx, "i-1234567890abcdef0", "curl -s http://example.com", 30*time.Second)

func FindGENEVETunnelInterface

func FindGENEVETunnelInterface() (string, error)

FindGENEVETunnelInterface identifies the network interface receiving AWS GWLB GENEVE traffic. Returns the interface name (e.g., "eth1"), or an error if none is found within the 60s retry window.

AWS GWLB creates the GENEVE tunnel after the target is registered and passes its health check. Timing depends on GWLB infrastructure setup and may vary by deployment.

func GetAllNodeIPs

func GetAllNodeIPs(ctx context.Context, targetGroupARN string) ([]string, error)

GetAllNodeIPs returns the private IPs of all instances in the target group. IPs are returned regardless of health state (unhealthy targets are included).

func GetConfig

func GetConfig(ctx context.Context) (*aws.Config, error)

func GetMetadata

func GetMetadata(ctx context.Context, cfg aws.Config, path string) (string, error)

func GetPrivateIP

func GetPrivateIP(ctx context.Context, instanceID string) (string, error)

GetPrivateIP returns the primary private IP address of an EC2 instance. This is useful for constructing connection strings to the Neuwerk API.

func GetTargetGroupARN

func GetTargetGroupARN(ctx context.Context, instanceID string) (string, error)

GetTargetGroupARN finds the target group ARN for a given instance ID. This is used to discover which GWLB target group the Neuwerk instance is registered with.

func GetTargetHealthStates

func GetTargetHealthStates(ctx context.Context, targetGroupARN string) (map[string]string, error)

GetTargetHealthStates returns map of instance ID to health state string. Useful for debugging why targets aren't healthy.
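A map of this shape is easy to summarize when debugging. The sketch below uses made-up instance IDs and hypothetical helpers (`countHealthy`, `unhealthy`); it only assumes the map layout described above, instance ID to state string.

```go
package main

import (
	"fmt"
	"sort"
)

// countHealthy returns how many targets report the "healthy" state.
func countHealthy(states map[string]string) int {
	n := 0
	for _, s := range states {
		if s == "healthy" {
			n++
		}
	}
	return n
}

// unhealthy returns sorted "id=state" entries for targets in any other state.
func unhealthy(states map[string]string) []string {
	var out []string
	for id, s := range states {
		if s != "healthy" {
			out = append(out, id+"="+s)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Sample data; in practice the map would come from GetTargetHealthStates.
	states := map[string]string{
		"i-0aaa": "healthy",
		"i-0bbb": "initial",
		"i-0ccc": "draining",
	}
	fmt.Println("healthy:", countHealthy(states))
	fmt.Println("not healthy:", unhealthy(states))
}
```

Sorting the entries keeps log output stable, since Go map iteration order is randomized.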

func ReconcileCoordinator

func ReconcileCoordinator(ctx context.Context, isCoordinator bool) error

ReconcileCoordinator is the legacy route-table coordinator function. This is kept for backward compatibility with the old route-table based integration.

Deprecated: Use the Apply/Cleanup lifecycle for GWLB integration.

func RegisterWithTargetGroup

func RegisterWithTargetGroup(ctx context.Context, cfg aws.Config, gwlbConfig *GWLBConfig) error

RegisterWithTargetGroup registers this instance with the GWLB target group using the RegisterTargets API. Registration uses instance ID targeting, and the target group's default port (6081 for GENEVE) is used automatically.

Registration flow per RESEARCH.md Pattern 3:

  1. Create ELBv2 client from AWS config
  2. Build RegisterTargets request with instance ID (port omitted - uses target group default)
  3. Wrap API call in exponential backoff with rate limit detection
  4. Retry on ThrottlingException/RequestLimitExceeded (1s-30s intervals, max 2min elapsed)
  5. Return permanent error on validation/auth failures

Rate limit handling per CONTEXT.md:

  • ThrottlingException and RequestLimitExceeded trigger exponential backoff retry
  • Initial interval: 1s, max interval: 30s, max elapsed time: 2min
  • Logs rate limit hits at V(1) verbosity for debugging
  • Non-retriable errors (validation, auth) fail immediately with backoff.Permanent

Parameters:

  • ctx: context for API call (respects cancellation)
  • cfg: AWS SDK config (already loaded with credentials)
  • gwlbConfig: GWLB configuration from DiscoverTargetGroup

Returns:

  • error: registration failures, rate limit exhaustion, or context cancellation

func SetMetricsCollector

func SetMetricsCollector(mc *metrics.MetricsCollector)

SetMetricsCollector stores the metrics collector reference for registration status updates. Called from cmd/root.go after controller initialization.

func WaitForInstanceState

func WaitForInstanceState(ctx context.Context, instanceID string, expectedState types.InstanceStateName, timeout time.Duration) error

WaitForInstanceState polls instance state until it matches expected state or timeout.

func WaitForIperf3Server

func WaitForIperf3Server(ctx context.Context, instanceID string, timeout time.Duration) error

WaitForIperf3Server waits for iperf3 server to be ready on the consumer instance. Returns error if server not ready within timeout.

func WaitForTargetHealth

func WaitForTargetHealth(ctx context.Context, targetGroupARN string, timeout time.Duration) error

WaitForTargetHealth polls GWLB target health until healthy or timeout. Returns error if timeout exceeded or no targets found. Polls every 5 seconds.

Types

type DiscoveryOutput

type DiscoveryOutput struct {
	ClusterName         string
	InstanceID          string
	IsLeader            bool
	IngressInterface    NetworkInterface
	ManagementInterface NetworkInterface
	EgressInterface     NetworkInterface

	Peers []string
}

func Discover

func Discover(parentCtx context.Context) (*DiscoveryOutput, error)

type EC2Metadata

type EC2Metadata struct {
	InstanceID       string
	VpcID            string
	AvailabilityZone string
	Region           string
	PrivateIP        string
}

EC2Metadata holds the essential AWS EC2 instance metadata for GWLB integration. It includes instance identity, network configuration, and region information.

func QueryInstanceMetadata

func QueryInstanceMetadata(ctx context.Context) (*EC2Metadata, error)

QueryInstanceMetadata queries the AWS Instance Metadata Service v2 (IMDSv2) for EC2 instance self-discovery information.

It retrieves instance ID, VPC ID, availability zone, region, and private IP using the AWS SDK v2 IMDS client, which handles IMDSv2 session token authentication automatically.

All IMDS queries are wrapped with exponential backoff (1s-10s intervals, max 1min elapsed) to handle transient metadata service errors, rate limits, and startup timing issues.

For multi-NIC instances, it uses the first MAC address found (primary interface) and logs a warning if multiple NICs are detected.

Returns:

  • *EC2Metadata: Instance identity and network configuration
  • error: metadata service unavailable or query failures after retries
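The multi-NIC rule can be illustrated by parsing a MAC listing. The sketch assumes the listing is newline-separated with a trailing slash per entry, as returned by the IMDS `network/interfaces/macs/` path, and that the first entry is the one to use, as the description states. `primaryMAC` is a hypothetical helper and the addresses are made up.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// primaryMAC returns the first entry of a newline-separated MAC listing
// together with the total number of entries found.
func primaryMAC(listing string) (string, int, error) {
	var macs []string
	for _, line := range strings.Split(listing, "\n") {
		mac := strings.TrimSuffix(strings.TrimSpace(line), "/")
		if mac != "" {
			macs = append(macs, mac)
		}
	}
	if len(macs) == 0 {
		return "", 0, errors.New("no MAC addresses in metadata listing")
	}
	return macs[0], len(macs), nil
}

func main() {
	// Sample listing with two interfaces.
	listing := "0a:11:22:33:44:55/\n0a:66:77:88:99:aa/\n"
	mac, n, err := primaryMAC(listing)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if n > 1 {
		fmt.Printf("warning: %d NICs detected, using %s\n", n, mac)
	}
	fmt.Println("primary MAC:", mac)
}
```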

type GWLBConfig

type GWLBConfig struct {
	InstanceID       string
	VpcID            string
	AvailabilityZone string
	Region           string
	TargetGroupARN   string
	PrivateIP        string
}

GWLBConfig holds the complete AWS GWLB configuration for target registration and cluster orchestration. It combines EC2 instance metadata with discovered GWLB target group information.

func DiscoverTargetGroup

func DiscoverTargetGroup(ctx context.Context, cfg aws.Config, metadata *EC2Metadata, clusterName string) (*GWLBConfig, error)

DiscoverTargetGroup discovers the GWLB target group using tag-based filtering via the AWS ResourceGroupsTaggingAPI. It queries for target groups with the specified cluster tag, validates exactly one match exists, and verifies the target group uses GENEVE protocol.

Tag-based discovery flow:

  1. Query ResourceGroupsTaggingAPI with neuwerk:cluster=<clusterName> filter
  2. Validate exactly one target group matches (error on 0 or >1 matches)
  3. Verify target group protocol is GENEVE via DescribeTargetGroups
  4. Return GWLBConfig combining metadata and target group ARN

Error handling per CONTEXT.md:

  • Zero matches: "no target group found with tag neuwerk:cluster={clusterName}"
  • Multiple matches: "multiple target groups match tag... require explicit ARN to resolve ambiguity: [ARNs]"
  • Wrong protocol: "target group {arn} type is {protocol}, not GENEVE - misconfiguration"

All AWS API calls use exponential backoff (1s-30s intervals, max 2min elapsed) with rate limit detection via smithy.APIError.

Parameters:

  • ctx: context for API calls
  • cfg: AWS SDK config (already loaded with credentials)
  • metadata: EC2 instance metadata from QueryInstanceMetadata
  • clusterName: value for neuwerk:cluster tag filter

Returns:

  • *GWLBConfig: Complete configuration for target registration
  • error: discovery failures, validation errors, or rate limit exhaustion

type NetworkInterface

type NetworkInterface struct {
	DeviceName     string
	ENIID          string
	PrimaryAddress string
}

func (NetworkInterface) String

func (n NetworkInterface) String() string
