Documentation ¶
Index ¶
- Constants
- Variables
- func Apply(ctx context.Context, ctrlConfig *controller.ControllerConfig) error
- func ApplyLegacy(ctx context.Context, ctrlConfig *controller.ControllerConfig) error
- func Cleanup(ctx context.Context) error
- func CountHealthyTargets(ctx context.Context, targetGroupARN string) (int, error)
- func DeregisterWithDraining(ctx context.Context, cfg aws.Config, gwlbConfig *GWLBConfig) error
- func DescribeInstanceState(ctx context.Context, instanceID string) (string, error)
- func ExecuteRemoteCommand(ctx context.Context, instanceID string, command string, timeout time.Duration) (string, error)
- func FindGENEVETunnelInterface() (string, error)
- func GetAllNodeIPs(ctx context.Context, targetGroupARN string) ([]string, error)
- func GetConfig(ctx context.Context) (*aws.Config, error)
- func GetMetadata(ctx context.Context, cfg aws.Config, path string) (string, error)
- func GetPrivateIP(ctx context.Context, instanceID string) (string, error)
- func GetTargetGroupARN(ctx context.Context, instanceID string) (string, error)
- func GetTargetHealthStates(ctx context.Context, targetGroupARN string) (map[string]string, error)
- func ReconcileCoordinator(ctx context.Context, isCoordinator bool) error
- func RegisterWithTargetGroup(ctx context.Context, cfg aws.Config, gwlbConfig *GWLBConfig) error
- func SetMetricsCollector(mc *metrics.MetricsCollector)
- func WaitForInstanceState(ctx context.Context, instanceID string, expectedState types.InstanceStateName, ...) error
- func WaitForIperf3Server(ctx context.Context, instanceID string, timeout time.Duration) error
- func WaitForTargetHealth(ctx context.Context, targetGroupARN string, timeout time.Duration) error
- type DiscoveryOutput
- type EC2Metadata
- type GWLBConfig
- type NetworkInterface
Constants ¶
const (
	AWSTagNameASG               = "aws:autoscaling:groupName"
	AWSTagNameNeuwerkLeader     = "neuwerk:leader"
	AWSTagNameNeuwerkVIP        = "neuwerk:vip"
	AWSTagNameNeuwerkCluster    = "neuwerk:cluster"
	AWSTagEgress                = "neuwerk:egress"
	ManagementDeviceDescription = "management"
	IngressDeviceDescription    = "ingress"
	EgressDeviceDescription     = "egress"
)
const (
// GENEVEPort is the UDP port used by AWS GWLB for GENEVE encapsulation
GENEVEPort = 6081
)
Variables ¶
var ErrNoPriorityAssigned = errors.New("no priority set")
Functions ¶
func Apply ¶
func Apply(ctx context.Context, ctrlConfig *controller.ControllerConfig) error
Apply orchestrates AWS GWLB integration on startup. It discovers the GWLB target group, registers with it, detects the GENEVE tunnel interface, updates controller configuration, and configures peer discovery.
Steps:
1. Query instance metadata via IMDSv2
2. Discover GWLB target group via tag-based search
3. Register with target group
4. Find GENEVE tunnel interface (created after registration)
5. Update ctrlConfig.IngressDevice to tunnel interface
6. Set ctrlConfig.TunnelMode to "geneve"
7. Configure peer discovery via AWS discovery with VPC filter
Returns an error if any step fails; the caller should abort startup.
func ApplyLegacy ¶
func ApplyLegacy(ctx context.Context, ctrlConfig *controller.ControllerConfig) error
ApplyLegacy is the legacy route-table based integration for AWS. This is kept for backward compatibility but is deprecated in favor of GWLB integration (coordinator.Apply).
The integration mechanism does the following:
1. discover peers by listing EC2 instances matching well-known tags
2. if this is a leader: modify the ingress subnet route table and point the default route
Note: see the tf/ directory for the expected setup.
func Cleanup ¶
Cleanup deregisters from the GWLB target group on shutdown. Called by SIGTERM handler in cmd/root.go SetupSignalHandler.
It sets the ShuttingDown flag (health endpoint returns 503 immediately), then calls DeregisterWithDraining which waits for AWS to confirm draining state and completes the 60-second drain period.
Returns an error if deregistration fails. The error is non-fatal and acceptable during a forced shutdown.
func CountHealthyTargets ¶
CountHealthyTargets returns the number of healthy targets in the target group. A target is healthy when TargetHealth.State == "healthy" (it has passed health checks).
func DeregisterWithDraining ¶
DeregisterWithDraining deregisters this instance from the GWLB target group and waits for AWS to confirm draining state before starting the 60-second drain period. This ensures graceful connection draining per CONTEXT.md.
Deregistration flow per RESEARCH.md Pattern 4 and CONTEXT.md requirements:
- Initiate deregistration via DeregisterTargets API
- Wait for AWS to confirm target is in "draining" state (30s timeout)
- After draining state confirmed, wait 60 seconds for drain period
- If parent context cancelled (second SIGTERM), interrupt drain and return error
Draining state confirmation per CONTEXT.md:
- Poll DescribeTargetHealth every 2 seconds to check target state
- States: "draining" (proceed to drain period), "unused" (already deregistered - return success)
- If 30s timeout expires: log warning "timeout waiting for draining state - forcing shutdown", return error
- Rationale: "Wait for AWS to confirm target is in 'draining' state before starting drain timer"
Drain period per CONTEXT.md:
- 60 second wait after draining state confirmed (matches AWS deregistration_delay.timeout_seconds default)
- If parent context cancelled during drain: log warning "drain period interrupted", return error
- This enables second SIGTERM to force immediate shutdown per CONTEXT.md requirement
Parameters:
- ctx: context for API calls and drain period (parent context cancellation forces shutdown)
- cfg: AWS SDK config (already loaded with credentials)
- gwlbConfig: GWLB configuration from DiscoverTargetGroup
Returns:
- error: deregistration failures, draining state timeout, or drain period interruption
func DescribeInstanceState ¶
DescribeInstanceState returns the EC2 instance state as a string ("running", "pending", "stopped", etc.).
func ExecuteRemoteCommand ¶
func ExecuteRemoteCommand(ctx context.Context, instanceID string, command string, timeout time.Duration) (string, error)
ExecuteRemoteCommand executes a shell command on an EC2 instance via SSM Session Manager and returns the command output (stdout) or an error.
Uses SSM SendCommand with AWS-RunShellScript document, which executes bash commands on the target instance. Waits for command completion with configurable timeout.
Example:
output, err := ExecuteRemoteCommand(ctx, "i-1234567890abcdef0", "curl -s http://example.com", 30*time.Second)
func FindGENEVETunnelInterface ¶
FindGENEVETunnelInterface identifies the network interface receiving AWS GWLB GENEVE traffic. It returns the interface name (e.g., "eth1"), or an error if no interface is found within the 60s retry window.
AWS GWLB creates the GENEVE tunnel after the target is registered and its health checks pass. Timing depends on GWLB infrastructure setup and may vary by deployment.
func GetAllNodeIPs ¶
GetAllNodeIPs returns the private IPs of all instances in the target group, regardless of health state (unhealthy targets are included).
func GetMetadata ¶
func GetPrivateIP ¶
GetPrivateIP returns the primary private IP address of an EC2 instance. This is useful for constructing connection strings to the Neuwerk API.
func GetTargetGroupARN ¶
GetTargetGroupARN finds the target group ARN for a given instance ID. This is used to discover which GWLB target group the Neuwerk instance is registered with.
func GetTargetHealthStates ¶
GetTargetHealthStates returns a map of instance ID to health state string. Useful for debugging why targets aren't healthy.
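The relationship between the state map and the healthy count is simple enough to show in a few lines. The sketch below assumes a map shaped like the one GetTargetHealthStates returns; `countHealthy` and the instance IDs are made up for illustration.

```go
package main

import "fmt"

// countHealthy returns how many targets report the "healthy" state.
func countHealthy(states map[string]string) int {
	n := 0
	for _, s := range states {
		if s == "healthy" {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical instance IDs mapped to health state strings.
	states := map[string]string{
		"i-aaaa": "healthy",
		"i-bbbb": "draining",
		"i-cccc": "healthy",
		"i-dddd": "unhealthy",
	}
	fmt.Println("healthy targets:", countHealthy(states)) // prints "healthy targets: 2"
}
```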
func ReconcileCoordinator ¶
ReconcileCoordinator is the legacy route-table coordinator function. This is kept for backward compatibility with the old route-table based integration. Deprecated: Use the Apply/Cleanup lifecycle for GWLB integration.
func RegisterWithTargetGroup ¶
RegisterWithTargetGroup registers this instance with the GWLB target group using the RegisterTargets API. Registration uses instance ID targeting, and the target group's default port (6081 for GENEVE) is used automatically.
Registration flow per RESEARCH.md Pattern 3:
- Create ELBv2 client from AWS config
- Build RegisterTargets request with instance ID (port omitted - uses target group default)
- Wrap API call in exponential backoff with rate limit detection
- Retry on ThrottlingException/RequestLimitExceeded (1s-30s intervals, max 2min elapsed)
- Return permanent error on validation/auth failures
Rate limit handling per CONTEXT.md:
- ThrottlingException and RequestLimitExceeded trigger exponential backoff retry
- Initial interval: 1s, max interval: 30s, max elapsed time: 2min
- Logs rate limit hits at V(1) verbosity for debugging
- Non-retriable errors (validation, auth) fail immediately with backoff.Permanent
Parameters:
- ctx: context for API call (respects cancellation)
- cfg: AWS SDK config (already loaded with credentials)
- gwlbConfig: GWLB configuration from DiscoverTargetGroup
Returns:
- error: registration failures, rate limit exhaustion, or context cancellation
func SetMetricsCollector ¶
func SetMetricsCollector(mc *metrics.MetricsCollector)
SetMetricsCollector stores the metrics collector reference for registration status updates. Called from cmd/root.go after controller initialization.
func WaitForInstanceState ¶
func WaitForInstanceState(ctx context.Context, instanceID string, expectedState types.InstanceStateName, timeout time.Duration) error
WaitForInstanceState polls the instance state until it matches the expected state or the timeout elapses.
func WaitForIperf3Server ¶
WaitForIperf3Server waits for the iperf3 server to be ready on the consumer instance. Returns an error if the server is not ready within the timeout.
Types ¶
type DiscoveryOutput ¶
type DiscoveryOutput struct {
ClusterName string
InstanceID string
IsLeader bool
IngressInterface NetworkInterface
ManagementInterface NetworkInterface
EgressInterface NetworkInterface
Peers []string
}
type EC2Metadata ¶
type EC2Metadata struct {
InstanceID string
VpcID string
AvailabilityZone string
Region string
PrivateIP string
}
EC2Metadata holds the essential AWS EC2 instance metadata for GWLB integration. It includes instance identity, network configuration, and region information.
func QueryInstanceMetadata ¶
func QueryInstanceMetadata(ctx context.Context) (*EC2Metadata, error)
QueryInstanceMetadata queries the AWS Instance Metadata Service v2 (IMDSv2) for EC2 instance self-discovery information.
It retrieves instance ID, VPC ID, availability zone, region, and private IP using the AWS SDK v2 IMDS client, which handles IMDSv2 session token authentication automatically.
All IMDS queries are wrapped with exponential backoff (1s-10s intervals, max 1min elapsed) to handle transient metadata service errors, rate limits, and startup timing issues.
For multi-NIC instances, it uses the first MAC address found (primary interface) and logs a warning if multiple NICs are detected.
Returns:
- *EC2Metadata: Instance identity and network configuration
- error: metadata service unavailable or query failures after retries
type GWLBConfig ¶
type GWLBConfig struct {
InstanceID string
VpcID string
AvailabilityZone string
Region string
TargetGroupARN string
PrivateIP string
}
GWLBConfig holds the complete AWS GWLB configuration for target registration and cluster orchestration. It combines EC2 instance metadata with discovered GWLB target group information.
func DiscoverTargetGroup ¶
func DiscoverTargetGroup(ctx context.Context, cfg aws.Config, metadata *EC2Metadata, clusterName string) (*GWLBConfig, error)
DiscoverTargetGroup discovers the GWLB target group using tag-based filtering via the AWS ResourceGroupsTaggingAPI. It queries for target groups with the specified cluster tag, validates exactly one match exists, and verifies the target group uses GENEVE protocol.
Tag-based discovery flow:
- Query ResourceGroupsTaggingAPI with neuwerk:cluster=<clusterName> filter
- Validate exactly one target group matches (error on 0 or >1 matches)
- Verify target group protocol is GENEVE via DescribeTargetGroups
- Return GWLBConfig combining metadata and target group ARN
Error handling per CONTEXT.md:
- Zero matches: "no target group found with tag neuwerk:cluster={clusterName}"
- Multiple matches: "multiple target groups match tag... require explicit ARN to resolve ambiguity: [ARNs]"
- Wrong protocol: "target group {arn} type is {protocol}, not GENEVE - misconfiguration"
All AWS API calls use exponential backoff (1s-30s intervals, max 2min elapsed) with rate limit detection via smithy.APIError.
Parameters:
- ctx: context for API calls
- cfg: AWS SDK config (already loaded with credentials)
- metadata: EC2 instance metadata from QueryInstanceMetadata
- clusterName: value for neuwerk:cluster tag filter
Returns:
- *GWLBConfig: Complete configuration for target registration
- error: discovery failures, validation errors, or rate limit exhaustion
type NetworkInterface ¶
func (NetworkInterface) String ¶
func (n NetworkInterface) String() string