loadbalancer

package
v1.18.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 12, 2025 License: Apache-2.0 Imports: 35 Imported by: 61

README

Load-balancing control-plane

This package implements the core load-balancing control-plane. Built on top of StateDB around 3 core tables:

  • Table[*Frontend] (frontends): Service frontends keyed by address, port and protocol. The frontend references backends associated with it. Frontends are what are reconciled to BPF maps.
  • Table[*Service] (services): Service metadata shared by multiple frontends.
  • Table[*Backend] (backends): Backends associated with services.

Modifying the state is done via the Writer API. It ensures consistency and manages the references between services, frontends and backends.

The basic architecture looks as follows:

  [Data source A]       [Data source B]      [Health checker]    
        \                     |                  /   ^     
         ----------------.    v   .--------------   /
                          [Writer]                 /
                         /    |   \               /
              .---------'     v    '------.      /
         [Services]      [Frontends]    [Backends]
                             | ^              |
                             v |              v
                         [Reconciler]    [ L7 proxy (Envoy) ]
                              |
                              v
                          [BPF maps]

The different data sources insert data using Writer.UpsertService, Writer.UpsertFrontend, etc. methods.

The main data source is of course the Kubernetes one which is implemented in reflector.go. It batches up changes to Services and EndpointSlices and commits them periodically so the system can process changes in batches. Kubernetes Services are mapped to a Service object with zero or more Frontends. From EndpointSlices a Backend instance is created. Each Backend object contains a set of instances, one per service that references it.

The BPF reconciler (bpf_reconciler.go) watches the frontend table (service and backend objects are referenced by the frontend object) and reconcile updates towards the BPF maps. The reconciliation status is written back to frontends (Status field).

This architecture enables:

  • Easy addition of new data sources
  • Ability to observe changes to the data at coarse or fine granularity via StateDB watch channels
  • Separation of concerns: the error handling is performed by the reconciler and low-level errors are not bubbled up to data sources (which wouldn't know how to handle it). Readers are not in the critical path.

Source code organization

The core source files are:

  • service.go, frontend.go, backend.go: The struct and table definitions
  • config.go: Configuration structs. Config is the main configuration, ExtConfig bridges from option.DaemonConfig to avoid direct references to it (to be removed eventually) and TestConfig allows tests to tweak the behavior.
  • writer: API for modifying the tables while maintaining cross-table references
  • reconciler: Reconciliation from frontends to BPF maps
  • maps: "LBMaps" wrapper around BPF maps
  • reflectors: Reflects Kubernetes Service, Pod and EndpointSlices to the tables (via Writer)

Inspecting state

State of load-balancing can be inspected via cilium-dbg shell:

  $ kubectl exec -it -n kube-system ds/cilium -- cilium-dbg shell
  ...
  > db/show frontends
  > db/show backends
  > db/show services
  > lb/maps-dump

You can also explore this in the standalone "repl". See repl/main.go.

Testing

The new architecture makes it easier to write integration tests due to the decoupling. An integration test for a data source can depend just on the tables & writer and verify that the tables are correctly populated without having to mock other features or the BPF operations.

An "end-to-end" style test can be found in tests/script_test.go that tests going from Kubernetes objects specified in YAML to the BPF map contents using script tests in testdata. This is the main way of testing the control-plane.

Similar tests can be found from components building on top of load-balancing in redirectpolicy/script_test.go and pkg/ciliumenvoyconfig/script_test.go. For more info on script tests see https://docs.cilium.io/en/latest/contributing/development/hive/#testing-with-hive-script and https://docs.cilium.io/en/latest/contributing/development/statedb/#script-commands.

For quick feedback loop you can use the watch.sh script (in tests/) to run a script test when the txtar file changes.

IMPORTANT: When adding new test cases it's important to stress test to make sure the new test is not flaky. Run stress.sh to run 500 iterations of all tests.

Tests can be executed normally with "go test":

  $ go test ./...

To run the privileged tests that use the real BPF maps:

  $ cd tests
  $ go test -c
  $ PRIVILEGED_TESTS=1 sudo -E ./tests.test -test.run . -test.v -test.count 1

Benchmarking

The benchmark/ directory contains a benchmark testing the throughput and memory usage when going from Kubernetes objects to BPF map contents. Run with go run ./benchmark/cmd. Highly encouraged to use this when doing structural changes or adding additional indexing.

The average result that you can expect is around ~50k per second with ~50 objects allocated per service.

Documentation

Overview

Package loadbalancer contains load-balancing types and tables

Index

Constants

View Source
const (
	// LBMapEntriesName configures max entries for BPF lbmap.
	LBMapEntriesName = "bpf-lb-map-max"

	// LBServiceMapMaxEntries configures max entries of bpf map for services.
	LBServiceMapMaxEntries = "bpf-lb-service-map-max"

	// LBBackendMapMaxEntries configures max entries of bpf map for service backends.
	LBBackendMapMaxEntries = "bpf-lb-service-backend-map-max"

	// LBRevNatMapMaxEntries configures max entries of bpf map for reverse NAT.
	LBRevNatMapMaxEntries = "bpf-lb-rev-nat-map-max"

	// LBAffinityMapMaxEntries configures max entries of bpf map for session affinity.
	LBAffinityMapMaxEntries = "bpf-lb-affinity-map-max"

	// LBSourceRangeAllTypes configures service source ranges for all service types.
	LBSourceRangeAllTypes = "bpf-lb-source-range-all-types"

	// LBSourceRangeMapMaxEntries configures max entries of bpf map for service source ranges.
	LBSourceRangeMapMaxEntries = "bpf-lb-source-range-map-max"

	// LBMaglevMapMaxEntries configures max entries of bpf map for Maglev.
	LBMaglevMapMaxEntries = "bpf-lb-maglev-map-max"

	// SockRevNatEntriesName configures max entries for BPF sock reverse nat
	// entries.
	LBSockRevNatEntriesName = "bpf-sock-rev-map-max"

	// NodePortRange defines a custom range where to look up NodePort services
	NodePortRange = "node-port-range"

	LBAlgorithmName = "bpf-lb-algorithm"

	// Deprecated option for setting [LBAlgorithm]
	NodePortAlgName = "node-port-algorithm"

	// LoadBalancerMode indicates in which mode NodePort implementation should run
	// ("snat", "dsr" or "hybrid")
	LoadBalancerModeName = "bpf-lb-mode"

	// LoadBalancerModeAnnotation tells whether controller should check service
	// level annotation for configuring bpf loadbalancing method (snat vs dsr).
	LoadBalancerModeAnnotationName = "bpf-lb-mode-annotation"

	// Deprecated option for setting [LoadBalancerMode]
	NodePortModeName = "node-port-mode"

	// LoadBalancerDSRDispatchName is the config option for setting the method for
	// pushing packets to backends under DSR ("opt" or "ipip")
	LoadBalancerDSRDispatchName = "bpf-lb-dsr-dispatch"

	// ExternalClusterIPName is the name of the option to enable
	// cluster external access to ClusterIP services.
	ExternalClusterIPName = "bpf-lb-external-clusterip"

	// AlgorithmAnnotationName tells whether controller should check service
	// level annotation for configuring bpf loadbalancing algorithm.
	AlgorithmAnnotationName = "bpf-lb-algorithm-annotation"

	// EnableHealthCheckNodePort is the name of the EnableHealthCheckNodePort option
	EnableHealthCheckNodePortName = "enable-health-check-nodeport"

	// EnableServiceTopologyName is the flag name of for the EnableServiceTopology option
	EnableServiceTopologyName = "enable-service-topology"
)

Configuration option names

View Source
const (
	// DefaultLBMapMaxEntries is the default size for the load-balancing BPF maps.
	DefaultLBMapMaxEntries = 65536

	// NodePortMinDefault is the minimal port to listen for NodePort requests
	NodePortMinDefault = 30000

	// NodePortMaxDefault is the maximum port to listen for NodePort requests
	NodePortMaxDefault = 32767
)

Configuration option defaults

View Source
const (
	// LBAlgorithmRandom is for randomly selecting a backend
	LBAlgorithmRandom = "random"

	// LBAlgorithmMaglev is for using maglev consistent hashing for backend selection
	LBAlgorithmMaglev = "maglev"

	// LBModeSNAT is for SNATing requests to remote nodes
	LBModeSNAT = "snat"

	// LBModeDSR is for performing DSR for requests to remote nodes
	LBModeDSR = "dsr"

	// LBModeHybrid is a dual mode of the above, that is, DSR for TCP and SNAT for UDP
	LBModeHybrid = "hybrid"

	// DSR dispatch mode to encode service into IP option or extension header
	DSRDispatchOption = "opt"

	// DSR dispatch mode to encapsulate to IPIP
	DSRDispatchIPIP = "ipip"

	// DSR dispatch mode to encapsulate to Geneve
	DSRDispatchGeneve = "geneve"
)
View Source
const (
	IPFamilyIPv4 = IPFamily(false)
	IPFamilyIPv6 = IPFamily(true)
)
View Source
const (
	SVCTypeNone          = SVCType("NONE")
	SVCTypeHostPort      = SVCType("HostPort")
	SVCTypeClusterIP     = SVCType("ClusterIP")
	SVCTypeNodePort      = SVCType("NodePort")
	SVCTypeExternalIPs   = SVCType("ExternalIPs")
	SVCTypeLoadBalancer  = SVCType("LoadBalancer")
	SVCTypeLocalRedirect = SVCType("LocalRedirect")
)
View Source
const (
	SVCTrafficPolicyNone    = SVCTrafficPolicy("NONE")
	SVCTrafficPolicyCluster = SVCTrafficPolicy("Cluster")
	SVCTrafficPolicyLocal   = SVCTrafficPolicy("Local")
)
View Source
const (
	SVCNatPolicyNone  = SVCNatPolicy("NONE")
	SVCNatPolicyNat46 = SVCNatPolicy("Nat46")
	SVCNatPolicyNat64 = SVCNatPolicy("Nat64")
)
View Source
const (
	SVCForwardingModeUndef = SVCForwardingMode("")
	SVCForwardingModeDSR   = SVCForwardingMode("dsr")
	SVCForwardingModeSNAT  = SVCForwardingMode("snat")
)
View Source
const (
	SVCSourceRangesPolicyAllow = SVCSourceRangesPolicy("allow")
	SVCSourceRangesPolicyDeny  = SVCSourceRangesPolicy("deny")
)
View Source
const (
	SVCProxyDelegationNone            = SVCProxyDelegation("none")
	SVCProxyDelegationDelegateIfLocal = SVCProxyDelegation("delegate-if-local")
)
View Source
const (
	// NONE type.
	NONE = L4Type("NONE")
	// ANY type.
	ANY = L4Type("ANY")
	// TCP type.
	TCP = L4Type("TCP")
	// UDP type.
	UDP = L4Type("UDP")
	// SCTP type.
	SCTP = L4Type("SCTP")
)
View Source
const (
	// ScopeExternal is the lookup scope for services from outside the node.
	ScopeExternal uint8 = iota
	// ScopeInternal is the lookup scope for services from inside the node.
	ScopeInternal
)
View Source
const (
	BackendStateActiveFlag = iota
	BackendStateTerminatingFlag
	BackendStateQuarantinedFlag
	BackendStateMaintenanceFlag
)
View Source
const (
	// TrafficDistributionDefault will ignore any topology aware hints for choosing the backends.
	TrafficDistributionDefault = TrafficDistribution("")

	// TrafficDistributionPreferClose Indicates preference for routing traffic to topologically close backends,
	// that is to backends that are in the same zone.
	TrafficDistributionPreferClose = TrafficDistribution("PreferClose")
)
View Source
const (
	BackendTableName = "backends"
)
View Source
const DefaultBackendWeight = 100

DefaultBackendWeight is used when backend weight is not set in ServiceSpec

View Source
const (
	FrontendTableName = "frontends"
)
View Source
const (
	ServiceTableName = "services"
)

Variables

View Source
var (
	BackendByAddress = backendAddrIndex.Query

	BackendByServiceName = backendServiceIndex.Query
)
View Source
var (
	// ErrServiceNotFound occurs when a frontend is being upserted that refers to
	// a non-existing service.
	ErrServiceNotFound = errors.New("service not found")

	// ErrFrontendConflict occurs when a frontend is being upserted but it already
	// exists and is owned by a different service.
	ErrFrontendConflict = errors.New("frontend already owned by another service")
)
View Source
var (
	FrontendByAddress = frontendAddressIndex.Query

	FrontendByServiceName = frontendServiceIndex.Query
)
View Source
var AllProtocols = []L4Type{TCP, UDP, SCTP}

AllProtocols is the list of all supported L4 protocols

ConfigCell provides the Config and ExternalConfig configurations.

View Source
var DefaultConfig = Config{
	UserConfig:  DefaultUserConfig,
	NodePortMin: NodePortMinDefault,
	NodePortMax: NodePortMaxDefault,
}
View Source
var DefaultUserConfig = UserConfig{
	RetryBackoffMin:           time.Second,
	RetryBackoffMax:           time.Minute,
	LBMapEntries:              DefaultLBMapMaxEntries,
	LBPressureMetricsInterval: 5 * time.Minute,

	LBServiceMapEntries:     0,
	LBBackendMapEntries:     0,
	LBRevNatEntries:         0,
	LBAffinityMapEntries:    0,
	LBSourceRangeMapEntries: 0,
	LBMaglevMapEntries:      0,

	LBSockRevNatEntries: 0,

	LBSourceRangeAllTypes: false,

	NodePortRange: []string{},
	LBAlgorithm:   LBAlgorithmRandom,

	LBMode: LBModeSNAT,

	DSRDispatch: DSRDispatchOption,

	ExternalClusterIP: false,

	AlgorithmAnnotation: false,

	EnableHealthCheckNodePort: true,

	EnableServiceTopology: false,

	InitWaitTimeout: 1 * time.Minute,
}
View Source
var (
	ServiceByName = serviceNameIndex.Query
)

Functions

func L3n4AddrFromString added in v1.17.0

func L3n4AddrFromString(key string) (index.Key, error)

L3n4AddrFromString constructs a StateDB key by parsing the input in the form of L3n4Addr.String(), e.g. <addr>:<port>/protocol. The input can be partial to construct keys for prefix searches, e.g. "1.2.3.4". This must be kept in sync with Bytes().

func L4TypeAsByte added in v1.17.0

func L4TypeAsByte(l4 L4Type) byte

func L4TypeAsProtocolNumber added in v1.18.3

func L4TypeAsProtocolNumber(l4 L4Type) u8proto.U8proto

Given an L4Type, return the underlying Layer 4 protocol number as defined by IANA.

This routine can be used by other components to translate something like Frontend.Address.Protocol into the underlying IANA number, without having to roll their own.

Eventually, perhaps this should be pushed into the U8Proto component and stored as that type instead. This routine would then go away.

func NewBackendsTable added in v1.18.0

func NewBackendsTable(db *statedb.DB) (statedb.RWTable[*Backend], error)

func NewFrontendsTable added in v1.18.0

func NewFrontendsTable(cfg Config, db *statedb.DB) (statedb.RWTable[*Frontend], error)

func NewServicesTable added in v1.18.0

func NewServicesTable(cfg Config, db *statedb.DB) (statedb.RWTable[*Service], error)

Types

type Backend

type Backend struct {
	Address L3n4Addr

	// Instances of this backend. A backend is always linked to a specific
	// service and the instances may call the backend by different name
	// (PortName) or they may come from  differents sources.
	// Instances may contain multiple [BackendInstance]s per service
	// coming from different sources. The version from the source with the
	// highest priority (smallest uint8) is used. This is needed for smooth
	// transitions when ownership of endpoints is passed between upstream
	// data sources.
	Instances part.Map[BackendInstanceKey, BackendParams]
}

Backend is a composite of the per-service backend instances that share the same IP address and port.

func (*Backend) Clone added in v1.18.0

func (be *Backend) Clone() *Backend

Clone returns a shallow clone of the backend.

func (*Backend) GetInstance added in v1.18.0

func (be *Backend) GetInstance(name ServiceName) *BackendParams

func (*Backend) GetInstanceForFrontend added in v1.18.0

func (be *Backend) GetInstanceForFrontend(fe *Frontend) *BackendParams

func (*Backend) GetInstanceFromSource added in v1.18.0

func (be *Backend) GetInstanceFromSource(name ServiceName, src source.Source) *BackendParams

func (*Backend) GetInstancesOfService added in v1.18.0

func (be *Backend) GetInstancesOfService(name ServiceName) iter.Seq2[BackendInstanceKey, BackendParams]

func (*Backend) IsAlive added in v1.18.0

func (be *Backend) IsAlive() bool

IsAlive returns true if any of the instances are marked active or terminating and healthy. This signals whether the backend should still be considered alive or not for the purposes of terminating connections to it.

func (*Backend) PreferredInstances added in v1.18.0

func (be *Backend) PreferredInstances() iter.Seq2[BackendInstanceKey, BackendParams]

func (*Backend) String

func (be *Backend) String() string

func (*Backend) TableHeader added in v1.18.0

func (be *Backend) TableHeader() []string

func (*Backend) TableRow added in v1.18.0

func (be *Backend) TableRow() []string

type BackendID

type BackendID uint32

BackendID is the backend's ID.

type BackendInstanceKey added in v1.18.0

type BackendInstanceKey struct {
	ServiceName    ServiceName
	SourcePriority uint8
}

func (BackendInstanceKey) Key added in v1.18.0

func (k BackendInstanceKey) Key() []byte

type BackendParams added in v1.18.0

type BackendParams struct {
	Address L3n4Addr

	// PortNames are the optional names for the ports. A frontend can specify which
	// backends to select by port name.
	PortNames []string

	// Weight of backend for load-balancing.
	Weight uint16

	// Node hosting this backend. This is used to determine backends local to
	// a node.
	NodeName string

	// Zone where backend is located.
	Zone string

	// ForZones where this backend should be consumed in
	ForZones []string

	// ClusterID of the cluster in which the backend is located. 0 for local cluster.
	ClusterID uint32

	// Source of the backend.
	Source source.Source

	// State of the backend, e.g. active, quarantined or terminating.
	State BackendState

	// Unhealthy marks a backend as unhealthy and overrides [State] to mark the backend
	// as quarantined. We require a separate field for active health checking to merge
	// with the original source of this backend. Negative is used here to allow the
	// zero value to mean that the backend is healthy.
	Unhealthy bool

	// UnhealthyUpdatedAt is the timestamp for when [Unhealthy] was last updated. Zero
	// value if never updated.
	// +deepequal-gen=false
	UnhealthyUpdatedAt time.Time
}

BackendParams defines the parameters of a backend for insertion into the backends table. +deepequal-gen=true +deepequal-gen:private-method=true

func (*BackendParams) DeepEqual added in v1.18.3

func (bep *BackendParams) DeepEqual(other *BackendParams) bool

type BackendState

type BackendState uint8

BackendState is the state of a backend for load-balancing service traffic.

const (
	// BackendStateActive refers to the backend state when it's available for
	// load-balancing traffic. It's the default state for a backend.
	// Backends in this state can be health-checked.
	BackendStateActive BackendState = iota
	// BackendStateTerminating refers to the terminating backend state so that
	// it can be gracefully removed.
	// Backends in this state won't be health-checked.
	BackendStateTerminating
	// BackendStateQuarantined refers to the backend state when it's unreachable,
	// and will not be selected for load-balancing traffic.
	// Backends in this state can be health-checked.
	BackendStateQuarantined
	// BackendStateMaintenance refers to the backend state where the backend
	// is put under maintenance, and will neither be selected for load-balancing
	// traffic nor be health-checked.
	BackendStateMaintenance
	// BackendStateInvalid is an invalid state, and is used to report error conditions.
	// Keep this as the last entry.
	BackendStateInvalid
)

BackendState tracks backend's ability to load-balance service traffic.

Valid transition states for a backend - BackendStateActive -> BackendStateTerminating, BackendStateQuarantined, BackendStateMaintenance BackendStateTerminating -> No valid state transition BackendStateQuarantined -> BackendStateActive, BackendStateTerminating BackendStateMaintenance -> BackendStateActive

Sources setting the states - BackendStateActive - Kubernetes events, service API BackendStateTerminating - Kubernetes events BackendStateQuarantined - service API BackendStateMaintenance - service API

func GetBackendStateFromFlags

func GetBackendStateFromFlags(flags uint8) BackendState

func (BackendState) String

func (state BackendState) String() (string, error)

type BackendStateFlags

type BackendStateFlags = uint8

BackendStateFlags is the datapath representation of the backend flags that are used in (lb{4,6}_backend.flags) to store backend state.

func NewBackendFlags

func NewBackendFlags(state BackendState) BackendStateFlags

type BackendsSeq2 added in v1.18.0

type BackendsSeq2 iter.Seq2[BackendParams, statedb.Revision]

BackendsSeq2 is an iterator for sequence of backends that is also JSON and YAML marshalable.

func (BackendsSeq2) MarshalJSON added in v1.18.0

func (s BackendsSeq2) MarshalJSON() ([]byte, error)

func (BackendsSeq2) MarshalYAML added in v1.18.0

func (s BackendsSeq2) MarshalYAML() (any, error)

type Config added in v1.18.0

type Config struct {
	UserConfig

	// NodePortMin is the minimum port address for the NodePort range
	NodePortMin uint16

	// NodePortMax is the maximum port address for the NodePort range
	NodePortMax uint16
}

Config for load-balancing +deepequal-gen=true

func NewConfig added in v1.18.0

func NewConfig(log *slog.Logger, userConfig UserConfig, deprecatedConfig DeprecatedConfig, dcfg *option.DaemonConfig) (cfg Config, err error)

NewConfig takes the user-provided configuration, validates and processes it to produce the final configuration for load-balancing.

func (*Config) DeepEqual added in v1.18.0

func (in *Config) DeepEqual(other *Config) bool

DeepEqual is an autogenerated deepequal function, deeply comparing the receiver with other. in must be non-nil.

func (*Config) LoadBalancerUsesDSR added in v1.18.0

func (c *Config) LoadBalancerUsesDSR() bool

type DeprecatedConfig added in v1.18.0

type DeprecatedConfig struct {
	// NodePortAlg indicates which backend selection algorithm is used
	// ("random" or "maglev")
	NodePortAlg string `mapstructure:"node-port-algorithm"`

	// NodePortMode indicates in which mode NodePort implementation should run
	// ("snat", "dsr" or "hybrid")
	NodePortMode string `mapstructure:"node-port-mode"`
}

func (DeprecatedConfig) Flags added in v1.18.0

func (DeprecatedConfig) Flags(flags *pflag.FlagSet)

type ExternalConfig added in v1.18.0

type ExternalConfig struct {
	ZoneMapper

	EnableIPv4, EnableIPv6                 bool
	KubeProxyReplacement                   bool
	NodePortMin, NodePortMax               uint16
	NodePortAlg                            string
	LoadBalancerAlgorithmAnnotation        bool
	BPFSocketLBHostnsOnly                  bool
	EnableSocketLB                         bool
	EnableSocketLBPodConnectionTermination bool
	EnableHealthCheckLoadBalancerIP        bool

	// The following options will be removed in v1.19
	EnableHostPort              bool
	EnableSessionAffinity       bool
	EnableSVCSourceRangeCheck   bool
	EnableInternalTrafficPolicy bool
}

ExternalConfig are configuration options derived from external sources such as DaemonConfig. This avoids direct access of larger configuration structs.

func NewExternalConfig added in v1.18.0

func NewExternalConfig(cfg *option.DaemonConfig, kprCfg kpr.KPRConfig) ExternalConfig

NewExternalConfig maps the daemon config to ExternalConfig.

type FEPortName

type FEPortName string

FEPortName is the name of the frontend's port.

type Frontend added in v1.18.0

type Frontend struct {
	FrontendParams

	// Status is the reconciliation status for this frontend and
	// reflects whether or not the frontend and the associated backends
	// have been reconciled with the BPF maps.
	// Managed by [Writer].
	Status reconciler.Status

	// Backends associated with the frontend.
	Backends BackendsSeq2

	// HealthCheckBackends associated with the frontend that includes the ones that should be health checked.
	HealthCheckBackends BackendsSeq2

	// ID is the identifier allocated to this frontend. Used as the key
	// in the services BPF map. This field is populated by the reconciler
	// and is initially set to zero. It can be considered valid only when
	// [Status] is set to done.
	ID ServiceID

	// RedirectTo if set selects the backends from this service name instead
	// of that of [FrontendParams.ServiceName]. This is used to implement the
	// local redirect policies where traffic going to a specific service/frontend
	// is redirected to a local pod instead.
	RedirectTo *ServiceName

	// Service associated with the frontend. If service is updated
	// this pointer to the service will update as well and the
	// frontend is marked for reconciliation.
	Service *Service `json:"-" yaml:"-"`
}

func LookupFrontendByTuple added in v1.18.3

func LookupFrontendByTuple(txn statedb.ReadTxn, fes statedb.Table[*Frontend], addrCluster cmtypes.AddrCluster, proto L4Type, port uint16, scope uint8) (fe *Frontend, found bool)

LookupFrontendByTuple looks up a frontend with an address without constructing a L3n4Addr. This is used in hubble code when doing lots of lookups with low hit rates and we want to avoid constructing a unique L3n4Addr. On Go v1.24 this avoids a memory leak with L3n4Addr if they're constructed faster than they're cleaned up. On Go v1.25 the issue no longer exists. See also https://github.com/cilium/cilium/issues/41623

Prefer FrontendByAddress over this.

func (*Frontend) Clone added in v1.18.0

func (fe *Frontend) Clone() *Frontend

func (*Frontend) TableHeader added in v1.18.0

func (fe *Frontend) TableHeader() []string

func (*Frontend) TableRow added in v1.18.0

func (fe *Frontend) TableRow() []string

func (*Frontend) ToModel added in v1.18.0

func (fe *Frontend) ToModel() *models.Service

type FrontendParams added in v1.18.0

type FrontendParams struct {
	// Frontend address and port
	Address L3n4Addr

	// Service type (e.g. ClusterIP, NodePort, ...)
	Type SVCType

	// Name of the associated service
	ServiceName ServiceName

	// PortName if set will select only backends with matching
	// port name.
	PortName FEPortName

	// ServicePort is the associated "ClusterIP" port of this frontend.
	// Same as [Address.L4Addr.Port] except when [Type] NodePort or
	// LoadBalancer. This is used to match frontends with the [Ports] of
	// [Service.ProxyRedirect].
	ServicePort uint16
}

FrontendParams defines the static parameters of a frontend. This is separate from Frontend to clearly separate which fields can be manipulated and which are internally managed by [Writer].

type IPFamily added in v1.18.0

type IPFamily = bool

type InitWaitFunc added in v1.18.0

type InitWaitFunc hive.WaitFunc

InitWaitFunc is provided by the load-balancing cell to wait until the load-balancing control-plane has finished reconciliation of the initial data set.

type L3n4Addr

type L3n4Addr unique.Handle[l3n4AddrRep]

L3n4Addr is an unique L3+L4 address and scope (for traffic policies).

func NewL3n4Addr

func NewL3n4Addr(protocol L4Type, addrCluster cmtypes.AddrCluster, portNumber uint16, scope uint8) L3n4Addr

NewL3n4Addr creates a new L3n4Addr.

func NewL3n4AddrFromBackendModel

func NewL3n4AddrFromBackendModel(base *models.BackendAddress) (*L3n4Addr, error)

func NewL3n4AddrFromModel

func NewL3n4AddrFromModel(base *models.FrontendAddress) (*L3n4Addr, error)

func (L3n4Addr) Addr added in v1.18.1

func (l L3n4Addr) Addr() netip.Addr

func (L3n4Addr) AddrCluster

func (l L3n4Addr) AddrCluster() cmtypes.AddrCluster

func (L3n4Addr) AddrString added in v1.18.0

func (l L3n4Addr) AddrString() string

func (L3n4Addr) Bytes added in v1.17.0

func (l L3n4Addr) Bytes() []byte

Bytes returns the address as a byte slice for indexing purposes. The returned byte slice must not be mutated.

func (*L3n4Addr) DeepEqual

func (l *L3n4Addr) DeepEqual(other *L3n4Addr) bool

func (*L3n4Addr) GetModel

func (a *L3n4Addr) GetModel() *models.FrontendAddress

func (L3n4Addr) IsIPv6

func (a L3n4Addr) IsIPv6() bool

IsIPv6 returns true if the IP address in the given L3n4Addr is IPv6 or not.

func (L3n4Addr) MarshalJSON added in v1.18.1

func (l L3n4Addr) MarshalJSON() ([]byte, error)

func (L3n4Addr) MarshalYAML added in v1.18.0

func (l L3n4Addr) MarshalYAML() (any, error)

func (*L3n4Addr) ParseFromString added in v1.18.0

func (l *L3n4Addr) ParseFromString(s string) error

func (L3n4Addr) Port added in v1.18.1

func (l L3n4Addr) Port() uint16

func (L3n4Addr) Protocol added in v1.18.1

func (l L3n4Addr) Protocol() L4Type

func (L3n4Addr) Scope

func (l L3n4Addr) Scope() uint8

func (L3n4Addr) String

func (a L3n4Addr) String() string

String returns the L3n4Addr in the "IPv4:Port/Protocol[/Scope]" format for IPv4 and "[IPv6]:Port/Protocol[/Scope]" format for IPv6.

func (L3n4Addr) StringID

func (a L3n4Addr) StringID() string

StringID returns the L3n4Addr as string to be used for unique identification

func (L3n4Addr) StringWithProtocol

func (a L3n4Addr) StringWithProtocol() string

StringWithProtocol returns the L3n4Addr in the "IPv4:Port/Protocol[/Scope]" format for IPv4 and "[IPv6]:Port/Protocol[/Scope]" format for IPv6.

func (*L3n4Addr) UnmarshalYAML added in v1.18.0

func (l *L3n4Addr) UnmarshalYAML(value *yaml.Node) error

type L4Addr

type L4Addr struct {
	Protocol L4Type
	Port     uint16
}

L4Addr is an abstraction for the backend port with a L4Type, usually tcp or udp, and the Port number.

+k8s:deepcopy-gen=true +deepequal-gen=true +deepequal-gen:private-method=true

func NewL4Addr

func NewL4Addr(protocol L4Type, number uint16) L4Addr

NewL4Addr creates a new L4Addr.

func (*L4Addr) DeepCopy

func (in *L4Addr) DeepCopy() *L4Addr

DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new L4Addr.

func (*L4Addr) DeepCopyInto

func (in *L4Addr) DeepCopyInto(out *L4Addr)

DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.

func (*L4Addr) DeepEqual

func (l *L4Addr) DeepEqual(o *L4Addr) bool

DeepEqual returns true if both the receiver and 'o' are deeply equal.

func (L4Addr) Equals added in v1.5.0

func (l L4Addr) Equals(o L4Addr) bool

Equals returns true if both L4Addr are considered equal.

func (L4Addr) String added in v1.17.0

func (l L4Addr) String() string

String returns a string representation of an L4Addr

type L4Type

type L4Type = string

L4Type name.

func NewL4Type

func NewL4Type(name string) (L4Type, error)

func NewL4TypeFromNumber added in v1.17.0

func NewL4TypeFromNumber(proto uint8) L4Type

type ProxyRedirect added in v1.18.0

type ProxyRedirect struct {
	ProxyPort uint16

	// Ports if non-empty will only redirect a frontend with a matching port.
	Ports []uint16
}

func (*ProxyRedirect) Equal added in v1.18.0

func (pr *ProxyRedirect) Equal(other *ProxyRedirect) bool

func (*ProxyRedirect) Redirects added in v1.18.0

func (pr *ProxyRedirect) Redirects(port uint16) bool

func (*ProxyRedirect) String added in v1.18.0

func (pr *ProxyRedirect) String() string

type SVCForwardingMode added in v1.17.0

type SVCForwardingMode string

func ToSVCForwardingMode added in v1.17.0

func ToSVCForwardingMode(s string) SVCForwardingMode

type SVCLoadBalancingAlgorithm added in v1.17.0

type SVCLoadBalancingAlgorithm uint8
const (
	SVCLoadBalancingAlgorithmUndef  SVCLoadBalancingAlgorithm = 0
	SVCLoadBalancingAlgorithmRandom SVCLoadBalancingAlgorithm = 1
	SVCLoadBalancingAlgorithmMaglev SVCLoadBalancingAlgorithm = 2
)

func ToSVCLoadBalancingAlgorithm added in v1.17.0

func ToSVCLoadBalancingAlgorithm(s string) SVCLoadBalancingAlgorithm

func (SVCLoadBalancingAlgorithm) String added in v1.18.0

func (d SVCLoadBalancingAlgorithm) String() string

type SVCNatPolicy

type SVCNatPolicy string

SVCNatPolicy defines whether we need NAT46/64 translation for backends

type SVCProxyDelegation added in v1.18.0

type SVCProxyDelegation string

type SVCSourceRangesPolicy added in v1.17.0

type SVCSourceRangesPolicy string

type SVCTrafficPolicy

type SVCTrafficPolicy string

SVCTrafficPolicy defines which backends are chosen

type SVCType

type SVCType string

SVCType is a type of a service.

type Service added in v1.18.0

type Service struct {
	// Name is the fully qualified service name, e.g. (<cluster>/)<namespace>/<name>.
	Name ServiceName

	// Source is the data source from which this service was ingested from.
	Source source.Source

	// Labels associated with the service.
	Labels labels.Labels

	// Annotations associated with this service.
	Annotations map[string]string

	// Selector specifies which pods should be associated with this service. If
	// this is empty the backends associated to this service are managed externally
	// and not by Kubernetes.
	Selector map[string]string

	// NatPolicy defines whether we need NAT46/64 translation for backends.
	NatPolicy SVCNatPolicy

	// ExtTrafficPolicy controls how backends are selected for North-South traffic.
	// If set to "Local", only node-local backends are chosen.
	ExtTrafficPolicy SVCTrafficPolicy

	// IntTrafficPolicy controls how backends are selected for East-West traffic.
	// If set to "Local", only node-local backends are chosen.
	IntTrafficPolicy SVCTrafficPolicy

	// ForwardingMode controls whether DSR or SNAT should be used for the dispatch
	// to the backend. If undefined the default mode is used (--bpf-lb-mode).
	ForwardingMode SVCForwardingMode

	// SessionAffinity if true will enable the client IP based session affinity.
	SessionAffinity bool

	// SessionAffinityTimeout is the duration of inactivity before the session
	// affinity is cleared for a specific client IP.
	SessionAffinityTimeout time.Duration

	// LoadBalancerClass if set specifies the load-balancer class to be used
	// for a LoadBalancer service. If unset the default implementation is used.
	LoadBalancerClass *string

	// ProxyRedirect if non-nil redirects the traffic going to the frontends
	// towards a locally running proxy.
	ProxyRedirect *ProxyRedirect

	// HealthCheckNodePort defines on which port the node runs a HTTP health
	// check server which may be used by external loadbalancers to determine
	// if a node has local backends. This will only have effect if both
	// LoadBalancerIPs is not empty and ExtTrafficPolicy is SVCTrafficPolicyLocal.
	HealthCheckNodePort uint16

	// LoopbackHostPort defines that HostPort frontends for this service should
	// only be exposed internally to the node.
	LoopbackHostPort bool

	// SourceRanges if non-empty will restrict access to the service to the specified
	// client addresses.
	SourceRanges []netip.Prefix

	// PortNames maps a port name to a port number.
	PortNames map[string]uint16

	// TrafficDistribution if not default will influence how backends are chosen for
	// frontends associated with this service.
	TrafficDistribution TrafficDistribution
}

Service defines the common properties for a load-balancing service. Associated with a service are a set of frontends that receive the traffic, and a set of backends to which the traffic is directed. A single frontend can map to a partial subset of backends depending on its properties.

func (*Service) Clone added in v1.18.0

func (svc *Service) Clone() *Service

Clone returns a shallow clone of the service, e.g. for updating a service with UpsertService. Fields that are references (e.g. Labels or Annotations) must be further cloned if mutated.

func (*Service) GetAnnotations added in v1.18.0

func (svc *Service) GetAnnotations() map[string]string

func (*Service) GetLBAlgorithmAnnotation added in v1.18.0

func (svc *Service) GetLBAlgorithmAnnotation() SVCLoadBalancingAlgorithm

func (*Service) GetProxyDelegation added in v1.18.0

func (svc *Service) GetProxyDelegation() SVCProxyDelegation

func (*Service) GetSourceRangesPolicy added in v1.18.0

func (svc *Service) GetSourceRangesPolicy() SVCSourceRangesPolicy

func (*Service) TableHeader added in v1.18.0

func (svc *Service) TableHeader() []string

func (*Service) TableRow added in v1.18.0

func (svc *Service) TableRow() []string

type ServiceFlags

type ServiceFlags uint16

ServiceFlags is the datapath representation of the service flags that can be used (lb{4,6}_service.flags)

func NewSvcFlag

func NewSvcFlag(p *SvcFlagParam) ServiceFlags

NewSvcFlag creates service flag

func (ServiceFlags) IsL7LB added in v1.13.14

func (s ServiceFlags) IsL7LB() bool

func (ServiceFlags) SVCExtTrafficPolicy

func (s ServiceFlags) SVCExtTrafficPolicy() SVCTrafficPolicy

SVCExtTrafficPolicy returns a service traffic policy from the flags

func (ServiceFlags) SVCIntTrafficPolicy

func (s ServiceFlags) SVCIntTrafficPolicy() SVCTrafficPolicy

SVCIntTrafficPolicy returns a service traffic policy from the flags

func (ServiceFlags) SVCNatPolicy

func (s ServiceFlags) SVCNatPolicy(fe L3n4Addr) SVCNatPolicy

SVCNatPolicy returns a service NAT policy from the flags

func (ServiceFlags) SVCSlotQuarantined added in v1.17.0

func (s ServiceFlags) SVCSlotQuarantined() bool

SVCSlotQuarantined

func (ServiceFlags) SVCType

func (s ServiceFlags) SVCType() SVCType

SVCType returns a service type from the flags

func (ServiceFlags) String

func (s ServiceFlags) String() string

String returns the string implementation of ServiceFlags.

func (ServiceFlags) UInt16

func (s ServiceFlags) UInt16() uint16

UInt8 returns the UInt16 representation of the ServiceFlags.

type ServiceID

type ServiceID uint16

ServiceID is the service's ID.

type ServiceName

type ServiceName struct {
	// contains filtered or unexported fields
}

ServiceName represents the fully-qualified reference to the service by name, including both the namespace and name of the service (and optionally the cluster). +k8s:deepcopy-gen=true

func NewServiceName added in v1.18.0

func NewServiceName(namespace, name string) ServiceName

func NewServiceNameInCluster added in v1.18.0

func NewServiceNameInCluster(cluster, namespace, name string) ServiceName

func (ServiceName) AppendSuffix added in v1.18.0

func (n ServiceName) AppendSuffix(suffix string) ServiceName

func (ServiceName) Cluster

func (s ServiceName) Cluster() string

func (ServiceName) Compare added in v1.17.0

func (n ServiceName) Compare(other ServiceName) int

func (*ServiceName) DeepCopy

func (in *ServiceName) DeepCopy() *ServiceName

DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ServiceName.

func (*ServiceName) DeepCopyInto

func (in *ServiceName) DeepCopyInto(out *ServiceName)

DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.

func (*ServiceName) Equal added in v1.15.15

func (n *ServiceName) Equal(other ServiceName) bool

func (ServiceName) Key added in v1.18.0

func (n ServiceName) Key() index.Key

func (ServiceName) MarshalJSON added in v1.18.0

func (n ServiceName) MarshalJSON() ([]byte, error)

func (ServiceName) MarshalYAML added in v1.18.0

func (n ServiceName) MarshalYAML() (any, error)

func (ServiceName) Name

func (s ServiceName) Name() string

func (ServiceName) Namespace

func (s ServiceName) Namespace() string

func (ServiceName) String

func (n ServiceName) String() string

func (*ServiceName) UnmarshalJSON added in v1.18.0

func (n *ServiceName) UnmarshalJSON(bs []byte) error

func (*ServiceName) UnmarshalYAML added in v1.18.0

func (n *ServiceName) UnmarshalYAML(value *yaml.Node) error

type SvcFlagParam

type SvcFlagParam struct {
	SvcType          SVCType
	SvcNatPolicy     SVCNatPolicy
	SvcFwdModeDSR    bool
	SvcExtLocal      bool
	SvcIntLocal      bool
	SessionAffinity  bool
	IsRoutable       bool
	CheckSourceRange bool
	SourceRangeDeny  bool
	L7LoadBalancer   bool
	LoopbackHostport bool
	Quarantined      bool
}

+k8s:deepcopy-gen=true

func (*SvcFlagParam) DeepCopy

func (in *SvcFlagParam) DeepCopy() *SvcFlagParam

DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new SvcFlagParam.

func (*SvcFlagParam) DeepCopyInto

func (in *SvcFlagParam) DeepCopyInto(out *SvcFlagParam)

DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.

type TestConfig added in v1.18.0

type TestConfig struct {
	TestFaultProbability float32 `mapstructure:"lb-test-fault-probability"`
}

TestConfig are the configuration options for testing. Only provided by tests and not present in the agent.

func (TestConfig) Flags added in v1.18.0

func (def TestConfig) Flags(flags *pflag.FlagSet)

type TrafficDistribution added in v1.18.0

type TrafficDistribution string

type UserConfig added in v1.18.0

type UserConfig struct {
	RetryBackoffMin time.Duration `mapstructure:"lb-retry-backoff-min"`
	RetryBackoffMax time.Duration `mapstructure:"lb-retry-backoff-max"`

	// LBMapEntries is the maximum number of entries allowed in BPF lbmap.
	LBMapEntries int `mapstructure:"bpf-lb-map-max"`

	// LBServiceMapEntries is the maximum number of entries allowed in BPF lbmap for services.
	LBServiceMapEntries int `mapstructure:"bpf-lb-service-map-max"`

	// LBBackendMapEntries is the maximum number of entries allowed in BPF lbmap for service backends.
	LBBackendMapEntries int `mapstructure:"bpf-lb-service-backend-map-max"`

	// LBRevNatEntries is the maximum number of entries allowed in BPF lbmap for reverse NAT.
	LBRevNatEntries int `mapstructure:"bpf-lb-rev-nat-map-max"`

	// LBAffinityMapEntries is the maximum number of entries allowed in BPF lbmap for session affinities.
	LBAffinityMapEntries int `mapstructure:"bpf-lb-affinity-map-max"`

	// LBSourceRangeAllTypes enables propagation of loadbalancerSourceRanges to all Kubernetes
	// service types which were created from the LoadBalancer service.
	LBSourceRangeAllTypes bool `mapstructure:"bpf-lb-source-range-all-types"`

	// LBSourceRangeMapEntries is the maximum number of entries allowed in BPF lbmap for source ranges.
	LBSourceRangeMapEntries int `mapstructure:"bpf-lb-source-range-map-max"`

	// LBMaglevMapEntries is the maximum number of entries allowed in BPF lbmap for maglev.
	LBMaglevMapEntries int `mapstructure:"bpf-lb-maglev-map-max"`

	// LBSockRevNatEntries is the maximum number of sock rev nat mappings
	// allowed in the BPF rev nat table
	LBSockRevNatEntries int `mapstructure:"bpf-sock-rev-map-max"`

	// NodePortRange is the minimum and maximum ports to use for NodePort
	NodePortRange []string

	// LBMode indicates in which mode NodePort implementation should run
	// ("snat", "dsr" or "hybrid")
	LBMode string `mapstructure:"bpf-lb-mode"`

	// LBModeAnnotation tells whether controller should check service
	// level annotation for configuring bpf load balancing algorithm.
	LBModeAnnotation bool `mapstructure:"bpf-lb-mode-annotation"`

	// LoadBalancerAlgorithm indicates which backend selection algorithm is used
	// ("random" or "maglev")
	LBAlgorithm string `mapstructure:"bpf-lb-algorithm"`

	// DSRDispatch indicates the method for pushing packets to
	// backends under DSR ("opt" or "ipip")
	DSRDispatch string `mapstructure:"bpf-lb-dsr-dispatch"`

	// ExternalClusterIP enables routing to ClusterIP services from outside
	// the cluster. This mirrors the behaviour of kube-proxy.
	ExternalClusterIP bool `mapstructure:"bpf-lb-external-clusterip"`

	// AlgorithmAnnotation tells whether controller should check service
	// level annotation for configuring bpf load balancing algorithm.
	AlgorithmAnnotation bool `mapstructure:"bpf-lb-algorithm-annotation"`

	// EnableHealthCheckNodePort enables health checking of NodePort by
	// cilium
	EnableHealthCheckNodePort bool `mapstructure:"enable-health-check-nodeport"`

	// LBPressureMetricsInterval sets the interval for updating the load-balancer BPF map
	// pressure metrics. A batch lookup is performed for all maps periodically to count
	// the number of elements that are then reported in the `bpf-map-pressure` metric.
	LBPressureMetricsInterval time.Duration `mapstructure:"lb-pressure-metrics-interval"`

	// Enable processing of service topology aware hints
	EnableServiceTopology bool

	// InitWaitTimeout is the amount of time we wait for the load-balancing tables to be initialized before
	// we start reconciling towards the BPF maps. This reduces the probability that load-balancing is scaled
	// down temporarily due to not yet seeing all backends.
	//
	// The delay happens only when existing BPF state existed.
	//
	// We must not wait forever for initialization though due to potential interdependencies between load-balancing
	// data sources. For example we might depend on Kubernetes data to connect to the ClusterMesh api-server and
	// thus may need to first reconcile the Kubernetes services to connect to ClusterMesh (if endpoints have changed
	// while agent was down).
	InitWaitTimeout time.Duration `mapstructure:"lb-init-wait-timeout"`
}

UserConfig is the configuration provided by the user that has not been processed. +deepequal-gen=true

func (*UserConfig) DeepEqual added in v1.18.0

func (in *UserConfig) DeepEqual(other *UserConfig) bool

DeepEqual is an autogenerated deepequal function, deeply comparing the receiver with other. in must be non-nil.

func (UserConfig) Flags added in v1.18.0

func (def UserConfig) Flags(flags *pflag.FlagSet)

type ZoneMapper added in v1.18.0

type ZoneMapper interface {
	GetZoneID(string) uint8
}

Directories

Path Synopsis
cmd command
SPDX-License-Identifier: Apache-2.0 Copyright Authors of Cilium
SPDX-License-Identifier: Apache-2.0 Copyright Authors of Cilium

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL