Documentation ¶
Overview ¶
Package routing is L4 — peer discovery & routing.
The Manager owns:
- relayPeers map[uint32]bool — peers that need relay (symmetric NAT)
- relayPinned map[uint32]bool — peers whose relay flag is authoritative
- blackholeMissCount map[uint32]int — consecutive miss observations
- directClearCount map[uint32]int — consecutive direct receipts
- sendErrCount map[uint32]int — consecutive ICMP-unreachable errors
- lastDirectRecv map[uint32]time.Time — last direct-path receipt
- lastOutboundSend map[uint32]time.Time — last successful send
- firstOutboundSend map[uint32]time.Time — first ever send (blackhole baseline)
- beaconAddr — the picked beacon for relay/punch
Per docs/architecture/01-LAYERS.md L4:
- Role: given a nodeID, return a reachable UDP endpoint (or relay path); coordinate NAT traversal.
- Owns: relayPeers map, relayPinned map, blackholeMissCount map, lastDirectRecv, beacon list, beacon-cache file.
- Consumes: L1, L2; reads identity from L3 (FNV-32 hash on Ed25519 pubkey for stable beacon pick).
- Exposes: Resolve(nodeID) → endpoint, RouteFrame(dst, frame), PunchTo(peer).
Per docs/architecture/03-INVARIANTS.md §3 (lock graph): the Manager's mu is a LEAF lock — never held while taking any TunnelManager mutex. All callers obey this by reading state under mu, releasing, then mutating tm-side state in a separate critical section.
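The read-release-then-mutate pattern described above might look roughly like the following sketch. The routingManager and tunnelManager types and the aliasIfRelay helper are hypothetical stand-ins, not the package's real types; only the locking order is the point.

```go
package main

import (
	"fmt"
	"sync"
)

// routingManager is a hypothetical stand-in for the L4 Manager.
type routingManager struct {
	mu         sync.Mutex // leaf lock: no other mutex is taken while it is held
	relayPeers map[uint32]bool
}

// tunnelManager is a hypothetical stand-in for TunnelManager.
type tunnelManager struct {
	mu    sync.Mutex
	peers map[uint32]string
}

// isRelayPeer reads routing state under the leaf lock and releases it on return.
func (m *routingManager) isRelayPeer(id uint32) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.relayPeers[id]
}

// aliasIfRelay snapshots routing state first, then mutates tunnel-side state
// in a separate critical section, so rm.mu and tm.mu are never held together.
func aliasIfRelay(rm *routingManager, tm *tunnelManager, id uint32, beacon string) bool {
	if !rm.isRelayPeer(id) { // rm.mu is already released when this returns
		return false
	}
	tm.mu.Lock()
	defer tm.mu.Unlock()
	if _, ok := tm.peers[id]; !ok {
		tm.peers[id] = beacon
	}
	return true
}

func main() {
	rm := &routingManager{relayPeers: map[uint32]bool{7: true}}
	tm := &tunnelManager{peers: map[uint32]string{}}
	fmt.Println(aliasIfRelay(rm, tm, 7, "203.0.113.7:9000"), tm.peers[7])
}
```

The snapshot can be stale by the time the second critical section runs; the invariant trades that small window for a lock graph that cannot deadlock.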
Index ¶
- Constants
- Variables
- func BeaconCachePath(identityPath string) string
- func DiscoverEndpoint(beaconAddr string, nodeID uint32, conn *net.UDPConn, readDeadline time.Time) (*net.UDPAddr, error)
- func FetchBeaconList(client BeaconLister) ([]string, error)
- func FilterUnreachable(addrs []string) []string
- func FirstBeacon(s string) string
- func InitialJitter() time.Duration
- func IsUnreachableBeaconHost(host string) bool
- func LoadBeaconCache(identityPath string) ([]string, error)
- func MergeBeaconLists(bootstrap, discovered []string) []string
- func ParseBeaconList(s string) []string
- func PickBeacon(list []string, key []byte) string
- func PickBeaconWithRTT(list []string, key []byte, rttMap map[string]time.Duration) string
- func ProbeBeaconRTT(beaconAddr string, nodeID uint32, timeout time.Duration) (time.Duration, error)
- func SaveBeaconCache(identityPath string, addrs []string) error
- type BeaconCacheEntry
- type BeaconLister
- type BeaconSelectionState
- type CounterTarget
- type LocalNodeIDFn
- type Manager
- func (m *Manager) AdmitRelayFromBeacon(peerNodeID uint32) bool
- func (m *Manager) BeaconAddr() *net.UDPAddr
- func (m *Manager) BlackholeMissCount(nodeID uint32) int
- func (m *Manager) ClearRelayOnDirect(peerNodeID uint32, from *net.UDPAddr) bool
- func (m *Manager) ClearSendErrCount(nodeID uint32)
- func (m *Manager) HandlePunchCommand(data []byte)
- func (m *Manager) HandleSendError(nodeID uint32, err error) (flipped bool, count int)
- func (m *Manager) HasDirectRecv(nodeID uint32) bool
- func (m *Manager) IsFromBeacon(from *net.UDPAddr) bool
- func (m *Manager) IsRelayPeer(nodeID uint32) bool
- func (m *Manager) IsRelayPinned(nodeID uint32) bool
- func (m *Manager) LastDirectRecv(nodeID uint32) time.Time
- func (m *Manager) LastOutboundSend(nodeID uint32) (time.Time, bool)
- func (m *Manager) MarkRelayActivatedIfHadCrypto(srcNodeID uint32, hadCrypto bool) (admitted, newlyActivated, shouldAliasPeer bool)
- func (m *Manager) MaybeFlipBlackhole(nodeID uint32) (shouldRelay, flipped bool, silentFor time.Duration, misses int)
- func (m *Manager) RecordDirectRecv(nodeID uint32, t time.Time)
- func (m *Manager) RecordOutboundSend(nodeID uint32, t time.Time)
- func (m *Manager) RegisterWithBeacon() error
- func (m *Manager) RelayPeerCount() int
- func (m *Manager) RelayPeerIDs() []uint32
- func (m *Manager) RemovePeer(nodeID uint32)
- func (m *Manager) RequestHolePunch(targetNodeID uint32) error
- func (m *Manager) SendErrCount(nodeID uint32) int
- func (m *Manager) SetBeaconAddr(addr string) error
- func (m *Manager) SetBeaconAddrUDP(a *net.UDPAddr)
- func (m *Manager) SetLocalNodeIDFn(f LocalNodeIDFn)
- func (m *Manager) SetRelayPeer(nodeID uint32, relay bool)
- func (m *Manager) SetRelayPeerPinned(nodeID uint32, relay bool)
- func (m *Manager) SetSocket(s SocketSender)
- func (m *Manager) WriteFrame(nodeID uint32, addr *net.UDPAddr, frame []byte, counters CounterTarget) (SendOutcome, error)
- type RefreshDecision
- type SendOutcome
- type SocketSender
Constants ¶
const BeaconCacheFilename = "beacons.json"
BeaconCacheFilename is the on-disk fallback used when the registry is unreachable at cold-start. Lives next to the identity file.
const BeaconRefreshInterval = 60 * time.Second
BeaconRefreshInterval is how often the daemon re-fetches beacon_list from the registry. At 100k daemons × 1 call/min = ~1.7k req/sec on the registry — bounded, small responses, well within capacity.
const BeaconRefreshJitter = 10 * time.Second
BeaconRefreshJitter spreads the initial tick across a window so a fleet-wide simultaneous restart doesn't thunder-herd the registry. The first refresh fires at t = rand[0..BeaconRefreshJitter).
const BlackholeMissesRequired = 3
BlackholeMissesRequired is the number of consecutive WriteFrame observations of "direct path silent for >threshold" needed before the tunnel flips to relay.
const DirectBlackholeThreshold = 8 * time.Second
DirectBlackholeThreshold is how long a peer can go without a direct recv before WriteFrame auto-flips to relay. Tuned for force_relay_* integration tests which sleep 10s after partitioning UDP between peers.
const DirectClearsRequired = 3
DirectClearsRequired is the symmetric hysteresis on the relay→direct auto-clear path.
const MaxRelayPeers = 4096
MaxRelayPeers caps the relayPeers map. The beacon relays packets to us with a caller-supplied srcNodeID, so without a bound an attacker can grow relayPeers indefinitely. Real networks never approach this cap.
const SendErrThreshold = 3
SendErrThreshold is the number of consecutive ICMP-unreachable errors from a single peer required before HandleSendError flips them to relay.
Variables ¶
var ErrNoAddress = errors.New("routing: no address for peer")
ErrNoAddress is returned by WriteFrame when no UDP address is available for a peer (neither a stored direct addr nor a beacon for relay).
Functions ¶
func BeaconCachePath ¶
BeaconCachePath returns the path to the on-disk beacon cache derived from the identity path. Returns "" if identityPath is empty (in-memory daemons skip caching).
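As an illustration, a path derivation with this behavior could be sketched as follows. It assumes "next to the identity file" means the same directory; beaconCachePath is a hypothetical re-implementation, not the package's code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

const beaconCacheFilename = "beacons.json"

// beaconCachePath places the cache file in the identity file's directory.
// An empty identityPath (in-memory daemon) yields "" so callers skip caching.
func beaconCachePath(identityPath string) string {
	if identityPath == "" {
		return ""
	}
	return filepath.Join(filepath.Dir(identityPath), beaconCacheFilename)
}

func main() {
	fmt.Println(beaconCachePath("identity.json"))
	fmt.Printf("%q\n", beaconCachePath(""))
}
```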
func DiscoverEndpoint ¶
func DiscoverEndpoint(beaconAddr string, nodeID uint32, conn *net.UDPConn, readDeadline time.Time) (*net.UDPAddr, error)
DiscoverEndpoint sends a STUN-style discover to the beacon over the supplied UDP socket and returns the public endpoint the beacon observed for us. Synchronous: blocks until reply or timeout.
This is the cold-start STUN path; it runs BEFORE the long-lived readLoop is wired up, so the same conn that will later host tunnel traffic is reused without contention. Callers must close the supplied conn (or hand it off to udpio.WrapConn) once registration is complete.
Format on the wire matches the existing pre-extraction shape:
tx: [BeaconMsgDiscover(1)][nodeID(4)]
rx: [BeaconMsgDiscoverReply(1)][iplen(1)][IP(4 or 16)][port(2)]
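The layout above can be encoded and decoded along these lines. This is a sketch under assumptions: the message-type byte values are placeholders, and big-endian (network) byte order for nodeID and port is assumed because the documentation does not state it.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// Hypothetical message-type values; the real constants are defined elsewhere.
const (
	msgDiscover      byte = 0x01
	msgDiscoverReply byte = 0x02
)

// encodeDiscover builds [type(1)][nodeID(4)], assuming big-endian nodeID.
func encodeDiscover(nodeID uint32) []byte {
	buf := make([]byte, 5)
	buf[0] = msgDiscover
	binary.BigEndian.PutUint32(buf[1:], nodeID)
	return buf
}

// parseDiscoverReply decodes [type(1)][iplen(1)][IP(4 or 16)][port(2)].
func parseDiscoverReply(b []byte) (*net.UDPAddr, error) {
	if len(b) < 2 || b[0] != msgDiscoverReply {
		return nil, errors.New("not a discover reply")
	}
	ipLen := int(b[1])
	if ipLen != net.IPv4len && ipLen != net.IPv6len {
		return nil, errors.New("bad IP length")
	}
	if len(b) < 2+ipLen+2 {
		return nil, errors.New("truncated reply")
	}
	ip := make(net.IP, ipLen)
	copy(ip, b[2:2+ipLen]) // copy so the result does not alias the read buffer
	port := binary.BigEndian.Uint16(b[2+ipLen:])
	return &net.UDPAddr{IP: ip, Port: int(port)}, nil
}

func main() {
	fmt.Printf("% x\n", encodeDiscover(42))
	addr, err := parseDiscoverReply([]byte{msgDiscoverReply, 4, 203, 0, 113, 7, 0x1f, 0x90})
	fmt.Println(addr, err)
}
```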
func FetchBeaconList ¶
func FetchBeaconList(client BeaconLister) ([]string, error)
FetchBeaconList queries the registry's beacon_list endpoint and returns just the addresses, in the registry's response order. Empty addrs are dropped. Returns an error if the call fails or the response shape is wrong; the caller treats that as "keep the current list".
func FilterUnreachable ¶
FilterUnreachable removes addresses whose host part is an RFC1918 / loopback / link-local literal. Used on the DISCOVERED list before merging with bootstrap — the operator's bootstrap list is preserved verbatim (intra-VPC deployments can still pin a private beacon there).
func FirstBeacon ¶
FirstBeacon returns the first entry of a parsed beacon list, used for pre-identity STUN discovery where any beacon will do. Returns "" if the list is empty.
func InitialJitter ¶
InitialJitter returns a duration in [0, BeaconRefreshJitter) for avoiding thundering-herd on the registry at fleet restart.
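A jitter helper with this contract could be sketched as below. The use of math/rand is an assumption; the documentation only specifies the half-open range.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const refreshJitter = 10 * time.Second

// initialJitter returns a uniformly distributed duration in [0, refreshJitter).
// The global math/rand source is randomly seeded automatically since Go 1.20;
// older toolchains need an explicit seed to avoid identical delays fleet-wide.
func initialJitter() time.Duration {
	return time.Duration(rand.Int63n(int64(refreshJitter)))
}

func main() {
	delay := initialJitter()
	fmt.Println("first refresh in", delay)
	// A refresh loop would typically wait for delay once, then tick at the
	// regular refresh interval.
}
```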
func IsUnreachableBeaconHost ¶
IsUnreachableBeaconHost reports whether host is an RFC1918 / loopback / link-local / unspecified address — an address that a public-internet daemon cannot reach. The registry's beacon_list endpoint can return internal addresses for beacons running on the same VPC (e.g. GCP 10.128.0.0/16); those are useless to off-VPC daemons and, if picked, silently black-hole all relay traffic. Only IP literals are checked — DNS hostnames are kept (they may resolve to public addresses).
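One way to express this check with the standard library is sketched below, together with a filter over host:port strings. The function names are hypothetical re-implementations. Note that net.IP.IsPrivate (Go 1.17+) also matches IPv6 unique local addresses (fc00::/7), which may be broader than the package's actual rule.

```go
package main

import (
	"fmt"
	"net"
)

// isUnreachableHost reports whether host is a private, loopback, link-local,
// or unspecified IP literal. DNS names are kept because they may resolve to
// public addresses.
func isUnreachableHost(host string) bool {
	ip := net.ParseIP(host)
	if ip == nil {
		return false // not an IP literal
	}
	return ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsUnspecified()
}

// filterUnreachable drops host:port entries whose host part is unreachable.
func filterUnreachable(addrs []string) []string {
	var out []string
	for _, a := range addrs {
		host, _, err := net.SplitHostPort(a)
		if err != nil {
			host = a // no port present: treat the whole entry as the host
		}
		if !isUnreachableHost(host) {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	fmt.Println(filterUnreachable([]string{
		"10.128.0.5:9000", "203.0.113.7:9000", "beacon.example.com:9000",
	}))
}
```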
func LoadBeaconCache ¶
LoadBeaconCache reads the on-disk cache. Returns (nil, nil) if no cache exists or identityPath is empty (not an error). Returns an error only on parse failures.
func MergeBeaconLists ¶
MergeBeaconLists combines the operator-configured bootstrap list with addresses discovered from the registry's beacon_list endpoint. The bootstrap entries always come first in the result and are deduplicated — discovered entries that match a bootstrap entry are dropped, not duplicated. The returned slice is stable-ordered: bootstrap entries in their input order, then unique discovered entries in their input order. This determinism matters because PickBeacon hashes-modulo-N and any reordering would cause sticky picks to flip when set membership is unchanged.
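The ordering contract can be illustrated with a short sketch. mergeBeaconLists here is a hypothetical re-implementation that follows the description above, not the package's code.

```go
package main

import "fmt"

// mergeBeaconLists returns bootstrap entries in input order followed by the
// discovered entries not already seen, also in input order.
func mergeBeaconLists(bootstrap, discovered []string) []string {
	seen := make(map[string]bool, len(bootstrap)+len(discovered))
	out := make([]string, 0, len(bootstrap)+len(discovered))
	for _, list := range [][]string{bootstrap, discovered} {
		for _, addr := range list {
			if seen[addr] {
				continue // duplicate: keep the earlier position
			}
			seen[addr] = true
			out = append(out, addr)
		}
	}
	return out
}

func main() {
	fmt.Println(mergeBeaconLists(
		[]string{"a.example:9000", "b.example:9000"},
		[]string{"b.example:9000", "c.example:9000", "a.example:9000"},
	))
}
```

Because the output depends only on input order, two calls with the same inputs yield the same index for every entry, which keeps a hash-mod-N pick stable.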
func ParseBeaconList ¶
ParseBeaconList splits the -beacon flag value on commas and returns the list of trimmed, non-empty entries. An empty input returns nil.
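A parser with this behavior might be sketched as follows; parseBeaconList is a hypothetical re-implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBeaconList splits on commas, trims whitespace, and drops empty
// entries. Empty input yields nil because nothing is ever appended.
func parseBeaconList(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", parseBeaconList(" a.example:9000, ,b.example:9000,"))
}
```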
func PickBeacon ¶
PickBeacon returns one beacon address chosen deterministically from the list using FNV-32 of key. Stable across restarts when key is stable (e.g. the daemon's Ed25519 public key bytes). Returns "" if the list is empty.
With len(list)==1, always returns list[0] regardless of key — guarantees the back-compat path for single-beacon configs.
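A hash-mod-N pick of this kind could look like the sketch below. It assumes the FNV-1a variant from hash/fnv; the documentation says only "FNV-32", so the real variant may differ.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickBeacon hashes key with 32-bit FNV-1a (an assumption) and indexes the
// list with the result modulo its length. With one entry the index is always
// 0, so single-beacon configs behave the same for every key.
func pickBeacon(list []string, key []byte) string {
	if len(list) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write(key) // hash.Hash writes never return an error
	return list[h.Sum32()%uint32(len(list))]
}

func main() {
	list := []string{"a.example:9000", "b.example:9000", "c.example:9000"}
	fmt.Println(pickBeacon(list, []byte("example public key bytes")))
}
```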
func PickBeaconWithRTT ¶
PickBeaconWithRTT selects a beacon using measured RTT data as a tiebreaker. The hash-based pick (PickBeacon) is preferred to maintain load distribution across the fleet, but is overridden when it is more than 2× slower than the fastest measured beacon — a strong signal that geography or routing has placed this node far from its hash pick.
rttMap maps beacon address → measured probe RTT. Beacons absent from rttMap (probe timed out or failed) are treated as unreachable and excluded from RTT comparison. When rttMap is empty, falls back to the pure hash pick. When the hash pick itself failed to probe but others succeeded, the fastest responding beacon is used.
This function is gated behind the BeaconRTTProbe feature flag in pkg/daemon — call PickBeacon directly when the flag is off.
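The selection rules above can be sketched as follows. This is an illustrative re-implementation under assumptions: the hash is FNV-1a, ties for fastest go to the earlier list entry, and rttTable is a hypothetical named type for the RTT map.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

// rttTable maps beacon address to measured probe RTT.
type rttTable map[string]time.Duration

// pickBeacon is the plain hash-mod-N pick (FNV-1a assumed).
func pickBeacon(list []string, key []byte) string {
	if len(list) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write(key)
	return list[h.Sum32()%uint32(len(list))]
}

// pickBeaconWithRTT keeps the hash pick unless it failed to probe or is more
// than 2x slower than the fastest measured beacon.
func pickBeaconWithRTT(list []string, key []byte, rtts rttTable) string {
	hashPick := pickBeacon(list, key)
	fastest := ""
	for _, addr := range list { // list order makes tie-breaking deterministic
		rtt, ok := rtts[addr]
		if !ok {
			continue // probe failed or timed out: excluded from comparison
		}
		if fastest == "" || rtt < rtts[fastest] {
			fastest = addr
		}
	}
	if fastest == "" {
		return hashPick // no RTT data: pure hash pick
	}
	hashRTT, ok := rtts[hashPick]
	if !ok || hashRTT > 2*rtts[fastest] {
		return fastest
	}
	return hashPick
}

func main() {
	list := []string{"a.example:9000", "b.example:9000"}
	rtts := rttTable{"a.example:9000": 20 * time.Millisecond, "b.example:9000": 90 * time.Millisecond}
	fmt.Println(pickBeaconWithRTT(list, []byte("example key"), rtts))
}
```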
func ProbeBeaconRTT ¶
ProbeBeaconRTT measures the UDP round-trip time to a beacon by opening a temporary ephemeral socket, sending a BeaconMsgDiscover, and timing the reply. The probe socket is separate from the daemon's tunnel socket so it never interferes with live traffic.
nodeID is included in the discover payload as the protocol requires, but the beacon's reply depends only on the sender's UDP source address — any valid uint32 (including 0) works for a pure RTT measurement.
Returns (0, err) when the beacon is unreachable within timeout. Used by the beacon-rtt-probe feature flag to rank beacons before selection.
func SaveBeaconCache ¶
SaveBeaconCache writes the current addr list to disk as a fallback for next cold-start. Best-effort: errors are returned but the caller typically logs and continues.
Types ¶
type BeaconCacheEntry ¶
BeaconCacheEntry is the on-disk format. We keep "saved_at" so a stale cache (older than e.g. an hour) can be sniffed out by an operator.
type BeaconLister ¶
BeaconLister abstracts the registry call. Production wires this to (*registry.Client).Send; tests inject a fake.
type BeaconSelectionState ¶
type BeaconSelectionState struct {
// contains filtered or unexported fields
}
BeaconSelectionState tracks the daemon's beacon picks across refresh ticks. Pure data — the refresh logic mutates it under its mutex, and the daemon hot path reads via GetCurrentPick().
func NewBeaconSelectionState ¶
func NewBeaconSelectionState(bootstrap []string) *BeaconSelectionState
NewBeaconSelectionState returns a state seeded with the operator's bootstrap list (copied).
func (*BeaconSelectionState) ApplyRefreshDecision ¶
func (s *BeaconSelectionState) ApplyRefreshDecision(d RefreshDecision)
ApplyRefreshDecision commits a refresh outcome to the state struct. Called by the production refresh loop after a successful SetBeaconAddr.
func (*BeaconSelectionState) GetCurrentPick ¶
func (s *BeaconSelectionState) GetCurrentPick() string
GetCurrentPick returns the currently-selected beacon address (or "").
type CounterTarget ¶
CounterTarget is supplied by TunnelManager so WriteFrame can bump the public PktsSent / BytesSent atomics on every successful transmit.
type LocalNodeIDFn ¶
type LocalNodeIDFn func() uint32
LocalNodeIDFn returns our own node ID. Plumbed in from the daemon (atomic load over tm.nodeID).
type Manager ¶
type Manager struct {
// contains filtered or unexported fields
}
Manager owns the L4 routing state.
func New ¶
func New() *Manager
New returns a fresh Manager with empty state. The Socket may be nil initially and set later via SetSocket once the L2 listener is bound.
func (*Manager) AdmitRelayFromBeacon ¶
AdmitRelayFromBeacon is called when an inbound key-exchange or "no key" rekey arrives from the beacon's listen port. In a single critical section it checks the relay-peer cap and updates the peer's relay state. It:
- returns true if the peer was admitted (relay flag set + pinned),
- returns false if the relay-peer cap is full and the peer was a fresh entry (no prior relay flag).
Caller (TunnelManager) handles the surrounding peers-map mutation.
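A capped admission check of this shape might be sketched as below. relayTable and its fields are hypothetical, and the sketch assumes entries are deleted (not set to false) when a relay flag is cleared, so len(relay) reflects the live count.

```go
package main

import (
	"fmt"
	"sync"
)

// relayTable is a hypothetical stand-in for the Manager's relay maps.
type relayTable struct {
	mu       sync.Mutex
	relay    map[uint32]bool
	pinned   map[uint32]bool
	maxPeers int
}

func newRelayTable(maxPeers int) *relayTable {
	return &relayTable{
		relay:    make(map[uint32]bool),
		pinned:   make(map[uint32]bool),
		maxPeers: maxPeers,
	}
}

// admit sets and pins the relay flag under one lock acquisition. A peer that
// is not yet a relay peer is refused once the cap is reached; existing relay
// peers are always re-admitted.
func (t *relayTable) admit(id uint32) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if !t.relay[id] && len(t.relay) >= t.maxPeers {
		return false
	}
	t.relay[id] = true
	t.pinned[id] = true
	return true
}

func main() {
	tbl := newRelayTable(2)
	fmt.Println(tbl.admit(1), tbl.admit(2), tbl.admit(3), tbl.admit(1))
}
```

Doing the cap check and the write under the same lock is what keeps concurrent admissions from overshooting the bound.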
func (*Manager) BeaconAddr ¶
BeaconAddr returns the currently-set beacon UDP endpoint (or nil).
func (*Manager) BlackholeMissCount ¶
BlackholeMissCount returns the current consecutive-miss count.
func (*Manager) ClearRelayOnDirect ¶
ClearRelayOnDirect is called from the L7 handleEncrypted path after a successful decrypt. Resets blackholeMissCount unconditionally. For unpinned relay peers, increments directClearCount and clears the relay flag once DirectClearsRequired consecutive direct packets have arrived. Pinned peers (registry relay_only=true, beacon-admitted symmetric NAT) are never auto-cleared — only an explicit SetRelayPeerPinned(id, false) can clear them.
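The clear-side hysteresis can be sketched with a small per-peer state struct. peerRelayState and clearOnDirect are hypothetical names; the real Manager keeps this state in separate maps keyed by node ID.

```go
package main

import "fmt"

const directClearsRequired = 3

// peerRelayState is a hypothetical per-peer view of the relay bookkeeping.
type peerRelayState struct {
	relay        bool
	pinned       bool
	misses       int // consecutive blackhole misses
	directClears int // consecutive direct receipts while in relay mode
}

// clearOnDirect is called after a successful direct decrypt. It always resets
// the miss counter, never clears a pinned peer, and clears an unpinned relay
// flag only after directClearsRequired consecutive direct packets.
func (p *peerRelayState) clearOnDirect() (cleared bool) {
	p.misses = 0
	if !p.relay || p.pinned {
		return false
	}
	p.directClears++
	if p.directClears < directClearsRequired {
		return false
	}
	p.relay = false
	p.directClears = 0
	return true
}

func main() {
	p := &peerRelayState{relay: true, misses: 2}
	for i := 1; i <= 3; i++ {
		fmt.Println(i, p.clearOnDirect(), p.relay)
	}
}
```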
func (*Manager) ClearSendErrCount ¶
ClearSendErrCount drops accumulated ICMP-unreachable errors for a peer. Called after a successful inbound decrypt — proof the peer is alive.
func (*Manager) HandlePunchCommand ¶
HandlePunchCommand processes a beacon punch command, sending punch packets to the specified target to create a NAT mapping.
Format: [iplen(1)][IP(4 or 16)][port(2)]
func (*Manager) HandleSendError ¶
HandleSendError records an ICMP-unreachable error and may flip the peer to relay mode if the threshold is reached. flipped is true if the peer was newly flipped to relay (caller logs/publishes); count is the peer's consecutive ICMP-unreachable error count.
Non-ICMP errors (generic write failure, EAGAIN, etc.) are ignored.
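A threshold counter of this kind might be sketched as follows. How the package classifies an error as ICMP-unreachable is not documented here; the sketch assumes it maps to ECONNREFUSED, EHOSTUNREACH, or ENETUNREACH as surfaced by the socket layer, and sendErrTracker is a hypothetical type.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

const sendErrThreshold = 3

// isICMPUnreachable is an assumed classification: these errnos are how the
// kernel typically reports ICMP unreachable messages on a UDP socket.
func isICMPUnreachable(err error) bool {
	return errors.Is(err, syscall.ECONNREFUSED) ||
		errors.Is(err, syscall.EHOSTUNREACH) ||
		errors.Is(err, syscall.ENETUNREACH)
}

// sendErrTracker is a hypothetical stand-in for the Manager's per-peer maps.
type sendErrTracker struct {
	counts map[uint32]int
	relay  map[uint32]bool
}

// handle counts unreachable errors and flips the peer to relay at the
// threshold. Other errors leave the count untouched.
func (t *sendErrTracker) handle(id uint32, err error) (flipped bool, count int) {
	if !isICMPUnreachable(err) {
		return false, t.counts[id]
	}
	t.counts[id]++
	count = t.counts[id]
	if count >= sendErrThreshold && !t.relay[id] {
		t.relay[id] = true
		return true, count
	}
	return false, count
}

// Sample errors shaped like what a UDP write can return.
var (
	errRefused error = &net.OpError{Op: "write", Net: "udp",
		Err: os.NewSyscallError("sendto", syscall.ECONNREFUSED)}
	errGeneric = errors.New("short write")
)

func main() {
	t := &sendErrTracker{counts: map[uint32]int{}, relay: map[uint32]bool{}}
	for i := 0; i < 3; i++ {
		fmt.Println(t.handle(7, errRefused))
	}
	fmt.Println(t.handle(7, errGeneric))
}
```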
func (*Manager) HasDirectRecv ¶
HasDirectRecv reports whether we've ever recorded a direct receipt for the peer. Used by tests asserting RemovePeer cleanup.
func (*Manager) IsFromBeacon ¶
IsFromBeacon reports whether `from` matches the configured beacon's IP and port. Cheap helper used by the caller's address-learning paths to avoid pinning a peer to the beacon's listen port.
func (*Manager) IsRelayPeer ¶
IsRelayPeer reports whether the peer is currently in relay mode.
func (*Manager) IsRelayPinned ¶
IsRelayPinned reports whether the peer's relay flag is pinned.
func (*Manager) LastDirectRecv ¶
LastDirectRecv returns the recorded timestamp (zero value if none).
func (*Manager) LastOutboundSend ¶
LastOutboundSend returns the last recorded outbound timestamp.
func (*Manager) MarkRelayActivatedIfHadCrypto ¶
func (m *Manager) MarkRelayActivatedIfHadCrypto(srcNodeID uint32, hadCrypto bool) (admitted, newlyActivated, shouldAliasPeer bool)
MarkRelayActivatedIfHadCrypto is called from HandleRelayDeliver. It returns:
- admitted: false if hadCrypto && relay-cap reached for fresh entry — caller should drop the relay packet.
- newlyActivated: true if hadCrypto && this is the first time the peer was added to relayPeers — caller should publish the "tunnel.relay_activated" event.
- shouldAliasPeer: true if hadCrypto && peer was promoted to relayPeers and caller should alias peers[srcNodeID] = beaconAddr when no entry exists.
func (*Manager) MaybeFlipBlackhole ¶
func (m *Manager) MaybeFlipBlackhole(nodeID uint32) (shouldRelay, flipped bool, silentFor time.Duration, misses int)
MaybeFlipBlackhole evaluates the blackhole heuristic. Caller invokes before sending a frame. Returns shouldRelay=true if the peer is now in relay mode (either because it already was, or because the heuristic just tripped). flipped=true means this call did the flip.
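The heuristic can be sketched as a small state machine. This is an illustration under assumptions: the baseline is lastDirectRecv when set and otherwise firstOutboundSend, and a non-silent observation resets the miss counter. directPath and its fields are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

const (
	blackholeThreshold = 8 * time.Second
	missesRequired     = 3
)

// directPath is a hypothetical per-peer view of the blackhole bookkeeping.
type directPath struct {
	lastDirectRecv time.Time
	firstSend      time.Time
	misses         int
	relay          bool
}

// maybeFlip is evaluated before each send. It flips to relay only after
// missesRequired consecutive observations of a direct path that has been
// silent for longer than blackholeThreshold.
func (p *directPath) maybeFlip(now time.Time) (shouldRelay, flipped bool) {
	if p.relay {
		return true, false
	}
	baseline := p.lastDirectRecv
	if baseline.IsZero() {
		baseline = p.firstSend // never heard from: measure from first send
	}
	if baseline.IsZero() || now.Sub(baseline) <= blackholeThreshold {
		p.misses = 0 // assumed: misses must be consecutive
		return false, false
	}
	p.misses++
	if p.misses < missesRequired {
		return false, false
	}
	p.relay = true
	return true, true
}

// demoStart is an arbitrary fixed instant used by the example.
var demoStart = time.Unix(1_700_000_000, 0)

func main() {
	p := &directPath{firstSend: demoStart}
	late := demoStart.Add(blackholeThreshold + time.Second)
	for i := 1; i <= 4; i++ {
		relay, flipped := p.maybeFlip(late)
		fmt.Println(i, relay, flipped)
	}
}
```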
func (*Manager) RecordDirectRecv ¶
RecordDirectRecv stamps the lastDirectRecv timestamp for a peer. Called from the L7 handleEncrypted path after a successful direct (non-beacon) decrypt. Also clears firstOutboundSend since the direct path is confirmed.
func (*Manager) RecordOutboundSend ¶
RecordOutboundSend stamps the lastOutboundSend timestamp for a peer. Also sets firstOutboundSend on the very first write (never updated after that), which gives MaybeFlipBlackhole a baseline for peers that have never been heard from on the direct path. Used by NAT-keepalive logic and blackhole detection.
func (*Manager) RegisterWithBeacon ¶
RegisterWithBeacon sends a MsgDiscover to the beacon from the tunnel socket using our nodeID, so the beacon knows our endpoint for punch coordination. Returns nil if no beacon is configured.
func (*Manager) RelayPeerCount ¶
RelayPeerCount returns the number of peers currently in relay mode. Cheaper than RelayPeerIDs when only the size matters.
func (*Manager) RelayPeerIDs ¶
RelayPeerIDs returns the node IDs of all relay-flagged peers.
func (*Manager) RemovePeer ¶
RemovePeer wipes per-peer L4 state.
func (*Manager) RequestHolePunch ¶
RequestHolePunch asks the beacon to coordinate NAT hole-punching with a target peer. Returns nil if no beacon is configured.
func (*Manager) SendErrCount ¶
SendErrCount returns the current consecutive-error count.
func (*Manager) SetBeaconAddr ¶
SetBeaconAddr resolves and stores the beacon address.
func (*Manager) SetBeaconAddrUDP ¶
SetBeaconAddrUDP stores an already-resolved beacon address. Used by tests that construct a fake beacon on a test loopback port.
func (*Manager) SetLocalNodeIDFn ¶
func (m *Manager) SetLocalNodeIDFn(f LocalNodeIDFn)
SetLocalNodeIDFn supplies the closure used to read our own node ID.
func (*Manager) SetRelayPeer ¶
SetRelayPeer marks a peer as needing relay through the beacon. Pinning is left clear — the relay flag may be auto-cleared by direct-path observations. Use SetRelayPeerPinned for authoritative-signal cases.
func (*Manager) SetRelayPeerPinned ¶
SetRelayPeerPinned marks the peer as relay-bound and pins the flag — ClearRelayOnDirect will never auto-flip a pinned peer back to direct based on observed packet sources. Used by ensureTunnel when the registry's resolve response carries relay_only=true, or by writeFrame when an empirically-confirmed signal triggers the flip.
func (*Manager) SetSocket ¶
func (m *Manager) SetSocket(s SocketSender)
SetSocket wires the L2 socket. Called by TunnelManager.Listen once the UDP listener is bound.
func (*Manager) WriteFrame ¶
func (m *Manager) WriteFrame(nodeID uint32, addr *net.UDPAddr, frame []byte, counters CounterTarget) (SendOutcome, error)
WriteFrame ships a raw UDP frame to a peer. Routes through the beacon (relay-wrapped MsgRelay) if the peer is in relay mode; otherwise sends direct to addr.
Side effects:
- May flip the peer to relay mode if the blackhole heuristic trips.
- Updates lastOutboundSend on success.
- Bumps the caller's PktsSent / BytesSent atomics via counters.
- Records ICMP-unreachable errors and may flip to relay on threshold.
Returns the underlying UDP error (or ErrNoAddress if neither a relay path nor a direct addr is available). The caller logs flips and records auxiliary errors.
type RefreshDecision ¶
type RefreshDecision struct {
NewList []string // merged bootstrap + discovered, deduped
NewPick string // hash-of-pubkey selection from NewList
ShouldSwap bool // true if NewPick != previous currentPick
}
RefreshDecision describes the outcome of a discovery tick: should the daemon swap its beacon and what's the new picked address?
func ComputeRefreshDecision ¶
func ComputeRefreshDecision(state *BeaconSelectionState, discovered []string, identityKey []byte) RefreshDecision
ComputeRefreshDecision runs the merge-pick-compare logic on a snapshot of state. Pure function; tests drive it directly without a registry.
If discovered is nil or empty, the function still merges with bootstrap: when discovery fails or returns nothing, it falls back to the bootstrap-only list. NewList is empty only if BOTH inputs are empty.
ShouldSwap is true when:
- currentPick is empty (initial pick at startup), OR
- currentPick is no longer present in NewList (failover: the picked beacon was scaled down / removed from the registry), OR
- hash-pick over NewList disagrees with currentPick (rare; happens when the bootstrap list changes via config reload, not in steady state since pubkey + list both stay constant).
NOTE on stickiness vs failover: a hash-of-pubkey pick is stable as long as the list is stable. Whenever the list length N changes, hash%N changes for most daemons, not only for those that pointed at the affected entry: removing a beacon shifts every entry past its index and changes the modulus, and adding one at a higher index changes the modulus even though existing indices stay put. For a uniformly distributed hash, only about 1/(N+1) of daemons keep the same index when N grows to N+1. This is the standard mod-N failover behavior; consistent hashing would minimize migration but mod-N is fine at our scale.
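The merge-pick-compare flow can be sketched as below. This simplified version takes the bootstrap list and current pick as plain arguments instead of a BeaconSelectionState, assumes FNV-1a for the hash, and derives ShouldSwap from a single comparison of the new pick against the current one; all function names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// refreshDecision mirrors the fields of RefreshDecision.
type refreshDecision struct {
	NewList    []string
	NewPick    string
	ShouldSwap bool
}

// merge keeps bootstrap entries first and drops duplicates, in input order.
func merge(bootstrap, discovered []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, list := range [][]string{bootstrap, discovered} {
		for _, a := range list {
			if !seen[a] {
				seen[a] = true
				out = append(out, a)
			}
		}
	}
	return out
}

// pick is a hash-mod-N selection (FNV-1a assumed).
func pick(list []string, key []byte) string {
	if len(list) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write(key)
	return list[h.Sum32()%uint32(len(list))]
}

// computeRefresh has no side effects: it only reports what should change.
func computeRefresh(bootstrap, discovered []string, currentPick string, key []byte) refreshDecision {
	list := merge(bootstrap, discovered)
	p := pick(list, key)
	return refreshDecision{NewList: list, NewPick: p, ShouldSwap: p != currentPick}
}

func main() {
	boot := []string{"a.example:9000", "b.example:9000"}
	d := computeRefresh(boot, []string{"c.example:9000"}, "", []byte("example key"))
	fmt.Println(d.NewList, d.NewPick, d.ShouldSwap)
}
```

Keeping the decision free of side effects is what lets tests drive it without a registry, as the description above notes.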
type SendOutcome ¶
type SendOutcome struct {
BytesSent int // bytes written on the wire
WasRelay bool // true if frame was wrapped + sent to beacon
}
SendOutcome describes the outcome of a WriteFrame call. The caller (TunnelManager) uses this to maintain its own atomics (PktsSent, BytesSent) without the Manager owning those counters.