Documentation ¶
Overview ¶
Package keyexchange is L5 — it establishes the per-peer AEAD session key (with optional Ed25519 authentication of peer identity).
The Manager owns:
- pendingRekey map[uint32]*PendingRekeyState — tracks key exchanges we sent and are awaiting a reply on. Drives the retransmit loop.
- peerPubKeys map[uint32]ed25519.PublicKey — cached peer Ed25519 pubkeys (filled from the verifyFunc callback).
- lastInboundDecrypt map[uint32]time.Time — used by the handleAuthKeyExchange "stale reply" gate.
- the rekey retransmit goroutine (Loop).
It owns the L5 Store — the per-peer Crypto registry. L6 (envelope) imports Store to look up keys when wrapping/unwrapping AEAD frames.
Per docs/architecture/01-LAYERS.md L5:
Role: establish the per-peer AEAD session key (with optional Ed25519 authentication of identity). Owns: pendingRekey, peerPubKeys cache, the rekeyRetransmitLoop goroutine, AND (post T5.x-followup) the per-peer Crypto Store + replay window + salvage ring + decryptFailCount. Consumes: L1, L2 (sends bootstrap frames; first auth-key-exchange is the only frame that bypasses L6 envelope), L3 (signs/verifies), L4 (routes the frame). Exposes: Establish(peer) → SessionKey, RekeyTrigger(peer), key-install events.
Per docs/architecture/03-INVARIANTS.md §3 (lock graph): rkPendingMu is a LEAF lock — never held while taking any TunnelManager mutex. All callers obey this by reading state under rkPendingMu, releasing, then mutating tm-side state in a separate critical section.
Per docs/architecture/03-INVARIANTS.md §9 (horizontal-incest): L6 → L5 is the canonical downward import. keyexchange/ MUST NOT import envelope/ — that would be upward and contradict 01-LAYERS.md §"Layer dependencies".
Index ¶
- Constants
- Variables
- type Crypto
- type EventPublisher
- type FrameSender
- type Manager
- func (m *Manager) BuildAuthFrame() []byte
- func (m *Manager) BuildUnauthFrame() []byte
- func (m *Manager) ClearPendingRekey(peerNodeID uint32)
- func (m *Manager) ClearRekeyGaveUp(peerNodeID uint32)
- func (m *Manager) DeriveSecret(peerPubKeyBytes []byte) (*Crypto, error)
- func (m *Manager) GetPeerPubKey(nodeID uint32) (ed25519.PublicKey, error)
- func (m *Manager) HandleAuthFrame(data []byte, from *net.UDPAddr, fromRelay bool) bool
- func (m *Manager) HandleUnauthFrame(data []byte, from *net.UDPAddr, fromRelay bool) bool
- func (m *Manager) HasIdentity() bool
- func (m *Manager) HasPeerPubKey(nodeID uint32) bool
- func (m *Manager) Identity() *crypto.Identity
- func (m *Manager) InboundDecryptStale(peerNodeID uint32) bool
- func (m *Manager) InjectPendingRekeyForTest(peerNodeID uint32, st *PendingRekeyState)
- func (m *Manager) InvalidatePeerPubKey(nodeID uint32)
- func (m *Manager) LastInboundDecryptHas(peerNodeID uint32) bool
- func (m *Manager) Loop(ctx context.Context)
- func (m *Manager) MarkPendingRekey(peerNodeID uint32) bool
- func (m *Manager) PeerInRekeyGaveUp(peerNodeID uint32) bool
- func (m *Manager) PendingRekeyAttempts(peerNodeID uint32) int
- func (m *Manager) PendingRekeyForTest(peerNodeID uint32) *PendingRekeyState
- func (m *Manager) PendingRekeyHas(peerNodeID uint32) bool
- func (m *Manager) PrivKey() *ecdh.PrivateKey
- func (m *Manager) PubKey() []byte
- func (m *Manager) RecordInboundDecrypt(peerNodeID uint32)
- func (m *Manager) RekeyRetransmitTick()
- func (m *Manager) RemovePeer(nodeID uint32)
- func (m *Manager) ResetPendingRekeyAttempts(peerNodeID uint32)
- func (m *Manager) SendKeyExchangeToNode(peerNodeID uint32)
- func (m *Manager) SetAddrLookup(f PeerAddrLookup)
- func (m *Manager) SetIdentity(id *crypto.Identity)
- func (m *Manager) SetLastInboundDecryptForTest(peerNodeID uint32, t time.Time)
- func (m *Manager) SetLocalNodeIDFn(f func() uint32)
- func (m *Manager) SetPeerPubKey(nodeID uint32, pk ed25519.PublicKey)
- func (m *Manager) SetPeerVerifyFunc(f VerifyFunc)
- func (m *Manager) SetPostInstallHook(h PostInstallHook)
- func (m *Manager) SetPreRetransmitHook(h PreRetransmitHook)
- func (m *Manager) SetPublisher(p EventPublisher)
- func (m *Manager) SetSender(s FrameSender)
- func (m *Manager) SetX25519Keys(priv *ecdh.PrivateKey, pub []byte)
- func (m *Manager) Store() *Store
- type PeerAddrLookup
- type PendingRekeyState
- type PostInstallEvent
- type PostInstallHook
- type PreRetransmitHook
- type SalvageEntry
- type Store
- func (s *Store) CompareAndDrop(peerNodeID uint32, expected *Crypto) bool
- func (s *Store) DrainSalvage(c *Crypto) []SalvageEntry
- func (s *Store) Drop(peerNodeID uint32)
- func (s *Store) Get(peerNodeID uint32) *Crypto
- func (s *Store) Has(peerNodeID uint32) bool
- func (s *Store) Install(peerNodeID uint32, c *Crypto)
- func (s *Store) IsReady(peerNodeID uint32) bool
- func (s *Store) Len() int
- func (s *Store) LocalNodeID() uint32
- func (s *Store) PeerIDs() []uint32
- func (s *Store) RecordSalvage(c *Crypto, plaintext []byte)
- func (s *Store) SetLocalNodeID(id uint32)
- func (s *Store) ShouldDropOnDecryptFail(peerNodeID uint32, c *Crypto) bool
- func (s *Store) ShouldDropOnOutsideWindow(peerNodeID uint32, c *Crypto) bool
- func (s *Store) ShouldDropOnReplay(peerNodeID uint32, c *Crypto) bool
- type VerifyFunc
Constants ¶
const (
	// RekeyRetransmitInterval is how long we wait for an inbound encrypted
	// packet to confirm the peer received our key_exchange. After this we
	// retransmit on the assumption either our key_exchange or the peer's
	// reply was dropped.
	RekeyRetransmitInterval = 4 * time.Second

	// MaxRekeyAttempts caps the retransmit loop. The first send + this
	// many retries = 1 + MaxRekeyAttempts total attempts. After that we
	// give up — peer is presumed gone.
	MaxRekeyAttempts = 5

	// RekeyRelayFallbackAfter is the attempt count at which the rekey
	// loop's PreRetransmitHook is invited to flip the peer's routing
	// path to relay. The first attempt goes direct on the optimistic
	// assumption the cached endpoint still resolves; once that has
	// silently failed RekeyRelayFallbackAfter times, the routing layer's
	// blackhole heuristic typically hasn't yet accumulated enough silent
	// observations to flip on its own (BlackholeMissesRequired=3
	// observations × 4s = 12s, and rekey gives up at 24s), so we'd
	// burn the remaining budget on the dead direct path. Forcing the
	// flip here guarantees ≥3 attempts go via relay before give-up.
	//
	// Reproduces the recovery for the 2026-05-11 Mac-restart NAT-
	// remapping wedge: peer endpoint caches pointed at the Mac's old
	// external port (NAT recycled after daemon restart), every direct
	// rekey landed in a black hole, and the loop exhausted all 6
	// attempts before the blackhole heuristic could engage.
	RekeyRelayFallbackAfter = 2

	// KeyExchangeReplyStaleThreshold: when an auth_key_exchange arrives
	// and we already have crypto for the peer with no inbound traffic in
	// this window, reply with our key_exchange too (in case our previous
	// reply was dropped). Loosens handleAuthKeyExchange's "send back" gate.
	KeyExchangeReplyStaleThreshold = 6 * time.Second
)
Tunable timing constants. Frozen wire-adjacent behavior — changing these affects observable retransmit cadence + stale-reply gating.
const DecryptFailDropGrace = 3 * time.Second
DecryptFailDropGrace is the minimum age a Crypto must reach before repeated decrypt failures can drop it. Set above the typical relay RTT (≤200 ms) plus a comfortable margin: stale ciphertext from before the peer's last rekey takes ~1 RTT to drain, but if salvage replay is in play it can stretch slightly. 3 s covers worst-case drain without holding a genuinely diverged session for long.
const DecryptFailDropThreshold = 5
DecryptFailDropThreshold is how many consecutive AEAD-authentication failures from a single peer trigger a full Crypto drop + re-handshake. Sized to swallow a small burst of legitimate packet corruption (kernel buffer overruns, mid-flight key rotation crossing the wire) while still recovering from peer-side AEAD-key divergence within a few seconds — a daemon restart on either side cycles the peer through a fresh handshake; this is the "equivalent recovery without restarting" path.
Lives on L5 with the per-peer state it gates rather than on L6 framing, because the drop decision uses Crypto.CreatedAt (key-install time, an L5 concept) and Crypto.DecryptFailCount (per-peer state).
const HKDFInfo = "pilot-tunnel-v1"
HKDFInfo is the frozen HKDF info string used to derive AEAD keys from the X25519 shared secret. Changing it would break wire compatibility with every existing peer — DO NOT change.
const MaxCryptoPeers = 16384
MaxCryptoPeers caps the crypto map for unauth key-exchange insertions. HandleAuthFrame is implicitly bounded by the registry-verified pubkey lookup, but HandleUnauthFrame accepts any peerNodeID and performs an X25519 scalar multiplication per packet. Without a cap, a peer spraying unauth key-exchange frames with random node IDs can grow the map to 2^32 entries while also burning CPU on derivation. Set high enough that real deployments never hit it.
Lives on L5 with the per-peer Store rather than as a free constant in L6 framing — the cap is a property of key-state ownership.
const OutsideWindowDropGrace = 3 * time.Second
OutsideWindowDropGrace mirrors DecryptFailDropGrace but for the outside-window path. Reuses the same 3-second value: a freshly installed Crypto can legitimately see a small flurry of stale in-flight frames at low counters; only after the new state has had time to drain those should we treat the persistence as divergence.
const OutsideWindowDropThreshold = 30
OutsideWindowDropThreshold is how many consecutive outside-replay-window rejections cause us to drop the peer's Crypto and trigger a fresh key exchange. Tuned conservatively: ReplayWindowSize=256 means an in-spec burst of late relay-buffered frames can produce a few outside-window rejections normally. 30 consecutive without any successful decrypt is strong evidence the peer's send counter is far behind our window max-water-mark and no in-band recovery exists. Reset to 0 on success.
const ReplayDropGrace = 3 * time.Second
ReplayDropGrace mirrors OutsideWindowDropGrace. A freshly installed Crypto can legitimately see a few replays as both direct and relay paths catch up to the new key; only after the new state has had time to drain those should we treat the persistence as divergence.
const ReplayDropThreshold = 30
ReplayDropThreshold is how many consecutive in-window replay rejections trigger a Crypto drop + fresh key exchange. Symmetric counterpart to OutsideWindowDropThreshold for the *other* side of the window: the peer's send counter is INSIDE our ReplayWindowSize range but at positions we've already seen.
Reproduces the rc5 list-agents wedge (2026-05-11): peer daemon restarts but keeps its persisted X25519 identity, so no PILA is negotiated; the peer's send counter resets to 1 while our MaxRecvNonce sits at ~50. Every subsequent frame from the peer authenticates with the same key, lands at counter=1..50, and is dropped as a replay before AEAD-Open ever runs — neither DecryptFailCount nor OutsideWindowCount increment, so the existing recovery paths never engage. Sustained replays are the only signal that the peer's counter and our window are mis-aligned.
30 matches OutsideWindowDropThreshold for the same reason: a legitimate duplicate-delivery burst (direct + relay both delivering the same frame) typically tops out at 1–3 collisions per frame pair; 30 consecutive same-peer replays without any successful decrypt cleanly distinguishes the wedge from legitimate duplication. Reset to 0 on successful decrypt.
const ReplayWindowSize = 256
ReplayWindowSize is the number of nonces tracked in the sliding window bitmap for replay detection (H8 fix). Nonces within [maxNonce-ReplayWindowSize, maxNonce] are tracked; nonces below the window are rejected.
Per-peer state — owned with the key, hence on L5 (keyexchange) rather than L6 (envelope).
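The window mechanics described above can be sketched as a stdlib-only toy mirroring the documented semantics — track nonces within [maxNonce−ReplayWindowSize, maxNonce], reject below-window nonces and in-window duplicates. Type and method names here are illustrative, not the package's.

```go
package main

import "fmt"

const windowSize = 256 // mirrors ReplayWindowSize

// replayWindow is a sliding-window bitmap: bit (n % windowSize) records
// whether nonce n has been seen, valid only while n is within
// windowSize of the high-water mark.
type replayWindow struct {
	max    uint64
	bitmap [windowSize / 64]uint64
}

func (w *replayWindow) bit(n uint64) (word int, mask uint64) {
	b := n % windowSize
	return int(b / 64), 1 << (b % 64)
}

func (w *replayWindow) seen(n uint64) bool {
	word, mask := w.bit(n)
	return w.bitmap[word]&mask != 0
}

func (w *replayWindow) mark(n uint64) {
	word, mask := w.bit(n)
	w.bitmap[word] |= mask
}

func (w *replayWindow) clear(n uint64) {
	word, mask := w.bit(n)
	w.bitmap[word] &^= mask
}

// checkAndRecord returns true when nonce n is fresh, recording it.
func (w *replayWindow) checkAndRecord(n uint64) bool {
	switch {
	case n > w.max:
		// Advance: zero the bitmap slots that just entered the window.
		shift := n - w.max
		if shift > windowSize {
			shift = windowSize
		}
		for i := uint64(1); i <= shift; i++ {
			w.clear(w.max + i)
		}
		w.max = n
	case w.max-n >= windowSize:
		return false // below the window: unrecoverably old
	case w.seen(n):
		return false // in-window replay
	}
	w.mark(n)
	return true
}

func main() {
	var w replayWindow
	fmt.Println(w.checkAndRecord(1), w.checkAndRecord(1)) // true false (replay)
	fmt.Println(w.checkAndRecord(300))                    // true (window slides)
	fmt.Println(w.checkAndRecord(3))                      // false (300-3 >= 256)
}
```

Note the asymmetry the thresholds above rely on: a below-window rejection and an in-window replay rejection are distinguishable, which is what lets OutsideWindowCount and ReplayCount diagnose different wedges.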
const SalvageMaxAge = 5 * time.Second
SalvageMaxAge is how far back we replay sends after a rekey. The rekey round-trip itself is ~1 RTT plus the rate-limit window (3 s); 5 s gives margin for slow handshakes under loss.
const SalvageMaxEntries = 4
SalvageMaxEntries bounds memory + replay-storm size on rekey. Originally 32, but a 32-frame burst replayed on rekey caused receiver-side nonce confusion (concurrent dataexchange retransmits filling salvage + out-of-order delivery on the wire). 4 covers the typical "send-message + a couple of retries" without overwhelming the receiver. Memory at max: 4 × ~1500 B ≈ 6 KiB per peer, ~96 MiB if every MaxCryptoPeers (16384) slot were full.
Variables ¶
var ErrNoKey = errors.New("keyexchange: no key installed for peer")
ErrNoKey is returned when no Crypto is installed for a peer (encrypt-side: no key to wrap with; decrypt-side: nothing to unwrap with). Caller's responsibility to trigger a rekey request.
var ErrNotReady = errors.New("keyexchange: crypto installed but not ready")
ErrNotReady is returned when a Crypto exists but is not yet marked Ready (key exchange in progress).
Functions ¶
This section is empty.
Types ¶
type Crypto ¶
type Crypto struct {
AEAD cipher.AEAD
Nonce uint64 // monotonic send counter (atomic)
NoncePrefix [4]byte // random prefix for nonce domain separation
// Replay detection (H8 fix): sliding window bitmap instead of simple
// high-water mark.
ReplayMu sync.Mutex
MaxRecvNonce uint64 // highest nonce received
ReplayBitmap [ReplayWindowSize / 64]uint64 // bitmap for nonces in [max-windowSize, max]
Ready bool // true once key exchange is complete
Authenticated bool // true if peer proved Ed25519 identity
PeerX25519Key [32]byte // peer's X25519 public key (for detecting rekeying)
// DecryptFailCount tracks consecutive AEAD authentication failures
// since the last successful decrypt. Older-version peer daemons drift
// into a state where their derived AEAD key no longer matches ours
// (likely a session-id or info-string mismatch in HKDF), at which
// point every encrypted packet from the peer fails with "cipher:
// message authentication failed". Once this counter exceeds
// DecryptFailDropThreshold the daemon drops this Crypto entirely
// and triggers a fresh key exchange — equivalent to a daemon-restart
// recovery without actually restarting. Reset to 0 on any successful
// decrypt.
DecryptFailCount int
// OutsideWindowCount tracks consecutive frames rejected because their
// counter was more than ReplayWindowSize behind MaxRecvNonce. A few
// of these are normal (late relay-buffered frames). A sustained burst
// means the peer's send counter and our receive window have diverged
// far enough that no in-band recovery is possible — the only signal
// the peer has to know there's a problem is silence, and the only
// fix is a fresh key exchange. Once this counter exceeds
// OutsideWindowDropThreshold AND the Crypto is older than
// OutsideWindowDropGrace, the daemon drops this Crypto and requests
// a rekey. Reset to 0 on any successful decrypt.
OutsideWindowCount int
// ReplayCount tracks consecutive in-window replay rejections (counter
// within ReplayWindowSize of MaxRecvNonce but already marked in the
// bitmap). Distinct from OutsideWindowCount, which handles the
// counter-far-behind case. A few replays are normal (duplicate
// delivery across direct + relay paths); a sustained burst means
// peer's send counter reset (peer restart with persistent identity)
// and our window's high-water-mark sits ahead of every frame the
// peer can now produce — the wedge resolves only when peer's counter
// climbs past our max, but with no traffic in flight it never will.
// Once this counter exceeds ReplayDropThreshold AND the Crypto is
// older than ReplayDropGrace, drop the Crypto and request rekey.
// Reset to 0 on any successful decrypt.
ReplayCount int
// CreatedAt is when this Crypto was installed. Used by L5's
// handleAuthKeyExchange to decide between preserving existing state
// (fresh handshake retransmit/reply within seconds) and resetting it
// (long-lived peer's own state desynced — e.g. older-version daemon
// rotated its send counter without rotating X25519 keys, which
// otherwise leaves us with a high MaxRecvNonce that rejects the
// peer's resumed packets as replays). Also gates the rc5 grace
// period (see DecryptFailDropGrace).
CreatedAt time.Time
// P1-010 desync salvage: ring buffer of recent plaintext sent with
// this key. On a peer-initiated rekey, these are re-encrypted with
// the new key and re-sent — recovering the data that was vaporized
// when the peer dropped our stale-keyed frames.
SalvageMu sync.Mutex
Salvage []SalvageEntry
}
Crypto holds per-peer encryption state. Created by L5 (DeriveSecret) after key derivation; mutated by L6 on every encrypt/decrypt; cleared on rekey or peer drop.
Per docs/architecture/03-INVARIANTS.md §3 (lock graph): Crypto's ReplayMu and SalvageMu are LEAF locks. They are never nested with any TunnelManager-side mutex (tm.mu, tm.rkPendingMu, etc.). Methods on Crypto must not acquire any external lock while holding them.
Ownership lives on L5 (keyexchange) per docs/architecture/01-LAYERS.md: L6 'Consumes: ... L5 (key state)' — the envelope is a downward consumer of L5's per-peer state, not its owner.
func (*Crypto) CheckAndRecordNonce ¶
CheckAndRecordNonce returns true if the nonce is valid (not replayed, not too old). Must be called with c.ReplayMu held.
Note on nonce wraparound: the counter is uint64, so it wraps after 2^64 packets. At 1 billion packets/sec this takes ~585 years — purely theoretical. If a connection ever approaches this limit, rekeying (new secure handshake) resets the counter naturally.
func (*Crypto) SetReplayBit ¶
SetReplayBit sets the replay-window bit corresponding to counter. Must be called with c.ReplayMu held.
func (*Crypto) UndoReplayBit ¶
UndoReplayBit clears the bit set by a prior SetReplayBit. Used by L6 when AEAD-Open fails to roll back the speculative bookkeeping — without this, a flood of failed-decrypt frames would wedge legitimate later packets out of the replay window. Must be called with c.ReplayMu held.
type EventPublisher ¶
EventPublisher is the daemon's event-bus surface; nil-safe. Provided by tunnel.go via a closure over tm.publishEvent.
type FrameSender ¶
FrameSender is L2's contribution to L5: ship a raw UDP frame to a peer (relay-aware). tunnel.go satisfies this with writeFrame.
When sub-pass 4 extracts L2, this becomes an L2 interface; for now the daemon supplies a closure capturing tm.writeFrame.
type Manager ¶
type Manager struct {
// contains filtered or unexported fields
}
Manager owns the L5 state.
func New ¶
New returns a fresh Manager. The Manager installs into store; pass store=nil to have New construct one. The same Store pointer is imported by L6 (envelope) for AEAD encrypt/decrypt operations.
func (*Manager) BuildAuthFrame ¶
BuildAuthFrame builds an authenticated key exchange frame. Returns nil if our X25519 pubkey or Ed25519 identity is unavailable.
Layout: [PILA(4)][nodeID(4 BE)][X25519 pubkey(32)][Ed25519 pubkey(32)][signature(64)] = 136 bytes.
func (*Manager) BuildUnauthFrame ¶
BuildUnauthFrame builds an unauthenticated key exchange frame. Returns nil if our X25519 pubkey is unavailable.
Layout: [PILK(4)][nodeID(4 BE)][X25519 pubkey(32)] = 40 bytes.
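The 40-byte PILK layout above can be encoded and decoded with encoding/binary; a minimal sketch (helper names are hypothetical, not the package's API):

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// buildUnauthFrame assembles the documented layout:
// [PILK(4)][nodeID(4 BE)][X25519 pubkey(32)] = 40 bytes.
func buildUnauthFrame(nodeID uint32, xpub []byte) []byte {
	if len(xpub) != 32 {
		return nil // mirrors BuildUnauthFrame's nil-on-unavailable contract
	}
	frame := make([]byte, 0, 40)
	frame = append(frame, "PILK"...)
	frame = binary.BigEndian.AppendUint32(frame, nodeID)
	return append(frame, xpub...)
}

// parseUnauthFrame is the inverse; real handlers see the frame with the
// 4-byte magic already stripped by the caller.
func parseUnauthFrame(frame []byte) (nodeID uint32, xpub []byte, err error) {
	if len(frame) != 40 || string(frame[:4]) != "PILK" {
		return 0, nil, errors.New("malformed PILK frame")
	}
	return binary.BigEndian.Uint32(frame[4:8]), frame[8:40], nil
}

func main() {
	xpub := make([]byte, 32)
	frame := buildUnauthFrame(0xDEADBEEF, xpub)
	fmt.Println(len(frame)) // 40
	id, _, err := parseUnauthFrame(frame)
	fmt.Printf("%#x %v\n", id, err)
}
```

The authenticated PILA frame extends the same prefix with the Ed25519 pubkey and a 64-byte signature, reaching the documented 136 bytes.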
func (*Manager) ClearPendingRekey ¶
ClearPendingRekey cancels any in-flight retransmit tracking for peerNodeID. Called from Handle*Frame paths (we received their key exchange, so we no longer need to keep hammering them with ours). Does NOT clear rekeyGaveUp — receiving a key exchange is not proof of bidirectional reachability; only a successful decrypt is. Clearing rekeyGaveUp here caused the give-up cooldown to re-arm whenever a peer running a build without the cooldown fix kept sending us key exchanges.
func (*Manager) ClearRekeyGaveUp ¶
ClearRekeyGaveUp lifts the post-give-up cooldown for peerNodeID. Must only be called after a successful decrypt — proof that bidirectional crypto works.
func (*Manager) DeriveSecret ¶
DeriveSecret computes a shared AES-256-GCM cipher from the peer's X25519 public key, returning a fresh Crypto ready to be installed.
Uses HKDF-SHA256 (info = "pilot-tunnel-v1", per HKDFInfo) to derive the AEAD key from the X25519 shared secret. Intermediate key material is zeroed before return (H4 fix).
func (*Manager) GetPeerPubKey ¶
GetPeerPubKey returns the cached Ed25519 public key for a peer, fetching from the registry (via verifyFunc) on cache miss.
func (*Manager) HandleAuthFrame ¶
HandleAuthFrame processes an authenticated key exchange packet (PILA).
Frame layout (after 4-byte magic stripped by caller):
[4 nodeID][32 X25519 pubkey][32 Ed25519 pubkey][64 Ed25519 signature]
The signature is over: "auth"(4) || nodeID(4 BE) || X25519-pubkey(32). fromRelay indicates this was received via beacon relay — the PostInstallHook decides what to do with the peer-endpoint update.
Returns true if a Crypto was installed (or already-present-and-fresh crypto was kept), false on any reject path (signature mismatch, missing identity, encryption disabled, malformed input).
func (*Manager) HandleUnauthFrame ¶
HandleUnauthFrame processes an unauthenticated key exchange packet (PILK).
Frame layout (after 4-byte magic stripped by caller):
[4 nodeID][32 X25519 pubkey]
If we have an identity AND the peer has a registered pubkey, reject the unauthenticated exchange and reply with our authenticated frame instead.
func (*Manager) HasIdentity ¶
HasIdentity is a fast path used to gate auth/unauth frame selection.
func (*Manager) HasPeerPubKey ¶
HasPeerPubKey reports whether the cache currently holds an entry for nodeID.
func (*Manager) InboundDecryptStale ¶
InboundDecryptStale returns true if we haven't successfully decrypted any packet from peerNodeID within the staleness window.
func (*Manager) InjectPendingRekeyForTest ¶
func (m *Manager) InjectPendingRekeyForTest(peerNodeID uint32, st *PendingRekeyState)
InjectPendingRekeyForTest sets a state directly. Test-only; production code never bypasses MarkPendingRekey.
func (*Manager) InvalidatePeerPubKey ¶
InvalidatePeerPubKey deletes a cache entry.
func (*Manager) LastInboundDecryptHas ¶
LastInboundDecryptHas reports whether we've recorded any successful decrypt timestamp for the peer.
func (*Manager) Loop ¶
Loop runs the rekey retransmit loop. It scans pendingRekey every RekeyRetransmitInterval, retransmits stale entries via SendKeyExchangeToNode, and gives up after MaxRekeyAttempts.
Cancel via ctx — typical wiring uses tunnel.done as the cancellation source through a tiny adapter goroutine.
func (*Manager) MarkPendingRekey ¶
MarkPendingRekey records that we sent a key_exchange to peerNodeID and are awaiting confirmation. Idempotent — re-calls bump LastSentAt and Attempts but preserve FirstSentAt.
Returns false and is a no-op if the peer is within the rekeyGaveUpCooldown window, preventing an immediate restart of a just-failed rekey cycle.
func (*Manager) PeerInRekeyGaveUp ¶
PeerInRekeyGaveUp reports whether peerNodeID is within the give-up cooldown. Used by the tunnel to skip queuing outbound packets for unreachable peers.
func (*Manager) PendingRekeyAttempts ¶
PendingRekeyAttempts returns the current attempts count (0 if not pending).
func (*Manager) PendingRekeyForTest ¶
func (m *Manager) PendingRekeyForTest(peerNodeID uint32) *PendingRekeyState
PendingRekeyForTest returns the live PendingRekeyState pointer (or nil) for a peer. Exposed so package-internal tests can inspect the FirstSentAt / LastSentAt timestamps without re-deriving them via the public counters. Acquires rkPendingMu briefly; safe for concurrent use.
func (*Manager) PendingRekeyHas ¶
PendingRekeyHas reports whether an entry exists.
func (*Manager) PrivKey ¶
func (m *Manager) PrivKey() *ecdh.PrivateKey
PrivKey returns the local X25519 private key.
func (*Manager) PubKey ¶
PubKey returns the local X25519 public key (32 bytes). May be nil if SetX25519Keys was never called.
func (*Manager) RecordInboundDecrypt ¶
RecordInboundDecrypt updates the per-peer last-decrypt timestamp.
func (*Manager) RekeyRetransmitTick ¶
func (m *Manager) RekeyRetransmitTick()
RekeyRetransmitTick is the per-tick body of Loop, split out for direct testing without a real ticker. Iterates pendingRekey under rkPendingMu, releases it before invoking SendKeyExchangeToNode (which re-takes rkPendingMu via MarkPendingRekey — splitting the lock keeps rkPendingMu a true leaf).
func (*Manager) RemovePeer ¶
RemovePeer wipes per-peer L5 state (called from TunnelManager.RemovePeer).
func (*Manager) ResetPendingRekeyAttempts ¶
ResetPendingRekeyAttempts zeroes the Attempts counter for peerNodeID without cancelling the pending-rekey entry. Called when the routing path changes (e.g. direct→relay flip) so the peer gets a fresh set of retransmit slots on the new path rather than immediately hitting the MaxRekeyAttempts give-up threshold from prior direct attempts.
func (*Manager) SendKeyExchangeToNode ¶
SendKeyExchangeToNode sends an authenticated key exchange if identity is available, otherwise falls back to unauthenticated.
This function carries the single annotated bootstrap-exception site (see the marker comment inside the body). After Stage 2 sub-pass 2, the canonical home for the marker is here in keyexchange/bootstrap.go; layers.yaml's bootstrap_exception.allowed_call_sites tracks this path.
func (*Manager) SetAddrLookup ¶
func (m *Manager) SetAddrLookup(f PeerAddrLookup)
SetAddrLookup wires the peer-address resolver.
func (*Manager) SetIdentity ¶
SetIdentity sets our Ed25519 identity for signing authenticated key exchanges. May be called concurrently with HandleAuthFrame / Send.
func (*Manager) SetLastInboundDecryptForTest ¶
SetLastInboundDecryptForTest writes a timestamp directly.
func (*Manager) SetLocalNodeIDFn ¶
SetLocalNodeIDFn supplies the closure used to read our own node ID (atomic read living in the daemon).
func (*Manager) SetPeerPubKey ¶
SetPeerPubKey installs a cache entry directly. Used by handle paths after verifying a packet-carried Ed25519 pubkey against the registry.
func (*Manager) SetPeerVerifyFunc ¶
func (m *Manager) SetPeerVerifyFunc(f VerifyFunc)
SetPeerVerifyFunc sets the callback used to fetch a peer's Ed25519 public key from the registry on cache miss.
func (*Manager) SetPostInstallHook ¶
func (m *Manager) SetPostInstallHook(h PostInstallHook)
SetPostInstallHook wires the post-install daemon callback (peer endpoint bookkeeping, salvage replay, etc.).
func (*Manager) SetPreRetransmitHook ¶
func (m *Manager) SetPreRetransmitHook(h PreRetransmitHook)
SetPreRetransmitHook wires a callback invoked once per peer immediately before each rekey retransmit. The daemon uses this to apply cross-layer policy (e.g. force the peer onto the relay path) without leaking routing concerns into the keyexchange package.
func (*Manager) SetPublisher ¶
func (m *Manager) SetPublisher(p EventPublisher)
SetPublisher wires the daemon's event bus (nil-safe).
func (*Manager) SetSender ¶
func (m *Manager) SetSender(s FrameSender)
SetSender wires L2's frame-send hook.
func (*Manager) SetX25519Keys ¶
func (m *Manager) SetX25519Keys(priv *ecdh.PrivateKey, pub []byte)
SetX25519Keys installs our X25519 keypair (used for ECDH on inbound key-exchange frames and for the pubkey slot in outbound frames).
type PeerAddrLookup ¶
PeerAddrLookup returns the currently-known UDP address for peerNodeID. Used by SendKeyExchangeToNode to find the destination. tunnel.go satisfies this with a closure reading tm.peers under tm.mu.
type PendingRekeyState ¶
PendingRekeyState tracks a key-exchange we sent and are waiting on. Cleared when handleEncrypted records a successful decrypt from peer (proof their crypto matches ours). The retransmit loop bumps LastSentAt and Attempts on each retry; gives up after MaxRekeyAttempts to avoid hammering a peer that's just gone.
type PostInstallEvent ¶
type PostInstallEvent struct {
PeerNodeID uint32
From *net.UDPAddr
FromRelay bool
Authenticated bool
HadCrypto bool // true if an entry existed before
KeyChanged bool // true if the peer's X25519 ephemeral key actually changed
OldCrypto *Crypto
NewCrypto *Crypto
PeerEd25519 ed25519.PublicKey // non-nil for auth path
}
PostInstallEvent describes a freshly-installed Crypto.
type PostInstallHook ¶
type PostInstallHook func(ev PostInstallEvent)
PostInstallHook is invoked AFTER a successful HandleAuthFrame / HandleUnauthFrame installs (or refreshes) a Crypto. Carries enough context for the daemon to do peer-endpoint bookkeeping, salvage replay, flushPending, etc. Implemented in tunnel.go.
type PreRetransmitHook ¶
PreRetransmitHook is invoked by RekeyRetransmitTick once per peer that is about to be retransmitted, BEFORE SendKeyExchangeToNode runs. The hook receives the peer ID and the upcoming attempt count (1 for the first retransmit, 2 for the second, ...). Used by tunnel.go to flip the peer's routing path to relay once direct attempts have failed often enough that the next direct attempt is unlikely to succeed. Implemented in tunnel.go.
type SalvageEntry ¶
SalvageEntry is a single plaintext frame retained for post-rekey replay.
type Store ¶
type Store struct {
// EncryptOK / EncryptFail track AEAD operation success/failure
// counts. Exposed as atomic counters so L7 / metrics readers don't
// need to take Store.mu. Bumped by the L6 framing functions in
// pkg/daemon/envelope.
EncryptOK atomic.Uint64
EncryptFail atomic.Uint64
// contains filtered or unexported fields
}
Store is the per-tunnel key-state registry. It owns the map of node ID → Crypto. Counter, replay-window, and salvage state live on the per-peer Crypto values; Store's mutex only protects the map itself.
The mutex is intentionally a simple sync.RWMutex used as a leaf lock — it must NEVER be held while invoking a TunnelManager-side mutex (tm.mu, tm.rkPendingMu, tm.pendMu). All callers obey this by reading from Store first, then mutating tm state in a separate critical section.
Store is L5-owned. L6 (envelope) imports Store to pull per-peer Crypto pointers when wrapping/unwrapping AEAD frames; the import direction is L6 → L5, matching docs/architecture/01-LAYERS.md.
func (*Store) CompareAndDrop ¶
CompareAndDrop deletes the entry only if the currently-installed pointer equals expected. Returns true if a delete occurred. Used by the decrypt-fail drop path to avoid deleting a Crypto that a concurrent rekey already replaced.
func (*Store) DrainSalvage ¶
func (s *Store) DrainSalvage(c *Crypto) []SalvageEntry
DrainSalvage atomically removes and returns all non-aged salvage entries from c. Returns nil if c is nil or empty. Each returned Plaintext is the same byte slice that was stored — RecordSalvage already copied at insert time.
func (*Store) Install ¶
Install adds (or replaces) the Crypto for the given peer. Caller is responsible for any "preserve on duplicate same-pubkey" logic — Store just unconditionally writes.
func (*Store) IsReady ¶
IsReady returns true if a Crypto is installed AND its handshake is complete (Ready=true).
func (*Store) Len ¶
Len returns the number of installed Cryptos. Used by the MaxCryptoPeers cap pre-check.
func (*Store) LocalNodeID ¶
LocalNodeID returns our own node ID.
func (*Store) RecordSalvage ¶
RecordSalvage stashes a plaintext send into the per-peer ring buffer. On a subsequent peer-initiated rekey, DrainSalvage will hand back the entries; the caller re-encrypts with the new key and re-sends — recovering data that the peer dropped because our frame was keyed under their now-stale crypto context.
Bounded by SalvageMaxEntries and SalvageMaxAge. The plaintext is copied, not referenced — caller can reuse its buffer.
nil c is a no-op (safe for callers that don't gate on Ready).
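The record/drain semantics — copy at insert, evict the oldest once full, return only non-aged entries on drain — can be sketched standalone (names mirror the constants above but the type is illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	salvageMaxEntries = 4               // mirrors SalvageMaxEntries
	salvageMaxAge     = 5 * time.Second // mirrors SalvageMaxAge
)

type salvageEntry struct {
	Plaintext []byte
	SentAt    time.Time
}

type salvageRing struct {
	mu      sync.Mutex // plays the role of SalvageMu (a leaf lock)
	entries []salvageEntry
}

// record copies the plaintext (caller may reuse its buffer) and evicts
// the oldest entry once the ring is full.
func (r *salvageRing) record(plaintext []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.entries) == salvageMaxEntries {
		r.entries = r.entries[1:]
	}
	cp := append([]byte(nil), plaintext...)
	r.entries = append(r.entries, salvageEntry{Plaintext: cp, SentAt: time.Now()})
}

// drain removes and returns every entry still young enough to replay
// under the new key.
func (r *salvageRing) drain() []salvageEntry {
	r.mu.Lock()
	defer r.mu.Unlock()
	var out []salvageEntry
	for _, e := range r.entries {
		if time.Since(e.SentAt) < salvageMaxAge {
			out = append(out, e)
		}
	}
	r.entries = nil
	return out
}

func main() {
	var r salvageRing
	for i := 0; i < 6; i++ { // 6 sends; the ring keeps the newest 4
		r.record([]byte(fmt.Sprintf("frame-%d", i)))
	}
	for _, e := range r.drain() {
		fmt.Println(string(e.Plaintext)) // frame-2 .. frame-5
	}
	fmt.Println(len(r.drain())) // 0: drain empties the ring
}
```

The copy-on-record choice is what lets DrainSalvage hand back the stored slices directly without aliasing a caller buffer that has since been reused.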
func (*Store) SetLocalNodeID ¶
SetLocalNodeID stores our own node ID (used in encrypt-side AAD and secure-frame headers). Safe to call concurrently with other Store operations.
func (*Store) ShouldDropOnDecryptFail ¶
ShouldDropOnDecryptFail implements the rc5 grace gate: returns true if the failing Crypto's DecryptFailCount has reached DecryptFailDropThreshold AND the Crypto is older than DecryptFailDropGrace AND it is still the currently-installed entry for peerNodeID.
The caller (L6 framing path) is responsible for the side-effect: drop via CompareAndDrop and trigger a fresh key exchange. This split keeps the rekey-trigger / event-publish responsibility OUT of Store.
func (*Store) ShouldDropOnOutsideWindow ¶
ShouldDropOnOutsideWindow mirrors ShouldDropOnDecryptFail for the outside-replay-window divergence path. Returns true when the peer's OutsideWindowCount has reached OutsideWindowDropThreshold AND the Crypto is older than OutsideWindowDropGrace AND it is still the currently-installed entry for peerNodeID.
The caller (L7 handleEncrypted) is responsible for dropping the Crypto via CompareAndDrop and triggering a fresh key exchange when this returns true. This is the symmetric-state recovery for the case where our receive-window's high-water-mark has drifted so far past the peer's actual send counter that no in-band signal exists. Reproduces the rc3 list-agents bug on 2026-05-11 where MaxRecvNonce stayed at 8518 while the sender's outbound counter sat at ~400.
func (*Store) ShouldDropOnReplay ¶
ShouldDropOnReplay is the symmetric counterpart to ShouldDropOnOutsideWindow for the in-window replay-collision path. Returns true when ReplayCount has reached ReplayDropThreshold AND the Crypto is older than ReplayDropGrace AND it is still the currently-installed entry for peerNodeID.
The caller (L7 handleEncrypted) is responsible for the side effect: drop via CompareAndDrop and request a fresh key exchange. This reproduces the recovery for the rc5 list-agents wedge (2026-05-11) where peer daemons restarted but kept their persisted X25519 identity — no PILA was sent, their send counter reset to 1, our MaxRecvNonce stayed at ~50, every subsequent frame from the peer landed at an already-marked counter and was dropped before AEAD-Open ever ran. Neither DecryptFailCount nor OutsideWindowCount incremented, so the existing recovery gates never engaged.