keyexchange

package
v1.10.0
Warning

This package is not in the latest version of its module.

Published: May 12, 2026 License: AGPL-3.0 Imports: 18 Imported by: 0

Documentation

Overview

Package keyexchange is layer L5: it establishes the per-peer AEAD session key (with optional Ed25519 authentication of identity).

The Manager owns:

  • pendingRekey map[uint32]*PendingRekeyState — tracks key exchanges we sent and are awaiting a reply on. Drives the retransmit loop.
  • peerPubKeys map[uint32]ed25519.PublicKey — cached peer Ed25519 pubkeys (filled from the verifyFunc callback).
  • lastInboundDecrypt map[uint32]time.Time — used by the handleAuthKeyExchange "stale reply" gate.
  • the rekey retransmit goroutine (Loop).

It owns the L5 Store — the per-peer Crypto registry. L6 (envelope) imports Store to look up keys when wrapping/unwrapping AEAD frames.

Per docs/architecture/01-LAYERS.md L5:

Role: establish the per-peer AEAD session key (with optional
  Ed25519 authentication of identity).
Owns: pendingRekey, peerPubKeys cache, the rekeyRetransmitLoop
  goroutine, AND (post T5.x-followup) the per-peer Crypto Store +
  replay window + salvage ring + decryptFailCount.
Consumes: L1, L2 (sends bootstrap frames; first auth-key-exchange
  is the only frame that bypasses L6 envelope), L3 (signs/verifies),
  L4 (routes the frame).
Exposes: Establish(peer) → SessionKey, RekeyTrigger(peer),
  key-install events.

Per docs/architecture/03-INVARIANTS.md §3 (lock graph): rkPendingMu is a LEAF lock and is never held while taking any TunnelManager mutex. All callers obey this by reading state under rkPendingMu, releasing it, and only then mutating tm-side state in a separate critical section.

Per docs/architecture/03-INVARIANTS.md §9 (horizontal-incest): L6 → L5 is the canonical downward import. keyexchange/ MUST NOT import envelope/ — that would be upward and contradict 01-LAYERS.md §"Layer dependencies".


Constants

const (
	// RekeyRetransmitInterval is how long we wait for an inbound encrypted
	// packet to confirm the peer received our key_exchange. After this we
	// retransmit on the assumption either our key_exchange or the peer's
	// reply was dropped.
	RekeyRetransmitInterval = 4 * time.Second

	// MaxRekeyAttempts caps the retransmit loop. The first send + this
	// many retries = 1 + MaxRekeyAttempts total attempts. After that we
	// give up — peer is presumed gone.
	MaxRekeyAttempts = 5

	// RekeyRelayFallbackAfter is the attempt count at which the rekey
	// loop's PreRetransmitHook is invited to flip the peer's routing
	// path to relay. The first attempt goes direct on the optimistic
	// assumption the cached endpoint still resolves; once that has
	// silently failed RekeyRelayFallbackAfter times, the routing layer's
	// blackhole heuristic typically hasn't yet accumulated enough silent
	// observations to flip on its own (BlackholeMissesRequired=3
	// observations × 4s = 12s, and rekey gives up at 24s), so we'd
	// burn the remaining budget on the dead direct path. Forcing the
	// flip here guarantees ≥3 attempts go via relay before give-up.
	//
	// Reproduces the recovery for the 2026-05-11 Mac-restart NAT-
	// remapping wedge: peer endpoint caches pointed at the Mac's old
	// external port (NAT recycled after daemon restart), every direct
	// rekey landed in a black hole, and the loop exhausted all 6
	// attempts before the blackhole heuristic could engage.
	RekeyRelayFallbackAfter = 2

	// KeyExchangeReplyStaleThreshold: when an auth_key_exchange arrives
	// and we already have crypto for the peer with no inbound traffic in
	// this window, reply with our key_exchange too (in case our previous
	// reply was dropped). Loosens handleAuthKeyExchange's "send back" gate.
	KeyExchangeReplyStaleThreshold = 6 * time.Second
)

Timing constants for the rekey retransmit loop and the stale-reply gate. Although they are tunable, treat them as frozen wire-adjacent behavior: changing them alters the observable retransmit cadence and stale-reply gating.

const DecryptFailDropGrace = 3 * time.Second

DecryptFailDropGrace is the minimum age a Crypto must reach before repeated decrypt failures can drop it. Set above the typical relay RTT (≤200 ms) plus a comfortable margin: stale ciphertext from before the peer's last rekey takes ~1 RTT to drain, but if salvage replay is in play it can stretch slightly. 3 s covers worst-case drain without holding a genuinely diverged session for long.

const DecryptFailDropThreshold = 5

DecryptFailDropThreshold is how many consecutive AEAD-authentication failures from a single peer trigger a full Crypto drop and re-handshake. It is sized to absorb a small burst of legitimate packet corruption (kernel buffer overruns, a mid-flight key rotation crossing the wire) while still recovering from peer-side AEAD-key divergence within a few seconds. A daemon restart on either side would also cycle the peer through a fresh handshake; this threshold is the "equivalent recovery without restarting" path.

Lives on L5 with the per-peer state it gates rather than on L6 framing, because the drop decision uses Crypto.CreatedAt (key-install time, an L5 concept) and Crypto.DecryptFailCount (per-peer state).

const HKDFInfo = "pilot-tunnel-v1"

HKDFInfo is the frozen HKDF info string used to derive AEAD keys from the X25519 shared secret. Changing it would break wire compatibility with every existing peer — DO NOT change.

const MaxCryptoPeers = 16384

MaxCryptoPeers caps the crypto map for unauth key-exchange insertions. HandleAuthFrame is implicitly bounded by the registry-verified pubkey lookup, but HandleUnauthFrame accepts any peerNodeID and performs an X25519 scalar multiplication per packet. Without a cap, a peer spraying unauth key-exchange frames with random node IDs can grow the map to 2^32 entries while also burning CPU on derivation. Set high enough that real deployments never hit it.

Lives on L5 with the per-peer Store rather than as a free constant in L6 framing — the cap is a property of key-state ownership.
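One way to apply such a cap, as a sketch: check it cheaply before paying for the X25519 scalar multiplication, then re-check under the write lock when installing. The cappedStore type, its admit/install methods, and the small cap used in the example are hypothetical; this page only says that Store.Len backs a pre-check.

```go
package main

import (
	"fmt"
	"sync"
)

// cappedStore is a hypothetical stand-in for Store with a peer cap
// (MaxCryptoPeers in the package; a small value in the example).
type cappedStore struct {
	mu    sync.RWMutex
	max   int
	peers map[uint32][]byte // value simplified; the real map holds *Crypto
}

func newCappedStore(max int) *cappedStore {
	return &cappedStore{max: max, peers: make(map[uint32][]byte)}
}

// admit is the cheap pre-check to run BEFORE key derivation: known peers are
// always admitted (their entry will be replaced), unknown peers only while
// the map is below the cap.
func (s *cappedStore) admit(peer uint32) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if _, ok := s.peers[peer]; ok {
		return true
	}
	return len(s.peers) < s.max
}

// install re-checks the cap under the write lock, because another goroutine
// may have filled the last slot after admit returned.
func (s *cappedStore) install(peer uint32, v []byte) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.peers[peer]; !ok && len(s.peers) >= s.max {
		return false
	}
	s.peers[peer] = v
	return true
}

func main() {
	s := newCappedStore(2)
	for _, id := range []uint32{1, 2, 3} {
		if !s.admit(id) {
			fmt.Println("peer", id, "refused before key derivation")
			continue
		}
		s.install(id, []byte("derived key"))
	}
}
```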

const OutsideWindowDropGrace = 3 * time.Second

OutsideWindowDropGrace mirrors DecryptFailDropGrace but for the outside-window path. Reuses the same 3-second value: a freshly installed Crypto can legitimately see a small flurry of stale in-flight frames at low counters; only after the new state has had time to drain those should we treat the persistence as divergence.

const OutsideWindowDropThreshold = 30

OutsideWindowDropThreshold is how many consecutive outside-replay-window rejections cause us to drop the peer's Crypto and trigger a fresh key exchange. It is tuned conservatively: with ReplayWindowSize=256, an in-spec burst of late relay-buffered frames can normally produce a few outside-window rejections. 30 consecutive rejections without any successful decrypt is strong evidence that the peer's send counter is far behind our window's high-water mark and that no in-band recovery exists. Reset to 0 on success.

const ReplayDropGrace = 3 * time.Second

ReplayDropGrace mirrors OutsideWindowDropGrace. A freshly installed Crypto can legitimately see a few replays as both direct and relay paths catch up to the new key; only after the new state has had time to drain those should we treat the persistence as divergence.

const ReplayDropThreshold = 30

ReplayDropThreshold is how many consecutive in-window replay rejections trigger a Crypto drop + fresh key exchange. Symmetric counterpart to OutsideWindowDropThreshold for the *other* side of the window: the peer's send counter is INSIDE our ReplayWindowSize range but at positions we've already seen.

Reproduces the rc5 list-agents wedge (2026-05-11): peer daemon restarts but keeps its persisted X25519 identity, so no PILA is negotiated; the peer's send counter resets to 1 while our MaxRecvNonce sits at ~50. Every subsequent frame from the peer authenticates with the same key, lands at counter=1..50, and is dropped as a replay before AEAD-Open ever runs — neither DecryptFailCount nor OutsideWindowCount increment, so the existing recovery paths never engage. Sustained replays are the only signal that the peer's counter and our window are mis-aligned.

30 matches OutsideWindowDropThreshold for the same reason: a legitimate duplicate-delivery burst (direct + relay both delivering the same frame) typically tops out at 1–3 collisions per frame pair; 30 consecutive same-peer replays without any successful decrypt cleanly distinguishes the wedge from legitimate duplication. Reset to 0 on successful decrypt.

const ReplayWindowSize = 256

ReplayWindowSize is the number of nonces tracked in the sliding window bitmap for replay detection (H8 fix). Nonces within [maxNonce-ReplayWindowSize, maxNonce] are tracked; nonces below the window are rejected.

Per-peer state — owned with the key, hence on L5 (keyexchange) rather than L6 (envelope).

const SalvageMaxAge = 5 * time.Second

SalvageMaxAge is how far back we replay sends after a rekey. The rekey round-trip itself is ~1 RTT plus the rate-limit window (3 s); 5 s gives margin for slow handshakes under loss.

const SalvageMaxEntries = 4

SalvageMaxEntries bounds memory and replay-storm size on rekey. It was originally 32, but replaying a 32-frame burst on rekey caused receiver-side nonce confusion (concurrent dataexchange retransmits filling salvage, plus out-of-order delivery on the wire). 4 entries cover the typical "send-message + a couple of retries" case without overwhelming the receiver. Memory at max: 4 × ~1500 B ≈ 6 KiB per peer, or ~96 MiB at the MaxCryptoPeers cap (16384).

Variables

var ErrNoKey = errors.New("keyexchange: no key installed for peer")

ErrNoKey is returned when no Crypto is installed for a peer (encrypt side: no key to wrap with; decrypt side: nothing to unwrap with). The caller is responsible for triggering a rekey request.

var ErrNotReady = errors.New("keyexchange: crypto installed but not ready")

ErrNotReady is returned when a Crypto exists but is not yet marked Ready (key exchange in progress).

Functions

This section is empty.

Types

type Crypto

type Crypto struct {
	AEAD        cipher.AEAD
	Nonce       uint64  // monotonic send counter (atomic)
	NoncePrefix [4]byte // random prefix for nonce domain separation

	// Replay detection (H8 fix): sliding window bitmap instead of simple
	// high-water mark.
	ReplayMu     sync.Mutex
	MaxRecvNonce uint64                        // highest nonce received
	ReplayBitmap [ReplayWindowSize / 64]uint64 // bitmap for nonces in [max-windowSize, max]

	Ready         bool     // true once key exchange is complete
	Authenticated bool     // true if peer proved Ed25519 identity
	PeerX25519Key [32]byte // peer's X25519 public key (for detecting rekeying)

	// DecryptFailCount tracks consecutive AEAD authentication failures
	// since the last successful decrypt. Older-version peer daemons drift
	// into a state where their derived AEAD key no longer matches ours
	// (likely a session-id or info-string mismatch in HKDF), at which
	// point every encrypted packet from the peer fails with "cipher:
	// message authentication failed". Once this counter exceeds
	// DecryptFailDropThreshold the daemon drops this Crypto entirely
	// and triggers a fresh key exchange — equivalent to a daemon-restart
	// recovery without actually restarting. Reset to 0 on any successful
	// decrypt.
	DecryptFailCount int

	// OutsideWindowCount tracks consecutive frames rejected because their
	// counter was more than ReplayWindowSize behind MaxRecvNonce. A few
	// of these are normal (late relay-buffered frames). A sustained burst
	// means the peer's send counter and our receive window have diverged
	// far enough that no in-band recovery is possible — the only signal
	// the peer has to know there's a problem is silence, and the only
	// fix is a fresh key exchange. Once this counter exceeds
	// OutsideWindowDropThreshold AND the Crypto is older than
	// OutsideWindowDropGrace, the daemon drops this Crypto and requests
	// a rekey. Reset to 0 on any successful decrypt.
	OutsideWindowCount int

	// ReplayCount tracks consecutive in-window replay rejections (counter
	// within ReplayWindowSize of MaxRecvNonce but already marked in the
	// bitmap). Distinct from OutsideWindowCount, which handles the
	// counter-far-behind case. A few replays are normal (duplicate
	// delivery across direct + relay paths); a sustained burst means
	// peer's send counter reset (peer restart with persistent identity)
	// and our window's high-water-mark sits ahead of every frame the
	// peer can now produce — the wedge resolves only when peer's counter
	// climbs past our max, but with no traffic in flight it never will.
	// Once this counter exceeds ReplayDropThreshold AND the Crypto is
	// older than ReplayDropGrace, drop the Crypto and request rekey.
	// Reset to 0 on any successful decrypt.
	ReplayCount int

	// CreatedAt is when this Crypto was installed. Used by L5's
	// handleAuthKeyExchange to decide between preserving existing state
	// (fresh handshake retransmit/reply within seconds) and resetting it
	// (long-lived peer's own state desynced — e.g. older-version daemon
	// rotated its send counter without rotating X25519 keys, which
	// otherwise leaves us with a high MaxRecvNonce that rejects the
	// peer's resumed packets as replays). Also gates the rc5 grace
	// period (see DecryptFailDropGrace).
	CreatedAt time.Time

	// P1-010 desync salvage: ring buffer of recent plaintext sent with
	// this key. On a peer-initiated rekey, these are re-encrypted with
	// the new key and re-sent — recovering the data that was vaporized
	// when the peer dropped our stale-keyed frames.
	SalvageMu sync.Mutex
	Salvage   []SalvageEntry
}

Crypto holds per-peer encryption state. Created by L5 (DeriveSecret) after key derivation; mutated by L6 on every encrypt/decrypt; cleared on rekey or peer drop.

Per docs/architecture/03-INVARIANTS.md §3 (lock graph): Crypto's ReplayMu and SalvageMu are LEAF locks. They are never nested with any TunnelManager-side mutex (tm.mu, tm.rkPendingMu, etc.). Methods on Crypto must not acquire any external lock while holding them.

Ownership lives on L5 (keyexchange) per docs/architecture/01-LAYERS.md: L6 'Consumes: ... L5 (key state)' — the envelope is a downward consumer of L5's per-peer state, not its owner.

func (*Crypto) CheckAndRecordNonce

func (c *Crypto) CheckAndRecordNonce(counter uint64) bool

CheckAndRecordNonce returns true if the nonce is valid (not replayed, not too old). Must be called with c.ReplayMu held.

Note on nonce wraparound: the counter is uint64, so it wraps after 2^64 packets. At 1 billion packets/sec this takes ~585 years — purely theoretical. If a connection ever approaches this limit, rekeying (new secure handshake) resets the counter naturally.

func (*Crypto) SetReplayBit

func (c *Crypto) SetReplayBit(counter uint64)

SetReplayBit sets the replay-window bit corresponding to counter. Must be called with c.ReplayMu held.

func (*Crypto) UndoReplayBit

func (c *Crypto) UndoReplayBit(counter uint64)

UndoReplayBit clears the bit set by a prior SetReplayBit. Used by L6 when AEAD-Open fails to roll back the speculative bookkeeping — without this, a flood of failed-decrypt frames would wedge legitimate later packets out of the replay window. Must be called with c.ReplayMu held.
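A minimal sketch of a sliding-window replay filter with the check / set / undo split described above. It indexes bits by counter modulo the window size, so the bitmap covers the 256 counters ending at max; the package's own bitmap layout may differ. replayWindow and its methods are hypothetical, and a real caller would hold ReplayMu around each call.

```go
package main

import "fmt"

// windowSize mirrors ReplayWindowSize.
const windowSize = 256

// replayWindow is a hypothetical replay filter over the counters (max-255 .. max].
type replayWindow struct {
	max    uint64
	bitmap [windowSize / 64]uint64
}

// bitFor maps a counter to its word index and bit mask in the bitmap.
func bitFor(counter uint64) (word int, mask uint64) {
	pos := counter % windowSize
	return int(pos / 64), uint64(1) << (pos % 64)
}

// check reports whether counter is new, without mutating state.
func (w *replayWindow) check(counter uint64) bool {
	if counter > w.max {
		return true
	}
	if w.max-counter >= windowSize {
		return false // behind the window: the "outside-window" case
	}
	word, mask := bitFor(counter)
	return w.bitmap[word]&mask == 0 // a set bit is the "in-window replay" case
}

// set marks counter as seen, sliding the window forward if needed.
func (w *replayWindow) set(counter uint64) {
	if counter > w.max {
		d := counter - w.max
		if d >= windowSize {
			w.bitmap = [windowSize / 64]uint64{} // jumped past the whole window
		} else {
			for i := uint64(1); i <= d; i++ {
				word, mask := bitFor(w.max + i)
				w.bitmap[word] &^= mask // forget stale bits the window slid over
			}
		}
		w.max = counter
	}
	word, mask := bitFor(counter)
	w.bitmap[word] |= mask
}

// undo clears the bit for counter, e.g. after AEAD-Open fails.
func (w *replayWindow) undo(counter uint64) {
	if counter > w.max || w.max-counter >= windowSize {
		return // not tracked; clearing would hit another counter's bit
	}
	word, mask := bitFor(counter)
	w.bitmap[word] &^= mask
}

func main() {
	var w replayWindow
	for _, c := range []uint64{1, 2, 2, 300, 1, 100} {
		ok := w.check(c)
		if ok {
			w.set(c)
		}
		fmt.Println(c, ok)
	}
}
```

Because set in this sketch also advances max, a caller that wants a failed AEAD-Open to leave no trace at all would need to postpone the advance until the open succeeds; undo only clears the bit.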

type EventPublisher

type EventPublisher func(topic string, payload map[string]any)

EventPublisher is the daemon's event-bus surface; nil-safe. Provided by tunnel.go via a closure over tm.publishEvent.

type FrameSender

type FrameSender func(peerNodeID uint32, addr *net.UDPAddr, frame []byte) error

FrameSender is L2's contribution to L5: ship a raw UDP frame to a peer (relay-aware). tunnel.go satisfies this with writeFrame.

When sub-pass 4 extracts L2, this becomes an L2 interface; for now the daemon supplies a closure capturing tm.writeFrame.

type Manager

type Manager struct {
	// contains filtered or unexported fields
}

Manager owns the L5 state.

func New

func New(store *Store) *Manager

New returns a fresh Manager. The Manager installs Crypto entries into store; pass store=nil to have New construct one. The same Store pointer is imported by L6 (envelope) for AEAD encrypt/decrypt operations.

func (*Manager) BuildAuthFrame

func (m *Manager) BuildAuthFrame() []byte

BuildAuthFrame builds an authenticated key exchange frame. Returns nil if our X25519 pubkey or Ed25519 identity is unavailable.

Layout: [PILA(4)][nodeID(4 BE)][X25519 pubkey(32)][Ed25519 pubkey(32)][signature(64)] = 136 bytes.

func (*Manager) BuildUnauthFrame

func (m *Manager) BuildUnauthFrame() []byte

BuildUnauthFrame builds an unauthenticated key exchange frame. Returns nil if our X25519 pubkey is unavailable.

Layout: [PILK(4)][nodeID(4 BE)][X25519 pubkey(32)] = 40 bytes.
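The 40-byte PILK frame, with a matching parser for the body that remains after the caller strips the magic. Both helpers are hypothetical, and rejecting any body that is not exactly 36 bytes is an assumption about how strictly the real parser checks length.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// buildUnauthFrame assembles [PILK(4)][nodeID(4 BE)][X25519 pubkey(32)].
// Like BuildUnauthFrame, it returns nil when the pubkey is unavailable.
func buildUnauthFrame(nodeID uint32, x25519Pub []byte) []byte {
	if len(x25519Pub) != 32 {
		return nil
	}
	f := make([]byte, 8, 40)
	copy(f, "PILK")
	binary.BigEndian.PutUint32(f[4:8], nodeID)
	return append(f, x25519Pub...)
}

// parseUnauthBody decodes the body left after the 4-byte magic is stripped:
// [4 nodeID][32 X25519 pubkey].
func parseUnauthBody(body []byte) (nodeID uint32, x25519Pub []byte, err error) {
	if len(body) != 36 {
		return 0, nil, errors.New("malformed PILK body")
	}
	return binary.BigEndian.Uint32(body[:4]), body[4:], nil
}

func main() {
	pub := make([]byte, 32) // placeholder X25519 public key
	f := buildUnauthFrame(7, pub)
	id, _, err := parseUnauthBody(f[4:])
	fmt.Println(len(f), string(f[:4]), id, err)
}
```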

func (*Manager) ClearPendingRekey

func (m *Manager) ClearPendingRekey(peerNodeID uint32)

ClearPendingRekey cancels any in-flight retransmit tracking for peerNodeID. Called from Handle*Frame paths (we received their key exchange, so we no longer need to keep hammering them with ours). Does NOT clear rekeyGaveUp — receiving a key exchange is not proof of bidirectional reachability; only a successful decrypt is. Clearing rekeyGaveUp here caused the bounce to re-arm whenever a peer with no cooldown fix kept sending us key exchanges.

func (*Manager) ClearRekeyGaveUp

func (m *Manager) ClearRekeyGaveUp(peerNodeID uint32)

ClearRekeyGaveUp lifts the post-give-up cooldown for peerNodeID. Must only be called after a successful decrypt — proof that bidirectional crypto works.

func (*Manager) DeriveSecret

func (m *Manager) DeriveSecret(peerPubKeyBytes []byte) (*Crypto, error)

DeriveSecret computes a shared AES-256-GCM cipher from the peer's X25519 public key, returning a fresh Crypto ready to be installed.

Uses HKDF-SHA256 (info = "pilot-tunnel-v1", per HKDFInfo) to derive the AEAD key from the X25519 shared secret. Intermediate key material is zeroed before return (H4 fix).

func (*Manager) GetPeerPubKey

func (m *Manager) GetPeerPubKey(nodeID uint32) (ed25519.PublicKey, error)

GetPeerPubKey returns the cached Ed25519 public key for a peer, fetching from the registry (via verifyFunc) on cache miss.

func (*Manager) HandleAuthFrame

func (m *Manager) HandleAuthFrame(data []byte, from *net.UDPAddr, fromRelay bool) bool

HandleAuthFrame processes an authenticated key exchange packet (PILA).

Frame layout (after 4-byte magic stripped by caller):

[4 nodeID][32 X25519 pubkey][32 Ed25519 pubkey][64 Ed25519 signature]

The signature is over "auth"(4) || nodeID(4 BE) || X25519-pubkey(32).

fromRelay indicates that the frame was received via beacon relay; the PostInstallHook decides what to do with the peer-endpoint update.

Returns true if a Crypto was installed (or already-present-and-fresh crypto was kept), false on any reject path (signature mismatch, missing identity, encryption disabled, malformed input).

func (*Manager) HandleUnauthFrame

func (m *Manager) HandleUnauthFrame(data []byte, from *net.UDPAddr, fromRelay bool) bool

HandleUnauthFrame processes an unauthenticated key exchange packet (PILK).

Frame layout (after 4-byte magic stripped by caller):

[4 nodeID][32 X25519 pubkey]

If we have an identity AND the peer has a registered pubkey, reject the unauthenticated exchange and reply with our authenticated frame instead.

func (*Manager) HasIdentity

func (m *Manager) HasIdentity() bool

HasIdentity is a fast path used to gate auth/unauth frame selection.

func (*Manager) HasPeerPubKey

func (m *Manager) HasPeerPubKey(nodeID uint32) bool

HasPeerPubKey reports whether the cache currently holds an entry for nodeID.

func (*Manager) Identity

func (m *Manager) Identity() *crypto.Identity

Identity returns the currently-set identity (or nil).

func (*Manager) InboundDecryptStale

func (m *Manager) InboundDecryptStale(peerNodeID uint32) bool

InboundDecryptStale returns true if we haven't successfully decrypted any packet from peerNodeID within the staleness window.

func (*Manager) InjectPendingRekeyForTest

func (m *Manager) InjectPendingRekeyForTest(peerNodeID uint32, st *PendingRekeyState)

InjectPendingRekeyForTest sets a state directly. Test-only; production code never bypasses MarkPendingRekey.

func (*Manager) InvalidatePeerPubKey

func (m *Manager) InvalidatePeerPubKey(nodeID uint32)

InvalidatePeerPubKey deletes a cache entry.

func (*Manager) LastInboundDecryptHas

func (m *Manager) LastInboundDecryptHas(peerNodeID uint32) bool

LastInboundDecryptHas reports whether we've recorded any successful decrypt timestamp for the peer.

func (*Manager) Loop

func (m *Manager) Loop(ctx context.Context)

Loop runs the rekey retransmit loop. It scans pendingRekey every RekeyRetransmitInterval, retransmits stale entries via SendKeyExchangeToNode, and gives up after MaxRekeyAttempts.

Cancel via ctx — typical wiring uses tunnel.done as the cancellation source through a tiny adapter goroutine.

func (*Manager) MarkPendingRekey

func (m *Manager) MarkPendingRekey(peerNodeID uint32) bool

MarkPendingRekey records that we sent a key_exchange to peerNodeID and are awaiting confirmation. It is safe to call repeatedly: later calls bump LastSentAt and Attempts but preserve FirstSentAt.

Returns false and is a no-op if the peer is within the rekeyGaveUpCooldown window, preventing an immediate restart of a just-failed rekey cycle.

func (*Manager) PeerInRekeyGaveUp

func (m *Manager) PeerInRekeyGaveUp(peerNodeID uint32) bool

PeerInRekeyGaveUp reports whether peerNodeID is within the give-up cooldown. Used by the tunnel to skip queuing outbound packets for unreachable peers.

func (*Manager) PendingRekeyAttempts

func (m *Manager) PendingRekeyAttempts(peerNodeID uint32) int

PendingRekeyAttempts returns the current attempts count (0 if not pending).

func (*Manager) PendingRekeyForTest

func (m *Manager) PendingRekeyForTest(peerNodeID uint32) *PendingRekeyState

PendingRekeyForTest returns the live PendingRekeyState pointer (or nil) for a peer. Exposed so package-internal tests can inspect the FirstSentAt / LastSentAt timestamps without re-deriving them via the public counters. Acquires rkPendingMu briefly; safe for concurrent use.

func (*Manager) PendingRekeyHas

func (m *Manager) PendingRekeyHas(peerNodeID uint32) bool

PendingRekeyHas reports whether an entry exists.

func (*Manager) PrivKey

func (m *Manager) PrivKey() *ecdh.PrivateKey

PrivKey returns the local X25519 private key.

func (*Manager) PubKey

func (m *Manager) PubKey() []byte

PubKey returns the local X25519 public key (32 bytes). May be nil if SetX25519Keys was never called.

func (*Manager) RecordInboundDecrypt

func (m *Manager) RecordInboundDecrypt(peerNodeID uint32)

RecordInboundDecrypt updates the per-peer last-decrypt timestamp.

func (*Manager) RekeyRetransmitTick

func (m *Manager) RekeyRetransmitTick()

RekeyRetransmitTick is the per-tick body of Loop, split out for direct testing without a real ticker. Iterates pendingRekey under rkPendingMu, releases it before invoking SendKeyExchangeToNode (which re-takes rkPendingMu via MarkPendingRekey — splitting the lock keeps rkPendingMu a true leaf).

func (*Manager) RemovePeer

func (m *Manager) RemovePeer(nodeID uint32)

RemovePeer wipes per-peer L5 state (called from TunnelManager.RemovePeer).

func (*Manager) ResetPendingRekeyAttempts

func (m *Manager) ResetPendingRekeyAttempts(peerNodeID uint32)

ResetPendingRekeyAttempts zeroes the Attempts counter for peerNodeID without cancelling the pending-rekey entry. Called when the routing path changes (e.g. direct→relay flip) so the peer gets a fresh set of retransmit slots on the new path rather than immediately hitting the MaxRekeyAttempts give-up threshold from prior direct attempts.

func (*Manager) SendKeyExchangeToNode

func (m *Manager) SendKeyExchangeToNode(peerNodeID uint32)

SendKeyExchangeToNode sends an authenticated key exchange if identity is available, otherwise falls back to unauthenticated.

This function carries the single annotated bootstrap-exception site (see the marker comment inside the body). After Stage 2 sub-pass 2, the canonical home for the marker is here in keyexchange/bootstrap.go; layers.yaml's bootstrap_exception.allowed_call_sites tracks this path.

func (*Manager) SetAddrLookup

func (m *Manager) SetAddrLookup(f PeerAddrLookup)

SetAddrLookup wires the peer-address resolver.

func (*Manager) SetIdentity

func (m *Manager) SetIdentity(id *crypto.Identity)

SetIdentity sets our Ed25519 identity for signing authenticated key exchanges. May be called concurrently with HandleAuthFrame / Send.

func (*Manager) SetLastInboundDecryptForTest

func (m *Manager) SetLastInboundDecryptForTest(peerNodeID uint32, t time.Time)

SetLastInboundDecryptForTest writes a timestamp directly.

func (*Manager) SetLocalNodeIDFn

func (m *Manager) SetLocalNodeIDFn(f func() uint32)

SetLocalNodeIDFn supplies the closure used to read our own node ID (atomic read living in the daemon).

func (*Manager) SetPeerPubKey

func (m *Manager) SetPeerPubKey(nodeID uint32, pk ed25519.PublicKey)

SetPeerPubKey installs a cache entry directly. Used by handle paths after verifying a packet-carried Ed25519 pubkey against the registry.

func (*Manager) SetPeerVerifyFunc

func (m *Manager) SetPeerVerifyFunc(f VerifyFunc)

SetPeerVerifyFunc sets the callback used to fetch a peer's Ed25519 public key from the registry on cache miss.

func (*Manager) SetPostInstallHook

func (m *Manager) SetPostInstallHook(h PostInstallHook)

SetPostInstallHook wires the post-install daemon callback (peer endpoint bookkeeping, salvage replay, etc.).

func (*Manager) SetPreRetransmitHook

func (m *Manager) SetPreRetransmitHook(h PreRetransmitHook)

SetPreRetransmitHook wires a callback invoked once per peer immediately before each rekey retransmit. The daemon uses this to apply cross-layer policy (e.g. force the peer onto the relay path) without leaking routing concerns into the keyexchange package.

func (*Manager) SetPublisher

func (m *Manager) SetPublisher(p EventPublisher)

SetPublisher wires the daemon's event bus (nil-safe).

func (*Manager) SetSender

func (m *Manager) SetSender(s FrameSender)

SetSender wires L2's frame-send hook.

func (*Manager) SetX25519Keys

func (m *Manager) SetX25519Keys(priv *ecdh.PrivateKey, pub []byte)

SetX25519Keys installs our X25519 keypair (used for ECDH on inbound key-exchange frames and for the pubkey slot in outbound frames).

func (*Manager) Store

func (m *Manager) Store() *Store

Store returns the per-peer Crypto registry owned by this Manager. L6 (envelope) imports it to wrap/unwrap AEAD frames; L7 (tunnel) imports it for membership probes (Has/IsReady) and bookkeeping (Install/Drop) on rekey paths.

type PeerAddrLookup

type PeerAddrLookup func(peerNodeID uint32) *net.UDPAddr

PeerAddrLookup returns the currently-known UDP address for peerNodeID. Used by sendKeyExchangeToNode to find the destination. tunnel.go satisfies this with a closure reading tm.peers under tm.mu.

type PendingRekeyState

type PendingRekeyState struct {
	FirstSentAt time.Time
	LastSentAt  time.Time
	Attempts    int
}

PendingRekeyState tracks a key-exchange we sent and are waiting on. Cleared when handleEncrypted records a successful decrypt from peer (proof their crypto matches ours). The retransmit loop bumps LastSentAt and Attempts on each retry; gives up after MaxRekeyAttempts to avoid hammering a peer that's just gone.

type PostInstallEvent

type PostInstallEvent struct {
	PeerNodeID    uint32
	From          *net.UDPAddr
	FromRelay     bool
	Authenticated bool
	HadCrypto     bool // true if an entry existed before
	KeyChanged    bool // true if the peer's X25519 ephemeral key actually changed
	OldCrypto     *Crypto
	NewCrypto     *Crypto
	PeerEd25519   ed25519.PublicKey // non-nil for auth path
}

PostInstallEvent describes a freshly-installed Crypto.

type PostInstallHook

type PostInstallHook func(ev PostInstallEvent)

PostInstallHook is invoked AFTER a successful HandleAuthFrame / HandleUnauthFrame installs (or refreshes) a Crypto. Carries enough context for the daemon to do peer-endpoint bookkeeping, salvage replay, flushPending, etc. Implemented in tunnel.go.

type PreRetransmitHook

type PreRetransmitHook func(peerNodeID uint32, attempt int)

PreRetransmitHook is invoked by RekeyRetransmitTick once per peer that is about to be retransmitted, BEFORE SendKeyExchangeToNode runs. The hook receives the peer ID and the upcoming attempt count (1 for the first retransmit, 2 for the second, ...). Used by tunnel.go to flip the peer's routing path to relay once direct attempts have failed often enough that the next direct attempt is unlikely to succeed. Implemented in tunnel.go.

type SalvageEntry

type SalvageEntry struct {
	Plaintext []byte
	When      time.Time
}

SalvageEntry is a single plaintext frame retained for post-rekey replay.

type Store

type Store struct {

	// EncryptOK / EncryptFail track AEAD operation success/failure
	// counts. Exposed as atomic counters so L7 / metrics readers don't
	// need to take Store.mu. Bumped by the L6 framing functions in
	// pkg/daemon/envelope.
	EncryptOK   atomic.Uint64
	EncryptFail atomic.Uint64
	// contains filtered or unexported fields
}

Store is the per-tunnel key-state registry. It owns the map of node ID → Crypto. Counter, replay-window, and salvage state live on the per-peer Crypto values; Store's mutex only protects the map itself.

The mutex is intentionally a simple sync.RWMutex used as a leaf lock: it must NEVER be held while acquiring a TunnelManager-side mutex (tm.mu, tm.rkPendingMu, tm.pendMu). All callers obey this by reading from Store first, then mutating tm state in a separate critical section.

Store is L5-owned. L6 (envelope) imports Store to pull per-peer Crypto pointers when wrapping/unwrapping AEAD frames; the import direction is L6 → L5, matching docs/architecture/01-LAYERS.md.

func NewStore

func NewStore() *Store

NewStore returns an empty Store.

func (*Store) CompareAndDrop

func (s *Store) CompareAndDrop(peerNodeID uint32, expected *Crypto) bool

CompareAndDrop deletes the entry only if the currently-installed pointer equals expected. Returns true if a delete occurred. Used by the decrypt-fail drop path to avoid deleting a Crypto that a concurrent rekey already replaced.
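A sketch of compare-and-drop on a map guarded by a leaf RWMutex, showing why the pointer comparison matters when a rekey races the drop path. The store type is a hypothetical stand-in for Store, and Crypto is a placeholder.

```go
package main

import (
	"fmt"
	"sync"
)

// Crypto is a placeholder. It must not be zero-size: pointers to distinct
// zero-size values may compare equal, which would defeat the pointer check.
type Crypto struct{ gen int }

type store struct {
	mu sync.RWMutex // leaf lock
	m  map[uint32]*Crypto
}

func newStore() *store { return &store{m: make(map[uint32]*Crypto)} }

func (s *store) Install(peer uint32, c *Crypto) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[peer] = c
}

func (s *store) Get(peer uint32) *Crypto {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.m[peer]
}

// CompareAndDrop deletes the entry only while it is still expected, so a
// decrypt-fail path holding a stale pointer cannot remove a fresh rekey.
func (s *store) CompareAndDrop(peer uint32, expected *Crypto) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if cur, ok := s.m[peer]; ok && cur == expected {
		delete(s.m, peer)
		return true
	}
	return false
}

func main() {
	s := newStore()
	stale := &Crypto{gen: 1}
	s.Install(9, stale)
	s.Install(9, &Crypto{gen: 2}) // a concurrent rekey wins the race
	fmt.Println(s.CompareAndDrop(9, stale), s.Get(9).gen)
}
```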

func (*Store) DrainSalvage

func (s *Store) DrainSalvage(c *Crypto) []SalvageEntry

DrainSalvage atomically removes and returns all non-aged salvage entries from c. Returns nil if c is nil or empty. Each returned Plaintext is the same byte slice that was stored — RecordSalvage already copied at insert time.

func (*Store) Drop

func (s *Store) Drop(peerNodeID uint32)

Drop removes the Crypto for peerNodeID. Safe even if no entry exists.

func (*Store) Get

func (s *Store) Get(peerNodeID uint32) *Crypto

Get returns the Crypto installed for peerNodeID, or nil if none.

func (*Store) Has

func (s *Store) Has(peerNodeID uint32) bool

Has returns true if a Crypto is installed for peerNodeID.

func (*Store) Install

func (s *Store) Install(peerNodeID uint32, c *Crypto)

Install adds (or replaces) the Crypto for the given peer. Caller is responsible for any "preserve on duplicate same-pubkey" logic — Store just unconditionally writes.

func (*Store) IsReady

func (s *Store) IsReady(peerNodeID uint32) bool

IsReady returns true if a Crypto is installed AND its handshake is complete (Ready=true).

func (*Store) Len

func (s *Store) Len() int

Len returns the number of installed Cryptos. Used by the MaxCryptoPeers cap pre-check.

func (*Store) LocalNodeID

func (s *Store) LocalNodeID() uint32

LocalNodeID returns our own node ID.

func (*Store) PeerIDs

func (s *Store) PeerIDs() []uint32

PeerIDs returns a snapshot of all peer IDs with installed Cryptos.

func (*Store) RecordSalvage

func (s *Store) RecordSalvage(c *Crypto, plaintext []byte)

RecordSalvage stashes a plaintext send into the per-peer ring buffer. On a subsequent peer-initiated rekey, DrainSalvage will hand back the entries; the caller re-encrypts with the new key and re-sends — recovering data that the peer dropped because our frame was keyed under their now-stale crypto context.

Bounded by SalvageMaxEntries and SalvageMaxAge. The plaintext is copied, not referenced — caller can reuse its buffer.

A nil c is a no-op (safe for callers that don't gate on Ready).

func (*Store) SetLocalNodeID

func (s *Store) SetLocalNodeID(id uint32)

SetLocalNodeID stores our own node ID (used in encrypt-side AAD and secure-frame headers). Safe to call concurrently with other Store operations.

func (*Store) ShouldDropOnDecryptFail

func (s *Store) ShouldDropOnDecryptFail(peerNodeID uint32, c *Crypto) bool

ShouldDropOnDecryptFail implements the rc5 grace gate: returns true if the failing Crypto's DecryptFailCount has reached DecryptFailDropThreshold AND the Crypto is older than DecryptFailDropGrace AND it is still the currently-installed entry for peerNodeID.

The caller (L6 framing path) is responsible for the side-effect: drop via CompareAndDrop and trigger a fresh key exchange. This split keeps the rekey-trigger / event-publish responsibility OUT of Store.

func (*Store) ShouldDropOnOutsideWindow

func (s *Store) ShouldDropOnOutsideWindow(peerNodeID uint32, c *Crypto) bool

ShouldDropOnOutsideWindow mirrors ShouldDropOnDecryptFail for the outside-replay-window divergence path. Returns true when the peer's OutsideWindowCount has reached OutsideWindowDropThreshold AND the Crypto is older than OutsideWindowDropGrace AND it is still the currently-installed entry for peerNodeID.

The caller (L7 handleEncrypted) is responsible for dropping the Crypto via CompareAndDrop and triggering a fresh key exchange when this returns true. This is the symmetric-state recovery for the case where our receive-window's high-water-mark has drifted so far past the peer's actual send counter that no in-band signal exists. Reproduces the rc3 list-agents bug on 2026-05-11 where MaxRecvNonce stayed at 8518 while the sender's outbound counter sat at ~400.

func (*Store) ShouldDropOnReplay

func (s *Store) ShouldDropOnReplay(peerNodeID uint32, c *Crypto) bool

ShouldDropOnReplay is the symmetric counterpart to ShouldDropOnOutsideWindow for the in-window replay-collision path. Returns true when ReplayCount has reached ReplayDropThreshold AND the Crypto is older than ReplayDropGrace AND it is still the currently-installed entry for peerNodeID.

The caller (L7 handleEncrypted) is responsible for the side effect: drop via CompareAndDrop and request a fresh key exchange. This reproduces the recovery for the rc5 list-agents wedge (2026-05-11), where peer daemons restarted but kept their persisted X25519 identity. No PILA was sent, their send counter reset to 1 while our MaxRecvNonce stayed at ~50, and every subsequent frame from the peer landed on an already-marked counter and was dropped before AEAD-Open ever ran. Neither DecryptFailCount nor OutsideWindowCount incremented, so the existing recovery gates never engaged.

type VerifyFunc

type VerifyFunc func(nodeID uint32) (ed25519.PublicKey, error)

VerifyFunc fetches the canonical Ed25519 public key for nodeID (typically from the registry).
