Documentation ¶
Overview ¶
Package encryption implements the AES-256-GCM envelope encryption scheme described in docs/design/2026_04_29_proposed_data_at_rest_encryption.md §4.
Stage 0 (foundation) provides the primitive Encrypt/Decrypt operations, the wire format envelope encoder/decoder, and the in-memory keystore. Composition of AAD bytes for storage-layer envelopes (§4.1) and raft-layer envelopes (§4.2) is the responsibility of callers in store/ and internal/raftengine/etcd/, added in later stages.
Wire format (§4.1):
    +--------+------+---------+----------+-----------+--------+
    |  0x01  | flag | key_id  |  nonce   | ciphertext|  tag   |
    | 1 byte | 1 B  | 4 bytes | 12 bytes |  N bytes  |  16 B  |
    +--------+------+---------+----------+-----------+--------+
Per-value overhead is 34 bytes (HeaderSize + TagSize).
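The layout above can be illustrated with a minimal sketch. The envelope type and the encode/decode helpers below are hypothetical names, not this package's API, and the big-endian byte order for key_id is an assumption carried over from the raft AAD and registry key descriptions.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Sizes taken from the wire-format diagram.
const (
	versionV1  = 0x01
	nonceSize  = 12
	tagSize    = 16
	headerSize = 1 + 1 + 4 + nonceSize // version + flag + key_id + nonce = 18
	overhead   = headerSize + tagSize  // 34
)

// envelope is a hypothetical in-memory view of one encoded value.
type envelope struct {
	Flag  byte
	KeyID uint32
	Nonce [nonceSize]byte
	Body  []byte // ciphertext followed by the 16-byte GCM tag
}

// encode lays the fields out in wire order (key_id assumed big-endian).
func encode(e envelope) []byte {
	out := make([]byte, 0, headerSize+len(e.Body))
	out = append(out, versionV1, e.Flag)
	out = binary.BigEndian.AppendUint32(out, e.KeyID)
	out = append(out, e.Nonce[:]...)
	return append(out, e.Body...)
}

// decode rejects short inputs and unknown version bytes, then splits the header.
func decode(b []byte) (envelope, error) {
	if len(b) < overhead {
		return envelope{}, errors.New("envelope shorter than header+tag")
	}
	if b[0] != versionV1 {
		return envelope{}, errors.New("unknown envelope version")
	}
	e := envelope{Flag: b[1], KeyID: binary.BigEndian.Uint32(b[2:6])}
	copy(e.Nonce[:], b[6:headerSize])
	e.Body = b[headerSize:]
	return e, nil
}

func main() {
	// A 5-byte ciphertext plus its tag: 5 + 34 bytes on the wire.
	wire := encode(envelope{Flag: 1, KeyID: 7, Body: make([]byte, 5+tagSize)})
	fmt.Println(len(wire)) // 39
}
```

The sketch does no encryption; it only shows where each field sits and why the minimum valid envelope is 34 bytes.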
Index ¶
- Constants
- Variables
- func AppendHeaderAADBytes(dst []byte, version, flag byte, keyID uint32) []byte
- func BuildRaftAAD(version byte, keyID uint32) []byte
- func BumpLocalEpoch(sidecarPath string, dekID uint32) (uint16, error)
- func CheckLocalEpochRollback(registry WriterRegistryStore, fullNodeID uint64, activeStorageDEKID uint32, ...) error
- func CheckNodeIDCollision(fullNodeIDs []uint64) error
- func CheckStartupGuards(cfg StartupConfig) error
- func DecodeRegistryKey(key []byte) (dekID uint32, nodeID16 uint16, err error)
- func EncodeRegistryValue(rv RegistryValue) []byte
- func GuardSidecarBehindRaftLog(sidecarAppliedIdx, engineAppliedIdx uint64, scanner EncryptionRelevantScanner) error
- func HeaderAADBytes(version, flag byte, keyID uint32) []byte
- func HydrateKeystoreFromSidecar(ks *Keystore, kek KEKUnwrapper, sc *Sidecar) error
- func IsEncryptionRelevantOpcode(opcode byte) bool
- func IsNotExist(err error) bool
- func NodeID16(fullNodeID uint64) uint16
- func ProbeSidecarFilesystem(sidecarPath string) error
- func RegistryDEKPrefix(dekID uint32) []byte
- func RegistryKey(dekID uint32, nodeID16 uint16) []byte
- func UnwrapRaftPayload(c *Cipher, encoded []byte) ([]byte, error)
- func WrapRaftPayload(c *Cipher, keyID uint32, nonce, payload []byte) ([]byte, error)
- func WriteSidecar(path string, sc *Sidecar) (retErr error)
- type ActiveKeys
- type Applier
- func (a *Applier) ActiveStorageKeyID() (uint32, bool)
- func (a *Applier) ApplyBootstrap(raftIdx uint64, p fsmwire.BootstrapPayload) error
- func (a *Applier) ApplyRegistration(p fsmwire.RegistrationPayload) error
- func (a *Applier) ApplyRotation(raftIdx uint64, p fsmwire.RotationPayload) error
- func (a *Applier) StateCache() *StateCache
- func (a *Applier) StorageEnvelopeActive() bool
- type ApplierOption
- type Cipher
- type DeterministicNonceFactory
- type EncryptionRelevantScanner
- type Envelope
- type KEKUnwrapper
- type Keystore
- func (k *Keystore) AEAD(keyID uint32) (cipher.AEAD, bool)
- func (k *Keystore) DEK(keyID uint32) ([KeySize]byte, bool)
- func (k *Keystore) Delete(keyID uint32)
- func (k *Keystore) Has(keyID uint32) bool
- func (k *Keystore) IDs() []uint32
- func (k *Keystore) Len() int
- func (k *Keystore) Set(keyID uint32, dek []byte) error
- type RegistryValue
- type Sidecar
- type SidecarKey
- type StartupConfig
- type StateCache
- type WriterRegistryStore
Constants ¶
const (
	// EnvelopeVersionV1 is the current envelope format version. §11.3
	// reserves 0x02..0x0F for future authenticated formats. The current
	// build only understands 0x01; ANY other version byte (including the
	// 0x02..0x0F reserved range) causes DecodeEnvelope to return
	// ErrEnvelopeVersion. Future decoders that know how to handle the
	// reserved range will widen this check.
	EnvelopeVersionV1 byte = 0x01

	// FlagCompressed (bit 0) is set when ciphertext encrypts a Snappy-
	// compressed plaintext (§6.4). The flag participates in the AAD so a
	// post-hoc bit-flip is rejected by GCM verification.
	FlagCompressed byte = 1 << 0

	// KeySize is the AES-256 key length in bytes.
	KeySize = 32

	// NonceSize is the AES-GCM standard nonce size in bytes.
	NonceSize = 12

	// TagSize is the AES-GCM authentication tag size in bytes.
	TagSize = 16

	// HeaderAADSize covers version + flag + key_id (the bytes that
	// participate in storage AAD, distinct from the full envelope header
	// which also carries nonce). Exposed as the input length of
	// HeaderAADBytes.
	HeaderAADSize = versionBytes + flagBytes + keyIDBytes // 6

	// HeaderSize covers version + flag + key_id + nonce, in that order.
	HeaderSize = HeaderAADSize + NonceSize // 18

	// EnvelopeOverhead is the per-value byte overhead introduced by the
	// envelope: HeaderSize + TagSize.
	EnvelopeOverhead = HeaderSize + TagSize // 34

	// ReservedKeyID is the cluster-wide "not bootstrapped" sentinel
	// (§5.1). Implementations MUST refuse to install or look up this
	// key_id.
	ReservedKeyID uint32 = 0
)
Public constants for the §4.1 wire format.
const (
	SidecarPurposeStorage = "storage"
	SidecarPurposeRaft    = "raft"
)
SidecarPurposeStorage / SidecarPurposeRaft are the only purposes the reader recognises. Stage 6 may add more.
const MinReadableSidecarVersion = 1
MinReadableSidecarVersion is the oldest wire version ReadSidecar accepts. Set to 1 so a freshly-upgraded post-6D-4 node can read the legacy pre-6D-4 sidecar that an in-place upgrade leaves on disk; the very next WriteSidecar transitions it to SidecarVersion. A future major version bump would raise this constant to drop legacy-read support.
const RaftAADPurpose byte = 'R' // 0x52
RaftAADPurpose is the literal byte 'R' (0x52) that prefixes the raft-envelope AAD per design §4.2. It distinguishes a raft envelope from a storage envelope: a storage-layer ciphertext replayed into the raft layer (or the reverse) fails GCM verification because the AAD prefix does not match.
const SidecarFilename = "keys.json"
SidecarFilename is the standard filename inside <dataDir>/encryption/.
const SidecarTmpFilename = SidecarFilename + ".tmp"
SidecarTmpFilename is the filename used for the §5.1 crash-durable write protocol's intermediate write.
const SidecarVersion = 2
SidecarVersion is the current wire version written by WriteSidecar. Versions 1 (pre-6D-4) and 2 (6D-4 and later) are readable. Future versions extend the layout via additive JSON fields plus a bump here; mismatched versions are rejected at read time so an older binary cannot silently drop fields it does not understand.
Version 2 adds storage_envelope_cutover_index (6D-4 §6.4). The upgrade path is automatic: a post-6D-4 binary reads a version-1 sidecar with the new field defaulting to 0, then writes version 2 on the next sidecar mutation (WriteSidecar overrides the in-struct Version with SidecarVersion). A subsequent downgrade to a pre-6D-4 binary then refuses to boot because that binary only accepts version 1 — preventing the silent field-drop that the design doc §6.4 warns about (gemini medium #2 on PR804).
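The upgrade and downgrade behaviour described above reduces to a range check on the version number. The sketch below is illustrative only: readable is a hypothetical helper, and it assumes a pre-6D-4 binary has both its minimum and current version set to 1, as the text implies.

```go
package main

import "fmt"

// readable reports whether a binary that accepts versions in
// [minReadable, current] can load a sidecar written at version v.
func readable(v, minReadable, current int) bool {
	return v >= minReadable && v <= current
}

func main() {
	// Post-6D-4 binary (MinReadableSidecarVersion=1, SidecarVersion=2):
	// it loads both the legacy and the current layout.
	fmt.Println(readable(1, 1, 2), readable(2, 1, 2)) // true true

	// Pre-6D-4 binary (assumed to accept only version 1): once the sidecar
	// has been rewritten at version 2, the older binary refuses it.
	fmt.Println(readable(2, 1, 1)) // false
}
```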
Variables ¶
var (
	// ErrUnknownKeyID is returned when a wrap/unwrap call references a key_id
	// that is not present in the Keystore. Surfaces as `unknown_key_id` on
	// the §9.2 elastickv_encryption_decrypt_failures_total counter.
	ErrUnknownKeyID = errors.New("encryption: unknown key_id")

	// ErrReservedKeyID is returned when a caller tries to install or use
	// key_id 0; that value is reserved cluster-wide as the
	// "not bootstrapped" sentinel per §5.1.
	ErrReservedKeyID = errors.New("encryption: key_id 0 is reserved as the not-bootstrapped sentinel")

	// ErrBadNonceSize indicates the nonce passed to Encrypt/Decrypt was not
	// exactly NonceSize bytes.
	ErrBadNonceSize = errors.New("encryption: nonce size invalid")

	// ErrBadKeySize indicates the DEK passed to Keystore.Set was not exactly
	// KeySize bytes (AES-256 requires 32).
	ErrBadKeySize = errors.New("encryption: DEK size invalid")

	// ErrIntegrity indicates a GCM tag mismatch on Decrypt — i.e., the
	// ciphertext was tampered with, the AAD does not match the one used at
	// Encrypt, or the wrong DEK is loaded. Per §4.1, callers MUST treat this
	// as a typed read error and never silently zero or retry.
	ErrIntegrity = errors.New("encryption: integrity check failed (GCM tag mismatch)")

	// ErrEnvelopeShort indicates DecodeEnvelope received fewer bytes than the
	// minimum envelope size (HeaderSize + TagSize).
	ErrEnvelopeShort = errors.New("encryption: envelope shorter than header+tag")

	// ErrEnvelopeVersion indicates DecodeEnvelope saw a version byte the
	// current build does not know how to parse. Reserved values per §11.3.
	ErrEnvelopeVersion = errors.New("encryption: unknown envelope version")

	// ErrNilKeystore indicates NewCipher was called with a nil Keystore.
	// Surfaced at construction time so a wiring mistake is caught
	// before the first Encrypt/Decrypt would otherwise nil-deref panic.
	ErrNilKeystore = errors.New("encryption: keystore is nil")

	// ErrKeyConflict indicates Keystore.Set was called with a keyID
	// already loaded under DIFFERENT key material. Replacing live key
	// bytes for an in-use key_id would render every envelope already
	// persisted under that id undecryptable, so Set fails closed
	// rather than silently overwriting. Set with the SAME bytes is
	// idempotent (returns nil) and does not raise this error.
	ErrKeyConflict = errors.New("encryption: key_id already loaded with different key material")

	// ErrUnsupportedFilesystem indicates the parent directory of the
	// sidecar cannot guarantee crash-durability of os.Rename via
	// fsync (typical on NFS, some FUSE mounts). Per §5.1 the
	// encryption package refuses to start in that situation rather
	// than silently degrading the durability guarantee. Two paths
	// surface this sentinel:
	//
	//   - WriteSidecar wraps any fsync-on-directory failure on the
	//     real keys.json write path. This catches the failure on the
	//     first encryption-relevant write — which on a fresh cluster
	//     may be hours into operation, well past the point where
	//     catching the misconfiguration would have been cheap.
	//
	//   - ProbeSidecarFilesystem (Stage 6C-2) runs at process startup
	//     and exercises the same write+rename+dir.Sync sequence on a
	//     sentinel file, then deletes it. The probe surfaces the
	//     failure BEFORE any encryption-relevant Raft entry commits,
	//     so the operator gets the unambiguous startup-time refusal
	//     rather than a halted apply loop later.
	ErrUnsupportedFilesystem = errors.New("encryption: filesystem does not support durable directory sync (NFS, some FUSE mounts are unsupported)")

	// ErrSidecarActiveKeyMissing indicates the Sidecar has a non-zero
	// Active.{Storage,Raft} key_id that does not appear in the Keys
	// map. The two halves are written together by every rotation /
	// bootstrap path; an Active id without a corresponding wrapped
	// DEK is malformed input.
	ErrSidecarActiveKeyMissing = errors.New("encryption: sidecar active key_id has no entry in keys map")

	// ErrSidecarActivePurposeMismatch indicates the Sidecar has a
	// non-zero Active.{Storage,Raft} key_id pointing to a Keys entry
	// whose Purpose does not match the slot. e.g., active.storage=7
	// but Keys["7"].purpose == "raft". Crossed pointers would route
	// the wrong DEK into a purpose-specific encryption path after
	// restart or rotation, so the reader fails closed.
	ErrSidecarActivePurposeMismatch = errors.New("encryption: sidecar active key_id references a key with mismatched purpose")

	// ErrEncryptionApply is the §6.3 / §11.3 fatal-apply sentinel
	// surfaced by encryption FSM handlers (kv/fsm_encryption.go)
	// when one of the opcodes (0x03 registration, 0x04 bootstrap,
	// 0x05 rotation) cannot be applied — malformed payload,
	// KEK-unwrap failure, local-epoch rollback, sidecar write
	// failure, etc.
	//
	// The FSM packs this error in a haltApplyResponse value;
	// internal/raftengine/etcd's applyNormalCommitted recognises
	// the HaltApply interface, returns the error, and runLoop's
	// fatal-error path takes the process down without advancing
	// setApplied — the next restart must replay the entry.
	//
	// Defined here (and not in internal/raftengine/etcd) so
	// kv/fsm_encryption.go can errors.Mark its handler outputs
	// without importing the engine package, which would close
	// the kv ↔ engine cycle (engine_test imports kv as a fake
	// FSM).
	ErrEncryptionApply = errors.New("encryption: FSM apply failed; halting apply (see design §6.3)")

	// ErrKEKNotConfigured is the defense-in-depth marker returned
	// by Applier.ApplyBootstrap and Applier.ApplyRotation when no
	// KEK unwrapper is wired (Stage 6A scaffolding before Stage 6B
	// fills the KEK plumbing). It is wrapped with ErrEncryptionApply
	// at the kv/fsm dispatch layer so it routes through the same
	// HaltApply seam as any other applier error.
	//
	// Per PR #762's Stage 6 plan, the production safety boundary
	// is the gRPC-layer mutator gate in registerEncryptionAdminServer
	// (which Stage 5D left OFF, and which 6B re-enables only when
	// both --encryption-enabled is set AND KEKConfigured() is true);
	// this typed error exists so a future refactor that bypasses
	// that gate produces a named, grep-able failure mode rather
	// than a nil-pointer panic deep in the apply path.
	ErrKEKNotConfigured = errors.New("encryption: KEK not configured on this node; cannot unwrap wrapped DEK material")

	// ErrWriterUint16Collision is the Stage 7c §3.3 typed error returned
	// by the encryption-aware MembershipChangeInterceptor when a new
	// node's NodeID16 collides with an existing member's writer-
	// registry row at the same uint16 truncation but with a different
	// FullNodeID. The §3.1 read-before-propose guard catches the
	// common case BEFORE any propose, so the admin RPC returns a
	// retryable client-facing error rather than triggering a §4.1
	// case-4 halt apply (which would, without 7c's interceptor, fire
	// AFTER the conf-change had already committed — bricking the
	// cluster). Operators choose a non-colliding raftID and retry.
	ErrWriterUint16Collision = errors.New("encryption: writer registry uint16 collision; choose a different raftID")

	// ErrSidecarPresentWithoutFlag is the §9.1 startup-refusal guard
	// raised when the data dir already contains a sidecar (keys.json)
	// but --encryption-enabled is NOT set. Continuing would silently
	// downgrade the cluster to cleartext: new writes would land
	// unencrypted while the prior wrapped DEKs sit untouched on disk,
	// inverting the operator's intent. The process refuses to start
	// rather than half-honor a prior bootstrap. Recovery is either
	// to set --encryption-enabled (resume encrypted operation) or to
	// move/delete the sidecar with deliberate runbook acknowledgement.
	ErrSidecarPresentWithoutFlag = errors.New("encryption: sidecar present on disk but --encryption-enabled is not set; refusing to start to avoid silent downgrade to cleartext (set --encryption-enabled or remove the sidecar per the §9.1 runbook)")

	// ErrKEKRequiredWithFlag is the §9.1 startup-refusal guard
	// raised when --encryption-enabled is set but no KEK source
	// (--kekFile) is supplied. Without a KEK the applier cannot
	// unwrap any sidecar DEK, so the first mutating EncryptionAdmin
	// RPC would HaltApply on every replica via ErrKEKNotConfigured.
	// The Stage 6B-2 mutator gate keeps that path unreachable at
	// the RPC boundary, but a flag-on / KEK-off node is misconfigured
	// and the operator's clear intent (enable encryption) cannot be
	// satisfied. Fail fast at startup rather than discover the
	// mismatch later via a halted apply loop.
	ErrKEKRequiredWithFlag = errors.New("encryption: --encryption-enabled is set but no KEK source (--kekFile) was provided; refusing to start (set --kekFile or unset --encryption-enabled)")

	// ErrKEKMismatch is the §9.1 startup-refusal guard raised when
	// the data dir contains a sidecar whose wrapped DEKs do NOT
	// decrypt under the configured KEK. The classic operator
	// error this catches is "wrong --kekFile points at a key from
	// a different cluster / environment" — continuing past it
	// would render every encrypted value on disk effectively
	// permanently lost the moment a write tries to use the wrong
	// DEK. Recovery requires the operator to either point
	// --kekFile at the correct KEK file or restore the data dir
	// from a backup that matches the supplied KEK.
	ErrKEKMismatch = errors.New("encryption: configured KEK cannot unwrap one or more wrapped DEKs in the sidecar; refusing to start (verify --kekFile matches the KEK that bootstrapped this data dir)")

	// ErrLocalEpochExhausted is the §9.1 startup-refusal guard
	// raised when any active DEK in the sidecar has reached the
	// uint16 saturation value (0xFFFF). The §4.1 nonce construction
	// reserves only 16 bits for local_epoch, so a node that already
	// emitted a nonce with local_epoch == 0xFFFF cannot safely
	// emit another one under the same DEK without rolling the
	// counter back to 0 and re-issuing a nonce that has already
	// been used (GCM catastrophic — distinct plaintexts encrypted
	// under the same (key, nonce) pair reveal plaintext XOR via
	// the keystream). Recovery is a deliberate DEK rotation (§5.2)
	// which retires the exhausted DEK; the next process startup
	// then sees a fresh DEK with local_epoch=0 and can proceed.
	//
	// The check runs at process startup, before any apply or new
	// write can reach the cipher; an active DEK with
	// local_epoch==0xFFFF is a guaranteed-future nonce-reuse
	// liability that must be rotated before the node is allowed
	// to participate.
	ErrLocalEpochExhausted = errors.New("encryption: active DEK has reached local_epoch=0xFFFF saturation; refusing to start (rotate the affected DEK via `encryption rotate-dek` before the next startup or risk GCM nonce reuse — see §4.1)")

	// ErrWriteCountExhausted is returned by DeterministicNonceFactory.Next
	// when the 64-bit §4.1 write_count counter wraps within a single
	// process load. A wrap re-issues write_count=1 under the same
	// (node_id, local_epoch), recycling every GCM nonce already
	// produced this load under the active DEK — catastrophic. The
	// boundary is unreachable in any realistic deployment (2^64 ≈
	// 1.8e19 writes per load), but the factory fails closed rather
	// than silently wrapping. Recovery is a process restart, which
	// bumps local_epoch (BumpLocalEpoch) and resets write_count to a
	// fresh, non-overlapping range.
	ErrWriteCountExhausted = errors.New("encryption: nonce write_count exhausted (uint64 wrap) within a process load; restart to bump local_epoch — see §4.1")

	// ErrSidecarBehindRaftLog is the §9.1 startup-refusal guard
	// raised when the sidecar's raft_applied_index is behind the
	// raftengine's persisted applied index AND the gap covers any
	// SIDECAR-MUTATING Raft entry (per §5.5's
	// IsEncryptionRelevantOpcode predicate: 0x04 OpBootstrap,
	// 0x05 OpRotation, plus the reserved opcodes 0x06 / 0x07
	// from the fsmwire OpEncryption range). 0x03 OpRegistration
	// is in the OpEncryption range but is NOT sidecar-mutating
	// — its apply path writes writer-registry rows only and
	// never WriteSidecar, so registration-only gaps would be
	// spurious refusals.
	//
	// The classic scenario this catches is a partial-write crash:
	// an encryption-relevant entry was applied to the engine and
	// the engine's applied index was advanced, but the §5.1
	// sidecar write that should have committed the corresponding
	// keys.json change did not complete. On the next startup the
	// node sees a stale sidecar that does not reflect the
	// already-Raft-committed mutation, and the encryption package
	// would silently rebuild keystore state from an outdated
	// wrapped-DEK snapshot — a fail-closed read would fire on the
	// next post-cutover entry with `unknown_key_id`, halting apply.
	//
	// Refusing at startup with this typed error means the operator
	// sees a single unambiguous failure pointing at the right
	// runbook (`encryption resync-sidecar`) rather than a
	// downstream HaltApply on the first post-cutover Raft entry.
	//
	// Restricting the gap check to specific encryption opcodes
	// (rather than ANY gap) keeps non-encryption-relevant lag
	// from forcing a spurious refusal. See §5.5 for the
	// IsEncryptionRelevantOpcode rationale.
	ErrSidecarBehindRaftLog = errors.New("encryption: sidecar raft_applied_index is behind the raftengine's persisted applied index and the gap covers an encryption-relevant entry; refusing to start (run `encryption resync-sidecar` to advance the sidecar past the encryption-relevant entries before retrying — see §9.1 + §5.5)")

	// ErrNodeIDCollision is the §9.1 / 6C-3 startup-refusal guard
	// raised when two distinct `full_node_id` values in the local
	// route-catalog snapshot map to the same 16-bit `node_id`. The
	// derivation is `uint16(full_node_id & 0xFFFF)` — same narrowing
	// the writer-registry keying and §4.1 GCM nonce prefix already
	// use (see `applier.go` `nodeIDMask`). A collision means the two
	// members would write under the same `(node_id, local_epoch)`
	// prefix and reuse the GCM counter under the same DEK, which
	// breaks AES-GCM's nonce-uniqueness requirement and is a
	// catastrophic confidentiality + integrity failure.
	//
	// The guard reads the LOCAL route-catalog snapshot only — no
	// RPC fan-out — because the startup-guard phase runs BEFORE the
	// gRPC server is up, and the local Raft log is authoritative
	// for cluster membership (every node applies the same
	// `ConfChange` entries). See `2026_05_18_proposed_6d_enable_storage_envelope.md`
	// §5.1 for the lifecycle rationale and the operator-mitigation
	// menu (`probe-node-id` CLI, full_node_id re-roll, etc.).
	//
	// Skip conditions: encryption disabled (no nonce-reuse risk),
	// empty membership snapshot (single-node pre-bootstrap; nothing
	// to compare yet).
	ErrNodeIDCollision = errors.New("encryption: two distinct full_node_id values in the local route-catalog hash to the same 16-bit node_id, which would reuse GCM nonces under the active DEK; refusing to start (re-roll one of the colliding full_node_id values — use `elastickv-admin encryption probe-node-id --full-node-id=<u64>` to verify before joining — see §9.1 + §5.1 of the 6D design doc)")

	// ErrLocalEpochRollback is the §9.1 / 6C-3 startup-refusal
	// guard raised when this node's sidecar `local_epoch` for the
	// active storage DEK is less than OR equal to the local Pebble
	// writer-registry's `LastSeenLocalEpoch` for the
	// `(full_node_id, active_storage_dek_id)` row. The strict-ahead
	// posture — `sidecar > registry`, not `>=` — is the §5.2
	// nonce-monotonicity invariant: a node may only resume issuing
	// nonces when its sidecar is STRICTLY ahead of the registry
	// record. The equality case would replay the same
	// `(node_id, local_epoch)` prefix and reuse the GCM counter
	// under the same DEK, identical to the collision scenario
	// `ErrNodeIDCollision` catches but at the single-node-restart
	// timescale rather than the cluster-membership timescale.
	//
	// Classic trigger: operator restored the sidecar from an old
	// backup, leaving `local_epoch` behind the registry record
	// that the node has already used for prior writes.
	//
	// The guard reads LOCAL Pebble state only — no RPC. The
	// writer-registry is local state on every node post-bootstrap
	// (every node applies the same OpRegistration and OpBootstrap
	// entries), so the local row is authoritative for this node's
	// own monotonicity contract.
	//
	// Skip conditions: encryption disabled, sidecar's
	// `Active.Storage == 0` (bootstrap not yet committed). The
	// missing-registry-row case is NOT an unconditional skip —
	// the §5.2 split fires based on `sidecar.StorageEnvelopeActive`:
	// pre-cutover (active=false) the missing-row case is a
	// freshly-joined-learner skip and `CheckLocalEpochRollback`
	// returns nil; post-cutover (active=true) the missing-row
	// case is a missing rollback anchor and the function fires
	// `ErrLocalEpochRollback`. See the comprehensive doc on
	// `CheckLocalEpochRollback` (`local_epoch_rollback.go`) for
	// the full split.
	ErrLocalEpochRollback = errors.New("encryption: sidecar local_epoch for the active storage DEK is at or below the local writer-registry's last_seen_local_epoch, which would replay the GCM nonce prefix under the same DEK; refusing to start (verify the sidecar was not restored from an old backup; run `encryption resync-sidecar` if appropriate, or rotate the affected DEK — see §9.1 + §5.2 of the 6D design doc)")
)
var (
	// ErrRegistryKeyMalformed indicates a Pebble key returned to
	// DecodeRegistryKey did not match the WriterRegistryPrefix +
	// fixed-suffix shape produced by RegistryKey.
	ErrRegistryKeyMalformed = errors.New("encryption: writer registry key is malformed")

	// ErrRegistryValueMalformed indicates a Pebble value returned to
	// DecodeRegistryValue did not match the fixed registryValueSize
	// layout. Stage 4 callers fail closed on this — a corrupted
	// registry row is fail-fast rather than silently treated as
	// "no entry".
	ErrRegistryValueMalformed = errors.New("encryption: writer registry value is malformed")
)
Errors surfaced by registry codec helpers.
var (
	// ErrSidecarVersion indicates ReadSidecar saw a wire version this
	// build does not know how to parse. Use the message and the offending
	// version to decide whether to upgrade the binary or fall back.
	ErrSidecarVersion = errors.New("encryption: unsupported sidecar version")

	// ErrSidecarPurpose indicates a Sidecar.Keys entry has a "purpose"
	// field outside the recognised set ({"storage","raft"}). The reader
	// fails closed rather than silently treating an unknown purpose as a
	// known one — a typo'd or future-version sidecar must be the
	// operator's explicit upgrade decision.
	ErrSidecarPurpose = errors.New("encryption: unsupported sidecar key purpose")

	// ErrSidecarKeyIDFormat indicates a Sidecar.Keys map key was not a
	// decimal uint32 string per §5.1.
	ErrSidecarKeyIDFormat = errors.New("encryption: sidecar key_id is not a decimal uint32")

	// ErrSidecarReservedKeyID indicates a Sidecar.Keys map carries
	// key_id 0, which §5.1 reserves as the "not bootstrapped" sentinel.
	// On-disk presence of 0 in the keys map is malformed input.
	ErrSidecarReservedKeyID = errors.New("encryption: sidecar key_id 0 is reserved")
)
Errors returned by sidecar I/O.
var WriterRegistryPrefix = []byte("!encryption|writers|")
WriterRegistryPrefix reserves the Pebble key prefix used by the §4.1 writer registry. Format: `!encryption|writers|<be4 dek_id>|<be2 uint16(node_id)>`.
The leading `!` and the pipe-separated layout match the existing `!admin|` reservation in adapter/distribution_server.go and the `txnInternalKeyPrefix` reservation in store/. The pebbleStore's applyMutationsBatch refuses user mutations whose key starts with this prefix so a malformed user Put cannot clobber registry rows.
Registry rows are written through pebbleStore.EncryptionRegistryBatch (Stage 4 admin path), NOT through the MVCC encoded-key flow — these are operational state, not versioned user values.
Functions ¶
func AppendHeaderAADBytes ¶
AppendHeaderAADBytes appends the same 6-byte header prefix (version, flag, key_id) that HeaderAADBytes returns onto dst and returns the extended slice. Allocation-free when dst already has HeaderAADSize spare capacity, which lets storage callers in later stages write the AAD directly into a pooled buffer alongside the per-record context (e.g., pebble_key) without an intermediate make().
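A minimal sketch of the append-style pattern follows. appendHeaderAAD is a hypothetical stand-in, the big-endian key_id order is an assumption, and placing the record key after the header is chosen only for illustration; the actual storage AAD composition belongs to callers in later stages.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const headerAADSize = 6 // version + flag + key_id

// appendHeaderAAD appends version, flag and key_id (assumed big-endian) to dst.
// When dst has at least headerAADSize spare capacity, append reuses the
// existing backing array instead of allocating.
func appendHeaderAAD(dst []byte, version, flag byte, keyID uint32) []byte {
	dst = append(dst, version, flag)
	return binary.BigEndian.AppendUint32(dst, keyID)
}

func main() {
	key := []byte("pebble-key")
	// buf stands in for a pooled buffer sized for header plus record context.
	buf := make([]byte, 0, headerAADSize+len(key))

	aad := appendHeaderAAD(buf, 0x01, 0x01, 7)
	aad = append(aad, key...)

	// Same capacity as buf: nothing was reallocated along the way.
	fmt.Println(len(aad), cap(aad) == cap(buf)) // 16 true
}
```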
func BuildRaftAAD ¶
BuildRaftAAD composes the §4.2 raft-envelope AAD: a single-byte purpose tag ('R'), the envelope version, and the 4-byte big-endian key_id. Exposed for tests; production callers go through WrapRaftPayload / UnwrapRaftPayload.
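The following sketch composes a raft-style AAD in the order described above and uses the standard library's AES-GCM to show why a mismatched AAD prefix fails verification. buildRaftAAD and headerAAD are hypothetical helpers, the all-zero key and nonce are for demonstration only, and the storage-side AAD here is reduced to the bare header bytes.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

const raftAADPurpose byte = 'R' // 0x52

// buildRaftAAD returns purpose tag, envelope version, big-endian key_id.
func buildRaftAAD(version byte, keyID uint32) []byte {
	return binary.BigEndian.AppendUint32([]byte{raftAADPurpose, version}, keyID)
}

// headerAAD returns the storage-style header bytes: version, flag, key_id.
func headerAAD(version, flag byte, keyID uint32) []byte {
	return binary.BigEndian.AppendUint32([]byte{version, flag}, keyID)
}

func main() {
	key := make([]byte, 32) // all-zero demo key; never do this in production
	block, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, gcm.NonceSize()) // fixed nonce, demo only

	sealed := gcm.Seal(nil, nonce, []byte("payload"), buildRaftAAD(0x01, 7))

	// Opening with the same raft AAD succeeds; a storage-style AAD does not.
	_, errSame := gcm.Open(nil, nonce, sealed, buildRaftAAD(0x01, 7))
	_, errCross := gcm.Open(nil, nonce, sealed, headerAAD(0x01, 0, 7))
	fmt.Println(errSame == nil, errCross != nil) // true true
}
```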
func BumpLocalEpoch ¶
BumpLocalEpoch increments the §4.1 local_epoch for the DEK at dekID, fsyncs the sidecar, and returns the new value. It MUST be called on every process load that will issue storage-envelope nonces, BEFORE the first such nonce — the §4.1 nonce construction resets the in-process write_count to 0 each load, and a bumped-and-fsync'd local_epoch is what keeps `node_id ‖ local_epoch ‖ write_count` unique across restarts. Without the bump a restart would re-issue `node_id ‖ epoch ‖ {0,1,2,…}` and recycle previously-used GCM nonces under the same DEK — catastrophic.
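To make the uniqueness argument concrete, here is a small sketch of the `node_id ‖ local_epoch ‖ write_count` layout. The field widths (16 + 16 + 64 bits, filling the 12-byte nonce) follow from the descriptions in this package; the big-endian encoding and the buildNonce helper are assumptions for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// buildNonce packs node_id (2 bytes), local_epoch (2 bytes) and
// write_count (8 bytes) into a 12-byte GCM nonce.
func buildNonce(nodeID16, localEpoch uint16, writeCount uint64) [12]byte {
	var n [12]byte
	binary.BigEndian.PutUint16(n[0:2], nodeID16)
	binary.BigEndian.PutUint16(n[2:4], localEpoch)
	binary.BigEndian.PutUint64(n[4:12], writeCount)
	return n
}

func main() {
	// First write of a process load at epoch 3.
	before := buildNonce(0x00A1, 3, 0)

	// After a restart write_count starts over. Without a bump the first
	// nonce repeats; with a bump the epoch field keeps it distinct.
	withoutBump := buildNonce(0x00A1, 3, 0)
	withBump := buildNonce(0x00A1, 4, 0)

	fmt.Println(before == withoutBump) // true: nonce reused
	fmt.Println(before == withBump)    // false
}
```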
The bump refuses at the uint16 ceiling: if the current epoch is already 0xFFFF the function returns ErrLocalEpochExhausted (the §9.1 CheckStartupGuards guard catches this earlier on the happy path; the check here is defense-in-depth for callers that bump without having run the guard). DEK rotation resets the epoch to 0 under a fresh DEK and is the only recovery.
Durability: the increment is persisted via WriteSidecar (the same write-temp + fsync + rename + dir-sync sequence the apply paths use), so a crash after the return guarantees the new epoch is on disk. A crash before the return leaves the old epoch and the next start bumps again — no nonce was issued in between because no store has opened yet.
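The write-temp + fsync + rename + dir-sync sequence is a common pattern on Unix-like systems and can be sketched as below. writeDurably and roundTrip are hypothetical helpers, not this package's WriteSidecar, and the sketch omits the error wrapping (such as ErrUnsupportedFilesystem) that the real function applies.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeDurably replaces path with data so that a crash leaves either the old
// or the new content on disk, never a partial file.
func writeDurably(path string, data []byte) (retErr error) {
	tmp := path + ".tmp"
	f, err := os.OpenFile(tmp, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}
	// Best-effort cleanup of the temp file if any later step fails.
	defer func() {
		if retErr != nil {
			_ = os.Remove(tmp)
		}
	}()
	if _, err := f.Write(data); err != nil {
		_ = f.Close()
		return err
	}
	if err := f.Sync(); err != nil { // flush file contents
		_ = f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	if err := os.Rename(tmp, path); err != nil { // atomic replace
		return err
	}
	dir, err := os.Open(filepath.Dir(path))
	if err != nil {
		return err
	}
	defer dir.Close()
	return dir.Sync() // persist the renamed directory entry
}

// roundTrip writes into a fresh temp directory and reads the file back.
func roundTrip(content string) (string, error) {
	dir, err := os.MkdirTemp("", "sidecar-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "keys.json")
	if err := writeDurably(path, []byte(content)); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := roundTrip(`{"version":2}`)
	fmt.Println(got, err)
}
```

The final directory sync is the step that can fail on filesystems such as NFS, which is why this package probes it at startup.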
func CheckLocalEpochRollback ¶
func CheckNodeIDCollision ¶
CheckNodeIDCollision is the §9.1 / 6C-3 startup-guard primitive for `ErrNodeIDCollision`. It walks the supplied list of `full_node_id` values (typically every voter + learner in the default group's local route-catalog snapshot), narrows each to its 16-bit `node_id` via the same `uint16(full_node_id & 0xFFFF)` mask that the writer-registry keying and §4.1 GCM nonce prefix use, and returns `ErrNodeIDCollision` if any two DISTINCT `full_node_id` values share the same `node_id`.
Skip conditions handled by the caller (the startup-guard wiring in main.go), not by this primitive:
- Encryption disabled (no nonce-reuse risk).
- Membership snapshot empty (single-node pre-bootstrap; nothing to compare yet).
This primitive does NOT consult any sidecar, registry, or RPC transport. It runs in the startup-before-serving phase where none of those are available yet (the gRPC server is not up).
Determinism: the caller may pass full_node_ids in any order; detection is symmetric. On a hit, the returned error wraps the two colliding `full_node_id` values and the shared `node_id` so the operator triage line names the conflict concretely.
Returns nil when:
- The slice has fewer than two unique values (no possible collision).
- No two distinct values map to the same 16-bit `node_id`.
Returns `ErrNodeIDCollision` wrapped with offending IDs when any two distinct `full_node_id` values share a `node_id`.
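The behaviour described above can be sketched with a map keyed by the narrowed 16-bit value. The names below are hypothetical lower-case stand-ins rather than the package's exported functions, and the error text is illustrative.

```go
package main

import (
	"errors"
	"fmt"
)

var errNodeIDCollision = errors.New("node_id collision")

// nodeID16 narrows a full node ID with the mask described in the text.
func nodeID16(fullNodeID uint64) uint16 { return uint16(fullNodeID & 0xFFFF) }

// checkNodeIDCollision returns an error when two DISTINCT full IDs narrow to
// the same 16-bit value. Repeats of the same full ID are not collisions.
func checkNodeIDCollision(fullNodeIDs []uint64) error {
	seen := make(map[uint16]uint64, len(fullNodeIDs))
	for _, id := range fullNodeIDs {
		n := nodeID16(id)
		if prev, ok := seen[n]; ok && prev != id {
			return fmt.Errorf("%w: full_node_id %#x and %#x share node_id %#x",
				errNodeIDCollision, prev, id, n)
		}
		seen[n] = id
	}
	return nil
}

func main() {
	fmt.Println(checkNodeIDCollision([]uint64{0x10001, 0x20002})) // <nil>

	// 0x10001 and 0x20001 both narrow to 0x0001.
	err := checkNodeIDCollision([]uint64{0x10001, 0x20001})
	fmt.Println(errors.Is(err, errNodeIDCollision)) // true
}
```

Because the map lookup does not depend on input order, detection is symmetric, matching the determinism note above.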
func CheckStartupGuards ¶
func CheckStartupGuards(cfg StartupConfig) error
CheckStartupGuards runs the §9.1 startup-refusal guards covered by Stage 6C-1 + 6C-2 and returns the first guard that fires. nil means every guard passed and the process is safe to proceed past startup into buildShardGroups / Raft engine wiring.
Scope (Stage 6C-1, PR #778):
- ErrSidecarPresentWithoutFlag: sidecar on disk but flag off (downgrade prevention)
- ErrKEKRequiredWithFlag: flag on but no KEK source (fail-fast)
- ErrKEKMismatch: flag on, KEK loaded, sidecar present, at least one wrapped DEK fails to unwrap under the configured KEK (operator-error catch)
Scope (Stage 6C-2, this PR):
- ErrLocalEpochExhausted: any active DEK in the sidecar has reached local_epoch == 0xFFFF (would-be GCM nonce-reuse on the next encrypted write). Refuse to start until rotated.
- ErrUnsupportedFilesystem: ProbeSidecarFilesystem exercises the §5.1 write+rename+dir.Sync sequence at startup so the filesystem incompatibility surfaces BEFORE the first encryption-relevant Raft entry, not on the first WriteSidecar call (which on a fresh node may be hours into operation).
Out of scope (deferred to later sub-milestones / Stage 6D / 6E):
- ErrNodeIDCollision — requires the cluster-wide Voters ∪ Learners membership view that Stage 6D's capability fan-out provides; node-local hashing cannot detect cross-node collisions on its own. Bundled with Stage 6C-3 / 6D.
- ErrLocalEpochRollback — needs the writer-registry record (Stage 7); the exhaustion peer ships here, the rollback peer ships once the registry is available. Bundled with Stage 6C-3.
- ErrSidecarBehindRaftLog — requires raftengine integration (read the persisted applied index) which is the same data path Stage 6D / 6E touch for the cutover gate; bundled there. Deferred from 6C-2 to keep this PR focused on guards that need only the encryption package's own state.
- Snapshot cutover divergence, raft-envelope-without-bootstrap, RPC local_epoch range — Stage 6E (raft cutover) / 6C-4.
The function reads the sidecar AT MOST ONCE (cached across every sidecar-reading guard) and is safe to call before any other encryption package state is constructed; it does NOT mutate the on-disk sidecar. ProbeSidecarFilesystem creates a sentinel file in the sidecar's parent directory and removes it before returning, leaving the on-disk state indistinguishable from before the call.
The single-read invariant matters as new sidecar-reading guards land in 6C-2b / 6C-3 / 6C-4: shipping each new guard with its own ReadSidecar call would silently grow the startup IO from O(1) to O(N guards), and a partial-write race between the guards' reads is harder to reason about than a single snapshot shared across all of them (claude r1 medium on PR #781).
func DecodeRegistryKey ¶
DecodeRegistryKey parses a registry-row key back into its (dek_id, uint16 node_id) tuple. Returns ErrRegistryKeyMalformed when the key does not start with WriterRegistryPrefix or has the wrong length. The decoder does NOT range-check the parsed dek_id against ReservedKeyID — that is the caller's policy.
func EncodeRegistryValue ¶
func EncodeRegistryValue(rv RegistryValue) []byte
EncodeRegistryValue serialises rv to its on-disk byte form. Always returns exactly registryValueSize bytes (12).
func GuardSidecarBehindRaftLog ¶
func GuardSidecarBehindRaftLog(sidecarAppliedIdx, engineAppliedIdx uint64, scanner EncryptionRelevantScanner) error
GuardSidecarBehindRaftLog implements the §9.1 ErrSidecarBehindRaftLog refusal logic as a pure function: it performs no I/O of its own and depends on nothing beyond the supplied indices and the caller-provided scanner.
The contract:
- If sidecarAppliedIdx >= engineAppliedIdx, the sidecar is caught up (or ahead, which would itself be a bug — but this guard's scope is "behind", not "ahead"). Return nil.
- If the gap is non-empty, ask the scanner whether any entry in (sidecarAppliedIdx, engineAppliedIdx] is encryption-relevant. If yes, fire ErrSidecarBehindRaftLog. If no, the gap is harmless and we return nil.
- Propagate any scanner I/O error wrapped with context but NOT marked as ErrSidecarBehindRaftLog — scanner failure is a different operator problem than the gap-coverage refusal.
The function does not consult flags or encryption state; it is the caller's responsibility to skip the call when --encryption-enabled is off (the gap is irrelevant in that case) or when the engine hasn't yet been opened (no applied index to compare against).
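The three-branch contract above can be sketched as follows. This is a minimal illustration, not the package's implementation: errSidecarBehind, rangeScanner, guardSidecarBehind and fixedScanner are hypothetical stand-ins for ErrSidecarBehindRaftLog, EncryptionRelevantScanner, the guard itself and a test double.

```go
package main

import (
	"errors"
	"fmt"
)

// errSidecarBehind is a hypothetical stand-in for ErrSidecarBehindRaftLog.
var errSidecarBehind = errors.New("sidecar behind raft log")

// rangeScanner mirrors the method set of EncryptionRelevantScanner.
type rangeScanner interface {
	HasEncryptionRelevantEntryInRange(startExclusive, endInclusive uint64) (bool, error)
}

// guardSidecarBehind follows the contract: nil when caught up, the refusal
// sentinel when the gap holds a relevant entry, and a plain wrapped error
// (not the sentinel) when the scanner itself fails.
func guardSidecarBehind(sidecarIdx, engineIdx uint64, s rangeScanner) error {
	if sidecarIdx >= engineIdx {
		return nil // caught up (or ahead, which is out of scope here)
	}
	relevant, err := s.HasEncryptionRelevantEntryInRange(sidecarIdx, engineIdx)
	if err != nil {
		return fmt.Errorf("scan (%d, %d]: %w", sidecarIdx, engineIdx, err)
	}
	if relevant {
		return fmt.Errorf("%w: gap (%d, %d]", errSidecarBehind, sidecarIdx, engineIdx)
	}
	return nil
}

// fixedScanner is a test double that knows which indices are relevant.
type fixedScanner struct{ relevant []uint64 }

func (f fixedScanner) HasEncryptionRelevantEntryInRange(start, end uint64) (bool, error) {
	for _, idx := range f.relevant {
		if idx > start && idx <= end {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	s := fixedScanner{relevant: []uint64{7}}
	fmt.Println(guardSidecarBehind(10, 10, s)) // caught up
	fmt.Println(guardSidecarBehind(7, 10, s))  // entry 7 is already reflected in the sidecar
	fmt.Println(errors.Is(guardSidecarBehind(5, 10, s), errSidecarBehind))
}
```

The second call passes because the start index is exclusive: the entry at the sidecar's own applied index is not part of the gap.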
func HeaderAADBytes ¶
HeaderAADBytes returns the first 6 bytes of the envelope header (version, flag, key_id) in their on-disk order. These bytes participate in the §4.1 storage-layer AAD (storage AAD = HeaderAADBytes ‖ pebble_key) and in the §4.2 raft-layer AAD's middle slice (raft AAD = "R" ‖ version ‖ key_id, computed by raft-layer callers in a later stage).
Allocates HeaderAADSize bytes. Hot-path callers should prefer AppendHeaderAADBytes to reuse a buffer.
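As an illustration of the allocate-versus-append distinction, the sketch below builds the 6 header bytes into a caller-owned buffer. The name appendHeaderAAD is hypothetical, and the big-endian encoding of key_id is an assumption of this sketch (the text above does not state the byte order).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// headerAADSize is version (1) + flag (1) + key_id (4).
const headerAADSize = 6

// appendHeaderAAD appends version, flag and key_id to dst and returns the
// extended slice. ASSUMPTION: key_id is written big-endian.
func appendHeaderAAD(dst []byte, version, flag byte, keyID uint32) []byte {
	dst = append(dst, version, flag)
	return binary.BigEndian.AppendUint32(dst, keyID)
}

func main() {
	// A hot-path caller can keep one buffer and reset its length per call,
	// so no allocation happens after the first use.
	buf := make([]byte, 0, headerAADSize)
	for _, id := range []uint32{1, 2} {
		buf = appendHeaderAAD(buf[:0], 0x01, 0x00, id)
		fmt.Printf("% x\n", buf)
	}
}
```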
func HydrateKeystoreFromSidecar ¶
func HydrateKeystoreFromSidecar(ks *Keystore, kek KEKUnwrapper, sc *Sidecar) error
HydrateKeystoreFromSidecar unwraps every DEK recorded in the sidecar under the supplied KEK and installs it into ks. It is the startup counterpart to the FSM-apply keystore install path (ApplyBootstrap / ApplyRotation): on a fresh process load the keystore comes up empty, and FSM replay only re-installs DEKs whose OpBootstrap / OpRotation entries are still in the Raft log. After a log-compaction window those entries are gone, so a node restarting against a compacted log would have no DEK bytes and the cipher could not decrypt existing §4.1 envelopes. The sidecar is the durable record of every unretired wrapped DEK, so this function rebuilds the in-memory keystore from it.
Every key in sc.Keys is hydrated (not just sc.Active.Storage): reads of versions written before a rotation need the historical DEK, so the cipher must hold every unretired DEK to satisfy Cipher.LoadedKeyIDs.
Idempotent: keystore.Set is a no-op for byte-identical DEKs, so calling this after some DEKs were already installed by FSM replay is safe. A Set that returns ErrKeyConflict means the KEK-unwrap produced different bytes for an id the keystore already holds — a halt condition surfaced to the caller.
nil ks or nil kek is a configuration error (the caller gates this on encryption being active); an empty sidecar (no Keys) is a no-op.
func IsEncryptionRelevantOpcode ¶
IsEncryptionRelevantOpcode reports whether the supplied FSM opcode byte is one of the §5.5 SIDECAR-MUTATING opcodes — the set of entry types whose apply path writes the §5.1 keys.json sidecar and therefore matters for the ErrSidecarBehindRaftLog gap-coverage check:
- 0x04 OpBootstrap (bootstrap-encryption — §5.6)
- 0x05 OpRotation (rotate-dek + future sub-tags rewrap-deks / retire-dek / enable-storage-envelope / enable-raft-envelope — §5.2 + §7.1)
- 0x06 / 0x07 (reserved slots in the fsmwire OpEncryption range for Stage 6E wire extensions; whether they end up sidecar-mutating is design-stage-defined, but conservatively treating them as relevant matches the §5.5 enumeration and forces a future ABI extension to be explicit about adding a NOT-relevant opcode).
IMPORTANT: 0x03 OpRegistration is EXCLUDED. ApplyRegistration in internal/encryption/applier.go only mutates writer-registry rows via SetRegistryRow; it never calls WriteSidecar. Treating it as sidecar-relevant would fire ErrSidecarBehindRaftLog on EVERY startup after the first registration, since the sidecar's raft_applied_index is not advanced by registration applies. The design's §5.5 enumeration matches this exclusion (rotate-dek, rewrap-deks, bootstrap-encryption, enable-storage-envelope, enable-raft-envelope, retire-dek; no register-writer).
IMPORTANT (wire-format contract): the opcode byte is `data[0]` of the FSM entry payload (the LEADING opcode tag), NOT the byte after a wire-version prefix. See kv/fsm_encryption.go which dispatches via `wireBytes[0]` and the EncodeBootstrap / EncodeRotation / EncodeRegistration helpers in fsmwire/wire.go which produce `[opcode, version=0x01, ...payload]`. A scanner that misreads the layout and inspects payload bytes (e.g., a rotation sub-tag at position 1+) would return false negatives, silently letting GuardSidecarBehindRaftLog miss encryption-relevant gaps and start with a stale sidecar.
The predicate is the §9.1 ErrSidecarBehindRaftLog guard's gap-coverage check: an unapplied entry in the sidecar/engine gap matters iff its `data[0]` is in this range. Non-sidecar-mutating entries (writes, transactions, control-plane RPCs, AND OpRegistration) in the gap do not affect the encryption sidecar and are safe to ignore.
Defined here (rather than in fsmwire) so the encryption package owns its semantic-level predicate; the fsmwire package owns the wire-level constants (OpBootstrap / OpEncryptionMax) that this function reads.
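A scanner-side use of the predicate might look like the sketch below. The constants are hypothetical local copies of the opcode values listed above; treating 0x07 as the upper bound of the range (OpEncryptionMax) is an assumption based on the reserved 0x06 / 0x07 slots.

```go
package main

import "fmt"

// Hypothetical local copies of the opcode values described above.
const (
	opRegistration  byte = 0x03 // excluded: never writes the sidecar
	opBootstrap     byte = 0x04
	opEncryptionMax byte = 0x07 // ASSUMPTION: top of the relevant range
)

// isRelevantOpcode reports whether op falls in the sidecar-mutating range.
func isRelevantOpcode(op byte) bool {
	return op >= opBootstrap && op <= opEncryptionMax
}

// entryIsRelevant inspects only data[0], the leading opcode tag. Bytes at
// position 1 and later (wire version, sub-tags, payload) are ignored.
func entryIsRelevant(data []byte) bool {
	if len(data) == 0 {
		return false
	}
	return isRelevantOpcode(data[0])
}

func main() {
	fmt.Println(entryIsRelevant([]byte{0x05, 0x01}))                 // rotation entry
	fmt.Println(entryIsRelevant([]byte{opRegistration, 0x01, 0x05})) // registration entry
}
```

The second entry carries 0x05 inside its payload, yet it is not relevant because only the leading byte is consulted.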
func IsNotExist ¶
IsNotExist reports whether err is a "sidecar file does not exist" error from ReadSidecar. Provided as a convenience so callers can branch on first-boot vs. malformed sidecar without unwrapping the fs.PathError manually.
func NodeID16 ¶
NodeID16 narrows a 64-bit full node id to the 16-bit node_id field of the §4.1 nonce (and of the writer-registry key). The truncation is the documented `uint16(full_node_id & 0xFFFF)` rule; cluster-wide uniqueness of the narrowed value is enforced by the writer registry + ErrNodeIDCollision guard, not by this function. Centralised here so call sites (main.go nonce-factory wiring, applier registry keys) share one masked-narrowing site instead of repeating the gosec-suppressed conversion.
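The masking rule, and the reason uniqueness has to be checked elsewhere, can be sketched as below. nodeID16 and findCollision are hypothetical names; findCollision only illustrates the kind of check a collision guard performs and is not the package's CheckNodeIDCollision.

```go
package main

import "fmt"

// nodeID16 applies the documented uint16(full_node_id & 0xFFFF) rule.
func nodeID16(full uint64) uint16 {
	return uint16(full & 0xFFFF)
}

// findCollision returns the first pair of distinct full ids that narrow to
// the same 16-bit value. Repeating the same full id is not a collision.
func findCollision(fullIDs []uint64) (a, b uint64, found bool) {
	seen := make(map[uint16]uint64, len(fullIDs))
	for _, id := range fullIDs {
		n := nodeID16(id)
		if prev, ok := seen[n]; ok && prev != id {
			return prev, id, true
		}
		seen[n] = id
	}
	return 0, 0, false
}

func main() {
	fmt.Printf("%#04x\n", nodeID16(0x1234_5678_9ABC_DEF0))
	_, _, found := findCollision([]uint64{0x1_0002, 0x2_0002})
	fmt.Println(found)
}
```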
func ProbeSidecarFilesystem ¶
ProbeSidecarFilesystem exercises the §5.1 crash-durable write protocol against the parent directory of sidecarPath WITHOUT disturbing an existing sidecar file. The probe writes a small sentinel file, fsyncs it, renames it, dir.Syncs the parent directory, and removes the sentinel. Any step that fails returns the failure wrapped with ErrUnsupportedFilesystem so callers can errors.Is-match it identically to the WriteSidecar path.
The probe runs before any encryption-relevant Raft entry can commit (CheckStartupGuards is called from main.go before buildShardGroups). Detecting NFS or a FUSE mount whose dir.Sync is a no-op at startup is dramatically cheaper than discovering it days later when the first bootstrap proposal commits and the sidecar write fails halfway through, leaving the cluster's Raft-committed bootstrap entry without a durable on-disk DEK.
Implementation notes:
- The parent directory must exist; we do not mkdir it because the operator-supplied --encryptionSidecarPath is meant to name an EXISTING data directory. Missing parent is propagated as the underlying os.MkdirTemp error (NOT wrapped with ErrUnsupportedFilesystem — a missing dir is a config error, not a filesystem-capability problem).
- The sentinel filename is prefixed with .encryption-probe- so a stale file left behind by an interrupted probe is easy to identify, and so it sorts away from the real keys.json in ls output.
- We do not need to test atomicity of os.Rename (POSIX guarantees it on the same filesystem); the failure mode this probe specifically catches is dir.Sync being a no-op or erroring out.
func RegistryDEKPrefix ¶
RegistryDEKPrefix returns the prefix that covers every writer registry row for dekID. Used by §5.4 retirement to drop the entire `!encryption|writers|<dek_id>|*` slice in the same Raft entry that retires the DEK.
func RegistryKey ¶
RegistryKey constructs the Pebble key for the (dek_id, uint16(node_id)) writer-registry row. The 16-bit truncation of the full FNV-64a node id is per §4.1 — the registry distinguishes truncation collisions via the value's full_node_id field, not the key.
func UnwrapRaftPayload ¶
UnwrapRaftPayload reverses WrapRaftPayload. Decodes the envelope, rebuilds the AAD identically, and calls Decrypt. The same `*Cipher` instance used at wrap time must hold the embedded keyID (or one of its rotated successors) for unwrap to succeed.
Surfaces typed errors callers can disambiguate via errors.Is:
- ErrEnvelopeShort: encoded shorter than HeaderSize+TagSize
- ErrEnvelopeVersion: unknown version byte
- ErrUnknownKeyID: DEK is not loaded (retired or sidecar missing)
- ErrIntegrity: GCM tag mismatch (tampered envelope, wrong DEK, or layer confusion with a storage envelope)
A storage envelope fed to UnwrapRaftPayload fails with ErrIntegrity because the storage AAD prefix ('envelope_version ‖ flag ‖ key_id ‖ value_header(9B) ‖ pebble_key') does not start with the raft-purpose byte 'R'.
func WrapRaftPayload ¶
WrapRaftPayload wraps payload in a §4.2 raft envelope under the DEK identified by keyID, using the supplied 12-byte nonce. The cipher must already hold the keyID under the "raft" purpose (the keystore itself does not enforce purpose — that contract is maintained by the sidecar loader).
The flag byte is fixed at 0x00; raft proposals do not carry the Snappy compression bit (the apply path is latency-sensitive and proposals are small / high-entropy).
Nonce uniqueness is the caller's responsibility: re-using a (keyID, nonce) pair under the same DEK is a catastrophic AES-GCM failure (key-recovery + plaintext XOR). The §4.2 deterministic nonce construction (`node_id ‖ local_epoch ‖ write_count`) guarantees uniqueness by construction; do not substitute a different scheme without an equivalent uniqueness proof.
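The role the raft AAD plays in rejecting layer confusion can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the package's WrapRaftPayload: raftAAD, newGCM and demo are hypothetical names, key_id is assumed big-endian, the key is an all-zero demo key, and the storage-style AAD is simplified to header bytes followed by a made-up Pebble key.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"encoding/binary"
	"fmt"
)

const (
	versionV1 = 0x01
	nonceSize = 12
)

// raftAAD builds 'R' ‖ version ‖ key_id. ASSUMPTION: key_id is big-endian.
func raftAAD(version byte, keyID uint32) []byte {
	return binary.BigEndian.AppendUint32([]byte{'R', version}, keyID)
}

// newGCM returns AES-GCM over a 32-byte key (AES-256).
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// demo seals under the raft AAD, opens it again, then tries to open the same
// body under a storage-style AAD to show the tag check rejecting it.
func demo() (roundTripOK, confusionRejected bool, err error) {
	gcm, err := newGCM(make([]byte, 32)) // all-zero demo key, never use for real data
	if err != nil {
		return false, false, err
	}
	nonce := make([]byte, nonceSize)
	nonce[nonceSize-1] = 1 // fixed demo nonce; real callers must never repeat one

	body := gcm.Seal(nil, nonce, []byte("proposal"), raftAAD(versionV1, 7))
	pt, err := gcm.Open(nil, nonce, body, raftAAD(versionV1, 7))
	if err != nil {
		return false, false, err
	}
	roundTripOK = string(pt) == "proposal"

	storageStyle := append([]byte{versionV1, 0x00, 0, 0, 0, 7}, "pebble-key"...)
	_, openErr := gcm.Open(nil, nonce, body, storageStyle)
	return roundTripOK, openErr != nil, nil
}

func main() {
	fmt.Println(demo())
}
```

Because the AAD is authenticated but not stored, the reader must rebuild exactly the bytes the writer used; any difference in the purpose byte surfaces as a tag mismatch.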
func WriteSidecar ¶
WriteSidecar persists sc to path using the §5.1 crash-durable write protocol:
1. Build the new contents in memory (sc.Version is set to SidecarVersion so the caller never has to remember).
2. Write to <path>.tmp, then file.Sync().
3. os.Rename(<path>.tmp, <path>).
4. dir.Sync() on the parent directory so the rename is durable.
Skipping step 2 or 4 is a data-loss-class bug: a power loss between the rename and the directory inode flush can roll back keys.json while the rotation's Raft entry is already committed, stranding ciphertext under a wrap that has effectively disappeared. Per §5.1 this is treated as a hard precondition, not an optimisation.
The temp file is created with mode 0o600 so a stale tmp left behind after a crash is not world-readable.
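The write, sync, rename, directory-sync sequence can be sketched with the os package as follows. writeDurable and demo are hypothetical names, the payload is a placeholder rather than the real sidecar schema, and the sketch assumes a POSIX-like filesystem where syncing a directory handle is supported.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeDurable replaces path with contents so that a crash leaves either the
// old file or the new file, never a partial one.
func writeDurable(path string, contents []byte) error {
	tmp := path + ".tmp"
	f, err := os.OpenFile(tmp, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
	if err != nil {
		return fmt.Errorf("open tmp: %w", err)
	}
	if _, err := f.Write(contents); err != nil {
		f.Close()
		return fmt.Errorf("write tmp: %w", err)
	}
	if err := f.Sync(); err != nil { // flush file data before the rename
		f.Close()
		return fmt.Errorf("sync tmp: %w", err)
	}
	if err := f.Close(); err != nil {
		return fmt.Errorf("close tmp: %w", err)
	}
	if err := os.Rename(tmp, path); err != nil {
		return fmt.Errorf("rename: %w", err)
	}
	dir, err := os.Open(filepath.Dir(path))
	if err != nil {
		return fmt.Errorf("open parent dir: %w", err)
	}
	defer dir.Close()
	if err := dir.Sync(); err != nil { // make the rename itself durable
		return fmt.Errorf("sync parent dir: %w", err)
	}
	return nil
}

// demo writes twice into a throwaway directory and reads the result back.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "sidecar-sketch-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "keys.json")
	for _, body := range []string{`{"version":1}`, `{"version":2}`} {
		if err := writeDurable(path, []byte(body)); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	fmt.Println(demo())
}
```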
Types ¶
type ActiveKeys ¶
ActiveKeys holds the active key_id per envelope purpose. A zero value (== ReservedKeyID) means "not bootstrapped" per §5.1.
type Applier ¶
type Applier struct {
// contains filtered or unexported fields
}
Applier is the §6.3 EncryptionApplier concrete implementation. It satisfies kv.EncryptionApplier and is wired at FSM construction via kv.NewKvFSMWithHLC(..., kv.WithEncryption(applier)).
Stage 6A ships:
ApplyRegistration — the §4.1 case 1 / case 2 dispatch. For (dek_id, uint16(full_node_id)) keying:
case 1: no existing row → insert.
case 2 (strictly greater epoch, same full_node_id) → advance LastSeen.
case 2-idempotent (equal epoch, same full_node_id) → no-op (legit Raft replay; rejecting it would halt-on-replay-loop after crash).
case 3 (strictly smaller epoch, same full_node_id) → halt apply (rollback; recovered via §9.1).
case 4 (different full_node_id at same uint16 truncation) → halt apply (§6.1 uniqueness invariant).
ApplyBootstrap and ApplyRotation — return the defense-in-depth ErrKEKNotConfigured marker. Stage 6B will swap these for the real KEK-unwrap + sidecar mutate + keystore install path.
Apart from the shared StateCache pointer (see below), the Applier carries no in-memory state of its own; durable state lives in the supplied WriterRegistryStore and the on-disk sidecar. The StateCache mirrors a small subset of sidecar fields the storage hot path consults on every Put — kept coherent by durable write-then-cache ordering inside each apply path.
func NewApplier ¶
func NewApplier(registry WriterRegistryStore, opts ...ApplierOption) (*Applier, error)
NewApplier wires an Applier against the supplied registry store plus optional KEK / Keystore / sidecar / clock dependencies. Returns an error if registry is nil so misconfiguration is caught at construction time rather than at first apply (the panic site is much harder to map back to a "you forgot to wire X" diagnosis when it fires deep inside a Raft apply loop).
Without WithKEK / WithKeystore / WithSidecarPath, the Applier retains the Stage 6A behaviour — ApplyRegistration is fully functional, ApplyBootstrap and ApplyRotation return the typed ErrKEKNotConfigured marker. This is the test default and the pre-Stage-6B production posture.
func (*Applier) ActiveStorageKeyID ¶
ActiveStorageKeyID delegates to the shared StateCache. Convenience for tests and single-applier callers; multi-shard wiring should prefer reading StateCache().ActiveStorageKeyID directly so the closure target is independent of which shard's Applier received the encryption apply.
func (*Applier) ApplyBootstrap ¶
func (a *Applier) ApplyBootstrap(raftIdx uint64, p fsmwire.BootstrapPayload) error
ApplyBootstrap implements §5.6 step 1a's initial bootstrap apply:
- KEK-unwrap the wrapped storage + raft DEK pair.
- Install both into the in-memory Keystore.
- Update the §5.1 sidecar — Active.{Storage,Raft} slots, keys[] map for both DEK IDs — and crash-durably persist via WriteSidecar.
- Batch-insert every RegistrationPayload in BatchRegistry as the cluster's initial writer-registry rows.
Without the trio of WithKEK / WithKeystore / WithSidecarPath supplied at construction, returns ErrKEKNotConfigured wrapped for the HaltApply pipeline — the defense-in-depth marker that keeps the no-options posture consistent with the FSM contract.
Ordering for crash recovery: Keystore.Set fires before WriteSidecar so an ErrKeyConflict from Set aborts the apply before any disk mutation. A crash before WriteSidecar loses the in-memory keystore on restart, but the entry stays in the Raft log unapplied — replay re-runs the full sequence. A crash after WriteSidecar but before batch insert is recovered by replay because §4.1 case-2-idempotent makes the per-row inserts no-op on the second pass.
Keystore.Set is idempotent for matching DEK bytes (returns nil) and returns ErrKeyConflict only if the same key_id maps to different bytes, which means a buggy KEK-unwrap path produced different output for the same wrapped input. That's a halt condition; the wrapped error is propagated.
Blocking behaviour: ApplyBootstrap performs synchronous IO for each step (KEK Unwrap may dial a KMS in production providers, WriteSidecar fsyncs to disk, every BatchRegistry row triggers a pebble.Sync). Stage 6A's main.go wiring invokes this from the FSM apply path, which is already serialised under the engine's applyMu, so the synchronous IO is part of the §6.3 contract — a slow Bootstrap blocks the apply loop until commit, which is the same shape every other 0x03/0x04/0x05 entry uses. The BatchRegistry insert is the most likely contributor to apply latency at scale (one pebble.Sync per cluster member); the §5.6 BootstrapBatchRowCap = 1<<14 keeps the worst case bounded. A future optimisation could replace the per-row Set with a pebble.Batch.Commit(pebble.Sync), but the current shape is correct and matches the Stage 6A ApplyRegistration semantics row-for-row.
func (*Applier) ApplyRegistration ¶
func (a *Applier) ApplyRegistration(p fsmwire.RegistrationPayload) error
ApplyRegistration implements §4.1's writer-registry insert dispatch. The payload's (DEKID, FullNodeID, LocalEpoch) maps to the Pebble key RegistryKey(DEKID, uint16(FullNodeID)) and the value RegistryValue{FullNodeID, FirstSeenLocalEpoch, LastSeenLocalEpoch}.
The four-case dispatch:
case 1 (no existing row at this key): insert with FirstSeen = LastSeen = payload.LocalEpoch.
case 2 (existing row, same FullNodeID, payload.LocalEpoch > existing.LastSeenLocalEpoch): update LastSeenLocalEpoch to payload.LocalEpoch. FirstSeen is preserved as the original first-registered value.
case 2-idempotent (existing row, same FullNodeID, payload.LocalEpoch == existing.LastSeenLocalEpoch): legitimate Raft replay. Raft re-applies committed entries after restart until the latest FSM snapshot, so a RegisterEncryptionWriter entry can be applied again with the same (dek_id, full_node_id, local_epoch). Returns nil with no row change. Rejecting equal epochs as rollback would pin a node in a halt-on-replay loop after any crash before snapshotting this entry.
case 3 (existing row, same FullNodeID, payload.LocalEpoch < existing.LastSeenLocalEpoch — strictly less): epoch rollback. Returns an error wrapped with ErrEncryptionApply so the kv dispatch layer halts apply. The §9.1 ErrLocalEpochRollback startup guard is what 6C ships to prevent this from being reachable in production; until then the apply-time halt is the load-bearing backstop.
case 4 (existing row, DIFFERENT FullNodeID under the same uint16 truncation): node-id collision per §6.1. Returns an error wrapped with ErrEncryptionApply. The startup ErrNodeIDCollision guard (6C) covers the cluster-wide check; this apply-time halt is the per-node backstop.
The fail-closed paths exist as defense-in-depth: PR760 established that the gRPC-layer mutator gate (registerEncryptionAdminServer) is the primary safety boundary; the apply-time checks here exist so a malformed entry that somehow committed (e.g., during a future refactor that bypasses the gate, or in a forensic / corruption scenario) still halts rather than silently advancing setApplied.
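The case dispatch can be sketched over an in-memory map. Everything here is a hypothetical stand-in: errApply for ErrEncryptionApply, registry for the WriterRegistryStore, and rowKey / rowValue for the Pebble key and RegistryValue.

```go
package main

import (
	"errors"
	"fmt"
)

// errApply is a hypothetical stand-in for ErrEncryptionApply.
var errApply = errors.New("encryption apply halted")

type rowKey struct {
	dekID    uint32
	nodeID16 uint16
}

type rowValue struct {
	fullNodeID uint64
	firstSeen  uint16
	lastSeen   uint16
}

// registry is an in-memory stand-in for the writer-registry store.
type registry map[rowKey]rowValue

// apply runs the dispatch for one registration payload.
func (r registry) apply(dekID uint32, fullNodeID uint64, epoch uint16) error {
	k := rowKey{dekID: dekID, nodeID16: uint16(fullNodeID & 0xFFFF)}
	row, ok := r[k]
	switch {
	case !ok: // case 1: first registration at this key
		r[k] = rowValue{fullNodeID: fullNodeID, firstSeen: epoch, lastSeen: epoch}
	case row.fullNodeID != fullNodeID: // case 4: truncation collision
		return fmt.Errorf("%w: node-id collision at %#04x", errApply, k.nodeID16)
	case epoch > row.lastSeen: // case 2: advance LastSeen, keep FirstSeen
		row.lastSeen = epoch
		r[k] = row
	case epoch == row.lastSeen: // case 2-idempotent: Raft replay, no change
	default: // case 3: epoch rollback
		return fmt.Errorf("%w: epoch rollback %d < %d", errApply, epoch, row.lastSeen)
	}
	return nil
}

func main() {
	r := registry{}
	fmt.Println(r.apply(1, 0x1_0002, 3)) // insert
	fmt.Println(r.apply(1, 0x1_0002, 3)) // replay
	fmt.Println(r.apply(1, 0x1_0002, 2)) // rollback
	fmt.Println(r.apply(1, 0x2_0002, 9)) // collision
}
```

The collision check runs before the epoch comparisons so that a row owned by a different full node id is never advanced, whatever epoch the payload carries.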
func (*Applier) ApplyRotation ¶
func (a *Applier) ApplyRotation(raftIdx uint64, p fsmwire.RotationPayload) error
ApplyRotation implements §5.2 / §5.4 rotation apply. The sub-tag dispatches the entry to the per-variant handler:
- RotateSubRotateDEK — install a new wrapped DEK and re-point the Active slot for its Purpose (Stage 6B-1).
- RotateSubEnableStorageEnvelope — one-shot storage-layer cutover (Stage 6D-4). Flips StorageEnvelopeActive and records StorageEnvelopeCutoverIndex inside a single sidecar fsync.
- RotateSubEnableRaftEnvelope — one-shot raft-layer cutover (Stage 6E-1). Records RaftEnvelopeCutoverIndex inside a single sidecar fsync. The engine apply-hook installed by 6E-2 dispatches `entry.Index > sidecar.RaftEnvelopeCutoverIndex` through the unwrap path; strict `>` makes the cutover entry itself (at index == cutover) flow through unwrap-free, which is the chicken/egg bootstrap.
Other sub-tags (rewrap-deks, retire-dek) land in later stages and return ErrEncryptionApply so the HaltApply seam fires on an unrecognised sub-tag rather than silently advancing setApplied.
Same WithKEK / WithKeystore / WithSidecarPath trio requirement as ApplyBootstrap; partial wiring returns ErrKEKNotConfigured at apply time.
func (*Applier) StateCache ¶
func (a *Applier) StateCache() *StateCache
StateCache returns the shared cache this Applier writes to on every apply path. main.go wires one StateCache across all per-shard Appliers via WithStateCache, but for callers that constructed an Applier without supplying one this accessor returns the privately-installed instance so tests can still reach the atomics directly.
func (*Applier) StorageEnvelopeActive ¶
StorageEnvelopeActive delegates to the shared StateCache. Same rationale as ActiveStorageKeyID above.
type ApplierOption ¶
type ApplierOption func(*Applier)
ApplierOption configures an Applier at construction. Stage 6A shipped with only the WriterRegistryStore required; Stage 6B adds the KEK / Keystore / SidecarPath plumbing that unlocks real ApplyBootstrap / ApplyRotation. The functional-option shape keeps the Stage 6A test surface (NewApplier(reg)) working byte-for-byte while letting production main.go layer in the new dependencies.
func WithKEK ¶
func WithKEK(unwrap KEKUnwrapper) ApplierOption
WithKEK wires the KEKUnwrapper used by ApplyBootstrap / ApplyRotation. Passing nil leaves the Applier in the Stage 6A posture where both paths short-circuit with ErrKEKNotConfigured.
func WithKeystore ¶
func WithKeystore(ks *Keystore) ApplierOption
WithKeystore wires the in-memory keystore the Applier mutates on Bootstrap / Rotation. The keystore lifetime spans the process — main.go passes the same instance the storage cipher is reading from.
func WithLocalEpoch ¶
func WithLocalEpoch(epoch uint16) ApplierOption
WithLocalEpoch installs the §4.1 storage write-path `local_epoch` this process load pinned its nonce factory to. Threaded through from `encryptionWriteWiring.epoch` so Stage 7b' rotation applies can record THIS node's highest-emitted local_epoch under the new DEK (see the `localEpoch` field doc on Applier and 7b' §3.1).
A static `uint16` rather than a `func() uint16` provider is deliberate (7b' §3.1): `BumpLocalEpoch` only runs at process start, so there is no runtime epoch-bump path that would require late binding. The option is omitted by FSM-internal test harnesses; the zero value preserves today's `LocalEpoch: 0` behaviour for them.
func WithNowFunc ¶
func WithNowFunc(now func() time.Time) ApplierOption
WithNowFunc overrides the wall-clock used for the sidecar's Created field. Tests pin this to a deterministic clock; the production default is time.Now. The Created field is diagnostic only — different replicas timestamp independently and that is fine (§5.1 does not require byte-equal sidecars).
func WithSidecarPath ¶
func WithSidecarPath(path string) ApplierOption
WithSidecarPath wires the §5.1 keys.json path the Applier crash-durably mutates on Bootstrap / Rotation. Empty path disables sidecar mutation (Stage 6A posture).
func WithStateCache ¶
func WithStateCache(c *StateCache) ApplierOption
WithStateCache installs a shared StateCache so that an apply landing on this Applier (typically the per-shard Applier whose FSM accepted the encryption proposal) updates atomics that every other Applier in the process reads. main.go owns one StateCache for the lifetime of the binary and threads the same pointer into every per-shard Applier and into the storage-layer per-Put closures.
If WithStateCache is omitted, NewApplier installs a private instance — preserves the single-applier ergonomics that tests and pre-multi-shard callers rely on.
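The functional-option shape, including the private-default fallback, can be sketched as below. All names are hypothetical lower-case stand-ins for Applier, ApplierOption, WithSidecarPath and WithStateCache; the real constructor also takes a registry and returns an error, which this sketch omits.

```go
package main

import "fmt"

// stateCache stands in for the shared StateCache.
type stateCache struct{ activeKeyID uint32 }

// applier stands in for Applier with only the fields this sketch needs.
type applier struct {
	sidecarPath string
	cache       *stateCache
}

// option mirrors the ApplierOption shape: a function that mutates the applier.
type option func(*applier)

func withSidecarPath(p string) option { return func(a *applier) { a.sidecarPath = p } }

func withStateCache(c *stateCache) option { return func(a *applier) { a.cache = c } }

// newApplier applies the options in order, then installs a private cache if
// none was supplied.
func newApplier(opts ...option) *applier {
	a := &applier{}
	for _, o := range opts {
		o(a)
	}
	if a.cache == nil {
		a.cache = &stateCache{}
	}
	return a
}

func main() {
	shared := &stateCache{}
	shardA := newApplier(withStateCache(shared), withSidecarPath("keys.json"))
	shardB := newApplier(withStateCache(shared))
	shardA.cache.activeKeyID = 7 // an apply landing on shard A
	fmt.Println(shardB.cache.activeKeyID)
	fmt.Println(newApplier().cache != shared)
}
```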
type Cipher ¶
type Cipher struct {
// contains filtered or unexported fields
}
Cipher is the AES-256-GCM primitive over a Keystore.
Cipher does NOT compose AAD — callers in store/ (§4.1 AAD) and internal/raftengine/etcd/ (§4.2 AAD) supply the full AAD bytes. This keeps the cipher narrow and lets each layer choose the right AAD without baking storage/raft assumptions into the foundation package.
AES key expansion and GCM initialization happen once per DEK at Keystore.Set time; the hot path only needs an atomic.Pointer load and a Seal/Open call.
The zero value is NOT safe to use: Encrypt/Decrypt return ErrNilKeystore for a zero-value or nil Cipher rather than nil-deref panicking. Always construct via NewCipher.
func NewCipher ¶
NewCipher returns a Cipher backed by ks.
Returns ErrNilKeystore if ks is nil. Catching this at construction time turns a wiring mistake into a typed error during process startup or DEK rotation, rather than a nil-deref panic on the first Encrypt/Decrypt — important for the dynamic dependency-wiring paths where the encryption stack may be re-initialised after a sidecar resync (§5.5) or rotation (§5.2).
func (*Cipher) Decrypt ¶
Decrypt verifies and decrypts (ciphertext ‖ tag) using the DEK identified by keyID, the supplied nonce, and the same aad bytes that were passed to Encrypt.
On GCM tag mismatch, Decrypt returns an error wrapping ErrIntegrity. Per §4.1, callers MUST treat this as a typed read error and never silently zero or retry. The original Open error is attached as a secondary cause for diagnostic logging.
func (*Cipher) Encrypt ¶
Encrypt produces (ciphertext ‖ tag) for plaintext under the DEK identified by keyID and the supplied nonce. aad is treated verbatim.
Constraints:
- keyID must not be ReservedKeyID; otherwise ErrReservedKeyID.
- nonce must be NonceSize bytes; otherwise ErrBadNonceSize.
- keyID must be present in the Keystore; otherwise ErrUnknownKeyID.
CRITICAL: callers MUST NOT reuse the same (keyID, nonce) pair with any two distinct plaintexts. Nonce reuse under AES-GCM is catastrophic: it leaks the XOR of the two plaintexts and enables authentication-key recovery. The §4.1 storage-layer integration uses the nonce construction (node_id ‖ local_epoch ‖ write_count) to guarantee uniqueness by construction; do not substitute a different nonce scheme in that layer without a corresponding uniqueness proof. (For tests / benchmarks, fresh crypto/rand nonces are perfectly safe.)
The returned slice has length len(plaintext) + TagSize. It is freshly allocated; the caller may retain it indefinitely.
func (*Cipher) LoadedKeyIDs ¶
LoadedKeyIDs returns the sorted list of key_ids currently loaded in the underlying keystore. Used by the storage layer's rebadge guard to trial-decrypt a cleartext-labelled body against every candidate DEK — rotation leaves multiple DEKs active at once, and the on-disk envelope's key_id field can be rewritten by an attacker, so the guard must iterate rather than trust it.
Returns nil for a nil receiver or zero-value Cipher; callers MUST NOT treat that as "no keys" without considering the surrounding context.
type DeterministicNonceFactory ¶
type DeterministicNonceFactory struct {
// contains filtered or unexported fields
}
DeterministicNonceFactory is the production §4.1 storage-envelope nonce source. It emits the 12-byte deterministic nonce
bytes 0-1   node_id      (big-endian uint16)
bytes 2-3   local_epoch  (big-endian uint16)
bytes 4-11  write_count  (big-endian uint64)
with zero random bits — nonce uniqueness is by construction across (node, process-load, write). It satisfies the store.NonceFactory interface structurally (Next() ([NonceSize]byte, error)).
Safety contract (see the parent encryption design §4.1):
- node_id is uint16(DeriveNodeID(--raftId)); cluster-wide 16-bit uniqueness is enforced by the writer registry + the ErrNodeIDCollision / membership-snapshot startup guards.
- local_epoch is pinned at construction from a value that was bumped and fsync'd on THIS process load (BumpLocalEpoch). The factory never advances the epoch; one factory instance == one process load == one epoch.
- write_count is an atomic counter that resets to 0 each process load. The reset is only safe BECAUSE local_epoch advanced, so constructing this factory with an un-bumped epoch reused across restarts is a correctness bug. Always pair NewDeterministicNonceFactory with a fresh BumpLocalEpoch on the active storage DEK.
This is the durable analogue of the test-only store.CounterNonceFactory: identical byte layout, but the epoch here carries the restart-safety guarantee the test factory lacks.
func NewDeterministicNonceFactory ¶
func NewDeterministicNonceFactory(nodeID, localEpoch uint16) *DeterministicNonceFactory
NewDeterministicNonceFactory constructs a factory pinned to (nodeID, localEpoch). write_count starts at 0 and increments on every Next(). The caller is responsible for having bumped and fsync'd localEpoch for this process load before issuing any nonce.
func (*DeterministicNonceFactory) Next ¶
func (f *DeterministicNonceFactory) Next() ([NonceSize]byte, error)
Next returns the next 12-byte nonce. The write_count is pre-incremented (the first nonce of a process load carries write_count=1, not 0) so that no nonce ever carries the all-zero write_count — keeping the nonce space disjoint from any future scheme that might want write_count=0 as a sentinel. The atomic add makes Next safe for concurrent callers.
Overflow: atomic.Uint64.Add wraps silently at 2^64. A wrap returns the value 0 (the pre-increment all-ones value plus one), which would recycle write_count=1, 2, ... under the same (node_id, local_epoch): a catastrophic GCM nonce reuse. On the wrapping call Next latches the factory as exhausted and returns ErrWriteCountExhausted; every subsequent call also returns ErrWriteCountExhausted (the latch is permanent, because resuming from the recycled low range would reuse nonces already emitted this load). The boundary is unreachable in practice (2^64 writes per process load); recovery is a restart, which bumps local_epoch and resets write_count to a fresh, non-overlapping range.
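The byte layout, the pre-increment and the permanent exhaustion can be sketched as below. Note the deliberate substitution: instead of the Add-then-latch shape described above, this sketch uses a compare-and-swap loop that refuses to move past the maximum counter value, which gives the same observable behaviour without relying on details of the real latch. nonceFactory, next and errExhausted are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sync/atomic"
)

const (
	nonceSize = 12
	maxCount  = ^uint64(0) // last usable write_count
)

// errExhausted is a hypothetical stand-in for ErrWriteCountExhausted.
var errExhausted = errors.New("write_count exhausted")

type nonceFactory struct {
	nodeID uint16
	epoch  uint16
	count  atomic.Uint64 // last write_count handed out; 0 means none yet
}

// next emits node_id ‖ local_epoch ‖ write_count, all big-endian. The first
// nonce carries write_count=1. Once the counter reaches maxCount it never
// moves again, so exhaustion is permanent for this factory.
func (f *nonceFactory) next() ([nonceSize]byte, error) {
	var n [nonceSize]byte
	for {
		cur := f.count.Load()
		if cur == maxCount {
			return n, errExhausted
		}
		if f.count.CompareAndSwap(cur, cur+1) {
			binary.BigEndian.PutUint16(n[0:2], f.nodeID)
			binary.BigEndian.PutUint16(n[2:4], f.epoch)
			binary.BigEndian.PutUint64(n[4:12], cur+1)
			return n, nil
		}
	}
}

func main() {
	f := &nonceFactory{nodeID: 0x0102, epoch: 0x0003}
	n, err := f.next()
	fmt.Printf("% x %v\n", n, err)
}
```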
type EncryptionRelevantScanner ¶
type EncryptionRelevantScanner interface {
HasEncryptionRelevantEntryInRange(startExclusive, endInclusive uint64) (bool, error)
}
EncryptionRelevantScanner is the cross-package contract that GuardSidecarBehindRaftLog uses to inspect a Raft entry-index range for encryption-relevant opcodes WITHOUT importing the raftengine into the encryption package.
The raftengine implements this in a follow-up PR (Stage 6C-2c) against its WAL + applied-snapshot state, exposing only the predicate result. Shipping the encryption-side primitive here without the engine-side implementation is intentional: it lets Stage 6D-and-later RPC handlers reuse the same predicate (`IsEncryptionRelevantOpcode`) without depending on raftengine having shipped its scanner first.
HasEncryptionRelevantEntryInRange returns true iff at least one Raft entry with index in [startExclusive+1, endInclusive] carries a §5.5-relevant opcode. The startExclusive parameter is the sidecar's last-applied index (which has already been reflected in the sidecar), so the entry AT that index is NOT in the gap; the entries that follow it are.
Implementations must handle the empty-range case (startExclusive >= endInclusive) by returning (false, nil) — the guard precomputes the gap-non-empty branch and only calls scan when there's actually a range to inspect, but defensive implementations are safer in the face of caller bugs.
type Envelope ¶
type Envelope struct {
Version byte
Flag byte
KeyID uint32
Nonce [NonceSize]byte
// Body is the concatenation of ciphertext and the GCM tag, as produced
// by AEAD.Seal. Length is plaintext_len + TagSize.
Body []byte
}
Envelope is the parsed form of the §4.1 wire format.
func DecodeEnvelope ¶
DecodeEnvelope parses an envelope. It does NOT verify the GCM tag — authentication happens at Cipher.Decrypt time once the AAD is known.
DecodeEnvelope copies Body so the returned Envelope does not alias src.
func (*Envelope) Encode ¶
Encode serialises the envelope into a single byte slice using the §4.1 wire format. The returned slice is freshly allocated.
Encode validates the envelope at build time so a programmer error (uninitialised Version, truncated Body) fails here with a clear stack trace, rather than surfacing later as a confusing DecodeEnvelope or Cipher.Decrypt failure on the read side. Returns:
- ErrEnvelopeVersion if Version is not EnvelopeVersionV1.
- ErrEnvelopeShort if Body is shorter than TagSize (every valid body must contain at least the GCM tag).
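To illustrate the wire format and the two validation rules, here is a sketch of an encoder/decoder pair. All names are local to the example, and the big-endian encoding of key_id is an assumption: the wire-format diagram gives field widths but not byte order.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	versionV1  = 0x01
	nonceSize  = 12
	tagSize    = 16
	headerSize = 1 + 1 + 4 + nonceSize // version + flag + key_id + nonce = 18
)

var (
	errVersion = errors.New("envelope: unsupported version")
	errShort   = errors.New("envelope: too short")
)

type envelope struct {
	Version byte
	Flag    byte
	KeyID   uint32
	Nonce   [nonceSize]byte
	Body    []byte // ciphertext followed by the GCM tag
}

// encode validates the envelope and returns a freshly allocated slice.
func (e *envelope) encode() ([]byte, error) {
	if e.Version != versionV1 {
		return nil, errVersion
	}
	if len(e.Body) < tagSize {
		return nil, errShort
	}
	out := make([]byte, headerSize, headerSize+len(e.Body))
	out[0] = e.Version
	out[1] = e.Flag
	binary.BigEndian.PutUint32(out[2:6], e.KeyID) // byte order is an assumption
	copy(out[6:headerSize], e.Nonce[:])
	return append(out, e.Body...), nil
}

// decode parses the header and copies Body; it does not verify the tag.
func decode(src []byte) (*envelope, error) {
	if len(src) < headerSize+tagSize {
		return nil, errShort
	}
	if src[0] != versionV1 {
		return nil, errVersion
	}
	e := &envelope{Version: src[0], Flag: src[1], KeyID: binary.BigEndian.Uint32(src[2:6])}
	copy(e.Nonce[:], src[6:headerSize])
	e.Body = append([]byte(nil), src[headerSize:]...) // copy so Body does not alias src
	return e, nil
}

func main() {
	e := &envelope{Version: versionV1, KeyID: 7, Body: make([]byte, tagSize+5)}
	wire, err := e.encode()
	fmt.Println(len(wire), err) // header + 5 bytes of ciphertext + tag
	got, err := decode(wire)
	fmt.Println(got.KeyID, len(got.Body), err)
}
```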
type KEKUnwrapper ¶
KEKUnwrapper is the abstraction the Applier uses to recover cleartext DEK bytes from the wrapped DEK material carried in BootstrapPayload / RotationPayload. The supplied implementation is exercised on every 0x04 / 0x05 apply, so it MUST be safe for concurrent use across replays.
kek.Wrapper from internal/encryption/kek satisfies this interface structurally — the Applier carries its own local interface declaration so the encryption package does not pick up a transitive dependency on the kek package's import graph (which in turn lets the kek package import from internal/encryption without a cycle when KMS providers land in Stage 9).
type Keystore ¶
type Keystore struct {
// contains filtered or unexported fields
}
Keystore is a copy-on-write map from key_id to (DEK, pre-init AEAD).
Reads on the hot path take a single atomic load and observe an immutable snapshot of the map. Writes (rotation, bootstrap, retire) allocate a new map and publish it atomically through the atomic.Pointer.
Per §10 self-review lens 2: this avoids contending a mutex on the hot path while keeping rotation atomic with respect to readers.
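The copy-on-write pattern can be sketched with sync/atomic's generic Pointer type. The cowKeystore type is hypothetical and stores raw key arrays only; whether the real writer path uses CompareAndSwap or a serialized Store is not stated in the text, so the CAS retry loop here is an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cowKeystore is a hypothetical copy-on-write map from key_id to key bytes.
type cowKeystore struct {
	p atomic.Pointer[map[uint32][32]byte]
}

// get performs one atomic load and reads from an immutable snapshot.
func (c *cowKeystore) get(id uint32) ([32]byte, bool) {
	m := c.p.Load()
	if m == nil {
		return [32]byte{}, false // zero value behaves as the empty keystore
	}
	v, ok := (*m)[id]
	return v, ok
}

// set copies the current map, adds the entry, and swaps the new map in.
// A concurrent writer causes the CAS to fail and the loop to retry.
func (c *cowKeystore) set(id uint32, key [32]byte) {
	for {
		old := c.p.Load()
		next := make(map[uint32][32]byte)
		if old != nil {
			for k, v := range *old {
				next[k] = v
			}
		}
		next[id] = key
		if c.p.CompareAndSwap(old, &next) {
			return
		}
	}
}

func main() {
	var ks cowKeystore
	ks.set(1, [32]byte{0xAA})
	snapshot := ks.p.Load() // what a reader already in flight would hold
	ks.set(2, [32]byte{0xBB})
	_, ok := ks.get(2)
	fmt.Println(len(*snapshot), ok)
}
```

Readers that loaded the pointer before a write keep seeing the old map, which is what makes rotation atomic from their point of view.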
Zero-value safety: a `var ks Keystore` (or a nil *Keystore) is degraded but does not panic — read methods (AEAD, DEK, Has, IDs, Len) treat it as the empty keystore, Delete is a no-op, and Set returns ErrNilKeystore for a nil receiver. Always prefer NewKeystore so an unwrap path that needs to install keys reports the wiring mistake immediately.
func (*Keystore) AEAD ¶
AEAD returns the pre-initialized cipher.AEAD for keyID, ready for Seal/Open. The returned value is safe for concurrent use by multiple goroutines (Go stdlib AEAD implementations are stateless after initialization).
Used by Cipher.Encrypt / Cipher.Decrypt on the hot path. Returns (nil, false) if keyID is not loaded, the receiver is nil, or the Keystore is zero-valued.
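For readers unfamiliar with the standard-library side, the sketch below builds an AES-256-GCM cipher.AEAD and runs one Seal/Open round trip. The all-zero key and fixed nonce are for illustration only and must never be used with real data; the AAD string is a placeholder, not the package's actual AAD layout.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// newAEAD pre-initializes an AES-256-GCM AEAD from a 32-byte DEK.
func newAEAD(dek [32]byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(dek[:])
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

func main() {
	var dek [32]byte // all-zero key: illustration only
	aead, err := newAEAD(dek)
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, aead.NonceSize()) // fixed nonce: illustration only
	aad := []byte("placeholder-aad")

	// Seal returns ciphertext with the tag appended.
	body := aead.Seal(nil, nonce, []byte("hello"), aad)
	fmt.Println(len(body) == len("hello")+aead.Overhead())

	pt, err := aead.Open(nil, nonce, body, aad)
	fmt.Println(string(pt), err)

	// Authentication fails when the AAD differs.
	_, err = aead.Open(nil, nonce, body, []byte("other-aad"))
	fmt.Println(err != nil)
}
```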
func (*Keystore) DEK ¶
DEK returns the raw 32-byte DEK for keyID. The returned array is a value copy — callers are free to mutate it without affecting the keystore. The bool reports whether keyID is loaded.
Most call sites should use AEAD instead; DEK is provided for the rotation / rewrap path that needs the raw key material to wrap it under a new KEK.
func (*Keystore) Delete ¶
Delete removes the DEK for keyID. No-op if absent, the receiver is nil, or the Keystore is zero-valued (no map ever Stored).
func (*Keystore) IDs ¶
IDs returns a sorted snapshot of all currently-loaded key_ids. Returns nil for a nil receiver or zero-value Keystore.
func (*Keystore) Len ¶
Len reports the number of currently-loaded keys. Returns 0 for a nil receiver or zero-value Keystore.
func (*Keystore) Set ¶
Set installs a DEK under keyID and pre-initializes the cipher.AEAD. dek must be exactly KeySize bytes; the reserved key_id 0 is rejected with ErrReservedKeyID. The DEK bytes are copied into the keystore so the caller is free to zero or reuse the source slice.
Set is set-once with idempotent-same semantics: re-Set under an existing keyID with byte-identical DEK is a no-op (returns nil), but Set with DIFFERENT bytes for an already-loaded keyID returns ErrKeyConflict. Replacing live key bytes for a keyID would render every envelope already persisted under that id undecryptable.
A nil receiver returns ErrNilKeystore; zero-value Keystores are rejected at the same boundary as Cipher.
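The set-once, idempotent-same rule can be sketched with a plain map. The toyKeystore type is hypothetical, it omits the AEAD pre-initialization and the copy-on-write machinery, and errKeySize is an invented name because the text does not say which error a wrong-length DEK produces.

```go
package main

import (
	"errors"
	"fmt"
)

const keySize = 32

var (
	errNilKeystore   = errors.New("nil keystore")
	errReservedKeyID = errors.New("key_id 0 is reserved")
	errKeyConflict   = errors.New("key_id already loaded with different bytes")
	errKeySize       = errors.New("dek must be exactly 32 bytes") // hypothetical name
)

type toyKeystore struct {
	keys map[uint32][keySize]byte
}

func newToyKeystore() *toyKeystore {
	return &toyKeystore{keys: make(map[uint32][keySize]byte)}
}

func (k *toyKeystore) set(id uint32, dek []byte) error {
	switch {
	case k == nil:
		return errNilKeystore
	case id == 0:
		return errReservedKeyID
	case len(dek) != keySize:
		return errKeySize
	}
	var cp [keySize]byte
	copy(cp[:], dek) // copy so the caller may zero or reuse dek
	if old, ok := k.keys[id]; ok {
		if old == cp {
			return nil // same bytes: idempotent no-op
		}
		return errKeyConflict // different bytes: refuse to replace
	}
	k.keys[id] = cp
	return nil
}

func main() {
	ks := newToyKeystore()
	a := make([]byte, keySize)
	b := make([]byte, keySize)
	b[0] = 1
	fmt.Println(ks.set(1, a)) // first install
	fmt.Println(ks.set(1, a)) // identical bytes
	fmt.Println(ks.set(1, b)) // conflicting bytes
}
```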
type RegistryValue ¶
type RegistryValue struct {
FullNodeID uint64
FirstSeenLocalEpoch uint16
LastSeenLocalEpoch uint16
}
RegistryValue is the in-memory form of a writer-registry row's stored value: the full untruncated FNV-64a node id and the per-DEK first/last-seen local_epoch counters used by the §4.1 case 1/2/3 dispatch on RegisterEncryptionWriter apply.
func DecodeRegistryValue ¶
func DecodeRegistryValue(raw []byte) (RegistryValue, error)
DecodeRegistryValue parses an on-disk row value. Fails closed on length mismatch — a truncated or padded row is operational corruption, not a "missing" entry, so the caller surfaces ErrRegistryValueMalformed rather than silently treating the node as un-registered.
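A fail-closed, fixed-length codec for this struct might look like the sketch below. The 12-byte layout (8-byte node id followed by two 2-byte epochs) and the big-endian byte order are assumptions made for the example; the documentation above does not specify the on-disk layout.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// registryValueSize is an assumed layout: uint64 + uint16 + uint16.
const registryValueSize = 8 + 2 + 2

var errMalformed = errors.New("registry value malformed")

type registryValue struct {
	FullNodeID          uint64
	FirstSeenLocalEpoch uint16
	LastSeenLocalEpoch  uint16
}

func encodeRegistryValue(rv registryValue) []byte {
	out := make([]byte, registryValueSize)
	binary.BigEndian.PutUint64(out[0:8], rv.FullNodeID)
	binary.BigEndian.PutUint16(out[8:10], rv.FirstSeenLocalEpoch)
	binary.BigEndian.PutUint16(out[10:12], rv.LastSeenLocalEpoch)
	return out
}

// decodeRegistryValue rejects any length other than the exact row size,
// so a truncated or padded row is reported rather than treated as missing.
func decodeRegistryValue(raw []byte) (registryValue, error) {
	if len(raw) != registryValueSize {
		return registryValue{}, errMalformed
	}
	return registryValue{
		FullNodeID:          binary.BigEndian.Uint64(raw[0:8]),
		FirstSeenLocalEpoch: binary.BigEndian.Uint16(raw[8:10]),
		LastSeenLocalEpoch:  binary.BigEndian.Uint16(raw[10:12]),
	}, nil
}

func main() {
	raw := encodeRegistryValue(registryValue{FullNodeID: 42, FirstSeenLocalEpoch: 1, LastSeenLocalEpoch: 3})
	rv, err := decodeRegistryValue(raw)
	fmt.Println(rv, err)
	_, err = decodeRegistryValue(raw[:11])
	fmt.Println(err)
}
```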
type Sidecar ¶
type Sidecar struct {
Version int `json:"version"`
RaftAppliedIndex uint64 `json:"raft_applied_index"`
// StorageEnvelopeActive flips to true exactly once, when the
// `RotateSubEnableStorageEnvelope` (§2.2 / §7) cutover entry
// applies. Pre-cutover storage Puts write cleartext; post-cutover
// Puts wrap values in the §4.1 envelope. The flip and the
// cutover index below are written inside a single
// crash-durable `WriteSidecar` fsync (§6.4 atomicity).
StorageEnvelopeActive bool `json:"storage_envelope_active"`
// StorageEnvelopeCutoverIndex is the Raft index at which the
// cutover entry applied. Set once by the first cutover apply
// (§6.4) and never overwritten by later encryption entries; the
// §3.2 idempotent-retry RPC path uses it as a stable
// applied_index across arbitrary subsequent Raft activity. A
// zero value with `StorageEnvelopeActive == false` is the
// pre-cutover baseline.
StorageEnvelopeCutoverIndex uint64 `json:"storage_envelope_cutover_index"`
RaftEnvelopeCutoverIndex uint64 `json:"raft_envelope_cutover_index"`
Active ActiveKeys `json:"active"`
// Keys is keyed by the decimal string form of key_id (per §5.1's
// "JSON object keys must be strings, but the on-disk envelope and
// the in-memory keystore always work in the binary uint32 form").
Keys map[string]SidecarKey `json:"keys"`
}
Sidecar is the parsed §5.1 keys.json layout.
All fields persisted under the §5.1 illustrative JSON are represented here. Fields not yet present in the design (Stage 9 audit log, etc.) are omitted; they will be added as additive fields when the relevant stage ships.
func ReadSidecar ¶
ReadSidecar parses the keys.json file at path. It validates the wire version, the per-key purpose, and the decimal-uint32 form of every keys-map entry, and rejects malformed sidecars with typed errors.
ReadSidecar does NOT KEK-unwrap the DEK bytes; it just hands the caller a parsed struct. Unwrapping is the kek.Wrapper's job at a higher layer.
type SidecarKey ¶
type SidecarKey struct {
Purpose string `json:"purpose"`
Wrapped []byte `json:"wrapped"`
Created string `json:"created"`
LocalEpoch uint16 `json:"local_epoch"`
}
SidecarKey holds the metadata for a single wrapped DEK.
Wrapped is the KEK-wrapped DEK bytes (encoding/json base64-encodes []byte automatically). Created is an ISO-8601 timestamp string; the package keeps it as a plain string rather than time.Time so a future timezone-format addition does not break older readers. LocalEpoch is consumed by the §4.1 nonce construction.
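The two encoding details mentioned above (decimal-string map keys and base64 for []byte) can be seen in a small parsing sketch. The JSON document, the "storage" purpose value, and the trimmed-down struct are made up for the example; the real ReadSidecar performs more validation than this.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

type sidecarKey struct {
	Purpose    string `json:"purpose"`
	Wrapped    []byte `json:"wrapped"` // encoding/json decodes base64 into []byte
	Created    string `json:"created"`
	LocalEpoch uint16 `json:"local_epoch"`
}

// sidecarKeys is a reduced view of the sidecar with only the keys map.
type sidecarKeys struct {
	Version int                   `json:"version"`
	Keys    map[string]sidecarKey `json:"keys"`
}

// parseKeys converts the decimal string map keys into uint32 key ids and
// rejects anything that is not a base-10 number fitting in 32 bits.
func parseKeys(raw []byte) (map[uint32]sidecarKey, error) {
	var sc sidecarKeys
	if err := json.Unmarshal(raw, &sc); err != nil {
		return nil, err
	}
	out := make(map[uint32]sidecarKey, len(sc.Keys))
	for k, v := range sc.Keys {
		id, err := strconv.ParseUint(k, 10, 32)
		if err != nil {
			return nil, fmt.Errorf("key id %q: %w", k, err)
		}
		out[uint32(id)] = v
	}
	return out, nil
}

func main() {
	raw := []byte(`{"version":1,"keys":{"7":{"purpose":"storage","wrapped":"aGVsbG8=","created":"2026-04-29T00:00:00Z","local_epoch":3}}}`)
	keys, err := parseKeys(raw)
	fmt.Println(len(keys[7].Wrapped), keys[7].LocalEpoch, err)
}
```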
type StartupConfig ¶
type StartupConfig struct {
// EncryptionEnabled mirrors --encryption-enabled. When true the
// cluster has opted in to the §7.1 rollout; when false the node
// must refuse to start if encrypted on-disk state already exists
// (downgrade prevention).
EncryptionEnabled bool
// KEKConfigured is true iff --kekFile is non-empty. The KEK
// itself is supplied via KEK below; KEKConfigured exists
// independently so the helper can distinguish "operator did
// not supply a KEK source" from "supplied but failed to load"
// (the latter is handled at main.go's loadKEKWrapperFromFlag()
// call site, before this guard runs).
KEKConfigured bool
// KEK is the loaded KEK wrapper. May be nil iff KEKConfigured
// is false. When EncryptionEnabled and KEKConfigured are both
// true the guard uses KEK.Unwrap to verify each wrapped DEK
// in the sidecar decrypts under the configured KEK.
KEK KEKUnwrapper
// SidecarPath is the absolute path to the §5.1 keys.json file.
// May be empty; an empty path skips every sidecar-dependent
// guard. Stage 6B-2's triple-gate readback (sidecarPath !=
// "") covers the mutator-RPC side, this field covers the
// startup-refusal side.
SidecarPath string
}
StartupConfig is the input to CheckStartupGuards. Each field is derived from the operator-facing flags at process startup; the helper exists so the §9.1 refusal logic can be unit-tested without reaching back into main.go's flag plumbing.
All paths are operator-supplied (--kekFile, --encryptionSidecarPath); empty values mean "operator did not provide one". The presence of a sidecar on disk is detected by os.Stat on SidecarPath, not by an explicit boolean — operators do not pass "sidecar exists" as a flag, the on-disk state is the source of truth.
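As a rough illustration of the stat-based detection, the sketch below covers only the downgrade-prevention branch. It is not CheckStartupGuards: the function name and error are hypothetical, and equating "sidecar file exists" with "encrypted on-disk state exists" is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// errDowngrade is a hypothetical error for the refusal case.
var errDowngrade = errors.New("sidecar exists but encryption is disabled")

// checkDowngrade refuses startup when a sidecar is present on disk while
// the operator has not enabled encryption.
func checkDowngrade(encryptionEnabled bool, sidecarPath string) error {
	if sidecarPath == "" {
		return nil // empty path skips every sidecar-dependent guard
	}
	_, err := os.Stat(sidecarPath)
	switch {
	case errors.Is(err, fs.ErrNotExist):
		return nil // no sidecar on disk: nothing to protect
	case err != nil:
		return fmt.Errorf("stat sidecar: %w", err)
	case !encryptionEnabled:
		return errDowngrade
	}
	return nil
}

func main() {
	fmt.Println(checkDowngrade(false, ""))
	fmt.Println(checkDowngrade(false, "definitely-missing-dir/keys.json"))
	fmt.Println(checkDowngrade(false, ".")) // "." exists, standing in for a sidecar
	fmt.Println(checkDowngrade(true, "."))
}
```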
type StateCache ¶
type StateCache struct {
// contains filtered or unexported fields
}
StateCache mirrors the sidecar fields the storage hot path needs to consult on every Put. Two requirements drive its existence:
1. ReadSidecar-on-every-Put would serialise the hot path through a JSON parse + fsync barrier. atomic.Uint32 / atomic.Bool give a wait-free single-load read instead.
2. In a multi-group deployment, encryption FSM entries apply on whichever shard's leader accepted the proposal, not on every shard. The per-shard storage layers must still observe the updated state, so the cache MUST be a process-shared singleton rather than a per-Applier field. main.go constructs one StateCache at startup (parallel to the shared *Keystore) and threads it into every per-shard Applier via WithStateCache.
Coherence with disk is maintained by **durable write-then-cache** ordering: NewApplier primes the cache from ReadSidecar, and every apply path calls RefreshFromSidecar AFTER WriteSidecar succeeds. A crash between fsync and atomic store is benign because the next process start re-primes from disk.
Zero values match the pre-bootstrap posture (no active storage DEK, envelope gate off) so a freshly-constructed StateCache is safe to use before any apply or prime has run.
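The atomic-mirror idea can be sketched as below. The stateCache and sidecarView types are hypothetical reductions: the real Sidecar nests the active id under Active (whose type is not shown here), so the flat ActiveStorage field is an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sidecarView is a hypothetical, flattened subset of the sidecar.
type sidecarView struct {
	ActiveStorage         uint32
	StorageEnvelopeActive bool
}

// stateCache mirrors the fields the hot path reads on every Put.
type stateCache struct {
	activeStorage  atomic.Uint32
	envelopeActive atomic.Bool
}

// refresh is called only after the sidecar write has been made durable.
// A nil sidecar is a no-op, matching the pre-bootstrap posture.
func (c *stateCache) refresh(sc *sidecarView) {
	if sc == nil {
		return
	}
	c.activeStorage.Store(sc.ActiveStorage)
	c.envelopeActive.Store(sc.StorageEnvelopeActive)
}

// activeStorageKeyID is a single atomic load; zero means "no active DEK".
func (c *stateCache) activeStorageKeyID() (uint32, bool) {
	id := c.activeStorage.Load()
	return id, id != 0
}

func (c *stateCache) storageEnvelopeActive() bool {
	return c.envelopeActive.Load()
}

func main() {
	var c stateCache // zero value is the pre-bootstrap state
	id, ok := c.activeStorageKeyID()
	fmt.Println(id, ok, c.storageEnvelopeActive())

	c.refresh(&sidecarView{ActiveStorage: 7, StorageEnvelopeActive: true})
	id, ok = c.activeStorageKeyID()
	fmt.Println(id, ok, c.storageEnvelopeActive())
}
```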
func NewStateCache ¶
func NewStateCache() *StateCache
NewStateCache returns a zero-initialised StateCache. The pre-bootstrap posture (Active.Storage=0, StorageEnvelopeActive=false) is the correct initial state; RefreshFromSidecar advances it to the current sidecar values when one is supplied.
func (*StateCache) ActiveStorageKeyID ¶
func (c *StateCache) ActiveStorageKeyID() (uint32, bool)
ActiveStorageKeyID returns the current sidecar.Active.Storage DEK id. Signature matches store.ActiveStorageKeyID so main.go can pass `cache.ActiveStorageKeyID` directly into `store.WithEncryption(...)` as the per-Put activeKeyID closure. A non-zero id with ok=true means the cluster has run BootstrapEncryption; zero with ok=false means the cluster is still pre-bootstrap and the storage layer should write cleartext.
func (*StateCache) MarkRegistered ¶
func (c *StateCache) MarkRegistered(dekID uint32)
MarkRegistered records that this process load's §4.1 writer registration has committed for storage DEK dekID. Stage 7a's registration paths call it once their barrier closes (or in the already-registered startup branch). Idempotent; a zero dekID is a no-op so a not-bootstrapped caller cannot accidentally mark the "no DEK" sentinel as registered.
func (*StateCache) RefreshFromSidecar ¶
func (c *StateCache) RefreshFromSidecar(sc *Sidecar)
RefreshFromSidecar copies the relevant fields out of sc into the atomic mirrors. Safe to call concurrently with reads; safe to call from multiple goroutines (writers race to the same atomic CAS path, but the only writer in production is the FSM apply goroutine of the shard that accepted the encryption proposal).
nil sc is a no-op: matches the pre-bootstrap posture where ReadSidecar returns IsNotExist.
func (*StateCache) Registered ¶
func (c *StateCache) Registered() bool
Registered reports whether this process load has confirmed its §4.1 writer registration for the currently-active storage DEK. It is the predicate Stage 7a-2's WithStorageRegistrationGate consults on the direct write path: a self-originated encrypted write is refused (ErrWriterNotRegistered) until Registered() is true.
It is deliberately per-DEK and fail-closed (design §2.3 forbids any fail-OPEN fallback): a node that has not registered for the active DEK — including a freshly rotated DEK it has not yet re-registered under (7b) — is gated. The runtime cases where a node legitimately needs to (re)register before its first encrypted direct write (Phase-0 boot then runtime EnableStorageEnvelope; non-proposer node after a runtime RotateDEK) are the deferred runtime-registration follow-on; until it lands they fail closed, which is the safe posture. No real runtime direct-write caller exists today — the only direct ApplyMutations path is the startup catalog bootstrap Save, which is covered by retryUntilRegistered + the already-registered MarkRegistered seed.
Lock-free: two atomic loads. Returns false when there is no active storage DEK (id == 0) so a pre-bootstrap process never claims to be registered; once a DEK is active, returns true only when this load has marked that exact id (so 7b's rotate-dek to a new id re-arms the gate until the post-rotation registration marks the new id).
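The per-DEK, fail-closed predicate reduces to comparing two atomically loaded ids. The sketch below uses hypothetical names and leaves out everything else the real StateCache tracks.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// regGate is a hypothetical reduction of the registration state.
type regGate struct {
	active     atomic.Uint32 // currently active storage DEK id, 0 = none
	registered atomic.Uint32 // DEK id this process load has registered for
}

// markRegistered ignores 0 so the "no DEK" sentinel can never be marked.
func (g *regGate) markRegistered(dekID uint32) {
	if dekID == 0 {
		return
	}
	g.registered.Store(dekID)
}

// isRegistered is two atomic loads: false without an active DEK, and true
// only when the marked id equals the active id.
func (g *regGate) isRegistered() bool {
	id := g.active.Load()
	return id != 0 && g.registered.Load() == id
}

func main() {
	var g regGate
	fmt.Println(g.isRegistered()) // pre-bootstrap

	g.active.Store(5)
	g.markRegistered(5)
	fmt.Println(g.isRegistered()) // registered for DEK 5

	g.active.Store(6) // rotation to a new DEK re-arms the gate
	fmt.Println(g.isRegistered())
}
```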
func (*StateCache) StorageEnvelopeActive ¶
func (c *StateCache) StorageEnvelopeActive() bool
StorageEnvelopeActive returns the in-memory mirror of sidecar.StorageEnvelopeActive. Signature matches store.StorageEnvelopeActive so main.go can pass `cache.StorageEnvelopeActive` directly into `store.WithStorageEnvelopeGate(...)` as the per-Put cutover gate. Once true, the storage layer wraps every new version in the §4.1 envelope; flips exactly once per cluster lifetime when the §7.1 Phase 1 cutover entry applies.
type WriterRegistryStore ¶
type WriterRegistryStore interface {
// GetRegistryRow returns the value at the supplied registry
// key. The boolean is true iff the row exists; on a missing
// row both `value` and the boolean are zero. The error path
// is reserved for storage faults (Pebble error, etc.); a
// missing row is NOT an error.
GetRegistryRow(key []byte) (value []byte, ok bool, err error)
// SetRegistryRow writes a registry row durably. Overwrites
// any existing value at the same key. Idempotent at the
// (key, value) tuple level — writing the same value twice
// has no observable effect.
SetRegistryRow(key []byte, value []byte) error
}
WriterRegistryStore is the storage abstraction the Applier needs to read and write §4.1 writer-registry rows under the `!encryption|writers|<dek_id>|<uint16(node_id)>` Pebble prefix.
The interface stays separate from store.MVCCStore because writer-registry rows live OUTSIDE the MVCC namespace: they are metadata about the cluster's encryption state, not user data. They carry no commit timestamps, no MVCC visibility, and no retention semantics; the Pebble row is the durable form and is replayed on FSM apply just like any other 0x03 entry.
Implementations MUST make SetRegistryRow durable before returning; otherwise the next FSM apply will see a stale read on GetRegistryRow and §4.1 case 2's monotonic-epoch check will incorrectly accept a rolled-back local_epoch. The main.go wiring satisfies this by routing through the FSM's `pebble.Sync` Set path; the in-memory MapWriterRegistryStore used in tests trivially satisfies it.
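A map-backed store that satisfies the interface contract could be sketched as follows. This is not the package's MapWriterRegistryStore; the mapStore type is hypothetical, and "durable" here only means visible to the next read within the same process.

```go
package main

import (
	"fmt"
	"sync"
)

// writerRegistryStore mirrors the WriterRegistryStore interface above.
type writerRegistryStore interface {
	GetRegistryRow(key []byte) (value []byte, ok bool, err error)
	SetRegistryRow(key []byte, value []byte) error
}

// mapStore is a hypothetical in-memory implementation for tests.
type mapStore struct {
	mu   sync.Mutex
	rows map[string][]byte
}

func newMapStore() *mapStore {
	return &mapStore{rows: make(map[string][]byte)}
}

// GetRegistryRow reports a missing row as (nil, false, nil), not an error.
func (s *mapStore) GetRegistryRow(key []byte) ([]byte, bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.rows[string(key)]
	if !ok {
		return nil, false, nil
	}
	return append([]byte(nil), v...), true, nil // copy so callers cannot mutate the row
}

// SetRegistryRow overwrites any existing value at the same key.
func (s *mapStore) SetRegistryRow(key, value []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows[string(key)] = append([]byte(nil), value...)
	return nil
}

func main() {
	var s writerRegistryStore = newMapStore()
	_, ok, err := s.GetRegistryRow([]byte("k"))
	fmt.Println(ok, err)
	_ = s.SetRegistryRow([]byte("k"), []byte("v1"))
	v, ok, err := s.GetRegistryRow([]byte("k"))
	fmt.Println(string(v), ok, err)
}
```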
Directories ¶
| Path | Synopsis |
|---|---|
| fsmwire | Package fsmwire defines the §6.3 / §11 binary wire format for the FSM-internal encryption Raft entry types (opcodes 0x03 / 0x04 / 0x05). |
| kek | Package kek implements KEK (Key Encryption Key) providers that wrap and unwrap DEKs (Data Encryption Keys) per §5.1 of the data-at-rest encryption design. |