Documentation ¶
Overview ¶
Package codec implements the v3 storage engine's record codec layer.
The codec layer encodes and decodes c1.storage.v3 record types onto Pebble's key-value primitives. It uses a hybrid pattern:
- Generated, typed codecs for the SDK's six built-in record types, registered via init() during package import. The hot write path dispatches to these.
- Cached descriptor-reflection codecs (*ReflectCodec) for any other proto message: debug paths, manifest descriptor walking, and potential future extension protos. These are lazily constructed on the first Lookup miss and cached process-wide.
Index ¶
- Variables
- func AppendTupleBool(dst []byte, b bool) []byte
- func AppendTupleBytes(dst []byte, b []byte) []byte
- func AppendTupleInt32(dst []byte, n int32) []byte
- func AppendTupleInt64(dst []byte, n int64) []byte
- func AppendTupleSeparator(dst []byte) []byte
- func AppendTupleString(dst []byte, s string) []byte
- func AppendTupleStrings(dst []byte, s ...string) []byte
- func AppendTupleUint32(dst []byte, n uint32) []byte
- func AppendTupleUint64(dst []byte, n uint64) []byte
- func DecodeSyncID(b []byte) string
- func DecodeTupleStringAlias(src []byte, off int) ([]byte, int, bool)
- func DecodeTupleStringTo(dst []byte, src []byte, off int) ([]byte, int, error)
- func EncodeSyncID(s string) ([]byte, error)
- func KeyUpperBound(prefix []byte) []byte
- func Register(name protoreflect.FullName, c Codec)
- func RegisteredNames() []protoreflect.FullName
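- type Codec
- func Lookup(md protoreflect.MessageDescriptor) Codec
- type ReflectCodec
- func NewReflectCodec(md protoreflect.MessageDescriptor) *ReflectCodec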
- func (c *ReflectCodec) DecodeValue(b []byte, dst proto.Message) error
- func (c *ReflectCodec) DeleteIndexes(batch *pebble.Batch, msg proto.Message) error
- func (c *ReflectCodec) EncodeKey(msg proto.Message) ([]byte, error)
- func (c *ReflectCodec) EncodeValue(msg proto.Message) ([]byte, error)
- func (c *ReflectCodec) WriteIndexes(batch *pebble.Batch, msg proto.Message) error
Variables ¶
var ErrCodecTypeMismatch = errors.New("codec: input message type does not match registered codec")
ErrCodecTypeMismatch is returned by a generated codec when the supplied proto.Message is not the type the codec was registered for. Engine write paths surface this as a DataLoss-class error.
var ErrInvalidSyncID = errors.New("codec: sync_id is not a valid KSUID")
ErrInvalidSyncID is returned when EncodeSyncID receives a string that is not a parseable KSUID.
var ErrInvalidTuple = errors.New("codec: invalid tuple encoding")
ErrInvalidTuple is returned by tuple decoders when the bytes are malformed (e.g. an escape sequence with no follower, or a truncated fixed-width integer).
var ErrReflectMissingTable = fmt.Errorf("codec: descriptor missing (storage.v3.table) option")
ErrReflectMissingTable is returned by EncodeKey / WriteIndexes when the descriptor lacks a (storage.v3.table) option. Reflection codecs can still encode/decode values without one; only key paths require the table metadata.
Functions ¶
func AppendTupleBool ¶
func AppendTupleBool(dst []byte, b bool) []byte
AppendTupleBool writes 0x26 for false, 0x27 for true. These bytes sort false-before-true and never collide with the separator (0x00) or escape (0x01).
func AppendTupleBytes ¶
func AppendTupleBytes(dst []byte, b []byte) []byte
AppendTupleBytes writes a tuple-encoded raw-bytes component. Same escape rules as strings — needed because some connectors emit external IDs as opaque bytes that may contain embedded NUL.
func AppendTupleInt32 ¶
func AppendTupleInt32(dst []byte, n int32) []byte
AppendTupleInt32 writes a sign-flipped big-endian 4-byte int32. Sign-flipping puts negative numbers before non-negative in bytewise comparison, matching natural int order.
func AppendTupleInt64 ¶
func AppendTupleInt64(dst []byte, n int64) []byte
AppendTupleInt64 writes a sign-flipped big-endian 8-byte int64.
func AppendTupleSeparator ¶
func AppendTupleSeparator(dst []byte) []byte
AppendTupleSeparator writes a single separator byte between elements. Callers emit this themselves so the encoder is composable — e.g. a record's primary-key emission appends version + type + sync_id + separator + external_id with no separator at the end.
func AppendTupleString ¶
func AppendTupleString(dst []byte, s string) []byte
AppendTupleString writes a tuple-encoded string component (no trailing separator). The caller is responsible for emitting the separator between successive components.
func AppendTupleStrings ¶
func AppendTupleStrings(dst []byte, s ...string) []byte
AppendTupleStrings tuple-encodes each string in s and interleaves the tuple separator between successive elements. Equivalent to calling AppendTupleString in a loop with AppendTupleSeparator between calls — but in one place, so key-encoding sites can't silently drift on "did I emit one too many / one too few separators?".
No leading or trailing separator is emitted. Callers that need a leading separator (e.g. to delimit the raw sync_id bytes that precede the tuple tail in every Pebble v3 key) or a trailing separator (e.g. to make a by-value range-scan prefix unambiguous — see keys.go's convention doc) must add it themselves.
For a single string, AppendTupleStrings(dst, s) is exactly equivalent to AppendTupleString(dst, s).
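A minimal sketch of the separator contract described above. The separator (0x00) and escape (0x01) bytes come from this page, but the follower bytes used for escaping are an assumption made for illustration, and appendString / appendStrings are hypothetical stand-ins rather than the package's implementation.

```go
package main

import "fmt"

const (
	sep = 0x00 // separator byte, as documented for AppendTupleBool
	esc = 0x01 // escape byte, as documented for AppendTupleBool
)

// appendString is a hypothetical stand-in for AppendTupleString.
// ASSUMPTION: 0x00 is written as {0x01, 0x01} and 0x01 as {0x01, 0x02}.
// The real follower bytes are not specified on this page.
func appendString(dst []byte, s string) []byte {
	for i := 0; i < len(s); i++ {
		switch c := s[i]; c {
		case sep:
			dst = append(dst, esc, 0x01)
		case esc:
			dst = append(dst, esc, 0x02)
		default:
			dst = append(dst, c)
		}
	}
	return dst
}

// appendStrings mirrors the documented AppendTupleStrings contract:
// a separator between successive elements, none leading or trailing.
func appendStrings(dst []byte, ss ...string) []byte {
	for i, s := range ss {
		if i > 0 {
			dst = append(dst, sep)
		}
		dst = appendString(dst, s)
	}
	return dst
}

func main() {
	// Two components, the second containing an embedded NUL.
	fmt.Printf("%q\n", appendStrings(nil, "user", "a\x00b"))
	// A single component produces no separator at all.
	fmt.Printf("%q\n", appendStrings(nil, "solo"))
}
```

Under these assumed rules the only bare 0x00 bytes in the output are separators, which is what makes a by-prefix range scan unambiguous.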
func AppendTupleUint32 ¶
func AppendTupleUint32(dst []byte, n uint32) []byte
AppendTupleUint32 writes a big-endian 4-byte uint32 (no sign flip).
func AppendTupleUint64 ¶
func AppendTupleUint64(dst []byte, n uint64) []byte
AppendTupleUint64 writes a big-endian 8-byte uint64 (no sign flip).
func DecodeSyncID ¶
func DecodeSyncID(b []byte) string
DecodeSyncID converts a 20-byte canonical KSUID back to its base62 string form for human-readable display. Returns an empty string if b is not exactly 20 bytes.
func DecodeTupleStringAlias ¶ added in v0.15.3
func DecodeTupleStringAlias(src []byte, off int) ([]byte, int, bool)
DecodeTupleStringAlias decodes a single tuple-encoded string component from src starting at offset off. It is the zero-alloc counterpart to DecodeTupleStringTo for read-only callers on the hot path:
- When the component contains no escape byte (the overwhelmingly common case for ids), the returned slice ALIASES src — no allocation. It is only valid until src is mutated or its backing iterator advances.
- When the component contains an escape sequence, a decoded copy is allocated, identical to DecodeTupleStringTo(nil, ...).
The second return is the offset of the terminating separator byte, or len(src) if the component runs to end-of-input — same convention as DecodeTupleStringTo. The bool is false only when the input ends inside an escape sequence (malformed).
Finding the component end by scanning for the next 0x00 is correct because the escape rules guarantee no component's encoded bytes contain a bare 0x00.
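The alias-or-copy behaviour can be sketched as follows. decodeAlias is a hypothetical stand-in, and the escape pairs it understands ({0x01, 0x01} for 0x00 and {0x01, 0x02} for 0x01) are an assumption for illustration; only the separator and escape byte values come from this page.

```go
package main

import (
	"bytes"
	"fmt"
)

// decodeAlias is a hypothetical stand-in for DecodeTupleStringAlias.
// ASSUMPTION: escape pairs are {0x01,0x01} -> 0x00 and {0x01,0x02} -> 0x01.
// It assumes 0 <= off <= len(src) and does not validate follower bytes.
func decodeAlias(src []byte, off int) ([]byte, int, bool) {
	// The component ends at the next bare 0x00, or at end-of-input.
	end := len(src)
	if i := bytes.IndexByte(src[off:], 0x00); i >= 0 {
		end = off + i
	}
	comp := src[off:end]

	// Fast path: no escape byte, so the result can alias src directly.
	if bytes.IndexByte(comp, 0x01) < 0 {
		return comp, end, true
	}

	// Slow path: allocate a copy and undo the escapes.
	out := make([]byte, 0, len(comp))
	for i := 0; i < len(comp); i++ {
		if comp[i] != 0x01 {
			out = append(out, comp[i])
			continue
		}
		i++
		if i == len(comp) {
			return nil, end, false // escape byte with no follower
		}
		out = append(out, comp[i]-1)
	}
	return out, end, true
}

func main() {
	src := []byte("abc\x00d\x01\x01e")

	first, next, ok := decodeAlias(src, 0)
	fmt.Printf("%q %d %v\n", first, next, ok)

	// Skip the separator to reach the second component.
	second, _, _ := decodeAlias(src, next+1)
	fmt.Printf("%q\n", second)
}
```

Because the fast-path result shares memory with src, a caller that needs the bytes after advancing an iterator has to copy them first.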
func DecodeTupleStringTo ¶
func DecodeTupleStringTo(dst []byte, src []byte, off int) ([]byte, int, error)
DecodeTupleStringTo decodes a single tuple-encoded string from src starting at offset off. It returns the decoded bytes, the offset immediately after the consumed bytes (pointing at the separator or at end-of-input), and any error. If the input ends inside an escape sequence, it returns ErrInvalidTuple.
func EncodeSyncID ¶
func EncodeSyncID(s string) ([]byte, error)
EncodeSyncID converts a KSUID string into its 20-byte canonical binary form. baton's sync_id values are KSUIDs (27-char base62); storing them in keys as base62 strings would cost 7 extra bytes per occurrence, which adds up across N indexes and 100M+ rows. The binary form is uniformly 20 bytes and lex-compares identically to the base62 form because KSUIDs are sortable by their timestamp prefix.
Returns ErrInvalidSyncID if s is not a valid KSUID string.
func KeyUpperBound ¶ added in v0.15.3
func KeyUpperBound(prefix []byte) []byte
KeyUpperBound returns the lexicographically smallest key strictly greater than every key carrying prefix: the prefix with its last non-0xff byte incremented and any trailing 0xff bytes dropped. It returns nil when prefix is empty or all 0xff — there is no finite upper bound, so a range scan should run to the end of the keyspace. The input is not modified.
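The rule above is compact enough to sketch directly. keyUpperBound below is a hypothetical re-implementation written from the description, not the package's source.

```go
package main

import "fmt"

// keyUpperBound is a hypothetical re-implementation of the documented
// KeyUpperBound behaviour: increment the last non-0xff byte and drop
// everything after it.
func keyUpperBound(prefix []byte) []byte {
	for i := len(prefix) - 1; i >= 0; i-- {
		if prefix[i] != 0xff {
			out := make([]byte, i+1) // copy so the input is not modified
			copy(out, prefix[:i+1])
			out[i]++
			return out
		}
	}
	return nil // empty or all 0xff: no finite upper bound exists
}

func main() {
	fmt.Printf("%x\n", keyUpperBound([]byte{0x01, 0x02}))       // 0103
	fmt.Printf("%x\n", keyUpperBound([]byte{0x01, 0xff, 0xff})) // 02
	fmt.Println(keyUpperBound([]byte{0xff}) == nil)             // true
}
```

A range scan over a prefix would then typically use prefix as the inclusive lower bound and this result as the exclusive upper bound, treating nil as "scan to the end of the keyspace".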
func Register ¶
func Register(name protoreflect.FullName, c Codec)
Register installs a codec under its proto full-name. Called only from generated init() functions. Panics on duplicate registration — that's a build error, not a runtime concern.
func RegisteredNames ¶
func RegisteredNames() []protoreflect.FullName
RegisteredNames returns the proto full-names of all generated codecs registered in the binary. Intended for test introspection and for the manifest's RecordTypeInfo population.
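The register-at-init pattern described for Register and RegisteredNames might look roughly like this. Everything here is a simplified stand-in: names are plain strings rather than protoreflect.FullName, the codec type is an empty placeholder, "example.v1.Widget" is an invented record name, and sorting the result is an assumption made only to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// codec is a placeholder for the package's Codec interface.
type codec interface{}

// registry is a hypothetical stand-in for the package-level codec map.
// It is only written during init(), before any concurrent readers exist.
var registry = map[string]codec{}

// register installs c under name and panics on a duplicate, mirroring the
// documented behaviour of Register.
func register(name string, c codec) {
	if _, dup := registry[name]; dup {
		panic("codec: duplicate registration for " + name)
	}
	registry[name] = c
}

// registeredNames returns every registered name.
// ASSUMPTION: the sort is for stable output in this sketch only.
func registeredNames() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// A generated file would carry an init() like this one.
func init() {
	register("example.v1.Widget", struct{}{})
}

func main() {
	fmt.Println(registeredNames())
}
```

Panicking on a duplicate is reasonable here because two generated files claiming the same name can only happen through a code-generation or build mistake.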
Types ¶
type Codec ¶
type Codec interface {
// EncodeKey returns the primary-key bytes for the message.
// The bytes are tuple-encoded per the codec's record-type
// declaration. Returns ErrCodecTypeMismatch if msg is not the
// type the codec was registered for.
EncodeKey(msg proto.Message) ([]byte, error)
// EncodeValue returns deterministic proto wire bytes for the
// message. Generated codecs use proto.MarshalOptions{Deterministic:
// true} so two equal records produce equal bytes — required for
// the equivalence harness's byte-equality assertion.
EncodeValue(msg proto.Message) ([]byte, error)
// DecodeValue parses bytes into dst. dst must be the same type
// as the registered codec; mismatches return ErrCodecTypeMismatch.
DecodeValue(b []byte, dst proto.Message) error
// WriteIndexes appends all secondary-index entries for msg to
// batch. Called inside a pebble.Batch alongside the primary write.
WriteIndexes(batch *pebble.Batch, msg proto.Message) error
// DeleteIndexes appends index-entry deletions for msg to batch.
// Called during overwrite (after reading the previous value) and
// during explicit Delete. Same atomicity as WriteIndexes.
DeleteIndexes(batch *pebble.Batch, msg proto.Message) error
}
Codec is the per-record-type interface every storage codec implements. Methods return errors on type-assertion mismatch (ErrCodecTypeMismatch) rather than panicking; the engine write path plumbs errors and surfaces a DataLoss-class gRPC code to upstream callers.
Generated codecs hold a private type-assertion at the entry point; reflection codecs evaluate against the descriptor at call time.
func Lookup ¶
func Lookup(md protoreflect.MessageDescriptor) Codec
Lookup returns the codec for the given message descriptor. If a generated codec is registered, returns it (hot path, lock-free read from a frozen map). Otherwise constructs a ReflectCodec lazily, caches it in reflectCache, and returns it.
Lookup never returns nil and never returns an error; an unknown descriptor produces a working reflection codec. Whether that codec can actually encode keys depends on whether the descriptor has the required (storage.v3.table) option — ReflectCodec's methods return errors at call time if the descriptor lacks the necessary metadata.
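The two-tier lookup can be sketched as below. This is a simplified model under stated assumptions: descriptors are reduced to name strings, the codec types are hypothetical, and sync.Map is one plausible choice for reflectCache rather than a confirmed detail of the package.

```go
package main

import (
	"fmt"
	"sync"
)

// codec is a reduced placeholder for the package's Codec interface.
type codec interface{ Kind() string }

type generatedCodec struct{}

func (generatedCodec) Kind() string { return "generated" }

type reflectCodec struct{ name string }

func (*reflectCodec) Kind() string { return "reflect" }

// generated is populated before main runs and never written afterwards,
// so concurrent reads need no lock.
var generated = map[string]codec{"example.v1.Widget": generatedCodec{}}

// reflectCache holds lazily built reflection codecs.
// ASSUMPTION: sync.Map is used here for illustration only.
var reflectCache sync.Map

// lookup never returns nil: unknown names get a cached reflection codec.
func lookup(name string) codec {
	if c, ok := generated[name]; ok {
		return c // hot path
	}
	if c, ok := reflectCache.Load(name); ok {
		return c.(codec)
	}
	// LoadOrStore keeps a single winner if two goroutines race here.
	c, _ := reflectCache.LoadOrStore(name, &reflectCodec{name: name})
	return c.(codec)
}

func main() {
	fmt.Println(lookup("example.v1.Widget").Kind()) // generated
	fmt.Println(lookup("other.v1.Thing").Kind())    // reflect

	// The second miss reuses the cached instance.
	fmt.Println(lookup("other.v1.Thing") == lookup("other.v1.Thing")) // true
}
```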
type ReflectCodec ¶
type ReflectCodec struct {
// contains filtered or unexported fields
}
ReflectCodec encodes records via cached descriptor reflection. It satisfies the value-encoding portions of the Codec interface for any message. Key and index encoding require typed metadata and are provided by the built-in codecs.
ReflectCodec is constructed lazily by Lookup() and cached process-wide. The construction cost — resolving primary-key field paths and index declarations — is paid once per descriptor across the entire process; subsequent calls reuse the cached codec.
Performance: ~5× slower than a generated typed codec at the same workload. Used only off the engine's hot write path.
func NewReflectCodec ¶
func NewReflectCodec(md protoreflect.MessageDescriptor) *ReflectCodec
NewReflectCodec constructs a codec for the given message descriptor. Callers should generally go through Lookup() instead, which caches.
func (*ReflectCodec) DecodeValue ¶
func (c *ReflectCodec) DecodeValue(b []byte, dst proto.Message) error
func (*ReflectCodec) DeleteIndexes ¶
func (c *ReflectCodec) DeleteIndexes(batch *pebble.Batch, msg proto.Message) error
func (*ReflectCodec) EncodeKey ¶
func (c *ReflectCodec) EncodeKey(msg proto.Message) ([]byte, error)
EncodeKey is unsupported for reflection codecs. Built-in record types use typed codecs for primary keys and indexes.
func (*ReflectCodec) EncodeValue ¶
func (c *ReflectCodec) EncodeValue(msg proto.Message) ([]byte, error)
EncodeValue uses deterministic proto marshal — same contract as generated codecs.