codec

package
v0.15.4

This package is not in the latest version of its module.
Published: Jun 16, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package codec implements the v3 storage engine's record codec layer.

The codec layer encodes and decodes c1.storage.v3 record types onto Pebble's key-value primitives. It uses a hybrid pattern:

  • Generated, typed codecs for the SDK's six built-in record types, registered via init() during package import. The hot write path dispatches to these.

  • Cached descriptor-reflection codecs (*ReflectCodec) for any other proto message — debug paths, manifest descriptor walking, potential future extension protos. Lazily constructed on first Lookup miss and cached process-wide.

Index

Constants

This section is empty.

Variables

var ErrCodecTypeMismatch = errors.New("codec: input message type does not match registered codec")

ErrCodecTypeMismatch is returned by a generated codec when the supplied proto.Message is not the type the codec was registered for. Engine write paths surface this as a DataLoss-class error.

var ErrInvalidSyncID = errors.New("codec: sync_id is not a valid KSUID")

ErrInvalidSyncID is returned when EncodeSyncID receives a string that is not a parseable KSUID.

var ErrInvalidTuple = errors.New("codec: invalid tuple encoding")

ErrInvalidTuple is returned by tuple decoders when the bytes are malformed (e.g. an escape sequence with no follower, or a truncated fixed-width integer).

var ErrReflectMissingTable = fmt.Errorf("codec: descriptor missing (storage.v3.table) option")

ErrReflectMissingTable is returned by EncodeKey / WriteIndexes when the descriptor lacks a (storage.v3.table) option. Reflection codecs can still encode/decode values without one; only key paths require the table metadata.

Functions

func AppendTupleBool

func AppendTupleBool(dst []byte, b bool) []byte

AppendTupleBool writes 0x26 for false, 0x27 for true. These bytes sort false-before-true and never collide with the separator (0x00) or escape (0x01).

func AppendTupleBytes

func AppendTupleBytes(dst []byte, b []byte) []byte

AppendTupleBytes writes a tuple-encoded raw-bytes component. Same escape rules as strings — needed because some connectors emit external IDs as opaque bytes that may contain embedded NUL.

func AppendTupleInt32

func AppendTupleInt32(dst []byte, n int32) []byte

AppendTupleInt32 writes a sign-flipped big-endian 4-byte int32. Sign-flipping puts negative numbers before non-negative in bytewise comparison, matching natural int order.
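
The sign-flip trick can be illustrated with a minimal sketch. The helper appendInt32 below is a hypothetical stand-in written for this example, not the package's implementation; it XORs the sign bit and emits big-endian bytes, which is one common way to obtain the ordering described above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// appendInt32 is a hypothetical stand-in for AppendTupleInt32: it flips the
// sign bit and appends the result as 4 big-endian bytes.
func appendInt32(dst []byte, n int32) []byte {
	return binary.BigEndian.AppendUint32(dst, uint32(n)^0x80000000)
}

func main() {
	neg := appendInt32(nil, -1)
	zero := appendInt32(nil, 0)
	pos := appendInt32(nil, 1)

	// Print the three encodings as hex.
	fmt.Printf("% x\n% x\n% x\n", neg, zero, pos)

	// Bytewise comparison agrees with integer order: -1 < 0 < 1.
	fmt.Println(bytes.Compare(neg, zero) < 0, bytes.Compare(zero, pos) < 0)
}
```

Without the flip, two's-complement negatives start with a set high bit and would sort after every non-negative value.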

func AppendTupleInt64

func AppendTupleInt64(dst []byte, n int64) []byte

AppendTupleInt64 writes a sign-flipped big-endian 8-byte int64.

func AppendTupleSeparator

func AppendTupleSeparator(dst []byte) []byte

AppendTupleSeparator writes a single separator byte between elements. Callers emit this themselves so the encoder is composable — e.g. a record's primary-key emission appends version + type + sync_id + separator + external_id with no separator at the end.

func AppendTupleString

func AppendTupleString(dst []byte, s string) []byte

AppendTupleString writes a tuple-encoded string component (no trailing separator). The caller is responsible for emitting the separator between successive components.

func AppendTupleStrings

func AppendTupleStrings(dst []byte, s ...string) []byte

AppendTupleStrings tuple-encodes each string in s and interleaves the tuple separator between successive elements. Equivalent to calling AppendTupleString in a loop with AppendTupleSeparator between calls — but in one place, so key-encoding sites can't silently drift on "did I emit one too many / one too few separators?".

No leading or trailing separator is emitted. Callers that need a leading separator (e.g. to delimit the raw sync_id bytes that precede the tuple tail in every Pebble v3 key) or a trailing separator (e.g. to make a by-value range-scan prefix unambiguous — see keys.go's convention doc) must add it themselves.

For a single string, AppendTupleStrings(dst, s) is exactly equivalent to AppendTupleString(dst, s).
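
The interleaving rule can be sketched as follows. Both helpers are hypothetical and written only for illustration. The separator (0x00) and escape (0x01) bytes come from the AppendTupleBool description; the two-byte escape forms used here are an assumption, since this page does not document the package's escape table.

```go
package main

import "fmt"

const (
	sep = 0x00 // separator byte between components
	esc = 0x01 // escape byte
)

// appendString escapes s so the output never contains sep.
// ASSUMPTION: NUL becomes (esc, 0x01) and esc becomes (esc, 0x02).
// The package's real escape forms may differ.
func appendString(dst []byte, s string) []byte {
	for i := 0; i < len(s); i++ {
		switch c := s[i]; c {
		case sep:
			dst = append(dst, esc, 0x01)
		case esc:
			dst = append(dst, esc, 0x02)
		default:
			dst = append(dst, c)
		}
	}
	return dst
}

// appendStrings writes each component with a separator between successive
// components only: no leading separator, no trailing separator.
func appendStrings(dst []byte, ss ...string) []byte {
	for i, s := range ss {
		if i > 0 {
			dst = append(dst, sep)
		}
		dst = appendString(dst, s)
	}
	return dst
}

func main() {
	// The embedded NUL in the second component is escaped, so the only
	// 0x00 in the output is the separator between the two components.
	fmt.Printf("%q\n", appendStrings(nil, "user", "a\x00b"))
}
```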

func AppendTupleUint32

func AppendTupleUint32(dst []byte, n uint32) []byte

AppendTupleUint32 writes a big-endian 4-byte uint32 (no sign flip).

func AppendTupleUint64

func AppendTupleUint64(dst []byte, n uint64) []byte

AppendTupleUint64 writes a big-endian 8-byte uint64 (no sign flip).

func DecodeSyncID

func DecodeSyncID(b []byte) string

DecodeSyncID converts a 20-byte canonical KSUID back to its base62 string form for human-readable display. Returns an empty string if b is not exactly 20 bytes.

func DecodeTupleStringAlias added in v0.15.3

func DecodeTupleStringAlias(src []byte, off int) ([]byte, int, bool)

DecodeTupleStringAlias decodes a single tuple-encoded string component from src starting at offset off. It is the zero-alloc counterpart to DecodeTupleStringTo for read-only callers on the hot path:

  • When the component contains no escape byte (the overwhelmingly common case for ids), the returned slice ALIASES src — no allocation. It is only valid until src is mutated or its backing iterator advances.
  • When the component contains an escape sequence, a decoded copy is allocated, identical to DecodeTupleStringTo(nil, ...).

The second return is the offset of the terminating separator byte, or len(src) if the component runs to end-of-input — same convention as DecodeTupleStringTo. The bool is false only when the input ends inside an escape sequence (malformed).

Finding the component end by scanning for the next 0x00 is correct because the escape rules guarantee no component's encoded bytes contain a bare 0x00.
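
A minimal sketch of the alias-or-copy strategy, assuming separator 0x00 and escape 0x01 as stated for AppendTupleBool. The function decodeAlias is hypothetical, and the follower-byte mapping in the slow path is an assumption made for this example rather than the package's documented escape table.

```go
package main

import (
	"bytes"
	"fmt"
)

// decodeAlias is a hypothetical sketch, not the package's implementation.
// It returns the component starting at off, the offset of the terminating
// separator (or len(src)), and false if the input ends inside an escape.
func decodeAlias(src []byte, off int) ([]byte, int, bool) {
	end := len(src)
	if i := bytes.IndexByte(src[off:], 0x00); i >= 0 {
		end = off + i
	}
	comp := src[off:end]
	if bytes.IndexByte(comp, 0x01) < 0 {
		return comp, end, true // fast path: aliases src, no allocation
	}
	// Slow path: the component contains an escape, so build a decoded copy.
	out := make([]byte, 0, len(comp))
	for i := 0; i < len(comp); i++ {
		if comp[i] != 0x01 {
			out = append(out, comp[i])
			continue
		}
		i++
		if i == len(comp) {
			return nil, end, false // escape byte with no follower
		}
		// ASSUMED mapping: follower 0x01 decodes to NUL, 0x02 to a literal 0x01.
		out = append(out, comp[i]-1)
	}
	return out, end, true
}

func main() {
	src := []byte("alice\x00bob")
	got, next, ok := decodeAlias(src, 0)
	fmt.Println(string(got), next, ok)

	// The fast-path result shares src's backing array.
	fmt.Println(&got[0] == &src[0])
}
```

Because the fast-path result shares memory with src, a caller that needs the bytes to outlive the iterator position has to copy them first.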

func DecodeTupleStringTo

func DecodeTupleStringTo(dst []byte, src []byte, off int) ([]byte, int, error)

DecodeTupleStringTo decodes a single tuple-encoded string from src starting at offset off. Returns the decoded string, the offset immediately after the consumed bytes (pointing at the separator or end-of-input), and any error. If the input ends inside an escape sequence, returns ErrInvalidTuple.

func EncodeSyncID

func EncodeSyncID(s string) ([]byte, error)

EncodeSyncID converts a KSUID string into its 20-byte canonical binary form. baton's sync_id values are KSUIDs (27-char base62); storing them in keys as base62 strings would burn ~7 bytes per occurrence × N indexes × 100M+ rows = real space. The binary form is uniformly 20 bytes and lex-compares identically to the base62 form because KSUIDs are sortable by their timestamp prefix.

Returns ErrInvalidSyncID if s is not a valid KSUID string.

func KeyUpperBound added in v0.15.3

func KeyUpperBound(prefix []byte) []byte

KeyUpperBound returns the lexicographically smallest key strictly greater than every key carrying prefix: the prefix with its last non-0xff byte incremented and any trailing 0xff bytes dropped. It returns nil when prefix is empty or all 0xff — there is no finite upper bound, so a range scan should run to the end of the keyspace. The input is not modified.
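
The rule above can be sketched in a few lines. The function upperBound is a hypothetical re-implementation written from the description, not the package's source.

```go
package main

import "fmt"

// upperBound returns the smallest key strictly greater than every key that
// starts with prefix, or nil when no finite bound exists.
func upperBound(prefix []byte) []byte {
	for i := len(prefix) - 1; i >= 0; i-- {
		if prefix[i] != 0xff {
			// Copy up to and including byte i so the input is not modified,
			// then increment that byte. Trailing 0xff bytes are dropped.
			out := make([]byte, i+1)
			copy(out, prefix[:i+1])
			out[i]++
			return out
		}
	}
	return nil // empty or all 0xff: scan to the end of the keyspace
}

func main() {
	fmt.Printf("%x\n", upperBound([]byte{0x12, 0x34}))
	fmt.Printf("%x\n", upperBound([]byte{0x12, 0xff, 0xff}))
	fmt.Println(upperBound([]byte{0xff}) == nil)
}
```

A typical use is as the exclusive upper limit of a prefix range scan, with the prefix itself as the inclusive lower limit.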

func Register

func Register(name protoreflect.FullName, c Codec)

Register installs a codec under its proto full-name. Called only from generated init() functions. Panics on duplicate registration — that's a build error, not a runtime concern.

func RegisteredNames

func RegisteredNames() []protoreflect.FullName

RegisteredNames returns the proto full-names of all generated codecs registered in the binary. Intended for test introspection and for the manifest's RecordTypeInfo population.

Types

type Codec

type Codec interface {
	// EncodeKey returns the primary-key bytes for the message.
	// The bytes are tuple-encoded per the codec's record-type
	// declaration. Returns ErrCodecTypeMismatch if msg is not the
	// type the codec was registered for.
	EncodeKey(msg proto.Message) ([]byte, error)

	// EncodeValue returns deterministic proto wire bytes for the
	// message. Generated codecs use proto.MarshalOptions{Deterministic:
	// true} so two equal records produce equal bytes — required for
	// the equivalence harness's byte-equality assertion.
	EncodeValue(msg proto.Message) ([]byte, error)

	// DecodeValue parses bytes into dst. dst must be the same type
	// as the registered codec; mismatches return ErrCodecTypeMismatch.
	DecodeValue(b []byte, dst proto.Message) error

	// WriteIndexes appends all secondary-index entries for msg to
	// batch. Called inside a pebble.Batch alongside the primary write.
	WriteIndexes(batch *pebble.Batch, msg proto.Message) error

	// DeleteIndexes appends index-entry deletions for msg to batch.
	// Called during overwrite (after reading the previous value) and
	// during explicit Delete. Same atomicity as WriteIndexes.
	DeleteIndexes(batch *pebble.Batch, msg proto.Message) error
}

Codec is the per-record-type interface every storage codec implements. Methods return errors on type-assertion mismatch (ErrCodecTypeMismatch) rather than panicking; the engine write path plumbs errors and surfaces a DataLoss-class gRPC code to upstream callers.

Generated codecs hold a private type-assertion at the entry point; reflection codecs evaluate against the descriptor at call time.

func Lookup

Lookup returns the codec for the given message descriptor. If a generated codec is registered, returns it (hot path, lock-free read from a frozen map). Otherwise constructs a ReflectCodec lazily, caches it in reflectCache, and returns it.

Lookup never returns nil and never returns an error; an unknown descriptor produces a working reflection codec. Whether that codec can actually encode keys depends on whether the descriptor has the required (storage.v3.table) option — ReflectCodec's methods return errors at call time if the descriptor lacks the necessary metadata.
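
The two-tier dispatch described above might look roughly like this. It is a sketch under stated assumptions: names are plain strings rather than protoreflect descriptors, the codec interface is reduced to a single method, and sync.Map is one plausible choice for the process-wide cache. The names generated, reflectCache, register, lookup and "example.v3.Grant" are all illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// codec is a reduced stand-in for the Codec interface.
type codec interface{ Kind() string }

type typedCodec struct{}

func (typedCodec) Kind() string { return "generated" }

type reflectCodec struct{ name string }

func (*reflectCodec) Kind() string { return "reflect" }

var (
	// generated is written only from init() and treated as read-only
	// afterwards, so lookups need no lock.
	generated = map[string]codec{}
	// reflectCache holds lazily built fallback codecs, keyed by name.
	reflectCache sync.Map
)

// register panics on duplicates, matching the documented Register contract.
func register(name string, c codec) {
	if _, dup := generated[name]; dup {
		panic("codec: duplicate registration for " + name)
	}
	generated[name] = c
}

func init() { register("example.v3.Grant", typedCodec{}) }

// lookup never returns nil: unknown names get a cached fallback codec.
func lookup(name string) codec {
	if c, ok := generated[name]; ok {
		return c // hot path
	}
	if c, ok := reflectCache.Load(name); ok {
		return c.(codec)
	}
	// LoadOrStore keeps exactly one instance even if goroutines race here.
	c, _ := reflectCache.LoadOrStore(name, &reflectCodec{name: name})
	return c.(codec)
}

func main() {
	fmt.Println(lookup("example.v3.Grant").Kind())
	fmt.Println(lookup("example.debug.Blob").Kind())
	fmt.Println(lookup("example.debug.Blob") == lookup("example.debug.Blob"))
}
```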

type ReflectCodec

type ReflectCodec struct {
	// contains filtered or unexported fields
}

ReflectCodec encodes records via cached descriptor reflection. It can encode and decode values for any message. Key and index encoding additionally require the descriptor to carry a (storage.v3.table) option; without one, those methods return ErrReflectMissingTable.

ReflectCodec is constructed lazily by Lookup() and cached process-wide. The construction cost — resolving primary-key field paths and index declarations — is paid once per descriptor across the entire process; subsequent calls reuse the cached codec.

Performance: ~5× slower than a generated typed codec at the same workload. Used only off the engine's hot write path.

func NewReflectCodec

func NewReflectCodec(md protoreflect.MessageDescriptor) *ReflectCodec

NewReflectCodec constructs a codec for the given message descriptor. Callers should generally go through Lookup() instead, which caches.

func (*ReflectCodec) DecodeValue

func (c *ReflectCodec) DecodeValue(b []byte, dst proto.Message) error

func (*ReflectCodec) DeleteIndexes

func (c *ReflectCodec) DeleteIndexes(batch *pebble.Batch, msg proto.Message) error

func (*ReflectCodec) EncodeKey

func (c *ReflectCodec) EncodeKey(msg proto.Message) ([]byte, error)

EncodeKey encodes the primary key from the descriptor's (storage.v3.table) metadata. It returns ErrReflectMissingTable if the descriptor lacks that option.

func (*ReflectCodec) EncodeValue

func (c *ReflectCodec) EncodeValue(msg proto.Message) ([]byte, error)

EncodeValue uses deterministic proto marshal — same contract as generated codecs.

func (*ReflectCodec) WriteIndexes

func (c *ReflectCodec) WriteIndexes(batch *pebble.Batch, msg proto.Message) error
