Documentation ¶
Overview ¶
Package codec encodes and decodes iceberg-go values for cross-process transport. The bytes it produces are the same Avro bytes a manifest carries for the corresponding value, so callers can transport Iceberg internals without inventing a parallel wire schema.
EncodeDataFile / DecodeDataFile move a single iceberg.DataFile using the manifest-entry encoding for a given partition spec, table schema, and format version.
EncodeFileScanTask / DecodeFileScanTask layer on top: each embedded DataFile is encoded with EncodeDataFile, then wrapped alongside the scan range and v3 row lineage in a small Avro envelope.
The receiver supplies (spec, schema, version) out of band. Both sides in a distributed-processing design already hold table metadata, and the Avro schema is cached per (partition type, version) pair.
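One way to picture the out-of-band contract is a small transport envelope that carries identifiers next to the payload, so the receiver can look up the matching triple in metadata it already holds. The following is a minimal sketch under stated assumptions: wireMessage, tableMeta, and resolve are hypothetical names that are not part of this package, and strings stand in for iceberg.PartitionSpec and *iceberg.Schema.

```go
package main

import (
	"errors"
	"fmt"
)

// wireMessage is a hypothetical envelope: the codec payload plus the
// identifiers the receiver needs to select the encoder's triple.
type wireMessage struct {
	SpecID        int    // partition spec ID the encoder used
	SchemaID      int    // table schema ID the encoder used
	FormatVersion int    // table format version (1, 2, or 3)
	Payload       []byte // bytes produced by the encoder
}

// tableMeta is a stand-in for the table metadata both sides already hold.
// Strings replace the real spec and schema types to keep the sketch small.
type tableMeta struct {
	FormatVersion int
	Specs         map[int]string
	Schemas       map[int]string
}

var errTripleMismatch = errors.New("encoder triple not available on receiver")

// resolve returns the spec and schema named by the message, or an error
// when the receiver cannot reproduce the encoder's triple.
func (m tableMeta) resolve(msg wireMessage) (string, string, error) {
	if msg.FormatVersion != m.FormatVersion {
		return "", "", fmt.Errorf("%w: format version %d", errTripleMismatch, msg.FormatVersion)
	}
	spec, ok := m.Specs[msg.SpecID]
	if !ok {
		return "", "", fmt.Errorf("%w: spec id %d", errTripleMismatch, msg.SpecID)
	}
	schema, ok := m.Schemas[msg.SchemaID]
	if !ok {
		return "", "", fmt.Errorf("%w: schema id %d", errTripleMismatch, msg.SchemaID)
	}
	return spec, schema, nil
}

func main() {
	meta := tableMeta{
		FormatVersion: 2,
		Specs:         map[int]string{1: "spec-1"},
		Schemas:       map[int]string{0: "schema-0"},
	}
	msg := wireMessage{SpecID: 1, SchemaID: 0, FormatVersion: 2, Payload: []byte{0x01}}
	spec, schema, err := meta.resolve(msg)
	fmt.Println(spec, schema, err)
}
```

Failing before the decode call turns a mismatched triple into an explicit error instead of a possible silent mis-typing of partition values.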
Index ¶
- func DecodeDataFile(data []byte, spec iceberg.PartitionSpec, schema *iceberg.Schema, version int) (iceberg.DataFile, error)
- func DecodeFileScanTask(data []byte, spec iceberg.PartitionSpec, schema *iceberg.Schema, version int) (table.FileScanTask, error)
- func EncodeDataFile(df iceberg.DataFile, spec iceberg.PartitionSpec, schema *iceberg.Schema, ...) ([]byte, error)
- func EncodeFileScanTask(task table.FileScanTask, spec iceberg.PartitionSpec, schema *iceberg.Schema, ...) ([]byte, error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DecodeDataFile ¶
func DecodeDataFile(data []byte, spec iceberg.PartitionSpec, schema *iceberg.Schema, version int) (iceberg.DataFile, error)
DecodeDataFile decodes bytes produced by EncodeDataFile back into a DataFile. The (spec, schema, version) triple must match the encoder; passing a different spec or version yields a decode error or silently mis-typed partition values.
The returned DataFile carries the partition spec id and the field-id lookup tables, so Partition() and the stats accessors return id-keyed maps as if the file had been read from a manifest.
func DecodeFileScanTask ¶
func DecodeFileScanTask(data []byte, spec iceberg.PartitionSpec, schema *iceberg.Schema, version int) (table.FileScanTask, error)
DecodeFileScanTask reverses EncodeFileScanTask. The triple (spec, schema, version) must match the encoder.
func EncodeDataFile ¶
func EncodeDataFile(df iceberg.DataFile, spec iceberg.PartitionSpec, schema *iceberg.Schema, version int) ([]byte, error)
EncodeDataFile encodes a single DataFile for cross-process transport using the manifest-entry Avro encoding for the given partition spec, table schema and format version (1, 2, or 3). The wire format is the same one a manifest carries for this data file. The receiver MUST call DecodeDataFile with the matching (spec, schema, version) triple.
df must implement iceberg.AvroEntryMarshaler. The iceberg package's built-in DataFile implementation satisfies it; external implementations of iceberg.DataFile can opt in by implementing the marshaler interface themselves.
EncodeDataFile is non-mutating and safe to call concurrently with any other reader or encoder of the same DataFile, provided the underlying implementation honors that contract.
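The concurrency contract above can be illustrated with a fan-out over one shared, read-only value. This is a sketch only: fileMeta stands in for iceberg.DataFile and encode is a placeholder for a non-mutating encoder such as EncodeDataFile; neither is this package's code.

```go
package main

import (
	"fmt"
	"sync"
)

// fileMeta stands in for an iceberg.DataFile. It is only read, never written.
type fileMeta struct {
	Path        string
	RecordCount int64
}

// encode is a placeholder for a non-mutating encoder: it reads f and
// returns fresh bytes without touching shared state.
func encode(f *fileMeta) []byte {
	return []byte(fmt.Sprintf("%s:%d", f.Path, f.RecordCount))
}

// fanOut encodes the same shared value once per worker, concurrently.
// Each goroutine writes only to its own slot of the result slice, so no
// locking is needed beyond the WaitGroup.
func fanOut(f *fileMeta, workers int) [][]byte {
	out := make([][]byte, workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out[i] = encode(f)
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	shared := &fileMeta{Path: "data/a.parquet", RecordCount: 10}
	for _, b := range fanOut(shared, 4) {
		fmt.Println(string(b))
	}
}
```

The pattern is only safe if the stand-in, like a real DataFile implementation, does not mutate itself during reads; that is the proviso the contract above places on external implementations.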
v1 note: v1 manifest entries carry a non-nullable snapshot_id, which the iceberg implementation writes as 0. v1 bytes are not usable as a standalone manifest entry; they only round-trip via DecodeDataFile.
distinct_counts (field 111) is deprecated in the spec for every version (apache/iceberg#12182). EncodeDataFile drops the field for v1, v2, and v3 alike, so values populated on the source DataFile are not transported. Legacy manifests that already carry the field on the wire still decode correctly through DecodeDataFile. New DataFiles should not set distinct counts.
func EncodeFileScanTask ¶
func EncodeFileScanTask(task table.FileScanTask, spec iceberg.PartitionSpec, schema *iceberg.Schema, version int) ([]byte, error)
EncodeFileScanTask encodes a FileScanTask for cross-process transport. Each carried DataFile is encoded with EncodeDataFile and wrapped in a small record that also carries the scan range and v3 row lineage. The (spec, schema, version) triple must match what DecodeFileScanTask is given on the receiver.
All carried DataFiles (data, positional deletes, equality deletes, and deletion vectors) must share the supplied spec.ID(): each delete file's SpecID is validated, and a mismatch returns an error. After partition evolution, delete files may have been written under a different partition spec than the data file; the caller is responsible for grouping the FileScanTask's files by per-file spec ID and calling EncodeFileScanTask once per group.
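The grouping step the caller is responsible for can be sketched as follows. fileRef is a hypothetical stand-in that models only the spec ID of a data or delete file; it is not a type from iceberg-go. How each group is assembled back into a FileScanTask is not specified above, so the sketch stops at the grouping.

```go
package main

import (
	"fmt"
	"sort"
)

// fileRef stands in for a data or delete file carried by a scan task.
// Only the fields needed for grouping are modeled.
type fileRef struct {
	Path   string
	SpecID int
}

// groupBySpecID buckets files by the partition spec they were written
// under and returns the buckets with their spec IDs in ascending order,
// so callers can iterate deterministically.
func groupBySpecID(files []fileRef) (map[int][]fileRef, []int) {
	groups := make(map[int][]fileRef)
	for _, f := range files {
		groups[f.SpecID] = append(groups[f.SpecID], f)
	}
	ids := make([]int, 0, len(groups))
	for id := range groups {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return groups, ids
}

func main() {
	files := []fileRef{
		{Path: "data/a.parquet", SpecID: 0},
		{Path: "deletes/pos-1.parquet", SpecID: 0},
		{Path: "deletes/eq-1.parquet", SpecID: 1}, // written after partition evolution
	}
	groups, ids := groupBySpecID(files)
	for _, id := range ids {
		// A caller would build one task per group here and encode it
		// with the spec that has this ID.
		fmt.Printf("spec %d: %d file(s)\n", id, len(groups[id]))
	}
}
```

Iterating in sorted spec ID order keeps the encode calls deterministic, which makes failures easier to reproduce.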
Types ¶
This section is empty.