dv

package
v0.6.0-rc1
Warning

This package is not in the latest version of its module.

Published: May 21, 2026 License: Apache-2.0 Imports: 15 Imported by: 0

Documentation

Index

Constants

const (
	// DVMagicNumber is the magic number for deletion vectors.
	// Spec bytes as they appear on the wire: D1 D3 39 64. Read as a
	// little-endian uint32, they give 0x6439D3D1.
	DVMagicNumber uint32 = 0x6439D3D1
)

Variables

This section is empty.

Functions

func SerializeDV

func SerializeDV(bitmap *RoaringPositionBitmap) ([]byte, error)

SerializeDV produces the spec-format DV binary envelope from a bitmap:

  • Length (4 bytes, big-endian): size of magic + bitmap data, excluding CRC-32
  • Magic (4 bytes, little-endian): DVMagicNumber
  • Bitmap (variable): roaring bitmap in Iceberg portable format
  • CRC-32 (4 bytes, big-endian): checksum over magic + bitmap

Types

type DVWriter

type DVWriter struct {
	// contains filtered or unexported fields
}

DVWriter accumulates deletion positions per data file and flushes them as a single Puffin file containing one deletion-vector-v1 blob per data file. The returned DataFile entries are ready for RowDelta.AddDeletes().

func NewDVWriter

func NewDVWriter(fs iceio.WriteFileIO) *DVWriter

NewDVWriter creates a DVWriter backed by the given writable filesystem.

func (*DVWriter) Add

func (w *DVWriter) Add(dataFilePath string, positions []int64)

Add accumulates positions to delete for a given data file. Positions are deduplicated via the underlying roaring bitmap.

func (*DVWriter) Flush

func (w *DVWriter) Flush(_ context.Context, location string) ([]iceberg.DataFile, error)

Flush writes one Puffin file containing one blob per data file, and returns manifest entries ready for RowDelta.AddDeletes().

The location parameter is the full path (including filename) for the Puffin file to create. The caller is responsible for generating a unique path within the table's metadata directory.

type RoaringPositionBitmap

type RoaringPositionBitmap struct {
	// contains filtered or unexported fields
}

RoaringPositionBitmap supports 64-bit positions using a sparse map of 32-bit Roaring bitmaps. Positions are split into a 32-bit key (high bits) and 32-bit value (low bits).
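The key/value split described above amounts to a shift and a truncation. A minimal sketch, with splitPosition and joinPosition as hypothetical names that are not part of the package:

```go
package main

import "fmt"

// splitPosition breaks a 64-bit position into the 32-bit key (high bits)
// that selects a bitmap and the 32-bit value (low bits) stored inside it.
func splitPosition(pos uint64) (key, low uint32) {
	return uint32(pos >> 32), uint32(pos)
}

// joinPosition is the inverse of splitPosition.
func joinPosition(key, low uint32) uint64 {
	return uint64(key)<<32 | uint64(low)
}

func main() {
	key, low := splitPosition(1<<32 + 5)
	fmt.Println(key, low)               // prints "1 5"
	fmt.Println(joinPosition(key, low)) // prints "4294967301"
}
```

Positions below 2^32 all share key 0, so a table whose data files hold fewer than about 4.29 billion rows each needs only one 32-bit bitmap per file.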

Compatible with the Java Iceberg RoaringPositionBitmap serialization format.

func DeserializeDV

func DeserializeDV(data []byte, expectedCardinality int64) (*RoaringPositionBitmap, error)

DeserializeDV parses a deletion vector blob and returns a bitmap of deleted positions.

The DV binary format is:

  • Length (4 bytes, big-endian): size of magic + bitmap data, excluding CRC-32
  • Magic (4 bytes, little-endian): must be 0x6439D3D1
  • Bitmap (variable): roaring bitmap in Iceberg portable format
  • CRC-32 (4 bytes, big-endian): checksum over magic + bitmap

If expectedCardinality >= 0, the bitmap's cardinality is validated against it.

func DeserializeRoaringPositionBitmap

func DeserializeRoaringPositionBitmap(data []byte) (*RoaringPositionBitmap, error)

DeserializeRoaringPositionBitmap reads a bitmap from the Iceberg portable format. Format: [count] {[key_1][bitmap_1]} ... {[key_n][bitmap_n]}

func NewRoaringPositionBitmap

func NewRoaringPositionBitmap() *RoaringPositionBitmap

NewRoaringPositionBitmap creates an empty bitmap.

func ReadDV

func ReadDV(fs iceio.IO, dvFile iceberg.DataFile) (*RoaringPositionBitmap, error)

ReadDV reads a deletion vector from a Puffin file using the manifest entry metadata. ContentOffset and ContentSizeInBytes must be set on the DataFile (required by the v3 spec).

When the Puffin blob carries a `cardinality` property (spec-mandated for deletion-vector-v1), its value is parsed and used to validate the decoded bitmap. This catches truncated or partially overwritten blobs whose CRC still validates over the bytes that are present.

Java's BitmapPositionDeleteIndex validates against the manifest entry's `record_count` field rather than the Puffin blob property. Java's writer always sets the two to the same value, so ReadDV agrees with Java for spec-conformant tables. A future change should cross-validate both sources when they are independently available (tracked separately).

Blobs missing the spec-required cardinality property are accepted with a slog warning rather than rejected; strict enforcement is deferred until the Go DV writer guarantees the property is always present. The CRC-32 check in DeserializeDV still applies, so a missing property weakens the integrity check but does not remove it.
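The lenient cardinality policy described for ReadDV can be sketched as a small standalone check. This is an illustration only: checkCardinality is a hypothetical helper, the blob properties are modeled as a plain map[string]string, and a boolean return stands in for the slog warning that ReadDV emits when the property is missing.

```go
package main

import (
	"fmt"
	"strconv"
)

// checkCardinality compares the blob's "cardinality" property, when present,
// with the cardinality of the decoded bitmap. The boolean reports whether a
// comparison was actually made; false with a nil error means the property
// was missing and the blob is accepted anyway.
// Hypothetical helper for illustration; not part of the package.
func checkCardinality(props map[string]string, decoded int64) (bool, error) {
	raw, ok := props["cardinality"]
	if !ok {
		return false, nil // lenient: accept, caller may log a warning
	}
	want, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return false, fmt.Errorf("dv: invalid cardinality property %q: %w", raw, err)
	}
	if want != decoded {
		return true, fmt.Errorf("dv: cardinality mismatch: property says %d, bitmap has %d", want, decoded)
	}
	return true, nil
}

func main() {
	fmt.Println(checkCardinality(map[string]string{"cardinality": "3"}, 3)) // prints "true <nil>"
	fmt.Println(checkCardinality(map[string]string{}, 3))                   // prints "false <nil>"
}
```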

func (*RoaringPositionBitmap) Cardinality

func (b *RoaringPositionBitmap) Cardinality() int64

Cardinality returns the total number of set positions.

func (*RoaringPositionBitmap) Contains

func (b *RoaringPositionBitmap) Contains(pos uint64) bool

Contains checks if a position is set.

func (*RoaringPositionBitmap) IsEmpty

func (b *RoaringPositionBitmap) IsEmpty() bool

IsEmpty returns true if no positions are set.

func (*RoaringPositionBitmap) Serialize

func (b *RoaringPositionBitmap) Serialize(w io.Writer) error

Serialize writes the bitmap in the Iceberg portable format (little-endian):

  • bitmap count (8 bytes, LE): number of non-empty bitmaps
  • for each bitmap in ascending key order: key (4 bytes, LE) + roaring portable data

Only non-empty bitmaps are written, matching Java Iceberg behavior.

func (*RoaringPositionBitmap) Set

func (b *RoaringPositionBitmap) Set(pos uint64)

Set marks a position in the bitmap.
