dv

package
v0.6.0-rc0
Published: May 18, 2026 License: Apache-2.0 Imports: 14 Imported by: 0

Documentation

Constants

const (
	// DVMagicNumber is the magic number for deletion vectors.
	// On the wire the spec bytes are D1 D3 39 64; read as a little-endian uint32 they equal 0x6439D3D1.
	DVMagicNumber uint32 = 0x6439D3D1
)
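
The relationship between the wire bytes and the constant can be checked with the standard library alone. The following is a minimal sketch; `dvMagic` and `magicFromWire` are illustrative names, not part of this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dvMagic mirrors the documented DVMagicNumber value (local copy for illustration).
const dvMagic uint32 = 0x6439D3D1

// magicFromWire decodes the first four bytes of b as a little-endian uint32.
// The caller must pass at least four bytes.
func magicFromWire(b []byte) uint32 {
	return binary.LittleEndian.Uint32(b)
}

func main() {
	wire := []byte{0xD1, 0xD3, 0x39, 0x64} // byte sequence as it appears in a blob
	fmt.Printf("%#x\n", magicFromWire(wire))    // 0x6439d3d1
	fmt.Println(magicFromWire(wire) == dvMagic) // true
}
```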

Variables

This section is empty.

Functions

This section is empty.

Types

type RoaringPositionBitmap

type RoaringPositionBitmap struct {
	// contains filtered or unexported fields
}

RoaringPositionBitmap supports 64-bit positions using a sparse map of 32-bit Roaring bitmaps. Positions are split into a 32-bit key (high bits) and 32-bit value (low bits).

Compatible with the Java Iceberg RoaringPositionBitmap serialization format.
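
To illustrate the key/value split described above, here is a toy sketch. It assumes nothing about the package's internals: `toyPositionSet` and `splitPosition` are hypothetical names, and a plain Go map stands in for each 32-bit Roaring bitmap.

```go
package main

import "fmt"

// splitPosition breaks a 64-bit position into the high 32 bits (the key)
// and the low 32 bits (the value stored under that key).
func splitPosition(pos uint64) (key, low uint32) {
	return uint32(pos >> 32), uint32(pos)
}

// toyPositionSet is a stand-in for illustration only: each key maps to a
// plain set of low values, where the real type holds a Roaring bitmap.
type toyPositionSet struct {
	buckets map[uint32]map[uint32]struct{}
}

func newToyPositionSet() *toyPositionSet {
	return &toyPositionSet{buckets: make(map[uint32]map[uint32]struct{})}
}

// Set marks pos, creating the bucket for its key on first use.
func (s *toyPositionSet) Set(pos uint64) {
	key, low := splitPosition(pos)
	b, ok := s.buckets[key]
	if !ok {
		b = make(map[uint32]struct{})
		s.buckets[key] = b
	}
	b[low] = struct{}{}
}

// Contains reports whether pos was set. Indexing a missing bucket is safe
// because reading from a nil map yields the zero value.
func (s *toyPositionSet) Contains(pos uint64) bool {
	key, low := splitPosition(pos)
	_, ok := s.buckets[key][low]
	return ok
}

// Cardinality sums the sizes of all buckets.
func (s *toyPositionSet) Cardinality() int64 {
	var n int64
	for _, b := range s.buckets {
		n += int64(len(b))
	}
	return n
}

func main() {
	s := newToyPositionSet()
	s.Set(7)
	s.Set(1<<32 + 7) // same low bits, different key
	s.Set(7)         // duplicate, does not change cardinality
	fmt.Println(s.Contains(1<<32+7), s.Contains(8), s.Cardinality()) // true false 2
}
```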

func DeserializeDV

func DeserializeDV(data []byte, expectedCardinality int64) (*RoaringPositionBitmap, error)

DeserializeDV parses a deletion vector blob and returns a bitmap of deleted positions.

The DV binary format is:

  • Length (4 bytes, big-endian): size of magic + bitmap data, excluding CRC-32
  • Magic (4 bytes, little-endian): must be 0x6439D3D1
  • Bitmap (variable): roaring bitmap in Iceberg portable format
  • CRC-32 (4 bytes, big-endian): checksum over magic + bitmap

If expectedCardinality >= 0, the bitmap's cardinality is validated against it.
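
The envelope described above can be sketched with the standard library. This is not the package's implementation: `buildDVFrame` and `splitDVFrame` are hypothetical helpers, the bitmap bytes are treated as opaque, the CRC-32 is assumed to use the IEEE polynomial, and the blob is assumed to hold exactly one frame with no trailing bytes.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const dvMagic uint32 = 0x6439D3D1

// buildDVFrame wraps opaque bitmap bytes as: length (BE) | magic (LE) | bitmap | CRC-32 (BE).
func buildDVFrame(bitmap []byte) []byte {
	body := make([]byte, 4, 4+len(bitmap))
	binary.LittleEndian.PutUint32(body, dvMagic)
	body = append(body, bitmap...)

	out := make([]byte, 4, 4+len(body)+4)
	binary.BigEndian.PutUint32(out, uint32(len(body))) // length excludes the CRC
	out = append(out, body...)
	// Assumption: IEEE polynomial, checksum over magic + bitmap.
	return binary.BigEndian.AppendUint32(out, crc32.ChecksumIEEE(body))
}

// splitDVFrame checks length, magic, and CRC, then returns the bitmap bytes.
func splitDVFrame(data []byte) ([]byte, error) {
	if len(data) < 12 { // 4 length + 4 magic + 4 CRC
		return nil, errors.New("dv frame too short")
	}
	length := binary.BigEndian.Uint32(data[:4])
	if uint64(length) != uint64(len(data)-8) {
		return nil, errors.New("dv length field does not match blob size")
	}
	body := data[4 : len(data)-4]
	if binary.LittleEndian.Uint32(body[:4]) != dvMagic {
		return nil, errors.New("bad dv magic")
	}
	if crc32.ChecksumIEEE(body) != binary.BigEndian.Uint32(data[len(data)-4:]) {
		return nil, errors.New("dv crc mismatch")
	}
	return body[4:], nil
}

func main() {
	frame := buildDVFrame([]byte("opaque"))
	bitmap, err := splitDVFrame(frame)
	fmt.Println(string(bitmap), err) // opaque <nil>

	frame[9] ^= 0xFF // corrupt one bitmap byte
	_, err = splitDVFrame(frame)
	fmt.Println(err) // dv crc mismatch
}
```

A real caller would hand the returned bytes to a Roaring decoder and then compare the decoded cardinality with the expected value.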

func DeserializeRoaringPositionBitmap

func DeserializeRoaringPositionBitmap(data []byte) (*RoaringPositionBitmap, error)

DeserializeRoaringPositionBitmap reads a bitmap from the Iceberg portable format. Format: [count] {[key_1][bitmap_1]} ... {[key_n][bitmap_n]}

func NewRoaringPositionBitmap

func NewRoaringPositionBitmap() *RoaringPositionBitmap

NewRoaringPositionBitmap creates an empty bitmap.

func ReadDV

func ReadDV(fs iceio.IO, dvFile iceberg.DataFile) (*RoaringPositionBitmap, error)

ReadDV reads a deletion vector from a puffin file using the manifest entry metadata. ContentOffset and ContentSizeInBytes must be set on the DataFile (required by v3 spec).

When the puffin blob carries a `cardinality` property (spec-mandated for deletion-vector-v1), its value is parsed and used to validate the decoded bitmap. This catches truncated or partially overwritten blobs whose CRC still validates over the bytes that are present.

Java's BitmapPositionDeleteIndex validates against the manifest entry's `record_count` field rather than the puffin blob property. Java's writer always sets the two to the same value, so ReadDV's behavior agrees with Java for spec-conformant tables. A future change should cross-validate both sources when they are independently available (tracked separately).

Blobs missing the spec-required cardinality property are accepted with a slog warning rather than rejected; strict enforcement is deferred until the Go DV writer guarantees the property is always present. The CRC-32 check in DeserializeDV still applies, so a missing property means degraded integrity checking, not absent integrity checking.
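
The lenient handling of the `cardinality` property might look roughly like the sketch below. `cardinalityFromProps` is a hypothetical helper, not this package's API. It assumes blob properties arrive as a `map[string]string`, that a negative return value means "skip validation" (inferred from the `expectedCardinality >= 0` rule on DeserializeDV), and that an unparsable value is an error, which the documentation above does not specify.

```go
package main

import (
	"fmt"
	"log/slog"
	"strconv"
)

// cardinalityFromProps returns the expected cardinality from blob properties.
// A missing property yields -1 plus a warning; a malformed one yields an error.
func cardinalityFromProps(props map[string]string) (int64, error) {
	raw, ok := props["cardinality"]
	if !ok {
		// Lenient path: warn and let the caller skip the cardinality check.
		slog.Warn("deletion vector blob has no cardinality property; skipping cardinality check")
		return -1, nil
	}
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid cardinality property %q: %w", raw, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("negative cardinality property %d", n)
	}
	return n, nil
}

func main() {
	n, err := cardinalityFromProps(map[string]string{"cardinality": "42"})
	fmt.Println(n, err) // 42 <nil>

	n, err = cardinalityFromProps(map[string]string{})
	fmt.Println(n, err) // -1 <nil> (a warning is also logged)
}
```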

func (*RoaringPositionBitmap) Cardinality

func (b *RoaringPositionBitmap) Cardinality() int64

Cardinality returns the total number of set positions.

func (*RoaringPositionBitmap) Contains

func (b *RoaringPositionBitmap) Contains(pos uint64) bool

Contains checks if a position is set.

func (*RoaringPositionBitmap) IsEmpty

func (b *RoaringPositionBitmap) IsEmpty() bool

IsEmpty returns true if no positions are set.

func (*RoaringPositionBitmap) Serialize

func (b *RoaringPositionBitmap) Serialize(w io.Writer) error

Serialize writes the bitmap in the Iceberg portable format (little-endian):

  • bitmap count (8 bytes, LE): number of non-empty bitmaps
  • for each bitmap in ascending key order: key (4 bytes, LE) + roaring portable data

Only non-empty bitmaps are written, matching Java Iceberg behavior.
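
The outer layout can be sketched without a Roaring library by treating each bitmap's portable bytes as opaque. `keyedPayload`, `writePortableLayout`, and `portableLayoutBytes` are hypothetical names; the sketch assumes the caller has already dropped empty bitmaps and serialized the rest.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"sort"
)

// keyedPayload pairs a 32-bit key with already-serialized roaring bytes (opaque here).
type keyedPayload struct {
	key     uint32
	payload []byte
}

// writePortableLayout writes: count (8 bytes, LE), then key (4 bytes, LE) + payload
// for each entry in ascending key order.
func writePortableLayout(w io.Writer, entries []keyedPayload) error {
	sorted := append([]keyedPayload(nil), entries...) // copy so the input is not reordered
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].key < sorted[j].key })

	var count [8]byte
	binary.LittleEndian.PutUint64(count[:], uint64(len(sorted)))
	if _, err := w.Write(count[:]); err != nil {
		return err
	}
	for _, e := range sorted {
		var key [4]byte
		binary.LittleEndian.PutUint32(key[:], e.key)
		if _, err := w.Write(key[:]); err != nil {
			return err
		}
		if _, err := w.Write(e.payload); err != nil {
			return err
		}
	}
	return nil
}

// portableLayoutBytes is a convenience wrapper that collects the output in memory.
func portableLayoutBytes(entries []keyedPayload) ([]byte, error) {
	var buf bytes.Buffer
	if err := writePortableLayout(&buf, entries); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := portableLayoutBytes([]keyedPayload{
		{key: 1, payload: []byte{0xAA}}, // placeholder bytes, not real roaring data
		{key: 0, payload: []byte{0xBB}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", out)
	// 02 00 00 00 00 00 00 00 00 00 00 00 bb 01 00 00 00 aa
}
```

Because the roaring payloads carry no length prefix in this layout, a reader has to parse each bitmap to find where the next key begins.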

func (*RoaringPositionBitmap) Set

func (b *RoaringPositionBitmap) Set(pos uint64)

Set marks a position in the bitmap.
