Documentation
Index ¶
- Constants
- type RoaringPositionBitmap
- func DeserializeDV(data []byte, expectedCardinality int64) (*RoaringPositionBitmap, error)
- func DeserializeRoaringPositionBitmap(data []byte) (*RoaringPositionBitmap, error)
- func NewRoaringPositionBitmap() *RoaringPositionBitmap
- func ReadDV(fs iceio.IO, dvFile iceberg.DataFile) (*RoaringPositionBitmap, error)
Constants ¶
const (
	// DVMagicNumber is the magic number for deletion vectors.
	// Spec bytes: D1 D3 39 64 (big-endian) = 0x6439D3D1 (little-endian uint32)
	DVMagicNumber uint32 = 0x6439D3D1
)
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type RoaringPositionBitmap ¶
type RoaringPositionBitmap struct {
// contains filtered or unexported fields
}
RoaringPositionBitmap supports 64-bit positions using a sparse map of 32-bit Roaring bitmaps. Positions are split into a 32-bit key (high bits) and 32-bit value (low bits).
Compatible with the Java Iceberg RoaringPositionBitmap serialization format.
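The key/value split described above can be sketched in a few lines. This is an illustration only: `splitPosition` and `joinPosition` are hypothetical helper names, not part of this package's API, and the package's internals may differ.

```go
package main

import "fmt"

// splitPosition breaks a 64-bit position into a 32-bit key (high bits)
// and a 32-bit value (low bits). Hypothetical helper for illustration.
func splitPosition(pos uint64) (key, low uint32) {
	return uint32(pos >> 32), uint32(pos)
}

// joinPosition is the inverse of splitPosition.
func joinPosition(key, low uint32) uint64 {
	return uint64(key)<<32 | uint64(low)
}

func main() {
	pos := uint64(5)<<32 | 42
	key, low := splitPosition(pos)
	// prints: 5 42 true
	fmt.Println(key, low, joinPosition(key, low) == pos)
}
```

Positions that share the same high 32 bits land in the same 32-bit Roaring bitmap, which is why the outer structure can stay a sparse map.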
func DeserializeDV ¶
func DeserializeDV(data []byte, expectedCardinality int64) (*RoaringPositionBitmap, error)
DeserializeDV parses a deletion vector blob and returns a bitmap of deleted positions.
The DV binary format is:
- Length (4 bytes, big-endian): size of magic + bitmap data, excluding CRC-32
- Magic (4 bytes, little-endian): must be 0x6439D3D1
- Bitmap (variable): roaring bitmap in Iceberg portable format
- CRC-32 (4 bytes, big-endian): checksum over magic + bitmap
If expectedCardinality >= 0, the bitmap's cardinality is validated against it.
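The envelope listed above can be sketched with the standard library alone. The sketch treats the bitmap payload as opaque bytes and assumes the CRC-32 uses the IEEE polynomial (the one `java.util.zip.CRC32` computes); `frameDV` and `unframeDV` are hypothetical names, not this package's functions.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const dvMagic uint32 = 0x6439D3D1

// frameDV wraps an already-serialized bitmap payload in the DV envelope:
// length (BE) | magic (LE) | bitmap | CRC-32 (BE).
func frameDV(bitmap []byte) []byte {
	body := make([]byte, 4, 4+len(bitmap))
	binary.LittleEndian.PutUint32(body, dvMagic)
	body = append(body, bitmap...)

	out := make([]byte, 4, 4+len(body)+4)
	binary.BigEndian.PutUint32(out, uint32(len(body))) // excludes the CRC
	out = append(out, body...)

	var crc [4]byte
	binary.BigEndian.PutUint32(crc[:], crc32.ChecksumIEEE(body)) // assumed IEEE
	return append(out, crc[:]...)
}

// unframeDV validates the envelope and returns the raw bitmap payload.
func unframeDV(data []byte) ([]byte, error) {
	if len(data) < 12 {
		return nil, errors.New("dv blob too short")
	}
	length := binary.BigEndian.Uint32(data[:4])
	if uint64(length) != uint64(len(data)-8) {
		return nil, errors.New("length field does not match blob size")
	}
	body := data[4 : 4+length]
	if binary.LittleEndian.Uint32(body[:4]) != dvMagic {
		return nil, errors.New("bad magic number")
	}
	if binary.BigEndian.Uint32(data[4+length:]) != crc32.ChecksumIEEE(body) {
		return nil, errors.New("CRC-32 mismatch")
	}
	return body[4:], nil
}

func main() {
	blob := frameDV([]byte{0xAA, 0xBB})
	// prints: 00 00 00 06 D1 D3 39 64
	fmt.Printf("% X\n", blob[:8])

	payload, err := unframeDV(blob)
	fmt.Println(payload, err)

	blob[9] ^= 0xFF // corrupt one payload byte
	_, err = unframeDV(blob)
	fmt.Println(err)
}
```

Note that the length field is big-endian while the magic number is little-endian, so the two must be decoded with different byte orders.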
func DeserializeRoaringPositionBitmap ¶
func DeserializeRoaringPositionBitmap(data []byte) (*RoaringPositionBitmap, error)
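DeserializeRoaringPositionBitmap reads a bitmap from the Iceberg portable format. Format: [count] {[key_1][bitmap_1]} ... {[key_n][bitmap_n]}, where count is an 8-byte little-endian integer and each key is a 4-byte little-endian integer followed by that bitmap's roaring portable data.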
func NewRoaringPositionBitmap ¶
func NewRoaringPositionBitmap() *RoaringPositionBitmap
NewRoaringPositionBitmap creates an empty bitmap.
func ReadDV ¶
func ReadDV(fs iceio.IO, dvFile iceberg.DataFile) (*RoaringPositionBitmap, error)
ReadDV reads a deletion vector from a puffin file using the manifest entry metadata. ContentOffset and ContentSizeInBytes must be set on the DataFile (required by v3 spec).
When the puffin blob carries a `cardinality` property (spec-mandated for deletion-vector-v1), its value is parsed and used to validate the decoded bitmap — this catches truncated or partially-overwritten blobs whose CRC still validates over the bytes that are present.
Java's BitmapPositionDeleteIndex validates against the manifest entry's `record_count` field rather than the puffin blob property. Java's writer always sets the two to the same value, so ReadDV's behavior agrees with Java for spec-conformant tables. A future change should cross-validate both sources when they are independently available (tracked separately).
Blobs missing the spec-required cardinality property are accepted with a slog warning rather than rejected; strict enforcement is deferred until the Go DV writer guarantees the property is always present. The CRC-32 check in DeserializeDV still applies, so a missing property weakens integrity checking but does not remove it.
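The property handling described above might look roughly like the sketch below. It is not ReadDV's actual code: `expectedCardinality` is a hypothetical helper, and `props` is an assumed stand-in for the puffin blob's property map. A missing property yields -1, which matches DeserializeDV's rule that only values >= 0 are validated.

```go
package main

import (
	"fmt"
	"log/slog"
	"strconv"
)

// expectedCardinality derives the value to pass as DeserializeDV's
// expectedCardinality argument from a blob's properties.
// props is a hypothetical stand-in for the puffin blob property map.
func expectedCardinality(props map[string]string) (int64, error) {
	raw, ok := props["cardinality"]
	if !ok {
		// Lenient path: warn and skip validation instead of rejecting.
		slog.Warn("deletion vector blob has no cardinality property; skipping cardinality validation")
		return -1, nil
	}
	n, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid cardinality %q: %w", raw, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("negative cardinality %d", n)
	}
	return n, nil
}

func main() {
	fmt.Println(expectedCardinality(map[string]string{"cardinality": "3"}))
	fmt.Println(expectedCardinality(map[string]string{}))
	fmt.Println(expectedCardinality(map[string]string{"cardinality": "abc"}))
}
```

A present but malformed property is treated as an error in this sketch, since only the absent case is described as tolerated.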
func (*RoaringPositionBitmap) Cardinality ¶
func (b *RoaringPositionBitmap) Cardinality() int64
Cardinality returns the total number of set positions.
func (*RoaringPositionBitmap) Contains ¶
func (b *RoaringPositionBitmap) Contains(pos uint64) bool
Contains checks if a position is set.
func (*RoaringPositionBitmap) IsEmpty ¶
func (b *RoaringPositionBitmap) IsEmpty() bool
IsEmpty returns true if no positions are set.
func (*RoaringPositionBitmap) Serialize ¶
func (b *RoaringPositionBitmap) Serialize(w io.Writer) error
Serialize writes in the Iceberg portable format (little-endian):
- bitmap count (8 bytes, LE): number of non-empty bitmaps
- for each bitmap in ascending key order: key (4 bytes, LE) + roaring portable data
Only non-empty bitmaps are written, matching Java Iceberg behavior.
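The outer layout can be sketched without a Roaring library by treating each per-key payload as opaque bytes. In this sketch `writePortable` and `encodePortable` are hypothetical names, and a zero-length payload stands in for an empty bitmap (a real empty Roaring bitmap would be detected by its cardinality, not its byte length).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"sort"
)

// writePortable writes the outer framing: an 8-byte LE count, then for each
// non-empty entry in ascending key order a 4-byte LE key and its payload.
// Payloads are assumed to be already-serialized roaring portable data.
func writePortable(w io.Writer, payloads map[uint32][]byte) error {
	keys := make([]uint32, 0, len(payloads))
	for k, p := range payloads {
		if len(p) > 0 { // stand-in for "bitmap is non-empty"
			keys = append(keys, k)
		}
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	if err := binary.Write(w, binary.LittleEndian, uint64(len(keys))); err != nil {
		return err
	}
	for _, k := range keys {
		if err := binary.Write(w, binary.LittleEndian, k); err != nil {
			return err
		}
		if _, err := w.Write(payloads[k]); err != nil {
			return err
		}
	}
	return nil
}

// encodePortable is a convenience wrapper that returns the framed bytes.
func encodePortable(payloads map[uint32][]byte) ([]byte, error) {
	var buf bytes.Buffer
	err := writePortable(&buf, payloads)
	return buf.Bytes(), err
}

func main() {
	out, err := encodePortable(map[uint32][]byte{
		7: {0xBB},
		2: {0xAA},
		5: nil, // empty: skipped and not counted
	})
	fmt.Printf("% X %v\n", out, err)
}
```

Sorting the keys before writing matters because Go map iteration order is randomized, while the format requires ascending key order.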
func (*RoaringPositionBitmap) Set ¶
func (b *RoaringPositionBitmap) Set(pos uint64)
Set marks a position in the bitmap.