Documentation ¶
Overview ¶
Package deletionvector decodes Paimon deletion vector records.
A deletion vector encodes which rows in a Parquet data file have been logically deleted, as a compressed RoaringBitmap of row positions (0-based). The read pipeline uses this to filter out deleted rows before returning record batches to the caller.
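As a rough illustration of the filtering step, the sketch below keeps only the row positions that are absent from a set of deleted positions. The names positionSet, mapSet and liveRows are hypothetical and not part of this package. mapSet is a plain Go map standing in for a decoded Roaring bitmap, and the real read pipeline filters record batches rather than building position lists.

```go
package main

import "fmt"

// positionSet is the only behaviour the filter needs from a decoded
// deletion vector (hypothetical interface for this sketch).
type positionSet interface {
	Contains(pos uint32) bool
}

// mapSet is a stand-in for a decoded Roaring bitmap, used only in this demo.
type mapSet map[uint32]struct{}

func (s mapSet) Contains(pos uint32) bool {
	_, ok := s[pos]
	return ok
}

// liveRows returns the 0-based positions in [0, numRows) that are not deleted.
func liveRows(numRows uint32, deleted positionSet) []uint32 {
	live := make([]uint32, 0, numRows)
	for pos := uint32(0); pos < numRows; pos++ {
		if !deleted.Contains(pos) {
			live = append(live, pos)
		}
	}
	return live
}

func main() {
	deleted := mapSet{1: {}, 3: {}}
	fmt.Println(liveRows(5, deleted)) // prints [0 2 4]
}
```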
Index file layout ¶
Deletion vectors are stored in index files under <table>/index/. Each index file begins with a 1-byte version (0x01) followed by concatenated DV records. The byte offset and length of each record are stored in the index manifest.
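The layout above suggests a two-step read: check the version byte, then fetch one record by its manifest offset. The following is a minimal sketch of that idea, not this package's reader. readRecord and buildIndex are hypothetical helpers, the index file is simulated in memory, and the number of bytes to read is assumed to be supplied by the caller from the index manifest.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// indexVersion is the 1-byte version that starts every index file.
const indexVersion byte = 0x01

// readRecord (hypothetical) verifies the version byte and then reads length
// bytes starting at offset, where offset is relative to the start of the file.
func readRecord(r io.ReaderAt, offset int64, length int) ([]byte, error) {
	if offset < 1 || length < 0 {
		return nil, fmt.Errorf("invalid record range: offset %d, length %d", offset, length)
	}
	var version [1]byte
	if n, err := r.ReadAt(version[:], 0); n < 1 {
		return nil, fmt.Errorf("read index version: %w", err)
	}
	if version[0] != indexVersion {
		return nil, fmt.Errorf("unsupported index file version %d", version[0])
	}
	buf := make([]byte, length)
	// ReadAt guarantees a non-nil error whenever it returns fewer bytes
	// than requested, so a short count is the only failure to check.
	if n, err := r.ReadAt(buf, offset); n < len(buf) {
		return nil, fmt.Errorf("read record at offset %d: %w", offset, err)
	}
	return buf, nil
}

// buildIndex assembles an in-memory index file for the demo and returns the
// offset of each record, as an index manifest would store it.
func buildIndex(records ...[]byte) (io.ReaderAt, []int64) {
	file := []byte{indexVersion}
	offsets := make([]int64, 0, len(records))
	for _, rec := range records {
		offsets = append(offsets, int64(len(file)))
		file = append(file, rec...)
	}
	return bytes.NewReader(file), offsets
}

func main() {
	// Placeholder payloads; real records follow the DV record format.
	idx, offsets := buildIndex([]byte("first"), []byte("second"))
	rec, err := readRecord(idx, offsets[1], len("second"))
	fmt.Printf("%q %v\n", rec, err)
}
```

Taking an io.ReaderAt rather than an *os.File keeps the helper usable with local files, object-store range readers, and in-memory buffers alike.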
DV record format (BitmapDeletionVector32) ¶
    [4 bytes, big-endian int32]  total_length  ← byte count of (magic + roaring_bytes)
    [4 bytes, big-endian int32]  magic         ← 1581511376 (0x5E43F2D0)
    [total_length - 4 bytes]     RoaringBitmap32 in standard Roaring serialisation
    [4 bytes, big-endian int32]  CRC32         ← skipped on read
The offset stored in DeletionVectorMeta points to the first byte of total_length.
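To make the framing concrete, here is a hedged sketch that validates total_length and magic and returns the still-serialised bitmap bytes. splitRecord is a hypothetical helper, not this package's Decode. It stops before Roaring deserialisation, which would need a Roaring library, and it ignores the trailing CRC32 as described above.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// magicBitmap32 is the BitmapDeletionVector32 magic number from the format.
const magicBitmap32 int32 = 1581511376

// splitRecord (hypothetical) checks the framing of one DV record and returns
// the serialised Roaring bitmap bytes without decoding them.
func splitRecord(data []byte) ([]byte, error) {
	if len(data) < 8 {
		return nil, errors.New("dv record: too short for total_length and magic")
	}
	totalLength := int32(binary.BigEndian.Uint32(data[0:4]))
	if totalLength < 4 {
		return nil, fmt.Errorf("dv record: invalid total_length %d", totalLength)
	}
	if int64(len(data)) < 4+int64(totalLength) {
		return nil, fmt.Errorf("dv record: total_length %d exceeds the %d bytes available",
			totalLength, len(data)-4)
	}
	magic := int32(binary.BigEndian.Uint32(data[4:8]))
	if magic != magicBitmap32 {
		return nil, fmt.Errorf("dv record: unexpected magic %d", magic)
	}
	// total_length counts the magic, so the bitmap occupies the remaining
	// total_length-4 bytes. Anything after that (the CRC32) is not read.
	return data[8 : 4+int(totalLength)], nil
}

func main() {
	// Build a record around a 3-byte placeholder payload (not a real bitmap).
	payload := []byte{0xAA, 0xBB, 0xCC}
	rec := binary.BigEndian.AppendUint32(nil, uint32(4+len(payload)))
	rec = binary.BigEndian.AppendUint32(rec, uint32(magicBitmap32))
	rec = append(rec, payload...)
	rec = append(rec, 0, 0, 0, 0) // CRC32 placeholder, skipped on read

	body, err := splitRecord(rec)
	fmt.Println(len(body), err)
}
```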
Functions ¶
func Decode ¶
Decode decodes a BitmapDeletionVector32 record from data.
data must contain exactly the bytes for one DV record starting at its first byte (i.e. the 4-byte total_length field). Typically the caller reads DeletionVectorMeta.Length bytes from the index file at DeletionVectorMeta.Offset.
Returns ErrUnsupportedMagic if the magic number indicates a 64-bit bitmap (Bitmap64DeletionVector). In that case the caller should treat the file as having no deletions and log a warning.
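One way a caller might apply this advice is sketched below, under several assumptions. The Decode signature is not shown here, so decodeFunc and its []uint32 result are hypothetical stand-ins. ErrUnsupportedMagic is redeclared locally only so the sketch compiles on its own, and its message text is invented. deletedRows and errTruncated are likewise illustrative names.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// ErrUnsupportedMagic mirrors the package's exported type so this sketch is
// self-contained; real code would use the type from the deletionvector package.
type ErrUnsupportedMagic struct {
	Magic int32
}

// Error uses a message format assumed for this sketch.
func (e *ErrUnsupportedMagic) Error() string {
	return fmt.Sprintf("unsupported deletion vector magic %d", e.Magic)
}

// decodeFunc stands in for Decode. Returning deleted row positions as a
// slice is an assumption made for illustration.
type decodeFunc func(data []byte) ([]uint32, error)

// errTruncated is a sample failure that must not be swallowed.
var errTruncated = errors.New("truncated record")

// deletedRows (hypothetical) downgrades ErrUnsupportedMagic to "no deletions"
// with a warning, and propagates every other error unchanged.
func deletedRows(decode decodeFunc, data []byte) ([]uint32, error) {
	rows, err := decode(data)
	var unsupported *ErrUnsupportedMagic
	if errors.As(err, &unsupported) {
		log.Printf("deletion vector: unsupported magic %d, treating file as having no deletions",
			unsupported.Magic)
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	return rows, nil
}

func main() {
	// errors.As also matches when the error has been wrapped on the way up.
	bitmap64 := func([]byte) ([]uint32, error) {
		return nil, fmt.Errorf("decode: %w", &ErrUnsupportedMagic{Magic: 42}) // placeholder magic
	}
	rows, err := deletedRows(bitmap64, nil)
	fmt.Println(len(rows), err)

	truncated := func([]byte) ([]uint32, error) { return nil, errTruncated }
	_, err = deletedRows(truncated, nil)
	fmt.Println(err)
}
```

Using errors.As rather than a direct type assertion keeps the check working if an intermediate layer wraps the error with %w.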
Types ¶
type ErrUnsupportedMagic ¶
type ErrUnsupportedMagic struct {
Magic int32
}
ErrUnsupportedMagic is returned when the DV record uses an unsupported magic number (e.g. Bitmap64DeletionVector).
func (*ErrUnsupportedMagic) Error ¶
func (e *ErrUnsupportedMagic) Error() string