compaction

package
v0.6.0-rc2
Published: May 21, 2026 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Overview

Package compaction provides bin-pack compaction planning for Iceberg tables.

Constants

const (
	// DefaultMinInputFiles is the default minimum number of files per group.
	DefaultMinInputFiles uint = 5

	// DefaultPackingLookback is the default packing lookback.
	DefaultPackingLookback uint = 128
)

Variables

This section is empty.

Functions

func CollectDeadEqualityDeletes

func CollectDeadEqualityDeletes(
	ctx context.Context,
	fs iceio.IO,
	snap *table.Snapshot,
	rewrittenPaths map[string]struct{},
) ([]iceberg.DataFile, error)

CollectDeadEqualityDeletes walks the given snapshot's manifests and returns the equality delete files that are provably dead after a rewrite removes the data files in rewrittenPaths.

"Dead" means that the eq-delete cannot apply to any surviving data file in the snapshot (every data file except those in rewrittenPaths) under the v2 reader predicate (see DecideDeadEqualityDeletes for the exact rule).

rewrittenPaths is the union of every old data file path being replaced by a planned rewrite (across all groups). The caller typically computes this from a compaction plan.

The returned files are safe to remove in the same commit that stages the rewrite. No concurrent commit can make them live again, because (1) any concurrent eq-delete is rejected by the rewrite-validator, and (2) any concurrent data file gets a fresh sequence number strictly greater than that of every preexisting eq-delete, so it cannot satisfy E.seq > D.seq for any preexisting E.

Two-pass design: the first pass walks delete manifests only to discover candidate eq-deletes; if there are none, the more expensive data-manifest walk is skipped.
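
As a minimal sketch of how a caller might build rewrittenPaths, the snippet below takes the union of data file paths across all planned groups. The `group` type and its `paths` field are hypothetical stand-ins; in the real package the paths would come from the FileScanTasks in each Group.

```go
package main

import "fmt"

// group is a hypothetical stand-in for a planned rewrite group.
// The documented Group type carries FileScanTasks rather than raw paths.
type group struct {
	paths []string
}

// rewrittenPathSet returns the union of every data file path across all
// groups, in the map[string]struct{} shape that rewrittenPaths expects.
func rewrittenPathSet(groups []group) map[string]struct{} {
	set := make(map[string]struct{})
	for _, g := range groups {
		for _, p := range g.paths {
			set[p] = struct{}{}
		}
	}
	return set
}

func main() {
	groups := []group{
		{paths: []string{"data/a.parquet", "data/b.parquet"}},
		{paths: []string{"data/c.parquet"}},
	}
	fmt.Println(len(rewrittenPathSet(groups))) // prints 3
}
```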

func DecideDeadEqualityDeletes

func DecideDeadEqualityDeletes(survey *SurvivorSurvey, candidates []iceberg.ManifestEntry) []iceberg.DataFile

DecideDeadEqualityDeletes is the pure spec-correctness predicate for equality-delete cleanup during compaction. Given a survey of the sequence numbers of surviving data files (already excluding the rewrite set) and a list of equality-delete manifest entries, it returns the eq-delete files that cannot apply to any surviving data file.

The rule is identical to scanner.matchEqualityDeletesToData (the reader-side filter):

E applies to D iff E.seq > D.seq AND (
    len(E.partition) == 0 ||
    len(D.partition) == 0 ||
    partitionsMatch(E.partition, D.partition)
)

E is dead iff no applicable surviving D has D.seq < E.seq — equivalently, the applicable min-seq is >= E.seq.

SpecID is intentionally NOT part of the predicate. The iceberg-go reader does not consult it; if the executor used a stricter predicate, it could drop eq-deletes that the reader still applies, causing silent data loss under partition-spec evolution.

Defensive: candidates with a sequence number < 0 (the sentinel for unset) are skipped and therefore preserved; keeping a file whose sequence number is unknown is safer than dropping a delete that may still apply.

Results are deduplicated by file path, because the same eq-delete file may appear in multiple manifest entries after manifest merging.
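
The rule can be illustrated with a self-contained sketch. This is not the package's implementation: the `survey` and `candidate` types are simplified, hypothetical stand-ins, and the partition tuple is assumed to be pre-encoded as a string key (empty for an empty partition).

```go
package main

import (
	"fmt"
	"math"
)

// survey mirrors the two documented SurvivorSurvey fields.
type survey struct {
	emptyPartMinSeq int64
	partMinSeq      map[string]int64
}

// candidate is a hypothetical, simplified equality-delete entry.
// partKey is "" when the eq-delete has an empty partition tuple.
type candidate struct {
	path    string
	partKey string
	seq     int64
}

// deadEqualityDeletes returns the paths of eq-deletes for which no applicable
// survivor has seq < E.seq, i.e. the applicable min-seq is >= E.seq.
func deadEqualityDeletes(s survey, cands []candidate) []string {
	seen := make(map[string]struct{})
	var dead []string
	for _, c := range cands {
		if c.seq < 0 {
			continue // unset sequence number: preserve the file
		}
		if _, dup := seen[c.path]; dup {
			continue // same file listed in more than one manifest entry
		}
		seen[c.path] = struct{}{}

		// Unpartitioned survivors are applicable to every eq-delete.
		minSeq := s.emptyPartMinSeq
		if c.partKey == "" {
			// An eq-delete with an empty partition applies to every data file.
			for _, v := range s.partMinSeq {
				if v < minSeq {
					minSeq = v
				}
			}
		} else if v, ok := s.partMinSeq[c.partKey]; ok && v < minSeq {
			minSeq = v
		}
		if minSeq >= c.seq {
			dead = append(dead, c.path)
		}
	}
	return dead
}

func main() {
	s := survey{
		emptyPartMinSeq: math.MaxInt64, // no unpartitioned survivors
		partMinSeq:      map[string]int64{"p=1": 5, "p=2": 9},
	}
	cands := []candidate{
		{"eq-1", "p=1", 5},   // dead: survivor seq 5 is not < 5
		{"eq-2", "p=1", 6},   // alive: survivor seq 5 < 6
		{"eq-3", "p=3", 100}, // dead: no survivor in p=3
		{"eq-4", "", 9},      // alive: applies everywhere, min seq 5 < 9
		{"eq-1", "p=1", 5},   // duplicate entry, ignored
		{"eq-5", "p=2", -1},  // unset seq, preserved
	}
	fmt.Println(deadEqualityDeletes(s, cands)) // prints [eq-1 eq-3]
}
```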

Types

type Config

type Config struct {
	// TargetFileSizeBytes is the desired output file size.
	// Default: table.WriteTargetFileSizeBytesDefault (512 MB).
	TargetFileSizeBytes int64

	// MinFileSizeBytes is the lower bound for a file to be considered "optimal".
	// Files smaller than this are candidates for compaction.
	// Default: 75% of TargetFileSizeBytes.
	MinFileSizeBytes int64

	// MaxFileSizeBytes is the upper bound. Files larger than this are never rewritten
	// (unless their delete file count reaches DeleteFileThreshold).
	// Default: 180% of TargetFileSizeBytes.
	MaxFileSizeBytes int64

	// MinInputFiles is the minimum number of files in a group to justify rewriting.
	// Groups with fewer files are dropped from the plan.
	// Default: DefaultMinInputFiles.
	MinInputFiles uint

	// DeleteFileThreshold is the minimum number of delete files associated with
	// a data file to force it into compaction regardless of file size.
	// Default: 5.
	DeleteFileThreshold int

	// PackingLookback controls how many open bins the packer considers
	// before evicting. Higher values produce better packing at the cost
	// of memory. Default: DefaultPackingLookback.
	PackingLookback uint

	// PreserveDeadEqualityDeletes, when true, retains equality delete
	// files that are provably dead after the rewrite. The cleanup
	// predicate matches the v2 reader (see scanner.go
	// matchEqualityDeletesToData and DecideDeadEqualityDeletes): an
	// eq-delete is dead iff no surviving applicable data file has
	// seq < eq-delete.seq, where "applicable" means same partition
	// tuple OR either side has empty partition. SpecID is NOT part of
	// the predicate.
	//
	// Zero value (false) is the recommended default: dead eq-deletes are
	// expunged during the rewrite commit, which keeps manifest fanout
	// bounded under sustained CDC workloads where eq-deletes accumulate
	// one-per-snapshot. Set to true only if a downstream consumer
	// depends on the historical eq-delete files surviving in the live
	// snapshot's manifests (rare).
	//
	// Honored only in atomic mode (RewriteDataFilesOptions.PartialProgress=false).
	// The CLI / library caller is responsible for translating this flag
	// into the snapshot walk + RewriteDataFilesOptions.ExtraDeleteFilesToRemove
	// — see CollectDeadEqualityDeletes.
	PreserveDeadEqualityDeletes bool
}

Config holds tunable thresholds for bin-pack compaction.

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns a Config with production defaults.
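
To make the documented defaults concrete, here is a sketch that derives them and classifies a file. The `config` type and `isCandidate` helper are hypothetical stand-ins; the sketch assumes that "512 MB" means 512 MiB, that percentages use integer arithmetic, and that the delete threshold is inclusive (>=). None of these details are confirmed by the text above.

```go
package main

import "fmt"

// config is a local stand-in for the documented Config thresholds.
type config struct {
	TargetFileSizeBytes int64
	MinFileSizeBytes    int64
	MaxFileSizeBytes    int64
	MinInputFiles       uint
	DeleteFileThreshold int
	PackingLookback     uint
}

// defaultConfig fills in the defaults listed in the field comments.
func defaultConfig() config {
	const target int64 = 512 * 1024 * 1024 // assumption: 512 MB == 512 MiB
	return config{
		TargetFileSizeBytes: target,
		MinFileSizeBytes:    target * 75 / 100,  // 75% of target
		MaxFileSizeBytes:    target * 180 / 100, // 180% of target
		MinInputFiles:       5,                  // DefaultMinInputFiles
		DeleteFileThreshold: 5,
		PackingLookback:     128, // DefaultPackingLookback
	}
}

// isCandidate reports whether a file should be considered for compaction:
// it is smaller than the minimum size, or it carries enough delete files to
// be forced in regardless of size (assumed inclusive).
func (c config) isCandidate(sizeBytes int64, deleteFiles int) bool {
	return sizeBytes < c.MinFileSizeBytes || deleteFiles >= c.DeleteFileThreshold
}

func main() {
	cfg := defaultConfig()
	fmt.Println(cfg.MinFileSizeBytes, cfg.MaxFileSizeBytes) // prints 402653184 966367641
	fmt.Println(cfg.isCandidate(100<<20, 0))                // small file: true
	fmt.Println(cfg.isCandidate(450<<20, 0))                // already optimal: false
	fmt.Println(cfg.isCandidate(2<<30, 5))                  // oversized but many deletes: true
}
```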

func (Config) PlanCompaction

func (cfg Config) PlanCompaction(tasks []table.FileScanTask) (Plan, error)

PlanCompaction analyzes the given scan tasks and produces a Plan. Tasks are grouped by partition, classified as candidates or skipped, and candidates are bin-packed into groups targeting TargetFileSizeBytes.
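
The packing step can be sketched as first-fit bin packing with a bounded number of open bins. This is an illustration under assumptions, not the package's algorithm: the eviction policy (oldest bin first) and the helper names `packWithLookback` and `dropSmallGroups` are hypothetical, and files are reduced to their sizes within a single partition.

```go
package main

import "fmt"

// bin is one open group of file sizes being filled toward the target.
type bin struct {
	sizes []int64
	total int64
}

// packWithLookback places each size into the first open bin with room for it.
// If none fits, a new bin is opened; once more than lookback bins are open,
// the oldest one is closed and emitted.
func packWithLookback(sizes []int64, target int64, lookback int) [][]int64 {
	var open []*bin
	var done [][]int64
	for _, sz := range sizes {
		placed := false
		for _, b := range open {
			if b.total+sz <= target {
				b.sizes = append(b.sizes, sz)
				b.total += sz
				placed = true
				break
			}
		}
		if placed {
			continue
		}
		open = append(open, &bin{sizes: []int64{sz}, total: sz})
		if len(open) > lookback {
			done = append(done, open[0].sizes)
			open = open[1:]
		}
	}
	for _, b := range open {
		done = append(done, b.sizes)
	}
	return done
}

// dropSmallGroups removes groups with fewer than minInputFiles files,
// mirroring the documented MinInputFiles rule.
func dropSmallGroups(groups [][]int64, minInputFiles int) [][]int64 {
	var kept [][]int64
	for _, g := range groups {
		if len(g) >= minInputFiles {
			kept = append(kept, g)
		}
	}
	return kept
}

func main() {
	groups := packWithLookback([]int64{60, 60, 30, 30, 90}, 100, 8)
	fmt.Println(groups)                     // prints [[60 30] [60 30] [90]]
	fmt.Println(dropSmallGroups(groups, 2)) // prints [[60 30] [60 30]]
}
```

A larger lookback lets late-arriving small files fill earlier bins, at the cost of keeping more bins in memory.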

func (Config) Validate

func (cfg Config) Validate() error

Validate checks that the config values are sensible.

type Group

type Group struct {
	// PartitionKey is an opaque grouping key derived from partition values.
	// Empty string for unpartitioned tables.
	PartitionKey string

	// Tasks are the input FileScanTasks that should be compacted together.
	Tasks []table.FileScanTask

	// TotalSizeBytes is the sum of data file sizes in this group.
	TotalSizeBytes int64

	// DeleteFileCount is the total number of delete files across all tasks.
	DeleteFileCount int
}

Group represents a set of files in the same partition that should be compacted together.

type Plan

type Plan struct {
	// Groups are the sets of files to compact, each within a single partition.
	Groups []Group

	// SkippedFiles is the count of files that are already optimal or oversized.
	SkippedFiles int

	// TotalInputFiles is the total number of files scanned.
	TotalInputFiles int

	// TotalInputBytes is the total size of all scanned files.
	TotalInputBytes int64

	// EstOutputFiles is the estimated number of output files after compaction.
	EstOutputFiles int

	// EstOutputBytes is the estimated total size of output files.
	// This is an upper-bound estimate: actual output may be smaller because
	// deleted rows are removed and Parquet compression/encoding is more
	// effective on larger files with better column statistics.
	EstOutputBytes int64
}

Plan is the output of analyzing a table for compaction.
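
One plausible way to compute the two estimates is shown below. It is an assumption rather than the package's documented formula: each group is taken to produce ceil(groupBytes / target) output files, and the byte estimate is the plain sum of input sizes, which matches the stated upper-bound character of EstOutputBytes. The helper name `estimateOutput` is hypothetical.

```go
package main

import "fmt"

// estimateOutput returns an estimated output file count and an upper-bound
// byte total for the given per-group input sizes. The ceil-division rule is
// an illustrative assumption.
func estimateOutput(groupBytes []int64, target int64) (files int, bytes int64) {
	if target <= 0 {
		return 0, 0
	}
	for _, b := range groupBytes {
		files += int((b + target - 1) / target) // ceil(b / target)
		bytes += b
	}
	return files, bytes
}

func main() {
	groups := []int64{300 << 20, 1000 << 20, 512 << 20} // sizes in bytes
	files, bytes := estimateOutput(groups, 512<<20)
	fmt.Println(files, bytes>>20) // prints 4 1812
}
```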

func Analyze

func Analyze(ctx context.Context, tbl *table.Table, cfg Config) (Plan, error)

Analyze scans the table and produces a compaction Plan without modifying anything. This is the dry-run / planning entry point.

type SurvivorSurvey

type SurvivorSurvey struct {
	EmptyPartMinSeq int64
	PartMinSeq      map[string]int64
}

SurvivorSurvey describes the surviving data files in a snapshot AFTER a planned rewrite has logically removed its rewrite set. It is the input to DecideDeadEqualityDeletes.

EmptyPartMinSeq is the smallest sequence number among unpartitioned surviving data files. Per the Iceberg v2 reader predicate (see table/scanner.go matchEqualityDeletesToData), every equality delete is a partition match for an unpartitioned data file, so this value always counts toward an eq-delete's "applicable survivors" minimum.

PartMinSeq maps the partition tuple (encoded via partitionMatchKey) to the smallest sequence number among surviving partitioned data files in that tuple. The key intentionally does NOT include SpecID — the reader's predicate ignores SpecID, so the writer-side cleanup must too.

The sentinel for "no survivor in this bucket" is math.MaxInt64.

func NewSurvivorSurvey

func NewSurvivorSurvey() *SurvivorSurvey

NewSurvivorSurvey returns a survey initialized with the no-survivor sentinel for the empty-partition bucket and an empty per-partition map. Callers populate via AddSurvivor.

func (*SurvivorSurvey) AddSurvivor

func (s *SurvivorSurvey) AddSurvivor(partition map[int]any, seq int64)

AddSurvivor records a surviving data file's (partition, seq) into the survey. partition is the data file's partition tuple from iceberg.DataFile.Partition() (nil/empty for unpartitioned tables); seq is the data file's sequence number from the manifest entry.

Defensive: if seq < 0 (sentinel for "unset"), the file is recorded with seq=0 (smallest real value), which keeps it permanently alive against every eq-delete. Better to preserve uncertain state than to drop deletes that may still apply.

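Pointer receiver: AddSurvivor mutates EmptyPartMinSeq, an int64 stored by value in the struct, so the update persists only through a pointer. NewSurvivorSurvey already returns a *SurvivorSurvey, so the typical pattern `survey := NewSurvivorSurvey()` followed by `survey.AddSurvivor(...)` needs no extra addressing; for an addressable SurvivorSurvey value, Go takes the address automatically at the call site.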
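
The bookkeeping described above can be sketched as follows. The `survey` type and its methods are simplified stand-ins, and the partition tuple is assumed to be pre-encoded as a string key (the real method takes a map[int]any and encodes it via partitionMatchKey, whose encoding is not shown here).

```go
package main

import (
	"fmt"
	"math"
)

// survey mirrors the documented SurvivorSurvey fields.
type survey struct {
	EmptyPartMinSeq int64
	PartMinSeq      map[string]int64
}

// newSurvey starts with the no-survivor sentinel and an empty map.
func newSurvey() *survey {
	return &survey{EmptyPartMinSeq: math.MaxInt64, PartMinSeq: map[string]int64{}}
}

// addSurvivor records the minimum sequence number per bucket.
// partKey is a hypothetical pre-encoded partition key; "" means unpartitioned.
func (s *survey) addSurvivor(partKey string, seq int64) {
	if seq < 0 {
		seq = 0 // unset: treat as oldest so no eq-delete is dropped because of it
	}
	if partKey == "" {
		if seq < s.EmptyPartMinSeq {
			s.EmptyPartMinSeq = seq
		}
		return
	}
	if cur, ok := s.PartMinSeq[partKey]; !ok || seq < cur {
		s.PartMinSeq[partKey] = seq
	}
}

func main() {
	s := newSurvey() // already a pointer, so updates persist
	s.addSurvivor("", 7)
	s.addSurvivor("", 3)
	s.addSurvivor("p=1", 10)
	s.addSurvivor("p=1", 4)
	s.addSurvivor("p=2", -1)
	fmt.Println(s.EmptyPartMinSeq, s.PartMinSeq["p=1"], s.PartMinSeq["p=2"]) // prints 3 4 0
}
```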
