Documentation
¶
Overview ¶
Package pebble implements the record-merge strategies behind v3 (Pebble-engine) c1z compaction.
Three strategies live here. Which one runs is decided per compaction by synccompactor.resolvePebbleMode: overlay is the default (and currently the only auto-selected mode), fold runs only when requested explicitly — its sync-id adoption is not yet handled by downstream compaction bookkeeping — and kway is never auto-selected on its own: it is overlay's internal fallback and an operational escape hatch (BATON_EXPERIMENTAL_PEBBLE_COMPACTOR=kway). The dormant fold cutover thresholds and the measured crossover data live next to that gate.
Overlay merge (MergeFilesIntoOverlay) — the default rebuild ¶
A sqlite-shaped "newest wins" merge into a fresh dest sync. Sources are scanned newest-to-oldest with a bounded in-memory seen set per bucket (128-bit suffix hash → discovered_at; see seenSuffixSet), and only winners are written, through raw Pebble batches that materialize each record and its derived index keys exactly once. The last source scanned (the oldest — in the production skewed shape, the large base) can skip the record write path entirely: its bucket is materialized as SST files filtered against the seen set (every base key not overridden by a newer source) and ingested wholesale (overlayWholeSourceWorthIt gates this on bucket size). Buckets whose estimated key count exceeds the seen-set memory bound (overlaySeenKeyLimit) are routed to the kway run-file path by overlayPlanBuckets, so overlay's worst case is kway plus a planning pass. Cost: O(total input volume), with the cheapest per-record path of the rebuild strategies; wins or ties every measured shape.
K-way merge (MergeFilesInto) — overlay's fallback ¶
A bounded-fan-in external merge sort into a fresh dest sync. Sources are processed in fan-in-sized chunks: each chunk's buckets are streamed into a sorted run file (≤ fanIn sources merge directly into Pebble with no run files at all); run files are then merged fanIn at a time per round until one generation remains, and the final generation is deduped (newest discovered_at wins; runRecordIsNewer) and materialized into SSTs that are ingested into dest. Memory stays bounded regardless of input count or size and there is no reliance on in-memory dedup state — the price is writing and re-reading every record through run files, strictly more I/O than overlay. Kept because overlay's oversized buckets need it, and as a forced-mode backup if overlay misbehaves in production.
In-place fold (MergeInto) — explicit-only, fastest for large bases ¶
Not a rebuild. The dest store starts as a byte copy of the base input, the output adopts the base sync's id, and each partial's records are streamed into the base keyspace through the engine's keep-newer puts (Put*RecordsIfNewer), which resolve conflicts against incumbents by discovered_at and maintain indexes with point tombstones for overridden records only. Base records are never read, decoded, or rewritten, and the envelope save splices the base's unchanged zstd frames instead of re-encoding them — total cost is O(partial volume) plus a fixed per-source open, versus O(total input) for the rebuilds (measured ~3x faster at a 1.1GB base with small partials). It loses when partial volume grows: every partial record pays a point read-modify-write (~0.2s/MB), so the (dormant) auto gate caps partials at a small fraction of the base.
The package also hosts an IngestAndExcise-based Compactor, a byte-level primitive that atomically replaces one sync_id range in a destination engine with a source's view of it (see Compactor).
Index ¶
- Variables
- func MergeFilesInto(ctx context.Context, dest *enginepkg.Engine, sources []SourceFile, ...) (*v3.SyncStatsRecord, error)
- func MergeFilesIntoOverlay(ctx context.Context, dest *enginepkg.Engine, sources []SourceFile, ...) (*v3.SyncStatsRecord, error)
- func MergeInto(ctx context.Context, dest *enginepkg.Engine, sources []SourceSync, ...) error
- type Compactor
- type KWayOption
- type OverlayOption
- type SourceFile
- type SourceSync
Constants ¶
This section is empty.
Variables ¶
var ErrEmptySync = errors.New("synccompactor/pebble: source has no records under the given sync_id")
ErrEmptySync is returned when Compact is called with a sync_id that has no records in the source engine. The caller can treat this as a no-op or as an error depending on context.
Functions ¶
func MergeFilesInto ¶ added in v0.13.2
func MergeFilesInto(ctx context.Context, dest *enginepkg.Engine, sources []SourceFile, destSyncID string, tmpDir string, opts ...KWayOption) (*v3.SyncStatsRecord, error)
MergeFilesInto folds source syncs into destSyncID using a bounded K-way semantic merge and final SST ingestion. It materializes resource_types, resources, entitlements, and grants plus all derived indexes. Assets are not copied, matching SQLite compaction behavior.
Algorithm (external merge sort with fan-in cfg.fanIn, default 50):
- ≤ fanIn sources merge DIRECTLY into the dest Pebble — one pass, no run files (mergeSourceChunkToPebble).
- Otherwise sources are processed in fanIn-sized chunks, each chunk's buckets streamed into one sorted run file (buildChunkRunFileFromSources); run files are then merged fanIn at a time per round (mergeRunFileGroup) until ≤ fanIn remain, and the final generation is materialized into SSTs and ingested (materializeRunFilesToPebble). Each extra round re-reads and re-writes the full dataset, so fan-in should comfortably exceed the expected source count.
- Winner rule: newest discovered_at per primary key, ties keep the newer source (runRecordIsNewer) — same rule as the overlay and sqlite compactors. Sources are unpacked per chunk and their directories removed asynchronously when the chunk closes, so peak disk is O(fanIn) unpacked sources, not O(len(sources)).
Memory stays bounded regardless of input count or size — there is no in-memory dedup state — at the cost of writing and re-reading every record through run files. The overlay merge routes its oversized buckets here; see the package doc for how the strategies divide up.
On success it returns the dest sync's stats record, accumulated at the final winner-dedupe sites (never at run-file build time — run files keep duplicate keys across chunks until the last merge round), so the caller can persist the stats sidecar without re-scanning the output.
func MergeFilesIntoOverlay ¶ added in v0.13.2
func MergeFilesIntoOverlay(ctx context.Context, dest *enginepkg.Engine, sources []SourceFile, destSyncID string, tmpDir string, opts ...OverlayOption) (*v3.SyncStatsRecord, error)
MergeFilesIntoOverlay is a sqlite-shaped compactor:
- keep a bounded in-memory seen map (suffix hash → discovered_at) for each bucket while scanning sources newest-to-oldest;
- per key, admit the record with the strictly newest discovered_at; ties keep the earliest admission (the newest source). This is the same winner rule as the K-way merge (runRecordIsNewer) and the sqlite attached compactor (`a.discovered_at > m.discovered_at`), enforced via a shallow discovered_at scan of each candidate value;
- write winners through raw Pebble batches so secondary indexes are materialized once;
- let the last source scanned (the oldest — in the production skewed shape, the large base) skip the record write path when its bucket is big enough (overlayWholeSourceWorthIt): the bucket is materialized as SST files filtered against the seen set and ingested wholesale, making the base's cost proportional to bytes copied rather than records decoded; and
- degrade buckets whose seen set would outgrow memory to the K-way run-file path ADAPTIVELY (per-bucket state machine): a pre-source gate cuts at a source boundary and RESUMES via kway with the seen map frozen as the conflict oracle; a source that crosses the soft limit mid-scan may finish inside the buffer (hard limit = soft × bufferFactor) and resume at the boundary; crossing the hard limit mid-source RESTARTS the bucket — its dest keyspace is range-deleted and the whole bucket flows through the blind kway path (keeping SST ingest, total work ≤2×). Planning routes only the provably-safe (stats sum ≤ soft) and provably-doomed (single-source count > hard) buckets up front.
On success it returns the dest sync's stats record, accumulated as winners were written, so the caller can persist the stats sidecar without re-scanning the output.
func MergeInto ¶ added in v0.12.5
func MergeInto(ctx context.Context, dest *enginepkg.Engine, sources []SourceSync, destSyncID string) error
MergeInto folds every source's primary records into dest under destSyncID. Across all inputs (and any records already present under destSyncID), the newest record per logical key (by discovered_at, ties keep the incumbent) survives, and dest's derived indexes are maintained by the keep-newer write path.
destSyncID must already exist in dest. It may be non-empty: the in-place fold compaction merges partial syncs directly into the base sync's keyspace, relying on the Put*RecordsIfNewer semantics to resolve conflicts against pre-existing records. MergeInto binds the dest engine's current sync to destSyncID — the engine's write methods key off the current sync, not any per-record field.
Only the four primary record buckets are copied: resource_types, resources, entitlements, grants. Assets are intentionally NOT copied — this matches the SQLite compaction path, which folds only those four tables and drops assets from the compacted sync; copying assets here would give Pebble compacted syncs asset access the SQLite path does not have. Index keyspaces are not copied verbatim either; the per-record write path maintains them.
Sources are applied in the given order. The dedup keeps the record with the strictly-greater discovered_at; on an equal discovered_at the already-written (earlier source, or pre-existing dest) record is kept. Callers pass sources newest-first so the tie winner matches the SQLite fold.
Types ¶
type Compactor ¶
type Compactor struct {
// contains filtered or unexported fields
}
Compactor merges sync_run data between two Pebble engines using pebble.DB.IngestAndExcise:
- For each bucket of the applied sync, iterate its records in the source engine's key order and accumulate them into an SST file on disk.
- Call base.DB.IngestAndExcise with the new SST and the key range covering the applied sync_id. Pebble atomically (a) excises every key in the span from base — old sync_id rows go away in one shot — and (b) ingests the SST as a new L6 file (or flushable, depending on Pebble's choice) under that range.
The net effect is a byte-level merge: zero proto encode/decode per record, zero LSM compaction churn from a record-by-record Put loop. Each Compact call replaces exactly one sync_id range in the destination; multi-sync rollup is a higher-level loop calling Compact in sequence.
Reusable across many Compact calls; safe for sequential use only (not concurrent).
func NewCompactor ¶
NewCompactor builds a compactor that writes its merge into base. The tmpDir is where intermediate SST files are written; it must be on the same filesystem as the engine's data directory so Pebble can hard-link (a different FS would force a copy and break atomicity). If tmpDir is empty, os.TempDir is used (acceptable for tests but not production).
func (*Compactor) Compact ¶
Compact merges all records belonging to syncID in `source` into the base engine. Any pre-existing data in base under that syncID is excised before the new data is ingested — base ends up with exactly source's view of that sync.
Atomicity: per-bucket atomic, NOT whole-Compact atomic. Each IngestAndExcise call (one per record-type bucket: grants, by_entitlement index, by_principal index) is atomic from base's perspective — a concurrent reader sees the old-or-new state of that bucket, never a mixture. However, the multi-bucket loop is NOT transactional as a whole: a crash or hard cancellation mid-loop leaves base with new data in some buckets and old data in others (and the same for the DeleteRange-only path used for empty buckets). Recovery is "re-run Compact for the same syncID" — every step is idempotent (excise + ingest the same SSTs again converges to the source's view).
Caller is responsible for quiescing writes to `source` for the duration of the call (an active syncer would invalidate the SST as it's being built).
type KWayOption ¶ added in v0.13.2
type KWayOption func(*kWayConfig)
func WithFanIn ¶ added in v0.13.2
func WithFanIn(fanIn int) KWayOption
type OverlayOption ¶ added in v0.13.2
type OverlayOption func(*overlayConfig)
OverlayOption tunes the overlay merge.
func WithOverlayBufferFactor ¶ added in v0.13.5
func WithOverlayBufferFactor(f float64) OverlayOption
WithOverlayBufferFactor overrides the soft→hard limit multiplier (default 1.25). Non-positive values are ignored.
func WithOverlayFanIn ¶ added in v0.13.5
func WithOverlayFanIn(n int) OverlayOption
WithOverlayFanIn overrides how many sources are opened per chunk (default matches the kway fan-in). Intended for tests that need multi-chunk behavior without dozens of sources. Values below 1 are ignored.
func WithOverlayGateFraction ¶ added in v0.13.5
func WithOverlayGateFraction(f float64) OverlayOption
WithOverlayGateFraction overrides the statless pre-source gate fraction (default 0.9). Non-positive values are ignored.
func WithOverlayRecordChunkSize ¶ added in v0.13.2
func WithOverlayRecordChunkSize(n int) OverlayOption
WithOverlayRecordChunkSize sets how many winner records are buffered per raw write batch while streaming into the dest. Non-positive values are ignored.
func WithOverlaySeenKeyLimit ¶ added in v0.13.2
func WithOverlaySeenKeyLimit(n int64) OverlayOption
WithOverlaySeenKeyLimit caps the estimated per-bucket key count admitted to the overlay path; buckets above the cap route to the K-way run-file fallback (overlayPlanBuckets). Each seen-set entry costs ~40B with map overhead, so the cap bounds per-bucket memory. Non-positive values are ignored.
type SourceFile ¶ added in v0.13.2
type SourceFile struct {
Path string
SyncID string
Stats *reader_v2.SyncStats
Engine *enginepkg.Engine
DBDir string
// DecoderPool, when set, supplies the v3 payload decoder for every
// open of this source (the compactor scopes one pool to the whole
// merge). Nil is fine: each open constructs a one-shot decoder.
DecoderPool *dotc1z.EnvelopeDecoderPool
}
SourceFile identifies a compactable Pebble c1z input and the sync within it.
type SourceSync ¶ added in v0.12.5
SourceSync names one input to a merge: its open Pebble engine and the sync id whose records should be folded into the destination.