pebble

package
v0.13.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 10, 2026 License: Apache-2.0 Imports: 26 Imported by: 0

Documentation

Overview

Package pebble implements the cross-engine compaction primitive for the v3 Pebble-backed storage engine.

The compactor takes a `base` engine and one or more `applied` engines (each representing a sync_run worth of records) and atomically merges `applied`'s data into `base` using pebble.DB.IngestAndExcise:

  1. For each applied engine, iterate its records in the source engine's key order and accumulate them into an SST file on disk.
  2. Call base.DB.IngestAndExcise(paths, exciseSpan) with the new SST and the key range covering the applied sync_id. Pebble atomically: (a) excises every key in [exciseSpan.Start, exciseSpan.End) from base — old sync_id rows go away in one shot. (b) ingests the SST as a new L6 file (or flushable, depending on Pebble's choice) under that range.

The net effect is a byte-level merge: zero proto encode/decode per record, zero LSM compaction churn from a record-by-record Put loop.

Bound: each Compact call replaces exactly one sync_id range in the destination. Multi-sync rollup is a higher-level loop that Compact in sequence.

Index

Constants

This section is empty.

Variables

View Source
var ErrEmptySync = errors.New("synccompactor/pebble: source has no records under the given sync_id")

ErrEmptySync is returned when Compact is called with a sync_id that has no records in the source engine. The caller can treat this as a no-op or as an error depending on context.

Functions

func MergeFilesInto added in v0.13.2

func MergeFilesInto(ctx context.Context, dest *enginepkg.Engine, sources []SourceFile, destSyncID string, tmpDir string, opts ...KWayOption) (*v3.SyncStatsRecord, error)

MergeFilesInto folds source syncs into destSyncID using a bounded K-way semantic merge and final SST ingestion. It materializes resource_types, resources, entitlements, and grants plus all derived indexes. Assets are not copied, matching SQLite compaction behavior.

On success it returns the dest sync's stats record, accumulated at the final winner-dedupe sites (never at run-file build time — run files keep duplicate keys across chunks until the last merge round), so the caller can persist the stats sidecar without re-scanning the output.

func MergeFilesIntoOverlay added in v0.13.2

func MergeFilesIntoOverlay(ctx context.Context, dest *enginepkg.Engine, sources []SourceFile, destSyncID string, tmpDir string, opts ...OverlayOption) (*v3.SyncStatsRecord, error)

MergeFilesIntoOverlay is a sqlite-shaped compactor:

  • keep a bounded in-memory seen map (suffix hash → discovered_at) for each bucket while scanning sources newest-to-oldest;
  • per key, admit the record with the strictly newest discovered_at; ties keep the earliest admission (the newest source). This is the same winner rule as the K-way merge (runRecordIsNewer) and the sqlite attached compactor (`a.discovered_at > m.discovered_at`), enforced via a shallow discovered_at scan of each candidate value;
  • write winners through raw Pebble batches so secondary indexes are materialized once; and
  • route buckets whose estimated key count exceeds the in-memory limit through the K-way run-file path instead (overlayPlanBuckets).

On success it returns the dest sync's stats record, accumulated as winners were written, so the caller can persist the stats sidecar without re-scanning the output.

func MergeInto added in v0.12.5

func MergeInto(ctx context.Context, dest *enginepkg.Engine, sources []SourceSync, destSyncID string) error

MergeInto folds every source's primary records into dest under destSyncID, producing a single merged sync. It is the native-Pebble equivalent of the SQLite ATTACH union: across all inputs, the newest record per logical key (by discovered_at, ties keep the incumbent) survives, and dest's derived indexes are rebuilt from the survivors.

destSyncID must already exist in dest (created by StartNewSync) and be empty — MergeInto only adds records, it never excises, so no pre-existing data under destSyncID is assumed. MergeInto binds the dest engine's current sync to destSyncID before writing: v3 record values do not carry sync_id, so the engine's keep-newer write path keys every record off the current sync. The binding is left in place when MergeInto returns.

Only the four primary record buckets are copied: resource_types, resources, entitlements, grants. Assets are intentionally NOT copied — this matches the SQLite compaction path, which folds only those four tables and drops assets from the compacted sync; copying assets here would give Pebble compacted syncs asset access the SQLite path does not have. Index keyspaces are not copied verbatim either; the per-record write path rebuilds them from the surviving primaries.

Sources are applied in the given order. The dedup keeps the record with the strictly-greater discovered_at; on an equal discovered_at the already-written (earlier source) record is kept. Callers pass sources in the same order the SQLite fold applies them so the tie winner is identical across engines.

Types

type Compactor

type Compactor struct {
	// contains filtered or unexported fields
}

Compactor merges sync_run data between two Pebble engines using IngestAndExcise. Reusable across many Compact calls; safe for sequential use only (not concurrent).

func NewCompactor

func NewCompactor(base *enginepkg.Engine, tmpDir string) (*Compactor, error)

NewCompactor builds a compactor that writes its merge into base. The tmpDir is where intermediate SST files are written; it must be on the same filesystem as the engine's data directory so Pebble can hard-link (a different FS would force a copy and break atomicity). If tmpDir is empty, os.TempDir is used (acceptable for tests but not production).

func (*Compactor) Compact

func (c *Compactor) Compact(ctx context.Context, source *enginepkg.Engine, syncID string) error

Compact merges all records belonging to syncID in `source` into the base engine. Any pre-existing data in base under that syncID is excised before the new data is ingested — base ends up with exactly source's view of that sync.

Atomicity: per-bucket atomic, NOT whole-Compact atomic. Each IngestAndExcise call (one per record-type bucket: grants, by_entitlement index, by_principal index) is atomic from base's perspective — a concurrent reader sees the old-or-new state of that bucket, never a mixture. However, the multi-bucket loop is NOT transactional as a whole: a crash or hard cancellation mid-loop leaves base with new data in some buckets and old data in others (and the same for the DeleteRange-only path used for empty buckets). Recovery is "re-run Compact for the same syncID" — every step is idempotent (excise + ingest the same SSTs again converges to the source's view).

Caller is responsible for quiescing writes to `source` for the duration of the call (an active syncer would invalidate the SST as it's being built).

type KWayOption added in v0.13.2

type KWayOption func(*kWayConfig)

func WithFanIn added in v0.13.2

func WithFanIn(fanIn int) KWayOption

type OverlayOption added in v0.13.2

type OverlayOption func(*overlayConfig)

OverlayOption tunes the overlay merge.

func WithOverlayRecordChunkSize added in v0.13.2

func WithOverlayRecordChunkSize(n int) OverlayOption

WithOverlayRecordChunkSize sets how many winner records are buffered per raw write batch while streaming into the dest. Non-positive values are ignored.

func WithOverlaySeenKeyLimit added in v0.13.2

func WithOverlaySeenKeyLimit(n int64) OverlayOption

WithOverlaySeenKeyLimit caps the estimated per-bucket key count admitted to the overlay path; buckets above the cap route to the K-way run-file fallback (overlayPlanBuckets). Each seen-set entry costs ~40B with map overhead, so the cap bounds per-bucket memory. Non-positive values are ignored.

type SourceFile added in v0.13.2

type SourceFile struct {
	Path   string
	SyncID string
	Stats  *reader_v2.SyncStats
	Engine *enginepkg.Engine
	DBDir  string
	// DecoderPool, when set, supplies the v3 payload decoder for every
	// open of this source (the compactor scopes one pool to the whole
	// merge). Nil is fine: each open constructs a one-shot decoder.
	DecoderPool *dotc1z.EnvelopeDecoderPool
}

SourceFile identifies a compactable Pebble c1z input and the sync within it.

type SourceSync added in v0.12.5

type SourceSync struct {
	Engine *enginepkg.Engine
	SyncID string
}

SourceSync names one input to a merge: its open Pebble engine and the sync id whose records should be folded into the destination.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL