snapshot

package
v0.1.0
Published: May 12, 2026 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation

Overview

Package snapshot defines the SnapshotSource contract used by Murmur's Bootstrap execution mode. A SnapshotSource scans a source-of-truth datastore (Mongo, DynamoDB, JDBC, S3 dump) and emits synthetic events into the same monoid-combine logic the streaming runtime uses. This populates initial state when the live event stream lacks full history, which is typical for change data capture (CDC) sources.

Bootstrap mirrors Debezium's "incremental snapshot" pattern: it is chunked, watermarked, resumable, and able to run in parallel with live capture. Concrete implementations live in subpackages such as snapshot/mongo and snapshot/dynamodb.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type HandoffToken

type HandoffToken []byte

HandoffToken is an opaque marker, captured at the start of a snapshot, that tells the live source where to resume after bootstrap completes so that no events are missed or duplicated.

  • For Mongo: a Change Stream resume token captured before the collection scan.
  • For DynamoDB: a Streams shard iterator position captured before the table scan.
  • For JDBC + binlog CDC: a binlog position.

The bootstrap runtime persists this token alongside its progress and hands it back to the live source on transition.
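
The ordering described above (capture the token, persist it with progress, scan, then hand the token back) can be sketched roughly as follows. This is an illustration only, not Murmur's bootstrap runtime: checkpoint, bootstrap, runDemo, the three callbacks, and the token value "resume-token-1" are all hypothetical names introduced for this sketch.

```go
package main

import (
	"context"
	"fmt"
)

// HandoffToken mirrors the declaration in this package.
type HandoffToken []byte

// checkpoint is a hypothetical persisted record: the handoff token stored
// alongside scan progress.
type checkpoint struct {
	Handoff HandoffToken
	Done    bool
}

// bootstrap captures the token first, persists it, runs the scan, and then
// returns the token so the caller can pass it to the live source.
func bootstrap(
	ctx context.Context,
	capture func(context.Context) (HandoffToken, error),
	scan func(context.Context) error,
	save func(checkpoint) error,
) (HandoffToken, error) {
	tok, err := capture(ctx)
	if err != nil {
		return nil, fmt.Errorf("capture handoff: %w", err)
	}
	// Persist before scanning so a restarted worker reuses the same token.
	if err := save(checkpoint{Handoff: tok}); err != nil {
		return nil, fmt.Errorf("persist handoff: %w", err)
	}
	if err := scan(ctx); err != nil {
		return nil, fmt.Errorf("scan: %w", err)
	}
	if err := save(checkpoint{Handoff: tok, Done: true}); err != nil {
		return nil, fmt.Errorf("persist completion: %w", err)
	}
	return tok, nil
}

// runDemo wires bootstrap to stub callbacks that record the call order.
func runDemo() ([]string, HandoffToken, error) {
	var steps []string
	tok, err := bootstrap(
		context.Background(),
		func(context.Context) (HandoffToken, error) {
			steps = append(steps, "capture")
			return HandoffToken("resume-token-1"), nil
		},
		func(context.Context) error {
			steps = append(steps, "scan")
			return nil
		},
		func(checkpoint) error {
			steps = append(steps, "save")
			return nil
		},
	)
	return steps, tok, err
}

func main() {
	steps, tok, err := runDemo()
	fmt.Println(steps, string(tok), err)
}
```

Capturing the token before the scan means any write that lands during the scan is still visible to the live source after handoff. Such a write may also appear in the snapshot, which is presumably why the Source contract asks for EventIDs that make replays idempotent.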

type Source

type Source[T any] interface {
	// CaptureHandoff captures the live-source resume position before scanning begins.
	// Called once at bootstrap start.
	CaptureHandoff(ctx context.Context) (HandoffToken, error)

	// Scan emits records via out until the snapshot is exhausted. Implementations should
	// honor ctx cancellation and yield records in stable, resumable chunks. Each record
	// must carry an EventID derived from the document's primary key so re-runs are
	// idempotent under at-least-once dedup.
	Scan(ctx context.Context, out chan<- source.Record[T]) error

	// Resume continues a previously-interrupted scan from the persisted progress marker.
	// Implementations that don't support resumption should restart from the beginning;
	// at-least-once dedup handles the duplicate emissions.
	Resume(ctx context.Context, marker []byte, out chan<- source.Record[T]) error

	// Name returns a stable identifier for logging and metrics.
	Name() string

	// Close releases any underlying resources.
	Close() error
}

Source scans a datastore and emits records as synthetic events. Implementations should support resumable, chunked progress so a long-running bootstrap can survive worker restarts.

Distinct from pkg/source.Source (the live-streaming abstraction): a snapshot.Source is one-shot and finite; a pkg/source.Source is open-ended and streaming. Bootstrap mode uses snapshot.Source; live mode uses pkg/source.Source.
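
To make the contract concrete, here is a minimal in-memory sketch of a type with the same method set. It rests on several assumptions: Record is a local stand-in for source.Record[T] (the real type's fields may differ), the progress marker is assumed to be a decimal index into the sorted key list, and memSource, newMemSource, collect, and demo are hypothetical names, not part of Murmur.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"strconv"
)

// Record is a local stand-in for source.Record[T]; the real fields may differ.
type Record[T any] struct {
	EventID string
	Value   T
}

type HandoffToken []byte

// Source restates the documented interface against the local Record type.
type Source[T any] interface {
	CaptureHandoff(ctx context.Context) (HandoffToken, error)
	Scan(ctx context.Context, out chan<- Record[T]) error
	Resume(ctx context.Context, marker []byte, out chan<- Record[T]) error
	Name() string
	Close() error
}

// memSource is a hypothetical in-memory source keyed by primary key.
type memSource[T any] struct {
	keys []string // sorted primary keys, giving a stable scan order
	rows map[string]T
}

var _ Source[int] = (*memSource[int])(nil) // compile-time conformance check

func newMemSource[T any](rows map[string]T) *memSource[T] {
	keys := make([]string, 0, len(rows))
	for k := range rows {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return &memSource[T]{keys: keys, rows: rows}
}

func (s *memSource[T]) CaptureHandoff(context.Context) (HandoffToken, error) {
	return HandoffToken("mem-0"), nil // placeholder token for the sketch
}

func (s *memSource[T]) Scan(ctx context.Context, out chan<- Record[T]) error {
	return s.emitFrom(ctx, 0, out)
}

// Resume reads marker as the index of the next key to emit (assumed encoding).
func (s *memSource[T]) Resume(ctx context.Context, marker []byte, out chan<- Record[T]) error {
	start, err := strconv.Atoi(string(marker))
	if err != nil || start < 0 || start > len(s.keys) {
		start = 0 // unreadable marker: restart and rely on downstream dedup
	}
	return s.emitFrom(ctx, start, out)
}

func (s *memSource[T]) emitFrom(ctx context.Context, start int, out chan<- Record[T]) error {
	for _, k := range s.keys[start:] {
		// The EventID is derived from the primary key, so re-runs repeat it.
		rec := Record[T]{EventID: "snapshot:" + k, Value: s.rows[k]}
		select {
		case out <- rec:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

func (s *memSource[T]) Name() string { return "mem" }
func (s *memSource[T]) Close() error { return nil }

// collect runs a scan in a goroutine and gathers the emitted EventIDs.
func collect[T any](run func(out chan<- Record[T]) error) ([]string, error) {
	out := make(chan Record[T])
	errc := make(chan error, 1)
	go func() { errc <- run(out); close(out) }()
	var ids []string
	for r := range out {
		ids = append(ids, r.EventID)
	}
	return ids, <-errc
}

// demo performs a full scan, then a resume from marker "2".
func demo() (full, rest []string) {
	src := newMemSource(map[string]int{"b": 2, "a": 1, "c": 3})
	ctx := context.Background()
	full, _ = collect(func(out chan<- Record[int]) error { return src.Scan(ctx, out) })
	rest, _ = collect(func(out chan<- Record[int]) error { return src.Resume(ctx, []byte("2"), out) })
	return full, rest
}

func main() { fmt.Println(demo()) }
```

Sorting the keys gives the scan a stable order, which is what lets a small marker stand in for progress. A real datastore would more likely use its own cursor or last-seen key than a positional index.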

Directories

Path        Synopsis
dynamodb    Package dynamodb provides a snapshot.Source backed by DynamoDB ParallelScan, suitable for bootstrapping a Murmur pipeline from a DDB-resident OLTP table.
jsonl       Package jsonl provides a snapshot.Source that reads JSON Lines from any io.Reader: local files, S3 objects, gzip streams, etc.
mongo       Package mongo provides a Mongo-backed implementation of snapshot.Source for Murmur's Bootstrap execution mode.
parquet     Package parquet provides a snapshot.Source that reads Apache Parquet from any io.ReaderAt: local files, S3 objects fetched into memory, in-memory bytes from tests, etc.
s3          Package s3 provides a snapshot.Source that scans every JSON-Lines object under an S3 prefix and emits records for bootstrap.
