backfill

command

Version: v0.1.0 (latest) Published: May 12, 2026 License: Apache-2.0 Imports: 18 Imported by: 0

Repository

github.com/gallowaysoftware/murmur

Documentation

Overview

Command backfill drives a Murmur counter pipeline from Spark-aggregated S3 JSON-Lines into a DynamoDB Sum-monoid store. It's the canonical "snapshot then stream" bootstrap step: a Spark job pre-aggregates raw events into hourly summaries and lands them in S3, then this binary scans the prefix and folds every row into the same pipeline a live Kafka/Kinesis worker would feed.

Run it once before flipping the live worker to a fresh DynamoDB table, or repeatedly during a 40-day backfill window; the StableEventID extractor in package backfill keeps re-runs idempotent.
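
The sketch below shows one way a stable event ID can make re-runs idempotent: each row gets a deterministic ID, and a delta is applied only the first time its ID is seen. How StableEventID actually derives IDs is not described here, so the stableID function (a hash of object key and line number), the dedupSum type, and the part-0000.json object name are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// stableID is a hypothetical stand-in for an extractor like StableEventID.
// It derives a deterministic ID from where a row lives, so the same row
// yields the same ID on every run.
func stableID(objectKey string, lineNo int) string {
	h := sha256.Sum256([]byte(fmt.Sprintf("%s#%d", objectKey, lineNo)))
	return hex.EncodeToString(h[:])
}

// dedupSum is an in-memory counter that applies each event ID at most once.
type dedupSum struct {
	seen  map[string]bool
	total int64
}

func newDedupSum() *dedupSum {
	return &dedupSum{seen: map[string]bool{}}
}

// Apply adds delta and returns true if id is new; it returns false and
// leaves the total unchanged if id was already applied.
func (d *dedupSum) Apply(id string, delta int64) bool {
	if d.seen[id] {
		return false
	}
	d.seen[id] = true
	d.total += delta
	return true
}

func main() {
	s := newDedupSum()
	const obj = "counters/bot_interaction/part-0000.json" // assumed object name
	for run := 0; run < 2; run++ { // simulate running the backfill twice
		s.Apply(stableID(obj, 0), 3)
		s.Apply(stableID(obj, 1), 4)
	}
	fmt.Println(s.total) // prints 7
}
```

The second pass changes nothing because every ID has already been recorded, which is the property that makes repeated backfill runs safe.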

Required flags:

-bucket    S3 bucket holding the Spark output
-prefix    S3 key prefix to scan (e.g. counters/bot_interaction/)
-table     DynamoDB table for the Sum store (the pipeline's primary state)
-name      Pipeline name (used for metrics + log lines)

Optional:

-concurrency  Parallel S3 fetches (default 8)
-region       AWS region (default from environment)
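
For orientation, here is a minimal sketch of how the flags listed above could be declared and validated with the standard flag package. The flag names and the concurrency default come from the lists above; the config struct, the parseFlags helper, the validation rules, and the example values are assumptions, not the contents of main.go.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// config is a hypothetical container for the parsed flags.
type config struct {
	Bucket, Prefix, Table, Name, Region string
	Concurrency                         int
}

// parseFlags parses args (without the program name) and rejects a command
// line that omits any of the four required flags.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("backfill", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the sketch quiet on parse errors
	fs.StringVar(&c.Bucket, "bucket", "", "S3 bucket holding the Spark output")
	fs.StringVar(&c.Prefix, "prefix", "", "S3 key prefix to scan")
	fs.StringVar(&c.Table, "table", "", "DynamoDB table for the Sum store")
	fs.StringVar(&c.Name, "name", "", "pipeline name (metrics + log lines)")
	fs.IntVar(&c.Concurrency, "concurrency", 8, "parallel S3 fetches")
	fs.StringVar(&c.Region, "region", "", "AWS region (empty: use environment)")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.Bucket == "" || c.Prefix == "" || c.Table == "" || c.Name == "" {
		return c, errors.New("-bucket, -prefix, -table and -name are required")
	}
	if c.Concurrency < 1 {
		return c, errors.New("-concurrency must be at least 1")
	}
	return c, nil
}

func main() {
	// Example values are placeholders, not real resources.
	c, err := parseFlags([]string{
		"-bucket", "my-bucket",
		"-prefix", "counters/bot_interaction/",
		"-table", "counters",
		"-name", "bot_interaction",
	})
	fmt.Println(c.Concurrency, err) // prints "8 <nil>"
}
```

Using a flag.FlagSet rather than the package-level flags keeps the parser testable, since it can be fed an arbitrary argument slice.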

Source Files

main.go