morph

module

v0.0.0-...-17df368 Latest Latest Go to latest Published: Apr 22, 2026 License: MIT

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/john-peterson-g17/morph

Links

Open Source Insights

README ¶

Safe, resumable data backfills for schema evolution.

Install • Quick Start • Commands • Config • How It Works

[!WARNING] This project is in early development. Expect breaking changes. Feedback and contributions are welcome.

Why Morph?

Schema migrations are easy — until you need to move millions or billions of rows.

Most teams end up writing one-off scripts that overload the database, can't be resumed, don't track progress, and break halfway through. Morph exists to solve that.

What It Does

Chunked backfills — process large datasets incrementally without overwhelming your database
Adaptive sizing — chunk width auto-tunes based on observed query runtime
Resumable jobs — safely stop and restart without losing progress
Real-time TUI — progress bar, worker status, row counts, ETA, and database health
Database health monitoring — tracks connections, cache hit ratio, locks, replication lag
Before/after hooks — drop indexes before a backfill, recreate them after
Row estimation — EXPLAIN-based row estimates in the preview command
Schema validation — validates job configs against a JSON schema before execution
Idempotent execution — designed to safely re-run without duplicating data

Installation

go install github.com/john-peterson-g17/morph/cmd/morph@latest

No C dependencies — pure Go, builds anywhere.

Or clone and build:

git clone https://github.com/john-peterson-g17/morph.git
cd morph
go build -o morph ./cmd/morph

Quick Start

1. Create a job file:

morph create my-backfill

This generates jobs/my-backfill.v1.yml from a template.

2. Edit the job file with your source query, target table, and time window.

3. Preview the execution plan:

morph preview my-backfill --dsn "postgres://user:pass@host:5432/db"

4. Run the backfill:

morph run my-backfill --dsn "postgres://user:pass@host:5432/db"

Or set DATABASE_URL in your environment and skip --dsn.

Commands

`morph run <job name>`

Execute a backfill job with real-time TUI output.

Flags:
  --dsn              Database connection string (env: DATABASE_URL)
  --dir, -d          Job files directory (default: "jobs")
  --progress-dir     Progress file directory (default: ".morph/progress")
  --concurrency      Parallel chunk workers (default: from job config or 1)
  --fresh            Discard previous progress and start from scratch
  --debug            Show executed SQL queries in output

`morph preview <job name>`

Preview the execution plan without running anything. Shows job metadata, partitioning config, runtime settings, composed SQL queries, and EXPLAIN-based row estimates (if the database is reachable).

Flags:
  --dsn              Database connection string (env: DATABASE_URL)
  --dir, -d          Job files directory (default: "jobs")

`morph validate <job name>`

Validate a job config against the schema and check SQL syntax.

Flags:
  --dir, -d          Job files directory (default: "jobs")

`morph create <job name>`

Generate a new job file from the built-in template.

Flags:
  --dir, -d          Job files directory (default: "jobs")

Job File Format

Job files use YAML with a .v1.yml extension. You can pass a job by name (morph run my-backfill) and it resolves to jobs/my-backfill.v1.yml, or pass a path directly (morph run path/to/file.v1.yml).

version: v1

job:
  name: event-states-backfill
  description: Populate event_states from events and attempts tables

driver: postgres

partitioning:
  strategy: time_range
  window:
    start: "2024-01-01T00:00:00Z"
    end: "2026-04-13T00:00:00Z"
  adaptive:
    initial_width: 30m
    min_width: 5m
    max_width: 2h
    target_runtime: 30s

runtime:
  defaults:
    concurrency: 10
    max_retries: 3
    max_rows: 0            # 0 = unlimited
    statement_timeout: 1h

steps:
  - name: event_states
    before:
      - name: drop index
        sql: DROP INDEX IF EXISTS idx_event_states_updated_at

    morph:
      partition_by: e.occurred_at
      from:
        sql: |
          SELECT e.id, e.status, e.updated_at
          FROM events.events e
      into:
        sql: |
          INSERT INTO events.event_states (event_id, status, updated_at)
          ON CONFLICT (event_id) DO NOTHING

    after:
      - name: create index
        sql: CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_event_states_updated_at ON events.event_states (updated_at)

The from.sql query is automatically wrapped with WHERE {partition_by} >= $1 AND {partition_by} < $2 using the chunk boundaries. The into.sql is split at ON CONFLICT / RETURNING to compose the final statement.

How It Works

Adaptive Chunking

Morph divides the time window into chunks and adapts their width based on observed runtime:

Start with initial_width (e.g. 30 minutes)
After each chunk, compare actual runtime to target_runtime
Adjust width proportionally with smoothing (70% new / 30% old) and a 1.5x growth cap
Empty chunks automatically expand; width snaps back when data appears
Width is always clamped between min_width and max_width

Progress Tracking

Progress is saved to a JSON file after each chunk. If a job is interrupted:

Re-running the same job resumes from where it left off
Use --fresh to discard progress and start over
Progress files live in .morph/progress/ by default

Worker Pool

Multiple workers process chunks in parallel from a shared queue. Each worker:

Picks a chunk from the queue
Executes all steps sequentially within that chunk
Records results and adjusts the adaptive planner
Retries failed chunks up to max_retries times

Database Health Monitoring

During morph run, a background monitor polls the database every 2 seconds and displays health signals in the TUI with traffic-light coloring:

Signal	Green	Yellow	Red
Cache hit ratio	> 95%	90–95%	< 90%
Active connections	normal	elevated	near limit
Lock waits	none	some	many
Deadlocks	none	—	increasing
Replication lag	low	elevated	high

TUI Display

The real-time terminal UI shows:

Progress bar with percentage and ETA
Per-worker status — current chunk, step, rows, elapsed time
Completed chunks log — recent successes with timing and row counts
Database health — per-signal status with baseline comparison
Debug mode (--debug) — shows executed SQL for each completed chunk

Local Development

# Start postgres
docker compose up -d

# Get the mapped port
docker compose port postgres 5432

# Run tests
DATABASE_URL="postgres://morph:password@localhost:<port>/morph?sslmode=disable" go test ./...

Code Quality

go fmt ./...
go vet ./...
golangci-lint run

License

MIT

Directories ¶

Path	Synopsis
cmd
morph command
internal
cli
cli/flags
engine
job
monitor
monitor/postgres
previewer
previewer/postgres
progress
progress/file
progress/postgres
progress/s3
schema
tui
validator
validator/postgres
validator/schema

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL