subset

command module

v0.1.0 Latest Latest Go to latest Published: May 27, 2026 License: MIT Imports: 5 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/mickamy/subset

Links

Open Source Insights

README ¶

subset

Carve a referentially-consistent slice out of your database — one command, plain SQL, read-only source.

subset extracts a foreign-key-consistent subset of your MySQL or Postgres database as plain SQL. Point it at a DSN and a seed (<table> --where ...) and it introspects the schema, walks the foreign-key graph, and emits ordered INSERT or DELETE statements to stdout. It never modifies the source DB — it only runs read-only SELECTs. You decide where the SQL goes: pipe it straight into another database, or save it to a file.

# Clone an order and everything it depends on into a fresh dev database.
$ subset clone postgres://user:pass@prod:5432/app orders --id 42 | psql "$DEV_URL"

# Preview the closure without emitting SQL.
$ subset clone postgres://user:pass@prod:5432/app orders --id 42 --plan
subset clone: 3 tables, 3 rows in 14ms
  tenants  1 row  via orders → users → tenants
  users    1 row  via orders → users
  orders   1 row  seed

# Delete a tenant and every row that depends on it, in FK-safe order.
$ subset delete postgres://user:pass@localhost:5432/app tenants --id 7 > cleanup.sql

Why

You often need part of a database, kept consistent: one customer's data for a repro, a single order graph for a test fixture, or a clean teardown of a tenant that won't trip a foreign-key constraint. Doing this by hand means hand-writing SELECTs and ordering INSERTs/DELETEs yourself, getting the FK direction right every time.

subset does the graph walk for you:

Referentially consistent. clone follows FKs to parents so every INSERT lands after the rows it references. delete follows them to children so every DELETE runs before the row it depends on.
Read-only source, plain SQL out. The source DB only sees SELECTs. The output is ordinary SQL you can read, diff, and pipe to psql / mysql.
Both engines. PostgreSQL and MySQL, through the standard DSN form.

Install

Homebrew (macOS / Linux)

brew install mickamy/tap/subset

Windows

Grab the latest subset_<version>_windows_<arch>.zip from the Releases page, unzip, and move subset.exe into a directory already on your PATH (e.g., %USERPROFILE%\bin). The PowerShell snippet below picks the right archive for the current host arch and unpacks it next to the working directory; you still need the final move step yourself.

$ver  = (Invoke-RestMethod https://api.github.com/repos/mickamy/subset/releases/latest).tag_name.TrimStart('v')
$arch = if ($env:PROCESSOR_ARCHITECTURE -eq 'ARM64') { 'arm64' } else { 'amd64' }
Invoke-WebRequest -OutFile subset.zip "https://github.com/mickamy/subset/releases/latest/download/subset_${ver}_windows_${arch}.zip"
Expand-Archive subset.zip -DestinationPath .\subset -Force
# Move .\subset\subset.exe into a directory on $env:PATH (e.g., $HOME\bin).

From source

go install github.com/mickamy/subset@latest

Requires Go 1.26+ to build from source. Pre-built binaries (macOS / Linux / Windows × amd64 / arm64) are published on GitHub Releases on every tag.

Supported databases

PostgreSQL — connect with a postgres:// or postgresql:// DSN.
MySQL — connect with a mysql:// DSN.

Both drivers read the schema from information_schema / pg_catalog and the seed rows with ordinary SELECTs; nothing else touches the source. CI exercises PostgreSQL and MySQL on every push.

Usage

subset <command> <dsn> <table> [flags]

COMMANDS:
  clone   Pull rows plus their FK parents as INSERT statements
  delete  Remove rows plus their FK-dependent rows in safe order

FLAGS:
  --where <expr>   WHERE clause selecting the seed rows (e.g., "id=42")
  --id <value>     Shortcut for --where "id=<value>"
  --plan           Print row counts and FK paths; emit no rows
  --version, -v    Print subset version
  --help, -h       Show this help

subset writes SQL to stdout and progress/errors to stderr, so redirects and pipes carry only the SQL.

Clone: pull a row and its parents

clone starts at the seed rows and walks forward along foreign keys, collecting every parent row the seed transitively references. It emits INSERTs parents-first, so the output applies cleanly to a database that has the schema but not the data.

# One order and its FK parents (user, tenant, ...).
subset clone postgres://user:pass@host:5432/app orders --id 42 | psql "$DEV_URL"

# Several rows via a WHERE clause.
subset clone mysql://root:pass@host:3306/app users --where "status = 'active'" > users.sql

Delete: remove a row and its dependents

delete starts at the seed rows and walks backward, collecting every row that transitively references the seed. It emits DELETEs most-dependent-first, so applying them never violates a foreign key.

# A tenant and everything under it.
subset delete postgres://user:pass@host:5432/app tenants --id 7 | psql "$APP_URL"

# Save first, review, then apply.
subset delete postgres://... users --where "email LIKE '%@example.com'" > cleanup.sql

`--plan`: see the closure before committing to SQL

--plan prints a summary to stdout instead of SQL: the tables involved, their row counts, and a representative shortest FK path from the seed. A table reachable by more than one path shows (+N more path).

$ subset delete postgres://... tenants --id 7 --plan
subset delete: 5 tables, 7 rows in 18ms
  order_items  1 row   via tenants → products → order_items (+1 more path)
  products     1 row   via tenants → products
  orders       2 rows  via tenants → users → orders
  users        2 rows  via tenants → users
  tenants      1 row   seed

How it works

Introspect. Read tables, columns, primary keys, and foreign keys from the source schema.
Walk the FK graph. From the seed rows, BFS in the chosen direction — forward to FK parents for clone, backward to referencing children for delete — fetching each adjacent row with a SELECT. Rows are deduplicated by primary key, so shared parents (or diamond-shaped graphs) are visited once.
Order. Tables are topologically sorted by their FK dependencies: parents-first for clone, the reverse for delete. Within a self-referencing table (e.g., employees.manager_id), rows are ordered so a referenced row is emitted before the row that points at it (reversed for delete).
Emit. One INSERT or DELETE per row, with a leading summary comment. Generated columns are skipped on INSERT; identifiers and literals are quoted per dialect ("ident" for Postgres, `ident` for MySQL).

Composite primary keys, composite foreign keys, and single-column self-references are all supported.

Current scope (v0.1)

subset v0.1 is deliberately small. Not yet included (planned for later):

clone --include-children (opt-in backward closure during clone).
Output formats other than SQL (--format yaml, etc.).
ID remapping / sequence bumping (--remap-ids).
Column masking / skipping (--mask, --skip-columns).
Selecting schema/database via a CLI flag (use DSN parameters instead, e.g., ?search_path= on Postgres).
ON DELETE CASCADE / SET NULL awareness — delete always emits explicit DELETEs regardless of the constraint's action.
Composite self-referencing foreign keys and self-referencing data cycles (A → B → A) — both report an error.
Foreign-key cycles between tables — detected and reported as an error.

Develop

# Bring up local Postgres + MySQL via docker compose.
make compose-up

# Unit tests.
make test

# Integration tests (set either or both; unset drivers skip their tests).
SUBSET_TEST_DSN_POSTGRES="postgres://postgres:pass@127.0.0.1:5432/dev?sslmode=disable" \
SUBSET_TEST_DSN_MYSQL="mysql://root:pass@127.0.0.1:3306/dev?parseTime=true" \
  make test-integration

make compose-down

License

MIT

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

main.go

Directories ¶

Path	Synopsis
internal
cli
dialect
dsn
emit
exit
extract
introspect
plan
test/tsql Package tsql provides SQL helpers shared by integration tests.	Package tsql provides SQL helpers shared by integration tests.

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL

README ¶

subset

Why

Install

Homebrew (macOS / Linux)

Windows

From source

Supported databases

Usage

Clone: pull a row and its parents

Delete: remove a row and its dependents

--plan: see the closure before committing to SQL

How it works

Current scope (v0.1)

Develop

License

Documentation ¶

Source Files ¶

Directories ¶

`--plan`: see the closure before committing to SQL