klepto

command module
v0.0.9
Published: Mar 12, 2018 License: MIT Imports: 3 Imported by: 0

Klepto

Klepto is a tool that copies and anonymises data from other sources.

Intro

Klepto helps you to keep the data in your environment as consistent as possible by copying it from another environment's database.

You can use Klepto to get production data but without sensitive customer information for your testing or local debugging.

Features
  • Copy data to your local database, or to stdout or stderr
  • Filter the source data
  • Anonymise the source data

Supported Databases
  • PostgreSQL
  • MySQL

If you need to get data from a database type that is not listed here, you can build support for it yourself and add it to this list. Contributions are welcome :)

Requirements

  • Active connection to the IT VPN
  • Latest version of pg_dump installed (Only required when working with PostgreSQL databases)

Installation

Klepto is written in Go with support for multiple platforms. Pre-built binaries are provided for the following:

  • macOS (Darwin) for x64, i386, and ARM architectures
  • Windows
  • Linux

You can download the binary for your platform of choice from the releases page.

Once downloaded, the binary can be run from anywhere. We recommend moving it into a directory on your $PATH, usually /usr/local/bin, for easy use.

Usage

Klepto uses a configuration file called .klepto.toml to define your table structure. If your table is normalized, the structure can be detected automatically.

For dumping the last 10 created active users, your file will look like this:

[[Tables]]
  Name = "users"
  [Tables.Anonymise]
    email = "EmailAddress"
    username = "FirstName"
    password = "SimplePassword"
  [Tables.Filter]
    Match = "users.status = 'active'"
    Limit = 10
    [Tables.Filter.Sorts]
      created_at = "desc"

After you have created the file, run:

Postgres:

klepto steal \
--from="postgres://user:pass@localhost/fromDB?sslmode=disable" \
--to="postgres://user:pass@localhost/toDB?sslmode=disable"

MySQL:

klepto steal \
--from="user:pass@tcp(localhost:3306)/fromDB?sslmode=disable" \
--to="user:pass@tcp(localhost:3306)/toDB?sslmode=disable"

Behind the scenes, Klepto establishes connections to the source and target databases using the given parameters, then dumps the tables.
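To make the filter example above more concrete, here is a minimal sketch of how the Match, Limit and Sorts keys could translate into a single SELECT statement. The Filter type and the buildSelect function are hypothetical names used only for illustration; they are not Klepto's internals, and the SQL Klepto actually generates may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Filter mirrors the [Tables.Filter] keys from .klepto.toml.
// Hypothetical type, for illustration only.
type Filter struct {
	Match string            // raw WHERE condition
	Limit int               // 0 means "no limit"
	Sorts map[string]string // column -> "asc" or "desc"
}

// buildSelect assembles a SELECT statement for one table from its filter.
func buildSelect(table string, f Filter) string {
	var b strings.Builder
	fmt.Fprintf(&b, "SELECT * FROM %s", table)
	if f.Match != "" {
		fmt.Fprintf(&b, " WHERE %s", f.Match)
	}
	if len(f.Sorts) > 0 {
		// Sort the column names so the generated SQL is deterministic.
		cols := make([]string, 0, len(f.Sorts))
		for c := range f.Sorts {
			cols = append(cols, c)
		}
		sort.Strings(cols)
		parts := make([]string, 0, len(cols))
		for _, c := range cols {
			parts = append(parts, c+" "+strings.ToUpper(f.Sorts[c]))
		}
		fmt.Fprintf(&b, " ORDER BY %s", strings.Join(parts, ", "))
	}
	if f.Limit > 0 {
		fmt.Fprintf(&b, " LIMIT %d", f.Limit)
	}
	return b.String()
}

func main() {
	f := Filter{
		Match: "users.status = 'active'",
		Limit: 10,
		Sorts: map[string]string{"created_at": "desc"},
	}
	fmt.Println(buildSelect("users", f))
}
```

With the "last 10 created active users" configuration, this sketch prints `SELECT * FROM users WHERE users.status = 'active' ORDER BY created_at DESC LIMIT 10`.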

Steal Options

The available options can be seen by running klepto steal -h

We recommend always setting the following parameters:

  • concurrency to reduce the pressure on both the source and target databases.
  • read-max-conns to limit the number of open connections, so that the source database does not get overloaded.
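As a rough illustration of why a concurrency limit helps, the sketch below bounds the number of tables dumped in parallel by using a buffered channel as a semaphore. The dumpAll function and its dump callback are hypothetical; this is the general technique such a setting usually controls, not Klepto's implementation. Similarly, a connection cap like read-max-conns is commonly applied in Go with DB.SetMaxOpenConns from database/sql, though whether Klepto does so is an assumption.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// dumpAll runs dump for every table, allowing at most `concurrency`
// calls in flight at once. It returns the highest number of dumps
// observed running at the same time.
func dumpAll(tables []string, concurrency int, dump func(table string)) int32 {
	sem := make(chan struct{}, concurrency) // buffered channel used as a semaphore
	var wg sync.WaitGroup
	var running, peak int32

	for _, t := range tables {
		wg.Add(1)
		sem <- struct{}{} // blocks while `concurrency` dumps are already running
		go func(table string) {
			defer wg.Done()
			defer func() { <-sem }() // free the slot when this dump finishes

			n := atomic.AddInt32(&running, 1)
			// Record the peak with a compare-and-swap loop.
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			dump(table)
			atomic.AddInt32(&running, -1)
		}(t)
	}
	wg.Wait()
	return peak
}

func main() {
	tables := []string{"users", "orders", "logs", "customers"}
	peak := dumpAll(tables, 2, func(table string) {
		fmt.Println("dumping", table)
	})
	fmt.Println("peak parallel dumps:", peak)
}
```

The order of the "dumping" lines varies between runs, but the reported peak never exceeds the limit passed in.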

Configuration File Options

You can set a number of keys in the configuration file. Here is the list of all configuration options:

  • Matchers are variables that store filter data. You can declare a filter once and reuse it among tables.
  • Tables represents a klepto table definition.
    • Name is the table name.
    • IgnoreData, if set to true, dumps the table structure without importing data.
    • Filter represents the way you want to filter the results.
      • Match is a condition used to dump only a subset of the data.
      • Limit defines the maximum number of results to fetch.
      • Sorts is the sort condition for the table.
    • Anonymise lists the columns to anonymise.
    • Relationships represents the relationship between the table and the referenced table.
      • Table is the table name.
      • ForeignKey is the name of the table's foreign key.
      • ReferencedTable is the referenced table name.
      • ReferencedKey is the referenced table primary key name.

Relationships

The Relationships key represents a relationship definition between tables.

Example

Dump the latest 100 users with their orders:

[[Tables]]
  Name = "users"
  [Tables.Filter]
    Limit = 100
    [Tables.Filter.Sorts]
      created_at = "desc"

[[Tables]]
  Name = "orders"
  [[Tables.Relationships]]
    # behind the scenes klepto will create an inner join between orders and users
    ForeignKey = "user_id"
    ReferencedTable = "users"
    ReferencedKey = "id"
  [Tables.Filter]
    Limit = 100
    [Tables.Filter.Sorts]
      created_at = "desc"
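The comment in the example mentions an inner join. Below is a minimal sketch of the kind of query such a relationship could produce. Relationship and joinQuery are hypothetical names chosen for illustration, and the SQL Klepto emits may be shaped differently (for instance, it may also apply the filters of both tables).

```go
package main

import "fmt"

// Relationship mirrors one [[Tables.Relationships]] entry.
// Hypothetical type, for illustration only.
type Relationship struct {
	ForeignKey      string
	ReferencedTable string
	ReferencedKey   string
}

// joinQuery selects the rows of `table` that have a matching row
// in the referenced table.
func joinQuery(table string, r Relationship) string {
	return fmt.Sprintf(
		"SELECT %[1]s.* FROM %[1]s INNER JOIN %[2]s ON %[1]s.%[3]s = %[2]s.%[4]s",
		table, r.ReferencedTable, r.ForeignKey, r.ReferencedKey,
	)
}

func main() {
	r := Relationship{ForeignKey: "user_id", ReferencedTable: "users", ReferencedKey: "id"}
	fmt.Println(joinQuery("orders", r))
}
```

For the orders table above, this prints `SELECT orders.* FROM orders INNER JOIN users ON orders.user_id = users.id`.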

Matchers

Matchers are variables to store filter data. You can declare a filter once and reuse it among tables:

[[Matchers]]
  Latest100Users = "ORDER BY users.created_at DESC LIMIT 100"

[[Tables]]
  Name = "users"
  [Tables.Filter]
    Match = "Latest100Users"

[[Tables]]
  Name = "orders"
  [[Tables.Relationships]]
    ForeignKey = "user_id"
    ReferencedTable = "users"
    ReferencedKey = "id"
  [Tables.Filter]
    Match = "Latest100Users"

See examples for more.
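One way to picture matcher reuse is as a simple name lookup. The sketch below assumes that a Match value naming a declared matcher is replaced by that matcher's expression, and that any other value is used as a raw condition. Both the resolveMatch function and the fallback behaviour are assumptions made for illustration, not documented Klepto behaviour.

```go
package main

import "fmt"

// resolveMatch returns the filter expression for a table's Match value.
// If the value names a declared matcher, the matcher's expression is used;
// otherwise the value is treated as a raw condition (assumed behaviour).
func resolveMatch(matchers map[string]string, match string) string {
	if expr, ok := matchers[match]; ok {
		return expr
	}
	return match
}

func main() {
	matchers := map[string]string{
		"Latest100Users": "ORDER BY users.created_at DESC LIMIT 100",
	}
	// Both the users and orders tables can refer to the same matcher.
	fmt.Println(resolveMatch(matchers, "Latest100Users"))
	// A value that is not a matcher name is passed through unchanged.
	fmt.Println(resolveMatch(matchers, "users.status = 'active'"))
}
```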

Ignore data

You can dump the database structure without importing data by setting the IgnoreData value to true.

[[Tables]]
  Name = "logs"
  IgnoreData = true

Anonymisation

You can anonymise specific columns in your table using the Anonymise key. Anonymisation is performed by running a Faker against the specified column.

[[Tables]]
  Name = "customers"
  [Tables.Anonymise]
    email = "EmailAddress"
    firstName = "FirstName"

[[Tables]]
  Name = "users"
  [Tables.Anonymise]
    email = "EmailAddress"
    password = "literal:1234"

This replaces four columns across the customers and users tables: both email columns are filled by fake.EmailAddress, firstName is filled by fake.FirstName, and password is set to a constant. Use literal:[some-constant-value] to specify a constant to write to a column. In this case, password = "literal:1234" writes 1234 to every row in the password column of the users table.
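The rule handling described above can be sketched as follows. This is an illustrative approximation, not Klepto's code: the anonymise function and the fakers table are hypothetical, and the faker functions here return fixed placeholder values, whereas the real fake package produces random data.

```go
package main

import (
	"fmt"
	"strings"
)

const literalPrefix = "literal:"

// fakers stands in for the lookup table generated in fake.go.
// These placeholders return fixed values for the sake of the example.
var fakers = map[string]func() string{
	"EmailAddress": func() string { return "jane.doe@example.com" },
	"FirstName":    func() string { return "Jane" },
}

// anonymise returns a copy of row with every configured column replaced.
// rules maps a column name to either a faker name or "literal:<value>".
func anonymise(row, rules map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(row))
	for col, val := range row {
		rule, ok := rules[col]
		switch {
		case !ok:
			out[col] = val // column is not configured, keep as-is
		case strings.HasPrefix(rule, literalPrefix):
			out[col] = strings.TrimPrefix(rule, literalPrefix)
		default:
			f, found := fakers[rule]
			if !found {
				return nil, fmt.Errorf("unknown faker %q for column %q", rule, col)
			}
			out[col] = f()
		}
	}
	return out, nil
}

func main() {
	row := map[string]string{"id": "7", "email": "real@customer.com", "password": "s3cret"}
	rules := map[string]string{"email": "EmailAddress", "password": "literal:1234"}
	safe, err := anonymise(row, rules)
	if err != nil {
		panic(err)
	}
	fmt.Println(safe["id"], safe["email"], safe["password"])
}
```

Returning an error for an unknown faker name is a design choice of this sketch: it prevents a typo in the configuration from silently leaving sensitive data in place.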

Available data types for anonymisation

Available data types can be found in fake.go. This file is generated from https://github.com/icrowley/fake (it must be generated because it is written in such a way that Go cannot reflect upon it).

We generate the file with the following:

$ go get github.com/ungerik/pkgreflect
$ pkgreflect -notypes -novars -norecurs vendor/github.com/icrowley/fake/

Examples

Example configuration files for intfood and the ordering tool can be found on Klepto Examples on Confluence.

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

License

This project is licensed under the MIT License. See the LICENSE file for details.
