imp

v0.7.1
Published: Mar 8, 2026 License: GPL-3.0

CSV file restructuring tool with file encoding normalization. Created primarily as an import helper for GnuCash, which accepts CSV files with transaction history.

CSV files exported from banking systems come in different formats and encodings. Importing them without preprocessing may cause display issues. Fixing the file encoding and making the CSV data fit the user's custom format before every import can quickly become a chore.

imp does the heavy lifting for you - specify your desired structure, let imp work its magic, then import the transformed CSV into your program. You may also choose to create a collection of presets - settings tailored for a particular task - so that you do not need to type complicated commands every time.

Installation

From Releases

Find the latest release and download the package for your system. Packages are available for:

  • Debian-based systems
  • RPM-based systems

On other Linux distributions, you can install imp from tar.gz archives, which are also available on the Releases page.

From Go package index

go install github.com/Zedran/imp/cmd/imp@latest

Options overview

There are several options that modify imp's behaviour.

Option Description
-0 omits the first row (header) from the input when rewriting
-G generates an empty preset file in the user's home directory and exits
-H adds the provided string as the first row (new header)
-P preset name to be used - overrides -0cdeHlp options
-c comma character in the input file (by default, the same as in output)
-d decimal separator for currency (default is .)
-e input file encoding (default is utf-8)
-f overwrite output file if it exists
-i input CSV file path (by default, imp reads from stdin)
-l uses CRLF (Windows line ending) instead of LF for line endings in the output file
-o output CSV file path (by default, imp writes to stdout)
-p pattern that determines how to rewrite the input file

Pattern

Pattern determines the structure of rewritten rows. It starts with two characters that can be freely chosen:

  1. CSV "comma" character - typically tab, comma or semicolon.
  2. Group prefix - / is used in the project's documentation.

Both characters must be unique and cannot be used for any other purpose within the pattern. They also must be ASCII characters (Latin letters, Arabic numerals, basic symbols).

There are three types of groups, differentiated by the character directly following the prefix:

  • d<index>[comma] - column group. This group indicates that the compiler should substitute the CSV column at the given index from the input CSV file. Columns are enumerated starting with 0. A single comma character is allowed after the number.
  • c<index>[comma] - a special type of column group used for normalizing currency amounts. The number contained in the corresponding column will have any "cosmetic" separators removed and the decimal separator will be converted to the one indicated by the -d option. imp assumes that decimal separators in the input are either . or ,.
  • s<text> - text group. An arbitrary text will be inserted in place of this group in the output file.
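
The normalization a c group performs can be sketched in Python. This is an illustration only, not imp's Go implementation. The function name normalize_amount is hypothetical, and the rule for picking the decimal separator (the last . or , in the value wins) is an assumption, since this README does not say how imp resolves it.

```python
def normalize_amount(raw: str, decimal_sep: str = ".") -> str:
    """Strip cosmetic separators and rewrite the decimal separator.

    Assumption (not taken from imp's source): the last '.' or ',' in the
    value is the decimal separator; every other non-digit character,
    apart from a leading minus sign, is cosmetic and gets dropped.
    """
    text = raw.strip()
    sign = "-" if text.startswith("-") else ""

    # Position of the last candidate decimal separator, or -1 if none.
    cut = max(text.rfind("."), text.rfind(","))
    if cut == -1:
        whole, frac = text, ""
    else:
        whole, frac = text[:cut], text[cut + 1:]

    whole_digits = "".join(ch for ch in whole if ch.isdigit())
    frac_digits = "".join(ch for ch in frac if ch.isdigit())

    if frac_digits:
        return f"{sign}{whole_digits}{decimal_sep}{frac_digits}"
    return f"{sign}{whole_digits}"
```

Under that assumption, 1 234,56 and 1.234,56 both become 1234.56. A value such as 1,234 is ambiguous and would be read as 1.234 here; imp may treat it differently.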

It is recommended to enclose your pattern in quotes.

There is also a special pattern ,*, indicating that imp should not perform any column modifications. Other options (-0ceHl) are still effective and the encoding is normalized. This allows users to modify the header, comma character, encoding and line endings without constructing a potentially complicated pattern just to preserve row contents.
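
What the ,* pattern leaves to the other options can be sketched with Python's csv module. This is a rough illustration, not imp's own code: the function passthrough and its parameters are hypothetical stand-ins for -c, -l, -0 and -H, and it works on text that has already been decoded.

```python
import csv
import io
from typing import Optional


def passthrough(text: str, in_comma: str, out_comma: str,
                crlf: bool = False, skip_header: bool = False,
                new_header: Optional[str] = None) -> str:
    """Copy rows unchanged while swapping delimiter, header and line endings."""
    eol = "\r\n" if crlf else "\n"

    # newline="" lets the csv module handle line endings itself.
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=in_comma))
    if skip_header:
        rows = rows[1:]  # like -0: drop the original header row

    out = io.StringIO(newline="")
    if new_header is not None:
        out.write(new_header + eol)  # like -H: written verbatim as the first row
    writer = csv.writer(out, delimiter=out_comma, lineterminator=eol)
    writer.writerows(rows)
    return out.getvalue()
```

For example, passthrough("a;b\n1;2\n", ";", ",", crlf=True) turns a semicolon-separated file with LF endings into a comma-separated one with CRLF endings, leaving the cell contents alone.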

Preset

A preset is a JSON dictionary that holds a combination of settings for future reuse. Presets are stored in the .imp-presets.json file in the user's home directory. The file can be generated by running imp with the -G option. Users can define multiple presets, each with a unique name. Empty names are not allowed.

The following table presents available JSON keys along with command-line options they represent.

Option Key Type Default value
-0 skip_header bool false
-H new_header string ""
-c input_comma string ""
-d curr_sep string ""
-e encoding string ""
-l crlf bool false
-p pattern string ""

To apply a preset, use the -P preset_name option. Note that values in the preset take precedence over command-line options.
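
To make the file layout concrete, here is a small Python sketch that reads one preset from the JSON text and fills in the defaults from the table above. The names DEFAULTS and load_preset are hypothetical, and raising KeyError for a missing preset is an assumption; this README does not describe how imp itself reports such errors.

```python
import json

# Keys and default values as listed in the preset table.
DEFAULTS = {
    "skip_header": False,
    "new_header": "",
    "input_comma": "",
    "curr_sep": "",
    "encoding": "",
    "crlf": False,
    "pattern": "",
}


def load_preset(text: str, name: str) -> dict:
    """Return the named preset with missing keys filled from DEFAULTS."""
    if not name:
        raise ValueError("preset name must not be empty")
    presets = json.loads(text)
    if name not in presets:
        raise KeyError(f"no preset named {name!r}")
    # Values from the file override the defaults.
    return {**DEFAULTS, **presets[name]}
```

In practice the text would come from the preset file, for instance pathlib.Path.home().joinpath(".imp-presets.json").read_text(encoding="utf-8").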

How to determine file encoding

On Linux, run the following command on your CSV file:

uchardet file.csv

The command will display the name of your file's encoding. Specify it with the -e option or the encoding key, if you are using a preset.

Note: Accuracy of encoding detection depends on the file size and character diversity.
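
Because detection is a guess, it can help to check that the reported encoding decodes the file without errors before relying on it. The Python sketch below does that and also shows, conceptually, what normalizing to UTF-8 means. The function names are hypothetical, and this is not how imp is implemented.

```python
def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if every byte sequence in data is valid in encoding."""
    try:
        data.decode(encoding)  # strict error handling by default
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")
```

A clean decode does not prove the guess is right, since many single-byte encodings accept almost any input, but a failed decode is a clear sign that the encoding is wrong.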

Example usage

Input

Encoding: windows-1250

First name,Last name,Amount
John,Doe,123

Desired output

Encoding: utf-8

Full name,Amount,Currency
John Doe,123.00,USD

Notes:

  • Original header should be discarded.
  • New header needs to be set.
  • The same character is used as comma in both files.
  • We use the default LF line endings (even on Windows, modern programs should tolerate LF endings).

Using full command

imp -i input.csv                   \
    -e windows-1250                \
    -o output.csv                  \
    -p ',/d0/s /d1,/c2,/sUSD'      \
    -d '.'                         \
    -0                             \
    -H 'Full name,Amount,Currency'

Pattern breakdown

,/d0/s /d1,/c2,/sUSD

Special characters:

  1. , - CSV comma
  2. / - group prefix

Columns:

  1. /d0/s /d1,:
    • /d0 - copy contents from the 1st column
    • /s  - insert a single space
    • /d1, - copy contents from the 2nd column and insert comma at the end
  2. /c2, - format the currency amount contained in the 3rd column, insert comma at the end
  3. /sUSD - insert text USD
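
The breakdown above can be traced with a short Python sketch that splits the pattern into groups and applies them to one row. It is an illustration under stated assumptions, not imp's parser. The names parse_pattern and rewrite_row are hypothetical. The second pattern character is assumed to both declare the prefix and open the first group, which is how the example reads. The c group uses a placeholder normalization that passes the amount through without padding, so the result is John Doe,123,USD rather than the 123.00 shown in the desired output.

```python
from typing import List, Tuple

Group = Tuple[str, object, bool]  # (kind, column index or text, trailing comma?)


def parse_pattern(pattern: str) -> Tuple[str, List[Group]]:
    """Return the comma character and the list of groups in a pattern."""
    if len(pattern) < 3:
        raise ValueError("pattern needs a comma, a prefix and one group")
    comma, prefix = pattern[0], pattern[1]
    if comma == prefix:
        raise ValueError("comma and group prefix must differ")

    groups: List[Group] = []
    # Assumption: the prefix at position 1 also opens the first group.
    for chunk in pattern[2:].split(prefix):
        if not chunk:
            raise ValueError("empty group")
        kind, body = chunk[0], chunk[1:]
        if kind in ("d", "c"):
            trailing = body.endswith(comma)
            digits = body[:-1] if trailing else body
            if not (digits.isascii() and digits.isdigit()):
                raise ValueError(f"bad column index in group {chunk!r}")
            groups.append((kind, int(digits), trailing))
        elif kind == "s":
            groups.append((kind, body, False))
        else:
            raise ValueError(f"unknown group type {kind!r}")
    return comma, groups


def rewrite_row(row: List[str], pattern: str, decimal_sep: str = ".") -> str:
    """Build one output line from an input row (no CSV quoting applied)."""
    comma, groups = parse_pattern(pattern)
    out = []
    for kind, value, trailing in groups:
        if kind == "d":
            out.append(row[value])
        elif kind == "c":
            # Placeholder: drop spaces and unify the decimal separator only.
            amount = row[value].replace(" ", "").replace(",", ".")
            out.append(amount.replace(".", decimal_sep))
        else:
            out.append(value)
        if trailing:
            out.append(comma)
    return "".join(out)
```

Calling rewrite_row(["John", "Doe", "123"], ",/d0/s /d1,/c2,/sUSD") walks the five groups in order: first name, a space, last name plus comma, amount plus comma, and the literal text USD.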

Using preset

{
    "default": {
        "encoding":    "windows-1250",
        "pattern":     ",/d0/s /d1,/c2,/sUSD",
        "input_comma": "",
        "skip_header": true,
        "new_header":  "Full name,Amount,Currency",
        "crlf":        false,
        "curr_sep":    "."
    }
}

imp -i input.csv  \
    -o output.csv \
    -P default

Attributions

Refer to NOTICE.md.

License

This software is available under the GPL-3.0-only license.

Directories

Path Synopsis
cmd/imp command
internal/cli
internal/csv
