imp
A CSV file restructuring tool with file encoding normalization. Created primarily as an import helper for GnuCash, which accepts CSV files with transaction history.
CSV files exported from banking systems come in different formats and encodings. Importing them without preprocessing may cause display issues. Fixing the file encoding and making the CSV data fit the user's custom format before every import can quickly become a chore.
imp does the heavy lifting for you - specify your desired structure, let imp work its magic, then import the transformed CSV into your program. You may also choose to create a collection of presets - settings tailored for a particular task - so that you do not need to type complicated commands every time.
Installation
From Go package index
go install github.com/Zedran/imp@latest
Options overview
There are several options that modify imp's behaviour.
| Option | Description |
|--------|-------------|
| -0 | omits the first row (header) from the input when rewriting |
| -G | generates an empty preset file in user's home directory and exits |
| -H | adds the provided string as the first row (new header) |
| -P | preset name to be used - overrides -0ceHlp options |
| -c | comma character in the input file (by default, the same as in output) |
| -e | input file encoding (default is utf-8) |
| -f | overwrite output file if it exists |
| -i | input CSV file path (by default, imp reads from stdin) |
| -l | uses CRLF (Windows line ending) instead of LF for line endings in the output file |
| -o | output CSV file path (by default, imp writes to stdout) |
| -p | pattern that determines how to rewrite the input file |
Pattern
Pattern determines the structure of rewritten rows. It starts with two characters that can be freely chosen:
- CSV "comma" character - typically tab, comma or semicolon.
- Group prefix - / is used in the project's documentation.
Both characters must be unique and cannot be used for any other purpose within the pattern. They also must be ASCII characters (Latin letters, Arabic numerals, basic symbols).
There are two types of groups, differentiated by the character directly following the prefix:
d<index>[comma] - column group. This group tells the compiler to substitute the column with the given index from the input CSV file. Columns are numbered starting with 0. A single comma character is allowed after the number.
s<text> - text group. An arbitrary text will be inserted in place of this group in the output file.
It is recommended to enclose your pattern in quotes.
Preset
A preset is a JSON dictionary that holds a combination of settings for future reuse. Presets are stored in the .imp-presets.json file in the user's home directory. The file can be generated by running imp with the -G option. Users can define multiple presets, each with its own unique name. Empty names are not allowed.
The following table presents the available JSON keys along with the command-line options they represent.
| Option | Key | Type | Default value |
|--------|-----|------|---------------|
| -0 | skip_header | bool | false |
| -H | new_header | string | "" |
| -c | input_comma | string | "" |
| -e | encoding | string | "" |
| -l | crlf | bool | false |
| -p | pattern | string | "" |
To apply a preset, use the -P preset_name option. Note that values in the preset take precedence over command-line options.
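For readers who want to inspect or validate a preset file from their own code, here is a minimal Go sketch. The Preset struct and the parsePresets function are assumptions made for illustration, not imp's internal types. Only the JSON keys come from the table above, and the top-level layout (preset name mapped to settings) is assumed from the file format described in this section.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Preset mirrors the documented JSON keys. The struct itself is
// hypothetical and not taken from imp's source.
type Preset struct {
	SkipHeader bool   `json:"skip_header"`
	NewHeader  string `json:"new_header"`
	InputComma string `json:"input_comma"`
	Encoding   string `json:"encoding"`
	CRLF       bool   `json:"crlf"`
	Pattern    string `json:"pattern"`
}

// parsePresets decodes a preset file body into a map keyed by preset
// name and rejects empty names, as the documentation requires.
func parsePresets(data []byte) (map[string]Preset, error) {
	var presets map[string]Preset
	if err := json.Unmarshal(data, &presets); err != nil {
		return nil, err
	}
	for name := range presets {
		if name == "" {
			return nil, fmt.Errorf("empty preset name")
		}
	}
	return presets, nil
}

func main() {
	data := []byte(`{
		"bank": {
			"encoding": "windows-1250",
			"pattern": ",/d0,/d1",
			"skip_header": true
		}
	}`)

	presets, err := parsePresets(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Keys missing from the JSON keep their zero values, which match
	// the defaults listed in the table.
	fmt.Printf("%+v\n", presets["bank"])
}
```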
How to determine file encoding
On Linux, run the following command on your CSV file:
uchardet file.csv
The command will display the name of your file's encoding. Specify it with the -e option or the encoding key, if you are using a preset.
Note: Accuracy of encoding detection depends on the file size and character diversity.
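If uchardet is not available, a rough first check is whether the file already decodes as UTF-8, which is imp's default input encoding. The Go sketch below only uses the standard library. The needsEncodingFlag helper is hypothetical, and the check cannot name the actual encoding; it only hints that the -e option is probably needed.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// needsEncodingFlag is a hypothetical helper: it reports whether data
// fails to decode as UTF-8, which suggests a different input encoding.
func needsEncodingFlag(data []byte) bool {
	return !utf8.Valid(data)
}

func main() {
	// Pure ASCII is valid UTF-8 (and valid in many legacy encodings too).
	ascii := []byte("First name,Last name,Amount\n")

	// The word "Zażółć" as windows-1250 bytes; the bytes above 0x7F
	// do not form valid UTF-8 sequences.
	cp1250 := []byte{0x5A, 0x61, 0xBF, 0xF3, 0xB3, 0xE6}

	fmt.Println(needsEncodingFlag(ascii))  // false
	fmt.Println(needsEncodingFlag(cp1250)) // true
}
```

A false result does not prove the file is UTF-8: a file that contains only ASCII characters passes the check regardless of the encoding it was saved in.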
Example usage
Encoding: windows-1250
First name,Last name,Amount
John,Doe,123
Desired output
Encoding: utf-8
Full name,Amount,Currency
John Doe,123,USD
Notes:
- Original header should be discarded.
- New header needs to be set.
- The same character is used as comma in both files.
- We use the default LF line endings (even on Windows, modern programs should tolerate LF endings).
Using full command
imp -i input.csv \
-e windows-1250 \
-o output.csv \
-p ',/d0/s /d1,/d2,/sUSD' \
-0 \
-H 'Full name,Amount,Currency'
Pattern breakdown
,/d0/s /d1,/d2,/sUSD
Special characters:
, - CSV comma
/ - group prefix
Columns:
/d0/s /d1,:
/d0 - copy contents from the 1st column
/s - insert a single space
/d1, - copy contents from the 2nd column and insert comma at the end
/d2, - copy contents of the 3rd column, insert comma at the end
/sUSD - insert text USD
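To make the effect of this pattern concrete, the sketch below hard-codes the same transformation with Go's encoding/csv package. It is an illustration only, not how imp works internally: the rewrite and rewriteString functions are hypothetical, the input is assumed to be already decoded to UTF-8, and the crlf parameter simply mirrors the -l option.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// rewrite is a hypothetical, hard-coded equivalent of running imp with
// -0, -H 'Full name,Amount,Currency' and the pattern from this example.
func rewrite(in io.Reader, out io.Writer, crlf bool) error {
	r := csv.NewReader(in)
	w := csv.NewWriter(out)
	w.UseCRLF = crlf // CRLF line endings, like the -l option

	// Discard the original header (-0).
	if _, err := r.Read(); err != nil {
		return err
	}
	// Write the new header (-H).
	if err := w.Write([]string{"Full name", "Amount", "Currency"}); err != nil {
		return err
	}
	for {
		rec, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}
		if len(rec) < 3 {
			return fmt.Errorf("expected 3 columns, got %d", len(rec))
		}
		// /d0/s /d1,  /d2,  /sUSD
		row := []string{rec[0] + " " + rec[1], rec[2], "USD"}
		if err := w.Write(row); err != nil {
			return err
		}
	}
	w.Flush()
	return w.Error()
}

// rewriteString is a convenience wrapper for in-memory data.
func rewriteString(input string, crlf bool) (string, error) {
	var sb strings.Builder
	err := rewrite(strings.NewReader(input), &sb, crlf)
	return sb.String(), err
}

func main() {
	out, err := rewriteString("First name,Last name,Amount\nJohn,Doe,123\n", false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
	// Full name,Amount,Currency
	// John Doe,123,USD
}
```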
Using preset
{
  "default": {
    "encoding": "windows-1250",
    "pattern": ",/d0/s /d1,/d2,/sUSD",
    "input_comma": "",
    "skip_header": true,
    "new_header": "Full name,Amount,Currency",
    "crlf": false
  }
}
imp -i input.csv \
-o output.csv \
-P default
Attributions
Refer to NOTICE.md.
License
This software is available under the GPL-3.0-only license.