Global Names Verifier

Takes a name or a list of names and verifies them against a variety of
biodiversity Data Sources.
Citing
If you want to cite GNverifier, use the DOI generated by Zenodo.
Features
- Small and fast app to verify scientific names against many biodiversity
databases.
- Has 4 different match levels:
  - Exact: complete match with a canonical form or full name-string from a
    data source.
  - Fuzzy: if exact match did not happen, it tries to match name-strings
    assuming spelling errors.
  - Partial: strips middle or last epithets from bi- or multi-nomial names
    and tries to match what is left.
  - PartialFuzzy: the same as Partial but assuming spelling mistakes.
- Taxonomic resolution. If a database contains taxonomic information, returns
the currently accepted name for a name-string, if it is different from the
matched name.
- Best match is returned according to the match score. Data sources with some
manual curation have priority over auto-curated and uncurated datasets. For
example Catalogue of Life or WoRMS are considered curated,
GBIF auto-curated, uBio not curated.
- It is possible to map any checklist of name-strings to any of the registered
Data Sources.
- If a Data Source provides classification for a name, it will be returned in
the output.
- Works for checking just one name-string, or multiple ones written in a file.
- Supports feeding data via operating system pipes. This feature allows
chaining the program together with other tools.
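The four match levels listed above can be read as a cascade: try an exact match first, then progressively looser strategies. The Python sketch below only illustrates that idea and is not GNverifier's implementation. The reference list `KNOWN`, the function names, the 0.9 cutoff, and the use of `difflib` as a stand-in for fuzzy matching are all assumptions made for the example.

```python
import difflib

# Toy reference list standing in for one data source (hypothetical data).
KNOWN = ["Bubo bubo", "Pomatomus saltatrix", "Monochamus galloprovincialis"]


def fuzzy(name, candidates, cutoff=0.9):
    """Return the closest candidate with similarity >= cutoff, or None."""
    hits = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def partial_variants(name):
    """Yield shortened forms of a bi- or multi-nomial name."""
    words = name.split()
    if len(words) > 2:
        yield " ".join(words[:-1])             # drop the last epithet
        yield " ".join([words[0], words[-1]])  # drop the middle epithets
    if len(words) > 1:
        yield words[0]                         # keep only the first word


def match(name, candidates=KNOWN):
    """Try Exact, Fuzzy, Partial, PartialFuzzy in that order."""
    if name in candidates:
        return ("Exact", name)
    hit = fuzzy(name, candidates)
    if hit:
        return ("Fuzzy", hit)
    for variant in partial_variants(name):
        if variant in candidates:
            return ("Partial", variant)
    for variant in partial_variants(name):
        hit = fuzzy(variant, candidates)
        if hit:
            return ("PartialFuzzy", hit)
    return ("NoMatch", None)
```

Each level runs only if the stricter ones fail, so a misspelled name such as "Monohamus galloprovincialis" is reported at the Fuzzy level rather than as an exact match.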
Installation
Using Homebrew on Mac OS, Linux, and Linux on Windows (WSL2)
Homebrew is a popular package manager for Open Source software originally
developed for Mac OS X. Now it is also available on Linux, and can easily
be used on Windows 10, if Windows Subsystem for Linux (WSL) is
installed.
To use GNverifier with Homebrew:
- Install Homebrew
- Open terminal and run the following commands:
brew tap gnames/gn
brew install gnverifier
MS Windows
Download the latest release from GitHub and unzip it.
One possible way would be to create a default folder for executables and place
GNverifier there.
Use the Windows+R key
combination and type "cmd". In the terminal window that appears, type:
mkdir C:\Users\your_username\bin
copy path_to\gnverifier.exe C:\Users\your_username\bin
Add C:\Users\your_username\bin directory to your PATH
environment variable.
Another, simpler way is to run cd C:\Users\your_username\bin
in the cmd terminal window. Windows will then find the GNverifier program
automatically when you run its commands from that
directory.
You can also read a more detailed guide for Windows users in
a PDF document.
Linux and Mac
Download the latest release from GitHub, untar it, and install the binary
somewhere in your PATH.
tar xvf gnverifier-linux-0.1.0.tar.xz
# or tar xvf gnverifier-mac-0.1.0.tar.gz
sudo mv gnverifier /usr/local/bin
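After installing, it can be useful to confirm that the binary is actually reachable through PATH. The following Python sketch uses only the standard library; the helper names `find_executable` and `install_hint` are made up for this example.

```python
import shutil


def find_executable(name="gnverifier"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


def install_hint(name="gnverifier"):
    """Return a short human-readable message about the installation state."""
    path = find_executable(name)
    if path is None:
        return f"{name} was not found on PATH; check the installation steps."
    return f"{name} found at {path}"
```

Calling `print(install_hint())` in a Python shell reports whether the operating system can locate the program from any directory.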
Compile from source
Install Go according to installation instructions
go get github.com/gnames/gnverifier/gnverifier
Usage
GNverifier takes one name-string or a text file with one name-string per
line as an argument, sends a query with these data to a remote gnames
server to match the name-strings against many different biodiversity
databases, and returns results to STDOUT in JSON, CSV, or TSV format.
As a web service
gnverifier -p 8080
You should be able to access the web user interface via a browser at
http://localhost:8080
One name-string
gnverifier "Monohamus galloprovincialis"
Many name-strings in a file
gnverifier /path/to/names.txt
The app assumes that a file contains a simple list of names, one per line.
It is also possible to feed data via STDIN:
cat /path/to/names.txt | gnverifier
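The same STDIN route can be used from a script. The sketch below is a hedged Python example, assuming `gnverifier` is installed and on PATH and that it emits its default CSV output with a header row; the function names are hypothetical, and no particular column names are assumed.

```python
import csv
import io
import subprocess


def names_to_stdin(names):
    """Join name-strings into one-name-per-line text, skipping blanks."""
    return "\n".join(n.strip() for n in names if n.strip()) + "\n"


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def verify(names, executable="gnverifier"):
    """Pipe names to gnverifier via STDIN and parse the CSV it prints."""
    proc = subprocess.run(
        [executable],
        input=names_to_stdin(names),
        capture_output=True,
        text=True,
        check=True,  # raise if the program exits with an error
    )
    return parse_csv(proc.stdout)
```

A call such as `verify(["Bubo bubo", "Pomatomus saltatrix"])` would return one dictionary per output row, keyed by whatever header the program prints.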
Options and flags
Flags and options can be given either before or
after the name-string or file name.
help
gnverifier -h
# or
gnverifier --help
# or
gnverifier
version
gnverifier -V
# or
gnverifier --version
port
Starts GNverifier as a web service on the given port.
gnverifier -p 8080
This command runs a user interface accessible by a browser
at http://localhost:8080
capitalize
If your names do not have uninomials or genera capitalized according to
the rules of nomenclature, you can still verify them using this option. If
the capitalize flag is set, the first character of every name-string will be
capitalized (when appropriate).
gnverifier -c "bubo bubo"
# or
gnverifier --capitalize "bubo bubo"
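As a rough illustration of what this flag does to the input, the Python sketch below upper-cases the first character of a name-string when it is a lowercase letter. The function name is hypothetical, and the exact rule GNverifier applies for "when appropriate" is not specified here, so treat this as an approximation.

```python
def capitalize_first(name):
    """Upper-case the first character if it is a lowercase letter."""
    if name and name[0].islower():
        return name[0].upper() + name[1:]
    return name  # already capitalized, empty, or starts with a non-letter
```

Only the first character changes; the rest of the name-string, including the specific epithet, is left as typed.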
format
Allows picking a format for the output. Supported formats are:
- compact: one-liner JSON.
- pretty: prettified JSON with new lines and tabs for easier reading.
- tsv: returns tab-separated values representation.
- csv: (DEFAULT) returns comma-separated values representation.
# short form for compact JSON format
gnverifier -f compact file.txt
# or long form for "pretty" JSON format
gnverifier --format="pretty" file.csv
# tsv format
gnverifier -f tsv file.csv
Note that a separate JSON "document" is returned for each record,
instead of one big JSON document for all records. For large lists this
significantly speeds up parsing of the JSON on the user side.
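Because each record is its own JSON document, compact output can be consumed one line at a time without loading the whole result into memory. The Python sketch below shows that pattern; the function name is an assumption, and the `"name"` key in the usage note is a placeholder rather than a documented field.

```python
import json


def iter_records(lines):
    """Yield one parsed JSON document per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

For example, `list(iter_records(['{"name": "Bubo bubo"}', '']))` yields a single dictionary. The same function accepts an open file object or `sys.stdin`, which fits output produced with `-f compact`; the multi-line pretty format would need a different reader.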
sources
By default GNverifier returns only one "best" result of a match. If a user
has a particular interest in a data set, they can set it with this option, and
all matches that exist for this source will be returned as well. You need to
provide a data source ID for a dataset. IDs can be found at the following
URL. Some of them are provided in the GNverifier help
output as well.
Data from such sources will be returned in preferred_results section of JSON
output, or with CSV/TSV rows that start with "PreferredMatch" string.
gnverifier file.csv -s "1,11,172"
# or
gnverifier file.tsv --sources="12"
# or
cat file.txt | gnverifier -s '1,12'
If all matched sources need to be returned, set the flag to "0".
WARNING: the result might be excessively large.
gnverifier "Bubo bubo" -s 0
# potentially even more results get returned by adding --all_matches flag
gnverifier "Bubo bubo" -s 0 -M
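When the sources option is used with CSV output, the rows coming from the requested data sources can be separated from the rest by their leading "PreferredMatch" string. The Python sketch below builds the comma-separated value for the flag and filters such rows. Function names are hypothetical, and the sample column layout in the usage note is invented for illustration.

```python
import csv
import io


def sources_flag(ids):
    """Build the comma-separated value for -s / --sources."""
    return ",".join(str(int(i)) for i in ids)


def preferred_rows(csv_text):
    """Return CSV rows whose first field starts with 'PreferredMatch'."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if row and row[0].startswith("PreferredMatch")]
```

For instance, `sources_flag([1, 11, 172])` gives the string `1,11,172`, and `preferred_rows("Kind,Name\nPreferredMatch,Bubo bubo\n")` keeps only the second line of that made-up sample.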
only_preferred
Sometimes a user wants to map a list of names to a DataSource and is
not interested in whether a name matched anywhere else. In such a case you
can use the only_preferred flag.
gnverifier -o -s '12' file.txt
# or
gnverifier --only_preferred --sources='1,12' file.tsv
all_matches
Sometimes data sources have more than one match to a name. To see all matches
instead of the best one per source, use the --all_matches flag.
WARNING: for some names the result will be excessively large.
gnverifier -s '1,12' -M file.txt
jobs
If the list of names is very large, it is possible to tell GNverifier to
run requests in parallel. In this example GNverifier will run 8 processes
simultaneously. The order of returned names will be somewhat randomized.
gnverifier -j 8 file.txt
# or
gnverifier --jobs=8 file.tsv
Sometimes it is important to return names in exactly the same order as the
input. For such cases set the jobs flag to 1.
gnverifier -j 1 file.txt
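An alternative to a single job is to keep the parallel run and restore the input order afterwards on the user side. The Python sketch below does this under one assumption: every output record carries the original input name-string in some field. The function name and the `key` callback are hypothetical and need to be adapted to the actual output.

```python
def restore_input_order(names, results, key):
    """Reorder `results` to follow `names`; `key` extracts the input name."""
    by_name = {}
    for record in results:
        by_name.setdefault(key(record), []).append(record)
    ordered = []
    for name in names:
        # pop() so that repeated input names do not duplicate records
        ordered.extend(by_name.pop(name, []))
    return ordered
```

Records whose name does not appear in the input list are dropped by this sketch, so it suits a quick reordering step rather than a full reconciliation.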
Configuration file
If you find yourself using the same flags over and over again, it makes sense
to edit the configuration file instead. It is located at
$HOME/.config/gnverifier.yaml. After that you do not need to pass those
options and flags on the command line.
gnverifier file.txt
Copyright
Authors: Dmitry Mozzherin
Copyright © 2020-2021 Dmitry Mozzherin. See LICENSE for further
details.