Documentation ¶
Index ¶
- type Config
- type Option
- func OptBayesOddsThreshold(f float64) Option
- func OptDataSources(is []int) Option
- func OptFormat(f gnfmt.Format) Option
- func OptIncludeInputText(b bool) Option
- func OptInputTextOnly(b bool) Option
- func OptLanguage(l lang.Language) Option
- func OptTikaURL(s string) Option
- func OptTokensAround(i int) Option
- func OptVerifierURL(s string) Option
- func OptWithAllMatches(b bool) Option
- func OptWithAmbiguousNames(b bool) Option
- func OptWithBayes(b bool) Option
- func OptWithBayesOddsDetails(b bool) Option
- func OptWithOddsAdjustment(b bool) Option
- func OptWithPlainInput(b bool) Option
- func OptWithPositonInBytes(b bool) Option
- func OptWithUniqueNames(b bool) Option
- func OptWithVerification(b bool) Option
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Config ¶
type Config struct {
// BayesOddsThreshold sets the limit of posterior odds. Everything higher
// than this limit will be classified as a name.
BayesOddsThreshold float64
// Format sets the output format for name-finding results. Possible formats are:
// csv - CSV output
// compact - JSON in one line
// pretty - JSON with new lines and indentations.
Format gnfmt.Format
// IncludeInputText can be set to true if the user wants to get back the text
// used for name-finding. This feature is especially useful if the original file
// was a PDF, MS Word, HTML etc. and the user wants to use OffsetStart and
// OffsetEnd indices to find names in the text.
IncludeInputText bool
// InputTextOnly can be set to true if the user wants only the UTF8-encoded text
// of the file without name-finding. If this option is true, then most other
// options are ignored.
InputTextOnly bool
// Language that is prevalent in the text. This setting helps to get
// a better result for NLP name-finding, because languages differ in their
// training patterns.
// Currently only the following languages are supported:
//
// eng - English
// deu - German
Language lang.Language
// LanguageDetected is the code of the language that was detected in the text.
// It is an empty string if language detection is not enabled.
LanguageDetected string
// DataSources is a list of data-source IDs used for the
// name-verification. These data-sources will always be matched with the
// verified names. You can find the list of all data-sources at
// https://verifier.globalnames.org/api/v1/data_sources
DataSources []int
// TikaURL contains the URL of Apache Tika service. This service is used
// for extraction of UTF8-encoded texts from a variety of file formats.
TikaURL string
// TokensAround sets the number of tokens (words) before and after each
// name-candidate. These words will be returned with the output.
TokensAround int
// VerifierURL contains the URL of a name-verification service.
VerifierURL string
// WithAllMatches sets verification to return all found matches.
WithAllMatches bool
// WithAmbiguousNames shows ambiguous uninomial names when true.
WithAmbiguousNames bool
// WithBayes determines if both heuristic and Naive Bayes algorithms run
// during name-finding.
// false - only heuristic algorithms run
// true - both heuristic and Naive Bayes algorithms run.
WithBayes bool
// WithBayesOddsDetails shows in detail how odds are calculated.
WithBayesOddsDetails bool
// WithOddsAdjustment can be set to true to adjust calculated odds using the
// ratio of scientific names found in text to the number of capitalized
// words.
WithOddsAdjustment bool
// WithPlainInput can be set to true if the input is plain UTF8-encoded
// text. In this case the file is read directly instead of going through
// file-type and encoding checks.
WithPlainInput bool
// WithPositionInBytes can be set to true to receive offsets in number of
// bytes instead of UTF-8 characters.
WithPositionInBytes bool
// WithUniqueNames can be set to true to get a unique list of names.
WithUniqueNames bool
// WithVerification is true if names should be verified.
WithVerification bool
// APIDoc
APIDoc string
}
Config is responsible for name-finding operations.
type Option ¶
type Option func(*Config)
Option type for changing GNfinder settings.
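Because Option is a function that receives a *Config, settings are usually composed by applying a list of options to a default value. The following is a minimal sketch of that pattern, not the package's own code: Config is cut down to two fields, the option bodies are assumed from their documented signatures, and NewConfig is a hypothetical constructor that does not appear in this index.

```go
package main

import "fmt"

// Config is a trimmed-down stand-in for the documented struct; only two
// of its fields are mirrored here.
type Config struct {
	TokensAround     int
	WithVerification bool
}

// Option mirrors the documented type: a function that mutates a Config.
type Option func(*Config)

// OptTokensAround has the documented signature; its body is an assumption.
func OptTokensAround(i int) Option {
	return func(c *Config) { c.TokensAround = i }
}

// OptWithVerification has the documented signature; its body is an assumption.
func OptWithVerification(b bool) Option {
	return func(c *Config) { c.WithVerification = b }
}

// NewConfig is a hypothetical constructor. It starts from zero-value
// defaults and applies each option in the order given, so a later option
// overrides an earlier one that touches the same field.
func NewConfig(opts ...Option) Config {
	cfg := Config{}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := NewConfig(OptTokensAround(3), OptWithVerification(true))
	fmt.Printf("%+v\n", cfg)
}
```

Fields not touched by any option keep their defaults, so callers only name the settings they want to change.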
func OptBayesOddsThreshold ¶
OptBayesOddsThreshold is an option that sets a new threshold for results of the Bayes name-finding. All name candidates whose posterior odds are higher than the threshold will appear in the resulting names output.
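The effect of the threshold can be pictured as a simple filter over candidates. This sketch assumes a hypothetical candidate type and made-up odds values; it only illustrates the "higher than the limit" rule stated in the Config comment and is not the library's implementation.

```go
package main

import "fmt"

// candidate is a hypothetical name candidate paired with its posterior odds.
type candidate struct {
	Name string
	Odds float64
}

// filterByOdds keeps the candidates whose odds are strictly higher than
// the threshold, preserving their original order.
func filterByOdds(cs []candidate, threshold float64) []candidate {
	var kept []candidate
	for _, c := range cs {
		if c.Odds > threshold {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	// The odds below are invented for illustration only.
	cs := []candidate{
		{Name: "Pardosa moesta", Odds: 5000},
		{Name: "The", Odds: 0.02},
		{Name: "Bubo", Odds: 80},
	}
	for _, c := range filterByOdds(cs, 100) {
		fmt.Println(c.Name)
	}
}
```

Raising the threshold trades recall for precision: fewer candidates pass, but those that do have stronger support.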
func OptDataSources ¶
OptDataSources sets the data sources that will always be checked during the verification process.
func OptIncludeInputText ¶
OptIncludeInputText indicates whether to return the original UTF8-encoded input text together with the results.
func OptInputTextOnly ¶
OptInputTextOnly indicates whether to return only the UTF8-encoded input text, without running name-finding.
func OptTikaURL ¶
OptTikaURL sets the URL of the UTF8 text extraction service.
func OptTokensAround ¶
OptTokensAround sets the number of tokens remembered on the left and right side of a name-candidate.
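As a rough illustration of what "tokens around" means, the sketch below returns up to n words on each side of a token. It is a simplification under stated assumptions: tokenization is plain whitespace splitting, the name occupies a single token, and tokensAround is a hypothetical helper rather than a function of this package.

```go
package main

import (
	"fmt"
	"strings"
)

// tokensAround returns up to n tokens before and up to n tokens after the
// token at index i. It assumes 0 <= i < len(tokens) and n >= 0.
func tokensAround(tokens []string, i, n int) (before, after []string) {
	start := i - n
	if start < 0 {
		start = 0
	}
	end := i + 1 + n
	if end > len(tokens) {
		end = len(tokens)
	}
	return tokens[start:i], tokens[i+1 : end]
}

func main() {
	tokens := strings.Fields("We collected Bubo near the river")
	before, after := tokensAround(tokens, 2, 2)
	fmt.Println(before, after)
}
```

Near the start or end of the text fewer than n tokens are available, so the returned slices are simply shorter.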
func OptVerifierURL ¶
OptVerifierURL sets the URL of the verification service.
func OptWithAllMatches ¶
OptWithAllMatches sets WithAllMatches option to return all matches found by verification.
func OptWithAmbiguousNames ¶
OptWithAmbiguousNames sets WithAmbiguousNames option to show ambiguous uninomials and genera.
func OptWithBayes ¶
OptWithBayes is an option that forces running Bayes name-finding even when the language is not supported by training sets.
func OptWithBayesOddsDetails ¶
OptWithBayesOddsDetails is an option to show details of odds calculations.
func OptWithOddsAdjustment ¶
OptWithOddsAdjustment is an option that triggers recalculation of prior odds using the number of found names divided by the number of all name candidates.
func OptWithPlainInput ¶
OptWithPlainInput sets WithPlainInput option indicating there is no need to check file type and encoding, and the file can be read directly.
func OptWithPositonInBytes ¶
OptWithPositonInBytes is an option that returns offsets as a number of bytes instead of a number of UTF-8 characters.
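The two kinds of offsets differ as soon as the text contains multi-byte characters. The sketch below shows the difference with the standard library only; it assumes "UTF-8 characters" means Unicode code points (Go runes), and offsets is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// offsets returns the start of name within text, counted both in bytes and
// in runes (Unicode code points). Both values are -1 if name is not found.
func offsets(text, name string) (byteOff, runeOff int) {
	byteOff = strings.Index(text, name)
	if byteOff < 0 {
		return -1, -1
	}
	// Count the code points that precede the byte offset.
	runeOff = utf8.RuneCountInString(text[:byteOff])
	return byteOff, runeOff
}

func main() {
	// "ö" and "ß" each take two bytes in UTF-8, so the byte offset of the
	// name is larger than its rune offset.
	b, r := offsets("Größe von Bubo bubo", "Bubo bubo")
	fmt.Println(b, r)
}
```

Byte offsets are convenient for slicing a Go string directly, while character offsets are usually what other tools and languages expect.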
func OptWithUniqueNames ¶
OptWithUniqueNames indicates whether to return a unique list of names instead of all occurrences of names in the text.
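Conceptually, a unique list collapses repeated occurrences of the same name. A minimal sketch of such deduplication follows; it assumes names are compared as exact strings and that first-seen order is kept, neither of which is confirmed by this documentation, and uniqueNames is a hypothetical helper.

```go
package main

import "fmt"

// uniqueNames returns each distinct name once, in order of first appearance.
func uniqueNames(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	var out []string
	for _, n := range names {
		if _, ok := seen[n]; ok {
			continue
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

func main() {
	found := []string{"Bubo bubo", "Pardosa moesta", "Bubo bubo"}
	fmt.Println(uniqueNames(found))
}
```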
func OptWithVerification ¶
OptWithVerification indicates whether to run the verification process after name-finding.