Documentation
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type HTindex
type HTindex struct {
// RootPrefix is concatenated with paths given in input file to get
// complete path to HathiTrust files.
RootPrefix string
// InputPath gives path to file with input data.
InputPath string
// OutputPath gives path to a directory to keep output data.
OutputPath string
// JobsNum sets number of jobs/workers to run.
JobsNum int
// Dict contains shared dictionary for name finding.
Dict *dict.Dictionary
// WordsAround sets number of words retained before and after a
// name-candidate.
WordsAround int
// ProgressNum determines how many titles should be processed for
// a progress report.
ProgressNum int
}
HTindex detects occurrences of scientific names in HathiTrust data.
func NewHTindex
NewHTindex creates an HTindex instance with several defaults. If options are provided, they override the default settings.
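The constructor and the Option type together follow what is usually called the functional options pattern. Because the function signatures are not shown on this page, the sketch below uses local stand-in types; the parameter types, the return type of `NewHTindex`, and the default values are all assumptions made for illustration.

```go
package main

import "fmt"

// HTindex is a local stand-in holding a subset of the documented fields.
type HTindex struct {
	InputPath   string
	JobsNum     int
	WordsAround int
}

// Option changes one setting of an HTindex during construction.
type Option func(h *HTindex)

// OptInput mirrors the documented option; its signature is assumed.
func OptInput(path string) Option {
	return func(h *HTindex) { h.InputPath = path }
}

// OptWordsAround mirrors the documented option; its signature is assumed.
func OptWordsAround(n int) Option {
	return func(h *HTindex) { h.WordsAround = n }
}

// NewHTindex sets defaults first and then applies the options in order,
// so every provided option overrides the corresponding default.
// The default values here are hypothetical.
func NewHTindex(opts ...Option) *HTindex {
	h := &HTindex{JobsNum: 4, WordsAround: 5}
	for _, opt := range opts {
		opt(h)
	}
	return h
}

func main() {
	h := NewHTindex(OptInput("/data/input.txt"), OptWordsAround(3))
	fmt.Println(h.InputPath, h.WordsAround, h.JobsNum)
}
```

Settings that receive no option, such as JobsNum in this sketch, keep their default value.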
type Option
type Option func(h *HTindex)
Option is the type of every option function accepted when creating a new HTindex instance.
func OptInput
OptInput sets the absolute path to the input data file. Each line of that file contains the path to the zipped file of a title.
func OptOutput
OptOutput sets the absolute path to the directory where results will be written. If the directory does not already exist, it is created during initialization of the HTindex instance.
func OptProgressNum (added in v0.0.2)
OptProgressNum sets how often a progress line is printed. When it is set to 1, a report line appears after every processed title; when it is 10, progress is shown after every 10th title.
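The reporting rule described above amounts to a modulo check on the running count of processed titles. The sketch below illustrates it with a hypothetical helper, `shouldReport`; treating a non-positive value as "no progress reports" is an assumption, since this page does not say how such values are handled.

```go
package main

import "fmt"

// shouldReport is a hypothetical helper, not part of the package.
// It reports whether a progress line is due after `processed` titles.
// Assumption: a non-positive progressNum disables progress reports.
func shouldReport(processed, progressNum int) bool {
	return progressNum > 0 && processed%progressNum == 0
}

func main() {
	for processed := 1; processed <= 25; processed++ {
		if shouldReport(processed, 10) {
			fmt.Printf("processed %d titles\n", processed)
		}
	}
}
```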
func OptRoot
OptRoot sets the prefix of the path to zipped titles. It will be concatenated with a path from the input file to form the complete absolute path.
func OptWordsAround (added in v0.0.7)
OptWordsAround sets the number of words retained before and after a name-candidate.