Documentation ¶
Overview ¶
Copyright 2015 by Leipzig University Library, http://ub.uni-leipzig.de
by The Finc Authors, http://finc.info
by Martin Czygan, <martin.czygan@uni-leipzig.de>
This file is part of some open source application.
Some open source application is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Some open source application is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with Foobar. If not, see <http://www.gnu.org/licenses/>.
@license GPL-3.0+ <http://spdx.org/licenses/GPL-3.0+>
Index ¶
- Constants
- func ByteSink(w io.Writer, out chan []byte, done chan bool)
- func DetectLang3(text string) (string, error)
- func FromJSON(r io.Reader, decoder JSONDecoderFunc) (chan []Importer, error)
- func FromJSONSize(r io.Reader, decoder JSONDecoderFunc, size int) (chan []Importer, error)
- func FromXML(r io.Reader, name string, decoderFunc XMLDecoderFunc) (chan []Importer, error)
- func FromXMLSize(r io.Reader, name string, decoderFunc XMLDecoderFunc, size int) (chan []Importer, error)
- func UnescapeTrim(s string) string
- type Importer
- type JSONDecoderFunc
- type Skip
- type Source
- type XMLDecoderFunc
Constants ¶
const (
	// AppVersion of span package. Commandline tools will show this on -v.
	AppVersion = "0.1.53"
	// KeyLengthLimit is a limit imposed by memcached protocol, which is used
	// for blob storage as of June 2015. If we change the key value store,
	// this limit might become obsolete.
	KeyLengthLimit = 250
)
Variables ¶
This section is empty.
Functions ¶
func ByteSink ¶
func ByteSink(w io.Writer, out chan []byte, done chan bool)
ByteSink is a fan-in writer for a byte channel. A newline is appended after each object.
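The fan-in pattern described here can be illustrated with a minimal sketch. It is not the package's implementation: `byteSinkSketch` and `collect` are hypothetical names, and the sketch assumes the sink drains the channel until it is closed and then signals completion on `done`.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// byteSinkSketch is a hypothetical stand-in for ByteSink. It drains out,
// writes each item followed by a newline, and signals on done once out has
// been closed. Write errors are ignored here to keep the sketch short.
func byteSinkSketch(w io.Writer, out chan []byte, done chan bool) {
	bw := bufio.NewWriter(w)
	for b := range out {
		bw.Write(b)
		bw.WriteByte('\n')
	}
	bw.Flush()
	done <- true
}

// collect (hypothetical helper) sends items through the sink and returns
// everything that was written.
func collect(items []string) string {
	var buf bytes.Buffer
	out := make(chan []byte)
	done := make(chan bool)
	go byteSinkSketch(&buf, out, done)
	for _, s := range items {
		out <- []byte(s)
	}
	close(out)
	<-done // wait until the sink has flushed before reading buf
	return buf.String()
}

func main() {
	fmt.Print(collect([]string{`{"id": 1}`, `{"id": 2}`}))
}
```

Because a single goroutine owns the writer, several producers could send on `out` concurrently without interleaving partial lines.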
func DetectLang3 ¶
func DetectLang3(text string) (string, error)
DetectLang3 returns the best-guess three-letter language code for a given text.
func FromJSON ¶
func FromJSON(r io.Reader, decoder JSONDecoderFunc) (chan []Importer, error)
FromJSON returns a channel of slices of importable objects with a default batch size of 20000 docs.
func FromJSONSize ¶
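func FromJSONSize(r io.Reader, decoder JSONDecoderFunc, size int) (chan []Importer, error)
FromJSONSize returns a channel of slices of importable values, given a reader, a decoder (for a single value) and the number of documents per batch. Important: due to fan-out, the output order will differ from the input order.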
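The batching and fan-out behaviour can be sketched as follows. This is an illustration under assumptions, not the package's code: `decodeFunc`, `batchedSketch` and `countDocs` are hypothetical names, the input is assumed to be newline-delimited, and lines that fail to decode are simply skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"sync"
)

// decodeFunc is a hypothetical stand-in for JSONDecoderFunc.
type decodeFunc func(line string) (string, error)

// batchedSketch reads newline-delimited input, groups lines into batches of
// at most size, and lets several workers decode the batches concurrently.
// Since workers finish in no particular order, output order is not preserved.
func batchedSketch(r io.Reader, decode decodeFunc, size, workers int) chan []string {
	batches := make(chan []string)
	results := make(chan []string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range batches {
				var decoded []string
				for _, line := range batch {
					if v, err := decode(line); err == nil {
						decoded = append(decoded, v)
					}
				}
				results <- decoded
			}
		}()
	}
	go func() {
		scanner := bufio.NewScanner(r)
		var batch []string
		for scanner.Scan() {
			batch = append(batch, scanner.Text())
			if len(batch) == size {
				batches <- batch
				batch = nil
			}
		}
		if len(batch) > 0 {
			batches <- batch // final, possibly smaller batch
		}
		close(batches)
		wg.Wait()
		close(results)
	}()
	return results
}

// countDocs (hypothetical helper) returns the number of decoded documents
// and the number of batches received.
func countDocs(input string, size, workers int) (docs, batches int) {
	upper := func(s string) (string, error) { return strings.ToUpper(s), nil }
	for b := range batchedSketch(strings.NewReader(input), upper, size, workers) {
		docs += len(b)
		batches++
	}
	return docs, batches
}

func main() {
	docs, batches := countDocs("a\nb\nc\nd\ne\n", 2, 3)
	fmt.Println(docs, batches)
}
```

Callers that need the original order would have to carry a sequence number with each batch and reorder on the receiving side.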
func FromXMLSize ¶
func FromXMLSize(r io.Reader, name string, decoderFunc XMLDecoderFunc, size int) (chan []Importer, error)
FromXMLSize returns a channel of importable document slices, given a reader over XML, the name of the XML start element, an XMLDecoderFunc callback that deserializes an XML snippet, and a batch size.
TODO(miku): more idiomatic error handling, e.g. over error channel.
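Decoding one snippet per named start element can be sketched with the standard `encoding/xml` token stream. The `record` type and the `decodeRecords` and `titles` functions are hypothetical; the sketch leaves out batching and channels and only shows how a start element name selects the snippets to decode.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// record is a hypothetical document type used only for this illustration.
type record struct {
	Title string `xml:"title"`
}

// decodeRecords walks the token stream and decodes every element whose
// local name equals name into a record.
func decodeRecords(r io.Reader, name string) ([]record, error) {
	dec := xml.NewDecoder(r)
	var docs []record
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return docs, nil
		}
		if err != nil {
			return docs, err
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == name {
			var rec record
			if err := dec.DecodeElement(&rec, &se); err != nil {
				return docs, err
			}
			docs = append(docs, rec)
		}
	}
}

// titles (hypothetical helper) joins the decoded titles with "|".
func titles(input, name string) string {
	docs, err := decodeRecords(strings.NewReader(input), name)
	if err != nil {
		return "error: " + err.Error()
	}
	var out []string
	for _, d := range docs {
		out = append(out, d.Title)
	}
	return strings.Join(out, "|")
}

func main() {
	input := `<records><record><title>A</title></record><record><title>B</title></record></records>`
	fmt.Println(titles(input, "record"))
}
```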
func UnescapeTrim ¶
func UnescapeTrim(s string) string
UnescapeTrim unescapes HTML character references and trims leading and trailing whitespace from the given string.
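A minimal sketch of this behaviour using only the standard library follows. `unescapeTrimSketch` is a hypothetical name, and the order of operations (unescape first, then trim) is an assumption rather than something this page confirms.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// unescapeTrimSketch is a hypothetical stand-in for UnescapeTrim. It assumes
// character references are resolved first and surrounding whitespace is
// removed afterwards.
func unescapeTrimSketch(s string) string {
	return strings.TrimSpace(html.UnescapeString(s))
}

func main() {
	fmt.Printf("%q\n", unescapeTrimSketch("  Tom &amp; Jerry \n"))
}
```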
Types ¶
type Importer ¶
type Importer interface {
ToIntermediateSchema() (*finc.IntermediateSchema, error)
}
Importer objects can be converted into an intermediate schema.
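As an illustration, a source-specific record type could satisfy such an interface as sketched below. Everything here is hypothetical: `intermediateSchema` is a simplified stand-in for `finc.IntermediateSchema` (whose real fields are not shown on this page), `importer` mirrors the interface above against that stand-in, and `article` is an invented record type.

```go
package main

import (
	"errors"
	"fmt"
)

// intermediateSchema is a hypothetical, simplified stand-in for
// finc.IntermediateSchema.
type intermediateSchema struct {
	Title string
}

// importer mirrors the Importer interface, but returns the stand-in type.
type importer interface {
	ToIntermediateSchema() (*intermediateSchema, error)
}

// article is an invented source-specific record.
type article struct {
	Title string
}

// ToIntermediateSchema converts the article and rejects records without a title.
func (a article) ToIntermediateSchema() (*intermediateSchema, error) {
	if a.Title == "" {
		return nil, errors.New("article has no title")
	}
	return &intermediateSchema{Title: a.Title}, nil
}

// convertTitle (hypothetical helper) returns the converted title, or an
// empty string if the conversion fails.
func convertTitle(doc importer) string {
	is, err := doc.ToIntermediateSchema()
	if err != nil {
		return ""
	}
	return is.Title
}

func main() {
	fmt.Println(convertTitle(article{Title: "On Batching"}))
}
```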
type JSONDecoderFunc ¶
JSONDecoderFunc turns a string into a single importable object.
type Source ¶
Source can emit records given a reader. The channel is of type []Importer, so that the source can send objects over the channel in batches for performance (for example, 1000 sends of 1000 docs each instead of 1000000 sends of a single doc).
type XMLDecoderFunc ¶
XMLDecoderFunc returns an importable document, given an XML decoder and a start element.