Documentation ¶
Index ¶
- func NewFile() corpus.File
- func NewIterator(scanner *bufio.Scanner) corpus.Iterator
- func NewPayload(line int, content string) corpus.Payload
- func NewRawCorpus(filePath string) corpus.Corpus
- type File
- type Payload
- type RawCorpus
- func (c *RawCorpus) CloseIterator() error
- func (c *RawCorpus) FetchCorpusFile() corpus.File
- func (c *RawCorpus) GetIterator(cache corpus.File) corpus.Iterator
- func (c *RawCorpus) Language() string
- func (c *RawCorpus) LocalPath() string
- func (c *RawCorpus) Size() string
- func (c *RawCorpus) Source() string
- func (c *RawCorpus) URL() string
- func (c *RawCorpus) WithLanguage(lang string) corpus.Corpus
- func (c *RawCorpus) WithSize(size string) corpus.Corpus
- func (c *RawCorpus) WithSource(source string) corpus.Corpus
- func (c *RawCorpus) WithURL(url string) corpus.Corpus
- func (c *RawCorpus) WithYear(year string) corpus.Corpus
- func (c *RawCorpus) Year() string
- type RawIterator
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewIterator ¶
func NewIterator(scanner *bufio.Scanner) corpus.Iterator
NewIterator creates a new RawIterator that reads from the given scanner.
func NewPayload ¶
func NewPayload(line int, content string) corpus.Payload
NewPayload returns a new Payload for a single line of the raw corpus. Raw corpus files don't include line numbers, so the caller must supply the line number separately.
func NewRawCorpus ¶
func NewRawCorpus(filePath string) corpus.Corpus
NewRawCorpus returns a new raw corpus instance for the file at filePath.
Types ¶
type File ¶
type File struct {
// contains filtered or unexported fields
}
File implements the corpus.File interface for raw corpus files. Raw corpus files are local files, so no caching is needed.
func (*File) WithCacheDir ¶
WithCacheDir is a no-op for raw files, since they don't use caching.
type Payload ¶
type Payload struct {
// contains filtered or unexported fields
}
Payload implements the corpus.Payload interface for raw corpus files. Raw corpus files contain one payload per line without line numbers.
func (*Payload) LineNumber ¶
LineNumber returns the line number of the payload.
func (*Payload) SetContent ¶
SetContent sets the content of the payload.
func (*Payload) SetLineNumber ¶
SetLineNumber sets the line number of the payload.
type RawCorpus ¶
type RawCorpus struct {
// contains filtered or unexported fields
}
RawCorpus represents a corpus from a raw text file with one payload per line.
func (*RawCorpus) CloseIterator ¶
func (c *RawCorpus) CloseIterator() error
CloseIterator closes the underlying file that the iterator reads from.
func (*RawCorpus) FetchCorpusFile ¶
func (c *RawCorpus) FetchCorpusFile() corpus.File
FetchCorpusFile returns a corpus.File for the raw corpus file. Since raw files are local, no downloading is needed.
func (*RawCorpus) GetIterator ¶
func (c *RawCorpus) GetIterator(cache corpus.File) corpus.Iterator
GetIterator returns an iterator over the corpus. Call CloseIterator when done to close the underlying file.
func (*RawCorpus) WithLanguage ¶
func (c *RawCorpus) WithLanguage(lang string) corpus.Corpus
WithLanguage sets the language of the corpus. For a raw corpus this value is informational only.
func (*RawCorpus) WithSize ¶
func (c *RawCorpus) WithSize(size string) corpus.Corpus
WithSize sets the size of the corpus. For a raw corpus this value is informational only.
func (*RawCorpus) WithSource ¶
func (c *RawCorpus) WithSource(source string) corpus.Corpus
WithSource sets the source of the corpus. For a raw corpus this value is informational only.
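Because each With method returns the corpus, calls can be chained. The sketch below illustrates that pattern with a hypothetical corpusInfo type. It assumes the setters mutate the receiver and return it, which this page does not confirm for RawCorpus.

```go
package main

import "fmt"

// corpusInfo is a hypothetical stand-in for RawCorpus metadata.
type corpusInfo struct {
	path     string
	language string
	size     string
	source   string
}

// newCorpusInfo mirrors the shape of NewRawCorpus: only the path is required.
func newCorpusInfo(path string) *corpusInfo {
	return &corpusInfo{path: path}
}

// Each setter stores one informational field and returns the receiver
// so that calls can be chained.
func (c *corpusInfo) WithLanguage(lang string) *corpusInfo {
	c.language = lang
	return c
}

func (c *corpusInfo) WithSize(size string) *corpusInfo {
	c.size = size
	return c
}

func (c *corpusInfo) WithSource(source string) *corpusInfo {
	c.source = source
	return c
}

func main() {
	c := newCorpusInfo("payloads.txt").
		WithLanguage("en").
		WithSize("10K").
		WithSource("local")
	fmt.Println(c.path, c.language, c.size, c.source) // prints "payloads.txt en 10K local"
}
```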
type RawIterator ¶
type RawIterator struct {
// contains filtered or unexported fields
}
RawIterator implements the corpus.Iterator interface for raw corpus files. It reads one payload per line and assigns line numbers automatically as it goes.
func (*RawIterator) HasNext ¶
func (r *RawIterator) HasNext() bool
HasNext reports whether there is another line in the corpus.
func (*RawIterator) Next ¶
func (r *RawIterator) Next() corpus.Payload
Next returns the next payload from the corpus.