Documentation
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
func TextToSentences
TextToSentences splits text into sentences using a robust regex.
Types
type Offset
Offset represents a start and end position of a token within the original string.
func TextToSentencesWithOffsets
TextToSentencesWithOffsets splits text into sentences and returns their offsets.
func TextToWordsWithOffsets
TextToWordsWithOffsets splits text into words and returns their offsets.
type SentenceTokenizer
func NewSentenceTokenizer
func NewSentenceTokenizer(language string, minSentenceLen, streamContextLen int) *SentenceTokenizer
func (*SentenceTokenizer) Stream
func (t *SentenceTokenizer) Stream(language string) tokenize.SentenceStream
type WordTokenizer
type WordTokenizer struct {
Language string
}
func NewWordTokenizer
func NewWordTokenizer(language string) *WordTokenizer
func (*WordTokenizer) Stream
func (t *WordTokenizer) Stream(language string) tokenize.WordStream