ngrams

package
v0.5.0
Published: Dec 12, 2024 License: AGPL-3.0 Imports: 8 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func CalcWeights

func CalcWeights(ngmap map[string]*NGram, N int)

Calculate weights for the ngrams found in a map of NGrams across N documents. To be used after all document NGram maps are merged.
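
The weighting formula is not documented here. The minAvgTFIDF parameter of FilterMeaningfulNGrams suggests a TF-IDF scheme, so the following is a minimal sketch under that assumption. The types docInfo and ngram and the function calcWeights are hypothetical stand-ins written for illustration; they are not the package's own definitions, and the idf formula shown (natural log of N over document frequency) is one common variant rather than the confirmed one.

```go
package main

import (
	"fmt"
	"math"
)

// docInfo and ngram are hypothetical stand-ins for the package's
// NGramInfo and NGram types, whose real fields are unexported.
type docInfo struct {
	count  int     // occurrences of the n-gram in one document
	weight float64 // per-document weight, filled in by calcWeights
}

type ngram struct {
	docs map[string]*docInfo // keyed by document path
}

// calcWeights sets each per-document weight to tf * idf, where tf is
// the raw count and idf = ln(n / df). This formula is an assumption.
func calcWeights(ngmap map[string]*ngram, n int) {
	for _, ng := range ngmap {
		df := len(ng.docs) // number of documents containing the n-gram
		if df == 0 || n == 0 {
			continue
		}
		idf := math.Log(float64(n) / float64(df))
		for _, info := range ng.docs {
			info.weight = float64(info.count) * idf
		}
	}
}

func main() {
	merged := map[string]*ngram{
		"graph theory": {docs: map[string]*docInfo{"a.md": {count: 3}}},
		"the":          {docs: map[string]*docInfo{"a.md": {count: 9}, "b.md": {count: 7}}},
	}
	calcWeights(merged, 2)
	fmt.Printf("%.3f\n", merged["graph theory"].docs["a.md"].weight)
	fmt.Printf("%.3f\n", merged["the"].docs["a.md"].weight)
}
```

Under this scheme an n-gram that appears in every document gets a weight of zero, which is why the weights only make sense once all per-document maps have been merged and N is known.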

func CosineSimilarity

func CosineSimilarity(ngmap map[string]*NGram)

Compute cosine similarity between all document pairs
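
How the package builds document vectors and where it stores the result is not shown on this page (the function has no return value). As a rough illustration of the underlying measure only, the sketch below computes cosine similarity between two sparse weight vectors keyed by n-gram keyword. The function cosine and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two sparse vectors keyed by
// n-gram keyword. It returns 0 if either vector has zero magnitude.
func cosine(a, b map[string]float64) float64 {
	var dot, na, nb float64
	for k, va := range a {
		dot += va * b[k] // keys missing from b contribute 0
		na += va * va
	}
	for _, vb := range b {
		nb += vb * vb
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
	// Hypothetical per-document weight vectors.
	docs := map[string]map[string]float64{
		"a.md": {"graph theory": 2.0, "shortest path": 1.0},
		"b.md": {"graph theory": 1.0, "linear algebra": 3.0},
	}
	fmt.Printf("%.3f\n", cosine(docs["a.md"], docs["b.md"]))
}
```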

func Count

func Count(ngrams map[string]*NGram) map[string]int

Count occurrences of each NGram in a map of NGrams

func FilterMeaningfulNGrams

func FilterMeaningfulNGrams(ngmap map[string]*NGram, minDF int, maxDF int, minAvgTFIDF float64) []string
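
This function carries no doc comment. Judging only by the parameter names, it plausibly keeps n-grams whose document frequency lies within [minDF, maxDF] and whose average TF-IDF weight is at least minAvgTFIDF. The sketch below illustrates that reading; the ngram type, the filterMeaningful function, and the inclusive bounds are all assumptions, not the package's confirmed behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// ngram is a hypothetical stand-in: per-document weights keyed by path.
type ngram struct {
	docWeights map[string]float64
}

// filterMeaningful keeps keywords whose document frequency is within
// [minDF, maxDF] and whose mean per-document weight is >= minAvg.
// Results are sorted so the output is deterministic.
func filterMeaningful(ngmap map[string]*ngram, minDF, maxDF int, minAvg float64) []string {
	var keep []string
	for key, ng := range ngmap {
		df := len(ng.docWeights)
		if df == 0 || df < minDF || df > maxDF {
			continue
		}
		var sum float64
		for _, w := range ng.docWeights {
			sum += w
		}
		if sum/float64(df) >= minAvg {
			keep = append(keep, key)
		}
	}
	sort.Strings(keep)
	return keep
}

func main() {
	ngmap := map[string]*ngram{
		"graph theory": {docWeights: map[string]float64{"a.md": 2.0, "b.md": 1.0}},
		"the":          {docWeights: map[string]float64{"a.md": 0, "b.md": 0, "c.md": 0}},
		"one off":      {docWeights: map[string]float64{"c.md": 5.0}},
	}
	fmt.Println(filterMeaningful(ngmap, 2, 3, 0.5))
}
```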

func Generate

func Generate(tokens []lexer.Lexeme, nrange []int, path string) map[string]*NGram

Generate NGrams from a slice of document lexemes
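
The lexer.Lexeme type and the exact meaning of nrange are not described on this page. The sketch below shows the general sliding-window technique on plain string tokens, treating the second argument as an explicit list of n-gram sizes. Both choices are assumptions for illustration, and generate is a hypothetical helper rather than the package's function.

```go
package main

import (
	"fmt"
	"strings"
)

// generate counts every n-gram of each size in sizes by sliding a
// window over tokens and joining the window with single spaces.
func generate(tokens []string, sizes []int) map[string]int {
	counts := make(map[string]int)
	for _, n := range sizes {
		if n <= 0 {
			continue // ignore meaningless sizes
		}
		for i := 0; i+n <= len(tokens); i++ {
			counts[strings.Join(tokens[i:i+n], " ")]++
		}
	}
	return counts
}

func main() {
	tokens := []string{"to", "be", "or", "not", "to", "be"}
	counts := generate(tokens, []int{1, 2})
	fmt.Println(counts["to be"], counts["be"]) // prints: 2 2
}
```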

func Merge

func Merge(maps ...map[string]*NGram)

Merge two or more map[string]*NGram maps
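
Since the function returns nothing, one plausible reading is that the later maps are folded into the first one. The sketch below illustrates that approach with a hypothetical ngram type and merge helper; where the real function stores its result, and how it combines per-document data, are not confirmed by this page.

```go
package main

import "fmt"

// ngram is a hypothetical stand-in with a total count and
// per-document counts keyed by document path.
type ngram struct {
	count int
	docs  map[string]int
}

// merge folds every map after the first into the first map, summing
// counts for keywords that appear in more than one map. The first
// map must be non-nil.
func merge(maps ...map[string]*ngram) {
	if len(maps) < 2 {
		return
	}
	dst := maps[0]
	for _, src := range maps[1:] {
		for key, ng := range src {
			existing, ok := dst[key]
			if !ok {
				dst[key] = ng // first sighting: reuse the entry as-is
				continue
			}
			existing.count += ng.count
			if existing.docs == nil {
				existing.docs = make(map[string]int)
			}
			for path, c := range ng.docs {
				existing.docs[path] += c
			}
		}
	}
}

func main() {
	a := map[string]*ngram{"graph theory": {count: 1, docs: map[string]int{"a.md": 1}}}
	b := map[string]*ngram{"graph theory": {count: 2, docs: map[string]int{"b.md": 2}}}
	merge(a, b)
	fmt.Println(a["graph theory"].count, len(a["graph theory"].docs)) // prints: 3 2
}
```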

func MergeStopWords

func MergeStopWords()

Merge custom stop words from the config with the stop words slice

func OrderByFrequency

func OrderByFrequency(m map[string]*NGram) []struct {
	Key   string
	Value float64
}

Create a slice of NGram keywords, ordered by their weights
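
The sort direction is not stated. The sketch below assumes descending weight (heaviest first) with ties broken alphabetically for a stable result, and works on a plain keyword-to-weight map instead of the package's NGram type. The pair type and orderByWeight are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// pair mirrors the anonymous Key/Value struct in the signature above.
type pair struct {
	Key   string
	Value float64
}

// orderByWeight returns keyword/weight pairs sorted by descending
// weight; equal weights are ordered by keyword.
func orderByWeight(weights map[string]float64) []pair {
	out := make([]pair, 0, len(weights))
	for k, v := range weights {
		out = append(out, pair{Key: k, Value: v})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Value != out[j].Value {
			return out[i].Value > out[j].Value
		}
		return out[i].Key < out[j].Key
	})
	return out
}

func main() {
	weights := map[string]float64{"shortest path": 1.5, "graph theory": 3.0, "adjacency": 1.5}
	for _, p := range orderByWeight(weights) {
		fmt.Println(p.Key, p.Value)
	}
}
```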

Types

type NGram

type NGram struct {
	// contains filtered or unexported fields
}

NGram is the type used throughout the linking package

func (*NGram) Count

func (ng *NGram) Count() int

func (*NGram) Documents

func (ng *NGram) Documents() map[string]*NGramInfo

Non-interface getter methods

func (*NGram) Keyword

func (ng *NGram) Keyword() string

func (*NGram) Weight

func (ng *NGram) Weight() float64

NGram implements Keyword interface

type NGramInfo

type NGramInfo struct {
	DocumentCount  int
	DocumentWeight float64
}

Information about NGram occurrences in a single document
