Documentation ¶
Overview ¶
Package summarize implements utilities for computing readability scores, usage statistics, and TL;DR summaries of text.
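For example, a minimal usage sketch (the import path is assumed, and NewDocument's signature is inferred from the constructor description below):

package main

import (
	"fmt"

	"github.com/jdkato/prose/summarize" // import path assumed
)

func main() {
	doc := summarize.NewDocument("The quick brown fox jumps over the lazy dog. " +
		"Readability formulas estimate how hard a passage is to read.")

	fmt.Printf("Flesch–Kincaid grade level: %.2f\n", doc.FleschKincaid())
	fmt.Printf("Flesch reading ease:        %.2f\n", doc.FleschReadingEase())
}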
Index ¶
- func Syllables(word string) int
- type Assessment
- type Document
- func (d *Document) Assess() *Assessment
- func (d *Document) AutomatedReadability() float64
- func (d *Document) ColemanLiau() float64
- func (d *Document) DaleChall() float64
- func (d *Document) FleschKincaid() float64
- func (d *Document) FleschReadingEase() float64
- func (d *Document) GunningFog() float64
- func (d *Document) Initialize()
- func (d *Document) Keywords() map[string]int
- func (d *Document) LIX() float64
- func (d *Document) MeanWordLength() float64
- func (d *Document) SMOG() float64
- func (d *Document) Summary(n int) []RankedParagraph
- func (d *Document) WordDensity() map[string]float64
- type RankedParagraph
- type Sentence
- type Word
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Syllables ¶
func Syllables(word string) int
Syllables returns the number of syllables in word.
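A one-line usage sketch, assuming the package is imported as summarize:

fmt.Println(summarize.Syllables("summarize")) // prints the estimated syllable count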
Types ¶
type Assessment ¶
type Assessment struct {
// assessments returning an estimated grade level
AutomatedReadability float64
ColemanLiau float64
FleschKincaid float64
GunningFog float64
SMOG float64
LIX float64
// mean & standard deviation of the above estimated grade levels
MeanGradeLevel float64
StdDevGradeLevel float64
// assessments returning non-grade numerical scores
DaleChall float64
ReadingEase float64
}
An Assessment provides comprehensive access to a Document's metrics.
type Document ¶
type Document struct {
Content string // Actual text
NumCharacters float64 // Number of Characters
NumComplexWords float64 // Polysyllabic words without common suffixes
NumParagraphs float64 // Number of paragraphs
NumPolysylWords float64 // Number of words with > 2 syllables
NumSentences float64 // Number of sentences
NumSyllables float64 // Number of syllables
NumWords float64 // Number of words
NumLongWords float64 // Number of long words
Sentences []Sentence // the Document's sentences
WordFrequency map[string]int // [word]frequency
SentenceTokenizer tokenize.ProseTokenizer
WordTokenizer tokenize.ProseTokenizer
}
A Document represents a collection of text to be analyzed.
A Document's calculations depend on its word and sentence tokenizers. You can use the defaults by invoking NewDocument, choose another implementation from the tokenize package, or use your own (as long as it implements the ProseTokenizer interface). For example,
d := Document{Content: ..., WordTokenizer: ..., SentenceTokenizer: ...}
d.Initialize()
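As a more concrete sketch of the manual route, assuming the tokenize package exposes New* constructors for the default tokenizers named below:

package main

import (
	"fmt"

	"github.com/jdkato/prose/summarize" // import paths assumed
	"github.com/jdkato/prose/tokenize"
)

func main() {
	d := summarize.Document{
		Content:           "Any text you want to analyze. It can span several sentences.",
		WordTokenizer:     tokenize.NewWordBoundaryTokenizer(),  // assumed constructor name
		SentenceTokenizer: tokenize.NewPunktSentenceTokenizer(), // assumed constructor name
	}
	d.Initialize() // populates counts, word frequencies, and Sentences

	fmt.Println(d.NumSentences, d.NumWords, d.NumSyllables)
}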
func NewDocument ¶
NewDocument is a Document constructor that takes a string as an argument. It then calculates the data necessary for computing readability and usage statistics.
This is a convenience wrapper around the Document initialization process that defaults to using a WordBoundaryTokenizer and a PunktSentenceTokenizer as its word and sentence tokenizers, respectively.
func (*Document) Assess ¶
func (d *Document) Assess() *Assessment
Assess returns an Assessment for the Document d.
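A sketch of reading the returned Assessment (import path assumed; fields as defined in the Assessment struct above):

package main

import (
	"fmt"

	"github.com/jdkato/prose/summarize" // import path assumed
)

func main() {
	doc := summarize.NewDocument("Four score and seven years ago our fathers brought forth, " +
		"on this continent, a new nation, conceived in liberty, and dedicated to the " +
		"proposition that all men are created equal.")

	a := doc.Assess()
	fmt.Printf("mean grade level: %.2f (stddev %.2f)\n", a.MeanGradeLevel, a.StdDevGradeLevel)
	fmt.Printf("reading ease:     %.2f\n", a.ReadingEase)
	fmt.Printf("Dale–Chall:       %.2f\n", a.DaleChall)
}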
func (*Document) AutomatedReadability ¶
AutomatedReadability computes the automated readability index score (https://en.wikipedia.org/wiki/Automated_readability_index).
func (*Document) ColemanLiau ¶
ColemanLiau computes the Coleman–Liau index score (https://en.wikipedia.org/wiki/Coleman%E2%80%93Liau_index).
func (*Document) DaleChall ¶
DaleChall computes the Dale–Chall score (https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula).
func (*Document) FleschKincaid ¶
FleschKincaid computes the Flesch–Kincaid grade level (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).
func (*Document) FleschReadingEase ¶
FleschReadingEase computes the Flesch reading-ease score (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).
func (*Document) GunningFog ¶
GunningFog computes the Gunning Fog index score (https://en.wikipedia.org/wiki/Gunning_fog_index).
func (*Document) Initialize ¶
func (d *Document) Initialize()
Initialize calculates the data necessary for computing readability and usage statistics.
func (*Document) Keywords ¶
Keywords returns a Document's words in the form
map[word]count
omitting stop words and normalizing case.
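A sketch that prints keywords from most to least frequent (import path assumed; the exact tokens returned depend on the configured word tokenizer):

package main

import (
	"fmt"
	"sort"

	"github.com/jdkato/prose/summarize" // import path assumed
)

func main() {
	doc := summarize.NewDocument("Go is expressive, concise, clean, and efficient. " +
		"Go code compiles quickly into fast machine code.")

	counts := doc.Keywords()

	// Sort the keywords by descending count before printing.
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Slice(words, func(i, j int) bool { return counts[words[i]] > counts[words[j]] })

	for _, w := range words {
		fmt.Println(w, counts[w])
	}
}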
func (*Document) LIX ¶ added in v1.1.1
LIX computes the LIX readability measure (https://en.wikipedia.org/wiki/Lix_(readability_test)).
func (*Document) MeanWordLength ¶
MeanWordLength returns the mean number of characters per word.
func (*Document) SMOG ¶
SMOG computes the SMOG grade (https://en.wikipedia.org/wiki/SMOG).
func (*Document) Summary ¶
func (d *Document) Summary(n int) []RankedParagraph
Summary returns a Document's n highest ranked paragraphs according to keyword frequency.
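A sketch of building a three-paragraph TL;DR (import path assumed; only the fields defined on RankedParagraph below are used):

package main

import (
	"fmt"

	"github.com/jdkato/prose/summarize" // import path assumed
)

func main() {
	text := "Paragraph one introduces the topic and its key terms.\n\n" +
		"Paragraph two repeats the key terms and adds supporting detail.\n\n" +
		"Paragraph three wanders off into an unrelated aside.\n\n" +
		"Paragraph four returns to the key terms and concludes."

	doc := summarize.NewDocument(text)

	// Keep the three highest-ranked paragraphs as a TL;DR.
	for _, p := range doc.Summary(3) {
		fmt.Printf("paragraph %d (rank %d, %d sentences)\n", p.Position, p.Rank, len(p.Sentences))
	}
}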
func (*Document) WordDensity ¶
WordDensity returns a map of each word to its density, i.e., its frequency relative to the Document's total word count.
type RankedParagraph ¶
type RankedParagraph struct {
Sentences []Sentence
Position int // the zero-based position within a Document
Rank int
}
A RankedParagraph is a paragraph ranked by its number of keywords.