Documentation ¶
Overview ¶
Package summarize implements utilities for computing readability scores, usage statistics, and TL;DR summaries of text.
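For example, a minimal usage sketch (the import path is assumed, and NewDocument's signature is inferred from the constructor description below):

package main

import (
	"fmt"

	"github.com/jdkato/prose/summarize" // import path assumed
)

func main() {
	doc := summarize.NewDocument("The quick brown fox jumps over the lazy dog. " +
		"Readability formulas estimate how hard a passage is to read.")

	fmt.Printf("Flesch–Kincaid grade level: %.2f\n", doc.FleschKincaid())
	fmt.Printf("Flesch reading ease:        %.2f\n", doc.FleschReadingEase())
}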
Index ¶
- func Syllables(word string) int
- type Assessment
- type Document
- func (d *Document) Assess() *Assessment
- func (d *Document) AutomatedReadability() float64
- func (d *Document) ColemanLiau() float64
- func (d *Document) DaleChall() float64
- func (d *Document) FleschKincaid() float64
- func (d *Document) FleschReadingEase() float64
- func (d *Document) GunningFog() float64
- func (d *Document) Initialize()
- func (d *Document) Keywords() map[string]int
- func (d *Document) LIX() float64
- func (d *Document) MeanWordLength() float64
- func (d *Document) SMOG() float64
- func (d *Document) Summary(n int) []RankedParagraph
- func (d *Document) WordDensity() map[string]float64
- type RankedParagraph
- type Sentence
- type Word
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func Syllables ¶
func Syllables(word string) int
Syllables returns the number of syllables in word.
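A one-line usage sketch, assuming the package is imported as summarize:

fmt.Println(summarize.Syllables("summarize")) // prints the estimated syllable count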
Types ¶
type Assessment ¶
type Assessment struct {
// assessments returning an estimated grade level
AutomatedReadability float64
ColemanLiau float64
FleschKincaid float64
GunningFog float64
SMOG float64
LIX float64
// mean & standard deviation of the above estimated grade levels
MeanGradeLevel float64
StdDevGradeLevel float64
// assessments returning non-grade numerical scores
DaleChall float64
ReadingEase float64
}
An Assessment provides comprehensive access to a Document's metrics.
type Document ¶
type Document struct {
Content string // Actual text
NumCharacters float64 // Number of Characters
NumComplexWords float64 // Polysyllabic words without common suffixes
NumParagraphs float64 // Number of paragraphs
NumPolysylWords float64 // Number of words with > 2 syllables
NumSentences float64 // Number of sentences
NumSyllables float64 // Number of syllables
NumWords float64 // Number of words
NumLongWords float64 // Number of long words
Sentences []Sentence // the Document's sentences
WordFrequency map[string]int // [word]frequency
SentenceTokenizer tokenize.ProseTokenizer
WordTokenizer tokenize.ProseTokenizer
}
A Document represents a collection of text to be analyzed.
A Document's calculations depend on its word and sentence tokenizers. You can use the defaults by invoking NewDocument, choose another implementation from the tokenize package, or use your own (as long as it implements the ProseTokenizer interface). For example,
d := Document{Content: ..., WordTokenizer: ..., SentenceTokenizer: ...}
d.Initialize()
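As a more concrete sketch of the manual route, assuming the tokenize package exposes New* constructors for the default tokenizers named below:

package main

import (
	"fmt"

	"github.com/jdkato/prose/summarize" // import paths assumed
	"github.com/jdkato/prose/tokenize"
)

func main() {
	d := summarize.Document{
		Content:           "Any text you want to analyze. It can span several sentences.",
		WordTokenizer:     tokenize.NewWordBoundaryTokenizer(),  // assumed constructor name
		SentenceTokenizer: tokenize.NewPunktSentenceTokenizer(), // assumed constructor name
	}
	d.Initialize() // populates counts, word frequencies, and Sentences

	fmt.Println(d.NumSentences, d.NumWords, d.NumSyllables)
}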
func NewDocument ¶
NewDocument is a Document constructor that takes a string as an argument. It then calculates the data necessary for computing readability and usage statistics.
This is a convenience wrapper around the Document initialization process that defaults to using a WordBoundaryTokenizer and a PunktSentenceTokenizer as its word and sentence tokenizers, respectively.
func (*Document) Assess ¶
func (d *Document) Assess() *Assessment
Assess returns an Assessment for the Document d.
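A sketch of reading the returned Assessment (import path assumed; fields as defined in the Assessment struct above):

package main

import (
	"fmt"

	"github.com/jdkato/prose/summarize" // import path assumed
)

func main() {
	doc := summarize.NewDocument("Four score and seven years ago our fathers brought forth, " +
		"on this continent, a new nation, conceived in liberty, and dedicated to the " +
		"proposition that all men are created equal.")

	a := doc.Assess()
	fmt.Printf("mean grade level: %.2f (stddev %.2f)\n", a.MeanGradeLevel, a.StdDevGradeLevel)
	fmt.Printf("reading ease:     %.2f\n", a.ReadingEase)
	fmt.Printf("Dale–Chall:       %.2f\n", a.DaleChall)
}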
func (*Document) AutomatedReadability ¶
AutomatedReadability computes the automated readability index score (https://en.wikipedia.org/wiki/Automated_readability_index).
func (*Document) ColemanLiau ¶
ColemanLiau computes the Coleman–Liau index score (https://en.wikipedia.org/wiki/Coleman%E2%80%93Liau_index).
func (*Document) DaleChall ¶
DaleChall computes the Dale–Chall score (https://en.wikipedia.org/wiki/Dale%E2%80%93Chall_readability_formula).
func (*Document) FleschKincaid ¶
FleschKincaid computes the Flesch–Kincaid grade level (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).
func (*Document) FleschReadingEase ¶
FleschReadingEase computes the Flesch reading-ease score (https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests).
func (*Document) GunningFog ¶
GunningFog computes the Gunning Fog index score (https://en.wikipedia.org/wiki/Gunning_fog_index).
func (*Document) Initialize ¶
func (d *Document) Initialize()
Initialize calculates the data necessary for computing readability and usage statistics.
func (*Document) Keywords ¶
Keywords returns a Document's words in the form
map[word]count
omitting stop words and normalizing case.
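A sketch that prints keywords from most to least frequent (import path assumed; the exact tokens returned depend on the configured word tokenizer):

package main

import (
	"fmt"
	"sort"

	"github.com/jdkato/prose/summarize" // import path assumed
)

func main() {
	doc := summarize.NewDocument("Go is expressive, concise, clean, and efficient. " +
		"Go code compiles quickly into fast machine code.")

	counts := doc.Keywords()

	// Sort the keywords by descending count before printing.
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Slice(words, func(i, j int) bool { return counts[words[i]] > counts[words[j]] })

	for _, w := range words {
		fmt.Println(w, counts[w])
	}
}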
func (*Document) LIX ¶ added in v1.1.1
LIX computes the LIX readability measure (https://en.wikipedia.org/wiki/Lix_(readability_test)).
func (*Document) MeanWordLength ¶
MeanWordLength returns the mean number of characters per word.
func (*Document) SMOG ¶
SMOG computes the SMOG grade (https://en.wikipedia.org/wiki/SMOG).
func (*Document) Summary ¶
func (d *Document) Summary(n int) []RankedParagraph
Summary returns a Document's n highest ranked paragraphs according to keyword frequency.
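A sketch of building a three-paragraph TL;DR (import path assumed; only the fields defined on RankedParagraph below are used):

package main

import (
	"fmt"

	"github.com/jdkato/prose/summarize" // import path assumed
)

func main() {
	text := "Paragraph one introduces the topic and its key terms.\n\n" +
		"Paragraph two repeats the key terms and adds supporting detail.\n\n" +
		"Paragraph three wanders off into an unrelated aside.\n\n" +
		"Paragraph four returns to the key terms and concludes."

	doc := summarize.NewDocument(text)

	// Keep the three highest-ranked paragraphs as a TL;DR.
	for _, p := range doc.Summary(3) {
		fmt.Printf("paragraph %d (rank %d, %d sentences)\n", p.Position, p.Rank, len(p.Sentences))
	}
}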
func (*Document) WordDensity ¶
WordDensity returns a map of each word to its density, i.e., its frequency relative to the Document's total word count.
type RankedParagraph ¶
type RankedParagraph struct {
Sentences []Sentence
Position int // the zero-based position within a Document
Rank int
}
A RankedParagraph is a paragraph ranked by its number of keywords.