Documentation
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CountCharacters ¶
CountCharacters counts letters and digits in text (no spaces or punctuation).
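The counting rule can be illustrated with a minimal sketch. The name countLettersAndDigits is hypothetical, and the use of unicode.IsLetter and unicode.IsDigit is an assumption; the package may classify characters differently (for example, ASCII only).

```go
package main

import (
	"fmt"
	"unicode"
)

// countLettersAndDigits is a hypothetical stand-in for CountCharacters.
// Assumption: "letters and digits" means unicode.IsLetter or unicode.IsDigit.
func countLettersAndDigits(text string) int {
	n := 0
	for _, r := range text {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			n++
		}
	}
	return n
}

func main() {
	// "Hello" (5) + "world" (5) + "42" (2); the comma, "!" and spaces are skipped.
	fmt.Println(countLettersAndDigits("Hello, world! 42")) // 12
}
```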
func CountSentences ¶
CountSentences counts sentences by splitting on sentence-ending punctuation (., !, ?) followed by whitespace or end of text. It returns at least 1 for non-empty text.
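One possible reading of that rule, as a rough sketch: count each terminator that is followed by whitespace or ends the text, and fall back to 1 when non-empty text has none. The name countSentencesSketch is hypothetical, and the handling of a trailing fragment without a terminator (not counted separately here) is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"unicode"
)

// countSentencesSketch is a hypothetical stand-in for CountSentences.
// Assumption: a sentence boundary is '.', '!' or '?' followed by
// whitespace or the end of the text.
func countSentencesSketch(text string) int {
	if text == "" {
		return 0
	}
	runes := []rune(text)
	count := 0
	for i, r := range runes {
		if r != '.' && r != '!' && r != '?' {
			continue
		}
		if i == len(runes)-1 || unicode.IsSpace(runes[i+1]) {
			count++
		}
	}
	if count == 0 {
		return 1 // non-empty text counts as at least one sentence
	}
	return count
}

func main() {
	fmt.Println(countSentencesSketch("One. Two! Three?"))    // 3
	fmt.Println(countSentencesSketch("Pi is 3.14 exactly.")) // 1
	fmt.Println(countSentencesSketch("no terminator"))       // 1
}
```

The period inside 3.14 is followed by a digit, so it does not count as a boundary under this rule.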
func CountWords ¶
CountWords counts whitespace-delimited words in text. The result is exactly len(strings.Fields(text)): a word is a maximal run of non-space runes, where a space is any rune for which IsSpace (equivalent to unicode.IsSpace) reports true. Unlike strings.Fields, it counts in a single rune scan and does not allocate the []string. CountWords is called per sentence, per paragraph, and per file; the slice that strings.Fields built only to be discarded accounted for ~0.48 GB over the 600-file check gate (plan 175 profiling).
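The single-scan idea can be sketched as follows. The name countWordsScan is hypothetical, and the sketch calls unicode.IsSpace directly instead of the package's IsSpace; it only illustrates counting word starts without building a slice.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// countWordsScan is a hypothetical stand-in for CountWords.
// It counts transitions from "not in a word" to "in a word",
// so no []string is ever allocated.
func countWordsScan(text string) int {
	count := 0
	inWord := false
	for _, r := range text {
		if unicode.IsSpace(r) {
			inWord = false
		} else if !inWord {
			inWord = true
			count++
		}
	}
	return count
}

func main() {
	samples := []string{"", "   ", "one", "  two  words ", "tabs\tand\nnewlines", "nbsp\u00a0split"}
	for _, s := range samples {
		// The two counts should agree for every sample.
		fmt.Printf("%q: scan=%d fields=%d\n", s, countWordsScan(s), len(strings.Fields(s)))
	}
}
```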
func ExtractPlainText ¶
ExtractPlainText extracts readable text from a goldmark AST node, stripping markdown syntax. Keeps: text content, link display text, emphasis inner text, image alt text, code span text.
func IsSpace ¶ added in v0.21.0
IsSpace reports whether r is a Unicode space. It returns exactly the result unicode.IsSpace gives, but adds an inlinable ASCII fast path: for r < utf8.RuneSelf the only spaces are ' ' and the range '\t' through '\r', so two integer comparisons decide the result and only genuine non-ASCII runes pay for unicode.IsSpace's table lookup. It is called per rune of every word of every file on the check hot path, where unicode.IsSpace alone was ~5.5% of CPU (plan 175 profiling).
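A minimal sketch of such a fast path, assuming the shape described above. The name isSpaceFast is hypothetical and the package's actual comparisons may be written differently; the main function checks the sketch against unicode.IsSpace for every code point.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isSpaceFast is a hypothetical stand-in for IsSpace.
func isSpaceFast(r rune) bool {
	if r < utf8.RuneSelf {
		// ASCII: ' ' or the contiguous range '\t' (0x09) through '\r' (0x0D).
		return r == ' ' || (r >= '\t' && r <= '\r')
	}
	// Non-ASCII runes fall back to the standard library.
	return unicode.IsSpace(r)
}

func main() {
	// Compare against unicode.IsSpace across the whole code point range.
	for r := rune(0); r <= unicode.MaxRune; r++ {
		if isSpaceFast(r) != unicode.IsSpace(r) {
			fmt.Printf("mismatch at %U\n", r)
			return
		}
	}
	fmt.Println("isSpaceFast agrees with unicode.IsSpace for all runes")
}
```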
func Slugify ¶ added in v0.6.0
Slugify converts heading text to a GitHub-compatible URL anchor slug. The text is lowercased, letters and digits are preserved, and spaces and hyphens become a single dash.
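A rough sketch of this kind of slug rule follows. The name slugifySketch is hypothetical. The sketch reads "become a single dash" as each space or hyphen mapping to one '-' without collapsing runs, and it drops every other character; both choices are assumptions, and the real function may differ on edge cases such as underscores or repeated spaces.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugifySketch is a hypothetical stand-in for Slugify.
// Assumptions: each space or hyphen becomes one '-', and any rune
// that is not a letter, digit, space, or hyphen is dropped.
func slugifySketch(heading string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(heading) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
		case r == ' ' || r == '-':
			b.WriteByte('-')
		}
	}
	return b.String()
}

func main() {
	fmt.Println(slugifySketch("Getting Started"))     // getting-started
	fmt.Println(slugifySketch("API v2: What's New?")) // api-v2-whats-new
}
```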
func SplitSentences ¶
SplitSentences splits text into individual sentences using a Punkt sentence tokenizer. Handles abbreviations, decimals, and ellipses.
Types ¶
type TOCItem ¶ added in v0.6.0
TOCItem represents a single heading entry for table-of-contents generation.
func CollectTOCItems ¶ added in v0.6.0
CollectTOCItems returns all headings from the AST as TOC items, in document order. Anchors are disambiguated by insertion order: the first occurrence keeps the plain slug, and subsequent duplicates get -1, -2, … suffixes, matching the anchor computation in crossfilereferenceintegrity. It tracks used anchors (not just base slugs) to guarantee unique anchors even when a later heading's base slug matches an earlier heading's disambiguated anchor.
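The disambiguation rule can be sketched on plain slugs, independent of the AST. The name uniqueAnchors and its slice-of-strings interface are hypothetical; the sketch only shows why tracking every issued anchor, not just base slugs, avoids collisions.

```go
package main

import "fmt"

// uniqueAnchors is a hypothetical helper that assigns a unique anchor
// to each base slug, in order.
func uniqueAnchors(slugs []string) []string {
	used := make(map[string]bool, len(slugs)) // every anchor handed out so far
	next := make(map[string]int)              // last suffix tried per base slug
	out := make([]string, 0, len(slugs))
	for _, slug := range slugs {
		anchor := slug
		// Keep bumping the suffix until the candidate is unused.
		for used[anchor] {
			next[slug]++
			anchor = fmt.Sprintf("%s-%d", slug, next[slug])
		}
		used[anchor] = true
		out = append(out, anchor)
	}
	return out
}

func main() {
	// The third heading's base slug equals the second heading's anchor.
	fmt.Println(uniqueAnchors([]string{"intro", "intro", "intro-1", "intro"}))
	// [intro intro-1 intro-1-1 intro-2]
}
```

If only base slugs were tracked, the third heading would reuse intro-1 and collide with the second.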