package mdtext

Version: v0.23.0 (latest) | Published: May 21, 2026 | License: MIT | Imports: 8 | Imported by: 0


Repository

github.com/jeduden/mdsmith


Documentation ¶

Index ¶

func CountCharacters(text string) int
func CountSentences(text string) int
func CountWords(text string) int
func ExtractPlainText(node ast.Node, source []byte) string
func IsSpace(r rune) bool
func NonNegativeUTF16RuneLen(r rune) int
func Slugify(s string) string
func SplitSentences(text string) []string
func SplitSentencesInto(dst []string, text string) []string
func UTF16FromByteOffset(line []byte, byteOff int) int
func UTF16ToByteOffset(line []byte, target int) int
type TOCItem
- func CollectTOCItems(root ast.Node, source []byte) []TOCItem

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func CountCharacters ¶

func CountCharacters(text string) int

CountCharacters counts letters and digits in text (no spaces or punctuation).
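A minimal sketch of this kind of count, assuming "letters and digits" means the Unicode classes reported by unicode.IsLetter and unicode.IsDigit. The helper name countLettersAndDigits is hypothetical, and the package's actual classification may differ.

```go
package main

import (
	"fmt"
	"unicode"
)

// countLettersAndDigits is a hypothetical stand-in for CountCharacters.
// It counts runes that are letters or digits and skips spaces,
// punctuation, and everything else.
func countLettersAndDigits(text string) int {
	n := 0
	for _, r := range text {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			n++
		}
	}
	return n
}

func main() {
	// "Hello" (5) + "world" (5) + "42" (2)
	fmt.Println(countLettersAndDigits("Hello, world! 42")) // 12
}
```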

func CountSentences ¶

func CountSentences(text string) int

CountSentences counts sentences by splitting on sentence-ending punctuation (., !, ?) followed by whitespace or end of text. Returns at least 1 for non-empty text.
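The rule above could be sketched as follows. This is an illustration under stated assumptions, not the package's code: the name countSentences is hypothetical, a trailing fragment without terminating punctuation is assumed to count as one sentence, and whitespace-only text is assumed to count as zero.

```go
package main

import (
	"fmt"
	"unicode"
)

// countSentences is a hypothetical sketch of the documented rule: a sentence
// ends at '.', '!' or '?' when that rune is followed by whitespace or by the
// end of the text. Trailing text with no terminator counts as one sentence
// (an assumption), which yields at least 1 for text containing a non-space rune.
func countSentences(text string) int {
	runes := []rune(text)
	count := 0
	pending := false // non-space runes seen since the last sentence boundary
	for i, r := range runes {
		if !unicode.IsSpace(r) {
			pending = true
		}
		isTerminator := r == '.' || r == '!' || r == '?'
		atBoundary := i+1 == len(runes) || unicode.IsSpace(runes[i+1])
		if isTerminator && atBoundary {
			count++
			pending = false
		}
	}
	if pending {
		count++
	}
	return count
}

func main() {
	fmt.Println(countSentences("One. Two! Three?")) // 3
	fmt.Println(countSentences("no terminator"))    // 1
	fmt.Println(countSentences("3.14 is pi."))      // 1
}
```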

func CountWords ¶

func CountWords(text string) int

CountWords counts whitespace-delimited words in text. The result is exactly len(strings.Fields(text)): a word is a maximal run of non-space runes, where a space is any rune for which IsSpace (equivalent to unicode.IsSpace) reports true. Unlike strings.Fields, it counts in a single rune scan and does not allocate the []string. CountWords is called per sentence, per paragraph, and per file; the slice that strings.Fields built only to be discarded accounted for ~0.48 GB over the 600-file check gate (plan 175 profiling).
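One way to count words in a single scan without building a slice is a two-state loop that increments on each transition from space to non-space. The sketch below uses a hypothetical countWords helper and calls unicode.IsSpace directly; it illustrates the technique and is not the package's source.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// countWords is a hypothetical allocation-free word counter. It increments
// once at the start of every maximal run of non-space runes.
func countWords(text string) int {
	n := 0
	inWord := false
	for _, r := range text {
		if unicode.IsSpace(r) {
			inWord = false
		} else if !inWord {
			inWord = true
			n++
		}
	}
	return n
}

func main() {
	// Compare against the allocating reference on a few inputs.
	for _, s := range []string{"", "   ", "one", "  a  b\tc\n", "caf\u00e9\u00a0au lait"} {
		fmt.Printf("%q: scan=%d fields=%d\n", s, countWords(s), len(strings.Fields(s)))
	}
}
```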

func ExtractPlainText ¶

func ExtractPlainText(node ast.Node, source []byte) string

ExtractPlainText extracts readable text from a goldmark AST node, stripping markdown syntax. Keeps: text content, link display text, emphasis inner text, image alt text, code span text.

func IsSpace ¶ added in v0.21.0

func IsSpace(r rune) bool

IsSpace reports whether r is a Unicode space, with exactly the result unicode.IsSpace gives but an inlinable ASCII fast path: for r < utf8.RuneSelf the only spaces are ' ' and '\t'..'\r', so two integer comparisons decide it and only genuine non-ASCII runes pay for unicode.IsSpace's table lookup. It is called per rune of every word of every file on the check hot path, where unicode.IsSpace alone was ~5.5% of CPU (plan 175 profiling).
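The ASCII fast path described above might look like the sketch below. The helper name isSpace and the unsigned-subtraction range check are assumptions chosen to reach two comparisons; the package's actual expression is not shown on this page.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isSpace is a hypothetical sketch of an ASCII fast path in front of
// unicode.IsSpace. For r < utf8.RuneSelf the only spaces are ' ' and the
// range '\t'..'\r' (9..13), tested here with one equality check and one
// unsigned range check. Negative runes wrap to large unsigned values and
// fail the range check, matching unicode.IsSpace.
func isSpace(r rune) bool {
	if r < utf8.RuneSelf {
		return r == ' ' || uint32(r)-'\t' < 5
	}
	return unicode.IsSpace(r)
}

func main() {
	// Exhaustively confirm agreement with unicode.IsSpace.
	for r := rune(0); r <= unicode.MaxRune; r++ {
		if isSpace(r) != unicode.IsSpace(r) {
			fmt.Printf("mismatch at %U\n", r)
			return
		}
	}
	fmt.Println("isSpace agrees with unicode.IsSpace for every rune")
}
```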

func NonNegativeUTF16RuneLen ¶ added in v0.23.0

func NonNegativeUTF16RuneLen(r rune) int

NonNegativeUTF16RuneLen wraps utf16.RuneLen so its negative "invalid code point" return cannot decrement a caller's running UTF-16 unit total. utf8.DecodeRune already maps invalid bytes to RuneError (U+FFFD, width 1), so in practice utf16.RuneLen never returns a negative for runes decoded from real input; the guard is defensive against a future Go change that weakens that invariant. A negative width means the rune is outside [0, MaxRune] or is a surrogate, both of which take one UTF-16 unit when serialized as RuneError.
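A sketch of such a guard, assuming Go 1.23 or later (where utf16.RuneLen is available). The lowercase helper name is hypothetical; it maps the negative "invalid" result to 1, the width of RuneError.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// nonNegativeUTF16RuneLen is a hypothetical sketch of the documented wrapper.
// utf16.RuneLen returns -1 for surrogates and out-of-range values; those are
// treated as width 1, the UTF-16 width of utf8.RuneError (U+FFFD).
func nonNegativeUTF16RuneLen(r rune) int {
	if n := utf16.RuneLen(r); n >= 0 {
		return n
	}
	return 1
}

func main() {
	fmt.Println(nonNegativeUTF16RuneLen('a'))          // 1
	fmt.Println(nonNegativeUTF16RuneLen('\U0001F600')) // 2 (surrogate pair)
	fmt.Println(nonNegativeUTF16RuneLen(0xD800))       // 1 (lone surrogate)
	fmt.Println(nonNegativeUTF16RuneLen(-1))           // 1 (out of range)
}
```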

func Slugify ¶ added in v0.6.0

func Slugify(s string) string

Slugify converts heading text to a GitHub-compatible URL anchor slug. It lowercases the text, preserves letters and digits, and turns spaces and hyphens into a single dash.

func SplitSentences ¶

func SplitSentences(text string) []string

SplitSentences splits text into individual sentences using a Punkt sentence tokenizer. Handles abbreviations, decimals, and ellipses. The actual segmentation is delegated to splitSentences (defined by the active build tag).

The returned slice is freshly allocated. Hot callers that want to pool the destination should use SplitSentencesInto instead.

func SplitSentencesInto ¶ added in v0.23.0

func SplitSentencesInto(dst []string, text string) []string

SplitSentencesInto is the pool-friendly variant of SplitSentences: it appends the segmented sentences (trimmed, non-empty) to dst and returns the extended slice. The intended pattern is

bufPtr := sentBufPool.Get().(*[]string)
*bufPtr = mdtext.SplitSentencesInto((*bufPtr)[:0], text)
defer sentBufPool.Put(bufPtr)

so the per-call `make([]string, 0, n)` plain SplitSentences pays is amortized across a sync.Pool. MDS024's hot path uses this form to stay within the per-rule allocation budget on cold-File runs.
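A fuller, self-contained version of the pooling pattern is sketched below. The pool's New function, the helper sentenceCount, and the stand-in splitInto (used in place of mdtext.SplitSentencesInto so the example runs without the package) are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// sentBufPool holds reusable sentence buffers. It stores *[]string so that
// Put does not allocate when boxing the slice header into an interface.
var sentBufPool = sync.Pool{
	New: func() any {
		buf := make([]string, 0, 16)
		return &buf
	},
}

// splitInto is a hypothetical stand-in for mdtext.SplitSentencesInto. It
// appends trimmed, non-empty pieces to dst and returns the extended slice.
// The naive split on ". " is NOT the package's tokenizer.
func splitInto(dst []string, text string) []string {
	for _, part := range strings.Split(text, ". ") {
		if s := strings.TrimSpace(part); s != "" {
			dst = append(dst, s)
		}
	}
	return dst
}

// sentenceCount borrows a buffer, resets its length to zero, fills it, and
// hands it back to the pool. The slice must not be retained after Put.
func sentenceCount(text string) int {
	bufPtr := sentBufPool.Get().(*[]string)
	defer sentBufPool.Put(bufPtr)
	*bufPtr = splitInto((*bufPtr)[:0], text)
	return len(*bufPtr)
}

func main() {
	fmt.Println(sentenceCount("One. Two. Three.")) // 3
	fmt.Println(sentenceCount("Only one"))         // 1
}
```

Writing the result back through the pointer keeps any growth of the backing array, so later borrowers reuse the larger buffer.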

func UTF16FromByteOffset ¶ added in v0.23.0

func UTF16FromByteOffset(line []byte, byteOff int) int

UTF16FromByteOffset returns the UTF-16 code-unit offset that corresponds to UTF-8 byte offset byteOff within line. The result is clamped to [0, total UTF-16 length of line] so callers cannot receive a negative or past-end position even when given a malformed byte column.
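The conversion can be sketched as a decode loop that sums UTF-16 widths. The helper name is hypothetical, and the handling of a byte offset that falls inside a multi-byte rune (the whole rune is counted here) is an assumption; the page does not specify it.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf16FromByteOffset is a hypothetical sketch: it walks line rune by rune
// up to byteOff and sums UTF-16 code units (2 for runes at or above U+10000,
// otherwise 1). Out-of-range offsets are clamped to the ends of the line.
func utf16FromByteOffset(line []byte, byteOff int) int {
	if byteOff > len(line) {
		byteOff = len(line)
	}
	units := 0
	for i := 0; i < byteOff; {
		r, size := utf8.DecodeRune(line[i:])
		if r >= 0x10000 {
			units += 2
		} else {
			units++
		}
		i += size
	}
	return units
}

func main() {
	// 'a' (1 byte), U+1F600 (4 bytes), 'é' (2 bytes): 7 bytes, 4 UTF-16 units.
	line := []byte("a\U0001F600\u00e9")
	for _, off := range []int{-3, 0, 1, 5, 7, 100} {
		fmt.Printf("byte %d -> utf16 %d\n", off, utf16FromByteOffset(line, off))
	}
}
```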

func UTF16ToByteOffset ¶ added in v0.23.0

func UTF16ToByteOffset(line []byte, target int) int

UTF16ToByteOffset returns the byte offset in line at the given UTF-16 code-unit count. Offsets past the line's end clamp to len(line) so a defensive guard upstream still sees an in-range value. A target that lands inside a surrogate pair rounds up to the next codepoint boundary.
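The reverse mapping might be sketched like this. The helper name is hypothetical; the loop stops at the first codepoint boundary whose UTF-16 count reaches the target, which gives the rounding-up behaviour described for a target inside a surrogate pair.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf16ToByteOffset is a hypothetical sketch: it advances rune by rune until
// at least target UTF-16 code units have been consumed or the line ends.
// A target inside a surrogate pair therefore lands after that rune, and a
// target past the end clamps to len(line).
func utf16ToByteOffset(line []byte, target int) int {
	units, i := 0, 0
	for i < len(line) && units < target {
		r, size := utf8.DecodeRune(line[i:])
		if r >= 0x10000 {
			units += 2
		} else {
			units++
		}
		i += size
	}
	return i
}

func main() {
	// 'a' (1 byte), U+1F600 (4 bytes), 'é' (2 bytes).
	line := []byte("a\U0001F600\u00e9")
	for _, target := range []int{0, 1, 2, 3, 4, 99} {
		fmt.Printf("utf16 %d -> byte %d\n", target, utf16ToByteOffset(line, target))
	}
}
```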

Types ¶

type TOCItem ¶ added in v0.6.0

type TOCItem struct {
	Level  int
	Text   string
	Anchor string
}

TOCItem represents a single heading entry for table-of-contents generation.

func CollectTOCItems ¶ added in v0.6.0

func CollectTOCItems(root ast.Node, source []byte) []TOCItem

CollectTOCItems returns all headings from the AST as TOC items, in document order. Anchors are disambiguated by insertion order: the first occurrence keeps the plain slug, and subsequent duplicates get -1, -2, … suffixes, matching the anchor computation in crossfilereferenceintegrity. It tracks used anchors (not just base slugs) to guarantee unique anchors even when a later heading's base slug matches an earlier heading's disambiguated anchor.
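The disambiguation rule can be illustrated with a small sketch that works on base slugs alone. The function disambiguate is hypothetical, and the way a collision with an already suffixed anchor is resolved (keep incrementing the suffix on the base slug) is an assumption rather than the package's confirmed behaviour.

```go
package main

import (
	"fmt"
	"strconv"
)

// disambiguate is a hypothetical sketch of insertion-order anchor
// disambiguation. It records every anchor it hands out (not only base slugs),
// so a heading whose base slug equals an earlier suffixed anchor still ends
// up with a unique anchor.
func disambiguate(slugs []string) []string {
	used := make(map[string]bool, len(slugs))
	out := make([]string, 0, len(slugs))
	for _, base := range slugs {
		anchor := base
		for n := 1; used[anchor]; n++ {
			anchor = base + "-" + strconv.Itoa(n)
		}
		used[anchor] = true
		out = append(out, anchor)
	}
	return out
}

func main() {
	// The third heading's base slug collides with the second heading's anchor.
	fmt.Println(disambiguate([]string{"setup", "setup", "setup-1", "setup"}))
	// [setup setup-1 setup-1-1 setup-2]
}
```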
