markdown

package

Version: v0.4.2 (latest) | Published: Mar 20, 2026 | License: MIT | Imports: 18 | Imported by: 0


Repository

github.com/yeasy/mdpress


README ¶

Markdown Parser Module

This module provides a production-grade Markdown parser for the mdpress project, a Markdown-to-PDF book converter.

Overview

The markdown package wraps goldmark with carefully selected extensions to provide:

GFM Extensions: Tables, strikethrough, task lists, and autolinks
Code Syntax Highlighting: Via goldmark-highlighting
Footnotes: Full footnote support
Custom Heading IDs: Automatic ID generation for cross-referencing
Heading Collection: Automatic gathering of heading information for TOC generation
Cross-Reference Links: Support for internal document references

Features

Supported Markdown Extensions

Tables (GFM)
- Standard GitHub Flavored Markdown tables
- Column alignment support
Strikethrough (GFM)
- ~~text~~ syntax for strikethrough
Task Lists (GFM)
- `- [ ]` for unchecked and `- [x]` for checked items
Autolinks (GFM)
- <https://example.com> and <user@example.com> formats
Footnotes
- [^1] references with [^1]: content definitions
Code Highlighting
- Syntax highlighting for code blocks
- Multiple theme support (github, monokai, dracula, etc.)
Custom Heading IDs
- Automatic ID generation for headings
- Support for manual ID specification via attributes

Installation

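This module is part of the mdpress project. Because it lives under internal/, Go only allows it to be imported from within the mdpress module itself. It depends on: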

go get github.com/yuin/goldmark
go get github.com/yuin/goldmark-highlighting/v2

Usage

Basic Parsing

package main

import (
    "fmt"
    "github.com/yeasy/mdpress/internal/markdown"
)

func main() {
    parser := markdown.NewParser()

    source := []byte(`# Hello World
This is **bold** text.`)

    html, headings, err := parser.Parse(source)
    if err != nil {
        panic(err)
    }

    fmt.Println("HTML:", html)
    fmt.Println("Headings:", headings)
}

With Custom Code Theme

parser := markdown.NewParser(
    markdown.WithCodeTheme("monokai"),
)

html, headings, err := parser.Parse(source)

Collecting Headings for TOC

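parser := markdown.NewParser()
// The HTML is discarded here; only the headings are needed for the TOC.
_, headings, err := parser.Parse(source)
if err != nil {
    log.Fatal(err)
}

// headings is []HeadingInfo; each entry carries Level, Text, and ID
for _, h := range headings {
    fmt.Printf("%d. %s -> #%s\n", h.Level, h.Text, h.ID)
}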

Changing Code Theme After Creation

parser := markdown.NewParser()
parser.SetCodeTheme("github")
html, _, _ := parser.Parse(source)

API Reference

Parser

The main parser struct that holds the goldmark instance and manages parsing state.

type Parser struct {
    // ... private fields
}

NewParser

Creates and returns a new Markdown parser instance.

func NewParser(opts ...ParserOption) *Parser

Parameters:

opts: Variable number of ParserOption functions for customization

Returns:

*Parser: Initialized parser instance

Example:

parser := markdown.NewParser(
    markdown.WithCodeTheme("github"),
)

Parse

Parses Markdown source code and returns HTML string and heading information.

func (p *Parser) Parse(source []byte) (string, []HeadingInfo, error)

Parameters:

source: Markdown source code as byte slice

Returns:

string: Generated HTML content
[]HeadingInfo: Slice of collected heading information
error: Any parsing errors

Example:

html, headings, err := parser.Parse([]byte("# Title\nContent"))
if err != nil {
    log.Fatal(err)
}

SetCodeTheme

Sets the code syntax highlighting theme.

func (p *Parser) SetCodeTheme(theme string)

Parameters:

theme: Theme name (e.g., "github", "monokai", "dracula")

Note: Reinitializes the parser with the new theme

Supported Themes:

github (default)
monokai
dracula
solarized-dark
solarized-light
Any other style name supported by goldmark-highlighting (which takes its styles from Chroma)

GetHeadings

Retrieves all collected heading information from the last parse.

func (p *Parser) GetHeadings() []HeadingInfo

Returns:

[]HeadingInfo: Thread-safe copy of collected headings

HeadingInfo

Structure containing information about a heading.

type HeadingInfo struct {
    Level  int    // Heading level (1-6)
    Text   string // Heading text content
    ID     string // Heading ID for cross-referencing
    Line   int    // Line on which the heading appears
    Column int    // Column at which the heading appears
}

ParserOption

Functional option type for customizing parser behavior.

type ParserOption func(*Parser)

Built-in Options:

WithCodeTheme(theme string): Set code highlighting theme
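WithExtensions(exts ...goldmark.Extender): Add custom extensions (not listed in the v0.4.2 package index, which exports only WithCodeTheme)
WithParserOptions(opts ...parser.Option): Set goldmark parser options (likewise not listed in the v0.4.2 package index)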

Examples

Complete Document Parsing

package main

import (
    "fmt"
    "strings"

    "github.com/yeasy/mdpress/internal/markdown"
)

func main() {
    parser := markdown.NewParser(
        markdown.WithCodeTheme("github"),
    )

    // A Go raw string literal cannot contain a backtick, so the code
    // fence is built separately and spliced into the document.
    fence := strings.Repeat("`", 3)
    md := []byte(`# Go Tutorial

## Chapter 1

Here's a code example:

` + fence + `go
func main() {
    fmt.Println("Hello")
}
` + fence + `

| Feature | Support |
|---------|---------|
| Tables  | Yes     |
| Code    | Yes     |

- [x] Implemented
- [ ] TODO
`)

    html, headings, err := parser.Parse(md)
    if err != nil {
        panic(err)
    }
    fmt.Println("HTML bytes:", len(html))

    // Print the TOC, indenting each entry by its heading level.
    for _, h := range headings {
        fmt.Printf("%s# %s\n", strings.Repeat("  ", h.Level-1), h.Text)
    }
}

Multi-Document Processing

parser := markdown.NewParser()

documents := [][]byte{
    []byte("# Document 1\nContent..."),
    []byte("# Document 2\nContent..."),
}

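for i, doc := range documents {
    html, headings, err := parser.Parse(doc)
    if err != nil {
        fmt.Printf("document %d: %v\n", i, err)
        continue
    }
    // Process html and headings
    fmt.Printf("document %d: %d bytes of HTML, %d headings\n", i, len(html), len(headings))
}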

Implementation Details

Heading ID Generation

IDs are automatically generated from heading text by:

Converting to lowercase
Removing special characters
Replacing spaces with hyphens
Trimming leading/trailing hyphens

Example: "Hello World" → "hello-world"
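
The four steps above can be sketched in plain Go. This is only an illustration under stated assumptions: slugify is a hypothetical helper, not the package's actual function, and treating Unicode letters, digits, and existing hyphens as the characters to keep is a guess at what "special characters" excludes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify is a hypothetical helper that follows the listed steps:
// lowercase, drop special characters, turn whitespace into hyphens,
// and trim hyphens from both ends.
func slugify(text string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(text) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r) || r == '-':
			b.WriteRune(r) // assumption: letters, digits, and hyphens survive
		case unicode.IsSpace(r):
			b.WriteRune('-')
		}
		// Every other rune is treated as a special character and dropped.
	}
	return strings.Trim(b.String(), "-")
}

func main() {
	fmt.Println(slugify("Hello World"))
	fmt.Println(slugify("  What's New?  "))
}
```

Building the result rune by rune avoids a chain of regular-expression passes and keeps non-ASCII heading text intact, which may or may not match what mdpress does.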

Heading Collection

The parser uses a custom AST transformer to collect heading information during parsing. This is done in a thread-safe manner using mutexes.
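
A mutex-guarded collector of this kind might look roughly like the sketch below. The names headingInfo, headingCollector, add, and snapshot are hypothetical stand-ins; the real transformer hooks into goldmark's AST and is not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
)

// headingInfo is a stand-in for the package's HeadingInfo type.
type headingInfo struct {
	Level int
	Text  string
	ID    string
}

// headingCollector guards a slice of headings with a mutex.
type headingCollector struct {
	mu       sync.Mutex
	headings []headingInfo
}

// add appends one heading while holding the lock.
func (c *headingCollector) add(h headingInfo) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.headings = append(c.headings, h)
}

// snapshot returns a copy, so callers cannot mutate the internal slice.
func (c *headingCollector) snapshot() []headingInfo {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]headingInfo, len(c.headings))
	copy(out, c.headings)
	return out
}

func main() {
	var c headingCollector
	var wg sync.WaitGroup
	for level := 1; level <= 3; level++ {
		wg.Add(1)
		go func(level int) {
			defer wg.Done()
			c.add(headingInfo{Level: level, Text: fmt.Sprintf("Heading %d", level)})
		}(level)
	}
	wg.Wait()
	fmt.Println("collected:", len(c.snapshot()))
}
```

Returning a copy from snapshot is what lets a getter such as GetHeadings hand out results without exposing shared state.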

Extensions Architecture

The module builds on goldmark's extension points:

Built-in GFM extensions for standard features
Custom transformers for heading ID generation
Cross-reference link resolver for internal references

Thread Safety

The Parser instance is thread-safe for concurrent parsing operations
Heading collection uses internal synchronization
Recommended: Create one Parser instance and reuse it

Performance

Approximate parse times reported for typical documents (hardware and document sizes are not specified):

Simple documents: ~1-2ms
Complex documents with multiple features: ~5-10ms

For optimal performance:

Reuse Parser instances
Parse in parallel for multiple documents
Consider caching results for frequently parsed content

Error Handling

The Parse method returns detailed errors:

Syntax errors in the markdown (rare with goldmark)
Rendering errors (usually configuration issues)

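html, headings, err := parser.Parse(source)
if err != nil {
    fmt.Printf("Parse error: %v\n", err)
    return // or handle the error another way
}
// Use html and headings only after the error check.
fmt.Println(len(html), len(headings))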

Testing

The module includes comprehensive tests:

Unit tests for basic functionality
Integration tests for extension support
Benchmark tests for performance measurement

Run tests with:

go test ./internal/markdown -v

Future Enhancements

Potential improvements for future versions:

Custom renderer for PDF-specific formatting
Custom CSS class insertion for styling
Metadata/front-matter extraction

Math equation support (KaTeX) and diagram support (Mermaid) are already available through HasMath, KaTeXScript, NeedsKaTeX, MermaidScript, and NeedsMermaid.

Troubleshooting

Issue: Code blocks not highlighted

Solution: Ensure language tag is specified in code fence (e.g., ```go)

Issue: Table not rendering

Solution: Ensure table format follows GFM specification with proper separators

Issue: Heading IDs have conflicts

Solution: The parser automatically appends numbers to duplicate IDs (e.g., "hello-world-1")
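
One way such de-duplication could work is sketched below. idRegistry and unique are hypothetical names, and the numbering scheme is inferred only from the "hello-world-1" example above.

```go
package main

import "fmt"

// idRegistry remembers every ID that has already been handed out.
type idRegistry struct {
	used map[string]bool
}

func newIDRegistry() *idRegistry {
	return &idRegistry{used: make(map[string]bool)}
}

// unique returns id unchanged the first time it is seen and appends
// -1, -2, ... on later requests until an unused candidate is found.
func (r *idRegistry) unique(id string) string {
	candidate := id
	for n := 1; r.used[candidate]; n++ {
		candidate = fmt.Sprintf("%s-%d", id, n)
	}
	r.used[candidate] = true
	return candidate
}

func main() {
	reg := newIDRegistry()
	for i := 0; i < 3; i++ {
		fmt.Println(reg.unique("hello-world"))
	}
}
```

Checking each candidate against the set of used IDs also covers the case where a document already contains a heading whose own ID is literally "hello-world-1".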

Contributing

When extending this module:

Maintain backward compatibility
Add tests for new features
Follow the existing code style
Update this documentation
Ensure thread safety for concurrent operations

License

This module is part of the mdpress project, which is published under the MIT license.

Documentation ¶

Overview ¶

math.go implements pre/post processing for math formulas.

Problem: goldmark follows CommonMark spec where `_` inside words may be treated as emphasis delimiters, so `$x_1^2$` becomes `$x<em>1</em>^2$`, breaking the formula structure.

Solution: Before goldmark processes the Markdown source, replace $$...$$ and $...$ with placeholder tokens (e.g. MDPMATHBLOCK000000) that contain no Markdown special characters. After goldmark renders HTML, replace the placeholders back with HTML span elements that KaTeX auto-render can find.
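
The placeholder round trip can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper names protectMath and restoreMath, the token format MDPMATH plus six digits, the regular expression, and the span class are not taken from math.go.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// mathRe matches $$...$$ blocks first, then single-line $...$ spans.
// Assumption: this naive pattern also matches prices such as "$5 and $10".
var mathRe = regexp.MustCompile(`(?s)\$\$.+?\$\$|\$[^$\n]+\$`)

// protectMath swaps every formula for a token without Markdown
// special characters and returns the formulas in token order.
func protectMath(src string) (string, []string) {
	var formulas []string
	out := mathRe.ReplaceAllStringFunc(src, func(m string) string {
		token := fmt.Sprintf("MDPMATH%06d", len(formulas))
		formulas = append(formulas, m)
		return token
	})
	return out, formulas
}

// restoreMath puts each formula back, HTML-escaped and wrapped in a
// span, with its $ delimiters kept for a client-side math renderer.
func restoreMath(rendered string, formulas []string) string {
	for i, f := range formulas {
		token := fmt.Sprintf("MDPMATH%06d", i)
		span := `<span class="math">` + html.EscapeString(f) + `</span>`
		rendered = strings.ReplaceAll(rendered, token, span)
	}
	return rendered
}

func main() {
	protected, formulas := protectMath("Let $x_1^2$ be given.")
	fmt.Println(protected)
	// Stand-in for the Markdown renderer: the token passes through untouched.
	rendered := "<p>" + protected + "</p>"
	fmt.Println(restoreMath(rendered, formulas))
}
```

Because the token contains only letters and digits, the underscore in `x_1` never reaches the emphasis parser, which is the failure described above.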

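Package markdown provides Markdown parsing and HTML conversion. It is built on the goldmark library and supports GFM extensions, code highlighting, footnotes, and other features.

Core types:

Parser: the Markdown parser; call Parse() to get the HTML and the list of headings
HeadingInfo: heading information (level, text, ID), used for table-of-contents generation

Usage example: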

p := markdown.NewParser(markdown.WithCodeTheme("monokai"))
html, headings, err := p.Parse(source)


postprocess.go performs post-processing on HTML emitted by goldmark. Includes: GFM Alert conversion ([!NOTE] etc.) and Mermaid code block conversion.
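
As a rough idea of what the alert conversion involves, the sketch below rewrites a rendered blockquote that starts with an alert marker into a div. The function name convertAlerts, the regular expression, the CSS class names, and the list of alert kinds are all assumptions; postprocess.go may work quite differently, and nested blockquotes are not handled here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// alertRe matches a blockquote whose first paragraph opens with [!KIND].
var alertRe = regexp.MustCompile(
	`(?s)<blockquote>\s*<p>\[!(NOTE|TIP|IMPORTANT|WARNING|CAUTION)\]\s*(.*?)</p>(.*?)</blockquote>`)

// convertAlerts turns matching blockquotes into <div class="alert alert-kind">.
func convertAlerts(rendered string) string {
	return alertRe.ReplaceAllStringFunc(rendered, func(m string) string {
		sub := alertRe.FindStringSubmatch(m)
		kind := strings.ToLower(sub[1])
		return `<div class="alert alert-` + kind + `"><p>` + sub[2] + `</p>` + sub[3] + `</div>`
	})
}

func main() {
	in := "<blockquote>\n<p>[!NOTE]\nBe careful.</p>\n</blockquote>\n"
	fmt.Print(convertAlerts(in))
}
```

Ordinary blockquotes do not start with a marker, so they pass through unchanged.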

Index ¶

func HasMath(md string) bool
func KaTeXScript() string
func KaTeXScriptForEpub() string
func MermaidScript() string
func NeedsKaTeX(html string) bool
func NeedsMermaid(html string) bool
func PostProcess(html string) string
type Diagnostic
- func CollectDiagnostics(document ast.Node, source []byte) []Diagnostic
- func (d Diagnostic) Position() string
type DocumentProcessor
- func NewDocumentProcessor(maxConcurrency int) *DocumentProcessor
- func (dp *DocumentProcessor) ClearCache()
- func (dp *DocumentProcessor) ProcessFile(filePath string) *ProcessingResult
- func (dp *DocumentProcessor) ProcessFiles(filePaths []string) []*ProcessingResult
type HeadingInfo
type Parser
- func NewParser(opts ...ParserOption) *Parser
- func (p *Parser) GetHeadings() []HeadingInfo
- func (p *Parser) Parse(source []byte) (string, []HeadingInfo, error)
- func (p *Parser) ParseWithDiagnostics(source []byte) (string, []HeadingInfo, []Diagnostic, error)
- func (p *Parser) SetCodeTheme(theme string)
type ParserOption
- func WithCodeTheme(theme string) ParserOption
type ProcessingResult

Examples ¶

NewParser


Functions ¶

func HasMath ¶ added in v0.3.0

func HasMath(md string) bool

HasMath reports whether the Markdown source contains any math formula syntax. Used as a quick check to decide whether math processing is needed.

func KaTeXScript ¶ added in v0.3.0

func KaTeXScript() string

KaTeXScript returns the HTML tags (link + scripts) needed to load KaTeX and its auto-render extension. The auto-render extension scans the document for $...$ and $$...$$ delimiters and renders them with KaTeX. Only include this when the HTML contains math elements (see NeedsKaTeX).

func KaTeXScriptForEpub ¶ added in v0.3.0

func KaTeXScriptForEpub() string

KaTeXScriptForEpub returns XHTML-compatible KaTeX script tags for use inside EPUB XHTML documents. Some EPUB readers (e.g. Apple Books) support JavaScript, so KaTeX can render math formulas in those readers.

func MermaidScript ¶

func MermaidScript() string

MermaidScript returns the <script> tags needed to load and initialise Mermaid. Only include this when the HTML contains .mermaid elements.

func NeedsKaTeX ¶ added in v0.3.0

func NeedsKaTeX(html string) bool

NeedsKaTeX reports whether the HTML contains any math formula elements produced by the math preprocessor.

func NeedsMermaid ¶

func NeedsMermaid(html string) bool

NeedsMermaid reports whether the HTML contains any Mermaid diagram elements.

func PostProcess ¶

func PostProcess(html string) string

PostProcess applies all post-processing transforms to goldmark-rendered HTML.

Types ¶

type Diagnostic ¶

type Diagnostic struct {
	Rule    string
	Line    int
	Column  int
	Message string
}

Diagnostic represents a document problem found during the build.

func CollectDiagnostics ¶

func CollectDiagnostics(document ast.Node, source []byte) []Diagnostic

CollectDiagnostics collects structured warnings from a Markdown document.

func (Diagnostic) Position ¶

func (d Diagnostic) Position() string

Position returns a position string suitable for log output.
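
The exact format Position produces is not documented here. The sketch below shows one plausible "line:column" convention using a hypothetical lowercase copy of the struct; it is a guess at the shape, not the package's implementation.

```go
package main

import "fmt"

// diagnostic mirrors the exported Diagnostic fields for illustration.
type diagnostic struct {
	Rule    string
	Line    int
	Column  int
	Message string
}

// position formats the location as "line:column", falling back to the
// line alone or to an empty string when the information is missing.
// Assumption: this convention is illustrative only.
func (d diagnostic) position() string {
	switch {
	case d.Line <= 0:
		return ""
	case d.Column <= 0:
		return fmt.Sprintf("%d", d.Line)
	default:
		return fmt.Sprintf("%d:%d", d.Line, d.Column)
	}
}

func main() {
	// The rule name and message are made up for this example.
	d := diagnostic{Rule: "example-rule", Line: 12, Column: 3, Message: "example message"}
	fmt.Printf("%s [%s] %s\n", d.position(), d.Rule, d.Message)
}
```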

type DocumentProcessor ¶

type DocumentProcessor struct {
	// contains filtered or unexported fields
}

DocumentProcessor is a batch document processor with support for concurrency.

func NewDocumentProcessor ¶

func NewDocumentProcessor(maxConcurrency int) *DocumentProcessor

NewDocumentProcessor creates a document processor.

func (*DocumentProcessor) ClearCache ¶

func (dp *DocumentProcessor) ClearCache()

ClearCache clears the cache.

func (*DocumentProcessor) ProcessFile ¶

func (dp *DocumentProcessor) ProcessFile(filePath string) *ProcessingResult

ProcessFile processes a single Markdown file.

func (*DocumentProcessor) ProcessFiles ¶

func (dp *DocumentProcessor) ProcessFiles(filePaths []string) []*ProcessingResult

ProcessFiles processes multiple files concurrently.
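
Bounded concurrency of the kind a maxConcurrency parameter suggests is commonly built from a buffered channel used as a semaphore. The sketch below shows that pattern with hypothetical names (result, processAll) and a stand-in processing function; it makes no claim about how DocumentProcessor is implemented, including whether it preserves input order.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// result is a simplified stand-in for ProcessingResult.
type result struct {
	Path string
	HTML string
	Err  error
}

// processAll runs process on every path with at most maxConcurrency
// calls in flight and returns the results in input order.
func processAll(paths []string, maxConcurrency int, process func(string) (string, error)) []result {
	if maxConcurrency < 1 {
		maxConcurrency = 1
	}
	results := make([]result, len(paths))
	sem := make(chan struct{}, maxConcurrency) // counting semaphore
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrency workers are busy
		go func(i int, p string) {
			defer wg.Done()
			defer func() { <-sem }()
			out, err := process(p)
			results[i] = result{Path: p, HTML: out, Err: err} // each goroutine owns one slot
		}(i, p)
	}
	wg.Wait()
	return results
}

func main() {
	// fake stands in for reading and parsing a Markdown file.
	fake := func(path string) (string, error) {
		return "<h1>" + strings.TrimSuffix(path, ".md") + "</h1>", nil
	}
	for _, r := range processAll([]string{"a.md", "b.md", "c.md"}, 2, fake) {
		fmt.Println(r.Path, r.HTML, r.Err)
	}
}
```

Writing each result into its own slice slot avoids a mutex around the results and keeps them aligned with the input paths.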

type HeadingInfo ¶

type HeadingInfo struct {
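	Level  int    // Heading level (1-6)
	Text   string // Heading text content
	ID     string // Heading ID, used for cross-referencing
	Line   int    // Line on which the heading appears
	Column int    // Column at which the heading appears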
}

HeadingInfo holds heading information and is used for table-of-contents generation.

type Parser ¶

type Parser struct {
	// contains filtered or unexported fields
}

Parser is the Markdown parser.

func NewParser ¶

func NewParser(opts ...ParserOption) *Parser

NewParser creates and returns a new Markdown parser instance.

Example ¶

package main

import (
	"fmt"

	"github.com/yeasy/mdpress/internal/markdown"
)

func main() {
	parser := markdown.NewParser()
	html, headings, err := parser.Parse([]byte("# Hello\n\nWorld"))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("HTML length:", len(html) > 0)
	fmt.Println("Headings:", len(headings))
}

Output:
HTML length: true
Headings: 1

func (*Parser) GetHeadings ¶

func (p *Parser) GetHeadings() []HeadingInfo

GetHeadings returns all heading information collected so far.

func (*Parser) Parse ¶

func (p *Parser) Parse(source []byte) (string, []HeadingInfo, error)

Parse parses Markdown source and returns the HTML and the heading information.

func (*Parser) ParseWithDiagnostics ¶

func (p *Parser) ParseWithDiagnostics(source []byte) (string, []HeadingInfo, []Diagnostic, error)

ParseWithDiagnostics parses Markdown and also returns build-time warnings.

func (*Parser) SetCodeTheme ¶

func (p *Parser) SetCodeTheme(theme string)

SetCodeTheme sets the code highlighting theme.

type ParserOption ¶

type ParserOption func(*Parser)

ParserOption is the functional option type.

func WithCodeTheme ¶

func WithCodeTheme(theme string) ParserOption

WithCodeTheme is an option that sets the code highlighting theme.

type ProcessingResult ¶

type ProcessingResult struct {
	FilePath string
	HTML     string
	Headings []HeadingInfo
	Error    error
}

ProcessingResult is the result of processing a document.
