markdown

package
v0.4.2 Latest
Published: Mar 20, 2026 License: MIT Imports: 18 Imported by: 0

README

Markdown Parser Module

This module provides a production-grade Markdown parser for the mdpress project, a Markdown-to-PDF book converter.

Overview

The markdown package wraps goldmark with carefully selected extensions to provide:

  • GFM Extensions: Tables, strikethrough, task lists, and autolinks
  • Code Syntax Highlighting: Via goldmark-highlighting
  • Footnotes: Full footnote support
  • Custom Heading IDs: Automatic ID generation for cross-referencing
  • Heading Collection: Automatic gathering of heading information for TOC generation
  • Cross-Reference Links: Support for internal document references

Features

Supported Markdown Extensions
  1. Tables (GFM)

    • Standard GitHub Flavored Markdown tables
    • Column alignment support
  2. Strikethrough (GFM)

    • ~~text~~ syntax for strikethrough
  3. Task Lists (GFM)

    • - [ ] unchecked and - [x] checked syntax
  4. Autolinks (GFM)

    • Bare URLs such as https://example.com (GFM), alongside the CommonMark <https://example.com> and <user@example.com> forms
  5. Footnotes

    • [^1] references with [^1]: content definitions
  6. Code Highlighting

    • Syntax highlighting for code blocks
    • Multiple theme support (github, monokai, dracula, etc.)
  7. Custom Heading IDs

    • Automatic ID generation for headings
    • Support for manual ID specification via attributes

Installation

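This module is part of the mdpress project. Because it lives under internal/, Go only allows it to be imported from code inside the mdpress module. It requires: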

go get github.com/yuin/goldmark
go get github.com/yuin/goldmark-highlighting/v2

Usage

Basic Parsing
package main

import (
    "fmt"
    "github.com/yeasy/mdpress/internal/markdown"
)

func main() {
    parser := markdown.NewParser()

    source := []byte(`# Hello World
This is **bold** text.`)

    html, headings, err := parser.Parse(source)
    if err != nil {
        panic(err)
    }

    fmt.Println("HTML:", html)
    fmt.Println("Headings:", headings)
}
With Custom Code Theme
parser := markdown.NewParser(
    markdown.WithCodeTheme("monokai"),
)

html, headings, err := parser.Parse(source)
Collecting Headings for TOC
parser := markdown.NewParser()
_, headings, err := parser.Parse(source) // the HTML is not needed for a TOC
if err != nil {
    log.Fatal(err)
}

// headings is []HeadingInfo with Level, Text, and ID
for _, h := range headings {
    fmt.Printf("%d. %s -> #%s\n", h.Level, h.Text, h.ID)
}
Changing Code Theme After Creation
parser := markdown.NewParser()
parser.SetCodeTheme("github")
html, _, _ := parser.Parse(source)

API Reference

Parser

The main parser struct that holds the goldmark instance and manages parsing state.

type Parser struct {
    // ... private fields
}
NewParser

Creates and returns a new Markdown parser instance.

func NewParser(opts ...ParserOption) *Parser

Parameters:

  • opts: Variable number of ParserOption functions for customization

Returns:

  • *Parser: Initialized parser instance

Example:

parser := markdown.NewParser(
    markdown.WithCodeTheme("github"),
)
Parse

Parses Markdown source and returns the rendered HTML string together with the collected heading information.

func (p *Parser) Parse(source []byte) (string, []HeadingInfo, error)

Parameters:

  • source: Markdown source code as byte slice

Returns:

  • string: Generated HTML content
  • []HeadingInfo: Slice of collected heading information
  • error: Any parsing errors

Example:

html, headings, err := parser.Parse([]byte("# Title\nContent"))
if err != nil {
    log.Fatal(err)
}
SetCodeTheme

Sets the code syntax highlighting theme.

func (p *Parser) SetCodeTheme(theme string)

Parameters:

  • theme: Theme name (e.g., "github", "monokai", "dracula")

Note: Calling SetCodeTheme reinitializes the parser with the new theme.

Supported Themes:

  • github (default)
  • monokai
  • dracula
  • solarized-dark
  • solarized-light
  • And other Chroma styles supported by goldmark-highlighting
GetHeadings

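Retrieves the heading information collected during the most recent Parse call. When one Parser is shared between goroutines, prefer the headings returned by Parse itself, because "most recent" is not well defined under concurrent use.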

func (p *Parser) GetHeadings() []HeadingInfo

Returns:

  • []HeadingInfo: Thread-safe copy of collected headings
HeadingInfo

Structure containing information about a heading.

type HeadingInfo struct {
    Level  int    // Heading level (1-6)
    Text   string // Heading text content
    ID     string // Heading ID for cross-referencing
    Line   int    // Line on which the heading appears
    Column int    // Column at which the heading starts
}
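
As an illustration of how the collected headings can drive TOC generation, here is a minimal sketch. It uses a local headingInfo type that mirrors the documented Level, Text, and ID fields, and renderTOC is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// headingInfo mirrors the documented Level, Text, and ID fields.
// It is a local stand-in, not the package's own type.
type headingInfo struct {
	Level int
	Text  string
	ID    string
}

// renderTOC (hypothetical helper) returns a nested Markdown list in which
// every entry links to the heading's ID. Levels outside 1-6 are skipped.
func renderTOC(headings []headingInfo) string {
	var b strings.Builder
	for _, h := range headings {
		if h.Level < 1 || h.Level > 6 {
			continue
		}
		indent := strings.Repeat("  ", h.Level-1)
		fmt.Fprintf(&b, "%s- [%s](#%s)\n", indent, h.Text, h.ID)
	}
	return b.String()
}

func main() {
	toc := renderTOC([]headingInfo{
		{Level: 1, Text: "Go Tutorial", ID: "go-tutorial"},
		{Level: 2, Text: "Chapter 1", ID: "chapter-1"},
	})
	fmt.Print(toc)
}
```

Indenting by level keeps the sketch short; a real TOC builder might instead build a tree so that skipped levels (an H1 followed directly by an H3) can be handled explicitly.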
ParserOption

Functional option type for customizing parser behavior.

type ParserOption func(*Parser)

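Built-in Options:

  • WithCodeTheme(theme string): Set code highlighting theme
  • WithExtensions(exts ...goldmark.Extender): Add custom extensions
  • WithParserOptions(opts ...parser.Option): Set goldmark parser options

Note: The generated API index for v0.4.2 lists only WithCodeTheme. Confirm that WithExtensions and WithParserOptions exist in your version before relying on them.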
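
The functional-option pattern behind ParserOption can be sketched as follows. The miniParser type, its codeTheme field, and the lowercase function names are hypothetical stand-ins, since the real Parser fields are private. The "github" default comes from the Supported Themes list.

```go
package main

import "fmt"

// miniParser is a hypothetical stand-in for Parser.
type miniParser struct {
	codeTheme string
}

// option has the same shape as the documented ParserOption type.
type option func(*miniParser)

// withCodeTheme returns an option that overrides the theme.
func withCodeTheme(theme string) option {
	return func(p *miniParser) { p.codeTheme = theme }
}

// newMiniParser sets defaults first, then applies the options in order,
// so a later option wins over an earlier one.
func newMiniParser(opts ...option) *miniParser {
	p := &miniParser{codeTheme: "github"} // documented default theme
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	fmt.Println(newMiniParser().codeTheme)
	fmt.Println(newMiniParser(withCodeTheme("monokai")).codeTheme)
}
```

The benefit of this design is that new settings can be added later without changing the constructor's signature.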

Examples

Complete Document Parsing
package main

import (
    "fmt"
    "strings"

    "github.com/yeasy/mdpress/internal/markdown"
)

func main() {
    parser := markdown.NewParser(
        markdown.WithCodeTheme("github"),
    )

    // A Go raw string literal cannot contain a backtick, so the code
    // fence is spliced in from an ordinary string.
    fence := "```"

    md := []byte(`# Go Tutorial

## Chapter 1

Here's a code example:

` + fence + `go
func main() {
    fmt.Println("Hello")
}
` + fence + `

| Feature | Support |
|---------|---------|
| Tables  | Yes     |
| Code    | Yes     |

- [x] Implemented
- [ ] TODO
`)

    html, headings, err := parser.Parse(md)
    if err != nil {
        panic(err)
    }
    fmt.Printf("Rendered %d bytes of HTML\n", len(html))

    // Print TOC, indenting each heading by its level
    for _, h := range headings {
        fmt.Printf("%s# %s\n",
            strings.Repeat("  ", h.Level-1), h.Text)
    }
}
Multi-Document Processing
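parser := markdown.NewParser()

documents := [][]byte{
    []byte("# Document 1\nContent..."),
    []byte("# Document 2\nContent..."),
}

for i, doc := range documents {
    html, headings, err := parser.Parse(doc)
    if err != nil {
        fmt.Printf("document %d: %v\n", i, err)
        continue
    }
    // Process html and headings
    fmt.Printf("document %d: %d bytes, %d headings\n", i, len(html), len(headings))
}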

Implementation Details

Heading ID Generation

IDs are automatically generated from heading text by:

  1. Converting to lowercase
  2. Removing special characters
  3. Replacing spaces with hyphens
  4. Trimming leading/trailing hyphens

Example: "Hello World" → "hello-world"
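
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: slugify is a hypothetical name, "special characters" is taken to mean anything that is not a letter, digit, space, or hyphen, and consecutive hyphens are left as they are.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify (hypothetical helper) derives a heading ID from heading text.
func slugify(text string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(text) { // step 1: lowercase
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
		case unicode.IsSpace(r) || r == '-':
			b.WriteRune('-') // step 3: spaces become hyphens
		}
		// step 2: every other character is dropped
	}
	return strings.Trim(b.String(), "-") // step 4: trim hyphens
}

func main() {
	fmt.Println(slugify("Hello World"))           // hello-world
	fmt.Println(slugify("  What's New in 2.0? ")) // whats-new-in-20
}
```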

Heading Collection

The parser uses a custom AST transformer to collect heading information during parsing. This is done in a thread-safe manner using mutexes.

Extensions Architecture

The module builds on goldmark's extension points:

  • Built-in GFM extensions for standard features
  • Custom transformers for heading ID generation
  • Cross-reference link resolver for internal references

Thread Safety

  • The Parser instance is thread-safe for concurrent parsing operations
  • Heading collection uses internal synchronization
  • Recommended: Create one Parser instance and reuse it
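
A minimal sketch of sharing one parser across goroutines with bounded concurrency. Because the package is internal, the example takes a parseFunc stand-in with a simplified signature instead of calling (*Parser).Parse directly. The names parseFunc, parseAll, and fakeParse are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parseFunc is a simplified, hypothetical stand-in for a shared parser's
// Parse method.
type parseFunc func(source []byte) (string, error)

// parseAll runs parse over docs with at most limit goroutines in flight.
// Results and errors are returned in input order.
func parseAll(parse parseFunc, docs [][]byte, limit int) ([]string, []error) {
	if limit < 1 {
		limit = 1
	}
	out := make([]string, len(docs))
	errs := make([]error, len(docs))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i, doc := range docs {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit goroutines are running
		go func(i int, doc []byte) {
			defer wg.Done()
			defer func() { <-sem }()
			// Each goroutine writes only to its own index.
			out[i], errs[i] = parse(doc)
		}(i, doc)
	}
	wg.Wait()
	return out, errs
}

// fakeParse stands in for real Markdown rendering in this sketch.
func fakeParse(src []byte) (string, error) {
	return "<p>" + strings.TrimSpace(string(src)) + "</p>", nil
}

func main() {
	docs := [][]byte{[]byte("one"), []byte("two"), []byte("three")}
	out, _ := parseAll(fakeParse, docs, 2)
	fmt.Println(out)
}
```

Taking the headings from each call's return value, instead of reading shared parser state afterwards, keeps every result tied to the document that produced it.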

Performance

Benchmarks on typical documents:

  • Simple documents: ~1-2ms
  • Complex documents with multiple features: ~5-10ms

For optimal performance:

  • Reuse Parser instances
  • Parse in parallel for multiple documents
  • Consider caching results for frequently parsed content
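
One way to cache results for frequently parsed content is to key them by a hash of the source. The sketch below is an assumption about how a caller might do this, not a description of any cache inside the package. The renderCache type and its methods are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// renderCache (hypothetical) maps a SHA-256 of the source to rendered HTML.
type renderCache struct {
	mu      sync.Mutex
	entries map[[32]byte]string
}

func newRenderCache() *renderCache {
	return &renderCache{entries: make(map[[32]byte]string)}
}

// get returns the cached HTML for source and calls render only on a miss.
// Two goroutines that miss at the same time may both render; the result is
// still correct, only some work is duplicated.
func (c *renderCache) get(source []byte, render func([]byte) (string, error)) (string, error) {
	key := sha256.Sum256(source)

	c.mu.Lock()
	cached, ok := c.entries[key]
	c.mu.Unlock()
	if ok {
		return cached, nil
	}

	rendered, err := render(source)
	if err != nil {
		return "", err // failures are not cached
	}

	c.mu.Lock()
	c.entries[key] = rendered
	c.mu.Unlock()
	return rendered, nil
}

func main() {
	calls := 0
	render := func(src []byte) (string, error) {
		calls++
		return "<p>" + string(src) + "</p>", nil
	}
	c := newRenderCache()
	c.get([]byte("hello"), render)
	c.get([]byte("hello"), render)
	fmt.Println("render calls:", calls)
}
```

If the code theme can change between calls, the theme name should be part of the cache key as well, since the same source then renders to different HTML.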

Error Handling

The Parse method returns detailed errors:

  • Syntax errors in the markdown (rare with goldmark)
  • Rendering errors (usually configuration issues)
html, headings, err := parser.Parse(source)
if err != nil {
    fmt.Printf("Parse error: %v\n", err)
    return // or handle the error another way
}
fmt.Printf("Rendered %d bytes, %d headings\n", len(html), len(headings))

Testing

The module includes comprehensive tests:

  • Unit tests for basic functionality
  • Integration tests for extension support
  • Benchmark tests for performance measurement

Run tests with:

go test ./internal/markdown -v

Future Enhancements

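Potential improvements for future versions:

  • Custom renderer for PDF-specific formatting
  • Custom CSS class insertion for styling
  • Metadata/front-matter extraction

Math (KaTeX) and Mermaid diagram support, which earlier versions of this list named as future work, are covered by HasMath, KaTeXScript, NeedsKaTeX, MermaidScript, and NeedsMermaid in the API documentation.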

Troubleshooting

Issue: Code blocks not highlighted
  • Solution: Ensure language tag is specified in code fence (e.g., ```go)
Issue: Table not rendering
  • Solution: Ensure table format follows GFM specification with proper separators
Issue: Heading IDs have conflicts
  • Solution: The parser automatically appends numbers to duplicate IDs (e.g., "hello-world-1")
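
The duplicate-ID behaviour can be pictured with a small sketch. The idRegistry type is hypothetical, and the numbering is inferred only from the "hello-world-1" example above, so the real scheme may differ in detail.

```go
package main

import (
	"fmt"
	"strconv"
)

// idRegistry (hypothetical) remembers every ID it has handed out.
type idRegistry struct {
	used map[string]bool
}

func newIDRegistry() *idRegistry {
	return &idRegistry{used: make(map[string]bool)}
}

// unique returns base on first use, then base-1, base-2, and so on.
// It keeps counting until it finds a free ID, so a heading whose own
// slug is "hello-world-1" cannot collide with a generated suffix.
func (r *idRegistry) unique(base string) string {
	id := base
	for n := 1; r.used[id]; n++ {
		id = base + "-" + strconv.Itoa(n)
	}
	r.used[id] = true
	return id
}

func main() {
	reg := newIDRegistry()
	fmt.Println(reg.unique("hello-world")) // hello-world
	fmt.Println(reg.unique("hello-world")) // hello-world-1
	fmt.Println(reg.unique("hello-world")) // hello-world-2
}
```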

Contributing

When extending this module:

  1. Maintain backward compatibility
  2. Add tests for new features
  3. Follow the existing code style
  4. Update this documentation
  5. Ensure thread safety for concurrent operations

License

This module is part of the mdpress project.

Documentation

Overview

math.go implements pre/post processing for math formulas.

Problem: goldmark follows the CommonMark spec, under which a `_` inside a formula may be treated as an emphasis delimiter, so a formula such as `$x_1^2$` can come out as `$x<em>1</em>^2$`, which breaks the formula structure.

Solution: Before goldmark processes the Markdown source, replace $$...$$ and $...$ with placeholder tokens (e.g. MDPMATHBLOCK000000) that contain no Markdown special characters. After goldmark renders HTML, replace the placeholders back with HTML span elements that KaTeX auto-render can find.

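Package markdown provides Markdown parsing and HTML conversion. It is built on the goldmark library and supports GFM extensions, code highlighting, footnotes, and related features.

Core types:

  • Parser: the Markdown parser; call Parse() to get the HTML and the list of headings
  • HeadingInfo: heading information (level, text, ID), used for table-of-contents generation

Usage example:

p := markdown.NewParser(markdown.WithCodeTheme("monokai"))
html, headings, err := p.Parse(source)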

postprocess.go performs post-processing on HTML emitted by goldmark. Includes: GFM Alert conversion ([!NOTE] etc.) and Mermaid code block conversion.

Constants

This section is empty.

Variables

This section is empty.

Functions

func HasMath added in v0.3.0

func HasMath(md string) bool

HasMath reports whether the Markdown source contains any math formula syntax. Used as a quick check to decide whether math processing is needed.

func KaTeXScript added in v0.3.0

func KaTeXScript() string

KaTeXScript returns the HTML tags (link + scripts) needed to load KaTeX and its auto-render extension. The auto-render extension scans the document for $...$ and $$...$$ delimiters and renders them with KaTeX. Only include this when the HTML contains math elements (see NeedsKaTeX).

func KaTeXScriptForEpub added in v0.3.0

func KaTeXScriptForEpub() string

KaTeXScriptForEpub returns XHTML-compatible KaTeX script tags for use inside EPUB XHTML documents. Some EPUB readers (e.g. Apple Books) support JavaScript, so KaTeX can render math formulas in those readers.

func MermaidScript

func MermaidScript() string

MermaidScript returns the <script> tags needed to load and initialise Mermaid. Only include this when the HTML contains .mermaid elements.

func NeedsKaTeX added in v0.3.0

func NeedsKaTeX(html string) bool

NeedsKaTeX reports whether the HTML contains any math formula elements produced by the math preprocessor.

func NeedsMermaid

func NeedsMermaid(html string) bool

NeedsMermaid reports whether the HTML contains any Mermaid diagram elements.

func PostProcess

func PostProcess(html string) string

PostProcess applies all post-processing transforms to goldmark-rendered HTML.
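
To illustrate one such transform, the sketch below converts a Mermaid code block into a .mermaid element and detects its presence. It assumes the input has the shape goldmark's default renderer emits for a fenced block tagged mermaid. The function names are hypothetical, and the package's actual PostProcess and NeedsMermaid may work differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// mermaidBlock matches <pre><code class="language-mermaid">...</code></pre>.
// Assumed input shape; a highlighter may emit different markup.
var mermaidBlock = regexp.MustCompile(
	`(?s)<pre><code class="language-mermaid">(.*?)</code></pre>`)

// convertMermaid (hypothetical) rewrites each matching block into a div
// that a Mermaid script can pick up by its class.
func convertMermaid(doc string) string {
	return mermaidBlock.ReplaceAllString(doc, `<div class="mermaid">${1}</div>`)
}

// needsMermaid (hypothetical) reports whether any .mermaid element exists.
func needsMermaid(doc string) bool {
	return strings.Contains(doc, `class="mermaid"`)
}

func main() {
	in := "<pre><code class=\"language-mermaid\">graph TD; A--&gt;B;\n</code></pre>"
	out := convertMermaid(in)
	fmt.Println(out)
	fmt.Println(needsMermaid(in), needsMermaid(out))
}
```

A caller would then emit the Mermaid script tags only when the check returns true, which matches the guidance given for MermaidScript.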

Types

type Diagnostic

type Diagnostic struct {
	Rule    string
	Line    int
	Column  int
	Message string
}

Diagnostic represents a document problem found during the build.

func CollectDiagnostics

func CollectDiagnostics(document ast.Node, source []byte) []Diagnostic

CollectDiagnostics collects structured warnings from a Markdown document.

func (Diagnostic) Position

func (d Diagnostic) Position() string

Position returns a position string suitable for log output.
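
A sketch of how a diagnostic might be formatted for logs. The local diagnostic type mirrors the documented fields, but the "line:column" format and the "heading-skip" rule name are assumptions, since the page does not show the actual output of Position.

```go
package main

import "fmt"

// diagnostic mirrors the documented Rule, Line, Column, and Message fields.
type diagnostic struct {
	Rule    string
	Line    int
	Column  int
	Message string
}

// position returns "line:column". The exact format is an assumption.
func (d diagnostic) position() string {
	return fmt.Sprintf("%d:%d", d.Line, d.Column)
}

// String combines position, rule, and message into one log line.
func (d diagnostic) String() string {
	return fmt.Sprintf("%s: [%s] %s", d.position(), d.Rule, d.Message)
}

func main() {
	d := diagnostic{
		Rule:    "heading-skip", // hypothetical rule name
		Line:    12,
		Column:  3,
		Message: "heading level jumps from 1 to 3",
	}
	fmt.Println(d)
}
```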

type DocumentProcessor

type DocumentProcessor struct {
	// contains filtered or unexported fields
}

DocumentProcessor is a batch document processor that supports concurrent processing.

func NewDocumentProcessor

func NewDocumentProcessor(maxConcurrency int) *DocumentProcessor

NewDocumentProcessor creates a document processor.

func (*DocumentProcessor) ClearCache

func (dp *DocumentProcessor) ClearCache()

ClearCache clears the cache.

func (*DocumentProcessor) ProcessFile

func (dp *DocumentProcessor) ProcessFile(filePath string) *ProcessingResult

ProcessFile processes a single Markdown file.

func (*DocumentProcessor) ProcessFiles

func (dp *DocumentProcessor) ProcessFiles(filePaths []string) []*ProcessingResult

ProcessFiles processes multiple files concurrently.

type HeadingInfo

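type HeadingInfo struct {
	Level  int    // Heading level (1-6)
	Text   string // Heading text content
	ID     string // Heading ID, used for cross-referencing
	Line   int    // Line on which the heading appears
	Column int    // Column at which the heading starts
}

HeadingInfo holds heading information used for table-of-contents generation.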

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser is the Markdown parser.

func NewParser

func NewParser(opts ...ParserOption) *Parser

NewParser creates and returns a new Markdown parser instance.

Example
package main

import (
	"fmt"

	"github.com/yeasy/mdpress/internal/markdown"
)

func main() {
	parser := markdown.NewParser()
	html, headings, err := parser.Parse([]byte("# Hello\n\nWorld"))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("HTML length:", len(html) > 0)
	fmt.Println("Headings:", len(headings))
}
Output:
HTML length: true
Headings: 1

func (*Parser) GetHeadings

func (p *Parser) GetHeadings() []HeadingInfo

GetHeadings returns all heading information collected so far.

func (*Parser) Parse

func (p *Parser) Parse(source []byte) (string, []HeadingInfo, error)

Parse parses Markdown source and returns the HTML and the heading information.

func (*Parser) ParseWithDiagnostics

func (p *Parser) ParseWithDiagnostics(source []byte) (string, []HeadingInfo, []Diagnostic, error)

ParseWithDiagnostics parses Markdown and also returns build-time warnings.

func (*Parser) SetCodeTheme

func (p *Parser) SetCodeTheme(theme string)

SetCodeTheme sets the code highlighting theme.

type ParserOption

type ParserOption func(*Parser)

ParserOption is the functional option type.

func WithCodeTheme

func WithCodeTheme(theme string) ParserOption

WithCodeTheme is an option that sets the code highlighting theme.

type ProcessingResult

type ProcessingResult struct {
	FilePath string
	HTML     string
	Headings []HeadingInfo
	Error    error
}

ProcessingResult is the result of processing a document.
