package markdown

Version: v0.7.2 (latest) · Published: Mar 30, 2026 · License: MIT · Imports: 18 · Imported by: 0

Repository

github.com/yeasy/mdpress

README ¶

Markdown Parser Module

This module provides a production-grade Markdown parser for the mdpress project, a Markdown-to-PDF book converter.

Overview

The markdown package wraps goldmark with carefully selected extensions to provide:

GFM Extensions: Tables, strikethrough, task lists, and autolinks
Code Syntax Highlighting: Via goldmark-highlighting
Footnotes: Full footnote support
Custom Heading IDs: Automatic ID generation for cross-referencing
Heading Collection: Automatic gathering of heading information for TOC generation
Cross-Reference Links: Support for internal document references

Features

Supported Markdown Extensions

Tables (GFM)
- Standard GitHub Flavored Markdown tables
- Column alignment support
Strikethrough (GFM)
- ~~text~~ syntax for strikethrough
Task Lists (GFM)
- - [ ] unchecked and - [x] checked syntax
Autolinks (GFM)
- <https://example.com> and <user@example.com> formats
Footnotes
- [^1] references with [^1]: content definitions
Code Highlighting
- Syntax highlighting for code blocks
- Multiple theme support (github, monokai, dracula, etc.)
Custom Heading IDs
- Automatic ID generation for headings
- Support for manual ID specification via attributes

Installation

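This module is part of the mdpress project. It lives under internal/, so Go only allows it to be imported by code inside the mdpress module itself. It requires: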

go get github.com/yuin/goldmark
go get github.com/yuin/goldmark-highlighting/v2

Usage

Basic Parsing

package main

import (
    "fmt"
    "github.com/yeasy/mdpress/internal/markdown"
)

func main() {
    parser := markdown.NewParser()

    source := []byte(`# Hello World
This is **bold** text.`)

    html, headings, err := parser.Parse(source)
    if err != nil {
        panic(err)
    }

    fmt.Println("HTML:", html)
    fmt.Println("Headings:", headings)
}

With Custom Code Theme

parser := markdown.NewParser(
    markdown.WithCodeTheme("monokai"),
)

html, headings, err := parser.Parse(source)

Collecting Headings for TOC

parser := markdown.NewParser()
// The HTML result is discarded here; only the headings are needed.
_, headings, err := parser.Parse(source)
if err != nil {
    log.Fatal(err)
}

// headings is []HeadingInfo with Level, Text, and ID
for _, h := range headings {
    fmt.Printf("%d. %s -> #%s\n", h.Level, h.Text, h.ID)
}

Changing Code Theme After Creation

parser := markdown.NewParser()
parser.SetCodeTheme("github")
html, _, err := parser.Parse(source)
if err != nil {
    log.Fatal(err)
}

API Reference

Parser

The main parser struct that holds the goldmark instance and manages parsing state.

type Parser struct {
    // ... private fields
}

NewParser

Creates and returns a new Markdown parser instance.

func NewParser(opts ...ParserOption) *Parser

Parameters:

opts: Variable number of ParserOption functions for customization

Returns:

*Parser: Initialized parser instance

Example:

parser := markdown.NewParser(
    markdown.WithCodeTheme("github"),
)

Parse

Parses Markdown source and returns the rendered HTML string together with the collected heading information.

func (p *Parser) Parse(source []byte) (string, []HeadingInfo, error)

Parameters:

source: Markdown source code as byte slice

Returns:

string: Generated HTML content
[]HeadingInfo: Slice of collected heading information
error: Any parsing errors

Example:

html, headings, err := parser.Parse([]byte("# Title\nContent"))
if err != nil {
    log.Fatal(err)
}

SetCodeTheme

Sets the code syntax highlighting theme.

func (p *Parser) SetCodeTheme(theme string)

Parameters:

theme: Theme name (e.g., "github", "monokai", "dracula")

Note: Reinitializes the parser with the new theme. Invalid theme names will fall back to the default style gracefully.

Supported Themes:

github (default)
monokai
dracula
solarized-dark
solarized-light
And others supported by goldmark-highlighting

HeadingInfo

Structure containing information about a heading.

type HeadingInfo struct {
    Level  int    // Heading level (1-6)
    Text   string // Heading text content
    ID     string // Custom heading ID for cross-referencing
    Line   int    // Line number of the heading
    Column int    // Column number of the heading
}

ParserOption

Functional option type for customizing parser behavior.

type ParserOption func(*Parser)

Built-in Options:

WithCodeTheme(theme string): Set code highlighting theme
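
The functional option pattern described above can be sketched as follows. This is a simplified stand-in, not the package's source: the real Parser fields are unexported and undocumented, so the codeTheme field is an assumption, and the default of "github" is taken from the Supported Themes list.

```go
package main

import "fmt"

// Parser is a simplified stand-in for the real type.
// codeTheme is an assumed field name used only for illustration.
type Parser struct {
	codeTheme string
}

// ParserOption has the same shape as the documented type.
type ParserOption func(*Parser)

// WithCodeTheme returns an option that overrides the theme.
func WithCodeTheme(theme string) ParserOption {
	return func(p *Parser) { p.codeTheme = theme }
}

// NewParser applies the defaults first, then each option in order,
// so later options win over earlier ones.
func NewParser(opts ...ParserOption) *Parser {
	p := &Parser{codeTheme: "github"}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	fmt.Println(NewParser().codeTheme)                         // github
	fmt.Println(NewParser(WithCodeTheme("monokai")).codeTheme) // monokai
}
```

The benefit of this shape is that new options can be added later without changing the NewParser signature.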

Examples

Complete Document Parsing

package main

import (
    "fmt"
    "strings"

    "github.com/yeasy/mdpress/internal/markdown"
)

func main() {
    parser := markdown.NewParser(
        markdown.WithCodeTheme("github"),
    )

    // Go raw string literals cannot contain backticks, so the code fence
    // marker is built separately and spliced into the document.
    fence := strings.Repeat("`", 3)

    md := []byte(`# Go Tutorial

## Chapter 1

Here's a code example:

` + fence + `go
func main() {
    fmt.Println("Hello")
}
` + fence + `

| Feature | Support |
|---------|---------|
| Tables  | Yes     |
| Code    | Yes     |

- [x] Implemented
- [ ] TODO
`)

    html, headings, err := parser.Parse(md)
    if err != nil {
        panic(err)
    }

    // Print TOC, indenting each heading by its level
    for _, h := range headings {
        fmt.Printf("%s# %s\n",
            strings.Repeat("  ", h.Level-1), h.Text)
    }

    // Print the rendered HTML
    fmt.Println(html)
}

Multi-Document Processing

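parser := markdown.NewParser()

documents := [][]byte{
    []byte("# Document 1\nContent..."),
    []byte("# Document 2\nContent..."),
}

for i, doc := range documents {
    html, headings, err := parser.Parse(doc)
    if err != nil {
        log.Printf("document %d: %v", i+1, err)
        continue
    }
    // Process html and headings
    fmt.Printf("document %d: %d bytes of HTML, %d headings\n", i+1, len(html), len(headings))
}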

Implementation Details

Heading ID Generation

IDs are automatically generated from heading text by:

Converting to lowercase
Removing special characters
Replacing spaces with hyphens
Trimming leading/trailing hyphens

Example: "Hello World" → "hello-world"
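
The four steps above can be sketched as a small helper. This is an illustration only: slugify is a hypothetical name, and which characters count as "special" is an assumption here (letters, digits, hyphens, and underscores are kept, everything else is dropped).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify is a hypothetical helper, not the package's actual implementation.
func slugify(text string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(text) { // step 1: lowercase
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r) || r == '-' || r == '_':
			b.WriteRune(r)
		case unicode.IsSpace(r):
			b.WriteRune('-') // step 3: spaces become hyphens
		}
		// step 2: any other character is dropped
	}
	return strings.Trim(b.String(), "-") // step 4: trim outer hyphens
}

func main() {
	fmt.Println(slugify("Hello World"))     // hello-world
	fmt.Println(slugify("  What's New?  ")) // whats-new
}
```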

Heading Collection

The parser uses a custom AST transformer to collect heading information during parsing. This is done in a thread-safe manner using mutexes.

Extensions Architecture

The module builds on goldmark's extension points:

Built-in GFM extensions for standard features
Custom transformers for heading ID generation
Cross-reference link resolver for internal references

Thread Safety

The Parser instance is thread-safe for concurrent parsing operations
Heading collection uses internal synchronization
Recommended: Create one Parser instance and reuse it

Performance

Benchmarks on typical documents:

Simple documents: ~1-2ms
Complex documents with multiple features: ~5-10ms

For optimal performance:

Reuse Parser instances
Parse in parallel for multiple documents
Consider caching results for frequently parsed content
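
One way to parse several documents in parallel is sketched below. It does not import mdpress: parseFunc is an assumed stand-in for something like a closure around parser.Parse, and fakeParse exists only so the sketch runs on its own.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parseFunc stands in for a wrapper around the real Parse method.
type parseFunc func(source []byte) (string, error)

// parseAll runs parse on every document concurrently and returns the
// results in input order.
func parseAll(parse parseFunc, docs [][]byte) ([]string, []error) {
	out := make([]string, len(docs))
	errs := make([]error, len(docs))
	var wg sync.WaitGroup
	for i, doc := range docs {
		wg.Add(1)
		go func(i int, doc []byte) {
			defer wg.Done()
			// Each goroutine writes only to its own index, so the
			// result slices need no extra locking.
			out[i], errs[i] = parse(doc)
		}(i, doc)
	}
	wg.Wait()
	return out, errs
}

func main() {
	// fakeParse is a placeholder for the real parser.
	fakeParse := func(src []byte) (string, error) {
		return "<p>" + strings.TrimSpace(string(src)) + "</p>", nil
	}
	out, _ := parseAll(fakeParse, [][]byte{[]byte("one"), []byte("two")})
	fmt.Println(out) // [<p>one</p> <p>two</p>]
}
```

This starts one goroutine per document. For a large book, a bounded worker pool may be a better fit.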

Error Handling

The Parse method can return two kinds of errors:

Syntax errors in the markdown (rare with goldmark)
Rendering errors (usually configuration issues)

html, headings, err := parser.Parse(source)
if err != nil {
    fmt.Printf("Parse error: %v\n", err)
    return // or handle the error another way
}
fmt.Printf("rendered %d bytes, %d headings\n", len(html), len(headings))

Testing

The module includes comprehensive tests:

Unit tests for basic functionality
Integration tests for extension support
Benchmark tests for performance measurement

Run tests with:

go test ./internal/markdown -v

Future Enhancements

Potential improvements for future versions:

Custom renderer for PDF-specific formatting
Custom CSS class insertion for styling
Metadata/front-matter extraction

Troubleshooting

Issue: Code blocks not highlighted

Solution: Ensure language tag is specified in code fence (e.g., ```go)

Issue: Table not rendering

Solution: Ensure table format follows GFM specification with proper separators

Issue: Heading IDs have conflicts

Solution: The parser automatically appends numbers to duplicate IDs (e.g., "hello-world-1")
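
The suffixing behaviour can be illustrated with a small counter per base ID. This is a sketch under assumptions: idRegistry and its methods are hypothetical names, and only the "hello-world-1" form of the suffix comes from the text above.

```go
package main

import "fmt"

// idRegistry is a hypothetical helper that counts how often each base ID
// has been requested.
type idRegistry struct {
	seen map[string]int
}

func newIDRegistry() *idRegistry {
	return &idRegistry{seen: make(map[string]int)}
}

// unique returns base on first use and base-N on each later use.
func (r *idRegistry) unique(base string) string {
	n := r.seen[base]
	r.seen[base] = n + 1
	if n == 0 {
		return base
	}
	return fmt.Sprintf("%s-%d", base, n)
}

func main() {
	r := newIDRegistry()
	fmt.Println(r.unique("hello-world")) // hello-world
	fmt.Println(r.unique("hello-world")) // hello-world-1
	fmt.Println(r.unique("hello-world")) // hello-world-2
}
```

This simple version does not guard against a heading whose own text already produces "hello-world-1". A fuller implementation would check generated IDs against the registry as well.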

Contributing

When extending this module:

Maintain backward compatibility
Add tests for new features
Follow the existing code style
Update this documentation
Ensure thread safety for concurrent operations

License

This module is part of the mdpress project and is distributed under the MIT license.

Documentation ¶

Overview ¶

math.go implements pre/post processing for math formulas.

Problem: goldmark follows the CommonMark spec, under which `_` inside words may be treated as an emphasis delimiter, so `$x_1^2$` becomes `$x<em>1</em>^2$` and the formula structure breaks.

Solution: Before goldmark processes the Markdown source, replace $$...$$ and $...$ with placeholder tokens (e.g. MDPMATHBLOCK000000) that contain no Markdown special characters. After goldmark renders the HTML, swap each placeholder for an HTML span element that KaTeX auto-render can find.
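
The placeholder round trip might look roughly like the sketch below. Only the MDPMATHBLOCK token format comes from the description above. The function names, the MDPMATHINLINE prefix, the regular expressions, and the span class names are all assumptions, and goldmark is replaced here by a plain paragraph wrapper.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// Assumed patterns: block math may span lines, inline math may not.
var (
	blockMath  = regexp.MustCompile(`(?s)\$\$(.+?)\$\$`)
	inlineMath = regexp.MustCompile(`\$([^$\n]+?)\$`)
)

// protectMath swaps each formula for a token and returns the rewritten
// source plus a map from token to the HTML that should replace it later.
func protectMath(src string) (string, map[string]string) {
	saved := make(map[string]string)
	n := 0
	swap := func(re *regexp.Regexp, prefix, class string) {
		src = re.ReplaceAllStringFunc(src, func(m string) string {
			token := fmt.Sprintf("%s%06d", prefix, n)
			n++
			// The delimiters are kept inside the span so that an
			// auto-render pass can still locate the formula.
			saved[token] = `<span class="` + class + `">` + html.EscapeString(m) + `</span>`
			return token
		})
	}
	swap(blockMath, "MDPMATHBLOCK", "math-block") // block first, so $$ is not read as two $
	swap(inlineMath, "MDPMATHINLINE", "math-inline")
	return src, saved
}

// restoreMath puts the saved spans back into the rendered HTML.
func restoreMath(rendered string, saved map[string]string) string {
	for token, span := range saved {
		rendered = strings.ReplaceAll(rendered, token, span)
	}
	return rendered
}

func main() {
	protected, saved := protectMath("Let $x_1^2$ be given.")
	fmt.Println(protected) // Let MDPMATHINLINE000000 be given.

	rendered := "<p>" + protected + "</p>" // stand-in for the goldmark render step
	fmt.Println(restoreMath(rendered, saved))
}
```

The inline pattern is deliberately naive: prose such as "costs $5 and $10" would also match, so a real implementation needs stricter rules for what counts as a formula.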

Package markdown provides Markdown parsing and HTML conversion. Built on the goldmark library, it supports GFM extensions, syntax highlighting, footnotes, and more.

Core types:

Parser: Markdown parser; call Parse() to get HTML and a heading list
HeadingInfo: Heading metadata (level, text, ID), used for TOC generation

Usage example:

p := markdown.NewParser(markdown.WithCodeTheme("monokai"))
html, headings, err := p.Parse(source)

postprocess.go performs post-processing on HTML emitted by goldmark. Includes: GFM Alert conversion ([!NOTE] etc.) and Mermaid code block conversion.

Index ¶

func NeedsMermaid(html string) bool
type Diagnostic
type HeadingInfo
type Parser
- func NewParser(opts ...ParserOption) *Parser
type ParserOption
- func WithCodeTheme(theme string) ParserOption

Examples ¶

NewParser

Functions ¶

func NeedsMermaid ¶

func NeedsMermaid(html string) bool

NeedsMermaid reports whether the HTML contains any Mermaid diagram elements.

Types ¶

type Diagnostic ¶

type Diagnostic struct {
	Rule    string
	Line    int
	Column  int
	Message string
}

Diagnostic represents a document issue found during the build.

type HeadingInfo ¶

type HeadingInfo struct {
	Level  int    // Heading level (1-6)
	Text   string // Heading text content
	ID     string // Heading ID, used for cross-references
	Line   int    // Line number of the heading
	Column int    // Column number of the heading
}

HeadingInfo holds heading metadata, used for TOC generation.
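
As an illustration of TOC generation from this metadata, the sketch below turns a heading slice into an indented Markdown link list. The struct is copied locally so the example compiles without mdpress, and renderTOC is a hypothetical helper rather than part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// HeadingInfo mirrors the documented struct.
type HeadingInfo struct {
	Level  int
	Text   string
	ID     string
	Line   int
	Column int
}

// renderTOC writes one list item per heading, indented two spaces per
// level below the first.
func renderTOC(headings []HeadingInfo) string {
	var b strings.Builder
	for _, h := range headings {
		depth := h.Level - 1
		if depth < 0 {
			depth = 0 // strings.Repeat panics on a negative count
		}
		fmt.Fprintf(&b, "%s- [%s](#%s)\n", strings.Repeat("  ", depth), h.Text, h.ID)
	}
	return b.String()
}

func main() {
	fmt.Print(renderTOC([]HeadingInfo{
		{Level: 1, Text: "Go Tutorial", ID: "go-tutorial"},
		{Level: 2, Text: "Chapter 1", ID: "chapter-1"},
	}))
}
```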

type Parser ¶

type Parser struct {
	// contains filtered or unexported fields
}

Parser is the Markdown parser.

func NewParser ¶

func NewParser(opts ...ParserOption) *Parser

NewParser creates and returns a new Markdown parser instance.

Example ¶

package main

import (
	"fmt"

	"github.com/yeasy/mdpress/internal/markdown"
)

func main() {
	parser := markdown.NewParser()
	html, headings, err := parser.Parse([]byte("# Hello\n\nWorld"))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("HTML length:", len(html) > 0)
	fmt.Println("Headings:", len(headings))
}

Output:
HTML length: true
Headings: 1

func (*Parser) Parse ¶

func (p *Parser) Parse(source []byte) (string, []HeadingInfo, error)

Parse parses Markdown source and returns HTML and heading information.

func (*Parser) ParseWithDiagnostics ¶

func (p *Parser) ParseWithDiagnostics(source []byte) (string, []HeadingInfo, []Diagnostic, error)

ParseWithDiagnostics parses Markdown and also returns build-time warnings.
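
A caller might report the returned diagnostics in a compiler-style format, as sketched below. The Diagnostic struct is copied locally so the example runs on its own. formatDiagnostics, the file name, and the rule names in main are hypothetical, since the source does not list the rules the parser emits.

```go
package main

import (
	"fmt"
	"sort"
)

// Diagnostic mirrors the documented struct.
type Diagnostic struct {
	Rule    string
	Line    int
	Column  int
	Message string
}

// formatDiagnostics sorts by position and renders "file:line:col: [rule] message".
func formatDiagnostics(file string, diags []Diagnostic) []string {
	sorted := append([]Diagnostic(nil), diags...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Line != sorted[j].Line {
			return sorted[i].Line < sorted[j].Line
		}
		return sorted[i].Column < sorted[j].Column
	})
	lines := make([]string, 0, len(sorted))
	for _, d := range sorted {
		lines = append(lines, fmt.Sprintf("%s:%d:%d: [%s] %s", file, d.Line, d.Column, d.Rule, d.Message))
	}
	return lines
}

func main() {
	// The rule names below are made up for the example.
	diags := []Diagnostic{
		{Rule: "example-rule-b", Line: 9, Column: 1, Message: "second issue"},
		{Rule: "example-rule-a", Line: 2, Column: 5, Message: "first issue"},
	}
	for _, line := range formatDiagnostics("chapter1.md", diags) {
		fmt.Println(line)
	}
}
```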

func (*Parser) SetCodeTheme ¶

func (p *Parser) SetCodeTheme(theme string)

SetCodeTheme sets the syntax highlighting theme.

type ParserOption ¶

type ParserOption func(*Parser)

ParserOption is a functional option type.

func WithCodeTheme ¶

func WithCodeTheme(theme string) ParserOption

WithCodeTheme is an option that sets the syntax highlighting theme.
