parsercombinator

package module
v1.0.13
Published: Jul 12, 2025 License: Apache-2.0 Imports: 9 Imported by: 4

README

Parser Combinator Library for Go

A powerful and flexible parser combinator library for Go, specifically designed for building parsers that work with pre-tokenized input to construct Abstract Syntax Trees (ASTs).

Features

  • Token-based parsing: Works with pre-tokenized input rather than raw strings
  • Type-safe: Leverages Go's generics for type safety
  • Comprehensive error handling: Advanced error reporting with custom messages
  • Stack overflow protection: Built-in recursion depth limiting to prevent infinite loops
  • Debugging support: Built-in tracing capabilities
  • Recovery mechanisms: Error recovery for robust parsing
  • Lookahead support: Positive and negative lookahead operations
  • Composable: Easy to combine simple parsers into complex ones

Installation

go get github.com/shibukawa/parsercombinator

Quick Start

package main

import (
    "fmt"
    "strconv"
    pc "github.com/shibukawa/parsercombinator"
)

// Define a simple calculator parser
func main() {
    // Create basic parsers
    digit := pc.Trace("digit", func(pctx *pc.ParseContext[int], src []pc.Token[int]) (int, []pc.Token[int], error) {
        if src[0].Type == "raw" {
            i, err := strconv.Atoi(src[0].Raw)
            if err != nil {
                return 0, nil, pc.NewErrNotMatch("integer", src[0].Raw, src[0].Pos)
            }
            return 1, []pc.Token[int]{{Type: "digit", Pos: src[0].Pos, Val: i}}, nil
        }
        return 0, nil, pc.NewErrNotMatch("digit", src[0].Type, src[0].Pos)
    })

    operator := pc.Trace("operator", func(pctx *pc.ParseContext[int], src []pc.Token[int]) (int, []pc.Token[int], error) {
        if src[0].Type == "raw" && (src[0].Raw == "+" || src[0].Raw == "-") {
            return 1, []pc.Token[int]{{Type: "operator", Pos: src[0].Pos, Raw: src[0].Raw}}, nil
        }
        return 0, nil, pc.NewErrNotMatch("operator", src[0].Raw, src[0].Pos)
    })

    // Combine parsers
    expression := pc.Seq(digit, operator, digit)

    // Parse input
    context := pc.NewParseContext[int]()
    result, err := pc.EvaluateWithRawTokens(context, []string{"5", "+", "3"}, expression)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }
    
    fmt.Printf("Result: %v\n", result) // [5, 3] (operator values are 0)
}

Core Components

Parser Function

The core type is Parser[T]:

type Parser[T any] func(*ParseContext[T], []Token[T]) (consumed int, newTokens []Token[T], err error)
Token Structure
type Token[T any] struct {
    Type string  // Token type identifier
    Pos  *Pos    // Position information
    Raw  string  // Original raw text
    Val  T       // Parsed value
}
Parse Context
type ParseContext[T any] struct {
    Tokens               []Token[T]    // Input tokens
    Pos                  int           // Current position
    RemainedTokens       []Token[T]    // Remaining tokens after parsing
    Results              []Token[T]    // Parsed result tokens
    Traces               []*TraceInfo  // Debug traces
    Errors               []*ParseError // Collected errors
    Depth                int           // Current recursion depth
    TraceEnable          bool          // Enable/disable tracing
    MaxDepth             int           // Maximum allowed recursion depth (0 = no limit)
    OrMode               OrMode        // Or parser behavior mode (default: OrModeSafe)
    CheckTransformSafety bool          // Enable transformation safety checks (default: false)
}

Basic Combinators

Sequence (Seq)

Matches parsers in sequence:

parser := pc.Seq(digit, operator, digit) // Matches: digit operator digit
Choice (Or)

Tries parsers in order, returns first successful match:

parser := pc.Or(digit, stringLiteral, identifier) // Matches any of the alternatives
Choice Parsing with Or

The Or parser tries multiple alternatives and returns the result from the parser that consumes the most tokens (longest match). This ensures predictable behavior and compatibility with complex expression patterns.

// Basic usage - tries each parser in order, returns longest match
parser := pc.Or(
    longKeyword,    // e.g., "interface"
    shortKeyword,   // e.g., "if"
    identifier,     // e.g., "interfaceType"
)

// For input "interface", this will match longKeyword (9 tokens)
// rather than shortKeyword (2 tokens), even if shortKeyword appears first
Important Considerations
  1. Longest Match Behavior: Always returns the alternative that consumes the most tokens
  2. Order Independence: For unambiguous grammars, parser order doesn't matter
  3. Ambiguous Grammars: Be careful with overlapping patterns
// Good: Unambiguous alternatives
parser := pc.Or(
    stringLiteral,   // "hello"
    numberLiteral,   // 42
    identifier,      // variable
)

// Potentially problematic: Overlapping patterns
parser := pc.Or(
    pc.String("for"),     // exact match
    identifier,           // any identifier (including "for")
)
// Solution: Put more specific patterns first or use longer matches
Working with Expression Parsing

The longest match behavior is particularly useful for expression parsing:

// Expression parser that handles operator precedence correctly
expr := pc.Or(
    binaryExpression,    // "a + b * c" (longer)
    primaryExpression,   // "a" (shorter)
)
// Will correctly choose the longer binary expression
Repetition
  • ZeroOrMore: Matches zero or more occurrences
  • OneOrMore: Matches one or more occurrences
  • Repeat: Matches with specific min/max counts
  • Optional: Matches zero or one occurrence
numbers := pc.ZeroOrMore("numbers", digit)
requiredNumbers := pc.OneOrMore("required-numbers", digit)
exactlyThree := pc.Repeat("exactly-three", 3, 3, digit)
maybeDigit := pc.Optional(digit)

Advanced Features

Lookahead Operations
// Positive lookahead - check without consuming
parser := pc.Seq(
    pc.Lookahead(keyword), // Check for keyword
    actualParser,          // Then parse normally
)

// Negative lookahead - ensure something doesn't follow
parser := pc.Seq(
    identifier,
    pc.NotFollowedBy(digit), // Identifier not followed by digit
)

// Peek - get result without consuming
parser := pc.Seq(
    pc.Peek(nextToken), // See what's coming
    conditionalParser,   // Parse based on peek result
)
Error Handling and User-Friendly Messages
// Label for better error messages
numberParser := pc.Label("number", digit)

// Expected - for specific error messages
parser := pc.Or(
    validExpression,
    pc.Expected[int]("closing parenthesis"),
)

// Fail - for explicit failures
parser := pc.Or(
    implementedFeature,
    pc.Fail[int]("feature not implemented in this version"),
)
Error Recovery
// Recover from errors and continue parsing
parser := pc.Recover(
    pc.Digit(),        // Precondition check
    parseStatement,    // Main parsing logic
    pc.Until(";"),     // Recovery: skip until semicolon
)
Transformation

Transform parsed results:

// Transform tokens
addParser := pc.Trans(
    pc.Seq(digit, operator, digit),
    func(pctx *pc.ParseContext[int], tokens []pc.Token[int]) ([]pc.Token[int], error) {
        result := tokens[0].Val + tokens[2].Val // Add the numbers
        return []pc.Token[int]{{
            Type: "result",
            Pos:  tokens[0].Pos,
            Val:  result,
        }}, nil
    },
)
⚠️ Important: Avoiding Infinite Loops in Transformations

When using Trans() to transform tokens, be extremely careful about token compatibility to avoid infinite loops:

// ❌ DANGEROUS: This can cause infinite loops
badParser := pc.Trans(
    pc.Trace("digit", func(pctx *pc.ParseContext[int], src []pc.Token[int]) (int, []pc.Token[int], error) {
        if src[0].Type == "raw" {
            i, err := strconv.Atoi(src[0].Raw)
            if err != nil {
                return 0, nil, pc.NewErrNotMatch("integer", src[0].Raw, src[0].Pos)
            }
            // ❌ PROBLEM: Still produces "raw" type that can be re-parsed
            return 1, []pc.Token[int]{{Type: "raw", Pos: src[0].Pos, Raw: src[0].Raw, Val: i}}, nil
        }
        return 0, nil, pc.NewErrNotMatch("digit", src[0].Type, src[0].Pos)
    }),
    func(pctx *pc.ParseContext[int], tokens []pc.Token[int]) ([]pc.Token[int], error) {
        // The transformed token is still "raw" type - can be parsed again!
        return tokens, nil // ❌ This creates an infinite loop risk
    },
)

// ✅ SAFE: Change token type to prevent re-parsing
goodParser := pc.Trans(
    pc.Trace("digit", func(pctx *pc.ParseContext[int], src []pc.Token[int]) (int, []pc.Token[int], error) {
        if src[0].Type == "raw" {
            i, err := strconv.Atoi(src[0].Raw)
            if err != nil {
                return 0, nil, pc.NewErrNotMatch("integer", src[0].Raw, src[0].Pos)
            }
            return 1, []pc.Token[int]{{Type: "digit", Pos: src[0].Pos, Val: i}}, nil
        }
        return 0, nil, pc.NewErrNotMatch("digit", src[0].Type, src[0].Pos)
    }),
    func(pctx *pc.ParseContext[int], tokens []pc.Token[int]) ([]pc.Token[int], error) {
        // ✅ SAFE: Token type is "digit", not "raw" - won't be re-parsed
        return []pc.Token[int]{{
            Type: "number",  // Different type prevents re-parsing
            Pos:  tokens[0].Pos,
            Val:  tokens[0].Val,
        }}, nil
    },
)

Key Safety Rules:

  1. Always change token types: Never output tokens that could be consumed by the same parser again
  2. Check for state changes: Ensure transformations make meaningful progress (different Type or Val)
  3. Use stack overflow protection: Set appropriate MaxDepth limits as a safety net
  4. Test with tracing: Enable tracing to detect unexpected re-parsing patterns

Detection Strategy:

// Enable tracing to detect infinite loops during development
context := pc.NewParseContext[int]()
context.TraceEnable = true
context.MaxDepth = 50 // Lower limit for debugging

result, err := pc.EvaluateWithRawTokens(context, input, parser)
if errors.Is(err, pc.ErrStackOverflow) {
    fmt.Println("Potential infinite loop detected!")
    context.DumpTrace() // Examine the trace for repeated patterns
}
Automatic Transformation Safety Checks

The library includes optional automatic safety checks that can detect potentially dangerous transformations at runtime:

// Enable automatic transformation safety checks
context := pc.NewParseContext[int]()
context.OrMode = pc.OrModeSafe          // Must be in Safe mode
context.CheckTransformSafety = true     // Enable safety checks

// Parser with potentially unsafe transformation
parser := pc.Trans(
    someParser,
    func(pctx *pc.ParseContext[int], tokens []pc.Token[int]) ([]pc.Token[int], error) {
        // If this transformation returns tokens that the same parser 
        // would consume again with the same result, a warning will be logged
        return tokens, nil // Identity transformation - potentially unsafe!
    },
)

How It Works:

  1. When OrMode is OrModeSafe and CheckTransformSafety is enabled, a check runs after each transformation
  2. The system attempts to re-parse the transformed tokens with the same parser
  3. If the re-parsing produces identical results, a warning is logged to stderr
  4. The parsing continues normally, but the warning helps identify potential infinite loops

Safety Check Output Example:

Warning: Transformation safety check failed: potential infinite loop in transformation at myfile.go:123 - parser produces same result when applied to transformed tokens

Configuration:

  • Only active when OrMode is OrModeSafe
  • Must explicitly enable with CheckTransformSafety = true
  • Warnings are logged to stderr but don't stop parsing
  • Uses runtime caller information to show exact file and line location

Limitations:

  • Cannot detect all unsafe patterns (e.g., side effects in transformations)
  • May produce false positives for complex transformations with external state
  • Performance overhead when enabled (use primarily during development)
  • Only checks immediate re-parsing, not multi-step transformation chains

🧵 Stepwise, Flexible Parsing: Favoring Loose, Composable Patterns Over Monolithic Parsers

Traditional parser combinator tutorials often encourage writing a single, strict, monolithic parser that consumes the entire input in one go. However, this approach can make your parser:

  • Hard to maintain: Large, one-shot parsers are difficult to debug and extend
  • Unfriendly error messages: Errors are often reported only at the top level, making it hard to pinpoint the real cause
  • Difficult to reuse: You can't easily extract or reuse sub-parsers for partial matching, splitting, or iterative extraction
Design Intent: Stepwise, Composable Parsing

This library encourages a stepwise, composable parsing style:

  1. Divide and conquer: Split your parsing into small, focused steps (e.g., split statements, then parse each statement)
  2. Partial matching: Use loose, tolerant patterns to extract chunks, then parse the inside with stricter rules
  3. Iterative extraction: Use combinators like Find, Split, or FindIter (see below) to process input in stages
  4. Better error messages: By isolating each step, you can provide more precise, user-friendly errors
Example: Stepwise Parsing for Statements

Suppose you want to parse a list of statements, but want to give a good error message for each statement, not just for the whole program:

// Step 1: Split the token stream into statements (e.g., by semicolon)
ctx := pc.NewParseContext[ASTNode]()
statements := pc.Split(ctx, pc.Literal(";"), tokens) // tokens is the pre-tokenized input

// Step 2: A parser for a single statement
statementParser := pc.Or(assignStmt, ifStmt, exprStmt)

// Step 3: Parse each chunk between separators, collecting errors per statement
var results []ASTNode
for i, stmt := range statements {
    stmtCtx := pc.NewParseContext[ASTNode]()
    nodes, err := pc.Evaluate(stmtCtx, stmt.Skipped, statementParser)
    if err != nil {
        fmt.Printf("Error in statement %d: %v\n", i+1, err)
        continue
    }
    results = append(results, nodes...)
}
Example: Partial Matching with Find

You can extract only the relevant part of the input, then parse it:

// Find the first block delimited by braces
ctx := pc.NewParseContext[ASTNode]()
_, blockTokens, _, _, found := pc.Find(ctx, pc.Between(pc.Literal("{"), pc.Literal("}")), tokens)
if found {
    nodes, err := pc.Evaluate(ctx, blockTokens, blockParser)
    // ...
}
Example: Iterative Extraction with FindIter
// Extract all quoted strings from the token stream
ctx := pc.NewParseContext[string]()
for _, consume := range pc.FindIter(ctx, quotedStringParser, tokens) {
    // consume.Match holds the matched quoted-string tokens
    // consume.Skipped holds the tokens before the match
}
Benefits
  • Easier debugging: Isolate and test each step separately
  • Better error messages: Report errors at the most relevant level
  • Flexible: Mix and match strict and loose parsing as needed
  • Reusable: Use the same sub-parsers for different tasks (e.g., validation, extraction, transformation)
API: Find, Split, SplitN, FindIter

These utility APIs make partial and stepwise parsing easy and idiomatic in Go:

Find

Finds the first match of a parser in the input tokens and returns the tokens before the match, the matched tokens, the number of tokens consumed, the tokens after the match, and whether a match was found.

skipped, match, consume, remained, found := pc.Find(ctx, parser, tokens)
  • skipped: tokens before the match
  • match: the matched (converted) tokens
  • consume: the number of input tokens consumed to produce match; without a Trans(), it equals len(match)
  • remained: tokens after the match
  • found: true if a match was found
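As a minimal sketch of a Find call (numberParser and the tokens produced by your tokenizer are assumed):

ctx := pc.NewParseContext[int]()
skipped, match, consume, remained, found := pc.Find(ctx, numberParser, tokens)
if found {
    fmt.Printf("skipped %d tokens, matched %d (consumed %d), %d remaining\n",
        len(skipped), len(match), consume, len(remained))
}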
Split

Splits the input tokens by a separator parser, returning a slice of Consume values. Each Consume contains the tokens skipped before the separator and the matched (converted) tokens for the separator itself. The last element's Match is nil and its Skipped holds the remaining tail. (Like strings.Split, but with richer information.)

for _, consume := range pc.Split(ctx, sepParser, tokens) {
    // consume.Skipped: tokens before the separator
    // consume.Match:   matched (converted) tokens for the separator, or nil at the end
    // consume.Consume: number of tokens consumed before conversion to consume.Match
    // consume.Last:    true if this is the last piece
}
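As a concrete sketch: the separator parser is hand-written with Trace so the example relies only on documented APIs, and the tokens are built by hand using the same "raw" convention the Quick Start uses (normally EvaluateWithRawTokens would do this for you):

semicolon := pc.Trace("semicolon", func(pctx *pc.ParseContext[int], src []pc.Token[int]) (int, []pc.Token[int], error) {
    if src[0].Type == "raw" && src[0].Raw == ";" {
        return 1, []pc.Token[int]{{Type: "separator", Pos: src[0].Pos, Raw: src[0].Raw}}, nil
    }
    return 0, nil, pc.NewErrNotMatch(";", src[0].Raw, src[0].Pos)
})

ctx := pc.NewParseContext[int]()
tokens := []pc.Token[int]{
    {Type: "raw", Raw: "1", Pos: &pc.Pos{Index: 0}},
    {Type: "raw", Raw: ";", Pos: &pc.Pos{Index: 1}},
    {Type: "raw", Raw: "2", Pos: &pc.Pos{Index: 2}},
    {Type: "raw", Raw: ";", Pos: &pc.Pos{Index: 3}},
    {Type: "raw", Raw: "3", Pos: &pc.Pos{Index: 4}},
}
for _, piece := range pc.Split(ctx, semicolon, tokens) {
    // each piece.Skipped holds the tokens between separators
    fmt.Println(len(piece.Skipped), piece.Last)
}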
SplitN

Splits the input tokens by a separator parser, up to N pieces, returning a slice of Consume values (like Split, but limited to N splits).

for _, consume := range pc.SplitN(ctx, sepParser, tokens, n) {
    // consume.Skipped: tokens before the separator
    // consume.Match:   matched (converted) tokens for the separator, or nil at the end
    // consume.Consume: number of tokens consumed before conversion to consume.Match
    // consume.Last:    true if this is the last piece
}
FindIter

Iterates over all non-overlapping matches of a parser in the input tokens. FindIter returns a Go iterator (iter.Seq2) that yields the loop index and a Consume value for each match. On the last iteration, Match is nil and Skipped holds the remaining tail.

Example usage (see easy_test.go for real code):

for i, consume := range pc.FindIter(ctx, parser, tokens) {
    // i:               loop index
    // consume.Skipped: tokens before the match
    // consume.Match:   matched (converted) tokens, or nil at the end
    // consume.Consume: number of tokens consumed before conversion to consume.Match
    // consume.Last:    true if this is the last iteration
}

This Go-idiomatic iterator pattern allows you to process matches and skipped regions in a natural, readable way. You can break early from the loop as needed.

These APIs allow you to build flexible, composable, and user-friendly parsers that can extract, split, and iterate over parts of your token stream with minimal boilerplate.

See easy_test.go and examples/ for concrete usage.

Recursive Parsers with Alias and Lazy

Parser combinators provide two approaches for handling recursive grammars:

pc.Lazy - Simple Self-Recursion

For simple self-referential parsers, use pc.Lazy:

var expression pc.Parser[Entity]

expression = pc.Or(
    literal(),
    pc.Trans(
        pc.Seq(
            leftParen(),
            pc.Lazy(func() pc.Parser[Entity] { return expression }),
            rightParen(),
        ),
        func(pctx *pc.ParseContext[Entity], tokens []pc.Token[Entity]) ([]pc.Token[Entity], error) {
            return []pc.Token[Entity]{tokens[1]}, nil
        },
    ),
)
pc.NewAlias - Complex and Mutual Recursion

For mutual recursion or complex grammars, use NewAlias:

// Define recursive grammar safely
defineExpr, exprAlias := pc.NewAlias[int]("expression")

// Primary expressions (numbers, parenthesized expressions)
primaryExpr := pc.Or(
    digit,
    pc.Trans(
        pc.Seq(pc.Literal("("), exprAlias, pc.Literal(")")), // Safe recursive reference
        func(pctx *pc.ParseContext[int], tokens []pc.Token[int]) ([]pc.Token[int], error) {
            return []pc.Token[int]{tokens[1]}, nil // Return inner expression
        },
    ),
)

// Define the alias (this completes the recursion)
expression := defineExpr(primaryExpr)
⚠️ Critical: Avoiding Left Recursion

Left recursion causes infinite loops and must be avoided in parser combinators. Understanding and preventing this is crucial for safe parsing.

What Is Left Recursion?

Left recursion occurs when a parser rule directly or indirectly calls itself as the first step in parsing:

// ❌ DANGEROUS: Direct left recursion
expressionBody, expression := pc.NewAlias[int]("expression")
expression = expressionBody(
    pc.Or(
        pc.Seq(expression, operator, expression), // ← 'expression' calls itself first!
        number,
    ),
)

// ❌ DANGEROUS: Indirect left recursion
// A → B C, B → A d   (A indirectly calls itself through B)
defineA, aliasA := pc.NewAlias[int]("A")
defineB, aliasB := pc.NewAlias[int]("B")

parserA := defineA(pc.Seq(aliasB, pc.Literal("C")))
parserB := defineB(pc.Seq(aliasA, pc.Literal("d")))  // ← Indirect recursion!
Why Left Recursion is Dangerous
  1. Infinite Construction Loop: Parsers are Go functions/closures. During construction, left recursion creates an infinite call chain
  2. Stack Overflow: The Go runtime stack overflows before any parsing begins
  3. No Runtime Protection: This happens at parser construction time, not parse time
  4. Silent Failure: Often manifests as mysterious stack overflow errors
Safe Expression Parsing Patterns

Use precedence climbing and right recursion instead:

// ✅ SAFE: Precedence climbing with iterative patterns
func CreateSafeExpressionParser() pc.Parser[int] {
    defineExpr, exprAlias := pc.NewAlias[int]("expr")
    
    // Primary expressions (highest precedence)
    primaryExpr := pc.Or(
        Number(),
        pc.Trans( // Parenthesized expressions
            pc.Seq(pc.Literal("("), exprAlias, pc.Literal(")")),
            func(pctx *pc.ParseContext[int], tokens []pc.Token[int]) ([]pc.Token[int], error) {
                return []pc.Token[int]{tokens[1]}, nil
            },
        ),
    )
    
    // Multiplication/Division (higher precedence)
    mulExpr := pc.Trans(
        pc.Seq(
            primaryExpr,
            pc.ZeroOrMore("mul_ops", pc.Seq(
                pc.Or(pc.Literal("*"), pc.Literal("/")),
                primaryExpr,
            )),
        ),
        func(pctx *pc.ParseContext[int], tokens []pc.Token[int]) ([]pc.Token[int], error) {
            result := tokens[0].Val
            for i := 1; i < len(tokens); i += 2 {
                op := tokens[i].Raw
                right := tokens[i+1].Val
                switch op {
                case "+": result += right
                case "-": result -= right
                case "*": result *= right
                case "/": result /= right
                }
            }
            return []pc.Token[int]{{Type: "expr", Val: result, Pos: tokens[0].Pos}}, nil
        },
    )
    
    // Addition/Subtraction (lower precedence)
    addExpr := pc.Trans(
        pc.Seq(
            mulExpr,
            pc.ZeroOrMore("add_ops", pc.Seq(
                pc.Or(pc.Literal("+"), pc.Literal("-")),
                mulExpr,
            )),
        ),
        func(pctx *pc.ParseContext[int], tokens []pc.Token[int]) ([]pc.Token[int], error) {
            result := tokens[0].Val
            for i := 1; i < len(tokens); i += 2 {
                op := tokens[i].Raw
                right := tokens[i+1].Val
                switch op {
                case "+": result += right
                case "-": result -= right
                }
            }
            return []pc.Token[int]{{Type: "result", Val: result, Pos: tokens[0].Pos}}, nil
        },
    )
    
    // Complete the recursion
    defineExpr(addExpr)
    return addExpr
}
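A hedged usage sketch of the parser built above (Number() and the Literal helper are assumed to follow the Quick Start's "raw" token conventions):

ctx := pc.NewParseContext[int]()
results, err := pc.EvaluateWithRawTokens(ctx, []string{"1", "+", "2", "*", "3"}, CreateSafeExpressionParser())
if err != nil {
    fmt.Printf("parse error: %v\n", err)
    return
}
fmt.Println(results) // with correct precedence, 2*3 is evaluated before the addition

Building AST Nodes

The next examples build an AST instead of computing a value. They reference ASTNode, LiteralNode, and BinaryOpNode without defining them; a minimal sketch of what such types could look like (an assumption for illustration, not part of the library) is:

// ASTNode is the common interface for the AST nodes used in these examples.
type ASTNode interface {
    Type() string
    Position() *pc.Pos
    String() string
    Accept(v ASTVisitor) error // ASTVisitor is sketched under "Post-Tree Processing Patterns"
}

// LiteralNode holds a literal value.
type LiteralNode struct {
    pos   *pc.Pos
    value any
}

func (n *LiteralNode) Type() string              { return "Literal" }
func (n *LiteralNode) Position() *pc.Pos         { return n.pos }
func (n *LiteralNode) String() string            { return fmt.Sprint(n.value) }
func (n *LiteralNode) Accept(v ASTVisitor) error { return v.VisitLiteral(n) }

// BinaryOpNode holds a binary operation with two child nodes.
type BinaryOpNode struct {
    pos         *pc.Pos
    left, right ASTNode
    op          string
}

func (n *BinaryOpNode) Type() string              { return "BinaryOp" }
func (n *BinaryOpNode) Position() *pc.Pos         { return n.pos }
func (n *BinaryOpNode) String() string            { return fmt.Sprintf("(%s %s %s)", n.left, n.op, n.right) }
func (n *BinaryOpNode) Accept(v ASTVisitor) error { return v.VisitBinaryOp(n) }

With node types like these in place, parsers can emit AST nodes by placing them in a token's Val: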

func NumberLiteral() pc.Parser[ASTNode] {
    return pc.Trans(
        pc.Label("number literal", Digit()),
        func(pctx *pc.ParseContext[ASTNode], tokens []pc.Token[ASTNode]) ([]pc.Token[ASTNode], error) {
            // Extract value from original token and create new AST node
            digitToken := tokens[0]
            astNode := &LiteralNode{
                pos:   digitToken.Pos,
                value: digitToken.Val, // Preserve original int value
            }
            
            return []pc.Token[ASTNode]{{
                Type: "ast_node",
                Pos:  digitToken.Pos,
                Val:  astNode,
            }}, nil
        },
    )
}

func BinaryExpression() pc.Parser[ASTNode] {
    return pc.Trans(
        pc.Seq(NumberLiteral(), Operator(), NumberLiteral()),
        func(pctx *pc.ParseContext[ASTNode], tokens []pc.Token[ASTNode]) ([]pc.Token[ASTNode], error) {
            // Reference existing AST nodes to build new node
            leftNode := tokens[0].Val.(ASTNode)    // Old node reference
            opToken := tokens[1]                   // Operator token
            rightNode := tokens[2].Val.(ASTNode)   // Old node reference
            
            // Create new AST node
            binaryNode := &BinaryOpNode{
                pos:   leftNode.Position(),
                left:  leftNode,
                op:    opToken.Raw,
                right: rightNode,
            }
            
            return []pc.Token[ASTNode]{{
                Type: "ast_node",
                Pos:  leftNode.Position(),
                Val:  binaryNode,
            }}, nil
        },
    )
}
Complex AST Construction Patterns

For more complex structures, staged construction makes management easier:

// Function call node
type FunctionCallNode struct {
    pos       *pc.Pos
    name      string
    arguments []ASTNode
}

func (n *FunctionCallNode) Type() string { return "FunctionCall" }
func (n *FunctionCallNode) Position() *pc.Pos { return n.pos }
func (n *FunctionCallNode) String() string {
    args := make([]string, len(n.arguments))
    for i, arg := range n.arguments {
        args[i] = arg.String()
    }
    return fmt.Sprintf("%s(%s)", n.name, strings.Join(args, ", "))
}

// Argument list construction
func ArgumentList() pc.Parser[ASTNode] {
    return pc.Trans(
        pc.Seq(
            pc.Literal("("),
            pc.Optional(pc.Seq(
                Expression(),
                pc.ZeroOrMore("additional_args", pc.Seq(pc.Literal(","), Expression())),
            )),
            pc.Literal(")"),
        ),
        func(pctx *pc.ParseContext[ASTNode], tokens []pc.Token[ASTNode]) ([]pc.Token[ASTNode], error) {
            var arguments []ASTNode
            
            // Collect every argument node that appears between the parentheses.
            // The exact flattened layout depends on how Seq/ZeroOrMore emit tokens,
            // so filter by token type rather than relying on fixed positions.
            for i := 1; i < len(tokens)-1; i++ {
                if tokens[i].Type == "ast_node" {
                    arguments = append(arguments, tokens[i].Val.(ASTNode))
                }
            }
            
            // Create meta-node representing argument list
            argListNode := &ArgumentListNode{
                pos:  tokens[0].Pos,
                args: arguments,
            }
            
            return []pc.Token[ASTNode]{{
                Type: "argument_list",
                Pos:  tokens[0].Pos,
                Val:  argListNode,
            }}, nil
        },
    )
}

// Function call construction
func FunctionCall() pc.Parser[ASTNode] {
    return pc.Trans(
        pc.Seq(Identifier(), ArgumentList()),
        func(pctx *pc.ParseContext[ASTNode], tokens []pc.Token[ASTNode]) ([]pc.Token[ASTNode], error) {
            nameToken := tokens[0]
            argListNode := tokens[1].Val.(*ArgumentListNode)
            
            funcCallNode := &FunctionCallNode{
                pos:       nameToken.Pos,
                name:      nameToken.Raw,
                arguments: argListNode.args,
            }
            
            return []pc.Token[ASTNode]{{
                Type: "ast_node", 
                Pos:  nameToken.Pos,
                Val:  funcCallNode,
            }}, nil
        },
    )
}
Post-Tree Processing Patterns

For processing after tree construction, these approaches are effective:

// Visitor pattern for AST processing
type ASTVisitor interface {
    VisitBinaryOp(node *BinaryOpNode) error
    VisitLiteral(node *LiteralNode) error
    VisitFunctionCall(node *FunctionCallNode) error
}

// Type checker example
type TypeChecker struct {
    errors []error
    symbolTable map[string]Type
}

func (tc *TypeChecker) VisitBinaryOp(node *BinaryOpNode) error {
    // Process left and right child nodes recursively
    if err := node.left.Accept(tc); err != nil {
        return err
    }
    if err := node.right.Accept(tc); err != nil {
        return err
    }
    
    // Type checking logic
    leftType := tc.getNodeType(node.left)
    rightType := tc.getNodeType(node.right)
    
    if !tc.isCompatible(leftType, rightType, node.op) {
        return fmt.Errorf("type error: %s and %s cannot be used with operator %s", 
                         leftType, rightType, node.op)
    }
    
    return nil
}

// Transform pattern for AST transformation
type ASTTransformer interface {
    Transform(node ASTNode) (ASTNode, error)
}

// Optimizer example
type Optimizer struct{}

func (o *Optimizer) Transform(node ASTNode) (ASTNode, error) {
    switch n := node.(type) {
    case *BinaryOpNode:
        // Constant folding optimization
        if isConstant(n.left) && isConstant(n.right) {
            result := evaluateConstant(n)
            return &LiteralNode{pos: n.pos, value: result}, nil
        }
        
        // Recursively optimize child nodes
        optimizedLeft, err := o.Transform(n.left)
        if err != nil {
            return nil, err
        }
        optimizedRight, err := o.Transform(n.right)
        if err != nil {
            return nil, err
        }
        
        return &BinaryOpNode{
            pos:   n.pos,
            left:  optimizedLeft,
            op:    n.op,
            right: optimizedRight,
        }, nil
        
    default:
        return node, nil
    }
}
Multi-Pass Processing Patterns

Real compilers typically use multiple passes for processing:

// Main compiler processing
func CompileProgram(input []string) (*Program, error) {
    // Pass 1: Syntax analysis (using parser combinators)
    context := pc.NewParseContext[ASTNode]()
    ast, err := pc.EvaluateWithRawTokens(context, input, Program())
    if err != nil {
        return nil, fmt.Errorf("syntax analysis error: %w", err)
    }
    
    programNode := ast[0].(*ProgramNode)
    
    // Pass 2: Symbol table construction
    symbolBuilder := &SymbolTableBuilder{}
    if err := programNode.Accept(symbolBuilder); err != nil {
        return nil, fmt.Errorf("symbol analysis error: %w", err)
    }
    
    // Pass 3: Type checking
    typeChecker := &TypeChecker{symbolTable: symbolBuilder.table}
    if err := programNode.Accept(typeChecker); err != nil {
        return nil, fmt.Errorf("type checking error: %w", err)
    }
    
    // Pass 4: Optimization
    optimizer := &Optimizer{}
    optimizedAST, err := optimizer.Transform(programNode)
    if err != nil {
        return nil, fmt.Errorf("optimization error: %w", err)
    }
    
    // Pass 5: Code generation
    codeGenerator := &CodeGenerator{}
    code, err := codeGenerator.Generate(optimizedAST)
    if err != nil {
        return nil, fmt.Errorf("code generation error: %w", err)
    }
    
    return &Program{AST: optimizedAST, Code: code}, nil
}

This way, parser combinators are used for the syntax analysis stage, and subsequent processing combines traditional AST processing patterns (Visitor, Transform, Multi-pass) to build practical compilers.

Structured Data Validation Patterns

Tree Structure Serialization for Validation

Serializing a tree into a flat stream of pseudo-tokens is a highly effective way to validate its structure with parsers. Here's how to serialize tree structures with pseudo-nodes for parser validation:

// Tree structure representation
type TreeNode struct {
    Type     string
    Value    interface{}
    Children []*TreeNode
    Pos      *pc.Pos
}

// Serialization pseudo-tokens
type SerializedToken struct {
    Type  string  // "open", "close", "leaf"
    Node  string  // Node name
    Value interface{}
    Pos   *pc.Pos
}

// Serialize tree (DFS order to pseudo-token stream)
func SerializeTree(node *TreeNode) []SerializedToken {
    var tokens []SerializedToken
    
    if len(node.Children) == 0 {
        // Leaf node
        tokens = append(tokens, SerializedToken{
            Type:  "leaf",
            Node:  node.Type,
            Value: node.Value,
            Pos:   node.Pos,
        })
    } else {
        // Internal node: start
        tokens = append(tokens, SerializedToken{
            Type:  "open",
            Node:  node.Type,
            Value: node.Value,
            Pos:   node.Pos,
        })
        
        // Recursively process child nodes
        for _, child := range node.Children {
            tokens = append(tokens, SerializeTree(child)...)
        }
        
        // Internal node: end
        tokens = append(tokens, SerializedToken{
            Type: "close",
            Node: node.Type,
            Pos:  node.Pos,
        })
    }
    
    return tokens
}

// Validator for serialized tokens (the token value type is any so that
// Val can carry a SerializedToken)
func ValidateHTMLStructure() pc.Parser[any] {
    // HTML tag start
    htmlOpen := pc.Trace("html_open", func(pctx *pc.ParseContext[any], src []pc.Token[any]) (int, []pc.Token[any], error) {
        token := src[0].Val.(SerializedToken)
        if token.Type == "open" && token.Node == "html" {
            return 1, []pc.Token[any]{{Type: "validated", Val: true}}, nil
        }
        return 0, nil, pc.NewErrNotMatch("HTML start tag", token.Node, src[0].Pos)
    })
    
    // HTML tag end
    htmlClose := pc.Trace("html_close", func(pctx *pc.ParseContext[any], src []pc.Token[any]) (int, []pc.Token[any], error) {
        token := src[0].Val.(SerializedToken)
        if token.Type == "close" && token.Node == "html" {
            return 1, []pc.Token[any]{{Type: "validated", Val: true}}, nil
        }
        return 0, nil, pc.NewErrNotMatch("HTML end tag", token.Node, src[0].Pos)
    })
    
    // body element validation
    bodyElement := pc.Seq(
        pc.Literal("body_open"),
        pc.ZeroOrMore("body_content", pc.Or(textContent, divElement)),
        pc.Literal("body_close"),
    )
    
    // Complete HTML structure validation
    return pc.Seq(htmlOpen, headElement, bodyElement, htmlClose)
}

// Execute validation
func ValidateHTMLTree(tree *TreeNode) error {
    // Serialize the tree, then wrap each pseudo-token in a parser token
    serialized := SerializeTree(tree)
    tokens := make([]pc.Token[any], len(serialized))
    for i, st := range serialized {
        tokens[i] = pc.Token[any]{Type: st.Type, Pos: st.Pos, Val: st}
    }
    
    // Validate with parser combinators
    context := pc.NewParseContext[any]()
    _, err := pc.Evaluate(context, tokens, ValidateHTMLStructure())
    
    return err
}
Schema-Based Structure Validation

More general schema validation patterns:

// Schema definition
type Schema struct {
    Type       string             // "object", "array", "string", etc.
    Properties map[string]*Schema // Object properties
    Items      *Schema            // Array element schema
    Required   []string           // Required fields
    MinItems   int               // Minimum array elements
    MaxItems   int               // Maximum array elements
}

// JSON-like data structure
type DataNode struct {
    Type  string                 // "object", "array", "string", "number", "boolean"
    Value interface{}            // Actual value
    Props map[string]*DataNode   // Object properties
    Items []*DataNode            // Array elements
    Pos   *pc.Pos
}

// Schema validation parser generator
func CreateSchemaValidator(schema *Schema) pc.Parser[any] {
    return pc.Trace(fmt.Sprintf("validate_%s", schema.Type),
        func(pctx *pc.ParseContext[any], src []pc.Token[any]) (int, []pc.Token[any], error) {
            data := src[0].Val.(*DataNode)
            
            // Type check
            if data.Type != schema.Type {
                return 0, nil, pc.NewErrNotMatch(
                    fmt.Sprintf("type %s", schema.Type), 
                    data.Type, 
                    data.Pos,
                )
            }
            
            switch schema.Type {
            case "object":
                return validateObject(schema, data, pctx)
            case "array":
                return validateArray(schema, data, pctx)
            default:
                return validatePrimitive(schema, data)
            }
        })
}

func validateObject(schema *Schema, data *DataNode, pctx *pc.ParseContext[any]) (int, []pc.Token[any], error) {
    // Required field validation
    for _, required := range schema.Required {
        if _, exists := data.Props[required]; !exists {
            return 0, nil, pc.NewErrCritical(
                fmt.Sprintf("required field '%s' not found", required),
                data.Pos,
            )
        }
    }
    
    // Validate each property
    for propName, propData := range data.Props {
        propSchema, exists := schema.Properties[propName]
        if !exists {
            return 0, nil, pc.NewErrNotMatch(
                "valid property",
                propName,
                propData.Pos,
            )
        }
        
        // Recursively validate property
        validator := CreateSchemaValidator(propSchema)
        _, _, err := validator(pctx, []pc.Token[any]{{Val: propData, Pos: propData.Pos}})
        if err != nil {
            return 0, nil, fmt.Errorf("property '%s': %w", propName, err)
        }
    }
    
    return 1, []pc.Token[bool]{{Type: "validated_object", Val: true}}, nil
}

// Configuration file validation example
func ValidateConfigFile() pc.Parser[any] {
    // Configuration file schema definition
    configSchema := &Schema{
        Type: "object",
        Required: []string{"server", "database"},
        Properties: map[string]*Schema{
            "server": {
                Type: "object",
                Required: []string{"host", "port"},
                Properties: map[string]*Schema{
                    "host": {Type: "string"},
                    "port": {Type: "number"},
                    "ssl":  {Type: "boolean"},
                },
            },
            "database": {
                Type: "object",
                Required: []string{"url"},
                Properties: map[string]*Schema{
                    "url":         {Type: "string"},
                    "max_connections": {Type: "number"},
                },
            },
        },
    }
    
    return pc.Label("configuration file", CreateSchemaValidator(configSchema))
}
Flat Structure Partial Validation

Methods for partial validation of existing structured data:

// CSV data row validation
type CSVRow struct {
    Fields []string
    LineNo int
}

func ValidateCSVRow(expectedColumns []string, validators map[string]pc.Parser[any]) pc.Parser[any] {
    return pc.Trans(
        pc.Trace("csv_row", func(pctx *pc.ParseContext[any], src []pc.Token[any]) (int, []pc.Token[any], error) {
            row := src[0].Val.(*CSVRow)
            
            // Column count check
            if len(row.Fields) != len(expectedColumns) {
                return 0, nil, pc.NewErrNotMatch(
                    fmt.Sprintf("%d fields", len(expectedColumns)),
                    fmt.Sprintf("%d fields", len(row.Fields)),
                    &pc.Pos{Line: row.LineNo},
                )
            }
            
            // Validate each field
            for i, field := range row.Fields {
                columnName := expectedColumns[i]
                if validator, exists := validators[columnName]; exists {
                    fieldToken := pc.Token[any]{
                        Type: "field",
                        Raw:  field,
                        Pos:  &pc.Pos{Line: row.LineNo, Col: i + 1},
                        Val:  field,
                    }
                    
                    _, _, err := validator(pctx, []pc.Token[any]{fieldToken})
                    if err != nil {
                        return 0, nil, fmt.Errorf("column '%s' (line %d): %w", columnName, row.LineNo, err)
                    }
                }
            }
            
            return 1, []pc.Token[bool]{{Type: "validated_row", Val: true}}, nil
        }),
        func(pctx *pc.ParseContext[bool], tokens []pc.Token[bool]) ([]pc.Token[bool], error) {
            return tokens, nil
        },
    )
}

// Usage example: User data CSV validation
func CreateUserCSVValidator() pc.Parser[any] {
    columns := []string{"name", "email", "age", "active"}
    
    validators := map[string]pc.Parser[any]{
        "name": pc.Label("username", validateNonEmptyString()),
        "email": pc.Label("email address", validateEmail()),
        "age": pc.Label("age", validatePositiveNumber()),
        "active": pc.Label("active flag", validateBoolean()),
    }
    
    return pc.OneOrMore("csv_rows", ValidateCSVRow(columns, validators))
}
Real-time Validation Patterns

Validation for streaming or real-time data:

// Event stream validation
type Event struct {
    Type      string
    Timestamp time.Time
    Data      interface{}
    Pos       *pc.Pos
}

// State machine-based sequence validation
func ValidateEventSequence() pc.Parser[any] {
    // User login flow validation
    loginFlow := pc.Seq(
        pc.Label("login start", expectEvent("login_start")),
        pc.Optional(pc.Label("auth attempt", expectEvent("auth_attempt"))),
        pc.Or(
            pc.Label("login success", expectEvent("login_success")),
            pc.Seq(
                pc.Label("login failure", expectEvent("login_failure")),
                pc.Optional(pc.Label("retry", ValidateEventSequence())), // Recursively allow retry
            ),
        ),
    )
    
    return loginFlow
}

func expectEvent(eventType string) pc.Parser[any] {
    return pc.Trace(eventType, func(pctx *pc.ParseContext[any], src []pc.Token[any]) (int, []pc.Token[any], error) {
        event := src[0].Val.(*Event)
        if event.Type == eventType {
            return 1, []pc.Token[bool]{{Type: "validated_event", Val: true}}, nil
        }
        return 0, nil, pc.NewErrNotMatch(eventType, event.Type, event.Pos)
    })
}

These patterns enable validation of various structured data types:

  1. Tree Serialization: Validation of complex hierarchical structures
  2. Schema-Based: Type-safe validation of JSON/XML-like data
  3. Flat Structure: Validation of tabular data like CSV/TSV
  4. Real-time: Validation of event streams and state transitions

License

Apache 2.0 License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	// ErrNotMatch means parser doesn't match structure
	// Repeat, Or ignore this error
	ErrNotMatch = fmt.Errorf("not match")

	// ErrRepeatCount means repeat count miss matching.
	// Repeat, Or don't ignore this error
	ErrRepeatCount = fmt.Errorf("repeat count")

	// ErrCritical means critical error
	//
	// It is not just structure error that doesn't have extra options.
	// Repeat, Or don't ignore this error
	ErrCritical = fmt.Errorf("critical error")

	// ErrStackOverflow means the parser recursion depth exceeded the maximum limit
	// This prevents infinite loops in recursive parsers
	ErrStackOverflow = fmt.Errorf("stack overflow")
)

Functions

func DetectLeftRecursion added in v1.0.5

func DetectLeftRecursion[T any](traces []*TraceInfo) []string

DetectLeftRecursion analyzes traces to identify potential left recursion patterns

func Evaluate

func Evaluate[T any](pctx *ParseContext[T], src []Token[T], parser Parser[T]) (result []T, err error)

func EvaluateWithRawTokens

func EvaluateWithRawTokens[T any](pc *ParseContext[T], src []string, parser Parser[T]) (result []T, err error)

func Find added in v1.0.6

func Find[T any](ctx *ParseContext[T], parser Parser[T], tokens []Token[T]) (skipped, match []Token[T], consume int, remained []Token[T], found bool)

Find: Returns the first match of the parser in the input token slice, along with the tokens before and after the match. Returns: (skipped, match, consume, remained, found)

func FindIter added in v1.0.6

func FindIter[T any](ctx *ParseContext[T], sep Parser[T], tokens []Token[T]) iter.Seq2[int, Consume[T]]

FindIter: Calls yield for each non-overlapping match of the parser in the input token slice. Usage: for i, consume := range FindIter(ctx, parser, tokens) { ... }; break out of the loop to stop early.

func NewErrCritical

func NewErrCritical(message string, pos *Pos) error

func NewErrNotMatch

func NewErrNotMatch(expected, actual string, pos *Pos) error

func NewErrRepeatCount

func NewErrRepeatCount(label string, expected, actual int, pos *Pos) error

func NewErrStackOverflow added in v1.0.2

func NewErrStackOverflow(currentDepth, maxDepth int, pos *Pos) error

Types

type Alias

type Alias[T any] struct {
	// contains filtered or unexported fields
}

type Consume added in v1.0.7

type Consume[T any] struct {
	Skipped []Token[T]
	Match   []Token[T]
	Consume int
	Last    bool
}

func Split added in v1.0.6

func Split[T any](ctx *ParseContext[T], sep Parser[T], tokens []Token[T]) []Consume[T]

Split: Splits the input tokens by the separator parser, returning slices of tokens between separators.

func SplitN added in v1.0.6

func SplitN[T any](ctx *ParseContext[T], sep Parser[T], tokens []Token[T], n int) []Consume[T]

SplitN: Splits the input tokens by the separator parser, up to N pieces (N <= 0 means unlimited).

type OrMode added in v1.0.3

type OrMode int

OrMode defines the behavior of Or parser

const (
	// OrModeSafe (default) - Uses longest match logic for consistent and safe behavior
	OrModeSafe OrMode = iota
	// OrModeFast - Uses first match logic for better performance
	OrModeFast
	// OrModeTryFast - Uses first match but warns if longest match would choose differently
	OrModeTryFast
)

func (OrMode) String added in v1.0.3

func (om OrMode) String() string

type ParseContext

type ParseContext[T any] struct {
	Tokens               []Token[T]
	Pos                  int
	RemainedTokens       []Token[T]
	Results              []Token[T]
	Traces               []*TraceInfo
	Errors               []*ParseError
	Depth                int
	TraceEnable          bool
	MaxDepth             int    // Maximum allowed recursion depth (0 means no limit)
	OrMode               OrMode // Or parser behavior mode (default: OrModeSafe)
	CheckTransformSafety bool   // Enable transformation safety checks (default: false)
}

func NewParseContext

func NewParseContext[T any]() *ParseContext[T]

func (*ParseContext[T]) AppendError

func (pc *ParseContext[T]) AppendError(err error, pos *Pos) error

func (*ParseContext[T]) CheckDepthAndIncrement added in v1.0.2

func (pc *ParseContext[T]) CheckDepthAndIncrement(pos *Pos) error

func (*ParseContext[T]) DecrementDepth added in v1.0.2

func (pc *ParseContext[T]) DecrementDepth()

func (*ParseContext[T]) DumpTrace

func (pc *ParseContext[T]) DumpTrace()

func (*ParseContext[T]) DumpTraceAsText

func (pc *ParseContext[T]) DumpTraceAsText() string

func (*ParseContext[T]) DumpTraceTo

func (pc *ParseContext[T]) DumpTraceTo(w io.Writer)

func (ParseContext[T]) GetError

func (pc ParseContext[T]) GetError() error

type ParseError

type ParseError struct {
	Parent error
	Pos    *Pos
}

func (ParseError) Error

func (e ParseError) Error() string

func (ParseError) Unwrap

func (e ParseError) Unwrap() error

type Parser

type Parser[T any] func(*ParseContext[T], []Token[T]) (consumed int, newTokens []Token[T], err error)

func Before

func Before[T any](callback func(token Token[T]) bool) Parser[T]

func Drop added in v1.0.10

func Drop[T any](parser Parser[T]) Parser[T]

Drop: Returns a parser that matches the given parser but discards its result (returns no tokens).
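A small sketch of where Drop is handy: matching required punctuation without keeping it in the output (openParen and closeParen are assumed parsers for the corresponding raw tokens):

parser := pc.Seq(
    pc.Drop(openParen), // must match, but contributes no result tokens
    expression,
    pc.Drop(closeParen),
)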

func EOS added in v1.0.12

func EOS[T any]() Parser[T]

EOS: Parser that matches only if there are no remaining input tokens.
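For example (a sketch), appending EOS rejects inputs that still have trailing tokens after the expression:

parser := pc.Seq(expression, pc.EOS[int]())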

func Expected

func Expected[T any](message string) Parser[T]

Expected creates a parser that fails with a specific expected message. Useful for creating custom error messages or placeholders.

func Fail

func Fail[T any](message string) Parser[T]

Fail always fails with the given message. Useful for debugging or creating conditional failures.

func FastOr added in v1.0.3

func FastOr[T any](parsers ...Parser[T]) Parser[T]

FastOr creates an Or parser that uses first match (performance optimized)

func FollowedBy

func FollowedBy[T any](parser Parser[T]) Parser[T]

FollowedBy is an alias for Lookahead for better readability

func Label

func Label[T any](label string, parser Parser[T]) Parser[T]

Label provides a user-friendly label for error messages. When the parser fails, it replaces technical error details with the provided label. Unlike Trace, this is purely for error message improvement, not debugging.

func Lazy added in v1.0.5

func Lazy[T any](parserFactory func() Parser[T]) Parser[T]

Lazy creates a lazy parser that evaluates the parser function when called. This prevents infinite loops during parser construction by deferring parser resolution until parsing time, allowing for true recursive definitions.

func Lookahead

func Lookahead[T any](parser Parser[T]) Parser[T]

Lookahead checks if the parser matches without consuming tokens. Returns empty tokens on match, an error otherwise.

func NewAlias

func NewAlias[T any](name string) (instance func(Parser[T]) Parser[T], alias Parser[T])

func None

func None[T any](label ...string) Parser[T]

func NotFollowedBy

func NotFollowedBy[T any](parser Parser[T]) Parser[T]

NotFollowedBy succeeds if the parser does NOT match (negative lookahead). Returns empty tokens if the parser fails, an error if it succeeds.

func OneOrMore

func OneOrMore[T any](label string, parser Parser[T]) Parser[T]

func Optional

func Optional[T any](parser Parser[T]) Parser[T]

func Or

func Or[T any](parsers ...Parser[T]) Parser[T]

func OrWithMode added in v1.0.3

func OrWithMode[T any](mode OrMode, parsers ...Parser[T]) Parser[T]

OrWithMode creates an Or parser with specific mode for this instance
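A sketch of opting into first-match behavior for a single alternative set while the rest of the grammar keeps the default safe mode (keyword and identifier are assumed parsers):

parser := pc.OrWithMode(pc.OrModeFast, keyword, identifier)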

func Peek

func Peek[T any](parser Parser[T]) Parser[T]

Peek returns the result of the parser without consuming tokens. Useful for inspection or conditional parsing.

func Recover

func Recover[T any](search, body, skipUntil Parser[T]) Parser[T]

func Repeat

func Repeat[T any](label string, min uint, max int, parser Parser[T]) Parser[T]

func SafeOr added in v1.0.3

func SafeOr[T any](parsers ...Parser[T]) Parser[T]

SafeOr creates an Or parser that uses longest match (safe, default behavior)

func Seq

func Seq[T any](parsers ...Parser[T]) Parser[T]

func SeqWithLabel

func SeqWithLabel[T any](label string, parsers ...Parser[T]) Parser[T]

func Trace

func Trace[T any](name string, p Parser[T]) Parser[T]

func Trans

func Trans[T any](parser Parser[T], tf Transformer[T]) Parser[T]

func TryFastOr added in v1.0.3

func TryFastOr[T any](parsers ...Parser[T]) Parser[T]

TryFastOr creates an Or parser that uses first match but warns when longest match differs

func ZeroOrMore

func ZeroOrMore[T any](label string, parser Parser[T]) Parser[T]

type Pos

type Pos struct {
	Line   int
	Col    int
	Index  int
	Length int
}

func (Pos) Copy

func (p Pos) Copy() *Pos

func (*Pos) String

func (p *Pos) String() string

type Token

type Token[T any] struct {
	Type string
	Pos  *Pos
	Raw  string
	Val  T
}

func (Token[T]) GoString

func (t Token[T]) GoString() string

type Tokens

type Tokens[T any] []*Token[T]

type TraceInfo

type TraceInfo struct {
	TraceType TraceType
	Depth     int
	Name      string
	Pos       *Pos
	Result    string
}

type TraceType

type TraceType int
const (
	Enter TraceType = iota
	Match
	EnterMatch
	NotMatch
	EnterNotMatch
)

func (TraceType) String

func (tp TraceType) String() string

type Transformer

type Transformer[T any] func(pctx *ParseContext[T], src []Token[T]) (converted []Token[T], err error)

Directories

Path Synopsis
basic command
ZeroOrMore Example - parses a comma-separated list of numbers
compiler command
Simple Math Expression to JSON AST Compiler
interpreter command
Simple Calculator Interpreter - a calculator that evaluates arithmetic expressions directly
