tpl

package
Version: v1.6.5
Published: Feb 18, 2026 | License: Apache-2.0 | Imports: 13 | Imported by: 1

README

TPL: Text Processing Language

Text processing is a common task in programming, and regular expressions have long been the go-to solution. However, regular expressions are notorious for their cryptic syntax and poor readability. Enter XGo TPL (Text Processing Language), an enhanced alternative that offers both power and intuitive syntax.

XGo TPL is a grammar-based language similar to EBNF (Extended Backus-Naur Form) that seamlessly integrates with XGo. It provides a more readable and maintainable approach to text processing while offering capabilities beyond what regular expressions can achieve.

Understanding XGo TPL

To understand XGo TPL, you need to grasp three key concepts:

1. Naming Rules

The foundation of TPL is its naming rules, expressed as name = rule. A TPL grammar consists of a series of named rules, with the first one being the root rule. The rule can be a combination of:

  • Basic Tokens: Fundamental syntax units like INT, FLOAT, CHAR, STRING, IDENT, "+", "++", "+=", "<<=", etc.
  • Keywords: An IDENT enclosed in quotes, such as "if", "else", "for".
  • References: References to other named rules, including self-references.
  • Sequence: R1 R2 ... Rn - matches a sequence of rules.
  • Alternatives: R1 | R2 | ... | Rn - matches any one of the rules.
  • Repetition Operators:
    • *R - matches the rule zero or more times
    • +R - matches the rule one or more times
    • ?R - matches the rule zero or one time (optional)
  • List Operator: R1 % R2 - shorthand for R1 *(R2 R1), representing a sequence of R1 separated by R2. For example, INT % "," represents a comma-separated list of integers.
  • Adjacency Operator: R1 ++ R2 - indicates that R1 and R2 must be adjacent with no whitespace or comments between them.

The default operator precedence is: unary operators (*R, +R, ?R) > ++ > % > sequence (space) > |. Parentheses can be used to change the precedence.
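
To make the list operator concrete, here is a minimal Go sketch of what INT % "," (that is, INT *("," INT)) accepts. It is a hand-written stand-in and not the TPL matcher: the function name matchIntList and the pre-split token slice are assumptions made only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// matchIntList is a hypothetical, hand-written stand-in for the rule
// INT % ",", i.e. INT *("," INT). It is NOT the tpl matcher.
// toks is a pre-split token slice used only for illustration.
// It returns the matched integers and the number of tokens consumed.
func matchIntList(toks []string) (vals []int, n int, ok bool) {
	// First R1: a single INT is mandatory.
	if len(toks) == 0 {
		return nil, 0, false
	}
	v, err := strconv.Atoi(toks[0])
	if err != nil {
		return nil, 0, false
	}
	vals, n = append(vals, v), 1

	// *(R2 R1): zero or more "," INT pairs.
	for n+1 < len(toks) && toks[n] == "," {
		v, err := strconv.Atoi(toks[n+1])
		if err != nil {
			break // a "," not followed by INT is left unconsumed
		}
		vals = append(vals, v)
		n += 2
	}
	return vals, n, true
}

func main() {
	vals, n, ok := matchIntList([]string{"1", ",", "2", ",", "3"})
	fmt.Println(vals, n, ok) // [1 2 3] 5 true
}
```

Note that a trailing separator is not part of the match: in this sketch the input 7 , consumes only the first token, because *(R2 R1) needs a complete separator-item pair.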

String Literals in Detail

STRING (string literals) can take two forms:

"Hello\nWorld\n"  // QSTRING (quoted string)

`Hello
World
`               // RAWSTRING (raw string)

STRING can be defined as:

STRING = QSTRING | RAWSTRING

The Adjacency Operator Explained

Since TPL rules automatically filter whitespace and comments, the sequence R1 R2 doesn't express that R1 and R2 are adjacent. This is where the adjacency operator ++ comes in.

For example, an XGo domain text literal is defined as IDENT ++ RAWSTRING, making these valid:

tpl`expr = INT % ","`
json`{"name": "Ken", "age": 15}`

While these would match IDENT STRING but are not valid domain text literals:

tpl"expr = *INT"              // IDENT must be followed by RAWSTRING, not QSTRING
tpl/* comment */`expr = *INT` // No whitespace or comments allowed between IDENT and RAWSTRING
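
The difference between a plain sequence and ++ can be modelled with token offsets. The Go sketch below is illustrative only: the tok type, its fields, and both helper functions are hypothetical and do not reflect how tpl.Token or the TPL matcher are implemented.

```go
package main

import "fmt"

// tok is a hypothetical token with byte offsets; it is not tpl.Token.
type tok struct {
	kind       string // "IDENT", "RAWSTRING", "QSTRING", ...
	start, end int    // half-open byte range [start, end) in the source
}

// adjacent reports whether b starts exactly where a ends, i.e. no
// whitespace or comment lies between the two tokens.
func adjacent(a, b tok) bool { return a.end == b.start }

// isDomainTextLit models the rule IDENT ++ RAWSTRING.
func isDomainTextLit(a, b tok) bool {
	return a.kind == "IDENT" && b.kind == "RAWSTRING" && adjacent(a, b)
}

func main() {
	ident := tok{"IDENT", 0, 3} // the identifier tpl

	// tpl`expr = *INT`: the raw string starts right after the identifier.
	fmt.Println(isDomainTextLit(ident, tok{"RAWSTRING", 3, 16})) // true

	// tpl"expr = *INT": adjacent, but a QSTRING rather than a RAWSTRING.
	fmt.Println(isDomainTextLit(ident, tok{"QSTRING", 3, 16})) // false

	// tpl/* comment */`expr = *INT`: a 13-byte comment separates the tokens.
	fmt.Println(isDomainTextLit(ident, tok{"RAWSTRING", 16, 29})) // false
}
```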

2. Matching Results

Each rule has its built-in matching result:

  • Tokens and Keywords: Result is *tpl.Token.

  • Sequence (R1 R2 ... Rn): Result is a list ([]any) with n elements.

  • Repetition (*R, +R): Result is a list ([]any) with elements depending on how many times R matches.

  • Alternatives (R1 | R2 | ... | Rn): Result depends on which rule matches.

  • Optional (?R): Result is either the result of R or nil if no match.

  • List Operator (R1 % R2): Result is a complex tree-like structure with three levels.

    Let's explain why it has three levels:

    1. The first level is the result of the entire expression R1 % R2 (i.e., R1 *(R2 R1)), which is a list with two elements.
    2. The first element of this list is the result of the first R1.
    3. The second element is a list containing the results of all subsequent (R2 R1) matches.
      • Each element in this second-level list is itself a list with two elements: the result of R2 and the result of R1.

    For example, when parsing "1, 2, 3" with INT % ",", the result structure would be:

    [
      <INT:1>,                // First R1
      [
        [<COMMA>, <INT:2>],   // First (R2 R1)
        [<COMMA>, <INT:3>]    // Second (R2 R1)
      ]
    ]
    

    This tree-like structure preserves all the information about the matched elements and their relationships, but can be complex to work with directly. That's why TPL provides helper functions like ListOp and BinaryOp to transform this structure into more usable forms.

  • Adjacency Operator (R1 ++ R2): Result is a list ([]any) with 2 elements, similar to an R1 R2 sequence.
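
The three-level result of R1 % R2 described above can be flattened with a few lines of Go. The sketch below borrows the documented signature of tpl.ListOp but is a standalone illustration, not the library's source: the function name listOp is hypothetical, and plain strings stand in for *tpl.Token values.

```go
package main

import (
	"fmt"
	"strconv"
)

// listOp is a standalone sketch modelled on the documented signature of
// tpl.ListOp; it is an illustration, not the library's implementation.
// in must have the shape [first, [[sep, item], [sep, item], ...]].
func listOp[T any](in []any, fn func(v any) T) []T {
	rest, _ := in[1].([]any) // results of *(R2 R1); may be empty
	out := make([]T, 0, len(rest)+1)
	out = append(out, fn(in[0])) // first R1
	for _, pair := range rest {
		// pair is [result of R2, result of R1]; keep only the R1 part.
		out = append(out, fn(pair.([]any)[1]))
	}
	return out
}

func main() {
	// Shape of the result of INT % "," on "1, 2, 3".
	// Assumption: strings stand in for the *tpl.Token values.
	self := []any{
		"1",
		[]any{
			[]any{",", "2"},
			[]any{",", "3"},
		},
	}
	ints := listOp(self, func(v any) int {
		n, _ := strconv.Atoi(v.(string))
		return n
	})
	fmt.Println(ints) // [1 2 3]
}
```

The separators are discarded during flattening, which is usually what a caller wants for comma-separated lists.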

3. Rewriting Matching Results

The default matching result is called "self" in TPL. You can rewrite this result using an XGo closure => { ... }.

This feature is crucial because it allows seamless integration between TPL and XGo. In XGo, you reference TPL through domain text literals, and within TPL, you call XGo code through result rewriting.

Practical Examples

Basic Example: Parsing Integers

import "xgo/tpl"

cl := tpl`
expr = INT % "," => {
    return tpl.ListOp[int](self, v => {
        return v.(*tpl.Token).Lit.int!
    })
}
`!

echo cl.parseExpr("1, 2, 3", nil)!  // Outputs: [1 2 3]

This example parses a comma-separated list of integers and converts it to a flat list of integers using TPL's ListOp function.

Building a Calculator

Creating a calculator with XGo TPL is remarkably concise:

import "xgo/tpl"

cl := tpl`
expr = operand % ("*" | "/") % ("+" | "-") => {
    return tpl.BinaryOp(true, self, (op, x, y) => {
        switch op.Tok {
        case '+': return x.(float64) + y.(float64)
        case '-': return x.(float64) - y.(float64)
        case '*': return x.(float64) * y.(float64)
        case '/': return x.(float64) / y.(float64)
        }
        panic("unexpected")
    })
}

operand = basicLit | unaryExpr

unaryExpr = "-" operand => {
    return -(self[1].(float64))
}

basicLit = INT | FLOAT => {
    return self.(*tpl.Token).Lit.float!
}
`!

echo cl.parseExpr("1 + 2 * -3", nil)!  // Outputs: -5

This calculator handles basic arithmetic operations with proper operator precedence in fewer than 30 lines of code.
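
The calculator relies on tpl.BinaryOp to fold the nested result of operand % ("*" | "/") % ("+" | "-"). The Go sketch below models that fold under stated assumptions: foldBinary and operandValue are hypothetical names, operators are plain strings rather than *tpl.Token values, and the way nested lists are handled is a guess at what the recursive flag implies rather than the library's source.

```go
package main

import "fmt"

// binFn combines two operands with an operator.
type binFn func(op string, x, y float64) float64

// foldBinary is a standalone sketch of a left-associative fold over the
// result of X % op, shaped [x0, [[op1, x1], [op2, x2], ...]].
// It is an illustration, not tpl.BinaryOp itself.
func foldBinary(in []any, fn binFn) float64 {
	acc := operandValue(in[0], fn)
	rest, _ := in[1].([]any)
	for _, e := range rest {
		pair := e.([]any) // [operator, operand]
		acc = fn(pair[0].(string), acc, operandValue(pair[1], fn))
	}
	return acc
}

// operandValue folds an operand that is itself the result of a nested
// X % op (assumed to be what recursive=true means); otherwise it
// expects an already rewritten float64.
func operandValue(v any, fn binFn) float64 {
	if nested, ok := v.([]any); ok {
		return foldBinary(nested, fn)
	}
	return v.(float64)
}

// apply evaluates the four arithmetic operators of the calculator.
func apply(op string, x, y float64) float64 {
	switch op {
	case "+":
		return x + y
	case "-":
		return x - y
	case "*":
		return x * y
	case "/":
		return x / y
	}
	panic("unexpected operator " + op)
}

func main() {
	// Hand-built result shape for "1 + 2 * -3": the inner level is
	// operand % ("*" | "/"), the outer level is % ("+" | "-").
	self := []any{
		[]any{1.0, []any{}},
		[]any{
			[]any{"+", []any{2.0, []any{[]any{"*", -3.0}}}},
		},
	}
	fmt.Println(foldBinary(self, apply)) // -5
}
```

Because the multiplication level is nested inside the addition level, precedence falls out of the grammar's structure, and the fold itself only needs to be left-associative.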

Conclusion

XGo TPL offers a powerful yet intuitive alternative to regular expressions for text processing. By combining grammar-based parsing with seamless XGo integration, it enables developers to create clear, maintainable text processing solutions.

For more examples of TPL in action, check out the XGo demos starting with tpl- at https://github.com/goplus/xgo/tree/main/demo. These examples showcase how to implement calculators, parse text to generate ASTs, and even implement entire languages in just a few hundred lines of code.

Whether you're parsing structured text, building domain-specific languages, or implementing complex text transformations, XGo TPL provides a robust and readable approach that surpasses traditional regular expressions.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func BasicLit

func BasicLit(this any) *ast.BasicLit

BasicLit converts the matching result of a basic literal to an ast.BasicLit expression.

func BinaryExpr

func BinaryExpr(recursive bool, in []any) ast.Expr

BinaryExpr converts the matching result of (X % op) to a binary expression. X % op means X *(op X).

func BinaryExprNR

func BinaryExprNR(in []any) ast.Expr

func BinaryExprR

func BinaryExprR(in []any) ast.Expr

func BinaryOp

func BinaryOp(recursive bool, in []any, fn func(op *Token, x, y any) any) any

func BinaryOpNR

func BinaryOpNR(in []any, fn func(op *Token, x, y any) any) any

func BinaryOpR

func BinaryOpR(in []any, fn func(op *Token, x, y any) any) any

func Dump

func Dump(result any, omitSemi ...bool)

func Fdump

func Fdump(w io.Writer, ret any, prefix, indent string, omitSemi bool)

func Ident

func Ident(this any) *ast.Ident

Ident converts the matching result of an identifier to an ast.Ident expression.

func List

func List(in []any) []any

List converts the matching result of (R % ",") to a flat list. R % "," means R *("," R)

func ListOp

func ListOp[T any](in []any, fn func(v any) T) []T

ListOp converts the matching result of (R % ",") to a flat list, applying fn to the result of each R. R % "," means R *("," R).

func Panic

func Panic(pos token.Pos, msg string)

Panic panics with a matcher error.

func RangeOp

func RangeOp(in []any, fn func(v any))

RangeOp traverses the matching result of (R % ",") and calls fn on the result of each R. R % "," means R *("," R).

func Relocate

func Relocate(err error, filename string, line, col int) error

Relocate relocates the error positions.

func ShowConflict

func ShowConflict(f bool) int

ShowConflict sets the flag to show or hide conflicts.

func UnaryExpr

func UnaryExpr(in []any) ast.Expr

UnaryExpr converts the matching result of (op X) to a unary expression.

Types

type Compiler

type Compiler struct {
	cl.Result
}

Compiler represents a TPL compiler.

func FromFile

func FromFile(fset *token.FileSet, filename string, src any, conf *cl.Config) (ret Compiler, err error)

FromFile creates a new TPL compiler from a file. fset can be nil.

func New

func New(src any, params ...any) (ret Compiler, err error)

New creates a new TPL compiler. params: ruleName1, retProc1, ..., ruleNameN, retProcN

func NewEx

func NewEx(src any, filename string, line, col int, params ...any) (ret Compiler, err error)

NewEx creates a new TPL compiler. params: ruleName1, retProc1, ..., ruleNameN, retProcN

func (*Compiler) Match

func (p *Compiler) Match(filename string, src any, conf *Config) (ms MatchState, result any, err error)

Match matches a source file.

func (*Compiler) Parse

func (p *Compiler) Parse(filename string, src any, conf *Config) (result any, err error)

Parse parses a source file.

func (*Compiler) ParseExpr

func (p *Compiler) ParseExpr(x string, conf *Config) (result any, err error)

ParseExpr parses an expression.

func (*Compiler) ParseExprFrom

func (p *Compiler) ParseExprFrom(filename string, src any, conf *Config) (result any, err error)

ParseExprFrom parses an expression from a file.

type Config

type Config struct {
	Scanner          Scanner
	ScanErrorHandler scanner.ErrorHandler
	ScanMode         scanner.Mode
	Fset             *token.FileSet
}

Config represents a parsing configuration of Compiler.Parse.

type Error

type Error = matcher.Error

Error represents a matching error.

type MatchState

type MatchState struct {
	Toks []*Token
	Ctx  *matcher.Context
	N    int
}

MatchState represents a matching state.

func (*MatchState) Next

func (p *MatchState) Next() *Token

Next returns the next token.

type Scanner

type Scanner interface {
	Scan() Token
	Init(file *token.File, src []byte, err scanner.ErrorHandler, mode scanner.Mode)
}

Scanner represents a TPL scanner.

type Token

type Token = types.Token

A Token is a lexical unit returned by Scan.
