tpl

package
v1.4.6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 17, 2025 License: Apache-2.0 Imports: 13 Imported by: 0

README

TPL: Text Processing Language

Text processing is a common task in programming, and regular expressions have long been the go-to solution. However, regular expressions are notorious for their cryptic syntax and poor readability. Enter Go+ TPL (Text Processing Language), an enhanced alternative that offers both power and intuitive syntax.

Go+ TPL is a grammar-based language similar to EBNF (Extended Backus-Naur Form) that seamlessly integrates with Go+. It provides a more readable and maintainable approach to text processing while offering capabilities beyond what regular expressions can achieve.

Understanding Go+ TPL

To understand Go+ TPL, you need to grasp three key concepts:

1. Naming Rules

The foundation of TPL is its naming rules, expressed as name = rule. A TPL grammar consists of a series of named rules, with the first one being the root rule. The rule can be a combination of:

  • Basic Tokens: Fundamental syntax units like INT, FLOAT, CHAR, STRING, IDENT, "+", "++", "+=", "<<=", etc.
  • Keywords: An IDENT enclosed in quotes, such as "if", "else", "for".
  • References: References to other named rules, including self-references.
  • Sequence: R1 R2 ... Rn - matches a sequence of rules.
  • Alternatives: R1 | R2 | ... | Rn - matches any one of the rules.
  • Repetition Operators:
    • *R - matches the rule zero or more times
    • +R - matches the rule one or more times
    • ?R - matches the rule zero or one time (optional)
  • List Operator: R1 % R2 - shorthand for R1 *(R2 R1), representing a sequence of R1 separated by R2. For example, INT % "," represents a comma-separated list of integers.
  • Adjacency Operator: R1 ++ R2 - indicates that R1 and R2 must be adjacent with no whitespace or comments between them.

The default operator precedence is: unary operators (*R, +R, ?R) > ++ > % > sequence (space) > |. Parentheses can be used to change the precedence.

String Literals in Detail

STRING (string literals) can take two forms:

"Hello\nWorld\n"  // QSTRING (quoted string)

`Hello
World
`               // RAWSTRING (raw string)

STRING can be defined as:

STRING = QSTRING | RAWSTRING
The Adjacency Operator Explained

Since TPL rules automatically filter whitespace and comments, the sequence R1 R2 doesn't express that R1 and R2 are adjacent. This is where the adjacency operator ++ comes in.

For example, Go+ domain text literal is defined as IDENT ++ RAWSTRING, making these valid:

tpl`expr = INT % ","`
json`{"name": "Ken", age: 15}`

While these would match IDENT STRING but are not valid domain text literals:

tpl"expr = *INT"              // IDENT must be followed by RAWSTRING, not QSTRING
tpl/* comment */`expr = *INT` // No whitespace or comments allowed between IDENT and RAWSTRING
2. Matching Results

Each rule has its built-in matching result:

  • Tokens and Keywords: Result is *tpl.Token.

  • Sequence (R1 R2 ... Rn): Result is a list ([]any) with n elements.

  • Repetition (*R, +R): Result is a list ([]any) with elements depending on how many times R matches.

  • Alternatives (R1 | R2 | ... | Rn): Result depends on which rule matches.

  • Optional (?R): Result is either the result of R or nil if no match.

  • List Operator (R1 % R2): Result is a complex tree-like structure with three levels.

    Let's explain why it has three levels:

    1. The first level is the result of the entire expression R1 % R2 (i.e.R1 *(R2 R1)), which is a list with two elements.
    2. The first element of this list is the result of the first R1.
    3. The second element is a list containing the results of all subsequent (R2 R1) matches.
      • Each element in this second-level list is itself a list with two elements: the result of R2 and the result of R1.

    For example, when parsing "1, 2, 3" with INT % ",", the result structure would be:

    [
      <INT:1>,                // First R1
      [
        [<COMMA>, <INT:2>],   // First (R2 R1)
        [<COMMA>, <INT:3>]    // Second (R2 R1)
      ]
    ]
    

    This tree-like structure preserves all the information about the matched elements and their relationships, but can be complex to work with directly. That's why TPL provides helper functions like ListOp and BinaryOp to transform this structure into more usable forms.

  • Adjacency Operator (R1 ++ R2): Result is a list ([]any) with 2 elements, similar to a R1 R2 sequence.

3. Rewriting Matching Results

The default matching result is called "self" in TPL. You can rewrite this result using a Go+ closure => { ... }.

This feature is crucial as it allows seamless integration between TPL and Go+. In Go+, you reference TPL through domain text literal, and within TPL, you can call Go+ code through result rewriting.

Practical Examples

Basic Example: Parsing Integers
import "gop/tpl"

cl := tpl`
expr = INT % "," => {
    return tpl.ListOp[int](self, v => {
        return v.(*tpl.Token).Lit.int!
    })
}
`!

echo cl.parseExpr("1, 2, 3", nil)!  // Outputs: [1 2 3]

This example parses a comma-separated list of integers and converts it to a flat list of integers using TPL's ListOp function.

Building a Calculator

Creating a calculator with Go+ TPL is remarkably concise:

import "gop/tpl"

cl := tpl`
expr = operand % ("*" | "/") % ("+" | "-") => {
    return tpl.BinaryOp(true, self, (op, x, y) => {
        switch op.Tok {
        case '+': return x.(float64) + y.(float64)
        case '-': return x.(float64) - y.(float64)
        case '*': return x.(float64) * y.(float64)
        case '/': return x.(float64) / y.(float64)
        }
        panic("unexpected")
    })
}

operand = basicLit | unaryExpr

unaryExpr = "-" operand => {
    return -(self[1].(float64))
}

basicLit = INT | FLOAT => {
    return self.(*tpl.Token).Lit.float!
}
`!

echo cl.parseExpr("1 + 2 * -3", nil)!  // Outputs: -5

This calculator handles basic arithmetic operations with proper operator precedence in less than 30 lines of code.

Conclusion

Go+ TPL offers a powerful yet intuitive alternative to regular expressions for text processing. By combining grammar-based parsing with seamless Go+ integration, it enables developers to create clear, maintainable text processing solutions.

For more examples of TPL in action, check out the Go+ demos starting with tpl- at https://github.com/goplus/gop/tree/main/demo. These examples showcase how to implement calculators, parse text to generate ASTs, and even implement entire languages in just a few hundred lines of code.

Whether you're parsing structured text, building domain-specific languages, or implementing complex text transformations, Go+ TPL provides a robust and readable approach that surpasses traditional regular expressions.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BasicLit added in v1.3.7

func BasicLit(this any) *ast.BasicLit

BasicLit converts the matching result of a basic literal to an ast.BasicLit expression.

func BinaryExpr added in v1.3.7

func BinaryExpr(recursive bool, in []any) ast.Expr

BinaryExpr converts the matching result of (X % op) to a binary expression. X % op means X *(op X)

func BinaryExprNR added in v1.3.7

func BinaryExprNR(in []any) ast.Expr

func BinaryExprR added in v1.3.7

func BinaryExprR(in []any) ast.Expr

func BinaryOp added in v1.3.7

func BinaryOp(recursive bool, in []any, fn func(op *Token, x, y any) any) any

func BinaryOpNR added in v1.3.7

func BinaryOpNR(in []any, fn func(op *Token, x, y any) any) any

func BinaryOpR added in v1.3.7

func BinaryOpR(in []any, fn func(op *Token, x, y any) any) any

func Dump

func Dump(result any, omitSemi ...bool)

func Fdump

func Fdump(w io.Writer, ret any, prefix, indent string, omitSemi bool)

func Ident added in v1.3.7

func Ident(this any) *ast.Ident

Ident converts the matching result of an identifier to an ast.Ident expression.

func List added in v1.3.7

func List(in []any) []any

List converts the matching result of (R % ",") to a flat list. R % "," means R *("," R)

func ListOp added in v1.3.9

func ListOp[T any](in []any, fn func(v any) T) []T

ListOp converts the matching result of (R % ",") to a flat list. R % "," means R *("," R)

func Panic added in v1.3.7

func Panic(pos token.Pos, msg string)

Panic panics with a matcher error.

func RangeOp added in v1.3.7

func RangeOp(in []any, fn func(v any))

RangeOp travels the matching result of (R % ",") and call fn(result of R). R % "," means R *("," R)

func Relocate added in v1.3.8

func Relocate(err error, filename string, line, col int) error

Relocate relocates the error positions.

func ShowConflict added in v1.3.8

func ShowConflict(f bool) int

ShowConflict sets the flag to show or hide conflicts.

func UnaryExpr added in v1.3.7

func UnaryExpr(in []any) ast.Expr

UnaryExpr converts the matching result of (op X) to a unary expression.

Types

type Compiler

type Compiler struct {
	cl.Result
}

Compiler represents a TPL compiler.

func FromFile

func FromFile(fset *token.FileSet, filename string, src any, conf *cl.Config) (ret Compiler, err error)

FromFile creates a new TPL compiler from a file. fset can be nil.

func New

func New(src any, params ...any) (ret Compiler, err error)

New creates a new TPL compiler. params: ruleName1, retProc1, ..., ruleNameN, retProcN

func NewEx added in v1.3.8

func NewEx(src any, filename string, line, col int, params ...any) (ret Compiler, err error)

NewEx creates a new TPL compiler. params: ruleName1, retProc1, ..., ruleNameN, retProcN

func (*Compiler) Match

func (p *Compiler) Match(filename string, src any, conf *Config) (ms MatchState, result any, err error)

Match matches a source file.

func (*Compiler) Parse

func (p *Compiler) Parse(filename string, src any, conf *Config) (result any, err error)

Parse parses a source file.

func (*Compiler) ParseExpr

func (p *Compiler) ParseExpr(x string, conf *Config) (result any, err error)

ParseExpr parses an expression.

func (*Compiler) ParseExprFrom

func (p *Compiler) ParseExprFrom(filename string, src any, conf *Config) (result any, err error)

ParseExprFrom parses an expression from a file.

type Config

type Config struct {
	Scanner          Scanner
	ScanErrorHandler scanner.ErrorHandler
	ScanMode         scanner.Mode
	Fset             *token.FileSet
}

Config represents a parsing configuration of Compiler.Parse.

type Error added in v1.3.8

type Error = matcher.Error

Error represents a matching error.

type MatchState added in v1.3.7

type MatchState struct {
	Toks []*Token
	Ctx  *matcher.Context
	N    int
}

MatchState represents a matching state.

func (*MatchState) Next added in v1.3.7

func (p *MatchState) Next() *Token

Next returns the next token.

type Scanner

type Scanner interface {
	Scan() Token
	Init(file *token.File, src []byte, err scanner.ErrorHandler, mode scanner.Mode)
}

Scanner represents a TPL scanner.

type Token

type Token = types.Token

A Token is a lexical unit returned by Scan.

Directories

Path Synopsis
encoding
csv
xml

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL