parsley

module
v0.5.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 19, 2018 License: MPL-2.0

README

Parsley - Parser combinator library written in Go

CircleCI status Codecov.io status GoDoc Latest release

Parsley is a general parser combinator library which can be used to parse context-free, left-recursive languages. It handles indirect as well as direct left-recursion in polynomial time and defines a memoization helper for speeding up parsing time. The language grammar can be easily translated to a set of rules using parsers and combinators.

You can read a general introduction about parser combinators here: https://en.wikipedia.org/wiki/Parser_combinator

For more information about handling left-recursion please check out Parser Combinators for Ambiguous Left-Recursive Grammars (2007) by Frost R.A., Hafiz R., and Callaghan P.

Currently the library supports only text processing, but the interfaces are written with binary parsing in mind.

How to use this library?

Basics
Abstract Syntax Tree

An AST is built during parsing for evaluating the parsed data. An AST node has the following interface:

type Node interface {
	Token() string
	Value(ctx interface{}) (interface{}, error)
	Pos() reader.Position
}

There are two types of nodes:

  • terminal node: a leaf node - contains the smallest valid token, always has a constant value, e.g. a number, a string, etc.
  • non-terminal node: a branch node - contains other nodes and an interpreter which defines how to evaluate its children together (example: the non-terminal node has the token "+", the terminal children are: [1, 2] and the interpreter would add the numbers and return 3)

For an arithmetic expression like 1 + 2 * 3 we would build the following AST:

   +
 /   \
1     *
    /   \
   2     3

The ctx evaluation context can be anything you would need for evaluating a tree. (e.g. looking up named variables in a variable store)

Interpreters

An interpreter gets the child nodes of a non-terminal node and returns with a single result. It has the following interface:

type Interpreter interface {
	Eval(ctx interface{}, nodes []Node) (interface{}, error)
}

As you get the nodes as input you can implement lazy evaluation.

A simple example is when the interpreter function defines how to add numbers together.

Node builders

A node builder takes multiple nodes and returns with a new result node. It has the following simple interface:

type NodeBuilder interface {
	BuildNode([]Node) Node
}

An example is if the parser matches [1, +, 2] in order, the node builder would build the following non-terminal node with two terminal children:

   +
 /   \
1     2
Parsers

A parser has a simple interface:

type Parser interface {
	Parse(h *parser.History, leftRecCtx data.IntMap, r reader.Reader) (data.IntSet, ResultSet, Error)
}

A parser processes the next token(s) from the given reader and returns them as a result set. It also handles direct left recursion counters through leftRectCtx and accumulates the curtailing parsers in an int set. (You usually don't have to deal with these).

One result is a tuple of an AST node and a reader clone with the new position.

Combinators

Combinators are special parsers as they are combining other parsers to process more complex token groups. A simple example is the Seq combinator which simply tries to match the given parsers in order. Some combinator also use node builders which tells them how to build an AST node from the parsed token group.

Memoization and handling left-recursion

IMPORTANT: make sure you only call the Memoize generator function once for a specific parser as it generates an internal parser index for every call.

Depending on the language you define a parser can attempt to match at the same reader position multiple times. You can cache the parser results with the provided Memoize wrapper.

Also if your language contains left-recursion you need to use Memoize for any parser that is directly or indirectly part of it as Memoize is responsible for curtailing these calls.

The following code will wrap the integer parser with a memoizer:

h := parser.NewHistory()
cachedParser := combinator.Memoize(terminal.Integer())

The history object will store the result cache and also track left recursion counts and curtailing parsers, so you should only create it once.

A simple example

Let's write a parser which is able to parse the following expression: "INTEGER + INTEGER"

We'll need the following components:

  • a parser which is able to match integer numbers (terminal.Integer)
  • a parser which is able to match the "+" character (terminal.Rune)
  • a combinator which is able to match multiple parsers in order (combinator.Seq)
  • a node builder function which will build the following AST tree: [+] with two children [number 1] and [number 2] (builder.BinaryOperation)
  • an interpreter function which will take two numeric nodes and will add them.
  • whitespaces will be ignored by the reader
add := combinator.Seq(
	builder.BinaryOperation(
		ast.InterpreterFunc(func(ctx interface{}, nodes []ast.Node) (interface{}, reader.Error) {
			value0, _ := nodes[0].Value(ctx)
			value1, _ := nodes[1].Value(ctx)
			return value0.(int) + value1.(int), nil
		}),
	),
	terminal.Integer(),
	terminal.Rune('+', "ADD"),
	terminal.Integer(),
)
s := parsley.NewSentence(add)
value, _, err := s.Evaluate(text.NewReader([]byte("1 + 2"), "", true), nil)
if err != nil {
	panic(err)
}
fmt.Printf("Result: %d\n", value.(int))
// Output: Result: 3

The add variable will contain a parser which is able to parse the given expression and the s variable contains the sentence parser which makes sure we parsed all the input data.

More examples

There is a JSON-like parser implementation in the examples/json directory.

For a more complex expression parser you can check out the Flint interpolation language.

Documentation

Please more information about the available parsers and combinators please check out the Go docs.

Library packages

  • parsley (root): top level helper functions for parsing
  • ast: abstract syntax tree related structs and interfaces
  • ast/builder: AST node builder functions
  • combinator: parser combinator implementations including memoization
  • data: int map and int set implementations
  • examples: examples for how to use this library
  • parser: the main parsing logic
  • parsley: the top-level parser/evaluate methods + root sentence
  • reader: interface definitions for an input reader
  • text: text reader implementation
  • text/terminal: common parsers for text literals (string literal, int, float, etc.)

Versioning

The library is expected to have occasional minor API changes while the version is 0.MAJOR.MINOR.

  • MAJOR version will be increased for backward incompatible changes
  • MINOR version will be increased for improvements and bugfixes

Starting from 1.0.0 the library will use the Semantic Versioning 2.0.0.

You can find the change log in the CHANGELOG.md file.

Testing

For testing you need to have Testify installed in your Go path.

To run all the tests simply run: make test.

LICENSE

This software is distributed under the Mozilla Public License, version 2.0. See the LICENSE file for more details.

Acknowledgements

I would like to say thank you to Frost R.A., Hafiz R., and Callaghan P. as Parsley's main algorithms are based on their paper:

Frost R.A., Hafiz R., Callaghan P. (2007) Parser Combinators for Ambiguous Left-Recursive Grammars. In: Hudak P., Warren D.S. (eds) Practical Aspects of Declarative Languages. PADL 2008. Lecture Notes in Computer Science, vol 4902. Springer, Berlin, Heidelberg


Copyright (c) 2017 Opsidian Ltd.

Directories

Path Synopsis
ast
Package ast defines structs and interfaces for building and evaluating an abstract syntax tree.
Package ast defines structs and interfaces for building and evaluating an abstract syntax tree.
builder
Package builder defines functions for generating basic node builder functions
Package builder defines functions for generating basic node builder functions
mocks
Code generated by mockery v1.0.0 Code generated by mockery v1.0.0 Code generated by mockery v1.0.0
Code generated by mockery v1.0.0 Code generated by mockery v1.0.0 Code generated by mockery v1.0.0
Package combinator defines generator functions for creating parser combinators
Package combinator defines generator functions for creating parser combinators
Package data defines custom data types for parsing.
Package data defines custom data types for parsing.
examples
json command
This is a JSON parser example.
This is a JSON parser example.
Package parser contains the main structs for parsing
Package parser contains the main structs for parsing
mocks
Code generated by mockery v1.0.0 Code generated by mockery v1.0.0
Code generated by mockery v1.0.0 Code generated by mockery v1.0.0
Package reader defines interfaces for an input reader and reader position
Package reader defines interfaces for an input reader and reader position
mocks
Code generated by mockery v1.0.0 Code generated by mockery v1.0.0 Code generated by mockery v1.0.0
Code generated by mockery v1.0.0 Code generated by mockery v1.0.0 Code generated by mockery v1.0.0
Package text defines a text input reader and basic terminal parsers.
Package text defines a text input reader and basic terminal parsers.
terminal
Package terminal contains basic terminal parsers for text parsing
Package terminal contains basic terminal parsers for text parsing

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL