grammargen

package
v0.9.2
Published: Mar 17, 2026 License: MIT Imports: 16 Imported by: 0

Documentation

Overview

Package grammargen implements a pure-Go grammar generator for gotreesitter. It compiles grammar definitions expressed in a Go DSL into binary blobs that the gotreesitter runtime can load and use for parsing.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func AddConflict

func AddConflict(g *Grammar, symbols ...string)

AddConflict adds a GLR conflict group to the grammar.

func AppendChoice

func AppendChoice(g *Grammar, ruleName string, newAlts ...*Rule)

AppendChoice appends new alternatives to an existing Choice rule. If the named rule is already a Choice, the new alternatives are appended to its children. Otherwise the existing rule and the new alternatives are wrapped in a new Choice.

func EmitC

func EmitC(name string, lang *gotreesitter.Language) (string, error)

EmitC emits a parser.c string from a compiled Language struct.

func EmitGrammarGo

func EmitGrammarGo(g *Grammar, pkgName, funcName string) ([]byte, error)

EmitGrammarGo takes a Grammar IR and emits Go source code that reconstructs it using grammargen DSL calls. The output is a standalone Go file in the given package with a function of the given name that returns *Grammar.

func ExportGrammarJSON

func ExportGrammarJSON(g *Grammar) ([]byte, error)

ExportGrammarJSON serializes a Grammar struct to the tree-sitter grammar.json format. The output is compatible with ImportGrammarJSON — a round-trip ImportGrammarJSON(ExportGrammarJSON(g)) should produce an equivalent grammar.

The JSON structure matches tree-sitter's canonical resolved grammar.json:

{
  "name": "...",
  "word": "...",
  "rules": { ... },
  "extras": [...],
  "conflicts": [...],
  "externals": [...],
  "inline": [...],
  "supertypes": [...]
}

func Generate

func Generate(g *Grammar) ([]byte, error)

Generate compiles a Grammar definition into a binary blob that gotreesitter can load via DecodeLanguageBlob / loadEmbeddedLanguage. LR(1) state splitting is always attempted; a rollback guard reverts to the plain LALR table if splitting does not reduce GLR conflicts.

func GenerateC

func GenerateC(g *Grammar) (string, error)

GenerateC compiles a Grammar to a standard tree-sitter parser.c string. The output is compatible with tree-sitter's C runtime ABI 14.

func GenerateHighlightQueries

func GenerateHighlightQueries(base, extended *Grammar) string

GenerateHighlightQueries produces tree-sitter highlight queries for rules added by a grammar extension. It diffs base and extended to find new rules, then applies naming conventions to generate appropriate highlights.

Conventions:

  • New Str() tokens matching identifier pattern -> @keyword
  • *_declaration with "name" field -> name: (identifier) @type.definition
  • *_variant with "name" field -> name: (identifier) @constructor
  • *_block with "description" field -> description: @string
  • *_expression -> no default highlight (expressions are structural)
  • *_statement -> no default highlight
  • Field named "params"/"parameters" -> children (identifier) @variable.parameter
  • let_declaration name -> @variable.definition
  • New string tokens that are operators (non-alphanumeric) -> @operator
  • New string tokens that are keywords (alphanumeric) -> @keyword

func GenerateHighlightQuery

func GenerateHighlightQuery(g *Grammar) string

GenerateHighlightQuery infers a tree-sitter highlight query from grammar structure. It maps well-known rule names and patterns to standard capture names:

  • comment → @comment
  • string, string_content → @string
  • number, integer, float → @number
  • true, false → @boolean
  • null, nil, none → @constant.builtin
  • identifier → @variable
  • type_identifier → @type
  • function keywords → @keyword.function
  • control flow keywords → @keyword.control
  • other keyword-like string terminals → @keyword
  • operators → @operator

func GenerateLanguage

func GenerateLanguage(g *Grammar) (*gotreesitter.Language, error)

GenerateLanguage compiles a Grammar into a Language struct without encoding. LR(1) state splitting is always attempted; a rollback guard reverts to the plain LALR table if splitting does not reduce GLR conflicts.

func GenerateLanguageWithContext

func GenerateLanguageWithContext(ctx context.Context, g *Grammar) (*gotreesitter.Language, error)

GenerateLanguageWithContext is like GenerateLanguage but accepts a context for cancellation. When the context is cancelled, LR table construction and DFA building abort promptly, allowing the caller to reclaim memory that would otherwise be held by an orphaned goroutine.

func LoadLanguageBlob

func LoadLanguageBlob(data []byte) (*gotreesitter.Language, error)

LoadLanguageBlob deserializes a compressed language blob back into a Language. This is the inverse of the blob encoding used by GenerateLanguage.

func RunTests

func RunTests(g *Grammar) error

RunTests generates the grammar and runs all embedded test cases. Returns nil if all tests pass, or an error describing failures.

func Validate

func Validate(g *Grammar) []string

Validate checks the grammar for common issues and returns warnings.

Types

type AliasInfo

type AliasInfo struct {
	ChildIndex int
	Name       string
	Named      bool
}

AliasInfo stores alias information for a child position.

type Assoc

type Assoc int

Assoc is the associativity of a production.

const (
	AssocNone Assoc = iota
	AssocLeft
	AssocRight
)

type ConflictDiag

type ConflictDiag struct {
	Kind          ConflictKind
	State         int
	LookaheadSym  int
	Actions       []lrAction // the conflicting actions
	Resolution    string     // how it was resolved (or "GLR" if kept)
	IsMergedState bool       // was this state produced by LALR merging?
	MergeCount    int        // how many merge origins this state has
}

ConflictDiag describes a conflict encountered during LR table construction.

func (*ConflictDiag) String

func (d *ConflictDiag) String(ng *NormalizedGrammar) string

type ConflictKind

type ConflictKind int

ConflictKind describes the type of LR conflict.

const (
	ShiftReduce ConflictKind = iota
	ReduceReduce
)

type FieldAssign

type FieldAssign struct {
	ChildIndex int
	FieldName  string
}

FieldAssign maps a child position in a production to a field name.

type GenerateReport

type GenerateReport struct {
	Language        *gotreesitter.Language
	Blob            []byte
	Conflicts       []ConflictDiag
	SplitCandidates []splitCandidate
	SplitResult     *splitReport
	Warnings        []string
	SymbolCount     int
	StateCount      int
	TokenCount      int
}

GenerateReport holds the result of grammar generation with diagnostics.

func GenerateWithReport

func GenerateWithReport(g *Grammar) (*GenerateReport, error)

GenerateWithReport compiles a grammar and returns a full diagnostic report.

type Grammar

type Grammar struct {
	Name              string
	Rules             map[string]*Rule
	RuleOrder         []string // order rules were defined (first = start rule)
	Extras            []*Rule
	Conflicts         [][]string
	Externals         []*Rule
	Inline            []string
	Word              string
	Supertypes        []string
	Tests             []TestCase      // embedded test cases
	EnableLRSplitting bool            // opt-in: attempt LR(1) state splitting for merge pathology
	BinaryRepeatMode  bool            // use tree-sitter's binary repeat helper shape (aux→seq(aux,aux)|inner)
	NonKeywordStrings map[string]bool // strings that should NOT be promoted via keyword DFA (extension keywords that coexist as identifiers)
}

Grammar is the top-level grammar definition.

func AliasSuperGrammar

func AliasSuperGrammar() *Grammar

AliasSuperGrammar returns a grammar that exercises aliases and supertypes.

Supertypes:

_expression is a supertype with children: number, string, identifier, binary_expression

Aliases:

In assignment, the left-hand side identifier is aliased to "variable"
In binary_expression, the operator string is aliased to "op"

func CalcGrammar

func CalcGrammar() *Grammar

CalcGrammar returns a calculator grammar that exercises precedence and associativity. It defines:

  • Binary operators: +, -, *, / with standard math precedence
  • Unary prefix minus: -x (highest precedence)
  • Parenthesized expressions: (x)
  • Integer literals: number

func ExtScannerGrammar

func ExtScannerGrammar() *Grammar

ExtScannerGrammar returns a grammar with external scanner tokens. It models a simple block-structured language where INDENT and DEDENT tokens are produced by an external scanner (like Python).

program: repeat(statement)
statement: simple_statement | block
simple_statement: identifier ";"
block: identifier ":" NEWLINE INDENT repeat(statement) DEDENT

External tokens: INDENT, DEDENT, NEWLINE

func ExtendGrammar

func ExtendGrammar(name string, base *Grammar, customize func(g *Grammar)) *Grammar

ExtendGrammar creates a new grammar that inherits from a base grammar. The customize function receives the new grammar with all base rules copied in, and can override rules, add new ones, or modify extras/conflicts/etc.

Example:

cpp := ExtendGrammar("cpp", cGrammar(), func(g *Grammar) {
    g.Define("class_declaration", Seq(Str("class"), Sym("identifier"), Sym("class_body")))
    // Override an existing rule:
    g.Define("declaration", Choice(Sym("class_declaration"), Sym("function_declaration")))
})

func GLRGrammar

func GLRGrammar() *Grammar

GLRGrammar returns a grammar with intentional ambiguity that requires GLR parsing. It models a simplified C-like language where `a * b` can be parsed as either multiplication or a pointer declaration:

expression_statement: a * b ;  (multiplication)
pointer_declaration:  a * b ;  (type * name)

The conflict between _expression and type_name is declared, causing the parser to fork stacks when it encounters the ambiguity.

func GoGrammar

func GoGrammar() *Grammar

GoGrammar returns the go grammar. Code generated by EmitGrammarGo. DO NOT EDIT.

func INIGrammar

func INIGrammar() *Grammar

INIGrammar returns a production-grade INI file grammar.

Parses the superset of major INI dialects (Windows API, Python configparser, Git config, PHP parse_ini_file):

  • Sections: [name] and [section "subsection"] (Git-style)
  • Key-value pairs: key = value, key : value, key=value
  • Comments: ; and # (full-line only)
  • Quoted string values: "..." with \" and \\ escapes
  • Global pairs: key=value before any [section]
  • Empty values: key= (value is optional)

INI is line-oriented: newlines are significant (not extras). Only horizontal whitespace (spaces, tabs) is treated as extras.

func ImportGrammarJS

func ImportGrammarJS(source []byte) (*Grammar, error)

ImportGrammarJS parses a tree-sitter grammar.js file and returns a Grammar IR. This uses gotreesitter's own JavaScript grammar to parse the file, demonstrating the full-circle capability: gotreesitter parsing its own input format.

func ImportGrammarJSON

func ImportGrammarJSON(data []byte) (*Grammar, error)

ImportGrammarJSON parses a tree-sitter grammar.json file (the canonical resolved form generated by `tree-sitter generate`) and returns a Grammar IR. This is more reliable than ImportGrammarJS because grammar.json has no require() calls, helper functions, or other JavaScript-specific constructs.

func JSONGrammar

func JSONGrammar() *Grammar

JSONGrammar returns the JSON grammar defined using the Go DSL. This mirrors tree-sitter-json's grammar.js definition.

func KeywordGrammar

func KeywordGrammar() *Grammar

KeywordGrammar returns a simplified language grammar that exercises keyword extraction and the word token mechanism. Keywords "var" and "return" match the identifier pattern but are promoted to their own symbols by the keyword DFA.

func LoxGrammar

func LoxGrammar() *Grammar

LoxGrammar returns a production-grade Lox grammar (Crafting Interpreters spec).

Implements the full Lox language:

  • Variables: var x = expr;
  • Functions: fun name(params) { body }
  • Classes: class Name < Super { methods }
  • Control flow: if/else, while, for
  • Operators: or, and, ==, !=, <, >, <=, >=, +, -, *, /, !, unary -
  • Calls and property access: f(args), obj.prop, obj.prop = val
  • Literals: numbers, strings, true, false, nil, this, super
  • Print: print expr;
  • Return: return expr;
  • Comments: // line comments
  • Block scoping: { statements }

func MustacheGrammar

func MustacheGrammar() *Grammar

MustacheGrammar returns a production-grade Mustache template grammar.

Implements the required Mustache spec features:

  • Interpolation: {{ name }}
  • Unescaped interpolation: {{{ name }}} and {{& name }}
  • Sections: {{# name }} ... {{/ name }}
  • Inverted sections: {{^ name }} ... {{/ name }}
  • Comments: {{! comment text }}
  • Partials: {{> partial_name }}
  • Dotted names: {{ person.name }}
  • Implicit iterator: {{ . }}
  • Raw text between tags

The grammar treats {{ and }} as delimiters. Text outside tags is raw content. The DFA handles {{{ vs {{ disambiguation via maximal munch.

func NewGrammar

func NewGrammar(name string) *Grammar

NewGrammar creates a new grammar with the given name.

func ParseGrammarFile

func ParseGrammarFile(source string) (*Grammar, error)

ParseGrammarFile parses a declarative .grammar file into a Grammar IR.

Syntax:

grammar <name>

extras = [ /\s/ ]
word = <rule_name>
supertypes = [ <rule_name>, ... ]
conflicts = [ [<rule>, <rule>], ... ]

rule <name> = <expr>

Expressions:

"string"         string literal
/pattern/        regex pattern
<name>           symbol reference
seq(a, b, ...)   sequence
choice(a, b, ...) alternation
repeat(a)        zero or more
repeat1(a)       one or more
optional(a)      optional
token(a)         token boundary
field("name", a) field annotation
prec(n, a)       precedence
prec.left(n, a)  left-associative precedence
prec.right(n, a) right-associative precedence

func (*Grammar) Define

func (g *Grammar) Define(name string, rule *Rule)

Define adds a rule to the grammar. The first rule defined is the start rule.

func (*Grammar) SetConflicts

func (g *Grammar) SetConflicts(conflicts ...[]string)

SetConflicts declares grammar conflicts for GLR.

func (*Grammar) SetExternals

func (g *Grammar) SetExternals(rules ...*Rule)

SetExternals declares external scanner tokens.

func (*Grammar) SetExtras

func (g *Grammar) SetExtras(rules ...*Rule)

SetExtras sets the extra rules (e.g. whitespace, comments).

func (*Grammar) SetInline

func (g *Grammar) SetInline(names ...string)

SetInline marks rules to be inlined.

func (*Grammar) SetSupertypes

func (g *Grammar) SetSupertypes(names ...string)

SetSupertypes declares supertype rules.

func (*Grammar) SetWord

func (g *Grammar) SetWord(name string)

SetWord sets the word token for keyword extraction.

func (*Grammar) Test

func (g *Grammar) Test(name, input, expected string)

Test adds an embedded test case. Input is parsed and the resulting tree is compared against the expected S-expression. If expected is empty, the test only checks that no ERROR nodes appear.

func (*Grammar) TestError

func (g *Grammar) TestError(name, input string)

TestError adds an embedded test case that expects parse errors.

type GrammarDiff

type GrammarDiff struct {
	AddedRules        []string
	RemovedRules      []string
	ModifiedRules     []string // rules present in both but with different definitions
	ExtrasChanged     bool
	ConflictsChanged  bool
	ExternalsChanged  bool
	WordChanged       bool
	SupertypesChanged bool
}

GrammarDiff describes the differences between two grammar versions.

func DiffGrammars

func DiffGrammars(old, new *Grammar) *GrammarDiff

DiffGrammars compares two grammar versions and returns a diff.

func (*GrammarDiff) HasChanges

func (d *GrammarDiff) HasChanges() bool

HasChanges returns true if any differences were found.

func (*GrammarDiff) String

func (d *GrammarDiff) String() string

String returns a human-readable summary of the diff.

type LRTables

type LRTables struct {
	// ActionTable[state][symbol] = list of actions (multiple = conflict/GLR)
	ActionTable          map[int]map[int][]lrAction
	GotoTable            map[int]map[int]int // [state][nonterminal] → target state
	StateCount           int
	ExtraChainStateStart int // first synthetic nonterminal-extra state, or -1 if none
}

LRTables holds the generated parse tables.

type NormalizedGrammar

type NormalizedGrammar struct {
	Symbols       []SymbolInfo
	Productions   []Production
	Terminals     []TerminalPattern
	ExtraSymbols  []int    // symbol indices of extras
	FieldNames    []string // index 0 is always ""
	Conflicts     [][]int  // symbol index groups
	Supertypes    []int    // symbol indices
	StartSymbol   int
	AugmentProdID int // production index for S' → S

	// Keyword support (populated when Grammar.Word is set).
	KeywordSymbols []int             // symbol IDs that are keywords
	WordSymbolID   int               // word token symbol ID (e.g., identifier)
	KeywordEntries []TerminalPattern // keyword patterns for keyword DFA

	// External scanner support (populated when Grammar.Externals is set).
	ExternalSymbols []int // external token index → symbol ID
	// contains filtered or unexported fields
}

NormalizedGrammar is the output of the normalize step.

func Normalize

func Normalize(g *Grammar) (*NormalizedGrammar, error)

Normalize transforms a Grammar into a NormalizedGrammar.

func (*NormalizedGrammar) TokenCount

func (ng *NormalizedGrammar) TokenCount() int

TokenCount returns the number of terminal symbols (including symbol 0 = end).

type Production

type Production struct {
	LHS          int   // symbol index
	RHS          []int // symbol indices
	Prec         int
	Assoc        Assoc
	DynPrec      int
	ProductionID int
	Fields       []FieldAssign // per-RHS-position field assignments
	Aliases      []AliasInfo   // per-RHS-position alias info
	IsExtra      bool          // true if this production belongs to a nonterminal extra
}

Production is a single LHS → RHS production with metadata.

type Rule

type Rule struct {
	Kind     RuleKind
	Value    string  // literal/pattern/symbol/field name
	Children []*Rule // sub-rules
	Prec     int     // precedence value
	Named    bool    // for alias: whether the alias is a named node
}

Rule is a node in the grammar rule tree.

func Alias

func Alias(rule *Rule, name string, named bool) *Rule

Alias aliases a rule to a different name.

func Blank

func Blank() *Rule

Blank creates an epsilon (empty) rule.

func Braces

func Braces(rule *Rule) *Rule

Braces wraps a rule in curly braces.

func Brackets

func Brackets(rule *Rule) *Rule

Brackets wraps a rule in square brackets.

func Choice

func Choice(rules ...*Rule) *Rule

Choice creates an alternation of rules.

func CommaSep

func CommaSep(rule *Rule) *Rule

CommaSep creates an optional comma-separated list.

func CommaSep1

func CommaSep1(rule *Rule) *Rule

CommaSep1 creates a non-empty comma-separated list.

func Field

func Field(name string, rule *Rule) *Rule

Field annotates a rule with a field name.

func ImmToken

func ImmToken(rule *Rule) *Rule

ImmToken creates an immediate token (no preceding whitespace).

func Optional

func Optional(rule *Rule) *Rule

Optional creates an optional rule.

func Parens

func Parens(rule *Rule) *Rule

Parens wraps a rule in parentheses.

func Pat

func Pat(pattern string) *Rule

Pat creates a regex pattern rule.

func Prec

func Prec(n int, rule *Rule) *Rule

Prec sets precedence on a rule.

func PrecDynamic

func PrecDynamic(n int, rule *Rule) *Rule

PrecDynamic sets dynamic precedence on a rule.

func PrecLeft

func PrecLeft(n int, rule *Rule) *Rule

PrecLeft sets left-associative precedence on a rule.

func PrecRight

func PrecRight(n int, rule *Rule) *Rule

PrecRight sets right-associative precedence on a rule.

func Repeat

func Repeat(rule *Rule) *Rule

Repeat creates a zero-or-more repetition.

func Repeat1

func Repeat1(rule *Rule) *Rule

Repeat1 creates a one-or-more repetition.

func SepBy

func SepBy(sep, rule *Rule) *Rule

SepBy creates an optional list separated by the given separator.

func SepBy1

func SepBy1(sep, rule *Rule) *Rule

SepBy1 creates a non-empty list separated by the given separator.

func Seq

func Seq(rules ...*Rule) *Rule

Seq creates a sequence of rules.

func Str

func Str(s string) *Rule

Str creates a string literal rule.

func Surround

func Surround(open, rule, close *Rule) *Rule

Surround wraps a rule with open and close delimiters.

func Sym

func Sym(name string) *Rule

Sym creates a symbol reference rule.

func Token

func Token(rule *Rule) *Rule

Token creates a token boundary (content is a single lexer token).

type RuleKind

type RuleKind int

RuleKind identifies the type of a grammar rule node.

const (
	RuleString      RuleKind = iota // literal string: "{"
	RulePattern                     // regex pattern: /[0-9]+/
	RuleSymbol                      // symbol reference: $.object
	RuleSeq                         // sequence: seq(a, b, c)
	RuleChoice                      // alternation: choice(a, b)
	RuleRepeat                      // zero-or-more: repeat(a)
	RuleRepeat1                     // one-or-more: repeat1(a)
	RuleOptional                    // optional: optional(a)
	RuleToken                       // token boundary: token(a)
	RuleImmToken                    // immediate token: token.immediate(a)
	RuleField                       // field annotation: field("name", a)
	RulePrec                        // precedence: prec(n, a)
	RulePrecLeft                    // left-associative: prec.left(n, a)
	RulePrecRight                   // right-associative: prec.right(n, a)
	RulePrecDynamic                 // dynamic precedence: prec.dynamic(n, a)
	RuleBlank                       // epsilon / empty
	RuleAlias                       // alias: alias(a, "name")
)

type SymbolInfo

type SymbolInfo struct {
	Name      string
	Visible   bool
	Named     bool
	Supertype bool
	Kind      SymbolKind
	IsExtra   bool
	Immediate bool // token.immediate — no preceding whitespace skip
}

SymbolInfo describes a grammar symbol.

type SymbolKind

type SymbolKind int

SymbolKind classifies a grammar symbol.

const (
	SymbolTerminal    SymbolKind = iota // anonymous terminal like "{"
	SymbolNamedToken                    // named terminal like number, string_content
	SymbolExternal                      // external scanner token
	SymbolNonterminal                   // nonterminal rule
)

type TerminalPattern

type TerminalPattern struct {
	SymbolID  int
	Rule      *Rule // the flattened rule tree for NFA construction
	Priority  int   // lower = higher priority (wins on tie)
	Immediate bool  // token.immediate
}

TerminalPattern describes a terminal symbol's match pattern for DFA generation.

type TestCase

type TestCase struct {
	Name        string // test name
	Input       string // input to parse
	Expected    string // expected S-expression (empty = just check no errors)
	ExpectError bool   // if true, expect ERROR nodes in the tree
}

TestCase is an embedded grammar test case.
