grammargen

package

Version: v0.20.2 (latest) | Published: Jun 6, 2026 | License: MIT | Imports: 19 | Imported by: 4

Repository

github.com/odvcencio/gotreesitter

README ¶

grammargen

grammargen is the pure-Go grammar compiler used by gotreesitter. It turns a grammar definition into a *gotreesitter.Language, a serialized .bin blob, a tree-sitter-compatible parser.c, or generated Go DSL source.

The authoring surface is intentionally input-neutral:

Go DSL grammars built with NewGrammar, Define, Seq, Choice, Token, Field, PrecLeft, and related constructors.
Resolved upstream grammar.json files, which are the preferred import format for tree-sitter grammars.
.grammar files, a compact ecosystem-agnostic grammar format that parses into the same IR and can emit Go DSL.

grammar.js import also exists, but grammar.json is usually more reliable because helper functions, require() calls, and JavaScript evaluation have already been resolved by tree-sitter.

Authoring Commands

Use doctor when changing a grammar. It validates, generates parser tables, runs embedded tests when present, optionally parses a sample, and suggests the next command.

go run ./cmd/grammargen doctor calc -text '1+2*3'
go run ./cmd/grammargen doctor -json /tmp/grammar_parity/go/src/grammar.json -sample sample.go
go run ./cmd/grammargen doctor -grammar ./mini.grammar -text '123'
go run ./cmd/grammargen doctor calc -text '1+2*3' -conflicts 3
go run ./cmd/grammargen doctor calc -text '1+2*3' -format json

Use parse when you want quick sample-to-tree feedback:

go run ./cmd/grammargen parse calc -text '1+2*3'
go run ./cmd/grammargen parse -grammar ./mini.grammar -stdin
go run ./cmd/grammargen parse calc -text '1+2*3' -format sexpr
go run ./cmd/grammargen parse calc -text '1+2*3' -format json

Use emit to write artifacts from any supported input:

# gotreesitter blob
go run ./cmd/grammargen emit go -bin grammars/grammar_blobs/go.bin

# Go DSL source from a resolved grammar.json
go run ./cmd/grammargen emit \
  -json /tmp/grammar_parity/go/src/grammar.json \
  -go grammargen/go_grammar.go \
  -pkg grammargen \
  -func GoGrammar

# Go DSL source from a .grammar file
go run ./cmd/grammargen emit -grammar ./mini.grammar -go ./mini_grammar.go -pkg grammargen

# Resolved grammar.json from any supported input
go run ./cmd/grammargen emit -grammar ./mini.grammar -json-out ./mini.grammar.json

# tree-sitter parser.c
go run ./cmd/grammargen emit calc -c /tmp/parser.c

# inferred highlight query
go run ./cmd/grammargen emit calc -highlight

For golden parse snapshots, write the current tree once, then compare future runs against it:

go run ./cmd/grammargen parse calc -text '1+2*3' -write-expect ./calc.sexpr
go run ./cmd/grammargen parse calc -text '1+2*3' -expect ./calc.sexpr
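
The write-once, compare-later cycle above is the usual golden-file pattern. The sketch below shows that pattern in plain Go. It is not how the CLI implements -write-expect or -expect; checkGolden, roundTrip, and the placeholder tree text are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkGolden compares got against the snapshot stored at path.
// With update set it writes the snapshot instead, like a write-expect run.
func checkGolden(path, got string, update bool) error {
	if update {
		return os.WriteFile(path, []byte(got), 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("read snapshot: %w", err)
	}
	if string(want) != got {
		return fmt.Errorf("tree differs from snapshot %s", path)
	}
	return nil
}

// roundTrip writes a snapshot, re-checks it, then checks a changed tree.
func roundTrip() (wrote, matched, driftCaught bool) {
	dir, err := os.MkdirTemp("", "golden")
	if err != nil {
		return false, false, false
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "calc.sexpr")
	tree := "(expression (number))" // placeholder text, not real grammargen output

	wrote = checkGolden(path, tree, true) == nil
	matched = checkGolden(path, tree, false) == nil
	driftCaught = checkGolden(path, "(ERROR)", false) != nil
	return
}

func main() {
	fmt.Println(roundTrip())
}
```

Keeping the update step behind an explicit flag means a snapshot only changes when someone asks for it, which is the same separation the two commands above provide.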

parse -strict exits non-zero when parsing finishes with ERROR nodes or an early stop condition. doctor treats sample parse errors as gate failures by default.

The legacy flag surface still works:

go run ./cmd/grammargen -validate calc
go run ./cmd/grammargen -report calc
go run ./cmd/grammargen -grammar ./mini.grammar -go ./mini_grammar.go

For grammars that benefit from local LR(1) state splitting, pass -lr-split:

go run ./cmd/grammargen doctor go -lr-split -sample sample.go
go run ./cmd/grammargen emit go -lr-split -bin grammars/grammar_blobs/go.bin

Go DSL

The first defined rule is the start rule. Names beginning with _ are hidden rules. String rules create literal tokens, pattern rules create regex terminals, and Token groups a rule into one lexer token.

func MiniExprGrammar() *Grammar {
	g := NewGrammar("mini_expr")

	g.Define("program", Sym("expression"))
	g.Define("expression", Choice(
		PrecLeft(1, Seq(
			Field("left", Sym("expression")),
			Field("operator", Str("+")),
			Field("right", Sym("expression")),
		)),
		PrecLeft(2, Seq(
			Field("left", Sym("expression")),
			Field("operator", Str("*")),
			Field("right", Sym("expression")),
		)),
		Sym("number"),
		Seq(Str("("), Sym("expression"), Str(")")),
	))
	g.Define("number", Token(Repeat1(Pat(`[0-9]`))))
	g.SetExtras(Pat(`\s`))

	g.Test("precedence", "1 + 2 * 3", "")

	return g
}

An embedded test with an empty expected S-expression only checks that parsing finishes without ERROR nodes. Fill in the expected S-expression when a rule's exact tree shape should be locked down.
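
The two PrecLeft levels in MiniExprGrammar decide how `1 + 2 * 3` nests. The effect can be sketched without grammargen using a small precedence-climbing parser. This is an illustration of precedence and left associativity only, not the LR machinery grammargen builds, and the printed node names are simplified rather than the exact S-expression grammargen emits. Malformed input is not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// prec mirrors the levels used in MiniExprGrammar: "+" binds at 1, "*" at 2.
var prec = map[byte]int{'+': 1, '*': 2}

type parser struct {
	src string
	pos int
}

func (p *parser) skipSpace() {
	for p.pos < len(p.src) && p.src[p.pos] == ' ' {
		p.pos++
	}
}

// primary parses a number or a parenthesized expression.
func (p *parser) primary() string {
	p.skipSpace()
	if p.pos < len(p.src) && p.src[p.pos] == '(' {
		p.pos++
		inner := p.expr(1)
		p.skipSpace()
		if p.pos < len(p.src) && p.src[p.pos] == ')' {
			p.pos++
		}
		return inner
	}
	start := p.pos
	for p.pos < len(p.src) && p.src[p.pos] >= '0' && p.src[p.pos] <= '9' {
		p.pos++
	}
	return "(number " + p.src[start:p.pos] + ")"
}

// expr parses operators whose level is at least minPrec.
// Parsing the right operand at level+1 makes each operator left-associative.
func (p *parser) expr(minPrec int) string {
	left := p.primary()
	for {
		p.skipSpace()
		if p.pos >= len(p.src) {
			return left
		}
		op := p.src[p.pos]
		level, ok := prec[op]
		if !ok || level < minPrec {
			return left
		}
		p.pos++
		right := p.expr(level + 1)
		left = fmt.Sprintf("(expression %s %c %s)", left, op, right)
	}
}

func parse(src string) string {
	p := &parser{src: strings.TrimSpace(src)}
	return p.expr(1)
}

func main() {
	fmt.Println(parse("1 + 2 * 3"))
	fmt.Println(parse("1 + 2 + 3"))
}
```

The first input groups the multiplication under the addition because `*` has the higher level. The second groups to the left because both operands of `+` compete at the same level.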

Common grammar-level settings:

SetExtras(...): whitespace, comments, or other extra tokens.
SetConflicts(...): declared ambiguity groups that should keep GLR alternatives.
SetExternals(...): external scanner tokens.
SetInline(...): rules to inline during normalization.
SetWord(...): word token used for keyword extraction.
SetSupertypes(...): structural supertypes exposed in metadata.
Precedences: ordered named and symbol precedence levels imported from grammar.json.

Useful DSL helpers live in grammar.go: CommaSep, CommaSep1, SepBy, SepBy1, Parens, Brackets, Braces, AppendChoice, and ExtendGrammar.

`.grammar` Files

.grammar is the ecosystem-agnostic text format. It is currently line-oriented, so keep each rule definition on one line.

grammar mini

extras = [ /\s/ ]

rule program = number
rule number = token(repeat1(/[0-9]/))

Run it through the same command surface:

go run ./cmd/grammargen doctor -grammar ./mini.grammar -text '123'
go run ./cmd/grammargen parse -grammar ./mini.grammar -text '123'
go run ./cmd/grammargen emit -grammar ./mini.grammar -go ./mini_grammar.go -pkg grammargen
go run ./cmd/grammargen emit -grammar ./mini.grammar -json-out ./mini.grammar.json
go run ./cmd/grammargen emit -grammar ./mini.grammar -bin /tmp/mini.bin

Supported top-level lines:

grammar <name>
extras = [ <rule-expr>, ... ]
word = <rule_name>
supertypes = [ <rule_name>, ... ]
conflicts = [ [<rule>, <rule>], ... ]
rule <name> = <rule-expr>
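
Because the format is line-oriented, each top-level line can be classified on its own. The sketch below shows that idea for the line kinds listed above. classify is a hypothetical helper written for this example; it is not ParseGrammarFile and it does not parse rule expressions.

```go
package main

import (
	"fmt"
	"strings"
)

// classify returns the kind of a top-level line, its name, and its body.
// Kinds: "blank", "grammar", "rule", "setting", or "invalid".
func classify(line string) (string, string, string) {
	line = strings.TrimSpace(line)
	switch {
	case line == "":
		return "blank", "", ""
	case strings.HasPrefix(line, "grammar "):
		return "grammar", strings.TrimSpace(strings.TrimPrefix(line, "grammar ")), ""
	case strings.HasPrefix(line, "rule "):
		// Split at the first "=" so a literal "=" inside the body survives.
		name, body, ok := strings.Cut(strings.TrimPrefix(line, "rule "), "=")
		if !ok {
			return "invalid", "", line
		}
		return "rule", strings.TrimSpace(name), strings.TrimSpace(body)
	}
	key, val, ok := strings.Cut(line, "=")
	if ok {
		key = strings.TrimSpace(key)
		switch key {
		case "extras", "word", "supertypes", "conflicts":
			return "setting", key, strings.TrimSpace(val)
		}
	}
	return "invalid", "", line
}

func main() {
	lines := []string{
		"grammar mini",
		`extras = [ /\s/ ]`,
		"rule program = number",
		"rule number = token(repeat1(/[0-9]/))",
	}
	for _, l := range lines {
		kind, name, body := classify(l)
		fmt.Printf("%-8s %-8s %s\n", kind, name, body)
	}
}
```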

Supported expressions:

"literal"
/regex/
identifier

seq(a, b, ...)
choice(a, b, ...)
repeat(a)
repeat1(a)
optional(a)
token(a)
field("name", a)
prec(1, a)
prec.left(1, a)
prec.right(1, a)
prec.dynamic(1, a)
alias(a, name)
alias(a, "anonymous_name")

For large upstream grammars, resolved grammar.json remains the most complete input. .grammar is the portable authoring format and should stay independent of any host language syntax.

Validation Loop

For small package-local checks, keep tests focused:

go test ./cmd/grammargen ./grammargen \
  -run '^TestJSONGenerate$|^TestGenerateWithReportCtxSkipsDiagnosticsWhenNotRequested$' \
  -count=1

When changing GLR, incremental, import, or parity-sensitive behavior, use the Docker parity runners and keep runs to one grammar at a time:

# Focused package test inside Docker
bash cgo_harness/docker/run_parity_in_docker.sh \
  -- "cd /workspace && go test ./grammargen -run '^TestName$' -count=1"

# Real-corpus parity for one grammar
bash cgo_harness/docker/run_single_grammar_parity.sh typescript

# Focused grammargen real-corpus lane
bash cgo_harness/docker/run_grammargen_focus_targets.sh --mode real-corpus --langs typescript

# Focused grammargen-vs-C lane
bash cgo_harness/docker/run_grammargen_focus_targets.sh --mode cgo --langs typescript

Do not run repo-wide go test ./... or broad race sweeps on the host for grammargen work. Heavy correctness, parity, and race coverage belongs in Docker or CI, scoped to one language or one regression at a time.

Reading the Package

grammar.go: public IR and Go DSL constructors.
parse_grammar_file.go: .grammar parser.
import_grammarjson.go: resolved tree-sitter grammar.json import.
import_grammarjs.go: best-effort grammar.js import.
normalize.go: rule lowering, metadata, fields, terminals, and production construction.
lr.go: LR/LALR table construction and conflict resolution.
lr_split.go, lr_split_oracle.go: local LR(1) split support.
dfa.go, nfa.go, regex.go: lexer construction.
encode.go, assemble.go: Language assembly and blob encoding.
diagnostics.go: validation, embedded tests, and generation reports.
emit_grammar_go.go, export_grammarjson.go, codegen_c.go: artifact emitters.
parity_test.go, parity_real_corpus_test.go: generated-vs-reference parity infrastructure.

Troubleshooting

Start with doctor. It reports validation warnings, generation failures, table sizes, conflict count, embedded test status, and sample parse status. Add -conflicts N when precedence or GLR behavior needs inspection, or -format json when another tool should consume the report.

Use parse when a grammar generates but the tree shape looks wrong. It prints the root type, byte range, error flag, stop reason, and named-node S-expression. Use -format sexpr or -expect/-write-expect for golden tree snapshots.

For upstream imports, prefer src/grammar.json from a generated tree-sitter repository. If import fails on grammar.js, regenerate or locate the resolved JSON first.

External-scanner grammars need a compatible Go scanner binding in grammars/. The generated grammar can expose external tokens, but scanner behavior is still hand-written runtime code.

When corpus parity fails, narrow before changing generator behavior: one language, one focused test, one sample if possible. Use GTS_GRAMMARGEN_REAL_CORPUS_ONLY, GTS_GRAMMARGEN_REAL_CORPUS_MAX_CASES, and the focused Docker runners to keep the workload reproducible and attributable.

Documentation ¶

Overview ¶

Package grammargen implements a pure-Go grammar generator for gotreesitter. It compiles grammar definitions expressed in a Go DSL into binary blobs that the gotreesitter runtime can load and use for parsing.

Index ¶

func AddConflict(g *Grammar, names ...string)
func AppendChoice(g *Grammar, name string, rule *Rule)
func EmitC(name string, lang *gotreesitter.Language) (string, error)
func EmitGrammarGo(g *Grammar, pkgName, funcName string) ([]byte, error)
func ExportGrammarJSON(g *Grammar) ([]byte, error)
func Generate(g *Grammar) ([]byte, error)
func GenerateC(g *Grammar) (string, error)
func GenerateHighlightQueries(base, extended *Grammar) string
func GenerateHighlightQuery(g *Grammar) string
func GenerateLanguage(g *Grammar) (*gotreesitter.Language, error)
func GenerateLanguageAndBlob(g *Grammar) (*gotreesitter.Language, []byte, error)
func GenerateLanguageAndBlobWithContext(ctx context.Context, g *Grammar) (*gotreesitter.Language, []byte, error)
func GenerateLanguageWithContext(ctx context.Context, g *Grammar) (*gotreesitter.Language, error)
func RunTests(g *Grammar) error
func Validate(g *Grammar) []string
type AliasInfo
type Assoc
type ConflictDiag
- func (d *ConflictDiag) String(ng *NormalizedGrammar) string
type ConflictKind
type FieldAssign
type GenerateReport
- func GenerateWithReport(g *Grammar) (*GenerateReport, error)
type Grammar
- func AliasSuperGrammar() *Grammar
- func CalcGrammar() *Grammar
- func ExtScannerGrammar() *Grammar
- func ExtendGrammar(name string, base *Grammar, customize func(g *Grammar)) *Grammar
- func FortranGrammar() *Grammar
- func GLRGrammar() *Grammar
- func GoGrammar() *Grammar
- func INIGrammar() *Grammar
- func ImportGrammarJS(source []byte) (*Grammar, error)
- func ImportGrammarJSON(data []byte) (*Grammar, error)
- func JSGrammar() *Grammar
- func JSONGrammar() *Grammar
- func JSXGrammar() *Grammar
- func JavaScriptGrammar() *Grammar
- func JavascriptGrammar() *Grammar
- func KeywordGrammar() *Grammar
- func KotlinGrammar() *Grammar
- func LoxGrammar() *Grammar
- func MarkdownGrammar() *Grammar
- func MustacheGrammar() *Grammar
- func NewGrammar(name string) *Grammar
- func ParseGrammarFile(source string) (*Grammar, error)
- func SwiftABIManglingGrammar() *Grammar
- func SwiftGrammar() *Grammar
- func TSGrammar() *Grammar
- func TSXGrammar() *Grammar
- func TsxGrammar() *Grammar
- func TypeScriptGrammar() *Grammar
- func TypescriptGrammar() *Grammar
- func (g *Grammar) Define(name string, rule *Rule)
- func (g *Grammar) SetConflicts(conflicts ...[]string)
- func (g *Grammar) SetExternals(rules ...*Rule)
- func (g *Grammar) SetExtras(rules ...*Rule)
- func (g *Grammar) SetInline(names ...string)
- func (g *Grammar) SetSupertypes(names ...string)
- func (g *Grammar) SetWord(name string)
- func (g *Grammar) Test(name, input, expected string)
- func (g *Grammar) TestError(name, input string)
type GrammarDiff
- func DiffGrammars(old, new *Grammar) *GrammarDiff
- func (d *GrammarDiff) HasChanges() bool
- func (d *GrammarDiff) String() string
type LRTables
type NormalizedGrammar
- func Normalize(g *Grammar) (*NormalizedGrammar, error)
- func (ng *NormalizedGrammar) TokenCount() int
type PrecEntry
type Production
type ReservedWordSet
type Rule
- func Alias(rule *Rule, name string, named bool) *Rule
- func Blank() *Rule
- func Braces(rule *Rule) *Rule
- func Brackets(rule *Rule) *Rule
- func Choice(rules ...*Rule) *Rule
- func CommaSep(rule *Rule) *Rule
- func CommaSep1(rule *Rule) *Rule
- func Field(name string, rule *Rule) *Rule
- func ImmToken(rule *Rule) *Rule
- func Optional(rule *Rule) *Rule
- func Parens(rule *Rule) *Rule
- func Pat(pattern string) *Rule
- func Prec(n int, rule *Rule) *Rule
- func PrecDynamic(n int, rule *Rule) *Rule
- func PrecLeft(n int, rule *Rule) *Rule
- func PrecRight(n int, rule *Rule) *Rule
- func Repeat(rule *Rule) *Rule
- func Repeat1(rule *Rule) *Rule
- func SepBy(sep, rule *Rule) *Rule
- func SepBy1(sep, rule *Rule) *Rule
- func Seq(rules ...*Rule) *Rule
- func Str(s string) *Rule
- func Surround(open, rule, close *Rule) *Rule
- func Sym(name string) *Rule
- func Token(rule *Rule) *Rule
type RuleKind
type SymbolInfo
type SymbolKind
type TerminalPattern
type TestCase

Functions ¶

func AddConflict ¶

func AddConflict(g *Grammar, names ...string)

AddConflict appends a GLR conflict declaration to the grammar.

func AppendChoice ¶

func AppendChoice(g *Grammar, name string, rule *Rule)

AppendChoice appends an alternative to an existing rule, wrapping the prior definition in a Choice if needed.

func EmitC ¶

func EmitC(name string, lang *gotreesitter.Language) (string, error)

EmitC emits a parser.c string from a compiled Language struct.

func EmitGrammarGo ¶

func EmitGrammarGo(g *Grammar, pkgName, funcName string) ([]byte, error)

EmitGrammarGo takes a Grammar IR and emits Go source code that reconstructs it using grammargen DSL calls. The output is a standalone Go file in the given package with a function of the given name that returns *Grammar.

func ExportGrammarJSON ¶

func ExportGrammarJSON(g *Grammar) ([]byte, error)

ExportGrammarJSON serializes a Grammar struct to the tree-sitter grammar.json format. The output is compatible with ImportGrammarJSON: exporting a grammar and passing the resulting bytes to ImportGrammarJSON should produce an equivalent grammar.

The JSON structure matches tree-sitter's canonical resolved grammar.json:

{
  "name": "...",
  "word": "...",
  "rules": { ... },
  "extras": [...],
  "conflicts": [...],
  "externals": [...],
  "inline": [...],
  "supertypes": [...]
}
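
The top-level keys above can be inspected with encoding/json alone, which is handy for checking a grammar.json before importing it. This is a minimal sketch: grammarJSON and ruleNames are hypothetical names, rule bodies are kept as raw JSON, and the sample document is made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// grammarJSON mirrors only the top-level keys listed above.
// Rule bodies stay as raw JSON because their shape is not modeled here.
type grammarJSON struct {
	Name       string                     `json:"name"`
	Word       string                     `json:"word"`
	Rules      map[string]json.RawMessage `json:"rules"`
	Extras     []json.RawMessage          `json:"extras"`
	Conflicts  [][]string                 `json:"conflicts"`
	Externals  []json.RawMessage          `json:"externals"`
	Inline     []string                   `json:"inline"`
	Supertypes []string                   `json:"supertypes"`
}

// ruleNames decodes data and returns the grammar name and sorted rule names.
func ruleNames(data []byte) (string, []string, error) {
	var g grammarJSON
	if err := json.Unmarshal(data, &g); err != nil {
		return "", nil, err
	}
	names := make([]string, 0, len(g.Rules))
	for n := range g.Rules {
		names = append(names, n)
	}
	sort.Strings(names)
	return g.Name, names, nil
}

func main() {
	sample := []byte(`{
  "name": "mini",
  "rules": {
    "program": {"type": "SYMBOL", "name": "number"},
    "number": {"type": "PATTERN", "value": "[0-9]+"}
  },
  "extras": [{"type": "PATTERN", "value": "\\s"}]
}`)
	name, names, err := ruleNames(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, names)
}
```

Decoding rules into a Go map discards the definition order, and the first rule is the start rule. A tool that needs the start rule has to read the object with an order-preserving decoder instead.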

func Generate ¶

func Generate(g *Grammar) ([]byte, error)

Generate compiles a Grammar definition into a binary blob that gotreesitter can load via DecodeLanguageBlob / loadEmbeddedLanguage. LR(1) state splitting is always attempted; a rollback guard reverts to the plain LALR table if splitting does not reduce GLR conflicts.

func GenerateC ¶

func GenerateC(g *Grammar) (string, error)

GenerateC compiles a Grammar to a standard tree-sitter parser.c string. The output targets tree-sitter's C runtime and is limited to the ABI 14/15 features that grammargen currently emits.

func GenerateHighlightQueries ¶

func GenerateHighlightQueries(base, extended *Grammar) string

GenerateHighlightQueries produces tree-sitter highlight queries for rules added by a grammar extension. It diffs base and extended to find new rules, then applies naming conventions to generate appropriate highlights.

Conventions:

New Str() tokens matching identifier pattern -> @keyword
*_declaration with "name" field -> name: (identifier) @type.definition
*_variant with "name" field -> name: (identifier) @constructor
*_block with "description" field -> description: @string
*_expression -> no default highlight (expressions are structural)
*_statement -> no default highlight
Field named "params"/"parameters" -> children (identifier) @variable.parameter
let_declaration name -> @variable.definition
New string tokens that are operators (non-alphanumeric) -> @operator
New string tokens that are keywords (alphanumeric) -> @keyword

func GenerateHighlightQuery ¶

func GenerateHighlightQuery(g *Grammar) string

GenerateHighlightQuery infers a tree-sitter highlight query from grammar structure. It maps well-known rule names and patterns to standard capture names:

comment → @comment
string, string_content → @string
number, integer, float → @number
true, false → @boolean
null, nil, none → @constant.builtin
identifier → @variable
type_identifier → @type
function keywords → @keyword.function
control flow keywords → @keyword.control
other keyword-like string terminals → @keyword
operators → @operator
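
The name-based rows of this table amount to a lookup from rule name to capture. The sketch below shows that lookup and renders one query pattern per matching rule. It is not the package's implementation: captures and query are hypothetical names, and the keyword and operator rows are left out because their classification rules are not spelled out here.

```go
package main

import (
	"fmt"
	"strings"
)

// captures covers the rows of the table that are keyed by rule name.
var captures = map[string]string{
	"comment":         "@comment",
	"string":          "@string",
	"string_content":  "@string",
	"number":          "@number",
	"integer":         "@number",
	"float":           "@number",
	"true":            "@boolean",
	"false":           "@boolean",
	"null":            "@constant.builtin",
	"nil":             "@constant.builtin",
	"none":            "@constant.builtin",
	"identifier":      "@variable",
	"type_identifier": "@type",
}

// query renders one pattern line per known rule, keeping the input order.
// Rules without an entry in captures are skipped.
func query(rules []string) string {
	var b strings.Builder
	for _, r := range rules {
		if c, ok := captures[r]; ok {
			fmt.Fprintf(&b, "(%s) %s\n", r, c)
		}
	}
	return b.String()
}

func main() {
	fmt.Print(query([]string{"comment", "expression", "number", "identifier"}))
}
```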

func GenerateLanguage ¶

func GenerateLanguage(g *Grammar) (*gotreesitter.Language, error)

GenerateLanguage compiles a Grammar into a Language struct without encoding. LR(1) state splitting is always attempted; a rollback guard reverts to the plain LALR table if splitting does not reduce GLR conflicts.

func GenerateLanguageAndBlob ¶ added in v0.10.0

func GenerateLanguageAndBlob(g *Grammar) (*gotreesitter.Language, []byte, error)

GenerateLanguageAndBlob compiles a Grammar into both a Language and its serialized blob representation in a single generation pass.

func GenerateLanguageAndBlobWithContext ¶ added in v0.10.0

func GenerateLanguageAndBlobWithContext(ctx context.Context, g *Grammar) (*gotreesitter.Language, []byte, error)

GenerateLanguageAndBlobWithContext is like GenerateLanguageAndBlob but accepts a context for cancellation.

func GenerateLanguageWithContext ¶

func GenerateLanguageWithContext(ctx context.Context, g *Grammar) (*gotreesitter.Language, error)

GenerateLanguageWithContext is like GenerateLanguage but accepts a context for cancellation. When the context is cancelled, LR table construction and DFA building abort promptly, allowing the caller to reclaim memory that would otherwise be held by an orphaned goroutine.
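
The calling pattern is the standard one for cancellable work: create a context with a deadline, pass it in, and treat a context error as an aborted run. The sketch below shows that pattern with a stand-in function so it runs without the module. buildTables and runWithBudgetMillis are hypothetical; only the cooperative check between units of work is the point.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// buildTables is a stand-in for a long generation pass.
// It checks the context between units of work and stops early when cancelled.
func buildTables(ctx context.Context, steps int) (int, error) {
	done := 0
	for i := 0; i < steps; i++ {
		if err := ctx.Err(); err != nil {
			return done, err
		}
		time.Sleep(time.Millisecond) // pretend work
		done++
	}
	return done, nil
}

// runWithBudgetMillis runs buildTables under a deadline of ms milliseconds.
func runWithBudgetMillis(ms, steps int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel() // always release the timer
	return buildTables(ctx, steps)
}

func main() {
	done, err := runWithBudgetMillis(20, 10000)
	fmt.Println("finished all steps:", done == 10000)
	fmt.Println("deadline exceeded:", errors.Is(err, context.DeadlineExceeded))
}
```

With the real package, the same shape would wrap a call to GenerateLanguageWithContext(ctx, g) in place of buildTables.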

func RunTests ¶

func RunTests(g *Grammar) error

RunTests generates the grammar and runs all embedded test cases. Returns nil if all tests pass, or an error describing failures.

func Validate ¶

func Validate(g *Grammar) []string

Validate checks the grammar for common issues and returns warnings.

Types ¶

type AliasInfo ¶

type AliasInfo struct {
	ChildIndex int
	Name       string
	Named      bool
}

AliasInfo stores alias information for a child position.

type Assoc ¶

type Assoc int

Assoc is the associativity of a production.

const (
	AssocNone Assoc = iota
	AssocLeft
	AssocRight
)

type ConflictDiag ¶

type ConflictDiag struct {
	Kind          ConflictKind
	State         int
	LookaheadSym  int
	Actions       []lrAction // the conflicting actions
	Resolution    string     // how it was resolved (or "GLR" if kept)
	IsMergedState bool       // was this state produced by LALR merging?
	MergeCount    int        // how many merge origins this state has
}

ConflictDiag describes a conflict encountered during LR table construction.

func (*ConflictDiag) String ¶

func (d *ConflictDiag) String(ng *NormalizedGrammar) string

type ConflictKind ¶

type ConflictKind int

ConflictKind describes the type of LR conflict.

const (
	ShiftReduce ConflictKind = iota
	ReduceReduce
)

type FieldAssign ¶

type FieldAssign struct {
	ChildIndex int
	FieldName  string
}

FieldAssign maps a child position in a production to a field name.

type GenerateReport ¶

type GenerateReport struct {
	Language        *gotreesitter.Language
	Blob            []byte
	Conflicts       []ConflictDiag
	SplitCandidates []splitCandidate
	SplitResult     *splitReport
	Warnings        []string
	SymbolCount     int
	StateCount      int
	TokenCount      int
}

GenerateReport holds the result of grammar generation with diagnostics.

func GenerateWithReport ¶

func GenerateWithReport(g *Grammar) (*GenerateReport, error)

GenerateWithReport compiles a grammar and returns a full diagnostic report.

type Grammar ¶

type Grammar struct {
	Name                                       string
	Rules                                      map[string]*Rule
	RuleOrder                                  []string // order rules were defined (first = start rule)
	Extras                                     []*Rule
	Conflicts                                  [][]string
	Externals                                  []*Rule
	Inline                                     []string
	Word                                       string
	ReservedWordSets                           []ReservedWordSet
	Supertypes                                 []string
	Tests                                      []TestCase    // embedded test cases
	EnableLRSplitting                          bool          // opt-in: attempt LR(1) state splitting for merge pathology
	BinaryRepeatMode                           bool          // use tree-sitter's binary repeat helper shape (aux→seq(aux,aux)|inner)
	FlattenGeneratedRepeatAux                  bool          // allow generated repeat helpers to participate in hidden-choice flattening
	ReuseRepeatAuxForParents                   []string      // parent rule names whose repeat helpers may be shared by canonical body
	PreserveKeywordIdentifierConflicts         bool          // keep keyword-as-identifier S/R ambiguity for grammars like Fortran
	ExactPrefixStates                          int           // keep this many LR(1) states exact before merge compaction
	Precedences                                [][]PrecEntry // ordered precedence levels (each level: earlier = higher prec)
	ChoiceLiftThreshold                        int           // if >0, lift inline CHOICE nodes with more alternatives than this into auxiliary nonterminals to prevent production explosion
	SuppressEquivalentExternalReduceLookaheads bool          // suppress external scanner validity for duplicate reduce-only lookaheads
	ExternalReduceFollowLookaheads             []string      // external token names that may be valid after reducing in the current state
	PriorityInlinePatterns                     []string      // anonymous pattern terminals that should win same-length ties against named tokens
}

Grammar is the top-level grammar definition.

func AliasSuperGrammar ¶

func AliasSuperGrammar() *Grammar

AliasSuperGrammar returns a grammar that exercises aliases and supertypes.

Supertypes:

_expression is a supertype with children: number, string, identifier, binary_expression

Aliases:

In assignment, the left-hand side identifier is aliased to "variable"
In binary_expression, the operator string is aliased to "op"

func CalcGrammar ¶

func CalcGrammar() *Grammar

CalcGrammar returns a calculator grammar that exercises precedence and associativity. It defines:

Binary operators: +, -, *, / with standard math precedence
Unary prefix minus: -x (highest precedence)
Parenthesized expressions: (x)
Integer literals: number

func ExtScannerGrammar ¶

func ExtScannerGrammar() *Grammar

ExtScannerGrammar returns a grammar with external scanner tokens. It models a simple block-structured language in which, as in Python, an external scanner produces the INDENT and DEDENT tokens.

program: repeat(statement)
statement: simple_statement | block
simple_statement: identifier ";"
block: identifier ":" NEWLINE INDENT repeat(statement) DEDENT

External tokens: INDENT, DEDENT, NEWLINE

func ExtendGrammar ¶

func ExtendGrammar(name string, base *Grammar, customize func(g *Grammar)) *Grammar

ExtendGrammar creates a new grammar that inherits from a base grammar. The customize function receives the new grammar with all base rules copied in, and can override rules, add new ones, or modify extras/conflicts/etc.

Example:

cpp := ExtendGrammar("cpp", cGrammar(), func(g *Grammar) {
    g.Define("class_declaration", Seq(Str("class"), Sym("identifier"), Sym("class_body")))
    // Override an existing rule:
    g.Define("declaration", Choice(Sym("class_declaration"), Sym("function_declaration")))
})

func FortranGrammar ¶ added in v0.16.0

func FortranGrammar() *Grammar

FortranGrammar returns the fortran grammar. Code generated by EmitGrammarGo. DO NOT EDIT.

func GLRGrammar ¶

func GLRGrammar() *Grammar

GLRGrammar returns a grammar with intentional ambiguity that requires GLR parsing. It models a simplified C-like language where `a * b` can be parsed as either multiplication or a pointer declaration:

expression_statement: a * b ;  (multiplication)
pointer_declaration:  a * b ;  (type * name)

The conflict between _expression and type_name is declared, causing the parser to fork stacks when it encounters the ambiguity.

func GoGrammar ¶

func GoGrammar() *Grammar

GoGrammar returns the go grammar. Code generated by EmitGrammarGo. DO NOT EDIT.

func INIGrammar ¶

func INIGrammar() *Grammar

INIGrammar returns a production-grade INI file grammar.

Parses the superset of major INI dialects (Windows API, Python configparser, Git config, PHP parse_ini_file):

Sections: [name] and [section "subsection"] (Git-style)
Key-value pairs: key = value, key : value, key=value
Comments: ; and # (full-line only)
Quoted string values: "..." with \" and \\ escapes
Global pairs: key=value before any [section]
Empty values: key= (value is optional)

INI is line-oriented: newlines are significant (not extras). Only horizontal whitespace (spaces, tabs) is treated as extras.

func ImportGrammarJS ¶

func ImportGrammarJS(source []byte) (*Grammar, error)

ImportGrammarJS parses a tree-sitter grammar.js file and returns a Grammar IR. This uses gotreesitter's own JavaScript grammar to parse the file, demonstrating the full-circle capability: gotreesitter parsing its own input format.

func ImportGrammarJSON ¶

func ImportGrammarJSON(data []byte) (*Grammar, error)

ImportGrammarJSON parses a tree-sitter grammar.json file (the canonical resolved form generated by `tree-sitter generate`) and returns a Grammar IR. This is more reliable than ImportGrammarJS because grammar.json has no require() calls, helper functions, or other JavaScript-specific constructs.

func JSGrammar ¶ added in v0.16.0

func JSGrammar() *Grammar

JSGrammar returns the JSX-capable JavaScript grammar.

func JSONGrammar ¶

func JSONGrammar() *Grammar

JSONGrammar returns the JSON grammar defined using the Go DSL. This mirrors tree-sitter-json's grammar.js definition.

func JSXGrammar ¶ added in v0.16.0

func JSXGrammar() *Grammar

JSXGrammar returns the JSX-capable JavaScript grammar. The upstream lockfile does not carry a separate jsx language; JSX is parsed by the JavaScript grammar.

func JavaScriptGrammar ¶ added in v0.16.0

func JavaScriptGrammar() *Grammar

JavaScriptGrammar returns the javascript grammar. Code generated by EmitGrammarGo. DO NOT EDIT.

func JavascriptGrammar ¶ added in v0.16.0

func JavascriptGrammar() *Grammar

JavascriptGrammar is kept for consistency with grammars.JavascriptLanguage.

func KeywordGrammar ¶

func KeywordGrammar() *Grammar

KeywordGrammar returns a simplified language grammar that exercises keyword extraction and the word token mechanism. Keywords "var" and "return" match the identifier pattern but are promoted to their own symbols by the keyword DFA.
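
The promotion step can be sketched as "match the whole word first, then look it up". The code below illustrates that idea only; it is not the keyword DFA grammargen generates. The identifier pattern `[A-Za-z_][A-Za-z0-9_]*` and the names keywords, isWordByte, and lexWord are assumptions made for the example.

```go
package main

import "fmt"

// keywords holds the two literals KeywordGrammar is described as promoting.
var keywords = map[string]bool{"var": true, "return": true}

// isWordByte reports whether c may appear in a word.
// Digits are allowed everywhere except the first position.
func isWordByte(c byte, first bool) bool {
	switch {
	case c == '_', c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z':
		return true
	case c >= '0' && c <= '9':
		return !first
	}
	return false
}

// lexWord scans one word at the start of src and returns its kind and text.
// A word that equals a keyword is promoted to that keyword's own kind.
func lexWord(src string) (string, string) {
	n := 0
	for n < len(src) && isWordByte(src[n], n == 0) {
		n++
	}
	if n == 0 {
		return "none", ""
	}
	text := src[:n]
	if keywords[text] {
		return text, text
	}
	return "identifier", text
}

func main() {
	for _, s := range []string{"var x", "variable", "return;"} {
		kind, text := lexWord(s)
		fmt.Printf("%q -> kind=%s text=%s\n", s, kind, text)
	}
}
```

Matching the full word before the lookup is what keeps `variable` a single identifier instead of the keyword `var` followed by `iable`.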

func KotlinGrammar ¶ added in v0.16.0

func KotlinGrammar() *Grammar

KotlinGrammar returns the kotlin grammar. Code generated by EmitGrammarGo. DO NOT EDIT. Source: fwcd/tree-sitter-kotlin@57170e50a32b29122b9e41a4a24aea8be1a16599/src/grammar.json.

func LoxGrammar ¶

func LoxGrammar() *Grammar

LoxGrammar returns a production-grade Lox grammar (Crafting Interpreters spec).

Implements the full Lox language:

Variables: var x = expr;
Functions: fun name(params) { body }
Classes: class Name < Super { methods }
Control flow: if/else, while, for
Operators: or, and, ==, !=, <, >, <=, >=, +, -, *, /, !, unary -
Calls and property access: f(args), obj.prop, obj.prop = val
Literals: numbers, strings, true, false, nil, this, super
Print: print expr;
Return: return expr;
Comments: // line comments
Block scoping: { statements }

func MarkdownGrammar ¶ added in v0.20.0

func MarkdownGrammar() *Grammar

MarkdownGrammar returns the Go-DSL definition of the CommonMark + GFM Markdown grammar. Equivalent in shape to the upstream tree-sitter-markdown grammar.json but owned in Go so it can be refactored, extended via ExtendGrammar, and compiled directly with GenerateLanguage(AndBlob).

The external scanner is NOT attached here. Callers must follow GenerateLanguage with `grammars.AdaptScannerForLanguage("markdown", lang)` to attach the hand-written external scanner that owns the 47 block/inline external tokens.

func MustacheGrammar ¶

func MustacheGrammar() *Grammar

MustacheGrammar returns a production-grade Mustache template grammar.

Implements the required Mustache spec features:

Interpolation: {{ name }}
Unescaped interpolation: {{{ name }}} and {{& name }}
Sections: {{# name }} ... {{/ name }}
Inverted sections: {{^ name }} ... {{/ name }}
Comments: {{! comment text }}
Partials: {{> partial_name }}
Dotted names: {{ person.name }}
Implicit iterator: {{ . }}
Raw text between tags

The grammar treats {{ and }} as delimiters. Text outside tags is raw content. The DFA handles {{{ vs {{ disambiguation via maximal munch.
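
Maximal munch means that when several tokens match at the same position, the longest one wins. The sketch below shows the rule on Mustache-style openers. It is a plain prefix scan, not the generated DFA, and the opener list and the name longestOpener are assumptions for illustration rather than the grammar's actual token set.

```go
package main

import (
	"fmt"
	"strings"
)

// openers lists candidate tag openers in no particular order.
var openers = []string{"{{", "{{{", "{{#", "{{^", "{{/", "{{!", "{{>", "{{&"}

// longestOpener returns the longest opener that prefixes src, or "" if none.
func longestOpener(src string) string {
	best := ""
	for _, o := range openers {
		if strings.HasPrefix(src, o) && len(o) > len(best) {
			best = o
		}
	}
	return best
}

func main() {
	for _, s := range []string{"{{{ name }}}", "{{ name }}", "{{# items }}", "plain text"} {
		fmt.Printf("%q -> %q\n", s, longestOpener(s))
	}
}
```

Because length decides the winner, the order of the candidate list does not matter, which is the property that lets `{{{` and `{{` coexist.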

func NewGrammar ¶

func NewGrammar(name string) *Grammar

NewGrammar creates a new grammar with the given name.

func ParseGrammarFile ¶

func ParseGrammarFile(source string) (*Grammar, error)

ParseGrammarFile parses a declarative .grammar file into a Grammar IR.

Syntax:

grammar <name>

extras = [ /\s/ ]
word = <rule_name>
supertypes = [ <rule_name>, ... ]
conflicts = [ [<rule>, <rule>], ... ]

rule <name> = <expr>

Expressions:

"string"         string literal
/pattern/        regex pattern
<name>           symbol reference
seq(a, b, ...)   sequence
choice(a, b, ..) alternation
repeat(a)        zero or more
repeat1(a)       one or more
optional(a)      optional
token(a)         token boundary
field("name", a) field annotation
prec(n, a)       precedence
prec.left(n, a)  left-associative precedence
prec.right(n, a) right-associative precedence

func SwiftABIManglingGrammar ¶ added in v0.15.2

func SwiftABIManglingGrammar() *Grammar

SwiftABIManglingGrammar returns a conservative grammar for Swift ABI mangled names. It intentionally models ABI symbol text, not Swift source syntax.

func SwiftGrammar ¶ added in v0.16.0

func SwiftGrammar() *Grammar

SwiftGrammar returns the swift grammar. Code generated by EmitGrammarGo. DO NOT EDIT.

func TSGrammar ¶ added in v0.16.0

func TSGrammar() *Grammar

TSGrammar returns the TypeScript grammar.

func TSXGrammar ¶ added in v0.16.0

func TSXGrammar() *Grammar

TSXGrammar returns the tsx grammar. Code generated by EmitGrammarGo. DO NOT EDIT.

func TsxGrammar ¶ added in v0.16.0

func TsxGrammar() *Grammar

TsxGrammar is kept for consistency with grammars.TsxLanguage.

func TypeScriptGrammar ¶ added in v0.16.0

func TypeScriptGrammar() *Grammar

TypeScriptGrammar returns the typescript grammar. Code generated by EmitGrammarGo. DO NOT EDIT.

func TypescriptGrammar ¶ added in v0.16.0

func TypescriptGrammar() *Grammar

TypescriptGrammar is kept for consistency with grammars.TypescriptLanguage.

func (*Grammar) Define ¶

func (g *Grammar) Define(name string, rule *Rule)

Define adds a rule to the grammar. The first rule defined is the start rule.

func (*Grammar) SetConflicts ¶

func (g *Grammar) SetConflicts(conflicts ...[]string)

SetConflicts declares grammar conflicts for GLR.

func (*Grammar) SetExternals ¶

func (g *Grammar) SetExternals(rules ...*Rule)

SetExternals declares external scanner tokens.

func (*Grammar) SetExtras ¶

func (g *Grammar) SetExtras(rules ...*Rule)

SetExtras sets the extra rules (e.g. whitespace, comments).

func (*Grammar) SetInline ¶

func (g *Grammar) SetInline(names ...string)

SetInline marks rules to be inlined.

func (*Grammar) SetSupertypes ¶

func (g *Grammar) SetSupertypes(names ...string)

SetSupertypes declares supertype rules.

func (*Grammar) SetWord ¶

func (g *Grammar) SetWord(name string)

SetWord sets the word token for keyword extraction.
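The idea behind keyword extraction can be sketched without the package: match the whole word token first, then look the matched text up in a keyword set. The snippet below is a minimal illustration under that assumption. `wordPattern`, `classify`, and the keyword set are hypothetical names, not part of grammargen, and the package's generated lexer may work differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordPattern is a hypothetical word token; a real grammar supplies its own.
var wordPattern = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*`)

// classify matches the whole word first, then checks it against the keyword
// set, so "iffy" is never split into the keyword "if" followed by "fy".
func classify(input string, keywords map[string]bool) (kind, text string) {
	text = wordPattern.FindString(input)
	switch {
	case text == "":
		return "", ""
	case keywords[text]:
		return "keyword", text
	default:
		return "identifier", text
	}
}

func main() {
	keywords := map[string]bool{"if": true, "else": true}
	for _, src := range []string{"if x", "iffy"} {
		kind, text := classify(src, keywords)
		fmt.Printf("%q -> %s %q\n", src, kind, text)
	}
}
```

Matching the word before consulting the keyword set is what keeps identifiers that merely start with a keyword from being misread.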

func (*Grammar) Test ¶

func (g *Grammar) Test(name, input, expected string)

Test adds an embedded test case. Input is parsed and the resulting tree is compared against the expected S-expression. If expected is empty, the test only checks that no ERROR nodes appear.

func (*Grammar) TestError ¶

func (g *Grammar) TestError(name, input string)

TestError adds an embedded test case that expects parse errors.

type GrammarDiff ¶

type GrammarDiff struct {
	AddedRules        []string
	RemovedRules      []string
	ModifiedRules     []string // rules present in both but with different definitions
	ExtrasChanged     bool
	ConflictsChanged  bool
	ExternalsChanged  bool
	WordChanged       bool
	SupertypesChanged bool
}

GrammarDiff describes the differences between two grammar versions.

func DiffGrammars ¶

func DiffGrammars(old, new *Grammar) *GrammarDiff

DiffGrammars compares two grammar versions and returns a diff.

func (*GrammarDiff) HasChanges ¶

func (d *GrammarDiff) HasChanges() bool

HasChanges returns true if any differences were found.
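As a rough illustration of what the rule-level part of such a diff involves, the sketch below compares two rule sets and fills added, removed, and modified lists. It assumes rule definitions can be compared as strings, and `ruleDiff`, `diffRules`, and `hasChanges` are local stand-ins rather than the package's own code.

```go
package main

import (
	"fmt"
	"sort"
)

// ruleDiff is a local stand-in for the rule-level fields of GrammarDiff.
type ruleDiff struct {
	Added, Removed, Modified []string
}

// diffRules compares rule name -> definition maps. Representing a definition
// as a string is an assumption made for this sketch.
func diffRules(prev, next map[string]string) ruleDiff {
	var d ruleDiff
	for name, def := range next {
		old, ok := prev[name]
		switch {
		case !ok:
			d.Added = append(d.Added, name)
		case old != def:
			d.Modified = append(d.Modified, name)
		}
	}
	for name := range prev {
		if _, ok := next[name]; !ok {
			d.Removed = append(d.Removed, name)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.Added)
	sort.Strings(d.Removed)
	sort.Strings(d.Modified)
	return d
}

// hasChanges reports whether any rule was added, removed, or modified.
func (d ruleDiff) hasChanges() bool {
	return len(d.Added)+len(d.Removed)+len(d.Modified) > 0
}

func main() {
	prev := map[string]string{"expr": "choice(num, sum)", "num": "/[0-9]+/"}
	next := map[string]string{"expr": "choice(num, sum, product)", "product": "seq(expr, \"*\", expr)"}
	d := diffRules(prev, next)
	fmt.Println(d.Added, d.Removed, d.Modified, d.hasChanges())
}
```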

func (*GrammarDiff) String ¶

func (d *GrammarDiff) String() string

String returns a human-readable summary of the diff.

type LRTables ¶

type LRTables struct {
	// ActionTable[state][symbol] = list of actions (multiple = conflict/GLR)
	ActionTable          map[int]map[int][]lrAction
	GotoTable            map[int]map[int]int // [state][nonterminal] → target state
	StateCount           int
	ExtraChainStateStart int // first synthetic nonterminal-extra state, or -1 if none
}

LRTables holds the generated parse tables.
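The ActionTable comment says that a cell holding more than one action represents a conflict (or a GLR fork). A small sketch of scanning such a nested map for those cells follows. Because lrAction is unexported, `action` here is a hypothetical stand-in type and `conflictCells` is an illustrative helper, not a package function.

```go
package main

import (
	"fmt"
	"sort"
)

// action is a hypothetical stand-in for the unexported lrAction type.
type action struct {
	Kind   string // e.g. "shift" or "reduce"
	Target int
}

// conflictCells returns the (state, symbol) pairs whose cell holds more than
// one action, sorted by state and then by symbol.
func conflictCells(table map[int]map[int][]action) [][2]int {
	var cells [][2]int
	for state, row := range table {
		for sym, acts := range row {
			if len(acts) > 1 {
				cells = append(cells, [2]int{state, sym})
			}
		}
	}
	sort.Slice(cells, func(i, j int) bool {
		if cells[i][0] != cells[j][0] {
			return cells[i][0] < cells[j][0]
		}
		return cells[i][1] < cells[j][1]
	})
	return cells
}

func main() {
	table := map[int]map[int][]action{
		0: {1: {{Kind: "shift", Target: 2}}},
		2: {3: {{Kind: "shift", Target: 4}, {Kind: "reduce", Target: 1}}},
	}
	fmt.Println(conflictCells(table)) // cells with more than one action
}
```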

type NormalizedGrammar ¶

type NormalizedGrammar struct {
	Symbols       []SymbolInfo
	Productions   []Production
	Terminals     []TerminalPattern
	ExtraSymbols  []int    // symbol indices of extras
	FieldNames    []string // index 0 is always ""
	Conflicts     [][]int  // symbol index groups
	Supertypes    []int    // symbol indices
	StartSymbol   int
	AugmentProdID int // production index for S' → S

	// Keyword support (populated when Grammar.Word is set).
	KeywordSymbols []int             // symbol IDs that are keywords
	WordSymbolID   int               // word token symbol ID (e.g., identifier)
	KeywordEntries []TerminalPattern // keyword patterns for keyword DFA
	// ReservedWordSets stores token symbol IDs for each imported reserved word
	// set. The first set is the global set from grammar.json. Current
	// generation derives per-state subsets from that global set.
	ReservedWordSets [][]int

	// External scanner support (populated when Grammar.Externals is set).
	ExternalSymbols []int // external token index → symbol ID

	ExactPrefixStates int

	// PrecedenceOrder stores the symbol-level precedence ordering from the
	// grammar's precedences table. Maps a rule name to its numeric position
	// (higher = higher priority) and whether it's a SYMBOL or STRING entry.
	// Used during conflict resolution to compare a reduce production's LHS
	// against the named precedence of a competing shift action.
	PrecedenceOrder *precOrderTable

	PreserveKeywordIdentifierConflicts         bool
	SuppressEquivalentExternalReduceLookaheads bool
	ExternalReduceFollowLookaheads             map[string]bool
	// contains filtered or unexported fields
}

NormalizedGrammar is the output of the normalize step.

func Normalize ¶

func Normalize(g *Grammar) (*NormalizedGrammar, error)

Normalize transforms a Grammar into a NormalizedGrammar.

func (*NormalizedGrammar) TokenCount ¶

func (ng *NormalizedGrammar) TokenCount() int

TokenCount returns the number of terminal symbols, including symbol 0 (the end symbol).

type PrecEntry ¶ added in v0.10.0

type PrecEntry struct {
	IsSymbol bool   // true for SYMBOL entries, false for STRING entries
	Name     string // prec name or rule name
}

PrecEntry is an entry in a precedences level. It is either a named precedence (STRING type, Name is the prec name) or a rule reference (SYMBOL type, Name is the rule name).

type Production ¶

type Production struct {
	LHS  int   // symbol index
	RHS  []int // symbol indices
	Prec int
	// HasExplicitPrec distinguishes an explicit compile-time precedence wrapper
	// (including prec(0, ...)) from the default implicit zero precedence.
	HasExplicitPrec bool
	Assoc           Assoc
	DynPrec         int
	ProductionID    int
	Fields          []FieldAssign // per-RHS-position field assignments
	Aliases         []AliasInfo   // per-RHS-position alias info
	IsExtra         bool          // true if this production belongs to a nonterminal extra
}

Production is a single LHS → RHS production with metadata.

type ReservedWordSet ¶ added in v0.10.2

type ReservedWordSet struct {
	Name  string
	Rules []*Rule
}

ReservedWordSet is an ordered named set of reserved word token rules. The first set is the global set from grammar.json's top-level `reserved` object. Additional sets are preserved for future context-specific support.

type Rule ¶

type Rule struct {
	Kind     RuleKind
	Value    string  // literal/pattern/symbol/field name
	Children []*Rule // sub-rules
	Prec     int     // precedence value
	Named    bool    // for alias: whether the alias is a named node
}

Rule is a node in the grammar rule tree.

func Alias ¶

func Alias(rule *Rule, name string, named bool) *Rule

Alias gives a rule a different name. The named argument controls whether the alias is a named node.

func Blank ¶

func Blank() *Rule

Blank creates an epsilon (empty) rule.

func Braces ¶

func Braces(rule *Rule) *Rule

Braces wraps a rule in curly braces.

func Brackets ¶

func Brackets(rule *Rule) *Rule

Brackets wraps a rule in square brackets.

func Choice ¶

func Choice(rules ...*Rule) *Rule

Choice creates an alternation of rules.

func CommaSep ¶

func CommaSep(rule *Rule) *Rule

CommaSep creates an optional comma-separated list.

func CommaSep1 ¶

func CommaSep1(rule *Rule) *Rule

CommaSep1 creates a non-empty comma-separated list.

func Field ¶

func Field(name string, rule *Rule) *Rule

Field annotates a rule with a field name.

func ImmToken ¶

func ImmToken(rule *Rule) *Rule

ImmToken creates an immediate token (no preceding whitespace).

func Optional ¶

func Optional(rule *Rule) *Rule

Optional creates an optional rule.

func Parens ¶

func Parens(rule *Rule) *Rule

Parens wraps a rule in parentheses.

func Pat ¶

func Pat(pattern string) *Rule

Pat creates a regex pattern rule.

func Prec ¶

func Prec(n int, rule *Rule) *Rule

Prec sets precedence on a rule.

func PrecDynamic ¶

func PrecDynamic(n int, rule *Rule) *Rule

PrecDynamic sets dynamic precedence on a rule.

func PrecLeft ¶

func PrecLeft(n int, rule *Rule) *Rule

PrecLeft sets left-associative precedence on a rule.

func PrecRight ¶

func PrecRight(n int, rule *Rule) *Rule

PrecRight sets right-associative precedence on a rule.

func Repeat ¶

func Repeat(rule *Rule) *Rule

Repeat creates a zero-or-more repetition.

func Repeat1 ¶

func Repeat1(rule *Rule) *Rule

Repeat1 creates a one-or-more repetition.

func SepBy ¶

func SepBy(sep, rule *Rule) *Rule

SepBy creates an optional list separated by the given separator.

func SepBy1 ¶

func SepBy1(sep, rule *Rule) *Rule

SepBy1 creates a non-empty list separated by the given separator.
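Separated-list helpers like these are conventionally written in tree-sitter grammars as a sequence plus a repetition, with the possibly-empty variant wrapping the non-empty one in an optional. The sketch below prints that conventional expansion as expression text. It is an assumption that SepBy and SepBy1 build exactly this shape, and the lowercase helpers are local string builders, not the package constructors.

```go
package main

import (
	"fmt"
	"strings"
)

// The helpers below build expression text, not grammargen rule trees.
func seq(parts ...string) string { return "seq(" + strings.Join(parts, ", ") + ")" }
func repeat(a string) string     { return "repeat(" + a + ")" }
func optional(a string) string   { return "optional(" + a + ")" }

// sepBy1 expands to one item followed by zero or more (separator, item) pairs.
func sepBy1(sep, rule string) string {
	return seq(rule, repeat(seq(sep, rule)))
}

// sepBy makes the whole non-empty list optional, so an empty list also matches.
func sepBy(sep, rule string) string {
	return optional(sepBy1(sep, rule))
}

func main() {
	fmt.Println(sepBy1(`","`, "$.item"))
	fmt.Println(sepBy(`","`, "$.item"))
}
```

Under the same assumption, CommaSep and CommaSep1 would be the special case where the separator is the literal ",".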

func Seq ¶

func Seq(rules ...*Rule) *Rule

Seq creates a sequence of rules.

func Str ¶

func Str(s string) *Rule

Str creates a string literal rule.

func Surround ¶

func Surround(open, rule, close *Rule) *Rule

Surround wraps a rule with open and close delimiters.

func Sym ¶

func Sym(name string) *Rule

Sym creates a symbol reference rule.

func Token ¶

func Token(rule *Rule) *Rule

Token creates a token boundary (content is a single lexer token).

type RuleKind ¶

type RuleKind int

RuleKind identifies the type of a grammar rule node.

const (
	RuleString      RuleKind = iota // literal string: "{"
	RulePattern                     // regex pattern: /[0-9]+/
	RuleSymbol                      // symbol reference: $.object
	RuleSeq                         // sequence: seq(a, b, c)
	RuleChoice                      // alternation: choice(a, b)
	RuleRepeat                      // zero-or-more: repeat(a)
	RuleRepeat1                     // one-or-more: repeat1(a)
	RuleOptional                    // optional: optional(a)
	RuleToken                       // token boundary: token(a)
	RuleImmToken                    // immediate token: token.immediate(a)
	RuleField                       // field annotation: field("name", a)
	RulePrec                        // precedence: prec(n, a)
	RulePrecLeft                    // left-associative: prec.left(n, a)
	RulePrecRight                   // right-associative: prec.right(n, a)
	RulePrecDynamic                 // dynamic precedence: prec.dynamic(n, a)
	RuleBlank                       // epsilon / empty
	RuleAlias                       // alias: alias(a, "name")
)
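To make the correspondence between rule kinds and the notation in the comments above concrete, here is a small sketch that renders a rule tree back into that notation. `Kind` and `Node` are local stand-ins that mirror only a subset of RuleKind and Rule, and `Render` is a hypothetical helper, not a package function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Kind and Node mirror a subset of RuleKind and Rule for illustration only.
type Kind int

const (
	KString Kind = iota
	KPattern
	KSymbol
	KSeq
	KChoice
	KRepeat
	KOptional
	KField
)

type Node struct {
	Kind     Kind
	Value    string  // literal, pattern, symbol, or field name
	Children []*Node // sub-rules
}

// Render prints a node in the notation used by the RuleKind comments.
func Render(n *Node) string {
	switch n.Kind {
	case KString:
		return strconv.Quote(n.Value)
	case KPattern:
		return "/" + n.Value + "/"
	case KSymbol:
		return "$." + n.Value
	case KField:
		return "field(" + strconv.Quote(n.Value) + ", " + Render(n.Children[0]) + ")"
	}
	// The remaining kinds are combinators that wrap their children.
	names := map[Kind]string{KSeq: "seq", KChoice: "choice", KRepeat: "repeat", KOptional: "optional"}
	parts := make([]string, len(n.Children))
	for i, c := range n.Children {
		parts[i] = Render(c)
	}
	return names[n.Kind] + "(" + strings.Join(parts, ", ") + ")"
}

func main() {
	rule := &Node{Kind: KSeq, Children: []*Node{
		{Kind: KField, Value: "left", Children: []*Node{{Kind: KSymbol, Value: "expr"}}},
		{Kind: KString, Value: "+"},
		{Kind: KRepeat, Children: []*Node{{Kind: KPattern, Value: "[0-9]+"}}},
	}}
	fmt.Println(Render(rule))
}
```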

type SymbolInfo ¶

type SymbolInfo struct {
	Name      string
	Visible   bool
	Named     bool
	Supertype bool
	Kind      SymbolKind
	IsExtra   bool
	Immediate bool // token.immediate — no preceding whitespace skip
}

SymbolInfo describes a grammar symbol.

type SymbolKind ¶

type SymbolKind int

SymbolKind classifies a grammar symbol.

const (
	SymbolTerminal    SymbolKind = iota // anonymous terminal like "{"
	SymbolNamedToken                    // named terminal like number, string_content
	SymbolExternal                      // external scanner token
	SymbolNonterminal                   // nonterminal rule
)

type TerminalPattern ¶

type TerminalPattern struct {
	SymbolID  int
	Rule      *Rule // the flattened rule tree for NFA construction
	Priority  int   // lower = higher priority (wins on tie)
	Immediate bool  // token.immediate
}

TerminalPattern describes a terminal symbol's match pattern for DFA generation.
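The Priority comment says a lower value wins on a tie. One plausible reading, sketched below, is the usual lexer rule: prefer the longest match, and break equal-length ties by the lowest Priority. The longest-match step is an assumption about how ties arise, and `match` and `pick` are hypothetical names, not package API.

```go
package main

import "fmt"

// match is a hypothetical record of one terminal matching at a position.
type match struct {
	SymbolID int
	Length   int // number of bytes matched
	Priority int // lower = higher priority
}

// pick prefers the longest match and breaks equal-length ties by choosing
// the lowest Priority value. It returns false when there are no candidates.
func pick(candidates []match) (match, bool) {
	if len(candidates) == 0 {
		return match{}, false
	}
	best := candidates[0]
	for _, m := range candidates[1:] {
		if m.Length > best.Length || (m.Length == best.Length && m.Priority < best.Priority) {
			best = m
		}
	}
	return best, true
}

func main() {
	candidates := []match{
		{SymbolID: 1, Length: 2, Priority: 5},
		{SymbolID: 2, Length: 2, Priority: 1},
		{SymbolID: 3, Length: 1, Priority: 0},
	}
	best, ok := pick(candidates)
	fmt.Println(best.SymbolID, ok)
}
```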

type TestCase ¶

type TestCase struct {
	Name        string // test name
	Input       string // input to parse
	Expected    string // expected S-expression (empty = just check no errors)
	ExpectError bool   // if true, expect ERROR nodes in the tree
}

TestCase is an embedded grammar test case.
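The three outcomes described by these fields (expect an error, expect only an error-free tree, or expect a specific S-expression) can be sketched as a single check. This is an illustration of the documented semantics, not the package's runner. It assumes error nodes show up as "(ERROR" in the rendered tree and that whitespace differences are ignored. `testCase`, `passes`, and `normalize` are local names.

```go
package main

import (
	"fmt"
	"strings"
)

// testCase is a local copy of the documented TestCase fields.
type testCase struct {
	Name        string
	Input       string
	Expected    string
	ExpectError bool
}

// normalize collapses all runs of whitespace to single spaces.
func normalize(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// passes evaluates a rendered parse tree against one test case.
func passes(tc testCase, tree string) bool {
	hasError := strings.Contains(tree, "(ERROR") // assumed rendering of error nodes
	switch {
	case tc.ExpectError:
		return hasError
	case tc.Expected == "":
		return !hasError
	default:
		return normalize(tree) == normalize(tc.Expected)
	}
}

func main() {
	tc := testCase{Name: "sum", Input: "1+2", Expected: "(expr\n  (number)\n  (number))"}
	fmt.Println(passes(tc, "(expr (number) (number))"))
}
```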

`.grammar` Files