Published: Feb 24, 2026 License: MIT Imports: 13 Imported by: 0


gotreesitter

Pure-Go tree-sitter runtime — no CGo, no C toolchain, WASM-ready.

go get github.com/odvcencio/gotreesitter

Implements the same parse-table format tree-sitter uses, so existing grammars work without recompilation. Outperforms the CGo binding on every workload — incremental edits (the dominant operation in editors and language servers) are 90x faster than the C implementation.

Why Not CGo?

Every existing Go tree-sitter binding requires CGo. That means:

  • Cross-compilation breaks (GOOS=wasip1, GOARCH=arm64 from Linux, Windows without MSYS2)
  • CI pipelines need a C toolchain in every build image
  • go install fails for end users without gcc
  • Race detector, fuzzing, and coverage tools work poorly across the CGo boundary

gotreesitter is pure Go. go get and build — on any target, any platform.

Quick Start

import (
    "fmt"

    "github.com/odvcencio/gotreesitter"
    "github.com/odvcencio/gotreesitter/grammars"
)

func main() {
    src := []byte(`package main

func main() {}
`)

    lang := grammars.GoLanguage()
    parser := gotreesitter.NewParser(lang)

    tree, err := parser.Parse(src)
    if err != nil {
        panic(err)
    }
    fmt.Println(tree.RootNode())

    // After editing source, reparse incrementally:
    //   tree.Edit(edit)
    //   tree2, err := parser.ParseIncremental(newSrc, tree)
}

Queries

Tree-sitter's S-expression query language is supported, including predicates and cursor-based streaming. See Known Limitations for current caveats.

q, _ := gotreesitter.NewQuery(`(function_declaration name: (identifier) @fn)`, lang)
cursor := q.Exec(tree.RootNode(), lang, src)

for {
    match, ok := cursor.NextMatch()
    if !ok {
        break
    }
    for _, cap := range match.Captures {
        fmt.Println(cap.Node.Text(src))
    }
}

Incremental Editing

After the initial parse, re-parse only the changed region — unchanged subtrees are reused automatically.

// Initial parse
tree, _ := parser.Parse(src)

// User types "x" at byte offset 42
src = append(src[:42], append([]byte("x"), src[42:]...)...)

tree.Edit(gotreesitter.InputEdit{
    StartByte:   42,
    OldEndByte:  42,
    NewEndByte:  43,
    StartPoint:  gotreesitter.Point{Row: 3, Column: 10},
    OldEndPoint: gotreesitter.Point{Row: 3, Column: 10},
    NewEndPoint: gotreesitter.Point{Row: 3, Column: 11},
})

// Incremental reparse — ~1.38 μs vs 124 μs for the CGo binding (90x faster)
tree2, _ := parser.ParseIncremental(src, tree)

Tip: Use grammars.DetectLanguage("main.go") to pick the right grammar by filename — useful for editor integration.
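The Point fields of an InputEdit must be computed by the caller. A minimal helper for doing that by scanning the source — a hypothetical sketch, not part of the gotreesitter API; the Point type is redefined locally so it runs standalone:

```go
package main

import "fmt"

// Point mirrors gotreesitter.Point (row/column), redefined here so the
// sketch is self-contained.
type Point struct{ Row, Column uint32 }

// pointAt computes the 0-based row/column of a byte offset by counting
// newlines up to that offset. Hypothetical helper, not a package API.
func pointAt(src []byte, offset uint32) Point {
	var p Point
	for i := uint32(0); i < offset && int(i) < len(src); i++ {
		if src[i] == '\n' {
			p.Row++
			p.Column = 0
		} else {
			p.Column++
		}
	}
	return p
}

func main() {
	src := []byte("package main\n\nfunc main() {}\n")
	fmt.Println(pointAt(src, 14)) // byte 14 is the 'f' of "func" → {2 0}
}
```

With this, StartPoint is pointAt(src, edit.StartByte) against the old source, OldEndPoint is pointAt against the old source at OldEndByte, and NewEndPoint is pointAt against the new source at NewEndByte.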

Syntax Highlighting

hl, _ := gotreesitter.NewHighlighter(lang, highlightQuery)
ranges := hl.Highlight(src)

for _, r := range ranges {
    fmt.Printf("%s: %q\n", r.Capture, src[r.StartByte:r.EndByte])
}

Note: Text predicates (#eq?, #match?, #any-of?, #not-eq?) require source []byte to evaluate. Passing nil disables predicate checks.

Symbol Tagging

Extract definitions and references from source code:

entry := grammars.DetectLanguage("main.go")
lang := entry.Language()

tagger, _ := gotreesitter.NewTagger(lang, entry.TagsQuery)
tags := tagger.Tag(src)

for _, tag := range tags {
    fmt.Printf("%s %s at %d:%d\n", tag.Kind, tag.Name,
        tag.NameRange.StartPoint.Row, tag.NameRange.StartPoint.Column)
}

Parse Quality

Each LangEntry exposes a Quality field indicating how trustworthy the parse output is:

Quality   Meaning
full      Token source or DFA with external scanner — full fidelity
partial   DFA-partial — missing external scanner, tree may have silent gaps
none      Cannot parse

entries := grammars.AllLanguages()
for _, e := range entries {
    fmt.Printf("%s: %s\n", e.Name, e.Quality)
}

Benchmarks

Measured against go-tree-sitter (the standard CGo binding), parsing a Go source file with 500 function definitions.

goos: linux / goarch: amd64 / cpu: Intel(R) Core(TM) Ultra 9 285

# pure-Go parser benchmarks (root module)
go test -run '^$' -bench 'BenchmarkGoParse' -benchmem -count=3

# C baseline benchmarks (cgo_harness module)
cd cgo_harness
go test . -run '^$' -tags treesitter_c_bench -bench 'BenchmarkCTreeSitterGoParse' -benchmem -count=3

Benchmark                                                 ns/op    B/op  allocs/op
BenchmarkCTreeSitterGoParseFull                       2,058,000     600          6
BenchmarkCTreeSitterGoParseIncrementalSingleByteEdit    124,100     648          7
BenchmarkCTreeSitterGoParseIncrementalNoEdit            121,100     600          6
BenchmarkGoParseFull                                  1,330,000  10,842      2,495
BenchmarkGoParseIncrementalSingleByteEdit                 1,381     361          9
BenchmarkGoParseIncrementalNoEdit                          8.63       0          0

Summary:

Workload                        gotreesitter  CGo binding  Ratio
Full parse                      1,330 μs      2,058 μs     ~1.5x faster
Incremental (single-byte edit)  1.38 μs       124 μs       ~90x faster
Incremental (no-op reparse)     8.6 ns        121 μs       ~14,000x faster

The incremental hot path reuses subtrees aggressively — a single-byte edit reparses in microseconds while the CGo binding pays full C-runtime and call overhead. The no-edit fast path exits on a single nil-check: zero allocations, single-digit nanoseconds.


Supported Languages

205 grammars ship in the registry. Run go run ./cmd/parity_report for live per-language status.

Current summary:

  • 204 clean — parse without errors
  • 1 degraded — norg (requires an external scanner with 122 tokens, not yet implemented)
  • 0 unsupported

Quality breakdown:

  • 116 full — token source or DFA with complete external scanner
  • 89 partial — DFA-partial (missing external scanner, tree may have silent gaps)

Backend breakdown:

  • 92 dfa — lexer fully generated from grammar tables
  • 89 dfa-partial — generated DFA without external scanner
  • 24 token_source — hand-written or generic pure-Go lexer bridge

12 languages have hand-written Go external scanners: python, elixir, comment, doxygen, foam, nginx, nushell, r, xml, yuck, purescript, typst.

Full language list (205): ada, agda, angular, apex, arduino, asm, astro, authzed, awk, bash, bass, beancount, bibtex, bicep, bitbake, blade, brightscript, c, c_sharp, caddy, cairo, capnp, chatito, circom, clojure, cmake, cobol, comment, commonlisp, cooklang, corn, cpon, cpp, crystal, css, csv, cuda, cue, cylc, d, dart, desktop, devicetree, dhall, diff, disassembly, djot, dockerfile, dot, doxygen, dtd, earthfile, ebnf, editorconfig, eds, eex, elisp, elixir, elm, elsa, embedded_template, enforce, erlang, facility, faust, fennel, fidl, firrtl, fish, foam, forth, fortran, fsharp, gdscript, git_config, git_rebase, gitattributes, gitcommit, gitignore, gleam, glsl, gn, go, godot_resource, gomod, graphql, groovy, hack, hare, haskell, haxe, hcl, heex, hlsl, html, http, hurl, hyprlang, ini, janet, java, javascript, jinja2, jq, jsdoc, json, json5, jsonnet, julia, just, kconfig, kdl, kotlin, ledger, less, linkerscript, liquid, llvm, lua, luau, make, markdown, markdown_inline, matlab, mermaid, meson, mojo, move, nginx, nickel, nim, ninja, nix, norg, nushell, objc, ocaml, odin, org, pascal, pem, perl, php, pkl, powershell, prisma, prolog, promql, properties, proto, pug, puppet, purescript, python, ql, r, racket, regex, rego, requirements, rescript, robot, ron, rst, ruby, rust, scala, scheme, scss, smithy, solidity, sparql, sql, squirrel, ssh_config, starlark, svelte, swift, tablegen, tcl, teal, templ, textproto, thrift, tlaplus, tmux, todotxt, toml, tsx, turtle, twig, typescript, typst, uxntal, v, verilog, vhdl, vimdoc, vue, wgsl, wolfram, xml, yaml, yuck, zig


Query API

Feature                                              Status
Compile + execute (NewQuery, Execute, ExecuteNode)   supported
Cursor streaming (Exec, NextMatch, NextCapture)      supported
Structural quantifiers (?, *, +)                     supported
Alternation ([...])                                  supported
Field matching (name: (identifier))                  supported
#eq? / #not-eq?                                      supported
#match? / #not-match?                                supported
#any-of? / #not-any-of?                              supported
#lua-match?                                          supported
#has-ancestor? / #not-has-ancestor?                  supported
#not-has-parent?                                     supported
#is? / #is-not?                                      supported
#set! / #offset! directives                          parsed and accepted

Known Limitations

Query compiler gaps

As of February 23, 2026, all shipped highlight and tags queries compile in this repo (156/156 non-empty HighlightQuery entries, 69/69 non-empty TagsQuery entries).

No known query-syntax gaps currently block shipped highlight or tags queries.

DFA-partial languages

89 languages require an external scanner that has not been ported to Go. These parse successfully using the DFA lexer alone, but tokens that require the external scanner are silently skipped. The tree structure is valid but may have gaps. Check entry.Quality to distinguish full from partial.


Adding a Language

1. Add the grammar to grammars/languages.manifest.

2. Generate bindings:

go run ./cmd/ts2go -manifest grammars/languages.manifest -outdir ./grammars -package grammars -compact=true

This regenerates grammars/embedded_grammars_gen.go, grammars/grammar_blobs/*.bin, and language register stubs.

3. Add smoke samples to cmd/parity_report/main.go and grammars/parse_support_test.go.

4. Verify:

go run ./cmd/parity_report
go test ./grammars/...

Architecture

gotreesitter reimplements the tree-sitter runtime in pure Go:

  • Parser — table-driven LR(1) with GLR support for ambiguous grammars
  • Incremental reuse — cursor-based subtree reuse; unchanged regions skip reparsing entirely
  • Arena allocator — slab-based node allocation with ref counting, minimizing GC pressure
  • DFA lexer — generated from grammar tables via ts2go, with hand-written bridges where needed
  • External scanner VM — bytecode interpreter for language-specific scanning (Python indentation, etc.)
  • Query engine — S-expression pattern matching with predicate evaluation and streaming cursors
  • Highlighter — query-based syntax highlighting with incremental support
  • Tagger — symbol definition/reference extraction using tags queries

Grammar tables are extracted from upstream tree-sitter parser.c files by the ts2go tool, serialized into compressed binary blobs, and lazy-loaded on first language use. No C code runs at parse time.

To avoid embedding blobs into the binary, build with -tags grammar_blobs_external and set GOTREESITTER_GRAMMAR_BLOB_DIR to a directory containing *.bin grammar blobs. External blob mode uses mmap on Unix by default (GOTREESITTER_GRAMMAR_BLOB_MMAP=false to disable).

To ship a smaller embedded binary with a curated language set, build with -tags grammar_set_core (core set includes common languages like c, go, java, javascript, python, rust, typescript, etc.).

To restrict registered languages at runtime (embedded or external), set:

GOTREESITTER_GRAMMAR_SET=go,json,python

For long-lived processes, grammar cache memory is tunable:

// Keep only the 8 most recently used decoded grammars in cache.
grammars.SetEmbeddedLanguageCacheLimit(8)

// Drop one language blob from cache (e.g. "rust.bin").
grammars.UnloadEmbeddedLanguage("rust.bin")

// Drop all decoded grammars from cache.
grammars.PurgeEmbeddedLanguageCache()

You can also set GOTREESITTER_GRAMMAR_CACHE_LIMIT at process start to apply a cache cap without code changes. Set it to 0 only when you explicitly want no retention (each grammar access will decode again).

Idle eviction can be enabled with env vars:

GOTREESITTER_GRAMMAR_IDLE_TTL=5m
GOTREESITTER_GRAMMAR_IDLE_SWEEP=30s

Loader compaction/interning is enabled by default and tunable via:

GOTREESITTER_GRAMMAR_COMPACT=true
GOTREESITTER_GRAMMAR_STRING_INTERN_LIMIT=200000
GOTREESITTER_GRAMMAR_TRANSITION_INTERN_LIMIT=20000

Testing

The test suite includes:

  • Smoke tests — all 205 grammars parse a sample without crashing or producing ERROR nodes
  • Correctness snapshots — golden S-expression tests for 20 core languages catch parser and grammar regressions
  • Highlight validation — end-to-end test that compiled highlight queries produce highlight ranges
  • Query tests — pattern matching, predicates, cursors, field-based matching
  • Parser tests — incremental reparsing, error recovery, GLR ambiguity resolution
  • Fuzzing — FuzzGoParseDoesNotPanic for parser robustness

go test ./... -race -count=1

Roadmap

Current: v0.1.0 — 205 grammars, stable parser, incremental reparsing, query engine, highlighting, tagging.

Next:

  • Query engine parity hardening — field-negation semantics, metadata directive behavior, and additional edge-case parity with upstream tree-sitter query execution
  • More hand-written external scanners for high-value dfa-partial languages
  • Parse() (*Tree, error) — return errors instead of silent nil trees
  • Automated parity testing against the C tree-sitter output
  • Fuzzing expansion to cover more languages and the query engine

License

MIT

Documentation

Overview

Package gotreesitter implements a pure Go tree-sitter runtime.

This file defines the core data structures that mirror tree-sitter's TSLanguage C struct and related types. They form the foundation on which the lexer, parser, query engine, and syntax tree are built.

Index

Constants

View Source
const (
	// RuntimeLanguageVersion is the maximum tree-sitter language version this
	// runtime is known to support.
	RuntimeLanguageVersion uint32 = 14
	// MinCompatibleLanguageVersion is the minimum accepted language version.
	MinCompatibleLanguageVersion uint32 = 13
)

Variables

View Source
var DebugDFA bool

DebugDFA enables trace logging for DFA token production.

View Source
var ErrNoLanguage = errors.New("parser has no language configured")

ErrNoLanguage is returned when a Parser has no language configured.

Functions

func RunExternalScanner

func RunExternalScanner(lang *Language, payload any, lexer *ExternalLexer, validSymbols []bool) bool

RunExternalScanner invokes the language's external scanner if present. Returns true if the scanner produced a token, false otherwise.

func Walk

func Walk(node *Node, fn func(node *Node, depth int) WalkAction)

Walk performs a depth-first traversal of the syntax tree rooted at node. The callback receives each node and its depth (0 for the starting node). Return WalkSkipChildren to skip a node's children, or WalkStop to end early.
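The traversal contract can be modeled with a toy node type — a self-contained sketch of the semantics above, not the package implementation (the WalkAction names, including the zero-value "continue" action, are this model's own):

```go
package main

import "fmt"

// WalkAction mirrors the contract described for gotreesitter.Walk:
// continue (zero value), skip a node's children, or stop the traversal.
type WalkAction int

const (
	WalkContinue WalkAction = iota
	WalkSkipChildren
	WalkStop
)

// node is a toy stand-in for *gotreesitter.Node.
type node struct {
	kind     string
	children []*node
}

// walk does a depth-first traversal; it returns false once WalkStop
// is seen so recursion unwinds immediately.
func walk(n *node, depth int, fn func(*node, int) WalkAction) bool {
	switch fn(n, depth) {
	case WalkStop:
		return false
	case WalkSkipChildren:
		return true
	}
	for _, c := range n.children {
		if !walk(c, depth+1, fn) {
			return false
		}
	}
	return true
}

func main() {
	root := &node{kind: "source_file", children: []*node{
		{kind: "function_declaration", children: []*node{{kind: "identifier"}}},
	}}
	// Print an indented outline of the tree.
	walk(root, 0, func(n *node, depth int) WalkAction {
		fmt.Printf("%*s%s\n", depth*2, "", n.kind)
		return WalkContinue
	})
}
```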

Types

type BoundTree

type BoundTree struct {
	// contains filtered or unexported fields
}

BoundTree pairs a Tree with its Language and source, eliminating the need to pass *Language and []byte to every node method call.

func Bind

func Bind(tree *Tree) *BoundTree

Bind creates a BoundTree from a Tree. The Tree must have been created with a Language (via NewTree or a Parser). Returns a BoundTree that delegates to the underlying Tree's Language and Source.

func (*BoundTree) ChildByField

func (bt *BoundTree) ChildByField(n *Node, fieldName string) *Node

ChildByField returns the first child assigned to the given field name.

func (*BoundTree) Language

func (bt *BoundTree) Language() *Language

Language returns the tree's language.

func (*BoundTree) NodeText

func (bt *BoundTree) NodeText(n *Node) string

NodeText returns the source text covered by the node.

func (*BoundTree) NodeType

func (bt *BoundTree) NodeType(n *Node) string

NodeType returns the node's type name, resolved via the bound language.

func (*BoundTree) Release

func (bt *BoundTree) Release()

Release releases the underlying tree's arena memory.

func (*BoundTree) RootNode

func (bt *BoundTree) RootNode() *Node

RootNode returns the tree's root node.

func (*BoundTree) Source

func (bt *BoundTree) Source() []byte

Source returns the tree's source bytes.

type ByteSkippableTokenSource

type ByteSkippableTokenSource interface {
	TokenSource
	SkipToByte(offset uint32) Token
}

ByteSkippableTokenSource can jump to a byte offset and return the first token at or after that position.

type ExternalLexer

type ExternalLexer struct {
	// contains filtered or unexported fields
}

ExternalLexer is the scanner-facing lexer API used by external scanners. It mirrors the essential tree-sitter scanner API: lookahead, advance, mark_end, and result_symbol.

func (*ExternalLexer) Advance

func (l *ExternalLexer) Advance(skip bool)

Advance consumes one rune. When skip is true, consumed bytes are excluded from the token span (scanner whitespace skipping behavior).

func (*ExternalLexer) GetColumn

func (l *ExternalLexer) GetColumn() uint32

GetColumn returns the current column (0-based) at the scanner cursor.

func (*ExternalLexer) Lookahead

func (l *ExternalLexer) Lookahead() rune

Lookahead returns the current rune or 0 at EOF.

func (*ExternalLexer) MarkEnd

func (l *ExternalLexer) MarkEnd()

MarkEnd marks the current scanner position as the token end.

func (*ExternalLexer) SetResultSymbol

func (l *ExternalLexer) SetResultSymbol(sym Symbol)

SetResultSymbol sets the token symbol to emit when Scan returns true.

type ExternalScanner

type ExternalScanner interface {
	Create() any
	Destroy(payload any)
	Serialize(payload any, buf []byte) int
	Deserialize(payload any, buf []byte)
	Scan(payload any, lexer *ExternalLexer, validSymbols []bool) bool
}

ExternalScanner is the interface for language-specific external scanners. Languages like Python and JavaScript need these for indent tracking, template literals, regex vs division, etc.

type ExternalScannerState

type ExternalScannerState struct {
	Data []byte
}

ExternalScannerState holds serialized state for an external scanner between incremental parse runs.

type ExternalVMInstr

type ExternalVMInstr struct {
	Op  ExternalVMOp
	A   int32
	B   int32
	Alt int32
}

ExternalVMInstr is one instruction in an external scanner VM program.

Operands:

  • A: primary operand (opcode-specific)
  • B: secondary operand (used by range checks)
  • Alt: alternate program counter when a condition fails

func VMAdvance

func VMAdvance(skip bool) ExternalVMInstr

func VMEmit

func VMEmit(sym Symbol) ExternalVMInstr

func VMFail

func VMFail() ExternalVMInstr

func VMIfRuneClass

func VMIfRuneClass(class ExternalVMRuneClass, alt int) ExternalVMInstr

func VMIfRuneEq

func VMIfRuneEq(r rune, alt int) ExternalVMInstr

func VMIfRuneInRange

func VMIfRuneInRange(start, end rune, alt int) ExternalVMInstr

func VMJump

func VMJump(target int) ExternalVMInstr

func VMMarkEnd

func VMMarkEnd() ExternalVMInstr

func VMRequireStateEq

func VMRequireStateEq(state uint32, alt int) ExternalVMInstr

func VMRequireValid

func VMRequireValid(validSymbolIndex, alt int) ExternalVMInstr

func VMSetState

func VMSetState(state uint32) ExternalVMInstr

type ExternalVMOp

type ExternalVMOp uint8

ExternalVMOp is an opcode for the native-Go external scanner VM.

const (
	ExternalVMOpFail ExternalVMOp = iota
	ExternalVMOpJump
	ExternalVMOpRequireValid
	ExternalVMOpRequireStateEq
	ExternalVMOpSetState
	ExternalVMOpIfRuneEq
	ExternalVMOpIfRuneInRange
	ExternalVMOpIfRuneClass
	ExternalVMOpAdvance
	ExternalVMOpMarkEnd
	ExternalVMOpEmit
)

type ExternalVMProgram

type ExternalVMProgram struct {
	Code     []ExternalVMInstr
	MaxSteps int // <=0 uses a safe default based on program size
}

ExternalVMProgram is a small bytecode program interpreted by ExternalVMScanner.

type ExternalVMRuneClass

type ExternalVMRuneClass uint8

ExternalVMRuneClass is a character class used by ExternalVMOpIfRuneClass.

const (
	ExternalVMRuneClassWhitespace ExternalVMRuneClass = iota
	ExternalVMRuneClassDigit
	ExternalVMRuneClassLetter
	ExternalVMRuneClassWord
	ExternalVMRuneClassNewline
)

type ExternalVMScanner

type ExternalVMScanner struct {
	// contains filtered or unexported fields
}

ExternalVMScanner executes an ExternalVMProgram and implements ExternalScanner.

func MustNewExternalVMScanner

func MustNewExternalVMScanner(program ExternalVMProgram) *ExternalVMScanner

MustNewExternalVMScanner is like NewExternalVMScanner but panics on error.

func NewExternalVMScanner

func NewExternalVMScanner(program ExternalVMProgram) (*ExternalVMScanner, error)

NewExternalVMScanner validates and constructs an ExternalVMScanner.

func (*ExternalVMScanner) Create

func (s *ExternalVMScanner) Create() any

Create allocates scanner payload (currently a single uint32 state slot).

func (*ExternalVMScanner) Deserialize

func (s *ExternalVMScanner) Deserialize(payload any, buf []byte)

Deserialize restores payload state from buf.

func (*ExternalVMScanner) Destroy

func (s *ExternalVMScanner) Destroy(payload any)

Destroy releases scanner payload resources.

func (*ExternalVMScanner) Scan

func (s *ExternalVMScanner) Scan(payload any, lexer *ExternalLexer, validSymbols []bool) bool

Scan executes the scanner program against the current lexer position.

func (*ExternalVMScanner) Serialize

func (s *ExternalVMScanner) Serialize(payload any, buf []byte) int

Serialize writes payload state into buf.

type FieldID

type FieldID uint16

FieldID is a named field index.

type FieldMapEntry

type FieldMapEntry struct {
	FieldID    FieldID
	ChildIndex uint8
	Inherited  bool
}

FieldMapEntry maps a child index to a field name.

type HighlightRange

type HighlightRange struct {
	StartByte uint32
	EndByte   uint32
	Capture   string // "keyword", "string", "function", etc.
}

HighlightRange represents a styled range of source code, mapping a byte span to a capture name from a highlight query. The editor maps capture names (e.g., "keyword", "string", "function") to FSS style classes.

type Highlighter

type Highlighter struct {
	// contains filtered or unexported fields
}

Highlighter is a high-level API that takes source code and returns styled ranges. It combines a Parser, a compiled Query, and a Language to provide a single Highlight() call for the editor.

func NewHighlighter

func NewHighlighter(lang *Language, highlightQuery string, opts ...HighlighterOption) (*Highlighter, error)

NewHighlighter creates a Highlighter for the given language and highlight query (in tree-sitter .scm format). Returns an error if the query fails to compile.

func (*Highlighter) Highlight

func (h *Highlighter) Highlight(source []byte) []HighlightRange

Highlight parses the source code and executes the highlight query, returning a slice of HighlightRange sorted by StartByte. When ranges overlap, inner (more specific) captures take priority over outer ones.
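The inner-capture priority rule can be illustrated with a self-contained sketch: at every byte, the narrowest enclosing capture wins. The type mirrors HighlightRange, but this is an illustration of the rule under that assumption, not the package's algorithm:

```go
package main

import (
	"fmt"
	"sort"
)

// HighlightRange is redefined locally so the sketch runs standalone.
type HighlightRange struct {
	StartByte, EndByte uint32
	Capture            string
}

// flatten splits overlapping ranges at every boundary and, for each
// resulting segment, keeps the narrowest (most specific) covering capture.
func flatten(ranges []HighlightRange) []HighlightRange {
	bounds := map[uint32]bool{}
	for _, r := range ranges {
		bounds[r.StartByte] = true
		bounds[r.EndByte] = true
	}
	var cuts []uint32
	for b := range bounds {
		cuts = append(cuts, b)
	}
	sort.Slice(cuts, func(i, j int) bool { return cuts[i] < cuts[j] })

	var out []HighlightRange
	for i := 0; i+1 < len(cuts); i++ {
		lo, hi := cuts[i], cuts[i+1]
		var best HighlightRange
		found := false
		for _, r := range ranges {
			if r.StartByte <= lo && hi <= r.EndByte {
				if !found || r.EndByte-r.StartByte < best.EndByte-best.StartByte {
					best, found = r, true
				}
			}
		}
		if found {
			out = append(out, HighlightRange{lo, hi, best.Capture})
		}
	}
	return out
}

func main() {
	// A "function" span containing an inner "identifier" span.
	rs := []HighlightRange{
		{0, 10, "function"},
		{3, 7, "identifier"},
	}
	for _, r := range flatten(rs) {
		fmt.Println(r.StartByte, r.EndByte, r.Capture)
	}
}
```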

func (*Highlighter) HighlightIncremental

func (h *Highlighter) HighlightIncremental(source []byte, oldTree *Tree) ([]HighlightRange, *Tree)

HighlightIncremental re-highlights source after edits were applied to oldTree. Returns the new highlight ranges and the new parse tree (for use in subsequent incremental calls). Call oldTree.Edit() before calling this.

type HighlighterOption

type HighlighterOption func(*Highlighter)

HighlighterOption configures a Highlighter.

func WithTokenSourceFactory

func WithTokenSourceFactory(factory func(source []byte) TokenSource) HighlighterOption

WithTokenSourceFactory sets a factory function that creates a TokenSource for each Highlight call. This is needed for languages that use a custom lexer bridge (like Go, which uses go/scanner instead of a DFA lexer).

When set, Highlight() calls ParseWithTokenSource instead of Parse.

type InputEdit

type InputEdit struct {
	StartByte   uint32
	OldEndByte  uint32
	NewEndByte  uint32
	StartPoint  Point
	OldEndPoint Point
	NewEndPoint Point
}

InputEdit describes a single edit to the source text. It tells the parser what byte range was replaced and what the new range looks like, so the incremental parser can skip unchanged subtrees.

type Language

type Language struct {
	Name string

	// LanguageVersion is the tree-sitter language ABI version.
	// A value of 0 means "unknown/unspecified" and is treated as compatible.
	LanguageVersion uint32

	// Counts
	SymbolCount        uint32
	TokenCount         uint32
	ExternalTokenCount uint32
	StateCount         uint32
	LargeStateCount    uint32
	FieldCount         uint32
	ProductionIDCount  uint32

	// Symbol metadata
	SymbolNames    []string
	SymbolMetadata []SymbolMetadata
	FieldNames     []string // index 0 is ""

	// Parse tables
	ParseTable         [][]uint16 // dense: [state][symbol] -> action index
	SmallParseTable    []uint16   // compressed sparse table
	SmallParseTableMap []uint32   // state -> offset into SmallParseTable
	ParseActions       []ParseActionEntry

	// Lex tables
	LexModes            []LexMode
	LexStates           []LexState // main lexer DFA
	KeywordLexStates    []LexState // keyword lexer DFA (optional)
	KeywordCaptureToken Symbol

	// Field mapping
	FieldMapSlices  [][2]uint16 // [production_id] -> (index, length)
	FieldMapEntries []FieldMapEntry

	// Alias sequences
	AliasSequences [][]Symbol // [production_id][child_index] -> alias symbol

	// Primary state IDs (for table dedup)
	PrimaryStateIDs []StateID

	// External scanner (nil if not needed)
	ExternalScanner ExternalScanner
	ExternalSymbols []Symbol // external token index -> symbol

	// InitialState is the parser's start state. In tree-sitter grammars
	// this is always 1 (state 0 is reserved for error recovery). For
	// hand-built grammars it defaults to 0.
	InitialState StateID
	// contains filtered or unexported fields
}

Language holds all data needed to parse a specific language. It mirrors tree-sitter's TSLanguage C struct, translated into idiomatic Go types with slice-based tables instead of raw pointers.

func (*Language) CompatibleWithRuntime

func (l *Language) CompatibleWithRuntime() bool

CompatibleWithRuntime reports whether this language can be parsed by the current runtime version. Unspecified versions (0) are treated as compatible.

func (*Language) FieldByName

func (l *Language) FieldByName(name string) (FieldID, bool)

FieldByName returns the field ID for a given name, or (0, false) if not found. Builds an internal map on first call for O(1) subsequent lookups.

func (*Language) SymbolByName

func (l *Language) SymbolByName(name string) (Symbol, bool)

SymbolByName returns the symbol ID for a given name, or (0, false) if not found. The "_" wildcard returns (0, true) as a special case. Builds an internal map on first call for O(1) subsequent lookups.

func (*Language) TokenSymbolsByName

func (l *Language) TokenSymbolsByName(name string) []Symbol

TokenSymbolsByName returns all terminal token symbols whose display name matches name. The returned symbols are in grammar order.

func (*Language) Version

func (l *Language) Version() uint32

Version returns the tree-sitter language ABI version.

type LexMode

type LexMode struct {
	LexState         uint16
	ExternalLexState uint16
}

LexMode maps a parser state to its lexer configuration.

type LexState

type LexState struct {
	AcceptToken Symbol // 0 if this state doesn't accept
	Skip        bool   // true if accepted chars are whitespace
	Transitions []LexTransition
	Default     int // default next state (-1 if none)
	EOF         int // state on EOF (-1 if none)
}

LexState is one state in the table-driven lexer DFA.

type LexTransition

type LexTransition struct {
	Lo, Hi    rune // inclusive character range
	NextState int
	// Skip mirrors tree-sitter's SKIP(state): consume the matched rune
	// and continue lexing while resetting token start.
	Skip bool
}

LexTransition maps a character range to a next state.
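How these tables drive lexing can be shown with a self-contained toy walker over LexState/LexTransition-shaped tables: a two-state DFA accepting runs of ASCII digits. The field names mirror the package types, but the walker is a sketch, not the real Lexer:

```go
package main

import "fmt"

// Local stand-ins for the package's lex-table types.
type LexTransition struct {
	Lo, Hi    rune // inclusive character range
	NextState int
}

type LexState struct {
	AcceptToken int // 0 = this state doesn't accept
	Transitions []LexTransition
}

// lex walks the DFA from state 0, remembering the last accepting state,
// and returns (token, matched length) — longest-match semantics.
func lex(states []LexState, src []rune) (int, int) {
	state, accepted, length := 0, 0, 0
	for i, r := range src {
		next := -1
		for _, t := range states[state].Transitions {
			if r >= t.Lo && r <= t.Hi {
				next = t.NextState
				break
			}
		}
		if next < 0 {
			break // no transition: stop at the last accepted position
		}
		state = next
		if states[state].AcceptToken != 0 {
			accepted, length = states[state].AcceptToken, i+1
		}
	}
	return accepted, length
}

func main() {
	// State 0: start; state 1: accepting, loops on digits.
	states := []LexState{
		{Transitions: []LexTransition{{Lo: '0', Hi: '9', NextState: 1}}},
		{AcceptToken: 1, Transitions: []LexTransition{{Lo: '0', Hi: '9', NextState: 1}}},
	}
	fmt.Println(lex(states, []rune("123abc"))) // token 1, length 3
}
```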

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer tokenizes source text using a table-driven DFA.

func NewLexer

func NewLexer(states []LexState, source []byte) *Lexer

NewLexer creates a new Lexer that will tokenize source using the given DFA state table.

func (*Lexer) Next

func (l *Lexer) Next(startState uint16) Token

Next lexes the next token starting from the given lex state index. It automatically skips tokens from states where Skip=true (whitespace). Returns a zero-Symbol token with StartByte==EndByte at EOF.

type Node

type Node struct {
	// contains filtered or unexported fields
}

Node is a syntax tree node.

func NewLeafNode

func NewLeafNode(sym Symbol, named bool, startByte, endByte uint32, startPoint, endPoint Point) *Node

NewLeafNode creates a terminal/leaf node.

func NewParentNode

func NewParentNode(sym Symbol, named bool, children []*Node, fieldIDs []FieldID, productionID uint16) *Node

NewParentNode creates a non-terminal node with children. It sets parent pointers on all children and computes byte/point spans from the first and last children. If any child has an error, the parent is marked as having an error too.

func (*Node) Child

func (n *Node) Child(i int) *Node

Child returns the i-th child, or nil if i is out of range.

func (*Node) ChildByFieldName

func (n *Node) ChildByFieldName(name string, lang *Language) *Node

ChildByFieldName returns the first child assigned to the given field name, or nil if no child has that field. The Language is needed to resolve field names to IDs. Uses Language.FieldByName for O(1) lookup.

func (*Node) ChildCount

func (n *Node) ChildCount() int

ChildCount returns the number of children (both named and anonymous).

func (*Node) Children

func (n *Node) Children() []*Node

Children returns a slice of all children.

func (*Node) EndByte

func (n *Node) EndByte() uint32

EndByte returns the byte offset where this node ends (exclusive).

func (*Node) EndPoint

func (n *Node) EndPoint() Point

EndPoint returns the row/column position where this node ends.

func (*Node) HasError

func (n *Node) HasError() bool

HasError reports whether this node or any descendant contains a parse error.

func (*Node) IsMissing

func (n *Node) IsMissing() bool

IsMissing reports whether this node was inserted by error recovery.

func (*Node) IsNamed

func (n *Node) IsNamed() bool

IsNamed reports whether this is a named node (as opposed to anonymous syntax like punctuation).

func (*Node) NamedChild

func (n *Node) NamedChild(i int) *Node

NamedChild returns the i-th named child (skipping anonymous children), or nil if i is out of range.

func (*Node) NamedChildCount

func (n *Node) NamedChildCount() int

NamedChildCount returns the number of named children.

func (*Node) NextSibling

func (n *Node) NextSibling() *Node

NextSibling returns the next sibling node, or nil when this is the last child or has no parent.

func (*Node) Parent

func (n *Node) Parent() *Node

Parent returns this node's parent, or nil if it is the root.

func (*Node) ParseState

func (n *Node) ParseState() StateID

ParseState returns the parser state associated with this node.

func (*Node) PrevSibling

func (n *Node) PrevSibling() *Node

PrevSibling returns the previous sibling node, or nil when this is the first child or has no parent.

func (*Node) Range

func (n *Node) Range() Range

Range returns the full span of this node as a Range.

func (*Node) StartByte

func (n *Node) StartByte() uint32

StartByte returns the byte offset where this node begins.

func (*Node) StartPoint

func (n *Node) StartPoint() Point

StartPoint returns the row/column position where this node begins.

func (*Node) Symbol

func (n *Node) Symbol() Symbol

Symbol returns the node's grammar symbol.

func (*Node) Text

func (n *Node) Text(source []byte) string

Text returns the source text covered by this node.

func (*Node) Type

func (n *Node) Type(lang *Language) string

Type returns the node's type name from the language.

type ParseAction

type ParseAction struct {
	Type              ParseActionType
	State             StateID // target state (shift/recover)
	Symbol            Symbol  // reduced symbol (reduce)
	ChildCount        uint8   // children consumed (reduce)
	DynamicPrecedence int16   // precedence (reduce)
	ProductionID      uint16  // which production (reduce)
	Extra             bool    // is this an extra token (shift)
	Repetition        bool    // is this a repetition (shift)
}

ParseAction is a single parser action from the parse table.

type ParseActionEntry

type ParseActionEntry struct {
	Reusable bool
	Actions  []ParseAction
}

ParseActionEntry is a group of actions for a (state, symbol) pair.

type ParseActionType

type ParseActionType uint8

ParseActionType identifies the kind of parse action.

const (
	ParseActionShift ParseActionType = iota
	ParseActionReduce
	ParseActionAccept
	ParseActionRecover
)

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser reads parse tables from a Language and produces a syntax tree. It supports GLR parsing: when a (state, symbol) pair maps to multiple actions, the parser forks the stack and explores all alternatives in parallel, merging stacks that converge on the same state and picking the highest dynamic-precedence winner for ambiguities.

func NewParser

func NewParser(lang *Language) *Parser

NewParser creates a new Parser for the given language.

func (*Parser) Parse

func (p *Parser) Parse(source []byte) (*Tree, error)

Parse tokenizes and parses source using the built-in DFA lexer, returning a syntax tree. This works for hand-built grammars that provide LexStates. For real grammars that need a custom lexer, use ParseWithTokenSource. If the input is empty, it returns a tree with a nil root and no error.

func (*Parser) ParseIncremental

func (p *Parser) ParseIncremental(source []byte, oldTree *Tree) (*Tree, error)

ParseIncremental re-parses source after edits have been applied to oldTree, reusing unchanged subtrees from the old tree for better performance. Call oldTree.Edit for each edit before calling this method.

func (*Parser) ParseIncrementalWithTokenSource

func (p *Parser) ParseIncrementalWithTokenSource(source []byte, oldTree *Tree, ts TokenSource) (*Tree, error)

ParseIncrementalWithTokenSource is like ParseIncremental but uses a custom token source.

func (*Parser) ParseWithTokenSource

func (p *Parser) ParseWithTokenSource(source []byte, ts TokenSource) (*Tree, error)

ParseWithTokenSource parses source using a custom token source. This is used for real grammars where the lexer DFA isn't available as data tables (e.g., Go grammar using go/scanner as a bridge).

type Pattern

type Pattern struct {
	// contains filtered or unexported fields
}

Pattern is a single top-level S-expression pattern in a query.

type Point

type Point struct {
	Row    uint32
	Column uint32
}

Point is a row/column position in source text.

type PointSkippableTokenSource

type PointSkippableTokenSource interface {
	ByteSkippableTokenSource
	SkipToByteWithPoint(offset uint32, pt Point) Token
}

PointSkippableTokenSource extends ByteSkippableTokenSource with a hint-based skip that avoids recomputing row/column from byte offset. During incremental parsing the reused node already carries its endpoint, so passing it directly eliminates the O(n) offset-to-point scan.

type Query

type Query struct {
	// contains filtered or unexported fields
}

Query holds compiled patterns parsed from a tree-sitter .scm query file. It can be executed against a syntax tree to find matching nodes and return captured names.

func NewQuery

func NewQuery(source string, lang *Language) (*Query, error)

NewQuery compiles query source (tree-sitter .scm format) against a language. It returns an error if the query syntax is invalid or references unknown node types or field names.

func (*Query) CaptureNames

func (q *Query) CaptureNames() []string

CaptureNames returns the list of unique capture names used in the query.

func (*Query) Exec

func (q *Query) Exec(node *Node, lang *Language, source []byte) *QueryCursor

Exec creates a streaming cursor over matches rooted at node.

func (*Query) Execute

func (q *Query) Execute(tree *Tree) []QueryMatch

Execute runs the query against a syntax tree and returns all matches.

func (*Query) ExecuteNode

func (q *Query) ExecuteNode(node *Node, lang *Language, source []byte) []QueryMatch

ExecuteNode runs the query starting from a specific node.

source is required for text predicates (such as #eq? and #match?); pass the bytes the tree was parsed from so predicates evaluate against the correct text.

func (*Query) PatternCount

func (q *Query) PatternCount() int

PatternCount returns the number of patterns in the query.

type QueryCapture

type QueryCapture struct {
	Name string
	Node *Node
}

QueryCapture is a single captured node within a match.

type QueryCursor

type QueryCursor struct {
	// contains filtered or unexported fields
}

QueryCursor incrementally walks a node subtree and yields matches one by one. It is the streaming counterpart to Query.Execute and avoids materializing all matches up front.

func (*QueryCursor) NextCapture

func (c *QueryCursor) NextCapture() (QueryCapture, bool)

NextCapture yields captures in match order by draining NextMatch results. This is a practical first-pass ordering: captures are returned in each match's capture order, then by subsequent matches in DFS match order.

func (*QueryCursor) NextMatch

func (c *QueryCursor) NextMatch() (QueryMatch, bool)

NextMatch yields the next query match from the cursor.

type QueryMatch

type QueryMatch struct {
	PatternIndex int
	Captures     []QueryCapture
}

QueryMatch represents a successful pattern match with its captures.

type QueryPredicate

type QueryPredicate struct {
	// contains filtered or unexported fields
}

QueryPredicate is a post-match constraint attached to a pattern. Supported forms:

  • (#eq? @a @b)
  • (#eq? @a "literal")
  • (#not-eq? @a @b)
  • (#not-eq? @a "literal")
  • (#match? @a "regex")
  • (#not-match? @a "regex")
  • (#lua-match? @a "lua-pattern")
  • (#any-of? @a "v1" "v2" ...)
  • (#not-any-of? @a "v1" "v2" ...)
  • (#has-ancestor? @a type ...)
  • (#not-has-ancestor? @a type ...)
  • (#not-has-parent? @a type ...)
  • (#is? ...), (#is-not? ...)
  • (#set! key value), (#offset! @cap ...)

type QueryStep

type QueryStep struct {
	// contains filtered or unexported fields
}

QueryStep is one matching instruction within a pattern.

type Range

type Range struct {
	StartByte  uint32
	EndByte    uint32
	StartPoint Point
	EndPoint   Point
}

Range is a span of source text.

type StateID

type StateID uint16

StateID is a parser state index.

type Symbol

type Symbol uint16

Symbol is a grammar symbol ID (terminal or nonterminal).

type SymbolMetadata

type SymbolMetadata struct {
	Name      string
	Visible   bool
	Named     bool
	Supertype bool
}

SymbolMetadata holds display information about a symbol.

type Tag

type Tag struct {
	Kind      string // e.g. "definition.function", "reference.call"
	Name      string // the captured symbol text
	Range     Range  // full span of the tagged node
	NameRange Range  // span of the @name capture
}

Tag represents a tagged symbol in source code, extracted by a Tagger. Kind follows tree-sitter convention: "definition.function", "reference.call", etc. Name is the captured symbol text (e.g., the function name).

type Tagger

type Tagger struct {
	// contains filtered or unexported fields
}

Tagger extracts symbol definitions and references from source code using tree-sitter tags queries. It is the tagging counterpart to Highlighter.

Tags queries use a convention where captures follow the pattern:

  • @name captures the symbol name (e.g., function identifier)
  • @definition.X or @reference.X captures the kind

Example query:

(function_declaration name: (identifier) @name) @definition.function
(call_expression function: (identifier) @name) @reference.call

func NewTagger

func NewTagger(lang *Language, tagsQuery string, opts ...TaggerOption) (*Tagger, error)

NewTagger creates a Tagger for the given language and tags query.

func (*Tagger) Tag

func (tg *Tagger) Tag(source []byte) []Tag

Tag parses source and returns all tags.

func (*Tagger) TagIncremental

func (tg *Tagger) TagIncremental(source []byte, oldTree *Tree) ([]Tag, *Tree)

TagIncremental re-tags source after edits to oldTree. Returns the tags and the new tree for subsequent incremental calls.

func (*Tagger) TagTree

func (tg *Tagger) TagTree(tree *Tree) []Tag

TagTree extracts tags from an already-parsed tree.

type TaggerOption

type TaggerOption func(*Tagger)

TaggerOption configures a Tagger.

func WithTaggerTokenSourceFactory

func WithTaggerTokenSourceFactory(factory func(source []byte) TokenSource) TaggerOption

WithTaggerTokenSourceFactory sets a factory function that creates a TokenSource for each Tag call.

type Token

type Token struct {
	Symbol     Symbol
	Text       string
	StartByte  uint32
	EndByte    uint32
	StartPoint Point
	EndPoint   Point
}

Token is a lexed token with position info.

type TokenSource

type TokenSource interface {
	// Next returns the next token. It should skip whitespace and comments
	// as appropriate for the language. Returns a zero-Symbol token at EOF.
	Next() Token
}

TokenSource provides tokens to the parser. This interface abstracts over different lexer implementations: the built-in DFA lexer (for hand-built grammars) or custom bridges like GoTokenSource (for real grammars where we can't extract the C lexer DFA).

type Tree

type Tree struct {
	// contains filtered or unexported fields
}

Tree holds a complete syntax tree along with its source text and language.

func NewTree

func NewTree(root *Node, source []byte, lang *Language) *Tree

NewTree creates a new Tree.

func (*Tree) Edit

func (t *Tree) Edit(edit InputEdit)

Edit records an edit on this tree. Call this before ParseIncremental to inform the parser which regions changed. The edit adjusts byte offsets and marks overlapping nodes as dirty so the incremental parser knows what to re-parse.

func (*Tree) Edits

func (t *Tree) Edits() []InputEdit

Edits returns the pending edits recorded on this tree.

func (*Tree) Language

func (t *Tree) Language() *Language

Language returns the language used to parse this tree.

func (*Tree) Release

func (t *Tree) Release()

Release decrements arena references held by this tree. After Release, the tree should be treated as invalid and not reused.

func (*Tree) RootNode

func (t *Tree) RootNode() *Node

RootNode returns the tree's root node.

func (*Tree) Source

func (t *Tree) Source() []byte

Source returns the original source text.

type WalkAction

type WalkAction int

WalkAction controls the tree walk behavior.

const (
	// WalkContinue continues the walk to children and siblings.
	WalkContinue WalkAction = iota
	// WalkSkipChildren skips the current node's children but continues to siblings.
	WalkSkipChildren
	// WalkStop terminates the walk entirely.
	WalkStop
)

Directories

Path Synopsis
cmd
parity_report command
ts2go command
Command ts2go reads a tree-sitter generated parser.c file and outputs a Go source file containing a function that returns a populated *gotreesitter.Language with all extracted parse tables.
