gotreesitter

Version: v0.8.1
Published: Mar 17, 2026 License: MIT Imports: 17 Imported by: 0

README

gotreesitter

Pure-Go tree-sitter runtime. No CGo, no C toolchain. Cross-compiles to any GOOS/GOARCH target Go supports, including wasip1.

go get github.com/odvcencio/gotreesitter

gotreesitter loads the same parse-table format that tree-sitter's C runtime uses. Grammar tables are extracted from upstream parser.c files by ts2go, compressed into binary blobs, and deserialized on first use. 206 grammars ship in the registry.

Motivation

Every Go tree-sitter binding in the ecosystem depends on CGo:

  • Cross-compilation requires a C cross-toolchain per target. GOOS=wasip1, GOARCH=arm64 from a Linux host, or any Windows build without MSYS2/MinGW, will not link.
  • CI images must carry gcc and the grammar's C sources. go install fails for downstream users who don't have a C compiler.
  • The Go race detector, coverage instrumentation, and fuzzer cannot see across the CGo boundary. Bugs in the C runtime or in FFI marshaling are invisible to go test -race.

gotreesitter eliminates the C dependency entirely. The parser, lexer, query engine, incremental reparsing, arena allocator, external scanners, and tree cursor are all implemented in Go. The only input is the grammar blob.

Quick start

package main

import (
    "fmt"
    "log"

    "github.com/odvcencio/gotreesitter"
    "github.com/odvcencio/gotreesitter/grammars"
)

func main() {
    src := []byte(`package main

func main() {}
`)

    lang := grammars.GoLanguage()
    parser := gotreesitter.NewParser(lang)

    // Parse returns an error alongside the tree; check it instead of discarding it.
    tree, err := parser.Parse(src)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(tree.RootNode())
}

grammars.DetectLanguage("main.go") resolves a filename to the appropriate LangEntry.

Queries

q, _ := gotreesitter.NewQuery(`(function_declaration name: (identifier) @fn)`, lang)
cursor := q.Exec(tree.RootNode(), lang, src)

for {
    match, ok := cursor.NextMatch()
    if !ok {
        break
    }
    for _, cap := range match.Captures {
        fmt.Println(cap.Node.Text(src))
    }
}

The query engine supports the full S-expression pattern language: structural quantifiers (?, *, +), alternation ([...]), field constraints, negated fields (!field), anchors (.), and all standard predicates. See Query API.

Typed query codegen

Generate type-safe Go wrappers from .scm query files:

go run ./cmd/tsquery -input queries/go_functions.scm -lang go -output go_functions_query.go -package queries

Given a query like (function_declaration name: (identifier) @name body: (block) @body), tsquery generates:

type FunctionDeclarationMatch struct {
    Name *gotreesitter.Node
    Body *gotreesitter.Node
}

q, _ := queries.NewGoFunctionsQuery(lang)
cursor := q.Exec(tree.RootNode(), lang, src)
for {
    match, ok := cursor.Next()
    if !ok { break }
    fmt.Println(match.Name.Text(src))
}

Multi-pattern queries generate one struct per pattern with MatchPatternN conversion helpers.

Multi-language documents (injection parsing)

Parse documents with embedded languages (HTML+JS+CSS, Markdown+code fences, Vue/Svelte templates):

ip := gotreesitter.NewInjectionParser()
ip.RegisterLanguage("html", htmlLang)
ip.RegisterLanguage("javascript", jsLang)
ip.RegisterLanguage("css", cssLang)
ip.RegisterInjectionQuery("html", injectionQuery)

result, _ := ip.Parse(source, "html")

for _, inj := range result.Injections {
    fmt.Printf("%s: %d ranges\n", inj.Language, len(inj.Ranges))
    // inj.Tree is the child language's parse tree
}

Supports static (#set! injection.language "javascript") and dynamic (@injection.language capture) language detection, recursive nested injections, and incremental reparse with child tree reuse.

Source rewriting

Collect source-level edits and apply atomically, producing InputEdit records for incremental reparse:

rw := gotreesitter.NewRewriter(src)
rw.Replace(funcNameNode, []byte("newName"))
rw.InsertBefore(bodyNode, []byte("// added\n"))
rw.Delete(unusedNode)

newSrc, _ := rw.ApplyToTree(tree)
newTree, _ := parser.ParseIncremental(newSrc, tree)

Apply() returns both the new source bytes and the []InputEdit records. ApplyToTree() is a convenience that calls tree.Edit() for each edit and returns source ready for ParseIncremental.
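The single-pass idea can be illustrated without the library. The following is a minimal, stdlib-only sketch, not the Rewriter's actual implementation: the edit and inputEdit types and the applyEdits function are hypothetical stand-ins, and the row/column fields of a real InputEdit are left out. Records are expressed in original-source byte offsets and listed back to front, so applying them in order never shifts an offset that a later record still needs.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// edit replaces src[start:end] with text (hypothetical type for this sketch).
type edit struct {
	start, end int
	text       []byte
}

// inputEdit mirrors only the byte fields of an InputEdit-style record.
type inputEdit struct {
	StartByte, OldEndByte, NewEndByte int
}

// applyEdits rejects overlapping or out-of-range edits, then rebuilds the
// source in one left-to-right pass over the sorted edits.
func applyEdits(src []byte, edits []edit) ([]byte, []inputEdit, error) {
	sorted := append([]edit(nil), edits...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	var out []byte
	var records []inputEdit
	pos := 0 // first byte of src not yet copied or replaced
	for _, e := range sorted {
		if e.start < pos || e.end < e.start || e.end > len(src) {
			return nil, nil, errors.New("edit overlaps a previous edit or is out of range")
		}
		out = append(out, src[pos:e.start]...)
		out = append(out, e.text...)
		records = append(records, inputEdit{e.start, e.end, e.start + len(e.text)})
		pos = e.end
	}
	out = append(out, src[pos:]...)

	// Reverse so the records run from the end of the file to the start.
	for i, j := 0, len(records)-1; i < j; i, j = i+1, j-1 {
		records[i], records[j] = records[j], records[i]
	}
	return out, records, nil
}

func main() {
	src := []byte("func old() {}")
	out, records, err := applyEdits(src, []edit{
		{5, 8, []byte("newName")}, // replace the identifier
		{0, 0, []byte("// x\n")},  // insert a comment at the top
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n%+v\n", out, records)
}
```

Validating before mutating is what makes the operation atomic in this sketch: a rejected batch returns an error and leaves the input untouched.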

Incremental reparsing

tree, _ := parser.Parse(src)

// User types "x" at byte offset 42
src = append(src[:42], append([]byte("x"), src[42:]...)...)

tree.Edit(gotreesitter.InputEdit{
    StartByte:   42,
    OldEndByte:  42,
    NewEndByte:  43,
    StartPoint:  gotreesitter.Point{Row: 3, Column: 10},
    OldEndPoint: gotreesitter.Point{Row: 3, Column: 10},
    NewEndPoint: gotreesitter.Point{Row: 3, Column: 11},
})

tree2, _ := parser.ParseIncremental(src, tree)

ParseIncremental walks the old tree's spine, identifies the edit region, and reuses unchanged subtrees by reference. Only the invalidated span is re-lexed and re-parsed. Both leaf and non-leaf subtrees are eligible for reuse; non-leaf reuse is driven by pre-goto state tracking on interior nodes, so the parser can skip entire subtrees without re-deriving their contents.

When no edit has occurred, ParseIncremental detects the nil-edit on a pointer check and returns in single-digit nanoseconds with zero allocations.
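The example above hard-codes the row and column values. In practice they have to be derived from the byte offset. The sketch below shows one way to do that for a pure insertion; the point and inputEdit types are local stand-ins that copy the field names from the example, pointAt and insertionEdit are hypothetical helpers, and columns are assumed to be counted in bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

// point and inputEdit are local stand-ins mirroring the field names used in
// the InputEdit example; they are not the library's types.
type point struct{ Row, Column uint32 }

type inputEdit struct {
	StartByte, OldEndByte, NewEndByte    uint32
	StartPoint, OldEndPoint, NewEndPoint point
}

// pointAt returns the zero-based row and byte column of offset within src.
func pointAt(src []byte, offset int) point {
	prefix := src[:offset]
	row := bytes.Count(prefix, []byte("\n"))
	col := offset - (bytes.LastIndexByte(prefix, '\n') + 1)
	return point{Row: uint32(row), Column: uint32(col)}
}

// insertionEdit describes inserting text at offset; oldSrc is the source
// before the insertion. Nothing is removed, so the old end equals the start.
func insertionEdit(oldSrc []byte, offset int, text []byte) inputEdit {
	start := pointAt(oldSrc, offset)
	end := start
	if n := bytes.Count(text, []byte("\n")); n > 0 {
		// The inserted text spans lines: the column restarts after its last newline.
		end.Row += uint32(n)
		end.Column = uint32(len(text) - bytes.LastIndexByte(text, '\n') - 1)
	} else {
		end.Column += uint32(len(text))
	}
	return inputEdit{
		StartByte:   uint32(offset),
		OldEndByte:  uint32(offset),
		NewEndByte:  uint32(offset + len(text)),
		StartPoint:  start,
		OldEndPoint: start,
		NewEndPoint: end,
	}
}

func main() {
	src := []byte("a\nbc\ndef")
	fmt.Printf("%+v\n", insertionEdit(src, 6, []byte("x")))
}
```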

Tree cursor

TreeCursor maintains an explicit (node, childIndex) frame stack. Parent, child, and sibling movement are O(1) with zero allocations — sibling traversal indexes directly into the parent's children[] slice.

c := gotreesitter.NewTreeCursorFromTree(tree)

c.GotoFirstChild()
c.GotoChildByFieldName("body")

for ok := c.GotoFirstNamedChild(); ok; ok = c.GotoNextNamedSibling() {
    fmt.Printf("%s at %d\n", c.CurrentNodeType(), c.CurrentNode().StartByte())
}

idx := c.GotoFirstChildForByte(128)

Movement methods: GotoFirstChild, GotoLastChild, GotoNextSibling, GotoPrevSibling, GotoParent, named-only variants (GotoFirstNamedChild, etc.), field-based (GotoChildByFieldName, GotoChildByFieldID), and position-based (GotoFirstChildForByte, GotoFirstChildForPoint).

Cursors hold direct pointers into tree nodes, so a cursor is no longer valid once those nodes are released or replaced. Recreate it after Tree.Release(), Tree.Edit(...), or an incremental reparse.
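To make the frame-stack description concrete, here is a minimal sketch of the technique on a toy tree. The node, frame, and cursor types are hypothetical and much simpler than the library's; the point is only that sibling movement rewrites the top frame by indexing into the parent's children slice, with no search.

```go
package main

import "fmt"

// node is a toy tree node; the library's Node type is not reproduced here.
type node struct {
	kind     string
	children []*node
}

// frame records a node and its index within its parent's children slice.
type frame struct {
	n          *node
	childIndex int
}

// cursor keeps the path from the root to the current node.
type cursor struct{ stack []frame }

func newCursor(root *node) *cursor {
	return &cursor{stack: []frame{{n: root, childIndex: -1}}}
}

func (c *cursor) current() *node { return c.stack[len(c.stack)-1].n }

// gotoFirstChild pushes a frame for child 0, if there is one.
func (c *cursor) gotoFirstChild() bool {
	cur := c.current()
	if len(cur.children) == 0 {
		return false
	}
	c.stack = append(c.stack, frame{n: cur.children[0], childIndex: 0})
	return true
}

// gotoNextSibling rewrites the top frame in place using the parent's slice.
func (c *cursor) gotoNextSibling() bool {
	if len(c.stack) < 2 {
		return false // the root has no siblings
	}
	top := &c.stack[len(c.stack)-1]
	parent := c.stack[len(c.stack)-2].n
	next := top.childIndex + 1
	if next >= len(parent.children) {
		return false
	}
	top.n, top.childIndex = parent.children[next], next
	return true
}

// gotoParent pops the top frame.
func (c *cursor) gotoParent() bool {
	if len(c.stack) < 2 {
		return false
	}
	c.stack = c.stack[:len(c.stack)-1]
	return true
}

func main() {
	root := &node{kind: "root", children: []*node{
		{kind: "a"},
		{kind: "b", children: []*node{{kind: "c"}}},
		{kind: "d"},
	}}
	c := newCursor(root)
	for ok := c.gotoFirstChild(); ok; ok = c.gotoNextSibling() {
		fmt.Println(c.current().kind)
	}
}
```

In this sketch the stack slice grows only when the cursor descends deeper than it has before, so steady-state movement does not allocate.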

Highlighting

hl, _ := gotreesitter.NewHighlighter(lang, highlightQuery)
ranges := hl.Highlight(src)

for _, r := range ranges {
    fmt.Printf("%s: %q\n", r.Capture, src[r.StartByte:r.EndByte])
}

Tagging

entry := grammars.DetectLanguage("main.go")
lang := entry.Language()

tagger, _ := gotreesitter.NewTagger(lang, entry.TagsQuery)
tags := tagger.Tag(src)

for _, tag := range tags {
    fmt.Printf("%s %s at %d:%d\n", tag.Kind, tag.Name,
        tag.NameRange.StartPoint.Row, tag.NameRange.StartPoint.Column)
}

Benchmarks

All measurements below use the same workload: a generated Go source file with 500 functions (19294 bytes). Numbers are medians from 10 runs on:

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) Ultra 9 285

Runtime                          Full parse  Incremental (1-byte edit)  Incremental (no edit)
Native C (pure C runtime)        1.76 ms     102.3 μs                   101.7 μs
CGo binding (C runtime via cgo)  ~2.0 ms     ~130 μs                    (not reported)
gotreesitter (pure Go)           4.20 ms     1.49 μs                    2.18 ns

On this workload:

  • Full parse is ~2.4x slower than native C.
  • Incremental single-byte edits are ~69x faster than native C (~87x faster than CGo).
  • No-edit reparses are ~46,600x faster than native C, zero allocations.
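The ratios in this list follow directly from the median ns/op values in the raw benchmark output. A short sketch that reproduces the arithmetic is shown here; the ratio helper is a hypothetical name, and the CGo figure uses the approximate 130,000 ns median.

```go
package main

import "fmt"

// ratio returns how many times larger a is than b; both are median ns/op.
func ratio(a, b float64) float64 {
	return a / b
}

func main() {
	// Full parse: pure Go median divided by native C median (a slowdown).
	fmt.Printf("full parse, Go vs C:        %.1fx slower\n", ratio(4197811, 1764436))
	// Incremental single-byte edit: native C and CGo medians divided by pure Go.
	fmt.Printf("1-byte edit, C vs Go:       %.0fx faster\n", ratio(102336, 1490))
	fmt.Printf("1-byte edit, CGo vs Go:     %.0fx faster\n", ratio(130000, 1490))
	// No-edit reparse: native C median divided by pure Go median.
	fmt.Printf("no-edit reparse, C vs Go:   %.0fx faster\n", ratio(101740, 2.181))
}
```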
Raw benchmark output

# Pure Go (this repo):
GOMAXPROCS=1 go test . -run '^$' \
  -bench 'BenchmarkGoParseFullDFA|BenchmarkGoParseIncrementalSingleByteEditDFA|BenchmarkGoParseIncrementalNoEditDFA' \
  -benchmem -count=10 -benchtime=1s

# CGo binding benchmarks:
cd cgo_harness
GOMAXPROCS=1 go test . -run '^$' -tags treesitter_c_bench \
  -bench 'BenchmarkCTreeSitterGoParseFull|BenchmarkCTreeSitterGoParseIncrementalSingleByteEdit|BenchmarkCTreeSitterGoParseIncrementalNoEdit' \
  -benchmem -count=10 -benchtime=750ms

# Native C benchmarks (no Go, direct C binary):
./pure_c/run_go_benchmark.sh 500 2000 20000

Benchmark                                    Median ns/op  B/op   allocs/op
Native C full parse                          1,764,436     n/a    n/a
Native C incremental (1-byte edit)           102,336       n/a    n/a
Native C incremental (no edit)               101,740       n/a    n/a
CTreeSitterGoParseFull                       ~1,990,000    600    6
CTreeSitterGoParseIncrementalSingleByteEdit  ~130,000      648    7
GoParseFullDFA                               4,197,811     585    7
GoParseIncrementalSingleByteEditDFA          1,490         1,584  9
GoParseIncrementalNoEditDFA                  2.181         0      0

Benchmark matrix

For repeatable multi-workload tracking:

go run ./cmd/benchmatrix --count 10

Emits bench_out/matrix.json (machine-readable), bench_out/matrix.md (summary), and raw logs under bench_out/raw/.

Supported languages

206 grammars ship in the registry. All 206 produce error-free parse trees on smoke samples. Run go run ./cmd/parity_report for current status.

  • 116 external scanners (hand-written Go implementations of upstream C scanners)
  • 7 hand-written Go token sources (authzed, c, cpp, go, java, json, lua)
  • Remaining languages use the DFA lexer generated from grammar tables

Parse quality

Each LangEntry carries a Quality field:

Quality  Meaning
full     All scanner and lexer components present. Parser has full access to the grammar.
partial  Missing external scanner. DFA lexer handles what it can; external tokens are skipped.
none     Cannot parse.

full means the parser has every component the grammar requires. It does not guarantee error-free trees on all inputs — grammars with high GLR ambiguity may produce syntax errors on very large or deeply nested constructs due to parser safety limits (iteration cap, stack depth cap, node count cap). These limits scale with input size. Check tree.RootNode().HasError() at runtime.

Full language list (206)

ada, agda, angular, apex, arduino, asm, astro, authzed, awk, bash, bass, beancount, bibtex, bicep, bitbake, blade, brightscript, c, c_sharp, caddy, cairo, capnp, chatito, circom, clojure, cmake, cobol, comment, commonlisp, cooklang, corn, cpon, cpp, crystal, css, csv, cuda, cue, cylc, d, dart, desktop, devicetree, dhall, diff, disassembly, djot, dockerfile, dot, doxygen, dtd, earthfile, ebnf, editorconfig, eds, eex, elisp, elixir, elm, elsa, embedded_template, enforce, erlang, facility, faust, fennel, fidl, firrtl, fish, foam, forth, fortran, fsharp, gdscript, git_config, git_rebase, gitattributes, gitcommit, gitignore, gleam, glsl, gn, go, godot_resource, gomod, graphql, groovy, hack, hare, haskell, haxe, hcl, heex, hlsl, html, http, hurl, hyprlang, ini, janet, java, javascript, jinja2, jq, jsdoc, json, json5, jsonnet, julia, just, kconfig, kdl, kotlin, ledger, less, linkerscript, liquid, llvm, lua, luau, make, markdown, markdown_inline, matlab, mermaid, meson, mojo, move, nginx, nickel, nim, ninja, nix, norg, nushell, objc, ocaml, odin, org, pascal, pem, perl, php, pkl, powershell, prisma, prolog, promql, properties, proto, pug, puppet, purescript, python, ql, r, racket, regex, rego, requirements, rescript, robot, ron, rst, ruby, rust, scala, scheme, scss, smithy, solidity, sparql, sql, squirrel, ssh_config, starlark, svelte, swift, tablegen, tcl, teal, templ, textproto, thrift, tlaplus, tmux, todotxt, toml, tsx, turtle, twig, typescript, typst, uxntal, v, verilog, vhdl, vimdoc, vue, wat, wgsl, wolfram, xml, yaml, yuck, zig

Query API

Feature                                             Status
Compile + execute (NewQuery, Execute, ExecuteNode)  supported
Cursor streaming (Exec, NextMatch, NextCapture)     supported
Structural quantifiers (?, *, +)                    supported
Alternation ([...])                                 supported
Field matching (name: (identifier))                 supported
#eq? / #not-eq?                                     supported
#match? / #not-match?                               supported
#any-of? / #not-any-of?                             supported
#lua-match?                                         supported
#has-ancestor? / #not-has-ancestor?                 supported
#not-has-parent?                                    supported
#is? / #is-not?                                     supported
#any-eq? / #any-not-eq?                             supported
#any-match? / #any-not-match?                       supported
#select-adjacent!                                   supported
#strip!                                             supported
#set! / #offset! directives                         parsed and accepted
SetValues (read #set! metadata from matches)        supported

All shipped highlight and tags queries compile (156/156 highlight, 69/69 tags).

Known limitations

  • Full-parse throughput: ~2.4x slower than the C runtime on cold full parses (the 500-function Go benchmark). Incremental reparsing, the dominant operation in editor workloads, is ~69x faster than native C on single-byte edits in the same benchmark.
  • GLR safety caps: The parser enforces iteration, stack depth, and node count limits proportional to input size. These prevent pathological blowup on grammars with high ambiguity but impose a ceiling on the maximum input complexity that parses without error. The caps are tunable but not removable without risking unbounded resource consumption.

Adding a language

  1. Add the grammar repo to grammars/languages.manifest
  2. Refresh pinned refs in grammars/languages.lock: go run ./cmd/grammar_updater -lock grammars/languages.lock -write -report grammars/grammar_updates.json
  3. Generate tables: go run ./cmd/ts2go -manifest grammars/languages.manifest -outdir ./grammars -package grammars -compact=true
  4. Add smoke samples to cmd/parity_report/main.go and grammars/parse_support_test.go
  5. Verify: go run ./cmd/parity_report && go test ./grammars/...

Grammar lock updates

  • grammars/languages.lock stores pinned refs for grammar update + parity automation.
  • cmd/grammar_updater refreshes refs and emits a machine-readable report.
  • .github/workflows/grammar-lock-update.yml opens scheduled/dispatch update PRs.

Manual refresh:

go run ./cmd/grammar_updater \
  -lock grammars/languages.lock \
  -allow-list grammars/update_tier1_core100.txt \
  -max-updates 10 \
  -write \
  -report grammars/grammar_updates.json

Architecture

gotreesitter is a ground-up reimplementation of the tree-sitter runtime in Go. No code is shared with or translated from the C implementation.

Parser — Table-driven LR(1) with GLR fallback. When a (state, symbol) pair maps to multiple actions in the parse table, the parser forks the stack and explores all alternatives in parallel. Stack merging collapses equivalent paths. Safety limits (iteration count, stack depth, node count) scale with input size and prevent runaway exploration on ambiguous grammars.

Incremental engine — Walks the edit region of the previous tree and reuses unchanged subtrees by reference. Non-leaf subtree reuse is enabled by storing a pre-goto parser state on each interior node, allowing the parser to skip an entire subtree and resume in the correct state without re-deriving its contents. External scanner state is serialized on each node boundary so scanner-dependent subtrees can be reused without replaying the scanner from the start.

Lexer — Two paths. A DFA lexer is generated from the grammar's lex tables by ts2go and handles the majority of languages. For grammars where the DFA is insufficient (e.g., Go's automatic semicolons, YAML's indentation-sensitive structure), hand-written Go token sources implement the TokenSource interface directly.

External scanners — 116 grammars require external scanners for context-sensitive tokens (Python indentation, HTML implicit close tags, Rust raw string delimiters, Swift operator disambiguation, etc.). Each scanner is a hand-written Go implementation of the grammar's ExternalScanner interface: Create, Destroy, Serialize, Deserialize, Scan. Scanner state is snapshotted after every token and stored on tree nodes so incremental reuse can restore scanner state on skip.

Arena allocator — Nodes are allocated from slab-based arenas to reduce GC pressure. Arenas are released in bulk when a tree is freed.
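A slab arena can be sketched in a few lines. The version below is an illustration of the general technique only: the node fields, the slabSize constant, and the arena type are all assumptions made for this example and do not describe the library's internal layout.

```go
package main

import "fmt"

// node is a toy payload; the real node layout is not reproduced here.
type node struct {
	symbol    uint16
	startByte uint32
	endByte   uint32
}

const slabSize = 1024 // nodes per slab; an arbitrary value for this sketch

// arena hands out nodes from fixed-size slabs, so the garbage collector
// tracks a few large allocations instead of one object per node.
type arena struct {
	slabs [][]node
	used  int // nodes handed out from the most recent slab
}

// alloc returns a pointer into the current slab, starting a new slab when
// the current one is full. Slabs never move, so earlier pointers stay valid.
func (a *arena) alloc() *node {
	if len(a.slabs) == 0 || a.used == slabSize {
		a.slabs = append(a.slabs, make([]node, slabSize))
		a.used = 0
	}
	n := &a.slabs[len(a.slabs)-1][a.used]
	a.used++
	return n
}

// release drops every slab at once. Pointers obtained from alloc must not
// be used afterwards.
func (a *arena) release() {
	a.slabs = nil
	a.used = 0
}

func main() {
	var a arena
	for i := 0; i < 2500; i++ {
		n := a.alloc()
		n.startByte = uint32(i)
	}
	fmt.Println("slabs in use:", len(a.slabs))
	a.release()
	fmt.Println("slabs after release:", len(a.slabs))
}
```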

Query engine — S-expression pattern compiler with predicate evaluation and streaming cursor iteration. Supports all standard tree-sitter predicates (#eq?, #match?, #any-of?, #has-ancestor?, etc.) and directive annotations (#set!, #offset!, #select-adjacent!, #strip!).

Injection parser — Orchestrates multi-language parsing. Runs injection queries against a parent tree to find embedded regions, spawns child parsers with SetIncludedRanges(), and recurses for nested injections. Incremental reparse reuses unchanged child trees.

Rewriter — Collects source-level edits (replace, insert, delete) targeting byte ranges, applies them atomically, and produces InputEdit records for incremental reparse. Edits are validated for non-overlap and applied in a single pass.

Grammar loading — ts2go extracts parse tables, lex tables, field maps, symbol metadata, and external token lists from upstream parser.c files. These are serialized to compressed binary blobs under grammars/grammar_blobs/ and lazy-loaded via loadEmbeddedLanguage() with an LRU cache. String and transition interning reduce memory footprint across loaded grammars.
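The combination of lazy loading and an LRU cap can be sketched as follows. This is a generic illustration built on container/list, not the library's loader: the language, entry, and lruCache types and the load callback are hypothetical, and decoding a blob is replaced by a stub.

```go
package main

import (
	"container/list"
	"fmt"
)

// language stands in for a decoded grammar.
type language struct{ name string }

type entry struct {
	key  string
	lang *language
}

// lruCache decodes a grammar on first use and evicts the least recently
// used one when the number of cached grammars exceeds limit.
type lruCache struct {
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // key -> element holding *entry
	load  func(key string) *language
	loads int // how many times load ran, for demonstration
}

func newLRUCache(limit int, load func(string) *language) *lruCache {
	return &lruCache{limit: limit, order: list.New(), items: map[string]*list.Element{}, load: load}
}

func (c *lruCache) get(key string) *language {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el) // a hit refreshes recency
		return el.Value.(*entry).lang
	}
	lang := c.load(key) // a miss decodes the grammar lazily
	c.loads++
	c.items[key] = c.order.PushFront(&entry{key: key, lang: lang})
	if c.order.Len() > c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	return lang
}

func main() {
	cache := newLRUCache(2, func(key string) *language { return &language{name: key} })
	for _, k := range []string{"go", "json", "go", "python", "json"} {
		cache.get(k)
	}
	fmt.Println("loads:", cache.loads, "cached:", len(cache.items))
}
```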

Build tags and environment

External grammar blobs (avoid embedding in the binary):

go build -tags grammar_blobs_external
GOTREESITTER_GRAMMAR_BLOB_DIR=/path/to/blobs  # required
GOTREESITTER_GRAMMAR_BLOB_MMAP=false           # disable mmap (Unix only)

Curated language set (smaller binary):

go build -tags grammar_set_core  # curated Core100 embedded grammar set
GOTREESITTER_GRAMMAR_SET=go,json,python  # runtime restriction

Grammar cache tuning (long-lived processes):

grammars.SetEmbeddedLanguageCacheLimit(8)    // LRU cap
grammars.UnloadEmbeddedLanguage("rust.bin")  // drop one
grammars.PurgeEmbeddedLanguageCache()        // drop all
GOTREESITTER_GRAMMAR_CACHE_LIMIT=8       # LRU cap via env
GOTREESITTER_GRAMMAR_IDLE_TTL=5m         # evict after idle
GOTREESITTER_GRAMMAR_IDLE_SWEEP=30s      # sweep interval
GOTREESITTER_GRAMMAR_COMPACT=true        # loader compaction (default)
GOTREESITTER_GRAMMAR_STRING_INTERN_LIMIT=200000
GOTREESITTER_GRAMMAR_TRANSITION_INTERN_LIMIT=20000

GLR stack cap override:

GOT_GLR_MAX_STACKS=8  # overrides default GLR stack cap (default: 8)

Default is tuned for correctness. Increase only if a grammar/workload needs more GLR alternatives to preserve parity.

Legacy benchmark compatibility only:

GOT_PARSE_NODE_LIMIT_SCALE=3

GOT_PARSE_NODE_LIMIT_SCALE is only needed for comparisons against older truncation-prone benchmark baselines. On current branches, keep it unset.

Testing

go test ./... -race -count=1

Correctness/parity gate commands used in CI and performance work:

# Top-50 smoke correctness
go test ./grammars -run '^TestTop50ParseSmokeNoErrors$' -count=1 -v

# C-oracle parity suites
cd cgo_harness
go test . -tags treesitter_c_parity -run '^TestParityFreshParse$|^TestParityHasNoErrors$|^TestParityIssue3Repros$|^TestParityGLRCanaryGo$' -count=1 -v
go test . -tags treesitter_c_parity -run '^TestParityCorpusFreshParse$' -count=1 -v

Test suite covers: smoke tests (206 grammars), golden S-expression snapshots, highlight query validation, query pattern matching, incremental reparse correctness, error recovery, GLR fork/merge, injection parsing, source rewriting, and fuzz targets.

Roadmap

v0.7.x — 206 grammars (all OK), 116 external scanners, GLR parser, incremental reparsing with external scanner checkpoints, query engine, tree cursor, highlighting, tagging, ABI 15 support, injection parser, typed query codegen, CST rewriter, parser pool, arena memory budgets, and structural parity against 100+ curated C reference grammars.

Next:

  • Pure-Go grammar compiler (grammargen) — eliminate dependency on upstream parser.c files
  • TypeScript full-corpus parity
  • Python incremental parsing with fine-grained indent checkpoint validation
  • Table-based DFA C codegen for grammargen (compact output for Unicode-heavy grammars)

Release history and retroactive notes are tracked in CHANGELOG.md.

License

MIT

Documentation

Overview

Package gotreesitter implements a pure Go tree-sitter runtime.

This file defines the core data structures that mirror tree-sitter's TSLanguage C struct and related types. They form the foundation on which the lexer, parser, query engine, and syntax tree are built.

Constants

const (
	// RuntimeLanguageVersion is the maximum tree-sitter language version this
	// runtime is known to support.
	RuntimeLanguageVersion uint32 = 15
	// MinCompatibleLanguageVersion is the minimum accepted language version.
	MinCompatibleLanguageVersion uint32 = 13
)

Variables

var DebugDFA atomic.Bool

DebugDFA enables trace logging for DFA token production.

Use `DebugDFA.Store(true/false)` to toggle at runtime.

var ErrNoLanguage = errors.New("parser has no language configured")

ErrNoLanguage is returned when a Parser has no language configured.

Functions

func EnableArenaProfile added in v0.6.0

func EnableArenaProfile(enabled bool)

EnableArenaProfile toggles arena pool counters. This debug hook is not concurrency-safe and is intended for single-threaded benchmark/profiling runs.

func EnableRuntimeAudit added in v0.7.0

func EnableRuntimeAudit(enabled bool)

EnableRuntimeAudit toggles per-parse survivor instrumentation. This debug hook is intended for single-threaded benchmark/profiling runs.

func RegisterHighlighterInjection added in v0.7.0

func RegisterHighlighterInjection(parentLanguage string, spec HighlighterInjectionSpec)

RegisterHighlighterInjection registers nested-highlighting configuration for a parent language name (for example "markdown").

func ResetArenaProfile added in v0.6.0

func ResetArenaProfile()

ResetArenaProfile resets arena pool counters. This debug hook is not concurrency-safe and is intended for single-threaded benchmark/profiling runs.

func ResetParseEnvConfigCacheForTests added in v0.7.0

func ResetParseEnvConfigCacheForTests()

ResetParseEnvConfigCacheForTests clears memoized parser env config.

Tests in this repo mutate env vars between cases; this helper ensures subsequent parses observe the new values in the same process.

func ResetPerfCounters added in v0.6.0

func ResetPerfCounters()

func RunExternalScanner

func RunExternalScanner(lang *Language, payload any, lexer *ExternalLexer, validSymbols []bool) bool

RunExternalScanner invokes the language's external scanner if present. Returns true if the scanner produced a token, false otherwise.

func Walk

func Walk(node *Node, fn func(node *Node, depth int) WalkAction)

Walk performs a depth-first traversal of the syntax tree rooted at node. The callback receives each node and its depth (0 for the starting node). Return WalkSkipChildren to skip a node's children, or WalkStop to end early.
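The traversal contract can be illustrated on a toy tree. The sketch below is not the library's Walk: the node type is a stand-in, and walkContinue is a hypothetical name for the "keep going" action, since only WalkSkipChildren and WalkStop are named above.

```go
package main

import "fmt"

type walkAction int

const (
	walkContinue     walkAction = iota // hypothetical name: descend as usual
	walkSkipChildren                   // do not visit this node's children
	walkStop                           // end the whole traversal
)

// node is a toy tree node for this sketch.
type node struct {
	kind     string
	children []*node
}

// walk visits nodes depth-first in pre-order. It returns false once fn has
// asked to stop, so every enclosing call unwinds without visiting more nodes.
func walk(n *node, depth int, fn func(*node, int) walkAction) bool {
	switch fn(n, depth) {
	case walkStop:
		return false
	case walkSkipChildren:
		return true
	}
	for _, c := range n.children {
		if !walk(c, depth+1, fn) {
			return false
		}
	}
	return true
}

func main() {
	root := &node{kind: "root", children: []*node{
		{kind: "a", children: []*node{{kind: "a1"}, {kind: "a2"}}},
		{kind: "b", children: []*node{{kind: "b1"}}},
		{kind: "c"},
	}}
	walk(root, 0, func(n *node, depth int) walkAction {
		fmt.Printf("%*s%s\n", depth*2, "", n.kind)
		switch n.kind {
		case "a":
			return walkSkipChildren
		case "b1":
			return walkStop
		}
		return walkContinue
	})
}
```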

Types

type ArenaProfile added in v0.6.0

type ArenaProfile struct {
	IncrementalAcquire uint64
	IncrementalNew     uint64
	FullAcquire        uint64
	FullNew            uint64
}

ArenaProfile captures node arena allocation statistics. Enable with EnableArenaProfile(true) and retrieve with ArenaProfileSnapshot().

func ArenaProfileSnapshot added in v0.6.0

func ArenaProfileSnapshot() ArenaProfile

ArenaProfileSnapshot returns current arena pool counters. This debug hook is not concurrency-safe and is intended for single-threaded benchmark/profiling runs.

type BoundTree

type BoundTree struct {
	// contains filtered or unexported fields
}

BoundTree pairs a Tree with its Language and source, eliminating the need to pass *Language and []byte to every node method call.

func Bind

func Bind(tree *Tree) *BoundTree

Bind creates a BoundTree from a Tree. The Tree must have been created with a Language (via NewTree or a Parser). Returns a BoundTree that delegates to the underlying Tree's Language and Source.

func (*BoundTree) ChildByField

func (bt *BoundTree) ChildByField(n *Node, fieldName string) *Node

ChildByField returns the first child assigned to the given field name.

func (*BoundTree) Language

func (bt *BoundTree) Language() *Language

Language returns the tree's language.

func (*BoundTree) NodeText

func (bt *BoundTree) NodeText(n *Node) string

NodeText returns the source text covered by the node.

func (*BoundTree) NodeType

func (bt *BoundTree) NodeType(n *Node) string

NodeType returns the node's type name, resolved via the bound language.

func (*BoundTree) Release

func (bt *BoundTree) Release()

Release releases the underlying tree's arena memory.

func (*BoundTree) RootNode

func (bt *BoundTree) RootNode() *Node

RootNode returns the tree's root node.

func (*BoundTree) Source

func (bt *BoundTree) Source() []byte

Source returns the tree's source bytes.

func (*BoundTree) TreeCursor added in v0.6.0

func (bt *BoundTree) TreeCursor() *TreeCursor

TreeCursor returns a new TreeCursor starting at the tree's root node.

type ByteSkippableTokenSource

type ByteSkippableTokenSource interface {
	TokenSource
	SkipToByte(offset uint32) Token
}

ByteSkippableTokenSource can jump to a byte offset and return the first token at or after that position.

type ExternalLexer

type ExternalLexer struct {
	// contains filtered or unexported fields
}

ExternalLexer is the scanner-facing lexer API used by external scanners. It mirrors the essential tree-sitter scanner API: lookahead, advance, mark_end, and result_symbol.

func (*ExternalLexer) Advance

func (l *ExternalLexer) Advance(skip bool)

Advance consumes one rune. When skip is true, consumed bytes are excluded from the token span (scanner whitespace skipping behavior).

func (*ExternalLexer) Column added in v0.6.0

func (l *ExternalLexer) Column() uint32

Column returns the current column (0-based) at the scanner cursor.

func (*ExternalLexer) GetColumn deprecated

func (l *ExternalLexer) GetColumn() uint32

GetColumn returns the current column (0-based) at the scanner cursor.

Deprecated: use Column.

func (*ExternalLexer) Lookahead

func (l *ExternalLexer) Lookahead() rune

Lookahead returns the current rune or 0 at EOF.

func (*ExternalLexer) MarkEnd

func (l *ExternalLexer) MarkEnd()

MarkEnd marks the current scanner position as the token end.

func (*ExternalLexer) SetResultSymbol

func (l *ExternalLexer) SetResultSymbol(sym Symbol)

SetResultSymbol sets the token symbol to emit when Scan returns true.

type ExternalScanner

type ExternalScanner interface {
	Create() any
	Destroy(payload any)
	Serialize(payload any, buf []byte) int
	Deserialize(payload any, buf []byte)
	Scan(payload any, lexer *ExternalLexer, validSymbols []bool) bool
}

ExternalScanner is the interface for language-specific external scanners. Languages like Python and JavaScript need these for indent tracking, template literals, regex vs division, etc.

The value returned by Create must be accepted by Destroy/Serialize/Deserialize/Scan for that scanner implementation. Most scanners use a concrete payload pointer type and will panic on mismatched payload types.
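The payload contract is easiest to see in a round trip. The sketch below implements only Create, Serialize, and Deserialize with the method shapes listed in the interface; Destroy and Scan are omitted because they need the library's types. The indentState payload and indentScanner type are hypothetical, and returning 0 when the buffer is too small is a policy chosen for this example.

```go
package main

import "fmt"

// indentState is a hypothetical payload: a stack of indentation widths.
type indentState struct{ indents []uint8 }

// indentScanner implements three of the ExternalScanner methods for this sketch.
type indentScanner struct{}

// Create returns the concrete payload pointer the other methods expect.
func (indentScanner) Create() any { return &indentState{} }

// Serialize copies the stack into buf and returns the number of bytes
// written, or 0 if buf cannot hold the whole stack (assumed policy).
func (indentScanner) Serialize(payload any, buf []byte) int {
	st := payload.(*indentState) // panics on a mismatched payload type
	if len(st.indents) > len(buf) {
		return 0
	}
	return copy(buf, st.indents)
}

// Deserialize replaces the stack with the contents of buf.
func (indentScanner) Deserialize(payload any, buf []byte) {
	st := payload.(*indentState)
	st.indents = append(st.indents[:0], buf...)
}

func main() {
	var s indentScanner

	src := s.Create()
	src.(*indentState).indents = []uint8{0, 4, 8}

	buf := make([]byte, 16)
	n := s.Serialize(src, buf)

	dst := s.Create()
	s.Deserialize(dst, buf[:n])
	fmt.Println(n, dst.(*indentState).indents)
}
```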

type ExternalScannerState

type ExternalScannerState struct {
	Data []byte
}

ExternalScannerState holds serialized state for an external scanner between incremental parse runs.

type ExternalVMInstr

type ExternalVMInstr struct {
	Op  ExternalVMOp
	A   int32
	B   int32
	Alt int32
}

ExternalVMInstr is one instruction in an external scanner VM program.

Operands:

  • A: primary operand (opcode-specific)
  • B: secondary operand (used by range checks)
  • Alt: alternate program counter when a condition fails

func VMAdvance

func VMAdvance(skip bool) ExternalVMInstr

VMAdvance constructs an advance instruction. When skip is true, the advanced rune is skipped from the token text.

func VMEmit

func VMEmit(sym Symbol) ExternalVMInstr

VMEmit constructs an emit instruction for the given symbol.

func VMFail

func VMFail() ExternalVMInstr

VMFail constructs a fail instruction that terminates scan with no token.

func VMIfRuneClass

func VMIfRuneClass(class ExternalVMRuneClass, alt int) ExternalVMInstr

VMIfRuneClass constructs a rune-class branch with alternate target on miss.

func VMIfRuneEq

func VMIfRuneEq(r rune, alt int) ExternalVMInstr

VMIfRuneEq constructs a rune-equality branch with alternate target on miss.

func VMIfRuneInRange

func VMIfRuneInRange(start, end rune, alt int) ExternalVMInstr

VMIfRuneInRange constructs a rune-range branch with alternate target on miss.

func VMJump

func VMJump(target int) ExternalVMInstr

VMJump constructs an unconditional branch to the target instruction index.

func VMMarkEnd

func VMMarkEnd() ExternalVMInstr

VMMarkEnd constructs a mark-end instruction for the current token extent.

func VMRequireStateEq

func VMRequireStateEq(state uint32, alt int) ExternalVMInstr

VMRequireStateEq constructs a payload-state guard with alternate branch on miss.

func VMRequireValid

func VMRequireValid(validSymbolIndex, alt int) ExternalVMInstr

VMRequireValid constructs a valid-symbol guard with alternate branch on miss.

func VMSetState

func VMSetState(state uint32) ExternalVMInstr

VMSetState constructs a payload-state assignment instruction.

type ExternalVMOp

type ExternalVMOp uint8

ExternalVMOp is an opcode for the native-Go external scanner VM.

const (
	ExternalVMOpFail ExternalVMOp = iota
	ExternalVMOpJump
	ExternalVMOpRequireValid
	ExternalVMOpRequireStateEq
	ExternalVMOpSetState
	ExternalVMOpIfRuneEq
	ExternalVMOpIfRuneInRange
	ExternalVMOpIfRuneClass
	ExternalVMOpAdvance
	ExternalVMOpMarkEnd
	ExternalVMOpEmit
)

type ExternalVMProgram

type ExternalVMProgram struct {
	Code     []ExternalVMInstr
	MaxSteps int // <=0 uses a safe default based on program size
}

ExternalVMProgram is a small bytecode program interpreted by ExternalVMScanner.
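To show how a program of this shape can drive a scan, here is a small self-contained interpreter over a subset of the opcodes. It is a sketch of plausible semantics, not the library's ExternalVMScanner: the opcode, instr, and run names are local to this example, and it assumes that a conditional instruction falls through to the next instruction on success and jumps to alt on failure, as the operand description suggests.

```go
package main

import "fmt"

type opcode uint8

const (
	opFail opcode = iota
	opJump
	opIfRuneInRange
	opAdvance
	opMarkEnd
	opEmit
)

// instr follows the operand layout described above: a and b are operands,
// alt is the program counter taken when a condition fails.
type instr struct {
	op        opcode
	a, b, alt int32
}

// digits recognizes a run of one or more ASCII digits and emits symbol 7.
var digits = []instr{
	{op: opIfRuneInRange, a: '0', b: '9', alt: 6}, // 0: no digit at all: fail
	{op: opAdvance},                               // 1: consume the digit
	{op: opMarkEnd},                               // 2: token ends here so far
	{op: opIfRuneInRange, a: '0', b: '9', alt: 5}, // 3: no more digits: emit
	{op: opJump, a: 1},                            // 4: loop
	{op: opEmit, a: 7},                            // 5: success
	{op: opFail},                                  // 6: no token
}

// run interprets code against input. ok is false on opFail, on running off
// the end of the program, or when maxSteps instructions have executed.
func run(code []instr, input []rune, maxSteps int) (sym int32, end int, ok bool) {
	pc, pos := 0, 0
	peek := func() rune {
		if pos < len(input) {
			return input[pos]
		}
		return 0 // EOF, like Lookahead
	}
	for steps := 0; steps < maxSteps && pc >= 0 && pc < len(code); steps++ {
		in := code[pc]
		pc++
		switch in.op {
		case opFail:
			return 0, 0, false
		case opJump:
			pc = int(in.a)
		case opIfRuneInRange:
			if r := peek(); r < rune(in.a) || r > rune(in.b) {
				pc = int(in.alt)
			}
		case opAdvance:
			if pos < len(input) {
				pos++
			}
		case opMarkEnd:
			end = pos
		case opEmit:
			return in.a, end, true
		}
	}
	return 0, 0, false
}

func main() {
	sym, end, ok := run(digits, []rune("42x"), 64)
	fmt.Println(sym, end, ok)
}
```

The step bound plays the role that MaxSteps plays in ExternalVMProgram: a program that loops forever is cut off and treated as producing no token.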

type ExternalVMRuneClass

type ExternalVMRuneClass uint8

ExternalVMRuneClass is a character class used by ExternalVMOpIfRuneClass.

const (
	ExternalVMRuneClassWhitespace ExternalVMRuneClass = iota
	ExternalVMRuneClassDigit
	ExternalVMRuneClassLetter
	ExternalVMRuneClassWord
	ExternalVMRuneClassNewline
)

type ExternalVMScanner

type ExternalVMScanner struct {
	// contains filtered or unexported fields
}

ExternalVMScanner executes an ExternalVMProgram and implements ExternalScanner.

func MustNewExternalVMScanner

func MustNewExternalVMScanner(program ExternalVMProgram) *ExternalVMScanner

MustNewExternalVMScanner is like NewExternalVMScanner but panics on error. It is intended for package-level initialization where invalid programs are programmer errors.

func NewExternalVMScanner

func NewExternalVMScanner(program ExternalVMProgram) (*ExternalVMScanner, error)

NewExternalVMScanner validates and constructs an ExternalVMScanner.

func (*ExternalVMScanner) Create

func (s *ExternalVMScanner) Create() any

Create allocates scanner payload (currently a single uint32 state slot).

func (*ExternalVMScanner) Deserialize

func (s *ExternalVMScanner) Deserialize(payload any, buf []byte)

Deserialize restores payload state from buf.

func (*ExternalVMScanner) Destroy

func (s *ExternalVMScanner) Destroy(payload any)

Destroy releases scanner payload resources.

func (*ExternalVMScanner) Scan

func (s *ExternalVMScanner) Scan(payload any, lexer *ExternalLexer, validSymbols []bool) bool

Scan executes the scanner program against the current lexer position.

func (*ExternalVMScanner) Serialize

func (s *ExternalVMScanner) Serialize(payload any, buf []byte) int

Serialize writes payload state into buf.

type FieldID

type FieldID uint16

FieldID is a named field index.

type FieldMapEntry

type FieldMapEntry struct {
	FieldID    FieldID
	ChildIndex uint8
	Inherited  bool
}

FieldMapEntry maps a child index to a field name.

type HighlightRange

type HighlightRange struct {
	StartByte    uint32
	EndByte      uint32
	Capture      string // "keyword", "string", "function", etc.
	PatternIndex int    // query pattern index; later patterns override earlier for identical ranges
}

HighlightRange represents a styled range of source code, mapping a byte span to a capture name from a highlight query. The editor maps capture names (e.g., "keyword", "string", "function") to FSS style classes.
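The PatternIndex rule for identical ranges can be sketched as a small post-processing step. This is an illustration under stated assumptions rather than the Highlighter's own logic: highlightRange is a local copy of the struct, resolveIdentical is a hypothetical helper, and ordering outer ranges before inner ones on equal StartByte is a choice made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// highlightRange is a local copy of the HighlightRange fields.
type highlightRange struct {
	StartByte, EndByte uint32
	Capture            string
	PatternIndex       int
}

// resolveIdentical keeps, for each identical (start, end) span, the range
// produced by the latest pattern, and returns the result sorted by StartByte.
func resolveIdentical(ranges []highlightRange) []highlightRange {
	type span struct{ start, end uint32 }
	best := make(map[span]highlightRange)
	for _, r := range ranges {
		k := span{r.StartByte, r.EndByte}
		if cur, ok := best[k]; !ok || r.PatternIndex > cur.PatternIndex {
			best[k] = r
		}
	}
	out := make([]highlightRange, 0, len(best))
	for _, r := range best {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].StartByte != out[j].StartByte {
			return out[i].StartByte < out[j].StartByte
		}
		return out[i].EndByte > out[j].EndByte // outer range first (assumption)
	})
	return out
}

func main() {
	ranges := []highlightRange{
		{0, 4, "keyword", 1},
		{0, 4, "function", 3}, // same span, later pattern: wins
		{5, 9, "string", 0},
		{0, 20, "block", 2},
	}
	for _, r := range resolveIdentical(ranges) {
		fmt.Printf("%d-%d %s\n", r.StartByte, r.EndByte, r.Capture)
	}
}
```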

type Highlighter

type Highlighter struct {
	// contains filtered or unexported fields
}

Highlighter is a high-level API that takes source code and returns styled ranges. It combines a Parser, a compiled Query, and a Language to provide a single Highlight() call for the editor.

func NewHighlighter

func NewHighlighter(lang *Language, highlightQuery string, opts ...HighlighterOption) (*Highlighter, error)

NewHighlighter creates a Highlighter for the given language and highlight query (in tree-sitter .scm format). Returns an error if the query fails to compile.

func (*Highlighter) Highlight

func (h *Highlighter) Highlight(source []byte) []HighlightRange

Highlight parses the source code and executes the highlight query, returning a slice of HighlightRange sorted by StartByte. When ranges overlap, inner (more specific) captures take priority over outer ones.

func (*Highlighter) HighlightIncremental

func (h *Highlighter) HighlightIncremental(source []byte, oldTree *Tree) ([]HighlightRange, *Tree)

HighlightIncremental re-highlights source after edits were applied to oldTree. Returns the new highlight ranges and the new parse tree (for use in subsequent incremental calls). Call oldTree.Edit() before calling this.

type HighlighterInjectionResolver added in v0.7.0

type HighlighterInjectionResolver func(languageHint string) (lang *Language, highlightQuery string, tokenSourceFactory func(source []byte) TokenSource, ok bool)

HighlighterInjectionResolver maps a language hint (for example "go" from a markdown code fence) to a child language and highlight query.

type HighlighterInjectionSpec added in v0.7.0

type HighlighterInjectionSpec struct {
	Query           string
	ResolveLanguage HighlighterInjectionResolver
}

HighlighterInjectionSpec configures nested highlighting for a parent language. Query must emit @injection.content and either @injection.language or #set! injection.language metadata.

type HighlighterOption

type HighlighterOption func(*Highlighter)

HighlighterOption configures a Highlighter.

func WithTokenSourceFactory

func WithTokenSourceFactory(factory func(source []byte) TokenSource) HighlighterOption

WithTokenSourceFactory sets a factory function that creates a TokenSource for each Highlight call. This is needed for languages that use a custom lexer bridge (like Go, which uses go/scanner instead of a DFA lexer).

When set, Highlight() calls ParseWithTokenSource instead of Parse.

type IncrementalParseProfile added in v0.6.0

type IncrementalParseProfile struct {
	ReuseCursorNanos                   int64
	ReparseNanos                       int64
	ReusedSubtrees                     uint64
	ReusedBytes                        uint64
	NewNodesAllocated                  uint64
	ReuseUnsupported                   bool
	ReuseUnsupportedReason             string
	ReuseRejectDirty                   uint64
	ReuseRejectAncestorDirtyBeforeEdit uint64
	ReuseRejectHasError                uint64
	ReuseRejectInvalidSpan             uint64
	ReuseRejectOutOfBounds             uint64
	ReuseRejectRootNonLeafChanged      uint64
	ReuseRejectLargeNonLeaf            uint64
	RecoverSearches                    uint64
	RecoverStateChecks                 uint64
	RecoverStateSkips                  uint64
	RecoverSymbolSkips                 uint64
	RecoverLookups                     uint64
	RecoverHits                        uint64
	MaxStacksSeen                      int
	EntryScratchPeak                   uint64
	StopReason                         ParseStopReason
	TokensConsumed                     uint64
	LastTokenEndByte                   uint32
	ExpectedEOFByte                    uint32
	ArenaBytesAllocated                int64
	ScratchBytesAllocated              int64
	EntryScratchBytesAllocated         int64
	GSSBytesAllocated                  int64
	SingleStackIterations              int
	MultiStackIterations               int
	SingleStackTokens                  uint64
	MultiStackTokens                   uint64
	SingleStackGSSNodes                uint64
	MultiStackGSSNodes                 uint64
	GSSNodesAllocated                  uint64
	GSSNodesRetained                   uint64
	GSSNodesDroppedSameToken           uint64
	ParentNodesAllocated               uint64
	ParentNodesRetained                uint64
	ParentNodesDroppedSameToken        uint64
	LeafNodesAllocated                 uint64
	LeafNodesRetained                  uint64
	LeafNodesDroppedSameToken          uint64
	MergeStacksIn                      uint64
	MergeStacksOut                     uint64
	MergeSlotsUsed                     uint64
	GlobalCullStacksIn                 uint64
	GlobalCullStacksOut                uint64
}

IncrementalParseProfile breaks incremental parse time down into coarse buckets.

ReuseCursorNanos covers reuse-cursor setup and subtree-candidate checks. ReparseNanos covers the remaining incremental parse and rebuild work.

type IncrementalReuseExternalScanner added in v0.7.0

type IncrementalReuseExternalScanner interface {
	ExternalScanner
	SupportsIncrementalReuse() bool
}

IncrementalReuseExternalScanner is implemented by external scanners that can safely participate in DFA subtree reuse during incremental parses. Scanners with serialized mutable state, such as Python's indentation stack, should leave this unimplemented so edited incremental parses fall back to the conservative full-reparse path.
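The opt-in works through an ordinary Go type assertion: a scanner that does not implement the extra method is treated as unsafe for reuse. The sketch below illustrates that pattern with local stand-in types. The ExternalScanner interface is trimmed to a single method, and canReuse is a hypothetical helper; how the runtime performs the check internally is an assumption.

```go
package main

import "fmt"

// ExternalScanner is a trimmed local stand-in; the real interface has more
// methods (Scan, Serialize, Deserialize, Destroy).
type ExternalScanner interface {
	Create() any
}

// IncrementalReuseExternalScanner mirrors the documented opt-in interface.
type IncrementalReuseExternalScanner interface {
	ExternalScanner
	SupportsIncrementalReuse() bool
}

// statelessScanner opts in: it keeps no mutable state between tokens.
type statelessScanner struct{}

func (statelessScanner) Create() any                    { return nil }
func (statelessScanner) SupportsIncrementalReuse() bool { return true }

// indentScanner leaves the marker unimplemented, like a scanner that
// serializes an indentation stack.
type indentScanner struct{}

func (indentScanner) Create() any { return new([]int) }

// canReuse reports whether s has opted in to subtree reuse.
// Hypothetical helper, not part of gotreesitter.
func canReuse(s ExternalScanner) bool {
	r, ok := s.(IncrementalReuseExternalScanner)
	return ok && r.SupportsIncrementalReuse()
}

func main() {
	fmt.Println(canReuse(statelessScanner{}))
	fmt.Println(canReuse(indentScanner{}))
}
```

Returning a bool from the marker method, rather than relying on the method's presence alone, lets a wrapper scanner implement the interface and still decline at run time.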

type IncrementalReuseTokenSource added in v0.7.0

type IncrementalReuseTokenSource interface {
	TokenSource
	SupportsIncrementalReuse() bool
}

IncrementalReuseTokenSource is an opt-in marker for custom token sources that are safe for incremental subtree reuse. Implementations must provide stable token boundaries across edits and support deterministic SkipToByte* behavior so reused-tree fast-forwarding remains correct.

type Injection added in v0.6.0

type Injection struct {
	// Language is the detected language name (e.g., "javascript").
	Language string
	// Tree is the parse tree for this region, or nil if the language
	// was not registered.
	Tree *Tree
	// Ranges are the source ranges this tree covers.
	Ranges []Range
	// Node is the parent tree node that triggered the injection.
	Node *Node
}

Injection is a single embedded language region.

type InjectionParser added in v0.6.0

type InjectionParser struct {
	// contains filtered or unexported fields
}

InjectionParser parses documents with embedded languages.

InjectionParser is not safe for concurrent use. It caches child parsers and mutates shared maps during parse operations.

func NewInjectionParser added in v0.6.0

func NewInjectionParser() *InjectionParser

NewInjectionParser creates an InjectionParser.

func (*InjectionParser) Parse added in v0.6.0

func (ip *InjectionParser) Parse(source []byte, parentLang string) (*InjectionResult, error)

Parse parses source as parentLang, then recursively parses injected regions.

func (*InjectionParser) ParseIncremental added in v0.6.0

func (ip *InjectionParser) ParseIncremental(source []byte, parentLang string,
	oldResult *InjectionResult) (*InjectionResult, error)

ParseIncremental re-parses after edits, reusing unchanged child trees.

func (*InjectionParser) RegisterInjectionQuery added in v0.6.0

func (ip *InjectionParser) RegisterInjectionQuery(parentLang string, query string) error

RegisterInjectionQuery sets the injection query for a parent language. The query should use @injection.content and #set! injection.language conventions. It is compiled against the registered parent language.

func (*InjectionParser) RegisterLanguage added in v0.6.0

func (ip *InjectionParser) RegisterLanguage(name string, lang *Language)

RegisterLanguage adds a language that can be used as parent or child.

func (*InjectionParser) SetMaxDepth added in v0.6.0

func (ip *InjectionParser) SetMaxDepth(depth int)

SetMaxDepth overrides the nested injection recursion limit. Depth values <= 0 restore the default limit.

type InjectionResult added in v0.6.0

type InjectionResult struct {
	// Tree is the parent language's parse tree.
	Tree *Tree
	// Injections contains child language parse results, ordered by position.
	Injections []Injection
}

InjectionResult holds parse results for a multi-language document.

type InputEdit

type InputEdit struct {
	StartByte   uint32
	OldEndByte  uint32
	NewEndByte  uint32
	StartPoint  Point
	OldEndPoint Point
	NewEndPoint Point
}

InputEdit describes a single edit to the source text. It tells the parser what byte range was replaced and what the new range looks like, so the incremental parser can skip unchanged subtrees.
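As an illustration, the sketch below builds an InputEdit for a simple "replace bytes start..oldEnd with new text" operation. InputEdit is redeclared locally with the documented fields. The Point layout (zero-based Row and byte Column) is an assumption, since Point is not defined in this part of the page, and pointAt and replaceEdit are hypothetical helpers.

```go
package main

import (
	"bytes"
	"fmt"
)

// Point is a local stand-in; the Row/Column field names are an assumption.
type Point struct{ Row, Column uint32 }

// InputEdit mirrors the documented struct.
type InputEdit struct {
	StartByte   uint32
	OldEndByte  uint32
	NewEndByte  uint32
	StartPoint  Point
	OldEndPoint Point
	NewEndPoint Point
}

// pointAt returns the zero-based row and byte column of offset in src.
func pointAt(src []byte, offset uint32) Point {
	prefix := src[:offset]
	row := uint32(bytes.Count(prefix, []byte{'\n'}))
	lineStart := bytes.LastIndexByte(prefix, '\n') + 1 // 0 when no newline precedes offset
	return Point{Row: row, Column: offset - uint32(lineStart)}
}

// replaceEdit replaces oldSrc[start:oldEnd] with replacement and returns the
// edited source together with the matching edit record.
func replaceEdit(oldSrc []byte, start, oldEnd uint32, replacement []byte) ([]byte, InputEdit) {
	newSrc := make([]byte, 0, len(oldSrc)-int(oldEnd-start)+len(replacement))
	newSrc = append(newSrc, oldSrc[:start]...)
	newSrc = append(newSrc, replacement...)
	newSrc = append(newSrc, oldSrc[oldEnd:]...)

	newEnd := start + uint32(len(replacement))
	return newSrc, InputEdit{
		StartByte:   start,
		OldEndByte:  oldEnd,
		NewEndByte:  newEnd,
		StartPoint:  pointAt(oldSrc, start),
		OldEndPoint: pointAt(oldSrc, oldEnd),
		NewEndPoint: pointAt(newSrc, newEnd),
	}
}

func main() {
	old := []byte("package main\n\nfunc main() {}\n")
	// Rename "main" in "func main" (bytes 19..23) to "start".
	newSrc, edit := replaceEdit(old, 19, 23, []byte("start"))
	fmt.Printf("%q\n%+v\n", newSrc, edit)
}
```

StartPoint and OldEndPoint are measured against the old text, while NewEndPoint is measured against the new text.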

type Language

type Language struct {
	Name string

	// LanguageVersion is the tree-sitter language ABI version.
	// A value of 0 means "unknown/unspecified" and is treated as compatible.
	LanguageVersion uint32

	// Counts
	SymbolCount        uint32
	TokenCount         uint32
	ExternalTokenCount uint32
	StateCount         uint32
	LargeStateCount    uint32
	FieldCount         uint32
	ProductionIDCount  uint32

	// Symbol metadata
	SymbolNames    []string
	SymbolMetadata []SymbolMetadata
	FieldNames     []string // index 0 is ""

	// Parse tables
	ParseTable         [][]uint16 // dense: [state][symbol] -> action index
	SmallParseTable    []uint16   // compressed sparse table
	SmallParseTableMap []uint32   // state -> offset into SmallParseTable
	ParseActions       []ParseActionEntry

	// Lex tables
	LexModes            []LexMode
	LexStates           []LexState // main lexer DFA
	KeywordLexStates    []LexState // keyword lexer DFA (optional)
	KeywordCaptureToken Symbol

	// Field mapping
	FieldMapSlices  [][2]uint16 // [production_id] -> (index, length)
	FieldMapEntries []FieldMapEntry

	// Alias sequences
	AliasSequences [][]Symbol // [production_id][child_index] -> alias symbol

	// Primary state IDs (for table dedup)
	PrimaryStateIDs []StateID

	// ABI 15: Reserved words — flat array indexed by
	// (reserved_word_set_id * MaxReservedWordSetSize + i), terminated by 0.
	ReservedWords          []Symbol
	MaxReservedWordSetSize uint16

	// ABI 15: Supertype hierarchy
	SupertypeSymbols    []Symbol
	SupertypeMapSlices  [][2]uint16 // [supertype_symbol] -> (index, length)
	SupertypeMapEntries []Symbol

	// ABI 15: Grammar semantic version
	Metadata LanguageMetadata

	// External scanner (nil if not needed)
	ExternalScanner ExternalScanner
	ExternalSymbols []Symbol // external token index -> symbol

	// ExternalLexStates maps external lex state IDs (from LexMode.ExternalLexState)
	// to a boolean slice indicating which external tokens are valid. Row 0 is
	// always all-false (no external tokens valid). When non-nil, this table is
	// used instead of parse-action-table probing to compute validSymbols for the
	// external scanner, matching C tree-sitter's ts_external_scanner_states.
	ExternalLexStates [][]bool

	// InitialState is the parser's start state. In tree-sitter grammars
	// this is always 1 (state 0 is reserved for error recovery). For
	// hand-built grammars it defaults to 0.
	InitialState StateID
	// contains filtered or unexported fields
}

Language holds all data needed to parse a specific language. It mirrors tree-sitter's TSLanguage C struct, translated into idiomatic Go types with slice-based tables instead of raw pointers.

func (*Language) CompatibleWithRuntime

func (l *Language) CompatibleWithRuntime() bool

CompatibleWithRuntime reports whether this language can be parsed by the current runtime version. Unspecified versions (0) are treated as compatible.

func (*Language) FieldByName

func (l *Language) FieldByName(name string) (FieldID, bool)

FieldByName returns the field ID for a given name, or (0, false) if not found. Builds an internal map on first call for O(1) subsequent lookups.

func (*Language) IsSupertype added in v0.6.0

func (l *Language) IsSupertype(sym Symbol) bool

IsSupertype reports whether sym is a supertype symbol.

func (*Language) PublicSymbol added in v0.7.0

func (l *Language) PublicSymbol(sym Symbol) Symbol

PublicSymbol maps an internal symbol to its canonical public form. Multiple internal symbols may share the same visible name (e.g. HTML's _start_tag_name and _end_tag_name both display as "tag_name"). PublicSymbol returns the first symbol with that name, matching what SymbolByName returns. This ensures query patterns compiled with SymbolByName match nodes regardless of which alias produced them.

func (*Language) SupertypeChildren added in v0.6.0

func (l *Language) SupertypeChildren(sym Symbol) []Symbol

SupertypeChildren returns the subtype symbols for a given supertype. Returns nil if sym is not a supertype or has no entries.

func (*Language) SymbolByName

func (l *Language) SymbolByName(name string) (Symbol, bool)

SymbolByName returns the symbol ID for a given name, or (0, false) if not found. The "_" wildcard returns (0, true) as a special case. Builds an internal map on first call for O(1) subsequent lookups.
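One way to get the documented behavior (map built on first call, "_" handled as a special case, first symbol wins when names repeat) is a sync.Once-guarded index. This is only a sketch of the idea with a local symbolIndex type; the package's actual internals may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// symbolIndex is a hypothetical stand-in for the name lookup inside Language.
type symbolIndex struct {
	names  []string // position i holds the display name of symbol i
	once   sync.Once
	byName map[string]uint16
}

// lookup returns the first symbol ID with the given name.
// The "_" wildcard returns (0, true), as documented for SymbolByName.
func (s *symbolIndex) lookup(name string) (uint16, bool) {
	if name == "_" {
		return 0, true
	}
	s.once.Do(func() {
		s.byName = make(map[string]uint16, len(s.names))
		for i, n := range s.names {
			if _, seen := s.byName[n]; !seen {
				s.byName[n] = uint16(i) // keep the first symbol for a shared name
			}
		}
	})
	id, ok := s.byName[name]
	return id, ok
}

func main() {
	idx := &symbolIndex{names: []string{"end", "identifier", "tag_name", "tag_name"}}
	fmt.Println(idx.lookup("tag_name"))
	fmt.Println(idx.lookup("_"))
	fmt.Println(idx.lookup("missing"))
}
```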

func (*Language) TokenSymbolsByName

func (l *Language) TokenSymbolsByName(name string) []Symbol

TokenSymbolsByName returns all terminal token symbols whose display name matches name. The returned symbols are in grammar order.

func (*Language) Version

func (l *Language) Version() uint32

Version returns the tree-sitter language ABI version.

type LanguageMetadata added in v0.6.0

type LanguageMetadata struct {
	MajorVersion uint8
	MinorVersion uint8
	PatchVersion uint8
}

LanguageMetadata holds the grammar's semantic version (ABI 15+).

type LexMode

type LexMode struct {
	LexState          uint16
	ExternalLexState  uint16
	ReservedWordSetID uint16
}

LexMode maps a parser state to its lexer configuration.

type LexState

type LexState struct {
	AcceptToken Symbol // 0 if this state doesn't accept
	Skip        bool   // true if accepted chars are whitespace
	Default     int32  // default next state (-1 if none)
	EOF         int32  // state on EOF (-1 if none)
	Transitions []LexTransition
}

LexState is one state in the table-driven lexer DFA.

type LexTransition

type LexTransition struct {
	Lo, Hi    rune // inclusive character range
	NextState int32
	// Skip mirrors tree-sitter's SKIP(state): consume the matched rune
	// and continue lexing while resetting token start.
	Skip bool
}

LexTransition maps a character range to a next state.
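The following simplified sketch shows how LexState and LexTransition tables can drive a longest-match scan. The types are redeclared locally (Symbol is assumed to be a small unsigned integer), and longestMatch is a hypothetical function. Skip handling and the EOF transition are deliberately left out, so this is not the package's Lexer.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

type Symbol uint16 // assumption: the real Symbol type is not shown here

type LexTransition struct {
	Lo, Hi    rune // inclusive character range
	NextState int32
	Skip      bool // ignored by this sketch
}

type LexState struct {
	AcceptToken Symbol // 0 if this state doesn't accept
	Skip        bool   // ignored by this sketch
	Default     int32  // default next state (-1 if none)
	EOF         int32  // ignored by this sketch
	Transitions []LexTransition
}

// step returns the next state for r, or a negative value if there is no move.
func step(s LexState, r rune) int32 {
	for _, t := range s.Transitions {
		if r >= t.Lo && r <= t.Hi {
			return t.NextState
		}
	}
	return s.Default
}

// longestMatch runs the DFA from state 0 at src[pos:] and returns the symbol
// and end offset of the longest accepted prefix, or (0, pos) if none.
func longestMatch(states []LexState, src string, pos int) (Symbol, int) {
	state := int32(0)
	var sym Symbol
	end := pos
	i := pos
	for {
		if a := states[state].AcceptToken; a != 0 {
			sym, end = a, i // remember the most recent accepting position
		}
		if i >= len(src) {
			break
		}
		r, size := utf8.DecodeRuneInString(src[i:])
		next := step(states[state], r)
		if next < 0 {
			break
		}
		state = next
		i += size
	}
	return sym, end
}

func main() {
	const number, ident Symbol = 1, 2
	states := []LexState{
		{Default: -1, EOF: -1, Transitions: []LexTransition{
			{Lo: '0', Hi: '9', NextState: 1},
			{Lo: 'a', Hi: 'z', NextState: 2},
		}},
		{AcceptToken: number, Default: -1, EOF: -1, Transitions: []LexTransition{
			{Lo: '0', Hi: '9', NextState: 1},
		}},
		{AcceptToken: ident, Default: -1, EOF: -1, Transitions: []LexTransition{
			{Lo: 'a', Hi: 'z', NextState: 2},
			{Lo: '0', Hi: '9', NextState: 2},
		}},
	}
	fmt.Println(longestMatch(states, "abc123 42", 0))
	fmt.Println(longestMatch(states, "abc123 42", 7))
}
```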

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer tokenizes source text using a table-driven DFA.

func NewLexer

func NewLexer(states []LexState, source []byte) *Lexer

NewLexer creates a new Lexer that will tokenize source using the given DFA state table.

func (*Lexer) Next

func (l *Lexer) Next(startState uint16) Token

Next lexes the next token starting from the given lex state index. It automatically skips tokens from states where Skip=true (whitespace). Returns a zero-Symbol token with StartByte==EndByte at EOF.

type LookaheadIterator added in v0.6.0

type LookaheadIterator struct {
	// contains filtered or unexported fields
}

LookaheadIterator iterates over valid symbols for a given parse state. It precomputes the full set of symbols that have valid parse actions in the specified state, enabling autocomplete and error diagnostic use cases.

func NewLookaheadIterator added in v0.6.0

func NewLookaheadIterator(lang *Language, state StateID) (*LookaheadIterator, error)

NewLookaheadIterator creates an iterator over all symbols that have valid parse actions in the given state. Returns an error if the state is out of range for the language's parse tables.

func (*LookaheadIterator) CurrentSymbol added in v0.6.0

func (it *LookaheadIterator) CurrentSymbol() Symbol

CurrentSymbol returns the symbol at the current iterator position. Must be called after a successful Next().

func (*LookaheadIterator) CurrentSymbolName added in v0.6.0

func (it *LookaheadIterator) CurrentSymbolName() string

CurrentSymbolName returns the name of the symbol at the current iterator position. Returns "" if the position is invalid or the symbol has no name.

func (*LookaheadIterator) Language added in v0.6.0

func (it *LookaheadIterator) Language() *Language

Language returns the language associated with this iterator.

func (*LookaheadIterator) Next added in v0.6.0

func (it *LookaheadIterator) Next() bool

Next advances the iterator to the next valid symbol. Returns false when there are no more symbols.

func (*LookaheadIterator) ResetState added in v0.6.0

func (it *LookaheadIterator) ResetState(state StateID) error

ResetState resets the iterator to enumerate valid symbols for a different parse state within the same language. Returns an error if the state is out of range.

type Node

type Node struct {
	// contains filtered or unexported fields
}

Node is a syntax tree node.

func NewLeafNode

func NewLeafNode(sym Symbol, named bool, startByte, endByte uint32, startPoint, endPoint Point) *Node

NewLeafNode creates a terminal/leaf node.

func NewParentNode

func NewParentNode(sym Symbol, named bool, children []*Node, fieldIDs []FieldID, productionID uint16) *Node

NewParentNode creates a non-terminal node with children. It sets parent pointers on all children and computes byte/point spans from the first and last children. If any child has an error, the parent is marked as having an error too.
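The three effects described above (parent pointers, span taken from the first and last children, error propagation) can be sketched with a small local node type. The node struct and newParent below are hypothetical stand-ins, not the package's Node; point spans, symbols, and field IDs are omitted.

```go
package main

import "fmt"

// node is a hypothetical, trimmed stand-in for the package's Node.
type node struct {
	start, end uint32 // byte span; end is exclusive
	hasError   bool
	parent     *node
	children   []*node
}

// newParent builds a parent over children: it links parent pointers, derives
// the span from the first and last children, and propagates error flags.
func newParent(children []*node) *node {
	p := &node{children: children}
	if len(children) > 0 {
		p.start = children[0].start
		p.end = children[len(children)-1].end
	}
	for _, c := range children {
		c.parent = p
		if c.hasError {
			p.hasError = true
		}
	}
	return p
}

func main() {
	a := &node{start: 0, end: 4}
	b := &node{start: 5, end: 9, hasError: true}
	p := newParent([]*node{a, b})
	fmt.Println(p.start, p.end, p.hasError, a.parent == p)
}
```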

func (*Node) Child

func (n *Node) Child(i int) *Node

Child returns the i-th child, or nil if i is out of range.

func (*Node) ChildByFieldName

func (n *Node) ChildByFieldName(name string, lang *Language) *Node

ChildByFieldName returns the first child assigned to the given field name, or nil if no child has that field. The Language is needed to resolve field names to IDs. Uses Language.FieldByName for O(1) lookup.

func (*Node) ChildCount

func (n *Node) ChildCount() int

ChildCount returns the number of children (both named and anonymous).

func (*Node) Children

func (n *Node) Children() []*Node

Children returns a slice of all children.

func (*Node) DescendantForByteRange added in v0.6.0

func (n *Node) DescendantForByteRange(startByte, endByte uint32) *Node

DescendantForByteRange returns the smallest descendant that fully contains the given byte range, or nil when no such descendant exists.
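A minimal sketch of the "smallest containing descendant" search, using a hypothetical local node type rather than the package's Node: start at the root and keep descending into the first child that still fully contains the range. How the real method breaks ties for zero-width ranges on a sibling boundary is not specified here; this sketch simply takes the first matching child.

```go
package main

import "fmt"

// node is a hypothetical, trimmed stand-in for the package's Node.
type node struct {
	name       string
	start, end uint32 // end is exclusive
	children   []*node
}

// descendantForByteRange returns the smallest node under n (or n itself) that
// fully contains [start, end), or nil if n does not contain the range.
func descendantForByteRange(n *node, start, end uint32) *node {
	if n == nil || start < n.start || end > n.end {
		return nil
	}
	for {
		var next *node
		for _, c := range n.children {
			if start >= c.start && end <= c.end {
				next = c
				break
			}
		}
		if next == nil {
			return n // no child contains the range, so n is the smallest
		}
		n = next
	}
}

func main() {
	body := &node{name: "block", start: 10, end: 20, children: []*node{
		{name: "{", start: 10, end: 11},
		{name: "statement", start: 12, end: 18},
		{name: "}", start: 19, end: 20},
	}}
	root := &node{name: "function", start: 0, end: 20, children: []*node{
		{name: "func", start: 0, end: 4},
		{name: "identifier", start: 5, end: 9},
		body,
	}}
	fmt.Println(descendantForByteRange(root, 13, 15).name)
	fmt.Println(descendantForByteRange(root, 3, 6).name)
	fmt.Println(descendantForByteRange(root, 0, 25) == nil)
}
```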

func (*Node) DescendantForPointRange added in v0.6.0

func (n *Node) DescendantForPointRange(startPoint, endPoint Point) *Node

DescendantForPointRange returns the smallest descendant that fully contains the given point range, or nil when no such descendant exists.

func (*Node) Edit added in v0.7.0

func (n *Node) Edit(edit InputEdit)

Edit adjusts this node's byte/point span for a source edit.

If the node belongs to a larger tree, the edit is applied from the containing root so sibling and ancestor spans remain consistent. Unlike Tree.Edit, this method does not record edit history on a Tree.

func (*Node) EndByte

func (n *Node) EndByte() uint32

EndByte returns the byte offset where this node ends (exclusive).

func (*Node) EndPoint

func (n *Node) EndPoint() Point

EndPoint returns the row/column position where this node ends.

func (*Node) FieldNameForChild added in v0.6.0

func (n *Node) FieldNameForChild(i int, lang *Language) string

FieldNameForChild returns the field name assigned to the i-th child, or an empty string when no field is assigned.

func (*Node) HasChanges added in v0.6.0

func (n *Node) HasChanges() bool

HasChanges reports whether this node was marked dirty by Tree.Edit.

func (*Node) HasError

func (n *Node) HasError() bool

HasError reports whether this node or any descendant contains a parse error.

func (*Node) IsError added in v0.6.0

func (n *Node) IsError() bool

IsError reports whether this node is an explicit error node.

func (*Node) IsExtra added in v0.6.0

func (n *Node) IsExtra() bool

IsExtra reports whether this node was marked as extra syntax (e.g. whitespace/comments outside the core parse structure).

func (*Node) IsMissing

func (n *Node) IsMissing() bool

IsMissing reports whether this node was inserted by error recovery.

func (*Node) IsNamed

func (n *Node) IsNamed() bool

IsNamed reports whether this is a named node (as opposed to anonymous syntax like punctuation).

func (*Node) NamedChild

func (n *Node) NamedChild(i int) *Node

NamedChild returns the i-th named child (skipping anonymous children), or nil if i is out of range.

func (*Node) NamedChildCount

func (n *Node) NamedChildCount() int

NamedChildCount returns the number of named children.

func (*Node) NamedDescendantForByteRange added in v0.6.0

func (n *Node) NamedDescendantForByteRange(startByte, endByte uint32) *Node

NamedDescendantForByteRange returns the smallest named descendant that fully contains the given byte range, or nil when no such descendant exists.

func (*Node) NamedDescendantForPointRange added in v0.6.0

func (n *Node) NamedDescendantForPointRange(startPoint, endPoint Point) *Node

NamedDescendantForPointRange returns the smallest named descendant that fully contains the given point range, or nil when no such descendant exists.

func (*Node) NextSibling

func (n *Node) NextSibling() *Node

NextSibling returns the next sibling node, or nil when this is the last child or has no parent.

func (*Node) Parent

func (n *Node) Parent() *Node

Parent returns this node's parent, or nil if it is the root.

func (*Node) ParseState

func (n *Node) ParseState() StateID

ParseState returns the parser state associated with this node.

func (*Node) PreGotoState added in v0.6.0

func (n *Node) PreGotoState() StateID

PreGotoState returns the parser state that was on top of the stack before this node was pushed (i.e., the state exposed after popping children during reduce). For non-leaf nodes: lookupGoto(PreGotoState, Symbol) == ParseState.

func (*Node) PrevSibling

func (n *Node) PrevSibling() *Node

PrevSibling returns the previous sibling node, or nil when this is the first child or has no parent.

func (*Node) Range

func (n *Node) Range() Range

Range returns the full span of this node as a Range.

func (*Node) SExpr added in v0.6.0

func (n *Node) SExpr(lang *Language) string

SExpr returns a tree-sitter-style S-expression for this node. It includes only named nodes for stable debug snapshots.

func (*Node) StartByte

func (n *Node) StartByte() uint32

StartByte returns the byte offset where this node begins.

func (*Node) StartPoint

func (n *Node) StartPoint() Point

StartPoint returns the row/column position where this node begins.

func (*Node) Symbol

func (n *Node) Symbol() Symbol

Symbol returns the node's grammar symbol.

func (*Node) Text

func (n *Node) Text(source []byte) string

Text returns the source text covered by this node.

func (*Node) Type

func (n *Node) Type(lang *Language) string

Type returns the node's type name from the language.

type ParseAction

type ParseAction struct {
	Type              ParseActionType
	State             StateID // target state (shift/recover)
	Symbol            Symbol  // reduced symbol (reduce)
	ChildCount        uint8   // children consumed (reduce)
	DynamicPrecedence int16   // precedence (reduce)
	ProductionID      uint16  // which production (reduce)
	Extra             bool    // is this an extra token (shift)
	Repetition        bool    // is this a repetition (shift)
}

ParseAction is a single parser action from the parse table.

type ParseActionEntry

type ParseActionEntry struct {
	Reusable bool
	Actions  []ParseAction
}

ParseActionEntry is a group of actions for a (state, symbol) pair.

type ParseActionType

type ParseActionType uint8

ParseActionType identifies the kind of parse action.

const (
	ParseActionShift ParseActionType = iota
	ParseActionReduce
	ParseActionAccept
	ParseActionRecover
)

type ParseOption added in v0.6.0

type ParseOption func(*parseConfig)

ParseOption configures ParseWith behavior.

func WithOldTree added in v0.6.0

func WithOldTree(oldTree *Tree) ParseOption

WithOldTree enables incremental parsing against an edited prior tree.

func WithProfiling added in v0.6.0

func WithProfiling() ParseOption

WithProfiling enables incremental parse attribution in ParseResult.Profile.

func WithTokenSource added in v0.6.0

func WithTokenSource(ts TokenSource) ParseOption

WithTokenSource provides a custom token source for parsing.

type ParseResult added in v0.6.0

type ParseResult struct {
	Tree *Tree
	// Profile is populated only when ParseWith uses WithProfiling for
	// incremental parsing.
	Profile IncrementalParseProfile
	// ProfileAvailable reports whether Profile contains attribution data.
	ProfileAvailable bool
}

ParseResult is returned by ParseWith.

type ParseRuntime added in v0.6.0

type ParseRuntime struct {
	StopReason                  ParseStopReason
	SourceLen                   uint32
	ExpectedEOFByte             uint32
	RootEndByte                 uint32
	Truncated                   bool
	TokenSourceEOFEarly         bool
	TokensConsumed              uint64
	LastTokenEndByte            uint32
	LastTokenSymbol             Symbol
	LastTokenWasEOF             bool
	IterationLimit              int
	StackDepthLimit             int
	NodeLimit                   int
	MemoryBudgetBytes           int64
	Iterations                  int
	NodesAllocated              int
	ArenaBytesAllocated         int64
	ScratchBytesAllocated       int64
	EntryScratchBytesAllocated  int64
	GSSBytesAllocated           int64
	PeakStackDepth              int
	MaxStacksSeen               int
	SingleStackIterations       int
	MultiStackIterations        int
	SingleStackTokens           uint64
	MultiStackTokens            uint64
	SingleStackGSSNodes         uint64
	MultiStackGSSNodes          uint64
	GSSNodesAllocated           uint64
	GSSNodesRetained            uint64
	GSSNodesDroppedSameToken    uint64
	ParentNodesAllocated        uint64
	ParentNodesRetained         uint64
	ParentNodesDroppedSameToken uint64
	LeafNodesAllocated          uint64
	LeafNodesRetained           uint64
	LeafNodesDroppedSameToken   uint64
	MergeStacksIn               uint64
	MergeStacksOut              uint64
	MergeSlotsUsed              uint64
	GlobalCullStacksIn          uint64
	GlobalCullStacksOut         uint64
}

ParseRuntime captures parser-loop diagnostics for a completed tree.

func (ParseRuntime) Summary added in v0.6.0

func (rt ParseRuntime) Summary() string

Summary returns a stable one-line diagnostic string for parse-runtime stats.

type ParseStopReason added in v0.6.0

type ParseStopReason string

ParseStopReason reports why the internal parse loop (parseInternal) terminated.

const (
	ParseStopNone            ParseStopReason = "none"
	ParseStopAccepted        ParseStopReason = "accepted"
	ParseStopNoStacksAlive   ParseStopReason = "no_stacks_alive"
	ParseStopTokenSourceEOF  ParseStopReason = "token_source_eof"
	ParseStopTimeout         ParseStopReason = "timeout"
	ParseStopCancelled       ParseStopReason = "cancelled"
	ParseStopIterationLimit  ParseStopReason = "iteration_limit"
	ParseStopStackDepthLimit ParseStopReason = "stack_depth_limit"
	ParseStopNodeLimit       ParseStopReason = "node_limit"
	ParseStopMemoryBudget    ParseStopReason = "memory_budget"
)

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser reads parse tables from a Language and produces a syntax tree. It supports GLR parsing: when a (state, symbol) pair maps to multiple actions, the parser forks the stack and explores all alternatives in parallel while preserving distinct parse paths. Duplicate stack versions are collapsed and ambiguities are resolved at selection time.

Parser is not safe for concurrent use. Use one parser per goroutine, a ParserPool, or guard shared parser instances with external synchronization.

func NewParser

func NewParser(lang *Language) *Parser

NewParser creates a new Parser for the given language.

func (*Parser) CancellationFlag added in v0.7.0

func (p *Parser) CancellationFlag() *uint32

CancellationFlag returns the parser's current cancellation flag pointer.

func (*Parser) IncludedRanges added in v0.6.0

func (p *Parser) IncludedRanges() []Range

IncludedRanges returns a copy of the configured include ranges.

func (*Parser) Language added in v0.7.0

func (p *Parser) Language() *Language

Language returns the parser's configured language.

func (*Parser) Logger added in v0.7.0

func (p *Parser) Logger() ParserLogger

Logger returns the currently configured parser debug logger.

func (*Parser) Parse

func (p *Parser) Parse(source []byte) (*Tree, error)

Parse tokenizes and parses source using the built-in DFA lexer, returning a syntax tree. This works for hand-built grammars that provide LexStates. For real grammars that need a custom lexer, use ParseWithTokenSource. If the input is empty, it returns a tree with a nil root and no error.

func (*Parser) ParseIncremental

func (p *Parser) ParseIncremental(source []byte, oldTree *Tree) (*Tree, error)

ParseIncremental re-parses source after edits were applied to oldTree. It reuses unchanged subtrees from the old tree for better performance. Call oldTree.Edit() for each edit before calling this method.

func (*Parser) ParseIncrementalProfiled added in v0.6.0

func (p *Parser) ParseIncrementalProfiled(source []byte, oldTree *Tree) (*Tree, IncrementalParseProfile, error)

ParseIncrementalProfiled is like ParseIncremental and also returns runtime attribution for incremental reuse work vs parse/rebuild work.

func (*Parser) ParseIncrementalWithTokenSource

func (p *Parser) ParseIncrementalWithTokenSource(source []byte, oldTree *Tree, ts TokenSource) (*Tree, error)

ParseIncrementalWithTokenSource is like ParseIncremental but uses a custom token source.

func (*Parser) ParseIncrementalWithTokenSourceProfiled added in v0.6.0

func (p *Parser) ParseIncrementalWithTokenSourceProfiled(source []byte, oldTree *Tree, ts TokenSource) (*Tree, IncrementalParseProfile, error)

ParseIncrementalWithTokenSourceProfiled is like ParseIncrementalWithTokenSource and also returns runtime attribution for incremental reuse work vs parse/rebuild work.

func (*Parser) ParseWith added in v0.6.0

func (p *Parser) ParseWith(source []byte, opts ...ParseOption) (ParseResult, error)

ParseWith parses source using option-based configuration.

func (*Parser) ParseWithTokenSource

func (p *Parser) ParseWithTokenSource(source []byte, ts TokenSource) (*Tree, error)

ParseWithTokenSource parses source using a custom token source. This is used for real grammars where the lexer DFA isn't available as data tables (e.g., the Go grammar, which uses go/scanner as a bridge).

func (*Parser) SetCancellationFlag added in v0.7.0

func (p *Parser) SetCancellationFlag(flag *uint32)

SetCancellationFlag configures a caller-owned cancellation flag. Parsing stops when the pointed value becomes non-zero.
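Because the flag is a plain *uint32 owned by the caller, a second goroutine can request cancellation by storing a non-zero value. The sketch below shows that handshake with sync/atomic. pollUntil is a hypothetical stand-in for a parse loop, and the use of atomic loads and stores is a conservative assumption, since the text above does not say how the parser reads the flag.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// pollUntil is a hypothetical stand-in for a parse loop. It checks the
// caller-owned flag on every iteration and stops once the value is non-zero
// or maxIterations is reached.
func pollUntil(flag *uint32, maxIterations int) (iterations int, cancelled bool) {
	for iterations < maxIterations {
		if atomic.LoadUint32(flag) != 0 {
			return iterations, true
		}
		iterations++
	}
	return iterations, false
}

func main() {
	var flag uint32
	// With the real API, this is where p.SetCancellationFlag(&flag) would go.

	// Request cancellation from another goroutine after a short delay.
	time.AfterFunc(10*time.Millisecond, func() { atomic.StoreUint32(&flag, 1) })

	n, cancelled := pollUntil(&flag, 2_000_000_000)
	fmt.Println("iterations:", n, "cancelled:", cancelled)
}
```

Reset the flag to zero before reusing it for another parse; otherwise the next run would stop immediately.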

func (*Parser) SetGLRTrace added in v0.7.0

func (p *Parser) SetGLRTrace(enabled bool)

SetGLRTrace enables verbose GLR stack tracing to stdout (debug only).

func (*Parser) SetIncludedRanges added in v0.6.0

func (p *Parser) SetIncludedRanges(ranges []Range)

SetIncludedRanges configures parser include ranges. Tokens outside these ranges are skipped.

func (*Parser) SetLogger added in v0.7.0

func (p *Parser) SetLogger(logger ParserLogger)

SetLogger installs a parser debug logger. Pass nil to disable logging.

func (*Parser) SetTimeoutMicros added in v0.7.0

func (p *Parser) SetTimeoutMicros(timeoutMicros uint64)

SetTimeoutMicros configures a per-parse timeout in microseconds. A value of 0 disables timeout checks.

func (*Parser) TimeoutMicros added in v0.7.0

func (p *Parser) TimeoutMicros() uint64

TimeoutMicros returns the parser timeout in microseconds.

type ParserLogType added in v0.7.0

type ParserLogType uint8

ParserLogType categorizes parser log messages.

const (
	// ParserLogParse emits parser-loop lifecycle and control-flow logs.
	ParserLogParse ParserLogType = iota
	// ParserLogLex emits token-source and token-consumption logs.
	ParserLogLex
)

type ParserLogger added in v0.7.0

type ParserLogger func(kind ParserLogType, message string)

ParserLogger receives parser debug logs when configured via SetLogger.

type ParserPool added in v0.7.0

type ParserPool struct {
	// contains filtered or unexported fields
}

ParserPool provides concurrency-safe parsing by reusing Parser instances.

ParserPool is safe for concurrent use. Each call checks out one parser from an internal sync.Pool, applies configured defaults, runs the parse, and returns the parser to the pool.

Mutable parser state (logger, timeout, cancellation flag, included ranges, GLR trace) is reset on checkout so request-local state cannot bleed across callers.
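The checkout, reset, parse, return cycle described above can be sketched with sync.Pool directly. Everything in this sketch (parser, pool, newPool, with) is a hypothetical stand-in used to show the pattern; it is not the package's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// parser is a hypothetical stand-in carrying one piece of request-local state.
type parser struct {
	timeoutMicros uint64
}

func (p *parser) parse(src []byte) int { return len(src) }

// pool hands out parsers and resets mutable state on every checkout.
type pool struct {
	defaultTimeout uint64
	inner          sync.Pool
}

func newPool(defaultTimeout uint64) *pool {
	pl := &pool{defaultTimeout: defaultTimeout}
	pl.inner.New = func() any { return &parser{} }
	return pl
}

// with checks out a parser, applies the configured defaults, runs fn, and
// returns the parser to the pool.
func (pl *pool) with(fn func(*parser)) {
	p := pl.inner.Get().(*parser)
	p.timeoutMicros = pl.defaultTimeout // reset so nothing leaks from the previous caller
	defer pl.inner.Put(p)
	fn(p)
}

func main() {
	pl := newPool(5000)
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			pl.with(func(p *parser) {
				p.timeoutMicros = uint64(i) // request-local override
				_ = p.parse([]byte("package main"))
			})
		}(i)
	}
	wg.Wait()

	// A later checkout sees the default again, not a previous override.
	pl.with(func(p *parser) { fmt.Println(p.timeoutMicros) })
}
```

Resetting on checkout rather than on return means a parser that was returned in an odd state (for example after a panic in the caller) still starts clean for the next user.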

func NewParserPool added in v0.7.0

func NewParserPool(lang *Language, opts ...ParserPoolOption) *ParserPool

NewParserPool creates a concurrency-safe parser pool for lang.

func (*ParserPool) Language added in v0.7.0

func (pp *ParserPool) Language() *Language

Language returns the pool's configured language.

func (*ParserPool) Parse added in v0.7.0

func (pp *ParserPool) Parse(source []byte) (*Tree, error)

Parse delegates to a pooled Parser.Parse call.

func (*ParserPool) ParseWith added in v0.7.0

func (pp *ParserPool) ParseWith(source []byte, opts ...ParseOption) (ParseResult, error)

ParseWith delegates to a pooled Parser.ParseWith call.

func (*ParserPool) ParseWithTokenSource added in v0.7.0

func (pp *ParserPool) ParseWithTokenSource(source []byte, ts TokenSource) (*Tree, error)

ParseWithTokenSource delegates to a pooled Parser.ParseWithTokenSource call.

type ParserPoolOption added in v0.7.0

type ParserPoolOption func(*parserPoolConfig)

ParserPoolOption configures a ParserPool.

func WithParserPoolGLRTrace added in v0.7.0

func WithParserPoolGLRTrace(enabled bool) ParserPoolOption

WithParserPoolGLRTrace toggles GLR trace logs on pooled parser instances.

func WithParserPoolIncludedRanges added in v0.7.0

func WithParserPoolIncludedRanges(ranges []Range) ParserPoolOption

WithParserPoolIncludedRanges sets default include ranges for pooled parsers.

func WithParserPoolLogger added in v0.7.0

func WithParserPoolLogger(logger ParserLogger) ParserPoolOption

WithParserPoolLogger sets the logger applied to pooled parser instances.

func WithParserPoolTimeoutMicros added in v0.7.0

func WithParserPoolTimeoutMicros(timeoutMicros uint64) ParserPoolOption

WithParserPoolTimeoutMicros sets the parse timeout for pooled parsers.

type Pattern

type Pattern struct {
	// contains filtered or unexported fields
}

Pattern is a single top-level S-expression pattern in a query.

type PerfCounters added in v0.6.0

type PerfCounters struct {
	MergeCalls             uint64
	MergeDeadPruned        uint64
	MergePerKeyOverflow    uint64
	MergeReplacements      uint64
	StackEquivalentCalls   uint64
	StackEquivalentTrue    uint64
	StackEqHashMissSkips   uint64
	StackCompareCalls      uint64
	ConflictRR             uint64
	ConflictRS             uint64
	ConflictOther          uint64
	ForkCount              uint64
	FirstConflictToken     uint64
	MaxConcurrentStacks    uint64
	LexBytes               uint64
	LexTokens              uint64
	ReuseNodesVisited      uint64
	ReuseNodesPushed       uint64
	ReuseNodesPopped       uint64
	ReuseCandidatesChecked uint64
	ReuseSuccesses         uint64
	ReuseLeafSuccesses     uint64
	ReuseNonLeafChecks     uint64
	ReuseNonLeafSuccesses  uint64
	ReuseNonLeafBytes      uint64
	ReuseNonLeafNoGoto     uint64
	ReuseNonLeafNoGotoTerm uint64
	ReuseNonLeafNoGotoNt   uint64
	ReuseNonLeafStateMiss  uint64
	ReuseNonLeafStateZero  uint64
	MergeHashZero          uint64
	GlobalCapCulls         uint64
	GlobalCapCullDropped   uint64
	ReduceChainSteps       uint64
	ReduceChainMaxLen      uint64
	ReduceChainBreakMulti  uint64
	ReduceChainBreakShift  uint64
	ReduceChainBreakAccept uint64
	ParentChildPointers    uint64
	ExtraNodes             uint64
	ErrorNodes             uint64
	MergeStacksInHist      [maxGLRStacks + 2]uint64
	MergeAliveHist         [maxGLRStacks + 2]uint64
	MergeOutHist           [maxGLRStacks + 2]uint64
	ForkActionsHist        [8]uint64
}

func PerfCountersSnapshot added in v0.6.0

func PerfCountersSnapshot() PerfCounters

type Point

type Point struct {
	Row    uint32
	Column uint32
}

Point is a row/column position in source text.

type PointSkippableTokenSource

type PointSkippableTokenSource interface {
	ByteSkippableTokenSource
	SkipToByteWithPoint(offset uint32, pt Point) Token
}

PointSkippableTokenSource extends ByteSkippableTokenSource with a hint-based skip that avoids recomputing row/column from byte offset. During incremental parsing the reused node already carries its endpoint, so passing it directly eliminates the O(n) offset-to-point scan.

type Query

type Query struct {
	// contains filtered or unexported fields
}

Query holds compiled patterns parsed from a tree-sitter .scm query file. It can be executed against a syntax tree to find matching nodes and return captured names. Query is safe for concurrent use after construction.

func NewQuery

func NewQuery(source string, lang *Language) (*Query, error)

NewQuery compiles query source (tree-sitter .scm format) against a language. It returns an error if the query syntax is invalid or references unknown node types or field names.

func (*Query) CaptureCount added in v0.7.0

func (q *Query) CaptureCount() uint32

CaptureCount returns the number of unique capture names in this query.

func (*Query) CaptureNameForID added in v0.7.0

func (q *Query) CaptureNameForID(id uint32) (string, bool)

CaptureNameForID returns the capture name for the given capture id.

func (*Query) CaptureNames

func (q *Query) CaptureNames() []string

CaptureNames returns the list of unique capture names used in the query.

func (*Query) DisableCapture added in v0.7.0

func (q *Query) DisableCapture(name string)

DisableCapture removes captures with the given name from future query results. Matching behavior is unchanged; only returned captures are filtered.

func (*Query) DisablePattern added in v0.7.0

func (q *Query) DisablePattern(patternIndex uint32)

DisablePattern disables a pattern by index.

func (*Query) EndByteForPattern added in v0.7.0

func (q *Query) EndByteForPattern(patternIndex uint32) (uint32, bool)

EndByteForPattern returns the query-source end byte for patternIndex.

func (*Query) Exec

func (q *Query) Exec(node *Node, lang *Language, source []byte) *QueryCursor

Exec creates a streaming cursor over matches rooted at node.

func (*Query) Execute

func (q *Query) Execute(tree *Tree) []QueryMatch

Execute runs the query against a syntax tree and returns all matches.

func (*Query) ExecuteNode

func (q *Query) ExecuteNode(node *Node, lang *Language, source []byte) []QueryMatch

ExecuteNode runs the query starting from a specific node.

source is required for text predicates (like #eq? / #match?); pass the originating source bytes for correct predicate evaluation.

func (*Query) IsPatternGuaranteedAtStep added in v0.7.0

func (q *Query) IsPatternGuaranteedAtStep(patternIndex uint32, stepIndex uint32) bool

IsPatternGuaranteedAtStep reports whether all steps through stepIndex are definite and non-quantified.

func (*Query) IsPatternNonLocal added in v0.7.0

func (q *Query) IsPatternNonLocal(patternIndex uint32) bool

IsPatternNonLocal reports whether the pattern can begin at multiple roots.

func (*Query) IsPatternRooted added in v0.7.0

func (q *Query) IsPatternRooted(patternIndex uint32) bool

IsPatternRooted reports whether the pattern has exactly one root step at depth 0. Rooted patterns start matching from a single concrete root.

func (*Query) PatternCount

func (q *Query) PatternCount() int

PatternCount returns the number of patterns in the query.

func (*Query) PredicatesForPattern added in v0.7.0

func (q *Query) PredicatesForPattern(patternIndex uint32) ([]QueryPredicate, bool)

PredicatesForPattern returns a copy of predicates attached to patternIndex.

func (*Query) StartByteForPattern added in v0.7.0

func (q *Query) StartByteForPattern(patternIndex uint32) (uint32, bool)

StartByteForPattern returns the query-source start byte for patternIndex.

func (*Query) StepIsDefinite added in v0.7.0

func (q *Query) StepIsDefinite(patternIndex uint32, stepIndex uint32) bool

StepIsDefinite reports whether a pattern step matches a definite symbol (i.e. not wildcard).

func (*Query) StringCount added in v0.7.0

func (q *Query) StringCount() uint32

StringCount returns the number of unique string literals in this query.

func (*Query) StringValueForID added in v0.7.0

func (q *Query) StringValueForID(id uint32) (string, bool)

StringValueForID returns the string literal for the given string id.

type QueryCapture

type QueryCapture struct {
	Name string
	Node *Node
	// TextOverride, when non-empty, replaces the node's source text for
	// downstream consumers. It is set by the #strip! directive.
	TextOverride string
}

QueryCapture is a single captured node within a match.

func (QueryCapture) Text added in v0.6.0

func (c QueryCapture) Text(source []byte) string

Text returns the effective text for this capture. If TextOverride is set (e.g. by the #strip! directive), it is returned. Otherwise the node's source text is returned.
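
The fallback rule can be sketched as follows. The span and capture types are hypothetical stand-ins for Node and QueryCapture, assuming a node's text is the source slice between its start and end bytes.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a node's byte range.
type span struct{ start, end uint32 }

// capture mirrors the documented QueryCapture fields, with span in place of *Node.
type capture struct {
	Name         string
	Node         span
	TextOverride string
}

// text returns TextOverride when it is set, otherwise the source slice for the node.
func (c capture) text(source []byte) string {
	if c.TextOverride != "" {
		return c.TextOverride
	}
	return string(source[c.Node.start:c.Node.end])
}

func main() {
	src := []byte(`name = "value"`)
	key := capture{Name: "key", Node: span{0, 4}}
	val := capture{Name: "val", Node: span{7, 14}, TextOverride: "value"}
	fmt.Println(key.text(src)) // prints "name"
	fmt.Println(val.text(src)) // prints "value" (override wins over the quoted source text)
}
```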

type QueryCursor

type QueryCursor struct {
	// contains filtered or unexported fields
}

QueryCursor incrementally walks a node subtree and yields matches one by one. It is the streaming counterpart to Query.Execute and avoids materializing all matches up front. QueryCursor is not safe for concurrent use.

func (*QueryCursor) DidExceedMatchLimit added in v0.7.0

func (c *QueryCursor) DidExceedMatchLimit() bool

DidExceedMatchLimit reports whether query execution had additional matches beyond the configured match limit.

func (*QueryCursor) NextCapture

func (c *QueryCursor) NextCapture() (QueryCapture, bool)

NextCapture yields captures in match order by draining NextMatch results. This is a practical first-pass ordering: captures are returned in each match's capture order, then by subsequent matches in DFS match order.

func (*QueryCursor) NextMatch

func (c *QueryCursor) NextMatch() (QueryMatch, bool)

NextMatch yields the next query match from the cursor.

func (*QueryCursor) SetByteRange added in v0.6.0

func (c *QueryCursor) SetByteRange(startByte, endByte uint32)

SetByteRange restricts matches to nodes that intersect [startByte, endByte).

func (*QueryCursor) SetMatchLimit added in v0.7.0

func (c *QueryCursor) SetMatchLimit(limit uint32)

SetMatchLimit sets the maximum number of matches this cursor can return. A limit of 0 means unlimited.
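
A small model of how a match limit and the exceeded flag relate, assuming the cursor simply stops yielding once the limit is reached. limitMatches is a hypothetical helper; the real cursor streams matches rather than slicing a list.

```go
package main

import "fmt"

// limitMatches keeps at most limit items. A limit of 0 means unlimited.
// exceeded is true when items were dropped because of the limit.
func limitMatches(all []string, limit uint32) (kept []string, exceeded bool) {
	if limit == 0 || uint32(len(all)) <= limit {
		return all, false
	}
	return all[:limit], true
}

func main() {
	matches := []string{"m0", "m1", "m2"}

	kept, exceeded := limitMatches(matches, 2)
	fmt.Println(len(kept), exceeded) // prints "2 true"

	kept, exceeded = limitMatches(matches, 0)
	fmt.Println(len(kept), exceeded) // prints "3 false"
}
```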

func (*QueryCursor) SetMaxStartDepth added in v0.7.0

func (c *QueryCursor) SetMaxStartDepth(depth uint32)

SetMaxStartDepth limits the depth at which new matches can begin. Depth 0 means only the starting node passed to Exec.

func (*QueryCursor) SetPointRange added in v0.6.0

func (c *QueryCursor) SetPointRange(startPoint, endPoint Point)

SetPointRange restricts matches to nodes that intersect [startPoint, endPoint).
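
Intersection with a half-open point range can be sketched like this. Point is copied from the documented declaration; pointLess and intersects are hypothetical helpers, and the exact boundary handling inside the cursor is an assumption. The byte-range case is the same test with plain integer comparison.

```go
package main

import "fmt"

// Point is a local copy of the documented type.
type Point struct {
	Row    uint32
	Column uint32
}

// pointLess orders points by row, then by column.
func pointLess(a, b Point) bool {
	if a.Row != b.Row {
		return a.Row < b.Row
	}
	return a.Column < b.Column
}

// intersects reports whether [nodeStart, nodeEnd) overlaps [rangeStart, rangeEnd).
func intersects(nodeStart, nodeEnd, rangeStart, rangeEnd Point) bool {
	return pointLess(nodeStart, rangeEnd) && pointLess(rangeStart, nodeEnd)
}

func main() {
	rangeStart, rangeEnd := Point{Row: 2, Column: 0}, Point{Row: 4, Column: 0}

	// A node ending partway through row 2 overlaps the range.
	fmt.Println(intersects(Point{1, 0}, Point{2, 5}, rangeStart, rangeEnd)) // prints "true"

	// A node starting exactly at the exclusive end does not.
	fmt.Println(intersects(Point{4, 0}, Point{6, 0}, rangeStart, rangeEnd)) // prints "false"
}
```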

type QueryMatch

type QueryMatch struct {
	PatternIndex int
	Captures     []QueryCapture
}

QueryMatch represents a successful pattern match with its captures.

func (QueryMatch) SetValues added in v0.6.0

func (m QueryMatch) SetValues(q *Query, key string) []string

SetValues returns the values of a #set! directive with the given key for a match's pattern, or nil if not present. This is used by InjectionParser to read injection.language metadata.

type QueryPredicate

type QueryPredicate struct {
	// contains filtered or unexported fields
}

QueryPredicate is a post-match constraint attached to a pattern. Supported forms:

  • (#eq? @a @b)
  • (#eq? @a "literal")
  • (#not-eq? @a @b)
  • (#not-eq? @a "literal")
  • (#match? @a "regex")
  • (#not-match? @a "regex")
  • (#lua-match? @a "lua-pattern")
  • (#any-of? @a "v1" "v2" ...)
  • (#not-any-of? @a "v1" "v2" ...)
  • (#any-eq? @a "literal"), (#any-eq? @a @b)
  • (#any-not-eq? @a "literal"), (#any-not-eq? @a @b)
  • (#any-match? @a "regex")
  • (#any-not-match? @a "regex")
  • (#has-ancestor? @a type ...)
  • (#not-has-ancestor? @a type ...)
  • (#not-has-parent? @a type ...)
  • (#is? ...), (#is-not? ...)
  • (#set! key value), (#offset! @cap ...)
  • (#count? @a op value) -- op: >, <, >=, <=, ==, !=
  • (#is-exported? @a)

type QueryStep

type QueryStep struct {
	// contains filtered or unexported fields
}

QueryStep is one matching instruction within a pattern.

type Range

type Range struct {
	StartByte  uint32
	EndByte    uint32
	StartPoint Point
	EndPoint   Point
}

Range is a span of source text.

func DiffChangedRanges added in v0.6.0

func DiffChangedRanges(oldTree, newTree *Tree) []Range

DiffChangedRanges compares two syntax trees and returns the minimal ranges where syntactic structure differs. The old tree should have been edited (via Tree.Edit) to match the new tree's source positions before reparsing.

This is equivalent to C tree-sitter's ts_tree_get_changed_ranges().

type Rewriter added in v0.6.0

type Rewriter struct {
	// contains filtered or unexported fields
}

Rewriter collects source-text edits and applies them atomically. Edits target byte ranges (usually from Node.StartByte/EndByte). Apply returns new source bytes and InputEdit records for incremental reparsing. Rewriter is not safe for concurrent use.

func NewRewriter added in v0.6.0

func NewRewriter(source []byte) *Rewriter

NewRewriter creates a Rewriter for the given source text.

func (*Rewriter) Apply added in v0.6.0

func (r *Rewriter) Apply() (newSource []byte, edits []InputEdit, err error)

Apply sorts edits, validates that none overlap, applies them, and returns the new source bytes plus InputEdit records for incremental reparsing. It returns an error if edits overlap.
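
The sort, validate, apply sequence can be sketched over plain byte ranges. This is a model under stated assumptions, not the package's code: edit and applyEdits are hypothetical names, InputEdit records are omitted, and two edits are treated as overlapping when the later one starts before the earlier one ends.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// edit replaces source[start:end] with text. start == end is a pure insertion.
type edit struct {
	start, end uint32
	text       []byte
}

// applyEdits sorts edits by start offset, rejects overlaps, and builds the new source.
func applyEdits(source []byte, edits []edit) ([]byte, error) {
	sorted := append([]edit(nil), edits...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	var out []byte
	pos := uint32(0) // end of the previously applied edit
	for _, e := range sorted {
		if e.start > e.end || e.end > uint32(len(source)) {
			return nil, errors.New("edit out of bounds")
		}
		if e.start < pos {
			return nil, errors.New("edits overlap")
		}
		out = append(out, source[pos:e.start]...) // untouched bytes before the edit
		out = append(out, e.text...)              // replacement text
		pos = e.end
	}
	return append(out, source[pos:]...), nil
}

func main() {
	src := []byte("func old() {}")
	out, err := applyEdits(src, []edit{
		{start: 5, end: 8, text: []byte("new")},          // rename old -> new
		{start: 0, end: 0, text: []byte("// renamed\n")}, // insert a comment at the top
	})
	fmt.Printf("%q %v\n", out, err)
}
```

Validating before any bytes are emitted to the caller is what makes the operation atomic: a failed call returns no partial output.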

func (*Rewriter) ApplyToTree added in v0.6.0

func (r *Rewriter) ApplyToTree(tree *Tree) ([]byte, error)

ApplyToTree is a convenience method that calls Apply, then tree.Edit for each resulting edit, and returns the new source ready for ParseIncremental.

func (*Rewriter) Delete added in v0.6.0

func (r *Rewriter) Delete(node *Node)

Delete removes the source text covered by node.

func (*Rewriter) InsertAfter added in v0.6.0

func (r *Rewriter) InsertAfter(node *Node, text []byte)

InsertAfter inserts text immediately after node.

func (*Rewriter) InsertBefore added in v0.6.0

func (r *Rewriter) InsertBefore(node *Node, text []byte)

InsertBefore inserts text immediately before node.

func (*Rewriter) Replace added in v0.6.0

func (r *Rewriter) Replace(node *Node, newText []byte)

Replace replaces the source text covered by node with newText.

func (*Rewriter) ReplaceRange added in v0.6.0

func (r *Rewriter) ReplaceRange(startByte, endByte uint32, newText []byte)

ReplaceRange replaces bytes in [startByte, endByte) with newText.

type StateID

type StateID uint16

StateID is a parser state index.

type Symbol

type Symbol uint16

Symbol is a grammar symbol ID (terminal or nonterminal).

type SymbolMetadata

type SymbolMetadata struct {
	Name      string
	Visible   bool
	Named     bool
	Supertype bool
}

SymbolMetadata holds display information about a symbol.

type Tag

type Tag struct {
	Kind      string // e.g. "definition.function", "reference.call"
	Name      string // the captured symbol text
	Range     Range  // full span of the tagged node
	NameRange Range  // span of the @name capture
}

Tag represents a tagged symbol in source code, extracted by a Tagger. Kind follows tree-sitter convention: "definition.function", "reference.call", etc. Name is the captured symbol text (e.g., the function name).

type Tagger

type Tagger struct {
	// contains filtered or unexported fields
}

Tagger extracts symbol definitions and references from source code using tree-sitter tags queries. It is the tagging counterpart to Highlighter.

Tags queries use a convention where captures follow the pattern:

  • @name captures the symbol name (e.g., function identifier)
  • @definition.X or @reference.X captures the kind

Example query:

(function_declaration name: (identifier) @name) @definition.function
(call_expression function: (identifier) @name) @reference.call

func NewTagger

func NewTagger(lang *Language, tagsQuery string, opts ...TaggerOption) (*Tagger, error)

NewTagger creates a Tagger for the given language and tags query.

func (*Tagger) Tag

func (tg *Tagger) Tag(source []byte) []Tag

Tag parses source and returns all tags.

func (*Tagger) TagIncremental

func (tg *Tagger) TagIncremental(source []byte, oldTree *Tree) ([]Tag, *Tree)

TagIncremental re-tags source after edits to oldTree. Returns the tags and the new tree for subsequent incremental calls.

func (*Tagger) TagTree

func (tg *Tagger) TagTree(tree *Tree) []Tag

TagTree extracts tags from an already-parsed tree.

type TaggerOption

type TaggerOption func(*Tagger)

TaggerOption configures a Tagger.

func WithTaggerTokenSourceFactory

func WithTaggerTokenSourceFactory(factory func(source []byte) TokenSource) TaggerOption

WithTaggerTokenSourceFactory sets a factory function that creates a TokenSource for each Tag call.

type Token

type Token struct {
	Symbol     Symbol
	Text       string
	StartByte  uint32
	EndByte    uint32
	StartPoint Point
	EndPoint   Point
	Missing    bool
	// NoLookahead marks a synthetic EOF used to force EOF-table reductions
	// without consuming input, matching tree-sitter's lex_state = -1.
	NoLookahead bool
}

Token is a lexed token with position info.

type TokenSource

type TokenSource interface {
	// Next returns the next token. It should skip whitespace and comments
	// as appropriate for the language. Returns a zero-Symbol token at EOF.
	Next() Token
}

TokenSource provides tokens to the parser. This interface abstracts over different lexer implementations: the built-in DFA lexer (for hand-built grammars) or custom bridges like GoTokenSource (for real grammars where the C lexer DFA cannot be extracted).

type TokenSourceRebuilder added in v0.7.0

type TokenSourceRebuilder interface {
	RebuildTokenSource(source []byte, lang *Language) (TokenSource, error)
}

TokenSourceRebuilder is an optional extension for token sources that can build a fresh equivalent token source for another source buffer. Result normalization uses this to reparse isolated fragments with the same lexer backend as the original parse.

type Tree

type Tree struct {
	// contains filtered or unexported fields
}

Tree holds a complete syntax tree along with its source text and language. Tree is safe for concurrent reads after construction. Edit and Release are not safe for concurrent use.

func NewTree

func NewTree(root *Node, source []byte, lang *Language) *Tree

NewTree creates a new Tree.

func (*Tree) ChangedRanges added in v0.6.0

func (t *Tree) ChangedRanges() []Range

ChangedRanges converts this tree's recorded edits into changed source ranges. Overlapping ranges are coalesced.
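
Coalescing can be sketched over byte offsets alone. byteRange and coalesce are hypothetical names, and the sketch merges only strictly overlapping ranges; whether the package also merges ranges that merely touch is not stated here.

```go
package main

import (
	"fmt"
	"sort"
)

// byteRange is a hypothetical half-open span [start, end).
type byteRange struct{ start, end uint32 }

// coalesce sorts ranges by start and merges every run of overlapping ranges.
func coalesce(ranges []byteRange) []byteRange {
	if len(ranges) == 0 {
		return nil
	}
	sorted := append([]byteRange(nil), ranges...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	out := []byteRange{sorted[0]}
	for _, r := range sorted[1:] {
		last := &out[len(out)-1]
		if r.start < last.end { // overlaps the range being built
			if r.end > last.end {
				last.end = r.end
			}
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	merged := coalesce([]byteRange{{15, 30}, {40, 50}, {10, 20}})
	fmt.Println(merged) // prints "[{10 30} {40 50}]"
}
```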

func (*Tree) Copy added in v0.7.0

func (t *Tree) Copy() *Tree

Copy returns an independent copy of this tree.

The copied tree has distinct node objects, so subsequent Tree.Edit calls on either tree do not mutate the other's spans/dirty bits. Source bytes and language pointer are shared (read-only).

func (*Tree) DOT added in v0.7.0

func (t *Tree) DOT(lang *Language) string

DOT returns a DOT graph representation of this tree.

func (*Tree) Edit

func (t *Tree) Edit(edit InputEdit)

Edit records an edit on this tree. Call this before ParseIncremental to inform the parser which regions changed. The edit adjusts byte offsets and marks overlapping nodes as dirty so the incremental parser knows what to re-parse.

func (*Tree) Edits

func (t *Tree) Edits() []InputEdit

Edits returns the pending edits recorded on this tree.

func (*Tree) Language

func (t *Tree) Language() *Language

Language returns the language used to parse this tree.

func (*Tree) ParseRuntime added in v0.6.0

func (t *Tree) ParseRuntime() ParseRuntime

ParseRuntime returns parser-loop diagnostics captured when this tree was built.

func (*Tree) ParseStopReason added in v0.6.0

func (t *Tree) ParseStopReason() ParseStopReason

ParseStopReason reports why parsing terminated.

func (*Tree) ParseStoppedEarly added in v0.6.0

func (t *Tree) ParseStoppedEarly() bool

ParseStoppedEarly reports whether parsing hit an early-stop condition.

func (*Tree) Release

func (t *Tree) Release()

Release decrements arena references held by this tree. After Release, the tree should be treated as invalid and not reused.

func (*Tree) RootNode

func (t *Tree) RootNode() *Node

RootNode returns the tree's root node.

func (*Tree) RootNodeWithOffset added in v0.7.0

func (t *Tree) RootNodeWithOffset(offsetBytes uint32, offsetExtent Point) *Node

RootNodeWithOffset returns a copy of the root node with all spans shifted by the provided byte and point offsets.

This mirrors tree-sitter C's root-node-with-offset behavior for callers that need to embed a parsed tree at a larger document offset.
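
The point arithmetic involved can be sketched as follows, assuming the same addition rule tree-sitter C uses for points: the column offset applies only to positions on the first row, because later rows start at column 0 of their own line. shiftPoint is a hypothetical helper, and byte offsets are shifted by plain addition.

```go
package main

import "fmt"

// Point is a local copy of the documented type.
type Point struct {
	Row    uint32
	Column uint32
}

// shiftPoint moves p, a position relative to the embedded tree, by offset.
// Rows always add; columns add only when p is on the embedded tree's first row.
func shiftPoint(offset, p Point) Point {
	if p.Row > 0 {
		return Point{Row: offset.Row + p.Row, Column: p.Column}
	}
	return Point{Row: offset.Row, Column: offset.Column + p.Column}
}

func main() {
	offset := Point{Row: 10, Column: 4}
	fmt.Println(shiftPoint(offset, Point{Row: 0, Column: 3})) // prints "{10 7}"
	fmt.Println(shiftPoint(offset, Point{Row: 2, Column: 3})) // prints "{12 3}"
}
```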

func (*Tree) Source

func (t *Tree) Source() []byte

Source returns the original source text.

func (*Tree) WriteDOT added in v0.7.0

func (t *Tree) WriteDOT(w io.Writer, lang *Language) error

WriteDOT writes a DOT graph representation of this tree to w.

type TreeCursor added in v0.6.0

type TreeCursor struct {
	// contains filtered or unexported fields
}

TreeCursor provides stateful, O(1) tree navigation. It maintains a stack of (node, childIndex) frames enabling efficient parent, child, and sibling movement without scanning.

The cursor holds pointers to Nodes. If the underlying Tree is released, edited, or replaced via incremental reparse, the cursor should be recreated.

func NewTreeCursor added in v0.6.0

func NewTreeCursor(node *Node, tree *Tree) *TreeCursor

NewTreeCursor creates a cursor starting at the given node. The optional tree reference enables field name resolution and text extraction.

func NewTreeCursorFromTree added in v0.6.0

func NewTreeCursorFromTree(tree *Tree) *TreeCursor

NewTreeCursorFromTree creates a cursor starting at the tree's root node.

func (*TreeCursor) Copy added in v0.6.0

func (c *TreeCursor) Copy() *TreeCursor

Copy returns an independent copy of the cursor. The copy shares the same tree reference but has its own navigation stack.

func (*TreeCursor) CurrentFieldID added in v0.6.0

func (c *TreeCursor) CurrentFieldID() FieldID

CurrentFieldID returns the field ID of the current node within its parent. Returns 0 if the cursor is at the root or the node has no field assignment.

func (*TreeCursor) CurrentFieldName added in v0.6.0

func (c *TreeCursor) CurrentFieldName() string

CurrentFieldName returns the field name of the current node within its parent. Returns "" if no tree is associated, the cursor is at the root, or the node has no field assignment.

func (*TreeCursor) CurrentNode added in v0.6.0

func (c *TreeCursor) CurrentNode() *Node

CurrentNode returns the node the cursor is currently pointing to.

func (*TreeCursor) CurrentNodeIsNamed added in v0.6.0

func (c *TreeCursor) CurrentNodeIsNamed() bool

CurrentNodeIsNamed returns whether the current node is a named node.

func (*TreeCursor) CurrentNodeText added in v0.6.0

func (c *TreeCursor) CurrentNodeText() string

CurrentNodeText returns the source text of the current node. Requires a tree with source to be associated.

func (*TreeCursor) CurrentNodeType added in v0.6.0

func (c *TreeCursor) CurrentNodeType() string

CurrentNodeType returns the type name of the current node. Requires a tree with a language to be associated.

func (*TreeCursor) Depth added in v0.6.0

func (c *TreeCursor) Depth() int

Depth returns the cursor's current depth (0 at the root).

func (*TreeCursor) GotoChildByFieldID added in v0.6.0

func (c *TreeCursor) GotoChildByFieldID(fid FieldID) bool

GotoChildByFieldID moves the cursor to the first child with the given field ID. Returns false if no child has that field.

func (*TreeCursor) GotoChildByFieldName added in v0.6.0

func (c *TreeCursor) GotoChildByFieldName(name string) bool

GotoChildByFieldName moves the cursor to the first child with the given field name. Returns false if the tree has no language, the field name is unknown, or no child has that field.

func (*TreeCursor) GotoFirstChild added in v0.6.0

func (c *TreeCursor) GotoFirstChild() bool

GotoFirstChild moves the cursor to the first child of the current node. Returns false if the current node has no children.

func (*TreeCursor) GotoFirstChildForByte added in v0.6.0

func (c *TreeCursor) GotoFirstChildForByte(targetByte uint32) int64

GotoFirstChildForByte moves the cursor to the first child whose byte range contains targetByte (i.e., first child where endByte > targetByte). Returns the child index, or -1 when no child contains the byte.
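
The selection rule (first child where endByte > targetByte) can be sketched with a linear scan. The child type and firstChildForByte are hypothetical; the real cursor may search differently.

```go
package main

import "fmt"

// child is a hypothetical stand-in for a node's byte span.
type child struct{ startByte, endByte uint32 }

// firstChildForByte returns the index of the first child whose end lies
// beyond target, or -1 when every child ends at or before target.
func firstChildForByte(children []child, target uint32) int64 {
	for i, c := range children {
		if c.endByte > target {
			return int64(i)
		}
	}
	return -1
}

func main() {
	children := []child{{0, 4}, {5, 9}, {10, 14}}
	fmt.Println(firstChildForByte(children, 2))  // prints "0"
	fmt.Println(firstChildForByte(children, 4))  // prints "1": byte 4 is past the first child's end
	fmt.Println(firstChildForByte(children, 14)) // prints "-1"
}
```

Because the rule tests only endByte, a target that falls in a gap between children selects the next child after the gap.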

func (*TreeCursor) GotoFirstChildForPoint added in v0.6.0

func (c *TreeCursor) GotoFirstChildForPoint(targetPoint Point) int64

GotoFirstChildForPoint moves the cursor to the first child whose point range contains targetPoint (i.e., first child where endPoint > targetPoint). Returns the child index, or -1 when no child contains the point.

func (*TreeCursor) GotoFirstNamedChild added in v0.6.0

func (c *TreeCursor) GotoFirstNamedChild() bool

GotoFirstNamedChild moves the cursor to the first named child of the current node, skipping anonymous nodes. Returns false if no named child exists.

func (*TreeCursor) GotoLastChild added in v0.6.0

func (c *TreeCursor) GotoLastChild() bool

GotoLastChild moves the cursor to the last child of the current node. Returns false if the current node has no children.

func (*TreeCursor) GotoLastNamedChild added in v0.6.0

func (c *TreeCursor) GotoLastNamedChild() bool

GotoLastNamedChild moves the cursor to the last named child of the current node, skipping anonymous nodes. Returns false if no named child exists.

func (*TreeCursor) GotoNextNamedSibling added in v0.6.0

func (c *TreeCursor) GotoNextNamedSibling() bool

GotoNextNamedSibling moves the cursor to the next named sibling, skipping anonymous nodes. Returns false if no named sibling follows.

func (*TreeCursor) GotoNextSibling added in v0.6.0

func (c *TreeCursor) GotoNextSibling() bool

GotoNextSibling moves the cursor to the next sibling. Returns false if the cursor is at the root or the last sibling.

func (*TreeCursor) GotoParent added in v0.6.0

func (c *TreeCursor) GotoParent() bool

GotoParent moves the cursor to the parent of the current node. Returns false if the cursor is at the root.

func (*TreeCursor) GotoPrevNamedSibling added in v0.6.0

func (c *TreeCursor) GotoPrevNamedSibling() bool

GotoPrevNamedSibling moves the cursor to the previous named sibling, skipping anonymous nodes. Returns false if no named sibling precedes.

func (*TreeCursor) GotoPrevSibling added in v0.6.0

func (c *TreeCursor) GotoPrevSibling() bool

GotoPrevSibling moves the cursor to the previous sibling. Returns false if the cursor is at the root or the first sibling.

func (*TreeCursor) Reset added in v0.6.0

func (c *TreeCursor) Reset(node *Node)

Reset resets the cursor to a new root node, clearing the navigation stack.

func (*TreeCursor) ResetTree added in v0.6.0

func (c *TreeCursor) ResetTree(tree *Tree)

ResetTree resets the cursor to the root of a new tree.

type WalkAction

type WalkAction int

WalkAction controls the tree walk behavior.

const (
	// WalkContinue continues the walk to children and siblings.
	WalkContinue WalkAction = iota
	// WalkSkipChildren skips the current node's children but continues to siblings.
	WalkSkipChildren
	// WalkStop terminates the walk entirely.
	WalkStop
)
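
A depth-first walk driven by these three actions can be sketched as below. WalkAction and its constants are copied from the declaration above; the node type and the walk function are hypothetical and do not reproduce the package's own walker.

```go
package main

import "fmt"

// Local copy of the documented declaration.
type WalkAction int

const (
	WalkContinue WalkAction = iota
	WalkSkipChildren
	WalkStop
)

// node is a hypothetical minimal tree node.
type node struct {
	kind     string
	children []*node
}

// walk visits n and its descendants depth-first.
// It returns false once the callback has asked to stop.
func walk(n *node, visit func(*node) WalkAction) bool {
	switch visit(n) {
	case WalkStop:
		return false
	case WalkSkipChildren:
		return true // keep walking siblings, but do not descend
	}
	for _, c := range n.children {
		if !walk(c, visit) {
			return false
		}
	}
	return true
}

func main() {
	root := &node{kind: "source_file", children: []*node{
		{kind: "function_declaration", children: []*node{{kind: "block"}}},
		{kind: "comment"},
		{kind: "type_declaration"},
	}}

	var seen []string
	walk(root, func(n *node) WalkAction {
		seen = append(seen, n.kind)
		switch n.kind {
		case "function_declaration":
			return WalkSkipChildren
		case "comment":
			return WalkStop
		}
		return WalkContinue
	})
	fmt.Println(seen) // prints "[source_file function_declaration comment]"
}
```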

Directories

Path Synopsis
cmd
benchgate command
benchmatrix command
gen_linguist command
Command gen_linguist generates grammars/linguist_gen.go by matching gotreesitter grammar names to GitHub Linguist's languages.yml.
grammar_updater command
Command grammar_updater refreshes pinned grammar commits in grammars/languages.lock and emits a machine-readable update report.
harnessgate command
parity_report command
perfprobe command
ts2go command
Command ts2go reads a tree-sitter generated parser.c file and outputs a Go source file containing a function that returns a populated *gotreesitter.Language with all extracted parse tables.
tsquery command
Command tsquery generates type-safe Go code from tree-sitter .scm query files.
Package grammars provides 206 embedded tree-sitter grammars as compressed binary blobs with lazy loading.
Package grep provides structural code search, match, and rewrite using tree-sitter parse trees.
