gotreesitter

Version: v0.15.3 (latest). Published: Apr 26, 2026. License: MIT. Imports: 20. Imported by: 26.

Repository

github.com/odvcencio/gotreesitter

README

gotreesitter

Pure-Go tree-sitter runtime. No CGo, no C toolchain. Cross-compiles to any GOOS/GOARCH target Go supports, including wasip1.

go get github.com/odvcencio/gotreesitter

gotreesitter loads the same parse-table format that tree-sitter's C runtime uses. Grammar tables are extracted from upstream parser.c files by ts2go, compressed into binary blobs, and deserialized on first use. 206 grammars ship in the registry.

Motivation

Every Go tree-sitter binding in the ecosystem depends on CGo:

Cross-compilation requires a C cross-toolchain per target. GOOS=wasip1, GOARCH=arm64 from a Linux host, or any Windows build without MSYS2/MinGW, will not link.
CI images must carry gcc and the grammar's C sources. go install fails for downstream users who don't have a C compiler.
The Go race detector, coverage instrumentation, and fuzzer cannot see across the CGo boundary. Bugs in the C runtime or in FFI marshaling are invisible to go test -race.

gotreesitter eliminates the C dependency entirely. The parser, lexer, query engine, incremental reparsing, arena allocator, external scanners, and tree cursor are all implemented in Go. The only input is the grammar blob.

Quick start

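package main

import (
    "fmt"
    "log"

    "github.com/odvcencio/gotreesitter"
    "github.com/odvcencio/gotreesitter/grammars"
)

func main() {
    src := []byte(`package main

func main() {}
`)

    lang := grammars.GoLanguage()
    parser := gotreesitter.NewParser(lang)

    // Parse returns (*Tree, error); check the error instead of discarding it.
    tree, err := parser.Parse(src)
    if err != nil {
        log.Fatal(err)
    }

    // SExpr renders the tree as an S-expression using the language's node names.
    fmt.Println(tree.RootNode().SExpr(lang))
}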

grammars.DetectLanguage("main.go") resolves a filename to the appropriate LangEntry.

Queries

q, _ := gotreesitter.NewQuery(`(function_declaration name: (identifier) @fn)`, lang)
cursor := q.Exec(tree.RootNode(), lang, src)

for {
    match, ok := cursor.NextMatch()
    if !ok {
        break
    }
    // Named "capture" rather than "cap" to avoid shadowing the builtin.
    for _, capture := range match.Captures {
        fmt.Println(capture.Node.Text(src))
    }
}

The query engine supports the full S-expression pattern language: structural quantifiers (?, *, +), alternation ([...]), field constraints, negated fields (!field), anchors (.), and the standard predicates listed under Query API.

Typed query codegen

Generate type-safe Go wrappers from .scm query files:

go run ./cmd/tsquery -input queries/go_functions.scm -lang go -output go_functions_query.go -package queries

Given a query like (function_declaration name: (identifier) @name body: (block) @body), tsquery generates:

type FunctionDeclarationMatch struct {
    Name *gotreesitter.Node
    Body *gotreesitter.Node
}

q, _ := queries.NewGoFunctionsQuery(lang)
cursor := q.Exec(tree.RootNode(), lang, src)
for {
    match, ok := cursor.Next()
    if !ok { break }
    fmt.Println(match.Name.Text(src))
}

Multi-pattern queries generate one struct per pattern with MatchPatternN conversion helpers.

Multi-language documents (injection parsing)

Parse documents with embedded languages (HTML+JS+CSS, Markdown+code fences, Vue/Svelte templates):

ip := gotreesitter.NewInjectionParser()
ip.RegisterLanguage("html", htmlLang)
ip.RegisterLanguage("javascript", jsLang)
ip.RegisterLanguage("css", cssLang)
ip.RegisterInjectionQuery("html", injectionQuery)

result, _ := ip.Parse(source, "html")

for _, inj := range result.Injections {
    fmt.Printf("%s: %d ranges\n", inj.Language, len(inj.Ranges))
    // inj.Tree is the child language's parse tree
}

Supports static (#set! injection.language "javascript") and dynamic (@injection.language capture) language detection, recursive nested injections, and incremental reparse with child tree reuse.

Source rewriting

Collect source-level edits and apply atomically, producing InputEdit records for incremental reparse:

rw := gotreesitter.NewRewriter(src)
rw.Replace(funcNameNode, []byte("newName"))
rw.InsertBefore(bodyNode, []byte("// added\n"))
rw.Delete(unusedNode)

newSrc, _ := rw.ApplyToTree(tree)
newTree, _ := parser.ParseIncremental(newSrc, tree)

Apply() returns both the new source bytes and the []InputEdit records. ApplyToTree() is a convenience that calls tree.Edit() for each edit and returns source ready for ParseIncremental.
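
The collect-then-apply idea can be illustrated without the library. The following is a minimal sketch, assuming edits are plain byte ranges; the edit type and the apply function are hypothetical stand-ins written for this example, not the Rewriter's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// edit replaces src[start:end] with text. An insertion has start == end and a
// deletion has an empty text. Hypothetical type, not part of gotreesitter.
type edit struct {
	start, end int
	text       []byte
}

var errInvalidEdit = errors.New("overlapping or out-of-range edit")

// apply sorts the edits by start offset, rejects overlapping or out-of-range
// ranges, and builds the new source in a single left-to-right pass.
func apply(src []byte, edits []edit) ([]byte, error) {
	sorted := append([]edit(nil), edits...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	out := make([]byte, 0, len(src))
	pos := 0 // end of the previous edit in the original source
	for _, e := range sorted {
		if e.start < pos || e.end < e.start || e.end > len(src) {
			return nil, errInvalidEdit
		}
		out = append(out, src[pos:e.start]...) // untouched bytes before the edit
		out = append(out, e.text...)           // replacement bytes
		pos = e.end
	}
	return append(out, src[pos:]...), nil
}

func main() {
	src := []byte("func old() {}")
	out, err := apply(src, []edit{
		{start: 12, end: 12, text: []byte("/* added */")}, // insert before "}"
		{start: 5, end: 8, text: []byte("renamed")},       // replace "old"
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", out)
}
```

Because nothing is written until every range has been validated, a rejected batch leaves the original source untouched, which is the property the word "atomically" refers to in this sketch.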

Incremental reparsing

tree, _ := parser.Parse(src)

// User types "x" at byte offset 42
src = append(src[:42], append([]byte("x"), src[42:]...)...)

tree.Edit(gotreesitter.InputEdit{
    StartByte:   42,
    OldEndByte:  42,
    NewEndByte:  43,
    StartPoint:  gotreesitter.Point{Row: 3, Column: 10},
    OldEndPoint: gotreesitter.Point{Row: 3, Column: 10},
    NewEndPoint: gotreesitter.Point{Row: 3, Column: 11},
})

tree2, _ := parser.ParseIncremental(src, tree)

ParseIncremental walks the old tree's spine, identifies the edit region, and reuses unchanged subtrees by reference. Only the invalidated span is re-lexed and re-parsed. Both leaf and non-leaf subtrees are eligible for reuse; non-leaf reuse is driven by pre-goto state tracking on interior nodes, so the parser can skip entire subtrees without re-deriving their contents.

When no edit has been applied to the old tree, ParseIncremental detects this with a pointer check and returns in single-digit nanoseconds with zero allocations.
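
The example above hardcodes the row and column values. The sketch below shows one way they could be derived from a byte offset, assuming zero-based rows and byte-based columns. Point and InputEdit here are local stand-ins that reuse the field names from the example, and pointAt and insertionEdit are hypothetical helpers, not library functions.

```go
package main

import (
	"bytes"
	"fmt"
)

// Point and InputEdit mirror the field names used in the example above.
// They are local stand-ins, not the library's types.
type Point struct {
	Row, Column uint32
}

type InputEdit struct {
	StartByte, OldEndByte, NewEndByte    uint32
	StartPoint, OldEndPoint, NewEndPoint Point
}

// pointAt returns the zero-based row and byte column of offset in src.
func pointAt(src []byte, offset int) Point {
	prefix := src[:offset]
	row := bytes.Count(prefix, []byte("\n"))
	lineStart := bytes.LastIndexByte(prefix, '\n') + 1 // 0 when there is no newline
	return Point{Row: uint32(row), Column: uint32(offset - lineStart)}
}

// insertionEdit inserts text at offset and returns the new source together
// with the edit record that describes the change.
func insertionEdit(src []byte, offset int, text []byte) ([]byte, InputEdit) {
	out := make([]byte, 0, len(src)+len(text))
	out = append(out, src[:offset]...)
	out = append(out, text...)
	out = append(out, src[offset:]...)

	start := pointAt(src, offset)
	return out, InputEdit{
		StartByte:   uint32(offset),
		OldEndByte:  uint32(offset), // a pure insertion removes nothing
		NewEndByte:  uint32(offset + len(text)),
		StartPoint:  start,
		OldEndPoint: start,
		NewEndPoint: pointAt(out, offset+len(text)),
	}
}

func main() {
	src := []byte("package main\n\nfunc main() {}\n")
	newSrc, edit := insertionEdit(src, 14, []byte("x"))
	fmt.Printf("%q\n%+v\n", newSrc, edit)
}
```

With the real library, the computed values would be copied into a gotreesitter.InputEdit before calling tree.Edit.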

Tree cursor

TreeCursor maintains an explicit (node, childIndex) frame stack. Parent, child, and sibling movement are O(1) with zero allocations — sibling traversal indexes directly into the parent's children[] slice.

c := gotreesitter.NewTreeCursorFromTree(tree)

c.GotoFirstChild()
c.GotoChildByFieldName("body")

for ok := c.GotoFirstNamedChild(); ok; ok = c.GotoNextNamedSibling() {
    fmt.Printf("%s at %d\n", c.CurrentNodeType(), c.CurrentNode().StartByte())
}

idx := c.GotoFirstChildForByte(128)

Movement methods: GotoFirstChild, GotoLastChild, GotoNextSibling, GotoPrevSibling, GotoParent, named-only variants (GotoFirstNamedChild, etc.), field-based (GotoChildByFieldName, GotoChildByFieldID), and position-based (GotoFirstChildForByte, GotoFirstChildForPoint).

Cursors hold direct pointers into tree nodes. Recreate after Tree.Release(), Tree.Edit(...), or incremental reparse.
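
To make the frame-stack idea concrete, here is a toy cursor over a hand-built tree. It is a sketch of the general technique under simple assumptions (every node stores its children in a slice); the node, frame, and cursor types are hypothetical and much smaller than the library's TreeCursor.

```go
package main

import "fmt"

// node is a toy tree node, not the library's Node type.
type node struct {
	kind     string
	children []*node
}

// frame records a node and its index within the parent's children slice.
type frame struct {
	n          *node
	childIndex int
}

// cursor keeps the path from the root to the current node.
type cursor struct {
	stack []frame
}

func newCursor(root *node) *cursor { return &cursor{stack: []frame{{n: root}}} }

func (c *cursor) current() *node { return c.stack[len(c.stack)-1].n }

// gotoFirstChild pushes a frame for child 0 of the current node.
func (c *cursor) gotoFirstChild() bool {
	cur := c.current()
	if len(cur.children) == 0 {
		return false
	}
	c.stack = append(c.stack, frame{n: cur.children[0]})
	return true
}

// gotoNextSibling rewrites the top frame in place by indexing the parent's slice.
func (c *cursor) gotoNextSibling() bool {
	if len(c.stack) < 2 {
		return false // the root has no siblings
	}
	top := &c.stack[len(c.stack)-1]
	parent := c.stack[len(c.stack)-2].n
	next := top.childIndex + 1
	if next >= len(parent.children) {
		return false
	}
	top.n, top.childIndex = parent.children[next], next
	return true
}

// gotoParent pops the top frame.
func (c *cursor) gotoParent() bool {
	if len(c.stack) < 2 {
		return false
	}
	c.stack = c.stack[:len(c.stack)-1]
	return true
}

// childKinds visits the direct children of the current node and then returns
// the cursor to where it started.
func childKinds(c *cursor) []string {
	var kinds []string
	if !c.gotoFirstChild() {
		return kinds
	}
	for {
		kinds = append(kinds, c.current().kind)
		if !c.gotoNextSibling() {
			break
		}
	}
	c.gotoParent()
	return kinds
}

func main() {
	root := &node{kind: "source_file", children: []*node{
		{kind: "package_clause"}, {kind: "function_declaration"},
	}}
	fmt.Println(childKinds(newCursor(root)))
}
```

In this toy version the stack slice can still grow on a push; a production cursor would presumably preallocate or reuse it. The sketch also shows why cursors go stale: each frame holds a direct pointer into the tree.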

Highlighting

hl, _ := gotreesitter.NewHighlighter(lang, highlightQuery)
ranges := hl.Highlight(src)

for _, r := range ranges {
    fmt.Printf("%s: %q\n", r.Capture, src[r.StartByte:r.EndByte])
}

Tagging

entry := grammars.DetectLanguage("main.go")
lang := entry.Language()

tagger, _ := gotreesitter.NewTagger(lang, entry.TagsQuery)
tags := tagger.Tag(src)

for _, tag := range tags {
    fmt.Printf("%s %s at %d:%d\n", tag.Kind, tag.Name,
        tag.NameRange.StartPoint.Row, tag.NameRange.StartPoint.Column)
}

Benchmarks

All measurements below use the same workload: a generated Go source file with 500 functions (19294 bytes). Numbers are medians from 10 runs on:

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) Ultra 9 285

Runtime	Full parse	Incremental (1-byte edit)	Incremental (no edit)
Native C (pure C runtime)	1.76 ms	102.3 μs	101.7 μs
CGo binding (C runtime via cgo)	~2.0 ms	~130 μs	—
gotreesitter (pure Go)	4.20 ms	1.49 μs	2.18 ns

On this workload:

Full parse is ~2.4x slower than native C.
Incremental single-byte edits are ~69x faster than native C (~87x faster than CGo).
No-edit reparses are ~46,600x faster than native C, zero allocations.
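
The ratios above follow directly from the medians in the table. The short program below redoes the arithmetic; the constant names are made up for this example and the values are copied from the table, converted to nanoseconds.

```go
package main

import "fmt"

// speedup reports how many times larger slow is than fast.
func speedup(slow, fast float64) float64 {
	return slow / fast
}

func main() {
	// Medians from the table above, in nanoseconds.
	const (
		nativeFull   = 1.76e6
		nativeIncr   = 102.3e3
		nativeNoEdit = 101.7e3
		cgoIncr      = 130e3
		goFull       = 4.20e6
		goIncr       = 1.49e3
		goNoEdit     = 2.18
	)

	fmt.Printf("full parse: %.1fx slower than native C\n", speedup(goFull, nativeFull))
	fmt.Printf("1-byte edit: %.0fx faster than native C\n", speedup(nativeIncr, goIncr))
	fmt.Printf("1-byte edit: %.0fx faster than CGo\n", speedup(cgoIncr, goIncr))
	fmt.Printf("no edit: %.0fx faster than native C\n", speedup(nativeNoEdit, goNoEdit))
}
```

It prints 2.4x, 69x, 87x, and 46651x, which round to the figures quoted in the list.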

Raw benchmark output

# Pure Go (this repo):
GOMAXPROCS=1 go test . -run '^$' \
  -bench 'BenchmarkGoParseFullDFA|BenchmarkGoParseIncrementalSingleByteEditDFA|BenchmarkGoParseIncrementalNoEditDFA' \
  -benchmem -count=10 -benchtime=1s

# CGo binding benchmarks:
cd cgo_harness
GOMAXPROCS=1 go test . -run '^$' -tags treesitter_c_bench \
  -bench 'BenchmarkCTreeSitterGoParseFull|BenchmarkCTreeSitterGoParseIncrementalSingleByteEdit|BenchmarkCTreeSitterGoParseIncrementalNoEdit' \
  -benchmem -count=10 -benchtime=750ms

# Native C benchmarks (no Go, direct C binary):
./pure_c/run_go_benchmark.sh 500 2000 20000

Benchmark	Median ns/op	B/op	allocs/op
Native C full parse	1,764,436	—	—
Native C incremental (1-byte edit)	102,336	—	—
Native C incremental (no edit)	101,740	—	—
`CTreeSitterGoParseFull`	~1,990,000	600	6
`CTreeSitterGoParseIncrementalSingleByteEdit`	~130,000	648	7
`GoParseFullDFA`	4,197,811	585	7
`GoParseIncrementalSingleByteEditDFA`	1,490	1,584	9
`GoParseIncrementalNoEditDFA`	2.181	0	0

Benchmark matrix

For repeatable multi-workload tracking:

go run ./cmd/benchmatrix --count 10

Emits bench_out/matrix.json (machine-readable), bench_out/matrix.md (summary), and raw logs under bench_out/raw/.

Supported languages

206 grammars ship in the registry. All 206 produce error-free parse trees on smoke samples. Run go run ./cmd/parity_report for current status.

116 external scanners (hand-written Go implementations of upstream C scanners)
7 hand-written Go token sources (authzed, c, cpp, go, java, json, lua)
Remaining languages use the DFA lexer generated from grammar tables

Parse quality

Each LangEntry carries a Quality field:

Quality	Meaning
`full`	All scanner and lexer components present. Parser has full access to the grammar.
`partial`	Missing external scanner. DFA lexer handles what it can; external tokens are skipped.
`none`	Cannot parse.

full means the parser has every component the grammar requires. It does not guarantee error-free trees on all inputs — grammars with high GLR ambiguity may produce syntax errors on very large or deeply nested constructs due to parser safety limits (iteration cap, stack depth cap, node count cap). These limits scale with input size. Check tree.RootNode().HasError() at runtime.

Full language list (206)

ada, agda, angular, apex, arduino, asm, astro, authzed, awk, bash, bass, beancount, bibtex, bicep, bitbake, blade, brightscript, c, c_sharp, caddy, cairo, capnp, chatito, circom, clojure, cmake, cobol, comment, commonlisp, cooklang, corn, cpon, cpp, crystal, css, csv, cuda, cue, cylc, d, dart, desktop, devicetree, dhall, diff, disassembly, djot, dockerfile, dot, doxygen, dtd, earthfile, ebnf, editorconfig, eds, eex, elisp, elixir, elm, elsa, embedded_template, enforce, erlang, facility, faust, fennel, fidl, firrtl, fish, foam, forth, fortran, fsharp, gdscript, git_config, git_rebase, gitattributes, gitcommit, gitignore, gleam, glsl, gn, go, godot_resource, gomod, graphql, groovy, hack, hare, haskell, haxe, hcl, heex, hlsl, html, http, hurl, hyprlang, ini, janet, java, javascript, jinja2, jq, jsdoc, json, json5, jsonnet, julia, just, kconfig, kdl, kotlin, ledger, less, linkerscript, liquid, llvm, lua, luau, make, markdown, markdown_inline, matlab, mermaid, meson, mojo, move, nginx, nickel, nim, ninja, nix, norg, nushell, objc, ocaml, odin, org, pascal, pem, perl, php, pkl, powershell, prisma, prolog, promql, properties, proto, pug, puppet, purescript, python, ql, r, racket, regex, rego, requirements, rescript, robot, ron, rst, ruby, rust, scala, scheme, scss, smithy, solidity, sparql, sql, squirrel, ssh_config, starlark, svelte, swift, tablegen, tcl, teal, templ, textproto, thrift, tlaplus, tmux, todotxt, toml, tsx, turtle, twig, typescript, typst, uxntal, v, verilog, vhdl, vimdoc, vue, wat, wgsl, wolfram, xml, yaml, yuck, zig

Query API

Feature	Status
Compile + execute (`NewQuery`, `Execute`, `ExecuteNode`)	supported
Cursor streaming (`Exec`, `NextMatch`, `NextCapture`)	supported
Structural quantifiers (`?`, `*`, `+`)	supported
Alternation (`[...]`)	supported
Field matching (`name: (identifier)`)	supported
`#eq?` / `#not-eq?`	supported
`#match?` / `#not-match?`	supported
`#any-of?` / `#not-any-of?`	supported
`#lua-match?`	supported
`#has-ancestor?` / `#not-has-ancestor?`	supported
`#not-has-parent?`	supported
`#is?` / `#is-not?`	supported
`#any-eq?` / `#any-not-eq?`	supported
`#any-match?` / `#any-not-match?`	supported
`#select-adjacent!`	supported
`#strip!`	supported
`#set!` / `#offset!` directives	parsed and accepted
`SetValues` (read `#set!` metadata from matches)	supported

All shipped highlight and tags queries compile (156/156 highlight, 69/69 tags).

Known limitations

Full-parse throughput: ~2.4x slower than the C runtime on cold full parses (the 500-function Go benchmark). Incremental reparsing, the dominant operation in editor workloads, is ~69x faster than the C runtime on the same benchmark (single-byte edits).
GLR safety caps: The parser enforces iteration, stack depth, and node count limits proportional to input size. These prevent pathological blowup on grammars with high ambiguity but impose a ceiling on the maximum input complexity that parses without error. The caps are tunable but not removable without risking unbounded resource consumption.

Adding a language

Add the grammar repo to grammars/languages.manifest
Refresh pinned refs in grammars/languages.lock: go run ./cmd/grammar_updater -lock grammars/languages.lock -write -report grammars/grammar_updates.json
Generate tables: go run ./cmd/ts2go -manifest grammars/languages.manifest -outdir ./grammars -package grammars -compact=true
Add smoke samples to cmd/parity_report/main.go and grammars/parse_support_test.go
Verify: go run ./cmd/parity_report && go test ./grammars/...

Grammar lock updates

grammars/languages.lock stores pinned refs for grammar update + parity automation.
cmd/grammar_updater refreshes refs and emits a machine-readable report.
.github/workflows/grammar-lock-update.yml opens scheduled/dispatch update PRs.

Manual refresh:

go run ./cmd/grammar_updater \
  -lock grammars/languages.lock \
  -allow-list grammars/update_tier1_core100.txt \
  -max-updates 10 \
  -write \
  -report grammars/grammar_updates.json

Architecture

gotreesitter is a ground-up reimplementation of the tree-sitter runtime in Go. No code is shared with or translated from the C implementation.

Parser — Table-driven LR(1) with GLR fallback. When a (state, symbol) pair maps to multiple actions in the parse table, the parser forks the stack and explores all alternatives in parallel. Stack merging collapses equivalent paths. Safety limits (iteration count, stack depth, node count) scale with input size and prevent runaway exploration on ambiguous grammars.

Incremental engine — Walks the edit region of the previous tree and reuses unchanged subtrees by reference. Non-leaf subtree reuse is enabled by storing a pre-goto parser state on each interior node, allowing the parser to skip an entire subtree and resume in the correct state without re-deriving its contents. External scanner state is serialized on each node boundary so scanner-dependent subtrees can be reused without replaying the scanner from the start.

Lexer — Two paths. A DFA lexer is generated from the grammar's lex tables by ts2go and handles the majority of languages. For grammars where the DFA is insufficient (e.g., Go's automatic semicolons, YAML's indentation-sensitive structure), hand-written Go token sources implement the TokenSource interface directly.
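
As a rough illustration of what a table-driven DFA lexer does, the sketch below tokenizes identifiers and integers from a hand-built transition table with one 128-entry row per state. The states, table layout, and token kinds are invented for this example and are not the format ts2go emits. In this toy every non-start state is accepting, so the longest match is simply the point where no transition exists.

```go
package main

import "fmt"

const (
	stStart = iota
	stIdent
	stNumber
	numStates
)

const noTransition = -1

type token struct {
	kind, text string
}

// accept maps an accepting state to a token kind; stStart accepts nothing.
var accept = [numStates]string{stIdent: "identifier", stNumber: "number"}

// buildTable fills one 128-entry row per state, indexed by ASCII byte.
func buildTable() [numStates][128]int32 {
	var t [numStates][128]int32
	for s := range t {
		for b := range t[s] {
			t[s][b] = noTransition
		}
	}
	for b := 0; b < 128; b++ {
		c := byte(b)
		if c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') {
			t[stStart][b] = stIdent
			t[stIdent][b] = stIdent
		}
		if c >= '0' && c <= '9' {
			t[stStart][b] = stNumber
			t[stIdent][b] = stIdent // digits may continue an identifier
			t[stNumber][b] = stNumber
		}
	}
	return t
}

// lex skips whitespace and follows transitions until none applies.
func lex(src string) ([]token, error) {
	table := buildTable()
	var out []token
	for pos := 0; pos < len(src); {
		if src[pos] == ' ' || src[pos] == '\n' || src[pos] == '\t' {
			pos++
			continue
		}
		state, end := int32(stStart), pos
		for end < len(src) && src[end] < 128 {
			next := table[state][src[end]]
			if next == noTransition {
				break
			}
			state, end = next, end+1
		}
		if accept[state] == "" {
			return nil, fmt.Errorf("unexpected byte %q at offset %d", src[pos], pos)
		}
		out = append(out, token{kind: accept[state], text: src[pos:end]})
		pos = end
	}
	return out, nil
}

func main() {
	tokens, err := lex("foo 42 _bar9")
	fmt.Println(tokens, err)
}
```

Context-sensitive rules such as automatic semicolon insertion cannot be expressed in a table like this, which is the gap the hand-written token sources fill.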

External scanners — 116 grammars require external scanners for context-sensitive tokens (Python indentation, HTML implicit close tags, Rust raw string delimiters, Swift operator disambiguation, etc.). Each scanner is a hand-written Go implementation of the grammar's ExternalScanner interface: Create, Serialize, Deserialize, Scan. Scanner state is snapshotted after every token and stored on tree nodes so incremental reuse can restore scanner state on skip.
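
The snapshot-and-restore cycle can be sketched with a toy scanner state. The example assumes the state is a stack of indentation widths, as an indentation-sensitive scanner might keep; indentState and its methods are hypothetical and only loosely follow the Serialize/Deserialize shape named above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// indentState is a toy scanner payload: the stack of open indentation widths.
type indentState struct {
	stack []uint16
}

// serialize writes the stack into buf as little-endian uint16 values and
// returns the number of bytes written. As a simplification, it writes nothing
// and returns 0 when buf is too small.
func (s *indentState) serialize(buf []byte) int {
	need := 2 * len(s.stack)
	if need > len(buf) {
		return 0
	}
	for i, width := range s.stack {
		binary.LittleEndian.PutUint16(buf[2*i:], width)
	}
	return need
}

// deserialize replaces the stack with the contents of buf.
func (s *indentState) deserialize(buf []byte) {
	s.stack = s.stack[:0]
	for i := 0; i+1 < len(buf); i += 2 {
		s.stack = append(s.stack, binary.LittleEndian.Uint16(buf[i:]))
	}
}

func main() {
	state := &indentState{stack: []uint16{0, 4, 8}}

	// Snapshot the state at a token boundary.
	buf := make([]byte, 64)
	n := state.serialize(buf)
	snapshot := append([]byte(nil), buf[:n]...)

	// The scanner keeps running and its state changes.
	state.stack = append(state.stack, 12)

	// Restoring the snapshot brings back the state at the boundary.
	state.deserialize(snapshot)
	fmt.Println(state.stack)
}
```

Storing such a snapshot next to a node is what lets a reused subtree hand the scanner its state without rescanning the text before it.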

Arena allocator — Nodes are allocated from slab-based arenas to reduce GC pressure. Arenas are released in bulk when a tree is freed.
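
A slab arena can be sketched in a few lines. This is a generic illustration of the technique, assuming fixed-size slabs and a bump index; the node and arena types and the slab size are invented for the example and say nothing about the library's actual layout.

```go
package main

import "fmt"

// node is a toy payload, not the library's Node type.
type node struct {
	symbol             uint16
	startByte, endByte uint32
}

const slabSize = 256

// arena hands out nodes from fixed-size slabs, so the runtime sees one
// allocation per slab instead of one per node.
type arena struct {
	slabs [][]node
	used  int // nodes handed out from the last slab
}

// alloc returns a pointer to the next free node, adding a slab when needed.
func (a *arena) alloc() *node {
	if len(a.slabs) == 0 || a.used == slabSize {
		a.slabs = append(a.slabs, make([]node, slabSize))
		a.used = 0
	}
	n := &a.slabs[len(a.slabs)-1][a.used]
	a.used++
	return n
}

// release drops every slab at once. Callers should treat pointers obtained
// from alloc as invalid afterwards; the memory becomes collectable once
// nothing references it.
func (a *arena) release() {
	a.slabs, a.used = nil, 0
}

func (a *arena) slabCount() int { return len(a.slabs) }

func main() {
	a := &arena{}
	for i := 0; i < 600; i++ {
		n := a.alloc()
		n.symbol = uint16(i)
	}
	fmt.Println("slabs in use:", a.slabCount())
	a.release()
	fmt.Println("slabs after release:", a.slabCount())
}
```

Six hundred nodes cost three slab allocations here rather than six hundred individual ones, which is where the reduced GC pressure comes from.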

Query engine — S-expression pattern compiler with predicate evaluation and streaming cursor iteration. Supports all standard tree-sitter predicates (#eq?, #match?, #any-of?, #has-ancestor?, etc.) and directive annotations (#set!, #offset!, #select-adjacent!, #strip!).

Injection parser — Orchestrates multi-language parsing. Runs injection queries against a parent tree to find embedded regions, spawns child parsers with SetIncludedRanges(), and recurses for nested injections. Incremental reparse reuses unchanged child trees.

Rewriter — Collects source-level edits (replace, insert, delete) targeting byte ranges, applies them atomically, and produces InputEdit records for incremental reparse. Edits are validated for non-overlap and applied in a single pass.

Grammar loading — ts2go extracts parse tables, lex tables, field maps, symbol metadata, and external token lists from upstream parser.c files. These are serialized to compressed binary blobs under grammars/grammar_blobs/ and lazy-loaded via loadEmbeddedLanguage() with an LRU cache. String and transition interning reduce memory footprint across loaded grammars.
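
Lazy loading behind an LRU cache can be sketched with the standard library's container/list. The lruCache type, its load callback, and the blob names below are hypothetical; the sketch only shows the eviction policy, not how loadEmbeddedLanguage() is implemented.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached blob, keyed by name.
type entry struct {
	name string
	data []byte
}

// lruCache loads values on first use and evicts the least recently used
// entry once more than limit entries are held. limit must be at least 1.
type lruCache struct {
	limit int
	order *list.List               // front is the most recently used entry
	items map[string]*list.Element // name to element holding *entry
	load  func(name string) []byte
}

func newLRU(limit int, load func(string) []byte) *lruCache {
	return &lruCache{
		limit: limit,
		order: list.New(),
		items: make(map[string]*list.Element),
		load:  load,
	}
}

// get returns the cached value for name, loading it on a miss.
func (c *lruCache) get(name string) []byte {
	if el, ok := c.items[name]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).data
	}
	e := &entry{name: name, data: c.load(name)}
	c.items[name] = c.order.PushFront(e)
	if c.order.Len() > c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).name)
	}
	return e.data
}

func main() {
	cache := newLRU(2, func(name string) []byte {
		fmt.Println("loading", name)
		return []byte(name)
	})
	for _, name := range []string{"go.bin", "rust.bin", "go.bin", "json.bin", "rust.bin"} {
		cache.get(name)
	}
}
```

With a limit of two, the run loads go.bin, rust.bin, and json.bin, then loads rust.bin a second time because it was the least recently used entry when json.bin arrived.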

Build tags and environment

External grammar blobs (avoid embedding in the binary):

go build -tags grammar_blobs_external
GOTREESITTER_GRAMMAR_BLOB_DIR=/path/to/blobs  # required
GOTREESITTER_GRAMMAR_BLOB_MMAP=false           # disable mmap (Unix only)

Curated language set (smaller binary):

go build -tags grammar_set_core  # curated Core100 embedded grammar set
GOTREESITTER_GRAMMAR_SET=go,json,python  # runtime restriction

Grammar cache tuning (long-lived processes):

grammars.SetEmbeddedLanguageCacheLimit(8)    // LRU cap
grammars.UnloadEmbeddedLanguage("rust.bin")  // drop one
grammars.PurgeEmbeddedLanguageCache()        // drop all

GOTREESITTER_GRAMMAR_CACHE_LIMIT=8       # LRU cap via env
GOTREESITTER_GRAMMAR_IDLE_TTL=5m         # evict after idle
GOTREESITTER_GRAMMAR_IDLE_SWEEP=30s      # sweep interval
GOTREESITTER_GRAMMAR_COMPACT=true        # loader compaction (default)
GOTREESITTER_GRAMMAR_STRING_INTERN_LIMIT=200000
GOTREESITTER_GRAMMAR_TRANSITION_INTERN_LIMIT=20000

GLR stack cap override:

GOT_GLR_MAX_STACKS=8  # overrides default GLR stack cap (default: 8)

Default is tuned for correctness. Increase only if a grammar/workload needs more GLR alternatives to preserve parity.

Legacy benchmark compatibility only:

GOT_PARSE_NODE_LIMIT_SCALE=3

GOT_PARSE_NODE_LIMIT_SCALE is only needed for comparisons against older truncation-prone benchmark baselines. On current branches, keep it unset.

Testing

For local correctness/parity work, prefer isolated one-language Docker runs:

# Real-corpus parity for one grammar
bash cgo_harness/docker/run_single_grammar_parity.sh typescript

# Focused grammargen real-corpus lane for one language
bash cgo_harness/docker/run_grammargen_focus_targets.sh --mode real-corpus --langs typescript

# Focused grammargen-vs-C lane for one language
bash cgo_harness/docker/run_grammargen_focus_targets.sh --mode cgo --langs typescript

run_grammargen_focus_targets.sh is the safest local lane for high-value grammars: it runs one grammar per container and defaults to a single-worker profile (--cpus 1, --pids 512, GOMAXPROCS=1, GOFLAGS=-p=1).

For Fortran, both real-corpus runners also default to a tighter bounded local preset unless you explicitly override it or pass --unsafe-fortran-defaults: --memory 3g, --cpus 1, --pids 512, GOMAXPROCS=1, GOFLAGS=-p=1, GOT_LALR_LR0_CORE_BUDGET=160000000, and GTS_GRAMMARGEN_REAL_CORPUS_GENERATE_TIMEOUT=15m.

If you only need a fast package-local regression check, keep it in Docker and narrow the -run regex:

bash cgo_harness/docker/run_parity_in_docker.sh \
  -- "cd /workspace && go test ./grammargen -run '^TestTypeScriptConditionalTypeParity$' -count=1"

Avoid go test ./... and host-side multi-language or race sweeps on developer machines while chasing OOMs. Use CI or a dedicated container when broader race coverage is required.

Other focused correctness/parity commands:

# Top-50 smoke correctness for the grammars package only
bash cgo_harness/docker/run_parity_in_docker.sh \
  -- "cd /workspace && go test ./grammars -run '^TestTop50ParseSmokeNoErrors$' -count=1 -v"

# C-oracle parity suites inside the cgo harness
bash cgo_harness/docker/run_parity_in_docker.sh \
  --run '^TestParityFreshParse$|^TestParityHasNoErrors$|^TestParityIssue3Repros$|^TestParityGLRCanaryGo$'
bash cgo_harness/docker/run_parity_in_docker.sh \
  --run '^TestParityCorpusFreshParse$'

CI may still run broader race coverage on hosted runners. Do not copy those commands onto a developer host during OOM diagnosis.

Test suite covers: smoke tests (206 grammars), golden S-expression snapshots, highlight query validation, query pattern matching, incremental reparse correctness, error recovery, GLR fork/merge, injection parsing, source rewriting, and fuzz targets.

Roadmap

v0.15.x: Large-repo consumer safety and parser-maintenance release. This line carries the post-0.14 tier-1 grammar refreshes and reserved-word import fixes.
ParsePolicy.ShouldSkipDir lets gateway callers prune generated/vendor directories before descent.
The GLR node-equivalence cache is smaller and checks epoch first for L2-friendly lookups.
Tree.Edit avoids scanning unchanged right-side siblings when there is no tail shift.
Parser-result compatibility normalization now keeps language-specific call sequences beside the relevant parser_result_*.go helpers.
The v0.15.1 patch also hardens arena release/GC behavior, releases retry loser arenas promptly, and fixes query predicate backtracking for nested Starlark dictionary matches.
v0.15.2 folds the drifting main and release lines back together, adds a Swift ABI mangling grammar, and ships grammar_updater pin verification and manifest-only sync flags.
v0.15.3 caps JavaScript/TypeScript full-parse merge survivors, tunes markdown retry and node budgets, tolerates external-scanner symbol-list drift, and adds a scoped Canopy harness runner for bounded repo analysis.

v0.14.x: Go grammar now shipped as a grammargen-compiled blob (our own pure-Go LR(1) state-splitting compiler), eliminating a dead-end state inherited from tree-sitter-go that wrapped several valid Go files in ERROR. Combined with arena retention/initial-sizing fixes, retry-lifecycle cleanup, and a GLR cap update keyed to the new grammar's conflict profile, warm-reuse heap allocation across a six-file self-parse benchmark dropped ~54% (498 → 229 MB/iter); cold-case dropped ~61%.

v0.12.x: 206 grammars (all OK), 116 external scanners, pure-Go runtime plus grammargen, ABI 15 support including reserved-word sets, GLR parser, incremental reparsing with external scanner checkpoints, query engine, tree cursor, highlighting, tagging, injection parser, typed query codegen, CST rewriter, parser pool, arena memory budgets, and structural parity against 100+ curated C reference grammars.

Full-parse grammargen performance work that keeps the recent incremental wins without regressing the main DFA benchmark
Remaining parser-result recovery/parity backlog on high-value C#, Rust, Scala, TypeScript, and Python corpus cases
The next highest-value parser/grammargen parity language after YAML and C# stabilization
Table-size and codegen compaction work for Unicode-heavy grammars

Release history and retroactive notes are tracked in CHANGELOG.md.

License

MIT

Documentation

Overview

Package gotreesitter implements a pure Go tree-sitter runtime.

This file defines the core data structures that mirror tree-sitter's TSLanguage C struct and related types. They form the foundation on which the lexer, parser, query engine, and syntax tree are built.

Index

Constants
Variables
func DrainArenaPools()
func EnableArenaProfile(enabled bool)
func EnableRuntimeAudit(enabled bool)
func RegisterHighlighterInjection(parentLanguage string, spec HighlighterInjectionSpec)
func RepairNoLookaheadLexModes(lang *Language)
func ResetArenaProfile()
func ResetParseEnvConfigCacheForTests()
func ResetPerfCounters()
func RunExternalScanner(lang *Language, payload any, lexer *ExternalLexer, validSymbols []bool) bool
func Walk(node *Node, fn func(node *Node, depth int) WalkAction)
type ArenaProfile
- func ArenaProfileSnapshot() ArenaProfile
type BoundTree
- func Bind(tree *Tree) *BoundTree
- func (bt *BoundTree) ChildByField(n *Node, fieldName string) *Node
- func (bt *BoundTree) Language() *Language
- func (bt *BoundTree) NodeText(n *Node) string
- func (bt *BoundTree) NodeType(n *Node) string
- func (bt *BoundTree) Release()
- func (bt *BoundTree) RootNode() *Node
- func (bt *BoundTree) Source() []byte
- func (bt *BoundTree) TreeCursor() *TreeCursor
type ByteSkippableTokenSource
type ExternalLexer
- func (l *ExternalLexer) Advance(skip bool)
- func (l *ExternalLexer) Column() uint32
- func (l *ExternalLexer) GetColumn() uint32 (deprecated)
- func (l *ExternalLexer) Lookahead() rune
- func (l *ExternalLexer) MarkEnd()
- func (l *ExternalLexer) SetResultSymbol(sym Symbol)
type ExternalScanner
- func AdaptExternalScannerByExternalOrder(sourceLang, targetLang *Language) (ExternalScanner, bool)
type ExternalScannerState
type ExternalSymbolResolver
- func NewExternalSymbolResolver(lang *Language) *ExternalSymbolResolver
- func (r *ExternalSymbolResolver) ByIndex(idx int) (Symbol, bool)
- func (r *ExternalSymbolResolver) ByName(name string) (Symbol, bool)
- func (r *ExternalSymbolResolver) Count() int
type ExternalVMInstr
- func VMAdvance(skip bool) ExternalVMInstr
- func VMEmit(sym Symbol) ExternalVMInstr
- func VMFail() ExternalVMInstr
- func VMIfRuneClass(class ExternalVMRuneClass, alt int) ExternalVMInstr
- func VMIfRuneEq(r rune, alt int) ExternalVMInstr
- func VMIfRuneInRange(start, end rune, alt int) ExternalVMInstr
- func VMJump(target int) ExternalVMInstr
- func VMMarkEnd() ExternalVMInstr
- func VMRequireStateEq(state uint32, alt int) ExternalVMInstr
- func VMRequireValid(validSymbolIndex, alt int) ExternalVMInstr
- func VMSetState(state uint32) ExternalVMInstr
type ExternalVMOp
type ExternalVMProgram
type ExternalVMRuneClass
type ExternalVMScanner
- func MustNewExternalVMScanner(program ExternalVMProgram) *ExternalVMScanner
- func NewExternalVMScanner(program ExternalVMProgram) (*ExternalVMScanner, error)
- func (s *ExternalVMScanner) Create() any
- func (s *ExternalVMScanner) Deserialize(payload any, buf []byte)
- func (s *ExternalVMScanner) Destroy(payload any)
- func (s *ExternalVMScanner) Scan(payload any, lexer *ExternalLexer, validSymbols []bool) bool
- func (s *ExternalVMScanner) Serialize(payload any, buf []byte) int
type FieldID
type FieldMapEntry
type HighlightRange
type Highlighter
- func NewHighlighter(lang *Language, highlightQuery string, opts ...HighlighterOption) (*Highlighter, error)
- func (h *Highlighter) Highlight(source []byte) []HighlightRange
- func (h *Highlighter) HighlightIncremental(source []byte, oldTree *Tree) ([]HighlightRange, *Tree)
type HighlighterInjectionResolver
type HighlighterInjectionSpec
type HighlighterOption
- func WithTokenSourceFactory(factory func(source []byte) TokenSource) HighlighterOption
type IncrementalParseProfile
type IncrementalReuseExternalScanner
type IncrementalReuseTokenSource
type Injection
type InjectionParser
- func NewInjectionParser() *InjectionParser
- func (ip *InjectionParser) Parse(source []byte, parentLang string) (*InjectionResult, error)
- func (ip *InjectionParser) ParseIncremental(source []byte, parentLang string, oldResult *InjectionResult) (*InjectionResult, error)
- func (ip *InjectionParser) RegisterInjectionQuery(parentLang string, query string) error
- func (ip *InjectionParser) RegisterLanguage(name string, lang *Language)
- func (ip *InjectionParser) SetMaxDepth(depth int)
type InjectionResult
type InputEdit
type Language
- func LoadLanguage(data []byte) (*Language, error)
- func (l *Language) CompatibleWithRuntime() bool
- func (l *Language) FieldByName(name string) (FieldID, bool)
- func (l *Language) IsSupertype(sym Symbol) bool
- func (l *Language) KeywordLexAsciiTable() [][128]int32
- func (l *Language) LexAsciiTable() [][128]int32
- func (l *Language) PublicSymbol(sym Symbol) Symbol
- func (l *Language) SupertypeChildren(sym Symbol) []Symbol
- func (l *Language) SymbolByName(name string) (Symbol, bool)
- func (l *Language) TokenSymbolsByName(name string) []Symbol
- func (l *Language) Version() uint32
type LanguageMetadata
type LexMode
type LexState
type LexTransition
type Lexer
- func NewLexer(states []LexState, source []byte) *Lexer
- func (l *Lexer) Next(startState uint16) Token
type LookaheadIterator
- func NewLookaheadIterator(lang *Language, state StateID) (*LookaheadIterator, error)
- func (it *LookaheadIterator) CurrentSymbol() Symbol
- func (it *LookaheadIterator) CurrentSymbolName() string
- func (it *LookaheadIterator) Language() *Language
- func (it *LookaheadIterator) Next() bool
- func (it *LookaheadIterator) ResetState(state StateID) error
type Node
- func NewLeafNode(sym Symbol, named bool, startByte, endByte uint32, startPoint, endPoint Point) *Node
- func NewParentNode(sym Symbol, named bool, children []*Node, fieldIDs []FieldID, ...) *Node
- func (n *Node) Child(i int) *Node
- func (n *Node) ChildByFieldName(name string, lang *Language) *Node
- func (n *Node) ChildCount() int
- func (n *Node) Children() []*Node
- func (n *Node) DescendantForByteRange(startByte, endByte uint32) *Node
- func (n *Node) DescendantForPointRange(startPoint, endPoint Point) *Node
- func (n *Node) Edit(edit InputEdit)
- func (n *Node) EndByte() uint32
- func (n *Node) EndPoint() Point
- func (n *Node) FieldNameForChild(i int, lang *Language) string
- func (n *Node) HasChanges() bool
- func (n *Node) HasError() bool
- func (n *Node) IsError() bool
- func (n *Node) IsExtra() bool
- func (n *Node) IsMissing() bool
- func (n *Node) IsNamed() bool
- func (n *Node) NamedChild(i int) *Node
- func (n *Node) NamedChildCount() int
- func (n *Node) NamedDescendantForByteRange(startByte, endByte uint32) *Node
- func (n *Node) NamedDescendantForPointRange(startPoint, endPoint Point) *Node
- func (n *Node) NextSibling() *Node
- func (n *Node) Parent() *Node
- func (n *Node) ParseState() StateID
- func (n *Node) PreGotoState() StateID
- func (n *Node) PrevSibling() *Node
- func (n *Node) Range() Range
- func (n *Node) SExpr(lang *Language) string
- func (n *Node) StartByte() uint32
- func (n *Node) StartPoint() Point
- func (n *Node) Symbol() Symbol
- func (n *Node) Text(source []byte) string
- func (n *Node) Type(lang *Language) string
type ParseAction
type ParseActionEntry
type ParseActionType
type ParseOption
- func WithOldTree(oldTree *Tree) ParseOption
- func WithProfiling() ParseOption
- func WithTokenSource(ts TokenSource) ParseOption
type ParseResult
type ParseRuntime
- func (rt ParseRuntime) Summary() string
type ParseStopReason
type Parser
- func NewParser(lang *Language) *Parser
- func (p *Parser) CancellationFlag() *uint32
- func (p *Parser) IncludedRanges() []Range
- func (p *Parser) InferredRootSymbol() (Symbol, bool)
- func (p *Parser) Language() *Language
- func (p *Parser) Logger() ParserLogger
- func (p *Parser) Parse(source []byte) (*Tree, error)
- func (p *Parser) ParseIncremental(source []byte, oldTree *Tree) (*Tree, error)
- func (p *Parser) ParseIncrementalProfiled(source []byte, oldTree *Tree) (*Tree, IncrementalParseProfile, error)
- func (p *Parser) ParseIncrementalWithTokenSource(source []byte, oldTree *Tree, ts TokenSource) (*Tree, error)
- func (p *Parser) ParseIncrementalWithTokenSourceProfiled(source []byte, oldTree *Tree, ts TokenSource) (*Tree, IncrementalParseProfile, error)
- func (p *Parser) ParseWith(source []byte, opts ...ParseOption) (ParseResult, error)
- func (p *Parser) ParseWithTokenSource(source []byte, ts TokenSource) (*Tree, error)
- func (p *Parser) SetCancellationFlag(flag *uint32)
- func (p *Parser) SetGLRTrace(enabled bool)
- func (p *Parser) SetIncludedRanges(ranges []Range)
- func (p *Parser) SetLogger(logger ParserLogger)
- func (p *Parser) SetTimeoutMicros(timeoutMicros uint64)
- func (p *Parser) TimeoutMicros() uint64
type ParserLogType
type ParserLogger
type ParserPool
- func NewParserPool(lang *Language, opts ...ParserPoolOption) *ParserPool
- func (pp *ParserPool) Language() *Language
- func (pp *ParserPool) Parse(source []byte) (*Tree, error)
- func (pp *ParserPool) ParseWith(source []byte, opts ...ParseOption) (ParseResult, error)
- func (pp *ParserPool) ParseWithTokenSource(source []byte, ts TokenSource) (*Tree, error)
type ParserPoolOption
- func WithParserPoolGLRTrace(enabled bool) ParserPoolOption
- func WithParserPoolIncludedRanges(ranges []Range) ParserPoolOption
- func WithParserPoolLogger(logger ParserLogger) ParserPoolOption
- func WithParserPoolTimeoutMicros(timeoutMicros uint64) ParserPoolOption
type Pattern
type PerfCounters
- func PerfCountersSnapshot() PerfCounters
type Point
type PointSkippableTokenSource
type Query
- func NewQuery(source string, lang *Language) (*Query, error)
- func (q *Query) CaptureCount() uint32
- func (q *Query) CaptureNameForID(id uint32) (string, bool)
- func (q *Query) CaptureNames() []string
- func (q *Query) DisableCapture(name string)
- func (q *Query) DisablePattern(patternIndex uint32)
- func (q *Query) EndByteForPattern(patternIndex uint32) (uint32, bool)
- func (q *Query) Exec(node *Node, lang *Language, source []byte) *QueryCursor
- func (q *Query) Execute(tree *Tree) []QueryMatch
- func (q *Query) ExecuteInto(tree *Tree, dst []QueryMatch) []QueryMatch
- func (q *Query) ExecuteNode(node *Node, lang *Language, source []byte) []QueryMatch
- func (q *Query) IsPatternGuaranteedAtStep(patternIndex uint32, stepIndex uint32) bool
- func (q *Query) IsPatternNonLocal(patternIndex uint32) bool
- func (q *Query) IsPatternRooted(patternIndex uint32) bool
- func (q *Query) PatternCount() int
- func (q *Query) PredicatesForPattern(patternIndex uint32) ([]QueryPredicate, bool)
- func (q *Query) StartByteForPattern(patternIndex uint32) (uint32, bool)
- func (q *Query) StepIsDefinite(patternIndex uint32, stepIndex uint32) bool
- func (q *Query) StringCount() uint32
- func (q *Query) StringValueForID(id uint32) (string, bool)
type QueryCapture
- func (c QueryCapture) Text(source []byte) string
type QueryCursor
- func (c *QueryCursor) DidExceedMatchLimit() bool
- func (c *QueryCursor) NextCapture() (QueryCapture, bool)
- func (c *QueryCursor) NextMatch() (QueryMatch, bool)
- func (c *QueryCursor) SetByteRange(startByte, endByte uint32)
- func (c *QueryCursor) SetMatchLimit(limit uint32)
- func (c *QueryCursor) SetMaxStartDepth(depth uint32)
- func (c *QueryCursor) SetPointRange(startPoint, endPoint Point)
type QueryMatch
- func (m QueryMatch) SetValues(q *Query, key string) []string
type QueryPredicate
type QueryStep
type Range
- func DiffChangedRanges(oldTree, newTree *Tree) []Range
type Rewriter
- func NewRewriter(source []byte) *Rewriter
- func (r *Rewriter) Apply() (newSource []byte, edits []InputEdit, err error)
- func (r *Rewriter) ApplyToTree(tree *Tree) ([]byte, error)
- func (r *Rewriter) Delete(node *Node)
- func (r *Rewriter) InsertAfter(node *Node, text []byte)
- func (r *Rewriter) InsertBefore(node *Node, text []byte)
- func (r *Rewriter) Replace(node *Node, newText []byte)
- func (r *Rewriter) ReplaceRange(startByte, endByte uint32, newText []byte)
type StateID
type Symbol
type SymbolMetadata
type Tag
type Tagger
- func NewTagger(lang *Language, tagsQuery string, opts ...TaggerOption) (*Tagger, error)
- func (tg *Tagger) Tag(source []byte) []Tag
- func (tg *Tagger) TagIncremental(source []byte, oldTree *Tree) ([]Tag, *Tree)
- func (tg *Tagger) TagTree(tree *Tree) []Tag
type TaggerOption
- func WithTaggerTokenSourceFactory(factory func(source []byte) TokenSource) TaggerOption
type Token
type TokenSource
type TokenSourceRebuilder
type Tree
- func NewTree(root *Node, source []byte, lang *Language) *Tree
- func (t *Tree) ChangedRanges() []Range
- func (t *Tree) Copy() *Tree
- func (t *Tree) DOT(lang *Language) string
- func (t *Tree) Edit(edit InputEdit)
- func (t *Tree) Edits() []InputEdit
- func (t *Tree) Language() *Language
- func (t *Tree) ParseRuntime() ParseRuntime
- func (t *Tree) ParseStopReason() ParseStopReason
- func (t *Tree) ParseStoppedEarly() bool
- func (t *Tree) Release()
- func (t *Tree) RootNode() *Node
- func (t *Tree) RootNodeWithOffset(offsetBytes uint32, offsetExtent Point) *Node
- func (t *Tree) Source() []byte
- func (t *Tree) WriteDOT(w io.Writer, lang *Language) error
type TreeCursor
- func NewTreeCursor(node *Node, tree *Tree) *TreeCursor
- func NewTreeCursorFromTree(tree *Tree) *TreeCursor
- func (c *TreeCursor) Copy() *TreeCursor
- func (c *TreeCursor) CurrentFieldID() FieldID
- func (c *TreeCursor) CurrentFieldName() string
- func (c *TreeCursor) CurrentNode() *Node
- func (c *TreeCursor) CurrentNodeIsNamed() bool
- func (c *TreeCursor) CurrentNodeText() string
- func (c *TreeCursor) CurrentNodeType() string
- func (c *TreeCursor) Depth() int
- func (c *TreeCursor) GotoChildByFieldID(fid FieldID) bool
- func (c *TreeCursor) GotoChildByFieldName(name string) bool
- func (c *TreeCursor) GotoFirstChild() bool
- func (c *TreeCursor) GotoFirstChildForByte(targetByte uint32) int64
- func (c *TreeCursor) GotoFirstChildForPoint(targetPoint Point) int64
- func (c *TreeCursor) GotoFirstNamedChild() bool
- func (c *TreeCursor) GotoLastChild() bool
- func (c *TreeCursor) GotoLastNamedChild() bool
- func (c *TreeCursor) GotoNextNamedSibling() bool
- func (c *TreeCursor) GotoNextSibling() bool
- func (c *TreeCursor) GotoParent() bool
- func (c *TreeCursor) GotoPrevNamedSibling() bool
- func (c *TreeCursor) GotoPrevSibling() bool
- func (c *TreeCursor) Reset(node *Node)
- func (c *TreeCursor) ResetTree(tree *Tree)
type WalkAction

Constants ¶

const (
	// RuntimeLanguageVersion is the maximum tree-sitter language version this
	// runtime is known to support.
	RuntimeLanguageVersion uint32 = 15
	// MinCompatibleLanguageVersion is the minimum accepted language version.
	MinCompatibleLanguageVersion uint32 = 13
)

Variables ¶

var DebugDFA atomic.Bool

DebugDFA enables trace logging for DFA token production.

Use `DebugDFA.Store(true/false)` to toggle at runtime.

var ErrNoLanguage = errors.New("parser has no language configured")

ErrNoLanguage is returned when a Parser has no language configured.

Functions ¶

func DrainArenaPools ¶ added in v0.14.0

func DrainArenaPools()

DrainArenaPools releases all cached arenas from both incremental and full-parse pools. Arenas held in the pool are strong Go references and are not collected by the GC until explicitly drained or the process exits.

Call this after a large batch scan (e.g. after WalkAndParse returns) to allow the GC to reclaim the arena memory. The next parse will allocate a fresh arena.

func EnableArenaProfile ¶ added in v0.6.0

func EnableArenaProfile(enabled bool)

EnableArenaProfile toggles arena pool counters. This debug hook is not concurrency-safe and is intended for single-threaded benchmark/profiling runs.

func EnableRuntimeAudit ¶ added in v0.7.0

func EnableRuntimeAudit(enabled bool)

EnableRuntimeAudit toggles per-parse survivor instrumentation. This debug hook is intended for single-threaded benchmark/profiling runs.

func RegisterHighlighterInjection ¶ added in v0.7.0

func RegisterHighlighterInjection(parentLanguage string, spec HighlighterInjectionSpec)

RegisterHighlighterInjection registers nested-highlighting configuration for a parent language name (for example "markdown").

func RepairNoLookaheadLexModes ¶ added in v0.9.0

func RepairNoLookaheadLexModes(lang *Language)

RepairNoLookaheadLexModes marks parser states as no-lookahead when they only need EOF-triggered reductions plus external/trivia handling. Tree-sitter's C runtime uses these states to reduce before lexing the next real token.

func ResetArenaProfile ¶ added in v0.6.0

func ResetArenaProfile()

ResetArenaProfile resets arena pool counters. This debug hook is not concurrency-safe and is intended for single-threaded benchmark/profiling runs.

func ResetParseEnvConfigCacheForTests ¶ added in v0.7.0

func ResetParseEnvConfigCacheForTests()

ResetParseEnvConfigCacheForTests clears memoized parser env config.

Tests in this repo mutate env vars between cases; this helper ensures subsequent parses observe the new values in the same process.

func ResetPerfCounters ¶ added in v0.6.0

func ResetPerfCounters()

func RunExternalScanner ¶

func RunExternalScanner(lang *Language, payload any, lexer *ExternalLexer, validSymbols []bool) bool

RunExternalScanner invokes the language's external scanner if present. Returns true if the scanner produced a token, false otherwise.

func Walk ¶

func Walk(node *Node, fn func(node *Node, depth int) WalkAction)

Walk performs a depth-first traversal of the syntax tree rooted at node. The callback receives each node and its depth (0 for the starting node). Return WalkSkipChildren to skip a node's children, or WalkStop to end early.
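
The traversal contract described above can be sketched without the library. The node type, the walk helper, and the walkContinue constant below are local stand-ins introduced for illustration only (the name of the library's "continue" action is not shown in this section); WalkSkipChildren and WalkStop reuse the documented names.

```go
package main

import "fmt"

// WalkAction and node are local stand-ins, not the library's types.
type WalkAction int

const (
	walkContinue     WalkAction = iota // hypothetical name for "keep going"
	WalkSkipChildren                   // do not descend into this node
	WalkStop                           // end the traversal
)

type node struct {
	kind     string
	children []*node
}

// walk visits n and its descendants depth-first, honoring the action
// returned by fn. It reports false once fn has returned WalkStop.
func walk(n *node, depth int, fn func(n *node, depth int) WalkAction) bool {
	if n == nil {
		return true
	}
	switch fn(n, depth) {
	case WalkStop:
		return false
	case WalkSkipChildren:
		return true
	}
	for _, c := range n.children {
		if !walk(c, depth+1, fn) {
			return false
		}
	}
	return true
}

// visited returns the node kinds seen when the children of skip are pruned.
func visited(root *node, skip string) []string {
	var kinds []string
	walk(root, 0, func(n *node, depth int) WalkAction {
		kinds = append(kinds, n.kind)
		if n.kind == skip {
			return WalkSkipChildren
		}
		return walkContinue
	})
	return kinds
}

// sampleTree builds a small hand-made tree for the demonstration.
func sampleTree() *node {
	return &node{kind: "source_file", children: []*node{
		{kind: "function_declaration", children: []*node{
			{kind: "identifier"},
			{kind: "block", children: []*node{{kind: "return_statement"}}},
		}},
		{kind: "comment"},
	}}
}

func main() {
	fmt.Println(visited(sampleTree(), "block"))
}
```

With the real API the callback would receive *Node values and the starting node would typically come from Tree.RootNode.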

Types ¶

type ArenaProfile ¶ added in v0.6.0

type ArenaProfile struct {
	IncrementalAcquire uint64
	IncrementalNew     uint64
	FullAcquire        uint64
	FullNew            uint64
}

ArenaProfile captures node arena allocation statistics. Enable collection with EnableArenaProfile(true) and retrieve the counters with ArenaProfileSnapshot().

func ArenaProfileSnapshot ¶ added in v0.6.0

func ArenaProfileSnapshot() ArenaProfile

ArenaProfileSnapshot returns current arena pool counters. This debug hook is not concurrency-safe and is intended for single-threaded benchmark/profiling runs.

type BoundTree ¶

type BoundTree struct {
	// contains filtered or unexported fields
}

BoundTree pairs a Tree with its Language and source, eliminating the need to pass *Language and []byte to every node method call.

func Bind ¶

func Bind(tree *Tree) *BoundTree

Bind creates a BoundTree from a Tree. The Tree must have been created with a Language (via NewTree or a Parser). Returns a BoundTree that delegates to the underlying Tree's Language and Source.

func (*BoundTree) ChildByField ¶

func (bt *BoundTree) ChildByField(n *Node, fieldName string) *Node

ChildByField returns the first child assigned to the given field name.

func (*BoundTree) Language ¶

func (bt *BoundTree) Language() *Language

Language returns the tree's language.

func (*BoundTree) NodeText ¶

func (bt *BoundTree) NodeText(n *Node) string

NodeText returns the source text covered by the node.

func (*BoundTree) NodeType ¶

func (bt *BoundTree) NodeType(n *Node) string

NodeType returns the node's type name, resolved via the bound language.

func (*BoundTree) Release ¶

func (bt *BoundTree) Release()

Release releases the underlying tree's arena memory.

func (*BoundTree) RootNode ¶

func (bt *BoundTree) RootNode() *Node

RootNode returns the tree's root node.

func (*BoundTree) Source ¶

func (bt *BoundTree) Source() []byte

Source returns the tree's source bytes.

func (*BoundTree) TreeCursor ¶ added in v0.6.0

func (bt *BoundTree) TreeCursor() *TreeCursor

TreeCursor returns a new TreeCursor starting at the tree's root node.

type ByteSkippableTokenSource ¶

type ByteSkippableTokenSource interface {
	TokenSource
	SkipToByte(offset uint32) Token
}

ByteSkippableTokenSource can jump to a byte offset and return the first token at or after that position.

type ExternalLexer ¶

type ExternalLexer struct {
	// contains filtered or unexported fields
}

ExternalLexer is the scanner-facing lexer API used by external scanners. It mirrors the essential tree-sitter scanner API: lookahead, advance, mark_end, and result_symbol.

func (*ExternalLexer) Advance ¶

func (l *ExternalLexer) Advance(skip bool)

Advance consumes one rune. When skip is true, consumed bytes are excluded from the token span (scanner whitespace skipping behavior).

func (*ExternalLexer) Column ¶ added in v0.6.0

func (l *ExternalLexer) Column() uint32

Column returns the current column (0-based) at the scanner cursor.

func (*ExternalLexer) GetColumn deprecated

func (l *ExternalLexer) GetColumn() uint32

GetColumn returns the current column (0-based) at the scanner cursor.

Deprecated: use Column.

func (*ExternalLexer) Lookahead ¶

func (l *ExternalLexer) Lookahead() rune

Lookahead returns the current rune or 0 at EOF.

func (*ExternalLexer) MarkEnd ¶

func (l *ExternalLexer) MarkEnd()

MarkEnd marks the current scanner position as the token end.

func (*ExternalLexer) SetResultSymbol ¶

func (l *ExternalLexer) SetResultSymbol(sym Symbol)

SetResultSymbol sets the token symbol to emit when Scan returns true.

type ExternalScanner ¶

type ExternalScanner interface {
	Create() any
	Destroy(payload any)
	Serialize(payload any, buf []byte) int
	Deserialize(payload any, buf []byte)
	Scan(payload any, lexer *ExternalLexer, validSymbols []bool) bool
}

ExternalScanner is the interface for language-specific external scanners. Languages like Python and JavaScript need these for indent tracking, template literals, regex vs division, etc.

The value returned by Create must be accepted by Destroy/Serialize/Deserialize/Scan for that scanner implementation. Most scanners use a concrete payload pointer type and will panic on mismatched payload types.

func AdaptExternalScannerByExternalOrder ¶ added in v0.9.0

func AdaptExternalScannerByExternalOrder(sourceLang, targetLang *Language) (ExternalScanner, bool)

AdaptExternalScannerByExternalOrder builds an ExternalScanner adapter that reuses sourceLang's scanner for targetLang by remapping external symbols.

Mapping strategy:

If either side has duplicate external names, use index mapping (capped to the shorter list length).
Otherwise, prefer exact external-symbol-name matches.
Fill remaining slots by index order (within the shorter dimension).

When source and target have different external symbol counts, name-based matching pairs tokens that exist in both grammars. Target externals with no source match get -1 (the scanner will never produce them). Source externals with no target match are silently ignored.

Returns (nil, false) when adaptation is not possible.

type ExternalScannerState ¶

type ExternalScannerState struct {
	Data []byte
}

ExternalScannerState holds serialized state for an external scanner between incremental parse runs.

type ExternalSymbolResolver ¶ added in v0.9.0

type ExternalSymbolResolver struct {
	// contains filtered or unexported fields
}

ExternalSymbolResolver maps external token names to their concrete Symbol IDs in a specific Language. This allows external scanners to resolve symbol IDs at runtime rather than using hardcoded constants, making them compatible with any Language that defines the same external tokens (whether from ts2go extraction or grammargen).

func NewExternalSymbolResolver ¶ added in v0.9.0

func NewExternalSymbolResolver(lang *Language) *ExternalSymbolResolver

NewExternalSymbolResolver builds a resolver from a Language's external symbol definitions. Returns nil if the Language has no external symbols.

func (*ExternalSymbolResolver) ByIndex ¶ added in v0.9.0

func (r *ExternalSymbolResolver) ByIndex(idx int) (Symbol, bool)

ByIndex returns the Symbol ID for the given external token index (position in the grammar's externals array). Returns 0, false if the index is out of range.

func (*ExternalSymbolResolver) ByName ¶ added in v0.9.0

func (r *ExternalSymbolResolver) ByName(name string) (Symbol, bool)

ByName returns the Symbol ID for the given external token name. Returns 0, false if the name is not found.

func (*ExternalSymbolResolver) Count ¶ added in v0.9.0

func (r *ExternalSymbolResolver) Count() int

Count returns the number of external tokens.

type ExternalVMInstr ¶

type ExternalVMInstr struct {
	Op  ExternalVMOp
	A   int32
	B   int32
	Alt int32
}

ExternalVMInstr is one instruction in an external scanner VM program.

Operands:

A: primary operand (opcode-specific)
B: secondary operand (used by range checks)
Alt: alternate program counter when a condition fails

func VMAdvance ¶

func VMAdvance(skip bool) ExternalVMInstr

VMAdvance constructs an advance instruction. When skip is true, the advanced rune is skipped from the token text.

func VMEmit ¶

func VMEmit(sym Symbol) ExternalVMInstr

VMEmit constructs an emit instruction for the given symbol.

func VMFail ¶

func VMFail() ExternalVMInstr

VMFail constructs a fail instruction that terminates scan with no token.

func VMIfRuneClass ¶

func VMIfRuneClass(class ExternalVMRuneClass, alt int) ExternalVMInstr

VMIfRuneClass constructs a rune-class branch with alternate target on miss.

func VMIfRuneEq ¶

func VMIfRuneEq(r rune, alt int) ExternalVMInstr

VMIfRuneEq constructs a rune-equality branch with alternate target on miss.

func VMIfRuneInRange ¶

func VMIfRuneInRange(start, end rune, alt int) ExternalVMInstr

VMIfRuneInRange constructs a rune-range branch with alternate target on miss.

func VMJump ¶

func VMJump(target int) ExternalVMInstr

VMJump constructs an unconditional branch to the target instruction index.

func VMMarkEnd ¶

func VMMarkEnd() ExternalVMInstr

VMMarkEnd constructs a mark-end instruction for the current token extent.

func VMRequireStateEq ¶

func VMRequireStateEq(state uint32, alt int) ExternalVMInstr

VMRequireStateEq constructs a payload-state guard with alternate branch on miss.

func VMRequireValid ¶

func VMRequireValid(validSymbolIndex, alt int) ExternalVMInstr

VMRequireValid constructs a valid-symbol guard with alternate branch on miss.

func VMSetState ¶

func VMSetState(state uint32) ExternalVMInstr

VMSetState constructs a payload-state assignment instruction.

type ExternalVMOp ¶

type ExternalVMOp uint8

ExternalVMOp is an opcode for the native-Go external scanner VM.

const (
	ExternalVMOpFail ExternalVMOp = iota
	ExternalVMOpJump
	ExternalVMOpRequireValid
	ExternalVMOpRequireStateEq
	ExternalVMOpSetState
	ExternalVMOpIfRuneEq
	ExternalVMOpIfRuneInRange
	ExternalVMOpIfRuneClass
	ExternalVMOpAdvance
	ExternalVMOpMarkEnd
	ExternalVMOpEmit
)

type ExternalVMProgram ¶

type ExternalVMProgram struct {
	Code     []ExternalVMInstr
	MaxSteps int // <=0 uses a safe default based on program size
}

ExternalVMProgram is a small bytecode program interpreted by ExternalVMScanner.
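
To make the instruction layout (Op, A, B, Alt) and the MaxSteps guard concrete, here is a deliberately simplified toy interpreter. It is not the library's ExternalVMScanner: the op constants, the instr type, and run are local stand-ins, only a handful of opcodes are modeled, and the control flow (a passing condition falls through to the next instruction, a failing one jumps to Alt) is an assumption based on the operand descriptions.

```go
package main

import "fmt"

type op uint8

const (
	opFail op = iota
	opJump
	opIfRuneEq
	opIfRuneInRange
	opAdvance
	opEmit
)

// instr mirrors the documented ExternalVMInstr fields.
type instr struct {
	Op   op
	A, B int32 // A: primary operand, B: upper bound for range checks
	Alt  int32 // alternate program counter when a condition fails
}

const symNumber int32 = 1 // hypothetical symbol ID

// digits matches one or more ASCII digits.
var digits = []instr{
	{Op: opIfRuneInRange, A: '0', B: '9', Alt: 5}, // 0: need a first digit
	{Op: opAdvance},                               // 1: consume it
	{Op: opIfRuneInRange, A: '0', B: '9', Alt: 4}, // 2: more digits?
	{Op: opJump, A: 1},                            // 3: loop
	{Op: opEmit, A: symNumber},                    // 4: success
	{Op: opFail},                                  // 5: no token
}

// run interprets code against input. It returns the emitted symbol and the
// number of runes consumed; ok is false when the program fails, runs off
// the end of code, or exceeds maxSteps.
func run(code []instr, input []rune, maxSteps int) (sym int32, consumed int, ok bool) {
	pc, pos := 0, 0
	for steps := 0; steps < maxSteps && pc >= 0 && pc < len(code); steps++ {
		in := code[pc]
		var la rune // lookahead; stays 0 at end of input
		if pos < len(input) {
			la = input[pos]
		}
		switch in.Op {
		case opJump:
			pc = int(in.A)
		case opIfRuneEq:
			if la == rune(in.A) {
				pc++
			} else {
				pc = int(in.Alt)
			}
		case opIfRuneInRange:
			if la >= rune(in.A) && la <= rune(in.B) {
				pc++
			} else {
				pc = int(in.Alt)
			}
		case opAdvance:
			if pos < len(input) {
				pos++
			}
			pc++
		case opEmit:
			return in.A, pos, true
		default: // opFail and unknown opcodes
			return 0, 0, false
		}
	}
	return 0, 0, false
}

func main() {
	for _, in := range []string{"123+4", "7", "x"} {
		sym, n, ok := run(digits, []rune(in), 64)
		fmt.Println(in, sym, n, ok)
	}
}
```

In the package itself, programs would be assembled from the VM* constructors (VMIfRuneInRange, VMAdvance, VMEmit, and so on) and validated by NewExternalVMScanner.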

type ExternalVMRuneClass ¶

type ExternalVMRuneClass uint8

ExternalVMRuneClass is a character class used by ExternalVMOpIfRuneClass.

const (
	ExternalVMRuneClassWhitespace ExternalVMRuneClass = iota
	ExternalVMRuneClassDigit
	ExternalVMRuneClassLetter
	ExternalVMRuneClassWord
	ExternalVMRuneClassNewline
)

type ExternalVMScanner ¶

type ExternalVMScanner struct {
	// contains filtered or unexported fields
}

ExternalVMScanner executes an ExternalVMProgram and implements ExternalScanner.

func MustNewExternalVMScanner ¶

func MustNewExternalVMScanner(program ExternalVMProgram) *ExternalVMScanner

MustNewExternalVMScanner is like NewExternalVMScanner but panics on error. It is intended for package-level initialization where invalid programs are programmer errors.

func NewExternalVMScanner ¶

func NewExternalVMScanner(program ExternalVMProgram) (*ExternalVMScanner, error)

NewExternalVMScanner validates and constructs an ExternalVMScanner.

func (*ExternalVMScanner) Create ¶

func (s *ExternalVMScanner) Create() any

Create allocates scanner payload (currently a single uint32 state slot).

func (*ExternalVMScanner) Deserialize ¶

func (s *ExternalVMScanner) Deserialize(payload any, buf []byte)

Deserialize restores payload state from buf.

func (*ExternalVMScanner) Destroy ¶

func (s *ExternalVMScanner) Destroy(payload any)

Destroy releases scanner payload resources.

func (*ExternalVMScanner) Scan ¶

func (s *ExternalVMScanner) Scan(payload any, lexer *ExternalLexer, validSymbols []bool) bool

Scan executes the scanner program against the current lexer position.

func (*ExternalVMScanner) Serialize ¶

func (s *ExternalVMScanner) Serialize(payload any, buf []byte) int

Serialize writes payload state into buf.

type FieldID ¶

type FieldID uint16

FieldID is a named field index.

type FieldMapEntry ¶

type FieldMapEntry struct {
	FieldID    FieldID
	ChildIndex uint8
	Inherited  bool
}

FieldMapEntry maps a child index to a field name.

type HighlightRange ¶

type HighlightRange struct {
	StartByte    uint32
	EndByte      uint32
	Capture      string // "keyword", "string", "function", etc.
	PatternIndex int    // query pattern index; later patterns override earlier for identical ranges
}

HighlightRange represents a styled range of source code, mapping a byte span to a capture name from a highlight query. The editor maps capture names (e.g., "keyword", "string", "function") to FSS style classes.

type Highlighter ¶

type Highlighter struct {
	// contains filtered or unexported fields
}

Highlighter is a high-level API that takes source code and returns styled ranges. It combines a Parser, a compiled Query, and a Language to provide a single Highlight() call for the editor.

func NewHighlighter ¶

func NewHighlighter(lang *Language, highlightQuery string, opts ...HighlighterOption) (*Highlighter, error)

NewHighlighter creates a Highlighter for the given language and highlight query (in tree-sitter .scm format). Returns an error if the query fails to compile.

func (*Highlighter) Highlight ¶

func (h *Highlighter) Highlight(source []byte) []HighlightRange

Highlight parses the source code and executes the highlight query, returning a slice of HighlightRange sorted by StartByte. When ranges overlap, inner (more specific) captures take priority over outer ones.

func (*Highlighter) HighlightIncremental ¶

func (h *Highlighter) HighlightIncremental(source []byte, oldTree *Tree) ([]HighlightRange, *Tree)

HighlightIncremental re-highlights source after edits were applied to oldTree. Returns the new highlight ranges and the new parse tree (for use in subsequent incremental calls). Call oldTree.Edit() before calling this.

type HighlighterInjectionResolver ¶ added in v0.7.0

type HighlighterInjectionResolver func(languageHint string) (lang *Language, highlightQuery string, tokenSourceFactory func(source []byte) TokenSource, ok bool)

HighlighterInjectionResolver maps a language hint (for example "go" from a markdown code fence) to a child language and highlight query.

type HighlighterInjectionSpec ¶ added in v0.7.0

type HighlighterInjectionSpec struct {
	Query           string
	ResolveLanguage HighlighterInjectionResolver
}

HighlighterInjectionSpec configures nested highlighting for a parent language. Query must emit @injection.content and either @injection.language or #set! injection.language metadata.

type HighlighterOption ¶

type HighlighterOption func(*Highlighter)

HighlighterOption configures a Highlighter.

func WithTokenSourceFactory ¶

func WithTokenSourceFactory(factory func(source []byte) TokenSource) HighlighterOption

WithTokenSourceFactory sets a factory function that creates a TokenSource for each Highlight call. This is needed for languages that use a custom lexer bridge (like Go, which uses go/scanner instead of a DFA lexer).

When set, Highlight() calls ParseWithTokenSource instead of Parse.

type IncrementalParseProfile ¶ added in v0.6.0

type IncrementalParseProfile struct {
	ReuseCursorNanos                   int64
	ReparseNanos                       int64
	ReusedSubtrees                     uint64
	ReusedBytes                        uint64
	NewNodesAllocated                  uint64
	ReuseUnsupported                   bool
	ReuseUnsupportedReason             string
	ReuseRejectDirty                   uint64
	ReuseRejectAncestorDirtyBeforeEdit uint64
	ReuseRejectHasError                uint64
	ReuseRejectInvalidSpan             uint64
	ReuseRejectOutOfBounds             uint64
	ReuseRejectRootNonLeafChanged      uint64
	ReuseRejectLargeNonLeaf            uint64
	RecoverSearches                    uint64
	RecoverStateChecks                 uint64
	RecoverStateSkips                  uint64
	RecoverSymbolSkips                 uint64
	RecoverLookups                     uint64
	RecoverHits                        uint64
	MaxStacksSeen                      int
	EntryScratchPeak                   uint64
	StopReason                         ParseStopReason
	TokensConsumed                     uint64
	LastTokenEndByte                   uint32
	ExpectedEOFByte                    uint32
	ArenaBytesAllocated                int64
	ScratchBytesAllocated              int64
	EntryScratchBytesAllocated         int64
	GSSBytesAllocated                  int64
	SingleStackIterations              int
	MultiStackIterations               int
	SingleStackTokens                  uint64
	MultiStackTokens                   uint64
	SingleStackGSSNodes                uint64
	MultiStackGSSNodes                 uint64
	GSSNodesAllocated                  uint64
	GSSNodesRetained                   uint64
	GSSNodesDroppedSameToken           uint64
	ParentNodesAllocated               uint64
	ParentNodesRetained                uint64
	ParentNodesDroppedSameToken        uint64
	LeafNodesAllocated                 uint64
	LeafNodesRetained                  uint64
	LeafNodesDroppedSameToken          uint64
	MergeStacksIn                      uint64
	MergeStacksOut                     uint64
	MergeSlotsUsed                     uint64
	GlobalCullStacksIn                 uint64
	GlobalCullStacksOut                uint64
}

IncrementalParseProfile breaks incremental parse time down into coarse buckets.

ReuseCursorNanos includes reuse-cursor setup and subtree-candidate checks. ReparseNanos includes the remainder of incremental parsing/rebuild work.

type IncrementalReuseExternalScanner ¶ added in v0.7.0

type IncrementalReuseExternalScanner interface {
	ExternalScanner
	SupportsIncrementalReuse() bool
}

IncrementalReuseExternalScanner is implemented by external scanners that can safely participate in DFA subtree reuse during incremental parses. Scanners with serialized mutable state, such as Python's indentation stack, should leave this unimplemented so edited incremental parses fall back to the conservative full-reparse path.
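
The opt-in pattern described above is an ordinary Go optional-interface check. The sketch below uses reduced stand-in interfaces (scanner keeps a single method of ExternalScanner) and two hypothetical scanner types; how the runtime actually consults the marker is an assumption.

```go
package main

import "fmt"

// scanner is a reduced stand-in for ExternalScanner; only one method is
// kept so the example stays short.
type scanner interface {
	Create() any
}

// reuseScanner mirrors the shape of IncrementalReuseExternalScanner.
type reuseScanner interface {
	scanner
	SupportsIncrementalReuse() bool
}

// statelessScanner (hypothetical) carries no state between tokens and opts in.
type statelessScanner struct{}

func (statelessScanner) Create() any                    { return nil }
func (statelessScanner) SupportsIncrementalReuse() bool { return true }

// indentScanner (hypothetical) keeps an indentation stack and therefore
// leaves the marker interface unimplemented.
type indentScanner struct{}

func (indentScanner) Create() any { return new([]int) }

// canReuse reports whether s has opted in to subtree reuse.
func canReuse(s scanner) bool {
	r, ok := s.(reuseScanner)
	return ok && r.SupportsIncrementalReuse()
}

func main() {
	fmt.Println("stateless:", canReuse(statelessScanner{}))
	fmt.Println("indent:", canReuse(indentScanner{}))
}
```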

type IncrementalReuseTokenSource ¶ added in v0.7.0

type IncrementalReuseTokenSource interface {
	TokenSource
	SupportsIncrementalReuse() bool
}

IncrementalReuseTokenSource is an opt-in marker for custom token sources that are safe for incremental subtree reuse. Implementations must provide stable token boundaries across edits and support deterministic SkipToByte* behavior so reused-tree fast-forwarding remains correct.

type Injection ¶ added in v0.6.0

type Injection struct {
	// Language is the detected language name (e.g., "javascript").
	Language string
	// Tree is the parse tree for this region, or nil if the language
	// was not registered.
	Tree *Tree
	// Ranges are the source ranges this tree covers.
	Ranges []Range
	// Node is the parent tree node that triggered the injection.
	Node *Node
}

Injection is a single embedded language region.

type InjectionParser ¶ added in v0.6.0

type InjectionParser struct {
	// contains filtered or unexported fields
}

InjectionParser parses documents with embedded languages.

InjectionParser is not safe for concurrent use. It caches child parsers and mutates shared maps during parse operations.

func NewInjectionParser ¶ added in v0.6.0

func NewInjectionParser() *InjectionParser

NewInjectionParser creates an InjectionParser.

func (*InjectionParser) Parse ¶ added in v0.6.0

func (ip *InjectionParser) Parse(source []byte, parentLang string) (*InjectionResult, error)

Parse parses source as parentLang, then recursively parses injected regions.

func (*InjectionParser) ParseIncremental ¶ added in v0.6.0

func (ip *InjectionParser) ParseIncremental(source []byte, parentLang string,
	oldResult *InjectionResult) (*InjectionResult, error)

ParseIncremental re-parses after edits, reusing unchanged child trees.

func (*InjectionParser) RegisterInjectionQuery ¶ added in v0.6.0

func (ip *InjectionParser) RegisterInjectionQuery(parentLang string, query string) error

RegisterInjectionQuery sets the injection query for a parent language. The query should use @injection.content and #set! injection.language conventions. It is compiled against the registered parent language.

func (*InjectionParser) RegisterLanguage ¶ added in v0.6.0

func (ip *InjectionParser) RegisterLanguage(name string, lang *Language)

RegisterLanguage adds a language that can be used as parent or child.

func (*InjectionParser) SetMaxDepth ¶ added in v0.6.0

func (ip *InjectionParser) SetMaxDepth(depth int)

SetMaxDepth overrides the nested injection recursion limit. Depth values <= 0 restore the default limit.

type InjectionResult ¶ added in v0.6.0

type InjectionResult struct {
	// Tree is the parent language's parse tree.
	Tree *Tree
	// Injections contains child language parse results, ordered by position.
	Injections []Injection
}

InjectionResult holds parse results for a multi-language document.

type InputEdit ¶

type InputEdit struct {
	StartByte   uint32
	OldEndByte  uint32
	NewEndByte  uint32
	StartPoint  Point
	OldEndPoint Point
	NewEndPoint Point
}

InputEdit describes a single edit to the source text. It tells the parser what byte range was replaced and what the new range looks like, so the incremental parser can skip unchanged subtrees.
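
As an illustration, the sketch below derives an InputEdit for a byte-range replacement. Point and InputEdit are local stand-ins that follow the documented layout; the Row and Column field names on Point and the use of byte columns are assumptions, and pointAt and replaceRange are hypothetical helpers.

```go
package main

import (
	"bytes"
	"fmt"
)

// Point is a stand-in; the Row/Column field names are assumed.
type Point struct {
	Row, Column uint32
}

// InputEdit repeats the documented field layout.
type InputEdit struct {
	StartByte   uint32
	OldEndByte  uint32
	NewEndByte  uint32
	StartPoint  Point
	OldEndPoint Point
	NewEndPoint Point
}

// pointAt returns the row and byte column of offset within src.
func pointAt(src []byte, offset uint32) Point {
	prefix := src[:offset]
	row := bytes.Count(prefix, []byte{'\n'})
	lastNL := bytes.LastIndexByte(prefix, '\n') // -1 when on the first row
	return Point{Row: uint32(row), Column: offset - uint32(lastNL+1)}
}

// replaceRange replaces src[start:end] with text and returns the new source
// together with the edit that describes the change.
func replaceRange(src []byte, start, end uint32, text []byte) ([]byte, InputEdit) {
	out := make([]byte, 0, len(src)-int(end-start)+len(text))
	out = append(out, src[:start]...)
	out = append(out, text...)
	out = append(out, src[end:]...)
	newEnd := start + uint32(len(text))
	return out, InputEdit{
		StartByte:   start,
		OldEndByte:  end,
		NewEndByte:  newEnd,
		StartPoint:  pointAt(src, start),
		OldEndPoint: pointAt(src, end),
		NewEndPoint: pointAt(out, newEnd),
	}
}

func main() {
	src := []byte("package main\n\nfunc main() {}\n")
	// Bytes 19..23 hold the function name "main"; rename it to "run".
	newSrc, edit := replaceRange(src, 19, 23, []byte("run"))
	fmt.Printf("%q\n%+v\n", newSrc, edit)
}
```

With the real types, an edit built this way would be applied with Tree.Edit before the new source is handed to ParseIncremental.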

type Language ¶

type Language struct {
	Name string

	// LanguageVersion is the tree-sitter language ABI version.
	// A value of 0 means "unknown/unspecified" and is treated as compatible.
	LanguageVersion uint32

	// Counts
	SymbolCount        uint32
	TokenCount         uint32
	ExternalTokenCount uint32
	StateCount         uint32
	LargeStateCount    uint32
	FieldCount         uint32
	ProductionIDCount  uint32

	// Symbol metadata
	SymbolNames    []string
	SymbolMetadata []SymbolMetadata
	FieldNames     []string // index 0 is ""

	// Parse tables
	ParseTable         [][]uint16 // dense: [state][symbol] -> action index
	SmallParseTable    []uint16   // compressed sparse table
	SmallParseTableMap []uint32   // state -> offset into SmallParseTable
	ParseActions       []ParseActionEntry

	// Lex tables
	LexModes            []LexMode
	LexStates           []LexState // main lexer DFA
	KeywordLexStates    []LexState // keyword lexer DFA (optional)
	KeywordCaptureToken Symbol
	// LayoutFallbackLexState is an optional broad DFA start state used only in
	// layout-entry parser states. It lets the runtime avoid skipping over
	// zero-width external layout markers before the layout scanner fires.
	LayoutFallbackLexState    uint16
	HasLayoutFallbackLexState bool

	// Field mapping
	FieldMapSlices  [][2]uint16 // [production_id] -> (index, length)
	FieldMapEntries []FieldMapEntry

	// Alias sequences
	AliasSequences [][]Symbol // [production_id][child_index] -> alias symbol

	// Primary state IDs (for table dedup)
	PrimaryStateIDs []StateID

	// ABI 15: Reserved words — flat array indexed by
	// (reserved_word_set_id * MaxReservedWordSetSize + i), terminated by 0.
	ReservedWords          []Symbol
	MaxReservedWordSetSize uint16

	// ABI 15: Supertype hierarchy
	SupertypeSymbols    []Symbol
	SupertypeMapSlices  [][2]uint16 // [supertype_symbol] -> (index, length)
	SupertypeMapEntries []Symbol

	// ABI 15: Grammar semantic version
	Metadata LanguageMetadata

	// External scanner (nil if not needed)
	ExternalScanner ExternalScanner
	ExternalSymbols []Symbol // external token index -> symbol
	// ImmediateTokens is a bitmask of symbol IDs that are token.immediate() tokens.
	// When the lexer matches one of these after consuming whitespace, the match
	// should be rejected — immediate tokens must match at the original position.
	// nil means no immediate tokens (common for ts2go grammars).
	ImmediateTokens []bool

	// ExternalLexStates maps external lex state IDs (from LexMode.ExternalLexState)
	// to a boolean slice indicating which external tokens are valid. Row 0 is
	// always all-false (no external tokens valid). When non-nil, this table is
	// used instead of parse-action-table probing to compute validSymbols for the
	// external scanner, matching C tree-sitter's ts_external_scanner_states.
	ExternalLexStates [][]bool

	// InitialState is the parser's start state. In tree-sitter grammars
	// this is always 1 (state 0 is reserved for error recovery). For
	// hand-built grammars it defaults to 0.
	InitialState StateID
	// contains filtered or unexported fields
}

Language holds all data needed to parse a specific language. It mirrors tree-sitter's TSLanguage C struct, translated into idiomatic Go types with slice-based tables instead of raw pointers.
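The flat ReservedWords layout described in the struct comments can be read as in the sketch below. It is a minimal illustration only: Symbol is a local stand-in whose underlying type is an assumption, and reservedSet is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Symbol is a local stand-in for the package's Symbol type; its underlying
// type here is an assumption made for this sketch.
type Symbol uint16

// reservedSet returns the symbols of reserved-word set setID. It reads the
// flat array at setID*maxSize+i and stops at the first 0 terminator.
func reservedSet(words []Symbol, maxSize uint16, setID int) []Symbol {
	if setID < 0 {
		return nil
	}
	var out []Symbol
	base := setID * int(maxSize)
	for i := 0; i < int(maxSize); i++ {
		idx := base + i
		if idx >= len(words) || words[idx] == 0 {
			break
		}
		out = append(out, words[idx])
	}
	return out
}

func main() {
	// Two sets with capacity 3 each: {7, 9} and {4}.
	words := []Symbol{7, 9, 0, 4, 0, 0}
	fmt.Println(reservedSet(words, 3, 0), reservedSet(words, 3, 1)) // prints "[7 9] [4]"
}
```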

func LoadLanguage ¶ added in v0.9.0

func LoadLanguage(data []byte) (*Language, error)

LoadLanguage deserializes a compressed grammar blob into a Language. Blobs are produced by grammargen's GenerateLanguage or the grammar build toolchain. This is the only function needed at runtime to load pre-compiled grammars — no grammargen import required.

func (*Language) CompatibleWithRuntime ¶

func (l *Language) CompatibleWithRuntime() bool

CompatibleWithRuntime reports whether this language can be parsed by the current runtime version. Unspecified versions (0) are treated as compatible.

func (*Language) FieldByName ¶

func (l *Language) FieldByName(name string) (FieldID, bool)

FieldByName returns the field ID for a given name, or (0, false) if not found. Builds an internal map on first call for O(1) subsequent lookups.
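A build-on-first-call index of this kind is often written with sync.Once. The sketch below shows that pattern under stated assumptions: fieldIndex and lookup are hypothetical names, field ID 0 is assumed to be unused, and nothing here claims to match the package's internal implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// fieldIndex is a hypothetical stand-in, not the package's implementation.
type fieldIndex struct {
	names  []string // names[id] is the field name; id 0 is assumed unused
	once   sync.Once
	byName map[string]uint16
}

// lookup builds the name map on the first call and serves later calls from
// it. It returns (0, false) when the name is unknown.
func (f *fieldIndex) lookup(name string) (uint16, bool) {
	f.once.Do(func() {
		f.byName = make(map[string]uint16, len(f.names))
		for id, n := range f.names {
			if id == 0 {
				continue
			}
			if _, dup := f.byName[n]; !dup {
				f.byName[n] = uint16(id)
			}
		}
	})
	id, ok := f.byName[name]
	return id, ok
}

func main() {
	idx := &fieldIndex{names: []string{"", "name", "body"}}
	fmt.Println(idx.lookup("body"))    // prints "2 true"
	fmt.Println(idx.lookup("missing")) // prints "0 false"
}
```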

func (*Language) IsSupertype ¶ added in v0.6.0

func (l *Language) IsSupertype(sym Symbol) bool

IsSupertype reports whether sym is a supertype symbol.

func (*Language) KeywordLexAsciiTable ¶ added in v0.10.2

func (l *Language) KeywordLexAsciiTable() [][128]int32

KeywordLexAsciiTable returns the ASCII fast-path table for the keyword lexer DFA.

func (*Language) LexAsciiTable ¶ added in v0.10.2

func (l *Language) LexAsciiTable() [][128]int32

LexAsciiTable returns the pre-built ASCII fast-path transition table for the main lexer DFA. The table is built once per Language. Entry format:

bit 31 set  → skip transition (consume and reset token start)
bits 0-30   → next state ID (lexAsciiNoMatch if no transition)
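The entry format above can be unpacked with two bit operations. This is a minimal sketch: decode and encode are hypothetical helpers, and the comparison against the unexported lexAsciiNoMatch sentinel is left out because its value is not shown here.

```go
package main

import "fmt"

const skipBit = uint32(1) << 31

// decode splits one table entry into its skip flag (bit 31) and its
// next-state bits (bits 0-30).
func decode(entry int32) (skip bool, next int32) {
	u := uint32(entry)
	return u&skipBit != 0, int32(u &^ skipBit)
}

// encode is the inverse of decode; it exists only to build sample entries.
func encode(skip bool, next int32) int32 {
	u := uint32(next) &^ skipBit
	if skip {
		u |= skipBit
	}
	return int32(u)
}

func main() {
	skip, next := decode(encode(true, 42))
	fmt.Println(skip, next) // prints "true 42"
}
```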

func (*Language) PublicSymbol ¶ added in v0.7.0

func (l *Language) PublicSymbol(sym Symbol) Symbol

PublicSymbol maps an internal symbol to its canonical public form. Multiple internal symbols may share the same visible name (e.g. HTML's _start_tag_name and _end_tag_name both display as "tag_name"). PublicSymbol returns the first symbol with that name, matching what SymbolByName returns. This ensures query patterns compiled with SymbolByName match nodes regardless of which alias produced them.

func (*Language) SupertypeChildren ¶ added in v0.6.0

func (l *Language) SupertypeChildren(sym Symbol) []Symbol

SupertypeChildren returns the subtype symbols for a given supertype. Returns nil if sym is not a supertype or has no entries.

func (*Language) SymbolByName ¶

func (l *Language) SymbolByName(name string) (Symbol, bool)

SymbolByName returns the symbol ID for a given name, or (0, false) if not found. The "_" wildcard returns (0, true) as a special case. Builds an internal map on first call for O(1) subsequent lookups.

func (*Language) TokenSymbolsByName ¶

func (l *Language) TokenSymbolsByName(name string) []Symbol

TokenSymbolsByName returns all terminal token symbols whose display name matches name. The returned symbols are in grammar order.

func (*Language) Version ¶

func (l *Language) Version() uint32

Version returns the tree-sitter language ABI version.

type LanguageMetadata ¶ added in v0.6.0

type LanguageMetadata struct {
	MajorVersion uint8
	MinorVersion uint8
	PatchVersion uint8
}

LanguageMetadata holds the grammar's semantic version (ABI 15+).

type LexMode ¶

type LexMode struct {
	LexState                uint16
	ExternalLexState        uint16
	ReservedWordSetID       uint16
	AfterWhitespaceLexState uint16 // DFA start state to use after whitespace (0 = same as LexState)
}

LexMode maps a parser state to its lexer configuration.

type LexState ¶

type LexState struct {
	AcceptToken    Symbol // 0 if this state doesn't accept
	AcceptPriority int16  // lower value = higher priority; ts2go blobs use 0, meaning longest-match
	Skip           bool   // true if accepted chars are whitespace
	Default        int    // default next state (-1 if none)
	EOF            int    // state on EOF (-1 if none)
	Transitions    []LexTransition
}

LexState is one state in the table-driven lexer DFA.

type LexTransition ¶

type LexTransition struct {
	Lo, Hi    rune // inclusive character range
	NextState int
	// Skip mirrors tree-sitter's SKIP(state): consume the matched rune
	// and continue lexing while resetting token start.
	Skip bool
}

LexTransition maps a character range to a next state.
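To make the table-driven idea concrete, the sketch below walks a tiny DFA and keeps the longest accepted prefix. It uses trimmed local copies of LexState and LexTransition and deliberately ignores Skip, EOF, and AcceptPriority, so it should be read as an illustration of the data layout rather than the package's lexer. The Symbol underlying type, step, longestMatch, and demoStates are assumptions made for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

type Symbol uint16 // local stand-in; underlying type is an assumption

// Trimmed local copies of the documented structs.
type LexTransition struct {
	Lo, Hi    rune // inclusive character range
	NextState int
}

type LexState struct {
	AcceptToken Symbol // 0 if this state doesn't accept
	Default     int    // default next state (-1 if none)
	Transitions []LexTransition
}

// step returns the next state for r, falling back to the state's Default.
func step(s LexState, r rune) int {
	for _, t := range s.Transitions {
		if r >= t.Lo && r <= t.Hi {
			return t.NextState
		}
	}
	return s.Default
}

// longestMatch runs the DFA from start and returns the last accepted symbol
// with the byte offset at which that match ended.
func longestMatch(states []LexState, src string, start int) (Symbol, int) {
	var sym Symbol
	end, cur := 0, start
	for i := 0; i < len(src); {
		r, w := utf8.DecodeRuneInString(src[i:])
		cur = step(states[cur], r)
		if cur < 0 {
			break
		}
		i += w
		if a := states[cur].AcceptToken; a != 0 {
			sym, end = a, i
		}
	}
	return sym, end
}

// demoStates: state 1 accepts numbers (symbol 1), state 2 identifiers (symbol 2).
func demoStates() []LexState {
	return []LexState{
		{Default: -1, Transitions: []LexTransition{{'0', '9', 1}, {'a', 'z', 2}}},
		{AcceptToken: 1, Default: -1, Transitions: []LexTransition{{'0', '9', 1}}},
		{AcceptToken: 2, Default: -1, Transitions: []LexTransition{{'a', 'z', 2}, {'0', '9', 2}}},
	}
}

func main() {
	fmt.Println(longestMatch(demoStates(), "abc1 x", 0)) // prints "2 4"
}
```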

type Lexer ¶

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer tokenizes source text using a table-driven DFA.

func NewLexer ¶

func NewLexer(states []LexState, source []byte) *Lexer

NewLexer creates a new Lexer that will tokenize source using the given DFA state table.

func (*Lexer) Next ¶

func (l *Lexer) Next(startState uint16) Token

Next lexes the next token starting from the given lex state index. It automatically skips tokens from states where Skip=true (whitespace). Returns a zero-Symbol token with StartByte==EndByte at EOF.

type LookaheadIterator ¶ added in v0.6.0

type LookaheadIterator struct {
	// contains filtered or unexported fields
}

LookaheadIterator iterates over valid symbols for a given parse state. It precomputes the full set of symbols that have valid parse actions in the specified state, enabling autocomplete and error diagnostic use cases.

func NewLookaheadIterator ¶ added in v0.6.0

func NewLookaheadIterator(lang *Language, state StateID) (*LookaheadIterator, error)

NewLookaheadIterator creates an iterator over all symbols that have valid parse actions in the given state. Returns an error if the state is out of range for the language's parse tables.

func (*LookaheadIterator) CurrentSymbol ¶ added in v0.6.0

func (it *LookaheadIterator) CurrentSymbol() Symbol

CurrentSymbol returns the symbol at the current iterator position. Must be called after a successful Next().

func (*LookaheadIterator) CurrentSymbolName ¶ added in v0.6.0

func (it *LookaheadIterator) CurrentSymbolName() string

CurrentSymbolName returns the name of the symbol at the current iterator position. Returns "" if the position is invalid or the symbol has no name.

func (*LookaheadIterator) Language ¶ added in v0.6.0

func (it *LookaheadIterator) Language() *Language

Language returns the language associated with this iterator.

func (*LookaheadIterator) Next ¶ added in v0.6.0

func (it *LookaheadIterator) Next() bool

Next advances the iterator to the next valid symbol. Returns false when there are no more symbols.

func (*LookaheadIterator) ResetState ¶ added in v0.6.0

func (it *LookaheadIterator) ResetState(state StateID) error

ResetState resets the iterator to enumerate valid symbols for a different parse state within the same language. Returns an error if the state is out of range.
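A typical use of the iterator is to collect the names of the symbols that are valid in a state, for example to build an "expected one of ..." diagnostic. The sketch below depends only on the Next and CurrentSymbolName signatures listed above, expressed as a small interface; symbolIterator, expectedNames, and fakeIterator are hypothetical names, and the fake exists so the example runs without a grammar.

```go
package main

import "fmt"

// symbolIterator lists the two LookaheadIterator methods this sketch uses.
type symbolIterator interface {
	Next() bool
	CurrentSymbolName() string
}

// expectedNames drains it and returns the non-empty symbol names it yields.
func expectedNames(it symbolIterator) []string {
	var names []string
	for it.Next() {
		if name := it.CurrentSymbolName(); name != "" {
			names = append(names, name)
		}
	}
	return names
}

// fakeIterator is an in-memory stand-in used only for this example.
type fakeIterator struct {
	names []string
	pos   int // number of names already yielded
}

func (f *fakeIterator) Next() bool {
	if f.pos >= len(f.names) {
		return false
	}
	f.pos++
	return true
}

func (f *fakeIterator) CurrentSymbolName() string {
	if f.pos == 0 || f.pos > len(f.names) {
		return ""
	}
	return f.names[f.pos-1]
}

func main() {
	it := &fakeIterator{names: []string{"identifier", "", "("}}
	fmt.Println(expectedNames(it)) // prints "[identifier (]"
}
```

With the real package, a value returned by NewLookaheadIterator should satisfy the same interface, since both methods appear in its method set.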

type Node ¶

type Node struct {
	// contains filtered or unexported fields
}

Node is a syntax tree node.

func NewLeafNode ¶

func NewLeafNode(sym Symbol, named bool, startByte, endByte uint32, startPoint, endPoint Point) *Node

NewLeafNode creates a terminal/leaf node.

func NewParentNode ¶

func NewParentNode(sym Symbol, named bool, children []*Node, fieldIDs []FieldID, productionID uint16) *Node

NewParentNode creates a non-terminal node with children. It sets parent pointers on all children and computes byte/point spans from the first and last children. If any child has an error, the parent is marked as having an error too.
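The three effects described above (parent links, span from the first and last child, error propagation) can be sketched with a trimmed stand-in type. The node struct and newParent below are hypothetical and cover byte spans only.

```go
package main

import "fmt"

// node is a hypothetical, trimmed stand-in for the package's Node.
type node struct {
	start, end uint32
	hasError   bool
	parent     *node
	children   []*node
}

// newParent links children to a new parent, takes the byte span from the
// first and last child, and marks the parent if any child has an error.
func newParent(children []*node) *node {
	p := &node{children: children}
	for _, c := range children {
		c.parent = p
		if c.hasError {
			p.hasError = true
		}
	}
	if n := len(children); n > 0 {
		p.start = children[0].start
		p.end = children[n-1].end
	}
	return p
}

func main() {
	a := &node{start: 0, end: 4}
	b := &node{start: 5, end: 9, hasError: true}
	p := newParent([]*node{a, b})
	fmt.Println(p.start, p.end, p.hasError, a.parent == p) // prints "0 9 true true"
}
```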

func (*Node) Child ¶

func (n *Node) Child(i int) *Node

Child returns the i-th child, or nil if i is out of range.

func (*Node) ChildByFieldName ¶

func (n *Node) ChildByFieldName(name string, lang *Language) *Node

ChildByFieldName returns the first child assigned to the given field name, or nil if no child has that field. The Language is needed to resolve field names to IDs. Uses Language.FieldByName for O(1) lookup.

func (*Node) ChildCount ¶

func (n *Node) ChildCount() int

ChildCount returns the number of children (both named and anonymous).

func (*Node) Children ¶

func (n *Node) Children() []*Node

Children returns a slice of all children.

func (*Node) DescendantForByteRange ¶ added in v0.6.0

func (n *Node) DescendantForByteRange(startByte, endByte uint32) *Node

DescendantForByteRange returns the smallest descendant that fully contains the given byte range, or nil when no such descendant exists.
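The lookup can be pictured as a recursive descent that keeps narrowing while some child still contains the whole range. The sketch below uses a hypothetical node type and smallestContaining helper; how the package treats boundary cases such as zero-width ranges is not specified here and is not modeled.

```go
package main

import "fmt"

// node is a hypothetical stand-in carrying only a byte span and children.
type node struct {
	start, end uint32
	children   []*node
}

// smallestContaining returns the deepest node whose span fully contains
// [start, end], or nil when n itself does not contain it.
func smallestContaining(n *node, start, end uint32) *node {
	if n == nil || start < n.start || end > n.end {
		return nil
	}
	for _, c := range n.children {
		if d := smallestContaining(c, start, end); d != nil {
			return d
		}
	}
	return n
}

func main() {
	leaf := &node{start: 5, end: 7}
	root := &node{start: 0, end: 10, children: []*node{
		{start: 0, end: 4},
		{start: 5, end: 10, children: []*node{leaf}},
	}}
	d := smallestContaining(root, 5, 6)
	fmt.Println(d.start, d.end) // prints "5 7"
}
```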

func (*Node) DescendantForPointRange ¶ added in v0.6.0

func (n *Node) DescendantForPointRange(startPoint, endPoint Point) *Node

DescendantForPointRange returns the smallest descendant that fully contains the given point range, or nil when no such descendant exists.

func (*Node) Edit ¶ added in v0.7.0

func (n *Node) Edit(edit InputEdit)

Edit adjusts this node's byte/point span for a source edit.

If the node belongs to a larger tree, the edit is applied from the containing root so sibling and ancestor spans remain consistent. Unlike Tree.Edit, this method does not record edit history on a Tree.

func (*Node) EndByte ¶

func (n *Node) EndByte() uint32

EndByte returns the byte offset where this node ends (exclusive).

func (*Node) EndPoint ¶

func (n *Node) EndPoint() Point

EndPoint returns the row/column position where this node ends.

func (*Node) FieldNameForChild ¶ added in v0.6.0

func (n *Node) FieldNameForChild(i int, lang *Language) string

FieldNameForChild returns the field name assigned to the i-th child, or an empty string when no field is assigned.

func (*Node) HasChanges ¶ added in v0.6.0

func (n *Node) HasChanges() bool

HasChanges reports whether this node was marked dirty by Tree.Edit.

func (*Node) HasError ¶

func (n *Node) HasError() bool

HasError reports whether this node or any descendant contains a parse error.

func (*Node) IsError ¶ added in v0.6.0

func (n *Node) IsError() bool

IsError reports whether this node is an explicit error node.

func (*Node) IsExtra ¶ added in v0.6.0

func (n *Node) IsExtra() bool

IsExtra reports whether this node was marked as extra syntax (e.g. whitespace/comments outside the core parse structure).

func (*Node) IsMissing ¶

func (n *Node) IsMissing() bool

IsMissing reports whether this node was inserted by error recovery.

func (*Node) IsNamed ¶

func (n *Node) IsNamed() bool

IsNamed reports whether this is a named node (as opposed to anonymous syntax like punctuation).

func (*Node) NamedChild ¶

func (n *Node) NamedChild(i int) *Node

NamedChild returns the i-th named child (skipping anonymous children), or nil if i is out of range.

func (*Node) NamedChildCount ¶

func (n *Node) NamedChildCount() int

NamedChildCount returns the number of named children.

func (*Node) NamedDescendantForByteRange ¶ added in v0.6.0

func (n *Node) NamedDescendantForByteRange(startByte, endByte uint32) *Node

NamedDescendantForByteRange returns the smallest named descendant that fully contains the given byte range, or nil when no such descendant exists.

func (*Node) NamedDescendantForPointRange ¶ added in v0.6.0

func (n *Node) NamedDescendantForPointRange(startPoint, endPoint Point) *Node

NamedDescendantForPointRange returns the smallest named descendant that fully contains the given point range, or nil when no such descendant exists.

func (*Node) NextSibling ¶

func (n *Node) NextSibling() *Node

NextSibling returns the next sibling node, or nil when this is the last child or has no parent.

func (*Node) Parent ¶

func (n *Node) Parent() *Node

Parent returns this node's parent, or nil if it is the root.

func (*Node) ParseState ¶

func (n *Node) ParseState() StateID

ParseState returns the parser state associated with this node.

func (*Node) PreGotoState ¶ added in v0.6.0

func (n *Node) PreGotoState() StateID

PreGotoState returns the parser state that was on top of the stack before this node was pushed (i.e., the state exposed after popping children during reduce). For non-leaf nodes: lookupGoto(PreGotoState, Symbol) == ParseState.

func (*Node) PrevSibling ¶

func (n *Node) PrevSibling() *Node

PrevSibling returns the previous sibling node, or nil when this is the first child or has no parent.

func (*Node) Range ¶

func (n *Node) Range() Range

Range returns the full span of this node as a Range.

func (*Node) SExpr ¶ added in v0.6.0

func (n *Node) SExpr(lang *Language) string

SExpr returns a tree-sitter-style S-expression for this node. It includes only named nodes for stable debug snapshots.

func (*Node) StartByte ¶

func (n *Node) StartByte() uint32

StartByte returns the byte offset where this node begins.

func (*Node) StartPoint ¶

func (n *Node) StartPoint() Point

StartPoint returns the row/column position where this node begins.

func (*Node) Symbol ¶

func (n *Node) Symbol() Symbol

Symbol returns the node's grammar symbol.

func (*Node) Text ¶

func (n *Node) Text(source []byte) string

Text returns the source text covered by this node. Returns an empty string for nil nodes or invalid byte ranges.

func (*Node) Type ¶

func (n *Node) Type(lang *Language) string

Type returns the node's type name from the language.

type ParseAction ¶

type ParseAction struct {
	Type              ParseActionType
	State             StateID // target state (shift/recover)
	Symbol            Symbol  // reduced symbol (reduce)
	ChildCount        uint8   // children consumed (reduce)
	DynamicPrecedence int16   // precedence (reduce)
	ProductionID      uint16  // which production (reduce)
	Extra             bool    // is this an extra token (shift)
	ExtraChain        bool    // does this shift enter a nonterminal extra chain
	Repetition        bool    // is this a repetition (shift)
}

ParseAction is a single parser action from the parse table.

type ParseActionEntry ¶

type ParseActionEntry struct {
	Reusable bool
	Actions  []ParseAction
}

ParseActionEntry is a group of actions for a (state, symbol) pair.

type ParseActionType ¶

type ParseActionType uint8

ParseActionType identifies the kind of parse action.

const (
	ParseActionShift ParseActionType = iota
	ParseActionReduce
	ParseActionAccept
	ParseActionRecover
)

type ParseOption ¶ added in v0.6.0

type ParseOption func(*parseConfig)

ParseOption configures ParseWith behavior.

func WithOldTree ¶ added in v0.6.0

func WithOldTree(oldTree *Tree) ParseOption

WithOldTree enables incremental parsing against an edited prior tree.

func WithProfiling ¶ added in v0.6.0

func WithProfiling() ParseOption

WithProfiling enables incremental parse attribution in ParseResult.Profile.

func WithTokenSource ¶ added in v0.6.0

func WithTokenSource(ts TokenSource) ParseOption

WithTokenSource provides a custom token source for parsing.

type ParseResult ¶ added in v0.6.0

type ParseResult struct {
	Tree *Tree
	// Profile is populated only when ParseWith uses WithProfiling for
	// incremental parsing.
	Profile IncrementalParseProfile
	// ProfileAvailable reports whether Profile contains attribution data.
	ProfileAvailable bool
}

ParseResult is returned by ParseWith.

type ParseRuntime ¶ added in v0.6.0

type ParseRuntime struct {
	StopReason                  ParseStopReason
	SourceLen                   uint32
	ExpectedEOFByte             uint32
	RootEndByte                 uint32
	Truncated                   bool
	TokenSourceEOFEarly         bool
	TokensConsumed              uint64
	LastTokenEndByte            uint32
	LastTokenSymbol             Symbol
	LastTokenWasEOF             bool
	IterationLimit              int
	StackDepthLimit             int
	NodeLimit                   int
	MemoryBudgetBytes           int64
	Iterations                  int
	NodesAllocated              int
	ArenaBytesAllocated         int64
	ScratchBytesAllocated       int64
	EntryScratchBytesAllocated  int64
	GSSBytesAllocated           int64
	PeakStackDepth              int
	MaxStacksSeen               int
	SingleStackIterations       int
	MultiStackIterations        int
	SingleStackTokens           uint64
	MultiStackTokens            uint64
	SingleStackGSSNodes         uint64
	MultiStackGSSNodes          uint64
	GSSNodesAllocated           uint64
	GSSNodesRetained            uint64
	GSSNodesDroppedSameToken    uint64
	ParentNodesAllocated        uint64
	ParentNodesRetained         uint64
	ParentNodesDroppedSameToken uint64
	LeafNodesAllocated          uint64
	LeafNodesRetained           uint64
	LeafNodesDroppedSameToken   uint64
	MergeStacksIn               uint64
	MergeStacksOut              uint64
	MergeSlotsUsed              uint64
	GlobalCullStacksIn          uint64
	GlobalCullStacksOut         uint64
}

ParseRuntime captures parser-loop diagnostics for a completed tree.

func (ParseRuntime) Summary ¶ added in v0.6.0

func (rt ParseRuntime) Summary() string

Summary returns a stable one-line diagnostic string for parse-runtime stats.

type ParseStopReason ¶ added in v0.6.0

type ParseStopReason string

ParseStopReason reports why the parser loop terminated.

const (
	ParseStopNone            ParseStopReason = "none"
	ParseStopAccepted        ParseStopReason = "accepted"
	ParseStopNoStacksAlive   ParseStopReason = "no_stacks_alive"
	ParseStopTokenSourceEOF  ParseStopReason = "token_source_eof"
	ParseStopTimeout         ParseStopReason = "timeout"
	ParseStopCancelled       ParseStopReason = "cancelled"
	ParseStopIterationLimit  ParseStopReason = "iteration_limit"
	ParseStopStackDepthLimit ParseStopReason = "stack_depth_limit"
	ParseStopNodeLimit       ParseStopReason = "node_limit"
	ParseStopMemoryBudget    ParseStopReason = "memory_budget"
)

type Parser ¶

type Parser struct {
	// contains filtered or unexported fields
}

Parser reads parse tables from a Language and produces a syntax tree. It supports GLR parsing: when a (state, symbol) pair maps to multiple actions, the parser forks the stack and explores all alternatives in parallel while preserving distinct parse paths. Duplicate stack versions are collapsed and ambiguities are resolved at selection time.

Parser is not safe for concurrent use. Use one parser per goroutine, a ParserPool, or guard shared parser instances with external synchronization.

func NewParser ¶

func NewParser(lang *Language) *Parser

NewParser creates a new Parser for the given language.

func (*Parser) CancellationFlag ¶ added in v0.7.0

func (p *Parser) CancellationFlag() *uint32

CancellationFlag returns the parser's current cancellation flag pointer.

func (*Parser) IncludedRanges ¶ added in v0.6.0

func (p *Parser) IncludedRanges() []Range

IncludedRanges returns a copy of the configured include ranges.

func (*Parser) InferredRootSymbol ¶ added in v0.9.0

func (p *Parser) InferredRootSymbol() (Symbol, bool)

InferredRootSymbol returns the root symbol inferred during parser construction, and whether inference succeeded.

func (*Parser) Language ¶ added in v0.7.0

func (p *Parser) Language() *Language

Language returns the parser's configured language.

func (*Parser) Logger ¶ added in v0.7.0

func (p *Parser) Logger() ParserLogger

Logger returns the currently configured parser debug logger.

func (*Parser) Parse ¶

func (p *Parser) Parse(source []byte) (*Tree, error)

Parse tokenizes and parses source using the built-in DFA lexer, returning a syntax tree. This works for hand-built grammars that provide LexStates. For real grammars that need a custom lexer, use ParseWithTokenSource. If the input is empty, it returns a tree with a nil root and no error.

func (*Parser) ParseIncremental ¶

func (p *Parser) ParseIncremental(source []byte, oldTree *Tree) (*Tree, error)

ParseIncremental re-parses source after edits were applied to oldTree. It reuses unchanged subtrees from the old tree for better performance. Call oldTree.Edit() for each edit before calling this method.

func (*Parser) ParseIncrementalProfiled ¶ added in v0.6.0

func (p *Parser) ParseIncrementalProfiled(source []byte, oldTree *Tree) (*Tree, IncrementalParseProfile, error)

ParseIncrementalProfiled is like ParseIncremental and also returns runtime attribution for incremental reuse work vs parse/rebuild work.

func (*Parser) ParseIncrementalWithTokenSource ¶

func (p *Parser) ParseIncrementalWithTokenSource(source []byte, oldTree *Tree, ts TokenSource) (*Tree, error)

ParseIncrementalWithTokenSource is like ParseIncremental but uses a custom token source.

func (*Parser) ParseIncrementalWithTokenSourceProfiled ¶ added in v0.6.0

func (p *Parser) ParseIncrementalWithTokenSourceProfiled(source []byte, oldTree *Tree, ts TokenSource) (*Tree, IncrementalParseProfile, error)

ParseIncrementalWithTokenSourceProfiled is like ParseIncrementalWithTokenSource and also returns runtime attribution for incremental reuse work vs parse/rebuild work.

func (*Parser) ParseWith ¶ added in v0.6.0

func (p *Parser) ParseWith(source []byte, opts ...ParseOption) (ParseResult, error)

ParseWith parses source using option-based configuration.

func (*Parser) ParseWithTokenSource ¶

func (p *Parser) ParseWithTokenSource(source []byte, ts TokenSource) (*Tree, error)

ParseWithTokenSource parses source using a custom token source. This is used for real grammars where the lexer DFA isn't available as data tables (e.g., the Go grammar using go/scanner as a bridge).

func (*Parser) SetCancellationFlag ¶ added in v0.7.0

func (p *Parser) SetCancellationFlag(flag *uint32)

SetCancellationFlag configures a caller-owned cancellation flag. Parsing stops when the pointed value becomes non-zero.
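The caller-owned flag pattern looks roughly like the sketch below. The work function is a hypothetical stand-in for a loop that polls the flag between steps; whether the parser reads the flag atomically is not stated here, so using sync/atomic on the caller side is a conservative assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// work is a hypothetical stand-in for a parse loop that checks a
// caller-owned flag before each step and stops once it is non-zero.
func work(flag *uint32, steps int) (done int, cancelled bool) {
	for done < steps {
		if flag != nil && atomic.LoadUint32(flag) != 0 {
			return done, true
		}
		done++
	}
	return done, false
}

func main() {
	var flag uint32
	fmt.Println(work(&flag, 3)) // prints "3 false"

	// A caller requests a stop by storing a non-zero value.
	atomic.StoreUint32(&flag, 1)
	fmt.Println(work(&flag, 3)) // prints "0 true"
}
```

With the real parser, the same variable would be passed once via SetCancellationFlag and set from another goroutine when the parse should be abandoned.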

func (*Parser) SetGLRTrace ¶ added in v0.7.0

func (p *Parser) SetGLRTrace(enabled bool)

SetGLRTrace enables verbose GLR stack tracing to stdout (debug only).

func (*Parser) SetIncludedRanges ¶ added in v0.6.0

func (p *Parser) SetIncludedRanges(ranges []Range)

SetIncludedRanges configures parser include ranges. Tokens outside these ranges are skipped.

func (*Parser) SetLogger ¶ added in v0.7.0

func (p *Parser) SetLogger(logger ParserLogger)

SetLogger installs a parser debug logger. Pass nil to disable logging.

func (*Parser) SetTimeoutMicros ¶ added in v0.7.0

func (p *Parser) SetTimeoutMicros(timeoutMicros uint64)

SetTimeoutMicros configures a per-parse timeout in microseconds. A value of 0 disables timeout checks.

func (*Parser) TimeoutMicros ¶ added in v0.7.0

func (p *Parser) TimeoutMicros() uint64

TimeoutMicros returns the parser timeout in microseconds.

type ParserLogType ¶ added in v0.7.0

type ParserLogType uint8

ParserLogType categorizes parser log messages.

const (
	// ParserLogParse emits parser-loop lifecycle and control-flow logs.
	ParserLogParse ParserLogType = iota
	// ParserLogLex emits token-source and token-consumption logs.
	ParserLogLex
)

type ParserLogger ¶ added in v0.7.0

type ParserLogger func(kind ParserLogType, message string)

ParserLogger receives parser debug logs when configured via SetLogger.

type ParserPool ¶ added in v0.7.0

type ParserPool struct {
	// contains filtered or unexported fields
}

ParserPool provides concurrency-safe parsing by reusing Parser instances.

ParserPool is safe for concurrent use. Each call checks out one parser from an internal sync.Pool, applies configured defaults, runs the parse, and returns the parser to the pool.

Mutable parser state (logger, timeout, cancellation flag, included ranges, GLR trace) is reset on checkout so request-local state cannot bleed across callers.
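The checkout-and-reset behavior can be sketched with sync.Pool as follows. The worker and pool types are hypothetical stand-ins with a single mutable field; the point is only that defaults are re-applied on every checkout, so a change made by one caller is not visible to the next.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for a pooled parser with mutable state.
type worker struct {
	timeoutMicros uint64
}

// pool hands out workers and re-applies its defaults on every checkout.
type pool struct {
	defaultTimeout uint64
	p              sync.Pool
}

func newPool(defaultTimeout uint64) *pool {
	pl := &pool{defaultTimeout: defaultTimeout}
	pl.p.New = func() interface{} { return &worker{} }
	return pl
}

// with checks out a worker, resets it to the pool defaults, runs fn, and
// returns the worker to the pool afterwards.
func (pl *pool) with(fn func(*worker)) {
	w := pl.p.Get().(*worker)
	w.timeoutMicros = pl.defaultTimeout
	defer pl.p.Put(w)
	fn(w)
}

func main() {
	pl := newPool(500)
	pl.with(func(w *worker) { w.timeoutMicros = 1 }) // request-local change
	pl.with(func(w *worker) { fmt.Println(w.timeoutMicros) })
	// prints "500"
}
```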

func NewParserPool ¶ added in v0.7.0

func NewParserPool(lang *Language, opts ...ParserPoolOption) *ParserPool

NewParserPool creates a concurrency-safe parser pool for lang.

func (*ParserPool) Language ¶ added in v0.7.0

func (pp *ParserPool) Language() *Language

Language returns the pool's configured language.

func (*ParserPool) Parse ¶ added in v0.7.0

func (pp *ParserPool) Parse(source []byte) (*Tree, error)

Parse delegates to a pooled Parser.Parse call.

func (*ParserPool) ParseWith ¶ added in v0.7.0

func (pp *ParserPool) ParseWith(source []byte, opts ...ParseOption) (ParseResult, error)

ParseWith delegates to a pooled Parser.ParseWith call.

func (*ParserPool) ParseWithTokenSource ¶ added in v0.7.0

func (pp *ParserPool) ParseWithTokenSource(source []byte, ts TokenSource) (*Tree, error)

ParseWithTokenSource delegates to a pooled Parser.ParseWithTokenSource call.

type ParserPoolOption ¶ added in v0.7.0

type ParserPoolOption func(*parserPoolConfig)

ParserPoolOption configures a ParserPool.

func WithParserPoolGLRTrace ¶ added in v0.7.0

func WithParserPoolGLRTrace(enabled bool) ParserPoolOption

WithParserPoolGLRTrace toggles GLR trace logs on pooled parser instances.

func WithParserPoolIncludedRanges ¶ added in v0.7.0

func WithParserPoolIncludedRanges(ranges []Range) ParserPoolOption

WithParserPoolIncludedRanges sets default include ranges for pooled parsers.

func WithParserPoolLogger ¶ added in v0.7.0

func WithParserPoolLogger(logger ParserLogger) ParserPoolOption

WithParserPoolLogger sets the logger applied to pooled parser instances.

func WithParserPoolTimeoutMicros ¶ added in v0.7.0

func WithParserPoolTimeoutMicros(timeoutMicros uint64) ParserPoolOption

WithParserPoolTimeoutMicros sets the parse timeout for pooled parsers.

type Pattern ¶

type Pattern struct {
	// contains filtered or unexported fields
}

Pattern is a single top-level S-expression pattern in a query.

type PerfCounters ¶ added in v0.6.0

type PerfCounters struct {
	MergeCalls             uint64
	MergeDeadPruned        uint64
	MergePerKeyOverflow    uint64
	MergeReplacements      uint64
	StackEquivalentCalls   uint64
	StackEquivalentTrue    uint64
	StackEqHashMissSkips   uint64
	StackCompareCalls      uint64
	ConflictRR             uint64
	ConflictRS             uint64
	ConflictOther          uint64
	ForkCount              uint64
	FirstConflictToken     uint64
	MaxConcurrentStacks    uint64
	LexBytes               uint64
	LexTokens              uint64
	ReuseNodesVisited      uint64
	ReuseNodesPushed       uint64
	ReuseNodesPopped       uint64
	ReuseCandidatesChecked uint64
	ReuseSuccesses         uint64
	ReuseLeafSuccesses     uint64
	ReuseNonLeafChecks     uint64
	ReuseNonLeafSuccesses  uint64
	ReuseNonLeafBytes      uint64
	ReuseNonLeafNoGoto     uint64
	ReuseNonLeafNoGotoTerm uint64
	ReuseNonLeafNoGotoNt   uint64
	ReuseNonLeafStateMiss  uint64
	ReuseNonLeafStateZero  uint64
	MergeHashZero          uint64
	GlobalCapCulls         uint64
	GlobalCapCullDropped   uint64
	ReduceChainSteps       uint64
	ReduceChainMaxLen      uint64
	ReduceChainBreakMulti  uint64
	ReduceChainBreakShift  uint64
	ReduceChainBreakAccept uint64
	ParentChildPointers    uint64
	ExtraNodes             uint64
	ErrorNodes             uint64
	MergeStacksInHist      [maxGLRStacks + 2]uint64
	MergeAliveHist         [maxGLRStacks + 2]uint64
	MergeOutHist           [maxGLRStacks + 2]uint64
	ForkActionsHist        [8]uint64
}

func PerfCountersSnapshot ¶ added in v0.6.0

func PerfCountersSnapshot() PerfCounters

type Point ¶

type Point struct {
	Row    uint32
	Column uint32
}

Point is a row/column position in source text.

type PointSkippableTokenSource ¶

type PointSkippableTokenSource interface {
	ByteSkippableTokenSource
	SkipToByteWithPoint(offset uint32, pt Point) Token
}

PointSkippableTokenSource extends ByteSkippableTokenSource with a hint-based skip that avoids recomputing row/column from byte offset. During incremental parsing the reused node already carries its endpoint, so passing it directly eliminates the O(n) offset-to-point scan.

type Query ¶

type Query struct {
	// contains filtered or unexported fields
}

Query holds compiled patterns parsed from a tree-sitter .scm query file. It can be executed against a syntax tree to find matching nodes and return captured names. Query is safe for concurrent use after construction.

func NewQuery ¶

func NewQuery(source string, lang *Language) (*Query, error)

NewQuery compiles query source (tree-sitter .scm format) against a language. It returns an error if the query syntax is invalid or references unknown node types or field names.

func (*Query) CaptureCount ¶ added in v0.7.0

func (q *Query) CaptureCount() uint32

CaptureCount returns the number of unique capture names in this query.

func (*Query) CaptureNameForID ¶ added in v0.7.0

func (q *Query) CaptureNameForID(id uint32) (string, bool)

CaptureNameForID returns the capture name for the given capture id.

func (*Query) CaptureNames ¶

func (q *Query) CaptureNames() []string

CaptureNames returns the list of unique capture names used in the query.

func (*Query) DisableCapture ¶ added in v0.7.0

func (q *Query) DisableCapture(name string)

DisableCapture removes captures with the given name from future query results. Matching behavior is unchanged; only returned captures are filtered.

func (*Query) DisablePattern ¶ added in v0.7.0

func (q *Query) DisablePattern(patternIndex uint32)

DisablePattern disables a pattern by index.

func (*Query) EndByteForPattern ¶ added in v0.7.0

func (q *Query) EndByteForPattern(patternIndex uint32) (uint32, bool)

EndByteForPattern returns the query-source end byte for patternIndex.

func (*Query) Exec ¶

func (q *Query) Exec(node *Node, lang *Language, source []byte) *QueryCursor

Exec creates a streaming cursor over matches rooted at node.

func (*Query) Execute ¶

func (q *Query) Execute(tree *Tree) []QueryMatch

Execute runs the query against a syntax tree and returns all matches.

func (*Query) ExecuteInto ¶ added in v0.10.2

func (q *Query) ExecuteInto(tree *Tree, dst []QueryMatch) []QueryMatch

ExecuteInto runs the query against a syntax tree, appending matches into dst and returning the updated slice. Callers can pre-allocate or reuse dst across calls to eliminate the per-call slice allocation from Execute.

Example:

var buf []QueryMatch
for _, tree := range trees {
    buf = q.ExecuteInto(tree, buf[:0])
    process(buf)
}

func (*Query) ExecuteNode ¶

func (q *Query) ExecuteNode(node *Node, lang *Language, source []byte) []QueryMatch

ExecuteNode runs the query starting from a specific node.

source is required for text predicates (like #eq? / #match?); pass the originating source bytes for correct predicate evaluation.

func (*Query) IsPatternGuaranteedAtStep ¶ added in v0.7.0

func (q *Query) IsPatternGuaranteedAtStep(patternIndex uint32, stepIndex uint32) bool

IsPatternGuaranteedAtStep reports whether all steps through stepIndex are definite and non-quantified.

func (*Query) IsPatternNonLocal ¶ added in v0.7.0

func (q *Query) IsPatternNonLocal(patternIndex uint32) bool

IsPatternNonLocal reports whether the pattern can begin at multiple roots.

func (*Query) IsPatternRooted ¶ added in v0.7.0

func (q *Query) IsPatternRooted(patternIndex uint32) bool

IsPatternRooted reports whether the pattern has exactly one root step at depth 0. Rooted patterns start matching from a single concrete root.

func (*Query) PatternCount ¶

func (q *Query) PatternCount() int

PatternCount returns the number of patterns in the query.

func (*Query) PredicatesForPattern ¶ added in v0.7.0

func (q *Query) PredicatesForPattern(patternIndex uint32) ([]QueryPredicate, bool)

PredicatesForPattern returns a copy of predicates attached to patternIndex.

func (*Query) StartByteForPattern ¶ added in v0.7.0

func (q *Query) StartByteForPattern(patternIndex uint32) (uint32, bool)

StartByteForPattern returns the query-source start byte for patternIndex.

func (*Query) StepIsDefinite ¶ added in v0.7.0

func (q *Query) StepIsDefinite(patternIndex uint32, stepIndex uint32) bool

StepIsDefinite reports whether a pattern step matches a definite symbol (i.e. not wildcard).

func (*Query) StringCount ¶ added in v0.7.0

func (q *Query) StringCount() uint32

StringCount returns the number of unique string literals in this query.

func (*Query) StringValueForID ¶ added in v0.7.0

func (q *Query) StringValueForID(id uint32) (string, bool)

StringValueForID returns the string literal for the given string id.

type QueryCapture ¶

type QueryCapture struct {
	Name string
	Node *Node
	// TextOverride, when non-empty, replaces the node's source text for
	// downstream consumers. It is set by the #strip! directive.
	TextOverride string
}

QueryCapture is a single captured node within a match.

func (QueryCapture) Text ¶ added in v0.6.0

func (c QueryCapture) Text(source []byte) string

Text returns the effective text for this capture. If TextOverride is set (e.g. by the #strip! directive), it is returned. Otherwise the node's source text is returned.
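The fallback rule for Text can be sketched with a stand-in type. The capture struct below is hypothetical: it stores byte offsets directly instead of a *Node so the example runs without the library, and it is not the package's implementation.

```go
package main

import "fmt"

// capture is a hypothetical stand-in for QueryCapture. It keeps the
// node's byte span directly so the sketch needs no parser.
type capture struct {
	name         string
	startByte    uint32
	endByte      uint32
	textOverride string
}

// text follows the documented rule: a non-empty override wins,
// otherwise the span is sliced out of source.
func (c capture) text(source []byte) string {
	if c.textOverride != "" {
		return c.textOverride
	}
	return string(source[c.startByte:c.endByte])
}

func main() {
	src := []byte(`name = "quoted"`)
	plain := capture{name: "key", startByte: 0, endByte: 4}
	stripped := capture{name: "val", startByte: 7, endByte: 15, textOverride: "quoted"}
	fmt.Println(plain.text(src))    // text taken from the source span
	fmt.Println(stripped.text(src)) // override replaces the span's text
}
```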

type QueryCursor ¶

type QueryCursor struct {
	// contains filtered or unexported fields
}

QueryCursor incrementally walks a node subtree and yields matches one by one. It is the streaming counterpart to Query.Execute and avoids materializing all matches up front. QueryCursor is not safe for concurrent use.

func (*QueryCursor) DidExceedMatchLimit ¶ added in v0.7.0

func (c *QueryCursor) DidExceedMatchLimit() bool

DidExceedMatchLimit reports whether query execution had additional matches beyond the configured match limit.

func (*QueryCursor) NextCapture ¶

func (c *QueryCursor) NextCapture() (QueryCapture, bool)

NextCapture yields captures in match order by draining NextMatch results. This is a practical first-pass ordering: captures are returned in each match's capture order, then by subsequent matches in DFS match order.

func (*QueryCursor) NextMatch ¶

func (c *QueryCursor) NextMatch() (QueryMatch, bool)

NextMatch yields the next query match from the cursor.

func (*QueryCursor) SetByteRange ¶ added in v0.6.0

func (c *QueryCursor) SetByteRange(startByte, endByte uint32)

SetByteRange restricts matches to nodes that intersect [startByte, endByte).
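As an illustration of the half-open interval test implied by [startByte, endByte), the following sketch checks whether a node span intersects the window. The intersects helper is hypothetical; how the library treats zero-width nodes or other edge cases is not stated here.

```go
package main

import "fmt"

// intersects reports whether the half-open span [nodeStart, nodeEnd)
// shares at least one byte with the half-open window [startByte, endByte).
func intersects(nodeStart, nodeEnd, startByte, endByte uint32) bool {
	return nodeStart < endByte && nodeEnd > startByte
}

func main() {
	// Window covering bytes 5 through 9.
	fmt.Println(intersects(0, 5, 5, 10))   // node ends where the window starts
	fmt.Println(intersects(4, 6, 5, 10))   // node straddles the window start
	fmt.Println(intersects(10, 12, 5, 10)) // node starts where the window ends
}
```

Because both intervals exclude their end byte, a node that merely touches the window boundary does not count as intersecting in this model.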

func (*QueryCursor) SetMatchLimit ¶ added in v0.7.0

func (c *QueryCursor) SetMatchLimit(limit uint32)

SetMatchLimit sets the maximum number of matches this cursor can return. A limit of 0 means unlimited.
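A minimal sketch of how a match limit and an overflow flag can interact, following the wording above (a cap on returned matches, 0 meaning unlimited). The limitedCursor type is hypothetical and iterates a precomputed slice; the library's internal bookkeeping may differ.

```go
package main

import "fmt"

// limitedCursor is a hypothetical stand-in for a match cursor. It walks a
// precomputed slice instead of a syntax tree.
type limitedCursor struct {
	matches  []string
	next     int
	limit    uint32 // 0 means unlimited
	exceeded bool
}

// nextMatch returns matches until the limit is reached. If matches remain
// at that point, it records that the limit was exceeded.
func (c *limitedCursor) nextMatch() (string, bool) {
	if c.next >= len(c.matches) {
		return "", false
	}
	if c.limit != 0 && uint32(c.next) >= c.limit {
		c.exceeded = true
		return "", false
	}
	m := c.matches[c.next]
	c.next++
	return m, true
}

func main() {
	c := &limitedCursor{matches: []string{"a", "b", "c"}, limit: 2}
	for {
		m, ok := c.nextMatch()
		if !ok {
			break
		}
		fmt.Println(m)
	}
	fmt.Println("exceeded:", c.exceeded)
}
```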

func (*QueryCursor) SetMaxStartDepth ¶ added in v0.7.0

func (c *QueryCursor) SetMaxStartDepth(depth uint32)

SetMaxStartDepth limits the depth at which new matches can begin. Depth 0 means only the starting node passed to Exec.

func (*QueryCursor) SetPointRange ¶ added in v0.6.0

func (c *QueryCursor) SetPointRange(startPoint, endPoint Point)

SetPointRange restricts matches to nodes that intersect [startPoint, endPoint).

type QueryMatch ¶

type QueryMatch struct {
	PatternIndex int
	Captures     []QueryCapture
}

QueryMatch represents a successful pattern match with its captures.

func (QueryMatch) SetValues ¶ added in v0.6.0

func (m QueryMatch) SetValues(q *Query, key string) []string

SetValues returns the values of a #set! directive with the given key for a match's pattern, or nil if not present. This is used by InjectionParser to read injection.language metadata.

type QueryPredicate ¶

type QueryPredicate struct {
	// contains filtered or unexported fields
}

QueryPredicate is a post-match constraint attached to a pattern. Supported forms:

(#eq? @a @b)
(#eq? @a "literal")
(#not-eq? @a @b)
(#not-eq? @a "literal")
(#match? @a "regex")
(#not-match? @a "regex")
(#lua-match? @a "lua-pattern")
(#any-of? @a "v1" "v2" ...)
(#not-any-of? @a "v1" "v2" ...)
(#any-eq? @a "literal"), (#any-eq? @a @b)
(#any-not-eq? @a "literal"), (#any-not-eq? @a @b)
(#any-match? @a "regex")
(#any-not-match? @a "regex")
(#has-ancestor? @a type ...)
(#not-has-ancestor? @a type ...)
(#not-has-parent? @a type ...)
(#is? ...), (#is-not? ...)
(#set! key value), (#offset! @cap ...)
(#count? @a op value) -- op: >, <, >=, <=, ==, !=
(#is-exported? @a)
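The six operators listed for #count? can be evaluated with a plain switch. The sketch below is illustrative only: compareCount is a hypothetical helper, and it assumes the count is an integer compared against an integer value.

```go
package main

import "fmt"

// compareCount evaluates "count op value" for the six documented operators
// and rejects anything else.
func compareCount(count int, op string, value int) (bool, error) {
	switch op {
	case ">":
		return count > value, nil
	case "<":
		return count < value, nil
	case ">=":
		return count >= value, nil
	case "<=":
		return count <= value, nil
	case "==":
		return count == value, nil
	case "!=":
		return count != value, nil
	default:
		return false, fmt.Errorf("unsupported operator %q", op)
	}
}

func main() {
	ok, err := compareCount(3, ">=", 2)
	fmt.Println(ok, err)
	_, err = compareCount(3, "~", 2)
	fmt.Println(err)
}
```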

type QueryStep ¶

type QueryStep struct {
	// contains filtered or unexported fields
}

QueryStep is one matching instruction within a pattern.

type Range ¶

type Range struct {
	StartByte  uint32
	EndByte    uint32
	StartPoint Point
	EndPoint   Point
}

Range is a span of source text.

func DiffChangedRanges ¶ added in v0.6.0

func DiffChangedRanges(oldTree, newTree *Tree) []Range

DiffChangedRanges compares two syntax trees and returns the minimal ranges where syntactic structure differs. The old tree should have been edited (via Tree.Edit) to match the new tree's source positions before reparsing.

This is equivalent to C tree-sitter's ts_tree_get_changed_ranges().

type Rewriter ¶ added in v0.6.0

type Rewriter struct {
	// contains filtered or unexported fields
}

Rewriter collects source-text edits and applies them atomically. Edits target byte ranges (usually from Node.StartByte/EndByte). Apply returns new source bytes and InputEdit records for incremental reparsing. Rewriter is not safe for concurrent use.

func NewRewriter ¶ added in v0.6.0

func NewRewriter(source []byte) *Rewriter

NewRewriter creates a Rewriter for the given source text.

func (*Rewriter) Apply ¶ added in v0.6.0

func (r *Rewriter) Apply() (newSource []byte, edits []InputEdit, err error)

Apply sorts the queued edits, verifies that none overlap, applies them, and returns the new source bytes plus InputEdit records for incremental reparsing. It returns an error if any edits overlap.
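The sort, validate, splice sequence that Apply describes can be sketched in a few lines. This is a simplified model under stated assumptions, not the package's code: the edit type and applyEdits function are hypothetical, and the sketch omits the InputEdit records that the real method also returns.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// edit is a hypothetical stand-in for one queued edit:
// replace source[start:end) with text.
type edit struct {
	start, end uint32
	text       []byte
}

var errOverlap = errors.New("edits overlap")

// applyEdits sorts edits by start offset, rejects overlapping ranges,
// and splices the replacements into a new buffer.
func applyEdits(source []byte, edits []edit) ([]byte, error) {
	sorted := append([]edit(nil), edits...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	var out []byte
	cursor := uint32(0) // end of the last range already consumed
	for _, e := range sorted {
		if e.start > e.end || int(e.end) > len(source) {
			return nil, fmt.Errorf("edit [%d, %d) is out of bounds", e.start, e.end)
		}
		if e.start < cursor {
			return nil, errOverlap
		}
		out = append(out, source[cursor:e.start]...) // untouched bytes
		out = append(out, e.text...)                 // replacement
		cursor = e.end
	}
	return append(out, source[cursor:]...), nil
}

func main() {
	src := []byte("func foo() { foo() }")
	out, err := applyEdits(src, []edit{
		{start: 13, end: 16, text: []byte("bar")}, // queued out of order on purpose
		{start: 5, end: 8, text: []byte("bar")},
	})
	fmt.Println(string(out), err)
}
```

Sorting first means callers can queue edits in any order, and a single left-to-right pass is enough to detect overlaps.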

func (*Rewriter) ApplyToTree ¶ added in v0.6.0

func (r *Rewriter) ApplyToTree(tree *Tree) ([]byte, error)

ApplyToTree is a convenience that calls Apply(), then tree.Edit() for each edit, returning the new source ready for ParseIncremental.

func (*Rewriter) Delete ¶ added in v0.6.0

func (r *Rewriter) Delete(node *Node)

Delete removes the source text covered by node.

func (*Rewriter) InsertAfter ¶ added in v0.6.0

func (r *Rewriter) InsertAfter(node *Node, text []byte)

InsertAfter inserts text immediately after node.

func (*Rewriter) InsertBefore ¶ added in v0.6.0

func (r *Rewriter) InsertBefore(node *Node, text []byte)

InsertBefore inserts text immediately before node.

func (*Rewriter) Replace ¶ added in v0.6.0

func (r *Rewriter) Replace(node *Node, newText []byte)

Replace replaces the source text covered by node with newText.

func (*Rewriter) ReplaceRange ¶ added in v0.6.0

func (r *Rewriter) ReplaceRange(startByte, endByte uint32, newText []byte)

ReplaceRange replaces bytes in [startByte, endByte) with newText.

type StateID ¶

type StateID uint32

StateID is a parser state index. uint32 supports grammars with >65K states (e.g. COBOL with 67K states from 1071 rules).

type Symbol ¶

type Symbol uint16

Symbol is a grammar symbol ID (terminal or nonterminal).

type SymbolMetadata ¶

type SymbolMetadata struct {
	Name      string
	Visible   bool
	Named     bool
	Supertype bool
}

SymbolMetadata holds display information about a symbol.

type Tag ¶

type Tag struct {
	Kind      string // e.g. "definition.function", "reference.call"
	Name      string // the captured symbol text
	Range     Range  // full span of the tagged node
	NameRange Range  // span of the @name capture
}

Tag represents a tagged symbol in source code, extracted by a Tagger. Kind follows tree-sitter convention: "definition.function", "reference.call", etc. Name is the captured symbol text (e.g., the function name).

type Tagger ¶

type Tagger struct {
	// contains filtered or unexported fields
}

Tagger extracts symbol definitions and references from source code using tree-sitter tags queries. It is the tagging counterpart to Highlighter.

Tags queries use a convention where captures follow the pattern:

@name captures the symbol name (e.g., function identifier)
@definition.X or @reference.X captures the kind

Example query:

(function_declaration name: (identifier) @name) @definition.function
(call_expression function: (identifier) @name) @reference.call

func NewTagger ¶

func NewTagger(lang *Language, tagsQuery string, opts ...TaggerOption) (*Tagger, error)

NewTagger creates a Tagger for the given language and tags query.

func (*Tagger) Tag ¶

func (tg *Tagger) Tag(source []byte) []Tag

Tag parses source and returns all tags.

func (*Tagger) TagIncremental ¶

func (tg *Tagger) TagIncremental(source []byte, oldTree *Tree) ([]Tag, *Tree)

TagIncremental re-tags source after edits to oldTree. Returns the tags and the new tree for subsequent incremental calls.

func (*Tagger) TagTree ¶

func (tg *Tagger) TagTree(tree *Tree) []Tag

TagTree extracts tags from an already-parsed tree.

type TaggerOption ¶

type TaggerOption func(*Tagger)

TaggerOption configures a Tagger.

func WithTaggerTokenSourceFactory ¶

func WithTaggerTokenSourceFactory(factory func(source []byte) TokenSource) TaggerOption

WithTaggerTokenSourceFactory sets a factory function that creates a TokenSource for each Tag call.

type Token ¶

type Token struct {
	Symbol     Symbol
	Text       string
	StartByte  uint32
	EndByte    uint32
	StartPoint Point
	EndPoint   Point
	Missing    bool
	// NoLookahead marks a synthetic EOF used to force EOF-table reductions
	// without consuming input, matching tree-sitter's lex_state = -1.
	NoLookahead bool
}

Token is a lexed token with position info.

type TokenSource ¶

type TokenSource interface {
	// Next returns the next token. It should skip whitespace and comments
	// as appropriate for the language. Returns a zero-Symbol token at EOF.
	Next() Token
}

TokenSource provides tokens to the parser. This interface abstracts over different lexer implementations: the built-in DFA lexer (for hand-built grammars) or custom bridges like GoTokenSource (for real grammars where we can't extract the C lexer DFA).

type TokenSourceRebuilder ¶ added in v0.7.0

type TokenSourceRebuilder interface {
	RebuildTokenSource(source []byte, lang *Language) (TokenSource, error)
}

TokenSourceRebuilder is an optional extension for token sources that can build a fresh equivalent token source for another source buffer. Result normalization uses this to reparse isolated fragments with the same lexer backend as the original parse.

type Tree ¶

type Tree struct {
	// contains filtered or unexported fields
}

Tree holds a complete syntax tree along with its source text and language. Tree is safe for concurrent reads after construction. Edit and Release are not safe for concurrent use.

func NewTree ¶

func NewTree(root *Node, source []byte, lang *Language) *Tree

NewTree creates a new Tree.

func (*Tree) ChangedRanges ¶ added in v0.6.0

func (t *Tree) ChangedRanges() []Range

ChangedRanges converts this tree's recorded edits into changed source ranges. Overlapping ranges are coalesced.
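Coalescing overlapping ranges is a standard interval merge. The sketch below uses a hypothetical byte-only byteRange type rather than the package's Range. It also merges ranges that merely touch, which is an assumption; the text above only mentions overlapping ranges.

```go
package main

import (
	"fmt"
	"sort"
)

// byteRange is a hypothetical, byte-only stand-in for Range.
type byteRange struct {
	start, end uint32
}

// coalesce sorts ranges by start and merges any that overlap or touch.
func coalesce(ranges []byteRange) []byteRange {
	if len(ranges) == 0 {
		return nil
	}
	sorted := append([]byteRange(nil), ranges...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	merged := []byteRange{sorted[0]}
	for _, r := range sorted[1:] {
		last := &merged[len(merged)-1]
		if r.start <= last.end {
			// Extend the current range instead of starting a new one.
			if r.end > last.end {
				last.end = r.end
			}
			continue
		}
		merged = append(merged, r)
	}
	return merged
}

func main() {
	fmt.Println(coalesce([]byteRange{{10, 20}, {0, 5}, {15, 30}}))
}
```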

func (*Tree) Copy ¶ added in v0.7.0

func (t *Tree) Copy() *Tree

Copy returns an independent copy of this tree.

The copied tree has distinct node objects, so subsequent Tree.Edit calls on either tree do not mutate the other's spans/dirty bits. Source bytes and language pointer are shared (read-only).

func (*Tree) DOT ¶ added in v0.7.0

func (t *Tree) DOT(lang *Language) string

DOT returns a DOT graph representation of this tree.

func (*Tree) Edit ¶

func (t *Tree) Edit(edit InputEdit)

Edit records an edit on this tree. Call this before ParseIncremental to inform the parser which regions changed. The edit adjusts byte offsets and marks overlapping nodes as dirty so the incremental parser knows what to re-parse.
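To make the offset adjustment concrete, here is a simplified byte-only model of what an edit does to a node span. The inputEdit and span types are hypothetical stand-ins, and the rules are an approximation for illustration; the library's actual handling (including point positions and edge cases at edit boundaries) may differ.

```go
package main

import "fmt"

// inputEdit is a hypothetical byte-only stand-in for InputEdit: the bytes
// [start, oldEnd) were replaced by text that now ends at newEnd.
type inputEdit struct {
	start, oldEnd, newEnd uint32
}

// span is a node's byte range plus a dirty flag.
type span struct {
	start, end uint32
	dirty      bool
}

// applyEdit leaves spans before the edit alone, shifts spans after it,
// and marks anything overlapping the edited region as dirty.
func applyEdit(s span, e inputEdit) span {
	delta := int64(e.newEnd) - int64(e.oldEnd)
	switch {
	case s.end <= e.start: // entirely before the edit
		return s
	case s.start >= e.oldEnd: // entirely after the edit
		s.start = uint32(int64(s.start) + delta)
		s.end = uint32(int64(s.end) + delta)
		return s
	default: // overlaps the edited region
		s.dirty = true
		if s.start > e.start {
			s.start = e.newEnd
		}
		if s.end >= e.oldEnd {
			s.end = uint32(int64(s.end) + delta)
		} else {
			s.end = e.newEnd
		}
		return s
	}
}

func main() {
	// "abc def ghi" -> "abc defgh ghi": bytes [4, 7) grew to [4, 9).
	e := inputEdit{start: 4, oldEnd: 7, newEnd: 9}
	for _, s := range []span{{start: 0, end: 3}, {start: 4, end: 7}, {start: 8, end: 11}} {
		fmt.Printf("%+v\n", applyEdit(s, e))
	}
}
```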

func (*Tree) Edits ¶

func (t *Tree) Edits() []InputEdit

Edits returns the pending edits recorded on this tree.

func (*Tree) Language ¶

func (t *Tree) Language() *Language

Language returns the language used to parse this tree.

func (*Tree) ParseRuntime ¶ added in v0.6.0

func (t *Tree) ParseRuntime() ParseRuntime

ParseRuntime returns parser-loop diagnostics captured when this tree was built.

func (*Tree) ParseStopReason ¶ added in v0.6.0

func (t *Tree) ParseStopReason() ParseStopReason

ParseStopReason reports why parsing terminated.

func (*Tree) ParseStoppedEarly ¶ added in v0.6.0

func (t *Tree) ParseStoppedEarly() bool

ParseStoppedEarly reports whether parsing hit an early-stop condition.

func (*Tree) Release ¶

func (t *Tree) Release()

Release decrements arena references held by this tree. After Release, the tree should be treated as invalid and not reused.

func (*Tree) RootNode ¶

func (t *Tree) RootNode() *Node

RootNode returns the tree's root node.

func (*Tree) RootNodeWithOffset ¶ added in v0.7.0

func (t *Tree) RootNodeWithOffset(offsetBytes uint32, offsetExtent Point) *Node

RootNodeWithOffset returns a copy of the root node with all spans shifted by the provided byte and point offsets.

This mirrors C tree-sitter's ts_tree_root_node_with_offset() for callers that need to embed a parsed tree at a larger document offset.
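Shifting byte offsets is plain addition, but shifting a row/column position is not. The sketch below assumes the point addition used by C tree-sitter, where only positions on the fragment's first row inherit the column offset. The point type and shiftPoint helper are hypothetical and do not reflect the package's Point field names.

```go
package main

import "fmt"

// point is a hypothetical stand-in for Point (zero-based row and column).
type point struct {
	row, column uint32
}

// shiftPoint offsets p by extent. Only positions on the fragment's first row
// inherit the column offset; later rows keep their own column.
func shiftPoint(p, extent point) point {
	if p.row == 0 {
		return point{row: extent.row, column: extent.column + p.column}
	}
	return point{row: extent.row + p.row, column: p.column}
}

func main() {
	extent := point{row: 10, column: 4} // fragment embedded at row 10, column 4
	fmt.Println(shiftPoint(point{row: 0, column: 3}, extent))
	fmt.Println(shiftPoint(point{row: 2, column: 3}, extent))
}
```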

func (*Tree) Source ¶

func (t *Tree) Source() []byte

Source returns the original source text.

func (*Tree) WriteDOT ¶ added in v0.7.0

func (t *Tree) WriteDOT(w io.Writer, lang *Language) error

WriteDOT writes a DOT graph representation of this tree to w.

type TreeCursor ¶ added in v0.6.0

type TreeCursor struct {
	// contains filtered or unexported fields
}

TreeCursor provides stateful, O(1) tree navigation. It maintains a stack of (node, childIndex) frames enabling efficient parent, child, and sibling movement without scanning.

The cursor holds pointers to Nodes. If the underlying Tree is released, edited, or replaced via incremental reparse, the cursor should be recreated.
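The frame-stack idea can be illustrated with a small self-contained model. Everything here (node, frame, cursor, and the method names) is hypothetical and much simpler than the real TreeCursor; it only shows why parent and sibling moves need no scanning once each frame remembers its child index.

```go
package main

import "fmt"

// node is a hypothetical minimal tree node.
type node struct {
	kind     string
	children []*node
}

// frame records a node and its index within its parent.
type frame struct {
	n          *node
	childIndex int
}

// cursor keeps the path from the root to the current node as a stack.
type cursor struct {
	stack []frame
}

func newCursor(root *node) *cursor {
	return &cursor{stack: []frame{{n: root, childIndex: -1}}}
}

func (c *cursor) current() *node { return c.stack[len(c.stack)-1].n }

func (c *cursor) depth() int { return len(c.stack) - 1 }

// gotoFirstChild pushes a frame for the first child, if any.
func (c *cursor) gotoFirstChild() bool {
	cur := c.current()
	if len(cur.children) == 0 {
		return false
	}
	c.stack = append(c.stack, frame{n: cur.children[0], childIndex: 0})
	return true
}

// gotoNextSibling rewrites the top frame using the stored child index.
func (c *cursor) gotoNextSibling() bool {
	if len(c.stack) < 2 {
		return false // the root has no siblings
	}
	top := &c.stack[len(c.stack)-1]
	parent := c.stack[len(c.stack)-2].n
	next := top.childIndex + 1
	if next >= len(parent.children) {
		return false
	}
	top.n, top.childIndex = parent.children[next], next
	return true
}

// gotoParent pops the top frame.
func (c *cursor) gotoParent() bool {
	if len(c.stack) < 2 {
		return false
	}
	c.stack = c.stack[:len(c.stack)-1]
	return true
}

func main() {
	root := &node{kind: "source_file", children: []*node{
		{kind: "package_clause"},
		{kind: "function_declaration", children: []*node{
			{kind: "identifier"},
			{kind: "block"},
		}},
	}}
	c := newCursor(root)
	c.gotoFirstChild()  // package_clause
	c.gotoNextSibling() // function_declaration
	c.gotoFirstChild()  // identifier
	fmt.Println(c.current().kind, c.depth())
	c.gotoParent()
	fmt.Println(c.current().kind, c.depth())
}
```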

func NewTreeCursor ¶ added in v0.6.0

func NewTreeCursor(node *Node, tree *Tree) *TreeCursor

NewTreeCursor creates a cursor starting at the given node. The optional tree reference enables field name resolution and text extraction.

func NewTreeCursorFromTree ¶ added in v0.6.0

func NewTreeCursorFromTree(tree *Tree) *TreeCursor

NewTreeCursorFromTree creates a cursor starting at the tree's root node.

func (*TreeCursor) Copy ¶ added in v0.6.0

func (c *TreeCursor) Copy() *TreeCursor

Copy returns an independent copy of the cursor. The copy shares the same tree reference but has its own navigation stack.

func (*TreeCursor) CurrentFieldID ¶ added in v0.6.0

func (c *TreeCursor) CurrentFieldID() FieldID

CurrentFieldID returns the field ID of the current node within its parent. Returns 0 if the cursor is at the root or the node has no field assignment.

func (*TreeCursor) CurrentFieldName ¶ added in v0.6.0

func (c *TreeCursor) CurrentFieldName() string

CurrentFieldName returns the field name of the current node within its parent. Returns "" if no tree is associated, the cursor is at the root, or the node has no field assignment.

func (*TreeCursor) CurrentNode ¶ added in v0.6.0

func (c *TreeCursor) CurrentNode() *Node

CurrentNode returns the node the cursor is currently pointing to.

func (*TreeCursor) CurrentNodeIsNamed ¶ added in v0.6.0

func (c *TreeCursor) CurrentNodeIsNamed() bool

CurrentNodeIsNamed returns whether the current node is a named node.

func (*TreeCursor) CurrentNodeText ¶ added in v0.6.0

func (c *TreeCursor) CurrentNodeText() string

CurrentNodeText returns the source text of the current node. Requires a tree with source to be associated.

func (*TreeCursor) CurrentNodeType ¶ added in v0.6.0

func (c *TreeCursor) CurrentNodeType() string

CurrentNodeType returns the type name of the current node. Requires a tree with a language to be associated.

func (*TreeCursor) Depth ¶ added in v0.6.0

func (c *TreeCursor) Depth() int

Depth returns the cursor's current depth (0 at the root).

func (*TreeCursor) GotoChildByFieldID ¶ added in v0.6.0

func (c *TreeCursor) GotoChildByFieldID(fid FieldID) bool

GotoChildByFieldID moves the cursor to the first child with the given field ID. Returns false if no child has that field.

func (*TreeCursor) GotoChildByFieldName ¶ added in v0.6.0

func (c *TreeCursor) GotoChildByFieldName(name string) bool

GotoChildByFieldName moves the cursor to the first child with the given field name. Returns false if the tree has no language, the field name is unknown, or no child has that field.

func (*TreeCursor) GotoFirstChild ¶ added in v0.6.0

func (c *TreeCursor) GotoFirstChild() bool

GotoFirstChild moves the cursor to the first child of the current node. Returns false if the current node has no children.

func (*TreeCursor) GotoFirstChildForByte ¶ added in v0.6.0

func (c *TreeCursor) GotoFirstChildForByte(targetByte uint32) int64

GotoFirstChildForByte moves the cursor to the first child whose byte range contains targetByte (i.e., first child where endByte > targetByte). Returns the child index, or -1 when no child contains the byte.
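The selection rule (first child where endByte > targetByte) can be sketched as a linear scan. The span type and firstChildForByte function are hypothetical; the sketch only computes the index and does not model the cursor move itself.

```go
package main

import "fmt"

// span is a hypothetical byte range for a child node.
type span struct{ start, end uint32 }

// firstChildForByte returns the index of the first child whose end byte is
// greater than target, or -1 if no child qualifies.
func firstChildForByte(children []span, target uint32) int64 {
	for i, ch := range children {
		if ch.end > target {
			return int64(i)
		}
	}
	return -1
}

func main() {
	children := []span{{0, 7}, {8, 12}, {13, 20}}
	fmt.Println(firstChildForByte(children, 3))  // inside the first child
	fmt.Println(firstChildForByte(children, 7))  // in the gap after the first child
	fmt.Println(firstChildForByte(children, 20)) // past the last child
}
```

Under this rule a target that falls in a gap between children selects the next child, since that child is the first one ending after the target.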

func (*TreeCursor) GotoFirstChildForPoint ¶ added in v0.6.0

func (c *TreeCursor) GotoFirstChildForPoint(targetPoint Point) int64

GotoFirstChildForPoint moves the cursor to the first child whose point range contains targetPoint (i.e., first child where endPoint > targetPoint). Returns the child index, or -1 when no child contains the point.

func (*TreeCursor) GotoFirstNamedChild ¶ added in v0.6.0

func (c *TreeCursor) GotoFirstNamedChild() bool

GotoFirstNamedChild moves the cursor to the first named child of the current node, skipping anonymous nodes. Returns false if no named child exists.

func (*TreeCursor) GotoLastChild ¶ added in v0.6.0

func (c *TreeCursor) GotoLastChild() bool

GotoLastChild moves the cursor to the last child of the current node. Returns false if the current node has no children.

func (*TreeCursor) GotoLastNamedChild ¶ added in v0.6.0

func (c *TreeCursor) GotoLastNamedChild() bool

GotoLastNamedChild moves the cursor to the last named child of the current node, skipping anonymous nodes. Returns false if no named child exists.

func (*TreeCursor) GotoNextNamedSibling ¶ added in v0.6.0

func (c *TreeCursor) GotoNextNamedSibling() bool

GotoNextNamedSibling moves the cursor to the next named sibling, skipping anonymous nodes. Returns false if no named sibling follows.

func (*TreeCursor) GotoNextSibling ¶ added in v0.6.0

func (c *TreeCursor) GotoNextSibling() bool

GotoNextSibling moves the cursor to the next sibling. Returns false if the cursor is at the root or the last sibling.

func (*TreeCursor) GotoParent ¶ added in v0.6.0

func (c *TreeCursor) GotoParent() bool

GotoParent moves the cursor to the parent of the current node. Returns false if the cursor is at the root.

func (*TreeCursor) GotoPrevNamedSibling ¶ added in v0.6.0

func (c *TreeCursor) GotoPrevNamedSibling() bool

GotoPrevNamedSibling moves the cursor to the previous named sibling, skipping anonymous nodes. Returns false if no named sibling precedes.

func (*TreeCursor) GotoPrevSibling ¶ added in v0.6.0

func (c *TreeCursor) GotoPrevSibling() bool

GotoPrevSibling moves the cursor to the previous sibling. Returns false if the cursor is at the root or the first sibling.

func (*TreeCursor) Reset ¶ added in v0.6.0

func (c *TreeCursor) Reset(node *Node)

Reset resets the cursor to a new root node, clearing the navigation stack.

func (*TreeCursor) ResetTree ¶ added in v0.6.0

func (c *TreeCursor) ResetTree(tree *Tree)

ResetTree resets the cursor to the root of a new tree.

type WalkAction ¶

type WalkAction int

WalkAction controls the tree walk behavior.

const (
	// WalkContinue continues the walk to children and siblings.
	WalkContinue WalkAction = iota
	// WalkSkipChildren skips the current node's children but continues to siblings.
	WalkSkipChildren
	// WalkStop terminates the walk entirely.
	WalkStop
)
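A minimal sketch of how a pre-order walk can honor these three actions. The node type, lowercase constants, and walk function are hypothetical and defined locally; the signature of the package's own walk helper is not shown in this section, so none is assumed.

```go
package main

import "fmt"

type walkAction int

const (
	walkContinue walkAction = iota
	walkSkipChildren
	walkStop
)

// node is a hypothetical minimal tree node.
type node struct {
	kind     string
	children []*node
}

// walk visits n and its descendants in pre-order. It returns false once the
// callback has asked to stop, so callers up the stack unwind immediately.
func walk(n *node, visit func(*node) walkAction) bool {
	switch visit(n) {
	case walkStop:
		return false
	case walkSkipChildren:
		return true // siblings of n are still visited by the caller
	}
	for _, ch := range n.children {
		if !walk(ch, visit) {
			return false
		}
	}
	return true
}

func main() {
	root := &node{kind: "source_file", children: []*node{
		{kind: "import_declaration", children: []*node{{kind: "string"}}},
		{kind: "function_declaration", children: []*node{{kind: "identifier"}, {kind: "block"}}},
	}}
	var visited []string
	walk(root, func(n *node) walkAction {
		visited = append(visited, n.kind)
		switch n.kind {
		case "import_declaration":
			return walkSkipChildren
		case "identifier":
			return walkStop
		}
		return walkContinue
	})
	fmt.Println(visited)
}
```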

Directories ¶

Path	Synopsis
cgo_harness module
cmd
  benchgate command
  benchmatrix command
  emit_grammargen_go_blob command	emit_grammargen_go_blob rebuilds grammars/grammar_blobs/go.bin using our own grammargen (pure-Go LALR(1) + LR(1) state splitting) rather than the ts2go pipeline.
  gen_linguist command	Command gen_linguist generates grammars/linguist_gen.go by matching gotreesitter grammar names to GitHub Linguist's languages.yml.
  grammar_updater command	Command grammar_updater refreshes pinned grammar commits in grammars/languages.lock and emits a machine-readable update report.
  grammargen command	Command grammargen generates tree-sitter parser artifacts from grammar definitions.
  harnessgate command
  parity_report command
  perfprobe command
  probe_generic_support command
  ts2go command	Command ts2go reads a tree-sitter generated parser.c file and outputs a Go source file containing a function that returns a populated *gotreesitter.Language with all extracted parse tables.
  tsquery command	Command tsquery generates type-safe Go code from tree-sitter .scm query files.
grammargen	Package grammargen implements a pure-Go grammar generator for gotreesitter.
grammarlsp
grammars	Package grammars provides built-in and extension tree-sitter grammars with lazy loading.
grep	Package grep provides structural code search, match, and rewrite using tree-sitter parse trees.
wasm
  grammargen command
  runtime command
