gotreesitter

v0.20.2
Published: Jun 6, 2026 License: MIT Imports: 23 Imported by: 26

README

gotreesitter

Pure-Go tree-sitter runtime. No CGo, no C toolchain. Cross-compiles to any GOOS/GOARCH target Go supports, including wasip1.

go get github.com/odvcencio/gotreesitter

gotreesitter loads the same parse-table format that tree-sitter's C runtime uses. Grammar tables are extracted from upstream parser.c files by ts2go, compressed into binary blobs, and deserialized on first use. 206 grammars ship in the registry.

Agent Skill

Agents working with gotreesitter should use the using-gotreesitter skill.

Motivation

Every Go tree-sitter binding in the ecosystem depends on CGo:

  • Cross-compilation requires a C cross-toolchain per target. GOOS=wasip1, GOARCH=arm64 from a Linux host, or any Windows build without MSYS2/MinGW, will not link.
  • CI images must carry gcc and the grammar's C sources. go install fails for downstream users who don't have a C compiler.
  • The Go race detector, coverage instrumentation, and fuzzer cannot see across the CGo boundary. Bugs in the C runtime or in FFI marshaling are invisible to go test -race.

gotreesitter eliminates the C dependency entirely. The parser, lexer, query engine, incremental reparsing, arena allocator, external scanners, and tree cursor are all implemented in Go. The only input is the grammar blob.

Quick start

import (
    "fmt"

    "github.com/odvcencio/gotreesitter"
    "github.com/odvcencio/gotreesitter/grammars"
)

func main() {
    src := []byte(`package main

func main() {}
`)

    lang := grammars.GoLanguage()
    parser := gotreesitter.NewParser(lang)

    tree, _ := parser.Parse(src)
    fmt.Println(tree.RootNode())
}

grammars.DetectLanguage("main.go") resolves a filename to the appropriate LangEntry.

Queries

q, _ := gotreesitter.NewQuery(`(function_declaration name: (identifier) @fn)`, lang)
cursor := q.Exec(tree.RootNode(), lang, src)

for {
    match, ok := cursor.NextMatch()
    if !ok {
        break
    }
    for _, cap := range match.Captures {
        fmt.Println(cap.Node.Text(src))
    }
}

The query engine supports the full S-expression pattern language: structural quantifiers (?, *, +), alternation ([...]), field constraints, negated fields (!), anchors (.), and all standard predicates. See Query API.

Typed query codegen

Generate type-safe Go wrappers from .scm query files:

go run ./cmd/tsquery -input queries/go_functions.scm -lang go -output go_functions_query.go -package queries

Given a query like (function_declaration name: (identifier) @name body: (block) @body), tsquery generates:

type FunctionDeclarationMatch struct {
    Name *gotreesitter.Node
    Body *gotreesitter.Node
}

q, _ := queries.NewGoFunctionsQuery(lang)
cursor := q.Exec(tree.RootNode(), lang, src)
for {
    match, ok := cursor.Next()
    if !ok { break }
    fmt.Println(match.Name.Text(src))
}

Multi-pattern queries generate one struct per pattern with MatchPatternN conversion helpers.

Multi-language documents (injection parsing)

Parse documents with embedded languages (HTML+JS+CSS, Markdown+code fences, Vue/Svelte templates):

ip := gotreesitter.NewInjectionParser()
ip.RegisterLanguage("html", htmlLang)
ip.RegisterLanguage("javascript", jsLang)
ip.RegisterLanguage("css", cssLang)
ip.RegisterInjectionQuery("html", injectionQuery)

result, _ := ip.Parse(source, "html")

for _, inj := range result.Injections {
    fmt.Printf("%s: %d ranges\n", inj.Language, len(inj.Ranges))
    // inj.Tree is the child language's parse tree
}

Supports static (#set! injection.language "javascript") and dynamic (@injection.language capture) language detection, recursive nested injections, and incremental reparse with child tree reuse.

Source rewriting

Collect source-level edits and apply them atomically, producing InputEdit records for incremental reparse:

rw := gotreesitter.NewRewriter(src)
rw.Replace(funcNameNode, []byte("newName"))
rw.InsertBefore(bodyNode, []byte("// added\n"))
rw.Delete(unusedNode)

newSrc, _ := rw.ApplyToTree(tree)
newTree, _ := parser.ParseIncremental(newSrc, tree)

Apply() returns both the new source bytes and the []InputEdit records. ApplyToTree() is a convenience that calls tree.Edit() for each edit and returns source ready for ParseIncremental.

Incremental reparsing

tree, _ := parser.Parse(src)

// User types "x" at byte offset 42
src = append(src[:42], append([]byte("x"), src[42:]...)...)

tree.Edit(gotreesitter.InputEdit{
    StartByte:   42,
    OldEndByte:  42,
    NewEndByte:  43,
    StartPoint:  gotreesitter.Point{Row: 3, Column: 10},
    OldEndPoint: gotreesitter.Point{Row: 3, Column: 10},
    NewEndPoint: gotreesitter.Point{Row: 3, Column: 11},
})

tree2, _ := parser.ParseIncremental(src, tree)

ParseIncremental walks the old tree's spine, identifies the edit region, and reuses unchanged subtrees by reference. Only the invalidated span is re-lexed and re-parsed. Both leaf and non-leaf subtrees are eligible for reuse; non-leaf reuse is driven by pre-goto state tracking on interior nodes, so the parser can skip entire subtrees without re-deriving their contents.

When no edit has occurred, ParseIncremental detects the nil-edit on a pointer check and returns in single-digit nanoseconds with zero allocations.
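The edit record in the example above hard-codes its byte offsets and points. As a minimal sketch of how those values could be derived for an insertion, the following uses only the standard library. `Point` and `InputEdit` here are local stand-ins that mirror the field names shown above, `pointAt` and `insertAt` are hypothetical helpers, and columns are assumed to be counted in bytes as in upstream tree-sitter.

```go
package main

import (
	"bytes"
	"fmt"
)

// Point and InputEdit are local stand-ins mirroring the field names used in
// the README example; they are not the library's own types.
type Point struct{ Row, Column uint32 }

type InputEdit struct {
	StartByte, OldEndByte, NewEndByte    uint32
	StartPoint, OldEndPoint, NewEndPoint Point
}

// pointAt returns the zero-based row and byte column of offset within src.
func pointAt(src []byte, offset int) Point {
	prefix := src[:offset]
	row := bytes.Count(prefix, []byte{'\n'})
	lineStart := bytes.LastIndexByte(prefix, '\n') + 1
	return Point{Row: uint32(row), Column: uint32(offset - lineStart)}
}

// insertAt inserts text at offset and returns the new source together with
// an edit record describing the change.
func insertAt(src []byte, offset int, text []byte) ([]byte, InputEdit) {
	out := make([]byte, 0, len(src)+len(text))
	out = append(out, src[:offset]...)
	out = append(out, text...)
	out = append(out, src[offset:]...)

	start := pointAt(src, offset)
	edit := InputEdit{
		StartByte:   uint32(offset),
		OldEndByte:  uint32(offset), // pure insertion: nothing removed
		NewEndByte:  uint32(offset + len(text)),
		StartPoint:  start,
		OldEndPoint: start,
		NewEndPoint: pointAt(out, offset+len(text)),
	}
	return out, edit
}

func main() {
	src := []byte("package main\n\nfunc main() {}\n")
	out, edit := insertAt(src, 19, []byte("x"))
	fmt.Printf("%q\n%+v\n", out, edit)
}
```

Building the output in a fresh slice avoids aliasing the old source, which may still be referenced by the previous tree.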

UTF-16 input and editor coordinates

UTF-16 callers can parse Go-native code units or endian-specific byte buffers without converting offsets by hand. The parser core keeps its canonical UTF-8 view internally, while the returned tree retains the original UTF-16 source and maps nodes, edits, included ranges, query filters, highlights, tags, and injections back to UTF-16 code-unit coordinates.

src := utf16.Encode([]rune("1+2"))

parser := gotreesitter.NewParser(lang)
tree, _ := parser.ParseUTF16(src)

rng, _ := tree.UTF16RangeForNode(tree.RootNode())
fmt.Println(rng.StartCodeUnit, rng.EndCodeUnit)

node := tree.DescendantForUTF16Range(0, uint32(len(src)))
_ = node

// Incremental edits can be described in UTF-16 code units.
next := utf16.Encode([]rune("1+3"))
tree.EditUTF16(gotreesitter.UTF16Edit{
    StartCodeUnit:  2,
    OldEndCodeUnit: 3,
    NewEndCodeUnit: 3,
}, next)
tree2, _ := parser.ParseIncrementalUTF16(next, tree)
_ = tree2

UTF-16 byte input is explicit about byte order:

tree, _ := parser.ParseUTF16Bytes(buf, gotreesitter.UTF16LittleEndian)

Editor-facing APIs have UTF-16 variants:

q, _ := gotreesitter.NewQuery(`(NUMBER) @number`, lang)
cursor := q.Exec(tree.RootNode(), lang, tree.Source())
cursor.SetUTF16Range(tree, 2, 3)

hl, _ := gotreesitter.NewHighlighter(lang, `(NUMBER) @number`)
highlightRanges := hl.HighlightUTF16(src)

tagger, _ := gotreesitter.NewTagger(lang, `(NUMBER) @name @definition.number`)
tags := tagger.TagUTF16(src)

Node byte APIs such as DescendantForByteRange still use the tree's canonical UTF-8 byte offsets. Use DescendantForUTF16Range or convert with UTF8ByteForUTF16Offset when starting from editor UTF-16 offsets.

Tree cursor

TreeCursor maintains an explicit (node, childIndex) frame stack. Parent, child, and sibling movement are O(1) with zero allocations — sibling traversal indexes directly into the parent's children[] slice.

c := gotreesitter.NewTreeCursorFromTree(tree)

c.GotoFirstChild()
c.GotoChildByFieldName("body")

for ok := c.GotoFirstNamedChild(); ok; ok = c.GotoNextNamedSibling() {
    fmt.Printf("%s at %d\n", c.CurrentNodeType(), c.CurrentNode().StartByte())
}

idx := c.GotoFirstChildForByte(128)

Movement methods: GotoFirstChild, GotoLastChild, GotoNextSibling, GotoPrevSibling, GotoParent, named-only variants (GotoFirstNamedChild, etc.), field-based (GotoChildByFieldName, GotoChildByFieldID), and position-based (GotoFirstChildForByte, GotoFirstChildForPoint).

Cursors hold direct pointers into tree nodes. Recreate after Tree.Release(), Tree.Edit(...), or incremental reparse.
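The frame-stack idea can be sketched on a toy tree. This is an illustration under assumptions, not the library's TreeCursor: `node`, `frame`, and `cursor` are hypothetical types, and only three movement methods are shown. Because the stack stores raw node pointers, it also shows why a cursor becomes stale once the underlying tree changes.

```go
package main

import "fmt"

// node is a toy tree node used only for this sketch.
type node struct {
	kind     string
	children []*node
}

// frame records a node and its index within its parent's children slice.
type frame struct {
	n     *node
	index int
}

// cursor keeps the path from the root to the current node as a stack.
type cursor struct{ stack []frame }

func newCursor(root *node) *cursor {
	return &cursor{stack: []frame{{n: root}}}
}

func (c *cursor) current() *node { return c.stack[len(c.stack)-1].n }

// gotoFirstChild pushes a frame for the first child, if there is one.
func (c *cursor) gotoFirstChild() bool {
	cur := c.current()
	if len(cur.children) == 0 {
		return false
	}
	c.stack = append(c.stack, frame{n: cur.children[0], index: 0})
	return true
}

// gotoNextSibling indexes directly into the parent's children slice.
func (c *cursor) gotoNextSibling() bool {
	if len(c.stack) < 2 {
		return false // the root has no siblings
	}
	top := &c.stack[len(c.stack)-1]
	parent := c.stack[len(c.stack)-2].n
	if top.index+1 >= len(parent.children) {
		return false
	}
	top.index++
	top.n = parent.children[top.index]
	return true
}

// gotoParent pops the top frame.
func (c *cursor) gotoParent() bool {
	if len(c.stack) < 2 {
		return false
	}
	c.stack = c.stack[:len(c.stack)-1]
	return true
}

func main() {
	root := &node{kind: "source_file", children: []*node{
		{kind: "package_clause"},
		{kind: "function_declaration"},
	}}
	c := newCursor(root)
	for ok := c.gotoFirstChild(); ok; ok = c.gotoNextSibling() {
		fmt.Println(c.current().kind)
	}
}
```

In this sketch the stack may grow on descent, so it is only allocation-free once the slice has reached the tree's depth.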

Highlighting

hl, _ := gotreesitter.NewHighlighter(lang, highlightQuery)
ranges := hl.Highlight(src)

for _, r := range ranges {
    fmt.Printf("%s: %q\n", r.Capture, src[r.StartByte:r.EndByte])
}

Tagging

entry := grammars.DetectLanguage("main.go")
lang := entry.Language()

tagger, _ := gotreesitter.NewTagger(lang, entry.TagsQuery)
tags := tagger.Tag(src)

for _, tag := range tags {
    fmt.Printf("%s %s at %d:%d\n", tag.Kind, tag.Name,
        tag.NameRange.StartPoint.Row, tag.NameRange.StartPoint.Column)
}

Benchmarks

All measurements below use the same workload: a generated Go source file with 500 functions (19294 bytes). Numbers are medians from 10 runs on:

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) Ultra 9 285

| Runtime | Full parse | Incremental (1-byte edit) | Incremental (no edit) |
| --- | --- | --- | --- |
| Native C (pure C runtime) | 1.76 ms | 102.3 μs | 101.7 μs |
| CGo binding (C runtime via cgo) | ~2.0 ms | ~130 μs | |
| gotreesitter (pure Go) | 1.54 ms | 649 ns | 2.43 ns |

On this workload:

  • Full parse is faster than both listed C baselines: ~1.15x faster than native C and ~1.29x faster than the CGo binding.
  • Incremental single-byte edits are ~158x faster than native C (~200x faster than CGo).
  • No-edit reparses are ~41,800x faster than native C, zero allocations.
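The ratios quoted above follow directly from the median ns/op figures reported for this workload. A small sketch of the arithmetic, where `speedup` is a helper introduced here for illustration:

```go
package main

import "fmt"

// speedup reports how many times faster the candidate is than the baseline,
// given both timings in the same unit.
func speedup(baselineNs, candidateNs float64) float64 {
	return baselineNs / candidateNs
}

func main() {
	// Median ns/op values as reported in this README.
	const (
		nativeFull = 1764436.0
		cgoFull    = 1990000.0 // approximate in the source
		goFull     = 1538089.0

		nativeIncr = 102336.0
		cgoIncr    = 130000.0 // approximate in the source
		goIncr     = 648.9

		nativeNoEdit = 101740.0
		goNoEdit     = 2.432
	)

	fmt.Printf("full parse vs native C:  %.2fx\n", speedup(nativeFull, goFull))
	fmt.Printf("full parse vs CGo:       %.2fx\n", speedup(cgoFull, goFull))
	fmt.Printf("1-byte edit vs native C: %.0fx\n", speedup(nativeIncr, goIncr))
	fmt.Printf("1-byte edit vs CGo:      %.0fx\n", speedup(cgoIncr, goIncr))
	fmt.Printf("no edit vs native C:     %.0fx\n", speedup(nativeNoEdit, goNoEdit))
}
```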

Raw benchmark output

# Pure Go (this repo):
GOMAXPROCS=1 go test . -run '^$' \
  -bench 'BenchmarkGoParseFullDFA|BenchmarkGoParseIncrementalSingleByteEditDFA|BenchmarkGoParseIncrementalNoEditDFA' \
  -benchmem -count=10 -benchtime=750ms

# CGo binding benchmarks:
cd cgo_harness
GOMAXPROCS=1 go test . -run '^$' -tags treesitter_c_bench \
  -bench 'BenchmarkCTreeSitterGoParseFull|BenchmarkCTreeSitterGoParseIncrementalSingleByteEdit|BenchmarkCTreeSitterGoParseIncrementalNoEdit' \
  -benchmem -count=10 -benchtime=750ms

# Native C benchmarks (no Go, direct C binary):
./pure_c/run_go_benchmark.sh 500 2000 20000

| Benchmark | Median ns/op | B/op | allocs/op |
| --- | --- | --- | --- |
| Native C full parse | 1,764,436 | | |
| Native C incremental (1-byte edit) | 102,336 | | |
| Native C incremental (no edit) | 101,740 | | |
| CTreeSitterGoParseFull | ~1,990,000 | 600 | 6 |
| CTreeSitterGoParseIncrementalSingleByteEdit | ~130,000 | 648 | 7 |
| GoParseFullDFA | 1,538,089 | 728 | 7 |
| GoParseIncrementalSingleByteEditDFA | 648.9 | 176 | 3 |
| GoParseIncrementalNoEditDFA | 2.432 | 0 | 0 |

Benchmark matrix

For repeatable multi-workload tracking:

go run ./cmd/benchmatrix --count 10

Emits bench_out/matrix.json (machine-readable), bench_out/matrix.md (summary), and raw logs under bench_out/raw/. The default matrix includes a bounded, warmed language-family full-parse group, reported with MB/s so parser throughput can be compared across generated source sizes. Use --only-family to isolate that group, --family-unit-count to scale it, or --no-family for the narrower Go/editor matrix.

Supported languages

206 grammars ship in the registry. All 206 produce error-free parse trees on smoke samples. Run go run ./cmd/parity_report for current status.

  • 116 external scanners (hand-written Go implementations of upstream C scanners)
  • 7 hand-written Go token sources (authzed, c, cpp, go, java, json, lua)
  • Remaining languages use the DFA lexer generated from grammar tables

Parse quality

Each LangEntry carries a Quality field:

| Quality | Meaning |
| --- | --- |
| full | All scanner and lexer components present. Parser has full access to the grammar. |
| partial | Missing external scanner. DFA lexer handles what it can; external tokens are skipped. |
| none | Cannot parse. |

full means the parser has every component the grammar requires. It does not guarantee error-free trees on all inputs — grammars with high GLR ambiguity may produce syntax errors on very large or deeply nested constructs due to parser safety limits (iteration cap, stack depth cap, node count cap). These limits scale with input size. Check tree.RootNode().HasError() at runtime.

Full language list (206)

ada, agda, angular, apex, arduino, asm, astro, authzed, awk, bash, bass, beancount, bibtex, bicep, bitbake, blade, brightscript, c, c_sharp, caddy, cairo, capnp, chatito, circom, clojure, cmake, cobol, comment, commonlisp, cooklang, corn, cpon, cpp, crystal, css, csv, cuda, cue, cylc, d, dart, desktop, devicetree, dhall, diff, disassembly, djot, dockerfile, dot, doxygen, dtd, earthfile, ebnf, editorconfig, eds, eex, elisp, elixir, elm, elsa, embedded_template, enforce, erlang, facility, faust, fennel, fidl, firrtl, fish, foam, forth, fortran, fsharp, gdscript, git_config, git_rebase, gitattributes, gitcommit, gitignore, gleam, glsl, gn, go, godot_resource, gomod, graphql, groovy, hack, hare, haskell, haxe, hcl, heex, hlsl, html, http, hurl, hyprlang, ini, janet, java, javascript, jinja2, jq, jsdoc, json, json5, jsonnet, julia, just, kconfig, kdl, kotlin, ledger, less, linkerscript, liquid, llvm, lua, luau, make, markdown, markdown_inline, matlab, mermaid, meson, mojo, move, nginx, nickel, nim, ninja, nix, norg, nushell, objc, ocaml, odin, org, pascal, pem, perl, php, pkl, powershell, prisma, prolog, promql, properties, proto, pug, puppet, purescript, python, ql, r, racket, regex, rego, requirements, rescript, robot, ron, rst, ruby, rust, scala, scheme, scss, smithy, solidity, sparql, sql, squirrel, ssh_config, starlark, svelte, swift, tablegen, tcl, teal, templ, textproto, thrift, tlaplus, tmux, todotxt, toml, tsx, turtle, twig, typescript, typst, uxntal, v, verilog, vhdl, vimdoc, vue, wat, wgsl, wolfram, xml, yaml, yuck, zig

Query API

| Feature | Status |
| --- | --- |
| Compile + execute (NewQuery, Execute, ExecuteNode) | supported |
| Cursor streaming (Exec, NextMatch, NextCapture) | supported |
| Structural quantifiers (?, *, +) | supported |
| Alternation ([...]) | supported |
| Field matching (name: (identifier)) | supported |
| #eq? / #not-eq? | supported |
| #match? / #not-match? | supported |
| #any-of? / #not-any-of? | supported |
| #lua-match? | supported |
| #has-ancestor? / #not-has-ancestor? | supported |
| #has-parent? / #not-has-parent? | supported |
| #is? / #is-not? | supported |
| #any-eq? / #any-not-eq? | supported |
| #any-match? / #any-not-match? | supported |
| #select-adjacent! | supported |
| #strip! | supported |
| #set! / #offset! directives | parsed and accepted |
| SetValues (read #set! metadata from matches) | supported |

All shipped highlight and tags queries compile (156/156 highlight, 69/69 tags).

Known limitations

  • Full-parse throughput: the 500-function Go benchmark is now faster than the listed C baselines, but full-parse throughput still varies by grammar and corpus shape. Highly ambiguous languages and very large generated files remain the main parity/performance frontier.
  • GLR safety caps: The parser enforces iteration, stack depth, and node count limits proportional to input size. These prevent pathological blowup on grammars with high ambiguity but impose a ceiling on the maximum input complexity that parses without error. The caps are tunable but not removable without risking unbounded resource consumption.

Adding a language

  1. Add the grammar repo to grammars/languages.manifest
  2. Refresh pinned refs in grammars/languages.lock: go run ./cmd/grammar_updater -lock grammars/languages.lock -write -report grammars/grammar_updates.json
  3. Generate tables: go run ./cmd/ts2go -manifest grammars/languages.manifest -outdir ./grammars -package grammars -compact=true
  4. Add smoke samples to cmd/parity_report/main.go and grammars/parse_support_test.go
  5. Verify: go run ./cmd/parity_report && go test ./grammars/...

Grammar lock updates

  • grammars/languages.lock stores pinned refs for grammar update + parity automation.
  • cmd/grammar_updater refreshes refs and emits a machine-readable report.
  • .github/workflows/grammar-lock-update.yml opens scheduled/dispatch update PRs.
  • Hand-written scanner ports can also declare ExternalScannerSpec metadata with upstream source hashes and external-token names. When a grammar update changes src/scanner.c or the external-token list, treat it as scanner work: update the Go scanner binding/port before replacing generated blobs. Grammar JSON-only changes with unchanged externals can usually follow the normal grammar.json -> grammargen Go DSL -> blob -> parity path.

Manual refresh:

go run ./cmd/grammar_updater \
  -lock grammars/languages.lock \
  -allow-list grammars/update_tier1_core100.txt \
  -max-updates 10 \
  -write \
  -report grammars/grammar_updates.json

Architecture

gotreesitter is a ground-up reimplementation of the tree-sitter runtime in Go. No code is shared with or translated from the C implementation.

Parser — Table-driven LR(1) with GLR fallback. When a (state, symbol) pair maps to multiple actions in the parse table, the parser forks the stack and explores all alternatives in parallel. Stack merging collapses equivalent paths. Safety limits (iteration count, stack depth, node count) scale with input size and prevent runaway exploration on ambiguous grammars.

Incremental engine — Walks the edit region of the previous tree and reuses unchanged subtrees by reference. Non-leaf subtree reuse is enabled by storing a pre-goto parser state on each interior node, allowing the parser to skip an entire subtree and resume in the correct state without re-deriving its contents. External scanner state is serialized on each node boundary so scanner-dependent subtrees can be reused without replaying the scanner from the start.

Lexer — Two paths. A DFA lexer is generated from the grammar's lex tables by ts2go and handles the majority of languages. For grammars where the DFA is insufficient (e.g., Go's automatic semicolons, YAML's indentation-sensitive structure), hand-written Go token sources implement the TokenSource interface directly.

External scanners — 116 grammars require external scanners for context-sensitive tokens (Python indentation, HTML implicit close tags, Rust raw string delimiters, Swift operator disambiguation, etc.). Each scanner is a hand-written Go implementation of the grammar's ExternalScanner interface: Create, Serialize, Deserialize, Scan. Scanner state is snapshotted after every token and stored on tree nodes so incremental reuse can restore scanner state on skip.
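The snapshot-and-restore cycle described for external scanners can be illustrated with a toy indentation scanner. This is a sketch under assumptions: the method names Serialize and Deserialize come from the text above, but the signatures, the `indentScanner` type, and the byte encoding are invented for illustration and may not match the library's interface.

```go
package main

import (
	"errors"
	"fmt"
)

// indentScanner is a toy stand-in for an indentation-sensitive external
// scanner. Its only state is the stack of open indentation widths.
type indentScanner struct{ indents []uint16 }

// Serialize encodes the indent stack as little-endian uint16 values.
func (s *indentScanner) Serialize() []byte {
	buf := make([]byte, 0, 2*len(s.indents))
	for _, w := range s.indents {
		buf = append(buf, byte(w), byte(w>>8))
	}
	return buf
}

// Deserialize replaces the state with a previously serialized snapshot.
// The state is left untouched if the snapshot is malformed.
func (s *indentScanner) Deserialize(buf []byte) error {
	if len(buf)%2 != 0 {
		return errors.New("indentScanner: truncated snapshot")
	}
	s.indents = s.indents[:0]
	for i := 0; i < len(buf); i += 2 {
		s.indents = append(s.indents, uint16(buf[i])|uint16(buf[i+1])<<8)
	}
	return nil
}

func main() {
	s := &indentScanner{indents: []uint16{0, 4, 8}}
	snapshot := s.Serialize() // taken at a token boundary

	s.indents = append(s.indents, 12) // scanning continues and state changes

	// Skipping over a reused subtree: restore the state stored with it.
	if err := s.Deserialize(snapshot); err != nil {
		panic(err)
	}
	fmt.Println(s.indents)
}
```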

Arena allocator — Nodes are allocated from slab-based arenas to reduce GC pressure. Arenas are released in bulk when a tree is freed.
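A minimal sketch of slab-based allocation, assuming a fixed slab size and a simplified node layout. The `arena` and `node` types below are illustrative only and are not the library's internal structures.

```go
package main

import "fmt"

// node is a simplified stand-in for a syntax tree node.
type node struct {
	symbol     uint16
	start, end uint32
}

// arena hands out nodes from fixed-capacity slabs so that many nodes share
// one heap allocation instead of being allocated individually.
type arena struct {
	slabSize int
	slabs    [][]node
}

func newArena(slabSize int) *arena {
	if slabSize < 1 {
		slabSize = 1
	}
	return &arena{slabSize: slabSize}
}

// alloc returns a pointer to a zeroed node in the current slab and starts a
// new slab when the current one is full. Appending within capacity never
// moves the slab, so pointers handed out earlier stay valid.
func (a *arena) alloc() *node {
	if n := len(a.slabs); n == 0 || len(a.slabs[n-1]) == cap(a.slabs[n-1]) {
		a.slabs = append(a.slabs, make([]node, 0, a.slabSize))
	}
	last := &a.slabs[len(a.slabs)-1]
	*last = append(*last, node{})
	return &(*last)[len(*last)-1]
}

// release drops every slab at once. Nodes obtained from alloc must not be
// used afterwards.
func (a *arena) release() { a.slabs = nil }

func main() {
	a := newArena(1024)
	for i := 0; i < 3000; i++ {
		n := a.alloc()
		n.start, n.end = uint32(i), uint32(i+1)
	}
	fmt.Println("slabs in use:", len(a.slabs))
	a.release()
}
```

The garbage collector then tracks a handful of slabs rather than thousands of individual nodes, which is the pressure reduction the paragraph above refers to.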

Query engine — S-expression pattern compiler with predicate evaluation and streaming cursor iteration. Supports all standard tree-sitter predicates (#eq?, #match?, #any-of?, #has-ancestor?, etc.) and directive annotations (#set!, #offset!, #select-adjacent!, #strip!).

Injection parser — Orchestrates multi-language parsing. Runs injection queries against a parent tree to find embedded regions, spawns child parsers with SetIncludedRanges(), and recurses for nested injections. Incremental reparse reuses unchanged child trees.

Rewriter — Collects source-level edits (replace, insert, delete) targeting byte ranges, applies them atomically, and produces InputEdit records for incremental reparse. Edits are validated for non-overlap and applied in a single pass.
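The validate-then-apply step can be sketched with plain byte ranges. This is an illustration under assumptions rather than the library's Rewriter: the `edit` type, the `apply` function, and both error values are hypothetical, and the sketch omits the InputEdit records.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// edit replaces src[start:end] with text. Insertions have start == end and
// deletions have empty text.
type edit struct {
	start, end int
	text       []byte
}

var (
	errRange   = errors.New("rewrite: edit out of range")
	errOverlap = errors.New("rewrite: overlapping edits")
)

// apply checks that the edits are in range and do not overlap, then builds
// the new source in a single left-to-right pass. The input is not modified.
func apply(src []byte, edits []edit) ([]byte, error) {
	sorted := append([]edit(nil), edits...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	var out []byte
	pos := 0 // first byte of src not yet copied or replaced
	for _, e := range sorted {
		if e.start < 0 || e.end < e.start || e.end > len(src) {
			return nil, errRange
		}
		if e.start < pos {
			return nil, errOverlap
		}
		out = append(out, src[pos:e.start]...)
		out = append(out, e.text...)
		pos = e.end
	}
	return append(out, src[pos:]...), nil
}

func main() {
	src := []byte("func old() { x }")
	out, err := apply(src, []edit{
		{start: 12, end: 14},                     // delete " x"
		{start: 5, end: 8, text: []byte("new")}, // rename
	})
	fmt.Printf("%q %v\n", out, err)
}
```

Because nothing is written until every edit has been validated against the sorted order, a rejected batch leaves the original source untouched.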

Grammar loading — ts2go extracts parse tables, lex tables, field maps, symbol metadata, and external token lists from upstream parser.c files. These are serialized to compressed binary blobs under grammars/grammar_blobs/ and lazy-loaded via loadEmbeddedLanguage() with an LRU cache. String and transition interning reduce memory footprint across loaded grammars. Grammargen-backed blobs use the same CLI surface; for example, the Go blob can be regenerated with go run ./cmd/grammargen -lr-split -bin grammars/grammar_blobs/go.bin go.
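Lazy loading behind an LRU cache can be sketched with the standard library's container/list. The `lruCache` type and its `load` callback are hypothetical and stand in for blob decompression. The library's actual cache, including its idle TTL and interning behavior, is not reproduced here.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is one cached item, keyed by blob name.
type entry struct {
	name string
	blob []byte
}

// lruCache loads values on first use and evicts the least recently used
// entry once more than limit entries are held.
type lruCache struct {
	limit int
	order *list.List               // front = most recently used
	items map[string]*list.Element // name -> element holding *entry
	load  func(name string) []byte // stand-in for decompressing a blob
}

func newLRU(limit int, load func(string) []byte) *lruCache {
	return &lruCache{
		limit: limit,
		order: list.New(),
		items: make(map[string]*list.Element),
		load:  load,
	}
}

// get returns the cached value, loading it first if necessary.
func (c *lruCache) get(name string) []byte {
	if el, ok := c.items[name]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).blob
	}
	e := &entry{name: name, blob: c.load(name)}
	c.items[name] = c.order.PushFront(e)
	if c.order.Len() > c.limit {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).name)
	}
	return e.blob
}

func main() {
	loads := 0
	cache := newLRU(2, func(name string) []byte {
		loads++
		return []byte("tables for " + name)
	})
	for _, name := range []string{"go.bin", "json.bin", "go.bin", "rust.bin", "json.bin"} {
		cache.get(name)
	}
	fmt.Println("loads:", loads, "cached:", len(cache.items))
}
```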

Build tags and environment

External grammar blobs (avoid embedding in the binary):

go build -tags grammar_blobs_external
GOTREESITTER_GRAMMAR_BLOB_DIR=/path/to/blobs  # required
GOTREESITTER_GRAMMAR_BLOB_MMAP=false           # disable mmap (Unix only)

Curated language set (smaller binary):

go build -tags grammar_set_core  # curated Core100 embedded grammar set
GOTREESITTER_GRAMMAR_SET=go,json,python  # runtime restriction

Selective embedded grammars (smallest self-contained binary — pick exactly the languages you ship):

# Embeds ONLY go.bin + java.bin into the binary (everything else is dropped at
# link time). No GOTREESITTER_GRAMMAR_BLOB_DIR needed — still a single static binary.
go build -tags 'grammar_subset grammar_subset_go grammar_subset_java'

Add one grammar_subset_<lang> tag per grammar you need (names match the blob file: grammar_subset_c_sharp, grammar_subset_python, …). A single-language build drops from ~24MB to a few MB. This is finer-grained than grammar_set_core (a fixed set) and, unlike grammar_blobs_external, keeps the blobs embedded. Pairing grammar_subset with grammar_blobs_external instead loads the selected blobs from GOTREESITTER_GRAMMAR_BLOB_DIR at runtime (no embedded blobs at all).

The four embedding modes are mutually exclusive at the build-tag level: default (all embedded) · grammar_set_core (Core100 embedded) · grammar_subset + grammar_subset_<lang> (selected embedded) · grammar_blobs_external (none embedded). Regenerate the per-language embed files after adding a grammar with go run ./cmd/gen_subset_blob_embeds.

Grammar cache tuning (long-lived processes):

grammars.SetEmbeddedLanguageCacheLimit(8)    // LRU cap
grammars.UnloadEmbeddedLanguage("rust.bin")  // drop one
grammars.PurgeEmbeddedLanguageCache()        // drop all
GOTREESITTER_GRAMMAR_CACHE_LIMIT=8       # LRU cap via env
GOTREESITTER_GRAMMAR_IDLE_TTL=5m         # evict after idle
GOTREESITTER_GRAMMAR_IDLE_SWEEP=30s      # sweep interval
GOTREESITTER_GRAMMAR_COMPACT=true        # loader compaction (default)
GOTREESITTER_GRAMMAR_STRING_INTERN_LIMIT=200000
GOTREESITTER_GRAMMAR_TRANSITION_INTERN_LIMIT=20000

GLR stack cap override:

GOT_GLR_MAX_STACKS=8  # overrides default GLR stack cap (default: 8)

Default is tuned for correctness. Increase only if a grammar/workload needs more GLR alternatives to preserve parity.

Legacy benchmark compatibility only:

GOT_PARSE_NODE_LIMIT_SCALE=3

GOT_PARSE_NODE_LIMIT_SCALE is only needed for comparisons against older truncation-prone benchmark baselines. On current branches, keep it unset.

Testing

bash cgo_harness/docker/run_single_grammar_parity.sh typescript

For local correctness/parity work, prefer isolated one-language Docker runs:

# Real-corpus parity for one grammar
bash cgo_harness/docker/run_single_grammar_parity.sh typescript

# Focused grammargen real-corpus lane for one language
bash cgo_harness/docker/run_grammargen_focus_targets.sh --mode real-corpus --langs typescript

# Focused grammargen-vs-C lane for one language
bash cgo_harness/docker/run_grammargen_focus_targets.sh --mode cgo --langs typescript

run_grammargen_focus_targets.sh is the safest local lane for high-value grammars: it runs one grammar per container and defaults to a single-worker profile (--cpus 1, --pids 512, GOMAXPROCS=1, GOFLAGS=-p=1).

For Fortran, both real-corpus runners also default to a tighter bounded local preset unless you explicitly override it or pass --unsafe-fortran-defaults: --memory 3g, --cpus 1, --pids 512, GOMAXPROCS=1, GOFLAGS=-p=1, GOT_LALR_LR0_CORE_BUDGET=160000000, and GTS_GRAMMARGEN_REAL_CORPUS_GENERATE_TIMEOUT=15m.

If you only need a fast package-local regression check, keep it in Docker and narrow the -run regex:

bash cgo_harness/docker/run_parity_in_docker.sh \
  -- "cd /workspace && go test ./grammargen -run '^TestTypeScriptConditionalTypeParity$' -count=1"

Avoid go test ./... and host-side multi-language or race sweeps on developer machines while chasing OOMs. Use CI or a dedicated container when broader race coverage is required.

Other focused correctness/parity commands:

# Top-50 smoke correctness for the grammars package only
bash cgo_harness/docker/run_parity_in_docker.sh \
  -- "cd /workspace && go test ./grammars -run '^TestTop50(ParseSmokeNoErrors|CorrectnessListMatchesLockFile)$' -count=1 -v"

# Top-50 grammargen import/parity registry coverage
bash cgo_harness/docker/run_parity_in_docker.sh \
  -- "cd /workspace && go test ./grammargen -run '^TestTop50GrammarImportParityCoverage$' -count=1 -v"

# C-oracle parity suites inside the cgo harness
bash cgo_harness/docker/run_parity_in_docker.sh \
  --run '^TestParityFreshParse$|^TestParityHasNoErrors$|^TestParityIssue3Repros$|^TestParityGLRCanaryGo$'
bash cgo_harness/docker/run_parity_in_docker.sh \
  --run '^TestParityCorpusFreshParse$'

CI may still run broader race coverage on hosted runners. Do not copy those commands onto a developer host during OOM diagnosis.

Test suite covers: smoke tests (206 grammars), golden S-expression snapshots, highlight query validation, query pattern matching, incremental reparse correctness, error recovery, GLR fork/merge, injection parsing, source rewriting, and fuzz targets.

Roadmap

v0.19.x — GLR materialization, query parity, and parser hot-path release. Compact/lazy final child refs now survive parser result assembly and public tree operations, so queries, cursors, edits, descendant lookup, and traversal can avoid broad eager materialization. Nested repeated query patterns now preserve tree-sitter-compatible match rows, including downstream Kotlin Orion queries such as source_file -> import_list -> import_header. The release also adds reduce-chain hints, GLR/action/result timing attribution, parse-gap reporting, and full-parse scratch tuning while restoring compatibility shapes for Go, JavaScript/TypeScript, Python, Rust, C, and Java edge cases.

v0.18.x — Cold dependency extraction and parser materialization diagnostics release. Adds language-neutral import extraction APIs, source-vs-tree import parity fixtures, cgo_harness/cmd/import_replay, Python materialization benchmarks, and parser runtime attribution for arena usage, checkpoint storage, reduction/transient storage, final tree materialization, normalization timing, and GLR collapse behavior. Hybrid source extraction now gives downstream dependency scanners a fast path with structured fallback reporting.

v0.17.x — Java corpus parity and parser-performance release. Java now has bounded Docker corpus lanes for Apache Lucene, including largest-file, random, timeout-sweep, cgo comparison, no-tree diagnostic, UAX generated-file, ambiguity, materialization, traversal, and query/API-shape runs. Targeted Java lexer and GLR fixes close the correctness/timeout cliff for the sampled Lucene corpora, while deferred parent-link wiring and parser scratch reuse move Java full parses much closer to cgo. The release also expands grammargen top-50 parity coverage and fixes Bash, Python, Swift, comment, gomod, ini, CPON, D, PowerShell, Julia, and Java parity gaps.

v0.16.x — Grammar extensibility and parser-resilience release. Adds native UTF-16 parser/editor APIs, grammargen DSL constructors and extension smoke coverage for Kotlin, Swift, JavaScript, TypeScript/TSX, and Fortran, and grammar-update guardrails that block scanner-facing lock refreshes until regenerated artifacts and focused parity are handled. C# pathological recovery is bounded, TypeScript and Fortran grammargen parity advanced, Python f-string scanner checkpoints preserve interpolated-string state, and parser-result compatibility shims are isolated behind an explicit strut registry with language-owned helper files.

v0.15.x — Large-repo consumer safety and parser-maintenance release. ParsePolicy.ShouldSkipDir lets gateway callers prune generated/vendor directories before descent, the GLR node-equivalence cache is smaller and checks epoch first for L2-friendly lookups, Tree.Edit avoids scanning unchanged right-side siblings when there is no tail shift, and parser-result compatibility normalization now keeps language-specific call sequences beside the relevant parser_result_*.go helpers. The v0.15.1 patch also hardens arena release/GC behavior, releases retry loser arenas promptly, and fixes query predicate backtracking for nested Starlark dictionary matches. v0.15.2 folds the drifting main and release lines back together, adds a Swift ABI mangling grammar, and ships grammar_updater pin verification and manifest-only sync flags. v0.15.3 caps JavaScript/TypeScript full-parse merge survivors, tunes markdown retry and node budgets, tolerates external-scanner symbol-list drift, and adds a scoped Canopy harness runner for bounded repo analysis. This line carries the post-0.14 tier-1 grammar refreshes and reserved-word import fixes.

v0.14.x — Go grammar now shipped as a grammargen-compiled blob (our own pure-Go LR(1) state-splitting compiler), eliminating a dead-end state inherited from tree-sitter-go that wrapped several valid Go files in ERROR. Combined with arena retention/initial-sizing fixes, retry-lifecycle cleanup, and a GLR cap update keyed to the new grammar's conflict profile, warm-reuse heap allocation across a six-file self-parse benchmark dropped ~54% (498 → 229 MB/iter); cold-case dropped ~61%.

v0.12.x — 206 grammars (all OK), 116 external scanners, pure-Go runtime plus grammargen, ABI 15 support including reserved-word sets, GLR parser, incremental reparsing with external scanner checkpoints, query engine, tree cursor, highlighting, tagging, injection parser, typed query codegen, CST rewriter, parser pool, arena memory budgets, and structural parity against 100+ curated C reference grammars.

Next:

  • Retire parser-result struts by moving C#, Rust, Scala, TypeScript, and Python recovery into runtime or grammar generation paths
  • Grammar refresh automation that moves from lock-only PRs to regenerated artifacts and focused parity for allow-listed grammargen-backed languages
  • Table-size and codegen compaction work for Unicode-heavy grammars

Release history and retroactive notes are tracked in CHANGELOG.md.

License

MIT

Documentation

Overview

Package gotreesitter implements a pure Go tree-sitter runtime.

This file defines the core data structures that mirror tree-sitter's TSLanguage C struct and related types. They form the foundation on which the lexer, parser, query engine, and syntax tree are built.

Constants

const (
	// RuntimeLanguageVersion is the maximum tree-sitter language version this
	// runtime is known to support.
	RuntimeLanguageVersion uint32 = 15
	// MinCompatibleLanguageVersion is the minimum accepted language version.
	MinCompatibleLanguageVersion uint32 = 13
)

Variables

var (
	// ErrInvalidUTF16ByteLength is returned when a UTF-16 byte source has a
	// dangling trailing byte.
	ErrInvalidUTF16ByteLength = errors.New("utf16: byte source length must be even")

	// ErrInvalidUTF16ByteOrder is returned for an unknown UTF16ByteOrder.
	ErrInvalidUTF16ByteOrder = errors.New("utf16: invalid byte order")

	// ErrInvalidUTF16Range is returned when a UTF-16 range does not align to
	// valid code-point boundaries or has an inverted span.
	ErrInvalidUTF16Range = errors.New("utf16: invalid range")
)

var DebugDFA atomic.Bool

DebugDFA enables trace logging for DFA token production.

Use `DebugDFA.Store(true/false)` to toggle at runtime.

var ErrNoLanguage = errors.New("parser has no language configured")

ErrNoLanguage is returned when a Parser has no language configured.

var ErrNoTokenSource = errors.New("parser has no token source")

ErrNoTokenSource is returned when a token-source parse is called without a token source.

var ErrNoTokenSourceFactory = errors.New("parser has no token source factory")

ErrNoTokenSourceFactory is returned when a factory-based parse is called without a token source factory.

Functions

func DecodeUTF16Bytes added in v0.16.0

func DecodeUTF16Bytes(source []byte, order UTF16ByteOrder) ([]uint16, error)

DecodeUTF16Bytes decodes an endian-specific UTF-16 byte source into Go UTF-16 code units.

func DrainArenaPools added in v0.14.0

func DrainArenaPools()

DrainArenaPools releases all cached arenas from both incremental and full-parse pools. Arenas held in the pool are strong Go references and are not collected by the GC until explicitly drained or the process exits.

Call this after a large batch scan (e.g. after WalkAndParse returns) to allow the GC to reclaim the arena memory. The next parse will allocate a fresh arena.

func EnableArenaBreakdown added in v0.18.0

func EnableArenaBreakdown(enabled bool)

EnableArenaBreakdown toggles detailed arena accounting for subsequently acquired arenas. It is intended for diagnostics and benchmark attribution; normal parser paths leave it disabled to avoid perturbing hot allocation paths.

func EnableArenaProfile added in v0.6.0

func EnableArenaProfile(enabled bool)

EnableArenaProfile toggles arena pool counters. This debug hook is not concurrency-safe and is intended for single-threaded benchmark/profiling runs.

func EnableGLREquivAudit added in v0.19.0

func EnableGLREquivAudit(enabled bool)

EnableGLREquivAudit toggles lightweight GLR equivalence attribution. This is intended for parser gap diagnostics and avoids the heavier survivor maps used by EnableRuntimeAudit.

func EnableRuntimeAudit added in v0.7.0

func EnableRuntimeAudit(enabled bool)

EnableRuntimeAudit toggles per-parse survivor instrumentation. This debug hook is intended for single-threaded benchmark/profiling runs.

func RegisterHighlighterInjection added in v0.7.0

func RegisterHighlighterInjection(parentLanguage string, spec HighlighterInjectionSpec)

RegisterHighlighterInjection registers nested-highlighting configuration for a parent language name (for example "markdown").

func RepairNoLookaheadLexModes added in v0.9.0

func RepairNoLookaheadLexModes(lang *Language)

RepairNoLookaheadLexModes marks parser states as no-lookahead when they only need EOF-triggered reductions plus external/trivia handling. Tree-sitter's C runtime uses these states to reduce before lexing the next real token.

func ResetArenaProfile added in v0.6.0

func ResetArenaProfile()

ResetArenaProfile resets arena pool counters. This debug hook is not concurrency-safe and is intended for single-threaded benchmark/profiling runs.

func ResetParseEnvConfigCacheForTests added in v0.7.0

func ResetParseEnvConfigCacheForTests()

ResetParseEnvConfigCacheForTests clears memoized parser env config.

Tests in this repo mutate env vars between cases; this helper ensures subsequent parses observe the new values in the same process.

func ResetPerfCounters added in v0.6.0

func ResetPerfCounters()

func RunExternalScanner

func RunExternalScanner(lang *Language, payload any, lexer *ExternalLexer, validSymbols []bool) bool

RunExternalScanner invokes the language's external scanner if present. Returns true if the scanner produced a token, false otherwise.

func SetGLRForestEnabled added in v0.20.0

func SetGLRForestEnabled(on bool)

SetGLRForestEnabled toggles the GSS-forest path at runtime (tests/benchmarks).

func SetInternLeavesObserveEnabled added in v0.20.0

func SetInternLeavesObserveEnabled(on bool)

SetInternLeavesObserveEnabled toggles leaf-interning observation at runtime. Tests and benches that want to A/B observation without re-running the test binary set this directly. Not safe to flip while a parse is in flight on another goroutine. Phase 2 scaffolding; the API may change before becoming public.

func SetInternLeavesSubstituteEnabled added in v0.20.0

func SetInternLeavesSubstituteEnabled(on bool)

SetInternLeavesSubstituteEnabled toggles canonical substitution at runtime. See internLeavesSubstituteEnabled.

func Walk

func Walk(node *Node, fn func(node *Node, depth int) WalkAction)

Walk performs a depth-first traversal of the syntax tree rooted at node. The callback receives each node and its depth (0 for the starting node). Return WalkSkipChildren to skip a node's children, or WalkStop to end early.
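
The traversal contract can be sketched on a toy tree. Everything below is a local stand-in (`node`, `walkAction`, `walk`), not the package's code; in particular, the name of the "continue" action is an assumption, since only WalkSkipChildren and WalkStop are named above.

```go
package main

import "fmt"

// walkAction is a local stand-in for the package's WalkAction. The
// walkContinue value is an assumption of this sketch.
type walkAction int

const (
	walkContinue walkAction = iota
	walkSkipChildren
	walkStop
)

// node is a toy tree node standing in for *Node.
type node struct {
	kind     string
	children []*node
}

// walk visits n depth-first, passing each node and its depth to fn.
func walk(n *node, fn func(n *node, depth int) walkAction) {
	var visit func(n *node, depth int) bool
	visit = func(n *node, depth int) bool {
		switch fn(n, depth) {
		case walkStop:
			return false // abort the whole traversal
		case walkSkipChildren:
			return true // keep going, but do not descend
		}
		for _, c := range n.children {
			if !visit(c, depth+1) {
				return false
			}
		}
		return true
	}
	if n != nil {
		visit(n, 0)
	}
}

// visitedKinds records "depth:kind" in visit order and skips the children of
// any node whose kind equals skip.
func visitedKinds(root *node, skip string) []string {
	var out []string
	walk(root, func(n *node, depth int) walkAction {
		out = append(out, fmt.Sprintf("%d:%s", depth, n.kind))
		if n.kind == skip {
			return walkSkipChildren
		}
		return walkContinue
	})
	return out
}

func main() {
	root := &node{kind: "source_file", children: []*node{
		{kind: "comment", children: []*node{{kind: "text"}}},
		{kind: "function_declaration", children: []*node{{kind: "identifier"}}},
	}}
	fmt.Println(visitedKinds(root, "comment"))
}
```

Returning the skip action from the callback prunes one subtree while the traversal continues with the next sibling; the stop action unwinds immediately.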

Types

type AmbiguityKey added in v0.17.0

type AmbiguityKey struct {
	State                          StateID
	Lookahead                      Symbol
	ActionCount                    uint8
	ShiftCount                     uint8
	ReduceCount                    uint8
	ReduceSymbol                   Symbol
	ChildCount                     uint8
	ProductionID                   uint16
	ReduceChainTerminalState       StateID
	ReduceChainTerminalActionClass uint8
}

AmbiguityKey identifies one parse-table ambiguity bucket.

type AmbiguityProfile added in v0.17.0

type AmbiguityProfile struct {
	// contains filtered or unexported fields
}

AmbiguityProfile aggregates parser states/lookaheads that contribute to GLR fanout. It is intended for diagnostics and benchmark runs, not normal API use.

func NewAmbiguityProfile added in v0.17.0

func NewAmbiguityProfile() *AmbiguityProfile

NewAmbiguityProfile creates an empty GLR ambiguity profile.

func (*AmbiguityProfile) Reset added in v0.17.0

func (p *AmbiguityProfile) Reset()

Reset clears all accumulated ambiguity counters.

func (*AmbiguityProfile) SnapshotReduceChainTotals added in v0.19.0

func (p *AmbiguityProfile) SnapshotReduceChainTotals() AmbiguityStat

SnapshotReduceChainTotals returns aggregate deterministic reduce-chain run counters across all profiled start states/lookaheads.

func (*AmbiguityProfile) SnapshotTop added in v0.17.0

func (p *AmbiguityProfile) SnapshotTop(limit int) []AmbiguityStat

SnapshotTop returns the highest-impact ambiguity buckets ordered by stack pressure, then hit count.

func (*AmbiguityProfile) SnapshotTopMergeStates added in v0.19.0

func (p *AmbiguityProfile) SnapshotTopMergeStates(limit int) []AmbiguityStat

SnapshotTopMergeStates returns parser states that most often participate in multi-stack merge passes. These rows are keyed by state only, because merge happens before the next lookahead dispatch.

func (*AmbiguityProfile) SnapshotTopReduceChainRuns added in v0.19.0

func (p *AmbiguityProfile) SnapshotTopReduceChainRuns(limit int) []AmbiguityStat

SnapshotTopReduceChainRuns returns the starting states/lookaheads that begin the most expensive deterministic reduce chains.

func (*AmbiguityProfile) SnapshotTopReduceChains added in v0.19.0

func (p *AmbiguityProfile) SnapshotTopReduceChains(limit int) []AmbiguityStat

SnapshotTopReduceChains returns the parser states/lookaheads that spent the most time in deterministic reduce-chain fusion.

type AmbiguityStat added in v0.17.0

type AmbiguityStat struct {
	State                          StateID
	Lookahead                      Symbol
	ActionCount                    uint8
	ShiftCount                     uint8
	ReduceCount                    uint8
	ReduceSymbol                   Symbol
	ChildCount                     uint8
	ProductionID                   uint16
	Actions                        []ParseAction
	Hits                           uint64
	Forks                          uint64
	MultiStackHits                 uint64
	StackInTotal                   uint64
	StackInMax                     int
	ReduceChainHits                uint64
	ReduceChainSteps               uint64
	ReduceChainMaxLen              int
	ReduceChainNanos               int64
	ReduceChainRuns                uint64
	ReduceChainClassHits           uint64
	ReduceChainStopNoAction        uint64
	ReduceChainStopMulti           uint64
	ReduceChainStopShift           uint64
	ReduceChainStopAccept          uint64
	ReduceChainStopDead            uint64
	ReduceChainStopCycle           uint64
	ReduceChainStopLimit           uint64
	ReduceChainTerminalState       StateID
	ReduceChainTerminalActionClass uint8
	ActionNanos                    int64
	ExtraShiftNanos                int64
	NoActionNanos                  int64
	ConflictChoiceNanos            int64
	ConflictForkNanos              int64
	SingleShiftNanos               int64
	SingleReduceNanos              int64
	SingleAcceptNanos              int64
	SingleRecoverNanos             int64
	SingleOtherNanos               int64
	MergeCalls                     uint64
	MergeStacksIn                  uint64
	MergeStacksOut                 uint64
	MergeStacksInMax               int
	MergeStacksOutMax              int
}

AmbiguityStat is a snapshot row from AmbiguityProfile.

type ArenaBreakdown added in v0.18.0

type ArenaBreakdown struct {
	NodeStructBytesAllocated        int64
	NoTreeNodeBytesAllocated        int64
	CompactFullLeafBytesAllocated   int64
	PendingParentBytesAllocated     int64
	PendingChildEntryBytesAllocated int64
	FinalChildSidecarBytesAllocated int64
	PendingChildEntriesAllocated    uint64
	PendingChildEntryCapacity       uint64
	PendingChildEntryWaste          uint64
	ChildSliceBytesAllocated        int64
	FieldIDBytesAllocated           int64
	FieldSourceBytesAllocated       int64
	MergeScratchBytesAllocated      int64

	ArenaNodesConstructed uint64
	// NodeLiveCount is arena allocation-slot usage, not root-reachable tree
	// liveness. It includes parser alternatives and recovery nodes allocated
	// during the parse.
	NodeLiveCount                     uint64
	NodeCapacityCount                 uint64
	NodeCapacityWaste                 uint64
	PrimaryNodeCapacity               uint64
	PrimaryNodeUsed                   uint64
	OverflowNodeCapacity              uint64
	OverflowNodeUsed                  uint64
	OverflowNodeSlabs                 uint64
	LargestNodeSlabUsedFraction       float64
	LeafNodesConstructed              uint64
	ParentNodesConstructed            uint64
	FieldedParentNodesConstructed     uint64
	UnfieldedParentNodesConstructed   uint64
	ParentConstructedChildLen0        uint64
	ParentConstructedChildLen1        uint64
	ParentConstructedChildLen2        uint64
	ParentConstructedChildLen3        uint64
	ParentConstructedChildLen4Plus    uint64
	ParentConstructedNoLinks          uint64
	ParentConstructedWithLinks        uint64
	ParentConstructedTrackErrors      uint64
	ParentConstructedFieldSources     uint64
	ParentReductionVisible            uint64
	ParentReductionInvisible          uint64
	ParentReductionVisibleFielded     uint64
	ParentReductionVisibleUnfielded   uint64
	ParentReductionInvisibleFielded   uint64
	ParentReductionInvisibleUnfielded uint64
	ParentReductionVisibleChildPtrs   uint64
	ParentReductionInvisibleChildPtrs uint64
	ParentReductionVisibleLen0        uint64
	ParentReductionVisibleLen1        uint64
	ParentReductionVisibleLen2        uint64
	ParentReductionVisibleLen3        uint64
	ParentReductionVisibleLen4Plus    uint64
	ParentReductionInvisibleLen0      uint64
	ParentReductionInvisibleLen1      uint64
	ParentReductionInvisibleLen2      uint64
	ParentReductionInvisibleLen3      uint64
	ParentReductionInvisibleLen4Plus  uint64
	ReduceChildSlicesFastGSS          uint64
	ReduceChildPointersFastGSS        uint64
	ReduceChildSlicesAllVisible       uint64
	ReduceChildPointersAllVisible     uint64
	ReduceChildSlicesNoAlias          uint64
	ReduceChildPointersNoAlias        uint64
	ReduceChildSlicesScratchGeneral   uint64
	ReduceChildPointersScratchGeneral uint64
	ReduceChildSlicesScratchNoAlias   uint64
	ReduceChildPointersScratchNoAlias uint64
	CollapseRawUnaryAttempts          uint64
	CollapseRawUnarySuccesses         uint64
	CollapseRawUnaryMissShape         uint64
	CollapseRawUnaryMissGrammar       uint64
	CollapseRawUnaryMissChild         uint64
	CollapseRawUnaryMissRule          uint64
	CollapseUnaryAttempts             uint64
	CollapseUnarySuccesses            uint64
	CollapseUnaryMissShape            uint64
	CollapseUnaryMissGrammar          uint64
	CollapseUnaryMissFielded          uint64
	CollapseUnaryMissChild            uint64
	CollapseUnaryMissRule             uint64
	CollapseRuleSameSymbol            uint64
	CollapseRuleInvisibleWrapper      uint64
	CollapseRuleNamedLeafAlias        uint64
	NoTreeReduceNodesConstructed      uint64
	NoTreeLeafNodesConstructed        uint64
	NoTreePlaceholderNodesConstructed uint64
	OtherNodesConstructed             uint64
	ExtraNodesConstructed             uint64
	ErrorSymbolNodesConstructed       uint64
	HasErrorNodesConstructed          uint64
	ChildSlicesConstructed            uint64
	ChildPointersConstructed          uint64
	ChildSlicesLen1                   uint64
	ChildSlicesLen2                   uint64
	ChildSlicesLen3                   uint64
	ChildSlicesLen4Plus               uint64
	ParentChildPointersConstructed    uint64
	ParentChildrenLen0                uint64
	ParentChildrenLen1                uint64
	ParentChildrenLen2                uint64
	ParentChildrenLen3                uint64
	ParentChildrenLen4Plus            uint64
	FieldIDElementsConstructed        uint64
	FieldSourceElementsConstructed    uint64
}

ArenaBreakdown captures optional arena/materialization attribution. It is populated only when EnableArenaBreakdown(true) is set before parsing.

type ArenaProfile added in v0.6.0

type ArenaProfile struct {
	IncrementalAcquire uint64
	IncrementalNew     uint64
	FullAcquire        uint64
	FullNew            uint64
}

ArenaProfile captures node arena allocation statistics. Enable with EnableArenaProfile(true) and retrieve with ArenaProfileSnapshot().

func ArenaProfileSnapshot added in v0.6.0

func ArenaProfileSnapshot() ArenaProfile

ArenaProfileSnapshot returns current arena pool counters. This debug hook is not concurrency-safe and is intended for single-threaded benchmark/profiling runs.

type BoundTree

type BoundTree struct {
	// contains filtered or unexported fields
}

BoundTree pairs a Tree with its Language and source, eliminating the need to pass *Language and []byte to every node method call.

func Bind

func Bind(tree *Tree) *BoundTree

Bind creates a BoundTree from a Tree. The Tree must have been created with a Language (via NewTree or a Parser). Returns a BoundTree that delegates to the underlying Tree's Language and Source.

func (*BoundTree) ChildByField

func (bt *BoundTree) ChildByField(n *Node, fieldName string) *Node

ChildByField returns the first child assigned to the given field name.

func (*BoundTree) Language

func (bt *BoundTree) Language() *Language

Language returns the tree's language.

func (*BoundTree) NodeText

func (bt *BoundTree) NodeText(n *Node) string

NodeText returns the source text covered by the node.

func (*BoundTree) NodeType

func (bt *BoundTree) NodeType(n *Node) string

NodeType returns the node's type name, resolved via the bound language.

func (*BoundTree) Release

func (bt *BoundTree) Release()

Release releases the underlying tree's arena memory.

func (*BoundTree) RootNode

func (bt *BoundTree) RootNode() *Node

RootNode returns the tree's root node.

func (*BoundTree) Source

func (bt *BoundTree) Source() []byte

Source returns the tree's source bytes.

func (*BoundTree) TreeCursor added in v0.6.0

func (bt *BoundTree) TreeCursor() *TreeCursor

TreeCursor returns a new TreeCursor starting at the tree's root node.

type ByteSkippableTokenSource

type ByteSkippableTokenSource interface {
	TokenSource
	SkipToByte(offset uint32) Token
}

ByteSkippableTokenSource can jump to a byte offset and return the first token at or after that position.

type ExternalLexer

type ExternalLexer struct {
	// contains filtered or unexported fields
}

ExternalLexer is the scanner-facing lexer API used by external scanners. It mirrors the essential tree-sitter scanner API: lookahead, advance, mark_end, and result_symbol.
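
To show how a scanner typically drives these four operations, here is a minimal standalone sketch. The `lexer` interface and `symbol` type are local stand-ins for *ExternalLexer and Symbol so the snippet compiles without the module; `stringLexer`, `scanLineComment`, and both constants are hypothetical. Indexing validSymbols by the token's position in the grammar's externals list follows tree-sitter's C convention and is assumed here.

```go
package main

import "fmt"

// symbol and lexer are local stand-ins for Symbol and *ExternalLexer.
type symbol uint16

type lexer interface {
	Lookahead() rune
	Advance(skip bool)
	MarkEnd()
	SetResultSymbol(sym symbol)
}

// Hypothetical IDs: the token's position in validSymbols and its symbol.
const (
	lineCommentIndex        = 0
	symLineComment   symbol = 1
)

// scanLineComment recognizes a "#" comment running to end of line.
func scanLineComment(lx lexer, validSymbols []bool) bool {
	if lineCommentIndex >= len(validSymbols) || !validSymbols[lineCommentIndex] {
		return false
	}
	for lx.Lookahead() == ' ' {
		lx.Advance(true) // skipped runes stay outside the token span
	}
	if lx.Lookahead() != '#' {
		return false
	}
	for r := lx.Lookahead(); r != 0 && r != '\n'; r = lx.Lookahead() {
		lx.Advance(false)
	}
	lx.MarkEnd()
	lx.SetResultSymbol(symLineComment)
	return true
}

// stringLexer is a toy lexer over a rune slice, used only to drive the sketch.
type stringLexer struct {
	src             []rune
	pos, start, end int
	sym             symbol
}

func (l *stringLexer) Lookahead() rune {
	if l.pos >= len(l.src) {
		return 0 // 0 signals EOF
	}
	return l.src[l.pos]
}

func (l *stringLexer) Advance(skip bool) {
	if l.pos < len(l.src) {
		l.pos++
	}
	if skip {
		l.start = l.pos
	}
}

func (l *stringLexer) MarkEnd()                   { l.end = l.pos }
func (l *stringLexer) SetResultSymbol(sym symbol) { l.sym = sym }

// scanText runs the scanner once and returns the token text, if any.
func scanText(src string) (string, bool) {
	lx := &stringLexer{src: []rune(src)}
	if !scanLineComment(lx, []bool{true}) {
		return "", false
	}
	return string(lx.src[lx.start:lx.end]), true
}

func main() {
	fmt.Println(scanText("  # note\nx := 1"))
}
```

The leading spaces are consumed with skip set, so the reported token starts at the `#` rather than at the original cursor position.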

func (*ExternalLexer) Advance

func (l *ExternalLexer) Advance(skip bool)

Advance consumes one rune. When skip is true, consumed bytes are excluded from the token span (scanner whitespace skipping behavior).

func (*ExternalLexer) Column added in v0.6.0

func (l *ExternalLexer) Column() uint32

Column returns the current column (0-based) at the scanner cursor.

func (*ExternalLexer) GetColumn deprecated

func (l *ExternalLexer) GetColumn() uint32

GetColumn returns the current column (0-based) at the scanner cursor.

Deprecated: use Column.

func (*ExternalLexer) Lookahead

func (l *ExternalLexer) Lookahead() rune

Lookahead returns the current rune or 0 at EOF.

func (*ExternalLexer) MarkEnd

func (l *ExternalLexer) MarkEnd()

MarkEnd marks the current scanner position as the token end.

func (*ExternalLexer) SetResultSymbol

func (l *ExternalLexer) SetResultSymbol(sym Symbol)

SetResultSymbol sets the token symbol to emit when Scan returns true.

type ExternalScanner

type ExternalScanner interface {
	Create() any
	Destroy(payload any)
	Serialize(payload any, buf []byte) int
	Deserialize(payload any, buf []byte)
	Scan(payload any, lexer *ExternalLexer, validSymbols []bool) bool
}

ExternalScanner is the interface for language-specific external scanners. Languages like Python and JavaScript need these for indent tracking, template literals, regex vs division, etc.

The value returned by Create must be accepted by Destroy/Serialize/Deserialize/Scan for that scanner implementation. Most scanners use a concrete payload pointer type and will panic on mismatched payload types.
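
The payload contract can be sketched with a scanner that keeps a small stack. `depthScanner` and `depthState` are hypothetical, and Scan is omitted, so this type does not satisfy the full interface; the point is only the Create/Serialize/Deserialize round trip and the type assertion that panics on a foreign payload.

```go
package main

import "fmt"

// depthState is a hypothetical payload: a stack of indentation widths.
type depthState struct{ indents []byte }

// depthScanner shows only the payload-lifecycle methods of ExternalScanner;
// Scan is left out of this sketch.
type depthScanner struct{}

func (depthScanner) Create() any         { return &depthState{} }
func (depthScanner) Destroy(payload any) {}

// Serialize copies the stack into buf and returns the number of bytes written.
// The type assertion panics if payload did not come from Create.
func (depthScanner) Serialize(payload any, buf []byte) int {
	return copy(buf, payload.(*depthState).indents)
}

// Deserialize replaces the stack with the contents of buf.
func (depthScanner) Deserialize(payload any, buf []byte) {
	st := payload.(*depthState)
	st.indents = append(st.indents[:0], buf...)
}

func main() {
	var sc depthScanner
	a := sc.Create()
	a.(*depthState).indents = []byte{0, 4, 8}

	buf := make([]byte, 16)
	n := sc.Serialize(a, buf)

	// A fresh payload restored from the serialized bytes matches the original.
	b := sc.Create()
	sc.Deserialize(b, buf[:n])
	fmt.Println(n, b.(*depthState).indents)

	sc.Destroy(a)
	sc.Destroy(b)
}
```

Note that `copy` silently truncates when buf is shorter than the stack; a real scanner would need to decide how to handle that case.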

func AdaptExternalScannerByExternalOrder added in v0.9.0

func AdaptExternalScannerByExternalOrder(sourceLang, targetLang *Language) (ExternalScanner, bool)

AdaptExternalScannerByExternalOrder builds an ExternalScanner adapter that reuses sourceLang's scanner for targetLang by remapping external symbols.

Mapping strategy:

  1. If either side has duplicate external names, use index mapping (capped to the shorter list length).
  2. Otherwise, prefer exact external-symbol-name matches.
  3. Fill remaining slots by index order (within the shorter dimension).

When source and target have different external symbol counts, name-based matching pairs tokens that exist in both grammars. Target externals with no source match get -1 (the scanner will never produce them). Source externals with no target match are silently ignored.

Returns (nil, false) when adaptation is not possible.

type ExternalScannerState

type ExternalScannerState struct {
	Data []byte
}

ExternalScannerState holds serialized state for an external scanner between incremental parse runs.

type ExternalSymbolResolver added in v0.9.0

type ExternalSymbolResolver struct {
	// contains filtered or unexported fields
}

ExternalSymbolResolver maps external token names to their concrete Symbol IDs in a specific Language. This allows external scanners to resolve symbol IDs at runtime rather than using hardcoded constants, making them compatible with any Language that defines the same external tokens (whether from ts2go extraction or grammargen).

func NewExternalSymbolResolver added in v0.9.0

func NewExternalSymbolResolver(lang *Language) *ExternalSymbolResolver

NewExternalSymbolResolver builds a resolver from a Language's external symbol definitions. Returns nil if the Language has no external symbols.

func (*ExternalSymbolResolver) ByIndex added in v0.9.0

func (r *ExternalSymbolResolver) ByIndex(idx int) (Symbol, bool)

ByIndex returns the Symbol ID for the given external token index (position in the grammar's externals array). Returns 0, false if the index is out of range.

func (*ExternalSymbolResolver) ByName added in v0.9.0

func (r *ExternalSymbolResolver) ByName(name string) (Symbol, bool)

ByName returns the Symbol ID for the given external token name. Returns 0, false if the name is not found.

func (*ExternalSymbolResolver) Count added in v0.9.0

func (r *ExternalSymbolResolver) Count() int

Count returns the number of external tokens.

type ExternalVMInstr

type ExternalVMInstr struct {
	Op  ExternalVMOp
	A   int32
	B   int32
	Alt int32
}

ExternalVMInstr is one instruction in an external scanner VM program.

Operands:

  • A: primary operand (opcode-specific)
  • B: secondary operand (used by range checks)
  • Alt: alternate program counter when a condition fails
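
The role of the three operands is easiest to see in a toy interpreter. The sketch below is not the package's VM: `op`, `instr`, and `run` are local stand-ins with a reduced opcode set, and the fall-through-on-success behavior of conditional instructions is an assumption.

```go
package main

import "fmt"

// op and instr are local stand-ins that mirror the shape of ExternalVMOp and
// ExternalVMInstr; the interpreter below is a toy, not the package's VM.
type op uint8

const (
	opFail op = iota
	opJump
	opIfRuneInRange
	opAdvance
	opEmit
)

type instr struct {
	Op  op
	A   int32 // primary operand
	B   int32 // secondary operand (range end)
	Alt int32 // program counter taken when a condition fails
}

// run interprets code over input and returns the emitted symbol and the
// number of runes consumed. maxSteps bounds execution so loops terminate.
func run(code []instr, input []rune, maxSteps int) (sym int32, consumed int, ok bool) {
	pc, pos := 0, 0
	for step := 0; step < maxSteps && pc >= 0 && pc < len(code); step++ {
		in := code[pc]
		switch in.Op {
		case opFail:
			return 0, 0, false
		case opJump:
			pc = int(in.A)
		case opIfRuneInRange:
			var r rune // 0 stands for EOF
			if pos < len(input) {
				r = input[pos]
			}
			if r >= rune(in.A) && r <= rune(in.B) {
				pc++ // condition holds: fall through (assumed)
			} else {
				pc = int(in.Alt)
			}
		case opAdvance:
			if pos < len(input) {
				pos++
			}
			pc++
		case opEmit:
			return in.A, pos, true
		}
	}
	return 0, 0, false // step budget exhausted or pc out of range
}

// digitsProgram emits symbol 7 for a non-empty run of ASCII digits.
var digitsProgram = []instr{
	{Op: opIfRuneInRange, A: '0', B: '9', Alt: 5}, // 0: first rune must be a digit
	{Op: opAdvance},                               // 1: consume it
	{Op: opIfRuneInRange, A: '0', B: '9', Alt: 4}, // 2: more digits?
	{Op: opJump, A: 1},                            // 3: loop back
	{Op: opEmit, A: 7},                            // 4: emit the token
	{Op: opFail},                                  // 5: no token
}

func main() {
	fmt.Println(run(digitsProgram, []rune("123a"), 64))
	fmt.Println(run(digitsProgram, []rune("abc"), 64))
}
```

Each conditional instruction carries its own miss target in Alt, so a program needs no separate "else" opcode.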

func VMAdvance

func VMAdvance(skip bool) ExternalVMInstr

VMAdvance constructs an advance instruction. When skip is true, the advanced rune is skipped from the token text.

func VMEmit

func VMEmit(sym Symbol) ExternalVMInstr

VMEmit constructs an emit instruction for the given symbol.

func VMFail

func VMFail() ExternalVMInstr

VMFail constructs a fail instruction that terminates scan with no token.

func VMIfRuneClass

func VMIfRuneClass(class ExternalVMRuneClass, alt int) ExternalVMInstr

VMIfRuneClass constructs a rune-class branch with alternate target on miss.

func VMIfRuneEq

func VMIfRuneEq(r rune, alt int) ExternalVMInstr

VMIfRuneEq constructs a rune-equality branch with alternate target on miss.

func VMIfRuneInRange

func VMIfRuneInRange(start, end rune, alt int) ExternalVMInstr

VMIfRuneInRange constructs a rune-range branch with alternate target on miss.

func VMJump

func VMJump(target int) ExternalVMInstr

VMJump constructs an unconditional branch to the target instruction index.

func VMMarkEnd

func VMMarkEnd() ExternalVMInstr

VMMarkEnd constructs a mark-end instruction for the current token extent.

func VMRequireStateEq

func VMRequireStateEq(state uint32, alt int) ExternalVMInstr

VMRequireStateEq constructs a payload-state guard with alternate branch on miss.

func VMRequireValid

func VMRequireValid(validSymbolIndex, alt int) ExternalVMInstr

VMRequireValid constructs a valid-symbol guard with alternate branch on miss.

func VMSetState

func VMSetState(state uint32) ExternalVMInstr

VMSetState constructs a payload-state assignment instruction.

type ExternalVMOp

type ExternalVMOp uint8

ExternalVMOp is an opcode for the native-Go external scanner VM.

const (
	ExternalVMOpFail ExternalVMOp = iota
	ExternalVMOpJump
	ExternalVMOpRequireValid
	ExternalVMOpRequireStateEq
	ExternalVMOpSetState
	ExternalVMOpIfRuneEq
	ExternalVMOpIfRuneInRange
	ExternalVMOpIfRuneClass
	ExternalVMOpAdvance
	ExternalVMOpMarkEnd
	ExternalVMOpEmit
)

type ExternalVMProgram

type ExternalVMProgram struct {
	Code     []ExternalVMInstr
	MaxSteps int // <=0 uses a safe default based on program size
}

ExternalVMProgram is a small bytecode program interpreted by ExternalVMScanner.

type ExternalVMRuneClass

type ExternalVMRuneClass uint8

ExternalVMRuneClass is a character class used by ExternalVMOpIfRuneClass.

const (
	ExternalVMRuneClassWhitespace ExternalVMRuneClass = iota
	ExternalVMRuneClassDigit
	ExternalVMRuneClassLetter
	ExternalVMRuneClassWord
	ExternalVMRuneClassNewline
)

type ExternalVMScanner

type ExternalVMScanner struct {
	// contains filtered or unexported fields
}

ExternalVMScanner executes an ExternalVMProgram and implements ExternalScanner.

func MustNewExternalVMScanner

func MustNewExternalVMScanner(program ExternalVMProgram) *ExternalVMScanner

MustNewExternalVMScanner is like NewExternalVMScanner but panics on error. It is intended for package-level initialization where invalid programs are programmer errors.

func NewExternalVMScanner

func NewExternalVMScanner(program ExternalVMProgram) (*ExternalVMScanner, error)

NewExternalVMScanner validates and constructs an ExternalVMScanner.

func (*ExternalVMScanner) Create

func (s *ExternalVMScanner) Create() any

Create allocates scanner payload (currently a single uint32 state slot).

func (*ExternalVMScanner) Deserialize

func (s *ExternalVMScanner) Deserialize(payload any, buf []byte)

Deserialize restores payload state from buf.

func (*ExternalVMScanner) Destroy

func (s *ExternalVMScanner) Destroy(payload any)

Destroy releases scanner payload resources.

func (*ExternalVMScanner) Scan

func (s *ExternalVMScanner) Scan(payload any, lexer *ExternalLexer, validSymbols []bool) bool

Scan executes the scanner program against the current lexer position.

func (*ExternalVMScanner) Serialize

func (s *ExternalVMScanner) Serialize(payload any, buf []byte) int

Serialize writes payload state into buf.

type FieldID

type FieldID uint16

FieldID is a named field index.

type FieldMapEntry

type FieldMapEntry struct {
	FieldID    FieldID
	ChildIndex uint8
	Inherited  bool
}

FieldMapEntry maps a child index to a field name.

type HighlightRange

type HighlightRange struct {
	StartByte    uint32
	EndByte      uint32
	Capture      string // "keyword", "string", "function", etc.
	PatternIndex int    // query pattern index; later patterns override earlier for identical ranges
}

HighlightRange represents a styled range of source code, mapping a byte span to a capture name from a highlight query. The editor maps capture names (e.g., "keyword", "string", "function") to FSS style classes.

type Highlighter

type Highlighter struct {
	// contains filtered or unexported fields
}

Highlighter is a high-level API that takes source code and returns styled ranges. It combines a Parser, a compiled Query, and a Language to provide a single Highlight() call for the editor.

func NewHighlighter

func NewHighlighter(lang *Language, highlightQuery string, opts ...HighlighterOption) (*Highlighter, error)

NewHighlighter creates a Highlighter for the given language and highlight query (in tree-sitter .scm format). Returns an error if the query fails to compile.

func (*Highlighter) Highlight

func (h *Highlighter) Highlight(source []byte) []HighlightRange

Highlight parses the source code and executes the highlight query, returning a slice of HighlightRange sorted by StartByte. When ranges overlap, inner (more specific) captures take priority over outer ones.

func (*Highlighter) HighlightIncremental

func (h *Highlighter) HighlightIncremental(source []byte, oldTree *Tree) ([]HighlightRange, *Tree)

HighlightIncremental re-highlights source after edits were applied to oldTree. Returns the new highlight ranges and the new parse tree (for use in subsequent incremental calls). Call oldTree.Edit() before calling this.

func (*Highlighter) HighlightIncrementalUTF16 added in v0.16.0

func (h *Highlighter) HighlightIncrementalUTF16(source []uint16, oldTree *Tree) ([]UTF16HighlightRange, *Tree)

HighlightIncrementalUTF16 re-highlights UTF-16 source after edits were applied to oldTree with Tree.EditUTF16.

func (*Highlighter) HighlightIncrementalUTF16Bytes added in v0.16.0

func (h *Highlighter) HighlightIncrementalUTF16Bytes(source []byte, oldTree *Tree, order UTF16ByteOrder) ([]UTF16HighlightRange, *Tree, error)

HighlightIncrementalUTF16Bytes is like HighlightIncrementalUTF16 for endian-specific UTF-16 bytes.

func (*Highlighter) HighlightUTF16 added in v0.16.0

func (h *Highlighter) HighlightUTF16(source []uint16) []UTF16HighlightRange

HighlightUTF16 parses UTF-16 source and returns highlight ranges in UTF-16 code-unit coordinates.

func (*Highlighter) HighlightUTF16Bytes added in v0.16.0

func (h *Highlighter) HighlightUTF16Bytes(source []byte, order UTF16ByteOrder) ([]UTF16HighlightRange, error)

HighlightUTF16Bytes is like HighlightUTF16 for endian-specific UTF-16 bytes.

type HighlighterInjectionResolver added in v0.7.0

type HighlighterInjectionResolver func(languageHint string) (lang *Language, highlightQuery string, tokenSourceFactory func(source []byte) TokenSource, ok bool)

HighlighterInjectionResolver maps a language hint (for example "go" from a markdown code fence) to a child language and highlight query.

type HighlighterInjectionSpec added in v0.7.0

type HighlighterInjectionSpec struct {
	Query           string
	ResolveLanguage HighlighterInjectionResolver
}

HighlighterInjectionSpec configures nested highlighting for a parent language. Query must emit @injection.content and either @injection.language or #set! injection.language metadata.

type HighlighterOption

type HighlighterOption func(*Highlighter)

HighlighterOption configures a Highlighter.

func WithTokenSourceFactory

func WithTokenSourceFactory(factory func(source []byte) TokenSource) HighlighterOption

WithTokenSourceFactory sets a factory function that creates a TokenSource for each Highlight call. This is needed for languages that use a custom lexer bridge (like Go, which uses go/scanner instead of a DFA lexer).

When set, Highlight() calls ParseWithTokenSource instead of Parse.

type ImportExtractResult added in v0.18.0

type ImportExtractResult struct {
	Imports             []ImportRef
	Status              ImportExtractStatus
	Reason              string
	FallbackRecommended bool
}

ImportExtractResult is returned by source-only dependency extraction. When FallbackRecommended is true, callers that need exact tree-sitter behavior should parse the file and use ExtractImports.
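
One way a caller could act on FallbackRecommended is sketched below with trimmed local stand-ins for the result types. `resolveImports` and its `parseAndExtract` callback are hypothetical; the callback represents a full parse followed by ExtractImports.

```go
package main

import "fmt"

// importRef and importExtractResult are trimmed local stand-ins for the
// documented ImportRef and ImportExtractResult types.
type importRef struct{ Path string }

type importExtractResult struct {
	Imports             []importRef
	Status              string
	Reason              string
	FallbackRecommended bool
}

// resolveImports keeps the source-only result unless the report recommends a
// fallback. parseAndExtract is a hypothetical callback standing in for a full
// parse followed by ExtractImports. The bool reports whether it was used.
func resolveImports(fast importExtractResult, parseAndExtract func() []importRef) ([]importRef, bool) {
	if fast.FallbackRecommended {
		return parseAndExtract(), true
	}
	return fast.Imports, false
}

func main() {
	slow := func() []importRef { return []importRef{{Path: "fmt"}, {Path: "os"}} }

	ok := importExtractResult{Imports: []importRef{{Path: "fmt"}}, Status: "ok"}
	refs, usedTree := resolveImports(ok, slow)
	fmt.Println(len(refs), usedTree)

	unsure := importExtractResult{Status: "ambiguous", Reason: "example", FallbackRecommended: true}
	refs, usedTree = resolveImports(unsure, slow)
	fmt.Println(len(refs), usedTree)
}
```

Passing the tree path as a callback keeps the expensive parse lazy: it runs only for files where the source-only pass reports low confidence.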

func ExtractImportsFromSourceWithReport added in v0.18.0

func ExtractImportsFromSourceWithReport(lang *Language, source []byte) ImportExtractResult

ExtractImportsFromSourceWithReport returns source-only dependency declarations and a confidence report for fallback policy.

type ImportExtractStatus added in v0.18.0

type ImportExtractStatus string

ImportExtractStatus describes the confidence of source-only import extraction.

const (
	ImportExtractOK                   ImportExtractStatus = "ok"
	ImportExtractUnsupportedConstruct ImportExtractStatus = "unsupported_construct"
	ImportExtractScannerError         ImportExtractStatus = "scanner_error"
	ImportExtractAmbiguous            ImportExtractStatus = "ambiguous"
	ImportExtractFallbackToTree       ImportExtractStatus = "fallback_to_tree"
)

type ImportRef added in v0.18.0

type ImportRef struct {
	Lang      string
	Kind      string
	Path      string
	From      string
	Name      string
	Alias     string
	Static    bool
	Wildcard  bool
	Relative  int
	StartByte uint32
	EndByte   uint32
}

ImportRef is a compact language-neutral dependency declaration extracted from a syntax tree.

func ExtractImports added in v0.18.0

func ExtractImports(tree *Tree) []ImportRef

ExtractImports returns package/import declarations for the languages used by Gazelle-style dependency extraction. It is intentionally independent from the generic query engine so it can later be backed by compact parser refs.

func ExtractImportsFromSource added in v0.18.0

func ExtractImportsFromSource(lang *Language, source []byte) []ImportRef

ExtractImportsFromSource returns language-neutral dependency declarations directly from source text. It is intended for cold dependency-extraction workflows that do not need a public syntax tree.

type IncrementalParseProfile added in v0.6.0

type IncrementalParseProfile struct {
	ReuseCursorNanos                    int64
	ReparseNanos                        int64
	ReusedSubtrees                      uint64
	ReusedBytes                         uint64
	NewNodesAllocated                   uint64
	ReuseUnsupported                    bool
	ReuseUnsupportedReason              string
	ReuseRejectDirty                    uint64
	ReuseRejectAncestorDirtyBeforeEdit  uint64
	ReuseRejectHasError                 uint64
	ReuseRejectInvalidSpan              uint64
	ReuseRejectOutOfBounds              uint64
	ReuseRejectRootNonLeafChanged       uint64
	ReuseRejectLargeNonLeaf             uint64
	RecoverSearches                     uint64
	RecoverStateChecks                  uint64
	RecoverStateSkips                   uint64
	RecoverSymbolSkips                  uint64
	RecoverLookups                      uint64
	RecoverHits                         uint64
	MaxStacksSeen                       int
	EntryScratchPeak                    uint64
	StopReason                          ParseStopReason
	TokensConsumed                      uint64
	LastTokenEndByte                    uint32
	ExpectedEOFByte                     uint32
	ArenaBytesAllocated                 int64
	ScratchBytesAllocated               int64
	EntryScratchBytesAllocated          int64
	GSSBytesAllocated                   int64
	SingleStackIterations               int
	MultiStackIterations                int
	SingleStackTokens                   uint64
	MultiStackTokens                    uint64
	SingleStackGSSNodes                 uint64
	MultiStackGSSNodes                  uint64
	GSSNodesAllocated                   uint64
	GSSNodesRetained                    uint64
	GSSNodesDroppedSameToken            uint64
	ParentNodesAllocated                uint64
	ParentNodesRetained                 uint64
	ParentNodesDroppedSameToken         uint64
	LeafNodesAllocated                  uint64
	LeafNodesRetained                   uint64
	LeafNodesDroppedSameToken           uint64
	MergeStacksIn                       uint64
	MergeStacksOut                      uint64
	MergeSlotsUsed                      uint64
	GlobalCullStacksIn                  uint64
	GlobalCullStacksOut                 uint64
	ParserLoopNanos                     int64
	TokenNextNanos                      int64
	ActionDispatchNanos                 int64
	ActionLookupNanos                   int64
	GLRMergeNanos                       int64
	GLRCullNanos                        int64
	ResultSelectionNanos                int64
	TransientParentMaterializationNanos int64
	ResultTreeBuildNanos                int64
	TransientChildMaterializationNanos  int64
	ResultPythonKeywordRepairNanos      int64
	ResultPythonRootRepairNanos         int64
	ResultFinalizeRootNanos             int64
	ResultExtendTrailingNanos           int64
	ResultNormalizeRootStartNanos       int64
	ResultCompatibilityNanos            int64
	ResultParentLinkNanos               int64
	ReduceRangeNanos                    int64
	ReducePendingParentNanos            int64
	ReduceChildBuildNanos               int64
	ReduceParentBuildNanos              int64
	ReduceSpanNanos                     int64
	ReduceStackPushNanos                int64
	ReduceNoTreeBuildNanos              int64
	ActionExtraShiftNanos               int64
	ActionNoActionNanos                 int64
	ActionNoActionRelexNanos            int64
	ActionNoActionMissingNanos          int64
	ActionNoActionRecoverNanos          int64
	ActionNoActionErrorNanos            int64
	ActionConflictChoiceNanos           int64
	ActionConflictForkNanos             int64
	ActionSingleShiftNanos              int64
	ActionSingleReduceNanos             int64
	ActionSingleAcceptNanos             int64
	ActionSingleRecoverNanos            int64
	ActionSingleOtherNanos              int64
	NormalizationNanos                  int64
}

IncrementalParseProfile attributes incremental parse time into coarse buckets.

ReuseCursorNanos includes reuse-cursor setup and subtree-candidate checks. ReparseNanos includes the remainder of incremental parsing/rebuild work.

type IncrementalReuseExternalScanner added in v0.7.0

type IncrementalReuseExternalScanner interface {
	ExternalScanner
	SupportsIncrementalReuse() bool
}

IncrementalReuseExternalScanner is implemented by external scanners that can safely participate in DFA subtree reuse during incremental parses. Scanners with serialized mutable state, such as Python's indentation stack, should leave this unimplemented so edited incremental parses fall back to the conservative full-reparse path.

type IncrementalReuseTokenSource added in v0.7.0

type IncrementalReuseTokenSource interface {
	TokenSource
	SupportsIncrementalReuse() bool
}

IncrementalReuseTokenSource is an opt-in marker for custom token sources that are safe for incremental subtree reuse. Implementations must provide stable token boundaries across edits and support deterministic SkipToByte* behavior so reused-tree fast-forwarding remains correct.
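The opt-in marker pattern used by IncrementalReuseExternalScanner and IncrementalReuseTokenSource can be sketched with plain Go interfaces. The types below are local stand-ins, not the package's TokenSource, and canReuse is only an assumption about how a caller might probe for the marker; it is not the runtime's actual code.

```go
package main

import "fmt"

// tokenSource is a hypothetical stand-in for the package's TokenSource;
// the real interface has its own method set.
type tokenSource interface {
	Name() string
}

// reuseMarker mirrors the single method that the opt-in interfaces add.
type reuseMarker interface {
	SupportsIncrementalReuse() bool
}

// plainSource does not implement the marker, so it never opts in.
type plainSource struct{}

func (plainSource) Name() string { return "plain" }

// stableSource opts in by implementing the marker and returning true.
type stableSource struct{}

func (stableSource) Name() string                   { return "stable" }
func (stableSource) SupportsIncrementalReuse() bool { return true }

// canReuse reports whether ts has opted in to subtree reuse.
// A source that lacks the method is treated the same as one returning false.
func canReuse(ts tokenSource) bool {
	m, ok := ts.(reuseMarker)
	return ok && m.SupportsIncrementalReuse()
}

func main() {
	for _, ts := range []tokenSource{plainSource{}, stableSource{}} {
		fmt.Printf("%s: reuse=%v\n", ts.Name(), canReuse(ts))
	}
}
```

Leaving the method unimplemented is the conservative default, which matches the advice above for scanners that carry serialized mutable state.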

type Injection added in v0.6.0

type Injection struct {
	// Language is the detected language name (e.g., "javascript").
	Language string
	// Tree is the parse tree for this region, or nil if the language
	// was not registered.
	Tree *Tree
	// Ranges are the source ranges this tree covers.
	Ranges []Range
	// Node is the parent tree node that triggered the injection.
	Node *Node
}

Injection is a single embedded language region.

type InjectionParser added in v0.6.0

type InjectionParser struct {
	// contains filtered or unexported fields
}

InjectionParser parses documents with embedded languages.

InjectionParser is not safe for concurrent use. It caches child parsers and mutates shared maps during parse operations.

func NewInjectionParser added in v0.6.0

func NewInjectionParser() *InjectionParser

NewInjectionParser creates an InjectionParser.

func (*InjectionParser) Parse added in v0.6.0

func (ip *InjectionParser) Parse(source []byte, parentLang string) (*InjectionResult, error)

Parse parses source as parentLang, then recursively parses injected regions.

func (*InjectionParser) ParseIncremental added in v0.6.0

func (ip *InjectionParser) ParseIncremental(source []byte, parentLang string,
	oldResult *InjectionResult) (*InjectionResult, error)

ParseIncremental re-parses after edits, reusing unchanged child trees.

func (*InjectionParser) ParseIncrementalUTF16 added in v0.16.0

func (ip *InjectionParser) ParseIncrementalUTF16(source []uint16, parentLang string,
	oldResult *UTF16InjectionResult) (*UTF16InjectionResult, error)

ParseIncrementalUTF16 re-parses UTF-16 source after edits, reusing unchanged child trees. Call oldResult.Tree.EditUTF16 before calling this.

func (*InjectionParser) ParseIncrementalUTF16Bytes added in v0.16.0

func (ip *InjectionParser) ParseIncrementalUTF16Bytes(source []byte, parentLang string,
	oldResult *UTF16InjectionResult, order UTF16ByteOrder) (*UTF16InjectionResult, error)

ParseIncrementalUTF16Bytes is like ParseIncrementalUTF16 for endian-specific UTF-16 bytes.

func (*InjectionParser) ParseUTF16 added in v0.16.0

func (ip *InjectionParser) ParseUTF16(source []uint16, parentLang string) (*UTF16InjectionResult, error)

ParseUTF16 parses UTF-16 source as parentLang, then recursively parses injected regions. The returned injection ranges are in UTF-16 code units.

func (*InjectionParser) ParseUTF16Bytes added in v0.16.0

func (ip *InjectionParser) ParseUTF16Bytes(source []byte, parentLang string, order UTF16ByteOrder) (*UTF16InjectionResult, error)

ParseUTF16Bytes is like ParseUTF16 for endian-specific UTF-16 bytes.

func (*InjectionParser) RegisterInjectionQuery added in v0.6.0

func (ip *InjectionParser) RegisterInjectionQuery(parentLang string, query string) error

RegisterInjectionQuery sets the injection query for a parent language. The query should use @injection.content and #set! injection.language conventions. It is compiled against the registered parent language.

func (*InjectionParser) RegisterLanguage added in v0.6.0

func (ip *InjectionParser) RegisterLanguage(name string, lang *Language)

RegisterLanguage adds a language that can be used as parent or child.

func (*InjectionParser) SetMaxDepth added in v0.6.0

func (ip *InjectionParser) SetMaxDepth(depth int)

SetMaxDepth overrides the nested injection recursion limit. Depth values <= 0 restore the default limit.

type InjectionResult added in v0.6.0

type InjectionResult struct {
	// Tree is the parent language's parse tree.
	Tree *Tree
	// Injections contains child language parse results, ordered by position.
	Injections []Injection
}

InjectionResult holds parse results for a multi-language document.

type InputEdit

type InputEdit struct {
	StartByte   uint32
	OldEndByte  uint32
	NewEndByte  uint32
	StartPoint  Point
	OldEndPoint Point
	NewEndPoint Point
}

InputEdit describes a single edit to the source text. It tells the parser what byte range was replaced and what the new range looks like, so the incremental parser can skip unchanged subtrees.
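As a rough illustration of how the six fields relate, the sketch below derives an edit for replacing a byte range. It uses local stand-in types so it runs without the package: inputEdit copies the field names of InputEdit, while the Row/Column layout of point and the byte-based column counting are assumptions, since Point is not shown in this section.

```go
package main

import (
	"bytes"
	"fmt"
)

// point and inputEdit are local stand-ins. inputEdit copies the field names
// of InputEdit; the Row/Column layout of point is an assumption.
type point struct{ Row, Column uint32 }

type inputEdit struct {
	StartByte, OldEndByte, NewEndByte    uint32
	StartPoint, OldEndPoint, NewEndPoint point
}

// pointAt converts a byte offset into a row and a byte-based column.
func pointAt(src []byte, offset uint32) point {
	prefix := src[:offset]
	row := bytes.Count(prefix, []byte("\n"))
	col := len(prefix) - (bytes.LastIndexByte(prefix, '\n') + 1)
	return point{Row: uint32(row), Column: uint32(col)}
}

// replaceRange replaces old[start:end] with repl and returns the new source
// together with the edit that describes the change.
func replaceRange(old []byte, start, end uint32, repl []byte) ([]byte, inputEdit) {
	updated := make([]byte, 0, len(old)-int(end-start)+len(repl))
	updated = append(updated, old[:start]...)
	updated = append(updated, repl...)
	updated = append(updated, old[end:]...)

	newEnd := start + uint32(len(repl))
	return updated, inputEdit{
		StartByte:   start,
		OldEndByte:  end,
		NewEndByte:  newEnd,
		StartPoint:  pointAt(old, start),
		OldEndPoint: pointAt(old, end),
		NewEndPoint: pointAt(updated, newEnd),
	}
}

func main() {
	old := []byte("package main\n\nfunc main() {}\n")
	// Rename "main" in the function declaration (bytes 19..23) to "start".
	updated, edit := replaceRange(old, 19, 23, []byte("start"))
	fmt.Printf("%q\n%+v\n", updated, edit)
}
```

The old end position is measured in the old text and the new end position in the new text; the start is the same in both.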

type InputEncoding added in v0.16.0

type InputEncoding uint8

InputEncoding identifies the source encoding used to build a Tree.

const (
	InputEncodingUTF8 InputEncoding = iota
	InputEncodingUTF16
)

func (InputEncoding) String added in v0.16.0

func (e InputEncoding) String() string

type InternObservationStats added in v0.20.0

type InternObservationStats struct {
	// Phase 2 counters (parseState-blind observation across ALL leaves).
	LeafLookups uint64
	LeafHits    uint64
	LeafMisses  uint64
	LeafStores  uint64
	LeafGrowths uint64
	// Phase 3 attribution. Shift-path leaves get parseState set per-fork
	// so they can't be canonically substituted via the parseState-blind
	// measurement; non-shift leaves can. "Safe to substitute" via blind
	// measurement = (LeafMisses+LeafHits) - ShiftLeafObserved.
	ShiftLeafObserved uint64
	// Phase 3 parseState-aware measurement. Same hook as LeafLookups
	// but with parseState/preGotoState included in the key. A hit here
	// means a truly dedup-safe duplicate; the difference between this
	// and the blind hit rate quantifies how much of the blind
	// observation was an artifact of ignoring state.
	FullLookups uint64
	FullHits    uint64
	FullMisses  uint64
}

InternObservationStats is the externally-visible snapshot of leaf-interning observation counters for a single parse. Returned from InternStatsFor.

func InternStatsFor added in v0.20.0

func InternStatsFor(root *Node) InternObservationStats

InternStatsFor returns a snapshot of the leaf-interning observation counters for the arena that owns the given root node. It returns the zero value if observation is disabled or the root is not arena-backed. It is exposed so external benchmarks can read hit rates without grepping internal logs.
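The counters are easiest to read as ratios. The sketch below works through the arithmetic described in the field comments using a local copy of a few InternObservationStats fields. The helper names and the sample values are made up for illustration, and computing the denominators as hits plus misses is an assumption about how the counters relate.

```go
package main

import "fmt"

// internStats copies a subset of the InternObservationStats fields so the
// example runs without the package.
type internStats struct {
	LeafHits          uint64
	LeafMisses        uint64
	ShiftLeafObserved uint64
	FullHits          uint64
	FullMisses        uint64
}

// rate returns hits/total, or 0 when nothing was observed.
func rate(hits, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(hits) / float64(total)
}

// blindHitRate is the parseState-blind hit rate over all observed leaves.
func blindHitRate(s internStats) float64 {
	return rate(s.LeafHits, s.LeafHits+s.LeafMisses)
}

// fullHitRate is the parseState-aware hit rate.
func fullHitRate(s internStats) float64 {
	return rate(s.FullHits, s.FullHits+s.FullMisses)
}

// safeToSubstitute applies the formula from the ShiftLeafObserved comment:
// (LeafMisses + LeafHits) - ShiftLeafObserved.
func safeToSubstitute(s internStats) uint64 {
	return (s.LeafMisses + s.LeafHits) - s.ShiftLeafObserved
}

func main() {
	// Made-up sample values, not measurements.
	s := internStats{LeafHits: 600, LeafMisses: 400, ShiftLeafObserved: 250, FullHits: 300, FullMisses: 700}
	fmt.Printf("blind hit rate: %.2f\n", blindHitRate(s))
	fmt.Printf("state-aware hit rate: %.2f\n", fullHitRate(s))
	fmt.Printf("safe to substitute: %d\n", safeToSubstitute(s))
}
```

The gap between the two rates is the portion of blind hits that only looked like duplicates because parse state was ignored.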

type Language

type Language struct {
	Name string
	// GeneratedByGrammargen is true for languages assembled by grammargen at
	// runtime rather than decoded from a checked-in ts2go blob.
	GeneratedByGrammargen bool

	// LanguageVersion is the tree-sitter language ABI version.
	// A value of 0 means "unknown/unspecified" and is treated as compatible.
	LanguageVersion uint32

	// Counts
	SymbolCount        uint32
	TokenCount         uint32
	ExternalTokenCount uint32
	StateCount         uint32
	LargeStateCount    uint32
	FieldCount         uint32
	ProductionIDCount  uint32

	// Symbol metadata
	SymbolNames    []string
	SymbolMetadata []SymbolMetadata
	FieldNames     []string // index 0 is ""

	// Parse tables
	ParseTable         [][]uint16 // dense: [state][symbol] -> action index
	SmallParseTable    []uint16   // compressed sparse table
	SmallParseTableMap []uint32   // state -> offset into SmallParseTable
	ParseActions       []ParseActionEntry

	// ReduceChainHints are optional generated hot-path hints for deterministic
	// reduce runs. They are only consumed when reduce-chain hints are enabled.
	ReduceChainHints []ReduceChainHint

	// Lex tables
	LexModes            []LexMode
	LexStates           []LexState // main lexer DFA
	KeywordLexStates    []LexState // keyword lexer DFA (optional)
	KeywordCaptureToken Symbol
	// LayoutFallbackLexState is an optional broad DFA start state used only in
	// layout-entry parser states. It lets the runtime avoid skipping over
	// zero-width external layout markers before the layout scanner fires.
	LayoutFallbackLexState    uint16
	HasLayoutFallbackLexState bool

	// Field mapping
	FieldMapSlices  [][2]uint16 // [production_id] -> (index, length)
	FieldMapEntries []FieldMapEntry

	// Alias sequences
	AliasSequences [][]Symbol // [production_id][child_index] -> alias symbol

	// Primary state IDs (for table dedup)
	PrimaryStateIDs []StateID

	// ABI 15: Reserved words — flat array indexed by
	// (reserved_word_set_id * MaxReservedWordSetSize + i), terminated by 0.
	ReservedWords          []Symbol
	MaxReservedWordSetSize uint16

	// ABI 15: Supertype hierarchy
	SupertypeSymbols    []Symbol
	SupertypeMapSlices  [][2]uint16 // [supertype_symbol] -> (index, length)
	SupertypeMapEntries []Symbol

	// ABI 15: Grammar semantic version
	Metadata LanguageMetadata

	// External scanner (nil if not needed)
	ExternalScanner ExternalScanner
	ExternalSymbols []Symbol // external token index -> symbol
	// ImmediateTokens flags, by symbol ID, the tokens declared with
	// token.immediate(). When the lexer matches one of these after consuming
	// whitespace, the match should be rejected, because immediate tokens must
	// match at the original position. nil means no immediate tokens (common
	// for ts2go grammars).
	ImmediateTokens []bool
	// ZeroWidthTokens flags, by symbol ID, the tokens whose DFA terminal
	// pattern can intentionally match empty input. nil means this information
	// is unavailable, which preserves historical lexer behavior for ts2go blobs.
	ZeroWidthTokens []bool

	// ExternalLexStates maps external lex state IDs (from LexMode.ExternalLexState)
	// to a boolean slice indicating which external tokens are valid. Row 0 is
	// always all-false (no external tokens valid). When non-nil, this table is
	// used instead of parse-action-table probing to compute validSymbols for the
	// external scanner, matching C tree-sitter's ts_external_scanner_states.
	ExternalLexStates [][]bool

	// InitialState is the parser's start state. In tree-sitter grammars
	// this is always 1 (state 0 is reserved for error recovery). For
	// hand-built grammars it defaults to 0.
	InitialState StateID
	// contains filtered or unexported fields
}

Language holds all data needed to parse a specific language. It mirrors tree-sitter's TSLanguage C struct, translated into idiomatic Go types with slice-based tables instead of raw pointers.
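Several tables in Language are flat arrays with computed offsets. As one example, the ReservedWords layout described in the struct comment (row setID starts at setID * MaxReservedWordSetSize and is terminated by 0) could be read as in the sketch below. The symbol type is a local stand-in whose width is an assumption, reservedSet is a hypothetical helper, and treating a completely full row as unterminated is also an assumption.

```go
package main

import "fmt"

// symbol is a local stand-in for the package's Symbol type; its underlying
// width here is an assumption.
type symbol uint16

// reservedSet returns the symbols in reserved-word set setID. Each set
// occupies maxSetSize consecutive slots of the flat array and ends at the
// first 0 entry (or at the end of its row when the row is full).
func reservedSet(flat []symbol, maxSetSize uint16, setID int) []symbol {
	size := int(maxSetSize)
	start := setID * size
	if size == 0 || setID < 0 || start+size > len(flat) {
		return nil
	}
	row := flat[start : start+size]
	for i, sym := range row {
		if sym == 0 {
			return row[:i]
		}
	}
	return row
}

func main() {
	// Three sets with up to three entries each; set 0 is empty.
	flat := []symbol{
		0, 0, 0,
		12, 15, 0,
		7, 0, 0,
	}
	for setID := 0; setID < 3; setID++ {
		fmt.Printf("set %d: %v\n", setID, reservedSet(flat, 3, setID))
	}
}
```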

func LoadLanguage added in v0.9.0

func LoadLanguage(data []byte) (*Language, error)

LoadLanguage deserializes a compressed grammar blob into a Language. Blobs are produced by grammargen's GenerateLanguage or the grammar build toolchain. This is the only function needed at runtime to load pre-compiled grammars; importing grammargen is not required.

func (*Language) CompatibleWithRuntime

func (l *Language) CompatibleWithRuntime() bool

CompatibleWithRuntime reports whether this language can be parsed by the current runtime version. Unspecified versions (0) are treated as compatible.

func (*Language) FieldByName

func (l *Language) FieldByName(name string) (FieldID, bool)

FieldByName returns the field ID for a given name, or (0, false) if not found. Builds an internal map on first call for O(1) subsequent lookups.

func (*Language) IsSupertype added in v0.6.0

func (l *Language) IsSupertype(sym Symbol) bool

IsSupertype reports whether sym is a supertype symbol.

func (*Language) KeywordLexAsciiTable added in v0.10.2

func (l *Language) KeywordLexAsciiTable() [][128]int32

KeywordLexAsciiTable returns the ASCII fast-path table for the keyword lexer DFA.

func (*Language) LexAsciiTable added in v0.10.2

func (l *Language) LexAsciiTable() [][128]int32

LexAsciiTable returns the pre-built ASCII fast-path transition table for the main lexer DFA. The table is built once per Language. Entry format:

bit 31 set  → skip transition (consume and reset token start)
bits 0-30   → next state ID (lexAsciiNoMatch if no transition)
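The entry layout above can be unpacked with two bit operations. This is a minimal sketch: encode and decode are hypothetical helpers rather than package functions, and the unexported lexAsciiNoMatch sentinel is not handled because its value is not documented here.

```go
package main

import "fmt"

// skipBit is bit 31 of an entry, which marks a skip transition.
const skipBit = uint32(1) << 31

// encode packs a next-state ID and a skip flag using the documented layout.
func encode(next uint32, skip bool) int32 {
	v := next &^ skipBit
	if skip {
		v |= skipBit
	}
	return int32(v)
}

// decode splits an entry into its next-state ID (bits 0-30) and its skip
// flag (bit 31).
func decode(entry int32) (next uint32, skip bool) {
	u := uint32(entry)
	return u &^ skipBit, u&skipBit != 0
}

func main() {
	for _, e := range []int32{encode(7, false), encode(42, true)} {
		next, skip := decode(e)
		fmt.Printf("entry=%d next=%d skip=%v\n", e, next, skip)
	}
}
```

Because the element type is int32, an entry with the skip bit set is negative, so decoding through uint32 avoids sign surprises.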

func (*Language) LexModeStarts added in v0.19.0

func (l *Language) LexModeStarts() []lexModeStart

func (*Language) PublicSymbol added in v0.7.0

func (l *Language) PublicSymbol(sym Symbol) Symbol

PublicSymbol maps an internal symbol to its canonical public form. Multiple internal symbols may share the same visible name (e.g. HTML's _start_tag_name and _end_tag_name both display as "tag_name"). PublicSymbol returns the first symbol with that name, matching what SymbolByName returns. This ensures query patterns compiled with SymbolByName match nodes regardless of which alias produced them.

func (*Language) PublicSymbolForNamedness added in v0.19.0

func (l *Language) PublicSymbolForNamedness(sym Symbol, named bool) Symbol

PublicSymbolForNamedness maps an internal symbol to the canonical public symbol with the same display name and requested namedness. This lets query matching distinguish named nodes from anonymous tokens that share text.

func (*Language) SupertypeChildren added in v0.6.0

func (l *Language) SupertypeChildren(sym Symbol) []Symbol

SupertypeChildren returns the subtype symbols for a given supertype. Returns nil if sym is not a supertype or has no entries.

func (*Language) SymbolByName

func (l *Language) SymbolByName(name string) (Symbol, bool)

SymbolByName returns the symbol ID for a given name, or (0, false) if not found. The "_" wildcard returns (0, true) as a special case. Builds an internal map on first call for O(1) subsequent lookups.

func (*Language) TokenSymbolsByName

func (l *Language) TokenSymbolsByName(name string) []Symbol

TokenSymbolsByName returns all terminal token symbols whose display name matches name. The returned symbols are in grammar order.

func (*Language) Version

func (l *Language) Version() uint32

Version returns the tree-sitter language ABI version.

type LanguageMetadata added in v0.6.0

type LanguageMetadata struct {
	MajorVersion uint8
	MinorVersion uint8
	PatchVersion uint8
}

LanguageMetadata holds the grammar's semantic version (ABI 15+).

type LexMode

type LexMode struct {
	LexState                  uint16
	ExternalLexState          uint16
	ReservedWordSetID         uint16
	AfterWhitespaceLexState   uint16 // DFA start state to use after whitespace (0 = same as LexState)
	LexStateID                uint32 // widened DFA start state for grammargen tables with >64K lexer states
	AfterWhitespaceLexStateID uint32
}

LexMode maps a parser state to its lexer configuration.

func (LexMode) AfterWhitespaceLexStateIndex added in v0.16.0

func (m LexMode) AfterWhitespaceLexStateIndex() uint32

AfterWhitespaceLexStateIndex returns the alternate DFA start state used after whitespace, or zero when the primary lex state should be used.

func (LexMode) LexStateIndex added in v0.16.0

func (m LexMode) LexStateIndex() uint32

LexStateIndex returns the DFA start state for this lex mode. Older grammar blobs only populate the uint16 LexState field; grammargen-generated tables can populate LexStateID when the DFA table exceeds 64K states.

func (*LexMode) SetAfterWhitespaceLexStateIndex added in v0.16.0

func (m *LexMode) SetAfterWhitespaceLexStateIndex(idx uint32)

func (*LexMode) SetLexStateIndex added in v0.16.0

func (m *LexMode) SetLexStateIndex(idx uint32)

type LexState

type LexState struct {
	AcceptToken    Symbol // 0 if this state doesn't accept
	AcceptPriority int16  // lower = higher priority (0 for ts2go blobs = longest-match)
	Skip           bool   // true if accepted chars are whitespace
	Default        int    // default next state (-1 if none)
	EOF            int    // state on EOF (-1 if none)
	Transitions    []LexTransition
}

LexState is one state in the table-driven lexer DFA.

type LexTransition

type LexTransition struct {
	Lo, Hi    rune // inclusive character range
	NextState int
	// Skip mirrors tree-sitter's SKIP(state): consume the matched rune
	// and continue lexing while resetting token start.
	Skip bool
}

LexTransition maps a character range to a next state.
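To make the table shape concrete, here is a simplified longest-match walk over a DFA built from local types that mirror a subset of LexState and LexTransition. It is not the package's lexer: Skip, EOF, and AcceptPriority handling are left out, AcceptToken is a plain int, and the three-state table is invented for the example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lexTransition and lexState mirror a subset of the documented fields.
type lexTransition struct {
	Lo, Hi    rune // inclusive character range
	NextState int
}

type lexState struct {
	AcceptToken int // 0 if this state does not accept
	Default     int // default next state (-1 if none)
	Transitions []lexTransition
}

// step returns the next state for r, or -1 when no transition applies.
func step(s lexState, r rune) int {
	for _, t := range s.Transitions {
		if t.Lo <= r && r <= t.Hi {
			return t.NextState
		}
	}
	return s.Default
}

// longestMatch runs the DFA from state start over src[pos:] and returns the
// token and end offset of the longest accepting prefix. A zero token means
// nothing matched.
func longestMatch(states []lexState, start int, src []byte, pos int) (token, end int) {
	state, i := start, pos
	end = pos
	for {
		if tok := states[state].AcceptToken; tok != 0 {
			token, end = tok, i // remember the longest accept so far
		}
		if i >= len(src) {
			return token, end
		}
		r, size := utf8.DecodeRune(src[i:])
		next := step(states[state], r)
		if next < 0 {
			return token, end
		}
		state, i = next, i+size
	}
}

// demoStates: token 1 is an identifier ([a-z][a-z0-9]*), token 2 a number.
var demoStates = []lexState{
	{Default: -1, Transitions: []lexTransition{{Lo: 'a', Hi: 'z', NextState: 1}, {Lo: '0', Hi: '9', NextState: 2}}},
	{AcceptToken: 1, Default: -1, Transitions: []lexTransition{{Lo: 'a', Hi: 'z', NextState: 1}, {Lo: '0', Hi: '9', NextState: 1}}},
	{AcceptToken: 2, Default: -1, Transitions: []lexTransition{{Lo: '0', Hi: '9', NextState: 2}}},
}

func main() {
	src := []byte("foo42+1")
	for _, pos := range []int{0, 5, 6} {
		tok, end := longestMatch(demoStates, 0, src, pos)
		fmt.Printf("pos=%d token=%d text=%q\n", pos, tok, src[pos:end])
	}
}
```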

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer tokenizes source text using a table-driven DFA.

func NewLexer

func NewLexer(states []LexState, source []byte) *Lexer

NewLexer creates a new Lexer that will tokenize source using the given DFA state table.

func (*Lexer) Next

func (l *Lexer) Next(startState uint32) Token

Next lexes the next token starting from the given lex state index. It automatically skips tokens from states where Skip=true (whitespace). Returns a zero-Symbol token with StartByte==EndByte at EOF.

type LookaheadIterator added in v0.6.0

type LookaheadIterator struct {
	// contains filtered or unexported fields
}

LookaheadIterator iterates over valid symbols for a given parse state. It precomputes the full set of symbols that have valid parse actions in the specified state, enabling autocomplete and error diagnostic use cases.

func NewLookaheadIterator added in v0.6.0

func NewLookaheadIterator(lang *Language, state StateID) (*LookaheadIterator, error)

NewLookaheadIterator creates an iterator over all symbols that have valid parse actions in the given state. Returns an error if the state is out of range for the language's parse tables.

func (*LookaheadIterator) CurrentSymbol added in v0.6.0

func (it *LookaheadIterator) CurrentSymbol() Symbol

CurrentSymbol returns the symbol at the current iterator position. Must be called after a successful Next().

func (*LookaheadIterator) CurrentSymbolName added in v0.6.0

func (it *LookaheadIterator) CurrentSymbolName() string

CurrentSymbolName returns the name of the symbol at the current iterator position. Returns "" if the position is invalid or the symbol has no name.

func (*LookaheadIterator) Language added in v0.6.0

func (it *LookaheadIterator) Language() *Language

Language returns the language associated with this iterator.

func (*LookaheadIterator) Next added in v0.6.0

func (it *LookaheadIterator) Next() bool

Next advances the iterator to the next valid symbol. Returns false when there are no more symbols.

func (*LookaheadIterator) ResetState added in v0.6.0

func (it *LookaheadIterator) ResetState(state StateID) error

ResetState resets the iterator to enumerate valid symbols for a different parse state within the same language. Returns an error if the state is out of range.

type Node

type Node struct {
	// contains filtered or unexported fields
}

Node is a syntax tree node.

func NewLeafNode

func NewLeafNode(sym Symbol, named bool, startByte, endByte uint32, startPoint, endPoint Point) *Node

NewLeafNode creates a terminal/leaf node.

func NewParentNode

func NewParentNode(sym Symbol, named bool, children []*Node, fieldIDs []FieldID, productionID uint16) *Node

NewParentNode creates a non-terminal node with children. It sets parent pointers on all children and computes byte/point spans from the first and last children. If any child has an error, the parent is marked as having an error too.

func (*Node) Child

func (n *Node) Child(i int) *Node

Child returns the i-th child, or nil if i is out of range.

func (*Node) ChildByFieldName

func (n *Node) ChildByFieldName(name string, lang *Language) *Node

ChildByFieldName returns the first child assigned to the given field name, or nil if no child has that field. The Language is needed to resolve field names to IDs. Uses Language.FieldByName for O(1) lookup.

func (*Node) ChildCount

func (n *Node) ChildCount() int

ChildCount returns the number of children (both named and anonymous).

func (*Node) Children

func (n *Node) Children() []*Node

Children returns a slice of all children.

func (*Node) DescendantForByteRange added in v0.6.0

func (n *Node) DescendantForByteRange(startByte, endByte uint32) *Node

DescendantForByteRange returns the smallest descendant that fully contains the given byte range, or nil when no such descendant exists.
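The lookup can be pictured as a descent that keeps moving into whichever child still contains the whole range. The sketch below uses a local node type with exported span fields rather than *Node, and smallestContaining is a hypothetical helper; how the package breaks ties for zero-width ranges or ranges on a child boundary is not covered here.

```go
package main

import "fmt"

// node is a local stand-in for a syntax tree node.
type node struct {
	Name       string
	Start, End uint32 // End is exclusive
	Children   []*node
}

// smallestContaining returns the deepest node whose span fully contains
// [start, end), or nil when n itself does not contain the range.
func smallestContaining(n *node, start, end uint32) *node {
	if n == nil || start < n.Start || end > n.End {
		return nil
	}
	for _, c := range n.Children {
		if d := smallestContaining(c, start, end); d != nil {
			return d
		}
	}
	return n
}

// demoTree is a hand-built tree with invented spans.
func demoTree() *node {
	return &node{Name: "source_file", Start: 0, End: 29, Children: []*node{
		{Name: "package_clause", Start: 0, End: 12},
		{Name: "function_declaration", Start: 14, End: 28, Children: []*node{
			{Name: "identifier", Start: 19, End: 23},
			{Name: "block", Start: 26, End: 28},
		}},
	}}
}

func main() {
	root := demoTree()
	for _, r := range [][2]uint32{{20, 22}, {10, 20}, {0, 40}} {
		if n := smallestContaining(root, r[0], r[1]); n != nil {
			fmt.Printf("[%d,%d) -> %s\n", r[0], r[1], n.Name)
		} else {
			fmt.Printf("[%d,%d) -> nil\n", r[0], r[1])
		}
	}
}
```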

func (*Node) DescendantForPointRange added in v0.6.0

func (n *Node) DescendantForPointRange(startPoint, endPoint Point) *Node

DescendantForPointRange returns the smallest descendant that fully contains the given point range, or nil when no such descendant exists.

func (*Node) Edit added in v0.7.0

func (n *Node) Edit(edit InputEdit)

Edit adjusts this node's byte/point span for a source edit.

If the node belongs to a larger tree, the edit is applied from the containing root so sibling and ancestor spans remain consistent. Unlike Tree.Edit, this method does not record edit history on a Tree.

func (*Node) EndByte

func (n *Node) EndByte() uint32

EndByte returns the byte offset where this node ends (exclusive).

func (*Node) EndPoint

func (n *Node) EndPoint() Point

EndPoint returns the row/column position where this node ends.

func (*Node) FieldNameForChild added in v0.6.0

func (n *Node) FieldNameForChild(i int, lang *Language) string

FieldNameForChild returns the field name assigned to the i-th child, or an empty string when no field is assigned.

func (*Node) HasChanges added in v0.6.0

func (n *Node) HasChanges() bool

HasChanges reports whether this node was marked dirty by Tree.Edit.

func (*Node) HasError

func (n *Node) HasError() bool

HasError reports whether this node or any descendant contains a parse error.

func (*Node) IsError added in v0.6.0

func (n *Node) IsError() bool

IsError reports whether this node is an explicit error node.

func (*Node) IsExtra added in v0.6.0

func (n *Node) IsExtra() bool

IsExtra reports whether this node was marked as extra syntax (e.g. whitespace/comments outside the core parse structure).

func (*Node) IsMissing

func (n *Node) IsMissing() bool

IsMissing reports whether this node was inserted by error recovery.

func (*Node) IsNamed

func (n *Node) IsNamed() bool

IsNamed reports whether this is a named node (as opposed to anonymous syntax like punctuation).

func (*Node) NamedChild

func (n *Node) NamedChild(i int) *Node

NamedChild returns the i-th named child (skipping anonymous children), or nil if i is out of range.

func (*Node) NamedChildCount

func (n *Node) NamedChildCount() int

NamedChildCount returns the number of named children.

func (*Node) NamedDescendantForByteRange added in v0.6.0

func (n *Node) NamedDescendantForByteRange(startByte, endByte uint32) *Node

NamedDescendantForByteRange returns the smallest named descendant that fully contains the given byte range, or nil when no such descendant exists.

func (*Node) NamedDescendantForPointRange added in v0.6.0

func (n *Node) NamedDescendantForPointRange(startPoint, endPoint Point) *Node

NamedDescendantForPointRange returns the smallest named descendant that fully contains the given point range, or nil when no such descendant exists.

func (*Node) NextSibling

func (n *Node) NextSibling() *Node

NextSibling returns the next sibling node, or nil when this is the last child or has no parent.

func (*Node) Parent

func (n *Node) Parent() *Node

Parent returns this node's parent, or nil if it is the root.

func (*Node) ParseState

func (n *Node) ParseState() StateID

ParseState returns the parser state associated with this node.

func (*Node) PreGotoState added in v0.6.0

func (n *Node) PreGotoState() StateID

PreGotoState returns the parser state that was on top of the stack before this node was pushed (i.e., the state exposed after popping children during reduce). For non-leaf nodes: lookupGoto(PreGotoState, Symbol) == ParseState.

func (*Node) PrevSibling

func (n *Node) PrevSibling() *Node

PrevSibling returns the previous sibling node, or nil when this is the first child or has no parent.

func (*Node) Range

func (n *Node) Range() Range

Range returns the full span of this node as a Range.

func (*Node) SExpr added in v0.6.0

func (n *Node) SExpr(lang *Language) string

SExpr returns a tree-sitter-style S-expression for this node. It includes only named nodes for stable debug snapshots.

func (*Node) StartByte

func (n *Node) StartByte() uint32

StartByte returns the byte offset where this node begins.

func (*Node) StartPoint

func (n *Node) StartPoint() Point

StartPoint returns the row/column position where this node begins.

func (*Node) Symbol

func (n *Node) Symbol() Symbol

Symbol returns the node's grammar symbol.

func (*Node) Text

func (n *Node) Text(source []byte) string

Text returns the source text covered by this node. Returns an empty string for nil nodes or invalid byte ranges.

func (*Node) Type

func (n *Node) Type(lang *Language) string

Type returns the node's type name from the language.

type NormalizationPassRuntime added in v0.20.0

type NormalizationPassRuntime struct {
	Name           string
	Checked        uint64
	Run            uint64
	NodesVisited   uint64
	NodesRewritten uint64
	Nanos          int64
}

type ParseAction

type ParseAction struct {
	Type              ParseActionType
	State             StateID // target state (shift/recover)
	Symbol            Symbol  // reduced symbol (reduce)
	ChildCount        uint8   // children consumed (reduce)
	DynamicPrecedence int16   // precedence (reduce)
	ProductionID      uint16  // which production (reduce)
	Extra             bool    // is this an extra token (shift)
	ExtraChain        bool    // does this shift enter a nonterminal extra chain
	Repetition        bool    // is this a repetition (shift)
}

ParseAction is a single parser action from the parse table.

type ParseActionEntry

type ParseActionEntry struct {
	Reusable bool
	Actions  []ParseAction
}

ParseActionEntry is a group of actions for a (state, symbol) pair.

type ParseActionTiming added in v0.19.0

type ParseActionTiming struct {
	ExtraShiftNanos      int64
	NoActionNanos        int64
	NoActionRelexNanos   int64
	NoActionMissingNanos int64
	NoActionRecoverNanos int64
	NoActionErrorNanos   int64
	ConflictChoiceNanos  int64
	ConflictForkNanos    int64
	SingleShiftNanos     int64
	SingleReduceNanos    int64
	SingleAcceptNanos    int64
	SingleRecoverNanos   int64
	SingleOtherNanos     int64
}

type ParseActionType

type ParseActionType uint8

ParseActionType identifies the kind of parse action.

const (
	ParseActionShift ParseActionType = iota
	ParseActionReduce
	ParseActionAccept
	ParseActionRecover
)

type ParseEquivStateRuntime added in v0.19.0

type ParseEquivStateRuntime struct {
	State                                 StateID
	StackEquivCalls                       uint64
	StackEquivTrue                        uint64
	StackEquivDepthMismatch               uint64
	StackEquivHashMismatch                uint64
	StackEquivStateMismatch               uint64
	StackEquivPayloadMismatch             uint64
	StackEquivEntryCompares               uint64
	StackEquivStateMismatchDepthSum       uint64
	StackEquivStateMismatchMaxDepth       uint32
	StackEquivStateMismatchDepthBuckets   [stackEquivMismatchDepthBucketCount]uint64
	StackEquivPayloadMismatchDepthSum     uint64
	StackEquivPayloadMismatchMaxDepth     uint32
	StackEquivPayloadMismatchDepthBuckets [stackEquivMismatchDepthBucketCount]uint64
	StackEquivPayloadHeaderSigDiff        uint64
	StackEquivPayloadHeaderSigSame        uint64
	StackEquivPayloadShallowSigDiff       uint64
	StackEquivPayloadShallowSigSame       uint64
	StackEquivPairKeyed                   uint64
	StackEquivPairUnkeyed                 uint64
	StackEquivPairRepeats                 uint64
	StackEquivPairRepeatTrue              uint64
	StackEquivPairRepeatFalse             uint64
	StackEquivPairRepeatMismatch          uint64
	StackEquivPairStores                  uint64
	EquivCacheLookups                     uint64
	EquivCacheHits                        uint64
	EquivCacheStores                      uint64
	EquivCacheMisses                      uint64
	EquivCacheTrueHits                    uint64
	EquivCacheFalseHits                   uint64
	EquivCacheEpochMisses                 uint64
	EquivCacheKeyMisses                   uint64
	EquivCacheVersionMisses               uint64
	EquivSkipError                        uint64
	EquivSkipLeaf                         uint64
	EquivSkipFieldMismatch                uint64
	EquivExactCalls                       uint64
	EquivExactTrue                        uint64
	EquivExactPointerTrue                 uint64
	EquivExactNilMismatch                 uint64
	EquivExactHeaderMismatch              uint64
	EquivExactChildMismatch               uint64
	EquivExactTerminalCalls               uint64
	EquivExactTerminalTrue                uint64
	EquivExactTerminalFalse               uint64
	EquivFrontierCalls                    uint64
	EquivFrontierTrue                     uint64
	EquivExactChildCompares               uint64
	EquivFrontierChildScans               uint64
	EquivFrontierCandidateCompares        uint64
}

type ParseOption added in v0.6.0

type ParseOption func(*parseConfig)

ParseOption configures ParseWith behavior.

func WithOldTree added in v0.6.0

func WithOldTree(oldTree *Tree) ParseOption

WithOldTree enables incremental parsing against an edited prior tree.

func WithProfiling added in v0.6.0

func WithProfiling() ParseOption

WithProfiling enables incremental parse attribution in ParseResult.Profile.

func WithTokenSource added in v0.6.0

func WithTokenSource(ts TokenSource) ParseOption

WithTokenSource provides a custom token source for parsing.
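ParseOption follows the functional-options pattern: each constructor returns a closure that mutates a private config. The sketch below shows the pattern in isolation. The parseConfig fields, the lowercase constructors, and buildConfig are all hypothetical, since the real parseConfig is unexported and not documented here.

```go
package main

import "fmt"

// parseConfig is a hypothetical stand-in; the real struct is unexported and
// its fields are not documented in this section.
type parseConfig struct {
	oldTree   string // placeholder for a *Tree
	profiling bool
}

// parseOption has the same shape as ParseOption: a function that edits the
// config in place.
type parseOption func(*parseConfig)

// withOldTree records the prior tree to reuse.
func withOldTree(name string) parseOption {
	return func(c *parseConfig) { c.oldTree = name }
}

// withProfiling turns on profile collection.
func withProfiling() parseOption {
	return func(c *parseConfig) { c.profiling = true }
}

// buildConfig applies the options in order, which is presumably what a
// variadic entry point does before it starts parsing.
func buildConfig(opts ...parseOption) parseConfig {
	var cfg parseConfig
	for _, opt := range opts {
		if opt != nil {
			opt(&cfg)
		}
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", buildConfig())
	fmt.Printf("%+v\n", buildConfig(withOldTree("previous"), withProfiling()))
}
```

New options can be added this way without changing the signature of the function that accepts them.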

type ParseReduceTiming added in v0.19.0

type ParseReduceTiming struct {
	RangeNanos         int64
	PendingParentNanos int64
	ChildBuildNanos    int64
	ParentBuildNanos   int64
	SpanNanos          int64
	StackPushNanos     int64
	NoTreeBuildNanos   int64
}

type ParseResult added in v0.6.0

type ParseResult struct {
	Tree *Tree
	// Profile is populated only when ParseWith uses WithProfiling for
	// incremental parsing.
	Profile IncrementalParseProfile
	// ProfileAvailable reports whether Profile contains attribution data.
	ProfileAvailable bool
}

ParseResult is returned by ParseWith.

type ParseRuntime added in v0.6.0

type ParseRuntime struct {
	StopReason                                   ParseStopReason
	SourceLen                                    uint32
	ExpectedEOFByte                              uint32
	RootEndByte                                  uint32
	Truncated                                    bool
	TokenSourceEOFEarly                          bool
	TokensConsumed                               uint64
	LastTokenEndByte                             uint32
	LastTokenSymbol                              Symbol
	LastTokenWasEOF                              bool
	IterationLimit                               int
	StackDepthLimit                              int
	NodeLimit                                    int
	MemoryBudgetBytes                            int64
	Iterations                                   int
	NodesAllocated                               int
	ArenaBytesAllocated                          int64
	ScratchBytesAllocated                        int64
	EntryScratchBytesAllocated                   int64
	GSSBytesAllocated                            int64
	PeakStackDepth                               int
	MaxStacksSeen                                int
	SingleStackIterations                        int
	MultiStackIterations                         int
	SingleStackTokens                            uint64
	MultiStackTokens                             uint64
	SingleStackGSSNodes                          uint64
	MultiStackGSSNodes                           uint64
	GSSNodesAllocated                            uint64
	GSSNodesRetained                             uint64
	GSSNodesDroppedSameToken                     uint64
	ParentNodesAllocated                         uint64
	ParentNodesRetained                          uint64
	ParentNodesDroppedSameToken                  uint64
	LeafNodesAllocated                           uint64
	LeafNodesRetained                            uint64
	LeafNodesDroppedSameToken                    uint64
	ChildSlicesAllocated                         uint64
	ChildSlicesRetained                          uint64
	ChildSlicesDroppedSameToken                  uint64
	ChildPointersAllocated                       uint64
	ChildPointersRetained                        uint64
	ChildPointersDroppedSameToken                uint64
	ReduceChildFastGSS                           ReduceChildPathRuntime
	ReduceChildAllVisible                        ReduceChildPathRuntime
	ReduceChildNoAlias                           ReduceChildPathRuntime
	ReduceChildScratchGeneral                    ReduceChildPathRuntime
	ReduceChildScratchNoAlias                    ReduceChildPathRuntime
	TransientChildSlicesAllocated                uint64
	TransientChildPointersAllocated              uint64
	TransientChildSlicesMaterialized             uint64
	TransientChildPointersMaterialized           uint64
	TransientParentNodesAllocated                uint64
	TransientParentNodesMaterialized             uint64
	FinalNodes                                   uint64
	FinalParentNodes                             uint64
	FinalLeafNodes                               uint64
	FinalFieldedParentNodes                      uint64
	FinalUnfieldedParentNodes                    uint64
	FinalVisibleParentNodes                      uint64
	FinalHiddenParentNodes                       uint64
	FinalCheckpointLeafNodes                     uint64
	FinalChildSlices                             uint64
	FinalChildPointers                           uint64
	FinalFieldIDElements                         uint64
	FinalFieldSourceElements                     uint64
	FinalChildRefParents                         uint64
	FinalChildRefs                               uint64
	FinalChildRefMaterializedParents             uint64
	FinalChildRefMaterializedChildren            uint64
	FinalChildRefSingleChildAccesses             uint64
	FinalChildRefSingleChildMaterializedChildren uint64
	MergeStacksIn                                uint64
	MergeStacksOut                               uint64
	MergeSlotsUsed                               uint64
	GlobalCullStacksIn                           uint64
	GlobalCullStacksOut                          uint64
	StackEquivCalls                              uint64
	StackEquivTrue                               uint64
	StackEquivDepthMismatch                      uint64
	StackEquivHashMismatch                       uint64
	StackEquivStateMismatch                      uint64
	StackEquivPayloadMismatch                    uint64
	StackEquivEntryCompares                      uint64
	StackEquivStateMismatchDepthSum              uint64
	StackEquivStateMismatchMaxDepth              uint32
	StackEquivStateMismatchDepthBuckets          [stackEquivMismatchDepthBucketCount]uint64
	StackEquivPayloadMismatchDepthSum            uint64
	StackEquivPayloadMismatchMaxDepth            uint32
	StackEquivPayloadMismatchDepthBuckets        [stackEquivMismatchDepthBucketCount]uint64
	StackEquivPayloadHeaderSigDiff               uint64
	StackEquivPayloadHeaderSigSame               uint64
	StackEquivPayloadShallowSigDiff              uint64
	StackEquivPayloadShallowSigSame              uint64
	StackEquivPairKeyed                          uint64
	StackEquivPairUnkeyed                        uint64
	StackEquivPairRepeats                        uint64
	StackEquivPairRepeatTrue                     uint64
	StackEquivPairRepeatFalse                    uint64
	StackEquivPairRepeatMismatch                 uint64
	StackEquivPairStores                         uint64
	MergeHeaderEqTotal                           uint64
	MergeDeepTrue                                uint64
	MergeDeepFalse                               uint64
	MergeHeaderDeepDivergent                     uint64
	EquivCacheLookups                            uint64
	EquivCacheHits                               uint64
	EquivCacheStores                             uint64
	EquivCacheMisses                             uint64
	EquivCacheTrueHits                           uint64
	EquivCacheFalseHits                          uint64
	EquivCacheEpochMisses                        uint64
	EquivCacheKeyMisses                          uint64
	EquivCacheVersionMisses                      uint64
	EquivSkipError                               uint64
	EquivSkipLeaf                                uint64
	EquivSkipFieldMismatch                       uint64
	EquivExactCalls                              uint64
	EquivExactTrue                               uint64
	EquivExactPointerTrue                        uint64
	EquivExactNilMismatch                        uint64
	EquivExactHeaderMismatch                     uint64
	EquivExactChildMismatch                      uint64
	EquivExactTerminalCalls                      uint64
	EquivExactTerminalTrue                       uint64
	EquivExactTerminalFalse                      uint64
	EquivFrontierCalls                           uint64
	EquivFrontierTrue                            uint64
	EquivExactChildCompares                      uint64
	EquivFrontierChildScans                      uint64
	EquivFrontierCandidateCompares               uint64
	EquivStateStats                              []ParseEquivStateRuntime
	ParseWallNanos                               int64
	ParserLoopNanos                              int64
	TokenNextNanos                               int64
	ActionDispatchNanos                          int64
	ActionLookupNanos                            int64
	GLRMergeNanos                                int64
	GLRCullNanos                                 int64
	ReduceTiming                                 *ParseReduceTiming
	ActionTiming                                 *ParseActionTiming

	ExternalScannerCheckpointRecords                 uint64
	ExternalScannerCheckpointSlotsAllocated          uint64
	ExternalScannerCheckpointBytesAllocated          int64
	ExternalScannerSnapshotBytesAllocated            uint64
	ExternalScannerCheckpointLeafNodes               uint64
	CompactFullLeafCreated                           uint64
	CompactFullLeafMaterialized                      uint64
	CompactFullLeafMaterializedForParentReduce       uint64
	CompactFullLeafMaterializedForParentReject       PendingParentRejectStats
	CompactFullLeafMaterializedForFinalTree          uint64
	CompactFullLeafMaterializedForNormalization      uint64
	CompactFullLeafMaterializedForRecovery           uint64
	CompactFullLeafMaterializedForQuery              uint64
	CompactFullLeafMaterializedForCursor             uint64
	CompactFullLeafMaterializedForParentAPI          uint64
	CompactFullLeafMaterializedForEdit               uint64
	CompactFullLeafMaterializedForCheckpointRebuild  uint64
	CompactFullLeafDropped                           uint64
	CompactFullLeafMaterializedForFieldRejectPayload PendingParentFieldRejectPayloadStats
	PendingParentCreated                             uint64
	PendingParentMaterialized                        uint64
	PendingParentMaterializedForParentReduce         uint64
	PendingParentMaterializedForParentReject         PendingParentRejectStats
	PendingParentMaterializedForFieldReject          PendingParentFieldRejectStats
	PendingParentMaterializedForFieldRejectPayload   PendingParentFieldRejectPayloadStats
	PendingParentMaterializedForFinalTree            uint64
	PendingParentMaterializedForNormalization        uint64
	PendingParentMaterializedForRecovery             uint64
	PendingParentMaterializedForQuery                uint64
	PendingParentMaterializedForCursor               uint64
	PendingParentMaterializedForParentAPI            uint64
	PendingParentMaterializedForEdit                 uint64
	PendingParentMaterializedForCheckpointRebuild    uint64
	PendingParentDropped                             uint64
	PendingParentsFlattened                          uint64
	PendingChildRefsFlattened                        uint64
	PendingChildEntriesAllocated                     uint64
	PendingChildEntryCapacity                        uint64
	PendingChildEntryWaste                           uint64
	PendingParentCandidates                          uint64
	PendingParentRejectedEmpty                       uint64
	PendingParentRejectedChildLimit                  uint64
	PendingParentRejectedAlias                       uint64
	PendingParentRejectedRawSpan                     uint64
	PendingParentRejectedFields                      uint64
	PendingParentRejectedFieldsParentHidden          uint64
	PendingParentRejectedFieldsNoIDs                 uint64
	PendingParentRejectedFieldsInherited             uint64
	PendingParentRejectedFieldsHiddenChild           uint64
	PendingParentRejectedFieldsHiddenChildPlain      uint64
	PendingParentRejectedFieldsHiddenChildPlainEmpty uint64
	PendingParentRejectedFieldsHiddenChildPlainOne   uint64
	PendingParentRejectedFieldsHiddenChildPlainMany  uint64
	PendingParentRejectedFieldsHiddenChildWithFields uint64
	PendingParentRejectedFieldsChild                 uint64
	PendingParentRejectedFieldsAllVisibleDirect      uint64
	PendingParentRejectedChild                       uint64
	PendingParentRejectedSpan                        uint64
	PendingParentRejectedFill                        uint64
	PreMaterializationFieldRejectCandidates          uint64
	PreMaterializationFieldRejectSameKeyCandidates   uint64
	PreMaterializationFieldRejectOverflowCandidates  uint64

	CheckpointLeafFullNodesAvoided      uint64
	LeafNodesConstructed                uint64
	ParentNodesConstructed              uint64
	NoTreeReduceNodesConstructed        uint64
	NoTreeLeafNodesConstructed          uint64
	ResultSelectionNanos                int64
	TransientParentMaterializationNanos int64
	ResultTreeBuildNanos                int64
	TransientChildMaterializationNanos  int64
	ResultPythonKeywordRepairNanos      int64
	ResultPythonRootRepairNanos         int64
	ResultFinalizeRootNanos             int64
	ResultExtendTrailingNanos           int64
	ResultNormalizeRootStartNanos       int64
	ResultCompatibilityNanos            int64
	ResultParentLinkNanos               int64
	NormalizationPassesChecked          uint64
	NormalizationPassesRun              uint64
	NormalizationNodesVisited           uint64
	NormalizationNodesRewritten         uint64
	NormalizationNanos                  int64
	NormalizationPasses                 *[]NormalizationPassRuntime
}

ParseRuntime captures parser-loop diagnostics for a completed tree.

func (ParseRuntime) Summary added in v0.6.0

func (rt ParseRuntime) Summary() string

Summary returns a stable one-line diagnostic string for parse-runtime stats.

type ParseStopReason added in v0.6.0

type ParseStopReason string

ParseStopReason reports why parseInternal terminated.

const (
	ParseStopNone            ParseStopReason = "none"
	ParseStopAccepted        ParseStopReason = "accepted"
	ParseStopNoStacksAlive   ParseStopReason = "no_stacks_alive"
	ParseStopTokenSourceEOF  ParseStopReason = "token_source_eof"
	ParseStopTimeout         ParseStopReason = "timeout"
	ParseStopCancelled       ParseStopReason = "cancelled"
	ParseStopIterationLimit  ParseStopReason = "iteration_limit"
	ParseStopStackDepthLimit ParseStopReason = "stack_depth_limit"
	ParseStopNodeLimit       ParseStopReason = "node_limit"
	ParseStopMemoryBudget    ParseStopReason = "memory_budget"
)
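
Because ParseStopReason is a string type with a fixed set of values, callers can switch on it to decide how to treat a result. The sketch below groups the documented values into coarse categories. The grouping is an illustrative caller-side policy, not something the package defines, and `stopReason` and `category` are local stand-ins so the example compiles without the module.

```go
package main

import "fmt"

// stopReason mirrors the documented ParseStopReason string type.
type stopReason string

// String values copied from the constant block above.
const (
	stopAccepted        stopReason = "accepted"
	stopTimeout         stopReason = "timeout"
	stopCancelled       stopReason = "cancelled"
	stopIterationLimit  stopReason = "iteration_limit"
	stopStackDepthLimit stopReason = "stack_depth_limit"
	stopNodeLimit       stopReason = "node_limit"
	stopMemoryBudget    stopReason = "memory_budget"
)

// category maps a stop reason to a coarse bucket. The buckets are a
// hypothetical policy chosen for this example.
func category(r stopReason) string {
	switch r {
	case stopAccepted:
		return "complete"
	case stopTimeout, stopCancelled:
		return "interrupted"
	case stopIterationLimit, stopStackDepthLimit, stopNodeLimit, stopMemoryBudget:
		return "resource-limit"
	default:
		// Covers "none", "no_stacks_alive", "token_source_eof", and
		// any value added in later versions.
		return "other"
	}
}

func main() {
	for _, r := range []stopReason{stopAccepted, stopTimeout, stopNodeLimit, "no_stacks_alive"} {
		fmt.Printf("%s -> %s\n", r, category(r))
	}
}
```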

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser reads parse tables from a Language and produces a syntax tree. It supports GLR parsing: when a (state, symbol) pair maps to multiple actions, the parser forks the stack and explores all alternatives in parallel while preserving distinct parse paths. Duplicate stack versions are collapsed and ambiguities are resolved at selection time.

Parser is not safe for concurrent use. Use one parser per goroutine, a ParserPool, or guard shared parser instances with external synchronization.

func NewParser

func NewParser(lang *Language) *Parser

NewParser creates a new Parser for the given language.

func (*Parser) CancellationFlag added in v0.7.0

func (p *Parser) CancellationFlag() *uint32

CancellationFlag returns the parser's current cancellation flag pointer.

func (*Parser) IncludedRanges added in v0.6.0

func (p *Parser) IncludedRanges() []Range

IncludedRanges returns a copy of the configured include ranges.

func (*Parser) InferredRootSymbol added in v0.9.0

func (p *Parser) InferredRootSymbol() (Symbol, bool)

InferredRootSymbol returns the root symbol inferred during parser construction, and whether inference succeeded.

func (*Parser) Language added in v0.7.0

func (p *Parser) Language() *Language

Language returns the parser's configured language.

func (*Parser) Logger added in v0.7.0

func (p *Parser) Logger() ParserLogger

Logger returns the currently configured parser debug logger.

func (*Parser) Parse

func (p *Parser) Parse(source []byte) (*Tree, error)

Parse tokenizes and parses source using the built-in DFA lexer, returning a syntax tree. This works for hand-built grammars that provide LexStates. For real grammars that need a custom lexer, use ParseWithTokenSource. If the input is empty, it returns a tree with a nil root and no error.

func (*Parser) ParseForestExperimental added in v0.20.0

func (p *Parser) ParseForestExperimental(source []byte) (*Tree, bool)

ParseForestExperimental parses source with the experimental GSS-forest GLR path and returns a releasable tree, or nil, false if the parse dies (the forest path has no error recovery yet). It is exported so that out-of-tree benchmarks and validation code in packages that attach external scanners (e.g. grammars) can drive it; it is not part of the stable API.

func (*Parser) ParseIncremental

func (p *Parser) ParseIncremental(source []byte, oldTree *Tree) (*Tree, error)

ParseIncremental re-parses source after edits were applied to oldTree. It reuses unchanged subtrees from the old tree for better performance. Call oldTree.Edit() for each edit before calling this method.

func (*Parser) ParseIncrementalProfiled added in v0.6.0

func (p *Parser) ParseIncrementalProfiled(source []byte, oldTree *Tree) (*Tree, IncrementalParseProfile, error)

ParseIncrementalProfiled is like ParseIncremental and also returns runtime attribution for incremental reuse work vs parse/rebuild work.

func (*Parser) ParseIncrementalUTF16 added in v0.16.0

func (p *Parser) ParseIncrementalUTF16(source []uint16, oldTree *Tree) (*Tree, error)

ParseIncrementalUTF16 re-parses UTF-16 source after edits were applied to oldTree. oldTree should have been produced by ParseUTF16, and UTF-16 edits can be recorded with Tree.EditUTF16.

func (*Parser) ParseIncrementalUTF16Bytes added in v0.16.0

func (p *Parser) ParseIncrementalUTF16Bytes(source []byte, oldTree *Tree, order UTF16ByteOrder) (*Tree, error)

ParseIncrementalUTF16Bytes re-parses UTF-16 bytes after edits were applied to oldTree.

func (*Parser) ParseIncrementalUTF16BytesWithTokenSourceFactory added in v0.16.0

func (p *Parser) ParseIncrementalUTF16BytesWithTokenSourceFactory(source []byte, oldTree *Tree, order UTF16ByteOrder, factory TokenSourceFactory) (*Tree, error)

ParseIncrementalUTF16BytesWithTokenSourceFactory re-parses UTF-16 bytes using a token source built from the parser's canonical UTF-8 source view.

func (*Parser) ParseIncrementalUTF16WithTokenSourceFactory added in v0.16.0

func (p *Parser) ParseIncrementalUTF16WithTokenSourceFactory(source []uint16, oldTree *Tree, factory TokenSourceFactory) (*Tree, error)

ParseIncrementalUTF16WithTokenSourceFactory re-parses UTF-16 source using a token source built from the parser's canonical UTF-8 source view.

func (*Parser) ParseIncrementalWithTokenSource

func (p *Parser) ParseIncrementalWithTokenSource(source []byte, oldTree *Tree, ts TokenSource) (*Tree, error)

ParseIncrementalWithTokenSource is like ParseIncremental but uses a custom token source.

func (*Parser) ParseIncrementalWithTokenSourceFactory added in v0.16.0

func (p *Parser) ParseIncrementalWithTokenSourceFactory(source []byte, oldTree *Tree, factory TokenSourceFactory) (*Tree, error)

ParseIncrementalWithTokenSourceFactory is like ParseWithTokenSourceFactory but re-parses against an edited old tree.

func (*Parser) ParseIncrementalWithTokenSourceProfiled added in v0.6.0

func (p *Parser) ParseIncrementalWithTokenSourceProfiled(source []byte, oldTree *Tree, ts TokenSource) (*Tree, IncrementalParseProfile, error)

ParseIncrementalWithTokenSourceProfiled is like ParseIncrementalWithTokenSource and also returns runtime attribution for incremental reuse work vs parse/rebuild work.

func (*Parser) ParseNoResultCompatibilityBenchmarkOnly added in v0.18.0

func (p *Parser) ParseNoResultCompatibilityBenchmarkOnly(source []byte) (*Tree, error)

ParseNoResultCompatibilityBenchmarkOnly parses source while suppressing language-specific result compatibility rewrites. It is intended only for performance attribution; the returned tree is not API-compatible.

func (*Parser) ParseNoTreeBenchmarkOnly added in v0.17.0

func (p *Parser) ParseNoTreeBenchmarkOnly(source []byte) (*Tree, error)

ParseNoTreeBenchmarkOnly parses source while suppressing parent/child tree materialization in reduce actions. It is intended only for parser-loop performance experiments; the returned tree is not API-compatible.

func (*Parser) ParseNoTreeWithExternalCheckpointsBenchmarkOnly added in v0.18.0

func (p *Parser) ParseNoTreeWithExternalCheckpointsBenchmarkOnly(source []byte) (*Tree, error)

ParseNoTreeWithExternalCheckpointsBenchmarkOnly parses source while suppressing parent/child tree materialization in reduce actions but keeping external-scanner checkpoint capture enabled. It is intended only for parser performance attribution; the returned tree is not API-compatible.

func (*Parser) ParseUTF16 added in v0.16.0

func (p *Parser) ParseUTF16(source []uint16) (*Tree, error)

ParseUTF16 parses UTF-16 source represented as Go UTF-16 code units.

The parser core uses a canonical UTF-8 view internally so existing byte-based APIs remain unchanged. The returned tree retains the original UTF-16 source and can convert node ranges back to UTF-16 code-unit coordinates.
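
The idea of a canonical UTF-8 view with coordinates mapped back to UTF-16 can be illustrated with the standard library alone. The sketch below is not the package's implementation; `utf8View` and `utf16Offset` are hypothetical helpers that show why a byte offset in the UTF-8 view and a code-unit offset in the original source differ once non-ASCII text is involved.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// utf8View decodes UTF-16 code units and re-encodes them as UTF-8 bytes.
func utf8View(units []uint16) []byte {
	return []byte(string(utf16.Decode(units)))
}

// utf16Offset converts a byte offset in the UTF-8 view into a UTF-16
// code-unit offset. Offsets are assumed to fall on rune boundaries.
func utf16Offset(view []byte, byteOffset int) int {
	units := 0
	for i := 0; i < byteOffset && i < len(view); {
		r, size := utf8.DecodeRune(view[i:])
		if r >= 0x10000 {
			units += 2 // encoded as a surrogate pair in UTF-16
		} else {
			units++
		}
		i += size
	}
	return units
}

func main() {
	// 'a' (1 unit), U+1F600 (2 units, 4 UTF-8 bytes), 'b' (1 unit).
	units := utf16.Encode([]rune("a\U0001F600b"))
	view := utf8View(units)

	// 'b' starts at byte 5 in the UTF-8 view but at code unit 3 in UTF-16.
	fmt.Println(len(view), utf16Offset(view, 5)) // 6 3
}
```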

func (*Parser) ParseUTF16Bytes added in v0.16.0

func (p *Parser) ParseUTF16Bytes(source []byte, order UTF16ByteOrder) (*Tree, error)

ParseUTF16Bytes parses UTF-16 source encoded as bytes with an explicit byte order.
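
To show what an explicit byte order means for byte-encoded UTF-16, the sketch below assembles code units from byte pairs by hand. It is only an illustration: `decodeUnits` and its `bigEndian` flag are hypothetical and stand in for the package's UTF16ByteOrder, whose values are not listed in this section. Rejecting an odd-length input is an assumption of this sketch, not documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeUnits reads src as a sequence of 2-byte UTF-16 code units in the
// requested byte order. A trailing odd byte is reported as an error
// (an assumption made for this example).
func decodeUnits(src []byte, bigEndian bool) ([]uint16, error) {
	if len(src)%2 != 0 {
		return nil, errors.New("utf-16 byte input has odd length")
	}
	units := make([]uint16, len(src)/2)
	for i := range units {
		hi, lo := src[2*i], src[2*i+1]
		if !bigEndian {
			hi, lo = lo, hi // little-endian stores the low byte first
		}
		units[i] = uint16(hi)<<8 | uint16(lo)
	}
	return units, nil
}

func main() {
	// The same two bytes decode to different code units per byte order.
	be, _ := decodeUnits([]byte{0x00, 0x61}, true)
	le, _ := decodeUnits([]byte{0x00, 0x61}, false)
	fmt.Printf("%#04x %#04x\n", be[0], le[0])
}
```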

func (*Parser) ParseUTF16BytesWithTokenSourceFactory added in v0.16.0

func (p *Parser) ParseUTF16BytesWithTokenSourceFactory(source []byte, order UTF16ByteOrder, factory TokenSourceFactory) (*Tree, error)

ParseUTF16BytesWithTokenSourceFactory parses UTF-16 bytes using a token source built from the parser's canonical UTF-8 source view.

func (*Parser) ParseUTF16WithTokenSourceFactory added in v0.16.0

func (p *Parser) ParseUTF16WithTokenSourceFactory(source []uint16, factory TokenSourceFactory) (*Tree, error)

ParseUTF16WithTokenSourceFactory parses UTF-16 source using a token source built from the parser's canonical UTF-8 source view.

func (*Parser) ParseWith added in v0.6.0

func (p *Parser) ParseWith(source []byte, opts ...ParseOption) (ParseResult, error)

ParseWith parses source using option-based configuration.

func (*Parser) ParseWithTokenSource

func (p *Parser) ParseWithTokenSource(source []byte, ts TokenSource) (*Tree, error)

ParseWithTokenSource parses source using a custom token source. This is used for real grammars where the lexer DFA isn't available as data tables (e.g., the Go grammar using go/scanner as a bridge).

func (*Parser) ParseWithTokenSourceFactory added in v0.16.0

func (p *Parser) ParseWithTokenSourceFactory(source []byte, factory TokenSourceFactory) (*Tree, error)

ParseWithTokenSourceFactory parses source using a freshly built custom token source. The factory is also retained for recovery reparses.

func (*Parser) SetAmbiguityProfile added in v0.17.0

func (p *Parser) SetAmbiguityProfile(profile *AmbiguityProfile)

SetAmbiguityProfile installs an optional diagnostic ambiguity profile. The profile receives parser state/lookahead/action counters for GLR-heavy benchmark runs. Pass nil to disable profiling.

func (*Parser) SetCancellationFlag added in v0.7.0

func (p *Parser) SetCancellationFlag(flag *uint32)

SetCancellationFlag configures a caller-owned cancellation flag. Parsing stops when the pointed value becomes non-zero.
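
A caller-owned flag of this kind is typically flipped from another goroutine while the long-running loop polls it. The sketch below models that contract with a stand-in loop; `runUntilCancelled` is hypothetical and is not the parser. How the parser reads the flag internally is not stated here, so using sync/atomic for the store is a conservative assumption on the caller's side.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// runUntilCancelled is a hypothetical stand-in for a parse loop. It polls
// the flag once per iteration and stops as soon as the value is non-zero.
func runUntilCancelled(flag *uint32, maxIterations int) (iterations int, cancelled bool) {
	for i := 0; i < maxIterations; i++ {
		if flag != nil && atomic.LoadUint32(flag) != 0 {
			return i, true
		}
	}
	return maxIterations, false
}

func main() {
	var flag uint32 // owned by the caller; zero means "keep going"

	n, cancelled := runUntilCancelled(&flag, 3)
	fmt.Println(n, cancelled)

	// Another goroutine would do this to request a stop.
	atomic.StoreUint32(&flag, 1)

	n, cancelled = runUntilCancelled(&flag, 3)
	fmt.Println(n, cancelled)
}
```

With the real API, the pointer passed to SetCancellationFlag would play the role of `&flag`.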

func (*Parser) SetGLRTrace added in v0.7.0

func (p *Parser) SetGLRTrace(enabled bool)

SetGLRTrace enables verbose GLR stack tracing to stdout (debug only).

func (*Parser) SetIncludedRanges added in v0.6.0

func (p *Parser) SetIncludedRanges(ranges []Range)

SetIncludedRanges configures parser include ranges. Tokens outside these ranges are skipped.

func (*Parser) SetIncludedUTF16ByteRanges added in v0.16.0

func (p *Parser) SetIncludedUTF16ByteRanges(source []byte, order UTF16ByteOrder, ranges []UTF16Range) error

SetIncludedUTF16ByteRanges configures parser include ranges from endian-specific UTF-16 bytes.

func (*Parser) SetIncludedUTF16Ranges added in v0.16.0

func (p *Parser) SetIncludedUTF16Ranges(source []uint16, ranges []UTF16Range) bool

SetIncludedUTF16Ranges configures parser include ranges from UTF-16 code-unit ranges. Internal parser points are derived from source as UTF-8 columns.

func (*Parser) SetLogger added in v0.7.0

func (p *Parser) SetLogger(logger ParserLogger)

SetLogger installs a parser debug logger. Pass nil to disable logging.

func (*Parser) SetTimeoutMicros added in v0.7.0

func (p *Parser) SetTimeoutMicros(timeoutMicros uint64)

SetTimeoutMicros configures a per-parse timeout in microseconds. A value of 0 disables timeout checks.
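
Since the timeout is expressed in microseconds, callers holding a time.Duration need a conversion. The helper below is a small sketch; `toMicros` is a hypothetical name, not part of the package. Note that positive durations shorter than one microsecond truncate to 0, which per the description above would disable the timeout.

```go
package main

import (
	"fmt"
	"time"
)

// toMicros converts a duration into a microsecond count suitable for a
// uint64 timeout parameter. Non-positive durations map to 0.
func toMicros(d time.Duration) uint64 {
	if d <= 0 {
		return 0
	}
	return uint64(d / time.Microsecond)
}

func main() {
	fmt.Println(toMicros(250 * time.Millisecond)) // 250000
	fmt.Println(toMicros(0))                      // 0
}
```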

func (*Parser) TimeoutMicros added in v0.7.0

func (p *Parser) TimeoutMicros() uint64

TimeoutMicros returns the parser timeout in microseconds.

type ParserLogType added in v0.7.0

type ParserLogType uint8

ParserLogType categorizes parser log messages.

const (
	// ParserLogParse emits parser-loop lifecycle and control-flow logs.
	ParserLogParse ParserLogType = iota
	// ParserLogLex emits token-source and token-consumption logs.
	ParserLogLex
)

type ParserLogger added in v0.7.0

type ParserLogger func(kind ParserLogType, message string)

ParserLogger receives parser debug logs when configured via SetLogger.
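
A logger of this shape is just a function value, so a closure can route messages by kind. The sketch below redeclares the two documented types locally (`logType`, `logger`) so it compiles on its own; `collectingLogger` is a hypothetical helper that prefixes each message and appends it to a slice.

```go
package main

import "fmt"

// logType and its constants mirror the documented ParserLogType values.
type logType uint8

const (
	logParse logType = iota
	logLex
)

// logger mirrors the documented ParserLogger signature.
type logger func(kind logType, message string)

// collectingLogger returns a logger that tags each message with its kind
// and appends it to lines.
func collectingLogger(lines *[]string) logger {
	return func(kind logType, message string) {
		prefix := "parse"
		if kind == logLex {
			prefix = "lex"
		}
		*lines = append(*lines, "["+prefix+"] "+message)
	}
}

func main() {
	var lines []string
	log := collectingLogger(&lines)

	log(logParse, "shift state 12")
	log(logLex, "token identifier")

	for _, l := range lines {
		fmt.Println(l)
	}
}
```

The closure is not synchronized. That is acceptable for a single Parser, which is documented as not safe for concurrent use, but a logger shared across pooled parsers would need its own locking.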

type ParserPool added in v0.7.0

type ParserPool struct {
	// contains filtered or unexported fields
}

ParserPool provides concurrency-safe parsing by reusing Parser instances.

ParserPool is safe for concurrent use. Each call checks out one parser from an internal sync.Pool, applies configured defaults, runs the parse, and returns the parser to the pool.

Mutable parser state (logger, timeout, cancellation flag, included ranges, GLR trace) is reset on checkout so request-local state cannot bleed across callers.
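
The checkout, reset, run, return cycle described above can be sketched with sync.Pool directly. This is an illustration of the pattern, not the package's code: `worker`, `pool`, `newPool`, and `do` are hypothetical, and a single `timeoutMicros` field stands in for all of the mutable parser state.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for a Parser with mutable state.
type worker struct {
	timeoutMicros uint64
}

// pool hands out workers and restores configured defaults on checkout.
type pool struct {
	defaultTimeout uint64
	inner          sync.Pool
}

func newPool(defaultTimeout uint64) *pool {
	p := &pool{defaultTimeout: defaultTimeout}
	p.inner.New = func() interface{} { return &worker{} }
	return p
}

// do checks out a worker, resets it to the pool defaults, runs fn, and
// returns the worker to the pool.
func (p *pool) do(fn func(*worker)) {
	w := p.inner.Get().(*worker)
	defer p.inner.Put(w)
	w.timeoutMicros = p.defaultTimeout // reset so earlier changes cannot leak
	fn(w)
}

func main() {
	p := newPool(5000)
	var (
		wg sync.WaitGroup
		mu sync.Mutex
		ok int
	)
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.do(func(w *worker) {
				if w.timeoutMicros == 5000 {
					mu.Lock()
					ok++
					mu.Unlock()
				}
				w.timeoutMicros = 1 // request-local change
			})
		}()
	}
	wg.Wait()
	fmt.Println(ok) // 4: every checkout saw the default
}
```

Resetting on checkout rather than on return means a worker that was returned in a modified state still starts clean for its next caller.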

func NewParserPool added in v0.7.0

func NewParserPool(lang *Language, opts ...ParserPoolOption) *ParserPool

NewParserPool creates a concurrency-safe parser pool for lang.

func (*ParserPool) Language added in v0.7.0

func (pp *ParserPool) Language() *Language

Language returns the pool's configured language.

func (*ParserPool) Parse added in v0.7.0

func (pp *ParserPool) Parse(source []byte) (*Tree, error)

Parse delegates to a pooled Parser.Parse call.

func (*ParserPool) ParseIncrementalUTF16 added in v0.16.0

func (pp *ParserPool) ParseIncrementalUTF16(source []uint16, oldTree *Tree) (*Tree, error)

ParseIncrementalUTF16 delegates to a pooled Parser.ParseIncrementalUTF16 call.

func (*ParserPool) ParseIncrementalUTF16Bytes added in v0.16.0

func (pp *ParserPool) ParseIncrementalUTF16Bytes(source []byte, oldTree *Tree, order UTF16ByteOrder) (*Tree, error)

ParseIncrementalUTF16Bytes delegates to a pooled Parser.ParseIncrementalUTF16Bytes call.

func (*ParserPool) ParseIncrementalUTF16BytesWithTokenSourceFactory added in v0.16.0

func (pp *ParserPool) ParseIncrementalUTF16BytesWithTokenSourceFactory(source []byte, oldTree *Tree, order UTF16ByteOrder, factory TokenSourceFactory) (*Tree, error)

ParseIncrementalUTF16BytesWithTokenSourceFactory delegates to a pooled Parser.ParseIncrementalUTF16BytesWithTokenSourceFactory call.

func (*ParserPool) ParseIncrementalUTF16WithTokenSourceFactory added in v0.16.0

func (pp *ParserPool) ParseIncrementalUTF16WithTokenSourceFactory(source []uint16, oldTree *Tree, factory TokenSourceFactory) (*Tree, error)

ParseIncrementalUTF16WithTokenSourceFactory delegates to a pooled Parser.ParseIncrementalUTF16WithTokenSourceFactory call.

func (*ParserPool) ParseNoResultCompatibilityBenchmarkOnly added in v0.18.0

func (pp *ParserPool) ParseNoResultCompatibilityBenchmarkOnly(source []byte) (*Tree, error)

ParseNoResultCompatibilityBenchmarkOnly delegates to Parser.ParseNoResultCompatibilityBenchmarkOnly. It is intended only for performance attribution of parser/tree construction versus compatibility rewrites; the returned tree is not API-compatible.

func (*ParserPool) ParseNoTreeBenchmarkOnly added in v0.17.0

func (pp *ParserPool) ParseNoTreeBenchmarkOnly(source []byte) (*Tree, error)

ParseNoTreeBenchmarkOnly delegates to Parser.ParseNoTreeBenchmarkOnly. It is intended only for parser-loop performance experiments; the returned tree is not API-compatible.

func (*ParserPool) ParseNoTreeWithExternalCheckpointsBenchmarkOnly added in v0.18.0

func (pp *ParserPool) ParseNoTreeWithExternalCheckpointsBenchmarkOnly(source []byte) (*Tree, error)

ParseNoTreeWithExternalCheckpointsBenchmarkOnly delegates to Parser.ParseNoTreeWithExternalCheckpointsBenchmarkOnly. It is intended only for parser performance attribution; the returned tree is not API-compatible.

func (*ParserPool) ParseUTF16 added in v0.16.0

func (pp *ParserPool) ParseUTF16(source []uint16) (*Tree, error)

ParseUTF16 delegates to a pooled Parser.ParseUTF16 call.

func (*ParserPool) ParseUTF16Bytes added in v0.16.0

func (pp *ParserPool) ParseUTF16Bytes(source []byte, order UTF16ByteOrder) (*Tree, error)

ParseUTF16Bytes delegates to a pooled Parser.ParseUTF16Bytes call.

func (*ParserPool) ParseUTF16BytesWithTokenSourceFactory added in v0.16.0

func (pp *ParserPool) ParseUTF16BytesWithTokenSourceFactory(source []byte, order UTF16ByteOrder, factory TokenSourceFactory) (*Tree, error)

ParseUTF16BytesWithTokenSourceFactory delegates to a pooled Parser.ParseUTF16BytesWithTokenSourceFactory call.

func (*ParserPool) ParseUTF16WithTokenSourceFactory added in v0.16.0

func (pp *ParserPool) ParseUTF16WithTokenSourceFactory(source []uint16, factory TokenSourceFactory) (*Tree, error)

ParseUTF16WithTokenSourceFactory delegates to a pooled Parser.ParseUTF16WithTokenSourceFactory call.

func (*ParserPool) ParseWith added in v0.7.0

func (pp *ParserPool) ParseWith(source []byte, opts ...ParseOption) (ParseResult, error)

ParseWith delegates to a pooled Parser.ParseWith call.

func (*ParserPool) ParseWithTokenSource added in v0.7.0

func (pp *ParserPool) ParseWithTokenSource(source []byte, ts TokenSource) (*Tree, error)

ParseWithTokenSource delegates to a pooled Parser.ParseWithTokenSource call.

func (*ParserPool) ParseWithTokenSourceFactory added in v0.16.0

func (pp *ParserPool) ParseWithTokenSourceFactory(source []byte, factory TokenSourceFactory) (*Tree, error)

ParseWithTokenSourceFactory delegates to a pooled Parser.ParseWithTokenSourceFactory call.

type ParserPoolOption added in v0.7.0

type ParserPoolOption func(*parserPoolConfig)

ParserPoolOption configures a ParserPool.

func WithParserPoolAmbiguityProfile added in v0.17.0

func WithParserPoolAmbiguityProfile(profile *AmbiguityProfile) ParserPoolOption

WithParserPoolAmbiguityProfile installs an optional diagnostic ambiguity profile on checked-out parsers.

func WithParserPoolGLRTrace added in v0.7.0

func WithParserPoolGLRTrace(enabled bool) ParserPoolOption

WithParserPoolGLRTrace toggles GLR trace logs on pooled parser instances.

func WithParserPoolIncludedRanges added in v0.7.0

func WithParserPoolIncludedRanges(ranges []Range) ParserPoolOption

WithParserPoolIncludedRanges sets default include ranges for pooled parsers.

func WithParserPoolLogger added in v0.7.0

func WithParserPoolLogger(logger ParserLogger) ParserPoolOption

WithParserPoolLogger sets the logger applied to pooled parser instances.

func WithParserPoolTimeoutMicros added in v0.7.0

func WithParserPoolTimeoutMicros(timeoutMicros uint64) ParserPoolOption

WithParserPoolTimeoutMicros sets the parse timeout for pooled parsers.

type Pattern

type Pattern struct {
	// contains filtered or unexported fields
}

Pattern is a single top-level S-expression pattern in a query.

type PendingParentFieldRejectPayloadStats added in v0.19.0

type PendingParentFieldRejectPayloadStats struct {
	Unknown              uint64
	Visible              uint64
	VisibleFinalLike     uint64
	VisibleNestedPayload uint64
	VisibleCompactLeaf   uint64
	VisibleFieldedDesc   uint64
	HiddenEmpty          uint64
	HiddenOne            uint64
	HiddenMany           uint64
	HiddenWithFields     uint64
}

type PendingParentFieldRejectStats added in v0.19.0

type PendingParentFieldRejectStats struct {
	Unknown               uint64
	ParentHidden          uint64
	NoIDs                 uint64
	Inherited             uint64
	HiddenChild           uint64
	HiddenChildPlain      uint64
	HiddenChildPlainEmpty uint64
	HiddenChildPlainOne   uint64
	HiddenChildPlainMany  uint64
	HiddenChildWithFields uint64
	Child                 uint64
	AllVisibleDirect      uint64
}

type PendingParentRejectStats added in v0.19.0

type PendingParentRejectStats struct {
	Unknown    uint64
	Empty      uint64
	ChildLimit uint64
	Alias      uint64
	RawSpan    uint64
	Fields     uint64
	Child      uint64
	Span       uint64
	Fill       uint64
}

type PerfCounters added in v0.6.0

type PerfCounters struct {
	MergeCalls                      uint64
	MergeDeadPruned                 uint64
	MergePerKeyOverflow             uint64
	MergeReplacements               uint64
	StackEquivalentCalls            uint64
	StackEquivalentTrue             uint64
	StackEqHashMissSkips            uint64
	StackCompareCalls               uint64
	ConflictRR                      uint64
	ConflictRS                      uint64
	ConflictOther                   uint64
	ForkCount                       uint64
	FirstConflictToken              uint64
	MaxConcurrentStacks             uint64
	LexBytes                        uint64
	LexTokens                       uint64
	ReuseNodesVisited               uint64
	ReuseNodesPushed                uint64
	ReuseNodesPopped                uint64
	ReuseCandidatesChecked          uint64
	ReuseSuccesses                  uint64
	ReuseLeafSuccesses              uint64
	ReuseNonLeafChecks              uint64
	ReuseNonLeafSuccesses           uint64
	ReuseNonLeafBytes               uint64
	ReuseNonLeafNoGoto              uint64
	ReuseNonLeafNoGotoTerm          uint64
	ReuseNonLeafNoGotoNt            uint64
	ReuseNonLeafStateMiss           uint64
	ReuseNonLeafStateZero           uint64
	MergeHashZero                   uint64
	GlobalCapCulls                  uint64
	GlobalCapCullDropped            uint64
	ReduceChainSteps                uint64
	ReduceChainMaxLen               uint64
	ReduceChainBreakMulti           uint64
	ReduceChainBreakShift           uint64
	ReduceChainBreakAccept          uint64
	ReduceChainHintCandidates       uint64
	ReduceChainHintTaken            uint64
	ReduceChainHintSteps            uint64
	ReduceChainHintTerminalOK       uint64
	ReduceChainHintTerminalMismatch uint64
	ReduceChainHintLimit            uint64
	ReduceChainHintDead             uint64
	ReduceChainHintUnexpected       uint64
	ParentChildPointers             uint64
	ReduceChildrenFastGSS           uint64
	ReduceChildrenAllVis            uint64
	ReduceChildrenNoAlias           uint64
	ReduceChildrenScratch           uint64
	ReduceScratchNoAlias            uint64
	ReduceScratchGeneral            uint64
	ForestReduceCalls               uint64
	ForestReduceZero                uint64
	ForestReduceLinearNoExtras      uint64
	ForestReduceDFS                 uint64
	ForestReduceDFSLinks            uint64
	ForestReduceDFSMultiLinkSteps   uint64
	ForestReduceDFSExtraLinks       uint64
	ForestReduceDFSVisits           uint64
	ForestReduceDFSPathEntries      uint64
	ForestReduceGotoHits            uint64
	ForestReduceGotoMisses          uint64
	ForestReduceMaxPathLen          uint64
	ForestReduceMaxChildCount       uint64
	ForestCoalesceCalls             uint64
	ForestCoalesceNewNodes          uint64
	ForestCoalesceLinkAppends       uint64
	ForestCoalesceDedupHits         uint64
	ForestCoalesceDedupReplacements uint64
	ForestCoalescePreCapDrops       uint64
	ForestCoalesceCapDrops          uint64
	ForestCoalesceCapReplacements   uint64
	ExtraNodes                      uint64
	ErrorNodes                      uint64
	MergeStacksInHist               [maxGLRStacks + 2]uint64
	MergeAliveHist                  [maxGLRStacks + 2]uint64
	MergeOutHist                    [maxGLRStacks + 2]uint64
	ForkActionsHist                 [8]uint64
	CloneTreeCalls                  uint64
	CloneTreePublicNodes            uint64
	CloneTreeFinalRefs              uint64
	CloneTreeCompactCopies          uint64
	CloneTreeChildRefs              uint64
	CloneOffsetCalls                uint64
	CloneOffsetPublicNodes          uint64
	CloneOffsetCopies               uint64
	CloneOffsetShifted              uint64
	NodeEditCalls                   uint64
	NodeEditNoopCalls               uint64
	NodeEditCompactRefs             uint64
	NodeEditShifted                 uint64
	NodeEditMarked                  uint64
	DenseMutationCalls              uint64
	DenseMutationDrains             uint64
	MutationChildRefCOW             uint64
}

func PerfCountersSnapshot added in v0.6.0

func PerfCountersSnapshot() PerfCounters

type Point

type Point struct {
	Row    uint32
	Column uint32
}

Point is a row/column position in source text.

type PointSkippableTokenSource

type PointSkippableTokenSource interface {
	ByteSkippableTokenSource
	SkipToByteWithPoint(offset uint32, pt Point) Token
}

PointSkippableTokenSource extends ByteSkippableTokenSource with a hint-based skip that avoids recomputing row/column from byte offset. During incremental parsing the reused node already carries its endpoint, so passing it directly eliminates the O(n) offset-to-point scan.
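
The cost difference described above can be sketched with the standard library alone. This is an illustration, not the package's lexer: the local Point type only mirrors the documented Row/Column shape, advancePoint and pointForByte are hypothetical helpers, and columns are assumed to be counted in bytes.

```go
package main

import "fmt"

// Point is a local stand-in mirroring the documented Row/Column shape.
type Point struct {
	Row    uint32
	Column uint32
}

// advancePoint walks src[from:to] and updates pt for every byte it passes.
// Counting columns in bytes is an assumption of this sketch.
func advancePoint(src []byte, from, to uint32, pt Point) Point {
	for i := from; i < to && int(i) < len(src); i++ {
		if src[i] == '\n' {
			pt.Row++
			pt.Column = 0
		} else {
			pt.Column++
		}
	}
	return pt
}

// pointForByte recomputes a position from the start of the buffer,
// which costs time proportional to offset.
func pointForByte(src []byte, offset uint32) Point {
	return advancePoint(src, 0, offset, Point{})
}

func main() {
	src := []byte("package main\n\nfunc main() {}\n")

	// Full scan from byte 0.
	full := pointForByte(src, 19)

	// With a known point at byte 14 (start of the third line), only the
	// five bytes between the hint and the target are scanned.
	hinted := advancePoint(src, 14, 19, Point{Row: 2, Column: 0})

	fmt.Println(full == hinted, full)
}
```

When the caller already holds the exact endpoint, as SkipToByteWithPoint's pt argument suggests, even the short scan from the hint is unnecessary.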

type Query

type Query struct {
	// contains filtered or unexported fields
}

Query holds compiled patterns parsed from a tree-sitter .scm query file. It can be executed against a syntax tree to find matching nodes and return their captures. Query is safe for concurrent use after construction.

func NewQuery

func NewQuery(source string, lang *Language) (*Query, error)

NewQuery compiles query source (tree-sitter .scm format) against a language. It returns an error if the query syntax is invalid or references unknown node types or field names.

func (*Query) CaptureCount added in v0.7.0

func (q *Query) CaptureCount() uint32

CaptureCount returns the number of unique capture names in this query.

func (*Query) CaptureNameForID added in v0.7.0

func (q *Query) CaptureNameForID(id uint32) (string, bool)

CaptureNameForID returns the capture name for the given capture id.

func (*Query) CaptureNames

func (q *Query) CaptureNames() []string

CaptureNames returns the list of unique capture names used in the query.

func (*Query) DisableCapture added in v0.7.0

func (q *Query) DisableCapture(name string)

DisableCapture removes captures with the given name from future query results. Matching behavior is unchanged; only returned captures are filtered.

func (*Query) DisablePattern added in v0.7.0

func (q *Query) DisablePattern(patternIndex uint32)

DisablePattern disables a pattern by index.

func (*Query) EndByteForPattern added in v0.7.0

func (q *Query) EndByteForPattern(patternIndex uint32) (uint32, bool)

EndByteForPattern returns the query-source end byte for patternIndex.

func (*Query) Exec

func (q *Query) Exec(node *Node, lang *Language, source []byte) *QueryCursor

Exec creates a streaming cursor over matches rooted at node.

func (*Query) Execute

func (q *Query) Execute(tree *Tree) []QueryMatch

Execute runs the query against a syntax tree and returns all matches.

func (*Query) ExecuteInto added in v0.10.2

func (q *Query) ExecuteInto(tree *Tree, dst []QueryMatch) []QueryMatch

ExecuteInto runs the query against a syntax tree, appending matches into dst and returning the updated slice. Callers can pre-allocate or reuse dst across calls to eliminate the per-call slice allocation from Execute.

Example:

var buf []QueryMatch
for _, tree := range trees {
    buf = q.ExecuteInto(tree, buf[:0])
    process(buf)
}

func (*Query) ExecuteNode

func (q *Query) ExecuteNode(node *Node, lang *Language, source []byte) []QueryMatch

ExecuteNode runs the query starting from a specific node.

source is required for text predicates (like #eq? / #match?); pass the originating source bytes for correct predicate evaluation.

func (*Query) IsPatternGuaranteedAtStep added in v0.7.0

func (q *Query) IsPatternGuaranteedAtStep(patternIndex uint32, stepIndex uint32) bool

IsPatternGuaranteedAtStep reports whether all steps through stepIndex are definite and non-quantified.

func (*Query) IsPatternNonLocal added in v0.7.0

func (q *Query) IsPatternNonLocal(patternIndex uint32) bool

IsPatternNonLocal reports whether the pattern can begin at multiple roots.

func (*Query) IsPatternRooted added in v0.7.0

func (q *Query) IsPatternRooted(patternIndex uint32) bool

IsPatternRooted reports whether the pattern has exactly one root step at depth 0. Rooted patterns start matching from a single concrete root.

func (*Query) PatternCount

func (q *Query) PatternCount() int

PatternCount returns the number of patterns in the query.

func (*Query) PredicatesForPattern added in v0.7.0

func (q *Query) PredicatesForPattern(patternIndex uint32) ([]QueryPredicate, bool)

PredicatesForPattern returns a copy of predicates attached to patternIndex.

func (*Query) StartByteForPattern added in v0.7.0

func (q *Query) StartByteForPattern(patternIndex uint32) (uint32, bool)

StartByteForPattern returns the query-source start byte for patternIndex.

func (*Query) StepIsDefinite added in v0.7.0

func (q *Query) StepIsDefinite(patternIndex uint32, stepIndex uint32) bool

StepIsDefinite reports whether a pattern step matches a definite symbol (i.e. not a wildcard).

func (*Query) StringCount added in v0.7.0

func (q *Query) StringCount() uint32

StringCount returns the number of unique string literals in this query.

func (*Query) StringValueForID added in v0.7.0

func (q *Query) StringValueForID(id uint32) (string, bool)

StringValueForID returns the string literal for the given string id.

type QueryCapture

type QueryCapture struct {
	Name string
	Node *Node
	// TextOverride, when non-empty, replaces the node's source text for
	// downstream consumers. It is set by the #strip! directive.
	TextOverride string
}

QueryCapture is a single captured node within a match.

func (QueryCapture) Text added in v0.6.0

func (c QueryCapture) Text(source []byte) string

Text returns the effective text for this capture. If TextOverride is set (e.g. by the #strip! directive), it is returned. Otherwise the node's source text is returned.

func (QueryCapture) UTF16Range added in v0.16.0

func (c QueryCapture) UTF16Range(tree *Tree) (UTF16Range, bool)

UTF16Range returns this capture's node range in UTF-16 code-unit coordinates for trees produced by UTF-16 parse APIs.

type QueryCursor

type QueryCursor struct {
	// contains filtered or unexported fields
}

QueryCursor incrementally walks a node subtree and yields matches one by one. It is the streaming counterpart to Query.Execute and avoids materializing all matches up front. QueryCursor is not safe for concurrent use.

func (*QueryCursor) DidExceedMatchLimit added in v0.7.0

func (c *QueryCursor) DidExceedMatchLimit() bool

DidExceedMatchLimit reports whether query execution had additional matches beyond the configured match limit.

func (*QueryCursor) NextCapture

func (c *QueryCursor) NextCapture() (QueryCapture, bool)

NextCapture yields captures one at a time by draining NextMatch results. The ordering is a practical first pass: captures are returned in each match's capture order, and matches follow one another in DFS match order.

func (*QueryCursor) NextMatch

func (c *QueryCursor) NextMatch() (QueryMatch, bool)

NextMatch yields the next query match from the cursor.

func (*QueryCursor) SetByteRange added in v0.6.0

func (c *QueryCursor) SetByteRange(startByte, endByte uint32)

SetByteRange restricts matches to nodes that intersect [startByte, endByte).
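
As a minimal sketch of what "intersect [startByte, endByte)" means for half-open intervals, the test below treats both the node span and the window as half-open. The span type and intersects helper are hypothetical; how the cursor treats zero-width nodes is not stated here and is not modeled.

```go
package main

import "fmt"

// span is a hypothetical half-open byte interval [start, end).
type span struct{ start, end uint32 }

// intersects reports whether two half-open intervals share at least one byte.
func intersects(a, b span) bool {
	return a.start < b.end && b.start < a.end
}

func main() {
	window := span{10, 20}

	fmt.Println(intersects(span{5, 10}, window))  // false: ends where the window starts
	fmt.Println(intersects(span{5, 11}, window))  // true: shares byte 10
	fmt.Println(intersects(span{19, 30}, window)) // true: shares byte 19
	fmt.Println(intersects(span{20, 30}, window)) // false: the window excludes byte 20
}
```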

func (*QueryCursor) SetMatchLimit added in v0.7.0

func (c *QueryCursor) SetMatchLimit(limit uint32)

SetMatchLimit sets the maximum number of matches this cursor can return. A limit of 0 means unlimited.

func (*QueryCursor) SetMaxStartDepth added in v0.7.0

func (c *QueryCursor) SetMaxStartDepth(depth uint32)

SetMaxStartDepth limits the depth at which new matches can begin. Depth 0 means only the starting node passed to Exec.

func (*QueryCursor) SetPointRange added in v0.6.0

func (c *QueryCursor) SetPointRange(startPoint, endPoint Point)

SetPointRange restricts matches to nodes that intersect [startPoint, endPoint).
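
Point ranges follow the same half-open logic as byte ranges, with points ordered by row first and column second. The sketch below assumes that ordering; Point is a local stand-in for the documented type, and less and intersects are hypothetical helpers.

```go
package main

import "fmt"

// Point is a local stand-in mirroring the documented Row/Column shape.
type Point struct {
	Row    uint32
	Column uint32
}

// pointSpan is a hypothetical half-open span [start, end) in point coordinates.
type pointSpan struct{ start, end Point }

// less orders points by row, then by column within the same row.
func less(a, b Point) bool {
	if a.Row != b.Row {
		return a.Row < b.Row
	}
	return a.Column < b.Column
}

// intersects reports whether two half-open point spans overlap.
func intersects(a, b pointSpan) bool {
	return less(a.start, b.end) && less(b.start, a.end)
}

func main() {
	// Window covering all of row 1: from (1,0) up to, but excluding, (2,0).
	window := pointSpan{Point{Row: 1}, Point{Row: 2}}

	onRow0 := pointSpan{Point{Row: 0, Column: 0}, Point{Row: 0, Column: 12}}
	onRow1 := pointSpan{Point{Row: 1, Column: 4}, Point{Row: 1, Column: 8}}
	multi := pointSpan{Point{Row: 0, Column: 5}, Point{Row: 3, Column: 1}}

	fmt.Println(intersects(onRow0, window), intersects(onRow1, window), intersects(multi, window))
}
```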

func (*QueryCursor) SetUTF16Range added in v0.16.0

func (c *QueryCursor) SetUTF16Range(tree *Tree, startCodeUnit, endCodeUnit uint32) bool

SetUTF16Range restricts matches to nodes that intersect the given UTF-16 code-unit range. tree must have been produced by a UTF-16 parse API.

type QueryMatch

type QueryMatch struct {
	PatternIndex int
	Captures     []QueryCapture
}

QueryMatch represents a successful pattern match with its captures.

func (QueryMatch) SetValues added in v0.6.0

func (m QueryMatch) SetValues(q *Query, key string) []string

SetValues returns the values of a #set! directive with the given key for a match's pattern, or nil if not present. This is used by InjectionParser to read injection.language metadata.

type QueryPredicate

type QueryPredicate struct {
	// contains filtered or unexported fields
}

QueryPredicate is a post-match constraint attached to a pattern. Supported forms:

  • (#eq? @a @b)
  • (#eq? @a "literal")
  • (#not-eq? @a @b)
  • (#not-eq? @a "literal")
  • (#match? @a "regex")
  • (#not-match? @a "regex")
  • (#lua-match? @a "lua-pattern")
  • (#any-of? @a "v1" "v2" ...)
  • (#not-any-of? @a "v1" "v2" ...)
  • (#any-eq? @a "literal"), (#any-eq? @a @b)
  • (#any-not-eq? @a "literal"), (#any-not-eq? @a @b)
  • (#any-match? @a "regex")
  • (#any-not-match? @a "regex")
  • (#has-ancestor? @a type ...)
  • (#not-has-ancestor? @a type ...)
  • (#has-parent? @a type ...)
  • (#not-has-parent? @a type ...)
  • (#is? ...), (#is-not? ...)
  • (#set! key value), (#offset! @cap ...)
  • (#count? @a op value) -- op: >, <, >=, <=, ==, !=
  • (#is-exported? @a)
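
To illustrate the operator set listed for #count?, the sketch below evaluates "count op value" for the six operators. It assumes the predicate compares the number of nodes captured under a name against an integer; that reading, and the compareCount helper itself, are assumptions rather than the package's implementation.

```go
package main

import "fmt"

// compareCount evaluates "count op value" for the operators listed for
// #count?. It returns an error for any other operator.
func compareCount(count int, op string, value int) (bool, error) {
	switch op {
	case ">":
		return count > value, nil
	case "<":
		return count < value, nil
	case ">=":
		return count >= value, nil
	case "<=":
		return count <= value, nil
	case "==":
		return count == value, nil
	case "!=":
		return count != value, nil
	default:
		return false, fmt.Errorf("unsupported #count? operator %q", op)
	}
}

func main() {
	// Suppose a match captured three nodes under @param.
	captured := 3

	ok, err := compareCount(captured, ">=", 2)
	fmt.Println(ok, err)

	_, err = compareCount(captured, "~=", 2)
	fmt.Println(err != nil)
}
```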

type QueryStep

type QueryStep struct {
	// contains filtered or unexported fields
}

QueryStep is one matching instruction within a pattern.

type Range

type Range struct {
	StartByte  uint32
	EndByte    uint32
	StartPoint Point
	EndPoint   Point
}

Range is a span of source text.

func DiffChangedRanges added in v0.6.0

func DiffChangedRanges(oldTree, newTree *Tree) []Range

DiffChangedRanges compares two syntax trees and returns the minimal ranges where syntactic structure differs. The old tree should have been edited (via Tree.Edit) to match the new tree's source positions before reparsing.

This is equivalent to C tree-sitter's ts_tree_get_changed_ranges().

func IncludedRangesForUTF16 added in v0.16.0

func IncludedRangesForUTF16(source []uint16, ranges []UTF16Range) ([]Range, bool)

IncludedRangesForUTF16 converts UTF-16 included ranges into the parser's internal UTF-8 byte ranges. The returned Range points use UTF-8 columns.

func IncludedRangesForUTF16Bytes added in v0.16.0

func IncludedRangesForUTF16Bytes(source []byte, order UTF16ByteOrder, ranges []UTF16Range) ([]Range, error)

IncludedRangesForUTF16Bytes converts endian-specific UTF-16 byte ranges into the parser's internal UTF-8 byte ranges. The returned Range points use UTF-8 columns.

type ReduceChainHint added in v0.19.0

type ReduceChainHint struct {
	StartState     StateID
	Lookahead      Symbol
	TerminalStates []StateID
	TerminalAction ReduceChainTerminalAction
	MaxSteps       uint16
}

ReduceChainHint describes a terminal-verified parser hot path for a deterministic reduce chain. The runtime still applies normal reduce semantics and stops before the terminal action; this metadata only lets it avoid repeated generic action dispatch for approved state/lookahead pairs.

type ReduceChainTerminalAction added in v0.19.0

type ReduceChainTerminalAction uint8

ReduceChainTerminalAction describes the action class expected after a generated reduce-chain hint finishes applying deterministic reductions.

const (
	ReduceChainTerminalNoAction ReduceChainTerminalAction = iota
	ReduceChainTerminalSingleReduce
	ReduceChainTerminalSingleShift
	ReduceChainTerminalSingleAccept
	ReduceChainTerminalSingleOther
	ReduceChainTerminalMulti
)

type ReduceChildPathRuntime added in v0.18.0

type ReduceChildPathRuntime struct {
	SlicesAllocated   uint64
	SlicesRetained    uint64
	SlicesDropped     uint64
	PointersAllocated uint64
	PointersRetained  uint64
	PointersDropped   uint64
}

type Rewriter added in v0.6.0

type Rewriter struct {
	// contains filtered or unexported fields
}

Rewriter collects source-text edits and applies them atomically. Edits target byte ranges (usually from Node.StartByte/EndByte). Apply returns new source bytes and InputEdit records for incremental reparsing. Rewriter is not safe for concurrent use.

func NewRewriter added in v0.6.0

func NewRewriter(source []byte) *Rewriter

NewRewriter creates a Rewriter for the given source text.

func (*Rewriter) Apply added in v0.6.0

func (r *Rewriter) Apply() (newSource []byte, edits []InputEdit, err error)

Apply sorts edits, validates that none overlap, applies them, and returns the new source bytes plus InputEdit records for incremental reparsing. It returns an error if any edits overlap.
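
The sort, validate, apply sequence can be sketched without the package. The edit type and applyEdits function below are hypothetical stand-ins that work on plain byte offsets; they do not produce InputEdit records and are not the Rewriter's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// edit is a hypothetical stand-in: replace src[start:end] with text.
type edit struct {
	start, end int
	text       string
}

// applyEdits sorts edits by start offset, rejects overlapping or
// out-of-range edits, and splices the replacements in a single pass.
func applyEdits(src []byte, edits []edit) ([]byte, error) {
	sorted := append([]edit(nil), edits...) // leave the caller's slice untouched
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	var out []byte
	pos := 0 // first byte of src not yet copied or replaced
	for _, e := range sorted {
		if e.start < pos {
			return nil, errors.New("overlapping edits")
		}
		if e.start > e.end || e.end > len(src) {
			return nil, errors.New("edit out of range")
		}
		out = append(out, src[pos:e.start]...)
		out = append(out, e.text...)
		pos = e.end
	}
	return append(out, src[pos:]...), nil
}

func main() {
	src := []byte("func old() {}")

	out, err := applyEdits(src, []edit{
		{start: 5, end: 8, text: "renamed"},  // replace "old"
		{start: 0, end: 0, text: "// doc\n"}, // insert before the declaration
	})
	fmt.Printf("%q %v\n", out, err)

	_, err = applyEdits(src, []edit{{0, 6, "x"}, {5, 8, "y"}})
	fmt.Println(err)
}
```

Because every offset refers to the original buffer, edits can be queued in any order and are resolved together, which is why overlap has to be rejected rather than applied sequentially.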

func (*Rewriter) ApplyToTree added in v0.6.0

func (r *Rewriter) ApplyToTree(tree *Tree) ([]byte, error)

ApplyToTree is a convenience method that calls Apply, then tree.Edit for each resulting edit, and returns the new source, ready for ParseIncremental.

func (*Rewriter) Delete added in v0.6.0

func (r *Rewriter) Delete(node *Node)

Delete removes the source text covered by node.

func (*Rewriter) InsertAfter added in v0.6.0

func (r *Rewriter) InsertAfter(node *Node, text []byte)

InsertAfter inserts text immediately after node.

func (*Rewriter) InsertBefore added in v0.6.0

func (r *Rewriter) InsertBefore(node *Node, text []byte)

InsertBefore inserts text immediately before node.

func (*Rewriter) Replace added in v0.6.0

func (r *Rewriter) Replace(node *Node, newText []byte)

Replace replaces the source text covered by node with newText.

func (*Rewriter) ReplaceRange added in v0.6.0

func (r *Rewriter) ReplaceRange(startByte, endByte uint32, newText []byte)

ReplaceRange replaces bytes in [startByte, endByte) with newText.

type StateID

type StateID uint32

StateID is a parser state index. The uint32 width supports grammars with more than 65,535 states, which would overflow a uint16 (e.g. COBOL, with 67K states from 1071 rules).

type Symbol

type Symbol uint16

Symbol is a grammar symbol ID (terminal or nonterminal).

type SymbolMetadata

type SymbolMetadata struct {
	Name      string
	Visible   bool
	Named     bool
	Supertype bool
}

SymbolMetadata holds display information about a symbol.

type Tag

type Tag struct {
	Kind      string // e.g. "definition.function", "reference.call"
	Name      string // the captured symbol text
	Range     Range  // full span of the tagged node
	NameRange Range  // span of the @name capture
}

Tag represents a tagged symbol in source code, extracted by a Tagger. Kind follows tree-sitter convention: "definition.function", "reference.call", etc. Name is the captured symbol text (e.g., the function name).

type Tagger

type Tagger struct {
	// contains filtered or unexported fields
}

Tagger extracts symbol definitions and references from source code using tree-sitter tags queries. It is the tagging counterpart to Highlighter.

Tags queries use a convention where captures follow the pattern:

  • @name captures the symbol name (e.g., function identifier)
  • @definition.X or @reference.X captures the kind

Example query:

(function_declaration name: (identifier) @name) @definition.function
(call_expression function: (identifier) @name) @reference.call

func NewTagger

func NewTagger(lang *Language, tagsQuery string, opts ...TaggerOption) (*Tagger, error)

NewTagger creates a Tagger for the given language and tags query.

func (*Tagger) Tag

func (tg *Tagger) Tag(source []byte) []Tag

Tag parses source and returns all tags.

func (*Tagger) TagIncremental

func (tg *Tagger) TagIncremental(source []byte, oldTree *Tree) ([]Tag, *Tree)

TagIncremental re-tags source after edits to oldTree. Returns the tags and the new tree for subsequent incremental calls.

func (*Tagger) TagIncrementalUTF16 added in v0.16.0

func (tg *Tagger) TagIncrementalUTF16(source []uint16, oldTree *Tree) ([]UTF16Tag, *Tree)

TagIncrementalUTF16 re-tags UTF-16 source after edits to oldTree. Call oldTree.EditUTF16 before calling this.

func (*Tagger) TagIncrementalUTF16Bytes added in v0.16.0

func (tg *Tagger) TagIncrementalUTF16Bytes(source []byte, oldTree *Tree, order UTF16ByteOrder) ([]UTF16Tag, *Tree, error)

TagIncrementalUTF16Bytes is like TagIncrementalUTF16 for endian-specific UTF-16 bytes.

func (*Tagger) TagTree

func (tg *Tagger) TagTree(tree *Tree) []Tag

TagTree extracts tags from an already-parsed tree.

func (*Tagger) TagTreeUTF16 added in v0.16.0

func (tg *Tagger) TagTreeUTF16(tree *Tree) []UTF16Tag

TagTreeUTF16 extracts tags from an already-parsed UTF-16 tree.

func (*Tagger) TagUTF16 added in v0.16.0

func (tg *Tagger) TagUTF16(source []uint16) []UTF16Tag

TagUTF16 parses UTF-16 source and returns all tags with UTF-16 ranges.

func (*Tagger) TagUTF16Bytes added in v0.16.0

func (tg *Tagger) TagUTF16Bytes(source []byte, order UTF16ByteOrder) ([]UTF16Tag, error)

TagUTF16Bytes is like TagUTF16 for endian-specific UTF-16 bytes.

type TaggerOption

type TaggerOption func(*Tagger)

TaggerOption configures a Tagger.

func WithTaggerTokenSourceFactory

func WithTaggerTokenSourceFactory(factory func(source []byte) TokenSource) TaggerOption

WithTaggerTokenSourceFactory sets a factory function that creates a TokenSource for each Tag call.

type Token

type Token struct {
	Symbol     Symbol
	Text       string
	StartByte  uint32
	EndByte    uint32
	StartPoint Point
	EndPoint   Point
	Missing    bool
	// NoLookahead marks a synthetic EOF used to force EOF-table reductions
	// without consuming input, matching tree-sitter's lex_state = -1.
	NoLookahead bool
}

Token is a lexed token with position info.

type TokenSource

type TokenSource interface {
	// Next returns the next token. It should skip whitespace and comments
	// as appropriate for the language. Returns a zero-Symbol token at EOF.
	Next() Token
}

TokenSource provides tokens to the parser. This interface abstracts over different lexer implementations: the built-in DFA lexer (for hand-built grammars) or custom bridges like GoTokenSource (for real grammars where we can't extract the C lexer DFA).

type TokenSourceFactory added in v0.16.0

type TokenSourceFactory func(source []byte) (TokenSource, error)

TokenSourceFactory builds a token source for parser source bytes.

type TokenSourceRebuilder added in v0.7.0

type TokenSourceRebuilder interface {
	RebuildTokenSource(source []byte, lang *Language) (TokenSource, error)
}

TokenSourceRebuilder is an optional extension for token sources that can build a fresh equivalent token source for another source buffer. Result normalization uses this to reparse isolated fragments with the same lexer backend as the original parse.

type Tree

type Tree struct {
	// contains filtered or unexported fields
}

Tree holds a complete syntax tree along with its source text and language. Tree is safe for concurrent reads after construction. Edit and Release are not safe for concurrent use.

func NewTree

func NewTree(root *Node, source []byte, lang *Language) *Tree

NewTree creates a new Tree.

func (*Tree) ArenaBreakdown added in v0.18.0

func (t *Tree) ArenaBreakdown() (ArenaBreakdown, bool)

ArenaBreakdown returns optional arena/materialization attribution captured when EnableArenaBreakdown(true) was set before parsing.

func (*Tree) ChangedRanges added in v0.6.0

func (t *Tree) ChangedRanges() []Range

ChangedRanges converts this tree's recorded edits into changed source ranges. Overlapping ranges are coalesced.
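
Coalescing overlapping ranges is a standard interval merge. The sketch below uses a hypothetical byteRange type with only byte offsets, and it assumes that ranges which merely touch are merged too; whether ChangedRanges does the same is not stated here.

```go
package main

import (
	"fmt"
	"sort"
)

// byteRange is a hypothetical half-open byte interval [start, end).
type byteRange struct{ start, end uint32 }

// coalesce sorts ranges by start and merges any that overlap or touch.
func coalesce(ranges []byteRange) []byteRange {
	if len(ranges) == 0 {
		return nil
	}
	sorted := append([]byteRange(nil), ranges...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	out := []byteRange{sorted[0]}
	for _, r := range sorted[1:] {
		last := &out[len(out)-1]
		if r.start <= last.end {
			// Overlapping or touching: extend the previous range if needed.
			if r.end > last.end {
				last.end = r.end
			}
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(coalesce([]byteRange{{10, 20}, {0, 5}, {15, 30}, {40, 50}}))
}
```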

func (*Tree) Copy added in v0.7.0

func (t *Tree) Copy() *Tree

Copy returns an independent copy of this tree.

The copied tree has distinct node objects, so subsequent Tree.Edit calls on either tree do not mutate the other's spans/dirty bits. Source bytes and language pointer are shared (read-only).

func (*Tree) DOT added in v0.7.0

func (t *Tree) DOT(lang *Language) string

DOT returns a DOT graph representation of this tree.

func (*Tree) DescendantForUTF16Range added in v0.16.0

func (t *Tree) DescendantForUTF16Range(startCodeUnit, endCodeUnit uint32) *Node

DescendantForUTF16Range returns the smallest descendant that fully contains the given UTF-16 code-unit range, or nil when no such descendant exists.

func (*Tree) Edit

func (t *Tree) Edit(edit InputEdit)

Edit records an edit on this tree. Call this before ParseIncremental to inform the parser which regions changed. The edit adjusts byte offsets and marks overlapping nodes as dirty so the incremental parser knows what to re-parse.
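
The offset adjustment and dirty marking can be illustrated on bare byte spans. In the sketch below, edit and span are hypothetical types with byte offsets only (no points), and the rule that spans touching the edited region are marked dirty is an assumption of the sketch, not a statement about Tree.Edit's exact behavior.

```go
package main

import "fmt"

// edit is a hypothetical byte-only edit: src[startByte:oldEndByte] was
// replaced by text that now ends at newEndByte.
type edit struct{ startByte, oldEndByte, newEndByte uint32 }

// span is a hypothetical node extent with a dirty flag.
type span struct {
	start, end uint32
	dirty      bool
}

// shift maps a position in the old source to the new source.
func shift(pos uint32, e edit) uint32 {
	switch {
	case pos >= e.oldEndByte:
		// At or after the edited region: move by the size change.
		return uint32(int64(pos) + int64(e.newEndByte) - int64(e.oldEndByte))
	case pos > e.startByte:
		// Strictly inside the edited region: clamp to its new end.
		return e.newEndByte
	default:
		return pos
	}
}

// applyEdit shifts a span and marks it dirty if it overlaps or touches
// the edited region.
func applyEdit(s span, e edit) span {
	touched := s.start <= e.oldEndByte && s.end >= e.startByte
	return span{start: shift(s.start, e), end: shift(s.end, e), dirty: s.dirty || touched}
}

func main() {
	// Bytes [5,8) were replaced by 7 bytes, so the region now ends at 12.
	e := edit{startByte: 5, oldEndByte: 8, newEndByte: 12}

	fmt.Println(applyEdit(span{start: 0, end: 4}, e))  // before the edit
	fmt.Println(applyEdit(span{start: 5, end: 8}, e))  // the edited node
	fmt.Println(applyEdit(span{start: 9, end: 13}, e)) // after the edit
}
```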

func (*Tree) EditUTF16 added in v0.16.0

func (t *Tree) EditUTF16(edit UTF16Edit, newSource []uint16) bool

EditUTF16 records a UTF-16 code-unit edit on a UTF-16 tree.

newSource is the full source after the edit; it is used to derive the internal UTF-8 endpoint for NewEndCodeUnit.

func (*Tree) Edits

func (t *Tree) Edits() []InputEdit

Edits returns the pending edits recorded on this tree.

func (*Tree) InputEditForUTF16 added in v0.16.0

func (t *Tree) InputEditForUTF16(edit UTF16Edit, newSource []uint16) (InputEdit, bool)

InputEditForUTF16 converts a UTF-16 code-unit edit into the parser's internal UTF-8 byte-coordinate edit. The tree must have been produced by ParseUTF16.

func (*Tree) Language

func (t *Tree) Language() *Language

Language returns the language used to parse this tree.

func (*Tree) NamedDescendantForUTF16Range added in v0.16.0

func (t *Tree) NamedDescendantForUTF16Range(startCodeUnit, endCodeUnit uint32) *Node

NamedDescendantForUTF16Range returns the smallest named descendant that fully contains the given UTF-16 code-unit range, or nil when no such descendant exists.

func (*Tree) ParseRuntime added in v0.6.0

func (t *Tree) ParseRuntime() ParseRuntime

ParseRuntime returns parser-loop diagnostics captured when this tree was built.

func (*Tree) ParseStopReason added in v0.6.0

func (t *Tree) ParseStopReason() ParseStopReason

ParseStopReason reports why parsing terminated.

func (*Tree) ParseStoppedEarly added in v0.6.0

func (t *Tree) ParseStoppedEarly() bool

ParseStoppedEarly reports whether parsing hit an early-stop condition.

func (*Tree) Release

func (t *Tree) Release()

Release decrements arena references held by this tree. After Release, the tree should be treated as invalid and not reused.

func (*Tree) RootNode

func (t *Tree) RootNode() *Node

RootNode returns the tree's root node.

func (*Tree) RootNodeWithOffset added in v0.7.0

func (t *Tree) RootNodeWithOffset(offsetBytes uint32, offsetExtent Point) *Node

RootNodeWithOffset returns a copy of the root node with all spans shifted by the provided byte and point offsets.

This mirrors tree-sitter C's root-node-with-offset behavior for callers that need to embed a parsed tree at a larger document offset.

func (*Tree) Source

func (t *Tree) Source() []byte

Source returns the original source text.

func (*Tree) SourceEncoding added in v0.16.0

func (t *Tree) SourceEncoding() InputEncoding

SourceEncoding returns the encoding used by the caller that produced this tree.

For UTF-16 parses, Source still returns the parser's canonical UTF-8 copy. Use SourceUTF16 and UTF16RangeForNode when caller-facing UTF-16 coordinates are needed.

func (*Tree) SourceUTF16 added in v0.16.0

func (t *Tree) SourceUTF16() []uint16

SourceUTF16 returns the original UTF-16 source for trees produced by ParseUTF16. It returns nil for ordinary UTF-8 parses.

func (*Tree) UTF8ByteForUTF16Offset added in v0.16.0

func (t *Tree) UTF8ByteForUTF16Offset(offset uint32) (uint32, bool)

UTF8ByteForUTF16Offset converts a UTF-16 code-unit offset to the parser's canonical UTF-8 byte offset for trees produced by ParseUTF16.
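
The conversion itself can be sketched with the standard unicode/utf16 and unicode/utf8 packages. The helper below is hypothetical and operates on a raw []uint16 rather than a Tree; returning false for an offset that splits a surrogate pair, and encoding lone surrogates as U+FFFD, are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// utf8ByteForUTF16Offset converts a code-unit offset into units to the byte
// offset of the same position in the UTF-8 encoding of units. It returns
// false if offset is out of range or falls inside a surrogate pair.
func utf8ByteForUTF16Offset(units []uint16, offset int) (int, bool) {
	if offset < 0 || offset > len(units) {
		return 0, false
	}
	bytes := 0
	for i := 0; i < offset; {
		r := rune(units[i])
		width := 1
		if utf16.IsSurrogate(r) {
			// A lone surrogate is encoded as U+FFFD, as utf16.Decode does.
			r = utf8.RuneError
			if i+1 < len(units) {
				dec := utf16.DecodeRune(rune(units[i]), rune(units[i+1]))
				if dec != utf8.RuneError {
					r, width = dec, 2
				}
			}
		}
		if i+width > offset {
			return 0, false // offset splits a surrogate pair
		}
		bytes += utf8.RuneLen(r)
		i += width
	}
	return bytes, true
}

func main() {
	// 'a' (1 unit), U+00E9 (1 unit), U+1F600 (2 units), 'b' (1 unit).
	units := utf16.Encode([]rune("a\u00e9\U0001F600b"))
	for off := 0; off <= len(units); off++ {
		b, ok := utf8ByteForUTF16Offset(units, off)
		fmt.Println(off, b, ok)
	}
}
```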

func (*Tree) UTF16OffsetForByte added in v0.16.0

func (t *Tree) UTF16OffsetForByte(offset uint32) (uint32, bool)

UTF16OffsetForByte converts a parser UTF-8 byte offset to a UTF-16 code-unit offset for trees produced by ParseUTF16.

func (*Tree) UTF16PointForByte added in v0.16.0

func (t *Tree) UTF16PointForByte(offset uint32) (Point, bool)

UTF16PointForByte converts a parser UTF-8 byte offset to a UTF-16 point.

func (*Tree) UTF16RangeForByteRange added in v0.16.0

func (t *Tree) UTF16RangeForByteRange(startByte, endByte uint32) (UTF16Range, bool)

UTF16RangeForByteRange converts a canonical UTF-8 byte range into UTF-16 code-unit coordinates.

func (*Tree) UTF16RangeForNode added in v0.16.0

func (t *Tree) UTF16RangeForNode(n *Node) (UTF16Range, bool)

UTF16RangeForNode returns a node range in UTF-16 code-unit coordinates.

func (*Tree) UTF16RangeForRange added in v0.16.0

func (t *Tree) UTF16RangeForRange(r Range) (UTF16Range, bool)

UTF16RangeForRange converts a canonical UTF-8 Range into UTF-16 code-unit coordinates.

func (*Tree) UTF16SourceForNode added in v0.16.0

func (t *Tree) UTF16SourceForNode(n *Node) ([]uint16, bool)

UTF16SourceForNode returns the original UTF-16 code units covered by n.

func (*Tree) WriteDOT added in v0.7.0

func (t *Tree) WriteDOT(w io.Writer, lang *Language) error

WriteDOT writes a DOT graph representation of this tree to w.

type TreeCursor added in v0.6.0

type TreeCursor struct {
	// contains filtered or unexported fields
}

TreeCursor provides stateful, O(1) tree navigation. It maintains a stack of (node, childIndex) frames enabling efficient parent, child, and sibling movement without scanning.

The cursor holds pointers to Nodes. If the underlying Tree is released, edited, or replaced via incremental reparse, the cursor should be recreated.
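
A stack of (node, childIndex) frames can be sketched in a few lines. The node, frame, and cursor types below are hypothetical stand-ins with lowercase method names; they illustrate why parent and sibling moves need no scanning, and are not the package's TreeCursor.

```go
package main

import "fmt"

// node is a hypothetical tree node.
type node struct {
	kind     string
	children []*node
}

// frame records a node and its index within its parent's children.
type frame struct {
	n          *node
	childIndex int
}

// cursor keeps one frame per level from the root to the current node.
type cursor struct{ stack []frame }

func newCursor(root *node) *cursor { return &cursor{stack: []frame{{n: root}}} }

func (c *cursor) current() *node { return c.stack[len(c.stack)-1].n }

func (c *cursor) gotoFirstChild() bool {
	cur := c.current()
	if len(cur.children) == 0 {
		return false
	}
	c.stack = append(c.stack, frame{n: cur.children[0], childIndex: 0})
	return true
}

func (c *cursor) gotoNextSibling() bool {
	if len(c.stack) < 2 {
		return false // the root has no siblings
	}
	top := &c.stack[len(c.stack)-1]
	parent := c.stack[len(c.stack)-2].n
	next := top.childIndex + 1
	if next >= len(parent.children) {
		return false
	}
	top.n, top.childIndex = parent.children[next], next
	return true
}

func (c *cursor) gotoParent() bool {
	if len(c.stack) < 2 {
		return false
	}
	c.stack = c.stack[:len(c.stack)-1]
	return true
}

// walk visits every node in pre-order using only cursor moves.
func walk(root *node) []string {
	c := newCursor(root)
	var kinds []string
	for {
		kinds = append(kinds, c.current().kind)
		if c.gotoFirstChild() {
			continue
		}
		for !c.gotoNextSibling() {
			if !c.gotoParent() {
				return kinds
			}
		}
	}
}

func main() {
	root := &node{kind: "root", children: []*node{
		{kind: "a", children: []*node{{kind: "b"}}},
		{kind: "c"},
	}}
	fmt.Println(walk(root))
}
```

The same loop shape (first child, else next sibling, else climb until a sibling exists) is the usual way to drive a full traversal with a cursor of this kind.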

func NewTreeCursor added in v0.6.0

func NewTreeCursor(node *Node, tree *Tree) *TreeCursor

NewTreeCursor creates a cursor starting at the given node. The tree reference is optional; when supplied, it enables field name resolution and text extraction.

func NewTreeCursorFromTree added in v0.6.0

func NewTreeCursorFromTree(tree *Tree) *TreeCursor

NewTreeCursorFromTree creates a cursor starting at the tree's root node.

func (*TreeCursor) Copy added in v0.6.0

func (c *TreeCursor) Copy() *TreeCursor

Copy returns an independent copy of the cursor. The copy shares the same tree reference but has its own navigation stack.

func (*TreeCursor) CurrentFieldID added in v0.6.0

func (c *TreeCursor) CurrentFieldID() FieldID

CurrentFieldID returns the field ID of the current node within its parent. Returns 0 if the cursor is at the root or the node has no field assignment.

func (*TreeCursor) CurrentFieldName added in v0.6.0

func (c *TreeCursor) CurrentFieldName() string

CurrentFieldName returns the field name of the current node within its parent. Returns "" if no tree is associated, the cursor is at the root, or the node has no field assignment.

func (*TreeCursor) CurrentNode added in v0.6.0

func (c *TreeCursor) CurrentNode() *Node

CurrentNode returns the node the cursor is currently pointing to.

func (*TreeCursor) CurrentNodeIsNamed added in v0.6.0

func (c *TreeCursor) CurrentNodeIsNamed() bool

CurrentNodeIsNamed returns whether the current node is a named node.

func (*TreeCursor) CurrentNodeText added in v0.6.0

func (c *TreeCursor) CurrentNodeText() string

CurrentNodeText returns the source text of the current node. Requires a tree with source to be associated.

func (*TreeCursor) CurrentNodeType added in v0.6.0

func (c *TreeCursor) CurrentNodeType() string

CurrentNodeType returns the type name of the current node. Requires a tree with a language to be associated.

func (*TreeCursor) Depth added in v0.6.0

func (c *TreeCursor) Depth() int

Depth returns the cursor's current depth (0 at the root).

func (*TreeCursor) GotoChildByFieldID added in v0.6.0

func (c *TreeCursor) GotoChildByFieldID(fid FieldID) bool

GotoChildByFieldID moves the cursor to the first child with the given field ID. Returns false if no child has that field.

func (*TreeCursor) GotoChildByFieldName added in v0.6.0

func (c *TreeCursor) GotoChildByFieldName(name string) bool

GotoChildByFieldName moves the cursor to the first child with the given field name. Returns false if the tree has no language, the field name is unknown, or no child has that field.

func (*TreeCursor) GotoFirstChild added in v0.6.0

func (c *TreeCursor) GotoFirstChild() bool

GotoFirstChild moves the cursor to the first child of the current node. Returns false if the current node has no children.

func (*TreeCursor) GotoFirstChildForByte added in v0.6.0

func (c *TreeCursor) GotoFirstChildForByte(targetByte uint32) int64

GotoFirstChildForByte moves the cursor to the first child whose byte range contains targetByte (i.e., first child where endByte > targetByte). Returns the child index, or -1 when no child contains the byte.
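
The selection rule (first child where endByte > targetByte, else -1) can be sketched over a slice of child extents. The child type and firstChildForByte function are hypothetical; the sketch assumes children are in source order so that a binary search applies, and does not claim the cursor uses one.

```go
package main

import (
	"fmt"
	"sort"
)

// child is a hypothetical child extent [startByte, endByte).
type child struct{ startByte, endByte uint32 }

// firstChildForByte returns the index of the first child whose endByte is
// greater than target, or -1 if there is none. Children are assumed to be
// in source order, so their end bytes are non-decreasing.
func firstChildForByte(children []child, target uint32) int64 {
	i := sort.Search(len(children), func(i int) bool {
		return children[i].endByte > target
	})
	if i == len(children) {
		return -1
	}
	return int64(i)
}

func main() {
	children := []child{{0, 4}, {5, 9}, {10, 12}}

	fmt.Println(firstChildForByte(children, 2))  // inside the first child
	fmt.Println(firstChildForByte(children, 4))  // in the gap: next child is selected
	fmt.Println(firstChildForByte(children, 11)) // inside the last child
	fmt.Println(firstChildForByte(children, 12)) // past every child
}
```

Note that under the stated rule a target that falls in a gap between children selects the next child, since only the end byte is compared.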

func (*TreeCursor) GotoFirstChildForPoint added in v0.6.0

func (c *TreeCursor) GotoFirstChildForPoint(targetPoint Point) int64

GotoFirstChildForPoint moves the cursor to the first child whose point range contains targetPoint (i.e., first child where endPoint > targetPoint). Returns the child index, or -1 when no child contains the point.

func (*TreeCursor) GotoFirstNamedChild added in v0.6.0

func (c *TreeCursor) GotoFirstNamedChild() bool

GotoFirstNamedChild moves the cursor to the first named child of the current node, skipping anonymous nodes. Returns false if no named child exists.

func (*TreeCursor) GotoLastChild added in v0.6.0

func (c *TreeCursor) GotoLastChild() bool

GotoLastChild moves the cursor to the last child of the current node. Returns false if the current node has no children.

func (*TreeCursor) GotoLastNamedChild added in v0.6.0

func (c *TreeCursor) GotoLastNamedChild() bool

GotoLastNamedChild moves the cursor to the last named child of the current node, skipping anonymous nodes. Returns false if no named child exists.

func (*TreeCursor) GotoNextNamedSibling added in v0.6.0

func (c *TreeCursor) GotoNextNamedSibling() bool

GotoNextNamedSibling moves the cursor to the next named sibling, skipping anonymous nodes. Returns false if no named sibling follows.

func (*TreeCursor) GotoNextSibling added in v0.6.0

func (c *TreeCursor) GotoNextSibling() bool

GotoNextSibling moves the cursor to the next sibling. Returns false if the cursor is at the root or the last sibling.

func (*TreeCursor) GotoParent added in v0.6.0

func (c *TreeCursor) GotoParent() bool

GotoParent moves the cursor to the parent of the current node. Returns false if the cursor is at the root.

func (*TreeCursor) GotoPrevNamedSibling added in v0.6.0

func (c *TreeCursor) GotoPrevNamedSibling() bool

GotoPrevNamedSibling moves the cursor to the previous named sibling, skipping anonymous nodes. Returns false if no named sibling precedes.

func (*TreeCursor) GotoPrevSibling added in v0.6.0

func (c *TreeCursor) GotoPrevSibling() bool

GotoPrevSibling moves the cursor to the previous sibling. Returns false if the cursor is at the root or the first sibling.

func (*TreeCursor) Reset added in v0.6.0

func (c *TreeCursor) Reset(node *Node)

Reset resets the cursor to a new root node, clearing the navigation stack.

func (*TreeCursor) ResetTree added in v0.6.0

func (c *TreeCursor) ResetTree(tree *Tree)

ResetTree resets the cursor to the root of a new tree.

type UTF16ByteOrder added in v0.16.0

type UTF16ByteOrder uint8

UTF16ByteOrder identifies the byte order used by a UTF-16 byte source.

const (
	UTF16LittleEndian UTF16ByteOrder = iota
	UTF16BigEndian
)
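To illustrate what the byte order selects, the sketch below decodes a UTF-16 byte source into a Go string using only the standard library. The `byteOrder` type and `decodeUTF16` function are hypothetical stand-ins that mirror the two constants above; they are not the package's own decoder.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// byteOrder is a local stand-in mirroring UTF16ByteOrder.
type byteOrder uint8

const (
	littleEndian byteOrder = iota
	bigEndian
)

// decodeUTF16 reads b as a sequence of 16-bit code units in the given byte
// order and returns the decoded text. A trailing odd byte is ignored.
func decodeUTF16(b []byte, order byteOrder) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		if order == bigEndian {
			units = append(units, binary.BigEndian.Uint16(b[i:]))
		} else {
			units = append(units, binary.LittleEndian.Uint16(b[i:]))
		}
	}
	return string(utf16.Decode(units))
}

func main() {
	// The same two code units, 0x0067 and 0x006f, in each byte order.
	fmt.Println(decodeUTF16([]byte{0x67, 0x00, 0x6f, 0x00}, littleEndian)) // go
	fmt.Println(decodeUTF16([]byte{0x00, 0x67, 0x00, 0x6f}, bigEndian))    // go
}
```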

func (UTF16ByteOrder) String added in v0.16.0

func (o UTF16ByteOrder) String() string

type UTF16Edit added in v0.16.0

type UTF16Edit struct {
	StartCodeUnit  uint32
	OldEndCodeUnit uint32
	NewEndCodeUnit uint32
}

UTF16Edit describes a source edit in UTF-16 code-unit offsets.

type UTF16HighlightRange added in v0.16.0

type UTF16HighlightRange struct {
	StartCodeUnit uint32
	EndCodeUnit   uint32
	StartPoint    Point
	EndPoint      Point
	Capture       string
	PatternIndex  int
}

UTF16HighlightRange is a styled source range in UTF-16 code-unit coordinates.

type UTF16Injection added in v0.16.0

type UTF16Injection struct {
	// Language is the detected language name (e.g., "javascript").
	Language string
	// Tree is the parse tree for this region, or nil if the language
	// was not registered.
	Tree *Tree
	// Ranges are the source ranges this tree covers in UTF-16 code units.
	Ranges []UTF16Range
	// Node is the parent tree node that triggered the injection.
	Node *Node
}

UTF16Injection is a single embedded language region with ranges in UTF-16 code-unit coordinates.

type UTF16InjectionResult added in v0.16.0

type UTF16InjectionResult struct {
	// Tree is the parent language's parse tree.
	Tree *Tree
	// Injections contains child language parse results, ordered by position.
	Injections []UTF16Injection
	// contains filtered or unexported fields
}

UTF16InjectionResult holds parse results for a UTF-16 multi-language document. Injection ranges are expressed in UTF-16 code units.

type UTF16Range added in v0.16.0

type UTF16Range struct {
	StartCodeUnit uint32
	EndCodeUnit   uint32
	StartPoint    Point
	EndPoint      Point
}

UTF16Range is a source range in UTF-16 code units.

StartPoint and EndPoint use UTF-16 code-unit columns, matching the coordinate system used by many editors and LSP clients.

type UTF16Tag added in v0.16.0

type UTF16Tag struct {
	Kind      string
	Name      string
	Range     UTF16Range
	NameRange UTF16Range
}

UTF16Tag represents a tagged symbol with ranges in UTF-16 code-unit coordinates.

type WalkAction

type WalkAction int

WalkAction controls how a tree walk proceeds: continue, skip the current node's children, or stop.

const (
	// WalkContinue continues the walk to children and siblings.
	WalkContinue WalkAction = iota
	// WalkSkipChildren skips the current node's children but continues to siblings.
	WalkSkipChildren
	// WalkStop terminates the walk entirely.
	WalkStop
)

Directories

Path Synopsis
cgo_harness module
cmd
benchgate command
benchmatrix command
gen_linguist command
Command gen_linguist generates grammars/linguist_gen.go by matching gotreesitter grammar names to GitHub Linguist's languages.yml.
gen_subset_blob_embeds command
Command gen_subset_blob_embeds generates the per-language z_subset_blob_embed_<lang>.go files that power embedded grammar_subset builds (issue #88: per-language compile-time grammar selection).
grammar_update_guard command
Command grammar_update_guard checks lock-update reports for scanner-facing changes that require hand-written scanner review before grammar blobs move.
grammar_updater command
Command grammar_updater refreshes pinned grammar commits in grammars/languages.lock and emits a machine-readable update report.
grammarblobprobe command
Command grammarblobprobe is a minimal binary that blank-imports the grammars package so that whatever grammar blobs are embedded by the active build tags are linked into the binary.
grammargen command
Command grammargen generates tree-sitter parser artifacts from grammar definitions.
harnessgate command
parity_report command
ts2go command
Command ts2go reads a tree-sitter generated parser.c file and outputs a Go source file containing a function that returns a populated *gotreesitter.Language with all extracted parse tables.
tsquery command
Command tsquery generates type-safe Go code from tree-sitter .scm query files.
Package grammargen implements a pure-Go grammar generator for gotreesitter.
Package grammars provides built-in and extension tree-sitter grammars with lazy loading.
Package grep provides structural code search, match, and rewrite using tree-sitter parse trees.
Package taproot is the common front-end harness shared by M31 DSLs that use the gotreesitter runtime.
diag
Package diag provides a generic structured diagnostic type and a source-quoting renderer.
wasm
grammargen command
runtime command
