parser

package
v0.2.0
Warning

This package is not in the latest version of its module.

Published: Jun 4, 2026 License: MIT Imports: 2 Imported by: 0

Documentation

Overview

Package parser defines the engine-agnostic parsing port and the concrete syntax tree (CST) data-transfer objects Båge uses to locate byte ranges.

This drop contains only the interface and the DTOs. The official CGO go-tree-sitter adapter that implements ParserPort lands in a later, dependency-gated drop (see docs/adr/0002). Nothing here depends on cgo or on any third-party package.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type ByteRange

type ByteRange struct {
	// Start is the inclusive starting byte offset.
	Start int
	// End is the exclusive ending byte offset.
	End int
}

ByteRange is a half-open [Start, End) span of byte offsets within a source file.

type InputEdit

type InputEdit struct {
	// StartByte is the byte offset where the edit begins.
	StartByte int
	// OldEndByte is the byte offset where the replaced region ended.
	OldEndByte int
	// NewEndByte is the byte offset where the replacement region ends.
	NewEndByte int
	// StartPoint is the point for StartByte.
	StartPoint Point
	// OldEndPoint is the point for OldEndByte.
	OldEndPoint Point
	// NewEndPoint is the point for NewEndByte.
	NewEndPoint Point
}

InputEdit describes a single text edit for incremental reparsing, in the shape tree-sitter expects: byte offsets plus the corresponding points.
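A minimal sketch of building an InputEdit for a single replacement, assuming the caller knows the replaced byte span. StartPoint and OldEndPoint are computed against the old source and NewEndPoint against the updated one. The types are local mirrors, and replaceEdit and pointAt are hypothetical helpers, not part of the package.

```go
package main

import (
	"bytes"
	"fmt"
)

// Point and InputEdit mirror the parser package's DTOs.
type Point struct{ Row, Col int }

type InputEdit struct {
	StartByte, OldEndByte, NewEndByte    int
	StartPoint, OldEndPoint, NewEndPoint Point
}

// pointAt converts a byte offset into a zero-based row and a byte (not rune)
// column within that row.
func pointAt(src []byte, off int) Point {
	before := src[:off]
	row := bytes.Count(before, []byte("\n"))
	col := off - (bytes.LastIndexByte(before, '\n') + 1)
	return Point{Row: row, Col: col}
}

// replaceEdit replaces old[start:oldEnd] with repl and returns the new source
// together with the InputEdit describing the change.
func replaceEdit(old []byte, start, oldEnd int, repl []byte) ([]byte, InputEdit) {
	// Build a fresh slice so old stays intact for computing the old points.
	updated := make([]byte, 0, len(old)-(oldEnd-start)+len(repl))
	updated = append(updated, old[:start]...)
	updated = append(updated, repl...)
	updated = append(updated, old[oldEnd:]...)

	newEnd := start + len(repl)
	return updated, InputEdit{
		StartByte:   start,
		OldEndByte:  oldEnd,
		NewEndByte:  newEnd,
		StartPoint:  pointAt(old, start),
		OldEndPoint: pointAt(old, oldEnd),
		NewEndPoint: pointAt(updated, newEnd),
	}
}

func main() {
	old := []byte("a := 1\nb := 2\n")
	// Replace the "2" at byte 12 with a two-line snippet.
	updated, edit := replaceEdit(old, 12, 13, []byte("42\nc := 3"))
	fmt.Printf("%q\n%+v\n", updated, edit)
}
```

The resulting edit records StartPoint {1 5}, OldEndPoint {1 6} and NewEndPoint {2 6}: the replacement spans a newline, so the new end lands on a later row than the old end.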

type Lang

type Lang int

Lang enumerates the source languages a ParserPort adapter may parse.

The zero value is LangUnknown so an unset Lang is explicitly invalid rather than silently selecting a grammar.

const (
	// LangUnknown is the zero value and selects no grammar.
	LangUnknown Lang = iota
	// LangGo selects the Go grammar.
	LangGo
	// LangTypeScript selects the TypeScript grammar.
	LangTypeScript
	// LangTSX selects the TSX (TypeScript + JSX) grammar.
	LangTSX
	// LangJavaScript selects the JavaScript grammar.
	LangJavaScript
	// LangPython selects the Python grammar.
	LangPython
	// LangRust selects the Rust grammar.
	LangRust
	// LangJava selects the Java grammar.
	LangJava
	// LangC selects the C grammar.
	LangC
	// LangCPP selects the C++ grammar.
	LangCPP
	// LangCSharp selects the C# grammar.
	LangCSharp
	// LangRuby selects the Ruby grammar.
	LangRuby
	// LangJSON selects the JSON grammar.
	LangJSON
	// LangHTML selects the HTML grammar.
	LangHTML
	// LangCSS selects the CSS grammar.
	LangCSS
	// LangYAML selects the YAML grammar.
	LangYAML
	// LangTOML selects the TOML grammar.
	LangTOML
	// LangXML selects the XML grammar.
	LangXML
	// LangMakefile selects the Make grammar.
	LangMakefile
	// LangBash selects the Bash/shell grammar.
	LangBash
	// LangMarkdown selects the Markdown (block) grammar.
	LangMarkdown
	// LangText selects the grammar-free text fallback: the whole file is one
	// node (with line children). It guarantees any file type (MDX, Dockerfile,
	// SCSS, .txt, …) round-trips losslessly and is byte-anchorable even with no
	// registered grammar.
	LangText
)

func LangForPath

func LangForPath(path string) Lang

LangForPath selects a Lang from a file path by extension, with a few extensionless build files keyed by basename (Makefile, Dockerfile, …). Unknown or grammar-less types resolve to LangText, the grammar-free fallback, so an agent IDE can always open and losslessly round-trip ANY file. It never returns LangUnknown. Matching is case-insensitive on the extension.
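The lookup order described above (basename first for extensionless build files, then a case-insensitive extension match, then the LangText fallback) can be sketched as follows. The byExt and byBase tables are hypothetical and deliberately partial; the real mapping is not listed on this page, the numeric constant values here do not match the package's, and basename matching is assumed to be case-sensitive.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

type Lang int

// Subset only; numeric values here differ from the real package.
const (
	LangUnknown Lang = iota
	LangGo
	LangPython
	LangMakefile
	LangText
)

// byExt is a hypothetical, partial extension table with lowercase keys.
var byExt = map[string]Lang{".go": LangGo, ".py": LangPython}

// byBase keys extensionless build files by exact basename (assumed case-sensitive).
var byBase = map[string]Lang{"Makefile": LangMakefile}

func langForPath(path string) Lang {
	base := filepath.Base(path)
	if l, ok := byBase[base]; ok {
		return l
	}
	// Lowercasing the extension makes "main.GO" match ".go".
	if l, ok := byExt[strings.ToLower(filepath.Ext(base))]; ok {
		return l
	}
	// Unknown or grammar-less types fall back to LangText, never LangUnknown.
	return LangText
}

func main() {
	for _, p := range []string{"cmd/main.GO", "build/Makefile", "docs/page.mdx", "README"} {
		fmt.Printf("%s -> %d\n", p, langForPath(p))
	}
}
```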

func (Lang) String

func (l Lang) String() string

String returns the lowercase canonical name of the language, or "unknown" for unrecognized values (including the zero value).

type Node

type Node struct {
	// Kind is the grammar node type (e.g. "function_declaration").
	Kind string
	// StartByte is the inclusive starting byte offset of the node.
	StartByte int
	// EndByte is the exclusive ending byte offset of the node.
	EndByte int
	// StartPoint is the row/column position of StartByte.
	StartPoint Point
	// EndPoint is the row/column position of EndByte.
	EndPoint Point
	// Named reports whether the node is a named node (vs. an anonymous token).
	Named bool
	// Missing reports whether the node is a MISSING node — a zero-width node the
	// parser inserts to recover from a syntax error (e.g. an absent closing
	// brace). A MISSING node has a normal Kind, so this flag is the only way to
	// distinguish it from a genuine token; parse-health diagnostics surface it
	// alongside ERROR-kind nodes (SPEC §10.5).
	Missing bool
	// Children are the node's direct child nodes in source order.
	Children []*Node
}

Node is a single concrete-syntax-tree node addressed by byte range and point.
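Since MISSING nodes carry a normal Kind, a parse-health pass has to check both Kind == "ERROR" and the Missing flag. A minimal depth-first sketch, assuming ERROR nodes are reported with that exact Kind string as tree-sitter does; parseProblems is a hypothetical helper, and the tree in main is hand-built rather than produced by a real adapter.

```go
package main

import "fmt"

type Point struct{ Row, Col int }

// Node mirrors parser.Node.
type Node struct {
	Kind                 string
	StartByte, EndByte   int
	StartPoint, EndPoint Point
	Named, Missing       bool
	Children             []*Node
}

// parseProblems walks the tree in source order and collects ERROR-kind
// nodes and MISSING nodes, descending into ERROR nodes as well.
func parseProblems(n *Node, out []*Node) []*Node {
	if n == nil {
		return out
	}
	if n.Kind == "ERROR" || n.Missing {
		out = append(out, n)
	}
	for _, c := range n.Children {
		out = parseProblems(c, out)
	}
	return out
}

func main() {
	src := []byte("f(")
	// Hand-built tree: the parser inserted a zero-width MISSING ")" at byte 2.
	root := &Node{Kind: "call", StartByte: 0, EndByte: 2, Named: true, Children: []*Node{
		{Kind: "identifier", StartByte: 0, EndByte: 1, Named: true},
		{Kind: "(", StartByte: 1, EndByte: 2},
		{Kind: ")", StartByte: 2, EndByte: 2, Missing: true},
	}}
	for _, p := range parseProblems(root, nil) {
		// A node's text is always src[StartByte:EndByte]; here it is empty.
		fmt.Printf("%s at %d (text %q)\n", p.Kind, p.StartByte, src[p.StartByte:p.EndByte])
	}
}
```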

type ParserPort

type ParserPort interface {
	// Parse parses src under lang into a fresh Tree.
	Parse(ctx context.Context, lang Lang, src []byte) (*Tree, error)
	// ParseIncremental reparses src under lang, reusing old after applying edit.
	ParseIncremental(ctx context.Context, lang Lang, src []byte, old *Tree, edit InputEdit) (*Tree, error)
	// ChangedRanges reports the byte ranges that differ between old and new.
	ChangedRanges(old, new *Tree) []ByteRange
}

ParserPort is the engine-agnostic contract for parsing source into a Tree and reparsing incrementally. Adapters (e.g. the CGO go-tree-sitter binding) must implement it; the rest of Båge depends only on this interface.

type Point

type Point struct {
	// Row is the zero-based line number.
	Row int
	// Col is the zero-based byte offset within the row.
	Col int
}

Point is a zero-based row/column position within a source file. Col is a byte offset within the row, consistent with tree-sitter's point semantics.

type Tree

type Tree struct {
	// Root is the root node of the tree.
	Root *Node
	// Source is the byte slice the tree was parsed from.
	Source []byte
	// Native is an opaque, adapter-owned engine handle; nil for engine-free
	// trees (e.g. test fakes). Consumers never inspect it.
	Native any
}

Tree is a parsed concrete syntax tree together with the source bytes it was parsed from.

Root and Source are fully materialized and independent of any engine: a consumer may use them after the underlying engine tree is freed. Native is an opaque, adapter-owned handle to the engine's native tree (e.g. a *tree_sitter.Tree), retained only so an adapter can reuse it for incremental reparsing and changed-range queries. Consumers MUST treat Native as opaque and MUST call Close when done so the adapter can free any native (C) resources.

func (*Tree) Close

func (t *Tree) Close()

Close releases any native resources held by the tree's adapter handle. It is idempotent: the first call releases and clears the native handle, so subsequent calls (and calls on a nil tree or on a tree with no native handle) are no-ops. This matters because some engine handles (e.g. the CGO tree-sitter tree) double-free if their own Close is called twice. Close is not safe for concurrent use; callers must serialize calls on a given tree (SPEC §7, one writer per file).
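The idempotence rule can be sketched as "release at most once, then clear". This assumes the adapter handle exposes a Close() method (the closer interface below); how the real Close reaches the native tree is not documented on this page, and countingHandle is a hypothetical fake that counts releases. In consumer code the usual pattern is a deferred tree.Close() right after a successful Parse.

```go
package main

import "fmt"

// Tree mirrors the parser.Tree fields relevant to Close (Root omitted).
type Tree struct {
	Source []byte
	Native any
}

// closer is an assumed shape for an adapter-owned native handle.
type closer interface{ Close() }

// Close releases the native handle at most once and then clears it, so
// repeated calls and calls on a nil *Tree are no-ops.
func (t *Tree) Close() {
	if t == nil || t.Native == nil {
		return
	}
	if c, ok := t.Native.(closer); ok {
		c.Close()
	}
	t.Native = nil
}

// countingHandle would double-free if it were closed twice.
type countingHandle struct{ closes int }

func (h *countingHandle) Close() { h.closes++ }

func main() {
	h := &countingHandle{}
	t := &Tree{Source: []byte("package main"), Native: h}
	t.Close()
	t.Close() // no-op: Native was cleared by the first call
	var none *Tree
	none.Close() // a nil receiver is also a no-op
	// Source (and Root, in the real type) stays usable after Close.
	fmt.Println(h.closes, string(t.Source)) // 1 package main
}
```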

Directories

Path Synopsis
Package treesitter implements parser.ParserPort with the official CGO go-tree-sitter bindings (docs/adr/0002).
