goparser

package
v0.4.0 Latest
Published: Jun 5, 2026 License: BSD-3-Clause Imports: 19 Imported by: 0

README

Parser

This project explores whether an almost single-pass compiler for Go can be built using syntax-directed translation, without any complex data structure (no syntax tree), only lists of tokens.

The goal is to have the shortest and simplest path from source to bytecode.

Design

The input of the parser is a list of tokens produced by the scanner. Multiple tokens are processed at once. The minimal input that yields a meaningful result (not an error or nil) is a complete statement.

The output of the parser is also a list of tokens, to be consumed by the compiler to produce bytecode. The output token set is identical to the bytecode instruction set, except that:

  • code locations may be provided as labels instead of numerical values,
  • memory locations for constants and variables may be provided as symbol names instead of numerical values.

Status

Go language support:

  • named functions
  • anonymous functions (closures)
  • methods
  • internal function calls
  • external function calls (calling runtime symbols in interpreter)
  • export to runtime
  • builtin calls (new, make, copy, delete, len, cap, ...)
  • out of order declarations
  • arbitrary precision constants
  • basic types
  • complete numeric types
  • complex numbers
  • function types
  • variadic functions
  • pointers
  • structures
  • embedded structures
  • recursive structures
  • literal composite objects
  • interfaces
  • arrays, slices
  • maps
  • deterministic maps
  • channel types
  • channel operations
  • multi-assign expressions
  • var defined by assign :=
  • var assign =
  • var declaration
  • type declaration
  • func declaration
  • const declaration
  • iota expression
  • panic statement
  • defer statement
  • recover statement
  • range clause
  • go statement
  • if statement (including else and else if)
  • for statement
  • switch statement
  • type switch statement
  • break statement
  • continue statement
  • fallthrough statement
  • goto statement
  • label statement
  • select statement
  • binary operators
  • unary operators
  • logical operators && and ||
  • assign operators
  • operator precedence rules
  • parenthesis expressions
  • call expressions
  • index expressions
  • selector expressions
  • slice expressions
  • type conversions
  • type assertions
  • parametric types (generic)
  • type parametric functions (generic)
  • type constraints (generic)
  • type checking
  • comment pragmas
  • package import
  • init functions
  • modules

Other items:

  • REPL
  • multiline statements in REPL
  • completion, history in REPL
  • eval strings
  • eval files (including stdin, ...)
  • debug traces for scanner, parser, compiler, bytecode vm
  • simple interpreter tests to exercise from source to execution
  • compile time error detection and diagnosis
  • stack dump
  • symbol tables, data tables, code bound to source lines
  • interactive debugger: breaks, continue, introspection, ...
  • machine level debugger
  • source level debugger
  • replay debugger, backward instruction execution
  • vm monitor: live memory / code display during run
  • stdlib wrappers a la yaegi
  • system and environment sandboxing
  • build constraints (arch, sys, etc)
  • test command (running go test / benchmark / example files)
  • skipping / including test files
  • test coverage
  • fuzz tests for scanner, vm, ...

Documentation

Overview

Package goparser implements a structured parser for Go.

Index

Constants

This section is empty.

Variables

var (
	ErrEllipsisArray  = errors.New("[...] array")
	ErrFuncType       = errors.New("invalid function type")
	ErrInvalidType    = errors.New("invalid type")
	ErrMissingType    = errors.New("missing type")
	ErrSize           = errors.New("invalid size")
	ErrSyntax         = errors.New("syntax error")
	ErrNotImplemented = errors.New("not implemented")
)

Type parsing error definitions.

Functions

func ConstConvert added in v0.4.0

func ConstConvert(cv constant.Value, typ *vm.Type) constant.Value

ConstConvert converts a constant to the representation of typ (Go's typed constant conversion rules). Exported for the compiler's expression folder.

func ConstFromExact added in v0.4.0

func ConstFromExact(s string) constant.Value

ConstFromExact reconstructs a constant from the exact textual form produced by go/constant.Value.ExactString(): a decimal integer, a float literal, or a "num/den" rational. It is the inverse used to recover high-precision bridged constants (see stdlib.ConstValues). Returns nil if s is not parseable.

func DefaultConstType added in v0.4.0

func DefaultConstType(c constant.Value, syms symbol.SymMap) *vm.Type

DefaultConstType returns the default type of an untyped constant (Go spec "Constants"): int, float64, string or bool, resolved through the given symbol table. Shared by the const-decl evaluator and the compiler's expression folder.

func FoldBinary added in v0.4.0

func FoldBinary(op lang.Token, x constant.Value, xtyp *vm.Type, y constant.Value, ytyp *vm.Type) (constant.Value, *vm.Type, bool)

FoldBinary folds a binary constant operation following Go's untyped-constant rules, delegating to go/constant. xtyp/ytyp are the operand types (nil for untyped) and drive the unsigned right-shift reinterpretation and the result type. ok is false when the operation is not a valid constant operation (invalid shift count, or integer/float division or remainder by zero); the caller then falls back to a runtime op (compiler) or reports an error (const-decl evaluator). Shared so both paths fold identically.

func FoldUnary added in v0.4.0

func FoldUnary(op lang.Token, x constant.Value, xtyp *vm.Type) (constant.Value, *vm.Type, bool)

FoldUnary folds a unary constant operation (+, -, !, ^) via go/constant. xtyp drives the width-limited complement for typed unsigned constants. ok is currently always true; it is returned for symmetry with FoldBinary.

func IsExported

func IsExported(name string) bool

IsExported reports whether the given name starts with an upper-case letter.

func MatchFileName

func MatchFileName(name string, ctx *buildContext) bool

MatchFileName reports whether name (a .go file basename) matches the given build context's GOOS/GOARCH constraints encoded in the file name.

func MatchFileNameFor

func MatchFileNameFor(name, goos, goarch string) bool

MatchFileNameFor reports whether name matches the given GOOS/GOARCH constraints encoded in the file name. It is like MatchFileName but for an explicit platform.
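
File-name constraints follow the go/build convention of `*_GOOS`, `*_GOARCH` and `*_GOOS_GOARCH` suffixes. A simplified sketch of that rule is shown below; matchFileNameFor and the deliberately short knownOS/knownArch tables are hypothetical, and refinements of the real rules (for instance the full platform lists) are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// Tiny, hypothetical subsets of the real GOOS/GOARCH lists.
var (
	knownOS   = map[string]bool{"linux": true, "darwin": true, "windows": true}
	knownArch = map[string]bool{"amd64": true, "arm64": true, "wasm": true}
)

// matchFileNameFor reports whether a .go basename is selected for goos/goarch.
func matchFileNameFor(name, goos, goarch string) bool {
	name = strings.TrimSuffix(name, ".go")
	name = strings.TrimSuffix(name, "_test")
	parts := strings.Split(name, "_")
	if len(parts) < 2 {
		return true // no underscore: no constraint (e.g. "linux.go")
	}
	parts = parts[1:] // the part before the first "_" is never a constraint
	n := len(parts)
	if n >= 2 && knownOS[parts[n-2]] && knownArch[parts[n-1]] {
		return parts[n-2] == goos && parts[n-1] == goarch
	}
	last := parts[n-1]
	if knownOS[last] {
		return last == goos
	}
	if knownArch[last] {
		return last == goarch
	}
	return true
}

func main() {
	for _, f := range []string{"net.go", "net_linux.go", "net_windows_arm64.go", "linux.go"} {
		fmt.Println(f, matchFileNameFor(f, "linux", "amd64"))
	}
}
```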

func OverflowsType added in v0.4.0

func OverflowsType(cv constant.Value, typ *vm.Type) bool

OverflowsType reports whether the integer constant cv cannot be represented in the integer type typ -- the Go representability rule the gc compiler enforces as a compile error (e.g. `int8(200)` or `const x uint8 = 256`). It is meant to be called only at explicit-type points (conversions, typed const declarations); untyped constant arithmetic must NOT use it (1<<63 etc. are valid untyped). Non-integer constants and non-integer types return false (truncation and float range are handled elsewhere / left unconstrained).

func PackageName

func PackageName(importPath string) string

PackageName returns the identifier used to reference the package given its import path: the last segment, or the second-to-last when the last matches "v[0-9]*" (module versioning suffix).
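
A sketch of that rule follows. packageName is a hypothetical re-implementation; it assumes the version pattern means "v" followed by one or more digits, which may differ from the package's exact matching.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// versionSuffix matches a module major-version path segment such as "v2".
// Requiring at least one digit is an assumption of this sketch.
var versionSuffix = regexp.MustCompile(`^v[0-9]+$`)

// packageName returns the identifier implied by an import path.
func packageName(importPath string) string {
	parts := strings.Split(importPath, "/")
	last := parts[len(parts)-1]
	if len(parts) > 1 && versionSuffix.MatchString(last) {
		return parts[len(parts)-2]
	}
	return last
}

func main() {
	fmt.Println(packageName("fmt"))
	fmt.Println(packageName("encoding/json"))
	fmt.Println(packageName("github.com/foo/bar/v2"))
}
```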

func QualifyName added in v0.2.0

func QualifyName(pkg, name string) string

QualifyName composes the canonical pkg-qualified symbol-table key for a top-level name. For pointer-receiver method names ("*Tag.M"), the '*' moves to the very front of the key ("*<pkg>.Tag.M") so the standard pointer-method composition `"*"+typeKey+"."+method` still produces the same key. Exported so the comp package (qualifyLabel) shares the exact composition.
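
The key composition and the invariant it preserves can be sketched as follows. qualifyName is a hypothetical re-implementation; returning the bare name for an empty pkg is an assumption of this sketch, not something the documentation states.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifyName builds "<pkg>.<name>", moving a leading '*' to the front.
func qualifyName(pkg, name string) string {
	if pkg == "" {
		return name // assumption: main/REPL names keep bare keys
	}
	if strings.HasPrefix(name, "*") {
		return "*" + pkg + "." + name[1:]
	}
	return pkg + "." + name
}

func main() {
	typeKey := qualifyName("uuid", "Tag")
	direct := qualifyName("uuid", "*Tag.M")
	composed := "*" + typeKey + "." + "M" // the pointer-method composition
	fmt.Println(direct, composed, direct == composed)
}
```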

func RegisterGenericShim added in v0.2.0

func RegisterGenericShim(pkg, source string, nativeRefs []string)

RegisterGenericShim queues an interpreted-source shim to install into pkg's symbol table the next time a Parser populates that package (typically from ImportPackageValues). source must start with `package <pkg>`. nativeRefs lists names referenced bare by source from pkg's exports: those are pre-seeded into p.Symbols at qualified keys so symGet's importingPkg / CompilingPkg fallback resolves them at signature parse and body instantiation time.

Safe to call from package init; the registry persists for the process lifetime and is consulted by every parser created thereafter.

func TypedConstValue added in v0.4.0

func TypedConstValue(cv constant.Value, typ *vm.Type) any

TypedConstValue materializes a constant into a Go value of typ (or its default kind when typ is nil). Exported for the compiler's expression folder.

Types

type DeferredDecl added in v0.2.0

type DeferredDecl struct {
	PkgPath string
	Toks    Tokens
}

DeferredDecl is a top-level declaration (func body, var initializer) whose code generation is deferred to Phase 2, tagged with the import path of the package it came from ("" for the main package / REPL). The tag lets Phase 2 resolve unqualified identifiers against the originating package's symbols, which matters when a sibling import shadowed a bare name in the symbol table.

type ErrConstOverflow added in v0.4.0

type ErrConstOverflow struct {
	Value string
	Type  string
	Loc   string
	Pos   int
}

ErrConstOverflow reports a constant that cannot be represented in its type -- the gc "constant X overflows T" compile error. It is a hard parse error so ParseAll does not skip past it (which would otherwise mask it as a later "undefined" error). ErrPos lets the diagnostic chokepoint render a snippet.

func (ErrConstOverflow) ErrPos added in v0.4.0

func (e ErrConstOverflow) ErrPos() int

ErrPos exposes the source offset so the diagnostic chokepoint can render a snippet.

func (ErrConstOverflow) Error added in v0.4.0

func (e ErrConstOverflow) Error() string

type ErrRedeclared added in v0.4.0

type ErrRedeclared struct {
	Name string
	Loc  string
	Pos  int
}

ErrRedeclared reports a second top-level declaration of a name within one compilation unit (the gc "X redeclared in this block" error). It is a hard error, not a parser limitation, so resolveDecls propagates it rather than skipping the decl (which would let Phase 2 emit a duplicate function label and hang the VM). ErrPos lets the diagnostic chokepoint render a snippet.

func (ErrRedeclared) ErrPos added in v0.4.0

func (e ErrRedeclared) ErrPos() int

ErrPos exposes the source offset so the diagnostic chokepoint can render a snippet.

func (ErrRedeclared) Error added in v0.4.0

func (e ErrRedeclared) Error() string

type ErrUndefined

type ErrUndefined struct {
	Name string
	Loc  string // optional "file:line:col" source position
	Pos  int    // optional global source offset, for snippet rendering
}

ErrUndefined is returned during parsing when a referenced symbol is not yet defined. It is retryable: the lazy fixpoint loop in interp.Eval defers the declaration and retries after other declarations have been processed.

func (ErrUndefined) ErrPos added in v0.3.0

func (e ErrUndefined) ErrPos() int

ErrPos exposes the source offset so a diagnostic chokepoint (interp.Eval) can render a source snippet. Returns 0 when no position was attached.

func (ErrUndefined) Error

func (e ErrUndefined) Error() string

type PackageSource added in v0.2.0

type PackageSource struct {
	Name    string // basename (e.g. "uuid.go")
	Content string
}

PackageSource is a single .go file's basename and content as loaded by LoadPackageSources.

type Parser

type Parser struct {
	*scan.Scanner

	Symbols  symbol.SymMap
	Packages map[string]*symbol.Package

	CompilingPkg string // while a deferred decl is being parsed/compiled in Phase 2: its origin package's import path ("" = main/REPL); makes unqualified type/name lookups prefer that package's symbols (see symGet, comp.Compiler.symAt)

	InitFuncs []string // ordered list of init function internal names
	// contains filtered or unexported fields
}

Parser represents the state of a parser.

func NewParser

func NewParser(spec *lang.Spec, noPkg bool) *Parser

NewParser returns a new parser.

func (*Parser) ImportPackageConsts added in v0.4.0

func (p *Parser) ImportPackageConsts(m map[string]map[string]string)

ImportPackageConsts attaches high-precision constant values to already-imported packages, so the compiler can fold constant expressions involving bridged floats at full precision. Call it after ImportPackageValues.

func (*Parser) ImportPackageValues

func (p *Parser) ImportPackageValues(m map[string]map[string]reflect.Value)

ImportPackageValues populates packages with values.

func (*Parser) LoadPackageSources added in v0.2.0

func (p *Parser) LoadPackageSources(importPath string, includeTests bool) ([]PackageSource, error)

LoadPackageSources returns the .go files of the given package import path (a directory in the FS chain pkgfs -> stdlibfs -> remotefs), filtered by build tags (file-name and //go:build directives). When includeTests is false, _test.go files are skipped (matching `import "X"` resolution); pass true to include them (used by `mvm test <importpath>`).

Result order matches fs.ReadDir, which is sorted by filename.

func (*Parser) ParseAll

func (p *Parser) ParseAll(name, src string) (out []DeferredDecl, err error)

ParseAll parses code and its dependencies, and returns the still-to-be-code-generated declarations (func bodies, var initializers), each tagged with its originating package path, or an error. When src == "" the source is loaded from the package directory `name`, so its decls are tagged with `name`; otherwise (main package / REPL) they are tagged with "".

func (*Parser) ParseAllFiles added in v0.4.0

func (p *Parser) ParseAllFiles(sources []PackageSource) (out []DeferredDecl, err error)

ParseAllFiles parses a set of in-memory source files as a SINGLE compile unit (one package) and returns the still-to-be-code-generated declarations. Used by `mvm run f1.go f2.go ...`, where several local files form the main package and must see each other's top-level symbols regardless of file or declaration order. Each source's Name labels its origin for diagnostics. Decls are tagged for the main package (bare keys), matching a single-file main Eval.

func (*Parser) ParseDecl

func (p *Parser) ParseDecl(toks Tokens) (handled bool, err error)

ParseDecl resolves a declaration's symbols (Phase 1) without emitting code. Returns handled=true if fully resolved, false if code generation is needed.

func (*Parser) ParseOneStmt

func (p *Parser) ParseOneStmt(toks Tokens) (Tokens, error)

ParseOneStmt parses a single pre-scanned statement token slice.

func (*Parser) RestoreUnit added in v0.4.0

func (p *Parser) RestoreUnit(s UnitState)

RestoreUnit reverts a failed compile to s: deletes added symbol keys/packages, restores replaced ones, truncates template instances/InitFuncs, and clears the instance/import queues. In-place mutation of pre-existing symbols is not undone (shared pointers), but a failed unit's own new declarations are.
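
One way to get this delete-added, restore-replaced behavior is a change journal, sketched below on a plain map. This is an assumption about technique, not the package's implementation: symtab, change, set, snapshot and restore are all hypothetical names.

```go
package main

import "fmt"

// change records what a key held before it was written.
type change struct {
	key string
	old int
	had bool // false when the key did not exist before
}

type symtab struct {
	syms    map[string]int
	journal []change
}

// set writes a symbol and journals the previous state for rollback.
func (t *symtab) set(key string, v int) {
	old, had := t.syms[key]
	t.journal = append(t.journal, change{key, old, had})
	t.syms[key] = v
}

// snapshot returns a mark identifying the current journal position.
func (t *symtab) snapshot() int { return len(t.journal) }

// restore undoes every write made after mark, newest first.
func (t *symtab) restore(mark int) {
	for i := len(t.journal) - 1; i >= mark; i-- {
		c := t.journal[i]
		if c.had {
			t.syms[c.key] = c.old
		} else {
			delete(t.syms, c.key)
		}
	}
	t.journal = t.journal[:mark]
}

func main() {
	t := &symtab{syms: map[string]int{}}
	t.set("a", 1)
	mark := t.snapshot()
	t.set("a", 2) // replaced
	t.set("b", 3) // added
	t.restore(mark)
	fmt.Println(t.syms) // map[a:1]
}
```

As with RestoreUnit, a journal of this kind cannot undo in-place mutation of values reachable through shared pointers; it only tracks which keys were added or replaced.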

func (*Parser) SetBuildContext

func (p *Parser) SetBuildContext(goos, goarch string)

SetBuildContext overrides the parser's target GOOS/GOARCH for build constraint filtering.

func (*Parser) SetIncludeTests added in v0.2.0

func (p *Parser) SetIncludeTests(b bool)

SetIncludeTests toggles whether ParseAll's directory-mode load includes _test.go files. Off by default (matching `import "X"` resolution); turn on for `mvm test <importpath>` so test functions become callable.

func (*Parser) SetPkgfs

func (p *Parser) SetPkgfs(pkgPath string)

SetPkgfs sets the parser virtual filesystem for reading sources.

func (*Parser) SetRemoteFS added in v0.2.0

func (p *Parser) SetRemoteFS(fsys fs.FS)

SetRemoteFS installs a last-resort filesystem consulted when neither pkgfs nor stdlibfs contain the requested import path. Typical use is a modfs.FS that fetches modules from a proxy on demand.

func (*Parser) SetStdlibFS

func (p *Parser) SetStdlibFS(fsys fs.FS)

SetStdlibFS installs a fallback filesystem for resolving imported source packages that are not present in the primary pkgfs. This is used to resolve generics-first stdlib packages (cmp, slices, maps, ...) whose sources are embedded in the interpreter binary.

func (*Parser) SetTestSkipFiles added in v0.3.0

func (p *Parser) SetTestSkipFiles(names map[string]bool)

SetTestSkipFiles records basenames of bridged-stdlib test files that loadBridgedTestSources must skip. Used by `mvm test`'s drop-on-compile-error retry: a stdlib external test file that references export_test.go-only symbols (e.g. a method the real native type lacks) can't compile against the bridge, so the driver drops it and reloads the rest. nil or empty means skip nothing.

func (*Parser) SetTestSourceFS added in v0.3.0

func (p *Parser) SetTestSourceFS(fsys fs.FS)

SetTestSourceFS installs the test-source filesystem consulted by LoadPackageSources only when (a) includeTests is on and (b) the target import path is a bridge-only stdlib package (i.e. has a Bin entry in p.Packages but no source in pkgfs/stdlibfs/remotefs). The intended supplier is stdlib.GorootTestFS(), which serves $GOROOT/src so external `package X_test` files run against the existing reflect bindings.

This FS is deliberately separate from the pkgfs -> stdlibfs -> remotefs chain: feeding $GOROOT/src into that chain would make ordinary `import "strings"` start loading interpreted source alongside the reflect bridge, double-defining every exported symbol.

func (*Parser) SnapshotUnit added in v0.4.0

func (p *Parser) SnapshotUnit() UnitState

SnapshotUnit captures the symbol-table state before a top-level compile.

func (*Parser) SymAdd

func (p *Parser) SymAdd(i int, name string, v vm.Value, k symbol.Kind, t *vm.Type)

SymAdd adds a new named symbol, recording the key for potential rollback.

func (*Parser) SymSet

func (p *Parser) SymSet(key string, sym *symbol.Symbol)

SymSet inserts sym at key in the symbol table, recording the key for potential rollback.

func (*Parser) TakeInstanceDecls added in v0.4.0

func (p *Parser) TakeInstanceDecls() []DeferredDecl

TakeInstanceDecls returns and clears the queued generic-instance bodies.

func (*Parser) WithImportingPkg added in v0.2.0

func (p *Parser) WithImportingPkg(pkg string) func()

WithImportingPkg sets p.importingPkg to pkg and returns a function that restores the previous value. Callers loading a package's source directly (e.g. `mvm test <importpath>`) use this to mirror the canonical-key setup that importSrc performs for transitive imports, so the target's top-level Type/Func/Method/Var/Const symbols land at `<pkg>.<name>` keys rather than bare keys (which would mismatch every subsequent qualified lookup in the target's own deferred bodies). See pkgKey, symGet, and the Phase 2 Path B memory notes.
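
The set-then-restore shape pairs naturally with defer. The sketch below shows the pattern on a hypothetical miniature parser; parser, key and declare are illustrative names, and the key format is reduced to `<pkg>.<name>`.

```go
package main

import "fmt"

type parser struct{ importingPkg string }

// withImportingPkg sets the importing package and returns the undo function.
func (p *parser) withImportingPkg(pkg string) func() {
	prev := p.importingPkg
	p.importingPkg = pkg
	return func() { p.importingPkg = prev }
}

// key returns the symbol-table key for a top-level name in the current context.
func (p *parser) key(name string) string {
	if p.importingPkg == "" {
		return name
	}
	return p.importingPkg + "." + name
}

// declare computes keys for names as if they were loaded from pkg.
func declare(p *parser, pkg string, names ...string) []string {
	defer p.withImportingPkg(pkg)() // set now, restore when declare returns
	keys := make([]string, 0, len(names))
	for _, n := range names {
		keys = append(keys, p.key(n))
	}
	return keys
}

func main() {
	p := &parser{}
	fmt.Println(declare(p, "uuid", "New", "UUID")) // [uuid.New uuid.UUID]
	fmt.Println(p.key("main"))                     // main
}
```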

type SelectCaseDesc

type SelectCaseDesc struct {
	Dir     reflect.SelectDir
	ValName string // scoped name of recv value var ("" if none)
	OkName  string // scoped name of recv ok var ("" if none)
}

SelectCaseDesc describes one case of a select statement for the compiler.

type Token

type Token struct {
	scan.Token
	Arg []any
}

Token represents a parser token.

func (Token) FieldKeyName added in v0.2.0

func (t Token) FieldKeyName() (string, bool)

FieldKeyName returns the field name of a struct-composite-literal key Colon token (emitted by newFieldColon), and false otherwise.

func (*Token) MarkNoFnew added in v0.2.0

func (t *Token) MarkNoFnew()

MarkNoFnew tags this Token (intended for Type Idents) so the compiler will not emit a speculative Fnew for it.

func (Token) NoFnew added in v0.2.0

func (t Token) NoFnew() bool

NoFnew reports whether this Token was tagged via MarkNoFnew.

func (Token) ResolvedType added in v0.4.0

func (t Token) ResolvedType() *vm.Type

ResolvedType returns the resolved *vm.Type that a Type-kind Ident carries, or nil. The parser attaches it so the compiler resolves the type by its global slot (typeSym/typeIndex) instead of re-looking up the name in the mutable, shared symbol table -- the type's identity travels in the IR, not its name.

type Tokens

type Tokens []Token

Tokens represents a slice of tokens.

func (Tokens) Index

func (toks Tokens) Index(tok lang.Token) int

Index returns the index in toks of the first matching tok, or -1.

func (Tokens) LastIndex

func (toks Tokens) LastIndex(tok lang.Token) int

LastIndex returns the index in toks of the last matching tok, or -1.

func (Tokens) Split

func (toks Tokens) Split(tok lang.Token) (result []Tokens)

Split returns a slice of token arrays, separated by tok.
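
The Index, LastIndex and Split helpers behave much like their counterparts in package strings, applied to token slices. A sketch over plain strings follows; index, lastIndex and split are hypothetical, and keeping empty segments between adjacent separators is an assumption that may not match the package.

```go
package main

import "fmt"

// index returns the position of the first tok in toks, or -1.
func index(toks []string, tok string) int {
	for i, t := range toks {
		if t == tok {
			return i
		}
	}
	return -1
}

// lastIndex returns the position of the last tok in toks, or -1.
func lastIndex(toks []string, tok string) int {
	for i := len(toks) - 1; i >= 0; i-- {
		if toks[i] == tok {
			return i
		}
	}
	return -1
}

// split cuts toks at every sep, dropping the separators themselves.
func split(toks []string, sep string) [][]string {
	var result [][]string
	for {
		i := index(toks, sep)
		if i < 0 {
			return append(result, toks)
		}
		result = append(result, toks[:i])
		toks = toks[i+1:]
	}
}

func main() {
	toks := []string{"a", ",", "b", "c", ",", "d"}
	fmt.Println(index(toks, ","), lastIndex(toks, ","))
	fmt.Println(split(toks, ","))
}
```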

func (Tokens) SplitStart

func (toks Tokens) SplitStart(tok lang.Token) (result []Tokens)

SplitStart is similar to Split, except the first token in toks is skipped.

func (Tokens) String

func (toks Tokens) String() (s string)

type UnitState added in v0.4.0

type UnitState struct {
	// contains filtered or unexported fields
}

UnitState is an opaque pre-compile snapshot for SnapshotUnit/RestoreUnit.
