lexer

package
v1.3.4
Warning

This package is not in the latest version of its module.

Published: Jun 5, 2026 License: MIT Imports: 3 Imported by: 0

Documentation

Overview

Package lexer tokenizes craftgo DSL source files.

The lexer performs a single linear pass over UTF-8 input and produces a flat stream of Token values, each tagged with a Position. Errors are recorded as Diagnostic entries (retrievable via Lexer.Diagnostics) and surfaced as tokens of Kind Error in the stream - the lexer never panics or aborts on malformed input. Downstream phases (parser, semantic analysis, LSP server) all consume this same token stream, so diagnostics are consistent across the CLI and the IDE.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Comment

type Comment struct {
	Pos  Position    // position of the leading `/` on the comment line
	Text string      // comment body with the leading `//` and at most one following space stripped
	Kind CommentKind // leading vs trailing
}

Comment is one source-level `//` line, captured by the lexer. The formatter and future linters consume the full slice via Lexer.Comments; the parser copies it onto `*ast.File.Comments` so downstream tools see a single canonical view of every comment in the file, whether or not the comment ended up attached to an AST node.

type CommentKind

type CommentKind uint8

CommentKind classifies a `//` comment by its source position relative to surrounding tokens. The formatter uses it to render leading and trailing comments at the correct site, because the raw column information is no longer available once the parser has built the AST.

const (
	// CommentLeading is the default - the comment was preceded only by
	// whitespace on its source line.
	CommentLeading CommentKind = iota
	// CommentTrailing is a comment that follows non-whitespace code on
	// the same line as the previously emitted token, e.g. the
	// `// 5 MiB` in `@maxBodySize(5242880) // 5 MiB`. The lexer detects
	// this via [Lexer.sawNewlineSinceLastToken].
	CommentTrailing
)
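The leading/trailing rule can be approximated line by line, which may help when writing tests against Lexer.Comments. This is a sketch, not the package's implementation (the real lexer tracks newlines between tokens in unexported state); `classifyComment` is a hypothetical helper and CommentKind is redeclared locally so the file compiles on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// CommentKind is a local mirror of the documented enum, for this sketch only.
type CommentKind uint8

const (
	CommentLeading CommentKind = iota
	CommentTrailing
)

// classifyComment is a hypothetical helper. Given one source line, it reports
// whether the line holds a `//` comment, whether that comment is leading
// (only whitespace before it) or trailing (code before it), and the body with
// `//` and at most one following space stripped.
//
// Simplification: it uses the first `//` on the line, so a `//` inside a
// string literal (e.g. a URL) would be misread. A real lexer knows where
// string tokens end and does not have this problem.
func classifyComment(line string) (CommentKind, string, bool) {
	i := strings.Index(line, "//")
	if i < 0 {
		return CommentLeading, "", false
	}
	body := strings.TrimPrefix(line[i+2:], " ")
	if strings.TrimSpace(line[:i]) == "" {
		return CommentLeading, body, true
	}
	return CommentTrailing, body, true
}

func main() {
	for _, line := range []string{
		"    // Upload accepts a file.",
		"@maxBodySize(5242880) // 5 MiB",
	} {
		kind, text, _ := classifyComment(line)
		fmt.Printf("%d %q\n", kind, text)
	}
}
```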

func (CommentKind) String

func (k CommentKind) String() string

String returns "leading" or "trailing" for diagnostic messages.

type Diagnostic

type Diagnostic struct {
	Pos      Position
	End      Position
	Severity Severity
	Code     string
	Msg      string
	Related  []Related
}

Diagnostic is a single error or warning tied to a source range. The lexer, parser, and semantic analyser all accumulate Diagnostics so that consumers such as the formatter and the LSP server can present them all at once.

Pos is the start of the offending token / construct; End is the exclusive end of the same range and is used by the LSP layer to draw the red squiggle. End may equal Pos when only a point location is known (e.g. lexer point errors); callers should treat (Pos == End) as "underline a single column".

Code is a stable machine-readable identifier (e.g. `decorator/placement`) that the IDE uses for filtering, "disable next line", and documentation links. It must NOT include the message - keep human text in Msg.

Related carries secondary positions referenced by Msg. The IDE shows them as clickable cross-links rather than appending another sentence to Msg.
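The Pos/End convention above can be turned into an underline span with a few lines of code. The sketch below uses pared-down local mirrors of Position and Diagnostic; `underline` is a hypothetical helper that only handles single-line ranges and widens a point diagnostic (Pos == End) to one column, as the documentation asks.

```go
package main

import "fmt"

// Position and Diagnostic are pared-down mirrors of the documented types,
// declared here only so the sketch compiles on its own.
type Position struct {
	Filename string
	Offset   int
	Line     int
	Column   int
}

type Diagnostic struct {
	Pos Position
	End Position
	Msg string
}

// underline returns the 1-indexed start column and the exclusive end column
// to underline for a diagnostic that lies on a single line. A point
// diagnostic (Pos == End) is widened to cover exactly one column.
func underline(d Diagnostic) (start, end int) {
	if d.Pos == d.End {
		return d.Pos.Column, d.Pos.Column + 1
	}
	return d.Pos.Column, d.End.Column
}

func main() {
	point := Diagnostic{
		Pos: Position{Offset: 10, Line: 2, Column: 5},
		End: Position{Offset: 10, Line: 2, Column: 5},
		Msg: "unexpected rune",
	}
	span := Diagnostic{
		Pos: Position{Offset: 20, Line: 3, Column: 3},
		End: Position{Offset: 26, Line: 3, Column: 9},
		Msg: "unknown decorator",
	}
	fmt.Println(underline(point))
	fmt.Println(underline(span))
}
```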

func (Diagnostic) Error

func (d Diagnostic) Error() string

Error implements the error interface, formatted as `pos: msg`. Severity and code are omitted from the default rendering; the LSP layer reads the structured fields directly.

type Kind

type Kind int

Kind enumerates every token category emitted by the lexer.

Kind values are stable and ordered: the keyword block (KwPackage..KwNull) and the HTTP-verb block (VerbGet..VerbOptions) are contiguous so that callers can detect "any keyword" via simple range checks. New kinds must be appended; reordering breaks parser code that relies on the keyword range.

const (
	// EOF is emitted exactly once at the end of input.
	EOF Kind = iota
	// Error wraps a malformed token; the offending source slice is in Text and
	// a Diagnostic is recorded on the [Lexer]. Parsing should treat this as
	// "skip and continue" - the diagnostic carries the message for users.
	Error

	// Ident is any identifier that is not a reserved keyword.
	Ident
	// Int holds a plain decimal integer literal (no sign, no suffix).
	Int
	// Float holds a decimal float literal of the form `digits.digits`.
	Float
	// String holds a double-quoted string with escape sequences preserved
	// verbatim (parser does the unescape).
	String
	// RawString holds a backtick-quoted string. Backticks are kept in Text;
	// no escape processing is performed.
	RawString
	// Duration is a numeric literal followed immediately by a duration suffix
	// (`ns`, `us`, `µs`, `ms`, `s`, `m`, `h`).
	Duration
	// Size is a numeric literal followed immediately by a size suffix
	// (`B`, `KB`, `MB`, `GB`).
	Size

	KwPackage
	KwImport
	KwType
	KwEnum
	KwError
	KwScalar
	KwService
	KwExtend
	KwMiddleware
	KwRequest
	KwResponse
	KwMap
	KwTrue
	KwFalse
	KwNull

	VerbGet
	VerbPost
	VerbPut
	VerbPatch
	VerbDelete
	VerbHead
	VerbOptions

	LBrace   // {
	RBrace   // }
	LParen   // (
	RParen   // )
	LBracket // [
	RBracket // ]
	LAngle   // <
	RAngle   // >
	Comma    // ,
	Colon    // :
	Equal    // =
	Question // ?
	Dot      // .
	Slash    // /
	At       // @
	Dash     // -
)
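A minimal sketch of the range checks described above, assuming a local copy of the const block in the same order (the punctuation kinds are omitted because the checks never reach them). `isKeyword` and `isVerb` are hypothetical helpers, not part of the documented API.

```go
package main

import "fmt"

// Kind is a local mirror of the documented enum. Only the relative order of
// the constants matters for the range checks below.
type Kind int

const (
	EOF Kind = iota
	Error
	Ident
	Int
	Float
	String
	RawString
	Duration
	Size

	KwPackage
	KwImport
	KwType
	KwEnum
	KwError
	KwScalar
	KwService
	KwExtend
	KwMiddleware
	KwRequest
	KwResponse
	KwMap
	KwTrue
	KwFalse
	KwNull

	VerbGet
	VerbPost
	VerbPut
	VerbPatch
	VerbDelete
	VerbHead
	VerbOptions
	// Punctuation kinds (LBrace ... Dash) follow in the real enum.
)

// isKeyword reports whether k falls in the contiguous keyword block.
func isKeyword(k Kind) bool { return k >= KwPackage && k <= KwNull }

// isVerb reports whether k falls in the contiguous HTTP-verb block.
func isVerb(k Kind) bool { return k >= VerbGet && k <= VerbOptions }

func main() {
	fmt.Println(isKeyword(KwService), isKeyword(Ident), isVerb(VerbPatch), isVerb(KwNull))
}
```

The checks compare only against block boundaries, so they stay correct as long as the keyword and verb blocks keep their contiguous order.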

func (Kind) String

func (k Kind) String() string

String returns a human-readable name for the kind, e.g. `EOF`, `Ident`, or the literal punctuation character. Unknown kinds (added without updating [kindNames]) render as `Kind(N)` so they remain visible in diagnostics.
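The `Kind(N)` fallback can be sketched with an array-backed name table, the same pattern the standard library's go/token package uses for its Token type. Here `kindNames` is a hypothetical stand-in for the package's unexported table, and only three kinds are declared.

```go
package main

import "fmt"

// Kind is a local mirror declaring only a three-kind subset.
type Kind int

const (
	EOF Kind = iota
	Error
	Ident
)

// kindNames is a hypothetical name table indexed by Kind.
var kindNames = [...]string{
	EOF:   "EOF",
	Error: "Error",
	Ident: "Ident",
}

// String returns the table entry, or Kind(N) when k has no name, so kinds
// added without a table entry stay visible in diagnostics.
func (k Kind) String() string {
	if k >= 0 && int(k) < len(kindNames) && kindNames[k] != "" {
		return kindNames[k]
	}
	// Convert to int so Sprintf does not recurse into String.
	return fmt.Sprintf("Kind(%d)", int(k))
}

func main() {
	fmt.Println(EOF, Ident, Kind(42))
}
```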

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer tokenizes a single craftgo source buffer.

A Lexer holds its current position, the original source, and the accumulated diagnostics. It is consumed via Lexer.Next (one token at a time) or Lexer.Tokenize (slurp the whole stream). Lexers are not safe for concurrent use; create one per file.

func New

func New(filename, src string) *Lexer

New constructs a Lexer ready to tokenize src. filename is informational - it appears in [Position.Filename] on every emitted token and in diagnostics. Pass an empty string when there is no associated file.

func (*Lexer) Comments

func (l *Lexer) Comments() []*Comment

Comments returns every `//` comment encountered in the source so far, in source order, with their position and leading/trailing kind. Callers (parser snapshot, format printer, lint tools) consume this slice instead of re-scanning the source; it is the single source of truth for "every comment in the file".

func (*Lexer) Diagnostics

func (l *Lexer) Diagnostics() []Diagnostic

Diagnostics returns every error encountered so far. Calling it does not reset internal state, so errors found in later tokens keep accumulating and are included in subsequent calls.

func (*Lexer) Next

func (l *Lexer) Next() Token

Next returns the next token in the stream. It skips whitespace and `//` line comments, then dispatches to a specialised lexer based on the leading rune. Malformed input produces a token of kind Error and records a corresponding Diagnostic carrying the message; the lexer then resumes at the next available position.

func (*Lexer) Tokenize

func (l *Lexer) Tokenize() []Token

Tokenize consumes the entire source and returns every token, terminated by exactly one EOF token. It is a convenience wrapper for callers that want random access to the stream: the parser uses it, while the LSP server keeps a Lexer around for incremental work.
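The relationship between Next and Tokenize can be sketched as a drain loop that stops after the first EOF. In this sketch `tokenSource` is a hypothetical interface with the same shape as Lexer.Next, and `sliceSource` is a hypothetical fake that replays a fixed token list and then keeps returning EOF (an assumption about post-EOF behaviour, made only for the demo).

```go
package main

import "fmt"

// Kind and Token are pared-down local mirrors of the documented types.
type Kind int

const (
	EOF Kind = iota
	Ident
)

type Token struct {
	Kind Kind
	Text string
}

// tokenSource stands in for *Lexer: Next takes no arguments and returns a Token.
type tokenSource interface {
	Next() Token
}

// sliceSource replays a fixed list of tokens, then returns EOF forever.
type sliceSource struct {
	toks []Token
	i    int
}

func (s *sliceSource) Next() Token {
	if s.i >= len(s.toks) {
		return Token{Kind: EOF}
	}
	t := s.toks[s.i]
	s.i++
	return t
}

// tokenize drains src and returns every token, including exactly one
// terminating EOF token.
func tokenize(src tokenSource) []Token {
	var out []Token
	for {
		t := src.Next()
		out = append(out, t)
		if t.Kind == EOF {
			return out
		}
	}
}

func main() {
	src := &sliceSource{toks: []Token{{Ident, "User"}, {Ident, "id"}}}
	for _, t := range tokenize(src) {
		fmt.Println(t.Kind, t.Text)
	}
}
```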

type Position

type Position struct {
	// Filename is the path or label of the source file (may be empty).
	Filename string
	// Offset is the 0-indexed byte offset into the source.
	Offset int
	// Line is the 1-indexed line number.
	Line int
	// Column is the 1-indexed rune column on the current line.
	Column int
}

Position identifies a single byte location in a source file.

The zero value is invalid (Line == 0). All non-zero positions use 1-indexed Line/Column counts; Offset is a 0-indexed byte offset suitable for slicing into the original source. Filename is optional - when empty, Position.String omits it.

func (Position) IsValid

func (p Position) IsValid() bool

IsValid reports whether the position has been initialised. A position is considered valid once Line > 0; the zero-value Position is invalid.

func (Position) String

func (p Position) String() string

String renders the position as `file:line:col` (or `line:col` when Filename is empty). Output matches the convention used by `go vet`, `gopls`, and most editors so that error messages are clickable.
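The documented formatting rules for Position, together with the `pos: msg` rendering of Diagnostic.Error, can be reproduced with local mirror types. This is a sketch of the documented behaviour rather than the package source, and the file name is made up.

```go
package main

import "fmt"

// Position mirrors the documented fields for this sketch.
type Position struct {
	Filename string
	Offset   int
	Line     int
	Column   int
}

// IsValid follows the documented rule: the zero value (Line == 0) is invalid.
func (p Position) IsValid() bool { return p.Line > 0 }

// String renders file:line:col, or line:col when Filename is empty.
func (p Position) String() string {
	if p.Filename == "" {
		return fmt.Sprintf("%d:%d", p.Line, p.Column)
	}
	return fmt.Sprintf("%s:%d:%d", p.Filename, p.Line, p.Column)
}

// Diagnostic is a pared-down mirror keeping only what Error needs.
type Diagnostic struct {
	Pos Position
	Msg string
}

// Error renders `pos: msg`, omitting severity and code as documented.
func (d Diagnostic) Error() string { return d.Pos.String() + ": " + d.Msg }

func main() {
	p := Position{Filename: "example.craft", Offset: 42, Line: 3, Column: 7}
	fmt.Println(p)
	fmt.Println(Position{Line: 3, Column: 7})
	fmt.Println(Position{}.IsValid())
	fmt.Println(Diagnostic{Pos: p, Msg: "unterminated string"})
}
```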

type Related

type Related struct {
	Pos Position
	Msg string
}

Related links a Diagnostic to a secondary location - typically the "previously declared at" site for a duplicate, or the conflicting decorator for a combination-rule violation. The IDE renders these as clickable secondary markers next to the primary diagnostic.

type Severity

type Severity uint8

Severity classifies a Diagnostic for IDE rendering. The values follow the order of the LSP DiagnosticSeverity enum (Error, Warning, Information, Hint), but LSP numbers its members from 1 while these constants start at 0, so the LSP server must add 1 when passing them through. The zero value is SeverityError - every diagnostic constructed without an explicit severity is treated as an error.

const (
	// SeverityError is a hard failure: codegen / runtime would be wrong.
	SeverityError Severity = iota
	// SeverityWarning is a soft issue worth surfacing but not blocking.
	SeverityWarning
	// SeverityInfo is informational (style hints, redundant constructs).
	SeverityInfo
	// SeverityHint is a low-priority suggestion, often paired with a fix.
	SeverityHint
)
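LSP's DiagnosticSeverity enum numbers its members Error = 1, Warning = 2, Information = 3, Hint = 4, whereas the constants above start at 0 via iota. A bridge to the protocol therefore needs a fixed offset. The sketch below redeclares Severity locally; `toLSP` is a hypothetical helper, not part of the documented API.

```go
package main

import "fmt"

// Severity is a local mirror of the documented enum.
type Severity uint8

const (
	SeverityError Severity = iota
	SeverityWarning
	SeverityInfo
	SeverityHint
)

// toLSP converts a zero-based Severity into the one-based LSP
// DiagnosticSeverity value (Error=1, Warning=2, Information=3, Hint=4).
func toLSP(s Severity) int {
	return int(s) + 1
}

func main() {
	fmt.Println(toLSP(SeverityError), toLSP(SeverityHint))
}
```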

func (Severity) String

func (s Severity) String() string

String renders the severity as a short label for diagnostic formatting.

type Token

type Token struct {
	Kind Kind
	Text string
	Pos  Position
	// Doc is the contiguous run of `//` line comments immediately
	// preceding this token, with the leading `//` and at most one following
	// space stripped. A blank line between a comment block and the next
	// token discards the block - only "doc-attached" comments arrive here.
	Doc []string
	// Trailing is a single `// note` comment that follows this token on
	// the same source line, with the leading `// ` stripped. Empty when
	// no trailing comment is present. Captured by [Lexer.Next] right
	// after the token is constructed; the comment text is consumed from
	// the source stream so [skipWhitespaceAndComments] on the next
	// [Lexer.Next] call does not see it again as a leading comment.
	Trailing string
}

Token is a single lexed unit of the source.

Text holds the literal source slice that produced this token (including surrounding quotes for String / RawString, suffix for Duration / Size). For keyword tokens, Text is the keyword spelling - useful when echoing source without consulting [kindNames].
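The Doc attachment rule (a contiguous run of `//` lines directly above the token, discarded if a blank line intervenes) can be approximated over source lines. This is a line-based sketch, not the lexer's actual mechanism; `docBlock` is a hypothetical helper, and `tokenLine` is 1-indexed to match Position.Line.

```go
package main

import (
	"fmt"
	"strings"
)

// docBlock collects the contiguous run of `//` comment lines immediately
// above tokenLine (1-indexed). Any non-comment line directly above, including
// a blank line, ends the search, so a block separated from the token by a
// blank line is discarded. Each entry has `//` and at most one following
// space stripped.
func docBlock(lines []string, tokenLine int) []string {
	var doc []string
	for i := tokenLine - 2; i >= 0; i-- {
		trimmed := strings.TrimSpace(lines[i])
		if !strings.HasPrefix(trimmed, "//") {
			break
		}
		doc = append(doc, strings.TrimPrefix(strings.TrimPrefix(trimmed, "//"), " "))
	}
	// Lines were collected bottom-up; reverse them into source order.
	for l, r := 0, len(doc)-1; l < r; l, r = l+1, r-1 {
		doc[l], doc[r] = doc[r], doc[l]
	}
	return doc
}

func main() {
	src := []string{
		"// Orphaned note.",
		"",
		"// User is an account holder.",
		"// Emails are unique.",
		"type User {",
	}
	fmt.Printf("%q\n", docBlock(src, 5))
}
```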

func (Token) String

func (t Token) String() string

String formats the token for debug and test output as `Kind "text" at pos`.
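Because a Duration token keeps its suffix in Text, and the documented suffix set (`ns`, `us`, `µs`, `ms`, `s`, `m`, `h`) is the same set of units accepted by the standard library's time.ParseDuration, a consumer can plausibly convert the text directly. This is a sketch under that assumption; `durationValue` is a hypothetical helper, and Size tokens are not covered because their multipliers are not specified here.

```go
package main

import (
	"fmt"
	"time"
)

// durationValue converts the Text of a Duration token into a time.Duration.
// It relies on time.ParseDuration accepting the same unit suffixes that the
// lexer documents for Duration tokens.
func durationValue(text string) (time.Duration, error) {
	return time.ParseDuration(text)
}

func main() {
	for _, text := range []string{"30s", "250ms", "2h"} {
		d, err := durationValue(text)
		fmt.Println(text, d, err)
	}
}
```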
