Documentation
Overview ¶
Package lexer tokenizes craftgo DSL source files.
The lexer performs a single linear pass over UTF-8 input and produces a flat stream of Token values, each tagged with a Position. Errors are recorded as Diagnostic entries (retrievable via Lexer.Diagnostics) and surfaced as tokens of Kind Error in the stream; the lexer never panics or aborts on malformed input. Downstream phases (parser, semantic analysis, LSP) all consume this same token stream, so diagnostics are consistent across the CLI and the IDE.
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Comment ¶
type Comment struct {
Pos Position // position of the leading `/` on the comment line
Text string // comment body with the leading `//` (and one optional space) stripped
Kind CommentKind // leading vs trailing
}
Comment is one source-level `//` line, captured by the lexer. The formatter / future linters consume the full slice via Lexer.Comments; the parser snapshots it onto `*ast.File.Comments` so downstream tools see one canonical view of every comment in the file regardless of whether it ended up attached to an AST node.
type CommentKind ¶
type CommentKind uint8
CommentKind classifies a `//` comment by its source position relative to surrounding tokens. The formatter uses it to render leading and trailing comments at the correct site after the parser/AST has lost the raw column information.
const (
	// CommentLeading is the default - the comment was preceded only by
	// whitespace on its source line.
	CommentLeading CommentKind = iota
	// CommentTrailing is a comment that follows non-whitespace code on
	// the same line as the previously emitted token, e.g. the
	// `// 5 MiB` in `@maxBodySize(5242880) // 5 MiB`. The lexer detects
	// this via [Lexer.sawNewlineSinceLastToken].
	CommentTrailing
)
func (CommentKind) String ¶
func (k CommentKind) String() string
String returns "leading" or "trailing" for diagnostic messages.
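The leading/trailing distinction can be illustrated with a small standalone sketch. The types below are local copies of the documented ones, and `classifyLine` is a hypothetical helper, not part of the package; it looks only at one line of text and ignores the case of `//` inside a string literal, which the real lexer would have to handle.

```go
package main

import (
	"fmt"
	"strings"
)

// CommentKind is redeclared locally so the sketch compiles on its own.
type CommentKind uint8

const (
	CommentLeading CommentKind = iota
	CommentTrailing
)

// String returns "leading" or "trailing", as the documentation describes.
func (k CommentKind) String() string {
	if k == CommentTrailing {
		return "trailing"
	}
	return "leading"
}

// classifyLine is a hypothetical helper: it reports the kind of the first
// `//` found on a single source line, and false when the line has none.
// Only whitespace before the `//` means leading; anything else means trailing.
func classifyLine(line string) (CommentKind, bool) {
	i := strings.Index(line, "//")
	if i < 0 {
		return CommentLeading, false
	}
	if strings.TrimSpace(line[:i]) == "" {
		return CommentLeading, true
	}
	return CommentTrailing, true
}

func main() {
	for _, line := range []string{
		"// request size limit",
		"@maxBodySize(5242880) // 5 MiB",
	} {
		kind, ok := classifyLine(line)
		fmt.Println(kind, ok)
	}
}
```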
type Diagnostic ¶
type Diagnostic struct {
Pos Position
End Position
Severity Severity
Code string
Msg string
Related []Related
}
Diagnostic is a single error/warning tied to a source range. The lexer, parser, and semantic analyser all accumulate Diagnostics so the parser, formatter, and LSP server can present them at once.
Pos is the start of the offending token / construct; End is the exclusive end of the same range and is used by the LSP layer to draw the red squiggle. End may equal Pos when only a point location is known (e.g. lexer point errors); callers should treat (Pos == End) as "underline a single column".
Code is a stable machine-readable identifier (e.g. `decorator/placement`) that the IDE uses for filtering, "disable next line", and documentation links. It must NOT include the message - keep human text in Msg.
Related carries secondary positions referenced by Msg. The IDE shows them as clickable cross-links rather than appending another sentence to Msg.
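As a rough illustration of the Pos/End convention, a renderer might compute the underline width as follows. The types are trimmed local copies of the documented ones, and `underlineWidth` is a hypothetical helper that handles only single-line ranges; multi-line ranges are reported as unsupported rather than guessed at.

```go
package main

import "fmt"

// Position is a local copy of the documented struct.
type Position struct {
	Filename string
	Offset   int
	Line     int
	Column   int
}

// Diagnostic is trimmed to the fields this sketch needs.
type Diagnostic struct {
	Pos Position
	End Position
	Msg string
}

// underlineWidth is a hypothetical helper: it returns the number of columns
// to underline for a single-line diagnostic. Pos == End is treated as a
// point location and yields one column. ok is false for multi-line ranges,
// which this sketch does not attempt to handle.
func underlineWidth(d Diagnostic) (width int, ok bool) {
	if d.Pos == d.End {
		return 1, true
	}
	if d.Pos.Line != d.End.Line {
		return 0, false
	}
	// End is exclusive, so the width is a plain difference.
	return d.End.Column - d.Pos.Column, true
}

func main() {
	start := Position{Line: 3, Column: 5, Offset: 40}
	end := Position{Line: 3, Column: 9, Offset: 44}

	fmt.Println(underlineWidth(Diagnostic{Pos: start, End: start}))
	fmt.Println(underlineWidth(Diagnostic{Pos: start, End: end}))
}
```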
func (Diagnostic) Error ¶
func (d Diagnostic) Error() string
Error implements the error interface, formatted as `pos: msg`. Severity and code are omitted from the default rendering; the LSP layer reads the structured fields directly.
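A minimal sketch of the `pos: msg` rendering is shown below, using local copies of the types. The exact layout produced by Position.String is an assumption here (`file:line:col`, or `line:col` when Filename is empty); the documentation only states that an empty filename is omitted. The filename `example.src` is likewise a placeholder.

```go
package main

import "fmt"

// Position is a local copy of the documented struct.
type Position struct {
	Filename string
	Offset   int
	Line     int
	Column   int
}

// String renders "file:line:col", or "line:col" when Filename is empty.
// This layout is assumed for illustration only.
func (p Position) String() string {
	if p.Filename == "" {
		return fmt.Sprintf("%d:%d", p.Line, p.Column)
	}
	return fmt.Sprintf("%s:%d:%d", p.Filename, p.Line, p.Column)
}

// Diagnostic is trimmed to the fields this sketch needs.
type Diagnostic struct {
	Pos  Position
	Code string
	Msg  string
}

// Error formats the diagnostic as `pos: msg`; Code is deliberately left out,
// matching the documented default rendering.
func (d Diagnostic) Error() string {
	return d.Pos.String() + ": " + d.Msg
}

func main() {
	var err error = Diagnostic{
		Pos:  Position{Filename: "example.src", Line: 2, Column: 7},
		Code: "decorator/placement",
		Msg:  "decorator not allowed here",
	}
	fmt.Println(err)
}
```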
type Kind ¶
type Kind int
Kind enumerates every token category emitted by the lexer.
Kind values are stable and ordered: the keyword block (KwPackage..KwNull) and the HTTP-verb block (VerbGet..VerbOptions) are contiguous so that callers can detect "any keyword" via simple range checks. New kinds must be appended; reordering breaks parser code that relies on the keyword range.
const (
	// EOF is emitted exactly once at the end of input.
	EOF Kind = iota
	// Error wraps a malformed token; the offending source slice is in Text and
	// a Diagnostic is recorded on the [Lexer]. Parsing should treat this as
	// "skip and continue" - the diagnostic carries the message for users.
	Error
	// Ident is any identifier that is not a reserved keyword.
	Ident
	// Int holds a plain decimal integer literal (no sign, no suffix).
	Int
	// Float holds a decimal float literal of the form `digits.digits`.
	Float
	// String holds a double-quoted string with escape sequences preserved
	// verbatim (parser does the unescape).
	String
	// RawString holds a backtick-quoted string. Backticks are kept in Text;
	// no escape processing is performed.
	RawString
	// Duration is a numeric literal followed immediately by a duration suffix
	// (`ns`, `us`, `µs`, `ms`, `s`, `m`, `h`).
	Duration
	// Size is a numeric literal followed immediately by a size suffix
	// (`B`, `KB`, `MB`, `GB`).
	Size

	KwPackage
	KwImport
	KwType
	KwEnum
	KwError
	KwScalar
	KwService
	KwExtend
	KwMiddleware
	KwRequest
	KwResponse
	KwMap
	KwTrue
	KwFalse
	KwNull

	VerbGet
	VerbPost
	VerbPut
	VerbPatch
	VerbDelete
	VerbHead
	VerbOptions

	LBrace   // {
	RBrace   // }
	LParen   // (
	RParen   // )
	LBracket // [
	RBracket // ]
	LAngle   // <
	RAngle   // >
	Comma    // ,
	Colon    // :
	Equal    // =
	Question // ?
	Dot      // .
	Slash    // /
	At       // @
	Dash     // -
)
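The range checks that the contiguous keyword and verb blocks make possible could look like the sketch below. The constant block is a local copy in the documented order; `isKeyword` and `isVerb` are hypothetical helper names and are not claimed to exist in the package.

```go
package main

import "fmt"

// Kind and its constants are redeclared locally, in the documented order,
// so that the range boundaries line up.
type Kind int

const (
	EOF Kind = iota
	Error
	Ident
	Int
	Float
	String
	RawString
	Duration
	Size

	KwPackage
	KwImport
	KwType
	KwEnum
	KwError
	KwScalar
	KwService
	KwExtend
	KwMiddleware
	KwRequest
	KwResponse
	KwMap
	KwTrue
	KwFalse
	KwNull

	VerbGet
	VerbPost
	VerbPut
	VerbPatch
	VerbDelete
	VerbHead
	VerbOptions

	LBrace
	RBrace
)

// isKeyword reports whether k lies in the contiguous KwPackage..KwNull block.
func isKeyword(k Kind) bool {
	return k >= KwPackage && k <= KwNull
}

// isVerb reports whether k lies in the contiguous VerbGet..VerbOptions block.
func isVerb(k Kind) bool {
	return k >= VerbGet && k <= VerbOptions
}

func main() {
	for _, k := range []Kind{Ident, KwType, VerbPost, LBrace} {
		fmt.Println(k, isKeyword(k), isVerb(k))
	}
}
```

Because the checks depend only on the first and last member of each block, appending new kinds at the end leaves them valid, while inserting or reordering inside a block would not.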
type Lexer ¶
type Lexer struct {
// contains filtered or unexported fields
}
Lexer tokenizes a single craftgo source buffer.
A Lexer holds its position, the original source, and the accumulated diagnostics. It is consumed via Lexer.Next (one token at a time) or Lexer.Tokenize (the whole stream at once). Lexers are not safe for concurrent use; create one per file.
func New ¶
New constructs a Lexer ready to tokenize src. filename is informational - it appears in [Position.Filename] on every emitted token and in diagnostics. Pass an empty string when there is no associated file.
func (*Lexer) Comments ¶
Comments returns every `//` comment encountered in the source so far, in source order, with their position and leading/trailing kind. Callers (parser snapshot, format printer, lint tools) consume this slice instead of re-scanning the source; it is the single source of truth for "every comment in the file".
func (*Lexer) Diagnostics ¶
func (l *Lexer) Diagnostics() []Diagnostic
Diagnostics returns every error encountered so far. Calling it does not reset internal state, so additional errors from later tokens append to the same slice.
func (*Lexer) Next ¶
Next returns the next token in the stream. It skips whitespace and `//` line comments, then dispatches to a specialised lexer based on the leading rune. Any malformed input produces a token of kind Error (with the message in Text) and adds a corresponding Diagnostic; the lexer continues from the next available position.
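A consumer loop in the "skip and continue" style might be sketched as follows. The signature of Next is not shown on this page, so the sketch assumes `Next() Token` and hides it behind a small interface; `fakeLexer` is a stand-in that replays canned tokens, and `drain` is a hypothetical helper.

```go
package main

import "fmt"

// Kind and Token are trimmed local copies of the documented types.
type Kind int

const (
	EOF Kind = iota
	Error
	Ident
)

type Token struct {
	Kind Kind
	Text string
}

// tokenSource captures the assumed shape of Lexer.Next.
type tokenSource interface {
	Next() Token
}

// fakeLexer replays a fixed token slice and then returns EOF.
type fakeLexer struct {
	toks []Token
	i    int
}

func (f *fakeLexer) Next() Token {
	if f.i >= len(f.toks) {
		return Token{Kind: EOF}
	}
	t := f.toks[f.i]
	f.i++
	return t
}

// drain reads tokens until EOF, skipping Error tokens instead of aborting.
// It returns the usable tokens and the number of Error tokens skipped.
func drain(src tokenSource) (good []Token, skipped int) {
	for {
		tok := src.Next()
		switch tok.Kind {
		case EOF:
			return good, skipped
		case Error:
			skipped++ // the message itself lives in the lexer's diagnostics
		default:
			good = append(good, tok)
		}
	}
}

func main() {
	lx := &fakeLexer{toks: []Token{
		{Kind: Ident, Text: "service"},
		{Kind: Error, Text: "#"},
		{Kind: Ident, Text: "Users"},
	}}
	good, skipped := drain(lx)
	fmt.Println(len(good), skipped)
}
```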
type Position ¶
type Position struct {
// Filename is the path or label of the source file (may be empty).
Filename string
// Offset is the 0-indexed byte offset into the source.
Offset int
// Line is the 1-indexed line number.
Line int
// Column is the 1-indexed rune column on the current line.
Column int
}
Position identifies a single byte location in a source file.
The zero value is invalid (Line == 0). All non-zero positions use 1-indexed Line/Column counts; Offset is a 0-indexed byte offset suitable for slicing into the original source. Filename is optional - when empty, Position.String omits it.
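The difference between the byte-based Offset and the rune-based Column shows up as soon as the source contains multi-byte characters. The sketch below uses a local copy of Position; `isValid` and `positionAt` are hypothetical helpers written for illustration, and `positionAt` assumes the offset falls on a rune boundary within the source.

```go
package main

import "fmt"

// Position is a local copy of the documented struct.
type Position struct {
	Filename string
	Offset   int
	Line     int
	Column   int
}

// isValid applies the documented rule that the zero value (Line == 0)
// is invalid.
func isValid(p Position) bool {
	return p.Line > 0
}

// positionAt derives a 1-indexed line and rune column for a 0-indexed byte
// offset by walking the source rune by rune.
func positionAt(src string, offset int) Position {
	p := Position{Offset: offset, Line: 1, Column: 1}
	for _, r := range src[:offset] {
		if r == '\n' {
			p.Line++
			p.Column = 1
		} else {
			p.Column++
		}
	}
	return p
}

func main() {
	// U+00B5 (micro sign) occupies two bytes but only one column.
	src := "a \u00b5s\nb"

	p := positionAt(src, 4) // byte offset of the 's'
	fmt.Println(isValid(Position{}), isValid(p))
	fmt.Println(p.Line, p.Column, src[p.Offset:p.Offset+1])
}
```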
type Related ¶
Related links a Diagnostic to a secondary location - typically the "previously declared at" site for a duplicate, or the conflicting decorator for a combination-rule violation. The IDE renders these as clickable secondary markers next to the primary diagnostic.
type Severity ¶
type Severity uint8
Severity classifies a Diagnostic for IDE rendering. The constants follow the same order as the LSP DiagnosticSeverity enum (Error, Warning, Information, Hint), so the LSP server can map them directly. Note that LSP numbers its values from 1, whereas these constants start at 0. The zero value is SeverityError: every diagnostic constructed without an explicit severity is treated as an error.
const (
	// SeverityError is a hard failure: codegen / runtime would be wrong.
	SeverityError Severity = iota
	// SeverityWarning is a soft issue worth surfacing but not blocking.
	SeverityWarning
	// SeverityInfo is informational (style hints, redundant constructs).
	SeverityInfo
	// SeverityHint is a low-priority suggestion, often paired with a fix.
	SeverityHint
)
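The LSP specification numbers DiagnosticSeverity from 1 (Error) to 4 (Hint), while the constants above start at 0. A server that needs the numeric wire value could therefore convert with a fixed offset, as in this sketch; `toLSP` is a hypothetical helper, and the Severity type is redeclared locally.

```go
package main

import "fmt"

// Severity and its constants are local copies of the documented ones.
type Severity uint8

const (
	SeverityError Severity = iota
	SeverityWarning
	SeverityInfo
	SeverityHint
)

// toLSP maps a Severity onto the LSP DiagnosticSeverity numbering
// (Error = 1, Warning = 2, Information = 3, Hint = 4).
func toLSP(s Severity) int {
	return int(s) + 1
}

func main() {
	var unset Severity // zero value, i.e. SeverityError

	fmt.Println(unset == SeverityError, toLSP(unset))
	fmt.Println(toLSP(SeverityWarning), toLSP(SeverityInfo), toLSP(SeverityHint))
}
```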
type Token ¶
type Token struct {
Kind Kind
Text string
Pos Position
// Doc is the contiguous run of `//` line comments immediately
// preceding this token, with the leading `//` and a single following
// space stripped. A blank line between a comment block and the next
// token discards the block - only "doc-attached" comments arrive here.
Doc []string
// Trailing is a single `// note` comment that follows this token on
// the same source line, with the leading `// ` stripped. Empty when
// no trailing comment is present. Captured by [Lexer.Next] right
// after the token is constructed; the comment text is consumed from
// the source stream so [skipWhitespaceAndComments] on the next
// [Lexer.Next] call does not see it again as a leading comment.
Trailing string
}
Token is a single lexed unit of the source.
Text holds the literal source slice that produced this token (including surrounding quotes for String / RawString, suffix for Duration / Size). For keyword tokens, Text is the keyword spelling - useful when echoing source without consulting [kindNames].
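Since Text keeps the quotes and suffixes, a later phase has to decode it. The sketch below shows one way this might be done with the Go standard library, under two assumptions that the documentation does not confirm: that the DSL's string escapes are compatible with Go's, and that Duration literals are accepted by `time.ParseDuration` (whose unit list happens to match the documented suffixes). The helper names are hypothetical, and the local Kind values do not match the package's numbering.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// Kind and Token are trimmed local stand-ins for the documented types.
type Kind int

const (
	String Kind = iota
	RawString
	Duration
)

type Token struct {
	Kind Kind
	Text string
}

// stringValue strips the surrounding quotes from a String or RawString token.
// strconv.Unquote processes escapes for double-quoted text and leaves
// backtick-quoted text unprocessed.
func stringValue(t Token) (string, error) {
	if t.Kind != String && t.Kind != RawString {
		return "", fmt.Errorf("not a string token: %q", t.Text)
	}
	return strconv.Unquote(t.Text)
}

// durationValue parses a Duration token, suffix included.
func durationValue(t Token) (time.Duration, error) {
	if t.Kind != Duration {
		return 0, fmt.Errorf("not a duration token: %q", t.Text)
	}
	return time.ParseDuration(t.Text)
}

func main() {
	s, err := stringValue(Token{Kind: String, Text: `"a\tb"`})
	fmt.Printf("%q %v\n", s, err)

	r, err := stringValue(Token{Kind: RawString, Text: "`a\\tb`"})
	fmt.Printf("%q %v\n", r, err)

	d, err := durationValue(Token{Kind: Duration, Text: "150ms"})
	fmt.Println(d, err)
}
```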