lexer

package

Version: v0.14.3 (latest). Published: May 2, 2026. License: Apache-2.0. Imports: 6. Imported by: 0.


Repository

github.com/sunholo-data/ailang


Documentation ¶

Index ¶

func IsContextualKeyword(t TokenType) bool
func IsReservedKeyword(ident string) bool
func Normalize(src []byte) []byte
type Error
- func (e Error) Error() string
type Lexer
- func New(input string, filename string) *Lexer
- func (l *Lexer) NextToken() Token
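type Token
- func NewToken(tokenType TokenType, literal string, line, column int, file string) Token
- func (t Token) IsKeyword() bool
- func (t Token) IsOperator() bool
- func (t Token) Position() string
- func (t Token) Precedence() int
- func (t Token) String() string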
type TokenType
- func LookupIdent(ident string) TokenType
- func LookupIdentContextual(ident string) TokenType
- func (t TokenType) String() string

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func IsContextualKeyword ¶

func IsContextualKeyword(t TokenType) bool

IsContextualKeyword reports whether a token type is reserved only in specific contexts. For now, all keywords are strictly reserved; this function exists to allow future flexibility.

func IsReservedKeyword ¶

func IsReservedKeyword(ident string) bool

IsReservedKeyword reports whether a string is a reserved keyword. It is used to prevent keywords from being used as identifiers.

func Normalize ¶

func Normalize(src []byte) []byte

Normalize performs input normalization at the lexer boundary:

1. Strips the UTF-8 BOM if present
2. Applies Unicode NFC normalization

This ensures that lexically equivalent source code produces identical token streams regardless of encoding variations.

Examples:

"café" in NFC vs NFD → identical tokens
"\uFEFFlet x = 5" → "let x = 5" (BOM stripped)

Normalization is performed once at input to avoid repeated processing.
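
The first normalization step can be illustrated without any third-party dependency. The following is a minimal sketch, not the package's implementation: `stripBOM` and `utf8BOM` are hypothetical names introduced here, and the NFC step is deliberately left out. In a real program the NFC step could be done with `norm.NFC.Bytes` from `golang.org/x/text/unicode/norm`; whether this package uses that library is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
)

// utf8BOM is the three-byte UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripBOM is a hypothetical stand-in for the first step of Normalize.
// It removes one leading UTF-8 BOM and leaves all other bytes untouched.
// NFC normalization is intentionally not performed here.
func stripBOM(src []byte) []byte {
	return bytes.TrimPrefix(src, utf8BOM)
}

func main() {
	withBOM := []byte("\uFEFFlet x = 5")
	without := []byte("let x = 5")

	// Both inputs yield the same bytes once the BOM is gone.
	fmt.Printf("%q\n", stripBOM(withBOM)) // "let x = 5"
	fmt.Printf("%q\n", stripBOM(without)) // "let x = 5"
}
```

Doing this once on the raw bytes, before any scanning starts, means the scanner itself never has to special-case a BOM.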

Types ¶

type Error ¶

type Error struct {
	Message string
	Line    int
	Column  int
	File    string
}

Error represents a lexer error: a message together with the file, line, and column where it occurred.

func (Error) Error ¶

func (e Error) Error() string
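
Because Error has a value-receiver Error method, it satisfies Go's `error` interface and can be recovered from a wrapped error with `errors.As`. The sketch below uses a local `LexError` type that mirrors the documented fields so that it runs on its own. The `file:line:column: message` layout, the `scan` function, and the file name are assumptions for illustration; the documentation does not specify the output format of `Error()`.

```go
package main

import (
	"errors"
	"fmt"
)

// LexError mirrors the fields of the documented Error struct.
type LexError struct {
	Message string
	Line    int
	Column  int
	File    string
}

// Error formats the error as file:line:column: message.
// This layout is an assumption, not the package's documented format.
func (e LexError) Error() string {
	return fmt.Sprintf("%s:%d:%d: %s", e.File, e.Line, e.Column, e.Message)
}

// scan is a hypothetical function that fails with a wrapped LexError.
func scan() error {
	cause := LexError{Message: "unterminated string", Line: 3, Column: 7, File: "main.ail"}
	return fmt.Errorf("scan failed: %w", cause)
}

func main() {
	err := scan()

	// errors.As unwraps err until it finds a LexError value.
	var le LexError
	if errors.As(err, &le) {
		fmt.Println(le.Line, le.Column)
		fmt.Println(le.Error())
	}
}
```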

type Lexer ¶

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer tokenizes AILANG source code

func New ¶

func New(input string, filename string) *Lexer

New creates a new Lexer with normalized input. Input is normalized at the lexer boundary:

- UTF-8 BOM is stripped
- Unicode NFC normalization is applied

This ensures lexically equivalent source produces identical token streams.

func (*Lexer) NextToken ¶

func (l *Lexer) NextToken() Token

NextToken returns the next token
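
A typical way to consume a lexer of this shape is to call NextToken in a loop until an EOF token appears. The sketch below shows that loop against local stand-in types so that it runs without the package. `tokenSource`, `sliceLexer`, and `drain` are hypothetical names, and the assumption that NextToken yields a token of type EOF at end of input is inferred from the EOF constant, not stated by the documentation.

```go
package main

import "fmt"

// Minimal stand-ins for the package's TokenType and Token.
type TokenType int

const (
	EOF TokenType = iota
	IDENT
)

type Token struct {
	Type    TokenType
	Literal string
}

// tokenSource is the single method of *Lexer that this sketch relies on.
type tokenSource interface {
	NextToken() Token
}

// sliceLexer is a hypothetical stand-in that replays fixed tokens
// and then returns EOF on every further call.
type sliceLexer struct {
	toks []Token
	pos  int
}

func (s *sliceLexer) NextToken() Token {
	if s.pos >= len(s.toks) {
		return Token{Type: EOF}
	}
	t := s.toks[s.pos]
	s.pos++
	return t
}

// drain collects tokens until EOF; the EOF token itself is not included.
func drain(src tokenSource) []Token {
	var out []Token
	for tok := src.NextToken(); tok.Type != EOF; tok = src.NextToken() {
		out = append(out, tok)
	}
	return out
}

func main() {
	lx := &sliceLexer{toks: []Token{{IDENT, "let"}, {IDENT, "x"}}}
	for _, t := range drain(lx) {
		fmt.Println(t.Literal)
	}
}
```

With the real package, the same loop would presumably start from `lexer.New(input, filename)` and compare against `lexer.EOF`.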

type Token ¶

type Token struct {
	Type    TokenType
	Literal string
	Line    int
	Column  int
	File    string
}

Token represents a lexical token

func NewToken ¶

func NewToken(tokenType TokenType, literal string, line, column int, file string) Token

NewToken creates a new token

func (Token) IsKeyword ¶

func (t Token) IsKeyword() bool

IsKeyword checks if a token is a keyword

func (Token) IsOperator ¶

func (t Token) IsOperator() bool

IsOperator checks if a token is an operator

func (Token) Position ¶

func (t Token) Position() string

Position returns the position of the token as a string

func (Token) Precedence ¶

func (t Token) Precedence() int

Precedence returns the precedence of an operator. Levels follow C-style dedicated precedence bands, listed from tightest binding to loosest; a higher number binds tighter:

DOT_ACCESS   .                   (16)
CALL         f(x)                (15)
PREFIX       -, not, ~           (14)
PRODUCT      *, /, %             (13)
SUM          +, -                (12)
APPEND       ++                  (11)
CONS         ::                  (10)
SHIFT        <<, >>              (9)
LESSGREATER  <, >, <=, >=        (8)
EQUALS       ==, !=              (7)
BITWISE_AND  &                   (6)
BITWISE_XOR  ^                   (5)
(BITWISE_OR  |  — reserved, not an operator; use bitwiseOr())
LOGICAL_AND  &&                  (3)
LOGICAL_OR   ||                  (2)
LAMBDA       \                   (1)
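
To make the table concrete, the sketch below copies the binary-operator levels into a map and compares two operators. It is an illustration only: `precedence` and `bindsTighter` are hypothetical names, the levels for DOT_ACCESS, CALL, PREFIX, and LAMBDA are omitted because they are not plain binary operators, and the real method works on Token values, not strings.

```go
package main

import "fmt"

// precedence copies the binary-operator levels from the table above.
var precedence = map[string]int{
	"*": 13, "/": 13, "%": 13,
	"+": 12, "-": 12,
	"++": 11,
	"::": 10,
	"<<": 9, ">>": 9,
	"<": 8, ">": 8, "<=": 8, ">=": 8,
	"==": 7, "!=": 7,
	"&":  6,
	"^":  5,
	"&&": 3,
	"||": 2,
}

// bindsTighter reports whether operator a groups before operator b.
func bindsTighter(a, b string) bool {
	return precedence[a] > precedence[b]
}

func main() {
	fmt.Println(bindsTighter("*", "+"))   // true: 13 > 12
	fmt.Println(bindsTighter("==", "&"))  // true: 7 > 6
	fmt.Println(bindsTighter("||", "&&")) // false: 2 < 3
}
```

A precedence-climbing or Pratt parser would consult such levels to decide whether the operator to the right of an operand should capture it before the operator to the left does.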

func (Token) String ¶

func (t Token) String() string

String returns a string representation of the token

type TokenType ¶

type TokenType int

TokenType represents the type of a token

const (
	// Special tokens
	ILLEGAL TokenType = iota
	EOF
	COMMENT

	// Literals
	IDENT  // identifier
	INT    // 123
	FLOAT  // 123.45
	STRING // "abc"
	CHAR   // 'a'

	// String interpolation (M1_LEXER_INTERP, v0.12.1)
	// "prefix${expr}suffix" tokenizes to:
	//   STRING_PART("prefix"), INTERP_START, <expr tokens>, INTERP_END, STRING_PART("suffix")
	// Plain strings without `${` continue to emit a single STRING token.
	STRING_PART  // partial string literal (before/after/between `${...}`)
	INTERP_START // the `${` marker opening an interpolation
	INTERP_END   // the matching `}` closing an interpolation

	// Keywords
	FUNC
	PURE
	LET
	LETREC
	IN
	IF
	THEN
	ELSE
	MATCH
	WITH
	TYPE
	CLASS
	INSTANCE
	MODULE
	IMPORT
	EXPORT
	EXTERN // extern func declarations for Go interop
	FORALL
	EXISTS
	TEST
	TESTS // tests block
	PROPERTY
	PROPERTIES // properties block
	ASSERT
	SPAWN
	PARALLEL
	SELECT
	CHANNEL
	SEND
	RECV
	TIMEOUT
	AS       // as (import aliasing)
	DERIVING // deriving (type class derivation)

	// Contract keywords (M-VERIFY)
	REQUIRES  // requires
	ENSURES   // ensures
	INVARIANT // invariant

	// Operators
	PLUS      // +
	MINUS     // -
	STAR      // *
	SLASH     // /
	PERCENT   // %
	EQ        // ==
	NEQ       // !=
	LT        // <
	GT        // >
	LTE       // <=
	GTE       // >=
	AND       // &&
	OR        // ||
	NOT       // not
	ARROW     // ->
	FARROW    // =>
	LARROW    // <-
	PIPE      // |
	APPEND    // ++
	CONS      // ::
	COMPOSE   // .
	BANG      // !
	QUESTION  // ?
	AT        // @
	DOLLAR    // $
	HASH      // #
	ASSIGN    // =
	COLON     // :
	DCOLON    // ::
	BACKSLASH // \

	// Bitwise operators
	AMPERSAND // & (bitwise AND)
	CARET     // ^ (bitwise XOR)
	TILDE     // ~ (bitwise NOT)
	SHL       // << (left shift)
	SHR       // >> (right shift)

	// Delimiters
	LPAREN    // (
	RPAREN    // )
	LBRACE    // {
	RBRACE    // }
	LBRACKET  // [
	RBRACKET  // ]
	COMMA     // ,
	DOT       // .
	DOTDOT    // ..
	ELLIPSIS  // ...
	SEMICOLON // ;
	NEWLINE   // \n

	// Quasiquote types
	SQLQuote   // sql"""
	HTMLQuote  // html"""
	JSONQuote  // json{
	RegexQuote // regex/
	URLQuote   // url"
	ShellQuote // shell"""

	// Effect markers
	EffectMarker // ! {effects}

	// Boolean literals
	TRUE
	FALSE

	// Unit type
	UNIT // ()
)
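
The string-interpolation comment in the constant block describes how `"prefix${expr}suffix"` is split. The sketch below reproduces that shape on the body of a string literal. It is a simplified illustration under stated assumptions: `piece` and `splitInterp` are hypothetical names, the expression is kept as one raw EXPR piece where the real lexer emits expression tokens, braces inside nested string literals are not handled, and emitting an empty trailing STRING_PART is a guess.

```go
package main

import (
	"fmt"
	"strings"
)

// piece is a simplified token: a kind name plus its text.
type piece struct {
	Kind string // STRING, STRING_PART, INTERP_START, EXPR, INTERP_END
	Text string
}

// splitInterp splits the body of a string literal (quotes already removed).
// A body without "${" yields a single STRING piece.
func splitInterp(body string) []piece {
	if !strings.Contains(body, "${") {
		return []piece{{"STRING", body}}
	}
	var out []piece
	for {
		start := strings.Index(body, "${")
		if start < 0 {
			// No further interpolation: the remainder is the last part.
			return append(out, piece{"STRING_PART", body})
		}
		out = append(out, piece{"STRING_PART", body[:start]}, piece{"INTERP_START", "${"})
		rest := body[start+2:]

		// Find the matching "}" by tracking brace depth.
		depth, end := 1, -1
		for i, r := range rest {
			if r == '{' {
				depth++
			} else if r == '}' {
				depth--
				if depth == 0 {
					end = i
					break
				}
			}
		}
		if end < 0 {
			// Unterminated interpolation: keep the remainder as raw text.
			return append(out, piece{"EXPR", rest})
		}
		out = append(out, piece{"EXPR", rest[:end]}, piece{"INTERP_END", "}"})
		body = rest[end+1:]
	}
}

func main() {
	for _, p := range splitInterp("prefix${expr}suffix") {
		fmt.Printf("%s(%q)\n", p.Kind, p.Text)
	}
}
```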

func LookupIdent ¶

func LookupIdent(ident string) TokenType

LookupIdent checks if an identifier is a keyword

func LookupIdentContextual ¶

func LookupIdentContextual(ident string) TokenType

LookupIdentContextual checks if an identifier is a keyword, but treats test, tests, and properties as contextual: they can be used as identifiers in some contexts.
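
The difference between the two lookup functions can be sketched with a small keyword table. Everything below is a stand-in: the lowercase function names, the four-entry table, and the `contextual` set are hypothetical, and returning IDENT for contextual keywords is an assumption about behavior that the documentation does not spell out.

```go
package main

import "fmt"

type TokenType int

const (
	IDENT TokenType = iota
	LET
	TEST
	TESTS
	PROPERTIES
)

// keywords is a tiny, hypothetical subset of the real keyword table.
var keywords = map[string]TokenType{
	"let":        LET,
	"test":       TEST,
	"tests":      TESTS,
	"properties": PROPERTIES,
}

// contextual holds the three keywords the documentation names as contextual.
var contextual = map[string]bool{"test": true, "tests": true, "properties": true}

// lookupIdent treats every table entry as a keyword.
func lookupIdent(ident string) TokenType {
	if tt, ok := keywords[ident]; ok {
		return tt
	}
	return IDENT
}

// lookupIdentContextual assumes contextual keywords come back as IDENT.
func lookupIdentContextual(ident string) TokenType {
	if contextual[ident] {
		return IDENT
	}
	return lookupIdent(ident)
}

func main() {
	fmt.Println(lookupIdent("tests") == TESTS)           // true
	fmt.Println(lookupIdentContextual("tests") == IDENT) // true
	fmt.Println(lookupIdentContextual("let") == LET)     // true
}
```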

func (TokenType) String ¶

func (t TokenType) String() string

String returns the string representation of a token type
