Documentation ¶
Overview ¶
Package tokeniser provides the shared lexer for the English programming language. Both the compiler pipeline (via the parser package) and the syntax-highlighter (via the highlight package) use this lexer so that keyword recognition, operator phrases, possessive handling, and every other tokenisation rule have a single authoritative implementation.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func TokenizeForHighlight ¶
TokenizeForHighlight tokenizes source and returns a token stream suitable for syntax highlighting. Unlike TokenizeAll it:
- preserves NEWLINE tokens
- inserts WHITESPACE tokens for the horizontal whitespace that the lexer normally discards between semantic tokens
- sets each token's Value to the exact bytes from source so that the original text can be reconstructed verbatim (including original casing, spacing inside multi-word operators, quote characters around strings, and the leading '#' of comments)
The returned slice, when its Value fields are concatenated in order, reproduces source exactly.
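The round-trip guarantee above can be illustrated with a toy stand-in lexer. This is a minimal sketch, not the package's real implementation: the `Token` shape and the helper names are assumptions, and the toy recognizes only words, horizontal whitespace, newlines, and `#` comments. The point is the invariant: every emitted token carries its exact source bytes, so concatenating the `Value` fields reproduces the input verbatim.

```go
package main

import (
	"fmt"
	"strings"
)

// Token mirrors the shape implied by the docs: Kind names the token
// class, Value holds the exact source bytes. (Illustrative names; the
// real package's token type may differ.)
type Token struct {
	Kind  string
	Value string
}

// tokenizeForHighlightSketch is a toy stand-in for TokenizeForHighlight:
// it emits WORD, WHITESPACE, NEWLINE, and COMMENT tokens whose Values
// concatenate back to the input verbatim (comments keep their '#').
func tokenizeForHighlightSketch(src string) []Token {
	var toks []Token
	i := 0
	for i < len(src) {
		switch c := src[i]; {
		case c == '\n':
			toks = append(toks, Token{"NEWLINE", "\n"})
			i++
		case c == ' ' || c == '\t':
			j := i
			for j < len(src) && (src[j] == ' ' || src[j] == '\t') {
				j++
			}
			toks = append(toks, Token{"WHITESPACE", src[i:j]})
			i = j
		case c == '#':
			j := i
			for j < len(src) && src[j] != '\n' {
				j++
			}
			toks = append(toks, Token{"COMMENT", src[i:j]})
			i = j
		default:
			j := i
			for j < len(src) && src[j] != ' ' && src[j] != '\t' && src[j] != '\n' && src[j] != '#' {
				j++
			}
			toks = append(toks, Token{"WORD", src[i:j]})
			i = j
		}
	}
	return toks
}

// reconstruct concatenates the Value fields; per the contract, the
// result must equal the original source exactly.
func reconstruct(toks []Token) string {
	var b strings.Builder
	for _, t := range toks {
		b.WriteString(t.Value)
	}
	return b.String()
}

func main() {
	src := "Let x be 5.  # a comment\nSay x.\n"
	fmt.Println(reconstruct(tokenizeForHighlightSketch(src)) == src)
}
```

This invariant is what lets a highlighter colour tokens and still render the file byte-for-byte as the author wrote it.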
Types ¶
type Lexer ¶
type Lexer struct {
// contains filtered or unexported fields
}
Lexer tokenizes source code written in the English programming language.
func (*Lexer) Offset ¶
Offset returns the current byte position in the input. After a call to NextToken, Offset() returns the position of the first byte that has not yet been consumed — i.e. the exclusive end of the just-returned token in the source string. This is used by TokenizeForHighlight to locate the raw source bytes for each token.
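The Offset contract can be sketched with a toy lexer. This is an illustration under assumed names (the real Lexer's NextToken signature and internals are not shown here): after each NextToken call, Offset() is the exclusive end of the token just returned, so slicing the source between successive offsets recovers each token's raw bytes, including any whitespace the lexer skipped.

```go
package main

import "fmt"

// sketchLexer is a toy lexer tracking a byte position in its input.
// Names are illustrative, not the package's actual API.
type sketchLexer struct {
	src string
	pos int
}

// NextToken skips leading spaces, then consumes one space-free run and
// returns it; ok is false at end of input.
func (l *sketchLexer) NextToken() (tok string, ok bool) {
	for l.pos < len(l.src) && l.src[l.pos] == ' ' {
		l.pos++
	}
	start := l.pos
	for l.pos < len(l.src) && l.src[l.pos] != ' ' {
		l.pos++
	}
	if start == l.pos {
		return "", false
	}
	return l.src[start:l.pos], true
}

// Offset returns the position of the first byte not yet consumed,
// i.e. the exclusive end of the token just returned by NextToken.
func (l *sketchLexer) Offset() int { return l.pos }

func main() {
	l := &sketchLexer{src: "add 2 to x"}
	prev := 0
	for {
		tok, ok := l.NextToken()
		if !ok {
			break
		}
		end := l.Offset()
		// src[prev:end] is the raw span covering the token plus any
		// whitespace skipped before it, which is exactly what a
		// highlighter needs to reconstruct the source.
		fmt.Printf("%q raw=%q\n", tok, l.src[prev:end])
		prev = end
	}
}
```

This successive-offsets pattern is how a function like TokenizeForHighlight can attach verbatim source bytes to each token without the lexer itself storing them.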
func (*Lexer) TokenizeAll ¶
TokenizeAll returns all tokens from the input, skipping NEWLINE tokens so that the parser receives a flat, newline-free token stream.
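The parser-facing contract amounts to filtering NEWLINE tokens out of the stream, which can be sketched as below. The `Token` shape and the helper name are assumptions for illustration; the real TokenizeAll presumably skips newlines while lexing rather than filtering afterwards.

```go
package main

import "fmt"

// Token is an illustrative token shape (Kind, Value), not the
// package's actual type.
type Token struct {
	Kind  string
	Value string
}

// dropNewlines sketches TokenizeAll's contract: the parser receives
// the token stream with every NEWLINE token removed, leaving a flat,
// newline-free sequence.
func dropNewlines(toks []Token) []Token {
	var out []Token
	for _, t := range toks {
		if t.Kind != "NEWLINE" {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	in := []Token{
		{"WORD", "Say"},
		{"NEWLINE", "\n"},
		{"WORD", "x"},
	}
	for _, t := range dropNewlines(in) {
		fmt.Println(t.Kind, t.Value)
	}
}
```

Keeping newline handling out of the parser's token stream means statement boundaries must be expressed some other way (e.g. punctuation), while the highlighter path retains newlines because it must reproduce the source exactly.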