tokeniser

package
v1.5.0

Warning: This package is not in the latest version of its module.
Published: Mar 22, 2026 License: MIT Imports: 3 Imported by: 0

Documentation

Overview

Package tokeniser provides the shared lexer for the English programming language. Both the compiler pipeline (via the parser package) and the syntax-highlighter (via the highlight package) use this lexer so that keyword recognition, operator phrases, possessive handling, and every other tokenisation rule have a single authoritative implementation.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func TokenizeForHighlight

func TokenizeForHighlight(source string) []token.Token

TokenizeForHighlight tokenizes source and returns a token stream suitable for syntax highlighting. Unlike TokenizeAll it:

  • preserves NEWLINE tokens
  • inserts WHITESPACE tokens for the horizontal whitespace that the lexer normally discards between semantic tokens
  • sets each token's Value to the exact bytes from source so that the original text can be reconstructed verbatim (including original casing, spacing inside multi-word operators, quote characters around strings, and the leading '#' of comments)

The returned slice, when its Value fields are concatenated in order, reproduces source exactly.

Types

type Lexer

type Lexer struct {
	// contains filtered or unexported fields
}

Lexer tokenizes source code written in the English programming language.

func NewLexer

func NewLexer(input string) *Lexer

NewLexer creates a new Lexer for the given input.

func (*Lexer) NextToken

func (l *Lexer) NextToken() token.Token

NextToken returns the next token from the input.

func (*Lexer) Offset

func (l *Lexer) Offset() int

Offset returns the current byte position in the input. After a call to NextToken, Offset() returns the position of the first byte that has not yet been consumed — i.e. the exclusive end of the just-returned token in the source string. This is used by TokenizeForHighlight to locate the raw source bytes for each token.
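The exclusive-end contract, and the driver loop a caller would write around NextToken, can be shown with a minimal stand-in lexer. This sketch illustrates the documented semantics only — it is not the package's code, and it splits on single spaces rather than applying the real tokenisation rules. The caller records Offset before and after each NextToken call; the span between the two offsets is the discarded whitespace plus the token's raw bytes, which is how a highlighter can recover the exact source text.

```go
package main

import "fmt"

// miniLexer mimics the documented Lexer surface: NextToken skips leading
// spaces and consumes one word, and Offset reports the exclusive end of
// the token most recently returned.
type miniLexer struct {
	input string
	pos   int
}

func newMiniLexer(input string) *miniLexer { return &miniLexer{input: input} }

func (l *miniLexer) NextToken() string {
	for l.pos < len(l.input) && l.input[l.pos] == ' ' {
		l.pos++ // discard horizontal whitespace between semantic tokens
	}
	start := l.pos
	for l.pos < len(l.input) && l.input[l.pos] != ' ' {
		l.pos++
	}
	return l.input[start:l.pos]
}

// Offset returns the first unconsumed byte position: the exclusive end
// of the just-returned token.
func (l *miniLexer) Offset() int { return l.pos }

func main() {
	source := "set  x to 5"
	l := newMiniLexer(source)
	prev := l.Offset()
	for prev < len(source) {
		word := l.NextToken()
		end := l.Offset()
		// source[prev:end] is the skipped whitespace plus the token's
		// raw bytes - exactly what a highlighter needs to reconstruct
		// the original text.
		fmt.Printf("%-6q raw=%q\n", word, source[prev:end])
		prev = end
	}
}
```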

func (*Lexer) TokenizeAll

func (l *Lexer) TokenizeAll() []token.Token

TokenizeAll returns all tokens from the input, skipping NEWLINE tokens so that the parser receives a flat, newline-free token stream.
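The parser-facing behaviour can be sketched in miniature (again a toy, not the package's implementation): splitting on all whitespace, newlines included, yields a flat stream in which line breaks leave no trace.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenizeAll mimics the documented behaviour: newlines are treated the
// same as any other whitespace, so the parser receives a flat,
// newline-free token stream.
func tokenizeAll(source string) []string {
	return strings.Fields(source)
}

func main() {
	source := "Set x to 5.\nPrint x."
	fmt.Println(tokenizeAll(source))
	// prints [Set x to 5. Print x.] - the newline leaves no token behind
}
```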
