Documentation ¶
Overview ¶
Package tokeniser provides the shared lexer for the English programming language. Both the compiler pipeline (via the parser package) and the syntax-highlighter (via the highlight package) use this lexer so that keyword recognition, operator phrases, possessive handling, and every other tokenisation rule have a single authoritative implementation.
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func TokenizeForHighlight ¶
TokenizeForHighlight tokenizes source and returns a token stream suitable for syntax highlighting. Unlike TokenizeAll it:
- preserves NEWLINE tokens
- inserts WHITESPACE tokens for the horizontal whitespace that the lexer normally discards between semantic tokens
- sets each token's Value to the exact bytes from source so that the original text can be reconstructed verbatim (including original casing, spacing inside multi-word operators, quote characters around strings, and the leading '#' of comments)
The returned slice, when its Value fields are concatenated in order, reproduces source exactly.
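The round-trip guarantee above can be illustrated with a toy stand-in lexer. This is a minimal sketch, not the package's real implementation: the `Token` shape and the helper names are assumptions, and the toy recognizes only words, horizontal whitespace, newlines, and `#` comments. The point is the invariant: every emitted token carries its exact source bytes, so concatenating the `Value` fields reproduces the input verbatim.

```go
package main

import (
	"fmt"
	"strings"
)

// Token mirrors the shape implied by the docs: Kind names the token
// class, Value holds the exact source bytes. (Illustrative names; the
// real package's token type may differ.)
type Token struct {
	Kind  string
	Value string
}

// tokenizeForHighlightSketch is a toy stand-in for TokenizeForHighlight:
// it emits WORD, WHITESPACE, NEWLINE, and COMMENT tokens whose Values
// concatenate back to the input verbatim (comments keep their '#').
func tokenizeForHighlightSketch(src string) []Token {
	var toks []Token
	i := 0
	for i < len(src) {
		switch c := src[i]; {
		case c == '\n':
			toks = append(toks, Token{"NEWLINE", "\n"})
			i++
		case c == ' ' || c == '\t':
			j := i
			for j < len(src) && (src[j] == ' ' || src[j] == '\t') {
				j++
			}
			toks = append(toks, Token{"WHITESPACE", src[i:j]})
			i = j
		case c == '#':
			j := i
			for j < len(src) && src[j] != '\n' {
				j++
			}
			toks = append(toks, Token{"COMMENT", src[i:j]})
			i = j
		default:
			j := i
			for j < len(src) && src[j] != ' ' && src[j] != '\t' && src[j] != '\n' && src[j] != '#' {
				j++
			}
			toks = append(toks, Token{"WORD", src[i:j]})
			i = j
		}
	}
	return toks
}

// reconstruct concatenates the Value fields; per the contract, the
// result must equal the original source exactly.
func reconstruct(toks []Token) string {
	var b strings.Builder
	for _, t := range toks {
		b.WriteString(t.Value)
	}
	return b.String()
}

func main() {
	src := "Let x be 5.  # a comment\nSay x.\n"
	fmt.Println(reconstruct(tokenizeForHighlightSketch(src)) == src)
}
```

This invariant is what lets a highlighter colour tokens and still render the file byte-for-byte as the author wrote it.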
Types ¶
type Lexer ¶
type Lexer struct {
// contains filtered or unexported fields
}
Lexer tokenizes source code written in the English programming language.
func (*Lexer) Offset ¶
Offset returns the current byte position in the input. After a call to NextToken, Offset() returns the position of the first byte that has not yet been consumed — i.e. the exclusive end of the just-returned token in the source string. This is used by TokenizeForHighlight to locate the raw source bytes for each token.
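The Offset contract can be sketched with a toy lexer. This is an illustration under assumed names (the real Lexer's NextToken signature and internals are not shown here): after each NextToken call, Offset() is the exclusive end of the token just returned, so slicing the source between successive offsets recovers each token's raw bytes, including any whitespace the lexer skipped.

```go
package main

import "fmt"

// sketchLexer is a toy lexer tracking a byte position in its input.
// Names are illustrative, not the package's actual API.
type sketchLexer struct {
	src string
	pos int
}

// NextToken skips leading spaces, then consumes one space-free run and
// returns it; ok is false at end of input.
func (l *sketchLexer) NextToken() (tok string, ok bool) {
	for l.pos < len(l.src) && l.src[l.pos] == ' ' {
		l.pos++
	}
	start := l.pos
	for l.pos < len(l.src) && l.src[l.pos] != ' ' {
		l.pos++
	}
	if start == l.pos {
		return "", false
	}
	return l.src[start:l.pos], true
}

// Offset returns the position of the first byte not yet consumed,
// i.e. the exclusive end of the token just returned by NextToken.
func (l *sketchLexer) Offset() int { return l.pos }

func main() {
	l := &sketchLexer{src: "add 2 to x"}
	prev := 0
	for {
		tok, ok := l.NextToken()
		if !ok {
			break
		}
		end := l.Offset()
		// src[prev:end] is the raw span covering the token plus any
		// whitespace skipped before it, which is exactly what a
		// highlighter needs to reconstruct the source.
		fmt.Printf("%q raw=%q\n", tok, l.src[prev:end])
		prev = end
	}
}
```

This successive-offsets pattern is how a function like TokenizeForHighlight can attach verbatim source bytes to each token without the lexer itself storing them.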
func (*Lexer) TokenizeAll ¶
TokenizeAll returns all tokens from the input, skipping NEWLINE tokens so that the parser receives a flat, newline-free token stream.
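The parser-facing contract amounts to filtering NEWLINE tokens out of the stream, which can be sketched as below. The `Token` shape and the helper name are assumptions for illustration; the real TokenizeAll presumably skips newlines while lexing rather than filtering afterwards.

```go
package main

import "fmt"

// Token is an illustrative token shape (Kind, Value), not the
// package's actual type.
type Token struct {
	Kind  string
	Value string
}

// dropNewlines sketches TokenizeAll's contract: the parser receives
// the token stream with every NEWLINE token removed, leaving a flat,
// newline-free sequence.
func dropNewlines(toks []Token) []Token {
	var out []Token
	for _, t := range toks {
		if t.Kind != "NEWLINE" {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	in := []Token{
		{"WORD", "Say"},
		{"NEWLINE", "\n"},
		{"WORD", "x"},
	}
	for _, t := range dropNewlines(in) {
		fmt.Println(t.Kind, t.Value)
	}
}
```

Keeping newline handling out of the parser's token stream means statement boundaries must be expressed some other way (e.g. punctuation), while the highlighter path retains newlines because it must reproduce the source exactly.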