Documentation ¶
Overview ¶
Package tokenizer implements a generic lexical scanner for tokenizing text input.
The tokenizer breaks input text into tokens such as identifiers, numbers, strings, operators, and punctuation. It supports various number formats (integer, float, hex, octal, binary) and can be configured with optional features like comment parsing and newline handling.
Basic Usage ¶
scanner := tokenizer.NewScanner(strings.NewReader("hello world"), tokenizer.Pos{})
for {
	tok := scanner.Next()
	if tok.Kind == tokenizer.EOF {
		break
	}
	fmt.Println(tok)
}
Features ¶
The scanner supports optional features that can be enabled:
- HashComment: Enable # style single-line comments
- LineComment: Enable // style single-line comments
- BlockComment: Enable block comments
- UnderscoreToken: Emit underscores as separate tokens (for markdown parsing)
- NewlineToken: Emit newlines as separate tokens instead of whitespace
Features are combined using bitwise OR:
scanner := tokenizer.NewScanner(r, pos, tokenizer.HashComment|tokenizer.LineComment)
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewParseError ¶
NewParseError creates a parse error for the given token. The error message includes the token's value and position.
func NewPosError ¶
NewPosError creates a new error with positional information. The resulting error implements the error interface and includes file, line, and column information in its Error() output.
Types ¶
type Feature ¶
type Feature uint
Feature represents optional scanner features that can be enabled.
const (
	// HashComment enables # style comments (# comment until end of line)
	HashComment Feature = 1 << iota
	// LineComment enables // style comments (// comment until end of line)
	LineComment
	// BlockComment enables /* */ style comments (/* block comment */)
	BlockComment
	// UnderscoreToken emits underscores as separate Underscore tokens.
	// When disabled, underscores are part of identifiers (hello_world)
	UnderscoreToken
	// NewlineToken emits newlines as separate Newline tokens instead of Space
	NewlineToken
	// NumberFloatToken enables parsing of floating point numbers (1.5, 3.14e10).
	// When disabled, "1.5" is parsed as NumberInteger("1") + Punkt(".") + NumberInteger("5")
	NumberFloatToken
)
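The bit-flag pattern above can be illustrated with a self-contained sketch. The `Feature` type and constants here are local copies of the declarations shown above, and the `Has` helper is an illustrative assumption, not part of the package's API:

```go
package main

import "fmt"

// Feature mirrors the package's bit-flag type (local copy for illustration).
type Feature uint

const (
	HashComment Feature = 1 << iota // # comments
	LineComment                     // // comments
	BlockComment                    // /* */ comments
	UnderscoreToken                 // emit _ as its own token
	NewlineToken                    // emit \n as its own token
	NumberFloatToken                // parse 1.5, 3.14e10 as floats
)

// Has reports whether all bits of f are set in fs (hypothetical helper).
func (fs Feature) Has(f Feature) bool { return fs&f == f }

func main() {
	features := HashComment | LineComment
	fmt.Println(features.Has(HashComment))  // true
	fmt.Println(features.Has(BlockComment)) // false
}
```

Because each constant occupies its own bit, any combination of features can be stored in a single `Feature` value and tested independently.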
type Pos ¶
type Pos struct {
// Path is an optional pointer to the source file path
Path *string
// Line is the zero-indexed line number
Line uint
// Col is the zero-indexed column number
Col uint
// contains filtered or unexported fields
}
Pos represents a position in source text, typically within a file. Line and Col are zero-indexed internally; add 1 when displaying to users. The Path field is optional and can be nil if the source has no associated file.
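The zero-indexed convention matters when printing positions. The sketch below uses a local copy of the documented fields (the real struct also contains unexported fields) and a hypothetical `display` helper to show the add-1 rule:

```go
package main

import "fmt"

// Pos mirrors the documented fields (local copy for illustration;
// the real struct contains additional unexported fields).
type Pos struct {
	Path *string
	Line uint // zero-indexed
	Col  uint // zero-indexed
}

// display renders a position one-indexed, as users expect to read it.
// "<input>" as the fallback for a nil Path is an assumption.
func display(p Pos) string {
	file := "<input>"
	if p.Path != nil {
		file = *p.Path
	}
	return fmt.Sprintf("%s:%d:%d", file, p.Line+1, p.Col+1)
}

func main() {
	path := "main.md"
	fmt.Println(display(Pos{Path: &path, Line: 0, Col: 4})) // main.md:1:5
	fmt.Println(display(Pos{Line: 2, Col: 0}))              // <input>:3:1
}
```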
type PosError ¶
type PosError struct {
// Err is the underlying error
Err error
// Pos indicates where in the source the error occurred
Pos Pos
}
PosError wraps an error with positional information from the source. This allows error messages to include file, line, and column information.
type Scanner ¶
type Scanner struct {
// contains filtered or unexported fields
}
Scanner represents a lexical scanner.
func NewScanner ¶
NewScanner returns a new instance of Scanner with optional features. Features can be combined using bitwise OR: HashComment|LineComment|BlockComment
func (*Scanner) NewError ¶
NewError wraps the given error with the scanner's current position. This is useful for creating error messages that include file, line, and column information indicating where the error occurred.
func (*Scanner) Next ¶
Next returns the next token and advances the scanner position. If the scanner has reached EOF, subsequent calls continue to return EOF. Use Peak() instead if you need to look ahead without consuming the token.
func (*Scanner) Peak ¶
Peak returns the next token without advancing the scanner position. This allows looking ahead at upcoming tokens without consuming them. If the scanner has reached EOF, subsequent calls continue to return EOF. Note: The token is buffered, so multiple Peak() calls return the same token.
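The single-token buffer behind this behavior can be sketched generically. The types below are simplified stand-ins, not the package's real Scanner; they only demonstrate why repeated Peak() calls return the same token until Next() consumes it:

```go
package main

import "fmt"

// lookahead demonstrates a one-token buffer. Tokens are plain strings here
// and the input is a pre-split slice, purely for illustration.
type lookahead struct {
	src []string // pretend token stream
	i   int
	buf *string // token buffered by a previous Peak
}

// Peak returns the upcoming token without consuming it.
func (l *lookahead) Peak() string {
	if l.buf == nil {
		t := l.scan()
		l.buf = &t
	}
	return *l.buf
}

// Next returns the buffered token if present, otherwise scans a fresh one.
func (l *lookahead) Next() string {
	if l.buf != nil {
		t := *l.buf
		l.buf = nil
		return t
	}
	return l.scan()
}

func (l *lookahead) scan() string {
	if l.i >= len(l.src) {
		return "EOF"
	}
	t := l.src[l.i]
	l.i++
	return t
}

func main() {
	s := &lookahead{src: []string{"hello", "world"}}
	fmt.Println(s.Peak()) // hello
	fmt.Println(s.Peak()) // hello (buffered, not consumed)
	fmt.Println(s.Next()) // hello (now consumed)
	fmt.Println(s.Next()) // world
	fmt.Println(s.Next()) // EOF
}
```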
type Token ¶
type Token struct {
// Kind identifies the type of token (e.g., Ident, String, NumberInteger)
Kind TokenKind
// Val contains the literal text of the token
Val string
// Pos indicates where in the source the token was found
Pos Pos
}
Token represents a lexical token produced by the scanner. It contains the token's kind, its literal value as a string, and the position in the source where it was found.
type TokenKind ¶
type TokenKind uint
TokenKind classifies the type of a token produced by the scanner. Each token has a kind that identifies what type of lexical element it represents, such as an identifier, number, string, operator, or punctuation.
const (
	Any TokenKind = iota
	String
	Expr
	Space
	Ident
	NumberInteger
	NumberFloat
	NumberOctal
	NumberHex
	NumberBinary
	Punkt
	Question
	Colon
	SemiColon
	Comma
	OpenParen
	CloseParen
	OpenSquare
	CloseSquare
	OpenBrace
	CloseBrace
	Ampersand
	Equal
	Less
	Greater
	Plus
	Minus
	Multiply
	Divide
	Not
	Backtick
	Tilde
	Pipe
	Backslash
	Underscore
	Hash
	At
	Caret
	Percent
	Dollar
	True
	False
	Null
	Comment
	Newline
	EOF

	Lowest = Equal // Lowest precedence
)
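Consumers typically switch on Kind when processing tokens. The sketch below uses a local subset of the constants above (the `describe` helper is illustrative, not part of the package):

```go
package main

import "fmt"

// TokenKind, Token, and the constants below are a local subset mirroring
// the package's declarations, for illustration only.
type TokenKind uint

const (
	Ident TokenKind = iota
	NumberInteger
	NumberFloat
	Space
	EOF
)

type Token struct {
	Kind TokenKind
	Val  string
}

// describe shows the usual consumer pattern: switch on Kind.
func describe(t Token) string {
	switch t.Kind {
	case Ident:
		return "identifier " + t.Val
	case NumberInteger, NumberFloat:
		return "number " + t.Val
	case Space:
		return "whitespace"
	case EOF:
		return "end of input"
	default:
		return "other: " + t.Val
	}
}

func main() {
	fmt.Println(describe(Token{Kind: Ident, Val: "hello"}))      // identifier hello
	fmt.Println(describe(Token{Kind: NumberFloat, Val: "3.14"})) // number 3.14
	fmt.Println(describe(Token{Kind: EOF}))                      // end of input
}
```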
Directories ¶

| Path | Synopsis |
|---|---|
| pkg | |
| ast | Package ast defines the abstract syntax tree node types used by parsers. |
| markdown | Package markdown provides a parser for converting Markdown text into an AST. |
| markdown/html | Package html provides an HTML renderer for Markdown AST nodes. |