scan

package scan, version v0.1.0
Warning

This package is not in the latest version of its module.

Published: May 4, 2026 License: BSD-3-Clause Imports: 7 Imported by: 0

README

Scanner

A scanner takes a source string as input and returns a slice of tokens.

Data flow: source text --> scanner --> tokens --> parser --> AST

Tokens can be of the following kinds:

  • identifier
  • number
  • operator
  • separator
  • string
  • block
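
To make the token kinds above concrete, here is a minimal, self-contained sketch of a scanner turning a string into identifier, number, operator and separator tokens. It is not this package's implementation: `Kind`, `Tok`, `scan` and the operator character set are hypothetical, input is assumed to be ASCII, and strings and blocks are left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// Kind is a hypothetical token kind; the real package uses lang.Token instead.
type Kind int

const (
	Ident Kind = iota
	Number
	Operator
	Separator
)

// Tok is a hypothetical token: kind, byte offset in the source, and text.
type Tok struct {
	Kind Kind
	Pos  int
	Str  string
}

func isLetter(c byte) bool { return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c == '_' }
func isDigit(c byte) bool  { return c >= '0' && c <= '9' }

// isOp uses an assumed set of operator characters.
func isOp(c byte) bool { return strings.IndexByte("+-*/=<>:", c) >= 0 }

// scan splits ASCII src into tokens, skipping blanks and newlines.
// Consecutive operator characters are concatenated into one token (e.g. ":=").
func scan(src string) ([]Tok, error) {
	var toks []Tok
	for i := 0; i < len(src); {
		start, c := i, src[i]
		switch {
		case c == ' ' || c == '\t' || c == '\n':
			i++
		case isLetter(c):
			for i < len(src) && (isLetter(src[i]) || isDigit(src[i])) {
				i++
			}
			toks = append(toks, Tok{Ident, start, src[start:i]})
		case isDigit(c):
			for i < len(src) && isDigit(src[i]) {
				i++
			}
			toks = append(toks, Tok{Number, start, src[start:i]})
		case isOp(c):
			for i < len(src) && isOp(src[i]) {
				i++
			}
			toks = append(toks, Tok{Operator, start, src[start:i]})
		case c == ',' || c == ';':
			i++
			toks = append(toks, Tok{Separator, start, src[start:i]})
		default:
			return toks, fmt.Errorf("illegal token %q at %d", c, i)
		}
	}
	return toks, nil
}

func main() {
	toks, err := scan("x1 := 42, y")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, t := range toks {
		fmt.Printf("%d %d %q\n", t.Kind, t.Pos, t.Str)
	}
}
```

Each token keeps its byte offset, which is what later lets a position be reported back to the user.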

Resolving nested blocks in the scanner keeps the parser simple and generic, without resorting to parse tables.
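
The sketch below illustrates resolving nesting in the scanner: a whole balanced block, nested blocks included, is measured as a single token. The delimiter table, `blockLen` and `errBlock` are assumptions made for illustration (`errBlock` only mirrors the message of `ErrBlock`), and strings inside blocks are not handled.

```go
package main

import (
	"errors"
	"fmt"
)

// errBlock is a hypothetical stand-in mirroring the message of ErrBlock.
var errBlock = errors.New("block not terminated")

// blockPairs is an assumed delimiter table: opening byte -> closing byte.
var blockPairs = map[byte]byte{'(': ')', '[': ']', '{': '}'}

// blockLen returns the byte length of the block starting at src[0],
// delimiters and nested blocks included. A stack of expected closers
// tracks nesting; a closer that does not match the innermost open block
// is treated as ordinary content.
func blockLen(src string) (int, error) {
	if len(src) == 0 {
		return 0, errBlock
	}
	if _, ok := blockPairs[src[0]]; !ok {
		return 0, fmt.Errorf("%q is not a block opener", src[0])
	}
	var stack []byte
	for i := 0; i < len(src); i++ {
		c := src[i]
		if closer, ok := blockPairs[c]; ok {
			stack = append(stack, closer)
			continue
		}
		if len(stack) > 0 && c == stack[len(stack)-1] {
			stack = stack[:len(stack)-1]
			if len(stack) == 0 {
				return i + 1, nil // outermost block closed
			}
		}
	}
	return 0, errBlock
}

func main() {
	src := "(a (b [c]) d) + e"
	n, err := blockLen(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("block token: %q\n", src[:n]) // block token: "(a (b [c]) d)"
}
```

Since the scanner hands over `(a (b [c]) d)` as one block token, the parser can, for example, rescan the block content recursively instead of tracking bracket depth itself.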

The lexical rules are provided by a per-language specification, which includes:

  • a set of composable properties (one per bit of an integer) for each character in the ASCII range; this is where all separators, operators and reserved keywords must be defined.
  • for each block or string, its starting and ending delimiters.
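
A minimal sketch of such a specification, assuming one bit per property and a string-keyed delimiter table. The property names (`CharIdent`, `CharOp`, ...), `Spec` and `newSpec` are hypothetical and do not reproduce `lang.Spec`. Properties compose with `|` and are tested with `&`, so a digit can carry both a number and an identifier property, and `/` can be an operator while also starting the `/*` delimiter.

```go
package main

import "fmt"

// CharProp is a hypothetical per-character property set: one property per bit.
type CharProp uint

const (
	CharIdent CharProp = 1 << iota // may appear in an identifier
	CharNum                        // may appear in a number
	CharOp                         // operator character
	CharSep                        // separator character
	CharStr                        // string delimiter
	CharBlock                      // block delimiter
)

// Spec is a hypothetical, minimal language specification.
type Spec struct {
	Props  [128]CharProp     // properties of each ASCII character
	Blocks map[string]string // start delimiter -> end delimiter (blocks and strings)
}

func newSpec() *Spec {
	s := &Spec{Blocks: map[string]string{"(": ")", "{": "}", `"`: `"`, "/*": "*/"}}
	for c := 'a'; c <= 'z'; c++ {
		s.Props[c] |= CharIdent
	}
	for c := 'A'; c <= 'Z'; c++ {
		s.Props[c] |= CharIdent
	}
	s.Props['_'] |= CharIdent
	for c := '0'; c <= '9'; c++ {
		s.Props[c] |= CharNum | CharIdent // digits also continue identifiers
	}
	for _, c := range "+-*/=<>" {
		s.Props[c] |= CharOp
	}
	for _, c := range ",;" {
		s.Props[c] |= CharSep
	}
	s.Props['"'] |= CharStr
	for _, c := range "(){}" {
		s.Props[c] |= CharBlock
	}
	return s
}

// Has reports whether ASCII character c has every property in p.
func (s *Spec) Has(c byte, p CharProp) bool {
	return c < 128 && s.Props[c]&p == p
}

func main() {
	s := newSpec()
	fmt.Println(s.Has('7', CharNum|CharIdent)) // true
	fmt.Println(s.Has('+', CharIdent))         // false
	fmt.Println(s.Blocks["/*"])                // */
}
```

A fixed 128-entry table makes each character lookup a single array index, and testing several properties at once is a single mask operation.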

Development status

An item is marked as done only once a passing test covers it.

  • numbers starting with a digit
  • numbers starting otherwise
  • unescaped strings (including multiline)
  • escaped strings (including multiline)
  • separators (in UTF-8 range)
  • single-line strings (\n not allowed)
  • identifiers (in UTF-8 range)
  • operators, concatenated or not
  • single-character block/string delimiters
  • arbitrarily nested blocks and strings
  • multi-character block/string delimiters
  • automatic semicolon insertion after newline
  • blocks delimited by operator characters
  • blocks delimited by identifiers
  • blocks with delimiter inclusion/exclusion rules
  • blocks delimited by indentation level (python, yaml, ...)

Documentation

Overview

Package scan provides a language-independent scanner.

Index

Constants

This section is empty.

Variables

var (
	ErrBlock   = errors.New("block not terminated")
	ErrIllegal = errors.New("illegal token")
)

Error definitions.

Functions

This section is empty.

Types

type Scanner

type Scanner struct {
	*lang.Spec
	Sources Sources // source position registry (multi-file / REPL)
	PosBase int     // base offset for current source
	// contains filtered or unexported fields
}

Scanner contains the scanner rules for a language.

func NewScanner

func NewScanner(spec *lang.Spec) *Scanner

NewScanner returns a new scanner for a given language specification.

func (*Scanner) Next

func (sc *Scanner) Next(src string) (tok Token, err error)

Next returns the next token in src.

func (*Scanner) Scan

func (sc *Scanner) Scan(src string, semiEOF bool) (tokens []Token, err error)

Scan performs a lexical analysis on src and returns tokens or an error.

type Source

type Source struct {
	Name string
	Base int // base byte offset in the unified position space
	Len  int // length in bytes
	// contains filtered or unexported fields
}

Source describes a source text.

type Sources

type Sources []Source

Sources is an ordered list of Source entries.

func (*Sources) Add

func (ss *Sources) Add(name, src string) int

Add registers a new source and returns its base offset.

func (Sources) FormatPos

func (ss Sources) FormatPos(pos int) string

FormatPos converts a global byte offset to a "[file:]line:col" string.

func (Sources) Resolve

func (ss Sources) Resolve(pos int) (name string, line, col int)

Resolve converts a global byte offset to (source name, line, col). Returns ("", 0, 0) if pos is out of range.
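
The `Sources` methods describe one global offset space shared by several sources (files, or successive REPL inputs). Below is a hedged sketch of how `Add`, `Resolve` and `FormatPos` could work. The lowercase `source` and `sources` types are hypothetical stand-ins, and the extra offset after each source, 1-based lines and byte-based columns are assumptions, not documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// source is a hypothetical stand-in for Source.
type source struct {
	Name string
	Base int // first global offset of this source
	Len  int // length in bytes
	text string
}

type sources []source

// Add appends src and returns its base offset. Sources are laid out back to
// back with one extra offset after each, so that a source's end-of-file
// position does not collide with the next source's first byte.
func (ss *sources) Add(name, src string) int {
	base := 0
	if n := len(*ss); n > 0 {
		last := (*ss)[n-1]
		base = last.Base + last.Len + 1
	}
	*ss = append(*ss, source{Name: name, Base: base, Len: len(src), text: src})
	return base
}

// Resolve maps a global offset to (name, line, col), 1-based, with columns
// counted in bytes. It returns ("", 0, 0) if pos lies outside every source.
func (ss sources) Resolve(pos int) (string, int, int) {
	for _, s := range ss {
		if pos < s.Base || pos > s.Base+s.Len {
			continue
		}
		before := s.text[:pos-s.Base]
		line := strings.Count(before, "\n") + 1
		col := len(before) - strings.LastIndexByte(before, '\n')
		return s.Name, line, col
	}
	return "", 0, 0
}

// FormatPos renders pos as "file:line:col", or "line:col" for an unnamed source.
func (ss sources) FormatPos(pos int) string {
	name, line, col := ss.Resolve(pos)
	if name == "" {
		return fmt.Sprintf("%d:%d", line, col)
	}
	return fmt.Sprintf("%s:%d:%d", name, line, col)
}

func main() {
	var ss sources
	ss.Add("a.go", "x := 1\ny := 2\n") // base 0, length 14
	b := ss.Add("", "z := 3")          // unnamed REPL input, base 15
	fmt.Println(ss.FormatPos(7))       // a.go:2:1
	fmt.Println(ss.FormatPos(b + 5))   // 1:6
}
```

Keeping a single `int` per token, as `Token.Pos` does, stays cheap while still letting a diagnostic name the file; the Go standard library's `go/token.FileSet` uses a similar base-offset scheme.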

type Token

type Token struct {
	Tok lang.Token // token identifier
	Pos int        // position in source
	Str string     // string in source
	Beg int        // length of begin delimiter (block, string)
	End int        // length of end delimiter (block, string)
}

Token defines a scanner token.

func (*Token) Block

func (t *Token) Block() string

Block returns the block content of t.

func (*Token) Name

func (t *Token) Name() string

Name returns the name of t (a short string for debugging).

func (*Token) Prefix

func (t *Token) Prefix() string

Prefix returns the block starting delimiter of t.

func (*Token) String

func (t *Token) String() string
