Documentation ¶
Overview ¶
Package token provides tokenization support for Tony and related formats.
Tokenize tokenizes a byte slice into a sequence of tokens.
Balance discovers tree structure from indentation and normalizes the token sequence so that it is context-free.
Index ¶
- Constants
- Variables
- func ExpectedErr(what string, p *Pos) error
- func KPathQuoteField(v string) bool
- func LeadingZeroErr(pos *Pos) error
- func NeedsQuote(v string) bool
- func PrintTokens(toks []Token, msg string)
- func Quote(v string, autoSingle bool) string
- func QuotedToString(d []byte) string
- func UnexpectedErr(what string, p *Pos) error
- func Unquote(v string) (string, error)
- type ErrImbalancedStructure
- type NodeOffsetCallback
- type Pos
- type PosDoc
- type Token
- type TokenOpt
- type TokenSink
- type TokenSource
- type TokenType
- type TokenizeErr
- type Tokenizer
Constants ¶
const (
	MLitChomp = '-'
	MLitKeep  = '+'
)
const (
	TIndent = iota
	TInteger
	TFloat
	TColon
	TArrayElt
	TDocSep
	TComment
	TNull
	TTrue
	TFalse
	TTag
	TString
	TMString
	TLiteral
	TMLit
	TMergeKey
	TLCurl
	TRCurl
	TLSquare
	TRSquare
	TComma
)
Variables ¶
var (
	ErrBadUTF8           = errors.New("bad utf8")
	ErrUnterminated      = errors.New("unterminated")
	ErrNumberLeadingZero = errors.New("leading zero")
	ErrNoIndent          = errors.New("indentation needed")
	ErrDocBalance        = errors.New("imbalanced document")
	ErrLiteral           = errors.New("bad literal")
	ErrBadEscape         = errors.New("bad escape")
	ErrBadUnicode        = errors.New("bad unicode")
	ErrUnicodeControl    = errors.New("unicode control")
	ErrMalformedMLit     = errors.New("malformed multiline literal")
	ErrColonSpace        = errors.New("colon should be followed by space")
	ErrEmptyDoc          = errors.New("empty document")
	ErrMultilineString   = errors.New("multiline string")
	ErrYAMLDoubleQuote   = errors.New("yaml double quote")
	ErrMLitPlacement     = errors.New("bad placement of |")
	ErrYAMLPlain         = errors.New("yaml plain string")
	ErrUnsupported       = errors.New("unsupported")
	ErrNumber            = errors.New("number")
)
Functions ¶
func ExpectedErr ¶
func ExpectedErr(what string, p *Pos) error
func KPathQuoteField ¶ added in v0.0.10
func KPathQuoteField(v string) bool
KPathQuoteField returns true if a field name needs to be quoted in a kinded path. A field needs quoting if:
- It contains characters that require quoting according to NeedsQuote (spaces, special characters)
- It contains any of the path syntax characters: ".", "[", "{"
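For illustration, a sketch of using the result while assembling a kinded path by hand; the field names are invented for the example, and only the documented KPathQuoteField and Quote signatures are used:

fields := []string{"name", "with space", "a.b", "items[0]"}
for _, f := range fields {
	if KPathQuoteField(f) {
		fmt.Println(Quote(f, false)) // needs quoting, e.g. "with space" or "a.b"
	} else {
		fmt.Println(f) // safe to use unquoted, e.g. "name"
	}
}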
func LeadingZeroErr ¶
func LeadingZeroErr(pos *Pos) error
func NeedsQuote ¶
func NeedsQuote(v string) bool
func PrintTokens ¶
func PrintTokens(toks []Token, msg string)
func Quote ¶
func Quote(v string, autoSingle bool) string
func QuotedToString ¶
func QuotedToString(d []byte) string
func UnexpectedErr ¶
func UnexpectedErr(what string, p *Pos) error
func Unquote ¶
func Unquote(v string) (string, error)
Types ¶
type ErrImbalancedStructure ¶
type ErrImbalancedStructure struct {
Open, Close *Token
}
func (*ErrImbalancedStructure) Error ¶
func (i *ErrImbalancedStructure) Error() string
func (*ErrImbalancedStructure) Unwrap ¶
func (i *ErrImbalancedStructure) Unwrap() error
type NodeOffsetCallback ¶ added in v0.0.10
NodeOffsetCallback is called when a node starts in the output stream. The offset is the absolute byte position where the node begins. The path is the kinded path from document root (e.g., "", "key", "key[0]", "a.b.c", "a{0}"). The token is the token that triggered the node start detection.
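A sketch of a callback that records the first start offset seen for each path; the parameter shapes are an assumption based on this description, not a quotation of the package's declaration:

// Assumed shape: absolute byte offset, kinded path, triggering token.
offsets := map[string]int64{}
onNodeStart := func(offset int64, path string, tok *Token) {
	_ = tok // the token that triggered the node start detection
	if _, seen := offsets[path]; !seen {
		offsets[path] = offset
	}
}
_ = onNodeStart // suitable for passing to NewTokenSink (see below)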
type Pos ¶
type PosDoc ¶
type PosDoc struct {
// contains filtered or unexported fields
}
func (*PosDoc) PosWithContext ¶ added in v0.0.10
PosWithContext creates a Pos with embedded context snippet. This allows Pos.String() to work without the full document. Parameters:
- absoluteOffset: absolute byte offset in the stream
- context: buffer slice containing bytes around the position
- bufferStartOffset: absolute offset where the context buffer starts
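The three arguments relate as in this sketch; every value here is invented purely to illustrate the arithmetic:

absoluteOffset := int64(1025)     // position of interest in the overall stream
context := []byte("key: value\n") // bytes captured around that position
bufferStartOffset := int64(1020)  // stream offset at which context begins
// The byte at the position of interest, recoverable from the snippet alone:
b := context[absoluteOffset-bufferStartOffset]
fmt.Printf("%c\n", b) // prints "v" in this invented example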
type TokenSink ¶ added in v0.0.10
type TokenSink struct {
// contains filtered or unexported fields
}
TokenSink provides streaming token encoding to an io.Writer. It tracks absolute byte offsets and calls a callback when nodes start.
func NewTokenSink ¶ added in v0.0.10
func NewTokenSink(w io.Writer, onNodeStart NodeOffsetCallback) *TokenSink
NewTokenSink creates a new TokenSink writing to w. If onNodeStart is provided, it will be called whenever a node starts.
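A minimal construction sketch, assuming bytes and fmt are imported; the callback's parameter shape is the same assumption as above, and feeding tokens into the sink is omitted because those methods are not listed here:

var out bytes.Buffer
sink := NewTokenSink(&out, func(offset int64, path string, tok *Token) {
	fmt.Printf("node %q starts at byte %d\n", path, offset)
})
_ = sink // tokens would then be written through the sink's encoding methods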
type TokenSource ¶ added in v0.0.10
type TokenSource struct {
// contains filtered or unexported fields
}
TokenSource provides streaming tokenization from an io.Reader. It maintains internal state and buffers data as needed.
func NewTokenSource ¶ added in v0.0.10
func NewTokenSource(r io.Reader, opts ...TokenOpt) *TokenSource
NewTokenSource creates a new TokenSource reading from r.
func (*TokenSource) CurrentPath ¶ added in v0.0.10
func (ts *TokenSource) CurrentPath() string
CurrentPath returns the current kinded path from the root (e.g., "", "key", "key[0]", "a.b"). Only bracketed structures are tracked (objects delimited with {} and arrays delimited with []); block-style arrays and objects are not.
func (*TokenSource) Depth ¶ added in v0.0.10
func (ts *TokenSource) Depth() int
Depth returns the current bracket nesting depth.
func (*TokenSource) Read ¶ added in v0.0.10
func (ts *TokenSource) Read() ([]Token, error)
Read reads tokens from the stream. It reads input until:
- A complete token (or tokens) is found
- EOF is reached
- An error occurs
When EOF is reached, Read will return any remaining tokens and then return (nil, io.EOF) on subsequent calls.
Some constructs, such as multiline strings, are encoded as sequences of tokens.
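A read loop consistent with this contract, written as a sketch (the helper name is illustrative; assumes io and fmt are imported):

func readAll(r io.Reader) error {
	ts := NewTokenSource(r)
	for {
		toks, err := ts.Read()
		for _, tok := range toks {
			fmt.Printf("depth=%d path=%q tok=%v\n", ts.Depth(), ts.CurrentPath(), tok)
		}
		if err == io.EOF {
			return nil // any remaining tokens were returned before EOF
		}
		if err != nil {
			return err
		}
	}
}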
type TokenizeErr ¶
func NewTokenizeErr ¶
func NewTokenizeErr(e error, p *Pos) *TokenizeErr
func (*TokenizeErr) Error ¶
func (e *TokenizeErr) Error() string
func (*TokenizeErr) Unwrap ¶
func (t *TokenizeErr) Unwrap() error
type Tokenizer ¶ added in v0.0.12
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer provides stateful tokenization with proper buffer management and trailing whitespace tracking. It supports both streaming (io.Reader) and non-streaming ([]byte) modes.
func NewTokenizer ¶ added in v0.0.12
NewTokenizer creates a new Tokenizer for streaming mode (from io.Reader).
func NewTokenizerFromBytes ¶ added in v0.0.12
NewTokenizerFromBytes creates a new Tokenizer for non-streaming mode (from []byte).
func (*Tokenizer) Read ¶ added in v0.0.12
Read reads the next chunk of data from the source. In streaming mode it reads from the io.Reader and accumulates trailing whitespace; in non-streaming mode it returns the remaining bytes of the document.
Returns:
- data: bytes read (with trailing whitespace from previous read prepended if any)
- startOffset: absolute offset where this data starts in the stream
- err: io.EOF when no more data, or other error
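A sketch of driving Read in streaming mode. NewTokenizer's exact signature is not shown above, so the construction call is an assumption, as are the return types used here:

func drain(r io.Reader) error {
	t := NewTokenizer(r) // assumed: the constructor takes the reader
	for {
		data, startOffset, err := t.Read()
		if len(data) > 0 {
			fmt.Printf("got %d bytes starting at stream offset %d\n", len(data), startOffset)
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}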
func (*Tokenizer) TokenizeOne ¶ added in v0.0.12
func (t *Tokenizer) TokenizeOne(data []byte, pos int, bufferStartOffset int64) ([]Token, int, error)
TokenizeOne tokenizes one or more tokens from a buffer slice. This is the core tokenization logic, adapted to use Tokenizer's state and lineStartOffset for comment prefix calculation (no recentBuf/docPrefix fallback).
Parameters:
- data: buffer slice to tokenize from (may be partial document)
- pos: current offset within buffer (relative offset, 0-based)
- bufferStartOffset: absolute offset where buffer starts in stream (for PosDoc and lineStartOffset calculation)
Returns:
- tokens: slice of tokens found (empty slice for whitespace)
- consumed: number of bytes consumed from buffer
- error: any error encountered, or io.EOF if more buffered input is needed
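A sketch of the non-streaming path that tokenizes a whole in-memory document. NewTokenizerFromBytes' exact signature is not shown above and is assumed here; TokenizeOne is called with its documented parameters:

func tokenizeDoc(doc []byte) ([]Token, error) {
	t := NewTokenizerFromBytes(doc) // assumed: the constructor takes the document bytes
	data, startOffset, err := t.Read()
	if err != nil && err != io.EOF {
		return nil, err
	}
	var out []Token
	pos := 0
	for pos < len(data) {
		toks, consumed, err := t.TokenizeOne(data, pos, startOffset)
		if err == io.EOF {
			break // more input needed; with the whole document in hand this is the end
		}
		if err != nil {
			return nil, err
		}
		out = append(out, toks...)
		pos += consumed
		if consumed == 0 && len(toks) == 0 {
			break // defensive: avoid spinning if nothing was consumed
		}
	}
	return out, nil
}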