tokenizer

package
v1.5.0 Latest
Warning

This package is not in the latest version of its module.

Published: Nov 15, 2025 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Package tokenizer provides a high-performance SQL tokenizer with zero-copy operations


Constants

const (
	// MaxInputSize is the maximum allowed input size in bytes (10MB)
	// This prevents DoS attacks via extremely large SQL queries
	MaxInputSize = 10 * 1024 * 1024 // 10MB

	// MaxTokens is the maximum number of tokens allowed in a single SQL query
	// This prevents DoS attacks via token explosion
	MaxTokens = 1000000 // 1M tokens
)

Variables

This section is empty.

Functions

func PutTokenizer

func PutTokenizer(t *Tokenizer)

PutTokenizer returns a Tokenizer to the pool so that a later GetTokenizer call can reuse it.
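GetTokenizer and PutTokenizer follow the usual object-pool pattern. Below is a simplified sketch of that pattern built on the standard library's `sync.Pool`. The `miniTokenizer` type and the lowercase `getTokenizer`/`putTokenizer` helpers are hypothetical stand-ins, and the assumption that the tokenizer is reset before it goes back into the pool is ours, not something this page states.

```go
package main

import "sync"

// miniTokenizer is a hypothetical stand-in for Tokenizer; it only holds
// the kind of per-call state a pooled tokenizer would need to clear.
type miniTokenizer struct {
	input []byte
	line  int
}

// Reset clears per-call state so the value can be reused safely.
func (tk *miniTokenizer) Reset() {
	tk.input = nil
	tk.line = 1
}

var tokenizerPool = sync.Pool{
	New: func() any { return &miniTokenizer{line: 1} },
}

// getTokenizer takes a tokenizer from the pool, allocating one if the pool is empty.
func getTokenizer() *miniTokenizer {
	return tokenizerPool.Get().(*miniTokenizer)
}

// putTokenizer resets the tokenizer and returns it to the pool; nil is ignored.
func putTokenizer(tk *miniTokenizer) {
	if tk == nil {
		return
	}
	tk.Reset()
	tokenizerPool.Put(tk)
}
```

With the real API, the matching call-site pattern is `tkz := tokenizer.GetTokenizer()` followed by `defer tokenizer.PutTokenizer(tkz)`. A tokenizer must not be used after it has been returned to the pool.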

Types

type BufferPool

type BufferPool struct {
	// contains filtered or unexported fields
}

BufferPool manages a pool of reusable byte buffers for token content

func NewBufferPool

func NewBufferPool() *BufferPool

NewBufferPool creates a new buffer pool with optimized initial capacity

func (*BufferPool) Get

func (p *BufferPool) Get() []byte

Get retrieves a buffer from the pool

func (*BufferPool) Grow

func (p *BufferPool) Grow(buf []byte, n int) []byte

Grow ensures buf has enough capacity for n bytes and returns the buffer, reallocated if necessary.

func (*BufferPool) Put

func (p *BufferPool) Put(buf []byte)

Put returns a buffer to the pool
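The Get/Grow/Put trio can be illustrated with a small pool of byte slices. This is a sketch, not the package's implementation: the `bufferPool` type, the 128-byte initial capacity, and the reading of Grow as "make room for n more bytes" are all assumptions.

```go
package main

import "sync"

// bufferPool is a hypothetical sketch of a pool of reusable byte slices.
type bufferPool struct {
	pool sync.Pool
}

// newBufferPool seeds new buffers with an assumed initial capacity of 128 bytes.
func newBufferPool() *bufferPool {
	return &bufferPool{pool: sync.Pool{
		New: func() any {
			b := make([]byte, 0, 128)
			return &b
		},
	}}
}

// Get returns an empty buffer, possibly reusing an earlier allocation.
func (p *bufferPool) Get() []byte {
	return (*p.pool.Get().(*[]byte))[:0]
}

// Put truncates the buffer to length zero and hands it back for reuse.
func (p *bufferPool) Put(buf []byte) {
	buf = buf[:0]
	p.pool.Put(&buf)
}

// Grow returns a buffer with room for at least n more bytes, copying the
// existing contents only when the current capacity is insufficient.
func (p *bufferPool) Grow(buf []byte, n int) []byte {
	if cap(buf)-len(buf) >= n {
		return buf
	}
	grown := make([]byte, len(buf), 2*cap(buf)+n)
	copy(grown, buf)
	return grown
}
```

Because Grow may reallocate, callers must always use its return value, in the same way as with the built-in `append`.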

type DebugLogger

type DebugLogger interface {
	Debug(format string, args ...interface{})
}

DebugLogger is the interface used for debug logging; any type with a printf-style Debug method satisfies it.

type Error

type Error struct {
	Message  string
	Location models.Location
}

Error represents a tokenization error with location information

func ErrorInvalidIdentifier

func ErrorInvalidIdentifier(value string, location models.Location) *Error

ErrorInvalidIdentifier creates an error for an invalid identifier

func ErrorInvalidNumber

func ErrorInvalidNumber(value string, location models.Location) *Error

ErrorInvalidNumber creates an error for an invalid number format

func ErrorInvalidOperator

func ErrorInvalidOperator(value string, location models.Location) *Error

ErrorInvalidOperator creates an error for an invalid operator

func ErrorUnexpectedChar

func ErrorUnexpectedChar(ch byte, location models.Location) *Error

ErrorUnexpectedChar creates an error for an unexpected character

func ErrorUnterminatedString

func ErrorUnterminatedString(location models.Location) *Error

ErrorUnterminatedString creates an error for an unterminated string

func NewError

func NewError(message string, location models.Location) *Error

NewError creates a new tokenization error

func (*Error) Error

func (e *Error) Error() string

type Position

type Position struct {
	Line   int
	Index  int
	Column int
	LastNL int // byte offset of last newline
}

Position tracks the scanning cursor with optimized tracking:

  • Line is 1-based
  • Index is 0-based
  • Column is 1-based
  • LastNL tracks the byte offset of the last newline for efficient column calculation
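The point of keeping LastNL is that the column never has to be counted separately: it falls out of the distance between the current index and the last newline. A minimal sketch of that idea, assuming Index is a byte offset, LastNL starts at -1 before any newline is seen, and columns are measured in bytes. The `cursor` type and its methods are hypothetical.

```go
package main

import "unicode/utf8"

// cursor is a hypothetical sketch of the Position fields described above:
// Line and Column are 1-based, Index is a 0-based byte offset, and LastNL is
// the byte offset of the most recent newline (-1 before the first one).
type cursor struct {
	Line, Index, Column, LastNL int
}

func newCursor() cursor {
	return cursor{Line: 1, Index: 0, Column: 1, LastNL: -1}
}

// advanceRune moves past one rune occupying size bytes. Column is derived
// from LastNL rather than counted, so it is measured in bytes here.
func (c *cursor) advanceRune(r rune, size int) {
	if r == '\n' {
		c.Line++
		c.LastNL = c.Index
	}
	c.Index += size
	c.Column = c.Index - c.LastNL
}

// advanceString feeds every rune of s through advanceRune.
func (c *cursor) advanceString(s string) {
	for len(s) > 0 {
		r, size := utf8.DecodeRuneInString(s)
		c.advanceRune(r, size)
		s = s[size:]
	}
}
```

After scanning `"SELECT\n  id"` the cursor sits at line 2, column 5, since the newline is at byte offset 6 and the index has reached 11.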

func NewPosition

func NewPosition(line, index int) Position

NewPosition builds a Position from a line number and an index.

func (*Position) AdvanceN

func (p *Position) AdvanceN(n int, lineStarts []int)

AdvanceN moves the position forward by n bytes; lineStarts supplies the offsets at which lines begin.
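One way a table of line-start offsets makes a multi-byte jump cheap is binary search: the line containing an offset is found in O(log lines) time instead of by rescanning the skipped bytes. This sketch assumes lineStarts holds the byte offset at which each line begins, with 0 first; the helpers `lineStartsOf` and `locate` are hypothetical, since the actual layout of lineStarts is not documented here.

```go
package main

import "sort"

// lineStartsOf records the byte offset at which each line begins.
func lineStartsOf(input []byte) []int {
	starts := []int{0}
	for i, b := range input {
		if b == '\n' {
			starts = append(starts, i+1)
		}
	}
	return starts
}

// locate maps a 0-based byte offset to a 1-based line and column.
// It requires lineStarts[0] == 0 and offset >= 0.
func locate(offset int, lineStarts []int) (line, column int) {
	// i is the index of the first line start strictly greater than offset,
	// which equals the number of starts <= offset, i.e. the 1-based line.
	i := sort.Search(len(lineStarts), func(k int) bool { return lineStarts[k] > offset })
	line = i
	column = offset - lineStarts[i-1] + 1
	return line, column
}
```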

func (*Position) AdvanceRune

func (p *Position) AdvanceRune(r rune, size int)

AdvanceRune moves the position forward past rune r, which occupies size bytes, updating the line and column efficiently.

func (Position) Clone

func (p Position) Clone() Position

Clone returns a copy of the Position.

func (Position) Location

func (p Position) Location(t *Tokenizer) models.Location

Location returns the models.Location corresponding to this position.

type StringLiteralReader

type StringLiteralReader struct {
	// contains filtered or unexported fields
}

StringLiteralReader reads string literals, handling escape sequences correctly.

func NewStringLiteralReader

func NewStringLiteralReader(input []byte, pos *Position, quote rune) *StringLiteralReader

NewStringLiteralReader creates a new StringLiteralReader

func (*StringLiteralReader) ReadStringLiteral

func (r *StringLiteralReader) ReadStringLiteral() (models.Token, error)

ReadStringLiteral reads a single string literal, processing its escape sequences, and returns it as a token.

type Tokenizer

type Tokenizer struct {
	// contains filtered or unexported fields
}

Tokenizer provides high-performance SQL tokenization with zero-copy operations

func GetTokenizer

func GetTokenizer() *Tokenizer

GetTokenizer retrieves a Tokenizer from the pool. Return it with PutTokenizer when you are done with it.

func New

func New() (*Tokenizer, error)

New creates a new Tokenizer with default configuration

func NewWithKeywords

func NewWithKeywords(kw *keywords.Keywords) (*Tokenizer, error)

NewWithKeywords creates a Tokenizer that uses the supplied custom keyword set.

func (*Tokenizer) Reset

func (t *Tokenizer) Reset()

Reset clears a Tokenizer's internal state so that it can be reused.

func (*Tokenizer) SetDebugLogger

func (t *Tokenizer) SetDebugLogger(logger DebugLogger)

SetDebugLogger sets a debug logger for verbose tracing

func (*Tokenizer) Tokenize

func (t *Tokenizer) Tokenize(input []byte) ([]models.TokenWithSpan, error)

Tokenize scans input and returns the resulting tokens, each paired with its source span, or an error if the input cannot be tokenized.

func (*Tokenizer) TokenizeContext added in v1.5.0

func (t *Tokenizer) TokenizeContext(ctx context.Context, input []byte) ([]models.TokenWithSpan, error)

TokenizeContext processes the input and returns tokens, with support for cancellation through ctx. It checks the context at regular intervals (every 100 tokens) so that cancellation takes effect quickly, and returns context.Canceled or context.DeadlineExceeded when the context is cancelled or its deadline expires.

This method is useful for:

  • Long-running tokenization operations that need to be cancellable
  • Implementing timeouts for tokenization
  • Graceful shutdown scenarios

Example:

tkz := tokenizer.GetTokenizer()
defer tokenizer.PutTokenizer(tkz)

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

tokens, err := tkz.TokenizeContext(ctx, []byte(sql))
if errors.Is(err, context.DeadlineExceeded) {
    // Handle timeout
}

type TokenizerError

type TokenizerError struct {
	Message  string
	Location models.Location
}

TokenizerError is a simple error type that carries a message and a source location.

func (TokenizerError) Error

func (e TokenizerError) Error() string
