Documentation ¶
Overview ¶
Package tokenizer provides a high-performance SQL tokenizer with zero-copy operations.
Index ¶
- Constants
- func PutTokenizer(t *Tokenizer)
- type BufferPool
- type DebugLogger
- type Error
- func ErrorInvalidIdentifier(value string, location models.Location) *Error
- func ErrorInvalidNumber(value string, location models.Location) *Error
- func ErrorInvalidOperator(value string, location models.Location) *Error
- func ErrorUnexpectedChar(ch byte, location models.Location) *Error
- func ErrorUnterminatedString(location models.Location) *Error
- func NewError(message string, location models.Location) *Error
- type Position
- type StringLiteralReader
- type Tokenizer
- type TokenizerError
Constants ¶
const (
	// MaxInputSize is the maximum allowed input size in bytes (10MB).
	// This prevents DoS attacks via extremely large SQL queries.
	MaxInputSize = 10 * 1024 * 1024 // 10MB

	// MaxTokens is the maximum number of tokens allowed in a single SQL query.
	// This prevents DoS attacks via token explosion.
	MaxTokens = 1000000 // 1M tokens
)
Variables ¶
This section is empty.
Functions ¶
Types ¶
type BufferPool ¶
type BufferPool struct {
// contains filtered or unexported fields
}
BufferPool manages a pool of reusable byte buffers for token content
func NewBufferPool ¶
func NewBufferPool() *BufferPool
NewBufferPool creates a new buffer pool with optimized initial capacity
type DebugLogger ¶
type DebugLogger interface {
Debug(format string, args ...interface{})
}
DebugLogger is an interface for debug logging
type Error ¶
Error represents a tokenization error with location information
func ErrorInvalidIdentifier ¶
ErrorInvalidIdentifier creates an error for an invalid identifier
func ErrorInvalidNumber ¶
ErrorInvalidNumber creates an error for an invalid number format
func ErrorInvalidOperator ¶
ErrorInvalidOperator creates an error for an invalid operator
func ErrorUnexpectedChar ¶
ErrorUnexpectedChar creates an error for an unexpected character
func ErrorUnterminatedString ¶
ErrorUnterminatedString creates an error for an unterminated string
type Position ¶
Position tracks the scanning cursor with optimized tracking:
- Line is 1-based
- Index is 0-based
- Column is 1-based
- LastNL tracks the last newline for efficient column calculation
func NewPosition ¶
NewPosition builds a Position from raw info
func (*Position) AdvanceRune ¶
AdvanceRune moves the position forward by the given rune, updating line and column efficiently
type StringLiteralReader ¶
type StringLiteralReader struct {
// contains filtered or unexported fields
}
StringLiteralReader handles reading of string literals with proper escape sequence handling
func NewStringLiteralReader ¶
func NewStringLiteralReader(input []byte, pos *Position, quote rune) *StringLiteralReader
NewStringLiteralReader creates a new StringLiteralReader
func (*StringLiteralReader) ReadStringLiteral ¶
func (r *StringLiteralReader) ReadStringLiteral() (models.Token, error)
ReadStringLiteral reads a string literal with proper escape sequence handling
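To illustrate what "proper escape sequence handling" involves, here is a self-contained sketch of scanning a single-quoted SQL literal, handling doubled quotes ('') and backslash escapes and reporting an unterminated string (readSingleQuoted is a hypothetical helper; the package's actual escape rules may differ):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// readSingleQuoted scans a single-quoted literal starting at input[0].
// Doubled quotes ('') yield a literal quote; a backslash escapes the
// next byte; reaching end-of-input first is an unterminated string.
func readSingleQuoted(input string) (string, error) {
	if len(input) == 0 || input[0] != '\'' {
		return "", errors.New("expected opening quote")
	}
	var b strings.Builder
	i := 1
	for i < len(input) {
		c := input[i]
		switch {
		case c == '\'' && i+1 < len(input) && input[i+1] == '\'':
			b.WriteByte('\'') // '' escapes a quote inside the literal
			i += 2
		case c == '\'':
			return b.String(), nil // closing quote: literal complete
		case c == '\\' && i+1 < len(input):
			b.WriteByte(input[i+1]) // backslash escape, kept verbatim here
			i += 2
		default:
			b.WriteByte(c)
			i++
		}
	}
	return "", errors.New("unterminated string")
}

func main() {
	s, err := readSingleQuoted(`'it''s'`)
	fmt.Println(s, err) // it's <nil>
	_, err = readSingleQuoted(`'oops`)
	fmt.Println(err) // unterminated string
}
```

The unterminated case is what ErrorUnterminatedString above reports, with the location taken from the tokenizer's Position.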
type Tokenizer ¶
type Tokenizer struct {
// contains filtered or unexported fields
}
Tokenizer provides high-performance SQL tokenization with zero-copy operations
func NewWithKeywords ¶
NewWithKeywords initializes a Tokenizer with custom keywords
func (*Tokenizer) SetDebugLogger ¶
func (t *Tokenizer) SetDebugLogger(logger DebugLogger)
SetDebugLogger sets a debug logger for verbose tracing
func (*Tokenizer) Tokenize ¶
func (t *Tokenizer) Tokenize(input []byte) ([]models.TokenWithSpan, error)
Tokenize processes the input and returns tokens
func (*Tokenizer) TokenizeContext ¶ added in v1.5.0
func (t *Tokenizer) TokenizeContext(ctx context.Context, input []byte) ([]models.TokenWithSpan, error)
TokenizeContext processes the input and returns tokens with context support for cancellation. It checks the context at regular intervals (every 100 tokens) to enable fast cancellation. Returns context.Canceled or context.DeadlineExceeded when the context is cancelled.
This method is useful for:
- Long-running tokenization operations that need to be cancellable
- Implementing timeouts for tokenization
- Graceful shutdown scenarios
Example:
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
tokens, err := tokenizer.TokenizeContext(ctx, []byte(sql))
if errors.Is(err, context.DeadlineExceeded) {
// Handle timeout
}
type TokenizerError ¶
TokenizerError is a simple error wrapper
func (TokenizerError) Error ¶
func (e TokenizerError) Error() string