parser

package v1.6.0
Published: Dec 11, 2025 License: AGPL-3.0 Imports: 8 Imported by: 0

README

SQL Parser Package

Overview

The parser package provides a production-ready, recursive descent SQL parser that converts tokenized SQL into an Abstract Syntax Tree (AST). It supports comprehensive SQL features across multiple dialects with ~80-85% SQL-99 compliance.

Key Features

  • DML Operations: SELECT, INSERT, UPDATE, DELETE with full clause support
  • DDL Operations: CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX
  • Advanced SQL: CTEs (WITH), set operations (UNION/EXCEPT/INTERSECT), window functions
  • JOINs: All types (INNER, LEFT, RIGHT, FULL, CROSS, NATURAL) with proper left-associative parsing
  • Window Functions: PARTITION BY, ORDER BY, frame clauses (ROWS/RANGE)
  • SQL-99 F851: NULLS FIRST/LAST support in ORDER BY clauses
  • Object Pooling: Memory-efficient parser instance reuse
  • Context Support: Cancellation and timeout handling

Usage

Basic Parsing
package main

import (
    "github.com/ajitpratap0/GoSQLX/pkg/sql/parser"
    "github.com/ajitpratap0/GoSQLX/pkg/sql/token"
)

func main() {
    // Create parser from pool
    p := parser.NewParser()
    defer p.Release()  // ALWAYS release back to pool

    // Parse tokens into AST
    tokens := []token.Token{ /* your tokens */ }
    astNode, err := p.Parse(tokens)
    if err != nil {
        // Handle parsing error
    }

    // Work with the AST
    _ = astNode
}
Context-Aware Parsing
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

p := parser.NewParser()
defer p.Release()

astNode, err := p.ParseContext(ctx, tokens)
if err != nil {
    if ctx.Err() != nil {
        // Handle timeout/cancellation
    }
    // Handle parse error
}

Architecture

Core Components
  • parser.go (1,628 lines): Main parser with all parsing logic
  • alter.go (368 lines): DDL ALTER statement parsing
  • token_converter.go (~200 lines): Token type conversion utilities
Parsing Flow
Tokens → Parse() → parseStatement() → Specific statement parser → AST Node
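
For reference, a minimal end-to-end sketch of that flow, assuming the spans slice came from the GoSQLX tokenizer and that the models and ast packages are imported from their GoSQLX paths (parseSpans is a hypothetical helper name):

// parseSpans is a hypothetical helper: it converts tokenizer output
// (models.TokenWithSpan) into parser tokens and parses them into an AST.
func parseSpans(spans []models.TokenWithSpan) (*ast.AST, error) {
    tokens, err := parser.ConvertTokensForParser(spans)
    if err != nil {
        return nil, err
    }
    p := parser.NewParser()
    defer p.Release()
    return p.Parse(tokens)
}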
Recursion Protection

Maximum recursion depth: 100 levels

Protects against:

  • Deeply nested CTEs
  • Excessive subquery nesting
  • Stack overflow attacks
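
The guard itself lives in unexported parser state; conceptually it behaves like this sketch (illustrative only, not the package's actual code):

// Illustrative only: each recursive parse step checks the depth ceiling
// before descending, so malformed input fails fast instead of overflowing.
func descend(depth int) error {
    if depth > parser.MaxRecursionDepth { // exported constant, currently 100
        return fmt.Errorf("maximum recursion depth %d exceeded", parser.MaxRecursionDepth)
    }
    // ... parse the nested construct, recursing with depth+1 ...
    return nil
}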

Supported SQL Features

Phase 1 (v1.0.0) - Core DML
  • SELECT with FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET
  • All JOIN types with proper precedence
  • INSERT (single/multi-row)
  • UPDATE with SET and WHERE
  • DELETE with WHERE
Phase 2 (v1.2.0) - Advanced Features
  • Common Table Expressions (WITH clause)
  • Recursive CTEs with depth protection
  • Set operations: UNION [ALL], EXCEPT, INTERSECT
  • CTE column specifications
Phase 2.5 (v1.3.0) - Window Functions
  • Ranking: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
  • Analytic: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
  • PARTITION BY and ORDER BY
  • Frame clauses: ROWS/RANGE with bounds
Phase 2.6 (v1.5.0) - NULL Ordering
  • NULLS FIRST/LAST in ORDER BY
  • NULLS FIRST/LAST in window ORDER BY
  • Database portability for NULL ordering
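
Taken together, the phases let a single statement combine all of the above; a query shaped like this sketch touches Phase 1 grouping, a Phase 2 CTE, a Phase 2.5 window function, and Phase 2.6 NULL ordering:

// Illustrative SQL exercising features from all four phases.
const exampleSQL = `
WITH regional AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT region, total,
       RANK() OVER (ORDER BY total DESC NULLS LAST) AS rnk
FROM regional
ORDER BY rnk`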

Performance Characteristics

  • Throughput: 1.5M operations/second (peak), 1.38M sustained
  • Memory: Object pooling provides 60-80% reduction vs. new instances
  • Latency: <1μs for complex queries with window functions
  • Thread Safety: All pool operations are race-free
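
A benchmark of roughly this shape reproduces the pooled-reuse pattern behind those numbers (sketch; buildTokens is a hypothetical helper returning a prepared []token.Token):

func BenchmarkParserPooled(b *testing.B) {
    tokens := buildTokens() // hypothetical: pre-tokenized query
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        p := parser.GetParser() // pooled instance (v1.6.0+)
        if _, err := p.Parse(tokens); err != nil {
            b.Fatal(err)
        }
        parser.PutParser(p) // reset and return to the pool
    }
}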

Error Handling

astNode, err := p.Parse(tokens)
if err != nil {
    if parseErr, ok := err.(*parser.ParseError); ok {
        fmt.Printf("Parse error at token '%s': %s\n",
            parseErr.Token.Literal, parseErr.Message)
    }
}
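
On Go 1.13+, errors.As expresses the same check and also matches wrapped errors (assuming *parser.ParseError implements the error interface, as the type assertion above implies):

var parseErr *parser.ParseError
if errors.As(err, &parseErr) {
    fmt.Printf("Parse error at token '%s': %s\n",
        parseErr.Token.Literal, parseErr.Message)
}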

Testing

Run parser tests:

# All tests
go test -v ./pkg/sql/parser/

# With race detection
go test -race ./pkg/sql/parser/

# Specific features
go test -v -run TestParser_.*Window ./pkg/sql/parser/
go test -v -run TestParser_.*CTE ./pkg/sql/parser/
go test -v -run TestParser_.*Join ./pkg/sql/parser/

# Performance benchmarks
go test -bench=BenchmarkParser -benchmem ./pkg/sql/parser/

Best Practices

1. Always Use Defer
p := parser.NewParser()
defer p.Release()  // Ensures cleanup even on panic
2. Don't Store Pooled Instances
// BAD: Storing pooled object
type MyStruct struct {
    parser *Parser  // DON'T DO THIS
}

// GOOD: Get from pool when needed
func ParseSQL(tokens []token.Token) (*ast.AST, error) {
    p := parser.NewParser()
    defer p.Release()
    return p.Parse(tokens)
}
3. Use Context for Long Operations
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

p := parser.NewParser()
defer p.Release()

astNode, err := p.ParseContext(ctx, tokens)

Common Pitfalls

❌ Forgetting to Release
// BAD: Memory leak
p := parser.NewParser()
astNode, _ := p.Parse(tokens)
// p never returned to pool
✅ Correct Pattern
// GOOD: Automatic cleanup
p := parser.NewParser()
defer p.Release()
astNode, err := p.Parse(tokens)

Related Packages

  • tokenizer: Converts SQL text to tokens (input to parser)
  • ast: AST node definitions (output from parser)
  • token: Token type definitions
  • keywords: SQL keyword classification

Version History

  • v1.5.0: NULLS FIRST/LAST support (SQL-99 F851)
  • v1.4.0: Production validation complete
  • v1.3.0: Window functions (Phase 2.5)
  • v1.2.0: CTEs and set operations (Phase 2)
  • v1.0.0: Core DML and JOINs (Phase 1)

Documentation

Overview

Package parser provides a recursive descent SQL parser that converts tokens into an Abstract Syntax Tree (AST). It supports comprehensive SQL features including SELECT, INSERT, UPDATE, DELETE, DDL operations, Common Table Expressions (CTEs), set operations (UNION, EXCEPT, INTERSECT), and window functions.

Phase 2 Features (v1.2.0+):

  • Common Table Expressions (WITH clause) with recursive support
  • Set operations: UNION, UNION ALL, EXCEPT, INTERSECT
  • Multiple CTE definitions in single query
  • CTE column specifications
  • Left-associative set operation parsing
  • Integration of CTEs with set operations

Phase 2.5 Features (v1.3.0+):

  • Window functions with OVER clause support
  • PARTITION BY and ORDER BY in window specifications
  • Window frame clauses (ROWS/RANGE with bounds)
  • Ranking functions: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
  • Analytic functions: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
  • Function call parsing with parentheses and arguments
  • Integration with existing SELECT statement parsing

Constants

const MaxRecursionDepth = 100

MaxRecursionDepth defines the maximum allowed recursion depth for parsing operations. This prevents stack overflow from deeply nested expressions, CTEs, or other recursive structures.
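
For example, input nested one level past the ceiling is expected to be rejected with a depth error rather than crashing the process (sketch; the exact error text is not part of the documented API):

// Build SQL nested MaxRecursionDepth+1 levels deep; once tokenized,
// Parse should return an error instead of overflowing the stack.
depth := parser.MaxRecursionDepth + 1
query := strings.Repeat("SELECT * FROM (", depth) + "SELECT 1" + strings.Repeat(")", depth)
_ = query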

Variables

This section is empty.

Functions

func ConvertTokensForParser added in v1.4.0

func ConvertTokensForParser(tokens []models.TokenWithSpan) ([]token.Token, error)

ConvertTokensForParser is a convenience function that creates a converter and converts tokens. It maintains backward compatibility with existing CLI code.
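
Example (sketch; spans is assumed to hold tokenizer output as []models.TokenWithSpan):

tokens, err := parser.ConvertTokensForParser(spans)
if err != nil {
    // handle conversion error
}
// tokens can now be passed to (*Parser).Parse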

func PutParser added in v1.6.0

func PutParser(p *Parser)

PutParser returns a Parser instance to the pool after resetting it. This should be called after parsing is complete to enable reuse.

Types

type ConversionResult added in v1.4.0

type ConversionResult struct {
	Tokens          []token.Token
	PositionMapping []TokenPosition // Maps parser token index to original position
}

ConversionResult contains the converted tokens and any position mappings.

func ConvertTokensWithPositions added in v1.4.0

func ConvertTokensWithPositions(tokens []models.TokenWithSpan) (*ConversionResult, error)

ConvertTokensWithPositions provides both tokens and position mapping for enhanced error reporting.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser represents a SQL parser.

func GetParser added in v1.6.0

func GetParser() *Parser

GetParser returns a Parser instance from the pool. The caller must call PutParser when done to return it to the pool.
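
Example of the pooled round trip (mirrors the NewParser/Release pattern shown in the README):

p := parser.GetParser()
defer parser.PutParser(p) // return to the pool when done
astNode, err := p.Parse(tokens)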

func NewParser

func NewParser() *Parser

NewParser creates a new parser.

func (*Parser) Parse

func (p *Parser) Parse(tokens []token.Token) (*ast.AST, error)

Parse parses the tokens into an AST. It uses fast ModelType (int) comparisons for hot-path optimization.

func (*Parser) ParseContext added in v1.5.0

func (p *Parser) ParseContext(ctx context.Context, tokens []token.Token) (*ast.AST, error)

ParseContext parses the tokens into an AST with context support for cancellation. It checks the context at strategic points (every statement and expression) to enable fast cancellation. Returns context.Canceled or context.DeadlineExceeded when the context is cancelled.

This method is useful for:

  • Long-running parsing operations that need to be cancellable
  • Implementing timeouts for parsing
  • Graceful shutdown scenarios

Example:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
astNode, err := parser.ParseContext(ctx, tokens)
if err == context.DeadlineExceeded {
    // Handle timeout
}

func (*Parser) ParseWithPositions added in v1.6.0

func (p *Parser) ParseWithPositions(result *ConversionResult) (*ast.AST, error)

ParseWithPositions parses tokens with position tracking for enhanced error reporting. This method accepts a ConversionResult from the token converter, which includes both the converted tokens and their original source positions. Errors generated during parsing will include accurate line/column information.
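
Example (sketch; spans is assumed to hold tokenizer output):

result, err := parser.ConvertTokensWithPositions(spans)
if err != nil {
    // handle conversion error
}
p := parser.GetParser()
defer parser.PutParser(p)
astNode, err := p.ParseWithPositions(result)
// parse errors from this call carry original line/column information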

func (*Parser) Release

func (p *Parser) Release()

Release releases any resources held by the parser.

func (*Parser) Reset added in v1.6.0

func (p *Parser) Reset()

Reset clears the parser state for reuse from the pool.

type TokenConverter added in v1.4.0

type TokenConverter struct {
	// contains filtered or unexported fields
}

TokenConverter provides centralized, optimized token conversion from tokenizer output (models.TokenWithSpan) to parser input (token.Token).

func NewTokenConverter added in v1.4.0

func NewTokenConverter() *TokenConverter

NewTokenConverter creates an optimized token converter.

func (*TokenConverter) Convert added in v1.4.0

func (tc *TokenConverter) Convert(tokens []models.TokenWithSpan) (*ConversionResult, error)

Convert converts tokenizer tokens to parser tokens with position tracking.
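
Holding one converter and calling Convert per statement is the reuse-oriented alternative to the one-shot ConvertTokensForParser helper (sketch; sequential reuse of the converter is assumed):

tc := parser.NewTokenConverter()
for _, spans := range batches { // batches: assumed [][]models.TokenWithSpan
    result, err := tc.Convert(spans)
    if err != nil {
        // handle conversion error
        continue
    }
    // feed result into (*Parser).ParseWithPositions
    _ = result
}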

type TokenPosition added in v1.4.0

type TokenPosition struct {
	OriginalIndex int                   // Index in original token slice
	Start         models.Location       // Original start position
	End           models.Location       // Original end position
	SourceToken   *models.TokenWithSpan // Reference to original token for error reporting
}

TokenPosition maps a parser token back to its original source position.
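
For example, given a ConversionResult named result, the original span of the parser token at index i can be recovered like this (sketch; fields are printed generically since the models.Location layout is not documented here):

pos := result.PositionMapping[i]
fmt.Printf("token %d originated at %+v..%+v\n", pos.OriginalIndex, pos.Start, pos.End)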
