sqlflowparser

package
v0.0.0-...-abe9ea0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 8, 2025 License: MIT Imports: 12 Imported by: 0

README

SQL Flow Parser Package

A Go package for parsing individual MySQL SQL statements into Abstract Syntax Trees (AST), with focus on stored procedures, functions, and control flow statements. This parser provides structured representation of MySQL syntax for analysis and instrumentation.

Important: This parser handles one statement at a time. For parsing entire SQL files with multiple statements, use @pkg/sqlsplitter first to split the file into individual statements.

Features

  • Stored Procedures: Parses CREATE PROCEDURE statements with parameters and body
  • Functions: Parses CREATE FUNCTION statements with parameters, return types, and body
  • Control Flow: Supports IF, WHILE, LOOP, REPEAT, CASE statements
  • Variables: Handles DECLARE and SET statements
  • Flow Control: Parses LEAVE, ITERATE, RETURN, and SIGNAL statements
  • Generic Expressions: Simplified expression parsing for conditions and values
  • Position Tracking: Every AST node includes line number, column, and byte offset information
  • Error Handling: Provides detailed error messages with line/column information

Installation

# For single statement parsing
go get github.com/eiiches/mysql-coverage/pkg/sqlflowparser

# For multi-statement files (recommended)
go get github.com/eiiches/mysql-coverage/pkg/sqlflowparser
go get github.com/eiiches/mysql-coverage/pkg/sqlsplitter

Usage

Single Statement Parsing
package main

import (
    "fmt"
    "log"
    
    "github.com/eiiches/mysql-coverage/pkg/sqlflowparser"
)

func main() {
    // Parse a single CREATE PROCEDURE statement
    input := `CREATE PROCEDURE test_proc(IN id INT, OUT result VARCHAR(100))
BEGIN
    DECLARE temp VARCHAR(50);
    SET temp = 'Hello';
    
    IF id > 0 THEN
        SET result = CONCAT(temp, ' World');
    ELSE
        SET result = temp;
    END IF;
END`

    ast, err := sqlflowparser.Parse("", []byte(input))
    if err != nil {
        log.Fatal(err)
    }

    if proc, ok := ast.(*sqlflowparser.CreateProcedureStmt); ok {
        fmt.Printf("Procedure: %s\n", proc.Name)
        fmt.Printf("Parameters: %d\n", len(proc.Parameters))
        fmt.Printf("Body statements: %d\n", len(proc.Body))
    }
}
Multi-Statement Files (with sqlsplitter)
package main

import (
    "fmt"
    "log"
    "os"
    
    "github.com/eiiches/mysql-coverage/pkg/sqlsplitter"
    "github.com/eiiches/mysql-coverage/pkg/sqlflowparser"
)

func main() {
    // Read SQL file with multiple statements
    content, err := os.ReadFile("procedures.sql")
    if err != nil {
        log.Fatal(err)
    }

    // First, split into individual statements
    parser := sqlsplitter.NewParser(content)
    statements, err := parser.Parse()
    if err != nil {
        log.Fatal(err)
    }

    // Then parse each SQL statement into AST
    for i, stmt := range statements {
        if stmt.Type == "SQL" {
            ast, err := sqlflowparser.Parse("", []byte(stmt.Text))
            if err != nil {
                fmt.Printf("Failed to parse statement %d: %v\n", i+1, err)
                continue
            }
            
            switch s := ast.(type) {
            case *sqlflowparser.CreateProcedureStmt:
                fmt.Printf("Found procedure: %s\n", s.Name)
            case *sqlflowparser.CreateFunctionStmt:
                fmt.Printf("Found function: %s\n", s.Name)
            }
        }
    }
}

API

Functions
Parse
func Parse(filename string, b []byte, opts ...Option) (any, error)

Parses MySQL SQL input and returns an AST node. The filename parameter is used for error reporting.

Types
AST Interface
type AST interface {
    astNode()
    GetPosition() Position
}

Base interface for all AST nodes. Every AST node provides position information.

StatementAST Interface
type StatementAST interface {
    AST
    statementNode()
}

Interface for all statement nodes.

ExpressionAST Interface
type ExpressionAST interface {
    AST
    expressionNode()
}

Interface for all expression nodes.

Statement Types
CreateProcedureStmt
type CreateProcedureStmt struct {
    BaseStatement
    Name       string
    Parameters []Parameter
    Body       []StatementAST
}
CreateFunctionStmt
type CreateFunctionStmt struct {
    BaseStatement
    Name       string
    Parameters []Parameter
    ReturnType string
    Body       []StatementAST
}
IfStmt
type IfStmt struct {
    BaseStatement
    Condition string
    Then      []StatementAST
    ElseIfs   []ElseIfClause
    Else      []StatementAST
}
WhileStmt
type WhileStmt struct {
    BaseStatement
    Condition string
    Body      []StatementAST
}
Parameter
type Parameter struct {
    Name string
    Type string
    Mode string // IN, OUT, INOUT
}

Output

The parser returns different AST node types based on the input:

  • CreateProcedureStmt: For CREATE PROCEDURE statements
  • CreateFunctionStmt: For CREATE FUNCTION statements
  • IfStmt: For IF statements
  • WhileStmt: For WHILE statements
  • LoopStmt: For LOOP statements
  • RepeatStmt: For REPEAT statements
  • CaseStmt: For CASE statements
  • BeginStmt: For BEGIN...END blocks
  • DeclareStmt: For DECLARE statements
  • LeaveStmt: For LEAVE statements
  • IterateStmt: For ITERATE statements
  • ReturnStmt: For RETURN statements
  • GenericStmt: For other SQL statements (including SIGNAL, SET, SELECT, CALL, etc.)
Expression Handling

Expressions (conditions, values) are represented as strings with the original text preserved, allowing for simplified parsing while maintaining the ability to access the original expression content.

Examples

CREATE PROCEDURE
CREATE PROCEDURE test_proc(IN id INT, OUT name VARCHAR(50))
BEGIN
    DECLARE temp VARCHAR(100);
    SET temp = 'Hello';
    SELECT temp;
END

Output:

CreateProcedureStmt{
    Name: "test_proc",
    Parameters: []Parameter{
        {Name: "id", Type: "INT", Mode: "IN"},
        {Name: "name", Type: "VARCHAR(50)", Mode: "OUT"}
    },
    Body: []StatementAST{
        DeclareStmt{...},
        GenericStmt{Text: "SET temp = 'Hello'"},
        GenericStmt{Text: "SELECT temp"}
    }
}
CREATE FUNCTION
CREATE FUNCTION calc_tax(amount DECIMAL(10,2))
RETURNS DECIMAL(10,2)
BEGIN
    RETURN amount * 0.1;
END

Output:

CreateFunctionStmt{
    Name: "calc_tax",
    Parameters: []Parameter{
        {Name: "amount", Type: "DECIMAL(10,2)", Mode: ""}
    },
    ReturnType: "DECIMAL(10,2)",
    Body: []StatementAST{
        ReturnStmt{...}
    }
}
IF Statement
IF x > 0 THEN
    SET result = 'positive';
ELSEIF x < 0 THEN
    SET result = 'negative';
ELSE
    SET result = 'zero';
END IF

Output:

IfStmt{
    Condition: "x > 0",
    Then: []StatementAST{GenericStmt{Text: "SET result = 'positive'"}},
    ElseIfs: []ElseIfClause{
        {Condition: "x < 0", Then: []StatementAST{GenericStmt{Text: "SET result = 'negative'"}}}
    },
    Else: []StatementAST{GenericStmt{Text: "SET result = 'zero'"}}
}
WHILE Loop
WHILE counter < 10 DO
    SET counter = counter + 1;
    SELECT counter;
END WHILE

Output:

WhileStmt{
    Condition: "counter < 10",
    Body: []StatementAST{
        GenericStmt{Text: "SET counter = counter + 1"},
        GenericStmt{Text: "SELECT counter"}
    }
}

Testing

The package includes comprehensive unit tests using Gomega:

go test ./pkg/sqlflowparser -v

Design

This parser is built using PEG (Parsing Expression Grammar) via the pigeon parser generator. It focuses on:

  • Single statement parsing: Designed to parse one MySQL statement at a time
  • Structural parsing: Extracts procedure/function definitions and control flow
  • Statement boundaries: Identifies nested statement blocks within procedures/functions
  • Parameter extraction: Parses function/procedure signatures
  • Control flow analysis: Supports all MySQL control flow constructs

It does not provide:

  • Multi-statement parsing: Cannot handle entire SQL files with multiple statements
  • Delimiter handling: No support for DELIMITER statements or custom delimiters
  • Detailed expression parsing: Expressions are kept as text for simplicity
  • Full SQL validation: Focuses on structure rather than semantic correctness
  • Complex type analysis: Types are preserved as strings
Architecture

The typical workflow combines two packages:

  1. @pkg/sqlsplitter: First pass - splits SQL files into individual statements
  2. @pkg/sqlflowparser: Second pass - parses each statement into structured AST

This separation of concerns allows:

  • sqlsplitter to handle file-level concerns (delimiters, comments, string literals)
  • sqlflowparser to focus on statement-level syntax analysis

This makes it ideal for tools that need to analyze stored procedure structure, instrument code, or perform static analysis on MySQL stored procedures and functions.

Grammar

The parser is generated from a PEG grammar file (mysql_ast.peg) that defines the MySQL syntax patterns. The grammar focuses on the subset of MySQL relevant for stored procedures and functions.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Parse

func Parse(filename string, b []byte, opts ...Option) (any, error)

Parse parses the data from b using filename as information in the error messages.

func ParseFile

func ParseFile(filename string, opts ...Option) (i any, err error)

ParseFile parses the file identified by filename.

func ParseReader

func ParseReader(filename string, r io.Reader, opts ...Option) (any, error)

ParseReader parses the data from r using filename as information in the error messages.

Types

type AST

type AST interface {
	GetPosition() Position
	// contains filtered or unexported methods
}

AST represents the Abstract Syntax Tree for MySQL statements

type BaseStatement

type BaseStatement struct {
	Pos   Position
	Label string
	Text  string // Optional text content for statements that need it
}

BaseStatement contains common fields for all statements

func (*BaseStatement) GetLabel

func (s *BaseStatement) GetLabel() string

func (BaseStatement) GetPosition

func (s BaseStatement) GetPosition() Position

Implement GetPosition method for all AST nodes

func (*BaseStatement) SetLabel

func (s *BaseStatement) SetLabel(label string)

type BeginStmt

type BeginStmt struct {
	BaseStatement
	Body []StatementAST
}

BeginStmt represents BEGIN...END block

type CaseStmt

type CaseStmt struct {
	BaseStatement
	Expression  string
	WhenClauses []WhenClause
	Else        []StatementAST
}

CaseStmt represents CASE statement

type Cloner

type Cloner interface {
	Clone() any
}

Cloner is implemented by any value that has a Clone method, which returns a copy of the value. This is mainly used for types which are not passed by value (e.g map, slice, chan) or structs that contain such types.

This is used in conjunction with the global state feature to create proper copies of the state to allow the parser to properly restore the state in the case of backtracking.

type CreateFunctionStmt

type CreateFunctionStmt struct {
	BaseStatement
	Name       string
	Parameters []Parameter
	ReturnType string
	Body       []StatementAST
}

CreateFunctionStmt represents CREATE FUNCTION statement

type CreateProcedureStmt

type CreateProcedureStmt struct {
	BaseStatement
	Name       string
	Parameters []Parameter
	Body       []StatementAST
}

CreateProcedureStmt represents CREATE PROCEDURE statement

type DeclareStmt

type DeclareStmt struct {
	BaseStatement
}

DeclareStmt represents DECLARE statement

type ElseIfClause

type ElseIfClause struct {
	BaseStatement
	Condition string
	Then      []StatementAST
}

ElseIfClause represents ELSEIF clause

type GenericStmt

type GenericStmt struct {
	BaseStatement
}

GenericStmt represents any other SQL statement (SELECT, INSERT, etc.)

type IfStmt

type IfStmt struct {
	BaseStatement
	Condition string
	Then      []StatementAST
	ElseIfs   []ElseIfClause
	Else      []StatementAST
}

IfStmt represents IF statement

type IterateStmt

type IterateStmt struct {
	BaseStatement
}

IterateStmt represents ITERATE statement

type LeaveStmt

type LeaveStmt struct {
	BaseStatement
}

LeaveStmt represents LEAVE statement

type LoopStmt

type LoopStmt struct {
	BaseStatement
	Body []StatementAST
}

LoopStmt represents LOOP statement

type Option

type Option func(*parser) Option

Option is a function that can set an option on the parser. It returns the previous setting as an Option.

func AllowInvalidUTF8

func AllowInvalidUTF8(b bool) Option

AllowInvalidUTF8 creates an Option to allow invalid UTF-8 bytes. Every invalid UTF-8 byte is treated as a utf8.RuneError (U+FFFD) by character class matchers and is matched by the any matcher. The returned matched value, c.text and c.offset are NOT affected.

The default is false.

func Debug

func Debug(b bool) Option

Debug creates an Option to set the debug flag to b. When set to true, debugging information is printed to stdout while parsing.

The default is false.

func Entrypoint

func Entrypoint(ruleName string) Option

Entrypoint creates an Option to set the rule name to use as entrypoint. The rule name must have been specified in the -alternate-entrypoints if generating the parser with the -optimize-grammar flag, otherwise it may have been optimized out. Passing an empty string sets the entrypoint to the first rule in the grammar.

The default is to start parsing at the first rule in the grammar.

func GlobalStore

func GlobalStore(key string, value any) Option

GlobalStore creates an Option to set a key to a certain value in the globalStore.

func InitState

func InitState(key string, value any) Option

InitState creates an Option to set a key to a certain value in the global "state" store.

func MaxExpressions

func MaxExpressions(maxExprCnt uint64) Option

MaxExpressions creates an Option to stop parsing after the provided number of expressions have been parsed, if the value is 0 then the parser will parse for as many steps as needed (possibly an infinite number).

The default for maxExprCnt is 0.

func Memoize

func Memoize(b bool) Option

Memoize creates an Option to set the memoize flag to b. When set to true, the parser will cache all results so each expression is evaluated only once. This guarantees linear parsing time even for pathological cases, at the expense of more memory and slower times for typical cases.

The default is false.

func Recover

func Recover(b bool) Option

Recover creates an Option to set the recover flag to b. When set to true, this causes the parser to recover from panics and convert it to an error. Setting it to false can be useful while debugging to access the full stack trace.

The default is true.

func Statistics

func Statistics(stats *Stats, choiceNoMatch string) Option

Statistics adds a user provided Stats struct to the parser to allow the user to process the results after the parsing has finished. Also the key for the "no match" counter is set.

Example usage:

input := "input"
stats := Stats{}
_, err := Parse("input-file", []byte(input), Statistics(&stats, "no match"))
if err != nil {
    log.Panicln(err)
}
b, err := json.MarshalIndent(stats.ChoiceAltCnt, "", "  ")
if err != nil {
    log.Panicln(err)
}
fmt.Println(string(b))

type Parameter

type Parameter struct {
	Name string
	Type string
	Mode string // IN, OUT, INOUT
}

Parameter represents procedure/function parameter

type Position

type Position struct {
	Line   int // Line number (1-based)
	Column int // Column number (1-based)
	Offset int // Byte offset in the source
}

Position represents the position of a statement in the source code

type RepeatStmt

type RepeatStmt struct {
	BaseStatement
	Body      []StatementAST
	Condition string
}

RepeatStmt represents REPEAT statement

type ReturnStmt

type ReturnStmt struct {
	BaseStatement
}

ReturnStmt represents RETURN statement

type StatementAST

type StatementAST interface {
	AST

	// SetLabel sets the label for the statement
	SetLabel(label string)
	// GetLabel gets the label for the statement
	GetLabel() string
	// contains filtered or unexported methods
}

StatementAST represents any SQL statement

type Stats

type Stats struct {
	// ExprCnt counts the number of expressions processed during parsing
	// This value is compared to the maximum number of expressions allowed
	// (set by the MaxExpressions option).
	ExprCnt uint64

	// ChoiceAltCnt is used to count for each ordered choice expression,
	// which alternative is used how may times.
	// These numbers allow to optimize the order of the ordered choice expression
	// to increase the performance of the parser
	//
	// The outer key of ChoiceAltCnt is composed of the name of the rule as well
	// as the line and the column of the ordered choice.
	// The inner key of ChoiceAltCnt is the number (one-based) of the matching alternative.
	// For each alternative the number of matches are counted. If an ordered choice does not
	// match, a special counter is incremented. The name of this counter is set with
	// the parser option Statistics.
	// For an alternative to be included in ChoiceAltCnt, it has to match at least once.
	ChoiceAltCnt map[string]map[string]int
}

Stats stores some statistics, gathered during parsing

type WhenClause

type WhenClause struct {
	BaseStatement
	Condition string
	Then      []StatementAST
}

WhenClause represents WHEN clause in CASE statement

type WhileStmt

type WhileStmt struct {
	BaseStatement
	Condition string
	Body      []StatementAST
}

WhileStmt represents WHILE statement

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL