sqlflowparser

package

v0.0.0-...-abe9ea0 Latest Latest Go to latest Published: Aug 8, 2025 License: MIT Imports: 12 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/eiiches/mysql-protobuf-functions

Links

Open Source Insights

README ¶

SQL Flow Parser Package

A Go package for parsing individual MySQL SQL statements into Abstract Syntax Trees (AST), with focus on stored procedures, functions, and control flow statements. This parser provides structured representation of MySQL syntax for analysis and instrumentation.

Important: This parser handles one statement at a time. For parsing entire SQL files with multiple statements, use @pkg/sqlsplitter first to split the file into individual statements.

Features

Stored Procedures: Parses CREATE PROCEDURE statements with parameters and body
Functions: Parses CREATE FUNCTION statements with parameters, return types, and body
Control Flow: Supports IF, WHILE, LOOP, REPEAT, CASE statements
Variables: Handles DECLARE and SET statements
Flow Control: Parses LEAVE, ITERATE, RETURN, and SIGNAL statements
Generic Expressions: Simplified expression parsing for conditions and values
Position Tracking: Every AST node includes line number, column, and byte offset information
Error Handling: Provides detailed error messages with line/column information

Installation

# For single statement parsing
go get github.com/eiiches/mysql-coverage/pkg/sqlflowparser

# For multi-statement files (recommended)
go get github.com/eiiches/mysql-coverage/pkg/sqlflowparser
go get github.com/eiiches/mysql-coverage/pkg/sqlsplitter

Usage

Single Statement Parsing

package main

import (
    "fmt"
    "log"
    
    "github.com/eiiches/mysql-coverage/pkg/sqlflowparser"
)

func main() {
    // Parse a single CREATE PROCEDURE statement
    input := `CREATE PROCEDURE test_proc(IN id INT, OUT result VARCHAR(100))
BEGIN
    DECLARE temp VARCHAR(50);
    SET temp = 'Hello';
    
    IF id > 0 THEN
        SET result = CONCAT(temp, ' World');
    ELSE
        SET result = temp;
    END IF;
END`

    ast, err := sqlflowparser.Parse("", []byte(input))
    if err != nil {
        log.Fatal(err)
    }

    if proc, ok := ast.(*sqlflowparser.CreateProcedureStmt); ok {
        fmt.Printf("Procedure: %s\n", proc.Name)
        fmt.Printf("Parameters: %d\n", len(proc.Parameters))
        fmt.Printf("Body statements: %d\n", len(proc.Body))
    }
}

Multi-Statement Files (with sqlsplitter)

package main

import (
    "fmt"
    "log"
    "os"
    
    "github.com/eiiches/mysql-coverage/pkg/sqlsplitter"
    "github.com/eiiches/mysql-coverage/pkg/sqlflowparser"
)

func main() {
    // Read SQL file with multiple statements
    content, err := os.ReadFile("procedures.sql")
    if err != nil {
        log.Fatal(err)
    }

    // First, split into individual statements
    parser := sqlsplitter.NewParser(content)
    statements, err := parser.Parse()
    if err != nil {
        log.Fatal(err)
    }

    // Then parse each SQL statement into AST
    for i, stmt := range statements {
        if stmt.Type == "SQL" {
            ast, err := sqlflowparser.Parse("", []byte(stmt.Text))
            if err != nil {
                fmt.Printf("Failed to parse statement %d: %v\n", i+1, err)
                continue
            }
            
            switch s := ast.(type) {
            case *sqlflowparser.CreateProcedureStmt:
                fmt.Printf("Found procedure: %s\n", s.Name)
            case *sqlflowparser.CreateFunctionStmt:
                fmt.Printf("Found function: %s\n", s.Name)
            }
        }
    }
}

API

Functions

Parse

func Parse(filename string, b []byte, opts ...Option) (any, error)

Parses MySQL SQL input and returns an AST node. The filename parameter is used for error reporting.

Types

AST Interface

type AST interface {
    astNode()
    GetPosition() Position
}

Base interface for all AST nodes. Every AST node provides position information.

StatementAST Interface

type StatementAST interface {
    AST
    statementNode()
}

Interface for all statement nodes.

ExpressionAST Interface

type ExpressionAST interface {
    AST
    expressionNode()
}

Interface for all expression nodes.

Statement Types

CreateProcedureStmt

type CreateProcedureStmt struct {
    BaseStatement
    Name       string
    Parameters []Parameter
    Body       []StatementAST
}

CreateFunctionStmt

type CreateFunctionStmt struct {
    BaseStatement
    Name       string
    Parameters []Parameter
    ReturnType string
    Body       []StatementAST
}

IfStmt

type IfStmt struct {
    BaseStatement
    Condition string
    Then      []StatementAST
    ElseIfs   []ElseIfClause
    Else      []StatementAST
}

WhileStmt

type WhileStmt struct {
    BaseStatement
    Condition string
    Body      []StatementAST
}

Parameter

type Parameter struct {
    Name string
    Type string
    Mode string // IN, OUT, INOUT
}

Output

The parser returns different AST node types based on the input:

CreateProcedureStmt: For CREATE PROCEDURE statements
CreateFunctionStmt: For CREATE FUNCTION statements
IfStmt: For IF statements
WhileStmt: For WHILE statements
LoopStmt: For LOOP statements
RepeatStmt: For REPEAT statements
CaseStmt: For CASE statements
BeginStmt: For BEGIN...END blocks
DeclareStmt: For DECLARE statements
LeaveStmt: For LEAVE statements
IterateStmt: For ITERATE statements
ReturnStmt: For RETURN statements
GenericStmt: For other SQL statements (including SIGNAL, SET, SELECT, CALL, etc.)

Expression Handling

Expressions (conditions, values) are represented as strings with the original text preserved, allowing for simplified parsing while maintaining the ability to access the original expression content.

Examples

CREATE PROCEDURE

CREATE PROCEDURE test_proc(IN id INT, OUT name VARCHAR(50))
BEGIN
    DECLARE temp VARCHAR(100);
    SET temp = 'Hello';
    SELECT temp;
END

Output:

CreateProcedureStmt{
    Name: "test_proc",
    Parameters: []Parameter{
        {Name: "id", Type: "INT", Mode: "IN"},
        {Name: "name", Type: "VARCHAR(50)", Mode: "OUT"}
    },
    Body: []StatementAST{
        DeclareStmt{...},
        GenericStmt{Text: "SET temp = 'Hello'"},
        GenericStmt{Text: "SELECT temp"}
    }
}

CREATE FUNCTION

CREATE FUNCTION calc_tax(amount DECIMAL(10,2))
RETURNS DECIMAL(10,2)
BEGIN
    RETURN amount * 0.1;
END

Output:

CreateFunctionStmt{
    Name: "calc_tax",
    Parameters: []Parameter{
        {Name: "amount", Type: "DECIMAL(10,2)", Mode: ""}
    },
    ReturnType: "DECIMAL(10,2)",
    Body: []StatementAST{
        ReturnStmt{...}
    }
}

IF Statement

IF x > 0 THEN
    SET result = 'positive';
ELSEIF x < 0 THEN
    SET result = 'negative';
ELSE
    SET result = 'zero';
END IF

Output:

IfStmt{
    Condition: "x > 0",
    Then: []StatementAST{GenericStmt{Text: "SET result = 'positive'"}},
    ElseIfs: []ElseIfClause{
        {Condition: "x < 0", Then: []StatementAST{GenericStmt{Text: "SET result = 'negative'"}}}
    },
    Else: []StatementAST{GenericStmt{Text: "SET result = 'zero'"}}
}

WHILE Loop

WHILE counter < 10 DO
    SET counter = counter + 1;
    SELECT counter;
END WHILE

Output:

WhileStmt{
    Condition: "counter < 10",
    Body: []StatementAST{
        GenericStmt{Text: "SET counter = counter + 1"},
        GenericStmt{Text: "SELECT counter"}
    }
}

Testing

The package includes comprehensive unit tests using Gomega:

go test ./pkg/sqlflowparser -v

Design

This parser is built using PEG (Parsing Expression Grammar) via the pigeon parser generator. It focuses on:

Single statement parsing: Designed to parse one MySQL statement at a time
Structural parsing: Extracts procedure/function definitions and control flow
Statement boundaries: Identifies nested statement blocks within procedures/functions
Parameter extraction: Parses function/procedure signatures
Control flow analysis: Supports all MySQL control flow constructs

It does not provide:

Multi-statement parsing: Cannot handle entire SQL files with multiple statements
Delimiter handling: No support for DELIMITER statements or custom delimiters
Detailed expression parsing: Expressions are kept as text for simplicity
Full SQL validation: Focuses on structure rather than semantic correctness
Complex type analysis: Types are preserved as strings

Architecture

The typical workflow combines two packages:

@pkg/sqlsplitter: First pass - splits SQL files into individual statements
@pkg/sqlflowparser: Second pass - parses each statement into structured AST

This separation of concerns allows:

sqlsplitter to handle file-level concerns (delimiters, comments, string literals)
sqlflowparser to focus on statement-level syntax analysis

This makes it ideal for tools that need to analyze stored procedure structure, instrument code, or perform static analysis on MySQL stored procedures and functions.

Grammar

The parser is generated from a PEG grammar file (mysql_ast.peg) that defines the MySQL syntax patterns. The grammar focuses on the subset of MySQL relevant for stored procedures and functions.

Documentation ¶

Index ¶

func Parse(filename string, b []byte, opts ...Option) (any, error)
func ParseFile(filename string, opts ...Option) (i any, err error)
func ParseReader(filename string, r io.Reader, opts ...Option) (any, error)
type AST
type BaseStatement
- func (s *BaseStatement) GetLabel() string
- func (s BaseStatement) GetPosition() Position
- func (s *BaseStatement) SetLabel(label string)
type BeginStmt
type CaseStmt
type Cloner
type CreateFunctionStmt
type CreateProcedureStmt
type DeclareStmt
type ElseIfClause
type GenericStmt
type IfStmt
type IterateStmt
type LeaveStmt
type LoopStmt
type Option
- func AllowInvalidUTF8(b bool) Option
- func Debug(b bool) Option
- func Entrypoint(ruleName string) Option
- func GlobalStore(key string, value any) Option
- func InitState(key string, value any) Option
- func MaxExpressions(maxExprCnt uint64) Option
- func Memoize(b bool) Option
- func Recover(b bool) Option
- func Statistics(stats *Stats, choiceNoMatch string) Option
type Parameter
type Position
type RepeatStmt
type ReturnStmt
type StatementAST
type Stats
type WhenClause
type WhileStmt

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Parse ¶

func Parse(filename string, b []byte, opts ...Option) (any, error)

Parse parses the data from b using filename as information in the error messages.

func ParseFile ¶

func ParseFile(filename string, opts ...Option) (i any, err error)

ParseFile parses the file identified by filename.

func ParseReader ¶

func ParseReader(filename string, r io.Reader, opts ...Option) (any, error)

ParseReader parses the data from r using filename as information in the error messages.

Types ¶

type AST ¶

type AST interface {
	GetPosition() Position
	// contains filtered or unexported methods
}

AST represents the Abstract Syntax Tree for MySQL statements

type BaseStatement ¶

type BaseStatement struct {
	Pos   Position
	Label string
	Text  string // Optional text content for statements that need it
}

BaseStatement contains common fields for all statements

func (*BaseStatement) GetLabel ¶

func (s *BaseStatement) GetLabel() string

func (BaseStatement) GetPosition ¶

func (s BaseStatement) GetPosition() Position

Implement GetPosition method for all AST nodes

func (*BaseStatement) SetLabel ¶

func (s *BaseStatement) SetLabel(label string)

type BeginStmt ¶

type BeginStmt struct {
	BaseStatement
	Body []StatementAST
}

BeginStmt represents BEGIN...END block

type CaseStmt ¶

type CaseStmt struct {
	BaseStatement
	Expression  string
	WhenClauses []WhenClause
	Else        []StatementAST
}

CaseStmt represents CASE statement

type Cloner ¶

type Cloner interface {
	Clone() any
}

Cloner is implemented by any value that has a Clone method, which returns a copy of the value. This is mainly used for types which are not passed by value (e.g map, slice, chan) or structs that contain such types.

This is used in conjunction with the global state feature to create proper copies of the state to allow the parser to properly restore the state in the case of backtracking.

type CreateFunctionStmt ¶

type CreateFunctionStmt struct {
	BaseStatement
	Name       string
	Parameters []Parameter
	ReturnType string
	Body       []StatementAST
}

CreateFunctionStmt represents CREATE FUNCTION statement

type CreateProcedureStmt ¶

type CreateProcedureStmt struct {
	BaseStatement
	Name       string
	Parameters []Parameter
	Body       []StatementAST
}

CreateProcedureStmt represents CREATE PROCEDURE statement

type DeclareStmt ¶

type DeclareStmt struct {
	BaseStatement
}

DeclareStmt represents DECLARE statement

type ElseIfClause ¶

type ElseIfClause struct {
	BaseStatement
	Condition string
	Then      []StatementAST
}

ElseIfClause represents ELSEIF clause

type GenericStmt ¶

type GenericStmt struct {
	BaseStatement
}

GenericStmt represents any other SQL statement (SELECT, INSERT, etc.)

type IfStmt ¶

type IfStmt struct {
	BaseStatement
	Condition string
	Then      []StatementAST
	ElseIfs   []ElseIfClause
	Else      []StatementAST
}

IfStmt represents IF statement

type IterateStmt ¶

type IterateStmt struct {
	BaseStatement
}

IterateStmt represents ITERATE statement

type LeaveStmt ¶

type LeaveStmt struct {
	BaseStatement
}

LeaveStmt represents LEAVE statement

type LoopStmt ¶

type LoopStmt struct {
	BaseStatement
	Body []StatementAST
}

LoopStmt represents LOOP statement

type Option ¶

type Option func(*parser) Option

Option is a function that can set an option on the parser. It returns the previous setting as an Option.

func AllowInvalidUTF8 ¶

func AllowInvalidUTF8(b bool) Option

AllowInvalidUTF8 creates an Option to allow invalid UTF-8 bytes. Every invalid UTF-8 byte is treated as a utf8.RuneError (U+FFFD) by character class matchers and is matched by the any matcher. The returned matched value, c.text and c.offset are NOT affected.

The default is false.

func Debug ¶

func Debug(b bool) Option

Debug creates an Option to set the debug flag to b. When set to true, debugging information is printed to stdout while parsing.

The default is false.

func Entrypoint ¶

func Entrypoint(ruleName string) Option

Entrypoint creates an Option to set the rule name to use as entrypoint. The rule name must have been specified in the -alternate-entrypoints if generating the parser with the -optimize-grammar flag, otherwise it may have been optimized out. Passing an empty string sets the entrypoint to the first rule in the grammar.

The default is to start parsing at the first rule in the grammar.

func GlobalStore ¶

func GlobalStore(key string, value any) Option

GlobalStore creates an Option to set a key to a certain value in the globalStore.

func InitState ¶

func InitState(key string, value any) Option

InitState creates an Option to set a key to a certain value in the global "state" store.

func MaxExpressions ¶

func MaxExpressions(maxExprCnt uint64) Option

MaxExpressions creates an Option to stop parsing after the provided number of expressions have been parsed, if the value is 0 then the parser will parse for as many steps as needed (possibly an infinite number).

The default for maxExprCnt is 0.

func Memoize ¶

func Memoize(b bool) Option

Memoize creates an Option to set the memoize flag to b. When set to true, the parser will cache all results so each expression is evaluated only once. This guarantees linear parsing time even for pathological cases, at the expense of more memory and slower times for typical cases.

The default is false.

func Recover ¶

func Recover(b bool) Option

Recover creates an Option to set the recover flag to b. When set to true, this causes the parser to recover from panics and convert it to an error. Setting it to false can be useful while debugging to access the full stack trace.

The default is true.

func Statistics ¶

func Statistics(stats *Stats, choiceNoMatch string) Option

Statistics adds a user provided Stats struct to the parser to allow the user to process the results after the parsing has finished. Also the key for the "no match" counter is set.

Example usage:

input := "input"
stats := Stats{}
_, err := Parse("input-file", []byte(input), Statistics(&stats, "no match"))
if err != nil {
    log.Panicln(err)
}
b, err := json.MarshalIndent(stats.ChoiceAltCnt, "", "  ")
if err != nil {
    log.Panicln(err)
}
fmt.Println(string(b))

type Parameter ¶

type Parameter struct {
	Name string
	Type string
	Mode string // IN, OUT, INOUT
}

Parameter represents procedure/function parameter

type Position ¶

type Position struct {
	Line   int // Line number (1-based)
	Column int // Column number (1-based)
	Offset int // Byte offset in the source
}

Position represents the position of a statement in the source code

type RepeatStmt ¶

type RepeatStmt struct {
	BaseStatement
	Body      []StatementAST
	Condition string
}

RepeatStmt represents REPEAT statement

type ReturnStmt ¶

type ReturnStmt struct {
	BaseStatement
}

ReturnStmt represents RETURN statement

type StatementAST ¶

type StatementAST interface {
	AST

	// SetLabel sets the label for the statement
	SetLabel(label string)
	// GetLabel gets the label for the statement
	GetLabel() string
	// contains filtered or unexported methods
}

StatementAST represents any SQL statement

type Stats ¶

type Stats struct {
	// ExprCnt counts the number of expressions processed during parsing
	// This value is compared to the maximum number of expressions allowed
	// (set by the MaxExpressions option).
	ExprCnt uint64

	// ChoiceAltCnt is used to count for each ordered choice expression,
	// which alternative is used how may times.
	// These numbers allow to optimize the order of the ordered choice expression
	// to increase the performance of the parser
	//
	// The outer key of ChoiceAltCnt is composed of the name of the rule as well
	// as the line and the column of the ordered choice.
	// The inner key of ChoiceAltCnt is the number (one-based) of the matching alternative.
	// For each alternative the number of matches are counted. If an ordered choice does not
	// match, a special counter is incremented. The name of this counter is set with
	// the parser option Statistics.
	// For an alternative to be included in ChoiceAltCnt, it has to match at least once.
	ChoiceAltCnt map[string]map[string]int
}

Stats stores some statistics, gathered during parsing

type WhenClause ¶

type WhenClause struct {
	BaseStatement
	Condition string
	Then      []StatementAST
}

WhenClause represents WHEN clause in CASE statement

type WhileStmt ¶

type WhileStmt struct {
	BaseStatement
	Condition string
	Body      []StatementAST
}

WhileStmt represents WHILE statement

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL