analysis

package
v0.0.0-...-6f40dae Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Jun 22, 2026 License: MIT Imports: 3 Imported by: 0

Documentation

Overview

Package analyze is a small, read-only analysis framework over the sqlkit SQL AST. It is a toolbox of focused functions that turn a parsed sql.Query into individual structural facts — the statement kind (KindOf), the base tables it touches (Tables), whether the top-level node has a WHERE (HasTopLevelWhere) or projects "*" (SelectsStar), the columns its WHERE filters on (FilterColumns) and the largest IN list (MaxInList) — and normalizes a statement into a value-independent fingerprint (Fingerprint), all without recompiling or mutating the query.

The package is dialect-agnostic and depends only on the sql package. It does not bundle the facts into an aggregate of its own: a consumer composes the functions it needs into its own type. querytrace, for instance, uses them to fill its QueryInfo, and they are equally the foundation other AST passes (column resolution, masking) build on.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BareColumn

func BareColumn(name string) string

BareColumn returns the unqualified, lower-cased column name from a possibly qualified reference ("public.users.tenant_id" -> "tenant_id"). Matching is case-insensitive because an unquoted SQL identifier folds case.

func EnrichSources

func EnrichSources(sources []SourceRef, ctes []sql.CTE)

EnrichSources annotates each base-table source in sources that names one of the CTEs with that CTE's body query, so resolution can trace columns through it. SourcesOf already applies a query's own WITH; this lets a caller also apply the CTEs visible from an enclosing WITH (ambient CTEs) to a nested query's sources, the piece a query node cannot recover from itself.

func FilterColumns

func FilterColumns(q sql.Query) []string

FilterColumns returns the bare, lower-cased column names referenced in the statement's top-level WHERE clause, deduplicated. It is best-effort: it walks the WHERE expression with sql.WalkExpr (which does not descend into subqueries), so a predicate node it does not model may under-report. nil when there is no WHERE.

func Fingerprint

func Fingerprint(q sql.Query, dialect sql.Dialect) (string, bool)

Fingerprint returns the normalized form of a parsed query used to group repeats. It recompiles the AST through a dialect that emits a constant "?" for every bind placeholder (the parser has already collapsed literals to placeholders). Text passes then canonicalize the values that do not flow through Placeholder: a $N/?/:name placeholder that arrived verbatim as a raw fragment (sql.Raw, as the postgres frontend produces) is collapsed to the canonical "?", LIMIT/OFFSET counts (compiled as plain integers) are collapsed, and variable-length value lists are made count-insensitive — an "IN (?, …)" of placeholders and a "VALUES (…), (…), …" run of placeholder rows each collapse to a single element. The result is therefore numbering-, value-, and arity-independent. ok is false when q is nil or fails to compile, in which case callers fall back to FingerprintRaw on the SQL text.

func FingerprintRaw

func FingerprintRaw(sqlText string) string

FingerprintRaw normalizes SQL text without a parser: string and numeric literals and bind placeholders all collapse to ?, whitespace is squeezed, and IN/VALUES lists are made count-insensitive. It is the fallback for Fingerprint when no parsed AST is available.

func HasTopLevelJoin

func HasTopLevelJoin(q sql.Query) bool

HasTopLevelJoin reports whether a statement's top-level node carries a JOIN clause — a SELECT, UPDATE, or DELETE. A JOIN inside a subquery, a CTE body, or one arm of a UNION belongs to that nested query and is not counted.

func HasTopLevelWhere

func HasTopLevelWhere(q sql.Query) bool

HasTopLevelWhere reports whether a statement's top-level node carries a WHERE clause — a SELECT, UPDATE, or DELETE. It is false for a set operation, INSERT, VALUES, or DDL, which have no top-level WHERE. A WHERE inside a subquery, a CTE body, or one arm of a UNION is that nested query's property, not the outer statement's, and the name says so: it asks only about the top-level node. It backs the update_without_where / delete_without_where checks.

func MaxInList

func MaxInList(q sql.Query) int

MaxInList returns the length of the longest IN (...) value list in the statement's top-level WHERE, or 0 when there is none. A subquery IN ("col IN (SELECT ...)") carries no value list and does not count.

func NullEqualityPredicate

func NullEqualityPredicate(q sql.Query) bool

NullEqualityPredicate reports whether the statement's top-level WHERE compares a value against a NULL literal with = or <> (x = NULL / x <> NULL). SQL evaluates such a comparison to unknown rather than matching NULL, so the predicate silently drops every row; IS NULL / IS NOT NULL is meant. It walks the top-level WHERE only, like the other predicate facts, with sql.WalkExpr (which does not descend into subqueries), and is best-effort: a NULL written as a non-literal expression is not detected. It backs the eq_null check.

func SelectsStar

func SelectsStar(q sql.Query) bool

SelectsStar reports whether a top-level SELECT projects every column with a "*" (qualified or not) anywhere in its select list. It is false for a non-SELECT and for a set operation, whose arms own their own select lists. It backs the select_star check.

func Tables

func Tables(q sql.Query) []string

Tables returns the base tables a parsed query references anywhere in its tree, in source order with duplicates removed; nil when q is nil, unsourced, or of an unrecognized type. It reads the query's structure only, through the sql package's read-only introspection seam (the statement fields and sql.WalkSubqueries), never compiling or mutating q.

The walk follows FROM/JOIN subqueries, set-operation arms, CTE bodies, and the subqueries embedded in WHERE/SELECT/HAVING expressions (correlated ones included). A reference to a CTE is not a base table, so CTE names are excluded.

func TopLevelWhere

func TopLevelWhere(q sql.Query) sql.Expression

TopLevelWhere returns the WHERE expression of a statement's top-level node, or nil when the statement has none (a set operation, INSERT, VALUES, or DDL). It reads the query's own clauses; it never descends into subqueries, because a predicate that scopes the outer statement lives in that statement's own WHERE.

func WalkColumns

func WalkColumns(q sql.Query, visit func(ColumnUse))

WalkColumns calls visit for every column use in q's own clauses, in occurrence order and without deduplication — each tagged with the clause it appears in and whether it is read or written. It is the streaming form of Columns, for callers that want each occurrence (a rewrite pass), not the distinct set. Like Columns it does not descend into subqueries.

func WherePredicates

func WherePredicates(q sql.Query) (filterColumns []string, maxInList int)

WherePredicates returns the top-level WHERE facts FilterColumns and MaxInList report, gathered in a single walk of the clause. It is the form a caller that wants both (querytrace, deriving its per-statement warnings) uses, to avoid traversing the same WHERE expression twice. The results are identical to calling FilterColumns and MaxInList separately; both are zero when there is no WHERE.

Types

type Access

type Access int

Access is whether a column reference reads or writes the column.

const (
	// Read is a column whose value is read: a predicate, the SELECT list, a
	// GROUP BY/ORDER BY term, the right-hand side of an assignment, an inserted
	// value, a RETURNING item.
	Read Access = iota
	// Write is a column whose value is set: the target of an UPDATE/MERGE SET
	// assignment, or a column named in an INSERT/MERGE column list.
	Write
)

func (Access) String

func (a Access) String() string

String renders the access for logs.

type Analysis

type Analysis struct {
	// Kind is the statement kind, from the AST (KindOf) or a keyword guess (DetectKind).
	Kind StmtKind
	// Tables are the base tables the statement references (Tables); nil unless Parsed.
	Tables []string
	// HasTopLevelWhere, HasTopLevelJoin, and SelectStar are the top-level clause
	// facts (HasTopLevelWhere, HasTopLevelJoin, SelectsStar); all false unless Parsed.
	HasTopLevelWhere bool
	HasTopLevelJoin  bool
	SelectStar       bool
	// FilterColumns are the bare column names in the top-level WHERE (FilterColumns),
	// and MaxInList the largest IN (...) value list there (MaxInList); both zero
	// unless Parsed. They back the tenant/soft-delete and large-IN checks.
	FilterColumns []string
	MaxInList     int
	// NullEquality reports whether the top-level WHERE compares against a NULL
	// literal with = or <> (NullEqualityPredicate); false unless Parsed. It backs
	// the eq_null check.
	NullEquality bool
	// LockingRead reports whether a SELECT carries a row-locking clause
	// (FOR UPDATE / FOR SHARE / ...); false unless Parsed. Such a statement is a
	// read by Kind but write-like in effect — it takes locks and a read-only
	// transaction or replica rejects it — so IsWrite folds it in.
	LockingRead bool
	// Fingerprint is the normalized SQL used to group repeats; empty unless the
	// caller asked for it. See Analyzer.Fingerprint and FingerprintRaw.
	Fingerprint string
	// Parsed reports whether the facts came from a parsed AST rather than a keyword
	// guess.
	Parsed bool
	// Err records why analysis fell back to a keyword guess (a parse failure), for
	// diagnostics; nil when Parsed. It is never an execution error.
	Err error
}

Analysis is the static analysis of one SQL statement: the facts derived from its text and AST alone, independent of execution. It is the analyze-side half of what a tracer records about a statement — the runtime half (timing, parameters, caller) belongs to the tracer — so the static facts are produced and owned here and embedded by the consumer.

Parsed distinguishes the two ways it is produced: from a parsed AST (Parsed true, every field meaningful) or, when the SQL could not be parsed, from a keyword guess (Parsed false, only Kind populated). Checks that depend on clause shape consult Parsed to avoid firing on the guess. Fingerprint and Err are filled by the producer of the analysis (see Analyzer.Analyze): Fingerprint only when the caller wants it, Err with the parse failure that forced the fallback.

func (Analysis) IsWrite

func (a Analysis) IsWrite() bool

IsWrite reports whether the statement modifies data or takes row locks: a definite write (INSERT/UPDATE/DELETE/DDL) by Kind, or a locking read (SELECT ... FOR UPDATE/SHARE), which acquires locks and a read-only transaction or replica rejects. It is the write predicate observers should use rather than Kind.IsWrite alone, which misses the locking read.

type Analyzer

type Analyzer struct {
	// contains filtered or unexported fields
}

Analyzer is a configured entry point to the analysis functions, for a consumer that runs several over the same query and wants the configuration set once. The free functions remain the primitives; the Analyzer bundles the options they need — currently the SQL dialect used for fingerprinting — so a caller (querytrace, driving its per-statement observability) need not repeat them call to call.

The zero value is usable: every method but Fingerprint is configuration-free, so a zero Analyzer behaves like the bare functions, and Fingerprint reports ok=false until a dialect is set. An Analyzer is read-only and safe for concurrent use.

func New

func New(opts ...Option) *Analyzer

New returns an Analyzer configured by opts.

func (*Analyzer) Analyze

func (a *Analyzer) Analyze(sqlText string, q sql.Query) Analysis

Analyze returns the static structural analysis of a statement, preferring the parsed AST q and falling back to a keyword guess on sqlText when q is nil or of an unrecognized kind (a parse failure). The structural fields (Tables, the clause facts, the WHERE predicates) are populated only on the parsed path; on the fallback only Kind is set, from DetectKind.

It does not set Fingerprint or Err — the caller fills those: Fingerprint when it wants one (it carries a cost; see Analyzer.Fingerprint), Err with the parse failure. This is the single entry point a tracer uses to gather a statement's static facts in one call.

func (*Analyzer) Fingerprint

func (a *Analyzer) Fingerprint(q sql.Query) (string, bool)

Fingerprint returns the normalized form of a parsed query using the configured dialect; ok is false when no dialect is set or the query fails to compile, in which case a caller falls back to FingerprintRaw. See Fingerprint.

func (*Analyzer) HasTopLevelJoin

func (a *Analyzer) HasTopLevelJoin(q sql.Query) bool

HasTopLevelJoin reports a top-level JOIN; see HasTopLevelJoin.

func (*Analyzer) HasTopLevelWhere

func (a *Analyzer) HasTopLevelWhere(q sql.Query) bool

HasTopLevelWhere reports a top-level WHERE; see HasTopLevelWhere.

func (*Analyzer) Kind

func (a *Analyzer) Kind(q sql.Query) (StmtKind, bool)

Kind reports the statement kind from the AST; see KindOf.

func (*Analyzer) SelectsStar

func (a *Analyzer) SelectsStar(q sql.Query) bool

SelectsStar reports a top-level SELECT *; see SelectsStar.

func (*Analyzer) Tables

func (a *Analyzer) Tables(q sql.Query) []string

Tables returns the base tables the statement references; see Tables.

func (*Analyzer) WherePredicates

func (a *Analyzer) WherePredicates(q sql.Query) (filterColumns []string, maxInList int)

WherePredicates returns the top-level WHERE filter columns and largest IN list in one walk; see WherePredicates.

type Clause

type Clause int

Clause names the part of a statement a column reference appears in, so a caller can tell a WHERE predicate from a SET target from a RETURNING item.

const (
	// ClauseUnknown is the zero value, for a reference whose clause was not classified.
	ClauseUnknown   Clause = iota
	ClauseSelect           // SELECT list (and DISTINCT ON)
	ClauseJoinOn           // JOIN ... ON / USING
	ClauseWhere            // WHERE
	ClauseGroupBy          // GROUP BY
	ClauseHaving           // HAVING
	ClauseOrderBy          // ORDER BY
	ClauseWindow           // WINDOW / OVER partitions and ordering
	ClauseSet              // UPDATE/MERGE/ON CONFLICT SET assignment
	ClauseInsert           // INSERT/MERGE target column list
	ClauseValues           // INSERT/MERGE inserted values
	ClauseConflict         // ON CONFLICT target / predicate
	ClauseMergeOn          // MERGE ... ON / WHEN condition
	ClauseReturning        // RETURNING
)

func (Clause) String

func (c Clause) String() string

String renders the clause for logs.

type ColumnRef

type ColumnRef struct {
	Qualifier sql.Ident
	Name      sql.Ident
	// SrcPos is the byte offset of the reference in the source SQL, or sql.NoPos
	// when it has no recorded position — a reference carried in the AST as a plain
	// name rather than an expression node (a USING column, an INSERT/MERGE column
	// list, a SET target), or one built rather than parsed.
	SrcPos int
}

ColumnRef is a column reference as written in a statement: its qualifier — the table name or alias before the final dot, "" when the reference is unqualified — and the bare column name. A reference qualified by schema and table ("public.users.id") keeps the whole prefix in Qualifier ("public.users").

Case is preserved as written; the names are sql.Ident, so compare them with Ident.Equal rather than == (an unquoted SQL identifier folds case).

func ColumnRefOf

func ColumnRefOf(c sql.ColumnExpression) ColumnRef

ColumnRefOf returns the column reference a ColumnExpression names: its qualifier and bare name split on the final dot, carrying the node's source position. It is the conversion the column-walking passes share, so a caller that already holds a ColumnExpression (a rewrite pass) resolves it the same way WalkColumns does.

func NamedColumn

func NamedColumn(name sql.Ident) ColumnRef

NamedColumn returns an unqualified column reference for a column named directly by an Ident rather than by an expression node — a USING column, an INSERT/MERGE column-list entry, or a SET target. These idents carry whatever position the parser recorded for them (NoPos for a built statement).

type ColumnUse

type ColumnUse struct {
	Column ColumnRef
	Clause Clause
	Access Access
}

ColumnUse is one occurrence of a column reference, located: the reference itself, the clause it appears in, and whether it is read or written. It is what WalkColumns reports, so a caller knows not just which columns a statement touches but where and how.

func Columns

func Columns(q sql.Query) []ColumnUse

Columns returns the column uses in q's own clauses, deduplicated in first-seen order. It covers the SELECT list, JOIN conditions and USING columns, WHERE/HAVING/GROUP BY/ORDER BY and window clauses, the assigned columns and values of INSERT/UPDATE/MERGE, ON CONFLICT targets, and RETURNING items. It does not descend into subqueries — each subquery is its own scope, walked on its own (see Walk and SourcesOf).

It is best-effort over the expression nodes sql.WalkExpr models, so an exotic expression may under-report.

type Option

type Option func(*Analyzer)

Option configures an Analyzer.

func WithDialect

func WithDialect(d sql.Dialect) Option

WithDialect sets the SQL dialect the Analyzer recompiles a parsed query through when fingerprinting. Without it, Fingerprint reports ok=false and a caller falls back to FingerprintRaw on the SQL text.

type Resolution

type Resolution struct {
	// Column is the reference that was resolved, as written.
	Column ColumnRef
	// Clause and Access locate the reference in the statement and say whether it is
	// read or written. They are set by Resolve and zero from ResolveColumn.
	Clause Clause
	Access Access
	// Schema and Table name the owning table when Status is Resolved — taken from
	// the in-scope source the reference bound to — and are empty otherwise.
	Schema sql.Ident
	Table  sql.Ident
	// Status reports whether the reference was placed (Resolved), could not be
	// (Unknown), or was Ambiguous across in-scope tables.
	Status Status
}

Resolution is the outcome of resolving one column reference: where it appears and how it is used (Clause/Access, from the ColumnUse), and the table it was tied to (Schema/Table/Status). Resolve fills Clause/Access; ResolveColumn, which takes a bare reference and no context, leaves them zero.

func Resolve

func Resolve(q sql.Query) []Resolution

Resolve resolves every column reference in q and its subqueries, returning one Resolution per occurrence in document (pre-order) order. Each query node is resolved against its own scope — the sources SourcesOf reports for that node.

Scopes are not nested: a correlated reference to an enclosing query's alias is resolved against the inner node's own sources only, so it comes out Unknown.

func ResolveColumn

func ResolveColumn(scope []SourceRef, col ColumnRef) Resolution

ResolveColumn resolves a single column reference against an explicit scope of table sources (as reported by SourcesOf), structurally: a qualified reference binds to the source its qualifier names; an unqualified reference binds to the sole base-table source, or is Ambiguous when several distinct tables are in scope. It is the per-reference core of Resolve, exposed for callers that drive their own traversal.

When the matched source is a derived table or CTE (SourceRef.Subquery is set), resolution descends into the subquery's SELECT list to trace the column back to its underlying base table, mirroring how a real RDBMS resolves such references.

func ResolveColumnAll

func ResolveColumnAll(scope []SourceRef, col ColumnRef) []Resolution

ResolveColumnAll resolves col against scope like ResolveColumn, but returns every base table the reference may map to rather than a single placement. A qualified reference yields the source its qualifier names; an unqualified one yields every base table in scope it could belong to (one for a single-table scope, several when ambiguous). Tracing descends into derived sources and CTEs, and through both arms of a set operation, so a column projected from a UNION yields a candidate per arm; a column projected through an expression (an alias, a function, a concatenation) yields a candidate per base column the expression reads. Each candidate's Column names the underlying base column, which may differ from col when a projection renames it.

The result holds only Resolved resolutions, deduplicated by owning table and base column, and is empty when the reference resolves to no base table in scope. It is the fan-out form of ResolveColumn, for a caller — a masking policy — that treats a column as sensitive when any candidate is, and so must see every provenance.

func ResolveColumnAllVia

func ResolveColumnAllVia(scope []SourceRef, col ColumnRef, ctes []sql.CTE) []Resolution

ResolveColumnAllVia is ResolveColumnAll with ambient CTEs — those visible from an enclosing WITH — applied to every scope it descends into, so a reference chained through several CTEs (a CTE whose body selects from an earlier CTE) is still traced to its base table. A query node alone cannot recover the CTEs in scope above it, so a caller that holds them passes them here.

type SourceRef

type SourceRef struct {
	// Schema is the source's schema, when its name was schema-qualified; "" otherwise.
	Schema sql.Ident
	// Table is the base table name, or "" for a derived (subquery or VALUES) source.
	Table sql.Ident
	// Alias is the name the source is bound to (its AS alias), or "" when it is
	// referenced by its own table name.
	Alias sql.Ident
	// Subquery is the inner query of a derived-table source, or nil for a base
	// table or VALUES source.
	Subquery sql.Query
}

SourceRef is one table source visible in a query scope: a base table named in a FROM/JOIN/USING clause or as the target of an UPDATE/DELETE/INSERT/MERGE, together with the alias it is bound to. It is the unit that gives a column reference's qualifier something to resolve against.

A derived-table source — a subquery or an inline VALUES — names no base table, so Table is empty; for a subquery, Subquery holds the inner query (its own scope), which a resolver may descend into to resolve the alias's columns.

func SourcesOf

func SourcesOf(q sql.Query) []SourceRef

SourcesOf returns the table sources in q's own scope, in source order: the FROM/JOIN sources of a SELECT, the target and FROM/JOIN sources of an UPDATE/DELETE, the target of an INSERT, and the target and USING source of a MERGE. It does not descend into subqueries — each subquery is its own scope, reachable through a returned SourceRef.Subquery — and a set operation (whose arms are subqueries) yields no direct sources.

A FROM/JOIN reference to a CTE is resolved: when the query carries a WITH clause, the SourceRef for each CTE-named source is annotated with the CTE's body query in its Subquery field, so a resolver can trace column references through the CTE definition just as it does for inline derived tables.

func TargetScope

func TargetScope(t sql.Table) []SourceRef

TargetScope is the single-source scope of a statement's target table — the scope to resolve the columns an INSERT/UPDATE/MERGE assigns against, since they always belong to the target, never a joined source.

type Status

type Status int

Status is the outcome of resolving a column reference. Resolution never errors; the status reports how confidently a reference was placed and leaves the policy — treat Unknown/Ambiguous as fine, or as a failure — to the caller.

const (
	// Unknown is a reference that could not be placed: an unqualified reference
	// with no base-table source in scope, or a qualifier that matches none. It is
	// the zero value.
	Unknown Status = iota
	// Resolved is a reference tied to exactly one table (Schema/Table are set):
	// any qualified reference whose qualifier matches a source, or an unqualified
	// reference in a single-table scope.
	Resolved
	// Ambiguous is an unqualified reference that several distinct in-scope tables
	// could own. Resolution does not guess; the caller disambiguates with schema
	// knowledge it has and analyze does not.
	Ambiguous
)

type StmtKind

type StmtKind int

StmtKind classifies a SQL statement by the kind of statement it represents. It is a read-only label for tracing and analysis (query budgets, N+1 detection, read-only enforcement) derived from a parsed query's shape.

const (
	// StmtUnknown is a statement of an unrecognized kind, or one that could not
	// be analyzed.
	StmtUnknown StmtKind = iota
	// StmtSelect is a row-returning read: SELECT, a set operation (UNION etc.),
	// or a standalone VALUES.
	StmtSelect
	// StmtInsert is an INSERT.
	StmtInsert
	// StmtUpdate is an UPDATE.
	StmtUpdate
	// StmtDelete is a DELETE.
	StmtDelete
	// StmtDDL is a schema statement (CREATE/ALTER/DROP/TRUNCATE/COMMENT).
	StmtDDL
)

func DetectKind

func DetectKind(sqlText string) StmtKind

DetectKind guesses a statement's kind from its leading keyword, the fallback when no AST is available (a parse failed, or no parser was configured). It returns StmtUnknown when the leading token is unrecognized.

func KindOf

func KindOf(q sql.Query) (StmtKind, bool)

KindOf reports the kind of a parsed statement from its AST. ok is false when q is nil or of an unrecognized type, the same condition under which the other analysis functions yield nothing, so a caller can fall back to DetectKind on the raw SQL text. It reads only the statement's top-level type through the sql introspection seam, never compiling or mutating q.

func (StmtKind) FiltersRows

func (k StmtKind) FiltersRows() bool

FiltersRows reports whether the statement selects the rows it acts on through a WHERE clause — a SELECT, UPDATE, or DELETE. An INSERT does not (it names its rows directly).

func (StmtKind) IsRead

func (k StmtKind) IsRead() bool

IsRead reports whether the statement only reads rows.

func (StmtKind) IsWrite

func (k StmtKind) IsWrite() bool

IsWrite reports whether the statement modifies data (INSERT/UPDATE/DELETE).

func (StmtKind) String

func (k StmtKind) String() string

String renders the kind for logs and reports.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL