querylogs

package
v0.5.0-rc2
Published: Nov 21, 2025 License: Apache-2.0 Imports: 9 Imported by: 0

Documentation

Index

Constants

View Source
const (
	StringPlaceholder = "?"
	NumberPlaceholder = "?"
)

Variables

This section is empty.

Functions

This section is empty.

Types

type DwhContext

type DwhContext struct {
	// Instance is the unique identifier for the data warehouse instance.
	// This is typically the hostname, account ID, or workspace URL that uniquely
	// identifies the data warehouse deployment.
	// - Set by: Snowflake (account), Databricks (workspace_url), Redshift/Postgres/Trino (host), DuckDB (motherduck_account)
	// - Empty for: BigQuery, ClickHouse, MySQL
	Instance string

	// Database name (may be empty for platforms without database level)
	// - Snowflake: database_name
	// - Databricks: catalog_name
	// - BigQuery: project_id
	// - Redshift/Postgres: database_name
	// - Trino: catalog
	// - MySQL: host (used as instance identifier)
	// - ClickHouse: hostname or configured database alias from config (used as instance identifier, NOT the schema-level database)
	// - DuckDB: "" (empty)
	Database string

	// Schema name (may be empty for platforms without schema concept)
	// - Snowflake/Databricks/Redshift/Postgres/MySQL: schema_name
	// - BigQuery: dataset_id
	// - Trino: schema
	// - ClickHouse: database_name (2-level hierarchy)
	// - DuckDB: schema_name
	Schema string

	// Warehouse identifier (available in Snowflake, Databricks)
	Warehouse string

	// User who executed the query
	User string

	// Role used to execute the query (available in Snowflake, Postgres, etc.)
	Role string

	// Cluster identifier or hostname (for platforms that use cluster concept)
	// Redshift: cluster hostname
	// ClickHouse: cluster name (if applicable)
	Cluster string
}

DwhContext represents the execution context of a query with optional fields that may or may not be available depending on the data warehouse platform.

These mappings MUST match exactly what's in scrapper/*/query_tables.go for consistency.

Platform-specific mappings:

  • Snowflake: Instance=account, Database=database_name, Schema=schema_name
  • Databricks: Instance=workspace_url, Database=catalog_name, Schema=schema_name
  • BigQuery: Instance="", Database=project_id, Schema=dataset_id
  • Redshift: Instance=host, Database=database_name, Schema=schema_name
  • Postgres: Instance=host, Database=database_name, Schema=schema_name
  • Trino: Instance=host, Database=catalog, Schema=schema
  • MySQL: Instance="", Database=host, Schema=schema_name
  • ClickHouse: Instance="", Database=hostname (or configured database alias), Schema=database_name
  • DuckDB: Instance=motherduck_account, Database="", Schema=schema_name

type NativeLineage

type NativeLineage struct {
	// InputTables are the tables read by the query
	InputTables []scrapper.DwhFqn

	// OutputTables are the tables written by the query
	OutputTables []scrapper.DwhFqn
}

NativeLineage represents table lineage information provided natively by the platform. This is available for platforms like BigQuery and ClickHouse that expose lineage metadata. Uses scrapper.DwhFqn for table references (InstanceName can be left empty when not applicable).

func (*NativeLineage) GetInputTables

func (nl *NativeLineage) GetInputTables() []scrapper.DwhFqn

GetInputTables returns the input tables, handling nil NativeLineage safely. Returns an empty slice if NativeLineage is nil or InputTables is nil.

func (*NativeLineage) GetOutputTables

func (nl *NativeLineage) GetOutputTables() []scrapper.DwhFqn

GetOutputTables returns the output tables, handling nil NativeLineage safely. Returns an empty slice if NativeLineage is nil or OutputTables is nil.

type ObfuscationMode

type ObfuscationMode int

ObfuscationMode represents the level of SQL obfuscation applied to query logs. This is critical for on-premise deployments where customers want to prevent sensitive data in SQL queries from being sent to the SYNQ backend.

const (
	// ObfuscationNone indicates no obfuscation was applied
	ObfuscationNone ObfuscationMode = 0

	// ObfuscationRedactLiterals indicates string and numeric literals were replaced
	// with placeholders while preserving query structure for SQL parsing
	ObfuscationRedactLiterals ObfuscationMode = 1

	// ObfuscationRemoveQuery indicates the entire SQL query was removed (future feature)
	ObfuscationRemoveQuery ObfuscationMode = 2
)

type ObfuscatorOption

type ObfuscatorOption func(*obfuscatorConfig)

ObfuscatorOption configures the SQL obfuscator behavior. These options control what gets replaced during obfuscation.

func WithKeepJsonPath

func WithKeepJsonPath(keep bool) ObfuscatorOption

WithKeepJsonPath preserves JSON path expressions in SQL (e.g., $.field). When true, expressions like `data->>'$.field'` keep the JSON path intact. Default: true (keep JSON paths to preserve query structure)

func WithPreserveLiteralsMatching

func WithPreserveLiteralsMatching(patterns []string) ObfuscatorOption

WithPreserveLiteralsMatching preserves string and numeric literals whose full content matches any of the provided regex patterns. Matched literals are kept unchanged in the obfuscated SQL, while non-matching literals are replaced with placeholders.

Patterns are compiled into a single efficient regex at obfuscator creation time. This is useful for preserving dates, timestamps, UUIDs, or other structured data that doesn't contain sensitive information but is valuable for debugging and analysis.

Example patterns:

  • Date: `^\d{4}-\d{2}-\d{2}$` (matches '2023-10-01')
  • Timestamp: `^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$` (matches '2023-10-01 12:34:56')
  • UUID: `^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$` (matches UUID strings)

Example:

SQL: "WHERE created_at > '2023-10-01' AND email = 'user@example.com'"
With date pattern: "WHERE created_at > '2023-10-01' AND email = ?"

Default: no patterns (all literals are replaced)

func WithPreserveNumbers

func WithPreserveNumbers(preserve bool) ObfuscatorOption

WithPreserveNumbers controls whether numeric literals (like 123, 45.67) are kept unchanged or replaced with placeholders during obfuscation.

When true: "WHERE id = 123 AND score > 98.5" stays as "WHERE id = 123 AND score > 98.5"

When false: "WHERE id = 123 AND score > 98.5" becomes "WHERE id = ? AND score > ?"

IMPORTANT: This option ONLY affects numeric literals (NUMBER tokens). Digits in identifiers (table1, col2, schema3) are ALWAYS preserved regardless of this setting.

This is useful when you want to see actual numeric values in query logs for debugging, while still obfuscating sensitive string data like names, emails, etc.

Default: false (numeric literals are replaced with ?)

type QueryLog

type QueryLog struct {
	// CreatedAt is the timestamp when the query log was recorded (when query finished and results committed).
	// This is the time when query results became available for other queries to use.
	// Should be set to FinishedAt/EndTime across all platforms for consistency.
	CreatedAt time.Time

	// StartedAt is when query execution started (optional, nil if unknown).
	// Platform scrappers should set this to the actual query start time.
	// Duration is computed as FinishedAt - StartedAt.
	StartedAt *time.Time

	// FinishedAt is when query execution finished and results were committed (optional, nil if unknown).
	// Platform scrappers should set this to the actual query end time.
	// Duration is computed as FinishedAt - StartedAt.
	FinishedAt *time.Time

	// QueryID is the native query identifier or computed hash
	QueryID string

	// SQL is the query text (may be obfuscated based on SqlObfuscationMode)
	SQL string

	// NormalizedQueryHash is a hash of the normalized/parameterized query (without literal values)
	// Used for lineage caching - queries that differ only in literal values share the same lineage
	// Available for: Snowflake (QUERY_PARAMETERIZED_HASH), ClickHouse (cityHash64(normalizeQuery))
	// For platforms without native support, this field is nil
	NormalizedQueryHash *string

	// SqlDialect is the SQL dialect from Scrapper.DialectType() (snowflake, bigquery, databricks, etc.)
	SqlDialect string

	// DwhContext contains structured context information (database, schema, warehouse, user, role, etc.)
	DwhContext *DwhContext

	// QueryType is the platform-specific query type (e.g., "CREATE_TABLE_AS_SELECT", "SELECT", "INSERT")
	QueryType string

	// Status represents the query execution status ("SUCCESS", "FAILED", "CANCELED", etc.)
	Status string

	// Metadata contains platform-specific fields that don't fit into the standard structure
	Metadata map[string]interface{}

	// SqlObfuscationMode indicates how the SQL was obfuscated (if at all)
	SqlObfuscationMode ObfuscationMode

	// HasCompleteNativeLineage indicates that NativeLineage contains complete lineage information
	// and SQL parsing can be skipped entirely. When true, the backend should trust the native
	// lineage and avoid expensive SQL parsing.
	HasCompleteNativeLineage bool

	// IsTruncated indicates whether the SQL text was truncated by the data warehouse.
	// When true, the SQL field contains incomplete query text and should not be parsed.
	// This is common in systems that limit query text length in their system tables.
	IsTruncated bool

	// NativeLineage contains table lineage information when provided natively by the platform
	// (e.g., BigQuery, ClickHouse). For other platforms, this will be nil and SQL parsing is required.
	NativeLineage *NativeLineage
}

QueryLog represents a standardized query log entry from any data warehouse platform. It provides a common structure while preserving platform-specific details in Metadata.

type QueryLogIterator

type QueryLogIterator interface {
	// Next returns the next query log from the iterator.
	//
	// Returns:
	//   - The next QueryLog and nil error on success
	//   - nil and io.EOF when no more logs are available
	//   - nil and an error if fetching fails
	//
	// Implementation notes:
	//   - Should fetch from warehouse in optimal batch sizes internally
	//   - Should buffer results to minimize warehouse connection time
	//   - Must respect context cancellation
	//   - MUST auto-close all resources before returning io.EOF (defensive programming)
	//   - Safe to call multiple times after io.EOF
	Next(ctx context.Context) (*QueryLog, error)

	// Close releases any resources held by the iterator.
	// Should be called when iteration is complete or abandoned, but implementations
	// MUST also auto-close when Next() returns io.EOF to prevent resource leaks.
	// Safe to call multiple times.
	Close() error
}

QueryLogIterator provides sequential access to query logs. Implementations should fetch from the data warehouse as fast as possible and buffer internally to minimize warehouse connection time.

IMPORTANT: Implementations MUST automatically close resources when returning io.EOF from Next(). This ensures no resource leaks even if caller forgets to call Close().

func NewSqlxRowsIterator

func NewSqlxRowsIterator[T any](
	rows *sqlx.Rows,
	obfuscator QueryObfuscator,
	sqlDialect string,
	convertFn func(*T, QueryObfuscator, string) (*QueryLog, error),
) QueryLogIterator

NewSqlxRowsIterator creates a new iterator for sqlx.Rows with a conversion function. The conversion function should return nil to skip a row, or an error to stop iteration.

type QueryLogsProvider

type QueryLogsProvider interface {
	// FetchQueryLogs returns an iterator for query logs within the specified time range.
	//
	// Parameters:
	//   - ctx: Context for cancellation and deadlines
	//   - from: Start of the time range (inclusive)
	//   - to: End of the time range (exclusive)
	//   - obfuscator: Controls SQL obfuscation mode and provides helper function
	//
	// Returns:
	//   - An iterator for sequential access to query logs
	//   - The caller is responsible for calling Close() on the iterator
	//
	// Implementation notes:
	//   - Iterator should fetch from warehouse as fast as possible
	//   - Platform-specific optimizations (batch sizes, parallel fetching) should happen inside the iterator
	//   - Query logs should be ordered by CreatedAt timestamp when possible
	//   - Context cancellation should be respected throughout iteration
	//   - Platforms should handle obfuscation optimally:
	//       * Use native SQL functions when available (e.g., ClickHouse normalizeQuery())
	//       * Use obfuscator.Obfuscate() helper for post-processing when native isn't available
	//       * Skip obfuscation entirely if obfuscator.Mode() == ObfuscationNone
	//
	// Example usage:
	//   obfuscator := querylogs.NewLiteralsObfuscator()
	//   iter, err := provider.FetchQueryLogs(ctx, from, to, obfuscator)
	//   if err != nil {
	//       return err
	//   }
	//   defer iter.Close()
	//
	//   for {
	//       log, err := iter.Next(ctx)
	//       if err == io.EOF {
	//           break
	//       }
	//       if err != nil {
	//           return err
	//       }
	//       // Process log...
	//   }
	FetchQueryLogs(ctx context.Context, from, to time.Time, obfuscator QueryObfuscator) (QueryLogIterator, error)
}

QueryLogsProvider defines the interface for fetching query logs from data warehouse platforms.

type QueryObfuscator

type QueryObfuscator interface {
	// Mode returns the obfuscation mode configured for this obfuscator.
	// Platforms with native obfuscation support should check this to decide
	// whether to use native SQL functions (e.g., normalizeQuery() in ClickHouse).
	Mode() ObfuscationMode

	// Obfuscate processes SQL text and returns obfuscated version based on Mode().
	// If Mode() is ObfuscationNone, returns SQL unchanged.
	// Otherwise applies configured obfuscation (literals, bind parameters, etc.)
	Obfuscate(sql string) string
}

QueryObfuscator provides SQL obfuscation for query logs.

Platforms can handle obfuscation in two ways:

  1. Native obfuscation (ClickHouse): check Mode(); if not ObfuscationNone, use native SQL functions
  2. Post-processing (all platforms): call Obfuscate(sql) - it handles obfuscation based on Mode()

The obfuscator is always required. Check Mode() to decide whether to use native obfuscation, then always call Obfuscate() for post-processing.

func NewQueryObfuscator

func NewQueryObfuscator(mode ObfuscationMode, opts ...ObfuscatorOption) (QueryObfuscator, error)

NewQueryObfuscator creates a SQL obfuscator with the specified mode and options. Mode is a required parameter that determines the obfuscation behavior.

Returns an error if any regex patterns in WithPreserveLiteralsMatching are invalid.

Modes:

  • ObfuscationNone: No obfuscation (returns SQL unchanged)
  • ObfuscationRedactLiterals: Replaces string and numeric literals with placeholders
  • ObfuscationRemoveQuery: Removes the entire SQL query (future feature)

Default behavior for ObfuscationRedactLiterals:

  • Replaces string and numeric literals with placeholders (e.g., 'value' -> ?, 123 -> ?)
  • Preserves digits in identifiers (table1, col2, etc. stay unchanged)
  • Preserves JSON paths ($.field) by default

Example for cloud service (no obfuscation):

obfuscator, err := querylogs.NewQueryObfuscator(querylogs.ObfuscationNone)

Example for on-premise (redact literals):

obfuscator, err := querylogs.NewQueryObfuscator(
    querylogs.ObfuscationRedactLiterals,
    querylogs.WithKeepJsonPath(true),
)

type SqlxRowsIterator

type SqlxRowsIterator[T any] struct {
	// contains filtered or unexported fields
}

SqlxRowsIterator is a generic iterator for sqlx.Rows that converts rows to QueryLog. It automatically handles row iteration, scanning, conversion, and skipping nil results.

func (*SqlxRowsIterator[T]) Close

func (it *SqlxRowsIterator[T]) Close() error

func (*SqlxRowsIterator[T]) Next

func (it *SqlxRowsIterator[T]) Next(ctx context.Context) (*QueryLog, error)
