bigquery

package
v0.8.2 Latest
Warning

This package is not in the latest version of its module.

Published: Jan 13, 2026 License: Apache-2.0 Imports: 32 Imported by: 0

README

BigQuery Agent Configuration

Configuration File

Create a YAML file defining BigQuery tables:

tables:
  - project_id: my-security-project
    dataset_id: cloudtrail_logs
    table_id: events
    description: CloudTrail events with user activity, API calls, and resource changes

scan_size_limit: "10GB"  # Required: Maximum bytes scanned per query
query_timeout: 5m        # Optional: Default 5m

Configuration Fields

Tables Structure
tables:
  - project_id: <project-id>  # Required: GCP project ID
    dataset_id: <dataset-id>  # Required: Dataset ID
    table_id: <table-id>      # Required: Table ID
    description: <desc>       # Recommended: Detailed table description
Global Settings
  • scan_size_limit (required): Maximum bytes scanned per query
    • Format: "1GB", "10GB", "1TB", etc.
  • query_timeout (optional): Query timeout duration
    • Format: "5m", "30s", etc.
    • Default: 5m
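The fields marked Required above can be checked before the agent starts. The sketch below is not the package's own loader. It uses hypothetical local stand-ins (`tableConfig`, `config`, `validate`) that mirror the YAML keys, and it reports every missing required field at once. `description` is only recommended, so it is not checked.

```go
package main

import (
	"errors"
	"fmt"
)

// tableConfig is a hypothetical stand-in for one entry under "tables".
type tableConfig struct {
	ProjectID   string
	DatasetID   string
	TableID     string
	Description string // recommended, not required
}

// config is a hypothetical stand-in holding the raw YAML settings.
type config struct {
	Tables           []tableConfig
	ScanSizeLimitStr string
}

// validate collects every missing required field instead of stopping at the first.
func validate(c config) error {
	var errs []error
	if c.ScanSizeLimitStr == "" {
		errs = append(errs, errors.New("scan_size_limit is required"))
	}
	for i, t := range c.Tables {
		if t.ProjectID == "" {
			errs = append(errs, fmt.Errorf("tables[%d]: project_id is required", i))
		}
		if t.DatasetID == "" {
			errs = append(errs, fmt.Errorf("tables[%d]: dataset_id is required", i))
		}
		if t.TableID == "" {
			errs = append(errs, fmt.Errorf("tables[%d]: table_id is required", i))
		}
	}
	// errors.Join returns nil when errs is empty.
	return errors.Join(errs...)
}

func main() {
	c := config{
		Tables: []tableConfig{{ProjectID: "my-project", DatasetID: "security_logs"}},
	}
	fmt.Println(validate(c))
}
```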

Table Descriptions

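Table descriptions are passed to the agent in its system prompt, so detailed descriptions that name key fields, event types, and partitioning help it pick the right table and write better queries.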

Good example:

description: |
  AWS CloudTrail events containing:
  - Authentication events (ConsoleLogin, AssumeRole)
  - API calls (eventName, requestParameters, responseElements)
  - Failed attempts (errorCode, errorMessage)
  - Fields: sourceIPAddress, userAgent, mfaUsed
  - Partitioned by event_date (last 90 days)

Bad example:

description: CloudTrail data  # Too vague
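To see why the vague description falls short, it helps to picture what the model actually reads. The sketch below renders tables into prompt-style text; the layout is an assumption for illustration and not the exact format produced by `(*Agent).Prompt`. `tableConfig` and `renderTables` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// tableConfig is a hypothetical stand-in for the package's TableConfig.
type tableConfig struct {
	ProjectID, DatasetID, TableID, Description string
}

// renderTables lists each table by its qualified name and indents every
// line of its description beneath it. The layout is illustrative only.
func renderTables(tables []tableConfig) string {
	var b strings.Builder
	for _, t := range tables {
		fmt.Fprintf(&b, "- %s.%s.%s\n", t.ProjectID, t.DatasetID, t.TableID)
		desc := strings.TrimSpace(t.Description)
		if desc == "" {
			desc = "(no description)"
		}
		for _, line := range strings.Split(desc, "\n") {
			fmt.Fprintf(&b, "    %s\n", line)
		}
	}
	return b.String()
}

func main() {
	tables := []tableConfig{
		{"my-security-project", "cloudtrail_logs", "events",
			"AWS CloudTrail events containing:\n- Authentication events (ConsoleLogin, AssumeRole)\n- Failed attempts (errorCode, errorMessage)"},
		{"my-project", "security_logs", "events", "CloudTrail data"},
	}
	fmt.Print(renderTables(tables))
}
```

The second entry gives the model a table name and two words; the first tells it which event names and error fields it can filter on.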

SQL Runbooks

SQL runbooks are pre-written query templates for common investigation patterns.

Runbook File Format

Create .sql files with metadata in comments:

-- Title: Failed Login Investigation
-- Description: Query to find failed login attempts by IP address
SELECT
  timestamp,
  user_email,
  source_ip,
  error_code
FROM `project.dataset.auth_logs`
WHERE
  event_type = 'login_failed'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY timestamp DESC
LIMIT 100

Metadata Format (case insensitive):

  • -- Title: <title> or -- title: <title>
  • -- Description: <description> or -- description: <description>

If no title is specified, the filename without the .sql extension is used as the title.
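The metadata rules above can be sketched as a small parser. This is a hypothetical illustration, not the package's loader: `parseRunbookMeta` scans every `--` comment line (the real implementation may only look at leading comments), matches keys case-insensitively, keeps the first value found, and falls back to the filename without `.sql` when no title is present.

```go
package main

import (
	"bufio"
	"fmt"
	"path/filepath"
	"strings"
)

type runbookMeta struct {
	Title       string
	Description string
}

// parseRunbookMeta reads "-- Key: value" comment lines; keys are case-insensitive.
func parseRunbookMeta(filename, sql string) runbookMeta {
	var m runbookMeta
	sc := bufio.NewScanner(strings.NewReader(sql))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "--") {
			continue // not a comment line
		}
		body := strings.TrimSpace(strings.TrimPrefix(line, "--"))
		key, value, ok := strings.Cut(body, ":")
		if !ok {
			continue // comment without a "key:" prefix
		}
		value = strings.TrimSpace(value)
		switch strings.ToLower(strings.TrimSpace(key)) {
		case "title":
			if m.Title == "" {
				m.Title = value
			}
		case "description":
			if m.Description == "" {
				m.Description = value
			}
		}
	}
	if m.Title == "" {
		// Fall back to the file name without its .sql extension.
		m.Title = strings.TrimSuffix(filepath.Base(filename), ".sql")
	}
	return m
}

func main() {
	sql := "-- Title: Failed Login Investigation\n-- Description: Query to find failed login attempts by IP address\nSELECT timestamp FROM `project.dataset.auth_logs`"
	fmt.Printf("%+v\n", parseRunbookMeta("failed_login.sql", sql))
	fmt.Printf("%+v\n", parseRunbookMeta("s3_access.sql", "SELECT 1"))
}
```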

How Runbooks Work
  1. Loading: Runbooks are loaded when the agent is initialized.
  2. Listing: The agent's system prompt includes every runbook's ID, title, and description.
  3. Retrieval: The agent can call the get_runbook tool to fetch a runbook's full SQL.
  4. Adaptation: The agent adapts the runbook SQL to the specific investigation.
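Steps 1 to 3 amount to a lookup table that is summarized in the prompt and queried on demand. The sketch below models that with a hypothetical `registry` map; the real package keys runbooks by `types.RunbookID`, and its prompt wording and tool output will differ. Only metadata goes into the listing, which keeps the prompt short while the full SQL stays one tool call away.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// runbook holds one loaded .sql file (step 1). Hypothetical type.
type runbook struct {
	Title, Description, SQL string
}

// registry maps a runbook ID to its content. Hypothetical type.
type registry map[string]runbook

// listing renders ID, title, and description for every runbook, sorted by ID,
// as text suitable for a system prompt (step 2). The SQL is left out.
func (r registry) listing() string {
	ids := make([]string, 0, len(r))
	for id := range r {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	var b strings.Builder
	for _, id := range ids {
		rb := r[id]
		fmt.Fprintf(&b, "- %s: %s (%s)\n", id, rb.Title, rb.Description)
	}
	return b.String()
}

// getRunbook returns the full SQL for one ID, like a get_runbook tool call (step 3).
func (r registry) getRunbook(id string) (string, error) {
	rb, ok := r[id]
	if !ok {
		return "", fmt.Errorf("runbook %q not found", id)
	}
	return rb.SQL, nil
}

func main() {
	r := registry{
		"failed_login": {Title: "Failed Login Investigation",
			Description: "Query to find failed login attempts by IP address",
			SQL:         "SELECT timestamp, user_email, source_ip FROM `project.dataset.auth_logs`"},
	}
	fmt.Print(r.listing())
	if sql, err := r.getRunbook("failed_login"); err == nil {
		fmt.Println(sql)
	}
}
```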

Deployment

Environment Variables
export WARREN_AGENT_BIGQUERY_CONFIG="/path/to/config.yaml"
export WARREN_AGENT_BIGQUERY_PROJECT_ID="my-project"
export WARREN_AGENT_BIGQUERY_SCAN_SIZE_LIMIT="10GB"  # Optional override
export WARREN_AGENT_BIGQUERY_RUNBOOK_DIR="/path/to/runbooks,/path/to/more"  # Optional; comma-separated list
CLI Flags
warren serve \
  --agent-bigquery-config=/path/to/config.yaml \
  --agent-bigquery-project-id=my-project \
  --agent-bigquery-runbook-dir=/path/to/runbooks \
  --agent-bigquery-runbook-dir=/path/to/more/runbooks  # Can specify multiple times
Cloud Run
gcloud run services update warren \
  --set-env-vars="WARREN_AGENT_BIGQUERY_CONFIG=/app/config.yaml" \
  --set-env-vars="WARREN_AGENT_BIGQUERY_PROJECT_ID=my-project"

Note: The config file must be built into the container image or mounted into the container at the path given in WARREN_AGENT_BIGQUERY_CONFIG.

Authentication

The agent authenticates to BigQuery with Application Default Credentials (ADC).

Local Development
gcloud auth application-default login
Service Account Impersonation

You can configure the BigQuery agent to impersonate a service account for all BigQuery operations. This is useful when:

  • Running with different permissions than the default credentials
  • Implementing least-privilege access patterns
  • Testing with specific service account permissions
Configuration

Set it with the CLI flag or the environment variable:

# CLI flag
warren serve \
  --agent-bigquery-config=/path/to/config.yaml \
  --agent-bigquery-impersonate-service-account=bigquery-reader@my-project.iam.gserviceaccount.com

# Environment variable
export WARREN_AGENT_BIGQUERY_IMPERSONATE_SERVICE_ACCOUNT="bigquery-reader@my-project.iam.gserviceaccount.com"
warren serve --agent-bigquery-config=/path/to/config.yaml
Required Permissions

The identity running Warren (ADC) must have the roles/iam.serviceAccountTokenCreator role on the target service account:

gcloud iam service-accounts add-iam-policy-binding \
  bigquery-reader@my-project.iam.gserviceaccount.com \
  --member="<MEMBER>" \
  --role="roles/iam.serviceAccountTokenCreator"

The impersonated service account needs BigQuery roles to run query jobs and read table data:

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:bigquery-reader@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:bigquery-reader@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"
Service Account Permissions

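Required IAM roles for the identity that runs queries (the ADC identity, or the impersonated service account if impersonation is configured):

  • roles/bigquery.jobUser
  • roles/bigquery.dataViewer

For example, when Warren runs as the warren-service service account: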
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:warren-service@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:warren-service@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

Example Configurations

Minimal
tables:
  - project_id: my-project
    dataset_id: security_logs
    table_id: events
    description: Security event logs

scan_size_limit: "1GB"
Multiple Tables
tables:
  - project_id: prod-logs
    dataset_id: aws_cloudtrail
    table_id: events_2024
    description: |
      CloudTrail events from 2024:
      - Authentication (ConsoleLogin, AssumeRole)
      - S3 access (GetObject, PutObject)
      - IAM changes
      - Partitioned by event_date

  - project_id: prod-logs
    dataset_id: gcp_audit
    table_id: admin_activity
    description: GCP admin activity logs

  - project_id: staging-logs
    dataset_id: app_logs
    table_id: authentication
    description: App authentication events

scan_size_limit: "5GB"
query_timeout: 3m

Migration from Old Format

If you're upgrading from the previous nested format, convert your configuration:

Old format (deprecated):

projects:
  - id: my-project
    datasets:
      - id: security_logs
        tables:
          - id: events

New format:

tables:
  - project_id: my-project
    dataset_id: security_logs
    table_id: events

Documentation


Constants

This section is empty.

Variables

This section is empty.

Functions

func ParseScanSizeLimit

func ParseScanSizeLimit(sizeStr string) (uint64, error)

ParseScanSizeLimit parses a human-readable size string (e.g., "10GB") into a number of bytes.
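For intuition about what this kind of parsing involves, here is a hypothetical stand-in, `parseSize`, not the package's implementation. It assumes binary (1024-based) multipliers, integer amounts, and case-insensitive units; the real ParseScanSizeLimit may use decimal units or accept other forms, so check its behavior before relying on exact byte counts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings like "10GB" or "1tb" to bytes.
// Assumption for this sketch: 1KB = 1024 bytes, whole numbers only.
func parseSize(s string) (uint64, error) {
	s = strings.ToUpper(strings.TrimSpace(s))
	// Longer suffixes come first so "GB" is not mistaken for plain "B".
	units := []struct {
		suffix string
		mult   uint64
	}{
		{"TB", 1 << 40}, {"GB", 1 << 30}, {"MB", 1 << 20}, {"KB", 1 << 10}, {"B", 1},
	}
	for _, u := range units {
		if num, ok := strings.CutSuffix(s, u.suffix); ok {
			n, err := strconv.ParseUint(strings.TrimSpace(num), 10, 64)
			if err != nil {
				return 0, fmt.Errorf("invalid size %q: %w", s, err)
			}
			if n > ^uint64(0)/u.mult {
				return 0, fmt.Errorf("size %q overflows uint64", s)
			}
			return n * u.mult, nil
		}
	}
	return 0, fmt.Errorf("size %q has no unit (expected B, KB, MB, GB, or TB)", s)
}

func main() {
	for _, s := range []string{"1GB", "10GB", "1TB"} {
		n, err := parseSize(s)
		fmt.Println(s, n, err)
	}
}
```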

Types

type Agent

type Agent struct {
	// contains filtered or unexported fields
}

Agent represents a BigQuery sub-agent.

func New

func New() *Agent

New creates a new BigQuery Agent instance

func NewAgent

func NewAgent(
	config *Config,
	llmClient gollem.LLMClient,
	repo interfaces.Repository,
) *Agent

NewAgent creates a new BigQuery Agent instance with config (for testing and direct use)

func (*Agent) Configure

func (a *Agent) Configure(ctx context.Context) error

Configure implements interfaces.Tool

func (*Agent) Flags

func (a *Agent) Flags() []cli.Flag

Flags returns CLI flags for this agent

func (*Agent) Helper

func (a *Agent) Helper() *cli.Command

Helper implements interfaces.Tool

func (*Agent) ID

func (a *Agent) ID() string

ID implements the SubAgent interface.

func (*Agent) Init

func (a *Agent) Init(ctx context.Context, llmClient gollem.LLMClient, repo interfaces.Repository) (bool, error)

Init initializes the agent with the LLM client and memory service. It returns (true, nil) if initialization succeeds, (false, nil) if the agent is not configured, or (false, error) if initialization fails.

func (*Agent) IsEnabled

func (a *Agent) IsEnabled() bool

IsEnabled returns true if the agent is configured and initialized

func (*Agent) LogValue

func (a *Agent) LogValue() slog.Value

LogValue implements interfaces.Tool

func (*Agent) Name

func (a *Agent) Name() string

Name implements interfaces.Tool

func (*Agent) Prompt

func (a *Agent) Prompt(ctx context.Context) (string, error)

Prompt implements interfaces.Tool. It returns the table descriptions for the system prompt.

func (*Agent) Run

func (a *Agent) Run(ctx context.Context, name string, args map[string]any) (map[string]any, error)

Run implements gollem.ToolSet

func (*Agent) Specs

func (a *Agent) Specs(ctx context.Context) ([]gollem.ToolSpec, error)

Specs implements gollem.ToolSet

type Config

type Config struct {
	Tables           []TableConfig                              `yaml:"tables"`
	ScanSizeLimit    uint64                                     `yaml:"-"` // Parsed from ScanSizeLimitStr
	ScanSizeLimitStr string                                     `yaml:"scan_size_limit"`
	QueryTimeout     time.Duration                              `yaml:"query_timeout"` // Timeout for waiting for BigQuery job completion (default: 5 minutes)
	Runbooks         map[types.RunbookID]*bigquery.RunbookEntry `yaml:"-"`             // Loaded runbooks (not in YAML)
}

Config represents BigQuery Agent configuration

func LoadConfig

func LoadConfig(path string) (*Config, error)

LoadConfig loads BigQuery Agent configuration from a YAML file

func LoadConfigWithRunbooks added in v0.6.0

func LoadConfigWithRunbooks(ctx context.Context, path string, runbookPaths []string) (*Config, error)

LoadConfigWithRunbooks loads configuration and runbooks

func (*Config) GetQueryTimeout

func (c *Config) GetQueryTimeout() time.Duration

GetQueryTimeout returns the query timeout with fallback to default

type TableConfig

type TableConfig struct {
	ProjectID   string `yaml:"project_id"`
	DatasetID   string `yaml:"dataset_id"`
	TableID     string `yaml:"table_id"`
	Description string `yaml:"description,omitempty"`
}

TableConfig represents a BigQuery table configuration
