bigquery

package

Version: v0.6.2 (latest) | Published: Nov 16, 2025 | License: Apache-2.0 | Imports: 32 | Imported by: 0

Repository

github.com/secmon-lab/warren

README ¶

BigQuery Agent Configuration

Configuration File

Create a YAML file that lists the BigQuery tables the agent may query:

tables:
  - project_id: my-security-project
    dataset_id: cloudtrail_logs
    table_id: events
    description: CloudTrail events with user activity, API calls, and resource changes

scan_size_limit: "10GB"  # Required: Maximum bytes scanned per query
query_timeout: 5m        # Optional: Default 5m

Configuration Fields

Tables Structure

tables:
  - project_id: <project-id>  # Required: GCP project ID
    dataset_id: <dataset-id>  # Required: Dataset ID
    table_id: <table-id>      # Required: Table ID
    description: <desc>       # Recommended: Detailed table description

Global Settings

scan_size_limit (required): Maximum bytes scanned per query
- Format: "1GB", "10GB", "1TB", etc.
query_timeout (optional): Query timeout duration
- Format: "5m", "30s", etc.
- Default: 5m
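
The size format above can be illustrated with a small parser. This is only a sketch: parseSize is a hypothetical stand-in for the package's ParseScanSizeLimit, and it assumes binary multiples (1GB = 2^30 bytes) and integer values. The real function may use decimal multiples or accept other suffixes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings such as "10GB" into a byte count.
// Hypothetical helper: it assumes binary multiples and integer values,
// which may not match the package's ParseScanSizeLimit.
func parseSize(s string) (uint64, error) {
	// Longer suffixes come first so that "GB" is not mistaken for "B".
	units := []struct {
		suffix string
		factor uint64
	}{
		{"TB", 1 << 40},
		{"GB", 1 << 30},
		{"MB", 1 << 20},
		{"KB", 1 << 10},
		{"B", 1},
	}
	s = strings.ToUpper(strings.TrimSpace(s))
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			num := strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
			n, err := strconv.ParseUint(num, 10, 64)
			if err != nil {
				return 0, fmt.Errorf("invalid size %q: %w", s, err)
			}
			return n * u.factor, nil
		}
	}
	return 0, fmt.Errorf("invalid size %q: missing unit suffix", s)
}

func main() {
	for _, in := range []string{"1GB", "10GB", "1TB"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```

A value without a unit suffix, such as "10", is rejected in this sketch rather than being treated as a byte count.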

Table Descriptions

Table descriptions are included in the agent's system prompt, so detailed descriptions significantly improve the queries the agent writes.

Good example:

description: |
  AWS CloudTrail events containing:
  - Authentication events (ConsoleLogin, AssumeRole)
  - API calls (eventName, requestParameters, responseElements)
  - Failed attempts (errorCode, errorMessage)
  - Fields: sourceIPAddress, userAgent, mfaUsed
  - Partitioned by event_date (last 90 days)

Bad example:

description: CloudTrail data  # Too vague

SQL Runbooks

SQL runbooks are pre-written query templates for common investigation patterns.

Runbook File Format

Create .sql files with metadata in comments:

-- Title: Failed Login Investigation
-- Description: Query to find failed login attempts by IP address
SELECT
  timestamp,
  user_email,
  source_ip,
  error_code
FROM `project.dataset.auth_logs`
WHERE
  event_type = 'login_failed'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY timestamp DESC
LIMIT 100

Metadata format (keys are case-insensitive):

-- Title: <title> or -- title: <title>
-- Description: <description> or -- description: <description>

If no title is specified, the filename (without .sql) is used.
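
The metadata rules above could be implemented roughly as follows. This is a minimal sketch, not the loader Warren uses: runbookMeta and parseRunbookMeta are hypothetical names, and the sketch assumes metadata may appear on any "--" comment line in the file.

```go
package main

import (
	"bufio"
	"fmt"
	"path/filepath"
	"strings"
)

// runbookMeta is a hypothetical type used only for this illustration.
type runbookMeta struct {
	Title       string
	Description string
}

// parseRunbookMeta scans every "--" comment line for Title/Description keys
// (matched case-insensitively) and falls back to the file name, without the
// .sql extension, when no title is present.
func parseRunbookMeta(filename, sql string) runbookMeta {
	var m runbookMeta
	sc := bufio.NewScanner(strings.NewReader(sql))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "--") {
			continue // not a comment line
		}
		key, value, ok := strings.Cut(strings.TrimPrefix(line, "--"), ":")
		if !ok {
			continue // comment without a "key: value" shape
		}
		switch strings.ToLower(strings.TrimSpace(key)) {
		case "title":
			m.Title = strings.TrimSpace(value)
		case "description":
			m.Description = strings.TrimSpace(value)
		}
	}
	if m.Title == "" {
		m.Title = strings.TrimSuffix(filepath.Base(filename), ".sql")
	}
	return m
}

func main() {
	sql := "-- title: Failed Login Investigation\n" +
		"-- Description: Query to find failed login attempts by IP address\n" +
		"SELECT 1\n"
	fmt.Printf("%+v\n", parseRunbookMeta("failed_login.sql", sql))
	fmt.Printf("%+v\n", parseRunbookMeta("/runbooks/no_header.sql", "SELECT 1\n"))
}
```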

How Runbooks Work

- Loading: Runbooks are loaded when the agent is initialized.
- Listing: The agent's system prompt includes all runbook IDs, titles, and descriptions.
- Retrieval: The agent can call the get_runbook tool to fetch a runbook's full SQL content.
- Adaptation: The agent adapts the runbook SQL to the specific investigation.

Deployment

Environment Variables

export WARREN_AGENT_BIGQUERY_CONFIG="/path/to/config.yaml"
export WARREN_AGENT_BIGQUERY_PROJECT_ID="my-project"
export WARREN_AGENT_BIGQUERY_SCAN_SIZE_LIMIT="10GB"  # Optional override
export WARREN_AGENT_BIGQUERY_RUNBOOK_DIR="/path/to/runbooks,/path/to/more"  # Optional

CLI Flags

warren serve \
  --agent-bigquery-config=/path/to/config.yaml \
  --agent-bigquery-project-id=my-project \
  --agent-bigquery-runbook-dir=/path/to/runbooks \
  --agent-bigquery-runbook-dir=/path/to/more/runbooks  # Can specify multiple times

Cloud Run

gcloud run services update warren \
  --set-env-vars="WARREN_AGENT_BIGQUERY_CONFIG=/app/config.yaml" \
  --set-env-vars="WARREN_AGENT_BIGQUERY_PROJECT_ID=my-project"

Note: The config file must be included in the container image or mounted into the container at the configured path.

Authentication

The agent authenticates with Application Default Credentials (ADC).

Local Development

gcloud auth application-default login

Service Account Impersonation

You can configure the BigQuery agent to impersonate a service account for all BigQuery operations. This is useful when:

- Running with different permissions than the default credentials
- Implementing least-privilege access patterns
- Testing with specific service account permissions

Configuration

Set via CLI flag or environment variable:

# CLI flag
warren serve \
  --agent-bigquery-config=/path/to/config.yaml \
  --agent-bigquery-impersonate-service-account=bigquery-reader@my-project.iam.gserviceaccount.com

# Environment variable
export WARREN_AGENT_BIGQUERY_IMPERSONATE_SERVICE_ACCOUNT="bigquery-reader@my-project.iam.gserviceaccount.com"
warren serve --agent-bigquery-config=/path/to/config.yaml

Required Permissions

The identity running Warren (ADC) must have the roles/iam.serviceAccountTokenCreator role on the target service account:

gcloud iam service-accounts add-iam-policy-binding \
  bigquery-reader@my-project.iam.gserviceaccount.com \
  --member="<MEMBER>" \
  --role="roles/iam.serviceAccountTokenCreator"

The impersonated service account needs BigQuery permissions:

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:bigquery-reader@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:bigquery-reader@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

Service Account Permissions

Required IAM roles for the service account (either default ADC or impersonated):

- roles/bigquery.jobUser
- roles/bigquery.dataViewer

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:warren-service@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:warren-service@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

Example Configurations

Minimal

tables:
  - project_id: my-project
    dataset_id: security_logs
    table_id: events
    description: Security event logs

scan_size_limit: "1GB"

Multiple Tables

tables:
  - project_id: prod-logs
    dataset_id: aws_cloudtrail
    table_id: events_2024
    description: |
      CloudTrail events from 2024:
      - Authentication (ConsoleLogin, AssumeRole)
      - S3 access (GetObject, PutObject)
      - IAM changes
      - Partitioned by event_date

  - project_id: prod-logs
    dataset_id: gcp_audit
    table_id: admin_activity
    description: GCP admin activity logs

  - project_id: staging-logs
    dataset_id: app_logs
    table_id: authentication
    description: App authentication events

scan_size_limit: "5GB"
query_timeout: 3m

Migration from Old Format

If you're upgrading from the previous nested format, convert your configuration:

Old format (deprecated):

projects:
  - id: my-project
    datasets:
      - id: security_logs
        tables:
          - id: events

New format:

tables:
  - project_id: my-project
    dataset_id: security_logs
    table_id: events
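
The conversion is mechanical: each (project, dataset, table) path in the old tree becomes one flat entry. The sketch below shows that flattening on in-memory values. All type and function names are hypothetical and are not part of this package. YAML decoding is left out to keep the example within the standard library.

```go
package main

import "fmt"

// Hypothetical types mirroring the deprecated nested format.
type oldTable struct{ ID string }

type oldDataset struct {
	ID     string
	Tables []oldTable
}

type oldProject struct {
	ID       string
	Datasets []oldDataset
}

// flatTable mirrors one entry of the new flat "tables" list.
type flatTable struct {
	ProjectID string
	DatasetID string
	TableID   string
}

// flatten walks the nested projects/datasets/tables tree and emits one
// flat entry per table, preserving the original order.
func flatten(projects []oldProject) []flatTable {
	var out []flatTable
	for _, p := range projects {
		for _, d := range p.Datasets {
			for _, t := range d.Tables {
				out = append(out, flatTable{ProjectID: p.ID, DatasetID: d.ID, TableID: t.ID})
			}
		}
	}
	return out
}

func main() {
	old := []oldProject{{
		ID: "my-project",
		Datasets: []oldDataset{{
			ID:     "security_logs",
			Tables: []oldTable{{ID: "events"}},
		}},
	}}
	for _, t := range flatten(old) {
		fmt.Printf("- project_id: %s\n  dataset_id: %s\n  table_id: %s\n", t.ProjectID, t.DatasetID, t.TableID)
	}
}
```

Descriptions do not appear in the old-format example above, so they would need to be added by hand to each flattened entry.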

Documentation ¶

Index ¶

func ParseScanSizeLimit(sizeStr string) (uint64, error)
type Agent
- func New() *Agent
- func NewAgent(config *Config, llmClient gollem.LLMClient, memoryService *memory.Service) *Agent
type Config
- func LoadConfig(path string) (*Config, error)
- func LoadConfigWithRunbooks(ctx context.Context, path string, runbookPaths []string) (*Config, error)
- func (c *Config) GetQueryTimeout() time.Duration
type TableConfig

Functions ¶

func ParseScanSizeLimit ¶

func ParseScanSizeLimit(sizeStr string) (uint64, error)

ParseScanSizeLimit parses a human-readable size string (e.g., "10GB") into a number of bytes.

Types ¶

type Agent ¶

type Agent struct {
	// contains filtered or unexported fields
}

Agent represents a BigQuery Sub-Agent

func New ¶

func New() *Agent

New creates a new BigQuery Agent instance

func NewAgent ¶

func NewAgent(
	config *Config,
	llmClient gollem.LLMClient,
	memoryService *memory.Service,
) *Agent

NewAgent creates a new BigQuery Agent instance with config (for testing and direct use)

func (*Agent) Configure ¶

func (a *Agent) Configure(ctx context.Context) error

Configure implements interfaces.Tool

func (*Agent) Flags ¶

func (a *Agent) Flags() []cli.Flag

Flags returns CLI flags for this agent

func (*Agent) Helper ¶

func (a *Agent) Helper() *cli.Command

Helper implements interfaces.Tool

func (*Agent) ID ¶

func (a *Agent) ID() string

ID implements the SubAgent interface.

func (*Agent) Init ¶

func (a *Agent) Init(ctx context.Context, llmClient gollem.LLMClient, memoryService *memory.Service) (bool, error)

Init initializes the agent with LLM client and memory service. Returns (true, nil) if initialized successfully, (false, nil) if not configured, or (false, error) on error.

func (*Agent) IsEnabled ¶

func (a *Agent) IsEnabled() bool

IsEnabled returns true if the agent is configured and initialized

func (*Agent) LogValue ¶

func (a *Agent) LogValue() slog.Value

LogValue implements interfaces.Tool

func (*Agent) Name ¶

func (a *Agent) Name() string

Name implements interfaces.Tool

func (*Agent) Prompt ¶

func (a *Agent) Prompt(ctx context.Context) (string, error)

Prompt implements interfaces.Tool. It returns the table descriptions for the system prompt.

func (*Agent) Run ¶

func (a *Agent) Run(ctx context.Context, name string, args map[string]any) (map[string]any, error)

Run implements gollem.ToolSet

func (*Agent) Specs ¶

func (a *Agent) Specs(ctx context.Context) ([]gollem.ToolSpec, error)

Specs implements gollem.ToolSet

type Config ¶

type Config struct {
	Tables           []TableConfig                              `yaml:"tables"`
	ScanSizeLimit    uint64                                     `yaml:"-"` // Parsed from ScanSizeLimitStr
	ScanSizeLimitStr string                                     `yaml:"scan_size_limit"`
	QueryTimeout     time.Duration                              `yaml:"query_timeout"` // Timeout for waiting for BigQuery job completion (default: 5 minutes)
	Runbooks         map[types.RunbookID]*bigquery.RunbookEntry `yaml:"-"`             // Loaded runbooks (not in YAML)
}

Config represents BigQuery Agent configuration

func LoadConfig ¶

func LoadConfig(path string) (*Config, error)

LoadConfig loads BigQuery Agent configuration from a YAML file

func LoadConfigWithRunbooks ¶ added in v0.6.0

func LoadConfigWithRunbooks(ctx context.Context, path string, runbookPaths []string) (*Config, error)

LoadConfigWithRunbooks loads configuration and runbooks

func (*Config) GetQueryTimeout ¶

func (c *Config) GetQueryTimeout() time.Duration

GetQueryTimeout returns the configured query timeout, falling back to the default (5 minutes) when none is set.

type TableConfig ¶

type TableConfig struct {
	ProjectID   string `yaml:"project_id"`
	DatasetID   string `yaml:"dataset_id"`
	TableID     string `yaml:"table_id"`
	Description string `yaml:"description,omitempty"`
}

TableConfig represents a BigQuery table configuration
