database

package
v1.2.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Aug 1, 2025 License: MIT Imports: 20 Imported by: 0

README

Database Schema Design

This document describes the database schema design for TypoSentinel's enterprise repository scanning features.

Overview

The database schema is designed to support:

  • Repository metadata storage
  • Organization data management
  • Scan results and findings tracking
  • Job queue and progress monitoring
  • Dependency analysis and vulnerability tracking

Schema Components

Core Tables
1. Organizations (organizations)

Stores metadata about organizations being scanned.

Key Fields:

  • platform: Source platform (github, gitlab, etc.)
  • login: Organization identifier
  • scan_status: Current scan status
  • scan_config: Organization-specific scan configuration
  • metadata: Platform-specific additional data
2. Repositories (repositories)

Stores metadata about individual repositories.

Key Fields:

  • platform: Source platform
  • organization: Parent organization
  • full_name: Complete repository identifier
  • scan_status: Current scan status
  • language: Primary programming language
  • is_private, is_fork, is_archived: Repository flags
  • metadata: Additional repository data
3. Scan Jobs (scan_jobs)

Tracks organization-wide scan jobs and their progress.

Key Fields:

  • job_id: Unique job identifier (UUID)
  • job_type: Type of scan (organization_scan, etc.)
  • status: Job status (pending, running, completed, etc.)
  • progress_percentage: Job completion percentage
  • scan_config: Job-specific configuration
  • total_repositories, completed_repositories: Progress tracking
4. Scan Results (scan_results)

Stores results of individual repository scans.

Key Fields:

  • repository_id: Reference to scanned repository
  • scan_id: Unique scan session identifier
  • scan_type: Type of scan performed
  • status: Scan completion status
  • results: Detailed scan results (JSONB)
  • sarif_output: SARIF-formatted output
  • threats_found, vulnerabilities_found: Summary counts
5. Scan Findings (scan_findings)

Detailed findings from scans (threats, vulnerabilities, etc.).

Key Fields:

  • finding_type: Type of finding (typosquatting, vulnerability, etc.)
  • severity: Finding severity level
  • package_name: Affected package
  • cve_id: CVE identifier (for vulnerabilities)
  • suspected_target: Target package (for typosquatting)
  • status: Finding resolution status
6. Repository Dependencies (repository_dependencies)

Tracks package dependencies found in repositories.

Key Fields:

  • package_name: Dependency package name
  • package_ecosystem: Package ecosystem (npm, pypi, etc.)
  • dependency_type: Type of dependency (direct, transitive, etc.)
  • manifest_file: Source manifest file
  • has_vulnerabilities: Security flag
  • is_malicious: Malware detection flag

Migration System

The schema uses a migration-based approach for version control:

Migration Files
  • Located in migrations/ directory
  • Named with format: 001_description.sql
  • Embedded in the binary using Go's embed package
  • Applied automatically on startup
Schema Manager

The SchemaManager handles:

  • Migration discovery and application
  • Checksum validation
  • Schema validation
  • Migration status tracking

Usage

Initializing the Database
import (
    "github.com/Alivanroy/Typosentinel/internal/database"
    "github.com/Alivanroy/Typosentinel/pkg/logger"
)

// Create schema manager
schemaManager := database.NewSchemaManager(db, logger)

// Initialize schema and apply migrations
if err := schemaManager.Initialize(ctx); err != nil {
    log.Fatal("Failed to initialize database schema:", err)
}

// Validate schema
if err := schemaManager.ValidateSchema(ctx); err != nil {
    log.Fatal("Schema validation failed:", err)
}
Checking Migration Status
migrations, err := schemaManager.GetMigrationStatus(ctx)
if err != nil {
    log.Fatal("Failed to get migration status:", err)
}

for _, migration := range migrations {
    status := "pending"
    if migration.AppliedAt != nil {
        status = "applied"
    }
    fmt.Printf("Migration %d: %s [%s]\n", migration.Version, migration.Name, status)
}

Database Configuration

database:
  type: postgres
  host: localhost
  port: 5432
  database: typosentinel
  username: typosentinel_user
  password: ${DB_PASSWORD}
  ssl_mode: require
  max_open_conns: 25
  max_idle_conns: 10
  conn_max_lifetime: 1h
  migrations_path: internal/database/migrations
SQLite (Development/Testing)
database:
  type: sqlite
  database: ./data/typosentinel.db
  migrations_path: internal/database/migrations

Indexes and Performance

Key Indexes
  • Repository lookups: platform, organization, full_name
  • Scan tracking: scan_status, last_scanned_at
  • Finding queries: finding_type, severity, package_name
  • Job monitoring: status, created_at, priority
  • JSONB fields: GIN indexes for metadata searches
Query Optimization
  • Composite indexes for common query patterns
  • Partial indexes for active/problematic records
  • JSONB indexes for flexible metadata queries

Security Considerations

Data Protection
  • Sensitive configuration stored in JSONB fields
  • No plain-text secrets in database
  • Audit trail through timestamps
Access Control
  • Database-level user permissions
  • Connection pooling and limits
  • SSL/TLS encryption for connections

Monitoring and Maintenance

Health Checks
  • Table existence validation
  • Migration status monitoring
  • Connection pool metrics
Cleanup Procedures
  • Old scan results archival
  • Completed job cleanup
  • Dependency data refresh

Future Enhancements

Planned Features
  • Audit logging table
  • User management and permissions
  • Scan scheduling and triggers
  • Historical trend analysis
  • Cross-repository dependency graphs
Scalability Considerations
  • Table partitioning for large datasets
  • Read replicas for reporting
  • Archival strategies for historical data
  • Caching layers for frequent queries

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type DatabaseConfig

type DatabaseConfig struct {
	Host     string `json:"host"`
	Port     int    `json:"port"`
	User     string `json:"user"`
	Password string `json:"password"`
	DBName   string `json:"dbname"`
	SSLMode  string `json:"sslmode"`
	MaxConns int    `json:"max_conns"`
	MaxIdle  int    `json:"max_idle"`
}

DatabaseConfig contains database connection configuration

type DatabaseManager

type DatabaseManager struct {
	// contains filtered or unexported fields
}

DatabaseManager manages database connections and initialization

func NewDatabaseManager

func NewDatabaseManager(config InitConfig, logger *logger.Logger) *DatabaseManager

NewDatabaseManager creates a new database manager

func (*DatabaseManager) Close

func (dm *DatabaseManager) Close() error

Close closes the database connection

func (*DatabaseManager) GetDB

func (dm *DatabaseManager) GetDB() *sql.DB

GetDB returns the database connection

func (*DatabaseManager) GetMigrationStatus

func (dm *DatabaseManager) GetMigrationStatus(ctx context.Context) ([]Migration, error)

GetMigrationStatus returns the current migration status

func (*DatabaseManager) GetSchemaManager

func (dm *DatabaseManager) GetSchemaManager() *SchemaManager

GetSchemaManager returns the schema manager

func (*DatabaseManager) GetService

func (dm *DatabaseManager) GetService() *DatabaseService

GetService returns the database service

func (*DatabaseManager) HealthCheck

func (dm *DatabaseManager) HealthCheck(ctx context.Context) error

HealthCheck performs a health check on the database

func (*DatabaseManager) Initialize

func (dm *DatabaseManager) Initialize(ctx context.Context) error

Initialize initializes the database connection and runs migrations

func (*DatabaseManager) ValidateSchema

func (dm *DatabaseManager) ValidateSchema(ctx context.Context) error

ValidateSchema validates the database schema

type DatabaseService

type DatabaseService struct {
	// contains filtered or unexported fields
}

DatabaseService provides CRUD operations for the database

func NewDatabaseService

func NewDatabaseService(config *DatabaseConfig) (*DatabaseService, error)

NewDatabaseService creates a new database service

func (*DatabaseService) Close

func (ds *DatabaseService) Close() error

Close closes the database connection

func (*DatabaseService) CreateOrganization

func (ds *DatabaseService) CreateOrganization(ctx context.Context, org *Organization) error

CreateOrganization creates a new organization record

func (*DatabaseService) CreateRepository

func (ds *DatabaseService) CreateRepository(ctx context.Context, repo *Repository) error

CreateRepository creates a new repository record

func (*DatabaseService) CreateScanJob

func (ds *DatabaseService) CreateScanJob(ctx context.Context, job *ScanJob) error

CreateScanJob creates a new scan job record

func (*DatabaseService) GetOrganization

func (ds *DatabaseService) GetOrganization(ctx context.Context, platform, login string) (*Organization, error)

GetOrganization retrieves an organization by platform and login

func (*DatabaseService) GetRepositoriesByOrganization

func (ds *DatabaseService) GetRepositoriesByOrganization(ctx context.Context, orgID string, limit, offset int) ([]*Repository, error)

GetRepositoriesByOrganization retrieves repositories for an organization

func (*DatabaseService) GetScanJob

func (ds *DatabaseService) GetScanJob(ctx context.Context, jobID string) (*ScanJob, error)

GetScanJob retrieves a scan job by ID

func (*DatabaseService) HealthCheck

func (ds *DatabaseService) HealthCheck(ctx context.Context) error

HealthCheck performs a database health check

func (*DatabaseService) UpdateScanJobStatus

func (ds *DatabaseService) UpdateScanJobStatus(ctx context.Context, jobID, status string, progress float64) error

UpdateScanJobStatus updates the status and progress of a scan job

type ExternalThreat

type ExternalThreat struct {
	PackageName string                 `json:"package_name"`
	Registry    string                 `json:"registry"`
	ThreatType  string                 `json:"threat_type"`
	Severity    string                 `json:"severity"`
	Confidence  float64                `json:"confidence"`
	Description string                 `json:"description"`
	Source      string                 `json:"source"`
	Metadata    map[string]interface{} `json:"metadata,omitempty"`
	ReportedAt  time.Time              `json:"reported_at"`
}

ExternalThreat represents threat data from external sources

type InitConfig

type InitConfig struct {
	Host            string        `yaml:"host" env:"DB_HOST" default:"localhost"`
	Port            int           `yaml:"port" env:"DB_PORT" default:"5432"`
	Database        string        `yaml:"database" env:"DB_NAME" default:"typosentinel"`
	Username        string        `yaml:"username" env:"DB_USER" default:"postgres"`
	Password        string        `yaml:"password" env:"DB_PASSWORD" default:""`
	SSLMode         string        `yaml:"ssl_mode" env:"DB_SSL_MODE" default:"disable"`
	MaxOpenConns    int           `yaml:"max_open_conns" env:"DB_MAX_OPEN_CONNS" default:"25"`
	MaxIdleConns    int           `yaml:"max_idle_conns" env:"DB_MAX_IDLE_CONNS" default:"5"`
	ConnMaxLifetime time.Duration `yaml:"conn_max_lifetime" env:"DB_CONN_MAX_LIFETIME" default:"1h"`
	ConnMaxIdleTime time.Duration `yaml:"conn_max_idle_time" env:"DB_CONN_MAX_IDLE_TIME" default:"30m"`
}

InitConfig holds database initialization configuration

type Migration

type Migration struct {
	Version   int
	Name      string
	Filename  string
	SQL       string
	AppliedAt *time.Time
	Checksum  string
}

Migration represents a database migration

type Organization

type Organization struct {
	ID          string                 `json:"id"`
	Platform    string                 `json:"platform"`
	Login       string                 `json:"login"`
	Name        *string                `json:"name"`
	Description *string                `json:"description"`
	HTMLURL     *string                `json:"html_url"`
	AvatarURL   *string                `json:"avatar_url"`
	Type        string                 `json:"type"`
	Location    *string                `json:"location"`
	Email       *string                `json:"email"`
	Blog        *string                `json:"blog"`
	Twitter     *string                `json:"twitter"`
	Company     *string                `json:"company"`
	PublicRepos int                    `json:"public_repos"`
	PublicGists int                    `json:"public_gists"`
	Followers   int                    `json:"followers"`
	Following   int                    `json:"following"`
	CreatedAt   time.Time              `json:"created_at"`
	UpdatedAt   time.Time              `json:"updated_at"`
	ScanStatus  string                 `json:"scan_status"`
	LastScanAt  *time.Time             `json:"last_scan_at"`
	Metadata    map[string]interface{} `json:"metadata"`
}

Organization represents an organization record

type Repository

type Repository struct {
	ID            string                 `json:"id"`
	Platform      string                 `json:"platform"`
	OrgID         *string                `json:"org_id"`
	Owner         string                 `json:"owner"`
	Name          string                 `json:"name"`
	FullName      string                 `json:"full_name"`
	Description   *string                `json:"description"`
	HTMLURL       string                 `json:"html_url"`
	CloneURL      string                 `json:"clone_url"`
	SSHURL        string                 `json:"ssh_url"`
	Homepage      *string                `json:"homepage"`
	Language      *string                `json:"language"`
	IsPrivate     bool                   `json:"is_private"`
	IsFork        bool                   `json:"is_fork"`
	IsArchived    bool                   `json:"is_archived"`
	IsDisabled    bool                   `json:"is_disabled"`
	Size          int64                  `json:"size"`
	StarsCount    int                    `json:"stars_count"`
	WatchersCount int                    `json:"watchers_count"`
	ForksCount    int                    `json:"forks_count"`
	IssuesCount   int                    `json:"issues_count"`
	Topics        []string               `json:"topics"`
	Branches      []string               `json:"branches"`
	CreatedAt     time.Time              `json:"created_at"`
	UpdatedAt     time.Time              `json:"updated_at"`
	PushedAt      *time.Time             `json:"pushed_at"`
	ScanStatus    string                 `json:"scan_status"`
	LastScanAt    *time.Time             `json:"last_scan_at"`
	Metadata      map[string]interface{} `json:"metadata"`
}

Repository represents a repository record

type ScanJob

type ScanJob struct {
	ID              string                 `json:"id"`
	OrgID           string                 `json:"org_id"`
	JobType         string                 `json:"job_type"`
	Configuration   map[string]interface{} `json:"configuration"`
	Status          string                 `json:"status"`
	Progress        float64                `json:"progress"`
	StartedAt       *time.Time             `json:"started_at"`
	CompletedAt     *time.Time             `json:"completed_at"`
	EstimatedTime   *time.Duration         `json:"estimated_time"`
	ActualTime      *time.Duration         `json:"actual_time"`
	TotalRepos      int                    `json:"total_repos"`
	ScannedRepos    int                    `json:"scanned_repos"`
	FailedRepos     int                    `json:"failed_repos"`
	TotalThreats    int                    `json:"total_threats"`
	CriticalThreats int                    `json:"critical_threats"`
	HighThreats     int                    `json:"high_threats"`
	MediumThreats   int                    `json:"medium_threats"`
	LowThreats      int                    `json:"low_threats"`
	WorkerID        *string                `json:"worker_id"`
	RetryCount      int                    `json:"retry_count"`
	MaxRetries      int                    `json:"max_retries"`
	ErrorMessage    *string                `json:"error_message"`
	ErrorDetails    map[string]interface{} `json:"error_details"`
	CreatedAt       time.Time              `json:"created_at"`
	UpdatedAt       time.Time              `json:"updated_at"`
	Metadata        map[string]interface{} `json:"metadata"`
}

ScanJob represents a scan job record

type SchemaManager

type SchemaManager struct {
	// contains filtered or unexported fields
}

SchemaManager handles database schema migrations and initialization

func NewSchemaManager

func NewSchemaManager(db *sql.DB, logger *logger.Logger) *SchemaManager

NewSchemaManager creates a new schema manager

func (*SchemaManager) GetMigrationStatus

func (sm *SchemaManager) GetMigrationStatus(ctx context.Context) ([]Migration, error)

GetMigrationStatus returns the status of all migrations

func (*SchemaManager) Initialize

func (sm *SchemaManager) Initialize(ctx context.Context) error

Initialize sets up the database schema and runs migrations

func (*SchemaManager) ValidateSchema

func (sm *SchemaManager) ValidateSchema(ctx context.Context) error

ValidateSchema validates that the current database schema matches expectations

type ThreatDB

type ThreatDB struct {
	// contains filtered or unexported fields
}

ThreatDB represents the SQLite threat database

func NewThreatDB

func NewThreatDB(dbPath string) (*ThreatDB, error)

NewThreatDB creates a new threat database instance

func (*ThreatDB) AddPattern

func (tdb *ThreatDB) AddPattern(pattern *ThreatPattern) error

AddPattern adds a new threat detection pattern

func (*ThreatDB) AddThreat

func (tdb *ThreatDB) AddThreat(threat *ThreatRecord) error

AddThreat adds a new threat record to the database

func (*ThreatDB) Close

func (tdb *ThreatDB) Close() error

Close closes the database connection

func (*ThreatDB) DeleteThreat

func (tdb *ThreatDB) DeleteThreat(packageName, registry string) error

DeleteThreat removes a threat record

func (*ThreatDB) GetPatterns

func (tdb *ThreatDB) GetPatterns(patternType string) ([]*ThreatPattern, error)

GetPatterns retrieves all enabled threat patterns

func (*ThreatDB) GetStats

func (tdb *ThreatDB) GetStats() (map[string]int, error)

GetStats returns database statistics

func (*ThreatDB) GetThreat

func (tdb *ThreatDB) GetThreat(packageName, registry string) (*ThreatRecord, error)

GetThreat retrieves a threat by package name and registry

func (*ThreatDB) GetThreats

func (tdb *ThreatDB) GetThreats(registry, threatType string, limit int) ([]*ThreatRecord, error)

GetThreats retrieves all threats matching the given criteria

type ThreatPattern

type ThreatPattern struct {
	ID          int64     `json:"id"`
	Name        string    `json:"name"`
	Pattern     string    `json:"pattern"`
	PatternType string    `json:"pattern_type"` // regex, exact, fuzzy
	ThreatType  string    `json:"threat_type"`
	Severity    string    `json:"severity"`
	Enabled     bool      `json:"enabled"`
	CreatedAt   time.Time `json:"created_at"`
	UpdatedAt   time.Time `json:"updated_at"`
}

ThreatPattern represents a threat detection pattern

type ThreatRecord

type ThreatRecord struct {
	ID          int64     `json:"id"`
	PackageName string    `json:"package_name"`
	Registry    string    `json:"registry"`
	ThreatType  string    `json:"threat_type"`
	Severity    string    `json:"severity"`
	Confidence  float64   `json:"confidence"`
	Description string    `json:"description"`
	Source      string    `json:"source"`
	CreatedAt   time.Time `json:"created_at"`
	UpdatedAt   time.Time `json:"updated_at"`
	Metadata    string    `json:"metadata"` // JSON string for additional data
}

ThreatRecord represents a threat record in the database

func (*ThreatRecord) ConvertToThreat

func (tr *ThreatRecord) ConvertToThreat() *types.Threat

ConvertToThreat converts a ThreatRecord to types.Threat

type ThreatSource

type ThreatSource struct {
	Name        string    `json:"name"`
	URL         string    `json:"url"`
	APIKey      string    `json:"api_key,omitempty"`
	Enabled     bool      `json:"enabled"`
	LastUpdated time.Time `json:"last_updated"`
}

ThreatSource represents a source of threat intelligence

type ThreatUpdater

type ThreatUpdater struct {
	// contains filtered or unexported fields
}

ThreatUpdater manages threat intelligence updates

func NewThreatUpdater

func NewThreatUpdater(db *ThreatDB, sources []ThreatSource) *ThreatUpdater

NewThreatUpdater creates a new threat updater instance

func (*ThreatUpdater) AddSource

func (tu *ThreatUpdater) AddSource(source ThreatSource)

AddSource adds a new threat intelligence source

func (*ThreatUpdater) EnableSource

func (tu *ThreatUpdater) EnableSource(name string, enabled bool) error

EnableSource enables or disables a threat intelligence source

func (*ThreatUpdater) GetUpdateStatus

func (tu *ThreatUpdater) GetUpdateStatus() []ThreatSource

GetUpdateStatus returns the status of threat intelligence sources

func (*ThreatUpdater) RemoveSource

func (tu *ThreatUpdater) RemoveSource(name string)

RemoveSource removes a threat intelligence source by name

func (*ThreatUpdater) UpdateThreats

func (tu *ThreatUpdater) UpdateThreats(ctx context.Context) error

UpdateThreats fetches and updates threat data from all enabled sources

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL