vulndb

package
v1.4.2 Latest
Warning

This package is not in the latest version of its module.

Published: May 20, 2026 License: AGPL-3.0 Imports: 56 Imported by: 0

README

vulndb

The vulndb package manages the lifecycle of DevGuard's vulnerability database. It builds the database from upstream sources (OSV, EPSS, CISA KEV, exploit data, and malicious packages), packages it into a distributable archive, and imports it into a PostgreSQL database using an EXCEPT-based sync that is fully idempotent.


Overview

The package has two primary entry points:

  • ExportRC — fetches all upstream vulnerability data, writes it to the database, serializes snapshots to gob files, and packages everything into a vulndb.tar.zst archive that is pushed to an OCI registry.
  • ImportRC — pulls the archive from the OCI registry and imports its contents into a target database, with support for streaming (incremental) and bulk (full) processing modes.

Export (ExportRC)

OSV ingestion

OSV vulnerability data is downloaded in parallel for 13 ecosystems from:

https://storage.googleapis.com/osv-vulnerabilities/{ecosystem}/all.zip

Each zip contains one JSON file per vulnerability entry in OSV format.

Zip contents are processed through a pool of workers. To maximize write throughput, database indexes are dropped before bulk insertion and rebuilt afterwards using staging tables.
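The download URL template and the worker-pool pattern can be sketched as follows. This is a minimal illustration, not the package's implementation: `osvArchiveURL`, `processAll`, the worker count, and the three ecosystem names are assumptions chosen for the example (the README does not list the 13 ecosystems).

```go
package main

import (
	"fmt"
	"sync"
)

// osvArchiveURL fills the {ecosystem} placeholder of the OSV bucket URL.
// Hypothetical helper; ecosystem names containing spaces would need escaping.
func osvArchiveURL(ecosystem string) string {
	return fmt.Sprintf("https://storage.googleapis.com/osv-vulnerabilities/%s/all.zip", ecosystem)
}

// processAll fans the items out to a fixed pool of workers and returns the
// results in input order. Each worker writes only to its own slice index,
// so no extra locking is needed.
func processAll(items []string, workers int, handle func(string) string) []string {
	results := make([]string, len(items))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = handle(items[i])
			}
		}()
	}
	for i := range items {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	ecosystems := []string{"Go", "npm", "PyPI"} // illustrative subset only
	for _, u := range processAll(ecosystems, 2, osvArchiveURL) {
		fmt.Println(u)
	}
}
```

In a real pipeline the handler would download and unpack the archive instead of formatting a string; the pool shape stays the same.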

Additional data sources

After OSV ingestion completes, the following are fetched in parallel:

  • EPSS — exploit prediction scores.
  • CISA KEV — Known Exploited Vulnerabilities catalog.
  • Exploits — exploit metadata from ExploitDB and GitHub. Only exploits referencing a CVE present in the database are retained.

Before computing the integrity checksums, stale exploits (present in the database from a prior export but absent from the current fetch) are deleted so the live table exactly matches the gob.
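The stale-exploit cleanup amounts to a set difference between the IDs already in the table and the IDs in the current fetch. A minimal in-memory sketch, assuming string IDs and a hypothetical `staleIDs` helper (the package itself performs this step in SQL):

```go
package main

import (
	"fmt"
	"sort"
)

// staleIDs returns the IDs that exist in the live table but are missing
// from the current fetch. These are the rows that would be deleted.
func staleIDs(live, fetched []string) []string {
	seen := make(map[string]struct{}, len(fetched))
	for _, id := range fetched {
		seen[id] = struct{}{}
	}
	var stale []string
	for _, id := range live {
		if _, ok := seen[id]; !ok {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale) // deterministic order for logging and tests
	return stale
}

func main() {
	live := []string{"exploit-1", "exploit-2", "exploit-3"} // made-up IDs
	fetched := []string{"exploit-2"}
	fmt.Println(staleIDs(live, fetched))
}
```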

Output artifacts

The following files are written before packaging:

File                   Contents
osv.gob                Serialized OSV vulnerability entries
epss.gob               EPSS scores
cisakev.gob            CISA KEV entries
exploits.gob           Exploit metadata
integrity_checks.json  Per-table row counts and checksums used to validate imports

All artifacts are bundled into vulndb.tar.zst and pushed to an OCI registry.


Import (ImportRC)

Processing modes

Streaming (default)

Reads gob files in batches and streams them to staging tables. After all data is staged, syncAllTables applies an EXCEPT-based diff to the live tables (see below). Uses less memory than bulk mode.
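Batched reading of a gob stream can be sketched as below. The layout is an assumption made for the example: the sketch treats a gob file as a sequence of individually encoded records, and `record`, `encodeRecords`, and `readBatches` are hypothetical names, not part of the package.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"fmt"
	"io"
)

// record is a hypothetical stand-in for one serialized entry.
type record struct {
	ID string
}

// encodeRecords writes each ID as its own gob value into a buffer.
func encodeRecords(ids ...string) *bytes.Buffer {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	for _, id := range ids {
		if err := enc.Encode(record{ID: id}); err != nil {
			panic(err)
		}
	}
	return &buf
}

// readBatches decodes records one at a time and passes them to flush in
// groups of at most batchSize, so only one batch is held in memory.
// The batch slice is reused, so flush must not retain it.
func readBatches(r io.Reader, batchSize int, flush func([]record) error) error {
	dec := gob.NewDecoder(r)
	batch := make([]record, 0, batchSize)
	for {
		var rec record
		err := dec.Decode(&rec)
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return err
		}
		batch = append(batch, rec)
		if len(batch) == batchSize {
			if err := flush(batch); err != nil {
				return err
			}
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		return flush(batch)
	}
	return nil
}

func main() {
	buf := encodeRecords("a", "b", "c", "d", "e")
	err := readBatches(buf, 2, func(b []record) error {
		fmt.Println("staging batch of", len(b))
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```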

Bulk (--bulk)

Loads all gob data into memory at once, then truncates the live tables and does a direct INSERT from staging. Faster for a clean initial import but requires ~2–3 GB of RAM.

EXCEPT-based sync (syncAllTables)

The incremental sync is implemented as a three-step SQL operation per table, handled by the generic syncTable function:

  1. DELETE rows present in the live table but absent from staging (EXCEPT).
  2. INSERT rows present in staging but absent from the live table (EXCEPT).
  3. UPDATE rows where the key exists on both sides but the content_hash (or equivalent change-detection column) differs.

This approach is fully idempotent — running the same import twice produces the same result. There is no dependency on a last-import watermark for correctness.
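The three steps and their idempotency can be illustrated with an in-memory analogue. This is only a sketch: the real sync runs as SQL against staging and live tables, and `row` and `syncInMemory` are hypothetical names introduced here.

```go
package main

import "fmt"

// row is a hypothetical table row: a stable key plus a change-detection hash.
type row struct {
	Key         int64
	ContentHash int64
}

// syncInMemory mutates live so that it matches staging and reports how many
// rows were deleted, inserted, and updated.
func syncInMemory(live, staging map[int64]row) (deleted, inserted, updated int) {
	// Step 1: delete rows that are in live but not in staging.
	for k := range live {
		if _, ok := staging[k]; !ok {
			delete(live, k)
			deleted++
		}
	}
	for k, s := range staging {
		l, ok := live[k]
		switch {
		case !ok:
			// Step 2: insert rows that are in staging but not in live.
			live[k] = s
			inserted++
		case l.ContentHash != s.ContentHash:
			// Step 3: update rows whose key matches but whose hash differs.
			live[k] = s
			updated++
		}
	}
	return deleted, inserted, updated
}

func main() {
	live := map[int64]row{1: {1, 100}, 2: {2, 200}, 3: {3, 300}}
	staging := map[int64]row{2: {2, 200}, 3: {3, 999}, 4: {4, 400}}

	fmt.Println(syncInMemory(live, staging)) // first run applies the diff
	fmt.Println(syncInMemory(live, staging)) // second run finds nothing to do
}
```

Because every step is driven by the difference between the two sides rather than by a timestamp, a second run against the same staging data changes nothing.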

CVE change detection

CVEs use a stable primary key (id = hash(cve_string)) for FK stability, plus a separate content_hash column that covers the OSV-sourced fields (description, cvss, vector). EPSS and CISA KEV are intentionally excluded from content_hash — they are applied as separate UPDATE steps after the sync and their changes do not trigger a delete+reinsert of the CVE or its related rows.
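The split between a stable key and a content hash can be sketched as follows. The hash function is an assumption: FNV-1a is used here purely for illustration, since the README does not say which hash the package uses, and `cve`, `cveID`, and `contentHash` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hash64 hashes the parts with FNV-1a, separating them with a zero byte so
// that ("ab", "c") and ("a", "bc") produce different inputs.
func hash64(parts ...string) int64 {
	h := fnv.New64a()
	for _, p := range parts {
		h.Write([]byte(p))
		h.Write([]byte{0})
	}
	return int64(h.Sum64())
}

// cve is a trimmed-down, hypothetical record.
type cve struct {
	CVE         string
	Description string
	CVSS        float64
	Vector      string
	EPSS        *float64 // enrichment data, deliberately not hashed
}

// cveID depends only on the CVE string, so foreign keys stay valid when
// the content changes.
func cveID(c cve) int64 { return hash64(c.CVE) }

// contentHash covers only the OSV-sourced fields.
func contentHash(c cve) int64 {
	return hash64(c.Description, fmt.Sprintf("%.1f", c.CVSS), c.Vector)
}

func main() {
	score := 0.42
	a := cve{CVE: "CVE-0000-0001", Description: "old text", CVSS: 7.5, Vector: "AV:N"}
	b := a
	b.EPSS = &score // enrichment change only
	c := a
	c.Description = "new text" // OSV content change

	fmt.Println("same id:", cveID(a) == cveID(c))
	fmt.Println("EPSS change alters hash:", contentHash(a) != contentHash(b))
	fmt.Println("description change alters hash:", contentHash(a) != contentHash(c))
}
```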

EPSS and CISA KEV enrichment

After syncAllTables completes, EPSS scores and CISA KEV metadata are applied directly to the live cves table via bulk UPDATE. Before applying CISA KEV data, all CISA-related fields are reset to NULL so that CVEs removed from the KEV catalog do not retain stale metadata.
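The reset-then-apply order matters for CVEs that have left the KEV catalog. A minimal in-memory sketch, assuming a hypothetical `cveRow` with a single CISA-related field (the package does this with bulk UPDATE statements on the cves table):

```go
package main

import "fmt"

// cveRow is a hypothetical, trimmed-down stand-in for a row in the cves table.
type cveRow struct {
	EPSS           *float64
	RequiredAction *string // stands in for all CISA-related fields
}

// applyEnrichment writes EPSS scores, clears the CISA field on every row,
// and then applies the current KEV catalog. Rows no longer in the catalog
// end up with a nil field instead of stale metadata.
func applyEnrichment(rows map[string]*cveRow, epss map[string]float64, kev map[string]string) {
	for id, score := range epss {
		if r, ok := rows[id]; ok {
			s := score
			r.EPSS = &s
		}
	}
	for _, r := range rows {
		r.RequiredAction = nil
	}
	for id, action := range kev {
		if r, ok := rows[id]; ok {
			a := action
			r.RequiredAction = &a
		}
	}
}

func main() {
	old := "apply vendor patch"
	rows := map[string]*cveRow{
		"CVE-A": {RequiredAction: &old}, // was in KEV during the last import
		"CVE-B": {},
	}
	applyEnrichment(rows,
		map[string]float64{"CVE-A": 0.5},
		map[string]string{"CVE-B": "update"}, // CVE-A has left the catalog
	)
	fmt.Println(rows["CVE-A"].RequiredAction == nil, *rows["CVE-B"].RequiredAction)
}
```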


Integrity Checks

After every import, per-table row counts and checksums are computed and compared against the values in integrity_checks.json (written during export). A mismatch causes the import transaction to be rolled back and an error to be returned. Because the sync is deterministic, there is no retry — a mismatch indicates a real inconsistency between the gob data and the integrity file.
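The comparison itself can be sketched as below. `tableIntegrity` mirrors the fields of the documented TableIntegrityInformation type, while `failingTables` is a hypothetical helper written for this example; it is not the package's ValidateIntegrityInformation.

```go
package main

import (
	"bytes"
	"fmt"
)

// tableIntegrity mirrors the row count and checksum recorded per table.
type tableIntegrity struct {
	TableName  string
	Checksum   []byte
	TotalCount int
}

// failingTables returns the names of expected tables that are missing from
// actual or whose row count or checksum differs. A nil result means every
// table matched.
func failingTables(expected, actual []tableIntegrity) []string {
	got := make(map[string]tableIntegrity, len(actual))
	for _, a := range actual {
		got[a.TableName] = a
	}
	var failing []string
	for _, e := range expected {
		a, ok := got[e.TableName]
		if !ok || a.TotalCount != e.TotalCount || !bytes.Equal(a.Checksum, e.Checksum) {
			failing = append(failing, e.TableName)
		}
	}
	return failing
}

func main() {
	expected := []tableIntegrity{
		{TableName: "cves", Checksum: []byte{1, 2}, TotalCount: 10},
		{TableName: "exploits", Checksum: []byte{3}, TotalCount: 4},
	}
	actual := []tableIntegrity{
		{TableName: "cves", Checksum: []byte{1, 2}, TotalCount: 10},
		{TableName: "exploits", Checksum: []byte{9}, TotalCount: 4},
	}
	if failing := failingTables(expected, actual); failing != nil {
		// In the import this is the point where the transaction is rolled back.
		fmt.Println("integrity mismatch in:", failing)
	}
}
```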


Affected Component Deduplication

During streaming, each batch transformer shares a componentToCVEs map (affectedComponentID → []cveID) across calls. This ensures that:

  • Each unique affected_components row is only staged once across all batches.
  • Each cve_affected_component pivot row is only staged once even if the same CVE→component relationship appears in multiple OSV entries.
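A shared map of this shape is enough to answer both questions per incoming pair. The following is a minimal sketch, assuming int64 IDs; `deduper` and `stage` are hypothetical names, and only the map name comes from the description above.

```go
package main

import "fmt"

// deduper carries the shared state across batches.
type deduper struct {
	componentToCVEs map[int64][]int64 // affectedComponentID -> []cveID
}

// stage records a CVE -> component relationship and reports whether the
// component row and the pivot row still need to be staged.
func (d *deduper) stage(componentID, cveID int64) (newComponent, newPivot bool) {
	cves, seen := d.componentToCVEs[componentID]
	for _, id := range cves {
		if id == cveID {
			return false, false // relationship was already staged
		}
	}
	d.componentToCVEs[componentID] = append(cves, cveID)
	return !seen, true
}

func main() {
	d := &deduper{componentToCVEs: make(map[int64][]int64)}
	fmt.Println(d.stage(1, 10)) // first sighting of component 1
	fmt.Println(d.stage(1, 11)) // known component, new relationship
	fmt.Println(d.stage(1, 10)) // duplicate from a later OSV entry
}
```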

Documentation


Constants

const (
	RequirementsLevelHigh = "High"
)

Variables

var CisaKEVURL = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"
var EpssURL = "https://epss.cyentia.com/epss_scores-current.csv.gz"
var Module = fx.Module("vulndb",
	fx.Provide(provideMaliciousPackageChecker),
	fx.Provide(fx.Annotate(NewVulnDBService, fx.As(new(shared.VulnDBService)))),
)

Functions

func AddIndexesAndConstraints added in v1.4.0

func AddIndexesAndConstraints(ctx context.Context, tx pgx.Tx) error

func ApplyQuickDiff added in v1.4.0

func ApplyQuickDiff(ctx context.Context, tx pgx.Tx, diff *QuickDiff) error

ApplyQuickDiff applies a pre-computed diff directly to the live tables without any staging tables or EXCEPT queries. EPSS and CISA KEV are still applied separately.

func CreateStagingTables added in v1.4.0

func CreateStagingTables(ctx context.Context, tx pgx.Tx) error

func FlushOSVStagingTables added in v1.4.0

func FlushOSVStagingTables(ctx context.Context, tx pgx.Tx) error

FlushOSVStagingTables is kept for the bulk import path which truncates live tables and then does a simple INSERT from staging (no EXCEPT diff needed on an empty table).

func InsertCISAKEVBulk added in v1.4.0

func InsertCISAKEVBulk(ctx context.Context, tx pgx.Tx, entries []CISAKEVEntry) error

func InsertCVERelationshipsBulk added in v1.4.0

func InsertCVERelationshipsBulk(ctx context.Context, tx pgx.Tx, cveRelationships []models.CVERelationship, table string) error

InsertCVERelationshipsBulk streams CVE relationships into the staging table. Call flushStagingTables once after all batches.

func InsertCVEsBulk added in v1.4.0

func InsertCVEsBulk(ctx context.Context, tx pgx.Tx, cves []models.CVE, table string) error

InsertCVEsBulk streams CVEs into the staging table. Call flushStagingTables once after all batches.

func InsertEPSSBulk added in v1.4.0

func InsertEPSSBulk(ctx context.Context, tx pgx.Tx, epssData map[string]dtos.EPSS) error

func NewCISAKEVService

func NewCISAKEVService(cveRepository shared.CveRepository, cveRelationshipRepository shared.CVERelationshipRepository) cisaKEVService

func NewEPSSService

func NewEPSSService(cveRepository shared.CveRepository, cveRelationshipRepository shared.CVERelationshipRepository) epssService

func NewExploitDBService

func NewExploitDBService(exploitRepository shared.ExploitRepository) exploitDBService

func NewGithubExploitDBService

func NewGithubExploitDBService(exploitRepository shared.ExploitRepository) *githubExploitDBService

func NewOSVService

func NewOSVService(affectedCmpRepository shared.AffectedComponentRepository, cveRepository shared.CveRepository, cveRelationshipRepository shared.CVERelationshipRepository, pool *pgxpool.Pool) osvService

func PrepareBulkInsert added in v1.4.0

func PrepareBulkInsert(ctx context.Context, tx pgx.Tx) error

PrepareBulkInsert drops indexes and constraints ahead of a large insert. When many entries are inserted, rebuilding indexes and constraints afterwards is faster than maintaining them on each insert. It also sets session parameters optimized for bulk inserts.

func RawRisk

func RawRisk(cve *models.CVE, env shared.Environmental, affectedComponentDepth int) dtos.RiskCalculationReport

func RiskCalculation

func RiskCalculation(cve *models.CVE, env shared.Environmental) (dtos.RiskMetrics, string)

func RiskToColor

func RiskToColor(risk float64) string

RiskToColor returns the color as a hex string without the leading "#".

func RiskToSeverity

func RiskToSeverity(risk float64) (string, error)

func SnapshotPrevState added in v1.4.0

func SnapshotPrevState(ctx context.Context, tx pgx.Tx) error

SnapshotPrevState creates lightweight temp tables capturing the current DB state before the export truncates and reloads everything. Call this inside the export transaction before any TRUNCATE.

func SyncAllTables added in v1.4.0

func SyncAllTables(ctx context.Context, tx pgx.Tx) error

SyncAllTables syncs every staging table into its live counterpart using EXCEPT-based set operations. It replaces the old flush functions and makes every import fully idempotent regardless of import history.

func TruncateCVERelatedTables added in v1.4.0

func TruncateCVERelatedTables(ctx context.Context, tx pgx.Tx) error

func ValidateIntegrityInformation added in v1.4.0

func ValidateIntegrityInformation(workingDir string, groundTruth IntegrityInformation, localIntegrityInformation []TableIntegrityInformation) ([]string, bool)

ValidateIntegrityInformation returns a string slice containing the failing tables. If the slice is nil, all tables are valid.

Types

type CISAKEVEntry added in v1.4.0

type CISAKEVEntry struct {
	CVE               string
	ExploitAddDate    *time.Time
	ActionDueDate     *time.Time
	RequiredAction    string
	VulnerabilityName string
}

CISAKEVEntry is the gob-safe representation of a CISA KEV record. Dates are stored as *time.Time to avoid the datatypes.Date gob limitation.

type Explanation

type Explanation struct {
	dtos.RiskMetrics

	ExploitMessage struct {
		Short string
		Long  string
	}
	EPSSMessage           string
	CVSSBEMessage         string
	ComponentDepthMessage string
	CVSSMessage           string
	DependencyVulnID      uuid.UUID
	Risk                  float64

	Depth int
	EPSS  float64

	CVEID          string
	CVEDescription string

	ComponentPurl string
	ArtifactNames string
	FixedVersion  *string

	ShortenedComponentPurl string `json:"componentPurl" gorm:"type:text;default:null;"`
}

func Explain

func Explain(dependencyVuln models.DependencyVuln, asset models.Asset, vector string, riskMetrics dtos.RiskMetrics) Explanation

Explain expects the vector and risk metrics obtained from the risk calculation.

func (Explanation) GenerateCommandsToFixPackage

func (e Explanation) GenerateCommandsToFixPackage() string

func (Explanation) Markdown

func (e Explanation) Markdown(baseURL, orgSlug, projectSlug, assetSlug, assetVersionSlug string, mermaidPathToComponent string) string

type GithubExploitDTO

type GithubExploitDTO struct {
	ID          int        `json:"id"`
	Owner       Owner      `json:"owner"`
	HTMLURL     string     `json:"html_url"`
	Description string     `json:"description"`
	Published   *time.Time `json:"pushed_at"`
	Updated     *time.Time `json:"updated_at"`
	Subscribers int        `json:"subscribers_count"`
	Watchers    int        `json:"watchers_count"`
	Stars       int        `json:"stargazers_count"`
	Forks       int        `json:"forks_count"`
}

type GobExploit added in v1.4.0

type GobExploit struct {
	ID          string
	ContentHash int64
	Published   *time.Time
	Updated     *time.Time
	Author      string
	Type        string
	Verified    bool
	SourceURL   string
	Description string
	CVEID       string
	Tags        string
	Forks       int
	Watchers    int
	Subscribers int
	Stars       int
}

GobExploit is the gob-safe representation of models.Exploit. It omits the nested CVE field which contains datatypes.Date.

type GobMaliciousComponent added in v1.4.0

type GobMaliciousComponent struct {
	ID                 string
	MaliciousPackageID string
	PurlWithoutVersion string
	Ecosystem          string
	Version            *string
	SemverIntroduced   *string
	SemverFixed        *string
	VersionIntroduced  *string
	VersionFixed       *string
}

GobMaliciousComponent is the gob-safe representation of models.MaliciousAffectedComponent.

type GobMaliciousPackagesExport added in v1.4.0

type GobMaliciousPackagesExport struct {
	Package    models.MaliciousPackage
	Components []GobMaliciousComponent
}

GobMaliciousPackagesExport bundles the full malicious-packages snapshot. models.MaliciousPackage only contains plain types and is gob-safe directly.

type IntegrityInformation added in v1.4.0

type IntegrityInformation struct {
	TableIntegrity   []TableIntegrityInformation `json:"table_integrity"`
	ImportTimestamp  time.Time                   `json:"import_timestamp"`
	ArtifactChecksum string                      `json:"artifact_checksum,omitempty"`
}

type MaliciousPackageChecker

type MaliciousPackageChecker struct {
	// contains filtered or unexported fields
}

MaliciousPackageChecker checks packages against the malicious package database

func NewMaliciousPackageChecker

func NewMaliciousPackageChecker(
	repository *repositories.MaliciousPackageRepository,
) (*MaliciousPackageChecker, error)

func (*MaliciousPackageChecker) IsMalicious

func (c *MaliciousPackageChecker) IsMalicious(ctx context.Context, ecosystem, packageName, version string) (bool, *dtos.OSV, error)

type OSVEntry added in v1.4.0

type OSVEntry struct {
	OSV               *dtos.OSV
	ModifiedTimestamp time.Time
}

type Owner

type Owner struct {
	Login string `json:"login"`
}

type QuickDiff added in v1.4.0

type QuickDiff struct {
	FromVersion time.Time

	CVEsDeleted  []int64
	CVEsInserted []quickDiffCVE // rows whose id is new
	CVEsUpdated  []quickDiffCVE // rows whose content_hash changed

	RelationshipsDeleted  []quickDiffRelKey
	RelationshipsInserted []quickDiffRelKey

	AffectedComponentsDeleted  []int64
	AffectedComponentsInserted []quickDiffAC

	PivotDeleted  []quickDiffPivot
	PivotInserted []quickDiffPivot

	ExploitsDeleted  []string
	ExploitsInserted []GobExploit
	ExploitsUpdated  []GobExploit

	MalPkgsDeleted  []string
	MalPkgsInserted []models.MaliciousPackage
	MalPkgsUpdated  []models.MaliciousPackage

	MalCompsDeleted  []string
	MalCompsInserted []GobMaliciousComponent
}

QuickDiff is a pre-computed incremental patch from one vulndb version to the next. When the importer's current DB version matches FromVersion the patch can be applied directly — no staging tables, no EXCEPT queries over millions of rows.

func ComputeQuickDiff added in v1.4.0

func ComputeQuickDiff(ctx context.Context, tx pgx.Tx, fromVersion time.Time) (*QuickDiff, error)

ComputeQuickDiff runs SQL diffs between the snapshot (prev state) and the current live tables (new state) and collects the results into a QuickDiff. Call this after the new data has been fully loaded into the live tables and EPSS/CISA applied.

type TableIntegrityInformation added in v1.4.0

type TableIntegrityInformation struct {
	TableName  string `json:"table_name"`
	Checksum   []byte `json:"checksum"`
	TotalCount int    `json:"total_count"`
}

func CalculateTotalIntegrityInformation added in v1.4.0

func CalculateTotalIntegrityInformation(ctx context.Context, tx pgx.Tx) ([]TableIntegrityInformation, error)

type VulnDBService added in v1.4.0

type VulnDBService struct {
	// contains filtered or unexported fields
}

VulnDBService orchestrates the full vulnerability database export and import, covering OSV, EPSS, CISA KEV, exploits (ExploitDB + GitHub PoC), and malicious packages.

func NewVulnDBService added in v1.4.0

func NewVulnDBService(
	cveRepository shared.CveRepository,
	cveRelationshipRepository shared.CVERelationshipRepository,
	affectedCmpRepository shared.AffectedComponentRepository,
	exploitRepository shared.ExploitRepository,
	maliciousPackageChecker *MaliciousPackageChecker,
	configService shared.ConfigService,
	pool *pgxpool.Pool,
) *VulnDBService

func (*VulnDBService) ExportRC added in v1.4.0

func (s *VulnDBService) ExportRC(ctx context.Context) error

ExportRC fetches all vulnerability data sources, writes gob files for each, populates the database, and writes a full integrity_checks.json.

func (*VulnDBService) ExportRCWithDiff added in v1.4.0

func (s *VulnDBService) ExportRCWithDiff(ctx context.Context, localArchive bool) error

ExportRCWithDiff is like ExportRC but also computes a QuickDiff against the current DB state and writes it as quickdiff.gob into the archive. Importers on exactly the previous version can skip staging tables and apply the patch directly. It first imports the current artifact to establish a known baseline in the DB, then exports fresh data and computes the diff — making it self-contained in CI.

func (*VulnDBService) ImportRC added in v1.4.0

func (s *VulnDBService) ImportRC(ctx context.Context, opts shared.ImportOptions) (err error)

ImportRC pulls the latest vulndb artifact from the OCI registry and applies all data sources (OSV, CISA KEV, exploits, malicious packages) to the database. If the integrity check fails after an incremental import, it alerts and retries as a full import (ignoring the last-import watermark).
