ingest

package
v0.1.5 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 9, 2026 License: GPL-3.0 Imports: 26 Imported by: 0

Documentation

Overview

Package ingest provides unified ingestion of assets and findings from various formats.

Index

Constants

View Source
const (
	// MaxAssetsPerReport is the maximum number of assets allowed in a single report.
	MaxAssetsPerReport = 100000

	// MaxFindingsPerReport is the maximum number of findings allowed in a single report.
	MaxFindingsPerReport = 100000

	// MaxPropertySize is the maximum size of a single property value in bytes.
	MaxPropertySize = 1024 * 1024 // 1MB

	// MaxPropertiesPerAsset is the maximum number of properties per asset.
	MaxPropertiesPerAsset = 100

	// MaxTagsPerAsset is the maximum number of tags per asset.
	MaxTagsPerAsset = 50

	// MaxErrorsToReturn limits the number of errors returned in the response.
	MaxErrorsToReturn = 100

	// BatchSize for database operations.
	BatchSize = 500

	// UnknownValue is used as a fallback when a required field is empty.
	UnknownValue = "unknown"
)
View Source
const MaxAssetNameLength = 500

MaxAssetNameLength is the maximum allowed length for auto-created asset names.

Variables

This section is empty.

Functions

func ConvertCTISDataFlowToFindingDataFlows

func ConvertCTISDataFlowToFindingDataFlows(
	findingID shared.ID,
	ctisDataFlow *ctis.DataFlow,
) ([]*vulnerability.FindingDataFlow, []*vulnerability.FindingFlowLocation, error)

ConvertCTISDataFlowToFindingDataFlows converts a CTIS DataFlow to domain FindingDataFlow and FindingFlowLocations. This function bridges the gap between the CTIS input schema and the domain entities.

Returns: - flows: slice of FindingDataFlow (usually 1, but could be more for complex flows) - locations: slice of FindingFlowLocation for all flows - error: if validation fails

func ConvertSARIFCodeFlowsToCTISDataFlows

func ConvertSARIFCodeFlowsToCTISDataFlows(codeFlows any) []*ctis.DataFlow

ConvertSARIFCodeFlowsToCTISDataFlows converts SARIF codeFlows to CTIS DataFlow structures. This is useful when ingesting SARIF reports that contain dataflow information.

SARIF codeFlow structure: - codeFlows[].threadFlows[].locations[].location

This function flattens SARIF's nested structure into the simpler CTIS DataFlow format.

func UnmarshalReport

func UnmarshalReport(data []byte) (*ctis.Report, error)

UnmarshalReport parses a CTIS report from JSON bytes.

Types

type AssetProcessor

type AssetProcessor struct {
	// contains filtered or unexported fields
}

AssetProcessor handles batch asset processing.

func NewAssetProcessor

func NewAssetProcessor(repo asset.Repository, log *logger.Logger) *AssetProcessor

NewAssetProcessor creates a new asset processor.

func (*AssetProcessor) ProcessBatch

func (p *AssetProcessor) ProcessBatch(
	ctx context.Context,
	tenantID shared.ID,
	report *ctis.Report,
	output *Output,
) (map[string]shared.ID, error)

ProcessBatch processes all assets using batch operations. Returns a map of asset ID (from CTIS) -> domain asset ID for finding association.

If no explicit assets are provided in the report but findings exist, it will attempt to auto-create an asset using a priority chain:

  1. BranchInfo.RepositoryURL in report metadata (most reliable for CI/CD)
  2. Unique AssetValue from findings (if all findings reference same asset)
  3. Scope.Name from report metadata
  4. Inferred repository from file path patterns (e.g., github.com/org/repo)
  5. Tool+ScanID fallback (ensures findings are never orphaned)

func (*AssetProcessor) SetRelationshipRepository added in v0.1.2

func (p *AssetProcessor) SetRelationshipRepository(repo asset.RelationshipRepository)

SetRelationshipRepository sets the asset relationship repository.

func (*AssetProcessor) SetRepositoryExtensionRepository

func (p *AssetProcessor) SetRepositoryExtensionRepository(repo asset.RepositoryExtensionRepository)

SetRepositoryExtensionRepository sets the repository extension repository.

func (*AssetProcessor) UpdateFindingCounts

func (p *AssetProcessor) UpdateFindingCounts(ctx context.Context, tenantID shared.ID, assetIDs []shared.ID) error

UpdateFindingCounts updates finding counts for processed assets.

type CheckFingerprintsInput

type CheckFingerprintsInput struct {
	Fingerprints []string `json:"fingerprints"`
}

CheckFingerprintsInput is the input for fingerprint checking.

type CheckFingerprintsOutput

type CheckFingerprintsOutput struct {
	Existing []string `json:"existing"`
	Missing  []string `json:"missing"`
}

CheckFingerprintsOutput is the result of fingerprint checking.

type ComponentOutput

type ComponentOutput struct {
	ComponentsCreated  int
	ComponentsUpdated  int
	DependenciesLinked int
	LicensesLinked     int
	Errors             []string
	Warnings           []string
}

ComponentOutput tracks component processing results.

type ComponentProcessor

type ComponentProcessor struct {
	// contains filtered or unexported fields
}

ComponentProcessor handles batch processing of dependencies/components during ingestion.

func NewComponentProcessor

func NewComponentProcessor(repo component.Repository, logger *slog.Logger) *ComponentProcessor

NewComponentProcessor creates a new component processor.

func (*ComponentProcessor) ProcessBatch

func (p *ComponentProcessor) ProcessBatch(
	ctx context.Context,
	tenantID shared.ID,
	report *ctis.Report,
	assetMap map[string]shared.ID,
	output *Output,
) error

ProcessBatch processes all dependencies from a CTIS report. It creates/updates global components and links them to assets. Three-pass approach to handle foreign key constraints: 1. Pass 1: Create all global components and collect their IDs 2. Pass 2: Insert asset_components WITHOUT parent_component_id 3. Pass 3: Update asset_components WITH parent_component_id

type CoverageType

type CoverageType string

CoverageType indicates the scan coverage level.

const (
	// CoverageTypeFull indicates a full scan that covers the entire codebase.
	// Auto-resolve is only enabled for full scans.
	CoverageTypeFull CoverageType = "full"

	// CoverageTypeIncremental indicates an incremental/diff scan covering only changed files.
	// Auto-resolve is disabled for incremental scans to prevent false auto-resolution.
	CoverageTypeIncremental CoverageType = "incremental"

	// CoverageTypePartial indicates a partial scan (e.g., specific directories).
	// Auto-resolve is disabled for partial scans.
	CoverageTypePartial CoverageType = "partial"
)

type FailedFinding

type FailedFinding struct {
	Index       int    `json:"index"`       // Index in the original report
	Fingerprint string `json:"fingerprint"` // Finding fingerprint
	RuleID      string `json:"rule_id"`     // Rule/check ID
	FilePath    string `json:"file_path"`   // File path if available
	Line        int    `json:"line"`        // Line number if available
	Error       string `json:"error"`       // Error message
}

FailedFinding contains details about a finding that failed during ingestion. This provides debugging context for audit logs.

type FindingCreatedCallback

type FindingCreatedCallback func(ctx context.Context, tenantID shared.ID, findings []*vulnerability.Finding)

FindingCreatedCallback is called when findings are created during ingestion.

type FindingProcessor

type FindingProcessor struct {
	// contains filtered or unexported fields
}

FindingProcessor handles batch finding processing.

func NewFindingProcessor

func NewFindingProcessor(repo vulnerability.FindingRepository, branchRepo branch.Repository, assetRepo asset.Repository, log *logger.Logger) *FindingProcessor

NewFindingProcessor creates a new finding processor.

func (*FindingProcessor) CheckFingerprints

func (p *FindingProcessor) CheckFingerprints(
	ctx context.Context,
	tenantID shared.ID,
	fingerprints []string,
) (existing, missing []string, err error)

CheckFingerprints checks which fingerprints already exist in the database.

func (*FindingProcessor) ProcessBatch

func (p *FindingProcessor) ProcessBatch(
	ctx context.Context,
	agt *agent.Agent,
	tenantID shared.ID,
	report *ctis.Report,
	assetMap map[string]shared.ID,
	tenantRules branch.BranchTypeRules,
	output *Output,
) error

ProcessBatch processes all findings using batch operations.

func (*FindingProcessor) SetActivityService added in v0.1.2

func (p *FindingProcessor) SetActivityService(svc activityRecorder)

SetActivityService sets the activity service for recording auto-reopen audit trail.

func (*FindingProcessor) SetComponentRepository

func (p *FindingProcessor) SetComponentRepository(repo component.Repository)

SetComponentRepository sets the component repository for linking findings to components.

func (*FindingProcessor) SetDataFlowRepository

func (p *FindingProcessor) SetDataFlowRepository(repo vulnerability.DataFlowRepository)

SetDataFlowRepository sets the data flow repository for persisting data flow traces.

func (*FindingProcessor) SetFindingCreatedCallback

func (p *FindingProcessor) SetFindingCreatedCallback(callback FindingCreatedCallback)

SetFindingCreatedCallback sets the callback for when findings are created.

type Input

type Input struct {
	Report *ctis.Report

	// CoverageType indicates the scan coverage level.
	// Auto-resolve is only enabled for full scans on default branch.
	// Default is empty, which disables auto-resolve for safety.
	CoverageType CoverageType

	// BranchInfo provides git branch context for branch-aware lifecycle.
	// Auto-resolve only applies when IsDefaultBranch=true and CoverageType=full.
	// If nil, branch info is read from Report.Metadata.Branch.
	BranchInfo *ctis.BranchInfo
}

Input represents the unified input for ingestion. All formats (CTIS, SARIF, Recon, etc.) are converted to this via adapters.

func (Input) GetBranchInfo

func (i Input) GetBranchInfo() *ctis.BranchInfo

GetBranchInfo returns branch info from Input or Report metadata. Input.BranchInfo takes precedence over Report.Metadata.Branch.

func (Input) IsDefaultBranchScan

func (i Input) IsDefaultBranchScan() bool

IsDefaultBranchScan returns true if this is a scan on the default branch.

func (Input) ShouldAutoResolve

func (i Input) ShouldAutoResolve() bool

ShouldAutoResolve returns true if auto-resolve should be enabled for this scan. Conditions: CoverageType=full AND scanning default branch.

type Output

type Output struct {
	ReportID             string   `json:"report_id"`
	AssetsCreated        int      `json:"assets_created"`
	AssetsUpdated        int      `json:"assets_updated"`
	FindingsCreated      int      `json:"findings_created"`
	FindingsUpdated      int      `json:"findings_updated"`
	FindingsSkipped      int      `json:"findings_skipped"`
	FindingsAutoResolved int      `json:"findings_auto_resolved,omitempty"`
	FindingsAutoReopened int      `json:"findings_auto_reopened,omitempty"`
	ComponentsCreated    int      `json:"components_created,omitempty"`
	ComponentsUpdated    int      `json:"components_updated,omitempty"`
	DependenciesLinked   int      `json:"dependencies_linked,omitempty"`
	LicensesDiscovered   int      `json:"licenses_discovered,omitempty"`
	LicensesLinked       int      `json:"licenses_linked,omitempty"`
	Errors               []string `json:"errors,omitempty"`
	Warnings             []string `json:"warnings,omitempty"`

	// FailedFindings contains detailed info about findings that failed to save.
	// This is used for audit logging and debugging purposes.
	FailedFindings []FailedFinding `json:"-"` // Not exposed in API response
}

Output represents the result of ingestion.

type Service

type Service struct {
	// contains filtered or unexported fields
}

Service handles ingestion of assets and findings from various formats. This is the unified service that uses CTIS as the internal format. Supported input formats: CTIS (native), SARIF (via SDK converter), Recon (via SDK converter).

func NewService

func NewService(
	assetRepo asset.Repository,
	findingRepo vulnerability.FindingRepository,
	compRepo component.Repository,
	agentRepo agent.Repository,
	branchRepo branch.Repository,
	tenantRepo tenant.Repository,
	auditRepo audit.Repository,
	log *logger.Logger,
) *Service

NewService creates a new unified ingest service.

func (*Service) CheckFingerprints

func (s *Service) CheckFingerprints(ctx context.Context, agt *agent.Agent, input CheckFingerprintsInput) (*CheckFingerprintsOutput, error)

CheckFingerprints checks which fingerprints already exist in the database.

func (*Service) Ingest

func (s *Service) Ingest(ctx context.Context, agt *agent.Agent, input Input) (*Output, error)

Ingest processes a CTIS report from an agent. This is the main entry point for all ingestion.

func (*Service) IngestRecon

func (s *Service) IngestRecon(ctx context.Context, agt *agent.Agent, reconInput *ctis.ReconToCTISInput) (*Output, error)

IngestRecon processes recon data and ingests it.

func (*Service) IngestSARIF

func (s *Service) IngestSARIF(ctx context.Context, agt *agent.Agent, sarifData []byte) (*Output, error)

IngestSARIF processes a SARIF log and ingests it as findings.

func (*Service) SetActivityService added in v0.1.2

func (s *Service) SetActivityService(activityService *app.FindingActivityService)

SetActivityService sets the finding activity service for audit trail during ingestion.

func (*Service) SetComponentRepository

func (s *Service) SetComponentRepository(repo component.Repository)

SetComponentRepository sets the component repository for linking findings to components.

func (*Service) SetDataFlowRepository

func (s *Service) SetDataFlowRepository(repo vulnerability.DataFlowRepository)

SetDataFlowRepository sets the data flow repository for persisting taint tracking traces.

func (*Service) SetFindingCreatedCallback

func (s *Service) SetFindingCreatedCallback(callback FindingCreatedCallback)

SetFindingCreatedCallback sets the callback for when findings are created. This is used to trigger workflows when new findings are ingested.

func (*Service) SetRelationshipRepository added in v0.1.2

func (s *Service) SetRelationshipRepository(repo asset.RelationshipRepository)

SetRelationshipRepository sets the asset relationship repository for creating subdomain-to-domain relationships during asset ingestion.

func (*Service) SetRepositoryExtensionRepository

func (s *Service) SetRepositoryExtensionRepository(repo asset.RepositoryExtensionRepository)

SetRepositoryExtensionRepository sets the repository extension repository for auto-creating repository extensions with web_url during asset ingestion.

type Validator

type Validator struct {
	// contains filtered or unexported fields
}

Validator validates ingest inputs.

func NewValidator

func NewValidator() *Validator

NewValidator creates a new ingest validator.

func (*Validator) ValidateAssetProperties

func (v *Validator) ValidateAssetProperties(assetType string, properties map[string]any) []string

ValidateAssetProperties validates asset properties and returns warnings.

func (*Validator) ValidatePropertiesCount

func (v *Validator) ValidatePropertiesCount(properties map[string]any) bool

ValidatePropertiesCount checks if the number of properties exceeds the limit.

func (*Validator) ValidatePropertySize

func (v *Validator) ValidatePropertySize(properties map[string]any) (oversizedKey string, size int)

ValidatePropertySize checks if any property value exceeds the maximum allowed size. Returns the key of the first oversized property, or empty string if all are valid.

func (*Validator) ValidateReport

func (v *Validator) ValidateReport(report *ctis.Report) error

ValidateReport validates a CTIS report.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL