Published: Mar 6, 2026 License: MIT

README

Importer Package

The importer package provides functionality for importing job data into the ClusterCockpit database from archived job files.

Overview

This package supports two primary import workflows:

  1. Bulk Database Initialization - Reinitialize the entire job database from archived jobs
  2. Individual Job Import - Import specific jobs from metadata/data file pairs

Both workflows enrich job metadata by calculating performance footprints and energy consumption metrics before persisting to the database.

Main Entry Points

InitDB()

Reinitializes the job database from all archived jobs.

if err := importer.InitDB(); err != nil {
    log.Fatal(err)
}

This function:

  • Flushes existing job, tag, and jobtag tables
  • Iterates through all jobs in the configured archive
  • Enriches each job with calculated metrics
  • Inserts jobs into the database in batched transactions (100 jobs per batch)
  • Continues on individual job failures, logging errors

Use Case: Initial database setup or complete database rebuild from archive.
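The batched-insert pattern described above can be sketched as follows. This is an illustrative stand-in, not the package's internal code: `splitBatches` is a hypothetical helper, and the real InitDB commits one SQL transaction per batch inline.

```go
package main

import "fmt"

// splitBatches groups jobs into fixed-size batches, mirroring how
// InitDB commits one database transaction per 100 jobs. The helper
// name is illustrative; the real package batches inline.
func splitBatches(jobs []string, size int) [][]string {
	var batches [][]string
	for len(jobs) > 0 {
		n := size
		if len(jobs) < n {
			n = len(jobs)
		}
		batches = append(batches, jobs[:n])
		jobs = jobs[n:]
	}
	return batches
}

func main() {
	// 250 placeholder jobs standing in for archived jobs.
	jobs := make([]string, 250)
	for i := range jobs {
		jobs[i] = fmt.Sprintf("job-%d", i)
	}
	for _, b := range splitBatches(jobs, 100) {
		fmt.Printf("committed %d jobs\n", len(b)) // one transaction per batch
	}
}
```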

HandleImportFlag(flag string)

Imports jobs from specified file pairs.

// Format: "<meta.json>:<data.json>[,<meta2.json>:<data2.json>,...]"
flag := "/path/to/meta.json:/path/to/data.json"
if err := importer.HandleImportFlag(flag); err != nil {
    log.Fatal(err)
}

This function:

  • Parses the comma-separated file pairs
  • Validates metadata and job data against schemas (if validation enabled)
  • Enriches each job with footprints and energy metrics
  • Imports jobs into both the archive and database
  • Fails fast on the first error

Use Case: Importing specific jobs from external sources or manual job additions.
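The pair syntax accepted by the flag can be parsed as sketched below. `parseImportFlag` is a hypothetical helper illustrating the documented format, not the function the package actually uses internally.

```go
package main

import (
	"fmt"
	"strings"
)

// parseImportFlag splits the comma-separated flag value into
// (meta, data) file pairs, following the documented format
// "<meta.json>:<data.json>[,<meta2.json>:<data2.json>,...]".
func parseImportFlag(flag string) ([][2]string, error) {
	var pairs [][2]string
	for _, raw := range strings.Split(flag, ",") {
		files := strings.Split(raw, ":")
		if len(files) != 2 {
			return nil, fmt.Errorf("invalid pair %q: expected <meta.json>:<data.json>", raw)
		}
		pairs = append(pairs, [2]string{files[0], files[1]})
	}
	return pairs, nil
}

func main() {
	pairs, err := parseImportFlag("/a/meta.json:/a/data.json,/b/meta.json:/b/data.json")
	if err != nil {
		panic(err)
	}
	for _, p := range pairs {
		fmt.Println("meta:", p[0], "data:", p[1])
	}
}
```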

Job Enrichment

Both import workflows use enrichJobMetadata() to calculate:

Performance Footprints

Performance footprints are calculated from metric averages based on the subcluster configuration:

job.Footprint["mem_used_avg"] = 45.2  // GB
job.Footprint["cpu_load_avg"] = 0.87   // percentage
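A footprint average can be sketched as a simple reduction over a metric's samples. This is a minimal illustration of the idea; the actual enrichment derives statistics according to the subcluster configuration, and `footprintAvg` is a hypothetical helper.

```go
package main

import "fmt"

// footprintAvg reduces a metric's time-series samples to their
// average, the kind of value stored in job.Footprint. Illustrative
// only: real enrichment follows the subcluster configuration.
func footprintAvg(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range samples {
		sum += v
	}
	return sum / float64(len(samples))
}

func main() {
	memUsed := []float64{42.0, 44.8, 48.8} // GB samples over the job's runtime
	fmt.Printf("mem_used_avg = %.1f GB\n", footprintAvg(memUsed))
}
```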

Energy Metrics

Energy consumption is calculated from power metrics using the formula:

Energy (kWh) = (Power (W) × Duration (s) / 3600) / 1000

For each energy metric:

job.EnergyFootprint["acc_power"] = 12.5  // kWh
job.Energy = 150.2  // Total energy in kWh

Note: Energy calculations for metrics with unit "energy" (Joules) are not yet implemented.
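The formula above can be applied directly; for instance, a 250 W power draw sustained for two hours yields 0.5 kWh. A minimal sketch (`energyKWh` is an illustrative helper, not a package function):

```go
package main

import "fmt"

// energyKWh applies the documented formula:
//   Energy (kWh) = (Power (W) × Duration (s) / 3600) / 1000
func energyKWh(powerW, durationS float64) float64 {
	return powerW * durationS / 3600 / 1000
}

func main() {
	// A 250 W accelerator running for 2 hours (7200 s):
	fmt.Printf("%.1f kWh\n", energyKWh(250, 7200)) // prints "0.5 kWh"
}
```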

Data Validation

SanityChecks(job *schema.Job)

Validates job metadata before database insertion:

  • Cluster exists in configuration
  • Subcluster is valid (assigns if needed)
  • Job state is valid
  • Resources and user fields are populated
  • Node counts and hardware thread counts are positive
  • Resource count matches declared node count
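The shape of these validations can be sketched as below. The `job` struct and `sanityCheck` function are pared-down stand-ins for `schema.Job` and `SanityChecks`; the real function also consults the archive's cluster configuration and may assign the subcluster.

```go
package main

import (
	"errors"
	"fmt"
)

// job is a pared-down stand-in for schema.Job, carrying only the
// fields the checks below inspect.
type job struct {
	Cluster   string
	User      string
	NumNodes  int
	Resources []string // one entry per allocated node
}

// sanityCheck mirrors the kinds of validations SanityChecks performs.
func sanityCheck(j *job, knownClusters map[string]bool) error {
	switch {
	case !knownClusters[j.Cluster]:
		return fmt.Errorf("unknown cluster %q", j.Cluster)
	case j.User == "":
		return errors.New("user field is empty")
	case j.NumNodes <= 0:
		return errors.New("node count must be positive")
	case len(j.Resources) != j.NumNodes:
		return fmt.Errorf("resource count %d does not match node count %d",
			len(j.Resources), j.NumNodes)
	}
	return nil
}

func main() {
	clusters := map[string]bool{"clusterA": true}
	j := &job{Cluster: "clusterA", User: "hpcuser", NumNodes: 2, Resources: []string{"n1", "n2"}}
	fmt.Println("valid:", sanityCheck(j, clusters) == nil)
}
```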

Normalization Utilities

The package includes utilities for normalizing metric values to appropriate SI prefixes:

Normalize(avg float64, prefix string)

Adjusts values and SI prefixes for readability:

factor, newPrefix := importer.Normalize(2048.0, "M")  
// Converts 2048 MB → ~2.0 GB
// Returns: factor for conversion, "G"

This is useful for automatically scaling metrics (e.g., memory, storage) to human-readable units.
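The scaling idea behind Normalize can be sketched as follows, assuming decimal (×1000) prefix steps; the real implementation works on cc-lib's unit types, so treat this as an illustration of the contract, not the actual code.

```go
package main

import "fmt"

// normalize shifts a value into the 1–1000 range and returns the
// multiplicative conversion factor plus the new SI prefix, mirroring
// the contract of importer.Normalize. Illustrative sketch only.
func normalize(avg float64, prefix string) (float64, string) {
	prefixes := []string{"", "K", "M", "G", "T"}
	idx := 0
	for i, p := range prefixes {
		if p == prefix {
			idx = i
		}
	}
	factor := 1.0
	for avg >= 1000 && idx < len(prefixes)-1 {
		avg /= 1000
		factor /= 1000
		idx++ // step up to the next larger prefix
	}
	for avg < 1 && avg > 0 && idx > 0 {
		avg *= 1000
		factor *= 1000
		idx-- // step down to the next smaller prefix
	}
	return factor, prefixes[idx]
}

func main() {
	factor, newPrefix := normalize(2048.0, "M")
	fmt.Printf("factor=%v prefix=%s scaled=%.3f\n", factor, newPrefix, 2048.0*factor)
}
```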

Dependencies

  • github.com/ClusterCockpit/cc-backend/internal/repository - Database operations
  • github.com/ClusterCockpit/cc-backend/pkg/archive - Job archive access
  • github.com/ClusterCockpit/cc-lib/schema - Job schema definitions
  • github.com/ClusterCockpit/cc-lib/ccLogger - Logging
  • github.com/ClusterCockpit/cc-lib/ccUnits - SI unit handling

Error Handling

  • InitDB: Continues processing on individual job failures, logs errors, returns summary
  • HandleImportFlag: Fails fast on first error, returns immediately
  • Both functions log detailed error context for debugging

Performance

  • Transaction Batching: InitDB processes jobs in batches of 100 for optimal database performance
  • Tag Caching: Tag IDs are cached during import to minimize database queries
  • Progress Reporting: InitDB prints progress updates during bulk operations
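The tag-caching optimization can be sketched as a map from tag identity to database ID, so each unique tag costs at most one database round-trip. `tagCache` and its fields are illustrative names, not the package's actual types.

```go
package main

import "fmt"

// tagCache sketches the tag-ID caching described above: each unique
// (type, name) tag hits the database once, then is served from the
// in-memory map. lookupCount stands in for actual SQL round-trips.
type tagCache struct {
	ids         map[string]int64
	next        int64
	lookupCount int
}

func newTagCache() *tagCache {
	return &tagCache{ids: map[string]int64{}}
}

func (c *tagCache) id(tagType, tagName string) int64 {
	key := tagType + ":" + tagName
	if id, ok := c.ids[key]; ok {
		return id // cache hit: no database query
	}
	c.lookupCount++ // cache miss: would query/insert the tag row here
	c.next++
	c.ids[key] = c.next
	return c.next
}

func main() {
	c := newTagCache()
	for i := 0; i < 1000; i++ {
		c.id("jobclass", "shared") // same tag attached to many jobs
	}
	fmt.Println("database lookups:", c.lookupCount) // prints "database lookups: 1"
}
```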

Documentation

Overview

Package importer provides functionality for importing job data into the ClusterCockpit database.

The package supports two primary use cases:

  1. Bulk database initialization from archived jobs via InitDB()
  2. Individual job import from file pairs via HandleImportFlag()

Both operations enrich job metadata by calculating footprints and energy metrics before persisting to the database.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func HandleImportFlag

func HandleImportFlag(flag string) error

HandleImportFlag imports jobs from file pairs specified in a comma-separated flag string.

The flag format is: "<path-to-meta.json>:<path-to-data.json>[,<path-to-meta2.json>:<path-to-data2.json>,...]"

For each job pair, this function:

  1. Reads and validates the metadata JSON file (schema.Job)
  2. Reads and validates the job data JSON file (schema.JobData)
  3. Enriches the job with calculated footprints and energy metrics
  4. Validates the job using SanityChecks()
  5. Imports the job into the archive
  6. Inserts the job into the database with associated tags

Schema validation is performed if config.Keys.Validate is true.

Returns an error if file reading, validation, enrichment, or database operations fail. The function stops processing on the first error encountered.

func InitDB

func InitDB() error

InitDB reinitializes the job database from archived job data.

This function performs the following operations:

  1. Flushes existing job, tag, and jobtag tables
  2. Iterates through all jobs in the archive
  3. Enriches each job with calculated footprints and energy metrics
  4. Inserts jobs and tags into the database in batched transactions

Jobs are processed in batches of 100 for optimal performance. The function continues processing even if individual jobs fail, logging errors and returning a summary at the end.

Returns an error if database initialization, transaction management, or critical operations fail. Individual job failures are logged but do not stop the overall import process.

func Normalize

func Normalize(avg float64, p string) (float64, string)

Normalize adjusts a metric value and its SI unit prefix to a more readable range.

This function is useful for automatically scaling metrics to appropriate units. For example, normalizing 2048 MB might result in ~2.0 GB.

The function analyzes the average value and determines if a different SI prefix would make the number more human-readable (typically keeping values between 1 and 1000).

Parameters:

  • avg: The metric value to normalize
  • p: The current SI prefix as a string (e.g., "K", "M", "G")

Returns:

  • factor: The multiplicative factor to apply to convert the value
  • newPrefix: The new SI prefix string to use

Example:

factor, newPrefix := Normalize(2048.0, "M")  // returns factor for MB->GB conversion, "G"

func SanityChecks

func SanityChecks(job *schema.Job) error

SanityChecks validates job metadata and ensures cluster/subcluster configuration is valid.

This function performs the following validations:

  1. Verifies the cluster exists in the archive configuration
  2. Assigns and validates the subcluster (may modify job.SubCluster)
  3. Validates job state is a recognized value
  4. Ensures resources and user fields are populated
  5. Validates node counts and hardware thread counts are positive
  6. Verifies the number of resources matches the declared node count

The function may modify the job's SubCluster field if it needs to be assigned.

Returns an error if any validation check fails.

Types

This section is empty.
