Published: Mar 6, 2026 License: MIT

README

Importer Package

The importer package provides functionality for importing job data into the ClusterCockpit database from archived job files.

Overview

This package supports two primary import workflows:

  1. Bulk Database Initialization - Reinitialize the entire job database from archived jobs
  2. Individual Job Import - Import specific jobs from metadata/data file pairs

Both workflows enrich job metadata by calculating performance footprints and energy consumption metrics before persisting to the database.

Main Entry Points

InitDB()

Reinitializes the job database from all archived jobs.

if err := importer.InitDB(); err != nil {
    log.Fatal(err)
}

This function:

  • Flushes existing job, tag, and jobtag tables
  • Iterates through all jobs in the configured archive
  • Enriches each job with calculated metrics
  • Inserts jobs into the database in batched transactions (100 jobs per batch)
  • Continues on individual job failures, logging errors

Use Case: Initial database setup or complete database rebuild from archive.
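The batched-insert pattern described above can be sketched as follows. This is an illustrative stand-in, not the package's internal code: `splitBatches` is a hypothetical helper, and the real InitDB commits one SQL transaction per batch inline.

```go
package main

import "fmt"

// splitBatches groups jobs into fixed-size batches, mirroring how
// InitDB commits one database transaction per 100 jobs. The helper
// name is illustrative; the real package batches inline.
func splitBatches(jobs []string, size int) [][]string {
	var batches [][]string
	for len(jobs) > 0 {
		n := size
		if len(jobs) < n {
			n = len(jobs)
		}
		batches = append(batches, jobs[:n])
		jobs = jobs[n:]
	}
	return batches
}

func main() {
	// 250 placeholder jobs standing in for archived jobs.
	jobs := make([]string, 250)
	for i := range jobs {
		jobs[i] = fmt.Sprintf("job-%d", i)
	}
	for _, b := range splitBatches(jobs, 100) {
		fmt.Printf("committed %d jobs\n", len(b)) // one transaction per batch
	}
}
```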

HandleImportFlag(flag string)

Imports jobs from specified file pairs.

// Format: "<meta.json>:<data.json>[,<meta2.json>:<data2.json>,...]"
flag := "/path/to/meta.json:/path/to/data.json"
if err := importer.HandleImportFlag(flag); err != nil {
    log.Fatal(err)
}

This function:

  • Parses the comma-separated file pairs
  • Validates metadata and job data against schemas (if validation enabled)
  • Enriches each job with footprints and energy metrics
  • Imports jobs into both the archive and database
  • Fails fast on the first error

Use Case: Importing specific jobs from external sources or manual job additions.
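The pair syntax accepted by the flag can be parsed as sketched below. `parseImportFlag` is a hypothetical helper illustrating the documented format, not the function the package actually uses internally.

```go
package main

import (
	"fmt"
	"strings"
)

// parseImportFlag splits the comma-separated flag value into
// (meta, data) file pairs, following the documented format
// "<meta.json>:<data.json>[,<meta2.json>:<data2.json>,...]".
func parseImportFlag(flag string) ([][2]string, error) {
	var pairs [][2]string
	for _, raw := range strings.Split(flag, ",") {
		files := strings.Split(raw, ":")
		if len(files) != 2 {
			return nil, fmt.Errorf("invalid pair %q: expected <meta.json>:<data.json>", raw)
		}
		pairs = append(pairs, [2]string{files[0], files[1]})
	}
	return pairs, nil
}

func main() {
	pairs, err := parseImportFlag("/a/meta.json:/a/data.json,/b/meta.json:/b/data.json")
	if err != nil {
		panic(err)
	}
	for _, p := range pairs {
		fmt.Println("meta:", p[0], "data:", p[1])
	}
}
```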

Job Enrichment

Both import workflows use enrichJobMetadata() to calculate:

Performance Footprints

Performance footprints are calculated from metric averages based on the subcluster configuration:

job.Footprint["mem_used_avg"] = 45.2  // GB
job.Footprint["cpu_load_avg"] = 0.87   // percentage
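A footprint average can be sketched as a simple reduction over a metric's samples. This is a minimal illustration of the idea; the actual enrichment derives statistics according to the subcluster configuration, and `footprintAvg` is a hypothetical helper.

```go
package main

import "fmt"

// footprintAvg reduces a metric's time-series samples to their
// average, the kind of value stored in job.Footprint. Illustrative
// only: real enrichment follows the subcluster configuration.
func footprintAvg(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range samples {
		sum += v
	}
	return sum / float64(len(samples))
}

func main() {
	memUsed := []float64{42.0, 44.8, 48.8} // GB samples over the job's runtime
	fmt.Printf("mem_used_avg = %.1f GB\n", footprintAvg(memUsed))
}
```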

Energy Metrics

Energy consumption is calculated from power metrics using the formula:

Energy (kWh) = (Power (W) × Duration (s) / 3600) / 1000

For each energy metric:

job.EnergyFootprint["acc_power"] = 12.5  // kWh
job.Energy = 150.2  // Total energy in kWh

Note: Energy calculations for metrics with unit "energy" (Joules) are not yet implemented.
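The formula above can be applied directly; for instance, a 250 W power draw sustained for two hours yields 0.5 kWh. A minimal sketch (`energyKWh` is an illustrative helper, not a package function):

```go
package main

import "fmt"

// energyKWh applies the documented formula:
//   Energy (kWh) = (Power (W) × Duration (s) / 3600) / 1000
func energyKWh(powerW, durationS float64) float64 {
	return powerW * durationS / 3600 / 1000
}

func main() {
	// A 250 W accelerator running for 2 hours (7200 s):
	fmt.Printf("%.1f kWh\n", energyKWh(250, 7200)) // prints "0.5 kWh"
}
```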

Data Validation

SanityChecks(job *schema.Job)

Validates job metadata before database insertion:

  • Cluster exists in configuration
  • Subcluster is valid (assigns if needed)
  • Job state is valid
  • Resources and user fields are populated
  • Node counts and hardware thread counts are positive
  • Resource count matches declared node count
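The shape of these validations can be sketched as below. The `job` struct and `sanityCheck` function are pared-down stand-ins for `schema.Job` and `SanityChecks`; the real function also consults the archive's cluster configuration and may assign the subcluster.

```go
package main

import (
	"errors"
	"fmt"
)

// job is a pared-down stand-in for schema.Job, carrying only the
// fields the checks below inspect.
type job struct {
	Cluster   string
	User      string
	NumNodes  int
	Resources []string // one entry per allocated node
}

// sanityCheck mirrors the kinds of validations SanityChecks performs.
func sanityCheck(j *job, knownClusters map[string]bool) error {
	switch {
	case !knownClusters[j.Cluster]:
		return fmt.Errorf("unknown cluster %q", j.Cluster)
	case j.User == "":
		return errors.New("user field is empty")
	case j.NumNodes <= 0:
		return errors.New("node count must be positive")
	case len(j.Resources) != j.NumNodes:
		return fmt.Errorf("resource count %d does not match node count %d",
			len(j.Resources), j.NumNodes)
	}
	return nil
}

func main() {
	clusters := map[string]bool{"clusterA": true}
	j := &job{Cluster: "clusterA", User: "hpcuser", NumNodes: 2, Resources: []string{"n1", "n2"}}
	fmt.Println("valid:", sanityCheck(j, clusters) == nil)
}
```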

Normalization Utilities

The package includes utilities for normalizing metric values to appropriate SI prefixes:

Normalize(avg float64, prefix string)

Adjusts values and SI prefixes for readability:

factor, newPrefix := importer.Normalize(2048.0, "M")  
// Converts 2048 MB → ~2.0 GB
// Returns: factor for conversion, "G"

This is useful for automatically scaling metrics (e.g., memory, storage) to human-readable units.
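The scaling idea behind Normalize can be sketched as follows, assuming decimal (×1000) prefix steps; the real implementation works on cc-lib's unit types, so treat this as an illustration of the contract, not the actual code.

```go
package main

import "fmt"

// normalize shifts a value into the 1–1000 range and returns the
// multiplicative conversion factor plus the new SI prefix, mirroring
// the contract of importer.Normalize. Illustrative sketch only.
func normalize(avg float64, prefix string) (float64, string) {
	prefixes := []string{"", "K", "M", "G", "T"}
	idx := 0
	for i, p := range prefixes {
		if p == prefix {
			idx = i
		}
	}
	factor := 1.0
	for avg >= 1000 && idx < len(prefixes)-1 {
		avg /= 1000
		factor /= 1000
		idx++ // step up to the next larger prefix
	}
	for avg < 1 && avg > 0 && idx > 0 {
		avg *= 1000
		factor *= 1000
		idx-- // step down to the next smaller prefix
	}
	return factor, prefixes[idx]
}

func main() {
	factor, newPrefix := normalize(2048.0, "M")
	fmt.Printf("factor=%v prefix=%s scaled=%.3f\n", factor, newPrefix, 2048.0*factor)
}
```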

Dependencies

  • github.com/ClusterCockpit/cc-backend/internal/repository - Database operations
  • github.com/ClusterCockpit/cc-backend/pkg/archive - Job archive access
  • github.com/ClusterCockpit/cc-lib/schema - Job schema definitions
  • github.com/ClusterCockpit/cc-lib/ccLogger - Logging
  • github.com/ClusterCockpit/cc-lib/ccUnits - SI unit handling

Error Handling

  • InitDB: Continues processing on individual job failures, logs errors, returns summary
  • HandleImportFlag: Fails fast on first error, returns immediately
  • Both functions log detailed error context for debugging

Performance

  • Transaction Batching: InitDB processes jobs in batches of 100 for optimal database performance
  • Tag Caching: Tag IDs are cached during import to minimize database queries
  • Progress Reporting: InitDB prints progress updates during bulk operations
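The tag-caching optimization can be sketched as a map from tag identity to database ID, so each unique tag costs at most one database round-trip. `tagCache` and its fields are illustrative names, not the package's actual types.

```go
package main

import "fmt"

// tagCache sketches the tag-ID caching described above: each unique
// (type, name) tag hits the database once, then is served from the
// in-memory map. lookupCount stands in for actual SQL round-trips.
type tagCache struct {
	ids         map[string]int64
	next        int64
	lookupCount int
}

func newTagCache() *tagCache {
	return &tagCache{ids: map[string]int64{}}
}

func (c *tagCache) id(tagType, tagName string) int64 {
	key := tagType + ":" + tagName
	if id, ok := c.ids[key]; ok {
		return id // cache hit: no database query
	}
	c.lookupCount++ // cache miss: would query/insert the tag row here
	c.next++
	c.ids[key] = c.next
	return c.next
}

func main() {
	c := newTagCache()
	for i := 0; i < 1000; i++ {
		c.id("jobclass", "shared") // same tag attached to many jobs
	}
	fmt.Println("database lookups:", c.lookupCount) // prints "database lookups: 1"
}
```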

Documentation

Overview

Package importer provides functionality for importing job data into the ClusterCockpit database.

The package supports two primary use cases:

  1. Bulk database initialization from archived jobs via InitDB()
  2. Individual job import from file pairs via HandleImportFlag()

Both operations enrich job metadata by calculating footprints and energy metrics before persisting to the database.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func HandleImportFlag

func HandleImportFlag(flag string) error

HandleImportFlag imports jobs from file pairs specified in a comma-separated flag string.

The flag format is: "<path-to-meta.json>:<path-to-data.json>[,<path-to-meta2.json>:<path-to-data2.json>,...]"

For each job pair, this function:

  1. Reads and validates the metadata JSON file (schema.Job)
  2. Reads and validates the job data JSON file (schema.JobData)
  3. Enriches the job with calculated footprints and energy metrics
  4. Validates the job using SanityChecks()
  5. Imports the job into the archive
  6. Inserts the job into the database with associated tags

Schema validation is performed if config.Keys.Validate is true.

Returns an error if file reading, validation, enrichment, or database operations fail. The function stops processing on the first error encountered.

func InitDB

func InitDB() error

InitDB reinitializes the job database from archived job data.

This function performs the following operations:

  1. Flushes existing job, tag, and jobtag tables
  2. Iterates through all jobs in the archive
  3. Enriches each job with calculated footprints and energy metrics
  4. Inserts jobs and tags into the database in batched transactions

Jobs are processed in batches of 100 for optimal performance. The function continues processing even if individual jobs fail, logging errors and returning a summary at the end.

Returns an error if database initialization, transaction management, or critical operations fail. Individual job failures are logged but do not stop the overall import process.

func Normalize

func Normalize(avg float64, p string) (float64, string)

Normalize adjusts a metric value and its SI unit prefix to a more readable range.

This function is useful for automatically scaling metrics to appropriate units. For example, normalizing 2048 MB might result in ~2.0 GB.

The function analyzes the average value and determines if a different SI prefix would make the number more human-readable (typically keeping values between 1 and 1000).

Parameters:

  • avg: The metric value to normalize
  • p: The current SI prefix as a string (e.g., "K", "M", "G")

Returns:

  • factor: The multiplicative factor to apply to convert the value
  • newPrefix: The new SI prefix string to use

Example:

factor, newPrefix := Normalize(2048.0, "M")  // returns factor for MB->GB conversion, "G"

func SanityChecks

func SanityChecks(job *schema.Job) error

SanityChecks validates job metadata and ensures cluster/subcluster configuration is valid.

This function performs the following validations:

  1. Verifies the cluster exists in the archive configuration
  2. Assigns and validates the subcluster (may modify job.SubCluster)
  3. Validates job state is a recognized value
  4. Ensures resources and user fields are populated
  5. Validates node counts and hardware thread counts are positive
  6. Verifies the number of resources matches the declared node count

The function may modify the job's SubCluster field if it needs to be assigned.

Returns an error if any validation check fails.

Types

This section is empty.
