
README

Schema Package

The schema package provides core data structures and types for the ClusterCockpit system, including HPC job metadata, cluster configurations, performance metrics, user authentication, and validation utilities.

Overview

This package defines the fundamental schemas used throughout ClusterCockpit for representing:

  • Job Data: Complete metadata and metrics for HPC jobs
  • Cluster Configuration: Hardware topology and metric collection settings
  • Performance Metrics: Time series data with statistical aggregations
  • User Management: Authentication and authorization
  • Validation: JSON schema validation for data integrity

Key Components

Job Structures
Job

The central data structure containing all information about an HPC job:

  • Identification (cluster, job ID, user, project)
  • Resources (nodes, cores, accelerators, memory)
  • Timing (submission, start, duration)
  • State (current job state, monitoring status)
  • Metrics (performance statistics, time series data)
  • Metadata (tags, energy footprint)
JobMetric

Performance metric data with time series from individual hardware components and aggregated statistics.

JobState

Enumeration of job execution states matching common HPC schedulers (SLURM, PBS).

Cluster Configuration
Cluster

Complete HPC cluster definition with subclusters and metric collection settings.

SubCluster

Homogeneous partition within a cluster sharing identical hardware configuration.

Topology

Hardware topology mapping showing relationships between nodes, sockets, cores, hardware threads, and accelerators.

Metrics and Statistics
MetricScope

Hierarchical levels for metric measurement:

node > socket > memoryDomain > core > hwthread
(accelerator is a special scope at hwthread level)
Series

Time series of metric measurements from a single source (node, core, etc.) with min/avg/max statistics.

Float

Custom float64 type with special NaN handling:

  • NaN values serialize as JSON null
  • JSON null deserializes to NaN
  • Avoids pointer overhead for nullable metrics
  • Compatible with both JSON and GraphQL
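
A minimal sketch of this behavior, relying only on the MarshalJSON/UnmarshalJSON contract documented for Float:

// NaN serializes as JSON null; null deserializes back to NaN
raw, _ := json.Marshal([]schema.Float{1.5, schema.NaN})
fmt.Println(string(raw)) // [1.5,null]

var back []schema.Float
_ = json.Unmarshal([]byte(`[1.5,null]`), &back)
fmt.Println(back[1].IsNaN()) // true
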
User Management
User

User account with authentication and authorization information.

Role

Authorization hierarchy:

Anonymous < Api < User < Manager < Support < Admin
AuthSource

Authentication backends: LocalPassword, LDAP, Token, OIDC

Validation
Validate(k Kind, r io.Reader) error

Validates JSON data against embedded JSON schemas:

  • Meta: Job metadata structure
  • Data: Job metric data
  • ClusterCfg: Cluster configuration

Usage Examples

Validating Cluster Configuration
import (
    "bytes"
    "log"

    "github.com/ClusterCockpit/cc-lib/schema"
)

// Validate cluster.json against schema
err := schema.Validate(schema.ClusterCfg, bytes.NewReader(clusterJSON))
if err != nil {
    log.Fatal("Invalid cluster configuration:", err)
}
Working with Metrics
// Create job metric data
jobMetric := &schema.JobMetric{
    Unit:     schema.Unit{Base: "FLOP/s", Prefix: "G"},
    Timestep: 60,
    Series: []schema.Series{
        {
            Hostname: "node001",
            Data:     []schema.Float{1.5, 2.0, 1.8, schema.NaN}, // NaN for missing data
            Statistics: schema.MetricStatistics{
                Min: 1.5,
                Avg: 1.77,
                Max: 2.0,
            },
        },
    },
}

// Add aggregated statistics
jobMetric.AddStatisticsSeries()
User Role Checking
user := &schema.User{
    Username: "alice",
    Roles:    []string{"user", "manager"},
}

// Check if user has manager role
if user.HasRole(schema.RoleManager) {
    // Grant project-level access
}

// Check for admin or support
if user.HasAnyRole([]schema.Role{schema.RoleAdmin, schema.RoleSupport}) {
    // Grant elevated privileges
}
Topology Navigation
// Get sockets used by a job's hardware threads
hwthreads := []int{0, 1, 2, 3, 20, 21, 22, 23}
sockets, exclusive := topology.GetSocketsFromHWThreads(hwthreads)

if exclusive {
    fmt.Printf("Job has exclusive access to sockets: %v\n", sockets)
} else {
    fmt.Printf("Job shares sockets: %v\n", sockets)
}

Database Models

The package includes database models for persistent storage:

  • NodeDB: Static node configuration
  • NodeStateDB: Time-stamped node state snapshots
  • Job: Contains both API fields and raw database fields (Raw*)

Raw database fields (RawResources, RawMetaData, etc.) store JSON blobs that are decoded into typed fields when loaded from the database.
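
As an illustration (not the package's actual loader code), decoding one of these blobs could look like:

var resources []*schema.Resource
if err := json.Unmarshal(job.RawResources, &resources); err != nil {
    log.Fatal("decoding resources:", err)
}
job.Resources = resources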

JSON Schema Files

Embedded JSON schemas in schemas/ directory:

  • cluster.schema.json: Cluster configuration validation
  • job-meta.schema.json: Job metadata validation
  • job-data.schema.json: Job metric data validation
  • job-metric-data.schema.json: Individual metric validation
  • job-metric-statistics.schema.json: Metric statistics validation
  • unit.schema.json: Unit of measurement validation

Performance Considerations

Float Type

The custom Float type avoids pointer overhead for nullable metrics. In large-scale metric storage with millions of data points, this provides significant memory savings compared to *float64.

Series Marshaling

The Series.MarshalJSON() method provides optimized JSON serialization with fewer allocations, which matters for REST API performance when returning large metric datasets.

Dependencies

  • ccLogger: Logging utilities used for validation warnings
  • util: Helper functions for statistics (median calculation)
  • jsonschema/v5: JSON schema validation implementation

Testing

Run tests:

go test ./schema/... -v

Check test coverage:

go test ./schema/... -cover

View godoc:

go doc -all schema

Documentation

Overview

Package schema provides core data structures and types for the ClusterCockpit system.

This package defines the fundamental schemas used throughout ClusterCockpit for representing HPC job metadata, cluster configurations, performance metrics, user authentication, and validation utilities.

Key components:

Job Data Structures:

  • Job: Complete metadata for HPC jobs including resources, state, and statistics
  • JobMetric: Performance metrics data with time series and statistics
  • JobState: Enumeration of possible job states (running, completed, failed, etc.)

Cluster Configuration:

  • Cluster: HPC cluster definition with subclusters and metric configuration
  • SubCluster: Partition of a cluster with specific hardware topology
  • Topology: Hardware topology mapping (nodes, sockets, cores, accelerators)

Metrics and Statistics:

  • MetricScope: Hierarchical metric scopes (node, socket, core, hwthread, accelerator)
  • Series: Time series data for metrics with statistics
  • JobStatistics: Statistical aggregations (min, avg, max) for job metrics

User Management:

  • User: User account with roles, projects, and authentication information
  • Role: Authorization levels (admin, support, manager, user, api, anonymous)
  • AuthSource: Authentication source types (local, LDAP, token, OIDC)

Validation:

  • Validate: JSON schema validation for job metadata, job data, and cluster configs
  • Kind: Enumeration of schema types for validation

Special Types:

  • Float: Custom float64 wrapper that handles NaN as JSON null for efficient metric storage
  • Node: Node state information including scheduler and monitoring states

The types in this package are designed to be serialized to/from JSON and are used across REST APIs, GraphQL interfaces, and internal data processing pipelines.

Copyright (C) NHR@FAU, University Erlangen-Nuremberg. All rights reserved. This file is part of cc-lib. Use of this source code is governed by a MIT-style license that can be found in the LICENSE file.


Index

Constants

const (
	MonitoringStatusDisabled            int32 = 0
	MonitoringStatusRunningOrArchiving  int32 = 1
	MonitoringStatusArchivingFailed     int32 = 2
	MonitoringStatusArchivingSuccessful int32 = 3
)

Variables

This section is empty.

Functions

func ConvertFloatToFloat64

func ConvertFloatToFloat64(s []Float) []float64

ConvertFloatToFloat64 converts a slice of Float values to a slice of float64 values. NaN values in the Float slice will remain as NaN in the float64 slice.
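
For example, a round trip through the plain float64 representation (via GetFloat64ToFloat, documented below):

vals := schema.ConvertFloatToFloat64([]schema.Float{1.0, schema.NaN})
// vals[1] is math.NaN()
back := schema.GetFloat64ToFloat(vals)
// back[1].IsNaN() == true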

func GetRoleString

func GetRoleString(roleInt Role) string

func GetValidRoles

func GetValidRoles(user *User) ([]string, error)

Called by the '/roles/' API endpoint from the frontend. Only required for the admin configuration, so the Admin role is checked.

func GetValidRolesMap

func GetValidRolesMap(user *User) (map[string]Role, error)

Called by the routerConfig web page setup in the backend. Only requires a known user.

func IsValidRole

func IsValidRole(role string) bool

func Validate

func Validate(k Kind, r io.Reader) error

Validate validates JSON data against an embedded JSON schema.

The kind parameter determines which schema is used:

  • Meta: Validates job metadata structure
  • Data: Validates job performance metric data
  • ClusterCfg: Validates cluster configuration

The reader should contain JSON-encoded data to validate. Returns nil if validation succeeds, or an error describing validation failures.

Example:

err := schema.Validate(schema.ClusterCfg, bytes.NewReader(clusterJSON))

Types

type Accelerator

type Accelerator struct {
	ID    string `json:"id"`    // Unique identifier for the accelerator (e.g., "0", "1", "GPU-0")
	Type  string `json:"type"`  // Type of accelerator (e.g., "Nvidia GPU", "AMD GPU")
	Model string `json:"model"` // Specific model name (e.g., "A100", "MI100")
}

Accelerator represents a hardware accelerator (e.g., GPU, FPGA) attached to a compute node. Each accelerator has a unique identifier and type/model information.

type AuthSource

type AuthSource int

AuthSource identifies the authentication backend that validated a user.

const (
	AuthViaLocalPassword AuthSource = iota // Local database password authentication
	AuthViaLDAP                            // LDAP directory authentication
	AuthViaToken                           // JWT or API token authentication
	AuthViaOIDC                            // OpenID Connect authentication
	AuthViaAll                             // Accepts any auth source (special case)
)

type AuthType

type AuthType int

AuthType distinguishes between different authentication contexts.

const (
	AuthToken   AuthType = iota // API token-based authentication
	AuthSession                 // Session cookie-based authentication
)

type Cluster

type Cluster struct {
	Name         string          `json:"name"`         // Unique cluster name (e.g., "fritz", "alex")
	MetricConfig []*MetricConfig `json:"metricConfig"` // Cluster-wide metric configurations
	SubClusters  []*SubCluster   `json:"subClusters"`  // Homogeneous partitions within the cluster
}

Cluster represents a complete HPC cluster configuration. A cluster consists of one or more subclusters and defines metric collection/evaluation settings.

type ClusterSupport

type ClusterSupport struct {
	Cluster     string   `json:"cluster"`     // Cluster name
	SubClusters []string `json:"subclusters"` // List of subcluster names supporting this metric
}

ClusterSupport indicates which subclusters within a cluster support a particular metric. Used to track metric availability across heterogeneous clusters.

type Float

type Float float64

Float is a custom float64 type with special handling for NaN values in JSON and GraphQL serialization.

Standard Go encoding/json treats NaN as an error, but in metric data it's common to have missing or invalid measurements that should be represented as null in JSON. This type allows NaN values to be serialized as JSON null and vice versa, while avoiding the memory overhead of using *float64 pointers for every nullable metric value.

Key behaviors:

  • NaN values marshal to JSON null
  • JSON null unmarshals to NaN
  • Regular float values marshal/unmarshal normally
  • GraphQL marshaling follows the same null handling

This is particularly important for time series metric data where missing data points are common and need efficient representation.

var (
	NaN Float = Float(math.NaN())
)

func ConvertToFloat added in v0.8.0

func ConvertToFloat(input float64) Float

ConvertToFloat converts a regular float64 to a Float, treating -1.0 as a sentinel for NaN. This is useful when reading from systems that use -1.0 to indicate missing data.

func GetFloat64ToFloat

func GetFloat64ToFloat(s []float64) []Float

GetFloat64ToFloat converts a slice of float64 values to a slice of Float values. This is the inverse operation of ConvertFloatToFloat64.

func (Float) Double added in v0.8.0

func (f Float) Double() float64

func (Float) IsNaN

func (f Float) IsNaN() bool

func (Float) MarshalGQL

func (f Float) MarshalGQL(w io.Writer)

MarshalGQL implements the graphql.Marshaler interface. NaN will be serialized to `null`.

func (Float) MarshalJSON

func (f Float) MarshalJSON() ([]byte, error)

NaN will be serialized to `null`.

func (*Float) UnmarshalGQL

func (f *Float) UnmarshalGQL(v any) error

UnmarshalGQL implements the graphql.Unmarshaler interface.

func (*Float) UnmarshalJSON

func (f *Float) UnmarshalJSON(input []byte) error

`null` will be unserialized to NaN.

type FloatArray added in v0.8.0

type FloatArray []Float

FloatArray is an alias for []Float that can be marshaled to JSON more efficiently. This type exists to provide optimized JSON marshaling for arrays of Float values.

type GlobalMetricListItem

type GlobalMetricListItem struct {
	Name         string      `json:"name"`                // Metric name
	Unit         Unit        `json:"unit"`                // Unit of measurement
	Scope        MetricScope `json:"scope"`               // Metric scope level
	Footprint    string      `json:"footprint,omitempty"` // Footprint category
	Restrict     bool
	Availability []ClusterSupport `json:"availability"` // Where this metric is available
}

GlobalMetricListItem represents a metric in the global metric catalog. Tracks which clusters and subclusters support this metric across the entire system.

type Job

type Job struct {
	Cluster            string                   `json:"cluster" db:"cluster" example:"fritz"`
	SubCluster         string                   `json:"subCluster" db:"subcluster" example:"main"`
	Partition          string                   `json:"partition,omitempty" db:"cluster_partition" example:"main"`
	Project            string                   `json:"project" db:"project" example:"abcd200"`
	User               string                   `json:"user" db:"hpc_user" example:"abcd100h"`
	Shared             string                   `json:"shared" db:"shared" enums:"none,single_user,multi_user"`
	State              JobState                 `` /* 172-byte string literal not displayed */
	Tags               []*Tag                   `json:"tags,omitempty"`
	RawEnergyFootprint []byte                   `json:"-" db:"energy_footprint"`
	RawFootprint       []byte                   `json:"-" db:"footprint"`
	RawMetaData        []byte                   `json:"-" db:"meta_data"`
	RawResources       []byte                   `json:"-" db:"resources"`
	Resources          []*Resource              `json:"resources"`
	EnergyFootprint    map[string]float64       `json:"energyFootprint"`
	Footprint          map[string]float64       `json:"footprint"`
	MetaData           map[string]string        `json:"metaData"`
	ConcurrentJobs     JobLinkResultList        `json:"concurrentJobs"`
	Energy             float64                  `json:"energy" db:"energy"`
	ArrayJobID         int64                    `json:"arrayJobId,omitempty" db:"array_job_id" example:"123000"`
	Walltime           int64                    `json:"walltime,omitempty" db:"walltime" example:"86400" minimum:"1"`
	RequestedMemory    int64                    `json:"requestedMemory,omitempty" db:"requested_memory" example:"128000" minimum:"1"` // in MB
	JobID              int64                    `json:"jobId" db:"job_id" example:"123000"`
	Duration           int32                    `json:"duration" db:"duration" example:"43200" minimum:"1"`
	SMT                int32                    `json:"smt,omitempty" db:"smt" example:"4"`
	MonitoringStatus   int32                    `json:"monitoringStatus,omitempty" db:"monitoring_status" example:"1" minimum:"0" maximum:"3"`
	NumAcc             int32                    `json:"numAcc,omitempty" db:"num_acc" example:"2" minimum:"1"`
	NumHWThreads       int32                    `json:"numHwthreads,omitempty" db:"num_hwthreads" example:"20" minimum:"1"`
	NumNodes           int32                    `json:"numNodes" db:"num_nodes" example:"2" minimum:"1"`
	Statistics         map[string]JobStatistics `json:"statistics"`
	ID                 *int64                   `json:"id,omitempty" db:"id"`
	SubmitTime         int64                    `json:"submitTime,omitempty" db:"submit_time" example:"1649723812"`
	StartTime          int64                    `json:"startTime" db:"start_time" example:"1649723812"`
}

Job represents complete metadata for an HPC job in ClusterCockpit.

This is the central data structure containing all information about a job, including:

  • Identification: cluster, job ID, user, project
  • Resources: nodes, cores, accelerators, memory
  • Timing: submission, start time, duration
  • State: current job state and monitoring status
  • Metrics: performance statistics and time series data
  • Metadata: tags, energy footprint, custom metadata

The RawX fields are used for database serialization of complex nested structures that are stored as JSON blobs in the database and decoded into their respective typed fields (Resources, EnergyFootprint, Footprint, MetaData) when loaded.

Job model @Description Information of an HPC job.

func (Job) GoString added in v0.3.0

func (j Job) GoString() string

type JobData

type JobData map[string]map[MetricScope]*JobMetric

JobData maps metric names to their data organized by scope. Structure: map[metricName]map[scope]*JobMetric

For example: jobData["cpu_load"][MetricScopeNode] contains node-level CPU load data. This structure allows efficient lookup of metrics at different hierarchical levels.
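
A lookup sketch (illustrative; both map levels may be missing for a given metric):

if byScope, ok := jobData["cpu_load"]; ok {
    if jm, ok := byScope[schema.MetricScopeNode]; ok {
        fmt.Println(jm.Timestep, len(jm.Series))
    }
}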

func (*JobData) AddNodeScope

func (jd *JobData) AddNodeScope(metric string) bool

func (*JobData) RoundMetricStats added in v0.3.0

func (jd *JobData) RoundMetricStats()

func (*JobData) Size

func (jd *JobData) Size() int

type JobLink

type JobLink struct {
	ID    int64 `json:"id"`    // Internal database ID
	JobID int64 `json:"jobId"` // The job's external job ID
}

JobLink represents a lightweight reference to a job, typically used for linking related jobs. Used to track concurrent jobs or job relationships without including full job metadata.

type JobLinkResultList

type JobLinkResultList struct {
	Items []*JobLink `json:"items"` // List of job links
	Count int        `json:"count"` // Total count of available items
}

JobLinkResultList holds a paginated list of job links with a total count. Typically used for API responses that return lists of related jobs.

type JobMetric

type JobMetric struct {
	StatisticsSeries *StatsSeries `json:"statisticsSeries,omitempty"` // Aggregated statistics over time
	Unit             Unit         `json:"unit"`                       // Unit of measurement
	Series           []Series     `json:"series"`                     // Individual time series data
	Timestep         int          `json:"timestep"`                   // Sampling interval in seconds
}

JobMetric contains time series data and statistics for a single metric.

The Series field holds time series data from individual nodes/hardware components, while StatisticsSeries provides aggregated statistics across all series over time.

func (*JobMetric) AddPercentiles

func (jm *JobMetric) AddPercentiles(ps []int) bool

func (*JobMetric) AddStatisticsSeries

func (jm *JobMetric) AddStatisticsSeries()

type JobState

type JobState string

JobState represents the execution state of an HPC job. Valid states match common HPC scheduler states (SLURM, PBS, etc.).

const (
	JobStateBootFail    JobState = "boot_fail"
	JobStateCancelled   JobState = "cancelled"
	JobStateCompleted   JobState = "completed"
	JobStateDeadline    JobState = "deadline"
	JobStateFailed      JobState = "failed"
	JobStateNodeFail    JobState = "node_fail"
	JobStateOutOfMemory JobState = "out_of_memory"
	JobStatePending     JobState = "pending"
	JobStatePreempted   JobState = "preempted"
	JobStateRunning     JobState = "running"
	JobStateSuspended   JobState = "suspended"
	JobStateTimeout     JobState = "timeout"
)

func (JobState) MarshalGQL

func (e JobState) MarshalGQL(w io.Writer)

func (*JobState) UnmarshalGQL

func (e *JobState) UnmarshalGQL(v any) error

func (JobState) Valid

func (e JobState) Valid() bool
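
For example, assuming Valid reports whether the value is one of the constants above:

st := schema.JobState("running")
fmt.Println(st.Valid())                       // true
fmt.Println(schema.JobState("bogus").Valid()) // false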

type JobStatistics

type JobStatistics struct {
	Unit Unit    `json:"unit"`
	Avg  float64 `json:"avg" example:"2500" minimum:"0"` // Job metric average
	Min  float64 `json:"min" example:"2000" minimum:"0"` // Job metric minimum
	Max  float64 `json:"max" example:"3000" minimum:"0"` // Job metric maximum
}

JobStatistics model @Description Specification for job metric statistics.

type Kind

type Kind int

Kind identifies which JSON schema to use for validation. Each kind corresponds to a different embedded schema file.

const (
	Meta       Kind = iota + 1 // Job metadata schema (job-meta.schema.json)
	Data                       // Job metric data schema (job-data.schema.json)
	ClusterCfg                 // Cluster configuration schema (cluster.schema.json)
)

type Metric added in v0.3.0

type Metric struct {
	Name    string  `json:"name"`    // Metric name (e.g., "cpu_load", "mem_used")
	Unit    Unit    `json:"unit"`    // Unit of measurement
	Peak    float64 `json:"peak"`    // Peak/maximum expected value (best performance)
	Normal  float64 `json:"normal"`  // Normal/typical value (good performance)
	Caution float64 `json:"caution"` // Caution threshold (concerning but not critical)
	Alert   float64 `json:"alert"`   // Alert threshold (requires attention)
}

Metric defines thresholds for a performance metric used in job classification and alerts. Thresholds help categorize job performance: peak (excellent), normal (good), caution (concerning), alert (problem).

type MetricConfig

type MetricConfig struct {
	Metric                            // Embedded metric thresholds
	Energy        string              `json:"energy"`                // Energy measurement method
	Scope         MetricScope         `json:"scope"`                 // Metric scope (node, socket, core, etc.)
	Aggregation   string              `json:"aggregation"`           // Aggregation function (avg, sum, min, max)
	Footprint     string              `json:"footprint,omitempty"`   // Footprint category
	SubClusters   []*SubClusterConfig `json:"subClusters,omitempty"` // Subcluster-specific overrides
	Timestep      int                 `json:"timestep"`              // Measurement interval in seconds
	Restrict      bool                `json:"restrict"`              // Restrict visibility to non-user roles
	LowerIsBetter bool                `json:"lowerIsBetter"`         // Whether lower values are better
}

MetricConfig defines the configuration for a performance metric at the cluster level. Specifies how the metric is collected, aggregated, and evaluated across the cluster.

type MetricScope

type MetricScope string

MetricScope defines the hierarchical level at which a metric is measured.

Scopes form a hierarchy from coarse-grained (node) to fine-grained (hwthread/accelerator):

node > socket > memoryDomain > core > hwthread
accelerator is a special scope at the same level as hwthread

The scopePrecedence map defines numeric ordering for scope comparisons, which is used when aggregating metrics across different scopes.

const (
	MetricScopeInvalid MetricScope = "invalid_scope"

	MetricScopeNode         MetricScope = "node"
	MetricScopeSocket       MetricScope = "socket"
	MetricScopeMemoryDomain MetricScope = "memoryDomain"
	MetricScopeCore         MetricScope = "core"
	MetricScopeHWThread     MetricScope = "hwthread"

	MetricScopeAccelerator MetricScope = "accelerator"
)
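
A comparison sketch, assuming the precedence matches the hierarchy documented above (node coarsest, hwthread finest):

s := schema.MetricScopeCore
fmt.Println(s.LT(schema.MetricScopeNode))    // expected true: core is finer than node
fmt.Println(s.Max(schema.MetricScopeSocket)) // expected socket, the coarser of the two
fmt.Println(schema.MetricScopeNode.Valid())  // true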

func (*MetricScope) LT

func (e *MetricScope) LT(other MetricScope) bool

func (*MetricScope) LTE

func (e *MetricScope) LTE(other MetricScope) bool

func (MetricScope) MarshalGQL

func (e MetricScope) MarshalGQL(w io.Writer)

func (*MetricScope) Max

func (e *MetricScope) Max(other MetricScope) MetricScope

func (*MetricScope) UnmarshalGQL

func (e *MetricScope) UnmarshalGQL(v any) error

func (MetricScope) Valid

func (e MetricScope) Valid() bool

type MetricStatistics

type MetricStatistics struct {
	Avg float64 `json:"avg"` // Average/mean value
	Min float64 `json:"min"` // Minimum value
	Max float64 `json:"max"` // Maximum value
}

MetricStatistics holds statistical summary values for metric data. Provides the common statistical aggregations used throughout ClusterCockpit.

type MetricValue

type MetricValue struct {
	Unit  Unit    `json:"unit"`  // Unit of measurement (e.g., FLOP/s, GB/s)
	Value float64 `json:"value"` // Numeric value of the measurement
}

MetricValue represents a single metric measurement with its associated unit. Used for hardware performance characteristics like FLOP rates and memory bandwidth.

type MonitoringState added in v0.3.0

type MonitoringState string

MonitoringState indicates the health monitoring status of a node. Reflects whether metric collection is working correctly.

const (
	MonitoringStateFull    MonitoringState = "full"    // All metrics being collected successfully
	MonitoringStatePartial MonitoringState = "partial" // Some metrics missing
	MonitoringStateFailed  MonitoringState = "failed"  // Metric collection failing
)

type Node added in v0.3.0

type Node struct {
	Hostname        string            `json:"hostname"`        // Node hostname
	Cluster         string            `json:"cluster"`         // Cluster name
	SubCluster      string            `json:"subCluster"`      // Subcluster name
	MetaData        map[string]string `json:"metaData"`        // Additional metadata
	NodeState       SchedulerState    `json:"nodeState"`       // Scheduler/resource manager state
	HealthState     MonitoringState   `json:"healthState"`     // Monitoring system health
	CpusAllocated   int               `json:"cpusAllocated"`   // Number of allocated CPUs
	MemoryAllocated int               `json:"memoryAllocated"` // Allocated memory in MB
	GpusAllocated   int               `json:"gpusAllocated"`   // Number of allocated GPUs
	JobsRunning     int               `json:"jobsRunning"`     // Number of jobs running on this node
}

Node represents the current state and resource utilization of a compute node.

Combines scheduler state with monitoring health and current resource allocation. Used for displaying node status in dashboards and tracking node utilization.

type NodeDB added in v0.10.0

type NodeDB struct {
	ID          int64  `json:"id" db:"id"`                                // Database ID
	Hostname    string `json:"hostname" db:"hostname" example:"fritz"`    // Node hostname
	Cluster     string `json:"cluster" db:"cluster" example:"fritz"`      // Cluster name
	SubCluster  string `json:"subCluster" db:"subcluster" example:"main"` // Subcluster name
	RawMetaData []byte `json:"-" db:"meta_data"`                          // Metadata as JSON blob
}

NodeDB is the database model for the node table. Stores static node configuration and metadata.

type NodePayload added in v0.10.0

type NodePayload struct {
	Hostname        string   `json:"hostname"`        // Node hostname
	States          []string `json:"states"`          // State strings (flexible format)
	CpusAllocated   int      `json:"cpusAllocated"`   // Number of allocated CPUs
	MemoryAllocated int64    `json:"memoryAllocated"` // Allocated memory in MB
	GpusAllocated   int      `json:"gpusAllocated"`   // Number of allocated GPUs
	JobsRunning     int      `json:"jobsRunning"`     // Number of running jobs
}

NodePayload is the request body format for the node state REST API. Used when updating node states from external monitoring or scheduler systems.

type NodeStateDB added in v0.10.0

type NodeStateDB struct {
	ID              int64           `json:"id" db:"id"`                                                                                                         // Database ID
	TimeStamp       int64           `json:"timeStamp" db:"time_stamp" example:"1649723812"`                                                                     // Unix timestamp
	NodeState       SchedulerState  `json:"nodeState" db:"node_state" example:"completed" enums:"completed,failed,cancelled,stopped,timeout,out_of_memory"`     // Scheduler state
	HealthState     MonitoringState `json:"healthState" db:"health_state" example:"completed" enums:"completed,failed,cancelled,stopped,timeout,out_of_memory"` // Monitoring health
	CpusAllocated   int             `json:"cpusAllocated" db:"cpus_allocated"`                                                                                  // Allocated CPUs
	MemoryAllocated int64           `json:"memoryAllocated" db:"memory_allocated"`                                                                              // Allocated memory (MB)
	GpusAllocated   int             `json:"gpusAllocated" db:"gpus_allocated"`                                                                                  // Allocated GPUs
	JobsRunning     int             `json:"jobsRunning" db:"jobs_running" example:"12"`                                                                         // Running jobs
	NodeID          int64           `json:"_" db:"node_id"`                                                                                                     // Foreign key to NodeDB
}

NodeStateDB is the database model for the node_state table. Stores time-stamped snapshots of node state and resource allocation.

type Resource

type Resource struct {
	Hostname      string   `json:"hostname"`                // Node hostname
	Configuration string   `json:"configuration,omitempty"` // Optional configuration identifier
	HWThreads     []int    `json:"hwthreads,omitempty"`     // Allocated hardware thread IDs
	Accelerators  []string `json:"accelerators,omitempty"`  // Allocated accelerator IDs (e.g., GPU IDs)
}

Resource represents the hardware resources assigned to a job on a single compute node.

A job typically uses multiple Resource entries, one for each allocated node. HWThreads lists the specific hardware thread IDs allocated, allowing for precise CPU pinning analysis. Accelerators lists assigned GPU/accelerator IDs.

Resource model @Description A resource used by a job

type Role

type Role int

Role defines the authorization level for a user in ClusterCockpit. Roles form a hierarchy with increasing privileges: Anonymous < Api < User < Manager < Support < Admin.

const (
	RoleAnonymous Role = iota // Unauthenticated or guest access
	RoleApi                   // API access (programmatic/service accounts)
	RoleUser                  // Regular user (can view own jobs)
	RoleManager               // Project manager (can view project jobs)
	RoleSupport               // Support staff (can view all jobs, limited admin)
	RoleAdmin                 // Full administrator access
	RoleError                 // Invalid/error role
)

type SchedulerState added in v0.9.0

type SchedulerState string

SchedulerState represents the current state of a node in the HPC job scheduler. States typically reflect SLURM/PBS node states.

const (
	NodeStateAllocated SchedulerState = "allocated" // Node is fully allocated to jobs
	NodeStateReserved  SchedulerState = "reserved"  // Node is reserved but not yet allocated
	NodeStateIdle      SchedulerState = "idle"      // Node is available for jobs
	NodeStateMixed     SchedulerState = "mixed"     // Node is partially allocated
	NodeStateDown      SchedulerState = "down"      // Node is down/offline
	NodeStateUnknown   SchedulerState = "unknown"   // Node state unknown
)

type ScopedJobStats added in v0.3.0

type ScopedJobStats map[string]map[MetricScope][]*ScopedStats

ScopedJobStats maps metric names to statistical summaries organized by scope. Structure: map[metricName]map[scope][]*ScopedStats

Used to store pre-computed statistics without the full time series data, reducing memory footprint when only aggregated values are needed.
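
Iterating such a structure might look like this (illustrative; stats is a schema.ScopedJobStats):

for metric, byScope := range stats {
    for scope, entries := range byScope {
        for _, s := range entries {
            fmt.Printf("%s/%s on %s: avg=%.2f\n", metric, scope, s.Hostname, s.Data.Avg)
        }
    }
}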

type ScopedStats added in v0.3.0

type ScopedStats struct {
	Hostname string            `json:"hostname"`     // Source hostname
	Id       *string           `json:"id,omitempty"` // Optional scope ID
	Data     *MetricStatistics `json:"data"`         // Statistical summary
}

ScopedStats contains statistical summaries for a specific scope (e.g., one node, one socket). Used when full time series data isn't needed, only the aggregated statistics.

type Series

type Series struct {
	Id         *string          `json:"id,omitempty"` // Optional ID (e.g., core ID, GPU ID)
	Hostname   string           `json:"hostname"`     // Source hostname
	Data       []Float          `json:"data"`         // Time series measurements
	Statistics MetricStatistics `json:"statistics"`   // Statistical summary (min/avg/max)
}

Series represents a single time series of metric measurements.

Each series corresponds to one source (e.g., one node, one core) identified by Hostname and optional ID. The Data field contains the time-ordered measurements, and Statistics provides min/avg/max summaries.

func (*Series) MarshalJSON

func (s *Series) MarshalJSON() ([]byte, error)

Only used via the REST API, not via GraphQL. This implementation uses far fewer allocations per series, although the resulting performance gain turns out to be modest.

type StatsSeries

type StatsSeries struct {
	Percentiles map[int][]Float `json:"percentiles,omitempty"` // Percentile values over time (e.g., 10th, 50th, 90th)
	Mean        []Float         `json:"mean"`                  // Mean values over time
	Median      []Float         `json:"median"`                // Median values over time
	Min         []Float         `json:"min"`                   // Minimum values over time
	Max         []Float         `json:"max"`                   // Maximum values over time
}

StatsSeries contains aggregated statistics across multiple time series over time.

Instead of storing individual series, this provides statistical summaries at each time step. For example, at time t, Mean[t] is the average value across all series at that time. Percentiles provides specified percentile values at each time step.
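
A hedged sketch of producing these aggregates with the JobMetric helpers documented above (assuming AddStatisticsSeries fills mean/median/min/max and AddPercentiles reports success):

jm.AddStatisticsSeries() // populate jm.StatisticsSeries from jm.Series
if jm.AddPercentiles([]int{10, 50, 90}) {
    p90 := jm.StatisticsSeries.Percentiles[90]
    fmt.Println(len(p90)) // one value per time step
}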

type SubCluster

type SubCluster struct {
	Name            string         `json:"name"`                      // Name of the subcluster (e.g., "main", "gpu", "bigmem")
	Nodes           string         `json:"nodes"`                     // Node list in condensed format (e.g., "node[001-100]")
	ProcessorType   string         `json:"processorType"`             // CPU model (e.g., "Intel Xeon Gold 6148")
	Topology        Topology       `json:"topology"`                  // Hardware topology of nodes in this subcluster
	FlopRateScalar  MetricValue    `json:"flopRateScalar"`            // Theoretical scalar FLOP rate per node
	FlopRateSimd    MetricValue    `json:"flopRateSimd"`              // Theoretical SIMD FLOP rate per node
	MemoryBandwidth MetricValue    `json:"memoryBandwidth"`           // Theoretical memory bandwidth per node
	MetricConfig    []MetricConfig `json:"metricConfig,omitempty"`    // Subcluster-specific metric configurations
	Footprint       []string       `json:"footprint,omitempty"`       // Default footprint metrics for jobs
	EnergyFootprint []string       `json:"energyFootprint,omitempty"` // Energy-related footprint metrics
	SocketsPerNode  int            `json:"socketsPerNode"`            // Number of CPU sockets per node
	CoresPerSocket  int            `json:"coresPerSocket"`            // Number of cores per CPU socket
	ThreadsPerCore  int            `json:"threadsPerCore"`            // Number of hardware threads per core (SMT level)
}

SubCluster represents a homogeneous partition of a cluster with identical hardware. A cluster may contain multiple subclusters with different processor types or configurations.

type SubClusterConfig

type SubClusterConfig struct {
	Metric               // Embedded metric thresholds
	Footprint     string `json:"footprint,omitempty"` // Footprint category for this metric
	Energy        string `json:"energy"`              // Energy measurement configuration
	LowerIsBetter bool   `json:"lowerIsBetter"`       // Whether lower values indicate better performance
	Restrict      bool   `json:"restrict"`            // Restrict visibility to non-user roles
	Remove        bool   `json:"remove"`              // Whether to exclude this metric for this subcluster
}

SubClusterConfig extends Metric with subcluster-specific metric configuration. Allows overriding metric settings for specific subclusters within a cluster.

type Tag

type Tag struct {
	Type  string `json:"type" db:"tag_type" example:"Debug"`
	Name  string `json:"name" db:"tag_name" example:"Testjob"`
	Scope string `json:"scope" db:"tag_scope" example:"global"`
	ID    int64  `json:"id" db:"id"`
}

Tag model @Description Defines a tag using name and type.

type Topology

type Topology struct {
	Node         []int          `json:"node"`                   // All hardware thread IDs on this node
	Socket       [][]int        `json:"socket"`                 // Hardware threads grouped by socket
	MemoryDomain [][]int        `json:"memoryDomain"`           // Hardware threads grouped by NUMA domain
	Die          [][]*int       `json:"die,omitempty"`          // Hardware threads grouped by die (optional)
	Core         [][]int        `json:"core"`                   // Hardware threads grouped by core
	Accelerators []*Accelerator `json:"accelerators,omitempty"` // Attached accelerators (GPUs, etc.)
}

Topology defines the hardware topology of a compute node, mapping the hierarchical relationships between hardware threads, cores, sockets, memory domains, and accelerators.

The topology is represented as nested arrays where indices represent hardware IDs:

  • Node: Flat list of all hardware thread IDs on the node
  • Socket: Hardware threads grouped by physical CPU socket
  • Core: Hardware threads grouped by physical core
  • MemoryDomain: Hardware threads grouped by NUMA domain
  • Die: Optional grouping by CPU die within sockets
  • Accelerators: List of attached hardware accelerators
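
A toy topology to make the nesting concrete (values are illustrative, not from a real machine), together with GetCoresFromHWThreads, documented below:

topo := schema.Topology{
    Node:         []int{0, 1, 2, 3},       // four hardware threads
    Socket:       [][]int{{0, 1, 2, 3}},   // one socket
    MemoryDomain: [][]int{{0, 1, 2, 3}},   // one NUMA domain
    Core:         [][]int{{0, 1}, {2, 3}}, // two cores, SMT-2
}

cores, exclusive := topo.GetCoresFromHWThreads([]int{0, 1})
fmt.Println(cores, exclusive) // [0] true: both hwthreads of core 0 are in the input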

func (*Topology) GetAcceleratorID

func (topo *Topology) GetAcceleratorID(id int) (string, error)

GetAcceleratorID converts an integer accelerator index to its string ID. Returns an error if the index is out of range.

func (*Topology) GetAcceleratorIDs

func (topo *Topology) GetAcceleratorIDs() []string

GetAcceleratorIDs returns a list of all accelerator IDs as strings.

func (*Topology) GetAcceleratorIDsAsInt

func (topo *Topology) GetAcceleratorIDsAsInt() ([]int, error)

GetAcceleratorIDsAsInt attempts to convert all accelerator IDs to integers. Returns an error if any accelerator ID is not a valid integer. This method assumes accelerator IDs are numeric strings.

func (*Topology) GetCoresFromHWThreads

func (topo *Topology) GetCoresFromHWThreads(
	hwthreads []int,
) (cores []int, exclusive bool)

GetCoresFromHWThreads returns core IDs that contain any of the given hardware threads. The exclusive return value is true if all hardware threads in the returned cores are present in the input list (i.e., the job has exclusive access to those cores).

func (*Topology) GetMemoryDomainsFromHWThreads

func (topo *Topology) GetMemoryDomainsFromHWThreads(
	hwthreads []int,
) (memDoms []int, exclusive bool)

GetMemoryDomainsFromHWThreads returns memory domain IDs that contain any of the given hardware threads. The exclusive return value is true if all hardware threads in the returned memory domains are present in the input list (i.e., the job has exclusive access to those memory domains).

func (*Topology) GetSocketsFromCores added in v0.3.0

func (topo *Topology) GetSocketsFromCores(
	cores []int,
) (sockets []int, exclusive bool)

GetSocketsFromCores returns socket IDs that contain any of the given cores. The exclusive return value is true if all hardware threads in the returned sockets belong to cores in the input list (i.e., the job has exclusive access to those sockets).

func (*Topology) GetSocketsFromHWThreads

func (topo *Topology) GetSocketsFromHWThreads(
	hwthreads []int,
) (sockets []int, exclusive bool)

GetSocketsFromHWThreads returns socket IDs that contain any of the given hardware threads. The exclusive return value is true if all hardware threads in the returned sockets are present in the input list (i.e., the job has exclusive access to those sockets).

type Unit

type Unit struct {
	Base   string `json:"base"`             // Base unit (e.g., "B/s", "F/s", "W")
	Prefix string `json:"prefix,omitempty"` // SI prefix (e.g., "G", "M", "K", "T")
}

Unit represents a unit of measurement with optional SI prefix.

Examples:

  • {Base: "B/s", Prefix: "G"} = GB/s (gigabytes per second)
  • {Base: "F/s", Prefix: "T"} = TF/s (teraflops)
  • {Base: "", Prefix: ""} = dimensionless (e.g., CPU load)

type User

type User struct {
	Username   string     `json:"username"`   // Unique username
	Password   string     `json:"-"`          // Password hash (never serialized to JSON)
	Name       string     `json:"name"`       // Full display name
	Email      string     `json:"email"`      // Email address
	Roles      []string   `json:"roles"`      // Assigned role names
	Projects   []string   `json:"projects"`   // Authorized project/account names
	AuthType   AuthType   `json:"authType"`   // How the user authenticated
	AuthSource AuthSource `json:"authSource"` // Which system authenticated the user
}

User represents a ClusterCockpit user account with authentication and authorization information.

Users are authenticated via various sources (local, LDAP, OIDC) and assigned roles that determine access levels. Projects lists the HPC projects/accounts the user has access to.

func (*User) GetAuthLevel

func (u *User) GetAuthLevel() Role

GetAuthLevel returns the user's highest role.

func (*User) HasAllRoles

func (u *User) HasAllRoles(queryroles []Role) bool

Check if User has ALL of the listed roles

func (*User) HasAnyRole

func (u *User) HasAnyRole(queryroles []Role) bool

Check if User has ANY of the listed roles

func (*User) HasNotRoles

func (u *User) HasNotRoles(queryroles []Role) bool

Check if User has NONE of the listed roles

func (*User) HasProject

func (u *User) HasProject(project string) bool

func (*User) HasRole

func (u *User) HasRole(role Role) bool

Check if User has SPECIFIED role

func (*User) HasValidRole

func (u *User) HasValidRole(role string) (hasRole bool, isValid bool)

Check if User has SPECIFIED role AND role is VALID
