
README

Schema Package

The schema package provides core data structures and types for the ClusterCockpit system, including HPC job metadata, cluster configurations, performance metrics, user authentication, and validation utilities.

Overview

This package defines the fundamental schemas used throughout ClusterCockpit for representing:

  • Job Data: Complete metadata and metrics for HPC jobs
  • Cluster Configuration: Hardware topology and metric collection settings
  • Performance Metrics: Time series data with statistical aggregations
  • User Management: Authentication and authorization
  • Validation: JSON schema validation for data integrity

Key Components

Job Structures
Job

The central data structure containing all information about an HPC job:

  • Identification (cluster, job ID, user, project)
  • Resources (nodes, cores, accelerators, memory)
  • Timing (submission, start, duration)
  • State (current job state, monitoring status)
  • Metrics (performance statistics, time series data)
  • Metadata (tags, energy footprint)
JobMetric

Performance metric data with time series from individual hardware components and aggregated statistics.

JobState

Enumeration of job execution states matching common HPC schedulers (SLURM, PBS).

Cluster Configuration
Cluster

Complete HPC cluster definition with subclusters and metric collection settings.

SubCluster

Homogeneous partition within a cluster sharing identical hardware configuration.

Topology

Hardware topology mapping showing relationships between nodes, sockets, cores, hardware threads, and accelerators.

Metrics and Statistics
MetricScope

Hierarchical levels for metric measurement:

node > socket > memoryDomain > core > hwthread
(accelerator is a special scope at hwthread level)
Series

Time series of metric measurements from a single source (node, core, etc.) with min/avg/max statistics.

Float

Custom float64 type with special NaN handling:

  • NaN values serialize as JSON null
  • JSON null deserializes to NaN
  • Avoids pointer overhead for nullable metrics
  • Compatible with both JSON and GraphQL
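
A minimal sketch of this behavior, relying only on the MarshalJSON/UnmarshalJSON contract documented for Float:

// NaN serializes as JSON null; null deserializes back to NaN
raw, _ := json.Marshal([]schema.Float{1.5, schema.NaN})
fmt.Println(string(raw)) // [1.5,null]

var back []schema.Float
_ = json.Unmarshal([]byte(`[1.5,null]`), &back)
fmt.Println(back[1].IsNaN()) // true
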
User Management
User

User account with authentication and authorization information.

Role

Authorization hierarchy:

Anonymous < Api < User < Manager < Support < Admin
AuthSource

Authentication backends: LocalPassword, LDAP, Token, OIDC

Validation
Validate(k Kind, r io.Reader) error

Validates JSON data against embedded JSON schemas:

  • Meta: Job metadata structure
  • Data: Job metric data
  • ClusterCfg: Cluster configuration

Usage Examples

Validating Cluster Configuration
import (
    "bytes"
    "log"

    "github.com/ClusterCockpit/cc-lib/schema"
)

// Validate cluster.json against schema
err := schema.Validate(schema.ClusterCfg, bytes.NewReader(clusterJSON))
if err != nil {
    log.Fatal("Invalid cluster configuration:", err)
}
Working with Metrics
// Create job metric data
jobMetric := &schema.JobMetric{
    Unit:     schema.Unit{Base: "FLOP/s", Prefix: "G"},
    Timestep: 60,
    Series: []schema.Series{
        {
            Hostname: "node001",
            Data:     []schema.Float{1.5, 2.0, 1.8, schema.NaN}, // NaN for missing data
            Statistics: schema.MetricStatistics{
                Min: 1.5,
                Avg: 1.77,
                Max: 2.0,
            },
        },
    },
}

// Add aggregated statistics
jobMetric.AddStatisticsSeries()
User Role Checking
user := &schema.User{
    Username: "alice",
    Roles:    []string{"user", "manager"},
}

// Check if user has manager role
if user.HasRole(schema.RoleManager) {
    // Grant project-level access
}

// Check for admin or support
if user.HasAnyRole([]schema.Role{schema.RoleAdmin, schema.RoleSupport}) {
    // Grant elevated privileges
}
Topology Navigation
// Get sockets used by a job's hardware threads
hwthreads := []int{0, 1, 2, 3, 20, 21, 22, 23}
sockets, exclusive := topology.GetSocketsFromHWThreads(hwthreads)

if exclusive {
    fmt.Printf("Job has exclusive access to sockets: %v\n", sockets)
} else {
    fmt.Printf("Job shares sockets: %v\n", sockets)
}

Database Models

The package includes database models for persistent storage:

  • NodeDB: Static node configuration
  • NodeStateDB: Time-stamped node state snapshots
  • Job: Contains both API fields and raw database fields (Raw*)

Raw database fields (RawResources, RawMetaData, etc.) store JSON blobs that are decoded into typed fields when loaded from the database.
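
As an illustration (not the package's actual loader code), decoding one of these blobs could look like:

var resources []*schema.Resource
if err := json.Unmarshal(job.RawResources, &resources); err != nil {
    log.Fatal("decoding resources:", err)
}
job.Resources = resources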

JSON Schema Files

Embedded JSON schemas in schemas/ directory:

  • cluster.schema.json: Cluster configuration validation
  • job-meta.schema.json: Job metadata validation
  • job-data.schema.json: Job metric data validation
  • job-metric-data.schema.json: Individual metric validation
  • job-metric-statistics.schema.json: Metric statistics validation
  • unit.schema.json: Unit of measurement validation

Performance Considerations

Float Type

The custom Float type avoids pointer overhead for nullable metrics. In large-scale metric storage with millions of data points, this provides significant memory savings compared to *float64.

Series Marshaling

The Series.MarshalJSON() method provides optimized JSON serialization with fewer allocations, which matters for REST API performance when returning large metric datasets.

Dependencies

  • ccLogger: Logging utilities used for validation warnings
  • util: Helper functions for statistics (median calculation)
  • jsonschema/v5: JSON schema validation implementation

Testing

Run tests:

go test ./schema/... -v

Check test coverage:

go test ./schema/... -cover

View godoc:

go doc -all schema

Documentation

Overview

Package schema provides core data structures and types for the ClusterCockpit system.

This package defines the fundamental schemas used throughout ClusterCockpit for representing HPC job metadata, cluster configurations, performance metrics, user authentication, and validation utilities.

Key components:

Job Data Structures:

  • Job: Complete metadata for HPC jobs including resources, state, and statistics
  • JobMetric: Performance metrics data with time series and statistics
  • JobState: Enumeration of possible job states (running, completed, failed, etc.)

Cluster Configuration:

  • Cluster: HPC cluster definition with subclusters and metric configuration
  • SubCluster: Partition of a cluster with specific hardware topology
  • Topology: Hardware topology mapping (nodes, sockets, cores, accelerators)

Metrics and Statistics:

  • MetricScope: Hierarchical metric scopes (node, socket, core, hwthread, accelerator)
  • Series: Time series data for metrics with statistics
  • JobStatistics: Statistical aggregations (min, avg, max) for job metrics

User Management:

  • User: User account with roles, projects, and authentication information
  • Role: Authorization levels (admin, support, manager, user, api, anonymous)
  • AuthSource: Authentication source types (local, LDAP, token, OIDC)

Validation:

  • Validate: JSON schema validation for job metadata, job data, and cluster configs
  • Kind: Enumeration of schema types for validation

Special Types:

  • Float: Custom float64 wrapper that handles NaN as JSON null for efficient metric storage
  • Node: Node state information including scheduler and monitoring states

The types in this package are designed to be serialized to/from JSON and are used across REST APIs, GraphQL interfaces, and internal data processing pipelines.

Copyright (C) NHR@FAU, University Erlangen-Nuremberg. All rights reserved. This file is part of cc-lib. Use of this source code is governed by a MIT-style license that can be found in the LICENSE file.


Index

Constants

const (
	MonitoringStatusDisabled            int32 = 0
	MonitoringStatusRunningOrArchiving  int32 = 1
	MonitoringStatusArchivingFailed     int32 = 2
	MonitoringStatusArchivingSuccessful int32 = 3
)

Variables

This section is empty.

Functions

func ConvertFloatToFloat64

func ConvertFloatToFloat64(s []Float) []float64

ConvertFloatToFloat64 converts a slice of Float values to a slice of float64 values. NaN values in the Float slice will remain as NaN in the float64 slice.
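
For example, a round trip through the plain float64 representation (via GetFloat64ToFloat, documented below):

vals := schema.ConvertFloatToFloat64([]schema.Float{1.0, schema.NaN})
// vals[1] is math.NaN()
back := schema.GetFloat64ToFloat(vals)
// back[1].IsNaN() == true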

func GetRoleString

func GetRoleString(roleInt Role) string

func GetValidRoles

func GetValidRoles(user *User) ([]string, error)

Called by the '/roles/' API endpoint from the frontend. Only required for the admin configuration, so the Admin role is checked.

func GetValidRolesMap

func GetValidRolesMap(user *User) (map[string]Role, error)

Called by the routerConfig web page setup in the backend. Only requires a known user.

func IsValidRole

func IsValidRole(role string) bool

func Validate

func Validate(k Kind, r io.Reader) error

Validate validates JSON data against an embedded JSON schema.

The kind parameter determines which schema is used:

  • Meta: Validates job metadata structure
  • Data: Validates job performance metric data
  • ClusterCfg: Validates cluster configuration

The reader should contain JSON-encoded data to validate. Returns nil if validation succeeds, or an error describing validation failures.

Example:

err := schema.Validate(schema.ClusterCfg, bytes.NewReader(clusterJSON))

Types

type Accelerator

type Accelerator struct {
	ID    string `json:"id"`    // Unique identifier for the accelerator (e.g., "0", "1", "GPU-0")
	Type  string `json:"type"`  // Type of accelerator (e.g., "Nvidia GPU", "AMD GPU")
	Model string `json:"model"` // Specific model name (e.g., "A100", "MI100")
}

Accelerator represents a hardware accelerator (e.g., GPU, FPGA) attached to a compute node. Each accelerator has a unique identifier and type/model information.

type AuthSource

type AuthSource int

AuthSource identifies the authentication backend that validated a user.

const (
	AuthViaLocalPassword AuthSource = iota // Local database password authentication
	AuthViaLDAP                            // LDAP directory authentication
	AuthViaToken                           // JWT or API token authentication
	AuthViaOIDC                            // OpenID Connect authentication
	AuthViaAll                             // Accepts any auth source (special case)
)

type AuthType

type AuthType int

AuthType distinguishes between different authentication contexts.

const (
	AuthToken   AuthType = iota // API token-based authentication
	AuthSession                 // Session cookie-based authentication
)

type Cluster

type Cluster struct {
	Name         string          `json:"name"`         // Unique cluster name (e.g., "fritz", "alex")
	MetricConfig []*MetricConfig `json:"metricConfig"` // Cluster-wide metric configurations
	SubClusters  []*SubCluster   `json:"subClusters"`  // Homogeneous partitions within the cluster
}

Cluster represents a complete HPC cluster configuration. A cluster consists of one or more subclusters and defines metric collection/evaluation settings.

type ClusterSupport

type ClusterSupport struct {
	Cluster     string   `json:"cluster"`     // Cluster name
	SubClusters []string `json:"subclusters"` // List of subcluster names supporting this metric
}

ClusterSupport indicates which subclusters within a cluster support a particular metric. Used to track metric availability across heterogeneous clusters.

type Float

type Float float64

Float is a custom float64 type with special handling for NaN values in JSON and GraphQL serialization.

Standard Go encoding/json treats NaN as an error, but in metric data it's common to have missing or invalid measurements that should be represented as null in JSON. This type allows NaN values to be serialized as JSON null and vice versa, while avoiding the memory overhead of using *float64 pointers for every nullable metric value.

Key behaviors:

  • NaN values marshal to JSON null
  • JSON null unmarshals to NaN
  • Regular float values marshal/unmarshal normally
  • GraphQL marshaling follows the same null handling

This is particularly important for time series metric data where missing data points are common and need efficient representation.

var (
	NaN Float = Float(math.NaN())
)

func ConvertToFloat added in v0.8.0

func ConvertToFloat(input float64) Float

ConvertToFloat converts a regular float64 to a Float, treating -1.0 as a sentinel for NaN. This is useful when reading from systems that use -1.0 to indicate missing data.

func GetFloat64ToFloat

func GetFloat64ToFloat(s []float64) []Float

GetFloat64ToFloat converts a slice of float64 values to a slice of Float values. This is the inverse operation of ConvertFloatToFloat64.

func (Float) Double added in v0.8.0

func (f Float) Double() float64

func (Float) IsNaN

func (f Float) IsNaN() bool

func (Float) MarshalGQL

func (f Float) MarshalGQL(w io.Writer)

MarshalGQL implements the graphql.Marshaler interface. NaN will be serialized to `null`.

func (Float) MarshalJSON

func (f Float) MarshalJSON() ([]byte, error)

NaN will be serialized to `null`.

func (*Float) UnmarshalGQL

func (f *Float) UnmarshalGQL(v any) error

UnmarshalGQL implements the graphql.Unmarshaler interface.

func (*Float) UnmarshalJSON

func (f *Float) UnmarshalJSON(input []byte) error

`null` will be unserialized to NaN.

type FloatArray added in v0.8.0

type FloatArray []Float

FloatArray is an alias for []Float that can be marshaled to JSON more efficiently. This type exists to provide optimized JSON marshaling for arrays of Float values.

type GlobalMetricListItem

type GlobalMetricListItem struct {
	Name         string      `json:"name"`                // Metric name
	Unit         Unit        `json:"unit"`                // Unit of measurement
	Scope        MetricScope `json:"scope"`               // Metric scope level
	Footprint    string      `json:"footprint,omitempty"` // Footprint category
	Restrict     bool
	Availability []ClusterSupport `json:"availability"` // Where this metric is available
}

GlobalMetricListItem represents a metric in the global metric catalog. Tracks which clusters and subclusters support this metric across the entire system.

type Job

type Job struct {
	Cluster            string                   `json:"cluster" db:"cluster" example:"fritz"`
	SubCluster         string                   `json:"subCluster" db:"subcluster" example:"main"`
	Partition          string                   `json:"partition,omitempty" db:"cluster_partition" example:"main"`
	Project            string                   `json:"project" db:"project" example:"abcd200"`
	User               string                   `json:"user" db:"hpc_user" example:"abcd100h"`
	Shared             string                   `json:"shared" db:"shared" enums:"none,single_user,multi_user"`
	State              JobState                 `` /* 172-byte string literal not displayed */
	Tags               []*Tag                   `json:"tags,omitempty"`
	RawEnergyFootprint []byte                   `json:"-" db:"energy_footprint"`
	RawFootprint       []byte                   `json:"-" db:"footprint"`
	RawMetaData        []byte                   `json:"-" db:"meta_data"`
	RawResources       []byte                   `json:"-" db:"resources"`
	Resources          []*Resource              `json:"resources"`
	EnergyFootprint    map[string]float64       `json:"energyFootprint"`
	Footprint          map[string]float64       `json:"footprint"`
	MetaData           map[string]string        `json:"metaData"`
	ConcurrentJobs     JobLinkResultList        `json:"concurrentJobs"`
	Energy             float64                  `json:"energy" db:"energy"`
	ArrayJobID         int64                    `json:"arrayJobId,omitempty" db:"array_job_id" example:"123000"`
	Walltime           int64                    `json:"walltime,omitempty" db:"walltime" example:"86400" minimum:"1"`
	RequestedMemory    int64                    `json:"requestedMemory,omitempty" db:"requested_memory" example:"128000" minimum:"1"` // in MB
	JobID              int64                    `json:"jobId" db:"job_id" example:"123000"`
	Duration           int32                    `json:"duration" db:"duration" example:"43200" minimum:"1"`
	SMT                int32                    `json:"smt,omitempty" db:"smt" example:"4"`
	MonitoringStatus   int32                    `json:"monitoringStatus,omitempty" db:"monitoring_status" example:"1" minimum:"0" maximum:"3"`
	NumAcc             int32                    `json:"numAcc,omitempty" db:"num_acc" example:"2" minimum:"1"`
	NumHWThreads       int32                    `json:"numHwthreads,omitempty" db:"num_hwthreads" example:"20" minimum:"1"`
	NumNodes           int32                    `json:"numNodes" db:"num_nodes" example:"2" minimum:"1"`
	Statistics         map[string]JobStatistics `json:"statistics"`
	ID                 *int64                   `json:"id,omitempty" db:"id"`
	SubmitTime         int64                    `json:"submitTime,omitempty" db:"submit_time" example:"1649723812"`
	StartTime          int64                    `json:"startTime" db:"start_time" example:"1649723812"`
}

Job represents complete metadata for an HPC job in ClusterCockpit.

This is the central data structure containing all information about a job, including:

  • Identification: cluster, job ID, user, project
  • Resources: nodes, cores, accelerators, memory
  • Timing: submission, start time, duration
  • State: current job state and monitoring status
  • Metrics: performance statistics and time series data
  • Metadata: tags, energy footprint, custom metadata

The RawX fields are used for database serialization of complex nested structures that are stored as JSON blobs in the database and decoded into their respective typed fields (Resources, EnergyFootprint, Footprint, MetaData) when loaded.

Job model @Description Information of an HPC job.

func (Job) GoString added in v0.3.0

func (j Job) GoString() string

type JobData

type JobData map[string]map[MetricScope]*JobMetric

JobData maps metric names to their data organized by scope. Structure: map[metricName]map[scope]*JobMetric

For example: jobData["cpu_load"][MetricScopeNode] contains node-level CPU load data. This structure allows efficient lookup of metrics at different hierarchical levels.
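
A lookup sketch (illustrative; both map levels may be missing for a given metric):

if byScope, ok := jobData["cpu_load"]; ok {
    if jm, ok := byScope[schema.MetricScopeNode]; ok {
        fmt.Println(jm.Timestep, len(jm.Series))
    }
}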

func (*JobData) AddNodeScope

func (jd *JobData) AddNodeScope(metric string) bool

func (*JobData) RoundMetricStats added in v0.3.0

func (jd *JobData) RoundMetricStats()

func (*JobData) Size

func (jd *JobData) Size() int

type JobLink

type JobLink struct {
	ID    int64 `json:"id"`    // Internal database ID
	JobID int64 `json:"jobId"` // The job's external job ID
}

JobLink represents a lightweight reference to a job, typically used for linking related jobs. Used to track concurrent jobs or job relationships without including full job metadata.

type JobLinkResultList

type JobLinkResultList struct {
	Items []*JobLink `json:"items"` // List of job links
	Count int        `json:"count"` // Total count of available items
}

JobLinkResultList holds a paginated list of job links with a total count. Typically used for API responses that return lists of related jobs.

type JobMetric

type JobMetric struct {
	StatisticsSeries *StatsSeries `json:"statisticsSeries,omitempty"` // Aggregated statistics over time
	Unit             Unit         `json:"unit"`                       // Unit of measurement
	Series           []Series     `json:"series"`                     // Individual time series data
	Timestep         int          `json:"timestep"`                   // Sampling interval in seconds
}

JobMetric contains time series data and statistics for a single metric.

The Series field holds time series data from individual nodes/hardware components, while StatisticsSeries provides aggregated statistics across all series over time.

func (*JobMetric) AddPercentiles

func (jm *JobMetric) AddPercentiles(ps []int) bool

func (*JobMetric) AddStatisticsSeries

func (jm *JobMetric) AddStatisticsSeries()

type JobState

type JobState string

JobState represents the execution state of an HPC job. Valid states match common HPC scheduler states (SLURM, PBS, etc.).

const (
	JobStateBootFail    JobState = "boot_fail"
	JobStateCancelled   JobState = "cancelled"
	JobStateCompleted   JobState = "completed"
	JobStateDeadline    JobState = "deadline"
	JobStateFailed      JobState = "failed"
	JobStateNodeFail    JobState = "node_fail"
	JobStateOutOfMemory JobState = "out_of_memory"
	JobStatePending     JobState = "pending"
	JobStatePreempted   JobState = "preempted"
	JobStateRunning     JobState = "running"
	JobStateSuspended   JobState = "suspended"
	JobStateTimeout     JobState = "timeout"
)

func (JobState) MarshalGQL

func (e JobState) MarshalGQL(w io.Writer)

func (*JobState) UnmarshalGQL

func (e *JobState) UnmarshalGQL(v any) error

func (JobState) Valid

func (e JobState) Valid() bool
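
For example, assuming Valid reports whether the value is one of the constants above:

st := schema.JobState("running")
fmt.Println(st.Valid())                       // true
fmt.Println(schema.JobState("bogus").Valid()) // false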

type JobStatistics

type JobStatistics struct {
	Unit Unit    `json:"unit"`
	Avg  float64 `json:"avg" example:"2500" minimum:"0"` // Job metric average
	Min  float64 `json:"min" example:"2000" minimum:"0"` // Job metric minimum
	Max  float64 `json:"max" example:"3000" minimum:"0"` // Job metric maximum
}

JobStatistics model @Description Specification for job metric statistics.

type Kind

type Kind int

Kind identifies which JSON schema to use for validation. Each kind corresponds to a different embedded schema file.

const (
	Meta       Kind = iota + 1 // Job metadata schema (job-meta.schema.json)
	Data                       // Job metric data schema (job-data.schema.json)
	ClusterCfg                 // Cluster configuration schema (cluster.schema.json)
)

type Metric added in v0.3.0

type Metric struct {
	Name    string  `json:"name"`    // Metric name (e.g., "cpu_load", "mem_used")
	Unit    Unit    `json:"unit"`    // Unit of measurement
	Peak    float64 `json:"peak"`    // Peak/maximum expected value (best performance)
	Normal  float64 `json:"normal"`  // Normal/typical value (good performance)
	Caution float64 `json:"caution"` // Caution threshold (concerning but not critical)
	Alert   float64 `json:"alert"`   // Alert threshold (requires attention)
}

Metric defines thresholds for a performance metric used in job classification and alerts. Thresholds help categorize job performance: peak (excellent), normal (good), caution (concerning), alert (problem).

type MetricConfig

type MetricConfig struct {
	Metric                            // Embedded metric thresholds
	Energy        string              `json:"energy"`                // Energy measurement method
	Scope         MetricScope         `json:"scope"`                 // Metric scope (node, socket, core, etc.)
	Aggregation   string              `json:"aggregation"`           // Aggregation function (avg, sum, min, max)
	Footprint     string              `json:"footprint,omitempty"`   // Footprint category
	SubClusters   []*SubClusterConfig `json:"subClusters,omitempty"` // Subcluster-specific overrides
	Timestep      int                 `json:"timestep"`              // Measurement interval in seconds
	Restrict      bool                `json:"restrict"`              // Restrict visibility to non-user roles
	LowerIsBetter bool                `json:"lowerIsBetter"`         // Whether lower values are better
}

MetricConfig defines the configuration for a performance metric at the cluster level. Specifies how the metric is collected, aggregated, and evaluated across the cluster.

type MetricScope

type MetricScope string

MetricScope defines the hierarchical level at which a metric is measured.

Scopes form a hierarchy from coarse-grained (node) to fine-grained (hwthread/accelerator):

node > socket > memoryDomain > core > hwthread
accelerator is a special scope at the same level as hwthread

The scopePrecedence map defines numeric ordering for scope comparisons, which is used when aggregating metrics across different scopes.

const (
	MetricScopeInvalid MetricScope = "invalid_scope"

	MetricScopeNode         MetricScope = "node"
	MetricScopeSocket       MetricScope = "socket"
	MetricScopeMemoryDomain MetricScope = "memoryDomain"
	MetricScopeCore         MetricScope = "core"
	MetricScopeHWThread     MetricScope = "hwthread"

	MetricScopeAccelerator MetricScope = "accelerator"
)
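
A comparison sketch, assuming the precedence matches the hierarchy documented above (node coarsest, hwthread finest):

s := schema.MetricScopeCore
fmt.Println(s.LT(schema.MetricScopeNode))    // expected true: core is finer than node
fmt.Println(s.Max(schema.MetricScopeSocket)) // expected socket, the coarser of the two
fmt.Println(schema.MetricScopeNode.Valid())  // true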

func (*MetricScope) LT

func (e *MetricScope) LT(other MetricScope) bool

func (*MetricScope) LTE

func (e *MetricScope) LTE(other MetricScope) bool

func (MetricScope) MarshalGQL

func (e MetricScope) MarshalGQL(w io.Writer)

func (*MetricScope) Max

func (e *MetricScope) Max(other MetricScope) MetricScope

func (*MetricScope) UnmarshalGQL

func (e *MetricScope) UnmarshalGQL(v any) error

func (MetricScope) Valid

func (e MetricScope) Valid() bool

type MetricStatistics

type MetricStatistics struct {
	Avg float64 `json:"avg"` // Average/mean value
	Min float64 `json:"min"` // Minimum value
	Max float64 `json:"max"` // Maximum value
}

MetricStatistics holds statistical summary values for metric data. Provides the common statistical aggregations used throughout ClusterCockpit.

type MetricValue

type MetricValue struct {
	Unit  Unit    `json:"unit"`  // Unit of measurement (e.g., FLOP/s, GB/s)
	Value float64 `json:"value"` // Numeric value of the measurement
}

MetricValue represents a single metric measurement with its associated unit. Used for hardware performance characteristics like FLOP rates and memory bandwidth.

type MonitoringState added in v0.3.0

type MonitoringState string

MonitoringState indicates the health monitoring status of a node. Reflects whether metric collection is working correctly.

const (
	MonitoringStateFull    MonitoringState = "full"    // All metrics being collected successfully
	MonitoringStatePartial MonitoringState = "partial" // Some metrics missing
	MonitoringStateFailed  MonitoringState = "failed"  // Metric collection failing
)

type Node added in v0.3.0

type Node struct {
	Hostname        string            `json:"hostname"`        // Node hostname
	Cluster         string            `json:"cluster"`         // Cluster name
	SubCluster      string            `json:"subCluster"`      // Subcluster name
	MetaData        map[string]string `json:"metaData"`        // Additional metadata
	NodeState       SchedulerState    `json:"nodeState"`       // Scheduler/resource manager state
	HealthState     MonitoringState   `json:"healthState"`     // Monitoring system health
	CpusAllocated   int               `json:"cpusAllocated"`   // Number of allocated CPUs
	MemoryAllocated int               `json:"memoryAllocated"` // Allocated memory in MB
	GpusAllocated   int               `json:"gpusAllocated"`   // Number of allocated GPUs
	JobsRunning     int               `json:"jobsRunning"`     // Number of jobs running on this node
}

Node represents the current state and resource utilization of a compute node.

Combines scheduler state with monitoring health and current resource allocation. Used for displaying node status in dashboards and tracking node utilization.

type NodeDB added in v0.10.0

type NodeDB struct {
	ID          int64  `json:"id" db:"id"`                                // Database ID
	Hostname    string `json:"hostname" db:"hostname" example:"fritz"`    // Node hostname
	Cluster     string `json:"cluster" db:"cluster" example:"fritz"`      // Cluster name
	SubCluster  string `json:"subCluster" db:"subcluster" example:"main"` // Subcluster name
	RawMetaData []byte `json:"-" db:"meta_data"`                          // Metadata as JSON blob
}

NodeDB is the database model for the node table. Stores static node configuration and metadata.

type NodePayload added in v0.10.0

type NodePayload struct {
	Hostname        string   `json:"hostname"`        // Node hostname
	States          []string `json:"states"`          // State strings (flexible format)
	CpusAllocated   int      `json:"cpusAllocated"`   // Number of allocated CPUs
	MemoryAllocated int64    `json:"memoryAllocated"` // Allocated memory in MB
	GpusAllocated   int      `json:"gpusAllocated"`   // Number of allocated GPUs
	JobsRunning     int      `json:"jobsRunning"`     // Number of running jobs
}

NodePayload is the request body format for the node state REST API. Used when updating node states from external monitoring or scheduler systems.

type NodeStateDB added in v0.10.0

type NodeStateDB struct {
	ID              int64           `json:"id" db:"id"`                                                                                                         // Database ID
	TimeStamp       int64           `json:"timeStamp" db:"time_stamp" example:"1649723812"`                                                                     // Unix timestamp
	NodeState       SchedulerState  `json:"nodeState" db:"node_state" example:"completed" enums:"completed,failed,cancelled,stopped,timeout,out_of_memory"`     // Scheduler state
	HealthState     MonitoringState `json:"healthState" db:"health_state" example:"completed" enums:"completed,failed,cancelled,stopped,timeout,out_of_memory"` // Monitoring health
	CpusAllocated   int             `json:"cpusAllocated" db:"cpus_allocated"`                                                                                  // Allocated CPUs
	MemoryAllocated int64           `json:"memoryAllocated" db:"memory_allocated"`                                                                              // Allocated memory (MB)
	GpusAllocated   int             `json:"gpusAllocated" db:"gpus_allocated"`                                                                                  // Allocated GPUs
	JobsRunning     int             `json:"jobsRunning" db:"jobs_running" example:"12"`                                                                         // Running jobs
	NodeID          int64           `json:"_" db:"node_id"`                                                                                                     // Foreign key to NodeDB
}

NodeStateDB is the database model for the node_state table. Stores time-stamped snapshots of node state and resource allocation.

type Resource

type Resource struct {
	Hostname      string   `json:"hostname"`                // Node hostname
	Configuration string   `json:"configuration,omitempty"` // Optional configuration identifier
	HWThreads     []int    `json:"hwthreads,omitempty"`     // Allocated hardware thread IDs
	Accelerators  []string `json:"accelerators,omitempty"`  // Allocated accelerator IDs (e.g., GPU IDs)
}

Resource represents the hardware resources assigned to a job on a single compute node.

A job typically uses multiple Resource entries, one for each allocated node. HWThreads lists the specific hardware thread IDs allocated, allowing for precise CPU pinning analysis. Accelerators lists assigned GPU/accelerator IDs.

Resource model @Description A resource used by a job

type Role

type Role int

Role defines the authorization level for a user in ClusterCockpit. Roles form a hierarchy with increasing privileges: Anonymous < Api < User < Manager < Support < Admin.

const (
	RoleAnonymous Role = iota // Unauthenticated or guest access
	RoleApi                   // API access (programmatic/service accounts)
	RoleUser                  // Regular user (can view own jobs)
	RoleManager               // Project manager (can view project jobs)
	RoleSupport               // Support staff (can view all jobs, limited admin)
	RoleAdmin                 // Full administrator access
	RoleError                 // Invalid/error role
)

type SchedulerState added in v0.9.0

type SchedulerState string

SchedulerState represents the current state of a node in the HPC job scheduler. States typically reflect SLURM/PBS node states.

const (
	NodeStateAllocated SchedulerState = "allocated" // Node is fully allocated to jobs
	NodeStateReserved  SchedulerState = "reserved"  // Node is reserved but not yet allocated
	NodeStateIdle      SchedulerState = "idle"      // Node is available for jobs
	NodeStateMixed     SchedulerState = "mixed"     // Node is partially allocated
	NodeStateDown      SchedulerState = "down"      // Node is down/offline
	NodeStateUnknown   SchedulerState = "unknown"   // Node state unknown
)

type ScopedJobStats added in v0.3.0

type ScopedJobStats map[string]map[MetricScope][]*ScopedStats

ScopedJobStats maps metric names to statistical summaries organized by scope. Structure: map[metricName]map[scope][]*ScopedStats

Used to store pre-computed statistics without the full time series data, reducing memory footprint when only aggregated values are needed.
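
Iterating such a structure might look like this (illustrative; stats is a schema.ScopedJobStats):

for metric, byScope := range stats {
    for scope, entries := range byScope {
        for _, s := range entries {
            fmt.Printf("%s/%s on %s: avg=%.2f\n", metric, scope, s.Hostname, s.Data.Avg)
        }
    }
}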

type ScopedStats added in v0.3.0

type ScopedStats struct {
	Hostname string            `json:"hostname"`     // Source hostname
	Id       *string           `json:"id,omitempty"` // Optional scope ID
	Data     *MetricStatistics `json:"data"`         // Statistical summary
}

ScopedStats contains statistical summaries for a specific scope (e.g., one node, one socket). Used when full time series data isn't needed, only the aggregated statistics.

type Series

type Series struct {
	Id         *string          `json:"id,omitempty"` // Optional ID (e.g., core ID, GPU ID)
	Hostname   string           `json:"hostname"`     // Source hostname
	Data       []Float          `json:"data"`         // Time series measurements
	Statistics MetricStatistics `json:"statistics"`   // Statistical summary (min/avg/max)
}

Series represents a single time series of metric measurements.

Each series corresponds to one source (e.g., one node, one core) identified by Hostname and optional ID. The Data field contains the time-ordered measurements, and Statistics provides min/avg/max summaries.

func (*Series) MarshalJSON

func (s *Series) MarshalJSON() ([]byte, error)

Only used via the REST API, not via GraphQL. This implementation uses far fewer allocations per series, although the resulting performance gain turns out to be modest.

type StatsSeries

type StatsSeries struct {
	Percentiles map[int][]Float `json:"percentiles,omitempty"` // Percentile values over time (e.g., 10th, 50th, 90th)
	Mean        []Float         `json:"mean"`                  // Mean values over time
	Median      []Float         `json:"median"`                // Median values over time
	Min         []Float         `json:"min"`                   // Minimum values over time
	Max         []Float         `json:"max"`                   // Maximum values over time
}

StatsSeries contains aggregated statistics across multiple time series over time.

Instead of storing individual series, this provides statistical summaries at each time step. For example, at time t, Mean[t] is the average value across all series at that time. Percentiles provides specified percentile values at each time step.
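
A hedged sketch of producing these aggregates with the JobMetric helpers documented above (assuming AddStatisticsSeries fills mean/median/min/max and AddPercentiles reports success):

jm.AddStatisticsSeries() // populate jm.StatisticsSeries from jm.Series
if jm.AddPercentiles([]int{10, 50, 90}) {
    p90 := jm.StatisticsSeries.Percentiles[90]
    fmt.Println(len(p90)) // one value per time step
}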

type SubCluster

type SubCluster struct {
	Name            string         `json:"name"`                      // Name of the subcluster (e.g., "main", "gpu", "bigmem")
	Nodes           string         `json:"nodes"`                     // Node list in condensed format (e.g., "node[001-100]")
	ProcessorType   string         `json:"processorType"`             // CPU model (e.g., "Intel Xeon Gold 6148")
	Topology        Topology       `json:"topology"`                  // Hardware topology of nodes in this subcluster
	FlopRateScalar  MetricValue    `json:"flopRateScalar"`            // Theoretical scalar FLOP rate per node
	FlopRateSimd    MetricValue    `json:"flopRateSimd"`              // Theoretical SIMD FLOP rate per node
	MemoryBandwidth MetricValue    `json:"memoryBandwidth"`           // Theoretical memory bandwidth per node
	MetricConfig    []MetricConfig `json:"metricConfig,omitempty"`    // Subcluster-specific metric configurations
	Footprint       []string       `json:"footprint,omitempty"`       // Default footprint metrics for jobs
	EnergyFootprint []string       `json:"energyFootprint,omitempty"` // Energy-related footprint metrics
	SocketsPerNode  int            `json:"socketsPerNode"`            // Number of CPU sockets per node
	CoresPerSocket  int            `json:"coresPerSocket"`            // Number of cores per CPU socket
	ThreadsPerCore  int            `json:"threadsPerCore"`            // Number of hardware threads per core (SMT level)
}

SubCluster represents a homogeneous partition of a cluster with identical hardware. A cluster may contain multiple subclusters with different processor types or configurations.

type SubClusterConfig

type SubClusterConfig struct {
	Metric               // Embedded metric thresholds
	Footprint     string `json:"footprint,omitempty"` // Footprint category for this metric
	Energy        string `json:"energy"`              // Energy measurement configuration
	LowerIsBetter bool   `json:"lowerIsBetter"`       // Whether lower values indicate better performance
	Restrict      bool   `json:"restrict"`            // Restrict visibility to non-user roles
	Remove        bool   `json:"remove"`              // Whether to exclude this metric for this subcluster
}

SubClusterConfig extends Metric with subcluster-specific metric configuration. Allows overriding metric settings for specific subclusters within a cluster.

type Tag

type Tag struct {
	Type  string `json:"type" db:"tag_type" example:"Debug"`
	Name  string `json:"name" db:"tag_name" example:"Testjob"`
	Scope string `json:"scope" db:"tag_scope" example:"global"`
	ID    int64  `json:"id" db:"id"`
}

Tag model @Description Defines a tag using name and type.

type Topology

type Topology struct {
	Node         []int          `json:"node"`                   // All hardware thread IDs on this node
	Socket       [][]int        `json:"socket"`                 // Hardware threads grouped by socket
	MemoryDomain [][]int        `json:"memoryDomain"`           // Hardware threads grouped by NUMA domain
	Die          [][]*int       `json:"die,omitempty"`          // Hardware threads grouped by die (optional)
	Core         [][]int        `json:"core"`                   // Hardware threads grouped by core
	Accelerators []*Accelerator `json:"accelerators,omitempty"` // Attached accelerators (GPUs, etc.)
}

Topology defines the hardware topology of a compute node, mapping the hierarchical relationships between hardware threads, cores, sockets, memory domains, and accelerators.

The topology is represented as nested arrays where indices represent hardware IDs:

  • Node: Flat list of all hardware thread IDs on the node
  • Socket: Hardware threads grouped by physical CPU socket
  • Core: Hardware threads grouped by physical core
  • MemoryDomain: Hardware threads grouped by NUMA domain
  • Die: Optional grouping by CPU die within sockets
  • Accelerators: List of attached hardware accelerators
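
A toy topology to make the nesting concrete (values are illustrative, not from a real machine), together with GetCoresFromHWThreads, documented below:

topo := schema.Topology{
    Node:         []int{0, 1, 2, 3},       // four hardware threads
    Socket:       [][]int{{0, 1, 2, 3}},   // one socket
    MemoryDomain: [][]int{{0, 1, 2, 3}},   // one NUMA domain
    Core:         [][]int{{0, 1}, {2, 3}}, // two cores, SMT-2
}

cores, exclusive := topo.GetCoresFromHWThreads([]int{0, 1})
fmt.Println(cores, exclusive) // [0] true: both hwthreads of core 0 are in the input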

func (*Topology) GetAcceleratorID

func (topo *Topology) GetAcceleratorID(id int) (string, error)

GetAcceleratorID converts an integer accelerator index to its string ID. Returns an error if the index is out of range.

func (*Topology) GetAcceleratorIDs

func (topo *Topology) GetAcceleratorIDs() []string

GetAcceleratorIDs returns a list of all accelerator IDs as strings.

func (*Topology) GetAcceleratorIDsAsInt

func (topo *Topology) GetAcceleratorIDsAsInt() ([]int, error)

GetAcceleratorIDsAsInt attempts to convert all accelerator IDs to integers. Returns an error if any accelerator ID is not a valid integer. This method assumes accelerator IDs are numeric strings.

func (*Topology) GetCoresFromHWThreads

func (topo *Topology) GetCoresFromHWThreads(
	hwthreads []int,
) (cores []int, exclusive bool)

GetCoresFromHWThreads returns core IDs that contain any of the given hardware threads. The exclusive return value is true if all hardware threads in the returned cores are present in the input list (i.e., the job has exclusive access to those cores).

func (*Topology) GetMemoryDomainsFromHWThreads

func (topo *Topology) GetMemoryDomainsFromHWThreads(
	hwthreads []int,
) (memDoms []int, exclusive bool)

GetMemoryDomainsFromHWThreads returns memory domain IDs that contain any of the given hardware threads. The exclusive return value is true if all hardware threads in the returned memory domains are present in the input list (i.e., the job has exclusive access to those memory domains).

func (*Topology) GetSocketsFromCores added in v0.3.0

func (topo *Topology) GetSocketsFromCores(
	cores []int,
) (sockets []int, exclusive bool)

GetSocketsFromCores returns socket IDs that contain any of the given cores. The exclusive return value is true if all hardware threads in the returned sockets belong to cores in the input list (i.e., the job has exclusive access to those sockets).

func (*Topology) GetSocketsFromHWThreads

func (topo *Topology) GetSocketsFromHWThreads(
	hwthreads []int,
) (sockets []int, exclusive bool)

GetSocketsFromHWThreads returns socket IDs that contain any of the given hardware threads. The exclusive return value is true if all hardware threads in the returned sockets are present in the input list (i.e., the job has exclusive access to those sockets).

type Unit

type Unit struct {
	Base   string `json:"base"`             // Base unit (e.g., "B/s", "F/s", "W")
	Prefix string `json:"prefix,omitempty"` // SI prefix (e.g., "G", "M", "K", "T")
}

Unit represents a unit of measurement with optional SI prefix.

Examples:

  • {Base: "B/s", Prefix: "G"} = GB/s (gigabytes per second)
  • {Base: "F/s", Prefix: "T"} = TF/s (teraflops)
  • {Base: "", Prefix: ""} = dimensionless (e.g., CPU load)

type User

type User struct {
	Username   string     `json:"username"`   // Unique username
	Password   string     `json:"-"`          // Password hash (never serialized to JSON)
	Name       string     `json:"name"`       // Full display name
	Email      string     `json:"email"`      // Email address
	Roles      []string   `json:"roles"`      // Assigned role names
	Projects   []string   `json:"projects"`   // Authorized project/account names
	AuthType   AuthType   `json:"authType"`   // How the user authenticated
	AuthSource AuthSource `json:"authSource"` // Which system authenticated the user
}

User represents a ClusterCockpit user account with authentication and authorization information.

Users are authenticated via various sources (local, LDAP, OIDC) and assigned roles that determine access levels. Projects lists the HPC projects/accounts the user has access to.

func (*User) GetAuthLevel

func (u *User) GetAuthLevel() Role

GetAuthLevel returns the user's highest role.

func (*User) HasAllRoles

func (u *User) HasAllRoles(queryroles []Role) bool

Check if User has ALL of the listed roles

func (*User) HasAnyRole

func (u *User) HasAnyRole(queryroles []Role) bool

Check if User has ANY of the listed roles

func (*User) HasNotRoles

func (u *User) HasNotRoles(queryroles []Role) bool

Check if User has NONE of the listed roles

func (*User) HasProject

func (u *User) HasProject(project string) bool

func (*User) HasRole

func (u *User) HasRole(role Role) bool

Check if User has SPECIFIED role

func (*User) HasValidRole

func (u *User) HasValidRole(role string) (hasRole bool, isValid bool)

Check if User has SPECIFIED role AND role is VALID
