Documentation ¶
Overview ¶
Package schema provides core data structures and types for the ClusterCockpit system.
This package defines the fundamental schemas used throughout ClusterCockpit: types representing HPC job metadata, cluster configurations, performance metrics, and user authentication, along with utilities for validation.
Key components:
Job Data Structures:
- Job: Complete metadata for HPC jobs including resources, state, and statistics
- JobMetric: Performance metrics data with time series and statistics
- JobState: Enumeration of possible job states (running, completed, failed, etc.)
Cluster Configuration:
- Cluster: HPC cluster definition with subclusters and metric configuration
- SubCluster: Partition of a cluster with specific hardware topology
- Topology: Hardware topology mapping (nodes, sockets, cores, accelerators)
Metrics and Statistics:
- MetricScope: Hierarchical metric scopes (node, socket, core, hwthread, accelerator)
- Series: Time series data for metrics with statistics
- JobStatistics: Statistical aggregations (min, avg, max) for job metrics
User Management:
- User: User account with roles, projects, and authentication information
- Role: Authorization levels (admin, support, manager, user, api, anonymous)
- AuthSource: Authentication source types (local, LDAP, token, OIDC)
Validation:
- Validate: JSON schema validation for job metadata, job data, and cluster configs
- Kind: Enumeration of schema types for validation
Special Types:
- Float: Custom float64 wrapper that handles NaN as JSON null for efficient metric storage
- Node: Node state information including scheduler and monitoring states
The types in this package are designed to be serialized to/from JSON and are used across REST APIs, GraphQL interfaces, and internal data processing pipelines.
Copyright (C) NHR@FAU, University Erlangen-Nuremberg. All rights reserved. This file is part of cc-lib. Use of this source code is governed by a MIT-style license that can be found in the LICENSE file.
Index ¶
- Constants
- func ConvertFloatToFloat64(s []Float) []float64
- func GetRoleString(roleInt Role) string
- func GetValidRoles(user *User) ([]string, error)
- func GetValidRolesMap(user *User) (map[string]Role, error)
- func IsValidRole(role string) bool
- func Validate(k Kind, r io.Reader) error
- type Accelerator
- type AuthSource
- type AuthType
- type Cluster
- type ClusterSupport
- type Float
- type FloatArray
- type GlobalMetricListItem
- type Job
- type JobData
- type JobLink
- type JobLinkResultList
- type JobMetric
- type JobState
- type JobStatistics
- type Kind
- type Metric
- type MetricConfig
- type MetricScope
- type MetricStatistics
- type MetricValue
- type MonitoringState
- type Node
- type NodeDB
- type NodePayload
- type NodeStateDB
- type Resource
- type Role
- type SchedulerState
- type ScopedJobStats
- type ScopedStats
- type Series
- type StatsSeries
- type SubCluster
- type SubClusterConfig
- type Tag
- type Topology
- func (topo *Topology) GetAcceleratorID(id int) (string, error)
- func (topo *Topology) GetAcceleratorIDs() []string
- func (topo *Topology) GetAcceleratorIDsAsInt() ([]int, error)
- func (topo *Topology) GetCoresFromHWThreads(hwthreads []int) (cores []int, exclusive bool)
- func (topo *Topology) GetMemoryDomainsFromHWThreads(hwthreads []int) (memDoms []int, exclusive bool)
- func (topo *Topology) GetSocketsFromCores(cores []int) (sockets []int, exclusive bool)
- func (topo *Topology) GetSocketsFromHWThreads(hwthreads []int) (sockets []int, exclusive bool)
- type Unit
- type User
- func (u *User) GetAuthLevel() Role
- func (u *User) HasAllRoles(queryroles []Role) bool
- func (u *User) HasAnyRole(queryroles []Role) bool
- func (u *User) HasNotRoles(queryroles []Role) bool
- func (u *User) HasProject(project string) bool
- func (u *User) HasRole(role Role) bool
- func (u *User) HasValidRole(role string) (hasRole bool, isValid bool)
Constants ¶
const (
MonitoringStatusDisabled int32 = 0
MonitoringStatusRunningOrArchiving int32 = 1
MonitoringStatusArchivingFailed int32 = 2
MonitoringStatusArchivingSuccessful int32 = 3
)
Variables ¶
This section is empty.
Functions ¶
func ConvertFloatToFloat64 ¶
ConvertFloatToFloat64 converts a slice of Float values to a slice of float64 values. NaN values in the Float slice will remain as NaN in the float64 slice.
func GetRoleString ¶
func GetValidRoles ¶
Called by the frontend via the API endpoint '/roles/'. Only required for the admin configuration, so the admin role is checked.
func GetValidRolesMap ¶
Called during the routerConfig web.page setup in the backend. Only requires a known user.
func IsValidRole ¶
func Validate ¶
Validate validates JSON data against an embedded JSON schema.
The kind parameter determines which schema is used:
- Meta: Validates job metadata structure
- Data: Validates job performance metric data
- ClusterCfg: Validates cluster configuration
The reader should contain JSON-encoded data to validate. Returns nil if validation succeeds, or an error describing validation failures.
Example:
err := schema.Validate(schema.ClusterCfg, bytes.NewReader(clusterJSON))
Types ¶
type Accelerator ¶
type Accelerator struct {
ID string `json:"id"` // Unique identifier for the accelerator (e.g., "0", "1", "GPU-0")
Type string `json:"type"` // Type of accelerator (e.g., "Nvidia GPU", "AMD GPU")
Model string `json:"model"` // Specific model name (e.g., "A100", "MI100")
}
Accelerator represents a hardware accelerator (e.g., GPU, FPGA) attached to a compute node. Each accelerator has a unique identifier and type/model information.
type AuthSource ¶
type AuthSource int
AuthSource identifies the authentication backend that validated a user.
const (
AuthViaLocalPassword AuthSource = iota // Local database password authentication
AuthViaLDAP // LDAP directory authentication
AuthViaToken // JWT or API token authentication
AuthViaOIDC // OpenID Connect authentication
AuthViaAll // Accepts any auth source (special case)
)
type Cluster ¶
type Cluster struct {
Name string `json:"name"` // Unique cluster name (e.g., "fritz", "alex")
MetricConfig []*MetricConfig `json:"metricConfig"` // Cluster-wide metric configurations
SubClusters []*SubCluster `json:"subClusters"` // Homogeneous partitions within the cluster
}
Cluster represents a complete HPC cluster configuration. A cluster consists of one or more subclusters and defines metric collection/evaluation settings.
type ClusterSupport ¶
type ClusterSupport struct {
Cluster string `json:"cluster"` // Cluster name
SubClusters []string `json:"subclusters"` // List of subcluster names supporting this metric
}
ClusterSupport indicates which subclusters within a cluster support a particular metric. Used to track metric availability across heterogeneous clusters.
type Float ¶
type Float float64
Float is a custom float64 type with special handling for NaN values in JSON and GraphQL serialization.
Standard Go encoding/json treats NaN as an error, but in metric data it's common to have missing or invalid measurements that should be represented as null in JSON. This type allows NaN values to be serialized as JSON null and vice versa, while avoiding the memory overhead of using *float64 pointers for every nullable metric value.
Key behaviors:
- NaN values marshal to JSON null
- JSON null unmarshals to NaN
- Regular float values marshal/unmarshal normally
- GraphQL marshaling follows the same null handling
This is particularly important for time series metric data where missing data points are common and need efficient representation.
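The NaN/null mapping described above can be sketched with a pair of custom JSON methods. This is a minimal illustration on a local stand-in type, not the package's own source; the helpers `encode`, `decode`, and `isMissing` are hypothetical names introduced only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// Float is a local stand-in for the package's Float type (an assumption for
// illustration; the real methods may differ in detail).
type Float float64

// MarshalJSON writes NaN as JSON null and any other value as a plain number.
func (f Float) MarshalJSON() ([]byte, error) {
	if math.IsNaN(float64(f)) {
		return []byte("null"), nil
	}
	return json.Marshal(float64(f))
}

// UnmarshalJSON reads JSON null as NaN and any other input as a float64.
func (f *Float) UnmarshalJSON(b []byte) error {
	if string(b) == "null" {
		*f = Float(math.NaN())
		return nil
	}
	var v float64
	if err := json.Unmarshal(b, &v); err != nil {
		return err
	}
	*f = Float(v)
	return nil
}

// isMissing reports whether a sample is NaN, i.e. a missing measurement.
func isMissing(f Float) bool {
	return math.IsNaN(float64(f))
}

// encode marshals a slice of samples to a JSON string.
func encode(samples []Float) (string, error) {
	out, err := json.Marshal(samples)
	return string(out), err
}

// decode parses a JSON array into a slice of samples.
func decode(s string) ([]Float, error) {
	var samples []Float
	err := json.Unmarshal([]byte(s), &samples)
	return samples, err
}

func main() {
	s, err := encode([]Float{1.5, Float(math.NaN()), 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)

	back, err := decode(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(back), isMissing(back[1]))
}
```

Because the slice holds plain values rather than *float64 pointers, a missing sample costs no extra allocation; it is simply stored as NaN.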
func ConvertToFloat ¶ added in v0.8.0
ConvertToFloat converts a regular float64 to a Float, treating -1.0 as a sentinel for NaN. This is useful when reading from systems that use -1.0 to indicate missing data.
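A sentinel conversion of this kind might look like the sketch below. It uses a local stand-in for Float, and the names `fromSentinel`, `fromSentinels`, and `countMissing` are hypothetical; only the -1.0 convention comes from the description above.

```go
package main

import (
	"fmt"
	"math"
)

// Float is a local stand-in for the package's Float type.
type Float float64

// fromSentinel maps the sentinel -1.0 to NaN and passes every other value
// through unchanged.
func fromSentinel(v float64) Float {
	if v == -1.0 {
		return Float(math.NaN())
	}
	return Float(v)
}

// fromSentinels applies fromSentinel to every element of vals.
func fromSentinels(vals []float64) []Float {
	out := make([]Float, len(vals))
	for i, v := range vals {
		out[i] = fromSentinel(v)
	}
	return out
}

// countMissing returns how many samples are NaN.
func countMissing(samples []Float) int {
	n := 0
	for _, s := range samples {
		if math.IsNaN(float64(s)) {
			n++
		}
	}
	return n
}

func main() {
	samples := fromSentinels([]float64{0.5, -1.0, 2, -1.0})
	fmt.Println(len(samples), countMissing(samples))
}
```

Note that a sentinel scheme cannot distinguish a genuine measurement of -1.0 from a missing one, so it only suits metrics that never take that value.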
func GetFloat64ToFloat ¶
GetFloat64ToFloat converts a slice of float64 values to a slice of Float values. This is the inverse operation of ConvertFloatToFloat64.
func (Float) MarshalGQL ¶
MarshalGQL implements the graphql.Marshaler interface. NaN will be serialized to `null`.
func (Float) MarshalJSON ¶
NaN will be serialized to `null`.
func (*Float) UnmarshalGQL ¶
UnmarshalGQL implements the graphql.Unmarshaler interface.
func (*Float) UnmarshalJSON ¶
`null` will be deserialized to NaN.
type FloatArray ¶ added in v0.8.0
type FloatArray []Float
FloatArray is a named type for []Float that can be marshaled to JSON more efficiently. It exists to provide optimized JSON marshaling for slices of Float values.
type GlobalMetricListItem ¶
type GlobalMetricListItem struct {
Name string `json:"name"` // Metric name
Unit Unit `json:"unit"` // Unit of measurement
Scope MetricScope `json:"scope"` // Metric scope level
Footprint string `json:"footprint,omitempty"` // Footprint category
Restrict bool
Availability []ClusterSupport `json:"availability"` // Where this metric is available
}
GlobalMetricListItem represents a metric in the global metric catalog. Tracks which clusters and subclusters support this metric across the entire system.
type Job ¶
type Job struct {
Cluster string `json:"cluster" db:"cluster" example:"fritz"`
SubCluster string `json:"subCluster" db:"subcluster" example:"main"`
Partition string `json:"partition,omitempty" db:"cluster_partition" example:"main"`
Project string `json:"project" db:"project" example:"abcd200"`
User string `json:"user" db:"hpc_user" example:"abcd100h"`
State JobState `` /* 172-byte string literal not displayed */
Tags []*Tag `json:"tags,omitempty"`
RawEnergyFootprint []byte `json:"-" db:"energy_footprint"`
RawFootprint []byte `json:"-" db:"footprint"`
RawMetaData []byte `json:"-" db:"meta_data"`
RawResources []byte `json:"-" db:"resources"`
Resources []*Resource `json:"resources"`
EnergyFootprint map[string]float64 `json:"energyFootprint"`
Footprint map[string]float64 `json:"footprint"`
MetaData map[string]string `json:"metaData"`
ConcurrentJobs JobLinkResultList `json:"concurrentJobs"`
Energy float64 `json:"energy" db:"energy"`
ArrayJobID int64 `json:"arrayJobId,omitempty" db:"array_job_id" example:"123000"`
Walltime int64 `json:"walltime,omitempty" db:"walltime" example:"86400" minimum:"1"`
RequestedMemory int64 `json:"requestedMemory,omitempty" db:"requested_memory" example:"128000" minimum:"1"` // in MB
JobID int64 `json:"jobId" db:"job_id" example:"123000"`
Duration int32 `json:"duration" db:"duration" example:"43200" minimum:"1"`
SMT int32 `json:"smt,omitempty" db:"smt" example:"4"`
MonitoringStatus int32 `json:"monitoringStatus,omitempty" db:"monitoring_status" example:"1" minimum:"0" maximum:"3"`
NumAcc int32 `json:"numAcc,omitempty" db:"num_acc" example:"2" minimum:"1"`
NumHWThreads int32 `json:"numHwthreads,omitempty" db:"num_hwthreads" example:"20" minimum:"1"`
NumNodes int32 `json:"numNodes" db:"num_nodes" example:"2" minimum:"1"`
Statistics map[string]JobStatistics `json:"statistics"`
ID *int64 `json:"id,omitempty" db:"id"`
SubmitTime int64 `json:"submitTime,omitempty" db:"submit_time" example:"1649723812"`
StartTime int64 `json:"startTime" db:"start_time" example:"1649723812"`
}
Job represents complete metadata for an HPC job in ClusterCockpit.
This is the central data structure containing all information about a job, including:
- Identification: cluster, job ID, user, project
- Resources: nodes, cores, accelerators, memory
- Timing: submission, start time, duration
- State: current job state and monitoring status
- Metrics: performance statistics and time series data
- Metadata: tags, energy footprint, custom metadata
The RawX fields are used for database serialization of complex nested structures that are stored as JSON blobs in the database and decoded into their respective typed fields (Resources, EnergyFootprint, Footprint, MetaData) when loaded.
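The decoding step for the raw fields could be sketched as follows. The types are trimmed local stand-ins for Job and Resource, and `decodeRaw` is a hypothetical helper; the package's actual loading code is not shown in this documentation and may work differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Resource is a trimmed local stand-in for the package's Resource type.
type Resource struct {
	Hostname  string `json:"hostname"`
	HWThreads []int  `json:"hwthreads,omitempty"`
}

// Job is a trimmed local stand-in that keeps only two raw/typed field pairs.
type Job struct {
	RawResources []byte            `json:"-"`
	RawMetaData  []byte            `json:"-"`
	Resources    []*Resource       `json:"resources"`
	MetaData     map[string]string `json:"metaData"`
}

// decodeRaw fills the typed fields from the JSON blobs held in the raw
// fields. Empty blobs are skipped so the typed fields keep their zero value.
func decodeRaw(job *Job) error {
	if len(job.RawResources) > 0 {
		if err := json.Unmarshal(job.RawResources, &job.Resources); err != nil {
			return fmt.Errorf("decoding resources: %w", err)
		}
	}
	if len(job.RawMetaData) > 0 {
		if err := json.Unmarshal(job.RawMetaData, &job.MetaData); err != nil {
			return fmt.Errorf("decoding meta data: %w", err)
		}
	}
	return nil
}

func main() {
	job := &Job{
		RawResources: []byte(`[{"hostname":"node001","hwthreads":[0,1,2,3]}]`),
		RawMetaData:  []byte(`{"jobName":"test"}`),
	}
	if err := decodeRaw(job); err != nil {
		panic(err)
	}
	fmt.Println(job.Resources[0].Hostname, len(job.Resources[0].HWThreads), job.MetaData["jobName"])
}
```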
Job model. @Description Information about an HPC job.
type JobData ¶
type JobData map[string]map[MetricScope]*JobMetric
JobData maps metric names to their data organized by scope. Structure: map[metricName]map[scope]*JobMetric
For example: jobData["cpu_load"][MetricScopeNode] contains node-level CPU load data. This structure allows efficient lookup of metrics at different hierarchical levels.
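A two-level lookup on this map shape can be sketched as below. The types are reduced local stand-ins, and `lookup` is a hypothetical helper that makes the "metric missing" and "scope missing" cases explicit.

```go
package main

import "fmt"

// MetricScope, JobMetric, and JobData are reduced local stand-ins for the
// package types of the same names.
type MetricScope string

const (
	MetricScopeNode MetricScope = "node"
	MetricScopeCore MetricScope = "core"
)

type JobMetric struct {
	Timestep int
}

type JobData map[string]map[MetricScope]*JobMetric

// lookup returns the metric stored for the given name and scope, and false
// if either level of the map has no entry (or the entry is nil).
func lookup(jd JobData, metric string, scope MetricScope) (*JobMetric, bool) {
	scopes, ok := jd[metric]
	if !ok {
		return nil, false
	}
	jm, ok := scopes[scope]
	return jm, ok && jm != nil
}

func main() {
	jd := JobData{
		"cpu_load": {
			MetricScopeNode: &JobMetric{Timestep: 60},
		},
	}
	if jm, ok := lookup(jd, "cpu_load", MetricScopeNode); ok {
		fmt.Println("node-level cpu_load, timestep:", jm.Timestep)
	}
	if _, ok := lookup(jd, "cpu_load", MetricScopeCore); !ok {
		fmt.Println("no core-level cpu_load data")
	}
}
```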
func (*JobData) AddNodeScope ¶
func (*JobData) RoundMetricStats ¶ added in v0.3.0
func (jd *JobData) RoundMetricStats()
type JobLink ¶
type JobLink struct {
ID int64 `json:"id"` // Internal database ID
JobID int64 `json:"jobId"` // The job's external job ID
}
JobLink represents a lightweight reference to a job, typically used for linking related jobs. Used to track concurrent jobs or job relationships without including full job metadata.
type JobLinkResultList ¶
type JobLinkResultList struct {
Items []*JobLink `json:"items"` // List of job links
Count int `json:"count"` // Total count of available items
}
JobLinkResultList holds a paginated list of job links with a total count. Typically used for API responses that return lists of related jobs.
type JobMetric ¶
type JobMetric struct {
StatisticsSeries *StatsSeries `json:"statisticsSeries,omitempty"` // Aggregated statistics over time
Unit Unit `json:"unit"` // Unit of measurement
Series []Series `json:"series"` // Individual time series data
Timestep int `json:"timestep"` // Sampling interval in seconds
}
JobMetric contains time series data and statistics for a single metric.
The Series field holds time series data from individual nodes/hardware components, while StatisticsSeries provides aggregated statistics across all series over time.
func (*JobMetric) AddPercentiles ¶
func (*JobMetric) AddStatisticsSeries ¶
func (jm *JobMetric) AddStatisticsSeries()
type JobState ¶
type JobState string
JobState represents the execution state of an HPC job. Valid states match common HPC scheduler states (SLURM, PBS, etc.).
const (
JobStateBootFail JobState = "boot_fail"
JobStateCancelled JobState = "cancelled"
JobStateCompleted JobState = "completed"
JobStateDeadline JobState = "deadline"
JobStateFailed JobState = "failed"
JobStateNodeFail JobState = "node_fail"
JobStateOutOfMemory JobState = "out_of_memory"
JobStatePending JobState = "pending"
JobStatePreempted JobState = "preempted"
JobStateRunning JobState = "running"
JobStateSuspended JobState = "suspended"
JobStateTimeout JobState = "timeout"
)
func (JobState) MarshalGQL ¶
func (*JobState) UnmarshalGQL ¶
type JobStatistics ¶
type JobStatistics struct {
Unit Unit `json:"unit"`
Avg float64 `json:"avg" example:"2500" minimum:"0"` // Job metric average
Min float64 `json:"min" example:"2000" minimum:"0"` // Job metric minimum
Max float64 `json:"max" example:"3000" minimum:"0"` // Job metric maximum
}
JobStatistics model @Description Specification for job metric statistics.
type Kind ¶
type Kind int
Kind identifies which JSON schema to use for validation. Each kind corresponds to a different embedded schema file.
type Metric ¶ added in v0.3.0
type Metric struct {
Name string `json:"name"` // Metric name (e.g., "cpu_load", "mem_used")
Unit Unit `json:"unit"` // Unit of measurement
Peak float64 `json:"peak"` // Peak/maximum expected value (best performance)
Normal float64 `json:"normal"` // Normal/typical value (good performance)
Caution float64 `json:"caution"` // Caution threshold (concerning but not critical)
Alert float64 `json:"alert"` // Alert threshold (requires attention)
}
Metric defines thresholds for a performance metric used in job classification and alerts. Thresholds help categorize job performance: peak (excellent), normal (good), caution (concerning), alert (problem).
type MetricConfig ¶
type MetricConfig struct {
Metric // Embedded metric thresholds
Energy string `json:"energy"` // Energy measurement method
Scope MetricScope `json:"scope"` // Metric scope (node, socket, core, etc.)
Aggregation string `json:"aggregation"` // Aggregation function (avg, sum, min, max)
Footprint string `json:"footprint,omitempty"` // Footprint category
SubClusters []*SubClusterConfig `json:"subClusters,omitempty"` // Subcluster-specific overrides
Timestep int `json:"timestep"` // Measurement interval in seconds
Restrict bool `json:"restrict"` // Restrict visibility to non-user roles
LowerIsBetter bool `json:"lowerIsBetter"` // Whether lower values are better
}
MetricConfig defines the configuration for a performance metric at the cluster level. Specifies how the metric is collected, aggregated, and evaluated across the cluster.
type MetricScope ¶
type MetricScope string
MetricScope defines the hierarchical level at which a metric is measured.
Scopes form a hierarchy from coarse-grained (node) to fine-grained (hwthread/accelerator):
node > socket > memoryDomain > core > hwthread
accelerator is a special scope at the same level as hwthread
The scopePrecedence map defines numeric ordering for scope comparisons, which is used when aggregating metrics across different scopes.
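A precedence map of this kind might be sketched as follows. The numeric values are illustrative assumptions (only their relative order follows the hierarchy stated above), and `coarserThan` and `coarsest` are hypothetical helpers, not the package's LT, LTE, or Max methods.

```go
package main

import "fmt"

// MetricScope is a local stand-in for the package type of the same name.
type MetricScope string

const (
	MetricScopeNode         MetricScope = "node"
	MetricScopeSocket       MetricScope = "socket"
	MetricScopeMemoryDomain MetricScope = "memoryDomain"
	MetricScopeCore         MetricScope = "core"
	MetricScopeHWThread     MetricScope = "hwthread"
	MetricScopeAccelerator  MetricScope = "accelerator"
)

// precedence assigns larger numbers to coarser scopes. The values are
// assumptions; accelerator shares the hwthread level as described in the
// text. Unknown scopes map to 0 and therefore rank below everything.
var precedence = map[MetricScope]int{
	MetricScopeNode:         5,
	MetricScopeSocket:       4,
	MetricScopeMemoryDomain: 3,
	MetricScopeCore:         2,
	MetricScopeHWThread:     1,
	MetricScopeAccelerator:  1,
}

// coarserThan reports whether a is measured at a coarser level than b.
func coarserThan(a, b MetricScope) bool {
	return precedence[a] > precedence[b]
}

// coarsest returns the coarser of two scopes, preferring a on ties.
func coarsest(a, b MetricScope) MetricScope {
	if coarserThan(b, a) {
		return b
	}
	return a
}

func main() {
	fmt.Println(coarserThan(MetricScopeNode, MetricScopeCore))
	fmt.Println(coarsest(MetricScopeCore, MetricScopeSocket))
}
```

Comparing through a numeric map keeps the string constants free to match their JSON spelling while still allowing ordered comparisons.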
const (
MetricScopeInvalid MetricScope = "invalid_scope"
MetricScopeNode MetricScope = "node"
MetricScopeSocket MetricScope = "socket"
MetricScopeMemoryDomain MetricScope = "memoryDomain"
MetricScopeCore MetricScope = "core"
MetricScopeHWThread MetricScope = "hwthread"
MetricScopeAccelerator MetricScope = "accelerator"
)
func (*MetricScope) LT ¶
func (e *MetricScope) LT(other MetricScope) bool
func (*MetricScope) LTE ¶
func (e *MetricScope) LTE(other MetricScope) bool
func (MetricScope) MarshalGQL ¶
func (e MetricScope) MarshalGQL(w io.Writer)
func (*MetricScope) Max ¶
func (e *MetricScope) Max(other MetricScope) MetricScope
func (*MetricScope) UnmarshalGQL ¶
func (e *MetricScope) UnmarshalGQL(v any) error
func (MetricScope) Valid ¶
func (e MetricScope) Valid() bool
type MetricStatistics ¶
type MetricStatistics struct {
Avg float64 `json:"avg"` // Average/mean value
Min float64 `json:"min"` // Minimum value
Max float64 `json:"max"` // Maximum value
}
MetricStatistics holds statistical summary values for metric data. Provides the common statistical aggregations used throughout ClusterCockpit.
type MetricValue ¶
type MetricValue struct {
Unit Unit `json:"unit"` // Unit of measurement (e.g., FLOP/s, GB/s)
Value float64 `json:"value"` // Numeric value of the measurement
}
MetricValue represents a single metric measurement with its associated unit. Used for hardware performance characteristics like FLOP rates and memory bandwidth.
type MonitoringState ¶ added in v0.3.0
type MonitoringState string
MonitoringState indicates the health monitoring status of a node. Reflects whether metric collection is working correctly.
const (
MonitoringStateFull MonitoringState = "full" // All metrics being collected successfully
MonitoringStatePartial MonitoringState = "partial" // Some metrics missing
MonitoringStateFailed MonitoringState = "failed" // Metric collection failing
)
type Node ¶ added in v0.3.0
type Node struct {
Hostname string `json:"hostname"` // Node hostname
Cluster string `json:"cluster"` // Cluster name
SubCluster string `json:"subCluster"` // Subcluster name
MetaData map[string]string `json:"metaData"` // Additional metadata
NodeState SchedulerState `json:"nodeState"` // Scheduler/resource manager state
HealthState MonitoringState `json:"healthState"` // Monitoring system health
CpusAllocated int `json:"cpusAllocated"` // Number of allocated CPUs
MemoryAllocated int `json:"memoryAllocated"` // Allocated memory in MB
GpusAllocated int `json:"gpusAllocated"` // Number of allocated GPUs
JobsRunning int `json:"jobsRunning"` // Number of jobs running on this node
}
Node represents the current state and resource utilization of a compute node.
Combines scheduler state with monitoring health and current resource allocation. Used for displaying node status in dashboards and tracking node utilization.
type NodeDB ¶ added in v0.10.0
type NodeDB struct {
ID int64 `json:"id" db:"id"` // Database ID
Hostname string `json:"hostname" db:"hostname" example:"fritz"` // Node hostname
Cluster string `json:"cluster" db:"cluster" example:"fritz"` // Cluster name
SubCluster string `json:"subCluster" db:"subcluster" example:"main"` // Subcluster name
RawMetaData []byte `json:"-" db:"meta_data"` // Metadata as JSON blob
}
NodeDB is the database model for the node table. Stores static node configuration and metadata.
type NodePayload ¶ added in v0.10.0
type NodePayload struct {
Hostname string `json:"hostname"` // Node hostname
States []string `json:"states"` // State strings (flexible format)
CpusAllocated int `json:"cpusAllocated"` // Number of allocated CPUs
MemoryAllocated int64 `json:"memoryAllocated"` // Allocated memory in MB
GpusAllocated int `json:"gpusAllocated"` // Number of allocated GPUs
JobsRunning int `json:"jobsRunning"` // Number of running jobs
}
NodePayload is the request body format for the node state REST API. Used when updating node states from external monitoring or scheduler systems.
type NodeStateDB ¶ added in v0.10.0
type NodeStateDB struct {
ID int64 `json:"id" db:"id"` // Database ID
TimeStamp int64 `json:"timeStamp" db:"time_stamp" example:"1649723812"` // Unix timestamp
NodeState SchedulerState `json:"nodeState" db:"node_state" example:"completed" enums:"completed,failed,cancelled,stopped,timeout,out_of_memory"` // Scheduler state
HealthState MonitoringState `json:"healthState" db:"health_state" example:"completed" enums:"completed,failed,cancelled,stopped,timeout,out_of_memory"` // Monitoring health
CpusAllocated int `json:"cpusAllocated" db:"cpus_allocated"` // Allocated CPUs
MemoryAllocated int64 `json:"memoryAllocated" db:"memory_allocated"` // Allocated memory (MB)
GpusAllocated int `json:"gpusAllocated" db:"gpus_allocated"` // Allocated GPUs
JobsRunning int `json:"jobsRunning" db:"jobs_running" example:"12"` // Running jobs
NodeID int64 `json:"_" db:"node_id"` // Foreign key to NodeDB
}
NodeStateDB is the database model for the node_state table. Stores time-stamped snapshots of node state and resource allocation.
type Resource ¶
type Resource struct {
Hostname string `json:"hostname"` // Node hostname
Configuration string `json:"configuration,omitempty"` // Optional configuration identifier
HWThreads []int `json:"hwthreads,omitempty"` // Allocated hardware thread IDs
Accelerators []string `json:"accelerators,omitempty"` // Allocated accelerator IDs (e.g., GPU IDs)
}
Resource represents the hardware resources assigned to a job on a single compute node.
A job typically uses multiple Resource entries, one for each allocated node. HWThreads lists the specific hardware thread IDs allocated, allowing for precise CPU pinning analysis. Accelerators lists assigned GPU/accelerator IDs.
Resource model @Description A resource used by a job
type Role ¶
type Role int
Role defines the authorization level for a user in ClusterCockpit. Roles form a hierarchy with increasing privileges: Anonymous < Api < User < Manager < Support < Admin.
const (
RoleAnonymous Role = iota // Unauthenticated or guest access
RoleApi // API access (programmatic/service accounts)
RoleUser // Regular user (can view own jobs)
RoleManager // Project manager (can view project jobs)
RoleSupport // Support staff (can view all jobs, limited admin)
RoleAdmin // Full administrator access
RoleError // Invalid/error role
)
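Because the role constants are declared with iota in order of increasing privilege, the hierarchy can be compared numerically. The sketch below re-declares the constants locally; the lower-case role names and the helper `highestRole` are assumptions made for illustration and are not taken from the package source.

```go
package main

import "fmt"

// Role is a local stand-in for the package type of the same name.
type Role int

const (
	RoleAnonymous Role = iota
	RoleApi
	RoleUser
	RoleManager
	RoleSupport
	RoleAdmin
	RoleError
)

// roleNames maps assumed lower-case role names to their Role value.
var roleNames = map[string]Role{
	"anonymous": RoleAnonymous,
	"api":       RoleApi,
	"user":      RoleUser,
	"manager":   RoleManager,
	"support":   RoleSupport,
	"admin":     RoleAdmin,
}

// highestRole returns the most privileged role among the given names.
// Unknown names are ignored; an empty list yields RoleAnonymous.
func highestRole(names []string) Role {
	highest := RoleAnonymous
	for _, name := range names {
		if r, ok := roleNames[name]; ok && r > highest {
			highest = r
		}
	}
	return highest
}

func main() {
	fmt.Println(highestRole([]string{"user", "manager"}) == RoleManager)
	fmt.Println(RoleSupport < RoleAdmin)
}
```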
type SchedulerState ¶ added in v0.9.0
type SchedulerState string
SchedulerState represents the current state of a node in the HPC job scheduler. States typically reflect SLURM/PBS node states.
const (
NodeStateAllocated SchedulerState = "allocated" // Node is fully allocated to jobs
NodeStateReserved SchedulerState = "reserved" // Node is reserved but not yet allocated
NodeStateIdle SchedulerState = "idle" // Node is available for jobs
NodeStateMixed SchedulerState = "mixed" // Node is partially allocated
NodeStateDown SchedulerState = "down" // Node is down/offline
NodeStateUnknown SchedulerState = "unknown" // Node state unknown
)
type ScopedJobStats ¶ added in v0.3.0
type ScopedJobStats map[string]map[MetricScope][]*ScopedStats
ScopedJobStats maps metric names to statistical summaries organized by scope. Structure: map[metricName]map[scope][]*ScopedStats
Used to store pre-computed statistics without the full time series data, reducing memory footprint when only aggregated values are needed.
type ScopedStats ¶ added in v0.3.0
type ScopedStats struct {
Hostname string `json:"hostname"` // Source hostname
Id *string `json:"id,omitempty"` // Optional scope ID
Data *MetricStatistics `json:"data"` // Statistical summary
}
ScopedStats contains statistical summaries for a specific scope (e.g., one node, one socket). Used when full time series data isn't needed, only the aggregated statistics.
type Series ¶
type Series struct {
Id *string `json:"id,omitempty"` // Optional ID (e.g., core ID, GPU ID)
Hostname string `json:"hostname"` // Source hostname
Data []Float `json:"data"` // Time series measurements
Statistics MetricStatistics `json:"statistics"` // Statistical summary (min/avg/max)
}
Series represents a single time series of metric measurements.
Each series corresponds to one source (e.g., one node, one core) identified by Hostname and optional ID. The Data field contains the time-ordered measurements, and Statistics provides min/avg/max summaries.
func (*Series) MarshalJSON ¶
Only used via the REST API, not via GraphQL. This uses far fewer allocations per series, although the resulting performance gain turned out to be modest.
type StatsSeries ¶
type StatsSeries struct {
Percentiles map[int][]Float `json:"percentiles,omitempty"` // Percentile values over time (e.g., 10th, 50th, 90th)
Mean []Float `json:"mean"` // Mean values over time
Median []Float `json:"median"` // Median values over time
Min []Float `json:"min"` // Minimum values over time
Max []Float `json:"max"` // Maximum values over time
}
StatsSeries contains aggregated statistics across multiple time series over time.
Instead of storing individual series, this provides statistical summaries at each time step. For example, at time t, Mean[t] is the average value across all series at that time. Percentiles provides specified percentile values at each time step.
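The per-time-step aggregation described above can be sketched as follows. This is an illustration on plain float64 slices, not the package's AddStatisticsSeries method; the function name `aggregate` and the choice to skip NaN samples are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// aggregate computes, for every time step t, the minimum, mean, and maximum
// across all series. The length of the first series sets the number of time
// steps. NaN samples and series shorter than t are skipped; a step with no
// valid sample yields NaN in all three outputs.
func aggregate(series [][]float64) (minS, meanS, maxS []float64) {
	if len(series) == 0 {
		return nil, nil, nil
	}
	n := len(series[0])
	minS = make([]float64, n)
	meanS = make([]float64, n)
	maxS = make([]float64, n)
	for t := 0; t < n; t++ {
		lo, hi, sum, count := math.Inf(1), math.Inf(-1), 0.0, 0
		for _, s := range series {
			if t >= len(s) || math.IsNaN(s[t]) {
				continue
			}
			lo = math.Min(lo, s[t])
			hi = math.Max(hi, s[t])
			sum += s[t]
			count++
		}
		if count == 0 {
			minS[t], meanS[t], maxS[t] = math.NaN(), math.NaN(), math.NaN()
			continue
		}
		minS[t], meanS[t], maxS[t] = lo, sum/float64(count), hi
	}
	return minS, meanS, maxS
}

func main() {
	// Two nodes, three time steps each.
	minS, meanS, maxS := aggregate([][]float64{
		{1, 2, 3},
		{3, 4, 5},
	})
	fmt.Println(minS, meanS, maxS)
}
```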
type SubCluster ¶
type SubCluster struct {
Name string `json:"name"` // Name of the subcluster (e.g., "main", "gpu", "bigmem")
Nodes string `json:"nodes"` // Node list in condensed format (e.g., "node[001-100]")
ProcessorType string `json:"processorType"` // CPU model (e.g., "Intel Xeon Gold 6148")
Topology Topology `json:"topology"` // Hardware topology of nodes in this subcluster
FlopRateScalar MetricValue `json:"flopRateScalar"` // Theoretical scalar FLOP rate per node
FlopRateSimd MetricValue `json:"flopRateSimd"` // Theoretical SIMD FLOP rate per node
MemoryBandwidth MetricValue `json:"memoryBandwidth"` // Theoretical memory bandwidth per node
MetricConfig []MetricConfig `json:"metricConfig,omitempty"` // Subcluster-specific metric configurations
Footprint []string `json:"footprint,omitempty"` // Default footprint metrics for jobs
EnergyFootprint []string `json:"energyFootprint,omitempty"` // Energy-related footprint metrics
SocketsPerNode int `json:"socketsPerNode"` // Number of CPU sockets per node
CoresPerSocket int `json:"coresPerSocket"` // Number of cores per CPU socket
ThreadsPerCore int `json:"threadsPerCore"` // Number of hardware threads per core (SMT level)
}
SubCluster represents a homogeneous partition of a cluster with identical hardware. A cluster may contain multiple subclusters with different processor types or configurations.
type SubClusterConfig ¶
type SubClusterConfig struct {
Metric // Embedded metric thresholds
Footprint string `json:"footprint,omitempty"` // Footprint category for this metric
Energy string `json:"energy"` // Energy measurement configuration
LowerIsBetter bool `json:"lowerIsBetter"` // Whether lower values indicate better performance
Restrict bool `json:"restrict"` // Restrict visibility to non-user roles
Remove bool `json:"remove"` // Whether to exclude this metric for this subcluster
}
SubClusterConfig extends Metric with subcluster-specific metric configuration. Allows overriding metric settings for specific subclusters within a cluster.
type Tag ¶
type Tag struct {
Type string `json:"type" db:"tag_type" example:"Debug"`
Name string `json:"name" db:"tag_name" example:"Testjob"`
Scope string `json:"scope" db:"tag_scope" example:"global"`
ID int64 `json:"id" db:"id"`
}
Tag model @Description Defines a tag using name and type.
type Topology ¶
type Topology struct {
Node []int `json:"node"` // All hardware thread IDs on this node
Socket [][]int `json:"socket"` // Hardware threads grouped by socket
MemoryDomain [][]int `json:"memoryDomain"` // Hardware threads grouped by NUMA domain
Die [][]*int `json:"die,omitempty"` // Hardware threads grouped by die (optional)
Core [][]int `json:"core"` // Hardware threads grouped by core
Accelerators []*Accelerator `json:"accelerators,omitempty"` // Attached accelerators (GPUs, etc.)
}
Topology defines the hardware topology of a compute node, mapping the hierarchical relationships between hardware threads, cores, sockets, memory domains, and accelerators.
The topology is represented as nested arrays where indices represent hardware IDs:
- Node: Flat list of all hardware thread IDs on the node
- Socket: Hardware threads grouped by physical CPU socket
- Core: Hardware threads grouped by physical core
- MemoryDomain: Hardware threads grouped by NUMA domain
- Die: Optional grouping by CPU die within sockets
- Accelerators: List of attached hardware accelerators
func (*Topology) GetAcceleratorID ¶
GetAcceleratorID converts an integer accelerator index to its string ID. Returns an error if the index is out of range.
func (*Topology) GetAcceleratorIDs ¶
GetAcceleratorIDs returns a list of all accelerator IDs as strings.
func (*Topology) GetAcceleratorIDsAsInt ¶
GetAcceleratorIDsAsInt attempts to convert all accelerator IDs to integers. Returns an error if any accelerator ID is not a valid integer. This method assumes accelerator IDs are numeric strings.
func (*Topology) GetCoresFromHWThreads ¶
GetCoresFromHWThreads returns core IDs that contain any of the given hardware threads. The exclusive return value is true if all hardware threads in the returned cores are present in the input list (i.e., the job has exclusive access to those cores).
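The documented semantics of the exclusive flag can be illustrated with a small sketch. It follows the description above rather than the package's actual implementation; `coresFromHWThreads` and the reduced Topology type are hypothetical stand-ins.

```go
package main

import "fmt"

// Topology is a reduced local stand-in that keeps only the core grouping:
// Core[i] lists the hardware thread IDs belonging to core i.
type Topology struct {
	Core [][]int
}

// coresFromHWThreads returns the IDs of all cores that contain at least one
// of the given hardware threads. exclusive is true only if every hardware
// thread of every returned core appears in hwthreads.
func coresFromHWThreads(topo Topology, hwthreads []int) (cores []int, exclusive bool) {
	requested := make(map[int]bool, len(hwthreads))
	for _, t := range hwthreads {
		requested[t] = true
	}
	exclusive = true
	for coreID, threads := range topo.Core {
		used := 0
		for _, t := range threads {
			if requested[t] {
				used++
			}
		}
		if used == 0 {
			continue // this core is not touched by the job
		}
		cores = append(cores, coreID)
		if used < len(threads) {
			exclusive = false // a sibling thread is outside the allocation
		}
	}
	return cores, exclusive
}

func main() {
	// Four cores with two SMT threads each: core i owns threads i and i+4.
	topo := Topology{Core: [][]int{{0, 4}, {1, 5}, {2, 6}, {3, 7}}}

	cores, exclusive := coresFromHWThreads(topo, []int{0, 4, 1, 5})
	fmt.Println(cores, exclusive)

	cores, exclusive = coresFromHWThreads(topo, []int{0, 1})
	fmt.Println(cores, exclusive)
}
```

In the second call the job holds only one of the two threads on each core, so the cores are reported but the allocation is not exclusive.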
func (*Topology) GetMemoryDomainsFromHWThreads ¶
func (topo *Topology) GetMemoryDomainsFromHWThreads(hwthreads []int) (memDoms []int, exclusive bool)
GetMemoryDomainsFromHWThreads returns memory domain IDs that contain any of the given hardware threads. The exclusive return value is true if all hardware threads in the returned memory domains are present in the input list (i.e., the job has exclusive access to those memory domains).
func (*Topology) GetSocketsFromCores ¶ added in v0.3.0
GetSocketsFromCores returns socket IDs that contain any of the given cores. The exclusive return value is true if all hardware threads in the returned sockets belong to cores in the input list (i.e., the job has exclusive access to those sockets).
func (*Topology) GetSocketsFromHWThreads ¶
GetSocketsFromHWThreads returns socket IDs that contain any of the given hardware threads. The exclusive return value is true if all hardware threads in the returned sockets are present in the input list (i.e., the job has exclusive access to those sockets).
type Unit ¶
type Unit struct {
Base string `json:"base"` // Base unit (e.g., "B/s", "F/s", "W")
Prefix string `json:"prefix,omitempty"` // SI prefix (e.g., "G", "M", "K", "T")
}
Unit represents a unit of measurement with optional SI prefix.
Examples:
- {Base: "B/s", Prefix: "G"} = GB/s (gigabytes per second)
- {Base: "F/s", Prefix: "T"} = TF/s (teraFLOP per second)
- {Base: "", Prefix: ""} = dimensionless (e.g., CPU load)
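The examples above can be reproduced with a short sketch that renders a unit as a label and as JSON. The Unit type is re-declared locally with the same field tags; `label` and `encodeUnit` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Unit is a local copy of the documented struct, including its JSON tags.
type Unit struct {
	Base   string `json:"base"`
	Prefix string `json:"prefix,omitempty"`
}

// label renders the unit as prefix followed by base, e.g. "GB/s".
func label(u Unit) string {
	return u.Prefix + u.Base
}

// encodeUnit marshals the unit to a JSON string. An empty prefix is left
// out of the output because of the omitempty tag.
func encodeUnit(u Unit) (string, error) {
	b, err := json.Marshal(u)
	return string(b), err
}

func main() {
	bandwidth := Unit{Base: "B/s", Prefix: "G"}
	fmt.Println(label(bandwidth))

	s, err := encodeUnit(Unit{Base: "W"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```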
type User ¶
type User struct {
Username string `json:"username"` // Unique username
Password string `json:"-"` // Password hash (never serialized to JSON)
Name string `json:"name"` // Full display name
Email string `json:"email"` // Email address
Roles []string `json:"roles"` // Assigned role names
Projects []string `json:"projects"` // Authorized project/account names
AuthType AuthType `json:"authType"` // How the user authenticated
AuthSource AuthSource `json:"authSource"` // Which system authenticated the user
}
User represents a ClusterCockpit user account with authentication and authorization information.
Users are authenticated via various sources (local, LDAP, OIDC) and assigned roles that determine access levels. Projects lists the HPC projects/accounts the user has access to.
func (*User) HasAllRoles ¶
HasAllRoles reports whether the user has all of the listed roles.
func (*User) HasAnyRole ¶
HasAnyRole reports whether the user has at least one of the listed roles.
func (*User) HasNotRoles ¶
HasNotRoles reports whether the user has none of the listed roles.
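The three role checks differ only in how they combine membership tests. The sketch below shows that relationship on a reduced User type; it takes role names as strings for brevity, whereas the documented methods take []Role, and the function names and empty-query behavior are assumptions of this example.

```go
package main

import "fmt"

// User is a reduced local stand-in that keeps only the role names.
type User struct {
	Roles []string
}

// hasRole reports whether the user holds the named role.
func hasRole(u *User, role string) bool {
	for _, r := range u.Roles {
		if r == role {
			return true
		}
	}
	return false
}

// hasAny reports whether the user holds at least one of the queried roles.
func hasAny(u *User, query []string) bool {
	for _, q := range query {
		if hasRole(u, q) {
			return true
		}
	}
	return false
}

// hasAll reports whether the user holds every queried role. In this sketch
// an empty query is trivially satisfied.
func hasAll(u *User, query []string) bool {
	for _, q := range query {
		if !hasRole(u, q) {
			return false
		}
	}
	return true
}

// hasNone reports whether the user holds none of the queried roles.
func hasNone(u *User, query []string) bool {
	return !hasAny(u, query)
}

func main() {
	u := &User{Roles: []string{"user", "manager"}}
	fmt.Println(hasAny(u, []string{"admin", "manager"}))
	fmt.Println(hasAll(u, []string{"user", "admin"}))
	fmt.Println(hasNone(u, []string{"admin", "support"}))
}
```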