Documentation
Overview
Package metricstoreclient: Query Building
The query-building file (cc-metric-store-queries.go) contains the query construction and scope transformation logic for cc-metric-store queries. It handles the mapping between requested metric scopes and the native hardware topology, aggregating or filtering metrics automatically as needed.
Scope Transformations
The buildScopeQueries function implements the core scope transformation algorithm. It handles 25+ different transformation cases, mapping between:
- Accelerator (GPU) scope
- HWThread (hardware thread/SMT) scope
- Core (CPU core) scope
- Socket (CPU package) scope
- MemoryDomain (NUMA domain) scope
- Node (full system) scope
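Transformations follow these rules:
- Same scope: the requested scope equals the native scope, so data is returned as-is (e.g., Core → Core)
- Coarser scope: data is aggregated (e.g., Core → Socket, queried with Aggregate=true)
- Finer scope: an error is returned, because data cannot be made more granular than its native scope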
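The three rules can be sketched in Go as follows. This is a minimal illustration, not the package's buildScopeQueries: Scope, Transform and classify are hypothetical names, the scope ordering is assumed to be a simplified linear chain (hwthread < core < memory domain < socket < node), and accelerator scope is left out.

```go
package main

import "fmt"

// Scope is a hypothetical stand-in for schema.MetricScope. The constants
// assume a simplified linear hierarchy, ordered from finest to coarsest;
// accelerator scope is omitted because it does not fit this chain.
type Scope int

const (
	HWThread Scope = iota
	Core
	MemoryDomain
	Socket
	Node
)

// Transform names what must happen to serve a requested scope from data
// stored at its native scope.
type Transform int

const (
	AsIs      Transform = iota // same scope: return series unchanged
	Aggregate                  // coarser scope: aggregate (Aggregate=true in the query)
	Invalid                    // finer scope: granularity cannot be increased
)

// classify applies the three transformation rules to one native/requested pair.
func classify(native, requested Scope) (Transform, error) {
	switch {
	case requested == native:
		return AsIs, nil
	case requested > native:
		return Aggregate, nil
	default:
		return Invalid, fmt.Errorf("cannot derive finer scope %d from native scope %d", requested, native)
	}
}

func main() {
	t, err := classify(Core, Socket)
	fmt.Println(t == Aggregate, err) // true <nil>
	_, err = classify(Socket, Core)
	fmt.Println(err != nil) // true
}
```

A single integer rank is a simplification: in practice the relation between scopes such as memory domain and socket may depend on the node topology.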
Query Building
buildQueries and buildNodeQueries are the main entry points, handling job-specific and node-specific query construction respectively. They:
- Validate metric configurations
- Handle subcluster-specific metric filtering
- Detect and skip duplicate scope requests
- Call buildScopeQueries for each metric/scope/host combination
Package metricstoreclient provides a client for querying the cc-metric-store time series database.
The cc-metric-store is a high-performance time series database optimized for HPC metric data. This client handles HTTP communication, query construction, scope transformations, and data retrieval for job and node metrics across different metric scopes (node, socket, memory domain, core, hwthread, accelerator).
Architecture
The package is split into two main components:
- Client Operations (cc-metric-store.go): HTTP client, request handling, data loading methods
- Query Building (cc-metric-store-queries.go): Query construction and scope transformation logic
Basic Usage
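store := NewCCMetricStore("http://localhost:8080", "jwt-token")
// Load job data; job, metrics, scopes, ctx and resolution are supplied by the caller
jobData, err := store.LoadData(job, metrics, scopes, ctx, resolution)
if err != nil {
	// err may be a partial error: jobData can still hold the series that loaded
	log.Fatal(err)
}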
Metric Scopes
The client supports hierarchical metric scopes that map to HPC hardware topology:
- MetricScopeAccelerator: GPU/accelerator level metrics
- MetricScopeHWThread: Hardware thread (SMT) level metrics
- MetricScopeCore: CPU core level metrics
- MetricScopeSocket: CPU socket level metrics
- MetricScopeMemoryDomain: NUMA domain level metrics
- MetricScopeNode: Full node level metrics
The client automatically handles scope transformations, aggregating finer-grained metrics to coarser scopes when needed (e.g., aggregating core metrics to socket level).
Error Handling
The client supports partial errors - if some queries fail, it returns both the successful data and an error listing the failed queries. This allows processing partial results when some nodes or metrics are temporarily unavailable.
API Versioning
The client uses cc-metric-store API v2, which includes support for:
- Data resampling for bandwidth optimization
- Multi-scope queries in a single request
- Aggregation across hardware topology levels
Index
- type APIMetricData
- type APIQuery
- type APIQueryRequest
- type APIQueryResponse
- type CCMetricStore
  - func NewCCMetricStore(url string, token string) *CCMetricStore
- func (ccms *CCMetricStore) HealthCheck(cluster string, nodes []string, metrics []string) (map[string]ms.HealthCheckResult, error)
- func (ccms *CCMetricStore) LoadData(job *schema.Job, metrics []string, scopes []schema.MetricScope, ...) (schema.JobData, error)
- func (ccms *CCMetricStore) LoadNodeData(cluster string, metrics, nodes []string, scopes []schema.MetricScope, ...) (map[string]map[string][]*schema.JobMetric, error)
- func (ccms *CCMetricStore) LoadNodeListData(cluster, subCluster string, nodes []string, metrics []string, ...) (map[string]schema.JobData, error)
- func (ccms *CCMetricStore) LoadScopedStats(job *schema.Job, metrics []string, scopes []schema.MetricScope, ...) (schema.ScopedJobStats, error)
- func (ccms *CCMetricStore) LoadStats(job *schema.Job, metrics []string, ctx context.Context) (map[string]map[string]schema.MetricStatistics, error)
Types
type APIMetricData
type APIMetricData struct {
	Error      *string        `json:"error"`      // Error message if query failed
	Data       []schema.Float `json:"data"`       // Time series data points
	From       int64          `json:"from"`       // Actual start time of data
	To         int64          `json:"to"`         // Actual end time of data
	Resolution int            `json:"resolution"` // Actual resolution of data in seconds
	Avg        schema.Float   `json:"avg"`        // Average value across time range
	Min        schema.Float   `json:"min"`        // Minimum value in time range
	Max        schema.Float   `json:"max"`        // Maximum value in time range
}
APIMetricData represents time series data and statistics for a single metric series. Error is set if this particular series failed to load.
type APIQuery
type APIQuery struct {
	Type       *string  `json:"type,omitempty"`        // Scope type (e.g., "core", "socket")
	SubType    *string  `json:"subtype,omitempty"`     // Sub-scope type (reserved for future use)
	Metric     string   `json:"metric"`                // Metric name
	Hostname   string   `json:"host"`                  // Target hostname
	Resolution int      `json:"resolution"`            // Data resolution in seconds (0 = native)
	TypeIds    []string `json:"type-ids,omitempty"`    // IDs for the scope type (e.g., core IDs)
	SubTypeIds []string `json:"subtype-ids,omitempty"` // IDs for sub-scope (reserved)
	Aggregate  bool     `json:"aggreg"`                // Aggregate across TypeIds
}
APIQuery specifies a single metric query with optional scope filtering. Type and TypeIds define the hardware scope (core, socket, accelerator, etc.).
type APIQueryRequest
type APIQueryRequest struct {
	Cluster     string     `json:"cluster"`       // Target cluster name
	Queries     []APIQuery `json:"queries"`       // Explicit list of metric queries
	ForAllNodes []string   `json:"for-all-nodes"` // Metrics to query for all nodes
	From        int64      `json:"from"`          // Start time (Unix timestamp)
	To          int64      `json:"to"`            // End time (Unix timestamp)
	WithStats   bool       `json:"with-stats"`    // Include min/avg/max statistics
	WithData    bool       `json:"with-data"`     // Include time series data points
}
APIQueryRequest represents a request to the cc-metric-store query API. It supports both explicit queries and "for-all-nodes" bulk queries.
type APIQueryResponse
type APIQueryResponse struct {
	Queries []APIQuery       `json:"queries,omitempty"` // Echoed queries (for bulk requests)
	Results [][]APIMetricData `json:"results"`          // Result data, indexed by query
}
APIQueryResponse contains the results from a cc-metric-store query. Results align with the Queries slice by index.
type CCMetricStore
type CCMetricStore struct {
	// contains filtered or unexported fields
}
CCMetricStore is the HTTP client for communicating with cc-metric-store. It manages connection details, authentication, and provides methods for querying metrics.
func NewCCMetricStore
func NewCCMetricStore(url string, token string) *CCMetricStore
NewCCMetricStore creates and initializes a new (external) CCMetricStore client. The url parameter should include the protocol and port (e.g., "http://localhost:8080"). The token parameter is a JWT used for Bearer authentication; pass an empty string if authentication is disabled.
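One way a client can attach the JWT is a standard "Authorization: Bearer <token>" header, left out when the token is empty. newPost is a hypothetical helper, not part of the package; the sketch reuses the /api/healthcheck path mentioned for HealthCheck and runs against a local httptest server standing in for cc-metric-store.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newPost builds a JSON POST request and, only if token is non-empty,
// adds a Bearer Authorization header (hypothetical helper).
func newPost(baseURL, path, token string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+path, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	// A local test server echoes the Authorization header back.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Header.Get("Authorization"))
	}))
	defer srv.Close()

	req, err := newPost(srv.URL, "/api/healthcheck", "jwt-token", []byte(`{}`))
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // Bearer jwt-token
}
```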
func (*CCMetricStore) HealthCheck
func (ccms *CCMetricStore) HealthCheck(
	cluster string,
	nodes []string,
	metrics []string,
) (map[string]ms.HealthCheckResult, error)
HealthCheck queries the external cc-metric-store's health check endpoint. It sends a HealthCheckReq as the request body to /api/healthcheck and returns the per-node health check results.
func (*CCMetricStore) LoadData
func (ccms *CCMetricStore) LoadData(
	job *schema.Job,
	metrics []string,
	scopes []schema.MetricScope,
	ctx context.Context,
	resolution int,
) (schema.JobData, error)
LoadData retrieves time series data and statistics for the specified job and metrics. It queries data for the job's time range and resources, handling scope transformations automatically.
Parameters:
- job: Job metadata including cluster, time range, and allocated resources
- metrics: List of metric names to retrieve
- scopes: Requested metric scopes (node, socket, core, etc.)
- ctx: Context for cancellation and timeouts
- resolution: Data resolution in seconds (0 for native resolution)
Returns JobData organized as: metric -> scope -> series list. Supports partial errors: returns available data even if some queries fail.
func (*CCMetricStore) LoadNodeData
func (ccms *CCMetricStore) LoadNodeData(
	cluster string,
	metrics, nodes []string,
	scopes []schema.MetricScope,
	from, to time.Time,
	ctx context.Context,
) (map[string]map[string][]*schema.JobMetric, error)
LoadNodeData retrieves metric data for the specified nodes of a cluster within the time range from..to. It is used by the Systems-View Node-Overview to display near real-time node status.
If nodes is nil, the requested metrics are queried for all nodes in the cluster with a single bulk (for-all-nodes) request. Returns data organized as: hostname -> metric -> list of JobMetric (with time series and stats).
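The nil-nodes branch can be sketched as follows. nodeRequest is a hypothetical helper with trimmed request types; it shows the distinction between one for-all-nodes bulk request and one explicit query per node and metric, not the package's exact query layout (scopes, resolution and the time range are omitted).

```go
package main

import "fmt"

// Trimmed local copies of APIQuery and APIQueryRequest (hypothetical subset).
type APIQuery struct {
	Metric   string `json:"metric"`
	Hostname string `json:"host"`
}

type APIQueryRequest struct {
	Cluster     string     `json:"cluster"`
	Queries     []APIQuery `json:"queries"`
	ForAllNodes []string   `json:"for-all-nodes"`
}

// nodeRequest turns a nil node list into one bulk "for-all-nodes" request;
// otherwise it emits one explicit query per node and metric.
func nodeRequest(cluster string, metrics, nodes []string) APIQueryRequest {
	req := APIQueryRequest{Cluster: cluster}
	if nodes == nil {
		req.ForAllNodes = append(req.ForAllNodes, metrics...)
		return req
	}
	for _, node := range nodes {
		for _, metric := range metrics {
			req.Queries = append(req.Queries, APIQuery{Metric: metric, Hostname: node})
		}
	}
	return req
}

func main() {
	bulk := nodeRequest("testcluster", []string{"cpu_load"}, nil)
	explicit := nodeRequest("testcluster", []string{"cpu_load", "mem_used"}, []string{"node01", "node02"})
	fmt.Println(len(bulk.ForAllNodes), len(bulk.Queries))         // 1 0
	fmt.Println(len(explicit.ForAllNodes), len(explicit.Queries)) // 0 4
}
```

Because the check is nodes == nil, an empty but non-nil slice in this sketch produces a request with no queries rather than a bulk request.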
func (*CCMetricStore) LoadNodeListData
func (ccms *CCMetricStore) LoadNodeListData(
	cluster, subCluster string,
	nodes []string,
	metrics []string,
	scopes []schema.MetricScope,
	resolution int,
	from, to time.Time,
	ctx context.Context,
) (map[string]schema.JobData, error)
LoadNodeListData retrieves node metrics for the Systems-View Node-List.
It queries the given metrics and scopes for the listed nodes within the time range from..to, at the requested resolution in seconds (0 for native resolution). Supports filtering by subcluster.
Returns:
- Node data organized as: hostname -> JobData (metric -> scope -> series)
- Error (may be a partial error, in which case data for the successful queries is still returned)
func (*CCMetricStore) LoadScopedStats
func (ccms *CCMetricStore) LoadScopedStats(
	job *schema.Job,
	metrics []string,
	scopes []schema.MetricScope,
	ctx context.Context,
) (schema.ScopedJobStats, error)
LoadScopedStats retrieves statistics for job metrics across multiple scopes. Used for the Job-View Statistics Table to display per-scope breakdowns.
Returns statistics organized as: metric -> scope -> list of scoped statistics. Each scoped statistic includes hostname, hardware ID (if applicable), and min/avg/max values.
func (*CCMetricStore) LoadStats
func (ccms *CCMetricStore) LoadStats(
	job *schema.Job,
	metrics []string,
	ctx context.Context,
) (map[string]map[string]schema.MetricStatistics, error)
LoadStats retrieves min/avg/max statistics for job metrics at node scope. This is faster than LoadData when only statistical summaries are needed (no time series data).
Returns statistics organized as: metric -> hostname -> statistics.
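The metric -> hostname -> statistics shape can be illustrated with hypothetical stand-in types: stats replaces schema.MetricStatistics and nodeResult is a flattened node-scope result. Skipping series that report an error is a choice of this sketch, not documented behavior of LoadStats.

```go
package main

import "fmt"

// stats is a hypothetical stand-in for schema.MetricStatistics.
type stats struct{ Avg, Min, Max float64 }

// nodeResult is a hypothetical node-scope result for one (metric, host) pair.
type nodeResult struct {
	Metric, Host  string
	Avg, Min, Max float64
	Err           *string
}

// byMetricAndHost arranges node-scope statistics as metric -> hostname -> stats,
// skipping series that reported an error.
func byMetricAndHost(results []nodeResult) map[string]map[string]stats {
	out := make(map[string]map[string]stats)
	for _, r := range results {
		if r.Err != nil {
			continue
		}
		if out[r.Metric] == nil {
			out[r.Metric] = make(map[string]stats)
		}
		out[r.Metric][r.Host] = stats{Avg: r.Avg, Min: r.Min, Max: r.Max}
	}
	return out
}

func main() {
	m := byMetricAndHost([]nodeResult{
		{Metric: "cpu_load", Host: "node01", Avg: 63.5, Min: 60, Max: 64},
		{Metric: "cpu_load", Host: "node02", Avg: 12.0, Min: 0, Max: 64},
	})
	fmt.Println(m["cpu_load"]["node02"].Avg) // 12
}
```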