metricstoreclient

package
v1.5.0 Latest
Published: Mar 6, 2026 License: MIT Imports: 12 Imported by: 0

Documentation

Overview

Package metricstoreclient - Query Building

This file contains the query construction and scope transformation logic for cc-metric-store queries. It handles the complex mapping between requested metric scopes and native hardware topology, automatically aggregating or filtering metrics as needed.

Scope Transformations

The buildScopeQueries function implements the core scope transformation algorithm. It handles 25+ different transformation cases, mapping between:

  • Accelerator (GPU) scope
  • HWThread (hardware thread/SMT) scope
  • Core (CPU core) scope
  • Socket (CPU package) scope
  • MemoryDomain (NUMA domain) scope
  • Node (full system) scope

Transformations follow these rules:

  • Same scope: Return data as-is (e.g., Core → Core)
  • Coarser scope: Aggregate the finer native data (e.g., Core → Socket with Aggregate=true)
  • Finer scope: Return an error, because data cannot be split below its native granularity
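
The three rules can be read as a comparison of scope ranks. The sketch below is illustrative only: the Scope type, the rank table, and planTransform are hypothetical names, and the ordering covers just the hwthread, core, socket, and node scopes. The accelerator and memory-domain cases handled by buildScopeQueries are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Scope is a hypothetical stand-in for the package's metric scope type.
type Scope string

// rank orders a subset of scopes from finest (1) to coarsest (4).
// Assumption: accelerator and memory-domain scopes are omitted here.
var rank = map[Scope]int{"hwthread": 1, "core": 2, "socket": 3, "node": 4}

// errFiner is returned when the requested scope is finer than the native one.
var errFiner = errors.New("cannot increase granularity")

// planTransform decides how data stored at the native scope can serve a
// request. It returns aggregate=true when finer native data must be combined.
func planTransform(native, requested Scope) (aggregate bool, err error) {
	n, okN := rank[native]
	r, okR := rank[requested]
	if !okN || !okR {
		return false, fmt.Errorf("unknown scope: %q -> %q", native, requested)
	}
	switch {
	case n == r:
		return false, nil // same scope: return data as-is
	case r > n:
		return true, nil // coarser scope: aggregate
	default:
		return false, errFiner // finer scope: not possible
	}
}

func main() {
	for _, req := range []Scope{"core", "socket", "hwthread"} {
		agg, err := planTransform("core", req)
		fmt.Printf("core -> %s: aggregate=%v err=%v\n", req, agg, err)
	}
}
```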

Query Building

buildQueries and buildNodeQueries are the main entry points, handling job-specific and node-specific query construction respectively. They:

  • Validate metric configurations
  • Handle subcluster-specific metric filtering
  • Detect and skip duplicate scope requests
  • Call buildScopeQueries for each metric/scope/host combination
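
Skipping duplicate scope requests can be as simple as remembering which scopes were already handled for a metric. This is a minimal sketch, not the package's code: uniqueScopes is a hypothetical helper and scopes are plain strings here.

```go
package main

import "fmt"

// uniqueScopes returns the scopes in first-seen order with repeats dropped.
func uniqueScopes(scopes []string) []string {
	handled := make([]string, 0, len(scopes))
	seen := make(map[string]struct{}, len(scopes))
	for _, s := range scopes {
		if _, dup := seen[s]; dup {
			continue // this scope was already handled for the metric
		}
		seen[s] = struct{}{}
		handled = append(handled, s)
	}
	return handled
}

func main() {
	fmt.Println(uniqueScopes([]string{"node", "core", "node", "socket", "core"}))
}
```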

Package metricstoreclient provides a client for querying the cc-metric-store time series database.

The cc-metric-store is a high-performance time series database optimized for HPC metric data. This client handles HTTP communication, query construction, scope transformations, and data retrieval for job and node metrics across different metric scopes (node, memory domain, socket, core, hwthread, accelerator).

Architecture

The package is split into two main components:

  • Client Operations (cc-metric-store.go): HTTP client, request handling, data loading methods
  • Query Building (cc-metric-store-queries.go): Query construction and scope transformation logic

Basic Usage

store := NewCCMetricStore("http://localhost:8080", "jwt-token")

// Load job data
jobData, err := store.LoadData(job, metrics, scopes, ctx, resolution)
if err != nil {
    log.Fatal(err)
}

Metric Scopes

The client supports hierarchical metric scopes that map to HPC hardware topology:

  • MetricScopeAccelerator: GPU/accelerator level metrics
  • MetricScopeHWThread: Hardware thread (SMT) level metrics
  • MetricScopeCore: CPU core level metrics
  • MetricScopeSocket: CPU socket level metrics
  • MetricScopeMemoryDomain: NUMA domain level metrics
  • MetricScopeNode: Full node level metrics

The client automatically handles scope transformations, aggregating finer-grained metrics to coarser scopes when needed (e.g., aggregating core metrics to socket level).

Error Handling

The client supports partial errors: if some queries fail, it returns both the successfully loaded data and an error listing the failed queries. Callers can therefore process partial results when some nodes or metrics are temporarily unavailable.
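
The following sketch illustrates the calling pattern this contract implies: check the error, but still use the returned data. loadSeries is a hypothetical loader that only mimics the contract, and the host names and values are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// loadSeries is a hypothetical loader that mimics the partial-error contract:
// it returns whatever data it could get plus an error naming the failures.
func loadSeries(hosts []string, down map[string]bool) (map[string][]float64, error) {
	data := make(map[string][]float64)
	var errs []error
	for _, h := range hosts {
		if down[h] {
			errs = append(errs, fmt.Errorf("query for host %s failed", h))
			continue
		}
		data[h] = []float64{1.0, 2.0, 3.0} // placeholder values
	}
	// errors.Join returns nil when no error was collected.
	return data, errors.Join(errs...)
}

func main() {
	data, err := loadSeries(
		[]string{"node001", "node002"},
		map[string]bool{"node002": true},
	)
	if err != nil {
		// Report the failure, but keep going with what was loaded.
		fmt.Println("partial result:", err)
	}
	for host, series := range data {
		fmt.Println(host, len(series))
	}
}
```

A caller that treats every non-nil error as fatal discards the partial data, so the choice between aborting and continuing is left to the application.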

API Versioning

The client uses cc-metric-store API v2, which includes support for:

  • Data resampling for bandwidth optimization
  • Multi-scope queries in a single request
  • Aggregation across hardware topology levels

Types

type APIMetricData

type APIMetricData struct {
	Error      *string        `json:"error"`      // Error message if query failed
	Data       []schema.Float `json:"data"`       // Time series data points
	From       int64          `json:"from"`       // Actual start time of data
	To         int64          `json:"to"`         // Actual end time of data
	Resolution int            `json:"resolution"` // Actual resolution of data in seconds
	Avg        schema.Float   `json:"avg"`        // Average value across time range
	Min        schema.Float   `json:"min"`        // Minimum value in time range
	Max        schema.Float   `json:"max"`        // Maximum value in time range
}

APIMetricData represents time series data and statistics for a single metric series. Error is set if this particular series failed to load.

type APIQuery

type APIQuery struct {
	Type       *string  `json:"type,omitempty"`        // Scope type (e.g., "core", "socket")
	SubType    *string  `json:"subtype,omitempty"`     // Sub-scope type (reserved for future use)
	Metric     string   `json:"metric"`                // Metric name
	Hostname   string   `json:"host"`                  // Target hostname
	Resolution int      `json:"resolution"`            // Data resolution in seconds (0 = native)
	TypeIds    []string `json:"type-ids,omitempty"`    // IDs for the scope type (e.g., core IDs)
	SubTypeIds []string `json:"subtype-ids,omitempty"` // IDs for sub-scope (reserved)
	Aggregate  bool     `json:"aggreg"`                // Aggregate across TypeIds
}

APIQuery specifies a single metric query with optional scope filtering. Type and TypeIds define the hardware scope (core, socket, accelerator, etc.).
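
To show how the JSON tags shape the wire format, the sketch below encodes one aggregated core-scope query. The struct is copied from the definition above so the example compiles on its own; coreAggregateQuery and encode are hypothetical helpers, and the metric name and hostname are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// APIQuery mirrors the documented struct so this sketch is self-contained.
type APIQuery struct {
	Type       *string  `json:"type,omitempty"`
	SubType    *string  `json:"subtype,omitempty"`
	Metric     string   `json:"metric"`
	Hostname   string   `json:"host"`
	Resolution int      `json:"resolution"`
	TypeIds    []string `json:"type-ids,omitempty"`
	SubTypeIds []string `json:"subtype-ids,omitempty"`
	Aggregate  bool     `json:"aggreg"`
}

// coreAggregateQuery builds a query that aggregates the given core IDs.
func coreAggregateQuery(host, metric string, coreIDs []string) APIQuery {
	scope := "core"
	return APIQuery{
		Type:      &scope,
		Metric:    metric,
		Hostname:  host,
		TypeIds:   coreIDs,
		Aggregate: true,
	}
}

// encode returns the JSON form of a query.
func encode(q APIQuery) (string, error) {
	b, err := json.Marshal(q)
	return string(b), err
}

func main() {
	out, err := encode(coreAggregateQuery("node001", "flops_any", []string{"0", "1"}))
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Fields tagged omitempty (SubType and SubTypeIds here) are dropped when unset, while Resolution and Aggregate are always emitted, even at their zero values.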

type APIQueryRequest

type APIQueryRequest struct {
	Cluster     string     `json:"cluster"`       // Target cluster name
	Queries     []APIQuery `json:"queries"`       // Explicit list of metric queries
	ForAllNodes []string   `json:"for-all-nodes"` // Metrics to query for all nodes
	From        int64      `json:"from"`          // Start time (Unix timestamp)
	To          int64      `json:"to"`            // End time (Unix timestamp)
	WithStats   bool       `json:"with-stats"`    // Include min/avg/max statistics
	WithData    bool       `json:"with-data"`     // Include time series data points
}

APIQueryRequest represents a request to the cc-metric-store query API. It supports both explicit queries and "for-all-nodes" bulk queries.

type APIQueryResponse

type APIQueryResponse struct {
	Queries []APIQuery        `json:"queries,omitempty"` // Echoed queries (for bulk requests)
	Results [][]APIMetricData `json:"results"`           // Result data, indexed by query
}

APIQueryResponse contains the results from a cc-metric-store query. Results align with the Queries slice by index.
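
Because results are indexed by query and each series carries its own Error, a caller can collect failures per query index. The sketch below uses trimmed copies of the documented structs; float64 stands in for schema.Float as an assumption, so null data points are not handled, and failedSeries is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// APIMetricData is a trimmed mirror of the documented struct.
// Assumption: float64 replaces schema.Float for this sketch.
type APIMetricData struct {
	Error *string   `json:"error"`
	Data  []float64 `json:"data"`
	Avg   float64   `json:"avg"`
}

// APIQueryResponse keeps only the Results field for this sketch.
type APIQueryResponse struct {
	Results [][]APIMetricData `json:"results"`
}

// failedSeries decodes a response body and returns, keyed by query index,
// the error messages of all series that failed to load.
func failedSeries(body []byte) (map[int][]string, error) {
	var resp APIQueryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	failed := make(map[int][]string)
	for i, series := range resp.Results {
		for _, s := range series {
			if s.Error != nil {
				failed[i] = append(failed[i], *s.Error)
			}
		}
	}
	return failed, nil
}

func main() {
	body := []byte(`{"results":[[{"data":[1,2],"avg":1.5}],[{"error":"metric not found"}]]}`)
	failed, err := failedSeries(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(failed)
}
```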

type CCMetricStore

type CCMetricStore struct {
	// contains filtered or unexported fields
}

CCMetricStore is the HTTP client for communicating with cc-metric-store. It manages connection details, authentication, and provides methods for querying metrics.

func NewCCMetricStore

func NewCCMetricStore(url string, token string) *CCMetricStore

NewCCMetricStore creates and initializes a new (external) CCMetricStore client. The url parameter should include the protocol and port (e.g., "http://localhost:8080"). The token parameter is a JWT used for Bearer authentication; pass an empty string if auth is disabled.
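
Bearer authentication means the token travels in the Authorization header of each request. The sketch below shows that convention with the standard library only; newAuthedRequest and the /api/example path are hypothetical and say nothing about how CCMetricStore builds its requests internally.

```go
package main

import (
	"fmt"
	"net/http"
)

// newAuthedRequest builds a request and attaches the JWT as a Bearer token.
// With an empty token no Authorization header is set (auth disabled).
func newAuthedRequest(method, url, token string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	// The path is a placeholder, not a documented endpoint.
	req, err := newAuthedRequest(http.MethodPost, "http://localhost:8080/api/example", "jwt-token")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	fmt.Println(req.Header.Get("Authorization"))
}
```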

func (*CCMetricStore) HealthCheck

func (ccms *CCMetricStore) HealthCheck(cluster string,
	nodes []string, metrics []string,
) (map[string]ms.HealthCheckResult, error)

HealthCheck queries the external cc-metric-store's health check endpoint. It sends a HealthCheckReq as the request body to /api/healthcheck and returns the per-node health check results.

func (*CCMetricStore) LoadData

func (ccms *CCMetricStore) LoadData(
	job *schema.Job,
	metrics []string,
	scopes []schema.MetricScope,
	ctx context.Context,
	resolution int,
) (schema.JobData, error)

LoadData retrieves time series data and statistics for the specified job and metrics. It queries data for the job's time range and resources, handling scope transformations automatically.

Parameters:

  • job: Job metadata including cluster, time range, and allocated resources
  • metrics: List of metric names to retrieve
  • scopes: Requested metric scopes (node, socket, core, etc.)
  • ctx: Context for cancellation and timeouts
  • resolution: Data resolution in seconds (0 for native resolution)

Returns JobData organized as: metric -> scope -> series list. Supports partial errors: returns available data even if some queries fail.

func (*CCMetricStore) LoadNodeData

func (ccms *CCMetricStore) LoadNodeData(
	cluster string,
	metrics, nodes []string,
	scopes []schema.MetricScope,
	from, to time.Time,
	ctx context.Context,
) (map[string]map[string][]*schema.JobMetric, error)

LoadNodeData retrieves current metric data for specified nodes in a cluster. Used for the Systems-View Node-Overview to display real-time node status.

If nodes is nil, queries all metrics for all nodes in the cluster (bulk query). Returns data organized as: hostname -> metric -> list of JobMetric (with time series and stats).

func (*CCMetricStore) LoadNodeListData

func (ccms *CCMetricStore) LoadNodeListData(
	cluster, subCluster string,
	nodes []string,
	metrics []string,
	scopes []schema.MetricScope,
	resolution int,
	from, to time.Time,
	ctx context.Context,
) (map[string]schema.JobData, error)

LoadNodeListData retrieves node metrics for the Systems-View Node-List.

The caller passes the cluster, the subcluster, and the list of node hostnames to load. This signature has no node-name filter or paging parameters and returns no page information, so any filtering and pagination is assumed to happen before the call.

Returns:

  • Node data organized as: hostname -> JobData (metric -> scope -> series)
  • Error (may be partial error with some data returned)

func (*CCMetricStore) LoadScopedStats

func (ccms *CCMetricStore) LoadScopedStats(
	job *schema.Job,
	metrics []string,
	scopes []schema.MetricScope,
	ctx context.Context,
) (schema.ScopedJobStats, error)

LoadScopedStats retrieves statistics for job metrics across multiple scopes. Used for the Job-View Statistics Table to display per-scope breakdowns.

Returns statistics organized as: metric -> scope -> list of scoped statistics. Each scoped statistic includes hostname, hardware ID (if applicable), and min/avg/max values.

func (*CCMetricStore) LoadStats

func (ccms *CCMetricStore) LoadStats(
	job *schema.Job,
	metrics []string,
	ctx context.Context,
) (map[string]map[string]schema.MetricStatistics, error)

LoadStats retrieves min/avg/max statistics for job metrics at node scope. This is faster than LoadData when only statistical summaries are needed (no time series data).

Returns statistics organized as: metric -> hostname -> statistics.
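
A result shaped as metric -> hostname -> statistics can be walked with two map lookups. The sketch below finds the host with the highest maximum for one metric. Stats is a simplified, hypothetical stand-in for schema.MetricStatistics (its field names are an assumption), and peakHost is not part of the package.

```go
package main

import "fmt"

// Stats is a simplified stand-in for schema.MetricStatistics.
// Assumption: the real type exposes comparable min/avg/max values.
type Stats struct {
	Min, Avg, Max float64
}

// peakHost returns the host with the highest Max for the given metric.
// Ties are broken by hostname so the result does not depend on map order.
func peakHost(stats map[string]map[string]Stats, metric string) (host string, ok bool) {
	perHost, found := stats[metric]
	if !found || len(perHost) == 0 {
		return "", false
	}
	first := true
	var best float64
	for h, s := range perHost {
		if first || s.Max > best || (s.Max == best && h < host) {
			host, best, first = h, s.Max, false
		}
	}
	return host, true
}

func main() {
	stats := map[string]map[string]Stats{
		"cpu_load": {
			"node001": {Min: 0.5, Avg: 3.0, Max: 7.5},
			"node002": {Min: 1.0, Avg: 4.0, Max: 9.0},
		},
	}
	if host, ok := peakHost(stats, "cpu_load"); ok {
		fmt.Println("peak cpu_load on", host)
	}
}
```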
