cpv: Collection Profiles Validator

cpv is a command line tool for working with collection profiles.
Usage
cpv expects the following set of flags.
Usage of ./cpv:
-address string
Address of the Prometheus instance. (default "http://localhost:9090")
-allow-list-file string
Path to a file containing a list of allow-listed metrics that will always be included within the extracted metrics set. Requires -profile flag to be set.
-bearer-token string
Bearer token for authentication.
-kubeconfig string
Path to kubeconfig file. Defaults to $KUBECONFIG.
-noisy
Enable noisy assumptions: interpret the absence of the collection profiles label as the default 'full' profile (when using the -status flag).
-output-cardinality
Output cardinality of all extracted metrics to a file.
-profile string
Collection profile that the command is being run for.
-quiet
Suppress all output, and use $EDITOR for generated manifests.
-rule-file string
Path to a valid rule file to extract metrics from, e.g., https://github.com/prometheus/prometheus/blob/v0.45.0/model/rulefmt/testdata/test.yaml. Requires -profile flag to be set.
-status
Report collection profiles' implementation status. -profile may be empty to report status for all profiles.
-target-selectors string
Target selectors used to extract metrics, e.g., https://github.com/prometheus/client_golang/blob/644c80d1360fb1409a3fe8dfc5bad4228f282f3b/api/prometheus/v1/api_test.go#L1007. Requires -profile flag to be set.
-validate
Validate the collection profile implementation. Requires -profile flag to be set.
-version
Print version information.
Scenarios
The utility supports many combinations of the flags above, but the scenarios below are the most common in the general workflow and are documented here to help developers get up and running quickly.
The utility can be used to extract metrics based on a set of given parameters, which include:
-allow-list-file: Path to a file containing a list of metrics that will always be included within the extracted metrics set, even if they are not present in the Prometheus instance forwarded at -address.
-rule-file: Path to a file containing a set of RuleGroups. All metrics used to define expressions within the rules will be extracted. For example, model/rulefmt/testdata/test.yaml will result in the extraction of two metrics: errors_total and requests_total.
-target-selectors: A set of constraints (resembling VectorSelectors) satisfying the matchTarget parameter in TargetsMetadata. For example, "{job=\"prometheus\", severity=\"critical\"}" will result in the extraction of all metrics present in the Prometheus instance forwarded at -address that have the job label set to prometheus and the severity label set to critical.
All these flags are mutually exclusive and require the -profile flag to be set. Once extracted, the metrics are used to generate a RelabelConfig that can be dropped into the ServiceMonitor or PodMonitor resource.
$ ./cpv -profile="$PROFILE" -rule-file="$RULE_FILE" -target-selectors="$TARGET_SELECTORS" -allow-list-file="$ALLOW_LIST_FILE"
sourcelabels:
- __name__
separator: ""
targetlabel: ""
regex: (foo|bar|...)
modulus: 0
replacement: ""
action: keep
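The step from an extracted metric set to the keep rule above can be sketched as follows. This is a minimal illustration in Python, not the utility's actual implementation: `build_keep_relabel_config` is a hypothetical helper, and it assumes the metric names are deduplicated, sorted, and joined into a single alternation matched against `__name__`.

```python
import re

# Valid Prometheus metric names match this pattern.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")


def build_keep_relabel_config(metrics):
    """Return a relabel config (as a dict) that keeps only the given metrics.

    Hypothetical helper for illustration; key names mirror the output above.
    """
    # Deduplicate and sort so the generated regex is stable across runs.
    names = sorted(set(metrics))
    if not names:
        raise ValueError("at least one metric name is required")
    for name in names:
        if not METRIC_NAME_RE.match(name):
            raise ValueError(f"invalid metric name: {name!r}")
    return {
        "sourcelabels": ["__name__"],
        "regex": "(" + "|".join(names) + ")",
        "action": "keep",
    }


config = build_keep_relabel_config(
    ["requests_total", "errors_total", "requests_total"]
)
print(config["regex"])
```

Sorting the names keeps the generated regex deterministic, which makes diffs between successive runs easier to review.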
Additionally, -output-cardinality may be specified to output the cardinality of all extracted metrics to a file, in order to better assess decisions around keeping or dropping certain metrics within the ServiceMonitor or PodMonitor resource(s) for a particular profile.
METRIC  CARDINALITY
foo     40
bar     10
...
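To make the notion of cardinality concrete, here is a small Python sketch. It assumes cardinality means the number of distinct series (unique label sets) per metric name; the input data and both function names are hypothetical and not taken from the utility.

```python
from collections import Counter


def cardinality_by_metric(series):
    """Count distinct series per metric name.

    `series` is an iterable of label dicts, each containing `__name__`.
    """
    # Identical label sets describe the same series, so count them once.
    distinct = {frozenset(labels.items()) for labels in series}
    return Counter(dict(labels)["__name__"] for labels in distinct)


def format_table(counts):
    """Render the counts as a two-column table, highest cardinality first."""
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    width = max([len("METRIC")] + [len(name) for name, _ in rows])
    lines = [f"{'METRIC'.ljust(width)}  CARDINALITY"]
    lines += [f"{name.ljust(width)}  {count}" for name, count in rows]
    return "\n".join(lines)


# Hypothetical sample data: three label sets, one of them repeated.
sample = [
    {"__name__": "foo", "job": "a"},
    {"__name__": "foo", "job": "b"},
    {"__name__": "foo", "job": "b"},
    {"__name__": "bar", "job": "a"},
]
counts = cardinality_by_metric(sample)
print(format_table(counts))
```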
Status
The utility can be used to evaluate the extent to which a collection profile has been implemented for every default ServiceMonitor or PodMonitor resource that has opted in to the Collection Profiles feature. For example, for the default Kube State Metrics ServiceMonitor (notice the explicit opt-in label), the utility sees that it has opted in to the Collection Profiles feature, checks for the presence of all corresponding SupportedNonDefaultCollectionProfiles for that ServiceMonitor, and reports the status of each (whether it exists or not).
For all profiles to be "fully implemented" (i.e., when -status is used without specifying a particular -profile=$PROFILE), every default opted-in ServiceMonitor or PodMonitor resource (i.e., one with the monitoring.openshift.io/collection-profile label set to full) must have a corresponding resource for every such profile. Here, a "corresponding resource" is a ServiceMonitor or PodMonitor resource whose metadata.name is that of its default opted-in counterpart suffixed with the profile it fulfills, and whose monitoring.openshift.io/collection-profile label is set to the profile being checked for.
For example, for an opted-in default ServiceMonitor resource with metadata.name set to kube-state-metrics and monitoring.openshift.io/collection-profile: full in its label set, the corresponding ServiceMonitor resource for the minimal profile would be kube-state-metrics-minimal. The utility uses each default resource's metadata.name as the base, checks for the presence of the corresponding resource for every profile, and reports the status of each.
$ ./cpv -profile="$PROFILE" -status
PROFILE SERVICE MONITOR POD MONITOR ERROR
$PROFILE foo-monitor not implemented
$PROFILE bar-monitor not implemented
...
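The naming convention behind this check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the utility's code: monitors are modeled as plain dicts with `name` and `labels` keys, the helper names are hypothetical, and the `etcd` monitor is made-up sample data.

```python
PROFILE_LABEL = "monitoring.openshift.io/collection-profile"
DEFAULT_PROFILE = "full"


def corresponding_name(default_name, profile):
    """Name of the resource that implements `profile` for a default monitor."""
    return f"{default_name}-{profile}"


def missing_monitors(monitors, profile):
    """Return names of corresponding resources that are not implemented."""
    # Index every monitor by (name, profile label) for quick lookups.
    present = {(m["name"], m["labels"].get(PROFILE_LABEL)) for m in monitors}
    missing = []
    for m in monitors:
        # Only default monitors that opted in (label set to "full") count.
        if m["labels"].get(PROFILE_LABEL) != DEFAULT_PROFILE:
            continue
        want = corresponding_name(m["name"], profile)
        if (want, profile) not in present:
            missing.append(want)
    return sorted(missing)


monitors = [
    {"name": "kube-state-metrics", "labels": {PROFILE_LABEL: "full"}},
    {"name": "kube-state-metrics-minimal", "labels": {PROFILE_LABEL: "minimal"}},
    {"name": "etcd", "labels": {PROFILE_LABEL: "full"}},
]
print(missing_monitors(monitors, "minimal"))
```

In this sample, `kube-state-metrics` has its `minimal` counterpart, while `etcd` does not, so only `etcd-minimal` is reported.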
Additionally, the -noisy flag may be specified to treat default ServiceMonitor or PodMonitor resources that lack the monitoring.openshift.io/collection-profile: full label as belonging to the default full profile. This is useful when the ServiceMonitor or PodMonitor resources have not yet been updated to opt in to the Collection Profiles feature.
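The effect of -noisy on how a resource's profile is resolved might look like the following Python sketch. `effective_profile` is a hypothetical helper, and the assumption is that only a missing label is affected, never an explicitly set one.

```python
PROFILE_LABEL = "monitoring.openshift.io/collection-profile"


def effective_profile(labels, noisy=False):
    """Resolve the collection profile of a monitor from its labels.

    Returns None when the monitor has not opted in and noisy is False.
    """
    profile = labels.get(PROFILE_LABEL)
    if profile is None and noisy:
        # Noisy assumption: an absent label means the default profile.
        return "full"
    return profile


print(effective_profile({}, noisy=True))
print(effective_profile({}))
print(effective_profile({PROFILE_LABEL: "minimal"}, noisy=True))
```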
Validation
The utility can be used to validate against any discrepancies that impact the specified ServiceMonitor or PodMonitor resources. For this purpose, the utility expects both the -profile flag (i.e., the profile that the validation should run against) and the -validate flag to be set. Validation reports the hierarchy of any missing metrics that the specified -profile depends on; the absence of these metrics may in turn impact the resources that depend on them.
$ ./cpv -profile="$PROFILE" -validate
$PROFILE MONITOR GROUP LOCATION RULE QUERY METRIC ERROR
etcd-minimal etcd .../openshift-etcd-operator-etcd-prometheus-rules-....yaml etcdMemberCommunicationSlow histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket{job=~".*etcd.*"}[5m])) > 0.15 etcd_network_peer_round_trip_time_seconds_bucket not loaded
...
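Conceptually, the validation compares the metrics each rule depends on with the metrics the profile actually keeps. The Python sketch below illustrates that comparison only; the data model (dicts with `group`, `rule`, `query`, and `metrics` keys) and the function name are assumptions, and extracting metric names from a query is taken as already done.

```python
def find_missing_metrics(rules, loaded_metrics):
    """Return one row per metric a rule needs but the profile does not load."""
    rows = []
    for rule in rules:
        for metric in rule["metrics"]:
            if metric not in loaded_metrics:
                rows.append(
                    (rule["group"], rule["rule"], rule["query"], metric, "not loaded")
                )
    return rows


rules = [
    {
        "group": "etcd",
        "rule": "etcdMemberCommunicationSlow",
        "query": (
            "histogram_quantile(0.99, rate("
            'etcd_network_peer_round_trip_time_seconds_bucket{job=~".*etcd.*"}[5m]'
            ")) > 0.15"
        ),
        "metrics": ["etcd_network_peer_round_trip_time_seconds_bucket"],
    },
]

# Hypothetical set of metrics kept by the profile under validation.
loaded = {"etcd_server_has_leader"}

for row in find_missing_metrics(rules, loaded):
    print(" ".join(row))
```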
License
GNU GPLv3