ServerStatus
Yet Another ServerStatus Backend, using Prometheus as the data source
Quick Start
Scraping
First, set up node-exporter on each target host, and run Prometheus or any Prometheus-compatible software such as vmagent on the host that will scrape metrics from the targets.
The region, location, and virtualization type of a target host cannot be derived from the exported metrics, so set these attributes as labels in the Prometheus scrape configs:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: node
    scrape_interval: 15s
    scrape_timeout: 15s
    static_configs:
      - targets: ['1.1.1.1:9100']
        labels:
          hostname: "host-a"
          virt_type: "kvm"
          region: "FR"
          location: "Paris"
      - targets: ['2.2.2.2:9100']
        labels:
          hostname: "host-b"
          virt_type: "kvm"
          region: "JP"
          location: "Osaka"
    metric_relabel_configs:
      - action: labeldrop
        regex: (region|location)
We recommend adding a hostname label to each target to identify hosts. The auto-generated instance label can also be used for this, and the displayed hostname can be replaced later in the ServerStatus configuration.
The labeldrop action removes the labels matched by regex from scraped metrics before they are stored. Metrics generated by Prometheus itself, such as up, are not affected and keep all target labels. It is not an elegant approach, but it serves our purpose: the dropped labels remain available on up without being stored on every scraped series.
For example, an auto-generated metric looks like this:
up{hostname="host-a", instance="1.1.1.1:9100", job="node", location="Paris", region="FR", virt_type="kvm"} 1
while a metric scraped from node-exporter looks like this:
node_boot_time_seconds{hostname="host-a", instance="1.1.1.1:9100", job="node", virt_type="kvm"} 3333
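The effect of labeldrop on a single label set can be sketched in a few lines of Python. This is only an illustration, not how Prometheus is implemented: the helper name `labeldrop` and the sample label set are assumptions, and Python's `re` module stands in for the RE2 engine that Prometheus uses. The point it shows is that the regex is matched against the whole label name, so only labels named exactly region or location are removed.

```python
import re

def labeldrop(labels, regex):
    """Return a copy of `labels` without the names that `regex` matches in full."""
    pattern = re.compile(regex)
    return {name: value for name, value in labels.items()
            if pattern.fullmatch(name) is None}

# Hypothetical label set of one scraped sample, before metric relabeling.
scraped = {
    "hostname": "host-a",
    "instance": "1.1.1.1:9100",
    "job": "node",
    "location": "Paris",
    "region": "FR",
}

stored = labeldrop(scraped, "(region|location)")
print(stored)  # {'hostname': 'host-a', 'instance': '1.1.1.1:9100', 'job': 'node'}
```

Because the match is anchored, a label such as region_code (a made-up name) would not be dropped by this pattern.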
Querying
{
  "version": 1,
  "listen": "127.0.0.1:30000",
  "refresh_interval": 120, # configuration refresh interval; the node list is reloaded automatically
  "scrape_interval": 5, # how often the Prometheus data source is queried
  "log_path": "/path/to/logdir",
  "nodes": {
    "default_data_source": "prometheus_name",
    "id_label": "hostname", # label name used to identify a host
    "mode": "AUTO", # AUTO or STATIC: AUTO discovers hosts from query results, STATIC uses the list below
    "network_overwrites": { # pre-aggregated metrics used to calculate the total network traffic
      "enable": true,
      "rx": "node_network_receive_bytes_total:30m_inc",
      "tx": "node_network_transmit_bytes_total:30m_inc",
      "align": "30m"
    },
    "list": [
      {
        "hostname": "host-a",
        "overwrites": {
          "hostname": "DisplayNameForHostA",
          "net_devices": [
            "eth4",
            "pppoe0"
          ]
        }
      },
      {
        "hostname": "host-b",
        "overwrites": {
          "hostname": "DisplayNameForHostB",
          "net_devices": [
            "eth3",
            "eth4",
            "pppoe0",
            "pppoe1"
          ],
          "billing_date": "2023-09-15T00:00:00+08:00" # network traffic resets at this day and hour each month
        }
      }
    ],
    "global_matcher": [
      {
        "label": "job",
        "op": "=",
        "value": "node"
      }
    ]
  },
  "data_sources": [
    {
      "type": "prometheus",
      "name": "prometheus_name",
      "url": "https://127.0.0.1:9090"
    }
  ]
}
If you do not add a hostname label in the Prometheus configuration, you can set id_label to instance here, use 1.1.1.1:9100 or 2.2.2.2:9100 as the hostname values in list, and set the display name through the hostname value in the overwrites section.
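To make the role of global_matcher more concrete, here is a minimal Python sketch of how such matcher entries could be turned into a PromQL selector and an instant-query URL. It is an assumption about the general idea, not the backend's actual code: the helpers `build_selector` and `build_query_url` are hypothetical, and the choice of the up metric is only for illustration. The `/api/v1/query` path is the standard Prometheus instant-query endpoint.

```python
from urllib.parse import urlencode

def build_selector(metric, matchers):
    """Render a PromQL instant-vector selector such as up{job="node"}."""
    rendered = []
    for m in matchers:
        # Escape backslashes and double quotes inside the label value.
        value = m["value"].replace("\\", "\\\\").replace('"', '\\"')
        rendered.append(m["label"] + m["op"] + '"' + value + '"')
    return metric + "{" + ",".join(rendered) + "}"

def build_query_url(base_url, promql):
    """Build a URL for the Prometheus instant-query endpoint /api/v1/query."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})

# Matcher list copied from the configuration above.
global_matcher = [{"label": "job", "op": "=", "value": "node"}]

selector = build_selector("up", global_matcher)
url = build_query_url("https://127.0.0.1:9090", selector)
print(selector)  # up{job="node"}
print(url)
```

Each series returned for such a query carries the label named by id_label (hostname or instance), which is presumably what lets results be grouped per host.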
Pre-aggregated network metrics
At the end of the month or billing cycle, calculating the total network traffic can become time-consuming, because the query has to read every raw sample in the cycle.
To mitigate this, you can use recording rules in vmalert or Prometheus to pre-aggregate the network traffic metrics. For instance, a rule that records the increase in traffic over each 30-minute range reduces the number of data points to 1/120 of the original when the scrape interval is 15 seconds.
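The 1/120 figure follows directly from the two intervals. The short Python sketch below works through the arithmetic for one time series; the 30-day billing cycle and the helper name `points_per_series` are assumptions chosen for illustration.

```python
def points_per_series(window_seconds, step_seconds):
    """Number of samples one series produces over a window at a fixed step."""
    return window_seconds // step_seconds

SCRAPE_INTERVAL = 15             # seconds between raw samples
RULE_RANGE = 30 * 60             # seconds covered by one aggregated sample
BILLING_CYCLE = 30 * 24 * 3600   # assumed 30-day billing cycle, in seconds

raw_points = points_per_series(BILLING_CYCLE, SCRAPE_INTERVAL)
aggregated_points = points_per_series(BILLING_CYCLE, RULE_RANGE)
reduction = raw_points // aggregated_points

print(raw_points, aggregated_points, reduction)  # 172800 1440 120
```

Under these assumptions, a monthly total per network device is summed from 1,440 aggregated samples instead of 172,800 raw ones.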
You can enable this feature in the network_overwrites section.
Please refer to ./doc/vm/vmalert_rule.yml for instructions on how to add recording rules.
systemd and reverse proxy config
Please refer to ./doc/ for details.