mon

v1.5.1
Published: Mar 10, 2025 License: MIT Imports: 17 Imported by: 20

README

go-mon

Go monitoring toolbox. Basic usage:

// register internal stats
mon.RegisterGcStats()

// create a rate metric backed by an exponentially weighted moving average (EWMA)
requestRate, _ := mon.GlobalRegistry.RegisterOrGet(`web.request_rate`, mon.NewEWMARate(time.Minute))
// create counter
requestCount, _ := mon.GlobalRegistry.RegisterOrGet(`web.request_count`, mon.NewRawCounter())

// create gauge
requestConcurrency, _ := mon.GlobalRegistry.RegisterOrGet(`web.concurrent_connections`, mon.NewRawGauge())
// the optional argument to the metric constructor is its unit
temp, _ := mon.GlobalRegistry.RegisterOrGet(`room`, mon.NewRawGauge("temperature"))

// update
// the value is ignored for rate metrics; each Update() always counts as a single event
requestRate.Update(1)

requestCount.Update(100)
requestConcurrency.Update(20)
temp.Update(23.4)
mon.GlobalStatus.Update(mon.Ok, "all is fine")

// publish
http.HandleFunc("/_status/health", mon.HandleHealthcheck)
http.HandleFunc("/_status/metrics", mon.HandleMetrics)

If your app requires a graceful stop, HAProxy's http-check send-state is also supported:

// publish and handle HAProxy's "X-Haproxy-Server-State" header
healthcheckHandler, haproxyStatus := mon.HandleHealthchecksHaproxy()
http.HandleFunc("/_status/health", healthcheckHandler)
// prepare to stop
mon.GlobalStatus.Update(mon.Warning, "shutting down")
// then check whether the server can shut down gracefully (i.e. HAProxy marked it as down and there are no ongoing connections)
for i := 1; i <= 10; i++ {
    if haproxyStatus.SafeToStop() {
        os.Exit(0)
    } else {
        // waiting for connections to finish
        time.Sleep(time.Second * 3)
    }
}
// and exit if it takes more than 30s as we do not expect normal requests to take that long.
os.Exit(0)

JSON returned for metrics:

{
  "metrics": {
    "gc.count": {
      "type": "c",
      "value": 0
    },
    "gc.cpu": {
      "type": "G",
      "unit": "percent",
      "value": 0
    },
    "gc.free": {
      "type": "c",
      "value": 636
    },
    "gc.heap_alloc": {
      "type": "G",
      "unit": "bytes",
      "value": 1561000
    },
    "gc.heap_idle": {
      "type": "G",
      "unit": "bytes",
      "value": 63897600
    },
    "gc.heap_inuse": {
      "type": "G",
      "unit": "bytes",
      "value": 2752512
    },
    "gc.heap_obj": {
      "type": "G",
      "value": 9459
    },
    "gc.malloc": {
      "type": "c",
      "value": 10095
    },
    "gc.mcache_inuse": {
      "type": "G",
      "unit": "bytes",
      "value": 13824
    },
    "gc.mspan_inuse": {
      "type": "G",
      "unit": "bytes",
      "value": 37544
    },
    "gc.pause": {
      "type": "C",
      "unit": "ns",
      "value": 0
    },
    "gc.stack_inuse": {
      "type": "G",
      "unit": "bytes",
      "value": 458752
    },
    "room": {
      "type": "G",
      "unit": "temperature",
      "value": 23.4
    },
    "web.concurrent_connections": {
      "type": "G",
      "value": 20
    },
    "web.request_count": {
      "type": "c",
      "value": 100
    },
    "web.request_rate": {
      "type": "G",
      "value": 9.962166085834647e-11
    }
  },
  "instance": "main",
  "interval": 10,
  "fqdn": "test.example.com",
  "ts": "2020-01-31T18:09:18.700941258+01:00"
}

Status:

{
  "state": 1,
  "name": "foobar",
  "fqdn": "test.example.com",
  "display_name": "foobar",
  "description": "do foo to bar",
  "msg": "started",
  "ok": true,
  "ts": "2020-01-31T18:19:18.347185356+01:00",
  "components": {}
}

Metrics

By default the module creates a global container for metrics, mon.GlobalRegistry; you can also create your own registry if you need to compartmentalize.

Example usage

Register GC stats

mon.RegisterGcStats()

Register some of our own metrics

mon.GlobalRegistry.Register(`web.request_rate`, mon.NewEWMARate(time.Minute))
mon.GlobalRegistry.Register(`web.request_count`, mon.NewCounter())

There is also MustRegister(), which panics if the metric already exists, and RegisterOrGet(), which returns the existing metric if it was already registered.

Read them back from the registry somewhere else in the code:

rate, _ := mon.GlobalRegistry.GetMetric(`web.request_rate`)
count, _ := mon.GlobalRegistry.GetMetric(`web.request_count`)

// and update

rate.Update(1) 
count.Update(1)

Note that all *Rate metrics ignore the update value; each Update() always counts as one request for rate calculation purposes.

Now we can expose them at a URL via the standard HTTP handler:

http.HandleFunc("/_status/metrics", mon.HandleMetrics)

Status

How it works

A Status object contains the app's name, FQDN, human-readable description, state, and zero or more components (each being its own Status object).

If there are no components, the status returns its own state.

If there is at least one, it uses a merge function to derive its state from its components. By default the merge function chooses the worst state of all of them.

States are the typical Nagios/Icinga states shifted by 1, so that 0 is not interpreted as OK:

  • 0 - Invalid
  • 1 - OK
  • 2 - Warning
  • 3 - Critical
  • 4 - Unknown
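
The shift described above can be sketched as a pair of conversion helpers. The names fromNagios and toNagios are hypothetical and are not part of the package; only the numeric values come from the list above, and the handling of out-of-range input is an assumption.

```go
package main

import "fmt"

// State mirrors the numbering listed above: 0 is reserved for "invalid".
type State uint8

const (
	Invalid  State = 0
	Ok       State = 1
	Warning  State = 2
	Critical State = 3
	Unknown  State = 4
)

// fromNagios (hypothetical helper) maps a Nagios/Icinga exit code
// (0 OK, 1 Warning, 2 Critical, 3 Unknown) to the shifted numbering.
// Assumption: anything outside 0..3 becomes Invalid.
func fromNagios(code int) State {
	if code < 0 || code > 3 {
		return Invalid
	}
	return State(code + 1)
}

// toNagios (hypothetical helper) reverses the shift. Invalid has no Nagios
// equivalent, so this sketch reports it as Unknown (3).
func toNagios(s State) int {
	if s < Ok || s > Unknown {
		return 3
	}
	return int(s) - 1
}

func main() {
	fmt.Println(fromNagios(0), fromNagios(2), toNagios(Warning))
}
```

Keeping 0 free means that a zero-valued, never-updated state cannot be mistaken for a healthy service.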

Usage

Set app's name and description.

mon.GlobalStatus.Name = "my-app"
mon.GlobalStatus.DisplayName = "My Application"
mon.GlobalStatus.Description = "My great appserver"

For simple apps, just use the global object:

mon.GlobalStatus.Update(mon.Ok,"all is fine")

For more complex ones, add components (note: do not update the parent object; it is updated with the worst status of its children):

dbState := mon.GlobalStatus.MustNewComponent("db")
cacheState := mon.GlobalStatus.MustNewComponent("cache")
// once they have started
go func() {
	for {
		if stateGood() {
			dbState.Update(mon.Ok, "running")
		} else {
			dbState.Update(mon.Critical, "error xyz")
		}
		time.Sleep(time.Second * 10)
	}
}()
...

Then publish the results:

http.HandleFunc("/_status/health", mon.HandleHealthcheck)

Or use any other router that accepts func(w http.ResponseWriter, r *http.Request):

r := gin.New()
appMetricsR := r.Group("/_status")
appMetricsR.GET("/health", gin.WrapF(mon.HandleHealthcheck))

Documentation

Constants

const (
	// 0 is invalid state because that is initial value of int variable which can mean that someone didn't bother to actually set the state
	StatusInvalid = iota
	// nagios-compatible block
	// to get compatible state take nagios one and add +1
	// Service is ok
	StatusOk
	// service is in warning state
	// should be only used if service is *actually* working but have some problems that need to be resolved
	// like "disk getting full" or "worker queue is 90% busy"
	StatusWarning
	// Service is in critical state and is not performing its function
	StatusCritical
	// check failed to get status of service (for a reason other than "service is not working")
	// i.e. check itself crashed before providing any useful information about service
	StatusUnknown
)
const (
	// 0 is invalid state because that is initial value of int variable which can mean that someone didn't bother to actually set the state
	HostInvalid = iota
	// host (as in "unit running service checks") is up
	HostUp
	// host is directly unavailable
	HostDown
	// host is down because its parent is down (it is impossible to check because the device that connects to the host is unavailable)
	HostUnreachable
)
const (
	MetricTypeGauge        = `G` // float64 gauge
	MetricTypeGaugeInt     = `g` // int64 gauge
	MetricTypeCounter      = `c` // int64 counter
	MetricTypeCounterFloat = `C` // float64 counter
)
const Critical = State(3)
const Invalid = State(0)
const Ok = State(1)
const StateCritical = State(3)
const StateInvalid = State(0)
const StateOk = State(1)
const StateUnknown = State(4)
const StateWarning = State(2)
const Unknown = State(4)
const Warning = State(2)

Variables

This section is empty.

Functions

func HandleHealthcheck added in v0.0.2

func HandleHealthcheck(w http.ResponseWriter, req *http.Request)

HandleHealthcheck returns GlobalStatus with the appropriate HTTP code
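
Because the handler has the plain func(w http.ResponseWriter, req *http.Request) shape, it plugs into any net/http mux. The sketch below uses a stand-in handler to show the idea of pairing a JSON status body with an HTTP code. The healthHandler and probe names and the trimmed status type are hypothetical, and the mapping of states to 200 and 503 is an assumption, since this page only says "appropriate HTTP code".

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// status is a trimmed-down stand-in for the package's Status type.
type status struct {
	State uint8  `json:"state"`
	Msg   string `json:"msg"`
	Ok    bool   `json:"ok"`
}

// healthHandler (hypothetical) returns a handler that serializes s as JSON.
// Assumption: states 1 (OK) and 2 (Warning) answer 200, everything else 503.
func healthHandler(s *status) http.HandlerFunc {
	return func(w http.ResponseWriter, req *http.Request) {
		code := http.StatusServiceUnavailable
		if s.State == 1 || s.State == 2 {
			code = http.StatusOK
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		_ = json.NewEncoder(w).Encode(s)
	}
}

// probe sends one in-memory GET request and returns the response code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/_status/health", nil))
	return rec.Code
}

func main() {
	s := &status{State: 1, Msg: "all is fine", Ok: true}
	mux := http.NewServeMux()
	mux.Handle("/_status/health", healthHandler(s))
	fmt.Println(probe(mux))

	s.State, s.Ok, s.Msg = 3, false, "error xyz"
	fmt.Println(probe(mux))
}
```

A load balancer or monitoring check can then rely on the status code alone, while the JSON body stays available for humans and dashboards.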

func HandleMetrics

func HandleMetrics(w http.ResponseWriter, req *http.Request)

HandleMetrics is a basic web hook that returns a JSON dump of the metrics in GlobalRegistry

func HandlePrometheus added in v1.4.0

func HandlePrometheus(w http.ResponseWriter, req *http.Request)

func RegisterGcStats

func RegisterGcStats(c ...GcStatsConfig)

func SummarizeStatusMessage added in v0.0.2

func SummarizeStatusMessage(component *map[string]*Status) (message string)

SummarizeStatusMessage generates status message based on map of components and their statuses

func WrapUint64Counter added in v0.1.0

func WrapUint64Counter(i uint64) (o int64)

WrapUint64Counter wraps an unsigned 64-bit counter into a signed 64-bit one, starting again from zero.
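
One possible reading of this wrapping is sketched below: the top bit is masked off, so the result is never negative and restarts from zero once the input passes math.MaxInt64. The function name wrapCounter is hypothetical, and the masking approach is an assumption; the package's actual implementation is not shown on this page.

```go
package main

import (
	"fmt"
	"math"
)

// wrapCounter (hypothetical) clears the highest bit of i. Values up to
// math.MaxInt64 pass through unchanged; larger values continue from zero
// instead of turning negative.
func wrapCounter(i uint64) int64 {
	return int64(i & math.MaxInt64)
}

func main() {
	fmt.Println(wrapCounter(42), wrapCounter(1<<63), wrapCounter(1<<63+7))
}
```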

Types

type ErrMetricAlreadyRegistered

type ErrMetricAlreadyRegistered struct {
	Metric string
}

func (*ErrMetricAlreadyRegistered) Error

type ErrMetricAlreadyRegisteredWrongType

type ErrMetricAlreadyRegisteredWrongType struct {
	Metric        string
	NewMetricType string
	OldMetricType string
}

func (*ErrMetricAlreadyRegisteredWrongType) Error

type ErrMetricNotFound

type ErrMetricNotFound struct {
	Metric string
}

func (*ErrMetricNotFound) Error

func (e *ErrMetricNotFound) Error() string

type GcStatsConfig added in v0.1.0

type GcStatsConfig struct {
	Interval time.Duration
	Average  bool
}

GcStats configuration. Interval is the time between probes; Average turns on EWMA for most stats, with 5x Interval as the half-life.

type GobTag added in v1.5.0

type GobTag struct {
	T map[string]string
}

type HaproxyState added in v1.2.0

type HaproxyState struct {
	State        State
	BackendName  string
	ServerName   string
	LBNodeName   string
	ServerWeight int
	TotalWeight  int
	// Current connections going to this server
	ServerCurrentConnections int
	// Current connections going to backend
	BackendCurrentConnections int
	// Requests in server queue
	Queue int
	// whether header was found
	Found bool
	TS    time.Time
	sync.RWMutex
}

func HandleHaproxyState added in v1.2.0

func HandleHaproxyState(req *http.Request) (haproxyState HaproxyState, found bool, err error)

HandleHaproxyState parses haproxy state header and returns current backend state

Example header: X-Haproxy-Server-State: UP 2/3; name=bck/srv2; node=lb1; weight=1/2; scur=13/22; qcur=
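
The header is a semicolon-separated list whose first element is the server state, followed by key=value pairs. A minimal parsing sketch follows; the serverState type and the parseServerState and splitPair helpers are hypothetical names, not the package's implementation, and empty values (like the trailing qcur= in the example) are simply skipped.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// serverState holds the fields this sketch extracts from the header value.
type serverState struct {
	Up           bool
	Backend      string
	Server       string
	Node         string
	Weight       int
	TotalWeight  int
	ServerConns  int
	BackendConns int
}

// splitPair parses "a/b" into two integers.
func splitPair(v string) (int, int, error) {
	left, right, ok := strings.Cut(v, "/")
	if !ok {
		return 0, 0, fmt.Errorf("expected a/b, got %q", v)
	}
	a, err := strconv.Atoi(left)
	if err != nil {
		return 0, 0, err
	}
	b, err := strconv.Atoi(right)
	if err != nil {
		return 0, 0, err
	}
	return a, b, nil
}

// parseServerState reads the state word and the key=value pairs.
// Unknown keys and empty values are ignored.
func parseServerState(header string) (serverState, error) {
	var s serverState
	parts := strings.Split(header, ";")
	fields := strings.Fields(parts[0])
	if len(fields) == 0 {
		return s, fmt.Errorf("empty header")
	}
	s.Up = fields[0] == "UP"
	for _, part := range parts[1:] {
		key, value, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok || value == "" {
			continue
		}
		var err error
		switch key {
		case "name":
			s.Backend, s.Server, _ = strings.Cut(value, "/")
		case "node":
			s.Node = value
		case "weight":
			s.Weight, s.TotalWeight, err = splitPair(value)
		case "scur":
			s.ServerConns, s.BackendConns, err = splitPair(value)
		}
		if err != nil {
			return s, err
		}
	}
	return s, nil
}

func main() {
	s, err := parseServerState("UP 2/3; name=bck/srv2; node=lb1; weight=1/2; scur=13/22; qcur=")
	fmt.Printf("%+v %v\n", s, err)
}
```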

func HandleHealthchecksHaproxy added in v1.2.0

func HandleHealthchecksHaproxy(emit404OnWarning ...bool) (handlerFunc func(w http.ResponseWriter, req *http.Request), haproxyState *HaproxyState)

HandleHealthchecksHaproxy returns GlobalStatus with appropriate HTTP code and handles X-Haproxy-Server-State header

func (*HaproxyState) SafeToStop added in v1.2.0

func (s *HaproxyState) SafeToStop() bool

type JSONOut

type JSONOut struct {
	Type    string      `json:"type"`
	Unit    string      `json:"unit,omitempty"`
	Invalid bool        `json:"invalid,omitempty"`
	Value   interface{} `json:"value"`
}

API-compatible JSON output structure
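
The omitempty tags explain why some entries in the metrics JSON earlier on this page have no "unit" key. The sketch below copies the JSONOut definition from above and marshals two values with the standard encoding/json package; the encode helper is a hypothetical name added for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JSONOut is copied from the type definition above.
type JSONOut struct {
	Type    string      `json:"type"`
	Unit    string      `json:"unit,omitempty"`
	Invalid bool        `json:"invalid,omitempty"`
	Value   interface{} `json:"value"`
}

// encode (hypothetical helper) marshals one entry and returns it as a string.
func encode(o JSONOut) string {
	b, err := json.Marshal(o)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// No unit set: the "unit" and "invalid" keys are left out entirely.
	fmt.Println(encode(JSONOut{Type: "c", Value: 100}))
	// Unit set: it appears between "type" and "value".
	fmt.Println(encode(JSONOut{Type: "G", Unit: "temperature", Value: 23.4}))
}
```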

type Metric

type Metric interface {
	Type() string
	Update(float64)
	Unit() string
	Value() float64
}

Single metric handler interface

func NewCounter

func NewCounter(unit ...string) Metric

func NewEWMA

func NewEWMA(halflife time.Duration, unit ...string) Metric

NewEWMA returns a new exponentially weighted moving average metric; halflife is the half-life of the stat's decay.
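
To illustrate what a half-life means here, the sketch below implements one common time-aware EWMA formulation: the old value keeps a weight of 0.5^(elapsed/halflife) and the new sample gets the rest. The ewma type, updateAt and demo are hypothetical names, and this is not necessarily the formula the package uses internally.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// ewma is a minimal time-aware exponentially weighted moving average.
type ewma struct {
	halflife time.Duration
	value    float64
	last     time.Time
	seeded   bool
}

// updateAt folds sample v, observed at time now, into the average.
// The first sample seeds the average directly.
func (e *ewma) updateAt(now time.Time, v float64) {
	if !e.seeded {
		e.value, e.last, e.seeded = v, now, true
		return
	}
	elapsed := now.Sub(e.last)
	// After exactly one half-life the old value keeps half of its weight.
	keep := math.Pow(0.5, float64(elapsed)/float64(e.halflife))
	e.value = keep*e.value + (1-keep)*v
	e.last = now
}

// demo seeds the average with 0 and then feeds 100 at one-minute steps,
// returning the average after the first and second step.
func demo() (float64, float64) {
	start := time.Date(2020, 1, 31, 18, 0, 0, 0, time.UTC)
	e := &ewma{halflife: time.Minute}
	e.updateAt(start, 0)
	e.updateAt(start.Add(time.Minute), 100)
	first := e.value
	e.updateAt(start.Add(2*time.Minute), 100)
	return first, e.value
}

func main() {
	first, second := demo()
	fmt.Println(first, second)
}
```

With a one-minute half-life, the average covers half of the remaining distance to a steady input every minute, which is why short half-lives react quickly and long ones smooth more.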

func NewEWMARate

func NewEWMARate(halflife time.Duration, unit ...string) Metric

NewEWMARate returns a new exponentially weighted moving average event rate counter. Call Update() (the value is ignored) every time an event happens to get the rate of the event.

func NewGauge added in v1.1.0

func NewGauge(unit ...string) Metric

func NewRawCounter

func NewRawCounter(unit ...string) Metric

type MetricCounter added in v1.5.0

type MetricCounter struct {
	// contains filtered or unexported fields
}

func (*MetricCounter) MarshalJSON added in v1.5.0

func (f *MetricCounter) MarshalJSON() ([]byte, error)

func (*MetricCounter) Type added in v1.5.0

func (m *MetricCounter) Type() string

func (*MetricCounter) Unit added in v1.5.0

func (m *MetricCounter) Unit() string

func (*MetricCounter) Update added in v1.5.0

func (m *MetricCounter) Update(v float64)

func (*MetricCounter) Value added in v1.5.0

func (m *MetricCounter) Value() float64

type MetricFloat

type MetricFloat struct {
	sync.Mutex
	// contains filtered or unexported fields
}

raw float metric with no backend

func (*MetricFloat) MarshalJSON

func (f *MetricFloat) MarshalJSON() ([]byte, error)

func (*MetricFloat) Type

func (f *MetricFloat) Type() string

func (*MetricFloat) Unit

func (f *MetricFloat) Unit() string

func (*MetricFloat) Update

func (f *MetricFloat) Update(value float64) (err error)

func (*MetricFloat) Value

func (f *MetricFloat) Value() float64

func (*MetricFloat) ValueRaw

func (f *MetricFloat) ValueRaw() interface{}

type MetricFloatBackend

type MetricFloatBackend struct {
	sync.Mutex
	// contains filtered or unexported fields
}

Float metric with backend.

By default the backend is updated under a mutex lock; all other locking has to be handled by the backend itself.
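
The split of responsibilities can be sketched as a wrapper that holds a lock around a backend which has no locking of its own. The interface below matches StatBackendFloat as documented on this page; maxBackend, lockedMetric and runConcurrent are hypothetical names. This sketch also locks reads, which is an assumption and goes beyond what the description above promises.

```go
package main

import (
	"fmt"
	"sync"
)

// StatBackendFloat matches the interface documented for this package.
type StatBackendFloat interface {
	Update(float64)
	Value() float64
}

// maxBackend (hypothetical) remembers the largest value it has seen.
// It does no locking of its own.
type maxBackend struct {
	max  float64
	seen bool
}

func (b *maxBackend) Update(v float64) {
	if !b.seen || v > b.max {
		b.max, b.seen = v, true
	}
}

func (b *maxBackend) Value() float64 { return b.max }

// lockedMetric (hypothetical) serializes all access to the backend.
type lockedMetric struct {
	sync.Mutex
	backend StatBackendFloat
}

func (m *lockedMetric) Update(v float64) {
	m.Lock()
	defer m.Unlock()
	m.backend.Update(v)
}

func (m *lockedMetric) Value() float64 {
	m.Lock()
	defer m.Unlock()
	return m.backend.Value()
}

// runConcurrent updates one metric from several goroutines at once and
// returns the final value.
func runConcurrent(values []float64) float64 {
	m := &lockedMetric{backend: &maxBackend{}}
	var wg sync.WaitGroup
	for _, v := range values {
		wg.Add(1)
		go func(v float64) {
			defer wg.Done()
			m.Update(v)
		}(v)
	}
	wg.Wait()
	return m.Value()
}

func main() {
	fmt.Println(runConcurrent([]float64{3, 7, 5}))
}
```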

func (*MetricFloatBackend) MarshalJSON

func (f *MetricFloatBackend) MarshalJSON() ([]byte, error)

func (*MetricFloatBackend) Type

func (f *MetricFloatBackend) Type() string

func (*MetricFloatBackend) Unit

func (f *MetricFloatBackend) Unit() string

func (*MetricFloatBackend) Update

func (f *MetricFloatBackend) Update(value float64)

func (*MetricFloatBackend) Value

func (f *MetricFloatBackend) Value() float64

type MetricGauge added in v1.5.0

type MetricGauge struct {
	// contains filtered or unexported fields
}

func (*MetricGauge) MarshalJSON added in v1.5.0

func (f *MetricGauge) MarshalJSON() ([]byte, error)

func (*MetricGauge) Type added in v1.5.0

func (m *MetricGauge) Type() string

func (*MetricGauge) Unit added in v1.5.0

func (m *MetricGauge) Unit() string

func (*MetricGauge) Update added in v1.5.0

func (m *MetricGauge) Update(v float64)

func (*MetricGauge) Value added in v1.5.0

func (m *MetricGauge) Value() float64

type MetricRawCounter added in v1.5.0

type MetricRawCounter struct {
	// contains filtered or unexported fields
}

func (*MetricRawCounter) MarshalJSON added in v1.5.0

func (f *MetricRawCounter) MarshalJSON() ([]byte, error)

func (*MetricRawCounter) Type added in v1.5.0

func (m *MetricRawCounter) Type() string

func (*MetricRawCounter) Unit added in v1.5.0

func (m *MetricRawCounter) Unit() string

func (*MetricRawCounter) Update added in v1.5.0

func (m *MetricRawCounter) Update(v float64)

func (*MetricRawCounter) Value added in v1.5.0

func (m *MetricRawCounter) Value() float64

type PrometheusHandler added in v1.4.0

type PrometheusHandler struct {
}

type Registry

type Registry struct {
	Metrics  map[string]map[string]Metric `json:"metrics"`
	Instance string                       `json:"instance"`
	Interval float64                      `json:"interval"`
	FQDN     string                       `json:"fqdn"`
	Ts       time.Time                    `json:"ts,omitempty"`
	sync.Mutex
}
var GlobalRegistry *Registry

Global registry. It uses the app's executable name as the instance and makes a best-effort guess at the FQDN. You can change those via the Set..() family of methods.

func NewRegistry added in v0.0.3

func NewRegistry(fqdn string, instance string, interval float64) (*Registry, error)

func (*Registry) GetMetric

func (r *Registry) GetMetric(name string, tags ...map[string]string) (Metric, error)

func (*Registry) GetRegistry added in v0.0.3

func (r *Registry) GetRegistry() *Registry

Returns a shallow copy of registry with current timestamp. Should be used as source for any serializer

func (*Registry) MustRegister

func (r *Registry) MustRegister(name string, metric Metric, tags ...map[string]string) Metric

MustRegister() does the same as Register() except that it panic()s if the metric already exists. It is mostly intended for package-scoped metrics declared at the top of a package, like

var request_rate = mon.GlobalRegistry.MustRegister("backend.mysql.qps", mon.NewEWMARate(time.Minute))

func (*Registry) Register

func (r *Registry) Register(name string, metric Metric, tags ...map[string]string) (Metric, error)

Register() registers the given metric, or returns an error if the name is already in use.

func (*Registry) RegisterOrGet

func (r *Registry) RegisterOrGet(name string, metric Metric, tags ...map[string]string) (Metric, error)
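
The difference between Register(), MustRegister() and RegisterOrGet() can be sketched with a small map-based registry. Everything below (registry, sum, errExists and the lowercase method names) is hypothetical and only mirrors the behaviour described on this page; tags and the metric type check suggested by ErrMetricAlreadyRegisteredWrongType are not modelled.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// metric is a reduced stand-in for the package's Metric interface.
type metric interface {
	Update(float64)
	Value() float64
}

// sum (hypothetical) adds up every value passed to Update.
type sum struct {
	mu    sync.Mutex
	total float64
}

func (s *sum) Update(v float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.total += v
}

func (s *sum) Value() float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.total
}

var errExists = errors.New("metric already registered")

// registry maps names to metrics behind a mutex.
type registry struct {
	mu      sync.Mutex
	metrics map[string]metric
}

func newRegistry() *registry {
	return &registry{metrics: map[string]metric{}}
}

// register stores m, or fails if the name is taken.
func (r *registry) register(name string, m metric) (metric, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.metrics[name]; ok {
		return nil, errExists
	}
	r.metrics[name] = m
	return m, nil
}

// registerOrGet stores m, or returns the metric already held under name.
func (r *registry) registerOrGet(name string, m metric) (metric, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if old, ok := r.metrics[name]; ok {
		return old, nil
	}
	r.metrics[name] = m
	return m, nil
}

// mustRegister panics instead of returning an error.
func (r *registry) mustRegister(name string, m metric) metric {
	got, err := r.register(name, m)
	if err != nil {
		panic(err)
	}
	return got
}

func main() {
	r := newRegistry()
	a, _ := r.registerOrGet("web.request_count", &sum{})
	b, _ := r.registerOrGet("web.request_count", &sum{})
	a.Update(1)
	b.Update(1)
	fmt.Println(a.Value()) // a and b are the same metric
	_, err := r.register("web.request_count", &sum{})
	fmt.Println(err)
}
```

The register-or-get form is convenient when several packages touch the same metric name and none of them should need to know which one runs first.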

func (*Registry) SetFQDN added in v0.0.4

func (r *Registry) SetFQDN(name string)

Set FQDN returned by registry during marshalling

func (*Registry) SetInstance

func (r *Registry) SetInstance(name string)

Set instance name returned by registry during marshalling

func (*Registry) SetInterval added in v0.0.5

func (r *Registry) SetInterval(interval float64)

func (*Registry) UpdateTs added in v0.0.3

func (r *Registry) UpdateTs()

UpdateTs updates the timestamp. It should be called before reading if a timestamp is desired in the output.

type Service

type Service struct {
	// name of the host/metahost service is running on
	Host string `json:"host"`
	// name of service
	Service string `json:"service"`
	// numeric state
	State uint8 `json:"state"`
	// timestamp of the check
	Timestamp time.Time `json:"ts"`
	// duration since last state change
	StateDuration time.Duration `json:"duration,omitempty"`
	// sub-service state
	// if a service (say a web app) has multiple internal components (for example DB backend, video transcoder etc.), this allows it to send their state upstream without multiplying the number of service checks
	// Note that their status **HAS** to be aggregated into the parent's State
	Components []Service `json:"components,omitempty"`
}

type StatBackendFloat

type StatBackendFloat interface {
	Update(float64)
	Value() float64
}

Backend interface handling a single float stat

type StatBackendInt

type StatBackendInt interface {
	Update(int64)
	Value() int64
}

backend interface handling single integer stat

type State added in v1.0.0

type State uint8

func SummarizeStatusState added in v0.0.2

func SummarizeStatusState(component *map[string]*Status) (state State)

SummarizeStatusState returns the highest state (critical > unknown > warning > ok) of the underlying status map
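
Note that this ordering is not the numeric order of the constants: Unknown is 4 but ranks below Critical, which is 3. A minimal sketch of such a merge uses an explicit severity rank. The severity and worst helpers are hypothetical names, and ranking Invalid below everything else (so it is returned only when nothing better is present) is an assumption.

```go
package main

import "fmt"

// State and its values mirror the constants documented above.
type State uint8

const (
	Invalid  State = 0
	Ok       State = 1
	Warning  State = 2
	Critical State = 3
	Unknown  State = 4
)

// severity ranks states as critical > unknown > warning > ok.
// Assumption: Invalid and unrecognized values rank lowest.
func severity(s State) int {
	switch s {
	case Ok:
		return 1
	case Warning:
		return 2
	case Unknown:
		return 3
	case Critical:
		return 4
	}
	return 0
}

// worst returns the most severe state among the components,
// or Invalid when there are none.
func worst(components map[string]State) State {
	result := Invalid
	for _, s := range components {
		if severity(s) > severity(result) {
			result = s
		}
	}
	return result
}

func main() {
	fmt.Println(worst(map[string]State{"db": Ok, "cache": Warning}))
	fmt.Println(worst(map[string]State{"db": Unknown, "cache": Critical}))
}
```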

type Status

type Status struct {
	State State `json:"state"`
	// Canonical service name (required)
	Name string `json:"name"`
	// FQDN
	FQDN string `json:"fqdn,omitempty"`
	// Pretty display name of service
	DisplayName string `json:"display_name,omitempty"`
	// Description of service
	Description string `json:"description,omitempty"`
	// status check message
	Msg string `json:"msg"`
	// data format initialization canary.
	// Proper implementation will set ok to true if status is really okay
	// but fresh (all fields zero) object will be invalid (state = 0 but ok = false)
	// and that can be detected upstream.
	// Other function is to allow just checking one bool flag to decide if it is ok or not
	Ok         bool               `json:"ok"`
	Ts         time.Time          `json:"ts"`
	Components map[string]*Status `json:"components,omitempty"`

	//
	sync.RWMutex
	// contains filtered or unexported fields
}
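
The "ok" canary described in the struct comments can be checked on the consumer side. The sketch below decodes a status payload and compares "ok" with "state". The statusDoc type, the classify helper and the label strings are hypothetical, and the rule that "ok" is true only for state 1 is an assumption based on the comment above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusDoc holds only the two fields needed for the canary check.
type statusDoc struct {
	State uint8 `json:"state"`
	Ok    bool  `json:"ok"`
}

// classify (hypothetical) labels a JSON status payload.
// A payload with all fields at zero is reported as uninitialized.
func classify(payload []byte) (string, error) {
	var d statusDoc
	if err := json.Unmarshal(payload, &d); err != nil {
		return "", err
	}
	switch {
	case d.State == 0:
		return "uninitialized", nil
	case d.Ok != (d.State == 1):
		return "inconsistent", nil
	case d.Ok:
		return "ok", nil
	default:
		return "not ok", nil
	}
}

func main() {
	for _, p := range []string{
		`{}`,
		`{"state":1,"ok":true}`,
		`{"state":3,"ok":false}`,
		`{"state":3,"ok":true}`,
	} {
		label, err := classify([]byte(p))
		fmt.Println(p, label, err)
	}
}
```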

Status forms a hierarchical structure. A parent's status code and message are always generated from the status of its children, so running an update on it is pointless.

var GlobalStatus *Status

func NewStatus added in v0.0.2

func NewStatus(name string, p ...string) *Status

NewStatus creates a new status object with state set to unknown. Optional parameters are:

  • display name
  • description

func (*Status) GetMessage added in v0.0.2

func (s *Status) GetMessage() string

update and return message

func (*Status) GetOK added in v1.3.3

func (s *Status) GetOK() bool

func (*Status) GetState added in v0.0.2

func (s *Status) GetState() State

update and return state

func (*Status) MustNewComponent added in v1.0.1

func (s *Status) MustNewComponent(name string, p ...string) *Status

func (*Status) MustUpdate added in v1.0.1

func (s *Status) MustUpdate(status State, message string)

func (*Status) NewComponent added in v0.0.2

func (s *Status) NewComponent(name string, p ...string) (*Status, error)

NewComponent adds a new child component to the Status. Optional parameters are:

  • display name
  • description

func (*Status) Update added in v0.0.2

func (s *Status) Update(status State, message string) error

Update updates the state of the Status component. It should be used only on a component with no children; otherwise it returns an error.
