dnms

command module

Version: v0.0.0-...-80b60c5 | Published: May 11, 2026 | License: MIT | Imports: 23 | Imported by: 0

Repository

github.com/jacksontj/dnms

README

Distributed Network Monitoring System

Objective

Black-box test the network from the edges/leaves of the network.

How?

Effectively a distributed ping + traceroute across the whole infrastructure, run at some interval. The data is then aggregated into a central service that exposes it through an API.

Terms

Peer: another server on the network

NetworkGraph: graph of the entire network
Route: a set of links
NetworkNode: a router in the network, i.e. something that should respond to traceroute
Link: a specific connection between two NetworkNodes

Ping: a ping with a specific source port
PingGroup: a group of pings against a specific destination

Traceroute: a traceroute with a specific source port
Traceroute Group: a group of traceroutes against a specific destination
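
The graph terms above can be sketched as a few Go types. This is only an illustration: the type and field names are hypothetical and are not taken from the dnms source, and treating a Route as an ordered hop list (with directed links derived from consecutive hops) is an assumption.

```go
package main

import "fmt"

// NetworkNode is a router that answered a traceroute probe.
// (Hypothetical shape, not the dnms type.)
type NetworkNode struct {
	Addr string // address reported by the hop
}

// Link is a connection between two NetworkNodes, identified by address.
// Direction (From -> To) is an assumption of this sketch.
type Link struct {
	From, To string
}

// Route is stored here as the ordered hops seen by one traceroute.
type Route struct {
	Hops []string
}

// Links derives the set of links a route traverses from consecutive hops.
func (r Route) Links() []Link {
	links := make([]Link, 0, len(r.Hops))
	for i := 0; i+1 < len(r.Hops); i++ {
		links = append(links, Link{From: r.Hops[i], To: r.Hops[i+1]})
	}
	return links
}

func main() {
	r := Route{Hops: []string{"10.0.0.1", "10.0.1.1", "10.0.2.1"}}
	for _, l := range r.Links() {
		fmt.Printf("%s -> %s\n", l.From, l.To)
	}
}
```

Deriving links from hops keeps a Route and its Links consistent by construction, which is convenient when the same link must be recognized across many routes.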

How do those fit together?

Any given node will know about the peers in the network. It will intermittently ping and traceroute peers in the network, and keep track of failures. In the event of a failure we'll determine what links are at fault for the disruption.
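
One way to keep track of failures per route is a fixed-size ring of recent ping outcomes from which a loss rate can be computed. The sketch below is a minimal illustration under that assumption; `lossRing` and its methods are hypothetical names, not the dnms implementation, and the ring size must be positive.

```go
package main

import "fmt"

// lossRing keeps the most recent ping outcomes for one route.
type lossRing struct {
	ok    []bool // true = ack received before the timeout
	next  int    // index the next sample is written to
	count int    // number of valid samples, at most len(ok)
}

// newLossRing allocates a ring holding up to size samples (size > 0).
func newLossRing(size int) *lossRing {
	return &lossRing{ok: make([]bool, size)}
}

// Record stores one outcome, overwriting the oldest once the ring is full.
func (r *lossRing) Record(acked bool) {
	r.ok[r.next] = acked
	r.next = (r.next + 1) % len(r.ok)
	if r.count < len(r.ok) {
		r.count++
	}
}

// LossRate returns the fraction of retained samples that were lost.
func (r *lossRing) LossRate() float64 {
	if r.count == 0 {
		return 0
	}
	lost := 0
	// Until the ring wraps, valid samples occupy indices 0..count-1;
	// afterwards every slot is valid.
	for i := 0; i < r.count; i++ {
		if !r.ok[i] {
			lost++
		}
	}
	return float64(lost) / float64(r.count)
}

func main() {
	ring := newLossRing(4)
	for _, acked := range []bool{true, false, true, true, false} {
		ring.Record(acked)
	}
	fmt.Printf("loss rate over last 4 pings: %.2f\n", ring.LossRate())
}
```

A bounded ring means old outages age out on their own, so a route that has recovered stops looking lossy after enough fresh samples.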

Implementation

The goal is to build a few layers, each of which is useful on its own.

Base parts:

Memberlist: our peers on the network to talk to
Mapper: responsible for mapping the network based on who is in the memberlist
Pinger: pings all peers in the network, specifically to hit all routes in the mapper
Aggregator: aggregate all the graph info from the members of the memberlist

Running

go build -o dnms .
./dnms -gossipAddr <advertise-ip> -peer <seed-peer:33434>

Flags:

| Flag | Default | Notes |
| --- | --- | --- |
| `-gossipAddr` | local IP | Address advertised to the memberlist gossip layer. |
| `-peer` | (none) | Seed `host:port` to join an existing cluster. Empty starts a standalone node. |
| `-aggregator` | false | Run the aggregator HTTP API in addition to the mapper. |
| `-httpAddr` | `:12345` | Bind address for the HTTP API (graph, routemap, events). |
| `-pingPort` | `33435` | UDP port for the ping/ack transport. Must be the same on every cluster node. |
| `-mapInterval` | `1s` | Pause between traceroutes inside a single source-port sweep. |
| `-mapSrcPortStart` / `-mapSrcPortEnd` | auto | Source-port range used to elicit ECMP variation. Defaults to `pingPort+1 .. +11`; dnms exits at startup with a fatal error if the range overlaps `-pingPort`. |
| `-pingPeerInterval` | `100ms` | Pause between successive peers in a ping sweep. |
| `-pingRouteInterval` | `1s` | Pause between successive routes when pinging one peer. |
| `-pingTimeout` | `1s` | Ack timeout for a single ping. |
| `-metricRingSize` | `100` | Number of recent ping samples retained per route. |
| `-httpAllowOrigin` | `*` | Value for `Access-Control-Allow-Origin`. Lock down by setting to a specific origin. |
| `-httpToken` | (none) | If non-empty, requires `Authorization: Bearer <token>` on every HTTP API request. |
| `-aggregatorToken` | (none) | Token the aggregator sends when subscribing to peers; defaults to `-httpToken`. |

Note: gossip (memberlist) still uses UDP/TCP port 33434. -pingPort is a separate socket that dnms owns directly; pings are no longer piggy-backed on the memberlist receive loop.
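
The source-port defaulting and overlap rule described for `-mapSrcPortStart` / `-mapSrcPortEnd` can be sketched as follows. This is an illustration only: `srcPortRange` is a hypothetical function, and it assumes that zero values mean "auto" and that the range is inclusive at both ends.

```go
package main

import (
	"errors"
	"fmt"
)

// srcPortRange resolves the traceroute source-port range.
// When both start and end are zero ("auto"), it defaults to
// pingPort+1 .. pingPort+11. It returns an error if the range
// is inverted or contains pingPort.
func srcPortRange(pingPort, start, end int) (int, int, error) {
	if start == 0 && end == 0 {
		start, end = pingPort+1, pingPort+11
	}
	if start > end {
		return 0, 0, errors.New("source-port range start is above end")
	}
	if pingPort >= start && pingPort <= end {
		return 0, 0, fmt.Errorf("source-port range %d-%d overlaps ping port %d",
			start, end, pingPort)
	}
	return start, end, nil
}

func main() {
	// Defaults derived from the default -pingPort.
	start, end, err := srcPortRange(33435, 0, 0)
	fmt.Println(start, end, err)

	// An explicit range that contains the ping port is rejected.
	_, _, err = srcPortRange(33435, 33430, 33440)
	fmt.Println(err)
}
```

Rejecting the overlap up front matters because a traceroute probe sent from the ping port would share a socket address with the ping/ack transport.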

HTTP API

All endpoints return JSON. Useful ones:

| Path | Description |
| --- | --- |
| `/v1/graph` | Full per-node graph (nodes + edges + routes). |
| `/v1/graph/nodes` | Just the nodes the mapper has observed. |
| `/v1/graph/edges` | Just the links. |
| `/v1/graph/edges/health` | Per-link fault attribution; see below. |
| `/v1/graph/routes` | Routes with per-route loss/latency/jitter metrics. |
| `/v1/mapper/peers` | Peers the local node knows about. |
| `/v1/mapper/routemap` | `(src,dst)` → route lookup. |
| `/v1/events/graph` | Server-Sent Events stream of graph mutations. |
| `/v1/aggregator/graph*` | Same shapes as `/v1/graph*`, but over the cluster-wide merged view. |
| `/v1/aggregator/events/graph` | SSE stream of the aggregated graph. |

Per-link fault attribution: `/v1/graph/edges/health`

Between any two peers there are usually N different paths through the intermediate network. Each route has its own observed loss rate. To figure out which hop is responsible for the loss, dnms correlates per-route loss observations with link membership:

For each link L:
  AvgLossThrough    = sample-weighted mean lossRate of routes containing L
  AvgLossNotThrough = sample-weighted mean lossRate of routes NOT containing L
  Suspicion         = AvgLossThrough − AvgLossNotThrough

The endpoint returns one row per link sorted by suspicion descending. A genuinely bad link surfaces with a high positive score because every route through it shares its loss while routes avoiding it don't; a healthy link on otherwise noisy paths lands near zero or negative.
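
The correlation above can be sketched in Go as follows. This is an illustration under stated assumptions, not the dnms code: `routeStat`, `linkHealth`, and `suspicion` are hypothetical names, links are identified by plain strings, and a link that every route traverses gets AvgLossNotThrough = 0 because there is nothing to compare against.

```go
package main

import (
	"fmt"
	"sort"
)

// routeStat is one route's observation (hypothetical shape).
type routeStat struct {
	Links    []string // identifiers of the links the route traverses
	LossRate float64  // observed loss in [0, 1]
	Samples  int      // number of ping samples behind LossRate
}

type linkHealth struct {
	Link      string
	Suspicion float64
}

// suspicion returns one row per link, sorted by suspicion descending.
func suspicion(routes []routeStat) []linkHealth {
	var totalLoss, totalN float64
	through := map[string][2]float64{} // link -> {weighted loss, samples}
	for _, r := range routes {
		n := float64(r.Samples)
		totalLoss += r.LossRate * n
		totalN += n
		seen := map[string]bool{} // count a link once per route
		for _, l := range r.Links {
			if seen[l] {
				continue
			}
			seen[l] = true
			t := through[l]
			t[0] += r.LossRate * n
			t[1] += n
			through[l] = t
		}
	}
	out := make([]linkHealth, 0, len(through))
	for l, t := range through {
		var avgThrough, avgNot float64
		if t[1] > 0 {
			avgThrough = t[0] / t[1]
		}
		// Routes NOT containing l are the totals minus the "through" part.
		if rest := totalN - t[1]; rest > 0 {
			avgNot = (totalLoss - t[0]) / rest
		}
		out = append(out, linkHealth{Link: l, Suspicion: avgThrough - avgNot})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Suspicion != out[j].Suspicion {
			return out[i].Suspicion > out[j].Suspicion
		}
		return out[i].Link < out[j].Link // stable order for ties
	})
	return out
}

func main() {
	routes := []routeStat{
		{Links: []string{"A", "B"}, LossRate: 0.5, Samples: 100},
		{Links: []string{"A", "C"}, LossRate: 0.0, Samples: 100},
		{Links: []string{"D", "B"}, LossRate: 0.5, Samples: 100},
		{Links: []string{"D", "C"}, LossRate: 0.0, Samples: 100},
	}
	for _, h := range suspicion(routes) {
		fmt.Printf("%s %+.2f\n", h.Link, h.Suspicion)
	}
}
```

In this made-up example both lossy routes share link B, so B ranks first with a suspicion of 0.5, A and D score 0 because they carry lossy and clean routes equally, and C ends up at -0.5.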

The same correlation runs on the aggregator side at /v1/aggregator/graph/edges/health, where it has access to every peer's routes — much more evidence than any single mapper sees on its own.

Documentation

Overview

TODO: separate package (to avoid namespace collisions)

Directories

Path	Synopsis
aggregator
graph	TODO: better name? network topology?
mapper