sysmon

package

v0.0.0-...-11e1b04 · Published: May 29, 2026 · License: Apache-2.0 · Imports: 13 · Imported by: 0


Repository

github.com/kernel/kernel-images


README ¶

sysmon

VM-internal failure telemetry for the kernel-images browser VM. Surfaces two event types onto the existing EventStream → SSE / S2 pipeline:

Event type	Source	Owned by
`system_oom_kill`	Linux kernel OOM-killer via `/dev/kmsg`	`lib/sysmon` (in-process goroutine)
`service_crashed`	supervisord eventlistener protocol	`cmd/supervisord-shim` (separate binary, POSTs to `/telemetry/events`)

Both paths funnel through the active TelemetrySession, which gates on the live telemetry config before forwarding to events.EventStream. Events fired while no telemetry session is active (or whose category is disabled) are dropped before reaching the stream.
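
The gating rule above can be pictured with a small stand-alone sketch. Everything in it is hypothetical: `Event`, `gate`, and the category names `system` and `browser` are stand-ins chosen for illustration, not the real `events.Event` or `TelemetrySession` types, and the real session may apply further rules.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for events.Event.
type Event struct {
	Type     string
	Category string
}

// gate is a hypothetical stand-in for TelemetrySession: it admits an event
// only while a session is active and the event's category is enabled.
type gate struct {
	active  bool
	enabled map[string]bool
}

// Publish forwards ev and returns true when the event is admitted;
// otherwise the event is dropped before it reaches the stream.
func (g *gate) Publish(ev Event, forward func(Event)) bool {
	if g == nil || !g.active || !g.enabled[ev.Category] {
		return false
	}
	forward(ev)
	return true
}

func main() {
	var seen []string
	forward := func(ev Event) { seen = append(seen, ev.Type) }
	oom := Event{Type: "system_oom_kill", Category: "system"}

	idle := &gate{} // no session configured yet
	fmt.Println("idle admits:", idle.Publish(oom, forward))

	live := &gate{active: true, enabled: map[string]bool{"system": true}}
	fmt.Println("live admits:", live.Publish(oom, forward))
	fmt.Println("forwarded:", seen)
}
```

The point of the sketch is only that a drop happens at the gate, so a producer that logs "emitted" can still have no event on the stream.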

Why two binaries

Concern	sysmon (in-process)	supervisord-shim (separate process)
Why separate	n/a	supervisord's eventlistener protocol requires a separate process talking over stdin/stdout
Triggers	kernel OOM-killer writes to `/dev/kmsg`	supervised service exits unexpectedly or FATALs
Transport	in-process `PublishFunc` wired to `TelemetrySession.Publish`	`POST /telemetry/events` over localhost HTTP
Failure mode	open of `/dev/kmsg` may fail (no CAP_SYSLOG); API logs and continues without OOM telemetry	API may be down during shim's POST (shim logs and always ACKs supervisord; event lost); or API returns `204` because telemetry is unconfigured / `service_crashed` is filtered (shim treats as success; event dropped on purpose)

Event taxonomy

`system_oom_kill`

Parsed from one kernel OOM dump in /dev/kmsg. Payload (see BrowserSystemOomKillEventData in openapi.yaml for the authoritative schema):

Field	Meaning	Absent when
`process_name`	comm of the killed process (max 15 chars, the kernel's TASK_COMM_LEN-1 limit)	never
`pid`	PID of the killed process	never
`rss_kb`	sum of anon-rss + file-rss + shmem-rss in KiB	never
`constraint`	`none` / `memcg` / `cpuset` / `memory_policy`	pre-Linux-5.0 kernels (no structured `oom-kill:` line)
`mem_total_kb`	total RAM from `N pages RAM` × 4 KiB	kernel did not emit Mem-Info (e.g. memcg OOM)
`mem_free_kb`	free RAM from `free:N free_pcp:N` × 4 KiB	as above
`top_tasks`	up to 5 processes from `Tasks state` table, sorted by RSS desc	kernel did not emit the table
`trigger_process_name`	comm of the process whose allocation triggered the OOM-killer	sysrq-triggered OOMs (no opener line)
`trigger_pid`	PID of the trigger	as above; also kernels that do not emit the `CPU: N PID: N` header
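
As an illustration of how the three always-present fields could be derived, here is a minimal sketch that parses the closing "Killed process" line and sums the three RSS classes. The regex `killedRe`, the `kill` struct, and `parseKilled` are hypothetical names for this example; the actual patterns live in `kmsg.go` and may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// killedRe is a hypothetical pattern for the closing line of an OOM dump.
// It is not the regex used in kmsg.go.
var killedRe = regexp.MustCompile(
	`Killed process (\d+) \((.*)\) total-vm:\d+kB, anon-rss:(\d+)kB, file-rss:(\d+)kB, shmem-rss:(\d+)kB`)

// kill holds the fields the table marks as never absent.
type kill struct {
	Name  string
	Pid   int
	RssKb int
}

// parseKilled extracts comm, pid, and the summed RSS in KiB.
// The regex only captures digit runs, so Atoi can fail only on overflow,
// in which case the value is left as zero.
func parseKilled(line string) (kill, bool) {
	m := killedRe.FindStringSubmatch(line)
	if m == nil {
		return kill{}, false
	}
	pid, _ := strconv.Atoi(m[1])
	rss := 0
	for _, s := range m[3:6] { // anon-rss, file-rss, shmem-rss
		n, _ := strconv.Atoi(s)
		rss += n
	}
	return kill{Name: m[2], Pid: pid, RssKb: rss}, true
}

func main() {
	line := "Out of memory: Killed process 1234 (chromium) total-vm:5234572kB, " +
		"anon-rss:4823900kB, file-rss:100kB, shmem-rss:200kB, UID:1000 " +
		"pgtables:9678848kB oom_score_adj:0"
	k, ok := parseKilled(line)
	fmt.Printf("%+v %v\n", k, ok)
}
```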

`service_crashed`

Mapped from supervisord PROCESS_STATE_EXITED (with expected=0) or PROCESS_STATE_FATAL. Schema in BrowserServiceCrashedEventData:

Field	Meaning
`service_name`	supervisord program name (e.g. `chromium`, `mutter`, `kernel-images-api`)
`pid`	live PID at exit (omitted for `gave_up` since supervisord no longer tracks one)
`phase`	`startup` (died during STARTING) / `running` (crashed after reaching RUNNING) / `gave_up` (FATAL via exhausted startretries)

Clean stops (supervisorctl stop, exit codes in the configured exitcodes list) do not produce events — supervisord marks them expected=1 and the shim skips them.
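
The mapping can be sketched as a small classifier. This README does not say how the shim tells `startup` from `running`; the sketch assumes it reads the `from_state` token that supervisord includes in process-state event payloads. `parsePayload` and `classify` are hypothetical names, not functions from `cmd/supervisord-shim`.

```go
package main

import (
	"fmt"
	"strings"
)

// parsePayload splits a supervisord event payload of space-separated
// key:value tokens into a map.
func parsePayload(s string) map[string]string {
	kv := make(map[string]string)
	for _, tok := range strings.Fields(s) {
		if k, v, ok := strings.Cut(tok, ":"); ok {
			kv[k] = v
		}
	}
	return kv
}

// classify returns the service_crashed phase for an event, or "" when no
// event should be produced. Using from_state to separate startup from
// running is an assumption of this sketch.
func classify(eventName string, kv map[string]string) string {
	switch eventName {
	case "PROCESS_STATE_FATAL":
		return "gave_up"
	case "PROCESS_STATE_EXITED":
		if kv["expected"] == "1" {
			return "" // clean stop: skipped
		}
		if kv["from_state"] == "STARTING" {
			return "startup"
		}
		return "running"
	}
	return ""
}

func main() {
	crash := parsePayload("processname:chromium groupname:chromium from_state:RUNNING expected:0 pid:2766")
	fmt.Println(classify("PROCESS_STATE_EXITED", crash))

	fatal := parsePayload("processname:flaky groupname:flaky from_state:BACKOFF")
	fmt.Println(classify("PROCESS_STATE_FATAL", fatal))
}
```

The FATAL payload carries no `pid` token, which is consistent with `pid` being omitted for `gave_up`.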

File layout

File	Concern
`sysmon.go`	`Monitor` lifecycle (Start/Wait), goroutine wiring, `publishOomKill`
`kmsg.go`	OOM-dump text parser (regex + state machine) — see file header for format compatibility notes
`kmsg_linux.go`	Linux-only `/dev/kmsg` open via `euank/go-kmsg-parser`, SeekEnd on start so we don't replay history on API restart
`kmsg_other.go`	non-Linux stub so dev machines still compile
`kmsg_test.go`	parser fixtures + tests (both pre-5.14 and post-5.14 Tasks-state layouts)
`sysmon_test.go`	end-to-end test from stub kmsg source through EventStream

The supervisord-shim lives at cmd/supervisord-shim/. Its configuration is duplicated as supervisord-shim.conf under both images/chromium-headless/image/supervisor/services/ and images/chromium-headful/supervisor/services/.

How to verify locally (Docker)

These steps reproduce the smoke matrix from PR #254. Build the container image with `cd images/chromium-headless && ./build-docker.sh`.

Heads up: every test below assumes telemetry is configured. TelemetrySession.Publish drops every event when no session is active, so without the PUT /telemetry step you'll see nothing on the SSE stream and silently get false negatives. The shared setup block does this once.

# Start the container detached (run-docker.sh hardcodes -it, so call docker run directly).
docker run -d --rm --name chromium-headless-test \
  --platform linux/amd64 --privileged --tmpfs /dev/shm:size=128m \
  -p 9222:9222 -p 444:10001 \
  onkernel/chromium-headless-test:latest

# Wait for the API.
sleep 10 && curl -sf http://localhost:444/spec.json >/dev/null && echo "API up"

# Configure telemetry so a session is active. The empty body captures every
# browser category — system events flow regardless because Start force-includes
# them. (Setting all browser categories to enabled:false is treated as
# "clear the configuration" and tears the session down — don't do that here.)
curl -sf -X PUT http://localhost:444/telemetry \
  -H 'content-type: application/json' -d '{}'

# Open the SSE stream in another shell to watch events in real time.
curl -sN http://localhost:444/telemetry/stream

service_crashed (phase=running)

# Kill the chromium browser process the launcher actually spawned.
docker exec chromium-headless-test supervisorctl signal KILL chromium
# Expect one service_crashed event with phase=running.

service_crashed (phase=gave_up)

# Install a deliberately failing service.
docker exec chromium-headless-test bash -c 'cat > /etc/supervisor/conf.d/services/flaky.conf <<EOF
[program:flaky]
command=/bin/sh -c "echo flaky starting; sleep 0.1; exit 1"
autostart=false
autorestart=true
startsecs=1
startretries=3
stdout_logfile=/var/log/supervisord/flaky
redirect_stderr=true
EOF
supervisorctl reread && supervisorctl update'

docker exec chromium-headless-test supervisorctl start flaky
# Wait ~15 s for supervisord to exhaust the 3 retries and transition to FATAL.
# Expect one service_crashed event with phase=gave_up and no pid.

Clean stop is suppressed (negative test)

docker exec chromium-headless-test supervisorctl stop chromium
# Expect NO new SSE event (only the 15 s keepalive ":" frame).

system_oom_kill (synthetic kmsg injection)

docker exec chromium-headless-test bash -c '
for line in \
  "chromium invoked oom-killer: gfp_mask=0x100cca, order=0, oom_score_adj=0" \
  "CPU: 2 PID: 9999 Comm: chromium Not tainted 5.15.0-1-amd64 #1" \
  "Mem-Info:" \
  " free:4560 free_pcp:0 free_cma:0" \
  "524288 pages RAM" \
  "Tasks state (memory values in pages):" \
  "[   1234]  1000  1234  1308611  1205975  1205675   200   100   9678848   0   0 chromium" \
  "oom-kill:constraint=CONSTRAINT_NONE,task=chromium,pid=1234,uid=1000" \
  "Out of memory: Killed process 1234 (chromium) total-vm:5234572kB, anon-rss:4823900kB, file-rss:100kB, shmem-rss:200kB, UID:1000 pgtables:9678848kB oom_score_adj:0"
do
  echo "<6>$line" > /dev/kmsg
done'
# Expect one system_oom_kill event with constraint=none, mem_total_kb=2097152,
# top_tasks[0].name="chromium", trigger_process_name="chromium".
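
The expected `mem_total_kb` follows from the page arithmetic: 524288 pages × 4 KiB = 2097152 KiB. By the same rule, `free:4560` corresponds to 18240 KiB. A minimal sketch of that conversion, using hypothetical patterns (`pagesRAMRe`, `freeRe`) rather than the ones in `kmsg.go`, and the same hard-coded 4 KiB page size the README describes:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// pageKiB mirrors the README's hard-coded 4 KiB page size (x86_64).
const pageKiB = 4

// Hypothetical patterns for the two Mem-Info lines used by the test above.
var (
	pagesRAMRe = regexp.MustCompile(`^\s*(\d+) pages RAM`)
	freeRe     = regexp.MustCompile(`\bfree:(\d+)`)
)

// pagesKiB returns the first captured page count in line converted to KiB.
func pagesKiB(re *regexp.Regexp, line string) (int, bool) {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	pages, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return pages * pageKiB, true
}

func main() {
	total, _ := pagesKiB(pagesRAMRe, "524288 pages RAM")
	free, _ := pagesKiB(freeRe, " free:4560 free_pcp:0 free_cma:0")
	fmt.Println(total, free) // prints "2097152 18240"
}
```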

system_oom_kill (real cgroup OOM)

docker rm -f chromium-headless-test
# 512 MB cap keeps the API itself alive while letting Chrome OOM.
docker run -d --rm --name chromium-headless-test \
  --platform linux/amd64 --privileged --tmpfs /dev/shm:size=128m \
  --memory 512m --memory-swap 512m \
  -p 9222:9222 -p 444:10001 \
  onkernel/chromium-headless-test:latest

# Re-apply the telemetry PUT — the respawn dropped the previous session.
sleep 10 && curl -sf -X PUT http://localhost:444/telemetry \
  -H 'content-type: application/json' -d '{}'

# Run a memory hog inside.
docker exec chromium-headless-test python3 -c '
import sys, time
chunks = []
while True:
    chunks.append(b"x" * (60 * 1024 * 1024))  # allocate 60 MiB per iteration
    sys.stdout.write(f"{len(chunks) * 60}MB\n")
    sys.stdout.flush()
    time.sleep(0.3)
'
# Expect system_oom_kill events with constraint=memcg, and mem_total_kb /
# mem_free_kb omitted (the kernel skips the global Mem-Info dump on memcg
# OOMs). Sanity-check top_tasks names: they should be single tokens
# (`chromium`, `python3`). If they include numbers or extra columns, the
# Tasks-state regex in kmsg.go needs updating for the current kernel.

How to verify in production (real Linux 6.x VM)

This procedure asserts on sysmon's internal log lines, which fire regardless of telemetry config. To also see the event reach a downstream consumer, configure telemetry on the session first (same PUT /telemetry body as the local-Docker setup, against the session's metro-api telemetry endpoint).

# Spin up a browser session.
kernel browsers create
# Note the session ID, then exec into the VM.

# Confirm sysmon is running.
kernel browsers process exec <session_id> -- /bin/bash -c \
  'tail -50 /var/log/supervisord/kernel-images-api | grep sysmon'
# Look for: "sysmon: kmsg OOM reader started"

# Trigger an OOM (kills the highest-oom_score process; expect chromium).
kernel browsers process exec <session_id> -- /bin/bash -c \
  'echo 1 > /proc/sys/kernel/sysrq; echo f > /proc/sysrq-trigger'

# Verify sysmon parsed and emitted the event.
kernel browsers process exec <session_id> -- /bin/bash -c \
  'tail -50 /var/log/supervisord/kernel-images-api | grep "sysmon: emitted system_oom_kill"'

# Clean up.
kernel browsers delete <session_id>

Known limitations

API self-crash is invisible to sysmon. If kernel-images-api itself dies, the shim's POST fails (connection refused) and that event is lost. The host platform's process/VM-level monitoring is the layer that catches it. Closing the gap inside this binary would require persistent shim-side buffering and is out of scope.
process_name is truncated to 15 chars. This is the kernel's TASK_COMM_LEN-1 limit, not a parser bug. kernel-images-api shows up as kernel-images-a.
Page size is hard-coded to 4 KiB. Correct on x86_64; would be wrong on ARM 16K/64K page kernels.
mem_total_kb / mem_free_kb are omitted on memcg OOMs. The kernel does not emit the global pages RAM / free:N lines when the OOM is cgroup-scoped. This is correct behavior, documented in the openapi schema.
trigger_* fields are absent on sysrq-triggered OOMs. The X invoked oom-killer: opener line is only emitted on allocation-driven kills. Real allocation OOMs always populate these fields; only the synthetic sysrq f test path doesn't.
No de-dup between kmsg and supervisord. If a Chrome OOM both fires kmsg (system_oom_kill) and causes supervisord to notice the exit (service_crashed), both events fire. The overlap is itself a useful signal (RAM exhaustion vs. process bug).

Where to look when things break

Symptom	First place to check
No `system_oom_kill` events in prod	API logs for `sysmon: kmsg OOM monitor disabled` — indicates `/dev/kmsg` open failed
`system_oom_kill` events have corrupt `top_tasks` names	`oomTaskEntryRe` in `kmsg.go` — kernel changed the Tasks-state column layout again
`system_oom_kill` events missing fields after a kernel upgrade	each `oom*Re` regex in `kmsg.go` — sections may have been renamed in the kernel
No `service_crashed` events	`cat /var/log/supervisord/supervisord-shim` inside the container; check for `connection refused` to the API
Shim looping (supervisord shows repeated spawn)	the shim should never enter FATAL because `startretries=999999`; if it does, check `/var/log/supervisord.log` for spawn errors
sysmon log says "emitted system_oom_kill" but no SSE event	check `GET /telemetry`; without an active session, events from both producers are dropped at `TelemetrySession.Publish`. Issue a `PUT /telemetry` and re-trigger.
Events admitted by `TelemetrySession` but don't reach downstream consumers	check the SSE / S2 pipeline (`POST /telemetry/events` → `TelemetrySession.Publish` → `EventStream.Publish` → SSE / S2) — that's `lib/events` territory, not sysmon

Documentation ¶


Overview ¶

kmsg.go parses the kernel OOM-dump text inside /dev/kmsg messages. The wire envelope is handled by euank/go-kmsg-parser; body text parsing is ours.

Format stability across kernel versions:

"Killed process N (name) anon-rss:... file-rss:... shmem-rss:..." is unchanged since 2.6.x.
"oom-kill:constraint=CONSTRAINT_X" appeared in 5.0 and is stable since. Absent on older kernels; Constraint is omitted.
"N pages RAM" and "free:N free_pcp:N free_cma:N" are stable.
Tasks-state row gained rss_anon/rss_file/rss_shmem in 5.14 (9-col → 12-col). We anchor on the bracketed pid, rss as the 5th column, and the trailing-token name so both layouts parse. Production is Linux 6.12; the older layout is kept for dev environments.

On format breakage the failure mode is graceful: missing fields are omitted from the published event, and oomScannerWatchdog abandons a stuck section without leaking memory.
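
The Tasks-state anchoring described in the format notes can be sketched as follows. `taskRowRe` is a hypothetical pattern written for this example, not the package's `oomTaskEntryRe`; it fixes the bracketed pid and the 5th column (rss, in pages), skips whatever columns sit in between, and takes the last whitespace-delimited token as the name. The 4 KiB page size is the same assumption the README states.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

type task struct {
	Pid   int
	Name  string
	RssKb int
}

// taskRowRe: [pid] uid tgid total_vm rss <any middle columns> name.
// Hypothetical; the real regex lives in kmsg.go.
var taskRowRe = regexp.MustCompile(`^\[\s*(\d+)\]\s+\d+\s+\d+\s+\d+\s+(\d+)\s+.*\s(\S+)\s*$`)

// parseTaskRow parses one row of either the 9-column or 12-column layout.
func parseTaskRow(line string) (task, bool) {
	m := taskRowRe.FindStringSubmatch(line)
	if m == nil {
		return task{}, false
	}
	pid, _ := strconv.Atoi(m[1])
	rssPages, _ := strconv.Atoi(m[2])
	return task{Pid: pid, Name: m[3], RssKb: rssPages * 4}, true
}

// topTasks keeps the n largest rows by RSS, descending.
func topTasks(rows []string, n int) []task {
	var out []task
	for _, r := range rows {
		if t, ok := parseTaskRow(r); ok {
			out = append(out, t)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].RssKb > out[j].RssKb })
	if len(out) > n {
		out = out[:n]
	}
	return out
}

func main() {
	rows := []string{
		"[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name",
		"[    321]  1000   321    50000    10000   409600   0   0 python3",
		"[   1234]  1000  1234  1308611  1205975  1205675   200   100   9678848   0   0 chromium",
	}
	fmt.Println(topTasks(rows, 5))
}
```

Because the name is taken as the last token, a comm containing spaces would be cut to its final word in this sketch.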

Package sysmon emits VM-internal failure telemetry — OOM kills surfaced through /dev/kmsg, and (via the supervisord-shim binary POSTing to the telemetry HTTP endpoint) supervised-service crashes.

The package only owns the in-process kmsg reader; service crashes are delivered as ordinary caller-published events via POST /telemetry/events from the shim.

Index ¶

type KmsgMessage
type Monitor
- func New(publish PublishFunc, logger *slog.Logger, opts ...option) *Monitor
- func (m *Monitor) Start(ctx context.Context) error
- func (m *Monitor) Wait()
type OomInstance
type PublishFunc
type TaskMemSnapshot

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type KmsgMessage ¶

type KmsgMessage struct {
	Timestamp time.Time
	Body      string
}

KmsgMessage is the minimal subset of a /dev/kmsg record that the OOM state machine consumes. Decoupling from the underlying kmsg library lets the parser run portably under unit tests; the production wiring lives in kmsg_linux.go.

type Monitor ¶

type Monitor struct {
	// contains filtered or unexported fields
}

Monitor runs the in-process sysmon goroutine and hands each event off to the configured publish func.

func New ¶

func New(publish PublishFunc, logger *slog.Logger, opts ...option) *Monitor

New constructs a Monitor. The Monitor does nothing until Start is called.

func (*Monitor) Start ¶

func (m *Monitor) Start(ctx context.Context) error

Start opens the kmsg source (validating that it is usable) and launches the background OOM reader goroutine. It returns an error if the kmsg source cannot be opened; the goroutine then never starts and the caller can decide whether the failure is fatal.

Start must be called at most once per Monitor. Calling it twice would spawn two readers racing on the same kmsg channel and corrupt the OOM state machine. Callers needing a restart should construct a new Monitor.

The goroutine shuts down when ctx is cancelled; Wait blocks until it returns.

func (*Monitor) Wait ¶

func (m *Monitor) Wait()

Wait blocks until all goroutines launched by Start have returned.
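
The Start/Wait contract can be illustrated with a stand-alone analogue. This is not the package's implementation: `monitor`, `errStarted`, and `demo` are hypothetical, the reader consumes a plain string channel instead of kmsg, and the guard that rejects a second Start is an addition of this sketch (the documentation states the rule but does not say it is enforced).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

var errStarted = errors.New("monitor already started")

// monitor is a hypothetical analogue of sysmon.Monitor.
type monitor struct {
	mu      sync.Mutex
	started bool
	wg      sync.WaitGroup
	lines   <-chan string
	handle  func(string)
}

// Start launches one reader goroutine; a second call is rejected so two
// readers never race on the same channel.
func (m *monitor) Start(ctx context.Context) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.started {
		return errStarted
	}
	m.started = true
	m.wg.Add(1)
	go func() {
		defer m.wg.Done()
		for {
			select {
			case <-ctx.Done():
				return
			case line, ok := <-m.lines:
				if !ok {
					return
				}
				m.handle(line)
			}
		}
	}()
	return nil
}

// Wait blocks until the reader goroutine has returned.
func (m *monitor) Wait() { m.wg.Wait() }

// demo runs one full lifecycle and reports what the reader saw and what
// the second Start returned.
func demo() (handled string, second error) {
	lines := make(chan string)
	done := make(chan struct{})
	m := &monitor{lines: lines, handle: func(s string) { handled = s; close(done) }}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if err := m.Start(ctx); err != nil {
		return "", err
	}
	second = m.Start(ctx)
	lines <- "Killed process 1234 (chromium)"
	<-done
	cancel() // cancelling ctx stops the reader
	m.Wait() // returns once the goroutine has exited
	return handled, second
}

func main() {
	handled, second := demo()
	fmt.Println(handled)
	fmt.Println(second)
}
```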

type OomInstance ¶

type OomInstance struct {
	// ProcessName is the comm of the killed process, bounded to 15 chars
	// by the kernel (TASK_COMM_LEN-1). May contain spaces.
	ProcessName string
	// Pid is the PID of the killed process.
	Pid int
	// RssKb is the sum of anon-rss, file-rss, and shmem-rss in KiB.
	// Zero if the kernel format did not include the per-class breakdown.
	RssKb int
	// Constraint is one of "none", "memcg", "cpuset", "memory_policy",
	// extracted from the structured `oom-kill:` line that kernels >= 5.0
	// emit. Empty on older kernels.
	Constraint string
	// MemTotalKb is the total system memory at the time of the kill,
	// derived from the `N pages RAM` line. Zero if not parseable.
	MemTotalKb int
	// MemFreeKb is free memory at the time of the kill, derived from the
	// `free:N` field in Mem-Info. Zero if not parseable.
	MemFreeKb int
	// TopTasks is up to topTasksN processes from the Tasks state table,
	// sorted by RSS descending. Nil if the kernel did not emit the table.
	TopTasks []TaskMemSnapshot
	// TriggerProcessName is the comm of the process whose allocation
	// failed and caused the OOM-killer to run. Captured from the prefix
	// of the "invoked oom-killer:" line. Often equal to ProcessName but
	// can differ when the kernel selected a different victim.
	TriggerProcessName string
	// TriggerPid is the PID of the triggering process, captured from
	// the standard "CPU: N PID: N Comm: ..." header line. Zero if the
	// kernel did not emit that header.
	TriggerPid int
	// TimeOfDeath is the timestamp of the closing "Killed process" line
	// as reported by the kmsg envelope.
	TimeOfDeath time.Time
}

OomInstance is a parsed kernel OOM-killer event extracted from /dev/kmsg.

Fields map to BrowserSystemOomKillEventData. Optional fields use the zero value when the kernel did not emit the corresponding kmsg line; the publisher decides whether to encode them.
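
One way a publisher could turn zero values into absent fields is `omitempty` on the wire struct. The sketch below is an assumption about the approach, with a hypothetical `oomPayload` type covering only a few fields; the authoritative schema is `BrowserSystemOomKillEventData` in openapi.yaml.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oomPayload is a hypothetical wire shape for illustration only.
// Fields the README marks "never absent" have no omitempty.
type oomPayload struct {
	ProcessName string `json:"process_name"`
	Pid         int    `json:"pid"`
	RssKb       int    `json:"rss_kb"`
	Constraint  string `json:"constraint,omitempty"`
	MemTotalKb  int    `json:"mem_total_kb,omitempty"`
	MemFreeKb   int    `json:"mem_free_kb,omitempty"`
}

// encode marshals the payload, returning "" on error.
func encode(p oomPayload) string {
	b, err := json.Marshal(p)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// A memcg OOM: the kernel emits no global Mem-Info, so the totals stay zero.
	fmt.Println(encode(oomPayload{
		ProcessName: "chromium",
		Pid:         1234,
		RssKb:       4824200,
		Constraint:  "memcg",
	}))
	// prints {"process_name":"chromium","pid":1234,"rss_kb":4824200,"constraint":"memcg"}
}
```

A trade-off of this approach is that a genuine zero and a missing value encode identically.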

type PublishFunc ¶

type PublishFunc func(events.Event) (events.Envelope, bool)

PublishFunc receives events emitted by the Monitor. Production callers wire this to TelemetrySession.Publish so events are gated by the active telemetry config; the Monitor itself ignores the return values.

type TaskMemSnapshot ¶

type TaskMemSnapshot struct {
	Pid   int
	Name  string
	RssKb int
}

TaskMemSnapshot is one row from the kernel's Tasks state dump, representing a single process's memory footprint at the moment of the OOM kill.
