Documentation ¶
Overview ¶
Package memwatch provides a user-space memory watchdog that triggers an orderly shutdown before the kernel OOM-killer sends SIGKILL.
Motivation ¶
elastickv runs in memory-constrained containers (e.g. 3GB RAM VMs). Go's runtime is unaware of the container/host memory limit, and even with GOMEMLIMIT set the process can still lose the race against the kernel OOM-killer under sustained memtable/goroutine growth. A SIGKILL leaves the Raft WAL potentially truncated mid-operation; a cooperative SIGTERM path lets the node sync the WAL and stop raft cleanly, avoiding the election storms and lease loss that follow crash-restarts.
The watcher samples runtime/metrics at a fixed cadence. When the live heap-in-use byte count crosses the configured threshold, it invokes OnExceed once and returns. The watcher never calls os.Exit or sends signals itself; callers wire OnExceed to the existing shutdown path (typically a root context.CancelFunc).
Wiring in elastickv (see main.go):
	ctx, cancel := context.WithCancel(context.Background())
	// ... build runtimes, servers, errgroup ...
	w := memwatch.New(memwatch.Config{
		ThresholdBytes: threshold,
		PollInterval:   pollInterval,
		OnExceed: func() {
			memoryPressureExit.Store(true) // flips exit code to 2
			cancel()                       // fires the same shutdown path SIGTERM would use
		},
	})
	eg.Go(func() error { w.Start(runCtx); return nil })
Metric choice ¶
We sample `runtime/metrics` (Go 1.16+) rather than `runtime.ReadMemStats`. ReadMemStats triggers a stop-the-world pause proportional to the number of goroutines and heap size; at 1 s cadence that's typically negligible, but at a tighter MinPollInterval (10 ms) it begins to register. runtime/metrics readers are lock-free for the counters we need and do not stop the world.
The threshold is compared against
/memory/classes/heap/objects:bytes + /memory/classes/heap/unused:bytes
which is the runtime/metrics equivalent of MemStats.HeapInuse: bytes held in heap spans that are currently allocated from the OS, including span overhead, but excluding pages the runtime has released back. RSS from /proc/self/status is more accurate but requires a read syscall on every poll and is not what the Go allocator itself tracks. We deliberately do NOT compare against "total heap classes" (which includes released memory already returned to the OS) or "heap/objects" alone (which misses span fragmentation that the OOM-killer sees).
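As an illustration of the comparison above, the two classes can be read in a single runtime/metrics call and summed. This is a minimal sketch rather than the package's actual sampler; the helper name heapInuse and the constant names are hypothetical.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// Metric names taken from the text above.
const (
	heapObjectsMetric = "/memory/classes/heap/objects:bytes"
	heapUnusedMetric  = "/memory/classes/heap/unused:bytes"
)

// heapInuse is a hypothetical helper that sums the two heap classes,
// which together correspond to MemStats.HeapInuse.
func heapInuse() uint64 {
	samples := []metrics.Sample{
		{Name: heapObjectsMetric},
		{Name: heapUnusedMetric},
	}
	metrics.Read(samples) // fills in Value for each named sample

	var total uint64
	for _, s := range samples {
		// Skip any metric the running Go version does not report as uint64.
		if s.Value.Kind() == metrics.KindUint64 {
			total += s.Value.Uint64()
		}
	}
	return total
}

func main() {
	fmt.Printf("heap in use: %d bytes\n", heapInuse())
}
```

Checking Value.Kind before calling Uint64 avoids a panic if a metric name is unsupported, in which case the sample's kind is metrics.KindBad.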
Constants ¶
const DefaultPollInterval = time.Second
DefaultPollInterval is the polling cadence used when Config.PollInterval is zero. One second is frequent enough to catch fast-growing memtables before the kernel kills the process, and infrequent enough that even aggressive log rollups don't observe the watcher as a hot sampler.
const MinPollInterval = 10 * time.Millisecond
MinPollInterval is the floor enforced by New. runtime/metrics reads are cheap but a sub-10ms cadence produces no detection benefit over 10ms (memory pressure does not move that fast on these VMs) and would churn the ticker for no gain.
Types ¶
type Config ¶
type Config struct {
	// ThresholdBytes is the heap-in-use threshold in bytes. When the
	// sampled heap-in-use crosses this value the watcher invokes OnExceed
	// exactly once and returns. A zero value disables the watcher entirely
	// (Start returns immediately).
	ThresholdBytes uint64

	// PollInterval is how often the metrics are sampled. Defaults to
	// DefaultPollInterval when zero; values below MinPollInterval are
	// clamped up to MinPollInterval.
	PollInterval time.Duration

	// OnExceed is called at most once, from the watcher's own goroutine,
	// when the threshold is crossed. It must be non-blocking or at least
	// must not block the caller indefinitely (the watcher returns
	// immediately after invocation regardless). Typical implementations
	// cancel a root context and flag a process-wide exit-code sentinel.
	OnExceed func()

	// Logger, if non-nil, receives a single structured log line when the
	// threshold is crossed. When nil, slog.Default() is used.
	Logger *slog.Logger
}
Config configures a Watcher.
type Watcher ¶
type Watcher struct {
	// contains filtered or unexported fields
}
Watcher polls process memory and fires OnExceed once, when heap-in-use crosses the configured threshold. Callers get a single-shot notification and are expected to initiate graceful shutdown; Watcher does not call os.Exit or send signals itself.
func New ¶
New constructs a Watcher from the given Config. The Watcher does not start polling until Start is called.
func (*Watcher) Done ¶
func (w *Watcher) Done() <-chan struct{}
Done returns a channel that is closed when Start returns. Tests can use it to assert the watcher goroutine actually exits (no leak) after ctx cancel or OnExceed.
func (*Watcher) Start ¶
Start runs the watchdog loop. It returns when ctx is cancelled, when OnExceed has fired, or immediately when ThresholdBytes is zero (the watcher is disabled). Start is meant to be called at most once per Watcher; subsequent calls return immediately because the done channel has already been closed.