🐾 catpaw
catpaw is a lightweight monitoring agent with AI-powered diagnostics.
It detects anomalies through plugin-based checks, produces standardized events, and, when an alert fires, can automatically trigger AI root-cause analysis using 70+ built-in diagnostic tools.
Events can be forwarded to any alert platform (Flashduty, PagerDuty, or any HTTP endpoint), or simply printed to the console for quick validation.
✨ Key Features
- 🪶 Lightweight, zero heavy dependencies: single binary, easy to deploy
- Plugin-based monitoring: 25+ check plugins, enable only what you need
- 🤖 AI-powered diagnosis: automatic root-cause analysis triggered by alerts
- 💬 Interactive AI chat: troubleshoot issues conversationally with AI + tools
- 🩺 Proactive health inspection: on-demand AI-driven health checks
- 🛠️ 70+ diagnostic tools: system, network, storage, security, process, kernel
- MCP integration: connect external data sources (Prometheus, Jaeger, CMDB, etc.) via Model Context Protocol
- 📡 Flexible notification: console, generic WebAPI, Flashduty, PagerDuty, or any combination
- Self-monitoring friendly: ideal for monitoring your monitoring systems
🏗️ Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│                            catpaw agent                             │
│                                                                     │
│  ┌─────────────┐    alert    ┌──────────────┐   AI + Tools          │
│  │ 25+ Check   │ ─────────→  │ AI Diagnose  │ ───────────┐          │
│  │ Plugins     │   trigger   │ Engine       │            │          │
│  └──────┬──────┘             └──────────────┘            │          │
│         │                                                ▼          │
│         │ events             ┌──────────────┐    ┌───────────────┐  │
│         └──────────────────→ │ Notifiers    │    │ 70+ Diagnose  │  │
│                              │ (multiple)   │    │ Tools         │  │
│                              └──────────────┘    └───────┬───────┘  │
│                                                          │          │
│  ┌─────────────┐                                ┌────────┴───────┐  │
│  │ AI Chat     │ ←──── interactive ───────────→ │ MCP External   │  │
│  │ (CLI)       │       troubleshoot             │ Data Sources   │  │
│  └─────────────┘                                └────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
Check Plugins
| Plugin | Description |
| --- | --- |
| cert | TLS certificate expiry check (remote TLS + local files; STARTTLS, SNI, glob) |
| conntrack | Linux conntrack table usage; prevents silent packet drops |
| cpu | CPU utilization and per-core normalized load average |
| disk | Disk space, inode, and writability check |
| dns | DNS resolution check |
| docker | Docker container monitoring (state, restart, health, CPU/mem) |
| exec | Run scripts/commands to produce events (JSON and Nagios modes) |
| filecheck | File existence, mtime, and checksum check |
| filefd | System-level file descriptor usage (Linux) |
| http | HTTP availability, status code, response body, cert expiry |
| journaltail | Incremental journalctl log reading with keyword matching (Linux) |
| logfile | Log file monitoring (offset tracking, rotation, glob, multi-encoding) |
| mem | Memory and swap usage check |
| mount | Mount point baseline (fs type, options compliance; Linux) |
| neigh | ARP/neighbor table usage; prevents new-IP failures (K8s) |
| net | TCP/UDP connectivity and response time |
| netif | Network interface health (link state, error/drop delta; Linux) |
| ntp | NTP sync, clock offset, stratum (Linux) |
| ping | ICMP reachability, packet loss, latency |
| procfd | Per-process fd usage; prevents nofile exhaustion |
| procnum | Process count check (multiple lookup methods) |
| scriptfilter | Script output filter-rule matching |
| secmod | SELinux/AppArmor baseline (Linux) |
| sockstat | TCP listen queue overflow detection (Linux) |
| sysctl | Kernel parameter baseline; detects silent resets (Linux) |
| systemd | systemd service status (Linux) |
| tcpstate | TCP state monitoring (CLOSE_WAIT/TIME_WAIT; Netlink; Linux) |
| uptime | Unexpected reboot detection |
| zombie | Zombie process detection |
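The exec plugin's Nagios mode is listed without further detail. As a rough sketch, assuming it follows the standard Nagios plugin convention (exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, a one-line status on stdout, optional perfdata after `|`), a script it runs might look like the following. The function names, the `DISK` label, and the thresholds are illustrative and not part of catpaw.

```python
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct: float, warn: float, crit: float) -> tuple[int, str]:
    """Map a usage percentage to a Nagios exit code and a one-line status."""
    if used_pct >= crit:
        code = CRITICAL
    elif used_pct >= warn:
        code = WARNING
    else:
        code = OK
    # Text before "|" is the human-readable status; text after it is perfdata
    # in the form label=value[unit];warn;crit.
    line = (
        f"DISK {LABELS[code]} - {used_pct:.1f}% used"
        f" | used_pct={used_pct:.1f}%;{warn:g};{crit:g}"
    )
    return code, line


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> tuple[int, str]:
    """Measure usage of `path`; report UNKNOWN if it cannot be read."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot stat {path}: {exc}"
    return evaluate(100.0 * usage.used / usage.total, warn, crit)
```

A script built on this would print the returned line and pass the returned code to `sys.exit()`, so that the caller can read both the message and the severity.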
When AI diagnosis is triggered (by alert, inspection, or chat), the AI agent has access to a rich toolkit:
- ⚙️ System & Process: CPU top, memory breakdown, OOM history, cgroup limits, process threads (with wchan), open files, environment variables, PSI pressure
- Network: ping, traceroute, DNS resolve, ARP neighbors, TCP connection states, socket details (RTT/cwnd), retransmission rate, connection latency summary, listen queue overflow, TCP tuning check, softnet stats, route table, IP addresses, interface stats, firewall rules
- 💾 Storage: disk I/O latency, block device topology, LVM status, mount info
- Kernel & Security: dmesg, interrupts distribution, conntrack stats, NUMA stats, thermal zones, sysctl snapshot, SELinux/AppArmor status, coredump list
- Logs: log tail, log grep (with pattern matching), journald query
- 🐳 Services: systemd service status, failed services list, timer list, Docker ps/inspect
- Remote plugins (Redis, etc.) contribute their own specialized diagnostic tools for deep introspection.
- MCP external tools: connect Prometheus, Jaeger, CMDB, or any MCP-compatible data source; the AI automatically discovers and uses their tools.
🖥️ CLI Commands
catpaw run [flags] # Start the monitoring agent
catpaw chat [-v] # Interactive AI chat for troubleshooting
catpaw inspect <plugin> [target] # Proactive AI health inspection
catpaw diagnose list|show <id> # View past diagnosis records
catpaw selftest [filter] [-q] # Smoke-test all diagnostic tools
catpaw mcptest # Test MCP server connections
Quick Start
📦 Installation
Download the binary from GitHub Releases.
Basic Monitoring
- Enable plugin configs under conf.d/p.<plugin>/
- Start: ./catpaw run
The default config enables [notify.console], so events are printed to the terminal with colored output; no external service is needed for a quick test.
📡 Event Notification
catpaw supports multiple notification channels. Configure one or more in conf.d/config.toml:
| Channel | Config Section | Description |
| --- | --- | --- |
| Console | [notify.console] | Print events to terminal (enabled by default) |
| WebAPI | [notify.webapi] | Push raw Event JSON to any HTTP endpoint |
| Flashduty | [notify.flashduty] | Forward to Flashduty alert platform |
| PagerDuty | [notify.pagerduty] | Forward to PagerDuty incident management |
Multiple channels can be active simultaneously. For example, you can print to console for debugging while also forwarding to your alert platform.
Console (default, for quick validation):
[notify.console]
enabled = true
WebAPI (push raw Event JSON to any HTTP endpoint):
[notify.webapi]
url = "https://your-service.example.com/api/v1/events"
# method = "POST"
# timeout = "10s"
[notify.webapi.headers]
Authorization = "Bearer ${WEBAPI_TOKEN}"
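To try the WebAPI channel locally, a small receiver is enough. The sketch below is one possible test endpoint, not something catpaw ships: it checks the bearer token from the config above and stores whatever JSON arrives. The event schema is not described here, so the payload is treated as opaque; `EXPECTED_TOKEN`, `accept_event`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

EXPECTED_TOKEN = "change-me"  # hypothetical; the value catpaw sends as ${WEBAPI_TOKEN}
received = []                 # accepted payloads, kept in memory for inspection


def accept_event(auth_header, body: bytes) -> int:
    """Validate one request and return the HTTP status code to send back."""
    if auth_header != f"Bearer {EXPECTED_TOKEN}":
        return 401
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and invalid UTF-8
        return 400
    received.append(payload)  # the event schema is treated as opaque here
    return 204


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the full body before replying, whatever the outcome.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        status = accept_event(self.headers.get("Authorization"), body)
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Block and handle requests until interrupted."""
    HTTPServer((host, port), EventHandler).serve_forever()
```

Calling `serve()` and pointing `url` in [notify.webapi] at `http://127.0.0.1:8080/` should be enough to see events accumulate in `received`, assuming the agent sends a JSON body with the configured headers.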
Flashduty:
[notify.flashduty]
integration_key = "your-integration-key"
PagerDuty:
[notify.pagerduty]
routing_key = "your-routing-key"
🤖 AI Diagnosis (optional)
Add to conf.d/config.toml:
[ai]
enabled = true
model_priority = ["default"]
[ai.models.default]
base_url = "https://api.openai.com/v1"
api_key = "${OPENAI_API_KEY}"
model = "gpt-4o"
Now, when an alert fires, the AI automatically analyzes the root cause using the built-in diagnostic tools.
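The `${OPENAI_API_KEY}` placeholder in the config above (like `${WEBAPI_TOKEN}` in the WebAPI example) suggests that secrets are read from the environment rather than stored in the file. The exact substitution rules are not documented here; the sketch below only illustrates the general idea, under the assumption that `${NAME}` is replaced by the environment variable `NAME`. `expand_env` is a hypothetical helper, and failing on an unset variable is a choice made for this example, not confirmed catpaw behavior.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace every ${NAME} in `value` with env[NAME]; fail if NAME is unset."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)
```

With this reading, exporting `OPENAI_API_KEY` in the shell that starts the agent keeps the key out of conf.d/config.toml.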
💬 Interactive Chat
./catpaw chat
Ask questions like "Why is CPU high?" or "Check disk I/O latency". The AI uses diagnostic tools and shell commands (with confirmation) to investigate.
MCP External Data Sources (optional)
Connect Prometheus, Jaeger, or other MCP servers so the AI can query historical metrics, traces, and similar data:
[ai.mcp]
enabled = true
[[ai.mcp.servers]]
name = "prometheus"
command = "/usr/local/bin/mcp-prometheus"
args = ["serve"]
identity = 'instance="${IP}:9100"'
[ai.mcp.servers.env]
PROMETHEUS_URL = "http://127.0.0.1:9090"
[[ai.mcp.servers]]
name = "nightingale"
command = "npx"
args = ["-y", "@n9e/n9e-mcp-server", "stdio"]
identity = 'ident="${HOSTNAME}"'
tools_allow = []
[ai.mcp.servers.env]
N9E_TOKEN = "480c04ed-ebe7-4266-xxxx-f8daf7819a6d"
N9E_BASE_URL = "http://127.0.0.1:17000"
Verify connectivity:
./catpaw mcptest
⚙️ Configuration
- Global config: conf.d/config.toml
- Plugin configs: conf.d/p.<plugin>/*.toml (multiple files merged on load)
- Hot-reload plugin configs with SIGHUP: kill -HUP $(pidof catpaw)
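For readers unfamiliar with SIGHUP-based reloads, the usual pattern is that the signal handler only sets a flag and the main loop re-reads configuration at a safe point. The sketch below shows that general pattern on a POSIX system; it is not catpaw's implementation, and all names in it are hypothetical.

```python
import signal

reload_requested = False


def _on_sighup(signum, frame):
    # Keep the handler tiny: only record that a reload was requested.
    global reload_requested
    reload_requested = True


def install_reload_handler() -> None:
    """Register the SIGHUP handler (POSIX only; call from the main thread)."""
    signal.signal(signal.SIGHUP, _on_sighup)


def maybe_reload(load_configs) -> bool:
    """Call `load_configs()` once if SIGHUP arrived since the last check."""
    global reload_requested
    if not reload_requested:
        return False
    reload_requested = False
    load_configs()
    return True
```

A long-running loop would call `maybe_reload(...)` between check cycles, so a reload never interrupts a check that is already in progress.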
Documentation
WeChat: add picobyte and mention catpaw to join the group.