kubernaut

module
v1.2.0 (Latest)
Published: Apr 6, 2026 License: Apache-2.0

README

Kubernaut

AIOps Platform for Intelligent Kubernetes Remediation

Kubernaut closes the loop from Kubernetes alert to automated remediation. When something goes wrong in your cluster, Kubernaut detects the signal, sends it to an LLM for live root cause investigation with real kubectl access, selects a remediation workflow, and executes the fix — or escalates to a human with a full RCA when it can't.

Full Documentation · Demo Scenarios


Why

Kubernetes operators spend hours manually triaging alerts, diagnosing root causes from scattered logs and metrics, and executing remediation steps from runbooks that drift out of date. The response depends on tribal knowledge and human availability, and it often happens at 3 a.m.

Rule-based remediation tools help with known, deterministic problems — "if X, do Y." But when the same symptom has multiple root causes, or the right fix depends on context the rule can't see, they fall short.

Kubernaut bridges that gap. It uses an LLM to investigate the actual root cause with live cluster access, selects the right remediation from a workflow catalog, executes it, and verifies the fix worked — escalating to humans only when it should. Rule-based tools are thermostats. Kubernaut is a diagnostician that also adjusts the thermostat.

Why Kubernaut? — full comparison with rule-based tools


What It Does

  • Detects — Ingests Prometheus AlertManager alerts and Kubernetes Events, validates resource scope, and deduplicates by fingerprint
  • Investigates — HolmesGPT performs live root cause analysis using Kubernetes inspection tools, configurable observability toolsets (Prometheus, etc.), and remediation history
  • Remediates — Selects and executes a workflow from a searchable catalog via Tekton Pipelines, Kubernetes Jobs, or Ansible (AWX/AAP), with optional human approval gates
  • Closes the loop — Notifies the team (Slack, console), evaluates whether the fix worked via health checks, alert resolution, and spec hash drift detection, and feeds effectiveness scores back into future investigations
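
The "deduplicates by fingerprint" step in the list above can be illustrated with a small sketch. It assumes a fingerprint is a SHA-256 hash over the alert name and its sorted labels; the `Signal` type, `Fingerprint` function, and `Deduplicator` are hypothetical names for illustration, not Kubernaut's actual API, and the real fingerprint scheme may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Signal is a hypothetical, simplified alert; it is not Kubernaut's actual type.
type Signal struct {
	Name   string
	Labels map[string]string
}

// Fingerprint hashes the alert name and its labels in sorted key order,
// so the same logical signal always yields the same value.
func Fingerprint(s Signal) string {
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(s.Name)
	for _, k := range keys {
		// NUL separators keep "a"+"bc" distinct from "ab"+"c".
		b.WriteByte(0)
		b.WriteString(k)
		b.WriteByte(0)
		b.WriteString(s.Labels[k])
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

// Deduplicator counts how often each fingerprint has been seen.
type Deduplicator struct {
	seen map[string]int
}

// NewDeduplicator returns an empty deduplicator.
func NewDeduplicator() *Deduplicator {
	return &Deduplicator{seen: make(map[string]int)}
}

// Accept reports whether the signal is new; repeats only bump the counter.
func (d *Deduplicator) Accept(s Signal) bool {
	fp := Fingerprint(s)
	d.seen[fp]++
	return d.seen[fp] == 1
}

func main() {
	d := NewDeduplicator()
	a := Signal{Name: "PodCrashLooping", Labels: map[string]string{"namespace": "shop", "pod": "cart-0"}}
	b := Signal{Name: "PodCrashLooping", Labels: map[string]string{"pod": "cart-0", "namespace": "shop"}}
	fmt.Println(d.Accept(a)) // true: first sighting
	fmt.Println(d.Accept(b)) // false: same labels, different map order
}
```

Sorting the label keys before hashing is what makes the fingerprint independent of map iteration order. A production deduplicator would likely also expire entries after a time window, which this sketch omits.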


Architecture

[Figure: Kubernaut layered architecture]


Roadmap

v1.2 — Operational Resilience and Security Hardening (current)
  • Per-workflow ServiceAccount — Each remediation workflow runs under its own SA with least-privilege RBAC, replacing the shared default
  • Short-lived token injection — Ansible executor uses Kubernetes TokenRequest API with configurable TTL instead of long-lived secrets
  • PVC-wipe resilience — Deterministic workflow IDs and startup reconciliation recover the workflow catalog automatically after data loss
  • Smarter effectiveness assessment — Partial vs full assessment paths based on actual workflow completion, with configurable Prometheus lookback and concurrency
  • CRD schema hardening — Typed enums across all 9 CRDs with OpenAPI validation at admission time
  • Hash-capture degradation visibility — Explicit conditions and notification enrichment when spec hash capture fails

See CHANGELOG.md for the complete list.
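
To illustrate why deterministic workflow IDs make recovery after a volume wipe possible, here is a minimal sketch. It assumes an ID is derived by hashing the workflow name and version; `Workflow`, `WorkflowID`, and `Reconcile` are hypothetical names, and Kubernaut's actual derivation and reconciliation logic may differ.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Workflow is a hypothetical, simplified catalog entry.
type Workflow struct {
	Name    string
	Version string
}

// WorkflowID derives a stable ID from name and version (an assumed scheme).
// Because it depends only on the declared content, re-registering the same
// workflow after data loss yields the same ID as before.
func WorkflowID(w Workflow) string {
	sum := sha256.Sum256([]byte(w.Name + "\x00" + w.Version))
	return fmt.Sprintf("wf-%x", sum[:8])
}

// Reconcile re-adds every declared workflow that is missing from the catalog
// and returns how many entries it restored.
func Reconcile(declared []Workflow, catalog map[string]Workflow) int {
	restored := 0
	for _, w := range declared {
		id := WorkflowID(w)
		if _, ok := catalog[id]; !ok {
			catalog[id] = w
			restored++
		}
	}
	return restored
}

func main() {
	declared := []Workflow{
		{Name: "restart-deployment", Version: "1.0.0"},
		{Name: "scale-up", Version: "2.1.0"},
	}
	idBefore := WorkflowID(declared[0])

	catalog := map[string]Workflow{} // empty: simulates a wiped volume
	fmt.Println(Reconcile(declared, catalog)) // 2: both entries restored
	fmt.Println(catalog[idBefore].Name)       // restart-deployment: old ID still resolves
	fmt.Println(Reconcile(declared, catalog)) // 0: a second pass is a no-op
}
```

With random IDs, anything that recorded the old ID (history, audit events) would point at nothing after a rebuild. Content-derived IDs keep those references valid and make startup reconciliation idempotent.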

v1.3 — Go Unification and Enterprise Distribution (next)
  • HAPI Go rewrite — Full reimplementation of the HolmesGPT-API service in Go, eliminating the Python runtime dependency
  • Mock LLM Go rewrite — DAG-based conversation engine with declarative YAML scenarios and fault injection for resilience testing
  • Kubernaut Operator — OLM-packaged operator for OperatorHub distribution on OpenShift and vanilla Kubernetes
  • Inter-pod TLS — Encrypted communication between all internal services
  • Audit event retention — Automated deletion of expired audit events
  • Label-based notification routing — Route notifications to signal and RCA target resource owners

Track progress on the v1.3 milestone.


Installation

See the Installation Guide for prerequisites, configuration, and deployment instructions.


Documentation

Resource                   Link
User & Operator Guide      jordigilh.github.io/kubernaut-docs
Architecture Overview      Architecture
Developer Guide            docs/DEVELOPER_GUIDE.md
Must-Gather Diagnostics    cmd/must-gather/README.md

Repository                  Description
kubernaut-docs              Documentation website (MkDocs Material)
kubernaut-demo-scenarios    Demo scenarios, scripts, and recordings

Development

make build-all          # Build all services
make test-tier-unit     # Run unit tests
make test-all-gateway   # Run all test tiers for a service

We use Ginkgo/Gomega BDD for testing and follow a TDD workflow. See the Developer Guide for environment setup, build targets, and test commands.


Contributing

See CONTRIBUTING.md for guidelines. In short: create a feature branch, implement with tests, update docs, and open a PR.


License

Apache License 2.0 — see LICENSE.


Issues: GitHub Issues · Discussions: GitHub Discussions

Kubernaut — From alert to remediation, intelligently.

Directories

Path Synopsis
api
actiontype/v1alpha1
Package v1alpha1 contains API Schema definitions for the ActionType v1alpha1 API group.
aianalysis/v1alpha1
Package v1alpha1 contains API Schema definitions for the aianalysis v1alpha1 API group.
effectivenessassessment/v1alpha1
Package v1alpha1 contains API Schema definitions for the effectivenessassessment v1alpha1 API group.
notification/v1alpha1
Package v1alpha1 contains API Schema definitions for the notification v1alpha1 API group.
remediation/v1alpha1
Package v1alpha1 contains API Schema definitions for the remediation v1alpha1 API group.
remediationworkflow/v1alpha1
Package v1alpha1 contains API Schema definitions for the RemediationWorkflow v1alpha1 API group.
signalprocessing/v1alpha1
Package v1alpha1 contains API Schema definitions for the signalprocessing v1alpha1 API group.
workflowexecution/v1alpha1
Package v1alpha1 contains API Schema definitions for the workflowexecution v1alpha1 API group.
cmd
aianalysis command
Package main is the entry point for the AIAnalysis controller.
authwebhook command
datastorage command
gateway command
notification command
signalprocessing command
Signal Processing controller (DD-006).
internal
controller/aianalysis
Package aianalysis implements the AIAnalysis CRD controller.
controller/effectivenessmonitor
Package controller provides the Kubernetes controller for EffectivenessAssessment CRDs.
controller/remediationorchestrator
Package controller provides the Kubernetes controller for RemediationRequest CRDs.
controller/signalprocessing
Package signalprocessing provides interfaces for signal processing components.
controller/workflowexecution
Package workflowexecution provides failure analysis for Tekton PipelineRun failures.
pkg
aianalysis
Package aianalysis implements the AIAnalysis CRD controller.
aianalysis/audit
Package audit provides audit event generation for the AIAnalysis controller.
aianalysis/handlers
Package handlers provides phase-specific handlers for AIAnalysis reconciliation.
aianalysis/metrics
Package metrics provides Prometheus metrics for the AIAnalysis controller.
aianalysis/phase
Package phase provides phase constants and state machine logic for AIAnalysis.
aianalysis/rego
Package rego provides Rego policy evaluation for AIAnalysis approval decisions.
audit
Package audit provides shared audit event types and utilities for all Kubernaut services.
authwebhook/config
Package config provides configuration types for the AuthWebhook admission controller.
datastorage/metrics
Package metrics provides Prometheus metrics for the Data Storage service.
datastorage/ogen-client
Code generated by ogen, DO NOT EDIT.
datastorage/reconstruction
Package reconstruction provides RemediationRequest CRD reconstruction from audit traces.
datastorage/repository
Package repository provides data access for the DataStorage service.
datastorage/repository/sqlutil
Package sqlutil provides utility functions for working with database operations.
datastorage/repository/txretry
Package txretry provides retry logic for PostgreSQL serializable transactions.
datastorage/server
Adapter that bridges the repository.RemediationHistoryRepository to the server.RemediationHistoryQuerier interface, converting EffectivenessEventRow (repository package) to EffectivenessEvent (server package).
datastorage/validation
Package validation provides validation logic for workflow registration.
effectivenessmonitor/alert
Package alert provides the alert resolution scorer for the Effectiveness Monitor.
effectivenessmonitor/audit
Package audit provides audit event construction for the Effectiveness Monitor.
effectivenessmonitor/client
Package client defines interfaces for external dependencies of the Effectiveness Monitor.
effectivenessmonitor/conditions
Package conditions provides condition helpers for the EffectivenessAssessment CRD.
effectivenessmonitor/config
Package config provides configuration parsing and validation for the Effectiveness Monitor.
effectivenessmonitor/hash
Package hash provides the spec hash computation and comparison for the Effectiveness Monitor.
effectivenessmonitor/health
Package health provides the health check scorer for the Effectiveness Monitor.
effectivenessmonitor/metrics
Package metrics provides Prometheus metrics for the Effectiveness Monitor.
effectivenessmonitor/phase
Package phase provides phase constants and state machine logic for the Effectiveness Monitor.
effectivenessmonitor/startup
Package startup provides best-effort readiness checks for the EffectivenessMonitor.
effectivenessmonitor/timing
Package timing provides pure functions for computing derived timing fields in the EffectivenessAssessment lifecycle.
effectivenessmonitor/types
Package types provides shared types for the Effectiveness Monitor.
effectivenessmonitor/validity
Package validity provides validity window logic for the Effectiveness Monitor.
holmesgpt/client
Package client provides the HolmesGPT-API client.
http/cors
Package cors provides shared CORS configuration for Kubernaut HTTP services.
k8sutil
Package k8sutil provides utilities for Kubernetes client initialization.
log
Package log provides a unified logging interface for all Kubernaut services.
notification/delivery
Package delivery provides shared error types for notification delivery.
notification/metrics
Package metrics provides Prometheus metrics for the Notification controller.
notification/phase
Package phase provides phase constants and state machine logic for the Notification service.
notification/routing
Package routing implements BR-NOT-065 (Channel Routing Based on Spec Fields) and BR-NOT-066 (Alertmanager-Compatible Configuration Format).
ogenx
Package ogenx provides utilities for working with ogen-generated OpenAPI clients.
pii
Package pii provides PII (Personally Identifiable Information) redaction utilities for SOC2 privacy compliance and data minimization.
remediationapprovalrequest
Package remediationapprovalrequest provides condition helpers for the RemediationApprovalRequest CRD.
remediationapprovalrequest/audit
Package audit provides audit event generation for the RemediationApprovalRequest controller.
remediationorchestrator
Package remediationorchestrator provides the central coordinator for the Kubernaut remediation lifecycle.
remediationorchestrator/audit
Package audit provides the audit event manager for the Remediation Orchestrator.
remediationorchestrator/config
Package config provides centralized configuration constants for Remediation Orchestrator.
remediationorchestrator/creator
Package creator provides child CRD creation logic for the Remediation Orchestrator.
remediationorchestrator/handler/skip
Package skip provides handlers for WorkflowExecution skip reasons.
remediationorchestrator/helpers
Package helpers provides common helper utilities for Remediation Orchestrator.
remediationorchestrator/metrics
Package metrics provides Prometheus metrics for the Remediation Orchestrator.
remediationorchestrator/phase
Package phase provides phase constants and state machine logic for RO.
remediationorchestrator/routing
Package routing provides routing decision logic for RemediationOrchestrator.
remediationorchestrator/timeout
Package timeout provides timeout detection for RemediationOrchestrator.
remediationrequest
Package remediationrequest provides condition helpers for the RemediationRequest CRD.
shared/audit
Package audit provides shared audit types for standardized error details.
shared/auth
Package auth provides authentication and authorization interfaces and implementations for Kubernetes-based REST API services.
shared/backoff
Package backoff provides shared utilities for exponential backoff calculations.
shared/conditions
Package conditions provides shared utilities for Kubernetes Conditions across all CRD controllers.
shared/dsclient
Package dsclient defines the unified Data Storage client interface, composing the per-concern interfaces currently scattered across pkg/audit and pkg/authwebhook.
shared/events
Package events defines the authoritative Kubernetes Event reason constants for all Kubernaut CRD controllers.
shared/hash
Package hash provides a canonical hashing utility for Kubernetes resource specs.
shared/hotreload
Package hotreload provides generic ConfigMap hot-reloading functionality.
shared/k8serrors
Package k8serrors provides helpers for classifying Kubernetes controller errors that lack typed error types in the K8s API or controller-runtime.
shared/sanitization
Package sanitization provides DD-005 compliant log sanitization utilities.
shared/scope
Package scope provides resource scope management for Kubernaut.
shared/types
Package types provides shared types used across multiple Kubernaut CRDs.
signalprocessing
Package signalprocessing provides condition helpers for SignalProcessing CRDs.
signalprocessing/audit
Package audit provides audit event generation for the SignalProcessing controller.
signalprocessing/cache
Package cache provides TTL-based caching for signal processing.
signalprocessing/config
Package config provides configuration types for the SignalProcessing controller.
signalprocessing/enricher
Package enricher provides Kubernetes context enrichment for signal processing.
signalprocessing/evaluator
Package evaluator provides a unified OPA Rego evaluator for SignalProcessing.
signalprocessing/handler
Package handler provides phase-specific handler logic extracted from the monolithic controller.
signalprocessing/metrics
Package metrics provides Prometheus metrics for the SignalProcessing controller.
signalprocessing/ownerchain
Package ownerchain provides K8s ownership chain traversal for enrichment.
signalprocessing/phase
Package phase provides phase constants and state machine logic for SignalProcessing.
workflowexecution/audit
Package audit provides audit trail management for WorkflowExecution.
workflowexecution/client
Package client provides clients for querying external services from the WFE controller.
workflowexecution/config
Package config provides configuration types for the WorkflowExecution controller.
workflowexecution/executor
Package executor defines the Strategy pattern interface for workflow execution backends.
workflowexecution/metrics
Package metrics provides Prometheus metrics for workflow execution observability.
workflowexecution/phase
Package phase provides phase constants and state machine logic for WorkflowExecution.
workflowschema
Package workflowschema provides conversion functions between the WorkflowSchema (parser/DS model) and RemediationWorkflowSpec (CRD API type).
test
infrastructure
Package infrastructure provides shared E2E test infrastructure for all services.
shared/builders
Package builders provides test object builders for creating test fixtures.
shared/helpers
Package shared provides shared utilities for E2E tests across all services.
shared/integration
Package integration provides shared helpers for integration tests.
shared/validators
Package testutil provides test utilities for Kubernaut services.
testutil
Package testutil provides shared test helpers for loading workflow fixtures and building type-safe workflow schema test inputs.
