cerberus

v0.1.0
Published: May 5, 2026 License: MPL-2.0 Imports: 14 Imported by: 0

README

Cerberus

Lightweight drift detection watchdog for security-critical Go applications.

Cerberus polls registered probes at configurable intervals and emits a DriftEvent when the state hash of a resource changes. It detects — it never acts. What to do with a drift event is the caller's responsibility.

License: MPL-2.0 Requires: Go 1.22+


Design principles

  • Detect, never act. Side effects belong outside the watchdog.
  • Zero surprise dependencies. Only github.com/agilira/go-errors.
  • CPU-light. One base ticker goroutine; probes are scheduled via a priority queue (O(log n) to reschedule one probe among n registered probes; work per tick is proportional to k, the number of probes due on that tick).
  • Panic-safe. A panicking probe emits ChangeError and does not crash the host process (CWE-440 mitigation).
  • Hot registration. RegisterProbe and UnregisterProbe are safe to call while the watchdog is running.
  • Context-aware. Every probe call is bounded by a configurable ProbeTimeout; a hung probe times out instead of starving others (CWE-770 mitigation).
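
The panic-safe and context-aware principles above can be illustrated with a short standalone sketch. It is not the package's own code: probeFunc, errProbePanic, safeCall, run, and demoTimeout are hypothetical names, and the sketch only assumes what the list states, namely that each probe call runs under a deadline and that a panic is converted into an error instead of crashing the process.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// probeFunc is a hypothetical stand-in for a probe's Probe method,
// reduced to returning only the state hash.
type probeFunc func(ctx context.Context) (uint64, error)

// errProbePanic is an illustrative sentinel, not a Cerberus identifier.
var errProbePanic = errors.New("probe panicked")

// demoTimeout is the per-call deadline used by the demo below.
const demoTimeout = 50 * time.Millisecond

// safeCall runs fn under a deadline and converts a panic into an error.
func safeCall(parent context.Context, timeout time.Duration, fn probeFunc) (hash uint64, err error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	defer func() {
		if r := recover(); r != nil {
			hash, err = 0, fmt.Errorf("%w: %v", errProbePanic, r)
		}
	}()
	return fn(ctx)
}

func okProbe(ctx context.Context) (uint64, error) { return 42, nil }

func panickingProbe(ctx context.Context) (uint64, error) { panic("boom") }

// slowProbe honours cancellation, so the deadline ends it early.
func slowProbe(ctx context.Context) (uint64, error) {
	select {
	case <-ctx.Done():
		return 0, ctx.Err()
	case <-time.After(5 * time.Second):
		return 1, nil
	}
}

// run calls safeCall with a background context and reports only the error.
func run(fn probeFunc) error {
	_, err := safeCall(context.Background(), demoTimeout, fn)
	return err
}

func main() {
	fmt.Println(run(okProbe))        // <nil>
	fmt.Println(run(panickingProbe)) // probe panicked: boom
	fmt.Println(run(slowProbe))      // context deadline exceeded
}
```

Because the call in this sketch is synchronous, a probe that ignores ctx.Done() would still block the caller; the deadline only helps probes that cooperate with cancellation.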

Installation

go get github.com/agilira/cerberus

Quick start

watchdog := cerberus.New(cerberus.Config{
    BufferSize:   64,
    ProbeTimeout: time.Second,
    OnStateChange: func(id string, prev, curr *cerberus.State) {
        // Persist curr to external storage for restart recovery.
        db.SaveState(id, curr)
    },
})

// Restore baseline from a previous run (optional).
watchdog.LoadBaseline(db.LoadAllStates())

if err := watchdog.RegisterProbe(myFileProbe); err != nil {
    log.Fatal(err)
}
if err := watchdog.Start(); err != nil {
    log.Fatal(err)
}
defer func() {
    if err := watchdog.Stop(); err != nil {
        // ErrCodeStopTimeout means a probe did not finish within 5s.
        log.Println("cerberus stop:", err)
    }
}()

for event := range watchdog.Drifts() {
    fmt.Printf("drift: %s changed [%s]\n", event.ResourceID, event.ChangeType)
}

Configuration

All fields are optional. New applies safe defaults for any zero value.

Field | Type | Default | Notes
BufferSize | int | 64 | Capacity of the drift event channel. Size for peak burst rate.
ProbeTimeout | time.Duration | 1s | Maximum time one probe may run. Exceeded -> context cancelled.
SensitivityProfile | *SensitivityProfile | auto | Per-resource poll intervals. Nil -> DefaultSensitivityForResource.
OnStateChange | StateChangeHandler | nil | Called on every state transition (including first poll).
CongestionThreshold | int64 | 10 | Dropped-event count that fires OnCongestion. Resets on drain.
OnCongestion | CongestionHandler | nil | Called once per congestion episode, not once per drop.
EmitCongestionEvent | bool | false | Emit a ChangeError DriftEvent on congestion (opt-in).

PollInterval is accepted for backward compatibility but has no effect. Use SensitivityProfile to control polling frequency.


The Probe interface

type Probe interface {
    ID()           string
    ResourceType() ResourceType
    Probe(ctx context.Context) (State, error)
}

Probes must be:

  • Idempotent — safe to call repeatedly without side effects.
  • Context-aware — respect ctx.Done() for clean cancellation.
  • Fast — aim well under the configured ProbeTimeout.
  • Thread-safe — may be called from the watchdog goroutine concurrently with other probes in future worker-pool extensions.

Example: file probe

type FileProbe struct{ path string }

func (p *FileProbe) ID() string                        { return "file:" + p.path }
func (p *FileProbe) ResourceType() cerberus.ResourceType { return cerberus.ResourceFile }

func (p *FileProbe) Probe(ctx context.Context) (cerberus.State, error) {
    select {
    case <-ctx.Done():
        return cerberus.State{}, ctx.Err()
    default:
    }
    info, err := os.Stat(p.path)
    if err != nil {
        return cerberus.State{}, err
    }
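    h := fnv.New64a()
    // Separate the fields so that different (mtime, size) pairs cannot
    // collapse into the same digit string before hashing.
    fmt.Fprintf(h, "%d|%d|%s", info.ModTime().UnixNano(), info.Size(), info.Mode())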
    return cerberus.State{
        ResourceID: p.path,
        Hash:       h.Sum64(),
        Timestamp:  time.Now(),
    }, nil
}

Resource types

Constant | String | Default sensitivity
ResourceFile | file | Medium - 1s
ResourcePort | port | High - 500ms
ResourceProcess | process | High - 500ms
ResourceLog | log | Low - 5s
ResourceContainer | container | Medium - 1s
ResourceCertificate | certificate | Critical - 100ms
ResourceDNS | dns | Medium - 1s
ResourceIAMPolicy | iam_policy | Critical - 100ms
ResourceNetworkRule | network_rule | High - 500ms
ResourceSecret | secret | Critical - 100ms
ResourceService | service | Medium - 1s
ResourceEndpoint | endpoint | Medium - 1s
ResourceCustom | custom | Medium - 1s
ResourceModelWeight | model_weight | Critical - 100ms
ResourcePromptTemplate | prompt_template | High - 500ms
ResourceEnvVar | env_var | High - 500ms
ResourceAgentConfig | agent_config | Critical - 100ms
ResourceCerberus | cerberus | Medium - 1s

Custom intervals override the sensitivity-based default:

profile := cerberus.NewSensitivityProfile()
profile.SetInterval(cerberus.ResourceFile, 250*time.Millisecond)
profile.SetSensitivity(cerberus.ResourceSecret, cerberus.SensitivityCritical)

watchdog := cerberus.New(cerberus.Config{SensitivityProfile: profile})

The absolute minimum interval is MinPollInterval = 10ms; values below this are clamped automatically.
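
How the sensitivity default, a custom interval, and the minimum clamp combine can be sketched without the package. The types and function names below are local mirrors, not the package's API, and treating a non-positive override as "no override" is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"time"
)

// Local mirrors of the documented values; they are not the package's own types.
type sensitivity int

const (
	low sensitivity = iota
	medium
	high
	critical
)

// minPollInterval mirrors the documented MinPollInterval (10ms).
const minPollInterval = 10 * time.Millisecond

// defaultInterval maps a sensitivity level to its documented interval.
func defaultInterval(s sensitivity) time.Duration {
	switch s {
	case critical:
		return 100 * time.Millisecond
	case high:
		return 500 * time.Millisecond
	case low:
		return 5 * time.Second
	default:
		return time.Second
	}
}

// effectiveInterval applies an optional override, then the minimum clamp.
// Treating override <= 0 as "no override" is an assumption of this sketch.
func effectiveInterval(s sensitivity, override time.Duration) time.Duration {
	d := defaultInterval(s)
	if override > 0 {
		d = override
	}
	if d < minPollInterval {
		d = minPollInterval
	}
	return d
}

func main() {
	fmt.Println(effectiveInterval(medium, 0))                    // 1s
	fmt.Println(effectiveInterval(medium, 250*time.Millisecond)) // 250ms
	fmt.Println(effectiveInterval(critical, time.Millisecond))   // 10ms
}
```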


Drift events

type DriftEvent struct {
    ProbeID      string
    ResourceID   string
    ResourceType ResourceType
    ChangeType   ChangeType   // ChangeCreate | ChangeDrift | ChangeError | ...
    PrevHash     uint64
    CurrHash     uint64
    Timestamp    time.Time
    Error        error        // non-nil when ChangeType == ChangeError
}

ChangeCreate is emitted on the first poll for a probe (no previous state). ChangeDrift is emitted when the hash differs from the previous poll. ChangeError is emitted when the probe returns an error or panics.
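
The three rules in the paragraph above amount to a small decision function. The following is a minimal sketch using local stand-in types (changeType, classify, and errDemo are illustrative names, not package identifiers); checking the error before the "first poll" case is an assumption about ordering.

```go
package main

import (
	"errors"
	"fmt"
)

// changeType is a local stand-in for the package's ChangeType.
type changeType int

const (
	changeNone changeType = iota
	changeCreate
	changeDrift
	changeError
)

// errDemo is an illustrative probe failure.
var errDemo = errors.New("probe failed")

// classify decides which event, if any, one poll result should produce.
func classify(hasPrev bool, prevHash, currHash uint64, probeErr error) changeType {
	switch {
	case probeErr != nil:
		return changeError // probe returned an error (or panicked)
	case !hasPrev:
		return changeCreate // first poll: no previous state
	case prevHash != currHash:
		return changeDrift // hash differs from the previous poll
	default:
		return changeNone // unchanged: nothing to emit
	}
}

func main() {
	fmt.Println(classify(false, 0, 7, nil) == changeCreate)    // true
	fmt.Println(classify(true, 7, 9, nil) == changeDrift)      // true
	fmt.Println(classify(true, 7, 7, nil) == changeNone)       // true
	fmt.Println(classify(true, 7, 0, errDemo) == changeError)  // true
}
```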


Dynamic probe generation

ProbeFactory builds probes at runtime from ProbeDefinition values without stopping the watchdog:

factory := cerberus.NewProbeFactory()
factory.RegisterGenerator(cerberus.ResourceFile, func(ctx context.Context, def cerberus.ProbeDefinition) (cerberus.Probe, error) {
    return NewFileProbe(def.Target), nil
})

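defs := []cerberus.ProbeDefinition{
    // IDs must pass ProbeDefinition.Validate (letters, digits, '_', '-', '.'),
    // so ':' and '/' are avoided in the ID; the path goes in Target.
    {ID: "etc-passwd", ResourceType: cerberus.ResourceFile, Target: "/etc/passwd"},
}
// Partial success is supported: probes holds the probes that were built and
// errs the failures, so the two slices are not matched by index here.
probes, errs := factory.CreateProbesFromDefinitions(ctx, defs)
for _, err := range errs {
    if err != nil {
        log.Printf("probe creation failed: %v", err)
    }
}
for _, p := range probes {
    if err := watchdog.RegisterProbe(p); err != nil {
        log.Printf("register %s: %v", p.ID(), err)
    }
}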

ProbeDefinition.Validate() enforces ^[a-zA-Z0-9_\-\.]+$ on IDs and rejects null bytes, path separators, and oversized values (CWE-116).
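
The ID rule quoted above can be tried in isolation. This sketch reuses the documented pattern and the documented MaxProbeIDLength value of 128; validID and the variable names are hypothetical, and the real Validate performs further checks (target and metadata limits) that are not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
)

// idPattern is the character set quoted in the documentation.
var idPattern = regexp.MustCompile(`^[a-zA-Z0-9_\-\.]+$`)

// maxIDLen mirrors the documented MaxProbeIDLength constant.
const maxIDLen = 128

// validID reports whether id satisfies the length and charset rule.
func validID(id string) bool {
	return len(id) <= maxIDLen && idPattern.MatchString(id)
}

func main() {
	fmt.Println(validID("etc-passwd.v1"))    // true
	fmt.Println(validID("file:/etc/passwd")) // false: ':' and '/' are outside the set
	fmt.Println(validID("a\x00b"))           // false: null byte
	fmt.Println(validID("line\nbreak"))      // false: newline
	fmt.Println(validID(""))                 // false: empty
}
```

Note that an ID such as "file:/etc/passwd" fails this pattern, so path-like information belongs in Target rather than in ID.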


Baseline integrity

Persisted state can be signed with HMAC-SHA256 to detect tampering:

// Illustrative key only: load the real key from a secret store, not from source.
key := []byte("32-byte-secret-key-for-production!")

// Before shutdown - sign and persist.
signed, err := cerberus.SignBaseline(watchdog.ExportState(), key)
if err != nil {
    log.Fatal(err)
}
saveToFile(signed)

// On startup - verify before loading.
loaded := loadFromFile()
ok, err := cerberus.VerifyBaseline(loaded, key)
if err != nil || !ok {
    log.Fatal("baseline tampered - refusing to load")
}
watchdog.LoadBaseline(loaded.States)

VerifyBaseline uses hmac.Equal for constant-time comparison (CWE-354).
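
For readers unfamiliar with the primitive, the following standalone sketch shows HMAC-SHA256 signing with a constant-time check using only the standard library. It illustrates the technique named above and is not the package's SignBaseline or VerifyBaseline; sign, verify, and the byte payloads are hypothetical, and how the package serialises state before signing is not covered.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// sign returns the HMAC-SHA256 tag of payload under key.
func sign(payload, key []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write(payload) // hash.Hash.Write never returns an error
	return mac.Sum(nil)
}

// verify recomputes the tag and compares it in constant time.
func verify(payload, sig, key []byte) bool {
	return hmac.Equal(sign(payload, key), sig)
}

func main() {
	key := []byte("demo-key-not-for-production")
	sig := sign([]byte("state-v1"), key)

	fmt.Println(len(sig))                            // 32
	fmt.Println(verify([]byte("state-v1"), sig, key)) // true
	fmt.Println(verify([]byte("state-v2"), sig, key)) // false
}
```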


Statistics and health

stats := watchdog.Stats()
// stats.PollCount, stats.DriftCount, stats.DroppedCount
// stats.LastPollAt (time.Time), stats.LastPollDuration (time.Duration)
// stats.IsRunning, stats.ProbeCount, stats.BaselineCount

health := watchdog.HealthCheck()
// health.IsHealthy, health.IsRunning, health.ProbeCount
// health.DroppedEvents, health.BufferCapacity, health.BufferUsed
// health.LastPollAt, health.LastPollDuration, health.PollOverrun

HealthStatus.IsHealthy is false when DroppedEvents >= CongestionThreshold.


Congestion

When the drift channel is full, emitDrift drops the event and increments Stats.DroppedCount. When dropped events reach CongestionThreshold:

  1. OnCongestion is called exactly once per congestion episode.
  2. The latch resets automatically when the next event is successfully enqueued and the buffer is no longer at capacity.
  3. A subsequent burst fires OnCongestion again.

Recommended: always provide an OnCongestion handler and monitor Stats.DroppedCount in your health dashboard.
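
The drop-and-latch behaviour described above can be sketched as follows. This is a single-goroutine illustration, not the package's emitDrift: the emitter type and its fields are hypothetical, and resetting a per-episode drop counter together with the latch is an assumption of this sketch.

```go
package main

import "fmt"

// emitter is a hypothetical, non-thread-safe model of the drift channel.
type emitter struct {
	ch           chan int
	threshold    int64
	total        int64 // cumulative drops
	episode      int64 // drops in the current congestion episode
	latched      bool  // true once onCongestion fired for this episode
	onCongestion func(dropped int64)
}

func newEmitter(capacity int, threshold int64, cb func(int64)) *emitter {
	return &emitter{ch: make(chan int, capacity), threshold: threshold, onCongestion: cb}
}

// emit tries a non-blocking send and reports whether the event was enqueued.
func (e *emitter) emit(ev int) bool {
	select {
	case e.ch <- ev:
		// Enqueued; release the latch once the buffer has free space again.
		if e.latched && len(e.ch) < cap(e.ch) {
			e.latched, e.episode = false, 0
		}
		return true
	default:
		// Buffer full: drop the event and fire the handler once per episode.
		e.total++
		e.episode++
		if e.episode >= e.threshold && !e.latched {
			e.latched = true
			if e.onCongestion != nil {
				e.onCongestion(e.total)
			}
		}
		return false
	}
}

func main() {
	calls := 0
	e := newEmitter(2, 3, func(int64) { calls++ })
	for i := 0; i < 7; i++ {
		e.emit(i) // 2 enqueued, 5 dropped
	}
	fmt.Println(calls, e.total) // 1 5

	<-e.ch // the consumer drains the buffer
	<-e.ch
	for i := 0; i < 5; i++ {
		e.emit(i) // 2 enqueued, 3 dropped
	}
	fmt.Println(calls, e.total) // 2 8
}
```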


Stop behaviour

Stop() signals the poll loop to exit and waits up to 5 seconds. If a probe is stuck (ignores context cancellation, e.g. a blocked syscall), Stop() returns ErrCodeStopTimeout but still marks the watchdog as stopped. The caller decides whether to treat this as fatal.

if err := watchdog.Stop(); err != nil {
    // At least one probe did not finish cleanly.
    log.Printf("cerberus stop: %v", err)
}
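
The bounded wait behind this behaviour is a common Go pattern: signal the loop, then wait on a completion channel or a timer, whichever comes first. The sketch below is illustrative only; loop, startLoop, errStopTimeout, and the 50ms timeout are hypothetical stand-ins for the package's internals and its 5 second limit.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStopTimeout is an illustrative sentinel, not the package's error value.
var errStopTimeout = errors.New("stop timed out")

// demoTimeout stands in for the documented 5 second limit.
const demoTimeout = 50 * time.Millisecond

type loop struct {
	quit chan struct{} // closed to ask the worker to exit
	done chan struct{} // closed by the worker when it has exited
}

// startLoop launches a worker. When stuck is true the worker ignores quit,
// imitating a probe blocked in a call that does not observe cancellation.
func startLoop(stuck bool) *loop {
	l := &loop{quit: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(l.done)
		if stuck {
			time.Sleep(time.Hour)
			return
		}
		<-l.quit
	}()
	return l
}

// stop signals the worker and waits for it, but never longer than timeout.
func (l *loop) stop(timeout time.Duration) error {
	close(l.quit)
	select {
	case <-l.done:
		return nil
	case <-time.After(timeout):
		return errStopTimeout
	}
}

func main() {
	fmt.Println(startLoop(false).stop(demoTimeout)) // <nil>
	fmt.Println(startLoop(true).stop(demoTimeout))  // stop timed out
}
```

In the timeout case the worker goroutine is still alive, which is consistent with the documented advice to create a new instance rather than restart one whose Stop timed out.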

Security

The following threat vectors are covered by security_test.go:

CWE | Vector | Mitigation
CWE-440 | Probe panic | recover() in pollProbe; converted to ChangeError.
CWE-20 | Invalid config (negative intervals, zero buffers) | applyDefaults() clamps all fields.
CWE-116 | Injection via probe IDs (newlines, null bytes, path separators) | ProbeDefinition.Validate() enforces strict charset.
CWE-354 | Baseline tampering | HMAC-SHA256 + constant-time compare in VerifyBaseline.
CWE-400 | Drift storm / buffer flood | Non-blocking channel + drop counter + congestion latch.
CWE-770 | Slow probe DoS | Per-probe context.WithTimeout; timed-out probes emit ChangeError.
CWE-362 | Race on concurrent register/unregister | probesMu RWMutex + double-check before reschedule.

Thread safety

All public methods are safe to call concurrently. Specifically:

  • RegisterProbe and UnregisterProbe may be called while the watchdog is running. The probe map is protected by probesMu (RWMutex).
  • Stats() and HealthCheck() use lock-free atomic reads.
  • Drifts() returns a read-only channel; multiple consumers are safe.
  • SensitivityProfile operations are protected by an internal RWMutex.

Copyright (c) 2025 AGILira - A. Giordano SPDX-License-Identifier: MPL-2.0

Documentation

Overview

Package cerberus implements a lightweight, CPU-efficient drift detection watchdog.

Design Principles

Cerberus detects — it never acts. Every state change is emitted as a DriftEvent on a buffered channel; what to do with that information is the caller's concern. This separation keeps Cerberus auditable, testable, and safe to embed in any security-critical process without hidden side effects.

Additional design constraints:

  • Zero external dependencies beyond github.com/agilira/go-errors.
  • CPU-light: one base ticker goroutine; probes polled via a priority queue.
  • Context-aware: every probe call is bounded by a configurable ProbeTimeout.
  • Panic-safe: a panicking probe emits ChangeError and does NOT crash the host.
  • Thread-safe: RegisterProbe and UnregisterProbe are safe to call while the watchdog is running.

Quick start

watchdog := cerberus.New(cerberus.Config{
    BufferSize:   64,
    ProbeTimeout: time.Second,
    OnStateChange: func(id string, prev, curr *cerberus.State) {
        // persist curr to external storage for restart recovery
    },
})

if err := watchdog.RegisterProbe(myProbe); err != nil {
    log.Fatal(err)
}
if err := watchdog.Start(); err != nil {
    log.Fatal(err)
}
defer func() {
    if err := watchdog.Stop(); err != nil {
        log.Println("cerberus stop:", err) // may be ErrCodeStopTimeout
    }
}()

for event := range watchdog.Drifts() {
    handleDrift(event)
}

Configuration

Config controls all runtime behaviour. Passing a zero value is safe — defaults are applied by New via Config.applyDefaults():

  • BufferSize (default 64): capacity of the drift event channel.
  • ProbeTimeout (default 1s): maximum time a single probe may run.
  • CongestionThreshold (default 10): dropped-event count that triggers Config.OnCongestion.
  • SensitivityProfile: per-resource polling intervals (see SensitivityProfile).

Probes

Any type that satisfies the Probe interface can be registered. The interface is intentionally minimal:

type Probe interface {
    ID()           string
    ResourceType() ResourceType
    Probe(context.Context) (State, error)
}

Probes are polled at the interval returned by SensitivityProfile.GetInterval for their ResourceType. The default intervals range from 100ms (SensitivityCritical) to 5s (SensitivityLow).

Dynamic probes

ProbeFactory generates probes at runtime from ProbeDefinition values. Register a generator per ResourceType, then call CreateProbesFromDefinitions to build a batch without restarting the watchdog.

State persistence and restart recovery

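Cerberus is stateless by design. Persistence is opt-in:

  • Set Config.OnStateChange to persist each state transition to external storage.
  • Call Cerberus.ExportState (or Cerberus.ExportSignedState) before shutdown to snapshot the current state.
  • Call Cerberus.LoadBaseline (or Cerberus.LoadSignedBaseline) on startup to restore it, so that drift that occurred while Cerberus was stopped is detected.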

Baseline integrity

SignBaseline and VerifyBaseline provide HMAC-SHA256 protection for persisted state. Tampered baselines are rejected before they can influence drift decisions.

Self-health

Cerberus.HealthCheck returns HealthStatus with live congestion metrics, probe count, and the timestamp/duration of the last poll cycle. Cerberus.Stats returns cumulative counters for dashboards and alerting.


Index

Constants

const (
	DefaultPollInterval        = 500 * time.Millisecond
	DefaultBufferSize          = 64
	DefaultProbeTimeout        = 1 * time.Second
	DefaultCongestionThreshold = 10 // Alert after 10 dropped events
)

Default configuration values

const (
	ErrCodeStopTimeout    = "CERBERUS_STOP_TIMEOUT"
	ErrCodeCompromised    = "CERBERUS_COMPROMISED"
	ErrCodeNilProbe       = "CERBERUS_NIL_PROBE"
	ErrCodeDuplicateProbe = "CERBERUS_DUPLICATE_PROBE"
	ErrCodeProbeNotFound  = "CERBERUS_PROBE_NOT_FOUND"
	ErrCodeAlreadyRunning = "CERBERUS_ALREADY_RUNNING"
	ErrCodeNotRunning     = "CERBERUS_NOT_RUNNING"
	ErrCodeProbeWhileRun  = "CERBERUS_PROBE_WHILE_RUNNING"
)

Error codes for Cerberus operations

const (
	MaxProbeIDLength     = 128
	MaxProbeTargetLength = 1024
	MaxMetadataKeys      = 50
	MaxMetadataValueLen  = 1024
)

Maximum constraints for adversarial protection — used as defaults in ValidationLimits. Override per-factory via NewProbeFactoryWithLimits instead of changing these constants.

const MinPollInterval = 10 * time.Millisecond

MinPollInterval is the absolute minimum polling interval (10ms). Below this, CPU overhead becomes excessive.

Variables

var (
	ErrProbeFailure = errors.New("probe execution failed")
	ErrProbeTimeout = errors.New("probe execution timed out")
)

Error sentinels for probe operations

Functions

func VerifyBaseline

func VerifyBaseline(signed *SignedBaseline, key []byte) (bool, error)

VerifyBaseline verifies the HMAC signature of a signed baseline. Returns true if the signature is valid, false if tampered.

Types

type AIModelMetadata

type AIModelMetadata struct {
	// ModelName is the human-readable model identifier
	ModelName string `json:"model_name"`

	// ModelVersion is the semantic version of the model
	ModelVersion string `json:"model_version,omitempty"`

	// Checksum is the cryptographic hash of model weights
	// Format: "algorithm:hexdigest" (e.g., "sha256:abc123...")
	Checksum string `json:"checksum"`

	// ChecksumAlgo is the algorithm used (sha256, blake3, etc.)
	ChecksumAlgo string `json:"checksum_algo"`

	// ParameterCount is the number of parameters (for quick identification)
	ParameterCount int64 `json:"parameter_count,omitempty"`

	// QuantizationType describes quantization (Q4_K_M, FP16, etc.)
	QuantizationType string `json:"quantization_type,omitempty"`

	// Framework is the inference framework (llama.cpp, ONNX, etc.)
	Framework string `json:"framework,omitempty"`

	// Source is where the model was obtained (HuggingFace, internal, etc.)
	Source string `json:"source,omitempty"`

	// ApprovedBy is who approved this model version for use
	ApprovedBy string `json:"approved_by,omitempty"`
}

AIModelMetadata provides structured metadata for AI model resources. Use with ResourceModelWeight probe type.

type AgentConfigMetadata

type AgentConfigMetadata struct {
	// AgentID is the unique agent identifier
	AgentID string `json:"agent_id"`

	// ConfigVersion is the semantic version of the config
	ConfigVersion string `json:"config_version,omitempty"`

	// Checksum is the hash of config content
	Checksum string `json:"checksum"`

	// AllowedPlugins lists plugins this agent can use
	AllowedPlugins []string `json:"allowed_plugins,omitempty"`

	// MaxTokenBudget is the maximum tokens per request
	MaxTokenBudget int `json:"max_token_budget,omitempty"`

	// TrustLevel indicates agent privilege level
	TrustLevel string `json:"trust_level,omitempty"`

	// Owner is the team/user responsible for this agent
	Owner string `json:"owner,omitempty"`

	// LastModifiedBy tracks who last changed the config
	LastModifiedBy string `json:"last_modified_by,omitempty"`
}

AgentConfigMetadata provides structured metadata for agent configurations. Use with ResourceAgentConfig probe type.

type Cerberus

type Cerberus struct {
	// contains filtered or unexported fields
}

Cerberus is a lightweight drift detection watchdog. It detects state changes but does NOT act on them. Themis OS handles the actual response via Policy/RBAC/Reflex.

func New

func New(config Config) *Cerberus

New creates a new Cerberus watchdog

func (*Cerberus) Drifts

func (c *Cerberus) Drifts() <-chan DriftEvent

Drifts returns the channel where drift events are emitted. Themis OS should consume this channel.

func (*Cerberus) ExportSignedState

func (c *Cerberus) ExportSignedState(key []byte) (*SignedBaseline, error)

ExportSignedState exports current state with HMAC signature. Use this instead of ExportState for secure persistence.

func (*Cerberus) ExportState

func (c *Cerberus) ExportState() map[string]State

ExportState returns current state for external persistence. Use this to save state before shutdown for restart recovery.

func (*Cerberus) HealthCheck

func (c *Cerberus) HealthCheck() HealthStatus

HealthCheck returns current health status. Use for monitoring dashboards and alerting.

func (*Cerberus) IsRunning

func (c *Cerberus) IsRunning() bool

IsRunning returns whether the watchdog is active

func (*Cerberus) LoadBaseline

func (c *Cerberus) LoadBaseline(baseline map[string]State)

LoadBaseline loads previously persisted state for restart recovery. This allows Cerberus to detect drift that occurred while it was stopped.

func (*Cerberus) LoadSignedBaseline

func (c *Cerberus) LoadSignedBaseline(signed *SignedBaseline, key []byte) error

LoadSignedBaseline loads a verified baseline. Returns error if signature verification fails (possible tampering). This is the SECURE alternative to LoadBaseline for production use.

func (*Cerberus) RegisterProbe

func (c *Cerberus) RegisterProbe(probe Probe) error

RegisterProbe adds a probe to the watchdog. Safe to call while running — the internal probesMu write-lock and the scheduler's own mutex provide all required concurrency protection. WHY no running-guard: the probes map and scheduler are mutex-protected; blocking mid-run registration would prevent dynamic lifecycle consumers (ADR-017 skills watcher, auto-protect) from working without a disruptive Stop→Register→Start cycle that resets all other probes.

func (*Cerberus) Start

func (c *Cerberus) Start() error

Start begins the polling loop. Returns ErrCodeCompromised if a previous Stop() timed out — in that case the pollLoop goroutine may still be running and this instance must not be restarted. Create a new Cerberus via New() instead.

func (*Cerberus) Stats

func (c *Cerberus) Stats() Stats

Stats returns runtime statistics

func (*Cerberus) Stop

func (c *Cerberus) Stop() error

Stop halts the polling loop gracefully. Returns ErrCodeStopTimeout if the poll loop does not exit within 5 seconds. WHY: a stuck probe (network hang, broken filesystem) can block pollProbe indefinitely even with a timeout context if the probe ignores cancellation. Returning an error lets callers log or alert without a silent hang at shutdown.

func (*Cerberus) UnregisterProbe

func (c *Cerberus) UnregisterProbe(id string) error

UnregisterProbe removes a probe from the watchdog. Safe to call while running — same rationale as RegisterProbe. The scheduler removal and lastState cleanup are performed inside the probesMu write-lock so that pollDueProbes sees a consistent view.

type ChangeType

type ChangeType uint8

ChangeType identifies what kind of change was detected

const (
	ChangeNone ChangeType = iota
	ChangeCreate
	ChangeModify
	ChangeDelete
	ChangeDrift // State differs from expected
	ChangeError // Probe error
)

func (ChangeType) String

func (c ChangeType) String() string

String returns human-readable change type name

type Config

type Config struct {
	// Deprecated: PollInterval is accepted for backward compatibility but has no
	// effect. Use SensitivityProfile to control polling frequency.
	PollInterval time.Duration

	// BufferSize is the drift event channel buffer size
	// Default: 64
	BufferSize int

	// ProbeTimeout is the maximum time a probe can take before being cancelled
	// Default: 1s
	// SAFETY: Prevents slow/hung probes from blocking the poll loop
	ProbeTimeout time.Duration

	// SensitivityProfile configures per-resource polling intervals
	// If nil, uses DefaultSensitivityForResource() for each probe
	// SOVEREIGNTY: Policy controls sensitivity, not code
	SensitivityProfile *SensitivityProfile

	// OnStateChange is called when any probe's state changes
	// Enables external persistence for baseline recovery after restart
	// SOVEREIGNTY: Cerberus is stateless, persistence is external
	OnStateChange StateChangeHandler

	// CongestionThreshold is the number of dropped events before alerting
	// Default: 10
	CongestionThreshold int64

	// OnCongestion is called when dropped events exceed threshold
	// Critical for GRC compliance - monitoring failures must be visible
	OnCongestion CongestionHandler

	// EmitCongestionEvent sends a DriftEvent when congestion occurs
	// Default: false (opt-in to avoid event flood)
	EmitCongestionEvent bool
}

Config configures Cerberus behavior

type CongestionHandler

type CongestionHandler func(droppedCount int64)

CongestionHandler is called when dropped events exceed threshold. This is critical for GRC: you must know when monitoring fails.

type DriftEvent

type DriftEvent struct {
	ProbeID      string       // Which probe detected this
	ResourceID   string       // What resource drifted
	ResourceType ResourceType // Type of resource
	ChangeType   ChangeType   // What kind of change
	PrevHash     uint64       // Previous state hash
	CurrHash     uint64       // Current state hash
	Timestamp    time.Time    // When detected
	Error        error        // If ChangeType == ChangeError
}

DriftEvent represents a detected drift that Cerberus barks about

type EnvVarMetadata

type EnvVarMetadata struct {
	// VarName is the environment variable name
	VarName string `json:"var_name"`

	// Checksum is the hash of the value (never store actual secrets)
	Checksum string `json:"checksum"`

	// Category classifies the variable (api_key, config, path, etc.)
	Category string `json:"category,omitempty"`

	// Sensitive indicates if this is a secret value
	Sensitive bool `json:"sensitive,omitempty"`

	// Source tracks where this var comes from (file, vault, k8s, etc.)
	Source string `json:"source,omitempty"`
}

EnvVarMetadata provides structured metadata for environment variables. Use with ResourceEnvVar probe type.

type GenericProbe

type GenericProbe struct {
	// contains filtered or unexported fields
}

GenericProbe is a runtime-generated probe that uses a check function. This allows probes to be created without writing custom types.

func NewGenericProbe

func NewGenericProbe(def ProbeDefinition, checkFn func(ctx context.Context, target string) (uint64, error)) *GenericProbe

NewGenericProbe creates a new generic probe with a custom check function
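
A check function with the signature shown above might hash a file's contents, as in the sketch below. It uses only the standard library; fileContentCheck and hashBytes are hypothetical names, and whether GenericProbe expects any particular hashing scheme is not stated in this documentation.

```go
package main

import (
	"context"
	"fmt"
	"hash/fnv"
	"os"
)

// hashBytes returns the 64-bit FNV-1a hash of b.
func hashBytes(b []byte) uint64 {
	h := fnv.New64a()
	h.Write(b) // hash.Hash.Write never returns an error
	return h.Sum64()
}

// fileContentCheck has the shape of the checkFn parameter:
// func(ctx context.Context, target string) (uint64, error).
// It treats target as a file path and hashes the file's contents.
func fileContentCheck(ctx context.Context, target string) (uint64, error) {
	// Bail out early if the deadline has already passed.
	if err := ctx.Err(); err != nil {
		return 0, err
	}
	data, err := os.ReadFile(target)
	if err != nil {
		return 0, err
	}
	return hashBytes(data), nil
}

func main() {
	// Hashing is deterministic: equal contents give equal hashes.
	fmt.Println(hashBytes([]byte("a")) == hashBytes([]byte("a"))) // true

	// A missing file surfaces as an error rather than a hash.
	_, err := fileContentCheck(context.Background(), "/nonexistent/path")
	fmt.Println(err)
}
```

os.ReadFile loads the whole file into memory; for large targets, streaming the file into the hash with io.Copy would be the more economical choice.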

func (*GenericProbe) Definition

func (p *GenericProbe) Definition() ProbeDefinition

Definition returns the original probe definition

func (*GenericProbe) ID

func (p *GenericProbe) ID() string

ID returns the probe's unique identifier

func (*GenericProbe) Probe

func (p *GenericProbe) Probe(ctx context.Context) (State, error)

Probe executes the check function and returns the current state

func (*GenericProbe) ResourceType

func (p *GenericProbe) ResourceType() ResourceType

ResourceType returns the type of resource being monitored

func (*GenericProbe) Target

func (p *GenericProbe) Target() string

Target returns the probe's target (file path, port, etc.)

type HealthStatus

type HealthStatus struct {
	IsHealthy        bool          // True if watchdog is operating normally
	IsRunning        bool          // Whether watchdog is running
	ProbeCount       int           // Number of registered probes
	DroppedEvents    int64         // Number of dropped events
	BufferCapacity   int           // Size of drift event buffer
	BufferUsed       int           // Current buffer usage (approximate)
	LastPollAt       time.Time     // When last poll cycle completed (zero if never polled)
	LastPollDuration time.Duration // Duration of the last poll cycle
	PollOverrun      bool          // True if the last poll cycle exceeded MinPollInterval (sequential bottleneck)
}

HealthStatus contains health check information

type Probe

type Probe interface {
	// ID returns unique identifier for this probe
	ID() string

	// ResourceType returns what kind of resource this probe monitors
	ResourceType() ResourceType

	// Probe fetches current state and returns it
	// Must be thread-safe, idempotent, and respect context cancellation
	// Context will have a deadline set by Cerberus (ProbeTimeout config)
	Probe(ctx context.Context) (State, error)
}

Probe is the interface that all resource monitors must implement. Probes are responsible for:

  1. Fetching current state of a resource
  2. Computing a hash for fast comparison
  3. Returning any errors encountered

Probes should be:

  • Idempotent (safe to call repeatedly)
  • Thread-safe (may be called from multiple goroutines)
  • Fast (polling happens frequently)
  • Pure (no side effects)
  • Context-aware (respect timeouts and cancellation)

type ProbeDefinition

type ProbeDefinition struct {
	// ID is the unique identifier for the probe
	ID string `json:"id"`

	// ResourceType determines which generator to use
	ResourceType ResourceType `json:"resource_type"`

	// Target is the resource-specific target (path, port, etc.)
	Target string `json:"target"`

	// Sensitivity overrides default sensitivity for this probe
	Sensitivity *Sensitivity `json:"sensitivity,omitempty"`

	// Metadata contains resource-specific configuration
	Metadata map[string]string `json:"metadata,omitempty"`
}

ProbeDefinition describes a probe to be created at runtime. This is the interface between WorldModel entities and Cerberus probes.

func (ProbeDefinition) Validate

func (d ProbeDefinition) Validate() error

Validate checks the ProbeDefinition against the package-default limits. For custom limits use ValidateWith.

func (ProbeDefinition) ValidateWith

func (d ProbeDefinition) ValidateWith(lim ValidationLimits) error

ValidateWith checks the ProbeDefinition against caller-supplied limits. Useful when the operator has legitimately higher bounds (e.g. AI metadata) while still blocking obviously adversarial inputs.

type ProbeFactory

type ProbeFactory struct {
	// contains filtered or unexported fields
}

ProbeFactory creates probes dynamically from definitions. Thread-safe for concurrent registration and creation.

func NewProbeFactory

func NewProbeFactory() *ProbeFactory

NewProbeFactory creates a ProbeFactory with default validation limits.

func NewProbeFactoryWithLimits

func NewProbeFactoryWithLimits(limits ValidationLimits) *ProbeFactory

NewProbeFactoryWithLimits creates a ProbeFactory with caller-supplied validation limits. Use this when the default limits are too restrictive for your workload (e.g. large AI metadata payloads). Only raise the specific field you need; leave the others at their defaults by starting from DefaultValidationLimits() and overriding fields.

func (*ProbeFactory) CreateProbe

func (f *ProbeFactory) CreateProbe(ctx context.Context, def ProbeDefinition) (Probe, error)

CreateProbe creates a single probe from a definition

func (*ProbeFactory) CreateProbesFromDefinitions

func (f *ProbeFactory) CreateProbesFromDefinitions(ctx context.Context, defs []ProbeDefinition) ([]Probe, []error)

CreateProbesFromDefinitions creates multiple probes from definitions. Returns successful probes and any errors encountered. Partial success is supported: some probes may fail while others succeed.

func (*ProbeFactory) HasGenerator

func (f *ProbeFactory) HasGenerator(rt ResourceType) bool

HasGenerator checks if a generator is registered for a resource type

func (*ProbeFactory) RegisterGenerator

func (f *ProbeFactory) RegisterGenerator(rt ResourceType, gen ProbeGenerator)

RegisterGenerator registers a generator for a resource type. Thread-safe: can be called during runtime.

type ProbeGenerator

type ProbeGenerator func(ctx context.Context, def ProbeDefinition) (Probe, error)

ProbeGenerator creates a Probe from a ProbeDefinition. Different generators handle different resource types (file, port, etc.).

type ProbeScheduler

type ProbeScheduler struct {
	// contains filtered or unexported fields
}

ProbeScheduler manages probe scheduling with a priority queue. Thread-safe: uses mutex for all operations.

func NewProbeScheduler

func NewProbeScheduler() *ProbeScheduler

NewProbeScheduler creates a new scheduler

func (*ProbeScheduler) Clear

func (ps *ProbeScheduler) Clear()

Clear removes all entries from the scheduler

func (*ProbeScheduler) Len

func (ps *ProbeScheduler) Len() int

Len returns the number of scheduled probes

func (*ProbeScheduler) NextPollTime

func (ps *ProbeScheduler) NextPollTime() time.Time

NextPollTime returns when the next probe is due. Returns zero time if no probes scheduled.

func (*ProbeScheduler) PopDue

func (ps *ProbeScheduler) PopDue(now time.Time) []string

PopDue returns all probes that are due (nextPoll <= now). Returns probe IDs in order of their scheduled time. Caller is responsible for rescheduling them after polling.
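
The PopDue contract can be modelled with the standard container/heap package. This is a sketch of the general technique, not the scheduler's source: item, queue, schedule, popDue, and at are hypothetical names, there is no mutex, and unlike the documented Schedule method this version does not update an entry that is already queued.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// item pairs a probe ID with its next poll time.
type item struct {
	id   string
	next time.Time
}

// queue is a min-heap ordered by next poll time.
type queue []item

func (q queue) Len() int           { return len(q) }
func (q queue) Less(i, j int) bool { return q[i].next.Before(q[j].next) }
func (q queue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *queue) Push(x any)        { *q = append(*q, x.(item)) }
func (q *queue) Pop() any {
	old := *q
	n := len(old)
	it := old[n-1]
	*q = old[:n-1]
	return it
}

// schedule adds a probe with its next poll time in O(log n).
func schedule(q *queue, id string, next time.Time) {
	heap.Push(q, item{id: id, next: next})
}

// popDue removes and returns the IDs of all items with next <= now,
// earliest first. Each removal costs O(log n).
func popDue(q *queue, now time.Time) []string {
	var due []string
	for q.Len() > 0 && !(*q)[0].next.After(now) {
		due = append(due, heap.Pop(q).(item).id)
	}
	return due
}

// at returns a fixed base time plus ms milliseconds (demo helper).
func at(ms int) time.Time {
	return time.Unix(0, 0).Add(time.Duration(ms) * time.Millisecond)
}

func main() {
	q := &queue{}
	schedule(q, "secret", at(100))
	schedule(q, "file", at(1000))
	schedule(q, "port", at(500))

	fmt.Println(popDue(q, at(600))) // [secret port]
	fmt.Println(q.Len())            // 1
}
```

After polling, the caller would schedule each returned ID again at now plus that probe's interval, which matches the documented note that rescheduling is the caller's job.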

func (*ProbeScheduler) Remove

func (ps *ProbeScheduler) Remove(probeID string)

Remove removes a probe from the scheduler

func (*ProbeScheduler) Schedule

func (ps *ProbeScheduler) Schedule(probeID string, nextPoll time.Time)

Schedule adds or updates a probe's next poll time. If probe already scheduled, updates its position in the queue.

func (*ProbeScheduler) ScheduleNow

func (ps *ProbeScheduler) ScheduleNow(probeID string)

ScheduleNow schedules a probe to be polled immediately

type PromptTemplateMetadata

type PromptTemplateMetadata struct {
	// TemplateName is the unique template identifier
	TemplateName string `json:"template_name"`

	// Version is the semantic version of the template
	Version string `json:"version,omitempty"`

	// Checksum is the hash of template content
	Checksum string `json:"checksum"`

	// TokenCount is the approximate token count (for budget planning)
	TokenCount int `json:"token_count,omitempty"`

	// Author is who created/last modified the template
	Author string `json:"author,omitempty"`

	// ApprovedAt is when the template was approved for production
	ApprovedAt string `json:"approved_at,omitempty"`

	// Category classifies the template (system, user, tool, etc.)
	Category string `json:"category,omitempty"`

	// SecurityLevel indicates sensitivity (public, internal, confidential)
	SecurityLevel string `json:"security_level,omitempty"`
}

PromptTemplateMetadata provides structured metadata for prompt templates. Use with ResourcePromptTemplate probe type.

type ResourceType

type ResourceType uint8

ResourceType identifies what kind of resource a probe monitors

const (
	ResourceFile ResourceType = iota
	ResourcePort
	ResourceProcess
	ResourceLog
	ResourceContainer
	ResourceCertificate
	ResourceDNS
	ResourceIAMPolicy
	ResourceNetworkRule
	ResourceSecret
	ResourceService
	ResourceEndpoint
	ResourceCustom

	// AI-Specific Resources (Sovereign Agentic OS)
	ResourceModelWeight    // LLM model weights - detect tampering
	ResourcePromptTemplate // System prompts - detect prompt injection
	ResourceEnvVar         // Environment variables - detect agent hijacking
	ResourceAgentConfig    // Agent configuration - critical for sovereignty

	// Meta-Resources (Self-Health)
	ResourceCerberus // Cerberus itself (congestion, health)
)

func (ResourceType) String

func (r ResourceType) String() string

String returns the canonical string representation of the resource type.

type Sensitivity

type Sensitivity uint8

Sensitivity levels for resources

const (
	// SensitivityLow: Non-critical resources, 5s polling (logs, metrics)
	SensitivityLow Sensitivity = iota

	// SensitivityMedium: Standard resources, 1s polling (files, containers)
	SensitivityMedium

	// SensitivityHigh: Important resources, 500ms polling (ports, processes)
	SensitivityHigh

	// SensitivityCritical: Security-critical, 100ms polling (secrets, certs, IAM)
	SensitivityCritical
)

func DefaultSensitivityForResource

func DefaultSensitivityForResource(rt ResourceType) Sensitivity

DefaultSensitivityForResource returns the default sensitivity for a resource type. This embodies security best practices:

  • Secrets, certs, IAM, agent config = Critical (fast detection of compromise)
  • Ports, processes, network, AI resources = High (fast detection of intrusion)
  • Files, containers, services = Medium (balanced)
  • Logs = Low (rarely critical for real-time detection)

func (Sensitivity) DefaultInterval

func (s Sensitivity) DefaultInterval() time.Duration

DefaultInterval returns the default polling interval for this sensitivity
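As an illustration, the mapping from sensitivity level to polling interval can be sketched as below, using the intervals stated in the Sensitivity constant comments (5s, 1s, 500ms, 100ms). The Sensitivity type here is a local stand-in, defaultInterval is a hypothetical name, and the fallback for out-of-range values is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"time"
)

// Sensitivity is a local stand-in for cerberus.Sensitivity.
type Sensitivity uint8

const (
	SensitivityLow Sensitivity = iota
	SensitivityMedium
	SensitivityHigh
	SensitivityCritical
)

// defaultInterval returns the interval named in each constant's comment.
func defaultInterval(s Sensitivity) time.Duration {
	switch s {
	case SensitivityLow:
		return 5 * time.Second
	case SensitivityMedium:
		return time.Second
	case SensitivityHigh:
		return 500 * time.Millisecond
	case SensitivityCritical:
		return 100 * time.Millisecond
	default:
		return time.Second // assumption: unknown levels fall back to medium
	}
}

func main() {
	for s := SensitivityLow; s <= SensitivityCritical; s++ {
		fmt.Printf("level %d polls every %v\n", s, defaultInterval(s))
	}
}
```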

func (Sensitivity) String

func (s Sensitivity) String() string

String returns human-readable sensitivity name

type SensitivityProfile

type SensitivityProfile struct {
	// contains filtered or unexported fields
}

SensitivityProfile configures per-resource polling intervals. It is thread-safe for runtime updates via policy changes.

func NewSensitivityProfile

func NewSensitivityProfile() *SensitivityProfile

NewSensitivityProfile creates a profile with default sensitivities

func (*SensitivityProfile) Clone

Clone returns a deep copy of the profile

func (*SensitivityProfile) GetInterval

func (p *SensitivityProfile) GetInterval(rt ResourceType) time.Duration

GetInterval returns the polling interval for a resource type. Priority: custom interval > sensitivity default > medium default.

func (*SensitivityProfile) SetInterval

func (p *SensitivityProfile) SetInterval(rt ResourceType, interval time.Duration)

SetInterval sets a custom polling interval for a resource type. It is thread-safe for runtime policy updates.

func (*SensitivityProfile) SetSensitivity

func (p *SensitivityProfile) SetSensitivity(rt ResourceType, s Sensitivity)

SetSensitivity sets the sensitivity level for a resource type and clears any custom interval override.
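The lookup priority and the override-clearing rule can be modeled with a small sketch. All types here (profile, resourceType, sensitivity) are hypothetical stand-ins, and unlike NewSensitivityProfile this sketch starts with no per-resource defaults; it only illustrates the documented resolution order under those assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type resourceType uint8 // stand-in for cerberus.ResourceType
type sensitivity uint8  // stand-in for cerberus.Sensitivity

const (
	low sensitivity = iota
	medium
	high
	critical
)

// intervals holds the documented default interval per sensitivity level.
var intervals = map[sensitivity]time.Duration{
	low:      5 * time.Second,
	medium:   time.Second,
	high:     500 * time.Millisecond,
	critical: 100 * time.Millisecond,
}

// profile is a hypothetical model of the documented SensitivityProfile behavior.
type profile struct {
	mu     sync.RWMutex
	levels map[resourceType]sensitivity
	custom map[resourceType]time.Duration
}

func newProfile() *profile {
	return &profile{
		levels: make(map[resourceType]sensitivity),
		custom: make(map[resourceType]time.Duration),
	}
}

// SetInterval records a custom override for one resource type.
func (p *profile) SetInterval(rt resourceType, d time.Duration) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.custom[rt] = d
}

// SetSensitivity sets the level and clears any custom override.
func (p *profile) SetSensitivity(rt resourceType, s sensitivity) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.levels[rt] = s
	delete(p.custom, rt)
}

// GetInterval resolves: custom interval > sensitivity default > medium default.
func (p *profile) GetInterval(rt resourceType) time.Duration {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if d, ok := p.custom[rt]; ok {
		return d
	}
	if s, ok := p.levels[rt]; ok {
		return intervals[s]
	}
	return intervals[medium]
}

func main() {
	const secret resourceType = 9 // arbitrary value for the example
	p := newProfile()
	fmt.Println(p.GetInterval(secret)) // nothing configured: medium default
	p.SetSensitivity(secret, critical)
	fmt.Println(p.GetInterval(secret)) // sensitivity default
	p.SetInterval(secret, 250*time.Millisecond)
	fmt.Println(p.GetInterval(secret)) // custom override wins
	p.SetSensitivity(secret, high)
	fmt.Println(p.GetInterval(secret)) // override cleared by SetSensitivity
}
```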

type SignedBaseline

type SignedBaseline struct {
	// States is the probe state map being protected
	States map[string]State `json:"states"`

	// Signature is the HMAC-SHA256 signature of canonical state JSON
	Signature string `json:"signature"`

	// Version allows future format changes
	Version int `json:"version"`
}

SignedBaseline holds probe states with cryptographic signature. This protects against attackers modifying persisted baselines to hide unauthorized changes made while Cerberus was stopped.

func SignBaseline

func SignBaseline(states map[string]State, key []byte) (*SignedBaseline, error)

SignBaseline creates a signed baseline from probe states. The key should be the same HMAC key used by the audit system (typically from config.SigningKey) for consistent security.
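To show the general idea of signing a state map, here is a minimal sketch using HMAC-SHA256 from the standard library. The state type, the sign and verify helpers, and the hex encoding of the signature are assumptions for illustration; the canonical JSON form and encoding that SignBaseline actually uses are not shown in this documentation.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// state is a trimmed stand-in for cerberus.State.
type state struct {
	ResourceID string
	Hash       uint64
}

// sign returns hex(HMAC-SHA256(key, JSON(states))).
// encoding/json writes map keys in sorted order, so the payload is deterministic.
func sign(states map[string]state, key []byte) (string, error) {
	payload, err := json.Marshal(states)
	if err != nil {
		return "", err
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil)), nil
}

// verify recomputes the signature and compares it in constant time.
func verify(states map[string]state, key []byte, sig string) bool {
	want, err := sign(states, key)
	if err != nil {
		return false
	}
	return hmac.Equal([]byte(want), []byte(sig))
}

func main() {
	key := []byte("example-key-not-for-production")
	states := map[string]state{"cfg": {ResourceID: "cfg", Hash: 42}}

	sig, err := sign(states, key)
	if err != nil {
		panic(err)
	}
	fmt.Println("valid:", verify(states, key, sig))

	// Simulate an attacker editing the persisted baseline while stopped.
	states["cfg"] = state{ResourceID: "cfg", Hash: 43}
	fmt.Println("valid after tampering:", verify(states, key, sig))
}
```

Using hmac.Equal rather than a plain string comparison avoids leaking timing information when checking a stored signature.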

type State

type State struct {
	ResourceID string    // Unique identifier for the resource
	Hash       uint64    // Hash of current state for fast comparison
	Timestamp  time.Time // When this state was captured
	Metadata   any       // Optional: additional state info for detailed diff
}

State represents the current state of a monitored resource

type StateChangeHandler

type StateChangeHandler func(probeID string, prevState, newState *State)

StateChangeHandler is called when a probe's state changes. prevState is nil on the first poll (no previous state). This hook enables external persistence (WorldModel, disk, etc.).
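A handler with this signature might look like the sketch below, which keeps the latest state per probe in memory. The State type is a trimmed local stand-in, memoryStore and its fields are hypothetical, and the mutex and the nil check on the new state are defensive assumptions, since the documentation does not say whether the handler can be invoked concurrently or with a nil new state.

```go
package main

import (
	"fmt"
	"sync"
)

// State is a trimmed stand-in for cerberus.State.
type State struct {
	ResourceID string
	Hash       uint64
}

// memoryStore is a hypothetical persistence target for the example.
type memoryStore struct {
	mu        sync.Mutex
	states    map[string]State
	firstSeen int // number of probes observed for the first time
}

func newMemoryStore() *memoryStore {
	return &memoryStore{states: make(map[string]State)}
}

// OnStateChange matches func(probeID string, prevState, newState *State).
func (m *memoryStore) OnStateChange(probeID string, prev, curr *State) {
	if curr == nil {
		return // defensive: nothing to persist
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if prev == nil {
		m.firstSeen++ // first poll for this probe: no previous state
	}
	m.states[probeID] = *curr // store a copy, not the pointer
}

func main() {
	store := newMemoryStore()
	first := &State{ResourceID: "cfg", Hash: 1}
	second := &State{ResourceID: "cfg", Hash: 2}

	store.OnStateChange("cfg", nil, first)    // initial observation
	store.OnStateChange("cfg", first, second) // later change

	fmt.Println("stored hash:", store.states["cfg"].Hash)
	fmt.Println("first observations:", store.firstSeen)
}
```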

type Stats

type Stats struct {
	PollCount        int64         // Total polls executed
	DriftCount       int64         // Total drifts detected
	DroppedCount     int64         // Events dropped due to full buffer
	OverrunCount     int64         // Poll cycles that exceeded MinPollInterval (sequential bottleneck signal)
	ProbeCount       int           // Number of registered probes
	LastPollAt       time.Time     // Wall-clock time when the last poll cycle completed
	LastPollDuration time.Duration // Duration of the last poll cycle
	IsRunning        bool          // Whether watchdog is running
	BaselineCount    int           // Number of baseline entries loaded
}

Stats contains runtime statistics

type ValidationLimits

type ValidationLimits struct {
	// MaxIDLength is the maximum byte length of a ProbeDefinition.ID.
	MaxIDLength int
	// MaxTargetLength is the maximum byte length of ProbeDefinition.Target.
	MaxTargetLength int
	// MaxMetadataKeys is the maximum number of keys in ProbeDefinition.Metadata.
	MaxMetadataKeys int
	// MaxMetadataValueLen is the maximum byte length of any single metadata value.
	MaxMetadataValueLen int
}

ValidationLimits configures the bounds enforced by ProbeDefinition.ValidateWith. All defaults are intentionally conservative to prevent adversarial payloads. If your workload requires larger metadata (e.g. LLM prompt configs), raise only the specific field you need; leave the others at their defaults.

func DefaultValidationLimits

func DefaultValidationLimits() ValidationLimits

DefaultValidationLimits returns conservative limits aligned with the package constants. These are the same values enforced by the zero-config Validate() method.
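The advice to raise only the field you need can be illustrated with a self-contained sketch. Everything here is a stand-in: probeDef is a hypothetical trimmed ProbeDefinition (the map[string]string metadata type is an assumption), validateWith is an illustrative check rather than the package's ValidateWith, and the numbers in placeholderLimits are placeholders, not the real defaults.

```go
package main

import (
	"errors"
	"fmt"
)

// ValidationLimits mirrors the documented field names only.
type ValidationLimits struct {
	MaxIDLength         int
	MaxTargetLength     int
	MaxMetadataKeys     int
	MaxMetadataValueLen int
}

// probeDef is a hypothetical, trimmed stand-in for ProbeDefinition.
type probeDef struct {
	ID       string
	Target   string
	Metadata map[string]string
}

// placeholderLimits stands in for DefaultValidationLimits; values are invented.
func placeholderLimits() ValidationLimits {
	return ValidationLimits{
		MaxIDLength:         64,
		MaxTargetLength:     256,
		MaxMetadataKeys:     8,
		MaxMetadataValueLen: 128,
	}
}

// validateWith checks each bound in turn and reports the first violation.
func validateWith(d probeDef, l ValidationLimits) error {
	switch {
	case len(d.ID) > l.MaxIDLength:
		return errors.New("id too long")
	case len(d.Target) > l.MaxTargetLength:
		return errors.New("target too long")
	case len(d.Metadata) > l.MaxMetadataKeys:
		return errors.New("too many metadata keys")
	}
	for k, v := range d.Metadata {
		if len(v) > l.MaxMetadataValueLen {
			return fmt.Errorf("metadata value for %q too long", k)
		}
	}
	return nil
}

func main() {
	d := probeDef{
		ID:       "system-prompt",
		Target:   "/etc/agent/system.txt",
		Metadata: map[string]string{"prompt": string(make([]byte, 1024))},
	}

	limits := placeholderLimits()
	fmt.Println("with defaults:", validateWith(d, limits))

	limits.MaxMetadataValueLen = 4096 // raise only the one field this workload needs
	fmt.Println("after raising one field:", validateWith(d, limits))
}
```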
