rebuild

package
v1.136.1
Published: May 27, 2026 License: MPL-2.0 Imports: 5 Imported by: 0

Documentation

Constants

const (
	// MaxImmediateRetries is the maximum retry count within the same process.
	MaxImmediateRetries = 1

	// MaxDeferredRetries is the maximum deferred retry count across boots/triggers.
	MaxDeferredRetries = 2

	// DeferredRetryDelaySec is the delay before deferred retry (systemd timer).
	DeferredRetryDelaySec = 60
)

Retry policy constants.

const DefaultMarkerPath = "/var/lib/nftban/state/rebuild_recovery.json"

DefaultMarkerPath is the canonical location for the recovery marker.

Variables

This section is empty.

Functions

func Clear

func Clear() error

Clear removes the recovery marker (rebuild succeeded).

func ClearFrom

func ClearFrom(path string) error

ClearFrom removes a recovery marker at a specific path.
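
The documentation does not say what happens when the marker is already absent. The following is a minimal sketch, assuming a missing marker counts as already cleared; `clearFrom` is a hypothetical local stand-in and not the package's own implementation.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// clearFrom is a hypothetical stand-in for ClearFrom.
// Assumption: a marker that does not exist is treated as already cleared,
// so only other removal errors are reported.
func clearFrom(path string) error {
	err := os.Remove(path)
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("clear recovery marker %s: %w", path, err)
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "rebuild-marker")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "rebuild_recovery.json")
	if err := os.WriteFile(path, []byte("{}"), 0o600); err != nil {
		panic(err)
	}
	fmt.Println(clearFrom(path)) // first call removes the file
	fmt.Println(clearFrom(path)) // second call finds nothing to remove
}
```

Treating "not found" as success keeps the call idempotent, which suits a cleanup step that may run after every successful rebuild.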

func IsRetryable

func IsRetryable(class FailureClass) bool

IsRetryable reports whether the failure class is potentially retryable. For POSTVALIDATION_REGRESSION it returns true, but whether a retry actually happens depends on whether the regression is daemon-related, which is checked at the call site.

Types

type FailureClass

type FailureClass string

FailureClass categorizes the root cause of a rebuild failure. Each class has a fixed retry policy — see policy.go.

Contract: Only transient classes are retryable. Invalid desired state, structural failures, and rollback failures are NEVER retried.

const (
	// ClassPrevalidationFailed means nft -c -f rejected the rendered config.
	// This is an INPUT failure — no kernel mutation occurred.
	// INV-RR-001: must abort before any kernel change.
	// MUST NOT write recovery marker. MUST NOT enter recovery flow.
	ClassPrevalidationFailed FailureClass = "PREVALIDATION_FAILED"

	// ClassSnapshotFailed means pre-rebuild state could not be captured.
	// INV-RR-002: snapshot must exist before destructive change.
	ClassSnapshotFailed FailureClass = "SNAPSHOT_FAILED"

	// ClassApplyFailed means nft -f load failed after validation passed.
	// May be transient (kernel busy, resource exhaustion).
	ClassApplyFailed FailureClass = "APPLY_FAILED"

	// ClassPostvalidationRegression means post-state is worse than pre-state.
	// Retry is CONDITIONAL: only if caused by daemon/module restore path.
	// If caused by validator structural failure → never retry.
	ClassPostvalidationRegression FailureClass = "POSTVALIDATION_REGRESSION"

	// ClassPostvalidationHardFail means validator reports structural DOWN.
	ClassPostvalidationHardFail FailureClass = "POSTVALIDATION_HARD_FAIL"

	// ClassDaemonRestartFailed means nftband could not start for module restore.
	// Retryable — daemon may start on deferred retry after boot completes.
	ClassDaemonRestartFailed FailureClass = "DAEMON_RESTART_FAILED"

	// ClassModuleRestoreFailed means module re-enable failed explicitly.
	// Retryable if daemon dependency is the root cause.
	ClassModuleRestoreFailed FailureClass = "MODULE_RESTORE_FAILED"

	// ClassModuleRestoreIncomplete means some modules restored, others failed.
	// Retryable once — may succeed if daemon becomes available.
	ClassModuleRestoreIncomplete FailureClass = "MODULE_RESTORE_INCOMPLETE"

	// ClassRollbackFailed means snapshot restore failed. Never retry.
	// Exit code 3. Manual recovery required.
	ClassRollbackFailed FailureClass = "ROLLBACK_FAILED"

	// ClassAuthorityConflict means a foreign firewall was detected during rebuild.
	ClassAuthorityConflict FailureClass = "AUTHORITY_CONFLICT"

	// ClassBackupMissing means no snapshot was available for rollback.
	ClassBackupMissing FailureClass = "BACKUP_MISSING"

	// ClassRetryExhausted means max retry attempts reached. Stop.
	ClassRetryExhausted FailureClass = "RETRY_EXHAUSTED"
)
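
One possible reading of the retry contract and the per-class comments above, written as a self-contained sketch. The type and a subset of the constants are mirrored locally because this page does not show the package's import path, and `isRetryable` is a hypothetical helper. Classes whose retry policy the comments do not state (SNAPSHOT_FAILED, AUTHORITY_CONFLICT, BACKUP_MISSING) fall into the non-retryable default here, which is a conservative assumption rather than documented behaviour.

```go
package main

import "fmt"

// FailureClass mirrors the documented type locally for illustration.
type FailureClass string

const (
	ClassPrevalidationFailed      FailureClass = "PREVALIDATION_FAILED"
	ClassApplyFailed              FailureClass = "APPLY_FAILED"
	ClassPostvalidationRegression FailureClass = "POSTVALIDATION_REGRESSION"
	ClassPostvalidationHardFail   FailureClass = "POSTVALIDATION_HARD_FAIL"
	ClassDaemonRestartFailed      FailureClass = "DAEMON_RESTART_FAILED"
	ClassModuleRestoreFailed      FailureClass = "MODULE_RESTORE_FAILED"
	ClassModuleRestoreIncomplete  FailureClass = "MODULE_RESTORE_INCOMPLETE"
	ClassRollbackFailed           FailureClass = "ROLLBACK_FAILED"
)

// isRetryable is a hypothetical helper. It returns true only for classes
// whose comments describe them as transient or retryable.
func isRetryable(class FailureClass) bool {
	switch class {
	case ClassApplyFailed,
		ClassPostvalidationRegression, // conditional; the caller still checks the cause
		ClassDaemonRestartFailed,
		ClassModuleRestoreFailed,
		ClassModuleRestoreIncomplete:
		return true
	default:
		// Input, structural, and rollback failures are never retried.
		return false
	}
}

func main() {
	for _, c := range []FailureClass{
		ClassApplyFailed, ClassPrevalidationFailed, ClassRollbackFailed,
	} {
		fmt.Printf("%s retryable=%v\n", c, isRetryable(c))
	}
}
```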

type ModuleRestoreReport

type ModuleRestoreReport struct {
	DDoS     ModuleRestoreResult `json:"ddos"`
	Portscan ModuleRestoreResult `json:"portscan"`
	BotGuard ModuleRestoreResult `json:"botguard"`
	LoginMon ModuleRestoreResult `json:"loginmon"`
}

ModuleRestoreReport captures per-module restore outcomes after rebuild.

func (*ModuleRestoreReport) AllOK

func (r *ModuleRestoreReport) AllOK() bool

AllOK returns true if all enabled modules restored successfully.

func (*ModuleRestoreReport) HasFailure

func (r *ModuleRestoreReport) HasFailure() bool

HasFailure returns true if any enabled module failed to restore.

func (*ModuleRestoreReport) HasIncomplete

func (r *ModuleRestoreReport) HasIncomplete() bool

HasIncomplete returns true if any module is partially restored.
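
The three predicates can be sketched over a local mirror of the report type. This assumes that "enabled" means any module whose result is not RESTORE_SKIPPED; the documentation does not define the term, and the helper methods `all` and `has` are hypothetical.

```go
package main

import "fmt"

// Local mirrors of the documented types, for illustration only.
type ModuleRestoreResult string

const (
	RestoreOK         ModuleRestoreResult = "RESTORE_OK"
	RestoreFailed     ModuleRestoreResult = "RESTORE_FAILED"
	RestoreIncomplete ModuleRestoreResult = "RESTORE_INCOMPLETE"
	RestoreSkipped    ModuleRestoreResult = "RESTORE_SKIPPED"
)

type ModuleRestoreReport struct {
	DDoS     ModuleRestoreResult
	Portscan ModuleRestoreResult
	BotGuard ModuleRestoreResult
	LoginMon ModuleRestoreResult
}

// all lists the per-module results (hypothetical helper).
func (r *ModuleRestoreReport) all() []ModuleRestoreResult {
	return []ModuleRestoreResult{r.DDoS, r.Portscan, r.BotGuard, r.LoginMon}
}

// has reports whether any module carries the given result (hypothetical helper).
func (r *ModuleRestoreReport) has(want ModuleRestoreResult) bool {
	for _, got := range r.all() {
		if got == want {
			return true
		}
	}
	return false
}

// AllOK: every module is either restored or skipped (assumed "not enabled").
func (r *ModuleRestoreReport) AllOK() bool {
	for _, got := range r.all() {
		if got != RestoreOK && got != RestoreSkipped {
			return false
		}
	}
	return true
}

func (r *ModuleRestoreReport) HasFailure() bool    { return r.has(RestoreFailed) }
func (r *ModuleRestoreReport) HasIncomplete() bool { return r.has(RestoreIncomplete) }

func main() {
	r := &ModuleRestoreReport{
		DDoS: RestoreOK, Portscan: RestoreIncomplete,
		BotGuard: RestoreSkipped, LoginMon: RestoreOK,
	}
	fmt.Println(r.AllOK(), r.HasFailure(), r.HasIncomplete()) // false false true
}
```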

type ModuleRestoreResult

type ModuleRestoreResult string

ModuleRestoreResult represents the outcome of restoring a single module after rebuild. Three verification levels are required:

Level 1: Structure presence (chains + sets exist)
Level 2: Wiring correctness (jumps in correct anchor positions)
Level 3: Activation evidence (counters, active set references, or validator confirmation)

Classification:

Level 1 only         → RestoreIncomplete
Level 1 + Level 2    → RestoreIncomplete (wired but unproven)
Level 1 + 2 + 3      → RestoreOK
Level 1 fails        → RestoreFailed
Not attempted        → RestoreSkipped

const (
	// RestoreOK means module restored and verified at all 3 levels.
	RestoreOK ModuleRestoreResult = "RESTORE_OK"

	// RestoreFailed means module restore attempt failed (Level 1 not met).
	RestoreFailed ModuleRestoreResult = "RESTORE_FAILED"

	// RestoreIncomplete means partial verification (Level 1 or 1+2 only).
	RestoreIncomplete ModuleRestoreResult = "RESTORE_INCOMPLETE"

	// RestoreSkipped means module was not enabled pre-rebuild or not applicable.
	RestoreSkipped ModuleRestoreResult = "RESTORE_SKIPPED"
)
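
The classification table above can be expressed as a small decision function. This is a sketch with a hypothetical `classify` helper and boolean inputs standing in for the three verification levels; the combination "Level 1 + Level 3 without Level 2" is not covered by the table and is treated as incomplete here by assumption.

```go
package main

import "fmt"

// Local mirror of the documented result type, for illustration only.
type ModuleRestoreResult string

const (
	RestoreOK         ModuleRestoreResult = "RESTORE_OK"
	RestoreFailed     ModuleRestoreResult = "RESTORE_FAILED"
	RestoreIncomplete ModuleRestoreResult = "RESTORE_INCOMPLETE"
	RestoreSkipped    ModuleRestoreResult = "RESTORE_SKIPPED"
)

// classify maps verification evidence to a result (hypothetical helper).
// structure = Level 1, wiring = Level 2, activation = Level 3.
func classify(attempted, structure, wiring, activation bool) ModuleRestoreResult {
	switch {
	case !attempted:
		return RestoreSkipped
	case !structure:
		return RestoreFailed
	case wiring && activation:
		return RestoreOK
	default:
		// Level 1 only, or wired but unproven.
		return RestoreIncomplete
	}
}

func main() {
	fmt.Println(classify(false, false, false, false)) // RESTORE_SKIPPED
	fmt.Println(classify(true, false, false, false))  // RESTORE_FAILED
	fmt.Println(classify(true, true, true, false))    // RESTORE_INCOMPLETE
	fmt.Println(classify(true, true, true, true))     // RESTORE_OK
}
```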

type OperationResult

type OperationResult string

OperationResult represents the outcome of a rebuild operation. This is distinct from system health (validator.Status).

A rebuild can FAIL while system health remains PROTECTED if rollback restored a validated known-good state. These are separate truths:

  • OperationResult = what happened during rebuild
  • validator.Status = current firewall protection state

const (
	// ResultSuccess means rebuild completed and post-validation passed.
	ResultSuccess OperationResult = "SUCCESS"

	// ResultFailedRecovered means rebuild failed but rollback restored a
	// validated PROTECTED state. System is healthy, but rebuild did not achieve
	// its goal. Command exits non-zero, health may be PROTECTED.
	ResultFailedRecovered OperationResult = "FAILED_RECOVERED"

	// ResultFailedDegraded means rebuild failed and system is in DEGRADED state.
	// Either rollback restored a DEGRADED state, retry was exhausted,
	// or module restoration was incomplete.
	ResultFailedDegraded OperationResult = "FAILED_DEGRADED"

	// ResultFailedFatal means rollback itself failed. Manual recovery required.
	// System is in DOWN or untrusted state. Exit code 3.
	ResultFailedFatal OperationResult = "FAILED_FATAL"
)
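
To illustrate how the operation outcome and system health stay separate, here is a sketch that maps an OperationResult to a process exit code. Only two facts come from the comments above: FAILED_RECOVERED exits non-zero and FAILED_FATAL uses exit code 3. The value 0 for SUCCESS and 1 for the remaining failures are assumptions, and `exitCode` is a hypothetical helper.

```go
package main

import "fmt"

// Local mirror of the documented type, for illustration only.
type OperationResult string

const (
	ResultSuccess         OperationResult = "SUCCESS"
	ResultFailedRecovered OperationResult = "FAILED_RECOVERED"
	ResultFailedDegraded  OperationResult = "FAILED_DEGRADED"
	ResultFailedFatal     OperationResult = "FAILED_FATAL"
)

// exitCode is a hypothetical mapping from operation outcome to exit status.
func exitCode(r OperationResult) int {
	switch r {
	case ResultSuccess:
		return 0 // assumption: conventional success status
	case ResultFailedFatal:
		return 3 // documented: rollback failed, manual recovery required
	default:
		return 1 // assumption: the docs only say "non-zero" for FAILED_RECOVERED
	}
}

func main() {
	// The rebuild failed, yet rollback left the firewall protected:
	// the exit code reflects the operation, not the health state.
	result, health := ResultFailedRecovered, "protected"
	fmt.Printf("operation=%s health=%s exit=%d\n", result, health, exitCode(result))
}
```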

type RecoveryMarker

type RecoveryMarker struct {
	// FailureClass categorizes the root cause.
	FailureClass FailureClass `json:"failure_class"`

	// OperationResult is the rebuild operation outcome.
	OperationResult OperationResult `json:"operation_result"`

	// RetryCount is the number of retry attempts so far (immediate + deferred).
	RetryCount int `json:"retry_count"`

	// MaxRetries is the configured maximum total retries.
	MaxRetries int `json:"max_retries"`

	// DeferredRetryPending indicates a deferred retry should be attempted.
	DeferredRetryPending bool `json:"deferred_retry_pending"`

	// FirstFailureAt is the timestamp of the initial failure.
	FirstFailureAt time.Time `json:"first_failure_at"`

	// LastFailureAt is the timestamp of the most recent failure/retry.
	LastFailureAt time.Time `json:"last_failure_at"`

	// RollbackAttempted indicates whether rollback was triggered.
	RollbackAttempted bool `json:"rollback_attempted"`

	// RollbackResult is the outcome of the rollback attempt.
	RollbackResult string `json:"rollback_result"` // "success", "failed", "not_attempted"

	// BackupPath is the snapshot directory used for rollback.
	BackupPath string `json:"backup_path"`

	// LastHealthState is the validator state after recovery settled.
	LastHealthState string `json:"last_health_state"` // "protected", "degraded", "down"

	// Exhausted is true when max retries are reached.
	Exhausted bool `json:"exhausted"`

	// ModuleRestore captures per-module restore outcomes.
	ModuleRestore *ModuleRestoreReport `json:"module_restore,omitempty"`

	// DaemonRelated indicates whether the failure involved daemon unavailability.
	DaemonRelated bool `json:"daemon_related"`
}

RecoveryMarker persists rebuild failure state for recovery tracking. It is written on any non-SUCCESS rebuild outcome except PREVALIDATION_FAILED, cleared on SUCCESS, and read by the deferred retry service.

func NewMarker

func NewMarker(class FailureClass, result OperationResult) *RecoveryMarker

NewMarker creates a new recovery marker for an initial failure.

func ReadMarker

func ReadMarker() (*RecoveryMarker, error)

ReadMarker reads a recovery marker from the default path. It returns nil, nil if the marker does not exist (no recovery pending).

func ReadMarkerFrom

func ReadMarkerFrom(path string) (*RecoveryMarker, error)

ReadMarkerFrom reads a recovery marker from a specific path. It returns nil, nil if the marker does not exist.
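
Because a missing marker yields nil, nil, callers need to check the pointer as well as the error. The sketch below shows that pattern with a hypothetical `readMarkerFrom` and a trimmed local `marker` struct that carries three of the documented fields with the same JSON tags; it is not the package's implementation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// marker is a trimmed local stand-in for RecoveryMarker.
type marker struct {
	FailureClass string `json:"failure_class"`
	RetryCount   int    `json:"retry_count"`
	Exhausted    bool   `json:"exhausted"`
}

// readMarkerFrom is a hypothetical stand-in for ReadMarkerFrom.
// A missing file is reported as (nil, nil): no recovery is pending.
func readMarkerFrom(path string) (*marker, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, fmt.Errorf("read recovery marker: %w", err)
	}
	var m marker
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("parse recovery marker: %w", err)
	}
	return &m, nil
}

func main() {
	m, err := readMarkerFrom("/nonexistent-dir/rebuild_recovery.json")
	switch {
	case err != nil:
		fmt.Println("read failed:", err)
	case m == nil:
		fmt.Println("no recovery pending")
	default:
		fmt.Printf("pending: class=%s retries=%d\n", m.FailureClass, m.RetryCount)
	}
}
```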

func (*RecoveryMarker) IncrementRetry

func (m *RecoveryMarker) IncrementRetry()

IncrementRetry updates the marker for a retry attempt.

func (*RecoveryMarker) SetRollbackResult

func (m *RecoveryMarker) SetRollbackResult(success bool)

SetRollbackResult records the rollback outcome.

func (*RecoveryMarker) ShouldDeferRetry

func (m *RecoveryMarker) ShouldDeferRetry() bool

ShouldDeferRetry returns true if a deferred retry should be scheduled.

func (*RecoveryMarker) Write

func (m *RecoveryMarker) Write() error

Write persists the recovery marker atomically. Uses temp file + rename for crash safety.
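
The temp file + rename technique can be sketched as follows. `writeAtomic` and `roundTrip` are hypothetical helpers, and the `Sync` call before the rename is an addition of this sketch; the documentation does not say whether the package flushes to disk.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic writes v as JSON to path via a temp file in the same
// directory, then renames it into place (hypothetical helper).
func writeAtomic(path string, v any) error {
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return err
	}
	// Same directory as the target so the rename stays on one filesystem.
	tmp, err := os.CreateTemp(filepath.Dir(path), ".marker-*.tmp")
	if err != nil {
		return err
	}
	tmpName := tmp.Name()
	defer os.Remove(tmpName) // cleans up on failure; harmless after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // assumption: flush before rename
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmpName, path)
}

// roundTrip writes a one-field marker to a temp directory and reads it back.
func roundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "rebuild-marker")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "rebuild_recovery.json")
	payload := struct {
		RetryCount int `json:"retry_count"`
	}{RetryCount: 1}
	if err := writeAtomic(path, payload); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	content, err := roundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Println(content)
}
```

A reader of the marker path sees either the previous complete file or the new complete file, never a partially written one, because the rename replaces the target in a single step on POSIX filesystems.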

func (*RecoveryMarker) WriteTo

func (m *RecoveryMarker) WriteTo(path string) error

WriteTo persists the recovery marker to a specific path.

type RetryDisposition

type RetryDisposition string

RetryDisposition is the decision on whether and how to retry a failure.

const (
	// RetryImmediate means retry once in the same process execution.
	RetryImmediate RetryDisposition = "RETRY_IMMEDIATE"

	// RetryDeferred means schedule a follow-up retry via systemd timer.
	RetryDeferred RetryDisposition = "RETRY_DEFERRED"

	// NoRetry means this failure class must not be retried.
	NoRetry RetryDisposition = "NO_RETRY"
)

func GetRetryDisposition

func GetRetryDisposition(class FailureClass, immediateAttempts, deferredAttempts int, daemonRelated bool) RetryDisposition

GetRetryDisposition determines whether a failure should be retried and how.

Rules:

  • Never-retry classes → NoRetry regardless of attempt count
  • POSTVALIDATION_REGRESSION → conditional (see isDaemonRelatedRegression)
  • Immediate-retry classes → RetryImmediate if attempt < max
  • Otherwise → RetryDeferred if deferred attempts remain
  • Exhausted → NoRetry
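
The rules above can be sketched as a single decision function. Everything here is a local mirror for illustration: `retryDisposition` is hypothetical, only a subset of the failure classes is declared, and which classes count as "immediate-retry" is an assumption (APPLY_FAILED only), since the documentation does not list them.

```go
package main

import "fmt"

type FailureClass string
type RetryDisposition string

const (
	MaxImmediateRetries = 1
	MaxDeferredRetries  = 2

	ClassPrevalidationFailed      FailureClass = "PREVALIDATION_FAILED"
	ClassApplyFailed              FailureClass = "APPLY_FAILED"
	ClassPostvalidationRegression FailureClass = "POSTVALIDATION_REGRESSION"
	ClassPostvalidationHardFail   FailureClass = "POSTVALIDATION_HARD_FAIL"
	ClassDaemonRestartFailed      FailureClass = "DAEMON_RESTART_FAILED"
	ClassRollbackFailed           FailureClass = "ROLLBACK_FAILED"
	ClassRetryExhausted           FailureClass = "RETRY_EXHAUSTED"

	RetryImmediate RetryDisposition = "RETRY_IMMEDIATE"
	RetryDeferred  RetryDisposition = "RETRY_DEFERRED"
	NoRetry        RetryDisposition = "NO_RETRY"
)

// retryDisposition applies the documented rules in order (hypothetical helper).
func retryDisposition(class FailureClass, immediateAttempts, deferredAttempts int, daemonRelated bool) RetryDisposition {
	switch class {
	case ClassPrevalidationFailed, ClassPostvalidationHardFail,
		ClassRollbackFailed, ClassRetryExhausted:
		return NoRetry // never-retry classes, regardless of attempt count
	case ClassPostvalidationRegression:
		if !daemonRelated {
			return NoRetry // structural regression: never retry
		}
	}
	// Assumption: only APPLY_FAILED qualifies for an in-process retry.
	if class == ClassApplyFailed && immediateAttempts < MaxImmediateRetries {
		return RetryImmediate
	}
	if deferredAttempts < MaxDeferredRetries {
		return RetryDeferred
	}
	return NoRetry // exhausted
}

func main() {
	fmt.Println(retryDisposition(ClassApplyFailed, 0, 0, false))         // RETRY_IMMEDIATE
	fmt.Println(retryDisposition(ClassApplyFailed, 1, 0, false))         // RETRY_DEFERRED
	fmt.Println(retryDisposition(ClassApplyFailed, 1, 2, false))         // NO_RETRY
	fmt.Println(retryDisposition(ClassDaemonRestartFailed, 0, 0, true))  // RETRY_DEFERRED
	fmt.Println(retryDisposition(ClassPostvalidationRegression, 0, 0, false)) // NO_RETRY
}
```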
