Documentation
Constants ¶
const (
	// MaxImmediateRetries is the maximum retry count within the same process.
	MaxImmediateRetries = 1

	// MaxDeferredRetries is the maximum deferred retry count across boots/triggers.
	MaxDeferredRetries = 2

	// DeferredRetryDelaySec is the delay before deferred retry (systemd timer).
	DeferredRetryDelaySec = 60
)
Retry policy constants.
const DefaultMarkerPath = "/var/lib/nftban/state/rebuild_recovery.json"
DefaultMarkerPath is the canonical location for the recovery marker.
Functions ¶
func IsRetryable ¶
func IsRetryable(class FailureClass) bool
IsRetryable returns true if the failure class is potentially retryable. For POSTVALIDATION_REGRESSION, this returns true but actual retry depends on whether the regression is daemon-related (checked at call site).
Types ¶
type FailureClass ¶
type FailureClass string
FailureClass categorizes the root cause of a rebuild failure. Each class has a fixed retry policy — see policy.go.
Contract: Only transient classes are retryable. Invalid desired state, structural failures, and rollback failures are NEVER retried.
const (
	// ClassPrevalidationFailed means nft -c -f rejected the rendered config.
	// This is an INPUT failure: no kernel mutation occurred.
	// INV-RR-001: must abort before any kernel change.
	// MUST NOT write recovery marker. MUST NOT enter recovery flow.
	ClassPrevalidationFailed FailureClass = "PREVALIDATION_FAILED"

	// ClassSnapshotFailed means pre-rebuild state could not be captured.
	// INV-RR-002: snapshot must exist before destructive change.
	ClassSnapshotFailed FailureClass = "SNAPSHOT_FAILED"

	// ClassApplyFailed means nft -f load failed after validation passed.
	// May be transient (kernel busy, resource exhaustion).
	ClassApplyFailed FailureClass = "APPLY_FAILED"

	// ClassPostvalidationRegression means post-state is worse than pre-state.
	// Retry is CONDITIONAL: only if caused by daemon/module restore path.
	// If caused by validator structural failure → never retry.
	ClassPostvalidationRegression FailureClass = "POSTVALIDATION_REGRESSION"

	// ClassPostvalidationHardFail means validator reports structural DOWN.
	ClassPostvalidationHardFail FailureClass = "POSTVALIDATION_HARD_FAIL"

	// ClassDaemonRestartFailed means nftband could not start for module restore.
	// Retryable: daemon may start on deferred retry after boot completes.
	ClassDaemonRestartFailed FailureClass = "DAEMON_RESTART_FAILED"

	// ClassModuleRestoreFailed means module re-enable failed explicitly.
	// Retryable if daemon dependency is the root cause.
	ClassModuleRestoreFailed FailureClass = "MODULE_RESTORE_FAILED"

	// ClassModuleRestoreIncomplete means some modules restored, others failed.
	// Retryable once; may succeed if daemon becomes available.
	ClassModuleRestoreIncomplete FailureClass = "MODULE_RESTORE_INCOMPLETE"

	// ClassRollbackFailed means snapshot restore failed. Never retry.
	// Exit code 3. Manual recovery required.
	ClassRollbackFailed FailureClass = "ROLLBACK_FAILED"

	// ClassAuthorityConflict means a foreign firewall was detected during rebuild.
	ClassAuthorityConflict FailureClass = "AUTHORITY_CONFLICT"

	// ClassBackupMissing means no snapshot was available for rollback.
	ClassBackupMissing FailureClass = "BACKUP_MISSING"

	// ClassRetryExhausted means max retry attempts reached. Stop.
	ClassRetryExhausted FailureClass = "RETRY_EXHAUSTED"
)
type ModuleRestoreReport ¶
type ModuleRestoreReport struct {
DDoS ModuleRestoreResult `json:"ddos"`
Portscan ModuleRestoreResult `json:"portscan"`
BotGuard ModuleRestoreResult `json:"botguard"`
LoginMon ModuleRestoreResult `json:"loginmon"`
}
ModuleRestoreReport captures per-module restore outcomes after rebuild.
func (*ModuleRestoreReport) AllOK ¶
func (r *ModuleRestoreReport) AllOK() bool
AllOK returns true if all enabled modules restored successfully.
func (*ModuleRestoreReport) HasFailure ¶
func (r *ModuleRestoreReport) HasFailure() bool
HasFailure returns true if any enabled module failed to restore.
func (*ModuleRestoreReport) HasIncomplete ¶
func (r *ModuleRestoreReport) HasIncomplete() bool
HasIncomplete returns true if any module is partially restored.
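The three predicates above can be read as simple scans over the four module results. The sketch below is one plausible interpretation, not the package's implementation: the lowercase helpers and the local stand-in types are hypothetical, and it assumes that a skipped module counts as "not enabled" and is therefore ignored by the all-OK check.

```go
package main

import "fmt"

// Local stand-ins for the documented types (illustration only).
type ModuleRestoreResult string

const (
	RestoreOK         ModuleRestoreResult = "RESTORE_OK"
	RestoreFailed     ModuleRestoreResult = "RESTORE_FAILED"
	RestoreIncomplete ModuleRestoreResult = "RESTORE_INCOMPLETE"
	RestoreSkipped    ModuleRestoreResult = "RESTORE_SKIPPED"
)

type ModuleRestoreReport struct {
	DDoS, Portscan, BotGuard, LoginMon ModuleRestoreResult
}

// results lists the per-module outcomes so the predicates can iterate.
func results(r *ModuleRestoreReport) []ModuleRestoreResult {
	return []ModuleRestoreResult{r.DDoS, r.Portscan, r.BotGuard, r.LoginMon}
}

// allOK assumes skipped modules were not enabled and do not count against the report.
func allOK(r *ModuleRestoreReport) bool {
	for _, res := range results(r) {
		if res != RestoreOK && res != RestoreSkipped {
			return false
		}
	}
	return true
}

// contains reports whether any module has the given outcome.
func contains(r *ModuleRestoreReport, want ModuleRestoreResult) bool {
	for _, res := range results(r) {
		if res == want {
			return true
		}
	}
	return false
}

func hasFailure(r *ModuleRestoreReport) bool    { return contains(r, RestoreFailed) }
func hasIncomplete(r *ModuleRestoreReport) bool { return contains(r, RestoreIncomplete) }

func main() {
	r := &ModuleRestoreReport{
		DDoS:     RestoreOK,
		Portscan: RestoreIncomplete,
		BotGuard: RestoreSkipped,
		LoginMon: RestoreOK,
	}
	fmt.Println(allOK(r), hasFailure(r), hasIncomplete(r)) // prints "false false true"
}
```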
type ModuleRestoreResult ¶
type ModuleRestoreResult string
ModuleRestoreResult represents the outcome of restoring a single module after rebuild. Three verification levels are required:
Level 1: Structure presence (chains + sets exist)
Level 2: Wiring correctness (jumps in correct anchor positions)
Level 3: Activation evidence (counters, active set references, or validator confirmation)
Classification:
Level 1 only → RestoreIncomplete
Level 1 + Level 2 → RestoreIncomplete (wired but unproven)
Level 1 + 2 + 3 → RestoreOK
Level 1 fails → RestoreFailed
Not attempted → RestoreSkipped
const (
	// RestoreOK means module restored and verified at all 3 levels.
	RestoreOK ModuleRestoreResult = "RESTORE_OK"

	// RestoreFailed means module restore attempt failed (Level 1 not met).
	RestoreFailed ModuleRestoreResult = "RESTORE_FAILED"

	// RestoreIncomplete means partial verification (Level 1 or 1+2 only).
	RestoreIncomplete ModuleRestoreResult = "RESTORE_INCOMPLETE"

	// RestoreSkipped means module was not enabled pre-rebuild or not applicable.
	RestoreSkipped ModuleRestoreResult = "RESTORE_SKIPPED"
)
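The classification table above maps directly onto a small decision function. The following is a minimal sketch, assuming the three verification levels are available as booleans; `classifyRestore` and its parameters are hypothetical names, not part of the package. Combinations the table does not list (for example Level 3 evidence without Level 2 wiring) are treated here as incomplete, which is an assumption.

```go
package main

import "fmt"

// ModuleRestoreResult mirrors the documented type for illustration only.
type ModuleRestoreResult string

const (
	RestoreOK         ModuleRestoreResult = "RESTORE_OK"
	RestoreFailed     ModuleRestoreResult = "RESTORE_FAILED"
	RestoreIncomplete ModuleRestoreResult = "RESTORE_INCOMPLETE"
	RestoreSkipped    ModuleRestoreResult = "RESTORE_SKIPPED"
)

// classifyRestore is a hypothetical helper that applies the documented table.
// structureOK = Level 1, wiringOK = Level 2, activationOK = Level 3.
func classifyRestore(attempted, structureOK, wiringOK, activationOK bool) ModuleRestoreResult {
	switch {
	case !attempted:
		return RestoreSkipped
	case !structureOK:
		return RestoreFailed
	case wiringOK && activationOK:
		return RestoreOK
	default:
		// Level 1 only, or Level 1 + 2 without activation evidence.
		return RestoreIncomplete
	}
}

func main() {
	// Wired but unproven: Levels 1 and 2 pass, Level 3 does not.
	fmt.Println(classifyRestore(true, true, true, false)) // prints "RESTORE_INCOMPLETE"
}
```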
type OperationResult ¶
type OperationResult string
OperationResult represents the outcome of a rebuild operation. This is distinct from system health (validator.Status).
A rebuild can FAIL while system health remains PROTECTED if rollback restored a validated known-good state. These are separate truths:
- OperationResult = what happened during rebuild
- validator.Status = current firewall protection state
const (
	// ResultSuccess means rebuild completed and post-validation passed.
	ResultSuccess OperationResult = "SUCCESS"

	// ResultFailedRecovered means rebuild failed but rollback restored a
	// validated PROTECTED state. System is healthy, but rebuild did not achieve
	// its goal. Command exits non-zero, health may be PROTECTED.
	ResultFailedRecovered OperationResult = "FAILED_RECOVERED"

	// ResultFailedDegraded means rebuild failed and system is in DEGRADED state.
	// Either rollback restored a DEGRADED state, retry was exhausted,
	// or module restoration was incomplete.
	ResultFailedDegraded OperationResult = "FAILED_DEGRADED"

	// ResultFailedFatal means rollback itself failed. Manual recovery required.
	// System is in DOWN or untrusted state. Exit code 3.
	ResultFailedFatal OperationResult = "FAILED_FATAL"
)
type RecoveryMarker ¶
type RecoveryMarker struct {
// FailureClass categorizes the root cause.
FailureClass FailureClass `json:"failure_class"`
// OperationResult is the rebuild operation outcome.
OperationResult OperationResult `json:"operation_result"`
// RetryCount is the number of retry attempts so far (immediate + deferred).
RetryCount int `json:"retry_count"`
// MaxRetries is the configured maximum total retries.
MaxRetries int `json:"max_retries"`
// DeferredRetryPending indicates a deferred retry should be attempted.
DeferredRetryPending bool `json:"deferred_retry_pending"`
// FirstFailureAt is the timestamp of the initial failure.
FirstFailureAt time.Time `json:"first_failure_at"`
// LastFailureAt is the timestamp of the most recent failure/retry.
LastFailureAt time.Time `json:"last_failure_at"`
// RollbackAttempted indicates whether rollback was triggered.
RollbackAttempted bool `json:"rollback_attempted"`
// RollbackResult is the outcome of the rollback attempt.
RollbackResult string `json:"rollback_result"` // "success", "failed", "not_attempted"
// BackupPath is the snapshot directory used for rollback.
BackupPath string `json:"backup_path"`
// LastHealthState is the validator state after recovery settled.
LastHealthState string `json:"last_health_state"` // "protected", "degraded", "down"
// Exhausted is true when max retries are reached.
Exhausted bool `json:"exhausted"`
// ModuleRestore captures per-module restore outcomes.
ModuleRestore *ModuleRestoreReport `json:"module_restore,omitempty"`
// DaemonRelated indicates whether the failure involved daemon unavailability.
DaemonRelated bool `json:"daemon_related"`
}
RecoveryMarker persists rebuild failure state for recovery tracking. It is written on any non-SUCCESS rebuild outcome except PREVALIDATION_FAILED, cleared on SUCCESS, and read by the deferred retry service.
func NewMarker ¶
func NewMarker(class FailureClass, result OperationResult) *RecoveryMarker
NewMarker creates a new recovery marker for an initial failure.
func ReadMarker ¶
func ReadMarker() (*RecoveryMarker, error)
ReadMarker reads a recovery marker from the default path. Returns nil, nil if marker does not exist (no recovery pending).
func ReadMarkerFrom ¶
func ReadMarkerFrom(path string) (*RecoveryMarker, error)
ReadMarkerFrom reads a recovery marker from a specific path. Returns nil, nil if marker does not exist.
func (*RecoveryMarker) IncrementRetry ¶
func (m *RecoveryMarker) IncrementRetry()
IncrementRetry updates the marker for a retry attempt.
func (*RecoveryMarker) SetRollbackResult ¶
func (m *RecoveryMarker) SetRollbackResult(success bool)
SetRollbackResult records the rollback outcome.
func (*RecoveryMarker) ShouldDeferRetry ¶
func (m *RecoveryMarker) ShouldDeferRetry() bool
ShouldDeferRetry returns true if a deferred retry should be scheduled.
func (*RecoveryMarker) Write ¶
func (m *RecoveryMarker) Write() error
Write persists the recovery marker atomically. Uses temp file + rename for crash safety.
func (*RecoveryMarker) WriteTo ¶
func (m *RecoveryMarker) WriteTo(path string) error
WriteTo persists the recovery marker to a specific path.
type RetryDisposition ¶
type RetryDisposition string
RetryDisposition is the decision on whether and how to retry a failure.
const (
	// RetryImmediate means retry once in the same process execution.
	RetryImmediate RetryDisposition = "RETRY_IMMEDIATE"

	// RetryDeferred means schedule a follow-up retry via systemd timer.
	RetryDeferred RetryDisposition = "RETRY_DEFERRED"

	// NoRetry means this failure class must not be retried.
	NoRetry RetryDisposition = "NO_RETRY"
)
func GetRetryDisposition ¶
func GetRetryDisposition(class FailureClass, immediateAttempts, deferredAttempts int, daemonRelated bool) RetryDisposition
GetRetryDisposition determines whether a failure should be retried and how.
Rules:
- Never-retry classes → NoRetry regardless of attempt count
- POSTVALIDATION_REGRESSION → conditional (see isDaemonRelatedRegression)
- Immediate-retry classes → RetryImmediate if attempt < max
- Otherwise → RetryDeferred if deferred attempts remain
- Exhausted → NoRetry
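The rules above can be sketched as an ordered series of checks. The page does not say which classes belong to the never-retry and immediate-retry sets, so the membership below is an assumption inferred from the FailureClass comments: PREVALIDATION_FAILED, POSTVALIDATION_HARD_FAIL, ROLLBACK_FAILED, and RETRY_EXHAUSTED are treated as never-retry, and APPLY_FAILED as the only immediate-retry class. `retryDisposition` is a hypothetical stand-in for GetRetryDisposition and declares only the classes it needs.

```go
package main

import "fmt"

type FailureClass string
type RetryDisposition string

const (
	MaxImmediateRetries = 1
	MaxDeferredRetries  = 2

	ClassPrevalidationFailed      FailureClass = "PREVALIDATION_FAILED"
	ClassApplyFailed              FailureClass = "APPLY_FAILED"
	ClassPostvalidationRegression FailureClass = "POSTVALIDATION_REGRESSION"
	ClassPostvalidationHardFail   FailureClass = "POSTVALIDATION_HARD_FAIL"
	ClassDaemonRestartFailed      FailureClass = "DAEMON_RESTART_FAILED"
	ClassRollbackFailed           FailureClass = "ROLLBACK_FAILED"
	ClassRetryExhausted           FailureClass = "RETRY_EXHAUSTED"

	RetryImmediate RetryDisposition = "RETRY_IMMEDIATE"
	RetryDeferred  RetryDisposition = "RETRY_DEFERRED"
	NoRetry        RetryDisposition = "NO_RETRY"
)

// Assumed set membership; the real policy lives in policy.go.
var neverRetry = map[FailureClass]bool{
	ClassPrevalidationFailed:    true,
	ClassPostvalidationHardFail: true,
	ClassRollbackFailed:         true,
	ClassRetryExhausted:         true,
}

var immediateRetry = map[FailureClass]bool{
	ClassApplyFailed: true,
}

// retryDisposition applies the documented rules in order.
func retryDisposition(class FailureClass, immediateAttempts, deferredAttempts int, daemonRelated bool) RetryDisposition {
	if neverRetry[class] {
		return NoRetry
	}
	// A regression is retried only when the daemon/module restore path caused it.
	if class == ClassPostvalidationRegression && !daemonRelated {
		return NoRetry
	}
	if immediateRetry[class] && immediateAttempts < MaxImmediateRetries {
		return RetryImmediate
	}
	if deferredAttempts < MaxDeferredRetries {
		return RetryDeferred
	}
	return NoRetry // all attempts exhausted
}

func main() {
	fmt.Println(retryDisposition(ClassApplyFailed, 0, 0, false)) // prints "RETRY_IMMEDIATE"
	fmt.Println(retryDisposition(ClassApplyFailed, 1, 0, false)) // prints "RETRY_DEFERRED"
	fmt.Println(retryDisposition(ClassApplyFailed, 1, 2, false)) // prints "NO_RETRY"
	fmt.Println(retryDisposition(ClassRollbackFailed, 0, 0, false)) // prints "NO_RETRY"
}
```

Checking the never-retry set first keeps the attempt counters from ever overriding a hard stop, which matches the "regardless of attempt count" wording in the first rule.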