sandboxpool

package
v0.0.1
Published: May 10, 2026 License: Apache-2.0 Imports: 41 Imported by: 0

Documentation

Constants

const (
	// FinalizerName is the finalizer name for SandboxPool
	FinalizerName = "agentbox.navix.sh/finalizer"

	// RequeueAfter is the duration to wait before requeuing
	RequeueAfter = 10 * time.Second
)

Variables

var ErrSandboxNotFound = errors.New("sandbox not found")
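
A sentinel error such as ErrSandboxNotFound is normally matched with errors.Is, so that callers can wrap it with extra context without breaking the comparison. The sketch below is self-contained and redeclares the sentinel locally; findSandbox and isNotFound are hypothetical helpers, not part of this package.

```go
package main

import (
	"errors"
	"fmt"
)

// errSandboxNotFound mirrors the package's exported sentinel for illustration.
var errSandboxNotFound = errors.New("sandbox not found")

// findSandbox is a hypothetical lookup that wraps the sentinel with context.
func findSandbox(known map[string]string, sandboxID string) (string, error) {
	podName, ok := known[sandboxID]
	if !ok {
		return "", fmt.Errorf("lookup %q: %w", sandboxID, errSandboxNotFound)
	}
	return podName, nil
}

// isNotFound reports whether err is, or wraps, the sentinel.
func isNotFound(err error) bool {
	return errors.Is(err, errSandboxNotFound)
}

func main() {
	known := map[string]string{"sbx-1": "pool-a-pod-0"}
	_, err := findSandbox(known, "sbx-2")
	fmt.Println(isNotFound(err)) // true
}
```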

Functions

func CaptureSandboxStopRecord

func CaptureSandboxStopRecord(pod *corev1.Pod) domain.Sandbox

CaptureSandboxStopRecord builds a terminal domain.Sandbox for a pod that is transitioning through the Stopping phase. It must be called BEFORE MarkUpdateCompleted so that stop-metadata annotations are still present.

When the stop-reason annotation is absent, the stop reason defaults to "Canceled" if the pod never reached Running and to "Completed" otherwise. This keeps upgrades safe for pods created before the annotation existed.

The running-images annotation snapshot is used for ContainerImages when available, falling back to pod.Spec images.
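
The stop-reason fallback described above can be sketched as a small pure function. This is a minimal illustration under stated assumptions: the annotation key and the reachedRunning flag are hypothetical, since the real key and the way the package detects "never reached Running" are not shown in these docs.

```go
package main

import "fmt"

// stopReasonAnnotation is a hypothetical key used only for this sketch.
const stopReasonAnnotation = "example.invalid/stop-reason"

// resolveStopReason returns the annotated reason when present. Otherwise it
// falls back to "Canceled" for pods that never reached Running and to
// "Completed" for pods that did.
func resolveStopReason(annotations map[string]string, reachedRunning bool) string {
	if reason := annotations[stopReasonAnnotation]; reason != "" {
		return reason
	}
	if !reachedRunning {
		return "Canceled"
	}
	return "Completed"
}

func main() {
	fmt.Println(resolveStopReason(nil, false)) // Canceled
	fmt.Println(resolveStopReason(nil, true))  // Completed
	annotated := map[string]string{stopReasonAnnotation: "Failed"}
	fmt.Println(resolveStopReason(annotated, true)) // Failed
}
```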

func FindClaimedPodBySandboxID

func FindClaimedPodBySandboxID(ctx context.Context, c client.Client, namespace, sandboxID string) (*corev1.Pod, error)

func IdleContainerImages

func IdleContainerImages(pool *agentsv1alpha1.SandboxPool) map[string]string

func ListClaimedPods

func ListClaimedPods(ctx context.Context, c client.Client, namespace string) ([]corev1.Pod, error)

func ListClaimedPodsWithFilter

func ListClaimedPodsWithFilter(ctx context.Context, c client.Client, namespace, team, user string) ([]corev1.Pod, error)

ListClaimedPodsWithFilter lists all claimed (non-idle) pods in the given namespace. When team and user are both non-empty, only pods with matching labels are returned.
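
The filter rule (apply only when team and user are both non-empty) can be illustrated in memory. This is a sketch, not the package's implementation: the pod type and the two label keys are hypothetical, and the real function may well push the match into a label selector on the List call instead.

```go
package main

import "fmt"

// Hypothetical label keys; the real keys are not shown in these docs.
const (
	teamLabel = "example.invalid/team"
	userLabel = "example.invalid/user"
)

// pod is a minimal stand-in for corev1.Pod.
type pod struct {
	Name   string
	Labels map[string]string
}

// filterByOwner returns pods unchanged unless both team and user are set,
// in which case only pods whose labels match both values are kept.
func filterByOwner(pods []pod, team, user string) []pod {
	if team == "" || user == "" {
		return pods
	}
	var out []pod
	for _, p := range pods {
		if p.Labels[teamLabel] == team && p.Labels[userLabel] == user {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []pod{
		{Name: "a", Labels: map[string]string{teamLabel: "ml", userLabel: "ana"}},
		{Name: "b", Labels: map[string]string{teamLabel: "ml", userLabel: "bo"}},
	}
	fmt.Println(len(filterByOwner(pods, "ml", "ana"))) // 1
	fmt.Println(len(filterByOwner(pods, "ml", "")))    // 2
}
```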

func SandboxBaseFromPod

func SandboxBaseFromPod(pod *corev1.Pod) domain.Sandbox

SandboxBaseFromPod populates the fields of a domain.Sandbox that are common to both live (API) and historical (store) representations of a sandbox:

  • Identity: SandboxID, Namespace, PoolName, PodName
  • Ownership: Team, User
  • Timing: ClaimedAt, StartedAt
  • Content: ContainerImages (from pod.Spec), Metadata (parsed from annotation JSON)
  • Resources: CPU, Memory (summed from container requests/limits; zero if not set)

Callers set Status and any terminal fields (TerminatedAt, FailureReason, etc.) after calling this function. When a running-images annotation snapshot is available (Stopping path), the caller may overwrite ContainerImages with that pre-release value.
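
As an illustration of the "Metadata (parsed from annotation JSON)" step, the sketch below decodes a JSON object stored in an annotation. The annotation key is hypothetical, and returning nil for a missing or malformed value is an assumption; the docs do not say how the package handles those cases.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metadataAnnotation is a hypothetical key used only for this sketch.
const metadataAnnotation = "example.invalid/metadata"

// parseMetadata decodes the annotation's JSON object into a string map.
// Assumption: a missing or malformed annotation yields nil rather than an error.
func parseMetadata(annotations map[string]string) map[string]string {
	raw := annotations[metadataAnnotation]
	if raw == "" {
		return nil
	}
	var md map[string]string
	if err := json.Unmarshal([]byte(raw), &md); err != nil {
		return nil
	}
	return md
}

func main() {
	md := parseMetadata(map[string]string{metadataAnnotation: `{"job":"build-42"}`})
	fmt.Println(md["job"]) // build-42
}
```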

func SyncAnnotationsFromTemplate

func SyncAnnotationsFromTemplate(dst, tmplAnnotations map[string]string)

SyncAnnotationsFromTemplate merges Template Annotations into Pool Annotations, skipping keys listed in annotationsToExcludeFromTemplateSync. Callers are responsible for overwriting system-managed keys afterwards.

func SyncLabelsFromTemplate

func SyncLabelsFromTemplate(dst, tmplLabels map[string]string)

SyncLabelsFromTemplate merges Template Labels into Pool Labels, skipping any key listed in labelsToExcludeFromTemplateSync (e.g. agentbox.io/sync-source). Template values take precedence over any existing Pool values for the same key.
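
Both sync helpers describe the same merge: copy template entries into the destination map, skip excluded keys, and let template values win. A minimal sketch follows, with the exclusion set passed in explicitly; in the package the sets are the unexported annotationsToExcludeFromTemplateSync and labelsToExcludeFromTemplateSync, and the key names below are made up for the example.

```go
package main

import "fmt"

// syncFromTemplate copies tmpl into dst, skipping excluded keys. Template
// values overwrite existing dst values. dst must be non-nil, because writing
// to a nil map panics.
func syncFromTemplate(dst, tmpl map[string]string, exclude map[string]struct{}) {
	for k, v := range tmpl {
		if _, skip := exclude[k]; skip {
			continue
		}
		dst[k] = v
	}
}

func main() {
	pool := map[string]string{"team": "old", "keep": "yes"}
	tmpl := map[string]string{"team": "new", "internal/sync-source": "tmpl-a"}
	exclude := map[string]struct{}{"internal/sync-source": {}}

	syncFromTemplate(pool, tmpl, exclude)
	fmt.Println(pool["team"], pool["keep"], len(pool)) // new yes 2
}
```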

Types

type IdleNotifier

type IdleNotifier interface {
	// NotifyIdleAvailable is called whenever a Pod successfully transitions to
	// the Idle phase (Stopping → Idle via MarkUpdateCompleted). Wakes any
	// waiting Create requests for the corresponding pool.
	NotifyIdleAvailable(namespace, poolName string)

	// OnSandboxReleased is called at the same Stopping → Idle transition,
	// carrying the sandbox ID that was just released. Implementations should
	// invalidate any cached mapping keyed on sandboxID (e.g. the ExtProc
	// route cache) so subsequent router queries return NotFound instead of
	// briefly hitting a stale entry.
	OnSandboxReleased(ctx context.Context, sandboxID string)
}

IdleNotifier is called by the SandboxPoolReconciler to signal sandbox lifecycle transitions observed during pool reconciliation. Implementations must be non-blocking and safe to call from multiple goroutines concurrently.

The typical implementation is k8sSandboxService, which routes these events to the poolClaimScheduler (to wake waiters) and to the ExtProc gRPC client (to invalidate the route cache).
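
One common way to satisfy the non-blocking requirement for NotifyIdleAvailable is a buffered channel of size one per pool, written with a select/default so that a pending wake-up absorbs later ones. The sketch below shows only that half of the interface; poolWaker is a hypothetical type and says nothing about how k8sSandboxService is actually built.

```go
package main

import (
	"fmt"
	"sync"
)

// poolWaker is a hypothetical notifier holding one wake-up channel per pool.
type poolWaker struct {
	mu    sync.Mutex
	wakes map[string]chan struct{}
}

func newPoolWaker() *poolWaker {
	return &poolWaker{wakes: make(map[string]chan struct{})}
}

// channel returns the wake-up channel for a pool, creating it on first use.
func (w *poolWaker) channel(namespace, poolName string) chan struct{} {
	key := namespace + "/" + poolName
	w.mu.Lock()
	defer w.mu.Unlock()
	ch, ok := w.wakes[key]
	if !ok {
		ch = make(chan struct{}, 1)
		w.wakes[key] = ch
	}
	return ch
}

// NotifyIdleAvailable never waits for a consumer: if a wake-up is already
// pending for the pool, the new signal is dropped.
func (w *poolWaker) NotifyIdleAvailable(namespace, poolName string) {
	ch := w.channel(namespace, poolName)
	select {
	case ch <- struct{}{}:
	default:
	}
}

func main() {
	w := newPoolWaker()
	w.NotifyIdleAvailable("default", "pool-a")
	w.NotifyIdleAvailable("default", "pool-a") // coalesced with the pending wake-up
	fmt.Println(len(w.channel("default", "pool-a"))) // 1
}
```

A waiter would receive from the same channel and then re-check pool state, so a dropped duplicate signal loses no information.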

type IdleTimeoutReconciler

type IdleTimeoutReconciler struct {
	// contains filtered or unexported fields
}

IdleTimeoutReconciler is a background runnable that periodically:

  1. Polls the ExtProc control-plane API for per-sandbox last-active timestamps.
  2. Patches pod last-active annotations so ExtProc can recover state after restarts.
  3. Releases Running pods whose idle duration exceeds their idle-timeout annotation.

If the source is unreachable, the check is skipped entirely to avoid false-positive releases during ExtProc rolling updates.
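
The release decision in step 3, together with the skip-on-error rule, can be sketched as a pure function. Everything here is illustrative: idleCandidate, sandboxesToRelease, and errSourceUnreachable are hypothetical names, and treating a missing timestamp or a non-positive timeout as "do not release" is an assumption rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errSourceUnreachable = errors.New("last-active source unreachable")

// idleCandidate is a hypothetical view of a Running pod's idle settings.
type idleCandidate struct {
	SandboxID   string
	IdleTimeout time.Duration // parsed from the pod's idle-timeout annotation
}

// sandboxesToRelease returns the IDs whose idle duration exceeds their
// timeout. Any fetch error yields no releases at all.
func sandboxesToRelease(now time.Time, pods []idleCandidate, lastActive map[string]time.Time, fetchErr error) []string {
	if fetchErr != nil {
		return nil
	}
	var out []string
	for _, p := range pods {
		seen, ok := lastActive[p.SandboxID]
		if !ok || p.IdleTimeout <= 0 {
			continue // assumption: unknown activity or no timeout means keep running
		}
		if now.Sub(seen) > p.IdleTimeout {
			out = append(out, p.SandboxID)
		}
	}
	return out
}

// demoRelease evaluates two sandboxes with a 10-minute timeout at a fixed time.
func demoRelease(fetchErr error) []string {
	now := time.Date(2026, time.May, 10, 12, 0, 0, 0, time.UTC)
	pods := []idleCandidate{
		{SandboxID: "sbx-busy", IdleTimeout: 10 * time.Minute},
		{SandboxID: "sbx-idle", IdleTimeout: 10 * time.Minute},
	}
	lastActive := map[string]time.Time{
		"sbx-busy": now.Add(-1 * time.Minute),
		"sbx-idle": now.Add(-30 * time.Minute),
	}
	return sandboxesToRelease(now, pods, lastActive, fetchErr)
}

func main() {
	fmt.Println(demoRelease(nil))                  // [sbx-idle]
	fmt.Println(demoRelease(errSourceUnreachable)) // []
}
```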

func NewIdleTimeoutReconciler

func NewIdleTimeoutReconciler(c client.Client, s store.SandboxStore, interval time.Duration, lastActive LastActiveSource) *IdleTimeoutReconciler

NewIdleTimeoutReconciler creates a new IdleTimeoutReconciler. If lastActive is nil, the reconciler logs a warning and idles (no releases issued).

func (*IdleTimeoutReconciler) CheckAndReleasePendingSandboxes

func (r *IdleTimeoutReconciler) CheckAndReleasePendingSandboxes(ctx context.Context)

TODO: enable this check after implementing startup timeouts in the e2b sdk.

func (*IdleTimeoutReconciler) Start

func (r *IdleTimeoutReconciler) Start(ctx context.Context) error

Start implements manager.Runnable. It performs an initial check immediately, then ticks at checkInterval. Runs only on the leader when leader election is enabled.
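
The "check immediately, then tick" shape is a standard ticker loop. The following is a minimal sketch of that pattern using only the standard library; runPeriodically and checksBeforeCancel are hypothetical names, and the real Start method may differ in logging and error handling.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runPeriodically calls check once immediately, then once per tick, and
// returns nil when ctx is cancelled.
func runPeriodically(ctx context.Context, interval time.Duration, check func(context.Context)) error {
	check(ctx)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil
		case <-ticker.C:
			check(ctx)
		}
	}
}

// checksBeforeCancel runs the loop with an already-cancelled context and a
// long interval, so only the immediate check can fire.
func checksBeforeCancel() int {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	calls := 0
	_ = runPeriodically(ctx, time.Hour, func(context.Context) { calls++ })
	return calls
}

func main() {
	fmt.Println(checksBeforeCancel()) // 1
}
```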

type LastActiveSource

type LastActiveSource interface {
	GetLastActive(ctx context.Context) (map[string]time.Time, error)
}

LastActiveSource retrieves per-sandbox last-activity timestamps. Implemented by the ExtProc gRPC client (production) and test doubles. Kept narrow so this package stays decoupled from the full ExtProc RPC surface.

type PodDiagnosticEvent

type PodDiagnosticEvent struct {
	// Reason is the event reason, e.g. "Failed", "BackOff", "ErrImagePull".
	Reason string
	// Message is the human-readable event message.
	Message string
	// LastTimestamp is the RFC3339 time of the most recent occurrence.
	LastTimestamp string
	// Count is the total number of times this event fired.
	Count int32
}

PodDiagnosticEvent is a single Warning event associated with a Pod.

type PodStatusDetail

type PodStatusDetail struct {
	// PodName is the name of the pod.
	PodName string
	// Phase is the agentbox pod phase label value (starting, failed, …).
	Phase string
	// Reason is a machine-readable cause extracted from the Pod object, e.g.
	// "Pulling", "ImagePullBackOff", "ErrImagePull", "CrashLoopBackOff", "OOMKilled".
	Reason string
	// Message is a human-readable description derived from the Pod object.
	Message string
	// Events contains Warning events fetched from the Kubernetes API (only
	// populated by Get, never by List).
	Events []PodDiagnosticEvent
}

PodStatusDetail holds diagnostic information for a single pod, derived on demand from the Pod's current state and (for Get) from its Kubernetes Events.

func BuildPodStatusDetailWithEvents

func BuildPodStatusDetailWithEvents(ctx context.Context, clientset kubernetes.Interface, pod *corev1.Pod) *PodStatusDetail

BuildPodStatusDetailWithEvents is like BuildSandboxStatusDetailFromPod but also fetches Warning events for the pod from the Kubernetes API and attaches them. clientset may be nil, in which case Events will be empty.

func BuildSandboxStatusDetailFromPod

func BuildSandboxStatusDetailFromPod(pod *corev1.Pod) *PodStatusDetail

BuildSandboxStatusDetailFromPod derives diagnostic information for a Starting or Failed pod purely from its in-memory Pod object — no API calls are made. Returns nil when the pod has no diagnosable state.

type PoolExpectations

type PoolExpectations struct {
	// contains filtered or unexported fields
}

PoolExpectations tracks in-flight pod creation and deletion counts per SandboxPool so that informer-cache lag does not cause scaling decisions to oscillate.

When a scale-up creates N pods, the reconciler records N pending creations before issuing r.Create calls. Subsequent Reconcile calls skip the scaling decision until all N Pod Add events have been observed (decrementing the counter to zero), or the 5-minute TTL expires as a safety valve.

All methods are safe for concurrent use.

func NewPoolExpectations

func NewPoolExpectations() *PoolExpectations

NewPoolExpectations returns an initialized PoolExpectations.

func (*PoolExpectations) CreationObserved

func (e *PoolExpectations) CreationObserved(key types.NamespacedName)

CreationObserved decrements the pending-creation counter by 1 (clamped to zero). Called from the Pod Add event handler when a new pod belonging to the pool is observed in the informer.

func (*PoolExpectations) DeleteExpectations

func (e *PoolExpectations) DeleteExpectations(key types.NamespacedName)

DeleteExpectations removes the entry for the given pool. Called when the pool is deleted so the map does not grow unboundedly.

func (*PoolExpectations) DeletionObserved

func (e *PoolExpectations) DeletionObserved(key types.NamespacedName)

DeletionObserved decrements the pending-deletion counter by 1 (clamped to zero). Called from the Pod Delete event handler.

func (*PoolExpectations) ExpectCreations

func (e *PoolExpectations) ExpectCreations(key types.NamespacedName, n int)

ExpectCreations records n pending pod creations for the given pool. The timestamp is refreshed so the TTL clock starts from this scale decision. Any previous pending-creation count is replaced; pending deletions are left unchanged.

func (*PoolExpectations) ExpectDeletions

func (e *PoolExpectations) ExpectDeletions(key types.NamespacedName, n int)

ExpectDeletions records n pending pod deletions for the given pool. Any previous pending-deletion count is replaced; pending creations are left unchanged.

func (*PoolExpectations) Satisfied

func (e *PoolExpectations) Satisfied(key types.NamespacedName) bool

Satisfied reports whether all pending operations for the pool have been observed. Returns true when:

  • no entry exists for the key (first reconcile, or after controller restart)
  • both counters are zero
  • the entry is older than expectationsTTL (safety valve)

A stale (TTL-expired) entry is deleted as a side-effect.
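
To make the Satisfied rules concrete, here is a simplified stand-in that models only the creation side, with string keys instead of types.NamespacedName and an injectable clock so the TTL path can be exercised. The type and helper names are hypothetical; only the 5-minute TTL and the three "satisfied" conditions come from the text above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const ttl = 5 * time.Minute // the safety-valve TTL described above

// fakeClock is a manually advanced clock for deterministic examples.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

type entry struct {
	creations int
	stamped   time.Time
}

// expectations is a simplified sketch keyed by "namespace/name".
type expectations struct {
	mu      sync.Mutex
	entries map[string]*entry
	now     func() time.Time
}

func newExpectations(now func() time.Time) *expectations {
	return &expectations{entries: make(map[string]*entry), now: now}
}

// ExpectCreations replaces the pending count and restarts the TTL clock.
func (e *expectations) ExpectCreations(key string, n int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.entries[key] = &entry{creations: n, stamped: e.now()}
}

// CreationObserved decrements the pending count, clamped to zero.
func (e *expectations) CreationObserved(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if en := e.entries[key]; en != nil && en.creations > 0 {
		en.creations--
	}
}

// Satisfied is true for unknown keys, zero counters, or expired entries.
// Expired entries are deleted as a side effect.
func (e *expectations) Satisfied(key string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	en := e.entries[key]
	if en == nil || en.creations == 0 {
		return true
	}
	if e.now().Sub(en.stamped) > ttl {
		delete(e.entries, key)
		return true
	}
	return false
}

func main() {
	clk := &fakeClock{}
	exp := newExpectations(clk.Now)
	exp.ExpectCreations("default/pool-a", 2)
	fmt.Println(exp.Satisfied("default/pool-a")) // false
	exp.CreationObserved("default/pool-a")
	exp.CreationObserved("default/pool-a")
	fmt.Println(exp.Satisfied("default/pool-a")) // true
}
```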

type ReleaseSandboxPodOptions

type ReleaseSandboxPodOptions struct {
	StopReason     agentsv1alpha1.SandboxStopReason // "Completed" | "Released" | "Failed" | "Canceled"; defaults to "Completed"
	TerminatedAt   string                           // RFC3339; defaults to time.Now().UTC() if empty
	FailureReason  string                           // e.g. "IdleTimeout", "OOMKilled"
	FailureMessage string                           // human-readable
	ExitCode       *int32                           // container exit code for Failed sandboxes

	ExpectedCurrentSandboxPhase string

	// DisableRetry, when true, skips the retry-on-conflict loop in the
	// underlying TriggerUpdateWithOptions call. Use this for opportunistic
	// release paths (e.g. unexpected-restart detection) where retrying on a
	// stale pod view could misfire; the caller should skip and let the next
	// Reconcile re-observe before acting.
	DisableRetry bool
}

ReleaseSandboxPodOptions carries the stop metadata written to Pod annotations during Stopping so the reconciler can perform a deferred KV store write at Stopping→Idle.
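
The field comments above name two defaults: StopReason falls back to "Completed" and TerminatedAt to the current UTC time in RFC3339. A minimal sketch of applying them follows, using a trimmed-down hypothetical struct with a plain string StopReason rather than the real agentsv1alpha1.SandboxStopReason type.

```go
package main

import (
	"fmt"
	"time"
)

// releaseOptions is a reduced, hypothetical copy of the two defaulted fields.
type releaseOptions struct {
	StopReason   string
	TerminatedAt string
}

// withDefaults fills empty fields and leaves explicit values untouched.
func withDefaults(o releaseOptions) releaseOptions {
	if o.StopReason == "" {
		o.StopReason = "Completed"
	}
	if o.TerminatedAt == "" {
		o.TerminatedAt = time.Now().UTC().Format(time.RFC3339)
	}
	return o
}

func main() {
	fmt.Println(withDefaults(releaseOptions{}).StopReason) // Completed

	explicit := withDefaults(releaseOptions{StopReason: "Failed", TerminatedAt: "2026-05-10T12:00:00Z"})
	fmt.Println(explicit.StopReason, explicit.TerminatedAt) // Failed 2026-05-10T12:00:00Z
}
```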

type SandboxPoolReconciler

type SandboxPoolReconciler struct {
	client.Client
	Scheme        *runtime.Scheme
	Recorder      events.EventRecorder   // for publishing Phase-transition events
	Clientset     kubernetes.Interface   // nil means Event-based diagnostics disabled
	SandboxStore  store.SandboxStore     // nil means history recording disabled
	PluginManager *plugins.PluginManager // nil means lifecycle plugins disabled
	// IdleNotifier is called whenever a Pod transitions Stopping → Idle.
	// When non-nil, it wakes the per-pool claim scheduler immediately so
	// pending Create requests can be dispatched without a poll-timer delay.
	// nil = disabled (no notification sent).
	IdleNotifier IdleNotifier
	// SandboxReadyHook is called (in a goroutine) after a Starting pod is
	// successfully marked Running via MarkUpdateCompleted. nil = disabled.
	SandboxReadyHook SandboxReadyHook
	// DigestResolver resolves image tags to content digests for in-place
	// update completion detection. nil = disabled (digest-based comparison
	// will fail and updates may not complete).
	DigestResolver imageresolver.DigestResolver
	// contains filtered or unexported fields
}

SandboxPoolReconciler reconciles a SandboxPool object.

func (*SandboxPoolReconciler) Reconcile

func (r *SandboxPoolReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)

Reconcile is part of the main Kubernetes reconciliation loop, which aims to move the current state of the cluster closer to the desired state.

func (*SandboxPoolReconciler) SetupWithManager

func (r *SandboxPoolReconciler) SetupWithManager(mgr ctrl.Manager) error

SetupWithManager sets up the controller with the Manager.

type SandboxReadyHook

type SandboxReadyHook interface {
	OnSandboxReady(ctx context.Context, pod *corev1.Pod)
}

SandboxReadyHook is called in a goroutine after a Starting pod successfully transitions to Running via MarkUpdateCompleted. Implementations must be goroutine-safe.
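
Because the hook runs in its own goroutine for each pod, any shared state inside an implementation needs synchronization. The sketch below guards a slice with a mutex; readyRecorder is hypothetical, and its method takes a pod name instead of the *corev1.Pod used by the real interface so that the example stays dependency-free.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// readyRecorder is a hypothetical hook that records pod names. The mutex makes
// it safe when OnSandboxReady is invoked from many goroutines at once.
type readyRecorder struct {
	mu    sync.Mutex
	ready []string
}

// OnSandboxReady takes a pod name here; the real interface receives *corev1.Pod.
func (r *readyRecorder) OnSandboxReady(ctx context.Context, podName string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ready = append(r.ready, podName)
}

// count returns how many pods have been recorded so far.
func (r *readyRecorder) count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.ready)
}

// recordConcurrently fires the hook from n goroutines and returns the total.
func recordConcurrently(n int) int {
	rec := &readyRecorder{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			rec.OnSandboxReady(context.Background(), fmt.Sprintf("pod-%d", i))
		}(i)
	}
	wg.Wait()
	return rec.count()
}

func main() {
	fmt.Println(recordConcurrently(50)) // 50
}
```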

Directories

Path Synopsis
Package poststarthooks executes post-start hook actions on sandbox pods that have just transitioned Starting → Running.
