package pod

Version: v0.12.1 | Published: May 1, 2026 | License: Apache-2.0 | Imports: 16 | Imported by: 0

Repository

github.com/NVIDIA/aicr

Documentation ¶

Overview ¶

Package pod provides shared utilities for Kubernetes Job and Pod operations.

This package consolidates common functionality used by the snapshot agent (pkg/k8s/agent) and validator Job orchestrator (pkg/validator/job):

Job lifecycle: WaitForJobCompletion
Pod phase: WaitForPodSucceeded, WaitForPodReady
Pod logs: StreamLogs, GetPodLogs
ConfigMap URIs: ParseConfigMapURI

All functions use structured error handling (pkg/errors) and respect context deadlines for proper timeout management.

Example usage:

// Wait for job completion
err := pod.WaitForJobCompletion(ctx, client, namespace, jobName, timeout)

// Wait for pod to succeed
err = pod.WaitForPodSucceeded(ctx, client, namespace, podName, timeout)

// Stream pod logs to writer (empty container = first container)
err = pod.StreamLogs(ctx, client, namespace, podName, "", os.Stdout)

// Get pod logs as string with specific container
logs, err := pod.GetPodLogs(ctx, client, namespace, podName, "my-container")

Index ¶

func GetPodForJob(ctx context.Context, client kubernetes.Interface, namespace, jobName string) (*corev1.Pod, error)
func GetPodLogs(ctx context.Context, client kubernetes.Interface, ...) (string, error)
func ParseConfigMapURI(uri string) (namespace, name string, err error)
func StreamLogs(ctx context.Context, client kubernetes.Interface, ...) error
func WaitForJobCompletion(ctx context.Context, client kubernetes.Interface, namespace, name string, ...) error
func WaitForJobTerminal(ctx context.Context, client kubernetes.Interface, namespace, name string, ...) (*batchv1.Job, error)
func WaitForPodReady(ctx context.Context, client kubernetes.Interface, namespace, name string, ...) error
func WaitForPodSucceeded(ctx context.Context, client kubernetes.Interface, namespace, name string, ...) error
func WaitForTermination(ctx context.Context, client kubernetes.Interface, namespace, name string) error

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func GetPodForJob ¶ added in v0.12.1

func GetPodForJob(ctx context.Context, client kubernetes.Interface, namespace, jobName string) (*corev1.Pod, error)

GetPodForJob returns the first pod owned by the named Job, located via the standard `batch.kubernetes.io/job-name=<jobName>` label selector.

Returns an ErrCodeNotFound StructuredError when the listing succeeds but matches zero pods (the Job's pod has not yet been created or was deleted). Returns an ErrCodeInternal StructuredError if the List call itself fails.
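The lookup described above can be sketched with the standard library alone. This is an illustration, not the package's implementation: errNotFound, jobSelector, and firstPod are hypothetical names, and errNotFound merely stands in for the ErrCodeNotFound StructuredError.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the package's ErrCodeNotFound
// StructuredError; the real error type lives in pkg/errors.
var errNotFound = errors.New("not found")

// jobSelector builds the label selector string described above.
func jobSelector(jobName string) string {
	return "batch.kubernetes.io/job-name=" + jobName
}

// firstPod returns the first pod name from a successful listing, or
// errNotFound when the listing matched zero pods.
func firstPod(jobName string, podNames []string) (string, error) {
	if len(podNames) == 0 {
		return "", fmt.Errorf("no pods match %q: %w", jobSelector(jobName), errNotFound)
	}
	return podNames[0], nil
}

func main() {
	fmt.Println(jobSelector("validate"))

	// An empty listing is distinguishable from a failed List call, so the
	// caller can choose to retry instead of aborting.
	if _, err := firstPod("validate", nil); errors.Is(err, errNotFound) {
		fmt.Println("pod not created yet; caller may retry")
	}
}
```

Keeping "no pods yet" separate from "List failed" lets a caller poll while the Job controller is still creating the pod.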

func GetPodLogs ¶

func GetPodLogs(ctx context.Context, client kubernetes.Interface, namespace, podName, containerName string) (string, error)

GetPodLogs retrieves all logs from a pod as a string. This function is suitable for completed pods or when you need the full log history. When containerName is empty, Kubernetes defaults to the first container.

func ParseConfigMapURI ¶

func ParseConfigMapURI(uri string) (namespace, name string, err error)

ParseConfigMapURI parses a ConfigMap URI in the format "cm://namespace/name" and returns the namespace and name components.

Returns an error if the URI format is invalid.
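A minimal sketch of parsing the "cm://namespace/name" format follows. parseConfigMapURI is a hypothetical re-implementation for illustration only; the validation rules it applies (exactly two non-empty segments) are assumptions, and the real function may accept or reject different inputs.

```go
package main

import (
	"fmt"
	"strings"
)

// parseConfigMapURI is a hypothetical re-implementation for illustration;
// the package's real validation rules may differ.
func parseConfigMapURI(uri string) (namespace, name string, err error) {
	const scheme = "cm://"
	if !strings.HasPrefix(uri, scheme) {
		return "", "", fmt.Errorf("invalid ConfigMap URI %q: missing %q prefix", uri, scheme)
	}
	// Assumption: exactly two non-empty path segments are required.
	parts := strings.Split(strings.TrimPrefix(uri, scheme), "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("invalid ConfigMap URI %q: want cm://namespace/name", uri)
	}
	return parts[0], parts[1], nil
}

func main() {
	ns, name, err := parseConfigMapURI("cm://default/my-config")
	fmt.Println(ns, name, err)

	_, _, err = parseConfigMapURI("default/my-config")
	fmt.Println(err)
}
```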

func StreamLogs ¶

func StreamLogs(ctx context.Context, client kubernetes.Interface, namespace, podName, containerName string, logWriter io.Writer) error

StreamLogs streams pod logs to the provided writer in real-time. Logs are written line-by-line as they are received from the pod. When containerName is empty, Kubernetes defaults to the first container.
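Line-by-line forwarding of a stream can be sketched with bufio.Scanner. This is only an illustration of the technique, under the assumption that the log stream is an io.Reader; copyLines and relay are hypothetical helpers, and strings.NewReader stands in for the stream returned by the API server.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// copyLines forwards src to dst one line at a time, the way a log follower
// would relay a pod's log stream. It returns the number of lines written.
func copyLines(src io.Reader, dst io.Writer) (int, error) {
	scanner := bufio.NewScanner(src)
	lines := 0
	for scanner.Scan() {
		if _, err := fmt.Fprintln(dst, scanner.Text()); err != nil {
			return lines, err
		}
		lines++
	}
	// scanner.Err is nil on a clean end of stream.
	return lines, scanner.Err()
}

// relay is a small helper so the behavior can be checked with plain strings.
func relay(input string) (string, int, error) {
	var out strings.Builder
	n, err := copyLines(strings.NewReader(input), &out)
	return out.String(), n, err
}

func main() {
	// strings.NewReader stands in for the log stream from the API server.
	n, err := copyLines(strings.NewReader("starting\nready\n"), os.Stdout)
	fmt.Fprintln(os.Stderr, "lines:", n, "err:", err)
}
```

Note that bufio.Scanner rejects lines longer than its buffer (64 KiB by default), so a production follower would likely raise the limit with Scanner.Buffer or use bufio.Reader instead.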

func WaitForJobCompletion ¶

func WaitForJobCompletion(ctx context.Context, client kubernetes.Interface, namespace, name string, timeout time.Duration) error

WaitForJobCompletion waits for a Kubernetes Job to complete successfully or fail. Returns nil if the Job completes successfully, or an error if the Job fails or the context deadline is exceeded.

Performs an initial Get to catch already-complete Jobs, then uses the watch API for efficient monitoring.
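The "Get first, then watch" pattern can be sketched without a cluster by modeling the watch as a channel. Everything below is a simplified assumption: state, classify, waitForCompletion, and run are hypothetical names, and the real function works with Kubernetes Job objects and the client-go watch API rather than plain channels.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// state is a hypothetical, simplified stand-in for a Job's status.
type state string

const (
	running  state = "Running"
	complete state = "Complete"
	failed   state = "Failed"
)

var errJobFailed = errors.New("job failed")

// demoTimeout keeps the example fast; real callers pass their own timeout.
const demoTimeout = 100 * time.Millisecond

// classify reports whether s is terminal and, if so, the result to return.
func classify(s state) (bool, error) {
	switch s {
	case complete:
		return true, nil
	case failed:
		return true, errJobFailed
	default:
		return false, nil
	}
}

// waitForCompletion checks the current state first, then consumes watch
// events until the Job completes, fails, or the timeout expires.
func waitForCompletion(ctx context.Context, get func() state, events <-chan state, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	// Initial Get: an already-finished Job may never emit another event.
	if done, err := classify(get()); done {
		return err
	}
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case s, ok := <-events:
			if !ok {
				return errors.New("watch closed before the job finished")
			}
			if done, err := classify(s); done {
				return err
			}
		}
	}
}

// run drives waitForCompletion with canned inputs.
func run(initial state, watched []state) error {
	events := make(chan state, len(watched))
	for _, s := range watched {
		events <- s
	}
	get := func() state { return initial }
	return waitForCompletion(context.Background(), get, events, demoTimeout)
}

func main() {
	fmt.Println(run(complete, nil))                  // already complete
	fmt.Println(run(running, []state{running, failed})) // fails while watched
	fmt.Println(run(running, nil))                   // never finishes
}
```

Without the initial Get, a Job that finished before the watch started would produce no further events, and the caller would block until the timeout.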

func WaitForJobTerminal ¶ added in v0.12.1

func WaitForJobTerminal(ctx context.Context, client kubernetes.Interface, namespace, name string, timeout time.Duration) (*batchv1.Job, error)

WaitForJobTerminal waits for a Kubernetes Job to reach a terminal state (Complete or Failed) and returns the observed Job without classifying the terminal disposition as an error. This differs from WaitForJobCompletion, which returns an error for Failed Jobs.

Use this helper when the caller wants to make its own pass/fail decision from the Job's status (e.g., the validator orchestrator extracts the exit code from the underlying pod and treats both Complete and Failed Jobs as legitimate completions).

Returns ErrCodeInternal if the initial Get or Watch call fails, or if the Job is deleted while being watched. Returns ErrCodeTimeout on context deadline exceeded. Returns ErrCodeUnavailable if the watch channel closes without a terminal state being observed (after one re-Get fast-path retry).

Performs an initial Get to catch already-terminal Jobs, then uses the watch API for efficient monitoring.
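How a caller might make its own pass/fail decision from a terminal Job can be sketched as follows. The condition type is a simplified, hypothetical mirror of a Job status condition, and the policy in verdict (treating exit code 2 as a pass) is invented purely for illustration; it is not the validator orchestrator's actual rule.

```go
package main

import "fmt"

// condition is a simplified, hypothetical mirror of a Job status condition.
type condition struct {
	Type   string // for example "Complete" or "Failed"
	Status string // "True", "False", or "Unknown"
}

// terminalType returns the first terminal condition that is currently true.
func terminalType(conds []condition) (string, bool) {
	for _, c := range conds {
		if c.Status != "True" {
			continue
		}
		if c.Type == "Complete" || c.Type == "Failed" {
			return c.Type, true
		}
	}
	return "", false
}

// verdict applies a caller-defined policy. Assumption for illustration:
// exit code 2 is treated as a pass even when the Job itself is Failed.
func verdict(conds []condition, exitCode int) string {
	kind, ok := terminalType(conds)
	switch {
	case !ok:
		return "pending"
	case kind == "Complete" || exitCode == 2:
		return "pass"
	default:
		return "fail"
	}
}

func main() {
	failedJob := []condition{{Type: "Failed", Status: "True"}}
	fmt.Println(verdict(failedJob, 1))
	fmt.Println(verdict(failedJob, 2))
}
```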

func WaitForPodReady ¶

func WaitForPodReady(ctx context.Context, client kubernetes.Interface, namespace, name string, timeout time.Duration) error

WaitForPodReady waits for a pod to become ready within the specified timeout. Returns nil if the pod becomes ready, or an error if the timeout expires or the pod fails.
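One plausible readiness check is sketched below, assuming readiness is taken from the pod's "Ready" condition and that a Failed phase ends the wait early. The podStatus and condition types are simplified, hypothetical mirrors of the Kubernetes status fields, and checkReady is not the package's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// condition and podStatus are simplified, hypothetical mirrors of the
// corresponding Kubernetes status fields.
type condition struct {
	Type   string
	Status string
}

type podStatus struct {
	Phase      string
	Conditions []condition
}

var errPodFailed = errors.New("pod failed")

// checkReady reports whether the pod is ready, or returns an error when the
// pod has already failed and can no longer become ready.
func checkReady(p podStatus) (bool, error) {
	if p.Phase == "Failed" {
		return false, errPodFailed
	}
	for _, c := range p.Conditions {
		if c.Type == "Ready" && c.Status == "True" {
			return true, nil
		}
	}
	// Not ready yet: the caller keeps waiting until its timeout expires.
	return false, nil
}

func main() {
	ready, err := checkReady(podStatus{
		Phase:      "Running",
		Conditions: []condition{{Type: "Ready", Status: "True"}},
	})
	fmt.Println(ready, err)
}
```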

func WaitForPodSucceeded ¶ added in v0.8.2

func WaitForPodSucceeded(ctx context.Context, client kubernetes.Interface, namespace, name string, timeout time.Duration) error

WaitForPodSucceeded waits for a pod to reach the Succeeded phase. Returns nil on PodSucceeded, and an error on PodFailed or on timeout. Performs an initial Get to catch already-terminal pods, then uses the watch API for efficient monitoring.

func WaitForTermination ¶ added in v0.12.1

func WaitForTermination(ctx context.Context, client kubernetes.Interface, namespace, name string) error

WaitForTermination watches a pod and returns nil once it has reached a terminal state: either the pod object has been deleted (the API server emitted a watch.Deleted event or a subsequent Get returns NotFound) or its phase is Succeeded or Failed. Unlike WaitForPodSucceeded, a Failed phase is NOT treated as an error here: the caller has already decided that any terminal disposition is acceptable (e.g., RBAC cleanup races).

If the watch channel closes before a terminal state is observed, this function performs ONE retry by re-issuing the watch starting from the most recent ResourceVersion observed. If the second watch also closes without reaching a terminal state, an ErrCodeUnavailable error is returned so callers can decide log severity rather than swallow the failure.

Context cancellation/timeout is surfaced as an ErrCodeTimeout error.
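The single-retry behavior can be sketched by modeling each watch as a channel that may close early. The names event, watchFunc, waitWithOneRetry, simulate, and errUnavailable are hypothetical; errUnavailable stands in for the ErrCodeUnavailable error, and context handling is omitted to keep the sketch short.

```go
package main

import (
	"errors"
	"fmt"
)

// event is a hypothetical, simplified watch event.
type event struct {
	ResourceVersion string
	Terminal        bool // pod deleted, Succeeded, or Failed
}

// errUnavailable stands in for the package's ErrCodeUnavailable error.
var errUnavailable = errors.New("watch closed without a terminal state")

// watchFunc opens a watch starting from the given resource version.
type watchFunc func(resourceVersion string) <-chan event

// waitWithOneRetry consumes a watch and, if it closes early, re-issues it
// exactly once from the most recent resource version observed.
func waitWithOneRetry(watch watchFunc) error {
	lastRV := ""
	for attempt := 0; attempt < 2; attempt++ {
		for ev := range watch(lastRV) {
			lastRV = ev.ResourceVersion
			if ev.Terminal {
				return nil
			}
		}
		// The channel closed without a terminal event; loop to retry once.
	}
	return errUnavailable
}

// simulate replays one batch of events per watch call and records the
// resource version each watch was started from.
func simulate(batches [][]event) (startedFrom []string, err error) {
	call := 0
	watch := func(rv string) <-chan event {
		startedFrom = append(startedFrom, rv)
		var batch []event
		if call < len(batches) {
			batch = batches[call]
		}
		call++
		ch := make(chan event, len(batch))
		for _, ev := range batch {
			ch <- ev
		}
		close(ch)
		return ch
	}
	err = waitWithOneRetry(watch)
	return startedFrom, err
}

func main() {
	started, err := simulate([][]event{
		{{ResourceVersion: "10"}},
		{{ResourceVersion: "11", Terminal: true}},
	})
	fmt.Println(started, err)
}
```

Resuming from the last observed ResourceVersion avoids replaying events the first watch already delivered, and bounding the retries to one keeps the failure visible to the caller instead of looping indefinitely.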

Types ¶

This section is empty.
