package job

v0.13.0
Warning

This package is not in the latest version of its module.

Published: May 15, 2026 License: Apache-2.0 Imports: 26 Imported by: 0

Documentation

Constants

const (
	// ValidatorContainerName is the required name for the validator container.
	// This is part of the validator package contract to ensure sidecar-safety.
	ValidatorContainerName = "validator"
)

Variables

This section is empty.

Functions

func CleanupRBAC

func CleanupRBAC(ctx context.Context, clientset kubernetes.Interface, namespace, runID string) error

CleanupRBAC removes the per-run ServiceAccount and ClusterRoleBinding. Ignores NotFound errors (idempotent). Call once at end of validation run.

When both deletes fail, the returned StructuredError wraps the joined underlying errors via stderrors.Join so callers can inspect individual failures with errors.Is / errors.As.
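The inspection pattern described for CleanupRBAC can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: `cleanupError` is a hypothetical stand-in for StructuredError, and the two sentinel errors are invented for the example. It only assumes that the wrapper exposes the joined error through `Unwrap`.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the two delete failures.
var (
	errDeleteSA  = errors.New("delete ServiceAccount failed")
	errDeleteCRB = errors.New("delete ClusterRoleBinding failed")
)

// cleanupError is a hypothetical stand-in for the package's StructuredError.
type cleanupError struct {
	msg   string
	cause error
}

func (e *cleanupError) Error() string { return e.msg + ": " + e.cause.Error() }

// Unwrap exposes the joined error so errors.Is / errors.As can walk it.
func (e *cleanupError) Unwrap() error { return e.cause }

// cleanup simulates both deletes failing and wraps the joined errors.
func cleanup() error {
	joined := errors.Join(errDeleteSA, errDeleteCRB)
	return &cleanupError{msg: "RBAC cleanup failed", cause: joined}
}

// hasBoth reports whether both individual failures are reachable from err.
func hasBoth(err error) bool {
	return errors.Is(err, errDeleteSA) && errors.Is(err, errDeleteCRB)
}

func main() {
	err := cleanup()
	fmt.Println(errors.Is(err, errDeleteSA))  // true
	fmt.Println(errors.Is(err, errDeleteCRB)) // true
}
```

Because `errors.Join` returns an error whose `Unwrap` yields a slice, `errors.Is` visits each joined failure in turn, so a caller can react to one failure without parsing the message string.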

func ClusterRoleBindingName

func ClusterRoleBindingName(runID string) string

ClusterRoleBindingName returns the per-run ClusterRoleBinding name. The CRB is cluster-scoped, so name uniqueness across concurrent runs (even in different namespaces) is what prevents cross-run cleanup races.
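Why run-scoped names avoid cleanup races can be shown with a small in-memory model. Everything here is an assumption made for illustration: `bindingName` is a hypothetical stand-in for ClusterRoleBindingName (the real prefix and format are not documented on this page), and `cluster` is just a set of names, not a Kubernetes client.

```go
package main

import "fmt"

// bindingName is a hypothetical stand-in for ClusterRoleBindingName; the
// real prefix and format are not documented here.
func bindingName(runID string) string {
	return "validator-" + runID
}

// cluster models the cluster-scoped set of ClusterRoleBinding names.
type cluster map[string]bool

func (c cluster) ensure(runID string)  { c[bindingName(runID)] = true }
func (c cluster) cleanup(runID string) { delete(c, bindingName(runID)) }

// survivesOtherCleanup reports whether run B's binding is still present
// after run A removes its own binding.
func survivesOtherCleanup() bool {
	c := cluster{}
	c.ensure("run-a")
	c.ensure("run-b")
	c.cleanup("run-a")
	return c[bindingName("run-b")] && !c[bindingName("run-a")]
}

func main() {
	fmt.Println(survivesOtherCleanup()) // true
}
```

With a single shared name, the first run to finish would delete the binding the second run still depends on; deriving the name from runID removes that overlap.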

func EnsureRBAC

func EnsureRBAC(ctx context.Context, clientset kubernetes.Interface, namespace, runID string) error

EnsureRBAC applies the ServiceAccount and ClusterRoleBinding for validator Jobs using server-side apply. Call once per validation run before deploying any Jobs. The runID scopes the resource names so overlapping runs do not clobber each other.

func ServiceAccountName

func ServiceAccountName(runID string) string

ServiceAccountName returns the per-run ServiceAccount name used by the validator Jobs deployed for runID. Each `aicr validate` invocation generates a unique runID, so the SA created at run start is the same one deleted at run end — overlapping runs cannot clobber each other.

Types

type Deployer

type Deployer struct {
	// contains filtered or unexported fields
}

Deployer manages the lifecycle of a single validator Job.

func NewDeployer

func NewDeployer(
	clientset kubernetes.Interface,
	factory informers.SharedInformerFactory,
	namespace, runID, cliVersion, cliCommit string,
	entry catalog.ValidatorEntry,
	imagePullSecrets []string,
	tolerations []corev1.Toleration,
	nodeSelector map[string]string,
) *Deployer

NewDeployer creates a Deployer for a single validator catalog entry. The factory must be a namespace-scoped SharedInformerFactory started by the caller. cliVersion is the CLI's own version string; an empty value is acceptable for dev builds. It is forwarded to the validator container via the AICR_CLI_VERSION env var so the validator can resolve images it references outside the catalog (e.g. the AIPerf benchmark image used by inference-perf) using the same rewriting rules as catalog.Load. cliCommit is the git commit SHA, forwarded via AICR_CLI_COMMIT for SHA-based image tag resolution in dev builds.

func (*Deployer) CleanupJob

func (d *Deployer) CleanupJob(ctx context.Context) error

CleanupJob deletes the validator Job with foreground propagation (waits for pod deletion).

func (*Deployer) DeployJob

func (d *Deployer) DeployJob(ctx context.Context) error

DeployJob applies the validator Job using server-side apply. A unique name is generated client-side and stored in d.jobName.

func (*Deployer) ExtractResult

func (d *Deployer) ExtractResult(ctx context.Context) *ctrf.ValidatorResult

ExtractResult reads the exit code, termination message, and stdout from the "validator" container in a completed validator pod.

CONTRACT: The container name MUST be "validator". This is a frozen public contract of the validator package to ensure sidecar-safety — ExtractResult will only read from the "validator" container, ignoring any sidecar containers that may be injected by external controllers (e.g., log streaming, result processing).

Returns a ValidatorResult regardless of how the container terminated — the caller maps the result to a CTRF status.

This method must be called after WaitForCompletion returns, when the Job is in a terminal state (Complete or Failed).
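The sidecar-safety contract amounts to selecting a container status by name rather than by position. The sketch below uses a simplified, hypothetical `containerStatus` type in place of the Kubernetes API type, and `findValidator` is an illustrative helper, not a function exported by this package.

```go
package main

import "fmt"

// validatorContainerName mirrors the documented ValidatorContainerName value.
const validatorContainerName = "validator"

// containerStatus is a simplified, hypothetical stand-in for the fields of a
// terminated container status that matter for result extraction.
type containerStatus struct {
	Name     string
	ExitCode int32
	Message  string
}

// findValidator returns the status of the container named "validator",
// ignoring any sidecars regardless of their order in the slice.
func findValidator(statuses []containerStatus) (containerStatus, bool) {
	for _, s := range statuses {
		if s.Name == validatorContainerName {
			return s, true
		}
	}
	return containerStatus{}, false
}

func main() {
	statuses := []containerStatus{
		{Name: "log-streamer", ExitCode: 1, Message: "sidecar stopped"},
		{Name: "validator", ExitCode: 0, Message: "all checks passed"},
	}
	if s, ok := findValidator(statuses); ok {
		fmt.Println(s.Name, s.ExitCode, s.Message)
	}
}
```

Matching on the name means an injected sidecar that exits non-zero, or that appears first in the status list, cannot be mistaken for the validator's result.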

func (*Deployer) HandleTimeout

func (d *Deployer) HandleTimeout(ctx context.Context) *ctrf.ValidatorResult

HandleTimeout extracts whatever result is available when the orchestrator's wait has timed out. Uses a fresh context since the parent may be canceled.
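The "fresh context" idea can be illustrated with the standard library. This is a sketch of the general pattern only: `freshContext` is a hypothetical helper, and the five-second budget is an arbitrary value chosen for the example, not a documented setting of this package.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// freshContext returns a context that does not inherit cancellation from any
// parent, bounded by its own timeout.
func freshContext(d time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(context.Background(), d)
}

// usableAfterParentCancel reports whether a fresh context is still live
// after the parent context has been canceled.
func usableAfterParentCancel() bool {
	parent, cancel := context.WithCancel(context.Background())
	cancel() // simulate the orchestrator's context being canceled

	ctx, done := freshContext(5 * time.Second)
	defer done()

	return parent.Err() != nil && ctx.Err() == nil
}

func main() {
	fmt.Println(usableAfterParentCancel()) // true
}
```

Any API call made with the canceled parent would fail immediately, so result extraction after a timeout needs a context with its own lifetime.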

func (*Deployer) JobName

func (d *Deployer) JobName() string

JobName returns the Kubernetes Job name that DeployJob generates client-side. Empty until DeployJob is called.

func (*Deployer) WaitForCompletion

func (d *Deployer) WaitForCompletion(ctx context.Context, timeout time.Duration) error

WaitForCompletion watches the Job until it reaches a terminal state (Complete or Failed). Returns nil for both — the caller uses ExtractResult to determine pass/fail/skip from the exit code.

Returns an error only for infrastructure failures (watch error, timeout). Job failure (exit != 0) is NOT an error return. That decision lives here in the validator orchestrator, not in the shared pod.WaitForJobTerminal helper, which intentionally treats both Complete and Failed Jobs as legitimate completions and lets the caller classify them.
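The split between infrastructure errors and Job outcomes can be sketched as a small classifier. The mapping below is an assumption: this page does not state which exit code means "skip", so `skipCode` is passed in as a parameter, and the returned strings follow CTRF status names only as an illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// classify maps a wait error and a container exit code to a status string.
// skipCode is a hypothetical parameter; the real skip exit code is not
// documented here.
func classify(waitErr error, exitCode, skipCode int32) string {
	if waitErr != nil {
		// Watch failure or timeout: no trustworthy exit code exists.
		return "other"
	}
	switch {
	case exitCode == 0:
		return "passed"
	case exitCode == skipCode:
		return "skipped"
	default:
		return "failed"
	}
}

func main() {
	const skip = 2 // arbitrary value chosen for the example
	fmt.Println(classify(nil, 0, skip))
	fmt.Println(classify(nil, skip, skip))
	fmt.Println(classify(nil, 1, skip))
	fmt.Println(classify(errors.New("watch closed"), 0, skip))
}
```

Keeping a non-zero exit code out of the error path lets the caller distinguish "the validator ran and reported failure" from "the run could not be observed".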

func (*Deployer) WaitForPodTermination

func (d *Deployer) WaitForPodTermination(ctx context.Context) error

WaitForPodTermination watches the Job's pod until it reaches a terminal state. Prevents RBAC cleanup from racing with in-progress pod operations.

Returns the underlying error from pod.WaitForTermination so callers can decide log severity. A nil error means the pod is gone or terminal; a non-nil error means the wait was abandoned (timeout, watch failure, or repeated watch closures) and the cleanup may race with an in-progress pod.
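The ordering constraints in these method descriptions can be sketched end to end against a fake. The `jobRunner` interface and `fakeRunner` type are hypothetical and exist only so the example runs without a cluster. Placing WaitForPodTermination before CleanupJob is an assumption, since this page documents only that pod termination should be awaited before RBAC cleanup.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
	"time"
)

// jobRunner is a hypothetical interface mirroring part of *Deployer's
// method set.
type jobRunner interface {
	DeployJob(ctx context.Context) error
	WaitForCompletion(ctx context.Context, timeout time.Duration) error
	WaitForPodTermination(ctx context.Context) error
	CleanupJob(ctx context.Context) error
}

// fakeRunner records the order of calls instead of talking to Kubernetes.
type fakeRunner struct{ calls []string }

func (f *fakeRunner) record(name string) error {
	f.calls = append(f.calls, name)
	return nil
}

func (f *fakeRunner) DeployJob(context.Context) error { return f.record("DeployJob") }
func (f *fakeRunner) WaitForCompletion(context.Context, time.Duration) error {
	return f.record("WaitForCompletion")
}
func (f *fakeRunner) WaitForPodTermination(context.Context) error {
	return f.record("WaitForPodTermination")
}
func (f *fakeRunner) CleanupJob(context.Context) error { return f.record("CleanupJob") }

// runOne drives one validator Job. EnsureRBAC would run once before the
// first call to runOne and CleanupRBAC once after the last.
func runOne(ctx context.Context, r jobRunner, timeout time.Duration) error {
	if err := r.DeployJob(ctx); err != nil {
		return err
	}
	waitErr := r.WaitForCompletion(ctx, timeout)
	// ExtractResult (or HandleTimeout when waitErr != nil) would be called here.
	termErr := r.WaitForPodTermination(ctx)
	cleanErr := r.CleanupJob(ctx)
	// errors.Join returns nil when every argument is nil.
	return errors.Join(waitErr, termErr, cleanErr)
}

// callOrder runs the sketch against the fake and returns the recorded order.
func callOrder() string {
	f := &fakeRunner{}
	if err := runOne(context.Background(), f, time.Minute); err != nil {
		return "error: " + err.Error()
	}
	return strings.Join(f.calls, ",")
}

func main() {
	fmt.Println(callOrder())
}
```

Cleanup steps still run when the wait fails, and their errors are joined rather than dropped, so a caller can choose log severity per failure.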
