Documentation
Index
- Constants
- func CleanupRBAC(ctx context.Context, clientset kubernetes.Interface, namespace, runID string) error
- func ClusterRoleBindingName(runID string) string
- func EnsureRBAC(ctx context.Context, clientset kubernetes.Interface, namespace, runID string) error
- func ServiceAccountName(runID string) string
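- type Deployer
- func NewDeployer(clientset kubernetes.Interface, factory informers.SharedInformerFactory, namespace, runID, cliVersion, cliCommit string, entry catalog.ValidatorEntry, imagePullSecrets []string, tolerations []corev1.Toleration, nodeSelector map[string]string) *Deployer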
- func (d *Deployer) CleanupJob(ctx context.Context) error
- func (d *Deployer) DeployJob(ctx context.Context) error
- func (d *Deployer) ExtractResult(ctx context.Context) *ctrf.ValidatorResult
- func (d *Deployer) HandleTimeout(ctx context.Context) *ctrf.ValidatorResult
- func (d *Deployer) JobName() string
- func (d *Deployer) WaitForCompletion(ctx context.Context, timeout time.Duration) error
- func (d *Deployer) WaitForPodTermination(ctx context.Context) error
Constants
const (
	// ValidatorContainerName is the required name for the validator container.
	// This is part of the validator package contract to ensure sidecar-safety.
	ValidatorContainerName = "validator"
)
Variables
This section is empty.
Functions
func CleanupRBAC
func CleanupRBAC(ctx context.Context, clientset kubernetes.Interface, namespace, runID string) error
CleanupRBAC removes the per-run ServiceAccount and ClusterRoleBinding. NotFound errors are ignored, so the call is idempotent. Call it once, at the end of a validation run.
When both deletes fail, the returned StructuredError wraps the joined underlying errors via stderrors.Join so callers can inspect individual failures with errors.Is / errors.As.
func ClusterRoleBindingName
func ClusterRoleBindingName(runID string) string
ClusterRoleBindingName returns the per-run ClusterRoleBinding name. Because the CRB is cluster-scoped, its name must be unique across all concurrent runs, even runs in different namespaces. That uniqueness is what prevents one run's cleanup from deleting another run's binding.
func EnsureRBAC
func EnsureRBAC(ctx context.Context, clientset kubernetes.Interface, namespace, runID string) error
EnsureRBAC applies the ServiceAccount and ClusterRoleBinding for validator Jobs using server-side apply. Call it once per validation run, before deploying any Jobs. The runID scopes the resource names so overlapping runs do not clobber each other.
func ServiceAccountName
func ServiceAccountName(runID string) string
ServiceAccountName returns the per-run ServiceAccount name used by the validator Jobs deployed for runID. Each `aicr validate` invocation generates a unique runID, so the ServiceAccount created at the start of a run is the same one deleted at its end, and overlapping runs cannot clobber each other.
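Why per-run names matter can be sketched with a toy model of cluster-scoped storage. A ServiceAccount is namespaced, but a ClusterRoleBinding lives in one cluster-wide set of names, so two runs that share a binding name would delete each other's binding. The name formats below are hypothetical. The only property taken from the text above is that runID is part of each name.

```go
package main

import "fmt"

// Hypothetical name formats for illustration only; what matters is that the
// runID is embedded in every per-run name.
func serviceAccountName(runID string) string     { return "aicr-validator-sa-" + runID }
func clusterRoleBindingName(runID string) string { return "aicr-validator-crb-" + runID }

// clusterStore models cluster-scoped objects: a single flat set of names,
// no matter which namespace each run deploys its Jobs into.
type clusterStore map[string]bool

func (s clusterStore) ensure(runID string)  { s[clusterRoleBindingName(runID)] = true }
func (s clusterStore) cleanup(runID string) { delete(s, clusterRoleBindingName(runID)) }

func main() {
	crbs := clusterStore{}
	crbs.ensure("run-a") // e.g. a run validating namespace "team-1"
	crbs.ensure("run-b") // a concurrent run validating namespace "team-2"
	crbs.cleanup("run-a")
	fmt.Println(crbs[clusterRoleBindingName("run-b")]) // true: run B keeps its binding
	fmt.Println(serviceAccountName("run-b"))
}
```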
Types
type Deployer
type Deployer struct {
// contains filtered or unexported fields
}
Deployer manages the lifecycle of a single validator Job.
func NewDeployer
func NewDeployer(
	clientset kubernetes.Interface,
	factory informers.SharedInformerFactory,
	namespace, runID, cliVersion, cliCommit string,
	entry catalog.ValidatorEntry,
	imagePullSecrets []string,
	tolerations []corev1.Toleration,
	nodeSelector map[string]string,
) *Deployer
NewDeployer creates a Deployer for a single validator catalog entry. The factory must be a namespace-scoped SharedInformerFactory that the caller has already started.
cliVersion is the CLI's own version string and may be empty for dev builds. It is forwarded to the validator container in the AICR_CLI_VERSION env var so the validator can resolve images it references outside the catalog (e.g. the AIPerf benchmark image used by inference-perf) with the same rewriting rules as catalog.Load. cliCommit is the git commit SHA, forwarded in AICR_CLI_COMMIT for SHA-based image tag resolution in dev builds.
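The forwarding of the two version strings can be sketched as follows. envVar is a dependency-free stand-in for corev1.EnvVar and validatorEnv is a hypothetical helper. The sketch assumes empty values from dev builds are forwarded unchanged rather than omitted.

```go
package main

import "fmt"

// envVar mirrors the Name/Value shape of a container env entry
// (a stand-in for corev1.EnvVar so the sketch has no dependencies).
type envVar struct {
	Name, Value string
}

// validatorEnv is a hypothetical helper: it passes the CLI's version and
// commit to the validator container as-is, including empty dev-build values.
func validatorEnv(cliVersion, cliCommit string) []envVar {
	return []envVar{
		{Name: "AICR_CLI_VERSION", Value: cliVersion},
		{Name: "AICR_CLI_COMMIT", Value: cliCommit},
	}
}

func main() {
	// Dev build: no version string, but a commit SHA is available.
	for _, e := range validatorEnv("", "abc1234") {
		fmt.Printf("%s=%q\n", e.Name, e.Value)
	}
}
```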
func (*Deployer) CleanupJob
func (d *Deployer) CleanupJob(ctx context.Context) error
CleanupJob deletes the validator Job with foreground propagation (waits for pod deletion).
func (*Deployer) DeployJob
func (d *Deployer) DeployJob(ctx context.Context) error
DeployJob applies the validator Job using server-side apply. A unique name is generated client-side and stored in d.jobName.
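Server-side apply addresses an object by name because the apply PATCH targets the Job's own URL. This means generateName cannot supply the name, and the name has to exist before the request is sent. The following sketch shows client-side generation. uniqueJobName and its random hex suffix are hypothetical. The 63-character cap is an assumed bound that keeps the name valid as the job-name label value Kubernetes puts on the Job's pods.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// maxJobNameLen is an assumed cap so the name also fits in a label value.
const maxJobNameLen = 63

// uniqueJobName is a hypothetical helper that appends a random suffix on the
// client, truncating the prefix if needed to stay within maxJobNameLen.
func uniqueJobName(prefix string) (string, error) {
	buf := make([]byte, 4)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	suffix := "-" + hex.EncodeToString(buf) // 8 lowercase hex characters
	if len(prefix) > maxJobNameLen-len(suffix) {
		prefix = prefix[:maxJobNameLen-len(suffix)]
	}
	return prefix + suffix, nil
}

func main() {
	name, err := uniqueJobName("aicr-validator-inference-perf")
	if err != nil {
		panic(err)
	}
	fmt.Println(name, len(name))
}
```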
func (*Deployer) ExtractResult
func (d *Deployer) ExtractResult(ctx context.Context) *ctrf.ValidatorResult
ExtractResult reads the exit code, termination message, and stdout from the "validator" container in a completed validator pod.
CONTRACT: The container name MUST be "validator". This is a frozen public contract of the validator package that ensures sidecar-safety: ExtractResult reads only from the "validator" container and ignores any sidecar containers injected by external controllers (e.g., for log streaming or result processing).
Returns a ValidatorResult regardless of how the container terminated; the caller maps the result to a CTRF status.
This method must be called after WaitForCompletion returns, when the Job is in a terminal state (Complete or Failed).
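The container-name contract comes down to a lookup by name over the pod's container statuses. containerStatus is a trimmed, hypothetical stand-in for corev1.ContainerStatus, which nests the exit code under State.Terminated. The point is that a sidecar's exit code is never consulted.

```go
package main

import "fmt"

// ValidatorContainerName mirrors the package constant.
const ValidatorContainerName = "validator"

// containerStatus is a trimmed stand-in for corev1.ContainerStatus with only
// the terminated-state fields this sketch needs.
type containerStatus struct {
	Name     string
	ExitCode int32
	Message  string // termination message
}

// validatorStatus returns the status of the container named "validator" and
// ignores every other container, so injected sidecars cannot affect the result.
func validatorStatus(statuses []containerStatus) (containerStatus, bool) {
	for _, s := range statuses {
		if s.Name == ValidatorContainerName {
			return s, true
		}
	}
	return containerStatus{}, false
}

func main() {
	statuses := []containerStatus{
		{Name: "log-shipper", ExitCode: 137}, // sidecar killed after the main container exited
		{Name: "validator", ExitCode: 0, Message: "all checks passed"},
	}
	s, ok := validatorStatus(statuses)
	fmt.Println(ok, s.ExitCode) // true 0
}
```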
func (*Deployer) HandleTimeout
func (d *Deployer) HandleTimeout(ctx context.Context) *ctrf.ValidatorResult
HandleTimeout extracts whatever result is available when the orchestrator's wait has timed out. It uses a fresh context because the parent may already be canceled.
func (*Deployer) JobName
func (d *Deployer) JobName() string
JobName returns the Kubernetes Job name that DeployJob generated client-side. It is empty until DeployJob is called.
func (*Deployer) WaitForCompletion
func (d *Deployer) WaitForCompletion(ctx context.Context, timeout time.Duration) error
WaitForCompletion watches the Job until it reaches a terminal state (Complete or Failed) and returns nil in both cases; the caller uses ExtractResult to determine pass/fail/skip from the exit code.
It returns an error only for infrastructure failures (watch error, timeout). A failed Job (exit code != 0) is NOT an error return. That decision lives here in the validator orchestrator, not in the shared pod.WaitForJobTerminal helper, which intentionally treats both Complete and Failed Jobs as legitimate completions and lets the caller classify them.
func (*Deployer) WaitForPodTermination
func (d *Deployer) WaitForPodTermination(ctx context.Context) error
WaitForPodTermination watches the Job's pod until it reaches a terminal state, which prevents RBAC cleanup from racing with in-progress pod operations.
Returns the underlying error from pod.WaitForTermination so callers can decide log severity. A nil error means the pod is gone or terminal; a non-nil error means the wait was abandoned (timeout, watch failure, or repeated watch closures) and the cleanup may race with an in-progress pod.