Documentation
Index
- Constants
- func CleanupRBAC(ctx context.Context, clientset kubernetes.Interface, namespace string) error
- func EnsureRBAC(ctx context.Context, clientset kubernetes.Interface, namespace string) error
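- type Deployer
- func NewDeployer(clientset kubernetes.Interface, factory informers.SharedInformerFactory, namespace, runID, cliVersion, cliCommit string, entry catalog.ValidatorEntry, imagePullSecrets []string, tolerations []corev1.Toleration, nodeSelector map[string]string) *Deployer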
- func (d *Deployer) CleanupJob(ctx context.Context) error
- func (d *Deployer) DeployJob(ctx context.Context) error
- func (d *Deployer) ExtractResult(ctx context.Context) *ctrf.ValidatorResult
- func (d *Deployer) HandleTimeout(ctx context.Context) *ctrf.ValidatorResult
- func (d *Deployer) JobName() string
- func (d *Deployer) WaitForCompletion(ctx context.Context, timeout time.Duration) error
- func (d *Deployer) WaitForPodTermination(ctx context.Context) error
Constants
const (
	// ServiceAccountName is the name of the ServiceAccount used by all validator Jobs.
	ServiceAccountName = "aicr-validator"
	// ClusterRoleBindingName is the name of the ClusterRoleBinding that grants
	// cluster-admin to the validator ServiceAccount.
	ClusterRoleBindingName = "aicr-validator"
)
Variables
This section is empty.
Functions
func CleanupRBAC
func CleanupRBAC(ctx context.Context, clientset kubernetes.Interface, namespace string) error
CleanupRBAC removes the ServiceAccount and ClusterRoleBinding. It ignores NotFound errors, so it is idempotent. Call it once at the end of a validation run.
When both deletes fail, the returned StructuredError wraps the joined underlying errors via stderrors.Join so callers can inspect individual failures with errors.Is / errors.As.
func EnsureRBAC
func EnsureRBAC(ctx context.Context, clientset kubernetes.Interface, namespace string) error
EnsureRBAC applies the ServiceAccount and ClusterRoleBinding for validator Jobs using server-side apply. Call it once per validation run, before deploying any Jobs.
Types
type Deployer
type Deployer struct {
// contains filtered or unexported fields
}
Deployer manages the lifecycle of a single validator Job.
func NewDeployer
func NewDeployer(
	clientset kubernetes.Interface,
	factory informers.SharedInformerFactory,
	namespace, runID, cliVersion, cliCommit string,
	entry catalog.ValidatorEntry,
	imagePullSecrets []string,
	tolerations []corev1.Toleration,
	nodeSelector map[string]string,
) *Deployer
NewDeployer creates a Deployer for a single validator catalog entry. The factory must be a namespace-scoped SharedInformerFactory that the caller has already started.
cliVersion is the CLI's own version string and may be empty for dev builds; it is forwarded to the validator container in the AICR_CLI_VERSION environment variable either way. This lets the validator resolve images it references outside the catalog (e.g. the AIPerf benchmark image used by inference-perf) with the same rewriting rules as catalog.Load. cliCommit is the git commit SHA, forwarded in AICR_CLI_COMMIT so that dev builds can resolve image tags by SHA.
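As a rough illustration of the forwarding described above, the hypothetical helper below builds the two environment variables. EnvVar mirrors only the Name/Value fields of corev1.EnvVar so the sketch compiles without client-go, and treating an empty cliCommit the same way as an empty cliVersion is an assumption.

```go
package main

import "fmt"

// EnvVar mirrors the Name/Value fields of corev1.EnvVar so this sketch
// compiles with the standard library only.
type EnvVar struct {
	Name  string
	Value string
}

// validatorEnv is a HYPOTHETICAL helper that builds the variables forwarded
// to the validator container. cliVersion is passed through even when empty
// (dev builds); cliCommit is treated the same way here by assumption.
func validatorEnv(cliVersion, cliCommit string) []EnvVar {
	return []EnvVar{
		{Name: "AICR_CLI_VERSION", Value: cliVersion},
		{Name: "AICR_CLI_COMMIT", Value: cliCommit},
	}
}

func main() {
	// A dev build: no version string, but a commit SHA is available.
	for _, e := range validatorEnv("", "abc1234") {
		fmt.Printf("%s=%q\n", e.Name, e.Value)
	}
}
```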
func (*Deployer) CleanupJob
func (d *Deployer) CleanupJob(ctx context.Context) error
CleanupJob deletes the validator Job with foreground propagation, so the Job object is not removed until its pods have been deleted.
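func (*Deployer) DeployJob
func (d *Deployer) DeployJob(ctx context.Context) error
DeployJob applies the validator Job using server-side apply. Because an apply request targets a named object, a unique name is generated client-side before the request and stored in d.jobName.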
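Server-side apply sends a PATCH to a named object, so the name has to be known before the request; a server-generated name (generateName) is not an option. The sketch below shows one way to build such a name client-side. The "<prefix>-<8 hex chars>" format and the uniqueJobName helper are hypothetical, not the package's actual scheme, and the 63-character cap is a conservative choice because the Job name is also copied into label values on its pods.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// maxNameLen caps names at 63 characters, the Kubernetes label-value limit,
// since the Job name is copied into labels on its pods.
const maxNameLen = 63

// uniqueJobName is a HYPOTHETICAL scheme producing "<prefix>-<8 hex chars>".
// The prefix is assumed to be lowercase and DNS-1123 safe already; it is
// truncated so the full name never exceeds maxNameLen.
func uniqueJobName(prefix string) (string, error) {
	buf := make([]byte, 4)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generating name suffix: %w", err)
	}
	suffix := hex.EncodeToString(buf) // 8 lowercase hex characters
	if limit := maxNameLen - len(suffix) - 1; len(prefix) > limit {
		prefix = prefix[:limit]
	}
	return prefix + "-" + suffix, nil
}

func main() {
	name, err := uniqueJobName("aicr-validator-example")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

A random suffix keeps repeated runs of the same catalog entry from colliding, while the truncation keeps long entry names valid.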
func (*Deployer) ExtractResult
func (d *Deployer) ExtractResult(ctx context.Context) *ctrf.ValidatorResult
ExtractResult reads the exit code, termination message, and stdout from a completed validator pod. It returns a ValidatorResult regardless of how the container terminated; the caller maps the result to a CTRF status.
Call it only after WaitForCompletion has returned nil, i.e. once the Job is in a terminal state (Complete or Failed).
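The caller-side mapping from an exit code to a CTRF status might look like the sketch below. Treating exit code 0 as a pass follows the usual convention, but the skip code (exitCodeSkip = 2) is a hypothetical placeholder and ctrfStatus is an illustrative helper, not part of the package. The strings are CTRF's standard status values.

```go
package main

import "fmt"

// exitCodeSkip is a HYPOTHETICAL exit code meaning "validator skipped";
// substitute the value defined by the real validator contract.
const exitCodeSkip int32 = 2

// ctrfStatus maps a container exit code (int32, as in Kubernetes container
// status) to a CTRF status string.
func ctrfStatus(exitCode int32) string {
	switch exitCode {
	case 0:
		return "passed"
	case exitCodeSkip:
		return "skipped"
	default:
		return "failed"
	}
}

func main() {
	for _, code := range []int32{0, exitCodeSkip, 1, 137} {
		fmt.Printf("exit %d -> %s\n", code, ctrfStatus(code))
	}
}
```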
func (*Deployer) HandleTimeout
func (d *Deployer) HandleTimeout(ctx context.Context) *ctrf.ValidatorResult
HandleTimeout extracts whatever result is available after the orchestrator's wait has timed out. It uses a fresh context because the parent context may already be canceled.
func (*Deployer) JobName
func (d *Deployer) JobName() string
JobName returns the Job name that DeployJob generated client-side. It is empty until DeployJob has been called.
func (*Deployer) WaitForCompletion
func (d *Deployer) WaitForCompletion(ctx context.Context, timeout time.Duration) error
WaitForCompletion watches the Job until it reaches a terminal state (Complete or Failed) and returns nil in both cases; the caller uses ExtractResult to decide pass, fail, or skip from the exit code.
It returns an error only for infrastructure failures (watch error, timeout). A failed Job (non-zero exit) is NOT an error return. Classifying the outcome is the responsibility of this validator orchestrator, not of the shared pod.WaitForJobTerminal helper, which intentionally treats both Complete and Failed Jobs as legitimate completions and leaves classification to the caller.
func (*Deployer) WaitForPodTermination
func (d *Deployer) WaitForPodTermination(ctx context.Context) error
WaitForPodTermination watches the Job's pod until it reaches a terminal state, so that RBAC cleanup does not race with in-progress pod operations.
Returns the underlying error from pod.WaitForTermination so callers can decide log severity. A nil error means the pod is gone or terminal; a non-nil error means the wait was abandoned (timeout, watch failure, or repeated watch closures) and the cleanup may race with an in-progress pod.