validator

package
v0.7.6

Warning: This package is not in the latest version of its module.
Published: Feb 21, 2026 License: Apache-2.0 Imports: 28 Imported by: 0

README

Validator Package

The validator package provides a comprehensive validation framework for GPU-accelerated Kubernetes clusters. It validates cluster state against recipe specifications across multiple phases using a Job-based execution model.

Quick Start

import (
    "context"
    "fmt"
    "log"

    "github.com/NVIDIA/aicr/pkg/recipe"
    "github.com/NVIDIA/aicr/pkg/snapshotter"
    "github.com/NVIDIA/aicr/pkg/validator"
)

// Load the recipe and snapshot
rcp := recipe.Load("recipe.yaml")
snap := snapshotter.Load("snapshot.yaml")

// Create validator
v := validator.New(validator.WithKubeconfig("/path/to/kubeconfig"))

// Validate a specific phase
result, err := v.ValidatePhase(context.Background(), "deployment", rcp, snap)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Status: %s, Passed: %d, Failed: %d\n",
    result.Status, result.Summary.Passed, result.Summary.Failed)

Architecture

Validation Phases
Phase        Execution                           Data Source              Purpose
-----        ---------                           -----------              -------
Readiness    Constraints inline, checks in Jobs  Snapshot only            Validate prerequisites before deployment
Deployment   All in Jobs                         Snapshot + live cluster  Verify deployed resources
Performance  All in Jobs                         Snapshot + live cluster  Measure system performance
Conformance  All in Jobs                         Snapshot + live cluster  Validate API conformance
Execution Model
Recipe Definition
    ↓
┌─────────────────────────────────────────────────────┐
│ Readiness Phase                                     │
│ • Constraints: Evaluated inline (snapshot)          │
│ • Checks: Run in Jobs (GPU detection, kernel, OS)   │
└─────────────────────────────────────────────────────┘
    ↓ (if passed)
┌─────────────────────────────────────────────────────┐
│ Deployment Phase                                    │
│ • Constraints: Run in Jobs (operator versions)      │
│ • Checks: Run in Jobs (operator health, resources)  │
└─────────────────────────────────────────────────────┘
    ↓ (if passed)
┌─────────────────────────────────────────────────────┐
│ Performance Phase                                   │
│ • Constraints: Run in Jobs (bandwidth thresholds)   │
│ • Checks: Run in Jobs (NCCL tests, fabric health)   │
└─────────────────────────────────────────────────────┘
    ↓ (if passed)
┌─────────────────────────────────────────────────────┐
│ Conformance Phase                                   │
│ • Constraints: Run in Jobs (API versions)           │
│ • Checks: Run in Jobs (API conformance, workloads)  │
└─────────────────────────────────────────────────────┘
    ↓
Validation Results
Job-Based Execution

All checks run inside Kubernetes Jobs for:

  • Isolation: Proper RBAC and resource limits
  • Observability: Jobs visible in kubectl get jobs
  • Reproducibility: Consistent execution environment
  • Flexibility: Node affinity for GPU tests
Validator (CLI/API)
    ↓
Agent Deployer
    ├─► RBAC (ServiceAccount, Role, RoleBinding)
    ├─► ConfigMaps (snapshot.yaml, recipe.yaml, validation-result.yaml)
    └─► Job
         ├─► Executes: go test -json (all tests in phase)
         ├─► Test wrapper loads ValidationContext
         ├─► Check functions run with snapshot + K8s client
         └─► Results output to logs (JSON format)
              └─► Validator parses logs and updates ValidationResult ConfigMap
Validator Image

Validation Jobs require a dedicated image containing the Go toolchain to run tests in-cluster.

Why a Separate Image?

  • Main aicr image (built with Ko): Contains only the compiled binary, no Go toolchain
  • Validator image: Contains Go toolchain + source code to run go test commands

Building the Validator Image:

# Local development (with local registry)
make image-validator IMAGE_REGISTRY=localhost:5001 IMAGE_TAG=latest

# Production release (published to GHCR)
# Automatically built by goreleaser on git tags
docker pull ghcr.io/nvidia/aicr-validator:latest
docker pull ghcr.io/nvidia/aicr-validator:v0.4.0

Image Configuration:

// Default image (overridable)
v := validator.New(
    validator.WithImage("ghcr.io/nvidia/aicr-validator:latest"),
)

# Or via CLI
aicr validate --image localhost:5001/aicr-validator:latest \
  -r recipe.yaml -s snapshot.yaml

# Or via environment variable (for CI)
export AICR_VALIDATOR_IMAGE=localhost:5001/aicr-validator:local
aicr validate -r recipe.yaml -s snapshot.yaml

CI/CD:

  • E2E tests build validator image from current source code
  • Release pipeline publishes to ghcr.io/nvidia/aicr-validator
  • Multi-platform support (linux/amd64, linux/arm64)
  • SLSA attestation for supply chain security

Test Wrapper Infrastructure:

Checks execute via Go's standard test framework:

// Check function (registered in init())
func CheckGPUHardwareDetection(ctx *checks.ValidationContext) error {
    // Access snapshot data and K8s API
    for _, m := range ctx.Snapshot.Measurements {
        if m.Type == measurement.TypeGPU { /* validate */ }
    }
    return nil
}

// Test wrapper (enables Job execution)
func TestGPUHardwareDetection(t *testing.T) {
    runner, err := checks.NewTestRunner(t)  // Loads context from Job env
    if err != nil {
        t.Skipf("Skipping (not in Kubernetes): %v", err)
        return
    }
    runner.RunCheck("gpu-hardware-detection")  // Executes check
}

The test wrapper pattern enables:

  • ✅ Standard Go testing infrastructure (go test)
  • ✅ Automatic context loading (snapshot, K8s client)
  • ✅ Graceful skipping during local development
  • ✅ JSON test output for result parsing

See: checks/README.md for complete guide, examples, and troubleshooting.

Validation Run Management (RunID)

Each validation run is assigned a unique RunID for resource isolation and resumability:

RunID Format: YYYYMMDD-HHMMSS-XXXXXXXXXXXXXXXX (e.g., 20260206-140523-a3f9b2c1e7d04a68)

  • Timestamp: Date and time when validation started
  • Random suffix: 16 hex characters for uniqueness
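An identifier of this shape can be produced with the standard library alone. The `newRunID` helper below is a hypothetical illustration of the documented format, not the package's actual generator:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"
)

// newRunID builds an identifier in the documented
// YYYYMMDD-HHMMSS-XXXXXXXXXXXXXXXX form: a UTC timestamp plus
// 8 random bytes rendered as 16 hex characters.
func newRunID() string {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		panic(err) // crypto/rand failure is not recoverable here
	}
	return fmt.Sprintf("%s-%s",
		time.Now().UTC().Format("20060102-150405"),
		hex.EncodeToString(buf))
}

func main() {
	id := newRunID()
	fmt.Println(id, len(id)) // random each run; total length is always 32
}
```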

Resource Naming: All resources created during a validation run include the RunID:

  • Input ConfigMaps: aicr-snapshot-{runID}, aicr-recipe-{runID} (shared by all phases)
  • Output ConfigMap: aicr-validation-result-{runID} (progressively updated)
  • Jobs: aicr-{runID}-readiness, aicr-{runID}-deployment, etc. (one per phase)

Benefits:

  • Concurrent Validations: Multiple validation runs can execute simultaneously without conflicts
  • Resumability: Failed validations can be resumed from the last successful phase (future feature)
  • Traceability: All resources for a run are grouped by RunID label
  • Cleanup: Resources can be cleaned up per-run using RunID labels

CLI Output:

$ aicr validate --phase all --recipe recipe.yaml --snapshot snapshot.yaml
Starting validation run: 20260206-140523-a3f9b2c1e7d04a68
...

Querying Validation Runs:

# List all validation runs
kubectl get configmaps -n aicr-validation \
  -l app.kubernetes.io/component=validation

# List resources for specific run
kubectl get jobs,configmaps -n aicr-validation \
  -l aicr.nvidia.com/run-id=20260206-140523-a3f9b2c1e7d04a68

# View run details
kubectl get configmap -n aicr-validation \
  -l aicr.nvidia.com/run-id=20260206-140523-a3f9b2c1e7d04a68 \
  -o yaml

Cleanup by RunID:

# Cleanup specific validation run
kubectl delete jobs,configmaps -n aicr-validation \
  -l aicr.nvidia.com/run-id=20260206-140523-a3f9b2c1e7d04a68

# Cleanup all validation runs (caution!)
kubectl delete jobs,configmaps -n aicr-validation \
  -l app.kubernetes.io/component=validation
ValidationResult ConfigMap (Resumability)

The validator creates a single ValidationResult ConfigMap per validation run that is progressively updated:

ConfigMap: aicr-validation-result-{runID}

Lifecycle:

  1. Creation: Created at validation start with empty structure
  2. Progressive Updates: Updated after each phase completes with results
  3. Resume: Read by --resume flag to continue from failed phase
  4. Cleanup: Automatically deleted after validation completes

Resume Functionality:

# New validation (auto-generates RunID)
aicr validate --phase all --recipe recipe.yaml --snapshot snapshot.yaml
# Output: Starting validation run: 20260206-140523-a3f9b2c1e7d04a68

# Validation fails at deployment phase (readiness passed)
# Resume from failed phase
aicr validate --phase all --resume 20260206-140523-a3f9b2c1e7d04a68
# Reads existing results, skips readiness (passed), continues from deployment

Query Validation State:

# View current validation progress
kubectl get cm aicr-validation-result-20260206-140523-a3f9b2c1e7d04a68 -o yaml

# Check which phases passed/failed
kubectl get cm aicr-validation-result-20260206-140523-a3f9b2c1e7d04a68 \
  -o jsonpath='{.data.result\.yaml}' | yq '.phases'

Implementation:

  • createValidationResultConfigMap() - Creates empty structure
  • updateValidationResultConfigMap() - Updates after each phase
  • readValidationResultConfigMap() - Reads for resume
  • determineStartPhase() - Finds where to resume from
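The resume logic reduces to walking the canonical phase order and stopping at the first phase that has not already passed. A simplified sketch of that idea (the `startPhase` helper is illustrative, not the package's `determineStartPhase`):

```go
package main

import "fmt"

// startPhase walks the canonical phase order and returns the first
// phase whose recorded status is not "pass" — i.e., where a resumed
// validation run should continue. Returns "" if everything passed.
func startPhase(order []string, status map[string]string) string {
	for _, p := range order {
		if status[p] != "pass" {
			return p
		}
	}
	return ""
}

func main() {
	order := []string{"readiness", "deployment", "performance", "conformance"}
	// Previous run: readiness passed, deployment failed.
	prev := map[string]string{"readiness": "pass", "deployment": "fail"}
	fmt.Println(startPhase(order, prev)) // deployment
}
```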
ConfigMap Management

The validator automatically manages ConfigMaps for snapshot and recipe data:

Lifecycle:

  1. Creation: ConfigMaps are created once per validation run before any phases execute
    • aicr-snapshot-{runID}: Contains the cluster snapshot (YAML)
    • aicr-recipe-{runID}: Contains the recipe configuration (YAML)
  2. Reuse: All phases in a validation run share the same ConfigMaps
    • Readiness phase uses snapshot-{runID} and recipe-{runID}
    • Deployment phase uses snapshot-{runID} and recipe-{runID}
    • Performance phase uses snapshot-{runID} and recipe-{runID}
    • Conformance phase uses snapshot-{runID} and recipe-{runID}
  3. Mounting: Jobs mount these ConfigMaps as volumes at:
    • /data/snapshot/snapshot.yaml
    • /data/recipe/recipe.yaml
  4. Cleanup: ConfigMaps are automatically deleted once after all phases complete

Implementation Details:

  • ConfigMaps are created once per validation run, not per phase (efficient)
  • ConfigMaps are uniquely named per validation run using RunID
  • Each ConfigMap includes labels for querying and cleanup:
    • aicr.nvidia.com/run-id: The validation run identifier
    • aicr.nvidia.com/created-at: Timestamp (format: YYYYMMDD-HHMMSS)
    • aicr.nvidia.com/data-type: snapshot or recipe
  • Cleanup happens in defer blocks to ensure removal even on errors
  • Test wrappers load data from mounted ConfigMaps using LoadValidationContext()

Security Considerations:

  • ConfigMaps may contain sensitive cluster information
  • Access is restricted by Kubernetes RBAC
  • ConfigMaps are namespace-scoped (default: aicr-validation)
  • RunID-based naming prevents conflicts between concurrent validations

Recipe Format

Constraints and Checks

Constraints - Expression-based validations:

validation:
  deployment:
    constraints:
      - name: Deployment.gpu-operator.version
        value: ">= v24.6.0"
      - name: Deployment.device-plugin.replicas
        value: ">= 1"

Checks - Named validation tests:

# expected-resources check requires expectedResources on componentRefs
componentRefs:
  - name: gpu-operator
    type: Helm
    expectedResources:
      - kind: Deployment
        name: gpu-operator
        namespace: gpu-operator

validation:
  deployment:
    checks:
      - operator-health
      - expected-resources
Multi-Phase Recipe Example
# expectedResources are declared on componentRefs (used by expected-resources check)
componentRefs:
  - name: gpu-operator
    type: Helm
    expectedResources:
      - kind: Deployment
        name: gpu-operator
        namespace: gpu-operator
      - kind: DaemonSet
        name: nvidia-driver-daemonset
        namespace: gpu-operator

validation:
  # Phase 1: Readiness (pre-deployment validation)
  readiness:
    constraints:
      - name: GPU.count
        value: ">= 8"
      - name: OS.version
        value: "== ubuntu"
      - name: Kernel.version
        value: ">= 5.15.0"
    checks:
      - gpu-hardware-detection
      - kernel-parameters
      - os-prerequisites

  # Phase 2: Deployment (verify deployed resources)
  deployment:
    constraints:
      - name: Deployment.gpu-operator.version
        value: ">= v24.6.0"
    checks:
      - operator-health
      - expected-resources

  # Phase 3: Performance (measure system performance)
  performance:
    constraints:
      - name: Performance.nccl.bandwidth
        value: ">= 200"  # GB/s
    checks:
      - nccl-bandwidth-test
      - fabric-health

  # Phase 4: Conformance (validate compatibility)
  conformance:
    checks:
      - ai-workload-validation

Result Format

type ValidationResult struct {
    Phase     string              // "readiness", "deployment", etc.
    Status    ValidationStatus    // "pass", "fail", "skipped"
    StartTime time.Time
    EndTime   time.Time
    Duration  time.Duration

    // Constraints evaluated
    Constraints []ConstraintValidation

    // Checks executed
    Checks []CheckResult

    // Summary statistics
    Summary ValidationSummary
}

type ConstraintValidation struct {
    Name     string  // e.g., "Deployment.gpu-operator.version"
    Expected string  // e.g., ">= v24.6.0"
    Actual   string  // e.g., "v24.6.0"
    Passed   bool
    Message  string
}

type CheckResult struct {
    Name     string  // e.g., "operator-health"
    Status   ValidationStatus
    Message  string
    Duration time.Duration
}

CLI Usage

# Validate all phases
aicr validate --phase all \
  --recipe recipe.yaml \
  --snapshot snapshot.yaml

# Validate specific phase
aicr validate --phase deployment \
  --recipe recipe.yaml \
  --snapshot snapshot.yaml

# Output formats
aicr validate --phase all -o json
aicr validate --phase all -o yaml
aicr validate --phase all -o table

Documentation

Core Documentation
Check Development
Phase-Specific Guides

Key Concepts

Checks vs Constraints
Aspect         Check                  Constraint
------         -----                  ----------
Definition     Named validation test  Expression-based validation
Returns        Pass/fail (error)      Actual value + pass/fail
Registration   RegisterCheck()        RegisterConstraintValidator()
Recipe syntax  checks: [name]         constraints: [{name, value}]
Example        operator-health        Deployment.gpu-operator.version: ">= v24.6.0"
ValidationContext

Validation functions receive a context with:

type ValidationContext struct {
    Context    context.Context        // Cancellation and timeouts
    Snapshot   *snapshotter.Snapshot  // Captured cluster state
    Clientset  kubernetes.Interface   // Live Kubernetes API access
    RecipeData map[string]interface{} // Recipe metadata
}

  • Snapshot: Hardware inventory, OS info, pre-capture state
  • Clientset: Query live cluster (deployments, pods, etc.)
  • RecipeData: Access recipe configuration
Phase Dependencies

Phases execute sequentially with early exit:

  1. Readiness must pass before Deployment
  2. Deployment must pass before Performance
  3. Performance must pass before Conformance

If any phase fails, subsequent phases are skipped.
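The early-exit behavior above can be sketched as a simple loop (illustrative only; `runPhases` is not part of the package API):

```go
package main

import "fmt"

// runPhases executes phases in order; once one fails, all remaining
// phases are marked "skipped" rather than executed.
func runPhases(order []string, run func(string) bool) map[string]string {
	status := make(map[string]string)
	failed := false
	for _, phase := range order {
		if failed {
			status[phase] = "skipped"
			continue
		}
		if run(phase) {
			status[phase] = "pass"
		} else {
			status[phase] = "fail"
			failed = true
		}
	}
	return status
}

func main() {
	order := []string{"readiness", "deployment", "performance", "conformance"}
	// In this example run, the deployment phase fails.
	got := runPhases(order, func(p string) bool { return p != "deployment" })
	fmt.Println(got["readiness"], got["deployment"], got["performance"], got["conformance"])
	// pass fail skipped skipped
}
```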

Testing

Unit Testing Validators
import (
    "context"
    "testing"

    "github.com/NVIDIA/aicr/pkg/recipe"
    "github.com/NVIDIA/aicr/pkg/validator/checks"
    "github.com/stretchr/testify/assert"
    "k8s.io/client-go/kubernetes/fake"
)

func TestValidateOperatorVersion(t *testing.T) {
    // Create a fake Kubernetes client
    deployment := createTestDeployment("v24.6.0")
    clientset := fake.NewSimpleClientset(deployment)

    ctx := &checks.ValidationContext{
        Context:   context.Background(),
        Clientset: clientset,
    }

    constraint := recipe.Constraint{
        Name:  "Deployment.gpu-operator.version",
        Value: ">= v24.6.0",
    }

    actual, passed, err := ValidateGPUOperatorVersion(ctx, constraint)
    assert.NoError(t, err)
    assert.True(t, passed)
    assert.Equal(t, "v24.6.0", actual)
}
Integration Testing
# Run all validator tests
go test -v ./pkg/validator/...

# Run with race detector
go test -v -race ./pkg/validator/...

# Run specific phase tests
go test -v ./pkg/validator/checks/deployment/...

Design Decisions

Why Job-Based Execution?
  1. Cluster Context: Checks run with proper RBAC inside the cluster
  2. Resource Control: Jobs can have CPU/memory limits
  3. Node Scheduling: Performance tests can target GPU nodes
  4. Observability: Jobs appear in kubectl get jobs
  5. Isolation: Each check is independent
Why Do Constraint Validators Run in Jobs?

Deployment, performance, and conformance constraints need live cluster access:

  • Query deployed operator versions
  • Measure network bandwidth
  • Check API conformance

Only readiness constraints can be evaluated inline, because they need only snapshot data.

Why ConfigMaps for Results?

Single ValidationResult ConfigMap per validation run:

  • ConfigMap: aicr-validation-result-{runID}
  • Progressively updated as each phase completes
  • Enables resumability (--resume flag)
  • Persists even if CLI crashes or disconnects

Benefits:

  1. Resumability: Continue from failed phase using --resume {runID}
  2. Observability: Query current validation state with kubectl get cm
  3. Persistence: Results survive Job deletion and CLI disconnection
  4. Progressive Updates: Real-time visibility into validation progress
  5. Accessibility: Easy to retrieve and inspect with kubectl

Example:

# Check validation progress
kubectl get cm aicr-validation-result-20260206-140523-a3f9b2c1e7d04a68 -o yaml

# Resume from failure
aicr validate --resume 20260206-140523-a3f9b2c1e7d04a68

Examples

Example 1: Validate GPU Operator Deployment
validation:
  deployment:
    constraints:
      - name: Deployment.gpu-operator.version
        value: ">= v24.6.0"
    checks:
      - operator-health
Example 2: Performance Validation
validation:
  performance:
    constraints:
      - name: Performance.nccl.bandwidth
        value: ">= 200"
    checks:
      - nccl-bandwidth-test
      - fabric-health
Example 3: Full Multi-Phase Validation
validation:
  readiness:
    constraints:
      - name: GPU.count
        value: ">= 8"
    checks:
      - gpu-hardware-detection

  deployment:
    constraints:
      - name: Deployment.gpu-operator.version
        value: ">= v24.6.0"
    checks:
      - operator-health

  performance:
    checks:
      - nccl-bandwidth-test

  conformance:
    checks:
      - ai-workload-validation

API Reference

Main API
// Create validator
validator := validator.New(
    validator.WithKubeconfig(kubeconfigPath),
    validator.WithTimeout(5 * time.Minute),
)

// Validate specific phase
result, err := validator.ValidatePhase(ctx, "deployment", recipe, snapshot)

// Validate all phases
results, err := validator.ValidateAll(ctx, recipe, snapshot)

// Validate with phase filter
results, err := validator.ValidatePhases(ctx,
    []string{"readiness", "deployment"}, recipe, snapshot)
Registry API
// Get registered check
check, ok := checks.GetCheck("operator-health")

// Get registered constraint validator
validator, ok := checks.GetConstraintValidator("Deployment.gpu-operator.version")

// List all checks for a phase
checkList := checks.ListChecks("deployment")

// List all constraint validators
validators := checks.ListConstraintValidators()

Troubleshooting

See Troubleshooting Guide for:

  • Common errors and solutions
  • RBAC permission issues
  • Job timeout debugging
  • How to view Job logs
  • Test mode vs production mode

Contributing

To add new validation checks or constraint validators:

  1. Read How-To Guide for step-by-step instructions
  2. Follow existing patterns in pkg/validator/checks/
  3. Write comprehensive tests
  4. Update documentation

References

Documentation

Overview

Package validator provides recipe constraint validation against system snapshots.

The validator package evaluates recipe constraints against actual system measurements captured in snapshots. It supports version comparison operators and exact string matching to determine if a cluster meets the requirements specified in a recipe.

Constraint Format

Constraints use fully qualified measurement paths in the format: {Type}.{Subtype}.{Key}

Examples:

K8s.server.version         -> Kubernetes server version
OS.release.ID              -> Operating system identifier (e.g., "ubuntu")
OS.release.VERSION_ID      -> OS version (e.g., "24.04")
OS.sysctl./proc/sys/kernel/osrelease -> Kernel version
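Because the key portion may itself contain dots (as in the sysctl example above), a path cannot be split naively on every dot; splitting into at most three segments preserves the key intact. A standalone sketch of this parsing rule, simplified relative to `ParseConstraintPath`:

```go
package main

import (
	"fmt"
	"strings"
)

// parsePath splits "{Type}.{Subtype}.{Key}", where the key may itself
// contain dots. SplitN with n=3 stops after the second dot, so
// everything beyond it stays in the key. (Simplified sketch.)
func parsePath(path string) (typ, subtype, key string, err error) {
	parts := strings.SplitN(path, ".", 3)
	if len(parts) != 3 {
		return "", "", "", fmt.Errorf("invalid constraint path %q", path)
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	typ, sub, key, _ := parsePath("OS.sysctl./proc/sys/kernel/osrelease")
	fmt.Println(typ, sub, key) // OS sysctl /proc/sys/kernel/osrelease
}
```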

Supported Operators

The following comparison operators are supported in constraint values:

  • ">=" - Greater than or equal (version comparison)
  • "<=" - Less than or equal (version comparison)
  • ">" - Greater than (version comparison)
  • "<" - Less than (version comparison)
  • "==" - Exact match (string or version)
  • "!=" - Not equal (string or version)
  • (no operator) - Exact string match
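One subtlety of tokenizing these expressions is that two-character operators must be tested before their one-character prefixes (`>=` before `>`, `<=` before `<`). A simplified standalone sketch of operator detection (not the package's `ParseConstraintExpression`, which additionally decides whether the comparison is a version comparison):

```go
package main

import (
	"fmt"
	"strings"
)

// splitConstraint separates a leading comparison operator from its
// value. Two-character operators are checked first so that ">=" is not
// mistaken for ">" followed by "= ...".
func splitConstraint(expr string) (op, value string) {
	for _, candidate := range []string{">=", "<=", "==", "!=", ">", "<"} {
		if strings.HasPrefix(expr, candidate) {
			return candidate, strings.TrimSpace(strings.TrimPrefix(expr, candidate))
		}
	}
	return "", strings.TrimSpace(expr) // no operator: exact string match
}

func main() {
	for _, expr := range []string{">= 1.32.4", "ubuntu", "== 24.04"} {
		op, val := splitConstraint(expr)
		fmt.Printf("%q -> op=%q value=%q\n", expr, op, val)
	}
}
```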

Usage

Basic validation:

v := validator.New()
result, err := v.Validate(ctx, recipe, snapshot)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Status: %s\n", result.Summary.Status)
for _, r := range result.Results {
    fmt.Printf("  %s: expected %q, got %q - %v\n",
        r.Name, r.Expected, r.Actual, r.Status)
}

Result Structure

ValidationResult contains:

  • Summary: Overall pass/fail counts and status
  • Results: Per-constraint validation results with expected/actual values

Error Handling

Constraints that cannot be evaluated (e.g., path not found in snapshot) are marked as "skipped" with appropriate warning messages, allowing partial validation results to be returned.

Constants

const (
	DefaultReadinessTimeout   = defaults.ValidateReadinessTimeout
	DefaultDeploymentTimeout  = defaults.ValidateDeploymentTimeout
	DefaultPerformanceTimeout = defaults.ValidatePerformanceTimeout
	DefaultConformanceTimeout = defaults.ValidateConformanceTimeout
)

Phase timeout aliases — defined in pkg/defaults/timeouts.go.

const (
	// APIVersion is the API version for validation results.
	APIVersion = "aicr.nvidia.com/v1alpha1"
)

Variables

PhaseOrder defines the canonical execution order for validation phases. Readiness and deployment must run before performance or conformance.

Functions

This section is empty.

Types

type CheckResult

type CheckResult struct {
	// Name is the check identifier.
	Name string `json:"name" yaml:"name"`

	// Status is the check outcome.
	Status ValidationStatus `json:"status" yaml:"status"`

	// Reason explains why the check failed or was skipped.
	Reason string `json:"reason,omitempty" yaml:"reason,omitempty"`

	// Remediation provides actionable guidance for fixing failures.
	Remediation string `json:"remediation,omitempty" yaml:"remediation,omitempty"`
}

CheckResult represents the result of a named validation check.

type ConstraintEvalResult

type ConstraintEvalResult struct {
	// Passed indicates if the constraint was satisfied.
	Passed bool

	// Actual is the actual value extracted from the snapshot.
	Actual string

	// Error contains the error if evaluation failed (e.g., value not found).
	Error error
}

ConstraintEvalResult represents the result of evaluating a single constraint.

func EvaluateConstraint

func EvaluateConstraint(constraint recipe.Constraint, snap *snapshotter.Snapshot) ConstraintEvalResult

EvaluateConstraint evaluates a single constraint against a snapshot. This is a standalone function that can be used by other packages without creating a full Validator instance. Used by the recipe package to filter overlays based on constraint evaluation during snapshot-based recipe generation.

type ConstraintPath

type ConstraintPath struct {
	Type    measurement.Type
	Subtype string
	Key     string
}

ConstraintPath represents a parsed fully qualified constraint path. Format: {Type}.{Subtype}.{Key} Example: "K8s.server.version" -> Type="K8s", Subtype="server", Key="version"

func ParseConstraintPath

func ParseConstraintPath(path string) (*ConstraintPath, error)

ParseConstraintPath parses a fully qualified constraint path. The path format is: {Type}.{Subtype}.{Key} The key portion may contain dots (e.g., "/proc/sys/kernel/osrelease").

func (*ConstraintPath) ExtractValue

func (cp *ConstraintPath) ExtractValue(snap *snapshotter.Snapshot) (string, error)

ExtractValue extracts the value at this path from a snapshot. Returns the value as a string, or an error if the path doesn't exist.

func (*ConstraintPath) String

func (cp *ConstraintPath) String() string

String returns the fully qualified path string.

type ConstraintStatus

type ConstraintStatus string

ConstraintStatus represents the outcome of evaluating a single constraint.

const (
	// ConstraintStatusPassed indicates the constraint was satisfied.
	ConstraintStatusPassed ConstraintStatus = "passed"

	// ConstraintStatusFailed indicates the constraint was not satisfied.
	ConstraintStatusFailed ConstraintStatus = "failed"

	// ConstraintStatusSkipped indicates the constraint couldn't be evaluated.
	ConstraintStatusSkipped ConstraintStatus = "skipped"
)

type ConstraintValidation

type ConstraintValidation struct {
	// Name is the fully qualified constraint name (e.g., "K8s.server.version").
	Name string `json:"name" yaml:"name"`

	// Expected is the constraint expression from the recipe (e.g., ">= 1.32.4").
	Expected string `json:"expected" yaml:"expected"`

	// Actual is the value found in the snapshot (e.g., "v1.33.5-eks-3025e55").
	Actual string `json:"actual" yaml:"actual"`

	// Status is the outcome of this constraint evaluation.
	Status ConstraintStatus `json:"status" yaml:"status"`

	// Message provides additional context, especially for failures or skipped constraints.
	Message string `json:"message,omitempty" yaml:"message,omitempty"`
}

ConstraintValidation represents the result of evaluating a single constraint.

type Operator

type Operator string

Operator represents a comparison operator in constraint expressions.

const (
	// OperatorGTE represents ">=" (greater than or equal).
	OperatorGTE Operator = ">="

	// OperatorLTE represents "<=" (less than or equal).
	OperatorLTE Operator = "<="

	// OperatorGT represents ">" (greater than).
	OperatorGT Operator = ">"

	// OperatorLT represents "<" (less than).
	OperatorLT Operator = "<"

	// OperatorEQ represents "==" (exact match).
	OperatorEQ Operator = "=="

	// OperatorNE represents "!=" (not equal).
	OperatorNE Operator = "!="

	// OperatorExact represents no operator (exact string match).
	OperatorExact Operator = ""
)

type Option

type Option func(*Validator)

Option is a functional option for configuring Validator instances.

func WithCleanup

func WithCleanup(cleanup bool) Option

WithCleanup returns an Option that controls cleanup of validation resources. When false, Jobs, ConfigMaps, and RBAC resources are kept for debugging.

func WithImage

func WithImage(image string) Option

WithImage returns an Option that sets the container image for validation Jobs.

func WithImagePullSecrets

func WithImagePullSecrets(secrets []string) Option

WithImagePullSecrets returns an Option that sets image pull secrets for validation Jobs.

func WithNamespace

func WithNamespace(namespace string) Option

WithNamespace returns an Option that sets the namespace for validation jobs.

func WithNoCluster

func WithNoCluster(noCluster bool) Option

WithNoCluster returns an Option that controls cluster access. When set to true, validation runs in dry-run mode without connecting to cluster.

func WithRunID

func WithRunID(runID string) Option

WithRunID returns an Option that sets the RunID for this validation run. Used when resuming a previous validation run.

func WithVersion

func WithVersion(version string) Option

WithVersion returns an Option that sets the Validator version string.

type ParsedConstraint

type ParsedConstraint struct {
	// Operator is the comparison operator (or empty for exact match).
	Operator Operator

	// Value is the expected value after the operator.
	Value string

	// IsVersionComparison indicates if this should be treated as a version comparison.
	IsVersionComparison bool
}

ParsedConstraint represents a parsed constraint expression.

func ParseConstraintExpression

func ParseConstraintExpression(expr string) (*ParsedConstraint, error)

ParseConstraintExpression parses a constraint value expression. Examples:

  • ">= 1.32.4" -> {Operator: ">=", Value: "1.32.4", IsVersionComparison: true}
  • "ubuntu" -> {Operator: "", Value: "ubuntu", IsVersionComparison: false}
  • "== 24.04" -> {Operator: "==", Value: "24.04", IsVersionComparison: false}

func (*ParsedConstraint) Evaluate

func (pc *ParsedConstraint) Evaluate(actual string) (bool, error)

Evaluate evaluates the constraint against an actual value. Returns true if the constraint is satisfied, false otherwise.
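For version comparisons, the core idea is a segment-by-segment numeric comparison of dotted versions. The sketch below is a simplified stand-in for whatever version library the package actually uses; it ignores pre-release suffixes such as `-eks-3025e55`, which a real version library handles:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compareVersions compares two dotted numeric versions, ignoring a
// leading "v". Returns -1, 0, or 1. Missing segments compare as 0, so
// "24.6" == "24.6.0". Non-numeric segments (pre-release suffixes) are
// not handled by this simplified sketch.
func compareVersions(a, b string) int {
	pa := strings.Split(strings.TrimPrefix(a, "v"), ".")
	pb := strings.Split(strings.TrimPrefix(b, "v"), ".")
	for i := 0; i < len(pa) || i < len(pb); i++ {
		var na, nb int
		if i < len(pa) {
			na, _ = strconv.Atoi(pa[i])
		}
		if i < len(pb) {
			nb, _ = strconv.Atoi(pb[i])
		}
		if na != nb {
			if na < nb {
				return -1
			}
			return 1
		}
	}
	return 0
}

func main() {
	fmt.Println(compareVersions("v1.33.5", "1.32.4")) // 1
	fmt.Println(compareVersions("24.6.0", "v24.6"))   // 0
}
```

Note that plain string comparison would get "1.2" vs "1.10" wrong, which is why each segment is compared numerically.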

func (*ParsedConstraint) String

func (pc *ParsedConstraint) String() string

String returns a string representation of the parsed constraint.

type PhaseResult

type PhaseResult struct {
	// Status is the overall status of this phase.
	Status ValidationStatus `json:"status" yaml:"status"`

	// Constraints contains per-constraint results for this phase.
	Constraints []ConstraintValidation `json:"constraints,omitempty" yaml:"constraints,omitempty"`

	// Checks contains results of named validation checks.
	Checks []CheckResult `json:"checks,omitempty" yaml:"checks,omitempty"`

	// Reason explains why the phase was skipped or failed.
	Reason string `json:"reason,omitempty" yaml:"reason,omitempty"`

	// Duration is how long this phase took to run.
	Duration time.Duration `json:"duration,omitempty" yaml:"duration,omitempty"`
}

PhaseResult represents the result of a single validation phase.

type ValidationPhaseName

type ValidationPhaseName string

ValidationPhaseName represents the name of a validation phase.

const (
	// PhaseReadiness is the readiness validation phase.
	PhaseReadiness ValidationPhaseName = "readiness"

	// PhaseDeployment is the deployment validation phase.
	PhaseDeployment ValidationPhaseName = "deployment"

	// PhasePerformance is the performance validation phase.
	PhasePerformance ValidationPhaseName = "performance"

	// PhaseConformance is the conformance validation phase.
	PhaseConformance ValidationPhaseName = "conformance"

	// PhaseAll runs all phases sequentially.
	PhaseAll ValidationPhaseName = "all"
)

type ValidationResult

type ValidationResult struct {
	header.Header `json:",inline" yaml:",inline"`

	// RunID is a unique identifier for this validation run.
	// Used for resume functionality and correlating resources.
	// Format: YYYYMMDD-HHMMSS-RANDOM (e.g., "20260206-140523-a3f9b2c1e7d04a68")
	RunID string `json:"runID,omitempty" yaml:"runID,omitempty"`

	// RecipeSource is the path/URI of the recipe that was validated.
	RecipeSource string `json:"recipeSource" yaml:"recipeSource"`

	// SnapshotSource is the path/URI of the snapshot used for validation.
	SnapshotSource string `json:"snapshotSource" yaml:"snapshotSource"`

	// Summary contains aggregate validation statistics.
	Summary ValidationSummary `json:"summary" yaml:"summary"`

	// Results contains per-constraint validation details (legacy, for backward compatibility).
	Results []ConstraintValidation `json:"results,omitempty" yaml:"results,omitempty"`

	// Phases contains per-phase validation results (multi-phase validation).
	Phases map[string]*PhaseResult `json:"phases,omitempty" yaml:"phases,omitempty"`
}

ValidationResult represents the complete validation outcome.

func NewValidationResult

func NewValidationResult() *ValidationResult

NewValidationResult creates a new ValidationResult with initialized slices.

type ValidationStatus

type ValidationStatus string

ValidationStatus represents the overall validation outcome.

const (
	// ValidationStatusPass indicates all constraints passed.
	ValidationStatusPass ValidationStatus = "pass"

	// ValidationStatusFail indicates one or more constraints failed.
	ValidationStatusFail ValidationStatus = "fail"

	// ValidationStatusPartial indicates some constraints couldn't be evaluated.
	ValidationStatusPartial ValidationStatus = "partial"

	// ValidationStatusSkipped indicates a phase was skipped (due to dependency failure).
	ValidationStatusSkipped ValidationStatus = "skipped"

	// ValidationStatusWarning indicates warnings but no hard failures.
	ValidationStatusWarning ValidationStatus = "warning"
)

type ValidationSummary

type ValidationSummary struct {
	// Passed is the count of constraints that were satisfied.
	Passed int `json:"passed" yaml:"passed"`

	// Failed is the count of constraints that were not satisfied.
	Failed int `json:"failed" yaml:"failed"`

	// Skipped is the count of constraints that couldn't be evaluated.
	Skipped int `json:"skipped" yaml:"skipped"`

	// Total is the total number of constraints evaluated.
	Total int `json:"total" yaml:"total"`

	// Status is the overall validation status.
	Status ValidationStatus `json:"status" yaml:"status"`

	// Duration is how long the validation took.
	Duration time.Duration `json:"duration" yaml:"duration"`
}

ValidationSummary contains aggregate statistics about the validation.

type Validator

type Validator struct {
	// Version is the validator version (typically the CLI version).
	Version string

	// Namespace is the Kubernetes namespace where validation jobs will run.
	// Defaults to "aicr-validation" if not specified.
	Namespace string

	// Image is the container image to use for validation Jobs.
	// Must include Go toolchain for running tests.
	// Defaults to "ghcr.io/nvidia/aicr-validator:latest".
	Image string

	// RunID is a unique identifier for this validation run.
	// Used to scope all resources (ConfigMaps, Jobs) and enable resumability.
	// Format: YYYYMMDD-HHMMSS-RANDOM (e.g., "20260206-140523-a3f9b2c1e7d04a68")
	RunID string

	// Cleanup controls whether to delete Jobs, ConfigMaps, and RBAC resources after validation.
	// Defaults to true. Set to false to keep resources for debugging.
	Cleanup bool

	// ImagePullSecrets are secret names for pulling images from private registries.
	ImagePullSecrets []string

	// NoCluster controls whether to skip actual cluster operations (dry-run mode).
	// When true, validation runs without connecting to Kubernetes cluster.
	NoCluster bool
}

Validator evaluates recipe constraints against snapshot measurements.

func New

func New(opts ...Option) *Validator

New creates a new Validator with the provided options.

func (*Validator) Validate

func (v *Validator) Validate(ctx context.Context, recipeResult *recipe.RecipeResult, snap *snapshotter.Snapshot) (*ValidationResult, error)

Validate evaluates all constraints from the recipe against the snapshot. Returns a ValidationResult containing per-constraint results and summary.

func (*Validator) ValidatePhase

func (v *Validator) ValidatePhase(
	ctx context.Context,
	phase ValidationPhaseName,
	recipeResult *recipe.RecipeResult,
	snap *snapshotter.Snapshot,
) (*ValidationResult, error)

ValidatePhase runs validation for a specific phase. This is the main entry point for phase-based validation.

func (*Validator) ValidatePhases

func (v *Validator) ValidatePhases(
	ctx context.Context,
	phases []ValidationPhaseName,
	recipeResult *recipe.RecipeResult,
	snap *snapshotter.Snapshot,
) (*ValidationResult, error)

ValidatePhases runs validation for multiple specified phases. If no phases are specified, defaults to readiness phase. If phases includes "all", runs all phases.

