aicr

v0.11.1
Published: Mar 21, 2026 License: Apache-2.0

README

NVIDIA AI Cluster Runtime

AI Cluster Runtime (AICR) makes it easy to stand up GPU-accelerated Kubernetes clusters. It captures known-good combinations of drivers, operators, kernels, and system configurations and publishes them as version-locked recipes — reproducible artifacts for Helm, ArgoCD, and other deployment frameworks.

Why We Built This

Running GPU-accelerated Kubernetes clusters reliably is hard. Small differences in kernel versions, drivers, container runtimes, operators, and Kubernetes releases can cause failures that are difficult to diagnose and expensive to reproduce.

Historically, this knowledge has lived in internal validation pipelines and runbooks. AI Cluster Runtime makes it available to everyone.

Every AICR recipe is:

  • Optimized — Tuned for a specific combination of hardware, cloud, OS, and workload intent.
  • Validated — Passes automated constraint and compatibility checks before publishing.
  • Reproducible — Same inputs produce identical deployments every time.

Quick Start

Install and generate your first recipe in under two minutes:

# Install the CLI (Homebrew)
brew tap NVIDIA/aicr
brew install aicr

# Or use the install script
curl -sfL https://raw.githubusercontent.com/NVIDIA/aicr/main/install | bash -s --


# Capture your cluster's current state
aicr snapshot --output snapshot.yaml

# Generate a validated recipe for your environment
aicr recipe --service aks --accelerator h100 --os ubuntu \
  --intent training --platform kubeflow -o recipe.yaml

# Query a specific hydrated config value
aicr query --service aks --accelerator h100 --os ubuntu --intent training \
  --selector components.gpu-operator.values.driver.version

# Validate the recipe against your cluster
aicr validate --recipe recipe.yaml --snapshot snapshot.yaml

# Render into deployment-ready Helm charts
aicr bundle --recipe recipe.yaml --output ./bundles

# Validate your cluster (deployment, performance, conformance)
aicr validate --recipe recipe.yaml --phase all --output report.json

The bundles/ directory contains per-component Helm charts with values files, checksums, and deployer configs. Deploy with helm install, commit to a GitOps repo, or use the built-in ArgoCD deployer.
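
The checksums in a bundle can be spot-checked before deploying. The following is a minimal Go sketch of that idea, not AICR's own implementation: `HashBytes`, `VerifyBundle`, and the path-to-digest map are hypothetical names, and the layout of the checksum file the bundler writes is not described here. The only detail taken from the project is that bundle checksums use SHA256.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// HashBytes returns the hex-encoded SHA256 digest of data.
func HashBytes(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// VerifyBundle checks every file listed in expected against its digest.
// The expected map (relative path -> hex digest) is a hypothetical stand-in
// for whatever checksum manifest the bundler actually produces.
func VerifyBundle(root string, expected map[string]string) error {
	for rel, want := range expected {
		data, err := os.ReadFile(filepath.Join(root, rel))
		if err != nil {
			return fmt.Errorf("read %s: %w", rel, err)
		}
		if got := HashBytes(data); got != want {
			return fmt.Errorf("checksum mismatch for %s: got %s, want %s", rel, got, want)
		}
	}
	return nil
}
```

Calling `VerifyBundle("./bundles", digests)` with digests parsed from the bundle's checksum file would return nil only when every listed file matches. Reading each file fully into memory is a simplification that suits small values files; larger artifacts would be better streamed through `sha256.New()`.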

See the Installation Guide for manual installation, building from source, and container images.

Features

Feature                 Description
aicr CLI                Single binary. Generate recipes, create bundles, capture snapshots, validate configs.
API Server (aicrd)      REST API with the same capabilities as the CLI. Run in-cluster for CI/CD integration or air-gapped environments.
Snapshot Agent          Kubernetes Job that captures live cluster state (GPU hardware, drivers, OS, operators) into a ConfigMap for validation against recipes.
Supply Chain Security   SLSA Level 3 provenance, signed SBOMs, image attestations (cosign), and checksum verification on every release.

Supported Components

Dimension    This Release
Kubernetes   Amazon EKS, Azure AKS (1.34+), GKE, self-managed (Kind)
GPUs         NVIDIA H100, GB200
OS           Ubuntu
Workloads    Training (Kubeflow), Inference (Dynamo)
Components   GPU Operator, Network Operator, cert-manager, Prometheus stack, etc.

See the full Component Catalog for every component that can appear in a recipe. Don't see what you need? Open an issue — that feedback directly shapes what gets validated next.

How It Works

[Figure: AICR end-to-end workflow]

A recipe is a version-locked configuration for a specific environment. You describe your target (cloud, GPU, OS, workload intent), and the recipe engine matches it against a library of validated overlays — layered configurations that compose bottom-up from base defaults through cloud, accelerator, OS, and workload-specific tuning.
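
Bottom-up composition of layers can be pictured as a deep merge in which each later layer overrides the one beneath it. The Go sketch below illustrates that general technique only. The `Overlay` type, the `Merge` function, and the keys used in the usage note are hypothetical, and the real overlay schema and matching rules live in the recipe package.

```go
package main

// Overlay is a hypothetical layer of configuration values; it is not the
// AICR overlay schema, which this README does not show.
type Overlay map[string]any

// Merge composes overlays bottom-up: later layers override earlier ones,
// and nested maps are merged key by key rather than replaced wholesale.
func Merge(layers ...Overlay) Overlay {
	out := Overlay{}
	for _, layer := range layers {
		mergeInto(out, layer)
	}
	return out
}

// mergeInto copies src into dst, recursing into nested maps.
func mergeInto(dst, src map[string]any) {
	for k, v := range src {
		srcMap, srcIsMap := asMap(v)
		dstMap, dstIsMap := asMap(dst[k])
		if srcIsMap && dstIsMap {
			mergeInto(dstMap, srcMap)
			continue
		}
		if srcIsMap {
			// Copy nested maps so later merges never mutate an input layer.
			copied := map[string]any{}
			mergeInto(copied, srcMap)
			dst[k] = copied
			continue
		}
		dst[k] = v
	}
}

// asMap reports whether v is a nested map and returns it if so.
func asMap(v any) (map[string]any, bool) {
	switch m := v.(type) {
	case map[string]any:
		return m, true
	case Overlay:
		return m, true
	}
	return nil, false
}
```

With a base layer that sets `driver.version` and `driver.enabled`, and an accelerator layer that sets only `driver.version`, `Merge(base, accel)` keeps `enabled` from the base and takes `version` from the accelerator layer. Passing layers in order from most general to most specific gives the bottom-up behavior described above.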

The bundler materializes a recipe into deployment-ready artifacts: one folder per component, each with Helm values, checksums, and a README. The validator compares a recipe against a live cluster snapshot and flags anything out of spec.
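
To make the validator's role concrete, here is a small Go sketch of checking version constraints against snapshot values. It rests on several assumptions: the `Constraint` type, the `Check` function, the operator set, and the flat key-to-version snapshot map are all hypothetical, and AICR's own constraint syntax and snapshot format may differ. Versions are compared part by part, with missing trailing parts treated as zero, so "1.34" and "1.34.0" compare equal.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Constraint is a hypothetical requirement such as {"k8s.version", ">=", "1.34"}.
type Constraint struct {
	Key, Op, Version string
}

// parseVersion turns "1.34" or "v1.34.2" into numeric parts.
// Suffixes such as "-rc1" are not handled in this sketch.
func parseVersion(s string) ([]int, error) {
	s = strings.TrimPrefix(strings.TrimSpace(s), "v")
	fields := strings.Split(s, ".")
	parts := make([]int, 0, len(fields))
	for _, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", s, err)
		}
		parts = append(parts, n)
	}
	return parts, nil
}

// compareVersions returns -1, 0, or 1; missing trailing parts count as zero.
func compareVersions(a, b []int) int {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

// Check evaluates each constraint against the snapshot and returns one
// human-readable message per violation, in constraint order.
func Check(constraints []Constraint, snapshot map[string]string) []string {
	var violations []string
	for _, c := range constraints {
		actual, ok := snapshot[c.Key]
		if !ok {
			violations = append(violations, fmt.Sprintf("%s: missing from snapshot", c.Key))
			continue
		}
		got, err := parseVersion(actual)
		if err != nil {
			violations = append(violations, fmt.Sprintf("%s: %v", c.Key, err))
			continue
		}
		want, err := parseVersion(c.Version)
		if err != nil {
			violations = append(violations, fmt.Sprintf("%s: %v", c.Key, err))
			continue
		}
		cmp := compareVersions(got, want)
		var satisfied bool
		switch c.Op {
		case ">=":
			satisfied = cmp >= 0
		case "<=":
			satisfied = cmp <= 0
		case "==":
			satisfied = cmp == 0
		default:
			violations = append(violations, fmt.Sprintf("%s: unsupported operator %q", c.Key, c.Op))
			continue
		}
		if !satisfied {
			violations = append(violations, fmt.Sprintf("%s: got %s, want %s %s", c.Key, actual, c.Op, c.Version))
		}
	}
	return violations
}
```

An empty result from `Check` means the snapshot satisfies every constraint. Returning all violations, rather than stopping at the first, matches the idea of flagging anything out of spec in a single pass.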

This separation means the same validated configuration works whether you deploy with Helm, ArgoCD, Flux, or a custom pipeline.

What AI Cluster Runtime Is Not

  • Not a Kubernetes distribution
  • Not a cluster provisioner or lifecycle management system
  • Not a managed control plane or hosted service
  • Not a replacement for your cloud provider or OEM platform

You bring your cluster and your tools. AI Cluster Runtime tells you what should be installed and how it should be configured.

Documentation

Choose the path that matches how you'll use the project.

User — Platform and Infrastructure Operators
Contributor — Developers and Maintainers
Integrator — Automation and Platform Engineers

Resources

  • Roadmap — Feature priorities and development timeline
  • Security — Supply chain security, vulnerability reporting, and verification
  • Releases — Binaries, SBOMs, and attestations
  • Issues — Bugs, feature requests, and questions

Contributing

AI Cluster Runtime is licensed under Apache 2.0. Contributions are welcome: new recipes for environments we haven't covered (OpenShift, bare metal), additional bundler formats, validation checks, or bug reports. See CONTRIBUTING.md for development setup and the PR process.

Directories

Path Synopsis
cmd
aicr command
aicrd command
pkg
api
  Package api provides the HTTP API layer for the AICR Recipe Generation service.
build
  Package build implements the build command for generating OCI artifacts from build spec files.
bundler
  Package bundler provides orchestration for generating deployment bundles from recipes.
bundler/attestation
  Package attestation provides bundle attestation using Sigstore keyless signing.
bundler/checksum
  Package checksum provides SHA256 checksum generation for bundle verification.
bundler/config
  Package config provides configuration options for bundler implementations.
bundler/deployer/argocd
  Package argocd provides ArgoCD Application generation for recipes.
bundler/deployer/helm
  Package helm generates per-component Helm bundles from recipe results.
bundler/registry
  Package registry provides thread-safe registration and retrieval of bundler implementations.
bundler/result
  Package result provides types for tracking bundle generation results.
bundler/types
  Package types defines the type system for bundler implementations.
bundler/verifier
  Package verifier implements offline bundle verification with a four-level trust model.
cli
  Package cli implements the command-line interface for the AICR aicr tool.
collector
  Package collector provides interfaces and implementations for collecting system configuration data.
collector/file
  Package file provides utilities for reading files from the filesystem.
collector/gpu
  Package gpu collects GPU hardware and driver configuration data.
collector/k8s
  Package k8s collects Kubernetes cluster configuration data.
collector/os
  Package os collects operating system configuration data.
collector/systemd
  Package systemd collects systemd service configuration data.
component
  Package component provides the generic bundler framework and shared utilities.
constraints
  Package constraints provides constraint parsing, extraction, and evaluation utilities for comparing recipe constraints against snapshot measurements.
defaults
  Package defaults provides centralized configuration constants for the AICR system.
errors
  Package errors provides structured error types for better observability and programmatic error handling across the application.
evidence
  Package evidence renders CNCF AI Conformance evidence markdown from CTRF reports.
header
  Package header provides common header types for AICR data structures.
k8s
  Package k8s provides Kubernetes integration for Cloud Native Stack.
k8s/agent
  Package agent provides Kubernetes Job deployment for automated snapshot capture.
k8s/client
  Package client provides a singleton Kubernetes client for efficient cluster interactions.
k8s/pod
  Package pod provides shared utilities for Kubernetes Job and Pod operations.
logging
  Package logging provides structured logging utilities for AICR components.
manifest
  Package manifest provides Helm-compatible template rendering for manifest files.
measurement
  Package measurement provides types and utilities for collecting, comparing, and filtering system measurements from various sources (Kubernetes, GPU, OS, SystemD).
oci
  Package oci provides functionality for packaging and pushing artifacts to OCI-compliant registries.
recipe
  Package recipe provides recipe building and matching functionality.
serializer
  Package serializer provides encoding and decoding of measurement data in multiple formats.
server
  Package server implements the AICR System Configuration Recommendation API as defined in api/aicr/aicr-v1.yaml
snapshotter
  Package snapshotter captures comprehensive system configuration snapshots.
trust
  Package trust manages Sigstore trusted root material for offline attestation verification.
validator
  Package validator provides a container-per-validator execution engine for AICR cluster validation.
validator/catalog
  Package catalog provides the declarative validator catalog.
validator/ctrf
  Package ctrf provides Go types and utilities for the Common Test Report Format (CTRF).
validator/labels
  Package labels provides shared label constants for validation resources.
version
  Package version provides semantic version parsing and comparison with flexible precision support.
tests
chainsaw/ai-conformance command
  ai-conformance-check parses Chainsaw assertion YAML files and verifies that every declared resource exists in the target Kubernetes cluster.
  Package validators provides shared utilities for v2 validator containers.
chainsaw
  Package chainsaw executes Chainsaw-style assertions against a live Kubernetes cluster.
conformance command
  conformance is a validator container for all conformance phase checks.
deployment command
  deployment is a validator container for all deployment phase checks.
helper
  Package helper provides shared utilities for v2 validator containers.
performance command
  performance is a validator container for all performance phase checks.
