aicr

v0.9.12
Published: Mar 8, 2026 License: Apache-2.0

README

AI Cluster Runtime (AICR)


AICR provides tooling for deploying optimized and validated GPU-accelerated AI runtimes in Kubernetes. It captures known-good combinations of drivers, operators, kernels, and system configurations to create reproducible artifacts for common Kubernetes deployment frameworks such as Helm and ArgoCD.

Why We Built This

Running GPU-accelerated Kubernetes clusters reliably is hard. Small differences in kernel versions, drivers, container runtimes, operators, and Kubernetes releases can cause failures that are difficult to diagnose and expensive to reproduce.

Historically, this knowledge has lived in internal validation pipelines, playbooks, and tribal knowledge. AICR exists to externalize that experience. Its goal is to make validated configurations visible, repeatable, and reusable across environments.

What AICR Is (and Is Not)

AICR is a source of validated configuration knowledge for NVIDIA-accelerated Kubernetes environments.

It is:

  • A curated set of tested and validated component combinations
  • A reference for how NVIDIA-accelerated Kubernetes clusters are expected to be configured
  • A foundation for generating reproducible deployment artifacts
  • Designed to integrate with existing provisioning, CI/CD, and GitOps workflows

It is not:

  • A Kubernetes distribution
  • A cluster provisioning or lifecycle management system
  • A managed control plane or hosted service
  • A replacement for cloud provider or OEM platforms

How It Works

AICR separates validated configuration knowledge from how that knowledge is consumed.

  • Human-readable documentation lives under docs/.
  • Version-locked configuration definitions (“recipes”) capture known-good system states.
  • Those definitions can be rendered into concrete artifacts such as Helm values, Kubernetes manifests, or install scripts.
  • Recipes can be validated against actual system configurations to verify compatibility.

This separation allows the same validated configuration to be applied consistently across different environments and automation systems.

For example, a configuration validated for GB200 on Ubuntu 22.04 with Kubernetes 1.34 can be rendered into Helm values and manifests suitable for use in an existing GitOps pipeline.
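The separation described above can be pictured with a small Go sketch. Everything in it is hypothetical: the `Recipe` struct, the two render functions, and the placeholder component names and versions are illustrations only and do not reflect AICR's actual recipe schema or packages. It shows just the idea that one version-locked definition can feed several output formats.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Recipe is a hypothetical, simplified stand-in for a version-locked
// configuration definition. It is NOT AICR's real schema.
type Recipe struct {
	Accelerator string
	OS          string
	Kubernetes  string
	Components  map[string]string // component name -> pinned version
}

// sortedNames returns component names in a stable order so that
// rendered output is reproducible across runs.
func sortedNames(r Recipe) []string {
	names := make([]string, 0, len(r.Components))
	for name := range r.Components {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// renderHelmValues emits a YAML-like values fragment, one entry per component.
func renderHelmValues(r Recipe) string {
	var b strings.Builder
	for _, name := range sortedNames(r) {
		fmt.Fprintf(&b, "%s:\n  version: %q\n", name, r.Components[name])
	}
	return b.String()
}

// renderPinList emits the same knowledge as a flat name==version list.
func renderPinList(r Recipe) string {
	var b strings.Builder
	for _, name := range sortedNames(r) {
		fmt.Fprintf(&b, "%s==%s\n", name, r.Components[name])
	}
	return b.String()
}

func main() {
	// Component names and versions below are placeholders, not validated data.
	r := Recipe{
		Accelerator: "gb200",
		OS:          "ubuntu-22.04",
		Kubernetes:  "1.34",
		Components: map[string]string{
			"example-operator": "1.0.0",
			"example-driver":   "2.0.0",
		},
	}
	fmt.Print(renderHelmValues(r))
	fmt.Print(renderPinList(r))
}
```

Because both renderers read the same `Recipe` value, adding another output format in this sketch would not require touching the validated data itself.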

Get Started

Some tooling and APIs are under active development; documentation reflects current and near-term capabilities.

Installation

Install the latest version using the installation script:

Note: While the repo is temporarily private, include your GitHub token in the request:

curl -sfL -H "Authorization: token $GITHUB_TOKEN" \
  https://raw.githubusercontent.com/NVIDIA/aicr/main/install | bash -s --

See Installation Guide for manual installation, building from source, and container images.

Quick Start

Get started quickly with AICR:

  1. Review the documentation under docs/ to understand supported platforms and required components.
  2. Identify your target environment:
    • GPU architecture
    • Operating system and kernel
    • Kubernetes distribution and version
    • Workload intent (for example, training or inference)
  3. Apply the validated configuration guidance using your existing tools (Helm, kubectl, CI/CD, or GitOps).
  4. Validate and iterate as platforms and workloads evolve.
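As a rough illustration of the validation step, the Go sketch below compares expected values against observed ones and reports every mismatch. It is a minimal sketch under stated assumptions: the `Mismatch` type, the `check` function, the key names, and the exact-string comparison are all hypothetical and do not represent AICR's constraint syntax or its snapshot format.

```go
package main

import (
	"fmt"
	"sort"
)

// Mismatch records one expectation that the observed system does not meet.
// Hypothetical type for illustration only.
type Mismatch struct {
	Key  string
	Want string
	Got  string // empty when the key was not observed at all
}

// check compares expected values with observed measurements using exact
// string equality. A real tool would likely support richer comparisons
// (for example version ranges); that is out of scope for this sketch.
func check(want, got map[string]string) []Mismatch {
	keys := make([]string, 0, len(want))
	for k := range want {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable report order

	var out []Mismatch
	for _, k := range keys {
		if got[k] != want[k] {
			out = append(out, Mismatch{Key: k, Want: want[k], Got: got[k]})
		}
	}
	return out
}

func main() {
	// Key names and the observed values are illustrative assumptions.
	want := map[string]string{"os.release": "22.04", "k8s.version": "1.34"}
	got := map[string]string{"os.release": "22.04", "k8s.version": "1.33"}

	for _, m := range check(want, got) {
		fmt.Printf("%s: want %q, got %q\n", m.Key, m.Want, m.Got)
	}
}
```

Reporting all mismatches at once, rather than stopping at the first, makes it easier to see how far an environment has drifted from a validated configuration.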

Example: Generate a validated configuration for GB200 on EKS with Ubuntu, optimized for Kubeflow training:

# Generate a recipe for your environment
aicr recipe --service eks --accelerator gb200 --os ubuntu --intent training --platform kubeflow -o recipe.yaml

# Render the recipe into Helm values for your GitOps pipeline
aicr bundle --recipe recipe.yaml -o ./bundles

The generated bundles/ directory contains a per-component Helm bundle ready to deploy or commit to your GitOps repository. See CLI Reference for more options.

Get Started by Use Case

Choose the documentation path that matches how you'll use AICR.

User – Platform and Infrastructure Operators

You deploy and operate GPU-accelerated Kubernetes clusters using validated configurations.

Contributor – Developers and Maintainers

You contribute code, extend functionality, or work on AICR internals.

Integrator – Automation and Platform Engineers

You integrate AICR into CI/CD pipelines, GitOps workflows, or larger platforms.

Documentation & Resources

  • Documentation - Guides and examples
  • Roadmap - Feature priorities and development timeline
  • Overview - Detailed system overview and glossary
  • Security - Security-related resources
  • Releases - Binaries, SBOMs, and other artifacts
  • Issues - Bugs, feature requests, and questions

Contributing

Contributions are welcome. See contributing for development setup, contribution guidelines, and the pull request process.

Directories

Path Synopsis
cmd
aicr command
aicrd command
pkg
api
Package api provides the HTTP API layer for the AICR Recipe Generation service.
build
Package build implements the build command for generating OCI artifacts from build spec files.
bundler
Package bundler provides orchestration for generating deployment bundles from recipes.
bundler/attestation
Package attestation provides bundle attestation using Sigstore keyless signing.
bundler/checksum
Package checksum provides SHA256 checksum generation for bundle verification.
bundler/config
Package config provides configuration options for bundler implementations.
bundler/deployer/argocd
Package argocd provides ArgoCD Application generation for recipes.
bundler/deployer/helm
Package helm generates per-component Helm bundles from recipe results.
bundler/registry
Package registry provides thread-safe registration and retrieval of bundler implementations.
bundler/result
Package result provides types for tracking bundle generation results.
bundler/types
Package types defines the type system for bundler implementations.
bundler/verifier
Package verifier implements offline bundle verification with a four-level trust model.
cli
Package cli implements the command-line interface for the AICR aicr tool.
collector
Package collector provides interfaces and implementations for collecting system configuration data.
collector/file
Package file provides utilities for reading files from the filesystem.
collector/gpu
Package gpu collects GPU hardware and driver configuration data.
collector/k8s
Package k8s collects Kubernetes cluster configuration data.
collector/os
Package os collects operating system configuration data.
collector/systemd
Package systemd collects systemd service configuration data.
component
Package component provides the generic bundler framework and shared utilities.
constraints
Package constraints provides constraint parsing, extraction, and evaluation utilities for comparing recipe constraints against snapshot measurements.
defaults
Package defaults provides centralized configuration constants for the AICR system.
errors
Package errors provides structured error types for better observability and programmatic error handling across the application.
evidence
Package evidence renders CNCF AI Conformance evidence markdown from CTRF reports.
header
Package header provides common header types for AICR data structures.
k8s
Package k8s provides Kubernetes integration for Cloud Native Stack.
k8s/agent
Package agent provides Kubernetes Job deployment for automated snapshot capture.
k8s/client
Package client provides a singleton Kubernetes client for efficient cluster interactions.
k8s/pod
Package pod provides shared utilities for Kubernetes Job and Pod operations.
logging
Package logging provides structured logging utilities for AICR components.
manifest
Package manifest provides Helm-compatible template rendering for manifest files.
measurement
Package measurement provides types and utilities for collecting, comparing, and filtering system measurements from various sources (Kubernetes, GPU, OS, SystemD).
oci
Package oci provides functionality for packaging and pushing artifacts to OCI-compliant registries.
recipe
Package recipe provides recipe building and matching functionality.
serializer
Package serializer provides encoding and decoding of measurement data in multiple formats.
server
Package server implements the AICR System Configuration Recommendation API as defined in api/aicr/aicr-v1.yaml
snapshotter
Package snapshotter captures comprehensive system configuration snapshots.
trust
Package trust manages Sigstore trusted root material for offline attestation verification.
validator
Package validator provides a container-per-validator execution engine for AICR cluster validation.
validator/catalog
Package catalog provides the declarative validator catalog.
validator/ctrf
Package ctrf provides Go types and utilities for the Common Test Report Format (CTRF).
validator/labels
Package labels provides shared label constants for validation resources.
version
Package version provides semantic version parsing and comparison with flexible precision support.
tests
chainsaw/ai-conformance command
ai-conformance-check parses Chainsaw assertion YAML files and verifies that every declared resource exists in the target Kubernetes cluster.
Package validators provides shared utilities for v2 validator containers.
chainsaw
Package chainsaw executes Chainsaw-style assertions against a live Kubernetes cluster.
conformance command
conformance is a validator container for all conformance phase checks.
deployment command
deployment is a validator container for all deployment phase checks.
helper
Package helper provides shared utilities for v2 validator containers.
performance command
performance is a validator container for all performance phase checks.
