recipe

package
v0.14.0
Published: Jun 1, 2026 License: Apache-2.0 Imports: 29 Imported by: 0

Documentation

Overview

Package recipe provides recipe building and matching functionality.

Package recipe provides configuration recipe generation based on deployment criteria.

Overview

The recipe package generates tailored configuration recommendations for GPU-accelerated Kubernetes clusters. It uses a metadata-driven model where base configurations are enhanced with criteria-specific overlays to produce deployment-ready component references.

Core Types

Criteria: Specifies target deployment parameters

type Criteria struct {
    Service     CriteriaServiceType     // eks, gke, aks, oke, kind, lke, bcm, any
    Accelerator CriteriaAcceleratorType // h100, h200, gb200, b200, a100, l40, rtx-pro-6000, any
    Intent      CriteriaIntentType      // training, inference, any
    OS          CriteriaOSType          // ubuntu, rhel, cos, amazonlinux, talos, any
    Platform    CriteriaPlatformType    // dynamo, kubeflow, nim, runai, slurm, any
    Nodes       int                     // node count (0 = any)
}

RecipeResult: Generated configuration result

type RecipeResult struct {
    Header                              // API version, kind, metadata
    Criteria      *Criteria             // Input criteria
    MatchedRules  []string              // Applied overlay rules
    ComponentRefs []ComponentRef        // Component references (Helm or Kustomize)
    Constraints   []ConstraintRef       // Validation constraints
}

Recipe: Legacy format still used by bundlers

type Recipe struct {
    Header                              // API version, kind, metadata
    Request      *RequestInfo           // Input metadata (optional)
    MatchedRules []string               // Applied overlay rules
    Measurements []*measurement.Measurement // Configuration data
}

Builder: Generates recipes from criteria

type Builder struct {
    Version string  // Builder version for tracking
}

Criteria Types

Service types for Kubernetes environments:

  • CriteriaServiceEKS: Amazon EKS
  • CriteriaServiceGKE: Google GKE
  • CriteriaServiceAKS: Azure AKS
  • CriteriaServiceOKE: Oracle OKE
  • CriteriaServiceKind: kind (local clusters)
  • CriteriaServiceLKE: Linode LKE
  • CriteriaServiceBCM: NVIDIA Base Command Manager
  • CriteriaServiceAny: Any service (wildcard)

Accelerator types for GPU selection:

  • CriteriaAcceleratorH100: NVIDIA H100
  • CriteriaAcceleratorH200: NVIDIA H200
  • CriteriaAcceleratorGB200: NVIDIA GB200
  • CriteriaAcceleratorB200: NVIDIA B200
  • CriteriaAcceleratorA100: NVIDIA A100
  • CriteriaAcceleratorL40: NVIDIA L40
  • CriteriaAcceleratorRTXPro6000: NVIDIA RTX PRO 6000
  • CriteriaAcceleratorAny: Any accelerator (wildcard)

Intent types for workload optimization:

  • CriteriaIntentTraining: ML training workloads
  • CriteriaIntentInference: Inference workloads
  • CriteriaIntentAny: Generic workloads

OS types for host operating system:

  • CriteriaOSUbuntu: Ubuntu
  • CriteriaOSRHEL: Red Hat Enterprise Linux
  • CriteriaOSCOS: Container-Optimized OS (GKE)
  • CriteriaOSAmazonLinux: Amazon Linux
  • CriteriaOSTalos: Talos Linux
  • CriteriaOSAny: Any OS (wildcard)

Platform types for workload frameworks:

  • CriteriaPlatformDynamo: NVIDIA Dynamo
  • CriteriaPlatformKubeflow: Kubeflow
  • CriteriaPlatformNIM: NVIDIA NIM
  • CriteriaPlatformRunai: NVIDIA Run:ai
  • CriteriaPlatformSlurm: SchedMD Slinky Slurm
  • CriteriaPlatformAny: Any platform (wildcard)

Usage

Basic recipe generation with criteria:

criteria := recipe.NewCriteria()
criteria.Service = recipe.CriteriaServiceEKS
criteria.Accelerator = recipe.CriteriaAcceleratorH100
criteria.Intent = recipe.CriteriaIntentTraining

ctx := context.Background()
builder := recipe.NewBuilder()
result, err := builder.BuildFromCriteria(ctx, criteria)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Matched rules: %v\n", result.MatchedRules)
for _, ref := range result.ComponentRefs {
    fmt.Printf("Component: %s, Version: %s\n", ref.Name, ref.Version)
}

HTTP handlers for /v1/recipe and /v1/query live in pkg/server; they are thin adapters over pkg/client/v1 (aicr.Client) and do not call Builder directly.

Builder-Bound DataProvider (Multi-Tenant)

WithDataProvider attaches a DataProvider to the Builder so the metadata store, component registry, and per-component values files all resolve through the bound provider rather than the package-global one. This is the canonical pattern for any caller that constructs more than one Builder per process (multi-tenant servers, library users, test harnesses):

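// otherProvider stands for any second DataProvider implementation
// (for example, a tenant-specific one); it is not defined in this snippet.
embedded := recipe.NewEmbeddedDataProvider(recipe.GetEmbeddedFS(), "")
tenantA := recipe.NewBuilder(recipe.WithDataProvider(embedded))
tenantB := recipe.NewBuilder(recipe.WithDataProvider(otherProvider))
resA, _ := tenantA.BuildFromCriteria(ctx, criteriaA) // error ignored for brevity
resB, _ := tenantB.BuildFromCriteria(ctx, criteriaB) // error ignored for brevity
// resA.DataProvider() != resB.DataProvider()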

Caches are keyed by DataProvider identity, so concurrent builders against distinct providers do not share metadata-store or component-registry state. The new public surface for provider-bound builds:

  • WithDataProvider(dp DataProvider) Option — binds the Builder to dp.
  • LoadMetadataStoreFor(ctx, dp) (*MetadataStore, error) — loads (and caches) the metadata store for the supplied provider.
  • GetComponentRegistryFor(dp) (*ComponentRegistry, error) — returns the component registry for dp, cached per provider.
  • EvictCachedStore(dp) — drops the cached MetadataStore for dp so the next LoadMetadataStoreFor rebuilds from source.
  • EvictCachedRegistry(dp) drops the cached registry for dp; passing a nil provider is a no-op.
  • GetManifestContentWithProvider(dp, path) ([]byte, error) — reads a manifest file from dp; a nil provider falls back to the package's default embedded provider.
  • (*RecipeResult).DataProvider() DataProvider — recovers the provider that produced the result. A default/unbound build returns the package default embedded provider; nil is only returned for a nil receiver or a result constructed outside the normal builder path.

Single-tenant entry points (the CLI, the API server) bind a provider explicitly via WithDataProvider; the former package-global accessors (SetDataProvider / GetDataProvider) have been removed. Recover a bound provider with (*RecipeResult).DataProvider(), or pass one explicitly.
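
The per-provider caching described above can be pictured with a small, self-contained sketch. It is not the package's implementation: the provider interface, memProvider, store, and storeCache types are hypothetical stand-ins, and the only behavior taken from the text is that entries are keyed by provider identity, that eviction forces a reload, and that evicting a nil provider is a no-op.

```go
package main

import (
	"fmt"
	"sync"
)

// provider is a hypothetical stand-in for DataProvider.
type provider interface {
	ReadFile(path string) ([]byte, error)
}

// memProvider is an in-memory provider used only for this sketch.
type memProvider struct{ files map[string]string }

func (p *memProvider) ReadFile(path string) ([]byte, error) {
	data, ok := p.files[path]
	if !ok {
		return nil, fmt.Errorf("file %q not found", path)
	}
	return []byte(data), nil
}

// store is a hypothetical, trimmed-down metadata store.
type store struct{ base string }

// storeCache keeps one store per provider identity.
type storeCache struct {
	mu     sync.Mutex
	stores map[provider]*store
	loads  int // number of real loads, exposed so the demo can observe caching
}

// loadFor returns the cached store for dp, loading it on first use.
func (c *storeCache) loadFor(dp provider) (*store, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.stores[dp]; ok {
		return s, nil
	}
	data, err := dp.ReadFile("overlays/base.yaml")
	if err != nil {
		return nil, err
	}
	if c.stores == nil {
		c.stores = make(map[provider]*store)
	}
	s := &store{base: string(data)}
	c.stores[dp] = s
	c.loads++
	return s, nil
}

// evict drops the entry for dp; a nil provider is a no-op.
func (c *storeCache) evict(dp provider) {
	if dp == nil {
		return
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.stores, dp)
}

func main() {
	a := &memProvider{files: map[string]string{"overlays/base.yaml": "tenant-a"}}
	b := &memProvider{files: map[string]string{"overlays/base.yaml": "tenant-b"}}
	cache := &storeCache{}

	sa1, _ := cache.loadFor(a)
	sa2, _ := cache.loadFor(a) // served from the cache
	sb, _ := cache.loadFor(b)  // distinct provider, distinct entry
	fmt.Println(sa1 == sa2, sa1 == sb, cache.loads)

	cache.evict(a)
	_, _ = cache.loadFor(a) // reloaded after eviction
	fmt.Println(cache.loads)
}
```

Using the provider value itself as the map key works here because the sketch's providers are pointers, which are comparable; holding the mutex during the load keeps the example short at the cost of serializing loads.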

Parse criteria from HTTP request (reg is a *CriteriaRegistry, typically from GetCriteriaRegistryFor(dp); a nil reg validates against the OSS fast-path values only):

criteria, err := recipe.ParseCriteriaFromRequest(r, reg)
if err != nil {
    http.Error(w, err.Error(), http.StatusBadRequest)
    return
}

Query Parameters (HTTP API - GET)

The HTTP handler accepts these query parameters for GET requests:

  • service: eks, gke, aks, oke, kind, lke, bcm, any (default: any)
  • accelerator: h100, h200, gb200, b200, a100, l40, rtx-pro-6000, any (default: any)
  • gpu: alias for accelerator (backwards compatibility)
  • intent: training, inference, any (default: any)
  • os: ubuntu, rhel, cos, amazonlinux, talos, any (default: any)
  • nodes: integer node count (default: 0 = any)
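
As a rough illustration of how these parameters could be read, the sketch below parses a raw query string into a plain struct. It is not the handler in pkg/server and does not call ParseCriteriaFromRequest: the query type and the first, parseQuery, and parseRawQuery helpers are hypothetical, values are not checked against the allowed sets, and letting accelerator win over gpu when both are present is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// query is a hypothetical, string-typed view of the documented parameters.
type query struct {
	Service     string
	Accelerator string
	Intent      string
	OS          string
	Nodes       int
}

// first returns the first non-empty value among keys, or fallback.
func first(q url.Values, fallback string, keys ...string) string {
	for _, k := range keys {
		if v := q.Get(k); v != "" {
			return v
		}
	}
	return fallback
}

// parseQuery applies the documented defaults ("any", and 0 for nodes).
func parseQuery(q url.Values) (query, error) {
	out := query{
		Service:     first(q, "any", "service"),
		Accelerator: first(q, "any", "accelerator", "gpu"), // gpu is the alias
		Intent:      first(q, "any", "intent"),
		OS:          first(q, "any", "os"),
	}
	if raw := q.Get("nodes"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n < 0 {
			return query{}, fmt.Errorf("invalid nodes %q: want a non-negative integer", raw)
		}
		out.Nodes = n
	}
	return out, nil
}

// parseRawQuery is a convenience wrapper for a raw query string.
func parseRawQuery(raw string) (query, error) {
	values, err := url.ParseQuery(raw)
	if err != nil {
		return query{}, err
	}
	return parseQuery(values)
}

func main() {
	q, err := parseRawQuery("service=eks&gpu=h100&nodes=8")
	fmt.Printf("%+v %v\n", q, err)

	_, err = parseRawQuery("nodes=-1")
	fmt.Println(err)
}
```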

Criteria Files (CLI and HTTP API - POST)

Criteria can be defined in a Kubernetes-style YAML or JSON file using the RecipeCriteria resource type. This provides an alternative to individual CLI flags or query parameters.

RecipeCriteria: Kubernetes-style resource for criteria definition

type RecipeCriteria struct {
    Kind       string    // Must be "RecipeCriteria"
    APIVersion string    // Must be "aicr.nvidia.com/v1alpha1"
    Metadata   struct {
        Name string       // Optional descriptive name
    }
    Spec *Criteria        // The criteria specification
}

Example criteria file (criteria.yaml):

kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: gb200-eks-ubuntu-training
spec:
  service: eks
  os: ubuntu
  accelerator: gb200
  intent: training

Load criteria from a file:

criteria, err := recipe.LoadCriteriaFromFile("/path/to/criteria.yaml", reg)
if err != nil {
    log.Fatal(err)
}
result, err := builder.BuildFromCriteria(ctx, criteria)
if err != nil {
    log.Fatal(err)
}

Parse criteria from HTTP request body (POST):

criteria, err := recipe.ParseCriteriaFromBody(r.Body, r.Header.Get("Content-Type"), reg)
if err != nil {
    http.Error(w, err.Error(), http.StatusBadRequest)
    return
}

CLI usage with criteria file:

aicr recipe --criteria /path/to/criteria.yaml --output recipe.yaml

CLI flags can override criteria file values:

aicr recipe --criteria criteria.yaml --service gke --output recipe.yaml

HTTP API POST request:

curl -X POST http://localhost:8080/v1/recipe \
  -H "Content-Type: application/yaml" \
  -d @criteria.yaml

Criteria Matching

Criteria use asymmetric matching with priority-based resolution:

Recipe Wildcard (recipe field = "any"):

  • Recipe "any" acts as a wildcard, matching any query value
  • Example: Recipe with accelerator="any" matches query accelerator="h100"

Query Wildcard (query field = "any"):

  • Query "any" only matches recipes that also have "any"
  • Prevents generic queries from matching overly-specific recipes
  • Example: Query accelerator="any" does NOT match recipe accelerator="h100"

Exact Match:

  • Query service="eks", accelerator="h100" matches recipe with same values

Priority:

  • More specific overlays take precedence
  • Multiple matching overlays are applied in priority order
  • Later overlays can override earlier ones
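
The priority rules above can be sketched as a sort followed by an in-order merge. This is only an illustration under stated assumptions: the overlay type and applyOverlays function are hypothetical, values are flattened to strings, and treating a larger priority number as "more specific" (and therefore applied last) is an assumption that the text does not spell out.

```go
package main

import (
	"fmt"
	"sort"
)

// overlay is a hypothetical, trimmed-down stand-in for an overlay file.
type overlay struct {
	Name     string
	Priority int
	Values   map[string]string
}

// applyOverlays merges overlays onto a copy of base in ascending priority
// order, so a higher-priority overlay is applied later and wins on conflicts.
// It returns the merged values and the overlay names in application order.
func applyOverlays(base map[string]string, overlays []overlay) (map[string]string, []string) {
	ordered := append([]overlay(nil), overlays...) // do not reorder the caller's slice
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority < ordered[j].Priority
	})

	merged := make(map[string]string, len(base))
	for k, v := range base {
		merged[k] = v
	}

	applied := make([]string, 0, len(ordered))
	for _, o := range ordered {
		for k, v := range o.Values {
			merged[k] = v // later overlays override earlier ones
		}
		applied = append(applied, o.Name)
	}
	return merged, applied
}

func main() {
	base := map[string]string{"mig.strategy": "none", "driver.enabled": "true"}
	overlays := []overlay{
		{Name: "h100-training", Priority: 100, Values: map[string]string{"mig.strategy": "mixed"}},
		{Name: "h100", Priority: 50, Values: map[string]string{"mig.strategy": "single"}},
	}
	merged, applied := applyOverlays(base, overlays)
	fmt.Println(applied)
	fmt.Println(merged["mig.strategy"], merged["driver.enabled"])
}
```

The returned name list plays the role that MatchedRules plays in a RecipeResult: it records which overlays were applied and in what order.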

Metadata Store Model

Recipe generation uses YAML metadata files:

  1. Load overlays/base.yaml (common component versions and settings)
  2. Find matching overlay files based on criteria
  3. Merge overlay configurations into the result
  4. Return a RecipeResult with component references

Base structure (recipes/overlays/base.yaml):

apiVersion: aicr.nvidia.com/v1alpha1
kind: Base
metadata:
  name: base
  version: v1.0.0
components:
  - name: gpu-operator
    version: v25.3.3
    repository: https://helm.ngc.nvidia.com/nvidia

Overlay structure (recipes/overlays/*.yaml):

apiVersion: aicr.nvidia.com/v1alpha1
kind: Overlay
metadata:
  name: h100-training
  priority: 100
match:
  accelerator: h100
  intent: training
components:
  - name: gpu-operator
    version: v25.3.3
    values:
      mig.strategy: mixed

RecipeInput Interface

The RecipeInput interface allows bundlers to work with both legacy Recipe and new RecipeResult formats:

type RecipeInput interface {
    GetMeasurements() []*measurement.Measurement
    GetComponentRef(name string) *ComponentRef
    GetValuesForComponent(name string) (map[string]any, error)
}

Error Handling

BuildFromCriteria returns errors when:

  • Criteria is nil
  • Metadata store cannot be loaded
  • No matching overlays found
  • Component configuration is invalid

ParseCriteriaFromRequest returns errors when:

  • Service type is invalid
  • Accelerator type is invalid
  • Intent type is invalid
  • Nodes count is negative or non-numeric

Data Source

Recipe metadata is embedded at build time from:

  • recipes/overlays/base.yaml (base component versions)
  • recipes/overlays/*.yaml (criteria-specific overlays)

The metadata store is loaded once per DataProvider and cached; EvictCachedStore drops a provider's entry so the next load rebuilds from source.

Observability

The recipe builder exports Prometheus metrics:

  • recipe_built_duration_seconds: Time to build recipe
  • recipe_rule_match_total{status}: Rule matching statistics

Integration

The recipe package is used by:

  • pkg/cli - recipe command for CLI usage
  • pkg/server - HTTP recipe endpoint (via the pkg/client/v1 facade)
  • pkg/bundler - Bundle generation from recipes

It depends on:

  • pkg/measurement - Measurement data structures
  • pkg/version - Version parsing
  • pkg/header - Common header types
  • pkg/errors - Structured error handling

Component Types

The recipe system supports two component deployment types:

Helm Components:

  • Use Helm charts for deployment
  • Configured via helm section in registry.yaml
  • Support values files and inline overrides

Kustomize Components:

  • Use Kustomize for deployment
  • Configured via kustomize section in registry.yaml
  • Support Git/OCI sources with path and tag

The component registry (recipes/registry.yaml) determines component defaults. Components must have either helm OR kustomize configuration.

Subpackages

  • recipe/version - Semantic version parsing (moved to pkg/version)
  • recipe/header - Common header structures (moved to pkg/header)

Constants

const (
	EnvAllowedAccelerators = "AICR_ALLOWED_ACCELERATORS"
	EnvAllowedServices     = "AICR_ALLOWED_SERVICES"
	EnvAllowedIntents      = "AICR_ALLOWED_INTENTS"
	EnvAllowedOSTypes      = "AICR_ALLOWED_OS"
)

Environment variable names for allowlist configuration.

const CriteriaAnyValue = "any"

CriteriaAnyValue is the wildcard string used across every criteria dimension — recipes set a field to this literal (or leave it empty) to mean "this dimension is unconstrained." Each typed enum (CriteriaServiceAny, CriteriaAcceleratorAny, etc.) is the same string in its typed form; CriteriaAnyValue is the bare-string constant for matching logic that operates on stringified values (e.g., pkg/fingerprint.matchDim's three-way comparison).

const RecipeAPIVersion = "aicr.nvidia.com/v1alpha1"

RecipeAPIVersion is the API version for recipe metadata and result resources.

const RecipeCriteriaAPIVersion = "aicr.nvidia.com/v1alpha1"

RecipeCriteriaAPIVersion is the API version for RecipeCriteria resources.

const RecipeCriteriaKind = "RecipeCriteria"

RecipeCriteriaKind is the kind value for RecipeCriteria resources.

const RecipeMetadataKind = "RecipeMetadata"

RecipeMetadataKind is the kind value for RecipeMetadata resources.

const RecipeMixinKind = "RecipeMixin"

RecipeMixinKind is the kind value for mixin files.

const RecipeResultKind = "RecipeResult"

RecipeResultKind is the kind value for RecipeResult resources.

Functions

func CachedCriteriaRegistryContainsForTesting added in v0.14.0

func CachedCriteriaRegistryContainsForTesting(dp DataProvider) bool

CachedCriteriaRegistryContainsForTesting reports whether the criteria registry cache has an entry for the supplied DataProvider.

Test-only by convention (the _ForTesting suffix); never call from production code.

func CachedRegistryContainsForTesting added in v0.14.0

func CachedRegistryContainsForTesting(dp DataProvider) bool

CachedRegistryContainsForTesting reports whether the registry cache has an entry for the supplied DataProvider. Pair with CachedStoreContainsForTesting in pkg/recipe/metadata_store.go to verify a single Client's caches are released. Scoped per-provider so it is robust under parallel test execution.

Test-only by convention (the _ForTesting suffix); never call from production code.

func CachedRegistryCountForTesting added in v0.14.0

func CachedRegistryCountForTesting() int

CachedRegistryCountForTesting returns the number of distinct DataProvider entries currently held in the registry cache. Exposed for tests in the aicr facade that assert Client.Close evicts the cached registry — without this, the only way to observe eviction from outside the recipe package would be to reach into unexported state via reflection.

Test-only by convention (the _ForTesting suffix); never call from production code.

NOTE: global count across every DataProvider. Tests that need a stable signal scoped to a specific DataProvider should prefer CachedRegistryContainsForTesting.

func CachedStoreContainsForTesting added in v0.14.0

func CachedStoreContainsForTesting(dp DataProvider) bool

CachedStoreContainsForTesting reports whether the metadata-store cache has an entry for the supplied DataProvider. Scoped per-provider so it is robust under parallel test execution: each test that uses a distinct DataProvider can observe ONLY its own entry's presence/absence.

Test-only by convention (the _ForTesting suffix); never call from production code.

func CachedStoreCountForTesting added in v0.14.0

func CachedStoreCountForTesting() int

CachedStoreCountForTesting returns the number of distinct DataProvider entries currently held in the metadata-store cache. Exposed for tests in the aicr facade that assert Client.Close evicts the cached store — paired with CachedRegistryCountForTesting in components.go so a single test can verify both halves of the per-Client cache are released.

Test-only by convention (the _ForTesting suffix); never call from production code.

NOTE: this returns the GLOBAL count across every DataProvider in the package. A parallel test in another package using its own DataProvider will perturb the count. Tests that need a stable signal scoped to a specific DataProvider should prefer CachedStoreContainsForTesting.

func ComponentRefsTopologicalLevels added in v0.13.0

func ComponentRefsTopologicalLevels(refs []ComponentRef) ([][]string, error)

ComponentRefsTopologicalLevels is the free-function form of RecipeMetadataSpec.TopologicalLevels — operates on a bare []ComponentRef slice. Callers that have refs but not a full RecipeMetadataSpec (e.g., the bundler post-resolution) use this.
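
To make the shape of the [][]string result concrete, here is a minimal sketch of level-by-level topological sorting (Kahn's algorithm, grouped by level). It is not the package's code: the ref type mirrors only the Name and DependencyRefs fields, the dependencies in main are invented for illustration, and returning an error for unknown dependencies and for cycles is an assumption about how such cases might be reported.

```go
package main

import (
	"fmt"
	"sort"
)

// ref is a hypothetical stand-in for ComponentRef with only the fields needed here.
type ref struct {
	Name           string
	DependencyRefs []string
}

// topologicalLevels groups components so that every component appears in a
// later level than all of its dependencies. Names within a level are sorted
// to keep the output deterministic.
func topologicalLevels(refs []ref) ([][]string, error) {
	indegree := make(map[string]int, len(refs))
	dependents := make(map[string][]string, len(refs))
	for _, r := range refs {
		indegree[r.Name] = 0
	}
	for _, r := range refs {
		for _, dep := range r.DependencyRefs {
			if _, ok := indegree[dep]; !ok {
				return nil, fmt.Errorf("component %q depends on unknown component %q", r.Name, dep)
			}
			indegree[r.Name]++
			dependents[dep] = append(dependents[dep], r.Name)
		}
	}

	var current []string
	for name, d := range indegree {
		if d == 0 {
			current = append(current, name)
		}
	}

	var levels [][]string
	placed := 0
	for len(current) > 0 {
		sort.Strings(current)
		levels = append(levels, current)
		placed += len(current)
		var next []string
		for _, name := range current {
			for _, child := range dependents[name] {
				indegree[child]--
				if indegree[child] == 0 {
					next = append(next, child)
				}
			}
		}
		current = next
	}
	if placed != len(indegree) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return levels, nil
}

func main() {
	levels, err := topologicalLevels([]ref{
		{Name: "workload", DependencyRefs: []string{"gpu-operator", "network-operator"}},
		{Name: "gpu-operator", DependencyRefs: []string{"cert-manager"}},
		{Name: "network-operator"},
		{Name: "cert-manager"},
	})
	fmt.Println(levels, err)
}
```

Components in the same level have no ordering constraint between them, which is what allows a deployer to process one level at a time.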

func EvictCachedCriteriaRegistry added in v0.14.0

func EvictCachedCriteriaRegistry(dp DataProvider)

EvictCachedCriteriaRegistry drops the cached criteria registry for the supplied provider so the next GetCriteriaRegistryFor call rebuilds an empty registry. Passing a nil provider is a no-op (callers handle that case explicitly to avoid silently evicting the package-global registry).

func EvictCachedRegistry added in v0.14.0

func EvictCachedRegistry(dp DataProvider)

EvictCachedRegistry drops the cached registry for the supplied provider so the next GetComponentRegistryFor call rebuilds from source. Passing a nil provider is a no-op (callers handle that case explicitly to avoid silently evicting the package-global registry).

func EvictCachedStore added in v0.14.0

func EvictCachedStore(dp DataProvider)

EvictCachedStore drops the cached MetadataStore for the supplied provider. It is a no-op when the provider has no cache entry and is safe to call with a nil provider, so callers do not need to guard. It is used by tests that mutate a provider's backing data between loads, and by the cache-eviction tests.

func GetCriteriaAcceleratorTypes

func GetCriteriaAcceleratorTypes() []string

GetCriteriaAcceleratorTypes returns the static OSS-embedded accelerator types sorted alphabetically. For the union of static + registry, use AllCriteriaAcceleratorTypes.

func GetCriteriaIntentTypes

func GetCriteriaIntentTypes() []string

GetCriteriaIntentTypes returns the static OSS-embedded intent types sorted alphabetically. For the union of static + registry, use AllCriteriaIntentTypes.

func GetCriteriaOSTypes

func GetCriteriaOSTypes() []string

GetCriteriaOSTypes returns the static OSS-embedded OS types sorted alphabetically. Delegates to oskind.All so the list stays in sync with the canonical constants without duplication. For the union of static + registry, use AllCriteriaOSTypes.

func GetCriteriaPlatformTypes

func GetCriteriaPlatformTypes() []string

GetCriteriaPlatformTypes returns the static OSS-embedded platform types sorted alphabetically. For the union of static + registry, use AllCriteriaPlatformTypes.

func GetCriteriaServiceTypes

func GetCriteriaServiceTypes() []string

GetCriteriaServiceTypes returns the static OSS-embedded service types sorted alphabetically. This is the canonical OSS list and is stable across `--data` configurations; for the union of static + registry (including values contributed by `--data`), use AllCriteriaServiceTypes.

func GetEmbeddedFS

func GetEmbeddedFS() embed.FS

GetEmbeddedFS returns the embedded data filesystem. This is used by the CLI to create layered data providers.

func GetManifestContent

func GetManifestContent(path string) ([]byte, error)

GetManifestContent retrieves a manifest file from the package-global DataProvider. Path should be relative to data directory (e.g., "components/network-operator/manifests/nfd-network-rule.yaml").

This entry point is preserved for back-compat with callers that have no RecipeResult-bound provider available. Internally derives a defaults.FileReadTimeout-bounded context so a hung backing store still returns instead of blocking the goroutine. Callers operating against a per-tenant Builder should prefer GetManifestContentWithProvider so the lookup honors the bound provider; callers that already hold a context.Context should use GetManifestContentWithContext.

func GetManifestContentWithContext added in v0.14.0

func GetManifestContentWithContext(ctx context.Context, dp DataProvider, path string) ([]byte, error)

GetManifestContentWithContext reads a manifest file from the supplied DataProvider, honoring the caller's context for cancellation/timeout. A nil provider falls back to the package's default embedded provider.

func GetManifestContentWithProvider added in v0.14.0

func GetManifestContentWithProvider(dp DataProvider, path string) ([]byte, error)

GetManifestContentWithProvider reads a manifest file from the supplied DataProvider. A nil provider falls back to the package-level embedded-data singleton so callers that thread a possibly-nil RecipeResult.DataProvider() through can rely on the embedded fallback without an explicit nil check.

Internally derives a defaults.FileReadTimeout-bounded context. Callers that already hold a context.Context should use GetManifestContentWithContext to honor their own deadline instead.

Path should be relative to the data root (e.g., "components/network-operator/manifests/nfd-network-rule.yaml").

func HydrateResult added in v0.11.0

func HydrateResult(result *RecipeResult) (map[string]any, error)

HydrateResult builds a fully hydrated map from a RecipeResult. Component values are merged via GetValuesForComponent so the output contains the final resolved configuration, not file references.

Internally derives a defaults.FileReadTimeout-bounded context so a hung backing store still returns instead of blocking the goroutine. Callers that hold a context.Context should use HydrateResultWithContext so the caller's deadline propagates to the underlying values reads.

func HydrateResultWithContext added in v0.14.0

func HydrateResultWithContext(ctx context.Context, result *RecipeResult) (map[string]any, error)

HydrateResultWithContext builds a fully hydrated map from a RecipeResult, honoring ctx for cancellation/timeout on the underlying values reads.
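
The relationship between the context-free entry points and their WithContext counterparts follows a common Go pattern, sketched below. Everything in it is hypothetical: fileReadTimeout stands in for defaults.FileReadTimeout (whose real value is not stated here), and reader, readWithContext, readWithDefaultTimeout, slowRead, and fastRead exist only for this example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fileReadTimeout is a hypothetical stand-in for defaults.FileReadTimeout.
const fileReadTimeout = 50 * time.Millisecond

// reader is a hypothetical backing-store read that honors ctx.
type reader func(ctx context.Context, path string) ([]byte, error)

// readWithContext is the context-aware form: the caller's deadline applies.
func readWithContext(ctx context.Context, read reader, path string) ([]byte, error) {
	return read(ctx, path)
}

// readWithDefaultTimeout is the context-free form: it derives a bounded
// context so a hung backing store cannot block the goroutine forever.
func readWithDefaultTimeout(read reader, path string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), fileReadTimeout)
	defer cancel()
	return readWithContext(ctx, read, path)
}

// slowRead simulates a hung store that returns only when ctx is done
// (or after a delay far longer than the timeout).
func slowRead(ctx context.Context, path string) ([]byte, error) {
	select {
	case <-time.After(5 * time.Second):
		return []byte("late data for " + path), nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// fastRead simulates a healthy store.
func fastRead(_ context.Context, path string) ([]byte, error) {
	return []byte("contents of " + path), nil
}

func main() {
	data, err := readWithDefaultTimeout(fastRead, "values.yaml")
	fmt.Println(string(data), err)

	_, err = readWithDefaultTimeout(slowRead, "values.yaml")
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

A caller that already holds a request-scoped context would call the context-aware form directly, so that its own deadline, rather than the default, bounds the read.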

func LoadCatalog added in v0.14.0

func LoadCatalog(ctx context.Context) error

LoadCatalog eagerly loads the recipe catalog into the package cache, which has the side effect of seeding the criteria registry from every overlay's spec.criteria. Call this immediately after SetDataProvider so that subsequent ParseCriteria*Type lookups see values contributed by `--data` overlays. The CLI calls it at the top of `aicr recipe` Action and the API server should call it at startup; if the catalog is malformed, this surfaces the error before any criteria validation runs (and before the registry is half-populated).

func MatchesCriteriaField added in v0.9.0

func MatchesCriteriaField(recipeValue, queryValue string) bool

MatchesCriteriaField implements asymmetric matching for a single criteria field. Returns true if the recipe field matches the query field.

Matching rules:

  • Query is "any"/empty → only matches if recipe is also "any"/empty
  • Recipe is "any"/empty → matches any query value (recipe is generic/wildcard)
  • Otherwise → must match exactly
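
The three rules translate almost directly into code. The sketch below is a hypothetical re-implementation for illustration (matchesField and isWildcard are not the package's functions); it assumes exact, case-sensitive string comparison for the final rule.

```go
package main

import "fmt"

// anyValue mirrors the documented wildcard literal.
const anyValue = "any"

// isWildcard reports whether a field is unconstrained ("any" or empty).
func isWildcard(v string) bool {
	return v == "" || v == anyValue
}

// matchesField applies the asymmetric rules to a single criteria field.
func matchesField(recipeValue, queryValue string) bool {
	if isWildcard(queryValue) {
		// A generic query matches only a generic recipe.
		return isWildcard(recipeValue)
	}
	if isWildcard(recipeValue) {
		// A generic recipe matches every specific query.
		return true
	}
	return recipeValue == queryValue
}

func main() {
	fmt.Println(matchesField("any", "h100"))  // recipe wildcard
	fmt.Println(matchesField("h100", "any"))  // query wildcard
	fmt.Println(matchesField("h100", "h100")) // exact match
	fmt.Println(matchesField("h100", "a100")) // mismatch
}
```

The order of the checks matters: testing the query first is what makes the match asymmetric, because a wildcard query is rejected by a specific recipe before the recipe-wildcard rule is ever consulted.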

func ResetComponentRegistryForTesting added in v0.7.8

func ResetComponentRegistryForTesting()

ResetComponentRegistryForTesting drops every cached registry so the next GetComponentRegistryFor call rebuilds from source. This must only be called from tests.

func ResetCriteriaRegistryForTesting added in v0.14.0

func ResetCriteriaRegistryForTesting()

ResetCriteriaRegistryForTesting drops every cached criteria registry so the next GetCriteriaRegistryFor call rebuilds from scratch. This must only be called from tests.

func ResetMetadataStoreForTesting added in v0.12.1

func ResetMetadataStoreForTesting()

ResetMetadataStoreForTesting clears every cached metadata store so tests can reload against fresh providers without leaking state across cases. Must only be called from tests.

func Select added in v0.11.0

func Select(hydrated map[string]any, selector string) (any, error)

Select walks a dot-path selector against a hydrated map and returns the value at that path. Returns ErrCodeNotFound for invalid paths. An empty selector returns the entire map.
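
A dot-path walk of this kind can be sketched in a few lines. The version below is an assumption-laden illustration, not the package's Select: selectPath and errNotFound are hypothetical (errNotFound stands in for an error carrying ErrCodeNotFound), only nested map[string]any values are traversed, and list indexing or keys that themselves contain dots are not handled.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNotFound stands in for the package's not-found error.
var errNotFound = errors.New("not found")

// selectPath walks selector, one dot-separated key at a time, through
// nested maps. An empty selector returns the whole map.
func selectPath(hydrated map[string]any, selector string) (any, error) {
	if selector == "" {
		return hydrated, nil
	}
	var current any = hydrated
	for _, key := range strings.Split(selector, ".") {
		m, ok := current.(map[string]any)
		if !ok {
			return nil, fmt.Errorf("%w: cannot descend into %q in %q", errNotFound, key, selector)
		}
		next, ok := m[key]
		if !ok {
			return nil, fmt.Errorf("%w: %q", errNotFound, selector)
		}
		current = next
	}
	return current, nil
}

func main() {
	hydrated := map[string]any{
		"components": map[string]any{
			"gpu-operator": map[string]any{"version": "v25.3.3"},
		},
	}
	v, err := selectPath(hydrated, "components.gpu-operator.version")
	fmt.Println(v, err)

	_, err = selectPath(hydrated, "components.missing")
	fmt.Println(errors.Is(err, errNotFound))
}
```

The layout of the hydrated map in main is invented for the example; the real structure is whatever HydrateResult produces.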

Types

type AllowLists

type AllowLists struct {
	// Accelerators is the list of allowed accelerator types (e.g., "h100", "l40").
	// If empty, all accelerator types are allowed.
	Accelerators []CriteriaAcceleratorType

	// Services is the list of allowed service types (e.g., "eks", "gke").
	// If empty, all service types are allowed.
	Services []CriteriaServiceType

	// Intents is the list of allowed intent types (e.g., "training", "inference").
	// If empty, all intent types are allowed.
	Intents []CriteriaIntentType

	// OSTypes is the list of allowed OS types (e.g., "ubuntu", "rhel").
	// If empty, all OS types are allowed.
	OSTypes []CriteriaOSType
}

AllowLists defines which criteria values are permitted for API requests. An empty or nil slice means all values are allowed for that criteria type. This is used by the API server to restrict which values can be requested, while the CLI always allows all values.

func ParseAllowListsFromEnv

func ParseAllowListsFromEnv() (*AllowLists, error)

ParseAllowListsFromEnv parses allowlist configuration from environment variables. Returns nil if no allowlist environment variables are set. Environment variables:

  • AICR_ALLOWED_ACCELERATORS: comma-separated list of accelerator types (e.g., "h100,l40")
  • AICR_ALLOWED_SERVICES: comma-separated list of service types (e.g., "eks,gke")
  • AICR_ALLOWED_INTENTS: comma-separated list of intent types (e.g., "training,inference")
  • AICR_ALLOWED_OS: comma-separated list of OS types (e.g., "ubuntu,rhel")

Invalid values in the environment variables are skipped with a warning logged.
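
The parsing of one such variable might look like the sketch below. It is illustrative only: parseAllowList is hypothetical, the set of known values is passed in by the caller, and trimming whitespace and lowercasing entries are assumptions rather than documented behavior. Skipped entries are returned so the caller can log the warning.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseAllowList splits a comma-separated value into entries, keeping those
// found in known and collecting the rest so the caller can warn about them.
func parseAllowList(raw string, known map[string]bool) (allowed, skipped []string) {
	for _, part := range strings.Split(raw, ",") {
		v := strings.ToLower(strings.TrimSpace(part))
		if v == "" {
			continue // tolerate empty segments such as a trailing comma
		}
		if !known[v] {
			skipped = append(skipped, v)
			continue
		}
		allowed = append(allowed, v)
	}
	return allowed, skipped
}

func main() {
	known := map[string]bool{"h100": true, "l40": true, "a100": true}

	raw, ok := os.LookupEnv("AICR_ALLOWED_ACCELERATORS")
	if !ok {
		raw = "h100, l40,tpu" // demo value used when the variable is unset
	}

	allowed, skipped := parseAllowList(raw, known)
	fmt.Println("allowed:", allowed)
	fmt.Println("skipped:", skipped)
}
```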

func (*AllowLists) AcceleratorStrings

func (a *AllowLists) AcceleratorStrings() []string

AcceleratorStrings returns the allowed accelerator types as strings.

func (*AllowLists) IntentStrings

func (a *AllowLists) IntentStrings() []string

IntentStrings returns the allowed intent types as strings.

func (*AllowLists) IsEmpty

func (a *AllowLists) IsEmpty() bool

IsEmpty returns true if no allowlists are configured (all values allowed).

func (*AllowLists) OSTypeStrings

func (a *AllowLists) OSTypeStrings() []string

OSTypeStrings returns the allowed OS types as strings.

func (*AllowLists) ServiceStrings

func (a *AllowLists) ServiceStrings() []string

ServiceStrings returns the allowed service types as strings.

func (*AllowLists) ValidateCriteria

func (a *AllowLists) ValidateCriteria(c *Criteria) error

ValidateCriteria checks if the given criteria values are permitted by the allowlists. Returns nil if validation passes, or an error with details about what value is not allowed. The "any" value is always allowed regardless of the allowlist configuration.
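
The two documented rules (an empty list allows everything, and "any" is always allowed) can be sketched as follows. The permitted and validate functions and the dimension type are hypothetical and use plain strings instead of the typed criteria enums; the error wording is invented.

```go
package main

import "fmt"

// anyValue mirrors the documented wildcard literal.
const anyValue = "any"

// permitted applies the allowlist rules to a single value.
func permitted(allowList []string, value string) bool {
	if len(allowList) == 0 || value == anyValue {
		return true // no restriction configured, or the wildcard was requested
	}
	for _, a := range allowList {
		if a == value {
			return true
		}
	}
	return false
}

// dimension pairs one criteria value with the allowlist that governs it.
type dimension struct {
	Name      string
	Value     string
	AllowList []string
}

// validate returns an error naming the first value that is not permitted.
func validate(dims []dimension) error {
	for _, d := range dims {
		if !permitted(d.AllowList, d.Value) {
			return fmt.Errorf("%s %q is not allowed (allowed: %v)", d.Name, d.Value, d.AllowList)
		}
	}
	return nil
}

func main() {
	dims := []dimension{
		{Name: "accelerator", Value: "h100", AllowList: []string{"h100", "l40"}},
		{Name: "service", Value: "any", AllowList: []string{"eks"}},
		{Name: "intent", Value: "training"}, // no allowlist: everything passes
	}
	fmt.Println(validate(dims))

	dims[0].Value = "a100"
	fmt.Println(validate(dims))
}
```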

type Builder

type Builder struct {
	Version    string
	AllowLists *AllowLists
	// contains filtered or unexported fields
}

Builder constructs RecipeResult payloads based on Criteria specifications. It loads recipe metadata, applies matching overlays, and generates tailored configuration recipes.

func NewBuilder

func NewBuilder(opts ...Option) *Builder

NewBuilder creates a new Builder instance with the provided functional options.

func (*Builder) BuildFromCriteria

func (b *Builder) BuildFromCriteria(ctx context.Context, c *Criteria) (*RecipeResult, error)

BuildFromCriteria creates a RecipeResult payload for the provided criteria. It loads the metadata store, applies matching overlays, and returns a RecipeResult with merged components and computed deployment order.

func (*Builder) BuildFromCriteriaWithEvaluator

func (b *Builder) BuildFromCriteriaWithEvaluator(ctx context.Context, c *Criteria, evaluator ConstraintEvaluatorFunc) (*RecipeResult, error)

BuildFromCriteriaWithEvaluator creates a RecipeResult payload for the provided criteria, filtering overlays based on constraint evaluation against snapshot data.

When an evaluator function is provided:

  • Overlays that match by criteria but fail constraint evaluation are excluded
  • Constraint warnings are included in the result metadata for visibility
  • Only overlays whose constraints pass (or have no constraints) are merged

The evaluator function is typically created by wrapping validator.EvaluateConstraint with the snapshot data.

func (*Builder) DataProvider added in v0.14.0

func (b *Builder) DataProvider() DataProvider

DataProvider returns the Builder's bound provider, or nil if none is set and the package-global will be used.

type ComponentConfig

type ComponentConfig struct {
	// Name is the component identifier used in recipes (e.g., "gpu-operator").
	Name string `yaml:"name"`

	// DisplayName is the human-readable name used in templates and output.
	DisplayName string `yaml:"displayName"`

	// ValueOverrideKeys are alternative keys for --set flag matching.
	// Example: ["gpuoperator"] allows --set gpuoperator:key=value
	ValueOverrideKeys []string `yaml:"valueOverrideKeys,omitempty"`

	// Helm contains default Helm chart settings.
	Helm HelmConfig `yaml:"helm,omitempty"`

	// Kustomize contains default Kustomize settings.
	Kustomize KustomizeConfig `yaml:"kustomize,omitempty"`

	// NodeScheduling defines paths for injecting node selectors and tolerations.
	NodeScheduling NodeSchedulingConfig `yaml:"nodeScheduling,omitempty"`

	PodScheduling PodSchedulingConfig `yaml:"podScheduling,omitempty"`

	// StorageClassPaths are Helm value paths where the storage class name is injected.
	// When --storage-class is provided at bundle time, the value is written to each path.
	StorageClassPaths []string `yaml:"storageClassPaths,omitempty"`

	// Validations defines component-specific validation checks.
	Validations []ComponentValidationConfig `yaml:"validations,omitempty"`

	// HealthCheck defines custom health check configuration for this component.
	HealthCheck HealthCheckConfig `yaml:"healthCheck,omitempty"`

	// GKECriticalPriority signals that the component's default chart manifests
	// include pods with `priorityClassName: system-node-critical` or
	// `system-cluster-critical`. When true and the recipe's
	// `criteria.service` is "gke", the bundler synthesizes a permissive
	// ResourceQuota into the component's namespace (PreManifestFiles phase)
	// so GKE Standard's ResourceQuota admission plugin admits the pods.
	//
	// GKE Standard ships a kube-system ResourceQuota scoped to
	// `system-*-critical` PriorityClasses; per the Kubernetes spec, once
	// any quota in the cluster scopes by PriorityClass for those values,
	// pods that request a matching priority class can only be created in
	// namespaces that have a matching quota. Other services (EKS, AKS,
	// OKE, bare-metal) do not ship this default and are unaffected.
	//
	// See https://github.com/NVIDIA/aicr/issues/915.
	GKECriticalPriority bool `yaml:"gkeCriticalPriority,omitempty"`

	// HasSelfRefCRDs signals that the component's chart contains both
	// a CRD (shipped under `crds/` or via a CRDs subchart) AND a
	// template that creates a CR of that kind in the SAME release.
	// helm-diff's render pass fails on such charts on a fresh cluster:
	// it renders templates and validates against the live REST mapper,
	// but the mapper does not yet know the CRD because helm only
	// applies `crds/` resources during `helm install` (not during
	// render). This flag instructs the helmfile deployer to emit
	// `disableValidation: true` on the release, telling helm-diff to
	// skip the mapper check for that release only.
	//
	// Cross-chart CRD ordering — where one chart ships CRDs that
	// another chart's templates reference — is handled separately by
	// the helmfile bundler's DAG-stratified sub-helmfile layout (one
	// sub-helmfile per dependency level, processed sequentially). That
	// machinery reads ComponentRef.DependencyRefs and needs no
	// registry flag; correct dependencyRefs encode the ordering.
	// gpu-operator is the canonical example for self-reference: its
	// templates create a ClusterPolicy CR of the ClusterPolicy CRD it
	// ships in `crds/`. See https://github.com/NVIDIA/aicr/issues/914.
	HasSelfRefCRDs bool `yaml:"hasSelfRefCRDs,omitempty"`
}

ComponentConfig defines the bundler configuration for a component. This replaces the per-component Go packages with declarative YAML.

func (*ComponentConfig) GetAcceleratedNodeSelectorPaths

func (c *ComponentConfig) GetAcceleratedNodeSelectorPaths() []string

GetAcceleratedNodeSelectorPaths returns all accelerated node selector paths for a component.

func (*ComponentConfig) GetAcceleratedTaintStrPaths

func (c *ComponentConfig) GetAcceleratedTaintStrPaths() []string

GetAcceleratedTaintStrPaths returns all accelerated taint string paths for a component.

func (*ComponentConfig) GetAcceleratedTolerationPaths

func (c *ComponentConfig) GetAcceleratedTolerationPaths() []string

GetAcceleratedTolerationPaths returns all accelerated toleration paths for a component.

func (*ComponentConfig) GetNodeCountPaths added in v0.8.0

func (c *ComponentConfig) GetNodeCountPaths() []string

GetNodeCountPaths returns Helm value paths where the node count is injected.

func (*ComponentConfig) GetStorageClassPaths added in v0.13.0

func (c *ComponentConfig) GetStorageClassPaths() []string

GetStorageClassPaths returns Helm value paths where the storage class name is injected.

func (*ComponentConfig) GetSystemNodeSelectorPaths

func (c *ComponentConfig) GetSystemNodeSelectorPaths() []string

GetSystemNodeSelectorPaths returns all system node selector paths for a component.

func (*ComponentConfig) GetSystemTolerationPaths

func (c *ComponentConfig) GetSystemTolerationPaths() []string

GetSystemTolerationPaths returns all system toleration paths for a component.

func (*ComponentConfig) GetType

func (c *ComponentConfig) GetType() ComponentType

GetType returns the component deployment type based on which config is present. Returns ComponentTypeKustomize if Kustomize.DefaultSource is set, otherwise returns ComponentTypeHelm (the default).

func (*ComponentConfig) GetValidations

func (c *ComponentConfig) GetValidations() []ComponentValidationConfig

GetValidations returns all validation configurations for a component.

func (*ComponentConfig) GetWorkloadSelectorPaths

func (c *ComponentConfig) GetWorkloadSelectorPaths() []string

GetWorkloadSelectorPaths returns all workload selector paths for a component.

type ComponentRef

type ComponentRef struct {
	// Name is the unique identifier for this component.
	Name string `json:"name" yaml:"name"`

	// Namespace is the Kubernetes namespace for deploying this component.
	Namespace string `json:"namespace,omitempty" yaml:"namespace,omitempty"`

	// Chart is the Helm chart name (e.g., "gpu-operator").
	Chart string `json:"chart,omitempty" yaml:"chart,omitempty"`

	// Type is the deployment type (Helm, Kustomize).
	Type ComponentType `json:"type" yaml:"type"`

	// Source is the repository URL or OCI reference.
	Source string `json:"source" yaml:"source"`

	// Version is the chart/component version (for Helm).
	Version string `json:"version,omitempty" yaml:"version,omitempty"`

	// Tag is the image/resource tag (for Kustomize).
	Tag string `json:"tag,omitempty" yaml:"tag,omitempty"`

	// ValuesFile is the path to the values file (relative to data directory).
	ValuesFile string `json:"valuesFile,omitempty" yaml:"valuesFile,omitempty"`

	// Overrides contains inline values that override those from ValuesFile.
	// Merge order: base values → ValuesFile → Overrides (highest precedence).
	Overrides map[string]any `json:"overrides,omitempty" yaml:"overrides,omitempty"`

	// Patches is a list of patch files to apply (for Kustomize).
	Patches []string `json:"patches,omitempty" yaml:"patches,omitempty"`

	// DependencyRefs is a list of component names this component depends on.
	DependencyRefs []string `json:"dependencyRefs,omitempty" yaml:"dependencyRefs,omitempty"`

	// ManifestFiles lists manifest files to include in the component bundle.
	// Paths are relative to the data directory.
	// Example: ["components/network-operator/manifests/nfd-network-rule.yaml"]
	ManifestFiles []string `json:"manifestFiles,omitempty" yaml:"manifestFiles,omitempty"`

	// PreManifestFiles lists manifest files that must be bundled and applied
	// BEFORE the component's primary chart. Paths are relative to the data
	// directory; ".." segments are rejected at load time (external data
	// directories enforce a path-traversal check during file registration,
	// and embed.FS refuses any read that resolves outside its root), so a
	// recipe cannot read arbitrary files outside the embedded/external data
	// root. Used for resources the chart depends on (e.g. a Namespace with
	// PSS labels that the chart's pods need to land in). Bundler emits
	// these as a "<name>-pre" local-helm folder at sync-wave N-1 (Argo) or
	// install step N-1 (Helm); the primary chart lands at wave N; existing
	// ManifestFiles still land at wave N+1 as before.
	PreManifestFiles []string `json:"preManifestFiles,omitempty" yaml:"preManifestFiles,omitempty"`

	// Path is the path within the repository to the kustomization (for Kustomize).
	Path string `json:"path,omitempty" yaml:"path,omitempty"`

	// Cleanup indicates whether to uninstall this component after validation.
	// Used for validation infrastructure components (e.g., nccl-doctor).
	Cleanup bool `json:"cleanup,omitempty" yaml:"cleanup,omitempty"`

	// ExpectedResources lists Kubernetes resources that should exist after deployment.
	// Used by deployment phase validation to verify component health.
	ExpectedResources []ExpectedResource `json:"expectedResources,omitempty" yaml:"expectedResources,omitempty"`

	// HealthCheckAsserts contains raw Chainsaw-style assert YAML loaded from the
	// registry's healthCheck.assertFile via the DataProvider. When non-empty, the
	// expected-resources check runs Chainsaw CLI to evaluate assertions instead of
	// the default auto-discovery + typed replica checks.
	HealthCheckAsserts string `json:"healthCheckAsserts,omitempty" yaml:"healthCheckAsserts,omitempty"`
}

ComponentRef represents a reference to a deployable component.
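
The merge order documented on Overrides (base values, then ValuesFile, then Overrides) can be sketched as a recursive map merge. This is a minimal illustration under assumptions, not the package's implementation: mergeValues and the sample keys are hypothetical, and the real merge may treat lists or nil values differently.

```go
package main

import "fmt"

// mergeValues returns a new map containing dst overlaid with src.
// Nested maps are merged recursively; any other value in src replaces
// the value in dst. Hypothetical helper, not part of package recipe.
func mergeValues(dst, src map[string]any) map[string]any {
	out := make(map[string]any, len(dst))
	for k, v := range dst {
		out[k] = v
	}
	for k, v := range src {
		srcMap, srcOK := v.(map[string]any)
		dstMap, dstOK := out[k].(map[string]any)
		if srcOK && dstOK {
			out[k] = mergeValues(dstMap, srcMap)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	// Sample values are made up for the demonstration.
	base := map[string]any{"driver": map[string]any{"enabled": true, "version": "570"}}
	valuesFile := map[string]any{"driver": map[string]any{"version": "580"}}
	overrides := map[string]any{"driver": map[string]any{"enabled": false}}

	// Apply the lowest-precedence layer first and the highest last.
	merged := mergeValues(mergeValues(base, valuesFile), overrides)
	fmt.Println(merged)
}
```

Because each call returns a fresh map, the input layers are not mutated, which keeps registry defaults reusable across components.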

func (*ComponentRef) ApplyRegistryDefaults

func (ref *ComponentRef) ApplyRegistryDefaults(config *ComponentConfig)

ApplyRegistryDefaults fills in ComponentRef fields from the defaults in config. Only fields that are not already set in the ComponentRef are filled; explicitly set values are left untouched.

func (ComponentRef) IsEnabled added in v0.10.13

func (c ComponentRef) IsEnabled() bool

IsEnabled returns whether this component is enabled for deployment. A component is disabled when its Overrides map contains enabled: false. Components without an explicit enabled override are enabled by default.
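
The rule can be sketched as follows. This is an illustration, not the package's code: isEnabled is a hypothetical free function, and treating a non-boolean enabled value as "enabled" is an assumption.

```go
package main

import "fmt"

// isEnabled reports false only when overrides holds the boolean
// enabled: false. Assumption: a missing key or a non-boolean value
// counts as enabled.
func isEnabled(overrides map[string]any) bool {
	v, ok := overrides["enabled"]
	if !ok {
		return true
	}
	enabled, isBool := v.(bool)
	return !isBool || enabled
}

func main() {
	fmt.Println(isEnabled(nil))
	fmt.Println(isEnabled(map[string]any{"enabled": false}))
}
```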

type ComponentRegistry

type ComponentRegistry struct {
	APIVersion string            `yaml:"apiVersion"`
	Kind       string            `yaml:"kind"`
	Components []ComponentConfig `yaml:"components"`
	// contains filtered or unexported fields
}

ComponentRegistry holds the declarative configuration for all components. This is loaded from embedded recipe data (recipes/registry.yaml) at startup.

func GetComponentRegistry

func GetComponentRegistry() (*ComponentRegistry, error)

GetComponentRegistry returns the component registry for the package-global DataProvider. New callers — especially those that need per-tenant isolation — should use GetComponentRegistryFor directly with a caller-supplied provider.

func GetComponentRegistryFor added in v0.14.0

func GetComponentRegistryFor(dp DataProvider) (*ComponentRegistry, error)

GetComponentRegistryFor returns the component registry for the supplied DataProvider. Concurrent callers with the same provider observe the same singleton; distinct providers populate distinct cache entries and never share state. A nil provider falls back to the package-level embedded-data singleton so legacy entry points continue to work transparently.

A first-load *deterministic* error (file missing, schema invalid) is preserved by sync.Once and returned to every subsequent caller for the same provider until EvictCachedRegistry drops the entry — concurrent callers don't all re-run the same broken load. A first-load *transient* error (context.Canceled / DeadlineExceeded) is NOT cached: the entry is CompareAndDelete'd before returning so a follow-up call with a healthy ctx rebuilds from scratch, mirroring LoadMetadataStoreFor.
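
The caching contract above can be sketched with sync.Map and sync.Once. This is a simplified illustration, not the package's implementation: the cache and entry types, the string payload, and the sample errors are all hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// Sample errors used by the demo: one transient, one deterministic.
var (
	errTimeout = fmt.Errorf("fetch registry: %w", context.DeadlineExceeded)
	errSchema  = errors.New("schema invalid")
)

// entry holds the result of a single load attempt.
type entry struct {
	once sync.Once
	val  string
	err  error
}

// cache keeps one entry per key; keys must be comparable.
type cache struct {
	entries sync.Map // key -> *entry
}

func isTransient(err error) bool {
	return errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded)
}

// get runs load at most once per cached entry. A transient failure
// removes the entry so that a later call starts from scratch.
func (c *cache) get(key any, load func() (string, error)) (string, error) {
	actual, _ := c.entries.LoadOrStore(key, &entry{})
	e := actual.(*entry)
	e.once.Do(func() { e.val, e.err = load() })
	if isTransient(e.err) {
		// Delete only if the map still holds this exact entry.
		c.entries.CompareAndDelete(key, e)
	}
	return e.val, e.err
}

func main() {
	c := &cache{}
	attempts := 0
	load := func(err error) func() (string, error) {
		return func() (string, error) {
			attempts++
			if err != nil {
				return "", err
			}
			return "registry", nil
		}
	}
	_, err := c.get("tenant-a", load(errTimeout))
	fmt.Println("first call:", err)
	v, err := c.get("tenant-a", load(nil)) // retried: transient error was evicted
	fmt.Println("second call:", v, err)
	_, _ = c.get("tenant-b", load(errSchema))
	_, err = c.get("tenant-b", load(nil)) // not retried: deterministic error is kept
	fmt.Println("cached error:", err, "attempts:", attempts)
}
```

CompareAndDelete matters here because another goroutine may already have replaced the failed entry; deleting by key alone could discard a healthy result.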

func (*ComponentRegistry) Count

func (r *ComponentRegistry) Count() int

Count returns the number of components in the registry.

func (*ComponentRegistry) Get

Get returns the component configuration by name. Returns nil if the component is not found.

func (*ComponentRegistry) GetByOverrideKey

func (r *ComponentRegistry) GetByOverrideKey(key string) *ComponentConfig

GetByOverrideKey returns the component configuration by value override key. This is used for matching --set flags like --set gpuoperator:key=value. Returns nil if no component matches the key.

func (*ComponentRegistry) Names

func (r *ComponentRegistry) Names() []string

Names returns all component names in the registry.

func (*ComponentRegistry) Validate

func (r *ComponentRegistry) Validate() []error

Validate checks the component registry for errors. Returns a slice of validation errors (empty if valid).

type ComponentType

type ComponentType string

ComponentType represents the type of component deployment.

const (
	ComponentTypeHelm      ComponentType = "Helm"
	ComponentTypeKustomize ComponentType = "Kustomize"
)

ComponentType constants for supported deployment types.

type ComponentValidationConfig

type ComponentValidationConfig struct {
	// Function is the name of the validation function to execute (e.g., "CheckWorkloadSelectorMissing").
	Function string `yaml:"function"`

	// Severity determines whether failures are warnings or errors ("warning" or "error").
	Severity string `yaml:"severity"`

	// Conditions are optional conditions that must be met for the validation to run.
	// Values are arrays of strings for OR matching (single element arrays are equivalent to single values).
	// Example: {"intent": ["training"]} or {"intent": ["training", "inference"]}
	Conditions map[string][]string `yaml:"conditions,omitempty"`

	// Message is an optional detail message to append to validation failures/warnings.
	Message string `yaml:"message,omitempty"`
}

ComponentValidationConfig defines a component-specific validation check.
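
The OR matching described for Conditions might be evaluated as in the sketch below. The helper conditionsMet and the flat criteria map are hypothetical, and requiring every condition key to match (AND across keys) is an assumption that the field comments do not state.

```go
package main

import (
	"fmt"
	"slices"
)

// conditionsMet reports whether every condition key is satisfied by
// the criteria. Values listed under one key are alternatives (OR).
// Treating separate keys as AND is an assumption of this sketch.
func conditionsMet(conditions map[string][]string, criteria map[string]string) bool {
	for field, allowed := range conditions {
		if !slices.Contains(allowed, criteria[field]) {
			return false
		}
	}
	return true
}

func main() {
	conditions := map[string][]string{"intent": {"training", "inference"}}
	fmt.Println(conditionsMet(conditions, map[string]string{"intent": "inference"}))
	fmt.Println(conditionsMet(conditions, map[string]string{"intent": "any"}))
}
```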

type Constraint

type Constraint struct {
	// Name is the constraint identifier (e.g., "k8s", "worker-os").
	Name string `json:"name" yaml:"name"`

	// Value is the constraint expression (e.g., ">= 1.30", "ubuntu").
	Value string `json:"value" yaml:"value"`

	// Severity indicates the constraint severity ("error" or "warning").
	Severity string `json:"severity,omitempty" yaml:"severity,omitempty"`

	// Remediation provides actionable guidance for fixing failed constraints.
	Remediation string `json:"remediation,omitempty" yaml:"remediation,omitempty"`

	// Unit specifies the unit for numeric constraints (e.g., "GB/s").
	Unit string `json:"unit,omitempty" yaml:"unit,omitempty"`
}

Constraint represents a deployment constraint/assumption.

type ConstraintEvalResult

type ConstraintEvalResult struct {
	// Passed indicates if the constraint was satisfied.
	Passed bool

	// Actual is the actual value extracted from the snapshot.
	Actual string

	// Error contains the error if evaluation failed (e.g., value not found).
	Error error
}

ConstraintEvalResult represents the result of evaluating a single constraint. This mirrors the result from pkg/validator to avoid circular imports.

type ConstraintEvaluatorFunc

type ConstraintEvaluatorFunc func(constraint Constraint) ConstraintEvalResult

ConstraintEvaluatorFunc is a function type for evaluating constraints. It takes a constraint and returns the evaluation result. This function type allows the recipe package to use constraint evaluation from the validator package without creating a circular dependency.
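
The injection pattern can be sketched as below. The types are local stand-ins trimmed to the fields used; failedConstraints and the snapshot map are hypothetical and only show how a caller can evaluate constraints without importing the evaluator's package.

```go
package main

import "fmt"

// Local stand-ins for the documented types, trimmed to the fields used here.
type Constraint struct {
	Name  string
	Value string
}

type ConstraintEvalResult struct {
	Passed bool
	Actual string
	Error  error
}

type ConstraintEvaluatorFunc func(constraint Constraint) ConstraintEvalResult

// failedConstraints is a hypothetical consumer: it knows nothing about
// how evaluation works and only calls the injected function.
func failedConstraints(cs []Constraint, eval ConstraintEvaluatorFunc) []string {
	var failed []string
	for _, c := range cs {
		if res := eval(c); res.Error != nil || !res.Passed {
			failed = append(failed, c.Name)
		}
	}
	return failed
}

func main() {
	// A fixed map plays the role of the snapshot that a real evaluator
	// would inspect.
	snapshot := map[string]string{"worker-os": "ubuntu"}
	eval := func(c Constraint) ConstraintEvalResult {
		actual, ok := snapshot[c.Name]
		if !ok {
			return ConstraintEvalResult{Error: fmt.Errorf("value not found: %s", c.Name)}
		}
		return ConstraintEvalResult{Passed: actual == c.Value, Actual: actual}
	}
	cs := []Constraint{{Name: "worker-os", Value: "ubuntu"}, {Name: "k8s", Value: ">= 1.30"}}
	fmt.Println(failedConstraints(cs, eval))
}
```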

type ConstraintWarning

type ConstraintWarning struct {
	// Overlay is the name of the overlay that was excluded.
	Overlay string `json:"overlay" yaml:"overlay"`

	// Constraint is the name of the constraint that failed.
	Constraint string `json:"constraint" yaml:"constraint"`

	// Expected is the expected constraint value.
	Expected string `json:"expected" yaml:"expected"`

	// Actual is the actual value from the snapshot (if found).
	Actual string `json:"actual,omitempty" yaml:"actual,omitempty"`

	// Reason explains why the constraint evaluation resulted in exclusion.
	Reason string `json:"reason" yaml:"reason"`
}

ConstraintWarning represents a warning about an overlay that matched criteria but was excluded due to failing constraint validation against the snapshot.

type Criteria

type Criteria struct {
	// Service is the Kubernetes service type (eks, gke, aks, oke, kind, lke, bcm).
	Service CriteriaServiceType `json:"service,omitempty" yaml:"service,omitempty"`

	// Accelerator is the GPU/accelerator type (h100, h200, gb200, b200, a100, l40, rtx-pro-6000).
	Accelerator CriteriaAcceleratorType `json:"accelerator,omitempty" yaml:"accelerator,omitempty"`

	// Intent is the workload intent (training, inference).
	Intent CriteriaIntentType `json:"intent,omitempty" yaml:"intent,omitempty"`

	// OS is the worker node operating system type.
	OS CriteriaOSType `json:"os,omitempty" yaml:"os,omitempty"`

	// Platform is the platform/framework type (dynamo, kubeflow, nim, runai, slurm).
	Platform CriteriaPlatformType `json:"platform,omitempty" yaml:"platform,omitempty"`

	// Nodes is the number of worker nodes (0 means any/unspecified).
	Nodes int `json:"nodes,omitempty" yaml:"nodes,omitempty"`
}

Criteria represents the input parameters for recipe matching. All fields are optional and default to "any" if not specified.

func BuildCriteriaWithRegistry added in v0.14.0

func BuildCriteriaWithRegistry(reg *CriteriaRegistry, opts ...RegistryCriteriaOption) (*Criteria, error)

BuildCriteriaWithRegistry creates a Criteria from registry-aware options, resolving each field against the supplied registry. This is the path per-provider callers (e.g., the CLI holding a registry from GetCriteriaRegistryFor) use to build and validate criteria against a specific provider's registered values rather than the package global.

A nil reg falls back to a fresh ephemeral registry (NewCriteriaRegistry) so the call is still well-defined for callers that have not yet bound a provider — only the hardcoded OSS fast-path values will validate.

func LoadCriteriaFromFile

func LoadCriteriaFromFile(path string, reg *CriteriaRegistry) (*Criteria, error)

LoadCriteriaFromFile loads criteria from a YAML or JSON file. The file format is auto-detected from the file extension. All fields are optional and default to "any" if not specified.

Example file (YAML):

kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: gb200-eks-ubuntu-training
spec:
  service: eks
  os: ubuntu
  accelerator: gb200
  intent: training

func LoadCriteriaFromFileWithContext

func LoadCriteriaFromFileWithContext(ctx context.Context, path string, reg *CriteriaRegistry) (*Criteria, error)

LoadCriteriaFromFileWithContext loads criteria from a YAML or JSON file with context support. The file format is auto-detected from the file extension. All fields are optional and default to "any" if not specified.

For HTTP/HTTPS URLs, the context is used for timeout and cancellation. For local file paths, the context is currently not used but is accepted for API consistency.

Example file (YAML):

kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: gb200-eks-ubuntu-training
spec:
  service: eks
  os: ubuntu
  accelerator: gb200
  intent: training

func NewCriteria

func NewCriteria() *Criteria

NewCriteria creates a new Criteria with all fields set to "any".

func ParseCriteriaFromBody

func ParseCriteriaFromBody(body io.Reader, contentType string, reg *CriteriaRegistry) (*Criteria, error)

ParseCriteriaFromBody parses criteria from an io.Reader (HTTP request body). Supports JSON and YAML based on the Content-Type header. All fields are optional and default to "any" if not specified.

Supported Content-Types:

  • application/json
  • application/x-yaml, application/yaml, text/yaml

If Content-Type is empty or unrecognized, JSON is assumed.

Example JSON body:

{
  "kind": "RecipeCriteria",
  "apiVersion": "aicr.nvidia.com/v1alpha1",
  "metadata": {"name": "my-criteria"},
  "spec": {"service": "eks", "accelerator": "h100"}
}
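
The Content-Type selection described above might look like the following sketch. bodyFormat is a hypothetical helper; ignoring parameters such as charset and lowercasing the media type are assumptions about how the header is normalized.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// bodyFormat maps a Content-Type header to "json" or "yaml".
// Empty, malformed, or unrecognized values fall back to "json".
func bodyFormat(contentType string) string {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Fall back to the raw header, minus any parameters.
		mediaType, _, _ = strings.Cut(contentType, ";")
		mediaType = strings.ToLower(strings.TrimSpace(mediaType))
	}
	switch mediaType {
	case "application/x-yaml", "application/yaml", "text/yaml":
		return "yaml"
	default:
		return "json"
	}
}

func main() {
	for _, ct := range []string{"application/json", "text/yaml; charset=utf-8", "", "text/plain"} {
		fmt.Printf("%q -> %s\n", ct, bodyFormat(ct))
	}
}
```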

func ParseCriteriaFromRequest

func ParseCriteriaFromRequest(r *http.Request, reg *CriteriaRegistry) (*Criteria, error)

ParseCriteriaFromRequest parses recipe criteria from HTTP query parameters, validating each enum value against reg so non-OSS values contributed by a `--data` overlay are honored. A nil reg falls back to a fresh ephemeral registry (only the hardcoded OSS fast-path values will validate). All parameters are optional and default to "any" if not specified. Supported parameters: service, accelerator (alias: gpu), intent, os, platform, nodes.

func ParseCriteriaFromValues

func ParseCriteriaFromValues(values url.Values, reg *CriteriaRegistry) (*Criteria, error)

ParseCriteriaFromValues parses recipe criteria from URL values, validating each enum value against reg (a nil reg falls back to a fresh ephemeral registry — only hardcoded OSS values will validate). All parameters are optional and default to "any" if not specified. Supported parameters: service, accelerator (alias: gpu), intent, os, platform, nodes.
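
A simplified sketch of the query-parameter handling, including the gpu alias. The query struct and parseQueryString are hypothetical; registry validation is omitted, and both the precedence of accelerator over gpu and the rejection of negative node counts are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// query is a simplified stand-in for Criteria using plain strings.
type query struct {
	Service, Accelerator, Intent, OS, Platform string
	Nodes                                      int
}

// orAny substitutes the wildcard for an omitted parameter.
func orAny(s string) string {
	if s == "" {
		return "any"
	}
	return s
}

// parseQueryString decodes a raw query string such as
// "service=eks&gpu=h100". Enum validation against a registry is omitted.
func parseQueryString(raw string) (query, error) {
	values, err := url.ParseQuery(raw)
	if err != nil {
		return query{}, err
	}
	accel := values.Get("accelerator")
	if accel == "" {
		accel = values.Get("gpu") // alias; precedence is an assumption
	}
	q := query{
		Service:     orAny(values.Get("service")),
		Accelerator: orAny(accel),
		Intent:      orAny(values.Get("intent")),
		OS:          orAny(values.Get("os")),
		Platform:    orAny(values.Get("platform")),
	}
	if n := values.Get("nodes"); n != "" {
		nodes, err := strconv.Atoi(n)
		if err != nil || nodes < 0 {
			return query{}, fmt.Errorf("invalid nodes value %q", n)
		}
		q.Nodes = nodes
	}
	return q, nil
}

func main() {
	q, err := parseQueryString("service=eks&gpu=h100&nodes=8")
	fmt.Printf("%+v %v\n", q, err)
}
```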

func (*Criteria) Matches

func (c *Criteria) Matches(other *Criteria) bool

Matches checks if this recipe criteria matches the given query criteria. Uses asymmetric matching:

  • Query "any" (or empty) = ONLY matches recipes that are also "any"/empty for that field
  • Recipe "any" (or empty) = wildcard (matches any query value for that field)
  • Query specific + Recipe specific = must match exactly

This ensures a generic query (e.g., accelerator=any) only matches generic recipes (e.g., accelerator=any), while a specific query (e.g., accelerator=gb200) can match both generic recipes and recipes with that specific value.
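
The three rules reduce to a small per-field predicate. The sketch below uses plain strings and hypothetical helper names (isAny, fieldMatches); the real method presumably applies the same rule to each typed field in turn.

```go
package main

import "fmt"

// isAny treats both the empty string and "any" as the wildcard.
func isAny(v string) bool { return v == "" || v == "any" }

// fieldMatches applies the asymmetric rule to a single field.
func fieldMatches(recipe, query string) bool {
	if isAny(recipe) {
		return true // a generic recipe accepts every query value
	}
	if isAny(query) {
		return false // a generic query never selects a specific recipe
	}
	return recipe == query // both specific: exact match required
}

func main() {
	fmt.Println(fieldMatches("any", "gb200"))   // generic recipe, specific query
	fmt.Println(fieldMatches("gb200", "any"))   // specific recipe, generic query
	fmt.Println(fieldMatches("gb200", "gb200")) // both specific and equal
}
```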

func (*Criteria) Specificity

func (c *Criteria) Specificity() int

Specificity returns a score indicating how specific this criteria is. Higher scores mean more specific criteria (fewer "any" fields). Used for ordering overlay application: more specific overlays are applied later.
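
One way to picture the scoring and ordering is sketched below. The criteria struct is a simplified stand-in, and awarding one point per concrete field (including a non-zero node count) is an assumption; the actual weights are not documented here.

```go
package main

import (
	"fmt"
	"sort"
)

// criteria is a simplified stand-in for Criteria using plain strings.
type criteria struct {
	Service, Accelerator, Intent, OS, Platform string
	Nodes                                      int
}

// specificity counts the fields pinned to a concrete value.
// One point per field is an assumption of this sketch.
func specificity(c criteria) int {
	score := 0
	for _, v := range []string{c.Service, c.Accelerator, c.Intent, c.OS, c.Platform} {
		if v != "" && v != "any" {
			score++
		}
	}
	if c.Nodes > 0 {
		score++
	}
	return score
}

func main() {
	overlays := []criteria{
		{Service: "eks", Accelerator: "gb200", Intent: "training"},
		{},
		{Service: "eks"},
	}
	// Least specific first, so that more specific overlays are applied later.
	sort.SliceStable(overlays, func(i, j int) bool {
		return specificity(overlays[i]) < specificity(overlays[j])
	})
	for _, o := range overlays {
		fmt.Println(specificity(o), o)
	}
}
```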

func (*Criteria) String

func (c *Criteria) String() string

String returns a human-readable representation of the criteria.

func (*Criteria) Validate added in v0.11.0

func (c *Criteria) Validate() error

Validate checks that all non-empty criteria fields contain valid values against a fresh ephemeral registry (only the hardcoded OSS fast-path values will validate). Use ValidateWithRegistry to honor `--data` overlay values.

func (*Criteria) ValidateWithRegistry added in v0.14.0

func (c *Criteria) ValidateWithRegistry(reg *CriteriaRegistry) error

ValidateWithRegistry checks that all non-empty criteria fields contain valid values against reg. A nil reg falls back to a fresh ephemeral registry (only the hardcoded OSS fast-path values will validate).

type CriteriaAcceleratorType

type CriteriaAcceleratorType string

CriteriaAcceleratorType represents the GPU/accelerator type.

const (
	CriteriaAcceleratorAny        CriteriaAcceleratorType = "any"
	CriteriaAcceleratorH100       CriteriaAcceleratorType = "h100"
	CriteriaAcceleratorH200       CriteriaAcceleratorType = "h200"
	CriteriaAcceleratorGB200      CriteriaAcceleratorType = "gb200"
	CriteriaAcceleratorB200       CriteriaAcceleratorType = "b200"
	CriteriaAcceleratorA100       CriteriaAcceleratorType = "a100"
	CriteriaAcceleratorL40        CriteriaAcceleratorType = "l40"
	CriteriaAcceleratorRTXPro6000 CriteriaAcceleratorType = "rtx-pro-6000"
)

CriteriaAcceleratorType constants for supported accelerators.

type CriteriaField added in v0.14.0

type CriteriaField string

CriteriaField enumerates the criteria dimensions tracked by the registry. Using typed constants instead of bare strings prevents stringly-typed mismatches between registration sites and lookup sites.

const (
	// FieldService is the Kubernetes service criteria dimension
	// (e.g., eks, gke, aks, …).
	FieldService CriteriaField = "service"
	// FieldAccelerator is the GPU/accelerator criteria dimension
	// (e.g., h100, gb200, …).
	FieldAccelerator CriteriaField = "accelerator"
	// FieldIntent is the workload intent criteria dimension
	// (e.g., training, inference).
	FieldIntent CriteriaField = "intent"
	// FieldOS is the GPU node operating-system criteria dimension
	// (e.g., ubuntu, cos, …).
	FieldOS CriteriaField = "os"
	// FieldPlatform is the platform/framework criteria dimension
	// (e.g., kubeflow, nim, …).
	FieldPlatform CriteriaField = "platform"
)

type CriteriaIntentType

type CriteriaIntentType string

CriteriaIntentType represents the workload intent.

const (
	CriteriaIntentAny       CriteriaIntentType = "any"
	CriteriaIntentTraining  CriteriaIntentType = "training"
	CriteriaIntentInference CriteriaIntentType = "inference"
)

CriteriaIntentType constants for supported workload intents.

type CriteriaOSType

type CriteriaOSType string

CriteriaOSType represents an operating system type.

const (
	CriteriaOSAny         CriteriaOSType = oskind.Any
	CriteriaOSUbuntu      CriteriaOSType = oskind.Ubuntu
	CriteriaOSRHEL        CriteriaOSType = oskind.RHEL
	CriteriaOSCOS         CriteriaOSType = oskind.COS
	CriteriaOSAmazonLinux CriteriaOSType = oskind.AmazonLinux
	CriteriaOSTalos       CriteriaOSType = oskind.Talos
)

CriteriaOSType constants for supported operating systems. Values come from pkg/recipe/oskind (the single source of truth for OS string values shared across pkg/recipe, pkg/collector, pkg/k8s/agent, and the CLI).

type CriteriaOrigin added in v0.14.0

type CriteriaOrigin int

CriteriaOrigin identifies whether a registered value came from the embedded OSS catalog or from a runtime-mounted external catalog (--data). Strict mode uses the distinction to reject external-only values when validating the upstream catalog in CI.

const (
	// OriginEmbedded marks values contributed by overlays loaded from the
	// AICR binary's embedded data filesystem (i.e., the OSS catalog).
	OriginEmbedded CriteriaOrigin = iota
	// OriginExternal marks values contributed by overlays loaded from
	// `--data` (the per-invocation extension path).
	OriginExternal
)

type CriteriaPlatformType

type CriteriaPlatformType string

CriteriaPlatformType represents a platform/framework type.

const (
	CriteriaPlatformAny      CriteriaPlatformType = "any"
	CriteriaPlatformDynamo   CriteriaPlatformType = "dynamo"
	CriteriaPlatformKubeflow CriteriaPlatformType = "kubeflow"
	CriteriaPlatformNIM      CriteriaPlatformType = "nim"
	CriteriaPlatformRunai    CriteriaPlatformType = "runai"
	CriteriaPlatformSlurm    CriteriaPlatformType = "slurm"
)

CriteriaPlatformType constants for supported platforms.

type CriteriaRegistry added in v0.14.0

type CriteriaRegistry struct {
	// contains filtered or unexported fields
}

CriteriaRegistry holds the catalog-discovered set of valid criteria values per dimension, with origin tracking so strict mode can distinguish embedded (OSS) from external (--data) contributions.

The static switch arms inside ParseCriteria{Service,Accelerator,Intent,OS,Platform}Type remain the fast path for canonical / aliased values (e.g., "self-managed" → "any", "al2" → "amazonlinux"). Anything the switches do not recognize falls through to the registry on lookup.

func GetCriteriaRegistryFor added in v0.14.0

func GetCriteriaRegistryFor(dp DataProvider) *CriteriaRegistry

GetCriteriaRegistryFor returns the criteria registry for the supplied DataProvider, constructing it lazily on first access. Concurrent callers with the same provider observe the same singleton; distinct providers populate distinct cache entries and never share state. Each per-provider registry honors AICR_CRITERIA_STRICT at construction.

A nil provider returns a fresh ephemeral registry (not inserted into the cache) — only the hardcoded OSS fast-path values will validate. Callers that need overlay values to validate must pass a non-nil provider.

func NewCriteriaRegistry added in v0.14.0

func NewCriteriaRegistry() *CriteriaRegistry

NewCriteriaRegistry constructs an empty criteria registry, honoring AICR_CRITERIA_STRICT at construction. Use this when you need an ephemeral registry that is not tied to a DataProvider — e.g. config YAML validation that runs before the Client/Builder is wired and only needs the hardcoded OSS fast-path values to validate. For per-DataProvider registries (the common case — seeded by overlays loaded for a specific provider) use GetCriteriaRegistryFor(dp).

func (*CriteriaRegistry) AllAcceleratorTypes added in v0.14.0

func (r *CriteriaRegistry) AllAcceleratorTypes() []string

AllAcceleratorTypes returns the union of the static OSS list and values registered in this registry, sorted alphabetically.

func (*CriteriaRegistry) AllIntentTypes added in v0.14.0

func (r *CriteriaRegistry) AllIntentTypes() []string

AllIntentTypes returns the union of the static OSS list and values registered in this registry, sorted alphabetically.

func (*CriteriaRegistry) AllOSTypes added in v0.14.0

func (r *CriteriaRegistry) AllOSTypes() []string

AllOSTypes returns the union of the static OSS list and values registered in this registry, sorted alphabetically.

func (*CriteriaRegistry) AllPlatformTypes added in v0.14.0

func (r *CriteriaRegistry) AllPlatformTypes() []string

AllPlatformTypes returns the union of the static OSS list and values registered in this registry, sorted alphabetically.

func (*CriteriaRegistry) AllServiceTypes added in v0.14.0

func (r *CriteriaRegistry) AllServiceTypes() []string

AllServiceTypes returns the union of the static OSS list and values registered in this registry, sorted alphabetically.

func (*CriteriaRegistry) Has added in v0.14.0

func (r *CriteriaRegistry) Has(field CriteriaField, value string) bool

Has reports whether value is known for field. Outside strict mode the value's origin is ignored; in strict mode Has returns true only for values that originate from an embedded overlay.

func (*CriteriaRegistry) HasEmbedded added in v0.14.0

func (r *CriteriaRegistry) HasEmbedded(field CriteriaField, value string) bool

HasEmbedded reports whether value is known for field from the embedded OSS catalog, ignoring external contributions. Used by tests and by introspection paths that explicitly need the OSS-only set.

func (*CriteriaRegistry) IsStrict added in v0.14.0

func (r *CriteriaRegistry) IsStrict() bool

IsStrict reports whether the registry is in strict mode.

func (*CriteriaRegistry) ParseAccelerator added in v0.14.0

func (r *CriteriaRegistry) ParseAccelerator(s string) (CriteriaAcceleratorType, error)

ParseAccelerator parses a string into a CriteriaAcceleratorType against this registry. See (*CriteriaRegistry).ParseService for the registry-fallback contract that also applies here.

func (*CriteriaRegistry) ParseIntent added in v0.14.0

func (r *CriteriaRegistry) ParseIntent(s string) (CriteriaIntentType, error)

ParseIntent parses a string into a CriteriaIntentType against this registry. See (*CriteriaRegistry).ParseService for the registry-fallback contract.

func (*CriteriaRegistry) ParseOS added in v0.14.0

func (r *CriteriaRegistry) ParseOS(s string) (CriteriaOSType, error)

ParseOS parses a string into a CriteriaOSType against this registry. See (*CriteriaRegistry).ParseService for the registry-fallback contract.

func (*CriteriaRegistry) ParsePlatform added in v0.14.0

func (r *CriteriaRegistry) ParsePlatform(s string) (CriteriaPlatformType, error)

ParsePlatform parses a string into a CriteriaPlatformType against this registry. See (*CriteriaRegistry).ParseService for the registry-fallback contract.

func (*CriteriaRegistry) ParseService added in v0.14.0

func (r *CriteriaRegistry) ParseService(s string) (CriteriaServiceType, error)

ParseService parses a string into a CriteriaServiceType against this registry.

The switch arms in the function body are the canonical/aliased fast path for the embedded OSS catalog. Any value they do not recognize falls through to the registry, which the data provider seeds from loaded overlays (embedded + `--data`). This lets internal/proprietary service values (e.g., undisclosed NCPs) be admitted at runtime via `--data` without a binary rebuild.
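
The fast-path-then-registry contract can be sketched as follows. parseService here is a hypothetical free function over plain strings, with a map standing in for the registry; trimming, lowercasing, and mapping an empty input to "any" are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// parseService tries the static switch first and consults the registry
// only for values the switch does not recognize.
func parseService(s string, registered map[string]bool) (string, error) {
	v := strings.ToLower(strings.TrimSpace(s)) // normalization is an assumption
	switch v {
	case "", "any", "self-managed":
		return "any", nil // wildcard and its alias
	case "eks", "gke", "aks", "oke", "kind", "lke", "bcm":
		return v, nil // canonical OSS values
	}
	if registered[v] {
		return v, nil // value contributed by a loaded overlay
	}
	return "", fmt.Errorf("invalid service type %q", s)
}

func main() {
	overlayValues := map[string]bool{"acme-cloud": true} // made-up external value
	for _, in := range []string{"self-managed", "gke", "acme-cloud", "unknown"} {
		v, err := parseService(in, overlayValues)
		fmt.Println(in, "->", v, err)
	}
}
```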

func (*CriteriaRegistry) Register added in v0.14.0

func (r *CriteriaRegistry) Register(field CriteriaField, value string, origin CriteriaOrigin)

Register records value under field with the given origin. Empty values and the wildcard "any" are ignored — they never need to be registered because the Parse functions handle them via the fast path. If the same (field, value) pair has been registered before, OriginEmbedded wins over OriginExternal so that an external overlay redeclaring a public value does not downgrade the value's origin and silently break strict mode.

func (*CriteriaRegistry) Reset added in v0.14.0

func (r *CriteriaRegistry) Reset()

Reset clears all registered values and the strict flag. Intended for use in tests that need a clean slate between cases.

func (*CriteriaRegistry) SetStrict added in v0.14.0

func (r *CriteriaRegistry) SetStrict(strict bool)

SetStrict toggles strict-mode validation. In strict mode, registry lookups admit only OriginEmbedded values; external contributions are hidden as if they had never been registered. The upstream OSS CI gate turns this on (via AICR_CRITERIA_STRICT) so the embedded catalog cannot accidentally depend on internal-only values.
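
The interaction between Register, Has, and strict mode can be sketched with a nested map. This is an illustration only: the registry type below is hypothetical, uses string field names, and is not safe for concurrent use.

```go
package main

import "fmt"

type origin int

const (
	originEmbedded origin = iota
	originExternal
)

// registry maps field -> value -> origin.
type registry struct {
	strict bool
	values map[string]map[string]origin
}

// register records value under field. Empty values and "any" are
// skipped, and an embedded origin is never downgraded to external.
func (r *registry) register(field, value string, o origin) {
	if value == "" || value == "any" {
		return
	}
	if r.values == nil {
		r.values = map[string]map[string]origin{}
	}
	if r.values[field] == nil {
		r.values[field] = map[string]origin{}
	}
	if prev, ok := r.values[field][value]; ok && prev == originEmbedded {
		return
	}
	r.values[field][value] = o
}

// has hides external values while strict mode is on.
func (r *registry) has(field, value string) bool {
	o, ok := r.values[field][value]
	if !ok {
		return false
	}
	return !r.strict || o == originEmbedded
}

func main() {
	r := &registry{}
	r.register("service", "eks", originEmbedded)
	r.register("service", "eks", originExternal) // does not downgrade
	r.register("service", "acme", originExternal)
	fmt.Println(r.has("service", "eks"), r.has("service", "acme"))
	r.strict = true
	fmt.Println(r.has("service", "eks"), r.has("service", "acme"))
}
```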

func (*CriteriaRegistry) Values added in v0.14.0

func (r *CriteriaRegistry) Values(field CriteriaField) []string

Values returns the registered values for field as a sorted slice. In strict mode, only embedded values are returned. The result is a copy so callers may mutate it freely.

type CriteriaServiceType

type CriteriaServiceType string

CriteriaServiceType represents the Kubernetes service/platform type for criteria.

const (
	CriteriaServiceAny  CriteriaServiceType = "any"
	CriteriaServiceEKS  CriteriaServiceType = "eks"
	CriteriaServiceGKE  CriteriaServiceType = "gke"
	CriteriaServiceAKS  CriteriaServiceType = "aks"
	CriteriaServiceOKE  CriteriaServiceType = "oke"
	CriteriaServiceKind CriteriaServiceType = "kind"
	CriteriaServiceLKE  CriteriaServiceType = "lke"
	CriteriaServiceBCM  CriteriaServiceType = "bcm"
)

CriteriaServiceType constants for supported Kubernetes services.

type DataProvider

type DataProvider interface {
	// ReadFile reads a file by path (relative to data directory). Returns
	// the context's error if it is canceled before the read completes.
	ReadFile(ctx context.Context, path string) ([]byte, error)

	// WalkDir walks the directory tree rooted at root. Returns the
	// context's error if it is canceled mid-walk.
	WalkDir(ctx context.Context, root string, fn fs.WalkDirFunc) error

	// Source returns a description of where data came from (for debugging).
	Source(path string) string
}

DataProvider abstracts access to recipe data files. This allows layering external directories over embedded data.

Implementations must be comparable per the Go language spec: per-provider caches (the metadata store, component registry, and criteria registry) key by interface value via sync.Map, which panics at runtime if the dynamic type is non-comparable (e.g., a struct containing a slice, map, or func field). The safe and idiomatic shape is methods on a pointer receiver, as the in-tree EmbeddedDataProvider / LayeredDataProvider do.
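The comparability requirement can be checked with a short experiment. The two provider types below are hypothetical and implement only Source for brevity; the point is the difference between keying a sync.Map by a pointer and keying it by a struct value that contains a slice.

```go
package main

import (
	"fmt"
	"sync"
)

// pointerProvider is a hypothetical provider with a pointer receiver.
// A *pointerProvider is comparable even though the struct holds a slice.
type pointerProvider struct {
	layers []string
}

func (p *pointerProvider) Source(path string) string { return "pointer" }

// valueProvider is a hypothetical provider with a value receiver. Its
// slice field makes the struct type itself non-comparable.
type valueProvider struct {
	layers []string
}

func (v valueProvider) Source(path string) string { return "value" }

// usableAsKey reports whether key can be stored in a sync.Map. It recovers
// the runtime panic raised when the key's dynamic type is not hashable.
func usableAsKey(key any) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	var cache sync.Map
	cache.Store(key, struct{}{})
	return true
}

func main() {
	fmt.Println("pointer receiver:", usableAsKey(&pointerProvider{layers: []string{"a"}}))
	fmt.Println("value receiver:  ", usableAsKey(valueProvider{layers: []string{"a"}}))
}
```

A check of this shape could sit in a unit test for a custom provider, so a non-comparable implementation fails in CI rather than at first cache lookup.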

ReadFile and WalkDir accept a context.Context. Cancellation is honored at I/O boundaries — before each file open, and between WalkDir entries — not mid-syscall on an in-flight read. The in-tree LayeredDataProvider reads external files through os.Open + io.ReadAll, which are not cancelable once started; a caller that cancels mid-syscall (e.g. on a stalled NFS / sshfs mount mid-readv) will see the cancellation honored on the *next* file the walk touches, not on the one currently blocked. Embedded reads are in-memory and cannot hang; they honor cancellation before each I/O for symmetry. Source is pure metadata and is not context-aware.
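A minimal sketch of cancellation honored between WalkDir entries, assuming an in-memory filesystem from testing/fstest. The helper names `walkWithContext` and `visitUntil` are invented for this example and are not part of the package; the in-tree providers may structure the check differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// walkWithContext checks ctx before handing each entry to fn, so a
// cancellation is observed at the next entry boundary, not mid-read.
func walkWithContext(ctx context.Context, fsys fs.FS, root string, fn fs.WalkDirFunc) error {
	return fs.WalkDir(fsys, root, func(path string, d fs.DirEntry, err error) error {
		if ctxErr := ctx.Err(); ctxErr != nil {
			return ctxErr
		}
		return fn(path, d, err)
	})
}

// visitUntil cancels the context after limit entries. It returns how many
// entries were visited and whether the walk ended with context.Canceled.
func visitUntil(limit int) (int, bool) {
	fsys := fstest.MapFS{
		"recipes/a.yaml": {Data: []byte("a")},
		"recipes/b.yaml": {Data: []byte("b")},
		"recipes/c.yaml": {Data: []byte("c")},
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	visited := 0
	err := walkWithContext(ctx, fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		visited++
		if visited == limit {
			cancel() // takes effect before the next entry is visited
		}
		return nil
	})
	return visited, errors.Is(err, context.Canceled)
}

func main() {
	visited, canceled := visitUntil(2)
	fmt.Println(visited, canceled)
}
```

The entry that triggers the cancel still completes; only the following entry sees the context error, which mirrors the "next file the walk touches" behavior described above.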

type EmbeddedDataProvider

type EmbeddedDataProvider struct {
	// contains filtered or unexported fields
}

EmbeddedDataProvider wraps an embed.FS to implement DataProvider.

func NewEmbeddedDataProvider

func NewEmbeddedDataProvider(efs embed.FS, prefix string) *EmbeddedDataProvider

NewEmbeddedDataProvider creates a provider from an embedded filesystem.

func (*EmbeddedDataProvider) ReadFile

func (p *EmbeddedDataProvider) ReadFile(ctx context.Context, path string) ([]byte, error)

ReadFile reads a file from the embedded filesystem.

func (*EmbeddedDataProvider) Source

func (p *EmbeddedDataProvider) Source(path string) string

Source returns "embedded" for all paths.

func (*EmbeddedDataProvider) WalkDir

func (p *EmbeddedDataProvider) WalkDir(ctx context.Context, root string, fn fs.WalkDirFunc) error

WalkDir walks the embedded filesystem.

type ExcludedOverlay added in v0.12.0

type ExcludedOverlay struct {
	// Name is the excluded overlay name.
	Name string `json:"name" yaml:"name"`

	// Reason identifies why the overlay was excluded.
	Reason ExcludedOverlayReason `json:"reason,omitempty" yaml:"reason,omitempty"`
}

ExcludedOverlay records a matching overlay that was excluded from the final recipe result, along with a machine-readable reason.

func (*ExcludedOverlay) UnmarshalJSON added in v0.12.0

func (e *ExcludedOverlay) UnmarshalJSON(data []byte) error

UnmarshalJSON accepts both the legacy string form and the current object form.
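One way to accept both forms is to try the string decode first and fall back to the object decode. The sketch below uses a local `excludedOverlay` type with a plain string Reason; it illustrates the technique and is not the package's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// excludedOverlay is a local stand-in that mirrors the documented fields.
type excludedOverlay struct {
	Name   string `json:"name"`
	Reason string `json:"reason,omitempty"`
}

// UnmarshalJSON accepts either a bare string (legacy) or an object (current).
func (e *excludedOverlay) UnmarshalJSON(data []byte) error {
	// Legacy form: the value is just the overlay name.
	var name string
	if err := json.Unmarshal(data, &name); err == nil {
		*e = excludedOverlay{Name: name}
		return nil
	}
	// Current form: decode through a method-less defined type so that
	// json.Unmarshal does not call this method recursively.
	type plain excludedOverlay
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return err
	}
	*e = excludedOverlay(p)
	return nil
}

// decode parses a JSON array that may mix both forms.
func decode(doc string) ([]excludedOverlay, error) {
	var out []excludedOverlay
	err := json.Unmarshal([]byte(doc), &out)
	return out, err
}

func main() {
	got, err := decode(`["overlay-a", {"name": "overlay-b", "reason": "constraint-failed"}]`)
	fmt.Printf("%+v %v\n", got, err)
}
```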

func (*ExcludedOverlay) UnmarshalYAML added in v0.12.0

func (e *ExcludedOverlay) UnmarshalYAML(node *yaml.Node) error

UnmarshalYAML accepts both the legacy scalar string form:

  • excludedOverlays: ["overlay-name"]

and the current object form:

  • excludedOverlays: [{name: overlay-name, reason: constraint-failed}]

type ExcludedOverlayReason added in v0.12.0

type ExcludedOverlayReason string

ExcludedOverlayReason indicates why a matching overlay was dropped.

const (
	// ExcludedOverlayReasonConstraintFailed is used when an overlay's own
	// constraints fail pre-merge evaluation.
	ExcludedOverlayReasonConstraintFailed ExcludedOverlayReason = "constraint-failed"
	// ExcludedOverlayReasonMixinConstraintFailed is used when a candidate chain
	// is excluded during post-compose mixin constraint evaluation.
	ExcludedOverlayReasonMixinConstraintFailed ExcludedOverlayReason = "mixin-constraint-failed"
)

type ExpectedResource

type ExpectedResource struct {
	// Kind is the resource kind (e.g., "Deployment", "DaemonSet").
	Kind string `json:"kind" yaml:"kind"`

	// Name is the resource name.
	Name string `json:"name" yaml:"name"`

	// Namespace is the resource namespace (optional for cluster-scoped resources).
	Namespace string `json:"namespace,omitempty" yaml:"namespace,omitempty"`
}

ExpectedResource represents a Kubernetes resource that should exist after deployment.

type HealthCheckConfig added in v0.7.8

type HealthCheckConfig struct {
	// AssertFile is the path to a Chainsaw-style assert YAML file (relative to data directory).
	// When set, the expected-resources check uses Chainsaw CLI to evaluate assertions
	// instead of the default auto-discovery + typed replica checks.
	AssertFile string `yaml:"assertFile,omitempty"`
}

HealthCheckConfig defines custom health check settings for a component.

type HelmConfig

type HelmConfig struct {
	// DefaultRepository is the default Helm repository URL.
	DefaultRepository string `yaml:"defaultRepository,omitempty"`

	// DefaultChart is the chart name (e.g., "nvidia/gpu-operator").
	DefaultChart string `yaml:"defaultChart,omitempty"`

	// DefaultVersion is the default chart version if not specified in recipe.
	DefaultVersion string `yaml:"defaultVersion,omitempty"`

	// DefaultNamespace is the Kubernetes namespace for deploying this component.
	DefaultNamespace string `yaml:"defaultNamespace,omitempty"`
}

HelmConfig contains default Helm chart settings for a component.

type KustomizeConfig

type KustomizeConfig struct {
	// DefaultSource is the default Git repository or OCI reference.
	DefaultSource string `yaml:"defaultSource,omitempty"`

	// DefaultPath is the path within the repository to the kustomization.
	DefaultPath string `yaml:"defaultPath,omitempty"`

	// DefaultTag is the default Git tag, branch, or commit.
	DefaultTag string `yaml:"defaultTag,omitempty"`
}

KustomizeConfig contains default Kustomize settings for a component.

type LayeredDataProvider

type LayeredDataProvider struct {
	// contains filtered or unexported fields
}

LayeredDataProvider overlays an external directory on top of embedded data. For registryFileName, external components are merged with the embedded ones (external takes precedence). For all other files, an external file completely replaces the embedded one if present.

func NewLayeredDataProvider

func NewLayeredDataProvider(embedded *EmbeddedDataProvider, config LayeredProviderConfig) (*LayeredDataProvider, error)

NewLayeredDataProvider creates a provider that layers external data over embedded. Returns an error if:

  • External directory doesn't exist
  • External directory doesn't contain registryFileName
  • Path traversal is detected
  • File size exceeds limits

func (*LayeredDataProvider) ExternalDir added in v0.8.0

func (p *LayeredDataProvider) ExternalDir() string

ExternalDir returns the path to the external data directory.

func (*LayeredDataProvider) ExternalFiles added in v0.8.0

func (p *LayeredDataProvider) ExternalFiles() []string

ExternalFiles returns a sorted list of file paths that came from the external data directory. Paths are relative to the external directory root.

func (*LayeredDataProvider) ReadFile

func (p *LayeredDataProvider) ReadFile(ctx context.Context, path string) ([]byte, error)

ReadFile reads a file, checking the external directory first. For registryFileName, it returns merged content. For other files, an external file completely replaces the embedded one.
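The external-first lookup for ordinary files can be sketched over two fs.FS values. This is an illustration under simplifying assumptions: `readLayered` and `lookup` are hypothetical names, the overlay path is invented, and the registry-merge case, size limits, and symlink checks are all omitted.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"testing/fstest"
)

// readLayered returns the external copy of path when it exists and falls
// back to the embedded copy only when the external one is missing.
func readLayered(external, embedded fs.FS, path string) ([]byte, string, error) {
	data, err := fs.ReadFile(external, path)
	if err == nil {
		return data, "external", nil
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return nil, "", err // real I/O failure: do not mask it with a fallback
	}
	data, err = fs.ReadFile(embedded, path)
	if err != nil {
		return nil, "", err
	}
	return data, "embedded", nil
}

// lookup resolves path against a fixed pair of in-memory filesystems.
func lookup(path string) (string, string, error) {
	embedded := fstest.MapFS{
		"base.yaml":         {Data: []byte("embedded base")},
		"overlays/eks.yaml": {Data: []byte("embedded eks")},
	}
	external := fstest.MapFS{
		"overlays/eks.yaml": {Data: []byte("external eks")},
	}
	data, source, err := readLayered(external, embedded, path)
	return string(data), source, err
}

func main() {
	for _, p := range []string{"base.yaml", "overlays/eks.yaml"} {
		content, source, err := lookup(p)
		fmt.Println(p, content, source, err)
	}
}
```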

func (*LayeredDataProvider) Source

func (p *LayeredDataProvider) Source(path string) string

Source returns "external" or "embedded" depending on where the file comes from.

func (*LayeredDataProvider) WalkDir

func (p *LayeredDataProvider) WalkDir(ctx context.Context, root string, fn fs.WalkDirFunc) error

WalkDir walks both embedded and external directories. External files take precedence over embedded files.

type LayeredProviderConfig

type LayeredProviderConfig struct {
	// ExternalDir is the path to the external data directory.
	ExternalDir string

	// MaxFileSize is the maximum allowed file size in bytes (default: 10MB).
	MaxFileSize int64

	// AllowSymlinks allows symlinks in the external directory (default: false).
	AllowSymlinks bool
}

LayeredProviderConfig configures the layered data provider.

type MetadataStore

type MetadataStore struct {
	// Base is the base recipe metadata.
	Base *RecipeMetadata

	// Overlays is a map of overlay recipes indexed by name.
	Overlays map[string]*RecipeMetadata

	// Mixins is a map of composable mixin fragments indexed by name.
	Mixins map[string]*RecipeMixin

	// ValuesFiles contains embedded values file contents indexed by filename.
	ValuesFiles map[string][]byte
	// contains filtered or unexported fields
}

MetadataStore holds the base recipe and all overlays.

func LoadMetadataStoreFor added in v0.14.0

func LoadMetadataStoreFor(ctx context.Context, dp DataProvider) (*MetadataStore, error)

LoadMetadataStoreFor loads (and caches) the metadata store for the supplied DataProvider. Concurrent callers with the same provider observe the same singleton; distinct providers populate distinct cache entries and never share state. This is the multi-tenant entry point used by Builders bound via WithDataProvider.

A nil provider falls back to GetDataProvider() so the legacy loadMetadataStore(ctx) entry point — which consults the package-global provider — continues to work transparently.

Context cancellation that fires during the first build is surfaced AND auto-evicted: when entry.err is context.Canceled or context.DeadlineExceeded (per isTransientLoadError), the cache entry is removed via storeCache.CompareAndDelete so the next caller for the same provider loads from scratch with its own ctx. Without the auto-eviction the sync.Once semantics would otherwise lock every subsequent caller into the first caller's cancellation error. Non-transient errors (file-not-found, schema invalid, dependency cycle) ARE preserved by sync.Once — they're deterministic for the provider and concurrent callers shouldn't all re-run the same broken walk.
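The once-per-key cache with eviction of transient failures can be sketched as follows. All names here (`cache`, `entry`, `isTransient`, `demo`) are local to the example and the value type is a plain string; the package's storeCache and isTransientLoadError are only described above and may differ in detail.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// entry holds the one-time build result for a single cache key.
type entry struct {
	once  sync.Once
	value string
	err   error
}

// cache keys entries by an arbitrary comparable value.
type cache struct {
	entries sync.Map // key -> *entry
}

// isTransient reports whether err only reflects the caller's context.
func isTransient(err error) bool {
	return errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded)
}

// load builds the value at most once per entry. A transient failure evicts
// the entry so that the next caller starts over with its own context.
func (c *cache) load(ctx context.Context, key any, build func(context.Context) (string, error)) (string, error) {
	v, _ := c.entries.LoadOrStore(key, &entry{})
	e := v.(*entry)
	e.once.Do(func() {
		e.value, e.err = build(ctx)
		if isTransient(e.err) {
			// Delete only if the key still maps to this exact entry.
			c.entries.CompareAndDelete(key, e)
		}
	})
	return e.value, e.err
}

// demo loads once with a canceled context, then again with a live one.
func demo() (firstCanceled bool, second string, builds int) {
	var c cache
	build := func(ctx context.Context) (string, error) {
		builds++
		if err := ctx.Err(); err != nil {
			return "", err
		}
		return "store", nil
	}
	canceled, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := c.load(canceled, "provider", build)
	second, _ = c.load(context.Background(), "provider", build)
	return errors.Is(err, context.Canceled), second, builds
}

func main() {
	fmt.Println(demo())
}
```

Without the CompareAndDelete call, the second load would reuse the first entry and return the stale cancellation error without running build again.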

Callers no longer need to drop the entry via EvictCachedStore for a transient retry; EvictCachedStore remains for tests that mutate the provider's backing data between loads.

Note: only the first caller's ctx governs the build. Subsequent callers that arrive while the build is in flight block on the same sync.Once and do not observe their own ctx until the first caller's build returns. Callers that need strict per-request deadline enforcement (e.g., HTTP handlers bound by ServerHandlerTimeout running alongside a slower CLI loader) should invoke LoadMetadataStoreFor in a goroutine and select on their own ctx.Done() and the result channel.
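The goroutine-plus-select pattern recommended above can be sketched generically. `loadWithDeadline` and `timedOut` are hypothetical helpers written for this example; the slow loader is simulated with time.Sleep rather than a real metadata load.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// result carries a loaded value and its error across the channel.
type result[T any] struct {
	value T
	err   error
}

// loadWithDeadline runs load in its own goroutine and returns as soon as
// either load finishes or ctx is done. The buffered channel lets the
// goroutine deliver its result and exit even if the caller has given up.
func loadWithDeadline[T any](ctx context.Context, load func() (T, error)) (T, error) {
	ch := make(chan result[T], 1)
	go func() {
		v, err := load()
		ch <- result[T]{v, err}
	}()
	select {
	case r := <-ch:
		return r.value, r.err
	case <-ctx.Done():
		var zero T
		return zero, ctx.Err()
	}
}

// timedOut reports whether a load that takes loadMillis is abandoned
// under a budget of budgetMillis.
func timedOut(loadMillis, budgetMillis int) bool {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(budgetMillis)*time.Millisecond)
	defer cancel()
	_, err := loadWithDeadline(ctx, func() (string, error) {
		time.Sleep(time.Duration(loadMillis) * time.Millisecond) // simulated slow build
		return "store", nil
	})
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(timedOut(200, 10), timedOut(0, 500))
}
```

In this package's terms, the load argument would wrap a call to LoadMetadataStoreFor. The abandoned build keeps running in the background and still populates the cache for later callers.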

func (*MetadataStore) BuildRecipeResult

func (s *MetadataStore) BuildRecipeResult(ctx context.Context, criteria *Criteria) (*RecipeResult, error)

BuildRecipeResult builds a RecipeResult by merging base with matching overlays. Each matching overlay is resolved through its inheritance chain before merging. This enables multi-level inheritance: base → intermediate → overlay.

func (*MetadataStore) BuildRecipeResultWithEvaluator

func (s *MetadataStore) BuildRecipeResultWithEvaluator(ctx context.Context, criteria *Criteria, evaluator ConstraintEvaluatorFunc) (*RecipeResult, error)

BuildRecipeResultWithEvaluator builds a RecipeResult by merging base with matching overlays, filtering overlays based on constraint evaluation using the provided evaluator function.

This method extends BuildRecipeResult with constraint-aware filtering:

  • Each overlay that matches by criteria is tested against its constraints
  • Overlays with failing constraints are excluded from the merge
  • Warnings about excluded overlays are included in the result metadata

The evaluator function is called for each constraint in each matching overlay. If evaluator is nil, this method behaves identically to BuildRecipeResult.

func (*MetadataStore) FindMatchingOverlays

func (s *MetadataStore) FindMatchingOverlays(criteria *Criteria) []*RecipeMetadata

FindMatchingOverlays finds all overlays that match the given criteria and returns maximal leaf candidates sorted by specificity (least specific first).

Maximal leaf selection: after collecting all matching overlays, any overlay that is an ancestor (via spec.base chain) of another matching overlay is filtered out. Only the most-specific leaves survive as candidates. Their full inheritance chains are still resolved during merging, so ancestor content is not lost — it is just not applied as a separate independent candidate.
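The ancestor-filtering step can be sketched over plain strings. In this illustration `maximalLeaves` is a hypothetical helper, `parent` maps each overlay name to its spec.base value, and the result is sorted alphabetically for determinism instead of by specificity; the overlay name "ubuntu-any" is invented.

```go
package main

import (
	"fmt"
	"sort"
)

// maximalLeaves drops every matching overlay that is an ancestor of
// another matching overlay. parent maps a name to its base ("" = none).
func maximalLeaves(matching []string, parent map[string]string) []string {
	matched := make(map[string]bool, len(matching))
	for _, name := range matching {
		matched[name] = true
	}

	// Walk each matching overlay's base chain and mark matched ancestors.
	isAncestor := make(map[string]bool)
	for _, name := range matching {
		seen := map[string]bool{name: true} // guards against cyclic chains
		for p := parent[name]; p != "" && !seen[p]; p = parent[p] {
			seen[p] = true
			if matched[p] {
				isAncestor[p] = true
			}
		}
	}

	var leaves []string
	for _, name := range matching {
		if !isAncestor[name] {
			leaves = append(leaves, name)
		}
	}
	sort.Strings(leaves)
	return leaves
}

func main() {
	parent := map[string]string{
		"eks":               "",
		"eks-training":      "eks",
		"h100-eks-training": "eks-training",
		"ubuntu-any":        "", // hypothetical unrelated overlay
	}
	matching := []string{"eks", "eks-training", "h100-eks-training", "ubuntu-any"}
	fmt.Println(maximalLeaves(matching, parent))
}
```

Here "eks" and "eks-training" are filtered out because "h100-eks-training" inherits from them, while the unrelated overlay survives as its own candidate.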

This is used by both BuildRecipeResult and BuildRecipeResultWithEvaluator to ensure consistent candidate selection regardless of call site.

func (*MetadataStore) GetRecipeByName

func (s *MetadataStore) GetRecipeByName(name string) (*RecipeMetadata, bool)

GetRecipeByName returns a recipe metadata by name. Returns the base recipe if name is "base", otherwise looks up in overlays.

func (*MetadataStore) GetValuesFile

func (s *MetadataStore) GetValuesFile(filename string) ([]byte, error)

GetValuesFile returns the content of a values file by filename.

type NodeSchedulingConfig

type NodeSchedulingConfig struct {
	// System defines paths for system component scheduling.
	System SchedulingPaths `yaml:"system,omitempty"`

	// Accelerated defines paths for GPU/accelerated node scheduling.
	Accelerated SchedulingPaths `yaml:"accelerated,omitempty"`

	// NodeCountPaths are Helm value paths where the bundle-time node count is injected (e.g. estimatedNodeCount for nodewright-operator).
	NodeCountPaths []string `yaml:"nodeCountPaths,omitempty"`
}

NodeSchedulingConfig defines paths for node scheduling injection.

type NodeSelection

type NodeSelection struct {
	// Selector specifies label-based node selection.
	Selector map[string]string `json:"selector,omitempty" yaml:"selector,omitempty"`

	// MaxNodes limits the number of nodes to validate.
	MaxNodes int `json:"maxNodes,omitempty" yaml:"maxNodes,omitempty"`

	// ExcludeNodes lists node names to exclude from validation.
	ExcludeNodes []string `json:"excludeNodes,omitempty" yaml:"excludeNodes,omitempty"`
}

NodeSelection defines node filtering for validation scope.

type Option

type Option func(*Builder)

Option is a functional option for configuring Builder instances.

func WithAllowLists

func WithAllowLists(al *AllowLists) Option

WithAllowLists returns an Option that sets criteria allowlists for the Builder. When allowlists are configured, the Builder will reject criteria values that are not in the allowed list. This is used by the API server to restrict which criteria values can be requested.

func WithDataProvider added in v0.14.0

func WithDataProvider(dp DataProvider) Option

WithDataProvider binds the Builder to a specific DataProvider, isolating its metadata store and component registry from the process-global ones at GetDataProvider().

Use this from any caller that constructs more than one Builder per process. When unset, the Builder falls back to the package-global DataProvider — preserving the CLI and API server behavior.
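The fallback behavior follows the usual functional-options shape. The sketch below uses lower-case local types (`builder`, `option`, `staticProvider`) and a hypothetical `newBuilder` constructor; it shows the pattern only and does not reproduce the package's Builder.

```go
package main

import "fmt"

// dataProvider is a one-method stand-in for the DataProvider interface.
type dataProvider interface {
	Source(path string) string
}

// staticProvider is a hypothetical provider that reports a fixed name.
type staticProvider struct{ name string }

func (p *staticProvider) Source(string) string { return p.name }

// globalProvider plays the role of the package-global provider.
var globalProvider dataProvider = &staticProvider{name: "embedded"}

// builder holds optional settings; a nil provider means "use the global".
type builder struct {
	version  string
	provider dataProvider
}

// option mutates a builder during construction.
type option func(*builder)

func withVersion(v string) option             { return func(b *builder) { b.version = v } }
func withDataProvider(dp dataProvider) option { return func(b *builder) { b.provider = dp } }

// newBuilder applies the options in order.
func newBuilder(opts ...option) *builder {
	b := &builder{}
	for _, opt := range opts {
		opt(b)
	}
	return b
}

// effectiveProvider returns the bound provider or the global fallback.
func (b *builder) effectiveProvider() dataProvider {
	if b.provider != nil {
		return b.provider
	}
	return globalProvider
}

func main() {
	tenant := newBuilder(withVersion("v0.14.0"), withDataProvider(&staticProvider{name: "tenant-a"}))
	fmt.Println(newBuilder().effectiveProvider().Source("base.yaml"))
	fmt.Println(tenant.effectiveProvider().Source("base.yaml"))
}
```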

func WithVersion

func WithVersion(version string) Option

WithVersion returns an Option that sets the Builder version string. The version is included in recipe metadata for tracking purposes.

type PodSchedulingConfig

type PodSchedulingConfig struct {
	// Workload defines paths for workload pod scheduling.
	Workload WorkloadSchedulingPaths `yaml:"workload,omitempty"`
}

PodSchedulingConfig defines paths for pod scheduling injection.

type QueryRequest added in v0.11.0

type QueryRequest struct {
	Criteria *Criteria `json:"criteria" yaml:"criteria"`
	Selector string    `json:"selector" yaml:"selector"`
}

QueryRequest represents a query API request body for POST.

func ParseQueryRequestFromBody added in v0.14.0

func ParseQueryRequestFromBody(body io.Reader, contentType string) (*QueryRequest, error)

ParseQueryRequestFromBody parses a QueryRequest from the request body, honoring the standard MaxRecipePOSTBytes bound. The content type selects JSON vs YAML.

type Recipe

type Recipe struct {
	header.Header `json:",inline" yaml:",inline"`

	Request      *RequestInfo               `json:"request,omitempty" yaml:"request,omitempty"`
	MatchedRules []string                   `json:"matchedRules,omitempty" yaml:"matchedRules,omitempty"`
	Measurements []*measurement.Measurement `json:"measurements" yaml:"measurements"`
}

Recipe represents the recipe response structure.

func (*Recipe) GetComponentRef

func (r *Recipe) GetComponentRef(name string) *ComponentRef

GetComponentRef returns nil for Recipe (v1 format doesn't have components).

func (*Recipe) GetCriteria

func (r *Recipe) GetCriteria() *Criteria

GetCriteria returns nil for Recipe (v1 format doesn't have criteria).

func (*Recipe) GetValuesForComponent

func (r *Recipe) GetValuesForComponent(name string) (map[string]any, error)

GetValuesForComponent extracts values from measurements for Recipe. This maintains backward compatibility with the legacy measurements-based format.

func (*Recipe) GetVersion

func (r *Recipe) GetVersion() string

GetVersion returns the recipe version from metadata.

func (*Recipe) Validate

func (r *Recipe) Validate() error

Validate validates a recipe against all registered bundlers that implement Validator.

func (*Recipe) ValidateStructure

func (r *Recipe) ValidateStructure() error

ValidateStructure performs basic structural validation.

type RecipeCriteria

type RecipeCriteria struct {
	// Kind is always "RecipeCriteria".
	Kind string `json:"kind" yaml:"kind"`

	// APIVersion is the API version (e.g., "aicr.nvidia.com/v1alpha1").
	APIVersion string `json:"apiVersion" yaml:"apiVersion"`

	// Metadata contains the name and other metadata.
	Metadata struct {
		// Name is the unique identifier for this criteria set.
		Name string `json:"name" yaml:"name"`
	} `json:"metadata" yaml:"metadata"`

	// Spec contains the actual criteria specification.
	Spec *Criteria `json:"spec" yaml:"spec"`
}

RecipeCriteria represents a Kubernetes-style criteria resource. This is the format used in criteria files and API requests.

Example:

kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: gb200-eks-ubuntu-training
spec:
  service: eks
  os: ubuntu
  accelerator: gb200
  intent: training

type RecipeInput

type RecipeInput interface {
	// GetComponentRef returns the component reference for a given component name.
	// Returns nil if the component is not found.
	GetComponentRef(name string) *ComponentRef

	// GetValuesForComponent returns the values map for a given component.
	// For Recipe, this extracts values from measurements.
	// For RecipeResult, this loads values from the component's valuesFile.
	GetValuesForComponent(name string) (map[string]any, error)

	// GetVersion returns the recipe version (CLI version that generated the recipe).
	// Returns empty string if version is not available.
	GetVersion() string

	// GetCriteria returns the criteria used to generate this recipe.
	// Returns nil if criteria is not available (e.g., for legacy Recipe format).
	GetCriteria() *Criteria
}

RecipeInput is an interface that both Recipe and RecipeResult implement. This allows bundlers to work with either format during the transition period.

type RecipeMetadata

type RecipeMetadata struct {
	RecipeMetadataHeader `json:",inline" yaml:",inline"`

	// Spec contains the recipe specification.
	Spec RecipeMetadataSpec `json:"spec" yaml:"spec"`
}

RecipeMetadata represents a recipe definition (base or overlay).

type RecipeMetadataHeader

type RecipeMetadataHeader struct {
	// Kind is always "RecipeMetadata".
	Kind string `json:"kind" yaml:"kind"`

	// APIVersion is the API version (e.g., "aicr.nvidia.com/v1alpha1").
	APIVersion string `json:"apiVersion" yaml:"apiVersion"`

	// Metadata contains the name and other metadata.
	Metadata struct {
		Name string `json:"name" yaml:"name"`
	} `json:"metadata" yaml:"metadata"`
}

RecipeMetadataHeader contains the Kubernetes-style header fields.

type RecipeMetadataSpec

type RecipeMetadataSpec struct {
	// Base is the name of the parent recipe to inherit from.
	// If empty, the recipe inherits from "base" (the root base.yaml).
	// This enables multi-level inheritance chains like:
	//   base → eks → eks-training → h100-eks-training
	Base string `json:"base,omitempty" yaml:"base,omitempty"`

	// Criteria defines when this recipe/overlay applies.
	// Only present in overlay files, not in base.
	Criteria *Criteria `json:"criteria,omitempty" yaml:"criteria,omitempty"`

	// Mixins is a list of mixin names to compose into this overlay.
	// Mixins are loaded from recipes/mixins/ and carry only constraints
	// and componentRefs. This field is loader metadata and is stripped
	// from the materialized recipe result.
	Mixins []string `json:"mixins,omitempty" yaml:"mixins,omitempty"`

	// Constraints are deployment assumptions/requirements.
	Constraints []Constraint `json:"constraints,omitempty" yaml:"constraints,omitempty"`

	// ComponentRefs is the list of components to deploy.
	ComponentRefs []ComponentRef `json:"componentRefs,omitempty" yaml:"componentRefs,omitempty"`

	// Validation defines multi-phase validation configuration.
	// Presence of a phase implies it is enabled.
	Validation *ValidationConfig `json:"validation,omitempty" yaml:"validation,omitempty"`
}

RecipeMetadataSpec contains the specification for a recipe.

func (*RecipeMetadataSpec) Merge

func (s *RecipeMetadataSpec) Merge(other *RecipeMetadataSpec)

Merge merges another RecipeMetadataSpec into this one. The other spec takes precedence for conflicts.

func (*RecipeMetadataSpec) TopologicalLevels added in v0.13.0

func (s *RecipeMetadataSpec) TopologicalLevels() ([][]string, error)

TopologicalLevels returns components grouped into dependency-depth tiers (Kahn-style, level-by-level). Level i contains exactly the components whose longest dependency path from a root is i. All components within a level are mutually independent (no edges among them), so a deployer can install/diff them in parallel.

Within each level, names are sorted alphabetically for determinism.

Error semantics match TopologicalSort: missing or cyclic dependencies surface as ErrCodeInvalidRequest with "circular dependencies exist." (Same trade-off — a dependency on an undeclared component appears as a cycle because its in-degree never drains to zero.)
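A level-by-level Kahn traversal of the kind described above can be sketched over a plain dependency map. `topologicalLevels` here is a standalone illustration, not the method's source; component names other than gpu-operator are invented, and the error is a plain error value instead of ErrCodeInvalidRequest.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// topologicalLevels groups components by dependency depth. deps maps each
// declared component to the names it depends on.
func topologicalLevels(deps map[string][]string) ([][]string, error) {
	inDegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string)
	for name, ds := range deps {
		inDegree[name] = len(ds)
		for _, d := range ds {
			dependents[d] = append(dependents[d], name)
		}
	}

	// Level 0: components with no dependencies.
	var current []string
	for name, n := range inDegree {
		if n == 0 {
			current = append(current, name)
		}
	}

	var levels [][]string
	placed := 0
	for len(current) > 0 {
		sort.Strings(current) // alphabetical order within a level
		levels = append(levels, current)
		placed += len(current)

		var next []string
		for _, name := range current {
			for _, dep := range dependents[name] {
				inDegree[dep]--
				if inDegree[dep] == 0 {
					next = append(next, dep)
				}
			}
		}
		current = next
	}

	// Cycles and undeclared dependencies both leave in-degrees above zero.
	if placed != len(deps) {
		return nil, errors.New("circular dependencies exist")
	}
	return levels, nil
}

func main() {
	levels, err := topologicalLevels(map[string][]string{
		"cert-manager":     nil,
		"gpu-operator":     {"cert-manager"},
		"network-operator": {"cert-manager"},
		"workload":         {"gpu-operator", "network-operator"},
	})
	fmt.Println(levels, err)
}
```

A component is released only after its last dependency has been placed, so its level equals its longest dependency path. A dependency that is never declared is never placed, which is why it reports the same error as a cycle.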

func (*RecipeMetadataSpec) TopologicalSort

func (s *RecipeMetadataSpec) TopologicalSort() ([]string, error)

TopologicalSort returns components in dependency order (dependencies first). Components with no dependencies come first, then components that depend only on already-listed components, etc.

func (*RecipeMetadataSpec) ValidateDependencies

func (s *RecipeMetadataSpec) ValidateDependencies() error

ValidateDependencies validates that all dependencyRefs reference existing components. Returns an error if any dependency is missing or if there are circular dependencies.

type RecipeMixin added in v0.12.0

type RecipeMixin struct {
	Kind       string `json:"kind" yaml:"kind"`
	APIVersion string `json:"apiVersion" yaml:"apiVersion"`
	Metadata   struct {
		Name string `json:"name" yaml:"name"`
	} `json:"metadata" yaml:"metadata"`
	Spec struct {
		Constraints   []Constraint   `json:"constraints,omitempty" yaml:"constraints,omitempty"`
		ComponentRefs []ComponentRef `json:"componentRefs,omitempty" yaml:"componentRefs,omitempty"`
	} `json:"spec" yaml:"spec"`
}

RecipeMixin represents a composable fragment that carries only constraints and componentRefs. Mixins live in recipes/mixins/ and are referenced by overlay spec.mixins fields.

type RecipeResult

type RecipeResult struct {
	// Kind is always "RecipeResult".
	Kind string `json:"kind" yaml:"kind"`

	// APIVersion is the API version.
	APIVersion string `json:"apiVersion" yaml:"apiVersion"`

	// Metadata contains result metadata.
	Metadata struct {
		// Version is the recipe version (CLI version that generated this recipe).
		Version string `json:"version,omitempty" yaml:"version,omitempty"`

		// AppliedOverlays lists the overlay names in order of application.
		AppliedOverlays []string `json:"appliedOverlays,omitempty" yaml:"appliedOverlays,omitempty"`

		// ExcludedOverlays lists overlays that matched criteria but were excluded
		// from the final recipe, along with the machine-readable exclusion reason.
		// Only populated when a snapshot is provided during recipe generation.
		ExcludedOverlays []ExcludedOverlay `json:"excludedOverlays,omitempty" yaml:"excludedOverlays,omitempty"`

		// ConstraintWarnings contains details about why specific overlays were excluded.
		// Helps users understand why certain environment-specific configurations
		// were not applied and what would need to change to include them.
		ConstraintWarnings []ConstraintWarning `json:"constraintWarnings,omitempty" yaml:"constraintWarnings,omitempty"`
	} `json:"metadata" yaml:"metadata"`

	// Criteria is the input criteria used to generate this result.
	Criteria *Criteria `json:"criteria" yaml:"criteria"`

	// Constraints is the merged list of constraints.
	Constraints []Constraint `json:"constraints,omitempty" yaml:"constraints,omitempty"`

	// ComponentRefs is the merged list of components.
	ComponentRefs []ComponentRef `json:"componentRefs" yaml:"componentRefs"`

	// DeploymentOrder is the topologically sorted component names for deployment.
	// Components should be deployed in this order to satisfy dependencies.
	DeploymentOrder []string `json:"deploymentOrder" yaml:"deploymentOrder"`

	// Validation defines multi-phase validation configuration.
	// Inherited from recipe metadata during merging.
	Validation *ValidationConfig `json:"validation,omitempty" yaml:"validation,omitempty"`
	// contains filtered or unexported fields
}

RecipeResult represents the final merged recipe output.

func LoadFromFileWithProvider added in v0.14.0

func LoadFromFileWithProvider(ctx context.Context, path, kubeconfig, version string, dp DataProvider) (*RecipeResult, error)

LoadFromFileWithProvider loads a recipe from the given path bound to an explicit DataProvider. Overlay inputs (kind: RecipeMetadata) are hydrated through a builder bound to dp (so external --data overlays resolve against dp, not the package global), and the returned result carries dp via its provider field. A nil dp falls back to the package-global provider (matching LoadFromFile).

func (*RecipeResult) AssertOwnedBy added in v0.14.0

func (r *RecipeResult) AssertOwnedBy(b *Builder) error

AssertOwnedBy returns nil when this RecipeResult was produced by b, and ErrCodeInvalidRequest otherwise. The check uses pointer identity on the unexported owner field stamped by Builder.buildWithStore at build time.

Use this from consumer entry points that hold a *Builder reference and want to refuse a RecipeResult produced elsewhere before reading values (e.g., calling GetValuesForComponent). Two Builders with different DataProviders would otherwise mix component refs from one provider with file reads from the other — the same bug class the facade-level assertOwns in pkg/client/v1 protects against, but enforced at the layer where the data lives so external pkg/recipe.Builder importers are covered too.

A nil receiver returns nil (vacuously owned by anything) so the helper composes with chained nil-checks. A nil b argument returns ErrCodeInvalidRequest — callers must pass the Builder they want to assert against.

A nil owner on a non-nil result is rejected: the result has no provenance and cannot prove it belongs to b. Callers that load results externally (e.g., recipe YAML from disk) and want to consume them must rebuild via Builder or skip the owner-checked entry points; BindDataProvider + the non-checked accessors remain available for that path.
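The four cases above (nil receiver, nil Builder argument, nil owner, mismatched owner) reduce to a short chain of pointer checks. The sketch uses local `builder` and `result` types and plain errors in place of ErrCodeInvalidRequest; it is an illustration of the documented rules, not the package's code.

```go
package main

import (
	"errors"
	"fmt"
)

// builder is a stand-in; only its pointer identity matters here.
type builder struct{ name string }

// result records which builder produced it.
type result struct{ owner *builder }

// assertOwnedBy returns nil only when r was produced by b.
func (r *result) assertOwnedBy(b *builder) error {
	if r == nil {
		return nil // vacuously owned, composes with chained nil-checks
	}
	if b == nil {
		return errors.New("builder must not be nil")
	}
	if r.owner == nil {
		return errors.New("result has no owner")
	}
	if r.owner != b { // pointer identity, not structural equality
		return errors.New("result was produced by a different builder")
	}
	return nil
}

func main() {
	a, b := &builder{name: "a"}, &builder{name: "a"}
	res := &result{owner: a}
	fmt.Println(res.assertOwnedBy(a))
	fmt.Println(res.assertOwnedBy(b))
}
```

The two builders in main are structurally identical, yet the second check fails because ownership is decided by pointer identity alone.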

func (*RecipeResult) BindDataProvider added in v0.14.0

func (r *RecipeResult) BindDataProvider(dp DataProvider)

BindDataProvider sets the DataProvider on a RecipeResult so downstream value/manifest/data-file reads route through dp rather than the package global. It is the exported binder the aicr.Client facade uses to adopt a RecipeResult decoded from an external source (e.g. a /v1/bundle POST body) onto the Client's own provider — the in-process equivalent of the rec.provider = dp binding loader.go performs for an already-hydrated file. Nil-safe on the receiver. A nil dp leaves the result on the package-global fallback (DataProvider() then returns nil), matching the pre-bind behavior.

func (*RecipeResult) DataProvider added in v0.14.0

func (r *RecipeResult) DataProvider() DataProvider

DataProvider returns the DataProvider that produced this result, or nil when the result was built against the package-global provider. Nil-safe on the receiver so call sites can chain freely off a possibly-nil result.

func (*RecipeResult) DeepCopy added in v0.14.0

func (r *RecipeResult) DeepCopy() *RecipeResult

DeepCopy returns an independent copy of r with all exported fields deep-copied: the nested Metadata slices, Criteria, Constraints, ComponentRefs (including their map/slice fields), DeploymentOrder, and Validation config. The unexported provider is intentionally NOT copied — it is left nil so the caller can rebind it (e.g. the aicr.Client facade adopts a recipe by deep-copying first, then BindDataProvider on the copy, so binding never mutates caller-owned state). Nil-safe on the receiver.

Used by the facade's AdoptRecipe path: a caller may reuse one *RecipeResult across multiple Clients, and binding a Client's provider must not leak into the caller's pointer or contaminate a sibling Client's binding.

func (*RecipeResult) GetComponentRef

func (r *RecipeResult) GetComponentRef(name string) *ComponentRef

GetComponentRef returns the component reference for a given component name.

func (*RecipeResult) GetCriteria

func (r *RecipeResult) GetCriteria() *Criteria

GetCriteria returns the criteria used to generate this recipe result.

func (*RecipeResult) GetValuesForComponent

func (r *RecipeResult) GetValuesForComponent(name string) (map[string]any, error)

GetValuesForComponent loads values from the component's valuesFile and inline overrides.

Internally derives a defaults.FileReadTimeout-bounded context so a hung backing store still returns instead of blocking the goroutine. Callers that already hold a context.Context should use GetValuesForComponentWithContext to honor their own deadline.

func (*RecipeResult) GetValuesForComponentWithContext added in v0.14.0

func (r *RecipeResult) GetValuesForComponentWithContext(ctx context.Context, name string) (map[string]any, error)

GetValuesForComponentWithContext loads values from the component's valuesFile and inline overrides, honoring the caller's context for cancellation/timeout.

Merge order: base values → ValuesFile → Overrides (highest precedence). This supports three patterns:

  1. ValuesFile only: Traditional separate file approach
  2. Overrides only: Fully self-contained recipe with inline overrides
  3. ValuesFile + Overrides: Hybrid - reusable base with recipe-specific tweaks

File lookups route through the DataProvider bound to this result (set when the result was built by a Builder via WithDataProvider). When no provider is bound, lookups fall back to the package-level embedded-data singleton.
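The precedence order can be sketched as successive map merges. Whether the package merges nested maps recursively is an assumption of this example, and `mergeValues` and `resolve` are hypothetical helpers; only the ordering (base values, then ValuesFile, then Overrides) comes from the description above.

```go
package main

import "fmt"

// mergeValues overlays src onto dst and returns dst. Nested maps are
// merged recursively into fresh maps; any other value in src replaces
// the value in dst.
func mergeValues(dst, src map[string]any) map[string]any {
	if dst == nil {
		dst = map[string]any{}
	}
	for k, v := range src {
		if srcMap, ok := v.(map[string]any); ok {
			// A missing or non-map dst value yields a nil dstMap, so the
			// recursive call starts from a new map and never aliases src.
			dstMap, _ := dst[k].(map[string]any)
			dst[k] = mergeValues(dstMap, srcMap)
			continue
		}
		dst[k] = v
	}
	return dst
}

// resolve applies the layers in order, so later layers win on conflicts.
func resolve(base, valuesFile, overrides map[string]any) map[string]any {
	out := map[string]any{}
	for _, layer := range []map[string]any{base, valuesFile, overrides} {
		out = mergeValues(out, layer)
	}
	return out
}

func main() {
	base := map[string]any{"driver": map[string]any{"enabled": true, "version": "base"}}
	valuesFile := map[string]any{"driver": map[string]any{"version": "file"}}
	overrides := map[string]any{"driver": map[string]any{"version": "override"}}
	fmt.Println(resolve(base, valuesFile, overrides))
}
```

Keys set only in an earlier layer (such as enabled above) survive, while keys set in several layers take the value from the last one.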

func (*RecipeResult) GetVersion

func (r *RecipeResult) GetVersion() string

GetVersion returns the recipe version from metadata.

func (*RecipeResult) Owner added in v0.14.0

func (r *RecipeResult) Owner() *Builder

Owner returns the *Builder that produced this RecipeResult, or nil when the result was constructed outside the Builder path (e.g., a recipe file loaded via LoadFromFile that was never re-built locally). Nil-safe on the receiver. Returned identity is for comparison only; callers should not mutate the Builder.

type RegistryCriteriaOption added in v0.14.0

type RegistryCriteriaOption func(reg *CriteriaRegistry, c *Criteria) error

RegistryCriteriaOption is a functional option for building a Criteria against an explicit *CriteriaRegistry. Unlike CriteriaOption (which closes over the package-global registry through the ParseCriteria*Type shims), a RegistryCriteriaOption resolves its enum value against the registry threaded in by BuildCriteriaWithRegistry, so a caller holding a per-provider registry (from GetCriteriaRegistryFor) builds and validates criteria against THAT provider's registered values.

func WithAcceleratorRegistry added in v0.14.0

func WithAcceleratorRegistry(s string) RegistryCriteriaOption

WithAcceleratorRegistry sets the accelerator type, resolving s against the registry threaded in by BuildCriteriaWithRegistry.

func WithIntentRegistry added in v0.14.0

func WithIntentRegistry(s string) RegistryCriteriaOption

WithIntentRegistry sets the intent type, resolving s against the registry threaded in by BuildCriteriaWithRegistry.

func WithNodesRegistry added in v0.14.0

func WithNodesRegistry(n int) RegistryCriteriaOption

WithNodesRegistry sets the number of nodes. The registry is unused (node count is not a registry dimension) but the signature matches the other RegistryCriteriaOption builders so all fields compose uniformly through BuildCriteriaWithRegistry.

func WithOSRegistry added in v0.14.0

func WithOSRegistry(s string) RegistryCriteriaOption

WithOSRegistry sets the OS type, resolving s against the registry threaded in by BuildCriteriaWithRegistry.

func WithPlatformRegistry added in v0.14.0

func WithPlatformRegistry(s string) RegistryCriteriaOption

WithPlatformRegistry sets the platform type, resolving s against the registry threaded in by BuildCriteriaWithRegistry.

func WithServiceRegistry added in v0.14.0

func WithServiceRegistry(s string) RegistryCriteriaOption

WithServiceRegistry sets the service type, resolving s against the registry threaded in by BuildCriteriaWithRegistry.

type RequestInfo

type RequestInfo struct {
	Os        string `json:"os,omitempty" yaml:"os,omitempty"`
	OsVersion string `json:"osVersion,omitempty" yaml:"osVersion,omitempty"`
	Service   string `json:"service,omitempty" yaml:"service,omitempty"`
	K8s       string `json:"k8s,omitempty" yaml:"k8s,omitempty"`
	GPU       string `json:"gpu,omitempty" yaml:"gpu,omitempty"`
	Intent    string `json:"intent,omitempty" yaml:"intent,omitempty"`
}

RequestInfo holds simplified request metadata for documentation purposes. This replaces the old Query type with just the fields needed for bundle documentation.

type SchedulingPaths

type SchedulingPaths struct {
	// NodeSelectorPaths are paths where node selectors are injected.
	NodeSelectorPaths []string `yaml:"nodeSelectorPaths,omitempty"`

	// TolerationPaths are paths where tolerations are injected.
	TolerationPaths []string `yaml:"tolerationPaths,omitempty"`

	// TaintPaths are paths where taints are injected as structured objects.
	// Intended to be used instead of TaintStrPaths for components that need to set
	// individual taint fields and cannot process the string format.
	TaintPaths []string `yaml:"taintPaths,omitempty"`

	// TaintStrPaths are paths where taints are injected as strings (format: key=value:effect or key:effect).
	TaintStrPaths []string `yaml:"taintStrPaths,omitempty"`
}

SchedulingPaths holds the Helm value paths for node scheduling.
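The two taint representations can be related with a short sketch that parses the string format (key=value:effect or key:effect) into a structured object. The taint type and parseTaint function are hypothetical and not part of this package; splitting on the last colon is an assumption made so that keys containing a slash or dots parse cleanly.

```go
package main

import (
	"fmt"
	"strings"
)

// taint is a hypothetical structured form of a taint.
type taint struct {
	Key    string
	Value  string
	Effect string
}

// parseTaint parses "key=value:effect" or "key:effect".
// The effect is everything after the last colon; the value is optional.
func parseTaint(s string) (taint, error) {
	i := strings.LastIndex(s, ":")
	if i <= 0 || i == len(s)-1 {
		return taint{}, fmt.Errorf("invalid taint %q: want key=value:effect or key:effect", s)
	}
	kv, effect := s[:i], s[i+1:]
	key, value, _ := strings.Cut(kv, "=")
	if key == "" {
		return taint{}, fmt.Errorf("invalid taint %q: empty key", s)
	}
	return taint{Key: key, Value: value, Effect: effect}, nil
}

func main() {
	for _, s := range []string{"nvidia.com/gpu=present:NoSchedule", "dedicated:NoExecute"} {
		t, err := parseTaint(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", t)
	}
}
```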

type ValidationConfig

type ValidationConfig struct {
	// Readiness defines readiness validation phase settings.
	Readiness *ValidationPhase `json:"readiness,omitempty" yaml:"readiness,omitempty"`

	// Deployment defines deployment validation phase settings.
	Deployment *ValidationPhase `json:"deployment,omitempty" yaml:"deployment,omitempty"`

	// Performance defines performance validation phase settings.
	Performance *ValidationPhase `json:"performance,omitempty" yaml:"performance,omitempty"`

	// Conformance defines conformance validation phase settings.
	Conformance *ValidationPhase `json:"conformance,omitempty" yaml:"conformance,omitempty"`
}

ValidationConfig defines validation phases and settings.

type ValidationPhase

type ValidationPhase struct {
	// Timeout is the maximum duration for this phase (e.g., "10m").
	Timeout string `json:"timeout,omitempty" yaml:"timeout,omitempty"`

	// Constraints are phase-level constraints to evaluate.
	Constraints []Constraint `json:"constraints,omitempty" yaml:"constraints,omitempty"`

	// Checks are named validation checks to run in this phase.
	Checks []string `json:"checks,omitempty" yaml:"checks,omitempty"`

	// NodeSelection defines which nodes to include in validation.
	NodeSelection *NodeSelection `json:"nodeSelection,omitempty" yaml:"nodeSelection,omitempty"`

	// Infrastructure references a componentRef that provides validation infrastructure.
	// Example: "nccl-doctor" for performance testing.
	Infrastructure string `json:"infrastructure,omitempty" yaml:"infrastructure,omitempty"`
}

ValidationPhase represents a single validation phase configuration.

type WorkloadSchedulingPaths

type WorkloadSchedulingPaths struct {
	// WorkloadSelectorPaths are paths where workload selectors are injected.
	WorkloadSelectorPaths []string `yaml:"workloadSelectorPaths,omitempty"`
}

WorkloadSchedulingPaths holds the Helm value paths for workload scheduling.

Directories

Path Synopsis
Package oskind is the single source of truth for the string values of the OS recipe criterion.
