Documentation ¶
Overview ¶
Package recipe provides recipe building and matching functionality.
Package recipe provides configuration recipe generation based on deployment criteria.
The recipe package generates tailored configuration recommendations for GPU-accelerated Kubernetes clusters. It uses a metadata-driven model where base configurations are enhanced with criteria-specific overlays to produce deployment-ready component references.
Core Types ¶
Criteria: Specifies target deployment parameters
type Criteria struct {
Service CriteriaServiceType // eks, gke, aks, any
Accelerator CriteriaAcceleratorType // h100, gb200, b200, a100, l40, any
Intent CriteriaIntentType // training, inference, any
OS CriteriaOSType // ubuntu, cos, rhel, any
Nodes int // node count (0 = any)
}
RecipeResult: Generated configuration result
type RecipeResult struct {
Header // API version, kind, metadata
Criteria *Criteria // Input criteria
MatchedRules []string // Applied overlay rules
ComponentRefs []ComponentRef // Component references (Helm or Kustomize)
Constraints []ConstraintRef // Validation constraints
}
Recipe: Legacy format still used by bundlers
type Recipe struct {
Header // API version, kind, metadata
Request *RequestInfo // Input metadata (optional)
MatchedRules []string // Applied overlay rules
Measurements []*measurement.Measurement // Configuration data
}
Builder: Generates recipes from criteria
type Builder struct {
Version string // Builder version for tracking
}
Criteria Types ¶
Service types for Kubernetes environments:
- CriteriaServiceEKS: Amazon EKS
- CriteriaServiceGKE: Google GKE
- CriteriaServiceAKS: Azure AKS
- CriteriaServiceAny: Any service (wildcard)
Accelerator types for GPU selection:
- CriteriaAcceleratorH100: NVIDIA H100
- CriteriaAcceleratorGB200: NVIDIA GB200
- CriteriaAcceleratorB200: NVIDIA B200
- CriteriaAcceleratorA100: NVIDIA A100
- CriteriaAcceleratorL40: NVIDIA L40
- CriteriaAcceleratorAny: Any accelerator (wildcard)
Intent types for workload optimization:
- CriteriaIntentTraining: ML training workloads
- CriteriaIntentInference: Inference workloads
- CriteriaIntentAny: Generic workloads
Usage ¶
Basic recipe generation with criteria:
criteria := recipe.NewCriteria()
criteria.Service = recipe.CriteriaServiceEKS
criteria.Accelerator = recipe.CriteriaAcceleratorH100
criteria.Intent = recipe.CriteriaIntentTraining
ctx := context.Background()
builder := recipe.NewBuilder()
result, err := builder.BuildFromCriteria(ctx, criteria)
if err != nil {
log.Fatal(err)
}
fmt.Printf("Matched rules: %v\n", result.MatchedRules)
for _, ref := range result.ComponentRefs {
fmt.Printf("Component: %s, Version: %s\n", ref.Name, ref.Version)
}
HTTP handler for API server:
builder := recipe.NewBuilder()
http.HandleFunc("/v1/recipe", builder.HandleRecipes)
Parse criteria from HTTP request:
criteria, err := recipe.ParseCriteriaFromRequest(r)
if err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
Query Parameters (HTTP API - GET) ¶
The HTTP handler accepts these query parameters for GET requests:
- service: eks, gke, aks, any (default: any)
- accelerator: h100, gb200, b200, a100, l40, any (default: any)
- gpu: alias for accelerator (backwards compatibility)
- intent: training, inference, any (default: any)
- os: ubuntu, cos, rhel, any (default: any)
- nodes: integer node count (default: 0 = any)
Criteria Files (CLI and HTTP API - POST) ¶
Criteria can be defined in a Kubernetes-style YAML or JSON file using the RecipeCriteria resource type. This provides an alternative to individual CLI flags or query parameters.
RecipeCriteria: Kubernetes-style resource for criteria definition
type RecipeCriteria struct {
Kind string // Must be "RecipeCriteria"
APIVersion string // Must be "aicr.nvidia.com/v1alpha1"
Metadata struct {
Name string // Optional descriptive name
}
Spec *Criteria // The criteria specification
}
Example criteria file (criteria.yaml):
kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: gb200-eks-ubuntu-training
spec:
  service: eks
  os: ubuntu
  accelerator: gb200
  intent: training
Load criteria from a file:
criteria, err := recipe.LoadCriteriaFromFile("/path/to/criteria.yaml")
if err != nil {
log.Fatal(err)
}
result, err := builder.BuildFromCriteria(ctx, criteria)
Parse criteria from HTTP request body (POST):
criteria, err := recipe.ParseCriteriaFromBody(r.Body, r.Header.Get("Content-Type"))
if err != nil {
http.Error(w, err.Error(), http.StatusBadRequest)
return
}
CLI usage with criteria file:
aicr recipe --criteria /path/to/criteria.yaml --output recipe.yaml
CLI flags can override criteria file values:
aicr recipe --criteria criteria.yaml --service gke --output recipe.yaml
HTTP API POST request:
curl -X POST http://localhost:8080/v1/recipe \
  -H "Content-Type: application/yaml" \
  --data-binary @criteria.yaml
Criteria Matching ¶
Criteria use asymmetric matching with priority-based resolution:
Recipe Wildcard (recipe field = "any"):
- Recipe "any" acts as a wildcard, matching any query value
- Example: Recipe with accelerator="any" matches query accelerator="h100"
Query Wildcard (query field = "any"):
- Query "any" only matches recipes that also have "any"
- Prevents generic queries from matching overly-specific recipes
- Example: Query accelerator="any" does NOT match recipe accelerator="h100"
Exact Match:
- Query service="eks", accelerator="h100" matches recipe with same values
Priority:
- More specific overlays take precedence
- Multiple matching overlays are applied in priority order
- Later overlays can override earlier ones
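The priority rules can be sketched as a sort followed by a sequential merge. This is an illustration under stated assumptions, not the package's implementation: the overlay type and applyInOrder are hypothetical, a larger priority number is assumed to mean "applied later", and the version strings in main other than v25.3.3 are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// overlay is a hypothetical, simplified overlay: a priority plus component versions.
type overlay struct {
	Name     string
	Priority int
	Versions map[string]string
}

// applyInOrder merges overlays onto base in ascending priority order,
// so an overlay with a higher priority overrides earlier ones.
func applyInOrder(base map[string]string, overlays []overlay) map[string]string {
	sorted := append([]overlay(nil), overlays...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority < sorted[j].Priority
	})
	out := make(map[string]string, len(base))
	for k, v := range base {
		out[k] = v
	}
	for _, o := range sorted {
		for k, v := range o.Versions {
			out[k] = v
		}
	}
	return out
}

func main() {
	base := map[string]string{"gpu-operator": "v25.3.3"}
	result := applyInOrder(base, []overlay{
		{Name: "h100-training", Priority: 100, Versions: map[string]string{"gpu-operator": "v25.3.4"}},
		{Name: "h100", Priority: 50, Versions: map[string]string{"gpu-operator": "v25.3.2"}},
	})
	fmt.Println(result["gpu-operator"])
}
```

Sorting a copy keeps the caller's slice untouched, and the stable sort preserves input order among overlays that share a priority.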
Metadata Store Model ¶
Recipe generation uses YAML metadata files:
1. Load overlays/base.yaml (common component versions and settings)
2. Find matching overlay files based on criteria
3. Merge overlay configurations into result
4. Return RecipeResult with component references
Base structure (recipes/overlays/base.yaml):
apiVersion: aicr.nvidia.com/v1alpha1
kind: Base
metadata:
  name: base
  version: v1.0.0
components:
  - name: gpu-operator
    version: v25.3.3
    repository: https://helm.ngc.nvidia.com/nvidia
Overlay structure (recipes/overlays/*.yaml):
apiVersion: aicr.nvidia.com/v1alpha1
kind: Overlay
metadata:
  name: h100-training
  priority: 100
match:
  accelerator: h100
  intent: training
components:
  - name: gpu-operator
    version: v25.3.3
    values:
      mig.strategy: mixed
RecipeInput Interface ¶
The RecipeInput interface allows bundlers to work with both legacy Recipe and new RecipeResult formats:
type RecipeInput interface {
GetMeasurements() []*measurement.Measurement
GetComponentRef(name string) *ComponentRef
GetValuesForComponent(name string) (map[string]any, error)
}
Error Handling ¶
BuildFromCriteria returns errors when:
- Criteria is nil
- Metadata store cannot be loaded
- No matching overlays found
- Component configuration is invalid
ParseCriteriaFromRequest returns errors when:
- Service type is invalid
- Accelerator type is invalid
- Intent type is invalid
- Nodes count is negative or non-numeric
Data Source ¶
Recipe metadata is embedded at build time from:
- recipes/overlays/base.yaml (base component versions)
- recipes/overlays/*.yaml (criteria-specific overlays)
The metadata store is loaded once and cached (singleton pattern with sync.Once).
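A load-once cache of this kind is commonly written with sync.Once. The sketch below shows the general pattern only; store, loadStore, and getStore are hypothetical names, and the loader simply returns fixed data instead of reading embedded YAML.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for the metadata store.
type store struct{ overlays []string }

var (
	storeOnce sync.Once
	cached    *store
	cachedErr error
	loadCount int // counts loader invocations, for demonstration only
)

// loadStore simulates an expensive load of embedded metadata files.
func loadStore() (*store, error) {
	loadCount++
	return &store{overlays: []string{"base", "h100-training"}}, nil
}

// getStore loads the store on first use and returns the cached value afterwards.
func getStore() (*store, error) {
	storeOnce.Do(func() { cached, cachedErr = loadStore() })
	return cached, cachedErr
}

func main() {
	a, _ := getStore()
	b, _ := getStore()
	fmt.Println(a == b, loadCount)
}
```

Caching the error alongside the value means a failed first load is reported on every later call rather than retried.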
Observability ¶
The recipe builder exports Prometheus metrics:
- recipe_built_duration_seconds: Time to build recipe
- recipe_rule_match_total{status}: Rule matching statistics
Integration ¶
The recipe package is used by:
- pkg/cli - recipe command for CLI usage
- pkg/api - API recipe endpoint
- pkg/bundler - Bundle generation from recipes
It depends on:
- pkg/measurement - Measurement data structures
- pkg/version - Version parsing
- pkg/header - Common header types
- pkg/errors - Structured error handling
Component Types ¶
The recipe system supports two component deployment types:
Helm Components:
- Use Helm charts for deployment
- Configured via helm section in registry.yaml
- Support values files and inline overrides
Kustomize Components:
- Use Kustomize for deployment
- Configured via kustomize section in registry.yaml
- Support Git/OCI sources with path and tag
The component registry (recipes/registry.yaml) determines component defaults. Components must have either helm OR kustomize configuration.
Subpackages ¶
- recipe/version - Semantic version parsing (moved to pkg/version)
- recipe/header - Common header structures (moved to pkg/header)
Index ¶
- Constants
- func GetCriteriaAcceleratorTypes() []string
- func GetCriteriaIntentTypes() []string
- func GetCriteriaOSTypes() []string
- func GetCriteriaPlatformTypes() []string
- func GetCriteriaServiceTypes() []string
- func GetEmbeddedFS() embed.FS
- func GetManifestContent(path string) ([]byte, error)
- func HydrateResult(result *RecipeResult) (map[string]any, error)
- func MatchesCriteriaField(recipeValue, queryValue string) bool
- func ResetComponentRegistryForTesting()
- func Select(hydrated map[string]any, selector string) (any, error)
- func SetDataProvider(provider DataProvider)
- type AllowLists
- type Builder
- func (b *Builder) BuildFromCriteria(ctx context.Context, c *Criteria) (*RecipeResult, error)
- func (b *Builder) BuildFromCriteriaWithEvaluator(ctx context.Context, c *Criteria, evaluator ConstraintEvaluatorFunc) (*RecipeResult, error)
- func (b *Builder) HandleQuery(w http.ResponseWriter, r *http.Request)
- func (b *Builder) HandleRecipes(w http.ResponseWriter, r *http.Request)
- type ComponentConfig
- func (c *ComponentConfig) GetAcceleratedNodeSelectorPaths() []string
- func (c *ComponentConfig) GetAcceleratedTaintStrPaths() []string
- func (c *ComponentConfig) GetAcceleratedTolerationPaths() []string
- func (c *ComponentConfig) GetNodeCountPaths() []string
- func (c *ComponentConfig) GetSystemNodeSelectorPaths() []string
- func (c *ComponentConfig) GetSystemTolerationPaths() []string
- func (c *ComponentConfig) GetType() ComponentType
- func (c *ComponentConfig) GetValidations() []ComponentValidationConfig
- func (c *ComponentConfig) GetWorkloadSelectorPaths() []string
- type ComponentRef
- type ComponentRegistry
- type ComponentType
- type ComponentValidationConfig
- type Constraint
- type ConstraintEvalResult
- type ConstraintEvaluatorFunc
- type ConstraintWarning
- type Criteria
- func BuildCriteria(opts ...CriteriaOption) (*Criteria, error)
- func ExtractCriteriaFromSnapshot(snap *snapshotter.Snapshot) *Criteria
- func LoadCriteriaFromFile(path string) (*Criteria, error)
- func LoadCriteriaFromFileWithContext(ctx context.Context, path string) (*Criteria, error)
- func NewCriteria() *Criteria
- func ParseCriteriaFromBody(body io.Reader, contentType string) (*Criteria, error)
- func ParseCriteriaFromRequest(r *http.Request) (*Criteria, error)
- func ParseCriteriaFromValues(values url.Values) (*Criteria, error)
- type CriteriaAcceleratorType
- type CriteriaIntentType
- type CriteriaOSType
- type CriteriaOption
- type CriteriaPlatformType
- type CriteriaServiceType
- type DataProvider
- type EmbeddedDataProvider
- type ExpectedResource
- type HealthCheckConfig
- type HelmConfig
- type KustomizeConfig
- type LayeredDataProvider
- func (p *LayeredDataProvider) ExternalDir() string
- func (p *LayeredDataProvider) ExternalFiles() []string
- func (p *LayeredDataProvider) ReadFile(path string) ([]byte, error)
- func (p *LayeredDataProvider) Source(path string) string
- func (p *LayeredDataProvider) WalkDir(root string, fn fs.WalkDirFunc) error
- type LayeredProviderConfig
- type MetadataStore
- func (s *MetadataStore) BuildRecipeResult(ctx context.Context, criteria *Criteria) (*RecipeResult, error)
- func (s *MetadataStore) BuildRecipeResultWithEvaluator(ctx context.Context, criteria *Criteria, evaluator ConstraintEvaluatorFunc) (*RecipeResult, error)
- func (s *MetadataStore) FindMatchingOverlays(criteria *Criteria) []*RecipeMetadata
- func (s *MetadataStore) GetRecipeByName(name string) (*RecipeMetadata, bool)
- func (s *MetadataStore) GetValuesFile(filename string) ([]byte, error)
- type NodeSchedulingConfig
- type NodeSelection
- type Option
- type PodSchedulingConfig
- type QueryRequest
- type Recipe
- type RecipeCriteria
- type RecipeInput
- type RecipeMetadata
- type RecipeMetadataHeader
- type RecipeMetadataSpec
- type RecipeResult
- type RequestInfo
- type SchedulingPaths
- type ValidationConfig
- type ValidationPhase
- type WorkloadSchedulingPaths
Constants ¶
const (
EnvAllowedAccelerators = "AICR_ALLOWED_ACCELERATORS"
EnvAllowedServices = "AICR_ALLOWED_SERVICES"
EnvAllowedIntents = "AICR_ALLOWED_INTENTS"
EnvAllowedOSTypes = "AICR_ALLOWED_OS"
)
Environment variable names for allowlist configuration.
const (
// DefaultMaxFileSize is the default maximum file size (10MB).
DefaultMaxFileSize = 10 * 1024 * 1024
)
const RecipeAPIVersion = "aicr.nvidia.com/v1alpha1"
RecipeAPIVersion is the API version for recipe metadata and result resources.
const RecipeCriteriaAPIVersion = "aicr.nvidia.com/v1alpha1"
RecipeCriteriaAPIVersion is the API version for RecipeCriteria resources.
const RecipeCriteriaKind = "RecipeCriteria"
RecipeCriteriaKind is the kind value for RecipeCriteria resources.
const RecipeMetadataKind = "RecipeMetadata"
RecipeMetadataKind is the kind value for RecipeMetadata resources.
const RecipeResultKind = "RecipeResult"
RecipeResultKind is the kind value for RecipeResult resources.
Variables ¶
This section is empty.
Functions ¶
func GetCriteriaAcceleratorTypes ¶
func GetCriteriaAcceleratorTypes() []string
GetCriteriaAcceleratorTypes returns all supported accelerator types sorted alphabetically.
func GetCriteriaIntentTypes ¶
func GetCriteriaIntentTypes() []string
GetCriteriaIntentTypes returns all supported intent types sorted alphabetically.
func GetCriteriaOSTypes ¶
func GetCriteriaOSTypes() []string
GetCriteriaOSTypes returns all supported OS types sorted alphabetically.
func GetCriteriaPlatformTypes ¶
func GetCriteriaPlatformTypes() []string
GetCriteriaPlatformTypes returns all supported platform types sorted alphabetically.
func GetCriteriaServiceTypes ¶
func GetCriteriaServiceTypes() []string
GetCriteriaServiceTypes returns all supported service types sorted alphabetically.
func GetEmbeddedFS ¶
func GetEmbeddedFS() embed.FS
GetEmbeddedFS returns the embedded data filesystem. This is used by the CLI to create layered data providers.
func GetManifestContent ¶
func GetManifestContent(path string) ([]byte, error)
GetManifestContent retrieves a manifest file from the data provider. Path should be relative to data directory (e.g., "components/gpu-operator/manifests/dcgm-exporter.yaml").
func HydrateResult ¶ added in v0.11.0
func HydrateResult(result *RecipeResult) (map[string]any, error)
HydrateResult builds a fully hydrated map from a RecipeResult. Component values are merged via GetValuesForComponent so the output contains the final resolved configuration, not file references.
func MatchesCriteriaField ¶ added in v0.9.0
func MatchesCriteriaField(recipeValue, queryValue string) bool
MatchesCriteriaField implements asymmetric matching for a single criteria field. Returns true if the recipe field matches the query field.
Matching rules:
- Query is "any"/empty → only matches if recipe is also "any"/empty
- Recipe is "any"/empty → matches any query value (recipe is generic/wildcard)
- Otherwise → must match exactly
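The three rules above translate almost directly into code. The following is a standalone restatement for illustration, not the package's source; isWildcard and matchesField are hypothetical names, and case-sensitive comparison is an assumption.

```go
package main

import "fmt"

// isWildcard reports whether a field value is "any" or empty.
func isWildcard(v string) bool { return v == "" || v == "any" }

// matchesField restates the asymmetric matching rules for one field.
func matchesField(recipeValue, queryValue string) bool {
	if isWildcard(queryValue) {
		// A generic query only matches a generic recipe.
		return isWildcard(recipeValue)
	}
	if isWildcard(recipeValue) {
		// A generic recipe matches any specific query.
		return true
	}
	return recipeValue == queryValue
}

func main() {
	fmt.Println(matchesField("any", "h100"))  // recipe wildcard, specific query
	fmt.Println(matchesField("h100", "any"))  // specific recipe, query wildcard
	fmt.Println(matchesField("h100", "h100")) // exact match
}
```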
func ResetComponentRegistryForTesting ¶ added in v0.7.8
func ResetComponentRegistryForTesting()
ResetComponentRegistryForTesting resets the singleton registry so it will be reloaded from the current DataProvider on the next call to GetComponentRegistry. This must only be called from tests.
func Select ¶ added in v0.11.0
func Select(hydrated map[string]any, selector string) (any, error)
Select walks a dot-path selector against a hydrated map and returns the value at that path. Returns ErrCodeNotFound for invalid paths. An empty selector returns the entire map.
func SetDataProvider ¶
func SetDataProvider(provider DataProvider)
SetDataProvider sets the global data provider. This should be called before any recipe operations if using external data. Note: This invalidates cached data, so callers should ensure this is called early in the application lifecycle.
Types ¶
type AllowLists ¶
type AllowLists struct {
// Accelerators is the list of allowed accelerator types (e.g., "h100", "l40").
// If empty, all accelerator types are allowed.
Accelerators []CriteriaAcceleratorType
// Services is the list of allowed service types (e.g., "eks", "gke").
// If empty, all service types are allowed.
Services []CriteriaServiceType
// Intents is the list of allowed intent types (e.g., "training", "inference").
// If empty, all intent types are allowed.
Intents []CriteriaIntentType
// OSTypes is the list of allowed OS types (e.g., "ubuntu", "rhel").
// If empty, all OS types are allowed.
OSTypes []CriteriaOSType
}
AllowLists defines which criteria values are permitted for API requests. An empty or nil slice means all values are allowed for that criteria type. This is used by the API server to restrict which values can be requested, while the CLI always allows all values.
func ParseAllowListsFromEnv ¶
func ParseAllowListsFromEnv() (*AllowLists, error)
ParseAllowListsFromEnv parses allowlist configuration from environment variables. Returns nil if no allowlist environment variables are set. Environment variables:
- AICR_ALLOWED_ACCELERATORS: comma-separated list of accelerator types (e.g., "h100,l40")
- AICR_ALLOWED_SERVICES: comma-separated list of service types (e.g., "eks,gke")
- AICR_ALLOWED_INTENTS: comma-separated list of intent types (e.g., "training,inference")
- AICR_ALLOWED_OS: comma-separated list of OS types (e.g., "ubuntu,rhel")
Invalid values in the environment variables are skipped and a warning is logged.
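Parsing one such comma-separated variable can be sketched as follows. This is not the package's code: parseList is a hypothetical helper, trimming whitespace and lowercasing entries are assumptions, and the warning goes to standard error rather than a structured logger.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseList splits a comma-separated value, normalizes each entry, and
// drops anything not present in the known set, printing a warning for it.
func parseList(raw string, known map[string]bool) []string {
	var out []string
	for _, part := range strings.Split(raw, ",") {
		v := strings.ToLower(strings.TrimSpace(part))
		if v == "" {
			continue
		}
		if !known[v] {
			fmt.Fprintf(os.Stderr, "warning: skipping unknown value %q\n", v)
			continue
		}
		out = append(out, v)
	}
	return out
}

func main() {
	known := map[string]bool{"h100": true, "gb200": true, "b200": true, "a100": true, "l40": true}
	if err := os.Setenv("AICR_ALLOWED_ACCELERATORS", "h100, l40, tpu"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(parseList(os.Getenv("AICR_ALLOWED_ACCELERATORS"), known))
}
```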
func (*AllowLists) AcceleratorStrings ¶
func (a *AllowLists) AcceleratorStrings() []string
AcceleratorStrings returns the allowed accelerator types as strings.
func (*AllowLists) IntentStrings ¶
func (a *AllowLists) IntentStrings() []string
IntentStrings returns the allowed intent types as strings.
func (*AllowLists) IsEmpty ¶
func (a *AllowLists) IsEmpty() bool
IsEmpty returns true if no allowlists are configured (all values allowed).
func (*AllowLists) OSTypeStrings ¶
func (a *AllowLists) OSTypeStrings() []string
OSTypeStrings returns the allowed OS types as strings.
func (*AllowLists) ServiceStrings ¶
func (a *AllowLists) ServiceStrings() []string
ServiceStrings returns the allowed service types as strings.
func (*AllowLists) ValidateCriteria ¶
func (a *AllowLists) ValidateCriteria(c *Criteria) error
ValidateCriteria checks if the given criteria values are permitted by the allowlists. Returns nil if validation passes, or an error with details about what value is not allowed. The "any" value is always allowed regardless of the allowlist configuration.
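The check for a single field could be sketched like this. checkAllowed is a hypothetical helper, and treating an empty value the same as "any" is an assumption based on the statement that unspecified fields default to "any".

```go
package main

import "fmt"

// checkAllowed returns an error if value is not permitted.
// An empty allowlist permits everything, and "any" is always permitted.
func checkAllowed(field, value string, allowed []string) error {
	if len(allowed) == 0 || value == "any" || value == "" {
		return nil
	}
	for _, a := range allowed {
		if a == value {
			return nil
		}
	}
	return fmt.Errorf("%s %q is not allowed (allowed: %v)", field, value, allowed)
}

func main() {
	allowed := []string{"h100", "l40"}
	fmt.Println(checkAllowed("accelerator", "h100", allowed))
	fmt.Println(checkAllowed("accelerator", "any", allowed))
	fmt.Println(checkAllowed("accelerator", "a100", allowed))
}
```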
type Builder ¶
type Builder struct {
Version string
AllowLists *AllowLists
}
Builder constructs RecipeResult payloads based on Criteria specifications. It loads recipe metadata, applies matching overlays, and generates tailored configuration recipes.
func NewBuilder ¶
NewBuilder creates a new Builder instance with the provided functional options.
func (*Builder) BuildFromCriteria ¶
func (b *Builder) BuildFromCriteria(ctx context.Context, c *Criteria) (*RecipeResult, error)
BuildFromCriteria creates a RecipeResult payload for the provided criteria. It loads the metadata store, applies matching overlays, and returns a RecipeResult with merged components and computed deployment order.
func (*Builder) BuildFromCriteriaWithEvaluator ¶
func (b *Builder) BuildFromCriteriaWithEvaluator(ctx context.Context, c *Criteria, evaluator ConstraintEvaluatorFunc) (*RecipeResult, error)
BuildFromCriteriaWithEvaluator creates a RecipeResult payload for the provided criteria, filtering overlays based on constraint evaluation against snapshot data.
When an evaluator function is provided:
- Overlays that match by criteria but fail constraint evaluation are excluded
- Constraint warnings are included in the result metadata for visibility
- Only overlays whose constraints pass (or have no constraints) are merged
The evaluator function is typically created by wrapping validator.EvaluateConstraint with the snapshot data.
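The filtering behavior described above can be approximated with local stand-in types. Everything in this sketch is hypothetical: constraint, evalResult, evaluatorFunc, candidate, and filterByConstraints are simplified placeholders, and the evaluator in main compares strings for equality instead of calling validator.EvaluateConstraint.

```go
package main

import "fmt"

// Simplified stand-ins for Constraint and ConstraintEvalResult.
type constraint struct{ Name, Value string }

type evalResult struct {
	Passed bool
	Actual string
}

type evaluatorFunc func(c constraint) evalResult

// candidate is an overlay that already matched by criteria.
type candidate struct {
	Name        string
	Constraints []constraint
}

// filterByConstraints keeps overlays whose constraints all pass and
// records one warning line for each excluded overlay.
func filterByConstraints(overlays []candidate, eval evaluatorFunc) (kept []candidate, warnings []string) {
	for _, o := range overlays {
		ok := true
		for _, c := range o.Constraints {
			if r := eval(c); !r.Passed {
				warnings = append(warnings,
					fmt.Sprintf("%s: %s expected %s, got %s", o.Name, c.Name, c.Value, r.Actual))
				ok = false
				break
			}
		}
		if ok {
			kept = append(kept, o)
		}
	}
	return kept, warnings
}

func main() {
	// The snapshot map and the equality check are placeholders for real evaluation.
	snapshot := map[string]string{"worker-os": "ubuntu"}
	eval := func(c constraint) evalResult {
		actual := snapshot[c.Name]
		return evalResult{Passed: actual == c.Value, Actual: actual}
	}
	kept, warnings := filterByConstraints([]candidate{
		{Name: "base-overlay"},
		{Name: "rhel-overlay", Constraints: []constraint{{Name: "worker-os", Value: "rhel"}}},
	}, eval)
	fmt.Println(len(kept), warnings)
}
```

Passing the evaluator as a function value is what lets the caller close over snapshot data without this code importing the validator.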
func (*Builder) HandleQuery ¶ added in v0.11.0
func (b *Builder) HandleQuery(w http.ResponseWriter, r *http.Request)
HandleQuery processes query requests. It resolves a recipe from criteria, hydrates all component values, and returns the value at the given selector path. Supports GET with query parameters (+selector) and POST with JSON/YAML body.
func (*Builder) HandleRecipes ¶
func (b *Builder) HandleRecipes(w http.ResponseWriter, r *http.Request)
HandleRecipes processes recipe requests using the criteria-based system. It supports GET requests with query parameters and POST requests with JSON/YAML body to specify recipe criteria. The response returns a RecipeResult with component references and constraints. Errors are handled and returned in a structured format.
type ComponentConfig ¶
type ComponentConfig struct {
// Name is the component identifier used in recipes (e.g., "gpu-operator").
Name string `yaml:"name"`
// DisplayName is the human-readable name used in templates and output.
DisplayName string `yaml:"displayName"`
// ValueOverrideKeys are alternative keys for --set flag matching.
// Example: ["gpuoperator"] allows --set gpuoperator:key=value
ValueOverrideKeys []string `yaml:"valueOverrideKeys,omitempty"`
// Helm contains default Helm chart settings.
Helm HelmConfig `yaml:"helm,omitempty"`
// Kustomize contains default Kustomize settings.
Kustomize KustomizeConfig `yaml:"kustomize,omitempty"`
// NodeScheduling defines paths for injecting node selectors and tolerations.
NodeScheduling NodeSchedulingConfig `yaml:"nodeScheduling,omitempty"`
PodScheduling PodSchedulingConfig `yaml:"podScheduling,omitempty"`
// Validations defines component-specific validation checks.
Validations []ComponentValidationConfig `yaml:"validations,omitempty"`
// HealthCheck defines custom health check configuration for this component.
HealthCheck HealthCheckConfig `yaml:"healthCheck,omitempty"`
}
ComponentConfig defines the bundler configuration for a component. This replaces the per-component Go packages with declarative YAML.
func (*ComponentConfig) GetAcceleratedNodeSelectorPaths ¶
func (c *ComponentConfig) GetAcceleratedNodeSelectorPaths() []string
GetAcceleratedNodeSelectorPaths returns all accelerated node selector paths for a component.
func (*ComponentConfig) GetAcceleratedTaintStrPaths ¶
func (c *ComponentConfig) GetAcceleratedTaintStrPaths() []string
GetAcceleratedTaintStrPaths returns all accelerated taint string paths for a component.
func (*ComponentConfig) GetAcceleratedTolerationPaths ¶
func (c *ComponentConfig) GetAcceleratedTolerationPaths() []string
GetAcceleratedTolerationPaths returns all accelerated toleration paths for a component.
func (*ComponentConfig) GetNodeCountPaths ¶ added in v0.8.0
func (c *ComponentConfig) GetNodeCountPaths() []string
GetNodeCountPaths returns Helm value paths where the node count is injected.
func (*ComponentConfig) GetSystemNodeSelectorPaths ¶
func (c *ComponentConfig) GetSystemNodeSelectorPaths() []string
GetSystemNodeSelectorPaths returns all system node selector paths for a component.
func (*ComponentConfig) GetSystemTolerationPaths ¶
func (c *ComponentConfig) GetSystemTolerationPaths() []string
GetSystemTolerationPaths returns all system toleration paths for a component.
func (*ComponentConfig) GetType ¶
func (c *ComponentConfig) GetType() ComponentType
GetType returns the component deployment type based on which config is present. Returns ComponentTypeKustomize if Kustomize.DefaultSource is set, otherwise returns ComponentTypeHelm (the default).
func (*ComponentConfig) GetValidations ¶
func (c *ComponentConfig) GetValidations() []ComponentValidationConfig
GetValidations returns all validation configurations for a component.
func (*ComponentConfig) GetWorkloadSelectorPaths ¶
func (c *ComponentConfig) GetWorkloadSelectorPaths() []string
GetWorkloadSelectorPaths returns all workload selector paths for a component.
type ComponentRef ¶
type ComponentRef struct {
// Name is the unique identifier for this component.
Name string `json:"name" yaml:"name"`
// Namespace is the Kubernetes namespace for deploying this component.
Namespace string `json:"namespace,omitempty" yaml:"namespace,omitempty"`
// Chart is the Helm chart name (e.g., "gpu-operator").
Chart string `json:"chart,omitempty" yaml:"chart,omitempty"`
// Type is the deployment type (Helm, Kustomize).
Type ComponentType `json:"type" yaml:"type"`
// Source is the repository URL or OCI reference.
Source string `json:"source" yaml:"source"`
// Version is the chart/component version (for Helm).
Version string `json:"version,omitempty" yaml:"version,omitempty"`
// Tag is the image/resource tag (for Kustomize).
Tag string `json:"tag,omitempty" yaml:"tag,omitempty"`
// ValuesFile is the path to the values file (relative to data directory).
ValuesFile string `json:"valuesFile,omitempty" yaml:"valuesFile,omitempty"`
// Overrides contains inline values that override those from ValuesFile.
// Merge order: base values → ValuesFile → Overrides (highest precedence).
Overrides map[string]any `json:"overrides,omitempty" yaml:"overrides,omitempty"`
// Patches is a list of patch files to apply (for Kustomize).
Patches []string `json:"patches,omitempty" yaml:"patches,omitempty"`
// DependencyRefs is a list of component names this component depends on.
DependencyRefs []string `json:"dependencyRefs,omitempty" yaml:"dependencyRefs,omitempty"`
// ManifestFiles lists manifest files to include in the component bundle.
// Paths are relative to the data directory.
// Example: ["components/gpu-operator/manifests/dcgm-exporter.yaml"]
ManifestFiles []string `json:"manifestFiles,omitempty" yaml:"manifestFiles,omitempty"`
// Path is the path within the repository to the kustomization (for Kustomize).
Path string `json:"path,omitempty" yaml:"path,omitempty"`
// Cleanup indicates whether to uninstall this component after validation.
// Used for validation infrastructure components (e.g., nccl-doctor).
Cleanup bool `json:"cleanup,omitempty" yaml:"cleanup,omitempty"`
// ExpectedResources lists Kubernetes resources that should exist after deployment.
// Used by deployment phase validation to verify component health.
ExpectedResources []ExpectedResource `json:"expectedResources,omitempty" yaml:"expectedResources,omitempty"`
// HealthCheckAsserts contains raw Chainsaw-style assert YAML loaded from the
// registry's healthCheck.assertFile via the DataProvider. When non-empty, the
// expected-resources check runs Chainsaw CLI to evaluate assertions instead of
// the default auto-discovery + typed replica checks.
HealthCheckAsserts string `json:"healthCheckAsserts,omitempty" yaml:"healthCheckAsserts,omitempty"`
}
ComponentRef represents a reference to a deployable component.
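The merge order noted on the Overrides field (base values, then ValuesFile, then Overrides) can be illustrated with a recursive map merge. This is a sketch under assumptions: mergeValues is a hypothetical helper, whether the package merges nested maps recursively is not stated in this documentation, and the driver values in main are invented for the example.

```go
package main

import "fmt"

// mergeValues returns a new map where entries from src override dst.
// Nested maps are merged recursively; other values are replaced.
func mergeValues(dst, src map[string]any) map[string]any {
	out := make(map[string]any, len(dst))
	for k, v := range dst {
		out[k] = v
	}
	for k, v := range src {
		srcMap, srcOK := v.(map[string]any)
		dstMap, dstOK := out[k].(map[string]any)
		if srcOK && dstOK {
			out[k] = mergeValues(dstMap, srcMap)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	base := map[string]any{"driver": map[string]any{"enabled": true, "version": "570"}}
	valuesFile := map[string]any{"driver": map[string]any{"version": "580"}}
	overrides := map[string]any{"mig": map[string]any{"strategy": "mixed"}}

	// Apply in precedence order: base, then values file, then overrides.
	final := mergeValues(mergeValues(base, valuesFile), overrides)
	fmt.Println(final)
}
```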
func (*ComponentRef) ApplyRegistryDefaults ¶
func (ref *ComponentRef) ApplyRegistryDefaults(config *ComponentConfig)
ApplyRegistryDefaults fills in ComponentRef fields from ComponentConfig defaults. This applies registry defaults for fields that are not already set in the ComponentRef.
func (ComponentRef) IsEnabled ¶ added in v0.10.13
func (c ComponentRef) IsEnabled() bool
IsEnabled returns whether this component is enabled for deployment. A component is disabled when its Overrides map contains enabled: false. Components without an explicit enabled override are enabled by default.
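The rule can be expressed in a few lines. The helper below is a hypothetical restatement, not the method's source; treating a non-boolean enabled value as "enabled" is an assumption.

```go
package main

import "fmt"

// isEnabled reports false only when overrides contains enabled: false.
// A missing key, or a value that is not a bool, counts as enabled.
func isEnabled(overrides map[string]any) bool {
	v, ok := overrides["enabled"]
	if !ok {
		return true
	}
	b, isBool := v.(bool)
	return !isBool || b
}

func main() {
	fmt.Println(isEnabled(nil))
	fmt.Println(isEnabled(map[string]any{"enabled": false}))
	fmt.Println(isEnabled(map[string]any{"enabled": true}))
}
```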
type ComponentRegistry ¶
type ComponentRegistry struct {
APIVersion string `yaml:"apiVersion"`
Kind string `yaml:"kind"`
Components []ComponentConfig `yaml:"components"`
// contains filtered or unexported fields
}
ComponentRegistry holds the declarative configuration for all components. This is loaded from embedded recipe data (recipes/registry.yaml) at startup.
func GetComponentRegistry ¶
func GetComponentRegistry() (*ComponentRegistry, error)
GetComponentRegistry returns the global component registry. The registry is loaded once from embedded data and cached. Returns an error if the registry file cannot be loaded or parsed.
func (*ComponentRegistry) Count ¶
func (r *ComponentRegistry) Count() int
Count returns the number of components in the registry.
func (*ComponentRegistry) Get ¶
func (r *ComponentRegistry) Get(name string) *ComponentConfig
Get returns the component configuration by name. Returns nil if the component is not found.
func (*ComponentRegistry) GetByOverrideKey ¶
func (r *ComponentRegistry) GetByOverrideKey(key string) *ComponentConfig
GetByOverrideKey returns the component configuration by value override key. This is used for matching --set flags like --set gpuoperator:key=value. Returns nil if no component matches the key.
func (*ComponentRegistry) Names ¶
func (r *ComponentRegistry) Names() []string
Names returns all component names in the registry.
func (*ComponentRegistry) Validate ¶
func (r *ComponentRegistry) Validate() []error
Validate checks the component registry for errors. Returns a slice of validation errors (empty if valid).
type ComponentType ¶
type ComponentType string
ComponentType represents the type of component deployment.
const (
ComponentTypeHelm ComponentType = "Helm"
ComponentTypeKustomize ComponentType = "Kustomize"
)
ComponentType constants for supported deployment types.
type ComponentValidationConfig ¶
type ComponentValidationConfig struct {
// Function is the name of the validation function to execute (e.g., "CheckWorkloadSelectorMissing").
Function string `yaml:"function"`
// Severity determines whether failures are warnings or errors ("warning" or "error").
Severity string `yaml:"severity"`
// Conditions are optional conditions that must be met for the validation to run.
// Values are arrays of strings for OR matching (single element arrays are equivalent to single values).
// Example: {"intent": ["training"]} or {"intent": ["training", "inference"]}
Conditions map[string][]string `yaml:"conditions,omitempty"`
// Message is an optional detail message to append to validation failures/warnings.
Message string `yaml:"message,omitempty"`
}
ComponentValidationConfig defines a component-specific validation check.
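The OR matching described for Conditions might be evaluated as in the sketch below. conditionsMet is a hypothetical helper; requiring every key to match (AND across keys) is an assumption, since the documentation only states OR semantics within a single key.

```go
package main

import "fmt"

// conditionsMet reports whether each condition key has at least one
// listed value equal to the actual value for that key.
func conditionsMet(conditions map[string][]string, actual map[string]string) bool {
	for key, allowed := range conditions {
		matched := false
		for _, a := range allowed {
			if actual[key] == a {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	conditions := map[string][]string{"intent": {"training", "inference"}}
	fmt.Println(conditionsMet(conditions, map[string]string{"intent": "inference"}))
	fmt.Println(conditionsMet(conditions, map[string]string{"intent": "any"}))
}
```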
type Constraint ¶
type Constraint struct {
// Name is the constraint identifier (e.g., "k8s", "worker-os").
Name string `json:"name" yaml:"name"`
// Value is the constraint expression (e.g., ">= 1.30", "ubuntu").
Value string `json:"value" yaml:"value"`
// Severity indicates the constraint severity ("error" or "warning").
Severity string `json:"severity,omitempty" yaml:"severity,omitempty"`
// Remediation provides actionable guidance for fixing failed constraints.
Remediation string `json:"remediation,omitempty" yaml:"remediation,omitempty"`
// Unit specifies the unit for numeric constraints (e.g., "GB/s").
Unit string `json:"unit,omitempty" yaml:"unit,omitempty"`
}
Constraint represents a deployment constraint/assumption.
type ConstraintEvalResult ¶
type ConstraintEvalResult struct {
// Passed indicates if the constraint was satisfied.
Passed bool
// Actual is the actual value extracted from the snapshot.
Actual string
// Error contains the error if evaluation failed (e.g., value not found).
Error error
}
ConstraintEvalResult represents the result of evaluating a single constraint. This mirrors the result from pkg/validator to avoid circular imports.
type ConstraintEvaluatorFunc ¶
type ConstraintEvaluatorFunc func(constraint Constraint) ConstraintEvalResult
ConstraintEvaluatorFunc is a function type for evaluating constraints. It takes a constraint and returns the evaluation result. This function type allows the recipe package to use constraint evaluation from the validator package without creating a circular dependency.
type ConstraintWarning ¶
type ConstraintWarning struct {
// Overlay is the name of the overlay that was excluded.
Overlay string `json:"overlay" yaml:"overlay"`
// Constraint is the name of the constraint that failed.
Constraint string `json:"constraint" yaml:"constraint"`
// Expected is the expected constraint value.
Expected string `json:"expected" yaml:"expected"`
// Actual is the actual value from the snapshot (if found).
Actual string `json:"actual,omitempty" yaml:"actual,omitempty"`
// Reason explains why the constraint evaluation resulted in exclusion.
Reason string `json:"reason" yaml:"reason"`
}
ConstraintWarning represents a warning about an overlay that matched criteria but was excluded due to failing constraint validation against the snapshot.
type Criteria ¶
type Criteria struct {
// Service is the Kubernetes service type (eks, gke, aks, oke, self-managed).
Service CriteriaServiceType `json:"service,omitempty" yaml:"service,omitempty"`
// Accelerator is the GPU/accelerator type (h100, gb200, b200, a100, l40).
Accelerator CriteriaAcceleratorType `json:"accelerator,omitempty" yaml:"accelerator,omitempty"`
// Intent is the workload intent (training, inference).
Intent CriteriaIntentType `json:"intent,omitempty" yaml:"intent,omitempty"`
// OS is the worker node operating system type.
OS CriteriaOSType `json:"os,omitempty" yaml:"os,omitempty"`
// Platform is the platform/framework type (dynamo, kubeflow).
Platform CriteriaPlatformType `json:"platform,omitempty" yaml:"platform,omitempty"`
// Nodes is the number of worker nodes (0 means any/unspecified).
Nodes int `json:"nodes,omitempty" yaml:"nodes,omitempty"`
}
Criteria represents the input parameters for recipe matching. All fields are optional and default to "any" if not specified.
func BuildCriteria ¶
func BuildCriteria(opts ...CriteriaOption) (*Criteria, error)
BuildCriteria creates a Criteria from functional options.
func ExtractCriteriaFromSnapshot ¶
func ExtractCriteriaFromSnapshot(snap *snapshotter.Snapshot) *Criteria
ExtractCriteriaFromSnapshot extracts criteria from a snapshot. This maps snapshot measurements to criteria fields.
func LoadCriteriaFromFile ¶
LoadCriteriaFromFile loads criteria from a YAML or JSON file. The file format is auto-detected from the file extension. All fields are optional and default to "any" if not specified.
Example file (YAML):
kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: gb200-eks-ubuntu-training
spec:
  service: eks
  os: ubuntu
  accelerator: gb200
  intent: training
func LoadCriteriaFromFileWithContext ¶
LoadCriteriaFromFileWithContext loads criteria from a YAML or JSON file with context support. The file format is auto-detected from the file extension. All fields are optional and default to "any" if not specified.
For HTTP/HTTPS URLs, the context is used for timeout and cancellation. For local file paths, the context is currently not used but is accepted for API consistency.
Example file (YAML):
kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: gb200-eks-ubuntu-training
spec:
  service: eks
  os: ubuntu
  accelerator: gb200
  intent: training
func NewCriteria ¶
func NewCriteria() *Criteria
NewCriteria creates a new Criteria with all fields set to "any".
func ParseCriteriaFromBody ¶
ParseCriteriaFromBody parses criteria from an io.Reader (HTTP request body). Supports JSON and YAML based on the Content-Type header. All fields are optional and default to "any" if not specified.
Supported Content-Types:
- application/json
- application/x-yaml, application/yaml, text/yaml
If Content-Type is empty or unrecognized, JSON is assumed.
Example JSON body:
{
"kind": "RecipeCriteria",
"apiVersion": "aicr.nvidia.com/v1alpha1",
"metadata": {"name": "my-criteria"},
"spec": {"service": "eks", "accelerator": "h100"}
}
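The Content-Type dispatch described above could look roughly like the sketch below. It only illustrates the format decision and the JSON path using the standard library; bodyFormat and envelope are hypothetical names, and YAML decoding is left out because it needs a third-party package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"mime"
	"strings"
)

// bodyFormat maps a Content-Type header to "yaml" or "json".
// Empty, malformed, or unrecognized values fall back to JSON,
// as the documentation above describes.
func bodyFormat(contentType string) string {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return "json"
	}
	switch mediaType {
	case "application/x-yaml", "application/yaml", "text/yaml":
		return "yaml"
	default:
		return "json"
	}
}

// envelope is a trimmed, hypothetical view of the RecipeCriteria resource.
type envelope struct {
	Kind string `json:"kind"`
	Spec struct {
		Service     string `json:"service"`
		Accelerator string `json:"accelerator"`
	} `json:"spec"`
}

func main() {
	body := `{"kind":"RecipeCriteria","spec":{"service":"eks","accelerator":"h100"}}`
	if bodyFormat("application/json; charset=utf-8") == "json" {
		var e envelope
		if err := json.NewDecoder(strings.NewReader(body)).Decode(&e); err != nil {
			fmt.Println("decode error:", err)
			return
		}
		fmt.Println(e.Kind, e.Spec.Service, e.Spec.Accelerator)
	}
}
```

Parsing the header with mime.ParseMediaType strips parameters such as charset, so "text/yaml; charset=utf-8" is still recognized as YAML.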
func ParseCriteriaFromRequest ¶
ParseCriteriaFromRequest parses recipe criteria from HTTP query parameters. All parameters are optional and default to "any" if not specified. Supported parameters: service, accelerator (alias: gpu), intent, os, nodes.
func ParseCriteriaFromValues ¶
ParseCriteriaFromValues parses recipe criteria from URL values. All parameters are optional and default to "any" if not specified. Supported parameters: service, accelerator (alias: gpu), intent, os, nodes.
func (*Criteria) Matches ¶
Matches checks if this recipe criteria matches the given query criteria. Uses asymmetric matching:
- Query "any" (or empty) = ONLY matches recipes that are also "any"/empty for that field
- Recipe "any" (or empty) = wildcard (matches any query value for that field)
- Query specific + Recipe specific = must match exactly
This ensures a generic query (e.g., accelerator=any) only matches generic recipes (e.g., accelerator=any), while a specific query (e.g., accelerator=gb200) can match both generic recipes and recipes with that specific value.
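The three rules above can be expressed per field as in this minimal sketch. The criteria struct is a trimmed, hypothetical stand-in with only two fields, and fieldMatches is an illustrative helper rather than the package's own code.

```go
package main

import "fmt"

// criteria is a trimmed, hypothetical stand-in for the documented Criteria type.
type criteria struct {
	Service     string
	Accelerator string
}

// isAny reports whether a field value is a wildcard ("any" or empty).
func isAny(v string) bool { return v == "" || v == "any" }

// fieldMatches applies the asymmetric rule to one field.
func fieldMatches(recipe, query string) bool {
	if isAny(recipe) {
		return true // a recipe wildcard accepts every query value
	}
	if isAny(query) {
		return false // a generic query never selects a specific recipe
	}
	return recipe == query // both specific: exact match required
}

// matches reports whether the recipe criteria r applies to the query q.
func (r criteria) matches(q criteria) bool {
	return fieldMatches(r.Service, q.Service) &&
		fieldMatches(r.Accelerator, q.Accelerator)
}

func main() {
	generic := criteria{Service: "eks"}
	specific := criteria{Service: "eks", Accelerator: "gb200"}

	fmt.Println(generic.matches(criteria{Service: "eks", Accelerator: "gb200"}))  // true
	fmt.Println(specific.matches(criteria{Service: "eks", Accelerator: "any"}))   // false
	fmt.Println(specific.matches(criteria{Service: "eks", Accelerator: "gb200"})) // true
}
```

Note that the relation is not symmetric: swapping the receiver and the argument can change the result, which is why the recipe side and the query side must not be confused.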
func (*Criteria) Specificity ¶
Specificity returns a score indicating how specific this criteria is. Higher scores mean more specific criteria (fewer "any" fields). Used for ordering overlay application - more specific overlays are applied later.
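One plausible way to compute and use such a score is shown below. This sketch assumes every non-wildcard field contributes one point; the actual weighting is not stated in this documentation, and the overlay type and helper names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// overlay is a hypothetical stand-in that carries only a name and the
// criteria field values (for example service, accelerator, intent).
type overlay struct {
	Name   string
	Fields []string
}

// specificity counts the fields that are neither empty nor "any".
// One point per field is an assumption made for this sketch.
func specificity(fields []string) int {
	score := 0
	for _, f := range fields {
		if f != "" && f != "any" {
			score++
		}
	}
	return score
}

// sortBySpecificity orders overlays from least to most specific, so that
// more specific overlays are applied later and win on conflicts.
func sortBySpecificity(overlays []overlay) {
	sort.SliceStable(overlays, func(i, j int) bool {
		return specificity(overlays[i].Fields) < specificity(overlays[j].Fields)
	})
}

func main() {
	overlays := []overlay{
		{Name: "h100-eks-training", Fields: []string{"eks", "h100", "training"}},
		{Name: "eks", Fields: []string{"eks", "any", "any"}},
		{Name: "eks-training", Fields: []string{"eks", "any", "training"}},
	}
	sortBySpecificity(overlays)
	for _, o := range overlays {
		fmt.Println(o.Name, specificity(o.Fields))
	}
}
```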
type CriteriaAcceleratorType ¶
type CriteriaAcceleratorType string
CriteriaAcceleratorType represents the GPU/accelerator type.
const (
	CriteriaAcceleratorAny   CriteriaAcceleratorType = "any"
	CriteriaAcceleratorH100  CriteriaAcceleratorType = "h100"
	CriteriaAcceleratorGB200 CriteriaAcceleratorType = "gb200"
	CriteriaAcceleratorB200  CriteriaAcceleratorType = "b200"
	CriteriaAcceleratorA100  CriteriaAcceleratorType = "a100"
	CriteriaAcceleratorL40   CriteriaAcceleratorType = "l40"
)
CriteriaAcceleratorType constants for supported accelerators.
func ParseCriteriaAcceleratorType ¶
func ParseCriteriaAcceleratorType(s string) (CriteriaAcceleratorType, error)
ParseCriteriaAcceleratorType parses a string into a CriteriaAcceleratorType.
type CriteriaIntentType ¶
type CriteriaIntentType string
CriteriaIntentType represents the workload intent.
const (
	CriteriaIntentAny       CriteriaIntentType = "any"
	CriteriaIntentTraining  CriteriaIntentType = "training"
	CriteriaIntentInference CriteriaIntentType = "inference"
)
CriteriaIntentType constants for supported workload intents.
func ParseCriteriaIntentType ¶
func ParseCriteriaIntentType(s string) (CriteriaIntentType, error)
ParseCriteriaIntentType parses a string into a CriteriaIntentType.
type CriteriaOSType ¶
type CriteriaOSType string
CriteriaOSType represents an operating system type.
const (
	CriteriaOSAny         CriteriaOSType = "any"
	CriteriaOSUbuntu      CriteriaOSType = "ubuntu"
	CriteriaOSRHEL        CriteriaOSType = "rhel"
	CriteriaOSCOS         CriteriaOSType = "cos"
	CriteriaOSAmazonLinux CriteriaOSType = "amazonlinux"
)
CriteriaOSType constants for supported operating systems.
func ParseCriteriaOSType ¶
func ParseCriteriaOSType(s string) (CriteriaOSType, error)
ParseCriteriaOSType parses a string into a CriteriaOSType.
type CriteriaOption ¶
CriteriaOption is a functional option for building Criteria.
func WithCriteriaAccelerator ¶
func WithCriteriaAccelerator(s string) CriteriaOption
WithCriteriaAccelerator sets the accelerator type.
func WithCriteriaIntent ¶
func WithCriteriaIntent(s string) CriteriaOption
WithCriteriaIntent sets the intent type.
func WithCriteriaNodes ¶
func WithCriteriaNodes(n int) CriteriaOption
WithCriteriaNodes sets the number of nodes.
func WithCriteriaPlatform ¶
func WithCriteriaPlatform(s string) CriteriaOption
WithCriteriaPlatform sets the platform type.
func WithCriteriaService ¶
func WithCriteriaService(s string) CriteriaOption
WithCriteriaService sets the service type.
type CriteriaPlatformType ¶
type CriteriaPlatformType string
CriteriaPlatformType represents a platform/framework type.
const (
	CriteriaPlatformAny      CriteriaPlatformType = "any"
	CriteriaPlatformDynamo   CriteriaPlatformType = "dynamo"
	CriteriaPlatformKubeflow CriteriaPlatformType = "kubeflow"
)
CriteriaPlatformType constants for supported platforms.
func ParseCriteriaPlatformType ¶
func ParseCriteriaPlatformType(s string) (CriteriaPlatformType, error)
ParseCriteriaPlatformType parses a string into a CriteriaPlatformType.
type CriteriaServiceType ¶
type CriteriaServiceType string
CriteriaServiceType represents the Kubernetes service/platform type for criteria.
const (
	CriteriaServiceAny  CriteriaServiceType = "any"
	CriteriaServiceEKS  CriteriaServiceType = "eks"
	CriteriaServiceGKE  CriteriaServiceType = "gke"
	CriteriaServiceAKS  CriteriaServiceType = "aks"
	CriteriaServiceOKE  CriteriaServiceType = "oke"
	CriteriaServiceKind CriteriaServiceType = "kind"
)
CriteriaServiceType constants for supported Kubernetes services.
func ParseCriteriaServiceType ¶
func ParseCriteriaServiceType(s string) (CriteriaServiceType, error)
ParseCriteriaServiceType parses a string into a CriteriaServiceType.
type DataProvider ¶
type DataProvider interface {
// ReadFile reads a file by path (relative to data directory).
ReadFile(path string) ([]byte, error)
// WalkDir walks the directory tree rooted at root.
WalkDir(root string, fn fs.WalkDirFunc) error
// Source returns a description of where data came from (for debugging).
Source(path string) string
}
DataProvider abstracts access to recipe data files. This allows layering external directories over embedded data.
func GetDataProvider ¶
func GetDataProvider() DataProvider
GetDataProvider returns the global data provider. Returns the embedded provider if none was set.
type EmbeddedDataProvider ¶
type EmbeddedDataProvider struct {
// contains filtered or unexported fields
}
EmbeddedDataProvider wraps an embed.FS to implement DataProvider.
func NewEmbeddedDataProvider ¶
func NewEmbeddedDataProvider(efs embed.FS, prefix string) *EmbeddedDataProvider
NewEmbeddedDataProvider creates a provider from an embedded filesystem.
func (*EmbeddedDataProvider) ReadFile ¶
func (p *EmbeddedDataProvider) ReadFile(path string) ([]byte, error)
ReadFile reads a file from the embedded filesystem.
func (*EmbeddedDataProvider) Source ¶
func (p *EmbeddedDataProvider) Source(path string) string
Source returns "embedded" for all paths.
func (*EmbeddedDataProvider) WalkDir ¶
func (p *EmbeddedDataProvider) WalkDir(root string, fn fs.WalkDirFunc) error
WalkDir walks the embedded filesystem.
type ExpectedResource ¶
type ExpectedResource struct {
// Kind is the resource kind (e.g., "Deployment", "DaemonSet").
Kind string `json:"kind" yaml:"kind"`
// Name is the resource name.
Name string `json:"name" yaml:"name"`
// Namespace is the resource namespace (optional for cluster-scoped resources).
Namespace string `json:"namespace,omitempty" yaml:"namespace,omitempty"`
}
ExpectedResource represents a Kubernetes resource that should exist after deployment.
type HealthCheckConfig ¶ added in v0.7.8
type HealthCheckConfig struct {
// AssertFile is the path to a Chainsaw-style assert YAML file (relative to data directory).
// When set, the expected-resources check uses Chainsaw CLI to evaluate assertions
// instead of the default auto-discovery + typed replica checks.
AssertFile string `yaml:"assertFile,omitempty"`
}
HealthCheckConfig defines custom health check settings for a component.
type HelmConfig ¶
type HelmConfig struct {
// DefaultRepository is the default Helm repository URL.
DefaultRepository string `yaml:"defaultRepository,omitempty"`
// DefaultChart is the chart name (e.g., "nvidia/gpu-operator").
DefaultChart string `yaml:"defaultChart,omitempty"`
// DefaultVersion is the default chart version if not specified in recipe.
DefaultVersion string `yaml:"defaultVersion,omitempty"`
// DefaultNamespace is the Kubernetes namespace for deploying this component.
DefaultNamespace string `yaml:"defaultNamespace,omitempty"`
}
HelmConfig contains default Helm chart settings for a component.
type KustomizeConfig ¶
type KustomizeConfig struct {
// DefaultSource is the default Git repository or OCI reference.
DefaultSource string `yaml:"defaultSource,omitempty"`
// DefaultPath is the path within the repository to the kustomization.
DefaultPath string `yaml:"defaultPath,omitempty"`
// DefaultTag is the default Git tag, branch, or commit.
DefaultTag string `yaml:"defaultTag,omitempty"`
}
KustomizeConfig contains default Kustomize settings for a component.
type LayeredDataProvider ¶
type LayeredDataProvider struct {
// contains filtered or unexported fields
}
LayeredDataProvider overlays an external directory on top of embedded data. For registryFileName: merges external components with embedded (external takes precedence). For all other files: external completely replaces embedded if present.
func NewLayeredDataProvider ¶
func NewLayeredDataProvider(embedded *EmbeddedDataProvider, config LayeredProviderConfig) (*LayeredDataProvider, error)
NewLayeredDataProvider creates a provider that layers external data over embedded. Returns an error if:
- External directory doesn't exist
- External directory doesn't contain registryFileName
- Path traversal is detected
- File size exceeds limits
func (*LayeredDataProvider) ExternalDir ¶ added in v0.8.0
func (p *LayeredDataProvider) ExternalDir() string
ExternalDir returns the path to the external data directory.
func (*LayeredDataProvider) ExternalFiles ¶ added in v0.8.0
func (p *LayeredDataProvider) ExternalFiles() []string
ExternalFiles returns a sorted list of file paths that came from the external data directory. Paths are relative to the external directory root.
func (*LayeredDataProvider) ReadFile ¶
func (p *LayeredDataProvider) ReadFile(path string) ([]byte, error)
ReadFile reads a file, checking external directory first. For registryFileName, returns merged content. For other files, external completely replaces embedded.
func (*LayeredDataProvider) Source ¶
func (p *LayeredDataProvider) Source(path string) string
Source returns "external" or "embedded" depending on where the file comes from.
func (*LayeredDataProvider) WalkDir ¶
func (p *LayeredDataProvider) WalkDir(root string, fn fs.WalkDirFunc) error
WalkDir walks both embedded and external directories. External files take precedence over embedded files.
type LayeredProviderConfig ¶
type LayeredProviderConfig struct {
// ExternalDir is the path to the external data directory.
ExternalDir string
// MaxFileSize is the maximum allowed file size in bytes (default: 10MB).
MaxFileSize int64
// AllowSymlinks allows symlinks in the external directory (default: false).
AllowSymlinks bool
}
LayeredProviderConfig configures the layered data provider.
type MetadataStore ¶
type MetadataStore struct {
// Base is the base recipe metadata.
Base *RecipeMetadata
// Overlays is a list of overlay recipes indexed by name.
Overlays map[string]*RecipeMetadata
// ValuesFiles contains embedded values file contents indexed by filename.
ValuesFiles map[string][]byte
}
MetadataStore holds the base recipe and all overlays.
func (*MetadataStore) BuildRecipeResult ¶
func (s *MetadataStore) BuildRecipeResult(ctx context.Context, criteria *Criteria) (*RecipeResult, error)
BuildRecipeResult builds a RecipeResult by merging base with matching overlays. Each matching overlay is resolved through its inheritance chain before merging. This enables multi-level inheritance: base → intermediate → overlay.
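Resolving an inheritance chain such as base → eks → eks-training can be sketched as a walk over parent names. The example below is an assumption about the general technique, not the package's implementation: resolveChain and the parents map (recipe name to the value of its Base field) are hypothetical.

```go
package main

import "fmt"

// resolveChain follows parent links from name up to "base" and returns the
// chain root first, which is the order in which specs would be merged.
// An empty parent means the recipe inherits directly from "base".
func resolveChain(parents map[string]string, name string) ([]string, error) {
	var chain []string
	seen := map[string]bool{}
	for cur := name; cur != "base"; {
		if seen[cur] {
			return nil, fmt.Errorf("inheritance cycle at %q", cur)
		}
		seen[cur] = true
		parent, ok := parents[cur]
		if !ok {
			return nil, fmt.Errorf("unknown recipe %q", cur)
		}
		chain = append(chain, cur)
		if parent == "" {
			parent = "base"
		}
		cur = parent
	}
	chain = append(chain, "base")
	// Reverse in place so that the root comes first.
	for i, j := 0, len(chain)-1; i < j; i, j = i+1, j-1 {
		chain[i], chain[j] = chain[j], chain[i]
	}
	return chain, nil
}

func main() {
	parents := map[string]string{
		"eks":               "",
		"eks-training":      "eks",
		"h100-eks-training": "eks-training",
	}
	chain, err := resolveChain(parents, "h100-eks-training")
	if err != nil {
		fmt.Println("resolve error:", err)
		return
	}
	fmt.Println(chain)
}
```

Merging the specs in the returned order lets each level override what it inherited, with the most derived recipe applied last.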
func (*MetadataStore) BuildRecipeResultWithEvaluator ¶
func (s *MetadataStore) BuildRecipeResultWithEvaluator(ctx context.Context, criteria *Criteria, evaluator ConstraintEvaluatorFunc) (*RecipeResult, error)
BuildRecipeResultWithEvaluator builds a RecipeResult by merging base with matching overlays, filtering overlays based on constraint evaluation using the provided evaluator function.
This method extends BuildRecipeResult with constraint-aware filtering:
- Each overlay that matches by criteria is tested against its constraints
- Overlays with failing constraints are excluded from the merge
- Warnings about excluded overlays are included in the result metadata
The evaluator function is called for each constraint in each matching overlay. If evaluator is nil, this method behaves identically to BuildRecipeResult.
func (*MetadataStore) FindMatchingOverlays ¶
func (s *MetadataStore) FindMatchingOverlays(criteria *Criteria) []*RecipeMetadata
FindMatchingOverlays finds all overlays that match the given criteria. Returns overlays sorted by specificity (least specific first).
func (*MetadataStore) GetRecipeByName ¶
func (s *MetadataStore) GetRecipeByName(name string) (*RecipeMetadata, bool)
GetRecipeByName returns a recipe metadata by name. Returns the base recipe if name is "base", otherwise looks up in overlays.
func (*MetadataStore) GetValuesFile ¶
func (s *MetadataStore) GetValuesFile(filename string) ([]byte, error)
GetValuesFile returns the content of a values file by filename.
type NodeSchedulingConfig ¶
type NodeSchedulingConfig struct {
// System defines paths for system component scheduling.
System SchedulingPaths `yaml:"system,omitempty"`
// Accelerated defines paths for GPU/accelerated node scheduling.
Accelerated SchedulingPaths `yaml:"accelerated,omitempty"`
// NodeCountPaths are Helm value paths where the bundle-time node count is injected (e.g. estimatedNodeCount for skyhook-operator).
NodeCountPaths []string `yaml:"nodeCountPaths,omitempty"`
}
NodeSchedulingConfig defines paths for node scheduling injection.
type NodeSelection ¶
type NodeSelection struct {
// Selector specifies label-based node selection.
Selector map[string]string `json:"selector,omitempty" yaml:"selector,omitempty"`
// MaxNodes limits the number of nodes to validate.
MaxNodes int `json:"maxNodes,omitempty" yaml:"maxNodes,omitempty"`
// ExcludeNodes lists node names to exclude from validation.
ExcludeNodes []string `json:"excludeNodes,omitempty" yaml:"excludeNodes,omitempty"`
}
NodeSelection defines node filtering for validation scope.
type Option ¶
type Option func(*Builder)
Option is a functional option for configuring Builder instances.
func WithAllowLists ¶
func WithAllowLists(al *AllowLists) Option
WithAllowLists returns an Option that sets criteria allowlists for the Builder. When allowlists are configured, the Builder will reject criteria values that are not in the allowed list. This is used by the API server to restrict which criteria values can be requested.
func WithVersion ¶
WithVersion returns an Option that sets the Builder version string. The version is included in recipe metadata for tracking purposes.
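For readers unfamiliar with the functional options pattern used by Option, here is a generic sketch. It does not reproduce the package's Builder: the builder struct, newBuilder, withVersion, and withAllowedServices are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// builder is a hypothetical stand-in for a configurable type.
type builder struct {
	version         string
	allowedServices map[string]bool
}

// option mutates a builder during construction.
type option func(*builder)

// withVersion records a version string on the builder.
func withVersion(v string) option {
	return func(b *builder) { b.version = v }
}

// withAllowedServices restricts the service values the builder accepts.
func withAllowedServices(services ...string) option {
	return func(b *builder) {
		b.allowedServices = make(map[string]bool, len(services))
		for _, s := range services {
			b.allowedServices[s] = true
		}
	}
}

// newBuilder applies the options in order, so later options override earlier ones.
func newBuilder(opts ...option) *builder {
	b := &builder{}
	for _, opt := range opts {
		opt(b)
	}
	return b
}

// allows reports whether a service passes the allowlist.
// An unset allowlist permits every value.
func (b *builder) allows(service string) bool {
	return b.allowedServices == nil || b.allowedServices[service]
}

func main() {
	b := newBuilder(withVersion("v1.2.3"), withAllowedServices("eks", "gke"))
	fmt.Println(b.version, b.allows("eks"), b.allows("aks"))
}
```

The pattern keeps the constructor signature stable as new settings are added, since each setting is an independent option value.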
type PodSchedulingConfig ¶
type PodSchedulingConfig struct {
// Workload defines paths for workload pod scheduling.
Workload WorkloadSchedulingPaths `yaml:"workload,omitempty"`
}
PodSchedulingConfig defines paths for pod scheduling injection.
type QueryRequest ¶ added in v0.11.0
type QueryRequest struct {
Criteria *Criteria `json:"criteria" yaml:"criteria"`
Selector string `json:"selector" yaml:"selector"`
}
QueryRequest represents a query API request body for POST.
type Recipe ¶
type Recipe struct {
header.Header `json:",inline" yaml:",inline"`
Request *RequestInfo `json:"request,omitempty" yaml:"request,omitempty"`
MatchedRules []string `json:"matchedRules,omitempty" yaml:"matchedRules,omitempty"`
Measurements []*measurement.Measurement `json:"measurements" yaml:"measurements"`
}
Recipe represents the recipe response structure.
func (*Recipe) GetComponentRef ¶
func (r *Recipe) GetComponentRef(name string) *ComponentRef
GetComponentRef returns nil for Recipe (v1 format doesn't have components).
func (*Recipe) GetCriteria ¶
GetCriteria returns nil for Recipe (v1 format doesn't have criteria).
func (*Recipe) GetValuesForComponent ¶
GetValuesForComponent extracts values from measurements for Recipe. This maintains backward compatibility with the legacy measurements-based format.
func (*Recipe) GetVersion ¶
GetVersion returns the recipe version from metadata.
func (*Recipe) Validate ¶
Validate validates a recipe against all registered bundlers that implement Validator.
func (*Recipe) ValidateStructure ¶
ValidateStructure performs basic structural validation.
type RecipeCriteria ¶
type RecipeCriteria struct {
// Kind is always "RecipeCriteria".
Kind string `json:"kind" yaml:"kind"`
// APIVersion is the API version (e.g., "aicr.nvidia.com/v1alpha1").
APIVersion string `json:"apiVersion" yaml:"apiVersion"`
// Metadata contains the name and other metadata.
Metadata struct {
// Name is the unique identifier for this criteria set.
Name string `json:"name" yaml:"name"`
} `json:"metadata" yaml:"metadata"`
// Spec contains the actual criteria specification.
Spec *Criteria `json:"spec" yaml:"spec"`
}
RecipeCriteria represents a Kubernetes-style criteria resource. This is the format used in criteria files and API requests.
Example:
kind: RecipeCriteria apiVersion: aicr.nvidia.com/v1alpha1 metadata: name: gb200-eks-ubuntu-training spec: service: eks os: ubuntu accelerator: gb200 intent: training
type RecipeInput ¶
type RecipeInput interface {
// GetComponentRef returns the component reference for a given component name.
// Returns nil if the component is not found.
GetComponentRef(name string) *ComponentRef
// GetValuesForComponent returns the values map for a given component.
// For Recipe, this extracts values from measurements.
// For RecipeResult, this loads values from the component's valuesFile.
GetValuesForComponent(name string) (map[string]any, error)
// GetVersion returns the recipe version (CLI version that generated the recipe).
// Returns empty string if version is not available.
GetVersion() string
// GetCriteria returns the criteria used to generate this recipe.
// Returns nil if criteria is not available (e.g., for legacy Recipe format).
GetCriteria() *Criteria
}
RecipeInput is an interface that both Recipe and RecipeResult implement. This allows bundlers to work with either format during the transition period.
type RecipeMetadata ¶
type RecipeMetadata struct {
RecipeMetadataHeader `json:",inline" yaml:",inline"`
// Spec contains the recipe specification.
Spec RecipeMetadataSpec `json:"spec" yaml:"spec"`
}
RecipeMetadata represents a recipe definition (base or overlay).
type RecipeMetadataHeader ¶
type RecipeMetadataHeader struct {
// Kind is always "RecipeMetadata".
Kind string `json:"kind" yaml:"kind"`
// APIVersion is the API version (e.g., "aicr.nvidia.com/v1alpha1").
APIVersion string `json:"apiVersion" yaml:"apiVersion"`
// Metadata contains the name and other metadata.
Metadata struct {
Name string `json:"name" yaml:"name"`
} `json:"metadata" yaml:"metadata"`
}
RecipeMetadataHeader contains the Kubernetes-style header fields.
type RecipeMetadataSpec ¶
type RecipeMetadataSpec struct {
// Base is the name of the parent recipe to inherit from.
// If empty, the recipe inherits from "base" (the root base.yaml).
// This enables multi-level inheritance chains like:
// base → eks → eks-training → h100-eks-training
Base string `json:"base,omitempty" yaml:"base,omitempty"`
// Criteria defines when this recipe/overlay applies.
// Only present in overlay files, not in base.
Criteria *Criteria `json:"criteria,omitempty" yaml:"criteria,omitempty"`
// Constraints are deployment assumptions/requirements.
Constraints []Constraint `json:"constraints,omitempty" yaml:"constraints,omitempty"`
// ComponentRefs is the list of components to deploy.
ComponentRefs []ComponentRef `json:"componentRefs,omitempty" yaml:"componentRefs,omitempty"`
// Validation defines multi-phase validation configuration.
// Presence of a phase implies it is enabled.
Validation *ValidationConfig `json:"validation,omitempty" yaml:"validation,omitempty"`
}
RecipeMetadataSpec contains the specification for a recipe.
func (*RecipeMetadataSpec) Merge ¶
func (s *RecipeMetadataSpec) Merge(other *RecipeMetadataSpec)
Merge merges another RecipeMetadataSpec into this one. The other spec takes precedence for conflicts.
func (*RecipeMetadataSpec) TopologicalSort ¶
func (s *RecipeMetadataSpec) TopologicalSort() ([]string, error)
TopologicalSort returns components in dependency order (dependencies first). Components with no dependencies come first, then components that depend only on already-listed components, etc.
func (*RecipeMetadataSpec) ValidateDependencies ¶
func (s *RecipeMetadataSpec) ValidateDependencies() error
ValidateDependencies validates that all dependencyRefs reference existing components. Returns an error if any dependency is missing or if there are circular dependencies.
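Dependency ordering with missing-reference and cycle detection is commonly done with Kahn's algorithm, sketched below. This is an illustration of the technique rather than the package's code: topoSort takes a hypothetical map from component name to its dependency names, the component names are made up, and ties are broken alphabetically to keep the output deterministic.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// topoSort returns component names with dependencies first. It fails when a
// dependency is not a known component or when the graph contains a cycle.
func topoSort(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string)
	for name, ds := range deps {
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("component %q depends on unknown component %q", name, d)
			}
			dependents[d] = append(dependents[d], name)
		}
		indegree[name] = len(ds)
	}

	// Start with components that have no dependencies.
	var ready []string
	for name, n := range indegree {
		if n == 0 {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready)

	order := make([]string, 0, len(deps))
	for len(ready) > 0 {
		cur := ready[0]
		ready = ready[1:]
		order = append(order, cur)
		for _, dep := range dependents[cur] {
			indegree[dep]--
			if indegree[dep] == 0 {
				ready = append(ready, dep)
			}
		}
		sort.Strings(ready) // deterministic tie-breaking
	}

	// Any component left unplaced is part of a cycle.
	if len(order) != len(deps) {
		return nil, errors.New("circular dependency detected")
	}
	return order, nil
}

func main() {
	order, err := topoSort(map[string][]string{
		"cert-manager":     nil,
		"gpu-operator":     {"cert-manager"},
		"network-operator": {"cert-manager"},
		"monitoring":       {"gpu-operator"},
	})
	if err != nil {
		fmt.Println("sort error:", err)
		return
	}
	fmt.Println(order)
}
```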
type RecipeResult ¶
type RecipeResult struct {
// Kind is always "RecipeResult".
Kind string `json:"kind" yaml:"kind"`
// APIVersion is the API version.
APIVersion string `json:"apiVersion" yaml:"apiVersion"`
// Metadata contains result metadata.
Metadata struct {
// Version is the recipe version (CLI version that generated this recipe).
Version string `json:"version,omitempty" yaml:"version,omitempty"`
// AppliedOverlays lists the overlay names in order of application.
AppliedOverlays []string `json:"appliedOverlays,omitempty" yaml:"appliedOverlays,omitempty"`
// ExcludedOverlays lists overlays that matched criteria but were excluded
// due to failing constraint validation against the snapshot.
// Only populated when a snapshot is provided during recipe generation.
ExcludedOverlays []string `json:"excludedOverlays,omitempty" yaml:"excludedOverlays,omitempty"`
// ConstraintWarnings contains details about why specific overlays were excluded.
// Helps users understand why certain environment-specific configurations
// were not applied and what would need to change to include them.
ConstraintWarnings []ConstraintWarning `json:"constraintWarnings,omitempty" yaml:"constraintWarnings,omitempty"`
} `json:"metadata" yaml:"metadata"`
// Criteria is the input criteria used to generate this result.
Criteria *Criteria `json:"criteria" yaml:"criteria"`
// Constraints is the merged list of constraints.
Constraints []Constraint `json:"constraints,omitempty" yaml:"constraints,omitempty"`
// ComponentRefs is the merged list of components.
ComponentRefs []ComponentRef `json:"componentRefs" yaml:"componentRefs"`
// DeploymentOrder is the topologically sorted component names for deployment.
// Components should be deployed in this order to satisfy dependencies.
DeploymentOrder []string `json:"deploymentOrder" yaml:"deploymentOrder"`
// Validation defines multi-phase validation configuration.
// Inherited from recipe metadata during merging.
Validation *ValidationConfig `json:"validation,omitempty" yaml:"validation,omitempty"`
}
RecipeResult represents the final merged recipe output.
func (*RecipeResult) GetComponentRef ¶
func (r *RecipeResult) GetComponentRef(name string) *ComponentRef
GetComponentRef returns the component reference for a given component name.
func (*RecipeResult) GetCriteria ¶
func (r *RecipeResult) GetCriteria() *Criteria
GetCriteria returns the criteria used to generate this recipe result.
func (*RecipeResult) GetValuesForComponent ¶
func (r *RecipeResult) GetValuesForComponent(name string) (map[string]any, error)
GetValuesForComponent loads values from the component's valuesFile and inline overrides. Merge order: base values → ValuesFile → Overrides (highest precedence). This supports three patterns:
- ValuesFile only: Traditional separate file approach
- Overrides only: Fully self-contained recipe with inline overrides
- ValuesFile + Overrides: Hybrid - reusable base with recipe-specific tweaks
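The merge order above can be illustrated with a small map merge. Whether the package merges nested maps recursively is not stated here, so the recursive behavior of deepMerge below is an assumption, and the value keys are invented for the example.

```go
package main

import "fmt"

// deepMerge returns a new map containing dst overlaid with src. When both
// sides hold a nested map under the same key, the maps are merged
// recursively; otherwise the src value replaces the dst value. Nested maps
// that src does not touch are shared with dst rather than copied.
func deepMerge(dst, src map[string]any) map[string]any {
	out := make(map[string]any, len(dst))
	for k, v := range dst {
		out[k] = v
	}
	for k, v := range src {
		srcMap, srcIsMap := v.(map[string]any)
		dstMap, dstIsMap := out[k].(map[string]any)
		if srcIsMap && dstIsMap {
			out[k] = deepMerge(dstMap, srcMap)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	base := map[string]any{
		"driver": map[string]any{"enabled": true, "version": "550"},
	}
	valuesFile := map[string]any{
		"driver": map[string]any{"version": "570"},
	}
	overrides := map[string]any{
		"toolkit": map[string]any{"enabled": false},
	}

	// Later layers win: base values, then ValuesFile, then Overrides.
	merged := deepMerge(deepMerge(base, valuesFile), overrides)
	fmt.Println(merged)
}
```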
func (*RecipeResult) GetVersion ¶
func (r *RecipeResult) GetVersion() string
GetVersion returns the recipe version from metadata.
type RequestInfo ¶
type RequestInfo struct {
Os string `json:"os,omitempty" yaml:"os,omitempty"`
OsVersion string `json:"osVersion,omitempty" yaml:"osVersion,omitempty"`
Service string `json:"service,omitempty" yaml:"service,omitempty"`
K8s string `json:"k8s,omitempty" yaml:"k8s,omitempty"`
GPU string `json:"gpu,omitempty" yaml:"gpu,omitempty"`
Intent string `json:"intent,omitempty" yaml:"intent,omitempty"`
}
RequestInfo holds simplified request metadata for documentation purposes. This replaces the old Query type with just the fields needed for bundle documentation.
type SchedulingPaths ¶
type SchedulingPaths struct {
// NodeSelectorPaths are paths where node selectors are injected.
NodeSelectorPaths []string `yaml:"nodeSelectorPaths,omitempty"`
// TolerationPaths are paths where tolerations are injected.
TolerationPaths []string `yaml:"tolerationPaths,omitempty"`
// TaintPaths are paths where taints are injected as structured objects.
// Intended to be used instead of TaintStrPaths for components that need to set specific parts of taints
// and can't process the string format.
TaintPaths []string `yaml:"taintPaths,omitempty"`
// TaintStrPaths are paths where taints are injected as strings (format: key=value:effect or key:effect).
TaintStrPaths []string `yaml:"taintStrPaths,omitempty"`
}
SchedulingPaths holds the Helm value paths for node scheduling.
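To make the two taint string formats mentioned for TaintStrPaths concrete (key=value:effect and key:effect), the sketch below splits such a string into its parts. The taint struct and parseTaint are hypothetical helpers for illustration and do not validate keys or effects the way Kubernetes does.

```go
package main

import (
	"fmt"
	"strings"
)

// taint holds the structured parts of a taint string.
type taint struct {
	Key    string
	Value  string
	Effect string
}

// parseTaint accepts "key=value:effect" or "key:effect".
// The effect is everything after the last colon; the value is optional.
func parseTaint(s string) (taint, error) {
	i := strings.LastIndex(s, ":")
	if i <= 0 || i == len(s)-1 {
		return taint{}, fmt.Errorf("invalid taint %q: want key=value:effect or key:effect", s)
	}
	keyValue, effect := s[:i], s[i+1:]
	key, value, _ := strings.Cut(keyValue, "=")
	if key == "" {
		return taint{}, fmt.Errorf("invalid taint %q: empty key", s)
	}
	return taint{Key: key, Value: value, Effect: effect}, nil
}

func main() {
	for _, s := range []string{"nvidia.com/gpu=present:NoSchedule", "dedicated:NoExecute"} {
		t, err := parseTaint(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%+v\n", t)
	}
}
```

Components that cannot parse the string form would instead receive the same three parts as a structured object through TaintPaths.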
type ValidationConfig ¶
type ValidationConfig struct {
// Readiness defines readiness validation phase settings.
Readiness *ValidationPhase `json:"readiness,omitempty" yaml:"readiness,omitempty"`
// Deployment defines deployment validation phase settings.
Deployment *ValidationPhase `json:"deployment,omitempty" yaml:"deployment,omitempty"`
// Performance defines performance validation phase settings.
Performance *ValidationPhase `json:"performance,omitempty" yaml:"performance,omitempty"`
// Conformance defines conformance validation phase settings.
Conformance *ValidationPhase `json:"conformance,omitempty" yaml:"conformance,omitempty"`
}
ValidationConfig defines validation phases and settings.
type ValidationPhase ¶
type ValidationPhase struct {
// Timeout is the maximum duration for this phase (e.g., "10m").
Timeout string `json:"timeout,omitempty" yaml:"timeout,omitempty"`
// Constraints are phase-level constraints to evaluate.
Constraints []Constraint `json:"constraints,omitempty" yaml:"constraints,omitempty"`
// Checks are named validation checks to run in this phase.
Checks []string `json:"checks,omitempty" yaml:"checks,omitempty"`
// NodeSelection defines which nodes to include in validation.
NodeSelection *NodeSelection `json:"nodeSelection,omitempty" yaml:"nodeSelection,omitempty"`
// Infrastructure references a componentRef that provides validation infrastructure.
// Example: "nccl-doctor" for performance testing.
Infrastructure string `json:"infrastructure,omitempty" yaml:"infrastructure,omitempty"`
}
ValidationPhase represents a single validation phase configuration.
type WorkloadSchedulingPaths ¶
type WorkloadSchedulingPaths struct {
// WorkloadSelectorPaths are paths where workload selectors are injected.
WorkloadSelectorPaths []string `yaml:"workloadSelectorPaths,omitempty"`
}
WorkloadSchedulingPaths holds the Helm value paths for workload scheduling.