Documentation ¶
Index ¶
- Constants
- Variables
- func ExtractVersionFromVMSize(vmsize *skewer.VMSizeType) string
- func GetAKSGPUImageSHA(size string) string
- func GetGPUDriverType(size string) string
- func GetGPUDriverVersion(size string) string
- func GetMaxPods(nodeClass *v1beta1.AKSNodeClass, networkPlugin, networkPluginMode string) int32
- func GetSubnetResourceID(subscriptionID, resourceGroupName, virtualNetworkName, subnetName string) string
- func GetZone(vm *armcompute.VirtualMachine) (string, error)
- func HasChanged(existing, new any, options *hashstructure.HashOptions) bool
- func ImageReferenceToString(imageRef *armcompute.ImageReference) string
- func IsAKSManagedVNET(nodeResourceGroup string, subnetID string) (bool, error)
- func IsMarinerEnabledGPUSKU(vmSize string) bool
- func IsNvidiaEnabledSKU(vmSize string) bool
- func IsVMDeleting(vm armcompute.VirtualMachine) bool
- func MakeVMZone(zone string) []*string
- func MakeZone(location string, zoneID string) string
- func NewTerminatingResourceError(gr schema.GroupResource, name string) *errors.StatusError
- func PrettySlice[T any](s []T, maxItems int) string
- func ResourceIDToProviderID(ctx context.Context, id string) string
- func StringMap(list v1.ResourceList) map[string]string
- func UseGridDrivers(size string) bool
- func WithDefaultFloat64(key string, def float64) float64
- type NvidiaSKUConfig
- type VnetSubnetResource
Constants ¶
const (
	Nvidia470CudaDriverVersion = "470.82.01"
	// https://github.com/Azure/AgentBaker/blob/c0e684e5cecebcf61554cc7d2e2d2191972d35ed/parts/common/components.json#L797-L811
	NvidiaCudaDriverVersion = "550.144.03"
	AKSGPUCudaVersionSuffix = "20250328201547"
	NvidiaGridDriverVersion = "550.144.06"
	AKSGPUGridVersionSuffix = "20250512225043"
)
TODO: Get these from agentbaker
Variables ¶
var ConvergedGPUDriverSizes = map[string]bool{
	"standard_nv6ads_a10_v5":   true,
	"standard_nv12ads_a10_v5":  true,
	"standard_nv18ads_a10_v5":  true,
	"standard_nv36ads_a10_v5":  true,
	"standard_nv72ads_a10_v5":  true,
	"standard_nv36adms_a10_v5": true,
	"standard_nc8ads_a10_v4":   true,
	"standard_nc16ads_a10_v4":  true,
	"standard_nc32ads_a10_v4":  true,
}
ConvergedGPUDriverSizes lists the sizes that use a "converged" driver to support both CUDA and GRID workloads.
How do you figure this out? Ask HPC or find out by trial and error. Vanilla CUDA drivers will fail to install on these sizes with opaque errors. See https://github.com/Azure/azhpc-extensions/blob/daaefd78df6f27012caf30f3b54c3bd6dc437652/NvidiaGPU/resources.json
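As a rough illustration of how a size table like this can be consulted, the sketch below lowercases the SKU name before the lookup, since every key in the map is lowercase. The helper name `usesConvergedDriver` and the abbreviated map are hypothetical and not part of the package; whether the package normalizes case this way is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// convergedSizes is an abbreviated copy of the table above, for illustration only.
var convergedSizes = map[string]bool{
	"standard_nv6ads_a10_v5": true,
	"standard_nc8ads_a10_v4": true,
}

// usesConvergedDriver is a hypothetical helper. It lowercases the size so that
// "Standard_NV6ads_A10_v5" matches the lowercase map key (an assumption about
// how callers normalize SKU names).
func usesConvergedDriver(size string) bool {
	return convergedSizes[strings.ToLower(size)]
}

func main() {
	fmt.Println(usesConvergedDriver("Standard_NV6ads_A10_v5")) // true
	fmt.Println(usesConvergedDriver("Standard_D2s_v3"))        // false
}
```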
Functions ¶
func ExtractVersionFromVMSize ¶ added in v1.6.1
func ExtractVersionFromVMSize(vmsize *skewer.VMSizeType) string
ExtractVersionFromVMSize extracts and normalizes the version from a VMSizeType, dropping the "v" prefix and backfilling "1" when no version is present.
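The normalization described here can be sketched on a plain string. This is only an approximation: `normalizeVersion` is a hypothetical stand-in that does not take a `*skewer.VMSizeType`, and it assumes the raw version looks like "v5" or is empty when the size has no version suffix.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeVersion is a hypothetical helper operating on the raw version
// string rather than on a *skewer.VMSizeType.
func normalizeVersion(version string) string {
	if version == "" {
		// Backfill: a size without a version suffix is treated as version 1.
		return "1"
	}
	// Drop a leading "v" or "V" so that "v5" becomes "5".
	return strings.TrimPrefix(strings.ToLower(version), "v")
}

func main() {
	fmt.Println(normalizeVersion("v5")) // 5
	fmt.Println(normalizeVersion("V3")) // 3
	fmt.Println(normalizeVersion(""))   // 1
}
```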
func GetAKSGPUImageSHA ¶
func GetAKSGPUImageSHA(size string) string
func GetGPUDriverType ¶ added in v0.5.5
func GetGPUDriverType(size string) string
GetGPUDriverType returns the type of GPU driver for given VM SKU ("grid" or "cuda")
func GetGPUDriverVersion ¶
func GetGPUDriverVersion(size string) string
NV-series GPUs target graphics workloads, whereas the NC series targets compute. NV sizes typically use GRID rather than CUDA drivers and will fail to install CUDA drivers. NVv1 seems to run with CUDA, while NVv5 requires GRID. NVv3 is untested on AKS, NVv4 is AMD and therefore not applicable, and NVv2 no longer seems to exist (?).
func GetMaxPods ¶ added in v0.7.5
func GetMaxPods(nodeClass *v1beta1.AKSNodeClass, networkPlugin, networkPluginMode string) int32
GetMaxPods resolves the max pods value to set for a given node class. If the node class does not specify one, the default depends on the network plugin: 30 for "azure", 110 for "kubenet", and 250 for "none" or for network plugin mode overlay.
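The defaulting rules above can be sketched as a small switch. This is a simplified illustration, not the package's implementation: `resolveMaxPods` is hypothetical, its `specified` parameter stands in for the value on the node class, and both the order of the checks and the fallback for unrecognized plugins are assumptions.

```go
package main

import "fmt"

// resolveMaxPods is a hypothetical simplification of the rules described
// above. `specified` is nil when the node class does not set max pods.
func resolveMaxPods(specified *int32, networkPlugin, networkPluginMode string) int32 {
	if specified != nil {
		return *specified // an explicit value always wins
	}
	switch {
	case networkPlugin == "none":
		return 250
	case networkPlugin == "azure" && networkPluginMode == "overlay":
		return 250
	case networkPlugin == "azure":
		return 30
	default:
		// Assumption: "kubenet" and anything unrecognized fall back to 110.
		return 110
	}
}

func main() {
	fmt.Println(resolveMaxPods(nil, "azure", ""))        // 30
	fmt.Println(resolveMaxPods(nil, "azure", "overlay")) // 250
	fmt.Println(resolveMaxPods(nil, "kubenet", ""))      // 110
}
```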
func GetSubnetResourceID ¶ added in v0.4.0
func GetSubnetResourceID(subscriptionID, resourceGroupName, virtualNetworkName, subnetName string) string
GetSubnetResourceID constructs the subnet resource id
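For orientation, a subnet resource ID in the layout documented under GetVnetSubnetIDComponents can be assembled with a single format string. The function name `buildSubnetResourceID` is hypothetical, and this sketch performs no validation or escaping of its inputs.

```go
package main

import "fmt"

// buildSubnetResourceID is a hypothetical sketch that joins the four
// components into the documented subnet resource ID layout.
func buildSubnetResourceID(subscriptionID, resourceGroupName, virtualNetworkName, subnetName string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s/subnets/%s",
		subscriptionID, resourceGroupName, virtualNetworkName, subnetName,
	)
}

func main() {
	// Placeholder values, not real Azure resources.
	fmt.Println(buildSubnetResourceID("sub-id", "my-rg", "my-vnet", "my-subnet"))
}
```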
func GetZone ¶ added in v0.6.1
func GetZone(vm *armcompute.VirtualMachine) (string, error)
GetZone returns the zone for the given virtual machine, or an empty string if there is no zone specified
func HasChanged ¶ added in v1.5.0
func HasChanged(existing, new any, options *hashstructure.HashOptions) bool
HasChanged reports whether the given value has changed, comparing the existing and new instances.
It is an alternative to using a ChangeMonitor when both the existing and the new data are available.
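The idea of comparing two instances by hash can be illustrated with the standard library alone. Note the substitution: the real function takes `*hashstructure.HashOptions`, whereas this sketch hashes the JSON encoding with SHA-256. The names `hashOf` and `changed` are hypothetical, and treating an encoding failure as a change is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// hashOf hashes the JSON encoding of v. encoding/json sorts map keys, so the
// encoding (and therefore the hash) is stable for equal values.
func hashOf(v any) ([32]byte, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return [32]byte{}, err
	}
	return sha256.Sum256(b), nil
}

// changed is a hypothetical stand-in for a hash-based comparison. If either
// value cannot be hashed, it conservatively reports a change.
func changed(existing, updated any) bool {
	a, errA := hashOf(existing)
	b, errB := hashOf(updated)
	if errA != nil || errB != nil {
		return true
	}
	return a != b
}

func main() {
	type config struct {
		Size string
		Tags map[string]string
	}
	old := config{Size: "small", Tags: map[string]string{"env": "dev"}}
	same := config{Size: "small", Tags: map[string]string{"env": "dev"}}
	bigger := config{Size: "large", Tags: map[string]string{"env": "dev"}}

	fmt.Println(changed(old, same))   // false
	fmt.Println(changed(old, bigger)) // true
}
```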
func ImageReferenceToString ¶ added in v0.7.0
func ImageReferenceToString(imageRef *armcompute.ImageReference) string
func IsAKSManagedVNET ¶ added in v1.5.0
func IsAKSManagedVNET(nodeResourceGroup string, subnetID string) (bool, error)
IsAKSManagedVNET determines whether the VNET is AKS-managed. Note: You can "trick" this function if you really try, for example by creating a VNET that looks like an AKS-managed VNET, with the same resource group as the MC RG, in a different subscription, or by creating your own VNET in the MC RG whose name matches the AKS pattern even though the VNET is actually yours rather than ours.
func IsMarinerEnabledGPUSKU ¶
func IsMarinerEnabledGPUSKU(vmSize string) bool
IsMarinerEnabledGPUSKU determines whether a VM SKU has NVIDIA driver support on Mariner.
func IsNvidiaEnabledSKU ¶
func IsNvidiaEnabledSKU(vmSize string) bool
IsNvidiaEnabledSKU determines whether a VM SKU has NVIDIA driver support.
func IsVMDeleting ¶ added in v0.7.0
func IsVMDeleting(vm armcompute.VirtualMachine) bool
func MakeVMZone ¶ added in v0.6.1
func MakeVMZone(zone string) []*string
MakeVMZone returns the zone in the form the VM Zones field expects: just the zone number, without the region.
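A minimal sketch of stripping the region from a zone label follows. It assumes zone labels have the form "<region>-<zone number>" (for example "eastus-2"), which is not stated on this page; `vmZones` is a hypothetical name, and returning an empty slice for an empty zone is likewise an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// vmZones is a hypothetical helper. It assumes the zone label looks like
// "<region>-<number>" and keeps only the part after the last "-".
func vmZones(zone string) []*string {
	if zone == "" {
		return []*string{} // no zone specified
	}
	num := zone
	if i := strings.LastIndex(zone, "-"); i >= 0 {
		num = zone[i+1:]
	}
	return []*string{&num}
}

func main() {
	zones := vmZones("eastus-2")
	fmt.Println(len(zones), *zones[0]) // 1 2
}
```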
func NewTerminatingResourceError ¶ added in v1.6.2
func NewTerminatingResourceError(gr schema.GroupResource, name string) *errors.StatusError
NewTerminatingResourceError returns a NotFound error indicating that the resource is terminating. This is useful for resources where termination should be treated as not found.
func PrettySlice ¶ added in v0.7.0
func PrettySlice[T any](s []T, maxItems int) string
PrettySlice formats a slice as a string, truncating it after maxItems items so that the output isn't too long.
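One way to truncate a slice for display is shown below as a generic function. The name `prettySlice` is hypothetical, and the exact output format (comma-separated items followed by " and N other(s)") is an assumption rather than the package's documented format.

```go
package main

import (
	"fmt"
	"strings"
)

// prettySlice is a hypothetical sketch: it prints at most maxItems elements
// and summarizes the remainder as a count.
func prettySlice[T any](s []T, maxItems int) string {
	var sb strings.Builder
	for i, elem := range s {
		if i >= maxItems {
			// Stop printing elements and report how many were left out.
			fmt.Fprintf(&sb, " and %d other(s)", len(s)-i)
			break
		}
		if i > 0 {
			sb.WriteString(", ")
		}
		fmt.Fprint(&sb, elem)
	}
	return sb.String()
}

func main() {
	fmt.Println(prettySlice([]string{"a", "b", "c", "d"}, 2)) // a, b and 2 other(s)
	fmt.Println(prettySlice([]int{1, 2}, 5))                  // 1, 2
}
```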
func StringMap ¶ added in v0.7.0
func StringMap(list v1.ResourceList) map[string]string
StringMap returns the string map representation of the resource list
func UseGridDrivers ¶ added in v0.5.5
func UseGridDrivers(size string) bool
func WithDefaultFloat64 ¶ added in v0.7.0
func WithDefaultFloat64(key string, def float64) float64
WithDefaultFloat64 returns the float64 value of the supplied environment variable or, if it is not present, the supplied default value. If the float64 conversion fails, it returns the default.
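The behavior described above maps naturally onto `os.LookupEnv` and `strconv.ParseFloat`. The sketch below is illustrative only: `floatOrDefault`, `floatFromEnv`, and the `lookupFunc` type are hypothetical names, and the lookup is passed in as a function purely so the parsing logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// lookupFunc has the same shape as os.LookupEnv.
type lookupFunc func(key string) (string, bool)

// floatOrDefault returns the parsed value for key, or def when the key is
// missing or its value is not a valid float64.
func floatOrDefault(lookup lookupFunc, key string, def float64) float64 {
	val, ok := lookup(key)
	if !ok {
		return def
	}
	f, err := strconv.ParseFloat(val, 64)
	if err != nil {
		return def
	}
	return f
}

// floatFromEnv is the environment-backed variant (hypothetical name).
func floatFromEnv(key string, def float64) float64 {
	return floatOrDefault(os.LookupEnv, key, def)
}

func main() {
	// EXAMPLE_RATIO is a made-up variable name used only for this demo.
	os.Setenv("EXAMPLE_RATIO", "0.25")
	fmt.Println(floatFromEnv("EXAMPLE_RATIO", 1.0))   // 0.25
	fmt.Println(floatFromEnv("EXAMPLE_MISSING", 1.0)) // 1
}
```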
Types ¶
type NvidiaSKUConfig ¶ added in v0.5.5
type VnetSubnetResource ¶ added in v1.6.2
type VnetSubnetResource struct {
	SubscriptionID    string
	ResourceGroupName string
	VNetName          string
	SubnetName        string
}
This parsing function replaces three different functions in different packages that all had bugs. Please don't use a regex to parse these.
func GetVnetSubnetIDComponents ¶ added in v0.4.0
func GetVnetSubnetIDComponents(vnetSubnetID string) (VnetSubnetResource, error)
GetVnetSubnetIDComponents parses an Azure subnet resource ID into its component parts.
Input: A fully qualified Azure subnet resource ID in the format:
/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Network/virtualNetworks/{virtualNetworkName}/subnets/{subnetName}
The input is case-insensitive and must contain exactly 11 slash-separated segments.
Output: A VnetSubnetResource struct containing:
- SubscriptionID: The Azure subscription ID
- ResourceGroupName: The resource group name
- VNetName: The virtual network name
- SubnetName: The subnet name
Returns an error if the input format is invalid or doesn't match the expected structure.
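A regex-free parse along these lines might look like the sketch below. It is not the package's implementation: `parseSubnetID` and `subnetParts` are hypothetical names. Splitting the documented format on "/" yields 11 elements once the empty element before the leading slash is counted, and this sketch assumes that is what "exactly 11 slash-separated segments" refers to.

```go
package main

import (
	"fmt"
	"strings"
)

// subnetParts mirrors the four documented components (hypothetical type).
type subnetParts struct {
	SubscriptionID    string
	ResourceGroupName string
	VNetName          string
	SubnetName        string
}

// parseSubnetID splits the ID on "/" and checks the fixed keywords
// case-insensitively instead of using a regex.
func parseSubnetID(id string) (subnetParts, error) {
	parts := strings.Split(id, "/")
	// The leading "/" produces an empty first element, so a valid ID has 11 parts.
	if len(parts) != 11 || parts[0] != "" {
		return subnetParts{}, fmt.Errorf("invalid subnet ID %q: expected 11 slash-separated segments", id)
	}
	keywords := map[int]string{
		1: "subscriptions", 3: "resourceGroups", 5: "providers",
		6: "Microsoft.Network", 7: "virtualNetworks", 9: "subnets",
	}
	for i, want := range keywords {
		if !strings.EqualFold(parts[i], want) {
			return subnetParts{}, fmt.Errorf("invalid subnet ID %q: segment %d is %q, want %q", id, i, parts[i], want)
		}
	}
	return subnetParts{
		SubscriptionID:    parts[2],
		ResourceGroupName: parts[4],
		VNetName:          parts[8],
		SubnetName:        parts[10],
	}, nil
}

func main() {
	// Placeholder ID, not a real Azure resource.
	id := "/subscriptions/sub-id/resourceGroups/my-rg/providers/Microsoft.Network/virtualNetworks/my-vnet/subnets/my-subnet"
	parts, err := parseSubnetID(id)
	fmt.Println(parts.VNetName, parts.SubnetName, err)
}
```

Comparing the fixed keywords with `strings.EqualFold` keeps the check case-insensitive without lowercasing the user-supplied names themselves.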
func (VnetSubnetResource) IsSameVNET ¶ added in v1.6.2
func (v VnetSubnetResource) IsSameVNET(cmp VnetSubnetResource) bool