gpuallocator

package
v1.34.7
Published: Jun 25, 2025 License: Apache-2.0 Imports: 21 Imported by: 0

Documentation

Overview

Package gpuallocator handles GPU allocation.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func IsQuotaError added in v1.34.6

func IsQuotaError(err error) bool

IsQuotaError reports whether err is a quota-related error.
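The sketch below shows one common way a check like this can be written in Go, using errors.As so that wrapped errors are also recognized. It is not taken from the package source: quotaExceededError, isQuotaError, and allocFailed are hypothetical stand-ins, and plain int64 values replace resource.Quantity so the example needs only the standard library.

```go
package main

import (
	"errors"
	"fmt"
)

// quotaExceededError is a hypothetical, simplified stand-in for the package's
// QuotaExceededError. int64 replaces resource.Quantity for illustration.
type quotaExceededError struct {
	Namespace string
	Resource  string
	Requested int64
	Available int64
	Limit     int64
}

func (e *quotaExceededError) Error() string {
	return fmt.Sprintf("quota exceeded in namespace %s for %s: requested %d, available %d, limit %d",
		e.Namespace, e.Resource, e.Requested, e.Available, e.Limit)
}

// isQuotaError reports whether err, or any error it wraps, is a quota error.
// Assumption: the real IsQuotaError may detect quota errors differently.
func isQuotaError(err error) bool {
	var qe *quotaExceededError
	return errors.As(err, &qe)
}

// allocFailed is a hypothetical helper that wraps an error the way a caller
// higher up the stack might.
func allocFailed(err error) error {
	return fmt.Errorf("alloc failed: %w", err)
}

func main() {
	base := &quotaExceededError{Namespace: "team-a", Resource: "tflops", Requested: 50, Available: 20, Limit: 100}

	fmt.Println(isQuotaError(allocFailed(base)))
	fmt.Println(isQuotaError(errors.New("node not found")))
}
```

A caller can use such a check to tell "retry later, the namespace is over quota" apart from other allocation failures.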

func RefreshGPUNodeCapacity added in v1.33.4

func RefreshGPUNodeCapacity(ctx context.Context, k8sClient client.Client, node *tfv1.GPUNode, pool *tfv1.GPUPool) ([]string, error)

Types

type AllocRequest added in v1.34.0

type AllocRequest struct {
	// Name of the GPU pool to allocate from
	PoolName string
	// Name and namespace of the workload
	WorkloadNameNamespace tfv1.NameNamespace
	// Resource requirements for the allocation
	Request tfv1.Resource
	// Number of GPUs to allocate
	Count uint
	// Specific GPU model to allocate, empty string means any model
	GPUModel string
	// Node affinity requirements
	NodeAffinity *v1.NodeAffinity
}

AllocRequest encapsulates all parameters needed for a GPU allocation.

type CompactFirst

type CompactFirst struct{}

CompactFirst selects the GPUs with the least available resources (most utilized) in order to pack workloads tightly and maximize GPU utilization.

func (CompactFirst) SelectGPUs

func (c CompactFirst) SelectGPUs(gpus []tfv1.GPU, count uint) ([]*tfv1.GPU, error)

SelectGPUs selects count GPUs from the same node, preferring those with the least available resources (most packed).
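As a rough illustration of compact-first selection constrained to a single node, the following sketch sorts GPUs by ascending available capacity and returns the first node that can supply the requested count. The gpu type, its fields, and the tie-breaking rule are assumptions made for this example; the package's actual ordering across nodes may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// gpu is a hypothetical, simplified stand-in for tfv1.GPU.
type gpu struct {
	Name        string
	Node        string
	AvailTFlops float64
}

// selectCompactFirst returns count GPUs that all live on one node, preferring
// GPUs with the least available capacity. Assumption: nodes are tried in the
// order of their most packed GPU.
func selectCompactFirst(gpus []gpu, count int) ([]*gpu, error) {
	if count <= 0 {
		return nil, errors.New("count must be positive")
	}

	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]gpu, len(gpus))
	copy(sorted, gpus)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].AvailTFlops != sorted[j].AvailTFlops {
			return sorted[i].AvailTFlops < sorted[j].AvailTFlops
		}
		return sorted[i].Name < sorted[j].Name
	})

	// Group by node, remembering the order in which nodes first appear.
	byNode := map[string][]*gpu{}
	var nodeOrder []string
	for i := range sorted {
		g := &sorted[i]
		if _, seen := byNode[g.Node]; !seen {
			nodeOrder = append(nodeOrder, g.Node)
		}
		byNode[g.Node] = append(byNode[g.Node], g)
	}

	for _, node := range nodeOrder {
		if len(byNode[node]) >= count {
			return byNode[node][:count], nil
		}
	}
	return nil, fmt.Errorf("no node has %d available GPUs", count)
}

func main() {
	gpus := []gpu{
		{Name: "a0", Node: "node-a", AvailTFlops: 10},
		{Name: "a1", Node: "node-a", AvailTFlops: 80},
		{Name: "b0", Node: "node-b", AvailTFlops: 30},
	}
	picked, err := selectCompactFirst(gpus, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, g := range picked {
		fmt.Println(g.Node, g.Name, g.AvailTFlops)
	}
}
```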

type GpuAllocator

type GpuAllocator struct {
	client.Client
	// contains filtered or unexported fields
}

func NewGpuAllocator

func NewGpuAllocator(ctx context.Context, client client.Client, syncInterval time.Duration) *GpuAllocator

func (*GpuAllocator) Alloc

func (s *GpuAllocator) Alloc(ctx context.Context, req AllocRequest) ([]*tfv1.GPU, error)

Alloc allocates a request to one or more GPUs on the same node.

func (*GpuAllocator) Dealloc

func (s *GpuAllocator) Dealloc(ctx context.Context, workloadNameNamespace tfv1.NameNamespace, request tfv1.Resource, gpus []types.NamespacedName)

Dealloc releases the resources that a request holds on the given GPUs, making them available again.

func (*GpuAllocator) SetupWithManager

func (s *GpuAllocator) SetupWithManager(ctx context.Context, mgr manager.Manager) (<-chan struct{}, error)

SetupWithManager sets up the GpuAllocator with the Manager.

func (*GpuAllocator) Stop

func (s *GpuAllocator) Stop()

Stop stops all background goroutines.
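NewGpuAllocator takes a syncInterval and Stop ends background goroutines, which suggests a periodic sync loop. The sketch below shows a common Go pattern for such a loop (a ticker plus a stop channel, with sync.Once making Stop safe to call twice). The syncer type and startSyncer function are hypothetical and are not the package's implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// syncer is a hypothetical stand-in that runs a callback on a fixed interval
// until Stop is called.
type syncer struct {
	stop chan struct{}
	done chan struct{}
	once sync.Once
}

// startSyncer launches the background loop. syncFn stands in for whatever
// work a real allocator would do on each tick.
func startSyncer(interval time.Duration, syncFn func()) *syncer {
	s := &syncer{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(s.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				syncFn()
			case <-s.stop:
				return
			}
		}
	}()
	return s
}

// Stop signals the loop to exit and waits until it has returned.
// It is safe to call more than once.
func (s *syncer) Stop() {
	s.once.Do(func() { close(s.stop) })
	<-s.done
}

func main() {
	s := startSyncer(10*time.Millisecond, func() { fmt.Println("sync") })
	time.Sleep(35 * time.Millisecond)
	s.Stop()
	fmt.Println("stopped")
}
```

Waiting on the done channel inside Stop means that no sync callback is still running once Stop returns.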

type LowLoadFirst

type LowLoadFirst struct{}

LowLoadFirst selects the GPUs with the most available resources (least utilized) in order to distribute workloads more evenly across GPUs.

func (LowLoadFirst) SelectGPUs

func (l LowLoadFirst) SelectGPUs(gpus []tfv1.GPU, count uint) ([]*tfv1.GPU, error)

SelectGPUs selects count GPUs from the same node, preferring those with the most available resources (least loaded).

type QuotaExceededError added in v1.34.6

type QuotaExceededError struct {
	Namespace string
	Resource  string
	Requested resource.Quantity
	Available resource.Quantity
	Limit     resource.Quantity
}

QuotaExceededError represents a quota exceeded error with detailed information.

func (*QuotaExceededError) Error added in v1.34.6

func (e *QuotaExceededError) Error() string

type QuotaStore added in v1.34.6

type QuotaStore struct {
	client.Client
	// contains filtered or unexported fields
}

QuotaStore manages GPU resource quotas in memory for atomic operations.

func NewQuotaStore added in v1.34.6

func NewQuotaStore(client client.Client) *QuotaStore

NewQuotaStore creates a new quota store.

func (*QuotaStore) AllocateQuota added in v1.34.6

func (qs *QuotaStore) AllocateQuota(namespace string, req AllocRequest)

AllocateQuota atomically allocates quota resources. This function is called under the GPU allocator's storeMutex.

func (*QuotaStore) DeallocateQuota added in v1.34.6

func (qs *QuotaStore) DeallocateQuota(namespace string, request tfv1.Resource, replicas int32)

DeallocateQuota atomically deallocates quota resources. This function is called under the GPU allocator's storeMutex.

func (*QuotaStore) GetQuotaStatus added in v1.34.6

func (qs *QuotaStore) GetQuotaStatus(namespace string) (*tfv1.GPUResourceUsage, *tfv1.GPUResourceUsage, bool)

GetQuotaStatus returns the current quota status for a namespace.
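To make the allocate, deallocate, and status operations concrete, here is a minimal in-memory quota store. Everything in it is an assumption made for illustration: the usage type stands in for tfv1.GPUResourceUsage, the store holds its own mutex (the documented functions instead rely on the allocator's storeMutex), and the meaning of the two values returned by status (used and remaining) is a guess, since the documentation does not say what GetQuotaStatus returns.

```go
package main

import (
	"fmt"
	"sync"
)

// usage is a hypothetical stand-in for tfv1.GPUResourceUsage.
type usage struct {
	TFlops  int64
	VRAMMiB int64
}

// quotaStore tracks per-namespace totals and current usage in memory.
type quotaStore struct {
	mu    sync.Mutex // assumption: the real store is guarded by the allocator's mutex
	total map[string]usage
	used  map[string]usage
}

func newQuotaStore() *quotaStore {
	return &quotaStore{total: map[string]usage{}, used: map[string]usage{}}
}

// setQuota registers the quota limit for a namespace.
func (qs *quotaStore) setQuota(ns string, total usage) {
	qs.mu.Lock()
	defer qs.mu.Unlock()
	qs.total[ns] = total
}

// allocate adds perGPU*count to the namespace's usage.
// Namespaces without a registered quota are not tracked.
func (qs *quotaStore) allocate(ns string, perGPU usage, count int64) {
	qs.mu.Lock()
	defer qs.mu.Unlock()
	if _, ok := qs.total[ns]; !ok {
		return
	}
	u := qs.used[ns]
	u.TFlops += perGPU.TFlops * count
	u.VRAMMiB += perGPU.VRAMMiB * count
	qs.used[ns] = u
}

// deallocate subtracts perGPU*count from usage, never going below zero.
func (qs *quotaStore) deallocate(ns string, perGPU usage, count int64) {
	qs.mu.Lock()
	defer qs.mu.Unlock()
	u := qs.used[ns]
	u.TFlops = clampNonNegative(u.TFlops - perGPU.TFlops*count)
	u.VRAMMiB = clampNonNegative(u.VRAMMiB - perGPU.VRAMMiB*count)
	qs.used[ns] = u
}

// status returns used and remaining quota; the bool is false when the
// namespace has no quota registered.
func (qs *quotaStore) status(ns string) (usage, usage, bool) {
	qs.mu.Lock()
	defer qs.mu.Unlock()
	total, ok := qs.total[ns]
	if !ok {
		return usage{}, usage{}, false
	}
	used := qs.used[ns]
	remaining := usage{TFlops: total.TFlops - used.TFlops, VRAMMiB: total.VRAMMiB - used.VRAMMiB}
	return used, remaining, true
}

func clampNonNegative(v int64) int64 {
	if v < 0 {
		return 0
	}
	return v
}

func main() {
	qs := newQuotaStore()
	qs.setQuota("team-a", usage{TFlops: 100, VRAMMiB: 1000})
	qs.allocate("team-a", usage{TFlops: 10, VRAMMiB: 100}, 2)
	used, remaining, ok := qs.status("team-a")
	fmt.Println(used, remaining, ok)
}
```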

type QuotaStoreEntry added in v1.34.6

type QuotaStoreEntry struct {
	// contains filtered or unexported fields
}

QuotaStoreEntry represents quota information in memory.

type Strategy

type Strategy interface {
	SelectGPUs(gpus []tfv1.GPU, count uint) ([]*tfv1.GPU, error)
}

func NewStrategy

func NewStrategy(placementMode tfv1.PlacementMode) Strategy

NewStrategy creates a strategy based on the placement mode.
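The following sketch illustrates how a factory can map a placement mode to one of two interchangeable selection strategies. The placementMode constants, the gpu type, the shared pick helper, and the choice of compact-first as the fallback are all hypothetical; the real tfv1.PlacementMode values and default are not stated here. For brevity the sketch ignores the same-node constraint.

```go
package main

import (
	"fmt"
	"sort"
)

// placementMode and its values are hypothetical stand-ins for tfv1.PlacementMode.
type placementMode string

const (
	placementCompactFirst placementMode = "CompactFirst"
	placementLowLoadFirst placementMode = "LowLoadFirst"
)

// gpu is a hypothetical, simplified stand-in for tfv1.GPU.
type gpu struct {
	Name      string
	Available int64
}

// strategy mirrors the shape of the Strategy interface with local types.
type strategy interface {
	selectGPUs(gpus []gpu, count uint) ([]*gpu, error)
}

type compactFirst struct{}
type lowLoadFirst struct{}

// pick sorts a copy of gpus with less and returns the first count entries.
func pick(gpus []gpu, count uint, less func(a, b gpu) bool) ([]*gpu, error) {
	if uint(len(gpus)) < count {
		return nil, fmt.Errorf("need %d GPUs, only %d available", count, len(gpus))
	}
	sorted := append([]gpu(nil), gpus...)
	sort.SliceStable(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })
	out := make([]*gpu, 0, count)
	for i := range sorted[:count] {
		out = append(out, &sorted[i])
	}
	return out, nil
}

// compactFirst prefers the GPUs with the least available capacity.
func (compactFirst) selectGPUs(gpus []gpu, count uint) ([]*gpu, error) {
	return pick(gpus, count, func(a, b gpu) bool { return a.Available < b.Available })
}

// lowLoadFirst prefers the GPUs with the most available capacity.
func (lowLoadFirst) selectGPUs(gpus []gpu, count uint) ([]*gpu, error) {
	return pick(gpus, count, func(a, b gpu) bool { return a.Available > b.Available })
}

// newStrategy maps a mode to a strategy. Falling back to compactFirst for
// unknown modes is an arbitrary choice made for this sketch.
func newStrategy(mode placementMode) strategy {
	switch mode {
	case placementLowLoadFirst:
		return lowLoadFirst{}
	default:
		return compactFirst{}
	}
}

func main() {
	gpus := []gpu{{"g0", 10}, {"g1", 70}, {"g2", 40}}
	for _, mode := range []placementMode{placementCompactFirst, placementLowLoadFirst} {
		picked, err := newStrategy(mode).selectGPUs(gpus, 1)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(mode, "->", picked[0].Name)
	}
}
```

Keeping the selection policy behind an interface lets the allocator switch packing behavior per pool without changing the allocation path.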
