Documentation ¶
Overview ¶
Package v1alpha1 contains API Schema definitions for the v1alpha1 API group.
+kubebuilder:object:generate=true
+groupName=llmaz.io
Index ¶
- Constants
- Variables
- func Resource(resource string) schema.GroupResource
- type Flavor
- type FlavorName
- type InferenceConfig
- type ModelClaim
- type ModelClaims
- type ModelHub
- type ModelName
- type ModelRef
- type ModelRole
- type ModelSource
- type ModelSpec
- type ModelStatus
- type OpenModel
- type OpenModelList
- type URIProtocol
Constants ¶
const (
	ModelFamilyNameLabelKey = "llmaz.io/model-family-name"
	ModelNameLabelKey = "llmaz.io/model-name"
	// Annotation with the value "true" indicates that we'll preload the model,
	// by default via Manta (https://github.com/InftyAI/Manta); make sure
	// Manta is installed beforehand.
	// Note: right now we only support preloading models from Huggingface;
	// more hubs and object stores will be supported in the future.
	//
	// We set this as an annotation rather than a field because preheating
	// models is not yet common practice and Manta is not a mature solution.
	// Once either qualifies, we'll expose this as a field in Model.
	ModelPreheatAnnoKey = "llmaz.io/model-preheat"

	HUGGING_FACE = "Huggingface"
	MODEL_SCOPE = "ModelScope"
)
const (
	// ModelPending means the model is waiting to be downloaded.
	ModelPending = "Pending"
	// ModelReady means the model has been downloaded.
	ModelReady = "Ready"
)
Variables ¶
var (
	// GroupVersion is the group version used to register these objects.
	GroupVersion = schema.GroupVersion{Group: "llmaz.io", Version: "v1alpha1"}
	// SchemeGroupVersion is an alias to GroupVersion for client-go libraries.
	// It is required by pkg/client/informers/externalversions/...
	SchemeGroupVersion = GroupVersion
	// SchemeBuilder is used to add Go types to the GroupVersionKind scheme.
	SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}
	// AddToScheme adds the types in this group-version to the given scheme.
	AddToScheme = SchemeBuilder.AddToScheme
)
Functions ¶
func Resource ¶
func Resource(resource string) schema.GroupResource
Resource is required by pkg/client/listers/...
Types ¶
type Flavor ¶
type Flavor struct {
// Name represents the flavor name, which will be used in model claim.
Name FlavorName `json:"name"`
// Limits defines the accelerators required to serve the model for each replica,
// like <nvidia.com/gpu: 8>. For multi-host cases, the limits here indicate
// the resource requirements for each replica, which usually equal the TP size.
// Setting the CPU and memory usage here is not recommended:
// - if using a playground, define the cpu/mem usage at backendConfig.
// - if using an inference service, define the cpu/mem at the container resources.
// However, if you define the same accelerator resources at the playground/service
// as well, those resources will be overwritten by the flavor limits here.
// +optional
Limits v1.ResourceList `json:"limits,omitempty"`
// NodeSelector represents the node candidates for Pod placement; if a node doesn't
// meet the nodeSelector, it will be filtered out by the resourceFungibility scheduler plugin.
// If nodeSelector is empty, every node is a candidate.
// +optional
NodeSelector map[string]string `json:"nodeSelector,omitempty"`
// Params stores other useful parameters that will be consumed by cluster-autoscaler / Karpenter
// for autoscaling, or be defined as model parallelism parameters like TP or PP size.
// E.g. with autoscaling, when scaling up nodes with 8x Nvidia A100, the parameter can be injected
// with <INSTANCE-TYPE: p4d.24xlarge> for AWS.
// Preset parameters: TP, PP, INSTANCE-TYPE.
// +optional
Params map[string]string `json:"params,omitempty"`
}
Flavor defines the accelerator requirements for a model and the necessary parameters in autoscaling. Right now, it is used in two places:
- Pod scheduling with node selectors specified.
- Cluster autoscaling with essential parameters provided.
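A minimal sketch of what a flavor list might look like inside an OpenModel manifest; the flavor names, instance type, and resource amounts below are illustrative, not defaults:

```yaml
# Two fungible flavors, tried in slice order: prefer A100 nodes, fall back to L4.
flavors:
  - name: a100                      # hypothetical flavor name
    limits:
      nvidia.com/gpu: 8             # accelerators per replica, usually the TP size
    nodeSelector:
      node.kubernetes.io/instance-type: p4d.24xlarge
    params:
      INSTANCE-TYPE: p4d.24xlarge   # consumed by cluster-autoscaler / Karpenter
      TP: "8"                       # preset parallelism parameter
  - name: l4                        # fallback flavor
    limits:
      nvidia.com/gpu: 8
```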
func (*Flavor) DeepCopy ¶
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new Flavor.
func (*Flavor) DeepCopyInto ¶
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type FlavorName ¶
type FlavorName string
type InferenceConfig ¶ added in v0.1.0
type InferenceConfig struct {
// Flavors represents the accelerator requirements to serve the model.
// Flavors are fungible following the priority represented by the slice order.
// +kubebuilder:validation:MaxItems=8
// +optional
Flavors []Flavor `json:"flavors,omitempty"`
}
InferenceConfig represents the inference configurations for the model.
func (*InferenceConfig) DeepCopy ¶ added in v0.1.0
func (in *InferenceConfig) DeepCopy() *InferenceConfig
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new InferenceConfig.
func (*InferenceConfig) DeepCopyInto ¶ added in v0.1.0
func (in *InferenceConfig) DeepCopyInto(out *InferenceConfig)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelClaim ¶
type ModelClaim struct {
// ModelName represents the name of the Model.
ModelName ModelName `json:"modelName,omitempty"`
// InferenceFlavors represents a list of flavors with fungibility support
// to serve the model.
// If set, the flavor names should be a subset of the flavors configured on the model.
// If not set, the model's configured flavors are used by default.
// +optional
InferenceFlavors []FlavorName `json:"inferenceFlavors,omitempty"`
}
ModelClaim represents claiming for one model.
func (*ModelClaim) DeepCopy ¶
func (in *ModelClaim) DeepCopy() *ModelClaim
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelClaim.
func (*ModelClaim) DeepCopyInto ¶
func (in *ModelClaim) DeepCopyInto(out *ModelClaim)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelClaims ¶ added in v0.0.6
type ModelClaims struct {
// Models represents a list of models with roles specified; there may be
// multiple models here to support state-of-the-art technologies like
// speculative decoding, where one model is the main (target) model and
// another is the draft model.
// +kubebuilder:validation:MinItems=1
Models []ModelRef `json:"models,omitempty"`
// InferenceFlavors represents a list of flavor names with fungibility support
// to serve the model.
// - If not set, the flavors configured on the models are used by default.
// - If set, the flavor names are looked up following the order of the models.
// +optional
InferenceFlavors []FlavorName `json:"inferenceFlavors,omitempty"`
}
ModelClaims represents multiple claims for different models.
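A sketch of a claims block for speculative decoding, assuming two previously created Models; the model and flavor names are hypothetical:

```yaml
modelClaims:
  models:
    - name: llama3-70b   # main (target) model
      role: main
    - name: llama3-8b    # draft model used for speculative decoding
      role: draft
  inferenceFlavors:      # optional; must be a subset of the models' flavors
    - a100
```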
func (*ModelClaims) DeepCopy ¶ added in v0.0.6
func (in *ModelClaims) DeepCopy() *ModelClaims
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelClaims.
func (*ModelClaims) DeepCopyInto ¶ added in v0.0.6
func (in *ModelClaims) DeepCopyInto(out *ModelClaims)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelHub ¶
type ModelHub struct {
// Name refers to the model registry, such as huggingface.
// +kubebuilder:default=Huggingface
// +kubebuilder:validation:Enum={Huggingface,ModelScope}
// +optional
Name *string `json:"name,omitempty"`
// ModelID refers to the model identifier on model hub,
// such as meta-llama/Meta-Llama-3-8B.
ModelID string `json:"modelID,omitempty"`
// Filename refers to a specific model file rather than the whole repo.
// This is helpful for downloading a specific GGUF model rather than the
// whole repo, which may include all kinds of quantized variants.
// TODO: this is only supported with Huggingface; add support for ModelScope
// in the near future.
// Note: once filename is set, allowPatterns and ignorePatterns should be left unset.
Filename *string `json:"filename,omitempty"`
// Revision refers to a Git revision id which can be a branch name, a tag, or a commit hash.
// +kubebuilder:default=main
// +optional
Revision *string `json:"revision,omitempty"`
// AllowPatterns means only files matching at least one of the patterns will be downloaded.
// +optional
AllowPatterns []string `json:"allowPatterns,omitempty"`
// IgnorePatterns means files matching any of the patterns will not be downloaded.
// +optional
IgnorePatterns []string `json:"ignorePatterns,omitempty"`
}
ModelHub represents the model registry for model downloads.
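A sketch of a modelHub source pulling a repo from Huggingface; the ignore pattern below is illustrative (e.g. skipping PyTorch checkpoints when safetensors are available):

```yaml
source:
  modelHub:
    name: Huggingface                      # or ModelScope
    modelID: meta-llama/Meta-Llama-3-8B
    revision: main                         # branch name, tag, or commit hash
    ignorePatterns:
      - "*.pt"                             # hypothetical: skip .pt checkpoints
```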
func (*ModelHub) DeepCopy ¶
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelHub.
func (*ModelHub) DeepCopyInto ¶
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelRef ¶ added in v0.1.0
type ModelRef struct {
// Name represents the model name.
Name ModelName `json:"name"`
// Role represents the model's role when more than one model is required.
// For example, the draft role means running with speculative decoding,
// and the backend's default arguments will be looked up in the backendRuntime
// named speculative-decoding.
// +kubebuilder:validation:Enum={main,draft}
// +kubebuilder:default=main
// +optional
Role *ModelRole `json:"role,omitempty"`
}
ModelRef refers to a created Model with its role.
func (*ModelRef) DeepCopy ¶ added in v0.1.0
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelRef.
func (*ModelRef) DeepCopyInto ¶ added in v0.1.0
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelRole ¶ added in v0.0.6
type ModelRole string
const (
	// MainRole represents the main model. If only one model is required,
	// it must be the main model. Only one main model is allowed.
	MainRole ModelRole = "main"
	// DraftRole represents the draft model in speculative decoding;
	// the main model is then the target model.
	DraftRole ModelRole = "draft"
	// LoraRole represents the LoRA model.
	LoraRole ModelRole = "lora"
)
type ModelSource ¶
type ModelSource struct {
// ModelHub represents the model registry for model downloads.
// +optional
ModelHub *ModelHub `json:"modelHub,omitempty"`
// URI represents various kinds of model sources following the URI protocol, protocol://<address>, e.g.
// - oss://<bucket>.<endpoint>/<path-to-your-model>
// - ollama://llama3.3
// - host://<path-to-your-model>
//
// +optional
URI *URIProtocol `json:"uri,omitempty"`
}
ModelSource represents the source of the model. Only one model source will be used.
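The URI variant is mutually exclusive with modelHub; a couple of illustrative sketches (bucket, endpoint, and paths are placeholders):

```yaml
# Pull a model by tag through the ollama protocol.
source:
  uri: ollama://llama3.3
---
# Load a model from object storage (placeholders for bucket/endpoint/path).
source:
  uri: oss://my-bucket.my-endpoint/models/llama3-8b
```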
func (*ModelSource) DeepCopy ¶
func (in *ModelSource) DeepCopy() *ModelSource
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelSource.
func (*ModelSource) DeepCopyInto ¶
func (in *ModelSource) DeepCopyInto(out *ModelSource)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelSpec ¶
type ModelSpec struct {
// FamilyName represents the model type, like llama2, which will be auto-injected
// into the labels with the key `llmaz.io/model-family-name`.
FamilyName ModelName `json:"familyName"`
// Source represents the source of the model; there are several ways to load
// the model, such as from Huggingface, an OCI registry, S3, or a host path.
Source ModelSource `json:"source"`
// InferenceConfig represents the inference configurations for the model.
InferenceConfig *InferenceConfig `json:"inferenceConfig,omitempty"`
}
ModelSpec defines the desired state of Model
func (*ModelSpec) DeepCopy ¶
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelSpec.
func (*ModelSpec) DeepCopyInto ¶
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelStatus ¶
type ModelStatus struct {
// Conditions represents the Model's conditions.
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
ModelStatus defines the observed state of Model
func (*ModelStatus) DeepCopy ¶
func (in *ModelStatus) DeepCopy() *ModelStatus
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelStatus.
func (*ModelStatus) DeepCopyInto ¶
func (in *ModelStatus) DeepCopyInto(out *ModelStatus)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type OpenModel ¶
type OpenModel struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec ModelSpec `json:"spec,omitempty"`
Status ModelStatus `json:"status,omitempty"`
}
OpenModel is the Schema for the open models API
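Putting the pieces together, a sketch of a complete OpenModel manifest; the metadata name, family name, and flavor details are hypothetical:

```yaml
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: llama3-8b
spec:
  familyName: llama3
  source:
    modelHub:
      modelID: meta-llama/Meta-Llama-3-8B
  inferenceConfig:
    flavors:
      - name: default          # hypothetical flavor name
        limits:
          nvidia.com/gpu: 1    # one accelerator per replica
```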
func (*OpenModel) DeepCopy ¶
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new OpenModel.
func (*OpenModel) DeepCopyInto ¶
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (*OpenModel) DeepCopyObject ¶
DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
type OpenModelList ¶
type OpenModelList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []OpenModel `json:"items"`
}
OpenModelList contains a list of OpenModel
func (*OpenModelList) DeepCopy ¶
func (in *OpenModelList) DeepCopy() *OpenModelList
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new OpenModelList.
func (*OpenModelList) DeepCopyInto ¶
func (in *OpenModelList) DeepCopyInto(out *OpenModelList)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (*OpenModelList) DeepCopyObject ¶
func (in *OpenModelList) DeepCopyObject() runtime.Object
DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.