Documentation ¶
Overview ¶
Package v1alpha1 contains API Schema definitions for the v1alpha1 API group.
+kubebuilder:object:generate=true
+groupName=llmaz.io
Index ¶
- Constants
- Variables
- func Resource(resource string) schema.GroupResource
- type Flavor
- type FlavorName
- type InferenceConfig
- type ModelClaim
- type ModelClaims
- type ModelHub
- type ModelName
- type ModelRef
- type ModelRole
- type ModelSource
- type ModelSpec
- type ModelStatus
- type OpenModel
- type OpenModelList
- type URIProtocol
Constants ¶
const (
	ModelFamilyNameLabelKey = "llmaz.io/model-family-name"
	ModelNameLabelKey = "llmaz.io/model-name"
	// Annotation with the value "true" indicates that we'll preload the model,
	// by default via Manta (https://github.com/InftyAI/Manta); make sure
	// Manta is installed beforehand.
	// Note: right now we only support preloading models from Huggingface;
	// more hubs and object stores will be supported in the future.
	//
	// We set this as an annotation rather than a field because preheating
	// models is not yet common practice and Manta is not a mature solution.
	// Once either qualifies, we'll expose this as a field in Model.
	ModelPreheatAnnoKey = "llmaz.io/model-preheat"

	HUGGING_FACE = "Huggingface"
	MODEL_SCOPE = "ModelScope"
)
const (
	// ModelPending means the model is waiting to be downloaded.
	ModelPending = "Pending"
	// ModelReady means the model has been downloaded.
	ModelReady = "Ready"
)
Variables ¶
var (
	// GroupVersion is the group version used to register these objects.
	GroupVersion = schema.GroupVersion{Group: "llmaz.io", Version: "v1alpha1"}
	// SchemeGroupVersion is an alias to GroupVersion for client-go libraries.
	// It is required by pkg/client/informers/externalversions/...
	SchemeGroupVersion = GroupVersion
	// SchemeBuilder is used to add Go types to the GroupVersionKind scheme.
	SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}
	// AddToScheme adds the types in this group-version to the given scheme.
	AddToScheme = SchemeBuilder.AddToScheme
)
Functions ¶
func Resource ¶
func Resource(resource string) schema.GroupResource
Resource is required by pkg/client/listers/...
Types ¶
type Flavor ¶
type Flavor struct {
// Name represents the flavor name, which will be used in model claim.
Name FlavorName `json:"name"`
// Limits defines the accelerators required to serve the model for each replica,
// like <nvidia.com/gpu: 8>. For multi-host cases, the limits here indicate
// the resource requirements for each replica, which usually equal the TP size.
// Setting the CPU and memory usage here is not recommended:
// - if using a playground, define the cpu/mem usage at backendConfig.
// - if using an inference service, define the cpu/mem at the container resources.
// However, if you define the same accelerator resources at the playground/service
// as well, those resources will be overwritten by the flavor limits here.
// +optional
Limits v1.ResourceList `json:"limits,omitempty"`
// NodeSelector represents the node candidates for Pod placement; if a node doesn't
// meet the nodeSelector, it will be filtered out by the resourceFungibility scheduler plugin.
// If nodeSelector is empty, every node is a candidate.
// +optional
NodeSelector map[string]string `json:"nodeSelector,omitempty"`
// Params stores other useful parameters that will be consumed by cluster-autoscaler / Karpenter
// for autoscaling, or be defined as model parallelism parameters like TP or PP size.
// E.g. with autoscaling, when scaling up nodes with 8x Nvidia A100, the parameter can be injected
// with <INSTANCE-TYPE: p4d.24xlarge> for AWS.
// Preset parameters: TP, PP, INSTANCE-TYPE.
// +optional
Params map[string]string `json:"params,omitempty"`
}
Flavor defines the accelerator requirements for a model and the necessary parameters in autoscaling. Right now, it is used in two places:
- Pod scheduling with node selectors specified.
- Cluster autoscaling with essential parameters provided.
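A minimal sketch of what a flavor list might look like inside an OpenModel manifest; the flavor names, instance type, and resource amounts below are illustrative, not defaults:

```yaml
# Two fungible flavors, tried in slice order: prefer A100 nodes, fall back to L4.
flavors:
  - name: a100                      # hypothetical flavor name
    limits:
      nvidia.com/gpu: 8             # accelerators per replica, usually the TP size
    nodeSelector:
      node.kubernetes.io/instance-type: p4d.24xlarge
    params:
      INSTANCE-TYPE: p4d.24xlarge   # consumed by cluster-autoscaler / Karpenter
      TP: "8"                       # preset parallelism parameter
  - name: l4                        # fallback flavor
    limits:
      nvidia.com/gpu: 8
```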
func (*Flavor) DeepCopy ¶
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new Flavor.
func (*Flavor) DeepCopyInto ¶
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type FlavorName ¶
type FlavorName string
type InferenceConfig ¶ added in v0.1.0
type InferenceConfig struct {
// Flavors represents the accelerator requirements to serve the model.
// Flavors are fungible following the priority represented by the slice order.
// +kubebuilder:validation:MaxItems=8
// +optional
Flavors []Flavor `json:"flavors,omitempty"`
}
InferenceConfig represents the inference configurations for the model.
func (*InferenceConfig) DeepCopy ¶ added in v0.1.0
func (in *InferenceConfig) DeepCopy() *InferenceConfig
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new InferenceConfig.
func (*InferenceConfig) DeepCopyInto ¶ added in v0.1.0
func (in *InferenceConfig) DeepCopyInto(out *InferenceConfig)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelClaim ¶
type ModelClaim struct {
// ModelName represents the name of the Model.
ModelName ModelName `json:"modelName,omitempty"`
// InferenceFlavors represents a list of flavors with fungibility support
// to serve the model.
// If set, the flavor names should be a subset of the flavors configured on the model.
// If not set, the model's configured flavors are used by default.
// +optional
InferenceFlavors []FlavorName `json:"inferenceFlavors,omitempty"`
}
ModelClaim represents claiming for one model.
func (*ModelClaim) DeepCopy ¶
func (in *ModelClaim) DeepCopy() *ModelClaim
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelClaim.
func (*ModelClaim) DeepCopyInto ¶
func (in *ModelClaim) DeepCopyInto(out *ModelClaim)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelClaims ¶ added in v0.0.6
type ModelClaims struct {
// Models represents a list of models with roles specified; there may be
// multiple models here to support state-of-the-art technologies like
// speculative decoding, where one model is the main (target) model and
// another is the draft model.
// +kubebuilder:validation:MinItems=1
Models []ModelRef `json:"models,omitempty"`
// InferenceFlavors represents a list of flavor names with fungibility support
// to serve the model.
// - If not set, the flavors configured on the models are used by default.
// - If set, the flavor names are looked up following the order of the models.
// +optional
InferenceFlavors []FlavorName `json:"inferenceFlavors,omitempty"`
}
ModelClaims represents multiple claims for different models.
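A sketch of a claims block for speculative decoding, assuming two previously created Models; the model and flavor names are hypothetical:

```yaml
modelClaims:
  models:
    - name: llama3-70b   # main (target) model
      role: main
    - name: llama3-8b    # draft model used for speculative decoding
      role: draft
  inferenceFlavors:      # optional; must be a subset of the models' flavors
    - a100
```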
func (*ModelClaims) DeepCopy ¶ added in v0.0.6
func (in *ModelClaims) DeepCopy() *ModelClaims
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelClaims.
func (*ModelClaims) DeepCopyInto ¶ added in v0.0.6
func (in *ModelClaims) DeepCopyInto(out *ModelClaims)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelHub ¶
type ModelHub struct {
// Name refers to the model registry, such as huggingface.
// +kubebuilder:default=Huggingface
// +kubebuilder:validation:Enum={Huggingface,ModelScope}
// +optional
Name *string `json:"name,omitempty"`
// ModelID refers to the model identifier on model hub,
// such as meta-llama/Meta-Llama-3-8B.
ModelID string `json:"modelID,omitempty"`
// Filename refers to a specific model file rather than the whole repo.
// This is helpful for downloading a specific GGUF model rather than the
// whole repo, which may include all kinds of quantized variants.
// TODO: this is only supported with Huggingface; add support for ModelScope
// in the near future.
// Note: once filename is set, allowPatterns and ignorePatterns should be left unset.
Filename *string `json:"filename,omitempty"`
// Revision refers to a Git revision id which can be a branch name, a tag, or a commit hash.
// +kubebuilder:default=main
// +optional
Revision *string `json:"revision,omitempty"`
// AllowPatterns means only files matching at least one of the patterns will be downloaded.
// +optional
AllowPatterns []string `json:"allowPatterns,omitempty"`
// IgnorePatterns means files matching any of the patterns will not be downloaded.
// +optional
IgnorePatterns []string `json:"ignorePatterns,omitempty"`
}
ModelHub represents the model registry for model downloads.
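A sketch of a modelHub source pulling a repo from Huggingface; the ignore pattern below is illustrative (e.g. skipping PyTorch checkpoints when safetensors are available):

```yaml
source:
  modelHub:
    name: Huggingface                      # or ModelScope
    modelID: meta-llama/Meta-Llama-3-8B
    revision: main                         # branch name, tag, or commit hash
    ignorePatterns:
      - "*.pt"                             # hypothetical: skip .pt checkpoints
```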
func (*ModelHub) DeepCopy ¶
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelHub.
func (*ModelHub) DeepCopyInto ¶
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelRef ¶ added in v0.1.0
type ModelRef struct {
// Name represents the model name.
Name ModelName `json:"name"`
// Role represents the model's role when more than one model is required.
// For example, the draft role means running with speculative decoding,
// and the backend's default arguments will be looked up in the backendRuntime
// named speculative-decoding.
// +kubebuilder:validation:Enum={main,draft}
// +kubebuilder:default=main
// +optional
Role *ModelRole `json:"role,omitempty"`
}
ModelRef refers to a created Model with its role.
func (*ModelRef) DeepCopy ¶ added in v0.1.0
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelRef.
func (*ModelRef) DeepCopyInto ¶ added in v0.1.0
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelRole ¶ added in v0.0.6
type ModelRole string
const (
	// MainRole represents the main model. If only one model is required,
	// it must be the main model. Only one main model is allowed.
	MainRole ModelRole = "main"
	// DraftRole represents the draft model in speculative decoding;
	// the main model is then the target model.
	DraftRole ModelRole = "draft"
	// LoraRole represents the LoRA model.
	LoraRole ModelRole = "lora"
)
type ModelSource ¶
type ModelSource struct {
// ModelHub represents the model registry for model downloads.
// +optional
ModelHub *ModelHub `json:"modelHub,omitempty"`
// URI represents various kinds of model sources following the URI protocol, protocol://<address>, e.g.
// - oss://<bucket>.<endpoint>/<path-to-your-model>
// - ollama://llama3.3
// - host://<path-to-your-model>
//
// +optional
URI *URIProtocol `json:"uri,omitempty"`
}
ModelSource represents the source of the model. Only one model source will be used.
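The URI variant is mutually exclusive with modelHub; a couple of illustrative sketches (bucket, endpoint, and paths are placeholders):

```yaml
# Pull a model by tag through the ollama protocol.
source:
  uri: ollama://llama3.3
---
# Load a model from object storage (placeholders for bucket/endpoint/path).
source:
  uri: oss://my-bucket.my-endpoint/models/llama3-8b
```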
func (*ModelSource) DeepCopy ¶
func (in *ModelSource) DeepCopy() *ModelSource
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelSource.
func (*ModelSource) DeepCopyInto ¶
func (in *ModelSource) DeepCopyInto(out *ModelSource)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelSpec ¶
type ModelSpec struct {
// FamilyName represents the model type, like llama2, which will be auto-injected
// into the labels with the key `llmaz.io/model-family-name`.
FamilyName ModelName `json:"familyName"`
// Source represents the source of the model; there are several ways to load
// the model, such as from Huggingface, an OCI registry, S3, or a host path.
Source ModelSource `json:"source"`
// InferenceConfig represents the inference configurations for the model.
InferenceConfig *InferenceConfig `json:"inferenceConfig,omitempty"`
}
ModelSpec defines the desired state of Model
func (*ModelSpec) DeepCopy ¶
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelSpec.
func (*ModelSpec) DeepCopyInto ¶
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type ModelStatus ¶
type ModelStatus struct {
// Conditions represents the Model's conditions.
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
ModelStatus defines the observed state of Model
func (*ModelStatus) DeepCopy ¶
func (in *ModelStatus) DeepCopy() *ModelStatus
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ModelStatus.
func (*ModelStatus) DeepCopyInto ¶
func (in *ModelStatus) DeepCopyInto(out *ModelStatus)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
type OpenModel ¶
type OpenModel struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec ModelSpec `json:"spec,omitempty"`
Status ModelStatus `json:"status,omitempty"`
}
OpenModel is the Schema for the open models API
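Putting the pieces together, a sketch of a complete OpenModel manifest; the metadata name, family name, and flavor details are hypothetical:

```yaml
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: llama3-8b
spec:
  familyName: llama3
  source:
    modelHub:
      modelID: meta-llama/Meta-Llama-3-8B
  inferenceConfig:
    flavors:
      - name: default          # hypothetical flavor name
        limits:
          nvidia.com/gpu: 1    # one accelerator per replica
```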
func (*OpenModel) DeepCopy ¶
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new OpenModel.
func (*OpenModel) DeepCopyInto ¶
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (*OpenModel) DeepCopyObject ¶
DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.
type OpenModelList ¶
type OpenModelList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []OpenModel `json:"items"`
}
OpenModelList contains a list of OpenModel
func (*OpenModelList) DeepCopy ¶
func (in *OpenModelList) DeepCopy() *OpenModelList
DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new OpenModelList.
func (*OpenModelList) DeepCopyInto ¶
func (in *OpenModelList) DeepCopyInto(out *OpenModelList)
DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (*OpenModelList) DeepCopyObject ¶
func (in *OpenModelList) DeepCopyObject() runtime.Object
DeepCopyObject is an autogenerated deepcopy function, copying the receiver, creating a new runtime.Object.