llmaz


Version: v0.0.5 (latest) | Published: Aug 26, 2024 | License: Apache-2.0


Repository

github.com/inftyai/llmaz


llmaz (pronounced /lima:z/) aims to provide a production-ready inference platform for large language models on Kubernetes. It integrates closely with state-of-the-art inference backends to bring leading-edge research to the cloud.

🌱 llmaz is currently in alpha, so the API may change before it graduates to beta.

Concept

Feature Overview

User Friendly: Users can quickly deploy an LLM service with minimal configuration.
High Performance: llmaz supports a wide range of advanced inference backends for high performance, such as vLLM, SGLang, and llama.cpp. Find the full list of supported backends here.
Scaling Efficiency (WIP): llmaz works smoothly with autoscaling components such as Cluster-Autoscaler or Karpenter to support elastic scenarios.
Accelerator Fungibility (WIP): llmaz supports serving the same LLM on various accelerators to optimize cost and performance.
SOTA Inference (WIP): llmaz supports running the latest cutting-edge research, such as Speculative Decoding or Splitwise, on Kubernetes.
Various Model Providers: llmaz automatically loads models from various providers, such as HuggingFace, ModelScope, and object stores (Aliyun OSS, with more on the way).
Multi-host Support: llmaz supports both single-host and multi-host scenarios with LWS from day one.

Quick Start

Installation

Read the Installation guide for instructions.

Deploy

Here is the simplest sample for deploying facebook/opt-125m. All you need to do is apply a Model and a Playground.

Please refer to the examples to learn more.

Note: if your model needs a Hugging Face token to download weights, run kubectl create secret generic modelhub-secret --from-literal=HF_TOKEN=<your token> first.

Model

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m
spec:
  familyName: opt
  source:
    modelHub:
      modelID: facebook/opt-125m
  inferenceFlavors:
  - name: t4 # GPU type
    requests:
      nvidia.com/gpu: 1

Inference Playground

apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: opt-125m
spec:
  replicas: 1
  modelClaim:
    modelName: opt-125m

Test

Expose the service

kubectl port-forward pod/opt-125m-0 8080:8080

Get registered models

curl http://localhost:8080/v1/models
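To check programmatically that the model is registered, the response can be decoded in Go. This is a sketch that assumes the backend returns an OpenAI-style list of the form {"data": [{"id": ...}]}; the README does not show the response, so the shape, the modelList type, and the modelIDs helper are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// modelList mirrors only the fields this sketch reads from an
// OpenAI-style /v1/models response (assumed shape).
type modelList struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// modelIDs extracts the model identifiers from a response body.
func modelIDs(body []byte) ([]string, error) {
	var list modelList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, fmt.Errorf("decode model list: %w", err)
	}
	ids := make([]string, 0, len(list.Data))
	for _, m := range list.Data {
		ids = append(ids, m.ID)
	}
	return ids, nil
}

func main() {
	// Hypothetical sample payload. In practice this would be the body
	// returned by GET http://localhost:8080/v1/models.
	sample := []byte(`{"object":"list","data":[{"id":"opt-125m","object":"model"}]}`)

	ids, err := modelIDs(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(ids)
}
```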

Request a query

curl http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "opt-125m",
    "prompt": "San Francisco is a",
    "max_tokens": 10,
    "temperature": 0
}'

Roadmap

Gateway support for traffic routing
Serverless support for cloud-agnostic users
CLI tool support
Model training and fine-tuning in the long term

Contributions

🚀 All kinds of contributions are welcome! Please follow the Contributing guide. Thanks to all the contributors.

Directories

Path	Synopsis
api
api/core/v1alpha1	Package v1alpha1 contains API Schema definitions for the v1alpha1 API group (groupName=llmaz.io).
api/inference/v1alpha1	Package v1alpha1 contains API Schema definitions for the inference v1alpha1 API group (groupName=inference.llmaz.io).
client-go
client-go/applyconfiguration
client-go/applyconfiguration/core/v1alpha1
client-go/applyconfiguration/inference/v1alpha1
client-go/applyconfiguration/internal
client-go/clientset/versioned
client-go/clientset/versioned/fake	This package has the automatically generated fake clientset.
client-go/clientset/versioned/scheme	This package contains the scheme of the automatically generated clientset.
client-go/clientset/versioned/typed/core/v1alpha1	This package has the automatically generated typed clients.
client-go/clientset/versioned/typed/core/v1alpha1/fake	Package fake has the automatically generated clients.
client-go/clientset/versioned/typed/inference/v1alpha1	This package has the automatically generated typed clients.
client-go/clientset/versioned/typed/inference/v1alpha1/fake	Package fake has the automatically generated clients.
client-go/informers/externalversions
client-go/informers/externalversions/core
client-go/informers/externalversions/core/v1alpha1
client-go/informers/externalversions/inference
client-go/informers/externalversions/inference/v1alpha1
client-go/informers/externalversions/internalinterfaces
client-go/listers/core/v1alpha1
client-go/listers/inference/v1alpha1
cmd
hack
internal
pkg
pkg/cert
pkg/controller
pkg/controller/inference
pkg/controller_helper/backend
pkg/controller_helper/model_source
pkg/util
pkg/webhook
test
test/util
test/util/validation
test/util/wrapper
