inference-manager

module

Version: v1.8.0 (latest) Published: Nov 14, 2024 License: Apache-2.0

Repository

github.com/llmariner/inference-manager

README

inference-manager

The inference-manager manages inference runtimes (e.g., vLLM and Ollama) in containers, loads models, and processes requests.

Set up Inference Server/Engine for development

Requirements:

Run the following command:

make setup-llmariner setup-cluster helm-apply-inference

[!TIP]

To redeploy a single component, run only make helm-reapply-inference-server or make helm-reapply-inference-engine. The target rebuilds the inference-manager container images, deploys them using the local Helm chart, and restarts the containers.

You can configure parameters in .values.yaml.

Try out inference APIs

with curl:

curl --request POST http://localhost:8080/v1/chat/completions -d '{
  "model": "google-gemma-2b-it-q4_0",
  "messages": [{"role": "user", "content": "hello"}]
}'

with llma:

export LLMARINER_API_KEY=dummy
llma chat completions create \
    --model google-gemma-2b-it-q4_0 \
    --role system \
    --completion 'hi'

Directories

Path	Synopsis
api/v1
common/pkg/sse
common/pkg/test
engine/cmd (command)
engine/internal/autoscaler
engine/internal/config
engine/internal/metrics
engine/internal/modeldownloader
engine/internal/modeldownloader/common
engine/internal/modeldownloader/huggingface
engine/internal/models
engine/internal/ollama
engine/internal/processor
engine/internal/runtime
engine/internal/s3
server/cmd (command)
server/internal/admin
server/internal/config
server/internal/infprocessor
server/internal/monitoring
server/internal/rag
server/internal/router
server/internal/server
server/internal/taskexchanger
triton-proxy/cmd (command)
triton-proxy/internal/server
