vllm

package
v0.10.16
Published: May 8, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package vllm wraps the OpenAI-compat HTTP surface exposed by `vllm serve` (https://docs.vllm.ai/). vLLM is a high-throughput inference server with one behavior worth distinguishing in the catalog stack: by default it applies the target model's HuggingFace `generation_config.json` when the client omits sampler fields. Most other local servers we wrap (omlx, lmstudio, lucebox) cannot do this — MLX / GGUF repackaging typically drops generation_config.json from the bundle, and the servers ship their own presets instead.

The implication for ADR-007's catalog-stale nudge: when a vLLM-served request omits sampler fields, the user is not "decoding greedy" — the server is honoring the model creator's recommended bundle. The CLI reflects that with a softer message.

Capabilities mirror lmstudio (Tools / Stream / StructuredOutput true) and add ImplicitGenerationConfig=true. Reasoning is model-dependent and not declared at the provider level; per-model thinking-mode controls live in the catalog ModelEntry, matching the lmstudio precedent.

Default port 8000 follows the vLLM docs. Auth is optional: vLLM accepts unauthenticated requests by default and gates with --api-key (or VLLM_API_KEY) when the operator sets one. The Config.APIKey field flows through unchanged.
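A minimal construction sketch; the model name is a placeholder and an empty APIKey simply means the unauthenticated default:

cfg := vllm.Config{
	BaseURL: vllm.DefaultBaseURL,
	APIKey:  "",                         // leave empty for an unauthenticated server; otherwise the operator's --api-key value
	Model:   "Qwen/Qwen2.5-7B-Instruct", // placeholder; use whatever model `vllm serve` loaded
}
provider := vllm.New(cfg)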

Index

Constants

const DefaultBaseURL = "http://localhost:8000/v1"

Variables

var ProtocolCapabilities = openai.ProtocolCapabilities{
	Tools:                    true,
	Stream:                   true,
	StructuredOutput:         true,
	ImplicitGenerationConfig: true,
}

ProtocolCapabilities mirrors lmstudio's openai-compat surface and adds ImplicitGenerationConfig=true so the catalog-stale nudge can soften.
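A sketch of the check a nudge site might make against this value; the message strings are illustrative, not ADR-007's actual wording:

nudge := "warning: catalog sampler defaults may be stale" // illustrative copy, not the real CLI text
if vllm.ProtocolCapabilities.ImplicitGenerationConfig {
	// Omitted sampler fields are filled server-side from the model's
	// generation_config.json, so the message softens instead of implying greedy decoding.
	nudge = "note: server applied the model creator's generation_config defaults"
}
_ = nudge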

Functions

func New

func New(cfg Config) *openai.Provider

Types

type Config

type Config struct {
	BaseURL      string // OpenAI-compatible endpoint, e.g. DefaultBaseURL
	APIKey       string // optional; forwarded unchanged for servers started with --api-key / VLLM_API_KEY
	Model        string
	ModelPattern string
	KnownModels  map[string]string
	Headers      map[string]string
	Reasoning    reasoning.Reasoning
}

type UtilizationProbe added in v0.10.9

type UtilizationProbe struct {
	// contains filtered or unexported fields
}

UtilizationProbe queries vLLM server-root observability endpoints and normalizes them into the shared endpoint utilization shape.

func NewUtilizationProbe added in v0.10.9

func NewUtilizationProbe(baseURL string, client *http.Client) *UtilizationProbe

NewUtilizationProbe creates a probe for an OpenAI-compatible vLLM base URL.
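A construction sketch; the timeout is arbitrary, and since the Probe method's signature is not listed above the call itself is omitted:

client := &http.Client{Timeout: 5 * time.Second}
probe := vllm.NewUtilizationProbe(vllm.DefaultBaseURL, client)
_ = probe // probe.Probe(...) then fetches /metrics and normalizes the sample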

func (*UtilizationProbe) Probe added in v0.10.9

Probe fetches /metrics from the server root and returns a normalized sample. Failures return stale or unknown utilization instead of surfacing endpoint unavailability.
