vllm

package
v0.10.16
Published: May 8, 2026 License: MIT Imports: 10 Imported by: 0

Documentation

Overview

Package vllm wraps the OpenAI-compat HTTP surface exposed by `vllm serve` (https://docs.vllm.ai/). vLLM is a high-throughput inference server with one behavior worth distinguishing in the catalog stack: by default it applies the target model's HuggingFace `generation_config.json` when the client omits sampler fields. Most other local servers we wrap (omlx, lmstudio, lucebox) cannot do this — MLX / GGUF repackaging typically drops generation_config.json from the bundle, and the servers ship their own presets instead.

The implication for ADR-007's catalog-stale nudge: when a vLLM-served request omits sampler fields, the user is not "decoding greedy" — the server is honoring the model creator's recommended bundle. The CLI reflects that with a softer message.

Capabilities mirror lmstudio (Tools / Stream / StructuredOutput true) and add ImplicitGenerationConfig=true. Reasoning is model-dependent and not declared at the provider level; per-model thinking-mode controls live in the catalog ModelEntry, matching the lmstudio precedent.

Default port 8000 follows the vLLM docs. Auth is optional: vLLM accepts unauthenticated requests by default and gates with --api-key (or VLLM_API_KEY) when the operator sets one. The Config.APIKey field flows through unchanged.
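A minimal construction sketch; the model name is a placeholder and an empty APIKey simply means the unauthenticated default:

cfg := vllm.Config{
	BaseURL: vllm.DefaultBaseURL,
	APIKey:  "",                         // leave empty for an unauthenticated server; otherwise the operator's --api-key value
	Model:   "Qwen/Qwen2.5-7B-Instruct", // placeholder; use whatever model `vllm serve` loaded
}
provider := vllm.New(cfg)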

Index

Constants

const DefaultBaseURL = "http://localhost:8000/v1"

Variables

var ProtocolCapabilities = openai.ProtocolCapabilities{
	Tools:                    true,
	Stream:                   true,
	StructuredOutput:         true,
	ImplicitGenerationConfig: true,
}

ProtocolCapabilities mirrors lmstudio's openai-compat surface and adds ImplicitGenerationConfig=true so the catalog-stale nudge can soften.
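A sketch of the check a nudge site might make against this value; the message strings are illustrative, not ADR-007's actual wording:

nudge := "warning: catalog sampler defaults may be stale" // illustrative copy, not the real CLI text
if vllm.ProtocolCapabilities.ImplicitGenerationConfig {
	// Omitted sampler fields are filled server-side from the model's
	// generation_config.json, so the message softens instead of implying greedy decoding.
	nudge = "note: server applied the model creator's generation_config defaults"
}
_ = nudge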

Functions

func New

func New(cfg Config) *openai.Provider

Types

type Config

type Config struct {
	BaseURL      string // OpenAI-compatible endpoint, e.g. DefaultBaseURL
	APIKey       string // optional; forwarded unchanged for servers started with --api-key / VLLM_API_KEY
	Model        string
	ModelPattern string
	KnownModels  map[string]string
	Headers      map[string]string
	Reasoning    reasoning.Reasoning
}

type UtilizationProbe added in v0.10.9

type UtilizationProbe struct {
	// contains filtered or unexported fields
}

UtilizationProbe queries vLLM server-root observability endpoints and normalizes them into the shared endpoint utilization shape.

func NewUtilizationProbe added in v0.10.9

func NewUtilizationProbe(baseURL string, client *http.Client) *UtilizationProbe

NewUtilizationProbe creates a probe for an OpenAI-compatible vLLM base URL.
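A construction sketch; the timeout is arbitrary, and since the Probe method's signature is not listed above the call itself is omitted:

client := &http.Client{Timeout: 5 * time.Second}
probe := vllm.NewUtilizationProbe(vllm.DefaultBaseURL, client)
_ = probe // probe.Probe(...) then fetches /metrics and normalizes the sample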

func (*UtilizationProbe) Probe added in v0.10.9

Probe fetches /metrics from the server root and returns a normalized sample. Failures return stale or unknown utilization instead of surfacing endpoint unavailability.
