Examples

This directory contains runnable examples for github.com/ethpandaops/vllm-agent-sdk-go.

API Overview

The SDK supports two primary styles:

Top-level one-shot APIs:

Query(ctx, content, ...opts)
QueryStream(ctx, messages, ...opts)

Stateful client API:

NewClient() + Start()
Query() + ReceiveResponse() / ReceiveMessages()
Interrupt() / SetModel() / SetPermissionMode()

Environment

The core SDK resolves VLLM_BASE_URL, VLLM_API_KEY, and VLLM_MODEL automatically (see Configuration). The examples layer further overrides on top:

Variable	Description	Default
`VLLM_BASE_URL`	vLLM server base URL	`http://127.0.0.1:8000/v1`
`VLLM_API_KEY`	Bearer auth token (if your server enforces auth)	(none)
`VLLM_MODEL`	Model name	`QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ` (example default)
`VLLM_IMAGE_MODEL`	Image-capable model	`QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ` (example default)
`VLLM_VISION_MODEL`	Vision model for multimodal input	Falls back to `VLLM_IMAGE_MODEL`, then `VLLM_MODEL`
`VLLM_IMAGE_OUTPUT_DIR`	Directory for saving generated images	(none)

Use VLLM_MODEL to pin a different served model when you need deterministic capability coverage.
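
The fallback chain in the table can be sketched in plain Go. This is only an illustration of the documented order (VLLM_VISION_MODEL, then VLLM_IMAGE_MODEL, then VLLM_MODEL, then the example default). The helper names `firstNonEmpty` and `visionModel` are hypothetical and not part of the SDK, which may resolve these variables differently.

```go
package main

import (
	"fmt"
	"os"
)

// Defaults copied from the table above. Treat them as example defaults only.
const (
	defaultBaseURL = "http://127.0.0.1:8000/v1"
	defaultModel   = "QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ"
)

// firstNonEmpty (hypothetical helper) returns the first non-empty value among
// the named variables, or fallback when none is set. The lookup function is
// injected so the logic can be exercised without touching the real environment.
func firstNonEmpty(lookup func(string) string, fallback string, keys ...string) string {
	for _, key := range keys {
		if value := lookup(key); value != "" {
			return value
		}
	}
	return fallback
}

// visionModel (hypothetical helper) applies the documented order:
// VLLM_VISION_MODEL, then VLLM_IMAGE_MODEL, then VLLM_MODEL.
func visionModel(lookup func(string) string) string {
	return firstNonEmpty(lookup, defaultModel,
		"VLLM_VISION_MODEL", "VLLM_IMAGE_MODEL", "VLLM_MODEL")
}

func main() {
	fmt.Println("base URL:    ", firstNonEmpty(os.Getenv, defaultBaseURL, "VLLM_BASE_URL"))
	fmt.Println("vision model:", visionModel(os.Getenv))
}
```

Passing `os.Getenv` as the lookup keeps the resolution logic separate from the process environment, so the same function can be checked against a map in tests.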

Core SDK Examples

These focus on the sibling-style SDK contract that downstream code consumes directly.

Example	Description
`quick_start`	Basic one-shot query with low-cost model defaults.
`query_stream`	Streaming input via `QueryStream` and `MessagesFromSlice`.
`client_multi_turn`	Stateful `Client` usage over multiple turns.
`model_discovery`	List models and inspect free/tool/structured-output/image capabilities.
`structured_output`	Structured JSON output with `WithOutputFormat`.
`sdk_tools`	In-process SDK tools via `WithSDKTools(...)`.
`on_user_input`	SDK-owned user input prompts via `WithOnUserInput(...)`.
`permissions`	Tool permission denial handling via `WithCanUseTool(...)`.
`hooks`	Hook callbacks around tool execution.
`sessions_local`	Local session persistence, listing, stats, and message inspection.
`interrupt`	Client cancellation via `Interrupt()`.
`error_handling`	Typed `UnsupportedControlError` and `ErrSessionNotFound` handling.
`system_prompt`	System prompt configuration (default vs custom string vs preset).
`extended_thinking`	Extended thinking with `WithThinking` and `WithEffort`.
`include_partial_messages`	Real-time streaming of partial message deltas.
`max_budget_usd`	API cost control with `WithMaxBudgetUSD`.
`cancellation`	Context cancellation and graceful client shutdown.
`parallel_queries`	Concurrent `Query()` calls with `errgroup`.
`pipeline`	Multi-step LLM orchestration (Generate → Evaluate → Refine).
`mcp_calculator`	In-process MCP server with calculator tools via `CreateSdkMcpServer`.
`mcp_status`	Query MCP server connection status via `GetMCPStatus`.
`memory_tool`	Filesystem-backed persistent memory via MCP tools.
`custom_logger`	Bridge any logging library (logrus) to `WithLogger` via `slog.Handler`.

vLLM-Native Advanced Examples

These focus on vLLM-specific routing and request-shape controls.

Example	Description
`vllm_chat_controls`	Sampling/tool controls (`top_p`, penalties, seed, stop, logprobs).
`vllm_routing`	Provider/plugins/route/session/trace controls.
`vllm_responses`	`/responses` mode with instructions/text config/service tier/truncation.
`vllm_responses_chaining`	Responses chaining with `previous_response_id` and prompt cache key.
`vllm_multimodal_input`	Multimodal chat-completions input with block-based text + image parts.
`vllm_multimodal_image`	Multimodal/image generation with generated image blocks saved to disk.
`vllm_extra`	Escape-hatch payload overrides via `WithVLLMExtra`.

Running

# Run any example
go run ./examples/quick_start
go run ./examples/vllm_responses
go run ./examples/vllm_multimodal_input

# Examples with sub-examples accept a name argument
go run ./examples/extended_thinking all
go run ./examples/cancellation graceful_shutdown
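
One way an example could dispatch on that name argument is sketched below. This is an assumption about the general pattern, not a copy of the examples' code. The `subExamples` registry, the `resolve` function, the `context_timeout` entry, and the choice of `all` as the default are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// subExamples is a hypothetical registry of named sub-examples.
var subExamples = map[string]func() error{
	"graceful_shutdown": func() error {
		fmt.Println("running graceful_shutdown")
		return nil
	},
	"context_timeout": func() error { // hypothetical name, for illustration only
		fmt.Println("running context_timeout")
		return nil
	},
}

// resolve returns the sub-example names selected by arg, in sorted order.
// "all" selects every registered name; an unknown name is an error.
func resolve(arg string) ([]string, error) {
	if arg == "all" {
		names := make([]string, 0, len(subExamples))
		for name := range subExamples {
			names = append(names, name)
		}
		sort.Strings(names)
		return names, nil
	}
	if _, ok := subExamples[arg]; !ok {
		return nil, fmt.Errorf("unknown sub-example %q", arg)
	}
	return []string{arg}, nil
}

func main() {
	arg := "all" // assumed default when no name is given
	if len(os.Args) > 1 {
		arg = os.Args[1]
	}
	names, err := resolve(arg)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, name := range names {
		if err := subExamples[name](); err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", name, err)
			os.Exit(1)
		}
	}
}
```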

Testing

# Run all examples and verify output with vLLM
scripts/test_examples.sh

# Run specific examples
scripts/test_examples.sh -f quick_start,pipeline

# Keep going on failure
scripts/test_examples.sh -k

# Override the model
VLLM_MODEL=QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ scripts/test_examples.sh

# Image-generation example with explicit image model and output directory
VLLM_IMAGE_MODEL=QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ \
VLLM_IMAGE_OUTPUT_DIR=/tmp/vllm-images \
go run ./examples/vllm_multimodal_image

# Multimodal input example with a vision-capable model
VLLM_VISION_MODEL=QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ \
go run ./examples/vllm_multimodal_input

Directories

Path	Synopsis
cancellation	Package main demonstrates cancellation and graceful shutdown patterns.
client_multi_turn	
error_handling	
extended_thinking	Package main demonstrates extended thinking capabilities with VLLM.
hooks	
include_partial_messages	Package main demonstrates partial message streaming where incremental assistant updates are received as the model generates responses.
internal/exampleutil	
interrupt	
max_budget_usd	Package main demonstrates API cost control with budget limits.
mcp_calculator	Package main demonstrates how to create calculator tools using MCP servers.
mcp_status	Package main demonstrates querying MCP server connection status.
memory_tool	Package main demonstrates a filesystem-backed memory tool for agent state persistence.
model_discovery	
on_user_input	
parallel_queries	Package main demonstrates running multiple Query() calls concurrently.
permissions	
pipeline	Package main demonstrates multi-step LLM orchestration with Go control flow.
query_stream	
quick_start	
sdk_tools	
sessions_local	
structured_output	
system_prompt	Package main demonstrates configuring system prompts.
vllm_chat_controls	
vllm_extra	
vllm_multimodal_image	
vllm_multimodal_input	
vllm_responses	
vllm_responses_chaining	
vllm_routing	
