testopenai

package

Version: v0.4.0-rc1 (latest) Published: Nov 5, 2025 License: Apache-2.0 Imports: 27 Imported by: 0

Repository

github.com/envoyproxy/ai-gateway

README ¶

Test OpenAI Server

This package provides a test OpenAI API server for testing AI Gateway functionality without requiring actual API access or credentials.

Pre-recorded OpenAI request/responses are stored as YAML files in the cassettes directory, using the go-vcr v4 format.

Overview

The test server works by:

Automatically loading all pre-recorded API interactions from embedded "cassette" YAML files
Matching incoming requests against recorded interactions based on the X-Cassette-Name header
Replaying the recorded responses with shorter delays than the real platforms, which keeps tests fast
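
The load, match, and replay steps above can be sketched with the standard library alone. This is a simplified illustration, not the package's actual handler: the `recorded` type, the in-memory map, and the cassette name "chat-basic" are assumptions made for the example, and the real server reads go-vcr YAML files instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// recorded is a hypothetical in-memory stand-in for one cassette's response.
type recorded struct {
	status int
	body   string
}

// replayHandler looks up the cassette named in the X-Cassette-Name header and
// writes its recorded status and body. Unknown names yield a 500 response.
func replayHandler(cassettes map[string]recorded) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		name := r.Header.Get("X-Cassette-Name")
		rec, ok := cassettes[name]
		if !ok {
			http.Error(w, "no cassette named "+name, http.StatusInternalServerError)
			return
		}
		w.WriteHeader(rec.status)
		_, _ = io.WriteString(w, rec.body)
	})
}

// replay drives the handler once for the named cassette, without opening a
// network socket, and returns the status code and body that were replayed.
func replay(cassettes map[string]recorded, name string) (int, string) {
	req := httptest.NewRequest(http.MethodPost, "/v1/chat/completions", nil)
	req.Header.Set("X-Cassette-Name", name)
	rw := httptest.NewRecorder()
	replayHandler(cassettes).ServeHTTP(rw, req)
	return rw.Code, rw.Body.String()
}

func main() {
	cassettes := map[string]recorded{
		"chat-basic": {status: http.StatusOK, body: `{"ok":true}`},
	}
	status, body := replay(cassettes, "chat-basic")
	fmt.Println(status, body)
}
```

Because the response comes from a map lookup rather than a network call, the same request always yields the same output, which is the property the list below relies on.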

This approach provides:

Deterministic testing: Same inputs always produce same outputs
No API credentials needed: Tests can run without OpenAI API keys
Fast execution: No network calls to external services
Cost savings: No API usage charges during testing

Usage

Basic Usage

import (
	"context"
	"net/http"
	"os"
	"testing"

	"github.com/stretchr/testify/require"

	"github.com/envoyproxy/ai-gateway/tests/internal/testopenai"
)

func TestMyFeature(t *testing.T) {
	// Create server on a random port (port 0) - cassettes are automatically loaded
	server, err := testopenai.NewServer(os.Stdout, 0)
	require.NoError(t, err)
	defer server.Close()

	// Create a request for a specific cassette
	req, err := testopenai.NewRequest(context.Background(), server.URL(), testopenai.CassetteChatBasic)
	require.NoError(t, err)

	// Make the request
	resp, err := http.DefaultClient.Do(req)
	require.NoError(t, err)
	defer resp.Body.Close()
	// ... test your code
}

Recording New Cassettes

The test server records a new interaction when all of the following are true:

No matching cassette is found
OPENAI_API_KEY or AZURE_OPENAI_API_KEY is set in the environment
A cassette name is provided via X-Cassette-Name header

To record a new cassette, follow these steps:

Add a constant for your test scenario to requests.go:

const (
	// ... existing constants
	// CassetteChatFeatureX includes feature X, added to OpenAI version 1.2.3.
	CassetteChatFeatureX
	_cassetteNameEnd // Keep this at the end
)

Note: The constants use iota enumeration, so your new constant must be added before _cassetteNameEnd to be included in the AllCassettes() iteration.

Also add its string mapping:

var stringValues = map[Cassette]string{
	// ... existing mappings
	CassetteChatFeatureX: "chat-feature-x",
}

Add the request body for your test to requests.go:

var requestBodies = map[Cassette]*openai.ChatCompletionRequest{
	// ... existing entries
	CassetteChatFeatureX: {
		Model: openai.ModelGPT41Nano,
		Messages: []openai.ChatCompletionMessageParamUnion{
			{
				Type: openai.ChatMessageRoleUser,
				Value: openai.ChatCompletionUserMessageParam{
					Role: openai.ChatMessageRoleUser,
					Content: openai.StringOrUserRoleContentUnion{
						Value: "Your test prompt",
					},
				},
			},
		},
		// Add your feature-specific fields here
	},
}

Run TestNewRequest with your API credentials set:

For OpenAI:

cd tests/internal/testopenai
OPENAI_API_KEY=sk-.. go test -run TestNewRequest -v

For Azure OpenAI:

cd tests/internal/testopenai
AZURE_OPENAI_API_KEY=your-key \
  AZURE_OPENAI_ENDPOINT=https://your-resource.cognitiveservices.azure.com \
  AZURE_OPENAI_DEPLOYMENT=your-deployment-name \
  OPENAI_API_VERSION=2024-02-15-preview \
  go test -run TestNewRequest -v

Use it in tests like chat_completions_test.go
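
The iota enumeration with a trailing sentinel and a string mapping, as used in the first steps above, can be sketched in isolation. All names here (`cassette`, `cassetteEnd`, `allCassettes`) are hypothetical lower-case stand-ins chosen for the example; they only mirror the shape of the pattern, not the package's real declarations.

```go
package main

import "fmt"

// cassette is a hypothetical iota-based enum used only for this illustration.
type cassette int

const (
	cassetteChatBasic cassette = iota
	cassetteChatFeatureX
	cassetteEnd // sentinel: new constants must be declared above this line
)

// stringValues maps each constant to the name used for its cassette file.
var stringValues = map[cassette]string{
	cassetteChatBasic:    "chat-basic",
	cassetteChatFeatureX: "chat-feature-x",
}

// String returns the mapped name, or a numeric fallback for unmapped values.
func (c cassette) String() string {
	if s, ok := stringValues[c]; ok {
		return s
	}
	return fmt.Sprintf("cassette(%d)", int(c))
}

// allCassettes returns every value declared before the sentinel, which is why
// a constant added after the sentinel would be skipped.
func allCassettes() []cassette {
	out := make([]cassette, 0, int(cassetteEnd))
	for c := cassette(0); c < cassetteEnd; c++ {
		out = append(out, c)
	}
	return out
}

func main() {
	for _, c := range allCassettes() {
		fmt.Println(int(c), c)
	}
}
```

A constant that is declared but missing from the string map shows up with the numeric fallback, which makes a forgotten mapping easy to spot in test output.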

Flowchart of Request Handling

graph TD
    A[Request arrives] --> B{X-Cassette-Name\nheader present?}
    B -->|Yes| C[Search for specific cassette]
    B -->|No| D[Search all cassettes]

    C --> E{Cassette found?}
    D --> F{Match found?}

    E -->|Yes| G{Interaction matches?}
    E -->|No| H{API key set?}
    F -->|Yes| P[Return recorded response]
    F -->|No| I[Return 400 error:\nInclude X-Cassette-Name header]

    G -->|Yes| P
    G -->|No| O[Return 409 error:\nInteraction out of date]

    H -->|Yes| J[Record new interaction]
    H -->|No| K[Return 500 error:\nNo cassette found]

    J --> L[Make real API call]
    L --> M[Save to cassette file\nwith .yaml extension]
    M --> N[Return response to client]

    style I fill:#f96
    style K fill:#f96
    style O fill:#fa6
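
The branches of the flowchart can also be read as a small decision function. The sketch below is an interpretation of the diagram only: the `state` fields and the `decide` function are hypothetical names, and the real server makes these decisions inside its HTTP handler rather than from precomputed booleans.

```go
package main

import (
	"fmt"
	"net/http"
)

// state captures the questions asked in the flowchart (hypothetical type).
type state struct {
	headerPresent      bool // X-Cassette-Name header was sent
	cassetteFound      bool // the named cassette exists
	interactionMatches bool // the request matches the named cassette
	anyMatch           bool // some cassette matches when no header was sent
	apiKeySet          bool // an API key is available for recording
}

// decide returns the action taken and, for errors, the HTTP status code.
// A status of 0 means the status comes from the recording or the upstream.
func decide(s state) (action string, status int) {
	if !s.headerPresent {
		if s.anyMatch {
			return "replay", 0
		}
		return "error", http.StatusBadRequest // include X-Cassette-Name header
	}
	if s.cassetteFound {
		if s.interactionMatches {
			return "replay", 0
		}
		return "error", http.StatusConflict // interaction out of date
	}
	if s.apiKeySet {
		return "record", 0
	}
	return "error", http.StatusInternalServerError // no cassette found
}

func main() {
	fmt.Println(decide(state{headerPresent: true, apiKeySet: true}))
}
```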

Future work

OpenAI is not the only inference API supported, but it is special as it is the most common frontend and backend for AI Gateway. This is why we expose the requests, as we will often proxy these even if the backend is not OpenAI compatible.

The recording process would remain consistent for other cloud services, such as Anthropic or Bedrock, though there could be variations in how requests are scrubbed for secrets or handled for request signing. In a future refactoring, we could extract the core recording infrastructure into a separate package, reducing this one to just cassette constants and OpenAI-specific request recording and handling details. Most of the code could be reused for other backends.

For additional insights, refer to OpenTelemetry instrumentation, which often employs VCR for LLM frameworks as well.

Here are key parts of the OpenTelemetry Botocore Bedrock instrumentation that deal with request signing and recording:

Here are key parts of OpenInference Anthropic instrumentation, which handles their endpoint.

test_instrumentor.py

Documentation ¶

Overview ¶

Package testopenai provides Azure OpenAI support for VCR cassette recording and playback.

Azure OpenAI cassettes follow a specific workflow to enable recording and playback while protecting sensitive information:

RECORDING FLOW:

1. NewRequest builds Azure-format path from request body model:

Input: endpoint="/chat/completions", body={"model":"gpt-4"}
Output: path="/openai/deployments/gpt-4/chat/completions"
Note: No api-version in the path built by NewRequest.

2. Server forwards to upstream Azure API:

Reads AZURE_OPENAI_ENDPOINT (e.g., https://your-resource.eastus2.cognitiveservices.azure.com)
Reads AZURE_OPENAI_DEPLOYMENT (deployment name configured in Azure portal)
Reads OPENAI_API_VERSION (e.g., 2024-12-01-preview)
Builds URL: {AZURE_OPENAI_ENDPOINT}/openai/deployments/{deployment}{path}?api-version={version}, where {path} is the request endpoint (e.g., /chat/completions)
Example: https://your-resource.eastus2.cognitiveservices.azure.com/openai/deployments/prod-gpt4/chat/completions?api-version=2024-12-01-preview

3. VCR afterCaptureHook scrubs sensitive data from recorded cassette:

Replaces actual hostname with "resource-name.cognitiveservices.azure.com"
Replaces deployment name with model name from request body
Strips api-version query parameter (not needed for playback matching)
Result: https://resource-name.cognitiveservices.azure.com/openai/deployments/gpt-4/chat/completions
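
Steps 2 and 3 of the recording flow amount to building one URL and then rewriting it. The following is a minimal sketch using net/url, assuming the deployment name appears as its own path segment; `upstreamURL` and `scrub` are hypothetical helpers, not the package's hook.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// upstreamURL joins the base endpoint, deployment, request path (for example
// "/chat/completions") and API version into the URL sent to Azure.
func upstreamURL(base, deployment, path, version string) string {
	return strings.TrimRight(base, "/") + "/openai/deployments/" + deployment +
		path + "?api-version=" + url.QueryEscape(version)
}

// scrub rewrites a recorded URL so that it carries no resource-specific data:
// the host becomes a placeholder, the deployment becomes the model name, and
// the api-version query parameter is dropped.
func scrub(recorded, deployment, model string) (string, error) {
	u, err := url.Parse(recorded)
	if err != nil {
		return "", err
	}
	u.Host = "resource-name.cognitiveservices.azure.com"
	u.Path = strings.Replace(u.Path, "/deployments/"+deployment+"/", "/deployments/"+model+"/", 1)
	q := u.Query()
	q.Del("api-version")
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	up := upstreamURL("https://your-resource.eastus2.cognitiveservices.azure.com",
		"prod-gpt4", "/chat/completions", "2024-12-01-preview")
	fmt.Println(up)

	clean, err := scrub(up, "prod-gpt4", "gpt-4")
	if err != nil {
		panic(err)
	}
	fmt.Println(clean)
}
```

With the example values from the steps above, the first line printed is the upstream URL and the second is the scrubbed cassette URL.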

PLAYBACK FLOW:

1. NewRequest builds same Azure-format path (no api-version)
2. Server normalizes cassette URL by replacing hostname only
3. Request path matches cassette path exactly
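
The playback comparison can be sketched as a hostname swap followed by an exact string comparison. This is an illustration under the assumption that scheme and host are the only parts normalized; `matches` is a hypothetical helper and the local address used in `main` is made up for the example.

```go
package main

import (
	"fmt"
	"net/url"
)

// matches reports whether an incoming request URL equals the cassette URL once
// the cassette's scheme and host are replaced with those of the request.
func matches(requestURL, cassetteURL string) (bool, error) {
	req, err := url.Parse(requestURL)
	if err != nil {
		return false, err
	}
	rec, err := url.Parse(cassetteURL)
	if err != nil {
		return false, err
	}
	// Only the origin is normalized; path and query must already agree.
	rec.Scheme = req.Scheme
	rec.Host = req.Host
	return rec.String() == req.String(), nil
}

func main() {
	ok, err := matches(
		"http://127.0.0.1:8080/openai/deployments/gpt-4/chat/completions",
		"https://resource-name.cognitiveservices.azure.com/openai/deployments/gpt-4/chat/completions",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(ok)
}
```

Because the deployment name was already replaced by the model name at recording time, no deployment lookup is needed during playback.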

ENVIRONMENT VARIABLES:

Recording Azure cassettes requires:

AZURE_OPENAI_API_KEY: Authentication for Azure OpenAI
AZURE_OPENAI_ENDPOINT: Base URL (hostname is scrubbed in cassette)
AZURE_OPENAI_DEPLOYMENT: Deployment name (replaced with model in cassette)
OPENAI_API_VERSION: API version (stripped from cassette, only used upstream)

Recording OpenAI cassettes requires:

OPENAI_API_KEY: Authentication for OpenAI
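
A pre-flight check for these variables might look like the sketch below. It is not part of the package: `recordingMode` is a hypothetical helper, and giving the Azure key precedence when both keys are set is an assumption made only for this example.

```go
package main

import (
	"fmt"
	"os"
)

// recordingMode inspects the environment through getenv and reports which
// backend could be recorded against, plus any Azure variables still missing.
// Passing getenv as a parameter keeps the function easy to test.
func recordingMode(getenv func(string) string) (mode string, missing []string) {
	if getenv("AZURE_OPENAI_API_KEY") != "" {
		for _, name := range []string{
			"AZURE_OPENAI_ENDPOINT",
			"AZURE_OPENAI_DEPLOYMENT",
			"OPENAI_API_VERSION",
		} {
			if getenv(name) == "" {
				missing = append(missing, name)
			}
		}
		return "azure", missing
	}
	if getenv("OPENAI_API_KEY") != "" {
		return "openai", nil
	}
	// No key at all: only pre-recorded cassettes can be replayed.
	return "playback-only", nil
}

func main() {
	mode, missing := recordingMode(os.Getenv)
	fmt.Println(mode, missing)
}
```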

Package testopenai provides a test OpenAI API server for testing. It uses the VCR (Video Cassette Recorder) pattern to replay pre-recorded API responses, allowing deterministic testing without requiring actual API access or credentials.

Index ¶

Constants
func NewRequest(ctx context.Context, baseURL string, cassette Cassette) (*http.Request, error)
func ResponseBody(cassette Cassette) string
type Cassette
- func (c Cassette) String() string
type Server
- func NewServer(out io.Writer, port int) (*Server, error)

Constants ¶

const CassetteNameHeader = "X-Cassette-Name"

CassetteNameHeader is the header used to specify which cassette to use for matching.

Variables ¶

This section is empty.

Functions ¶

func NewRequest ¶

func NewRequest(ctx context.Context, baseURL string, cassette Cassette) (*http.Request, error)

NewRequest creates a new OpenAI request for the given cassette.

The returned request uses http.MethodPost, with its body and CassetteNameHeader set according to the pre-recorded cassette.

func ResponseBody ¶

func ResponseBody(cassette Cassette) string

ResponseBody is used in tests to avoid duplicating body content when the proxy's serialization exactly matches that of the upstream (testopenai) server.

Types ¶

type Cassette ¶

type Cassette int

Cassette is an HTTP interaction recording.

Note: At the moment, our tests are optimized for single request/response pairs and do not include scenarios requiring multiple round-trips, such as `cached_tokens`.

const (

	// CassetteChatBasic is the canonical OpenAI chat completion request.
	CassetteChatBasic Cassette = iota
	// CassetteChatJSONMode is a chat completion request with JSON response format.
	CassetteChatJSONMode
	// CassetteChatMultimodal is a multimodal chat request with text and image inputs.
	CassetteChatMultimodal
	// CassetteChatMultiturn is a multi-turn conversation with message history.
	CassetteChatMultiturn
	// CassetteChatNoMessages is a request missing the required messages field.
	CassetteChatNoMessages
	// CassetteChatParallelTools is a chat completion with parallel function calling enabled.
	CassetteChatParallelTools
	// CassetteChatStreaming is the canonical OpenAI chat completion request,
	// with streaming enabled.
	CassetteChatStreaming
	// CassetteChatTools is a chat completion request with function tools.
	CassetteChatTools
	// CassetteChatUnknownModel is a request with a non-existent model.
	CassetteChatUnknownModel
	// CassetteChatBadRequest is a request with multiple validation errors.
	CassetteChatBadRequest
	// CassetteChatReasoning tests capture of reasoning_tokens in completion_tokens_details for O1 models.
	CassetteChatReasoning
	// CassetteChatImageToText tests image input processing showing image token
	// count in usage details.
	CassetteChatImageToText
	// CassetteChatTextToImageTool tests image generation through tool calls since
	// chat completions cannot natively output images.
	CassetteChatTextToImageTool
	// CassetteChatAudioToText tests audio input transcription and audio_tokens
	// in prompt_tokens_details.
	CassetteChatAudioToText
	// CassetteChatTextToAudio tests audio output generation where the model
	// produces audio content, showing audio_tokens in completion_tokens_details.
	CassetteChatTextToAudio
	// CassetteChatDetailedUsage tests capture of all token usage detail fields in a single response.
	CassetteChatDetailedUsage
	// CassetteChatStreamingDetailedUsage tests capture of detailed token usage in streaming responses with include_usage.
	CassetteChatStreamingDetailedUsage
	// CassetteChatWebSearch tests OpenAI Web Search tool with a small URL response, including citations.
	CassetteChatWebSearch
	// CassetteChatStreamingWebSearch is CassetteChatWebSearch except with streaming enabled.
	CassetteChatStreamingWebSearch
	// CassetteChatOpenAIAgentsPython is a real request from OpenAI Agents Python library for financial research.
	// See https://github.com/openai/openai-agents-python/tree/main/examples/financial_research_agent
	CassetteChatOpenAIAgentsPython

	// CassetteCompletionBasic tests standard single-prompt code completion
	// requests typical of LoRA-tuned CodeLlama or Starcoder models deployed via
	// vLLM/llama.cpp. Uses fibonacci function as representative coding task for
	// model evaluation.
	// See: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
	CassetteCompletionBasic
	// CassetteCompletionToken is CassetteCompletionBasic, but with cl100k_base
	// tokens as input instead of text strings. This simulates LoRA fine-tuning
	// workflows requiring precise tokenization control.
	CassetteCompletionToken
	// CassetteCompletionStreaming is CassetteCompletionBasic, with streaming
	// enabled to test real-time token delivery common in IDEs.
	CassetteCompletionStreaming
	// CassetteCompletionStreamingUsage is CassetteCompletionStreaming, but
	// with include_usage enabled to test detailed token usage reporting.
	CassetteCompletionStreamingUsage
	// CassetteCompletionTextBatch tests multiple code completion variants
	// generated simultaneously, common in IDE autocomplete where users select
	// from LoRA model suggestions. Full vs truncated prompts simulate real
	// editing scenarios.
	// See: https://community.openai.com/t/n-argument-vs-batch-input/59121
	CassetteCompletionTextBatch
	// CassetteCompletionTokenBatch is CassetteCompletionTextBatch, but with
	// cl100k_base tokens as input instead of text strings. This simulates LoRA
	// fine-tuning workflows requiring precise tokenization control.
	CassetteCompletionTokenBatch
	// CassetteCompletionSuffix tests the suffix parameter for fill-in-the-middle
	// completion tasks. The model generates code to insert between a prompt (partial
	// function definition) and a suffix (function call). Also tests logprobs for
	// confidence scoring and n for multiple completion variants. Only gpt-3.5-turbo-instruct
	// supports the suffix parameter.
	// See: https://platform.openai.com/docs/guides/completions/inserting-text
	CassetteCompletionSuffix
	// CassetteCompletionBadRequest is a request with multiple validation
	// errors.
	CassetteCompletionBadRequest
	// CassetteCompletionUnknownModel is a request with a non-existent model.
	CassetteCompletionUnknownModel

	// CassetteEmbeddingsBasic is the canonical OpenAI embeddings request with a single string input.
	CassetteEmbeddingsBasic
	// CassetteEmbeddingsBase64 tests base64 encoding format for embedding vectors.
	CassetteEmbeddingsBase64
	// CassetteEmbeddingsTokens tests embeddings with token array input instead of text.
	CassetteEmbeddingsTokens
	// CassetteEmbeddingsLargeText tests embeddings with a longer text input.
	CassetteEmbeddingsLargeText
	// CassetteEmbeddingsUnknownModel tests error handling for non-existent model.
	CassetteEmbeddingsUnknownModel
	// CassetteEmbeddingsDimensions tests embeddings with specified output dimensions.
	CassetteEmbeddingsDimensions
	// CassetteEmbeddingsMixedBatch tests batch with varying text lengths.
	CassetteEmbeddingsMixedBatch
	// CassetteEmbeddingsMaxTokens tests input that approaches token limit.
	CassetteEmbeddingsMaxTokens
	// CassetteEmbeddingsWhitespace tests handling of various whitespace patterns.
	CassetteEmbeddingsWhitespace
	// CassetteEmbeddingsBadRequest tests request with multiple validation errors.
	CassetteEmbeddingsBadRequest

	// CassetteImageGenerationBasic is a basic image generation request with model and prompt.
	CassetteImageGenerationBasic

	// CassetteAzureChatBasic is the same as CassetteChatBasic, except using
	// Azure OpenAI Service authentication and endpoint format.
	CassetteAzureChatBasic
)

func ChatCassettes ¶

func ChatCassettes() []Cassette

ChatCassettes returns a slice of all cassettes for chat completions. Unlike image generation, which requires an image_generation tool call, audio synthesis is natively supported.

func CompletionCassettes ¶ added in v0.4.0

func CompletionCassettes() []Cassette

CompletionCassettes returns a slice of all cassettes for the /completions endpoint.

func EmbeddingsCassettes ¶

func EmbeddingsCassettes() []Cassette

EmbeddingsCassettes returns a slice of all cassettes for embeddings.

func ImageCassettes ¶ added in v0.4.0

func ImageCassettes() []Cassette

ImageCassettes returns a slice of all cassettes for image generation.

func (Cassette) String ¶

func (c Cassette) String() string

String returns the string representation of the cassette name.

type Server ¶

type Server struct {
	// contains filtered or unexported fields
}

Server represents a test OpenAI API server that replays cassette recordings.

func NewServer ¶

func NewServer(out io.Writer, port int) (*Server, error)

NewServer creates a new test OpenAI server (use port 0 for random).
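
Passing port 0 conventionally asks the operating system for any free port. How NewServer does this internally is not shown on this page, so the following is only a sketch of the general technique with the standard net package; `listenRandomPort` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
)

// listenRandomPort opens a TCP listener on the loopback interface with port 0,
// which lets the operating system choose a free port, and returns that port.
func listenRandomPort() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	// For TCP listeners the address is a *net.TCPAddr carrying the chosen port.
	return ln, ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	ln, port, err := listenRandomPort()
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Printf("listening on http://127.0.0.1:%d\n", port)
}
```

Letting the operating system pick the port avoids collisions when several test packages start servers in parallel.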

func (*Server) Close ¶

func (s *Server) Close()

Close shuts down the server.

func (*Server) Port ¶

func (s *Server) Port() int

Port returns the port the server is listening on.

func (*Server) URL ¶

func (s *Server) URL() string

URL returns the base URL of the server.
