internalapi

package
v0.4.0
Published: Nov 7, 2025 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Overview

Package internalapi provides constants and functions used across the boundaries between the controller, the extension server, and extproc.

Index

Constants

const (
	// EnvoyAIGatewayHeaderPrefix is the prefix for special headers used by AI Gateway, either for internal or external use.
	EnvoyAIGatewayHeaderPrefix = "x-ai-eg-"
	// InternalEndpointMetadataNamespace is the namespace used for the dynamic metadata for internal use.
	InternalEndpointMetadataNamespace = "aigateway.envoy.io"
	// InternalMetadataBackendNameKey is the key used to store the backend name
	InternalMetadataBackendNameKey = "per_route_rule_backend_name"
	// MCPBackendHeader is the special header key used to specify the target backend name.
	MCPBackendHeader = EnvoyAIGatewayHeaderPrefix + "mcp-backend"
	// MCPRouteHeader is the special header key used to identify the MCP route.
	MCPRouteHeader = EnvoyAIGatewayHeaderPrefix + "mcp-route"
	// MCPBackendListenerPort is the port for the MCP backend listener.
	MCPBackendListenerPort = 10088
	// MCPProxyPort is the port where the MCP proxy listens.
	MCPProxyPort = 9856
	// MCPGeneratedResourceCommonPrefix is the common prefix for all MCP-related generated resources.
	MCPGeneratedResourceCommonPrefix = "ai-eg-mcp-"
	// MCPMainHTTPRoutePrefix is the prefix for the main HTTPRoute resources generated for MCP.
	MCPMainHTTPRoutePrefix = MCPGeneratedResourceCommonPrefix + "main-"
	// MCPPerBackendRefHTTPRoutePrefix is the prefix for the per-backend-ref HTTPRoute resources generated for MCP.
	MCPPerBackendRefHTTPRoutePrefix = MCPGeneratedResourceCommonPrefix + "br-"
	// MCPPerBackendHTTPRouteFilterPrefix is the prefix for the HTTP route filter names for per-backend resources.
	MCPPerBackendHTTPRouteFilterPrefix = MCPGeneratedResourceCommonPrefix + "brf-"

	// MCPMetadataHeaderPrefix is the prefix for special headers used to pass metadata in the filter metadata.
	// These headers are added internally to the requests to the upstream servers so they can be populated in the filter
	// metadata. These headers are considered just internal, and they'll be removed once they are stored in the filter
	// metadata to avoid sending unnecessary information to the upstream servers.
	MCPMetadataHeaderPrefix = "x-ai-eg-mcp-metadata-"
	// MCPMetadataHeaderRequestID is the special header key used to pass the MCP request ID in the filter metadata.
	MCPMetadataHeaderRequestID = MCPMetadataHeaderPrefix + "request-id"
	// MCPMetadataHeaderMethod is the special header key used to pass the MCP method in the filter metadata.
	MCPMetadataHeaderMethod = MCPMetadataHeaderPrefix + "method"
)
const (
	// XDSClusterMetadataKey is the key used to access cluster metadata in xDS attributes
	XDSClusterMetadataKey = "xds.cluster_metadata"
	// XDSUpstreamHostMetadataKey is the key used to access upstream host metadata in xDS attributes
	XDSUpstreamHostMetadataKey = "xds.upstream_host_metadata"
)
const AIGatewayFilterMetadataNamespace = aigv1a1.AIGatewayFilterMetadataNamespace

AIGatewayFilterMetadataNamespace is the namespace used for the filter metadata related to AI Gateway.

For example, token usage, input/output tokens, and request costs are stored in this namespace. Aliased from aigv1a1.AIGatewayFilterMetadataNamespace to avoid making ExtProc directly depend on the control plane API, which is not a concern of ExtProc.
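
For example, a consumer holding Envoy dynamic metadata as a *structpb.Struct might look up this namespace as follows (the helper is a hypothetical sketch, not part of this package):

import "google.golang.org/protobuf/types/known/structpb"

// aiGatewayMetadata returns the AI Gateway section of a dynamic metadata
// struct (token usage, request costs, etc.), if present.
func aiGatewayMetadata(md *structpb.Struct) (*structpb.Value, bool) {
	v, ok := md.GetFields()[internalapi.AIGatewayFilterMetadataNamespace]
	return v, ok
}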

const (
	// AIGatewayGeneratedHTTPRouteAnnotation is the annotation key used to mark
	// HTTPRoute resources that are generated by the AI Gateway controller.
	AIGatewayGeneratedHTTPRouteAnnotation = "ai-gateway-generated"
)
const (
	// EndpointPickerHeaderKey is the header key used to specify the target backend endpoint.
	// This is the default header name in the reference implementation:
	// https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/2b5b337b45c3289e5f9367b2c19deef021722fcd/pkg/epp/server/runserver.go#L63
	EndpointPickerHeaderKey = "x-gateway-destination-endpoint"
)
const ModelNameHeaderKeyDefault = aigv1a1.AIModelHeaderKey

ModelNameHeaderKeyDefault is the default header key for the model name.

Variables

var MCPInternalHeadersToMetadata = map[string]string{
	MCPBackendHeader:           "mcp_backend",
	MCPMetadataHeaderMethod:    "mcp_method",
	MCPMetadataHeaderRequestID: "mcp_request_id",
}

MCPInternalHeadersToMetadata maps special MCP headers to metadata keys.
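
As an illustration of how this map is intended to be consumed (the function below is a hypothetical sketch, not part of this package): matching headers are copied into the filter metadata and removed from the upstream request, per the MCPMetadataHeaderPrefix comment above.

// moveInternalHeadersToMetadata copies the special MCP headers into a
// metadata map under their mapped keys, then strips them so they are not
// forwarded to the upstream server.
func moveInternalHeadersToMetadata(headers, metadata map[string]string) {
	for header, key := range internalapi.MCPInternalHeadersToMetadata {
		if v, ok := headers[header]; ok {
			metadata[key] = v
			delete(headers, header)
		}
	}
}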

Functions

func ParseRequestHeaderAttributeMapping added in v0.4.0

func ParseRequestHeaderAttributeMapping(s string) (map[string]string, error)

ParseRequestHeaderAttributeMapping parses comma-separated key-value pairs for header-to-attribute mapping. The input format is "header1:attribute1,header2:attribute2" where header names are HTTP request headers and attribute names are OTel span or metric attributes. Example: "x-session-id:session.id,x-user-id:user.id".

Note: This serves a different purpose than OTEL's OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST, which captures headers as span attributes for tracing.

Note: We do not need to convert to Prometheus format (e.g., x-session-id → session.id) here, as that's done implicitly in the Prometheus exporter.
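
A usage sketch based on the documented format:

mapping, err := internalapi.ParseRequestHeaderAttributeMapping("x-session-id:session.id,x-user-id:user.id")
if err != nil {
	// Assumed to occur for malformed input, e.g. a pair missing the ":" separator.
	panic(err)
}
// mapping["x-session-id"] == "session.id"
// mapping["x-user-id"]    == "user.id"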

func PerRouteRuleRefBackendName

func PerRouteRuleRefBackendName(namespace, name, routeName string, routeRuleIndex, refIndex int) string

PerRouteRuleRefBackendName generates a unique backend name for a per-route rule, i.e., the unique identifier for a backend that is associated with a specific route rule in a specific AIGatewayRoute.
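
A call sketch with hypothetical argument values; the format of the returned string is an implementation detail of the package (presumably the value stored under InternalMetadataBackendNameKey above):

backendName := internalapi.PerRouteRuleRefBackendName(
	"default",  // namespace
	"openai",   // name of the backend
	"my-route", // routeName: the AIGatewayRoute name
	0,          // routeRuleIndex
	1,          // refIndex
)
_ = backendName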

Types

type Header [2]string

Header represents a single HTTP header as a key-value pair.

func (Header) Key added in v0.4.0

func (h Header) Key() string

Key returns the header key.

func (Header) Value added in v0.4.0

func (h Header) Value() string

Value returns the header value.
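
For example:

h := internalapi.Header{internalapi.ModelNameHeaderKeyDefault, "gpt-5-nano"}
fmt.Println(h.Key(), "=", h.Value()) // prints the header key and "gpt-5-nano"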

type ModelNameHeaderKey added in v0.4.0

type ModelNameHeaderKey = string

ModelNameHeaderKey is the configurable header key whose value is set by the gateway based on the model extracted from the request body.

This header is automatically populated by the gateway and cannot be set by end users as it will be overwritten. The flow is:

  1. Router filter extracts OriginalModel from request body and sets this header
  2. HTTPRoute uses this header value for model-based routing
  3. If backend has ModelNameOverride, the header is updated with the override value
  4. Metrics and observability systems use the final header value

Defaults to ModelNameHeaderKeyDefault.
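
A minimal sketch of steps 1 and 3 of this flow, reusing the AWS Bedrock example from ModelNameOverride below; the header map and values are illustrative, not the actual extproc code:

var (
	originalModel internalapi.OriginalModel     = "llama3-2-1b"
	override      internalapi.ModelNameOverride = "us.meta.llama3-2-1b-instruct-v1:0"
)
headers := map[string]string{}
// Step 1: the router filter sets the header from the extracted model,
// overwriting anything the end user supplied.
headers[internalapi.ModelNameHeaderKeyDefault] = originalModel
// Step 3: a configured ModelNameOverride replaces the value.
if override != "" {
	headers[internalapi.ModelNameHeaderKeyDefault] = override
}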

type ModelNameOverride added in v0.4.0

type ModelNameOverride = string

ModelNameOverride represents a backend-specific model name that overrides the OriginalModel in the client request to the router.

Configuration:

  • Set via aigv1a1.AIGatewayRouteRuleBackendRef
  • Replaces the OriginalModel with a backend-specific model name

Example:

  • Client requests: "llama3-2-1b"
  • Override to: "us.meta.llama3-2-1b-instruct-v1:0" (for AWS Bedrock)

Effects:

  • Updates the header specified by ModelNameHeaderKey
  • Used by routing, rate limiting, and observability systems

type OriginalModel added in v0.4.0

type OriginalModel = string

OriginalModel is the model name extracted from the incoming request body before any virtualization applies.

Flow:

  1. Router filter extracts model from request body
  2. If ModelNameOverride is configured, RequestModel differs from OriginalModel
  3. Provider responds with ResponseModel (may differ from RequestModel)

Example:

  1. OriginalModel: OpenAI Client sends: {"model": "gpt-5"}
  2. RequestModel: ModelNameOverride replaces with "gpt-5-nano"
  3. ResponseModel: OpenAI Platform sends: {"model": "gpt-5-nano-2025-08-07"}
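
Since these model names are plain string aliases, the example above can be written directly against the types:

var (
	original internalapi.OriginalModel = "gpt-5"                 // sent by the client
	request  internalapi.RequestModel  = "gpt-5-nano"            // after ModelNameOverride
	response internalapi.ResponseModel = "gpt-5-nano-2025-08-07" // reported by the provider
)
fmt.Println(original, "->", request, "->", response)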

OpenTelemetry

In OpenTelemetry Generative AI Metrics, this is an attribute on metrics such as "gen_ai.server.token.usage". For example, an OpenAI Chat Completion request to the "gpt-5" model results in a plain text string attribute: "gen_ai.original.model" -> "gpt-5"

type RequestModel added in v0.4.0

type RequestModel = string

RequestModel is the name of the model sent in the request to perform a completion or to create embeddings.

This is either the model received by the router's OpenAI Chat Completions or Embeddings endpoints, or a ModelNameOverride.

This is not necessarily the same as ResponseModel, and in some cases like Azure OpenAI Service, this field isn't read at all.

OpenTelemetry

The RequestModel is a key attribute for correlating metrics with spans.

In OpenInference (span semantics), this is the "model" field of invocation parameters, explaining how the LLM was invoked. For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a JSON string attribute: "llm.invocation_parameters" -> {"model": "gpt-5-nano"}

In OpenTelemetry Generative AI Metrics, this is an attribute on metrics such as "gen_ai.server.token.usage". For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a plain text string attribute: "gen_ai.request.model" -> "gpt-5-nano"

type ResponseModel added in v0.4.0

type ResponseModel = string

ResponseModel is the name of the model that generated a response to a completion or embeddings request.

Relationship to RequestModel

This matches RequestModel only when the provider resolves models deterministically:

  • Static Model Execution (AWS Bedrock)
  • Deterministic Snapshot Mapping (GCP providers)

In virtualized providers, it may differ:

  • URI-Based Resolution (Azure OpenAI)
  • Automatic Routing & Resolution (OpenAI Platform)

See https://aigateway.envoyproxy.io/docs/capabilities/traffic/model-name-virtualization

OpenTelemetry

The ResponseModel is even more important than RequestModel for evaluation use cases, as it is the only field that authoritatively identifies the model used for a completion. It is a key attribute for correlating metrics with spans.

In OpenInference (span semantics), this is the "llm.model_name" attribute. For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a plain text attribute of the latest model: "llm.model_name" -> "gpt-5-nano-2025-08-07"

In OpenTelemetry Generative AI Metrics, this is an attribute on metrics such as "gen_ai.server.token.usage". For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a plain text attribute of the latest model: "gen_ai.response.model" -> "gpt-5-nano-2025-08-07"
