Documentation ¶
Overview ¶
Package internalapi provides constants and functions used across the boundaries between the controller, the extension server, and extproc.
Index ¶
- Constants
- Variables
- func ParseRequestHeaderAttributeMapping(s string) (map[string]string, error)
- func PerRouteRuleRefBackendName(namespace, name, routeName string, routeRuleIndex, refIndex int) string
- type Header
- type ModelNameHeaderKey
- type ModelNameOverride
- type OriginalModel
- type RequestModel
- type ResponseModel
Constants ¶
const (
	// EnvoyAIGatewayHeaderPrefix is the prefix for special headers used by AI Gateway, either for internal or external use.
	EnvoyAIGatewayHeaderPrefix = "x-ai-eg-"
	// InternalEndpointMetadataNamespace is the namespace used for the dynamic metadata for internal use.
	InternalEndpointMetadataNamespace = "aigateway.envoy.io"
	// InternalMetadataBackendNameKey is the key used to store the backend name.
	InternalMetadataBackendNameKey = "per_route_rule_backend_name"
	// MCPBackendHeader is the special header key used to specify the target backend name.
	MCPBackendHeader = EnvoyAIGatewayHeaderPrefix + "mcp-backend"
	// MCPRouteHeader is the special header key used to identify the MCP route.
	MCPRouteHeader = EnvoyAIGatewayHeaderPrefix + "mcp-route"
	// MCPBackendListenerPort is the port for the MCP backend listener.
	MCPBackendListenerPort = 10088
	// MCPProxyPort is the port where the MCP proxy listens.
	MCPProxyPort = 9856
	// MCPGeneratedResourceCommonPrefix is the common prefix for all MCP-related generated resources.
	MCPGeneratedResourceCommonPrefix = "ai-eg-mcp-"
	// MCPMainHTTPRoutePrefix is the prefix for the main HTTPRoute resources generated for MCP.
	MCPMainHTTPRoutePrefix = MCPGeneratedResourceCommonPrefix + "main-"
	// MCPPerBackendRefHTTPRoutePrefix is the prefix for the per-backend-ref HTTPRoute resources generated for MCP.
	MCPPerBackendRefHTTPRoutePrefix = MCPGeneratedResourceCommonPrefix + "br-"
	// MCPPerBackendHTTPRouteFilterPrefix is the prefix for the HTTP route filter names for per-backend resources.
	MCPPerBackendHTTPRouteFilterPrefix = MCPGeneratedResourceCommonPrefix + "brf-"
	// MCPMetadataHeaderPrefix is the prefix for special headers used to pass metadata in the filter metadata.
	// These headers are added internally to the requests to the upstream servers so they can be populated in the
	// filter metadata. They are internal only, and are removed once stored in the filter metadata to avoid sending
	// unnecessary information to the upstream servers.
	MCPMetadataHeaderPrefix = "x-ai-eg-mcp-metadata-"
	// MCPMetadataHeaderRequestID is the special header key used to pass the MCP request ID in the filter metadata.
	MCPMetadataHeaderRequestID = MCPMetadataHeaderPrefix + "request-id"
	// MCPMetadataHeaderMethod is the special header key used to pass the MCP method in the filter metadata.
	MCPMetadataHeaderMethod = MCPMetadataHeaderPrefix + "method"
)
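The strip-into-metadata behavior described for MCPMetadataHeaderPrefix can be sketched as follows. extractMCPMetadata is a hypothetical helper (not part of this package), and the constants are mirrored locally so the sketch is self-contained:

```go
package main

import (
	"fmt"
	"strings"
)

// Constants mirrored from the package so this sketch compiles on its own.
const (
	MCPMetadataHeaderPrefix    = "x-ai-eg-mcp-metadata-"
	MCPMetadataHeaderRequestID = MCPMetadataHeaderPrefix + "request-id"
	MCPMetadataHeaderMethod    = MCPMetadataHeaderPrefix + "method"
)

// extractMCPMetadata is a hypothetical helper: it moves every
// x-ai-eg-mcp-metadata-* header into a metadata map and deletes it from
// the request headers, so the upstream server never sees it.
func extractMCPMetadata(headers map[string]string) map[string]string {
	md := make(map[string]string)
	for k, v := range headers {
		if strings.HasPrefix(k, MCPMetadataHeaderPrefix) {
			md[strings.TrimPrefix(k, MCPMetadataHeaderPrefix)] = v
			delete(headers, k)
		}
	}
	return md
}

func main() {
	headers := map[string]string{
		MCPMetadataHeaderRequestID: "req-123",
		MCPMetadataHeaderMethod:    "tools/call",
		"content-type":             "application/json",
	}
	md := extractMCPMetadata(headers)
	// Only the non-internal header remains on the request.
	fmt.Println(md["request-id"], md["method"], len(headers))
}
```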
const (
	// XDSClusterMetadataKey is the key used to access cluster metadata in xDS attributes.
	XDSClusterMetadataKey = "xds.cluster_metadata"
	// XDSUpstreamHostMetadataKey is the key used to access upstream host metadata in xDS attributes.
	XDSUpstreamHostMetadataKey = "xds.upstream_host_metadata"
)
const AIGatewayFilterMetadataNamespace = aigv1a1.AIGatewayFilterMetadataNamespace
AIGatewayFilterMetadataNamespace is the namespace used for the filter metadata related to AI Gateway.
For example, token usage, input/output tokens, and request costs are stored in this namespace. Aliased from aigv1a1.AIGatewayFilterMetadataNamespace to avoid making ExtProc directly depend on the control plane API which is not a concern of ExtProc.
const (
	// AIGatewayGeneratedHTTPRouteAnnotation is the annotation key used to mark
	// HTTPRoute resources that are generated by the AI Gateway controller.
	AIGatewayGeneratedHTTPRouteAnnotation = "ai-gateway-generated"
)
const (
	// EndpointPickerHeaderKey is the header key used to specify the target backend endpoint.
	// This is the default header name in the reference implementation:
	// https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/2b5b337b45c3289e5f9367b2c19deef021722fcd/pkg/epp/server/runserver.go#L63
	EndpointPickerHeaderKey = "x-gateway-destination-endpoint"
)
const ModelNameHeaderKeyDefault = aigv1a1.AIModelHeaderKey
ModelNameHeaderKeyDefault is the default header key for the model name.
Variables ¶
var MCPInternalHeadersToMetadata = map[string]string{
	MCPBackendHeader:           "mcp_backend",
	MCPMetadataHeaderMethod:    "mcp_method",
	MCPMetadataHeaderRequestID: "mcp_request_id",
}
MCPInternalHeadersToMetadata maps special MCP headers to metadata keys.
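A consumer of this mapping would typically copy matching header values into the filter metadata under the mapped keys. headersToMetadata is a hypothetical helper, and the map is mirrored with literal header names so the sketch stands alone:

```go
package main

import "fmt"

// Mirrors MCPInternalHeadersToMetadata with the literal header names
// implied by the package constants, so this sketch is self-contained.
var mcpInternalHeadersToMetadata = map[string]string{
	"x-ai-eg-mcp-backend":             "mcp_backend",
	"x-ai-eg-mcp-metadata-method":     "mcp_method",
	"x-ai-eg-mcp-metadata-request-id": "mcp_request_id",
}

// headersToMetadata is a hypothetical helper that copies any mapped
// header present on the request into a metadata map keyed by the
// metadata names above.
func headersToMetadata(headers map[string]string) map[string]string {
	md := make(map[string]string)
	for header, key := range mcpInternalHeadersToMetadata {
		if v, ok := headers[header]; ok {
			md[key] = v
		}
	}
	return md
}

func main() {
	md := headersToMetadata(map[string]string{
		"x-ai-eg-mcp-backend":         "openai",
		"x-ai-eg-mcp-metadata-method": "tools/list",
	})
	fmt.Println(md["mcp_backend"], md["mcp_method"])
}
```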
Functions ¶
func ParseRequestHeaderAttributeMapping ¶ added in v0.4.0
func ParseRequestHeaderAttributeMapping(s string) (map[string]string, error)
ParseRequestHeaderAttributeMapping parses comma-separated key-value pairs for header-to-attribute mapping. The input format is "header1:attribute1,header2:attribute2", where header names are HTTP request headers and attribute names are OTel span or metric attributes. Example: "x-session-id:session.id,x-user-id:user.id".
Note: This serves a different purpose than OTEL's OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST, which captures headers as span attributes for tracing.
Note: We do not need to convert to Prometheus format (e.g., x-session-id → session.id) here, as that's done implicitly in the Prometheus exporter.
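The documented format can be parsed as in the following sketch. This is an illustrative re-implementation of the described behavior, not the package's actual code; validation details may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// parseRequestHeaderAttributeMapping parses "header1:attr1,header2:attr2"
// into a map from header name to attribute name. Illustrative only; the
// real ParseRequestHeaderAttributeMapping may validate differently.
func parseRequestHeaderAttributeMapping(s string) (map[string]string, error) {
	m := make(map[string]string)
	if s == "" {
		return m, nil
	}
	for _, pair := range strings.Split(s, ",") {
		header, attr, ok := strings.Cut(strings.TrimSpace(pair), ":")
		if !ok || header == "" || attr == "" {
			return nil, fmt.Errorf("invalid mapping %q, want header:attribute", pair)
		}
		m[header] = attr
	}
	return m, nil
}

func main() {
	m, err := parseRequestHeaderAttributeMapping("x-session-id:session.id,x-user-id:user.id")
	if err != nil {
		panic(err)
	}
	fmt.Println(m["x-session-id"], m["x-user-id"])
}
```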
func PerRouteRuleRefBackendName ¶
func PerRouteRuleRefBackendName(namespace, name, routeName string, routeRuleIndex, refIndex int) string
PerRouteRuleRefBackendName generates a unique backend name for a per-route rule, i.e., the unique identifier for a backend that is associated with a specific route rule in a specific AIGatewayRoute.
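To make the identifier's ingredients concrete, here is a hypothetical encoding with the same signature. The actual format produced by PerRouteRuleRefBackendName is an implementation detail and almost certainly differs; only the inputs are taken from the documented signature:

```go
package main

import "fmt"

// perRouteRuleRefBackendName sketches one plausible way to combine the
// inputs into a unique name. Hypothetical format; the real function's
// output format is an implementation detail.
func perRouteRuleRefBackendName(namespace, name, routeName string, routeRuleIndex, refIndex int) string {
	return fmt.Sprintf("%s/%s/%s/rule/%d/ref/%d", namespace, name, routeName, routeRuleIndex, refIndex)
}

func main() {
	// Same backend referenced from two different rules yields two names.
	fmt.Println(perRouteRuleRefBackendName("default", "my-backend", "my-route", 0, 1))
	fmt.Println(perRouteRuleRefBackendName("default", "my-backend", "my-route", 2, 0))
}
```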
Types ¶
type Header ¶ added in v0.4.0
type Header [2]string
Header represents a single HTTP header as a key-value pair.
type ModelNameHeaderKey ¶ added in v0.4.0
type ModelNameHeaderKey = string
ModelNameHeaderKey is the configurable header key whose value is set by the gateway based on the model extracted from the request body.
This header is automatically populated by the gateway and cannot be set by end users as it will be overwritten. The flow is:
- Router filter extracts OriginalModel from request body and sets this header
- HTTPRoute uses this header value for model-based routing
- If backend has ModelNameOverride, the header is updated with the override value
- Metrics and observability systems use the final header value
Defaults to ModelNameHeaderKeyDefault.
type ModelNameOverride ¶ added in v0.4.0
type ModelNameOverride = string
ModelNameOverride represents a backend-specific model name that overrides the OriginalModel in the client request to the router.
Configuration:
- Set via aigv1a1.AIGatewayRouteRuleBackendRef
- Replaces the OriginalModel with a backend-specific model name
Example:
- Client requests: "llama3-2-1b"
- Override to: "us.meta.llama3-2-1b-instruct-v1:0" (for AWS Bedrock)
Effects:
- Updates the header specified by ModelNameHeaderKey
- Used by routing, rate limiting, and observability systems
type OriginalModel ¶ added in v0.4.0
type OriginalModel = string
OriginalModel is the model name extracted from the incoming request body before any virtualization applies.
Flow:
- Router filter extracts model from request body
- If ModelNameOverride is configured, RequestModel differs from OriginalModel
- Provider responds with ResponseModel (may differ from RequestModel)
Example:
- OriginalModel: OpenAI Client sends: {"model": "gpt-5"}
- RequestModel: ModelNameOverride replaces with "gpt-5-nano"
- ResponseModel: OpenAI Platform sends: {"model": "gpt-5-nano-2025-08-07"}
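The extraction-and-override flow above can be sketched in a few lines. extractModel is a hypothetical helper, and chatRequest covers only the one field the router needs (the field name follows the OpenAI Chat Completions API):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest models only the field the router filter cares about.
type chatRequest struct {
	Model string `json:"model"`
}

// extractModel is a hypothetical helper showing the extraction step:
// the router filter reads the OriginalModel out of the request body.
func extractModel(body []byte) (string, error) {
	var req chatRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	return req.Model, nil
}

func main() {
	originalModel, err := extractModel([]byte(`{"model": "gpt-5", "messages": []}`))
	if err != nil {
		panic(err)
	}
	// If the selected backend configures a ModelNameOverride, the
	// RequestModel sent upstream differs from the OriginalModel.
	override := "gpt-5-nano" // hypothetical backend override
	requestModel := originalModel
	if override != "" {
		requestModel = override
	}
	fmt.Println(originalModel, "->", requestModel)
}
```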
### OpenTelemetry
In OpenTelemetry Generative AI Metrics, this is an attribute on metrics such as "gen_ai.server.token.usage". For example, an OpenAI Chat Completion request to the "gpt-5" model results in a plain text string attribute: "gen_ai.original.model" -> "gpt-5"
type RequestModel ¶ added in v0.4.0
type RequestModel = string
RequestModel is the name of the model sent in the request to perform a completion or to create embeddings.
This is either the model received by the router's OpenAI Chat Completions or Embeddings endpoints, or a ModelNameOverride.
This is not necessarily the same as ResponseModel, and in some cases like Azure OpenAI Service, this field isn't read at all.
### OpenTelemetry
The RequestModel is a key attribute for correlating metrics with spans.
In OpenInference (span semantics), this is the "model" field of the invocation parameters, explaining how the LLM was invoked. For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a JSON string attribute: "llm.invocation_parameters" -> {"model": "gpt-5-nano"}
In OpenTelemetry Generative AI Metrics, this is an attribute on metrics such as "gen_ai.server.token.usage". For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a plain text string attribute: "gen_ai.request.model" -> "gpt-5-nano"
type ResponseModel ¶ added in v0.4.0
type ResponseModel = string
ResponseModel is the name of the model that generated a response to a completion or embeddings request.
### Relationship to RequestModel
This matches the RequestModel when the provider resolves model names deterministically:
- Static Model Execution (AWS Bedrock)
- Deterministic Snapshot Mapping (GCP providers)
It may differ when the provider virtualizes model names:
- URI-Based Resolution (Azure OpenAI)
- Automatic Routing & Resolution (OpenAI Platform)
See https://aigateway.envoyproxy.io/docs/capabilities/traffic/model-name-virtualization
### OpenTelemetry
The ResponseModel is even more important than the RequestModel for evaluation use cases, as it is the only field that authoritatively identifies the model used for a completion. It is a key attribute for correlating metrics with spans.
In OpenInference (span semantics), this is the "llm.model_name" attribute. For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a plain text attribute containing the resolved model: "llm.model_name" -> "gpt-5-nano-2025-08-07"
In OpenTelemetry Generative AI Metrics, this is an attribute on metrics such as "gen_ai.server.token.usage". For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a plain text attribute containing the resolved model: "gen_ai.response.model" -> "gpt-5-nano-2025-08-07"