Documentation ¶
Overview ¶
Package internalapi provides constants and functions used across the boundaries between the controller, the extension server, and extproc.
Index ¶
- Constants
- Variables
- func ParseRequestHeaderAttributeMapping(s string) (map[string]string, error)
- func PerRouteRuleRefBackendName(namespace, name, routeName string, routeRuleIndex, refIndex int) string
- type Header
- type ModelNameHeaderKey
- type ModelNameOverride
- type OriginalModel
- type RequestModel
- type ResponseModel
Constants ¶
const (
	// EnvoyAIGatewayHeaderPrefix is the prefix for special headers used by AI Gateway, either for internal or external use.
	EnvoyAIGatewayHeaderPrefix = "x-ai-eg-"
	// InternalEndpointMetadataNamespace is the namespace used for the dynamic metadata for internal use.
	InternalEndpointMetadataNamespace = "aigateway.envoy.io"
	// InternalMetadataBackendNameKey is the key used to store the backend name.
	InternalMetadataBackendNameKey = "per_route_rule_backend_name"
	// MCPBackendHeader is the special header key used to specify the target backend name.
	MCPBackendHeader = EnvoyAIGatewayHeaderPrefix + "mcp-backend"
	// MCPRouteHeader is the special header key used to identify the MCP route.
	MCPRouteHeader = EnvoyAIGatewayHeaderPrefix + "mcp-route"
	// MCPBackendListenerPort is the port for the MCP backend listener.
	MCPBackendListenerPort = 10088
	// MCPProxyPort is the port where the MCP proxy listens.
	MCPProxyPort = 9856
	// MCPGeneratedResourceCommonPrefix is the common prefix for all MCP-related generated resources.
	MCPGeneratedResourceCommonPrefix = "ai-eg-mcp-"
	// MCPMainHTTPRoutePrefix is the prefix for the main HTTPRoute resources generated for MCP.
	MCPMainHTTPRoutePrefix = MCPGeneratedResourceCommonPrefix + "main-"
	// MCPPerBackendRefHTTPRoutePrefix is the prefix for the per-backend-ref HTTPRoute resources generated for MCP.
	MCPPerBackendRefHTTPRoutePrefix = MCPGeneratedResourceCommonPrefix + "br-"
	// MCPPerBackendHTTPRouteFilterPrefix is the prefix for the HTTP route filter names for per-backend resources.
	MCPPerBackendHTTPRouteFilterPrefix = MCPGeneratedResourceCommonPrefix + "brf-"
	// MCPMetadataHeaderPrefix is the prefix for special headers used to pass metadata in the filter metadata.
	// These headers are added internally to the requests to the upstream servers so they can be populated in the
	// filter metadata. They are internal only, and are removed once stored in the filter metadata to avoid sending
	// unnecessary information to the upstream servers.
	MCPMetadataHeaderPrefix = "x-ai-eg-mcp-metadata-"
	// MCPMetadataHeaderRequestID is the special header key used to pass the MCP request ID in the filter metadata.
	MCPMetadataHeaderRequestID = MCPMetadataHeaderPrefix + "request-id"
	// MCPMetadataHeaderMethod is the special header key used to pass the MCP method in the filter metadata.
	MCPMetadataHeaderMethod = MCPMetadataHeaderPrefix + "method"
)
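The strip-into-metadata behavior described for MCPMetadataHeaderPrefix can be sketched as follows. extractMCPMetadata is a hypothetical helper (not part of this package), and the constants are mirrored locally so the sketch is self-contained:

```go
package main

import (
	"fmt"
	"strings"
)

// Constants mirrored from the package so this sketch compiles on its own.
const (
	MCPMetadataHeaderPrefix    = "x-ai-eg-mcp-metadata-"
	MCPMetadataHeaderRequestID = MCPMetadataHeaderPrefix + "request-id"
	MCPMetadataHeaderMethod    = MCPMetadataHeaderPrefix + "method"
)

// extractMCPMetadata is a hypothetical helper: it moves every
// x-ai-eg-mcp-metadata-* header into a metadata map and deletes it from
// the request headers, so the upstream server never sees it.
func extractMCPMetadata(headers map[string]string) map[string]string {
	md := make(map[string]string)
	for k, v := range headers {
		if strings.HasPrefix(k, MCPMetadataHeaderPrefix) {
			md[strings.TrimPrefix(k, MCPMetadataHeaderPrefix)] = v
			delete(headers, k)
		}
	}
	return md
}

func main() {
	headers := map[string]string{
		MCPMetadataHeaderRequestID: "req-123",
		MCPMetadataHeaderMethod:    "tools/call",
		"content-type":             "application/json",
	}
	md := extractMCPMetadata(headers)
	// Only the non-internal header remains on the request.
	fmt.Println(md["request-id"], md["method"], len(headers))
}
```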
const (
	// XDSClusterMetadataKey is the key used to access cluster metadata in xDS attributes.
	XDSClusterMetadataKey = "xds.cluster_metadata"
	// XDSUpstreamHostMetadataKey is the key used to access upstream host metadata in xDS attributes.
	XDSUpstreamHostMetadataKey = "xds.upstream_host_metadata"
)
const AIGatewayFilterMetadataNamespace = aigv1a1.AIGatewayFilterMetadataNamespace
AIGatewayFilterMetadataNamespace is the namespace used for the filter metadata related to AI Gateway.
For example, token usage, input/output tokens, and request costs are stored in this namespace. Aliased from aigv1a1.AIGatewayFilterMetadataNamespace to avoid making ExtProc directly depend on the control plane API which is not a concern of ExtProc.
const (
	// AIGatewayGeneratedHTTPRouteAnnotation is the annotation key used to mark
	// HTTPRoute resources that are generated by the AI Gateway controller.
	AIGatewayGeneratedHTTPRouteAnnotation = "ai-gateway-generated"
)
const (
	// EndpointPickerHeaderKey is the header key used to specify the target backend endpoint.
	// This is the default header name in the reference implementation:
	// https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/2b5b337b45c3289e5f9367b2c19deef021722fcd/pkg/epp/server/runserver.go#L63
	EndpointPickerHeaderKey = "x-gateway-destination-endpoint"
)
const ModelNameHeaderKeyDefault = aigv1a1.AIModelHeaderKey
ModelNameHeaderKeyDefault is the default header key for the model name.
Variables ¶
var MCPInternalHeadersToMetadata = map[string]string{
	MCPBackendHeader:           "mcp_backend",
	MCPMetadataHeaderMethod:    "mcp_method",
	MCPMetadataHeaderRequestID: "mcp_request_id",
}
MCPInternalHeadersToMetadata maps special MCP headers to metadata keys.
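A consumer of this mapping would typically copy matching header values into the filter metadata under the mapped keys. headersToMetadata is a hypothetical helper, and the map is mirrored with literal header names so the sketch stands alone:

```go
package main

import "fmt"

// Mirrors MCPInternalHeadersToMetadata with the literal header names
// implied by the package constants, so this sketch is self-contained.
var mcpInternalHeadersToMetadata = map[string]string{
	"x-ai-eg-mcp-backend":             "mcp_backend",
	"x-ai-eg-mcp-metadata-method":     "mcp_method",
	"x-ai-eg-mcp-metadata-request-id": "mcp_request_id",
}

// headersToMetadata is a hypothetical helper that copies any mapped
// header present on the request into a metadata map keyed by the
// metadata names above.
func headersToMetadata(headers map[string]string) map[string]string {
	md := make(map[string]string)
	for header, key := range mcpInternalHeadersToMetadata {
		if v, ok := headers[header]; ok {
			md[key] = v
		}
	}
	return md
}

func main() {
	md := headersToMetadata(map[string]string{
		"x-ai-eg-mcp-backend":         "openai",
		"x-ai-eg-mcp-metadata-method": "tools/list",
	})
	fmt.Println(md["mcp_backend"], md["mcp_method"])
}
```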
Functions ¶
func ParseRequestHeaderAttributeMapping ¶ added in v0.4.0
func ParseRequestHeaderAttributeMapping(s string) (map[string]string, error)
ParseRequestHeaderAttributeMapping parses comma-separated key-value pairs for header-to-attribute mapping. The input format is "header1:attribute1,header2:attribute2", where header names are HTTP request headers and attribute names are OTel span or metric attributes. Example: "x-session-id:session.id,x-user-id:user.id".
Note: This serves a different purpose than OTEL's OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST, which captures headers as span attributes for tracing.
Note: We do not need to convert to Prometheus format (e.g., x-session-id → session.id) here, as that's done implicitly in the Prometheus exporter.
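The documented format can be parsed as in the following sketch. This is an illustrative re-implementation of the described behavior, not the package's actual code; validation details may differ:

```go
package main

import (
	"fmt"
	"strings"
)

// parseRequestHeaderAttributeMapping parses "header1:attr1,header2:attr2"
// into a map from header name to attribute name. Illustrative only; the
// real ParseRequestHeaderAttributeMapping may validate differently.
func parseRequestHeaderAttributeMapping(s string) (map[string]string, error) {
	m := make(map[string]string)
	if s == "" {
		return m, nil
	}
	for _, pair := range strings.Split(s, ",") {
		header, attr, ok := strings.Cut(strings.TrimSpace(pair), ":")
		if !ok || header == "" || attr == "" {
			return nil, fmt.Errorf("invalid mapping %q, want header:attribute", pair)
		}
		m[header] = attr
	}
	return m, nil
}

func main() {
	m, err := parseRequestHeaderAttributeMapping("x-session-id:session.id,x-user-id:user.id")
	if err != nil {
		panic(err)
	}
	fmt.Println(m["x-session-id"], m["x-user-id"])
}
```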
func PerRouteRuleRefBackendName ¶
func PerRouteRuleRefBackendName(namespace, name, routeName string, routeRuleIndex, refIndex int) string
PerRouteRuleRefBackendName generates a unique backend name for a per-route rule, i.e., the unique identifier for a backend that is associated with a specific route rule in a specific AIGatewayRoute.
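To make the identifier's ingredients concrete, here is a hypothetical encoding with the same signature. The actual format produced by PerRouteRuleRefBackendName is an implementation detail and almost certainly differs; only the inputs are taken from the documented signature:

```go
package main

import "fmt"

// perRouteRuleRefBackendName sketches one plausible way to combine the
// inputs into a unique name. Hypothetical format; the real function's
// output format is an implementation detail.
func perRouteRuleRefBackendName(namespace, name, routeName string, routeRuleIndex, refIndex int) string {
	return fmt.Sprintf("%s/%s/%s/rule/%d/ref/%d", namespace, name, routeName, routeRuleIndex, refIndex)
}

func main() {
	// Same backend referenced from two different rules yields two names.
	fmt.Println(perRouteRuleRefBackendName("default", "my-backend", "my-route", 0, 1))
	fmt.Println(perRouteRuleRefBackendName("default", "my-backend", "my-route", 2, 0))
}
```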
Types ¶
type Header ¶ added in v0.4.0
type Header [2]string
Header represents a single HTTP header as a key-value pair.
type ModelNameHeaderKey ¶ added in v0.4.0
type ModelNameHeaderKey = string
ModelNameHeaderKey is the configurable header key whose value is set by the gateway based on the model extracted from the request body.
This header is automatically populated by the gateway and cannot be set by end users as it will be overwritten. The flow is:
- Router filter extracts OriginalModel from request body and sets this header
- HTTPRoute uses this header value for model-based routing
- If backend has ModelNameOverride, the header is updated with the override value
- Metrics and observability systems use the final header value
Defaults to ModelNameHeaderKeyDefault.
type ModelNameOverride ¶ added in v0.4.0
type ModelNameOverride = string
ModelNameOverride represents a backend-specific model name that overrides the OriginalModel in the client request to the router.
Configuration:
- Set via aigv1a1.AIGatewayRouteRuleBackendRef
- Replaces the OriginalModel with a backend-specific model name
Example:
- Client requests: "llama3-2-1b"
- Override to: "us.meta.llama3-2-1b-instruct-v1:0" (for AWS Bedrock)
Effects:
- Updates the header specified by ModelNameHeaderKey
- Used by routing, rate limiting, and observability systems
type OriginalModel ¶ added in v0.4.0
type OriginalModel = string
OriginalModel is the model name extracted from the incoming request body before any virtualization applies.
Flow:
- Router filter extracts model from request body
- If ModelNameOverride is configured, RequestModel differs from OriginalModel
- Provider responds with ResponseModel (may differ from RequestModel)
Example:
- OriginalModel: OpenAI Client sends: {"model": "gpt-5"}
- RequestModel: ModelNameOverride replaces with "gpt-5-nano"
- ResponseModel: OpenAI Platform sends: {"model": "gpt-5-nano-2025-08-07"}
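The extraction-and-override flow above can be sketched in a few lines. extractModel is a hypothetical helper, and chatRequest covers only the one field the router needs (the field name follows the OpenAI Chat Completions API):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatRequest models only the field the router filter cares about.
type chatRequest struct {
	Model string `json:"model"`
}

// extractModel is a hypothetical helper showing the extraction step:
// the router filter reads the OriginalModel out of the request body.
func extractModel(body []byte) (string, error) {
	var req chatRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return "", err
	}
	return req.Model, nil
}

func main() {
	originalModel, err := extractModel([]byte(`{"model": "gpt-5", "messages": []}`))
	if err != nil {
		panic(err)
	}
	// If the selected backend configures a ModelNameOverride, the
	// RequestModel sent upstream differs from the OriginalModel.
	override := "gpt-5-nano" // hypothetical backend override
	requestModel := originalModel
	if override != "" {
		requestModel = override
	}
	fmt.Println(originalModel, "->", requestModel)
}
```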
### OpenTelemetry
In OpenTelemetry Generative AI Metrics, this is an attribute on metrics such as "gen_ai.server.token.usage". For example, an OpenAI Chat Completion request to the "gpt-5" model results in a plain text string attribute: "gen_ai.original.model" -> "gpt-5"
type RequestModel ¶ added in v0.4.0
type RequestModel = string
RequestModel is the name of the model sent in the request to perform a completion or to create embeddings.
This is either the model received by the router's OpenAI Chat Completions or Embeddings endpoints, or a ModelNameOverride.
This is not necessarily the same as ResponseModel, and in some cases like Azure OpenAI Service, this field isn't read at all.
### OpenTelemetry
The RequestModel is a key attribute for correlating metrics with spans.
In OpenInference (span semantics), this is the "model" field of the invocation parameters, explaining how the LLM was invoked. For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a JSON string attribute: "llm.invocation_parameters" -> {"model": "gpt-5-nano"}
In OpenTelemetry Generative AI Metrics, this is an attribute on metrics such as "gen_ai.server.token.usage". For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a plain text string attribute: "gen_ai.request.model" -> "gpt-5-nano"
type ResponseModel ¶ added in v0.4.0
type ResponseModel = string
ResponseModel is the name of the model that generated a response to a completion or embeddings request.
### Relationship to RequestModel
This matches the RequestModel when the provider resolves model names deterministically:
- Static Model Execution (AWS Bedrock)
- Deterministic Snapshot Mapping (GCP providers)
It may differ when the provider virtualizes model names:
- URI-Based Resolution (Azure OpenAI)
- Automatic Routing & Resolution (OpenAI Platform)
See https://aigateway.envoyproxy.io/docs/capabilities/traffic/model-name-virtualization
### OpenTelemetry
The ResponseModel is even more important than the RequestModel for evaluation use cases, as it is the only field that authoritatively identifies the model used for a completion. It is a key attribute for correlating metrics with spans.
In OpenInference (span semantics), this is the "llm.model_name" attribute. For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a plain text attribute containing the resolved model: "llm.model_name" -> "gpt-5-nano-2025-08-07"
In OpenTelemetry Generative AI Metrics, this is an attribute on metrics such as "gen_ai.server.token.usage". For example, an OpenAI Chat Completion request to the "gpt-5-nano" model results in a plain text attribute containing the resolved model: "gen_ai.response.model" -> "gpt-5-nano-2025-08-07"