Documentation
Types
type LLMTokenUsage
type LLMTokenUsage struct {
	// InputTokens is the number of tokens consumed from the input.
	InputTokens uint32
	// OutputTokens is the number of tokens consumed from the output.
	OutputTokens uint32
	// TotalTokens is the total number of tokens consumed.
	TotalTokens uint32
}
LLMTokenUsage represents the token usage, which is usually reported by the backend API in the response body.
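As an illustration of how such a value might be populated, the sketch below decodes the `usage` object of an OpenAI-style chat completion response into an LLMTokenUsage. The helper `usageFromOpenAIBody` and the `openAIResponse` type are hypothetical names introduced for this example, and the fallback that sums input and output tokens when no total is reported is an assumption, not documented behavior of this package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LLMTokenUsage mirrors the struct documented above.
type LLMTokenUsage struct {
	InputTokens  uint32
	OutputTokens uint32
	TotalTokens  uint32
}

// openAIResponse is a hypothetical helper type that captures only the
// `usage` object of an OpenAI-style chat completion response.
type openAIResponse struct {
	Usage struct {
		PromptTokens     uint32 `json:"prompt_tokens"`
		CompletionTokens uint32 `json:"completion_tokens"`
		TotalTokens      uint32 `json:"total_tokens"`
	} `json:"usage"`
}

// usageFromOpenAIBody (hypothetical) extracts token usage from a complete,
// non-streaming response body.
func usageFromOpenAIBody(body []byte) (LLMTokenUsage, error) {
	var resp openAIResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return LLMTokenUsage{}, fmt.Errorf("decode response body: %w", err)
	}
	usage := LLMTokenUsage{
		InputTokens:  resp.Usage.PromptTokens,
		OutputTokens: resp.Usage.CompletionTokens,
		TotalTokens:  resp.Usage.TotalTokens,
	}
	// Assumption for this sketch: derive the total when the backend omits it.
	if usage.TotalTokens == 0 {
		usage.TotalTokens = usage.InputTokens + usage.OutputTokens
	}
	return usage, nil
}

func main() {
	body := []byte(`{"usage":{"prompt_tokens":12,"completion_tokens":30,"total_tokens":42}}`)
	usage, err := usageFromOpenAIBody(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", usage)
}
```

A backend with a different schema would need its own field mapping; only the three LLMTokenUsage fields come from the documentation above.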
type RequestBody
type RequestBody any
RequestBody is the union of all request body types. TODO: consider defining a Translator interface per endpoint instead.
type Translator
type Translator interface {
	// RequestBody translates the request body.
	// - `body` is the request body already parsed by [router.RequestBodyParser]. The concrete type is specific to the schema and the path.
	// - This returns `headerMutation` and `bodyMutation`, either of which can be nil to indicate no mutation.
	RequestBody(body *openai.ChatCompletionRequest) (
		headerMutation *extprocv3.HeaderMutation,
		bodyMutation *extprocv3.BodyMutation,
		err error,
	)
	// ResponseHeaders translates the response headers.
	// - `headers` are the response headers.
	// - This returns `headerMutation`, which can be nil to indicate no mutation.
	ResponseHeaders(headers map[string]string) (
		headerMutation *extprocv3.HeaderMutation,
		err error,
	)
	// ResponseBody translates the response body. When stream=true, this is called for each chunk of the response body.
	// - `body` is either a single chunk or the entire response body, depending on the context.
	// - This returns `headerMutation` and `bodyMutation`, either of which can be nil to indicate no mutation.
	// - This also returns `tokenUsage`, which is extracted from the body and used for token rate limiting.
	ResponseBody(respHeaders map[string]string, body io.Reader, endOfStream bool) (
		headerMutation *extprocv3.HeaderMutation,
		bodyMutation *extprocv3.BodyMutation,
		tokenUsage LLMTokenUsage,
		err error,
	)
}
Translator translates the request and response messages between the client and the backend API schemas for a specific path. An implementation can embed [defaultTranslator] to avoid implementing every method.
Translator instances are created by a [Factory].
A new instance is created for each request, and an instance is not thread-safe.
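To make the per-request, stateful nature of a Translator concrete, here is a minimal sketch of an implementation that buffers response chunks and reports token usage only once the final chunk arrives. `HeaderMutation`, `BodyMutation`, and `ChatCompletionRequest` are simplified stand-ins for the `extprocv3` and `openai` types so that the example compiles on its own; `bufferingTranslator` and `feedChunks` are hypothetical names, and the sketch assumes a non-streaming JSON body that happens to be delivered in several pieces.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Simplified stand-ins (assumptions) for the extprocv3 and openai types.
type HeaderMutation struct{}
type BodyMutation struct{}
type ChatCompletionRequest struct{ Model string }

// LLMTokenUsage mirrors the struct documented above.
type LLMTokenUsage struct{ InputTokens, OutputTokens, TotalTokens uint32 }

// Translator mirrors the documented interface, using the stand-in types.
type Translator interface {
	RequestBody(body *ChatCompletionRequest) (*HeaderMutation, *BodyMutation, error)
	ResponseHeaders(headers map[string]string) (*HeaderMutation, error)
	ResponseBody(respHeaders map[string]string, body io.Reader, endOfStream bool) (*HeaderMutation, *BodyMutation, LLMTokenUsage, error)
}

// bufferingTranslator (hypothetical) keeps per-request state in buf, which is
// why one instance must not be shared between concurrent requests.
type bufferingTranslator struct{ buf bytes.Buffer }

var _ Translator = (*bufferingTranslator)(nil)

// RequestBody returns nil mutations, meaning "leave the request unchanged".
func (t *bufferingTranslator) RequestBody(*ChatCompletionRequest) (*HeaderMutation, *BodyMutation, error) {
	return nil, nil, nil
}

// ResponseHeaders returns a nil mutation, meaning "leave the headers unchanged".
func (t *bufferingTranslator) ResponseHeaders(map[string]string) (*HeaderMutation, error) {
	return nil, nil
}

// ResponseBody accumulates chunks and decodes usage only on the final chunk.
func (t *bufferingTranslator) ResponseBody(_ map[string]string, body io.Reader, endOfStream bool) (*HeaderMutation, *BodyMutation, LLMTokenUsage, error) {
	if _, err := t.buf.ReadFrom(body); err != nil {
		return nil, nil, LLMTokenUsage{}, fmt.Errorf("read chunk: %w", err)
	}
	if !endOfStream {
		return nil, nil, LLMTokenUsage{}, nil
	}
	var resp struct {
		Usage struct {
			Prompt     uint32 `json:"prompt_tokens"`
			Completion uint32 `json:"completion_tokens"`
			Total      uint32 `json:"total_tokens"`
		} `json:"usage"`
	}
	if err := json.Unmarshal(t.buf.Bytes(), &resp); err != nil {
		return nil, nil, LLMTokenUsage{}, fmt.Errorf("decode body: %w", err)
	}
	return nil, nil, LLMTokenUsage{resp.Usage.Prompt, resp.Usage.Completion, resp.Usage.Total}, nil
}

// feedChunks (hypothetical) calls ResponseBody once per chunk and marks the
// last chunk with endOfStream=true.
func feedChunks(t Translator, chunks ...string) (LLMTokenUsage, error) {
	var usage LLMTokenUsage
	for i, c := range chunks {
		_, _, u, err := t.ResponseBody(nil, strings.NewReader(c), i == len(chunks)-1)
		if err != nil {
			return LLMTokenUsage{}, err
		}
		usage = u
	}
	return usage, nil
}

func main() {
	usage, err := feedChunks(&bufferingTranslator{},
		`{"usage":{"prompt_`, `tokens":5,"completion_tokens":7,"total_tokens":12}}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", usage)
}
```

Because the buffer lives on the instance, reusing one translator across requests would mix their bodies, which is consistent with the per-request lifetime described above.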
func NewChatCompletionOpenAIToAWSBedrockTranslator
func NewChatCompletionOpenAIToAWSBedrockTranslator() Translator
NewChatCompletionOpenAIToAWSBedrockTranslator implements [Factory] for OpenAI to AWS Bedrock translation.
func NewChatCompletionOpenAIToOpenAITranslator
func NewChatCompletionOpenAIToOpenAITranslator() Translator
NewChatCompletionOpenAIToOpenAITranslator implements [Factory] for OpenAI to OpenAI translation.
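The factory functions above suggest a pattern in which a caller picks a factory by backend schema and builds a fresh Translator for every request. The sketch below illustrates that pattern under stated assumptions: the definition of `Factory` is not shown in this documentation, so `type Factory func() Translator` is inferred from the constructor signatures; the reduced `Translator` interface, the stub constructors, the `factories` map, its string keys, and `newTranslatorFor` are all hypothetical.

```go
package main

import "fmt"

// Translator is a reduced stand-in for the documented interface. The Backend
// method is hypothetical and exists only to tell instances apart here.
type Translator interface{ Backend() string }

// Factory is assumed to match the constructor signatures documented above.
type Factory func() Translator

type stubTranslator struct{ backend string }

func (s *stubTranslator) Backend() string { return s.backend }

// Stub constructors standing in for the real New...Translator functions.
func newOpenAIStub() Translator  { return &stubTranslator{backend: "OpenAI"} }
func newBedrockStub() Translator { return &stubTranslator{backend: "AWSBedrock"} }

// factories maps a hypothetical backend schema name to its factory.
var factories = map[string]Factory{
	"OpenAI":     newOpenAIStub,
	"AWSBedrock": newBedrockStub,
}

// newTranslatorFor (hypothetical) creates a new Translator for one request.
func newTranslatorFor(schema string) (Translator, error) {
	factory, ok := factories[schema]
	if !ok {
		return nil, fmt.Errorf("no translator factory for backend schema %q", schema)
	}
	return factory(), nil
}

func main() {
	first, err := newTranslatorFor("AWSBedrock")
	if err != nil {
		panic(err)
	}
	second, err := newTranslatorFor("AWSBedrock")
	if err != nil {
		panic(err)
	}
	// Each call yields a distinct instance, so no state is shared across requests.
	fmt.Println(first.Backend(), first != second)
}
```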