types

package
v0.6.0
Published: Mar 3, 2026 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Copyright 2024 The Aibrix Team.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Constants

const DefaultQueueCapacity = 1024

Variables

var (
	ErrQueueEmpty = errors.New("queue is empty")
)

Functions

This section is empty.

Types

type FallbackRouter added in v0.4.0

type FallbackRouter interface {
	Router

	// SetFallback sets the fallback router
	SetFallback(RoutingAlgorithm, RouterProviderFunc)
}

FallbackRouter enables router chaining by setting a fallback router.
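The chaining idea can be sketched as follows. This is a minimal illustration, not the package's implementation: `PodList` and `RoutingContext` are simplified local stand-ins (the real `PodList` is an interface and the real `RoutingContext` has many more fields), and `gpuOnlyRouter` and `firstPodRouter` are hypothetical routers invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Simplified stand-ins for the package's types (assumptions for this sketch).
type (
	RoutingAlgorithm   string
	PodList            []string
	RoutingContext     struct{ Model string }
	RouterProviderFunc func(*RoutingContext) (Router, error)
)

type Router interface {
	Route(ctx *RoutingContext, readyPodList PodList) (string, error)
}

type FallbackRouter interface {
	Router
	SetFallback(RoutingAlgorithm, RouterProviderFunc)
}

// firstPodRouter is a hypothetical router that always picks the first pod.
type firstPodRouter struct{}

func (firstPodRouter) Route(_ *RoutingContext, pods PodList) (string, error) {
	if len(pods) == 0 {
		return "", errors.New("no pods")
	}
	return pods[0], nil
}

// gpuOnlyRouter is a hypothetical router that picks the first pod whose name
// starts with "gpu-" and defers to its fallback when no pod matches.
type gpuOnlyRouter struct {
	fallbackAlg      RoutingAlgorithm
	fallbackProvider RouterProviderFunc
}

func (r *gpuOnlyRouter) SetFallback(alg RoutingAlgorithm, provider RouterProviderFunc) {
	r.fallbackAlg = alg
	r.fallbackProvider = provider
}

func (r *gpuOnlyRouter) Route(ctx *RoutingContext, pods PodList) (string, error) {
	for _, name := range pods {
		if strings.HasPrefix(name, "gpu-") {
			return name, nil
		}
	}
	if r.fallbackProvider == nil {
		return "", errors.New("no gpu pod and no fallback configured")
	}
	// Resolve the fallback router lazily, then delegate the decision to it.
	next, err := r.fallbackProvider(ctx)
	if err != nil {
		return "", fmt.Errorf("fallback %q: %w", r.fallbackAlg, err)
	}
	return next.Route(ctx, pods)
}

// Compile-time check that the sketch satisfies the interface.
var _ FallbackRouter = (*gpuOnlyRouter)(nil)

func main() {
	r := &gpuOnlyRouter{}
	r.SetFallback("first-pod", func(*RoutingContext) (Router, error) {
		return firstPodRouter{}, nil
	})
	ctx := &RoutingContext{Model: "demo"}
	fmt.Println(r.Route(ctx, PodList{"cpu-0", "gpu-1"})) // gpu-1 <nil>
	fmt.Println(r.Route(ctx, PodList{"cpu-0", "cpu-1"})) // cpu-0 <nil>
}
```

Resolving the fallback through a provider function rather than holding a Router directly lets the fallback be chosen per request, which is one plausible reason the interface takes a RouterProviderFunc.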

type OutputPredictor added in v0.4.0

type OutputPredictor interface {
	// AddTrace collects historical input and output token data.
	AddTrace(inputTokens, outputTokens int, cnt int32)

	// Predict returns the predicted number of output tokens for the given number of input tokens.
	Predict(promptLen int) (outputLen int)
}
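As an illustration of the interface contract, here is a minimal sketch of a predictor that keeps a running mean of output tokens per power-of-two bucket of input length. The bucketing scheme, the `meanPredictor` type, and the reading of `cnt` as "number of observations with these token counts" are all assumptions made for the example; the package's own predictor may work differently.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
)

type OutputPredictor interface {
	AddTrace(inputTokens, outputTokens int, cnt int32)
	Predict(promptLen int) (outputLen int)
}

// bucketStats accumulates the weighted sum of output tokens for one bucket.
type bucketStats struct {
	totalOutput int64
	count       int64
}

// meanPredictor is a hypothetical predictor: it returns the mean observed
// output length for prompts in the same power-of-two length bucket.
type meanPredictor struct {
	mu      sync.Mutex
	buckets map[int]*bucketStats
}

func newMeanPredictor() *meanPredictor {
	return &meanPredictor{buckets: make(map[int]*bucketStats)}
}

// bucketOf maps a token count n to floor(log2(n)); values <= 1 map to 0.
func bucketOf(n int) int {
	if n <= 1 {
		return 0
	}
	return bits.Len(uint(n)) - 1
}

func (p *meanPredictor) AddTrace(inputTokens, outputTokens int, cnt int32) {
	if cnt <= 0 {
		return // nothing observed
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	b := bucketOf(inputTokens)
	s := p.buckets[b]
	if s == nil {
		s = &bucketStats{}
		p.buckets[b] = s
	}
	s.totalOutput += int64(outputTokens) * int64(cnt)
	s.count += int64(cnt)
}

func (p *meanPredictor) Predict(promptLen int) int {
	p.mu.Lock()
	defer p.mu.Unlock()
	s := p.buckets[bucketOf(promptLen)]
	if s == nil {
		return 0 // no history for this bucket
	}
	return int(s.totalOutput / s.count)
}

var _ OutputPredictor = (*meanPredictor)(nil)

func main() {
	p := newMeanPredictor()
	p.AddTrace(100, 200, 1) // bucket 6 covers 64..127 input tokens
	p.AddTrace(120, 400, 3)
	fmt.Println(p.Predict(90))   // 350
	fmt.Println(p.Predict(1000)) // 0
}
```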

type OutputPredictorProvider added in v0.4.0

type OutputPredictorProvider interface {
	// GetOutputPredictor returns the output predictor
	GetOutputPredictor(modelName string) (OutputPredictor, error)
}

OutputPredictorProvider provides a stateful way to get an output predictor, allowing a struct to provide the output predictor by model.

type PodList

type PodList interface {
	// Len returns the number of pods in the list.
	Len() int

	// All returns a slice of all pods in the list.
	All() []*v1.Pod

	// Indexes returns a slice of indexes for querying pods by index.
	Indexes() []string

	// ListByIndex returns a slice of pods that match the given index.
	ListByIndex(index string) []*v1.Pod

	// ListPortsForPod returns a map from pod name to the list of ports bound to that pod.
	ListPortsForPod() map[string][]int
}

PodList is an interface for a list of pods that supports indexing and querying pods by index.
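A small in-memory implementation shows how the methods relate to each other. In this sketch `Pod` is a stand-in for `v1.Pod` so the example stays self-contained, and the `Group` field used as the index key is purely hypothetical; the source does not say what the real indexes represent.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a simplified stand-in for k8s.io/api/core/v1.Pod.
type Pod struct {
	Name  string
	Group string // hypothetical index key
	Ports []int
}

// memPodList is a hypothetical in-memory pod list indexed by Pod.Group.
type memPodList struct {
	pods    []*Pod
	byIndex map[string][]*Pod
}

func newMemPodList(pods []*Pod) *memPodList {
	l := &memPodList{pods: pods, byIndex: make(map[string][]*Pod)}
	for _, p := range pods {
		l.byIndex[p.Group] = append(l.byIndex[p.Group], p)
	}
	return l
}

func (l *memPodList) Len() int    { return len(l.pods) }
func (l *memPodList) All() []*Pod { return l.pods }

// Indexes returns the index keys in sorted order for stable iteration.
func (l *memPodList) Indexes() []string {
	keys := make([]string, 0, len(l.byIndex))
	for k := range l.byIndex {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// ListByIndex returns the pods stored under the given key (nil if unknown).
func (l *memPodList) ListByIndex(index string) []*Pod { return l.byIndex[index] }

// ListPortsForPod maps each pod name to the ports bound to that pod.
func (l *memPodList) ListPortsForPod() map[string][]int {
	ports := make(map[string][]int, len(l.pods))
	for _, p := range l.pods {
		ports[p.Name] = p.Ports
	}
	return ports
}

func main() {
	l := newMemPodList([]*Pod{
		{Name: "p1", Group: "a", Ports: []int{8000}},
		{Name: "p2", Group: "b", Ports: []int{8000, 8001}},
		{Name: "p3", Group: "a"},
	})
	for _, key := range l.Indexes() {
		fmt.Println(key, len(l.ListByIndex(key)))
	}
	fmt.Println(l.ListPortsForPod()["p2"])
}
```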

type QueueRouter added in v0.4.0

type QueueRouter interface {
	Router

	Len() int
}

QueueRouter defines the interface for routers that contain a built-in queue and offer queue status queries.

type RequestFeatures added in v0.4.0

type RequestFeatures []float64

type ResolvedConfigProfile added in v0.6.0

type ResolvedConfigProfile struct {
	RoutingStrategy          string
	PromptLenBucketMinLength int
	PromptLenBucketMaxLength int
	Combined                 bool
}

ResolvedConfigProfile holds the resolved model config profile for a request. It is populated from the model.aibrix.ai/config annotation, based on the config-profile header or defaultProfile. Nil when no config is present.

type Router

type Router interface {
	// Route selects a target pod from the provided list of pods.
	// The input pod list is guaranteed to be non-empty and to contain only routable pods.
	Route(ctx *RoutingContext, readyPodList PodList) (string, error)
}

Router defines the interface for routing logic to select target pods.

type RouterConstructor

type RouterConstructor func() (Router, error)

RouterConstructor defines a constructor for a router.

type RouterProvider

type RouterProvider interface {
	// GetRouter returns the router
	GetRouter(ctx *RoutingContext) (Router, error)
}

RouterProvider provides a stateful way to get a router, allowing a struct to provide the router by strategy and model.

type RouterProviderFunc

type RouterProviderFunc func(*RoutingContext) (Router, error)

RouterProviderFunc provides a stateless way to get a router.

type RouterProviderRegistrationFunc

type RouterProviderRegistrationFunc func() RouterProviderFunc

RouterProviderRegistrationFunc provides a way to register a RouterProviderFunc.
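One way the constructor, provider, and registration function types could fit together is sketched below. The `registry` map and the `register`, `fromConstructor`, and `selectRouter` helpers are hypothetical names introduced for this example and are not part of the package; `RoutingContext` and the pod list are simplified stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the package's types (assumptions for this sketch).
type RoutingAlgorithm string

type RoutingContext struct {
	Algorithm RoutingAlgorithm
	Model     string
}

type Router interface {
	Route(ctx *RoutingContext, pods []string) (string, error)
}

type RouterConstructor func() (Router, error)
type RouterProviderFunc func(*RoutingContext) (Router, error)
type RouterProviderRegistrationFunc func() RouterProviderFunc

// firstPodRouter is a hypothetical router that always picks the first pod.
type firstPodRouter struct{}

func (firstPodRouter) Route(_ *RoutingContext, pods []string) (string, error) {
	if len(pods) == 0 {
		return "", errors.New("no pods")
	}
	return pods[0], nil
}

// registry is a hypothetical lookup table from algorithm name to provider.
var registry = map[RoutingAlgorithm]RouterProviderFunc{}

// register runs the registration function once and stores its provider.
func register(alg RoutingAlgorithm, reg RouterProviderRegistrationFunc) {
	registry[alg] = reg()
}

// fromConstructor adapts a RouterConstructor: the router is built once at
// registration time and the same instance is returned for every request.
func fromConstructor(c RouterConstructor) RouterProviderRegistrationFunc {
	return func() RouterProviderFunc {
		router, err := c()
		return func(*RoutingContext) (Router, error) { return router, err }
	}
}

// selectRouter resolves the router for the algorithm named in the context.
func selectRouter(ctx *RoutingContext) (Router, error) {
	provider, ok := registry[ctx.Algorithm]
	if !ok {
		return nil, fmt.Errorf("unknown routing algorithm %q", ctx.Algorithm)
	}
	return provider(ctx)
}

func main() {
	register("first-pod", fromConstructor(func() (Router, error) {
		return firstPodRouter{}, nil
	}))
	ctx := &RoutingContext{Algorithm: "first-pod", Model: "demo"}
	router, err := selectRouter(ctx)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(router.Route(ctx, []string{"pod-a", "pod-b"})) // pod-a <nil>
}
```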

type RouterQueue added in v0.4.0

type RouterQueue[V comparable] interface {
	Enqueue(V, time.Time) error
	Peek(time.Time, PodList) (V, error)
	Dequeue(time.Time) (V, error)
	Len() int
}

type RoutingAlgorithm

type RoutingAlgorithm string

RoutingAlgorithm defines the routing algorithms.

func (RoutingAlgorithm) NewContext added in v0.4.0

func (alg RoutingAlgorithm) NewContext(ctx context.Context, model, message, requestID, user string) *RoutingContext

NewContext gets a RoutingContext with the current RoutingAlgorithm.

type RoutingContext

type RoutingContext struct {
	context.Context
	Algorithm      RoutingAlgorithm
	Model          string
	Stream         bool
	Message        string
	RequestID      string
	User           *string
	RequestTime    time.Time // Time when the routing context is created.
	RequestEndTime time.Time // Time when the routing is done and sent to inference engine.
	PendingLoad    float64   // Normalized pending load of request, available after AddRequestCount call. See cache.PendingLoadProvider
	TraceTerm      int64     // Trace term identifier, available after AddRequestCount call.
	RoutedTime     time.Time // Time consumed during routing.

	ReqHeaders       map[string]string
	ReqBody          []byte
	ReqPath          string
	ReqConfigProfile string

	PrefillStartTime time.Time // Time when prefill request is started.
	PrefillEndTime   time.Time // Time consumed during prefill.

	// RespHeaders holds response headers that the router intends to set.
	// These are typically used to propagate control information back to the client,
	// such as session affinity id.
	// The router implementation (e.g., sessionAffinityRouter) may populate this field
	// during the Route() call.
	RespHeaders map[string]string

	// ConfigProfile holds the resolved model config profile for this request.
	// Set in HandleRequestBody from model.aibrix.ai/config (annotation)
	// based on config-profile header. Nil when no config is present.
	ConfigProfile *ResolvedConfigProfile
	// contains filtered or unexported fields
}

RoutingContext encapsulates the context information required for routing. It can be extended with more fields as needed in the future.

func NewRoutingContext

func NewRoutingContext(ctx context.Context, algorithms RoutingAlgorithm, model, message, requestID, user string) *RoutingContext

NewRoutingContext gets a RoutingContext from a context pool.

func (*RoutingContext) CanAddStats added in v0.4.0

func (r *RoutingContext) CanAddStats() bool

CanAddStats returns true if this is the first attempt to update the in-memory real-time statistics.

func (*RoutingContext) CanAddTrace added in v0.4.0

func (r *RoutingContext) CanAddTrace() bool

CanAddTrace returns true if this is the first attempt to add a trace to the cache.

func (*RoutingContext) CanDoneStats added in v0.4.0

func (r *RoutingContext) CanDoneStats() bool

func (*RoutingContext) Delete

func (r *RoutingContext) Delete()

Delete resolves all waiting TargetPod() calls and releases the RoutingContext to the pool.

func (*RoutingContext) Elapsed added in v0.4.0

func (r *RoutingContext) Elapsed(currentTime time.Time) time.Duration

Elapsed returns the elapsed time since the request was created.

func (*RoutingContext) Features added in v0.4.0

func (r *RoutingContext) Features() (RequestFeatures, error)

Features returns the features corresponding to the request. The features of a request are defined by its output length and prompt length.

func (*RoutingContext) GetError added in v0.4.0

func (r *RoutingContext) GetError() error

GetError returns the error of the routing context.

func (*RoutingContext) GetRoutingDelay added in v0.4.0

func (r *RoutingContext) GetRoutingDelay() time.Duration

GetRoutingDelay returns the time duration used for routing the request.

func (*RoutingContext) HasError added in v0.4.0

func (r *RoutingContext) HasError() bool

HasError returns true if the request has an error.

func (*RoutingContext) HasRouted

func (r *RoutingContext) HasRouted() bool

HasRouted returns true if the request has been routed or an error has been set.

func (*RoutingContext) PromptLength added in v0.4.0

func (r *RoutingContext) PromptLength() (int, error)

PromptLength returns the length of the prompt of the request.

func (*RoutingContext) PromptTokens added in v0.4.0

func (r *RoutingContext) PromptTokens() ([]int, error)

PromptTokens returns the tokenized prompt of the request.

func (*RoutingContext) SetError added in v0.4.0

func (r *RoutingContext) SetError(err error)

SetError sets the error of the routing context asynchronously. Do not call this function from synchronous routers; asynchronous routers call it to set an error.

func (*RoutingContext) SetOutputPreditor added in v0.4.0

func (r *RoutingContext) SetOutputPreditor(predictor OutputPredictor) (old OutputPredictor)

SetOutputPreditor enables the RoutingContext to use an existing OutputPredictor to predict the output length.

func (*RoutingContext) SetTargetPod

func (r *RoutingContext) SetTargetPod(pod *v1.Pod)

SetTargetPod sets the target pod of the routing context. All routers call this to set the target pod.

func (*RoutingContext) SetTargetPort added in v0.6.0

func (r *RoutingContext) SetTargetPort(port int)

func (*RoutingContext) TargetAddress

func (r *RoutingContext) TargetAddress() string

TargetAddress returns the routing target address of the request.

func (*RoutingContext) TargetPod

func (r *RoutingContext) TargetPod() *v1.Pod

TargetPod returns the routing target pod of the request. TargetPod blocks until the target pod is set or an error is set.
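The blocking behavior described here can be illustrated with a channel that is closed exactly once. This is only a sketch of the semantics, not the package's implementation: `pendingTarget` is a hypothetical type, pods are represented by plain strings, and the "first call wins" rule is a design choice of the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errNoPods = errors.New("no pods available")

// pendingTarget is a hypothetical holder for an asynchronous routing result.
type pendingTarget struct {
	once sync.Once
	done chan struct{} // closed when a pod or an error has been set
	pod  string
	err  error
}

func newPendingTarget() *pendingTarget {
	return &pendingTarget{done: make(chan struct{})}
}

// SetTargetPod records the pod and wakes all waiters; later calls are ignored.
func (p *pendingTarget) SetTargetPod(pod string) {
	p.once.Do(func() {
		p.pod = pod
		close(p.done)
	})
}

// SetError records the error and wakes all waiters; later calls are ignored.
func (p *pendingTarget) SetError(err error) {
	p.once.Do(func() {
		p.err = err
		close(p.done)
	})
}

// TargetPod blocks until a pod or an error has been set.
func (p *pendingTarget) TargetPod() string {
	<-p.done
	return p.pod
}

// HasRouted reports, without blocking, whether a pod or an error is set.
func (p *pendingTarget) HasRouted() bool {
	select {
	case <-p.done:
		return true
	default:
		return false
	}
}

// GetError returns the recorded error, or nil if nothing is set yet.
func (p *pendingTarget) GetError() error {
	if !p.HasRouted() {
		return nil
	}
	return p.err
}

func main() {
	target := newPendingTarget()
	go func() {
		// Simulate an asynchronous router that decides after a short delay.
		time.Sleep(10 * time.Millisecond)
		target.SetTargetPod("pod-a")
	}()
	fmt.Println(target.HasRouted()) // very likely false: the router is still deciding
	fmt.Println(target.TargetPod()) // blocks, then prints pod-a
}
```

Closing the channel rather than sending on it means any number of concurrent TargetPod callers are released together, which matches the "resolves all waiting TargetPod() calls" wording used for Delete.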

func (*RoutingContext) TargetPort added in v0.6.0

func (r *RoutingContext) TargetPort() int

func (*RoutingContext) TokenLength added in v0.4.0

func (r *RoutingContext) TokenLength() (int, error)

TokenLength returns the predicted output token length.
