types

package
v0.4.1
Published: Aug 19, 2025 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Copyright 2024 The Aibrix Team.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Constants

const DefaultQueueCapacity = 1024

Variables

var (
	ErrQueueEmpty = errors.New("queue is empty")
)

Functions

This section is empty.

Types

type FallbackRouter added in v0.4.0

type FallbackRouter interface {
	Router

	// SetFallback sets the fallback router
	SetFallback(RoutingAlgorithm, RouterProviderFunc)
}

FallbackRouter enables router chaining by setting a fallback router.
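The chaining idea can be sketched as follows. This is a minimal illustration, not the package's implementation: RoutingContext, PodList, and Router are simplified local stand-ins, and chainRouter, failingRouter, staticRouter, and podCount are hypothetical names. It also assumes the fallback is consulted only when the primary router returns an error, which the documentation above does not state.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified local stand-ins for the package's types (assumptions for this sketch).
type RoutingAlgorithm string

type RoutingContext struct{ Model string }

type PodList interface{ Len() int }

type Router interface {
	Route(ctx *RoutingContext, readyPodList PodList) (string, error)
}

type RouterProviderFunc func(*RoutingContext) (Router, error)

// podCount is a hypothetical PodList that only reports a length.
type podCount int

func (p podCount) Len() int { return int(p) }

// failingRouter always fails, which forces the fallback path.
type failingRouter struct{}

func (failingRouter) Route(*RoutingContext, PodList) (string, error) {
	return "", errors.New("no suitable pod")
}

// staticRouter always returns the same address.
type staticRouter struct{ addr string }

func (s staticRouter) Route(*RoutingContext, PodList) (string, error) {
	return s.addr, nil
}

// chainRouter tries its primary router first and consults the fallback on error.
type chainRouter struct {
	primary     Router
	fallbackAlg RoutingAlgorithm
	fallback    RouterProviderFunc
}

// SetFallback has the same signature as FallbackRouter.SetFallback.
func (c *chainRouter) SetFallback(alg RoutingAlgorithm, provider RouterProviderFunc) {
	c.fallbackAlg, c.fallback = alg, provider
}

func (c *chainRouter) Route(ctx *RoutingContext, pods PodList) (string, error) {
	addr, err := c.primary.Route(ctx, pods)
	if err == nil || c.fallback == nil {
		return addr, err
	}
	next, ferr := c.fallback(ctx)
	if ferr != nil {
		return "", fmt.Errorf("primary failed (%v); fallback %q unavailable: %w", err, c.fallbackAlg, ferr)
	}
	return next.Route(ctx, pods)
}

// routeWithFallback wires a failing primary to a static fallback.
func routeWithFallback() (string, error) {
	r := &chainRouter{primary: failingRouter{}}
	r.SetFallback("random", func(*RoutingContext) (Router, error) {
		return staticRouter{addr: "10.0.0.1:8000"}, nil
	})
	return r.Route(&RoutingContext{Model: "demo"}, podCount(1))
}

func main() {
	addr, err := routeWithFallback()
	fmt.Println(addr, err)
}
```

Passing a provider function rather than a Router instance lets the fallback be resolved lazily, per request.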

type OutputPredictor added in v0.4.0

type OutputPredictor interface {
	// AddTrace collects history input and output tokens data.
	AddTrace(inputTokens, outputTokens int, cnt int32)

	// Predict outputs the number of output tokens based on the number of input tokens.
	Predict(promptLen int) (outputLen int)
}
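One possible way to satisfy OutputPredictor is a bucketed running mean, sketched below. The interface is copied locally so the example is self-contained; meanPredictor, bucketStats, and bucketOf are hypothetical names. The sketch assumes cnt is the number of observed requests with the given input/output pair, which the documentation does not confirm, and the bucketing scheme is an illustrative choice rather than the package's algorithm.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
)

// OutputPredictor is copied locally from the documentation above.
type OutputPredictor interface {
	AddTrace(inputTokens, outputTokens int, cnt int32)
	Predict(promptLen int) (outputLen int)
}

// bucketStats accumulates output tokens for one input-length bucket.
type bucketStats struct {
	sum   int64 // total output tokens observed
	count int64 // number of observations
}

// meanPredictor predicts the mean output length seen for prompts of similar size.
type meanPredictor struct {
	mu      sync.Mutex
	buckets map[int]*bucketStats
}

var _ OutputPredictor = (*meanPredictor)(nil)

func newMeanPredictor() *meanPredictor {
	return &meanPredictor{buckets: make(map[int]*bucketStats)}
}

// bucketOf groups token counts by power of two: 64..127 share a bucket, and so on.
func bucketOf(n int) int {
	if n <= 0 {
		return 0
	}
	return bits.Len(uint(n))
}

func (p *meanPredictor) AddTrace(inputTokens, outputTokens int, cnt int32) {
	if cnt <= 0 {
		return // nothing to record
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	key := bucketOf(inputTokens)
	b := p.buckets[key]
	if b == nil {
		b = &bucketStats{}
		p.buckets[key] = b
	}
	b.sum += int64(outputTokens) * int64(cnt)
	b.count += int64(cnt)
}

// Predict returns 0 when no trace has been recorded for the prompt's bucket.
func (p *meanPredictor) Predict(promptLen int) int {
	p.mu.Lock()
	defer p.mu.Unlock()
	b := p.buckets[bucketOf(promptLen)]
	if b == nil || b.count == 0 {
		return 0
	}
	return int(b.sum / b.count)
}

func main() {
	p := newMeanPredictor()
	p.AddTrace(100, 200, 1)
	p.AddTrace(120, 400, 3)
	fmt.Println(p.Predict(110), p.Predict(5000))
}
```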

type OutputPredictorProvider added in v0.4.0

type OutputPredictorProvider interface {
	// GetOutputPredictor returns the output predictor
	GetOutputPredictor(modelName string) (OutputPredictor, error)
}

OutputPredictorProvider provides a stateful way to get an output predictor, allowing a struct to provide the output predictor by model.
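A provider keyed by model name might look like the sketch below. The OutputPredictor interface is copied locally; predictorRegistry, its Register method, and constPredictor are hypothetical helpers, and returning an error for an unknown model is an assumption about the intended behavior.

```go
package main

import (
	"fmt"
	"sync"
)

// OutputPredictor is copied locally from the documentation above.
type OutputPredictor interface {
	AddTrace(inputTokens, outputTokens int, cnt int32)
	Predict(promptLen int) (outputLen int)
}

// constPredictor is a hypothetical predictor that always answers with a fixed length.
type constPredictor int

func (c constPredictor) AddTrace(int, int, int32) {}
func (c constPredictor) Predict(int) int          { return int(c) }

// predictorRegistry holds one predictor per model and is safe for concurrent use.
type predictorRegistry struct {
	mu      sync.RWMutex
	byModel map[string]OutputPredictor
}

func newPredictorRegistry() *predictorRegistry {
	return &predictorRegistry{byModel: make(map[string]OutputPredictor)}
}

// Register associates a predictor with a model name (hypothetical helper).
func (r *predictorRegistry) Register(model string, p OutputPredictor) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byModel[model] = p
}

// GetOutputPredictor has the same signature as the interface method.
func (r *predictorRegistry) GetOutputPredictor(modelName string) (OutputPredictor, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.byModel[modelName]
	if !ok {
		return nil, fmt.Errorf("no output predictor for model %q", modelName)
	}
	return p, nil
}

// predictFor looks up the model's predictor and asks it for an output length.
func predictFor(model string, promptLen int) (int, error) {
	reg := newPredictorRegistry()
	reg.Register("llama", constPredictor(128))
	p, err := reg.GetOutputPredictor(model)
	if err != nil {
		return 0, err
	}
	return p.Predict(promptLen), nil
}

func main() {
	n, err := predictFor("llama", 32)
	fmt.Println(n, err)
}
```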

type PodList

type PodList interface {
	// Len returns the number of pods in the list.
	Len() int

	// All returns a slice of all pods in the list.
	All() []*v1.Pod

	// Indexes returns a slice of indexes for querying pods by index.
	Indexes() []string

	// ListByIndex returns a slice of pods that match the given index.
	ListByIndex(index string) []*v1.Pod
}

PodList is an interface for a list of pods that supports indexing and querying pods by index.
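A slice-backed implementation with a precomputed index could be sketched as follows. Pod is a local stand-in for *v1.Pod from k8s.io/api/core/v1 so the example builds with the standard library only. The indexedPods type and the choice of a "deployment" label as the index key are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a local stand-in for *v1.Pod (assumption for this sketch).
type Pod struct {
	Name   string
	Labels map[string]string
}

// indexedPods keeps the full list plus a map from index key to matching pods.
type indexedPods struct {
	pods    []*Pod
	byIndex map[string][]*Pod
}

// newIndexedPods groups pods by the key returned from the key function.
func newIndexedPods(pods []*Pod, key func(*Pod) string) *indexedPods {
	l := &indexedPods{pods: pods, byIndex: make(map[string][]*Pod)}
	for _, p := range pods {
		k := key(p)
		l.byIndex[k] = append(l.byIndex[k], p)
	}
	return l
}

func (l *indexedPods) Len() int    { return len(l.pods) }
func (l *indexedPods) All() []*Pod { return l.pods }

// Indexes returns the known index keys in sorted order.
func (l *indexedPods) Indexes() []string {
	keys := make([]string, 0, len(l.byIndex))
	for k := range l.byIndex {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// ListByIndex returns nil when no pod matches the index key.
func (l *indexedPods) ListByIndex(index string) []*Pod { return l.byIndex[index] }

// samplePods builds a list indexed by the hypothetical "deployment" label.
func samplePods() *indexedPods {
	pods := []*Pod{
		{Name: "pod-1", Labels: map[string]string{"deployment": "a"}},
		{Name: "pod-2", Labels: map[string]string{"deployment": "a"}},
		{Name: "pod-3", Labels: map[string]string{"deployment": "b"}},
	}
	return newIndexedPods(pods, func(p *Pod) string { return p.Labels["deployment"] })
}

func main() {
	l := samplePods()
	fmt.Println(l.Len(), l.Indexes(), len(l.ListByIndex("a")))
}
```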

type QueueRouter added in v0.4.0

type QueueRouter interface {
	Router

	Len() int
}

QueueRouter defines the interface for routers that contain a built-in queue and offer queue status queries.

type RequestFeatures added in v0.4.0

type RequestFeatures []float64

type Router

type Router interface {
	// Route selects a target pod from the provided list of pods.
	// The input pod list is guaranteed to be non-empty and to contain only routable pods.
	Route(ctx *RoutingContext, readyPodList PodList) (string, error)
}

Router defines the interface for routing logic to select target pods.
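As an illustration of the Router contract, here is a minimal round-robin router. Pod, RoutingContext, and PodList are simplified local stand-ins, and roundRobin and podSlice are hypothetical names. The sketch assumes the string returned by Route is the target address in ip:port form, which the documentation does not spell out.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Simplified local stand-ins (assumptions for this sketch).
type Pod struct{ Name, IP string }

type RoutingContext struct{ Model string }

type PodList interface {
	Len() int
	All() []*Pod
}

type Router interface {
	Route(ctx *RoutingContext, readyPodList PodList) (string, error)
}

// podSlice is a hypothetical slice-backed PodList.
type podSlice []*Pod

func (s podSlice) Len() int    { return len(s) }
func (s podSlice) All() []*Pod { return s }

// roundRobin cycles through the ready pods using an atomic counter.
type roundRobin struct {
	next atomic.Uint64
	port int
}

var _ Router = (*roundRobin)(nil)

func (r *roundRobin) Route(_ *RoutingContext, pods PodList) (string, error) {
	all := pods.All()
	if len(all) == 0 {
		// Defensive only: callers are documented to pass a non-empty list.
		return "", errors.New("no ready pods")
	}
	i := r.next.Add(1) - 1
	pod := all[i%uint64(len(all))]
	return fmt.Sprintf("%s:%d", pod.IP, r.port), nil
}

// routeN routes n requests over two pods and returns the chosen addresses.
func routeN(n int) []string {
	r := &roundRobin{port: 8000}
	pods := podSlice{{Name: "a", IP: "10.0.0.1"}, {Name: "b", IP: "10.0.0.2"}}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		addr, err := r.Route(&RoutingContext{Model: "demo"}, pods)
		if err != nil {
			break
		}
		out = append(out, addr)
	}
	return out
}

func main() {
	fmt.Println(routeN(3))
}
```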

type RouterConstructor

type RouterConstructor func() (Router, error)

RouterConstructor defines a constructor for a router.

type RouterProvider

type RouterProvider interface {
	// GetRouter returns the router
	GetRouter(ctx *RoutingContext) (Router, error)
}

RouterProvider provides a stateful way to get a router, allowing a struct to provide the router by strategy and model.

type RouterProviderFunc

type RouterProviderFunc func(*RoutingContext) (Router, error)

RouterProviderFunc provides a stateless way to get a router.

type RouterProviderRegistrationFunc

type RouterProviderRegistrationFunc func() RouterProviderFunc

RouterProviderRegistrationFunc provides a way to register a RouterProviderFunc.
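The three function types above could fit together roughly as sketched here. The type declarations mirror the documented signatures with simplified stand-ins for RoutingContext and PodList. The providers map, its register and lookup methods, the shared helper, and fixedRouter are all hypothetical; the package's actual registration mechanism is not shown in this documentation.

```go
package main

import (
	"fmt"
	"sync"
)

// Local copies of the documented types, with simplified stand-ins.
type RoutingAlgorithm string

type RoutingContext struct {
	Algorithm RoutingAlgorithm
	Model     string
}

type PodList interface{ Len() int }

type Router interface {
	Route(ctx *RoutingContext, readyPodList PodList) (string, error)
}

type RouterConstructor func() (Router, error)
type RouterProviderFunc func(*RoutingContext) (Router, error)
type RouterProviderRegistrationFunc func() RouterProviderFunc

// fixedRouter is a hypothetical router that always returns one address.
type fixedRouter struct{ addr string }

func (f fixedRouter) Route(*RoutingContext, PodList) (string, error) { return f.addr, nil }

// providers is a hypothetical registry keyed by algorithm name.
type providers map[RoutingAlgorithm]RouterProviderFunc

// register invokes the registration func once and stores the provider it yields.
func (p providers) register(alg RoutingAlgorithm, reg RouterProviderRegistrationFunc) {
	p[alg] = reg()
}

// lookup resolves the router for the algorithm named in the context.
func (p providers) lookup(ctx *RoutingContext) (Router, error) {
	provider, ok := p[ctx.Algorithm]
	if !ok {
		return nil, fmt.Errorf("unknown routing algorithm %q", ctx.Algorithm)
	}
	return provider(ctx)
}

// shared adapts a constructor so one router is built lazily and reused for every request.
func shared(newRouter RouterConstructor) RouterProviderRegistrationFunc {
	return func() RouterProviderFunc {
		var (
			once sync.Once
			r    Router
			err  error
		)
		return func(*RoutingContext) (Router, error) {
			once.Do(func() { r, err = newRouter() })
			return r, err
		}
	}
}

// demoRegistry returns the routed address and whether an unknown algorithm was rejected.
func demoRegistry() (addr string, unknownRejected bool) {
	p := providers{}
	p.register("fixed", shared(func() (Router, error) {
		return fixedRouter{addr: "10.0.0.9:8000"}, nil
	}))
	ctx := &RoutingContext{Algorithm: "fixed", Model: "demo"}
	if r, err := p.lookup(ctx); err == nil {
		addr, _ = r.Route(ctx, nil)
	}
	_, err := p.lookup(&RoutingContext{Algorithm: "missing"})
	return addr, err != nil
}

func main() {
	fmt.Println(demoRegistry())
}
```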

type RouterQueue added in v0.4.0

type RouterQueue[V comparable] interface {
	Enqueue(V, time.Time) error
	Peek(time.Time, PodList) (V, error)
	Dequeue(time.Time) (V, error)
	Len() int
}
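The RouterQueue interface carries no description here, so the following FIFO is only a guess at one reasonable implementation. ErrQueueEmpty and the interface are copied locally; PodList is a simplified stand-in and fifo is a hypothetical type. The sketch assumes that Peek and Dequeue report an empty queue with ErrQueueEmpty, and it ignores the time and PodList arguments, which a real queue would presumably use for scheduling decisions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrQueueEmpty mirrors the package variable of the same name.
var ErrQueueEmpty = errors.New("queue is empty")

// PodList is a simplified local stand-in.
type PodList interface{ Len() int }

// RouterQueue is copied locally from the documentation above.
type RouterQueue[V comparable] interface {
	Enqueue(V, time.Time) error
	Peek(time.Time, PodList) (V, error)
	Dequeue(time.Time) (V, error)
	Len() int
}

// fifo is a first-in, first-out queue. It is not safe for concurrent use.
type fifo[V comparable] struct{ items []V }

var _ RouterQueue[string] = (*fifo[string])(nil)

func (q *fifo[V]) Enqueue(v V, _ time.Time) error {
	q.items = append(q.items, v)
	return nil
}

// Peek returns the head without removing it.
func (q *fifo[V]) Peek(_ time.Time, _ PodList) (V, error) {
	if len(q.items) == 0 {
		var zero V
		return zero, ErrQueueEmpty
	}
	return q.items[0], nil
}

// Dequeue removes and returns the head.
func (q *fifo[V]) Dequeue(_ time.Time) (V, error) {
	var zero V
	if len(q.items) == 0 {
		return zero, ErrQueueEmpty
	}
	v := q.items[0]
	q.items[0] = zero // drop the reference held by the backing array
	q.items = q.items[1:]
	return v, nil
}

func (q *fifo[V]) Len() int { return len(q.items) }

// demoQueue enqueues two items, dequeues one, then drains the queue past empty.
func demoQueue() (first string, remaining int, drainErr error) {
	now := time.Now()
	q := &fifo[string]{}
	_ = q.Enqueue("req-1", now)
	_ = q.Enqueue("req-2", now)
	first, _ = q.Dequeue(now)
	remaining = q.Len()
	_, _ = q.Dequeue(now)
	_, drainErr = q.Dequeue(now)
	return first, remaining, drainErr
}

func main() {
	first, remaining, err := demoQueue()
	fmt.Println(first, remaining, errors.Is(err, ErrQueueEmpty))
}
```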

type RoutingAlgorithm

type RoutingAlgorithm string

RoutingAlgorithm names a routing algorithm.

func (RoutingAlgorithm) NewContext added in v0.4.0

func (alg RoutingAlgorithm) NewContext(ctx context.Context, model, message, requestID, user string) *RoutingContext

NewContext gets a RoutingContext that uses the current RoutingAlgorithm.

type RoutingContext

type RoutingContext struct {
	context.Context
	Algorithm   RoutingAlgorithm
	Model       string
	Message     string
	RequestID   string
	User        *string
	RequestTime time.Time // Time when the routing context is created.
	PendingLoad float64   // Normalized pending load of request, available after AddRequestCount call. See cache.PendingLoadProvider
	TraceTerm   int64     // Trace term identifier, available after AddRequestCount call.
	RoutedTime  time.Time // Time consumed during routing.

	ReqHeaders map[string]string
	ReqBody    []byte
	ReqPath    string
	// contains filtered or unexported fields
}

RoutingContext encapsulates the context information required for routing. It can be extended with more fields as needed in the future.

func NewRoutingContext

func NewRoutingContext(ctx context.Context, algorithms RoutingAlgorithm, model, message, requestID, user string) *RoutingContext

NewRoutingContext gets a RoutingContext from a context pool.

func (*RoutingContext) CanAddStats added in v0.4.0

func (r *RoutingContext) CanAddStats() bool

CanAddStats returns true only on the first attempt to update the in-memory realtime statistics.

func (*RoutingContext) CanAddTrace added in v0.4.0

func (r *RoutingContext) CanAddTrace() bool

CanAddTrace returns true only on the first attempt to add a trace to the cache.
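A "true only on the first call" gate like this is commonly built on an atomic compare-and-swap. The sketch below shows that pattern under the assumption that concurrent callers are possible; requestState and countWinners are hypothetical names, and the package's own implementation may differ.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// requestState is a hypothetical holder for the one-shot flag.
type requestState struct {
	statsAdded atomic.Bool
}

// CanAddStats flips the flag from false to true and reports whether this call did the flip.
func (s *requestState) CanAddStats() bool {
	return s.statsAdded.CompareAndSwap(false, true)
}

// countWinners runs the given number of concurrent callers and counts how many saw true.
func countWinners(callers int) int {
	var (
		s       requestState
		wg      sync.WaitGroup
		winners atomic.Int32
	)
	for i := 0; i < callers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.CanAddStats() {
				winners.Add(1)
			}
		}()
	}
	wg.Wait()
	return int(winners.Load())
}

func main() {
	fmt.Println(countWinners(16))
}
```

Because the check and the update are a single atomic step, exactly one caller proceeds to update the statistics, however many race.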

func (*RoutingContext) CanDoneStats added in v0.4.0

func (r *RoutingContext) CanDoneStats() bool

func (*RoutingContext) Delete

func (r *RoutingContext) Delete()

Delete resolves all waiting TargetPod() calls and releases the RoutingContext to the pool.

func (*RoutingContext) Elapsed added in v0.4.0

func (r *RoutingContext) Elapsed(currentTime time.Time) time.Duration

Elapsed returns the elapsed time since the request was created.

func (*RoutingContext) Features added in v0.4.0

func (r *RoutingContext) Features() (RequestFeatures, error)

Features returns the features corresponding to the request. The feature of a request is defined by the output length and prompt length.

func (*RoutingContext) GetError added in v0.4.0

func (r *RoutingContext) GetError() error

GetError returns the error of the routing context.

func (*RoutingContext) GetRoutingDelay added in v0.4.0

func (r *RoutingContext) GetRoutingDelay() time.Duration

GetRoutingDelay returns the time duration used for routing the request.
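The two timing accessors can be approximated as below. The timing struct is a hypothetical stand-in that keeps only the RequestTime and RoutedTime fields. Elapsed follows directly from its description; RoutingDelay assumes that RoutedTime records the instant routing finished, which is an interpretation rather than something the documentation confirms.

```go
package main

import (
	"fmt"
	"time"
)

// timing is a hypothetical stand-in holding the two timestamps from RoutingContext.
type timing struct {
	RequestTime time.Time
	RoutedTime  time.Time
}

// Elapsed returns the time between request creation and currentTime.
func (t *timing) Elapsed(currentTime time.Time) time.Duration {
	return currentTime.Sub(t.RequestTime)
}

// RoutingDelay assumes RoutedTime is the instant at which routing finished.
func (t *timing) RoutingDelay() time.Duration {
	return t.RoutedTime.Sub(t.RequestTime)
}

// demoTimingMillis uses fixed timestamps so the result is deterministic.
func demoTimingMillis() (elapsed, delay int64) {
	start := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	t := &timing{RequestTime: start, RoutedTime: start.Add(15 * time.Millisecond)}
	now := start.Add(40 * time.Millisecond)
	return t.Elapsed(now).Milliseconds(), t.RoutingDelay().Milliseconds()
}

func main() {
	fmt.Println(demoTimingMillis())
}
```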

func (*RoutingContext) HasError added in v0.4.0

func (r *RoutingContext) HasError() bool

HasError returns true if the request has an error.

func (*RoutingContext) HasRouted

func (r *RoutingContext) HasRouted() bool

HasRouted returns true if the request has been routed or an error has been set.

func (*RoutingContext) PromptLength added in v0.4.0

func (r *RoutingContext) PromptLength() (int, error)

PromptLength returns the length of the prompt of the request.

func (*RoutingContext) PromptTokens added in v0.4.0

func (r *RoutingContext) PromptTokens() ([]int, error)

PromptTokens returns the tokenized prompt of the request.

func (*RoutingContext) SetError added in v0.4.0

func (r *RoutingContext) SetError(err error)

SetError sets the error of the routing context asynchronously. Asynchronous routers call this to set an error; do not call it from synchronous routers.

func (*RoutingContext) SetOutputPreditor added in v0.4.0

func (r *RoutingContext) SetOutputPreditor(predictor OutputPredictor) (old OutputPredictor)

SetOutputPreditor enables the RoutingContext to use an existing OutputPredictor to predict the output length.

func (*RoutingContext) SetTargetPod

func (r *RoutingContext) SetTargetPod(pod *v1.Pod)

SetTargetPod sets the target pod of the routing context. All routers call this to set the target pod.

func (*RoutingContext) TargetAddress

func (r *RoutingContext) TargetAddress() string

TargetAddress returns the routing target address of the request.

func (*RoutingContext) TargetPod

func (r *RoutingContext) TargetPod() *v1.Pod

TargetPod returns the routing target pod of the request. TargetPod blocks until the target pod is set or an error is set.
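The blocking behavior described for TargetPod, together with SetTargetPod, SetError, and HasRouted, can be modeled with a channel that is closed exactly once. The sketch below is an assumption about the pattern, not the package's code: pendingTarget is a hypothetical type, the pod is reduced to a name string instead of *v1.Pod, and the Err method is an illustrative addition.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pendingTarget resolves once, with either a pod or an error.
type pendingTarget struct {
	once sync.Once
	done chan struct{} // closed when a pod or an error has been set
	pod  string
	err  error
}

func newPendingTarget() *pendingTarget {
	return &pendingTarget{done: make(chan struct{})}
}

// SetTargetPod records the pod and releases all waiters. Later calls are ignored.
func (p *pendingTarget) SetTargetPod(pod string) {
	p.once.Do(func() {
		p.pod = pod
		close(p.done)
	})
}

// SetError records the error and releases all waiters. Later calls are ignored.
func (p *pendingTarget) SetError(err error) {
	p.once.Do(func() {
		p.err = err
		close(p.done)
	})
}

// TargetPod blocks until a pod or an error has been set.
func (p *pendingTarget) TargetPod() string {
	<-p.done
	return p.pod
}

// Err blocks like TargetPod and returns the recorded error, if any.
func (p *pendingTarget) Err() error {
	<-p.done
	return p.err
}

// HasRouted reports, without blocking, whether a pod or an error has been set.
func (p *pendingTarget) HasRouted() bool {
	select {
	case <-p.done:
		return true
	default:
		return false
	}
}

// demoBlocking resolves the target from another goroutine while the caller waits.
func demoBlocking() (before bool, pod string, after bool) {
	p := newPendingTarget()
	before = p.HasRouted()
	go func() {
		time.Sleep(10 * time.Millisecond) // simulate an asynchronous routing decision
		p.SetTargetPod("pod-a")
	}()
	pod = p.TargetPod()
	return before, pod, p.HasRouted()
}

func main() {
	fmt.Println(demoBlocking())
}
```

Closing the channel rather than sending on it releases every waiter at once, which matches the description of Delete resolving all waiting TargetPod() calls.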

func (*RoutingContext) TokenLength added in v0.4.0

func (r *RoutingContext) TokenLength() (int, error)

TokenLength returns the predicted output token length.
