Documentation ¶
Overview ¶
Copyright 2024 The Aibrix Team.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Index ¶
- Constants
- Variables
- type FallbackRouter
- type OutputPredictor
- type OutputPredictorProvider
- type PodList
- type QueueRouter
- type RequestFeatures
- type Router
- type RouterConstructor
- type RouterProvider
- type RouterProviderFunc
- type RouterProviderRegistrationFunc
- type RouterQueue
- type RoutingAlgorithm
- type RoutingContext
- func (r *RoutingContext) CanAddStats() bool
- func (r *RoutingContext) CanAddTrace() bool
- func (r *RoutingContext) CanDoneStats() bool
- func (r *RoutingContext) Delete()
- func (r *RoutingContext) Elapsed(currentTime time.Time) time.Duration
- func (r *RoutingContext) Features() (RequestFeatures, error)
- func (r *RoutingContext) GetError() error
- func (r *RoutingContext) GetRoutingDelay() time.Duration
- func (r *RoutingContext) HasError() bool
- func (r *RoutingContext) HasRouted() bool
- func (r *RoutingContext) PromptLength() (int, error)
- func (r *RoutingContext) PromptTokens() ([]int, error)
- func (r *RoutingContext) SetError(err error)
- func (r *RoutingContext) SetOutputPreditor(predictor OutputPredictor) (old OutputPredictor)
- func (r *RoutingContext) SetTargetPod(pod *v1.Pod)
- func (r *RoutingContext) TargetAddress() string
- func (r *RoutingContext) TargetPod() *v1.Pod
- func (r *RoutingContext) TokenLength() (int, error)
Constants ¶
const DefaultQueueCapacity = 1024
Variables ¶
var (
ErrQueueEmpty = errors.New("queue is empty")
)
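Callers typically compare a returned error against a sentinel such as ErrQueueEmpty. The sketch below shows that pattern with a hypothetical FIFO queue (simpleQueue and its methods are illustrative and not part of this package). It redeclares the constant and the error locally so that it compiles on its own, and it assumes DefaultQueueCapacity is used only as an initial capacity hint.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented constant and sentinel error,
// redeclared so this sketch compiles without the package.
const DefaultQueueCapacity = 1024

var ErrQueueEmpty = errors.New("queue is empty")

// simpleQueue is a hypothetical FIFO queue of request IDs.
type simpleQueue struct {
	items []string
}

// newSimpleQueue preallocates room for DefaultQueueCapacity items
// (an assumption about how the constant might be used).
func newSimpleQueue() *simpleQueue {
	return &simpleQueue{items: make([]string, 0, DefaultQueueCapacity)}
}

// Enqueue appends a request ID to the tail of the queue.
func (q *simpleQueue) Enqueue(id string) {
	q.items = append(q.items, id)
}

// Dequeue removes the head of the queue, or returns ErrQueueEmpty.
func (q *simpleQueue) Dequeue() (string, error) {
	if len(q.items) == 0 {
		return "", ErrQueueEmpty
	}
	id := q.items[0]
	q.items = q.items[1:]
	return id, nil
}

func main() {
	q := newSimpleQueue()
	q.Enqueue("req-1")
	for {
		id, err := q.Dequeue()
		if errors.Is(err, ErrQueueEmpty) {
			fmt.Println("queue drained")
			return
		}
		fmt.Println("dequeued", id)
	}
}
```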
Functions ¶
This section is empty.
Types ¶
type FallbackRouter ¶ added in v0.4.0
type FallbackRouter interface {
Router
// SetFallback sets the fallback router
SetFallback(RoutingAlgorithm, RouterProviderFunc)
}
FallbackRouter enables router chaining by setting a fallback router.
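One way to picture router chaining is sketched below. The types are simplified stand-ins for the package's own, and chainedRouter, staticRouter, and the choice to overwrite ctx.Algorithm before delegating are assumptions made for illustration, not the package's documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the package types.
type RoutingAlgorithm string

type RoutingContext struct{ Algorithm RoutingAlgorithm }

type PodList interface{ Len() int }

type Router interface {
	Route(ctx *RoutingContext, readyPodList PodList) (string, error)
}

type RouterProviderFunc func(*RoutingContext) (Router, error)

// podCount is a trivial PodList that only knows its length.
type podCount int

func (p podCount) Len() int { return int(p) }

// staticRouter is a hypothetical router that always returns one address.
type staticRouter struct{ addr string }

func (s staticRouter) Route(*RoutingContext, PodList) (string, error) {
	return s.addr, nil
}

var errNoDecision = errors.New("primary router made no decision")

// chainedRouter is a hypothetical FallbackRouter implementation.
type chainedRouter struct {
	fallbackAlg      RoutingAlgorithm
	fallbackProvider RouterProviderFunc
}

// SetFallback records which algorithm and provider to use on failure.
func (r *chainedRouter) SetFallback(alg RoutingAlgorithm, p RouterProviderFunc) {
	r.fallbackAlg, r.fallbackProvider = alg, p
}

// primary stands in for the router's own logic; here it always gives up.
func (r *chainedRouter) primary(*RoutingContext, PodList) (string, error) {
	return "", errNoDecision
}

// Route tries the primary logic first and delegates to the fallback on error.
func (r *chainedRouter) Route(ctx *RoutingContext, pods PodList) (string, error) {
	addr, err := r.primary(ctx, pods)
	if err == nil {
		return addr, nil
	}
	if r.fallbackProvider == nil {
		return "", err
	}
	ctx.Algorithm = r.fallbackAlg // assumption: the context tracks the active algorithm
	fb, ferr := r.fallbackProvider(ctx)
	if ferr != nil {
		return "", ferr
	}
	return fb.Route(ctx, pods)
}

func main() {
	r := &chainedRouter{}
	r.SetFallback("random", func(*RoutingContext) (Router, error) {
		return staticRouter{addr: "10.0.0.1:8000"}, nil
	})
	ctx := &RoutingContext{Algorithm: "primary"}
	addr, err := r.Route(ctx, podCount(2))
	fmt.Println(addr, err, ctx.Algorithm)
}
```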
type OutputPredictor ¶ added in v0.4.0
type OutputPredictorProvider ¶ added in v0.4.0
type OutputPredictorProvider interface {
// GetOutputPredictor returns the output predictor
GetOutputPredictor(modelName string) (OutputPredictor, error)
}
OutputPredictorProvider provides a stateful way to get an output predictor, allowing a struct to provide the output predictor by model.
type PodList ¶
type PodList interface {
// Len returns the number of pods in the list.
Len() int
// All returns a slice of all pods in the list.
All() []*v1.Pod
// Indexes returns a slice of indexes for querying pods by index.
Indexes() []string
// ListByIndex returns a slice of pods that match the given index.
ListByIndex(index string) []*v1.Pod
}
PodList is an interface for a list of pods that supports indexing and querying pods by index.
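A minimal sketch of a PodList-shaped type follows. It assumes an index is simply a string key derived from each pod; Pod is a stand-in for *v1.Pod, and indexedPods, newIndexedPods, and the Model field are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a stand-in for v1.Pod from k8s.io/api/core/v1.
type Pod struct {
	Name  string
	Model string // hypothetical attribute used as the index key
}

// indexedPods is a hypothetical slice-backed list with a prebuilt index.
type indexedPods struct {
	pods    []*Pod
	byIndex map[string][]*Pod
}

// newIndexedPods groups pods by the key returned from keyFn.
func newIndexedPods(pods []*Pod, keyFn func(*Pod) string) *indexedPods {
	l := &indexedPods{pods: pods, byIndex: make(map[string][]*Pod)}
	for _, p := range pods {
		k := keyFn(p)
		l.byIndex[k] = append(l.byIndex[k], p)
	}
	return l
}

// Len returns the number of pods in the list.
func (l *indexedPods) Len() int { return len(l.pods) }

// All returns every pod in insertion order.
func (l *indexedPods) All() []*Pod { return l.pods }

// Indexes returns the known index keys in sorted order.
func (l *indexedPods) Indexes() []string {
	keys := make([]string, 0, len(l.byIndex))
	for k := range l.byIndex {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// ListByIndex returns the pods stored under index, or nil if there are none.
func (l *indexedPods) ListByIndex(index string) []*Pod {
	return l.byIndex[index]
}

func main() {
	pods := []*Pod{
		{Name: "pod-1", Model: "llama"},
		{Name: "pod-2", Model: "qwen"},
		{Name: "pod-3", Model: "llama"},
	}
	l := newIndexedPods(pods, func(p *Pod) string { return p.Model })
	for _, idx := range l.Indexes() {
		fmt.Println(idx, len(l.ListByIndex(idx)))
	}
}
```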
type QueueRouter ¶ added in v0.4.0
QueueRouter defines the interface for routers that contain a built-in queue and offer queue status queries.
type RequestFeatures ¶ added in v0.4.0
type RequestFeatures []float64
type Router ¶
type Router interface {
// Route selects a target pod from the provided list of pods.
// The input pod list is guaranteed to be non-empty and to contain only routable pods.
Route(ctx *RoutingContext, readyPodList PodList) (string, error)
}
Router defines the interface for routing logic to select target pods.
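As an illustration of the Route contract, the sketch below implements a round-robin router against simplified stand-in types. roundRobinRouter, podSlice, and the stand-in TargetAddress (which returns only the pod IP) are assumptions for this example; the real RoutingContext and PodList carry more behavior.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Pod is a stand-in for v1.Pod.
type Pod struct{ Name, IP string }

// PodList is a reduced version of the documented interface.
type PodList interface {
	Len() int
	All() []*Pod
}

// RoutingContext is a stand-in that only remembers the chosen pod.
type RoutingContext struct{ target *Pod }

func (c *RoutingContext) SetTargetPod(p *Pod) { c.target = p }

// TargetAddress here returns only the IP; the real format may differ.
func (c *RoutingContext) TargetAddress() string { return c.target.IP }

type Router interface {
	Route(ctx *RoutingContext, readyPodList PodList) (string, error)
}

// podSlice adapts a plain slice to PodList.
type podSlice []*Pod

func (s podSlice) Len() int    { return len(s) }
func (s podSlice) All() []*Pod { return s }

// roundRobinRouter is a hypothetical router that cycles through ready pods.
type roundRobinRouter struct{ next atomic.Uint64 }

var _ Router = (*roundRobinRouter)(nil)

// Route picks the next pod in order, records it on the context,
// and returns its address.
func (r *roundRobinRouter) Route(ctx *RoutingContext, pods PodList) (string, error) {
	if pods.Len() == 0 {
		// Defensive check; the documented contract says this should not happen.
		return "", errors.New("no ready pods")
	}
	all := pods.All()
	i := (r.next.Add(1) - 1) % uint64(len(all))
	ctx.SetTargetPod(all[i])
	return ctx.TargetAddress(), nil
}

func main() {
	pods := podSlice{{Name: "pod-a", IP: "10.0.0.1"}, {Name: "pod-b", IP: "10.0.0.2"}}
	r := &roundRobinRouter{}
	for i := 0; i < 3; i++ {
		addr, err := r.Route(&RoutingContext{}, pods)
		fmt.Println(addr, err)
	}
}
```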
type RouterConstructor ¶
RouterConstructor defines a constructor for a router.
type RouterProvider ¶
type RouterProvider interface {
// GetRouter returns the router
GetRouter(ctx *RoutingContext) (Router, error)
}
RouterProvider provides a stateful way to get a router, allowing a struct to provide the router by strategy and model.
type RouterProviderFunc ¶
type RouterProviderFunc func(*RoutingContext) (Router, error)
RouterProviderFunc provides a stateless way to get a router.
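The stateful and stateless provider forms can be converted into each other. The sketch below shows one way to do that with simplified stand-in types; perModelProvider, funcProvider, and fixedRouter are hypothetical, and the Router interface is reduced to a single-argument Route for brevity.

```go
package main

import "fmt"

// Simplified stand-ins for the package types.
type RoutingContext struct{ Model string }

type Router interface {
	Route(ctx *RoutingContext) (string, error)
}

type RouterProvider interface {
	GetRouter(ctx *RoutingContext) (Router, error)
}

type RouterProviderFunc func(*RoutingContext) (Router, error)

// fixedRouter is a hypothetical router that always returns one address.
type fixedRouter struct{ addr string }

func (f fixedRouter) Route(*RoutingContext) (string, error) { return f.addr, nil }

// perModelProvider is a hypothetical stateful provider keyed by model name.
type perModelProvider struct{ routers map[string]Router }

func (p *perModelProvider) GetRouter(ctx *RoutingContext) (Router, error) {
	r, ok := p.routers[ctx.Model]
	if !ok {
		return nil, fmt.Errorf("no router for model %q", ctx.Model)
	}
	return r, nil
}

// funcProvider is a hypothetical adapter that lets a RouterProviderFunc
// satisfy the RouterProvider interface.
type funcProvider struct{ fn RouterProviderFunc }

func (p funcProvider) GetRouter(ctx *RoutingContext) (Router, error) {
	return p.fn(ctx)
}

func main() {
	stateful := &perModelProvider{
		routers: map[string]Router{"llama": fixedRouter{addr: "10.0.0.1"}},
	}
	// A method value turns the stateful provider into a function.
	var asFunc RouterProviderFunc = stateful.GetRouter
	// The adapter turns the function back into an interface value.
	var asProvider RouterProvider = funcProvider{fn: asFunc}

	r, err := asProvider.GetRouter(&RoutingContext{Model: "llama"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	addr, _ := r.Route(nil)
	fmt.Println(addr)
}
```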
type RouterProviderRegistrationFunc ¶
type RouterProviderRegistrationFunc func() RouterProviderFunc
RouterProviderRegistrationFunc provides a way to register a RouterProviderFunc.
type RouterQueue ¶ added in v0.4.0
type RoutingAlgorithm ¶
type RoutingAlgorithm string
RoutingAlgorithm defines the name of a routing algorithm.
func (RoutingAlgorithm) NewContext ¶ added in v0.4.0
func (alg RoutingAlgorithm) NewContext(ctx context.Context, model, message, requestID, user string) *RoutingContext
NewContext gets a RoutingContext with the current RoutingAlgorithm.
type RoutingContext ¶
type RoutingContext struct {
context.Context
Algorithm RoutingAlgorithm
Model string
Message string
RequestID string
User *string
RequestTime time.Time // Time when the routing context is created.
PendingLoad float64 // Normalized pending load of request, available after AddRequestCount call. See cache.PendingLoadProvider
TraceTerm int64 // Trace term identifier, available after AddRequestCount call.
RoutedTime time.Time // Time when the request was routed; see GetRoutingDelay for the time consumed during routing.
ReqHeaders map[string]string
ReqBody []byte
ReqPath string
// contains filtered or unexported fields
}
RoutingContext encapsulates the context information required for routing. It can be extended with more fields as needed in the future.
func NewRoutingContext ¶
func NewRoutingContext(ctx context.Context, algorithms RoutingAlgorithm, model, message, requestID, user string) *RoutingContext
NewRoutingContext gets a RoutingContext from a context pool.
func (*RoutingContext) CanAddStats ¶ added in v0.4.0
func (r *RoutingContext) CanAddStats() bool
CanAddStats returns true only on the first attempt to update the in-memory realtime statistics.
func (*RoutingContext) CanAddTrace ¶ added in v0.4.0
func (r *RoutingContext) CanAddTrace() bool
CanAddTrace returns true only on the first attempt to add a trace to the cache.
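A "true only on the first attempt" guard of this kind is commonly built on an atomic compare-and-swap. The sketch below shows that technique in isolation; onceFlag, tryAcquire, and countWinners are hypothetical names, and nothing here is confirmed to match how RoutingContext implements these methods.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// onceFlag is a hypothetical one-shot guard.
type onceFlag struct{ done atomic.Bool }

// tryAcquire returns true for exactly one caller, even under concurrency,
// because only one CompareAndSwap from false to true can succeed.
func (f *onceFlag) tryAcquire() bool {
	return f.done.CompareAndSwap(false, true)
}

// countWinners starts n goroutines racing on one flag and
// reports how many of them acquired it.
func countWinners(n int) int {
	var f onceFlag
	var wins atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if f.tryAcquire() {
				wins.Add(1)
			}
		}()
	}
	wg.Wait()
	return int(wins.Load())
}

func main() {
	fmt.Println("winners:", countWinners(8))
}
```

With a guard like this, a caller can wrap a statistics update in `if flag.tryAcquire() { ... }` so that retries do not count the same request twice.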
func (*RoutingContext) CanDoneStats ¶ added in v0.4.0
func (r *RoutingContext) CanDoneStats() bool
func (*RoutingContext) Delete ¶
func (r *RoutingContext) Delete()
Delete resolves all waiting TargetPod() calls and releases the RoutingContext to the pool.
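Obtaining a context from a pool and releasing it again is a pattern that sync.Pool supports directly. The sketch below shows the general shape under that assumption; pooledContext, newPooledContext, and release are hypothetical, and the real RoutingContext has more state to reset and also resolves waiting TargetPod() calls on Delete.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pooledContext is a hypothetical, reduced routing context.
type pooledContext struct {
	Model       string
	RequestID   string
	RequestTime time.Time
}

// ctxPool recycles pooledContext values to reduce allocations.
var ctxPool = sync.Pool{
	New: func() any { return new(pooledContext) },
}

// newPooledContext takes a context from the pool and overwrites every
// field, so stale data from a previous request cannot leak through.
func newPooledContext(model, requestID string) *pooledContext {
	c := ctxPool.Get().(*pooledContext)
	*c = pooledContext{Model: model, RequestID: requestID, RequestTime: time.Now()}
	return c
}

// release clears the context and returns it to the pool.
// The caller must not use c after this call.
func (c *pooledContext) release() {
	*c = pooledContext{}
	ctxPool.Put(c)
}

func main() {
	c := newPooledContext("llama", "req-1")
	fmt.Println(c.Model, c.RequestID)
	c.release()

	c2 := newPooledContext("qwen", "req-2")
	fmt.Println(c2.Model, c2.RequestID)
	c2.release()
}
```

The main design constraint with pooling is ownership: once a context has been released, any reference to it that is still held elsewhere may observe another request's data.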
func (*RoutingContext) Elapsed ¶ added in v0.4.0
func (r *RoutingContext) Elapsed(currentTime time.Time) time.Duration
Elapsed returns the elapsed time since the request was created.
func (*RoutingContext) Features ¶ added in v0.4.0
func (r *RoutingContext) Features() (RequestFeatures, error)
Features returns the features corresponding to the request. The features of a request are defined by its output length and prompt length.
func (*RoutingContext) GetError ¶ added in v0.4.0
func (r *RoutingContext) GetError() error
GetError returns the error of the routing context.
func (*RoutingContext) GetRoutingDelay ¶ added in v0.4.0
func (r *RoutingContext) GetRoutingDelay() time.Duration
GetRoutingDelay returns the time duration used for routing the request.
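The two timing accessors can be pictured as simple arithmetic over the RequestTime and RoutedTime fields. The sketch below assumes that the routing delay is the difference between those two timestamps and that it is zero before routing completes; timedContext is a hypothetical stand-in, and the actual computation is not specified in this documentation.

```go
package main

import (
	"fmt"
	"time"
)

// timedContext is a hypothetical stand-in holding only the two timestamps.
type timedContext struct {
	RequestTime time.Time // when the context was created
	RoutedTime  time.Time // when routing finished (zero until then)
}

// newTimedContext stamps the creation time.
func newTimedContext() *timedContext {
	return &timedContext{RequestTime: time.Now()}
}

// Elapsed returns the time between request creation and currentTime.
func (c *timedContext) Elapsed(currentTime time.Time) time.Duration {
	return currentTime.Sub(c.RequestTime)
}

// GetRoutingDelay returns how long routing took, or zero if the
// request has not been routed yet (an assumption of this sketch).
func (c *timedContext) GetRoutingDelay() time.Duration {
	if c.RoutedTime.IsZero() {
		return 0
	}
	return c.RoutedTime.Sub(c.RequestTime)
}

func main() {
	c := newTimedContext()
	c.RoutedTime = c.RequestTime.Add(5 * time.Millisecond)
	fmt.Println("routing delay:", c.GetRoutingDelay())
	fmt.Println("elapsed after 1s:", c.Elapsed(c.RequestTime.Add(time.Second)))
}
```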
func (*RoutingContext) HasError ¶ added in v0.4.0
func (r *RoutingContext) HasError() bool
HasError returns true if the request has an error.
func (*RoutingContext) HasRouted ¶
func (r *RoutingContext) HasRouted() bool
HasRouted returns true if the request has been routed or an error has been set.
func (*RoutingContext) PromptLength ¶ added in v0.4.0
func (r *RoutingContext) PromptLength() (int, error)
PromptLength returns the length of the prompt of the request.
func (*RoutingContext) PromptTokens ¶ added in v0.4.0
func (r *RoutingContext) PromptTokens() ([]int, error)
PromptTokens returns the tokenized prompt of the request.
func (*RoutingContext) SetError ¶ added in v0.4.0
func (r *RoutingContext) SetError(err error)
SetError sets the error of the routing context asynchronously. Do not call this function from synchronous routers. Asynchronous routers call this to set an error.
func (*RoutingContext) SetOutputPreditor ¶ added in v0.4.0
func (r *RoutingContext) SetOutputPreditor(predictor OutputPredictor) (old OutputPredictor)
SetOutputPreditor lets the RoutingContext use an existing OutputPredictor to predict the output length, and returns the previously set predictor.
func (*RoutingContext) SetTargetPod ¶
func (r *RoutingContext) SetTargetPod(pod *v1.Pod)
SetTargetPod sets the target pod of the routing context. All routers call this to set the target pod.
func (*RoutingContext) TargetAddress ¶
func (r *RoutingContext) TargetAddress() string
TargetAddress returns the routing target address of the request.
func (*RoutingContext) TargetPod ¶
func (r *RoutingContext) TargetPod() *v1.Pod
TargetPod returns the routing target pod of the request. TargetPod blocks until the target pod is set or an error is set.
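The blocking behavior described for TargetPod, together with SetTargetPod, SetError, and HasRouted, resembles a one-shot future. The sketch below shows one common way to build that with a channel and sync.Once; asyncResult and its constructor are hypothetical, Pod is a stand-in for v1.Pod, and the real RoutingContext may be implemented differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Pod is a stand-in for v1.Pod.
type Pod struct{ Name string }

var errNoCapacity = errors.New("no capacity")

// asyncResult is a hypothetical one-shot result holder.
type asyncResult struct {
	once sync.Once
	done chan struct{} // closed once a pod or an error has been set
	pod  *Pod
	err  error
}

func newAsyncResult() *asyncResult {
	return &asyncResult{done: make(chan struct{})}
}

// SetTargetPod resolves the result with a pod; later calls are ignored.
func (r *asyncResult) SetTargetPod(p *Pod) {
	r.once.Do(func() { r.pod = p; close(r.done) })
}

// SetError resolves the result with an error; later calls are ignored.
func (r *asyncResult) SetError(err error) {
	r.once.Do(func() { r.err = err; close(r.done) })
}

// TargetPod blocks until a pod or an error has been set.
func (r *asyncResult) TargetPod() *Pod {
	<-r.done
	return r.pod
}

// HasRouted reports, without blocking, whether the result is resolved.
func (r *asyncResult) HasRouted() bool {
	select {
	case <-r.done:
		return true
	default:
		return false
	}
}

// GetError returns the error if one was set, without blocking.
func (r *asyncResult) GetError() error {
	if r.HasRouted() {
		return r.err
	}
	return nil
}

func main() {
	r := newAsyncResult()
	go func() {
		time.Sleep(10 * time.Millisecond) // simulate time spent in a queue
		r.SetTargetPod(&Pod{Name: "pod-a"})
	}()
	fmt.Println("routed to", r.TargetPod().Name)

	failed := newAsyncResult()
	failed.SetError(errNoCapacity)
	fmt.Println(failed.TargetPod() == nil, failed.GetError())
}
```

Closing the channel rather than sending on it lets any number of waiting callers wake up at once, which fits the description of several TargetPod() calls being resolved together.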
func (*RoutingContext) TokenLength ¶ added in v0.4.0
func (r *RoutingContext) TokenLength() (int, error)
TokenLength returns the predicted output token length.