serving

package
v1.2.15 Latest
Warning

This package is not in the latest version of its module.

Published: Apr 18, 2026 License: MIT Imports: 18 Imported by: 0

Documentation

Overview

Package serving implements a minimal programmatic serving layer.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AgenticLayer

type AgenticLayer struct {
	MemoryManager   *memory.DefaultManager
	WorkflowManager *core.WorkflowManager
	SkillRegistry   *core.SkillRegistry
	PolicyStore     *access.Store
	// contains filtered or unexported fields
}

AgenticLayer is the serving layer that manages LLM backends and executes inference.

func NewAgenticLayer

func NewAgenticLayer() *AgenticLayer

NewAgenticLayer creates a new serving layer.

func (*AgenticLayer) AddLLMBackend

func (l *AgenticLayer) AddLLMBackend(name string, backend *core.LLMBackend, modelID string)

AddLLMBackend registers an LLM backend by name.

func (*AgenticLayer) CheckBackendAccess added in v1.2.13

func (l *AgenticLayer) CheckBackendAccess(tags map[string]string, backendName string) access.Decision

CheckBackendAccess checks whether the given tags permit access to the named backend.

func (*AgenticLayer) CheckDataAccess added in v1.2.13

func (l *AgenticLayer) CheckDataAccess(tags map[string]string, backendName string, dataLabels []string) access.Decision

CheckDataAccess checks whether data with the given labels may be sent to the named backend.

func (*AgenticLayer) CheckManifestBounds added in v1.2.15

func (l *AgenticLayer) CheckManifestBounds(manifest *core.SkillManifest, backendName string, toolNames []string, dataLabels []string) access.Decision

CheckManifestBounds verifies that the actual request resources are within the skill's declared manifest.

func (*AgenticLayer) CheckSkillAccess added in v1.2.15

func (l *AgenticLayer) CheckSkillAccess(tags map[string]string, skillName string) access.Decision

CheckSkillAccess checks whether the given tags permit invocation of the named skill.

func (*AgenticLayer) CheckSkillEnvelope added in v1.2.15

func (l *AgenticLayer) CheckSkillEnvelope(tags map[string]string, skillID string, manifest *core.SkillManifest) access.Decision

CheckSkillEnvelope verifies that the skill's manifest is within the intersection of three bounds: the manifest itself, skill-scoped policies, and the invoker's policies.

func (*AgenticLayer) CheckToolAccess added in v1.2.13

func (l *AgenticLayer) CheckToolAccess(tags map[string]string, tools []*mcp.Tool) access.Decision

CheckToolAccess checks whether the given tags permit all requested tools. Returns the first denial encountered, or an allowed decision if all tools pass.
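
The first-denial behavior described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the decision struct, its Allowed and Reason fields, and the checkOne callback are hypothetical stand-ins, because the fields of access.Decision are not shown on this page.

```go
package main

import "fmt"

// decision is a hypothetical stand-in for access.Decision; its real fields
// are not documented on this page.
type decision struct {
	Allowed bool
	Reason  string
}

// checkAll evaluates tools in order and returns the first denial it sees.
// If every tool passes (including the empty list), it returns an allowed
// decision.
func checkAll(tools []string, checkOne func(tool string) decision) decision {
	for _, t := range tools {
		if d := checkOne(t); !d.Allowed {
			return d
		}
	}
	return decision{Allowed: true}
}

func main() {
	// Hypothetical policy: only these tools are permitted.
	permitted := map[string]bool{"search": true, "read_file": true}
	check := func(tool string) decision {
		if permitted[tool] {
			return decision{Allowed: true}
		}
		return decision{Reason: "tool not permitted: " + tool}
	}

	fmt.Println(checkAll([]string{"search", "read_file"}, check).Allowed)
	// Both "shell" and "rm" are denied, but only the first denial is reported.
	fmt.Println(checkAll([]string{"search", "shell", "rm"}, check).Reason)
}
```

Returning on the first denial means later tools are never evaluated, so a caller that needs a full report of denied tools would have to check them one at a time, for example with CheckToolAccessByName.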

func (*AgenticLayer) CheckToolAccessByName added in v1.2.15

func (l *AgenticLayer) CheckToolAccessByName(tags map[string]string, toolName string) access.Decision

CheckToolAccessByName checks whether the given tags permit a single tool by name.

func (*AgenticLayer) Execute

func (l *AgenticLayer) Execute(ctx context.Context, serverName, stageName string, messages []model.Message, tools []*mcp.Tool, opts model.InferenceOptions, chatOpts ...ChatOptions) (*model.Response, error)

Execute runs a single non-streaming inference call against the LLM backend named by serverName. opts.Stream must be false; for streaming, use ExecuteStream instead.

func (*AgenticLayer) ExecuteStream

func (l *AgenticLayer) ExecuteStream(ctx context.Context, serverName, stageName string, messages []model.Message, tools []*mcp.Tool, opts model.InferenceOptions, chatOpts ...ChatOptions) (*model.Response, <-chan model.StreamEvent, error)

ExecuteStream runs inference with streaming. It returns the response (filled in as the stream is consumed), a channel of stream events, and an error. The caller must consume the channel until it is closed; the response's content, tool calls, and metrics are populated by the provider's goroutine as the stream completes. opts.Stream must be true.
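
The consume-until-closed contract can be illustrated with a self-contained sketch. Everything here is hypothetical: streamEvent and response stand in for model.StreamEvent and model.Response (whose fields are not shown on this page), and fakeStream imitates a provider goroutine rather than calling ExecuteStream.

```go
package main

import (
	"fmt"
	"strings"
)

// streamEvent and response are hypothetical stand-ins for model.StreamEvent
// and model.Response.
type streamEvent struct{ Delta string }

type response struct{ Content string }

// fakeStream mimics the documented contract: it returns a response that a
// producer goroutine fills in, plus a channel that is closed when the
// stream is complete.
func fakeStream(chunks []string) (*response, <-chan streamEvent) {
	resp := &response{}
	events := make(chan streamEvent)
	go func() {
		defer close(events)
		var sb strings.Builder
		for _, c := range chunks {
			sb.WriteString(c)
			events <- streamEvent{Delta: c}
		}
		// The response is populated before the channel is closed, so a
		// consumer that drains the channel observes the final content.
		resp.Content = sb.String()
	}()
	return resp, events
}

// drain consumes every event until the channel is closed and returns the
// number of events seen.
func drain(events <-chan streamEvent) int {
	n := 0
	for range events {
		n++
	}
	return n
}

func main() {
	resp, events := fakeStream([]string{"Hel", "lo"})
	n := drain(events)
	// Reading resp is safe only after the channel has been closed.
	fmt.Println(n, resp.Content) // prints "2 Hello"
}
```

In this sketch, abandoning the channel early would leave the producer goroutine blocked on its next send, which is one reason a contract like this asks the caller to drain the channel.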

func (*AgenticLayer) GetCostModel added in v1.2.11

func (l *AgenticLayer) GetCostModel(backendName string) *core.CostModel

GetCostModel returns the CostModel for a registered backend, or nil if not found or unset.

func (*AgenticLayer) GetLLMBackendHealth

func (l *AgenticLayer) GetLLMBackendHealth(ctx context.Context, serverName string) (HealthStatus, error)

GetLLMBackendHealth returns the health status of a named LLM backend.

func (*AgenticLayer) GetModelProvider

func (l *AgenticLayer) GetModelProvider(ctx context.Context, backendName string) (model.Provider, error)

GetModelProvider returns the model provider for a named LLM backend.

func (*AgenticLayer) ListLLMBackends

func (l *AgenticLayer) ListLLMBackends() []string

ListLLMBackends returns all registered LLM backend names.

func (*AgenticLayer) NotifyWorkflowComplete

func (l *AgenticLayer) NotifyWorkflowComplete(ctx context.Context, workflowID string, backends []string)

NotifyWorkflowComplete emits TransitionWorkflowComplete signals for each backend the workflow used, then deregisters the workflow from the tracker.

func (*AgenticLayer) SelectBackendByAccuracy added in v1.2.11

func (l *AgenticLayer) SelectBackendByAccuracy(accuracy float64, policy string, defaultBackend string) (string, error)

SelectBackendByAccuracy picks the cheapest backend whose Quality >= accuracy. The policy parameter controls fallback behavior (see AccuracyPolicyPrefer, AccuracyPolicyStrict).

func (*AgenticLayer) StartPressureMonitor

func (l *AgenticLayer) StartPressureMonitor(ctx context.Context)

StartPressureMonitor launches the background memory pressure polling loop. It dynamically queries the current set of backends on each tick and stops when ctx is cancelled.

func (*AgenticLayer) UpdateBackend added in v1.2.11

func (l *AgenticLayer) UpdateBackend(name string, update BackendUpdate) error

UpdateBackend applies a partial update to a registered backend.

func (*AgenticLayer) ValidateAccess added in v1.2.15

func (l *AgenticLayer) ValidateAccess(
	tags map[string]string,
	backend string,
	toolNames []string,
	dataLabels []string,
	skillID string,
) (access.Decision, map[string]string)

ValidateAccess runs all access control checks for a request and returns the first denial, or an allowed decision if everything passes. Both handleExecute and handleAccessCheck call this method.

If skillID is non-empty, the skill must be registered. The method performs skill visibility, envelope, and manifest bounds checks before the standard backend/tool/data checks. On success with a skill, the returned tags map includes the injected "skill" tag for downstream policy matching.
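
The ordering described above (skill checks first, then the standard checks, with a "skill" tag injected on the way) can be sketched as below. This is an illustration under stated assumptions, not the package's code: decision and check are hypothetical types, the skill registration lookup is omitted, and both the point at which the tag is injected and the choice to copy rather than mutate the caller's map are guesses.

```go
package main

import "fmt"

// decision is a hypothetical stand-in for access.Decision.
type decision struct {
	Allowed bool
	Reason  string
}

// check is one step in the chain; it receives the effective tags.
type check func(tags map[string]string) decision

// firstDenial runs checks in order and reports the first denial, if any.
func firstDenial(tags map[string]string, checks []check) (decision, bool) {
	for _, c := range checks {
		if d := c(tags); !d.Allowed {
			return d, true
		}
	}
	return decision{}, false
}

// validate copies the caller's tags, runs the skill checks when skillID is
// non-empty, injects the "skill" tag, and then runs the standard checks.
func validate(tags map[string]string, skillID string,
	skillChecks, standardChecks []check) (decision, map[string]string) {

	effective := make(map[string]string, len(tags)+1)
	for k, v := range tags {
		effective[k] = v
	}
	if skillID != "" {
		if d, denied := firstDenial(effective, skillChecks); denied {
			return d, effective
		}
		effective["skill"] = skillID
	}
	if d, denied := firstDenial(effective, standardChecks); denied {
		return d, effective
	}
	return decision{Allowed: true}, effective
}

func main() {
	allow := func(map[string]string) decision { return decision{Allowed: true} }
	// Hypothetical standard check: only team=ml may proceed.
	teamOnly := func(tags map[string]string) decision {
		if tags["team"] == "ml" {
			return decision{Allowed: true}
		}
		return decision{Reason: "team not permitted"}
	}

	d, tags := validate(map[string]string{"team": "ml"}, "summarize",
		[]check{allow}, []check{teamOnly})
	fmt.Println(d.Allowed, tags["skill"]) // prints "true summarize"
}
```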

type BackendUpdate added in v1.2.11

type BackendUpdate struct {
	CostModel      *core.CostModel `json:"cost_model,omitempty"`
	Quality        *float64        `json:"quality,omitempty"`
	MaxConcurrency *int            `json:"max_concurrency,omitempty"`
}

BackendUpdate holds the optional fields that can be live-updated on a registered backend. Nil fields are left unchanged.
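
Pointer fields are what make "nil means unchanged" expressible: a JSON patch that omits a key leaves the corresponding pointer nil. The sketch below shows the idea with a local copy of two of the fields. CostModel is left out because core.CostModel is not shown on this page, and backendState and apply are hypothetical helpers, not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backendUpdate mirrors the Quality and MaxConcurrency fields of
// BackendUpdate, including their JSON tags.
type backendUpdate struct {
	Quality        *float64 `json:"quality,omitempty"`
	MaxConcurrency *int     `json:"max_concurrency,omitempty"`
}

// backendState is a hypothetical holder for a backend's mutable settings.
type backendState struct {
	Quality        float64
	MaxConcurrency int
}

// apply copies only the non-nil fields of u into s.
func apply(s *backendState, u backendUpdate) {
	if u.Quality != nil {
		s.Quality = *u.Quality
	}
	if u.MaxConcurrency != nil {
		s.MaxConcurrency = *u.MaxConcurrency
	}
}

// decodeAndApply parses a JSON patch and applies it to s.
func decodeAndApply(s *backendState, patch string) error {
	var u backendUpdate
	if err := json.Unmarshal([]byte(patch), &u); err != nil {
		return err
	}
	apply(s, u)
	return nil
}

func main() {
	s := backendState{Quality: 0.7, MaxConcurrency: 4}
	// Only quality is present, so MaxConcurrency keeps its old value.
	if err := decodeAndApply(&s, `{"quality": 0.9}`); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.Quality, s.MaxConcurrency) // prints "0.9 4"
}
```

With plain (non-pointer) fields, an omitted key would decode to the zero value and be indistinguishable from an explicit request to set the field to 0.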

type ChatOptions

type ChatOptions struct {
	WorkflowID  string
	CachePolicy string
}

ChatOptions carries optional metadata for a scheduled chat request.

type HealthStatus

type HealthStatus string

HealthStatus represents the health status of an LLM backend.

const (
	HealthStatusHealthy     HealthStatus = "healthy"
	HealthStatusDegraded    HealthStatus = "degraded"
	HealthStatusUnavailable HealthStatus = "unavailable"
)

type LLMBackendManager

type LLMBackendManager struct {
	// contains filtered or unexported fields
}

LLMBackendManager manages a pool of LLM backend configurations and their providers.

func NewLLMBackendManager

func NewLLMBackendManager(mm *memory.DefaultManager) *LLMBackendManager

NewLLMBackendManager creates a new LLM backend manager.

func (*LLMBackendManager) AddLLMBackend

func (m *LLMBackendManager) AddLLMBackend(name string, backend *core.LLMBackend, modelID string)

AddLLMBackend registers an LLM backend by name.

func (*LLMBackendManager) GetCostModel added in v1.2.11

func (m *LLMBackendManager) GetCostModel(backendName string) *core.CostModel

GetCostModel returns the CostModel for a registered backend, or nil if not found or unset.

func (*LLMBackendManager) GetHealthStatus

func (m *LLMBackendManager) GetHealthStatus(ctx context.Context, backendName string) (HealthStatus, error)

GetHealthStatus returns the health status of an LLM backend.

func (*LLMBackendManager) GetModelID

func (m *LLMBackendManager) GetModelID(backendName string) string

GetModelID returns the modelID string for a registered backend, or "" if not found.

func (*LLMBackendManager) GetModelProvider

func (m *LLMBackendManager) GetModelProvider(ctx context.Context, backendName string) (model.Provider, error)

GetModelProvider returns a cached provider for an LLM backend, creating it if necessary.

func (*LLMBackendManager) ListLLMBackends

func (m *LLMBackendManager) ListLLMBackends() []string

ListLLMBackends returns a list of all LLM backend names.

func (*LLMBackendManager) ScheduleChat

func (m *LLMBackendManager) ScheduleChat(ctx context.Context, backendName, stageName string, messages []model.Message, tools []*mcp.Tool, opts model.InferenceOptions, chatOpts ...ChatOptions) (*model.Response, <-chan model.StreamEvent, error)

ScheduleChat queues a request for execution under the backend's scheduling policy. stageName identifies the stage queue inside the backend; an empty stageName selects the "default" queue.

func (*LLMBackendManager) SelectBackendByAccuracy added in v1.2.11

func (m *LLMBackendManager) SelectBackendByAccuracy(accuracy float64, policy string, defaultBackend string) (string, error)

SelectBackendByAccuracy returns the cheapest registered backend whose Quality >= accuracy and that has a CostModel set. Ties are broken by ascending output cost, then input cost, then backend name.

The policy parameter controls fallback behavior when no backend meets the threshold:

  • "strict": return an error.
  • "prefer" (or empty, the default): fall back to the cheapest costed backend, or defaultBackend if no backends have cost models.
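
The selection and fallback rules above can be sketched as a small standalone function. Several things here are assumptions: the backend and cost types are hypothetical (the fields of core.CostModel are not shown on this page), "cheapest" is read as the ordering output cost, then input cost, then name, and any policy value other than "strict" is treated as "prefer".

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// cost is a hypothetical stand-in for core.CostModel.
type cost struct{ Input, Output float64 }

// backend is a hypothetical record; a nil Cost means no cost model is set.
type backend struct {
	Name    string
	Quality float64
	Cost    *cost
}

// cheapest sorts bs by output cost, then input cost, then name, and returns
// the first element. bs must be non-empty and every Cost must be non-nil.
func cheapest(bs []backend) backend {
	sort.Slice(bs, func(i, j int) bool {
		a, b := bs[i], bs[j]
		if a.Cost.Output != b.Cost.Output {
			return a.Cost.Output < b.Cost.Output
		}
		if a.Cost.Input != b.Cost.Input {
			return a.Cost.Input < b.Cost.Input
		}
		return a.Name < b.Name
	})
	return bs[0]
}

// selectByAccuracy applies the documented rules to a list of backends.
func selectByAccuracy(all []backend, accuracy float64, policy, def string) (string, error) {
	var costed, eligible []backend
	for _, b := range all {
		if b.Cost == nil {
			continue // backends without a cost model are never selected
		}
		costed = append(costed, b)
		if b.Quality >= accuracy {
			eligible = append(eligible, b)
		}
	}
	switch {
	case len(eligible) > 0:
		return cheapest(eligible).Name, nil
	case policy == "strict":
		return "", errors.New("no backend meets the accuracy threshold")
	case len(costed) > 0:
		return cheapest(costed).Name, nil
	default:
		return def, nil
	}
}

func main() {
	backends := []backend{
		{Name: "small", Quality: 0.60, Cost: &cost{Input: 1, Output: 2}},
		{Name: "medium", Quality: 0.80, Cost: &cost{Input: 2, Output: 6}},
		{Name: "large", Quality: 0.95, Cost: &cost{Input: 5, Output: 15}},
		{Name: "uncosted", Quality: 0.99},
	}
	name, _ := selectByAccuracy(backends, 0.75, "prefer", "small")
	fmt.Println(name) // prints "medium"

	_, err := selectByAccuracy(backends, 0.97, "strict", "small")
	fmt.Println(err != nil) // prints "true"

	name, _ = selectByAccuracy(backends, 0.97, "", "large")
	fmt.Println(name) // prints "small"
}
```

Note that the "uncosted" backend is skipped even though its quality is the highest, which matches the requirement that a selected backend have a CostModel set.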

func (*LLMBackendManager) UpdateBackend added in v1.2.11

func (m *LLMBackendManager) UpdateBackend(name string, update BackendUpdate) error

UpdateBackend applies a partial update to an existing backend's mutable fields. Returns an error if the backend is not registered.

Directories

Path Synopsis
Package access implements access control policy evaluation for the Orla serving layer.
Package api provides the HTTP API for the serving layer daemon.
Package cost provides helpers for token-based cost estimation.
Package memory implements the Memory Manager for Orla's agentic serving layer.
Package metrics provides Prometheus metrics for the Orla serving layer.
