package serving

Version: v1.2.15. Published: Apr 18, 2026. License: MIT. Imports: 18. Imported by: 0.

Repository

github.com/harvard-cns/orla

Documentation ¶

Overview ¶

Package serving implements a minimal programmatic serving layer.

Index ¶

type AgenticLayer
- func NewAgenticLayer() *AgenticLayer
type BackendUpdate
type ChatOptions
type HealthStatus
type LLMBackendManager
- func NewLLMBackendManager(mm *memory.DefaultManager) *LLMBackendManager

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type AgenticLayer ¶

type AgenticLayer struct {
	MemoryManager   *memory.DefaultManager
	WorkflowManager *core.WorkflowManager
	SkillRegistry   *core.SkillRegistry
	PolicyStore     *access.Store
	// contains filtered or unexported fields
}

AgenticLayer is the serving layer that manages LLM backends and executes inference.

func NewAgenticLayer ¶

func NewAgenticLayer() *AgenticLayer

NewAgenticLayer creates a new serving layer.

func (*AgenticLayer) AddLLMBackend ¶

func (l *AgenticLayer) AddLLMBackend(name string, backend *core.LLMBackend, modelID string)

AddLLMBackend registers an LLM backend by name.

func (*AgenticLayer) CheckBackendAccess ¶ added in v1.2.13

func (l *AgenticLayer) CheckBackendAccess(tags map[string]string, backendName string) access.Decision

CheckBackendAccess checks whether the given tags permit access to the named backend.

func (*AgenticLayer) CheckDataAccess ¶ added in v1.2.13

func (l *AgenticLayer) CheckDataAccess(tags map[string]string, backendName string, dataLabels []string) access.Decision

CheckDataAccess checks whether data with the given labels may be sent to the named backend.

func (*AgenticLayer) CheckManifestBounds ¶ added in v1.2.15

func (l *AgenticLayer) CheckManifestBounds(manifest *core.SkillManifest, backendName string, toolNames []string, dataLabels []string) access.Decision

CheckManifestBounds verifies that the actual request resources are within the skill's declared manifest.

func (*AgenticLayer) CheckSkillAccess ¶ added in v1.2.15

func (l *AgenticLayer) CheckSkillAccess(tags map[string]string, skillName string) access.Decision

CheckSkillAccess checks whether the given tags permit invocation of the named skill.

func (*AgenticLayer) CheckSkillEnvelope ¶ added in v1.2.15

func (l *AgenticLayer) CheckSkillEnvelope(tags map[string]string, skillID string, manifest *core.SkillManifest) access.Decision

CheckSkillEnvelope verifies that the skill's manifest is within the intersection of three bounds: the manifest itself, skill-scoped policies, and the invoker's policies.

func (*AgenticLayer) CheckToolAccess ¶ added in v1.2.13

func (l *AgenticLayer) CheckToolAccess(tags map[string]string, tools []*mcp.Tool) access.Decision

CheckToolAccess checks whether the given tags permit all requested tools. It returns the first denial encountered, or an allowed decision if every tool passes.

func (*AgenticLayer) CheckToolAccessByName ¶ added in v1.2.15

func (l *AgenticLayer) CheckToolAccessByName(tags map[string]string, toolName string) access.Decision

CheckToolAccessByName checks whether the given tags permit a single tool by name.

func (*AgenticLayer) Execute ¶

func (l *AgenticLayer) Execute(ctx context.Context, serverName, stageName string, messages []model.Message, tools []*mcp.Tool, opts model.InferenceOptions, chatOpts ...ChatOptions) (*model.Response, error)

Execute runs a single non-streaming inference call against the named LLM backend. opts.Stream must be false; for streaming, use ExecuteStream instead.

func (*AgenticLayer) ExecuteStream ¶

func (l *AgenticLayer) ExecuteStream(ctx context.Context, serverName, stageName string, messages []model.Message, tools []*mcp.Tool, opts model.InferenceOptions, chatOpts ...ChatOptions) (*model.Response, <-chan model.StreamEvent, error)

ExecuteStream runs inference with streaming. It returns the response (filled as the stream is consumed), a channel of stream events, and an error. The caller must consume the channel until closed; the response content, tool_calls, and metrics are populated by the provider's goroutine as the stream completes. opts.Stream must be true.
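The consumption contract described above can be sketched without the real package. In this minimal example, response, streamEvent, and fakeExecuteStream are hypothetical stand-ins for model.Response, model.StreamEvent, and ExecuteStream; their fields are assumptions, since the actual types are not shown here. The point is only the ordering: drain the channel until it is closed, then read the response.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// response and streamEvent are hypothetical stand-ins for model.Response
// and model.StreamEvent; the real field names may differ.
type response struct{ Content string }
type streamEvent struct{ Delta string }

// fakeExecuteStream imitates the documented contract: the response is
// filled in by a goroutine and is complete once the channel is closed.
func fakeExecuteStream(ctx context.Context, chunks []string) (*response, <-chan streamEvent, error) {
	resp := &response{}
	events := make(chan streamEvent)
	go func() {
		defer close(events) // runs after resp.Content has been written
		var sb strings.Builder
		for _, c := range chunks {
			select {
			case events <- streamEvent{Delta: c}:
				sb.WriteString(c)
			case <-ctx.Done():
				return
			}
		}
		resp.Content = sb.String()
	}()
	return resp, events, nil
}

// collectAll drains the event channel before touching the response.
func collectAll(chunks []string) (string, int) {
	resp, events, err := fakeExecuteStream(context.Background(), chunks)
	if err != nil {
		return "", 0
	}
	n := 0
	for range events { // consume until the channel is closed
		n++
	}
	return resp.Content, n // safe to read only after the loop ends
}

func main() {
	content, n := collectAll([]string{"Hel", "lo"})
	fmt.Println(content, n)
}
```

Reading the response before the loop finishes would race with the goroutine that fills it in, which is why the caller is required to consume the channel to the end.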

func (*AgenticLayer) GetCostModel ¶ added in v1.2.11

func (l *AgenticLayer) GetCostModel(backendName string) *core.CostModel

GetCostModel returns the CostModel for a registered backend, or nil if not found or unset.

func (*AgenticLayer) GetLLMBackendHealth ¶

func (l *AgenticLayer) GetLLMBackendHealth(ctx context.Context, serverName string) (HealthStatus, error)

GetLLMBackendHealth returns the health status of a named LLM backend.

func (*AgenticLayer) GetModelProvider ¶

func (l *AgenticLayer) GetModelProvider(ctx context.Context, backendName string) (model.Provider, error)

GetModelProvider returns the model provider for a named LLM backend.

func (*AgenticLayer) ListLLMBackends ¶

func (l *AgenticLayer) ListLLMBackends() []string

ListLLMBackends returns all registered LLM backend names.

func (*AgenticLayer) NotifyWorkflowComplete ¶

func (l *AgenticLayer) NotifyWorkflowComplete(ctx context.Context, workflowID string, backends []string)

NotifyWorkflowComplete emits TransitionWorkflowComplete signals for each backend the workflow used, then deregisters the workflow from the tracker.

func (*AgenticLayer) SelectBackendByAccuracy ¶ added in v1.2.11

func (l *AgenticLayer) SelectBackendByAccuracy(accuracy float64, policy string, defaultBackend string) (string, error)

SelectBackendByAccuracy picks the cheapest backend whose Quality >= accuracy. Policy controls fallback behavior (see AccuracyPolicyPrefer, AccuracyPolicyStrict).

func (*AgenticLayer) StartPressureMonitor ¶

func (l *AgenticLayer) StartPressureMonitor(ctx context.Context)

StartPressureMonitor launches the background memory pressure polling loop. It dynamically queries the current set of backends on each tick and stops when ctx is cancelled.
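A polling loop of this shape is commonly written with a ticker and a select on ctx.Done(). The sketch below is an assumption about the general pattern, not the package's implementation: runMonitor, list, and poll are hypothetical names, and the real method launches the loop in the background, whereas this sketch runs it synchronously so that it is easy to test.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runMonitor polls every backend returned by list on each tick and returns
// when ctx is cancelled. list is called again on every tick, so backends
// added after the loop starts are picked up.
func runMonitor(ctx context.Context, interval time.Duration, list func() []string, poll func(name string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, name := range list() {
				poll(name)
			}
		}
	}
}

// pollUntil runs the loop until at least n polls have happened, then
// cancels the context and reports how many polls were observed.
func pollUntil(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	count := 0
	list := func() []string { return []string{"backend-a"} }
	poll := func(string) {
		count++
		if count >= n {
			cancel()
		}
	}
	runMonitor(ctx, time.Millisecond, list, poll)
	return count
}

func main() {
	fmt.Println(pollUntil(3) >= 3)
}
```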

func (*AgenticLayer) UpdateBackend ¶ added in v1.2.11

func (l *AgenticLayer) UpdateBackend(name string, update BackendUpdate) error

UpdateBackend applies a partial update to a registered backend.

func (*AgenticLayer) ValidateAccess ¶ added in v1.2.15

func (l *AgenticLayer) ValidateAccess(
	tags map[string]string,
	backend string,
	toolNames []string,
	dataLabels []string,
	skillID string,
) (access.Decision, map[string]string)

ValidateAccess runs all access control checks for a request and returns the first denial, or an allowed decision if everything passes. Both handleExecute and handleAccessCheck call this method.

If skillID is non-empty, the skill must be registered. The method performs skill visibility, envelope, and manifest bounds checks before the standard backend/tool/data checks. On success with a skill, the returned tags map includes the injected "skill" tag for downstream policy matching.
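The "first denial wins" behavior and the tag injection can be illustrated with a small self-contained sketch. Here decision and check are hypothetical stand-ins (the fields of access.Decision are not shown in this documentation), and two details are assumptions: that the injected "skill" tag carries the skill ID as its value, and that the caller's original tags are returned unchanged on denial.

```go
package main

import "fmt"

// decision is a hypothetical stand-in for access.Decision.
type decision struct {
	Allowed bool
	Reason  string
}

// check is one access control step evaluated against the effective tags.
type check func(tags map[string]string) decision

// validate runs the checks in order and returns the first denial. When a
// skill is involved, the checks see a copy of the tags with "skill" added.
func validate(tags map[string]string, skillID string, checks ...check) (decision, map[string]string) {
	effective := make(map[string]string, len(tags)+1)
	for k, v := range tags {
		effective[k] = v
	}
	if skillID != "" {
		effective["skill"] = skillID // assumed value of the injected tag
	}
	for _, c := range checks {
		if d := c(effective); !d.Allowed {
			return d, tags // stop at the first denial
		}
	}
	return decision{Allowed: true}, effective
}

func main() {
	allow := func(map[string]string) decision { return decision{Allowed: true} }
	deny := func(map[string]string) decision { return decision{Reason: "tool denied"} }

	d, tags := validate(map[string]string{"team": "a"}, "summarize", allow, allow)
	fmt.Println(d.Allowed, tags["skill"])

	d, _ = validate(map[string]string{"team": "a"}, "", allow, deny, allow)
	fmt.Println(d.Allowed, d.Reason)
}
```

Copying the tags before injecting the skill entry keeps the caller's map untouched, which matters when the same tags are reused across requests.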

type BackendUpdate ¶ added in v1.2.11

type BackendUpdate struct {
	CostModel      *core.CostModel `json:"cost_model,omitempty"`
	Quality        *float64        `json:"quality,omitempty"`
	MaxConcurrency *int            `json:"max_concurrency,omitempty"`
}

BackendUpdate holds the optional fields that can be live-updated on a registered backend. Nil fields are left unchanged.
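Pointer fields are what make "nil means unchanged" work: a field omitted from the JSON payload decodes to nil and is skipped. The sketch below mirrors the documented struct shape with local types; costModel and its fields, backendState, and applyJSON are hypothetical stand-ins, since core.CostModel and the backend's internal state are not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// costModel is a hypothetical stand-in for core.CostModel.
type costModel struct {
	InputPerToken  float64 `json:"input_per_token"`
	OutputPerToken float64 `json:"output_per_token"`
}

// backendUpdate mirrors the documented BackendUpdate fields and JSON tags.
type backendUpdate struct {
	CostModel      *costModel `json:"cost_model,omitempty"`
	Quality        *float64   `json:"quality,omitempty"`
	MaxConcurrency *int       `json:"max_concurrency,omitempty"`
}

// backendState is a hypothetical view of a backend's mutable fields.
type backendState struct {
	CostModel      *costModel
	Quality        float64
	MaxConcurrency int
}

// apply copies only the fields that are set in the update.
func (s *backendState) apply(u backendUpdate) {
	if u.CostModel != nil {
		s.CostModel = u.CostModel
	}
	if u.Quality != nil {
		s.Quality = *u.Quality
	}
	if u.MaxConcurrency != nil {
		s.MaxConcurrency = *u.MaxConcurrency
	}
}

// applyJSON decodes a partial update and applies it to the state.
func applyJSON(s *backendState, payload string) error {
	var u backendUpdate
	if err := json.Unmarshal([]byte(payload), &u); err != nil {
		return err
	}
	s.apply(u)
	return nil
}

func main() {
	s := backendState{Quality: 0.7, MaxConcurrency: 4}
	if err := applyJSON(&s, `{"quality": 0.9}`); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.Quality, s.MaxConcurrency) // only Quality changes
}
```

With plain value fields, an omitted max_concurrency would be indistinguishable from an explicit 0, so the pointer form is the usual choice for partial updates.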

type ChatOptions ¶

type ChatOptions struct {
	WorkflowID  string
	CachePolicy string
}

ChatOptions carries optional metadata for a scheduled chat request.

type HealthStatus ¶

type HealthStatus string

HealthStatus represents the health status of an LLM backend.

const (
	HealthStatusHealthy     HealthStatus = "healthy"
	HealthStatusDegraded    HealthStatus = "degraded"
	HealthStatusUnavailable HealthStatus = "unavailable"
)

type LLMBackendManager ¶

type LLMBackendManager struct {
	// contains filtered or unexported fields
}

LLMBackendManager manages a pool of LLM backend configurations and their providers.

func NewLLMBackendManager ¶

func NewLLMBackendManager(mm *memory.DefaultManager) *LLMBackendManager

NewLLMBackendManager creates a new LLM backend manager.

func (*LLMBackendManager) AddLLMBackend ¶

func (m *LLMBackendManager) AddLLMBackend(name string, backend *core.LLMBackend, modelID string)

AddLLMBackend registers an LLM backend by name.

func (*LLMBackendManager) GetCostModel ¶ added in v1.2.11

func (m *LLMBackendManager) GetCostModel(backendName string) *core.CostModel

GetCostModel returns the CostModel for a registered backend, or nil if not found or unset.

func (*LLMBackendManager) GetHealthStatus ¶

func (m *LLMBackendManager) GetHealthStatus(ctx context.Context, backendName string) (HealthStatus, error)

GetHealthStatus returns the health status of an LLM backend.

func (*LLMBackendManager) GetModelID ¶

func (m *LLMBackendManager) GetModelID(backendName string) string

GetModelID returns the modelID string for a registered backend, or "" if not found.

func (*LLMBackendManager) GetModelProvider ¶

func (m *LLMBackendManager) GetModelProvider(ctx context.Context, backendName string) (model.Provider, error)

GetModelProvider returns a cached provider for an LLM backend, creating it if necessary.
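A "return cached, create if necessary" lookup is often a mutex-guarded map. The following is a generic sketch of that pattern under stated assumptions, not the manager's actual code: provider, providerCache, and newCountingCache are hypothetical names, and whether the real manager caches creation failures is not documented.

```go
package main

import (
	"fmt"
	"sync"
)

// provider is a hypothetical stand-in for model.Provider.
type provider struct{ backend string }

// providerCache creates a provider on first use and reuses it afterwards.
type providerCache struct {
	mu        sync.Mutex
	providers map[string]*provider
	create    func(backend string) (*provider, error)
}

func (c *providerCache) get(backend string) (*provider, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if p, ok := c.providers[backend]; ok {
		return p, nil
	}
	p, err := c.create(backend)
	if err != nil {
		return nil, err // failures are not cached, so a later call retries
	}
	c.providers[backend] = p
	return p, nil
}

// newCountingCache returns a cache whose factory counts its invocations.
func newCountingCache() (*providerCache, *int) {
	calls := new(int)
	c := &providerCache{
		providers: make(map[string]*provider),
		create: func(backend string) (*provider, error) {
			*calls++
			return &provider{backend: backend}, nil
		},
	}
	return c, calls
}

func main() {
	c, calls := newCountingCache()
	p1, _ := c.get("backend-a")
	p2, _ := c.get("backend-a")
	fmt.Println(p1 == p2, *calls) // prints "true 1"
}
```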

func (*LLMBackendManager) ListLLMBackends ¶

func (m *LLMBackendManager) ListLLMBackends() []string

ListLLMBackends returns a list of all LLM backend names.

func (*LLMBackendManager) ScheduleChat ¶

func (m *LLMBackendManager) ScheduleChat(ctx context.Context, backendName, stageName string, messages []model.Message, tools []*mcp.Tool, opts model.InferenceOptions, chatOpts ...ChatOptions) (*model.Response, <-chan model.StreamEvent, error)

ScheduleChat queues a request for execution under the backend's scheduling policy. stageName identifies the stage queue inside the backend; an empty stageName selects the "default" queue.

func (*LLMBackendManager) SelectBackendByAccuracy ¶ added in v1.2.11

func (m *LLMBackendManager) SelectBackendByAccuracy(accuracy float64, policy string, defaultBackend string) (string, error)

SelectBackendByAccuracy returns the cheapest registered backend whose Quality >= accuracy and that has a CostModel set. Ties are broken by ascending output cost, then input cost, then backend name.

The policy parameter controls fallback behavior when no backend meets the threshold:

"strict": return an error.
"prefer" (or empty, the default): fall back to the cheapest costed backend, or defaultBackend if no backends have cost models.
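The selection and fallback rules above can be sketched as follows. This is an illustration under stated assumptions rather than the package's implementation: backend is a hypothetical stand-in whose cost fields are invented for the example, the documented tie-break order (output cost, input cost, name) is used as the whole cost ordering, and any policy other than "strict" is treated as "prefer".

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// backend is a hypothetical stand-in; the fields of core.LLMBackend and
// core.CostModel are not shown in this documentation.
type backend struct {
	Name       string
	Quality    float64
	HasCost    bool    // whether a cost model is set
	InputCost  float64 // assumed input price
	OutputCost float64 // assumed output price
}

// cheaper orders by output cost, then input cost, then name.
func cheaper(a, b backend) bool {
	if a.OutputCost != b.OutputCost {
		return a.OutputCost < b.OutputCost
	}
	if a.InputCost != b.InputCost {
		return a.InputCost < b.InputCost
	}
	return a.Name < b.Name
}

// selectByAccuracy returns the cheapest costed backend whose quality meets
// the threshold, applying the policy when none does.
func selectByAccuracy(backends []backend, accuracy float64, policy, defaultBackend string) (string, error) {
	var costed []backend
	for _, b := range backends {
		if b.HasCost {
			costed = append(costed, b)
		}
	}
	sort.Slice(costed, func(i, j int) bool { return cheaper(costed[i], costed[j]) })
	for _, b := range costed {
		if b.Quality >= accuracy {
			return b.Name, nil // first match in cost order is the cheapest
		}
	}
	if policy == "strict" {
		return "", errors.New("no backend meets the accuracy threshold")
	}
	if len(costed) > 0 {
		return costed[0].Name, nil // "prefer": cheapest costed backend
	}
	return defaultBackend, nil // no cost models at all
}

// demoBackends returns sample data used by main.
func demoBackends() []backend {
	return []backend{
		{Name: "large", Quality: 0.9, HasCost: true, InputCost: 5, OutputCost: 10},
		{Name: "small", Quality: 0.6, HasCost: true, InputCost: 0.5, OutputCost: 1},
		{Name: "local", Quality: 0.95}, // no cost model, so never selected by cost
	}
}

func main() {
	name, err := selectByAccuracy(demoBackends(), 0.8, "prefer", "large")
	fmt.Println(name, err)
	name, err = selectByAccuracy(demoBackends(), 0.99, "strict", "large")
	fmt.Println(name, err)
}
```

Note that the highest-quality backend in the sample data is never chosen because it has no cost model, which matches the documented requirement that a candidate has a CostModel set.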

func (*LLMBackendManager) UpdateBackend ¶ added in v1.2.11

func (m *LLMBackendManager) UpdateBackend(name string, update BackendUpdate) error

UpdateBackend applies a partial update to an existing backend's mutable fields. It returns an error if the backend is not registered.

Directories ¶

Path	Synopsis
access	Package access implements access control policy evaluation for the Orla serving layer.
api	Package api provides the HTTP API for the serving layer daemon.
cost	Package cost provides helpers for token-based cost estimation.
memory	Package memory implements the Memory Manager for Orla's agentic serving layer.
metrics	Package metrics provides Prometheus metrics for the Orla serving layer.
