routing

package
v0.10.1 Latest
Warning

This package is not in the latest version of its module.

Published: May 2, 2026 | License: MIT | Imports: 6 | Imported by: 0

Documentation

Overview

Package routing implements the unified routing engine for fizeau. It ranks (harness, provider, model) candidates uniformly per CONTRACT-003.

The engine consolidates DDx-side harness-tier routing and agent-side per-provider failover into a single ranking pipeline.

Constants

const (
	ProviderPreferenceLocalFirst        = "local-first"
	ProviderPreferenceSubscriptionFirst = "subscription-first"
	ProviderPreferenceLocalOnly         = "local-only"
	ProviderPreferenceSubscriptionOnly  = "subscription-only"
)
const (
	QuotaTrendUnknown    = "unknown"
	QuotaTrendHealthy    = "healthy"
	QuotaTrendBurning    = "burning"
	QuotaTrendExhausting = "exhausting"
)
const (
	// CostSourceCatalog means cost came from the model catalog.
	CostSourceCatalog = "catalog"
	// CostSourceSubscription means cost came from subscription quota pricing.
	CostSourceSubscription = "subscription"
	// CostSourceUnknown means no reliable cost estimate is available.
	CostSourceUnknown = "unknown"
	// CostSourceUserConfig means cost came from explicit user configuration.
	CostSourceUserConfig = "user-config"
)

Variables

var ProfileEscalationLadder = []string{"cheap", "standard", "smart"}

ProfileEscalationLadder is the fixed cheap → standard → smart progression that service.ResolveRoute walks when every candidate at the requested tier is filtered out (unhealthy or capability-rejected).

Functions

func EscalateProfileAware

func EscalateProfileAware(current string, ladder []string, req Request, in Inputs) string

EscalateProfileAware is a helper for tier escalation. Given a request that failed at one profile, it returns the next profile to try, restricted to profiles that have a viable candidate under the current Inputs (that is, the profile's resolved concrete model exists on the request's pinned provider, if any).

Fixes ddx-3c5ba7cc: tier escalation respects --provider affinity.

Returns "" when no further profile is viable.
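The escalation walk can be pictured with a small standalone sketch. It does not call this package: nextViableProfile and the viable callback are hypothetical stand-ins, and the callback abstracts the "has a viable candidate under the current Inputs" check, whose real implementation is not shown here. Returning "" for a profile that is not on the ladder is also an assumption.

```go
package main

import "fmt"

// nextViableProfile is a hypothetical stand-in for EscalateProfileAware.
// viable abstracts the per-profile viability check described above.
func nextViableProfile(current string, ladder []string, viable func(profile string) bool) string {
	// Locate the current profile on the ladder.
	idx := -1
	for i, p := range ladder {
		if p == current {
			idx = i
			break
		}
	}
	if idx < 0 {
		return "" // assumption: an unknown profile has nowhere to escalate
	}
	// Walk upward only, skipping tiers that have no viable candidate.
	for _, p := range ladder[idx+1:] {
		if viable(p) {
			return p
		}
	}
	return ""
}

func main() {
	ladder := []string{"cheap", "standard", "smart"}
	// Pretend the pinned provider only serves the "smart" tier's model.
	onlySmart := func(p string) bool { return p == "smart" }

	fmt.Printf("%q\n", nextViableProfile("cheap", ladder, onlySmart)) // "smart"
	fmt.Printf("%q\n", nextViableProfile("smart", ladder, onlySmart)) // ""
}
```

In this sketch "standard" is skipped because the pinned provider cannot serve it, which is the provider-affinity behaviour the description above refers to.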

func FuzzyMatch

func FuzzyMatch(input string, pool []string) string

FuzzyMatch returns the best concrete model from pool for the given input, using canonical-form (case-insensitive, vendor-prefix-stripped) matching. Returns "" when no candidate matches.

Algorithm:

  1. Exact match (case-insensitive on canonical form) wins outright.
  2. Prefix/suffix/contains match on canonical form. Among fuzzy matches, prefer prefix, then suffix, then contains, and within each tier prefer the shortest remaining text (most specific match).
  3. No match returns "".
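The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: canonical and fuzzyMatchSketch are hypothetical names, the vendor prefix is assumed to be everything up to the last "/", an empty input is assumed to match nothing, and the model names are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// canonical lower-cases s and drops everything up to the last "/".
// Assumption: this approximates the "vendor-prefix-stripped" canonical form.
func canonical(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	if i := strings.LastIndex(s, "/"); i >= 0 {
		s = s[i+1:]
	}
	return s
}

// fuzzyMatchSketch mirrors the documented tiers: exact, then prefix,
// then suffix, then contains, preferring the shortest remaining text.
func fuzzyMatchSketch(input string, pool []string) string {
	want := canonical(input)
	if want == "" {
		return "" // assumption: empty input matches nothing
	}
	best, bestTier, bestRest := "", 3, 0
	for _, cand := range pool {
		got := canonical(cand)
		if got == want {
			return cand // step 1: exact match wins outright
		}
		tier := 3
		switch {
		case strings.HasPrefix(got, want):
			tier = 0
		case strings.HasSuffix(got, want):
			tier = 1
		case strings.Contains(got, want):
			tier = 2
		}
		if tier == 3 {
			continue
		}
		rest := len(got) - len(want) // text left over after the match
		if tier < bestTier || (tier == bestTier && rest < bestRest) {
			best, bestTier, bestRest = cand, tier, rest
		}
	}
	return best // "" when nothing matched
}

func main() {
	pool := []string{
		"vendor-a/alpha-large-preview",
		"vendor-a/alpha-large",
		"vendor-b/beta-small",
	}
	fmt.Println(fuzzyMatchSketch("ALPHA-LARGE", pool)) // exact on canonical form
	fmt.Println(fuzzyMatchSketch("alpha", pool))       // prefix, shortest remainder
	fmt.Println(fuzzyMatchSketch("small", pool))       // suffix
	fmt.Printf("%q\n", fuzzyMatchSketch("gamma", pool))
}
```

Tracking the tier and the leftover length in a single pass avoids sorting the pool, at the cost of making the first candidate win any exact tie.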

func ProviderModelKey

func ProviderModelKey(provider, endpoint, model string) string

ProviderModelKey is the metrics key used by routing callers for provider performance signals. Endpoint is optional; when empty the key remains compatible with older provider:model metrics.

func SameModelIntent

func SameModelIntent(a, b string) bool

SameModelIntent returns true if two model strings refer to the same model after canonicalization. Used by capability gating and provider matching.

Types

type Candidate

type Candidate struct {
	Harness            string
	Provider           string
	Endpoint           string
	Model              string
	Score              float64
	CostUSDPer1kTokens float64
	CostSource         string
	Power              int
	Eligible           bool
	Reason             string

	// FilterReason is the typed disqualification category, set at the
	// rejection site that decided why this candidate is ineligible.
	// Empty for eligible candidates. Service-layer code maps this to the
	// public FilterReason* string constants without parsing free-form
	// Reason text.
	FilterReason FilterReason

	// LatencyMS, SuccessRate, and CostClass expose the score-component
	// inputs so callers can render per-axis explanations alongside the
	// final Score. Zero / negative values mean unknown (see Inputs docs).
	LatencyMS   float64
	SuccessRate float64
	CostClass   string
	SpeedTPS    float64

	QuotaOK          bool
	QuotaPercentUsed int
	QuotaTrend       string
}

Candidate is one ranked routing option.

type Capabilities

type Capabilities struct {
	ContextWindow      int      // resolved tokens; 0 = unknown
	SupportsTools      bool     // supports tool/function calling
	SupportsStreaming  bool     // supports streaming responses
	SupportedReasoning []string // supported public reasoning values
	MaxReasoningTokens int      // 0 means numeric reasoning is unsupported/unknown
	SupportedPerms     []string // {"safe","supervised","unrestricted"} subset
	ExactPinSupport    bool     // accepts exact concrete model pins
	SupportedModels    []string // nil = no static allow-list
}

Capabilities describes what a (harness, provider, model) tuple can do. Populated from harness config + catalog metadata + provider discovery.

func (Capabilities) HasModel

func (c Capabilities) HasModel(model string) bool

HasModel returns true when the exact model pin is within the static harness allow-list. A nil allow-list means the harness delegates validation to provider-side model checks.

func (Capabilities) HasPermissions

func (c Capabilities) HasPermissions(perm string) bool

HasPermissions returns true if the candidate supports the requested level. An empty permission level always returns true.

func (Capabilities) HasReasoning

func (c Capabilities) HasReasoning(value string) bool

HasReasoning returns true if the candidate supports the requested reasoning value. Empty, auto, off, and numeric 0 impose no requirement.
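The "no requirement" cases of the three Has* methods are easy to miss, so here is a minimal standalone sketch of them. The caps type and its lower-case methods are local stand-ins, not the package's code. Treating a nil allow-list as "accept any model" is an assumption drawn from the delegation wording above, and numeric reasoning values above 0 (which presumably involve MaxReasoningTokens) are not modelled.

```go
package main

import "fmt"

// caps mirrors three fields of Capabilities for illustration only.
type caps struct {
	SupportedModels    []string
	SupportedPerms     []string
	SupportedReasoning []string
}

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// hasModel: a nil allow-list defers validation to the provider (assumed
// here to mean "accept"); otherwise the pin must be listed exactly.
func (c caps) hasModel(model string) bool {
	if c.SupportedModels == nil {
		return true
	}
	return contains(c.SupportedModels, model)
}

// hasPermissions: an empty level imposes no requirement.
func (c caps) hasPermissions(perm string) bool {
	if perm == "" {
		return true
	}
	return contains(c.SupportedPerms, perm)
}

// hasReasoning: empty, auto, off, and numeric 0 impose no requirement.
func (c caps) hasReasoning(value string) bool {
	switch value {
	case "", "auto", "off", "0":
		return true
	}
	return contains(c.SupportedReasoning, value)
}

func main() {
	c := caps{
		SupportedPerms:     []string{"safe", "supervised"},
		SupportedReasoning: []string{"low", "high"},
	}
	fmt.Println(c.hasModel("any-model"))          // true: nil allow-list
	fmt.Println(c.hasPermissions("unrestricted")) // false
	fmt.Println(c.hasReasoning("auto"))           // true: no requirement
	fmt.Println(c.hasReasoning("medium"))         // false
}
```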

type Decision

type Decision struct {
	Harness    string
	Provider   string
	Endpoint   string
	Model      string
	Reason     string
	Candidates []Candidate
}

Decision is the routing engine's output: the picked candidate plus the full ranked list (including rejected ones with rejection reasons).

func Resolve

func Resolve(req Request, in Inputs) (*Decision, error)

Resolve runs the engine end-to-end and returns a Decision.

The engine:

  1. Enumerates (harness, provider, model) candidates from inputs.
  2. Applies gating (capability, override, model-pin, surface).
  3. Scores per profile policy with cooldown demotion + perf bias.
  4. Sorts viable → score → cost → locality → name.
  5. Returns the top viable candidate with the full ranked list.

Returns an error only when no viable candidate exists.
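Step 4's sort order (viable → score → cost → locality → name) maps naturally onto a chained comparator. The sketch below is standalone and hypothetical: cand is a trimmed stand-in for Candidate, and the directions (higher score first, lower cost first, local before remote, name ascending) are assumptions, since the documentation names the keys but not their directions.

```go
package main

import (
	"fmt"
	"sort"
)

// cand is a trimmed, local stand-in for Candidate.
type cand struct {
	Name     string
	Eligible bool
	Score    float64
	Cost     float64
	Local    bool
}

// rank sorts in place: viable first, then score, cost, locality, name.
// The direction of each key is an assumption made for this sketch.
func rank(cs []cand) {
	sort.SliceStable(cs, func(i, j int) bool {
		a, b := cs[i], cs[j]
		if a.Eligible != b.Eligible {
			return a.Eligible // viable candidates sort ahead of rejected ones
		}
		if a.Score != b.Score {
			return a.Score > b.Score
		}
		if a.Cost != b.Cost {
			return a.Cost < b.Cost
		}
		if a.Local != b.Local {
			return a.Local
		}
		return a.Name < b.Name // final deterministic tie-break
	})
}

func main() {
	cs := []cand{
		{Name: "b", Eligible: true, Score: 0.8, Cost: 0.002},
		{Name: "a", Eligible: true, Score: 0.8, Cost: 0.002},
		{Name: "c", Eligible: false, Score: 0.99, Local: true},
		{Name: "d", Eligible: true, Score: 0.8, Cost: 0.001},
		{Name: "e", Eligible: true, Score: 0.9, Cost: 0.01},
	}
	rank(cs)
	for _, c := range cs {
		fmt.Println(c.Name, c.Eligible, c.Score)
	}
}
```

Keeping rejected candidates in the slice, sorted last, matches the description of Decision.Candidates as the full ranked list including rejected entries.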

type ErrHarnessModelIncompatible

type ErrHarnessModelIncompatible struct {
	// Harness is the canonical harness name supplied by the caller.
	Harness string
	// Model is the exact concrete model pin supplied by the caller.
	Model string
	// SupportedModels is the harness allow-list that rejected Model.
	SupportedModels []string
}

ErrHarnessModelIncompatible reports an explicit Harness+Model pin that the harness allow-list cannot serve.

func (ErrHarnessModelIncompatible) Error

func (ErrHarnessModelIncompatible) Is

func (e ErrHarnessModelIncompatible) Is(target error) bool

func (ErrHarnessModelIncompatible) Unwrap

type ErrNoLiveProvider

type ErrNoLiveProvider struct {
	// PromptTokens is the request's EstimatedPromptTokens at the time
	// escalation began. Zero means no prompt-token gating was active.
	PromptTokens int
	// RequiresTools mirrors the request's RequiresTools flag.
	RequiresTools bool
	// StartingTier is the profile name that escalation began from
	// (the profile in the original request).
	StartingTier string
}

ErrNoLiveProvider reports that profile-tier escalation walked the entire ladder (cheap → standard → smart) without finding a live provider that can serve the request. Callers translate this into a precise user-facing message naming the prompt size and tool requirement so operators know what capability is missing across all tiers.

func (*ErrNoLiveProvider) Error

func (e *ErrNoLiveProvider) Error() string

type ErrNoProfileCandidate

type ErrNoProfileCandidate struct {
	Profile           string
	MissingCapability string
	Rejected          int
}

ErrNoProfileCandidate reports that a profile's hard placement requirement could not be satisfied by any routed candidate.

func (ErrNoProfileCandidate) Error

func (e ErrNoProfileCandidate) Error() string

func (ErrNoProfileCandidate) Is

func (e ErrNoProfileCandidate) Is(target error) bool

func (ErrNoProfileCandidate) Unwrap

func (e ErrNoProfileCandidate) Unwrap() error

type ErrProfilePinConflict

type ErrProfilePinConflict struct {
	// Profile is the explicit profile requested by the caller.
	Profile string
	// ConflictingPin names the explicit pin that violates the profile, such as
	// "Harness=claude" or "Model=local-model".
	ConflictingPin string
	// ProfileConstraint is a short description of the profile placement rule,
	// such as "local-only" or "subscription-only".
	ProfileConstraint string
}

ErrProfilePinConflict reports an explicit Profile whose placement constraint contradicts another explicit caller pin.

func (ErrProfilePinConflict) Error

func (e ErrProfilePinConflict) Error() string

func (ErrProfilePinConflict) Is

func (e ErrProfilePinConflict) Is(target error) bool

func (ErrProfilePinConflict) Unwrap

func (e ErrProfilePinConflict) Unwrap() error
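Because the routing errors are struct types with exported fields, callers would typically recover them with errors.As and build a message from the fields rather than parsing the error string. The sketch below illustrates that pattern with a local stand-in. pinConflict, resolveSketch, and describe are hypothetical, the message format is invented, and the behaviour of the real Is and Unwrap methods is not modelled.

```go
package main

import (
	"errors"
	"fmt"
)

// pinConflict is a local stand-in mirroring the fields of ErrProfilePinConflict.
type pinConflict struct {
	Profile           string
	ConflictingPin    string
	ProfileConstraint string
}

// Error uses an illustrative format, not the package's.
func (e pinConflict) Error() string {
	return fmt.Sprintf("profile %q is %s but %s was pinned",
		e.Profile, e.ProfileConstraint, e.ConflictingPin)
}

// resolveSketch is a hypothetical caller-side function. For the demo it
// pretends every pinned harness is non-local, so a local-only constraint
// plus any harness pin yields a wrapped pinConflict.
func resolveSketch(profile, constraint, harnessPin string) error {
	if constraint == "local-only" && harnessPin != "" {
		return fmt.Errorf("resolve route: %w", pinConflict{
			Profile:           profile,
			ConflictingPin:    "Harness=" + harnessPin,
			ProfileConstraint: constraint,
		})
	}
	return nil
}

// describe turns an error into an operator hint using typed fields only.
func describe(err error) string {
	var conflict pinConflict
	if errors.As(err, &conflict) {
		return "drop " + conflict.ConflictingPin +
			" or pick a profile that is not " + conflict.ProfileConstraint
	}
	if err != nil {
		return err.Error()
	}
	return "ok"
}

func main() {
	fmt.Println(describe(resolveSketch("cheap", "local-only", "claude")))
	fmt.Println(describe(resolveSketch("cheap", "", "claude")))
}
```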

type FilterReason

type FilterReason string

FilterReason categorizes why a routing candidate was disqualified. The zero value (empty string) means the candidate is eligible.

const (
	// FilterReasonEligible is the zero value for an eligible candidate.
	FilterReasonEligible FilterReason = ""
	// FilterReasonContextTooSmall: candidate's context window is below the
	// request's MinContextWindow().
	FilterReasonContextTooSmall FilterReason = "context_too_small"
	// FilterReasonNoToolSupport: request needs tool calling but candidate
	// does not support it.
	FilterReasonNoToolSupport FilterReason = "no_tool_support"
	// FilterReasonReasoningUnsupported: candidate cannot satisfy the
	// requested reasoning policy.
	FilterReasonReasoningUnsupported FilterReason = "reasoning_unsupported"
	// FilterReasonUnhealthy: harness/provider is unavailable, in cooldown,
	// out of quota, or excluded by a hard provider-preference gate.
	FilterReasonUnhealthy FilterReason = "unhealthy"
	// FilterReasonScoredBelowTop: catch-all for ineligibility that does
	// not fit a more specific category (also used for capability
	// mismatches such as permissions/model-pin/exact-pin and for model
	// resolution failures).
	FilterReasonScoredBelowTop FilterReason = "scored_below_top"
	// FilterReasonPinMismatch: candidate was rejected because it does not
	// satisfy an explicit caller pin such as Provider.
	FilterReasonPinMismatch FilterReason = "pin_mismatch"
	// FilterReasonPowerMissing: automatic routing requires catalog power
	// metadata, but the candidate model is unknown or has zero power.
	FilterReasonPowerMissing FilterReason = "power_missing"
	// FilterReasonBelowMinPower: candidate model power is below req.MinPower.
	FilterReasonBelowMinPower FilterReason = "below_min_power"
	// FilterReasonAboveMaxPower: candidate model power is above req.MaxPower.
	FilterReasonAboveMaxPower FilterReason = "above_max_power"
	// FilterReasonExactPinOnly: catalog marks the model as only eligible for
	// explicit model pins.
	FilterReasonExactPinOnly FilterReason = "exact_pin_only"
	// FilterReasonNotAutoRoutable: catalog metadata exists but marks the
	// model inactive, deprecated, stale, or otherwise excluded from
	// automatic routing.
	FilterReasonNotAutoRoutable FilterReason = "not_auto_routable"
)

func CheckGating

func CheckGating(cap Capabilities, req Request) (string, FilterReason)

CheckGating applies all capability gates against a request and returns the first failure reason (free-form string for diagnostics) plus the typed FilterReason category for the failure. Returns ("", FilterReasonEligible) when all gates pass.

The typed return is the authoritative classification — callers must not re-classify by parsing the string. The string is for human-readable diagnostics only.

Fixes ddx-4817edfd subtree: pre-dispatch capability check covering context window, tool support, effort, permissions.
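The "first failure wins, typed reason is authoritative" contract can be sketched with two of the gates. This is a standalone illustration: the types are local stand-ins, only the context-window and tool gates are shown, their order is an assumption, and letting an unknown (zero) context window pass is also an assumption.

```go
package main

import "fmt"

type filterReason string

const (
	eligible        filterReason = ""
	contextTooSmall filterReason = "context_too_small"
	noToolSupport   filterReason = "no_tool_support"
)

// caps and request are trimmed stand-ins for Capabilities and Request.
type caps struct {
	ContextWindow int // 0 = unknown
	SupportsTools bool
}

type request struct {
	MinContext    int
	RequiresTools bool
}

// checkGatingSketch returns the first failing gate as a free-form message
// plus a typed category. Gate order here is an assumption.
func checkGatingSketch(c caps, r request) (string, filterReason) {
	// Assumption: an unknown (zero) context window is not rejected.
	if r.MinContext > 0 && c.ContextWindow > 0 && c.ContextWindow < r.MinContext {
		msg := fmt.Sprintf("context window %d < required %d", c.ContextWindow, r.MinContext)
		return msg, contextTooSmall
	}
	if r.RequiresTools && !c.SupportsTools {
		return "tool calling required but unsupported", noToolSupport
	}
	return "", eligible
}

func main() {
	small := caps{ContextWindow: 8192}
	msg, reason := checkGatingSketch(small, request{MinContext: 32000, RequiresTools: true})
	// Branch on the typed reason; the message is for humans only.
	switch reason {
	case contextTooSmall:
		fmt.Println("needs a larger context window:", msg)
	case noToolSupport:
		fmt.Println("needs tool support:", msg)
	case eligible:
		fmt.Println("eligible")
	}
}
```

Note that the candidate in main also lacks tool support, yet only the context-window failure is reported, which is what "first failure" implies.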

func CheckPowerEligibility

func CheckPowerEligibility(lookup func(string) (ModelEligibility, bool), model string, req Request) (string, FilterReason)

CheckPowerEligibility applies catalog-power gates for unpinned automatic routing. Exact model pins bypass this gate so callers can intentionally run a discovered model that is missing catalog power or marked exact-pin-only.
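A standalone sketch of the power gate, including the exact-pin bypass, might look like the following. Everything here is a local stand-in: the gate order, the treatment of zero MinPower/MaxPower as "no bound", and the catalog contents are assumptions made for illustration.

```go
package main

import "fmt"

type filterReason string

const (
	eligible        filterReason = ""
	powerMissing    filterReason = "power_missing"
	belowMinPower   filterReason = "below_min_power"
	aboveMaxPower   filterReason = "above_max_power"
	exactPinOnly    filterReason = "exact_pin_only"
	notAutoRoutable filterReason = "not_auto_routable"
)

type modelEligibility struct {
	Power        int
	ExactPinOnly bool
	AutoRoutable bool
}

type request struct {
	Model    string // exact concrete model pin
	MinPower int
	MaxPower int
}

// catalog and lookup are hypothetical demo data.
var catalog = map[string]modelEligibility{
	"small":  {Power: 3, AutoRoutable: true},
	"big":    {Power: 9, AutoRoutable: true},
	"pinned": {Power: 7, ExactPinOnly: true, AutoRoutable: true},
}

func lookup(model string) (modelEligibility, bool) {
	me, ok := catalog[model]
	return me, ok
}

func checkPowerSketch(find func(string) (modelEligibility, bool), model string, req request) (string, filterReason) {
	if req.Model != "" {
		return "", eligible // exact model pins bypass the power gate
	}
	me, ok := find(model)
	switch {
	case !ok || me.Power <= 0:
		return "no catalog power for " + model, powerMissing
	case me.ExactPinOnly:
		return model + " is exact-pin-only", exactPinOnly
	case !me.AutoRoutable:
		return model + " is excluded from automatic routing", notAutoRoutable
	case req.MinPower > 0 && me.Power < req.MinPower:
		return fmt.Sprintf("power %d < min %d", me.Power, req.MinPower), belowMinPower
	case req.MaxPower > 0 && me.Power > req.MaxPower:
		return fmt.Sprintf("power %d > max %d", me.Power, req.MaxPower), aboveMaxPower
	}
	return "", eligible
}

func main() {
	fmt.Println(checkPowerSketch(lookup, "mystery", request{}))
	fmt.Println(checkPowerSketch(lookup, "mystery", request{Model: "mystery"}))
	fmt.Println(checkPowerSketch(lookup, "small", request{MinPower: 5}))
}
```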

type HarnessEntry

type HarnessEntry struct {
	Name                string
	Surface             string
	CostClass           string
	IsLocal             bool
	IsSubscription      bool
	IsHTTPProvider      bool
	AutoRoutingEligible bool
	TestOnly            bool
	ExactPinSupport     bool
	DefaultModel        string
	SupportedModels     []string
	SupportedReasoning  []string
	MaxReasoningTokens  int
	SupportedPerms      []string
	SupportsTools       bool

	// Available reflects the harness's discovered availability.
	Available bool

	// QuotaOK / QuotaPercentUsed reflect live quota state (when applicable).
	// SubscriptionOK gates subscription harnesses at the eligibility level:
	// when false, the candidate is ineligible regardless of score.
	QuotaOK          bool
	QuotaPercentUsed int
	QuotaStale       bool
	QuotaTrend       string // unknown|healthy|burning|exhausting
	SubscriptionOK   bool   // false = hard gate; true = score-based demotion

	// InCooldown marks the entire harness as being in a failure cooldown.
	// When true the harness is demoted in score (via candidateInternal.InCooldown)
	// but not hard-rejected, so it can still win when all other harnesses are
	// also unavailable.
	InCooldown bool

	// Providers is the list of providers this harness can dispatch to.
	// For subprocess harnesses (claude/codex) this is typically a single
	// vendor entry. For the native "agent" harness it is the configured
	// list of HTTP providers.
	Providers []ProviderEntry
}

HarnessEntry is the harness-side input the caller (service) supplies. It is the routing engine's view of a registered harness; the engine does not import the harnesses package directly to keep the dependency narrow.

type Inputs

type Inputs struct {
	Harnesses           []HarnessEntry
	HistoricalSuccess   map[string]float64   // by harness name; -1 = insufficient data
	ObservedSpeedTPS    map[string]float64   // by "provider:model"
	ProviderSuccessRate map[string]float64   // by ProviderModelKey(provider, endpoint, model)
	ObservedLatencyMS   map[string]float64   // by ProviderModelKey(provider, endpoint, model)
	ProviderCooldowns   map[string]time.Time // by provider name
	CooldownDuration    time.Duration        // 0 = no cooldown enforcement
	Now                 time.Time            // injected for deterministic testing; default time.Now()
	CatalogResolver     func(ref, surface string) (concreteModel string, ok bool)
	ModelEligibility    func(model string) (ModelEligibility, bool)

	// ReasoningResolver returns the catalog's surface_policy reasoning_default
	// for a (profile, surface) pair. When set, buildHarnessCandidates uses it
	// to resolve Reasoning=auto to a concrete level before invoking the
	// capability gate, so candidates that cannot satisfy the resolved level
	// (e.g. an off-only variant under a profile whose surface default is
	// "high") are correctly disqualified instead of slipping through.
	ReasoningResolver func(profile, surface string) (resolved string, ok bool)
}

Inputs bundles the engine's external data sources.

type ModelEligibility

type ModelEligibility struct {
	Power        int
	ExactPinOnly bool
	AutoRoutable bool
}

ModelEligibility is the routing engine's catalog-power view for one model. Unknown or zero-power models are still usable through exact Model pins, but are excluded from unpinned automatic routing.

type NoViableCandidateError

type NoViableCandidateError struct {
	Rejected int
	Model    string
	Provider string
	Harness  string
	MinPower int
	MaxPower int
}

NoViableCandidateError reports that routing evaluated candidates but every one failed a gate.

func (*NoViableCandidateError) Error

func (e *NoViableCandidateError) Error() string

type ProviderEntry

type ProviderEntry struct {
	Name               string
	BaseURL            string
	EndpointName       string
	EndpointBaseURL    string
	DefaultModel       string
	DiscoveredIDs      []string // models discovered via /v1/models or equivalent
	DiscoveryAttempted bool
	ContextWindows     map[string]int
	SupportsTools      bool

	// CostUSDPer1kTokens is the estimated blended USD cost per 1,000 tokens.
	// A zero value with CostSourceUnknown means the provider cost is unknown.
	CostUSDPer1kTokens float64
	// CostSource describes where CostUSDPer1kTokens came from: catalog,
	// subscription, user-config, or unknown.
	CostSource string

	// InCooldown reflects whether this provider is in a failure-cooldown window.
	InCooldown bool
}

ProviderEntry describes one provider available under a harness.

type Request

type Request struct {
	Profile            string // "cheap" | "standard" | "smart"
	ModelRef           string // catalog alias (e.g. "qwen/qwen3.6")
	Model              string // exact concrete model pin
	Provider           string // exact provider pin; constrains routing to one provider identity
	Harness            string // hard preference; constrains routing to one harness
	Reasoning          string // public reasoning scalar
	Permissions        string // "safe" | "supervised" | "unrestricted"
	ProviderPreference string // "local-first" | "subscription-first" | "local-only" | "subscription-only"

	// EstimatedPromptTokens, when > 0, drives context-window gating.
	EstimatedPromptTokens int

	// RequiresTools, when true, requires the candidate to support tool calling.
	RequiresTools bool
	MinPower      int
	MaxPower      int
}

Request is the routing input. All fields are optional, but at least one of {Profile, ModelRef, Model, Harness, Provider} should be set; otherwise the engine has nothing to disambiguate on.

Provider is present from day one (fixes ddx-8610020e — no soft-preference dropping).

func (Request) MinContextWindow

func (r Request) MinContextWindow() int

MinContextWindow returns the minimum context window the request requires, derived from EstimatedPromptTokens with a safety margin.
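As a rough illustration of "prompt tokens plus a safety margin", the sketch below uses a 25% margin. That figure is purely hypothetical, since the documentation does not state the real margin, and the request type is a local stand-in. Returning 0 when there is no estimate follows the note on EstimatedPromptTokens that only values above 0 drive context-window gating.

```go
package main

import "fmt"

// request is a trimmed stand-in for Request.
type request struct {
	EstimatedPromptTokens int
}

// marginPercent is a hypothetical safety margin; the real value is not
// stated in this documentation.
const marginPercent = 25

// minContextWindow returns 0 (no gating) when there is no estimate,
// otherwise the estimate padded by the margin.
func (r request) minContextWindow() int {
	if r.EstimatedPromptTokens <= 0 {
		return 0
	}
	return r.EstimatedPromptTokens + r.EstimatedPromptTokens*marginPercent/100
}

func main() {
	fmt.Println(request{EstimatedPromptTokens: 8000}.minContextWindow()) // 10000
	fmt.Println(request{}.minContextWindow())                            // 0
}
```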
