Documentation ¶
Overview ¶
Package routing implements the unified routing engine for fizeau. It ranks (harness, provider, model) candidates uniformly per CONTRACT-003.
The engine consolidates DDx-side harness-tier routing and agent-side per-provider failover into a single ranking pipeline.
Index ¶
- Constants
- Variables
- func EscalateProfileAware(current string, ladder []string, req Request, in Inputs) string
- func FuzzyMatch(input string, pool []string) string
- func ProviderModelKey(provider, endpoint, model string) string
- func SameModelIntent(a, b string) bool
- type Candidate
- type Capabilities
- type Decision
- type EndpointLoad
- type ErrAllProvidersQuotaExhausted
- type ErrHarnessModelIncompatible
- type ErrNoLiveProvider
- type ErrNoProfileCandidate
- type ErrProfilePinConflict
- type FilterReason
- type HarnessEntry
- type Inputs
- type ModelEligibility
- type NoViableCandidateError
- type ProviderEntry
- type Request
Constants ¶
const (
ProviderPreferenceLocalFirst = "local-first"
ProviderPreferenceSubscriptionFirst = "subscription-first"
ProviderPreferenceLocalOnly = "local-only"
ProviderPreferenceSubscriptionOnly = "subscription-only"
ContextSourceProviderAPI = "provider_api"
ContextSourceProviderConfig = "provider_config"
ContextSourceCatalog = "catalog"
ContextSourceDefault = "default"
ContextSourceUnknown = "unknown"
)
const (
QuotaTrendUnknown = "unknown"
QuotaTrendHealthy = "healthy"
QuotaTrendBurning = "burning"
QuotaTrendExhausting = "exhausting"
)
const (
// CostSourceCatalog means cost came from the model catalog.
CostSourceCatalog = "catalog"
// CostSourceSubscription means cost came from subscription quota pricing.
CostSourceSubscription = "subscription"
// CostSourceUnknown means no reliable cost estimate is available.
CostSourceUnknown = "unknown"
// CostSourceUserConfig means cost came from explicit user configuration.
CostSourceUserConfig = "user-config"
)
const StickyAffinityBonus = 250.0
Variables ¶
var ProfileEscalationLadder = []string{"cheap", "standard", "smart"}
ProfileEscalationLadder is the fixed cheap → standard → smart progression that service.ResolveRoute walks when every candidate at the requested tier is filtered out (unhealthy or capability-rejected).
Functions ¶
func EscalateProfileAware ¶
EscalateProfileAware is a helper for tier escalation. Given a request that failed at one profile, it returns the next profile to try, restricted to profiles that have a viable candidate under the current Inputs (i.e., the profile's resolved concrete model exists on the request's pinned provider, if any).
Fixes ddx-3c5ba7cc: tier escalation respects --provider affinity.
Returns "" when no further profile is viable.
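The escalation walk described above can be sketched as follows. This is a minimal, self-contained illustration and not the package's implementation: nextViableProfile and its viable predicate are hypothetical stand-ins for EscalateProfileAware and its Request/Inputs check, and the behavior for a current profile that is not on the ladder is an assumption.

```go
package main

import "fmt"

// nextViableProfile is a hypothetical stand-in for EscalateProfileAware.
// It walks ladder past current and returns the first later profile for
// which viable reports true, or "" when none remains.
func nextViableProfile(current string, ladder []string, viable func(profile string) bool) string {
	idx := -1
	for i, p := range ladder {
		if p == current {
			idx = i
			break
		}
	}
	if idx < 0 {
		// Assumption: a profile that is not on the ladder cannot escalate.
		return ""
	}
	for _, p := range ladder[idx+1:] {
		if viable(p) {
			return p
		}
	}
	return ""
}

func main() {
	ladder := []string{"cheap", "standard", "smart"}
	// Assumption: only "smart" resolves to a model on the pinned provider.
	onPinnedProvider := func(p string) bool { return p == "smart" }
	fmt.Println(nextViableProfile("cheap", ladder, onPinnedProvider))
}
```

Passing the viability check as a function keeps the ladder walk independent of how provider affinity is evaluated, which is the property the ddx-3c5ba7cc note describes.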
func FuzzyMatch ¶
FuzzyMatch returns the best concrete model from pool for the given input, using canonical-form (case-insensitive, vendor-prefix-stripped) matching. Returns "" when no candidate matches.
Algorithm:
- Exact match (case-insensitive on canonical form) wins outright.
- Prefix/suffix/contains match on canonical form. Among fuzzy matches, prefer prefix, then suffix, then contains, and within each tier prefer the shortest remaining text (most specific match).
- No match returns "".
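The ranking above could look roughly like this. It is a sketch under stated assumptions rather than the package's code: canonical and fuzzyMatch are hypothetical names, the vendor prefix is assumed to be slash-delimited (as in "qwen/qwen3.6"), and the model names in the pool are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// canonical lowercases s and drops everything up to the last "/".
// Assumption: vendor prefixes are slash-delimited.
func canonical(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	if i := strings.LastIndex(s, "/"); i >= 0 {
		s = s[i+1:]
	}
	return s
}

// fuzzyMatch is a hypothetical re-implementation of the documented order:
// exact, then prefix, then suffix, then contains, and within a tier the
// candidate with the least leftover text.
func fuzzyMatch(input string, pool []string) string {
	want := canonical(input)
	if want == "" {
		return ""
	}
	best, bestTier, bestRest := "", 4, 0
	for _, m := range pool {
		c := canonical(m)
		tier := 4
		switch {
		case c == want:
			return m // exact match wins outright
		case strings.HasPrefix(c, want):
			tier = 1
		case strings.HasSuffix(c, want):
			tier = 2
		case strings.Contains(c, want):
			tier = 3
		}
		if tier == 4 {
			continue
		}
		rest := len(c) - len(want)
		if tier < bestTier || (tier == bestTier && rest < bestRest) {
			best, bestTier, bestRest = m, tier, rest
		}
	}
	return best
}

func main() {
	pool := []string{"vendor/alpha-large", "vendor/alpha", "other/beta-alpha"}
	fmt.Println(fuzzyMatch("alp", pool))
}
```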
func ProviderModelKey ¶
ProviderModelKey is the metrics key used by routing callers for provider performance signals. Endpoint is optional; when empty the key remains compatible with older provider:model metrics.
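As a rough illustration of the backward-compatibility rule, the sketch below builds a key that collapses to provider:model when no endpoint is given. The function name is hypothetical, and the layout used when an endpoint is present (the "@" separator) is purely an assumption; the documentation above does not state the real format.

```go
package main

import "fmt"

// providerModelKey is a hypothetical stand-in for ProviderModelKey.
func providerModelKey(provider, endpoint, model string) string {
	if endpoint == "" {
		// Stays compatible with older provider:model metrics.
		return provider + ":" + model
	}
	// Assumed layout: the real separator is not documented here.
	return provider + "@" + endpoint + ":" + model
}

func main() {
	fmt.Println(providerModelKey("local", "", "model-a"))
	fmt.Println(providerModelKey("local", "gpu-1", "model-a"))
}
```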
func SameModelIntent ¶
SameModelIntent returns true if two model strings refer to the same model after canonicalization. Used by capability gating and provider matching.
Types ¶
type Candidate ¶
type Candidate struct {
Harness string
Provider string
Endpoint string
ServerInstance string
Model string
Score float64
CostUSDPer1kTokens float64
CostSource string
Power int
ContextLength int
ContextSource string
ScoreComponents map[string]float64
Eligible bool
Reason string
// FilterReason is the typed disqualification category, set at the
// rejection site that decided why this candidate is ineligible.
// Empty for eligible candidates. Service-layer code maps this to the
// public FilterReason* string constants without parsing free-form
// Reason text.
FilterReason FilterReason
// LatencyMS, SuccessRate, and CostClass expose the score-component
// inputs so callers can render per-axis explanations alongside the
// final Score. Zero / negative values mean unknown (see Inputs docs).
LatencyMS float64
SuccessRate float64
CostClass string
SpeedTPS float64
Utilization float64
ContextHeadroom int
QuotaOK bool
QuotaPercentUsed int
QuotaTrend string
StickyAffinity float64
}
Candidate is one ranked routing option.
type Capabilities ¶
type Capabilities struct {
ContextWindow int // resolved tokens; 0 = unknown
SupportsTools bool // supports tool/function calling
SupportsStreaming bool // supports streaming responses
SupportedReasoning []string // supported public reasoning values
MaxReasoningTokens int // 0 means numeric reasoning is unsupported/unknown
SupportedPerms []string // {"safe","supervised","unrestricted"} subset
ExactPinSupport bool // accepts exact concrete model pins
SupportedModels []string // nil = no static allow-list
}
Capabilities describes what a (harness, provider, model) tuple can do. Populated from harness config + catalog metadata + provider discovery.
func (Capabilities) HasModel ¶
func (c Capabilities) HasModel(model string) bool
HasModel returns true when the exact model pin is within the static harness allow-list. A nil allow-list means the harness delegates validation to provider-side model checks.
func (Capabilities) HasPermissions ¶
func (c Capabilities) HasPermissions(perm string) bool
HasPermissions returns true if the candidate supports the requested level. An empty permission level always returns true.
func (Capabilities) HasReasoning ¶
func (c Capabilities) HasReasoning(value string) bool
HasReasoning returns true if the candidate supports the requested reasoning value. Empty, auto, off, and numeric 0 impose no requirement.
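The three predicates might be approximated as below. This is a hedged sketch: the caps type and lowercase method names are hypothetical, model comparison is plain string equality (the real HasModel may canonicalize), and checking a positive numeric reasoning value against MaxReasoningTokens is an assumption drawn from the field comment.

```go
package main

import (
	"fmt"
	"strconv"
)

// caps mirrors a subset of the documented Capabilities fields.
type caps struct {
	SupportedReasoning []string
	MaxReasoningTokens int
	SupportedPerms     []string
	SupportedModels    []string
}

func contains(list []string, v string) bool {
	for _, s := range list {
		if s == v {
			return true
		}
	}
	return false
}

// hasModel treats a nil allow-list as "delegate to the provider".
func (c caps) hasModel(model string) bool {
	return c.SupportedModels == nil || contains(c.SupportedModels, model)
}

// hasPermissions accepts an empty level unconditionally.
func (c caps) hasPermissions(perm string) bool {
	return perm == "" || contains(c.SupportedPerms, perm)
}

// hasReasoning imposes no requirement for "", "auto", "off", or "0".
func (c caps) hasReasoning(value string) bool {
	switch value {
	case "", "auto", "off", "0":
		return true
	}
	if n, err := strconv.Atoi(value); err == nil {
		// Assumption: a positive token budget must fit MaxReasoningTokens.
		return n > 0 && n <= c.MaxReasoningTokens
	}
	return contains(c.SupportedReasoning, value)
}

func main() {
	c := caps{SupportedReasoning: []string{"low", "high"}, SupportedPerms: []string{"safe"}}
	fmt.Println(c.hasReasoning("auto"), c.hasReasoning("medium"), c.hasPermissions(""), c.hasModel("any"))
}
```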
type Decision ¶
type Decision struct {
Harness string
Provider string
Endpoint string
ServerInstance string
Model string
Reason string
Candidates []Candidate
}
Decision is the routing engine's output: the picked candidate plus the full ranked list (including rejected ones with rejection reasons).
func Resolve ¶
Resolve runs the engine end-to-end and returns a Decision.
The engine:
- Enumerates (harness, provider, model) candidates from inputs.
- Applies gating (capability, override, model-pin, surface).
- Scores per profile policy with cooldown demotion + perf bias.
- Sorts viable → score → cost → locality → name.
- Returns the top viable candidate with the full ranked list.
Returns an error only when no viable candidate exists.
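The sort step can be illustrated with a small comparator. This is a sketch, not the engine's code: the cand type is hypothetical, and the direction of each key (viable first, higher score first, lower cost first, local before remote, name ascending) is an assumption inferred from the order listed above.

```go
package main

import (
	"fmt"
	"sort"
)

// cand is a hypothetical, reduced view of a routing candidate.
type cand struct {
	Name     string
	Eligible bool
	Score    float64
	Cost     float64
	Local    bool
}

// rank orders candidates by viability, score, cost, locality, then name.
func rank(cs []cand) {
	sort.SliceStable(cs, func(i, j int) bool {
		a, b := cs[i], cs[j]
		if a.Eligible != b.Eligible {
			return a.Eligible // viable candidates first
		}
		if a.Score != b.Score {
			return a.Score > b.Score // higher score first
		}
		if a.Cost != b.Cost {
			return a.Cost < b.Cost // cheaper first
		}
		if a.Local != b.Local {
			return a.Local // local before remote
		}
		return a.Name < b.Name // deterministic tie-break
	})
}

func main() {
	cs := []cand{
		{"c", false, 900, 0, true},
		{"b", true, 100, 0.5, false},
		{"a", true, 100, 0.5, false},
		{"d", true, 100, 0.2, false},
	}
	rank(cs)
	for _, c := range cs {
		fmt.Println(c.Name, c.Eligible)
	}
}
```

Keeping rejected candidates in the slice, sorted after the viable ones, matches the Decision description: the full ranked list is returned alongside the pick.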
type EndpointLoad ¶ added in v0.10.9
type EndpointLoad struct {
LeaseCount int
NormalizedLoad float64
UtilizationFresh bool
UtilizationSaturated bool
}
EndpointLoad is the routing engine's normalized view of endpoint load for a single provider/model tuple.
type ErrAllProvidersQuotaExhausted ¶ added in v0.10.4
ErrAllProvidersQuotaExhausted reports that every routing candidate that would have been eligible for the request was filtered out solely because its provider is currently in quota_exhausted state. RetryAfter is the earliest expected provider-recovery time across the exhausted set.
func (ErrAllProvidersQuotaExhausted) Error ¶ added in v0.10.4
func (e ErrAllProvidersQuotaExhausted) Error() string
func (ErrAllProvidersQuotaExhausted) Is ¶ added in v0.10.4
func (e ErrAllProvidersQuotaExhausted) Is(target error) bool
func (ErrAllProvidersQuotaExhausted) Unwrap ¶ added in v0.10.4
func (e ErrAllProvidersQuotaExhausted) Unwrap() error
type ErrHarnessModelIncompatible ¶
type ErrHarnessModelIncompatible struct {
// Harness is the canonical harness name supplied by the caller.
Harness string
// Model is the exact concrete model pin supplied by the caller.
Model string
// SupportedModels is the harness allow-list that rejected Model.
SupportedModels []string
}
ErrHarnessModelIncompatible reports an explicit Harness+Model pin that the harness allow-list cannot serve.
func (ErrHarnessModelIncompatible) Error ¶
func (e ErrHarnessModelIncompatible) Error() string
func (ErrHarnessModelIncompatible) Is ¶
func (e ErrHarnessModelIncompatible) Is(target error) bool
func (ErrHarnessModelIncompatible) Unwrap ¶
func (e ErrHarnessModelIncompatible) Unwrap() error
type ErrNoLiveProvider ¶
type ErrNoLiveProvider struct {
// PromptTokens is the request's EstimatedPromptTokens at the time
// escalation began. Zero means no prompt-token gating was active.
PromptTokens int
// RequiresTools mirrors the request's RequiresTools flag.
RequiresTools bool
// StartingTier is the profile name that escalation began from
// (the profile in the original request).
StartingTier string
}
ErrNoLiveProvider reports that profile-tier escalation walked the entire ladder (cheap → standard → smart) without finding a live provider that can serve the request. Callers translate this into a precise user-facing message naming the prompt size and tool requirement so operators know what capability is missing across all tiers.
func (*ErrNoLiveProvider) Error ¶
func (e *ErrNoLiveProvider) Error() string
type ErrNoProfileCandidate ¶
ErrNoProfileCandidate reports that a profile's hard placement requirement could not be satisfied by any routed candidate.
func (ErrNoProfileCandidate) Error ¶
func (e ErrNoProfileCandidate) Error() string
func (ErrNoProfileCandidate) Is ¶
func (e ErrNoProfileCandidate) Is(target error) bool
func (ErrNoProfileCandidate) Unwrap ¶
func (e ErrNoProfileCandidate) Unwrap() error
type ErrProfilePinConflict ¶
type ErrProfilePinConflict struct {
// Profile is the explicit profile requested by the caller.
Profile string
// ConflictingPin names the explicit pin that violates the profile, such as
// "Harness=claude" or "Model=local-model".
ConflictingPin string
// ProfileConstraint is a short description of the profile placement rule,
// such as "local-only" or "subscription-only".
ProfileConstraint string
}
ErrProfilePinConflict reports an explicit Profile whose placement constraint contradicts another explicit caller pin.
func (ErrProfilePinConflict) Error ¶
func (e ErrProfilePinConflict) Error() string
func (ErrProfilePinConflict) Is ¶
func (e ErrProfilePinConflict) Is(target error) bool
func (ErrProfilePinConflict) Unwrap ¶
func (e ErrProfilePinConflict) Unwrap() error
type FilterReason ¶
type FilterReason string
FilterReason categorizes why a routing candidate was disqualified. The zero value (empty string) means the candidate is eligible.
const (
// FilterReasonEligible is the zero value for an eligible candidate.
FilterReasonEligible FilterReason = ""
// FilterReasonContextTooSmall: candidate's context window is below the
// request's MinContextWindow().
FilterReasonContextTooSmall FilterReason = "context_too_small"
// FilterReasonNoToolSupport: request needs tool calling but candidate
// does not support it.
FilterReasonNoToolSupport FilterReason = "no_tool_support"
// FilterReasonReasoningUnsupported: candidate cannot satisfy the
// requested reasoning policy.
FilterReasonReasoningUnsupported FilterReason = "reasoning_unsupported"
// FilterReasonUnhealthy: harness/provider is unavailable, in cooldown,
// out of quota, or excluded by a hard provider-preference gate.
FilterReasonUnhealthy FilterReason = "unhealthy"
// FilterReasonScoredBelowTop: catch-all for ineligibility that does
// not fit a more specific category (also used for capability
// mismatches such as permissions/model-pin/exact-pin and for model
// resolution failures).
FilterReasonScoredBelowTop FilterReason = "scored_below_top"
// FilterReasonPinMismatch: candidate was rejected because it does not
// satisfy an explicit caller pin such as Provider.
FilterReasonPinMismatch FilterReason = "pin_mismatch"
// FilterReasonPowerMissing: automatic routing requires catalog power
// metadata, but the candidate model is unknown or has zero power.
FilterReasonPowerMissing FilterReason = "power_missing"
// FilterReasonBelowMinPower: candidate model power is below req.MinPower.
FilterReasonBelowMinPower FilterReason = "below_min_power"
// FilterReasonAboveMaxPower: candidate model power is above req.MaxPower.
FilterReasonAboveMaxPower FilterReason = "above_max_power"
// FilterReasonExactPinOnly: catalog marks the model as only eligible for
// explicit model pins.
FilterReasonExactPinOnly FilterReason = "exact_pin_only"
// FilterReasonNotAutoRoutable: catalog metadata exists but marks the
// model inactive, deprecated, stale, or otherwise excluded from
// automatic routing.
FilterReasonNotAutoRoutable FilterReason = "not_auto_routable"
// FilterReasonQuotaExhausted: provider is in the quota_exhausted state
// with retry_after in the future. The candidate would have been eligible
// otherwise.
FilterReasonQuotaExhausted FilterReason = "quota_exhausted"
)
func CheckGating ¶
func CheckGating(cap Capabilities, req Request) (string, FilterReason)
CheckGating applies all capability gates against a request and returns the first failure reason (free-form string for diagnostics) plus the typed FilterReason category for the failure. Returns ("", FilterReasonEligible) when all gates pass.
The typed return is the authoritative classification — callers must not re-classify by parsing the string. The string is for human-readable diagnostics only.
Fixes ddx-4817edfd subtree: pre-dispatch capability check covering context window, tool support, effort, permissions.
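The first-failure contract of CheckGating, with a typed category alongside a diagnostic string, can be sketched like this. The types and the checkGating function are hypothetical and cover only two of the gates; treating an unknown context window (0) as passing is an assumption.

```go
package main

import "fmt"

type filterReason string

const (
	eligible        filterReason = ""
	contextTooSmall filterReason = "context_too_small"
	noToolSupport   filterReason = "no_tool_support"
)

// caps and request are reduced, hypothetical views of Capabilities and Request.
type caps struct {
	ContextWindow int // 0 = unknown
	SupportsTools bool
}

type request struct {
	MinContext    int
	RequiresTools bool
}

// checkGating returns the first failing gate as (diagnostic, category).
func checkGating(c caps, r request) (string, filterReason) {
	// Assumption: an unknown context window is not rejected here.
	if r.MinContext > 0 && c.ContextWindow > 0 && c.ContextWindow < r.MinContext {
		return fmt.Sprintf("context window %d < required %d", c.ContextWindow, r.MinContext), contextTooSmall
	}
	if r.RequiresTools && !c.SupportsTools {
		return "tool calling required but unsupported", noToolSupport
	}
	return "", eligible
}

func main() {
	msg, reason := checkGating(caps{ContextWindow: 8192}, request{MinContext: 16000, RequiresTools: true})
	// Branch on the typed reason; the message is for humans only.
	fmt.Println(reason, "-", msg)
}
```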
func CheckPowerEligibility ¶
func CheckPowerEligibility(lookup func(string) (ModelEligibility, bool), model string, req Request) (string, FilterReason)
CheckPowerEligibility applies catalog-power gates for unpinned automatic routing. Any explicit hard route pin (harness, provider, or model) bypasses this gate so caller-chosen routes are never broadened or overridden by power policy.
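The pin bypass and the power bounds might be combined as in the sketch below. Everything here is illustrative: eligibility and its Power field stand in for ModelEligibility (whose fields are not listed above), the reason strings reuse the documented FilterReason values, and treating a zero MinPower or MaxPower as "no bound" is an assumption.

```go
package main

import "fmt"

// eligibility is a hypothetical stand-in for ModelEligibility.
type eligibility struct{ Power int }

// request is a reduced, hypothetical view of Request.
type request struct {
	Harness, Provider, Model string
	MinPower, MaxPower       int
}

// checkPower returns (diagnostic, reason); both are empty when the model passes.
func checkPower(lookup func(string) (eligibility, bool), model string, r request) (string, string) {
	if r.Harness != "" || r.Provider != "" || r.Model != "" {
		return "", "" // explicit pins bypass the power gate
	}
	e, ok := lookup(model)
	if !ok || e.Power == 0 {
		return "no catalog power for " + model, "power_missing"
	}
	// Assumption: a zero bound means "unbounded".
	if r.MinPower > 0 && e.Power < r.MinPower {
		return fmt.Sprintf("power %d below minimum %d", e.Power, r.MinPower), "below_min_power"
	}
	if r.MaxPower > 0 && e.Power > r.MaxPower {
		return fmt.Sprintf("power %d above maximum %d", e.Power, r.MaxPower), "above_max_power"
	}
	return "", ""
}

func main() {
	catalog := map[string]eligibility{"small": {Power: 3}, "big": {Power: 9}}
	lookup := func(m string) (eligibility, bool) { e, ok := catalog[m]; return e, ok }
	_, reason := checkPower(lookup, "small", request{MinPower: 5})
	fmt.Println(reason)
}
```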
type HarnessEntry ¶
type HarnessEntry struct {
Name string
Surface string
CostClass string
IsLocal bool
IsSubscription bool
IsHTTPProvider bool
AutoRoutingEligible bool
TestOnly bool
ExactPinSupport bool
DefaultModel string
SupportedModels []string
SupportedReasoning []string
MaxReasoningTokens int
SupportedPerms []string
SupportsTools bool
// Available reflects the harness's discovered availability.
Available bool
// QuotaOK / QuotaPercentUsed reflect live quota state (when applicable).
// SubscriptionOK gates subscription harnesses at the eligibility level:
// when false, the candidate is ineligible regardless of score.
QuotaOK bool
QuotaPercentUsed int
QuotaStale bool
QuotaTrend string // unknown|healthy|burning|exhausting
QuotaReason string
SubscriptionOK bool // false = hard gate; true = score-based demotion
// InCooldown marks the entire harness as being in a failure cooldown.
// When true the harness is demoted in score (via candidateInternal.InCooldown)
// but not hard-rejected, so it can still win when all other harnesses are
// also unavailable.
InCooldown bool
// Providers is the list of providers this harness can dispatch to.
// For subprocess harnesses (claude/codex) this is typically a single
// vendor entry. For the native "fiz" harness it is the configured
// list of HTTP providers.
Providers []ProviderEntry
}
HarnessEntry is the harness-side input the caller (service) supplies. It is the routing engine's view of a registered harness; the engine does not import the harnesses package directly to keep the dependency narrow.
type Inputs ¶
type Inputs struct {
Harnesses []HarnessEntry
HistoricalSuccess map[string]float64 // by harness name; -1 = insufficient data
ObservedSpeedTPS map[string]float64 // by "provider:model"
ProviderSuccessRate map[string]float64 // by ProviderModelKey(provider, endpoint, model)
ObservedLatencyMS map[string]float64 // by ProviderModelKey(provider, endpoint, model)
EndpointLoads map[string]EndpointLoad
EndpointLoadResolver func(provider, endpoint, model string) (EndpointLoad, bool)
StickyServerInstanceResolver func(stickyKey string) (string, bool)
ProviderCooldowns map[string]time.Time // by provider name
CooldownDuration time.Duration // 0 = no cooldown enforcement
// ProviderQuotaExhaustedUntil maps provider name → retry_after time.
// A provider with retry_after > Now is treated as quota_exhausted and
// disqualified from candidate selection. The service maintains the
// per-provider quota state machine and projects it into this map for
// each routing call.
ProviderQuotaExhaustedUntil map[string]time.Time
Now time.Time // injected for deterministic testing; default time.Now()
CatalogResolver func(ref, surface string) (concreteModel string, ok bool)
CatalogCandidatesResolver func(ref, surface string) (concreteModels []string, ok bool)
ModelEligibility func(model string) (ModelEligibility, bool)
// ReasoningResolver returns the catalog's surface_policy reasoning_default
// for a (profile, surface) pair. When set, buildHarnessCandidates uses it
// to resolve Reasoning=auto to a concrete level before invoking the
// capability gate, so candidates that cannot satisfy the resolved level
// (e.g. an off-only variant under a profile whose surface default is
// "high") are correctly disqualified instead of slipping through.
ReasoningResolver func(profile, surface string) (resolved string, ok bool)
}
Inputs bundles the engine's external data sources.
type ModelEligibility ¶
ModelEligibility is the routing engine's catalog-power view for one model. Unknown or zero-power models are still usable through exact Model pins, but are excluded from unpinned automatic routing.
type NoViableCandidateError ¶
type NoViableCandidateError struct {
Rejected int
Model string
Provider string
Harness string
MinPower int
MaxPower int
}
NoViableCandidateError reports that routing evaluated candidates but every one failed a gate.
func (*NoViableCandidateError) Error ¶
func (e *NoViableCandidateError) Error() string
type ProviderEntry ¶
type ProviderEntry struct {
Name string
BaseURL string
ServerInstance string
EndpointName string
EndpointBaseURL string
DefaultModel string
CostClass string
DiscoveredIDs []string // models discovered via /v1/models or equivalent
DiscoveryAttempted bool
ContextWindows map[string]int
ContextWindowSources map[string]string
ContextWindow int
ContextWindowSource string
SupportsTools bool
// CostUSDPer1kTokens is the estimated blended USD cost per 1,000 tokens.
// A zero value with CostSourceUnknown means the provider cost is unknown.
CostUSDPer1kTokens float64
// CostSource describes where CostUSDPer1kTokens came from: catalog,
// subscription, user-config, or unknown.
CostSource string
// InCooldown reflects whether this provider is in a failure-cooldown window.
InCooldown bool
}
ProviderEntry describes one provider available under a harness.
type Request ¶
type Request struct {
Profile string // "cheap" | "standard" | "smart"
ModelRef string // catalog alias (e.g. "qwen/qwen3.6")
Model string // exact concrete model pin
Provider string // exact provider pin; constrains routing to one provider identity
Harness string // hard preference; constrains routing to one harness
Reasoning string // public reasoning scalar
Permissions string // "safe" | "supervised" | "unrestricted"
ProviderPreference string // "local-first" | "subscription-first" | "local-only" | "subscription-only"
CorrelationID string // validated sticky route key, when available
// EstimatedPromptTokens, when > 0, drives context-window gating.
EstimatedPromptTokens int
// RequiresTools, when true, requires the candidate to support tool calling.
RequiresTools bool
MinPower int
MaxPower int
}
Request is the routing input. All fields are optional, but at least one of {Profile, ModelRef, Model, Harness, Provider} should be set; otherwise the engine has nothing to disambiguate on.
Provider is present from day one (fixes ddx-8610020e — no soft-preference dropping).
func (Request) MinContextWindow ¶
MinContextWindow returns the minimum context window the request requires, derived from EstimatedPromptTokens with a safety margin.
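A margin-based derivation could look like the sketch below. The function name and the percentage parameter are hypothetical; the documentation does not state the actual margin, so the 20% used in main is only an example value.

```go
package main

import "fmt"

// minContextWindow pads the estimated prompt size by marginPercent.
// It returns 0 when no estimate is available, meaning no gating applies.
func minContextWindow(estimatedPromptTokens, marginPercent int) int {
	if estimatedPromptTokens <= 0 {
		return 0
	}
	return estimatedPromptTokens + estimatedPromptTokens*marginPercent/100
}

func main() {
	// Assumed 20% margin, for illustration only.
	fmt.Println(minContextWindow(1000, 20))
}
```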