package routing

v0.10.15 (latest) | Published: May 8, 2026 | License: MIT | Imports: 6 | Imported by: 0

Repository

github.com/DocumentDrivenDX/fizeau

Documentation ¶

Overview ¶

Package routing implements the unified routing engine for fizeau. It ranks (harness, provider, model) candidates uniformly per CONTRACT-003.

The engine consolidates DDx-side harness-tier routing and agent-side per-provider failover into a single ranking pipeline.

Index ¶

Constants
Variables
func EscalateProfileAware(current string, ladder []string, req Request, in Inputs) string
func FuzzyMatch(input string, pool []string) string
func ProviderModelKey(provider, endpoint, model string) string
func SameModelIntent(a, b string) bool
type Candidate
type Capabilities
- func (c Capabilities) HasModel(model string) bool
- func (c Capabilities) HasPermissions(perm string) bool
- func (c Capabilities) HasReasoning(value string) bool
type Decision
- func Resolve(req Request, in Inputs) (*Decision, error)
type EndpointLoad
type ErrAllProvidersQuotaExhausted
- func (e ErrAllProvidersQuotaExhausted) Error() string
- func (e ErrAllProvidersQuotaExhausted) Is(target error) bool
- func (e ErrAllProvidersQuotaExhausted) Unwrap() error
type ErrHarnessModelIncompatible
- func (e ErrHarnessModelIncompatible) Error() string
- func (e ErrHarnessModelIncompatible) Is(target error) bool
- func (e ErrHarnessModelIncompatible) Unwrap() error
type ErrNoLiveProvider
- func (e *ErrNoLiveProvider) Error() string
type ErrNoProfileCandidate
- func (e ErrNoProfileCandidate) Error() string
- func (e ErrNoProfileCandidate) Is(target error) bool
- func (e ErrNoProfileCandidate) Unwrap() error
type ErrProfilePinConflict
- func (e ErrProfilePinConflict) Error() string
- func (e ErrProfilePinConflict) Is(target error) bool
- func (e ErrProfilePinConflict) Unwrap() error
type FilterReason
- func CheckGating(cap Capabilities, req Request) (string, FilterReason)
- func CheckPowerEligibility(lookup func(string) (ModelEligibility, bool), model string, req Request) (string, FilterReason)
type HarnessEntry
type Inputs
type ModelEligibility
type NoViableCandidateError
- func (e *NoViableCandidateError) Error() string
type ProviderEntry
type Request
- func (r Request) MinContextWindow() int

Constants ¶

const (
	ProviderPreferenceLocalFirst        = "local-first"
	ProviderPreferenceSubscriptionFirst = "subscription-first"
	ProviderPreferenceLocalOnly         = "local-only"
	ProviderPreferenceSubscriptionOnly  = "subscription-only"

	ContextSourceProviderAPI    = "provider_api"
	ContextSourceProviderConfig = "provider_config"
	ContextSourceCatalog        = "catalog"
	ContextSourceDefault        = "default"
	ContextSourceUnknown        = "unknown"
)

const (
	QuotaTrendUnknown    = "unknown"
	QuotaTrendHealthy    = "healthy"
	QuotaTrendBurning    = "burning"
	QuotaTrendExhausting = "exhausting"
)

const (
	// CostSourceCatalog means cost came from the model catalog.
	CostSourceCatalog = "catalog"
	// CostSourceSubscription means cost came from subscription quota pricing.
	CostSourceSubscription = "subscription"
	// CostSourceUnknown means no reliable cost estimate is available.
	CostSourceUnknown = "unknown"
	// CostSourceUserConfig means cost came from explicit user configuration.
	CostSourceUserConfig = "user-config"
)

const StickyAffinityBonus = 250.0

Variables ¶

var ProfileEscalationLadder = []string{"cheap", "standard", "smart"}

ProfileEscalationLadder is the fixed cheap → standard → smart progression that service.ResolveRoute walks when every candidate at the requested tier is filtered out (unhealthy or capability-rejected).

Functions ¶

func EscalateProfileAware ¶

func EscalateProfileAware(current string, ladder []string, req Request, in Inputs) string

EscalateProfileAware is a helper for tier escalation. Given a request that failed at one profile, it returns the next profile to try, restricted to profiles that have a viable candidate under the current Inputs (that is, the profile's resolved concrete model exists on the request's pinned provider, if any).

Fixes ddx-3c5ba7cc: tier escalation respects --provider affinity.

Returns "" when no further profile is viable.
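The escalation walk described above can be sketched as follows. This is a minimal illustration, not the package's implementation: nextViableProfile and the viable callback are hypothetical names, and the callback stands in for whatever candidate check the engine performs against Inputs. Returning "" for a profile that is not on the ladder is also an assumption.

```go
package main

import "fmt"

// nextViableProfile returns the first profile after current on ladder for
// which viable reports true, or "" when none remains. The viable callback is
// a hypothetical stand-in for the engine's per-profile candidate check.
func nextViableProfile(current string, ladder []string, viable func(profile string) bool) string {
	start := -1
	for i, p := range ladder {
		if p == current {
			start = i
			break
		}
	}
	if start < 0 {
		return "" // assumption: a profile that is not on the ladder cannot escalate
	}
	for _, p := range ladder[start+1:] {
		if viable(p) {
			return p
		}
	}
	return ""
}

func main() {
	ladder := []string{"cheap", "standard", "smart"}
	// Pretend only "smart" resolves to a model on the pinned provider.
	onPinnedProvider := func(p string) bool { return p == "smart" }

	fmt.Println(nextViableProfile("cheap", ladder, onPinnedProvider))       // smart
	fmt.Println(nextViableProfile("smart", ladder, onPinnedProvider) == "") // true
}
```

Skipping "standard" in the first call reflects the documented behavior that escalation only lands on a profile with a viable candidate for the pinned provider.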

func FuzzyMatch ¶

func FuzzyMatch(input string, pool []string) string

FuzzyMatch returns the best concrete model from pool for the given input, using canonical-form (case-insensitive, vendor-prefix-stripped) matching. Returns "" when no candidate matches.

Algorithm:

1. Exact match (case-insensitive on canonical form) wins outright.
2. Prefix/suffix/contains match on canonical form. Among fuzzy matches, prefer prefix, then suffix, then contains, and within each tier prefer the shortest remaining text (most specific match).
3. No match returns "".
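A minimal sketch of the matching order above, for illustration only. The names canonical and fuzzyMatchSketch are hypothetical, the model names in the example are made up, and two details are assumptions: the vendor prefix is taken to be everything up to the last "/", and an empty input matches nothing.

```go
package main

import (
	"fmt"
	"strings"
)

// canonical lowercases a model name and drops everything up to the last "/".
// Treating that segment as the vendor prefix is an assumption of this sketch.
func canonical(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	if i := strings.LastIndex(s, "/"); i >= 0 {
		s = s[i+1:]
	}
	return s
}

// fuzzyMatchSketch follows the documented order: exact, then prefix, suffix,
// contains, preferring the shortest leftover text within a tier.
func fuzzyMatchSketch(input string, pool []string) string {
	want := canonical(input)
	if want == "" {
		return "" // assumption: empty input never matches
	}
	best, bestTier, bestRest := "", 4, 0
	for _, cand := range pool {
		c := canonical(cand)
		tier := 4 // 4 = no match
		switch {
		case c == want:
			return cand // exact match wins outright
		case strings.HasPrefix(c, want):
			tier = 1
		case strings.HasSuffix(c, want):
			tier = 2
		case strings.Contains(c, want):
			tier = 3
		}
		rest := len(c) - len(want)
		if tier < bestTier || (tier == bestTier && tier < 4 && rest < bestRest) {
			best, bestTier, bestRest = cand, tier, rest
		}
	}
	return best
}

func main() {
	pool := []string{"acme/atlas-large", "acme/atlas", "zeta/nova-mini"}
	fmt.Println(fuzzyMatchSketch("Atlas", pool))       // acme/atlas (exact)
	fmt.Println(fuzzyMatchSketch("atl", pool))         // acme/atlas (prefix, shortest remainder)
	fmt.Println(fuzzyMatchSketch("mini", pool))        // zeta/nova-mini (suffix)
	fmt.Println(fuzzyMatchSketch("llama", pool) == "") // true
}
```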

func ProviderModelKey ¶

func ProviderModelKey(provider, endpoint, model string) string

ProviderModelKey is the metrics key used by routing callers for provider performance signals. Endpoint is optional; when empty the key remains compatible with older provider:model metrics.

func SameModelIntent ¶

func SameModelIntent(a, b string) bool

SameModelIntent returns true if two model strings refer to the same model after canonicalization. Used by capability gating and provider matching.

Types ¶

type Candidate ¶

type Candidate struct {
	Harness            string
	Provider           string
	Endpoint           string
	ServerInstance     string
	Model              string
	Score              float64
	CostUSDPer1kTokens float64
	CostSource         string
	Power              int
	ContextLength      int
	ContextSource      string
	ScoreComponents    map[string]float64
	Eligible           bool
	Reason             string

	// FilterReason is the typed disqualification category, set at the
	// rejection site that decided why this candidate is ineligible.
	// Empty for eligible candidates. Service-layer code maps this to the
	// public FilterReason* string constants without parsing free-form
	// Reason text.
	FilterReason FilterReason

	// LatencyMS, SuccessRate, and CostClass expose the score-component
	// inputs so callers can render per-axis explanations alongside the
	// final Score. Zero / negative values mean unknown (see Inputs docs).
	LatencyMS       float64
	SuccessRate     float64
	CostClass       string
	SpeedTPS        float64
	Utilization     float64
	ContextHeadroom int

	QuotaOK          bool
	QuotaPercentUsed int
	QuotaTrend       string
	StickyAffinity   float64
}

Candidate is one ranked routing option.

type Capabilities ¶

type Capabilities struct {
	ContextWindow      int      // resolved tokens; 0 = unknown
	SupportsTools      bool     // supports tool/function calling
	SupportsStreaming  bool     // supports streaming responses
	SupportedReasoning []string // supported public reasoning values
	MaxReasoningTokens int      // 0 means numeric reasoning is unsupported/unknown
	SupportedPerms     []string // {"safe","supervised","unrestricted"} subset
	ExactPinSupport    bool     // accepts exact concrete model pins
	SupportedModels    []string // nil = no static allow-list
}

Capabilities describes what a (harness, provider, model) tuple can do. Populated from harness config + catalog metadata + provider discovery.

func (Capabilities) HasModel ¶

func (c Capabilities) HasModel(model string) bool

HasModel returns true when the exact model pin is within the static harness allow-list. A nil allow-list means the harness delegates validation to provider-side model checks.

func (Capabilities) HasPermissions ¶

func (c Capabilities) HasPermissions(perm string) bool

HasPermissions returns true if the candidate supports the requested level. An empty permission level always returns true.

func (Capabilities) HasReasoning ¶

func (c Capabilities) HasReasoning(value string) bool

HasReasoning returns true if the candidate supports the requested reasoning value. Empty, auto, off, and numeric 0 impose no requirement.
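The three helper rules above can be illustrated with a small stand-in type. This is a sketch under stated assumptions, not the package's code: caps is a hypothetical, trimmed copy of Capabilities, model matching here is plain string equality, and treating a positive numeric reasoning value as a token budget capped by MaxReasoningTokens is a guess based on the field comment.

```go
package main

import (
	"fmt"
	"strconv"
)

// caps is a trimmed, hypothetical stand-in for routing.Capabilities.
type caps struct {
	SupportedReasoning []string
	MaxReasoningTokens int
	SupportedPerms     []string
	SupportedModels    []string // nil = no static allow-list
}

func contains(list []string, v string) bool {
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// hasModel: a nil allow-list defers validation to the provider.
func (c caps) hasModel(model string) bool {
	return c.SupportedModels == nil || contains(c.SupportedModels, model)
}

// hasPermissions: an empty level imposes no requirement.
func (c caps) hasPermissions(perm string) bool {
	return perm == "" || contains(c.SupportedPerms, perm)
}

// hasReasoning: empty, auto, off, and numeric 0 impose no requirement.
// Assumption: other numeric values are token budgets capped by
// MaxReasoningTokens; named values must appear in SupportedReasoning.
func (c caps) hasReasoning(value string) bool {
	switch value {
	case "", "auto", "off", "0":
		return true
	}
	if n, err := strconv.Atoi(value); err == nil {
		return n > 0 && n <= c.MaxReasoningTokens
	}
	return contains(c.SupportedReasoning, value)
}

func main() {
	c := caps{
		SupportedReasoning: []string{"low", "high"},
		MaxReasoningTokens: 4096,
		SupportedPerms:     []string{"safe", "supervised"},
	}
	fmt.Println(c.hasModel("any-model"))                          // true: nil allow-list
	fmt.Println(c.hasPermissions("unrestricted"))                 // false
	fmt.Println(c.hasReasoning("auto"), c.hasReasoning("medium")) // true false
}
```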

type Decision ¶

type Decision struct {
	Harness        string
	Provider       string
	Endpoint       string
	ServerInstance string
	Model          string
	Reason         string
	Candidates     []Candidate
}

Decision is the routing engine's output: the picked candidate plus the full ranked list (including rejected ones with rejection reasons).

func Resolve ¶

func Resolve(req Request, in Inputs) (*Decision, error)

Resolve runs the engine end-to-end and returns a Decision.

The engine:

1. Enumerates (harness, provider, model) candidates from inputs.
2. Applies gating (capability, override, model-pin, surface).
3. Scores per profile policy with cooldown demotion + perf bias.
4. Sorts viable → score → cost → locality → name.
5. Returns the top viable candidate with the full ranked list.

Returns an error only when no viable candidate exists.
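The sort step (viable → score → cost → locality → name) can be pictured as a chained comparator. The sketch below is illustrative only: cand is a hypothetical, trimmed stand-in for Candidate, and the direction of each tie-break (higher score first, cheaper first, local first, then ascending name) is an assumption, since the documentation names the keys but not their ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// cand is a hypothetical, trimmed stand-in for routing.Candidate.
type cand struct {
	Name     string
	Eligible bool
	Score    float64
	CostUSD  float64
	IsLocal  bool
}

// rank orders candidates in place: viable first, then by score, cost,
// locality, and name. Tie-break directions are assumptions of this sketch.
func rank(cs []cand) {
	sort.SliceStable(cs, func(i, j int) bool {
		a, b := cs[i], cs[j]
		if a.Eligible != b.Eligible {
			return a.Eligible // viable candidates first
		}
		if a.Score != b.Score {
			return a.Score > b.Score // higher score first
		}
		if a.CostUSD != b.CostUSD {
			return a.CostUSD < b.CostUSD // cheaper first
		}
		if a.IsLocal != b.IsLocal {
			return a.IsLocal // local first
		}
		return a.Name < b.Name
	})
}

func main() {
	cs := []cand{
		{"b-remote", true, 10, 0.2, false},
		{"a-local", true, 10, 0.2, true},
		{"rejected", false, 99, 0, true},
		{"cheap", true, 10, 0.1, false},
		{"top", true, 20, 1, false},
	}
	rank(cs)
	for _, c := range cs {
		fmt.Println(c.Name) // top, cheap, a-local, b-remote, rejected
	}
}
```

Keeping rejected candidates in the slice, sorted after the viable ones, matches the documented Decision shape, which carries the full ranked list including rejections.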

type EndpointLoad ¶ added in v0.10.9

type EndpointLoad struct {
	LeaseCount           int
	NormalizedLoad       float64
	UtilizationFresh     bool
	UtilizationSaturated bool
}

EndpointLoad is the routing engine's normalized view of endpoint load for a single provider/model tuple.

type ErrAllProvidersQuotaExhausted ¶ added in v0.10.4

type ErrAllProvidersQuotaExhausted struct {
	RetryAfter         time.Time
	ExhaustedProviders []string
}

ErrAllProvidersQuotaExhausted reports that every routing candidate that would have been eligible for the request was filtered out solely because its provider is currently in quota_exhausted state. RetryAfter is the earliest expected provider-recovery time across the exhausted set.

func (ErrAllProvidersQuotaExhausted) Error ¶ added in v0.10.4

func (e ErrAllProvidersQuotaExhausted) Error() string

func (ErrAllProvidersQuotaExhausted) Is ¶ added in v0.10.4

func (e ErrAllProvidersQuotaExhausted) Is(target error) bool

func (ErrAllProvidersQuotaExhausted) Unwrap ¶ added in v0.10.4

func (e ErrAllProvidersQuotaExhausted) Unwrap() error

type ErrHarnessModelIncompatible ¶

type ErrHarnessModelIncompatible struct {
	// Harness is the canonical harness name supplied by the caller.
	Harness string
	// Model is the exact concrete model pin supplied by the caller.
	Model string
	// SupportedModels is the harness allow-list that rejected Model.
	SupportedModels []string
}

ErrHarnessModelIncompatible reports an explicit Harness+Model pin that the harness allow-list cannot serve.

func (ErrHarnessModelIncompatible) Error ¶

func (e ErrHarnessModelIncompatible) Error() string

func (ErrHarnessModelIncompatible) Is ¶

func (e ErrHarnessModelIncompatible) Is(target error) bool

func (ErrHarnessModelIncompatible) Unwrap ¶

func (e ErrHarnessModelIncompatible) Unwrap() error

type ErrNoLiveProvider ¶

type ErrNoLiveProvider struct {
	// PromptTokens is the request's EstimatedPromptTokens at the time
	// escalation began. Zero means no prompt-token gating was active.
	PromptTokens int
	// RequiresTools mirrors the request's RequiresTools flag.
	RequiresTools bool
	// StartingTier is the profile name that escalation began from
	// (the profile in the original request).
	StartingTier string
}

ErrNoLiveProvider reports that profile-tier escalation walked the entire ladder (cheap → standard → smart) without finding a live provider that can serve the request. Callers translate this into a precise user-facing message naming the prompt size and tool requirement so operators know what capability is missing across all tiers.

func (*ErrNoLiveProvider) Error ¶

func (e *ErrNoLiveProvider) Error() string

type ErrNoProfileCandidate ¶

type ErrNoProfileCandidate struct {
	Profile           string
	MissingCapability string
	Rejected          int
}

ErrNoProfileCandidate reports that a profile's hard placement requirement could not be satisfied by any routed candidate.

func (ErrNoProfileCandidate) Error ¶

func (e ErrNoProfileCandidate) Error() string

func (ErrNoProfileCandidate) Is ¶

func (e ErrNoProfileCandidate) Is(target error) bool

func (ErrNoProfileCandidate) Unwrap ¶

func (e ErrNoProfileCandidate) Unwrap() error

type ErrProfilePinConflict ¶

type ErrProfilePinConflict struct {
	// Profile is the explicit profile requested by the caller.
	Profile string
	// ConflictingPin names the explicit pin that violates the profile, such as
	// "Harness=claude" or "Model=local-model".
	ConflictingPin string
	// ProfileConstraint is a short description of the profile placement rule,
	// such as "local-only" or "subscription-only".
	ProfileConstraint string
}

ErrProfilePinConflict reports an explicit Profile whose placement constraint contradicts another explicit caller pin.

func (ErrProfilePinConflict) Error ¶

func (e ErrProfilePinConflict) Error() string

func (ErrProfilePinConflict) Is ¶

func (e ErrProfilePinConflict) Is(target error) bool

func (ErrProfilePinConflict) Unwrap ¶

func (e ErrProfilePinConflict) Unwrap() error

type FilterReason ¶

type FilterReason string

FilterReason categorizes why a routing candidate was disqualified. The zero value (empty string) means the candidate is eligible.

const (
	// FilterReasonEligible is the zero value for an eligible candidate.
	FilterReasonEligible FilterReason = ""
	// FilterReasonContextTooSmall: candidate's context window is below the
	// request's MinContextWindow().
	FilterReasonContextTooSmall FilterReason = "context_too_small"
	// FilterReasonNoToolSupport: request needs tool calling but candidate
	// does not support it.
	FilterReasonNoToolSupport FilterReason = "no_tool_support"
	// FilterReasonReasoningUnsupported: candidate cannot satisfy the
	// requested reasoning policy.
	FilterReasonReasoningUnsupported FilterReason = "reasoning_unsupported"
	// FilterReasonUnhealthy: harness/provider is unavailable, in cooldown,
	// out of quota, or excluded by a hard provider-preference gate.
	FilterReasonUnhealthy FilterReason = "unhealthy"
	// FilterReasonScoredBelowTop: catch-all for ineligibility that does
	// not fit a more specific category (also used for capability
	// mismatches such as permissions/model-pin/exact-pin and for model
	// resolution failures).
	FilterReasonScoredBelowTop FilterReason = "scored_below_top"
	// FilterReasonPinMismatch: candidate was rejected because it does not
	// satisfy an explicit caller pin such as Provider.
	FilterReasonPinMismatch FilterReason = "pin_mismatch"
	// FilterReasonPowerMissing: automatic routing requires catalog power
	// metadata, but the candidate model is unknown or has zero power.
	FilterReasonPowerMissing FilterReason = "power_missing"
	// FilterReasonBelowMinPower: candidate model power is below req.MinPower.
	FilterReasonBelowMinPower FilterReason = "below_min_power"
	// FilterReasonAboveMaxPower: candidate model power is above req.MaxPower.
	FilterReasonAboveMaxPower FilterReason = "above_max_power"
	// FilterReasonExactPinOnly: catalog marks the model as only eligible for
	// explicit model pins.
	FilterReasonExactPinOnly FilterReason = "exact_pin_only"
	// FilterReasonNotAutoRoutable: catalog metadata exists but marks the
	// model inactive, deprecated, stale, or otherwise excluded from
	// automatic routing.
	FilterReasonNotAutoRoutable FilterReason = "not_auto_routable"
	// FilterReasonQuotaExhausted: provider is in the quota_exhausted state
	// with retry_after in the future. The candidate would have been eligible
	// otherwise.
	FilterReasonQuotaExhausted FilterReason = "quota_exhausted"
)

func CheckGating ¶

func CheckGating(cap Capabilities, req Request) (string, FilterReason)

CheckGating applies all capability gates against a request and returns the first failure reason (free-form string for diagnostics) plus the typed FilterReason category for the failure. Returns ("", FilterReasonEligible) when all gates pass.

The typed return is the authoritative classification; callers must not reclassify by parsing the string, which is for human-readable diagnostics only.

Fixes ddx-4817edfd subtree: pre-dispatch capability check covering context window, tool support, effort, permissions.
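The "first failure, typed category" contract can be sketched with two of the documented gates. Everything below is illustrative: gateCaps, gateRequest, and checkGatingSketch are hypothetical stand-ins, only the context and tool gates are shown, and letting an unknown (zero) context window pass is an assumption.

```go
package main

import "fmt"

// filterReason mirrors the documented typed category; only three of the
// documented values are reproduced here.
type filterReason string

const (
	reasonEligible        filterReason = ""
	reasonContextTooSmall filterReason = "context_too_small"
	reasonNoToolSupport   filterReason = "no_tool_support"
)

// gateCaps and gateRequest are hypothetical, trimmed stand-ins for
// Capabilities and Request.
type gateCaps struct {
	ContextWindow int // 0 = unknown
	SupportsTools bool
}

type gateRequest struct {
	MinContextWindow int
	RequiresTools    bool
}

// checkGatingSketch returns the first failing gate as (diagnostic, category).
// Assumption: an unknown (0) context window is not rejected.
func checkGatingSketch(c gateCaps, r gateRequest) (string, filterReason) {
	if r.MinContextWindow > 0 && c.ContextWindow > 0 && c.ContextWindow < r.MinContextWindow {
		return fmt.Sprintf("context window %d below required %d",
			c.ContextWindow, r.MinContextWindow), reasonContextTooSmall
	}
	if r.RequiresTools && !c.SupportsTools {
		return "request requires tool calling", reasonNoToolSupport
	}
	return "", reasonEligible
}

func main() {
	msg, reason := checkGatingSketch(
		gateCaps{ContextWindow: 8192, SupportsTools: true},
		gateRequest{MinContextWindow: 32000, RequiresTools: true},
	)
	// Branch on the typed category; keep msg for logs only.
	switch reason {
	case reasonEligible:
		fmt.Println("eligible")
	case reasonContextTooSmall:
		fmt.Println("needs a larger context window:", msg)
	default:
		fmt.Println("rejected:", msg)
	}
}
```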

func CheckPowerEligibility ¶

func CheckPowerEligibility(lookup func(string) (ModelEligibility, bool), model string, req Request) (string, FilterReason)

CheckPowerEligibility applies catalog-power gates for unpinned automatic routing. Any explicit hard route pin (harness, provider, or model) bypasses this gate so caller-chosen routes are never broadened or overridden by power policy.
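A rough sketch of the pin bypass and the power gates, using the documented FilterReason values. The types eligibility and powerRequest and the function checkPowerSketch are hypothetical stand-ins. The order in which the gates are checked, and treating a zero MinPower or MaxPower as "no bound", are assumptions.

```go
package main

import "fmt"

type filterReason string

const (
	reasonEligible        filterReason = ""
	reasonPowerMissing    filterReason = "power_missing"
	reasonBelowMinPower   filterReason = "below_min_power"
	reasonAboveMaxPower   filterReason = "above_max_power"
	reasonExactPinOnly    filterReason = "exact_pin_only"
	reasonNotAutoRoutable filterReason = "not_auto_routable"
)

// eligibility and powerRequest are hypothetical, trimmed stand-ins for
// ModelEligibility and Request.
type eligibility struct {
	Power        int
	ExactPinOnly bool
	AutoRoutable bool
}

type powerRequest struct {
	Harness, Provider, Model string
	MinPower, MaxPower       int
}

// checkPowerSketch applies the catalog-power gates to unpinned requests.
// Gate order and "0 means no bound" are assumptions of this sketch.
func checkPowerSketch(lookup func(string) (eligibility, bool), model string, r powerRequest) (string, filterReason) {
	if r.Harness != "" || r.Provider != "" || r.Model != "" {
		return "", reasonEligible // explicit hard pins bypass the power gate
	}
	e, ok := lookup(model)
	switch {
	case !ok || e.Power <= 0:
		return "no catalog power for " + model, reasonPowerMissing
	case e.ExactPinOnly:
		return model + " is exact-pin only", reasonExactPinOnly
	case !e.AutoRoutable:
		return model + " is not auto-routable", reasonNotAutoRoutable
	case r.MinPower > 0 && e.Power < r.MinPower:
		return fmt.Sprintf("power %d below minimum %d", e.Power, r.MinPower), reasonBelowMinPower
	case r.MaxPower > 0 && e.Power > r.MaxPower:
		return fmt.Sprintf("power %d above maximum %d", e.Power, r.MaxPower), reasonAboveMaxPower
	}
	return "", reasonEligible
}

func main() {
	catalog := map[string]eligibility{"model-a": {Power: 3, AutoRoutable: true}}
	lookup := func(m string) (eligibility, bool) { e, ok := catalog[m]; return e, ok }

	_, reason := checkPowerSketch(lookup, "model-a", powerRequest{MinPower: 5})
	fmt.Println(reason) // below_min_power
	_, reason = checkPowerSketch(lookup, "model-a", powerRequest{MinPower: 5, Provider: "pinned"})
	fmt.Println(reason == reasonEligible) // true: pin bypasses the gate
}
```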

type HarnessEntry ¶

type HarnessEntry struct {
	Name                string
	Surface             string
	CostClass           string
	IsLocal             bool
	IsSubscription      bool
	IsHTTPProvider      bool
	AutoRoutingEligible bool
	TestOnly            bool
	ExactPinSupport     bool
	DefaultModel        string
	SupportedModels     []string
	SupportedReasoning  []string
	MaxReasoningTokens  int
	SupportedPerms      []string
	SupportsTools       bool

	// Available reflects the harness's discovered availability.
	Available bool

	// QuotaOK / QuotaPercentUsed reflect live quota state (when applicable).
	// SubscriptionOK gates subscription harnesses at the eligibility level:
	// when false, the candidate is ineligible regardless of score.
	QuotaOK          bool
	QuotaPercentUsed int
	QuotaStale       bool
	QuotaTrend       string // unknown|healthy|burning|exhausting
	QuotaReason      string
	SubscriptionOK   bool // false = hard gate; true = score-based demotion

	// InCooldown marks the entire harness as being in a failure cooldown.
	// When true the harness is demoted in score (via candidateInternal.InCooldown)
	// but not hard-rejected, so it can still win when all other harnesses are
	// also unavailable.
	InCooldown bool

	// Providers is the list of providers this harness can dispatch to.
	// For subprocess harnesses (claude/codex) this is typically a single
	// vendor entry. For the native "fiz" harness it is the configured
	// list of HTTP providers.
	Providers []ProviderEntry
}

HarnessEntry is the harness-side input the caller (service) supplies. It is the routing engine's view of a registered harness; the engine does not import the harnesses package directly to keep the dependency narrow.

type Inputs ¶

type Inputs struct {
	Harnesses                    []HarnessEntry
	HistoricalSuccess            map[string]float64 // by harness name; -1 = insufficient data
	ObservedSpeedTPS             map[string]float64 // by "provider:model"
	ProviderSuccessRate          map[string]float64 // by ProviderModelKey(provider, endpoint, model)
	ObservedLatencyMS            map[string]float64 // by ProviderModelKey(provider, endpoint, model)
	EndpointLoads                map[string]EndpointLoad
	EndpointLoadResolver         func(provider, endpoint, model string) (EndpointLoad, bool)
	StickyServerInstanceResolver func(stickyKey string) (string, bool)
	ProviderCooldowns            map[string]time.Time // by provider name
	CooldownDuration             time.Duration        // 0 = no cooldown enforcement

	// ProviderQuotaExhaustedUntil maps provider name → retry_after time.
	// A provider with retry_after > Now is treated as quota_exhausted and
	// disqualified from candidate selection. The service maintains the
	// per-provider quota state machine and projects it into this map for
	// each routing call.
	ProviderQuotaExhaustedUntil map[string]time.Time
	Now                         time.Time // injected for deterministic testing; default time.Now()
	CatalogResolver             func(ref, surface string) (concreteModel string, ok bool)
	CatalogCandidatesResolver   func(ref, surface string) (concreteModels []string, ok bool)
	ModelEligibility            func(model string) (ModelEligibility, bool)

	// ReasoningResolver returns the catalog's surface_policy reasoning_default
	// for a (profile, surface) pair. When set, buildHarnessCandidates uses it
	// to resolve Reasoning=auto to a concrete level before invoking the
	// capability gate, so candidates that cannot satisfy the resolved level
	// (e.g. an off-only variant under a profile whose surface default is
	// "high") are correctly disqualified instead of slipping through.
	ReasoningResolver func(profile, surface string) (resolved string, ok bool)
}

Inputs bundles the engine's external data sources.

type ModelEligibility ¶

type ModelEligibility struct {
	Power        int
	ExactPinOnly bool
	AutoRoutable bool
}

ModelEligibility is the routing engine's catalog-power view for one model. Unknown or zero-power models are still usable through exact Model pins, but are excluded from unpinned automatic routing.

type NoViableCandidateError ¶

type NoViableCandidateError struct {
	Rejected int
	Model    string
	Provider string
	Harness  string
	MinPower int
	MaxPower int
}

NoViableCandidateError reports that routing evaluated candidates but every one failed a gate.

func (*NoViableCandidateError) Error ¶

func (e *NoViableCandidateError) Error() string

type ProviderEntry ¶

type ProviderEntry struct {
	Name                 string
	BaseURL              string
	ServerInstance       string
	EndpointName         string
	EndpointBaseURL      string
	DefaultModel         string
	CostClass            string
	DiscoveredIDs        []string // models discovered via /v1/models or equivalent
	DiscoveryAttempted   bool
	ContextWindows       map[string]int
	ContextWindowSources map[string]string
	ContextWindow        int
	ContextWindowSource  string
	SupportsTools        bool

	// CostUSDPer1kTokens is the estimated blended USD cost per 1,000 tokens.
	// A zero value with CostSourceUnknown means the provider cost is unknown.
	CostUSDPer1kTokens float64
	// CostSource describes where CostUSDPer1kTokens came from: catalog,
	// subscription, user-config, or unknown.
	CostSource string

	// InCooldown reflects whether this provider is in a failure-cooldown window.
	InCooldown bool
}

ProviderEntry describes one provider available under a harness.

type Request ¶

type Request struct {
	Profile            string // "cheap" | "standard" | "smart"
	ModelRef           string // catalog alias (e.g. "qwen/qwen3.6")
	Model              string // exact concrete model pin
	Provider           string // exact provider pin; constrains routing to one provider identity
	Harness            string // hard preference; constrains routing to one harness
	Reasoning          string // public reasoning scalar
	Permissions        string // "safe" | "supervised" | "unrestricted"
	ProviderPreference string // "local-first" | "subscription-first" | "local-only" | "subscription-only"
	CorrelationID      string // validated sticky route key, when available

	// EstimatedPromptTokens, when > 0, drives context-window gating.
	EstimatedPromptTokens int

	// RequiresTools, when true, requires the candidate to support tool calling.
	RequiresTools bool
	MinPower      int
	MaxPower      int
}

Request is the routing input. All fields are optional, but at least one of {Profile, ModelRef, Model, Harness, Provider} should be set; otherwise the engine has nothing to disambiguate on.

Provider is present from day one (fixes ddx-8610020e: no soft-preference dropping).

func (Request) MinContextWindow ¶

func (r Request) MinContextWindow() int

MinContextWindow returns the minimum context window the request requires, derived from EstimatedPromptTokens with a safety margin.
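The documentation does not state the size of the safety margin, so the sketch below takes it as a parameter rather than guessing a value. minContextWindowSketch and marginPercent are hypothetical; the 20% used in main is an arbitrary example, and returning 0 for a non-positive estimate is an assumption based on the EstimatedPromptTokens comment ("when > 0, drives context-window gating").

```go
package main

import "fmt"

// minContextWindowSketch adds a percentage margin on top of the estimated
// prompt size, rounding the margin up. The real margin is not documented;
// marginPercent is a hypothetical parameter.
func minContextWindowSketch(estimatedPromptTokens, marginPercent int) int {
	if estimatedPromptTokens <= 0 {
		return 0 // no estimate: no context-window requirement (assumption)
	}
	if marginPercent < 0 {
		marginPercent = 0
	}
	margin := (estimatedPromptTokens*marginPercent + 99) / 100 // ceiling division
	return estimatedPromptTokens + margin
}

func main() {
	fmt.Println(minContextWindowSketch(1000, 20)) // 1200
	fmt.Println(minContextWindowSketch(1001, 20)) // 1202
	fmt.Println(minContextWindowSketch(0, 20))    // 0
}
```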
