optimize

package
v0.1.0-beta.9 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 9, 2026 License: Apache-2.0 Imports: 2 Imported by: 0

Documentation

Overview

Package optimize produces actionable findings from gateway-observed data — server registrations, per-server token + cost totals, and per-(server, tool) call counts — to help platform engineers reduce spend on a running gridctl stack.

The package is read-only: it never mutates accumulator state or the running gateway. Callers assemble a Stats snapshot, hand it to Analyze, and render the resulting OptimizeReport (CLI table, JSON, or Web UI).

Heuristics in v1 (PR 4 of the gateway-cost-observability feature):

  • unused_server: a registered server has seen zero tool calls in the freshness window. Remediation: drop the server from the stack YAML.
  • unused_tool: a registered tool on an active server has not been called in the freshness window. Remediation: add it to the server's tools: exclusion list.

Additional heuristics (schema_overhead, format_savings_shortfall, expensive_model_on_cheap_task) ship in PR 5; the Stats shape is intentionally additive so future heuristics can read more inputs without breaking call sites.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Finding

type Finding struct {
	// ID is a stable kebab-case identifier (e.g. "unused-server-github").
	// Generated from the heuristic name plus the targeted server/tool.
	ID string `json:"id"`

	// Heuristic is the rule that fired (unused_server, unused_tool, ...).
	Heuristic string `json:"heuristic"`

	// Severity drives CLI exit codes and Web UI badge color.
	Severity Severity `json:"severity"`

	// Title is a short user-facing summary (≤ 80 chars typical).
	Title string `json:"title"`

	// Summary is a longer explanation, including measured numbers.
	Summary string `json:"summary"`

	// Server names the MCP server the finding refers to. Empty for
	// stack-wide findings such as the "<24h of data" info gate.
	Server string `json:"server,omitempty"`

	// Tool names the specific tool the finding refers to. Empty for
	// server-level findings.
	Tool string `json:"tool,omitempty"`

	// ImpactUSDPerWeek is the projected weekly USD savings from
	// applying the remediation. Always derived from measured data;
	// findings that cannot prove an impact set this to zero.
	ImpactUSDPerWeek float64 `json:"impact_usd_per_week"`

	// Remediation is a paste-ready YAML snippet or shell command that
	// resolves the finding. Multi-line strings are allowed.
	Remediation string `json:"remediation"`

	// DetectedAt is the wall-clock time the report was generated, not
	// the time the underlying condition began.
	DetectedAt time.Time `json:"detected_at"`
}

Finding is a single optimization recommendation. Each finding carries enough context for the user to either dismiss it (info) or paste the Remediation snippet into their stack YAML.

type FormatBaseline

type FormatBaseline struct {
	// OriginalTokens is the pre-conversion token count summed across
	// every conversion the gateway has observed.
	OriginalTokens int64
	// FormattedTokens is the post-conversion token count for the same
	// observations.
	FormattedTokens int64
	// SavingsPercent is the percentage of tokens saved by conversion
	// (0–100). Computed by the caller; zero means "no demonstrated
	// savings yet" and the heuristic skips.
	SavingsPercent float64
}

FormatBaseline summarizes the session-wide token savings achieved by servers already using `output_format: toon|csv` conversion. The format_savings_shortfall heuristic projects this rate onto candidate servers to derive a measured impact estimate.

type ModelStat

type ModelStat struct {
	// Model is the canonical model ID (e.g. "claude-opus-4-7"). Empty
	// when the gateway has no resolver wired for the server, in which
	// case the heuristic falls back to inferring rate from observed
	// cost.
	Model string
	// InputUSDPerToken is the input-token rate from pricing.Lookup.
	// Zero disables the rate-based path; the heuristic falls back to
	// inferring from observed cost÷tokens.
	InputUSDPerToken float64
	// OutputUSDPerToken is the output-token rate from pricing.Lookup.
	// Provided alongside InputUSDPerToken for completeness; the
	// heuristic itself only needs InputUSDPerToken to detect
	// Opus-tier pricing.
	OutputUSDPerToken float64
}

ModelStat names the dominant model for a server and carries its per-token rates. Used by expensive_model_on_cheap_task to detect "Opus-tier model on a simple lookup tool" without re-deriving rates from cost÷tokens (which conflates input vs output vs cache rates).

type OptimizeReport

type OptimizeReport struct {
	Findings    []Finding `json:"findings"`
	HealthScore int       `json:"health_score"`
	GeneratedAt time.Time `json:"generated_at"`
}

OptimizeReport is the full output of one optimize pass.

func Analyze

func Analyze(stats Stats, opts Options) OptimizeReport

Analyze runs the v1 heuristic pass over the supplied Stats and returns a fully-populated OptimizeReport. The report's Findings slice is sorted by severity (critical → warn → info) then by impact descending so renderers can stream the most actionable finding first without re-sorting.

type Options

type Options struct {
	// MinImpactUSDPerWeek filters findings whose projected weekly
	// savings fall below the threshold. Zero disables the filter.
	MinImpactUSDPerWeek float64

	// SeverityFilter, when non-empty, drops findings whose severity is
	// not in the set. Use to render only warn/critical in CI.
	SeverityFilter []Severity
}

Options tunes the Analyze pass. All fields are optional.

type PinStat

type PinStat struct {
	// SchemaTokens is the estimated total token cost of the server's
	// tool definitions when serialized into a tools/list response.
	// Populated by counting bytes of marshaled JSON and applying a
	// chars-per-token heuristic.
	SchemaTokens int
}

PinStat carries the schema-overhead inputs for a single server. The pin shape evolves over time, so callers populate this from whichever source is most authoritative on their build (live tool list, pin records, etc.). Zero values cause the heuristic to skip the server.

type ServerInfo

type ServerInfo struct {
	// Name is the server's logical name in the stack YAML.
	Name string

	// Tools is the unprefixed tool list the server exposes through the
	// gateway.
	Tools []string

	// ToolWhitelist is the operator-curated tools: list from the stack
	// YAML. Empty means no whitelist (every tool is exposed).
	ToolWhitelist []string

	// Initialized is true once the gateway has handshaken with the
	// server. Optimize skips uninitialized servers because their tool
	// list may be empty for transient reasons (cold start, network
	// blip) rather than misconfiguration.
	Initialized bool

	// OutputFormat is the configured `output_format` from the stack
	// YAML — "json" (or empty) for the default, "toon" / "csv" / "text"
	// for the format-conversion variants. Powers the
	// format_savings_shortfall heuristic.
	OutputFormat string
}

ServerInfo describes one MCP server registered in the running stack. It is the smallest cross-package shape that lets pkg/optimize reason about which servers and tools exist without depending on pkg/mcp's MCPServerStatus.

type ServerUsage

type ServerUsage struct {
	InputTokens  int64
	OutputTokens int64
	TotalTokens  int64
	TotalCostUSD float64
}

ServerUsage carries per-server token and cost totals as observed by the accumulator. The fields mirror metrics.TokenCounts and metrics.CostCounts so call sites can populate them directly.

func (ServerUsage) CallCount

func (u ServerUsage) CallCount() bool

CallCount returns a coarse "calls happened" indicator for a server. True when any token activity has been recorded, regardless of cost. Used by unused_server: a server with zero observed traffic is unused.

type Severity

type Severity string

Severity classifies findings for filtering and exit-code mapping.

const (
	SeverityInfo     Severity = "info"
	SeverityWarn     Severity = "warn"
	SeverityCritical Severity = "critical"
)

Severity levels in ascending order of actionability.

func (Severity) IsActionable

func (s Severity) IsActionable() bool

IsActionable reports whether a severity should drive a non-zero exit code. info findings (including the "<24h of data" gate) are advisory only and exit cleanly.

type Stats

type Stats struct {
	// StackName is reported back on findings for context. Empty is
	// allowed; no validation here.
	StackName string

	// ObservationStart is the wall-clock time the gateway began
	// recording metrics (typically Accumulator.StartedAt()). Used to
	// gate the "<24h of data" info finding.
	ObservationStart time.Time

	// Now is the analysis time. Tests inject a fixed value; production
	// callers leave this zero so Analyze defaults to time.Now().
	Now time.Time

	// FreshnessWindow is the lookback span for unused_server and
	// unused_tool. Defaults to 7 * 24h when zero.
	FreshnessWindow time.Duration

	// MinObservationWindow is the minimum age of the gateway before
	// non-info findings are emitted. Defaults to 24h when zero. A
	// gateway younger than this returns a single info finding.
	MinObservationWindow time.Duration

	// Servers is every server registered in the active stack.
	Servers []ServerInfo

	// Usage is the per-server token + cost totals keyed by server name.
	// Servers absent from this map are treated as zero-traffic.
	Usage map[string]ServerUsage

	// ToolUsage is per-(server, tool) call counts and last-call
	// timestamps. nil means "no per-tool data captured", which causes
	// Analyze to skip the unused_tool heuristic entirely (rather than
	// flagging every tool as unused).
	ToolUsage map[string]map[string]ToolStat

	// PinStats is per-server schema-overhead inputs. nil disables the
	// schema_overhead heuristic. Populated by callers that can read
	// schema-byte counts off the live gateway tool list (or, in the
	// future, from extended pin records); fall back to nil when no
	// data source is available so the heuristic skips silently.
	PinStats map[string]PinStat

	// FormatBaseline is the session-wide format-conversion savings rate
	// observed across servers that DO use `output_format: toon|csv`.
	// Used by the format_savings_shortfall heuristic to project savings
	// onto servers that have not adopted format conversion. The zero
	// value disables the heuristic — without a baseline we have no
	// gateway-measured evidence that conversion would help here.
	FormatBaseline FormatBaseline

	// ServerCallCount is the total tool-call count per server, summed
	// across that server's tools. Used by
	// expensive_model_on_cheap_task to compute average tokens-per-call.
	// nil means the heuristic skips per-server avg-tokens math.
	ServerCallCount map[string]int64

	// ModelStats carries per-server pricing rates when the call site
	// knows which model dominates traffic for that server. Empty/nil
	// causes expensive_model_on_cheap_task to fall back to inferring
	// the rate from observed cost÷tokens.
	ModelStats map[string]ModelStat
}

Stats is the input snapshot that callers hand to Analyze. Every field is required for the corresponding heuristic to produce a non-info finding; missing inputs degrade gracefully.

type ToolStat

type ToolStat struct {
	Calls        int64
	LastCalledAt time.Time
}

ToolStat mirrors metrics.ToolStat so pkg/optimize stays free of an import on pkg/metrics. Callers can populate it directly from metrics.Accumulator.ToolUsageSnapshot().

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL