usage

package
v0.9.1
Published: Apr 23, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package usage provides persistent token usage and cost tracking for LLM interactions. Records are append-only and indexed by timestamp, session, and conversation for efficient aggregation queries.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ComputeCost

func ComputeCost(model string, inputTokens, outputTokens int, pricing map[string]config.PricingEntry) float64

ComputeCost calculates the USD cost for a model's token usage based on the pricing table. Models not in the table are treated as free (local/Ollama models).
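
A minimal call sketch (not part of the package's documented examples); cfg.Pricing is assumed to be a map[string]config.PricingEntry loaded from configuration, and the model names are illustrative:

	cost := usage.ComputeCost("claude-sonnet-4", 1200, 350, cfg.Pricing)
	fmt.Printf("request cost: $%.4f\n", cost)

	// A model missing from the pricing table is treated as free.
	_ = usage.ComputeCost("llama3:8b", 1200, 350, cfg.Pricing) // 0.0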

func ComputeCostForIdentity added in v0.9.1

func ComputeCostForIdentity(identity ModelIdentity, inputTokens, outputTokens int, pricing map[string]config.PricingEntry) float64

ComputeCostForIdentity calculates USD cost for a resolved model identity. The selected deployment ID is checked first, then the upstream model as a fallback so deployment-qualified IDs can reuse provider pricing entries keyed by upstream model name.

func ComputeDetailedCostForIdentity added in v0.9.1

func ComputeDetailedCostForIdentity(identity ModelIdentity, inputTokens, cacheCreationInputTokens, cacheReadInputTokens, outputTokens int, pricing map[string]config.PricingEntry) float64

ComputeDetailedCostForIdentity calculates USD cost for a resolved model identity using uncached input tokens, cache-write input tokens, cache-read input tokens, and output tokens. Deployment-qualified IDs fall back to upstream-model pricing when needed.

Cache-write tokens are charged at the 5m rate (1.25× input) by default. Callers that know the per-TTL split should use ComputeDetailedCostForIdentityWithTTL instead to correctly charge 1h writes at 2.0×.
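
For example, at an illustrative base input rate of $3.00 per million tokens, 5m cache writes bill at $3.75 per million (1.25×) and 1h writes at $6.00 per million (2.0×).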

func ComputeDetailedCostForIdentityWithTTL added in v0.9.1

func ComputeDetailedCostForIdentityWithTTL(identity ModelIdentity, inputTokens, cacheCreationTotal, cacheCreation5m, cacheCreation1h, cacheReadInputTokens, outputTokens int, pricing map[string]config.PricingEntry) float64

ComputeDetailedCostForIdentityWithTTL is the full-fidelity cost function: callers supply the 5m/1h breakdown when the provider exposes it. The unattributed portion of cacheCreationTotal (that is, tokens not accounted for in the 5m or 1h buckets) is charged at the 5m rate to avoid retroactive price spikes on legacy records.
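
A sketch of the TTL-aware call with the unattributed bucket worked out in comments; the identity and token counts are illustrative, and cfg.Pricing is assumed as above:

	// The provider reported 10,000 cache-write tokens in total but only
	// attributed 6,000 to the 5m TTL and 3,000 to the 1h TTL. The remaining
	// 1,000 (10,000 - 6,000 - 3,000) are charged at the 5m rate.
	cost := usage.ComputeDetailedCostForIdentityWithTTL(
		identity, // e.g. from ResolveModelIdentity
		8000,     // uncached input tokens
		10000,    // cacheCreationTotal
		6000,     // cacheCreation5m
		3000,     // cacheCreation1h
		50000,    // cacheReadInputTokens
		1200,     // outputTokens
		cfg.Pricing,
	)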

func ResolveProvider

func ResolveProvider(model string) string

ResolveProvider infers the LLM provider from the model name. Models starting with "claude-" are Anthropic; everything else is assumed to be Ollama (local).
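
For example (model names are illustrative):

	usage.ResolveProvider("claude-sonnet-4") // "anthropic"
	usage.ResolveProvider("llama3:8b")       // "ollama"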

Types

type GroupedSummary

type GroupedSummary struct {
	Key     string  `json:"key"`
	Summary Summary `json:"summary"`
}

GroupedSummary pairs a grouping key (model name, role, task name) with its aggregated usage totals. Slices of GroupedSummary preserve the SQL ordering (highest cost first).

type ModelIdentity added in v0.9.1

type ModelIdentity struct {
	Model         string
	UpstreamModel string
	Resource      string
	Provider      string
}

ModelIdentity is the normalized usage-facing identity for a selected model/deployment.

func ResolveModelIdentity added in v0.9.1

func ResolveModelIdentity(model string, cat *models.Catalog) ModelIdentity

ResolveModelIdentity resolves usage-facing metadata for a selected model/deployment. When a normalized catalog is available, it is used as the source of truth. Otherwise the function falls back to parsing deployment-qualified IDs like "resource/model".
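
A hedged sketch of the fallback path; exactly how the ModelIdentity fields are populated from a "resource/model" ID is up to the implementation, so the comments stay non-committal and the IDs are illustrative:

	// With no catalog, deployment-qualified IDs are parsed as "resource/model".
	// Pass the loaded catalog instead of nil when one is available.
	identity := usage.ResolveModelIdentity("my-azure-resource/gpt-4o", nil)

	// Pricing lookup tries the deployment ID first, then the upstream model.
	cost := usage.ComputeCostForIdentity(identity, 2000, 500, cfg.Pricing)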

type Record

type Record struct {
	ID                       string
	Timestamp                time.Time
	RequestID                string
	SessionID                string
	ConversationID           string
	Model                    string // Selected deployment ID when known
	UpstreamModel            string
	Resource                 string
	Provider                 string // Provider family, e.g. "anthropic", "ollama", "lmstudio"
	InputTokens              int
	OutputTokens             int
	CacheCreationInputTokens int
	// CacheCreation5mInputTokens and CacheCreation1hInputTokens break
	// down the cache-write bucket by TTL when the provider exposes it
	// (Anthropic). Their sum is ≤ CacheCreationInputTokens; any
	// shortfall reflects writes the provider didn't attribute. When
	// the breakdown is absent (both zero), cost computation treats the
	// full CacheCreationInputTokens as 5m for the conservative default.
	CacheCreation5mInputTokens int
	CacheCreation1hInputTokens int
	CacheReadInputTokens       int
	CostUSD                    float64
	Role                       string // "interactive", "delegate", "scheduled", "auxiliary"
	TaskName                   string // "email_poll", "periodic_reflection", etc. (empty for interactive)
}

Record represents a single LLM interaction's token usage and cost.

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store is an append-only SQLite store for token usage records. All public methods are safe for concurrent use (SQLite serializes writes).

func NewStore

func NewStore(db *sql.DB) (*Store, error)

NewStore creates a usage store using the given database connection. The caller owns the connection — Store does not close it. The schema is created automatically on first use.
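
A minimal setup sketch; the SQLite driver (github.com/mattn/go-sqlite3 here) and this package's import path are assumptions on the caller's side:

	db, err := sql.Open("sqlite3", "usage.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close() // caller owns the connection; Store will not close it

	store, err := usage.NewStore(db) // schema is created automatically
	if err != nil {
		log.Fatal(err)
	}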

func (*Store) Record

func (s *Store) Record(ctx context.Context, rec Record) error

Record persists a usage record. If rec.ID is empty, a UUIDv7 is generated. The context is used for cancellation only.
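
Continuing from the store above, a sketch of persisting one interaction (field values are illustrative, cfg.Pricing as in the earlier examples):

	rec := usage.Record{
		Timestamp:    time.Now(),
		SessionID:    "sess-123",
		Model:        "claude-sonnet-4",
		Provider:     "anthropic",
		InputTokens:  1200,
		OutputTokens: 350,
		CostUSD:      usage.ComputeCost("claude-sonnet-4", 1200, 350, cfg.Pricing),
		Role:         "interactive",
	}
	if err := store.Record(context.Background(), rec); err != nil {
		log.Printf("recording usage: %v", err)
	}
	// rec.ID was left empty, so a UUIDv7 is generated internally.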

func (*Store) Summary

func (s *Store) Summary(start, end time.Time) (*Summary, error)

Summary returns aggregated totals for records within [start, end).
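
For example, totals over a trailing 24-hour window:

	now := time.Now()
	s, err := store.Summary(now.Add(-24*time.Hour), now)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%d requests, $%.2f total\n", s.TotalRecords, s.TotalCostUSD)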

func (*Store) SummaryByGroup added in v0.9.1

func (s *Store) SummaryByGroup(groupBy string, start, end time.Time) ([]GroupedSummary, error)

SummaryByGroup dispatches to the corresponding grouped-summary query (per model, upstream model, resource, provider, role, or task) based on the caller-provided grouping key.

func (*Store) SummaryByModel

func (s *Store) SummaryByModel(start, end time.Time) ([]GroupedSummary, error)

SummaryByModel returns per-model aggregated totals for records within [start, end), ordered by cost descending.
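
A sketch of printing the per-model breakdown (start and end are a caller-chosen time window):

	groups, err := store.SummaryByModel(start, end)
	if err != nil {
		log.Fatal(err)
	}
	for _, g := range groups { // already ordered by cost, highest first
		fmt.Printf("%-40s $%.4f\n", g.Key, g.Summary.TotalCostUSD)
	}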

func (*Store) SummaryByProvider added in v0.9.1

func (s *Store) SummaryByProvider(start, end time.Time) ([]GroupedSummary, error)

SummaryByProvider returns per-provider aggregated totals for records within [start, end), ordered by cost descending.

func (*Store) SummaryByResource added in v0.9.1

func (s *Store) SummaryByResource(start, end time.Time) ([]GroupedSummary, error)

SummaryByResource returns per-resource aggregated totals for records within [start, end), ordered by cost descending.

func (*Store) SummaryByRole

func (s *Store) SummaryByRole(start, end time.Time) ([]GroupedSummary, error)

SummaryByRole returns per-role aggregated totals for records within [start, end), ordered by cost descending.

func (*Store) SummaryByTask

func (s *Store) SummaryByTask(start, end time.Time) ([]GroupedSummary, error)

SummaryByTask returns per-task aggregated totals for records within [start, end), ordered by cost descending. Records with empty task_name are grouped under the key "".

func (*Store) SummaryByUpstreamModel added in v0.9.1

func (s *Store) SummaryByUpstreamModel(start, end time.Time) ([]GroupedSummary, error)

SummaryByUpstreamModel returns per-upstream-model aggregated totals for records within [start, end), ordered by cost descending.

type Summary

type Summary struct {
	TotalRecords                  int     `json:"total_records"`
	TotalInputTokens              int64   `json:"total_input_tokens"`
	TotalOutputTokens             int64   `json:"total_output_tokens"`
	TotalCacheCreationInputTokens int64   `json:"total_cache_creation_input_tokens"`
	TotalCacheReadInputTokens     int64   `json:"total_cache_read_input_tokens"`
	TotalCostUSD                  float64 `json:"total_cost_usd"`
}

Summary holds aggregated token usage and cost totals.

func (Summary) CacheHitRate added in v0.9.1

func (s Summary) CacheHitRate() float64

CacheHitRate returns the fraction of cache-eligible input tokens that were served from cache in this summary, as a value in [0, 1]. Zero when there were no cache-eligible tokens at all (empty window, or caching disabled). Useful for spotting cold-session spikes and validating that prompt-caching policy is actually working.

Formula matches the Anthropic-recommended observability metric: cache_read / (cache_read + cache_creation).
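
For example:

	s, err := store.Summary(start, end)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("cache hit rate: %.1f%%\n", s.CacheHitRate()*100)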
