Documentation
Overview ¶
Package usage provides persistent token usage and cost tracking for LLM interactions. Records are append-only and indexed by timestamp, session, and conversation for efficient aggregation queries.
Index ¶
- func ComputeCost(model string, inputTokens, outputTokens int, ...) float64
- func ComputeCostForIdentity(identity ModelIdentity, inputTokens, outputTokens int, ...) float64
- func ComputeDetailedCostForIdentity(identity ModelIdentity, ...) float64
- func ComputeDetailedCostForIdentityWithTTL(identity ModelIdentity, ...) float64
- func ResolveProvider(model string) string
- type GroupedSummary
- type ModelIdentity
- type Record
- type Store
- func (s *Store) Record(ctx context.Context, rec Record) error
- func (s *Store) Summary(start, end time.Time) (*Summary, error)
- func (s *Store) SummaryByGroup(groupBy string, start, end time.Time) ([]GroupedSummary, error)
- func (s *Store) SummaryByModel(start, end time.Time) ([]GroupedSummary, error)
- func (s *Store) SummaryByProvider(start, end time.Time) ([]GroupedSummary, error)
- func (s *Store) SummaryByResource(start, end time.Time) ([]GroupedSummary, error)
- func (s *Store) SummaryByRole(start, end time.Time) ([]GroupedSummary, error)
- func (s *Store) SummaryByTask(start, end time.Time) ([]GroupedSummary, error)
- func (s *Store) SummaryByUpstreamModel(start, end time.Time) ([]GroupedSummary, error)
- type Summary
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ComputeCost ¶
func ComputeCost(model string, inputTokens, outputTokens int, pricing map[string]config.PricingEntry) float64
ComputeCost calculates the USD cost for a model's token usage based on the pricing table. Models not in the table are treated as free (local/Ollama models).
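A minimal, self-contained sketch of the documented lookup semantics. `pricingEntry` and its field names are hypothetical stand-ins for `config.PricingEntry` (the real fields may differ); rates are assumed to be USD per million tokens:

```go
package main

import "fmt"

// pricingEntry is a hypothetical stand-in for config.PricingEntry.
// Rates are USD per million tokens.
type pricingEntry struct {
	InputPerMTok  float64
	OutputPerMTok float64
}

// computeCost mirrors the documented semantics: models absent from
// the pricing table are treated as free (local/Ollama models).
func computeCost(model string, inputTokens, outputTokens int, pricing map[string]pricingEntry) float64 {
	p, ok := pricing[model]
	if !ok {
		return 0 // unknown model: assume local, no cost
	}
	return float64(inputTokens)/1e6*p.InputPerMTok +
		float64(outputTokens)/1e6*p.OutputPerMTok
}

func main() {
	pricing := map[string]pricingEntry{
		"claude-sonnet-4": {InputPerMTok: 3, OutputPerMTok: 15},
	}
	fmt.Println(computeCost("claude-sonnet-4", 1000, 500, pricing)) // ≈ $0.0105
	fmt.Println(computeCost("llama3", 1000, 500, pricing))          // not in table: 0
}
```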
func ComputeCostForIdentity ¶ added in v0.9.1
func ComputeCostForIdentity(identity ModelIdentity, inputTokens, outputTokens int, pricing map[string]config.PricingEntry) float64
ComputeCostForIdentity calculates USD cost for a resolved model identity. The selected deployment ID is checked first, then the upstream model as a fallback so deployment-qualified IDs can reuse provider pricing entries keyed by upstream model name.
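The lookup order described above can be sketched as follows. `modelIdentity` and its field names are assumptions for illustration, not the package's actual definition:

```go
package main

import "fmt"

// modelIdentity is a hypothetical sketch of a resolved identity:
// a selected deployment-qualified ID plus its upstream model name.
type modelIdentity struct {
	Selected string // e.g. "bedrock/claude-sonnet-4"
	Upstream string // e.g. "claude-sonnet-4"
}

// lookupRate tries the selected deployment ID first, then falls
// back to the upstream model, so deployment-qualified IDs can reuse
// pricing entries keyed by upstream model name.
func lookupRate(id modelIdentity, pricing map[string]float64) (float64, bool) {
	if r, ok := pricing[id.Selected]; ok {
		return r, true
	}
	r, ok := pricing[id.Upstream]
	return r, ok
}

func main() {
	pricing := map[string]float64{"claude-sonnet-4": 3}
	id := modelIdentity{Selected: "bedrock/claude-sonnet-4", Upstream: "claude-sonnet-4"}
	r, ok := lookupRate(id, pricing)
	fmt.Println(r, ok) // resolved via the upstream fallback
}
```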
func ComputeDetailedCostForIdentity ¶ added in v0.9.1
func ComputeDetailedCostForIdentity(identity ModelIdentity, inputTokens, cacheCreationInputTokens, cacheReadInputTokens, outputTokens int, pricing map[string]config.PricingEntry) float64
ComputeDetailedCostForIdentity calculates USD cost for a resolved model identity using uncached input tokens, cache-write input tokens, cache-read input tokens, and output tokens. Deployment-qualified IDs fall back to upstream-model pricing when needed.
Cache-write tokens are charged at the 5m rate (1.25× input) by default. Callers that know the per-TTL split should use ComputeDetailedCostForIdentityWithTTL instead to correctly charge 1h writes at 2.0×.
func ComputeDetailedCostForIdentityWithTTL ¶ added in v0.9.1
func ComputeDetailedCostForIdentityWithTTL(identity ModelIdentity, inputTokens, cacheCreationTotal, cacheCreation5m, cacheCreation1h, cacheReadInputTokens, outputTokens int, pricing map[string]config.PricingEntry) float64
ComputeDetailedCostForIdentityWithTTL is the full-fidelity cost function: callers supply the 5m/1h breakdown when the provider exposes it. The unattributed portion of cacheCreationTotal (that is, tokens not accounted for in the 5m or 1h buckets) is charged at the 5m rate to avoid retroactive price spikes on legacy records.
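The TTL bucket charging described above (5m writes at 1.25×, 1h writes at 2.0×, the unattributed remainder at the 5m rate) can be sketched as a standalone calculation; `cacheWriteCost` and its per-token `inputRate` parameter are illustrative, not the package's API:

```go
package main

import "fmt"

// cacheWriteCost sketches the documented TTL-bucket charging:
// 5m writes at 1.25x the input rate, 1h writes at 2.0x, and any
// unattributed remainder (total minus both buckets) at the
// conservative 5m rate.
func cacheWriteCost(total, fiveMin, oneHour int, inputRate float64) float64 {
	unattributed := total - fiveMin - oneHour
	if unattributed < 0 {
		unattributed = 0
	}
	return float64(fiveMin+unattributed)*inputRate*1.25 +
		float64(oneHour)*inputRate*2.0
}

func main() {
	// 1000 cache-write tokens total, 600 attributed to 5m, 300 to
	// 1h; the remaining 100 are charged at the 5m rate.
	fmt.Println(cacheWriteCost(1000, 600, 300, 1.0)) // → 1475
}
```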
func ResolveProvider ¶
func ResolveProvider(model string) string
ResolveProvider infers the LLM provider from the model name. Models starting with "claude-" are Anthropic; everything else is assumed to be Ollama (local).
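The documented rule is a simple prefix check; a minimal sketch (the function name here is lowercased to mark it as illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// resolveProvider sketches the documented rule: models starting
// with "claude-" are Anthropic; everything else is assumed to be
// Ollama (local).
func resolveProvider(model string) string {
	if strings.HasPrefix(model, "claude-") {
		return "anthropic"
	}
	return "ollama"
}

func main() {
	fmt.Println(resolveProvider("claude-sonnet-4")) // anthropic
	fmt.Println(resolveProvider("llama3"))          // ollama
}
```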
Types ¶
type GroupedSummary ¶
GroupedSummary pairs a grouping key (model name, role, task name) with its aggregated usage totals. Slices of GroupedSummary preserve the SQL ordering (highest cost first).
type ModelIdentity ¶ added in v0.9.1
ModelIdentity is the normalized usage-facing identity for a selected model/deployment.
func ResolveModelIdentity ¶ added in v0.9.1
func ResolveModelIdentity(model string, cat *models.Catalog) ModelIdentity
ResolveModelIdentity resolves usage-facing metadata for a selected model/deployment. When a normalized catalog is available, it is used as the source of truth. Otherwise the function falls back to parsing deployment-qualified IDs like "resource/model".
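The catalog-less fallback — splitting a deployment-qualified ID like "resource/model" — can be sketched as below. Splitting on the first slash is an assumption; the package may handle multi-segment IDs differently:

```go
package main

import (
	"fmt"
	"strings"
)

// splitDeploymentID sketches the documented fallback parse: a
// deployment-qualified ID like "resource/model" is split into its
// resource and upstream-model parts; a bare model name has no
// resource. Splitting on the first "/" is an assumption.
func splitDeploymentID(model string) (resource, upstream string) {
	if i := strings.Index(model, "/"); i >= 0 {
		return model[:i], model[i+1:]
	}
	return "", model
}

func main() {
	r, m := splitDeploymentID("myserver/llama3")
	fmt.Println(r, m) // myserver llama3
	r, m = splitDeploymentID("claude-sonnet-4")
	fmt.Println(r, m) // empty resource for a bare model name
}
```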
type Record ¶
type Record struct {
ID string
Timestamp time.Time
RequestID string
SessionID string
ConversationID string
Model string // Selected deployment ID when known
UpstreamModel string
Resource string
Provider string // Provider family, e.g. "anthropic", "ollama", "lmstudio"
InputTokens int
OutputTokens int
CacheCreationInputTokens int
// CacheCreation5mInputTokens and CacheCreation1hInputTokens break
// down the cache-write bucket by TTL when the provider exposes it
// (Anthropic). Their sum is ≤ CacheCreationInputTokens; any
// shortfall reflects writes the provider didn't attribute. When
// the breakdown is absent (both zero), cost computation treats the
// full CacheCreationInputTokens as 5m for the conservative default.
CacheCreation5mInputTokens int
CacheCreation1hInputTokens int
CacheReadInputTokens int
CostUSD float64
Role string // "interactive", "delegate", "scheduled", "auxiliary"
TaskName string // "email_poll", "periodic_reflection", etc. (empty for interactive)
}
Record represents a single LLM interaction's token usage and cost.
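The invariant on the TTL breakdown fields (their sum is ≤ CacheCreationInputTokens, with any shortfall being unattributed writes) can be checked with a small helper; `cacheWrites` and `unattributed` here are hypothetical illustrations, not part of the package:

```go
package main

import "fmt"

// cacheWrites is a trimmed, hypothetical view of Record's
// cache-write fields, to illustrate the documented invariant.
type cacheWrites struct {
	Total, FiveMin, OneHour int
}

// unattributed returns the cache-write tokens the provider did not
// assign to a TTL bucket, and whether the invariant
// (FiveMin + OneHour <= Total) holds.
func unattributed(c cacheWrites) (int, bool) {
	rem := c.Total - c.FiveMin - c.OneHour
	return rem, rem >= 0
}

func main() {
	rem, ok := unattributed(cacheWrites{Total: 1000, FiveMin: 600, OneHour: 300})
	fmt.Println(rem, ok) // 100 unattributed tokens, invariant holds
}
```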
type Store ¶
type Store struct {
// contains filtered or unexported fields
}
Store is an append-only SQLite store for token usage records. All public methods are safe for concurrent use (SQLite serializes writes).
func NewStore ¶
NewStore creates a usage store using the given database connection. The caller owns the connection — Store does not close it. The schema is created automatically on first use.
func (*Store) Record ¶
func (s *Store) Record(ctx context.Context, rec Record) error
Record persists a usage record. If rec.ID is empty, a UUIDv7 is generated. The context is used for cancellation only.
func (*Store) SummaryByGroup ¶ added in v0.9.1
func (s *Store) SummaryByGroup(groupBy string, start, end time.Time) ([]GroupedSummary, error)
SummaryByGroup dispatches the grouped summary query based on the caller-provided grouping key.
func (*Store) SummaryByModel ¶
func (s *Store) SummaryByModel(start, end time.Time) ([]GroupedSummary, error)
SummaryByModel returns per-model aggregated totals for records within [start, end), ordered by cost descending.
func (*Store) SummaryByProvider ¶ added in v0.9.1
func (s *Store) SummaryByProvider(start, end time.Time) ([]GroupedSummary, error)
SummaryByProvider returns per-provider aggregated totals for records within [start, end), ordered by cost descending.
func (*Store) SummaryByResource ¶ added in v0.9.1
func (s *Store) SummaryByResource(start, end time.Time) ([]GroupedSummary, error)
SummaryByResource returns per-resource aggregated totals for records within [start, end), ordered by cost descending.
func (*Store) SummaryByRole ¶
func (s *Store) SummaryByRole(start, end time.Time) ([]GroupedSummary, error)
SummaryByRole returns per-role aggregated totals for records within [start, end), ordered by cost descending.
func (*Store) SummaryByTask ¶
func (s *Store) SummaryByTask(start, end time.Time) ([]GroupedSummary, error)
SummaryByTask returns per-task aggregated totals for records within [start, end), ordered by cost descending. Records with empty task_name are grouped under the key "".
func (*Store) SummaryByUpstreamModel ¶ added in v0.9.1
func (s *Store) SummaryByUpstreamModel(start, end time.Time) ([]GroupedSummary, error)
SummaryByUpstreamModel returns per-upstream-model aggregated totals for records within [start, end), ordered by cost descending.
type Summary ¶
type Summary struct {
TotalRecords int `json:"total_records"`
TotalInputTokens int64 `json:"total_input_tokens"`
TotalOutputTokens int64 `json:"total_output_tokens"`
TotalCacheCreationInputTokens int64 `json:"total_cache_creation_input_tokens"`
TotalCacheReadInputTokens int64 `json:"total_cache_read_input_tokens"`
TotalCostUSD float64 `json:"total_cost_usd"`
}
Summary holds aggregated token usage and cost totals.
func (Summary) CacheHitRate ¶ added in v0.9.1
CacheHitRate returns the fraction of cache-eligible input tokens that were served from cache in this summary, as a value in [0, 1]. Zero when there were no cache-eligible tokens at all (empty window, or caching disabled). Useful for spotting cold-session spikes and validating that prompt-caching policy is actually working.
Formula matches the Anthropic-recommended observability metric: cache_read / (cache_read + cache_creation).
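The formula can be sketched directly; `cacheHitRate` here is a free-standing illustration rather than the method itself:

```go
package main

import "fmt"

// cacheHitRate sketches the documented formula:
// cache_read / (cache_read + cache_creation), returning zero when
// there were no cache-eligible tokens at all.
func cacheHitRate(cacheRead, cacheCreation int64) float64 {
	total := cacheRead + cacheCreation
	if total == 0 {
		return 0 // empty window, or caching disabled
	}
	return float64(cacheRead) / float64(total)
}

func main() {
	fmt.Println(cacheHitRate(900, 100)) // 0.9
	fmt.Println(cacheHitRate(0, 0))     // 0
}
```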