package ai

Version: v1.0.1 (latest) | Published: Apr 22, 2026 | License: MIT | Imports: 18 | Imported by: 0

Repository

github.com/apohor/caffeine

Documentation ¶

Overview ¶

Package ai produces human-readable critiques of espresso shots using an LLM.

The package is split intentionally:

Analyzer: the public API the HTTP handler calls. It extracts the relevant signals from the raw shot blob, downsamples the time-series, builds a deterministic prompt, and asks the configured Provider.
Provider: a small interface implemented by the OpenAI, Anthropic, and Gemini clients in this package. Adding another provider later (Ollama, …) means adding another file in this package; no call sites change.

We never send the entire raw sample blob to the LLM (it can be >1000 points). Instead we downsample to ~60 evenly spaced points, round to a sensible precision, and include the profile JSON verbatim.
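The downsampling step can be sketched as follows. This is a minimal illustration, not the package's code: the real helper is unexported, and the name downsample, its parameters, and the index-mapping rule are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// downsample picks n evenly spaced values from xs (keeping the first and
// last) and rounds each to the given number of decimal places.
// Hypothetical helper; the package's own downsampler is not exported.
func downsample(xs []float64, n, decimals int) []float64 {
	if n <= 0 || len(xs) == 0 {
		return nil
	}
	scale := math.Pow(10, float64(decimals))
	round := func(v float64) float64 { return math.Round(v*scale) / scale }
	if len(xs) <= n {
		// Already short enough: only apply rounding.
		out := make([]float64, len(xs))
		for i, v := range xs {
			out[i] = round(v)
		}
		return out
	}
	if n == 1 {
		return []float64{round(xs[0])}
	}
	out := make([]float64, n)
	for i := 0; i < n; i++ {
		// Map output index i onto the input range [0, len(xs)-1].
		idx := i * (len(xs) - 1) / (n - 1)
		out[i] = round(xs[idx])
	}
	return out
}

func main() {
	samples := make([]float64, 1200)
	for i := range samples {
		samples[i] = float64(i) * 0.0123
	}
	pts := downsample(samples, 60, 2)
	fmt.Println(len(pts), pts[0], pts[len(pts)-1])
}
```

Picking by index rather than averaging keeps peaks that happen to land on a chosen sample but can miss short spikes between them; whether the package averages or picks is not stated here.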

Anthropic Claude Messages API provider.

Docs: https://docs.anthropic.com/en/api/messages

Vision-based bean bag extractor.

Given a photo of a coffee bag, ask a vision-capable model to read the label and extract the fields our Beans form wants (name, roaster, origin, process, roast level, roast date, notes). Returned as JSON the UI can splat straight into the form for the user to review and tweak before saving.

Two providers are supported: OpenAI chat (inline image_url data URI) and Google Gemini generateContent (inline_data). The API handler prefers OpenAI when a key is configured and falls back to Gemini otherwise. Anthropic vision would follow the same pattern but isn't wired here yet.

Google Gemini generateContent API provider.

Docs: https://ai.google.dev/api/generate-content

Model listing helpers — one per provider. Each returns the set of model IDs that can serve generateContent / chat completions, so the UI can offer a dropdown instead of asking the operator to type a model name.

These are plain functions (not methods on *Provider) because the settings UI needs to list models even when the selected model would otherwise be invalid — we don't want to construct a full provider just to list.

OpenAI Chat Completions provider. No SDK — one tiny HTTP call keeps the dependency surface small and makes auditing trivial.

Index ¶

func ComputeCost(provider, model string, inTokens, outTokens int64) float64
func IsTransient(err error) bool
func ListAnthropicModels(ctx context.Context, apiKey string) ([]string, error)
func ListGeminiModels(ctx context.Context, apiKey string) ([]string, error)
func ListOpenAIModels(ctx context.Context, apiKey string) ([]string, error)
func SplitModelName(name string) (provider, model string)
type Analysis
type Analyzer
- func NewAnalyzer(p Provider) *Analyzer
- func (a *Analyzer) Analyze(ctx context.Context, in ShotInput) (*Analysis, error)
- func (a *Analyzer) ModelName() string
type AnthropicConfig
type AnthropicProvider
- func NewAnthropic(cfg AnthropicConfig) (*AnthropicProvider, error)
- func (p *AnthropicProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
- func (p *AnthropicProvider) Name() string
type BeanInfo
type CallUsage
type Coach
- func NewCoach(p Provider) *Coach
- func (c *Coach) ModelName() string
- func (c *Coach) Suggest(ctx context.Context, in CoachInput) (*Suggestion, error)
type CoachInput
type Comparator
- func NewComparator(p Provider) *Comparator
- func (c *Comparator) Compare(ctx context.Context, in CompareInput) (*Comparison, error)
- func (c *Comparator) ModelName() string
type CompareInput
type Comparison
type CostBreak
type ExtractBeanRequest
type ExtractBeanRequestGemini
type ExtractBeanResponse
- func ExtractBeanFromImage(ctx context.Context, req ExtractBeanRequest) (*ExtractBeanResponse, error)
- func ExtractBeanFromImageGemini(ctx context.Context, req ExtractBeanRequestGemini) (*ExtractBeanResponse, error)
type ExtractedBean
type GeminiConfig
type GeminiProvider
- func NewGemini(cfg GeminiConfig) (*GeminiProvider, error)
- func (p *GeminiProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
- func (p *GeminiProvider) Name() string
type GenerateImageRequest
type GeneratedImage
- func GenerateImage(ctx context.Context, req GenerateImageRequest) (*GeneratedImage, error)
- func GenerateImageOpenAI(ctx context.Context, req OpenAIImageRequest) (*GeneratedImage, error)
type Metrics
type Namer
- func NewNamer(p Provider) *Namer
- func (n *Namer) ModelName() string
- func (n *Namer) Suggest(ctx context.Context, in ProfileNameInput) (*ProfileNameSuggestion, error)
type OpenAIConfig
type OpenAIImageRequest
type OpenAIProvider
- func NewOpenAI(cfg OpenAIConfig) (*OpenAIProvider, error)
- func (p *OpenAIProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
- func (p *OpenAIProvider) Name() string
type ProfileNameInput
type ProfileNameSuggestion
type Provider
type Rating
type Record
type Recorder
- func NewRecorder(db *sql.DB) (*Recorder, error)
- func (r *Recorder) Record(ctx context.Context, rec Record)
- func (r *Recorder) Summarize(ctx context.Context, days, recent int) (*UsageSummary, error)
type ShotInput
type ShotSummary
type Suggestion
type TokenUsage
type TranscribeGeminiRequest
type TranscribeOpenAIRequest
type Transcription
- func TranscribeGemini(ctx context.Context, req TranscribeGeminiRequest) (*Transcription, error)
- func TranscribeOpenAI(ctx context.Context, req TranscribeOpenAIRequest) (*Transcription, error)
type UsageSummary

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func ComputeCost ¶

func ComputeCost(provider, model string, inTokens, outTokens int64) float64

ComputeCost returns the USD cost of a call using real token counts and the provider's published per-1M-token price. Unknown models fall back to a conservative rate so the dashboard shows *something*.
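The arithmetic behind a per-1M-token price can be sketched like this. The price table, the fallback rate, and the "provider:model" lookup key are all placeholders invented for illustration; they are not real provider prices or the package's actual table.

```go
package main

import "fmt"

// price is USD per one million tokens. Every number below is a placeholder.
type price struct{ in, out float64 }

// prices is a hypothetical lookup table keyed by "provider:model".
var prices = map[string]price{
	"example:small": {in: 1.00, out: 2.00},
}

// fallback is the assumed rate applied to unknown models.
var fallback = price{in: 5.00, out: 15.00}

// computeCost converts token counts into USD using per-1M-token prices.
func computeCost(provider, model string, inTokens, outTokens int64) float64 {
	p, ok := prices[provider+":"+model]
	if !ok {
		p = fallback
	}
	return (float64(inTokens)*p.in + float64(outTokens)*p.out) / 1e6
}

func main() {
	fmt.Printf("%.6f\n", computeCost("example", "small", 1200, 300))
	fmt.Printf("%.6f\n", computeCost("unknown", "model", 1200, 300))
}
```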

func IsTransient ¶

func IsTransient(err error) bool

IsTransient exposes the retry predicate so HTTP handlers can distinguish provider-overload errors (which deserve a 503 + retry-later UX) from hard failures (auth, invalid model, bad input) that should surface as 502.
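A handler might map errors to status codes roughly as below. The sketch is self-contained, so isTransient is a local stub standing in for ai.IsTransient, and errOverloaded and errAuth are hypothetical sentinel errors.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel errors used only by this sketch.
var (
	errOverloaded = errors.New("provider overloaded")
	errAuth       = errors.New("invalid api key")
)

// isTransient is a local stub; real code would call ai.IsTransient(err).
func isTransient(err error) bool { return errors.Is(err, errOverloaded) }

// statusFor maps a provider error to the HTTP status described above.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case isTransient(err):
		return http.StatusServiceUnavailable // 503: ask the client to retry later
	default:
		return http.StatusBadGateway // 502: hard upstream failure
	}
}

func main() {
	fmt.Println(statusFor(fmt.Errorf("analyze: %w", errOverloaded)))
	fmt.Println(statusFor(errAuth))
}
```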

func ListAnthropicModels ¶

func ListAnthropicModels(ctx context.Context, apiKey string) ([]string, error)

ListAnthropicModels returns every model the Anthropic Messages API advertises. All of them support messages, so no client-side filtering.

func ListGeminiModels ¶

func ListGeminiModels(ctx context.Context, apiKey string) ([]string, error)

ListGeminiModels returns the Gemini models that support generateContent, with the "models/" prefix stripped. Gemma and preview-only variants are included too — the user can pick, we don't gatekeep.
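The filtering and prefix stripping could look like the sketch below. It assumes the models list response carries a supportedGenerationMethods array per model; generateContentModels and the sample model names are illustrative, not the package's internals.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// modelList mirrors the parts of the list response this sketch relies on.
type modelList struct {
	Models []struct {
		Name    string   `json:"name"`
		Methods []string `json:"supportedGenerationMethods"`
	} `json:"models"`
}

// generateContentModels keeps models that advertise generateContent and
// strips the "models/" prefix from their names.
func generateContentModels(body []byte) ([]string, error) {
	var ml modelList
	if err := json.Unmarshal(body, &ml); err != nil {
		return nil, err
	}
	var ids []string
	for _, m := range ml.Models {
		for _, method := range m.Methods {
			if method == "generateContent" {
				ids = append(ids, strings.TrimPrefix(m.Name, "models/"))
				break
			}
		}
	}
	return ids, nil
}

func main() {
	// Sample payload; "example-embedding" is a made-up model name.
	body := []byte(`{"models":[
	  {"name":"models/gemini-2.5-flash","supportedGenerationMethods":["generateContent","countTokens"]},
	  {"name":"models/example-embedding","supportedGenerationMethods":["embedContent"]}
	]}`)
	ids, err := generateContentModels(body)
	fmt.Println(ids, err)
}
```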

func ListOpenAIModels ¶

func ListOpenAIModels(ctx context.Context, apiKey string) ([]string, error)

ListOpenAIModels returns the subset of OpenAI models usable with chat completions. We filter client-side since /v1/models returns every model including embeddings, moderation, TTS, etc.

func SplitModelName ¶

func SplitModelName(name string) (provider, model string)

SplitModelName decomposes a Provider.Name() result (e.g. "anthropic:claude-haiku-4-5") into provider and model. Unknown shapes return (name, "").
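One way to get the documented behaviour is a single split on the first colon. This is a sketch under that assumption; how the real function treats edge cases such as an empty provider or model part is not specified here.

```go
package main

import (
	"fmt"
	"strings"
)

// splitModelName splits "provider:model" on the first colon. Names without
// a colon are returned unchanged with an empty model.
func splitModelName(name string) (provider, model string) {
	p, m, ok := strings.Cut(name, ":")
	if !ok {
		return name, ""
	}
	return p, m
}

func main() {
	fmt.Println(splitModelName("anthropic:claude-haiku-4-5"))
	fmt.Println(splitModelName("no-colon-here"))
}
```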

Types ¶

type Analysis ¶

type Analysis struct {
	Model     string    `json:"model"` // e.g. "openai:gpt-4o-mini"
	CreatedAt time.Time `json:"created_at"`
	// Rating is the model's high-level grade of the shot. Parsed out of
	// the first line of the LLM response and stripped from Summary before
	// markdown rendering. nil when the model didn't emit one in the
	// expected format — the UI just hides the badge.
	Rating *Rating `json:"rating,omitempty"`
	// Markdown summary (2-5 short paragraphs) suitable for rendering directly.
	Summary string `json:"summary"`
	// Extracted numeric metrics the UI can render as stat tiles.
	Metrics Metrics `json:"metrics"`
	// Usage is input/output token accounting for the usage ledger.
	// Not part of the cached analysis payload the UI cares about, so we
	// keep it out of the JSON.
	Usage CallUsage `json:"-"`
}

Analysis is the structured output returned by the analyzer.

type Analyzer ¶

type Analyzer struct {
	// contains filtered or unexported fields
}

Analyzer turns a ShotInput into an Analysis.

func NewAnalyzer ¶

func NewAnalyzer(p Provider) *Analyzer

NewAnalyzer wraps a Provider.

func (*Analyzer) Analyze ¶

func (a *Analyzer) Analyze(ctx context.Context, in ShotInput) (*Analysis, error)

Analyze computes metrics then asks the provider for a critique.

func (*Analyzer) ModelName ¶

func (a *Analyzer) ModelName() string

ModelName exposes the provider identifier for cache keying.

type AnthropicConfig ¶

type AnthropicConfig struct {
	APIKey   string // required
	Model    string // default "claude-3-5-haiku-latest"
	Endpoint string
	Version  string // anthropic-version header value, default "2023-06-01"
	Timeout  time.Duration
}

AnthropicConfig configures the provider. Zero values pick sensible defaults.
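For orientation, the Messages API expects the key in the x-api-key header and the API version in the anthropic-version header. The sketch below builds such a request without sending it; newMessagesRequest and the example.invalid endpoint are hypothetical, and real code would also attach a context.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// newMessagesRequest builds (but does not send) a POST carrying the headers
// the Anthropic Messages API expects. Hypothetical helper for illustration.
func newMessagesRequest(endpoint, apiKey, version string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("x-api-key", apiKey)
	req.Header.Set("anthropic-version", version)
	req.Header.Set("content-type", "application/json")
	return req, nil
}

func main() {
	req, err := newMessagesRequest("https://example.invalid/v1/messages", "sk-placeholder", "2023-06-01", []byte(`{}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.Header.Get("anthropic-version"))
}
```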

type AnthropicProvider ¶

type AnthropicProvider struct {
	// contains filtered or unexported fields
}

AnthropicProvider calls the Anthropic Messages API.

func NewAnthropic ¶

func NewAnthropic(cfg AnthropicConfig) (*AnthropicProvider, error)

NewAnthropic constructs a provider. Returns an error if no API key is set.

func (*AnthropicProvider) Complete ¶

func (p *AnthropicProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)

Complete sends a system+user prompt pair and returns the assistant text along with real token usage parsed from the Anthropic response.

func (*AnthropicProvider) Name ¶

func (p *AnthropicProvider) Name() string

Name reports a stable identifier for cache keying.

type BeanInfo ¶

type BeanInfo struct {
	Name       string
	Roaster    string
	Origin     string
	Process    string
	RoastLevel string
	RoastDate  string // ISO yyyy-mm-dd; empty if unknown
	Notes      string
}

BeanInfo is the subset of a Bean the analyzer actually uses. Kept here (rather than importing internal/beans) so internal/ai stays free of app-layer dependencies and is easy to unit-test.

type CallUsage ¶

type CallUsage struct {
	InputTokens  int64
	OutputTokens int64
	DurationMs   int64
}

CallUsage is the minimum-viable accounting struct: real token counts (parsed from the provider response) plus wall-clock duration. Feeds the usage ledger + cost computation.

type Coach ¶

type Coach struct {
	// contains filtered or unexported fields
}

Coach wraps a Provider to produce structured single-suggestion output.

func NewCoach ¶

func NewCoach(p Provider) *Coach

NewCoach wraps a Provider.

func (*Coach) ModelName ¶

func (c *Coach) ModelName() string

ModelName exposes the provider identifier.

func (*Coach) Suggest ¶

func (c *Coach) Suggest(ctx context.Context, in CoachInput) (*Suggestion, error)

Suggest runs the coach and returns the parsed suggestion.

type CoachInput ¶

type CoachInput struct {
	Shot       ShotInput
	ShotRating *int          // 1..5 or nil (user's own rating)
	ShotNote   string        // user's own note
	Siblings   []ShotSummary // recent shots with the same profile, newest first
}

CoachInput is the minimal bundle of context the coach prompt needs.

type Comparator ¶

type Comparator struct {
	// contains filtered or unexported fields
}

Comparator wraps a Provider for the compare task.

func NewComparator ¶

func NewComparator(p Provider) *Comparator

NewComparator wraps a provider.

func (*Comparator) Compare ¶

func (c *Comparator) Compare(ctx context.Context, in CompareInput) (*Comparison, error)

Compare runs the comparator and returns the markdown report.

func (*Comparator) ModelName ¶

func (c *Comparator) ModelName() string

ModelName returns the provider identifier.

type CompareInput ¶

type CompareInput struct {
	A       ShotInput
	B       ShotInput
	ARating *int
	BRating *int
	ANote   string
	BNote   string
}

CompareInput bundles two shots (A and B) plus their user feedback.

type Comparison ¶

type Comparison struct {
	Model     string    `json:"model"`
	CreatedAt time.Time `json:"created_at"`
	Markdown  string    `json:"markdown"`
	Usage     CallUsage `json:"-"`
}

Comparison is the LLM output plus bookkeeping.

type CostBreak ¶

type CostBreak struct {
	Calls        int     `json:"calls"`
	InputTokens  int64   `json:"input_tokens"`
	OutputTokens int64   `json:"output_tokens"`
	CostUSD      float64 `json:"cost_usd"`
	LastUsedUnix int64   `json:"last_used_unix,omitempty"`
}

CostBreak is the per-slice rollup shown in the dashboard.

type ExtractBeanRequest ¶

type ExtractBeanRequest struct {
	APIKey string
	Model  string // e.g. "gpt-4o-mini" — must be a vision-capable OpenAI model
	Image  []byte
	MIME   string // e.g. "image/jpeg", "image/png", "image/webp"
}

ExtractBeanRequest is the input bundle for a single extraction.

type ExtractBeanRequestGemini ¶

type ExtractBeanRequestGemini struct {
	APIKey   string
	Model    string // e.g. "gemini-2.5-flash" — any multimodal Gemini works
	Image    []byte
	MIME     string
	Endpoint string // optional override; defaults to the public v1beta endpoint
}

ExtractBeanRequestGemini is the Gemini-flavoured input bundle.

type ExtractBeanResponse ¶

type ExtractBeanResponse struct {
	Bean  ExtractedBean
	Usage TokenUsage
}

ExtractBeanResponse bundles the parsed bean plus usage so the caller can record the ledger entry.

func ExtractBeanFromImage ¶

func ExtractBeanFromImage(ctx context.Context, req ExtractBeanRequest) (*ExtractBeanResponse, error)

ExtractBeanFromImage sends the image to OpenAI's chat endpoint using a vision-capable model and returns a parsed ExtractedBean plus usage.
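The inline image travels as a data URI built from the MIME type and the base64-encoded bytes. A minimal sketch, with imageDataURI as a hypothetical helper name:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// imageDataURI encodes raw image bytes as a data URI suitable for an
// image_url field. Hypothetical helper for illustration.
func imageDataURI(mime string, image []byte) string {
	return "data:" + mime + ";base64," + base64.StdEncoding.EncodeToString(image)
}

func main() {
	// The first four bytes of a PNG signature stand in for a real photo.
	fmt.Println(imageDataURI("image/png", []byte{0x89, 'P', 'N', 'G'}))
}
```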

func ExtractBeanFromImageGemini ¶

func ExtractBeanFromImageGemini(ctx context.Context, req ExtractBeanRequestGemini) (*ExtractBeanResponse, error)

ExtractBeanFromImageGemini sends the image to Gemini's generateContent endpoint as inline_data and asks for a strict JSON response. All Gemini 1.5+/2.x/2.5 models are multimodal, so we don't need a vision-capability allow-list — we just fall back to gemini-2.5-flash if the caller's configured model string is empty.

type ExtractedBean ¶

type ExtractedBean struct {
	Name       string `json:"name"`
	Roaster    string `json:"roaster"`
	Origin     string `json:"origin"`
	Process    string `json:"process"`
	RoastLevel string `json:"roast_level"`
	RoastDate  string `json:"roast_date"` // ISO yyyy-mm-dd
	Notes      string `json:"notes"`
	// Confidence is the model's self-reported confidence on a 0..1
	// scale. Handy for the UI to flag low-quality reads ("we couldn't
	// read much — please double-check").
	Confidence float64 `json:"confidence"`
}

ExtractedBean mirrors the beans.Input struct, JSON-tagged to match. Every field is optional — the LLM might only be confident about the roaster's name and the roast date, and we want to surface whatever it found without forcing it to guess.
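Because every field is optional, a partial JSON reply decodes cleanly and missing fields stay at their zero values. The sketch below uses a local copy of the struct; parseBean, needsReview, the 0.5 threshold, and the sample reply are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExtractedBean is a local copy of the struct above so the sketch compiles
// on its own.
type ExtractedBean struct {
	Name       string  `json:"name"`
	Roaster    string  `json:"roaster"`
	Origin     string  `json:"origin"`
	Process    string  `json:"process"`
	RoastLevel string  `json:"roast_level"`
	RoastDate  string  `json:"roast_date"`
	Notes      string  `json:"notes"`
	Confidence float64 `json:"confidence"`
}

// parseBean decodes the model's JSON reply; absent fields stay empty.
func parseBean(raw string) (ExtractedBean, error) {
	var b ExtractedBean
	err := json.Unmarshal([]byte(raw), &b)
	return b, err
}

// needsReview flags low-confidence reads. The 0.5 cut-off is an assumed value.
func needsReview(b ExtractedBean) bool { return b.Confidence < 0.5 }

func main() {
	b, err := parseBean(`{"roaster":"Example Roasters","roast_date":"2026-03-14","confidence":0.4}`)
	fmt.Println(b.Roaster, b.RoastDate, b.Name == "", needsReview(b), err)
}
```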

type GeminiConfig ¶

type GeminiConfig struct {
	APIKey   string // required
	Model    string // default "gemini-1.5-flash"
	Endpoint string // v1beta base URL; default official endpoint
	Timeout  time.Duration
}

GeminiConfig configures the provider.

type GeminiProvider ¶

type GeminiProvider struct {
	// contains filtered or unexported fields
}

GeminiProvider calls the Google Generative Language API.

func NewGemini ¶

func NewGemini(cfg GeminiConfig) (*GeminiProvider, error)

NewGemini constructs a provider. Returns an error if no API key is set.

func (*GeminiProvider) Complete ¶

func (p *GeminiProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)

Complete sends the prompt pair and returns the assistant text along with real token usage from the response.

Gemini has no separate "system" role in its message list; we pass the system prompt in the request's system_instruction field, which is the documented equivalent.
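A request body with the system prompt carried in system_instruction could be assembled as in this sketch. The Go type names and buildRequest are hypothetical; only the JSON field names follow the generateContent REST shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type part struct {
	Text string `json:"text"`
}

type content struct {
	Role  string `json:"role,omitempty"`
	Parts []part `json:"parts"`
}

// generateContentRequest covers only the fields this sketch needs.
type generateContentRequest struct {
	SystemInstruction *content  `json:"system_instruction,omitempty"`
	Contents          []content `json:"contents"`
}

// buildRequest puts the system prompt in system_instruction and the user
// prompt in a single "user" content entry.
func buildRequest(system, user string) ([]byte, error) {
	req := generateContentRequest{
		Contents: []content{{Role: "user", Parts: []part{{Text: user}}}},
	}
	if system != "" {
		req.SystemInstruction = &content{Parts: []part{{Text: system}}}
	}
	return json.Marshal(req)
}

func main() {
	b, err := buildRequest("You are an espresso coach.", "Critique this shot.")
	fmt.Println(string(b), err)
}
```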

func (*GeminiProvider) Name ¶

func (p *GeminiProvider) Name() string

Name reports a stable identifier for cache keying.

type GenerateImageRequest ¶

type GenerateImageRequest struct {
	// APIKey is required.
	APIKey string
	// Model defaults to "gemini-2.5-flash-image-preview" (Nano Banana).
	// The preview family returns one or more inline_data parts.
	Model string
	// Prompt is the human-readable instruction.
	Prompt string
	// Endpoint lets tests inject a fake server. Defaults to the official
	// v1beta base URL.
	Endpoint string
	// Timeout bounds the entire HTTP round-trip (including image encoding
	// on the server). Image generation is slower than text; 90s is safe.
	Timeout time.Duration
}

GenerateImageRequest configures a single Gemini image-generation call.

type GeneratedImage ¶

type GeneratedImage struct {
	MimeType string
	Data     []byte
}

GeneratedImage is the decoded binary payload returned by the API.

func GenerateImage ¶

func GenerateImage(ctx context.Context, req GenerateImageRequest) (*GeneratedImage, error)

GenerateImage calls Gemini's image-capable model with a plain text prompt and returns the first inline_data part from the response. Gemini does not have a dedicated "/images" endpoint — image generation rides on the same generateContent surface and is enabled by the model choice plus the response_modalities hint.

func GenerateImageOpenAI ¶

func GenerateImageOpenAI(ctx context.Context, req OpenAIImageRequest) (*GeneratedImage, error)

GenerateImageOpenAI calls the OpenAI /v1/images/generations endpoint.

type Metrics ¶

type Metrics struct {
	Duration       float64 `json:"duration_s"`
	PreinfusionEnd float64 `json:"preinfusion_end_s,omitempty"`
	PeakPressure   float64 `json:"peak_pressure_bar"`
	AvgPressure    float64 `json:"avg_pressure_bar"`
	PeakFlow       float64 `json:"peak_flow_mls"`
	AvgFlow        float64 `json:"avg_flow_mls"`
	FinalWeight    float64 `json:"final_weight_g"`
	FirstDripAt    float64 `json:"first_drip_s,omitempty"`
}

Metrics are computed locally from the samples before sending to the LLM, and are returned as-is alongside the critique.

type Namer ¶

type Namer struct {
	// contains filtered or unexported fields
}

Namer wraps a Provider to produce profile names.

func NewNamer ¶

func NewNamer(p Provider) *Namer

NewNamer wraps a provider.

func (*Namer) ModelName ¶

func (n *Namer) ModelName() string

ModelName returns the provider identifier.

func (*Namer) Suggest ¶

func (n *Namer) Suggest(ctx context.Context, in ProfileNameInput) (*ProfileNameSuggestion, error)

Suggest asks the LLM for a name suggestion.

type OpenAIConfig ¶

type OpenAIConfig struct {
	APIKey   string // required
	Model    string // default "gpt-4o-mini"
	Endpoint string // default official OpenAI chat endpoint
	Timeout  time.Duration
}

OpenAIConfig configures the provider. Zero values pick sensible defaults.

type OpenAIImageRequest ¶

type OpenAIImageRequest struct {
	APIKey  string
	Model   string // default "gpt-image-1"
	Prompt  string
	Size    string // e.g. "1024x1024" (default), "1024x1536", "1536x1024"
	Base    string // default https://api.openai.com/v1
	Timeout time.Duration
}

OpenAIImageRequest configures a single OpenAI image-generation call. The provider uses the /v1/images/generations surface, which accepts a plain text prompt and returns base64-encoded PNG bytes.

Docs: https://platform.openai.com/docs/api-reference/images/create

type OpenAIProvider ¶

type OpenAIProvider struct {
	// contains filtered or unexported fields
}

OpenAIProvider calls the OpenAI Chat Completions API.

func NewOpenAI ¶

func NewOpenAI(cfg OpenAIConfig) (*OpenAIProvider, error)

NewOpenAI constructs a provider. Returns an error if no API key is set.

func (*OpenAIProvider) Complete ¶

func (p *OpenAIProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)

Complete sends a system+user prompt pair and returns the assistant text along with real token usage from the response.

func (*OpenAIProvider) Name ¶

func (p *OpenAIProvider) Name() string

Name reports a stable identifier for cache keying.

type ProfileNameInput ¶

type ProfileNameInput struct {
	Profile     json.RawMessage
	CurrentName string
}

ProfileNameInput is the bundle the namer sees.

type ProfileNameSuggestion ¶

type ProfileNameSuggestion struct {
	Model     string    `json:"model"`
	CreatedAt time.Time `json:"created_at"`
	Name      string    `json:"name"`
	Reason    string    `json:"reason"`
	Usage     CallUsage `json:"-"`
}

ProfileNameSuggestion is the result — a short name plus a one-line reason so the user can tell what the model picked up on.

type Provider ¶

type Provider interface {
	// Complete sends a system+user prompt pair and returns the assistant
	// text along with real token usage parsed from the provider response.
	// Implementations MUST return usage whenever the API gives it to them;
	// a zero-valued TokenUsage is only acceptable when the upstream
	// response omitted the counts (fall back to zeros, never estimate).
	Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
	// Name returns a short identifier (e.g. "openai:gpt-4o-mini") used for
	// cache keying so a change of model invalidates old cached analyses.
	Name() string
}

Provider is the minimal contract the Analyzer needs from an LLM backend.
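Because the interface is this small, a test double takes only a few lines. The sketch below redeclares Provider and TokenUsage locally so it compiles on its own; fakeProvider and ask are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// Local copies of the declarations above, so the sketch is standalone.
type TokenUsage struct {
	InputTokens  int64
	OutputTokens int64
}

type Provider interface {
	Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
	Name() string
}

// fakeProvider is a hypothetical test double that returns a canned reply.
type fakeProvider struct {
	reply string
	calls int
}

func (f *fakeProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error) {
	if err := ctx.Err(); err != nil {
		return "", TokenUsage{}, err
	}
	f.calls++
	// There is no upstream response, so usage stays zero instead of being estimated.
	return f.reply, TokenUsage{}, nil
}

func (f *fakeProvider) Name() string { return "fake:canned" }

// ask shows how a call site uses any Provider without knowing the backend.
func ask(p Provider, system, user string) (string, error) {
	text, _, err := p.Complete(context.Background(), system, user)
	return text, err
}

func main() {
	p := &fakeProvider{reply: "Balanced shot."}
	text, err := ask(p, "system prompt", "user prompt")
	fmt.Println(p.Name(), text, p.calls, err)
}
```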

type Rating ¶

type Rating struct {
	Score int    `json:"score"` // 0..10 inclusive
	Label string `json:"label,omitempty"`
}

Rating is a compact 0-10 grade of a shot with a one-word qualitative label. The label vocabulary is small on purpose so the UI can colour-code it: "excellent", "good", "fine", "off", "bad".

type Record ¶

type Record struct {
	Time         time.Time
	Provider     string // openai, anthropic, gemini
	Model        string // gpt-4o-mini, claude-haiku-4-5, ...
	Feature      string // analyze, coach, compare, digest, ask, name, transcribe, image
	InputTokens  int64
	OutputTokens int64
	DurationMs   int64
	ShotID       string
	OK           bool
	Err          string
}

Record is the event we log for each LLM call.

type Recorder ¶

type Recorder struct {
	// contains filtered or unexported fields
}

Recorder persists AI call metadata to SQLite.

func NewRecorder ¶

func NewRecorder(db *sql.DB) (*Recorder, error)

NewRecorder wires a Recorder to an already-open *sql.DB (we reuse the shots database to avoid a second file).

func (*Recorder) Record ¶

func (r *Recorder) Record(ctx context.Context, rec Record)

Record stores a single call. Failures in the recorder are logged but never bubble up — we don't want telemetry to break user-facing flows.

func (*Recorder) Summarize ¶

func (r *Recorder) Summarize(ctx context.Context, days, recent int) (*UsageSummary, error)

Summarize returns rollups for the last `days` days plus the `recent` most recent raw records. `days<=0` means "all time".

type ShotInput ¶

type ShotInput struct {
	Name        string
	ProfileName string
	Samples     json.RawMessage
	Profile     json.RawMessage
	// Bean describes the bag the shot was pulled with (optional). When
	// set, the analyzer surfaces it in the user prompt so the LLM can
	// factor origin / roast age / process into its critique instead of
	// guessing from numbers alone.
	Bean *BeanInfo
	// Grind is the user's grinder setting for this shot (free-form
	// label, e.g. "2.8" or "12 clicks"). Empty = not recorded.
	Grind string
	// GrindRPM is the variable-speed grinder RPM for this shot. Nil =
	// not recorded / not applicable to this grinder.
	GrindRPM *float64
}

ShotInput is the subset of a shot the analyzer needs. We accept raw JSON for samples + profile so the caller doesn't have to decode them.

type ShotSummary ¶

type ShotSummary struct {
	Name         string  `json:"name"`
	TimeISO      string  `json:"time_iso"`
	Duration     float64 `json:"duration_s"`
	PeakPressure float64 `json:"peak_pressure_bar"`
	AvgPressure  float64 `json:"avg_pressure_bar"`
	PeakFlow     float64 `json:"peak_flow_mls"`
	FinalWeight  float64 `json:"final_weight_g"`
	FirstDripAt  float64 `json:"first_drip_s,omitempty"`
	Rating       *int    `json:"user_rating,omitempty"`
	Note         string  `json:"user_note,omitempty"`
}

ShotSummary is the compact per-shot line item the coach sees for historical comparison. Keep it cheap; we never include raw samples.

type Suggestion ¶

type Suggestion struct {
	Model      string    `json:"model"`
	CreatedAt  time.Time `json:"created_at"`
	Change     string    `json:"change"`            // short imperative, e.g. "Grind 2 notches finer"
	Rationale  string    `json:"rationale"`         // 1-2 sentences citing the numbers
	VarKey     string    `json:"var_key,omitempty"` // profile variable key or ""
	Before     *float64  `json:"before,omitempty"`  // current profile value
	After      *float64  `json:"after,omitempty"`   // proposed new value
	Confidence string    `json:"confidence"`        // "low"|"medium"|"high"
	Usage      CallUsage `json:"-"`
}

Suggestion is the structured output of the coach. The LLM is asked to return JSON so the UI can render labels/values directly.

type TokenUsage ¶

type TokenUsage struct {
	InputTokens  int64 `json:"input_tokens"`
	OutputTokens int64 `json:"output_tokens"`
}

TokenUsage is the real input/output token count reported by a provider for a single call. Zero values mean the provider didn't report counts.

type TranscribeGeminiRequest ¶

type TranscribeGeminiRequest struct {
	APIKey   string
	Model    string        // default "gemini-2.5-flash"
	Audio    []byte        // raw audio bytes (20 MB inline cap)
	MIME     string        // e.g. "audio/webm", "audio/mp4", "audio/ogg"
	Endpoint string        // default https://generativelanguage.googleapis.com/v1beta
	Timeout  time.Duration // default 2m
	// Prompt overrides the default "transcribe this audio" instruction.
	Prompt string
}

TranscribeGeminiRequest configures a single speech-to-text call against Gemini's generateContent surface. Gemini accepts audio as an inlineData part alongside a text instruction; we ask the model to return nothing but the transcript so callers can store it verbatim.

type TranscribeOpenAIRequest ¶

type TranscribeOpenAIRequest struct {
	APIKey  string
	Model   string        // default "whisper-1"
	Audio   []byte        // raw audio bytes
	MIME    string        // e.g. "audio/webm", "audio/mp4"
	Base    string        // default https://api.openai.com/v1
	Timeout time.Duration // default 2m
	// Language is an optional ISO-639-1 hint (e.g. "en"). Leave empty to
	// let Whisper auto-detect.
	Language string
	// Prompt is an optional short hint that nudges the decoder (helpful
	// for domain words like "profile", "preinfusion"). Leave empty for
	// generic transcription.
	Prompt string
}

TranscribeOpenAIRequest configures a single Whisper-style speech-to-text call against OpenAI's /v1/audio/transcriptions endpoint.

type Transcription ¶

type Transcription struct {
	Text     string `json:"text"`
	Language string `json:"language,omitempty"`
}

Transcription is the decoded transcription result returned by both TranscribeOpenAI and TranscribeGemini.

func TranscribeGemini ¶

func TranscribeGemini(ctx context.Context, req TranscribeGeminiRequest) (*Transcription, error)

TranscribeGemini uploads the audio inline and returns the model's transcription. Gemini is more permissive than Whisper about content (multilingual, can handle overlapping speech, accepts long prompts) but is capped at ~20 MB of inline data per request.

func TranscribeOpenAI ¶

func TranscribeOpenAI(ctx context.Context, req TranscribeOpenAIRequest) (*Transcription, error)

TranscribeOpenAI sends audio bytes as multipart/form-data and returns the transcribed text. Audio can be in any format the Whisper endpoint accepts (webm/opus, mp4/aac, mp3, wav, flac, ogg — up to 25 MB).
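The multipart body for such a call can be assembled with the standard library, as sketched below. The form field names (file, model, language, prompt) follow the OpenAI transcription API; buildTranscriptionForm and the file name are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
)

// buildTranscriptionForm returns the encoded form body and the Content-Type
// header value (which carries the multipart boundary).
func buildTranscriptionForm(model, language, prompt, filename string, audio []byte) ([]byte, string, error) {
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)

	fw, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, "", err
	}
	if _, err := fw.Write(audio); err != nil {
		return nil, "", err
	}
	if err := w.WriteField("model", model); err != nil {
		return nil, "", err
	}
	// Optional hints are sent only when set.
	if language != "" {
		if err := w.WriteField("language", language); err != nil {
			return nil, "", err
		}
	}
	if prompt != "" {
		if err := w.WriteField("prompt", prompt); err != nil {
			return nil, "", err
		}
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return buf.Bytes(), w.FormDataContentType(), nil
}

func main() {
	body, ctype, err := buildTranscriptionForm("whisper-1", "en", "", "clip.webm", []byte("fake audio"))
	fmt.Println(len(body) > 0, ctype, err)
}
```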

type UsageSummary ¶

type UsageSummary struct {
	Since        time.Time            `json:"since"`
	TotalCalls   int                  `json:"total_calls"`
	TotalCostUSD float64              `json:"total_cost_usd"`
	ByProvider   map[string]CostBreak `json:"by_provider"`
	ByFeature    map[string]CostBreak `json:"by_feature"`
	ByModel      map[string]CostBreak `json:"by_model"`
	Recent       []Record             `json:"recent"` // newest first
}

UsageSummary aggregates recent activity for the dashboard.
