Documentation ¶
Overview ¶
Package ai produces human-readable critiques of espresso shots using an LLM.
The package is split intentionally:
- Analyzer: the public API the HTTP handler calls. It extracts the relevant signals from the raw shot blob, downsamples the time-series, builds a deterministic prompt, and asks the configured Provider.
- Provider: a small interface implemented by the OpenAI, Anthropic, and Gemini clients below. Adding another provider later (Ollama, …) means adding another file in this package; no call sites change.
We never send the entire raw sample blob to the LLM (it can be >1000 points). Instead we downsample to ~60 evenly spaced points, round to a sensible precision, and include the profile JSON verbatim.
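The downsampling step described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the `sample` type, its fields, the two-decimal rounding, and the `downsample` helper are all assumptions, since the real sample schema and precision are not shown here.

```go
package main

import (
	"fmt"
	"math"
)

// sample is a hypothetical decoded data point; the real sample schema is
// not shown in this documentation.
type sample struct {
	T        float64 // seconds since shot start
	Pressure float64 // bar
	Flow     float64 // ml/s
}

// round2 rounds v to two decimal places (an assumed "sensible precision").
func round2(v float64) float64 { return math.Round(v*100) / 100 }

// downsample returns at most n evenly spaced points from in, keeping the
// first and last sample when n > 1, with every field rounded.
func downsample(in []sample, n int) []sample {
	if n <= 0 || len(in) == 0 {
		return nil
	}
	if n > len(in) {
		n = len(in)
	}
	out := make([]sample, 0, n)
	for i := 0; i < n; i++ {
		idx := 0
		if n > 1 {
			// Spread indices evenly across [0, len(in)-1].
			idx = i * (len(in) - 1) / (n - 1)
		}
		s := in[idx]
		out = append(out, sample{T: round2(s.T), Pressure: round2(s.Pressure), Flow: round2(s.Flow)})
	}
	return out
}

func main() {
	raw := make([]sample, 1200)
	for i := range raw {
		raw[i] = sample{T: float64(i), Pressure: 9, Flow: 2}
	}
	pts := downsample(raw, 60)
	fmt.Println(len(pts), pts[0].T, pts[len(pts)-1].T)
}
```

Index-based picking keeps the prompt size bounded regardless of shot length; inputs shorter than the target are passed through (rounded) unchanged.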
Anthropic Claude Messages API provider.
Docs: https://docs.anthropic.com/en/api/messages
Vision-based bean bag extractor.
Given a photo of a coffee bag, ask a vision-capable model to read the label and extract the fields our Beans form wants (name, roaster, origin, process, roast level, roast date, notes). Returned as JSON the UI can splat straight into the form for the user to review and tweak before saving.
Two providers are supported: OpenAI chat (inline image_url data URI) and Google Gemini generateContent (inline_data). The API handler prefers OpenAI when a key is configured and falls back to Gemini otherwise. Anthropic vision would follow the same pattern but isn't wired here yet.
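A rough sketch of the two ideas in the paragraph above: preferring OpenAI when a key is configured, and packing the image into an inline data URI. The helper names `pickVisionProvider` and `imageDataURI` are hypothetical, and returning an empty string when neither key is set is an assumption about behaviour the text does not specify.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// pickVisionProvider prefers OpenAI when its key is set and falls back to
// Gemini otherwise. Returning "" with no keys at all is an assumption.
func pickVisionProvider(openAIKey, geminiKey string) string {
	switch {
	case openAIKey != "":
		return "openai"
	case geminiKey != "":
		return "gemini"
	default:
		return ""
	}
}

// imageDataURI encodes raw image bytes as a data URI suitable for an
// image_url field.
func imageDataURI(mime string, img []byte) string {
	return "data:" + mime + ";base64," + base64.StdEncoding.EncodeToString(img)
}

func main() {
	fmt.Println(pickVisionProvider("", "gemini-key"))
	fmt.Println(imageDataURI("image/png", []byte("hi")))
}
```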
Google Gemini generateContent API provider.
Docs: https://ai.google.dev/api/generate-content
Model listing helpers — one per provider. Each returns the set of model IDs that can serve generateContent / chat completions, so the UI can offer a dropdown instead of asking the operator to type a model name.
These are plain functions (not methods on *Provider) because the settings UI needs to list models even when the selected model would otherwise be invalid — we don't want to construct a full provider just to list.
OpenAI Chat Completions provider. No SDK — one tiny HTTP call keeps the dependency surface small and makes auditing trivial.
Index ¶
- func ComputeCost(provider, model string, inTokens, outTokens int64) float64
- func IsTransient(err error) bool
- func ListAnthropicModels(ctx context.Context, apiKey string) ([]string, error)
- func ListGeminiModels(ctx context.Context, apiKey string) ([]string, error)
- func ListOpenAIModels(ctx context.Context, apiKey string) ([]string, error)
- func SplitModelName(name string) (provider, model string)
- type Analysis
- type Analyzer
- type AnthropicConfig
- type AnthropicProvider
- type BeanInfo
- type CallUsage
- type Coach
- type CoachInput
- type Comparator
- type CompareInput
- type Comparison
- type CostBreak
- type ExtractBeanRequest
- type ExtractBeanRequestGemini
- type ExtractBeanResponse
- type ExtractedBean
- type GeminiConfig
- type GeminiProvider
- type GenerateImageRequest
- type GeneratedImage
- type Metrics
- type Namer
- type OpenAIConfig
- type OpenAIImageRequest
- type OpenAIProvider
- type ProfileNameInput
- type ProfileNameSuggestion
- type Provider
- type Rating
- type Record
- type Recorder
- type ShotInput
- type ShotSummary
- type Suggestion
- type TokenUsage
- type TranscribeGeminiRequest
- type TranscribeOpenAIRequest
- type Transcription
- type UsageSummary
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func ComputeCost ¶
ComputeCost returns the USD cost of a call using real token counts and the provider's published per-1M-token price. Unknown models fall back to a conservative rate so the dashboard shows *something*.
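The arithmetic behind a per-1M-token price lookup might look like the sketch below. The price table, the `exampleprov:small-model` key, and the fallback rate are placeholders invented for illustration, not real provider pricing, and "conservative" is assumed here to mean a rate on the high side.

```go
package main

import "fmt"

// price holds USD per one million tokens.
type price struct{ In, Out float64 }

// prices uses placeholder numbers for illustration only; they are NOT
// real provider prices.
var prices = map[string]price{
	"exampleprov:small-model": {In: 1.00, Out: 4.00},
}

// fallback is an assumed pessimistic rate applied to unknown models.
var fallback = price{In: 10.00, Out: 30.00}

// computeCost mirrors the documented signature of ComputeCost.
func computeCost(provider, model string, inTokens, outTokens int64) float64 {
	p, ok := prices[provider+":"+model]
	if !ok {
		p = fallback
	}
	return (float64(inTokens)*p.In + float64(outTokens)*p.Out) / 1e6
}

func main() {
	fmt.Println(computeCost("exampleprov", "small-model", 1_000_000, 500_000))
	fmt.Println(computeCost("unknown", "mystery", 1_000_000, 0))
}
```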
func IsTransient ¶
IsTransient exposes the retry predicate so HTTP handlers can distinguish provider-overload errors (which deserve a 503 + retry-later UX) from hard failures (auth, invalid model, bad input) that should surface as 502.
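A handler could map the predicate to status codes along these lines. This is a sketch under assumptions: `isTransient` stands in for ai.IsTransient, and the sentinel errors and `statusFor` helper are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel errors used only for this illustration.
var (
	errOverloaded = errors.New("provider overloaded")
	errBadKey     = errors.New("invalid API key")
)

// isTransient is a stand-in for ai.IsTransient; the real predicate's
// rules are not shown in this documentation.
func isTransient(err error) bool { return errors.Is(err, errOverloaded) }

// statusFor picks the HTTP status for a provider call result.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case isTransient(err):
		return http.StatusServiceUnavailable // 503: retry later
	default:
		return http.StatusBadGateway // 502: hard upstream failure
	}
}

func main() {
	wrapped := fmt.Errorf("analyze: %w", errOverloaded)
	fmt.Println(statusFor(wrapped), statusFor(errBadKey), statusFor(nil))
}
```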
func ListAnthropicModels ¶
ListAnthropicModels returns every model the Anthropic Messages API advertises. All of them support messages, so no client-side filtering.
func ListGeminiModels ¶
ListGeminiModels returns the Gemini models that support generateContent, with the "models/" prefix stripped. Gemma and preview-only variants are included too — the user can pick, we don't gatekeep.
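The filtering and prefix stripping can be sketched as below, assuming the list response carries `name` and `supportedGenerationMethods` per model. The `filterGenerateContent` helper and the sample payload (including the `embedding-example` entry) are illustrative, not the package's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sampleList is a hand-written payload in the assumed response shape.
const sampleList = `{"models":[
 {"name":"models/gemini-2.5-flash","supportedGenerationMethods":["generateContent","countTokens"]},
 {"name":"models/embedding-example","supportedGenerationMethods":["embedContent"]}
]}`

type listResponse struct {
	Models []struct {
		Name                       string   `json:"name"`
		SupportedGenerationMethods []string `json:"supportedGenerationMethods"`
	} `json:"models"`
}

// filterGenerateContent keeps models that advertise generateContent and
// strips the "models/" prefix from their names.
func filterGenerateContent(raw []byte) ([]string, error) {
	var resp listResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	var ids []string
	for _, m := range resp.Models {
		for _, method := range m.SupportedGenerationMethods {
			if method == "generateContent" {
				ids = append(ids, strings.TrimPrefix(m.Name, "models/"))
				break
			}
		}
	}
	return ids, nil
}

func main() {
	ids, err := filterGenerateContent([]byte(sampleList))
	fmt.Println(ids, err)
}
```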
func ListOpenAIModels ¶
ListOpenAIModels returns the subset of OpenAI models usable with chat completions. We filter client-side since /v1/models returns every model including embeddings, moderation, TTS, etc.
func SplitModelName ¶
SplitModelName decomposes a Provider.Name() result (e.g. "anthropic:claude-haiku-4-5") into provider and model. Unknown shapes return (name, "").
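One way to get the behaviour described above is a single cut on the first colon. The sketch below is an assumption about the implementation; in particular, treating an empty provider or empty model as an "unknown shape" is a guess.

```go
package main

import (
	"fmt"
	"strings"
)

// splitModelName splits "provider:model" on the first colon. Anything
// that does not fit that shape is returned unchanged with an empty model.
func splitModelName(name string) (provider, model string) {
	p, m, ok := strings.Cut(name, ":")
	if !ok || p == "" || m == "" {
		return name, ""
	}
	return p, m
}

func main() {
	fmt.Println(splitModelName("anthropic:claude-haiku-4-5"))
	fmt.Println(splitModelName("gpt-4o-mini"))
}
```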
Types ¶
type Analysis ¶
type Analysis struct {
Model string `json:"model"` // e.g. "openai:gpt-4o-mini"
CreatedAt time.Time `json:"created_at"`
// Rating is the model's high-level grade of the shot. Parsed out of
// the first line of the LLM response and stripped from Summary before
// markdown rendering. nil when the model didn't emit one in the
// expected format — the UI just hides the badge.
Rating *Rating `json:"rating,omitempty"`
// Markdown summary (2-5 short paragraphs) suitable for rendering directly.
Summary string `json:"summary"`
// Extracted numeric metrics the UI can render as stat tiles.
Metrics Metrics `json:"metrics"`
// Usage is input/output token accounting for the usage ledger.
// Not part of the cached analysis payload the UI cares about, so we
// keep it out of the JSON.
Usage CallUsage `json:"-"`
}
Analysis is the structured output returned by the analyzer.
type Analyzer ¶
type Analyzer struct {
// contains filtered or unexported fields
}
Analyzer turns a Shot into an Analysis.
type AnthropicConfig ¶
type AnthropicConfig struct {
APIKey string // required
Model string // default "claude-3-5-haiku-latest"
Endpoint string
Version string // anthropic-version header value, default "2023-06-01"
Timeout time.Duration
}
AnthropicConfig configures the provider. Zero values pick sensible defaults.
type AnthropicProvider ¶
type AnthropicProvider struct {
// contains filtered or unexported fields
}
AnthropicProvider calls the Anthropic Messages API.
func NewAnthropic ¶
func NewAnthropic(cfg AnthropicConfig) (*AnthropicProvider, error)
NewAnthropic constructs a provider. Returns an error if no API key is set.
func (*AnthropicProvider) Complete ¶
func (p *AnthropicProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
Complete sends a system+user prompt pair and returns the assistant text along with real token usage parsed from the Anthropic response.
func (*AnthropicProvider) Name ¶
func (p *AnthropicProvider) Name() string
Name reports a stable identifier for cache keying.
type BeanInfo ¶
type BeanInfo struct {
Name string
Roaster string
Origin string
Process string
RoastLevel string
RoastDate string // ISO yyyy-mm-dd; empty if unknown
Notes string
}
BeanInfo is the subset of a Bean the analyzer actually uses. Kept here (rather than importing internal/beans) so internal/ai stays free of app-layer dependencies and is easy to unit-test.
type CallUsage ¶
CallUsage is the minimum-viable accounting struct: real token counts (parsed from the provider response) plus wall-clock duration. Feeds the usage ledger + cost computation.
type Coach ¶
type Coach struct {
// contains filtered or unexported fields
}
Coach wraps a Provider to produce structured single-suggestion output.
func (*Coach) Suggest ¶
func (c *Coach) Suggest(ctx context.Context, in CoachInput) (*Suggestion, error)
Suggest runs the coach and returns the parsed suggestion.
type CoachInput ¶
type CoachInput struct {
Shot ShotInput
ShotRating *int // 1..5 or nil (user's own rating)
ShotNote string // user's own note
Siblings []ShotSummary // recent shots with the same profile, newest first
}
CoachInput is the minimal bundle of context the coach prompt needs.
type Comparator ¶
type Comparator struct {
// contains filtered or unexported fields
}
Comparator wraps a Provider for the compare task.
func (*Comparator) Compare ¶
func (c *Comparator) Compare(ctx context.Context, in CompareInput) (*Comparison, error)
Compare runs the comparator and returns the markdown report.
func (*Comparator) ModelName ¶
func (c *Comparator) ModelName() string
ModelName returns the provider identifier.
type CompareInput ¶
type CompareInput struct {
A ShotInput
B ShotInput
ARating *int
BRating *int
ANote string
BNote string
}
CompareInput bundles two shots (A and B) plus their user feedback.
type Comparison ¶
type Comparison struct {
Model string `json:"model"`
CreatedAt time.Time `json:"created_at"`
Markdown string `json:"markdown"`
Usage CallUsage `json:"-"`
}
Comparison is the LLM output plus bookkeeping.
type CostBreak ¶
type CostBreak struct {
Calls int `json:"calls"`
InputTokens int64 `json:"input_tokens"`
OutputTokens int64 `json:"output_tokens"`
CostUSD float64 `json:"cost_usd"`
LastUsedUnix int64 `json:"last_used_unix,omitempty"`
}
CostBreak is the per-slice rollup shown in the dashboard.
type ExtractBeanRequest ¶
type ExtractBeanRequest struct {
APIKey string
Model string // e.g. "gpt-4o-mini" — must be a vision-capable OpenAI model
Image []byte
MIME string // e.g. "image/jpeg", "image/png", "image/webp"
}
ExtractBeanRequest is the input bundle for a single extraction.
type ExtractBeanRequestGemini ¶
type ExtractBeanRequestGemini struct {
APIKey string
Model string // e.g. "gemini-2.5-flash" — any multimodal Gemini works
Image []byte
MIME string
Endpoint string // optional override; defaults to the public v1beta endpoint
}
ExtractBeanRequestGemini is the Gemini-flavoured input bundle.
type ExtractBeanResponse ¶
type ExtractBeanResponse struct {
Bean ExtractedBean
Usage TokenUsage
}
ExtractBeanResponse bundles the parsed bean plus usage so the caller can record the ledger entry.
func ExtractBeanFromImage ¶
func ExtractBeanFromImage(ctx context.Context, req ExtractBeanRequest) (*ExtractBeanResponse, error)
ExtractBeanFromImage sends the image to OpenAI's chat endpoint using a vision-capable model and returns a parsed ExtractedBean plus usage.
func ExtractBeanFromImageGemini ¶
func ExtractBeanFromImageGemini(ctx context.Context, req ExtractBeanRequestGemini) (*ExtractBeanResponse, error)
ExtractBeanFromImageGemini sends the image to Gemini's generateContent endpoint as inline_data and asks for a strict JSON response. All Gemini 1.5+/2.x/2.5 models are multimodal, so we don't need a vision-capability allow-list — we just fall back to gemini-2.5-flash if the caller's configured model string is empty.
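The request body for such a call might be assembled as in this sketch. The struct names, the `buildExtractBody` and `modelOrDefault` helpers, and the use of `responseMimeType` to request JSON are assumptions about how the package does it; only the inline_data shape and the gemini-2.5-flash fallback come from the text above.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

type inlineData struct {
	MimeType string `json:"mime_type"`
	Data     string `json:"data"` // base64-encoded image bytes
}

type part struct {
	Text       string      `json:"text,omitempty"`
	InlineData *inlineData `json:"inline_data,omitempty"`
}

type content struct {
	Parts []part `json:"parts"`
}

type generateRequest struct {
	Contents         []content         `json:"contents"`
	GenerationConfig map[string]string `json:"generationConfig,omitempty"`
}

// modelOrDefault applies the documented fallback for an empty model.
func modelOrDefault(model string) string {
	if model == "" {
		return "gemini-2.5-flash"
	}
	return model
}

// buildExtractBody packs a text instruction and the image into a single
// generateContent request body that asks for a JSON response.
func buildExtractBody(prompt, mime string, img []byte) ([]byte, error) {
	req := generateRequest{
		Contents: []content{{Parts: []part{
			{Text: prompt},
			{InlineData: &inlineData{MimeType: mime, Data: base64.StdEncoding.EncodeToString(img)}},
		}}},
		GenerationConfig: map[string]string{"responseMimeType": "application/json"},
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildExtractBody("read the label", "image/png", []byte("hi"))
	fmt.Println(modelOrDefault(""), string(body), err)
}
```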
type ExtractedBean ¶
type ExtractedBean struct {
Name string `json:"name"`
Roaster string `json:"roaster"`
Origin string `json:"origin"`
Process string `json:"process"`
RoastLevel string `json:"roast_level"`
RoastDate string `json:"roast_date"` // ISO yyyy-mm-dd
Notes string `json:"notes"`
// Confidence is the model's self-reported confidence on a 0..1
// scale. Handy for the UI to flag low-quality reads ("we couldn't
// read much — please double-check").
Confidence float64 `json:"confidence"`
}
ExtractedBean mirrors the beans.Input struct, JSON-tagged to match. Every field is optional — the LLM might only be confident about the roaster's name and the roast date, and we want to surface whatever it found without forcing it to guess.
type GeminiConfig ¶
type GeminiConfig struct {
APIKey string // required
Model string // default "gemini-1.5-flash"
Endpoint string // v1beta base URL; default official endpoint
Timeout time.Duration
}
GeminiConfig configures the provider.
type GeminiProvider ¶
type GeminiProvider struct {
// contains filtered or unexported fields
}
GeminiProvider calls the Google Generative Language API.
func NewGemini ¶
func NewGemini(cfg GeminiConfig) (*GeminiProvider, error)
NewGemini constructs a provider. Returns an error if no API key is set.
func (*GeminiProvider) Complete ¶
func (p *GeminiProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
Complete sends the prompt pair and returns the assistant text along with real token usage from the response.
Gemini has no separate "system" role; we pass the system prompt in the request's system_instruction field, which is the documented equivalent.
func (*GeminiProvider) Name ¶
func (p *GeminiProvider) Name() string
Name reports a stable identifier for cache keying.
type GenerateImageRequest ¶
type GenerateImageRequest struct {
// APIKey is required.
APIKey string
// Model defaults to "gemini-2.5-flash-image-preview" (Nano Banana).
// The preview family returns one or more inline_data parts.
Model string
// Prompt is the human-readable instruction.
Prompt string
// Endpoint lets tests inject a fake server. Defaults to the official
// v1beta base URL.
Endpoint string
// Timeout bounds the entire HTTP round-trip (including image encoding
// on the server). Image generation is slower than text; 90s is safe.
Timeout time.Duration
}
GenerateImageRequest configures a single Gemini image-generation call.
type GeneratedImage ¶
GeneratedImage is the decoded binary payload returned by the API.
func GenerateImage ¶
func GenerateImage(ctx context.Context, req GenerateImageRequest) (*GeneratedImage, error)
GenerateImage calls Gemini's image-capable model with a plain text prompt and returns the first inline_data part from the response. Gemini does not have a dedicated "/images" endpoint — image generation rides on the same generateContent surface and is enabled by the model choice plus the response_modalities hint.
func GenerateImageOpenAI ¶
func GenerateImageOpenAI(ctx context.Context, req OpenAIImageRequest) (*GeneratedImage, error)
GenerateImageOpenAI calls the OpenAI images.generations endpoint.
type Metrics ¶
type Metrics struct {
Duration float64 `json:"duration_s"`
PreinfusionEnd float64 `json:"preinfusion_end_s,omitempty"`
PeakPressure float64 `json:"peak_pressure_bar"`
AvgPressure float64 `json:"avg_pressure_bar"`
PeakFlow float64 `json:"peak_flow_mls"`
AvgFlow float64 `json:"avg_flow_mls"`
FinalWeight float64 `json:"final_weight_g"`
FirstDripAt float64 `json:"first_drip_s,omitempty"`
}
Metrics are computed locally from the samples before sending to the LLM, and are returned as-is alongside the critique.
type Namer ¶
type Namer struct {
// contains filtered or unexported fields
}
Namer wraps a Provider to produce profile names.
func (*Namer) Suggest ¶
func (n *Namer) Suggest(ctx context.Context, in ProfileNameInput) (*ProfileNameSuggestion, error)
Suggest asks the LLM for a name suggestion.
type OpenAIConfig ¶
type OpenAIConfig struct {
APIKey string // required
Model string // default "gpt-4o-mini"
Endpoint string // default official OpenAI chat endpoint
Timeout time.Duration
}
OpenAIConfig configures the provider. Zero values pick sensible defaults.
type OpenAIImageRequest ¶
type OpenAIImageRequest struct {
APIKey string
Model string // default "gpt-image-1"
Prompt string
Size string // e.g. "1024x1024" (default), "1024x1536", "1536x1024"
Base string // default https://api.openai.com/v1
Timeout time.Duration
}
OpenAIImageRequest configures a single OpenAI image-generation call. The provider uses the /v1/images/generations surface, which accepts a plain text prompt and returns base64-encoded PNG bytes.
Docs: https://platform.openai.com/docs/api-reference/images/create
type OpenAIProvider ¶
type OpenAIProvider struct {
// contains filtered or unexported fields
}
OpenAIProvider calls the OpenAI Chat Completions API.
func NewOpenAI ¶
func NewOpenAI(cfg OpenAIConfig) (*OpenAIProvider, error)
NewOpenAI constructs a provider. Returns an error if no API key is set.
func (*OpenAIProvider) Complete ¶
func (p *OpenAIProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
Complete sends a system+user prompt pair and returns the assistant text along with real token usage from the response.
func (*OpenAIProvider) Name ¶
func (p *OpenAIProvider) Name() string
Name reports a stable identifier for cache keying.
type ProfileNameInput ¶
type ProfileNameInput struct {
Profile json.RawMessage
CurrentName string
}
ProfileNameInput is the bundle the namer sees.
type ProfileNameSuggestion ¶
type ProfileNameSuggestion struct {
Model string `json:"model"`
CreatedAt time.Time `json:"created_at"`
Name string `json:"name"`
Reason string `json:"reason"`
Usage CallUsage `json:"-"`
}
ProfileNameSuggestion is the result — a short name plus a one-line reason so the user can tell what the model picked up on.
type Provider ¶
type Provider interface {
// Complete sends a system+user prompt pair and returns the assistant
// text along with real token usage parsed from the provider response.
// Implementations MUST return usage whenever the API gives it to them;
// a zero-valued TokenUsage is only acceptable when the upstream
// response omitted the counts (fall back to zeros, never estimate).
Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
// Name returns a short identifier (e.g. "openai:gpt-4o-mini") used for
// cache keying so a change of model invalidates old cached analyses.
Name() string
}
Provider is the minimal contract the Analyzer needs from an LLM backend.
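Because the contract is this small, a canned implementation for unit tests is easy to write. The sketch below redeclares `Provider` and `TokenUsage` locally so it is self-contained; `fakeProvider` and `callOnce` are hypothetical names, not part of the package.

```go
package main

import (
	"context"
	"fmt"
)

// TokenUsage and Provider are redeclared here to mirror the documented
// types so the sketch compiles on its own.
type TokenUsage struct {
	InputTokens  int64
	OutputTokens int64
}

type Provider interface {
	Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
	Name() string
}

// fakeProvider returns a canned reply and records the prompts it saw.
type fakeProvider struct {
	reply      string
	usage      TokenUsage
	err        error
	lastSystem string
	lastUser   string
}

func (f *fakeProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error) {
	if err := ctx.Err(); err != nil {
		return "", TokenUsage{}, err
	}
	f.lastSystem, f.lastUser = system, user
	if f.err != nil {
		return "", TokenUsage{}, f.err
	}
	return f.reply, f.usage, nil
}

func (f *fakeProvider) Name() string { return "fake:canned" }

// Compile-time check that fakeProvider satisfies Provider.
var _ Provider = (*fakeProvider)(nil)

// callOnce runs a single completion with a background context.
func callOnce(p Provider, system, user string) (string, TokenUsage, error) {
	return p.Complete(context.Background(), system, user)
}

func main() {
	p := &fakeProvider{reply: "Good shot.", usage: TokenUsage{InputTokens: 120, OutputTokens: 8}}
	text, usage, err := callOnce(p, "You are a barista.", "Critique this shot.")
	fmt.Println(p.Name(), text, usage, err)
}
```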
type Rating ¶
type Rating struct {
Score int `json:"score"` // 0..10 inclusive
Label string `json:"label,omitempty"`
}
Rating is a compact 0-10 grade of a shot with a one-word qualitative label. The label vocabulary is small on purpose so the UI can colour-code it: "excellent", "good", "fine", "off", "bad".
type Record ¶
type Record struct {
Time time.Time
Provider string // openai, anthropic, gemini
Model string // gpt-4o-mini, claude-haiku-4-5, ...
Feature string // analyze, coach, compare, digest, ask, name, transcribe, image
InputTokens int64
OutputTokens int64
DurationMs int64
ShotID string
OK bool
Err string
}
Record is the event we log for each LLM call.
type Recorder ¶
type Recorder struct {
// contains filtered or unexported fields
}
Recorder persists AI call metadata to SQLite.
func NewRecorder ¶
NewRecorder wires a Recorder to an already-open *sql.DB (we reuse the shots database to avoid a second file).
type ShotInput ¶
type ShotInput struct {
Name string
ProfileName string
Samples json.RawMessage
Profile json.RawMessage
// Bean describes the bag the shot was pulled with (optional). When
// set, the analyzer surfaces it in the user prompt so the LLM can
// factor origin / roast age / process into its critique instead of
// guessing from numbers alone.
Bean *BeanInfo
// Grind is the user's grinder setting for this shot (free-form
// label, e.g. "2.8" or "12 clicks"). Empty = not recorded.
Grind string
// GrindRPM is the variable-speed grinder RPM for this shot. Nil =
// not recorded / not applicable to this grinder.
GrindRPM *float64
}
ShotInput is the subset of a shot the analyzer needs. We accept raw JSON for samples + profile so the caller doesn't have to decode them.
type ShotSummary ¶
type ShotSummary struct {
Name string `json:"name"`
TimeISO string `json:"time_iso"`
Duration float64 `json:"duration_s"`
PeakPressure float64 `json:"peak_pressure_bar"`
AvgPressure float64 `json:"avg_pressure_bar"`
PeakFlow float64 `json:"peak_flow_mls"`
FinalWeight float64 `json:"final_weight_g"`
FirstDripAt float64 `json:"first_drip_s,omitempty"`
Rating *int `json:"user_rating,omitempty"`
Note string `json:"user_note,omitempty"`
}
ShotSummary is the compact per-shot line item the coach sees for historical comparison. Keep it cheap; we never include raw samples.
type Suggestion ¶
type Suggestion struct {
Model string `json:"model"`
CreatedAt time.Time `json:"created_at"`
Change string `json:"change"` // short imperative, e.g. "Grind 2 notches finer"
Rationale string `json:"rationale"` // 1-2 sentences citing the numbers
VarKey string `json:"var_key,omitempty"` // profile variable key or ""
Before *float64 `json:"before,omitempty"` // current profile value
After *float64 `json:"after,omitempty"` // proposed new value
Confidence string `json:"confidence"` // "low"|"medium"|"high"
Usage CallUsage `json:"-"`
}
Suggestion is the structured output of the coach. The LLM is asked to return JSON so the UI can render labels/values directly.
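Decoding the model's JSON reply into such a struct could look like the following sketch. The local `suggestion` type copies only the LLM-supplied fields, and the validation rules in `parseSuggestion` (non-empty change, known confidence value) are assumptions rather than the package's documented behaviour.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// suggestion mirrors the LLM-supplied fields of the documented struct.
type suggestion struct {
	Change     string   `json:"change"`
	Rationale  string   `json:"rationale"`
	VarKey     string   `json:"var_key,omitempty"`
	Before     *float64 `json:"before,omitempty"`
	After      *float64 `json:"after,omitempty"`
	Confidence string   `json:"confidence"`
}

// parseSuggestion decodes the reply and applies some assumed sanity checks.
func parseSuggestion(raw string) (*suggestion, error) {
	var s suggestion
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return nil, fmt.Errorf("decode suggestion: %w", err)
	}
	if s.Change == "" {
		return nil, errors.New("suggestion has no change")
	}
	switch s.Confidence {
	case "low", "medium", "high":
	default:
		return nil, fmt.Errorf("unknown confidence %q", s.Confidence)
	}
	return &s, nil
}

func main() {
	s, err := parseSuggestion(`{"change":"Grind 2 notches finer","rationale":"Flow peaked early.","confidence":"medium"}`)
	fmt.Println(s.Change, s.Before == nil, err)
}
```

Pointer fields let the UI distinguish "no numeric change proposed" from a proposed value of zero.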
type TokenUsage ¶
type TokenUsage struct {
InputTokens int64 `json:"input_tokens"`
OutputTokens int64 `json:"output_tokens"`
}
TokenUsage is the real input/output token count reported by a provider for a single call. Zero values mean the provider didn't report counts.
type TranscribeGeminiRequest ¶
type TranscribeGeminiRequest struct {
APIKey string
Model string // default "gemini-2.5-flash"
Audio []byte // raw audio bytes (20 MB inline cap)
MIME string // e.g. "audio/webm", "audio/mp4", "audio/ogg"
Endpoint string // default https://generativelanguage.googleapis.com/v1beta
Timeout time.Duration // default 2m
// Prompt overrides the default "transcribe this audio" instruction.
Prompt string
}
TranscribeGeminiRequest configures a single speech-to-text call against Gemini's generateContent surface. Gemini accepts audio as an inlineData part alongside a text instruction; we ask the model to return nothing but the transcript so callers can store it verbatim.
type TranscribeOpenAIRequest ¶
type TranscribeOpenAIRequest struct {
APIKey string
Model string // default "whisper-1"
Audio []byte // raw audio bytes
MIME string // e.g. "audio/webm", "audio/mp4"
Base string // default https://api.openai.com/v1
Timeout time.Duration // default 2m
// Language is an optional ISO-639-1 hint (e.g. "en"). Leave empty to
// let Whisper auto-detect.
Language string
// Prompt is an optional short hint that nudges the decoder (helpful
// for domain words like "profile", "preinfusion"). Leave empty for
// generic transcription.
Prompt string
}
TranscribeOpenAIRequest configures a single Whisper-style speech-to-text call against OpenAI's /v1/audio/transcriptions endpoint.
type Transcription ¶
Transcription is the decoded result of a speech-to-text call; both TranscribeOpenAI and TranscribeGemini return it.
func TranscribeGemini ¶
func TranscribeGemini(ctx context.Context, req TranscribeGeminiRequest) (*Transcription, error)
TranscribeGemini uploads the audio inline and returns the model's transcription. Gemini is more permissive than Whisper about content (multilingual, can handle overlapping speech, accepts long prompts) but is capped at ~20 MB of inline data per request.
func TranscribeOpenAI ¶
func TranscribeOpenAI(ctx context.Context, req TranscribeOpenAIRequest) (*Transcription, error)
TranscribeOpenAI sends audio bytes as multipart/form-data and returns the transcribed text. Audio can be in any format the Whisper endpoint accepts (webm/opus, mp4/aac, mp3, wav, flac, ogg — up to 25 MB).
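Building the multipart body with the standard library might look like this sketch. The `buildTranscriptionForm` helper and its parameters are hypothetical; the form field names (file, model, language, prompt) follow the OpenAI transcription endpoint, and the file part is sent with the default application/octet-stream type, which is a simplification.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
)

// buildTranscriptionForm assembles the multipart body and returns it
// together with the Content-Type header value (including the boundary).
func buildTranscriptionForm(audio []byte, filename, model, language, prompt string) (*bytes.Buffer, string, error) {
	body := &bytes.Buffer{}
	w := multipart.NewWriter(body)

	fw, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, "", err
	}
	if _, err := fw.Write(audio); err != nil {
		return nil, "", err
	}
	if err := w.WriteField("model", model); err != nil {
		return nil, "", err
	}
	// Optional hints are only sent when set.
	if language != "" {
		if err := w.WriteField("language", language); err != nil {
			return nil, "", err
		}
	}
	if prompt != "" {
		if err := w.WriteField("prompt", prompt); err != nil {
			return nil, "", err
		}
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := w.Close(); err != nil {
		return nil, "", err
	}
	return body, w.FormDataContentType(), nil
}

func main() {
	body, contentType, err := buildTranscriptionForm([]byte("fake-audio"), "note.webm", "whisper-1", "en", "")
	fmt.Println(body.Len() > 0, contentType, err)
}
```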
type UsageSummary ¶
type UsageSummary struct {
Since time.Time `json:"since"`
TotalCalls int `json:"total_calls"`
TotalCostUSD float64 `json:"total_cost_usd"`
ByProvider map[string]CostBreak `json:"by_provider"`
ByFeature map[string]CostBreak `json:"by_feature"`
ByModel map[string]CostBreak `json:"by_model"`
Recent []Record `json:"recent"` // newest first
}
UsageSummary aggregates recent activity for the dashboard.