ai

package
v1.0.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 22, 2026 License: MIT Imports: 18 Imported by: 0

Documentation

Overview

Package ai produces human-readable critiques of espresso shots using an LLM.

The package is split intentionally:

  • Analyzer: the public API the HTTP handler calls. It extracts the relevant signals from the raw shot blob, downsamples the time-series, builds a deterministic prompt, and asks the configured Provider.
  • Provider: a small interface implemented by the OpenAI client below. Swapping providers later (Anthropic, Ollama, …) means adding another file in this package; no call sites change.

We never send the entire raw sample blob to the LLM (it can be >1000 points). Instead we downsample to ~60 evenly spaced points, round to a sensible precision, and include the profile JSON verbatim.

Anthropic Claude Messages API provider.

Docs: https://docs.anthropic.com/en/api/messages

Vision-based bean bag extractor.

Given a photo of a coffee bag, ask an OpenAI vision model to read the label and extract the fields our Beans form wants (name, roaster, origin, process, roast level, roast date, notes). Returned as JSON the UI can splat straight into the form for the user to review and tweak before saving.

Two providers are supported: OpenAI chat (inline image_url data URI) and Google Gemini generateContent (inline_data). The API handler prefers OpenAI when a key is configured and falls back to Gemini otherwise. Anthropic vision would follow the same pattern but isn't wired here yet.

Google Gemini generateContent API provider.

Docs: https://ai.google.dev/api/generate-content

Model listing helpers — one per provider. Each returns the set of model IDs that can serve generateContent / chat completions, so the UI can offer a dropdown instead of asking the operator to type a model name.

These are plain functions (not methods on *Provider) because the settings UI needs to list models even when the selected model would otherwise be invalid — we don't want to construct a full provider just to list.

OpenAI Chat Completions provider. No SDK — one tiny HTTP call keeps the dependency surface small and makes auditing trivial.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ComputeCost

func ComputeCost(provider, model string, inTokens, outTokens int64) float64

ComputeCost returns the USD cost of a call using real token counts and the provider's published per-1M-token price. Unknown models fall back to a conservative rate so the dashboard shows *something*.

func IsTransient

func IsTransient(err error) bool

IsTransient exposes the retry predicate so HTTP handlers can distinguish provider-overload errors (which deserve a 503 + retry-later UX) from hard failures (auth, invalid model, bad input) that should surface as 502.

func ListAnthropicModels

func ListAnthropicModels(ctx context.Context, apiKey string) ([]string, error)

ListAnthropicModels returns every model the Anthropic Messages API advertises. All of them support messages, so no client-side filtering.

func ListGeminiModels

func ListGeminiModels(ctx context.Context, apiKey string) ([]string, error)

ListGeminiModels returns the Gemini models that support generateContent, with the "models/" prefix stripped. Gemma and preview-only variants are included too — the user can pick, we don't gatekeep.

func ListOpenAIModels

func ListOpenAIModels(ctx context.Context, apiKey string) ([]string, error)

ListOpenAIModels returns the subset of OpenAI models usable with chat completions. We filter client-side since /v1/models returns every model including embeddings, moderation, TTS, etc.

func SplitModelName

func SplitModelName(name string) (provider, model string)

SplitModelName decomposes a Provider.Name() result (e.g. "anthropic:claude-haiku-4-5") into provider and model. Unknown shapes return (name, "").

Types

type Analysis

type Analysis struct {
	Model     string    `json:"model"` // e.g. "openai:gpt-4o-mini"
	CreatedAt time.Time `json:"created_at"`
	// Rating is the model's high-level grade of the shot. Parsed out of
	// the first line of the LLM response and stripped from Summary before
	// markdown rendering. nil when the model didn't emit one in the
	// expected format — the UI just hides the badge.
	Rating *Rating `json:"rating,omitempty"`
	// Markdown summary (2-5 short paragraphs) suitable for rendering directly.
	Summary string `json:"summary"`
	// Extracted numeric metrics the UI can render as stat tiles.
	Metrics Metrics `json:"metrics"`
	// Usage is rough input/output byte accounting for the usage ledger.
	// Not part of the cached analysis payload the UI cares about, so we
	// keep it out of the JSON.
	Usage CallUsage `json:"-"`
}

Analysis is the structured output returned by the analyzer.

type Analyzer

type Analyzer struct {
	// contains filtered or unexported fields
}

Analyzer turns a Shot into an Analysis.

func NewAnalyzer

func NewAnalyzer(p Provider) *Analyzer

NewAnalyzer wraps a Provider.

func (*Analyzer) Analyze

func (a *Analyzer) Analyze(ctx context.Context, in ShotInput) (*Analysis, error)

Analyze computes metrics then asks the provider for a critique.

func (*Analyzer) ModelName

func (a *Analyzer) ModelName() string

ModelName exposes the provider identifier for cache keying.

type AnthropicConfig

type AnthropicConfig struct {
	APIKey   string // required
	Model    string // default "claude-3-5-haiku-latest"
	Endpoint string
	Version  string // x-api-key header version, default "2023-06-01"
	Timeout  time.Duration
}

AnthropicConfig configures the provider. Zero values pick sensible defaults.

type AnthropicProvider

type AnthropicProvider struct {
	// contains filtered or unexported fields
}

AnthropicProvider calls the Anthropic Messages API.

func NewAnthropic

func NewAnthropic(cfg AnthropicConfig) (*AnthropicProvider, error)

NewAnthropic constructs a provider. Returns an error if no API key is set.

func (*AnthropicProvider) Complete

func (p *AnthropicProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)

Complete sends a system+user prompt pair and returns the assistant text along with real token usage parsed from the Anthropic response.

func (*AnthropicProvider) Name

func (p *AnthropicProvider) Name() string

Name reports a stable identifier for cache keying.

type BeanInfo

type BeanInfo struct {
	Name       string
	Roaster    string
	Origin     string
	Process    string
	RoastLevel string
	RoastDate  string // ISO yyyy-mm-dd; empty if unknown
	Notes      string
}

BeanInfo is the subset of a Bean the analyzer actually uses. Kept here (rather than importing internal/beans) so internal/ai stays free of app-layer dependencies and is easy to unit-test.

type CallUsage

type CallUsage struct {
	InputTokens  int64
	OutputTokens int64
	DurationMs   int64
}

CallUsage is the minimum-viable accounting struct: real token counts (parsed from the provider response) plus wall-clock duration. Feeds the usage ledger + cost computation.

type Coach

type Coach struct {
	// contains filtered or unexported fields
}

Coach wraps a Provider to produce structured single-suggestion output.

func NewCoach

func NewCoach(p Provider) *Coach

NewCoach wraps a Provider.

func (*Coach) ModelName

func (c *Coach) ModelName() string

ModelName exposes the provider identifier.

func (*Coach) Suggest

func (c *Coach) Suggest(ctx context.Context, in CoachInput) (*Suggestion, error)

Suggest runs the coach and returns the parsed suggestion.

type CoachInput

type CoachInput struct {
	Shot       ShotInput
	ShotRating *int          // 1..5 or nil (user's own rating)
	ShotNote   string        // user's own note
	Siblings   []ShotSummary // recent shots with the same profile, newest first
}

CoachInput is the minimal bundle of context the coach prompt needs.

type Comparator

type Comparator struct {
	// contains filtered or unexported fields
}

Comparator wraps a Provider for the compare task.

func NewComparator

func NewComparator(p Provider) *Comparator

NewComparator wraps a provider.

func (*Comparator) Compare

func (c *Comparator) Compare(ctx context.Context, in CompareInput) (*Comparison, error)

Compare runs the comparator and returns the markdown report.

func (*Comparator) ModelName

func (c *Comparator) ModelName() string

ModelName returns the provider identifier.

type CompareInput

type CompareInput struct {
	A       ShotInput
	B       ShotInput
	ARating *int
	BRating *int
	ANote   string
	BNote   string
}

CompareInput bundles two shots (A and B) plus their user feedback.

type Comparison

type Comparison struct {
	Model     string    `json:"model"`
	CreatedAt time.Time `json:"created_at"`
	Markdown  string    `json:"markdown"`
	Usage     CallUsage `json:"-"`
}

Comparison is the LLM output plus bookkeeping.

type CostBreak

type CostBreak struct {
	Calls        int     `json:"calls"`
	InputTokens  int64   `json:"input_tokens"`
	OutputTokens int64   `json:"output_tokens"`
	CostUSD      float64 `json:"cost_usd"`
	LastUsedUnix int64   `json:"last_used_unix,omitempty"`
}

CostBreak is the per-slice rollup shown in the dashboard.

type ExtractBeanRequest

type ExtractBeanRequest struct {
	APIKey string
	Model  string // e.g. "gpt-4o-mini" — must be a vision-capable OpenAI model
	Image  []byte
	MIME   string // e.g. "image/jpeg", "image/png", "image/webp"
}

ExtractBeanRequest is the input bundle for a single extraction.

type ExtractBeanRequestGemini

type ExtractBeanRequestGemini struct {
	APIKey   string
	Model    string // e.g. "gemini-2.5-flash" — any multimodal Gemini works
	Image    []byte
	MIME     string
	Endpoint string // optional override; defaults to the public v1beta endpoint
}

ExtractBeanRequestGemini is the Gemini-flavoured input bundle.

type ExtractBeanResponse

type ExtractBeanResponse struct {
	Bean  ExtractedBean
	Usage TokenUsage
}

ExtractBeanResponse bundles the parsed bean plus usage so the caller can record the ledger entry.

func ExtractBeanFromImage

func ExtractBeanFromImage(ctx context.Context, req ExtractBeanRequest) (*ExtractBeanResponse, error)

ExtractBeanFromImage sends the image to OpenAI's chat endpoint using a vision-capable model and returns a parsed ExtractedBean plus usage.

func ExtractBeanFromImageGemini

func ExtractBeanFromImageGemini(ctx context.Context, req ExtractBeanRequestGemini) (*ExtractBeanResponse, error)

ExtractBeanFromImageGemini sends the image to Gemini's generateContent endpoint as inline_data and asks for a strict JSON response. All Gemini 1.5+/2.x/2.5 models are multimodal, so we don't need a vision-capability allow-list — we just fall back to gemini-2.5-flash if the caller's configured model string is empty.

type ExtractedBean

type ExtractedBean struct {
	Name       string `json:"name"`
	Roaster    string `json:"roaster"`
	Origin     string `json:"origin"`
	Process    string `json:"process"`
	RoastLevel string `json:"roast_level"`
	RoastDate  string `json:"roast_date"` // ISO yyyy-mm-dd
	Notes      string `json:"notes"`
	// Confidence is the model's self-reported confidence on a 0..1
	// scale. Handy for the UI to flag low-quality reads ("we couldn't
	// read much — please double-check").
	Confidence float64 `json:"confidence"`
}

ExtractedBean mirrors the beans.Input struct, JSON-tagged to match. Every field is optional — the LLM might only be confident about the roaster's name and the roast date, and we want to surface whatever it found without forcing it to guess.

type GeminiConfig

type GeminiConfig struct {
	APIKey   string // required
	Model    string // default "gemini-1.5-flash"
	Endpoint string // v1beta base URL; default official endpoint
	Timeout  time.Duration
}

GeminiConfig configures the provider.

type GeminiProvider

type GeminiProvider struct {
	// contains filtered or unexported fields
}

GeminiProvider calls the Google Generative Language API.

func NewGemini

func NewGemini(cfg GeminiConfig) (*GeminiProvider, error)

NewGemini constructs a provider. Returns an error if no API key is set.

func (*GeminiProvider) Complete

func (p *GeminiProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)

Complete sends the prompt pair and returns the assistant text along with real token usage from the response.

Gemini has no separate "system" role; we fold it into the request as system_instruction which is the documented equivalent.

func (*GeminiProvider) Name

func (p *GeminiProvider) Name() string

Name reports a stable identifier for cache keying.

type GenerateImageRequest

type GenerateImageRequest struct {
	// APIKey is required.
	APIKey string
	// Model defaults to "gemini-2.5-flash-image-preview" (Nano Banana).
	// The preview family returns one or more inline_data parts.
	Model string
	// Prompt is the human-readable instruction.
	Prompt string
	// Endpoint lets tests inject a fake server. Defaults to the official
	// v1beta base URL.
	Endpoint string
	// Timeout bounds the entire HTTP round-trip (including image encoding
	// on the server). Image generation is slower than text; 90s is safe.
	Timeout time.Duration
}

GenerateImageRequest configures a single Gemini image-generation call.

type GeneratedImage

type GeneratedImage struct {
	MimeType string
	Data     []byte
}

GeneratedImage is the decoded binary payload returned by the API.

func GenerateImage

func GenerateImage(ctx context.Context, req GenerateImageRequest) (*GeneratedImage, error)

GenerateImage calls Gemini's image-capable model with a plain text prompt and returns the first inline_data part from the response. Gemini does not have a dedicated "/images" endpoint — image generation rides on the same generateContent surface and is enabled by the model choice plus the response_modalities hint.

func GenerateImageOpenAI

func GenerateImageOpenAI(ctx context.Context, req OpenAIImageRequest) (*GeneratedImage, error)

GenerateImageOpenAI calls the OpenAI images.generations endpoint.

type Metrics

type Metrics struct {
	Duration       float64 `json:"duration_s"`
	PreinfusionEnd float64 `json:"preinfusion_end_s,omitempty"`
	PeakPressure   float64 `json:"peak_pressure_bar"`
	AvgPressure    float64 `json:"avg_pressure_bar"`
	PeakFlow       float64 `json:"peak_flow_mls"`
	AvgFlow        float64 `json:"avg_flow_mls"`
	FinalWeight    float64 `json:"final_weight_g"`
	FirstDripAt    float64 `json:"first_drip_s,omitempty"`
}

Metrics are computed locally from the samples before sending to the LLM, and are returned as-is alongside the critique.

type Namer

type Namer struct {
	// contains filtered or unexported fields
}

Namer wraps a Provider to produce profile names.

func NewNamer

func NewNamer(p Provider) *Namer

NewNamer wraps a provider.

func (*Namer) ModelName

func (n *Namer) ModelName() string

ModelName returns the provider identifier.

func (*Namer) Suggest

Suggest asks the LLM for a name suggestion.

type OpenAIConfig

type OpenAIConfig struct {
	APIKey   string // required
	Model    string // default "gpt-4o-mini"
	Endpoint string // default official OpenAI chat endpoint
	Timeout  time.Duration
}

OpenAIConfig configures the provider. Zero values pick sensible defaults.

type OpenAIImageRequest

type OpenAIImageRequest struct {
	APIKey  string
	Model   string // default "gpt-image-1"
	Prompt  string
	Size    string // e.g. "1024x1024" (default), "1024x1536", "1536x1024"
	Base    string // default https://api.openai.com/v1
	Timeout time.Duration
}

OpenAIImageRequest configures a single OpenAI image-generation call. The provider uses the /v1/images/generations surface, which accepts a plain text prompt and returns base64-encoded PNG bytes.

Docs: https://platform.openai.com/docs/api-reference/images/create

type OpenAIProvider

type OpenAIProvider struct {
	// contains filtered or unexported fields
}

OpenAIProvider calls the OpenAI Chat Completions API.

func NewOpenAI

func NewOpenAI(cfg OpenAIConfig) (*OpenAIProvider, error)

NewOpenAI constructs a provider. Returns an error if no API key is set.

func (*OpenAIProvider) Complete

func (p *OpenAIProvider) Complete(ctx context.Context, system, user string) (string, TokenUsage, error)

Complete sends a system+user prompt pair and returns the assistant text along with real token usage from the response.

func (*OpenAIProvider) Name

func (p *OpenAIProvider) Name() string

Name reports a stable identifier for cache keying.

type ProfileNameInput

type ProfileNameInput struct {
	Profile     json.RawMessage
	CurrentName string
}

ProfileNameInput is the bundle the namer sees.

type ProfileNameSuggestion

type ProfileNameSuggestion struct {
	Model     string    `json:"model"`
	CreatedAt time.Time `json:"created_at"`
	Name      string    `json:"name"`
	Reason    string    `json:"reason"`
	Usage     CallUsage `json:"-"`
}

ProfileNameSuggestion is the result — a short name plus a one-line reason so the user can tell what the model picked up on.

type Provider

type Provider interface {
	// Complete sends a system+user prompt pair and returns the assistant
	// text along with real token usage parsed from the provider response.
	// Implementations MUST return usage whenever the API gives it to them;
	// a zero-valued TokenUsage is only acceptable when the upstream
	// response omitted the counts (fall back to zeros, never estimate).
	Complete(ctx context.Context, system, user string) (string, TokenUsage, error)
	// Name returns a short identifier (e.g. "openai:gpt-4o-mini") used for
	// cache keying so a change of model invalidates old cached analyses.
	Name() string
}

Provider is the minimal contract the Analyzer needs from an LLM backend.

type Rating

type Rating struct {
	Score int    `json:"score"` // 0..10 inclusive
	Label string `json:"label,omitempty"`
}

Rating is a compact 0-10 grade of a shot with a one-word qualitative label. The label vocabulary is small on purpose so the UI can colour- code it: "excellent", "good", "fine", "off", "bad".

type Record

type Record struct {
	Time         time.Time
	Provider     string // openai, anthropic, gemini
	Model        string // gpt-4o-mini, claude-haiku-4-5, ...
	Feature      string // analyze, coach, compare, digest, ask, name, transcribe, image
	InputTokens  int64
	OutputTokens int64
	DurationMs   int64
	ShotID       string
	OK           bool
	Err          string
}

Record is the event we log for each LLM call.

type Recorder

type Recorder struct {
	// contains filtered or unexported fields
}

Recorder persists AI call metadata to SQLite.

func NewRecorder

func NewRecorder(db *sql.DB) (*Recorder, error)

NewRecorder wires a Recorder to an already-open *sql.DB (we reuse the shots database to avoid a second file).

func (*Recorder) Record

func (r *Recorder) Record(ctx context.Context, rec Record)

Record stores a single call. Failures in the recorder are logged but never bubble up — we don't want telemetry to break user-facing flows.

func (*Recorder) Summarize

func (r *Recorder) Summarize(ctx context.Context, days, recent int) (*UsageSummary, error)

Summarize returns rollups for the last `days` days plus the N most recent raw records. `days<=0` means "all time".

type ShotInput

type ShotInput struct {
	Name        string
	ProfileName string
	Samples     json.RawMessage
	Profile     json.RawMessage
	// Bean describes the bag the shot was pulled with (optional). When
	// set, the analyzer surfaces it in the user prompt so the LLM can
	// factor origin / roast age / process into its critique instead of
	// guessing from numbers alone.
	Bean *BeanInfo
	// Grind is the user's grinder setting for this shot (free-form
	// label, e.g. "2.8" or "12 clicks"). Empty = not recorded.
	Grind string
	// GrindRPM is the variable-speed grinder RPM for this shot. Nil =
	// not recorded / not applicable to this grinder.
	GrindRPM *float64
}

ShotInput is the subset of a shot the analyzer needs. We accept raw JSON for samples + profile so the caller doesn't have to decode them.

type ShotSummary

type ShotSummary struct {
	Name         string  `json:"name"`
	TimeISO      string  `json:"time_iso"`
	Duration     float64 `json:"duration_s"`
	PeakPressure float64 `json:"peak_pressure_bar"`
	AvgPressure  float64 `json:"avg_pressure_bar"`
	PeakFlow     float64 `json:"peak_flow_mls"`
	FinalWeight  float64 `json:"final_weight_g"`
	FirstDripAt  float64 `json:"first_drip_s,omitempty"`
	Rating       *int    `json:"user_rating,omitempty"`
	Note         string  `json:"user_note,omitempty"`
}

ShotSummary is the compact per-shot line item the coach sees for historical comparison. Keep it cheap; we never include raw samples.

type Suggestion

type Suggestion struct {
	Model      string    `json:"model"`
	CreatedAt  time.Time `json:"created_at"`
	Change     string    `json:"change"`            // short imperative, e.g. "Grind 2 notches finer"
	Rationale  string    `json:"rationale"`         // 1-2 sentences citing the numbers
	VarKey     string    `json:"var_key,omitempty"` // profile variable key or ""
	Before     *float64  `json:"before,omitempty"`  // current profile value
	After      *float64  `json:"after,omitempty"`   // proposed new value
	Confidence string    `json:"confidence"`        // "low"|"medium"|"high"
	Usage      CallUsage `json:"-"`
}

Suggestion is the structured output of the coach. The LLM is asked to return JSON so the UI can render labels/values directly.

type TokenUsage

type TokenUsage struct {
	InputTokens  int64 `json:"input_tokens"`
	OutputTokens int64 `json:"output_tokens"`
}

TokenUsage is the real input/output token count reported by a provider for a single call. Zero values mean the provider didn't report counts.

type TranscribeGeminiRequest

type TranscribeGeminiRequest struct {
	APIKey   string
	Model    string        // default "gemini-2.5-flash"
	Audio    []byte        // raw audio bytes (20 MB inline cap)
	MIME     string        // e.g. "audio/webm", "audio/mp4", "audio/ogg"
	Endpoint string        // default https://generativelanguage.googleapis.com/v1beta
	Timeout  time.Duration // default 2m
	// Prompt overrides the default "transcribe this audio" instruction.
	Prompt string
}

TranscribeGeminiRequest configures a single speech-to-text call against Gemini's generateContent surface. Gemini accepts audio as an inlineData part alongside a text instruction; we ask the model to return nothing but the transcript so callers can store it verbatim.

type TranscribeOpenAIRequest

type TranscribeOpenAIRequest struct {
	APIKey  string
	Model   string        // default "whisper-1"
	Audio   []byte        // raw audio bytes
	MIME    string        // e.g. "audio/webm", "audio/mp4"
	Base    string        // default https://api.openai.com/v1
	Timeout time.Duration // default 2m
	// Language is an optional ISO-639-1 hint (e.g. "en"). Leave empty to
	// let Whisper auto-detect.
	Language string
	// Prompt is an optional short hint that nudges the decoder (helpful
	// for domain words like "profile", "preinfusion"). Leave empty for
	// generic transcription.
	Prompt string
}

TranscribeOpenAIRequest configures a single Whisper-style speech-to-text call against OpenAI's /v1/audio/transcriptions endpoint.

type Transcription

type Transcription struct {
	Text     string `json:"text"`
	Language string `json:"language,omitempty"`
}

Transcription is the decoded response from OpenAI's transcription API.

func TranscribeGemini

func TranscribeGemini(ctx context.Context, req TranscribeGeminiRequest) (*Transcription, error)

TranscribeGemini uploads the audio inline and returns the model's transcription. Gemini is more permissive than Whisper about content (multilingual, can handle overlapping speech, accepts long prompts) but is capped at ~20 MB of inline data per request.

func TranscribeOpenAI

func TranscribeOpenAI(ctx context.Context, req TranscribeOpenAIRequest) (*Transcription, error)

TranscribeOpenAI sends audio bytes as multipart/form-data and returns the transcribed text. Audio can be in any format the Whisper endpoint accepts (webm/opus, mp4/aac, mp3, wav, flac, ogg — up to 25 MB).

type UsageSummary

type UsageSummary struct {
	Since        time.Time            `json:"since"`
	TotalCalls   int                  `json:"total_calls"`
	TotalCostUSD float64              `json:"total_cost_usd"`
	ByProvider   map[string]CostBreak `json:"by_provider"`
	ByFeature    map[string]CostBreak `json:"by_feature"`
	ByModel      map[string]CostBreak `json:"by_model"`
	Recent       []Record             `json:"recent"` // newest first
}

UsageSummary aggregates recent activity for the dashboard.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL