imagegen

package
v0.9.1 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 18, 2026 License: MIT Imports: 13 Imported by: 0

Documentation

Overview

Package imagegen implements an image-generation agent that drives OpenRouter image-output models (e.g. google/gemini-3.1-flash-image-preview).

Unlike text agents, imagegen does not implement the generic agent.Agent interface: its I/O shape (base64-decoded PNGs) is too specific and wrapping it in a generic Response would lose type safety.

Index

Constants

View Source
const (
	OutcomeSuccess          = "success"
	OutcomeProviderError    = "provider_error"    // KindUpstreamError
	OutcomeTimeout          = "timeout"           // KindTimeout (local deadline)
	OutcomeTextRefusal      = "text_refusal"      // KindTextRefusal (Google-style)
	OutcomeSilentBlockOAI   = "silent_block_oai"  // KindSilentBlockOAI (output-side filter)
	OutcomeUnknownNoImages  = "unknown_no_images" // KindUnknownNoImages
	OutcomeEmptyPrompt      = "empty_prompt"
	OutcomeNoChoices        = "no_choices"
	OutcomeAllDecodesFailed = "all_decodes_failed" // model produced images but base64 unwrap failed
	OutcomeUnknownFailure   = "unknown"
)

Outcome classifies the terminal state of a Generate call. It lives on the imagegen.Generate span as imagegen.outcome and powers TraceQL queries like {span.imagegen.outcome="silent_block_oai"} for triage. New shapes land in OutcomeUnknownFailure — surfaces in error.upstream_message rather than inventing a new bucket silently.

Outcome values intentionally mirror FailureKind.String() so the same vocabulary is used in tool messages, structured logs, and span attrs. The mapping happens in outcomeFromKind below.

Variables

This section is empty.

Functions

This section is empty.

Types

type Agent

type Agent struct {
	// contains filtered or unexported fields
}

Agent wraps an OpenRouter client and emits image-generation requests with modalities=["image","text"] set.

func New

func New(client openrouter.Client, cfg *config.ImageGeneratorConfig, logger *slog.Logger) *Agent

New constructs an image-generation Agent.

func (*Agent) Generate

func (a *Agent) Generate(ctx context.Context, req Request) (resp *Response, err error)

Generate runs a single image-generation call. It never returns partial results: if the model produced zero images it returns an error the caller can surface to the user.

type DecodedImage

type DecodedImage struct {
	MimeType string // e.g. "image/png"
	Data     []byte // raw decoded bytes
}

DecodedImage is a single output image with its MIME type.

type FailureKind added in v0.9.0

type FailureKind int

FailureKind classifies why an image-generation call did not produce an image.

Sources of signal differ by upstream provider:

  • Google image models (e.g. gemini-2.5-flash-image / nano-banana) return finish_reason=stop for both success and refusal — only the presence of an explicit text in `content` and the absence of `images` distinguishes a refusal. The text is quotable.
  • OpenAI image models (gpt-5.4-image-2) molchat: empty content, null finish_reason, no refusal field. The only correlate is moderation_latency in the OpenRouter generation API, which we do not consult synchronously.

The classifier therefore prefers shape signals (Images / Content / Provider) over finish_reason values. See docs/bugs/2026-05-06-imagegen-empty-stream-no-signal.

const (
	// KindUnknown is the zero value — never returned by the classifier on a
	// real failure path. Treat it as a programmer error if observed.
	KindUnknown FailureKind = iota

	// KindTimeout — our context deadline tripped before the upstream
	// responded (typically the per-call timeout in agents.image_generator).
	KindTimeout

	// KindUpstreamError — network failure, 5xx, or OpenRouter error envelope.
	// Distinct from KindTimeout so dashboards can plot them separately.
	KindUpstreamError

	// KindTextRefusal — the model returned a non-empty text explanation but
	// no image (typical of Google nano-banana). Text is quotable to the user.
	KindTextRefusal

	// KindSilentBlockOAI — OpenAI returned an empty stream with no text and
	// no finish_reason. Almost always a content-policy block; cannot be
	// proven without a follow-up generation-API lookup.
	KindSilentBlockOAI

	// KindUnknownNoImages — non-OpenAI provider returned no images and no
	// text. Genuinely unknown cause.
	KindUnknownNoImages
)

func (FailureKind) String added in v0.9.0

func (k FailureKind) String() string

String returns a stable lower_snake identifier suitable for span attrs, metric labels, and structured log fields.

type ImagegenFailure added in v0.9.0

type ImagegenFailure struct {
	Kind     FailureKind
	Text     string // msg.Content; non-empty for KindTextRefusal
	Provider string // resp.Provider (e.g. "OpenAI", "Google"); "" if call never reached upstream
	Cause    error  // wrapped network/timeout error; nil for content-shape kinds
}

ImagegenFailure is the typed error returned by Generate when no image is produced. Callers should use errors.As to recover the structured fields.

The tool wrapper uses Kind to pick a tailored instruction for the LLM and Text to quote the model's own refusal verbatim where applicable.

func (*ImagegenFailure) Error added in v0.9.0

func (f *ImagegenFailure) Error() string

Error implements error. The format includes the kind for grep-ability and either the model's text (truncated) or the underlying cause.

func (*ImagegenFailure) Unwrap added in v0.9.0

func (f *ImagegenFailure) Unwrap() error

Unwrap exposes the underlying network/timeout error so errors.Is/As over Cause continues to work (e.g. errors.Is(err, context.DeadlineExceeded)).

type Request

type Request struct {
	UserID int64

	// Prompt is the text description of the image to generate, in any language.
	Prompt string

	// InputImages are reference images for editing/combining. Pass the
	// user's attached photos or artifacts loaded from storage. May be empty
	// for pure text-to-image generation.
	InputImages []openrouter.FilePart

	// AspectRatio is one of the values accepted by the target model, e.g.
	// "1:1", "16:9", "21:9". Empty means model default (typically 1:1).
	AspectRatio string

	// ImageSize is one of "1K", "2K", "4K". Empty means model default (1K).
	// Note: "0.5K" passes the OpenRouter validator but is rejected upstream
	// by Google for gemini-3.1-flash-image-preview (verified end-to-end via
	// curl on 2026-04-30). "512" is blocked by OR's validator. Both ranges
	// are unusable today — only the larger sizes work.
	ImageSize string
}

Request parameters for a single image generation call.

type Response

type Response struct {
	Images      []DecodedImage
	TextContent string // Optional text the model emitted alongside images

	PromptTokens     int
	CompletionTokens int
	Cost             *float64
	Duration         time.Duration
}

Response contains generated images and accounting metadata.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL