llm

package
v1.2.0 Latest
Published: May 30, 2026 | License: Apache-2.0 | Imports: 15 | Imported by: 0

Documentation

Overview

Package llm defines Harbor's LLM-client interface and the runtime-wide invariants that guard every `Complete` call.

The interface is **one method**, `Complete(ctx, req) (resp, error)` (RFC §6.5). Tool dispatch is the runtime's job (RFC §6.4 + brief 07 "code-level tool calling"); the LLM client is reduced to a JSON-producing chat-completion adapter. Provider-native tool-calling shapes (the `tools=` request parameter, the `tool_choice=` mode selector, OpenAI's `function_call`, Anthropic's `tool_use` blocks, Gemini's function-calling protocol, etc.) never appear in this package; the static guard in `scripts/smoke/phase-32.sh` enforces the boundary by grepping for the canonical symbol names.

The message envelope is provider-agnostic: `ChatMessage.Content` is a sum-type that carries either `Text *string` (the common case) or `Parts []ContentPart` for multimodal input (D-021). Multimodal parts (`ImagePart`, `AudioPart`, `FilePart`) each carry one of three supply forms — `URL`, `DataURL`, or `Artifact` — and the runtime auto-materializes inline `DataURL` content above the heavy-output threshold into `ArtifactRef`s before persistence and emit (D-022).

**Context-window safety net (D-026).** Every `Complete` call routes through a catch-all pass at the LLM-client edge that (a) auto-materializes oversize `DataURL` content, (b) asserts no raw heavy content survived ANY producer's normalization step (else `ErrContextLeak`), and (c) estimates token usage against the configured `ModelProfile.ContextWindowTokens` cap and fails with `ErrContextWindowExceeded` when the estimate is within `ContextWindowReserve` of the cap. **V1 fails loudly**; auto-cascading recovery is post-V1 work.
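Step (c) can be pictured with a minimal sketch. It assumes that "within `ContextWindowReserve` of the cap" means the estimate has reached `cap * (1 - reserve)`; the exact comparison and the token estimator are not shown in this documentation, and `checkBudget` is a hypothetical name.

```go
package main

import (
	"errors"
	"fmt"
)

// errContextWindowExceeded stands in for the package's ErrContextWindowExceeded.
var errContextWindowExceeded = errors.New("llm: estimated tokens within reserve of model context window")

// checkBudget is a hypothetical helper. It fails once the estimate reaches
// the usable budget, taken here to be the cap minus the reserved fraction.
// Assumption: "within reserve of the cap" means estimated >= cap*(1-reserve).
func checkBudget(estimated, capTokens int, reserve float64) error {
	usable := int(float64(capTokens) * (1 - reserve))
	if estimated >= usable {
		return fmt.Errorf("%w: estimated=%d usable=%d cap=%d",
			errContextWindowExceeded, estimated, usable, capTokens)
	}
	return nil
}

func main() {
	// A 128000-token cap with a 5% reserve leaves roughly 121600 usable tokens.
	fmt.Println(checkBudget(100_000, 128_000, 0.05) == nil)
	fmt.Println(errors.Is(checkBudget(125_000, 128_000, 0.05), errContextWindowExceeded))
}
```

Wrapping the sentinel with `%w` keeps the numbers in the message while still letting callers match with `errors.Is`.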

The safety pass is **mandatory by construction**: `Open` returns a wrapped client (`safetyClient`) that runs the pass before delegating to the underlying `Driver`. Drivers cannot bypass the pass through the registry; a hand-constructed `Driver` would likewise have to compose `enforceContextSafety` to maintain the runtime invariant.
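The "mandatory by construction" idea is a plain decorator: the only constructor hands back the wrapper, so every call passes the check first. The sketch below uses simplified stand-in types (`request`, `response`, `driver`, `open`) and a toy size check in place of the real safety pass; none of these names are the package's actual API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Simplified stand-ins for the package's request and response types.
type request struct{ Prompt string }
type response struct{ Content string }

// driver mirrors the one-method shape described in the overview.
type driver interface {
	Complete(ctx context.Context, req request) (response, error)
}

var errTooLarge = errors.New("sketch: request failed the safety pass")

// safetyClient runs a pre-flight check and only then delegates.
type safetyClient struct {
	inner    driver
	maxBytes int
}

func (c safetyClient) Complete(ctx context.Context, req request) (response, error) {
	if len(req.Prompt) > c.maxBytes {
		return response{}, errTooLarge // the driver is never reached
	}
	return c.inner.Complete(ctx, req)
}

// echoDriver is a toy driver that counts how often it is reached.
type echoDriver struct{ calls *int }

func (d echoDriver) Complete(_ context.Context, req request) (response, error) {
	*d.calls++
	return response{Content: "echo: " + req.Prompt}, nil
}

// open is the only constructor, so callers always receive the wrapped client.
func open(d driver, maxBytes int) driver {
	return safetyClient{inner: d, maxBytes: maxBytes}
}

// tryPrompt builds a wrapped client and reports whether the driver was reached.
func tryPrompt(prompt string) (reached bool, err error) {
	calls := 0
	client := open(echoDriver{calls: &calls}, 8)
	_, err = client.Complete(context.Background(), request{Prompt: prompt})
	return calls > 0, err
}

func main() {
	fmt.Println(tryPrompt("far too long a prompt"))
	fmt.Println(tryPrompt("hi"))
}
```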

Concurrent-reuse contract (D-025): one `LLMClient` is safe to share across N concurrent goroutines. Mutable state on the client (or the `Driver`) is forbidden; per-call state lives in `ctx` and the request value. The package-level `concurrent_test.go` pins this with N=128 invocations under `-race`.

Constants

const (
	// EventTypeImageMaterialized — emitted when the safety-pass's
	// auto-materialize step rewrites an inline DataURL ≥ heavy-output
	// threshold to an ArtifactRef (D-022). Carries the source
	// CompleteRequest's model name + the new ref's id + size.
	EventTypeImageMaterialized events.EventType = "llm.image.materialized"
	// EventTypeContextLeak — emitted when the safety-pass detects
	// raw heavy content that survived every upstream producer's
	// normalization step (D-026 violation). The bus event lets
	// operators trace the offending producer.
	EventTypeContextLeak events.EventType = "llm.context_leak"
	// EventTypeContextWindowExceeded — emitted when the safety-pass
	// token-budget guard fires (D-026). Payload carries the
	// estimated token count + the model's cap + the reserve
	// fraction so operators can quantify how often planner-side
	// recovery (truncate / summarize) needs to engage.
	EventTypeContextWindowExceeded events.EventType = "llm.context_window_exceeded"
	// EventTypeCostRecorded — emitted by the runtime AFTER a
	// successful Complete. Phase 36a (governance accumulator)
	// subscribes; Phase 32 registers the type + ships the payload
	// shape so Phase 36a's emit site lands clean.
	EventTypeCostRecorded events.EventType = "llm.cost.recorded"
	// EventTypeModeDowngraded — emitted by Phase 35's structured-
	// output downgrade chain (`json_schema → json_object → text`).
	// Phase 32 registers the type as a forward-compat seam; no
	// downgrade logic ships in Phase 32.
	EventTypeModeDowngraded events.EventType = "llm.mode_downgraded"
	// EventTypeRetryWithFeedback (Phase 36) — emitted by the retry
	// wrapper per corrective re-ask. Carries the attempt index and a
	// truncated `Reason` derived from the validator's error.
	EventTypeRetryWithFeedback events.EventType = "llm.retry_with_feedback"
	// EventTypePostureReadAdmin — Phase 72g (D-112). Emitted when an
	// admin-scoped caller reads ANOTHER tenant's LLM posture via the
	// `llm.posture` Protocol method. An own-tenant read does NOT emit.
	// The cross-tenant read is a privileged action and lands on the
	// audit trail per CLAUDE.md §7 + RFC §6.15.
	EventTypePostureReadAdmin events.EventType = "llm.posture_read_admin"
	// EventTypeCompletionChunk — Phase 107 streaming completion event.
	// Emitted per token delta from the LLM provider under the originating
	// run's identity quadruple. The `Done=true` chunk fires exactly once
	// per stream (terminator marker). SafePayload — deltas are per-session
	// operator-visible content.
	EventTypeCompletionChunk events.EventType = "llm.completion.chunk"
)

Phase 32 LLM-edge event types. Registered via init() so the canonical events registry stays the single source of truth (see internal/events/events.go and AGENTS.md §17.6's "wiring gap" lesson — register at declaration time, publish at use time).

All payloads are SafePayload (compose events.SafeSealed): they carry no secret-shaped data. Identity is the Harbor quadruple; content payloads (artifact refs, MIME types, byte counts, model names) are operator-visible by design.

const (
	DefaultContextWindowReserve = 0.05   // 5%
	DefaultHeavyOutputThreshold = 32_768 // 32 KiB; matches D-022 / RFC §6.10
	// DefaultMaxRetries (Phase 36) — the retry-with-feedback bound
	// when `ModelProfile.MaxRetries` is zero. Conservative: one
	// corrective re-ask after the original attempt.
	DefaultMaxRetries = 1
)

Defaults applied when the snapshot's corresponding field is zero. Kept here (not in `validate.go`) so an operator who constructs a snapshot programmatically still gets reasonable behaviour without every test wiring also touching the config layer.
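A minimal sketch of zero-value defaulting, using the constant values listed above. The `snapshot` struct and `withDefaults` helper are hypothetical; in the package itself `MaxRetries` lives on `ModelProfile`, and it is folded into one struct here only to keep the example short.

```go
package main

import "fmt"

// Values copied from the constants block above.
const (
	defaultContextWindowReserve = 0.05
	defaultHeavyOutputThreshold = 32_768
	defaultMaxRetries           = 1
)

// snapshot is a cut-down stand-in for the configuration fields that have defaults.
type snapshot struct {
	ContextWindowReserve float64
	HeavyOutputThreshold int
	MaxRetries           int
}

// withDefaults is a hypothetical helper that fills only zero-valued fields.
func withDefaults(s snapshot) snapshot {
	if s.ContextWindowReserve == 0 {
		s.ContextWindowReserve = defaultContextWindowReserve
	}
	if s.HeavyOutputThreshold == 0 {
		s.HeavyOutputThreshold = defaultHeavyOutputThreshold
	}
	if s.MaxRetries == 0 {
		s.MaxRetries = defaultMaxRetries
	}
	return s
}

func main() {
	// Only the threshold is set explicitly; the other fields pick up defaults.
	fmt.Printf("%+v\n", withDefaults(snapshot{HeavyOutputThreshold: 1024}))
}
```

One consequence of this pattern is that an explicit zero cannot be told apart from "unset", which is worth keeping in mind when constructing snapshots by hand.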

const DefaultDriver = "bifrost"

DefaultDriver names the production LLM driver. Phase 64 (D-089) flipped this constant to `"bifrost"`, the pure-Go LLM gateway shipped by Phase 33. Before Phase 64 it was `"mock"`; the flip closes the §13 "test stubs as production defaults" amendment for the LLM seam.

Operators in production set `llm.driver` explicitly to `"bifrost"` (the same value the config defaults to). The `mock` driver still self-registers via init() — its package init runs when an importer (a test that builds a deterministic LLM stack) blank-imports it — but the production `cmd/harbor` binary never imports the mock package, so a config that lists `driver: mock` in a production build hits `ErrUnknownDriver: "mock" (registered: bifrost)` rather than silently routing through a stub.

Dev-only escape hatch (D-089): the `harbor dev` subcommand reads `HARBOR_DEV_ALLOW_MOCK=1` and, when set, blank-imports the mock driver itself (the conditional blank-import lives at the subcommand boundary, not in `main.go`) AND prints a stderr banner `[DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION]` on every boot. Outside that one explicit dev path, the mock is unreachable.

Variables

var (
	// ErrUnknownDriver — Open was asked for a driver name no
	// registered factory handles. The error's message names the
	// registered drivers so misconfigurations are obvious (§4.4).
	ErrUnknownDriver = errors.New("llm: unknown driver")
	// ErrClientClosed — Complete called after Close. The wrapped
	// driver returns this; the safetyClient propagates it verbatim.
	ErrClientClosed = errors.New("llm: client is closed")
	// ErrIdentityMissing — Complete called with a ctx that does not
	// carry an `identity.Identity` (or `identity.Quadruple`).
	// AGENTS.md §6 rule 9 — identity is mandatory at every Harbor
	// boundary; the runtime fails closed.
	ErrIdentityMissing = errors.New("llm: identity missing from ctx")
	// ErrInvalidContent — a `ChatMessage.Content` is malformed: both
	// `Text` and `Parts` set, or neither, or a `ContentPart` whose
	// `Type` discriminator doesn't match its payload (e.g. Type=image
	// with `Image == nil`). The safety pass rejects loudly rather than
	// papering over the inconsistency.
	ErrInvalidContent = errors.New("llm: invalid message content")
	// ErrContextLeak — runtime-wide invariant violation (D-026). A
	// raw byte / string / DataURL ≥ heavy-output threshold survived
	// every producer's normalization step and reached the LLM-client
	// edge. The safety pass fails the request; the bus emits
	// `llm.context_leak` so operators can find the offending
	// producer.
	ErrContextLeak = errors.New("llm: raw heavy content reached LLM-client edge — D-026 violation")
	// ErrContextWindowExceeded — the token-budget guard fired (D-026).
	// The assembled `CompleteRequest`'s estimated token count is
	// within `ContextWindowReserve` of the model's configured
	// `ContextWindowTokens` cap. V1 fails loudly; auto-cascade is
	// post-V1 work — the planner is responsible for recovery (drop
	// older turns, summarize, etc.).
	ErrContextWindowExceeded = errors.New("llm: estimated tokens within reserve of model context window")
	// ErrInvalidConfig — `Open` called with a `ConfigSnapshot` that
	// fails structural validation (driver name empty, model profile
	// missing for the request's model, etc.). Distinct from
	// ErrUnknownDriver — that's a registry miss, this is a
	// configuration miss.
	ErrInvalidConfig = errors.New("llm: invalid configuration")
	// ErrUnsupportedModel — the safety net or driver hit a model
	// name with no matching `ModelProfile`. Required because the
	// token-budget guard depends on a profile's context-window cap.
	ErrUnsupportedModel = errors.New("llm: model has no configured ModelProfile")
	// ErrInvalidJSONSchema (Phase 35) — the provider returned a
	// `Complete` whose JSON output did not validate against the
	// requested schema (or rejected the schema itself at the wire
	// layer). The downgrade wrapper observes this via
	// `IsInvalidJSONSchemaError` and steps the request down the chain.
	// Drivers MAY wrap their provider-specific schema errors with this
	// sentinel; the classifier also matches a small allowlist of error
	// substrings to handle providers that surface only a free-form
	// `error` string.
	ErrInvalidJSONSchema = errors.New("llm: response failed JSON-schema validation")
	// ErrDowngradeExhausted (Phase 35) — the downgrade wrapper ran
	// every step in the chain and the inner call STILL produced
	// `ErrInvalidJSONSchema`. Surfaces with the wrapped chain history
	// so operators can correlate against `llm.mode_downgraded` events.
	ErrDowngradeExhausted = errors.New("llm: structured-output downgrade chain exhausted")
	// ErrRetryExhausted (Phase 36) — the retry wrapper exceeded the
	// per-model `MaxRetries` bound. Wraps the chain of validator
	// failures so operators can see why each attempt failed.
	ErrRetryExhausted = errors.New("llm: retry-with-feedback budget exhausted")
	// ErrValidationFailed (Phase 36) — surfaces when the validator
	// returns non-nil AND the retry wrapper is NOT registered (the
	// caller asked for validation without a wrapper to retry). The
	// wrapper-registered path uses the validator's own error verbatim
	// in `RetryWithFeedbackPayload.Reason`.
	ErrValidationFailed = errors.New("llm: response validator rejected output")
	// ErrOrphanToolCall — an assistant message with `ToolCalls` is
	// not followed by the corresponding `RoleTool` messages whose
	// `ToolCallID` matches each `ToolCalls[i].ID`. OpenAI's wire
	// spec requires the pairing; the safety pass rejects loudly so
	// the producer is forced to fix the upstream omission rather
	// than silently shipping an invalid wire shape.
	ErrOrphanToolCall = errors.New("llm: assistant message with ToolCalls is not followed by matching RoleTool messages")
)

Sentinel errors. Callers compare via errors.Is.
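Because errors such as `ErrUnknownDriver` arrive wrapped with extra context, a direct `==` comparison fails where `errors.Is` succeeds. The sketch below uses a local copy of the sentinel and a hypothetical `openDriver` helper; the message format imitates the one quoted in the `DefaultDriver` notes but is not taken from the package's source.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errUnknownDriver stands in for the package's ErrUnknownDriver sentinel.
var errUnknownDriver = errors.New("llm: unknown driver")

// openDriver is a hypothetical lookup that wraps the sentinel with context.
func openDriver(name string, registered []string) error {
	for _, r := range registered {
		if r == name {
			return nil
		}
	}
	return fmt.Errorf("%w: %q (registered: %s)",
		errUnknownDriver, name, strings.Join(registered, ", "))
}

// isUnknownDriver matches the sentinel anywhere in the wrap chain.
func isUnknownDriver(err error) bool {
	return errors.Is(err, errUnknownDriver)
}

func main() {
	err := openDriver("mock", []string{"bifrost"})
	fmt.Println(err)
	// The wrapped error is not identical to the sentinel, but errors.Is finds it.
	fmt.Println(err == errUnknownDriver, isUnknownDriver(err)) // prints "false true"
}
```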

Functions

func HasIdentity

func HasIdentity(ctx context.Context) bool

HasIdentity reports whether `ctx` carries a complete Harbor identity. The LLM-client edge MUST validate this before invoking any driver — the runtime fails closed on missing identity (AGENTS.md §6 rule 9, AGENTS.md §13 forbidden-practices).

Used by `safetyClient.Complete`; exposed so test helpers can pin the check at the call site.

func IsInvalidJSONSchemaError

func IsInvalidJSONSchemaError(err error) bool

IsInvalidJSONSchemaError reports whether `err` represents a schema-class failure that the Phase 35 downgrade chain should treat as a signal to step the request down to the next `OutputMode`.

The classifier checks two paths:

  1. `errors.Is(err, ErrInvalidJSONSchema)` — drivers / wrappers that classify upstream errors and wrap with the sentinel.
  2. A small case-insensitive substring scan against `invalidJSONSchemaErrorMarkers`. This handles providers that surface only a free-form error string.

The substring allowlist is deliberately narrow to avoid false positives on transient / IO / auth failures. Returns false for nil.
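The two-path classifier described above can be sketched as follows. The `markers` slice is a made-up allowlist for illustration; the contents of the real `invalidJSONSchemaErrorMarkers` are not shown in this documentation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errInvalidJSONSchema stands in for the package's ErrInvalidJSONSchema.
var errInvalidJSONSchema = errors.New("llm: response failed JSON-schema validation")

// markers is a hypothetical allowlist, stored in lower case.
var markers = []string{"invalid json schema", "invalid schema for response_format"}

// isInvalidJSONSchemaError checks the sentinel first, then falls back to a
// case-insensitive substring scan of the error text.
func isInvalidJSONSchemaError(err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, errInvalidJSONSchema) {
		return true
	}
	msg := strings.ToLower(err.Error())
	for _, m := range markers {
		if strings.Contains(msg, m) {
			return true
		}
	}
	return false
}

// classifyMessage is a demo convenience for free-form provider error strings.
func classifyMessage(msg string) bool {
	return isInvalidJSONSchemaError(errors.New(msg))
}

func main() {
	fmt.Println(isInvalidJSONSchemaError(nil))
	fmt.Println(isInvalidJSONSchemaError(fmt.Errorf("provider: %w", errInvalidJSONSchema)))
	fmt.Println(classifyMessage("400: Invalid JSON Schema: missing 'type'"))
	fmt.Println(classifyMessage("dial tcp: connection refused"))
}
```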

func Register

func Register(name string, factory Factory)

Register installs a driver factory under `name`. Drivers self-register from their package `init()`; `cmd/harbor` blank-imports the production driver to trigger registration (Phase 33+).

Re-registering the same name panics — the registration model is write-once-at-init and a duplicate signals a build misconfig.
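A write-once registry of this kind is commonly built from a mutex-guarded map, in the same spirit as `database/sql.Register`. The sketch below is an illustration under that assumption, not the package's implementation; `factory` is reduced to a function returning a string.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// factory is a stand-in for the package's Factory type.
type factory func() string

var (
	mu       sync.RWMutex
	registry = map[string]factory{}
)

// register installs f under name and panics on a duplicate name.
func register(name string, f factory) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("llm sketch: driver " + name + " registered twice")
	}
	registry[name] = f
}

// registeredDrivers returns the registered names in sorted order.
func registeredDrivers() []string {
	mu.RLock()
	defer mu.RUnlock()
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// panics reports whether fn panicked.
func panics(fn func()) (did bool) {
	defer func() { did = recover() != nil }()
	fn()
	return false
}

func main() {
	register("mock", func() string { return "mock driver" })
	register("bifrost", func() string { return "bifrost driver" })
	fmt.Println(registeredDrivers())
	fmt.Println(panics(func() { register("mock", nil) }))
}
```

Sorting the names keeps boot logs and error messages deterministic, since Go map iteration order is randomized.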

func RegisterCorrectionsWrapper

func RegisterCorrectionsWrapper(fn func(LLMClient, ConfigSnapshot) LLMClient)

RegisterCorrectionsWrapper installs the Phase 34 corrections wrapper hook. Called once from `internal/llm/corrections.init()`; the production binary picks up the registration by blank-importing the corrections package.

The hook signature mirrors `corrections.Wrap` — given the inner `LLMClient` (the safety wrapper) and the config snapshot, returns the corrections-wrapped client.

Re-registering panics: the registration model is write-once-at-init and a duplicate signals a build misconfig.

func RegisterDefaultOutputModeResolver

func RegisterDefaultOutputModeResolver(fn func(model string) OutputMode)

RegisterDefaultOutputModeResolver installs the per-known-provider `OutputMode` resolver from `internal/llm/corrections`. Called once from `corrections.init()`; the production binary blank-imports the corrections package so the registration fires at boot. Re-registering panics — write-once-at-init.

func RegisterDowngradeWrapper

func RegisterDowngradeWrapper(fn func(LLMClient, ConfigSnapshot, Deps) LLMClient)

RegisterDowngradeWrapper installs the Phase 35 structured-output downgrade wrapper hook. Called once from `internal/llm/output.init()`; the production binary blank-imports `internal/llm/output` so the registration fires at boot.

The hook receives the inner `LLMClient` (typically `corrections(safety(driver))`), the config snapshot, and the Deps so the wrapper can emit events on the shared bus.

Re-registering panics — write-once-at-init.

func RegisterGovernanceWrapper

func RegisterGovernanceWrapper(fn func(LLMClient, ConfigSnapshot, Deps) LLMClient)

RegisterGovernanceWrapper installs the Phase 36a/36b governance wrapper hook. Called once from `internal/governance.init()`; the production binary blank-imports the package so the hook lands at boot. Governance composes OUTSIDE the entire downstream chain (D-043 + D-044) — the wrapper sits at the outermost layer in `Open` so `PreCall` fires before retry / downgrade / corrections / safety even reach the driver.

The hook receives the inner `LLMClient` (typically `retry(downgrade(corrections(safety(driver))))`), the config snapshot, and the Deps so the wrapper can build its Subsystem if a factory has been registered via `governance.SetFactory`. Latent default: with no factory set, the hook returns `inner` unchanged.
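The layering order can be made visible with a toy decorator chain. Everything here is a simplified stand-in: `client` is a one-method string interface and `layer` merely records its name, so the output shows nesting order only, not real wrapper behaviour.

```go
package main

import "fmt"

// client is a stand-in for LLMClient reduced to a string-in, string-out call.
type client interface {
	Complete(prompt string) string
}

// driver is the innermost toy implementation.
type driver struct{}

func (driver) Complete(string) string { return "driver" }

// layer is a generic decorator that records its name around the inner call.
type layer struct {
	name  string
	inner client
}

func (l layer) Complete(prompt string) string {
	return l.name + "(" + l.inner.Complete(prompt) + ")"
}

// compose wraps base with the named layers, applied innermost first.
func compose(base client, names ...string) client {
	c := base
	for _, n := range names {
		c = layer{name: n, inner: c}
	}
	return c
}

func main() {
	c := compose(driver{}, "safety", "corrections", "downgrade", "retry", "governance")
	fmt.Println(c.Complete("hi"))
	// prints governance(retry(downgrade(corrections(safety(driver)))))
}
```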

Re-registering panics — write-once-at-init.

func RegisterMockModeCaptured

func RegisterMockModeCaptured(v bool)

RegisterMockModeCaptured records that the runtime booted with `HARBOR_DEV_ALLOW_MOCK=1` (D-089). It is called exactly once from `cmd/harbor/devmock.go::registerMockIfDevAllowMock` at boot — the SAME call site that prints the `[DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION]` stderr banner. Calling it with `true` flips the captured flag so `llm.posture` surfaces `MockMode: true`; calling it with `false` (or never calling it — the zero value) leaves the flag false.

A future PR that re-routes the dev-hatch path (e.g. promotes the env var to a CLI flag) MUST keep this call reciprocal with the banner emit — otherwise `LLMPostureResponse.MockMode` silently desyncs from the banner. The Phase 72g integration + smoke tests assert both paths fire together.

func RegisterRetryWrapper

func RegisterRetryWrapper(fn func(LLMClient, ConfigSnapshot, Deps) LLMClient)

RegisterRetryWrapper installs the Phase 36 retry-with-feedback wrapper hook. Called once from `internal/llm/retry.init()`; the production binary blank-imports `internal/llm/retry`.

The hook signature mirrors `RegisterDowngradeWrapper`.

Re-registering panics — write-once-at-init.

func RegisteredDrivers

func RegisteredDrivers() []string

RegisteredDrivers returns a sorted list of driver names. Useful for boot-log emission and for surfacing in error messages.

Types

type ArtifactStub

type ArtifactStub struct {
	Ref       string     `json:"artifact_ref"`
	MIME      string     `json:"mime"`
	SizeBytes int64      `json:"size_bytes"`
	Hash      string     `json:"hash,omitempty"`
	Summary   string     `json:"summary,omitempty"`
	Fetch     *StubFetch `json:"fetch,omitempty"`
}

ArtifactStub is the model-agnostic JSON shape the LLM sees in place of heavy content during prompt assembly (RFC §6.5, D-026). The same shape is used whether the substituted content originated from a tool result, a memory turn, or a multimodal input.

Operators can override `Summary` per-producer; the rest is runtime-stamped at materialization time. The stub's JSON rendering is byte-stable across providers — no per-provider swapping.

JSON shape (omitempty on optional fields, no extra fields):

{"artifact_ref":"ref-abc-def","mime":"image/png","size_bytes":65536,
 "hash":"sha256:...","summary":"User-uploaded screenshot at turn 3",
 "fetch":{"tool":"artifact.fetch","id":"ref-abc-def"}}

func (ArtifactStub) MarshalJSON

func (s ArtifactStub) MarshalJSON() ([]byte, error)

MarshalJSON ensures the canonical render of an `ArtifactStub` — stable field order, `omitempty` honored, no extra fields. The runtime's `ObservationRenderer` and the safety-net materialization both go through this method, so producers and the LLM-side audit see byte-identical output.

Implemented explicitly (rather than relying on Go's default struct marshaling) so the contract stays stable even if Go's default field ordering changes between versions.
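To see the documented shape concretely, the sketch below marshals a copy of the struct with `encoding/json`'s default struct encoding rather than the package's explicit `MarshalJSON`. The `stubFetch` fields are an assumption inferred from the `fetch` object in the JSON example; the real `StubFetch` type is not shown in this section.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stubFetch is a hypothetical shape inferred from the "fetch" object in the
// JSON example above.
type stubFetch struct {
	Tool string `json:"tool"`
	ID   string `json:"id"`
}

// artifactStub copies the field tags from the ArtifactStub declaration.
type artifactStub struct {
	Ref       string     `json:"artifact_ref"`
	MIME      string     `json:"mime"`
	SizeBytes int64      `json:"size_bytes"`
	Hash      string     `json:"hash,omitempty"`
	Summary   string     `json:"summary,omitempty"`
	Fetch     *stubFetch `json:"fetch,omitempty"`
}

// render marshals with the default struct encoding, which emits fields in
// declaration order and drops empty omitempty fields.
func render(s artifactStub) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(render(artifactStub{
		Ref:       "ref-abc-def",
		MIME:      "image/png",
		SizeBytes: 65536,
		Fetch:     &stubFetch{Tool: "artifact.fetch", ID: "ref-abc-def"},
	}))
	// prints {"artifact_ref":"ref-abc-def","mime":"image/png","size_bytes":65536,"fetch":{"tool":"artifact.fetch","id":"ref-abc-def"}}
}
```

`Hash` and `Summary` are left empty here, so `omitempty` removes them from the output.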

type AudioPart

type AudioPart struct {
	URL      string
	DataURL  string
	Artifact *ArtifactStub
	MIME     string
}

AudioPart is a multimodal audio input. Same supply forms as `ImagePart`; `MIME` is the audio MIME type.

type ChatMessage

type ChatMessage struct {
	Role    Role
	Content Content
	Name    *string
	// ToolCallID (Phase 107c / D-167) is the provider-assigned
	// tool-call identifier carried on RoleTool messages. Rendered
	// as the native tool-result role with matching call ID when
	// the provider supports it; falls back to user-role rendering
	// on providers without native tool-result roles.
	ToolCallID *string
	// ToolCalls (Phase 107c / D-167) is the per-message structured
	// tool-call slice carried on RoleAssistant messages that replay
	// a prior planner step's CallTool emission into the next turn's
	// thread. When non-empty, the bifrost translator emits an
	// assistant message with the provider-native `tool_calls`
	// block (OpenAI / Anthropic / Gemini all consume this shape).
	// The matching tool result is threaded back via a sibling
	// RoleTool message whose `ToolCallID` matches `ToolCalls[i].ID`.
	// Empty for every non-assistant message and for assistant
	// messages whose content is the model's final answer.
	ToolCalls []ToolCallStructured
}

ChatMessage is one entry in the chat thread.

`Content` is a sum-type: exactly one of `Text` or `Parts` is set. `Text` is the common case (text-only conversation). `Parts` is set when the message carries multimodal content. `Name` is optional — used by some providers for participant naming.
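The pairing rule between `ToolCalls` and `RoleTool` messages (the condition behind `ErrOrphanToolCall`) can be sketched as a single pass over the thread. This is an illustration under assumptions: role string values are invented, `ToolCallID` is a plain string rather than a pointer, and the tool results are assumed to follow the assistant message contiguously.

```go
package main

import (
	"errors"
	"fmt"
)

type role string

// Role values are assumptions; the package's constants are not shown here.
const (
	roleUser      role = "user"
	roleAssistant role = "assistant"
	roleTool      role = "tool"
)

type toolCall struct{ ID string }

// message is a cut-down stand-in for ChatMessage.
type message struct {
	Role       role
	ToolCallID string     // set on roleTool messages
	ToolCalls  []toolCall // set on roleAssistant messages
}

var errOrphanToolCall = errors.New("llm sketch: assistant ToolCalls not followed by matching tool messages")

// checkToolPairing verifies that every assistant tool call is answered by a
// tool message in the run of tool messages that immediately follows it.
func checkToolPairing(msgs []message) error {
	for i, m := range msgs {
		if m.Role != roleAssistant || len(m.ToolCalls) == 0 {
			continue
		}
		answered := map[string]bool{}
		for j := i + 1; j < len(msgs) && msgs[j].Role == roleTool; j++ {
			answered[msgs[j].ToolCallID] = true
		}
		for _, tc := range m.ToolCalls {
			if !answered[tc.ID] {
				return fmt.Errorf("%w: call %q in message %d", errOrphanToolCall, tc.ID, i)
			}
		}
	}
	return nil
}

func main() {
	thread := []message{
		{Role: roleUser},
		{Role: roleAssistant, ToolCalls: []toolCall{{ID: "call-1"}}},
		{Role: roleTool, ToolCallID: "call-1"},
	}
	fmt.Println(checkToolPairing(thread))     // complete thread
	fmt.Println(checkToolPairing(thread[:2])) // tool result missing
}
```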

type CompleteRequest

type CompleteRequest struct {
	Model           string
	Messages        []ChatMessage
	ResponseFormat  *ResponseFormat
	Stream          bool
	OnContent       func(delta string, done bool)
	OnReasoning     func(delta string, done bool)
	Temperature     *float32
	MaxTokens       *int
	Stops           []string
	ReasoningEffort ReasoningEffort
	Extra           map[string]any
	// Validator (Phase 36) is the caller-supplied post-response
	// validation hook. When non-nil, the retry wrapper invokes it
	// after each successful `Complete`; a non-nil return triggers a
	// corrective re-ask bounded by `ModelProfile.MaxRetries`. The
	// validator is opaque to the wrapper — return any error type;
	// the wrapper truncates and includes its `Error()` in the retry
	// sub-prompt + the `llm.retry_with_feedback` event payload.
	//
	// `nil` Validator (the default) disables the retry loop entirely;
	// the wrapper is a no-op pass-through. Validators MUST be safe for
	// concurrent invocation against the same compiled artifact (the
	// wrapper itself enforces D-025; the validator runs once per call).
	Validator func(CompleteResponse) error

	// Tools (Phase 107c / D-167) is the per-turn tool catalog. When
	// nil the driver calls the provider without the tool-calling
	// block (text-only completion — preserves non-React planner
	// behavior).
	Tools []ToolDeclaration
	// ToolChoice (Phase 107c / D-167) is the per-provider tool-choice
	// passthrough. "" means "do not emit a tool_choice field"; "auto"
	// lets the provider decide; "required" forces the model to emit at
	// least one tool call; "none" suppresses tool calls entirely.
	ToolChoice string
	// ParallelToolCalls (Phase 107c / D-167) is the per-turn knob for
	// parallel function-calling (default true for supporting providers;
	// bifrost maps it per provider). The planner sets this per the
	// operator's yaml knob + the runloop executor's capability signal.
	ParallelToolCalls bool
}

CompleteRequest is the LLM-call payload. Settled in RFC §6.5; shaped by D-021 (multimodal sum-type), D-026 (safety-net invariants).

`Messages` is the chat thread — role + content only. The system / user / assistant roles are the entire vocabulary; tool-result rendering happens at the `ObservationRenderer` layer as user-role messages (RFC §6.4 + brief 07 §5).

`ResponseFormat` is an optional structured-output hint. `nil` means "plain text"; `json_object` requests provider JSON mode; `json_schema` carries a caller-supplied JSON Schema. Phase 35 owns the per-provider downgrade chain `json_schema → json_object → text`.
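The downgrade chain `json_schema → json_object → text` amounts to retrying with progressively weaker modes on schema-class failures only. The sketch below is a simplified illustration: `outputMode`, `completeWithDowngrade`, and `rejecting` are hypothetical names, and unlike the real `ErrDowngradeExhausted` the returned error carries no chain history.

```go
package main

import (
	"errors"
	"fmt"
)

type outputMode string

const (
	modeJSONSchema outputMode = "json_schema"
	modeJSONObject outputMode = "json_object"
	modeText       outputMode = "text"
)

var (
	errInvalidJSONSchema  = errors.New("llm: response failed JSON-schema validation")
	errDowngradeExhausted = errors.New("llm: structured-output downgrade chain exhausted")
)

// chain lists the modes from most to least structured.
var chain = []outputMode{modeJSONSchema, modeJSONObject, modeText}

// completeWithDowngrade tries each mode in order and steps down only on a
// schema-class failure. Any other error is returned unchanged.
func completeWithDowngrade(call func(outputMode) error) (outputMode, error) {
	for _, mode := range chain {
		err := call(mode)
		if err == nil {
			return mode, nil
		}
		if !errors.Is(err, errInvalidJSONSchema) {
			return mode, err
		}
	}
	return "", errDowngradeExhausted
}

// rejecting builds a fake provider call that fails schema validation for the
// listed modes and succeeds for every other mode.
func rejecting(modes ...outputMode) func(outputMode) error {
	return func(m outputMode) error {
		for _, r := range modes {
			if r == m {
				return fmt.Errorf("provider rejected %s: %w", m, errInvalidJSONSchema)
			}
		}
		return nil
	}
}

func main() {
	mode, err := completeWithDowngrade(rejecting(modeJSONSchema))
	fmt.Println(mode, err)
	_, err = completeWithDowngrade(rejecting(modeJSONSchema, modeJSONObject, modeText))
	fmt.Println(err)
}
```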

`Stream` + `OnContent` / `OnReasoning` cooperate: when `Stream` is true, the driver invokes the callbacks for each delta. `OnReasoning` fires only for thinking-class providers that expose a separate reasoning channel (`o1`, `o3`, `deepseek-reasoner`, etc.).

`Temperature` / `MaxTokens` / `Stops` map directly onto provider sampler controls. Pointer types (`*float32`, `*int`) distinguish "unset (use provider default)" from "set to zero".
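The pointer convention is easiest to see with a small generic helper. `ptr` and `describeTemperature` are hypothetical names for illustration; only the field types come from `CompleteRequest`.

```go
package main

import "fmt"

// ptr returns a pointer to a copy of v, handy for optional fields.
func ptr[T any](v T) *T { return &v }

// sampler is a cut-down stand-in for the optional sampler fields.
type sampler struct {
	Temperature *float32
	MaxTokens   *int
}

// describeTemperature distinguishes "unset" from an explicit value, including zero.
func describeTemperature(s sampler) string {
	if s.Temperature == nil {
		return "unset (provider default)"
	}
	return fmt.Sprintf("set to %g", *s.Temperature)
}

func main() {
	fmt.Println(describeTemperature(sampler{}))                             // prints "unset (provider default)"
	fmt.Println(describeTemperature(sampler{Temperature: ptr(float32(0))})) // prints "set to 0"
}
```

With a plain `float32` field both cases would read as `0`, and a caller could not request greedy sampling without also overriding every provider default.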

`ReasoningEffort` is a request-level hint mapped to per-provider reasoning controls (bifrost's `ChatReasoning`). `""` means "do not touch the provider default."

`Extra` is provider-passthrough sanitized by Phase 34's correction layer. Phase 32 stores the field but does not interpret it.

type CompleteResponse

type CompleteResponse struct {
	Content   string
	ToolCalls []ToolCallStructured
	Reasoning string
	Cost      Cost
	Usage     Usage
}

CompleteResponse is the LLM-call return shape.

`Content` is the full assembled assistant message — for streaming calls the driver concatenates `OnContent` deltas into `Content` before returning. The runtime parses `Content` into a `PlannerAction` per brief 07; the LLM never emits provider-native tool calls.
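The relationship between streamed deltas and the final `Content` can be sketched with a toy driver. This is an illustration only: `streamComplete` is hypothetical, and it assumes the terminating callback carries an empty delta with `done=true`, which this documentation does not specify.

```go
package main

import (
	"fmt"
	"strings"
)

// streamComplete is a toy stand-in for a streaming driver. It forwards each
// delta to onContent (when non-nil) and returns the concatenated content.
// Assumption: the terminating callback carries an empty delta with done=true.
func streamComplete(deltas []string, onContent func(delta string, done bool)) string {
	var sb strings.Builder
	for _, d := range deltas {
		sb.WriteString(d)
		if onContent != nil {
			onContent(d, false)
		}
	}
	if onContent != nil {
		onContent("", true)
	}
	return sb.String()
}

// replay runs streamComplete and rebuilds the content from the callbacks
// alone, also counting how many times done=true was observed.
func replay(deltas []string) (returned, fromCallbacks string, doneCount int) {
	var live strings.Builder
	returned = streamComplete(deltas, func(delta string, done bool) {
		live.WriteString(delta)
		if done {
			doneCount++
		}
	})
	return returned, live.String(), doneCount
}

func main() {
	returned, live, done := replay([]string{"Hel", "lo, ", "world"})
	fmt.Println(returned == live, done, returned) // prints "true 1 Hello, world"
}
```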

`ToolCalls` (Phase 107c / D-167) carries provider-validated structured tool-call entries. When non-empty, the planner reads ToolCalls as its primary decision discriminator (native tool-calling path). Empty for text-only responses and for providers without native tool-calling support.

`Reasoning` carries the provider-side thinking trace (Anthropic extended thinking, OpenAI o-series, DeepSeek native, Gemini `thought:true` parts) normalised by the driver. It is the canonical captured trace for BOTH unary and streaming calls — distinct from the per-delta `OnReasoning` streaming callback, which exists for live UX. Empty when the provider did not surface reasoning, or when the driver does not read a reasoning channel. Reasoning is captured content, NOT replayed into prompts: the planner persists it on `trajectory.Step.ReasoningTrace` and only re-injects it when an operator opts into replay (D-148). Phase 83e (RFC §6.2 + §6.5).

`Cost` + `Usage` propagate the provider's reported figures. Governance (Phase 36a/36b) subscribes to `llm.cost.recorded` events emitted by the runtime when a `Complete` returns; the event payload re-stamps these shapes.

type CompletionChunkPayload added in v1.2.0

type CompletionChunkPayload struct {
	events.SafePayload
	Identity   identity.Quadruple
	TaskID     string
	RunID      string
	Delta      string
	Done       bool
	Kind       string
	OccurredAt time.Time
}

CompletionChunkPayload is the typed payload for EventTypeCompletionChunk (Phase 107). SafePayload — the delta is per-session operator-visible content (the LLM's own output), not a secret. Kind is "content" or "reasoning".

type ConfigSnapshot

type ConfigSnapshot struct {
	Driver               string
	ContextWindowReserve float64
	HeavyOutputThreshold int
	ModelProfiles        map[string]ModelProfile

	// DisableCorrections opts OUT of the Phase 34 per-provider
	// correction layer. Zero-value (false) = corrections enabled —
	// production callers wire `corrections.Wrap(safetyClient(driver))`
	// so quirks like NIM message reordering, OpenAI strict-schema
	// mode, thinking-class reasoning routing, Anthropic envelope
	// translation, and usage backfill all apply automatically. Tests
	// that need to exercise the safety pass in isolation set this to
	// true.
	//
	// Inverse-named so the zero-value matches the production default
	// — direct callers (tests, programmatic snapshot construction)
	// don't have to flip an extra knob to get correct behaviour. The
	// config loader resolves the operator-facing `corrections.enabled`
	// yaml field (default true) into this inverse.
	DisableCorrections bool

	// DisableDowngrade opts OUT of the Phase 35 structured-output
	// downgrade chain. Zero-value (false) = enabled. Inverse-named so
	// production callers get the right behaviour by default.
	DisableDowngrade bool

	// DisableRetry opts OUT of the Phase 36 retry-with-feedback
	// wrapper. Zero-value (false) = enabled. The wrapper is a no-op
	// when `CompleteRequest.Validator` is nil, so disabling is only
	// useful for tests that need to isolate the downgrade layer.
	DisableRetry bool

	// DisableGovernance opts OUT of the Phase 36a/36b governance
	// wrapper. Zero-value (false) = enabled — but the wrapper is also
	// a no-op pass-through when no `governance.Factory` has been
	// registered, so the latent default (Wave 7b scoping) requires
	// neither flag flip nor factory wiring. Tests that want to bypass
	// even a registered factory flip this true.
	DisableGovernance bool

	// Bifrost-driver knobs (Phase 33).
	Provider string
	Model    string
	APIKey   string
	BaseURL  string
	Timeout  time.Duration

	// CustomProviders is the operator-declared registry of
	// OpenAI-compatible providers (Phase 33a). When `Provider`
	// matches a custom entry's `Name`, the entry's `BaseURL` /
	// `APIKeyEnvVar` / `Models` / network knobs apply (legacy
	// `APIKey` / `BaseURL` / `Timeout` ignored for that case). The
	// list is keyed only by `Name`; the bifrost driver iterates and
	// registers all entries with bifrost's `Account`.
	CustomProviders []CustomProviderSpec

	// NetworkDefaults applies to every provider when the per-provider
	// override is absent. Zero-valued fields fall through to
	// bifrost's package-level defaults at construction. Restart-
	// required.
	NetworkDefaults NetworkDefaults
}

ConfigSnapshot is the strict subset of `config.LLMConfig` the LLM package consumes. Keeping a snapshot decouples drivers from the config package's type evolution (mirrors `internal/memory`'s pattern).

  • `Driver` selects the §4.4 factory. Empty defaults to `DefaultDriver` (`"bifrost"` since Phase 64; `"mock"` before that).
  • `ContextWindowReserve` is the safety-net token-budget margin (default 0.05 / 5%). Range [0.0, 1.0); validated at the config layer + at construction.
  • `HeavyOutputThreshold` mirrors `config.ArtifactsConfig.HeavyOutputThresholdBytes` so the LLM package does not re-import the artifact-config struct. Default 32 KiB.
  • `ModelProfiles` is keyed by canonical model name. The safety net's token-budget guard requires a profile entry for the model in the `CompleteRequest`; missing → `ErrUnsupportedModel`.

`Provider` / `Model` / `APIKey` / `BaseURL` / `Timeout` are the Phase-33 bifrost-driver knobs. Phase 32 introduced them so the snapshot's shape is stable across phases; the mock driver ignores them and the bifrost driver reads them.

type Content

type Content struct {
	Text  *string
	Parts []ContentPart
}

Content is the multimodal sum-type. Exactly one of `Text` or `Parts` must be set; both-set and both-nil are invalid and rejected by the safety net with `ErrInvalidContent`.

type ContentPart

type ContentPart struct {
	Type  PartType
	Text  string     // when Type == PartText
	Image *ImagePart // when Type == PartImage
	Audio *AudioPart // when Type == PartAudio
	File  *FilePart  // when Type == PartFile
}

ContentPart is one element of a multimodal `Content.Parts` slice. Exactly one of `Text` / `Image` / `Audio` / `File` is set per the `Type` discriminator.
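Both sum-type rules (exactly one of `Text` / `Parts`, and a payload that matches the `Type` discriminator) can be checked in a few lines. The sketch below covers only text and image parts; the `partType` string values are assumptions, and treating an empty `Parts` slice as "unset" is a choice made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

type partType string

// Discriminator values are assumptions for this sketch.
const (
	partText  partType = "text"
	partImage partType = "image"
)

type imagePart struct{ URL string }

// contentPart and content are cut-down stand-ins for ContentPart and Content.
type contentPart struct {
	Type  partType
	Text  string
	Image *imagePart
}

type content struct {
	Text  *string
	Parts []contentPart
}

var errInvalidContent = errors.New("llm: invalid message content")

// validateContent enforces the sum-type rule and the per-part discriminator.
func validateContent(c content) error {
	hasText, hasParts := c.Text != nil, len(c.Parts) > 0
	if hasText == hasParts {
		return fmt.Errorf("%w: exactly one of Text or Parts must be set", errInvalidContent)
	}
	for i, p := range c.Parts {
		ok := false
		switch p.Type {
		case partText:
			ok = p.Image == nil
		case partImage:
			ok = p.Image != nil && p.Text == ""
		}
		if !ok {
			return fmt.Errorf("%w: part %d does not match type %q", errInvalidContent, i, p.Type)
		}
	}
	return nil
}

// isValid is a small convenience wrapper for callers that only need a bool.
func isValid(c content) bool { return validateContent(c) == nil }

func main() {
	hello := "hello"
	fmt.Println(isValid(content{Text: &hello}))
	fmt.Println(validateContent(content{}))
	fmt.Println(validateContent(content{Parts: []contentPart{{Type: partImage}}}))
}
```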

type ContextLeakPayload

type ContextLeakPayload struct {
	events.SafeSealed
	Identity   identity.Quadruple
	Model      string
	LeakSite   string
	SizeBytes  int64
	Threshold  int
	OccurredAt time.Time
}

ContextLeakPayload is the typed payload for EventTypeContextLeak. SafePayload — the leak-site identifier (a short structural fingerprint like "Messages[2].Content.Text") is operator-visible debug data, not secret-shaped.

`SizeBytes` is the size of the offending payload; `Threshold` is the runtime's configured heavy-output threshold at the time of the emit, so an operator can correlate config-change-time drift.

type ContextWindowExceededPayload

type ContextWindowExceededPayload struct {
	events.SafeSealed
	Identity             identity.Quadruple
	Model                string
	EstimatedTokens      int
	ContextWindowTokens  int
	ContextWindowReserve float64
	OccurredAt           time.Time
}

ContextWindowExceededPayload is the typed payload for EventTypeContextWindowExceeded. SafePayload — token counts + configured cap are operator-visible.

type CorrectionsProfile

type CorrectionsProfile struct {
	// MessageOrdering controls how the request's chat-message slice
	// is reordered before reaching the driver. Default (zero value)
	// passes the slice through unchanged.
	MessageOrdering MessageOrderingPolicy
	// SchemaMode controls how the request's `ResponseFormat.JSONSchema`
	// bytes are mutated before reaching the driver. Default passes
	// the schema through unchanged.
	SchemaMode SchemaSanitizationMode
	// ReasoningEffortRouting controls whether `req.ReasoningEffort` is
	// translated to a provider-specific `Extra` key (thinking-class
	// models) or passed through as the top-level field (default).
	ReasoningEffortRouting ReasoningRouting
	// ResponseFormatShape controls the wire-shape translation of
	// `req.ResponseFormat`. Default emits the OpenAI envelope; other
	// values translate to per-provider envelopes (Anthropic tool-
	// schema, `json_only` for providers that reject `json_schema`).
	ResponseFormatShape ResponseFormatProfile
	// UsageBackfillEnabled, when true, makes the corrections layer
	// compute synthetic token counts (and, if `CostOverrides` is set,
	// synthetic costs) when the driver returns an all-zeros `Usage`.
	// Default false — the response surfaces zeros verbatim.
	UsageBackfillEnabled bool
}

CorrectionsProfile carries the per-model quirk flags the Phase 34 `internal/llm/corrections` layer dispatches on. The types live in the `llm` package so the corrections sub-package can consume them without an import cycle (logic lives in `internal/llm/corrections`).

Zero-valued struct means "no quirks declared for this model"; the corrections pass treats each field's zero value as the Harbor-default behaviour (no reorder, no schema mutation, OpenAI-style envelopes, usage backfill off).

Per RFC §6.5 + brief 03 §4: this is the operator-controlled surface for adapting Harbor's neutral `CompleteRequest` shape to per-provider expectations. The corrections layer is the ONLY consumer.

type Cost

type Cost struct {
	InputTokensCost     float64
	OutputTokensCost    float64
	ReasoningTokensCost float64
	TotalCost           float64
	Currency            string // "USD" canonical; reserved for future multi-currency
}

Cost is the provider-reported cost breakdown. Values are USD. Fields are zero when the provider doesn't report a category.

Governance (Phase 36a) subscribes to `llm.cost.recorded` events to drive per-identity accumulators; Phase 36a's payload re-stamps these fields.

type CostRecordedPayload

type CostRecordedPayload struct {
	events.SafeSealed
	Identity identity.Quadruple
	Model    string
	Cost     Cost
	Usage    Usage
	// ContextWindowTokens is the model's input-token window (from the
	// model profile), stamped so the Console can render context-used vs
	// window (%). Zero when the model has no profile / configured window.
	ContextWindowTokens int
	OccurredAt          time.Time
}

CostRecordedPayload is the typed payload for EventTypeCostRecorded. SafePayload — cost / token counts are operator-visible. Phase 36a subscribes for per-identity accumulator updates.

type CostTable

type CostTable struct {
	InputPer1M     float64
	OutputPer1M    float64
	ReasoningPer1M float64
	Currency       string // "USD" canonical
}

CostTable carries fallback per-1M-token rates. Used when the provider's response doesn't include cost. Phase 36a consumes.
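As an illustration of how per-1M rates turn into a `Cost`, the sketch below multiplies each token count by its rate and divides by one million. The mapping of `Usage` fields to rates (prompt to input, completion to output, reasoning to reasoning) is an assumption, and `fallbackCost` is a hypothetical helper, not the Phase 36a implementation.

```go
package main

import "fmt"

// Local copies of the documented shapes, trimmed to the fields used here.
type CostTable struct {
	InputPer1M     float64
	OutputPer1M    float64
	ReasoningPer1M float64
	Currency       string
}

type Usage struct {
	PromptTokens     int
	CompletionTokens int
	ReasoningTokens  int
}

type Cost struct {
	InputTokensCost     float64
	OutputTokensCost    float64
	ReasoningTokensCost float64
	TotalCost           float64
	Currency            string
}

// fallbackCost is a hypothetical helper that prices a Usage with the
// table's per-1M-token rates.
func fallbackCost(u Usage, t CostTable) Cost {
	const perMillion = 1_000_000.0
	c := Cost{
		InputTokensCost:     float64(u.PromptTokens) * t.InputPer1M / perMillion,
		OutputTokensCost:    float64(u.CompletionTokens) * t.OutputPer1M / perMillion,
		ReasoningTokensCost: float64(u.ReasoningTokens) * t.ReasoningPer1M / perMillion,
		Currency:            t.Currency,
	}
	c.TotalCost = c.InputTokensCost + c.OutputTokensCost + c.ReasoningTokensCost
	return c
}

func main() {
	u := Usage{PromptTokens: 2_000_000, CompletionTokens: 500_000}
	t := CostTable{InputPer1M: 3, OutputPer1M: 8, Currency: "USD"}
	fmt.Printf("%+v\n", fallbackCost(u, t))
}
```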

type CustomProviderSpec

type CustomProviderSpec struct {
	Name                 string
	BaseURL              string
	APIKeyEnvVar         string
	Models               []string
	BaseProviderType     string
	Timeout              time.Duration
	MaxRetries           int
	RetryBackoffInitial  time.Duration
	RetryBackoffMax      time.Duration
	Concurrency          int
	BufferSize           int
	RequestPathOverrides map[string]string
}

CustomProviderSpec is one operator-declared OpenAI-compatible provider (Phase 33a). The bifrost driver maps each entry to a `bfschemas.ProviderConfig` with `CustomProviderConfig.BaseProviderType = schemas.OpenAI`. Zero-valued network knobs fall through to `ConfigSnapshot.NetworkDefaults`, which itself falls through to bifrost's package-level defaults.

`APIKeyEnvVar` is the environment-variable NAME (no `env.` prefix); the driver resolves `os.Getenv(name)` at construction. Missing → `ErrMissingAPIKey` with the env var named.

`RequestPathOverrides` maps `bfschemas.RequestType` (string-coded at this layer to avoid the import) to a custom URL path; the bifrost driver translates the keys when wiring the config. Used for OpenAI-compatible endpoints that host e.g. `/chat/completions` at the root.
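The `APIKeyEnvVar` resolution described above might look roughly like the following. `resolveAPIKey` is a hypothetical helper and `ErrMissingAPIKey` is redeclared locally; the lookup function is passed in only so the sketch can be exercised without touching the real environment. An empty value is treated the same as an unset variable, which is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// ErrMissingAPIKey is a local stand-in for the package sentinel.
var ErrMissingAPIKey = errors.New("llm: missing API key")

// resolveAPIKey looks up the variable NAME held in APIKeyEnvVar and
// wraps ErrMissingAPIKey with that name when nothing is found.
func resolveAPIKey(envVar string, getenv func(string) string) (string, error) {
	key := getenv(envVar)
	if key == "" {
		return "", fmt.Errorf("%w: environment variable %q is unset or empty",
			ErrMissingAPIKey, envVar)
	}
	return key, nil
}

func main() {
	// EXAMPLE_PROVIDER_API_KEY is an illustrative name only.
	_, err := resolveAPIKey("EXAMPLE_PROVIDER_API_KEY", os.Getenv)
	if errors.Is(err, ErrMissingAPIKey) {
		fmt.Println("not configured:", err)
		return
	}
	fmt.Println("key resolved")
}
```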

type Deps

type Deps struct {
	Artifacts artifacts.ArtifactStore
	Bus       events.EventBus
}

Deps carries the runtime dependencies the LLM client subsystem consumes. Both are mandatory: construction fails loudly when either is nil.

  • `Artifacts` is the auto-materialize target (D-022). Inline `DataURL` content above the heavy-output threshold is rewritten as an `Artifact` whose bytes live in the store.
  • `Bus` is the canonical event bus. The safety pass publishes `llm.image.materialized` / `llm.context_leak` / `llm.context_window_exceeded`; the request-emit path (Phase 36a subscriber lands here) publishes `llm.cost.recorded`.

The package does NOT depend on `state.StateStore` — the LLM client is stateless across calls (D-025).

type Driver

type Driver interface {
	// Complete receives a `CompleteRequest` whose messages have
	// ALREADY passed the safety net (`enforceContextSafety`): no raw
	// heavy content survived, the token-budget guard fired or
	// passed, oversize `DataURL` content has been materialized to
	// `Artifact` form. The driver translates the request into its
	// provider's wire shape and returns the typed response.
	Complete(ctx context.Context, req CompleteRequest) (CompleteResponse, error)
	// Close mirrors `LLMClient.Close`. Idempotent; second call is a
	// no-op (returns nil).
	Close(ctx context.Context) error
}

Driver is the surface every concrete driver implements; it is exported, but only driver authors are meant to use it directly. It has the same shape as `LLMClient`. The difference is the contract: a `Driver` does not run the safety net itself and may assume it has already run. `Open` wraps a `Driver` in a `safetyClient` so the safety pass is mandatory by construction.

Driver authors implement this; callers consume `LLMClient`.
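The wrap-then-delegate arrangement can be sketched as a decorator. Everything below is illustrative: `checkedClient`, `rejectEmpty`, `echoDriver`, and `runSketch` are hypothetical names, the request and response types are trimmed to one field, and the empty-prompt check merely stands in for `enforceContextSafety`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Trimmed stand-ins for the package's request and response types.
type CompleteRequest struct{ Prompt string }
type CompleteResponse struct{ Text string }

type Driver interface {
	Complete(ctx context.Context, req CompleteRequest) (CompleteResponse, error)
	Close(ctx context.Context) error
}

var errUnsafe = errors.New("sketch: request failed the safety check")

// checkedClient runs a check before every delegation, so a driver
// reached through it never sees an unchecked request.
type checkedClient struct {
	inner Driver
	check func(CompleteRequest) error
}

func (c *checkedClient) Complete(ctx context.Context, req CompleteRequest) (CompleteResponse, error) {
	if err := c.check(req); err != nil {
		return CompleteResponse{}, err
	}
	return c.inner.Complete(ctx, req)
}

func (c *checkedClient) Close(ctx context.Context) error { return c.inner.Close(ctx) }

// rejectEmpty is a placeholder check, not the real safety pass.
func rejectEmpty(req CompleteRequest) error {
	if req.Prompt == "" {
		return errUnsafe
	}
	return nil
}

// echoDriver counts how many requests actually reached it.
type echoDriver struct{ calls int }

func (d *echoDriver) Complete(_ context.Context, req CompleteRequest) (CompleteResponse, error) {
	d.calls++
	return CompleteResponse{Text: "echo: " + req.Prompt}, nil
}

func (d *echoDriver) Close(context.Context) error { return nil }

// runSketch returns the response text, the driver call count, and the error.
func runSketch(prompt string) (string, int, error) {
	d := &echoDriver{}
	c := &checkedClient{inner: d, check: rejectEmpty}
	resp, err := c.Complete(context.Background(), CompleteRequest{Prompt: prompt})
	return resp.Text, d.calls, err
}

func main() {
	fmt.Println(runSketch("hi"))
	fmt.Println(runSketch(""))
}
```

In the second call the driver's call count stays at zero, which is the point of the pattern: a rejected request never reaches the driver.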

type Factory

type Factory func(cfg ConfigSnapshot, deps Deps) (Driver, error)

Factory builds a `Driver` from a `ConfigSnapshot` + `Deps`. Drivers expose one `Factory` each via `init()` → `Register`.
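A name-keyed registry of the kind implied by `init()` → `Register` could be sketched as follows. The `Register` signature, the `lookup` helper, and the duplicate-registration panic are all assumptions for illustration; the `Driver` interface is reduced to a single `Name` method so the example stays short. The empty-name fallback to `DefaultDriver` ("bifrost") follows the behaviour documented for `Open`.

```go
package main

import (
	"fmt"
	"sync"
)

// Trimmed stand-ins for the package types.
type ConfigSnapshot struct{ Driver string }
type Deps struct{}

// Driver is reduced to one method for this sketch.
type Driver interface{ Name() string }

type Factory func(cfg ConfigSnapshot, deps Deps) (Driver, error)

const DefaultDriver = "bifrost"

var (
	mu        sync.RWMutex
	factories = map[string]Factory{}
)

// Register has an assumed signature; the real one is not shown in the docs.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := factories[name]; dup {
		panic("llm sketch: driver registered twice: " + name)
	}
	factories[name] = f
}

// lookup resolves a factory by name, defaulting an empty name.
func lookup(name string) (Factory, error) {
	if name == "" {
		name = DefaultDriver
	}
	mu.RLock()
	defer mu.RUnlock()
	f, ok := factories[name]
	if !ok {
		return nil, fmt.Errorf("llm sketch: unknown driver %q", name)
	}
	return f, nil
}

type stubDriver struct{ name string }

func (s stubDriver) Name() string { return s.name }

// A driver package would register itself from its own init().
func init() {
	Register("bifrost", func(ConfigSnapshot, Deps) (Driver, error) {
		return stubDriver{name: "bifrost"}, nil
	})
}

func main() {
	f, err := lookup("")
	if err != nil {
		fmt.Println(err)
		return
	}
	d, _ := f(ConfigSnapshot{}, Deps{})
	fmt.Println(d.Name())
}
```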

type FilePart

type FilePart struct {
	URL      string
	DataURL  string
	Artifact *ArtifactStub
	MIME     string
	Filename string
}

FilePart is a multimodal file input. Same supply forms as `ImagePart`; `MIME` is the document MIME type. `Filename` is a hint shown to the model when the provider supports it.

type ImageMaterializedPayload

type ImageMaterializedPayload struct {
	events.SafeSealed
	Identity    identity.Quadruple
	Model       string
	ArtifactRef string
	MIME        string
	SizeBytes   int64
	OccurredAt  time.Time
}

ImageMaterializedPayload is the typed payload for EventTypeImageMaterialized. SafePayload — the artifact ref, MIME type, and size are operator-visible content metadata, not secrets.

type ImagePart

type ImagePart struct {
	URL      string
	DataURL  string
	Artifact *ArtifactStub
	MIME     string
	Detail   string
}

ImagePart is a multimodal image input.

Exactly one of `URL` / `DataURL` / `Artifact` is set. `URL` is a provider-fetchable remote URL. `DataURL` is an inline `data:image/...;base64,...` payload — above the heavy-output threshold the runtime materializes it to `Artifact`. `Artifact` is the canonical Harbor reference (D-022).

`MIME` is the image MIME type (`image/jpeg`, `image/png`, `image/webp`, ...). `Detail` is a provider hint (`low` / `high` / `auto`); empty string means "use provider default."
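The supply-form rule lends itself to a simple count. The sketch below is an assumption-laden illustration: `validateImagePart` and `errSupplyForm` are hypothetical, and `ArtifactStub` is given a single invented `ID` field because its real shape is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// ArtifactStub is a placeholder; the ID field is illustrative only.
type ArtifactStub struct{ ID string }

type ImagePart struct {
	URL      string
	DataURL  string
	Artifact *ArtifactStub
	MIME     string
	Detail   string
}

var errSupplyForm = errors.New("sketch: exactly one of URL, DataURL, Artifact must be set")

// validateImagePart counts the populated supply forms and accepts
// the part only when the count is exactly one.
func validateImagePart(p ImagePart) error {
	set := 0
	if p.URL != "" {
		set++
	}
	if p.DataURL != "" {
		set++
	}
	if p.Artifact != nil {
		set++
	}
	if set != 1 {
		return errSupplyForm
	}
	return nil
}

func main() {
	ok := ImagePart{URL: "https://example.com/cat.png", MIME: "image/png"}
	bad := ImagePart{URL: "https://example.com/cat.png", Artifact: &ArtifactStub{ID: "a1"}}
	fmt.Println(validateImagePart(ok))
	fmt.Println(validateImagePart(bad))
}
```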

type LLMClient

type LLMClient interface {
	Complete(ctx context.Context, req CompleteRequest) (CompleteResponse, error)
	// Close releases driver-held resources (HTTP connection pools,
	// background goroutines). Subsequent calls return ErrClientClosed.
	// Implementations MUST honour ctx during long teardowns.
	Close(ctx context.Context) error
}

LLMClient is the single contract callers depend on. It has ONE request method, `Complete`; `Close` exists only for teardown. Streaming is signalled via `req.Stream` + `req.OnContent` / `req.OnReasoning`; cancellation flows through `ctx`. The runtime owns prompt construction, tool semantics, parsing, and parallel dispatch (see RFC §6.4 + brief 07).

Implementations MUST be safe for N concurrent goroutines against a single shared instance (D-025).

func Open

func Open(_ context.Context, cfg ConfigSnapshot, deps Deps) (LLMClient, error)

Open returns the `LLMClient` built by the factory whose name matches `cfg.Driver` (defaults to `DefaultDriver` when empty).

Identity is mandatory at every method on the returned client; the safety pass enforces this. Deps are validated at construction: a nil `Artifacts` or a nil `Bus` returns a wrapped error immediately.

The returned client is a `*safetyClient` wrapping the registered driver: every `Complete` runs through `enforceContextSafety` BEFORE the driver sees the request. This is mandatory by construction — drivers cannot bypass it through the registry path.

type MessageOrderingPolicy

type MessageOrderingPolicy string

MessageOrderingPolicy enumerates the message-reordering modes the Phase 34 corrections layer supports. Operator-set in `ModelProfile.Corrections.MessageOrdering`.

const (
	// OrderingDefault passes the message slice through unchanged.
	OrderingDefault MessageOrderingPolicy = ""
	// OrderingSystemFirstStrict collapses all system-role messages
	// to the front of the slice and emits an alternating
	// user/assistant tail. Required by NIM and some OpenAI-compatible
	// proxies that reject mid-thread `system` messages (brief 03 §4).
	OrderingSystemFirstStrict MessageOrderingPolicy = "system_first_strict"
)
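As a partial illustration of `OrderingSystemFirstStrict`, the sketch below performs only the first half of the policy: a stable move of system-role messages to the front. It does not merge the system messages and does not enforce the alternating user/assistant tail, because the exact rules for those steps are not spelled out here. `Message` and `systemFirst` are hypothetical names.

```go
package main

import "fmt"

// Message is a trimmed stand-in for the package's chat message.
type Message struct {
	Role string
	Text string
}

// systemFirst returns a new slice with system messages first, keeping
// the original relative order within each group. The input is not mutated.
func systemFirst(msgs []Message) []Message {
	out := make([]Message, 0, len(msgs))
	for _, m := range msgs {
		if m.Role == "system" {
			out = append(out, m)
		}
	}
	for _, m := range msgs {
		if m.Role != "system" {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	in := []Message{
		{Role: "user", Text: "a"},
		{Role: "system", Text: "b"},
		{Role: "assistant", Text: "c"},
		{Role: "system", Text: "d"},
	}
	for _, m := range systemFirst(in) {
		fmt.Println(m.Role, m.Text)
	}
}
```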

type ModeDowngradedPayload

type ModeDowngradedPayload struct {
	events.SafeSealed
	Identity   identity.Quadruple
	Model      string
	FromMode   OutputMode
	ToMode     OutputMode
	From       ResponseFormatKind
	To         ResponseFormatKind
	Reason     string
	OccurredAt time.Time
}

ModeDowngradedPayload is the typed payload for EventTypeModeDowngraded. Phase 35 fills the From/To/Reason fields. `FromMode` / `ToMode` carry the Harbor-side `OutputMode` (Native / Tools / Prompted / text); `From` / `To` carry the resolved `ResponseFormatKind` for backward visibility.

type ModelProfile

type ModelProfile struct {
	// ContextWindowTokens is the model's hard input-token cap.
	// Required (> 0); the safety net's token-budget guard uses it.
	ContextWindowTokens int
	// TokenEstimator selects the estimator the safety net runs.
	// "" / "chars_div_4" — default chars/4 + role-overhead.
	// Phase 33+ may register tiktoken-equivalent estimators by name.
	TokenEstimator string
	// JSONSchemaMode — Phase 32-era placeholder; the config loader
	// normalises this string into `OutputMode` at snapshot time
	// (Phase 35). Direct callers SHOULD set `OutputMode`; this field
	// is read only when `OutputMode` is `OutputModeUnset`.
	JSONSchemaMode string
	// OutputMode (Phase 35) — Harbor-side structured-output strategy.
	// Drives the request-shaping in `internal/llm/output` and the
	// downgrade chain. See `OutputMode` constants for semantics.
	// Zero value (`OutputModeUnset`) falls back to the per-known-
	// provider default (see `corrections.DefaultOutputModeFor`).
	OutputMode OutputMode
	// DefaultMaxTokens — Phase 36b's identity-tier override target.
	DefaultMaxTokens *int
	// ReasoningEffort — request-level default applied by the
	// corrections layer (`corrections.Complete`) when the caller left
	// `CompleteRequest.ReasoningEffort` empty; an explicit per-call
	// value overrides it. Maps to the provider reasoning param (bifrost
	// `ChatReasoning.Effort`, or `Extra["reasoning_effort"]` under
	// `ReasoningRouteThinking`).
	ReasoningEffort ReasoningEffort
	// CostOverrides — per-1M-token rates when the provider doesn't
	// report cost (some OpenRouter routes don't). Phase 36a reads.
	CostOverrides *CostTable
	// Corrections — per-provider quirk flags consumed by the Phase 34
	// `internal/llm/corrections` layer. Zero-valued struct means
	// "no corrections needed for this model"; the corrections layer
	// runs a no-op pass for default-shaped profiles.
	Corrections CorrectionsProfile
	// MaxRetries (Phase 36) — caps the validator-driven corrective
	// re-asks performed by the retry wrapper. Zero (default) maps to
	// `DefaultMaxRetries` (1). A negative value is rejected at config
	// validation.
	MaxRetries int
}

ModelProfile carries per-model knobs. Keyed by canonical model name in `LLMConfig.ModelProfiles`. Phase 32 ships the shape + `ContextWindowTokens` + `TokenEstimator` consumers; Phase 33+ consume the rest.

type NetworkDefaults

type NetworkDefaults struct {
	Timeout             time.Duration
	MaxRetries          int
	RetryBackoffInitial time.Duration
	RetryBackoffMax     time.Duration
	Concurrency         int
	BufferSize          int
}

NetworkDefaults are the operator-tunable defaults bifrost applies to every provider (native + custom) when the per-provider override is absent (Phase 33a). Zero-valued fields fall through to bifrost's package-level defaults.

type OutputMode

type OutputMode string

OutputMode selects the request-shaping strategy for structured output (Phase 35; RFC §6.5). Three modes:

  • `OutputModeNative` — pass `FormatJSONSchema` through unchanged. The provider validates against the schema natively. Default for OpenAI / Anthropic / Google.

  • `OutputModeTools` — encode the schema as a *Harbor-side prompted* envelope where the LLM is asked to emit `{"name":"respond_with","arguments":{...}}` as plain output. The runtime parses that locally. Used as a fallback for providers without native `json_schema` support.

    IMPORTANT: this is NOT a passthrough to provider-native tool-calling APIs (`tools=` / `tool_choice=` / `function_call` / `tool_use`). Harbor's runtime owns tool dispatch (RFC §6.4 / brief 07); `OutputModeTools` is purely a prompted-output technique. The static guard in `scripts/smoke/phase-35.sh` enforces this boundary.

  • `OutputModePrompted` — coerce `FormatJSONObject` and inline the schema as a system-prompt instruction. The LLM-side parse is "produce a JSON object matching this schema." Default for NIM / custom OpenAI-compatible / deepseek-reasoner.

The downgrade chain runs `current → next` on `IsInvalidJSONSchemaError` failures, bounded at 3 total attempts (initial + 2 downgrades).

const (
	// OutputModeUnset is the zero value — operator did not declare the
	// mode. The downgrade wrapper applies the per-model-prefix default
	// (see `internal/llm/corrections.DefaultOutputModeFor`).
	OutputModeUnset OutputMode = ""
	// OutputModeNative — pass `FormatJSONSchema` through. Provider
	// enforces strict schema mode.
	OutputModeNative OutputMode = "native"
	// OutputModeTools — Harbor-side prompted envelope. NOT provider
	// tool-calling APIs.
	OutputModeTools OutputMode = "tools"
	// OutputModePrompted — `FormatJSONObject` + schema in system prompt.
	OutputModePrompted OutputMode = "prompted"
)
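The bounded downgrade loop can be sketched as below. Two things are assumed rather than documented: the chain order is taken to be native, then tools, then prompted (the order in which the modes are listed), and a local sentinel `errInvalidJSONSchema` replaces the real `IsInvalidJSONSchemaError` predicate. `nextMode` and `completeWithDowngrade` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

type OutputMode string

const (
	OutputModeNative   OutputMode = "native"
	OutputModeTools    OutputMode = "tools"
	OutputModePrompted OutputMode = "prompted"
)

// errInvalidJSONSchema stands in for the provider's schema rejection.
var errInvalidJSONSchema = errors.New("sketch: invalid_json_schema")

// maxAttempts is the documented bound: initial attempt plus two downgrades.
const maxAttempts = 3

// nextMode encodes the ASSUMED chain order native -> tools -> prompted.
func nextMode(m OutputMode) (OutputMode, bool) {
	switch m {
	case OutputModeNative:
		return OutputModeTools, true
	case OutputModeTools:
		return OutputModePrompted, true
	}
	return "", false
}

// completeWithDowngrade retries call with the next mode only on schema
// rejections, and reports the last mode tried and the attempt count.
func completeWithDowngrade(start OutputMode, call func(OutputMode) error) (OutputMode, int, error) {
	mode := start
	for attempt := 1; ; attempt++ {
		err := call(mode)
		if err == nil || !errors.Is(err, errInvalidJSONSchema) || attempt == maxAttempts {
			return mode, attempt, err
		}
		next, ok := nextMode(mode)
		if !ok {
			return mode, attempt, err
		}
		mode = next
	}
}

func main() {
	mode, attempts, err := completeWithDowngrade(OutputModeNative, func(m OutputMode) error {
		if m != OutputModePrompted {
			return errInvalidJSONSchema
		}
		return nil
	})
	fmt.Println(mode, attempts, err)
}
```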

type PartType

type PartType string

PartType discriminates a `ContentPart`.

const (
	PartText  PartType = "text"
	PartImage PartType = "image"
	PartAudio PartType = "audio"
	PartFile  PartType = "file"
)

The PartType values, one per multimodal content shape.

type PostureProvider

type PostureProvider struct {
	// contains filtered or unexported fields
}

PostureProvider is the Phase 72g read-only accessor over the runtime's bound LLM configuration. Built once per Runtime process via NewPostureProvider; `Posture` is safe for concurrent use by N goroutines (D-025).

func NewPostureProvider

func NewPostureProvider(cfg ConfigSnapshot) *PostureProvider

NewPostureProvider builds a PostureProvider over the LLM ConfigSnapshot the binary resolved at boot. The provider / model / region are read from the snapshot and frozen at construction; the `MockMode` flag is NOT taken from the snapshot — it is read live (but race-free) from the boot-captured atomic, so the posture surface reflects D-089's single capture-path source.

When the snapshot's `Driver` field is empty it is normalised to `DefaultDriver` ("bifrost") — the same default `Open` applies — so the posture surface never reports an empty provider for a default-driver boot.

func (*PostureProvider) Posture

Posture returns the read-only PostureSnapshot of the runtime's bound LLM provider for the caller. The `ctx` is accepted for signature symmetry with `governance.PostureProvider.Posture` and so a future per-tenant LLM-routing model can scope the read; V1 ships a single provider per Harbor instance (RFC §6.15 + D-088), so the snapshot is identity-independent at this layer — the Protocol handler is the identity-mandatory gate.

`MockMode` is read from the boot-captured atomic (D-089) — NOT from an `os.Getenv` re-read.
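The capture-once, read-many arrangement for `MockMode` might be sketched with `sync/atomic`. The helper names `captureMockMode` and `mockMode` are hypothetical (the real capture entry point is `RegisterMockModeCaptured`, whose signature is not shown here), and the lookup function is injected only so the sketch can be tested without changing the process environment.

```go
package main

import (
	"fmt"
	"os"
	"sync/atomic"
)

// mockModeCaptured holds the value observed at boot.
var mockModeCaptured atomic.Bool

// captureMockMode reads the environment ONCE and stores the result.
func captureMockMode(getenv func(string) string) {
	mockModeCaptured.Store(getenv("HARBOR_DEV_ALLOW_MOCK") == "1")
}

// mockMode is safe for concurrent readers and never re-reads the
// environment, so later changes to the variable are not observed.
func mockMode() bool { return mockModeCaptured.Load() }

func main() {
	captureMockMode(os.Getenv)
	fmt.Println("mock mode:", mockMode())
}
```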

type PostureReadAdminPayload

type PostureReadAdminPayload struct {
	events.SafeSealed
	// Actor is the identity of the admin-scoped caller that performed
	// the cross-tenant read.
	Actor identity.Quadruple
	// RequestedTenant is the tenant_id the caller asked to read — a
	// tenant other than the caller's own.
	RequestedTenant string
}

PostureReadAdminPayload is the typed payload for EventTypePostureReadAdmin (Phase 72g). SafePayload — the actor's identity and the requested tenant are operator-visible audit metadata, not secret-shaped. NEVER carries provider API keys — the posture surface reports provider/model/region only. The payload runs through the audit Redactor before the bus publish (CLAUDE.md §7).

type PostureSnapshot

type PostureSnapshot struct {
	// Provider is the LLM provider name (e.g. "bifrost", "mock").
	Provider string
	// Model is the bound model identifier (e.g. "openai/gpt-5.3-chat").
	Model string
	// Region is the provider endpoint region; "" when not applicable.
	Region string
	// MockMode is true iff the runtime booted with HARBOR_DEV_ALLOW_MOCK=1
	// (D-089). Captured at boot via RegisterMockModeCaptured.
	MockMode bool
}

PostureSnapshot is the read-only view of the runtime's bound LLM provider. It is the source the `llm.posture` Protocol handler projects onto the `LLMPostureResponse` wire type.

type ReasoningEffort

type ReasoningEffort string

ReasoningEffort hints at provider-side thinking budget. Empty string means "use provider default" (DO NOT touch the request).

const (
	ReasoningOff    ReasoningEffort = "off"
	ReasoningLow    ReasoningEffort = "low"
	ReasoningMedium ReasoningEffort = "medium"
	ReasoningHigh   ReasoningEffort = "high"
)

The ReasoningEffort levels, ascending. The empty string (not listed here) means "use the provider default".

type ReasoningRouting

type ReasoningRouting string

ReasoningRouting enumerates the `ReasoningEffort` routing modes the Phase 34 corrections layer supports. Operator-set in `ModelProfile.Corrections.ReasoningEffortRouting`.

const (
	// ReasoningRouteDefault passes the top-level
	// `req.ReasoningEffort` through to the driver unchanged.
	// Bifrost's `ChatReasoning.Effort` field consumes it.
	ReasoningRouteDefault ReasoningRouting = ""
	// ReasoningRouteThinking moves the effort hint from the
	// top-level field into `req.Extra["reasoning_effort"]`.
	// Thinking-class models (`o1`, `o3`, `deepseek-reasoner`)
	// interpret the hint via a provider-specific path that bifrost
	// passes through opaquely. The top-level field is cleared so the
	// regular reasoning channel is not used.
	ReasoningRouteThinking ReasoningRouting = "thinking_model"
)
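The two routing modes can be illustrated with a small request rewrite. This is a sketch under assumptions: `Extra` is modelled as `map[string]any`, the request type is trimmed to two fields, and `routeReasoning` is a hypothetical helper rather than the corrections layer's actual function.

```go
package main

import "fmt"

type ReasoningEffort string
type ReasoningRouting string

const (
	ReasoningRouteDefault  ReasoningRouting = ""
	ReasoningRouteThinking ReasoningRouting = "thinking_model"
)

// CompleteRequest is trimmed; the type of Extra is an assumption.
type CompleteRequest struct {
	ReasoningEffort ReasoningEffort
	Extra           map[string]any
}

// routeReasoning returns a copy of req. Under the thinking route it
// moves the effort hint into Extra and clears the top-level field;
// otherwise it returns req unchanged.
func routeReasoning(req CompleteRequest, routing ReasoningRouting) CompleteRequest {
	if routing != ReasoningRouteThinking || req.ReasoningEffort == "" {
		return req
	}
	// Copy the map so the caller's request is not mutated.
	extra := make(map[string]any, len(req.Extra)+1)
	for k, v := range req.Extra {
		extra[k] = v
	}
	extra["reasoning_effort"] = string(req.ReasoningEffort)
	req.Extra = extra
	req.ReasoningEffort = ""
	return req
}

func main() {
	in := CompleteRequest{ReasoningEffort: "high"}
	out := routeReasoning(in, ReasoningRouteThinking)
	fmt.Printf("%q %v\n", out.ReasoningEffort, out.Extra)
}
```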

type ResponseFormat

type ResponseFormat struct {
	Kind       ResponseFormatKind
	JSONSchema json.RawMessage
}

ResponseFormat is the optional structured-output hint on `CompleteRequest`. `nil` means "plain text" (equivalent to `Kind: FormatText`).

Phase 35 owns the per-provider downgrade chain `json_schema → json_object → text` on `invalid_json_schema` errors; Phase 32 stores the field and the safety-net pass treats the JSON schema bytes as opaque metadata (no token-estimate contribution).

type ResponseFormatKind

type ResponseFormatKind string

ResponseFormatKind discriminates a `ResponseFormat`.

const (
	// FormatText — no structured-output constraint. Default when
	// `CompleteRequest.ResponseFormat` is nil.
	FormatText ResponseFormatKind = "text"
	// FormatJSONObject — provider's "JSON mode" (free-form JSON).
	FormatJSONObject ResponseFormatKind = "json_object"
	// FormatJSONSchema — caller-supplied JSON Schema (strict mode
	// when the provider exposes it).
	FormatJSONSchema ResponseFormatKind = "json_schema"
)

type ResponseFormatProfile

type ResponseFormatProfile string

ResponseFormatProfile enumerates the `response_format` envelope shapes the Phase 34 corrections layer can emit. Operator-set in `ModelProfile.Corrections.ResponseFormatShape`.

const (
	// ResponseFormatOpenAI emits the OpenAI envelope —
	// `{"type":"json_object"}` for `FormatJSONObject` and
	// `{"type":"json_schema","json_schema":{...}}` for
	// `FormatJSONSchema`. This is the default; bifrost's
	// `translateResponseFormat` already produces this shape, so a
	// `default`-profile model is a no-op in the corrections layer.
	ResponseFormatOpenAI ResponseFormatProfile = ""
	// ResponseFormatJSONOnly downgrades `FormatJSONSchema` to
	// `FormatJSONObject`. Used for providers that don't support
	// `json_schema` natively (e.g. some OpenRouter routes); the
	// schema is preserved as `Extra["schema_hint"]` so a prompted
	// fallback can reference it.
	ResponseFormatJSONOnly ResponseFormatProfile = "json_only"
	// ResponseFormatAnthropic packages the schema into Anthropic's
	// tool-schema-style envelope, surfaced in
	// `req.Extra["anthropic_tool_schema"]`. Phase 33's bifrost
	// driver passes `Extra` opaquely; the Anthropic provider
	// converter consumes the key (or future Phase 35 logic does).
	ResponseFormatAnthropic ResponseFormatProfile = "anthropic"
)

type RetryWithFeedbackPayload

type RetryWithFeedbackPayload struct {
	events.SafeSealed
	Identity   identity.Quadruple
	Model      string
	Attempt    int
	MaxRetries int
	Reason     string
	OccurredAt time.Time
}

RetryWithFeedbackPayload (Phase 36) is the typed payload for EventTypeRetryWithFeedback. SafePayload — `Attempt` is the 1-based retry index (1 = first re-ask after the original); `Reason` is the validator's truncated `Error()` string. The wrapper truncates Reason at 256 characters to keep audit payloads bounded.
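A bounded `Reason` could be produced along these lines. Whether the real wrapper counts bytes or Unicode code points is not stated; this sketch assumes code points so that a multi-byte character is never split. `truncateReason` and `maxReasonChars` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// maxReasonChars mirrors the documented 256-character bound.
const maxReasonChars = 256

// truncateReason keeps at most max code points of s.
func truncateReason(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max])
}

func main() {
	long := strings.Repeat("x", 300)
	fmt.Println(len(truncateReason(long, maxReasonChars)))
}
```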

type Role

type Role string

Role is the chat-message role. Settled at the four canonical values; `RoleTool` is the in-Harbor convention for the user-role rendering of tool observations (brief 07 §5 — the rendering itself happens at `ObservationRenderer`, not here; this constant exists so callers that construct an explicit user-message describing a tool result can label it for clarity).

const (
	RoleSystem    Role = "system"
	RoleUser      Role = "user"
	RoleAssistant Role = "assistant"
	// RoleTool — semantically a user-role observation; reserved so
	// downstream tooling (Console traces, audit logs) can distinguish.
	RoleTool Role = "tool"
)

The Role values for a chat message.

type SchemaSanitizationMode

type SchemaSanitizationMode string

SchemaSanitizationMode enumerates the JSON-Schema-mutation modes the Phase 34 `SchemaSanitizer` supports. Operator-set in `ModelProfile.Corrections.SchemaMode`.

const (
	// SchemaDefault passes the operator-supplied schema through
	// unchanged.
	SchemaDefault SchemaSanitizationMode = ""
	// SchemaOpenAIStrict adds `additionalProperties:false` and
	// `strict:true` at every nested object schema. OpenAI's
	// structured-output mode requires both fields; most schemas
	// produced by `tools.RegisterFunc[I, O]` omit them.
	SchemaOpenAIStrict SchemaSanitizationMode = "openai_strict"
	// SchemaPermissive strips `additionalProperties` and `strict`
	// fields wherever they appear. Some providers reject those keys.
	SchemaPermissive SchemaSanitizationMode = "permissive"
)
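What `SchemaOpenAIStrict` describes can be approximated with a recursive walk over the decoded schema. This is a simplified sketch, not the Phase 34 `SchemaSanitizer`: it visits every nested JSON value, so an `enum` or `default` value that happens to look like an object schema would also be modified. `strictify` and `sanitizeOpenAIStrict` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// strictify sets additionalProperties:false and strict:true on every
// map whose "type" is "object", then recurses into nested values.
func strictify(node any) {
	switch n := node.(type) {
	case map[string]any:
		if n["type"] == "object" {
			n["additionalProperties"] = false
			n["strict"] = true
		}
		for _, v := range n {
			strictify(v)
		}
	case []any:
		for _, v := range n {
			strictify(v)
		}
	}
}

// sanitizeOpenAIStrict decodes the schema, rewrites it, and re-encodes
// it. encoding/json sorts map keys, so the output is deterministic.
func sanitizeOpenAIStrict(schema json.RawMessage) (json.RawMessage, error) {
	var root any
	if err := json.Unmarshal(schema, &root); err != nil {
		return nil, err
	}
	strictify(root)
	out, err := json.Marshal(root)
	if err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	in := json.RawMessage(`{"type":"object","properties":{"addr":{"type":"object"}}}`)
	out, err := sanitizeOpenAIStrict(in)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(out))
}
```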

type StubFetch

type StubFetch struct {
	Tool string `json:"tool"`
	ID   string `json:"id"`
}

StubFetch is the optional pointer-to-tool hint on an `ArtifactStub`. When set, an LLM that wants the bytes knows which Harbor tool to call (and with which artifact ID).

type ToolCallStructured added in v1.2.0

type ToolCallStructured struct {
	ID    string
	Name  string
	Args  json.RawMessage
	Index uint16
}

ToolCallStructured is a provider-validated tool-call entry (Phase 107c / D-167). Carries the provider-assigned call ID (round-trips on `ChatMessage.ToolCallID` when the result is threaded back into the next turn), the tool name (matches `tools.Tool.Name`), and provider-validated JSON args.

`Index` is the per-response position of this tool call (0-based) and is the load-bearing discriminator for streaming-delta assembly: per the OpenAI streaming spec, tool-call args arrive across multiple SSE chunks. The first delta carries `ID + Name`; subsequent deltas for the SAME tool call carry empty ID + null Name and an args FRAGMENT to be concatenated onto the prior args. The drivers key on Index to merge fragments correctly; without it, providers like Amazon Bedrock (which streams args one short fragment at a time) produce a trajectory full of half-built ToolCalls. Defaults to 0 for non-streaming responses + tests; the driver layer is the source of truth.
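The Index-keyed assembly described above can be sketched as follows. The `delta` type and the `assemble` function are hypothetical; a real driver would receive provider-specific chunk types. The sketch assumes that a non-empty `ID` or `Name` on any delta is authoritative and that fragments arrive in order for each index.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type ToolCallStructured struct {
	ID    string
	Name  string
	Args  json.RawMessage
	Index uint16
}

// delta is an illustrative stand-in for one streamed tool-call chunk.
type delta struct {
	Index        uint16
	ID           string
	Name         string
	ArgsFragment string
}

// assemble merges deltas that share an Index into one tool call,
// concatenating argument fragments in arrival order. Results are
// returned in order of first appearance.
func assemble(deltas []delta) []ToolCallStructured {
	byIndex := map[uint16]*ToolCallStructured{}
	var order []uint16
	for _, d := range deltas {
		tc, ok := byIndex[d.Index]
		if !ok {
			tc = &ToolCallStructured{Index: d.Index}
			byIndex[d.Index] = tc
			order = append(order, d.Index)
		}
		if d.ID != "" {
			tc.ID = d.ID
		}
		if d.Name != "" {
			tc.Name = d.Name
		}
		tc.Args = append(tc.Args, []byte(d.ArgsFragment)...)
	}
	out := make([]ToolCallStructured, 0, len(order))
	for _, i := range order {
		out = append(out, *byIndex[i])
	}
	return out
}

func main() {
	calls := assemble([]delta{
		{Index: 0, ID: "call_a", Name: "weather", ArgsFragment: `{"city":`},
		{Index: 1, ID: "call_b", Name: "time", ArgsFragment: `{"tz":`},
		{Index: 0, ArgsFragment: `"Oslo"}`},
		{Index: 1, ArgsFragment: `"UTC"}`},
	})
	for _, c := range calls {
		fmt.Println(c.Index, c.ID, c.Name, string(c.Args))
	}
}
```

Keying on `Index` rather than on `ID` matters here because the later fragments carry no ID at all, so the position is the only stable handle for interleaved calls.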

type ToolDeclaration added in v1.2.0

type ToolDeclaration struct {
	Name        string
	Description string
	Schema      json.RawMessage
}

ToolDeclaration is the per-turn tool declarator the LLM sees (Phase 107c / D-167). Carries the tool name, operator-facing description, and the args JSON Schema.

type Usage

type Usage struct {
	PromptTokens     int
	CompletionTokens int
	ReasoningTokens  int
	TotalTokens      int
	LatencyMS        int64
	// ProviderExtras — opaque provider-specific bag (e.g. cache
	// hit/miss). Phase 32 does not interpret these fields; Phase 34+
	// may read them for correction-layer decisions.
	ProviderExtras map[string]string
}

Usage is the provider-reported token usage.

Directories

Path Synopsis
Package corrections is Harbor's provider correction layer (Phase 34 — RFC §6.5).
drivers
bifrost
Package bifrost is Harbor's bifrost-backed LLM driver.
Package mock is Harbor's test-grade LLM driver.
Package output is Harbor's structured-output strategy + downgrade chain (Phase 35 — RFC §6.5).
Package retry is Harbor's retry-with-feedback wrapper (Phase 36 — RFC §6.5).
Package summarizer is Harbor's production LLM-backed `memory.Summarizer` — the §13 "test stubs as production defaults" amendment closure for the memory subsystem's `rolling_summary` strategy.
