llm

package

v1.3.1 (latest) Published: Jun 11, 2026 License: Apache-2.0 Imports: 17 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/hurtener/Harbor

Documentation ¶

Overview ¶

Package llm defines Harbor's LLM-client interface and the runtime-wide invariants that guard every `Complete` call.

The interface is **one method**, `Complete(ctx, req) (resp, error)` (RFC §6.5). Tool dispatch is the runtime's job (RFC §6.4 + brief 07 "code-level tool calling"); the LLM client is reduced to a JSON-producing chat-completion adapter. Provider-native tool-calling shapes (the `tools=` request parameter, the `tool_choice=` mode selector, OpenAI's `function_call`, Anthropic's `tool_use` blocks, Gemini's function-calling protocol, etc.) never appear in this package; the static guard in `scripts/smoke/phase-32.sh` enforces the boundary by grepping for the canonical symbol names.

The message envelope is provider-agnostic: `ChatMessage.Content` is a sum-type that carries either `Text *string` (the common case) or `Parts []ContentPart` for multimodal input (D-021). Multimodal parts (`ImagePart`, `AudioPart`, `FilePart`) each carry one of three supply forms — `URL`, `DataURL`, or `Artifact` — and the runtime auto-materializes inline `DataURL` content above the heavy-output threshold into `ArtifactRef`s before persistence and emit (D-022).

**Context-window safety net (D-026).** Every `Complete` call routes through a catch-all pass at the LLM-client edge that (a) auto-materializes oversize `DataURL` content, (b) asserts that no raw heavy content survived ANY producer's normalization step (else `ErrContextLeak`), and (c) estimates token usage against the configured `ModelProfile.ContextWindowTokens` cap and fails with `ErrContextWindowExceeded` when the estimate is within `ContextWindowReserve` of the cap. **V1 fails loudly**; auto-cascading recovery is post-V1 work.

The safety pass is **mandatory by construction**: `Open` returns a wrapped client (`safetyClient`) that runs the pass before delegating to the underlying `Driver`. Drivers cannot bypass the pass through the registry; a hand-constructed `Driver` would likewise have to compose `enforceContextSafety` to maintain the runtime invariant.

Concurrent-reuse contract (D-025): one `LLMClient` is safe to share across N concurrent goroutines. Mutable state on the client (or the `Driver`) is forbidden; per-call state lives in `ctx` and the request value. The package-level `concurrent_test.go` pins this with N=128 invocations under `-race`.

Index ¶

Constants
Variables
func HasIdentity(ctx context.Context) bool
func IsInvalidJSONSchemaError(err error) bool
func NewChunkPublisher(bus events.EventBus, q identity.Quadruple, taskID string, logger *slog.Logger) func(delta string, done bool, kind string)
func NewChunkPublisherContext(baseCtx context.Context, bus events.EventBus, q identity.Quadruple, ...) func(delta string, done bool, kind string)
func Register(name string, factory Factory)
func RegisterCorrectionsWrapper(fn func(LLMClient, ConfigSnapshot) LLMClient)
func RegisterDefaultOutputModeResolver(fn func(model string) OutputMode)
func RegisterDowngradeWrapper(fn func(LLMClient, ConfigSnapshot, Deps) LLMClient)
func RegisterGovernanceWrapper(fn func(LLMClient, ConfigSnapshot, Deps) LLMClient)
func RegisterMockModeCaptured(v bool)
func RegisterRetryWrapper(fn func(LLMClient, ConfigSnapshot, Deps) LLMClient)
func RegisteredDrivers() []string
type ArtifactStub
- func (s ArtifactStub) MarshalJSON() ([]byte, error)
type AudioPart
type ChatMessage
type CompleteRequest
type CompleteResponse
type CompletionChunkPayload
type ConfigSnapshot
- func SnapshotFromConfig(cfg config.LLMConfig, art config.ArtifactsConfig) ConfigSnapshot
type Content
type ContentPart
type ContextLeakPayload
type ContextWindowExceededPayload
type CorrectionsProfile
type Cost
type CostRecordedPayload
type CostTable
type CustomProviderSpec
type Deps
type Driver
type Factory
type FilePart
type ImageMaterializedPayload
type ImagePart
type LLMClient
- func Open(_ context.Context, cfg ConfigSnapshot, deps Deps) (LLMClient, error)
type MessageOrderingPolicy
type ModeDowngradedPayload
type ModelProfile
type NetworkDefaults
type OutputMode
type PartType
type PostureProvider
- func NewPostureProvider(cfg ConfigSnapshot) *PostureProvider
- func (p *PostureProvider) Posture(_ context.Context) (PostureSnapshot, error)
type PostureReadAdminPayload
type PostureSnapshot
type ReasoningEffort
type ReasoningRouting
type ResponseFormat
type ResponseFormatKind
type ResponseFormatProfile
type RetryWithFeedbackPayload
type Role
type SchemaSanitizationMode
type StubFetch
type ToolCallStructured
type ToolDeclaration
type Usage

Constants ¶

const (
	// EventTypeImageMaterialized — emitted when the safety-pass's
	// auto-materialize step rewrites an inline DataURL ≥ heavy-output
	// threshold to an ArtifactRef (D-022). Carries the source
	// CompleteRequest's model name + the new ref's id + size.
	EventTypeImageMaterialized events.EventType = "llm.image.materialized"
	// EventTypeContextLeak — emitted when the safety-pass detects
	// raw heavy content that survived every upstream producer's
	// normalization step (D-026 violation). The bus event lets
	// operators trace the offending producer.
	EventTypeContextLeak events.EventType = "llm.context_leak"
	// EventTypeContextWindowExceeded — emitted when the safety-pass
	// token-budget guard fires (D-026). Payload carries the
	// estimated token count + the model's cap + the reserve
	// fraction so operators can quantify how often planner-side
	// recovery (truncate / summarize) needs to engage.
	EventTypeContextWindowExceeded events.EventType = "llm.context_window_exceeded"
	// EventTypeCostRecorded — emitted by the runtime AFTER a
	// successful Complete. Phase 36a (governance accumulator)
	// subscribes; Phase 32 registers the type + ships the payload
	// shape so Phase 36a's emit site lands clean.
	EventTypeCostRecorded events.EventType = "llm.cost.recorded"
	// EventTypeModeDowngraded — emitted by Phase 35's structured-
	// output downgrade chain (`json_schema → json_object → text`).
	// Phase 32 registers the type as a forward-compat seam; no
	// downgrade logic ships in Phase 32.
	EventTypeModeDowngraded events.EventType = "llm.mode_downgraded"
	// EventTypeRetryWithFeedback (Phase 36) — emitted by the retry
	// wrapper per corrective re-ask. Carries the attempt index and a
	// truncated `Reason` derived from the validator's error.
	EventTypeRetryWithFeedback events.EventType = "llm.retry_with_feedback"
	// EventTypePostureReadAdmin — Phase 72g (D-112). Emitted when an
	// admin-scoped caller reads ANOTHER tenant's LLM posture via the
	// `llm.posture` Protocol method. An own-tenant read does NOT emit.
	// The cross-tenant read is a privileged action and lands on the
	// audit trail per CLAUDE.md §7 + RFC §6.15.
	EventTypePostureReadAdmin events.EventType = "llm.posture_read_admin"
	// EventTypeCompletionChunk — Phase 107 streaming completion event.
	// Emitted per token delta from the LLM provider under the originating
	// run's identity quadruple. The `Done=true` chunk fires exactly once
	// per stream (terminator marker). SafePayload — deltas are per-session
	// operator-visible content.
	EventTypeCompletionChunk events.EventType = "llm.completion.chunk"
)

Phase 32 LLM-edge event types. Registered via init() so the canonical events registry stays the single source of truth (see internal/events/events.go and AGENTS.md §17.6's "wiring gap" lesson — register at declaration time, publish at use time).

All payloads are SafePayload (compose events.SafeSealed): they carry no secret-shaped data. Identity is the Harbor quadruple; content payloads (artifact refs, MIME types, byte counts, model names) are operator-visible by design.

const (
	DefaultContextWindowReserve = 0.05 // 5%
	// DefaultHeavyOutputThreshold (32 KiB; D-022 / RFC §6.10) is
	// single-sourced on `config.DefaultHeavyOutputThresholdBytes` so
	// the snapshot default cannot drift from the operator-config
	// default (the DefaultSpawnDepthCap precedent).
	DefaultHeavyOutputThreshold = config.DefaultHeavyOutputThresholdBytes
	// DefaultMaxRetries (Phase 36) — the retry-with-feedback bound
	// when `ModelProfile.MaxRetries` is zero. Conservative: one
	// corrective re-ask after the original attempt.
	DefaultMaxRetries = 1
)

Defaults applied when the snapshot's corresponding field is zero. Kept here (not in `validate.go`) so an operator who constructs a snapshot programmatically still gets reasonable behaviour without every test wiring also touching the config layer.
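A minimal sketch of that zero-value defaulting, assuming it is a plain "fill when zero" step. The `snapshot` type mirrors two `ConfigSnapshot` fields for illustration, and `withDefaults` is a hypothetical helper name; the 32 KiB figure is taken from the `DefaultHeavyOutputThreshold` comment.

```go
package main

import "fmt"

// Local stand-ins for the package constants.
const (
	defaultContextWindowReserve = 0.05      // 5%
	defaultHeavyOutputThreshold = 32 * 1024 // 32 KiB
)

// snapshot mirrors two ConfigSnapshot fields for illustration only.
type snapshot struct {
	ContextWindowReserve float64
	HeavyOutputThreshold int
}

// withDefaults is a hypothetical helper that fills zero-valued fields and
// leaves explicitly set fields alone.
func withDefaults(s snapshot) snapshot {
	if s.ContextWindowReserve == 0 {
		s.ContextWindowReserve = defaultContextWindowReserve
	}
	if s.HeavyOutputThreshold == 0 {
		s.HeavyOutputThreshold = defaultHeavyOutputThreshold
	}
	return s
}

func main() {
	fmt.Printf("%+v\n", withDefaults(snapshot{}))
	fmt.Printf("%+v\n", withDefaults(snapshot{ContextWindowReserve: 0.10}))
}
```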

const DefaultDriver = "bifrost"

DefaultDriver names the production LLM driver. Phase 64 (D-089) flipped this constant to `"bifrost"`, the pure-Go LLM gateway shipped by Phase 33. Before Phase 64 it was `"mock"`; the flip closes the §13 "test stubs as production defaults" amendment for the LLM seam.

Operators in production set `llm.driver` explicitly to `"bifrost"` (the same value the config defaults to). The `mock` driver still self-registers via init() — its package init runs when an importer (a test that builds a deterministic LLM stack) blank-imports it — but the production `cmd/harbor` binary never imports the mock package, so a config that lists `driver: mock` in a production build hits `ErrUnknownDriver: "mock" (registered: bifrost)` rather than silently routing through a stub.

Dev-only escape hatch (D-089): the `harbor dev` subcommand reads `HARBOR_DEV_ALLOW_MOCK=1` and, when set, blank-imports the mock driver itself (the conditional blank-import lives at the subcommand boundary, not in `main.go`) AND prints a stderr banner `[DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION]` on every boot. Outside that one explicit dev path, the mock is unreachable.

const WarnWrapperNotSeated = "llm: wrapper hook not seated — composing client WITHOUT this production layer"

WarnWrapperNotSeated is the message Open emits (via slog.Warn) once per production wrapper layer whose hook was never seated by its blank import. Exported so tests that gate on the warning's PRESENCE or ABSENCE (e.g. the Phase 110c aggregator integration test) reference the one source of truth — a message rewording breaks the gate loudly instead of silently turning it into a tautological pass.

Variables ¶

var (
	// ErrUnknownDriver — Open was asked for a driver name no
	// registered factory handles. The error's message names the
	// registered drivers so misconfigurations are obvious (§4.4).
	ErrUnknownDriver = errors.New("llm: unknown driver")
	// ErrClientClosed — Complete called after Close. The wrapped
	// driver returns this; the safetyClient propagates it verbatim.
	ErrClientClosed = errors.New("llm: client is closed")
	// ErrIdentityMissing — Complete called with a ctx that does not
	// carry an `identity.Identity` (or `identity.Quadruple`).
	// AGENTS.md §6 rule 9 — identity is mandatory at every Harbor
	// boundary; the runtime fails closed.
	ErrIdentityMissing = errors.New("llm: identity missing from ctx")
	// ErrInvalidContent — a `ChatMessage.Content` is malformed: both
	// `Text` and `Parts` set, or neither, or a `ContentPart` whose
	// `Type` discriminator doesn't match its payload (e.g. Type=image
	// with `Image == nil`). The safety pass rejects loudly rather than
	// papering over the inconsistency.
	ErrInvalidContent = errors.New("llm: invalid message content")
	// ErrContextLeak — runtime-wide invariant violation (D-026). A
	// raw byte / string / DataURL ≥ heavy-output threshold survived
	// every producer's normalization step and reached the LLM-client
	// edge. The safety pass fails the request; the bus emits
	// `llm.context_leak` so operators can find the offending
	// producer.
	ErrContextLeak = errors.New("llm: raw heavy content reached LLM-client edge — D-026 violation")
	// ErrContextWindowExceeded — the token-budget guard fired (D-026).
	// The assembled `CompleteRequest`'s estimated token count is
	// within `ContextWindowReserve` of the model's configured
	// `ContextWindowTokens` cap. V1 fails loudly; auto-cascade is
	// post-V1 work — the planner is responsible for recovery (drop
	// older turns, summarize, etc.).
	ErrContextWindowExceeded = errors.New("llm: estimated tokens within reserve of model context window")
	// ErrInvalidConfig — `Open` called with a `ConfigSnapshot` that
	// fails structural validation (driver name empty, model profile
	// missing for the request's model, etc.). Distinct from
	// ErrUnknownDriver — that's a registry miss, this is a
	// configuration miss.
	ErrInvalidConfig = errors.New("llm: invalid configuration")
	// ErrUnsupportedModel — the safety net or driver hit a model
	// name with no matching `ModelProfile`. Required because the
	// token-budget guard depends on a profile's context-window cap.
	ErrUnsupportedModel = errors.New("llm: model has no configured ModelProfile")
	// ErrInvalidJSONSchema (Phase 35) — the provider returned a
	// `Complete` whose JSON output did not validate against the
	// requested schema (or rejected the schema itself at the wire
	// layer). The downgrade wrapper observes this via
	// `IsInvalidJSONSchemaError` and steps the request down the chain.
	// Drivers MAY wrap their provider-specific schema errors with this
	// sentinel; the classifier also matches a small allowlist of error
	// substrings to handle providers that surface only a free-form
	// `error` string.
	ErrInvalidJSONSchema = errors.New("llm: response failed JSON-schema validation")
	// ErrDowngradeExhausted (Phase 35) — the downgrade wrapper ran
	// every step in the chain and the inner call STILL produced
	// `ErrInvalidJSONSchema`. Surfaces with the wrapped chain history
	// so operators can correlate against `llm.mode_downgraded` events.
	ErrDowngradeExhausted = errors.New("llm: structured-output downgrade chain exhausted")
	// ErrRetryExhausted (Phase 36) — the retry wrapper exceeded the
	// per-model `MaxRetries` bound. Wraps the chain of validator
	// failures so operators can see why each attempt failed.
	ErrRetryExhausted = errors.New("llm: retry-with-feedback budget exhausted")
	// ErrValidationFailed (Phase 36) — surfaces when the validator
	// returns non-nil AND the retry wrapper is NOT registered (the
	// caller asked for validation without a wrapper to retry). The
	// wrapper-registered path uses the validator's own error verbatim
	// in `RetryWithFeedbackPayload.Reason`.
	ErrValidationFailed = errors.New("llm: response validator rejected output")
	// ErrOrphanToolCall — an assistant message with `ToolCalls` is
	// not followed by the corresponding `RoleTool` messages whose
	// `ToolCallID` matches each `ToolCalls[i].ID`. OpenAI's wire
	// spec requires the pairing; the safety pass rejects loudly so
	// the producer is forced to fix the upstream omission rather
	// than silently shipping an invalid wire shape.
	ErrOrphanToolCall = errors.New("llm: assistant message with ToolCalls is not followed by matching RoleTool messages")
)

Sentinel errors. Callers compare via errors.Is.

Functions ¶

func HasIdentity ¶

func HasIdentity(ctx context.Context) bool

HasIdentity reports whether `ctx` carries a complete Harbor identity. The LLM-client edge MUST validate this before invoking any driver — the runtime fails closed on missing identity (AGENTS.md §6 rule 9, AGENTS.md §13 forbidden-practices).

Used by `safetyClient.Complete`; exposed so test helpers can pin the check at the call site.

func IsInvalidJSONSchemaError ¶

func IsInvalidJSONSchemaError(err error) bool

IsInvalidJSONSchemaError reports whether `err` represents a schema-class failure that the Phase 35 downgrade chain should treat as a signal to step the request down to the next `OutputMode`.

The classifier checks two paths:

1. `errors.Is(err, ErrInvalidJSONSchema)`: drivers / wrappers that classify upstream errors and wrap with the sentinel.
2. A small case-insensitive substring scan against `invalidJSONSchemaErrorMarkers`. This handles providers that surface only a free-form error string.

The substring allowlist is deliberately narrow to avoid false positives on transient / IO / auth failures. Returns false for nil.

func NewChunkPublisher ¶ added in v1.3.0

func NewChunkPublisher(bus events.EventBus, q identity.Quadruple, taskID string, logger *slog.Logger) func(delta string, done bool, kind string)

NewChunkPublisher returns the per-run OnChunk closure a run-loop driver adapts onto `planner.RunContext.OnChunk` (Phase 110b — D-195). The closure builds a CompletionChunkPayload and publishes EventTypeCompletionChunk with the run's identity quadruple on the **Event envelope**, not just the payload — the event bus validates the envelope before fan-out (CLAUDE.md §6 rule 5). This is the hard-won trap the constructor encodes: when the original closure stamped the payload only, live testing surfaced 280+ rejected chunks per task ("events: event identity missing one or more components: type=llm.completion.chunk"). Publish failures Warn loudly (brief 06 §5) — never silent drops.

`kind` is string-typed because `planner` imports `llm`, so this package cannot name `planner.ChunkKind`; the run loop adapts with a one-line wrapper:

	pub := llm.NewChunkPublisher(bus, q, string(taskID), logger)
	onChunk := func(d string, done bool, k planner.ChunkKind) { pub(d, done, string(k)) }

The publish context is `context.Background()` — the documented bridge across an unmanaged async boundary (CLAUDE.md §5) for callers that genuinely have no lifetime ctx. Run-loop drivers are NOT that caller: they own a driver-lifetime ctx and MUST use NewChunkPublisherContext so publishes are bounded by it — the durable bus driver drives its `store.Save` with the publish ctx, and an unbounded Background ctx silently outlived driver Close (D-195's recorded correction, resolved by D-207).

Concurrent reuse (D-025): the constructor allocates no shared mutable state — each run constructs its own closure over the run's quadruple + task ID; N concurrent runs see N independent closures over one shared (concurrent-safe) bus.

A nil logger defaults to slog.Default() so the failure path stays loud for every caller.

func NewChunkPublisherContext ¶ added in v1.3.0

func NewChunkPublisherContext(baseCtx context.Context, bus events.EventBus, q identity.Quadruple, taskID string, logger *slog.Logger) func(delta string, done bool, kind string)

NewChunkPublisherContext is NewChunkPublisher with a caller-supplied base context bounding every publish (D-207, closing D-195's correction note). Run-loop drivers pass their driver-lifetime ctx (the pre-110b `d.subCtx` semantics): on the durable bus driver — which persists each chunk event via `store.Save` under the publish ctx — cancelling baseCtx stops persistence at driver teardown instead of letting late chunks write past Close. Publish failures (including baseCtx cancellation) Warn loudly, never silently drop.

A nil baseCtx falls back to context.Background() (the ctx-less constructor's documented bridge).

func Register ¶

func Register(name string, factory Factory)

Register installs a driver factory under `name`. Drivers self-register from their package `init()`; `cmd/harbor` blank-imports the production driver to trigger registration (Phase 33+).

Re-registering the same name panics — the registration model is write-once-at-init and a duplicate signals a build misconfig.

func RegisterCorrectionsWrapper ¶

func RegisterCorrectionsWrapper(fn func(LLMClient, ConfigSnapshot) LLMClient)

RegisterCorrectionsWrapper installs the Phase 34 corrections wrapper hook. Called once from `internal/llm/corrections.init()`; the production binary picks up the registration by blank-importing the corrections package.

The hook signature mirrors `corrections.Wrap` — given the inner `LLMClient` (the safety wrapper) and the config snapshot, returns the corrections-wrapped client.

Re-registering panics: the registration model is write-once-at-init and a duplicate signals a build misconfig.

func RegisterDefaultOutputModeResolver ¶

func RegisterDefaultOutputModeResolver(fn func(model string) OutputMode)

RegisterDefaultOutputModeResolver installs the per-known-provider `OutputMode` resolver from `internal/llm/corrections`. Called once from `corrections.init()`; the production binary blank-imports the corrections package so the registration fires at boot. Re-registering panics — write-once-at-init.

func RegisterDowngradeWrapper ¶

func RegisterDowngradeWrapper(fn func(LLMClient, ConfigSnapshot, Deps) LLMClient)

RegisterDowngradeWrapper installs the Phase 35 structured-output downgrade wrapper hook. Called once from `internal/llm/output.init()`; the production binary blank-imports `internal/llm/output` so the registration fires at boot.

The hook receives the inner `LLMClient` (typically `corrections(safety(driver))`), the config snapshot, and the Deps so the wrapper can emit events on the shared bus.

Re-registering panics — write-once-at-init.

func RegisterGovernanceWrapper ¶

func RegisterGovernanceWrapper(fn func(LLMClient, ConfigSnapshot, Deps) LLMClient)

RegisterGovernanceWrapper installs the Phase 36a/36b governance wrapper hook. Called once from `internal/governance.init()`; the production binary blank-imports the package so the hook lands at boot. Governance composes OUTSIDE the entire downstream chain (D-043 + D-044) — the wrapper sits at the outermost layer in `Open` so `PreCall` fires before retry / downgrade / corrections / safety even reach the driver.

The hook receives the inner `LLMClient` (typically `retry(downgrade(corrections(safety(driver))))`), the config snapshot, and the Deps so the wrapper can build its Subsystem if a factory has been registered via `governance.SetFactory`. Latent default: with no factory set, the hook returns `inner` unchanged.

Re-registering panics — write-once-at-init.
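The layering described for the wrapper hooks can be illustrated with a small sketch. It assumes a reduced one-method `client` interface and a hypothetical `compose` helper; each layer only tags the call so the nesting order is visible. In the sketch an unseated hook (nil) is skipped silently, whereas the real `Open` logs `WarnWrapperNotSeated`.

```go
package main

import "fmt"

// client is a reduced stand-in for LLMClient.
type client interface {
	Complete(prompt string) string
}

// driver is the innermost layer.
type driver struct{}

func (driver) Complete(prompt string) string { return "driver(" + prompt + ")" }

// layer tags each call with its name so the nesting order shows in the output.
type layer struct {
	name  string
	inner client
}

func (l layer) Complete(prompt string) string {
	return l.name + "(" + l.inner.Complete(prompt) + ")"
}

// wrapper is a hypothetical hook type; nil models a hook that was never seated.
type wrapper func(client) client

// named builds a wrapper that adds one tagged layer.
func named(name string) wrapper {
	return func(inner client) client { return layer{name: name, inner: inner} }
}

// compose applies hooks innermost-first and skips unseated ones.
func compose(base client, hooks ...wrapper) client {
	c := base
	for _, h := range hooks {
		if h == nil {
			continue
		}
		c = h(c)
	}
	return c
}

func main() {
	c := compose(driver{}, named("safety"), named("corrections"),
		named("downgrade"), named("retry"), named("governance"))
	fmt.Println(c.Complete("hi"))
}
```

Because hooks are applied innermost-first, the last one passed to `compose` ends up outermost, which matches governance sitting around retry, downgrade, corrections, and safety.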

func RegisterMockModeCaptured ¶

func RegisterMockModeCaptured(v bool)

RegisterMockModeCaptured records that the runtime booted with `HARBOR_DEV_ALLOW_MOCK=1` (D-089). It is called exactly once from `cmd/harbor/devmock.go::registerMockIfDevAllowMock` at boot — the SAME call site that prints the `[DEV-ONLY MOCK LLM — DO NOT USE IN PRODUCTION]` stderr banner. Calling it with `true` flips the captured flag so `llm.posture` surfaces `MockMode: true`; calling it with `false` (or never calling it — the zero value) leaves the flag false.

A future PR that re-routes the dev-hatch path (e.g. promotes the env var to a CLI flag) MUST keep this call reciprocal with the banner emit — otherwise `LLMPostureResponse.MockMode` silently desyncs from the banner. The Phase 72g integration + smoke tests assert both paths fire together.

func RegisterRetryWrapper ¶

func RegisterRetryWrapper(fn func(LLMClient, ConfigSnapshot, Deps) LLMClient)

RegisterRetryWrapper installs the Phase 36 retry-with-feedback wrapper hook. Called once from `internal/llm/retry.init()`; the production binary blank-imports `internal/llm/retry`.

The hook signature mirrors `RegisterDowngradeWrapper`.

Re-registering panics — write-once-at-init.

func RegisteredDrivers ¶

func RegisteredDrivers() []string

RegisteredDrivers returns a sorted list of driver names. Useful for boot-log emission and for surfacing in error messages.

Types ¶

type ArtifactStub ¶

type ArtifactStub struct {
	Ref       string     `json:"artifact_ref"`
	MIME      string     `json:"mime"`
	SizeBytes int64      `json:"size_bytes"`
	Hash      string     `json:"hash,omitempty"`
	Summary   string     `json:"summary,omitempty"`
	Fetch     *StubFetch `json:"fetch,omitempty"`
}

ArtifactStub is the model-agnostic JSON shape the LLM sees in place of heavy content during prompt assembly (RFC §6.5, D-026). The same shape is used whether the substituted content originated from a tool result, a memory turn, or a multimodal input.

Operators can override `Summary` per-producer; the rest is runtime-stamped at materialization time. The stub's JSON rendering is byte-stable across providers — no per-provider swapping.

JSON shape (omitempty on optional fields, no extra fields):

{"artifact_ref":"ref-abc-def","mime":"image/png","size_bytes":65536,
 "hash":"sha256:...","summary":"User-uploaded screenshot at turn 3",
 "fetch":{"tool":"artifact.fetch","id":"ref-abc-def"}}

func (ArtifactStub) MarshalJSON ¶

func (s ArtifactStub) MarshalJSON() ([]byte, error)

MarshalJSON ensures the canonical render of an `ArtifactStub` — stable field order, `omitempty` honored, no extra fields. The runtime's `ObservationRenderer` and the safety-net materialization both go through this method, so producers and the LLM-side audit see byte-identical output.

Implemented explicitly (rather than relying on Go's default struct marshaling) so the contract is stable across Go version field-ordering changes.

type AudioPart ¶

type AudioPart struct {
	URL      string
	DataURL  string
	Artifact *ArtifactStub
	MIME     string
}

AudioPart is a multimodal audio input. Same supply forms as `ImagePart`; `MIME` is the audio MIME type.

type ChatMessage ¶

type ChatMessage struct {
	Role    Role
	Content Content
	Name    *string
	// ToolCallID (Phase 107c / D-167) is the provider-assigned
	// tool-call identifier carried on RoleTool messages. Rendered
	// as the native tool-result role with matching call ID when
	// the provider supports it; falls back to user-role rendering
	// on providers without native tool-result roles.
	ToolCallID *string
	// ToolCalls (Phase 107c / D-167) is the per-message structured
	// tool-call slice carried on RoleAssistant messages that replay
	// a prior planner step's CallTool emission into the next turn's
	// thread. When non-empty, the bifrost translator emits an
	// assistant message with the provider-native `tool_calls`
	// block (OpenAI / Anthropic / Gemini all consume this shape).
	// The matching tool result is threaded back via a sibling
	// RoleTool message whose `ToolCallID` matches `ToolCalls[i].ID`.
	// Empty for every non-assistant message and for assistant
	// messages whose content is the model's final answer.
	ToolCalls []ToolCallStructured
}

ChatMessage is one entry in the chat thread.

`Content` is a sum-type: exactly one of `Text` or `Parts` is set. `Text` is the common case (text-only conversation). `Parts` is set when the message carries multimodal content. `Name` is optional — used by some providers for participant naming.

type CompleteRequest ¶

type CompleteRequest struct {
	Model           string
	Messages        []ChatMessage
	ResponseFormat  *ResponseFormat
	Stream          bool
	OnContent       func(delta string, done bool)
	OnReasoning     func(delta string, done bool)
	Temperature     *float32
	MaxTokens       *int
	Stops           []string
	ReasoningEffort ReasoningEffort
	Extra           map[string]any
	// Validator (Phase 36) is the caller-supplied post-response
	// validation hook. When non-nil, the retry wrapper invokes it
	// after each successful `Complete`; a non-nil return triggers a
	// corrective re-ask bounded by `ModelProfile.MaxRetries`. The
	// validator is opaque to the wrapper — return any error type;
	// the wrapper truncates and includes its `Error()` in the retry
	// sub-prompt + the `llm.retry_with_feedback` event payload.
	//
	// `nil` Validator (the default) disables the retry loop entirely;
	// the wrapper is a no-op pass-through. Validators MUST be safe for
	// concurrent invocation against the same compiled artifact (the
	// wrapper itself enforces D-025; the validator runs once per call).
	Validator func(CompleteResponse) error

	// Tools (Phase 107c / D-167) is the per-turn tool catalog. When
	// nil the driver calls the provider without the tool-calling
	// block (text-only completion — preserves non-React planner
	// behavior).
	Tools []ToolDeclaration
	// ToolChoice (Phase 107c / D-167) is the per-provider tool-choice
	// passthrough. "" means "do not emit a tool_choice field"; "auto"
	// lets the provider decide; "required" forces the model to emit at
	// least one tool call; "none" suppresses tool calls entirely.
	ToolChoice string
	// ParallelToolCalls (Phase 107c / D-167) is the per-turn knob for
	// parallel function-calling (default true for supporting providers;
	// bifrost maps it per provider). The planner sets this per the
	// operator's yaml knob + the runloop executor's capability signal.
	ParallelToolCalls bool
}

CompleteRequest is the LLM-call payload. Settled in RFC §6.5; shaped by D-021 (multimodal sum-type) and D-026 (safety-net invariants).

`Messages` is the chat thread — role + content only. The system / user / assistant roles are the entire vocabulary; tool-result rendering happens at the `ObservationRenderer` layer as user-role messages (RFC §6.4 + brief 07 §5).

`ResponseFormat` is an optional structured-output hint. `nil` means "plain text"; `json_object` requests provider JSON mode; `json_schema` carries a caller-supplied JSON Schema. Phase 35 owns the per-provider downgrade chain `json_schema → json_object → text`.

`Stream` + `OnContent` / `OnReasoning` cooperate: when `Stream` is true, the driver invokes the callbacks for each delta. `OnReasoning` fires only for thinking-class providers that expose a separate reasoning channel (`o1`, `o3`, `deepseek-reasoner`, etc.).

`Temperature` / `MaxTokens` / `Stops` map directly onto provider sampler controls. Pointer types (`*float32`, `*int`) distinguish "unset (use provider default)" from "set to zero".
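A short sketch of why the pointer matters: with a plain `float32`, a temperature of zero would be indistinguishable from "not set". `samplerOptions`, `ptr`, and `describeTemperature` are hypothetical names introduced for this example.

```go
package main

import "fmt"

// samplerOptions mirrors the two pointer-typed fields on CompleteRequest.
type samplerOptions struct {
	Temperature *float32
	MaxTokens   *int
}

// ptr is a small generic helper added for this sketch (requires Go 1.18+).
func ptr[T any](v T) *T { return &v }

// describeTemperature reports whether the caller set a temperature at all.
func describeTemperature(o samplerOptions) string {
	if o.Temperature == nil {
		return "temperature: provider default"
	}
	return fmt.Sprintf("temperature: %.1f", *o.Temperature)
}

func main() {
	fmt.Println(describeTemperature(samplerOptions{}))
	fmt.Println(describeTemperature(samplerOptions{Temperature: ptr(float32(0))}))
}
```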

`ReasoningEffort` is a request-level hint mapped to per-provider reasoning controls (bifrost's `ChatReasoning`). `""` means "do not touch the provider default."

`Extra` is provider-passthrough sanitized by Phase 34's correction layer. Phase 32 stores the field but does not interpret it.

type CompleteResponse ¶

type CompleteResponse struct {
	Content   string
	ToolCalls []ToolCallStructured
	Reasoning string
	Cost      Cost
	Usage     Usage
}

CompleteResponse is the LLM-call return shape.

`Content` is the full assembled assistant message — for streaming calls the driver concatenates `OnContent` deltas into `Content` before returning. The runtime parses `Content` into a `PlannerAction` per brief 07; the LLM never emits provider-native tool calls.

`ToolCalls` (Phase 107c / D-167) carries provider-validated structured tool-call entries. When non-empty, the planner reads ToolCalls as its primary decision discriminator (native tool-calling path). Empty for text-only responses and for providers without native tool-calling support.

`Reasoning` carries the provider-side thinking trace (Anthropic extended thinking, OpenAI o-series, DeepSeek native, Gemini `thought:true` parts) normalised by the driver. It is the canonical captured trace for BOTH unary and streaming calls — distinct from the per-delta `OnReasoning` streaming callback, which exists for live UX. Empty when the provider did not surface reasoning, or when the driver does not read a reasoning channel. Reasoning is captured content, NOT replayed into prompts: the planner persists it on `trajectory.Step.ReasoningTrace` and only re-injects it when an operator opts into replay (D-148). Phase 83e (RFC §6.2 + §6.5).

`Cost` + `Usage` propagate the provider's reported figures. Governance (Phase 36a/36b) subscribes to `llm.cost.recorded` events emitted by the runtime when a `Complete` returns; the event payload re-stamps these shapes.

type CompletionChunkPayload ¶ added in v1.2.0

type CompletionChunkPayload struct {
	events.SafePayload
	Identity   identity.Quadruple
	TaskID     string
	RunID      string
	Delta      string
	Done       bool
	Kind       string
	OccurredAt time.Time
}

CompletionChunkPayload is the typed payload for EventTypeCompletionChunk (Phase 107). SafePayload — the delta is per-session operator-visible content (the LLM's own output), not a secret. Kind is "content" or "reasoning".

type ConfigSnapshot ¶

type ConfigSnapshot struct {
	Driver               string
	ContextWindowReserve float64
	HeavyOutputThreshold int
	ModelProfiles        map[string]ModelProfile

	// DisableCorrections opts OUT of the Phase 34 per-provider
	// correction layer. Zero-value (false) = corrections enabled —
	// production callers wire `corrections.Wrap(safetyClient(driver))`
	// so quirks like NIM message reordering, OpenAI strict-schema
	// mode, thinking-class reasoning routing, Anthropic envelope
	// translation, and usage backfill all apply automatically. Tests
	// that need to exercise the safety pass in isolation set this to
	// true.
	//
	// Inverse-named so the zero-value matches the production default
	// — direct callers (tests, programmatic snapshot construction)
	// don't have to flip an extra knob to get correct behaviour. The
	// config loader resolves the operator-facing `corrections.enabled`
	// yaml field (default true) into this inverse.
	DisableCorrections bool

	// DisableDowngrade opts OUT of the Phase 35 structured-output
	// downgrade chain. Zero-value (false) = enabled. Inverse-named so
	// production callers get the right behaviour by default.
	DisableDowngrade bool

	// DisableRetry opts OUT of the Phase 36 retry-with-feedback
	// wrapper. Zero-value (false) = enabled. The wrapper is a no-op
	// when `CompleteRequest.Validator` is nil, so disabling is only
	// useful for tests that need to isolate the downgrade layer.
	DisableRetry bool

	// DisableGovernance opts OUT of the Phase 36a/36b governance
	// wrapper. Zero-value (false) = enabled — but the wrapper is also
	// a no-op pass-through when no `governance.Factory` has been
	// registered, so the latent default (Wave 7b scoping) requires
	// neither flag flip nor factory wiring. Tests that want to bypass
	// even a registered factory flip this true.
	DisableGovernance bool

	// Bifrost-driver knobs (Phase 33).
	Provider string
	Model    string
	APIKey   string
	BaseURL  string
	Timeout  time.Duration

	// CustomProviders is the operator-declared registry of
	// OpenAI-compatible providers (Phase 33a). When `Provider`
	// matches a custom entry's `Name`, the entry's `BaseURL` /
	// `APIKeyEnvVar` / `Models` / network knobs apply (legacy
	// `APIKey` / `BaseURL` / `Timeout` ignored for that case). The
	// list is keyed only by `Name`; the bifrost driver iterates and
	// registers all entries with bifrost's `Account`.
	CustomProviders []CustomProviderSpec

	// NetworkDefaults applies to every provider when the per-provider
	// override is absent. Zero-valued fields fall through to
	// bifrost's package-level defaults at construction. Restart-
	// required.
	NetworkDefaults NetworkDefaults
}

ConfigSnapshot is the strict subset of `config.LLMConfig` the LLM package consumes. Keeping a snapshot decouples drivers from the config package's type evolution (mirrors `internal/memory`'s pattern).

`Driver` selects the §4.4 factory. Empty defaults to `DefaultDriver` (Phase 32 = "mock"; Phase 33 will leave the default explicit at the caller — operator must opt-in to `bifrost`).
`ContextWindowReserve` is the safety-net token-budget margin (default 0.05 / 5%). Range [0.0, 1.0); validated at the config layer + at construction.
`HeavyOutputThreshold` mirrors `config.ArtifactsConfig.HeavyOutputThresholdBytes` so the LLM package does not re-import the artifact-config struct. Default 32 KiB.
`ModelProfiles` is keyed by canonical model name. The safety net's token-budget guard requires a profile entry for the model in the `CompleteRequest`; missing → `ErrUnsupportedModel`.

`Provider` / `Model` / `APIKey` / `BaseURL` / `Timeout` are the Phase-33 bifrost-driver knobs. Phase 32 stores them so the snapshot's shape is stable across phases; the mock driver ignores them. Phase 33's bifrost driver will read them.

func SnapshotFromConfig ¶ added in v1.3.0

func SnapshotFromConfig(cfg config.LLMConfig, art config.ArtifactsConfig) ConfigSnapshot

SnapshotFromConfig projects the operator-facing `config.LLMConfig` block (plus the one artifacts-config field the LLM edge consumes) onto the llm package's decoupled ConfigSnapshot. Every config-sourced snapshot field is populated here — cmd/harbor, harbortest/devstack, and headless embedders all call this ONE projection (closing the D-155 / audit-B3 silent-field-drop class).

`art` supplies `HeavyOutputThresholdBytes` → `HeavyOutputThreshold` (the snapshot mirrors it so the llm package does not re-import the artifact-config struct; see the ConfigSnapshot godoc).

The returned snapshot is a deep copy: mutating it (e.g. the `HARBOR_DEV_ALLOW_MOCK=1` driver override in `harbor dev`) never reaches back into the caller's *config.Config.

Fields NOT populated here, with reasons:

`DisableDowngrade` / `DisableRetry` / `DisableGovernance` — no operator-facing config knob exists for them (test-only opt-outs); the zero value (enabled) is the production default.
`ModelProfile.OutputMode` — llm-internal normalisation target; `applyDefaults` resolves it from `JSONSchemaMode` + the registered per-provider resolver at Open time.

type Content ¶

type Content struct {
	Text  *string
	Parts []ContentPart
}

Content is the multimodal sum-type. Exactly one of `Text` or `Parts` must be set; both-set and both-nil are invalid and rejected by the safety net with `ErrInvalidContent`.

type ContentPart ¶

type ContentPart struct {
	Type  PartType
	Text  string     // when Type == PartText
	Image *ImagePart // when Type == PartImage
	Audio *AudioPart // when Type == PartAudio
	File  *FilePart  // when Type == PartFile
}

ContentPart is one element of a multimodal `Content.Parts` slice. Exactly one of `Text` / `Image` / `Audio` / `File` is set per the `Type` discriminator.

type ContextLeakPayload ¶

type ContextLeakPayload struct {
	events.SafeSealed
	Identity   identity.Quadruple
	Model      string
	LeakSite   string
	SizeBytes  int64
	Threshold  int
	OccurredAt time.Time
}

ContextLeakPayload is the typed payload for EventTypeContextLeak. SafePayload — the leak-site identifier (a short structural fingerprint like "Messages[2].Content.Text") is operator-visible debug data, not secret-shaped.

`SizeBytes` is the size of the offending payload; `Threshold` is the runtime's configured heavy-output threshold at the time of the emit, so an operator can correlate config-change-time drift.

type ContextWindowExceededPayload ¶

type ContextWindowExceededPayload struct {
	events.SafeSealed
	Identity             identity.Quadruple
	Model                string
	EstimatedTokens      int
	ContextWindowTokens  int
	ContextWindowReserve float64
	OccurredAt           time.Time
}

ContextWindowExceededPayload is the typed payload for EventTypeContextWindowExceeded. SafePayload — token counts + configured cap are operator-visible.

type CorrectionsProfile ¶

type CorrectionsProfile struct {
	// MessageOrdering controls how the request's chat-message slice
	// is reordered before reaching the driver. Default (zero value)
	// passes the slice through unchanged.
	MessageOrdering MessageOrderingPolicy
	// SchemaMode controls how the request's `ResponseFormat.JSONSchema`
	// bytes are mutated before reaching the driver. Default passes
	// the schema through unchanged.
	SchemaMode SchemaSanitizationMode
	// ReasoningEffortRouting controls whether `req.ReasoningEffort` is
	// translated to a provider-specific `Extra` key (thinking-class
	// models) or passed through as the top-level field (default).
	ReasoningEffortRouting ReasoningRouting
	// ResponseFormatShape controls the wire-shape translation of
	// `req.ResponseFormat`. Default emits the OpenAI envelope; other
	// values translate to per-provider envelopes (Anthropic tool-
	// schema, `json_only` for providers that reject `json_schema`).
	ResponseFormatShape ResponseFormatProfile
	// UsageBackfillEnabled, when true, makes the corrections layer
	// compute synthetic token counts (and, if `CostOverrides` is set,
	// synthetic costs) when the driver returns an all-zeros `Usage`.
	// Default false — the response surfaces zeros verbatim.
	UsageBackfillEnabled bool
}

CorrectionsProfile carries the per-model quirk flags the Phase 34 `internal/llm/corrections` layer dispatches on. The types live in the `llm` package so the corrections sub-package can consume them without an import cycle (logic lives in `internal/llm/corrections`).

Zero-valued struct means "no quirks declared for this model"; the corrections pass treats each field's zero value as the Harbor-default behaviour (no reorder, no schema mutation, OpenAI-style envelopes, usage backfill off).

Per RFC §6.5 + brief 03 §4: this is the operator-controlled surface for adapting Harbor's neutral `CompleteRequest` shape to per-provider expectations. The corrections layer is the ONLY consumer.

type Cost ¶

type Cost struct {
	InputTokensCost     float64
	OutputTokensCost    float64
	ReasoningTokensCost float64
	TotalCost           float64
	Currency            string // "USD" canonical; reserved for future multi-currency
}

Cost is the provider-reported cost breakdown. Values are USD. Fields are zero when the provider doesn't report a category.

Governance (Phase 36a) subscribes to `llm.cost.recorded` events to drive per-identity accumulators; Phase 36a's payload re-stamps these fields.

type CostRecordedPayload ¶

type CostRecordedPayload struct {
	events.SafeSealed
	Identity identity.Quadruple
	Model    string
	Cost     Cost
	Usage    Usage
	// ContextWindowTokens is the model's input-token window (from the
	// model profile), stamped so the Console can render context-used vs
	// window (%). Zero when the model has no profile / configured window.
	ContextWindowTokens int
	OccurredAt          time.Time
}

CostRecordedPayload is the typed payload for EventTypeCostRecorded. SafePayload — cost / token counts are operator-visible. Phase 36a subscribes for per-identity accumulator updates.

type CostTable ¶

type CostTable struct {
	InputPer1M     float64
	OutputPer1M    float64
	ReasoningPer1M float64
	Currency       string // "USD" canonical
}

CostTable carries fallback per-1M-token rates. Used when the provider's response doesn't include cost. Phase 36a consumes.
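
The per-1M-token arithmetic can be illustrated with a short sketch. The types below are reduced local mirrors, `fallbackCost` is a hypothetical helper, and the sketch assumes the three token categories are billed independently (it does not try to model providers that count reasoning tokens inside completion tokens).

```go
package main

import "fmt"

// Reduced local mirrors of the documented CostTable, Usage and Cost shapes.
type costTable struct {
	InputPer1M, OutputPer1M, ReasoningPer1M float64
	Currency                                string
}

type usage struct {
	PromptTokens, CompletionTokens, ReasoningTokens int
}

type cost struct {
	InputTokensCost, OutputTokensCost, ReasoningTokensCost, TotalCost float64
	Currency                                                          string
}

// fallbackCost prices each token category at its per-1M rate and sums them.
// Assumption: categories are disjoint and billed independently.
func fallbackCost(u usage, t costTable) cost {
	const perMillion = 1_000_000.0
	c := cost{
		InputTokensCost:     float64(u.PromptTokens) * t.InputPer1M / perMillion,
		OutputTokensCost:    float64(u.CompletionTokens) * t.OutputPer1M / perMillion,
		ReasoningTokensCost: float64(u.ReasoningTokens) * t.ReasoningPer1M / perMillion,
		Currency:            t.Currency,
	}
	c.TotalCost = c.InputTokensCost + c.OutputTokensCost + c.ReasoningTokensCost
	return c
}

func main() {
	rates := costTable{InputPer1M: 3, OutputPer1M: 15, Currency: "USD"}
	fmt.Printf("%+v\n", fallbackCost(usage{PromptTokens: 2_000_000, CompletionTokens: 500_000}, rates))
}
```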

type CustomProviderSpec ¶

type CustomProviderSpec struct {
	Name                 string
	BaseURL              string
	APIKeyEnvVar         string
	Models               []string
	BaseProviderType     string
	Timeout              time.Duration
	MaxRetries           int
	RetryBackoffInitial  time.Duration
	RetryBackoffMax      time.Duration
	Concurrency          int
	BufferSize           int
	RequestPathOverrides map[string]string
}

CustomProviderSpec is one operator-declared OpenAI-compatible provider (Phase 33a). The bifrost driver maps each entry to a `bfschemas.ProviderConfig` with `CustomProviderConfig.BaseProviderType = schemas.OpenAI`. Zero-valued network knobs fall through to `ConfigSnapshot.NetworkDefaults`, which itself falls through to bifrost's package-level defaults.

`APIKeyEnvVar` is the environment-variable NAME (no `env.` prefix); the driver resolves `os.Getenv(name)` at construction. Missing → `ErrMissingAPIKey` with the env var named.

`RequestPathOverrides` maps `bfschemas.RequestType` (string-coded at this layer to avoid the import) to a custom URL path; the bifrost driver translates the keys when wiring the config. Used for OpenAI-compatible endpoints that host e.g. `/chat/completions` at the root.

type Deps ¶

type Deps struct {
	Artifacts artifacts.ArtifactStore
	Bus       events.EventBus
}

Deps carries the runtime dependencies the LLM client subsystem consumes. Both are mandatory — fail-loudly at construction.

`Artifacts` is the auto-materialize target (D-022). Inline `DataURL` content above the heavy-output threshold is rewritten as an `Artifact` whose bytes live in the store.
`Bus` is the canonical event bus. The safety pass publishes `llm.image.materialized` / `llm.context_leak` / `llm.context_window_exceeded`; the request-emit path (Phase 36a subscriber lands here) publishes `llm.cost.recorded`.

The package does NOT depend on `state.StateStore` — the LLM client is stateless across calls (D-025).

type Driver ¶

type Driver interface {
	// Complete receives a `CompleteRequest` whose messages have
	// ALREADY passed the safety net (`enforceContextSafety`): no raw
	// heavy content survived, the token-budget guard fired or
	// passed, oversize `DataURL` content has been materialized to
	// `Artifact` form. The driver translates the request into its
	// provider's wire shape and returns the typed response.
	Complete(ctx context.Context, req CompleteRequest) (CompleteResponse, error)
	// Close mirrors `LLMClient.Close`. Idempotent; second call is a
	// no-op (returns nil).
	Close(ctx context.Context) error
}

Driver is the unexported-by-naming surface every concrete driver implements. It has the same shape as `LLMClient`; the difference is the contract: by the time a driver's `Complete` is called, the safety net has already run. `Open` wraps a `Driver` in a `safetyClient` so the safety pass is mandatory by construction.

Driver authors implement this; callers consume `LLMClient`.

type Factory ¶

type Factory func(cfg ConfigSnapshot, deps Deps) (Driver, error)

Factory builds a `Driver` from a `ConfigSnapshot` + `Deps`. Drivers expose one `Factory` each via `init()` → `Register`.
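
The `init()` → `Register` pattern can be sketched as a name-keyed registry. Everything below is illustrative: the signature of the package's `Register` is not shown in this section, so `register`, `open`, `mockDriver`, and the `defaultDriver` placeholder value are hypothetical, and the sketch drops the `Deps` argument and the `safetyClient` wrapping that the real `Open` applies.

```go
package main

import (
	"fmt"
	"sync"
)

// Reduced local mirrors; the real Factory also receives Deps.
type configSnapshot struct{ Driver string }

type driver interface{ Name() string }

type factory func(cfg configSnapshot) (driver, error)

// defaultDriver is a placeholder value chosen for this sketch only.
const defaultDriver = "mock"

var (
	mu       sync.RWMutex
	registry = map[string]factory{}
)

// register stores a factory under a driver name; duplicates panic so a
// mis-wired binary fails at start-up rather than at first use.
func register(name string, f factory) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := registry[name]; dup {
		panic("duplicate driver " + name)
	}
	registry[name] = f
}

// open looks up the factory named by cfg.Driver, falling back to the
// default name when the field is empty.
func open(cfg configSnapshot) (driver, error) {
	name := cfg.Driver
	if name == "" {
		name = defaultDriver
	}
	mu.RLock()
	f, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("unknown driver %q", name)
	}
	return f(cfg)
}

type mockDriver struct{}

func (mockDriver) Name() string { return "mock" }

// Each driver package would self-register from its own init().
func init() {
	register("mock", func(configSnapshot) (driver, error) { return mockDriver{}, nil })
}

func main() {
	d, err := open(configSnapshot{})
	fmt.Println(d.Name(), err)
	_, err = open(configSnapshot{Driver: "nope"})
	fmt.Println(err)
}
```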

type FilePart ¶

type FilePart struct {
	URL      string
	DataURL  string
	Artifact *ArtifactStub
	MIME     string
	Filename string
}

FilePart is a multimodal file input. Same supply forms as `ImagePart`; `MIME` is the document MIME type. `Filename` is a hint shown to the model when the provider supports it.

type ImageMaterializedPayload ¶

type ImageMaterializedPayload struct {
	events.SafeSealed
	Identity    identity.Quadruple
	Model       string
	ArtifactRef string
	MIME        string
	SizeBytes   int64
	OccurredAt  time.Time
}

ImageMaterializedPayload is the typed payload for EventTypeImageMaterialized. SafePayload — the artifact ref, MIME type, and size are operator-visible content metadata, not secrets.

type ImagePart ¶

type ImagePart struct {
	URL      string
	DataURL  string
	Artifact *ArtifactStub
	MIME     string
	Detail   string
}

ImagePart is a multimodal image input.

Exactly one of `URL` / `DataURL` / `Artifact` is set. `URL` is a provider-fetchable remote URL. `DataURL` is an inline `data:image/...;base64,...` payload — above the heavy-output threshold the runtime materializes it to `Artifact`. `Artifact` is the canonical Harbor reference (D-022).

`MIME` is the image MIME type (`image/jpeg`, `image/png`, `image/webp`, ...). `Detail` is a provider hint (`low` / `high` / `auto`); empty string means "use provider default."

type LLMClient ¶

type LLMClient interface {
	Complete(ctx context.Context, req CompleteRequest) (CompleteResponse, error)
	// Close releases driver-held resources (HTTP connection pools,
	// background goroutines). Subsequent calls return ErrClientClosed.
	// Implementations MUST honour ctx during long teardowns.
	Close(ctx context.Context) error
}

LLMClient is the single contract callers depend on. ONE method. Streaming is signalled via `req.Stream` + `req.OnContent` / `req.OnReasoning`; cancellation flows through `ctx`. The runtime owns prompt construction, tool semantics, parsing, and parallel dispatch — see RFC §6.4 + brief 07.

Implementations MUST be safe for N concurrent goroutines against a single shared instance (D-025).

func Open ¶

func Open(_ context.Context, cfg ConfigSnapshot, deps Deps) (LLMClient, error)

Open returns the `LLMClient` built by the factory whose name matches `cfg.Driver` (defaults to `DefaultDriver` when empty).

Identity is mandatory at every method on the returned client; the safety pass enforces this. Deps are validated at construction — `nil Artifacts` / `nil Bus` return wrapped errors immediately.

The returned client is a `*safetyClient` wrapping the registered driver: every `Complete` runs through `enforceContextSafety` BEFORE the driver sees the request. This is mandatory by construction — drivers cannot bypass it through the registry path.

Unseated wrapper hooks warn loudly ¶

The corrections / downgrade / retry / governance layers self-register via init() in their own packages and are pulled in by blank imports at the binary entry point. When a Disable* flag is FALSE (production behaviour requested) but the corresponding hook is nil (the blank import never fired), Open does NOT silently skip the layer: it emits one slog warning per missing wrapper naming the blank-import path that seats it (§13 — no silent degradation). The warning is not an error because the mock-only test path legitimately composes without wrappers; embedders that see the warning against a live provider are missing production semantics and should add the named import.

type MessageOrderingPolicy ¶

type MessageOrderingPolicy string

MessageOrderingPolicy enumerates the message-reordering modes the Phase 34 corrections layer supports. Operator-set in `ModelProfile.Corrections.MessageOrdering`.

const (
	// OrderingDefault passes the message slice through unchanged.
	OrderingDefault MessageOrderingPolicy = ""
	// OrderingSystemFirstStrict collapses all system-role messages
	// to the front of the slice and emits an alternating
	// user/assistant tail. Required by NIM and some OpenAI-compatible
	// proxies that reject mid-thread `system` messages (brief 03 §4).
	OrderingSystemFirstStrict MessageOrderingPolicy = "system_first_strict"
)
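
A rough sketch of the system-first part of `OrderingSystemFirstStrict`, assuming a simple stable partition. The `message` type and `systemFirst` helper are hypothetical; the sketch only moves system messages to the front and preserves relative order, and it does not attempt the merging or user/assistant alternation that the real policy also performs.

```go
package main

import "fmt"

// message is a reduced stand-in for ChatMessage.
type message struct {
	Role string
	Text string
}

// systemFirst returns a new slice with every system-role message first,
// followed by all other messages, each group in its original order.
func systemFirst(msgs []message) []message {
	out := make([]message, 0, len(msgs))
	for _, m := range msgs {
		if m.Role == "system" {
			out = append(out, m)
		}
	}
	for _, m := range msgs {
		if m.Role != "system" {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	in := []message{
		{"user", "a"},
		{"system", "s1"},
		{"assistant", "b"},
		{"system", "s2"},
	}
	for _, m := range systemFirst(in) {
		fmt.Println(m.Role, m.Text)
	}
}
```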

type ModeDowngradedPayload ¶

type ModeDowngradedPayload struct {
	events.SafeSealed
	Identity   identity.Quadruple
	Model      string
	FromMode   OutputMode
	ToMode     OutputMode
	From       ResponseFormatKind
	To         ResponseFormatKind
	Reason     string
	OccurredAt time.Time
}

ModeDowngradedPayload is the typed payload for EventTypeModeDowngraded. Phase 35 fills the From/To/Reason fields. `FromMode` / `ToMode` carry the Harbor-side `OutputMode` (Native / Tools / Prompted / text); `From` / `To` carry the resolved `ResponseFormatKind` for backward visibility.

type ModelProfile ¶

type ModelProfile struct {
	// ContextWindowTokens is the model's hard input-token cap.
	// Required (> 0); the safety net's token-budget guard uses it.
	ContextWindowTokens int
	// TokenEstimator selects the estimator the safety net runs.
	// "" / "chars_div_4" — default chars/4 + role-overhead.
	// Phase 33+ may register tiktoken-equivalent estimators by name.
	TokenEstimator string
	// JSONSchemaMode — Phase 32-era placeholder; the config loader
	// normalises this string into `OutputMode` at snapshot time
	// (Phase 35). Direct callers SHOULD set `OutputMode`; this field
	// is read only when `OutputMode` is `OutputModeUnset`.
	JSONSchemaMode string
	// OutputMode (Phase 35) — Harbor-side structured-output strategy.
	// Drives the request-shaping in `internal/llm/output` and the
	// downgrade chain. See `OutputMode` constants for semantics.
	// Zero value (`OutputModeUnset`) falls back to the per-known-
	// provider default (see `corrections.DefaultOutputModeFor`).
	OutputMode OutputMode
	// DefaultMaxTokens — Phase 36b's identity-tier override target.
	DefaultMaxTokens *int
	// ReasoningEffort — request-level default applied by the
	// corrections layer (`corrections.Complete`) when the caller left
	// `CompleteRequest.ReasoningEffort` empty; an explicit per-call
	// value overrides it. Maps to the provider reasoning param (bifrost
	// `ChatReasoning.Effort`, or `Extra["reasoning_effort"]` under
	// `ReasoningRouteThinking`).
	ReasoningEffort ReasoningEffort
	// CostOverrides — per-1M-token rates when the provider doesn't
	// report cost (some OpenRouter routes don't). Phase 36a reads.
	CostOverrides *CostTable
	// Corrections — per-provider quirk flags consumed by the Phase 34
	// `internal/llm/corrections` layer. Zero-valued struct means
	// "no corrections needed for this model"; the corrections layer
	// runs a no-op pass for default-shaped profiles.
	Corrections CorrectionsProfile
	// MaxRetries (Phase 36) — caps the validator-driven corrective
	// re-asks performed by the retry wrapper. Zero (default) maps to
	// `DefaultMaxRetries` (1). A negative value is rejected at config
	// validation.
	MaxRetries int
}

ModelProfile carries per-model knobs. Keyed by canonical model name in `LLMConfig.ModelProfiles`. Phase 32 ships the shape + `ContextWindowTokens` + `TokenEstimator` consumers; Phase 33+ consume the rest.

type NetworkDefaults ¶

type NetworkDefaults struct {
	Timeout             time.Duration
	MaxRetries          int
	RetryBackoffInitial time.Duration
	RetryBackoffMax     time.Duration
	Concurrency         int
	BufferSize          int
}

NetworkDefaults are the operator-tunable defaults bifrost applies to every provider (native + custom) when the per-provider override is absent (Phase 33a). Zero-valued fields fall through to bifrost's package-level defaults.

type OutputMode ¶

type OutputMode string

OutputMode selects the request-shaping strategy for structured output (Phase 35; RFC §6.5). Three modes:

`OutputModeNative` — pass `FormatJSONSchema` through unchanged. The provider validates against the schema natively. Default for OpenAI / Anthropic / Google.
`OutputModeTools` — encode the schema as a *Harbor-side prompted* envelope where the LLM is asked to emit `{"name":"respond_with","arguments":{...}}` as plain output. The runtime parses that locally. Used as a fallback for providers without native `json_schema` support.
IMPORTANT: this is NOT a passthrough to provider-native tool-calling APIs (`tools=` / `tool_choice=` / `function_call` / `tool_use`). Harbor's runtime owns tool dispatch (RFC §6.4 / brief 07); `OutputModeTools` is purely a prompted-output technique. The static guard in `scripts/smoke/phase-35.sh` enforces this boundary.
`OutputModePrompted` — coerce `FormatJSONObject` and inline the schema as a system-prompt instruction. The LLM-side parse is "produce a JSON object matching this schema." Default for NIM / custom OpenAI-compatible / deepseek-reasoner.

The downgrade chain runs `current → next` on `IsInvalidJSONSchemaError` failures, bounded at 3 total attempts (initial + 2 downgrades).

const (
	// OutputModeUnset is the zero value — operator did not declare the
	// mode. The downgrade wrapper applies the per-model-prefix default
	// (see `internal/llm/corrections.DefaultOutputModeFor`).
	OutputModeUnset OutputMode = ""
	// OutputModeNative — pass `FormatJSONSchema` through. Provider
	// enforces strict schema mode.
	OutputModeNative OutputMode = "native"
	// OutputModeTools — Harbor-side prompted envelope. NOT provider
	// tool-calling APIs.
	OutputModeTools OutputMode = "tools"
	// OutputModePrompted — `FormatJSONObject` + schema in system prompt.
	OutputModePrompted OutputMode = "prompted"
)

type PartType ¶

type PartType string

PartType discriminates a `ContentPart`.

const (
	PartText  PartType = "text"
	PartImage PartType = "image"
	PartAudio PartType = "audio"
	PartFile  PartType = "file"
)

The PartType values, one per multimodal content shape.

type PostureProvider ¶

type PostureProvider struct {
	// contains filtered or unexported fields
}

PostureProvider is the Phase 72g read-only accessor over the runtime's bound LLM configuration. Built once per Runtime process via NewPostureProvider; `Posture` is safe for concurrent use by N goroutines (D-025).

func NewPostureProvider ¶

func NewPostureProvider(cfg ConfigSnapshot) *PostureProvider

NewPostureProvider builds a PostureProvider over the LLM ConfigSnapshot the binary resolved at boot. The provider / model / region are read from the snapshot and frozen at construction; the `MockMode` flag is NOT taken from the snapshot — it is read live (but race-free) from the boot-captured atomic, so the posture surface reflects D-089's single capture-path source.

When the snapshot's `Driver` field is empty it is normalised to `DefaultDriver` ("bifrost") — the same default `Open` applies — so the posture surface never reports an empty provider for a default-driver boot.

func (*PostureProvider) Posture ¶

func (p *PostureProvider) Posture(_ context.Context) (PostureSnapshot, error)

Posture returns the read-only PostureSnapshot of the runtime's bound LLM provider for the caller. The `ctx` is accepted for signature symmetry with `governance.PostureProvider.Posture` and so a future per-tenant LLM-routing model can scope the read; V1 ships a single provider per Harbor instance (RFC §6.15 + D-088), so the snapshot is identity-independent at this layer — the Protocol handler is the identity-mandatory gate.

`MockMode` is read from the boot-captured atomic (D-089) — NOT from an `os.Getenv` re-read.

type PostureReadAdminPayload ¶

type PostureReadAdminPayload struct {
	events.SafeSealed
	// Actor is the identity of the admin-scoped caller that performed
	// the cross-tenant read.
	Actor identity.Quadruple
	// RequestedTenant is the tenant_id the caller asked to read — a
	// tenant other than the caller's own.
	RequestedTenant string
}

PostureReadAdminPayload is the typed payload for EventTypePostureReadAdmin (Phase 72g). SafePayload — the actor's identity and the requested tenant are operator-visible audit metadata, not secret-shaped. NEVER carries provider API keys — the posture surface reports provider/model/region only. The payload runs through the audit Redactor before the bus publish (CLAUDE.md §7).

type PostureSnapshot ¶

type PostureSnapshot struct {
	// Provider is the LLM provider name (e.g. "bifrost", "mock").
	Provider string
	// Model is the bound model identifier (e.g. "openai/gpt-5.3-chat").
	Model string
	// Region is the provider endpoint region; "" when not applicable.
	Region string
	// MockMode is true iff the runtime booted with HARBOR_DEV_ALLOW_MOCK=1
	// (D-089). Captured at boot via RegisterMockModeCaptured.
	MockMode bool
}

PostureSnapshot is the read-only view of the runtime's bound LLM provider. It is the source the `llm.posture` Protocol handler projects onto the `LLMPostureResponse` wire type.

type ReasoningEffort ¶

type ReasoningEffort string

ReasoningEffort hints at provider-side thinking budget. Empty string means "use provider default" (DO NOT touch the request).

const (
	ReasoningOff    ReasoningEffort = "off"
	ReasoningLow    ReasoningEffort = "low"
	ReasoningMedium ReasoningEffort = "medium"
	ReasoningHigh   ReasoningEffort = "high"
)

The ReasoningEffort levels, ascending. The empty string (not listed here) means "use the provider default".

type ReasoningRouting ¶

type ReasoningRouting string

ReasoningRouting enumerates the `ReasoningEffort` routing modes the Phase 34 corrections layer supports. Operator-set in `ModelProfile.Corrections.ReasoningEffortRouting`.

const (
	// ReasoningRouteDefault passes the top-level
	// `req.ReasoningEffort` through to the driver unchanged.
	// Bifrost's `ChatReasoning.Effort` field consumes it.
	ReasoningRouteDefault ReasoningRouting = ""
	// ReasoningRouteThinking moves the effort hint from the
	// top-level field into `req.Extra["reasoning_effort"]`.
	// Thinking-class models (`o1`, `o3`, `deepseek-reasoner`)
	// interpret the hint via a provider-specific path that bifrost
	// passes through opaquely. The top-level field is cleared so the
	// regular reasoning channel is not used.
	ReasoningRouteThinking ReasoningRouting = "thinking_model"
)

type ResponseFormat ¶

type ResponseFormat struct {
	Kind       ResponseFormatKind
	JSONSchema json.RawMessage
}

ResponseFormat is the optional structured-output hint on `CompleteRequest`. `nil` means "plain text" (equivalent to `Kind: FormatText`).

Phase 35 owns the per-provider downgrade chain `json_schema → json_object → text` on `invalid_json_schema` errors; Phase 32 stores the field and the safety-net pass treats the JSON schema bytes as opaque metadata (no token-estimate contribution).

type ResponseFormatKind ¶

type ResponseFormatKind string

ResponseFormatKind discriminates a `ResponseFormat`.

const (
	// FormatText — no structured-output constraint. Default when
	// `CompleteRequest.ResponseFormat` is nil.
	FormatText ResponseFormatKind = "text"
	// FormatJSONObject — provider's "JSON mode" (free-form JSON).
	FormatJSONObject ResponseFormatKind = "json_object"
	// FormatJSONSchema — caller-supplied JSON Schema (strict mode
	// when the provider exposes it).
	FormatJSONSchema ResponseFormatKind = "json_schema"
)

type ResponseFormatProfile ¶

type ResponseFormatProfile string

ResponseFormatProfile enumerates the `response_format` envelope shapes the Phase 34 corrections layer can emit. Operator-set in `ModelProfile.Corrections.ResponseFormatShape`.

const (
	// ResponseFormatOpenAI emits the OpenAI envelope —
	// `{"type":"json_object"}` for `FormatJSONObject` and
	// `{"type":"json_schema","json_schema":{...}}` for
	// `FormatJSONSchema`. This is the default; bifrost's
	// `translateResponseFormat` already produces this shape, so a
	// `default`-profile model is a no-op in the corrections layer.
	ResponseFormatOpenAI ResponseFormatProfile = ""
	// ResponseFormatJSONOnly downgrades `FormatJSONSchema` to
	// `FormatJSONObject`. Used for providers that don't support
	// `json_schema` natively (e.g. some OpenRouter routes); the
	// schema is preserved as `Extra["schema_hint"]` so a prompted
	// fallback can reference it.
	ResponseFormatJSONOnly ResponseFormatProfile = "json_only"
	// ResponseFormatAnthropic packages the schema into Anthropic's
	// tool-schema-style envelope, surfaced in
	// `req.Extra["anthropic_tool_schema"]`. Phase 33's bifrost
	// driver passes `Extra` opaquely; the Anthropic provider
	// converter consumes the key (or future Phase 35 logic does).
	ResponseFormatAnthropic ResponseFormatProfile = "anthropic"
)

type RetryWithFeedbackPayload ¶

type RetryWithFeedbackPayload struct {
	events.SafeSealed
	Identity   identity.Quadruple
	Model      string
	Attempt    int
	MaxRetries int
	Reason     string
	OccurredAt time.Time
}

RetryWithFeedbackPayload (Phase 36) is the typed payload for EventTypeRetryWithFeedback. SafePayload — `Attempt` is the 1-based retry index (1 = first re-ask after the original); `Reason` is the validator's truncated `Error()` string. The wrapper truncates Reason at 256 characters to keep audit payloads bounded.

type Role ¶

type Role string

Role is the chat-message role. Settled at the four canonical values; `RoleTool` is the in-Harbor convention for the user-role rendering of tool observations (brief 07 §5 — the rendering itself happens at `ObservationRenderer`, not here; this constant exists so callers that construct an explicit user-message describing a tool result can label it for clarity).

const (
	RoleSystem    Role = "system"
	RoleUser      Role = "user"
	RoleAssistant Role = "assistant"
	// RoleTool — semantically a user-role observation; reserved so
	// downstream tooling (Console traces, audit logs) can distinguish.
	RoleTool Role = "tool"
)

The Role values for a chat message.

type SchemaSanitizationMode ¶

type SchemaSanitizationMode string

SchemaSanitizationMode enumerates the JSON-Schema-mutation modes the Phase 34 `SchemaSanitizer` supports. Operator-set in `ModelProfile.Corrections.SchemaMode`.

const (
	// SchemaDefault passes the operator-supplied schema through
	// unchanged.
	SchemaDefault SchemaSanitizationMode = ""
	// SchemaOpenAIStrict adds `additionalProperties:false` and
	// `strict:true` at every nested object schema. OpenAI's
	// structured-output mode requires both fields; most schemas
	// produced by `tools.RegisterFunc[I, O]` omit them.
	SchemaOpenAIStrict SchemaSanitizationMode = "openai_strict"
	// SchemaPermissive strips `additionalProperties` and `strict`
	// fields wherever they appear. Some providers reject those keys.
	SchemaPermissive SchemaSanitizationMode = "permissive"
)

type StubFetch ¶

type StubFetch struct {
	Tool string `json:"tool"`
	ID   string `json:"id"`
}

StubFetch is the optional pointer-to-tool hint on an `ArtifactStub`. When set, an LLM that wants the bytes knows which Harbor tool to call (and with which artifact ID).

type ToolCallStructured ¶ added in v1.2.0

type ToolCallStructured struct {
	ID    string
	Name  string
	Args  json.RawMessage
	Index uint16
}

ToolCallStructured is a provider-validated tool-call entry (Phase 107c / D-167). Carries the provider-assigned call ID (round-trips on `ChatMessage.ToolCallID` when the result is threaded back into the next turn), the tool name (matches `tools.Tool.Name`), and provider-validated JSON args.

`Index` is the 0-based position of this tool call within the response and is the load-bearing discriminator for streaming-delta assembly. Per the OpenAI streaming spec, tool-call args arrive across multiple SSE chunks: the first delta carries `ID` and `Name`; subsequent deltas for the same tool call carry an empty ID, a null Name, and an args fragment to be concatenated onto the prior args. The drivers key on `Index` to merge fragments correctly; without it, providers such as Amazon Bedrock (which streams args one short fragment at a time) produce a trajectory full of half-built ToolCalls. `Index` defaults to 0 for non-streaming responses and tests; the driver layer is the source of truth.
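The Index-keyed merge can be sketched as below. This is an illustrative assumption about how a driver might assemble deltas, not Harbor's driver code: the `delta` and `toolCall` types and the `assemble` function are hypothetical, and args are held as plain strings for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// delta is a hypothetical streaming fragment. Only the first delta for a
// given Index carries ID and Name; later ones carry args fragments.
type delta struct {
	Index uint16
	ID    string
	Name  string
	Args  string
}

// toolCall is a simplified stand-in for ToolCallStructured.
type toolCall struct {
	ID    string
	Name  string
	Args  string
	Index uint16
}

// assemble merges fragments by Index and returns the calls in Index order.
func assemble(deltas []delta) []toolCall {
	byIndex := map[uint16]*toolCall{}
	for _, d := range deltas {
		tc, ok := byIndex[d.Index]
		if !ok {
			tc = &toolCall{Index: d.Index}
			byIndex[d.Index] = tc
		}
		if d.ID != "" {
			tc.ID = d.ID
		}
		if d.Name != "" {
			tc.Name = d.Name
		}
		tc.Args += d.Args // concatenate onto the prior args
	}
	out := make([]toolCall, 0, len(byIndex))
	for _, tc := range byIndex {
		out = append(out, *tc)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Index < out[j].Index })
	return out
}

func main() {
	// Two tool calls whose fragments arrive interleaved.
	calls := assemble([]delta{
		{Index: 0, ID: "call_a", Name: "search", Args: `{"q":`},
		{Index: 1, ID: "call_b", Name: "fetch", Args: `{"id"`},
		{Index: 0, Args: `"go"}`},
		{Index: 1, Args: `:7}`},
	})
	for _, c := range calls {
		fmt.Println(c.Index, c.ID, c.Name, c.Args)
	}
}
```

Keying on the ID instead would fail here, because the follow-up fragments carry an empty ID and could not be matched to their call.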

type ToolDeclaration ¶ added in v1.2.0

type ToolDeclaration struct {
	Name        string
	Description string
	Schema      json.RawMessage
}

ToolDeclaration is the per-turn tool declarator the LLM sees (Phase 107c / D-167). Carries the tool name, operator-facing description, and the args JSON Schema.

type Usage ¶

type Usage struct {
	PromptTokens     int
	CompletionTokens int
	ReasoningTokens  int
	TotalTokens      int
	LatencyMS        int64
	// ProviderExtras — opaque provider-specific bag (e.g. cache
	// hit/miss). Phase 32 does not interpret these fields; Phase 34+
	// may read them for correction-layer decisions.
	ProviderExtras map[string]string
}

Usage is the provider-reported token usage.

Directories ¶

Path	Synopsis
corrections	Package corrections is Harbor's provider correction layer (Phase 34, RFC §6.5).
drivers
drivers/bifrost	Package bifrost is Harbor's bifrost-backed LLM driver.
drivers/mock	Package mock is Harbor's test-grade LLM driver.
output	Package output is Harbor's structured-output strategy + downgrade chain (Phase 35, RFC §6.5).
retry	Package retry is Harbor's retry-with-feedback wrapper (Phase 36, RFC §6.5).
summarizer	Package summarizer is the home of Harbor's production LLM-backed summarisation: the memory-subsystem `memory.Summarizer` (Phase 64, D-089) and the planner-trajectory `planner.Summariser` (Phase 111e, D-202).
