Overview
Package harbortest is Harbor's public test kit — the ergonomic authoring surface for flow-level agent tests.
The kit exposes five entry points (RFC §6.13, brief 06 §3 "What the test kit gives authors"):
- RunOnce(ctx, agent, input, deps...) (Output, *EventLog, error) runs an Agent once under a deterministic identity quadruple (or one the caller supplies) and returns every event the run emitted.
- EventLog.RecordedEvents(runID) []events.Event returns the events for one RunID; EventLog.All() returns the full capture.
- AssertSequence(t, log, want) verifies that a list of EventTypes appears as an ordered subsequence of the captured log.
- AssertNoLeaks(t, log) verifies that no event tagged with one identity triple references a RunID owned by a different identity triple. It is the public, ergonomic form of Harbor's cross-tenant/session isolation contract (CLAUDE.md §6).
- SimulateFailure(injector, toolName, class, n) schedules the next n invocations of a wrapped tool to fail with the given error class.
The kit composes REAL drivers — the production in-mem events bus, the patterns audit redactor, the canonical tool catalog. No stub LLM ships here; CLAUDE.md §13 ("Test stubs as production defaults on operator-facing seams") rules out shipping a stub-by-default kit, even on a test-only surface. The Agent the caller provides is whatever they want exercised — a planner, a flow, a hand-rolled function. Wiring a mock LLM for the Agent's interior is the test author's concern.
Concurrent reuse contract (CLAUDE.md §5, D-025). A constructed EventLog is concurrent-safe; RunOnce builds a fresh EventLog per invocation and is itself safe to call from N concurrent goroutines. The FaultInjector serialises access to its counter map so SimulateFailure can be called from any goroutine.
The package lives at the top level (not under internal/) because the kit is meant to be imported by consumer test code OUTSIDE this module, which Go's `internal/` rule would forbid. The CLAUDE.md §3 layout documents the new top-level harbortest/ directory. `golang.org/x/tools/go/analysis/analysistest` follows the same rationale: it sits outside any `internal/` directory so analyzer authors in other modules can import it. The package name `harbortest` also makes any import from production code grep-visible.
External usage (RFC §3.6 item 5, Phase 112b / D-206). The kit's signatures use internal types, but the public `sdk/` facade re-exports every one of them as a type alias. Because an alias IS the internal type, external modules can supply every parameter:
- Deps.Bus: open one via sdk/events.Open (redactor from sdk/audit.Open; blank-import sdk/drivers/prod for the drivers).
- Deps.Redactor: sdk/audit.Open(ctx, config.AuditConfig{Driver: "patterns"}).
- Deps.Identity: &identity.Identity{...} via sdk/identity.
- AssertSequence's []events.EventType: sdk/events.EventType values (register custom types via sdk/events.RegisterEventType).
- NewFaultInjector's tools.ToolCatalog: sdk/tools.NewCatalog (+ sdk/tools/inproc.RegisterFunc); SimulateFailure's ErrorClass values are re-exported on sdk/tools.
An external Agent under RunOnce reads its identity via sdk/identity.MustQuadrupleFrom(ctx) and publishes events via sdk/events.MustFrom(ctx) — the captured EventLog observes them like any in-module producer. scripts/smoke/phase-112b.sh runs exactly this shape as an external module on every preflight.
Index
- Variables
- func AssertNoLeaks(t TestingT, log *EventLog) bool
- func AssertSequence(t TestingT, log *EventLog, want []events.EventType) bool
- func SimulateFailure(f *FaultInjector, toolName string, class tools.ErrorClass, n int)
- type Agent
- type AgentFunc
- type Deps
- type EventLog
- type FaultInjector
- type TestingT
Constants
This section is empty.
Variables
var (
	// ErrNilAgent: RunOnce was called with a nil Agent.
	ErrNilAgent = errors.New("harbortest: RunOnce called with nil Agent")

	// ErrStackConstruction: a required component (bus, redactor)
	// could not be constructed. The wrapped error names the failing
	// component. RunOnce fails loudly per CLAUDE.md §5.
	ErrStackConstruction = errors.New("harbortest: failed to construct test stack")
)
Sentinel errors. Callers compare via errors.Is.
var ErrSimulatedFailure = errors.New("harbortest: simulated tool failure")
ErrSimulatedFailure is the sentinel a SimulateFailure-triggered error wraps. Callers compare via errors.Is to distinguish kit-injected failures from genuine tool errors.
Functions
func AssertNoLeaks
func AssertNoLeaks(t TestingT, log *EventLog) bool
AssertNoLeaks verifies the cross-tenant / cross-session isolation contract over the captured EventLog. Specifically:
- Group the events by their identity triple (TenantID, UserID, SessionID).
- Collect the set of RunIDs each triple owns.
- For every event, the event's RunID MUST be owned by the event's own identity triple — an event tagged with triple A whose RunID belongs to triple B is a leak (run-id cross-talk).
- Additionally, the event's payload MAY embed an identity quadruple (RFC §6.4 tool payloads, planner payloads); if it does, the embedded triple MUST match the event's outer identity triple. A payload triple from a different identity is also a leak (payload cross-talk — the worst kind, since it means the producer captured a foreign identity).
The kit's RunOnce subscribes with Admin scope, so the log observes events across identity triples; that cross-triple view is the data this assertion analyses. Test authors who share a single Deps.Bus across multiple RunOnce invocations get a run-isolation regression test for free: pass the resulting union log through AssertNoLeaks at the end of the test.
Returns true on success; on failure calls t.Errorf naming the offending event(s) and their identity context, then returns false.
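The grouping rules above can be sketched over a cut-down event model. `Triple`, `Event`, and `findLeaks` are hypothetical. In particular, the list above does not say how a RunID seen under two triples is attributed, so this sketch assumes the first triple observed emitting a RunID owns it and reports every later mismatch.

```go
package main

import "fmt"

// Triple is the (TenantID, UserID, SessionID) identity key.
type Triple struct{ Tenant, User, Session string }

// Event is a hypothetical, simplified stand-in for events.Event.
type Event struct {
	Identity Triple
	RunID    string
	Payload  *Triple // optional identity embedded in the payload
}

// findLeaks describes every leaking event in log.
// Assumption: a RunID is owned by the first triple observed emitting it.
func findLeaks(log []Event) []string {
	owner := make(map[string]Triple)
	for _, ev := range log {
		if _, ok := owner[ev.RunID]; !ok {
			owner[ev.RunID] = ev.Identity
		}
	}
	var leaks []string
	for i, ev := range log {
		// Run-id cross-talk: the RunID belongs to a different triple.
		if o := owner[ev.RunID]; o != ev.Identity {
			leaks = append(leaks, fmt.Sprintf("event %d: run %q owned by %v, tagged %v", i, ev.RunID, o, ev.Identity))
		}
		// Payload cross-talk: the embedded triple differs from the outer one.
		if ev.Payload != nil && *ev.Payload != ev.Identity {
			leaks = append(leaks, fmt.Sprintf("event %d: payload triple %v != outer %v", i, *ev.Payload, ev.Identity))
		}
	}
	return leaks
}

func main() {
	a := Triple{"t1", "u1", "s1"}
	b := Triple{"t2", "u2", "s2"}
	log := []Event{
		{Identity: a, RunID: "run-a"},
		{Identity: b, RunID: "run-b"},
		{Identity: b, RunID: "run-a"},              // run-id cross-talk
		{Identity: a, RunID: "run-a", Payload: &b}, // payload cross-talk
	}
	for _, l := range findLeaks(log) {
		fmt.Println(l)
	}
}
```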
func AssertSequence
func AssertSequence(t TestingT, log *EventLog, want []events.EventType) bool
AssertSequence verifies that `want` appears as an ordered subsequence of the captured event types in log. The captured log may contain ADDITIONAL events between matches; only the order of `want` is checked. This is the right semantics for flow-level tests, where bus-internal events (audit.admin_scope_used, bus.dropped, etc.) may interleave with the agent's emits and the test author cares only about the order of the meaningful types.
Returns true on a successful match. On failure, calls t.Errorf naming the first missing want entry and listing the captured event-type sequence so the diff is actionable, then returns false.
Empty `want` always matches (vacuously). A captured log shorter than `want` always fails.
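Ordered-subsequence matching can be done in a single greedy pass, which is sufficient because taking the earliest match for each `want` entry never rules out a later one. The sketch below is an illustration under assumptions: `EventType` is a local string stand-in, `firstMissing` is a hypothetical helper, and the `tool.*` type names are invented for the example.

```go
package main

import "fmt"

// EventType is a hypothetical stand-in for events.EventType.
type EventType string

// firstMissing greedily matches want against got in order. It returns the
// index of the first want entry that could not be matched, or -1 when all
// of want appears as an ordered subsequence of got.
func firstMissing(got, want []EventType) int {
	j := 0
	for _, g := range got {
		if j < len(want) && g == want[j] {
			j++
		}
	}
	if j == len(want) {
		return -1
	}
	return j
}

func main() {
	got := []EventType{"audit.admin_scope_used", "tool.called", "bus.dropped", "tool.returned"}
	fmt.Println(firstMissing(got, []EventType{"tool.called", "tool.returned"}))
	fmt.Println(firstMissing(got, []EventType{"tool.returned", "tool.called"}))
	fmt.Println(firstMissing(got, nil)) // empty want matches vacuously
}
```

Returning the index of the first unmatched entry gives exactly what the failure message needs: the missing `want` item, printed next to the captured sequence.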
func SimulateFailure
func SimulateFailure(f *FaultInjector, toolName string, class tools.ErrorClass, n int)
SimulateFailure schedules the next n invocations of toolName to fail with the given error class. Subsequent invocations (after the n failures pop) resume normal behaviour. Calling SimulateFailure twice on the same tool stacks the counters in FIFO order: a (transient, 2) then a (permanent, 1) yields two transient failures followed by one permanent failure followed by success.
n must be positive; calls with n <= 0 are silent no-ops (the caller's intent is unclear, so the kit takes the conservative reading).
Per the test-only nature of this surface, SimulateFailure does NOT validate that toolName is a registered tool — the test author may schedule failures on a tool name they're about to register. A tool name with no scheduled failures behaves identically to a tool never wrapped at all.
Types
type Agent
type Agent interface {
	// Run executes the agent's logic against input and returns the
	// produced output (or an error). The captured EventLog is built
	// from the events the agent's interior publishes against the
	// kit's event bus; the Agent does not return the log directly.
	Run(ctx context.Context, input any) (output any, err error)
}
Agent is the unit of code a test author exercises via RunOnce. Implementations are typically thin wrappers around the test author's production code path — a planner step, a flow, a hand-rolled function — anything that consumes a Harbor identity context and produces an output value.
The signature is intentionally narrower than the engine's full runtime surface: most test authors do not own engine graphs and should not be forced to construct them just to exercise a tool call. Agents that DO want full runtime semantics can construct an engine internally and Run it inside the Agent body — the kit's identity context flows through.
Identity propagation. The ctx passed to Run carries the identity quadruple (TenantID, UserID, SessionID, RunID) via the internal/identity helpers; Agents that need it use identity.MustQuadrupleFrom(ctx). Tool drivers and bus publishers automatically attach the triple to events emitted during the run.
type AgentFunc
AgentFunc adapts a plain function into an Agent. Use this when the test author's code is a top-level function rather than a method on a typed receiver.
type Deps
type Deps struct {
	// Bus is the canonical EventBus the kit subscribes to. When nil,
	// RunOnce opens a fresh in-mem bus AND closes it before
	// returning. When non-nil, the caller owns the bus's lifecycle.
	Bus events.EventBus

	// Redactor is the audit redactor RunOnce uses if it has to open
	// its own bus (Bus == nil). When Bus is non-nil this field is
	// ignored; the bus already has its redactor configured.
	Redactor audit.Redactor

	// Identity overrides the deterministic canonical identity. When
	// nil, RunOnce uses the canonical "harbortest" triple.
	Identity *identity.Identity

	// RunID overrides the auto-generated RunID. When empty, RunOnce
	// synthesises a fresh ULID-ish RunID via runIDFromCounter so
	// concurrent invocations don't collide.
	RunID string
}
Deps carries optional dependencies for RunOnce. The zero-value Deps causes RunOnce to synthesise deterministic defaults (canonical identity triple, fresh in-mem bus + redactor, fresh RunID). Callers that need cross-RunOnce coordination — e.g. a shared bus that captures events from multiple agents — pass an explicit Deps with the shared component set.
Identity and RunID. When Identity is nil the kit uses a canonical "harbortest" triple (TenantID="harbortest", UserID="harbortest", SessionID="harbortest"). A nil Identity AND a non-empty RunID is honoured — the RunID is paired with the canonical identity. When the caller supplies an Identity, it must validate (non-empty triple) or RunOnce returns a wrapped identity.ErrIdentityIncomplete.
type EventLog
type EventLog struct {
	// contains filtered or unexported fields
}
EventLog is the captured event stream from one RunOnce invocation (or, when the caller shares a Deps.Bus across runs, the union of every event published during the lifetime of the captured subscription). Events are appended in the order they arrive from the bus subscription channel — Harbor's bus assigns monotonic Sequence numbers, so the slice order matches Sequence order within a single bus.
Concurrent reuse contract (D-025). EventLog serialises appends behind an internal mutex; All() returns a defensive copy so callers may iterate while the producer side is still publishing. Readers and writers may run concurrently.
func RunOnce
RunOnce executes agent.Run under a deterministic identity quadruple (or one the caller supplies via deps), captures every event the run emits onto the kit's event bus, and returns the agent's output, the captured *EventLog, and any error.
The captured EventLog is built by subscribing to the bus with an Admin filter — that's the only way the kit can observe events across identity triples, which is what AssertNoLeaks needs to do its job. The Admin subscription causes the bus to emit one audit.admin_scope_used event (which itself appears in the log, since the subscription IS the admin caller); test authors should expect to see that event present alongside their agent's emits.
Stack construction. When deps.Bus is nil, RunOnce opens a fresh in-mem bus AND closes it before returning. Construction errors fail loudly (CLAUDE.md §5): a component that cannot be constructed surfaces as ErrStackConstruction (match it with errors.Is), with the failing component named in the error.
Identity. RunOnce calls identity.WithRun on the supplied (or canonical) identity + RunID and passes the resulting ctx to agent.Run. The bus subscription closes BEFORE RunOnce returns so subscription goroutines do not leak past the call.
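The close-before-return guarantee typically comes from draining the subscription in a goroutine and waiting for that goroutine to exit. In this sketch a plain channel stands in for the bus subscription and the agent publishes synchronously; `capture` and `runOnce` are hypothetical and ignore any asynchronous delivery a real bus may have.

```go
package main

import "fmt"

// capture drains sub in a background goroutine. The returned stop function
// closes the subscription and waits for the goroutine to exit, so the
// goroutine never outlives the call that started it.
func capture(sub chan string) (stop func() []string) {
	var got []string
	done := make(chan struct{})
	go func() {
		defer close(done)
		for ev := range sub {
			got = append(got, ev)
		}
	}()
	return func() []string {
		close(sub)
		<-done // happens after every append, so reading got is race-free
		return got
	}
}

// runOnce is a hypothetical reduction of RunOnce: subscribe, run the agent,
// then close the subscription before returning.
func runOnce(agent func(publish func(string))) []string {
	sub := make(chan string, 16)
	stop := capture(sub)
	agent(func(ev string) { sub <- ev })
	return stop()
}

func main() {
	events := runOnce(func(publish func(string)) {
		publish("agent.started")
		publish("agent.finished")
	})
	fmt.Println(events)
}
```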
Concurrent reuse (D-025). RunOnce is safe to call from N concurrent goroutines. Each invocation builds its own EventLog and its own subscription; the package-level runCounter ensures RunIDs do not collide even when concurrent callers omit RunID. A caller sharing a Deps.Bus across goroutines is responsible for the bus's lifetime; RunOnce never closes a caller-supplied bus.
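Collision-free RunIDs under concurrency can come from an atomic package-level counter. The sketch below assumes Go 1.19+ for `atomic.Uint64`; `nextRunID` and its `harbortest-run-NNNNNN` format are hypothetical and simpler than the ULID-ish format `runIDFromCounter` produces.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runCounter is a hypothetical package-level counter shared by all callers.
var runCounter atomic.Uint64

// nextRunID returns a distinct RunID on every call, even across goroutines,
// because Add is a single atomic read-modify-write.
func nextRunID() string {
	return fmt.Sprintf("harbortest-run-%06d", runCounter.Add(1))
}

func main() {
	var (
		mu   sync.Mutex
		seen = make(map[string]bool)
		wg   sync.WaitGroup
	)
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			id := nextRunID()
			mu.Lock()
			seen[id] = true
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println(len(seen)) // number of distinct IDs generated
}
```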
func (*EventLog) All
All returns a defensive copy of the captured events in arrival order. The slice is safe to retain across goroutines; mutating the slice does not affect the EventLog.
func (*EventLog) Len
Len returns the number of captured events. Equivalent to len(l.All()) but avoids the defensive copy when callers only need the count.
func (*EventLog) RecordedEvents
RecordedEvents returns the subset of captured events whose Identity.RunID equals runID, in arrival order. Returns an empty slice (never nil) when no event matches — callers iterate without a nil-check.
This is the test author's hook into "what did THIS run emit?" when one EventLog spans multiple runs (the typical case when a test author shares a Deps.Bus across several RunOnce invocations to verify isolation behaviour).
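Filtering by RunID while preserving arrival order, and returning an empty non-nil slice on no match, can be sketched as follows. `Event` here is a simplified stand-in that carries RunID directly rather than through an Identity field, and `recordedEvents` is a hypothetical free function.

```go
package main

import "fmt"

// Event is a simplified, hypothetical stand-in for events.Event.
type Event struct {
	Type  string
	RunID string
}

// recordedEvents keeps arrival order and returns a non-nil empty slice
// when nothing matches.
func recordedEvents(all []Event, runID string) []Event {
	out := make([]Event, 0)
	for _, ev := range all {
		if ev.RunID == runID {
			out = append(out, ev)
		}
	}
	return out
}

func main() {
	all := []Event{{"start", "r1"}, {"start", "r2"}, {"end", "r1"}}
	fmt.Println(recordedEvents(all, "r1"))
	fmt.Println(recordedEvents(all, "r3") != nil)
}
```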
type FaultInjector
type FaultInjector struct {
	// contains filtered or unexported fields
}
FaultInjector wraps a tools.ToolCatalog so the kit can schedule per-tool failures before exercising an Agent. The injector is transparent on the read path — Resolve returns a ToolDescriptor whose Invoke closure pops a counter on each call; when the counter is exhausted the original descriptor's Invoke runs unmodified.
Concurrent reuse (D-025). The injector serialises access to its counter map behind a mutex; counter updates are atomic per tool. Resolve / List are safe for N concurrent goroutines.
func NewFaultInjector
func NewFaultInjector(cat tools.ToolCatalog) *FaultInjector
NewFaultInjector wraps cat. Subsequent calls to inj.Catalog() return a tools.ToolCatalog that participates in the kit's failure-injection mechanism. The wrapped catalog's Register + List + Resolve all defer to cat; only Resolve adds the injection-counter wrapper around the returned descriptor's Invoke.
NewFaultInjector panics with a clear message if cat is nil — a nil catalog at the kit boundary is a test-author bug, not a production fail-loud concern (CLAUDE.md §5 forbids panic in production code; this surface is test-only).
func (*FaultInjector) Catalog
func (f *FaultInjector) Catalog() tools.ToolCatalog
Catalog returns the wrapped ToolCatalog. Pass this to the Agent's interior (typically via tools.WithCatalog on the run ctx) so the injector intercepts every tool resolution.
type TestingT
type TestingT interface {
	// Helper marks the calling function as a helper so the test
	// runtime's failure reporting points at the call site, not at
	// the helper internals.
	Helper()

	// Errorf reports a non-fatal test failure. The assertion
	// helpers in this kit use Errorf (not Fatalf) so multiple
	// failures in one test surface together; this matches the
	// convention every Harbor integration test uses.
	Errorf(format string, args ...any)
}
TestingT is the subset of *testing.T the kit's assertions need. Mirroring the standard testing.TB-ish shape lets callers pass either *testing.T or *testing.B; the kit's own self-tests use *testing.T directly.
The kit deliberately avoids the full testing.TB interface because TB is sealed by the Go standard library — callers that want to drive the assertions from non-test code (e.g. a programmatic verification harness) can implement TestingT themselves with their own Helper / Errorf semantics.
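A minimal custom TestingT for a non-test harness might simply record failures. `recorder` and `assertEqual` are hypothetical; `assertEqual` only mirrors the documented convention of calling Errorf and then returning false.

```go
package main

import (
	"fmt"
	"strings"
)

// TestingT mirrors the kit's two-method interface.
type TestingT interface {
	Helper()
	Errorf(format string, args ...any)
}

// recorder is a hypothetical TestingT for non-test harnesses: it collects
// failure messages instead of failing a *testing.T.
type recorder struct{ failures []string }

func (r *recorder) Helper() {} // no call-site attribution outside `go test`

func (r *recorder) Errorf(format string, args ...any) {
	r.failures = append(r.failures, fmt.Sprintf(format, args...))
}

// assertEqual is a toy assertion written against TestingT.
func assertEqual(t TestingT, got, want string) bool {
	t.Helper()
	if got != want {
		t.Errorf("got %q, want %q", got, want)
		return false
	}
	return true
}

func main() {
	r := &recorder{}
	assertEqual(r, "a", "a")
	assertEqual(r, "a", "b")
	fmt.Println(strings.Join(r.failures, "\n"))
}
```

The same `assertEqual` accepts a *testing.T unchanged inside a real test, since *testing.T has both methods.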