conformance

package
v1.3.1
Published: Jun 11, 2026 License: Apache-2.0 Imports: 19 Imported by: 0

Documentation

Overview

Package conformance ships the planner conformance pack.

Phase 42 landed the harness shape (the Harness struct + the Run(t, factory) entry point + the §13 import-graph lint test). Phase 49 fills in every scenario body — the top-prompt LLM-round-trip set, the malformed-LLM-output salvage path, the CallParallel atomicity check, the load-bearing wake-mode round-trip (D-032 — binding), the budget-aware finish, the pause-payload bounds, the steering drain-between-steps, and the D-025 concurrent-reuse surface.

The conformance pack is a shared test asset: every concrete `Planner` (Phase 45 ReAct, Phase 48 Deterministic, and every future concrete on the same interface) calls Run against the same scenarios. The pack itself never imports a concrete-planner package; the `internal/planner/conformance.TestImportGraph_PlannerDoesNotImportRuntime` lint test walks the planner subtree and would fail otherwise.

Per-concrete consumption pattern (Phase 49+):

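func TestReact_Conformance(t *testing.T) {
    // Canned per-scenario LLM responses shipped with the pack.
    content := conformance.DefaultReactContentMap()
    conformance.Run(t, func() conformance.Harness {
        return conformance.Harness{
            // Default planner for scenarios that need no tailored setup.
            Factory: func() planner.Planner {
                return react.New(mock.New(mock.Options{
                    SyntheticContent: scenarioContent,
                }))
            },
            // Tailored planner: the mock emits the entry for this scenario.
            ScenarioFactory: func(s conformance.ScenarioName) planner.Planner {
                return react.New(mock.New(mock.Options{
                    SyntheticContent: content[s],
                }))
            },
            WakeMode:          planner.WakePush,
            RunContextFactory: conformance.DefaultRunContext,
            Capabilities:      conformance.CapabilitySetReAct,
        }
    })
}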

The harness factory pattern matches the events / tools / tasks conformance suites: each subtest gets a fresh planner instance so internal state can't bleed between scenarios. The planner returned by the harness's `Factory` closure must also be safe under D-025 concurrent reuse, because the D-025 scenario runs N=64 concurrent Next calls against one shared instance.
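To make the concurrent-reuse requirement concrete, here is a minimal sketch of what a check with 64 simultaneous calls against one shared instance could look like. The `Planner` interface, its `Next(step int)` signature, `countingPlanner`, and `concurrentReuse` are all hypothetical stand-ins; the real `planner.Planner` signature is not shown on this page, and the pack's own scenario may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Planner is a hypothetical stand-in for planner.Planner; the real
// interface and its Next signature are not shown in this document.
type Planner interface {
	Next(step int) (string, error)
}

// countingPlanner keeps its only mutable state in an atomic counter,
// so one instance can serve many goroutines at once.
type countingPlanner struct {
	calls atomic.Int64
}

func (p *countingPlanner) Next(step int) (string, error) {
	p.calls.Add(1)
	return fmt.Sprintf("finish-%d", step), nil
}

// concurrentReuse fires n simultaneous Next calls at one shared
// planner and returns how many of them failed.
func concurrentReuse(p Planner, n int) int64 {
	var failures atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(step int) {
			defer wg.Done()
			if _, err := p.Next(step); err != nil {
				failures.Add(1)
			}
		}(i)
	}
	wg.Wait()
	return failures.Load()
}

func main() {
	p := &countingPlanner{}
	failures := concurrentReuse(p, 64)
	fmt.Println("failures:", failures, "calls:", p.calls.Load())
}
```

Running a check of this shape under `go test -race` is the usual way to surface unsynchronised state in a shared instance.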

Index

Constants

CapabilitySetDeterministic is the canonical capability set for Phase 48's deterministic planner. It differs from ReAct in three ways: it uses no LLM (Deterministic is programmatic), it can emit Pause (via PauseStep), and it supports the wake-mode poll round-trip.

CapabilitySetReAct is the canonical capability set for Phase 45's LLM-driven ReAct planner.

Variables

This section is empty.

Functions

func DefaultRunContext

func DefaultRunContext() planner.RunContext

DefaultRunContext is a convenience factory the per-concrete tests can pass as `RunContextFactory`. Stamps a populated identity quadruple + a non-empty goal. Concretes that need extra fields (Trajectory, Catalog, etc.) typically build their own factory; this shape covers the Sanity + most scenario subtests.

func Run

func Run(t *testing.T, factoryFunc func() Harness)

Run executes the conformance pack against the planner produced by `factoryFunc`. Phase 49 fills every scenario; the Sanity skeleton scenarios from Phase 42 are preserved verbatim (subtest names are pinned). New scenarios use real drivers at the seam (§17.3 #1).

The factory is called once per subtest so per-scenario planner state can't bleed; the harness's `Cleanup`, when supplied, runs at subtest end.

func SecondStepContent

func SecondStepContent() string

SecondStepContent returns a canned `_finish` envelope used by the wake-mode round-trip scenario's post-resolve Next call. The ScenarioFactory for ReAct supplies a multi-response scripted mock (first response: SpawnTask emission; second: this Finish).

Types

type Capability

type Capability uint32

Capability flags declare which scenarios a concrete planner can execute. The pack honours capability gating so a non-LLM concrete (Deterministic) does not run an LLM-only scenario (e.g. `MalformedLLM_Salvage`).

A concrete planner's per-package conformance test passes a `Capabilities` value built from the constants below; the harness gates each scenario by inspecting the bitmask. A scenario that does NOT match the planner's capabilities reports `t.Skip(...)` with a reason — never a silent skip.

const (
	// CapabilityLLMDriven — the planner uses an LLM client and
	// participates in the LLM-round-trip + malformed-output scenarios.
	// Phase 45 ReAct sets this; Phase 48 Deterministic does not.
	CapabilityLLMDriven Capability = 1 << iota
	// CapabilityCanPause — the planner can emit `RequestPause` under
	// operator configuration; the pause-payload bounds scenario runs.
	// Deterministic sets this via its `PauseStep`; ReAct does not in
	// V1 (Phase 50 wires the planner-side emission path).
	CapabilityCanPause
	// CapabilityWakeRoundTrip — the planner is wired to consume the
	// wake-mode round-trip via real `tasks.TaskRegistry`. Both ReAct
	// (WakePush) and Deterministic (WakePoll) set this in V1.
	CapabilityWakeRoundTrip
	// CapabilityHonoursCancelControl — the planner returns
	// Finish{Cancelled} at the step boundary when
	// `rc.Control.Cancelled` is true. Every concrete in V1 sets this;
	// the cap exists so the steering-drain scenario can fail-loudly
	// if a future concrete forgets the contract.
	CapabilityHonoursCancelControl
)

Capability constants.
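The bitmask gating described above might look like the following sketch. The constants repeat the declaration shown on this page; the `gate` helper and the composition of the `deterministic` set are assumptions made for illustration, since the real value of `CapabilitySetDeterministic` and the harness's internal gating code are not shown here.

```go
package main

import "fmt"

// Capability and its flags repeat the declaration documented above.
type Capability uint32

const (
	CapabilityLLMDriven Capability = 1 << iota
	CapabilityCanPause
	CapabilityWakeRoundTrip
	CapabilityHonoursCancelControl
)

// gate (hypothetical helper) reports whether a scenario needing `need`
// may run against a planner declaring `have`. When it may not, reason
// carries text a harness could pass to t.Skip.
func gate(scenario string, have, need Capability) (ok bool, reason string) {
	if have&need == need {
		return true, ""
	}
	missing := uint32(need &^ have)
	return false, fmt.Sprintf("%s: missing capability bits %#x", scenario, missing)
}

func main() {
	// Assumed composition, inferred from the prose on this page.
	deterministic := CapabilityCanPause | CapabilityWakeRoundTrip | CapabilityHonoursCancelControl

	ok, reason := gate("MalformedLLM_Salvage", deterministic, CapabilityLLMDriven)
	fmt.Println(ok, reason)

	ok, _ = gate("PausePayload_BoundsRespected", deterministic, CapabilityCanPause)
	fmt.Println(ok)
}
```

Returning a reason string alongside the boolean keeps the "never a silent skip" rule easy to honour at the call site.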

type Harness

type Harness struct {
	// Factory constructs a fresh planner instance per subtest. Used
	// by scenarios that do NOT need a scenario-specific planner
	// configuration (Sanity, WakeMode_Declared, Sealed_DecisionSum,
	// Steering_DrainBetweenSteps, ConcurrentReuse_D025). When
	// ScenarioFactory is non-nil, scenarios that need a tailored
	// planner consume it instead.
	Factory func() planner.Planner

	// ScenarioFactory, when non-nil, takes a ScenarioName and returns
	// a planner pre-configured for that scenario. Used by scenarios
	// like TopPrompts_LLMRoundTrip (ReAct needs a specific mock-LLM
	// envelope per scenario) and ParallelCall_Atomicity (Deterministic
	// needs a CallParallel-emitting step). Fallback when nil: the
	// scenario uses `Factory()`.
	ScenarioFactory PlannerFactoryFn

	// WakeMode is the wake mode the concrete declares (D-032). The
	// WakeMode_Declared scenario asserts the planner's
	// `ResolveWakeMode` agrees; the WakeMode_RoundTrip scenario
	// drives the corresponding round-trip path (push vs poll).
	WakeMode planner.WakeMode

	// RunContextFactory builds the minimal valid RunContext the
	// concrete needs. Required at Phase 49 — every concrete now
	// validates identity at Next boundary (§6 rule 9 + D-001).
	RunContextFactory func() planner.RunContext

	// Capabilities is the planner's declared capability set. The
	// pack uses the bitmask to gate scenarios — a scenario whose
	// required capability is absent skips with a reason.
	Capabilities Capability

	// TaskRegistryFactory, when non-nil, builds the real
	// `tasks.TaskRegistry` (production inprocess driver) the
	// WakeMode_RoundTrip scenario drives. The factory also wires a
	// real `events.EventBus` since the registry needs one (D-032 +
	// §17.3 #1 — no mocks at the seam).
	//
	// The pack ships a default factory (`DefaultTaskRegistryFactory`)
	// that opens an inmem bus + inprocess registry + inmem state
	// store; per-concrete tests typically set this to the default.
	TaskRegistryFactory func(t *testing.T) (*WakeRoundTripDeps, func())

	// PrebuiltPlannerFactory is the optional hook the
	// WakeMode_RoundTrip scenario consumes when the concrete planner
	// must be constructed AGAINST a pre-existing TaskRegistry (the
	// Deterministic planner binds its registry at construction time
	// via `deterministic.WithRegistry`). When nil, the scenario falls
	// back to the standard Factory and assumes the planner does NOT
	// need a pre-bound registry (ReAct's case — its emission path is
	// LLM-prompted, no registry binding at construction).
	PrebuiltPlannerFactory func(*WakeRoundTripDeps) planner.Planner

	// Cleanup is called at subtest end. Optional — typical for
	// planner concretes that hold lifecycle resources.
	Cleanup func()
}

Harness is the per-subtest fixture the Run loop consumes. Each conformance subtest invokes `factoryFunc()` once to obtain a fresh Harness with a fresh planner instance.

Compatibility with Phase 42: the original four fields (Factory, WakeMode, RunContextFactory, Cleanup) are unchanged. Phase 49 adds `ScenarioFactory`, `Capabilities`, `TaskRegistryFactory`, and the scenario-content factories at the bottom. The change is additive only; existing per-concrete tests continue to compile.

type PlannerFactoryFn

type PlannerFactoryFn func(scenario ScenarioName) planner.Planner

PlannerFactoryFn is the factory shape the per-scenario hooks consume. The pack passes a `ScenarioName` so the factory can return a planner pre-configured for that scenario (e.g. ReAct with a mock LLM that emits the right envelope; Deterministic with a step set that emits the right Decision shape).

Factories MUST be safe to invoke multiple times across subtests — each invocation returns a fresh planner instance so internal state (atomic counters, sync.Map state) can't bleed between scenarios.
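A factory that satisfies the fresh-instance rule could be sketched as below. `Planner`, `scriptedPlanner`, and `newScenarioFactory` are hypothetical names introduced for illustration, and the content strings are placeholders; only `ScenarioName` and the `PlannerFactoryFn` shape come from this page.

```go
package main

import "fmt"

// ScenarioName mirrors the pack's string-typed scenario identifier.
type ScenarioName string

// Planner is a hypothetical stand-in for planner.Planner.
type Planner interface {
	Content() string
}

// PlannerFactoryFn mirrors the documented factory shape.
type PlannerFactoryFn func(scenario ScenarioName) Planner

// scriptedPlanner is a hypothetical concrete with per-instance state.
type scriptedPlanner struct {
	content string
}

func (p *scriptedPlanner) Content() string { return p.content }

// newScenarioFactory returns a factory that allocates a new planner on
// every call, so no state is shared between subtests.
func newScenarioFactory(content map[ScenarioName]string, fallback string) PlannerFactoryFn {
	return func(scenario ScenarioName) Planner {
		c, ok := content[scenario]
		if !ok {
			c = fallback
		}
		return &scriptedPlanner{content: c}
	}
}

func main() {
	factory := newScenarioFactory(map[ScenarioName]string{
		"MalformedLLM_Salvage": "not-json", // placeholder content
	}, "{}")

	a := factory("MalformedLLM_Salvage")
	b := factory("MalformedLLM_Salvage")
	fmt.Println("distinct instances:", a != b)
	fmt.Println(a.Content(), factory("Sanity").Content())
}
```

The closure captures only read-only configuration, which is what makes repeated invocation safe.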

type ScenarioContentMap

type ScenarioContentMap map[ScenarioName]string

ScenarioContentMap maps a ScenarioName to the synthetic LLM content the harness asks the mock to emit for that scenario. ReAct per-concrete tests construct one via `DefaultReactContentMap` and pass it; the ScenarioFactory consumes the entry to build a fresh mock-LLM driver per subtest.

func DefaultReactContentMap

func DefaultReactContentMap() ScenarioContentMap

DefaultReactContentMap returns the conformance-pack's canned ReAct-side LLM responses keyed by scenario name. Per-concrete tests typically pass this map verbatim; operators with bespoke emission shapes can override individual entries.
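Overriding individual entries is easiest on a copy, so the shared defaults stay untouched. The sketch below redeclares the two map-related types locally and uses placeholder strings, because the pack's real envelopes are not shown on this page; `withOverrides` is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// Local mirrors of the documented types.
type ScenarioName string
type ScenarioContentMap map[ScenarioName]string

const (
	ScenarioTopPrompts   ScenarioName = "TopPrompts_LLMRoundTrip"
	ScenarioMalformedLLM ScenarioName = "MalformedLLM_Salvage"
)

// withOverrides (hypothetical helper) returns a copy of base with the
// given entries replaced; base itself is not modified.
func withOverrides(base, overrides ScenarioContentMap) ScenarioContentMap {
	out := make(ScenarioContentMap, len(base)+len(overrides))
	for k, v := range base {
		out[k] = v
	}
	for k, v := range overrides {
		out[k] = v
	}
	return out
}

func main() {
	// Placeholder strings standing in for the pack's canned responses.
	defaults := ScenarioContentMap{
		ScenarioTopPrompts:   "default-top-prompts",
		ScenarioMalformedLLM: "default-malformed",
	}
	custom := withOverrides(defaults, ScenarioContentMap{
		ScenarioMalformedLLM: "bespoke-malformed",
	})
	fmt.Println(custom[ScenarioTopPrompts])
	fmt.Println(custom[ScenarioMalformedLLM])
	fmt.Println(defaults[ScenarioMalformedLLM])
}
```

In a real test the `defaults` value would presumably come from `DefaultReactContentMap()`.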

The content envelope shapes mirror Phase 45's `DefaultSystemPrompt` — JSON-only, the reserved tool names (`_finish`, `_spawn_task`, `_await_task`), arrays for parallel fan-out.

type ScenarioName

type ScenarioName string

ScenarioName identifies one scenario in the pack. Stable across phases so per-concrete test reports remain comparable.

const (
	ScenarioTopPrompts        ScenarioName = "TopPrompts_LLMRoundTrip"
	ScenarioMalformedLLM      ScenarioName = "MalformedLLM_Salvage"
	ScenarioParallelAtomicity ScenarioName = "ParallelCall_Atomicity"
	ScenarioWakeRoundTrip     ScenarioName = "WakeMode_RoundTrip"
	ScenarioBudgetAware       ScenarioName = "BudgetAware_FinishDeadlineExceeded"
	ScenarioPauseBounds       ScenarioName = "PausePayload_BoundsRespected"
	ScenarioSteeringDrain     ScenarioName = "Steering_DrainBetweenSteps"
	ScenarioConcurrentReuse   ScenarioName = "ConcurrentReuse_D025"
)

Scenario names. Pinned strings — a rename would break per-concrete suites that may key on subtest names.

type WakeRoundTripDeps

type WakeRoundTripDeps struct {
	Bus      events.EventBus
	Registry tasks.TaskRegistry
	State    state.StateStore
}

WakeRoundTripDeps bundles the real drivers the wake-mode round-trip scenario consumes. Constructed by the harness's `TaskRegistryFactory`; torn down by the returned cleanup.

All fields are real production drivers (§17.3 #1 — no mocks at the seam): inmem `events.EventBus`, inprocess `tasks.TaskRegistry`, inmem `state.StateStore`. The wake-mode round-trip is the load-bearing D-032 scenario; mocks here would defeat its purpose.

func DefaultTaskRegistryFactory

func DefaultTaskRegistryFactory(t *testing.T) (*WakeRoundTripDeps, func())

DefaultTaskRegistryFactory is the harness-shipped factory that opens an inmem bus + inprocess task registry + inmem state store. Per-concrete tests can use it as-is or wrap it for additional instrumentation.
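Wrapping the factory for instrumentation could follow the pattern sketched here. `Deps`, `DepsFactory`, `Stats`, and `instrument` are hypothetical; the `*testing.T` parameter of the documented signature is dropped so the sketch runs outside a test, and the counters are not safe for concurrent use.

```go
package main

import "fmt"

// Deps is a hypothetical stand-in for conformance.WakeRoundTripDeps.
type Deps struct {
	Label string
}

// DepsFactory mirrors the documented factory shape minus the
// *testing.T parameter, dropped so the sketch runs outside a test.
type DepsFactory func() (*Deps, func())

// Stats counts how often the wrapped factory opened and closed deps.
type Stats struct {
	Opened, Closed int
}

// instrument wraps inner so each open and each cleanup is counted.
// The inner cleanup still runs before the counter is bumped.
func instrument(inner DepsFactory, stats *Stats) DepsFactory {
	return func() (*Deps, func()) {
		deps, cleanup := inner()
		stats.Opened++
		return deps, func() {
			cleanup()
			stats.Closed++
		}
	}
}

func main() {
	base := func() (*Deps, func()) {
		return &Deps{Label: "inmem"}, func() {}
	}
	var stats Stats
	factory := instrument(base, &stats)

	deps, cleanup := factory()
	fmt.Println(deps.Label, stats.Opened, stats.Closed)
	cleanup()
	fmt.Println(stats.Opened, stats.Closed)
}
```

With the real signature, the wrapper would forward `t` to the inner factory unchanged.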
