harness

package
v1.9.30
Published: May 22, 2026 | License: MIT | Imports: 12 | Imported by: 0

Documentation

Overview

Package harness provides a behavioral evaluator sandbox for agent-deck.

The harness builds the agent-deck binary once per test package, then spawns it under a scratch HOME with an isolated tmux socket and stub shims on PATH (gh, etc.). Tests drive the binary via PTY or CLI args and assert on what a human would see — stdout timing, tmux display state, recorded shim calls.

See docs/rfc/EVALUATOR_HARNESS.md for the motivating bugs and design.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type GhCall

type GhCall struct {
	Args  []string `json:"args"`
	Stdin string   `json:"stdin"`
}

GhCall is one recorded invocation of the gh shim.

type GhShim

type GhShim struct {
	// contains filtered or unexported fields
}

GhShim is a recorder and scriptable stub installed at $ShimDir/gh. Tests call ScriptSuccess or ScriptFailure before the binary runs, then inspect the recorded calls afterward.

func (*GhShim) AssertCalled

func (g *GhShim) AssertCalled(substrs ...string) GhCall

AssertCalled fails if no call matches all substrs. Returns the first match.

func (*GhShim) AssertNotCalled

func (g *GhShim) AssertNotCalled(substrs ...string)

AssertNotCalled fails if any call matches all substrs. Empty substrs means "never called at all".

func (*GhShim) Calls

func (g *GhShim) Calls() []GhCall

Calls returns a snapshot of recorded gh invocations, in order.

func (*GhShim) CallsWith

func (g *GhShim) CallsWith(substrs ...string) []GhCall

CallsWith returns only the calls whose argv contains all of substrs.
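
The matching rule shared by AssertCalled, AssertNotCalled, and CallsWith can be sketched in isolation. This is a minimal illustration, not the package's implementation: the `call` type and the `matchesAll` and `callsWith` helpers are hypothetical, and the sketch assumes substrings are matched against the space-joined argv (the real shim may match per argument instead).

```go
package main

import (
	"fmt"
	"strings"
)

// call is a stand-in that mirrors the documented GhCall fields.
// It is not the package's type.
type call struct {
	Args  []string
	Stdin string
}

// matchesAll reports whether every substr occurs in the space-joined argv.
// Assumption: matching is done on the joined argv, not per argument.
func matchesAll(c call, substrs ...string) bool {
	joined := strings.Join(c.Args, " ")
	for _, s := range substrs {
		if !strings.Contains(joined, s) {
			return false
		}
	}
	// With no substrs the loop never runs, so every call matches.
	return true
}

// callsWith keeps the matching calls in their recorded order.
func callsWith(calls []call, substrs ...string) []call {
	var out []call
	for _, c := range calls {
		if matchesAll(c, substrs...) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	calls := []call{
		{Args: []string{"pr", "comment", "42", "--body", "hi"}},
		{Args: []string{"issue", "list"}},
	}
	fmt.Println(len(callsWith(calls, "pr", "comment"))) // 1
	fmt.Println(len(callsWith(calls)))                  // 2
	fmt.Println(len(callsWith(calls, "release")))       // 0
}
```

Under this rule an empty substring list matches every recorded call, which is consistent with AssertNotCalled treating empty substrs as "never called at all".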

func (*GhShim) ScriptFailure

func (g *GhShim) ScriptFailure()

ScriptFailure sets the shim to exit 1 with a scripted stderr line.

func (*GhShim) ScriptSuccess

func (g *GhShim) ScriptSuccess()

ScriptSuccess sets the shim to exit 0 on every call (default).

type PTYSession

type PTYSession struct {
	// contains filtered or unexported fields
}

PTYSession is a running process attached to a pseudo-terminal. Output streams through the PTY (not an io.Pipe) so line-buffered stdout behaves exactly as it would in a real user's terminal — which is the entire point of this harness (see Bug 1 in the evaluator-harness RFC).

func (*PTYSession) Close

func (p *PTYSession) Close()

Close tears down the PTY session. Idempotent.

func (*PTYSession) Dump

func (p *PTYSession) Dump() string

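Dump returns a diagnostic string for logging when a test is failing, e.g. `t.Log(p.Dump())`. Prefer t.Log over t.Logf here so that any `%` in the dump is not interpreted as a format verb.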

func (*PTYSession) ExpectExit

func (p *PTYSession) ExpectExit(wantCode int, timeout time.Duration)

ExpectExit waits up to timeout for the child to exit and asserts on the exit code.

func (*PTYSession) ExpectOutput

func (p *PTYSession) ExpectOutput(want string, timeout time.Duration) int

ExpectOutput waits up to timeout for want to appear in the PTY output and returns the index of the first match. Fails the test on timeout.

func (*PTYSession) ExpectOutputBefore

func (p *PTYSession) ExpectOutputBefore(want, before string, timeout time.Duration)

ExpectOutputBefore waits up to timeout for want to appear and asserts that it appears strictly ahead of the string before in the output stream. The test fails with a diagnostic dump if before has already appeared when want arrives, or if want never arrives.

This is the core assertion for Bug 1 (strings.Builder buffering hid the disclosure behind a stdin read): in production, "posted PUBLICLY" must arrive before "y/N" does.
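
The ordering check itself reduces to comparing two indices in the captured output. The sketch below illustrates the rule on a plain string; `appearsBefore` is a hypothetical helper, the sample output lines are made up for illustration, and the real method also handles waiting and the diagnostic dump.

```go
package main

import (
	"fmt"
	"strings"
)

// appearsBefore reports whether want occurs in output and no occurrence of
// before precedes it. A missing want is a failure; a missing before is fine.
func appearsBefore(output, want, before string) bool {
	wi := strings.Index(output, want)
	if wi < 0 {
		return false // want never arrived
	}
	bi := strings.Index(output, before)
	return bi < 0 || wi < bi
}

func main() {
	// Made-up output: the disclosure is printed ahead of the prompt.
	good := "This comment will be posted PUBLICLY.\nContinue? [y/N] "
	// Made-up output: the prompt is printed first, as in the buffering bug.
	bad := "Continue? [y/N] This comment will be posted PUBLICLY.\n"

	fmt.Println(appearsBefore(good, "posted PUBLICLY", "y/N")) // true
	fmt.Println(appearsBefore(bad, "posted PUBLICLY", "y/N"))  // false
}
```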

func (*PTYSession) Output

func (p *PTYSession) Output() string

Output returns a snapshot of everything read from the PTY so far.

func (*PTYSession) Send

func (p *PTYSession) Send(s string)

Send writes s to the PTY master (i.e. as if the user typed it).

type Sandbox

type Sandbox struct {

	// Home is a scratch directory used for HOME, XDG_CONFIG_HOME, and
	// agent-deck's state dir. Cleaned up on test exit.
	Home string

	// BinPath is the full path to the agent-deck binary built for this
	// package (shared across tests in the package via buildOnce).
	BinPath string

	// ShimDir is on PATH ahead of /usr/bin, housing executable shims
	// (gh, etc.). Tests mutate its contents before spawning.
	ShimDir string

	// GhShim records calls to the gh stub and scripts its exit behavior.
	GhShim *GhShim
	// contains filtered or unexported fields
}

Sandbox is a per-test isolated environment: scratch HOME, dedicated tmux socket, stub shims on PATH. All state is torn down by t.Cleanup.
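
The BinPath comment mentions sharing one build across tests via buildOnce. A common way to get that behavior is sync.Once, sketched below. Only the name buildOnce comes from the documentation; `binaryPath`, the package-level variables, and the fake build function are hypothetical, and the real package may structure this differently.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	buildOnce sync.Once
	builtPath string
	buildErr  error
)

// binaryPath runs build at most once per process and returns the cached
// result on every later call. Each Go test package compiles to its own test
// binary, so "once per process" amounts to "once per test package".
func binaryPath(build func() (string, error)) (string, error) {
	buildOnce.Do(func() {
		builtPath, buildErr = build()
	})
	return builtPath, buildErr
}

func main() {
	builds := 0
	fakeBuild := func() (string, error) {
		builds++
		return "/tmp/agent-deck-test-bin", nil // hypothetical path
	}
	for i := 0; i < 3; i++ {
		p, err := binaryPath(fakeBuild)
		fmt.Println(p, err)
	}
	fmt.Println("builds:", builds) // builds: 1
}
```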

func NewSandbox

func NewSandbox(t *testing.T) *Sandbox

NewSandbox returns a Sandbox bound to t. All resources are released via t.Cleanup. Safe to call from a parallel test.

func (*Sandbox) Env

func (s *Sandbox) Env() []string

Env returns the base environment vector used when spawning the binary. Tests can extend this slice before passing it to exec.Cmd.

func (*Sandbox) InstallTmuxShim

func (s *Sandbox) InstallTmuxShim(t *testing.T)

InstallTmuxShim drops a `tmux` wrapper into ShimDir that transparently forces every invocation through the sandbox's per-test socket via `-S`. Without this the binary's internal `exec.Command("tmux", ...)` calls hit the user's default tmux server, poisoning (and being poisoned by) real sessions.

The shim discovers the real tmux lazily from the first PATH entry that is NOT ShimDir — so the child of the shim finds /usr/bin/tmux even though the shim itself is named `tmux`.

Call this from tests that actually start tmux sessions.

func (*Sandbox) Spawn

func (s *Sandbox) Spawn(args ...string) *PTYSession

Spawn starts the agent-deck binary with args attached to a PTY. The caller is responsible for calling Close() — tests typically `defer p.Close()`.

func (*Sandbox) SpawnWithEnv added in v1.7.56

func (s *Sandbox) SpawnWithEnv(extraEnv []string, args ...string) *PTYSession

SpawnWithEnv is like Spawn but overlays extraEnv on top of Sandbox.Env(). Each entry in extraEnv is a "KEY=VALUE" string; it replaces any earlier entry with the same key, matching the convention of exec.Cmd.Env. Use this when a test needs real terminal capabilities (TERM=xterm-256color) — the default Env sets TERM=dumb to keep termenv probes quiet.

func (*Sandbox) Tmux

func (s *Sandbox) Tmux(args ...string) string

Tmux runs `tmux -S <socket> <args...>` and returns trimmed stdout. If the command fails, the test fails.

func (*Sandbox) TmuxSocket

func (s *Sandbox) TmuxSocket() string

TmuxSocket returns the path to this sandbox's tmux socket. Lazy-created.

func (*Sandbox) TmuxTry

func (s *Sandbox) TmuxTry(args ...string) (string, error)

TmuxTry runs `tmux -S <socket> <args...>` and returns stdout and the error without failing the test. Useful when the caller wants to tolerate a non-zero exit (e.g. kill-server when no server is running).
