Documentation
¶
Overview ¶
Package harness provides a behavioral evaluator sandbox for agent-deck.
The harness builds the agent-deck binary once per test package, then spawns it under a scratch HOME with an isolated tmux socket and stub shims on PATH (gh, etc.). Tests drive the binary via PTY or CLI args and assert on what a human would see — stdout timing, tmux display state, recorded shim calls.
See docs/rfc/EVALUATOR_HARNESS.md for the motivating bugs and design.
Index ¶
- type GhCall
- type GhShim
- type PTYSession
- func (p *PTYSession) Close()
- func (p *PTYSession) Dump() string
- func (p *PTYSession) ExpectExit(wantCode int, timeout time.Duration)
- func (p *PTYSession) ExpectOutput(want string, timeout time.Duration) int
- func (p *PTYSession) ExpectOutputBefore(want, before string, timeout time.Duration)
- func (p *PTYSession) Output() string
- func (p *PTYSession) Send(s string)
- type Sandbox
- func (s *Sandbox) Env() []string
- func (s *Sandbox) InstallTmuxShim(t *testing.T)
- func (s *Sandbox) Spawn(args ...string) *PTYSession
- func (s *Sandbox) SpawnWithEnv(extraEnv []string, args ...string) *PTYSession
- func (s *Sandbox) Tmux(args ...string) string
- func (s *Sandbox) TmuxSocket() string
- func (s *Sandbox) TmuxTry(args ...string) (string, error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type GhShim ¶
type GhShim struct {
// contains filtered or unexported fields
}
GhShim is a recorder + scriptable stub installed at $ShimDir/gh. Tests configure ScriptSuccess / ScriptFailure before the binary runs, then inspect recorded calls after.
func (*GhShim) AssertCalled ¶
AssertCalled fails if no call matches all substrs. Returns the first match.
func (*GhShim) AssertNotCalled ¶
AssertNotCalled fails if any call matches all substrs. Empty substrs means "never called at all".
func (*GhShim) ScriptFailure ¶
func (g *GhShim) ScriptFailure()
ScriptFailure sets the shim to exit 1 with a scripted stderr line.
func (*GhShim) ScriptSuccess ¶
func (g *GhShim) ScriptSuccess()
ScriptSuccess sets the shim to exit 0 on every call (default).
type PTYSession ¶
type PTYSession struct {
// contains filtered or unexported fields
}
PTYSession is a running process attached to a pseudo-terminal. Output streams through the PTY (not an io.Pipe) so line-buffered stdout behaves exactly as it would in a real user's terminal — which is the entire point of this harness (see Bug 1 in the evaluator-harness RFC).
func (*PTYSession) Close ¶
func (p *PTYSession) Close()
Close tears down the PTY session. Idempotent.
func (*PTYSession) Dump ¶
func (p *PTYSession) Dump() string
Dump is a helper for `t.Logf(p.Dump())` when a test is failing.
func (*PTYSession) ExpectExit ¶
func (p *PTYSession) ExpectExit(wantCode int, timeout time.Duration)
ExpectExit waits up to timeout for the child to exit and asserts on the exit code.
func (*PTYSession) ExpectOutput ¶
func (p *PTYSession) ExpectOutput(want string, timeout time.Duration) int
ExpectOutput waits up to timeout for want to appear in the PTY output and returns the index of the first match. Fails the test on timeout.
func (*PTYSession) ExpectOutputBefore ¶
func (p *PTYSession) ExpectOutputBefore(want, before string, timeout time.Duration)
ExpectOutputBefore waits up to timeout for want to appear, and asserts that want appears strictly before before in the output stream. If before has already appeared at the moment want arrives, or want never arrives, the test fails with a diagnostic dump.
This is the core assertion for Bug 1 (strings.Builder buffering hid the disclosure behind a stdin read): in production, "posted PUBLICLY" must arrive before "y/N" does.
func (*PTYSession) Output ¶
func (p *PTYSession) Output() string
Output returns a snapshot of everything read from the PTY so far.
func (*PTYSession) Send ¶
func (p *PTYSession) Send(s string)
Send writes s to the PTY master (i.e. as if the user typed it).
type Sandbox ¶
type Sandbox struct {
// Home is a scratch directory used for HOME, XDG_CONFIG_HOME, and
// agent-deck's state dir. Cleaned up on test exit.
Home string
// BinPath is the full path to the agent-deck binary built for this
// package (shared across tests in the package via buildOnce).
BinPath string
// ShimDir is on PATH ahead of /usr/bin, housing executable shims
// (gh, etc.). Tests mutate its contents before spawning.
ShimDir string
// GhShim records calls to the gh stub and scripts its exit behavior.
GhShim *GhShim
// contains filtered or unexported fields
}
Sandbox is a per-test isolated environment: scratch HOME, dedicated tmux socket, stub shims on PATH. All state is torn down by t.Cleanup.
func NewSandbox ¶
NewSandbox returns a Sandbox bound to t. All resources are released via t.Cleanup. Safe to call from a parallel test.
func (*Sandbox) Env ¶
Env returns the base environment vector used when spawning the binary. Tests can extend this slice before passing it to exec.Cmd.
func (*Sandbox) InstallTmuxShim ¶
InstallTmuxShim drops a `tmux` wrapper into ShimDir that transparently forces every invocation through the sandbox's per-test socket via `-S`. Without this the binary's internal `exec.Command("tmux", ...)` calls hit the user's default tmux server, poisoning (and being poisoned by) real sessions.
The shim discovers the real tmux lazily from the first PATH entry that is NOT ShimDir — so the child of the shim finds /usr/bin/tmux even though the shim itself is named `tmux`.
Call this from tests that actually start tmux sessions.
func (*Sandbox) Spawn ¶
func (s *Sandbox) Spawn(args ...string) *PTYSession
Spawn starts the agent-deck binary with args attached to a PTY. The caller is responsible for calling Close() — tests typically `defer p.Close()`.
func (*Sandbox) SpawnWithEnv ¶ added in v1.7.56
func (s *Sandbox) SpawnWithEnv(extraEnv []string, args ...string) *PTYSession
SpawnWithEnv is like Spawn but overlays extraEnv on top of Sandbox.Env(). Each entry in extraEnv is a "KEY=VALUE" string; it replaces any earlier entry with the same key, matching the convention of exec.Cmd.Env. Use this when a test needs real terminal capabilities (TERM=xterm-256color) — the default Env sets TERM=dumb to keep termenv probes quiet.
func (*Sandbox) Tmux ¶
Tmux runs `tmux -S <socket> <args...>` and returns trimmed stdout. The first failure fails the test.
func (*Sandbox) TmuxSocket ¶
TmuxSocket returns the path to this sandbox's tmux socket. Lazy-created.