Documentation ¶
Overview ¶
Package testutil contains shared helpers for agent-deck's test suites.
Performance regression API (this file)
======================================
All TestPerf_* tests use this small set of primitives. The design has three deliberate parts:
- Cold vs warm budget classification.
- n=11 trimmed-mean measurement.
- 1 ms budget floor.
── Cold vs warm ────────────────────────────────────────────────────────
The budget formula differs based on whether the test crosses a process or syscall boundary that introduces external variance the test code can't control:
COLD tests use ColdBudget(t, base) = max(base × 5, 1 ms) × multiplier. For: cold-start exec, real-disk fsync (Tier 2 — see docs/perf-budget-suite.md), child-process spawn, network round-trips. The 5× factor reflects that runner / loader / fsync variance can't be capped tighter without false positives.
WARM tests use WarmBudget(t, base) = max(base × 3, 1 ms) × multiplier. For: pure in-process Go work measurable under controlled GC. The tighter 3× factor is safe because TrimmedMeanWarm forces a GC cycle and disables auto-GC during each timed window — the largest noise source is eliminated. WARM tests run on tmpfs / in memory (Tier 1) so disk speed is irrelevant.
Both formulas are scaled by PERF_BUDGET_MULTIPLIER (default 1.0; CI sets 2.0 to absorb shared-runner variance). The effective CI gate is therefore 10× base for cold and 6× base for warm, i.e. twice the corresponding local budget.
── n=11 trimmed mean ───────────────────────────────────────────────────
TrimmedMean runs n=11 timed iterations, sorts them, drops the top 2 and bottom 2 samples, and averages the middle 7. That discards 4 of 11 samples, about 36% in total (roughly 18% from each tail).
Why n=11:
- Odd: median is well-defined as a fallback diagnostic.
- Cheap: 11 × ~10 ms typical = ~110 ms per test, well under the 60 s perf-suite timeout.
- Drop 2/2: handles one GC pause + one scheduler hiccup without losing too many samples. In practice, ~1 in 10 samples on a loaded host is an outlier; dropping 2 from each end is robust.
- Middle 7: the standard deviation of the mean scales as 1/√7 ≈ 0.38 (the variance as 1/7), assuming independent samples. That is about 2.6× noise reduction vs a single sample, ~1.5× better than the median of 5 used previously.
Larger n (21, 51) was considered but rejected: marginal variance reduction (~30%) at 2–5× test cost. n=11 is the sweet spot.
── 1 ms floor ──────────────────────────────────────────────────────────
PerfBudgetFloor sets a 1 ms floor on the budget before the multiplier is applied, however small the caller's base is. Anything faster is either "just fast" (sub-millisecond timing is dominated by clock resolution + scheduler jitter) or a sign that the unit under test is too small to be a meaningful regression target. Move such tests to Benchmark* (Track A) instead of TestPerf_* (Track B).
Index ¶
- Constants
- func CleanGitEnv(base []string) []string
- func ColdBudget(t *testing.T, base time.Duration) time.Duration
- func IsolateTmuxSocket() func()
- func SkipIfShort(t *testing.T)
- func TrimmedMean(fn func()) time.Duration
- func TrimmedMeanWarm(fn func()) time.Duration
- func TrimmedMeanWithSetup(setup, op func()) time.Duration
- func UnsetGitRepoEnv()
- func WarmBudget(t *testing.T, base time.Duration) time.Duration
Constants ¶
const PerfBudgetFloor = 1 * time.Millisecond
PerfBudgetFloor is the minimum budget any TestPerf_* gate may apply. Sub-millisecond budgets are dominated by clock resolution and scheduler jitter, not by the unit under test. See package docblock for rationale.
const PerfBudgetMultiplierEnv = "PERF_BUDGET_MULTIPLIER"
PerfBudgetMultiplierEnv is the env var read by ColdBudget and WarmBudget. CI sets this to 2.0 to absorb shared-runner variance; developers on slow laptops can bump it locally. Default is 1.0.
const TestIsolationMarkerEnv = "AGENT_DECK_TEST_ISOLATED"
Name of the marker env var set during test isolation. Runtime guards in internal/tmux use this to detect a test context so they can panic loudly on an isolation leak instead of silently attacking the user's real tmux server.
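A runtime guard of the kind described here might look like the sketch below. Everything in it is an assumption made for illustration: the function name `checkIsolation`, the idea of comparing the socket path against the isolated TMUX_TMPDIR, and returning an error where the text says the real guard panics. The only external fact relied on is that tmux sets `$TMUX` to a comma-separated value whose first field is the socket path.

```go
package main

import (
	"fmt"
	"strings"
)

// checkIsolation reports an error when the isolation marker is set but TMUX
// still names a socket outside isolatedDir. Hypothetical helper, not the
// internal/tmux implementation.
func checkIsolation(marker, tmux, isolatedDir string) error {
	if marker == "" || tmux == "" {
		// Not a test context, or TMUX was unset as intended.
		return nil
	}
	// $TMUX has the form "socket-path,pid,session"; keep the first field.
	socket, _, _ := strings.Cut(tmux, ",")
	if isolatedDir != "" && strings.HasPrefix(socket, isolatedDir+"/") {
		return nil
	}
	return fmt.Errorf("tmux isolation leak: TMUX=%q is outside %q", tmux, isolatedDir)
}

func main() {
	// Leak: marker set, but TMUX still points at the user's real server.
	fmt.Println(checkIsolation("1", "/tmp/tmux-1000/default,4242,0", "/tmp/iso-123"))

	// Isolated socket under the per-test directory: no error.
	fmt.Println(checkIsolation("1", "/tmp/iso-123/tmux-1000/default,4242,0", "/tmp/iso-123"))

	// Outside a test context the guard stays silent.
	fmt.Println(checkIsolation("", "/tmp/tmux-1000/default,4242,0", ""))
}
```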
Variables ¶
This section is empty.
Functions ¶
func CleanGitEnv ¶
func CleanGitEnv(base []string) []string
CleanGitEnv returns a copy of base with git repository-routing env vars removed.
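A filter of this shape could be written as below. The variable list is an assumption: GIT_DIR, GIT_WORK_TREE, GIT_INDEX_FILE, and GIT_COMMON_DIR are real git environment variables that redirect repository lookup, but the set the actual helper strips is not stated here. `cleanGitEnvSketch` is a hypothetical stand-in, not the package function.

```go
package main

import (
	"fmt"
	"strings"
)

// gitRoutingVars is an assumed list of variables that redirect git to a
// particular repository; the real helper may use a different set.
var gitRoutingVars = []string{"GIT_DIR", "GIT_WORK_TREE", "GIT_INDEX_FILE", "GIT_COMMON_DIR"}

// cleanGitEnvSketch returns a copy of base ("KEY=value" entries) without the
// variables listed in gitRoutingVars. base itself is left untouched.
func cleanGitEnvSketch(base []string) []string {
	out := make([]string, 0, len(base))
	for _, kv := range base {
		name, _, _ := strings.Cut(kv, "=")
		drop := false
		for _, v := range gitRoutingVars {
			if name == v {
				drop = true
				break
			}
		}
		if !drop {
			out = append(out, kv)
		}
	}
	return out
}

func main() {
	env := []string{
		"PATH=/usr/bin",
		"GIT_DIR=/home/u/project/.git",
		"HOME=/home/u",
		"GIT_WORK_TREE=/home/u/project",
	}
	fmt.Println(cleanGitEnvSketch(env)) // [PATH=/usr/bin HOME=/home/u]
}
```

A typical use would be to pass the result as the `Env` field of an `exec.Cmd` so that a spawned git process resolves its repository from its working directory only.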
func ColdBudget ¶ added in v1.9.20
func ColdBudget(t *testing.T, base time.Duration) time.Duration
ColdBudget returns the budget for a COLD test: one that crosses a process or syscall boundary (cold-start exec, real-disk fsync, child-process spawn, network). Formula:
max(base * 5, PerfBudgetFloor) * PERF_BUDGET_MULTIPLIER
Pair with TrimmedMean(fn) for measurement.
func IsolateTmuxSocket ¶ added in v1.7.3
func IsolateTmuxSocket() func()
IsolateTmuxSocket makes it safe to spawn real tmux servers from tests even when `go test` is invoked from inside a live tmux session (the default on every developer host that uses agent-deck).
The helper does THREE things:
1. Unsets TMUX and TMUX_PANE. Tmux's client discovery order is: `$TMUX → -S path → -L name → $TMUX_TMPDIR`. If TMUX is set, every later step is ignored, so setting TMUX_TMPDIR alone provides zero isolation when the test process inherits TMUX from a parent tmux pane. This was the 2026-04-17 three-cascade bug: v1.7.3 set TMUX_TMPDIR but left TMUX set, so every test-spawned tmux session joined the user's real server and eventually destabilised it.
2. Sets TMUX_TMPDIR to a fresh per-call temp dir. Tests that use `-L <name>` or `$TMUX_TMPDIR`-derived sockets will land here, never at /tmp/tmux-<uid>/default.
3. Sets AGENT_DECK_TEST_ISOLATED=1. Production code paths in internal/tmux read this marker at tmux-spawn time and panic with a clear message if TMUX is still set and points to a non-isolated socket. This is the "make the failure loud, not silent" belt to the TMUX-unset suspenders.
Call this from every package-level TestMain that transitively spawns tmux:
	func TestMain(m *testing.M) {
		cleanup := testutil.IsolateTmuxSocket()
		code := m.Run()
		cleanup() // call explicitly: os.Exit skips deferred functions
		os.Exit(code)
	}
Returns a cleanup function that removes the temp dir and restores the original TMUX / TMUX_PANE / AGENT_DECK_TEST_ISOLATED values so the parent process's env is not permanently altered.
func SkipIfShort ¶ added in v1.9.20
func SkipIfShort(t *testing.T)
SkipIfShort skips the test when `go test -short` is in effect. TestPerf_* tests are expensive enough that contributors running quick unit-test loops shouldn't pay for them. CI always runs in long mode.
func TrimmedMean ¶ added in v1.9.20
func TrimmedMean(fn func()) time.Duration
TrimmedMean runs fn n=11 times (plus 1 warm-up), sorts the samples, drops the top 2 and bottom 2, and returns the average of the middle 7.
Use for COLD tests where GC manipulation in the parent process doesn't help (the timed work happens in a child process or kernel).
func TrimmedMeanWarm ¶ added in v1.9.20
func TrimmedMeanWarm(fn func()) time.Duration
TrimmedMeanWarm is TrimmedMean with controlled GC: forces a runtime.GC() before each timed iteration and disables auto-GC during the timed window via debug.SetGCPercent(-1), restoring the original setting on return.
Use for WARM tests measuring pure-Go in-process work, where GC pauses are the dominant noise source.
func TrimmedMeanWithSetup ¶ added in v1.9.20
func TrimmedMeanWithSetup(setup, op func()) time.Duration
TrimmedMeanWithSetup runs setup() before each timed op() but excludes setup from the timing window. Use when the timed primitive needs a fresh fixture per iteration (e.g. a populated tree before timing DeleteAll). Setup and op share state via closure capture in the caller.
COLD variant — no GC manipulation. For warm work needing setup, compose: temporarily disable GC yourself or call runtime.GC() inside setup before each timed op.
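The key point, that setup runs every iteration but sits outside the timing window, can be sketched like this. `timeWithSetup` is a hypothetical helper that returns raw samples instead of a trimmed mean, and the map fixture is only a stand-in for something like a populated tree.

```go
package main

import (
	"fmt"
	"time"
)

// timeWithSetup runs setup then op, n times, and times only op.
func timeWithSetup(setup, op func(), n int) []time.Duration {
	out := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		setup() // not timed: the clock starts after the fixture is ready

		start := time.Now()
		op()
		out = append(out, time.Since(start))
	}
	return out
}

func main() {
	// setup and op share the fixture through closure capture.
	var tree map[string]int

	setup := func() {
		tree = make(map[string]int, 1000)
		for i := 0; i < 1000; i++ {
			tree[fmt.Sprint(i)] = i
		}
	}
	op := func() {
		// The operation under test: empty the fixture.
		for k := range tree {
			delete(tree, k)
		}
	}

	samples := timeWithSetup(setup, op, 11)
	fmt.Println(len(samples), "samples; tree size after last op:", len(tree))
}
```

Without the per-iteration setup, every run after the first would time a delete on an already empty map and report misleadingly small samples.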
func UnsetGitRepoEnv ¶
func UnsetGitRepoEnv()
UnsetGitRepoEnv removes git repository-routing env vars from the current process. This prevents subprocess git commands from accidentally targeting the caller's repo.
func WarmBudget ¶ added in v1.9.20
func WarmBudget(t *testing.T, base time.Duration) time.Duration
WarmBudget returns the budget for a WARM test: pure in-process Go work that can be measured under controlled GC. Formula:
max(base * 3, PerfBudgetFloor) * PERF_BUDGET_MULTIPLIER
Pair with TrimmedMeanWarm(fn) — that variant forces a GC cycle and disables auto-GC around each timed iteration, eliminating the largest noise source.
No callers in this PR; exported to capture the cold/warm convention in code so future contributors classify their tests correctly.
Types ¶
This section is empty.
Directories ¶
| Path | Synopsis |
|---|---|
| crossfixture | Package crossfixture supplies the shared scaffolding for tests that must verify TUI ↔ web ↔ CLI parity (TEST-PLAN.md §6.4 / TUI-TEST-PLAN.md §6.4 crossProcessFixture). |
| fakeclock | Package fakeclock provides an injectable Clock for tests that exercise time-sensitive logic (hook freshness windows, heartbeat staleness, log-rotation maintenance) without sleeping or depending on wall-clock progress. |
| fakeinotify | Package fakeinotify provides a controllable filesystem event source for tests that exercise hook-status-watcher overflow / fallback paths (TEST-PLAN.md J2 regression, TUI-TEST-PLAN.md §6.2 fakeInotify). |
| logassert | Package logassert captures slog records during a test and offers assertion helpers so tests can verify "this code path emitted hook_overflow with dropped>0" without grepping stderr. |
| multiclienttmux | Package multiclienttmux boots an isolated tmux server with aggressive-resize=on and lets a test attach N pty clients at chosen sizes. It is the harness from TEST-PLAN.md §6.1 / TUI-TEST-PLAN.md §6.8 for the "two web clients hijacking pane size" regression (J4 / F2). |
| profilefixture | Package profilefixture builds the controlled environment that profile-resolution parity tests require: a known AGENTDECK_PROFILE, CLAUDE_CONFIG_DIR, agent-deck config-dir override, and isolated tempdir. |
| teatesthelper | Package teatesthelper wraps charmbracelet/x/exp/teatest with the conventions our TUI tests need (TUI-TEST-PLAN.md §6.1): |