testutil

package

Version: v1.9.29 (latest) | Published: May 21, 2026 | License: MIT | Imports: 9 | Imported by: 0

Repository

github.com/asheshgoplani/agent-deck

Documentation ¶

Overview ¶

Package testutil contains shared helpers for agent-deck's test suites.

Performance regression API (this file)
======================================

All TestPerf_* tests use this small set of primitives. The design has three deliberate parts:

Cold vs warm budget classification.
n=11 trimmed-mean measurement.
1 ms budget floor.

── Cold vs warm ────────────────────────────────────────────────────────

The budget formula differs based on whether the test crosses a process or syscall boundary that introduces external variance the test code can't control:

COLD tests use ColdBudget(t, base) = max(base × 5, 1 ms) × multiplier. For: cold-start exec, real-disk fsync (Tier 2 — see docs/perf-budget-suite.md), child-process spawn, network round-trips. The 5× factor reflects that runner / loader / fsync variance can't be capped tighter without false positives.
WARM tests use WarmBudget(t, base) = max(base × 3, 1 ms) × multiplier. For: pure in-process Go work measurable under controlled GC. The tighter 3× factor is safe because TrimmedMeanWarm forces a GC cycle and disables auto-GC during each timed window — the largest noise source is eliminated. WARM tests run on tmpfs / in memory (Tier 1) so disk speed is irrelevant.

Both formulas are scaled by PERF_BUDGET_MULTIPLIER (default 1.0; CI sets 2.0 to absorb shared-runner variance). With the CI multiplier, the effective gate is 10× base for cold and 6× base for warm, i.e. twice the local budget.
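
The two formulas above can be sketched as a single function. This is a minimal illustration, not the package's implementation: budget, its factor parameter, and the already-parsed multiplier argument are hypothetical names, and the real ColdBudget / WarmBudget take a *testing.T and read the multiplier from the environment.

```go
package main

import "time"

// floor mirrors the documented PerfBudgetFloor value (1 ms).
const floor = time.Millisecond

// budget is a hypothetical stand-in for ColdBudget / WarmBudget:
// max(base × factor, floor) × multiplier.
// factor is 5 for COLD tests and 3 for WARM tests; multiplier stands in for
// the parsed PERF_BUDGET_MULTIPLIER value.
func budget(base time.Duration, factor int, multiplier float64) time.Duration {
	b := base * time.Duration(factor)
	if b < floor {
		// The floor is applied before the multiplier, as in the formulas.
		b = floor
	}
	return time.Duration(float64(b) * multiplier)
}
```

With base = 10 ms this gives 50 ms for a cold test locally and 100 ms under the CI multiplier of 2.0; a warm test gets 30 ms locally and 60 ms in CI. A base of 100 µs is lifted to the 1 ms floor first, so the CI budget is 2 ms.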

── n=11 trimmed mean ───────────────────────────────────────────────────

TrimmedMean runs n=11 timed iterations, sorts, drops the top 2 and bottom 2 samples, and averages the middle 7 (4 of 11 samples trimmed, about 36% in total).

Why n=11:

Odd: median is well-defined as a fallback diagnostic.
Cheap: 11 × ~10 ms typical = ~110 ms per test, well under the 60 s perf-suite timeout.
Drop 2/2: handles one GC pause + one scheduler hiccup without losing too many samples. In practice, ~1 in 10 samples on a loaded host is an outlier; dropping 2 from each end is robust.
Middle 7: the standard error of the mean scales as 1/√7 ≈ 0.38 (the variance scales as 1/7), roughly a 2.6× noise reduction vs a single sample if the kept samples are treated as independent, and ~1.5× better than the median of 5 used previously.

Larger n (21, 51) was considered but rejected: marginal variance reduction (~30%) at 2–5× test cost. n=11 is the sweet spot.
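
The measurement described in this section can be sketched as follows. This is an illustration under assumptions, not the package's code: trimmedMean, measure, and fromMillis are hypothetical helpers, and the warm-up run that TrimmedMean performs is omitted.

```go
package main

import (
	"sort"
	"time"
)

// trimmedMean sorts a copy of samples, drops trim values from each end, and
// averages what is left. With 11 samples and trim=2 it averages the middle 7.
func trimmedMean(samples []time.Duration, trim int) time.Duration {
	if len(samples) <= 2*trim {
		return 0 // nothing would be left to average
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	kept := sorted[trim : len(sorted)-trim]
	var sum time.Duration
	for _, d := range kept {
		sum += d
	}
	return sum / time.Duration(len(kept))
}

// measure times fn n times and returns the raw samples.
func measure(fn func(), n int) []time.Duration {
	samples := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		start := time.Now()
		fn()
		samples = append(samples, time.Since(start))
	}
	return samples
}

// fromMillis builds a sample slice from millisecond values (demo helper).
func fromMillis(vals ...int) []time.Duration {
	out := make([]time.Duration, len(vals))
	for i, v := range vals {
		out[i] = time.Duration(v) * time.Millisecond
	}
	return out
}
```

For example, eleven samples of which nine are 10 ms, one is 500 ms (a GC pause), and one is 1 ms yield exactly 10 ms: both outliers fall inside the dropped ends.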

── 1 ms floor ──────────────────────────────────────────────────────────

PerfBudgetFloor sets a 1 ms lower bound on the budget: max(base × factor, 1 ms) never drops below 1 ms, however small the caller's base is, and PERF_BUDGET_MULTIPLIER is applied on top of that. Anything faster is either "just fast" (sub-millisecond timing is dominated by clock resolution + scheduler jitter) or a sign that the unit under test is too small to be a meaningful regression target. Move such tests to Benchmark* (Track A) instead of TestPerf_* (Track B).

Index ¶

Constants
func CleanGitEnv(base []string) []string
func ColdBudget(t *testing.T, base time.Duration) time.Duration
func IsolateTmuxSocket() func()
func SkipIfShort(t *testing.T)
func TrimmedMean(fn func()) time.Duration
func TrimmedMeanWarm(fn func()) time.Duration
func TrimmedMeanWithSetup(setup, op func()) time.Duration
func UnsetGitRepoEnv()
func WarmBudget(t *testing.T, base time.Duration) time.Duration

Constants ¶

const PerfBudgetFloor = 1 * time.Millisecond

PerfBudgetFloor is the minimum budget any TestPerf_* gate may apply. Sub-millisecond budgets are dominated by clock resolution and scheduler jitter, not by the unit under test. See package docblock for rationale.

const PerfBudgetMultiplierEnv = "PERF_BUDGET_MULTIPLIER"

PerfBudgetMultiplierEnv is the env var read by ColdBudget and WarmBudget. CI sets this to 2.0 to absorb shared-runner variance; developers on slow laptops can bump it locally. Default is 1.0.
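
Reading the multiplier might look like the sketch below. The documented parts are the variable name and the 1.0 default; parseMultiplier and multiplierFromEnv are hypothetical names, and falling back to 1.0 for unparsable or non-positive values is an assumption of this sketch rather than confirmed package behaviour.

```go
package main

import (
	"os"
	"strconv"
)

// multiplierEnv matches the documented PerfBudgetMultiplierEnv value.
const multiplierEnv = "PERF_BUDGET_MULTIPLIER"

// parseMultiplier returns 1.0 (the documented default) for an empty value.
// ASSUMPTION: unparsable, non-positive, or NaN values also fall back to 1.0.
func parseMultiplier(raw string) float64 {
	if raw == "" {
		return 1.0
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil || !(v > 0) { // !(v > 0) also rejects NaN
		return 1.0
	}
	return v
}

// multiplierFromEnv reads the variable from the current process environment.
func multiplierFromEnv() float64 {
	return parseMultiplier(os.Getenv(multiplierEnv))
}
```

A developer on a slow laptop could then run the suite with PERF_BUDGET_MULTIPLIER=3 set in the shell to triple every budget for that run.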

const TestIsolationMarkerEnv = "AGENT_DECK_TEST_ISOLATED"

Name of the marker env var set during test isolation. Runtime guards in internal/tmux use this to detect a test context so they can panic loudly on an isolation leak instead of silently attacking the user's real tmux server.

Variables ¶

This section is empty.

Functions ¶

func CleanGitEnv ¶

func CleanGitEnv(base []string) []string

CleanGitEnv returns a copy of base with git repository-routing env vars removed.
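
A filter of this kind can be sketched as below. The docs do not list which variables are removed, so gitRoutingVars is an assumed set (GIT_DIR, GIT_WORK_TREE, and GIT_INDEX_FILE are standard git variables that redirect repository lookup), and cleanGitEnv is a hypothetical stand-in for the real function.

```go
package main

import "strings"

// gitRoutingVars is an ASSUMED list; the package docs do not enumerate the
// variables that CleanGitEnv strips.
var gitRoutingVars = []string{"GIT_DIR", "GIT_WORK_TREE", "GIT_INDEX_FILE"}

// cleanGitEnv returns a copy of base (KEY=VALUE entries) without the
// variables listed above. The input slice is left untouched.
func cleanGitEnv(base []string) []string {
	out := make([]string, 0, len(base))
	for _, kv := range base {
		key, _, _ := strings.Cut(kv, "=")
		drop := false
		for _, name := range gitRoutingVars {
			if key == name {
				drop = true
				break
			}
		}
		if !drop {
			out = append(out, kv)
		}
	}
	return out
}
```

A typical use would be assigning the result for os.Environ() to the Env field of an exec.Cmd before running git in a temporary repository.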

func ColdBudget ¶ added in v1.9.20

func ColdBudget(t *testing.T, base time.Duration) time.Duration

ColdBudget returns the budget for a COLD test: one that crosses a process or syscall boundary (cold-start exec, real-disk fsync, child-process spawn, network). Formula:

max(base * 5, PerfBudgetFloor) * PERF_BUDGET_MULTIPLIER

Pair with TrimmedMean(fn) for measurement.

func IsolateTmuxSocket ¶ added in v1.7.3

func IsolateTmuxSocket() func()

IsolateTmuxSocket makes it safe to spawn real tmux servers from tests even when `go test` is invoked from inside a live tmux session (the default on every developer host that uses agent-deck).

The helper does THREE things:

Unsets TMUX and TMUX_PANE. The tmux client resolves its socket in this order: `-S path → -L name → $TMUX → default socket under $TMUX_TMPDIR`. When tmux is invoked without -S or -L and TMUX is set, TMUX_TMPDIR is never consulted, so setting TMUX_TMPDIR alone provides zero isolation when the test process inherits TMUX from a parent tmux pane. This was the 2026-04-17 three-cascade bug: v1.7.3 set TMUX_TMPDIR but left TMUX set, so every test-spawned tmux session joined the user's real server and eventually destabilised it.
Sets TMUX_TMPDIR to a fresh per-call temp dir. Tests that use `-L <name>` or `$TMUX_TMPDIR`-derived sockets will land here, never at /tmp/tmux-<uid>/default.
Sets AGENT_DECK_TEST_ISOLATED=1. Production code paths in internal/tmux read this marker at tmux-spawn time and panic with a clear message if TMUX is still set and points to a non-isolated socket — the "make the failure loud, not silent" belt to the TMUX-unset suspender.

Call this from every package-level TestMain that transitively spawns tmux:

func TestMain(m *testing.M) {
    cleanup := testutil.IsolateTmuxSocket()
    code := m.Run()
    // os.Exit does not run deferred calls, so invoke cleanup explicitly.
    cleanup()
    os.Exit(code)
}

Returns a cleanup function that removes the temp dir and restores the original TMUX / TMUX_PANE / AGENT_DECK_TEST_ISOLATED values so the parent process's env is not permanently altered.
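
The three steps and the restore-on-cleanup behaviour could be implemented roughly as below. This is a sketch under assumptions: isolateTmux and isolated are hypothetical names, the temp-dir prefix is invented, the error return is a choice of this sketch (the real function returns only the cleanup), and restoring TMUX_TMPDIR as well is an assumption.

```go
package main

import "os"

// isolateTmux re-creates the three documented steps and returns a cleanup
// function that restores the previous environment.
func isolateTmux() (func(), error) {
	dir, err := os.MkdirTemp("", "tmux-isolated-")
	if err != nil {
		return nil, err
	}

	// Remember prior values, including whether each variable was set at all.
	type saved struct {
		val string
		set bool
	}
	names := []string{"TMUX", "TMUX_PANE", "TMUX_TMPDIR", "AGENT_DECK_TEST_ISOLATED"}
	prev := make(map[string]saved, len(names))
	for _, n := range names {
		v, ok := os.LookupEnv(n)
		prev[n] = saved{v, ok}
	}

	// Step 1: stop tmux from finding the parent session's socket.
	os.Unsetenv("TMUX")
	os.Unsetenv("TMUX_PANE")
	// Step 2: point default sockets at a fresh directory.
	os.Setenv("TMUX_TMPDIR", dir)
	// Step 3: set the marker that runtime guards look for.
	os.Setenv("AGENT_DECK_TEST_ISOLATED", "1")

	return func() {
		for _, n := range names {
			if p := prev[n]; p.set {
				os.Setenv(n, p.val)
			} else {
				os.Unsetenv(n)
			}
		}
		os.RemoveAll(dir)
	}, nil
}

// isolated reports whether the marker is currently set to "1".
func isolated() bool {
	return os.Getenv("AGENT_DECK_TEST_ISOLATED") == "1"
}
```

Recording whether each variable was set, not just its value, lets cleanup distinguish "was empty" from "was absent" and leave the parent environment exactly as it found it.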

func SkipIfShort ¶ added in v1.9.20

func SkipIfShort(t *testing.T)

SkipIfShort skips the test when `go test -short` is in effect. TestPerf_* tests are expensive enough that contributors running quick unit-test loops shouldn't pay for them. CI always runs in long mode.

func TrimmedMean ¶ added in v1.9.20

func TrimmedMean(fn func()) time.Duration

TrimmedMean runs fn n=11 times (plus 1 warm-up), sorts the samples, drops the top 2 and bottom 2, and returns the average of the middle 7.

Use for COLD tests where GC manipulation in the parent process doesn't help (the timed work happens in a child process or kernel).

func TrimmedMeanWarm ¶ added in v1.9.20

func TrimmedMeanWarm(fn func()) time.Duration

TrimmedMeanWarm is TrimmedMean with controlled GC: forces a runtime.GC() before each timed iteration and disables auto-GC during the timed window via debug.SetGCPercent(-1), restoring the original setting on return.

Use for WARM tests measuring pure-Go in-process work, where GC pauses are the dominant noise source.
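
The GC control described for TrimmedMeanWarm can be illustrated for a single sample. timeWithoutGC is a hypothetical helper, not the package's implementation; runtime.GC and debug.SetGCPercent are the standard-library calls the description names.

```go
package main

import (
	"runtime"
	"runtime/debug"
	"time"
)

// timeWithoutGC times one call of fn with automatic GC disabled.
func timeWithoutGC(fn func()) time.Duration {
	// Start from a freshly collected heap.
	runtime.GC()
	// A negative percentage disables automatic collection; the call returns
	// the previous setting so it can be restored.
	old := debug.SetGCPercent(-1)
	defer debug.SetGCPercent(old)

	start := time.Now()
	fn()
	return time.Since(start)
}
```

Because the forced collection happens before the clock starts, its cost is excluded from the sample, and the deferred call restores the caller's GC setting even if fn panics.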

func TrimmedMeanWithSetup ¶ added in v1.9.20

func TrimmedMeanWithSetup(setup, op func()) time.Duration

TrimmedMeanWithSetup runs setup() before each timed op() but excludes setup from the timing window. Use when the timed primitive needs a fresh fixture per iteration (e.g. a populated tree before timing DeleteAll). Setup and op share state via closure capture in the caller.

COLD variant — no GC manipulation. For warm work needing setup, compose: temporarily disable GC yourself or call runtime.GC() inside setup before each timed op.
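
The setup/op split can be sketched as a loop in which only op sits between the two clock reads. samplesWithSetup is a hypothetical helper that returns the raw samples; the real function also applies the trimmed mean.

```go
package main

import "time"

// samplesWithSetup runs setup before every op but times only op.
func samplesWithSetup(setup, op func(), n int) []time.Duration {
	samples := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		setup() // untimed: build a fresh fixture
		start := time.Now()
		op() // timed
		samples = append(samples, time.Since(start))
	}
	return samples
}
```

The caller shares the fixture between the two closures by capturing one variable, for example a slice that setup repopulates and op empties on each iteration.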

func UnsetGitRepoEnv ¶

func UnsetGitRepoEnv()

UnsetGitRepoEnv removes git repository-routing env vars from the current process. This prevents subprocess git commands from accidentally targeting the caller's repo.

func WarmBudget ¶ added in v1.9.20

func WarmBudget(t *testing.T, base time.Duration) time.Duration

WarmBudget returns the budget for a WARM test: pure in-process Go work that can be measured under controlled GC. Formula:

max(base * 3, PerfBudgetFloor) * PERF_BUDGET_MULTIPLIER

Pair with TrimmedMeanWarm(fn) — that variant forces a GC cycle and disables auto-GC around each timed iteration, eliminating the largest noise source.

WarmBudget had no callers in the PR that introduced it; it is exported to capture the cold/warm convention in code so future contributors classify their tests correctly.

Types ¶

This section is empty.

Directories ¶

Path	Synopsis
crossfixture	Package crossfixture supplies the shared scaffolding for tests that must verify TUI ↔ web ↔ CLI parity (TEST-PLAN.md §6.4 / TUI-TEST-PLAN.md §6.4 crossProcessFixture).
fakeclock	Package fakeclock provides an injectable Clock for tests that exercise time-sensitive logic (hook freshness windows, heartbeat staleness, log-rotation maintenance) without sleeping or depending on wall-clock progress.
fakeinotify	Package fakeinotify provides a controllable filesystem event source for tests that exercise hook-status-watcher overflow / fallback paths (TEST-PLAN.md J2 regression, TUI-TEST-PLAN.md §6.2 fakeInotify).
logassert	Package logassert captures slog records during a test and offers assertion helpers so tests can verify "this code path emitted hook_overflow with dropped>0" without grepping stderr.
multiclienttmux	Package multiclienttmux boots an isolated tmux server with aggressive-resize=on and lets a test attach N pty clients at chosen sizes: the harness from TEST-PLAN.md §6.1 / TUI-TEST-PLAN.md §6.8 for the "two web clients hijacking pane size" regression (J4 / F2).
profilefixture	Package profilefixture builds the controlled environment that profile-resolution parity tests require: a known AGENTDECK_PROFILE, CLAUDE_CONFIG_DIR, agent-deck config-dir override, and isolated tempdir.
teatesthelper	Package teatesthelper wraps charmbracelet/x/exp/teatest with the conventions our TUI tests need (TUI-TEST-PLAN.md §6.1):
