testutil

package
v1.9.29 Latest
Warning

This package is not in the latest version of its module.

Published: May 21, 2026 License: MIT Imports: 9 Imported by: 0

Documentation

Overview

Package testutil contains shared helpers for agent-deck's test suites.

Performance regression API (this file)
======================================

All TestPerf_* tests use this small set of primitives. The design has three deliberate parts:

  1. Cold vs warm budget classification.
  2. n=11 trimmed-mean measurement.
  3. 1 ms budget floor.

── Cold vs warm ────────────────────────────────────────────────────────

The budget formula differs based on whether the test crosses a process or syscall boundary that introduces external variance the test code can't control:

  • COLD tests use ColdBudget(t, base) = max(base × 5, 1 ms) × multiplier. For: cold-start exec, real-disk fsync (Tier 2 — see docs/perf-budget-suite.md), child-process spawn, network round-trips. The 5× factor reflects that runner / loader / fsync variance can't be capped tighter without false positives.

  • WARM tests use WarmBudget(t, base) = max(base × 3, 1 ms) × multiplier. For: pure in-process Go work measurable under controlled GC. The tighter 3× factor is safe because TrimmedMeanWarm forces a GC cycle and disables auto-GC during each timed window — the largest noise source is eliminated. WARM tests run on tmpfs / in memory (Tier 1) so disk speed is irrelevant.

Both formulas are scaled by PERF_BUDGET_MULTIPLIER (default 1.0; CI sets 2.0 to absorb shared-runner variance). The effective CI gate is therefore 10× base for cold and 6× base for warm, twice the local gate of 5× and 3×.

── n=11 trimmed mean ───────────────────────────────────────────────────

TrimmedMean runs n=11 timed iterations, sorts the samples, drops the top 2 and bottom 2, and averages the middle 7. That trims 4 of 11 samples, about 36% in total (18% from each tail).

Why n=11:

  • Odd: median is well-defined as a fallback diagnostic.
  • Cheap: 11 × ~10 ms typical = ~110 ms per test, well under the 60 s perf-suite timeout.
  • Drop 2/2: handles one GC pause + one scheduler hiccup without losing too many samples. In practice, ~1 in 10 samples on a loaded host is an outlier; dropping 2 from each end is robust.
  • Middle 7: the standard error of the mean scales as 1/√7 ≈ 0.38, roughly a 2.6× noise reduction vs a single sample (assuming independent samples), and ~1.5× better than the median of 5 used previously.

Larger n (21, 51) was considered but rejected: marginal variance reduction (~30%) at 2–5× test cost. n=11 is the sweet spot.

── 1 ms floor ──────────────────────────────────────────────────────────

PerfBudgetFloor sets a 1 ms floor on the scaled base: if base × 5 (cold) or base × 3 (warm) comes out below 1 ms, 1 ms is used instead, and PERF_BUDGET_MULTIPLIER is applied afterwards. Anything faster is either "just fast" (sub-millisecond timing is dominated by clock resolution + scheduler jitter) or a sign that the unit under test is too small to be a meaningful regression target. Move such tests to Benchmark* (Track A) instead of TestPerf_* (Track B).

Index

Constants

const PerfBudgetFloor = 1 * time.Millisecond

PerfBudgetFloor is the minimum budget any TestPerf_* gate may apply. Sub-millisecond budgets are dominated by clock resolution and scheduler jitter, not by the unit under test. See package docblock for rationale.

const PerfBudgetMultiplierEnv = "PERF_BUDGET_MULTIPLIER"

PerfBudgetMultiplierEnv is the env var read by ColdBudget and WarmBudget. CI sets this to 2.0 to absorb shared-runner variance; developers on slow laptops can bump it locally. Default is 1.0.

const TestIsolationMarkerEnv = "AGENT_DECK_TEST_ISOLATED"

Name of the marker env var set during test isolation. Runtime guards in internal/tmux use this to detect a test context so they can panic loudly on an isolation leak instead of silently attacking the user's real tmux server.

Variables

This section is empty.

Functions

func CleanGitEnv

func CleanGitEnv(base []string) []string

CleanGitEnv returns a copy of base with git repository-routing env vars removed.

func ColdBudget added in v1.9.20

func ColdBudget(t *testing.T, base time.Duration) time.Duration

ColdBudget returns the budget for a COLD test: one that crosses a process or syscall boundary (cold-start exec, real-disk fsync, child-process spawn, network). Formula:

max(base * 5, PerfBudgetFloor) * PERF_BUDGET_MULTIPLIER

Pair with TrimmedMean(fn) for measurement.

func IsolateTmuxSocket added in v1.7.3

func IsolateTmuxSocket() func()

IsolateTmuxSocket makes it safe to spawn real tmux servers from tests even when `go test` is invoked from inside a live tmux session (the default on every developer host that uses agent-deck).

The helper does THREE things:

  1. Unsets TMUX and TMUX_PANE. Tmux's client discovery order is: `$TMUX → -S path → -L name → $TMUX_TMPDIR`. If TMUX is set, every later step is ignored — so setting TMUX_TMPDIR alone provides zero isolation when the test process inherits TMUX from a parent tmux pane. This was the 2026-04-17 three-cascade bug: v1.7.3 set TMUX_TMPDIR but left TMUX set, so every test-spawned tmux session joined the user's real server and eventually destabilised it.

  2. Sets TMUX_TMPDIR to a fresh per-call temp dir. Tests that use `-L <name>` or `$TMUX_TMPDIR`-derived sockets will land here, never at /tmp/tmux-<uid>/default.

  3. Sets AGENT_DECK_TEST_ISOLATED=1. Production code paths in internal/tmux read this marker at tmux-spawn time and panic with a clear message if TMUX is still set and points to a non-isolated socket — the "make the failure loud, not silent" belt to the TMUX-unset suspender.

Call this from every package-level TestMain that transitively spawns tmux:

func TestMain(m *testing.M) {
    cleanup := testutil.IsolateTmuxSocket()
    code := m.Run()
    cleanup() // called explicitly: os.Exit does not run deferred functions
    os.Exit(code)
}

Returns a cleanup function that removes the temp dir and restores the original TMUX / TMUX_PANE / AGENT_DECK_TEST_ISOLATED values so the parent process's env is not permanently altered.

func SkipIfShort added in v1.9.20

func SkipIfShort(t *testing.T)

SkipIfShort skips the test when `go test -short` is in effect. TestPerf_* tests are expensive enough that contributors running quick unit-test loops shouldn't pay for them. CI always runs in long mode.

func TrimmedMean added in v1.9.20

func TrimmedMean(fn func()) time.Duration

TrimmedMean runs fn n=11 times (plus 1 warm-up), sorts the samples, drops the top 2 and bottom 2, and returns the average of the middle 7.

Use for COLD tests where GC manipulation in the parent process doesn't help (the timed work happens in a child process or kernel).

func TrimmedMeanWarm added in v1.9.20

func TrimmedMeanWarm(fn func()) time.Duration

TrimmedMeanWarm is TrimmedMean with controlled GC: forces a runtime.GC() before each timed iteration and disables auto-GC during the timed window via debug.SetGCPercent(-1), restoring the original setting on return.

Use for WARM tests measuring pure-Go in-process work, where GC pauses are the dominant noise source.

func TrimmedMeanWithSetup added in v1.9.20

func TrimmedMeanWithSetup(setup, op func()) time.Duration

TrimmedMeanWithSetup runs setup() before each timed op() but excludes setup from the timing window. Use when the timed primitive needs a fresh fixture per iteration (e.g. a populated tree before timing DeleteAll). Setup and op share state via closure capture in the caller.

COLD variant — no GC manipulation. For warm work needing setup, compose: temporarily disable GC yourself or call runtime.GC() inside setup before each timed op.

func UnsetGitRepoEnv

func UnsetGitRepoEnv()

UnsetGitRepoEnv removes git repository-routing env vars from the current process. This prevents subprocess git commands from accidentally targeting the caller's repo.

func WarmBudget added in v1.9.20

func WarmBudget(t *testing.T, base time.Duration) time.Duration

WarmBudget returns the budget for a WARM test: pure in-process Go work that can be measured under controlled GC. Formula:

max(base * 3, PerfBudgetFloor) * PERF_BUDGET_MULTIPLIER

Pair with TrimmedMeanWarm(fn) — that variant forces a GC cycle and disables auto-GC around each timed iteration, eliminating the largest noise source.

No callers in this PR; exported to capture the cold/warm convention in code so future contributors classify their tests correctly.

Types

This section is empty.

Directories

Path Synopsis
Package crossfixture supplies the shared scaffolding for tests that must verify TUI ↔ web ↔ CLI parity (TEST-PLAN.md §6.4 / TUI-TEST-PLAN.md §6.4 crossProcessFixture).
Package fakeclock provides an injectable Clock for tests that exercise time-sensitive logic — hook freshness windows, heartbeat staleness, log-rotation maintenance — without sleeping or depending on wall-clock progress.
Package fakeinotify provides a controllable filesystem event source for tests that exercise hook-status-watcher overflow / fallback paths (TEST-PLAN.md J2 regression, TUI-TEST-PLAN.md §6.2 fakeInotify).
Package logassert captures slog records during a test and offers assertion helpers so tests can verify "this code path emitted hook_overflow with dropped>0" without grepping stderr.
Package multiclienttmux boots an isolated tmux server with aggressive-resize=on and lets a test attach N pty clients at chosen sizes — the harness from TEST-PLAN.md §6.1 / TUI-TEST-PLAN.md §6.8 for the "two web clients hijacking pane size" regression (J4 / F2).
Package profilefixture builds the controlled environment that profile-resolution parity tests require: a known AGENTDECK_PROFILE, CLAUDE_CONFIG_DIR, agent-deck config-dir override, and isolated tempdir.
Package teatesthelper wraps charmbracelet/x/exp/teatest with the conventions our TUI tests need (TUI-TEST-PLAN.md §6.1):
