loadtest

package
v1.3.0
Published: Jun 1, 2026 License: Apache-2.0 Imports: 23 Imported by: 0

Documentation

Overview

Package loadtest provides a gRPC load testing harness for the Loom server.

It spins up a real gRPC server backed by a mock LLM provider, then drives concurrent Weave and StreamWeave calls to measure throughput, latency percentiles, error rates, and resource usage, all without consuming real LLM tokens.

Constants

This section is empty.

Variables

This section is empty.

Functions

func BuildLatencyHistogram

func BuildLatencyHistogram(results []result) []int64

BuildLatencyHistogram returns the raw latency values in microseconds. Downstream consumers can bucket these however they want.
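
Because the function returns raw values rather than counts, the bucketing scheme is left to the consumer. The following is a minimal sketch of one option, power-of-two buckets over microsecond values. The helper name bucketLatencies and the bucket layout are assumptions for illustration and are not part of this package.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucketLatencies groups raw microsecond latencies into power-of-two buckets.
// For i >= 1, index i counts values v with 2^(i-1) <= v < 2^i.
// Index 0 counts values <= 0.
// Hypothetical consumer-side helper, not part of the loadtest package.
func bucketLatencies(latenciesUs []int64) []int {
	var counts []int
	for _, v := range latenciesUs {
		idx := 0
		if v > 0 {
			idx = bits.Len64(uint64(v)) // number of bits needed to represent v
		}
		// Grow the slice on demand so it ends at the highest bucket seen.
		for len(counts) <= idx {
			counts = append(counts, 0)
		}
		counts[idx]++
	}
	return counts
}

func main() {
	raw := []int64{1, 3, 900, 1100, 1500, 70000}
	for i, n := range bucketLatencies(raw) {
		if n > 0 {
			fmt.Printf("bucket %d (< %d us): %d\n", i, int64(1)<<uint(i), n)
		}
	}
}
```

Logarithmic buckets keep the output small even when the tail is several orders of magnitude slower than the median. Fixed-width buckets are an equally valid choice.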

func PercentileExported

func PercentileExported(values []time.Duration, p float64) time.Duration

PercentileExported computes the p-th percentile from an unsorted slice. It sorts the input in place.
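
The in-place sort matters to callers who still need the original order. The sketch below shows a percentile function with the same signature shape and the same side effect. It uses the nearest-rank method, which is an assumption: the documentation does not say which percentile definition the package uses. The names percentile and sampleLatencies are illustrative.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// percentile returns the p-th percentile (0 < p <= 100) using the
// nearest-rank method. It sorts values in place, as PercentileExported is
// documented to do. Nearest-rank is an assumption; the package may
// interpolate between neighbours instead.
func percentile(values []time.Duration, p float64) time.Duration {
	if len(values) == 0 {
		return 0
	}
	sort.Slice(values, func(i, j int) bool { return values[i] < values[j] })
	rank := int(math.Ceil(p / 100 * float64(len(values))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(values) {
		rank = len(values)
	}
	return values[rank-1]
}

// sampleLatencies returns a small unsorted data set for the demo.
func sampleLatencies() []time.Duration {
	return []time.Duration{
		50 * time.Millisecond, 10 * time.Millisecond, 40 * time.Millisecond,
		20 * time.Millisecond, 30 * time.Millisecond,
	}
}

func main() {
	latencies := sampleLatencies()
	// Copy first when the original order matters, because the call reorders
	// the slice it is given.
	sorted := append([]time.Duration(nil), latencies...)
	fmt.Println("p50:", percentile(sorted, 50))
	fmt.Println("p99:", percentile(sorted, 99))
	fmt.Println("first element of untouched input:", latencies[0])
}
```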

func WriteJSON

func WriteJSON(path string, report *BenchmarkReport) error

WriteJSON writes the benchmark report to a JSON file.
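
As a rough illustration of what writing a report to disk involves, the sketch below marshals a struct with indentation and writes it to a file using only the standard library. The miniReport type, its fields, and the writeJSON helper are hypothetical stand-ins. The actual output schema is the JSONReport type documented further down, and the package's own file mode and indentation are not stated here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// miniReport is a hypothetical stand-in for the package's report type.
type miniReport struct {
	Scenario     string  `json:"scenario"`
	MeasuredRuns int     `json:"measured_runs"`
	MedianRPS    float64 `json:"median_rps"`
}

// writeJSON marshals v with two-space indentation and writes it to path.
func writeJSON(path string, v any) error {
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return fmt.Errorf("marshal report: %w", err)
	}
	return os.WriteFile(path, append(data, '\n'), 0o644)
}

// roundTrip writes a report to a temporary file and reads it back, which is
// a cheap way to confirm that the file is valid JSON.
func roundTrip(in miniReport) (miniReport, error) {
	var out miniReport
	dir, err := os.MkdirTemp("", "loadtest-json")
	if err != nil {
		return out, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "report.json")
	if err := writeJSON(path, in); err != nil {
		return out, err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(data, &out)
	return out, err
}

func main() {
	out, err := roundTrip(miniReport{Scenario: "baseline", MeasuredRuns: 10, MedianRPS: 1234.5})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", out)
}
```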

Types

type BenchmarkConfig

type BenchmarkConfig struct {
	HarnessConfig

	// Runs is the number of measured runs (default 10).
	Runs int

	// WarmupRuns is the number of warmup runs to discard (default 2).
	WarmupRuns int

	// PerRunWarmup is the duration of warmup to discard at the start of each
	// duration-based run. For request-count-based runs, 10% of requests are
	// discarded instead. Zero disables per-run warmup trimming.
	PerRunWarmup time.Duration

	// ScenarioName is the name of the scenario for the JSON report.
	ScenarioName string

	// OutputPath is where to write the JSON report. Empty means stdout only.
	OutputPath string

	// CollectTimeSeries enables per-second throughput bucketing.
	CollectTimeSeries bool

	// CollectHistogram enables raw latency histogram collection.
	CollectHistogram bool

	// CollectResources enables runtime resource sampling.
	CollectResources bool

	// CollectGCCorrelation enables GC-latency correlation analysis.
	CollectGCCorrelation bool
}

BenchmarkConfig extends HarnessConfig with multi-run and publication settings.
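
The PerRunWarmup rule has two modes, which the following sketch makes concrete. It is an illustration under stated assumptions, not the package's code: the sample type is hypothetical (the real per-request type is unexported), samples are assumed to be ordered by start time, the 10% is rounded down, and a zero warmup is assumed to disable trimming in both modes.

```go
package main

import (
	"fmt"
	"time"
)

// sample is a hypothetical per-request record.
type sample struct {
	offset  time.Duration // start time relative to the beginning of the run
	latency time.Duration
}

// trimWarmup drops warmup samples from a run ordered by start time.
// Duration-based runs (totalRequests == 0) drop samples that started before
// warmup elapsed. Request-count-based runs drop the first 10% (rounded down).
// A zero warmup returns the input unchanged (assumed for both modes).
func trimWarmup(samples []sample, totalRequests int, warmup time.Duration) []sample {
	if warmup == 0 {
		return samples
	}
	if totalRequests > 0 {
		return samples[len(samples)/10:]
	}
	for i, s := range samples {
		if s.offset >= warmup {
			return samples[i:]
		}
	}
	return nil // every sample started inside the warmup window
}

// makeSamples builds n samples spaced 100ms apart for the demo.
func makeSamples(n int) []sample {
	out := make([]sample, n)
	for i := range out {
		out[i] = sample{
			offset:  time.Duration(i) * 100 * time.Millisecond,
			latency: 5 * time.Millisecond,
		}
	}
	return out
}

func main() {
	s := makeSamples(20)
	fmt.Println("count-based kept:", len(trimWarmup(s, 20, time.Second)))
	fmt.Println("duration-based kept:", len(trimWarmup(s, 0, 500*time.Millisecond)))
	fmt.Println("trimming disabled kept:", len(trimWarmup(s, 0, 0)))
}
```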

func DefaultBenchmarkConfig

func DefaultBenchmarkConfig() BenchmarkConfig

DefaultBenchmarkConfig returns a config suitable for publication benchmarks.

type BenchmarkReport

type BenchmarkReport struct {
	Environment   EnvironmentInfo
	ScenarioName  string
	HarnessConfig HarnessConfig
	WarmupRuns    int
	Runs          []RunResult
	Aggregate     JSONAggregate
}

BenchmarkReport holds the complete benchmark output.

func (*BenchmarkReport) ToJSON

func (br *BenchmarkReport) ToJSON() *JSONReport

ToJSON converts a BenchmarkReport to the JSON output format.

type BenchmarkRunner

type BenchmarkRunner struct {
	// contains filtered or unexported fields
}

BenchmarkRunner orchestrates multi-run benchmarks with warmup and statistics.

func NewBenchmarkRunner

func NewBenchmarkRunner(config BenchmarkConfig) *BenchmarkRunner

NewBenchmarkRunner creates a runner with the given config.

func (*BenchmarkRunner) Run

Run executes the full benchmark: warmup runs, measured runs, aggregate stats.

type EnvironmentInfo

type EnvironmentInfo struct {
	GoVersion    string `json:"go_version"`
	GOOS         string `json:"goos"`
	GOARCH       string `json:"goarch"`
	NumCPU       int    `json:"num_cpu"`
	GOMAXPROCS   int    `json:"gomaxprocs"`
	GRPCVersion  string `json:"grpc_version"`
	CommitSHA    string `json:"commit_sha"`
	RaceDetector bool   `json:"race_detector"`
	TimestampUTC string `json:"timestamp_utc"`

	// K8s metadata (populated from downward API env vars if running in a pod)
	NodeName   string `json:"node_name,omitempty"`
	PodName    string `json:"pod_name,omitempty"`
	K8sVersion string `json:"k8s_version,omitempty"`
	VMSize     string `json:"vm_size,omitempty"`
	Region     string `json:"region,omitempty"`

	// System info (populated on Linux)
	KernelVersion string `json:"kernel_version,omitempty"`
	CPUModel      string `json:"cpu_model,omitempty"`
	RAMTotalMB    int64  `json:"ram_total_mb,omitempty"`
}

EnvironmentInfo captures the runtime and infrastructure environment for reproducibility of benchmark results.

func CaptureEnvironment

func CaptureEnvironment() EnvironmentInfo

CaptureEnvironment gathers runtime and system information.
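
Most of the runtime fields can be filled from the standard library alone. The sketch below shows one way to gather a subset of them. The envInfo type is a reduced stand-in for EnvironmentInfo, and the environment variable names NODE_NAME and POD_NAME are assumptions: the documentation says only that the Kubernetes fields come from downward API env vars, not which names are read.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"time"
)

// envInfo is a reduced, hypothetical stand-in for EnvironmentInfo.
type envInfo struct {
	GoVersion, GOOS, GOARCH string
	NumCPU, GOMAXPROCS      int
	NodeName, PodName       string
	TimestampUTC            string
}

// captureEnv collects runtime facts that affect benchmark reproducibility.
func captureEnv() envInfo {
	return envInfo{
		GoVersion:  runtime.Version(),
		GOOS:       runtime.GOOS,
		GOARCH:     runtime.GOARCH,
		NumCPU:     runtime.NumCPU(),
		GOMAXPROCS: runtime.GOMAXPROCS(0), // 0 reads the setting without changing it
		// Assumed variable names; empty outside a pod or if not injected.
		NodeName:     os.Getenv("NODE_NAME"),
		PodName:      os.Getenv("POD_NAME"),
		TimestampUTC: time.Now().UTC().Format(time.RFC3339),
	}
}

func main() {
	fmt.Printf("%+v\n", captureEnv())
}
```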

type GCCorrelation

type GCCorrelation struct {
	TotalP95Requests     int     `json:"total_p95_requests"`
	GCAttributedRequests int     `json:"gc_attributed_requests"`
	GCAttributionPct     float64 `json:"gc_attribution_pct"`
	TotalGCPauses        int     `json:"total_gc_pauses"`
	MaxGCPauseUs         int64   `json:"max_gc_pause_us"`
	AvgGCPauseUs         float64 `json:"avg_gc_pause_us"`
	TotalGCPauseUs       int64   `json:"total_gc_pause_us"`
}

GCCorrelation holds the result of correlating GC pauses with tail latency.

func CorrelateGCWithLatency

func CorrelateGCWithLatency(results []result, runStart time.Time) GCCorrelation

CorrelateGCWithLatency checks whether requests with latency above P95 overlap a GC pause window, which helps answer the question "what causes the tail?"

It compares per-request timestamps and latencies against the GC pause log from runtime.MemStats. The GC pause ring buffer holds only the most recent 256 pauses, so pauses older than that are no longer available when the buffer is read.
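
The overlap check can be sketched with the standard library as follows. This is an illustration of the idea, not the package's implementation: the pause type and the helper names are hypothetical. It relies on the documented layout of runtime.MemStats, where PauseNs and PauseEnd are 256-entry circular buffers and the most recent entry sits at index (NumGC+255)%256. Since PauseNs sums all pauses within a GC cycle, the reconstructed interval is approximate.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// pause is one GC cycle's stop-the-world time as a wall-clock interval.
type pause struct{ start, end time.Time }

// recentPauses reads the pause ring buffers from runtime.MemStats.
// At most 256 entries are available; older pauses have been overwritten.
func recentPauses() []pause {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	n := int(ms.NumGC)
	if n > len(ms.PauseNs) {
		n = len(ms.PauseNs)
	}
	pauses := make([]pause, 0, n)
	for i := 0; i < n; i++ {
		// Walk backwards from the most recent entry.
		idx := (int(ms.NumGC) + 255 - i) % 256
		end := time.Unix(0, int64(ms.PauseEnd[idx]))
		// Approximation: treat the summed pause time as one interval.
		pauses = append(pauses, pause{end.Add(-time.Duration(ms.PauseNs[idx])), end})
	}
	return pauses
}

// overlaps reports whether the request window [start, start+latency]
// intersects any pause interval.
func overlaps(start time.Time, latency time.Duration, pauses []pause) bool {
	end := start.Add(latency)
	for _, p := range pauses {
		if !p.end.Before(start) && !p.start.After(end) {
			return true
		}
	}
	return false
}

// forceGCAndCount runs one GC cycle and returns how many pauses are visible.
func forceGCAndCount() int {
	runtime.GC()
	return len(recentPauses())
}

// demoOverlap checks two synthetic requests against one synthetic pause.
func demoOverlap() (hit, miss bool) {
	t0 := time.Unix(1000, 0)
	pauses := []pause{{t0.Add(10 * time.Millisecond), t0.Add(12 * time.Millisecond)}}
	hit = overlaps(t0.Add(5*time.Millisecond), 10*time.Millisecond, pauses)   // 5..15ms spans the pause
	miss = overlaps(t0.Add(20*time.Millisecond), 10*time.Millisecond, pauses) // starts after the pause
	return hit, miss
}

func main() {
	n := forceGCAndCount()
	hit, miss := demoOverlap()
	fmt.Println("visible pauses:", n, "hit:", hit, "miss:", miss)
}
```

An overlap shows only that a pause coincided with a slow request. It does not prove that the pause caused the latency.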

type Harness

type Harness struct {
	// contains filtered or unexported fields
}

Harness manages the lifecycle of a load test.

func NewHarness

func NewHarness(config HarnessConfig) *Harness

NewHarness creates a new load test harness. Call Setup() to start the server.

func (*Harness) Run

func (h *Harness) Run(ctx context.Context) (*Report, error)

Run executes the load test and returns a report.

func (*Harness) RunRaw

func (h *Harness) RunRaw(ctx context.Context) ([]result, time.Duration, error)

RunRaw executes the load test and returns the raw results and wall time. Use this when you need access to per-request data for time-series analysis.

func (*Harness) ServerAddr

func (h *Harness) ServerAddr() string

ServerAddr returns the address the gRPC server is listening on. Useful for pointing external load testing tools (e.g., ghz) at the server.

func (*Harness) Setup

func (h *Harness) Setup() (string, error)

Setup starts the gRPC server with a mock LLM provider. Returns the server address for external clients (e.g., ghz). If HarnessConfig.ServerAddr is set, connects to a remote server instead.

func (*Harness) Teardown

func (h *Harness) Teardown()

Teardown stops the server and closes connections.

type HarnessConfig

type HarnessConfig struct {
	// Concurrency is the number of concurrent goroutines making gRPC calls.
	Concurrency int

	// TotalRequests is the total number of requests to make.
	// If 0, the harness runs for Duration instead.
	TotalRequests int

	// Duration is how long to run the load test (if TotalRequests is 0).
	Duration time.Duration

	// RequestTimeout is the per-request context timeout.
	RequestTimeout time.Duration

	// LLMConfig controls the mock LLM provider behavior.
	LLMConfig loadtest.ProviderConfig

	// UseStreaming uses StreamWeave instead of Weave.
	UseStreaming bool

	// Query is the query string sent in each request.
	Query string

	// RampUp is the duration over which to gradually ramp up to full concurrency.
	// Zero means all goroutines start immediately.
	RampUp time.Duration

	// LLMConcurrencyLimit overrides the server's LLM concurrency semaphore.
	// 0 means use the server default (5). Set high (e.g., 10000) to effectively
	// disable the limiter for load testing.
	LLMConcurrencyLimit int

	// SessionID, if set, is used for all requests (session reuse/contention testing).
	// Empty means each request creates a new session.
	SessionID string

	// NumAgents controls how many agents to register in the multi-agent server.
	// 0 or 1 means a single default agent. >1 creates N agents, and requests
	// are round-robined across them via agent_id.
	NumAgents int

	// ServerAddr, if set, connects to a remote gRPC server instead of starting
	// an in-process one. Setup() will only establish the client connection.
	// The mock LLM provider will be nil in this mode.
	ServerAddr string
}

HarnessConfig controls the load test parameters.
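
To show how Concurrency, TotalRequests, RampUp, and NumAgents can interact, here is a minimal worker-pool sketch. It is written under assumptions and does not reproduce the harness: the ramp is linear (worker i starts after i*RampUp/Concurrency), requests are claimed from a shared atomic counter, and the "agent-N" ID format is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLoad starts `concurrency` workers that together issue exactly `total`
// calls to do, and returns the number of completed calls. Worker i sleeps
// for i*rampUp/concurrency before starting. A linear ramp is an assumption.
func runLoad(concurrency, total int, rampUp time.Duration, do func(reqIndex int)) int64 {
	var next, done int64
	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			time.Sleep(rampUp * time.Duration(worker) / time.Duration(concurrency))
			for {
				idx := atomic.AddInt64(&next, 1) - 1 // claim the next request index
				if idx >= int64(total) {
					return
				}
				do(int(idx))
				atomic.AddInt64(&done, 1)
			}
		}(i)
	}
	wg.Wait()
	return done
}

// agentFor picks an agent round-robin by request index.
// The ID format is hypothetical.
func agentFor(reqIndex, numAgents int) string {
	if numAgents <= 1 {
		return "default"
	}
	return fmt.Sprintf("agent-%d", reqIndex%numAgents)
}

func main() {
	var calls int64
	done := runLoad(4, 20, 40*time.Millisecond, func(i int) {
		_ = agentFor(i, 3) // a real worker would send this with the request
		atomic.AddInt64(&calls, 1)
	})
	fmt.Println("completed:", done, "calls:", calls, "agent for request 5:", agentFor(5, 3))
}
```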

func DefaultHarnessConfig

func DefaultHarnessConfig() HarnessConfig

DefaultHarnessConfig returns sensible defaults for a quick load test.

type JSONAggregate

type JSONAggregate struct {
	ThroughputRPS Stats `json:"throughput_rps"`
	LatencyP50Us  Stats `json:"latency_p50_us"`
	LatencyP90Us  Stats `json:"latency_p90_us"`
	LatencyP95Us  Stats `json:"latency_p95_us"`
	LatencyP99Us  Stats `json:"latency_p99_us"`
}

JSONAggregate holds aggregate statistics across all runs.

type JSONConfig

type JSONConfig struct {
	Concurrency         int     `json:"concurrency"`
	TotalRequests       int     `json:"total_requests,omitempty"`
	DurationMs          int64   `json:"duration_ms,omitempty"`
	LLMBaseLatencyMs    int64   `json:"llm_base_latency_ms"`
	LLMJitterMs         int64   `json:"llm_jitter_ms"`
	LLMErrorRate        float64 `json:"llm_error_rate"`
	LLMConcurrencyLimit int     `json:"llm_concurrency_limit"`
	UseStreaming        bool    `json:"use_streaming"`
	NumAgents           int     `json:"num_agents"`
}

JSONConfig is a JSON-safe subset of the harness configuration. It excludes function pointers and non-serializable fields.

type JSONReport

type JSONReport struct {
	BenchmarkVersion string          `json:"benchmark_version"`
	LoomVersion      string          `json:"loom_version"`
	Environment      EnvironmentInfo `json:"environment"`
	Scenario         string          `json:"scenario"`
	Config           JSONConfig      `json:"config"`
	WarmupRuns       int             `json:"warmup_runs"`
	MeasuredRuns     int             `json:"measured_runs"`
	Runs             []JSONRunResult `json:"runs"`
	Aggregate        JSONAggregate   `json:"aggregate"`
}

JSONReport is the top-level output structure matching the spec's Section 6 schema.

type JSONRunResult

type JSONRunResult struct {
	RunNumber          int                `json:"run_number"`
	ThroughputRPS      float64            `json:"throughput_rps"`
	LatencyP50Us       int64              `json:"latency_p50_us"`
	LatencyP90Us       int64              `json:"latency_p90_us"`
	LatencyP95Us       int64              `json:"latency_p95_us"`
	LatencyP99Us       int64              `json:"latency_p99_us"`
	LatencyMinUs       int64              `json:"latency_min_us"`
	LatencyMaxUs       int64              `json:"latency_max_us"`
	LatencyAvgUs       float64            `json:"latency_avg_us"`
	ErrorCount         int64              `json:"error_count"`
	ErrorRate          float64            `json:"error_rate"`
	TotalRequests      int64              `json:"total_requests"`
	WallTimeMs         int64              `json:"wall_time_ms"`
	ThroughputTimeline []ThroughputBucket `json:"throughput_timeline,omitempty"`
	LatencyHistogram   []int64            `json:"latency_histogram,omitempty"`
	ResourceTimeline   []ResourceSample   `json:"resource_timeline,omitempty"`
	GCCorrelation      *GCCorrelation     `json:"gc_correlation,omitempty"`
}

JSONRunResult holds the data for a single benchmark run.

type Report

type Report struct {
	// Config is the configuration used for the test.
	Concurrency   int
	TotalRequests int
	UseStreaming  bool
	LLMLatency    string // description of LLM config

	// Timing
	WallTime time.Duration

	// Throughput
	RequestsCompleted int64
	RequestsPerSecond float64

	// Latency percentiles (request-level, including LLM simulation)
	P50 time.Duration
	P90 time.Duration
	P95 time.Duration
	P99 time.Duration
	Min time.Duration
	Max time.Duration
	Avg time.Duration

	// Error rates
	Errors    int64
	ErrorRate float64

	// LLM provider metrics
	LLMMetrics loadtest.MetricsSnapshot
}

Report contains the load test results.

func (*Report) String

func (r *Report) String() string

String returns a human-readable summary of the load test report.

type ResourceSample

type ResourceSample struct {
	Second       int     `json:"second"`
	Goroutines   int     `json:"goroutines"`
	HeapAllocMB  float64 `json:"heap_alloc_mb"`
	HeapSysMB    float64 `json:"heap_sys_mb"`
	NumGC        uint32  `json:"num_gc"`
	GCPauseTotUs int64   `json:"gc_pause_total_us"`
}

ResourceSample holds a single point-in-time snapshot of runtime metrics.

type ResourceSampler

type ResourceSampler struct {
	// contains filtered or unexported fields
}

ResourceSampler collects runtime metrics at 1-second intervals.

func NewResourceSampler

func NewResourceSampler() *ResourceSampler

NewResourceSampler creates a sampler. Call Start() to begin collection.

func (*ResourceSampler) Start

func (s *ResourceSampler) Start()

Start begins collecting samples every second.

func (*ResourceSampler) Stop

func (s *ResourceSampler) Stop() []ResourceSample

Stop ends collection and returns all samples.
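
A Start/Stop sampler of this kind is usually a goroutine driven by a ticker. The sketch below illustrates that pattern with a configurable interval so it can be exercised quickly. The type and field names are hypothetical, and the real ResourceSampler may differ in what it records and how it synchronizes.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// snapshot is a reduced, hypothetical stand-in for ResourceSample.
type snapshot struct {
	Tick        int
	Goroutines  int
	HeapAllocMB float64
	NumGC       uint32
}

type sampler struct {
	interval time.Duration
	stop     chan struct{}
	done     chan struct{}
	samples  []snapshot
}

func newSampler(interval time.Duration) *sampler {
	return &sampler{interval: interval, stop: make(chan struct{}), done: make(chan struct{})}
}

// Start launches a goroutine that records one snapshot per tick.
func (s *sampler) Start() {
	go func() {
		defer close(s.done)
		ticker := time.NewTicker(s.interval)
		defer ticker.Stop()
		for tick := 1; ; tick++ {
			select {
			case <-s.stop:
				return
			case <-ticker.C:
				var ms runtime.MemStats
				runtime.ReadMemStats(&ms)
				s.samples = append(s.samples, snapshot{
					Tick:        tick,
					Goroutines:  runtime.NumGoroutine(),
					HeapAllocMB: float64(ms.HeapAlloc) / (1 << 20),
					NumGC:       ms.NumGC,
				})
			}
		}
	}()
}

// Stop signals the goroutine, waits for it to exit, and returns the samples.
// Waiting on done makes the read of s.samples race-free. Call Stop once.
func (s *sampler) Stop() []snapshot {
	close(s.stop)
	<-s.done
	return s.samples
}

// sampleFor runs a sampler for runMs milliseconds at the given interval.
func sampleFor(intervalMs, runMs int) []snapshot {
	s := newSampler(time.Duration(intervalMs) * time.Millisecond)
	s.Start()
	time.Sleep(time.Duration(runMs) * time.Millisecond)
	return s.Stop()
}

func main() {
	for _, snap := range sampleFor(20, 100) {
		fmt.Printf("%+v\n", snap)
	}
}
```

runtime.ReadMemStats briefly stops the world, so sampling much more often than once per second can itself disturb the latencies being measured.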

type RunResult

type RunResult struct {
	Report             *Report
	RawResults         []result
	RunStart           time.Time
	ThroughputTimeline []ThroughputBucket
	LatencyHistogram   []int64
	ResourceTimeline   []ResourceSample
	GCCorrelation      *GCCorrelation
}

RunResult holds all data collected from a single benchmark run.

type Stats

type Stats struct {
	Median  float64 `json:"median"`
	Mean    float64 `json:"mean"`
	StdDev  float64 `json:"stddev"`
	CI95Low float64 `json:"ci95_low"`
	CI95Hi  float64 `json:"ci95_high"`
	CV      float64 `json:"cv_pct"` // coefficient of variation as percentage
	Min     float64 `json:"min"`
	Max     float64 `json:"max"`
}

Stats holds aggregate statistics computed from multiple benchmark runs.

func ComputeStats

func ComputeStats(values []float64) Stats

ComputeStats computes aggregate statistics from a slice of values. Returns zero-value Stats if the input is empty.
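
The fields of Stats correspond to standard summary statistics. The sketch below computes them under explicit assumptions that the documentation does not confirm: the standard deviation is the sample form (dividing by n-1), and the 95% confidence interval uses the normal approximation mean ± 1.96·s/√n. With only around 10 runs, a Student's t critical value would give a wider interval, and the package may use that instead.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// stats mirrors the shape of Stats; the formulas are assumptions.
type stats struct {
	Median, Mean, StdDev, CI95Low, CI95Hi, CV, Min, Max float64
}

// computeStats returns summary statistics, or the zero value for no input.
func computeStats(values []float64) stats {
	n := len(values)
	if n == 0 {
		return stats{}
	}
	sorted := append([]float64(nil), values...) // leave the caller's slice alone
	sort.Float64s(sorted)

	var sum float64
	for _, v := range sorted {
		sum += v
	}
	mean := sum / float64(n)

	median := sorted[n/2]
	if n%2 == 0 {
		median = (sorted[n/2-1] + sorted[n/2]) / 2
	}

	var sq float64
	for _, v := range sorted {
		sq += (v - mean) * (v - mean)
	}
	var sd float64
	if n > 1 {
		sd = math.Sqrt(sq / float64(n-1)) // sample standard deviation
	}
	half := 1.96 * sd / math.Sqrt(float64(n)) // normal-approximation half-width

	var cv float64
	if mean != 0 {
		cv = sd / mean * 100 // coefficient of variation as a percentage
	}
	return stats{
		Median: median, Mean: mean, StdDev: sd,
		CI95Low: mean - half, CI95Hi: mean + half,
		CV: cv, Min: sorted[0], Max: sorted[n-1],
	}
}

func main() {
	fmt.Printf("%+v\n", computeStats([]float64{2, 4, 4, 4, 5, 5, 7, 9}))
}
```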

type ThroughputBucket

type ThroughputBucket struct {
	Second   int     `json:"second"`
	Requests int     `json:"requests"`
	Errors   int     `json:"errors"`
	P50Us    int64   `json:"p50_us"`
	P99Us    int64   `json:"p99_us"`
	MeanUs   float64 `json:"mean_us"`
}

ThroughputBucket holds throughput data for one second of a benchmark run.

func BuildTimeSeries

func BuildTimeSeries(results []result, runStart time.Time) []ThroughputBucket

BuildTimeSeries bins raw results into 1-second throughput buckets. runStart is the time the run began; results must have startedAt set.
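
The binning step can be sketched as below. The sample and bucket types are hypothetical (the package's result type is unexported), requests are assigned to a bucket by start time rather than completion time, and the per-bucket percentiles that ThroughputBucket carries are omitted to keep the example short.

```go
package main

import (
	"fmt"
	"time"
)

// sample is a hypothetical per-request record.
type sample struct {
	startedAt time.Time
	latency   time.Duration
	failed    bool
}

// bucket is a reduced, hypothetical stand-in for ThroughputBucket.
type bucket struct {
	Second, Requests, Errors int
	MeanUs                   float64
}

// buildTimeSeries bins samples into 1-second buckets by start time.
// Seconds with no requests still get an (empty) bucket so that the slice
// index always equals the second offset.
func buildTimeSeries(samples []sample, runStart time.Time) []bucket {
	var buckets []bucket
	for _, s := range samples {
		sec := int(s.startedAt.Sub(runStart) / time.Second)
		if sec < 0 {
			continue // started before the run began; ignore
		}
		for len(buckets) <= sec {
			buckets = append(buckets, bucket{Second: len(buckets)})
		}
		b := &buckets[sec]
		b.Requests++
		if s.failed {
			b.Errors++
		}
		// Incremental mean avoids keeping a separate running sum.
		b.MeanUs += (float64(s.latency.Microseconds()) - b.MeanUs) / float64(b.Requests)
	}
	return buckets
}

// demoSeries builds a fixed three-request run for illustration.
func demoSeries() []bucket {
	start := time.Unix(0, 0)
	samples := []sample{
		{start.Add(100 * time.Millisecond), 2 * time.Millisecond, false},
		{start.Add(900 * time.Millisecond), 4 * time.Millisecond, true},
		{start.Add(2500 * time.Millisecond), 10 * time.Millisecond, false},
	}
	return buildTimeSeries(samples, start)
}

func main() {
	for _, b := range demoSeries() {
		fmt.Printf("%+v\n", b)
	}
}
```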
