Documentation
Overview ¶
Package loadtest provides a gRPC load testing harness for the Loom server.
It spins up a real gRPC server backed by a mock LLM provider, then drives concurrent Weave and StreamWeave calls to measure throughput, latency percentiles, error rates, and resource usage — all without consuming real LLM tokens.
Index ¶
- func BuildLatencyHistogram(results []result) []int64
- func PercentileExported(values []time.Duration, p float64) time.Duration
- func WriteJSON(path string, report *BenchmarkReport) error
- type BenchmarkConfig
- type BenchmarkReport
- type BenchmarkRunner
- type EnvironmentInfo
- type GCCorrelation
- type Harness
- type HarnessConfig
- type JSONAggregate
- type JSONConfig
- type JSONReport
- type JSONRunResult
- type Report
- type ResourceSample
- type ResourceSampler
- type RunResult
- type Stats
- type ThroughputBucket
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func BuildLatencyHistogram ¶
func BuildLatencyHistogram(results []result) []int64
BuildLatencyHistogram returns the raw per-request latency values in microseconds; despite its name, it does no bucketing. Downstream consumers can bin these values however they need.
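Since the values come back unbucketed, each consumer picks its own bin layout. Below is a minimal sketch of fixed-width binning. It assumes the input is the microsecond slice described above. `bucketCounts` and the 100 µs bin width are illustrative choices, not part of the package.

```go
package main

import (
	"fmt"
	"sort"
)

// bucketCounts is a hypothetical helper (not part of package loadtest) that
// groups raw latency values, in microseconds, into fixed-width bins. Each map
// key is the inclusive lower bound of its bin.
func bucketCounts(latenciesUs []int64, widthUs int64) map[int64]int {
	counts := make(map[int64]int)
	for _, v := range latenciesUs {
		lower := (v / widthUs) * widthUs // latencies are non-negative, so this floors
		counts[lower]++
	}
	return counts
}

func main() {
	raw := []int64{120, 180, 250, 260, 990} // e.g. values returned by BuildLatencyHistogram
	counts := bucketCounts(raw, 100)

	// Print bins in ascending order.
	keys := make([]int64, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for _, k := range keys {
		fmt.Printf("[%d, %d) us: %d\n", k, k+100, counts[k])
	}
}
```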
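func PercentileExported ¶
func PercentileExported(values []time.Duration, p float64) time.Duration
PercentileExported computes the p-th percentile of an unsorted slice. It sorts the input in place, so callers that need the original order must pass a copy.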
func WriteJSON ¶
func WriteJSON(path string, report *BenchmarkReport) error
WriteJSON writes the benchmark report to a JSON file.
Types ¶
type BenchmarkConfig ¶
type BenchmarkConfig struct {
HarnessConfig
// Runs is the number of measured runs (default 10).
Runs int
// WarmupRuns is the number of warmup runs to discard (default 2).
WarmupRuns int
// PerRunWarmup is the duration of warmup to discard at the start of each
// duration-based run. For request-count-based runs, 10% of requests are
// discarded instead. Zero disables per-run warmup trimming.
PerRunWarmup time.Duration
// ScenarioName is the name of the scenario for the JSON report.
ScenarioName string
// OutputPath is where to write the JSON report. Empty means stdout only.
OutputPath string
// CollectTimeSeries enables per-second throughput bucketing.
CollectTimeSeries bool
// CollectHistogram enables raw latency histogram collection.
CollectHistogram bool
// CollectResources enables runtime resource sampling.
CollectResources bool
// CollectGCCorrelation enables GC-latency correlation analysis.
CollectGCCorrelation bool
}
BenchmarkConfig extends HarnessConfig with multi-run and publication settings.
func DefaultBenchmarkConfig ¶
func DefaultBenchmarkConfig() BenchmarkConfig
DefaultBenchmarkConfig returns a config suitable for publication benchmarks.
type BenchmarkReport ¶
type BenchmarkReport struct {
Environment EnvironmentInfo
ScenarioName string
HarnessConfig HarnessConfig
WarmupRuns int
Runs []RunResult
Aggregate JSONAggregate
}
BenchmarkReport holds the complete benchmark output.
func (*BenchmarkReport) ToJSON ¶
func (br *BenchmarkReport) ToJSON() *JSONReport
ToJSON converts a BenchmarkReport to the JSON output format.
type BenchmarkRunner ¶
type BenchmarkRunner struct {
// contains filtered or unexported fields
}
BenchmarkRunner orchestrates multi-run benchmarks with warmup and statistics.
func NewBenchmarkRunner ¶
func NewBenchmarkRunner(config BenchmarkConfig) *BenchmarkRunner
NewBenchmarkRunner creates a runner with the given config.
func (*BenchmarkRunner) Run ¶
func (br *BenchmarkRunner) Run(ctx context.Context) (*BenchmarkReport, error)
Run executes the full benchmark: the warmup runs (whose results are discarded), the measured runs, and the aggregate statistics across the measured runs.
type EnvironmentInfo ¶
type EnvironmentInfo struct {
GoVersion string `json:"go_version"`
GOOS string `json:"goos"`
GOARCH string `json:"goarch"`
NumCPU int `json:"num_cpu"`
GOMAXPROCS int `json:"gomaxprocs"`
GRPCVersion string `json:"grpc_version"`
CommitSHA string `json:"commit_sha"`
RaceDetector bool `json:"race_detector"`
TimestampUTC string `json:"timestamp_utc"`
// K8s metadata (populated from downward API env vars if running in a pod)
NodeName string `json:"node_name,omitempty"`
PodName string `json:"pod_name,omitempty"`
K8sVersion string `json:"k8s_version,omitempty"`
VMSize string `json:"vm_size,omitempty"`
Region string `json:"region,omitempty"`
// System info (populated on Linux)
KernelVersion string `json:"kernel_version,omitempty"`
CPUModel string `json:"cpu_model,omitempty"`
RAMTotalMB int64 `json:"ram_total_mb,omitempty"`
}
EnvironmentInfo captures the runtime and infrastructure environment for reproducibility of benchmark results.
func CaptureEnvironment ¶
func CaptureEnvironment() EnvironmentInfo
CaptureEnvironment gathers runtime and system information.
type GCCorrelation ¶
type GCCorrelation struct {
TotalP95Requests int `json:"total_p95_requests"`
GCAttributedRequests int `json:"gc_attributed_requests"`
GCAttributionPct float64 `json:"gc_attribution_pct"`
TotalGCPauses int `json:"total_gc_pauses"`
MaxGCPauseUs int64 `json:"max_gc_pause_us"`
AvgGCPauseUs float64 `json:"avg_gc_pause_us"`
TotalGCPauseUs int64 `json:"total_gc_pause_us"`
}
GCCorrelation holds the result of correlating GC pauses with tail latency.
func CorrelateGCWithLatency ¶
func CorrelateGCWithLatency(results []result, runStart time.Time) GCCorrelation
CorrelateGCWithLatency checks whether requests with latency above P95 overlap a GC pause window. The result estimates how much of the latency tail GC pauses can explain.
It compares per-request timestamps and latencies against the GC pause log in runtime.MemStats. That log is a ring buffer holding only the 256 most recent pauses, so in a run with more GC cycles than that, earlier pauses cannot be matched.
type Harness ¶
type Harness struct {
// contains filtered or unexported fields
}
Harness manages the lifecycle of a load test.
func NewHarness ¶
func NewHarness(config HarnessConfig) *Harness
NewHarness creates a new load test harness. Call Setup() to start the server.
func (*Harness) RunRaw ¶
RunRaw executes the load test and returns the raw results and wall time. Use this when you need access to per-request data for time-series analysis.
func (*Harness) ServerAddr ¶
ServerAddr returns the address the gRPC server is listening on. Useful for pointing external load testing tools (e.g., ghz) at the server.
type HarnessConfig ¶
type HarnessConfig struct {
// Concurrency is the number of concurrent goroutines making gRPC calls.
Concurrency int
// TotalRequests is the total number of requests to make.
// If 0, the harness runs for Duration instead.
TotalRequests int
// Duration is how long to run the load test (if TotalRequests is 0).
Duration time.Duration
// RequestTimeout is the per-request context timeout.
RequestTimeout time.Duration
// LLMConfig controls the mock LLM provider behavior.
LLMConfig loadtest.ProviderConfig
// UseStreaming uses StreamWeave instead of Weave.
UseStreaming bool
// Query is the query string sent in each request.
Query string
// RampUp is the duration over which to gradually ramp up to full concurrency.
// Zero means all goroutines start immediately.
RampUp time.Duration
// LLMConcurrencyLimit overrides the server's LLM concurrency semaphore.
// 0 means use the server default (5). Set high (e.g., 10000) to effectively
// disable the limiter for load testing.
LLMConcurrencyLimit int
// SessionID, if set, is used for all requests (session reuse/contention testing).
// Empty means each request creates a new session.
SessionID string
// NumAgents controls how many agents to register in the multi-agent server.
// 0 or 1 means a single default agent. >1 creates N agents, and requests
// are round-robined across them via agent_id.
NumAgents int
// ServerAddr, if set, connects to a remote gRPC server instead of starting
// an in-process one. Setup() will only establish the client connection.
// The mock LLM provider will be nil in this mode.
ServerAddr string
}
HarnessConfig controls the load test parameters.
func DefaultHarnessConfig ¶
func DefaultHarnessConfig() HarnessConfig
DefaultHarnessConfig returns sensible defaults for a quick load test.
type JSONAggregate ¶
type JSONAggregate struct {
ThroughputRPS Stats `json:"throughput_rps"`
LatencyP50Us Stats `json:"latency_p50_us"`
LatencyP90Us Stats `json:"latency_p90_us"`
LatencyP95Us Stats `json:"latency_p95_us"`
LatencyP99Us Stats `json:"latency_p99_us"`
}
JSONAggregate holds aggregate statistics across all runs.
type JSONConfig ¶
type JSONConfig struct {
Concurrency int `json:"concurrency"`
TotalRequests int `json:"total_requests,omitempty"`
DurationMs int64 `json:"duration_ms,omitempty"`
LLMBaseLatencyMs int64 `json:"llm_base_latency_ms"`
LLMJitterMs int64 `json:"llm_jitter_ms"`
LLMErrorRate float64 `json:"llm_error_rate"`
LLMConcurrencyLimit int `json:"llm_concurrency_limit"`
UseStreaming bool `json:"use_streaming"`
NumAgents int `json:"num_agents"`
}
JSONConfig is a JSON-safe subset of the harness configuration. It excludes function pointers and non-serializable fields.
type JSONReport ¶
type JSONReport struct {
BenchmarkVersion string `json:"benchmark_version"`
LoomVersion string `json:"loom_version"`
Environment EnvironmentInfo `json:"environment"`
Scenario string `json:"scenario"`
Config JSONConfig `json:"config"`
WarmupRuns int `json:"warmup_runs"`
MeasuredRuns int `json:"measured_runs"`
Runs []JSONRunResult `json:"runs"`
Aggregate JSONAggregate `json:"aggregate"`
}
JSONReport is the top-level output structure matching the spec's Section 6 schema.
type JSONRunResult ¶
type JSONRunResult struct {
RunNumber int `json:"run_number"`
ThroughputRPS float64 `json:"throughput_rps"`
LatencyP50Us int64 `json:"latency_p50_us"`
LatencyP90Us int64 `json:"latency_p90_us"`
LatencyP95Us int64 `json:"latency_p95_us"`
LatencyP99Us int64 `json:"latency_p99_us"`
LatencyMinUs int64 `json:"latency_min_us"`
LatencyMaxUs int64 `json:"latency_max_us"`
LatencyAvgUs float64 `json:"latency_avg_us"`
ErrorCount int64 `json:"error_count"`
ErrorRate float64 `json:"error_rate"`
TotalRequests int64 `json:"total_requests"`
WallTimeMs int64 `json:"wall_time_ms"`
ThroughputTimeline []ThroughputBucket `json:"throughput_timeline,omitempty"`
LatencyHistogram []int64 `json:"latency_histogram,omitempty"`
ResourceTimeline []ResourceSample `json:"resource_timeline,omitempty"`
GCCorrelation *GCCorrelation `json:"gc_correlation,omitempty"`
}
JSONRunResult holds the data for a single benchmark run.
type Report ¶
type Report struct {
// Configuration used for the test.
Concurrency int
TotalRequests int
UseStreaming bool
LLMLatency string // description of LLM config
// Timing
WallTime time.Duration
// Throughput
RequestsCompleted int64
RequestsPerSecond float64
// Latency percentiles (request-level, including LLM simulation)
P50 time.Duration
P90 time.Duration
P95 time.Duration
P99 time.Duration
Min time.Duration
Max time.Duration
Avg time.Duration
// Error rates
Errors int64
ErrorRate float64
// LLM provider metrics
LLMMetrics loadtest.MetricsSnapshot
}
Report contains the load test results.
type ResourceSample ¶
type ResourceSample struct {
Second int `json:"second"`
Goroutines int `json:"goroutines"`
HeapAllocMB float64 `json:"heap_alloc_mb"`
HeapSysMB float64 `json:"heap_sys_mb"`
NumGC uint32 `json:"num_gc"`
GCPauseTotUs int64 `json:"gc_pause_total_us"`
}
ResourceSample holds a single point-in-time snapshot of runtime metrics.
type ResourceSampler ¶
type ResourceSampler struct {
// contains filtered or unexported fields
}
ResourceSampler collects runtime metrics at 1-second intervals.
func NewResourceSampler ¶
func NewResourceSampler() *ResourceSampler
NewResourceSampler creates a sampler. Call Start() to begin collection.
func (*ResourceSampler) Start ¶
func (s *ResourceSampler) Start()
Start begins collecting samples every second.
func (*ResourceSampler) Stop ¶
func (s *ResourceSampler) Stop() []ResourceSample
Stop ends collection and returns all samples.
type RunResult ¶
type RunResult struct {
Report *Report
RawResults []result
RunStart time.Time
ThroughputTimeline []ThroughputBucket
LatencyHistogram []int64
ResourceTimeline []ResourceSample
GCCorrelation *GCCorrelation
}
RunResult holds all data collected from a single benchmark run.
type Stats ¶
type Stats struct {
Median float64 `json:"median"`
Mean float64 `json:"mean"`
StdDev float64 `json:"stddev"`
CI95Low float64 `json:"ci95_low"`
CI95Hi float64 `json:"ci95_high"`
CV float64 `json:"cv_pct"` // coefficient of variation as percentage
Min float64 `json:"min"`
Max float64 `json:"max"`
}
Stats holds aggregate statistics computed from multiple benchmark runs.
func ComputeStats ¶
ComputeStats computes aggregate statistics from a slice of values. Returns zero-value Stats if the input is empty.
type ThroughputBucket ¶
type ThroughputBucket struct {
Second int `json:"second"`
Requests int `json:"requests"`
Errors int `json:"errors"`
P50Us int64 `json:"p50_us"`
P99Us int64 `json:"p99_us"`
MeanUs float64 `json:"mean_us"`
}
ThroughputBucket holds throughput data for one second of a benchmark run.
func BuildTimeSeries ¶
func BuildTimeSeries(results []result, runStart time.Time) []ThroughputBucket
BuildTimeSeries bins raw results into 1-second throughput buckets. runStart is the time the run began; results must have startedAt set.