bench

command
v0.18.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 18, 2026 License: MIT Imports: 12 Imported by: 0

README

Ollama Benchmark Tool

A Go-based command-line tool for benchmarking Ollama models with configurable parameters, warmup phases, TTFT tracking, VRAM monitoring, and benchstat/CSV output.

Features

  • Benchmark multiple models in a single run
  • Support for both text and image prompts
  • Configurable generation parameters (temperature, max tokens, seed, etc.)
  • Warmup phase before timed epochs to stabilize measurements
  • Time-to-first-token (TTFT) tracking per epoch
  • Model metadata display (parameter size, quantization level, family)
  • VRAM and CPU memory usage tracking via running process info
  • Controlled prompt token length for reproducible benchmarks
  • Benchstat and CSV output formats

Building from Source

go build -o ollama-bench ./cmd/bench
./ollama-bench -model gemma3 -epochs 6 -format csv

Using Go Run (without building)

go run ./cmd/bench -model gemma3 -epochs 3

Usage

Basic Example

./ollama-bench -model gemma3 -epochs 6

Benchmark Multiple Models

./ollama-bench -model gemma3,gemma3n -epochs 6 -max-tokens 100 -p "Write me a short story" | tee gemma.bench
benchstat -col /name gemma.bench

With Image Prompt

./ollama-bench -model qwen3-vl -image photo.jpg -epochs 6 -max-tokens 100 -p "Describe this image"

Controlled Prompt Length

./ollama-bench -model gemma3 -epochs 6 -prompt-tokens 512

Advanced Example

./ollama-bench -model llama3 -epochs 10 -temperature 0.7 -max-tokens 500 -seed 42 -warmup 2 -format csv -output results.csv

Command Line Options

Option Description Default
-model Comma-separated list of models to benchmark (required)
-epochs Number of iterations per model 6
-max-tokens Maximum tokens for model response 200
-temperature Temperature parameter 0.0
-seed Random seed 0 (random)
-timeout Timeout in seconds 300
-p Prompt text (default story prompt)
-image Image file to include in prompt
-k Keep-alive duration in seconds 0
-format Output format (benchstat, csv) benchstat
-output Output file for results "" (stdout)
-warmup Number of warmup requests before timing 1
-prompt-tokens Generate prompt targeting ~N tokens (0 = use -p) 0
-v Verbose mode false
-debug Show debug information false

Output Formats

Benchstat Format (default)

Compatible with Go's benchstat tool for statistical analysis. Uses one value/unit pair per line, standard ns/op for timing metrics, and ns/token for throughput. Each epoch produces one set of lines -- benchstat aggregates across repeated runs to compute statistics.

# Model: gemma3 | Params: 4.3B | Quant: Q4_K_M | Family: gemma3 | Size: 4080218931 | VRAM: 4080218931
BenchmarkModel/name=gemma3/step=prefill 1 78125.00 ns/token 12800.00 token/sec
BenchmarkModel/name=gemma3/step=generate 1 19531.25 ns/token 51200.00 token/sec
BenchmarkModel/name=gemma3/step=ttft 1 45123000 ns/op
BenchmarkModel/name=gemma3/step=load 1 1500000000 ns/op
BenchmarkModel/name=gemma3/step=total 1 2861047625 ns/op

Use with benchstat:

./ollama-bench -model gemma3 -epochs 6 > gemma3.bench
benchstat -col /step gemma3.bench

Compare two runs:

./ollama-bench -model gemma3 -epochs 6 > before.bench
# ... make changes ...
./ollama-bench -model gemma3 -epochs 6 > after.bench
benchstat before.bench after.bench

CSV Format

Machine-readable comma-separated values:

NAME,STEP,COUNT,NS_PER_COUNT,TOKEN_PER_SEC
# Model: gemma3 | Params: 4.3B | Quant: Q4_K_M | Family: gemma3 | Size: 4080218931 | VRAM: 4080218931
gemma3,prefill,128,78125.00,12800.00
gemma3,generate,512,19531.25,51200.00
gemma3,ttft,1,45123000,0
gemma3,load,1,1500000000,0
gemma3,total,1,2861047625,0

Metrics Explained

The tool reports the following metrics for each epoch:

  • prefill: Time spent processing the prompt (ns/token)
  • generate: Time spent generating the response (ns/token)
  • ttft: Time to first token -- latency from request start to first response content
  • load: Model loading time (one-time cost)
  • total: Total request duration

Additionally, the model info comment line (displayed once per model before epochs) includes:

  • Params: Model parameter count (e.g., 4.3B)
  • Quant: Quantization level (e.g., Q4_K_M)
  • Family: Model family (e.g., gemma3)
  • Size: Total model memory in bytes
  • VRAM: GPU memory used by the loaded model (when Size > VRAM, the difference is CPU spill)

Documentation

The Go Gopher

There is no documentation for this package.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL