llmnop

command module

v0.0.0-...-70ed722 Latest Latest Go to latest Published: Jul 29, 2024 License: Apache-2.0 Imports: 1 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/jpreagan/llmnop

Links

Open Source Insights

README ¶

LLMNOP

LLMNOP is a command-line tool for benchmarking Large Language Models (LLM) performance metrics.

Features

Measures key performance indicators: Time To First Token (TTFT) and Throughput
Support for concurrent requests to simulate real-world load
Configurable input and output tokens distribution for realistic load testing
Standardizes comparisons across models using the tokenizer of your choice

Installation

go install github.com/jpreagan/llmnop@latest

Usage

Usage:
  llmnop [flags]

Flags:
  -k, --api-key string             API key for the inference server
  -u, --base-url string            base URL for the inference server (e.g., "https://example.com/v1")
  -c, --concurrency int            number of concurrent requests (default 1)
  -h, --help                       help for llmnop
      --mean-input-tokens int      mean number of tokens to send in the prompt for the request (default 550)
      --mean-output-tokens int     mean number of tokens to generate from each LLM request (default 150)
  -m, --model string               specify the model to benchmark (e.g., "meta-llama/Meta-Llama-3-70B-Instruct")
  -n, --num-iterations int         number of iterations to run (default 2)
      --stddev-input-tokens int    standard deviation of number of tokens to send in the prompt for the request (default 150)
      --stddev-output-tokens int   standard deviation on the number of tokens to generate per LLM request (default 10)
  -t, --tokenizer string           path to the tokenizer.json file

Example

llmnop \
  --base-url https://example.com/v1 \
  --api-key your-api-key-here \
  --model meta-llama/Meta-Llama-3-70B-Instruct \
  --num-iterations 10 \
  --concurrency 1
  --tokenizer path/to/tokenizer.json

Sample Output

Benchmark Setup
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Model: meta-llama/Meta-Llama-3-70B-Instruct
Endpoint: https://example.com/v1/chat/completions
Total Requests: 10 (Iterations: 10, Concurrency: 1)
Input Tokens: Mean 550 ± 150
Output Tokens: Mean 150 ± 10
Timestamp: 2024-07-28T16:11:43-10:00

Performance Metrics
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

1. Time To First Token (TTFT) (ms):
       [───────────────|───────────────|───────────────|───────────────]
      Min             P25        Median (P50)         P75             Max
    106 ms          123 ms          129 ms          177 ms          196 ms

   Average (Mean): 145 ms
   Standard Deviation: 32 ms

2. Throughput (tokens/second):
       [───────────────|───────────────|───────────────|───────────────]
      Min             P25        Median (P50)         P75             Max
   17.9 t/s        18.2 t/s        18.8 t/s        19.1 t/s        19.4 t/s

   Average (Mean): 18.7 t/s
   Standard Deviation: 0.5 t/s

3. Input Token Count:
       [───────────────|───────────────|───────────────|───────────────]
      Min             P25        Median (P50)         P75             Max
      204             327             450             566             683

   Average (Mean): 446
   Standard Deviation: 196

4. Output Token Count:
       [───────────────|───────────────|───────────────|───────────────]
      Min             P25        Median (P50)         P75             Max
      160             238             317             586             854

   Average (Mean): 444
   Standard Deviation: 297

Request Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total Requests:    10
Successful:        10 (100.00%)
Failed:            0 (0.00%)

License

Apache-2.0 license

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

main.go

Directories ¶

Path	Synopsis
cmd

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL