Example CLI
This directory contains a command-line example demonstrating go-llama.cpp usage in
both interactive and single-shot modes.
Getting started
Start by building the library:
# Build using act-runner (includes cmake and build tools)
docker run --rm -v $(pwd):/workspace -w /workspace \
  ghcr.io/tcpipuk/act-runner:ubuntu-latest \
  bash -c "LIBRARY_PATH=/workspace C_INCLUDE_PATH=/workspace make libbinding.a"
Next, download a test model. We'll use Qwen3-0.6B
as it's small enough to run efficiently even with CPU-only inference:
# Download test model (~600MB)
wget -q https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf
Now you can run the example:
# Single-shot mode
docker run --rm -v $(pwd):/workspace -w /workspace golang:latest \
  bash -c "LIBRARY_PATH=/workspace C_INCLUDE_PATH=/workspace LD_LIBRARY_PATH=/workspace \
           go run ./examples -m Qwen3-0.6B-Q8_0.gguf -p 'Hello world' -n 50"
# Interactive mode (omit -p flag)
docker run --rm -it -v $(pwd):/workspace -w /workspace golang:latest \
  bash -c "LIBRARY_PATH=/workspace C_INCLUDE_PATH=/workspace LD_LIBRARY_PATH=/workspace \
           go run ./examples -m Qwen3-0.6B-Q8_0.gguf"
Command-line options
| Flag | Description | Default |
|------|-------------|---------|
| -m | Path to GGUF model file | ./models/model.gguf |
| -p | Prompt for single-shot mode (interactive if empty) | "" |
| -c | Context length | 512 |
| -ngl | Number of GPU layers to utilise | 0 |
| -t | Number of CPU threads | runtime.NumCPU() |
| -n | Number of tokens to predict | 512 |
| -s | Predict RNG seed (-1 for random) | -1 |
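For reference, the sketch below shows one plausible way these flags could be declared with Go's standard flag package. The names and defaults mirror the table above; the example's actual source may be organised differently.

```go
package main

import (
	"flag"
	"runtime"
)

// Flag declarations mirroring the table above (sketch only; the real
// example may structure this differently).
var (
	modelPath = flag.String("m", "./models/model.gguf", "path to GGUF model file")
	prompt    = flag.String("p", "", "prompt for single-shot mode (interactive if empty)")
	ctxLen    = flag.Int("c", 512, "context length")
	gpuLayers = flag.Int("ngl", 0, "number of GPU layers to utilise")
	threads   = flag.Int("t", runtime.NumCPU(), "number of CPU threads")
	nPredict  = flag.Int("n", 512, "number of tokens to predict")
	seed      = flag.Int("s", -1, "RNG seed (-1 for random)")
)

func main() {
	flag.Parse()
	// ...hand the parsed values to the binding (see the mode sketches below).
}
```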
Usage modes
Single-shot mode
Use -p to provide a prompt and generate text once before exiting:
go run ./examples -m model.gguf -p "The capital of France is" -n 50
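Under the hood, single-shot mode boils down to loading the model and making one prediction call. The sketch below assumes the binding keeps upstream go-llama.cpp's New/Predict API and option names (SetContext, SetGPULayers, SetTokens, SetThreads, SetSeed); the import path is an assumption, not a verbatim copy of this example's source.

```go
package main

import (
	"fmt"
	"log"

	llama "github.com/tcpipuk/go-llama.cpp" // assumed module path; check go.mod
)

func main() {
	// Load the model with the context size and GPU layer count from the flags.
	l, err := llama.New("Qwen3-0.6B-Q8_0.gguf",
		llama.SetContext(512), // -c
		llama.SetGPULayers(0), // -ngl
	)
	if err != nil {
		log.Fatal(err)
	}
	defer l.Free()

	// Generate once and print the completion.
	out, err := l.Predict("The capital of France is",
		llama.SetTokens(50), // -n
		llama.SetThreads(4), // -t
		llama.SetSeed(-1),   // -s
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```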
Interactive mode
Omit the -p flag to start interactive mode where you can chat with the model:
go run ./examples -m model.gguf
In interactive mode:
- Type your prompt and press Enter on an empty line to submit
- Use Ctrl+C to exit
- The model will stream responses token by token (see the sketch below)
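As a rough illustration of how the streaming loop can work, the sketch below reads input until an empty line, then passes the accumulated prompt to Predict with a token callback. It again assumes the upstream go-llama.cpp option names (SetTokenCallback in particular) and an assumed import path; it is not a verbatim copy of this example's source.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"

	llama "github.com/tcpipuk/go-llama.cpp" // assumed module path; check go.mod
)

func main() {
	l, err := llama.New("Qwen3-0.6B-Q8_0.gguf", llama.SetContext(512))
	if err != nil {
		log.Fatal(err)
	}
	defer l.Free()

	fmt.Println("Type a prompt, then press Enter on an empty line to submit (Ctrl+C exits).")
	scanner := bufio.NewScanner(os.Stdin)
	var lines []string
	for scanner.Scan() {
		line := scanner.Text()
		if line != "" {
			lines = append(lines, line)
			continue
		}
		if len(lines) == 0 {
			continue // nothing typed yet
		}
		prompt := strings.Join(lines, "\n")
		lines = lines[:0]

		// Stream each generated token to stdout as it arrives.
		if _, err := l.Predict(prompt,
			llama.SetTokens(512),
			llama.SetTokenCallback(func(token string) bool {
				fmt.Print(token)
				return true // return false to stop generation early
			}),
		); err != nil {
			log.Fatal(err)
		}
		fmt.Println()
	}
}
```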
Environment variables
The example requires these environment variables:
export LIBRARY_PATH=$PWD        # For linking during build
export C_INCLUDE_PATH=$PWD      # For header files during build
export LD_LIBRARY_PATH=$PWD     # For shared libraries at runtime
Without these, you'll see "undefined symbol" or "library not found" errors.
Hardware acceleration
To enable GPU acceleration, build with the appropriate backend:
# CUDA example
BUILD_TYPE=cublas make libbinding.a
CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" \
  go run ./examples -m model.gguf -ngl 32
See the building guide for all hardware acceleration options.