Example CLI
This directory contains a command-line example demonstrating go-llama.cpp usage in
both interactive and single-shot modes.
Getting started
Start by building the library:
# Build using act-runner (includes cmake and build tools)
docker run --rm -v $(pwd):/workspace -w /workspace \
  ghcr.io/tcpipuk/act-runner:ubuntu-latest \
  bash -c "LIBRARY_PATH=/workspace C_INCLUDE_PATH=/workspace make libbinding.a"
Next, download a test model. We'll use Qwen3-0.6B
as it's small enough to run efficiently even with CPU-only inference:
# Download test model (~600MB)
wget -q https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf
Now you can run the example:
# Single-shot mode
docker run --rm -v $(pwd):/workspace -w /workspace golang:latest \
  bash -c "LIBRARY_PATH=/workspace C_INCLUDE_PATH=/workspace LD_LIBRARY_PATH=/workspace \
           go run ./examples -m Qwen3-0.6B-Q8_0.gguf -p 'Hello world' -n 50"
# Interactive mode (omit -p flag)
docker run --rm -it -v $(pwd):/workspace -w /workspace golang:latest \
  bash -c "LIBRARY_PATH=/workspace C_INCLUDE_PATH=/workspace LD_LIBRARY_PATH=/workspace \
           go run ./examples -m Qwen3-0.6B-Q8_0.gguf"
Command-line options
| Flag | Description | Default |
|------|-------------|---------|
| -m | Path to GGUF model file | ./models/model.gguf |
| -p | Prompt for single-shot mode (interactive if empty) | "" |
| -c | Context length | 512 |
| -ngl | Number of GPU layers to utilise | 0 |
| -t | Number of CPU threads | runtime.NumCPU() |
| -n | Number of tokens to predict | 512 |
| -s | Predict RNG seed (-1 for random) | -1 |
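For reference, the sketch below shows one plausible way these flags could be declared with Go's standard flag package. The names and defaults mirror the table above; the example's actual source may be organised differently.

```go
package main

import (
	"flag"
	"runtime"
)

// Flag declarations mirroring the table above (sketch only; the real
// example may structure this differently).
var (
	modelPath = flag.String("m", "./models/model.gguf", "path to GGUF model file")
	prompt    = flag.String("p", "", "prompt for single-shot mode (interactive if empty)")
	ctxLen    = flag.Int("c", 512, "context length")
	gpuLayers = flag.Int("ngl", 0, "number of GPU layers to utilise")
	threads   = flag.Int("t", runtime.NumCPU(), "number of CPU threads")
	nPredict  = flag.Int("n", 512, "number of tokens to predict")
	seed      = flag.Int("s", -1, "RNG seed (-1 for random)")
)

func main() {
	flag.Parse()
	// ...hand the parsed values to the binding (see the mode sketches below).
}
```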
Usage modes
Single-shot mode
Use -p to provide a prompt and generate text once before exiting:
go run ./examples -m model.gguf -p "The capital of France is" -n 50
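Under the hood, single-shot mode boils down to loading the model and making one prediction call. The sketch below assumes the binding keeps upstream go-llama.cpp's New/Predict API and option names (SetContext, SetGPULayers, SetTokens, SetThreads, SetSeed); the import path is an assumption, not a verbatim copy of this example's source.

```go
package main

import (
	"fmt"
	"log"

	llama "github.com/tcpipuk/go-llama.cpp" // assumed module path; check go.mod
)

func main() {
	// Load the model with the context size and GPU layer count from the flags.
	l, err := llama.New("Qwen3-0.6B-Q8_0.gguf",
		llama.SetContext(512), // -c
		llama.SetGPULayers(0), // -ngl
	)
	if err != nil {
		log.Fatal(err)
	}
	defer l.Free()

	// Generate once and print the completion.
	out, err := l.Predict("The capital of France is",
		llama.SetTokens(50), // -n
		llama.SetThreads(4), // -t
		llama.SetSeed(-1),   // -s
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```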
Interactive mode
Omit the -p flag to start interactive mode where you can chat with the model:
go run ./examples -m model.gguf
In interactive mode:
- Type your prompt and press Enter on an empty line to submit
- Use Ctrl+C to exit
- The model will stream responses token by token (see the sketch below)
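As a rough illustration of how the streaming loop can work, the sketch below reads input until an empty line, then passes the accumulated prompt to Predict with a token callback. It again assumes the upstream go-llama.cpp option names (SetTokenCallback in particular) and an assumed import path; it is not a verbatim copy of this example's source.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"

	llama "github.com/tcpipuk/go-llama.cpp" // assumed module path; check go.mod
)

func main() {
	l, err := llama.New("Qwen3-0.6B-Q8_0.gguf", llama.SetContext(512))
	if err != nil {
		log.Fatal(err)
	}
	defer l.Free()

	fmt.Println("Type a prompt, then press Enter on an empty line to submit (Ctrl+C exits).")
	scanner := bufio.NewScanner(os.Stdin)
	var lines []string
	for scanner.Scan() {
		line := scanner.Text()
		if line != "" {
			lines = append(lines, line)
			continue
		}
		if len(lines) == 0 {
			continue // nothing typed yet
		}
		prompt := strings.Join(lines, "\n")
		lines = lines[:0]

		// Stream each generated token to stdout as it arrives.
		if _, err := l.Predict(prompt,
			llama.SetTokens(512),
			llama.SetTokenCallback(func(token string) bool {
				fmt.Print(token)
				return true // return false to stop generation early
			}),
		); err != nil {
			log.Fatal(err)
		}
		fmt.Println()
	}
}
```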
Environment variables
The example requires these environment variables:
export LIBRARY_PATH=$PWD        # For linking during build
export C_INCLUDE_PATH=$PWD      # For header files during build
export LD_LIBRARY_PATH=$PWD     # For shared libraries at runtime
Without these, you'll see "undefined symbol" or "library not found" errors.
Hardware acceleration
To enable GPU acceleration, build with the appropriate backend:
# CUDA example
BUILD_TYPE=cublas make libbinding.a
CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" \
  go run ./examples -m model.gguf -ngl 32
See the building guide for all hardware acceleration options.