README

go-llama.cpp


Go bindings for llama.cpp, enabling you to run large language models locally with hardware acceleration support. Integrate LLM inference directly into Go applications with a clean, idiomatic API.

This is an active fork of go-skynet/go-llama.cpp, which appears unmaintained. We're keeping it current with the latest llama.cpp developments and Go best practices.

Note: Historical tags use the original module path github.com/go-skynet/go-llama.cpp. For new development, use github.com/tcpipuk/go-llama.cpp.

Releases: This fork's tags follow llama.cpp releases using the format llama.cpp-{tag} (e.g. llama.cpp-b6603). This ensures compatibility tracking with the underlying C++ library.

Quick start

# Clone with submodules
git clone --recurse-submodules https://github.com/tcpipuk/go-llama.cpp
cd go-llama.cpp

See examples/README.md for complete build and usage instructions.

Basic usage

package main

import (
    "fmt"
    llama "github.com/tcpipuk/go-llama.cpp"
)

func main() {
    model, err := llama.New(
        "/path/to/model.gguf",
        llama.EnableF16Memory,
        llama.SetContext(512),
    )
    if err != nil {
        panic(err)
    }
    defer model.Free()

    response, err := model.Predict("Hello world", llama.SetTokens(50))
    if err != nil {
        panic(err)
    }

    fmt.Println(response)
}

Remember to set the required environment variables:

export LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD LD_LIBRARY_PATH=$PWD

Thread safety

Important: The library is not thread-safe. For concurrent usage, implement a pool pattern as shown in the API guide.
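As a rough sketch (not the guide's exact code, and assuming the llama import from the example above), a pool can be built from a buffered channel of model instances, with each instance used by at most one goroutine at a time:

type ModelPool struct {
    instances chan *llama.LLama
}

func NewModelPool(path string, size int, opts ...llama.ModelOption) (*ModelPool, error) {
    pool := &ModelPool{instances: make(chan *llama.LLama, size)}
    for i := 0; i < size; i++ {
        model, err := llama.New(path, opts...)
        if err != nil {
            return nil, err
        }
        pool.instances <- model
    }
    return pool, nil
}

// Predict borrows an instance for the duration of one inference call,
// then returns it to the pool.
func (p *ModelPool) Predict(text string, opts ...llama.PredictOption) (string, error) {
    model := <-p.instances
    defer func() { p.instances <- model }()
    return model.Predict(text, opts...)
}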

Documentation

Essential guides
  • Getting started - Complete walkthrough from installation to first inference
  • Building guide - Build options, hardware acceleration, and troubleshooting
  • API guide - Detailed usage examples, configuration options, and thread safety

Requirements

  • Go 1.25+
  • Docker (recommended) or C++ compiler with CMake
  • Git with submodule support

The library uses the GGUF model format, which is the current standard for llama.cpp. For legacy GGML format support, use the pre-gguf tag.

Architecture

The library bridges Go and C++ using CGO, keeping computational work in the optimised llama.cpp C++ code whilst providing a clean Go API. This approach minimises CGO crossings and preserves the performance of the underlying library.

Key components:

  • binding.cpp/binding.h - CGO interface to llama.cpp
  • llama.go - Main Go API with functional options
  • options.go - Configuration using the functional options pattern
  • llama.cpp/ - Git submodule containing upstream llama.cpp

Contributing

We welcome contributions! For development:

# Format and static analysis
go fmt ./...
go vet ./...

# Run tests (requires model file)
TEST_MODEL="/path/to/test.gguf" make test

# Pre-commit checks
prek run --all-files

See our building guide for development environment setup.

Licence

MIT

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type LLama

type LLama struct {
	// contains filtered or unexported fields
}

func New

func New(model string, opts ...ModelOption) (*LLama, error)

func (*LLama) Embeddings

func (l *LLama) Embeddings(text string, opts ...PredictOption) ([]float32, error)

Embeddings returns the embedding vector for the given text.
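
A minimal sketch of retrieving embeddings (the EnableEmbeddings model option presumably needs to be set at load time for this call to succeed):

// Assumes: import llama "github.com/tcpipuk/go-llama.cpp"
model, err := llama.New("/path/to/model.gguf", llama.EnableEmbeddings)
if err != nil {
    panic(err)
}
defer model.Free()

vec, err := model.Embeddings("The quick brown fox")
if err != nil {
    panic(err)
}
fmt.Printf("embedding dimension: %d\n", len(vec))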

func (*LLama) Eval

func (l *LLama) Eval(text string, opts ...PredictOption) error

func (*LLama) Free

func (l *LLama) Free()

func (*LLama) LoadState

func (l *LLama) LoadState(state string) error

func (*LLama) Predict

func (l *LLama) Predict(text string, opts ...PredictOption) (string, error)

func (*LLama) SaveState

func (l *LLama) SaveState(dst string) error
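
A sketch of saving and restoring context state, assuming both SaveState and LoadState (documented above) take file paths, with model created as in the basic usage example:

// Evaluate a prompt, persist the context, and restore it later to
// avoid re-evaluating the same prompt.
if _, err := model.Predict("Once upon a time", llama.SetTokens(50)); err != nil {
    panic(err)
}
if err := model.SaveState("/tmp/llama.state"); err != nil {
    panic(err)
}
if err := model.LoadState("/tmp/llama.state"); err != nil {
    panic(err)
}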

func (*LLama) SetTokenCallback

func (l *LLama) SetTokenCallback(callback func(token string) bool)

SetTokenCallback registers a callback for the individual tokens produced while running Predict. It is called once for each token. The callback should return true for as long as the model should continue predicting; once it returns false, the predictor returns. Tokens are converted to Go strings as-is: they are not trimmed or otherwise altered, and may not be valid UTF-8 on their own. Pass in nil to remove the callback.

It is safe to call this method while a prediction is running.
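
For example, to stream tokens to stdout as they are generated (a fragment, assuming model from the basic usage example):

model.SetTokenCallback(func(token string) bool {
    fmt.Print(token) // tokens may not be valid UTF-8 in isolation
    return true      // return false to stop the prediction early
})
if _, err := model.Predict("Tell me a story", llama.SetTokens(100)); err != nil {
    panic(err)
}
model.SetTokenCallback(nil) // remove the callback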

func (*LLama) SpeculativeSampling

func (l *LLama) SpeculativeSampling(ll *LLama, text string, opts ...PredictOption) (string, error)
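
A hedged sketch of usage: going by llama.cpp's speculative decoding, the ll argument is presumably a smaller draft model and SetNDraft controls how many tokens it proposes per step (both are assumptions, not documented here):

target, err := llama.New("/path/to/large-model.gguf")
if err != nil {
    panic(err)
}
defer target.Free()

draft, err := llama.New("/path/to/small-model.gguf")
if err != nil {
    panic(err)
}
defer draft.Free()

response, err := target.SpeculativeSampling(draft, "Hello world",
    llama.SetNDraft(16), llama.SetTokens(50))
if err != nil {
    panic(err)
}
fmt.Println(response)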

func (*LLama) TokenEmbeddings

func (l *LLama) TokenEmbeddings(tokens []int, opts ...PredictOption) ([]float32, error)

TokenEmbeddings returns the embedding vector for the given token IDs.

func (*LLama) TokenizeString

func (l *LLama) TokenizeString(text string, opts ...PredictOption) (int32, []int32, error)

TokenizeString has an interesting return property: negative lengths can (potentially) carry meaning. The length is therefore returned separately from the token slice and the error; all three can be used together.
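
For example (a fragment, assuming model from the basic usage example):

count, tokens, err := model.TokenizeString("Hello world")
if err != nil {
    panic(err)
}
// count is returned separately because negative values can carry meaning.
fmt.Printf("%d tokens: %v\n", count, tokens)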

type ModelOption

type ModelOption func(p *ModelOptions)
var EnabelLowVRAM ModelOption = func(p *ModelOptions) {
	p.LowVRAM = true
}
var EnableEmbeddings ModelOption = func(p *ModelOptions) {
	p.Embeddings = true
}
var EnableF16Memory ModelOption = func(p *ModelOptions) {
	p.F16Memory = true
}
var EnableMLock ModelOption = func(p *ModelOptions) {
	p.MLock = true
}
var EnableNUMA ModelOption = func(p *ModelOptions) {
	p.NUMA = true
}

func SetContext

func SetContext(c int) ModelOption

SetContext sets the context size.

func SetGPULayers

func SetGPULayers(n int) ModelOption

SetGPULayers sets the number of model layers to offload to the GPU.

func SetLoraAdapter

func SetLoraAdapter(s string) ModelOption

func SetLoraBase

func SetLoraBase(s string) ModelOption

func SetMMap

func SetMMap(b bool) ModelOption

SetMMap sets whether to use memory mapping when loading the model.

func SetMainGPU

func SetMainGPU(maingpu string) ModelOption

SetMainGPU sets the main GPU (main_gpu).

func SetModelSeed

func SetModelSeed(c int) ModelOption

func SetMulMatQ

func SetMulMatQ(b bool) ModelOption

func SetNBatch

func SetNBatch(n_batch int) ModelOption

SetNBatch sets the batch size (n_batch).

func SetPerplexity

func SetPerplexity(b bool) ModelOption

func SetTensorSplit

func SetTensorSplit(maingpu string) ModelOption

SetTensorSplit sets the tensor split across GPUs.

func WithRopeFreqBase

func WithRopeFreqBase(f float32) ModelOption

func WithRopeFreqScale

func WithRopeFreqScale(f float32) ModelOption

type ModelOptions

type ModelOptions struct {
	ContextSize   int
	Seed          int
	NBatch        int
	F16Memory     bool
	MLock         bool
	MMap          bool
	LowVRAM       bool
	Embeddings    bool
	NUMA          bool
	NGPULayers    int
	MainGPU       string
	TensorSplit   string
	FreqRopeBase  float32
	FreqRopeScale float32
	MulMatQ       *bool
	LoraBase      string
	LoraAdapter   string
	Perplexity    bool
}
var DefaultModelOptions ModelOptions = ModelOptions{
	ContextSize:   512,
	Seed:          0,
	F16Memory:     false,
	MLock:         false,
	Embeddings:    false,
	MMap:          true,
	LowVRAM:       false,
	NBatch:        512,
	FreqRopeBase:  10000,
	FreqRopeScale: 1.0,
}

func NewModelOptions

func NewModelOptions(opts ...ModelOption) ModelOptions

NewModelOptions creates a new ModelOptions object with the given options.

type PredictOption

type PredictOption func(p *PredictOptions)
var Debug PredictOption = func(p *PredictOptions) {
	p.DebugMode = true
}
var EnableF16KV PredictOption = func(p *PredictOptions) {
	p.F16KV = true
}
var EnablePromptCacheAll PredictOption = func(p *PredictOptions) {
	p.PromptCacheAll = true
}
var EnablePromptCacheRO PredictOption = func(p *PredictOptions) {
	p.PromptCacheRO = true
}
var IgnoreEOS PredictOption = func(p *PredictOptions) {
	p.IgnoreEOS = true
}

func SetBatch

func SetBatch(size int) PredictOption

SetBatch sets the batch size.

func SetFrequencyPenalty

func SetFrequencyPenalty(fp float32) PredictOption

SetFrequencyPenalty sets the frequency penalty parameter, freq_penalty.

func SetLogitBias

func SetLogitBias(lb string) PredictOption

SetLogitBias sets the logit bias parameter.

func SetMemoryMap

func SetMemoryMap(b bool) PredictOption

SetMemoryMap sets memory mapping.

func SetMirostat

func SetMirostat(m int) PredictOption

SetMirostat sets the mirostat parameter.

func SetMirostatETA

func SetMirostatETA(me float32) PredictOption

SetMirostatETA sets the mirostat ETA parameter.

func SetMirostatTAU

func SetMirostatTAU(mt float32) PredictOption

SetMirostatTAU sets the mirostat TAU parameter.
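
A sketch combining the three Mirostat options. The TAU and ETA values mirror the documented defaults below; the mode value 2 follows llama.cpp's usual convention (0 = off, 1 = v1, 2 = v2), which is an assumption here. The fragment assumes model from the basic usage example:

response, err := model.Predict("Hello",
    llama.SetMirostat(2),
    llama.SetMirostatTAU(5.0),
    llama.SetMirostatETA(0.1),
)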

func SetMlock

func SetMlock(b bool) PredictOption

SetMlock sets the memory lock.

func SetNDraft

func SetNDraft(nd int) PredictOption

func SetNKeep

func SetNKeep(n int) PredictOption

SetNKeep sets the number of tokens from the initial prompt to keep.

func SetNegativePrompt

func SetNegativePrompt(np string) PredictOption

func SetNegativePromptScale

func SetNegativePromptScale(nps float32) PredictOption

func SetPathPromptCache

func SetPathPromptCache(f string) PredictOption

SetPathPromptCache sets the session file to store the prompt cache.

func SetPenalizeNL

func SetPenalizeNL(pnl bool) PredictOption

SetPenalizeNL sets whether to penalize newlines or not.

func SetPenalty

func SetPenalty(penalty float32) PredictOption

SetPenalty sets the repetition penalty for text generation.

func SetPredictionMainGPU

func SetPredictionMainGPU(maingpu string) PredictOption

SetPredictionMainGPU sets the main GPU (main_gpu) for prediction.

func SetPredictionTensorSplit

func SetPredictionTensorSplit(maingpu string) PredictOption

SetPredictionTensorSplit sets the tensor split across GPUs for prediction.

func SetPresencePenalty

func SetPresencePenalty(pp float32) PredictOption

SetPresencePenalty sets the presence penalty parameter, presence_penalty.

func SetRepeat

func SetRepeat(repeat int) PredictOption

SetRepeat sets the number of recent tokens considered for the repetition penalty (repeat_last_n).

func SetRopeFreqBase

func SetRopeFreqBase(rfb float32) PredictOption

SetRopeFreqBase sets the RoPE frequency base.

func SetRopeFreqScale

func SetRopeFreqScale(rfs float32) PredictOption

func SetSeed

func SetSeed(seed int) PredictOption

SetSeed sets the random seed for sampling text generation.

func SetStopWords

func SetStopWords(stop ...string) PredictOption

SetStopWords sets the prompts that will stop predictions.
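
For example, to stop generation at the next question marker in a Q&A-style prompt (a fragment, assuming model from the basic usage example):

response, err := model.Predict("Q: What is Go?\nA:",
    llama.SetStopWords("\nQ:"),
    llama.SetTokens(100))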

func SetTailFreeSamplingZ

func SetTailFreeSamplingZ(tfz float32) PredictOption

SetTailFreeSamplingZ sets the tail free sampling, parameter z.

func SetTemperature

func SetTemperature(temp float32) PredictOption

SetTemperature sets the temperature value for text generation.

func SetThreads

func SetThreads(threads int) PredictOption

SetThreads sets the number of threads to use for text generation.

func SetTokenCallback

func SetTokenCallback(fn func(string) bool) PredictOption

SetTokenCallback sets a callback that is invoked once per generated token; returning false stops the prediction.

func SetTokens

func SetTokens(tokens int) PredictOption

SetTokens sets the number of tokens to generate.

func SetTopK

func SetTopK(topk int) PredictOption

SetTopK sets the value for top-K sampling.

func SetTopP

func SetTopP(topp float32) PredictOption

SetTopP sets the value for nucleus sampling.

func SetTypicalP

func SetTypicalP(tp float32) PredictOption

SetTypicalP sets the typicality parameter, p_typical.

func WithGrammar

func WithGrammar(s string) PredictOption

WithGrammar sets a grammar that constrains the model's output.
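
llama.cpp grammars use the GBNF format; a minimal sketch constraining the model to answer only "yes" or "no" (a fragment, assuming model from the basic usage example):

grammar := `root ::= "yes" | "no"`
response, err := model.Predict("Is the sky blue? Answer yes or no:",
    llama.WithGrammar(grammar),
    llama.SetTokens(4))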

type PredictOptions

type PredictOptions struct {
	Seed, Threads, Tokens, TopK, Repeat, Batch, NKeep int
	TopP, Temperature, Penalty                        float32
	NDraft                                            int
	F16KV                                             bool
	DebugMode                                         bool
	StopPrompts                                       []string
	IgnoreEOS                                         bool

	TailFreeSamplingZ float32
	TypicalP          float32
	FrequencyPenalty  float32
	PresencePenalty   float32
	Mirostat          int
	MirostatETA       float32
	MirostatTAU       float32
	PenalizeNL        bool
	LogitBias         string
	TokenCallback     func(string) bool

	PathPromptCache             string
	MLock, MMap, PromptCacheAll bool
	PromptCacheRO               bool
	Grammar                     string
	MainGPU                     string
	TensorSplit                 string

	// Rope parameters
	RopeFreqBase  float32
	RopeFreqScale float32

	// Negative prompt parameters
	NegativePromptScale float32
	NegativePrompt      string
}
var DefaultOptions PredictOptions = PredictOptions{
	Seed:              -1,
	Threads:           4,
	Tokens:            128,
	Penalty:           1.1,
	Repeat:            64,
	Batch:             512,
	NKeep:             64,
	TopK:              40,
	TopP:              0.95,
	TailFreeSamplingZ: 1.0,
	TypicalP:          1.0,
	Temperature:       0.8,
	FrequencyPenalty:  0.0,
	PresencePenalty:   0.0,
	Mirostat:          0,
	MirostatTAU:       5.0,
	MirostatETA:       0.1,
	MMap:              true,
	RopeFreqBase:      10000,
	RopeFreqScale:     1.0,
}

func NewPredictOptions

func NewPredictOptions(opts ...PredictOption) PredictOptions

NewPredictOptions creates a new PredictOptions object with the given options.
