README

go-llama.cpp


Go bindings for llama.cpp, enabling you to run large language models locally with hardware acceleration support. Integrate LLM inference directly into Go applications with a clean, idiomatic API.

This is an active fork of go-skynet/go-llama.cpp, which appears unmaintained. We're keeping it current with the latest llama.cpp developments and Go best practices.

Note: Historical tags use the original module path github.com/go-skynet/go-llama.cpp. For new development, use github.com/tcpipuk/go-llama.cpp.

Releases: This fork's tags follow llama.cpp releases using the format llama.cpp-{tag} (e.g. llama.cpp-b6603). This ensures compatibility tracking with the underlying C++ library.

Quick start

# Clone with submodules
git clone --recurse-submodules https://github.com/tcpipuk/go-llama.cpp
cd go-llama.cpp

See examples/README.md for complete build and usage instructions.

Basic usage

package main

import (
    "fmt"
    llama "github.com/tcpipuk/go-llama.cpp"
)

func main() {
    model, err := llama.New(
        "/path/to/model.gguf",
        llama.EnableF16Memory,
        llama.SetContext(512),
    )
    if err != nil {
        panic(err)
    }
    defer model.Free()

    response, err := model.Predict("Hello world", llama.SetTokens(50))
    if err != nil {
        panic(err)
    }

    fmt.Println(response)
}

Remember to set the required environment variables:

export LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD LD_LIBRARY_PATH=$PWD

Thread safety

Important: The library is not thread-safe. For concurrent usage, implement a pool pattern as shown in the API guide.
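As a rough sketch (not the guide's exact code, and assuming the llama import from the example above), a pool can be built from a buffered channel of model instances, with each instance used by at most one goroutine at a time:

type ModelPool struct {
    instances chan *llama.LLama
}

func NewModelPool(path string, size int, opts ...llama.ModelOption) (*ModelPool, error) {
    pool := &ModelPool{instances: make(chan *llama.LLama, size)}
    for i := 0; i < size; i++ {
        model, err := llama.New(path, opts...)
        if err != nil {
            return nil, err
        }
        pool.instances <- model
    }
    return pool, nil
}

// Predict borrows an instance for the duration of one inference call,
// then returns it to the pool.
func (p *ModelPool) Predict(text string, opts ...llama.PredictOption) (string, error) {
    model := <-p.instances
    defer func() { p.instances <- model }()
    return model.Predict(text, opts...)
}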

Documentation

Essential guides
  • Getting started - Complete walkthrough from installation to first inference
  • Building guide - Build options, hardware acceleration, and troubleshooting
  • API guide - Detailed usage examples, configuration options, and thread safety

Requirements

  • Go 1.25+
  • Docker (recommended) or C++ compiler with CMake
  • Git with submodule support

The library uses the GGUF model format, which is the current standard for llama.cpp. For legacy GGML format support, use the pre-gguf tag.

Architecture

The library bridges Go and C++ using CGO, keeping computational work in the optimised llama.cpp C++ code whilst providing a clean Go API. This approach minimises CGO crossings and preserves the performance of the underlying library.

Key components:

  • binding.cpp/binding.h - CGO interface to llama.cpp
  • llama.go - Main Go API with functional options
  • options.go - Configuration using the functional options pattern
  • llama.cpp/ - Git submodule containing upstream llama.cpp

Contributing

We welcome contributions! For development:

# Format and static analysis
go fmt ./...
go vet ./...

# Run tests (requires model file)
TEST_MODEL="/path/to/test.gguf" make test

# Pre-commit checks
prek run --all-files

See our building guide for development environment setup.

Licence

MIT

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type LLama

type LLama struct {
	// contains filtered or unexported fields
}

func New

func New(model string, opts ...ModelOption) (*LLama, error)

func (*LLama) Embeddings

func (l *LLama) Embeddings(text string, opts ...PredictOption) ([]float32, error)

Embeddings returns the embedding vector for the given text.
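
A minimal sketch of retrieving embeddings (the EnableEmbeddings model option presumably needs to be set at load time for this call to succeed):

// Assumes: import llama "github.com/tcpipuk/go-llama.cpp"
model, err := llama.New("/path/to/model.gguf", llama.EnableEmbeddings)
if err != nil {
    panic(err)
}
defer model.Free()

vec, err := model.Embeddings("The quick brown fox")
if err != nil {
    panic(err)
}
fmt.Printf("embedding dimension: %d\n", len(vec))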

func (*LLama) Eval

func (l *LLama) Eval(text string, opts ...PredictOption) error

func (*LLama) Free

func (l *LLama) Free()

func (*LLama) LoadState

func (l *LLama) LoadState(state string) error

func (*LLama) Predict

func (l *LLama) Predict(text string, opts ...PredictOption) (string, error)

func (*LLama) SaveState

func (l *LLama) SaveState(dst string) error
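
A sketch of saving and restoring context state, assuming both SaveState and LoadState (documented above) take file paths, with model created as in the basic usage example:

// Evaluate a prompt, persist the context, and restore it later to
// avoid re-evaluating the same prompt.
if _, err := model.Predict("Once upon a time", llama.SetTokens(50)); err != nil {
    panic(err)
}
if err := model.SaveState("/tmp/llama.state"); err != nil {
    panic(err)
}
if err := model.LoadState("/tmp/llama.state"); err != nil {
    panic(err)
}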

func (*LLama) SetTokenCallback

func (l *LLama) SetTokenCallback(callback func(token string) bool)

SetTokenCallback registers a callback for the individual tokens produced while running Predict. It is called once for each token. The callback should return true for as long as the model should continue predicting; once it returns false, the predictor returns. Tokens are converted to Go strings as-is: they are not trimmed or otherwise altered, and may not be valid UTF-8 on their own. Pass in nil to remove the callback.

It is safe to call this method while a prediction is running.
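
For example, to stream tokens to stdout as they are generated (a fragment, assuming model from the basic usage example):

model.SetTokenCallback(func(token string) bool {
    fmt.Print(token) // tokens may not be valid UTF-8 in isolation
    return true      // return false to stop the prediction early
})
if _, err := model.Predict("Tell me a story", llama.SetTokens(100)); err != nil {
    panic(err)
}
model.SetTokenCallback(nil) // remove the callback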

func (*LLama) SpeculativeSampling

func (l *LLama) SpeculativeSampling(ll *LLama, text string, opts ...PredictOption) (string, error)
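
A hedged sketch of usage: going by llama.cpp's speculative decoding, the ll argument is presumably a smaller draft model and SetNDraft controls how many tokens it proposes per step (both are assumptions, not documented here):

target, err := llama.New("/path/to/large-model.gguf")
if err != nil {
    panic(err)
}
defer target.Free()

draft, err := llama.New("/path/to/small-model.gguf")
if err != nil {
    panic(err)
}
defer draft.Free()

response, err := target.SpeculativeSampling(draft, "Hello world",
    llama.SetNDraft(16), llama.SetTokens(50))
if err != nil {
    panic(err)
}
fmt.Println(response)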

func (*LLama) TokenEmbeddings

func (l *LLama) TokenEmbeddings(tokens []int, opts ...PredictOption) ([]float32, error)

TokenEmbeddings returns the embedding vector for the given token IDs.

func (*LLama) TokenizeString

func (l *LLama) TokenizeString(text string, opts ...PredictOption) (int32, []int32, error)

TokenizeString has an interesting return property: negative lengths can (potentially) carry meaning. The length is therefore returned separately from the token slice and the error; all three can be used together.
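
For example (a fragment, assuming model from the basic usage example):

count, tokens, err := model.TokenizeString("Hello world")
if err != nil {
    panic(err)
}
// count is returned separately because negative values can carry meaning.
fmt.Printf("%d tokens: %v\n", count, tokens)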

type ModelOption

type ModelOption func(p *ModelOptions)
var EnabelLowVRAM ModelOption = func(p *ModelOptions) {
	p.LowVRAM = true
}
var EnableEmbeddings ModelOption = func(p *ModelOptions) {
	p.Embeddings = true
}
var EnableF16Memory ModelOption = func(p *ModelOptions) {
	p.F16Memory = true
}
var EnableMLock ModelOption = func(p *ModelOptions) {
	p.MLock = true
}
var EnableNUMA ModelOption = func(p *ModelOptions) {
	p.NUMA = true
}

func SetContext

func SetContext(c int) ModelOption

SetContext sets the context size.

func SetGPULayers

func SetGPULayers(n int) ModelOption

SetGPULayers sets the number of model layers to offload to the GPU.

func SetLoraAdapter

func SetLoraAdapter(s string) ModelOption

func SetLoraBase

func SetLoraBase(s string) ModelOption

func SetMMap

func SetMMap(b bool) ModelOption

SetMMap sets whether to use memory mapping when loading the model.

func SetMainGPU

func SetMainGPU(maingpu string) ModelOption

SetMainGPU sets the main GPU (main_gpu).

func SetModelSeed

func SetModelSeed(c int) ModelOption

func SetMulMatQ

func SetMulMatQ(b bool) ModelOption

func SetNBatch

func SetNBatch(n_batch int) ModelOption

SetNBatch sets the batch size (n_batch).

func SetPerplexity

func SetPerplexity(b bool) ModelOption

func SetTensorSplit

func SetTensorSplit(maingpu string) ModelOption

SetTensorSplit sets the tensor split across GPUs.

func WithRopeFreqBase

func WithRopeFreqBase(f float32) ModelOption

func WithRopeFreqScale

func WithRopeFreqScale(f float32) ModelOption

type ModelOptions

type ModelOptions struct {
	ContextSize   int
	Seed          int
	NBatch        int
	F16Memory     bool
	MLock         bool
	MMap          bool
	LowVRAM       bool
	Embeddings    bool
	NUMA          bool
	NGPULayers    int
	MainGPU       string
	TensorSplit   string
	FreqRopeBase  float32
	FreqRopeScale float32
	MulMatQ       *bool
	LoraBase      string
	LoraAdapter   string
	Perplexity    bool
}
var DefaultModelOptions ModelOptions = ModelOptions{
	ContextSize:   512,
	Seed:          0,
	F16Memory:     false,
	MLock:         false,
	Embeddings:    false,
	MMap:          true,
	LowVRAM:       false,
	NBatch:        512,
	FreqRopeBase:  10000,
	FreqRopeScale: 1.0,
}

func NewModelOptions

func NewModelOptions(opts ...ModelOption) ModelOptions

NewModelOptions creates a new ModelOptions object with the given options.

type PredictOption

type PredictOption func(p *PredictOptions)
var Debug PredictOption = func(p *PredictOptions) {
	p.DebugMode = true
}
var EnableF16KV PredictOption = func(p *PredictOptions) {
	p.F16KV = true
}
var EnablePromptCacheAll PredictOption = func(p *PredictOptions) {
	p.PromptCacheAll = true
}
var EnablePromptCacheRO PredictOption = func(p *PredictOptions) {
	p.PromptCacheRO = true
}
var IgnoreEOS PredictOption = func(p *PredictOptions) {
	p.IgnoreEOS = true
}

func SetBatch

func SetBatch(size int) PredictOption

SetBatch sets the batch size.

func SetFrequencyPenalty

func SetFrequencyPenalty(fp float32) PredictOption

SetFrequencyPenalty sets the frequency penalty parameter, freq_penalty.

func SetLogitBias

func SetLogitBias(lb string) PredictOption

SetLogitBias sets the logit bias parameter.

func SetMemoryMap

func SetMemoryMap(b bool) PredictOption

SetMemoryMap sets memory mapping.

func SetMirostat

func SetMirostat(m int) PredictOption

SetMirostat sets the mirostat parameter.

func SetMirostatETA

func SetMirostatETA(me float32) PredictOption

SetMirostatETA sets the mirostat ETA parameter.

func SetMirostatTAU

func SetMirostatTAU(mt float32) PredictOption

SetMirostatTAU sets the mirostat TAU parameter.
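
A sketch combining the three Mirostat options. The TAU and ETA values mirror the documented defaults below; the mode value 2 follows llama.cpp's usual convention (0 = off, 1 = v1, 2 = v2), which is an assumption here. The fragment assumes model from the basic usage example:

response, err := model.Predict("Hello",
    llama.SetMirostat(2),
    llama.SetMirostatTAU(5.0),
    llama.SetMirostatETA(0.1),
)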

func SetMlock

func SetMlock(b bool) PredictOption

SetMlock sets the memory lock.

func SetNDraft

func SetNDraft(nd int) PredictOption

func SetNKeep

func SetNKeep(n int) PredictOption

SetNKeep sets the number of tokens from the initial prompt to keep.

func SetNegativePrompt

func SetNegativePrompt(np string) PredictOption

func SetNegativePromptScale

func SetNegativePromptScale(nps float32) PredictOption

func SetPathPromptCache

func SetPathPromptCache(f string) PredictOption

SetPathPromptCache sets the session file to store the prompt cache.

func SetPenalizeNL

func SetPenalizeNL(pnl bool) PredictOption

SetPenalizeNL sets whether to penalize newlines or not.

func SetPenalty

func SetPenalty(penalty float32) PredictOption

SetPenalty sets the repetition penalty for text generation.

func SetPredictionMainGPU

func SetPredictionMainGPU(maingpu string) PredictOption

SetPredictionMainGPU sets the main GPU (main_gpu) for prediction.

func SetPredictionTensorSplit

func SetPredictionTensorSplit(maingpu string) PredictOption

SetPredictionTensorSplit sets the tensor split across GPUs for prediction.

func SetPresencePenalty

func SetPresencePenalty(pp float32) PredictOption

SetPresencePenalty sets the presence penalty parameter, presence_penalty.

func SetRepeat

func SetRepeat(repeat int) PredictOption

SetRepeat sets the number of recent tokens considered for the repetition penalty (repeat_last_n).

func SetRopeFreqBase

func SetRopeFreqBase(rfb float32) PredictOption

SetRopeFreqBase sets the RoPE frequency base.

func SetRopeFreqScale

func SetRopeFreqScale(rfs float32) PredictOption

func SetSeed

func SetSeed(seed int) PredictOption

SetSeed sets the random seed for sampling text generation.

func SetStopWords

func SetStopWords(stop ...string) PredictOption

SetStopWords sets the prompts that will stop predictions.
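
For example, to stop generation at the next question marker in a Q&A-style prompt (a fragment, assuming model from the basic usage example):

response, err := model.Predict("Q: What is Go?\nA:",
    llama.SetStopWords("\nQ:"),
    llama.SetTokens(100))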

func SetTailFreeSamplingZ

func SetTailFreeSamplingZ(tfz float32) PredictOption

SetTailFreeSamplingZ sets the tail free sampling, parameter z.

func SetTemperature

func SetTemperature(temp float32) PredictOption

SetTemperature sets the temperature value for text generation.

func SetThreads

func SetThreads(threads int) PredictOption

SetThreads sets the number of threads to use for text generation.

func SetTokenCallback

func SetTokenCallback(fn func(string) bool) PredictOption

SetTokenCallback sets a callback that is invoked once per generated token; returning false stops the prediction.

func SetTokens

func SetTokens(tokens int) PredictOption

SetTokens sets the number of tokens to generate.

func SetTopK

func SetTopK(topk int) PredictOption

SetTopK sets the value for top-K sampling.

func SetTopP

func SetTopP(topp float32) PredictOption

SetTopP sets the value for nucleus sampling.

func SetTypicalP

func SetTypicalP(tp float32) PredictOption

SetTypicalP sets the typicality parameter, p_typical.

func WithGrammar

func WithGrammar(s string) PredictOption

WithGrammar sets a grammar that constrains the model's output.
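
llama.cpp grammars use the GBNF format; a minimal sketch constraining the model to answer only "yes" or "no" (a fragment, assuming model from the basic usage example):

grammar := `root ::= "yes" | "no"`
response, err := model.Predict("Is the sky blue? Answer yes or no:",
    llama.WithGrammar(grammar),
    llama.SetTokens(4))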

type PredictOptions

type PredictOptions struct {
	Seed, Threads, Tokens, TopK, Repeat, Batch, NKeep int
	TopP, Temperature, Penalty                        float32
	NDraft                                            int
	F16KV                                             bool
	DebugMode                                         bool
	StopPrompts                                       []string
	IgnoreEOS                                         bool

	TailFreeSamplingZ float32
	TypicalP          float32
	FrequencyPenalty  float32
	PresencePenalty   float32
	Mirostat          int
	MirostatETA       float32
	MirostatTAU       float32
	PenalizeNL        bool
	LogitBias         string
	TokenCallback     func(string) bool

	PathPromptCache             string
	MLock, MMap, PromptCacheAll bool
	PromptCacheRO               bool
	Grammar                     string
	MainGPU                     string
	TensorSplit                 string

	// Rope parameters
	RopeFreqBase  float32
	RopeFreqScale float32

	// Negative prompt parameters
	NegativePromptScale float32
	NegativePrompt      string
}
var DefaultOptions PredictOptions = PredictOptions{
	Seed:              -1,
	Threads:           4,
	Tokens:            128,
	Penalty:           1.1,
	Repeat:            64,
	Batch:             512,
	NKeep:             64,
	TopK:              40,
	TopP:              0.95,
	TailFreeSamplingZ: 1.0,
	TypicalP:          1.0,
	Temperature:       0.8,
	FrequencyPenalty:  0.0,
	PresencePenalty:   0.0,
	Mirostat:          0,
	MirostatTAU:       5.0,
	MirostatETA:       0.1,
	MMap:              true,
	RopeFreqBase:      10000,
	RopeFreqScale:     1.0,
}

func NewPredictOptions

func NewPredictOptions(opts ...PredictOption) PredictOptions

NewPredictOptions creates a new PredictOptions object with the given options.
