llama

package module
v0.0.0-...-f5c9e9f
Published: Sep 29, 2025 License: MIT Imports: 5 Imported by: 0

README

go-llama.cpp


Go bindings for llama.cpp, enabling you to run large language models locally with hardware acceleration support. Integrate LLM inference directly into Go applications with a clean, idiomatic API.

This is an active fork of go-skynet/go-llama.cpp, which appears unmaintained. We're keeping it current with the latest llama.cpp developments and Go best practices.

Note: Historical tags use the original module path github.com/go-skynet/go-llama.cpp. For new development, use github.com/tcpipuk/go-llama.cpp.

Releases: This fork's tags follow llama.cpp releases using the format llama.cpp-{tag} (e.g. llama.cpp-b6603). This ensures compatibility tracking with the underlying C++ library.

Quick start

# Clone with submodules
git clone --recurse-submodules https://github.com/tcpipuk/go-llama.cpp
cd go-llama.cpp

See examples/README.md for complete build and usage instructions.

Basic usage

package main

import (
    "fmt"
    llama "github.com/tcpipuk/go-llama.cpp"
)

func main() {
    model, err := llama.LoadModel(
        "/path/to/model.gguf",
        llama.WithF16Memory(),
        llama.WithContext(512),
    )
    if err != nil {
        panic(err)
    }
    defer model.Close()

    response, err := model.Generate("Hello world", llama.WithMaxTokens(50))
    if err != nil {
        panic(err)
    }

    fmt.Println(response)
}

Remember to set the required environment variables:

export LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD LD_LIBRARY_PATH=$PWD

Thread safety

Important: The library is not thread-safe. For concurrent usage, implement a pool pattern as shown in the API guide.
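
For concurrent use, one minimal approach is a buffered channel of models, where each goroutine borrows exclusive access to a single instance at a time. The Pool type, path, and sizes below are illustrative only, not part of this library's API; see the API guide for the recommended pattern.

// Sketch of a channel-based pool: each call borrows one model, so no
// model is ever used from two goroutines at once.
package pool

import llama "github.com/tcpipuk/go-llama.cpp"

type Pool struct {
    models chan *llama.Model
}

// NewPool loads n independent model instances from the same file.
// (This sketch does not clean up already-loaded models on error.)
func NewPool(path string, n int) (*Pool, error) {
    p := &Pool{models: make(chan *llama.Model, n)}
    for i := 0; i < n; i++ {
        m, err := llama.LoadModel(path, llama.WithContext(512))
        if err != nil {
            return nil, err
        }
        p.models <- m
    }
    return p, nil
}

// Generate borrows a model, runs inference, and returns it to the pool.
func (p *Pool) Generate(prompt string, opts ...llama.GenerateOption) (string, error) {
    m := <-p.models
    defer func() { p.models <- m }()
    return m.Generate(prompt, opts...)
}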

Documentation

Essential guides
  • Getting started - Complete walkthrough from installation to first inference
  • Building guide - Build options, hardware acceleration, and troubleshooting
  • API guide - Detailed usage examples, configuration options, and thread safety

Requirements

  • Go 1.25+
  • Docker (recommended) or C++ compiler with CMake
  • Git with submodule support

The library uses the GGUF model format, which is the current standard for llama.cpp. For legacy GGML format support, use the pre-gguf tag.

Architecture

The library bridges Go and C++ using CGO, keeping computational work in the optimised llama.cpp C++ code whilst providing a clean Go API. This approach minimises CGO overhead and maximises performance.

Key components:

  • wrapper.cpp/wrapper.h - CGO interface to llama.cpp with comprehensive error handling
  • model.go - Main Go API with functional options pattern
  • llama.cpp/ - Git submodule containing upstream llama.cpp library

The design uses modern Go patterns including functional options for configuration, proper resource management with finalizers, and streaming callbacks using cgo.Handle for safe Go-C interaction.
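
As an illustration, streaming from the Go side is a single callback that returns false to stop early. This sketch assumes a model already loaded as in the basic usage example; the prompt and token limit are placeholders.

// Print tokens as they arrive; returning false from the callback would
// stop generation early.
err := model.GenerateStream("Tell me a story", func(token string) bool {
    fmt.Print(token)
    return true
}, llama.WithMaxTokens(200))
if err != nil {
    panic(err)
}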

Future goals

This fork aims to provide a stable, well-tested Go interface to llama.cpp that keeps pace with upstream developments. Key objectives:

Short term (current release cycle)
  • Comprehensive testing: Automated test suite covering all functionality including streaming, speculative sampling, and embeddings
  • CUDA testing: Automated CI testing against GPU-enabled builds
  • API stability: Ensure backwards compatibility as the library matures
Medium term (next few releases)
  • Performance optimisation: Profile and optimise the Go-C boundary for minimal overhead
  • Model compatibility: Test against diverse model formats and architectures
  • Platform support: Verify builds across different architectures and operating systems
Long term (ongoing)
  • Upstream tracking: Automated monitoring and testing against new llama.cpp releases
  • Go ecosystem integration: Better integration with Go HTTP servers, WebSocket streaming
  • Developer experience: Improved error messages, debugging tools, and comprehensive examples

The project prioritises reliability and maintainability over rapid feature addition. Each new llama.cpp feature is carefully integrated with proper error handling and Go-friendly APIs.

Contributing

We welcome contributions! For development:

# Format and static analysis
go fmt ./...
go vet ./...

# Run tests (requires model file)
TEST_MODEL="/path/to/test.gguf" make test

# Pre-commit checks
prek run --all-files

See our building guide for development environment setup.

Licence

MIT

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type GenerateOption

type GenerateOption func(*generateConfig)

GenerateOption configures text generation

func WithDebug

func WithDebug() GenerateOption

func WithDraftTokens

func WithDraftTokens(n int) GenerateOption

func WithMaxTokens

func WithMaxTokens(n int) GenerateOption

Generation options

func WithSeed

func WithSeed(seed int) GenerateOption

func WithStopWords

func WithStopWords(words ...string) GenerateOption

func WithTemperature

func WithTemperature(t float32) GenerateOption

func WithTopK

func WithTopK(k int) GenerateOption

func WithTopP

func WithTopP(p float32) GenerateOption
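
These options compose on a single Generate call. The snippet below is illustrative only, assuming a model loaded as in the README example; the prompt and values are placeholders.

// Combine sampling options for one generation call.
response, err := model.Generate("Summarise this article:",
    llama.WithMaxTokens(256),
    llama.WithTemperature(0.7),
    llama.WithTopK(40),
    llama.WithTopP(0.9),
    llama.WithSeed(42),
    llama.WithStopWords("\n\n"),
)
if err != nil {
    panic(err)
}
fmt.Println(response)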

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model represents a loaded LLAMA model with its context

func LoadModel

func LoadModel(path string, opts ...ModelOption) (*Model, error)

LoadModel loads a GGUF model from the specified path

func (*Model) Close

func (m *Model) Close() error

Close frees the model and its associated resources

func (*Model) Generate

func (m *Model) Generate(prompt string, opts ...GenerateOption) (string, error)

Generate generates text from the given prompt

func (*Model) GenerateStream

func (m *Model) GenerateStream(prompt string, callback func(token string) bool, opts ...GenerateOption) error

GenerateStream generates text with streaming output via callback

func (*Model) GenerateWithDraft

func (m *Model) GenerateWithDraft(prompt string, draft *Model, opts ...GenerateOption) (string, error)

GenerateWithDraft performs speculative generation using a draft model
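
An illustrative sketch of speculative generation, assuming a main model is already loaded; the draft model path, prompt, and option values are placeholders.

// Load a smaller draft model alongside the main model; the main model
// verifies the tokens the draft proposes.
draft, err := llama.LoadModel("/path/to/draft.gguf", llama.WithContext(512))
if err != nil {
    panic(err)
}
defer draft.Close()

response, err := model.GenerateWithDraft("Explain CGO in one paragraph", draft,
    llama.WithMaxTokens(200),
    llama.WithDraftTokens(8),
)
if err != nil {
    panic(err)
}
fmt.Println(response)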

func (*Model) GenerateWithDraftStream

func (m *Model) GenerateWithDraftStream(prompt string, draft *Model, callback func(token string) bool, opts ...GenerateOption) error

GenerateWithDraftStream performs speculative generation with streaming output

func (*Model) GetEmbeddings

func (m *Model) GetEmbeddings(text string) ([]float32, error)

GetEmbeddings computes embeddings for the given text
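
An illustrative sketch, assuming embeddings require a model loaded with WithEmbeddings(); the path and input text are placeholders.

// Load a model with embeddings enabled, then compute a vector for a text.
embedder, err := llama.LoadModel("/path/to/model.gguf", llama.WithEmbeddings())
if err != nil {
    panic(err)
}
defer embedder.Close()

vec, err := embedder.GetEmbeddings("The quick brown fox")
if err != nil {
    panic(err)
}
fmt.Printf("embedding dimension: %d\n", len(vec))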

func (*Model) Tokenize

func (m *Model) Tokenize(text string) ([]int32, error)

Tokenize converts text to tokens
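
For example, token counts can be checked before generation; this assumes a loaded model and a placeholder prompt.

// Count tokens, e.g. to stay within the configured context window.
tokens, err := model.Tokenize("How many tokens is this prompt?")
if err != nil {
    panic(err)
}
fmt.Printf("prompt uses %d tokens\n", len(tokens))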

type ModelOption

type ModelOption func(*modelConfig)

ModelOption configures model loading

func WithBatch

func WithBatch(size int) ModelOption

func WithContext

func WithContext(size int) ModelOption

Model loading options

func WithEmbeddings

func WithEmbeddings() ModelOption

func WithF16Memory

func WithF16Memory() ModelOption

func WithGPULayers

func WithGPULayers(n int) ModelOption

func WithMLock

func WithMLock() ModelOption

func WithMMap

func WithMMap(enabled bool) ModelOption

func WithMainGPU

func WithMainGPU(gpu string) ModelOption

func WithTensorSplit

func WithTensorSplit(split string) ModelOption

func WithThreads

func WithThreads(n int) ModelOption
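
These options compose on LoadModel. The snippet below is illustrative only; the path and values are placeholders, and GPU-related options only take effect in an acceleration-enabled build.

// Load a model with context, batching, threading, and GPU offload configured.
model, err := llama.LoadModel("/path/to/model.gguf",
    llama.WithContext(4096),
    llama.WithBatch(512),
    llama.WithThreads(8),
    llama.WithGPULayers(32),
    llama.WithF16Memory(),
    llama.WithMMap(true),
)
if err != nil {
    panic(err)
}
defer model.Close()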

Directories

Path Synopsis
examples
embedding command
simple command
speculative command
streaming command
