llama

package module
v0.0.0-...-f5c9e9f
Published: Sep 29, 2025 License: MIT Imports: 5 Imported by: 0

README

go-llama.cpp


Go bindings for llama.cpp, enabling you to run large language models locally with hardware acceleration support. Integrate LLM inference directly into Go applications with a clean, idiomatic API.

This is an active fork of go-skynet/go-llama.cpp, which appears unmaintained. We're keeping it current with the latest llama.cpp developments and Go best practices.

Note: Historical tags use the original module path github.com/go-skynet/go-llama.cpp. For new development, use github.com/tcpipuk/go-llama.cpp.

Releases: This fork's tags follow llama.cpp releases using the format llama.cpp-{tag} (e.g. llama.cpp-b6603). This ensures compatibility tracking with the underlying C++ library.

Quick start

# Clone with submodules
git clone --recurse-submodules https://github.com/tcpipuk/go-llama.cpp
cd go-llama.cpp

See examples/README.md for complete build and usage instructions.

Basic usage

package main

import (
    "fmt"
    llama "github.com/tcpipuk/go-llama.cpp"
)

func main() {
    model, err := llama.LoadModel(
        "/path/to/model.gguf",
        llama.WithF16Memory(),
        llama.WithContext(512),
    )
    if err != nil {
        panic(err)
    }
    defer model.Close()

    response, err := model.Generate("Hello world", llama.WithMaxTokens(50))
    if err != nil {
        panic(err)
    }

    fmt.Println(response)
}

Remember to set the required environment variables:

export LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD LD_LIBRARY_PATH=$PWD

Thread safety

Important: The library is not thread-safe. For concurrent usage, implement a pool pattern as shown in the API guide.
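
For concurrent use, one minimal approach is a buffered channel of models, where each goroutine borrows exclusive access to a single instance at a time. The Pool type, path, and sizes below are illustrative only, not part of this library's API; see the API guide for the recommended pattern.

// Sketch of a channel-based pool: each call borrows one model, so no
// model is ever used from two goroutines at once.
package pool

import llama "github.com/tcpipuk/go-llama.cpp"

type Pool struct {
    models chan *llama.Model
}

// NewPool loads n independent model instances from the same file.
// (This sketch does not clean up already-loaded models on error.)
func NewPool(path string, n int) (*Pool, error) {
    p := &Pool{models: make(chan *llama.Model, n)}
    for i := 0; i < n; i++ {
        m, err := llama.LoadModel(path, llama.WithContext(512))
        if err != nil {
            return nil, err
        }
        p.models <- m
    }
    return p, nil
}

// Generate borrows a model, runs inference, and returns it to the pool.
func (p *Pool) Generate(prompt string, opts ...llama.GenerateOption) (string, error) {
    m := <-p.models
    defer func() { p.models <- m }()
    return m.Generate(prompt, opts...)
}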

Documentation

Essential guides
  • Getting started - Complete walkthrough from installation to first inference
  • Building guide - Build options, hardware acceleration, and troubleshooting
  • API guide - Detailed usage examples, configuration options, and thread safety

Requirements

  • Go 1.25+
  • Docker (recommended) or C++ compiler with CMake
  • Git with submodule support

The library uses the GGUF model format, which is the current standard for llama.cpp. For legacy GGML format support, use the pre-gguf tag.

Architecture

The library bridges Go and C++ using CGO, keeping computational work in the optimised llama.cpp C++ code whilst providing a clean Go API. This approach minimises CGO overhead and maximises performance.

Key components:

  • wrapper.cpp/wrapper.h - CGO interface to llama.cpp with comprehensive error handling
  • model.go - Main Go API with functional options pattern
  • llama.cpp/ - Git submodule containing upstream llama.cpp library

The design uses modern Go patterns including functional options for configuration, proper resource management with finalizers, and streaming callbacks using cgo.Handle for safe Go-C interaction.
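
As an illustration, streaming from the Go side is a single callback that returns false to stop early. This sketch assumes a model already loaded as in the basic usage example; the prompt and token limit are placeholders.

// Print tokens as they arrive; returning false from the callback would
// stop generation early.
err := model.GenerateStream("Tell me a story", func(token string) bool {
    fmt.Print(token)
    return true
}, llama.WithMaxTokens(200))
if err != nil {
    panic(err)
}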

Future goals

This fork aims to provide a stable, well-tested Go interface to llama.cpp that keeps pace with upstream developments. Key objectives:

Short term (current release cycle)
  • Comprehensive testing: Automated test suite covering all functionality including streaming, speculative sampling, and embeddings
  • CUDA testing: Automated CI testing against GPU-enabled builds
  • API stability: Ensure backwards compatibility as the library matures
Medium term (next few releases)
  • Performance optimisation: Profile and optimise the Go-C boundary for minimal overhead
  • Model compatibility: Test against diverse model formats and architectures
  • Platform support: Verify builds across different architectures and operating systems
Long term (ongoing)
  • Upstream tracking: Automated monitoring and testing against new llama.cpp releases
  • Go ecosystem integration: Better integration with Go HTTP servers, WebSocket streaming
  • Developer experience: Improved error messages, debugging tools, and comprehensive examples

The project prioritises reliability and maintainability over rapid feature addition. Each new llama.cpp feature is carefully integrated with proper error handling and Go-friendly APIs.

Contributing

We welcome contributions! For development:

# Format and static analysis
go fmt ./...
go vet ./...

# Run tests (requires model file)
TEST_MODEL="/path/to/test.gguf" make test

# Pre-commit checks
prek run --all-files

See our building guide for development environment setup.

Licence

MIT

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type GenerateOption

type GenerateOption func(*generateConfig)

GenerateOption configures text generation

func WithDebug

func WithDebug() GenerateOption

func WithDraftTokens

func WithDraftTokens(n int) GenerateOption

func WithMaxTokens

func WithMaxTokens(n int) GenerateOption

Generation options

func WithSeed

func WithSeed(seed int) GenerateOption

func WithStopWords

func WithStopWords(words ...string) GenerateOption

func WithTemperature

func WithTemperature(t float32) GenerateOption

func WithTopK

func WithTopK(k int) GenerateOption

func WithTopP

func WithTopP(p float32) GenerateOption
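
These options compose on a single Generate call. The snippet below is illustrative only, assuming a model loaded as in the README example; the prompt and values are placeholders.

// Combine sampling options for one generation call.
response, err := model.Generate("Summarise this article:",
    llama.WithMaxTokens(256),
    llama.WithTemperature(0.7),
    llama.WithTopK(40),
    llama.WithTopP(0.9),
    llama.WithSeed(42),
    llama.WithStopWords("\n\n"),
)
if err != nil {
    panic(err)
}
fmt.Println(response)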

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model represents a loaded LLAMA model with its context

func LoadModel

func LoadModel(path string, opts ...ModelOption) (*Model, error)

LoadModel loads a GGUF model from the specified path

func (*Model) Close

func (m *Model) Close() error

Close frees the model and its associated resources

func (*Model) Generate

func (m *Model) Generate(prompt string, opts ...GenerateOption) (string, error)

Generate generates text from the given prompt

func (*Model) GenerateStream

func (m *Model) GenerateStream(prompt string, callback func(token string) bool, opts ...GenerateOption) error

GenerateStream generates text with streaming output via callback

func (*Model) GenerateWithDraft

func (m *Model) GenerateWithDraft(prompt string, draft *Model, opts ...GenerateOption) (string, error)

GenerateWithDraft performs speculative generation using a draft model
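
An illustrative sketch of speculative generation, assuming a main model is already loaded; the draft model path, prompt, and option values are placeholders.

// Load a smaller draft model alongside the main model; the main model
// verifies the tokens the draft proposes.
draft, err := llama.LoadModel("/path/to/draft.gguf", llama.WithContext(512))
if err != nil {
    panic(err)
}
defer draft.Close()

response, err := model.GenerateWithDraft("Explain CGO in one paragraph", draft,
    llama.WithMaxTokens(200),
    llama.WithDraftTokens(8),
)
if err != nil {
    panic(err)
}
fmt.Println(response)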

func (*Model) GenerateWithDraftStream

func (m *Model) GenerateWithDraftStream(prompt string, draft *Model, callback func(token string) bool, opts ...GenerateOption) error

GenerateWithDraftStream performs speculative generation with streaming output

func (*Model) GetEmbeddings

func (m *Model) GetEmbeddings(text string) ([]float32, error)

GetEmbeddings computes embeddings for the given text
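
An illustrative sketch, assuming embeddings require a model loaded with WithEmbeddings(); the path and input text are placeholders.

// Load a model with embeddings enabled, then compute a vector for a text.
embedder, err := llama.LoadModel("/path/to/model.gguf", llama.WithEmbeddings())
if err != nil {
    panic(err)
}
defer embedder.Close()

vec, err := embedder.GetEmbeddings("The quick brown fox")
if err != nil {
    panic(err)
}
fmt.Printf("embedding dimension: %d\n", len(vec))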

func (*Model) Tokenize

func (m *Model) Tokenize(text string) ([]int32, error)

Tokenize converts text to tokens
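
For example, token counts can be checked before generation; this assumes a loaded model and a placeholder prompt.

// Count tokens, e.g. to stay within the configured context window.
tokens, err := model.Tokenize("How many tokens is this prompt?")
if err != nil {
    panic(err)
}
fmt.Printf("prompt uses %d tokens\n", len(tokens))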

type ModelOption

type ModelOption func(*modelConfig)

ModelOption configures model loading

func WithBatch

func WithBatch(size int) ModelOption

func WithContext

func WithContext(size int) ModelOption

Model loading options

func WithEmbeddings

func WithEmbeddings() ModelOption

func WithF16Memory

func WithF16Memory() ModelOption

func WithGPULayers

func WithGPULayers(n int) ModelOption

func WithMLock

func WithMLock() ModelOption

func WithMMap

func WithMMap(enabled bool) ModelOption

func WithMainGPU

func WithMainGPU(gpu string) ModelOption

func WithTensorSplit

func WithTensorSplit(split string) ModelOption

func WithThreads

func WithThreads(n int) ModelOption
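
These options compose on LoadModel. The snippet below is illustrative only; the path and values are placeholders, and GPU-related options only take effect in an acceleration-enabled build.

// Load a model with context, batching, threading, and GPU offload configured.
model, err := llama.LoadModel("/path/to/model.gguf",
    llama.WithContext(4096),
    llama.WithBatch(512),
    llama.WithThreads(8),
    llama.WithGPULayers(32),
    llama.WithF16Memory(),
    llama.WithMMap(true),
)
if err != nil {
    panic(err)
}
defer model.Close()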

Directories

Path Synopsis
examples
embedding command
simple command
speculative command
streaming command
