llama

package module
v0.0.0-...-4c33416
Published: Oct 5, 2025 License: MIT Imports: 7 Imported by: 0

README

llama-go: Run LLMs locally with Go

Go bindings for llama.cpp, enabling you to run large language models locally with hardware acceleration support. Integrate LLM inference directly into Go applications with a clean, idiomatic API.

This is an actively maintained fork of go-skynet/go-llama.cpp, which hasn't been maintained since October 2023. The goal is to keep Go developers up to date with llama.cpp whilst offering a lighter, more performant alternative to Python-based ML stacks like PyTorch or vLLM.

Note: Historical tags use the original module path github.com/go-skynet/go-llama.cpp. For new development, use github.com/tcpipuk/llama-go.

Releases: This fork's tags follow llama.cpp releases using the format llama.cpp-{tag} (e.g. llama.cpp-b6603), making it easy to track compatibility with the underlying C++ library.

Quick start

# Clone with submodules
git clone --recurse-submodules https://github.com/tcpipuk/llama-go
cd llama-go

# Build the library
make libbinding.a

# Download a test model
wget https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf

# Run an example
export LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD LD_LIBRARY_PATH=$PWD
go run ./examples/simple -m Qwen3-0.6B-Q8_0.gguf -p "Hello world" -n 50

See the getting started guide for detailed instructions, and the examples for more usage patterns.

Basic usage

package main

import (
    "fmt"
    llama "github.com/tcpipuk/llama-go"
)

func main() {
    model, err := llama.LoadModel(
        "/path/to/model.gguf",
        llama.WithF16Memory(),
        llama.WithContext(512),
    )
    if err != nil {
        panic(err)
    }
    defer model.Close()

    response, err := model.Generate("Hello world", llama.WithMaxTokens(50))
    if err != nil {
        panic(err)
    }

    fmt.Println(response)
}

When building and running, set these environment variables so the compiler, linker, and loader can find the compiled library:

export LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD LD_LIBRARY_PATH=$PWD

Thread safety

The library is not thread-safe. For concurrent usage, implement a pool pattern as shown in the API guide.
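
A minimal sketch of one such pool, assuming it's acceptable to load several independent model instances (the path, pool size, and option values below are illustrative): a buffered channel hands out models so no single *llama.Model is ever used by two goroutines at once.

package main

import (
    "fmt"

    llama "github.com/tcpipuk/llama-go"
)

// modelPool guards a fixed set of models behind a channel so that
// each *llama.Model is only ever used by one goroutine at a time.
type modelPool struct {
    models chan *llama.Model
}

func newModelPool(path string, size int) (*modelPool, error) {
    p := &modelPool{models: make(chan *llama.Model, size)}
    for i := 0; i < size; i++ {
        m, err := llama.LoadModel(path, llama.WithContext(512))
        if err != nil {
            return nil, err
        }
        p.models <- m
    }
    return p, nil
}

// generate borrows a model, runs inference, and returns it to the pool.
func (p *modelPool) generate(prompt string) (string, error) {
    m := <-p.models
    defer func() { p.models <- m }()
    return m.Generate(prompt, llama.WithMaxTokens(50))
}

func main() {
    pool, err := newModelPool("/path/to/model.gguf", 2)
    if err != nil {
        panic(err)
    }
    response, err := pool.generate("Hello world")
    if err != nil {
        panic(err)
    }
    fmt.Println(response)
}

Note that each loaded copy consumes memory, and a production pool would also need a shutdown path that calls Close on every model.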

Documentation

Essential guides
  • Getting started - Complete walkthrough from installation to first inference
  • Building guide - Build options, hardware acceleration, and troubleshooting
  • API guide - Detailed usage examples, configuration options, and thread safety

Requirements

  • Go 1.25+
  • Docker (recommended) or C++ compiler with CMake
  • Git with submodule support

The library uses the GGUF model format, which is the current standard for llama.cpp. For legacy GGML format support, use the pre-gguf tag.

Architecture

The library bridges Go and C++ using CGO, keeping the heavy computation in llama.cpp's optimised C++ code whilst providing a clean Go API, so CGO boundary crossings stay infrequent relative to the work done per call.

Key components:

  • wrapper.cpp/wrapper.h - CGO interface to llama.cpp
  • model.go - Go API using functional options pattern
  • llama.cpp/ - Git submodule containing upstream llama.cpp

The design uses functional options for configuration, resource management with finalizers, and streaming callbacks via cgo.Handle for safe Go-C interaction.
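
As an illustration of that options pattern, the shape is roughly as follows. This is a simplified sketch of the general Go idiom, not the library's actual internals; the config fields and default values are assumptions.

package llama

// Sketch of the functional options idiom: each With* constructor
// returns a closure that mutates a private config struct.
type modelConfig struct {
    contextSize int
    gpuLayers   int
}

type ModelOption func(*modelConfig)

func WithContext(size int) ModelOption {
    return func(c *modelConfig) { c.contextSize = size }
}

func WithGPULayers(n int) ModelOption {
    return func(c *modelConfig) { c.gpuLayers = n }
}

type Model struct{ /* wraps the CGO handle */ }

// LoadModel applies defaults first, then each caller-supplied option,
// before handing the final config across the CGO boundary.
func LoadModel(path string, opts ...ModelOption) (*Model, error) {
    cfg := modelConfig{contextSize: 2048} // assumed default
    for _, opt := range opts {
        opt(&cfg)
    }
    // ... wrapper.cpp would consume cfg here via CGO ...
    return &Model{}, nil
}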

Licence

MIT

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type GenerateOption

type GenerateOption func(*generateConfig)

GenerateOption configures text generation

func WithDebug

func WithDebug() GenerateOption

func WithDraftTokens

func WithDraftTokens(n int) GenerateOption

func WithMaxTokens

func WithMaxTokens(n int) GenerateOption

Generation options

func WithSeed

func WithSeed(seed int) GenerateOption

func WithStopWords

func WithStopWords(words ...string) GenerateOption

func WithTemperature

func WithTemperature(t float32) GenerateOption

func WithTopK

func WithTopK(k int) GenerateOption

func WithTopP

func WithTopP(p float32) GenerateOption
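
These sampling options compose in a single call; for example, reusing a loaded model as in the README's basic usage (the values here are arbitrary):

response, err := model.Generate(
    "Write a haiku about Go",
    llama.WithMaxTokens(128),
    llama.WithTemperature(0.7),
    llama.WithTopK(40),
    llama.WithTopP(0.9),
    llama.WithSeed(42),
    llama.WithStopWords("\n\n"),
)
if err != nil {
    panic(err)
}
fmt.Println(response)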

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model represents a loaded LLAMA model with its context pool

func LoadModel

func LoadModel(path string, opts ...ModelOption) (*Model, error)

LoadModel loads a GGUF model from the specified path

func (*Model) Close

func (m *Model) Close() error

Close frees the model and its associated resources

func (*Model) Generate

func (m *Model) Generate(prompt string, opts ...GenerateOption) (string, error)

Generate generates text from the given prompt

func (*Model) GenerateStream

func (m *Model) GenerateStream(prompt string, callback func(token string) bool, opts ...GenerateOption) error

GenerateStream generates text with streaming output via callback
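
For example, printing each token as it arrives, with model a loaded *Model as in the basic usage example. The callback's bool return plausibly signals whether to continue; that reading of the signature is an assumption.

err := model.GenerateStream(
    "Tell me a story",
    func(token string) bool {
        fmt.Print(token) // stream each token to stdout as it arrives
        return true      // assumption: returning false stops generation
    },
    llama.WithMaxTokens(200),
)
if err != nil {
    panic(err)
}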

func (*Model) GenerateWithDraft

func (m *Model) GenerateWithDraft(prompt string, draft *Model, opts ...GenerateOption) (string, error)

GenerateWithDraft performs speculative generation using a draft model

func (*Model) GenerateWithDraftStream

func (m *Model) GenerateWithDraftStream(prompt string, draft *Model, callback func(token string) bool, opts ...GenerateOption) error

GenerateWithDraftStream performs speculative generation with streaming output
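
In speculative generation, a smaller draft model proposes tokens cheaply and the main model verifies them, which can speed up decoding while the main model still decides what is kept. A sketch with placeholder paths; the reading of WithDraftTokens as the number of tokens drafted per verification step is an assumption.

target, err := llama.LoadModel("/path/to/large-model.gguf")
if err != nil {
    panic(err)
}
defer target.Close()

draft, err := llama.LoadModel("/path/to/small-model.gguf")
if err != nil {
    panic(err)
}
defer draft.Close()

response, err := target.GenerateWithDraft(
    "Explain speculative decoding",
    draft,
    llama.WithMaxTokens(100),
    llama.WithDraftTokens(8), // assumed: tokens drafted per step
)
if err != nil {
    panic(err)
}
fmt.Println(response)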

func (*Model) GetEmbeddings

func (m *Model) GetEmbeddings(text string) ([]float32, error)

GetEmbeddings computes embeddings for the given text
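
A sketch, assuming the model must be loaded with WithEmbeddings first (an assumption based on the option's name):

model, err := llama.LoadModel("/path/to/model.gguf", llama.WithEmbeddings())
if err != nil {
    panic(err)
}
defer model.Close()

vec, err := model.GetEmbeddings("The quick brown fox")
if err != nil {
    panic(err)
}
fmt.Printf("embedding has %d dimensions\n", len(vec))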

func (*Model) Tokenize

func (m *Model) Tokenize(text string) ([]int32, error)

Tokenize converts text to tokens
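
Useful, for instance, to check how much of the context window a prompt consumes (again reusing a loaded model):

tokens, err := model.Tokenize("Hello world")
if err != nil {
    panic(err)
}
fmt.Printf("%d tokens: %v\n", len(tokens), tokens)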

type ModelOption

type ModelOption func(*modelConfig)

ModelOption configures model loading

func WithBatch

func WithBatch(size int) ModelOption

func WithContext

func WithContext(size int) ModelOption

Model loading options

func WithEmbeddings

func WithEmbeddings() ModelOption

func WithF16Memory

func WithF16Memory() ModelOption

func WithGPULayers

func WithGPULayers(n int) ModelOption

func WithIdleTimeout

func WithIdleTimeout(d time.Duration) ModelOption

func WithMLock

func WithMLock() ModelOption

func WithMMap

func WithMMap(enabled bool) ModelOption

func WithMainGPU

func WithMainGPU(gpu string) ModelOption

func WithPoolSize

func WithPoolSize(min, max int) ModelOption

func WithTensorSplit

func WithTensorSplit(split string) ModelOption

func WithThreads

func WithThreads(n int) ModelOption
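
As with generation options, loading options compose in one LoadModel call. The values below are illustrative guesses, and the comments mark assumptions about each option's semantics.

model, err := llama.LoadModel(
    "/path/to/model.gguf",
    llama.WithContext(4096),  // context window size in tokens
    llama.WithBatch(512),
    llama.WithThreads(8),
    llama.WithGPULayers(32),  // assumed: layers offloaded to the GPU
    llama.WithPoolSize(1, 4), // assumed: min/max contexts in the pool
    llama.WithMMap(true),
)
if err != nil {
    panic(err)
}
defer model.Close()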

Directories

  • examples
      • embedding (command)
      • simple (command)
      • speculative (command)
      • streaming (command)
