llama

package module
v0.0.0-...-4c33416
Published: Oct 5, 2025 License: MIT Imports: 7 Imported by: 0

README

llama-go: Run LLMs locally with Go

Go bindings for llama.cpp, enabling you to run large language models locally with hardware acceleration support. Integrate LLM inference directly into Go applications with a clean, idiomatic API.

This is an actively maintained fork of go-skynet/go-llama.cpp, which hasn't been maintained since October 2023. The goal is to keep Go developers up to date with llama.cpp whilst offering a lighter, more performant alternative to Python-based ML stacks like PyTorch or vLLM.

Note: Historical tags use the original module path github.com/go-skynet/go-llama.cpp. For new development, use github.com/tcpipuk/llama-go.

Releases: This fork's tags follow llama.cpp releases using the format llama.cpp-{tag} (e.g. llama.cpp-b6603), making it easy to track compatibility with the underlying C++ library.

Quick start

# Clone with submodules
git clone --recurse-submodules https://github.com/tcpipuk/llama-go
cd llama-go

# Build the library
make libbinding.a

# Download a test model
wget https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf

# Run an example
export LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD LD_LIBRARY_PATH=$PWD
go run ./examples/simple -m Qwen3-0.6B-Q8_0.gguf -p "Hello world" -n 50

See the getting started guide for detailed instructions, and the examples for more usage patterns.

Basic usage

package main

import (
    "fmt"
    llama "github.com/tcpipuk/llama-go"
)

func main() {
    model, err := llama.LoadModel(
        "/path/to/model.gguf",
        llama.WithF16Memory(),
        llama.WithContext(512),
    )
    if err != nil {
        panic(err)
    }
    defer model.Close()

    response, err := model.Generate("Hello world", llama.WithMaxTokens(50))
    if err != nil {
        panic(err)
    }

    fmt.Println(response)
}

When building and running, set these environment variables so the compiler, linker, and loader can find the compiled library:

export LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD LD_LIBRARY_PATH=$PWD

Thread safety

The library is not thread-safe. For concurrent usage, implement a pool pattern as shown in the API guide.
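
A minimal sketch of one such pool, assuming it's acceptable to load several independent model instances (the path, pool size, and option values below are illustrative): a buffered channel hands out models so no single *llama.Model is ever used by two goroutines at once.

package main

import (
    "fmt"

    llama "github.com/tcpipuk/llama-go"
)

// modelPool guards a fixed set of models behind a channel so that
// each *llama.Model is only ever used by one goroutine at a time.
type modelPool struct {
    models chan *llama.Model
}

func newModelPool(path string, size int) (*modelPool, error) {
    p := &modelPool{models: make(chan *llama.Model, size)}
    for i := 0; i < size; i++ {
        m, err := llama.LoadModel(path, llama.WithContext(512))
        if err != nil {
            return nil, err
        }
        p.models <- m
    }
    return p, nil
}

// generate borrows a model, runs inference, and returns it to the pool.
func (p *modelPool) generate(prompt string) (string, error) {
    m := <-p.models
    defer func() { p.models <- m }()
    return m.Generate(prompt, llama.WithMaxTokens(50))
}

func main() {
    pool, err := newModelPool("/path/to/model.gguf", 2)
    if err != nil {
        panic(err)
    }
    response, err := pool.generate("Hello world")
    if err != nil {
        panic(err)
    }
    fmt.Println(response)
}

Note that each loaded copy consumes memory, and a production pool would also need a shutdown path that calls Close on every model.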

Documentation

Essential guides
  • Getting started - Complete walkthrough from installation to first inference
  • Building guide - Build options, hardware acceleration, and troubleshooting
  • API guide - Detailed usage examples, configuration options, and thread safety

Requirements

  • Go 1.25+
  • Docker (recommended) or C++ compiler with CMake
  • Git with submodule support

The library uses the GGUF model format, which is the current standard for llama.cpp. For legacy GGML format support, use the pre-gguf tag.

Architecture

The library bridges Go and C++ using CGO, keeping the heavy computation in llama.cpp's optimised C++ code whilst providing a clean Go API, so CGO boundary crossings stay infrequent relative to the work done per call.

Key components:

  • wrapper.cpp/wrapper.h - CGO interface to llama.cpp
  • model.go - Go API using functional options pattern
  • llama.cpp/ - Git submodule containing upstream llama.cpp

The design uses functional options for configuration, resource management with finalizers, and streaming callbacks via cgo.Handle for safe Go-C interaction.
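
As an illustration of that options pattern, the shape is roughly as follows. This is a simplified sketch of the general Go idiom, not the library's actual internals; the config fields and default values are assumptions.

package llama

// Sketch of the functional options idiom: each With* constructor
// returns a closure that mutates a private config struct.
type modelConfig struct {
    contextSize int
    gpuLayers   int
}

type ModelOption func(*modelConfig)

func WithContext(size int) ModelOption {
    return func(c *modelConfig) { c.contextSize = size }
}

func WithGPULayers(n int) ModelOption {
    return func(c *modelConfig) { c.gpuLayers = n }
}

type Model struct{ /* wraps the CGO handle */ }

// LoadModel applies defaults first, then each caller-supplied option,
// before handing the final config across the CGO boundary.
func LoadModel(path string, opts ...ModelOption) (*Model, error) {
    cfg := modelConfig{contextSize: 2048} // assumed default
    for _, opt := range opts {
        opt(&cfg)
    }
    // ... wrapper.cpp would consume cfg here via CGO ...
    return &Model{}, nil
}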

Licence

MIT

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type GenerateOption

type GenerateOption func(*generateConfig)

GenerateOption configures text generation

func WithDebug

func WithDebug() GenerateOption

func WithDraftTokens

func WithDraftTokens(n int) GenerateOption

func WithMaxTokens

func WithMaxTokens(n int) GenerateOption

Generation options

func WithSeed

func WithSeed(seed int) GenerateOption

func WithStopWords

func WithStopWords(words ...string) GenerateOption

func WithTemperature

func WithTemperature(t float32) GenerateOption

func WithTopK

func WithTopK(k int) GenerateOption

func WithTopP

func WithTopP(p float32) GenerateOption
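
These sampling options compose in a single call; for example, reusing a loaded model as in the README's basic usage (the values here are arbitrary):

response, err := model.Generate(
    "Write a haiku about Go",
    llama.WithMaxTokens(128),
    llama.WithTemperature(0.7),
    llama.WithTopK(40),
    llama.WithTopP(0.9),
    llama.WithSeed(42),
    llama.WithStopWords("\n\n"),
)
if err != nil {
    panic(err)
}
fmt.Println(response)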

type Model

type Model struct {
	// contains filtered or unexported fields
}

Model represents a loaded LLAMA model with its context pool

func LoadModel

func LoadModel(path string, opts ...ModelOption) (*Model, error)

LoadModel loads a GGUF model from the specified path

func (*Model) Close

func (m *Model) Close() error

Close frees the model and its associated resources

func (*Model) Generate

func (m *Model) Generate(prompt string, opts ...GenerateOption) (string, error)

Generate generates text from the given prompt

func (*Model) GenerateStream

func (m *Model) GenerateStream(prompt string, callback func(token string) bool, opts ...GenerateOption) error

GenerateStream generates text with streaming output via callback
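
For example, printing each token as it arrives, with model a loaded *Model as in the basic usage example. The callback's bool return plausibly signals whether to continue; that reading of the signature is an assumption.

err := model.GenerateStream(
    "Tell me a story",
    func(token string) bool {
        fmt.Print(token) // stream each token to stdout as it arrives
        return true      // assumption: returning false stops generation
    },
    llama.WithMaxTokens(200),
)
if err != nil {
    panic(err)
}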

func (*Model) GenerateWithDraft

func (m *Model) GenerateWithDraft(prompt string, draft *Model, opts ...GenerateOption) (string, error)

GenerateWithDraft performs speculative generation using a draft model

func (*Model) GenerateWithDraftStream

func (m *Model) GenerateWithDraftStream(prompt string, draft *Model, callback func(token string) bool, opts ...GenerateOption) error

GenerateWithDraftStream performs speculative generation with streaming output
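
In speculative generation, a smaller draft model proposes tokens cheaply and the main model verifies them, which can speed up decoding while the main model still decides what is kept. A sketch with placeholder paths; the reading of WithDraftTokens as the number of tokens drafted per verification step is an assumption.

target, err := llama.LoadModel("/path/to/large-model.gguf")
if err != nil {
    panic(err)
}
defer target.Close()

draft, err := llama.LoadModel("/path/to/small-model.gguf")
if err != nil {
    panic(err)
}
defer draft.Close()

response, err := target.GenerateWithDraft(
    "Explain speculative decoding",
    draft,
    llama.WithMaxTokens(100),
    llama.WithDraftTokens(8), // assumed: tokens drafted per step
)
if err != nil {
    panic(err)
}
fmt.Println(response)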

func (*Model) GetEmbeddings

func (m *Model) GetEmbeddings(text string) ([]float32, error)

GetEmbeddings computes embeddings for the given text
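
A sketch, assuming the model must be loaded with WithEmbeddings first (an assumption based on the option's name):

model, err := llama.LoadModel("/path/to/model.gguf", llama.WithEmbeddings())
if err != nil {
    panic(err)
}
defer model.Close()

vec, err := model.GetEmbeddings("The quick brown fox")
if err != nil {
    panic(err)
}
fmt.Printf("embedding has %d dimensions\n", len(vec))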

func (*Model) Tokenize

func (m *Model) Tokenize(text string) ([]int32, error)

Tokenize converts text to tokens
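
Useful, for instance, to check how much of the context window a prompt consumes (again reusing a loaded model):

tokens, err := model.Tokenize("Hello world")
if err != nil {
    panic(err)
}
fmt.Printf("%d tokens: %v\n", len(tokens), tokens)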

type ModelOption

type ModelOption func(*modelConfig)

ModelOption configures model loading

func WithBatch

func WithBatch(size int) ModelOption

func WithContext

func WithContext(size int) ModelOption

Model loading options

func WithEmbeddings

func WithEmbeddings() ModelOption

func WithF16Memory

func WithF16Memory() ModelOption

func WithGPULayers

func WithGPULayers(n int) ModelOption

func WithIdleTimeout

func WithIdleTimeout(d time.Duration) ModelOption

func WithMLock

func WithMLock() ModelOption

func WithMMap

func WithMMap(enabled bool) ModelOption

func WithMainGPU

func WithMainGPU(gpu string) ModelOption

func WithPoolSize

func WithPoolSize(min, max int) ModelOption

func WithTensorSplit

func WithTensorSplit(split string) ModelOption

func WithThreads

func WithThreads(n int) ModelOption
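
As with generation options, loading options compose in one LoadModel call. The values below are illustrative guesses, and the comments mark assumptions about each option's semantics.

model, err := llama.LoadModel(
    "/path/to/model.gguf",
    llama.WithContext(4096),  // context window size in tokens
    llama.WithBatch(512),
    llama.WithThreads(8),
    llama.WithGPULayers(32),  // assumed: layers offloaded to the GPU
    llama.WithPoolSize(1, 4), // assumed: min/max contexts in the pool
    llama.WithMMap(true),
)
if err != nil {
    panic(err)
}
defer model.Close()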

Directories

  • examples
      • embedding (command)
      • simple (command)
      • speculative (command)
      • streaming (command)
