llamacpp

package
v0.0.11 (not the latest version of its module)
Published: Feb 1, 2026 License: Apache-2.0 Imports: 15 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExtractThinkingBlocks

func ExtractThinkingBlocks(text string) []string

ExtractThinkingBlocks returns all thinking blocks found in the text without modifying the original text.

func HasThinkingTags

func HasThinkingTags(text string) bool

HasThinkingTags checks if the text contains any thinking/reasoning tags.

func IsThinkingModelName

func IsThinkingModelName(modelInfo string) bool

IsThinkingModelName checks if a model name or architecture suggests a reasoning model.

func IsThinkingTemplate

func IsThinkingTemplate(template string) bool

IsThinkingTemplate checks if a chat template string suggests a reasoning template by looking for common thinking-related patterns.

func StripThinking

func StripThinking(text string) string

StripThinking removes all thinking/reasoning tags from the text, leaving only the final response content.
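
Taken together, these helpers cover the usual post-processing path for reasoning output. A minimal sketch; the module import path and the raw text are illustrative, and the later sketches in this section assume the same imports plus "context", "errors", "fmt", the module's schema package, and go.opentelemetry.io/otel where used.

package main

import (
	"fmt"

	// Illustrative import path; the module path is not shown on this page.
	llamacpp "github.com/mutablelogic/go-llama/pkg/llamacpp"
)

func main() {
	// Illustrative raw model output containing a thinking block.
	raw := "<think>Compare both options first.</think>Option B is better."

	if llamacpp.HasThinkingTags(raw) {
		// ExtractThinkingBlocks returns the blocks without modifying raw.
		for _, block := range llamacpp.ExtractThinkingBlocks(raw) {
			fmt.Println("thinking:", block)
		}
	}

	// StripThinking leaves only the final response content.
	fmt.Println(llamacpp.StripThinking(raw))
}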

Types

type Llama

type Llama struct {
	sync.RWMutex

	*store.Store
	// contains filtered or unexported fields
}

Llama is a singleton that manages the llama.cpp runtime and models.

func New

func New(path string, opts ...Opt) (*Llama, error)

New creates or returns the singleton Llama instance.
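
A minimal lifecycle sketch; the models path is illustrative, and GPUInfo is documented below.

func run(ctx context.Context) error {
	l, err := llamacpp.New("/var/lib/models") // path is illustrative
	if err != nil {
		return err
	}
	defer l.Close() // releases the llama.cpp runtime

	// GPUInfo (documented below) reports the backend and devices.
	if info := l.GPUInfo(ctx); info != nil {
		fmt.Printf("gpu: %+v\n", info)
	}
	return nil
}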

func (*Llama) Chat added in v0.0.5

func (l *Llama) Chat(ctx context.Context, req schema.ChatRequest, onChunk func(schema.ChatChunk) error) (result *schema.ChatResponse, err error)

Chat generates a response for the given chat messages. If onChunk is provided, it will be called for each generated token. The callback can stop generation early by returning an error.
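
A streaming sketch. The request is taken pre-built because the schema field names are not shown on this page; the ChatChunk field used for printing (Text) is an assumption.

func streamChat(ctx context.Context, l *llamacpp.Llama, req schema.ChatRequest) (*schema.ChatResponse, error) {
	return l.Chat(ctx, req, func(chunk schema.ChatChunk) error {
		fmt.Print(chunk.Text) // assumed field name; adjust to the schema
		return nil            // a non-nil error here stops generation
	})
}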

func (*Llama) Close

func (l *Llama) Close() error

Close releases all resources and cleans up the llama.cpp runtime.

func (*Llama) Complete

func (l *Llama) Complete(ctx context.Context, req schema.CompletionRequest, onChunk func(schema.CompletionChunk) error) (result *schema.CompletionResponse, err error)

Complete generates a completion for the given prompt. If onChunk is provided, it will be called for each generated token. The callback can stop generation early by returning an error.
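
Because the callback stops generation by returning an error, a caller-side token budget is straightforward. A sketch with a local sentinel error; whether Complete wraps the callback's error is not specified here, so the errors.Is check is defensive.

var errBudget = errors.New("token budget reached")

func completeAtMost(ctx context.Context, l *llamacpp.Llama, req schema.CompletionRequest, budget int) (*schema.CompletionResponse, error) {
	n := 0
	res, err := l.Complete(ctx, req, func(chunk schema.CompletionChunk) error {
		n++
		if n >= budget {
			return errBudget // stops generation early
		}
		return nil
	})
	if errors.Is(err, errBudget) {
		err = nil // early stop was requested, not a failure
	}
	// Whether res holds the partial text after an early stop is not
	// documented here.
	return res, err
}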

func (*Llama) DeleteModel

func (l *Llama) DeleteModel(ctx context.Context, name string) (err error)

DeleteModel deletes a model from disk and removes it from the cache if loaded.

func (*Llama) Detokenize

func (l *Llama) Detokenize(ctx context.Context, req schema.DetokenizeRequest) (result *schema.DetokenizeResponse, err error)

Detokenize converts tokens back to text using the specified model. Loads the model if not already cached.

func (*Llama) Embed

func (l *Llama) Embed(ctx context.Context, req schema.EmbedRequest) (result *schema.EmbedResponse, err error)

Embed generates embeddings for one or more texts. Loads the model if not already cached, creates a context with embeddings enabled, and computes embeddings for all input texts in a single batch.

func (*Llama) GPUInfo

func (l *Llama) GPUInfo(ctx context.Context) *schema.GPUInfo

GPUInfo returns information about available GPU devices and the backend.

func (*Llama) GetModel

func (l *Llama) GetModel(ctx context.Context, name string) (result *schema.CachedModel, err error)

GetModel returns a model by name as a CachedModel. If the model is loaded, returns the cached version with LoadedAt and Handle. If not loaded, returns a CachedModel with zero timestamp and nil Handle.

func (*Llama) ListModels

func (l *Llama) ListModels(ctx context.Context) (result []*schema.CachedModel, err error)

ListModels returns all models in the store as CachedModel structures. Models that are loaded have their LoadedAt timestamp and Handle set. Models that are not loaded have zero timestamp and nil Handle.
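
The zero-timestamp convention makes load state easy to report. A sketch assuming LoadedAt is a time.Time (the docs here only say "zero timestamp").

func reportModels(ctx context.Context, l *llamacpp.Llama) error {
	models, err := l.ListModels(ctx)
	if err != nil {
		return err
	}
	for _, m := range models {
		// LoadedAt is the zero time for models that are not loaded
		// (assuming a time.Time field).
		fmt.Printf("%v loaded=%v\n", m, !m.LoadedAt.IsZero())
	}
	return nil
}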

func (*Llama) LoadModel

func (l *Llama) LoadModel(ctx context.Context, req schema.LoadModelRequest) (result *schema.CachedModel, err error)

LoadModel loads a model into memory with the given parameters. Returns a CachedModel with the model handle and load timestamp. If the model is already cached, returns the existing cached model.
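
A load/unload sketch; the request is taken pre-built since the LoadModelRequest fields are not shown on this page.

func loadThenUnload(ctx context.Context, l *llamacpp.Llama, req schema.LoadModelRequest, name string) error {
	cached, err := l.LoadModel(ctx, req)
	if err != nil {
		return err
	}
	fmt.Println("loaded at:", cached.LoadedAt)

	// UnloadModel evicts the model from the cache; the returned
	// CachedModel has a zero timestamp.
	_, err = l.UnloadModel(ctx, name)
	return err
}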

func (*Llama) PullModel

func (l *Llama) PullModel(ctx context.Context, req schema.PullModelRequest, fn PullCallback) (result *schema.CachedModel, err error)

PullModel downloads a model from the given URL and returns the cached model.
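
A progress-printing sketch. Whether total_bytes can be zero for downloads of unknown size is not specified, so the guard is defensive.

func pullWithProgress(ctx context.Context, l *llamacpp.Llama, req schema.PullModelRequest) (*schema.CachedModel, error) {
	return l.PullModel(ctx, req, func(filename string, received, total uint64) error {
		if total > 0 {
			fmt.Printf("\r%s: %3d%%", filename, received*100/total)
		}
		return nil // a non-nil error would abort the download
	})
}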

func (*Llama) Tokenize

func (l *Llama) Tokenize(ctx context.Context, req schema.TokenizeRequest) (result *schema.TokenizeResponse, err error)

Tokenize converts text to tokens using the specified model. Loads the model if not already cached.

func (*Llama) UnloadModel

func (l *Llama) UnloadModel(ctx context.Context, name string) (result *schema.CachedModel, err error)

UnloadModel unloads a model from memory and removes it from the cache. Returns the model (now uncached with zero timestamp) and any error.

func (*Llama) WithContext

func (l *Llama) WithContext(ctx context.Context, req schema.ContextRequest, fn TaskFunc) (err error)

WithContext loads a model (if not already cached), creates an inference context, and calls the function with a Task containing both. The context is freed after the callback returns, but the model remains loaded. Use this for operations that need a context (e.g., completion, embeddings).

Thread-safety: The callback is responsible for acquiring the model's mutex if needed. Use task.CachedModel().Lock()/Unlock() for operations that are not thread-safe (most llama.cpp operations).
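
A sketch of the locking discipline described above; the callback body stands in for a non-thread-safe llama.cpp operation.

func withLockedContext(ctx context.Context, l *llamacpp.Llama, req schema.ContextRequest) error {
	return l.WithContext(ctx, req, func(ctx context.Context, task *llamacpp.Task) error {
		task.CachedModel().Lock()
		defer task.CachedModel().Unlock()

		// task.Context() is valid only inside this callback; it is
		// freed on return, while the model stays loaded.
		_ = task.Context()
		return nil
	})
}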

func (*Llama) WithModel

func (l *Llama) WithModel(ctx context.Context, req schema.LoadModelRequest, fn TaskFunc) (err error)

WithModel loads a model (if not already cached) and calls the function with a Task containing the model. The model remains loaded after the callback returns. Use this for operations that only need the model (e.g., tokenization, metadata access).

Thread-safety: The callback is responsible for acquiring the model's mutex if needed. Use task.CachedModel().Lock()/Unlock() for operations that are not thread-safe (most llama.cpp operations).
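
The model-only variant follows the same pattern, with Task.Context() returning nil.

func withLockedModel(ctx context.Context, l *llamacpp.Llama, req schema.LoadModelRequest) error {
	return l.WithModel(ctx, req, func(ctx context.Context, task *llamacpp.Task) error {
		task.CachedModel().Lock()
		defer task.CachedModel().Unlock()

		_ = task.Model() // tokenization or metadata access goes here
		return nil       // task.Context() is nil for WithModel tasks
	})
}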

type Opt

type Opt func(*opt) error

func WithTracer

func WithTracer(tracer trace.Tracer) Opt

WithTracer sets the OpenTelemetry tracer for distributed tracing of model loading, unloading, and other operations.
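
A construction sketch assuming the tracer comes from the global OpenTelemetry provider (go.opentelemetry.io/otel).

func newTraced(path string) (*llamacpp.Llama, error) {
	// otel.Tracer obtains a trace.Tracer from the global provider.
	return llamacpp.New(path, llamacpp.WithTracer(otel.Tracer("llamacpp")))
}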

type PullCallback

type PullCallback func(filename string, bytes_received uint64, total_bytes uint64) error

PullCallback defines the callback signature for progress updates during model downloads.

type ReasoningResult

type ReasoningResult struct {
	// Thinking contains the model's reasoning/thinking process
	// This is typically hidden from end users
	Thinking string

	// Content contains the final response after reasoning
	// This is what should be shown to users
	Content string

	// HasThinking indicates whether thinking tags were found
	HasThinking bool
}

ReasoningResult contains the parsed output from a reasoning model.

func ParseReasoning

func ParseReasoning(text string) ReasoningResult

ParseReasoning extracts thinking/reasoning blocks from model output. It supports multiple common formats:

  • <think>...</think> (DeepSeek R1)
  • <reasoning>...</reasoning>
  • <scratchpad>...</scratchpad>
  • <thought>...</thought>
  • <internal>...</internal>

Returns a ReasoningResult with separated thinking and content.
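
A sketch; the raw text is illustrative, and exact whitespace trimming of the extracted fields is not specified here.

func splitReasoning() {
	res := llamacpp.ParseReasoning("<think>Check the edge cases.</think>All cases pass.")
	if res.HasThinking {
		fmt.Println("thinking:", res.Thinking) // hidden from end users
	}
	fmt.Println("content:", res.Content) // shown to users
}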

func ParseReasoningWithTag

func ParseReasoningWithTag(text, tag string) ReasoningResult

ParseReasoningWithTag extracts content from a specific tag pattern. The tag should be the tag name without brackets (e.g., "think", not "<think>").
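
For output that uses one specific tag, pass the bare tag name, e.g. to match only <scratchpad> blocks:

func splitScratchpad(raw string) llamacpp.ReasoningResult {
	// The tag is passed without brackets, per the doc above.
	return llamacpp.ParseReasoningWithTag(raw, "scratchpad")
}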

type Task

type Task struct {
	// contains filtered or unexported fields
}

Task provides access to a loaded model and optionally a context for inference operations. Tasks are created via WithModel or WithContext and are valid only within the callback function.

func (*Task) CachedModel

func (t *Task) CachedModel() *schema.CachedModel

CachedModel returns the full CachedModel metadata.

func (*Task) Context

func (t *Task) Context() *llamacpp.Context

Context returns the underlying llamacpp Context, or nil if this task was created with WithModel (no context).

func (*Task) Model

func (t *Task) Model() *llamacpp.Model

Model returns the underlying llamacpp Model handle.

type TaskFunc

type TaskFunc func(context.Context, *Task) error

TaskFunc is a callback function that receives a context and Task.

Directories

Path	Synopsis
httpclient	Package httpclient provides a typed Go client for consuming the go-llama REST API.
