Documentation
Index
- func ExtractThinkingBlocks(text string) []string
- func HasThinkingTags(text string) bool
- func IsThinkingModelName(modelInfo string) bool
- func IsThinkingTemplate(template string) bool
- func StripThinking(text string) string
- type Llama
- func (l *Llama) Chat(ctx context.Context, req schema.ChatRequest, ...) (result *schema.ChatResponse, err error)
- func (l *Llama) Close() error
- func (l *Llama) Complete(ctx context.Context, req schema.CompletionRequest, ...) (result *schema.CompletionResponse, err error)
- func (l *Llama) DeleteModel(ctx context.Context, name string) (err error)
- func (l *Llama) Detokenize(ctx context.Context, req schema.DetokenizeRequest) (result *schema.DetokenizeResponse, err error)
- func (l *Llama) Embed(ctx context.Context, req schema.EmbedRequest) (result *schema.EmbedResponse, err error)
- func (l *Llama) GPUInfo(ctx context.Context) *schema.GPUInfo
- func (l *Llama) GetModel(ctx context.Context, name string) (result *schema.CachedModel, err error)
- func (l *Llama) ListModels(ctx context.Context) (result []*schema.CachedModel, err error)
- func (l *Llama) LoadModel(ctx context.Context, req schema.LoadModelRequest) (result *schema.CachedModel, err error)
- func (l *Llama) PullModel(ctx context.Context, req schema.PullModelRequest, fn PullCallback) (result *schema.CachedModel, err error)
- func (l *Llama) Tokenize(ctx context.Context, req schema.TokenizeRequest) (result *schema.TokenizeResponse, err error)
- func (l *Llama) UnloadModel(ctx context.Context, name string) (result *schema.CachedModel, err error)
- func (l *Llama) WithContext(ctx context.Context, req schema.ContextRequest, fn TaskFunc) (err error)
- func (l *Llama) WithModel(ctx context.Context, req schema.LoadModelRequest, fn TaskFunc) (err error)
- type Opt
- type PullCallback
- type ReasoningResult
- type Task
- type TaskFunc
Constants
This section is empty.
Variables
This section is empty.
Functions
func ExtractThinkingBlocks
func ExtractThinkingBlocks(text string) []string
ExtractThinkingBlocks returns all thinking blocks found in the text without modifying the original text.
func HasThinkingTags
func HasThinkingTags(text string) bool
HasThinkingTags checks if the text contains any thinking/reasoning tags.
func IsThinkingModelName
func IsThinkingModelName(modelInfo string) bool
IsThinkingModelName checks if a model name/architecture suggests it's a reasoning model.
func IsThinkingTemplate
func IsThinkingTemplate(template string) bool
IsThinkingTemplate checks if a chat template string suggests it's a reasoning template by looking for common thinking-related patterns.
func StripThinking
func StripThinking(text string) string
StripThinking removes all thinking/reasoning tags from text, leaving only the final response content.
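The four helpers compose naturally: detect tags, extract the reasoning, and strip it before display. A minimal runnable sketch; only the import path is a placeholder, since the real module path is not shown on this page.

```go
package main

import (
	"fmt"

	llama "example.com/go-llama" // placeholder import path; use the real module path
)

func main() {
	out := "<think>The user wants a short greeting in French.</think>Bonjour!"

	if llama.HasThinkingTags(out) {
		// ExtractThinkingBlocks returns the reasoning without modifying the input.
		for _, block := range llama.ExtractThinkingBlocks(out) {
			fmt.Println("thinking:", block)
		}
	}

	// StripThinking leaves only the user-facing answer, without the <think> block.
	fmt.Println("answer:", llama.StripThinking(out))
}
```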
Types
type Llama
Llama is a singleton that manages the llama.cpp runtime and models.
func (*Llama) Chat (added in v0.0.5)
func (l *Llama) Chat(ctx context.Context, req schema.ChatRequest, onChunk func(schema.ChatChunk) error) (result *schema.ChatResponse, err error)
Chat generates a response for the given chat messages. If onChunk is provided, it will be called for each generated token. The callback can stop generation early by returning an error.
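A sketch of streaming chat output, assuming an already-initialized *Llama. The import paths and the field names on schema.ChatRequest, schema.Message, and schema.ChatChunk are assumptions, as only the method signature is documented above.

```go
package examples

import (
	"context"
	"fmt"

	llama "example.com/go-llama"         // placeholder import path
	schema "example.com/go-llama/schema" // placeholder import path
)

// streamChat prints each token as it is generated. Field names on the schema
// types are assumptions and may differ in the real package.
func streamChat(ctx context.Context, l *llama.Llama) error {
	req := schema.ChatRequest{
		Model: "qwen2.5-1.5b-instruct", // illustrative model name
		Messages: []schema.Message{
			{Role: "user", Content: "Why is the sky blue?"},
		},
	}
	_, err := l.Chat(ctx, req, func(c schema.ChatChunk) error {
		fmt.Print(c.Content)
		return nil // a non-nil error here stops generation early
	})
	return err
}
```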
func (*Llama) Complete
func (l *Llama) Complete(ctx context.Context, req schema.CompletionRequest, onChunk func(schema.CompletionChunk) error) (result *schema.CompletionResponse, err error)
Complete generates a completion for the given prompt. If onChunk is provided, it will be called for each generated token. The callback can stop generation early by returning an error.
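Because the callback's error stops generation, a sentinel error works as a simple token budget. A sketch under the same placeholder imports; the schema field names, and the assumption that the sentinel is surfaced through Complete's returned error, are not confirmed by this page.

```go
package examples

import (
	"context"
	"errors"

	llama "example.com/go-llama"         // placeholder import path
	schema "example.com/go-llama/schema" // placeholder import path
)

var errStopEarly = errors.New("token budget reached")

// completeCapped stops completion after maxTokens callback invocations.
// Field names on schema.CompletionRequest are assumptions.
func completeCapped(ctx context.Context, l *llama.Llama, maxTokens int) error {
	req := schema.CompletionRequest{
		Model:  "qwen2.5-1.5b-instruct", // illustrative model name
		Prompt: "Once upon a time",
	}
	n := 0
	_, err := l.Complete(ctx, req, func(_ schema.CompletionChunk) error {
		n++
		if n >= maxTokens {
			return errStopEarly // stops generation early
		}
		return nil
	})
	if errors.Is(err, errStopEarly) {
		return nil // the early stop was intentional, not a failure
	}
	return err
}
```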
func (*Llama) DeleteModel
func (l *Llama) DeleteModel(ctx context.Context, name string) (err error)
DeleteModel deletes a model from disk and removes it from the cache if loaded.
func (*Llama) Detokenize
func (l *Llama) Detokenize(ctx context.Context, req schema.DetokenizeRequest) (result *schema.DetokenizeResponse, err error)
Detokenize converts tokens back to text using the specified model. Loads the model if not already cached.
func (*Llama) Embed
func (l *Llama) Embed(ctx context.Context, req schema.EmbedRequest) (result *schema.EmbedResponse, err error)
Embed generates embeddings for one or more texts. Loads the model if not already cached, creates a context with embeddings enabled, and computes embeddings for all input texts in a single batch.
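A sketch of batched embeddings under the same placeholder imports; the Texts and Embeddings field names are assumptions.

```go
package examples

import (
	"context"
	"fmt"

	llama "example.com/go-llama"         // placeholder import path
	schema "example.com/go-llama/schema" // placeholder import path
)

// embedTexts embeds several inputs in one batch. The Texts and Embeddings
// field names are assumptions.
func embedTexts(ctx context.Context, l *llama.Llama) error {
	resp, err := l.Embed(ctx, schema.EmbedRequest{
		Model: "nomic-embed-text-v1.5", // illustrative embedding model
		Texts: []string{"first document", "second document"},
	})
	if err != nil {
		return err
	}
	for i, vec := range resp.Embeddings {
		fmt.Printf("text %d -> %d dimensions\n", i, len(vec))
	}
	return nil
}
```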
func (*Llama) GetModel
func (l *Llama) GetModel(ctx context.Context, name string) (result *schema.CachedModel, err error)
GetModel returns a model by name as a CachedModel. If the model is loaded, returns the cached version with LoadedAt and Handle. If not loaded, returns a CachedModel with zero timestamp and nil Handle.
func (*Llama) ListModels
func (l *Llama) ListModels(ctx context.Context) (result []*schema.CachedModel, err error)
ListModels returns all models in the store as CachedModel structures. Models that are loaded have their LoadedAt timestamp and Handle set. Models that are not loaded have zero timestamp and nil Handle.
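A sketch that uses the documented LoadedAt/Handle convention to report which models are loaded; the Name field and the time.Time type of LoadedAt are assumptions.

```go
package examples

import (
	"context"
	"fmt"

	llama "example.com/go-llama" // placeholder import path
)

// listLoaded prints which cached models are currently loaded: loaded models
// have a non-zero LoadedAt and a non-nil Handle.
func listLoaded(ctx context.Context, l *llama.Llama) error {
	models, err := l.ListModels(ctx)
	if err != nil {
		return err
	}
	for _, m := range models {
		loaded := !m.LoadedAt.IsZero() && m.Handle != nil
		fmt.Printf("%-40s loaded=%v\n", m.Name, loaded) // Name field is an assumption
	}
	return nil
}
```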
func (*Llama) LoadModel
func (l *Llama) LoadModel(ctx context.Context, req schema.LoadModelRequest) (result *schema.CachedModel, err error)
LoadModel loads a model into memory with the given parameters. Returns a CachedModel with the model handle and load timestamp. If the model is already cached, returns the existing cached model.
func (*Llama) PullModel
func (l *Llama) PullModel(ctx context.Context, req schema.PullModelRequest, fn PullCallback) (result *schema.CachedModel, err error)
PullModel downloads a model from the given URL and returns the cached model.
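A sketch of pulling a model by URL. The PullCallback signature is not shown on this page, so the example passes nil and assumes that is accepted; the URL field name is also an assumption.

```go
package examples

import (
	"context"

	llama "example.com/go-llama"         // placeholder import path
	schema "example.com/go-llama/schema" // placeholder import path
)

// pull downloads a model by URL. Passing nil for the PullCallback (skipping
// progress reporting) and the URL field name are both assumptions.
func pull(ctx context.Context, l *llama.Llama) (*schema.CachedModel, error) {
	return l.PullModel(ctx, schema.PullModelRequest{
		URL: "https://example.com/models/model.Q4_K_M.gguf", // illustrative URL
	}, nil)
}
```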
func (*Llama) Tokenize
func (l *Llama) Tokenize(ctx context.Context, req schema.TokenizeRequest) (result *schema.TokenizeResponse, err error)
Tokenize converts text to tokens using the specified model. Loads the model if not already cached.
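A tokenize/detokenize round trip, sketched with assumed request and response field names.

```go
package examples

import (
	"context"
	"fmt"

	llama "example.com/go-llama"         // placeholder import path
	schema "example.com/go-llama/schema" // placeholder import path
)

// roundTrip tokenizes text and converts the tokens back to text. All request
// and response field names here are assumptions.
func roundTrip(ctx context.Context, l *llama.Llama, model, text string) error {
	tok, err := l.Tokenize(ctx, schema.TokenizeRequest{Model: model, Text: text})
	if err != nil {
		return err
	}
	det, err := l.Detokenize(ctx, schema.DetokenizeRequest{Model: model, Tokens: tok.Tokens})
	if err != nil {
		return err
	}
	fmt.Printf("%d tokens -> %q\n", len(tok.Tokens), det.Text)
	return nil
}
```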
func (*Llama) UnloadModel
func (l *Llama) UnloadModel(ctx context.Context, name string) (result *schema.CachedModel, err error)
UnloadModel unloads a model from memory and removes it from the cache. Returns the model (now uncached with zero timestamp) and any error.
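A sketch of the explicit load/unload lifecycle, assuming a Model field on schema.LoadModelRequest.

```go
package examples

import (
	"context"
	"fmt"

	llama "example.com/go-llama"         // placeholder import path
	schema "example.com/go-llama/schema" // placeholder import path
)

// loadThenUnload explicitly loads a model and later frees it. The Model field
// on schema.LoadModelRequest is an assumption.
func loadThenUnload(ctx context.Context, l *llama.Llama, name string) error {
	cached, err := l.LoadModel(ctx, schema.LoadModelRequest{Model: name})
	if err != nil {
		return err
	}
	fmt.Println("loaded at:", cached.LoadedAt)

	// After UnloadModel the returned CachedModel has a zero timestamp.
	uncached, err := l.UnloadModel(ctx, name)
	if err != nil {
		return err
	}
	fmt.Println("still loaded:", !uncached.LoadedAt.IsZero()) // false
	return nil
}
```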
func (*Llama) WithContext
func (l *Llama) WithContext(ctx context.Context, req schema.ContextRequest, fn TaskFunc) (err error)
WithContext loads a model (if not already cached), creates an inference context, and calls the function with a Task containing both. The context is freed after the callback returns, but the model remains loaded. Use this for operations that need a context (e.g., completion, embeddings).
Thread-safety: The callback is responsible for acquiring the model's mutex if needed. Use task.CachedModel().Lock()/Unlock() for operations that are not thread-safe (most llama.cpp operations).
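A sketch of the WithContext pattern with the mutex held, assuming the TaskFunc signature is func(*Task) error and that schema.ContextRequest has a Model field.

```go
package examples

import (
	"context"

	llama "example.com/go-llama"         // placeholder import path
	schema "example.com/go-llama/schema" // placeholder import path
)

// withLockedContext runs work inside a temporary inference context while
// holding the model mutex, as the thread-safety note requires. The TaskFunc
// signature and the Model field on schema.ContextRequest are assumptions.
func withLockedContext(ctx context.Context, l *llama.Llama) error {
	req := schema.ContextRequest{Model: "qwen2.5-1.5b-instruct"} // illustrative
	return l.WithContext(ctx, req, func(t *llama.Task) error {
		m := t.CachedModel()
		m.Lock()
		defer m.Unlock()
		// Context-bound work (completion, embeddings) goes here; the context
		// is freed when this callback returns, but the model stays loaded.
		return nil
	})
}
```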
func (*Llama) WithModel
func (l *Llama) WithModel(ctx context.Context, req schema.LoadModelRequest, fn TaskFunc) (err error)
WithModel loads a model (if not already cached) and calls the function with a Task containing the model. The model remains loaded after the callback returns. Use this for operations that only need the model (e.g., tokenization, metadata access).
Thread-safety: The callback is responsible for acquiring the model's mutex if needed. Use task.CachedModel().Lock()/Unlock() for operations that are not thread-safe (most llama.cpp operations).
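A WithModel sketch that also verifies the model stays loaded after the callback, under the same TaskFunc and field-name assumptions as above.

```go
package examples

import (
	"context"
	"fmt"

	llama "example.com/go-llama"         // placeholder import path
	schema "example.com/go-llama/schema" // placeholder import path
)

// withModelThenCheck does model-only work via WithModel and then confirms the
// model is still cached afterwards. The TaskFunc signature and the Model field
// on schema.LoadModelRequest are assumptions.
func withModelThenCheck(ctx context.Context, l *llama.Llama, name string) error {
	err := l.WithModel(ctx, schema.LoadModelRequest{Model: name}, func(t *llama.Task) error {
		// Model-only work (tokenization, metadata access) goes here.
		return nil
	})
	if err != nil {
		return err
	}
	m, err := l.GetModel(ctx, name)
	if err != nil {
		return err
	}
	fmt.Println("still loaded:", !m.LoadedAt.IsZero()) // true: WithModel keeps the model loaded
	return nil
}
```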
type Opt
type Opt func(*opt) error
func WithTracer
WithTracer sets the OpenTelemetry tracer for distributed tracing of model loading, unloading, and other operations.
type PullCallback
PullCallback defines the callback function signature for progress updates during model downloads.
type ReasoningResult
type ReasoningResult struct {
// Thinking contains the model's reasoning/thinking process
// This is typically hidden from end users
Thinking string
// Content contains the final response after reasoning
// This is what should be shown to users
Content string
// HasThinking indicates whether thinking tags were found
HasThinking bool
}
ReasoningResult contains the parsed output from a reasoning model.
func ParseReasoning
func ParseReasoning(text string) ReasoningResult
ParseReasoning extracts thinking/reasoning blocks from model output. It supports multiple common formats:
- <think>...</think> (DeepSeek R1)
- <reasoning>...</reasoning>
- <scratchpad>...</scratchpad>
- <thought>...</thought>
- <internal>...</internal>
Returns a ReasoningResult with separated thinking and content.
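A minimal sketch of ParseReasoning on DeepSeek-R1-style output; only the import path is a placeholder.

```go
package examples

import (
	"fmt"

	llama "example.com/go-llama" // placeholder import path
)

// printReasoning separates a DeepSeek-R1-style response into the hidden
// reasoning and the user-facing answer.
func printReasoning() {
	out := "<think>2 + 2 is basic arithmetic, so the answer is 4.</think>The answer is 4."

	res := llama.ParseReasoning(out)
	if res.HasThinking {
		fmt.Println("thinking:", res.Thinking) // typically hidden from end users
	}
	fmt.Println("content:", res.Content) // what should be shown to users
}
```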
func ParseReasoningWithTag
func ParseReasoningWithTag(text, tag string) ReasoningResult
ParseReasoningWithTag extracts content from a specific tag pattern. The tag should be the tag name without brackets (e.g., "think", not "<think>").
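And with a custom tag name (again, only the import path is assumed):

```go
package examples

import (
	"fmt"

	llama "example.com/go-llama" // placeholder import path
)

// customTag targets a single tag; note the tag is passed without brackets.
func customTag() {
	out := "<scratchpad>rough working goes here</scratchpad>Final answer."

	res := llama.ParseReasoningWithTag(out, "scratchpad") // "scratchpad", not "<scratchpad>"
	fmt.Println(res.Content)
}
```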
type Task
type Task struct {
// contains filtered or unexported fields
}
Task provides access to a loaded model and optionally a context for inference operations. Tasks are created via WithModel or WithContext and are valid only within the callback function.
func (*Task) CachedModel
func (t *Task) CachedModel() *schema.CachedModel
CachedModel returns the full CachedModel metadata.
Directories

| Path | Synopsis |
|---|---|
| httpclient | Package httpclient provides a typed Go client for consuming the go-llama REST API. |