llama

package
v0.32.5
Published: Jun 19, 2026 License: Apache-2.0 Imports: 13 Imported by: 0

Documentation

Overview

Package llama is the graduated local coding-node runtime. It provides a persistent, workspace-scoped inference session that keeps the KV cache of a stable prompt prefix resident and re-prefills only the suffix that changed (the live warm-reuse hot path). It is distinct from the toy fixed-constant `local` provider.

This package defines the backend-neutral session contract. Backend adapters implement it: llama.cpp today (./llamasession), OpenVINO later. Product code talks to Session, never to llama.cpp or OpenVINO concepts. Snapshot/restore (durability, branching, crash recovery) is a separate, later concern; the hot coding loop is EnsurePrefix -> PrefillSuffix -> Decode on a live session.
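The warm-reuse idea can be illustrated with a toy model. The sketch below is not this package's API: `toySession`, its `resident` field, and the `prefill` method are hypothetical stand-ins, and a plain token slice stands in for resident KV. It shows only the bookkeeping, assuming reuse is decided by the longest shared token prefix.

```go
package main

import "fmt"

// toySession is a hypothetical stand-in for a live session. The resident
// slice models the tokens whose KV is already computed.
type toySession struct {
	resident []int
}

// commonPrefixLen returns how many leading tokens a and b share.
func commonPrefixLen(a, b []int) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// prefill makes tokens resident and reports how many tokens were reused
// from the resident state and how many had to be recomputed.
func (s *toySession) prefill(tokens []int) (reused, recomputed int) {
	reused = commonPrefixLen(s.resident, tokens)
	recomputed = len(tokens) - reused
	// Drop everything past the shared prefix, then append the new suffix.
	s.resident = append(s.resident[:reused], tokens[reused:]...)
	return reused, recomputed
}

func main() {
	s := &toySession{}
	fmt.Println(s.prefill([]int{1, 2, 3, 4}))     // cold start: nothing to reuse
	fmt.Println(s.prefill([]int{1, 2, 3, 9, 10})) // warm: only the changed suffix is recomputed
}
```

In a real backend the cost being saved is the forward pass over the shared prefix; the slice operations here only mirror which tokens would be kept and which would be re-prefilled.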

Index

Constants

This section is empty.

Variables

var (
	// ErrSessionUnavailable means no native llama backend was compiled into
	// this binary.
	ErrSessionUnavailable = errors.New("llama: session backend unavailable")
	// ErrSessionClosed means the caller used a closed persistent session.
	ErrSessionClosed = errors.New("llama: session closed")
	// ErrContextOverflow means a prefix, suffix, or decode would exceed NumCtx.
	ErrContextOverflow = errors.New("llama: context overflow")
	// ErrUnsupportedFeature marks explicit product-surface gaps such as tools.
	ErrUnsupportedFeature = errors.New("llama: unsupported feature")
	// ErrSessionFatal means the backend marked the session unusable and callers
	// must evict it instead of trying to reuse resident KV.
	ErrSessionFatal = errors.New("llama: session fatal")
)
var ErrManifestMismatch = contextasm.ErrManifestMismatch

ErrManifestMismatch is returned when a prefix/suffix cannot be safely paired with resident KV under the current manifest.
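Callers would typically test for these sentinels with `errors.Is`, which also matches wrapped errors. The sketch below redeclares two of the sentinels locally so it compiles on its own; `shouldEvict` is a hypothetical helper, and treating a closed session the same as a fatal one is an assumption of this sketch rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of two sentinels from the var block above.
var (
	ErrSessionClosed = errors.New("llama: session closed")
	ErrSessionFatal  = errors.New("llama: session fatal")
)

// shouldEvict is a hypothetical helper: it reports whether a cached session
// must be dropped instead of reused after err. Evicting on ErrSessionClosed
// is an assumption; the documentation only requires it for ErrSessionFatal.
func shouldEvict(err error) bool {
	return errors.Is(err, ErrSessionFatal) || errors.Is(err, ErrSessionClosed)
}

func main() {
	wrapped := fmt.Errorf("decode step: %w", ErrSessionFatal)
	fmt.Println(shouldEvict(wrapped))                 // wrapped fatal error
	fmt.Println(shouldEvict(errors.New("transient"))) // unrelated error
}
```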

Functions

func EmbedAvailable

func EmbedAvailable() bool

EmbedAvailable reports whether an embedding backend is compiled into this build.

func NewContextOverflowError

func NewContextOverflowError(stage string, resident, additional, numCtx int) error

func NewManifestMismatchError

func NewManifestMismatchError(reason string) error

NewManifestMismatchError builds a manifest-mismatch error with a reason.

func NewUnsupportedFeatureError

func NewUnsupportedFeatureError(feature string) error

func RuntimeInfo added in v0.32.5

func RuntimeInfo() transport.ModelInfo

RuntimeInfo reports the linked llama.cpp runtime identity and device inventory. In non-direct builds this returns an empty record.

func SessionAvailable

func SessionAvailable() bool

SessionAvailable reports whether a session backend is compiled into this build.

func SetEmbedFunc

func SetEmbedFunc(f EmbedFunc)

SetEmbedFunc registers the native embedding backend.

func SetSessionFactory

func SetSessionFactory(f SessionFactory)

SetSessionFactory registers the backend that creates sessions. The llama.cpp adapter (./llamasession) calls this from its init when built with the 'llamanode' tag, so the provider never imports the CGo package directly (no import cycle, default build stays CGo-free).
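The registration pattern described above can be sketched in a single file. Everything here is a simplified assumption: `Session` and `Config` are placeholders for the real transport types, `openSession` and `fakeSession` are hypothetical, and the package-level variable is unsynchronized, which the real package may handle differently.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrSessionUnavailable = errors.New("llama: session backend unavailable")

// Session and Config are placeholders for the real transport types.
type Session interface{ Close() error }
type Config struct{ NumCtx int }

type SessionFactory func(modelPath string, cfg Config) (Session, error)

// sessionFactory stays nil until a backend registers itself.
var sessionFactory SessionFactory

func SetSessionFactory(f SessionFactory) { sessionFactory = f }
func SessionAvailable() bool             { return sessionFactory != nil }

// openSession is a hypothetical caller-side helper that fails cleanly when
// no backend was compiled in.
func openSession(path string, cfg Config) (Session, error) {
	if sessionFactory == nil {
		return nil, ErrSessionUnavailable
	}
	return sessionFactory(path, cfg)
}

// fakeSession stands in for a native session.
type fakeSession struct{}

func (fakeSession) Close() error { return nil }

func main() {
	_, err := openSession("model.gguf", Config{NumCtx: 4096})
	fmt.Println(SessionAvailable(), err)

	// A build-tagged adapter package would do this from its init function.
	SetSessionFactory(func(string, Config) (Session, error) { return fakeSession{}, nil })
	_, err = openSession("model.gguf", Config{NumCtx: 4096})
	fmt.Println(SessionAvailable(), err)
}
```

Because the adapter registers itself rather than being imported, the provider side compiles without CGo and discovers the backend only at run time through `SessionAvailable`.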

Types

type Config

type Config = transport.Config

type ContextManifest

type ContextManifest = contextasm.ContextManifest

The llama backend keys warm KV reuse on the backend-neutral context manifest owned by the runtime (runtime/contextasm, surfaced to the runtime as transport.ContextManifest). These aliases let the llama.cpp session adapter and its tests refer to those types through this package without importing contextasm directly. The manifest is assembled by the runtime and crosses the transport; modeld only fills the backend-resolved token data during prefill.

type ContextOverflowError

type ContextOverflowError struct {
	Stage            string
	ResidentTokens   int
	AdditionalTokens int
	NumCtx           int
}

ContextOverflowError carries token counts for an overflow at a specific primitive boundary.

func (*ContextOverflowError) Error

func (e *ContextOverflowError) Error() string

func (*ContextOverflowError) Is

func (e *ContextOverflowError) Is(target error) bool
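A typed error with an `Is` method lets callers match the sentinel with `errors.Is` and still recover the token counts with `errors.As`. The sketch below mirrors the documented struct fields; the message format, the body of `Is`, and the `overflowBy` helper are assumptions, not the package's actual implementation.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrContextOverflow = errors.New("llama: context overflow")

// ContextOverflowError mirrors the documented struct fields.
type ContextOverflowError struct {
	Stage            string
	ResidentTokens   int
	AdditionalTokens int
	NumCtx           int
}

// Error formats the counts; the exact message is an assumption.
func (e *ContextOverflowError) Error() string {
	return fmt.Sprintf("llama: context overflow at %s: %d resident + %d additional > %d",
		e.Stage, e.ResidentTokens, e.AdditionalTokens, e.NumCtx)
}

// Is makes errors.Is(err, ErrContextOverflow) succeed (assumed behavior).
func (e *ContextOverflowError) Is(target error) bool {
	return target == ErrContextOverflow
}

// overflowBy is a hypothetical helper: it returns how many tokens over
// budget err reports, or 0 if err is not a context overflow.
func overflowBy(err error) int {
	var oe *ContextOverflowError
	if !errors.As(err, &oe) {
		return 0
	}
	return oe.ResidentTokens + oe.AdditionalTokens - oe.NumCtx
}

func main() {
	err := fmt.Errorf("prefill: %w", &ContextOverflowError{
		Stage: "suffix", ResidentTokens: 4000, AdditionalTokens: 200, NumCtx: 4096,
	})
	fmt.Println(errors.Is(err, ErrContextOverflow))
	fmt.Println(overflowBy(err))
}
```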

type ContextReport

type ContextReport = transport.ContextReport

type DecodeConfig

type DecodeConfig = transport.DecodeConfig

type EmbedFunc

type EmbedFunc func(ctx context.Context, modelPath string, cfg Config, input string) ([]float64, error)

EmbedFunc computes a single embedding via the native backend. The llama.cpp adapter registers one from its init when built with the 'llamanode' tag.

type ManifestMismatchError

type ManifestMismatchError = contextasm.ManifestMismatchError

type ManifestSegment

type ManifestSegment = contextasm.ManifestSegment

type PrefixInput

type PrefixInput = transport.PrefixInput

type PrefixStatus

type PrefixStatus = transport.PrefixStatus

type Service added in v0.32.3

type Service struct {
	// contains filtered or unexported fields
}

Service implements the runtime/transport.Service boundary. It acts as the opener for native llama.cpp backend sessions.

func NewService added in v0.32.5

func NewService(opts ...ServiceOption) *Service

func (*Service) Describe added in v0.32.3

Describe reports the model's trained context window read from the GGUF header (no tensor load). The runtime consumes this as the model's capacity; it never reads the GGUF itself.

func (*Service) Embed added in v0.32.5

Embed is not served through the modeld transport for llama yet. The runtime's llama provider still owns its native one-shot embedding path separately.

func (*Service) OpenSession added in v0.32.3

OpenSession binds a session to the requested model. It rejects a model typed for a different backend (ErrBackendMismatch) before loading, so a GGUF request sent to an openvino-mode daemon — or vice versa — fails at the boundary, not deep in the engine. The model is loaded from req.Path (resolved by the runtime); identity/caching uses req.Digest.
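The boundary check described above amounts to comparing the request's backend type with the daemon's mode before any model file is touched. The following is a minimal sketch under stated assumptions: `openRequest`, its `Backend` field, and `checkBackend` are hypothetical, and the local `ErrBackendMismatch` is a placeholder for the error named in the text.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBackendMismatch is named in the text above; this local copy and its
// message are placeholders.
var ErrBackendMismatch = errors.New("backend mismatch")

// openRequest is hypothetical. Path and Digest are mentioned in the text;
// the Backend field is an assumption of this sketch.
type openRequest struct {
	Backend string
	Path    string
	Digest  string
}

// checkBackend rejects a request typed for another backend before any
// loading happens.
func checkBackend(serving string, req openRequest) error {
	if req.Backend != serving {
		return fmt.Errorf("%w: daemon serves %q, request wants %q",
			ErrBackendMismatch, serving, req.Backend)
	}
	return nil
}

func main() {
	req := openRequest{Backend: "openvino", Path: "/models/example.gguf", Digest: "sha256:example"}
	err := checkBackend("llama", req)
	fmt.Println(errors.Is(err, ErrBackendMismatch))
}
```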

type ServiceOption added in v0.32.5

type ServiceOption func(*Service)

func WithCapacityPolicy added in v0.32.5

func WithCapacityPolicy(p capacity.Policy) ServiceOption

func WithMemorySource added in v0.32.5

func WithMemorySource(src capacity.MemorySource) ServiceOption
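`NewService` follows Go's functional-options pattern: each `ServiceOption` is a function that mutates the `Service` under construction. The sketch below shows the shape of that pattern only. `Policy` and `MemorySource` are placeholders for the capacity package's types, and the struct fields and default value are assumptions.

```go
package main

import "fmt"

// Policy and MemorySource are placeholders for capacity.Policy and
// capacity.MemorySource, whose real definitions are not shown on this page.
type Policy struct{ MaxSessions int }
type MemorySource func() (freeBytes uint64)

// Service holds the configured values; the field names are assumptions.
type Service struct {
	policy Policy
	mem    MemorySource
}

type ServiceOption func(*Service)

func WithCapacityPolicy(p Policy) ServiceOption {
	return func(s *Service) { s.policy = p }
}

func WithMemorySource(src MemorySource) ServiceOption {
	return func(s *Service) { s.mem = src }
}

// NewService applies options in order over an assumed default policy.
func NewService(opts ...ServiceOption) *Service {
	s := &Service{policy: Policy{MaxSessions: 1}}
	for _, opt := range opts {
		opt(s)
	}
	return s
}

func main() {
	svc := NewService(
		WithCapacityPolicy(Policy{MaxSessions: 4}),
		WithMemorySource(func() uint64 { return 8 << 30 }),
	)
	fmt.Println(svc.policy.MaxSessions, svc.mem())
}
```

The pattern keeps `NewService()` callable with no arguments while letting tests inject a fake memory source.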

type Session

type Session = transport.Session

type SessionFactory

type SessionFactory func(modelPath string, cfg Config) (Session, error)

SessionFactory creates a backend session for a model with explicit config.

type SessionSnapshot added in v0.32.5

type SessionSnapshot = transport.SessionSnapshot

type StreamChunk

type StreamChunk = transport.StreamChunk

type SuffixInput

type SuffixInput = transport.SuffixInput

type SuffixStatus

type SuffixStatus = transport.SuffixStatus

type TokenizeFunc

type TokenizeFunc = contextasm.TokenizeFunc

type ToolCall added in v0.32.5

type ToolCall = transport.ToolCall

type UnsupportedFeatureError

type UnsupportedFeatureError struct {
	Feature string
}

UnsupportedFeatureError describes a deliberately unsupported surface.

func (*UnsupportedFeatureError) Error

func (e *UnsupportedFeatureError) Error() string

func (*UnsupportedFeatureError) Is

func (e *UnsupportedFeatureError) Is(target error) bool

Directories

Path Synopsis
Package llamacppshim owns the direct llama.cpp C API boundary for modeld.
