package llama
v0.32.3 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 18, 2026 License: Apache-2.0 Imports: 10 Imported by: 0

Documentation

Overview

Package llama is the graduated local coding-node runtime: a persistent, workspace-scoped inference session that keeps the KV cache for a stable prefix hot and re-prefills only the changed suffix (the live warm-reuse hot path). It is distinct from the toy fixed-constant `local` provider.

This package defines the backend-neutral session contract. Backend adapters implement it: llama.cpp today (./llamasession) and OpenVINO later. Product code talks to Session, never to llama.cpp or OpenVINO concepts. Snapshot/restore (durability, branching, crash recovery) is a separate, later concern; the hot coding loop is EnsurePrefix -> PrefillSuffix -> Decode on a live session.

Index

Constants

This section is empty.

Variables

var (
	// ErrSessionUnavailable means no native llama backend was compiled into
	// this binary.
	ErrSessionUnavailable = errors.New("llama: session backend unavailable")
	// ErrSessionClosed means the caller used a closed persistent session.
	ErrSessionClosed = errors.New("llama: session closed")
	// ErrContextOverflow means a prefix, suffix, or decode would exceed NumCtx.
	ErrContextOverflow = errors.New("llama: context overflow")
	// ErrUnsupportedFeature marks explicit product-surface gaps such as tools.
	ErrUnsupportedFeature = errors.New("llama: unsupported feature")
	// ErrSessionFatal means the backend marked the session unusable and callers
	// must evict it instead of trying to reuse resident KV.
	ErrSessionFatal = errors.New("llama: session fatal")
)
var ErrManifestMismatch = contextasm.ErrManifestMismatch

ErrManifestMismatch is returned when a prefix/suffix cannot be safely paired with resident KV under the current manifest.

Functions

func EmbedAvailable

func EmbedAvailable() bool

EmbedAvailable reports whether an embedding backend is compiled into this build.

func NewContextOverflowError

func NewContextOverflowError(stage string, resident, additional, numCtx int) error

func NewManifestMismatchError

func NewManifestMismatchError(reason string) error

NewManifestMismatchError builds a manifest-mismatch error with a reason.

func NewUnsupportedFeatureError

func NewUnsupportedFeatureError(feature string) error

func SessionAvailable

func SessionAvailable() bool

SessionAvailable reports whether a session backend is compiled into this build.

func SetEmbedFunc

func SetEmbedFunc(f EmbedFunc)

SetEmbedFunc registers the native embedding backend.

func SetSessionFactory

func SetSessionFactory(f SessionFactory)

SetSessionFactory registers the backend that creates sessions. The llama.cpp adapter (./llamasession) calls this from its init when built with the 'llamanode' tag, so the provider never imports the CGo package directly (no import cycle, default build stays CGo-free).

Types

type Config

type Config = transport.Config

type ContextManifest

type ContextManifest = contextasm.ContextManifest

The llama backend keys warm KV reuse on the backend-neutral context manifest owned by the runtime (runtime/contextasm, surfaced to the runtime as transport.ContextManifest). The aliases ContextManifest, ManifestMismatchError, ManifestSegment, and TokenizeFunc let the llama.cpp session adapter and its tests refer to those types through this package without importing contextasm directly. The runtime assembles the manifest and sends it across the transport; modeld only fills in the backend-resolved token data during prefill.

type ContextOverflowError

type ContextOverflowError struct {
	Stage            string
	ResidentTokens   int
	AdditionalTokens int
	NumCtx           int
}

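ContextOverflowError carries the token counts for an overflow at a specific primitive boundary: Stage identifies the step that overflowed, and ResidentTokens plus AdditionalTokens would exceed NumCtx.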

func (*ContextOverflowError) Error

func (e *ContextOverflowError) Error() string

func (*ContextOverflowError) Is

func (e *ContextOverflowError) Is(target error) bool

type ContextReport

type ContextReport = transport.ContextReport

type DecodeConfig

type DecodeConfig = transport.DecodeConfig

type EmbedFunc

type EmbedFunc func(ctx context.Context, modelPath string, cfg Config, input string) ([]float64, error)

EmbedFunc computes a single embedding via the native backend. The llama.cpp adapter registers one from its init when built with the 'llamanode' tag.

type ManifestMismatchError

type ManifestMismatchError = contextasm.ManifestMismatchError

type ManifestSegment

type ManifestSegment = contextasm.ManifestSegment

type PrefixInput

type PrefixInput = transport.PrefixInput

type PrefixStatus

type PrefixStatus = transport.PrefixStatus

type Service added in v0.32.3

type Service struct{}

Service implements the runtime/transport.Service boundary. It acts as the opener for native llama.cpp backend sessions.

func (*Service) Describe added in v0.32.3

Describe reports the model's trained context window read from the GGUF header (no tensor load). The runtime consumes this as the model's capacity; it never reads the GGUF itself.

func (*Service) OpenSession added in v0.32.3

OpenSession binds a session to the requested model. It rejects a model typed for a different backend (ErrBackendMismatch) before loading, so a GGUF request sent to an openvino-mode daemon, or vice versa, fails at the boundary rather than deep in the engine. The model is loaded from req.Path (resolved by the runtime); identity and caching use req.Digest.

type Session

type Session = transport.Session

type SessionFactory

type SessionFactory func(modelPath string, cfg Config) (Session, error)

SessionFactory creates a backend session for a model with explicit config.

type StreamChunk

type StreamChunk = transport.StreamChunk

type SuffixInput

type SuffixInput = transport.SuffixInput

type SuffixStatus

type SuffixStatus = transport.SuffixStatus

type TokenizeFunc

type TokenizeFunc = contextasm.TokenizeFunc

type UnsupportedFeatureError

type UnsupportedFeatureError struct {
	Feature string
}

UnsupportedFeatureError describes a deliberately unsupported surface.

func (*UnsupportedFeatureError) Error

func (e *UnsupportedFeatureError) Error() string

func (*UnsupportedFeatureError) Is

func (e *UnsupportedFeatureError) Is(target error) bool
