Documentation ¶
Overview ¶
Package llama is the graduated local coding-node runtime: a persistent, workspace-scoped inference session that keeps the KV cache for a stable prefix hot and re-prefills only the changed suffix (the live warm-reuse hot path). It is distinct from the toy fixed-constant `local` provider.
This package defines the backend-neutral session contract. Backend adapters implement it — llama.cpp now (./llamasession), OpenVINO later. Product code talks to Session, never to llama.cpp or OpenVINO concepts. Snapshot/restore (durability, branching, crash recovery) is a separate, later concern; the hot coding loop is EnsurePrefix -> PrefillSuffix -> Decode on a live session.
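The hot loop named above can be sketched as follows. The real Session method signatures are not shown on this page, so the `session` interface, its string-based parameters, and `fakeSession` are hypothetical stand-ins that only illustrate the call order and the warm-prefix idea; they are not the package's API.

```go
package main

import (
	"fmt"
	"strings"
)

// session is a hypothetical, simplified stand-in for the Session contract.
// The method names come from the overview; the signatures are assumptions.
type session interface {
	EnsurePrefix(prefix string) (reused bool)
	PrefillSuffix(suffix string) (tokens int)
	Decode(maxTokens int) string
}

// fakeSession models "resident KV" as the last prefix it was given.
type fakeSession struct {
	resident string
	suffix   string
}

// EnsurePrefix reports true when the prefix is already resident (warm reuse)
// and otherwise replaces the resident prefix (cold prefill).
func (s *fakeSession) EnsurePrefix(prefix string) bool {
	if s.resident == prefix {
		return true
	}
	s.resident = prefix
	return false
}

// PrefillSuffix stores the changed suffix and returns a whitespace-split
// word count as a crude proxy for a token count.
func (s *fakeSession) PrefillSuffix(suffix string) int {
	s.suffix = suffix
	return len(strings.Fields(suffix))
}

// Decode returns a canned reply; a real backend would sample tokens here.
func (s *fakeSession) Decode(maxTokens int) string {
	return fmt.Sprintf("reply to %q (max %d tokens)", s.suffix, maxTokens)
}

// turn runs one iteration of EnsurePrefix -> PrefillSuffix -> Decode.
func turn(s session, prefix, suffix string) (warm bool, out string) {
	warm = s.EnsurePrefix(prefix)
	s.PrefillSuffix(suffix)
	return warm, s.Decode(16)
}

func main() {
	s := &fakeSession{}
	for _, q := range []string{"explain main.go", "now add tests"} {
		warm, out := turn(s, "system prompt + repo map", q)
		fmt.Printf("warm=%v %s\n", warm, out)
	}
}
```

In this toy model the first turn reports a cold prefix and the second a warm one, because the prefix string is unchanged between turns and only the suffix differs.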
Index ¶
- Variables
- func EmbedAvailable() bool
- func NewContextOverflowError(stage string, resident, additional, numCtx int) error
- func NewManifestMismatchError(reason string) error
- func NewUnsupportedFeatureError(feature string) error
- func SessionAvailable() bool
- func SetEmbedFunc(f EmbedFunc)
- func SetSessionFactory(f SessionFactory)
- type Config
- type ContextManifest
- type ContextOverflowError
- type ContextReport
- type DecodeConfig
- type EmbedFunc
- type ManifestMismatchError
- type ManifestSegment
- type PrefixInput
- type PrefixStatus
- type Service
- type Session
- type SessionFactory
- type StreamChunk
- type SuffixInput
- type SuffixStatus
- type TokenizeFunc
- type UnsupportedFeatureError
Constants ¶
This section is empty.
Variables ¶
var (
	// this binary.
	ErrSessionUnavailable = errors.New("llama: session backend unavailable")

	// ErrSessionClosed means the caller used a closed persistent session.
	ErrSessionClosed = errors.New("llama: session closed")

	// ErrContextOverflow means a prefix, suffix, or decode would exceed NumCtx.
	ErrContextOverflow = errors.New("llama: context overflow")

	// ErrUnsupportedFeature marks explicit product-surface gaps such as tools.
	ErrUnsupportedFeature = errors.New("llama: unsupported feature")

	// ErrSessionFatal means the backend marked the session unusable and callers
	// must evict it instead of trying to reuse resident KV.
	ErrSessionFatal = errors.New("llama: session fatal")
)
var ErrManifestMismatch = contextasm.ErrManifestMismatch
ErrManifestMismatch is returned when a prefix/suffix cannot be safely paired with resident KV under the current manifest.
Functions ¶
func EmbedAvailable ¶
func EmbedAvailable() bool
EmbedAvailable reports whether an embedding backend is compiled into this build.
func NewContextOverflowError ¶
func NewContextOverflowError(stage string, resident, additional, numCtx int) error
func NewManifestMismatchError ¶
func NewManifestMismatchError(reason string) error
NewManifestMismatchError builds a manifest-mismatch error with a reason.
func NewUnsupportedFeatureError ¶
func NewUnsupportedFeatureError(feature string) error
func SessionAvailable ¶
func SessionAvailable() bool
SessionAvailable reports whether a session backend is compiled into this build.
func SetEmbedFunc ¶
func SetEmbedFunc(f EmbedFunc)
SetEmbedFunc registers the native embedding backend.
func SetSessionFactory ¶
func SetSessionFactory(f SessionFactory)
SetSessionFactory registers the backend that creates sessions. The llama.cpp adapter (./llamasession) calls this from its init when built with the 'llamanode' tag, so the provider never imports the CGo package directly (no import cycle, default build stays CGo-free).
Types ¶
type ContextManifest ¶
type ContextManifest = contextasm.ContextManifest
The llama backend keys warm KV reuse on the backend-neutral context manifest owned by the runtime (runtime/contextasm, surfaced to the runtime as transport.ContextManifest). These aliases let the llama.cpp session adapter and its tests refer to those types through this package without importing contextasm directly. The manifest is assembled by the runtime and crosses the transport; modeld only fills the backend-resolved token data during prefill.
type ContextOverflowError ¶
type ContextOverflowError struct {
Stage string
ResidentTokens int
AdditionalTokens int
NumCtx int
}
ContextOverflowError carries token counts for an overflow at a specific primitive boundary.
func (*ContextOverflowError) Error ¶
func (e *ContextOverflowError) Error() string
func (*ContextOverflowError) Is ¶
func (e *ContextOverflowError) Is(target error) bool
type ContextReport ¶
type ContextReport = transport.ContextReport
type DecodeConfig ¶
type DecodeConfig = transport.DecodeConfig
type EmbedFunc ¶
type EmbedFunc func(ctx context.Context, modelPath string, cfg Config, input string) ([]float64, error)
EmbedFunc computes a single embedding via the native backend. The llama.cpp adapter registers one from its init when built with the 'llamanode' tag.
type ManifestMismatchError ¶
type ManifestMismatchError = contextasm.ManifestMismatchError
type ManifestSegment ¶
type ManifestSegment = contextasm.ManifestSegment
type PrefixInput ¶
type PrefixInput = transport.PrefixInput
type PrefixStatus ¶
type PrefixStatus = transport.PrefixStatus
type Service ¶ added in v0.32.3
type Service struct{}
Service implements the runtime/transport.Service boundary. It acts as the opener for native llama.cpp backend sessions.
func (*Service) Describe ¶ added in v0.32.3
func (s *Service) Describe(_ context.Context, req transport.OpenSessionRequest) (transport.ModelInfo, error)
Describe reports the model's trained context window read from the GGUF header (no tensor load). The runtime consumes this as the model's capacity; it never reads the GGUF itself.
func (*Service) OpenSession ¶ added in v0.32.3
func (s *Service) OpenSession(ctx context.Context, req transport.OpenSessionRequest) (transport.Session, error)
OpenSession binds a session to the requested model. It rejects a model typed for a different backend (ErrBackendMismatch) before loading, so a GGUF request sent to an openvino-mode daemon — or vice versa — fails at the boundary, not deep in the engine. The model is loaded from req.Path (resolved by the runtime); identity/caching uses req.Digest.
type SessionFactory ¶
SessionFactory creates a backend session for a model with explicit config.
type StreamChunk ¶
type StreamChunk = transport.StreamChunk
type SuffixInput ¶
type SuffixInput = transport.SuffixInput
type SuffixStatus ¶
type SuffixStatus = transport.SuffixStatus
type TokenizeFunc ¶
type TokenizeFunc = contextasm.TokenizeFunc
type UnsupportedFeatureError ¶
type UnsupportedFeatureError struct {
Feature string
}
UnsupportedFeatureError describes a deliberately unsupported surface.
func (*UnsupportedFeatureError) Error ¶
func (e *UnsupportedFeatureError) Error() string
func (*UnsupportedFeatureError) Is ¶
func (e *UnsupportedFeatureError) Is(target error) bool