Documentation ¶
Overview ¶
Package transport defines the contract the modeld daemon implements and the runtime calls: a persistent, manifest-keyed warm-reuse inference session.
The boundary is the backend-neutral session seam already validated on llama.cpp and OpenVINO — EnsurePrefix / PrefillSuffix / Decode. The runtime owns this contract; modeld implements it per backend.
An earlier draft put a lower token-level Evaluate/Generate boundary here. Stress-checking it against the real llama.Session and OpenVINO GenAISession showed both backends sit at this higher, manifest-keyed altitude: OpenVINO GenAI holds the tokenizer and chat template internally and caches a string prefix, so a token-only daemon could not honor its proven prefix reuse. See docs/blueprints/modeld-interface-boundary.md.
Index ¶
Constants ¶
This section is empty.
Variables ¶
var (
	ErrNotOwner        = errors.New("instance is not the local runtime owner")
	ErrStaleFence      = errors.New("stale owner fence token")
	ErrSessionClosed   = errors.New("session is closed")
	ErrContextOverflow = errors.New("exceeded the session context window")
)
Canonical errors expected to cross the boundary.
Functions ¶
This section is empty.
Types ¶
type Config ¶
type Config struct {
	NumCtx               int       // context window in tokens
	NumBatch             int       // prefill batch size
	NumThreads           int       // CPU threads (0 = NumCPU)
	NumGpuLayers         int       // layers offloaded to the GPU (0 = CPU only)
	TensorSplit          []float32 // multi-GPU split
	FlashAttn            bool
	KVCacheType          string // "", "q8_0", "q4_0"
	PromptFormat         string // profile-declared prompt format, e.g. "chatml" or "llama3"
	PromptTemplateDigest string // digest of the declared/rendered prompt template
	DisableBOS           bool
}
Config is the explicit hardware/runtime configuration for a session. Every knob is a tested setting, not a magic default.
type ContextManifest ¶
type ContextManifest = contextasm.ContextManifest
ContextManifest is the shared, backend-neutral cache key: profile, model, tokenizer/template digests, BOS policy, and stable/volatile hashes. Reuse is valid only when the manifest matches; byte equality alone is not enough.
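To illustrate why byte equality alone is not enough, here is a minimal sketch of manifest-keyed reuse. The manifest struct is a hypothetical, reduced stand-in (the real field set lives in contextasm and is not shown on this page), and the digest scheme is an assumption chosen for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// manifest is a hypothetical, reduced stand-in for ContextManifest.
type manifest struct {
	Profile        string
	Model          string
	TemplateDigest string
	DisableBOS     bool
	StableHash     string // hash of the stable prefix bytes
}

// digest folds every field into one cache key, so a change to any of them
// produces a different key even when the stable text is identical.
func (m manifest) digest() string {
	key := fmt.Sprintf("%q|%q|%q|%t|%q",
		m.Profile, m.Model, m.TemplateDigest, m.DisableBOS, m.StableHash)
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])
}

// canReuse reports whether resident KV built under `resident` may serve `want`.
func canReuse(resident, want manifest) bool {
	return resident.digest() == want.digest()
}

func main() {
	a := manifest{Profile: "coder", Model: "m1", TemplateDigest: "t1", StableHash: "abc"}
	b := a
	b.TemplateDigest = "t2" // same stable bytes, different prompt template

	fmt.Println(canReuse(a, a)) // true
	fmt.Println(canReuse(a, b)) // false
}
```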
type ContextReport ¶
type ContextReport struct {
	ResidentTokens  int
	PrefixTokens    int
	NumCtx          int
	AvailableTokens int
	StableByteHash  string
	StableTokenHash string
	ManifestDigest  string
	Manifest        ContextManifest
	Closed          bool
}
ContextReport explains the session's resident context (explain-context).
type DecodeConfig ¶
DecodeConfig controls a single decode pass.
type Fence ¶
type Fence struct {
	OwnerInstanceID string
}
Fence carries the owner identity a client expects to be serving it. It is supplied once, at OpenSession; the returned Session is bound to that owner epoch, so a takeover invalidates the session rather than every method needing a fence. It is a freshness check, not an authentication secret.
type MemoryService ¶
type MemoryService struct {
	// contains filtered or unexported fields
}
MemoryService is an in-process, in-memory Service. It does no real inference: it models the warm-reuse contract so the runtime wrapper can be built and tested against the boundary before any CGO backend exists. Reuse is keyed on the manifest (a changed stable segment OR a changed profile/template/runtime digest invalidates the resident prefix), and token counts are byte-length proxies. See docs/blueprints/modeld-interface-boundary.md.
It is safe for concurrent use.
func NewMemoryService ¶
func NewMemoryService(opts ...Option) *MemoryService
NewMemoryService returns an in-memory Service.
func (*MemoryService) OpenSession ¶
func (m *MemoryService) OpenSession(_ context.Context, req OpenSessionRequest) (Session, error)
OpenSession binds a session to the owner epoch (the fence) and the requested context window.
type OpenSessionRequest ¶
OpenSessionRequest asks the owner to open a session for a model.
type Option ¶
type Option func(*MemoryService)
Option configures a MemoryService.
func WithOwnerFence ¶
WithOwnerFence makes OpenSession return ErrStaleFence for any request whose Fence does not match ownerInstanceID. With no fence configured (the default), the fence is ignored, which keeps the unwired placeholder path simple.
type PrefixInput ¶
type PrefixInput struct {
	Text     string
	Manifest ContextManifest
}
PrefixInput is the stable prefix text plus the manifest that makes reuse valid: tokenizer, template, runtime config, BOS policy, and model identity are part of the cache key, not just the text.
type PrefixStatus ¶
type PrefixStatus struct {
	ReusedTokens    int
	PrefilledTokens int
	DroppedTokens   int
	PrefixTokens    int
	ResidentTokens  int
	AvailableTokens int
	StableByteHash  string
	StableTokenHash string
	ManifestDigest  string
}
PrefixStatus reports what EnsurePrefix reused versus had to (re)compute. ReusedTokens > 0 is a warm hit.
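One way to picture the accounting is a longest-common-prefix comparison with byte length standing in for token count, as MemoryService is described as doing. The sketch below is an illustration under that assumption; in particular, reading DroppedTokens as "resident tokens discarded past the shared prefix" is a guess, and a real backend may reuse at a different granularity.

```go
package main

import "fmt"

// prefixStatus is a reduced stand-in for PrefixStatus.
type prefixStatus struct {
	ReusedTokens    int
	PrefilledTokens int
	DroppedTokens   int // assumed meaning: resident tokens discarded
	PrefixTokens    int
}

// commonPrefixLen returns the length in bytes of the longest shared prefix.
func commonPrefixLen(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// ensurePrefix computes the accounting for replacing `resident` with `want`,
// using byte length as a token-count proxy. A manifest mismatch forces a cold
// prefill even when the text overlaps.
func ensurePrefix(resident, want string, sameManifest bool) prefixStatus {
	reused := 0
	if sameManifest {
		reused = commonPrefixLen(resident, want)
	}
	return prefixStatus{
		ReusedTokens:    reused,
		PrefilledTokens: len(want) - reused,
		DroppedTokens:   len(resident) - reused,
		PrefixTokens:    len(want),
	}
}

func main() {
	warm := ensurePrefix("system+repo-v1", "system+repo-v2", true)
	cold := ensurePrefix("system+repo-v1", "system+repo-v2", false)
	fmt.Printf("warm: %+v\n", warm) // ReusedTokens:13 PrefilledTokens:1
	fmt.Printf("cold: %+v\n", cold) // ReusedTokens:0 PrefilledTokens:14
}
```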
type Service ¶
type Service interface {
	OpenSession(ctx context.Context, req OpenSessionRequest) (Session, error)
}
Service is the entry point modeld serves: it opens persistent sessions on the owned hardware. Opening is where the model is made resident and the session is bound to the owner epoch.
type Session ¶
type Session interface {
	// EnsurePrefix makes the resident KV equal `prefix`, reusing the longest
	// already-resident matching prefix and prefilling only the divergent tail
	// (this also drops any previous suffix and generated tokens).
	EnsurePrefix(ctx context.Context, prefix PrefixInput) (PrefixStatus, error)
	// PrefillSuffix prefills the volatile suffix (diff / test output / user
	// turn) after the stable prefix, leaving the stable KV untouched.
	PrefillSuffix(ctx context.Context, suffix SuffixInput) (SuffixStatus, error)
	// Decode streams generated text from the current resident state.
	Decode(ctx context.Context, cfg DecodeConfig) (<-chan StreamChunk, error)
	// ExplainContext reports the resident context for observability.
	ExplainContext() ContextReport
	// Close releases the session's resources.
	Close() error
}
Session is a persistent, workspace-scoped inference session. The hot coding loop is EnsurePrefix -> PrefillSuffix -> Decode: keep the stable prefix's KV hot, re-prefill only the changed suffix, decode.
type StreamChunk ¶
StreamChunk is a decoded text delta or a terminal error.
type SuffixInput ¶
type SuffixInput struct {
	Text     string
	Manifest ContextManifest
}
SuffixInput is the volatile text appended after the stable prefix. It carries the same manifest so a suffix cannot be prefilled against resident KV from a different profile/template/runtime.