transport

package
v0.32.1 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 17, 2026 License: Apache-2.0 Imports: 4 Imported by: 0

Documentation

Overview

Package transport defines the contract the modeld daemon implements and the runtime calls: a persistent, manifest-keyed warm-reuse inference session.

The boundary is the backend-neutral session seam already validated on llama.cpp and OpenVINO: EnsurePrefix, PrefillSuffix, and Decode. The runtime owns this contract; modeld implements it per backend.

An earlier draft put a lower token-level Evaluate/Generate boundary here. Stress-checking it against the real llama.Session and OpenVINO GenAISession showed both backends sit at this higher, manifest-keyed altitude: OpenVINO GenAI holds the tokenizer and chat template internally and caches a string prefix, so a token-only daemon could not honor its proven prefix reuse. See docs/blueprints/modeld-interface-boundary.md.

Constants

This section is empty.

Variables

var (
	ErrNotOwner        = errors.New("instance is not the local runtime owner")
	ErrStaleFence      = errors.New("stale owner fence token")
	ErrSessionClosed   = errors.New("session is closed")
	ErrContextOverflow = errors.New("exceeded the session context window")
)

Canonical errors expected to cross the boundary.
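Because these are sentinel values, a caller would normally match them with errors.Is so that wrapped errors still classify correctly. The following is a minimal sketch, not the runtime's actual policy: the error variables are local stand-ins mirroring the declarations above, and classify and its return strings are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins mirroring the package's sentinel errors, so the sketch
// compiles without the real module.
var (
	ErrNotOwner        = errors.New("instance is not the local runtime owner")
	ErrStaleFence      = errors.New("stale owner fence token")
	ErrSessionClosed   = errors.New("session is closed")
	ErrContextOverflow = errors.New("exceeded the session context window")
)

// classify is a hypothetical caller-side policy: it maps a boundary error to
// a coarse recovery action. It is not part of the package.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrNotOwner), errors.Is(err, ErrStaleFence):
		return "reopen" // ownership moved: open a session against the current owner
	case errors.Is(err, ErrSessionClosed):
		return "reopen" // this session is gone: a new one is needed
	case errors.Is(err, ErrContextOverflow):
		return "trim" // shrink the prompt before retrying
	default:
		return "fail"
	}
}

func main() {
	// errors.Is sees through %w wrapping, so added context does not hide the sentinel.
	wrapped := fmt.Errorf("prefill suffix: %w", ErrContextOverflow)
	fmt.Println(classify(wrapped))
}
```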

Functions

This section is empty.

Types

type Config

type Config struct {
	NumCtx       int       // context window in tokens
	NumBatch     int       // prefill batch size
	NumThreads   int       // CPU threads (0 = NumCPU)
	NumGpuLayers int       // layers offloaded to the GPU (0 = CPU only)
	TensorSplit  []float32 // multi-GPU split
	FlashAttn    bool
	KVCacheType  string // "", "q8_0", "q4_0"

	PromptFormat         string // profile-declared prompt format, e.g. "chatml" or "llama3"
	PromptTemplateDigest string // digest of the declared/rendered prompt template
	DisableBOS           bool
}

Config is the explicit hardware/runtime configuration for a session. Every knob is a tested setting, not a magic default.
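As an illustration of how the field comments might be honored, the sketch below mirrors Config locally and applies two of the documented rules: NumThreads of 0 resolves to the CPU count, and KVCacheType is limited to the values listed in its comment. The helpers effectiveThreads and validKVCacheType are hypothetical, and the literal values in main are examples only, not recommended settings.

```go
package main

import (
	"fmt"
	"runtime"
)

// Config is a local mirror of the documented struct, so the sketch is
// self-contained.
type Config struct {
	NumCtx       int
	NumBatch     int
	NumThreads   int
	NumGpuLayers int
	TensorSplit  []float32
	FlashAttn    bool
	KVCacheType  string

	PromptFormat         string
	PromptTemplateDigest string
	DisableBOS           bool
}

// effectiveThreads (hypothetical) applies the documented "0 = NumCPU" rule.
func effectiveThreads(c Config) int {
	if c.NumThreads == 0 {
		return runtime.NumCPU()
	}
	return c.NumThreads
}

// validKVCacheType (hypothetical) accepts only the values listed in the
// field comment; the real package may accept others.
func validKVCacheType(s string) bool {
	switch s {
	case "", "q8_0", "q4_0":
		return true
	}
	return false
}

func main() {
	cfg := Config{NumCtx: 8192, NumBatch: 512, KVCacheType: "q8_0", PromptFormat: "chatml"}
	fmt.Println("threads:", effectiveThreads(cfg))
	fmt.Println("kv cache ok:", validKVCacheType(cfg.KVCacheType))
}
```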

type ContextManifest

type ContextManifest = contextasm.ContextManifest

ContextManifest is the shared, backend-neutral cache key: profile, model, tokenizer/template digests, BOS policy, and stable/volatile hashes. Reuse is valid only when the manifest matches; byte equality alone is not enough.
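To make "byte equality alone is not enough" concrete, here is a rough sketch in which two requests carry identical stable text but different template digests. The manifest struct, its field names, and the digest scheme (SHA-256 over the JSON encoding) are all assumptions for illustration; the real field set lives in contextasm and is not shown on this page.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// manifest is a hypothetical stand-in for contextasm.ContextManifest.
type manifest struct {
	Profile         string
	Model           string
	TokenizerDigest string
	TemplateDigest  string
	DisableBOS      bool
	StableHash      string
}

// hashText returns the hex SHA-256 of s.
func hashText(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// digest hashes the whole manifest. Struct fields marshal in declaration
// order, so the encoding is deterministic.
func digest(m manifest) string {
	b, err := json.Marshal(m)
	if err != nil {
		panic(err) // cannot happen for this plain struct
	}
	return hashText(string(b))
}

// canReuse reports whether resident KV built under one manifest may serve a
// request made under another.
func canReuse(resident, requested manifest) bool {
	return digest(resident) == digest(requested)
}

func main() {
	text := "You are a coding assistant."
	a := manifest{Profile: "coder", Model: "m-7b", TokenizerDigest: "tok-1",
		TemplateDigest: "tpl-1", StableHash: hashText(text)}
	b := a
	b.TemplateDigest = "tpl-2" // same text, different prompt template

	fmt.Println(a.StableHash == b.StableHash) // the stable bytes match
	fmt.Println(canReuse(a, a))
	fmt.Println(canReuse(a, b)) // but the manifests do not
}
```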

type ContextReport

type ContextReport struct {
	ResidentTokens  int
	PrefixTokens    int
	NumCtx          int
	AvailableTokens int
	StableByteHash  string
	StableTokenHash string
	ManifestDigest  string
	Manifest        ContextManifest
	Closed          bool
}

ContextReport explains the session's resident context (explain-context).

type DecodeConfig

type DecodeConfig struct {
	MaxTokens   int
	Temperature *float64
	TopP        *float64
	TopK        int
	Seed        *int
}

DecodeConfig controls a single decode pass.
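The pointer-typed fields let a caller distinguish "not set" from an explicit zero, which matters because a temperature of 0 and a seed of 0 are both meaningful values. The sketch below mirrors the struct locally; the ptr and describe helpers are hypothetical, and what a backend does with a nil field is not specified on this page.

```go
package main

import "fmt"

// DecodeConfig is a local mirror of the documented struct.
type DecodeConfig struct {
	MaxTokens   int
	Temperature *float64
	TopP        *float64
	TopK        int
	Seed        *int
}

// ptr (hypothetical helper) returns a pointer to a copy of v, which keeps
// literals for optional fields readable.
func ptr[T any](v T) *T { return &v }

// describe (hypothetical helper) renders which optional knobs were set.
func describe(c DecodeConfig) string {
	temp, seed := "unset", "unset"
	if c.Temperature != nil {
		temp = fmt.Sprintf("%.2f", *c.Temperature)
	}
	if c.Seed != nil {
		seed = fmt.Sprintf("%d", *c.Seed)
	}
	return fmt.Sprintf("max=%d temp=%s seed=%s", c.MaxTokens, temp, seed)
}

func main() {
	// An explicit zero temperature with a fixed seed.
	pinned := DecodeConfig{MaxTokens: 256, Temperature: ptr(0.0), Seed: ptr(42)}
	// Optional fields left nil.
	loose := DecodeConfig{MaxTokens: 256}

	fmt.Println(describe(pinned))
	fmt.Println(describe(loose))
}
```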

type Fence

type Fence struct {
	OwnerInstanceID string
}

Fence carries the owner identity a client expects to be serving it. It is supplied once, at OpenSession; the returned Session is bound to that owner epoch, so a takeover invalidates the session as a whole and individual methods need no fence argument. It is a freshness check, not an authentication secret.

type MemoryService

type MemoryService struct {
	// contains filtered or unexported fields
}

MemoryService is an in-process, in-memory Service. It does no real inference: it models the warm-reuse contract so the runtime wrapper can be built and tested against the boundary before any CGO backend exists. Reuse is keyed on the manifest (a changed stable segment OR a changed profile/template/runtime digest invalidates the resident prefix), and token counts are byte-length proxies. See docs/blueprints/modeld-interface-boundary.md.

It is safe for concurrent use.

func NewMemoryService

func NewMemoryService(opts ...Option) *MemoryService

NewMemoryService returns an in-memory Service.

func (*MemoryService) OpenSession

func (m *MemoryService) OpenSession(_ context.Context, req OpenSessionRequest) (Session, error)

OpenSession binds a session to the owner epoch (the fence) and the requested context window.

type OpenSessionRequest

type OpenSessionRequest struct {
	Fence
	ModelID string
	Config  Config
}

OpenSessionRequest asks the owner to open a session for a model.

type Option

type Option func(*MemoryService)

Option configures a MemoryService.

func WithOwnerFence

func WithOwnerFence(ownerInstanceID string) Option

WithOwnerFence makes OpenSession reject, with ErrStaleFence, any request whose Fence does not match ownerInstanceID. With no fence configured (the default), the fence is ignored, keeping the unwired placeholder path simple.
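The rule described here fits in a few lines. This is a sketch of the described behavior only, not the package's source: checkFence is a hypothetical helper, ErrStaleFence is redeclared locally, and an empty configured ID is assumed to stand for "no fence configured".

```go
package main

import (
	"errors"
	"fmt"
)

// ErrStaleFence is a local stand-in for the package's sentinel error.
var ErrStaleFence = errors.New("stale owner fence token")

// checkFence (hypothetical) applies the documented rule. An empty configured
// ID is assumed to mean no fence is enforced.
func checkFence(configured, requested string) error {
	if configured == "" {
		return nil // default: the fence is ignored
	}
	if configured != requested {
		return ErrStaleFence
	}
	return nil
}

func main() {
	fmt.Println(checkFence("", "anything"))          // no fence configured
	fmt.Println(checkFence("owner-a", "owner-a"))    // fence matches
	fmt.Println(checkFence("owner-b", "owner-a"))    // client still expects the old owner
}
```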

type PrefixInput

type PrefixInput struct {
	Text     string
	Manifest ContextManifest
}

PrefixInput is the stable prefix text plus the manifest that makes reuse valid: tokenizer, template, runtime config, BOS policy, and model identity are part of the cache key, not just the text.

type PrefixStatus

type PrefixStatus struct {
	ReusedTokens    int
	PrefilledTokens int
	DroppedTokens   int
	PrefixTokens    int
	ResidentTokens  int
	AvailableTokens int
	StableByteHash  string
	StableTokenHash string
	ManifestDigest  string
}

PrefixStatus reports what EnsurePrefix reused versus what it had to (re)compute. ReusedTokens > 0 is a warm hit.
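One plausible way these counters relate, using the byte-length token proxy that MemoryService is described as using, is sketched below. The accounting is an assumption, not the package's definition: the reused span is taken as the longest common byte prefix when the manifest matches, the prefilled span as the remainder of the new prefix, and the dropped span as the resident bytes past the shared part. The type and function names are hypothetical.

```go
package main

import "fmt"

// prefixStatus is a trimmed, hypothetical mirror of PrefixStatus.
type prefixStatus struct {
	ReusedTokens    int
	PrefilledTokens int
	DroppedTokens   int
	PrefixTokens    int
}

// commonPrefixLen returns the number of leading bytes a and b share.
func commonPrefixLen(a, b string) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for i < n && a[i] == b[i] {
		i++
	}
	return i
}

// ensurePrefix computes the counters for replacing resident with want.
// Token counts are byte lengths; a manifest mismatch forces a cold prefill.
func ensurePrefix(resident, want string, manifestMatch bool) prefixStatus {
	reused := 0
	if manifestMatch {
		reused = commonPrefixLen(resident, want)
	}
	return prefixStatus{
		ReusedTokens:    reused,
		PrefilledTokens: len(want) - reused,
		DroppedTokens:   len(resident) - reused,
		PrefixTokens:    len(want),
	}
}

func main() {
	warm := ensurePrefix("abcdef", "abcxy", true)
	cold := ensurePrefix("abcdef", "abcxy", false)
	fmt.Printf("warm: %+v\n", warm)
	fmt.Printf("cold: %+v\n", cold)
}
```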

type Service

type Service interface {
	OpenSession(ctx context.Context, req OpenSessionRequest) (Session, error)
}

Service is the entry point modeld serves: it opens persistent sessions on the owned hardware. Opening is where the model is made resident and the session is bound to the owner epoch.

type Session

type Session interface {
	// EnsurePrefix makes the resident KV equal `prefix`, reusing the longest
	// already-resident matching prefix and prefilling only the divergent tail
	// (this also drops any previous suffix and generated tokens).
	EnsurePrefix(ctx context.Context, prefix PrefixInput) (PrefixStatus, error)

	// PrefillSuffix prefills the volatile suffix (diff / test output / user
	// turn) after the stable prefix, leaving the stable KV untouched.
	PrefillSuffix(ctx context.Context, suffix SuffixInput) (SuffixStatus, error)

	// Decode streams generated text from the current resident state.
	Decode(ctx context.Context, cfg DecodeConfig) (<-chan StreamChunk, error)

	// ExplainContext reports the resident context for observability.
	ExplainContext() ContextReport

	// Close releases the session's resources.
	Close() error
}

Session is a persistent, workspace-scoped inference session. The hot coding loop is EnsurePrefix -> PrefillSuffix -> Decode: keep the stable prefix's KV hot, re-prefill only the changed suffix, decode.
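The loop can be sketched against a fake. Everything below is a simplified stand-in rather than the real API: the session interface drops the manifest and status structs in favor of plain strings and ints, echoSession is a toy that only reuses on an exact prefix match, and turn is a hypothetical helper. The point is the call order and the draining of the stream until it closes or yields an error.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// chunk mirrors StreamChunk: a text delta or a terminal error.
type chunk struct {
	Text  string
	Error error
}

// session is a trimmed, hypothetical mirror of the documented Session.
type session interface {
	EnsurePrefix(ctx context.Context, text string) (reused int, err error)
	PrefillSuffix(ctx context.Context, text string) error
	Decode(ctx context.Context, maxTokens int) (<-chan chunk, error)
}

// echoSession is a fake: it "reuses" only an identical prefix and decodes by
// echoing the suffix.
type echoSession struct{ prefix, suffix string }

func (s *echoSession) EnsurePrefix(_ context.Context, text string) (int, error) {
	reused := 0
	if s.prefix == text {
		reused = len(text) // byte-length proxy for tokens
	}
	s.prefix, s.suffix = text, "" // a new prefix drops the previous suffix
	return reused, nil
}

func (s *echoSession) PrefillSuffix(_ context.Context, text string) error {
	s.suffix = text
	return nil
}

func (s *echoSession) Decode(_ context.Context, _ int) (<-chan chunk, error) {
	out := make(chan chunk, 1)
	out <- chunk{Text: "echo: " + s.suffix}
	close(out)
	return out, nil
}

// turn runs one EnsurePrefix -> PrefillSuffix -> Decode pass and collects the text.
func turn(ctx context.Context, s session, prefix, suffix string) (string, int, error) {
	reused, err := s.EnsurePrefix(ctx, prefix)
	if err != nil {
		return "", 0, err
	}
	if err := s.PrefillSuffix(ctx, suffix); err != nil {
		return "", reused, err
	}
	stream, err := s.Decode(ctx, 64)
	if err != nil {
		return "", reused, err
	}
	var sb strings.Builder
	for c := range stream {
		if c.Error != nil {
			return sb.String(), reused, c.Error // terminal error ends the pass
		}
		sb.WriteString(c.Text)
	}
	return sb.String(), reused, nil
}

func main() {
	ctx := context.Background()
	s := &echoSession{}
	// Same stable prefix on both turns: the first is cold, the second warm.
	for _, suffix := range []string{"fix the test", "now add docs"} {
		text, reused, err := turn(ctx, s, "stable system prompt", suffix)
		fmt.Println(text, reused, err)
	}
}
```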

type StreamChunk

type StreamChunk struct {
	Text  string
	Error error
}

StreamChunk is a decoded text delta or a terminal error.

type SuffixInput

type SuffixInput struct {
	Text     string
	Manifest ContextManifest
}

SuffixInput is the volatile text appended after the stable prefix. It carries the same manifest so a suffix cannot be prefilled against resident KV from a different profile/template/runtime.

type SuffixStatus

type SuffixStatus struct {
	SuffixTokens    int
	PrefixTokens    int
	ResidentTokens  int
	AvailableTokens int
	ManifestDigest  string
}

SuffixStatus reports the volatile suffix added after the stable prefix.
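How AvailableTokens and ErrContextOverflow interact is not spelled out on this page. The sketch below assumes the simplest reading: available space is NumCtx minus ResidentTokens, and a suffix that does not fit is rejected with ErrContextOverflow before any state changes. The window type and its methods are hypothetical, and tokens are again counted as bytes.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrContextOverflow is a local stand-in for the package's sentinel error.
var ErrContextOverflow = errors.New("exceeded the session context window")

// window is a hypothetical model of a session's token budget.
type window struct {
	NumCtx   int
	Resident int
}

// available assumes AvailableTokens = NumCtx - ResidentTokens.
func (w window) available() int { return w.NumCtx - w.Resident }

// prefill admits a suffix only if it fits, using byte length as the token count.
func (w *window) prefill(text string) (int, error) {
	n := len(text)
	if n > w.available() {
		return 0, fmt.Errorf("suffix needs %d tokens, %d available: %w",
			n, w.available(), ErrContextOverflow)
	}
	w.Resident += n
	return n, nil
}

func main() {
	w := &window{NumCtx: 16, Resident: 10}

	n, err := w.prefill("short")
	fmt.Println(n, w.available(), err)

	_, err = w.prefill("much too long")
	fmt.Println(errors.Is(err, ErrContextOverflow), w.available())
}
```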
