capacity

package
v0.32.5
Published: Jun 19, 2026 License: Apache-2.0 Imports: 7 Imported by: 0

Documentation

Overview

Package capacity is modeld's hardware capacity planner: it resolves the EFFECTIVE context window a model can actually be served at on this device, from the model's KV-cache footprint and the device's free memory — not the model's trained ceiling alone. modeld owns this because it owns the hardware (see docs/blueprints/modeld-interface-boundary.md and plan-llamacpp.md:16); the runtime consumes the resolved number and never computes it.

Constants

const DefaultHeadroomFrac = 0.1

DefaultHeadroomFrac is the fraction of free memory reserved for activations, the compute graph, and fragmentation; the rest is left for model weights and the KV cache.

const DefaultMaxResidentFrac = 0.8

DefaultMaxResidentFrac is the launch-time cap used when the user did not set a memory ceiling. modeld will not grow past this fraction of the memory that was free when the backend service was created; per-call current free memory can still clamp lower.

Variables

This section is empty.

Functions

func HeadroomFromEnv

func HeadroomFromEnv() float64

HeadroomFromEnv reads CONTENOX_MODELD_MEM_HEADROOM (a fraction in (0,1)), falling back to DefaultHeadroomFrac.
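
The lookup described above can be sketched as follows. This is a minimal illustration, not the package's source: parseHeadroom and headroomFromEnv are hypothetical names, and treating unparsable or out-of-range values as "use the default" is an assumption based on the (0,1) range stated here and on the HeadroomFrac comment in Params.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultHeadroomFrac mirrors the documented DefaultHeadroomFrac constant.
const defaultHeadroomFrac = 0.1

// parseHeadroom is a hypothetical helper: it keeps only fractions strictly
// between 0 and 1 and returns the default for anything else (assumption).
func parseHeadroom(raw string) float64 {
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil || !(v > 0 && v < 1) {
		return defaultHeadroomFrac
	}
	return v
}

// headroomFromEnv sketches the documented read of CONTENOX_MODELD_MEM_HEADROOM.
// An unset variable yields "", which fails to parse and selects the default.
func headroomFromEnv() float64 {
	return parseHeadroom(os.Getenv("CONTENOX_MODELD_MEM_HEADROOM"))
}

func main() {
	fmt.Println(parseHeadroom("0.2")) // in range: used as-is
	fmt.Println(parseHeadroom("1.5")) // out of range: falls back to the default
	fmt.Println(headroomFromEnv())    // depends on the caller's environment
}
```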

func KVBytesPerToken

func KVBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64

KVBytesPerToken is the memory one token of context costs in the KV cache: K and V, across every layer and KV head, at the KV precision.
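
As a worked illustration of that description, the per-token cost is two tensors (K and V) times layers times KV heads times head dimension times bytes per element. The sketch below is not the package's implementation: the precision labels ("f32", "f16", "bf16") and the 16-bit fallback for unknown labels are assumptions, and quantized KV types are left out.

```go
package main

import "fmt"

// bytesPerElem maps a KV precision label to a per-element size.
// The labels and the 16-bit fallback are assumptions for this sketch.
func bytesPerElem(kvType string) int64 {
	switch kvType {
	case "f32":
		return 4
	case "f16", "bf16":
		return 2
	default:
		return 2 // assumption: treat unknown labels as 16-bit
	}
}

// kvBytesPerToken is a hypothetical stand-in for KVBytesPerToken:
// K and V, across every layer and KV head, at the KV precision.
func kvBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64 {
	return 2 * int64(nLayers) * int64(nKVHeads) * int64(headDim) * bytesPerElem(kvType)
}

func main() {
	// 2 * 32 layers * 8 KV heads * 128 dims * 2 bytes = 131072 bytes (128 KiB).
	fmt.Println(kvBytesPerToken(32, 8, 128, "f16"))
}
```

At 128 KiB per token, an 8192-token window would need 1 GiB of KV cache under these assumptions.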

func ParseBytes added in v0.32.5

func ParseBytes(s string) (int64, error)

ParseBytes parses the byte-size strings used by modeld memory settings, such as "8GiB" or "512MiB".
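
A parser of this kind might look like the sketch below. The function name parseBytes, the accepted suffix set (KiB, MiB, GiB, TiB, or a bare byte count), and the rejection of negative values are assumptions; the real ParseBytes may accept more forms. Overflow of very large values is not checked here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBytes is a hypothetical sketch of a binary-suffix byte-size parser.
func parseBytes(s string) (int64, error) {
	s = strings.TrimSpace(s)
	units := []struct {
		suffix string
		mult   int64
	}{
		{"KiB", 1 << 10}, {"MiB", 1 << 20}, {"GiB", 1 << 30}, {"TiB", 1 << 40},
	}
	mult := int64(1) // no suffix: the number is already in bytes
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			mult = u.mult
			s = strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
			break
		}
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parse byte size %q: %w", s, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("byte size must not be negative: %d", n)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"8GiB", "512MiB", "4096", "lots"} {
		n, err := parseBytes(in)
		fmt.Println(in, n, err)
	}
}
```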

Types

type DeviceSnapshot added in v0.32.5

type DeviceSnapshot struct {
	Kind              string `json:"kind,omitempty"`
	DeviceID          string `json:"device_id,omitempty"`
	TotalBytes        int64  `json:"total_bytes,omitempty"`
	FreeBytes         int64  `json:"free_bytes,omitempty"`
	SharedWithDisplay bool   `json:"shared_with_display,omitempty"`
}

DeviceSnapshot describes the memory pool the backend will allocate from.

func Snapshot added in v0.32.5

func Snapshot(src MemorySource) (DeviceSnapshot, error)

Snapshot returns a DeviceSnapshot for src, whether src is a richer source that provides its own Snapshot method or a legacy source that only implements FreeBytes.
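
One common Go way to support both kinds of source is an optional-interface check via type assertion. The sketch below assumes that approach; the interface name snapshotter, the trimmed deviceSnapshot struct, and both example sources are hypothetical, and the package may implement the fallback differently.

```go
package main

import "fmt"

// memorySource mirrors the documented MemorySource interface.
type memorySource interface {
	FreeBytes() (int64, error)
}

// deviceSnapshot is a trimmed, illustrative copy of DeviceSnapshot.
type deviceSnapshot struct {
	Kind       string
	TotalBytes int64
	FreeBytes  int64
}

// snapshotter is a hypothetical name for the richer, optional interface.
type snapshotter interface {
	Snapshot() (deviceSnapshot, error)
}

// snapshot prefers the richer method and falls back to FreeBytes alone.
func snapshot(src memorySource) (deviceSnapshot, error) {
	if s, ok := src.(snapshotter); ok {
		return s.Snapshot()
	}
	free, err := src.FreeBytes()
	if err != nil {
		return deviceSnapshot{}, err
	}
	return deviceSnapshot{FreeBytes: free}, nil
}

// legacySource only knows its free bytes.
type legacySource struct{ free int64 }

func (l legacySource) FreeBytes() (int64, error) { return l.free, nil }

// richSource additionally describes the pool it draws from.
type richSource struct{ legacySource }

func (r richSource) Snapshot() (deviceSnapshot, error) {
	return deviceSnapshot{Kind: "cpu", TotalBytes: 2 * r.free, FreeBytes: r.free}, nil
}

func main() {
	a, _ := snapshot(legacySource{free: 100})
	b, _ := snapshot(richSource{legacySource{free: 100}})
	fmt.Printf("%+v\n%+v\n", a, b)
}
```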

func (DeviceSnapshot) Key added in v0.32.5

func (d DeviceSnapshot) Key() string

Key identifies the memory pool for launch-default budgeting. Kind+ID is the normal path; total/shared are included so anonymous test or fallback sources still get stable separation when possible.

type LaunchDefaults added in v0.32.5

type LaunchDefaults struct {
	// contains filtered or unexported fields
}

LaunchDefaults records the first observed free-memory snapshot per memory pool. It lets services apply the "80% of launch-free memory" default lazily for the actual selected device, while keeping an explicit MaxResidentBytes as a hard user cap.

func (*LaunchDefaults) Policy added in v0.32.5

func (d *LaunchDefaults) Policy(base Policy, st DeviceSnapshot) Policy

Policy returns base with a default MaxResidentBytes filled from the first snapshot seen for this device. It is intentionally sticky per memory pool: if memory later gets tighter, Resolve clamps on current FreeBytes; if memory gets freer, modeld does not grow past the launch budget.
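
The sticky behaviour can be illustrated with a first-write-wins map keyed by memory pool. This is a sketch under assumptions, not the package's code: the types are simplified, the pool key is passed as a plain string instead of being derived from a DeviceSnapshot, and the 80% factor is taken from DefaultMaxResidentFrac and applied with integer arithmetic.

```go
package main

import (
	"fmt"
	"sync"
)

// policy is a trimmed, illustrative copy of Policy.
type policy struct {
	MaxResidentBytes int64
}

// launchDefaults remembers the first free-memory reading per pool.
type launchDefaults struct {
	mu    sync.Mutex
	first map[string]int64 // pool key -> free bytes at first observation
}

// apply returns base with MaxResidentBytes defaulted to 80% of the first
// free-memory reading for this pool. An explicit cap in base is kept.
func (d *launchDefaults) apply(base policy, poolKey string, freeNow int64) policy {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.first == nil {
		d.first = make(map[string]int64)
	}
	launchFree, seen := d.first[poolKey]
	if !seen {
		launchFree = freeNow
		d.first[poolKey] = launchFree // later readings never overwrite this
	}
	if base.MaxResidentBytes > 0 {
		return base // explicit user cap stays a hard cap
	}
	base.MaxResidentBytes = launchFree * 4 / 5 // 0.8 of launch-free memory
	return base
}

func main() {
	var d launchDefaults
	fmt.Println(d.apply(policy{}, "gpu:0", 10<<30).MaxResidentBytes) // first reading sets the budget
	fmt.Println(d.apply(policy{}, "gpu:0", 20<<30).MaxResidentBytes) // freer memory does not raise it
}
```

Tighter memory is not handled here on purpose: per the text above, that clamp happens in Resolve against the current FreeBytes.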

type MemorySource

type MemorySource interface {
	FreeBytes() (int64, error)
}

MemorySource reports the free memory of the device a backend serves on. modeld picks the source by device: system RAM for CPU; GPU VRAM (ov::Core / ggml) is a CGO seam filled per backend when a GPU device is selected.

type ModelCapacity

type ModelCapacity struct {
	ModelMaxContext  int
	EffectiveContext int
	KVBytesPerToken  int64
	FreeBytes        int64
	WeightsBytes     int64
	OverheadBytes    int64
	ReservedBytes    int64
	UserLimitBytes   int64
	MinFreeBytes     int64
	UsableBytes      int64
	RequiredBytes    int64
	Clamped          bool
	Reason           string
}

ModelCapacity is the resolved result reported to the runtime. EffectiveContext is the window modeld will actually serve and the value the cache identity (manifest context_size) must use; the rest explain how it was derived.

func Resolve

func Resolve(p Params) ModelCapacity

Resolve computes the physical hot context window:

usable = min(free - minFree, userLimit - reserved) * (1 - headroom)
effective = clamp(request, 0, min(modelMax, (usable - weights - overhead) / kvBytesPerToken))

Unknown inputs degrade gracefully: with no KV cost it falls back to the model ceiling (clamped by request); with no ceiling it uses the memory budget.
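
The two formulas and the unknown-input rules can be traced with a small self-contained sketch. It is an interpretation of the text above, not the package's implementation: resolve returns only the effective window, the params struct is a local copy, a UserLimitBytes of 0 is read as "no cap" per the Params comments, and clamping negative budgets to zero is an assumption.

```go
package main

import "fmt"

// params is a local, illustrative copy of the documented Params fields.
type params struct {
	ModelMaxCtx                                int
	KVBytesPerToken, WeightsBytes              int64
	OverheadBytes, FreeBytes, ReservedBytes    int64
	UserLimitBytes, MinFreeBytes               int64
	Request                                    int
	HeadroomFrac                               float64
}

// resolve sketches the documented formulas and returns the effective window.
func resolve(p params) int {
	h := p.HeadroomFrac
	if !(h > 0 && h < 1) {
		h = 0.1 // DefaultHeadroomFrac
	}
	budget := p.FreeBytes - p.MinFreeBytes
	if p.UserLimitBytes > 0 { // 0 = no user cap
		if left := p.UserLimitBytes - p.ReservedBytes; left < budget {
			budget = left
		}
	}
	if budget < 0 {
		budget = 0 // assumption: a negative budget means nothing is usable
	}
	usable := int64(float64(budget) * (1 - h))

	limit, known := p.ModelMaxCtx, p.ModelMaxCtx > 0
	if p.KVBytesPerToken > 0 { // memory side of the clamp is available
		byMem := (usable - p.WeightsBytes - p.OverheadBytes) / p.KVBytesPerToken
		if byMem < 0 {
			byMem = 0
		}
		if !known || byMem < int64(limit) {
			limit = int(byMem)
		}
		known = true
	}
	if p.Request > 0 && (!known || p.Request < limit) {
		return p.Request
	}
	return limit
}

func main() {
	p := params{
		ModelMaxCtx: 131072, KVBytesPerToken: 131072,
		WeightsBytes: 4 << 30, OverheadBytes: 512 << 20,
		FreeBytes: 16 << 30, MinFreeBytes: 2 << 30,
	}
	fmt.Println(resolve(p)) // memory-bound: well below the trained ceiling
	p.Request = 8192
	fmt.Println(resolve(p)) // a smaller request is honoured as-is
}
```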

type Params

type Params struct {
	ModelMaxCtx     int     // model's trained context ceiling (0 = unknown)
	KVBytesPerToken int64   // 0 = unknown (cannot budget by memory)
	WeightsBytes    int64   // resident model weight footprint
	OverheadBytes   int64   // fixed runtime buffers (compute graph, staging)
	FreeBytes       int64   // device free memory
	ReservedBytes   int64   // memory already reserved by resident sessions
	UserLimitBytes  int64   // user cap for modeld resident memory (0 = no cap)
	MinFreeBytes    int64   // memory to leave free for the desktop/other workloads
	Request         int     // requested window (0 = use the resolved max)
	HeadroomFrac    float64 // <=0 or >=1 falls back to DefaultHeadroomFrac
}

Params are the inputs to a capacity resolution. Zero values mean "unknown": an unknown ModelMaxCtx or KVBytesPerToken disables that side of the clamp rather than producing a bogus window.

type Policy added in v0.32.5

type Policy struct {
	MaxResidentBytes int64   `json:"max_resident_bytes,omitempty"`
	MinFreeBytes     int64   `json:"min_free_bytes,omitempty"`
	HeadroomFrac     float64 `json:"headroom_frac,omitempty"`
}

Policy is the user/operator memory policy modeld applies before opening a resident session. MaxResidentBytes is a hard ceiling on modeld's resident footprint for the served device; MinFreeBytes preserves memory for the desktop or other local workloads that may share the same device.

func LoadPolicy added in v0.32.5

func LoadPolicy(dataRoot string) Policy

LoadPolicy reads <dataRoot>/modeld.json and then applies env overrides. The JSON accepts either numeric byte fields or string fields ("8GiB", "512MiB"):

{"memory":{"max_resident":"8GiB","reserve_free":"2GiB","headroom_frac":0.15}}
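
Accepting either a number or a string for the same field is typically done with a custom json.Unmarshaler. The sketch below shows one such approach for the example document above. The byteSize type, the config struct, and parseConfig are hypothetical, only the three keys shown in the example are modelled, and the suffix set is an assumption; LoadPolicy's actual decoding and its env overrides are not reproduced.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// byteSize accepts a JSON number (bytes) or a string such as "8GiB".
type byteSize int64

func (b *byteSize) UnmarshalJSON(data []byte) error {
	var n int64
	if err := json.Unmarshal(data, &n); err == nil {
		*b = byteSize(n)
		return nil
	}
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	mult := int64(1)
	for suffix, m := range map[string]int64{"KiB": 1 << 10, "MiB": 1 << 20, "GiB": 1 << 30} {
		if strings.HasSuffix(s, suffix) {
			mult, s = m, strings.TrimSuffix(s, suffix)
			break
		}
	}
	v, err := strconv.ParseInt(strings.TrimSpace(s), 10, 64)
	if err != nil {
		return fmt.Errorf("byte size: %w", err)
	}
	*b = byteSize(v * mult)
	return nil
}

// config models only the keys that appear in the documented example.
type config struct {
	Memory struct {
		MaxResident  byteSize `json:"max_resident"`
		ReserveFree  byteSize `json:"reserve_free"`
		HeadroomFrac float64  `json:"headroom_frac"`
	} `json:"memory"`
}

// parseConfig decodes a modeld.json-style document into config.
func parseConfig(raw string) (config, error) {
	var c config
	err := json.Unmarshal([]byte(raw), &c)
	return c, err
}

func main() {
	c, err := parseConfig(`{"memory":{"max_resident":"8GiB","reserve_free":"2GiB","headroom_frac":0.15}}`)
	fmt.Println(c.Memory.MaxResident, c.Memory.ReserveFree, c.Memory.HeadroomFrac, err)
}
```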

func WithLaunchDefaults added in v0.32.5

func WithLaunchDefaults(p Policy, launch DeviceSnapshot) Policy

WithLaunchDefaults fills missing policy values from the launch-time device snapshot. The default resident cap is intentionally a fixed ceiling based on launch free memory, not a moving target: if memory later gets tighter, the current FreeBytes in Resolve clamps lower; if memory later gets freer, modeld does not opportunistically consume more than the launch budget.

type SystemRAM

type SystemRAM struct{}

SystemRAM reports available host RAM via gopsutil — the CPU-device source.

func (SystemRAM) FreeBytes

func (SystemRAM) FreeBytes() (int64, error)

func (SystemRAM) Snapshot added in v0.32.5

func (SystemRAM) Snapshot() (DeviceSnapshot, error)
