capacity

package

v0.32.5 (latest) Published: Jun 19, 2026 License: Apache-2.0 Imports: 7 Imported by: 0

Repository

github.com/contenox/runtime

Documentation ¶

Overview ¶

Package capacity is modeld's hardware capacity planner. It resolves the effective context window at which a model can actually be served on this device, derived from the model's KV-cache footprint and the device's free memory rather than from the model's trained ceiling alone. modeld owns this calculation because it owns the hardware (see docs/blueprints/modeld-interface-boundary.md and plan-llamacpp.md:16); the runtime consumes the resolved number and never computes it.

Index ¶

Constants
func HeadroomFromEnv() float64
func KVBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64
func ParseBytes(s string) (int64, error)
type DeviceSnapshot
- func Snapshot(src MemorySource) (DeviceSnapshot, error)
- func (d DeviceSnapshot) Key() string
type LaunchDefaults
- func (d *LaunchDefaults) Policy(base Policy, st DeviceSnapshot) Policy
type MemorySource
type ModelCapacity
- func Resolve(p Params) ModelCapacity
type Params
type Policy
- func LoadPolicy(dataRoot string) Policy
- func WithLaunchDefaults(p Policy, launch DeviceSnapshot) Policy
type SystemRAM
- func (SystemRAM) FreeBytes() (int64, error)
- func (SystemRAM) Snapshot() (DeviceSnapshot, error)

Constants ¶

const DefaultHeadroomFrac = 0.1

DefaultHeadroomFrac of free memory is reserved for activations, the compute graph, and fragmentation, leaving the rest for model weights + KV cache.

const DefaultMaxResidentFrac = 0.8

DefaultMaxResidentFrac is the launch-time cap used when the user did not set a memory ceiling. modeld will not grow past this fraction of the memory that was free when the backend service was created; per-call current free memory can still clamp lower.

Variables ¶

This section is empty.

Functions ¶

func HeadroomFromEnv ¶

func HeadroomFromEnv() float64

HeadroomFromEnv reads CONTENOX_MODELD_MEM_HEADROOM (a fraction in (0,1)), falling back to DefaultHeadroomFrac.

func KVBytesPerToken ¶

func KVBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64

KVBytesPerToken is the memory one token of context costs in the KV cache: K and V, across every layer and KV head, at the KV precision.
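
As a worked example of that cost, the sketch below multiplies the factors named above. The precision names ("f32", "f16", "bf16") and their element sizes are assumptions made for illustration; the kvType strings the package actually accepts are not listed on this page.

```go
package main

import "fmt"

// kvElemBytes maps a KV precision name to bytes per element. The names and
// sizes here are illustrative assumptions, not the package's actual table.
func kvElemBytes(kvType string) int64 {
	switch kvType {
	case "f32":
		return 4
	case "f16", "bf16":
		return 2
	default:
		return 2 // assumption: treat unknown types as 16-bit
	}
}

// kvBytesPerToken sketches the documented cost: K and V (the factor 2),
// across every layer and KV head, at the KV precision.
func kvBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64 {
	return 2 * int64(nLayers) * int64(nKVHeads) * int64(headDim) * kvElemBytes(kvType)
}

func main() {
	// A 32-layer model with 8 KV heads of dimension 128 at 16-bit precision.
	fmt.Println(kvBytesPerToken(32, 8, 128, "f16")) // 131072 bytes (128 KiB) per token
}
```

At 128 KiB per token, an 8192-token window under these assumptions needs 1 GiB of KV cache, which is why the planner budgets the window against free memory.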

func ParseBytes ¶ added in v0.32.5

func ParseBytes(s string) (int64, error)

ParseBytes parses the byte-size strings used by modeld memory settings, such as "8GiB" or "512MiB".
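
A parser of this kind might look like the sketch below. It is a hypothetical stand-in, not the package's implementation: the set of accepted suffixes (KiB, MiB, GiB), the handling of bare integers, and the error text are all assumptions, and the real ParseBytes may accept more forms.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBytes is a hypothetical sketch of a byte-string parser. It accepts a
// plain non-negative integer ("1024") or an integer followed by a binary
// suffix (KiB, MiB, GiB). Overflow is not checked in this sketch.
func parseBytes(s string) (int64, error) {
	num := strings.TrimSpace(s)
	units := []struct {
		suffix string
		mult   int64
	}{{"KiB", 1 << 10}, {"MiB", 1 << 20}, {"GiB", 1 << 30}}

	mult := int64(1)
	for _, u := range units {
		if strings.HasSuffix(num, u.suffix) {
			num, mult = strings.TrimSuffix(num, u.suffix), u.mult
			break
		}
	}
	n, err := strconv.ParseInt(strings.TrimSpace(num), 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid byte size %q", s)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"8GiB", "512MiB", "4096"} {
		n, err := parseBytes(in)
		fmt.Println(in, n, err)
	}
}
```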

Types ¶

type DeviceSnapshot ¶ added in v0.32.5

type DeviceSnapshot struct {
	Kind              string `json:"kind,omitempty"`
	DeviceID          string `json:"device_id,omitempty"`
	TotalBytes        int64  `json:"total_bytes,omitempty"`
	FreeBytes         int64  `json:"free_bytes,omitempty"`
	SharedWithDisplay bool   `json:"shared_with_display,omitempty"`
}

DeviceSnapshot describes the memory pool the backend will allocate from.

func Snapshot ¶ added in v0.32.5

func Snapshot(src MemorySource) (DeviceSnapshot, error)

Snapshot returns a DeviceSnapshot for src, whether src is a richer source that provides its own Snapshot method or a legacy source that only implements FreeBytes.
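
One common Go way to support both kinds of source is an optional-interface check, sketched below with local stand-in types. The interface name snapshotter and the fields filled in for a legacy source are assumptions; the package's actual dispatch is not shown on this page.

```go
package main

import "fmt"

// Local stand-ins for the documented types, reduced to what the sketch needs.
type deviceSnapshot struct {
	Kind      string
	FreeBytes int64
}

type memorySource interface {
	FreeBytes() (int64, error)
}

// snapshotter is a hypothetical name for the richer, optional interface.
type snapshotter interface {
	Snapshot() (deviceSnapshot, error)
}

// snapshot prefers the source's own Snapshot method and otherwise builds a
// minimal snapshot from FreeBytes alone.
func snapshot(src memorySource) (deviceSnapshot, error) {
	if s, ok := src.(snapshotter); ok {
		return s.Snapshot()
	}
	free, err := src.FreeBytes()
	if err != nil {
		return deviceSnapshot{}, err
	}
	return deviceSnapshot{FreeBytes: free}, nil
}

// legacy implements only FreeBytes.
type legacy struct{ free int64 }

func (l legacy) FreeBytes() (int64, error) { return l.free, nil }

// rich adds a Snapshot method on top of the legacy behaviour.
type rich struct{ legacy }

func (r rich) Snapshot() (deviceSnapshot, error) {
	return deviceSnapshot{Kind: "cpu", FreeBytes: r.free}, nil
}

func main() {
	a, _ := snapshot(legacy{free: 100})
	b, _ := snapshot(rich{legacy{free: 100}})
	fmt.Printf("%+v\n%+v\n", a, b)
}
```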

func (DeviceSnapshot) Key ¶ added in v0.32.5

func (d DeviceSnapshot) Key() string

Key identifies the memory pool for launch-default budgeting. Kind plus DeviceID is the normal path; TotalBytes and SharedWithDisplay are included so that anonymous test or fallback sources still get stable separation when possible.

type LaunchDefaults ¶ added in v0.32.5

type LaunchDefaults struct {
	// contains filtered or unexported fields
}

LaunchDefaults records the first observed free-memory snapshot per memory pool. It lets services apply the "80% of launch-free memory" default lazily for the actual selected device, while keeping an explicit MaxResidentBytes as a hard user cap.

func (*LaunchDefaults) Policy ¶ added in v0.32.5

func (d *LaunchDefaults) Policy(base Policy, st DeviceSnapshot) Policy

Policy returns base with a default MaxResidentBytes filled from the first snapshot seen for this device. It is intentionally sticky per memory pool: if memory later gets tighter, Resolve clamps on current FreeBytes; if memory gets freer, modeld does not grow past the launch budget.
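
The sticky behaviour described above can be illustrated with a small first-observation cache. This is a sketch under assumptions: the type launchDefaults, its budget method, and the map keyed by a pool string are hypothetical, since the real struct's fields are unexported; the 80% figure is written in integer arithmetic to keep the example exact.

```go
package main

import (
	"fmt"
	"sync"
)

// launchDefaults is a hypothetical stand-in that remembers the free memory
// first observed for each memory pool.
type launchDefaults struct {
	mu    sync.Mutex
	first map[string]int64 // pool key -> free bytes at first observation
}

// budget returns 80% of the free memory first seen for key. Later calls
// ignore freeNow, so the budget never grows when memory becomes freer.
func (d *launchDefaults) budget(key string, freeNow int64) int64 {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.first == nil {
		d.first = make(map[string]int64)
	}
	if _, seen := d.first[key]; !seen {
		d.first[key] = freeNow
	}
	return d.first[key] * 8 / 10
}

func main() {
	var d launchDefaults
	fmt.Println(d.budget("cpu:0", 10<<30)) // 8589934592: 80% of the first 10 GiB
	fmt.Println(d.budget("cpu:0", 20<<30)) // 8589934592: unchanged by the later 20 GiB
}
```

Tightening is handled elsewhere: as the text above notes, Resolve still clamps on the current FreeBytes, so this cache only has to enforce the upper bound.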

type MemorySource ¶

type MemorySource interface {
	FreeBytes() (int64, error)
}

MemorySource reports the free memory of the device a backend serves on. modeld picks the source by device: system RAM for CPU; GPU VRAM (ov::Core / ggml) is a CGO seam filled per backend when a GPU device is selected.

type ModelCapacity ¶

type ModelCapacity struct {
	ModelMaxContext  int
	EffectiveContext int
	KVBytesPerToken  int64
	FreeBytes        int64
	WeightsBytes     int64
	OverheadBytes    int64
	ReservedBytes    int64
	UserLimitBytes   int64
	MinFreeBytes     int64
	UsableBytes      int64
	RequiredBytes    int64
	Clamped          bool
	Reason           string
}

ModelCapacity is the resolved result reported to the runtime. EffectiveContext is the window modeld will actually serve and the value the cache identity (manifest context_size) must use; the rest explain how it was derived.

func Resolve ¶

func Resolve(p Params) ModelCapacity

Resolve computes the physical hot context window:

	usable    = min(free - minFree, userLimit - reserved) * (1 - headroom)
	effective = clamp(request, 0, min(modelMax, (usable - weights - overhead) / kvBytesPerToken))

Unknown inputs degrade gracefully: with no KV cost it falls back to the model ceiling (clamped by request); with no ceiling it uses the memory budget.
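
The formula and its fallbacks can be traced with the sketch below. It is one reading of the documentation, not the package's source: the names params and resolve are local stand-ins, and details such as treating a zero user limit as "no cap" inside the min and flooring a negative memory budget at zero are assumptions consistent with the field comments on Params.

```go
package main

import "fmt"

// params is a local stand-in for the documented Params fields.
type params struct {
	ModelMaxCtx, Request                         int
	KVBytesPerToken, WeightsBytes, OverheadBytes int64
	FreeBytes, ReservedBytes                     int64
	UserLimitBytes, MinFreeBytes                 int64
	HeadroomFrac                                 float64
}

// resolve returns the effective context window under the documented formula.
func resolve(p params) int {
	h := p.HeadroomFrac
	if !(h > 0 && h < 1) {
		h = 0.1 // DefaultHeadroomFrac
	}
	budget := p.FreeBytes - p.MinFreeBytes
	if p.UserLimitBytes > 0 { // 0 = no cap
		if left := p.UserLimitBytes - p.ReservedBytes; left < budget {
			budget = left
		}
	}
	usable := int64(float64(budget) * (1 - h))

	limit := p.ModelMaxCtx // 0 = unknown ceiling
	if p.KVBytesPerToken > 0 {
		byMem := (usable - p.WeightsBytes - p.OverheadBytes) / p.KVBytesPerToken
		if byMem < 0 {
			byMem = 0
		}
		if limit == 0 || int(byMem) < limit {
			limit = int(byMem)
		}
	}
	if p.Request > 0 && p.Request < limit {
		return p.Request
	}
	return limit
}

func main() {
	p := params{
		ModelMaxCtx:     131072,
		KVBytesPerToken: 131072,  // 128 KiB per token
		WeightsBytes:    4 << 30, // 4 GiB of weights
		FreeBytes:       16 << 30,
		HeadroomFrac:    0.25,
	}
	// usable = 16 GiB * 0.75 = 12 GiB; (12 GiB - 4 GiB) / 128 KiB = 65536 tokens.
	fmt.Println(resolve(p)) // 65536: memory, not the model ceiling, is the limit
	p.Request = 8192
	fmt.Println(resolve(p)) // 8192: a smaller request is honoured
}
```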

type Params ¶

type Params struct {
	ModelMaxCtx     int     // model's trained context ceiling (0 = unknown)
	KVBytesPerToken int64   // 0 = unknown (cannot budget by memory)
	WeightsBytes    int64   // resident model weight footprint
	OverheadBytes   int64   // fixed runtime buffers (compute graph, staging)
	FreeBytes       int64   // device free memory
	ReservedBytes   int64   // memory already reserved by resident sessions
	UserLimitBytes  int64   // user cap for modeld resident memory (0 = no cap)
	MinFreeBytes    int64   // memory to leave free for the desktop/other workloads
	Request         int     // requested window (0 = use the resolved max)
	HeadroomFrac    float64 // <=0 or >=1 falls back to DefaultHeadroomFrac
}

Params are the inputs to a capacity resolution. Zero values mean "unknown": an unknown ModelMaxCtx or KVBytesPerToken disables that side of the clamp rather than producing a bogus window.

type Policy ¶ added in v0.32.5

type Policy struct {
	MaxResidentBytes int64   `json:"max_resident_bytes,omitempty"`
	MinFreeBytes     int64   `json:"min_free_bytes,omitempty"`
	HeadroomFrac     float64 `json:"headroom_frac,omitempty"`
}

Policy is the user/operator memory policy modeld applies before opening a resident session. MaxResidentBytes is a hard ceiling on modeld's resident footprint for the served device; MinFreeBytes preserves memory for the desktop or other local workloads that may share the same device.

func LoadPolicy ¶ added in v0.32.5

func LoadPolicy(dataRoot string) Policy

LoadPolicy reads <dataRoot>/modeld.json and then applies env overrides. The JSON accepts either numeric byte fields or string fields ("8GiB", "512MiB"):

{"memory":{"max_resident":"8GiB","reserve_free":"2GiB","headroom_frac":0.15}}

func WithLaunchDefaults ¶ added in v0.32.5

func WithLaunchDefaults(p Policy, launch DeviceSnapshot) Policy

WithLaunchDefaults fills missing policy values from the launch-time device snapshot. The default resident cap is intentionally a fixed ceiling based on launch free memory, not a moving target: if memory later gets tighter, the current FreeBytes in Resolve clamps lower; if memory later gets freer, modeld does not opportunistically consume more than the launch budget.
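
A minimal sketch of the fill-in rule, assuming that a zero MaxResidentBytes means "not set" and that the default is DefaultMaxResidentFrac (0.8) of launch free memory. The lowercase names are local stand-ins, only the resident cap is modelled, and the fraction is computed in integer arithmetic to keep the example exact.

```go
package main

import "fmt"

// Local stand-ins reduced to the fields this sketch needs.
type policy struct{ MaxResidentBytes int64 }
type deviceSnapshot struct{ FreeBytes int64 }

// withLaunchDefaults fills an unset resident cap with 80% of the memory that
// was free at launch. An explicit user cap is left untouched.
func withLaunchDefaults(p policy, launch deviceSnapshot) policy {
	if p.MaxResidentBytes == 0 && launch.FreeBytes > 0 {
		p.MaxResidentBytes = launch.FreeBytes / 5 * 4
	}
	return p
}

func main() {
	launch := deviceSnapshot{FreeBytes: 10 << 30}
	fmt.Println(withLaunchDefaults(policy{}, launch).MaxResidentBytes)                         // 8589934592
	fmt.Println(withLaunchDefaults(policy{MaxResidentBytes: 2 << 30}, launch).MaxResidentBytes) // 2147483648
}
```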

type SystemRAM ¶

type SystemRAM struct{}

SystemRAM reports available host RAM via gopsutil — the CPU-device source.

func (SystemRAM) FreeBytes ¶

func (SystemRAM) FreeBytes() (int64, error)

func (SystemRAM) Snapshot ¶ added in v0.32.5

func (SystemRAM) Snapshot() (DeviceSnapshot, error)

Source Files ¶

capacity.go
