capacity

package
v0.32.3 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 18, 2026 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Overview

Package capacity is modeld's hardware capacity planner. It resolves the effective context window at which a model can actually be served on this device, derived from the model's KV-cache footprint and the device's free memory rather than from the model's trained ceiling alone. modeld owns this computation because it owns the hardware (see docs/blueprints/modeld-interface-boundary.md and plan-llamacpp.md:16); the runtime consumes the resolved number and never computes it.

Index

Constants

const DefaultHeadroomFrac = 0.1

DefaultHeadroomFrac is the fraction of free memory reserved for activations, the compute graph, and fragmentation; the remainder is available for model weights and the KV cache.

Variables

This section is empty.

Functions

func HeadroomFromEnv

func HeadroomFromEnv() float64

HeadroomFromEnv reads CONTENOX_MODELD_MEM_HEADROOM (a fraction in (0,1)), falling back to DefaultHeadroomFrac.
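The lookup described above might be sketched as follows. This is an illustration, not the package's source: the helper names parseHeadroom and headroomFromEnv are hypothetical, and treating unset, unparsable, and out-of-range values all as "fall back to the default" is an assumption based on the documented (0,1) range.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultHeadroomFrac mirrors the documented DefaultHeadroomFrac constant.
const defaultHeadroomFrac = 0.1

// parseHeadroom is a hypothetical helper. It accepts only a fraction strictly
// between 0 and 1 and returns the default for anything else (assumption).
func parseHeadroom(raw string) float64 {
	v, err := strconv.ParseFloat(raw, 64)
	// The negated range check also rejects NaN, which compares false to everything.
	if err != nil || !(v > 0 && v < 1) {
		return defaultHeadroomFrac
	}
	return v
}

// headroomFromEnv reads the documented environment variable and delegates
// validation to parseHeadroom. An unset variable yields "" and so the default.
func headroomFromEnv() float64 {
	return parseHeadroom(os.Getenv("CONTENOX_MODELD_MEM_HEADROOM"))
}

func main() {
	os.Setenv("CONTENOX_MODELD_MEM_HEADROOM", "0.25")
	fmt.Println(headroomFromEnv()) // 0.25

	os.Setenv("CONTENOX_MODELD_MEM_HEADROOM", "1.5")
	fmt.Println(headroomFromEnv()) // 0.1 (out of range, default used)
}
```

Keeping the validation in a pure function makes the fallback rule testable without touching the process environment.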

func KVBytesPerToken

func KVBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64

KVBytesPerToken is the memory one token of context costs in the KV cache: K and V, across every layer and KV head, at the KV precision.
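A minimal sketch of that arithmetic, assuming the cost is 2 (K and V) × layers × KV heads × head dimension × bytes per element. The kvType strings and their byte widths below are assumptions for illustration; the names the package actually accepts, and how it costs quantized KV types, are not stated in this documentation.

```go
package main

import "fmt"

// bytesPerElement maps a KV precision name to its element width.
// The accepted names are hypothetical; unknown names are costed as
// 16-bit here, which is an assumption of this sketch.
func bytesPerElement(kvType string) int64 {
	switch kvType {
	case "f32":
		return 4
	case "f16", "bf16":
		return 2
	default:
		return 2
	}
}

// kvBytesPerToken multiplies out K and V (factor 2) across every layer
// and KV head at the chosen precision.
func kvBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64 {
	return 2 * int64(nLayers) * int64(nKVHeads) * int64(headDim) * bytesPerElement(kvType)
}

func main() {
	// Example shape: 32 layers, 8 KV heads, head dimension 128, f16 cache.
	fmt.Println(kvBytesPerToken(32, 8, 128, "f16")) // 131072 bytes = 128 KiB per token
}
```

With grouped-query attention the KV head count is smaller than the attention head count, which is why the function takes nKVHeads rather than the total number of heads.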

Types

type MemorySource

type MemorySource interface {
	FreeBytes() (int64, error)
}

MemorySource reports the free memory of the device a backend serves on. modeld picks the source by device: system RAM for CPU; GPU VRAM (ov::Core / ggml) is a CGO seam filled per backend when a GPU device is selected.
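Because MemorySource is a one-method interface, a test double is cheap to write. The sketch below copies the interface from the declaration above; fixedMemory, unavailableMemory, and describe are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// MemorySource is copied from the documented interface.
type MemorySource interface {
	FreeBytes() (int64, error)
}

// fixedMemory is a hypothetical test double that always reports the same value.
type fixedMemory int64

func (f fixedMemory) FreeBytes() (int64, error) { return int64(f), nil }

// unavailableMemory is a hypothetical source whose probe always fails,
// standing in for a device that cannot be queried.
type unavailableMemory struct{}

func (unavailableMemory) FreeBytes() (int64, error) {
	return 0, errors.New("memory probe unavailable")
}

// describe shows how a caller might consume any MemorySource.
func describe(src MemorySource) string {
	free, err := src.FreeBytes()
	if err != nil {
		return "free memory unknown: " + err.Error()
	}
	return fmt.Sprintf("%d bytes free", free)
}

func main() {
	fmt.Println(describe(fixedMemory(8 << 30)))
	fmt.Println(describe(unavailableMemory{}))
}
```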

type ModelCapacity

type ModelCapacity struct {
	ModelMaxContext  int
	EffectiveContext int
	KVBytesPerToken  int64
	FreeBytes        int64
	WeightsBytes     int64
}

ModelCapacity is the resolved result reported to the runtime. EffectiveContext is the window modeld will actually serve and the value the cache identity (manifest context_size) must use; the remaining fields explain how it was derived.

func Resolve

func Resolve(p Params) ModelCapacity

Resolve computes the effective context window:

effective = clamp(request, 0,
    min(modelMax, (free*(1-headroom) - weights) / kvBytesPerToken))

Unknown inputs degrade gracefully: with no KV cost it falls back to the model ceiling (clamped by request); with no ceiling it uses the memory budget.
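The formula and its fallbacks can be sketched as below. This is one reading of the documented behavior, not the package's source: params and resolveEffective are hypothetical stand-ins for Params and Resolve, and three choices are assumptions of this sketch (a negative memory budget yields a window of 0, a request is passed through when neither bound is known, and a non-positive request means "use the resolved max").

```go
package main

import "fmt"

const defaultHeadroomFrac = 0.1

// params mirrors the documented Params fields.
type params struct {
	ModelMaxCtx     int
	KVBytesPerToken int64
	WeightsBytes    int64
	FreeBytes       int64
	Request         int
	HeadroomFrac    float64
}

// resolveEffective returns only the effective context window.
func resolveEffective(p params) int {
	headroom := p.HeadroomFrac
	if headroom <= 0 || headroom >= 1 {
		headroom = defaultHeadroomFrac
	}

	// Start from the model ceiling, if it is known.
	limit, known := p.ModelMaxCtx, p.ModelMaxCtx > 0

	// Apply the memory bound only when the per-token KV cost is known.
	if p.KVBytesPerToken > 0 {
		budget := int64(float64(p.FreeBytes)*(1-headroom)) - p.WeightsBytes
		if budget < 0 {
			budget = 0 // assumption: weights that do not fit leave no window
		}
		memCtx := int(budget / p.KVBytesPerToken)
		if !known || memCtx < limit {
			limit, known = memCtx, true
		}
	}

	if !known {
		limit = p.Request // assumption: nothing to clamp against
	}
	if p.Request > 0 && p.Request < limit {
		return p.Request
	}
	if limit < 0 {
		return 0
	}
	return limit
}

func main() {
	// 16 GiB free, 5 GiB of weights, 128 KiB per token, default headroom.
	fmt.Println(resolveEffective(params{
		ModelMaxCtx:     131072,
		KVBytesPerToken: 131072,
		WeightsBytes:    5 << 30,
		FreeBytes:       16 << 30,
	})) // 77004: the memory budget, not the model ceiling, is the binding limit
}
```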

type Params

type Params struct {
	ModelMaxCtx     int     // model's trained context ceiling (0 = unknown)
	KVBytesPerToken int64   // 0 = unknown (cannot budget by memory)
	WeightsBytes    int64   // resident model weight footprint
	FreeBytes       int64   // device free memory
	Request         int     // requested window (0 = use the resolved max)
	HeadroomFrac    float64 // <=0 or >=1 falls back to DefaultHeadroomFrac
}

Params are the inputs to a capacity resolution. Zero values mean "unknown": an unknown ModelMaxCtx or KVBytesPerToken disables that side of the clamp rather than producing a bogus window.

type SystemRAM

type SystemRAM struct{}

SystemRAM reports available host RAM via gopsutil — the CPU-device source.

func (SystemRAM) FreeBytes

func (SystemRAM) FreeBytes() (int64, error)
