capacity

package

Version: v0.32.3 (latest) Published: Jun 18, 2026 License: Apache-2.0 Imports: 3 Imported by: 0

Repository

github.com/contenox/runtime

Documentation ¶

Overview ¶

Package capacity is modeld's hardware capacity planner. It resolves the effective context window a model can actually be served at on this device, derived from the model's KV-cache footprint and the device's free memory rather than from the model's trained ceiling alone. modeld owns this calculation because it owns the hardware (see docs/blueprints/modeld-interface-boundary.md and plan-llamacpp.md:16); the runtime consumes the resolved number and never computes it.

Index ¶

Constants
func HeadroomFromEnv() float64
func KVBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64
type MemorySource
type ModelCapacity
- func Resolve(p Params) ModelCapacity
type Params
type SystemRAM
- func (SystemRAM) FreeBytes() (int64, error)

Constants ¶

const DefaultHeadroomFrac = 0.1

DefaultHeadroomFrac is the fraction of free memory reserved for activations, the compute graph, and fragmentation; the remainder is available for model weights and the KV cache.

Functions ¶

func HeadroomFromEnv ¶

func HeadroomFromEnv() float64

HeadroomFromEnv reads CONTENOX_MODELD_MEM_HEADROOM (a fraction in (0,1)), falling back to DefaultHeadroomFrac.

func KVBytesPerToken ¶

func KVBytesPerToken(nLayers, nKVHeads, headDim int, kvType string) int64

KVBytesPerToken is the memory one token of context costs in the KV cache: K and V, across every layer and KV head, at the KV precision.
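
As a rough illustration of that description, the per-token cost is the product of two tensors (K and V), the layer count, the KV head count, the head dimension, and the bytes per element. The sketch below is not the package's code: `kvBytesPerTokenSketch` and `bytesPerElement` are hypothetical, the precision names "f16" and "f32" are assumed, and quantized KV types (which pack elements into blocks) are left out. Returning 0 for an unrecognized precision is also an assumption, chosen to match the "0 = unknown" convention of Params.

```go
package main

import "fmt"

// bytesPerElement maps a KV precision name to its element size in bytes.
// ASSUMPTION: only "f16" and "f32" are recognized here; the kvType
// strings the real package accepts are not documented on this page.
func bytesPerElement(kvType string) int64 {
	switch kvType {
	case "f32":
		return 4
	case "f16":
		return 2
	default:
		return 0 // unknown precision: report "unknown" rather than guess
	}
}

// kvBytesPerTokenSketch is a hypothetical stand-in for KVBytesPerToken.
func kvBytesPerTokenSketch(nLayers, nKVHeads, headDim int, kvType string) int64 {
	if nLayers <= 0 || nKVHeads <= 0 || headDim <= 0 {
		return 0
	}
	const kAndV = 2 // one K tensor and one V tensor per layer
	return kAndV * int64(nLayers) * int64(nKVHeads) * int64(headDim) * bytesPerElement(kvType)
}

func main() {
	// 32 layers, 8 KV heads, head dimension 128, 16-bit cache.
	fmt.Println(kvBytesPerTokenSketch(32, 8, 128, "f16")) // 131072
}
```

In this example each token of context costs 128 KiB, so an 8192-token window would need 1 GiB of KV cache.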

Types ¶

type MemorySource ¶

type MemorySource interface {
	FreeBytes() (int64, error)
}

MemorySource reports the free memory of the device a backend serves on. modeld picks the source by device: system RAM for CPU; GPU VRAM (ov::Core / ggml) is a CGO seam filled per backend when a GPU device is selected.

type ModelCapacity ¶

type ModelCapacity struct {
	ModelMaxContext  int
	EffectiveContext int
	KVBytesPerToken  int64
	FreeBytes        int64
	WeightsBytes     int64
}

ModelCapacity is the resolved result reported to the runtime. EffectiveContext is the window modeld will actually serve and the value the cache identity (manifest context_size) must use; the remaining fields explain how it was derived.

func Resolve ¶

func Resolve(p Params) ModelCapacity

Resolve computes the effective context window:

effective = clamp(request, 0,
    min(modelMax, (free*(1-headroom) - weights) / kvBytesPerToken))

Unknown inputs degrade gracefully: with no KV cost it falls back to the model ceiling (clamped by request); with no ceiling it uses the memory budget.
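
The formula and its fallbacks can be traced with a small stand-alone sketch. It is one reading of the documented behaviour, not the package's implementation: `params` copies the documented Params fields, `resolveSketch` is hypothetical, and two choices are assumptions: a weight footprint larger than the usable memory yields a window of 0, and the result is 0 when both the ceiling and the KV cost are unknown.

```go
package main

import "fmt"

// params copies the documented Params fields for a self-contained example.
type params struct {
	ModelMaxCtx     int
	KVBytesPerToken int64
	WeightsBytes    int64
	FreeBytes       int64
	Request         int
	HeadroomFrac    float64
}

// resolveSketch is a hypothetical stand-in for Resolve that returns only
// the effective context window.
func resolveSketch(p params) int {
	headroom := p.HeadroomFrac
	if !(headroom > 0 && headroom < 1) {
		headroom = 0.1 // documented DefaultHeadroomFrac
	}

	limit := p.ModelMaxCtx // 0 means the ceiling is unknown
	if p.KVBytesPerToken > 0 {
		budget := int64(float64(p.FreeBytes)*(1-headroom)) - p.WeightsBytes
		memCtx := 0
		if budget > 0 {
			memCtx = int(budget / p.KVBytesPerToken)
		}
		if limit == 0 || memCtx < limit {
			limit = memCtx // memory is the binding constraint
		}
	}

	// Request 0 means "use the resolved max"; otherwise clamp to the limit.
	if p.Request > 0 && p.Request < limit {
		return p.Request
	}
	return limit
}

func main() {
	base := params{ModelMaxCtx: 100, KVBytesPerToken: 10, WeightsBytes: 100, FreeBytes: 1000, HeadroomFrac: 0.5}
	fmt.Println(resolveSketch(base)) // 40: (1000*0.5 - 100) / 10

	base.Request = 16
	fmt.Println(resolveSketch(base)) // 16: request is below the limit

	fmt.Println(resolveSketch(params{ModelMaxCtx: 100})) // 100: no KV cost, use ceiling
}
```

The toy numbers are chosen so the arithmetic is exact; with realistic inputs the same steps apply, only with byte counts in the gigabyte range.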

type Params ¶

type Params struct {
	ModelMaxCtx     int     // model's trained context ceiling (0 = unknown)
	KVBytesPerToken int64   // 0 = unknown (cannot budget by memory)
	WeightsBytes    int64   // resident model weight footprint
	FreeBytes       int64   // device free memory
	Request         int     // requested window (0 = use the resolved max)
	HeadroomFrac    float64 // <=0 or >=1 falls back to DefaultHeadroomFrac
}

Params are the inputs to a capacity resolution. Zero values mean "unknown": an unknown ModelMaxCtx or KVBytesPerToken disables that side of the clamp rather than producing a bogus window.

type SystemRAM ¶

type SystemRAM struct{}

SystemRAM reports available host RAM via gopsutil — the CPU-device source.

func (SystemRAM) FreeBytes ¶

func (SystemRAM) FreeBytes() (int64, error)

Source Files ¶

capacity.go
