cache

package
v0.23.2
Published: May 7, 2026 License: MIT Imports: 5 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func BeginSpeculation added in v0.23.1

func BeginSpeculation(caches []Cache) ([]Cache, *Speculation, bool)

BeginSpeculation returns cache wrappers suitable for a speculative target forward. The returned caches must only be used for that forward.

Types

type Attention added in v0.22.1

type Attention interface {
	Cache

	// Update appends (k, v) and returns an opaque nn.KVHistory for
	// this layer's SDPA.
	Update(b *batch.Batch, keys, values *mlx.Array) *nn.KVHistory
}

Attention is the contract for caches that back attention layers (KVCache, RotatingKVCache).

type Cache

type Cache interface {
	// State returns the cache-owned state roots that should be kept/evaluated.
	State() []*mlx.Array
	Free()
	Offset() int

	// Snapshot copies cache state from fromOffset to current offset into
	// pinned VRAM arrays. The active cache is unchanged.
	Snapshot(fromOffset int) Snapshot

	// Restore brings the cache to target. If snapshot is nil, rewinds
	// using the cache's own live state. Returns false if the target is
	// unreachable (e.g. target > current offset, or negative).
	Restore(snapshot Snapshot, target int) bool

	// Merge combines two sequential snapshots [a,b) and [b,c) into [a,c).
	// Takes ownership of both inputs.
	Merge(parent, child Snapshot) Snapshot

	// Split divides a snapshot [a,c) at offset b into [a,b) and [b,c).
	// Takes ownership of the input. Cache types that cannot split
	// (e.g. recurrent) return (nil, snapshot).
	Split(snapshot Snapshot, at int) (parent, child Snapshot)
}

Cache is the common state-management contract shared by every cache kind. Writers live on the specific caches.
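The offset arithmetic behind Snapshot, Restore, Merge and Split can be illustrated with a toy model. This is a minimal sketch, not the package's implementation: `toyCache`, `span`, `merge` and `split` are hypothetical names, plain `[]int` slices stand in for `*mlx.Array` state, and the behaviour of Restore with a non-nil snapshot (truncate to the snapshot's start, then replay up to the target) is an assumption made for illustration.

```go
package main

import "fmt"

// span is a hypothetical stand-in for a Snapshot. It owns a copy of the
// entries in the half-open offset range [from, from+len(data)).
type span struct {
	from int
	data []int
}

func (s *span) to() int { return s.from + len(s.data) }

// toyCache is a hypothetical cache backed by a plain slice; its offset is
// the number of entries written so far.
type toyCache struct{ data []int }

func (c *toyCache) Offset() int { return len(c.data) }

// Snapshot copies [fromOffset, Offset()) and leaves the cache unchanged.
func (c *toyCache) Snapshot(fromOffset int) *span {
	return &span{from: fromOffset, data: append([]int(nil), c.data[fromOffset:]...)}
}

// Restore rewinds in place when s is nil. With a snapshot it truncates to
// the snapshot's start and replays entries up to target (an assumption of
// this sketch). It reports false when target is unreachable.
func (c *toyCache) Restore(s *span, target int) bool {
	if s == nil {
		if target < 0 || target > len(c.data) {
			return false
		}
		c.data = c.data[:target]
		return true
	}
	if target < s.from || target > s.to() || len(c.data) < s.from {
		return false
	}
	c.data = append(c.data[:s.from:s.from], s.data[:target-s.from]...)
	return true
}

// merge joins [a,b) and [b,c) into [a,c).
func merge(parent, child *span) *span {
	data := append(append([]int(nil), parent.data...), child.data...)
	return &span{from: parent.from, data: data}
}

// split cuts [a,c) at offset at into [a,at) and [at,c).
func split(s *span, at int) (parent, child *span) {
	cut := at - s.from
	return &span{from: s.from, data: s.data[:cut]}, &span{from: at, data: s.data[cut:]}
}

func main() {
	c := &toyCache{data: []int{10, 11, 12, 13, 14}}
	snap := c.Snapshot(2) // covers [2,5)
	parent, child := split(snap, 3)
	fmt.Println(parent.from, parent.to(), child.from, child.to())
	fmt.Println(c.Restore(nil, 2), c.Offset()) // rewind using live state
	fmt.Println(c.Restore(nil, 4))             // target beyond current offset
	fmt.Println(c.Restore(merge(parent, child), 4), c.Offset())
}
```

The sketch ignores ownership transfer and pinning, which the real interface documents for Merge, Split and Snapshot.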

func BeginIsolatedSpeculation added in v0.23.1

func BeginIsolatedSpeculation(caches []Cache) ([]Cache, bool)

BeginIsolatedSpeculation returns cache wrappers that never mutate live cache state. It is intended for correctness instrumentation, not the hot path.

type KVCache

type KVCache struct {
	// contains filtered or unexported fields
}

func NewKVCache

func NewKVCache() *KVCache

func (*KVCache) Free

func (c *KVCache) Free()

func (*KVCache) Merge added in v0.18.3

func (c *KVCache) Merge(parent, child Snapshot) Snapshot

func (*KVCache) Offset

func (c *KVCache) Offset() int

func (*KVCache) Restore added in v0.18.3

func (c *KVCache) Restore(snapshot Snapshot, target int) bool

func (*KVCache) Snapshot added in v0.18.3

func (c *KVCache) Snapshot(fromOffset int) Snapshot

func (*KVCache) Split added in v0.18.3

func (c *KVCache) Split(snapshot Snapshot, at int) (Snapshot, Snapshot)

func (*KVCache) State

func (c *KVCache) State() []*mlx.Array

func (*KVCache) Update

func (c *KVCache) Update(_ *batch.Batch, keys, values *mlx.Array) *nn.KVHistory

Assumes B = 1; heterogeneous batches are not supported.

func (*KVCache) View added in v0.23.1

func (c *KVCache) View(_ *batch.Batch) *nn.KVHistory

View returns the current cache contents as attention history without writing.

type RecurrentCache

type RecurrentCache struct {
	// contains filtered or unexported fields
}

RecurrentCache stores state for linear-recurrent layers.

Conv state shape: [B, convTail, convDim]
Delta state shape: [B, numVHeads, headVDim, headKDim]

func NewRecurrentCache

func NewRecurrentCache(convTail, convDim, numVHeads, headVDim, headKDim int32) *RecurrentCache

func (*RecurrentCache) Free

func (c *RecurrentCache) Free()

func (*RecurrentCache) Get added in v0.22.1

func (c *RecurrentCache) Get(b *batch.Batch, dtype mlx.DType) *nn.RecurrentHistory

Get returns the current conv/delta state for the SSM layer's read phase. Lazy-initializes zero-filled state tensors using b.InputIDs for the batch size; reallocates if the existing state's batch size or dtype no longer matches.

func (*RecurrentCache) Merge added in v0.18.3

func (c *RecurrentCache) Merge(parent, child Snapshot) Snapshot

func (*RecurrentCache) Offset

func (c *RecurrentCache) Offset() int

func (*RecurrentCache) Put added in v0.22.1

func (c *RecurrentCache) Put(b *batch.Batch, newConv, newDelta *mlx.Array)

Put stores the post-computation conv/delta states for the SSM layer's write phase and advances the cache offset by the current forward's real token count.

Assumes B = 1; heterogeneous batches are not supported.
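The read/write cycle described for Get and Put can be sketched as follows. This is an illustrative model under stated assumptions, not the package's code: `recurrent`, `state`, `get` and `put` are hypothetical names, the dtype is a plain string, only the conv state is modelled (as a flat `[]float32`), and the token count is passed explicitly instead of being derived from a `*batch.Batch`.

```go
package main

import "fmt"

// state is a hypothetical stand-in for the cached tensors. conv is a
// flattened [B, convTail, convDim] buffer.
type state struct {
	batch int
	dtype string
	conv  []float32
}

// recurrent is a hypothetical, simplified recurrent cache.
type recurrent struct {
	convTail, convDim int
	st                *state
	offset            int
}

// get returns the current state, allocating zero-filled state on first use
// and reallocating when the batch size or dtype no longer matches.
func (c *recurrent) get(batch int, dtype string) *state {
	if c.st == nil || c.st.batch != batch || c.st.dtype != dtype {
		c.st = &state{
			batch: batch,
			dtype: dtype,
			conv:  make([]float32, batch*c.convTail*c.convDim),
		}
	}
	return c.st
}

// put stores the post-computation state and advances the offset by the
// number of real tokens processed in this forward pass.
func (c *recurrent) put(newConv []float32, tokens int) {
	if c.st == nil {
		return // nothing was read yet; this sketch ignores the write
	}
	c.st.conv = newConv
	c.offset += tokens
}

func main() {
	c := &recurrent{convTail: 3, convDim: 4}
	s := c.get(1, "float16") // lazily allocates 1*3*4 zeros
	fmt.Println(len(s.conv), c.offset)

	c.put(make([]float32, len(s.conv)), 5) // a forward pass over 5 tokens
	fmt.Println(c.offset)

	fmt.Println(c.get(1, "float16") == c.st) // same shape and dtype: reused
	fmt.Println(c.get(2, "float16").batch)   // batch changed: reallocated
}
```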

func (*RecurrentCache) Restore added in v0.18.3

func (c *RecurrentCache) Restore(snapshot Snapshot, target int) bool

func (*RecurrentCache) Snapshot added in v0.18.3

func (c *RecurrentCache) Snapshot(fromOffset int) Snapshot

func (*RecurrentCache) Split added in v0.18.3

func (c *RecurrentCache) Split(snapshot Snapshot, at int) (Snapshot, Snapshot)

func (*RecurrentCache) State

func (c *RecurrentCache) State() []*mlx.Array

type RotatingKVCache

type RotatingKVCache struct {
	*KVCache
	// contains filtered or unexported fields
}

RotatingKVCache implements sliding window attention with bounded memory.

func NewRotatingKVCache

func NewRotatingKVCache(maxSize int) *RotatingKVCache

func (*RotatingKVCache) Free added in v0.18.3

func (c *RotatingKVCache) Free()

func (*RotatingKVCache) Merge added in v0.18.3

func (c *RotatingKVCache) Merge(parent, child Snapshot) Snapshot

func (*RotatingKVCache) Restore added in v0.18.3

func (c *RotatingKVCache) Restore(snapshot Snapshot, target int) bool

func (*RotatingKVCache) Snapshot added in v0.18.3

func (c *RotatingKVCache) Snapshot(fromOffset int) Snapshot

func (*RotatingKVCache) Split added in v0.18.3

func (c *RotatingKVCache) Split(snapshot Snapshot, at int) (Snapshot, Snapshot)

func (*RotatingKVCache) State

func (c *RotatingKVCache) State() []*mlx.Array

func (*RotatingKVCache) Update

func (c *RotatingKVCache) Update(b *batch.Batch, keys, values *mlx.Array) *nn.KVHistory

Assumes B = 1; heterogeneous batches are not supported.

func (*RotatingKVCache) View added in v0.23.1

func (c *RotatingKVCache) View(_ *batch.Batch) *nn.KVHistory

View returns the current rotating cache contents in logical order for assistant KV sharing.
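To show what "logical order" means for a bounded sliding window, here is a minimal ring-buffer sketch. It is not the package's implementation: `ring`, `newRing`, `update` and `view` are hypothetical names, integers stand in for key/value tensors, and the sketch assumes `maxSize > 0`.

```go
package main

import "fmt"

// ring is a hypothetical fixed-capacity buffer standing in for a sliding
// window cache: it keeps only the most recent maxSize entries.
type ring struct {
	buf    []int
	offset int // total number of entries ever written
}

// newRing assumes maxSize > 0.
func newRing(maxSize int) *ring { return &ring{buf: make([]int, 0, maxSize)} }

// update appends v, overwriting the oldest slot once the window is full.
func (r *ring) update(v int) {
	if len(r.buf) < cap(r.buf) {
		r.buf = append(r.buf, v)
	} else {
		r.buf[r.offset%cap(r.buf)] = v
	}
	r.offset++
}

// view returns a copy of the retained entries, oldest first, without
// modifying the ring.
func (r *ring) view() []int {
	if len(r.buf) < cap(r.buf) {
		return append([]int(nil), r.buf...)
	}
	start := r.offset % cap(r.buf) // slot holding the oldest entry
	return append(append([]int(nil), r.buf[start:]...), r.buf[:start]...)
}

func main() {
	r := newRing(3)
	for v := 0; v < 5; v++ {
		r.update(v)
	}
	fmt.Println(r.buf)    // physical storage order
	fmt.Println(r.view()) // logical order, oldest first
	fmt.Println(r.offset) // the offset keeps counting past the window size
}
```

Physical storage order and logical order diverge as soon as the window wraps, which is why a read-only view has to reorder before handing history to attention.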

type Snapshot added in v0.18.3

type Snapshot interface {
	// Size returns the byte size of the paged-out data (in VRAM).
	Size() int
	// Close unpins the snapshot's arrays so they can be freed by Sweep.
	Close()
}

Snapshot is paged-out cache state that can be restored later.

type Speculation added in v0.23.1

type Speculation struct {
	// contains filtered or unexported fields
}

Speculation is an isolated cache transaction for speculative target validation. Updates record generated K/V without mutating the live caches; Commit appends only the accepted prefix to the live caches.

func (*Speculation) Commit added in v0.23.1

func (s *Speculation) Commit(n int)

Commit appends the accepted prefix from the speculative forward to the live caches. The target bonus token is intentionally not committed.
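The transaction shape described here (record speculative writes on the side, then commit only an accepted prefix) can be sketched with a toy model. All names below (`live`, `speculation`, `begin`, `update`, `history`, `commit`) are hypothetical, strings stand in for K/V tensors, and clamping `n` to the recorded length is an assumption of this sketch rather than documented behaviour.

```go
package main

import "fmt"

// live is a hypothetical stand-in for a live cache.
type live struct{ kv []string }

// speculation records writes made during a speculative forward pass
// without touching the live cache.
type speculation struct {
	target   *live
	recorded []string
}

func begin(l *live) *speculation { return &speculation{target: l} }

// update records one speculative entry; the live cache is not modified.
func (s *speculation) update(kv string) { s.recorded = append(s.recorded, kv) }

// history is what attention would see during the speculative forward:
// the live entries followed by the recorded ones, as a fresh slice.
func (s *speculation) history() []string {
	out := append([]string(nil), s.target.kv...)
	return append(out, s.recorded...)
}

// commit appends the first n recorded entries to the live cache. n is
// clamped to [0, len(recorded)] (an assumption of this sketch).
func (s *speculation) commit(n int) {
	if n < 0 {
		n = 0
	}
	if n > len(s.recorded) {
		n = len(s.recorded)
	}
	s.target.kv = append(s.target.kv, s.recorded[:n]...)
}

func main() {
	l := &live{kv: []string{"a", "b"}}
	s := begin(l)
	for _, kv := range []string{"c", "d", "e"} {
		s.update(kv)
	}
	fmt.Println(l.kv)        // live cache untouched while speculating
	fmt.Println(s.history()) // speculative view: live plus recorded
	s.commit(2)              // only the first two drafts were accepted
	fmt.Println(l.kv)
}
```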

type Viewer added in v0.23.1

type Viewer interface {
	View(b *batch.Batch) *nn.KVHistory
}

Viewer exposes a read-only attention history for a cache.
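Since only some cache types in this listing define View (KVCache and RotatingKVCache do; RecurrentCache does not), callers would typically discover the capability with a type assertion. The following is a minimal sketch of that Go idiom with hypothetical, simplified stand-ins: `cache`, `viewer`, `kv`, `recurrent` and `viewable` are illustrative names, and `View` here takes no batch argument and returns a plain slice rather than `*nn.KVHistory`.

```go
package main

import "fmt"

// cache and viewer are hypothetical, simplified stand-ins for Cache and
// Viewer.
type cache interface{ Offset() int }

type viewer interface{ View() []int }

// kv offers a read-only view of its entries.
type kv struct{ entries []int }

func (c *kv) Offset() int { return len(c.entries) }
func (c *kv) View() []int { return append([]int(nil), c.entries...) }

// recurrent has no View method, so it does not satisfy viewer.
type recurrent struct{ offset int }

func (c *recurrent) Offset() int { return c.offset }

// viewable collects read-only histories from the caches that offer one
// and skips the rest.
func viewable(caches []cache) [][]int {
	var out [][]int
	for _, c := range caches {
		if v, ok := c.(viewer); ok {
			out = append(out, v.View())
		}
	}
	return out
}

func main() {
	caches := []cache{&kv{entries: []int{1, 2, 3}}, &recurrent{offset: 3}}
	fmt.Println(len(viewable(caches))) // number of caches exposing a view
}
```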
