package syncprefixcacheindexer

Version: v0.5.0-rc.1 | Published: Oct 25, 2025 | License: Apache-2.0 | Imports: 10 | Imported by: 0

Repository

github.com/vllm-project/aibrix

README ¶

Sync Prefix Cache Indexer Package

This package provides a high-performance, thread-safe indexing system for KV cache prefix matching in distributed LLM inference.

Overview

The syncprefixcacheindexer implements a two-level hash table structure optimized for:

Fast prefix matching across multiple pods
Efficient memory usage with configurable limits
Automatic eviction of stale entries
Thread-safe concurrent operations

Architecture

Two-Level Hash Structure

First Level: Model context (model name + LoRA ID)
Second Level: Prefix hash to pod mapping

This design enables efficient isolation between different models and adapters while supporting fast lookups.
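
The two-level layout described above can be pictured as a map of maps. The following is a minimal sketch, not the package's actual implementation: `contextKey`, `twoLevelIndex`, `add`, and `podCount` are hypothetical names chosen for illustration, and the real structure also tracks access times and locks.

```go
package main

import "fmt"

// contextKey is a hypothetical first-level key: model name plus LoRA ID.
type contextKey struct {
	ModelName string
	LoraID    int64 // -1 means no LoRA adapter
}

// twoLevelIndex maps context -> prefix hash -> set of pod names.
type twoLevelIndex map[contextKey]map[uint64]map[string]struct{}

// add records that pod holds prefixHash within the given context.
func (idx twoLevelIndex) add(key contextKey, prefixHash uint64, pod string) {
	prefixes, ok := idx[key]
	if !ok {
		prefixes = make(map[uint64]map[string]struct{})
		idx[key] = prefixes
	}
	pods, ok := prefixes[prefixHash]
	if !ok {
		pods = make(map[string]struct{})
		prefixes[prefixHash] = pods
	}
	pods[pod] = struct{}{}
}

// podCount returns how many pods hold prefixHash within the given context.
// Lookups on missing keys yield nil maps, whose length is zero.
func (idx twoLevelIndex) podCount(key contextKey, prefixHash uint64) int {
	return len(idx[key][prefixHash])
}

func main() {
	idx := twoLevelIndex{}
	base := contextKey{ModelName: "llama-2-7b", LoraID: -1}
	lora := contextKey{ModelName: "llama-2-7b", LoraID: 7}

	idx.add(base, 42, "10.0.0.1")
	idx.add(base, 42, "10.0.0.2")
	idx.add(lora, 42, "10.0.0.3")

	fmt.Println(idx.podCount(base, 42)) // 2
	fmt.Println(idx.podCount(lora, 42)) // 1
}
```

Because the first-level key is a comparable struct, the same prefix hash under a different model or adapter lands in a separate inner map, which is the isolation property the design aims for.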

Key Components

SyncPrefixHashTable (`sync_hash.go`)

The main data structure providing:

ProcessBlockStored: Index new KV cache blocks
ProcessBlockRemoved: Remove specific blocks
ProcessAllBlocksCleared: Handle AllBlocksCleared events (currently a placeholder implementation)
MatchPrefix: Find pods with matching token prefixes

Usage:

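// Create the indexer; Close stops the eviction worker when you are done.
indexer := NewSyncPrefixHashTable()
defer indexer.Close()

// Process a block stored event.
// tokenBytes is a []byte holding the tokens of the block.
event := BlockStored{
    BlockHashes: []int64{12345},
    Tokens:      [][]byte{tokenBytes}, // one entry per block hash
    ModelName:   "llama-2-7b",
    LoraID:      -1, // -1 means no LoRA adapter
    SourcePod:   "10.0.0.1",
}
if err := indexer.ProcessBlockStored(event); err != nil {
    // Handle the error.
}

// Match prefix.
// queryTokens is a []byte; readyPods is a map[string]struct{} of candidate pods.
// matches maps pod name to prefix-match percentage; hashes are the prefix hashes.
matches, hashes := indexer.MatchPrefix(
    "llama-2-7b",
    -1,
    queryTokens,
    readyPods,
)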

Event Types (`events.go`)

BlockStored: New blocks added to cache
BlockRemoved: Blocks removed from cache
AllBlocksCleared: All blocks cleared for a source (not yet implemented)

Configuration

Environment variables:

AIBRIX_SYNC_MAX_CONTEXTS: Max model contexts (default: 1000)
AIBRIX_SYNC_MAX_PREFIXES_PER_CONTEXT: Max prefixes per context (default: 10000)
AIBRIX_SYNC_EVICTION_INTERVAL_SECONDS: Eviction check interval (default: 60)
AIBRIX_SYNC_EVICTION_DURATION_MINUTES: Time before eviction (default: 20)
AIBRIX_PREFIX_CACHE_BLOCK_SIZE: Token block size (default: 16)
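
As an illustration of how settings like these are typically read, here is a hedged sketch. The helpers `envInt` and `parsePositiveInt` are hypothetical, and falling back to the default on an invalid or non-positive value is an assumption; the package's own parsing rules may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parsePositiveInt returns raw as an int, or def when raw is not a
// positive integer. (Assumed fallback behavior, for illustration only.)
func parsePositiveInt(raw string, def int) int {
	v, err := strconv.Atoi(raw)
	if err != nil || v <= 0 {
		return def
	}
	return v
}

// envInt reads the named environment variable and falls back to def
// when the variable is unset or invalid.
func envInt(name string, def int) int {
	raw, ok := os.LookupEnv(name)
	if !ok {
		return def
	}
	return parsePositiveInt(raw, def)
}

func main() {
	// Variable names and defaults are the ones documented above.
	fmt.Println("max contexts:", envInt("AIBRIX_SYNC_MAX_CONTEXTS", 1000))
	fmt.Println("max prefixes per context:", envInt("AIBRIX_SYNC_MAX_PREFIXES_PER_CONTEXT", 10000))
	fmt.Println("eviction interval (s):", envInt("AIBRIX_SYNC_EVICTION_INTERVAL_SECONDS", 60))
	fmt.Println("eviction duration (min):", envInt("AIBRIX_SYNC_EVICTION_DURATION_MINUTES", 20))
	fmt.Println("block size:", envInt("AIBRIX_PREFIX_CACHE_BLOCK_SIZE", 16))
}
```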

Performance

Optimizations

Lock-free reads for hot paths
Batch eviction to reduce lock contention
XXHash for fast, high-quality hashing
Memory pooling for reduced allocations

Benchmarks

go test -bench=. -benchmem ./pkg/utils/syncprefixcacheindexer/

Key metrics:

Insert: ~200ns per operation
Lookup: ~150ns per operation
Memory: ~64 bytes per prefix entry

Thread Safety

All operations are thread-safe through:

Read-write mutexes for each model context
Atomic operations for statistics
Safe iteration during eviction
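
The combination of a per-context read-write mutex and atomic counters can be sketched as follows. This is an assumption-laden illustration rather than the package's code: `contextStore`, `put`, `get`, and the `lookups` counter are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// contextStore is a hypothetical per-context store with its own RWMutex,
// so contention in one model context does not block another.
type contextStore struct {
	mu       sync.RWMutex
	prefixes map[uint64]string // prefix hash -> pod name (simplified)
	lookups  atomic.Int64      // statistics counter, updated without the lock
}

func newContextStore() *contextStore {
	return &contextStore{prefixes: make(map[uint64]string)}
}

// put takes the write lock because it mutates the map.
func (c *contextStore) put(hash uint64, pod string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.prefixes[hash] = pod
}

// get takes only the read lock, so concurrent lookups do not block each other.
func (c *contextStore) get(hash uint64) (string, bool) {
	c.lookups.Add(1)
	c.mu.RLock()
	defer c.mu.RUnlock()
	pod, ok := c.prefixes[hash]
	return pod, ok
}

func main() {
	store := newContextStore()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n uint64) {
			defer wg.Done()
			store.put(n, fmt.Sprintf("pod-%d", n))
			store.get(n)
		}(uint64(i))
	}
	wg.Wait()
	fmt.Println(store.lookups.Load()) // 8
}
```

Running a sketch like this with `go run -race` is a quick way to confirm that the locking discipline covers every map access.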

Testing

Comprehensive test coverage including:

Unit tests for all operations
Concurrent operation stress tests
Memory leak detection
Performance benchmarks

Run tests:

go test ./pkg/utils/syncprefixcacheindexer/

Example Use Case

In a distributed LLM serving system:

vLLM pods report their cached token prefixes
The indexer maintains a global view of prefix availability
The router queries the indexer to find pods with matching prefixes
Requests are routed to pods with the highest prefix match

This significantly reduces computation by reusing cached KV states.
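
The final routing step can be sketched in a few lines, assuming the router already has a map of pod name to match percentage like the one MatchPrefix returns. `pickPod` and its fallback argument are hypothetical; a real router would likely also weigh load and other signals.

```go
package main

import "fmt"

// pickPod returns the pod with the highest prefix-match percentage.
// Ties are broken by pod name so the result is deterministic; when no pod
// has a positive match, fallback is returned.
func pickPod(matches map[string]int, fallback string) string {
	best, bestScore := fallback, 0
	for pod, score := range matches {
		if score > bestScore || (score == bestScore && score > 0 && pod < best) {
			best, bestScore = pod, score
		}
	}
	return best
}

func main() {
	matches := map[string]int{"10.0.0.1": 50, "10.0.0.2": 75}
	fmt.Println(pickPod(matches, "10.0.0.9"))          // 10.0.0.2
	fmt.Println(pickPod(map[string]int{}, "10.0.0.9")) // 10.0.0.9
}
```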

Documentation ¶

Index ¶

type AllBlocksCleared
type BlockRemoved
type BlockStored
type ContextData
type EngineHashMapping
type ModelContext
type PodInfo
type PrefixStore
type SyncPrefixHashTable
- func NewSyncPrefixHashTable() *SyncPrefixHashTable

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type AllBlocksCleared ¶

type AllBlocksCleared struct {
}

AllBlocksCleared represents an event when all blocks are cleared. This is currently not implemented as per requirements.

type BlockRemoved ¶

type BlockRemoved struct {
	// BlockHashes contains the engine block hashes that were removed
	BlockHashes []int64

	// Context information
	ModelName string
	LoraID    int64 // -1 for no adapter

	// Source pod that removed these blocks (optional)
	SourcePod string
}

BlockRemoved represents an event when blocks are removed

type BlockStored ¶

type BlockStored struct {
	// BlockHashes contains the engine block hashes that were stored
	BlockHashes []int64

	// ParentBlockHash is the optional parent block hash
	// nil means this is the first block in the sequence
	ParentBlockHash *int64

	// Tokens contains the token data for each block
	// The length should match BlockHashes
	Tokens [][]byte

	// Context information
	ModelName string
	LoraID    int64 // -1 for no adapter

	// Source pod that stored these blocks
	SourcePod string
}

BlockStored represents an event when blocks are stored
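
The field comments say that Tokens should have the same length as BlockHashes. A caller could check this before submitting an event, as in the sketch below. The local `blockStored` type mirrors the documented fields only so the example is self-contained, and `validateBlockStored` is hypothetical; whether ProcessBlockStored performs the same checks is not stated here.

```go
package main

import (
	"errors"
	"fmt"
)

// blockStored mirrors the documented BlockStored fields for illustration.
type blockStored struct {
	BlockHashes     []int64
	ParentBlockHash *int64 // nil means first block in the sequence
	Tokens          [][]byte
	ModelName       string
	LoraID          int64 // -1 for no adapter
	SourcePod       string
}

// validateBlockStored checks the invariants described in the field comments.
// The required-field checks are assumptions, not documented behavior.
func validateBlockStored(e blockStored) error {
	if e.ModelName == "" {
		return errors.New("model name is required")
	}
	if e.SourcePod == "" {
		return errors.New("source pod is required")
	}
	if len(e.Tokens) != len(e.BlockHashes) {
		return fmt.Errorf("got %d token slices for %d block hashes",
			len(e.Tokens), len(e.BlockHashes))
	}
	return nil
}

func main() {
	parent := int64(111)
	ok := blockStored{
		BlockHashes:     []int64{222, 333},
		ParentBlockHash: &parent,
		Tokens:          [][]byte{{1, 2}, {3, 4}},
		ModelName:       "llama-2-7b",
		LoraID:          -1,
		SourcePod:       "10.0.0.1",
	}
	fmt.Println(validateBlockStored(ok)) // <nil>

	bad := ok
	bad.Tokens = bad.Tokens[:1]
	fmt.Println(validateBlockStored(bad)) // got 1 token slices for 2 block hashes
}
```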

type ContextData ¶

type ContextData struct {
	// contains filtered or unexported fields
}

ContextData holds all data for a specific context with separate locks

type EngineHashMapping ¶

type EngineHashMapping struct {
	// contains filtered or unexported fields
}

EngineHashMapping maintains a unidirectional mapping

type ModelContext ¶

type ModelContext struct {
	ModelName string
	LoraID    int64 // -1 represents no LoRA adapter
}

ModelContext represents the first-level hash key

type PodInfo ¶

type PodInfo struct {
	LastAccessTime atomic.Int64 // Unix timestamp (lock-free update)
	SourcePod      string
}

PodInfo stores pod access information

type PrefixStore ¶

type PrefixStore struct {
	// contains filtered or unexported fields
}

PrefixStore manages prefix hashes for a specific (model, lora_id) context

type SyncPrefixHashTable ¶

type SyncPrefixHashTable struct {
	// contains filtered or unexported fields
}

SyncPrefixHashTable is the main structure of the package; it provides the indexing and prefix-matching methods

func NewSyncPrefixHashTable ¶

func NewSyncPrefixHashTable() *SyncPrefixHashTable

NewSyncPrefixHashTable creates a new sync prefix hash table

func (*SyncPrefixHashTable) AddPrefix ¶

func (s *SyncPrefixHashTable) AddPrefix(modelName string, loraID int64, podName string, prefixHashes []uint64) error

AddPrefix adds prefix hashes for a specific model/lora context and pod

func (*SyncPrefixHashTable) Close ¶

func (s *SyncPrefixHashTable) Close()

Close stops the eviction worker and cleans up resources
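
A background eviction worker with a clean shutdown path is commonly built from a ticker, a done channel, and a wait group. The sketch below shows that general pattern only; `evictor`, `touch`, and `evictOlderThan` are hypothetical names, timestamps are plain Unix seconds, and the package's worker may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// evictor periodically drops entries whose last-seen time is too old.
type evictor struct {
	mu       sync.Mutex
	lastSeen map[string]int64 // key -> Unix seconds
	done     chan struct{}
	wg       sync.WaitGroup
	once     sync.Once
}

func newEvictor(intervalSeconds, maxAgeSeconds int64) *evictor {
	if intervalSeconds <= 0 {
		intervalSeconds = 60 // time.NewTicker panics on non-positive intervals
	}
	e := &evictor{lastSeen: make(map[string]int64), done: make(chan struct{})}
	e.wg.Add(1)
	go func() {
		defer e.wg.Done()
		ticker := time.NewTicker(time.Duration(intervalSeconds) * time.Second)
		defer ticker.Stop()
		for {
			select {
			case now := <-ticker.C:
				e.evictOlderThan(now.Unix() - maxAgeSeconds)
			case <-e.done:
				return
			}
		}
	}()
	return e
}

// touch records that key was seen at the given Unix time.
func (e *evictor) touch(key string, at int64) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.lastSeen[key] = at
}

// evictOlderThan removes entries last seen before cutoff and returns the count.
func (e *evictor) evictOlderThan(cutoff int64) int {
	e.mu.Lock()
	defer e.mu.Unlock()
	removed := 0
	for key, at := range e.lastSeen {
		if at < cutoff {
			delete(e.lastSeen, key)
			removed++
		}
	}
	return removed
}

// Close stops the worker and waits for it to exit; it is safe to call twice.
func (e *evictor) Close() {
	e.once.Do(func() { close(e.done) })
	e.wg.Wait()
}

func main() {
	e := newEvictor(60, 1200)
	defer e.Close()
	e.touch("stale-prefix", 1000)
	e.touch("fresh-prefix", 5000)
	fmt.Println(e.evictOlderThan(2000)) // 1
}
```

Waiting on the wait group inside Close means no eviction pass can still be running once Close returns.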

func (*SyncPrefixHashTable) GetPrefixHashes ¶

func (s *SyncPrefixHashTable) GetPrefixHashes(tokens []byte) []uint64

GetPrefixHashes computes prefix hashes for given tokens
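
One common way to derive prefix hashes is to split the tokens into fixed-size blocks and chain each block's hash with the previous one, so that hash i identifies the whole prefix up to block i. The sketch below illustrates that idea under several assumptions: it uses FNV-1a from the standard library as a stand-in for XXHash, it ignores a trailing partial block, and `chainedPrefixHashes` is a hypothetical name. The real GetPrefixHashes may compute different values.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// chainedPrefixHashes splits tokens into full blocks of blockSize bytes and
// hashes each block together with the previous block's hash.
// A trailing partial block is ignored (an assumption of this sketch).
func chainedPrefixHashes(tokens []byte, blockSize int) []uint64 {
	if blockSize <= 0 {
		return nil
	}
	hashes := make([]uint64, 0, len(tokens)/blockSize)
	var prev uint64
	var seed [8]byte
	for start := 0; start+blockSize <= len(tokens); start += blockSize {
		h := fnv.New64a() // stand-in for XXHash
		binary.LittleEndian.PutUint64(seed[:], prev)
		h.Write(seed[:])
		h.Write(tokens[start : start+blockSize])
		prev = h.Sum64()
		hashes = append(hashes, prev)
	}
	return hashes
}

func main() {
	a := make([]byte, 40)
	for i := range a {
		a[i] = byte(i)
	}
	b := append([]byte(nil), a...)
	b[20] = 99 // change one byte inside the second block

	ha := chainedPrefixHashes(a, 16)
	hb := chainedPrefixHashes(b, 16)
	// Two full blocks each; the shared first block hashes identically,
	// the differing second block does not.
	fmt.Println(len(ha), ha[0] == hb[0], ha[1] == hb[1]) // 2 true false
}
```

Chaining is what lets a single hash lookup stand for an entire prefix: two requests share hash i only if they agree on every block up to i.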

func (*SyncPrefixHashTable) MatchPrefix ¶

func (s *SyncPrefixHashTable) MatchPrefix(modelName string, loraID int64, tokens []byte, readyPods map[string]struct{}) (map[string]int, []uint64)

MatchPrefix matches the input token prefix against the cached entries. It returns a map from pod name to prefix-match percentage, along with all prefix hashes.
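
To make the percentage result concrete, here is one plausible way to compute it: walk the prefix hashes in order and, for each ready pod, count how many leading hashes it holds. This is a sketch under assumptions, not the package's algorithm; `matchPercent` and the flat `index` map are hypothetical, and omitting pods with no match from the result is a choice made for this example.

```go
package main

import "fmt"

// matchPercent reports, per ready pod, the share of leading prefix hashes
// the pod holds, as an integer percentage. index maps prefix hash -> pods.
func matchPercent(
	index map[uint64]map[string]struct{},
	prefixHashes []uint64,
	readyPods map[string]struct{},
) map[string]int {
	result := make(map[string]int)
	if len(prefixHashes) == 0 {
		return result
	}
	// Copy the ready set so pods can be dropped once their prefix chain breaks.
	candidates := make(map[string]struct{}, len(readyPods))
	for pod := range readyPods {
		candidates[pod] = struct{}{}
	}
	for i, h := range prefixHashes {
		holders := index[h]
		for pod := range candidates {
			if _, ok := holders[pod]; ok {
				result[pod] = (i + 1) * 100 / len(prefixHashes)
			} else {
				delete(candidates, pod) // deleting during range is safe in Go
			}
		}
		if len(candidates) == 0 {
			break
		}
	}
	return result
}

func main() {
	index := map[uint64]map[string]struct{}{
		1: {"pod-a": {}, "pod-b": {}},
		2: {"pod-a": {}, "pod-c": {}},
	}
	ready := map[string]struct{}{"pod-a": {}, "pod-b": {}, "pod-c": {}}
	// pod-a holds hashes 1 and 2 of 4; pod-b holds only hash 1;
	// pod-c lacks hash 1, so it has no matching prefix.
	fmt.Println(matchPercent(index, []uint64{1, 2, 3, 4}, ready)) // map[pod-a:50 pod-b:25]
}
```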

func (*SyncPrefixHashTable) ProcessAllBlocksCleared ¶

func (s *SyncPrefixHashTable) ProcessAllBlocksCleared(event AllBlocksCleared) error

ProcessAllBlocksCleared handles AllBlocksCleared events (placeholder implementation)

func (*SyncPrefixHashTable) ProcessBlockRemoved ¶

func (s *SyncPrefixHashTable) ProcessBlockRemoved(event BlockRemoved) error

ProcessBlockRemoved handles BlockRemoved events

func (*SyncPrefixHashTable) ProcessBlockStored ¶

func (s *SyncPrefixHashTable) ProcessBlockStored(event BlockStored) error

ProcessBlockStored handles BlockStored events

func (*SyncPrefixHashTable) RemovePrefix ¶

func (s *SyncPrefixHashTable) RemovePrefix(modelName string, loraID int64, podName string) error

RemovePrefix removes a specific pod from all prefix entries in the given model/LoRA context
