syncprefixcacheindexer

Version: v0.5.0-rc.1
Published: Oct 25, 2025 | License: Apache-2.0 | Imports: 10 | Imported by: 0

README

Sync Prefix Cache Indexer Package

This package provides a high-performance, thread-safe indexing system for KV cache prefix matching in distributed LLM inference.

Overview

The syncprefixcacheindexer implements a two-level hash table structure optimized for:

  • Fast prefix matching across multiple pods
  • Efficient memory usage with configurable limits
  • Automatic eviction of stale entries
  • Thread-safe concurrent operations

Architecture

Two-Level Hash Structure
  1. First Level: Model context (model name + LoRA ID)
  2. Second Level: Prefix hash to pod mapping

This design enables efficient isolation between different models and adapters while supporting fast lookups.
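
The two levels can be pictured as nested maps. The following is a minimal sketch, not the package's internal layout: `twoLevelIndex`, `add`, and `podsFor` are hypothetical names, and locking and eviction are left out. It only illustrates why the same prefix hash under a different LoRA ID does not collide.

```go
package main

import "fmt"

// ModelContext mirrors the first-level key documented for this package.
type ModelContext struct {
	ModelName string
	LoraID    int64 // -1 represents no LoRA adapter
}

// twoLevelIndex is a hypothetical, non-thread-safe stand-in for the table:
// model context -> prefix hash -> set of pod names.
type twoLevelIndex map[ModelContext]map[uint64]map[string]struct{}

// add records that pod holds prefixHash under ctx, creating levels on demand.
func (idx twoLevelIndex) add(ctx ModelContext, prefixHash uint64, pod string) {
	prefixes, ok := idx[ctx]
	if !ok {
		prefixes = make(map[uint64]map[string]struct{})
		idx[ctx] = prefixes
	}
	pods, ok := prefixes[prefixHash]
	if !ok {
		pods = make(map[string]struct{})
		prefixes[prefixHash] = pods
	}
	pods[pod] = struct{}{}
}

// podsFor returns how many pods hold prefixHash under ctx.
// Reads on missing levels are safe because nil maps return zero values.
func (idx twoLevelIndex) podsFor(ctx ModelContext, prefixHash uint64) int {
	return len(idx[ctx][prefixHash])
}

func main() {
	idx := twoLevelIndex{}
	base := ModelContext{ModelName: "llama-2-7b", LoraID: -1}
	adapter := ModelContext{ModelName: "llama-2-7b", LoraID: 7}

	idx.add(base, 12345, "10.0.0.1")
	idx.add(base, 12345, "10.0.0.2")

	fmt.Println(idx.podsFor(base, 12345))    // 2
	fmt.Println(idx.podsFor(adapter, 12345)) // 0: a different context is isolated
}
```

Because `ModelContext` contains only comparable fields, it can be used directly as a map key.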

Key Components
SyncPrefixHashTable (sync_hash.go)

The main data structure providing:

  • ProcessBlockStored: Index new KV cache blocks
  • ProcessBlockRemoved: Remove specific blocks
  • ProcessAllBlocksCleared: Clear all blocks for a pod (currently a placeholder implementation)
  • MatchPrefix: Find pods with matching token prefixes

Usage:

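// Create the indexer; Close stops the background eviction worker
indexer := NewSyncPrefixHashTable()
defer indexer.Close()

// Process a block stored event
event := BlockStored{
    BlockHashes: []int64{12345},
    Tokens:      [][]byte{tokenBytes}, // one entry per block hash
    ModelName:   "llama-2-7b",
    LoraID:      -1, // -1 means no LoRA adapter
    SourcePod:   "10.0.0.1",
}
if err := indexer.ProcessBlockStored(event); err != nil {
    // handle the error, for example by logging it and skipping the event
}

// Match prefix: matches maps pod name to prefix match percentage,
// hashes holds the prefix hashes computed for queryTokens
matches, hashes := indexer.MatchPrefix(
    "llama-2-7b",
    -1,
    queryTokens,
    readyPods,
)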
Event Types (events.go)
  • BlockStored: New blocks added to cache
  • BlockRemoved: Blocks removed from cache
  • AllBlocksCleared: All blocks cleared for a source

Configuration

Environment variables:

  • AIBRIX_SYNC_MAX_CONTEXTS: Max model contexts (default: 1000)
  • AIBRIX_SYNC_MAX_PREFIXES_PER_CONTEXT: Max prefixes per context (default: 10000)
  • AIBRIX_SYNC_EVICTION_INTERVAL_SECONDS: Eviction check interval (default: 60)
  • AIBRIX_SYNC_EVICTION_DURATION_MINUTES: Time before eviction (default: 20)
  • AIBRIX_PREFIX_CACHE_BLOCK_SIZE: Token block size (default: 16)
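
As an illustration of how these variables and defaults could be read, here is a minimal sketch. The names `indexerConfig`, `loadConfig`, `envInt`, and `parsePositive` are hypothetical, and falling back to the default on unset, non-numeric, or non-positive values is an assumption; the package may validate differently.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// parsePositive returns raw as a positive int, or def when raw is not one.
func parsePositive(raw string, def int) int {
	v, err := strconv.Atoi(raw)
	if err != nil || v <= 0 {
		return def
	}
	return v
}

// envInt reads name from the environment, falling back to def when unset or invalid.
func envInt(name string, def int) int {
	raw, ok := os.LookupEnv(name)
	if !ok {
		return def
	}
	return parsePositive(raw, def)
}

// indexerConfig is a hypothetical container for the documented settings.
type indexerConfig struct {
	MaxContexts           int
	MaxPrefixesPerContext int
	EvictionInterval      time.Duration
	EvictionDuration      time.Duration
	BlockSize             int
}

// loadConfig applies the defaults listed in this README.
func loadConfig() indexerConfig {
	return indexerConfig{
		MaxContexts:           envInt("AIBRIX_SYNC_MAX_CONTEXTS", 1000),
		MaxPrefixesPerContext: envInt("AIBRIX_SYNC_MAX_PREFIXES_PER_CONTEXT", 10000),
		EvictionInterval:      time.Duration(envInt("AIBRIX_SYNC_EVICTION_INTERVAL_SECONDS", 60)) * time.Second,
		EvictionDuration:      time.Duration(envInt("AIBRIX_SYNC_EVICTION_DURATION_MINUTES", 20)) * time.Minute,
		BlockSize:             envInt("AIBRIX_PREFIX_CACHE_BLOCK_SIZE", 16),
	}
}

func main() {
	fmt.Printf("%+v\n", loadConfig())
}
```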

Performance

Optimizations
  • Lock-free reads for hot paths
  • Batch eviction to reduce lock contention
  • XXHash for fast, high-quality hashing
  • Memory pooling for reduced allocations
Benchmarks
go test -bench=. -benchmem ./pkg/utils/syncprefixcacheindexer/

Key metrics:

  • Insert: ~200ns per operation
  • Lookup: ~150ns per operation
  • Memory: ~64 bytes per prefix entry

Thread Safety

All operations are thread-safe through:

  • Read-write mutexes for each model context
  • Atomic operations for statistics
  • Safe iteration during eviction
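
The combination of a per-context read-write mutex and atomic counters can be sketched as follows. This is an illustrative pattern only: `contextStore`, `stats`, and `runConcurrent` are hypothetical names, and the store is simplified to one pod per prefix hash.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// contextStore is a hypothetical per-context store guarded by its own RWMutex,
// so activity in one model context does not block another.
type contextStore struct {
	mu       sync.RWMutex
	prefixes map[uint64]string // prefix hash -> pod name (simplified)
}

// stats uses atomics so counters can be updated without holding any lock.
type stats struct {
	inserts atomic.Int64
	lookups atomic.Int64
}

// insert takes the write lock only for the map update.
func (c *contextStore) insert(st *stats, hash uint64, pod string) {
	c.mu.Lock()
	c.prefixes[hash] = pod
	c.mu.Unlock()
	st.inserts.Add(1)
}

// lookup takes the read lock, so concurrent lookups do not block each other.
func (c *contextStore) lookup(st *stats, hash uint64) (string, bool) {
	c.mu.RLock()
	pod, ok := c.prefixes[hash]
	c.mu.RUnlock()
	st.lookups.Add(1)
	return pod, ok
}

// runConcurrent performs n inserts and n lookups from n goroutines and
// returns the counters and the final number of stored prefixes.
func runConcurrent(n int) (inserts, lookups int64, size int) {
	store := &contextStore{prefixes: make(map[uint64]string)}
	var st stats
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(hash uint64) {
			defer wg.Done()
			store.insert(&st, hash, "10.0.0.1")
			store.lookup(&st, hash)
		}(uint64(i))
	}
	wg.Wait()
	return st.inserts.Load(), st.lookups.Load(), len(store.prefixes)
}

func main() {
	fmt.Println(runConcurrent(8)) // 8 8 8
}
```

Running a sketch like this with `go run -race` is a quick way to confirm that the locking covers every map access.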

Testing

Comprehensive test coverage including:

  • Unit tests for all operations
  • Concurrent operation stress tests
  • Memory leak detection
  • Performance benchmarks

Run tests:

go test ./pkg/utils/syncprefixcacheindexer/

Example Use Case

In a distributed LLM serving system:

  1. vLLM pods report their cached token prefixes
  2. The indexer maintains a global view of prefix availability
  3. The router queries the indexer to find pods with matching prefixes
  4. Requests are routed to pods with the highest prefix match

This significantly reduces computation by reusing cached KV states.
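
Step 4 can be sketched as a selection over the per-pod scores that MatchPrefix returns. `pickPod` is a hypothetical helper, not part of this package, and breaking ties by pod name is an assumption made here only to keep the result deterministic; a real router would likely also weigh load.

```go
package main

import (
	"fmt"
	"sort"
)

// pickPod returns the pod with the highest match score.
// Ties go to the lexicographically smallest pod name.
// It returns false when there are no candidates, in which case
// the caller would fall back to its default routing policy.
func pickPod(matches map[string]int) (string, bool) {
	if len(matches) == 0 {
		return "", false
	}
	pods := make([]string, 0, len(matches))
	for pod := range matches {
		pods = append(pods, pod)
	}
	sort.Strings(pods) // fixed order so map iteration randomness does not matter

	best := pods[0]
	for _, pod := range pods[1:] {
		if matches[pod] > matches[best] {
			best = pod
		}
	}
	return best, true
}

func main() {
	matches := map[string]int{"10.0.0.1": 50, "10.0.0.2": 100, "10.0.0.3": 25}
	pod, ok := pickPod(matches)
	fmt.Println(pod, ok) // 10.0.0.2 true
}
```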

Documentation

Types

type AllBlocksCleared

type AllBlocksCleared struct {
}

AllBlocksCleared represents an event when all blocks are cleared. This is currently not implemented as per requirements.

type BlockRemoved

type BlockRemoved struct {
	// BlockHashes contains the engine block hashes that were removed
	BlockHashes []int64

	// Context information
	ModelName string
	LoraID    int64 // -1 for no adapter

	// Source pod that removed these blocks (optional)
	SourcePod string
}

BlockRemoved represents an event when blocks are removed

type BlockStored

type BlockStored struct {
	// BlockHashes contains the engine block hashes that were stored
	BlockHashes []int64

	// ParentBlockHash is the optional parent block hash
	// nil means this is the first block in the sequence
	ParentBlockHash *int64

	// Tokens contains the token data for each block
	// The length should match BlockHashes
	Tokens [][]byte

	// Context information
	ModelName string
	LoraID    int64 // -1 for no adapter

	// Source pod that stored these blocks
	SourcePod string
}

BlockStored represents an event when blocks are stored
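
The field comments above imply a few invariants a producer can check before sending an event. The sketch below copies the struct locally so it compiles on its own; `validate` is a hypothetical helper, and treating an empty SourcePod as an error is an assumption (the comments only mark it optional for BlockRemoved). ProcessBlockStored may enforce different rules.

```go
package main

import (
	"errors"
	"fmt"
)

// BlockStored is a local copy of the documented event type.
type BlockStored struct {
	BlockHashes     []int64
	ParentBlockHash *int64 // nil means first block in the sequence
	Tokens          [][]byte
	ModelName       string
	LoraID          int64 // -1 for no adapter
	SourcePod       string
}

// validate checks the invariants stated in the field comments.
func validate(e BlockStored) error {
	if len(e.BlockHashes) == 0 {
		return errors.New("event carries no block hashes")
	}
	if len(e.Tokens) != len(e.BlockHashes) {
		return fmt.Errorf("got %d token blocks for %d block hashes",
			len(e.Tokens), len(e.BlockHashes))
	}
	if e.SourcePod == "" {
		return errors.New("source pod is empty") // assumed to be required
	}
	return nil
}

func main() {
	good := BlockStored{
		BlockHashes: []int64{12345},
		Tokens:      [][]byte{{1, 2, 3}},
		ModelName:   "llama-2-7b",
		LoraID:      -1,
		SourcePod:   "10.0.0.1",
	}
	bad := good
	bad.Tokens = nil // length no longer matches BlockHashes

	fmt.Println(validate(good) == nil) // true
	fmt.Println(validate(bad))
}
```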

type ContextData

type ContextData struct {
	// contains filtered or unexported fields
}

ContextData holds all data for a specific context with separate locks

type EngineHashMapping

type EngineHashMapping struct {
	// contains filtered or unexported fields
}

EngineHashMapping maintains a unidirectional mapping

type ModelContext

type ModelContext struct {
	ModelName string
	LoraID    int64 // -1 represents no LoRA adapter
}

ModelContext represents the first-level hash key

type PodInfo

type PodInfo struct {
	LastAccessTime atomic.Int64 // Unix timestamp (lock-free update)
	SourcePod      string
}

PodInfo stores pod access information
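
An atomic timestamp lets readers refresh an entry without taking a write lock, and lets the eviction worker decide staleness with a single load. The sketch below shows that pattern on a local copy of the type; `touch` and `isStale` are hypothetical methods, and interpreting the timestamp as Unix seconds is an assumption.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// PodInfo is a local copy of the documented type.
type PodInfo struct {
	LastAccessTime atomic.Int64 // Unix timestamp, assumed to be in seconds
	SourcePod      string
}

// touch records an access at nowUnix without taking any lock.
func (p *PodInfo) touch(nowUnix int64) {
	p.LastAccessTime.Store(nowUnix)
}

// isStale reports whether the entry has been idle for more than maxIdleSeconds.
func (p *PodInfo) isStale(nowUnix, maxIdleSeconds int64) bool {
	return nowUnix-p.LastAccessTime.Load() > maxIdleSeconds
}

func main() {
	const maxIdleSeconds = 20 * 60 // matches the documented 20 minute default

	p := &PodInfo{SourcePod: "10.0.0.1"}
	now := time.Now().Unix()
	p.touch(now)

	fmt.Println(p.isStale(now, maxIdleSeconds))                  // false
	fmt.Println(p.isStale(now+maxIdleSeconds+1, maxIdleSeconds)) // true
}
```

Since `atomic.Int64` must not be copied after first use, values of this type are best handled through pointers.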

type PrefixStore

type PrefixStore struct {
	// contains filtered or unexported fields
}

PrefixStore manages prefix hashes for a specific (model, lora_id) context

type SyncPrefixHashTable

type SyncPrefixHashTable struct {
	// contains filtered or unexported fields
}

SyncPrefixHashTable is the main structure

func NewSyncPrefixHashTable

func NewSyncPrefixHashTable() *SyncPrefixHashTable

NewSyncPrefixHashTable creates a new sync prefix hash table

func (*SyncPrefixHashTable) AddPrefix

func (s *SyncPrefixHashTable) AddPrefix(modelName string, loraID int64, podName string, prefixHashes []uint64) error

AddPrefix adds prefix hashes for a specific model/lora context and pod

func (*SyncPrefixHashTable) Close

func (s *SyncPrefixHashTable) Close()

Close stops the eviction worker and cleans up resources

func (*SyncPrefixHashTable) GetPrefixHashes

func (s *SyncPrefixHashTable) GetPrefixHashes(tokens []byte) []uint64

GetPrefixHashes computes prefix hashes for given tokens
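
One common way to derive prefix hashes is to hash fixed-size blocks and chain each block's hash into the next, so that a hash identifies the whole prefix up to that block. The sketch below illustrates that idea under several assumptions: the README says the package uses XXHash, whereas this example substitutes FNV-1a from the standard library; the chaining scheme, the dropping of a trailing partial block, and the name `prefixHashes` are all hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// prefixHashes splits tokens into full blocks of blockSize bytes and returns
// one hash per block. Each hash covers the previous hash plus the block's
// bytes, so equal hashes at position i imply equal prefixes up to block i
// (barring collisions). A trailing partial block is ignored.
func prefixHashes(tokens []byte, blockSize int) []uint64 {
	if blockSize <= 0 {
		return nil
	}
	hashes := make([]uint64, 0, len(tokens)/blockSize)
	var parent uint64
	var seed [8]byte
	for start := 0; start+blockSize <= len(tokens); start += blockSize {
		h := fnv.New64a()
		binary.LittleEndian.PutUint64(seed[:], parent)
		h.Write(seed[:])                         // chain in the previous block's hash
		h.Write(tokens[start : start+blockSize]) // then this block's bytes
		parent = h.Sum64()
		hashes = append(hashes, parent)
	}
	return hashes
}

func main() {
	a := make([]byte, 40) // two full 16-byte blocks plus 8 leftover bytes
	b := make([]byte, 40)
	b[20] = 1 // differs only inside the second block

	ha := prefixHashes(a, 16)
	hb := prefixHashes(b, 16)
	fmt.Println(len(ha), ha[0] == hb[0], ha[1] == hb[1]) // 2 true false
}
```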

func (*SyncPrefixHashTable) MatchPrefix

func (s *SyncPrefixHashTable) MatchPrefix(modelName string, loraID int64, tokens []byte, readyPods map[string]struct{}) (map[string]int, []uint64)

MatchPrefix matches the input tokens against cached prefixes. It returns a map from pod name to prefix match percentage, along with all prefix hashes computed for the input.
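
To make the returned percentage concrete, the sketch below scores each ready pod by the share of leading prefix hashes it holds. It is an assumption about how such a score could be computed, not the package's algorithm: `matchPercent` and the flat `index` map are hypothetical, and pods with no match are simply left out of the result.

```go
package main

import "fmt"

// matchPercent walks hashes in order and, for each ready pod, reports the
// percentage of leading hashes that pod holds. A pod stops being a candidate
// at the first hash it is missing, because a prefix must match contiguously.
func matchPercent(index map[uint64]map[string]struct{}, hashes []uint64,
	readyPods map[string]struct{}) map[string]int {

	result := make(map[string]int)
	if len(hashes) == 0 {
		return result
	}
	// Copy the ready set so the caller's map is not modified.
	candidates := make(map[string]struct{}, len(readyPods))
	for pod := range readyPods {
		candidates[pod] = struct{}{}
	}
	for i, h := range hashes {
		holders := index[h]
		for pod := range candidates {
			if _, ok := holders[pod]; ok {
				result[pod] = (i + 1) * 100 / len(hashes)
			} else {
				delete(candidates, pod) // deleting during range is safe in Go
			}
		}
		if len(candidates) == 0 {
			break
		}
	}
	return result
}

func main() {
	set := func(pods ...string) map[string]struct{} {
		s := make(map[string]struct{}, len(pods))
		for _, p := range pods {
			s[p] = struct{}{}
		}
		return s
	}
	index := map[uint64]map[string]struct{}{
		1: set("a", "b"),
		2: set("a"),
		3: set("a", "c"),
	}
	got := matchPercent(index, []uint64{1, 2, 3, 4}, set("a", "b", "c"))
	fmt.Println(got) // map[a:75 b:25]
}
```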

func (*SyncPrefixHashTable) ProcessAllBlocksCleared

func (s *SyncPrefixHashTable) ProcessAllBlocksCleared(event AllBlocksCleared) error

ProcessAllBlocksCleared handles AllBlocksCleared events (placeholder implementation)

func (*SyncPrefixHashTable) ProcessBlockRemoved

func (s *SyncPrefixHashTable) ProcessBlockRemoved(event BlockRemoved) error

ProcessBlockRemoved handles BlockRemoved events

func (*SyncPrefixHashTable) ProcessBlockStored

func (s *SyncPrefixHashTable) ProcessBlockStored(event BlockStored) error

ProcessBlockStored handles BlockStored events

func (*SyncPrefixHashTable) RemovePrefix

func (s *SyncPrefixHashTable) RemovePrefix(modelName string, loraID int64, podName string) error

RemovePrefix removes a specific pod from all prefix entries
