A lightweight P2P-based cache system for distributing models on Kubernetes.
Name Story: The name Manta is inspired by Manta Style, an item in Dota 2 that creates two images of your hero, much like peers in a P2P network.
Architecture
Note: llmaz is just one of the possible integrations; Manta can also be deployed and used independently.
Features Overview
Model Preheat: Models can be preloaded into clusters, or onto specified nodes, to accelerate model serving.
Model Cache: Models are cached after downloading for faster loading.
Model Lifecycle Management: The model lifecycle is managed automatically according to policies such as Retain or Delete.
Plugin Framework: Filter and Score plugins can be extended to pick the best candidates.
Memory Management (WIP): Manages the memory reserved for caching, using an LRU algorithm for garbage collection (GC).
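To illustrate how a Filter/Score plugin framework can pick a candidate, here is a minimal Python sketch of two-phase selection in the spirit of the Kubernetes scheduling framework's Filter and Score extension points. It is not Manta's actual (Go) API: the `Node` fields, the `DiskFilter` and `CacheHitScore` plugins, and `pick_best` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol


@dataclass
class Node:
    # Hypothetical candidate description; Manta's real types may differ.
    name: str
    free_disk_gb: float
    has_model: bool


class FilterPlugin(Protocol):
    def filter(self, model: str, node: Node) -> bool: ...


class ScorePlugin(Protocol):
    def score(self, model: str, node: Node) -> float: ...


class DiskFilter:
    """Hypothetical plugin: reject nodes without enough free disk."""

    def __init__(self, required_gb: float):
        self.required_gb = required_gb

    def filter(self, model: str, node: Node) -> bool:
        return node.free_disk_gb >= self.required_gb


class CacheHitScore:
    """Hypothetical plugin: prefer nodes that already hold the model."""

    def score(self, model: str, node: Node) -> float:
        return 100.0 if node.has_model else 0.0


def pick_best(model: str, nodes: List[Node],
              filters: List[FilterPlugin],
              scorers: List[ScorePlugin]) -> Optional[Node]:
    # Phase 1 (Filter): keep only nodes accepted by every filter plugin.
    feasible = [n for n in nodes if all(f.filter(model, n) for f in filters)]
    if not feasible:
        return None
    # Phase 2 (Score): sum all scorer outputs; the highest total wins,
    # and ties go to the earliest node in input order.
    return max(feasible, key=lambda n: sum(s.score(model, n) for s in scorers))
```

For example, with nodes `node-a` (50 GB free, no copy), `node-b` (200 GB free, cached) and `node-c` (10 GB free, cached), `pick_best("my-model", nodes, [DiskFilter(20)], [CacheHitScore()])` filters out `node-c` and returns `node-b`. Keeping filtering (hard constraints) separate from scoring (soft preferences) lets new plugins be added without touching the selection loop.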
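Since memory management is still a work in progress, the following is only a hedged sketch of what LRU-based GC over a reserved capacity could look like, not Manta's implementation. It assumes each cached model has a known size in bytes; the class `LRUModelCache` and its methods are hypothetical.

```python
from collections import OrderedDict
from typing import List


class LRUModelCache:
    """Hypothetical byte-bounded LRU cache; not Manta's actual implementation."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes  # memory reserved for caching
        self.used_bytes = 0
        # Model name -> size in bytes, ordered least to most recently used.
        self._entries = OrderedDict()

    def get(self, model: str) -> bool:
        """Return True on a cache hit and mark the model as recently used."""
        if model not in self._entries:
            return False
        self._entries.move_to_end(model)
        return True

    def put(self, model: str, size_bytes: int) -> List[str]:
        """Cache a model and return the names evicted to make room."""
        if size_bytes > self.capacity_bytes:
            raise ValueError(f"{model} ({size_bytes} B) exceeds the reserved capacity")
        if model in self._entries:
            # Re-inserting a model: release its old size first.
            self.used_bytes -= self._entries.pop(model)
        evicted = []
        # GC: evict least recently used models until the new one fits.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            name, size = self._entries.popitem(last=False)
            self.used_bytes -= size
            evicted.append(name)
        self._entries[model] = size_bytes
        self.used_bytes += size_bytes
        return evicted
```

With a 100-byte budget, caching `model-a` and `model-b` (40 bytes each), then reading `model-a`, means a later `put("model-c", 40)` evicts `model-b`, the least recently used entry, rather than `model-a`.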