A lightweight P2P-based cache system for model distribution.
Name Story: the name Manta is inspired by Manta Style, an item in Dota 2 that creates two illusions of your hero, much like peers in a P2P network.
Architecture
Note: llmaz is just one possible integration; Manta can be deployed and used independently.
Features Overview
Preheat Models: Models can be preloaded into the cluster, or even onto specified nodes, to accelerate model serving.
Model Caching: Once a model has been downloaded, it no longer needs to be fetched from the origin; other nodes can pull it from peers instead.
Plugin Framework: The Filter and Score extension points can be customized with plugins to pick the right peers.
Model LCM (lifecycle management): Model lifecycles are managed automatically under different configurations.
Memory Management (WIP): Specify the maximum memory reserved for use, with garbage collection (GC) based on the LRU algorithm.
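The Filter and Score extension points follow a two-phase pattern, similar in spirit to the Kubernetes scheduler framework: filters drop peers that cannot serve a request, then scorers rank the survivors. The Go sketch below illustrates that idea only. `Peer`, `FilterPlugin`, `ScorePlugin`, `HasModel`, `LeastLoaded`, and `SelectPeer` are hypothetical names chosen for this example, not Manta's actual API.

```go
package main

import "fmt"

// Hypothetical sketch of a Filter/Score peer-selection pipeline; not Manta's actual API.

// Peer is an assumed view of a node that may hold cached model files.
type Peer struct {
	Name            string
	CachedModels    map[string]bool // models already present on this peer
	ActiveDownloads int             // current load, used for scoring
}

// FilterPlugin removes peers that cannot serve the request.
type FilterPlugin interface {
	Filter(model string, p Peer) bool
}

// ScorePlugin ranks peers that survived filtering; higher is better.
type ScorePlugin interface {
	Score(model string, p Peer) int
}

// HasModel keeps only peers that already cache the requested model.
type HasModel struct{}

func (HasModel) Filter(model string, p Peer) bool { return p.CachedModels[model] }

// LeastLoaded prefers peers with fewer in-flight downloads.
type LeastLoaded struct{}

func (LeastLoaded) Score(_ string, p Peer) int { return -p.ActiveDownloads }

// SelectPeer runs every filter, sums every score, and returns the best peer.
// ok is false when no peer passes the filters.
func SelectPeer(model string, peers []Peer, filters []FilterPlugin, scorers []ScorePlugin) (best Peer, ok bool) {
	bestScore := 0
	for _, p := range peers {
		passed := true
		for _, f := range filters {
			if !f.Filter(model, p) {
				passed = false
				break
			}
		}
		if !passed {
			continue
		}
		total := 0
		for _, s := range scorers {
			total += s.Score(model, p)
		}
		// On ties the first peer seen wins, so the result is deterministic.
		if !ok || total > bestScore {
			best, bestScore, ok = p, total, true
		}
	}
	return best, ok
}

func main() {
	peers := []Peer{
		{Name: "node-a", CachedModels: map[string]bool{"demo-model": true}, ActiveDownloads: 3},
		{Name: "node-b", CachedModels: map[string]bool{"demo-model": true}, ActiveDownloads: 1},
		{Name: "node-c", CachedModels: map[string]bool{}, ActiveDownloads: 0},
	}
	best, ok := SelectPeer("demo-model", peers, []FilterPlugin{HasModel{}}, []ScorePlugin{LeastLoaded{}})
	if ok {
		fmt.Println("pull from", best.Name) // prints "pull from node-b"
	} else {
		fmt.Println("no peer has the model; fetch from origin")
	}
}
```

Returning `ok == false` when no peer survives filtering gives the caller a natural place to fall back to the origin, which fits the Model Caching behavior described above. New policies are added by implementing one more interface rather than editing `SelectPeer`.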
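For the LRU-based garbage collection mentioned under Memory Management (still WIP), a minimal Go sketch could track each cached model's size against a reserved memory limit and evict the least recently used models when a new one would not fit. This assumes whole models are the unit of eviction and that their sizes are known up front; `LRUCache`, `NewLRUCache`, `Add`, and `Get` are hypothetical names, not Manta's implementation.

```go
package main

import (
	"container/list"
	"fmt"
)

// Hypothetical sketch of LRU eviction under a memory limit; not Manta's actual code.

// entry is one cached model and the memory it occupies.
type entry struct {
	name string
	size int64
}

// LRUCache evicts least recently used models once the limit would be exceeded.
type LRUCache struct {
	maxBytes  int64
	usedBytes int64
	order     *list.List               // front = most recently used
	items     map[string]*list.Element // model name -> list node
}

func NewLRUCache(maxBytes int64) *LRUCache {
	return &LRUCache{maxBytes: maxBytes, order: list.New(), items: make(map[string]*list.Element)}
}

// Get marks a model as recently used and reports whether it is cached.
func (c *LRUCache) Get(name string) bool {
	el, ok := c.items[name]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

// Add inserts (or refreshes) a model and returns the names evicted to make room.
// A model larger than maxBytes is rejected and nothing is evicted.
func (c *LRUCache) Add(name string, size int64) (evicted []string, ok bool) {
	if size > c.maxBytes {
		return nil, false
	}
	// Drop any previous copy so its size is not counted twice.
	if el, exists := c.items[name]; exists {
		c.usedBytes -= el.Value.(*entry).size
		c.order.Remove(el)
		delete(c.items, name)
	}
	// Evict from the back (least recently used) until the new model fits.
	for c.usedBytes+size > c.maxBytes {
		oldest := c.order.Back()
		e := oldest.Value.(*entry)
		c.order.Remove(oldest)
		delete(c.items, e.name)
		c.usedBytes -= e.size
		evicted = append(evicted, e.name)
	}
	c.items[name] = c.order.PushFront(&entry{name: name, size: size})
	c.usedBytes += size
	return evicted, true
}

func main() {
	c := NewLRUCache(10) // units are up to the caller, e.g. GiB
	c.Add("model-a", 4)
	c.Add("model-b", 4)
	c.Get("model-a")                  // model-a becomes most recently used
	evicted, _ := c.Add("model-c", 4) // needs room, so model-b is evicted
	fmt.Println(evicted)              // prints [model-b]
}
```

Pairing a doubly linked list with a map is the standard way to build an LRU: lookups, promotions, and evictions are all O(1).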