
    TensorFusion.AI
Next-Generation GPU Virtualization and Pooling for Enterprises
Less GPUs, More AI Apps.
    
    Explore the docs »
    
    View Demo | Report Bug | Request Feature
  
Tensor Fusion

Tensor Fusion is a GPU virtualization and pooling solution designed to maximize GPU cluster utilization.
Highlights
- Fractional GPU with Single TFlops/MiB Precision
- Battle-tested GPU-over-IP Remote GPU Sharing
- GPU-first Scheduling and Auto-scaling
- Computing Oversubscription and GPU VRAM Expansion
- GPU Pooling, Monitoring, Live Migration, AI Model Preloading and more
Demo
Fractional vGPU & GPU-over-IP & Distributed Allocation

AI Infra Console

GPU Live-migration [End-to-end feature WIP]
https://cdn.tensor-fusion.ai/GPU_Content_Migration.mp4
Quick Start
Onboard Your Own AI Infra
Discussion
Features & Roadmap
Core GPU Virtualization Features
-  Fractional GPU and flexible oversubscription
-  Remote GPU sharing with SOTA GPU-over-IP technology, less than 4% performance loss
-  GPU VRAM expansion and hot/warm/cold tiering
-  Non-NVIDIA GPU/NPU vendor support
Pooling & Scheduling & Management
-  GPU/NPU pool management in Kubernetes
-  GPU-first scheduling and allocation, with single TFlops/MiB precision
-  GPU node auto provisioning/termination
-  GPU compaction/bin-packing
-  Seamless onboarding experience for PyTorch, TensorFlow, llama.cpp, vLLM, TensorRT, SGLang and all popular AI training/serving frameworks
-  Centralized Dashboard & Control Plane
-  GPU-first autoscaling policies that automatically set requests/limits/replicas
-  Request multiple vGPUs with group scheduling for large models
-  Support different QoS levels
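As an illustration of what fractional requests and GPU compaction/bin-packing mean in practice, the following is a minimal first-fit-decreasing sketch in Python. It is not Tensor Fusion's scheduler: the `Gpu`, `Request`, and `pack` names are hypothetical, and the capacities and request sizes are made-up numbers chosen only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """One physical GPU and its remaining capacity (hypothetical model)."""
    name: str
    free_tflops: float
    free_vram_mib: int
    tenants: list = field(default_factory=list)


@dataclass(frozen=True)
class Request:
    """A fractional vGPU request expressed in TFLOPS and MiB of VRAM."""
    name: str
    tflops: float
    vram_mib: int


def pack(requests, gpus):
    """Place requests first-fit-decreasing.

    Returns a dict mapping each request name to a GPU name,
    or to None when no GPU has enough free capacity.
    """
    placement = {}
    # Handle the largest VRAM consumers first so they are not stranded.
    ordered = sorted(requests, key=lambda r: (r.vram_mib, r.tflops), reverse=True)
    for req in ordered:
        placement[req.name] = None
        for gpu in gpus:
            fits = gpu.free_tflops >= req.tflops and gpu.free_vram_mib >= req.vram_mib
            if fits:
                gpu.free_tflops -= req.tflops
                gpu.free_vram_mib -= req.vram_mib
                gpu.tenants.append(req.name)
                placement[req.name] = gpu.name
                break
    return placement


# Illustrative capacities and workloads (assumed values, not measurements).
gpus = [Gpu("gpu-0", 100.0, 24576), Gpu("gpu-1", 100.0, 24576)]
requests = [
    Request("llm", 60.0, 16384),
    Request("embedder", 10.0, 4096),
    Request("asr", 20.0, 8192),
    Request("ranker", 10.0, 2048),
]
placement = pack(requests, gpus)
print(placement)
```

Because GPUs are tried in a fixed order, workloads are packed onto `gpu-0` until it runs out of compute or VRAM before `gpu-1` is used, which is the basic idea behind compaction: fewer partially used GPUs, and idle nodes that can be released.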
Enterprise Features
-  GPU live migration: snapshot, distribute, and restore GPU context across clusters, fastest in the world
-  AI model registry and preloading, build your own private MaaS (Model-as-a-Service)
-  Advanced auto-scaling policies, scale to zero, rebalancing of hot GPUs
-  Advanced observability features, detailed metrics & tracing/profiling of CUDA calls
-  Monetize your GPU cluster with multi-tenant usage metering & billing reports
-  Enterprise-level high availability and resilience, with topology-aware scheduling, GPU node auto-failover, etc.
-  Enterprise-level security, complete on-premises deployment support, encryption in transit & at rest
-  Enterprise-level compliance: SSO/SAML support, advanced audit, ReBAC control, SOC 2 and other compliance reports available
-  Run on Linux Kubernetes clusters
-  Run on Linux VMs or bare metal (one-click onboarding to edge K3s)
-  Run on Windows (docs not ready, contact us for support)
-  Run on macOS (imagine mounting a virtual NVIDIA GPU device on macOS!)
See the open issues for a full list of proposed features (and known issues).
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
- This repo is open source under the Apache 2.0 License and includes the GPU pooling, scheduling, and management features; you can use and modify it for free.
- GPU virtualization and GPU-over-IP features are also free to use as part of the Community Plan, but their implementation is not fully open source.
- Features listed under "Enterprise Features" above are paid; licensed users unlock them automatically.
