# Serverless-sample-apps

Sample applications for serverless testing.
## Quick Start

### Deploy the http server

```bash
make docker-build
kubectl apply -f install/http_server.yaml
```
### Test the http server

- Get the pod name:

  ```bash
  kubectl get pods
  ```

- Forward the port in another terminal:

  ```bash
  kubectl port-forward <pod name> 8080:8080
  ```

- Send a test request:

  ```bash
  curl localhost:8080
  ```
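For context, the deployed server is a plain HTTP service answering on port 8080 (hence the port-forward and curl commands above). A minimal stand-in, purely illustrative and not this repo's actual implementation, could look like this:

```python
# Hypothetical stand-in for the sample http server; the real one is
# built by `make docker-build` and deployed via install/http_server.yaml.
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from the serverless sample app\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Port 8080 matches the port-forward command above.
    HTTPServer(("", 8080), Handler).serve_forever()
```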
## Model Test

This part shows how to download models from Hugging Face (via Transformers).
### Prerequisites

- Create a virtual environment.
- Install PyTorch and Transformers (see the example commands below).
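A minimal setup might look like the following; the package names are the standard PyPI ones, and `accelerate` is included here as an assumption because distributing a model with `device_map="auto"` relies on it:

```bash
python -m venv .venv
source .venv/bin/activate
pip install torch transformers accelerate
```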
### Download the model

Please note that, by default, the following command downloads `opt-125m`, which contains 125 million parameters, and tries to distribute the model over all available devices.

```bash
python python/opt/download.py
```
To limit the devices used, set `CUDA_VISIBLE_DEVICES`. For example, the following lines run two opt-13b models, one on GPUs 0 and 1 and the other on GPUs 2 and 3, which is the case shown in the figure below.

```bash
CUDA_VISIBLE_DEVICES=0,1 python python/opt/download.py --model-name opt-13b &
CUDA_VISIBLE_DEVICES=2,3 python python/opt/download.py --model-name opt-13b &
```
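For reference, the core of a script like `python/opt/download.py` plausibly looks like the sketch below. This is an assumption based on the behavior described above, not the repo's actual source: `device_map="auto"` is what spreads weights across all GPUs visible through `CUDA_VISIBLE_DEVICES`.

```python
# Hypothetical sketch of what python/opt/download.py might do.
# device_map="auto" (which requires the `accelerate` package) distributes
# the model over all devices visible through CUDA_VISIBLE_DEVICES.
import argparse

from transformers import AutoModelForCausalLM, AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--model-name", default="opt-125m")
args = parser.parse_args()

model_id = f"facebook/{args.model_name}"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Quick sanity check: generate a few tokens.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```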
## C++ Model Test

This part shows how to run model inference with C++. Make sure you have finished the steps above.

### Prerequisites

- Download and install the PyTorch C++ library (LibTorch) following the official installation guide, and make sure you can successfully run its minimal example.
- Transform the model to TorchScript. The following commands trace the model and save it to `python/opt/[model_name].pt`.

```bash
cd python/opt
CUDA_VISIBLE_DEVICES=0 python download_and_trace.py --model-name opt-1.3b
```
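The tracing step presumably boils down to `torch.jit.trace` plus `save`, roughly as sketched below. The example input and the `torchscript=True` flag are assumptions, not taken from `download_and_trace.py` itself; the flag makes Hugging Face models return tuples instead of dicts, which tracing requires.

```python
# Hypothetical sketch of the tracing done by download_and_trace.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "opt-1.3b"  # matches the --model-name flag above
model_id = f"facebook/{model_name}"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# torchscript=True makes the model return tuples, as torch.jit.trace needs.
model = AutoModelForCausalLM.from_pretrained(model_id, torchscript=True).eval()

example = tokenizer("Hello, my name is", return_tensors="pt")
traced = torch.jit.trace(model, (example["input_ids"], example["attention_mask"]))
traced.save(f"{model_name}.pt")  # saved as python/opt/opt-1.3b.pt
```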
### Build and run

```bash
mkdir build
cd build
cmake ..
cmake --build . -j $(nproc)
./inference ../opt-1.3b.pt
```
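For orientation, the core of an `inference` binary built on LibTorch usually looks like the sketch below. The dummy input shapes and the assumption that the traced model takes `(input_ids, attention_mask)` mirror the tracing sketch above and are not taken from this repo's source.

```cpp
// Hypothetical sketch of a LibTorch inference entry point like ./inference.
#include <torch/script.h>

#include <iostream>
#include <vector>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: inference <path-to-traced-model.pt>\n";
    return 1;
  }

  // Load the TorchScript module produced by download_and_trace.py.
  torch::jit::script::Module module = torch::jit::load(argv[1]);
  module.eval();

  // Dummy token ids and attention mask, matching the traced signature.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 8}, torch::kLong));
  inputs.push_back(torch::ones({1, 8}, torch::kLong));

  torch::NoGradGuard no_grad;
  auto output = module.forward(inputs);
  std::cout << "forward pass completed\n";
  return 0;
}
```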
## TODO

- Compare the results with the Python version.