# aws-virtual-gpu-device-plugin **Repository Path**: tengfeiwu/aws-virtual-gpu-device-plugin ## Basic Information - **Project Name**: aws-virtual-gpu-device-plugin - **Description**: No description available - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2021-06-15 - **Last Updated**: 2021-06-15 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # 说明 原` git `仓库链接:https://github.com/awslabs/aws-virtual-gpu-device-plugin # Virtual GPU device plugin for Kubernetes The virtual device plugin for Kubernetes is a Daemonset that allows you to automatically: - Expose arbitrary number of virtual GPUs on GPU nodes of your cluster. - Run ML serving containers backed by Accelerator with low latency and low cost in your Kubernetes cluster. This repository contains AWS virtual GPU implementation of the [Kubernetes device plugin](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-management/device-plugin.md). ## Prerequisites The list of prerequisites for running the virtual device plugin is described below: * NVIDIA drivers ~= 361.93 * nvidia-docker version > 2.0 (see how to [install](https://github.com/NVIDIA/nvidia-docker) and it's [prerequisites](https://github.com/nvidia/nvidia-docker/wiki/Installation-\(version-2.0\)#prerequisites)) * docker configured with nvidia as the [default runtime](https://github.com/NVIDIA/nvidia-docker/wiki/Advanced-topics#default-runtime). * Kubernetes version >= 1.10 ## Limitations * This solution is build on top of Volta [Multi-Process Service(MPS)](https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf). You can only use it on instances types with Tesla-V100 or newer. (Only [Amazon EC2 P3 Instances](https://aws.amazon.com/ec2/instance-types/p3/) and [Amazon EC2 G4 Instances](https://aws.amazon.com/ec2/instance-types/g4/) now) * Virtual GPU device plugin by default set GPU compute mode to `EXCLUSIVE_PROCESS` which means GPU is assigned to MPS process, individual process threads can submit work to GPU concurrently via MPS server. This GPU can not be used for other purpose. * Virtual GPU device plugin only on single physical GPU instance like P3.2xlarge if you request `k8s.amazonaws.com/vgpu` more than 1 in the workloads. * Virtual GPU device plugin can not work with [Nvidia device plugin](https://github.com/NVIDIA/k8s-device-plugin) together. You can label nodes and use selector to install Virtual GPU device plugin. ## High Level Design ![device-plugin](./static/img/device-plugin.png) ## Quick Start ### Label GPU node groups ```bash kubectl label node k8s.amazonaws.com/accelerator=vgpu ``` ### Enabling virtual GPU Support in Kubernetes Update node selector label in the manifest file to match with labels of your GPU node group, then apply it to Kubernetes. ```shell $ kubectl create -f https://raw.githubusercontent.com/awslabs/aws-virtual-gpu-device-plugin/v0.1.1/manifests/device-plugin.yml ``` ### Running GPU Jobs Virtual NVIDIA GPUs can now be consumed via container level resource requirements using the resource name `k8s.amazonaws.com/vgpu`: ```yaml apiVersion: apps/v1 kind: Deployment metadata: name: resnet-deployment spec: replicas: 3 selector: matchLabels: app: resnet-server template: metadata: labels: app: resnet-server spec: # hostIPC is required for MPS communication hostIPC: true containers: - name: resnet-container image: seedjeffwan/tensorflow-serving-gpu:resnet args: # Make sure you set limit based on the vGPU account to avoid tf-serving process occupy all the gpu memory - --per_process_gpu_memory_fraction=0.2 env: - name: MODEL_NAME value: resnet ports: - containerPort: 8501 # Use virtual gpu resource here resources: limits: k8s.amazonaws.com/vgpu: 1 volumeMounts: - name: nvidia-mps mountPath: /tmp/nvidia-mps volumes: - name: nvidia-mps hostPath: path: /tmp/nvidia-mps ``` > **WARNING:** *if you don't request GPUs when using the device plugin all > the GPUs on the machine will be exposed inside your container.* Check the full example [here](./examples/README.md) ## Development Please check [Development](./DEVELOPMENT.md) for more details. ## Credits The project idea comes from [@RenaudWasTaken](https://github.com/RenaudWasTaken) comment in [kubernetes/kubernetes#52757](https://github.com/kubernetes/kubernetes/issues/52757#issuecomment-402772200) and Alibaba’s solution from [@cheyang](https://github.com/cheyang) [GPU Sharing Scheduler Extender Now Supports Fine-Grained Kubernetes Clusters](https://www.alibabacloud.com/blog/gpu-sharing-scheduler-extender-now-supports-fine-grained-kubernetes-clusters_594926). ## Reference AWS: - 28 Nov 2018 - [Amazon Elastic Inference – GPU-Powered Deep Learning Inference Acceleration](https://aws.amazon.com/blogs/aws/-amazon-elastic-inference-gpu-powered-deep-learning-inference-acceleration/) - 2 Dec 2018 - [Amazon Elastic Inference - Reduce Deep Learning inference costs by 75%](https://www.slideshare.net/AmazonWebServices/new-launch-introducing-amazon-elastic-inference-reduce-deep-learning-inference-cost-up-to-75-aim366-aws-reinvent-2018) - 30 JUL 2019 - [Running Amazon Elastic Inference Workloads on Amazon ECS](https://aws.amazon.com/blogs/machine-learning/running-amazon-elastic-inference-workloads-on-amazon-ecs/) - 06 SEP 2019 - [Optimizing TensorFlow model serving with Kubernetes and Amazon Elastic Inference](https://aws.amazon.com/blogs/machine-learning/optimizing-tensorflow-model-serving-with-kubernetes-and-amazon-elastic-inference/) - 03 DEC 2019 - [Introducing Amazon EC2 Inf1 Instances, high performance and the lowest cost machine learning inference in the cloud](https://aws.amazon.com/about-aws/whats-new/2019/12/introducing-amazon-ec2-inf1-instances-high-performance-and-the-lowest-cost-machine-learning-inference-in-the-cloud/) Community: - [Nvidia Turing GPU Architecture](https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf) - [Nvidia Tesla V100 GPU Architecture](https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf) - [Is sharing GPU to multiple containers feasible?](https://github.com/kubernetes/kubernetes/issues/52757) - [Fractional GPUs: Software-based Compute and Memory Bandwidth Reservation for GPUs](http://www.andrew.cmu.edu/user/sakshamj/papers/FGPU_RTAS_2019_Fractional_GPUs_Software_based_Compute_and_Memory_Bandwidth_Reservation_for_GPUs.pdf) - [GPU Sharing Scheduler Extender Now Supports Fine-Grained Kubernetes Clusters](https://www.alibabacloud.com/blog/gpu-sharing-scheduler-extender-now-supports-fine-grained-kubernetes-clusters_594926) - [GPU Sharing for Machine Learning Workload on Kubernetes - Henry Zhang & Yang Yu, VMware](https://www.youtube.com/watch?v=T4i33nnSZtc) - [Deep Learning inference cost optimization practice on Kubernetes - Tencent](https://static.sched.com/hosted_files/kccncosschn19eng/c5/Tencent%20Cloud%20(Chinese%20Ver.)_%E5%9F%BA%E4%BA%8EKubernetes%E8%BF%9B%E8%A1%8C%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86%E7%9A%84%E6%88%90%E6%9C%AC%E4%BC%98%E5%8C%96%E5%AE%9E%E8%B7%B5-KubeCon_China_2019.pdf) - [Gaia Scheduler: A Kubernetes-Based Scheduler Framework](https://www.semanticscholar.org/paper/Gaia-Scheduler%3A-A-Kubernetes-Based-Scheduler-Song-Deng/bf8badfda7ad15f39cae890a5ab08fd9f4374700) ## License This project is licensed under the Apache-2.0 License.