By Yoshi Tamura, Product Manager, Google Kubernetes Engine
Last year we introduced our first GPU offering for Google Kubernetes Engine with the alpha launch of NVIDIA Tesla GPUs and received an amazing customer response. Today, GPUs in Kubernetes Engine are in beta and broadly available in the latest Kubernetes Engine release.
Using GPUs in Kubernetes Engine can turbocharge compute-intensive applications like machine learning (ML), image processing and financial modeling. By packaging your CUDA workloads into containers, you can benefit from the massive processing power of Kubernetes Engine’s GPUs whenever you need it, without having to manage hardware or even VMs.
With its best-in-class CPUs, GPUs, and now TPUs, Google Cloud provides the best choice, flexibility and performance for running ML workloads in the cloud. The ride-sharing pioneer Lyft, for instance, uses GPUs in Kubernetes Engine to accelerate training of its deep learning models.
“GKE clusters are ideal for deep learning workloads, with out-of-the box GPU integration, autoscaling clusters for our spiky training workloads, and integrated container logging and monitoring.”
— Luc Vincent, VP of Engineering at Lyft
Both the NVIDIA Tesla P100 and K80 GPUs are available as part of the beta—and V100s are on the way. Recently, we also introduced Preemptible GPUs as well as new lower prices to unlock new opportunities for you. Check out the latest prices for GPUs here.
Creating a cluster with GPUs in Kubernetes Engine is easy. From the Cloud Console, you can expand the machine type section on the “Creating Kubernetes Cluster” page to select the type and number of GPUs.
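If you prefer the command line, the same kind of cluster can be created with gcloud. A minimal sketch, assuming a P100 GPU type; the cluster name and zone below are illustrative placeholders, and flag details may vary between gcloud releases:

```shell
# Create a Kubernetes Engine cluster whose default node pool attaches
# one NVIDIA Tesla P100 to each node.
# "my-gpu-cluster" and the zone are placeholders -- substitute your own.
gcloud beta container clusters create my-gpu-cluster \
  --zone=us-central1-a \
  --accelerator=type=nvidia-tesla-p100,count=1 \
  --num-nodes=2
```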
And if you want to add nodes with GPUs to your existing cluster, you can use the Node Pools and Cluster Autoscaler features. By using node pools with GPUs, your cluster can use GPUs whenever you need them. Autoscaler, meanwhile, can automatically create nodes with GPUs whenever pods requesting GPUs are scheduled, and scale down to zero when GPUs are no longer consumed by any active pods.
The following command creates a node pool with GPUs that can scale up to five nodes and down to zero nodes.
gcloud beta container node-pools create my-gpu-node-pool \
  --accelerator=type=nvidia-tesla-p100,count=1 \
  --cluster=my-existing-cluster \
  --num-nodes=2 \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=5
Behind the scenes, Kubernetes Engine uses Kubernetes taints and tolerations to ensure that only pods requesting GPUs are scheduled on the nodes with GPUs, and to prevent pods that don’t require GPUs from running on them.
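On the workload side, a container asks for GPUs through the nvidia.com/gpu extended resource in its pod spec; Kubernetes Engine then adds the matching toleration for you. A minimal sketch (the pod name and container image are illustrative placeholders, not prescribed values):

```shell
# Schedule a pod onto a GPU node by requesting the nvidia.com/gpu
# extended resource. Names and the CUDA image are placeholders.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-cuda-container
    image: nvidia/cuda:9.0-runtime   # illustrative CUDA base image
    command: ["nvidia-smi"]          # print GPU status, then exit
    resources:
      limits:
        nvidia.com/gpu: 1            # request a single GPU
EOF
```

Because the pod requests nvidia.com/gpu, the scheduler places it only on a GPU node, and the GPU taint keeps non-GPU pods off those nodes.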
While Kubernetes Engine does a lot of things behind the scenes for you, we also want you to understand how your GPU jobs are performing. Kubernetes Engine exposes metrics for containers using GPUs, such as how busy the GPUs are, how much memory is available, and how much memory is allocated. You can also visualize these metrics by using Stackdriver.
Figure 1: GPU duty cycle for three different jobs
For a more detailed explanation of Kubernetes Engine with GPUs, such as how to install NVIDIA drivers and how to configure a pod to consume GPUs, check out the documentation.
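For reference, on nodes running Container-Optimized OS the NVIDIA drivers are installed by applying a Google-provided DaemonSet. The manifest URL below is the one documented at the time of writing and may change, so check the documentation for the current path:

```shell
# Install NVIDIA drivers on GPU nodes running Container-Optimized OS
# by deploying Google's driver-installer DaemonSet.
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
```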
In 2017, Kubernetes Engine core-hours grew 9X year over year, and the platform is gaining momentum as a premier deployment platform for ML workloads. We’re very excited about open source projects like Kubeflow that make it easy, fast and extensible to run ML stacks in Kubernetes. We hope that the combination of these open-source ML projects and GPUs in Kubernetes Engine will help you innovate in business, engineering and science.
Thanks for the support and feedback in shaping our roadmap to better serve your needs. Keep the conversation going, and connect with us on the Kubernetes Engine Slack channel.