
MLPerf benchmark establishes that Google Cloud offers the most accessible scale for machine learning training

2018-12-13 · admin · GoogleCloud · No comments

Source: MLPerf benchmark establishes that Google Cloud offers the most accessible scale for machine learning training from Google Cloud

Today marks the debut of the MLPerf 0.5 benchmark results. These tests have been designed, adopted, and promoted by many industry leaders, and the results show Google Cloud’s TPUs (Tensor Processing Units) and TPU Pods as leading systems for training machine learning models at scale, based on competitive performance across several MLPerf tests. Google Cloud customers can easily use Cloud TPUs at accessible prices today.

MLPerf benchmarks measure performance for training workloads across cloud providers and on-premise hardware platforms. MLPerf is designed to establish metrics that help you make informed decisions on how to choose the right infrastructure for your machine learning workloads. Google is a core contributor to the MLPerf benchmark suite, along with many other companies and academic institutions. Each organization conducts its own testing and submits its own results for publication, contributing to a broad survey of machine learning infrastructure available today.
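
MLPerf's headline metric is time-to-accuracy: the wall-clock time needed to train a model to a fixed quality target. As a minimal sketch of that measurement (not MLPerf's reference harness; train_one_epoch and evaluate are hypothetical caller-supplied callbacks, and the 0.749 default mirrors the top-1 target we understand MLPerf v0.5 uses for ResNet-50):

```python
import time

def time_to_accuracy(train_one_epoch, evaluate, target=0.749):
    """Train until evaluate() reaches the target; return (minutes, epochs)."""
    start = time.time()
    epochs = 0
    while True:
        train_one_epoch()             # hypothetical callback: one pass over the data
        epochs += 1
        if evaluate() >= target:      # hypothetical callback: held-out accuracy
            return (time.time() - start) / 60.0, epochs
```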

For data scientists, ML practitioners, and researchers, building on-premise GPU clusters for training is capital-intensive and time-consuming—it’s much simpler to access both GPU and TPU infrastructure on Google Cloud. We’re pleased to see that MLPerf benchmark results provide evidence that GCP offers the ideal platform to train machine learning models at any scale.

Understanding the time-to-accuracy results

In our MLPerf submission, we benchmarked accelerators available on our Google Cloud infrastructure, with a focus on our latest Cloud TPUs (versions 2 and 3, both on GCP), and also on our state-of-the-art TPU v3 Pods. We submitted results for ResNet-50, an industry-standard image classification network; NMT, a neural machine translation model; and SSD, a single-shot object detector [1-12].
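
As context for the submissions below, here is a minimal, illustrative sketch of a Cloud TPU training job using the TPUEstimator API of TensorFlow 1.12 (the framework version in the cited results); the TPU name, GCS bucket, and one-layer stand-in for ResNet-50 are placeholders, not our submission code:

```python
import tensorflow as tf  # TensorFlow 1.12, matching the cited submissions

def input_fn(params):
    # Toy input pipeline; a real run streams preprocessed ImageNet from GCS.
    images = tf.random_uniform([1024, 224, 224, 3])
    labels = tf.random_uniform([1024], maxval=1000, dtype=tf.int32)
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    return dataset.repeat().batch(params['batch_size'], drop_remainder=True)

def model_fn(features, labels, mode, params):
    # A single dense layer stands in for ResNet-50 to keep the sketch short.
    logits = tf.layers.dense(tf.layers.flatten(features), 1000)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    # CrossShardOptimizer averages gradients across all TPU cores.
    optimizer = tf.contrib.tpu.CrossShardOptimizer(
        tf.train.GradientDescentOptimizer(learning_rate=0.01))
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.contrib.tpu.TPUEstimatorSpec(mode=mode, loss=loss, train_op=train_op)

resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu='my-tpu')  # placeholder
config = tf.contrib.tpu.RunConfig(
    cluster=resolver,
    model_dir='gs://my-bucket/resnet',  # placeholder GCS bucket
    tpu_config=tf.contrib.tpu.TPUConfig(iterations_per_loop=100))
estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=model_fn, config=config, use_tpu=True, train_batch_size=1024)
estimator.train(input_fn=input_fn, max_steps=1000)
```

In principle the same Estimator code scales from a single Cloud TPU to a pod slice: point the resolver at a larger device and raise the batch size.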

[Figure: number_chips.png]

The graphic below shows absolute training times, comparing NVIDIA’s best submitted results on a DGX-2 machine (containing 16 V100 GPUs) with results using 1/64th of a TPU v3 Pod (16 TPU v3 chips used for training). The comparison ranges across image classification (ResNet-50), object detection (SSD), and neural machine translation (NMT).

[Figure: training_times.png]
Training time comparison between 1/64th of a TPU v3 Pod (16 TPU v3 chips used for training, plus four separate Cloud TPU v2 chips used for evaluation) [9,10,11] and an NVIDIA DGX-2 (16 V100 GPUs) [13,14,15].

In summary, Google’s Cloud TPUs and TPU Pods deliver always-available, high-performance training across multiple workloads, ranging from image understanding to language translation. For example, it’s possible to achieve a 19% speed-up with a TPU v3 Pod on a chip-to-chip basis versus the current best-in-class on-premise system when tested on ResNet-50.¹

Making high-performance compute accessible to everyone

eBay has been using Cloud TPU Pods for months and has seen a massive reduction in training time:

“An important ML task that took more than 40 days to run on our in-house systems completed in just four days on a fraction of a TPUv2 Pod, a 10X reduction in training time,” explains Shuai Zheng, eBay Research Scientist. “This is a game changer—the dramatic increase in training speed not only allows us to iterate faster but also allows us to avoid large up-front capital expenditures.”

[Figure: CloudTPU_v2_pod.png]
A full Google Cloud TPU v2 Pod achieves a training time similar to that of NVIDIA’s largest-submitted-scale on-premise system of 80 DGX-1s (11.3 min [4] vs. 6.3 min [16], respectively) on the ResNet-50 v1.5 image classification task, using fewer than half as many ML accelerator chips (256 vs. 640).
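
To put “fewer than half the chips” in per-chip terms, a back-of-the-envelope comparison of the two cited results (our reading, not an official MLPerf metric) multiplies chips by minutes:

```python
# ResNet-50 v1.5: full Cloud TPU v2 Pod [4] vs. NVIDIA's 80x DGX-1 cluster [16]
tpu_chips, tpu_minutes = 256, 11.3
gpu_chips, gpu_minutes = 640, 6.3

tpu_chip_minutes = tpu_chips * tpu_minutes  # 2892.8 accelerator-minutes
gpu_chip_minutes = gpu_chips * gpu_minutes  # 4032.0 accelerator-minutes
print(gpu_chip_minutes / tpu_chip_minutes)  # ~1.39: the pod reaches the target
                                            # with ~28% fewer chip-minutes
```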

Conclusion

Cloud TPUs and TPU Pods excel at many machine learning training workloads, from image classification to language translation. As machine learning becomes increasingly central to their businesses, enterprises are turning to the cloud for high-performance, low-cost training of ML models. The MLPerf results reveal a 19% TPU performance advantage on a chip-to-chip basis, and even greater speedups and cost savings are possible with more realistic ML production workloads. For a detailed analysis of performance and cost in a training scenario with much larger inputs than MLPerf uses, see our companion blog post.

Head over to MLPerf.org for the full set of benchmark results. To find out more about Cloud TPUs, read our documentation. You can learn how to get started with individual Cloud TPUs (you can decide between v2 and v3) here, learn how to use Cloud TPUs via Cloud ML Engine here, or try out Cloud TPUs for free right in your browser using a Colab notebook here.
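
If you take the Colab route, a quick sanity check confirms the TPU runtime is attached; this sketch assumes the TensorFlow 1.x environment Colab provided at the time, where the TPU runtime sets the COLAB_TPU_ADDR environment variable:

```python
import os
import tensorflow as tf

# Colab's TPU runtime exposes the TPU worker's gRPC address here.
tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']

with tf.Session(tpu_address) as session:
    devices = session.list_devices()

print('Attached TPU devices:')
for device in devices:
    if 'TPU' in device.name:
        print(' ', device.name)
```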

Cloud TPU v2 Pods are currently available in alpha. If you’re interested in using Cloud TPU Pods, you can request access here.

1. 19% speedup with a 16-chip TPU v3 system compared to the 16-chip DGX-2 on-premise system from NVIDIA.
[*] All results herein are for the MLPerf Training v0.5 Closed Division. All results were retrieved from https://mlperf.org/results on 12/12/2018. MLPerf is a trademark.
[1] ResNet-50 v1.5 result by Google on TPUv2.8 (4 chips) using TF 1.12. Result id: 0.5.2.1.
[2] SSD result by Google on TPUv2.8 (4 chips) using TF 1.12. Result id: 0.5.2.2.
[3] NMT result by Google on TPUv2.8 (4 chips) using TF 1.12. Result id: 0.5.2.4.
[4] ResNet-50 v1.5 result by Google on TPUv2.512 + TPUv2.8 (260 chips) using TF 1.12. Result id: 0.5.3.1.
[5] ResNet-50 v1.5 result by Google on TPUv3.8 (4 chips) using TF 1.12. Result id: 0.5.4.1.
[6] SSD result by Google on TPUv3.8 (4 chips) using TF 1.12. Result id: 0.5.4.2.
[7] NMT result by Google on TPUv3.8 (4 chips) using TF 1.12. Result id: 0.5.4.4.
[8] 8x Volta V100 result by Google on 8x Volta V100 (8 chips) using TF 1.12 and cuDNN 7.4. Result id: 0.5.5.1.
[9] ResNet-50 v1.5 result by Google on TPUv3.32 + TPUv2.8 (20 chips) using TF 1.12. Result id: 0.5.26.1.
[10] SSD result by Google on TPUv3.32 + TPUv2.8 (20 chips) using TF 1.12. Result id: 0.5.26.2.
[11] NMT result by Google on TPUv3.32 + TPUv2.8 (20 chips) using TF 1.12. Result id: 0.5.26.4.
[12] ResNet-50 v1.5 result by Google on TPUv3.512 + TPUv2.8 (260 chips) using TF 1.12. Result id: 0.5.27.1.
[13] ResNet-50 v1.5 result by NVIDIA on DGX-2 (16 chips) using ngc18.11_MXNet and cuDNN 7.4. Result id: 0.5.18.1.
[14] SSD result by NVIDIA on DGX-2 (16 chips) using ngc18.11_pyTorch and cuDNN 7.4. Result id: 0.5.19.2.
[15] NMT result by NVIDIA on DGX-2 (16 chips) using ngc18.11_pyTorch and cuDNN 7.4. Result id: 0.5.19.4.
[16] ResNet-50 v1.5 result by NVIDIA on 80x DGX-1 (640 chips) using ngc18.11_MXNet and cuDNN 7.4. Result id: 0.5.17.1.

Unless otherwise stated, the content of this article is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Terms of Service.

Tags: Cloud
