谷歌中国开发者社区 (GDG)
  • 主页
  • 博客
    • Android
    • Design
    • GoogleCloud
    • GoogleMaps
    • GooglePlay
    • Web
  • 社区
    • 各地社区
    • 社区历史
    • GDG介绍
    • 社区通知
  • 视频
  • 资源
    • 资源汇总
    • 精选视频
    • 优酷频道

220,000 cores and counting: MIT math professor breaks record for largest ever Compute Engine job

2017-04-20adminGoogleCloudNo comments

By Alex Barrett, GCP Blog Editor & Michael Basilyan, Product Manager, Compute Engine

An MIT math professor recently broke the record for the largest ever Compute Engine cluster, with 220,000 cores on Preemptible VMs, the largest known high-performance computing cluster to ever run in the public cloud.

Andrew V. Sutherland is a computational number theorist and Principal Research Scientist at MIT, and is using Compute Engine to explore generalizations of the Sato-Tate Conjecture and the conjecture of Birch and Swinnerton-Dyer to curves of higher genus. In his latest run, he explored 1017hyperelliptic curves of genus 3 in an effort to find curves whose L-functions can be easily computed, and which have potentially interesting Sato-Tate distributions. This yielded about 70,000 curves of interest, each of which will eventually have its own entry in the L-functions and Modular Forms Database (LMFDB).

Finding suitable genus 3 curves “is like searching for a needle in a fifteen-dimensional haystack,” Sutherland said. “Sometimes I like to describe my work as building telescopes for mathematicians.”

It also requires a lot of compute cycles: For each curve that’s examined, its discriminant must be computed; the discriminant of a curve serves as an upper bound on the complexity of computing its L-function. This task is trivial in genus 1, but in genus 3 may involve evaluating a 50 million term polynomial in 15 variables. Each curve that’s a candidate for inclusion in the LMFDB must also have many other of its arithmetic and geometric invariants computed, including an approximation of its L-function and Sato-Tate distribution, as well as information about any symmetries it may possess. The results can be quite large, and some of this information is stored as Cloud Storage nearline objects. Researchers can browse summaries of the results on the curve’s home page in the LMFDB, or download more detailed information to their own computer for further examination. The LMFDB provides an online interface to some 400 gigabytes of metadata housed in a MongoDB database that also runs on Compute Engine.

Sutherland began using Compute Engine in 2015. For his first-ever job, he fired up 2,250 32-core instances and completed about 60 CPU-years of computation in a single afternoon.

Before settling on Compute Engine, Sutherland ran jobs on his own 64-core machine, which could take months, or wrangled for compute time on one of MIT’s clusters. But getting the number of cores he needed often raised eyebrows, and he was limited by the software configurations he could use. By running on Compute Engine, Sutherland can install exactly the operating system, libraries and applications he needs, and thanks to root access, he can update his environment at will.

Sutherland considered running his jobs on AWS before choosing Google but was dissuaded by its Spot Instances model, which forces you to name your price up front, with prices that can vary significantly by region and fluctuate over time. A colleague encouraged him to try Compute Engine Preemptible VMs. These are full-featured instances that are priced up to 80% less than regular equivalents, but can be interrupted by Compute Engine. That was fine with Sutherland. His computations are embarrassingly parallel — they can be easily separated into multiple, independent tasks — and he grabs available instances across any and all Google Cloud Regions. An average of about 2-3% of his instances are typically preempted in any given hour, but a simple script automatically restarts them as needed until the whole job is complete.

To coordinate the instances working on a job, Sutherland uses a combination of Cloud Storage and Datastore. He used the python client API to implement a simple ticketing system in Datastore that assigns tasks to instances. Instances periodically checkpoint their progress on their local disks from which they can recover if preempted, and they store their final output data in a Cloud Storage bucket, where it may undergo further post-processing once the job has finished.

All told, having access to the scale and flexibility of Compute Engine has freed Sutherland up to think much bigger with his research. For his next run, he hopes to expand his search to non-hyperelliptic curves of genus 3, breaking his own record with a 400,000-core cluster. “It changes your whole outlook on research when you can ask a question and get an answer in hours rather than months,” he said. “You ask different questions.”



Source: 220,000 cores and counting: MIT math professor breaks record for largest ever Compute Engine job

除非特别声明,此文章内容采用知识共享署名 3.0许可,代码示例采用Apache 2.0许可。更多细节请查看我们的服务条款。

Tags: AdWords

Related Articles

Your favorite languages, now on Google App Engine

2017-03-14admin

How to do serverless pixel tracking with GCP

2017-05-19admin

Distributed TensorFlow and the hidden layers of engineering work

2017-08-19admin

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">

Recent Posts

  • Introducing the CVPR 2018 On-Device Visual Intelligence Challenge
  • Kubernetes best practices: How and why to build small container images
  • DeepVariant Accuracy Improvements for Genetic Datatypes
  • Congratulations to our US Grow with Google Developer Scholars!
  • Cloud SQL for PostgreSQL now generally available and ready for your production workloads

Recent Comments

  • 鸿维 on Google 帐号登录 API 更新
  • admin on 推出 CVPR 2018 学习图像压缩挑战赛
  • Henry Chen on 推出 CVPR 2018 学习图像压缩挑战赛
  • 王中 on Google 推出的 31 套在线课程
  • Francis Wang on Google 推出的 31 套在线课程

Archives

  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • July 2017
  • June 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • November 2016
  • October 2016
  • September 2016
  • August 2016
  • May 2016
  • April 2016
  • March 2016
  • February 2016
  • January 2016
  • December 2015
  • November 2015
  • October 2015
  • September 2015
  • August 2015
  • July 2015
  • June 2015
  • January 1970

Categories

  • Android
  • Design
  • Firebase
  • GoogleCloud
  • GoogleDevFeeds
  • GoogleMaps
  • GooglePlay
  • Google动态
  • iOS
  • Uncategorized
  • VR
  • Web
  • WebMaster
  • 社区
  • 通知

Meta

  • Register
  • Log in
  • Entries RSS
  • Comments RSS
  • WordPress.org

最新文章

  • Introducing the CVPR 2018 On-Device Visual Intelligence Challenge
  • Kubernetes best practices: How and why to build small container images
  • DeepVariant Accuracy Improvements for Genetic Datatypes
  • Congratulations to our US Grow with Google Developer Scholars!
  • Cloud SQL for PostgreSQL now generally available and ready for your production workloads
  • Exploring container security: Protecting and defending your Kubernetes Engine network
  • BigQuery arrives in the Tokyo region
  • What’s new in Firebase Authentication?
  • Showcase your innovations at the 2018 China-US Young Makers Competition
  • Protecting WebView with Safe Browsing

最多查看

  • 谷歌招聘软件工程师 (19,918)
  • Google 推出的 31 套在线课程 (18,087)
  • 如何选择 compileSdkVersion, minSdkVersion 和 targetSdkVersion (14,903)
  • Seti UI 主题: 让你编辑器焕然一新 (11,117)
  • Android Studio 2.0 稳定版 (8,419)
  • Android N 最初预览版:开发者 API 和工具 (7,752)
  • 像 Sublime Text 一样使用 Chrome DevTools (5,611)
  • Google I/O 2016: Android 演讲视频汇总 (5,387)
  • 用 Google Cloud 打造你的私有免费 Git 仓库 (4,896)
  • 面向普通开发者的机器学习应用方案 (4,734)
  • 生还是死?Android 进程优先级详解 (4,709)
  • 面向 Web 开发者的 Sublime Text 插件 (4,002)
  • 适配 Android N 多窗口特性的 5 个要诀 (3,838)
  • 参加 Google I/O Extended,观看 I/O 直播,线下聚会! (3,419)
© 2018 中国谷歌开发者社区 - ChinaGDG