Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing

2018-11-03 | admin | GoogleDevFeeds

Source: Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing from Google Research

Posted by Jacob Devlin and Ming-Wei Chang, Research Scientists, Google AI Language

One of the biggest challenges in natural language processing (NLP) is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples. However, modern deep learning-based NLP models see benefits from much larger amounts of data, improving when trained on millions, or billions, of annotated training examples. To help close this gap in data, researchers have developed a variety of techniques for training general purpose language representation models using the enormous amount of unannotated text on the web (known as pre-training). The pre-trained model can then be fine-tuned on small-data NLP tasks like question answering and sentiment analysis, resulting in substantial accuracy improvements compared to training on these datasets from scratch.

This week, we open sourced a new technique for NLP pre-training called Bidirectional Encoder Representations from Transformers, or BERT. With this release, anyone in the world can train their own state-of-the-art question answering system (or a variety of other models) in about 30 minutes on a single Cloud TPU, or in a few hours using a single GPU. The release includes source code built on top of TensorFlow and a number of pre-trained language representation models. In our associated paper, we demonstrate state-of-the-art results on 11 NLP tasks, including the very competitive Stanford Question Answering Dataset (SQuAD v1.1).

What Makes BERT Different?
BERT builds upon recent work in pre-training contextual representations — including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFit. However, unlike these previous models, BERT is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus (in this case, Wikipedia).

Why does this matter? Pre-trained representations can either be context-free or contextual, and contextual representations can further be unidirectional or bidirectional. Context-free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary. For example, the word “bank” would have the same context-free representation in “bank account” and “bank of the river.” Contextual models instead generate a representation of each word that is based on the other words in the sentence. For example, in the sentence “I accessed the bank account,” a unidirectional contextual model would represent “bank” based on “I accessed the” but not “account.” However, BERT represents “bank” using both its previous and next context — “I accessed the … account” — starting from the very bottom of a deep neural network, making it deeply bidirectional.
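
To make the distinction concrete, here is a minimal sketch in plain Python. The vectors and the averaging “contextual” function are invented for illustration only; they are not word2vec, GloVe, or BERT code. The point is that a context-free lookup returns the same vector for “bank” in both sentences, while a context-dependent encoder does not:

    # Toy illustration: one fixed vector per word (context-free) versus a
    # made-up "contextual" function whose output depends on the sentence.
    CONTEXT_FREE = {
        "bank": [0.2, -0.7, 0.1],
        "account": [0.5, 0.3, -0.2],
        "river": [-0.4, 0.6, 0.9],
    }

    def context_free_embedding(word, sentence):
        # The sentence is ignored: "bank" maps to the same vector in
        # "bank account" and "bank of the river".
        return CONTEXT_FREE[word]

    def contextual_embedding(word, sentence):
        # Fake contextual model: mix the word's vector with the average of the
        # other in-vocabulary words in the sentence, so the result changes
        # with the surrounding context.
        base = CONTEXT_FREE[word]
        neighbors = [CONTEXT_FREE[w] for w in sentence if w != word and w in CONTEXT_FREE]
        if not neighbors:
            return base
        mean = [sum(dim) / len(neighbors) for dim in zip(*neighbors)]
        return [0.5 * b + 0.5 * m for b, m in zip(base, mean)]

    print(context_free_embedding("bank", ["bank", "account"]))             # same vector...
    print(context_free_embedding("bank", ["bank", "of", "the", "river"]))  # ...as this one
    print(contextual_embedding("bank", ["bank", "account"]))               # differs...
    print(contextual_embedding("bank", ["bank", "of", "the", "river"]))    # ...from this one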

A visualization of BERT’s neural network architecture compared to previous state-of-the-art contextual pre-training methods is shown below. The arrows indicate the information flow from one layer to the next. The green boxes at the top indicate the final contextualized representation of each input word:

BERT is deeply bidirectional, OpenAI GPT is unidirectional, and ELMo is shallowly bidirectional.

The Strength of Bidirectionality
If bidirectionality is so powerful, why hasn’t it been done before? To understand why, consider that unidirectional models are efficiently trained by predicting each word conditioned on the previous words in the sentence. However, it is not possible to train bidirectional models by simply conditioning each word on its previous and next words, since this would allow the word that’s being predicted to indirectly “see itself” in a multi-layer model.

To solve this problem, we use the straightforward technique of masking out some of the words in the input and then conditioning each word bidirectionally to predict the masked words. For example:

    Input: the man went to the [MASK1] . he bought a [MASK2] of milk.
    Labels: [MASK1] = store; [MASK2] = gallon
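
A minimal sketch of this masking step in plain Python: the 15% masking rate follows the paper, but the whitespace tokenization and the uniform masking (the real procedure also sometimes keeps the selected word or swaps in a random one) are simplifications for illustration.

    import random

    def mask_tokens(tokens, mask_rate=0.15, seed=1):
        # Replace a random subset of tokens with [MASK] and remember the
        # originals; a deep bidirectional model is then trained to predict each
        # original word from all surrounding unmasked words, left and right.
        rng = random.Random(seed)
        masked = list(tokens)
        labels = {}  # position -> original token to predict
        for i, token in enumerate(tokens):
            if rng.random() < mask_rate:
                labels[i] = token
                masked[i] = "[MASK]"
        return masked, labels

    tokens = "the man went to the store . he bought a gallon of milk .".split()
    masked, labels = mask_tokens(tokens)
    print(" ".join(masked))  # the sentence with a few words replaced by [MASK]
    print(labels)            # which original words the model must predict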

While this idea has been around for a very long time, BERT is the first time it was successfully used to pre-train a deep neural network.

BERT also learns to model relationships between sentences by pre-training on a very simple task that can be generated from any text corpus: Given two sentences A and B, is B the actual next sentence that comes after A in the corpus, or just a random sentence? For example:

    Sentence A: the man went to the store .
    Sentence B: he bought a gallon of milk .
    Label: IsNextSentence

    Sentence A: the man went to the store .
    Sentence B: penguins are flightless birds .
    Label: NotNextSentence
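
A minimal sketch of how such sentence pairs might be generated, again in plain Python. The 50/50 mix of true and random next sentences mirrors the description above; the corpus, label strings, and helper function are invented for this example.

    import random

    def make_sentence_pairs(sentences, seed=1):
        # For each sentence A, pair it either with its true next sentence
        # (IsNextSentence) or with a randomly chosen sentence
        # (NotNextSentence), roughly half the time each.
        rng = random.Random(seed)
        examples = []
        for i in range(len(sentences) - 1):
            if rng.random() < 0.5:
                examples.append((sentences[i], sentences[i + 1], "IsNextSentence"))
            else:
                # A real pipeline would also avoid accidentally picking the
                # true next sentence here.
                random_b = rng.choice(sentences)
                examples.append((sentences[i], random_b, "NotNextSentence"))
        return examples

    corpus = [
        "the man went to the store .",
        "he bought a gallon of milk .",
        "penguins are flightless birds .",
    ]
    for a, b, label in make_sentence_pairs(corpus):
        print("A:", a)
        print("B:", b)
        print("Label:", label)
        print()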

Training with Cloud TPUs
Everything that we’ve described so far might seem fairly straightforward, so what’s the missing piece that made it work so well? Cloud TPUs. Cloud TPUs gave us the freedom to quickly experiment, debug, and tweak our models, which was critical in allowing us to move beyond existing pre-training techniques. The Transformer model architecture, developed by researchers at Google in 2017, also gave us the foundation we needed to make BERT successful. The Transformer is implemented in our open source release, as well as the tensor2tensor library.

Results with BERT
To evaluate performance, we compared BERT to other state-of-the-art NLP systems. Importantly, BERT achieved all of its results with almost no task-specific changes to the neural network architecture. On SQuAD v1.1, BERT achieves 93.2% F1 score (a measure of accuracy), surpassing the previous state-of-the-art score of 91.6% and human-level score of 91.2%.

BERT also improves the state-of-the-art by 7.6% absolute on the very challenging GLUE benchmark, a set of 9 diverse Natural Language Understanding (NLU) tasks. The amount of human-labeled training data in these tasks ranges from 2,500 examples to 400,000 examples, and BERT substantially improves upon the state-of-the-art accuracy on all of them.

Making BERT Work for You
The models that we are releasing can be fine-tuned on a wide variety of NLP tasks in a few hours or less. The open source release also includes code to run pre-training, although we believe the majority of NLP researchers who use BERT will never need to pre-train their own models from scratch. The BERT models that we are releasing today are English-only, but we hope to release models which have been pre-trained on a variety of languages in the near future.

The open source TensorFlow implementation and pointers to pre-trained BERT models can be found at http://goo.gl/language/bert. Alternatively, you can get started using BERT through Colab with the notebook “BERT FineTuning with Cloud TPUs.”
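
As one concrete starting point, the release’s README documents fine-tuning scripts such as run_classifier.py. The sketch below simply invokes that script from Python with placeholder paths (BERT_BASE_DIR, GLUE_DIR) and typical hyperparameters; treat the exact flag names and values as assumptions and confirm them against the README of the version you download.

    # Hypothetical driver around the released fine-tuning script. The paths
    # below are placeholders: point them at a downloaded pre-trained checkpoint
    # and a GLUE task directory. Flag names follow the repository README at the
    # time of release and should be re-checked against your copy.
    import subprocess

    BERT_BASE_DIR = "/path/to/uncased_L-12_H-768_A-12"  # placeholder
    GLUE_DIR = "/path/to/glue_data"                     # placeholder

    subprocess.run([
        "python", "run_classifier.py",
        "--task_name=MRPC",
        "--do_train=true",
        "--do_eval=true",
        f"--data_dir={GLUE_DIR}/MRPC",
        f"--vocab_file={BERT_BASE_DIR}/vocab.txt",
        f"--bert_config_file={BERT_BASE_DIR}/bert_config.json",
        f"--init_checkpoint={BERT_BASE_DIR}/bert_model.ckpt",
        "--max_seq_length=128",
        "--train_batch_size=32",
        "--learning_rate=2e-5",
        "--num_train_epochs=3.0",
        "--output_dir=/tmp/mrpc_output/",
    ], check=True)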

You can also read our paper “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” for more details.

Unless otherwise noted, the content of this article is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For more details, please see our Terms of Service.

Tags: Develop
