Google China Developer Community (GDG)

Introducing Cloud Text-to-Speech powered by DeepMind WaveNet technology

Posted 2018-03-27 by admin in GoogleCloud

Source: Introducing Cloud Text-to-Speech powered by DeepMind WaveNet technology from Google Cloud Platform

By Dan Aharon, Product Manager, Cloud AI

Many Google products (e.g., the Google Assistant, Search, Maps) come with built-in high-quality text-to-speech synthesis that produces natural-sounding speech. Developers have been telling us they’d like to add text-to-speech to their own applications, so today we’re bringing this technology to Google Cloud Platform with Cloud Text-to-Speech.

You can use Cloud Text-to-Speech in a variety of ways, for example:

  • To power voice response systems for call centers (IVRs) and enable real-time natural language conversations
  • To enable IoT devices (e.g., TVs, cars, robots) to talk back to you
  • To convert text-based media (e.g., news articles, books) into spoken format (e.g., podcast or audiobook)

Cloud Text-to-Speech lets you choose from 32 different voices from 12 languages and variants. Cloud Text-to-Speech correctly pronounces complex text such as names, dates, times and addresses for authentic sounding speech right out of the gate. Cloud Text-to-Speech also allows you to customize pitch, speaking rate, and volume gain, and supports a variety of audio formats, including MP3 and WAV.
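As a rough illustration of the options described above, the following Python sketch builds the JSON body for a synthesis request without sending it. The endpoint URL, the field names (`input`, `voice`, `audioConfig`, `speakingRate`, `volumeGainDb`) and the `LINEAR16` encoding name for uncompressed WAV-style audio reflect the public REST API as we understand it and should be checked against the current documentation. The helper name `build_synthesize_request` and the voice name in the example are illustrative assumptions.

```python
import json

# Assumed REST endpoint; verify the path and version in the documentation.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, language_code="en-US", voice_name=None,
                             audio_encoding="MP3", pitch=0.0,
                             speaking_rate=1.0, volume_gain_db=0.0):
    """Return a dict shaped like a text:synthesize request body.

    This is a hypothetical helper for illustration only. It does not
    contact the service and does not validate parameter ranges.
    """
    if not text:
        raise ValueError("text must be a non-empty string")

    voice = {"languageCode": language_code}
    if voice_name is not None:
        # Selecting a specific voice is optional; the name is an assumption.
        voice["name"] = voice_name

    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": audio_encoding,   # e.g. "MP3" or "LINEAR16"
            "pitch": pitch,                    # pitch adjustment
            "speakingRate": speaking_rate,     # 1.0 is the normal rate
            "volumeGainDb": volume_gain_db,    # volume gain in dB
        },
    }


body = build_synthesize_request(
    "Your appointment is on March 27 at 3:30 PM.",
    voice_name="en-US-Wavenet-A",  # illustrative voice name
    speaking_rate=0.9,
)
print(json.dumps(body, indent=2))
```

To actually synthesize audio, this body would be sent as an authenticated POST to the endpoint. The response is expected to carry base64-encoded audio, though that detail should also be confirmed in the documentation.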

Rolling in the DeepMind

In addition, we’re excited to announce that Cloud Text-to-Speech also includes a selection of high-fidelity voices built using WaveNet, a generative model for raw audio created by DeepMind. WaveNet synthesizes more natural-sounding speech and, on average, produces speech audio that people prefer over other text-to-speech technologies.

In late 2016, DeepMind introduced the first version of WaveNet — a neural network trained with a large volume of speech samples that’s able to create raw audio waveforms from scratch. During training, the network extracts the underlying structure of the speech, for example which tones follow one another and what shape a realistic speech waveform should have. When given text input, the trained WaveNet model generates the corresponding speech waveforms, one sample at a time, achieving higher accuracy than alternative approaches.
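To make "one sample at a time" concrete, here is a toy Python sketch of autoregressive generation, in which each new sample is computed from the samples already produced. It is not WaveNet: the function `toy_next_sample` is a hypothetical stand-in that uses a fixed two-tap recurrence to emit a sine wave, whereas a trained network would predict a distribution over the next sample from a long context and the input text.

```python
import math


def toy_next_sample(history, omega):
    """Hypothetical stand-in for a trained model.

    Uses the identity sin(w*n) = 2*cos(w)*sin(w*(n-1)) - sin(w*(n-2)),
    so the output depends only on the two previous samples.
    """
    return 2.0 * math.cos(omega) * history[-1] - history[-2]


def generate(num_samples, freq_hz=440.0, sample_rate=24000):
    """Generate a waveform sample by sample, feeding outputs back in."""
    omega = 2.0 * math.pi * freq_hz / sample_rate
    # Seed the loop with the first two samples of sin(omega * n).
    samples = [0.0, math.sin(omega)]
    while len(samples) < num_samples:
        samples.append(toy_next_sample(samples, omega))
    return samples[:num_samples]


# One second of audio at an assumed rate of 24,000 samples per second.
waveform = generate(24000)
print(len(waveform))
```

The loop structure is the point of the sketch: generation is sequential because every step needs the previous outputs, which is why the speed of each step matters so much for synthesis latency.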

Fast forward to today, and we’re now using an updated version of WaveNet that runs on Google’s Cloud TPU infrastructure. The new, improved WaveNet model generates raw waveforms 1,000 times faster than the original model, and can generate one second of speech in just 50 milliseconds. The model is not just quicker but also higher-fidelity, capable of creating waveforms with 24,000 samples a second. We’ve also increased the resolution of each sample from 8 bits to 16 bits, producing higher-quality audio for a more human sound.
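The figures in this paragraph can be turned into a few derived quantities. The short Python sketch below works through the arithmetic. It assumes single-channel, uncompressed audio for the data-rate figure, which the post does not state, and the variable names are ours.

```python
# Figures quoted in the post.
SAMPLE_RATE_HZ = 24_000        # samples per second of generated audio
SYNTHESIS_MS_PER_SECOND = 50   # time to generate one second of speech
OLD_BITS, NEW_BITS = 8, 16     # resolution of each sample

# How many times faster than real time the model generates audio.
real_time_factor = 1000 / SYNTHESIS_MS_PER_SECOND

# Average time available to produce each sample, in microseconds.
microseconds_per_sample = SYNTHESIS_MS_PER_SECOND * 1000 / SAMPLE_RATE_HZ

# Number of distinct amplitude levels each sample can take.
old_levels = 2 ** OLD_BITS     # 256
new_levels = 2 ** NEW_BITS     # 65536

# Raw data rate, assuming mono uncompressed audio (our assumption).
bytes_per_second = SAMPLE_RATE_HZ * NEW_BITS // 8   # 48000

print(f"real-time factor: {real_time_factor:.0f}x")
print(f"time per sample: {microseconds_per_sample:.2f} microseconds")
print(f"amplitude levels: {old_levels} -> {new_levels}")
print(f"raw data rate: {bytes_per_second} bytes/s")
```

In other words, 50 milliseconds per second of speech is 20 times faster than real time, and moving from 8 to 16 bits multiplies the number of representable amplitude levels by 256.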

With these adjustments, the new WaveNet model produces more natural-sounding speech. In tests, people gave the new US English WaveNet voices a mean opinion score (MOS) of 4.1 on a scale of 1 to 5, over 20% better than for standard voices and reducing the gap with human speech by over 70%. Because WaveNet voices also require less recorded audio input to produce high-quality models, we expect to continue improving both the variety and the quality of the WaveNet voices available to Cloud customers in the coming months.

Cloud Text-to-Speech is already helping multiple customers deliver a better experience to their end users. Customers include Cisco and Dolphin ONE.

“As the leading provider of collaboration solutions, Cisco has a long history of bringing the latest technology advances into the enterprise. Google’s Cloud Text-to-Speech has enabled us to achieve the natural sound quality that our customers desire.”  

— Tim Tuttle, CTO of Cognitive Collaboration, Cisco

“Dolphin ONE’s Calll.io telephony platform offers connectivity from a multitude of devices, at practically any location. We’ve integrated Cloud Text-to-Speech into our products and allow our users to create natural call center experiences. By using Google Cloud’s machine learning tools, we’re instantly delivering cutting-edge technology to our users.” 

—Jason Berryman, Dolphin ONE

Get started today

With Cloud Text-to-Speech, you’re now a few clicks away from one of the most advanced speech technologies in the world. To learn more, please visit the documentation or our pricing page. To get started with our public beta or try out the new voices, visit the Cloud Text-to-Speech website.

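Unless otherwise stated, the content of this article is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For more details, see our Terms of Service.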

Tags: Cloud
