Source: Bringing Live Transcribe’s Speech Engine to Everyone from Open Source
Earlier this year, Google launched Live Transcribe, an Android application that provides real-time automated captions for people who are deaf or hard of hearing. Through many months of user testing, we’ve learned that robustly delivering good captions for long-form conversations isn’t so easy, and we want to make it easier for developers to build upon what we’ve learned. Live Transcribe’s speech recognition is provided by Google’s state-of-the-art Cloud Speech API, which under most conditions delivers impressive transcript accuracy. However, relying on the cloud introduces several complications, most notably sensitivity to ever-changing network connections, along with data costs and latency. Today, we are sharing our transcription engine with the world so that developers everywhere can build applications with robust transcription.
Those who have worked with our Cloud Speech API know that it does not currently support sending infinitely long streams of audio. To work within this limit, we close and restart streaming requests before hitting the timeout: we restart the session during long periods of silence and close it whenever a pause in the speech is detected, so that no word or sentence is truncated mid-stream. In between sessions, we buffer audio locally and send it upon reconnection. This reduces the amount of text lost mid-conversation, whether due to restarting speech requests or switching between wireless networks.
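The restart-and-buffer strategy above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the class name, the timeout and silence thresholds, and the chunk-routing interface are all assumptions made for the example.

```python
from collections import deque

# Hypothetical thresholds; the real engine's values are not stated in the post.
STREAM_TIMEOUT_S = 290.0    # restart shortly before the API's streaming limit
SILENCE_RESTART_S = 5.0     # restart the session during long periods of silence


class SessionManager:
    """Sketch: close/restart streaming requests before timeout, and buffer
    audio locally whenever no session is open."""

    def __init__(self):
        self.buffer = deque()       # audio chunks held between sessions
        self.session_open = False
        self.session_start = 0.0
        self.last_speech = 0.0

    def open_session(self, now):
        """Start a new streaming request; flush audio buffered while closed."""
        self.session_open = True
        self.session_start = now
        self.last_speech = now
        flushed = list(self.buffer)  # send buffered audio upon reconnection
        self.buffer.clear()
        return flushed

    def feed(self, chunk, is_speech, now):
        """Route one audio chunk; returns the chunks to send upstream now."""
        if not self.session_open:
            self.buffer.append(chunk)   # hold audio locally until reconnect
            return []
        if is_speech:
            self.last_speech = now
        # Close before the server-side timeout, or at a detected pause in
        # the speech, so no word or sentence is cut off mid-stream.
        if (now - self.session_start >= STREAM_TIMEOUT_S
                or (not is_speech
                    and now - self.last_speech >= SILENCE_RESTART_S)):
            self.session_open = False
            self.buffer.append(chunk)   # buffer until the next session opens
            return []
        return [chunk]
```

A caller would poll `feed()` with timestamped audio chunks and reopen the session (via `open_session()`) whenever it closes; the design choice is that closing preferentially happens during silence, when no transcript text is at risk.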
Today, we are excited to make all of this available to developers everywhere. We hope you’ll join us in trying to build a world that is more accessible for everyone.
By Chet Gnegy, Alex Huang, and Ausmus Chang from the Live Transcribe Team