Source: From the data warehouse: Urs Hӧlzle explains how data analytics and ML can transform your business from Google Cloud
As businesses collect and analyze more and more data with every passing year, traditional infrastructure is challenged: It’s not just that there is more data; it’s coming from more sources, with different contexts and uses than the enterprise has seen in the past. Not only that, internal and external customers expect results at a faster pace, challenging both the tools and practices of traditional infrastructure.
The solution is to do well what technology has always aimed to do: automate the rote stuff, so you can get to more value-added work faster. There are a number of ways to do this, but increasingly the most valuable is to use artificial intelligence, in particular machine learning, either overtly or in the form of labor-saving tools and services that rely on ML.
Today, we’ll talk with one of Google’s early distinguished engineers, Urs Hӧlzle, who now plans, designs, and supports the infrastructure behind the growing user base for a number of Google products, as well as the infrastructure that serves all of our Google Cloud customers.
Urs has played an essential role at Google from nearly the beginning, leading the development of the computing and data infrastructure that first revolutionized Internet search, and eventually became a platform for maps, mobility, cloud computing, and artificial intelligence engines—systems that predict deadly illnesses and prevent Google’s own data centers from overheating.
Urs and I recently sat down to talk about how machine learning simplifies problem-solving for businesses.
Note: This interview has been abridged and edited.
Quentin: Urs, as you’ve expanded infrastructure and capacity to process information at a higher velocity, process data from multiple angles, and think of data as a much more dynamic asset, how do today’s larger quantities of data change the way people work?
Urs: The ecosystem really changed a lot, because previously, you had to do a lot of planning: you had to carefully pick which insight you wanted to go after. Now, a data analyst with a simple SQL query can at least prototype this insight at their own pace, maybe in half a day or a week. They don’t need a software team, they don’t need an analyst, and it’s no longer a software development project. That means the number of questions you can answer from your data just explodes.
Quentin: So you can have far more projects, you can think in novel ways, you can test at a deeper level.
Urs: Often, you’re going after the right thing, but your initial understanding is actually incorrect. As you go through it iteratively, your understanding of the problem improves. At that point, you’re asking better questions than you asked on day one. And if you can do that every day, and ask a better question every day, then just in a matter of two weeks, you might actually fundamentally change how you think about a particular customer segment—because you have a much deeper understanding of how it behaves.
Quentin: One could see AI and machine learning as a kind of a natural outgrowth of cloud computing, right? Because it’s a fundamentally better way to sort through the data, find patterns, and test things?
Urs: Yes, and in fact we’re starting to see [the worlds of machine learning and cloud infrastructure] merge. Traditionally, when you had data, you wrote the data processing, or maybe you had queries. That was the first step: “I’m just trying to find a data point again.” That was databases. Then came analytics: “let me actually analyze the data, compute statistics on it.” But it was still relatively manual. Now, ML gives you a more powerful way to look at the data, one that also does well with unstructured data like images, sound, or other data types, where traditional analytics just doesn’t work at all.
[Modern data analytics tools] really make sense and make use of the data you already have. So on BigQuery today, our data warehouse, you actually have [built-in] ML functionality in your data analytics warehouse. It’s a very natural way to say, “Gee, I have this data here, can I actually make a prediction function for things where I don’t have the data?” And the answer is that yes you can, and it’s actually very easy. You can do it in a SQL statement that is roughly 10 lines long, so you don’t even need to understand how machine learning works.
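To make the "roughly 10 lines" of SQL concrete, here is a minimal sketch that assembles a BigQuery ML `CREATE MODEL` statement and a matching `ML.PREDICT` query as plain strings. The dataset, table, and column names (`mydataset.customers`, `monthly_spend`, and so on) are hypothetical and do not come from the interview, and the choice of a linear regression model is an assumption made only for illustration. The sketch just builds and prints the SQL; running it against BigQuery would need a project, credentials, and a client, none of which are shown.

```python
def build_training_sql(model, source_table, label_column, feature_columns):
    """Return a BigQuery ML statement that trains a model on rows with a known label."""
    features = ",\n  ".join(feature_columns)
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'linear_reg',\n"  # assumed model type, for illustration only
        f"  input_label_cols = ['{label_column}']\n"
        ") AS\n"
        "SELECT\n"
        f"  {features},\n"
        f"  {label_column}\n"
        f"FROM `{source_table}`\n"
        f"WHERE {label_column} IS NOT NULL"
    )


def build_prediction_sql(model, source_table, label_column, feature_columns):
    """Return a query that predicts the label for rows where it is missing."""
    features = ", ".join(feature_columns)
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT {features} FROM `{source_table}` WHERE {label_column} IS NULL))"
    )


if __name__ == "__main__":
    # Hypothetical names: none of these exist in the source interview.
    model = "mydataset.spend_model"
    table = "mydataset.customers"
    label = "monthly_spend"
    features = ["tenure_months", "support_tickets"]

    print(build_training_sql(model, table, label, features))
    print()
    print(build_prediction_sql(model, table, label, features))
```

The training statement learns from rows where the label is present, and the prediction query applies the model to rows where it is absent, which mirrors the "prediction function for things where I don’t have the data" idea in the answer above.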
Quentin: What are some of the most interesting ML problems that customers are bringing to you these days?
Urs: I think the biggest problems that companies have are in two main areas.
First, they believe that ML is the biggest opportunity, but they need to be able to translate that into actual outcomes. So it’s essential that we offer tools in our stack that make it much easier for you to use ML without being an expert. BigQuery can actually do predictions with ML, without you needing to know too much about the underlying techniques. For example, AutoML, our ML [training tools]: you can take your set of images in which you want to recognize objects, and we can automatically construct a machine learning system that recognizes them with very high accuracy. Only a year ago, you needed an expert to do that.
The second problem is really how to deal with the transition to the cloud. Every large user is going to run in a hybrid configuration for a while. Now you have two environments, and they have different rules, so you need to have two different teams and train them differently in order to figure out how these things work together.
Quentin: Doesn’t putting out a cloud management tool like Kubernetes help with coordination?
Urs: Yes, absolutely. That is one of the hardest problems, and our answer to that is Kubernetes and Google Kubernetes Engine (GKE). Now you can use Kubernetes to manage your workloads both on-premises and in the cloud, with not just the same code but, equally important, the same configuration.
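As a rough illustration of "the same configuration" in both environments, the sketch below builds one Kubernetes Deployment manifest and lists the `kubectl apply` commands that would send that identical file to two clusters. The workload name, container image, file name, and kubectl context names (`on-prem-cluster`, `gke-cluster`) are all hypothetical. The manifest is written as JSON, which `kubectl apply -f` accepts alongside YAML; the commands are only printed, not executed.

```python
import json


def build_deployment(name, image, replicas):
    """Return a minimal apps/v1 Deployment manifest as a plain dictionary."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def apply_commands(manifest_path, contexts):
    """Return one kubectl command per cluster context, all using the same file."""
    return [f"kubectl --context {ctx} apply -f {manifest_path}" for ctx in contexts]


if __name__ == "__main__":
    # Hypothetical workload and cluster names, chosen only for this example.
    manifest = build_deployment("reports", "example.com/reports:1.0", replicas=3)
    print(json.dumps(manifest, indent=2))

    for command in apply_commands("reports.json", ["on-prem-cluster", "gke-cluster"]):
        print(command)
```

Only the target context differs between the two commands; the manifest itself is unchanged, which is the point being made about hybrid deployments.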