Source: Simplifying ML predictions with Google Cloud Functions from Google Cloud
If you develop on Google Cloud Platform (GCP) and haven’t already tried out Cloud Functions, our serverless event-driven platform, it’s worth taking a look. Our favorite part about Cloud Functions is that you can use it to connect all sorts of services across GCP and beyond. You can think of it as the glue between different pieces of your application. Since we focus on machine learning, we wanted to share a demo we built that uses Cloud Functions to generate predictions on the fly from a text classification model hosted on Cloud Machine Learning Engine, our managed service for training and serving ML models. It supports multiple ML frameworks including TensorFlow, Scikit-learn, Keras, and XGBoost. If you’re more of a video person, you can watch us presenting this demo at Cloud Next ‘18.
For our demo, we built an app for predicting the genres of a movie from its description using a text classification model. Here’s what it looks like:
To show off the power of Cloud ML Engine, we built two versions of the model independently, one in Scikit-learn and one in TensorFlow, and built a web app that can easily generate predictions from both. Because these models were built with entirely different frameworks and have different dependencies, even a simple app that queries both would previously have required a lot of code. Cloud ML Engine gives us a centralized place to host multiple types of models, and streamlines the process of querying them.
And before we get into the details, you may be wondering why you’d need multiple versions of the same model. If you’ve got data scientists or ML engineers on your team, they may want to experiment independently with different model inputs and frameworks. Or, maybe they’ve built an initial prototype of a model and will then obtain additional training data and train a new version. A web app like the one we’ve built provides an easy way to compare output, or even load test across multiple versions.
For the frontend, we needed a way to make predictions directly from our web app. Because we wanted the demo to focus on Cloud ML Engine serving, and not on boilerplate details like authenticating our Cloud ML Engine API request, Cloud Functions was a great fit. The frontend consists of a single HTML page hosted on Cloud Storage. When a user enters a movie description in the web app and clicks “Get Prediction,” it invokes a cloud function using an HTTP trigger. The function sends the text to ML Engine, and parses the genres returned from the model to display them in the web UI.
Here’s an architecture diagram of how it all fits together:
Now it’s time to dive into the specifics of our cloud function.
One of the great things about Cloud ML Engine is that it supports models built with multiple frameworks. We’ve deployed our Scikit-learn and TensorFlow models as different versions:
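For instance, assuming the model has already been created and the exported model artifacts live in Cloud Storage, deploying the two versions with the gcloud CLI might look something like this (the version names, model name, bucket paths, and runtime version here are placeholders, not the demo's actual values):

```shell
# Scikit-learn version: the --framework flag tells ML Engine how to serve it
gcloud ml-engine versions create v_sklearn \
  --model=movie_genres \
  --origin=gs://your-bucket/sklearn-model/ \
  --runtime-version=1.8 \
  --framework=scikit-learn

# TensorFlow SavedModel version of the same model
gcloud ml-engine versions create v_tf \
  --model=movie_genres \
  --origin=gs://your-bucket/tf-model/ \
  --runtime-version=1.8 \
  --framework=tensorflow
```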
Switching between versions is as simple as changing the version name in our API request. In the frontend, we pass our function both the name of the model version we'd like to query and the input text we want a prediction for. Cloud Functions handles project authentication for us out of the box, so authenticating takes just a single call to the client library's default credentials helper.
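A minimal sketch of that step, assuming the Node.js googleapis client library (the helper takes the client object as a parameter purely to keep the sketch easy to stub; in the function itself you would pass in the google export from googleapis):

```javascript
// Hedged sketch, not the authors' exact code: on Cloud Functions, Application
// Default Credentials authenticate as the function's service account, so no
// key file is needed.
async function authenticate(googleClient) {
  // googleClient is the `google` export from the googleapis package.
  return googleClient.auth.getClient({
    scopes: ['https://www.googleapis.com/auth/cloud-platform'],
  });
}
```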
Don’t forget to add the googleapis dependency to the function’s package.json file.
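For reference, that package.json might look like the following (the name and package version number here are illustrative):

```json
{
  "name": "ml-engine-predict",
  "version": "1.0.0",
  "dependencies": {
    "googleapis": "^39.0.0"
  }
}
```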
Then, if authentication succeeds, we use the version parameter and input data passed into our function to set up the ML Engine request JSON:
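A sketch of that request construction (the project, model, and version names are placeholders; the resource-name format is the one the ML Engine v1 API expects):

```javascript
// Hedged sketch: build the predict request from the frontend's inputs.
function buildPredictRequest(project, model, version, instances) {
  return {
    // Fully qualified version resource name understood by ML Engine.
    name: `projects/${project}/models/${model}/versions/${version}`,
    // The instances array holds the inputs to predict on, e.g. movie
    // descriptions for our text classification model.
    requestBody: { instances },
  };
}
```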
To make our Cloud ML Engine request, we call projects.predict() from the ml export in the googleapis package, and if the request is successful we send the prediction response back to our app frontend:
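A sketch of that call, with the ml client (from google.ml('v1')) and the auth client passed in as parameters so the helper is easy to stub; the requestJson argument is the request object built in the previous step:

```javascript
// Hedged sketch: send the request to ML Engine and return the prediction data.
async function getPrediction(ml, authClient, requestJson) {
  const response = await ml.projects.predict({
    auth: authClient,
    ...requestJson, // spreads in `name` and `requestBody`
  });
  return response.data; // e.g. { predictions: [...] }
}
```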
So first we get the ml module from the googleapis package. Then, once we have our ML Engine request JSON, we call projects.predict().
The exact structure of the JSON response depends on the type of machine learning model you’re calling. The following is an example request and response for a model that predicts a movie’s genre from its description:
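As an illustration (the placeholder input and the genre ordering are ours, not necessarily the demo’s exact schema; in this ordering the third element corresponds to comedy):

```json
{
  "request": {
    "instances": ["<movie description text>"]
  },
  "response": {
    "predictions": [[0, 0, 1, 0, 0, 0, 0, 0, 0]]
  }
}
```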
The prediction response is a nine-element one-hot vector representing nine possible movie genres. One-hot encoding is a common output format for classification problems. In this example, the third number in the output vector represents comedy, indicating that our model predicts this movie is a comedy. If we were instead querying a regression model, for example one predicting a continuous value such as movie revenue, we would see a single numeric value in the response rather than a vector.
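Turning that vector back into genre names is a simple lookup by index. A sketch, with an illustrative label list (the nine genres and their order here are assumptions, not the demo’s actual labels):

```javascript
// Illustrative label list: index 2 is comedy, matching the example above.
const GENRES = ['action', 'adventure', 'comedy', 'crime', 'documentary',
                'drama', 'horror', 'romance', 'thriller'];

// Map a one-hot (or multi-hot) prediction vector back to genre labels.
function decodeGenres(vector, labels = GENRES, threshold = 0.5) {
  return vector
    .map((score, i) => (score >= threshold ? labels[i] : null))
    .filter((genre) => genre !== null);
}
```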
The final piece of the puzzle when setting up this cloud function is to define the CORS headers, which allow a frontend hosted on a different domain to call our cloud function. In our case the frontend is served directly from Cloud Storage, so we add the storage.googleapis.com URL to the Access-Control-Allow-Origin header. Most browsers send a blank OPTIONS request before calling an external API just to check the CORS headers, so we make sure to set the headers and return an empty 204 response for those requests. Here is the complete code of our cloud function:
That’s it! Using a single cloud function, we have a serverless app that can make predictions from two ML models built with entirely different frameworks.
Want to build your own serverless ML applications with Cloud Functions? Dive into the Cloud Functions docs here. We focused on Cloud ML Engine here, but you could easily write a similar function to call any of our Machine Learning APIs or AutoML. If you’ve got questions or topics you’d like to see covered in a future post, find us on Twitter @SRobTweets and @ZackAkil.