Source: AI in Depth: serving a Keras text classifier with preprocessing using Cloud ML Engine from Google Cloud
Cloud ML Engine now supports deploying your trained model with custom online prediction Python code, currently in beta. In this blog post, we show how custom online prediction code helps maintain affinity between your preprocessing logic and your model, which is crucial to avoid training-serving skew. As an example, we build a Keras text classifier and deploy it for online serving on Cloud ML Engine, along with its text preprocessing components.
The hard work of building a machine learning (ML) model pays off only when you deploy the model and use it in production—when you integrate it into your pre-existing systems or incorporate your model into a novel application. If your model has multiple possible consumers, you might want to deploy the model as an independent, coherent microservice that is invoked via a REST API that can automatically scale to meet demand. Although Cloud ML Engine may be better known for its training abilities, it can also serve TensorFlow, Keras, scikit-learn, and XGBoost models with REST endpoints for online prediction.
While training a model, it’s common to transform the input data into a format that improves model performance. But when performing predictions, the model expects the input data to already exist in that transformed form: the model might expect a normalized numerical feature, a TF-IDF encoding of terms in text, or a constructed feature based on a complex, custom transformation. However, the callers of your model will send “raw”, untransformed data, and the caller doesn’t (or shouldn’t) need to know which transformations are required. This means the model microservice will be responsible for applying the required transformations to the data before invoking the model for prediction.
The affinity between the preprocessing routines and the model (i.e., having both of them coupled in the same service) is crucial to avoid training-serving skew, since you’ll want to ensure that these routines are applied on any data sent to the model, with no assumptions about how the callers prepare the data. Moreover, the model-preprocessing affinity helps to decouple the model from the caller. That is, if a new model version requires new transformations, these preprocessing routines can change independently of the caller, as the caller will keep on sending data in its raw format.
Besides preprocessing, your deployed model’s microservice might also perform other operations, including post-processing of the predictions produced by the model, or even more complex prediction routines that combine the predictions of multiple models.
To help maintain affinity of preprocessing between training and serving, Cloud ML Engine now enables users to customize the prediction routine that gets called when sending prediction requests to a model deployed on Cloud ML Engine. This feature allows you to upload a custom model prediction class, along with your exported model, to apply custom logic before or after invoking the model for prediction.
Customizing prediction routines can be useful in scenarios such as:

- Applying stateful preprocessing transformations to the raw instances sent by callers, before invoking the model for prediction.
- Post-processing the predictions produced by the model, for example converting class probabilities to human-readable labels.
- Implementing more complex prediction routines, such as combining the predictions of multiple models.

These tasks can be accomplished with custom online prediction, using the standard frameworks supported by Cloud ML Engine, as well as with any model developed in your favorite Python-based framework, including PyTorch. All you need to do is include the dependency libraries in the setup.py of your custom model package (as discussed below). Note that without this feature, you would need to implement the preprocessing, post-processing, or any custom prediction logic in a “wrapper” service, using, for example, App Engine. Such an App Engine service would also be responsible for calling the Cloud ML Engine models, but this approach adds complexity to the prediction system, as well as latency to prediction time.
Next we’ll demonstrate how we built a microservice that can handle both preprocessing and post-processing scenarios with Cloud ML Engine custom online prediction, using text classification as the example. We implemented the text preprocessing logic and built the classifier using Keras, but thanks to Cloud ML Engine custom online prediction, you could implement the preprocessing using other libraries (like NLTK or scikit-learn), and build the model with any other Python-based ML framework (like TensorFlow or PyTorch). You can find the code for this example in this Colab notebook.
Text classification algorithms are at the heart of a variety of software systems that process text data at scale. The objective is to classify (categorize) text into a set of predefined classes, based on the text’s content. This text can be a tweet, a web page, a blog post, user feedback, or an email: in the context of text-oriented ML models, a single text entry (like a tweet) is usually referred to as a “document.”
Common use cases of text classification include:
You can design your text classification model in two different ways: as an n-gram (bag-of-words) model, which uses the frequency of words or short word sequences while ignoring their order, or as a sequence model, which takes word order into account. Choosing one versus the other will influence how you’ll need to prepare your data before training the model.

In our example, we use the sequence model approach.
Hacker News is one of many public datasets available in BigQuery. This dataset includes titles of articles from several data sources. For the following tutorial, we extracted the titles that belong to either GitHub, The New York Times, or TechCrunch, and saved them as CSV files in a publicly shared Cloud Storage bucket at the following location:
Here are some useful statistics about this dataset:
The objective of the tutorial is to build a text classification model with Keras that identifies the source of an article given its title, and to deploy the model to Cloud ML Engine with custom online prediction, so that we can perform text preprocessing on incoming requests and post-processing on the predictions.
Sequence tokenization with Keras
In this example, we perform the following preprocessing steps:

1. Tokenization: split each raw title into a list of tokens (words), and build a vocabulary from the most frequent tokens in the training data.
2. Vectorization: convert each list of tokens into a sequence of numeric word indicators, using the vocabulary lookup.
3. Padding: fix the length of each sequence to 50 tokens, as set by the MAX_SEQUENCE_LENGTH parameter. Sequences with more than 50 tokens will be right-trimmed, while sequences with fewer than 50 tokens will be left-padded with zeros.
Both the tokenization and vectorization steps are stateful transformations. In other words, you extract the vocabulary from the training data (after tokenization, keeping only the most frequent words), and create a word-to-indicator lookup, based on that vocabulary, to use for vectorization. This lookup will be used to vectorize new titles for prediction. Thus, after creating the lookup, you need to save it in order to reuse it when serving the model.
The following block shows the code for performing text preprocessing. The TextPreprocessor class in the preprocess.py module includes two methods:

- fit(): applied to the training data to generate the lookup (tokenizer). The tokenizer is stored as an attribute of the object.
- transform(): applies the tokenizer to any text data to generate fixed-length sequences of word indicators.
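The full module lives in the Colab notebook; here is a minimal, dependency-free sketch of the same fit()/transform() contract (the notebook itself builds the tokenizer with Keras utilities, and the VOCAB_SIZE value below is illustrative):

```python
# Sketch of preprocess.py's TextPreprocessor; the real module uses Keras's
# text utilities, but the fit/transform contract is the same.
from collections import Counter

VOCAB_SIZE = 20000          # keep only the top-N frequent words (illustrative)
MAX_SEQUENCE_LENGTH = 50    # fixed sequence length, as in the tutorial

class TextPreprocessor(object):
    def __init__(self, vocab_size=VOCAB_SIZE, max_sequence_length=MAX_SEQUENCE_LENGTH):
        self._vocab_size = vocab_size
        self._max_sequence_length = max_sequence_length
        self._tokenizer = None  # word -> indicator lookup, created by fit()

    def fit(self, text_list):
        """Build the word-to-indicator lookup from the training titles."""
        counts = Counter(word for text in text_list for word in text.lower().split())
        # Index 0 is reserved for padding, so real words start at 1.
        self._tokenizer = {
            word: i + 1
            for i, (word, _) in enumerate(counts.most_common(self._vocab_size))
        }

    def transform(self, text_list):
        """Vectorize titles into fixed-length sequences of word indicators."""
        sequences = []
        for text in text_list:
            # Unknown words map to 0 here for simplicity.
            seq = [self._tokenizer.get(w, 0) for w in text.lower().split()]
            seq = seq[:self._max_sequence_length]               # right-trim
            pad = [0] * (self._max_sequence_length - len(seq))  # left-pad
            sequences.append(pad + seq)
        return sequences
```

You call fit() once on the training titles, and then transform() on both training and evaluation data, so that evaluation titles are vectorized with the tokenizer learned from the training set.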
The following code prepares the training and evaluation data (that is, it converts each raw text title to a NumPy array of 50 numeric indicators). Note that you use both fit() and transform() on the training data, while you only use transform() on the evaluation data, to make use of the tokenizer generated from the training data. The outputs, train_texts_vectorized and eval_texts_vectorized, will be used to train and evaluate our text classification model respectively.
Next, save the processor object (which includes the tokenizer generated from the training data) to be used when serving the model for prediction. The following code dumps the object to the processor_state.pkl file.
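A minimal sketch of that dump step using Python’s pickle module (the stand-in class body here is illustrative; the file name processor_state.pkl follows the tutorial):

```python
# Sketch of persisting the fitted preprocessor with pickle; the stand-in
# class below is illustrative -- the real one is the fitted TextPreprocessor.
import pickle

class TextPreprocessor(object):
    """Stand-in for the fitted preprocessor; it carries the tokenizer state."""
    def __init__(self, tokenizer):
        self._tokenizer = tokenizer

processor = TextPreprocessor({"github": 1, "nytimes": 2})

# Serialize the fitted object so the serving side can reload it later.
with open("processor_state.pkl", "wb") as f:
    pickle.dump(processor, f)

# At serving time, the same state is restored with pickle.load().
with open("processor_state.pkl", "rb") as f:
    restored = pickle.load(f)
```

Note that pickle stores only the object’s state, not its class definition, which is why preprocess.py must also be shipped in the deployed package, as shown later.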
The following code snippet shows the method that creates the model architecture. We create a Sequential Keras model, with an Embedding layer and a Dropout layer, followed by two Conv1D-and-pooling stages, and finally a Dense layer with softmax activation. The model is compiled with the sparse_categorical_crossentropy loss and the acc (accuracy) evaluation metric.
The following code snippet creates the model by calling the create_model method with the required parameters, trains the model on the training data, and evaluates the trained model’s quality using the evaluation data. Lastly, the trained model is saved to the keras_saved_model.h5 file.
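The two snippets above can be sketched as follows, assuming tf.keras; the layer sizes, hyperparameters, and the tiny random stand-in data are placeholders, not the notebook’s exact values:

```python
# Illustrative sketch of model building, training, evaluation, and saving,
# assuming tf.keras; sizes and hyperparameters are placeholders.
import numpy as np
import tensorflow as tf

def create_model(vocab_size, embedding_dim, num_classes,
                 filters=64, kernel_size=3, dropout_rate=0.2):
    """Embedding -> Dropout -> two Conv1D/pooling stages -> softmax Dense."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Conv1D(filters, kernel_size, activation='relu'),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Conv1D(filters * 2, kernel_size, activation='relu'),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['acc'])
    return model

model = create_model(vocab_size=20000, embedding_dim=64, num_classes=3)

# Tiny random stand-ins for the vectorized titles and labels, just to show
# the fit / evaluate / save calls.
x_train = np.random.randint(0, 20000, size=(8, 50))
y_train = np.random.randint(0, 3, size=(8,))
model.fit(x_train, y_train, epochs=1, verbose=0)
loss, acc = model.evaluate(x_train, y_train, verbose=0)
model.save('keras_saved_model.h5')
```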
In order to apply a custom prediction routine that includes preprocessing and postprocessing, you need to wrap this logic in a Custom Model Prediction class. This class, along with the trained model and the saved preprocessing object, will be used to deploy the Cloud ML Engine online prediction microservice. The following code shows how the Custom Model Prediction class (CustomModelPrediction) for our text classification example is implemented.
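A sketch of that class, following the from_path / predict / _postprocess structure (the file names match the tutorial, while the internals are a simplified reconstruction; the notebook has the authoritative version):

```python
# Illustrative sketch of the Custom Model Prediction class; file names match
# the tutorial, while the internals are a simplified reconstruction.
import os
import pickle

class CustomModelPrediction(object):

    def __init__(self, model, processor):
        # Both the Keras model and the fitted preprocessor are kept as
        # attributes, so predict() can apply them together.
        self._model = model
        self._processor = processor

    def _postprocess(self, predictions):
        """Convert class-probability vectors to human-readable labels."""
        labels = ['github', 'nytimes', 'techcrunch']
        return [labels[max(range(len(probs)), key=probs.__getitem__)]
                for probs in predictions]

    def predict(self, instances, **kwargs):
        # 1. Vectorize the raw titles with the "stateful" preprocessor.
        preprocessed_data = self._processor.transform(instances)
        # 2. Run the model to get class probabilities.
        predictions = self._model.predict(preprocessed_data)
        # 3. Convert the probabilities to labels.
        return self._postprocess(predictions)

    @classmethod
    def from_path(cls, model_dir):
        """Load the saved model and preprocessor from the deployment dir."""
        from tensorflow import keras
        model = keras.models.load_model(
            os.path.join(model_dir, 'keras_saved_model.h5'))
        with open(os.path.join(model_dir, 'processor_state.pkl'), 'rb') as f:
            processor = pickle.load(f)
        return cls(model, processor)
```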
Note the following points in the Custom Model Prediction class implementation:

- from_path is a classmethod responsible for loading both the model and the preprocessing object from their saved files, and for instantiating a new CustomModelPrediction object with the loaded model and preprocessor (both stored as attributes of the object).
- predict is the method invoked when you call the “predict” API of the deployed Cloud ML Engine model. The method does the following:
  1. Receives the instances (a list of titles) for which predictions are needed.
  2. Prepares the text data for prediction by applying the transform() method of the “stateful” preprocessor object.
  3. Calls self._model.predict() to produce the predicted class probabilities, given the prepared text.
  4. Postprocesses the output by calling the _postprocess method.
- _postprocess is the method that receives the class probabilities produced by the model, picks the label index with the highest probability, and converts this label index to a human-readable label: 'github', 'nytimes', or 'techcrunch'.
Figure 1 shows an overview of how to deploy the model, along with its required artifacts for a custom prediction routine to Cloud ML Engine.
The first thing you want to do is upload your artifacts to Cloud Storage. You need to upload:

- Your saved (trained) model file: keras_saved_model.h5 (see the Training a Keras model section).
- Your pickled (serialized) preprocessing object, which contains the state needed for data transformation prior to prediction: processor_state.pkl (see the Preprocessing Text section). Remember, this object includes the tokenizer generated from the training data.
Second, upload a Python package including all the classes you need for prediction (e.g., preprocessing, model classes, and post-processing). In this example, you need to create a pip-installable tar that includes preprocess.py. First, create the following setup.py file:
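A sketch of what that setup.py can look like; the package name, version, and dependency list here are placeholders (per the earlier note, any extra libraries your prediction code needs, such as NLTK or scikit-learn, would go in install_requires):

```python
# setup.py -- illustrative sketch; name, version, and dependencies are
# placeholders for this example.
from setuptools import setup

setup(
    name='text_classification',   # becomes $name in $name-$version.tar.gz
    version='0.1',                # becomes $version
    py_modules=['preprocess'],    # ships preprocess.py with the package
    install_requires=[],          # add extra prediction-time dependencies here
)
```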
Now, generate the package by running python setup.py sdist. This creates a .tar.gz package under a new dist/ directory in your working directory. The name of the package will be $name-$version.tar.gz, where $name and $version are the ones specified in setup.py.
Once you have successfully created the package, you can upload it to Cloud Storage with gsutil cp.
Let’s define the model name, the model version, and the Cloud ML Engine runtime (which corresponds to a TensorFlow version) required to deploy the model.
First, create a Cloud ML Engine model using the following gcloud command.

Second, create a model version using the following gcloud command, in which you specify the location of the model and preprocessing object (--origin), the location of the package(s) including the scripts needed for your prediction (--package-uris), and a pointer to your Custom Model Prediction class (--model-class). This should take one to two minutes.
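The two deployment steps above can be sketched as follows; the model name, version, runtime version, bucket paths, and the module name in --model-class are all placeholders, and the gcloud release track for this beta feature may differ in your environment:

```shell
# Illustrative sketch; names, bucket paths, and the release track are
# placeholders -- adapt them to your project.
MODEL_NAME="hacker_news"
VERSION_NAME="v1"
RUNTIME_VERSION="1.10"
BUCKET="your-bucket"   # placeholder

# First: create the Cloud ML Engine model resource.
gcloud ml-engine models create ${MODEL_NAME} --regions us-central1

# Second: create the version, pointing at the artifacts, the package,
# and the Custom Model Prediction class.
gcloud beta ml-engine versions create ${VERSION_NAME} \
  --model ${MODEL_NAME} \
  --origin gs://${BUCKET}/model/ \
  --runtime-version ${RUNTIME_VERSION} \
  --package-uris gs://${BUCKET}/packages/text_classification-0.1.tar.gz \
  --model-class model_prediction.CustomModelPrediction  # module name is a placeholder
```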
After deploying the model to Cloud ML Engine, you can invoke the model for prediction using the following code:
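A sketch of that invocation using the Google API Client Library for Python; the project ID, model name, version name, and the example titles are placeholders:

```python
# Illustrative sketch of calling the online prediction API; PROJECT,
# MODEL_NAME, VERSION_NAME, and the titles are placeholders.
from googleapiclient import discovery

PROJECT = 'your-project-id'   # placeholder
MODEL_NAME = 'hacker_news'    # placeholder
VERSION_NAME = 'v1'           # placeholder

requests = [
    'A new open source framework released on GitHub',  # placeholder title
    'Startup raises a new funding round',              # placeholder title
]

service = discovery.build('ml', 'v1')
name = 'projects/{}/models/{}/versions/{}'.format(
    PROJECT, MODEL_NAME, VERSION_NAME)
response = service.projects().predict(
    name=name, body={'instances': requests}).execute()

if 'error' in response:
    raise RuntimeError(response['error'])
print(response['predictions'])  # list of predicted source labels
```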
Given the titles defined in the request object, the predicted source of each title from the deployed model would be as follows: ['techcrunch', 'techcrunch', 'techcrunch', 'nytimes', 'nytimes', 'nytimes', 'github', 'github', 'techcrunch']. Note that the last title was misclassified by the model.
In this tutorial, we built and trained a text classification model using Keras to predict the source media of a given article. The model required text preprocessing operations, both for preparing the training data and for preparing the incoming requests to the model deployed for online prediction. We then showed how you can deploy the model to Cloud ML Engine with custom online prediction code, in order to perform preprocessing on incoming prediction requests and post-processing on the prediction outputs. Enabling a custom online prediction routine in Cloud ML Engine allows for affinity between the preprocessing logic, the model, and the post-processing logic required to handle prediction requests end-to-end, which helps to avoid training-serving skew and simplifies deploying ML models for online prediction.
Thanks for following along. If you’re curious to try out some other machine learning tasks on GCP, take this specialization on Coursera. If you want to try out these examples for yourself in a local environment, run this notebook on Colab. Send a tweet to @GCPcloud if there’s anything we can change or add to make text analysis even easier on Google Cloud.