Source: New Qwiklabs Quest available: Data Science on Google Cloud Platform from Google Cloud
Data science and machine learning are two of the most in-demand skill sets available today, according to a recent Harvard Business Review article. The U.S. Bureau of Labor Statistics predicts that job growth in these areas will create 11.5 million jobs through 2026. Interest in data science and machine learning has grown dramatically, as indicated by analysis of data available on Google Trends (interest in “web programming” is also shown as a baseline).
We have released a brand-new set of labs designed to help you quickly and easily take advantage of advanced ML tools on Google Cloud. To dive in, all you need is a little familiarity with BigQuery and SQL. The Data Science on the Google Cloud Platform Quest, now available on the Qwiklabs platform, clearly illustrates how to develop tools and techniques to make your data smarter. In just over 9 hours, you will learn to partition a dataset, stream real-time data, and ingest external data sources for analysis. Your future self will thank you!
The labs are based on a selection of examples taken from a book written by Valliappa (Lak) Lakshmanan and published by O’Reilly Media. The source text offers an excellent introduction to data science on the Google Cloud Platform.
The Quest can help you quickly up-level your abilities by teaching fundamental skills and outlining a typical workflow for data science. Regardless of your specific use case, the data science workflow is largely consistent and is often expressed as:
Ingest: How do I access the data?
Process: What data items are required?
Query: Which data should I include?
Reporting: How do I display the resulting information?
Data ingest is the process of making the source data available to the data science workflow. In this Quest, you’ll have the opportunity to explore several real-world scenarios that illustrate the different approaches to moving data to the cloud for ingestion. As you go through the labs, you will gain experience with several techniques that can be applied to your own projects.
In the diagram above, we illustrate two of the more common ingest options for consuming data: batch and stream.
Batch (Synchronous) processing. This option works on historical data that has already been collected, as in overnight reporting or billing systems.
Stream (Asynchronous) processing. For this option, information updates are performed in near real-time, as with Internet of Things (IoT) devices or payment transactions.
It is common to see some form of programmatic approach (e.g. a bash or Python script) used to build in both flexibility and scalability. Use these labs to practice techniques so that you can adapt them to your own real-world use cases.
The Quest presents very practical advice and examples on using algorithms to handle a diverse range of data sources (e.g. structured or unstructured data). Learning how to handle different data sources is a key skill for any data scientist.
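To make the batch and stream distinction concrete, here is a minimal Python sketch. It is not taken from the Quest: the payment amounts and the function names batch_total and stream_totals are hypothetical, and an in-memory list stands in for a real data source.

```python
from typing import Iterable, Iterator

# Hypothetical payment amounts, invented for illustration.
PAYMENTS = [12.5, 7.5, 30.0]


def batch_total(records: Iterable[float]) -> float:
    """Batch: the whole historical dataset exists before processing starts."""
    return sum(records)


def stream_totals(events: Iterable[float]) -> Iterator[float]:
    """Stream: emit an updated running total as each event arrives."""
    running = 0.0
    for amount in events:
        running += amount
        yield running


if __name__ == "__main__":
    print(batch_total(PAYMENTS))          # one result, computed after the fact
    print(list(stream_totals(PAYMENTS)))  # one result per incoming event
```

The batch function returns a single answer once all the data is in hand, while the streaming generator produces an updated answer after every event, which is the property that makes near real-time reporting possible.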
Once data has been loaded into a satisfactory repository, the next step is to process it. This is sometimes referred to as data wrangling/munging, which is essentially transformation and mapping of ingested data to meet requirements.
The Quest also shows how this processing can be performed on a real-world dataset with services such as Cloud Dataflow. This stage teaches data scientists the essential data enrichment skills that are central to creating powerful data models.
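As a rough illustration of the transformation and mapping involved in data wrangling, the sketch below cleans a tiny CSV extract using only the Python standard library. It does not use Cloud Dataflow, and the column names, sample rows, and the wrangle function are assumptions made up for this example.

```python
import csv
import io

# Hypothetical raw export; columns and values are invented for illustration.
RAW = """flight_date,carrier,dep_delay
2015-01-01,AA,5
2015-01-01,DL,
2015-01-02,AA,-3
"""


def wrangle(raw_text):
    """Parse CSV text, drop rows with a missing delay, and convert types."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        if not row["dep_delay"]:
            continue  # skip incomplete records
        delay = int(row["dep_delay"])  # map the text field to an integer
        cleaned.append({
            "flight_date": row["flight_date"],
            "carrier": row["carrier"],
            "dep_delay": delay,
            "delayed": delay > 0,  # derived field that enriches the record
        })
    return cleaned


if __name__ == "__main__":
    for record in wrangle(RAW):
        print(record)
```

The same pattern of filtering incomplete records, converting types, and adding derived fields applies at larger scale, where a managed service would replace the in-memory loop.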
Now that the data model has been created and is in an enriched state, the workflow refocuses on defining the queries that will be used to build insight.
The Quest demonstrates a couple of techniques using tools such as Google BigQuery and Cloud Datalab. BigQuery’s fast and intuitive interface makes it easy to query big datasets using SQL. However, you may choose to perform additional processing to further enhance your data model. Programmatic tools such as Python notebooks hosted in Cloud Datalab can be particularly useful in this regard.
Working through the labs will quickly develop your understanding of the tools available and of how to manipulate data at each phase of the workflow.
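For a feel of the kind of SQL aggregation used at the query stage, here is a small sketch that runs locally. It deliberately swaps in Python's built-in sqlite3 module as a stand-in for BigQuery, so the dialect and scale differ; the flights table, its columns, and the average_delay_by_carrier function are hypothetical.

```python
import sqlite3


def average_delay_by_carrier(rows):
    """Load (carrier, dep_delay) rows into an in-memory table and aggregate."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE flights (carrier TEXT, dep_delay INTEGER)")
        conn.executemany("INSERT INTO flights VALUES (?, ?)", rows)
        # Standard SQL aggregation: one average per carrier.
        query = (
            "SELECT carrier, AVG(dep_delay) "
            "FROM flights GROUP BY carrier ORDER BY carrier"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("AA", 10), ("AA", 20), ("DL", 5)]  # invented sample data
    print(average_delay_by_carrier(sample))
```

The result is a list of (carrier, average delay) tuples, which a notebook could then plot or process further.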
Completing the Data Science on GCP Quest lets you try for yourself the three essential steps of data model creation. Completing this self-paced course and attaining a data science badge represents the first step towards learning key techniques from one of the most popular subjects in the industry. Of course, the subject matter does include some advanced concepts, so as a precursor you may wish to take a more introductory course (e.g. Baseline: Data, ML, AI Quest) first.
Mastering the fundamentals in this Quest is the first step to accessing some of the most sought-after skills available. In the coming weeks, we’ll also release a machine learning Quest covering topics such as logistic regression, predictive services, and TensorFlow. Stay tuned for more on that.