A growing number of businesses each year are bringing their most valued assets, their data, to Google Cloud for smart analytics. Every day, customers upload petabytes of new data into BigQuery, our exabyte-scale, serverless data warehouse, and the volume of data analyzed has grown by over 300 percent in just the last year. Large enterprises and small start-ups alike trust Google Cloud to store, analyze and find insights in their data—and we want to bring them the tools they need to make data-driven insights actionable across their organizations.
Today, we’re announcing a number of new capabilities across our data analytics offerings. We’re introducing radically simple ways to move data into Google Cloud—and to clean, categorize, and understand it. We’re providing significant enhancements to our data warehousing infrastructure, and making it even easier for enterprises to adopt BigQuery seamlessly. We’re also expanding the ways we’re bringing machine learning to our analytics platform so that businesses can easily adopt predictive analytics with greater accuracy.
Here’s an overview of what’s new:
Simplifying data migration and integration
Cloud Data Fusion (beta)
BigQuery DTS SaaS application connectors (beta)
Data warehouse migration service to BigQuery (beta)
Cloud Dataflow SQL (public alpha, coming soon)
Dataflow FlexRS (beta)
Accelerating time to insights
BigQuery BI Engine (beta)
Connected sheets (beta, coming soon)
Turning data into predictions
BigQuery ML (GA, coming soon), with additional models supported
AutoML Tables (beta)
Enhancing data discovery and governance
Cloud Data Catalog (beta, coming soon)
Before you can analyze your data, you first need to move and unify it in the cloud. Today, we’re announcing several new ways we’re making it easier to bring together data from on premises, different applications, and other clouds to Google Cloud Platform (GCP).
Introducing Cloud Data Fusion: blend and transform data from disparate sources in one location
Many large organizations have massive amounts of data locked up in siloed systems and need a way to get a complete, transformed view of that data to drive their use cases. Cloud Data Fusion, in beta, addresses this challenge.
Cloud Data Fusion is a fully managed, cloud-native data integration service with a broad library of open-source transformations and more than a hundred out-of-the-box connectors for a wide array of systems and data formats. This means anyone can easily ingest and integrate data from various sources and transform that data—for example, by blending or joining it with other data sources—before using BigQuery to analyze it.
Data Fusion’s control center allows you to explore and manage all your datasets and data pipelines in one location. It’s as simple as dragging and dropping data pipelines into the control center—no coding necessary.
“Data Fusion lowers the barrier to entry for big data work by providing an intuitive visual interface and pipeline abstraction,” says Robert Medeiros, R&D Architect, TELUS Digital. “This increased accessibility, combined with a growing collection of pre-built ‘connectors’ and transformations, translates to rapid results and in many cases allows data analysts and scientists to ‘self-serve’ without needing help from those with deep cloud or software engineering expertise.”
BigQuery DTS now supports over 100 SaaS application integrations through partner connectors
The BigQuery Data Transfer Service automates data movement from SaaS applications to Google BigQuery on a scheduled, managed basis. Your analytics team can lay the foundation for a data warehouse without writing a single line of code. In addition to Google’s first-party apps, BigQuery Data Transfer Service now supports more than 100 popular SaaS applications, including Salesforce, Marketo, Workday, Stripe, and many more.
Data warehouse migration service: simplify migration to Google Cloud
A large number of enterprises need to modernize their data warehouse infrastructure and are now looking for easier ways to migrate those data warehouses to BigQuery. We have built a data warehouse migration service to automate migrating data and schema to BigQuery from Teradata and Amazon Redshift, as well as data loading from Amazon S3. This service will significantly reduce migration time. You can find the documentation for this process here, and our recently announced data warehousing migration offer makes it even easier for enterprises to move from traditional data warehouses to BigQuery.
Cloud Dataflow SQL and Dataflow FlexRS: launch data pipelines with SQL and schedule jobs more flexibly
Data analysts rely on data pipelines to drive analytics, yet are often dependent on data engineers to build those pipelines. Cloud Dataflow SQL, coming soon in public alpha, makes it possible for data analysts to build their own Dataflow pipelines using familiar SQL that also automatically detects the need for batch or stream data processing.
Dataflow SQL uses the same SQL dialect used in BigQuery. This allows data analysts to use Dataflow SQL from within the BigQuery UI, to join Cloud Pub/Sub streams with files or tables from across your data infrastructure, and then to directly query the merged data in real time. This means you can generate real-time insights and create a dashboard to visualize the results.
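As a rough sketch of what this looks like, the query below joins a Pub/Sub stream of orders against a static reference table and aggregates over fixed windows. The project, topic, and table names here are all hypothetical, and the exact source syntax may differ—consult the Dataflow SQL documentation for specifics:

```sql
-- Illustrative sketch: enrich a Pub/Sub stream with a BigQuery lookup table,
-- then aggregate sales per region in 15-second tumbling windows.
-- All names below (my_project, orders, us_state_salesregions) are hypothetical.
SELECT
  sr.sales_region,
  TUMBLE_START('INTERVAL 15 SECOND') AS period_start,
  SUM(o.amount) AS total_amount
FROM pubsub.topic.`my_project`.`orders` AS o
JOIN my_dataset.us_state_salesregions AS sr
  ON o.state = sr.state_code
GROUP BY
  sr.sales_region,
  TUMBLE(o.event_timestamp, 'INTERVAL 15 SECOND')
```

Because the dialect is shared with BigQuery, a query like this can be authored from the BigQuery UI, with Dataflow handling the decision between batch and streaming execution.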
To receive a release notification for Dataflow SQL’s public alpha, please fill out this form.
Today, we’re also announcing Dataflow Flexible Resource Scheduling (FlexRS), in beta, which offers cost benefits for batch processing jobs through flexible scheduling, such as running jobs overnight. If you’re processing data that isn’t time-sensitive, you can benefit from preemptible resource pricing.
Once businesses have ingested their most important data into BigQuery, we help them share their data in easy-to-understand ways so users across an entire organization can take advantage of those same insights.
BigQuery BI Engine: bring business intelligence directly to your data
Data analysts and business users often use business intelligence (BI) reports and dashboards to analyze data from a data warehouse. Today, we’re introducing BigQuery BI Engine in beta, providing an extraordinarily fast, in-memory analysis service for BigQuery. With BigQuery BI Engine, users can analyze complex data sets interactively with sub-second query response time and with high concurrency. Today, BigQuery BI Engine is available through Google Data Studio for interactive reporting and dashboarding, and in the coming months, our technology partners like Looker and Tableau will be able to leverage BI Engine as well.
“With BigQuery BI Engine behind the scenes, we’re able to gain deep insights very quickly in Data Studio,” says Rolf Seegelken, Senior Data Analyst, Zalando. “The performance of even our most computationally intensive dashboards has sped up to the point where response times are now less than a second. Nothing beats ‘instant’ in today’s age, to keep our teams engaged in the data!”
Connected sheets: access the power of BigQuery through a spreadsheet interface
A wide range of business users rely on spreadsheets as an indispensable tool for data analysis. Today we’re announcing connected sheets, a new type of spreadsheet that combines the simplicity of a spreadsheet interface with the power of BigQuery. That means no row limits with this connected sheet—it works with the full dataset from BigQuery, whether that’s millions or even billions of rows of data. It also means you don’t need to learn SQL—you’re simply using regular Sheets functionality, including formulas, pivot tables, and charts, to do the analysis.
With a few clicks, you can visualize data as a dashboard in Sheets and securely share it with anyone in your organization.
“Connected sheets are helping us democratize data,” says Nikunj Shanti, Chief Product Officer at AirAsia. “Analysts and business users are able to create pivots or charts, leveraging their existing skills on massive datasets, without needing SQL. This direct access to the underlying data in BigQuery provides access to the most granular data available for analysis. It’s a game changer for AirAsia.”
Connected sheets and BigQuery BI Engine are complemented by a broad range of updates to BigQuery. These include an updated BigQuery interface, now generally available, as well as the general availability of BigQuery GIS, which enables seamless analysis of spatial data in BigQuery—the only cloud data warehouse to support rich GIS functionality out of the box.
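To give a flavor of spatial analysis in SQL, the sketch below finds all points within 10 km of a reference location using BigQuery GIS geography functions. The `my_dataset.stations` table and its columns are hypothetical:

```sql
-- Illustrative sketch: find stations within 10 km of a reference point.
-- my_dataset.stations (name, lon, lat) is a hypothetical table.
SELECT name
FROM my_dataset.stations
WHERE ST_DWITHIN(
  ST_GEOGPOINT(lon, lat),         -- each station's location
  ST_GEOGPOINT(-122.08, 37.39),   -- reference point (longitude, latitude)
  10000)                          -- distance threshold in meters
```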
Predictive insights are increasingly becoming an important way businesses can anticipate needs like estimating customer demand or scheduling routine maintenance. Data warehouses often store the most valuable data sets for the enterprise, but unlocking these insights has traditionally been the domain of machine learning experts—a skill not shared by most data analysts or business users. We’ve changed that with BigQuery ML.
BigQuery ML generally available (coming soon), with expanded machine learning models
Last year, we announced BigQuery ML, enabling data analysts to build and deploy machine learning models on massive datasets directly inside BigQuery using familiar SQL.
We’re also continuing to expand BigQuery ML functionality to address even more business needs. We’ve made new models available, like k-means clustering (in beta) and matrix factorization (in alpha), to build customer segmentations and product recommendations. Customers can now also build and directly import TensorFlow Deep Neural Network models (in alpha) through BigQuery ML.
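As a sketch of the k-means workflow, the statements below train a segmentation model and then assign each customer to a cluster—entirely in SQL. The dataset, table, and column names are hypothetical:

```sql
-- Illustrative sketch: train a k-means customer segmentation model.
-- my_dataset.customers and its feature columns are hypothetical.
CREATE OR REPLACE MODEL my_dataset.customer_segments
OPTIONS (model_type = 'kmeans', num_clusters = 4) AS
SELECT
  total_spend,
  order_count,
  days_since_last_order
FROM my_dataset.customers;

-- Assign each customer to its nearest cluster centroid.
SELECT *
FROM ML.PREDICT(MODEL my_dataset.customer_segments,
  (SELECT total_spend, order_count, days_since_last_order
   FROM my_dataset.customers));
```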
“Geotab is providing new smart city solutions leveraging aggregate data from over 1 million connected vehicles. We’re able to use BigQuery GIS to understand traffic flow patterns and BigQuery ML helped us derive insight into predicting hazardous driving areas in cities based on inclement weather,” explains Neil Cawse, CEO of Geotab.
AutoML Tables: apply machine learning to tabular data without writing a single line of code
Not everyone who can benefit from machine learning insights is a SQL expert. To make it even easier to apply ML on structured data stored in BigQuery and Cloud Storage, we’re excited to announce AutoML Tables, in beta. AutoML Tables lets your entire team of data scientists, analysts and developers automatically build and deploy state-of-the-art machine learning models on structured data in just a few clicks, reducing the total time required from weeks to days—without writing a single line of code.
The variety, volume, and velocity of data from disparate systems, business processes, and other sources have meant that many organizations increasingly grapple with data access, discovery, management, security, and governance. Finding and validating datasets can often be a complex, manual process, and growing regulatory and compliance requirements have made it all the more important.
Data Catalog: data discovery and governance, simplified
To help organizations quickly discover, manage, and understand their data assets, we’re introducing Data Catalog in beta, a fully managed and scalable metadata management service. Data Catalog offers a simple, easy-to-use search interface for data discovery, powered by the same Google search technology that supports Gmail and Drive, along with a flexible and powerful cataloging system for capturing technical and business metadata. For security and data governance, it integrates with Cloud DLP, so you can discover and catalog sensitive data assets, and with Cloud IAM, where we honor source access control lists (ACLs), simplifying access management.
After deploying Data Catalog with his team, David Parfett, Director of Data Architecture at Sky, explains, “With the increasing amount of data assets in our organization, we are confident that Data Catalog will allow us to quickly and easily discover our data assets across GCP and scale in line with our growing business.”
We’re also working with strategic partners like Collibra, Informatica, Tableau, and Looker to build integrations with Data Catalog, allowing customers to have a unified data discovery experience for hybrid cloud scenarios, using their platform of choice.
“Our relationship with Google Cloud has accelerated in recent months, and this partnership is the next step in our shared commitment to providing a foundation for data governance that sets organizations up to succeed,” said Jim Cushman, Chief Product Officer for Collibra. “We’re excited to continue building this partnership, with a mutual goal of integrating our technologies and making it easier for enterprise organizations to understand and use the data that is vital to their business.”
To learn more, and request access to Data Catalog, fill out this form.
From Fortune 500 enterprises to start-ups, more and more businesses continue to look to the cloud to help them store, manage, and generate insights from their data. And we’ll continue to develop new, transformative tools to help them do just that. For more information about data analytics on Google Cloud, visit our website.