Processing Data with Google Cloud Dataflow

1m setup · 60m access · 60m completion
This lab costs 7 Credits to run.

GSP198

Google Cloud Self-Paced Labs

Overview

In this lab you will use a historical data set to simulate a real-world, real-time data set. The historical data, stored as a set of text files, is processed using Python and Google Cloud Dataflow, and the resulting simulated real-time data is stored in Google BigQuery. You will then use BigQuery to analyse some features of the real-time data set.
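
The overview does not say exactly how the replay is performed. One common approach is to sort the records by timestamp and emit each one after a delay proportional to its offset from the first record, compressed by a speed-up factor. The sketch below assumes each record carries an event timestamp. `Event`, `replay_schedule`, `replay`, and the sample events are hypothetical names and data, not the lab's own code.

```python
import time
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical historical record: when it happened, plus a payload."""
    event_time: datetime
    payload: str


def replay_schedule(events: Iterable[Event], speedup: float) -> List[Tuple[float, Event]]:
    """Pair each event with the number of real seconds after the start of the
    replay at which it should be emitted.

    Events are sorted by event time, and each offset from the earliest event is
    divided by `speedup` (60 means one simulated minute per real second).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    ordered = sorted(events, key=lambda e: e.event_time)
    if not ordered:
        return []
    start = ordered[0].event_time
    return [((e.event_time - start).total_seconds() / speedup, e) for e in ordered]


def replay(schedule: List[Tuple[float, Event]],
           emit: Callable[[Event], None],
           sleep: Callable[[float], None] = time.sleep) -> None:
    """Emit each event once its scheduled offset (in real seconds) has elapsed."""
    t0 = time.monotonic()
    for offset, event in schedule:
        wait = offset - (time.monotonic() - t0)
        if wait > 0:
            sleep(wait)
        emit(event)


# Usage: three out-of-order historical events, replayed 60x faster than real time.
events = [
    Event(datetime(2015, 1, 1, 10, 30), "B departs"),
    Event(datetime(2015, 1, 1, 10, 0), "A departs"),
    Event(datetime(2015, 1, 1, 11, 0), "C arrives"),
]
schedule = replay_schedule(events, speedup=60)
for delay, event in schedule:
    print(f"{delay:5.1f}s  {event.payload}")
```

In a real pipeline, `emit` might publish to a message queue rather than print. `sleep` is injectable so the replay can be tested without waiting.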

Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes, using the Java and Python APIs of the Apache Beam SDK. Cloud Dataflow provides a serverless architecture that can shard very large batch data sets, or high-volume live data streams, and process the pieces in parallel.
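
Parallel processing of this kind works because each element is transformed independently, so the input can be split into shards that share no state. The plain-Python sketch below imitates that idea on a single machine with a thread pool. It is not Dataflow or Beam code. In Apache Beam, a function like `parse_line` would typically be applied with `beam.Map` followed by `beam.Filter`. The CSV layout, `Record`, `parse_line`, `process_shard`, and `run` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

# Hypothetical record layout: (date, carrier, origin, dest, dep_delay_minutes)
Record = Tuple[str, str, str, str, float]


def parse_line(line: str) -> Optional[Record]:
    """Element-wise transform: one text line in, one record (or None) out."""
    fields = line.strip().split(",")
    if len(fields) != 5:
        return None  # drop malformed rows rather than failing the whole job
    date, carrier, origin, dest, delay = fields
    try:
        return (date, carrier, origin, dest, float(delay))
    except ValueError:
        return None


def process_shard(lines: List[str]) -> List[Record]:
    """Process one shard; shards share no state, so they can run in parallel."""
    parsed = (parse_line(line) for line in lines)
    return [record for record in parsed if record is not None]


def run(lines: List[str], num_shards: int = 4) -> List[Record]:
    """Split the input round-robin into shards, process them concurrently, merge.

    Records come back grouped by shard, not in input order; Beam likewise makes
    no ordering guarantee for the elements of a PCollection.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards = [lines[i::num_shards] for i in range(num_shards)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        results = list(pool.map(process_shard, shards))
    return [record for shard in results for record in shard]


sample = [
    "2015-01-01,AA,JFK,LAX,5",
    "not,a,valid,row",
    "2015-01-01,DL,ATL,ORD,-3",
    "2015-01-02,UA,SFO,DEN,n/a",
]
print(run(sample, num_shards=2))
```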

Google BigQuery is a RESTful web service that enables interactive analysis of very large data sets, working in conjunction with Google Cloud Storage.
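
Analysis in BigQuery is expressed as SQL queries. The sketch below runs a typical aggregation over hypothetical flight rows, using Python's built-in `sqlite3` as a local stand-in for BigQuery. The table name, columns, and rows are assumptions for illustration. A simple `GROUP BY` query like this one should carry over to BigQuery standard SQL largely unchanged. With the `google-cloud-bigquery` client library, such a query would typically be submitted with `bigquery.Client().query(sql).result()` against a table in your own dataset.

```python
import sqlite3

# Hypothetical rows: (carrier, origin, dep_delay in minutes)
rows = [
    ("AA", "JFK", 12.0),
    ("AA", "LAX", -4.0),
    ("DL", "ATL", 30.0),
    ("DL", "ATL", 10.0),
    ("UA", "SFO", 0.0),
]

# In-memory database standing in for a BigQuery table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (carrier TEXT, origin TEXT, dep_delay REAL)")
conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)

# Average departure delay per carrier, worst first.
query = """
SELECT carrier, COUNT(*) AS num_flights, AVG(dep_delay) AS avg_dep_delay
FROM flights
GROUP BY carrier
ORDER BY avg_dep_delay DESC
"""
result = conn.execute(query).fetchall()
conn.close()

for carrier, num_flights, avg_delay in result:
    print(carrier, num_flights, avg_delay)
```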

The data set used in this lab provides historical information about domestic flights in the United States, retrieved from the website of the US Bureau of Transportation Statistics. It can be used to demonstrate a wide range of data science concepts and techniques, and it is used in all of the other labs in the Data Science on Google Cloud Platform quest.

Score (out of 20 points):

  • Create a BigQuery Dataset (5 points)
  • Copy the airport geolocation file to your Cloud Storage bucket (5 points)
  • Process the Data using Cloud Dataflow (submit Dataflow job) (5 points)
  • Run Query (5 points)
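
One of the graded tasks copies an airport geolocation file to Cloud Storage, but this page does not show how the lab uses it. One plausible role, assumed here, is a small lookup table used to enrich each flight record with airport coordinates. In Apache Beam, such a table is commonly passed to a transform as a side input, for example via `beam.pvalue.AsDict`. The sketch below shows only the lookup-and-enrich step in plain Python. The file layout, the approximate coordinates, and the names `load_airports` and `add_origin_location` are hypothetical.

```python
import csv
import io
from typing import Dict, Optional, Tuple

# Hypothetical file contents: airport code, latitude, longitude
# (coordinates are approximate and for illustration only).
AIRPORTS_CSV = """code,lat,lon
JFK,40.6398,-73.7789
LAX,33.9425,-118.4081
"""

Location = Tuple[float, float]


def load_airports(text: str) -> Dict[str, Location]:
    """Parse the small geolocation table into an in-memory lookup dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["code"]: (float(row["lat"]), float(row["lon"])) for row in reader}


def add_origin_location(flight: Dict[str, object],
                        airports: Dict[str, Location]) -> Dict[str, object]:
    """Return a copy of `flight` enriched with the origin airport's coordinates.

    Unknown airports get None rather than raising, so one bad code does not
    stop the whole pipeline.
    """
    enriched = dict(flight)
    location: Optional[Location] = airports.get(str(flight["origin"]))
    enriched["origin_lat"], enriched["origin_lon"] = location if location else (None, None)
    return enriched


airports = load_airports(AIRPORTS_CSV)
print(add_origin_location({"carrier": "AA", "origin": "JFK", "dest": "LAX"}, airports))
```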